This project was developed for the course Applied Finance with Python at Reutlingen University. The objective is to predict tomorrow's price movement of the Tesla (TSLA) stock using machine learning.
Disclaimer: Prediction not guaranteed.
We attempt to "time the market" by using a binary classification approach to predict short-term stock trends (Up/Down).
- Market Data (Yahoo Finance API): Open, Close, Low, High, and Volume. Comparison assets include BYD, S&P500, Lithium_ETF, NASDAQ, VIX (Volatility Index), 10Y_Yield (10-year US Treasury bonds), China_FX (USD/CNY), and the Dollar_Index.
- Sentiment Data (GDELT API): Timeline Volume, Timeline Volume Raw, and Timeline Tone (ranging from -100 for extremely negative to +100 for extremely positive).
We use uv for fast Python package management and provide a Docker setup for containerization.
```shell
# Clone the repository
git clone https://github.com/do-martin/Market_Prediction.git
cd Market_Prediction

# Synchronize the virtual environment and install dependencies
uv sync

# Run the project pipeline
uv run streamlit run src/app/app.py
```

```shell
# Build the image and start the container
docker-compose up --build
```

- Missing Values: Filled missing values for Close, High, Low, and Open using forward fill (`FFILL Close`). Missing Volume was set to 0.
- Outliers: Corrected extreme anomalies, such as a NASDAQ Volume spike (278,927,768,403 in 2025), via internet research.
- Merging: The final base dataset consists of 3,302 records from 04-01-2017 to 24-01-2026, totaling 46 features.
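The cleaning steps above can be sketched with pandas. The toy frame and its values are illustrative assumptions; only the column names and the fill rules come from this README.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the merged market data;
# the exact schema of the real dataset is an assumption.
df = pd.DataFrame({
    "Close":  [300.0, np.nan, 310.0, np.nan],
    "High":   [305.0, np.nan, 315.0, 318.0],
    "Low":    [295.0, np.nan, 305.0, np.nan],
    "Open":   [298.0, np.nan, 308.0, 312.0],
    "Volume": [1_000_000, np.nan, 1_200_000, np.nan],
})

# Forward-fill price columns: a missing day inherits the last known value.
price_cols = ["Close", "High", "Low", "Open"]
df[price_cols] = df[price_cols].ffill()

# Missing volume is treated as "no trades recorded" and set to 0.
df["Volume"] = df["Volume"].fillna(0)
```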
- Simple Setup (56 Features): Calculated the return (percentage change of Close relative to the previous day) and created a binary target feature `target_return`.
- Advanced Setup (303 Features): Market Data: Added indicators for Trend (long-term), Momentum (speed), Volatility (risk), and History (last week).
- Sentiment Data: Tracked mood over recent days/weeks and sudden news volume spikes.
- Pipeline: Used Spearman correlation to remove redundant features, measured each feature's dependency on the target, and applied Recursive Feature Elimination.
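A minimal sketch of the target construction and the Spearman redundancy pruning described above. The price series, the extra features, and the 0.95 correlation threshold are assumptions for illustration, not the project's actual values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic close prices standing in for TSLA (illustrative only).
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.02, 300)))

# Daily return and binary target: is tomorrow's return positive?
ret = close.pct_change()
target_return = (ret.shift(-1) > 0).astype(int)  # 1 = Up, 0 = Down

# Redundancy pruning (assumed threshold 0.95): for each pair of
# features with |Spearman correlation| above the threshold, drop one.
features = pd.DataFrame({
    "ret": ret,
    "ret_copy": ret * 2,          # perfectly rank-correlated duplicate
    "vol_5d": ret.rolling(5).std(),
}).dropna()

corr = features.corr(method="spearman").abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
pruned = features.drop(columns=to_drop)
```

Recursive Feature Elimination would then run on `pruned` with the downstream classifier.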
- Split & Scaling: Chronological 70/15/15 split and data normalization.
- Algorithm: XGBoost for binary classification.
- Tuning: Grid search over 324 hyperparameter combinations.
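The split and the size of the tuning grid can be sketched as follows. The sample count and the specific grid values are hypothetical; only the 70/15/15 ratio and the total of 324 combinations come from the README (here realized as 3 × 3 × 3 × 3 × 4 = 324).

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Chronological 70/15/15 split: no shuffling, so the model never
# trains on data that comes after its validation or test periods.
n = 1000                      # illustrative number of samples
i_train, i_val = int(n * 0.70), int(n * 0.85)
idx = np.arange(n)
train, val, test = idx[:i_train], idx[i_train:i_val], idx[i_val:]

# A hypothetical XGBoost-style grid; the project's actual parameter
# values are not documented here.  3 * 3 * 3 * 3 * 4 = 324 combinations.
grid = {
    "max_depth":        [3, 5, 7],
    "learning_rate":    [0.01, 0.05, 0.1],
    "n_estimators":     [100, 300, 500],
    "subsample":        [0.7, 0.85, 1.0],
    "colsample_bytree": [0.6, 0.8, 0.9, 1.0],
}
n_combos = len(ParameterGrid(grid))   # 324
```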
- Negative values are overrepresented in the dataset.
- The news data from GDELT is highly noisy.
- Past performance does not equal future results.
- Strict risk management is required, and further experiments are necessary.