This project was developed for the course Applied Finance with Python at Reutlingen University. The objective is to predict tomorrow's price movement of the Tesla (TSLA) stock using machine learning.
Disclaimer: Prediction not guaranteed.
We attempt to "time the market" by using a binary classification approach to predict short-term stock trends (Up/Down).
- Market Data (Yahoo Finance API): Open, Close, Low, High, and Volume. Comparison assets include BYD, S&P500, Lithium_ETF, NASDAQ, VIX (Volatility Index), 10Y_Yield (10-year US Treasury bonds), China_FX (USD/CNY), and the Dollar_Index.
- Sentiment Data (GDELT API): Timeline Volume, Timeline Volume Raw, and Timeline Tone (ranging from -100 for extremely negative to +100 for extremely positive).
We use uv for fast Python package management and provide a Docker setup for containerization.
```shell
# Clone the repository
git clone https://github.com/do-martin/Market_Prediction.git
cd Market_Prediction

# Synchronize the virtual environment and install dependencies
uv sync

# Run the project pipeline
uv run streamlit run src/app/app.py
```

```shell
# Build the image and start the container
docker-compose up --build
```

- Missing Values: Filled missing values for Close, High, Low, and Open using forward fill (`FFILL Close`). Missing Volume was set to 0.
- Outliers: Corrected extreme anomalies, such as a NASDAQ Volume spike (278,927,768,403 in 2025), via internet research.
- Merging: The final base dataset consists of 3,302 records from 04-01-2017 to 24-01-2026, totaling 46 features.
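The cleaning steps above can be sketched with pandas. The toy frame and its values are illustrative assumptions; only the column names and the fill rules come from this README.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the merged market data;
# the exact schema of the real dataset is an assumption.
df = pd.DataFrame({
    "Close":  [300.0, np.nan, 310.0, np.nan],
    "High":   [305.0, np.nan, 315.0, 318.0],
    "Low":    [295.0, np.nan, 305.0, np.nan],
    "Open":   [298.0, np.nan, 308.0, 312.0],
    "Volume": [1_000_000, np.nan, 1_200_000, np.nan],
})

# Forward-fill price columns: a missing day inherits the last known value.
price_cols = ["Close", "High", "Low", "Open"]
df[price_cols] = df[price_cols].ffill()

# Missing volume is treated as "no trades recorded" and set to 0.
df["Volume"] = df["Volume"].fillna(0)
```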
- Simple Setup (56 Features): Calculated the return (percentage change of Close relative to the previous day) and created a binary target feature `target_return`.
- Advanced Setup (303 Features): Market Data: Added indicators for Trend (long-term), Momentum (speed), Volatility (risk), and History (last week).
- Sentiment Data: Tracked mood over recent days/weeks and sudden news volume spikes.
- Pipeline: Used Spearman correlation to remove redundant features, measured each feature's dependency on the target, and applied Recursive Feature Elimination.
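A minimal sketch of the target construction and the Spearman redundancy pruning described above. The price series, the extra features, and the 0.95 correlation threshold are assumptions for illustration, not the project's actual values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic close prices standing in for TSLA (illustrative only).
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.02, 300)))

# Daily return and binary target: is tomorrow's return positive?
ret = close.pct_change()
target_return = (ret.shift(-1) > 0).astype(int)  # 1 = Up, 0 = Down

# Redundancy pruning (assumed threshold 0.95): for each pair of
# features with |Spearman correlation| above the threshold, drop one.
features = pd.DataFrame({
    "ret": ret,
    "ret_copy": ret * 2,          # perfectly rank-correlated duplicate
    "vol_5d": ret.rolling(5).std(),
}).dropna()

corr = features.corr(method="spearman").abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
pruned = features.drop(columns=to_drop)
```

Recursive Feature Elimination would then run on `pruned` with the downstream classifier.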
- Split & Scaling: Chronological 70/15/15 split and data normalization.
- Algorithm: XGBoost for binary classification.
- Tuning: Grid search over 324 hyperparameter combinations.
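The split and the size of the tuning grid can be sketched as follows. The sample count and the specific grid values are hypothetical; only the 70/15/15 ratio and the total of 324 combinations come from the README (here realized as 3 × 3 × 3 × 3 × 4 = 324).

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Chronological 70/15/15 split: no shuffling, so the model never
# trains on data that comes after its validation or test periods.
n = 1000                      # illustrative number of samples
i_train, i_val = int(n * 0.70), int(n * 0.85)
idx = np.arange(n)
train, val, test = idx[:i_train], idx[i_train:i_val], idx[i_val:]

# A hypothetical XGBoost-style grid; the project's actual parameter
# values are not documented here.  3 * 3 * 3 * 3 * 4 = 324 combinations.
grid = {
    "max_depth":        [3, 5, 7],
    "learning_rate":    [0.01, 0.05, 0.1],
    "n_estimators":     [100, 300, 500],
    "subsample":        [0.7, 0.85, 1.0],
    "colsample_bytree": [0.6, 0.8, 0.9, 1.0],
}
n_combos = len(ParameterGrid(grid))   # 324
```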
- Negative values are overrepresented in the dataset.
- The news data from GDELT is highly noisy.
- Past performance does not equal future results.
- Strict risk management is required, and further experiments are necessary.