
do-martin/Market_Prediction


Market Prediction: Timing the Market with Machine Learning

This project was developed for the course Applied Finance with Python at Reutlingen University. The objective is to predict whether the price of Tesla (TSLA) stock will rise or fall on the next day using machine learning.

Disclaimer: Predictions are not guaranteed.


🚀 Project Overview

We attempt to "time the market" by using a binary classification approach to predict short-term stock trends (Up/Down).

Data Sources

  • Market Data (Yahoo Finance API): Open, Close, Low, High, and Volume. Comparison assets include BYD, the S&P 500, Lithium_ETF, NASDAQ, VIX (Volatility Index), 10Y_Yield (10-year US Treasury yield), China_FX (USD/CNY), and Dollar_Index (US Dollar Index).
  • Sentiment Data (GDELT API): Timeline Volume, Timeline Volume Raw, and Timeline Tone (ranging from -100 for extremely negative to +100 for extremely positive).
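
The GDELT series above are timeline modes of the GDELT DOC 2.0 API. The sketch below only builds a request URL and sends nothing. The endpoint, the mode names (`timelinevol`, `timelinevolraw`, `timelinetone`), and the `startdatetime`/`enddatetime` format (`YYYYMMDDHHMMSS`) reflect our understanding of the public API documentation and should be checked against it; the search term `"Tesla"` is a hypothetical example, not the project's actual query.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed endpoint of the public GDELT DOC 2.0 API.
GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

# Assumed API mode names for the three sentiment series listed above.
TIMELINE_MODES = {
    "Timeline Volume": "timelinevol",
    "Timeline Volume Raw": "timelinevolraw",
    "Timeline Tone": "timelinetone",
}


def build_gdelt_url(query: str, series: str, start: datetime, end: datetime) -> str:
    """Build (but do not send) a DOC 2.0 API request URL for one timeline series."""
    params = {
        "query": query,
        "mode": TIMELINE_MODES[series],
        "format": "json",
        # Timestamps are passed as YYYYMMDDHHMMSS.
        "startdatetime": start.strftime("%Y%m%d%H%M%S"),
        "enddatetime": end.strftime("%Y%m%d%H%M%S"),
    }
    return f"{GDELT_DOC_API}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query; the project's actual search terms are not documented here.
    print(build_gdelt_url("Tesla", "Timeline Tone", datetime(2017, 1, 4), datetime(2026, 1, 24)))
```

The resulting URL can be fetched with any HTTP client (for example `urllib.request.urlopen`), and the JSON response can then be parsed into a daily series for the merge step.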

🛠 Installation & Setup

We use uv for fast Python package management and provide a Docker setup for containerization.

1. Clone the Repository

```bash
git clone https://github.com/do-martin/Market_Prediction.git
cd Market_Prediction
```

2. Local Setup with uv

```bash
# Synchronize the virtual environment and install dependencies
uv sync

# Run the project pipeline
uv run streamlit run src/app/app.py
```

3. Containerized Setup with Docker Compose

```bash
# Build the image and start the container
docker-compose up --build
```

📊 Methodology

1. Data Preparation

  • Missing Values: Filled missing Close, High, Low, and Open values using forward fill (FFILL Close) and set missing Volume to 0.
  • Outliers: Corrected extreme anomalies, such as a NASDAQ volume spike of 278,927,768,403 in 2025, by verifying the values through internet research.
  • Merging: The final base dataset contains 3,302 records from 4 January 2017 to 24 January 2026, with 46 features in total.
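
The preparation steps can be sketched with pandas as follows. This is a minimal sketch under our own reading of "FFILL Close": each asset is first aligned to the union of all dates, then `Close` is forward-filled, missing `Open`, `High`, and `Low` take that filled `Close`, and missing `Volume` becomes 0. The helper names `clean_ohlcv` and `merge_assets`, the `OTHER` asset, and the toy prices are hypothetical.

```python
from functools import reduce

import pandas as pd


def clean_ohlcv(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps in one asset's Open/High/Low/Close/Volume columns.

    Assumed reading of "FFILL Close": Close is forward-filled, missing
    Open/High/Low take that filled Close, and missing Volume means no trading (0).
    """
    out = df.copy()
    out["Close"] = out["Close"].ffill()
    for col in ["Open", "High", "Low"]:
        out[col] = out[col].fillna(out["Close"])
    out["Volume"] = out["Volume"].fillna(0)
    return out


def merge_assets(assets: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Align all assets on the union of their dates, clean each one, and prefix its columns."""
    full_index = reduce(lambda a, b: a.union(b), (df.index for df in assets.values()))
    frames = [
        clean_ohlcv(df.reindex(full_index)).add_prefix(f"{name}_")
        for name, df in assets.items()
    ]
    return pd.concat(frames, axis=1)


# Toy data: TSLA has no quote on Saturday 2017-01-07, the hypothetical OTHER series does.
dates = pd.date_range("2017-01-05", periods=3, freq="D")
tsla = pd.DataFrame(
    {"Open": [10.0, 11.0], "High": [12.0, 12.5], "Low": [9.5, 10.5],
     "Close": [11.0, 12.0], "Volume": [1000.0, 1500.0]},
    index=dates[:2],
)
other = pd.DataFrame(
    {"Open": [1.0] * 3, "High": [1.0] * 3, "Low": [1.0] * 3,
     "Close": [1.0] * 3, "Volume": [5.0] * 3},
    index=dates,
)
merged = merge_assets({"TSLA": tsla, "OTHER": other})
print(merged.filter(like="TSLA_"))
```

Filling after the alignment matters: the gaps only appear once every asset is reindexed to the shared calendar, so cleaning each source in isolation would leave nothing to fill.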

2. Feature Engineering

  • Simple Setup (56 Features): Calculated the daily return (percentage change of Close from the previous day) and created a binary target feature, target_return.
  • Advanced Setup (303 Features):
    • Market Data: Added indicators for Trend (long-term), Momentum (speed), Volatility (risk), and History (last week).
    • Sentiment Data: Tracked news tone over recent days and weeks, as well as sudden spikes in news volume.
    • Pipeline: Used Spearman correlation to remove redundant features, measured each feature's dependency on the target feature, and applied Recursive Feature Elimination.
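
A minimal pandas sketch of both setups follows. We assume that `target_return` is 1 when the next day's return is positive and 0 otherwise, and we pick one illustrative feature per group: a 50-row moving-average ratio (trend), a 5-row rate of change (momentum), a 20-row rolling standard deviation of returns (volatility), seven lagged returns (history), rolling tone averages, and a news-volume z-score (sentiment). The window lengths and the column names `Tone` and `NewsVolume` are hypothetical; the project's 303 features are not reproduced here.

```python
import numpy as np
import pandas as pd


def add_simple_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the daily return and an assumed next-day direction target."""
    out = df.copy()
    # Percentage change of Close versus the previous row.
    out["return"] = out["Close"] / out["Close"].shift(1) - 1
    # Assumption: target_return = 1 if the NEXT row's return is positive, else 0.
    next_return = out["return"].shift(-1)
    # The last row has no "tomorrow", so its target is left as NaN.
    out["target_return"] = (next_return > 0).astype(float).where(next_return.notna())
    return out


def add_advanced_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add one illustrative feature per group; windows and names are hypothetical."""
    out = add_simple_features(df)
    close, ret = out["Close"], out["return"]
    # Trend (long-term): distance of Close from its 50-row moving average.
    out["trend_sma50_ratio"] = close / close.rolling(50).mean() - 1
    # Momentum (speed): 5-row rate of change.
    out["momentum_roc5"] = close / close.shift(5) - 1
    # Volatility (risk): 20-row rolling standard deviation of returns.
    out["volatility_20"] = ret.rolling(20).std()
    # History (last week): returns of the previous seven rows.
    for lag in range(1, 8):
        out[f"return_lag{lag}"] = ret.shift(lag)
    # Sentiment: average tone over the last 7 and 28 rows.
    out["tone_mean7"] = out["Tone"].rolling(7).mean()
    out["tone_mean28"] = out["Tone"].rolling(28).mean()
    # News volume spikes: z-score of today's volume within the last 28 rows (including today).
    vol = out["NewsVolume"]
    out["news_volume_z"] = (vol - vol.rolling(28).mean()) / vol.rolling(28).std()
    return out


prices = pd.DataFrame({"Close": [100.0, 110.0, 99.0, 99.0]})
print(add_simple_features(prices))
```

Every feature only looks at the current and earlier rows, while the target looks one row ahead; keeping that asymmetry is what prevents look-ahead leakage.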

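The first pipeline step, removing redundancy with Spearman correlation, can be sketched as a greedy filter. This is a minimal sketch under stated assumptions: columns are visited in their existing order, and a column is dropped when its absolute Spearman correlation with an already kept column exceeds a hypothetical threshold of 0.9. The project's actual threshold and ordering are not documented here.

```python
import numpy as np
import pandas as pd


def drop_redundant_features(X: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """Greedy Spearman filter: keep a column only if it is not strongly
    rank-correlated (|rho| > threshold) with any column kept before it.

    Assumes no constant columns (their correlations would be NaN).
    """
    corr = X.corr(method="spearman").abs()
    kept: list[str] = []
    for col in X.columns:
        if all(corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    return kept


# a_cubed is a monotonic transform of a, so its Spearman correlation with a is 1.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = pd.DataFrame({"a": a, "a_cubed": a**3, "b": rng.normal(size=200)})
print(drop_redundant_features(X))
```

Spearman correlation works on ranks, so it also catches monotonic but non-linear duplicates such as `a` and `a_cubed`, which a Pearson filter could underrate. The two remaining steps could, for example, use scikit-learn's `mutual_info_classif` to measure each feature's dependency on `target_return` and `RFE` wrapped around an `XGBClassifier`; whether the project uses these exact tools is an assumption.
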
3. Model Training

  • Split & Scaling: Chronological 70/15/15 split (training/validation/test) and data normalization.
  • Algorithm: XGBoost for binary classification.
  • Tuning: Grid search over 324 hyperparameter combinations.
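
The split and scaling step can be sketched as below. We assume the 70/15/15 split means training/validation/test in time order, and that "normalization" means z-scoring with statistics computed on the training split only; the project may use a different scaler (for example min-max scaling), and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def chronological_split(df: pd.DataFrame, train: float = 0.70, val: float = 0.15):
    """Split rows in time order: first 70% train, next 15% validation, rest test (no shuffling)."""
    n = len(df)
    train_end = int(n * train)
    val_end = int(n * (train + val))
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]


def standardize(train_X: pd.DataFrame, *others: pd.DataFrame) -> list[pd.DataFrame]:
    """Z-score every split with the mean and std of the training split only,
    so no statistics from later periods leak into training."""
    mean = train_X.mean()
    std = train_X.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    return [(X - mean) / std for X in (train_X, *others)]


X = pd.DataFrame({"x": np.arange(100.0)}, index=pd.date_range("2017-01-04", periods=100, freq="D"))
X_train, X_val, X_test = chronological_split(X)
Z_train, Z_val, Z_test = standardize(X_train, X_val, X_test)
print(len(X_train), len(X_val), len(X_test))
```

Shuffling before splitting would let the model train on days that come after the ones it is evaluated on, which overstates accuracy for a time series.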

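The README does not list the search grid, so the values below are hypothetical and chosen only so that 3 × 3 × 3 × 3 × 4 = 324 combinations match the stated count. This minimal sketch scores each combination on the chronological validation split (accuracy is an assumed metric) instead of shuffled cross-validation; scikit-learn's `GridSearchCV` combined with `TimeSeriesSplit` would be another option. XGBoost and scikit-learn are imported lazily, so the grid logic runs without them.

```python
from itertools import product
from typing import Callable

# Hypothetical grid: 3 * 3 * 3 * 3 * 4 = 324 combinations (the real grid is not documented).
PARAM_GRID = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [100, 300, 500],
    "subsample": [0.7, 0.85, 1.0],
    "colsample_bytree": [0.6, 0.7, 0.8, 1.0],
}


def iter_grid(grid: dict) -> list[dict]:
    """Expand a parameter grid into one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def grid_search(grid: dict, fit_and_score: Callable[[dict], float]) -> tuple[dict, float]:
    """Return the parameter set with the highest validation score."""
    best_params, best_score = None, float("-inf")
    for params in iter_grid(grid):
        score = fit_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def make_xgb_scorer(X_train, y_train, X_val, y_val) -> Callable[[dict], float]:
    """Build a scorer that trains XGBoost on the training split and scores it on validation."""
    from sklearn.metrics import accuracy_score  # requires scikit-learn
    from xgboost import XGBClassifier  # requires the xgboost package

    def fit_and_score(params: dict) -> float:
        model = XGBClassifier(objective="binary:logistic", **params)
        model.fit(X_train, y_train)
        return accuracy_score(y_val, model.predict(X_val))

    return fit_and_score


print(len(iter_grid(PARAM_GRID)))
```

With chronologically split data, `grid_search(PARAM_GRID, make_xgb_scorer(X_train, y_train, X_val, y_val))` would fit 324 models and return the best parameters; the test split stays untouched until the final evaluation.
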
⚠️ Critical Reflection

  • Negative values (down days) are overrepresented in the dataset, so the two target classes are imbalanced.
  • The news data from GDELT is highly noisy.
  • Past performance does not guarantee future results.
  • Strict risk management is required, and further experiments are necessary.
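
The imbalance point can be quantified before judging any accuracy figure. This minimal sketch assumes a 0/1 target where 0 means a down day; the helper name `class_balance` is hypothetical. It reports the share of each class, the accuracy of always predicting the majority class (the baseline a model has to beat), and the negative-to-positive ratio, which the XGBoost documentation suggests as a starting value for its `scale_pos_weight` parameter.

```python
import pandas as pd


def class_balance(y: pd.Series) -> dict:
    """Summarize a binary 0/1 target (0 = down day, 1 = up day; NaN rows are ignored)."""
    counts = y.value_counts()
    n_neg = int(counts.get(0, 0))
    n_pos = int(counts.get(1, 0))
    total = n_neg + n_pos
    return {
        "share_down": n_neg / total,
        "share_up": n_pos / total,
        # Accuracy of always predicting the majority class.
        "majority_baseline_accuracy": max(n_neg, n_pos) / total,
        # Suggested starting value for XGBoost's scale_pos_weight: negatives / positives.
        "scale_pos_weight": n_neg / n_pos if n_pos else float("inf"),
    }


print(class_balance(pd.Series([0, 0, 0, 1, 1, 0])))
```

If a model's test accuracy is close to `majority_baseline_accuracy`, it has likely learned little beyond the class ratio.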

About

Predicting Tesla (TSLA) stock trends using XGBoost by combining market data and GDELT sentiment analysis. Developed for the Applied Finance course at Reutlingen University.
