author:
- Daniela Ayala Chavez
- Jose Galvez Enriquez
- Jorge Guerrero Aguileta
In this project, we calibrate four machine learning models to classify daily stock records into Buy, Hold, or Sell decisions, using a combination of technical indicators, company financials, and market metadata. We contrast two financial management philosophies—active investing (selecting individual stocks) and passive investing (replicating a market index)—by assessing whether it is feasible to automate investment signals at the stock-day level.
- Clone the Project: `git clone git@github.com:uchicago-2025-capp30122/30122-project-chiffordable.git`
- Install Virtual Environment and Dependencies: `uv sync`
- Fetch Data (only if you want to refresh it). The `stock_choose` folder:
  - Extracts the data from yfinance
  - Merges daily stock data with company fundamentals. The fundamentals were downloaded from SimFin (unzip the files if you want to recreate the whole process)
  - Chooses relevant variables, creates labels, and normalizes numerical data
- Run the individual files for each model
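The labeling and normalization step in the list above can be sketched as follows. This is a minimal, standard-library illustration, not the repository's code: the README does not state how labels are defined, so the forward-return rule, the constants `HORIZON_DAYS`, `BUY_THRESHOLD`, and `SELL_THRESHOLD`, and the function names `label_records` and `zscore` are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical settings: the README does not give the horizon or thresholds.
HORIZON_DAYS = 5
BUY_THRESHOLD = 0.02
SELL_THRESHOLD = -0.02


def label_records(closes, horizon=HORIZON_DAYS):
    """Label each day by the return over the next `horizon` trading days.

    Days without a full look-ahead window get None, because their
    future return is unknown.
    """
    labels = []
    for i, price in enumerate(closes):
        if i + horizon >= len(closes):
            labels.append(None)
            continue
        forward_return = closes[i + horizon] / price - 1
        if forward_return >= BUY_THRESHOLD:
            labels.append("Buy")
        elif forward_return <= SELL_THRESHOLD:
            labels.append("Sell")
        else:
            labels.append("Hold")
    return labels


def zscore(values):
    """Standardize a numeric column to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up closing prices for one ticker, used only to exercise the functions.
closes = [100.0, 101.0, 100.5, 105.0, 104.0, 98.0, 104.5]
labels = label_records(closes, horizon=2)
normalized = zscore(closes)
print(labels)
```

When a held-out test set is used, the mean and standard deviation are normally computed on the training rows only and then applied to the test rows, so that no information from the test set leaks into training.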
| File | Purpose |
|---|---|
| dataset_manager.py | Data loading, preprocessing, splitting into test and train data with balanced outputs |
| utils.py | Shared functions (evaluation, plotting) |
| knn_model.py | K-Nearest Neighbors implementation |
| tree_model.py | Decision Tree classifier |
| logistic_model.py | Logistic Regression classifier |
| random_forest.ipynb | Random Forest model |
| random_forest_tune.ipynb | Hyperparameter tuning for RF |
| log_model.ipynb | Logistic regression training notebook |
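One possible reading of "splitting into test and train data with balanced outputs" is a stratified split, where each class keeps roughly the same share in both sets. The sketch below illustrates that idea with the standard library only. It is an assumption about the approach, not the contents of `dataset_manager.py`, and the function name `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with class shares preserved."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then carve off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up label column with an uneven class mix.
labels = ["Buy"] * 10 + ["Hold"] * 20 + ["Sell"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(dict(train_counts), dict(test_counts))
```

If "balanced" instead means equal class counts, the same grouping by label could be followed by downsampling each class to the size of the smallest one.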
The project is organized as follows:
project-danayala-jmgalvez-jguerrero95/
│
├── .venv/ # Virtual environment
├── .gitignore # Git ignore rules
├── README.md # Project overview
├── pyproject.toml # Python config file
├── uv.lock # Dependency lock file
│
├── milestones/ # Step-by-step project reports
│
├── predictions/ # Model training and utility scripts
│ ├── dataset_manager.py # Data loading, preprocessing, splitting
│ ├── knn_model.py # K-Nearest Neighbors implementation
│ ├── log_model.ipynb # Logistic regression notebook
│ ├── logistic_model.py # Logistic Regression classifier
│ ├── random_forest_tune.ipynb # Random Forest hyperparameter tuning
│ ├── random_forest.ipynb # Random Forest model notebook
│ ├── tree_model.py # Decision Tree classifier
│ └── utils.py # Shared functions (metrics, plotting, etc.)
│
├── results/ # Contains the classification report and confusion matrix for the final models
│
├── stock_choose/ # Data fetching process
│ ├── raw_data/ # Unprocessed data
│ ├── simfin_data/ # Company fundamentals
│ ├── stock_data.csv # Final dataset (a .zip version is included)
│ ├── get_data.ipynb # Yahoo Finance data pull and clean
│ └── explore_data.ipynb # Exploratory data analysis
│
├── __init__.py # Python package init
└── MXL3_logo.png # Logo
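The `results/` folder listed above holds a classification report and confusion matrix for the final models. As a rough illustration of what those two artifacts contain for a three-class Buy/Hold/Sell problem, the sketch below computes both from scratch. The function names and the sample predictions are made up for the example and are not taken from `utils.py` or from the project's results.

```python
CLASSES = ["Buy", "Hold", "Sell"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Count outcomes; rows are true classes, columns are predicted classes."""
    position = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(y_true, y_pred):
        matrix[position[truth]][position[guess]] += 1
    return matrix


def per_class_report(matrix, classes=CLASSES):
    """Precision and recall per class, derived from the confusion matrix."""
    report = {}
    for i, name in enumerate(classes):
        true_positive = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        report[name] = {
            "precision": true_positive / predicted if predicted else 0.0,
            "recall": true_positive / actual if actual else 0.0,
        }
    return report


# Made-up labels and predictions, for illustration only.
y_true = ["Buy", "Buy", "Hold", "Hold", "Hold", "Sell", "Sell", "Sell"]
y_pred = ["Buy", "Hold", "Hold", "Hold", "Sell", "Sell", "Sell", "Buy"]
matrix = confusion_matrix(y_true, y_pred)
report = per_class_report(matrix)
for name in CLASSES:
    print(name, report[name])
```

Reading the matrix by row shows where each true class ends up. Per-class precision and recall matter here because a model can reach high overall accuracy by predicting the most common class while missing most Buy or Sell days.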
CAPP 30254 - Machine Learning for Public Policy
- Instructor: Hannah Morgan
- Teaching Assistant: Whiskey
- Teaching Assistant: Dorka