ChemmZz/Stock-Classification-using-Machine-Learning

Final Report: ML/X 3.0 - Stock Classification using Machine Learning

Authors:

  • Daniela Ayala Chavez
  • Jose Galvez Enriquez
  • Jorge Guerrero Aguileta

In this project, we calibrate four machine learning models to classify daily stock records into Buy, Hold, or Sell decisions, using a combination of technical indicators, company financials, and market metadata. We contrast two financial management philosophies—active investing (selecting individual stocks) and passive investing (replicating a market index)—by assessing whether it is feasible to automate investment signals at the stock-day level.
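The README does not say how the Buy, Hold, and Sell labels are derived, so the following is only a minimal sketch of one common approach: thresholding the forward return of the closing price. The function name `label_stock_days`, the one-day horizon, and the 1% threshold are all assumptions made for illustration, not the project's actual rule.

```python
from typing import List, Optional


def label_stock_days(
    closes: List[float],
    horizon: int = 1,         # assumed look-ahead in trading days
    threshold: float = 0.01,  # assumed +/-1% band around zero return
) -> List[Optional[str]]:
    """Label each day Buy/Hold/Sell from the forward return over `horizon` days."""
    labels: List[Optional[str]] = []
    for i, price in enumerate(closes):
        j = i + horizon
        if j >= len(closes):
            # No future price is available, so the day cannot be labeled.
            labels.append(None)
            continue
        forward_return = closes[j] / price - 1.0
        if forward_return > threshold:
            labels.append("Buy")
        elif forward_return < -threshold:
            labels.append("Sell")
        else:
            labels.append("Hold")
    return labels


closes = [100.0, 102.0, 101.5, 99.0, 99.2]
labels = label_stock_days(closes)
print(labels)
```

Days at the end of the series that have no future price stay unlabeled (`None`) and would normally be dropped before training.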

How to run this project? ▶️

  1. Install UV on your local machine

  2. Clone the project

git clone git@github.com:uchicago-2025-capp30122/30122-project-chiffordable.git
  3. Create the virtual environment and install dependencies
uv sync
  4. Fetch data (optional, only if you want to refresh it). The stock_choose folder:
  • Extracts daily stock data from yfinance
  • Merges daily stock data with company fundamentals. The fundamentals were downloaded from SimFin (unzip the files if you want to recreate the whole process)
  • Selects relevant variables, creates labels, and normalizes numerical data
  5. Run the individual file for each model
File                        Purpose
dataset_manager.py          Data loading, preprocessing, splitting into test and train data with balanced outputs
utils.py                    Shared functions (evaluation, plotting)
knn_model.py                K-Nearest Neighbors implementation
tree_model.py               Decision Tree classifier
logistic_model.py           Logistic Regression classifier
random_forest.ipynb         Random Forest model
random_forest_tune.ipynb    Hyperparameter tuning for RF
log_model.ipynb             Logistic regression training notebook
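As an illustration of the splitting step attributed to dataset_manager.py, here is a minimal sketch of a stratified train/test split that keeps class proportions equal across both sets. This is one possible reading of "balanced outputs"; the actual module may resample or balance differently. The function name `stratified_split`, the 20% test fraction, and the seed are assumptions, and only the standard library is used.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    test_fraction: float = 0.2,  # assumed share of each class held out
    seed: int = 0,               # fixed seed for reproducibility
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out at least one record per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["Buy"] * 10 + ["Hold"] * 10 + ["Sell"] * 10
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), dict(test_counts))
```

For daily stock records, a random split like this can leak future information into the training set; a time-based cutoff is a common alternative when that matters.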

Structure of Software 🛠️

This project is structured in the following sections:

project-danayala-jmgalvez-jguerrero95/
│
├── .venv/                      # Virtual environment
├── .gitignore                  # Git ignore rules
├── README.md                   # Project overview
├── pyproject.toml              # Python config file
├── uv.lock                     # Dependency lock file
│
├── milestones/                 # Project step by step reports
│
├── predictions/                # Model training and utility scripts
│   ├── dataset_manager.py          # Data loading, preprocessing, splitting
│   ├── knn_model.py                # K-Nearest Neighbors implementation
│   ├── log_model.ipynb             # Logistic regression notebook
│   ├── logistic_model.py           # Logistic Regression classifier
│   ├── random_forest_tune.ipynb    # Random Forest hyperparameter tuning
│   ├── random_forest.ipynb         # Random Forest model notebook
│   ├── tree_model.py               # Decision Tree classifier
│   └── utils.py                    # Shared functions (metrics, plotting, etc.)
│
├── results/                    # Contains the classification report and confusion matrix for the final models
│
├── stock_choose/               # Data fetching process
│   ├── raw_data/                   # Unprocessed data
│   ├── simfin_data/                # Company fundamentals
│   ├── stock_data.csv              # Final dataset (a .zip version is included)
│   ├── get_data.ipynb              # Yahoo Finance data pull and clean
│   └── explore_data.ipynb          # Exploratory data analysis
│
├── __init__.py                 # Python package init
└── MXL3_logo.png               # Logo

Acknowledgments

CAPP 30254 - Machine Learning for Public Policy

  • Instructor: Hannah Morgan
  • Teaching Assistant: Whiskey
  • Teaching Assistant: Dorka
