Using classification models to predict if we should buy a stock to make money

Stock recommendations with Machine Learning


This project uses the Anaconda distribution of Python 3.5. The libraries used are listed in detail in the requirements.txt file. They primarily include Pandas, Matplotlib, NumPy, Seaborn, TA-Lib, and Pandas Datareader. Additionally, TPOT is used. A free API key from Quandl is required to load historical stock data.

Project Motivation

This project is my capstone project for the Udacity Data Science Nanodegree.

The goal is to build a tool that can make trading recommendations for stock purchases. The analysis explores ways to use historical stock data, together with technical analysis, as inputs to machine learning algorithms. The data is preprocessed so that a classification algorithm can predict whether or not a stock should be purchased on a given day.
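To make the classification framing concrete, here is a minimal sketch of how a price series could be turned into a binary buy label. The labeling rule, the function name `label_buy_days`, and the `horizon` and `min_return` parameters are assumptions chosen for illustration; the strategy actually used in this project is defined in the data preparation notebook and may differ.

```python
import pandas as pd


def label_buy_days(close: pd.Series, horizon: int = 5, min_return: float = 0.02) -> pd.Series:
    """Label a row 1 if the close `horizon` rows later is at least `min_return` higher, else 0.

    Hypothetical rule for illustration only, not the project's actual strategy.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # Forward return over the next `horizon` rows.
    future_return = close.shift(-horizon) / close - 1.0
    labels = (future_return >= min_return).astype(int)
    # The last `horizon` rows have no future price, so drop them instead of labeling them 0.
    return labels.iloc[:-horizon]


prices = pd.Series([100.0, 101.0, 99.0, 104.0, 106.0, 103.0])
labels = label_buy_days(prices, horizon=2, min_return=0.02)
```

Each label only uses prices after its own row, so the features for that row must be built from data available up to and including that day.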

Project Structure and Files

  1. Data preprocessing - notebooks/02.0-fl-data-preparation.ipynb:
    • Loading historic stock data from Quandl
    • Building data set for training
      • Strategy definition
      • Additional feature generation
      • Reshaping data for training set
  2. Machine Learning
    • Using sklearn: notebooks/03.1-fl-model-training-sklearn.ipynb
    • Using TPOT: notebooks/03.2-fl-model-training-TPOT.ipynb
  3. Model Performance Evaluation with backtesting: notebooks/04.01-fl-model_performance_comparison--backtesting.ipynb
    • Building backtesting calculations
    • Calculating and visualizing performance of model predictions
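As a rough illustration of the backtesting step, the sketch below compares a long-only strategy driven by 0/1 model predictions against buy and hold. The function name `backtest_long_only`, the one-day execution delay, and the absence of transaction costs are assumptions for this example; the calculations in the backtesting notebook may be set up differently.

```python
import pandas as pd


def backtest_long_only(close: pd.Series, signal: pd.Series) -> pd.DataFrame:
    """Cumulative growth of 1 unit of capital for a 0/1 signal versus buy and hold.

    Hypothetical helper: ignores transaction costs and slippage.
    """
    daily_return = close.pct_change().fillna(0.0)
    # A prediction made on one day is traded on the next, which avoids look-ahead bias.
    position = signal.shift(1).fillna(0)
    strategy_return = position * daily_return
    return pd.DataFrame({
        "strategy": (1.0 + strategy_return).cumprod(),
        "buy_and_hold": (1.0 + daily_return).cumprod(),
    })


close = pd.Series([100.0, 110.0, 99.0, 108.9])
signal = pd.Series([1, 0, 1, 0])  # 1 = model predicts "buy", 0 = stay out
result = backtest_long_only(close, signal)
```

In this toy series the signal is invested on the two up days and out of the market on the down day, so the strategy ends at about 1.21 while buy and hold ends at about 1.089. The resulting frame can be plotted directly with `result.plot()` to visualize both curves.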

Additional files and folders

  • confidential-API-key.txt - not included in the GitHub repo - a file containing my personal Quandl API key
  • /data subfolder - contains preprocessed data, model predictions, and other intermediate datasets that are used by the notebooks
  • /src/models/TPOT - output from TPOT optimizer
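Because the key file is not part of the repository, anyone running the notebooks has to create it locally. A minimal sketch of reading it is shown below; the helper name `read_api_key` and the assumption that the file holds nothing but the key on a single line are illustrative and not taken from the notebooks.

```python
from pathlib import Path


def read_api_key(path: str = "confidential-API-key.txt") -> str:
    """Return the API key stored in a local text file (hypothetical helper).

    Assumes the file contains only the key, optionally followed by a newline.
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"No API key found in {path}")
    return key
```

Keeping the key in a file that is listed in .gitignore prevents it from being committed by accident.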


A detailed writeup of the project can be found at (
