GitHub - althk/spock: Stock predictions via ML

Spock - Stock predictions via ML; live long and prosper 🖖🏻

NOTE: data_downloader.py is hardcoded for NSE stocks, but it can be adapted to any other exchange with minimal effort.

Spock evaluates models (traditional and neural-network-based) for stock prediction to find the best fit.

The following models are currently included in the evaluation:

Traditional models
- LinearRegression
- DecisionTreeRegressor
- RandomForestRegressor
- XGBRegressor (XGBoost)

High level overview

1. Download historical data.
2. Train and evaluate each model on each stock's data.
3. Find the best overall model across all stocks' evaluations.
4. Train that model on each stock and save one model per stock.
5. For real-time predictions, load the model instance for the requested ticker and make a prediction.

Steps to get started

Install the prerequisites:

$ pip install pandas pandas-ta scikit-learn xgboost  # for traditional ML models
$ pip install tensorflow   # for deep learning models
$ pip install absl-py

Download raw data (this project uses yfinance) using the data_downloader.py script:

$ python data_downloader.py --symbols_file=./nfo_symbols.csv --data_dir=./data --logtostderr
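
When downloading through yfinance, the exchange-specific part is usually the ticker suffix: Yahoo Finance lists NSE stocks as `SYMBOL.NS` and BSE stocks as `SYMBOL.BO`. As a rough sketch of what switching exchanges might involve, assuming the symbols file holds one symbol in the first column of each row with no header (the function name `load_tickers` and that file layout are assumptions, not taken from data_downloader.py):

```python
import csv
import io


def load_tickers(csv_file, suffix=".NS"):
    """Read symbols from the first CSV column and append an exchange suffix."""
    tickers = []
    for row in csv.reader(csv_file):
        if not row or not row[0].strip():
            continue  # skip blank lines
        tickers.append(row[0].strip().upper() + suffix)
    return tickers


# In-memory stand-in for a symbols file; the symbols are only examples.
sample = io.StringIO("RELIANCE\ntcs\n\nINFY\n")
nse_tickers = load_tickers(sample)         # Yahoo Finance style NSE tickers
sample.seek(0)
bse_tickers = load_tickers(sample, ".BO")  # same symbols with the BSE suffix
```

Each resulting string can then be passed to yfinance, for example `yfinance.download("TCS.NS")`. If the real symbols file has a header row, that row would need to be skipped first.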

Run prediction_research.py, which loads the downloaded data, trains a few models, evaluates them, and records which model scored best (lowest RMSE):

$ python prediction_research.py --data_dir='./data' --output_dir='./output' --logtostderr
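
Conceptually, this evaluation is a nested loop over stocks and models that records one RMSE per pair. The following is a minimal sketch in plain Python, not the repository's code: the toy models, the `evaluate_all` helper, and the 80/20 chronological split are all illustrative assumptions, and the candidates are only assumed to expose scikit-learn style `fit`/`predict` methods.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


class MeanModel:
    """Toy baseline (hypothetical): always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LastValueModel:
    """Toy baseline (hypothetical): always predicts the last training target."""

    def fit(self, X, y):
        self.last_ = y[-1]
        return self

    def predict(self, X):
        return [self.last_ for _ in X]


def evaluate_all(model_factories, datasets, train_fraction=0.8):
    """Return {(ticker, model_name): rmse} using a chronological split."""
    results = {}
    for ticker, (X, y) in datasets.items():
        split = int(len(y) * train_fraction)  # keep time order: no shuffling
        for name, factory in model_factories.items():
            model = factory().fit(X[:split], y[:split])
            results[(ticker, name)] = rmse(y[split:], model.predict(X[split:]))
    return results


# Made-up numbers purely for illustration.
datasets = {
    "AAA": ([[i] for i in range(10)], [float(i) for i in range(10)]),
    "BBB": ([[i] for i in range(10)], [5.0] * 10),
}
factories = {"MeanModel": MeanModel, "LastValueModel": LastValueModel}
scores = evaluate_all(factories, datasets)
```

Splitting by position rather than at random keeps future prices out of the training set, which matters for time-series data.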

Interpreting the results

The last step above writes the evaluation results to output_dir/YYYYmmddHHMMSS/evaluation_results.csv.
Run parse_evaluation_results.py to find the best overall model:

$ python parse_evaluation_results.py --evaluation_results_file=<path to the csv>

This prints some statistics and highlights the best overall model across all stocks.

NOTE: The script currently picks the model with the lowest RMSE at the 95th percentile (p95).
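
One reading of this rule: for each model, collect its RMSE on every stock, take the 95th percentile of those values, and pick the model whose p95 is smallest. Below is a sketch in plain Python under that reading. The column names `ticker`, `model`, and `rmse` are hypothetical (the real layout of evaluation_results.csv may differ), and the percentile uses linear interpolation between ranks.

```python
import csv
import io
from collections import defaultdict


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def best_model_by_p95(csv_file):
    """Return (model_name, p95_rmse) for the model with the lowest p95 RMSE."""
    rmse_by_model = defaultdict(list)
    for row in csv.DictReader(csv_file):
        rmse_by_model[row["model"]].append(float(row["rmse"]))
    p95 = {name: percentile(vals, 95) for name, vals in rmse_by_model.items()}
    best = min(p95, key=p95.get)
    return best, p95[best]


# Made-up results for three stocks and two models.
sample = io.StringIO(
    "ticker,model,rmse\n"
    "AAA,LinearRegression,1.0\n"
    "BBB,LinearRegression,1.0\n"
    "CCC,LinearRegression,8.0\n"
    "AAA,XGBRegressor,3.0\n"
    "BBB,XGBRegressor,4.0\n"
    "CCC,XGBRegressor,4.0\n"
)
best_name, best_p95 = best_model_by_p95(sample)
```

In this sample, LinearRegression has the lower mean RMSE but does badly on one stock, so a p95-based rule prefers XGBRegressor.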

Training the selected/best model on all stocks

Once the best overall model is known, it must be trained on each stock individually and saved per stock. For example, if the evaluation results were saved at output/20240609185957/evaluation_results.csv and the stock data is in the data directory, the following command reads, for each stock, the model selected via grid-search cross-validation, trains it, and saves the model and scaler objects under the same output directory.

$ python train_model.py --data_dir=data --results_file=output/20240609185957/evaluation_results.csv --logtostderr

For example, if the model selected was XGBRegressor, then in the dir output/20240609185957 there will be files of the format {ticker}_XGBRegressor.pkl and {ticker}_Scaler.pkl.
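
The naming scheme can be sketched as follows. This assumes the artifacts are written with the standard `pickle` module, which the .pkl extension suggests but this README does not confirm; `artifact_paths` and `save_artifacts` are hypothetical helper names.

```python
import pickle
from pathlib import Path


def artifact_paths(output_dir, ticker, model):
    """Build the {ticker}_{ModelClass}.pkl and {ticker}_Scaler.pkl paths."""
    out = Path(output_dir)
    model_path = out / f"{ticker}_{type(model).__name__}.pkl"
    scaler_path = out / f"{ticker}_Scaler.pkl"
    return model_path, scaler_path


def save_artifacts(output_dir, ticker, model, scaler):
    """Pickle a fitted model and its scaler into the output directory."""
    model_path, scaler_path = artifact_paths(output_dir, ticker, model)
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(scaler_path, "wb") as f:
        pickle.dump(scaler, f)
    return model_path, scaler_path
```

Deriving the file name from the model's class name keeps the saved files consistent with whichever model the evaluation selected.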

Making predictions

To predict values for a given ticker:

1. Load the model and scaler for the specific ticker.
2. Scale the data using the loaded scaler.
3. Predict the price using the loaded model.
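
These three steps can be sketched as below, assuming the artifacts were pickled under the {ticker}_{Model}.pkl and {ticker}_Scaler.pkl names described earlier and that the loaded objects follow the scikit-learn `transform`/`predict` convention. The helper names are hypothetical, and the feature rows must contain the same columns, in the same order, as the training data.

```python
import pickle
from pathlib import Path


def load_artifacts(output_dir, ticker, model_name="XGBRegressor"):
    """Load the pickled model and scaler saved for one ticker."""
    out = Path(output_dir)
    with open(out / f"{ticker}_{model_name}.pkl", "rb") as f:
        model = pickle.load(f)
    with open(out / f"{ticker}_Scaler.pkl", "rb") as f:
        scaler = pickle.load(f)
    return model, scaler


def predict_price(model, scaler, feature_rows):
    """Scale raw feature rows, then return the model's predictions."""
    scaled = scaler.transform(feature_rows)
    return model.predict(scaled)
```

A call might look like `model, scaler = load_artifacts("output/20240609185957", "TCS")` followed by `predict_price(model, scaler, rows)`, where the ticker is only an example. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.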
