Stock Price Forecasting

A neural stock price forecasting system that combines fundamental and technical analysis to predict the trend of stocks in the S&P 500 index. The main contributions of this work are:

Develop a first approach with PyTorch Lightning as the learning framework, employing attention and Recurrent Neural Networks (RNNs). For further insights, read the dedicated report or the related notebook.
Develop a distributed approach with PyTorch, PySpark, and Petastorm that uses a cluster of nodes to parallelize the computation. It builds on the first approach and extends it with Spark SQL queries, enabling the system to scale to large amounts of data. For an overview of the system, see the slides or the related notebook.
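
As a rough illustration of the attention step mentioned above, the following framework-free sketch pools a sequence of RNN hidden states into a single context vector using dot-product attention. It is not the model from the notebooks: the function name `attention_pool`, the query vector, and the toy hidden states are all hypothetical, and the actual system relies on PyTorch Lightning rather than plain Python.

```python
import math


def attention_pool(hidden_states, query):
    """Collapse a sequence of hidden states into one context vector.

    hidden_states: list of per-time-step vectors (hypothetical RNN outputs).
    query: vector of the same size used to score each time step.
    """
    # Dot-product score between the query and each time step.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Numerically stable softmax over time steps.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


# Toy example: three time steps with two features each (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(states, query=[1.0, 1.0])
```

The weights sum to one, so the context vector is a convex combination of the hidden states, with the time steps most aligned with the query contributing the most.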

Datasets

We use data from Kaggle's public challenges: a first dataset with financial reports of S&P 500 companies from 2003 to 2013, and a second dataset containing stock market data. By aligning the two datasets and removing outliers (see the notebooks for how the alignment is performed), we obtain an enriched dataset that supports both fundamental and technical analysis.
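
The notebooks define the actual alignment procedure. As a minimal sketch of one plausible way to do it, the snippet below performs an as-of join for a single ticker: each daily price row receives the most recent financial report published on or before that day, which avoids leaking future fundamentals into past prices. The function name `align_as_of`, the field names (`date`, `close`, `eps`), and the sample values are assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date


def align_as_of(prices, reports):
    """Attach to each price row the latest report dated on or before it."""
    reports = sorted(reports, key=lambda r: r["date"])
    report_dates = [r["date"] for r in reports]
    aligned = []
    for row in sorted(prices, key=lambda p: p["date"]):
        # Index of the last report whose date is <= the price date.
        idx = bisect_right(report_dates, row["date"]) - 1
        if idx < 0:
            continue  # no report has been published yet, so skip the row
        merged = dict(row)
        merged.update({k: v for k, v in reports[idx].items() if k != "date"})
        aligned.append(merged)
    return aligned


# Hypothetical rows for one ticker.
prices = [
    {"date": date(2010, 3, 30), "close": 10.0},
    {"date": date(2010, 4, 1), "close": 10.5},
    {"date": date(2010, 7, 2), "close": 11.0},
]
reports = [
    {"date": date(2010, 3, 31), "eps": 1.2},
    {"date": date(2010, 6, 30), "eps": 1.5},
]
aligned = align_as_of(prices, reports)
```

In this toy run the first price row is dropped because no report precedes it, and the remaining rows carry the fundamentals that were known at the time.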

Results

The table below benchmarks the performance of our trading strategy algorithm (details in the slides, pages 14-16).

Model	MSE	R²	Adjusted R²	Operation accuracy	Profit
DecisionTreeRegressor	0.078	0.852	-	55.45%	35.97%
RandomForestRegressor	0.104	0.803	-	57.01%	51.61%
LSTM	0.021	0.939	0.897	56.52%	58.35%
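
For readers unfamiliar with the regression metrics in the table, here is a minimal sketch of how MSE, R², and adjusted R² are conventionally computed. The last function is only a guess at what "operation accuracy" measures: it assumes the metric is the fraction of steps where the predicted price move has the same sign as the actual move. The slides give the authors' definition, and all names and sample values here are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r2(r2_value, n_samples, n_features):
    """R² penalized for the number of input features."""
    return 1 - (1 - r2_value) * (n_samples - 1) / (n_samples - n_features - 1)


def direction_accuracy(last_prices, y_true, y_pred):
    """Assumed definition: share of steps where predicted and actual moves agree in sign."""
    hits = sum(
        (t - last) * (p - last) > 0
        for last, t, p in zip(last_prices, y_true, y_pred)
    )
    return hits / len(y_true)


# Made-up values, not taken from the benchmark.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
last_prices = [0.5, 2.5, 2.5, 4.5]

error = mse(y_true, y_pred)
score = r2(y_true, y_pred)
adjusted = adjusted_r2(score, n_samples=4, n_features=1)
accuracy = direction_accuracy(last_prices, y_true, y_pred)
```

Adjusted R² is never higher than R² when at least one feature is used, which is consistent with the LSTM row in the table.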

How to train the distributed system?

To train locally, install and configure PySpark on your machine by following the instructions described here. Alternatively, clone the notebook and import it into Databricks as described here.

How to test the system?

For a quick, ready-to-use test, run the test/evaluate.py script, which targets the distributed system and uses pre-trained weights for the LSTM model. Alternatively, re-train the system with a model of your choice and use the new weights for the evaluation.

Project structure

.
├── data/                     # Stock prices and fundamental data
├── report/
│   ├── main.pdf              # Project report for the dlai-2021 course
│   ├── main.tex
│   └── ...
├── test/
│   ├── data/                 # Model weights and test data
│   ├── evaluate.py           # Evaluation script
│   └── ...
├── dist_forecasting.ipynb    # PySpark distributed stock prediction system
├── forecasting.ipynb         # Stock prediction system
├── environment.yml           # Training environment
└── ...

LeonardoEmili/stock-price-forecasting