# Project Finance LSTM

This project consists of the following:

1. Literature review and data collection (Part I)
2. Time-series modelling using ARIMA (Part II) 
3. Preparing the dataset according to literature (Part III) 
4. <font color = 'orange'>[Google Colab]</font> Train an LSTM model to predict stock prices (Part IV)

## Part I: Data collection

<div style="background-color: #78E8A3; padding: 20px">
<h3>Project Scenario</h3>
<p>You've been investing for a while now.</p>
<p>Coincidentally, in a bid to upskill yourself, you also learned machine learning and deep learning.</p> 
<p>Armed with these two skills, you're interested in combining the two, leveraging data techniques to predict stock prices.</p>    
<p>Googling around, you found a research paper that looks like a good starting point. In this project, you will use that study as a reference and apply Long Short-Term Memory (LSTM) networks to historical stock prices for stock market prediction.</p>
</div>

### Step 1: Read the research paper
Let's start with the research publication that we will be referring to in this project.

Head over <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0227222">here</a> and spend around 30-60 minutes going through the paper.

### Step 2: Import libraries

In [2]:
import pandas as pd
import yfinance as yf

### Step 3: Acquire S&P 500 data
In the paper, the authors used three sets of data:
- Standard & Poor's 500 Index (S&P 500)
- Hang Seng Index (HSI)
- Dow Jones Industrial Average (DJIA)

In this project, we will be working with <strong>only S&P 500 data</strong>. You can tackle the other two datasets on your own once you're familiar with this one.

There are three possible ways to acquire the data you need:
1. Download the data used in the publication
2. Download the data directly from Yahoo Finance
3. Use the yfinance library to download the data
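
Options 1 and 2 both leave you with a CSV file on disk. Here is a minimal sketch of loading such a file with pandas, assuming it has a `Date` column followed by the usual price columns. The inline text stands in for a real download (its values are rounded from the first two trading days), and the file name in the comment is hypothetical.

```python
import io
import pandas as pd

# Stand-in for a downloaded file; in practice pass a path such as
# "sp500.csv" (hypothetical name) to pd.read_csv instead.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2000-01-03,1469.25,1478.00,1438.36,1455.22,1455.22,931800000\n"
    "2000-01-04,1455.22,1455.22,1397.43,1399.42,1399.42,1009000000\n"
)

# Parse the Date column and use it as the index, matching the yfinance layout.
df_csv = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")
print(df_csv.index.min(), df_csv.shape)
```

Indexing by date up front makes it easier to compare this frame with the one yfinance returns.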

### Retrieve S&P 500 using yfinance
The ticker for S&P 500 is "^GSPC". 

You'll need to retrieve data from the <strong>start</strong> date, 2000-01-03, to the <strong>end</strong> date, 2019-07-01.
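
yfinance treats its `end` argument as exclusive, so a request that should include 2019-07-01 needs the following day as `end`. The sketch below shows that adjustment; `inclusive_to_exclusive_end` is a hypothetical helper name and is not part of yfinance.

```python
from datetime import date, timedelta

def inclusive_to_exclusive_end(last_day: str) -> str:
    """Return the day after `last_day` (YYYY-MM-DD) for use as an exclusive end."""
    return (date.fromisoformat(last_day) + timedelta(days=1)).isoformat()

end_arg = inclusive_to_exclusive_end("2019-07-01")
print(end_arg)  # 2019-07-02
```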

In [3]:
# Retrieve the S&P 500 data using the yfinance library
import yfinance as yf

# `end` is exclusive in yfinance, so '2019-07-02' keeps 2019-07-01 in range
df_sp = yf.download('^GSPC', start='2000-01-03', end='2019-07-02')

df_sp

[*********************100%***********************]  1 of 1 completed


| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2000-01-03 | 1469.250000 | 1478.000000 | 1438.359985 | 1455.219971 | 1455.219971 | 931800000 |
| 2000-01-04 | 1455.219971 | 1455.219971 | 1397.430054 | 1399.420044 | 1399.420044 | 1009000000 |
| 2000-01-05 | 1399.420044 | 1413.270020 | 1377.680054 | 1402.109985 | 1402.109985 | 1085500000 |
| 2000-01-06 | 1402.109985 | 1411.900024 | 1392.099976 | 1403.449951 | 1403.449951 | 1092300000 |
| 2000-01-07 | 1403.449951 | 1441.469971 | 1400.729980 | 1441.469971 | 1441.469971 | 1225200000 |
| ... | ... | ... | ... | ... | ... | ... |
| 2019-06-25 | 2945.780029 | 2946.520020 | 2916.010010 | 2917.379883 | 2917.379883 | 3578050000 |
| 2019-06-26 | 2926.070068 | 2932.590088 | 2912.989990 | 2913.780029 | 2913.780029 | 3478130000 |
| 2019-06-27 | 2919.659912 | 2929.300049 | 2918.570068 | 2924.919922 | 2924.919922 | 3122920000 |
| 2019-06-28 | 2932.939941 | 2943.979980 | 2929.050049 | 2941.760010 | 2941.760010 | 5420700000 |
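
Before moving on, it can help to confirm that the downloaded frame covers the expected date range and has no gaps. The sketch below is one way to collect those checks; `summarize_prices` is a hypothetical helper and the demo frame is a tiny stand-in, so with real data you would call `summarize_prices(df_sp)` instead.

```python
import pandas as pd

def summarize_prices(df: pd.DataFrame) -> dict:
    """Collect basic facts worth checking before modelling."""
    return {
        "first_date": df.index.min(),
        "last_date": df.index.max(),
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "dates_sorted": bool(df.index.is_monotonic_increasing),
    }

# Tiny stand-in frame with one deliberately missing value.
demo = pd.DataFrame(
    {"Close": [1455.22, 1399.42, None]},
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
)
summary = summarize_prices(demo)
print(summary)
```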


In [5]:
# Export the yfinance DataFrame to a CSV
df_sp.to_csv('S&P500 data from yfinance.csv')
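
A quick way to confirm the export worked is to read the file back and compare it with the frame in memory. The sketch below does this on a small stand-in frame written to a temporary directory. The file name and the demo values are illustrative assumptions, not the notebook's actual data.

```python
import os
import tempfile
import numpy as np
import pandas as pd

# Stand-in for df_sp; values are illustrative.
df_demo = pd.DataFrame(
    {"Close": [1455.22, 1399.42], "Volume": [931800000, 1009000000]},
    index=pd.DatetimeIndex(["2000-01-03", "2000-01-04"], name="Date"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prices.csv")  # hypothetical file name
    df_demo.to_csv(path)
    # index_col/parse_dates restore the date index that to_csv wrote out.
    df_back = pd.read_csv(path, index_col="Date", parse_dates=True)

same_dates = list(df_back.index) == list(df_demo.index)
same_values = np.allclose(df_back.to_numpy(), df_demo.to_numpy())
print(same_dates, same_values)
```

Comparing with `np.allclose` rather than strict equality tolerates the small floating-point differences that a text round trip can introduce.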

### End of Part I
All right! You've retrieved the data. There are several ways to get the same data, and if you followed the instructions, all three should have led to the same outcome.

The research paper is lengthy and we can't cover everything, so we will focus on two questions:
1. Can LSTM help with stock predictions?
2. Can signal-processing techniques help improve stock predictions?

In Part II, we'll work with the data and build a time-series ARIMA model.