# Standard and Poor Machine Learning Techniques

**The Standard & Poor's 500 (S&P 500) is a stock market index based on the market capitalizations of 500 large companies.**

**Question:** Is it possible to predict the movements of the S&P 500 Index using publicly available data? Is there any hope for machine learning algorithms to make market predictions?

**Approach:** Use an LSTM network and various data sources to predict market movements.

## An overview of our data

+ Quandl API for stocks and commodities
+ Yahoo Finance and Kaggle for S&P 500 data
+ Google Trends for market interest
+ NewsAPI for news sentiment

## The Google Trends API

Access to this data is free and it has been used in some really cool research. 

#### We want daily granularity
![Daily Google Trends](images/daily_google_trends.png?raw=true)


#### Over a long period of time
![Weekly Google Trends](images/weekly_google_trends.png?raw=true)
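Google Trends only returns daily values for shorter time ranges (up to roughly nine months per request), and each request is independently rescaled so that its own peak is 100. One way to get a long daily series is to download overlapping daily windows and chain them together, rescaling each new window so it agrees with the previous one on the shared days. The sketch below is a hypothetical `stitch_windows` helper illustrating that idea; it assumes consecutive windows overlap by at least one day and is not necessarily the method used in our cleaning notebooks.

```python
def stitch_windows(windows):
    """Chain independently normalised Google Trends windows into one series.

    `windows` is a time-ordered list of dicts mapping a sortable day key
    (e.g. a datetime.date) to an interest value scaled 0-100 within that
    window. Each window must overlap the previous one by at least one day.
    """
    stitched = dict(windows[0])
    for window in windows[1:]:
        overlap = [day for day in window if day in stitched]
        if not overlap:
            raise ValueError("consecutive windows must overlap")
        prev_mean = sum(stitched[day] for day in overlap) / len(overlap)
        new_mean = sum(window[day] for day in overlap) / len(overlap)
        if new_mean == 0:
            raise ValueError("overlap has zero interest; cannot rescale")
        # Ratio between the two windows' scales, estimated on the shared days.
        scale = prev_mean / new_mean
        for day, value in window.items():
            if day not in stitched:
                stitched[day] = value * scale
    # Renormalise so the overall peak is 100, like a single Trends request.
    peak = max(stitched.values()) or 1.0  # avoid dividing by zero if all values are 0
    return {day: 100 * value / peak for day, value in sorted(stitched.items())}
```

Because Trends values are rounded to integers, a short overlap with low interest gives a noisy scale factor; longer overlaps make the chaining more stable.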



### [Let's go take a peek into some of our data cleaning notebooks](http://127.0.0.1:8000/DataCleaning.slides.html#/)

## Summary of the data we used in our model
<img src="images/all_data.JPG" width="550"/>
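These sources run on different calendars: Google Trends and news produce a value every day, while stock and commodity prices only exist on trading days. A minimal sketch of one way to line them up, assuming each source is a named pandas `Series` with a `DatetimeIndex`; the `align_sources` helper is hypothetical and not taken from our notebooks.

```python
import pandas as pd

def align_sources(target, *features):
    """Align feature series to the trading days of `target` (e.g. S&P 500 closes)."""
    # Outer join on the date index so no observation is dropped yet.
    frame = pd.concat([target, *features], axis=1).sort_index()
    # Carry the last known value forward, e.g. a commodity price on a day its market was closed.
    frame = frame.ffill()
    # Keep only the target's trading days and drop rows before every source has started.
    return frame.loc[target.index].dropna()
```

Forward-filling only uses earlier values, so it does not leak future information into a given day's features.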



## Neural Networks for Market Prediction



> When it comes to markets, past performance is not a good predictor of future returns; looking in the rear-view mirror is a bad way to drive. Machine learning, on the other hand, is applicable to datasets where the past is a good predictor of the future.

*from **Deep Learning with Python** by François Chollet*



<img src="https://memegenerator.net/img/instances/74873992.jpg" width="500" align="middle"/>


## Recurrent Neural Network (RNN)

Neural networks can learn highly complex, nonlinear patterns from data.

For time series data, a natural choice is a recurrent neural network, which operates over *sequences* of vectors rather than a single fixed-size input.

<img src="https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-rolled.png" width="100"/>

The appeal of RNNs is that they might be able to connect previous information to the present task. A recurrent neural network can conceptually be unfolded into multiple copies of the same network, each passing a message to its successor.

<img src="https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" width=400 height=60>
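The unrolled picture corresponds to a simple loop: the same weights are applied at every time step, and the hidden state `h` is the "message" passed to the next copy. A minimal NumPy sketch of a vanilla RNN forward pass; the weight names and the `tanh` activation are illustrative choices, not details of our model.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence.

    xs: (timesteps, input_dim); W_xh: (input_dim, hidden_dim);
    W_hh: (hidden_dim, hidden_dim); b_h: (hidden_dim,)
    Returns the hidden state at every step, shape (timesteps, hidden_dim).
    """
    h = np.zeros(W_hh.shape[0])  # initial state: nothing seen yet
    states = []
    for x_t in xs:
        # Same weights at every step; h carries information from earlier steps.
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)
```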




## Long Short-Term Memory (LSTM) Networks



<img src="https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" width=600 height=80>
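An LSTM cell replaces the single `tanh` layer of a plain RNN with gates that decide what to forget from the cell state, what new information to store, and what to output; this gating is what helps LSTMs keep information over longer sequences. A minimal NumPy sketch of one time step of the standard (no peephole) LSTM shown in the diagram; stacking all four gate weights into one matrix is an assumption made for compactness.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step for a single (unbatched) input vector.

    W: (input_dim, 4 * hidden), U: (hidden, 4 * hidden), b: (4 * hidden,)
    Gate blocks are stacked in the order: forget, input, candidate, output.
    """
    z = x_t @ W + h_prev @ U + b
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)  # forget gate: how much of c_prev to keep
    i = sigmoid(i)  # input gate: how much of the candidate to write
    g = np.tanh(g)  # candidate values for the cell state
    o = sigmoid(o)  # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

Running `lstm_step` in a loop over a window, as in the RNN sketch, gives the full sequence of hidden states.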



## Generating predictions

<img src="images/sliding_window.jpg" width="800"/>
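The sliding window turns one long time series into supervised training pairs: each input is the previous `window` days of features, and the target is the value `horizon` days after the window ends. A minimal NumPy sketch, assuming the features are already a 2-D array with one row per trading day; `make_windows`, `window`, and `horizon` are illustrative names, not taken from our code.

```python
import numpy as np

def make_windows(features, target, window, horizon=1):
    """Build (samples, window, n_features) inputs and matching targets.

    features: (timesteps, n_features); target: (timesteps,)
    Sample k uses rows k .. k+window-1 and predicts target[k + window - 1 + horizon].
    """
    n_samples = len(features) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([features[k:k + window] for k in range(n_samples)])
    y = np.array([target[k + window - 1 + horizon] for k in range(n_samples)])
    return X, y
```

The resulting `X` has the `(samples, timesteps, features)` shape that recurrent layers such as Keras' `LSTM` expect as input.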

## Commodities and past stock data only

#### The model could not fit the training data well
<img src="images/training getting nowhere loss vs epoch.JPG" width="800"/>

#### Predictions
<img src="images/all predictions.JPG" width="800"/>


#### A closer look
<img src="images/predictions close up.JPG" width="800"/>

## Google Trends and News Sentiment

Adding Google Trends data lets the model capture more of the oscillations
<img src="images/with trends.JPG" width="800"/>


Adding daily news sentiment captures even more of the oscillations
<img src="images/news_pred.JPG" width="800"/>
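To use news as a model input, per-article sentiment has to be collapsed into one number per day. A minimal pandas sketch, assuming each article already has a publication timestamp and a sentiment score in [-1, 1] from some sentiment model; the column names `published_at` and `score` and the `daily_sentiment` helper are hypothetical.

```python
import pandas as pd

def daily_sentiment(articles):
    """Average article sentiment per calendar day.

    articles: DataFrame with a datetime column 'published_at' and a numeric column 'score'.
    Returns a Series indexed by day; days with no articles are filled with 0 (neutral).
    """
    scores = articles.set_index("published_at")["score"].sort_index()
    daily = scores.resample("D").mean()  # one bucket per calendar day
    return daily.fillna(0.0)
```

Treating empty days as neutral is a modelling choice; forward-filling the last day's sentiment is a reasonable alternative to compare.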

## Conclusion

With only public data and limited domain knowledge, we did not get very accurate results.

However, the pipeline we built could be a powerful tool for someone with better resources and more domain knowledge.

<img src="http://i.huffpost.com/gen/1541321/images/o-THE-WOLF-OF-WALL-STREET-facebook.jpg" width="400"/>


```
!jupyter nbconvert Presentation.ipynb --to slides --post serve
```

```
[NbConvertApp] Converting notebook Presentation.ipynb to slides
[NbConvertApp] Writing 264104 bytes to Presentation.slides.html
[NbConvertApp] Redirecting reveal.js requests to https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.5.0
Serving your slides at http://127.0.0.1:8000/Presentation.slides.html
Use Control-C to stop this server
```
