# Machine Learning to predict the Dow Jones Industrial Average

This simple Machine Learning example shows how to predict the [^DJI value](https://finance.yahoo.com/quote/%5EDJI?p=^DJI&.tsrc=fin-srch) from previously calculated averages.


## Setup

In [12]:
# Install TuriCreate. Last updated November 4, 2020

# !pip install --upgrade pip
# !pip install Turicreate

In [13]:
import turicreate as tc

In [14]:
# Location of the CSV (comma-separated values) file with all the data, prepared in a separate notebook.
data_path = "./DATA/processed/uber.csv"

## Read data into a SFrame

In [20]:
# Load the data
data = tc.SFrame(data_path)
data[363:370] # show data sample

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str,float,float,float,float,float,float,float,float,float,float,float,float,float,float]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


Day,Date,DJIA_Value,DJIA_Original,DJIA_Avg005,DJIA_Avg030,DJIA_Avg090,DJIA_Avg180,DJIA_Avg365
735536,2014-10-30,16.0,17223.960938,13.6,11.87,13.09,12.77,9.74
735537,2014-10-31,17.0,17395.539063,14.6,11.97,13.17,12.81,9.78
735538,2014-11-01,17.0,17395.539063,15.6,12.1,13.24,12.84,9.82
735539,2014-11-02,17.0,17395.539063,16.2,12.2,13.32,12.88,9.87
735540,2014-11-03,18.0,17410.650391,17.0,12.33,13.41,12.93,9.91
735541,2014-11-04,17.0,17397.230469,17.2,12.43,13.5,12.97,9.95
735542,2014-11-05,18.0,17486.589844,17.4,12.53,13.6,13.01,9.99

ISM_MFC_EMP_Value,ISM_MFC_EMP_Original,ISM_MFC_EMP_Avg005,ISM_MFC_EMP_Avg030,ISM_MFC_EMP_Avg090,ISM_MFC_EMP_Avg180
83.0,54.6,83.0,83.0,101.34,87.66
83.0,54.6,83.0,83.0,101.03,87.66
83.0,54.6,83.0,83.0,100.72,87.66
83.0,54.6,83.0,83.0,100.41,87.66
90.0,55.5,84.4,83.23,100.18,87.7
90.0,55.5,85.8,83.47,99.94,87.74
90.0,55.5,87.2,83.7,99.71,87.78

ISM_MFC_EMP_Avg365
0.0
82.02
82.05
82.08
82.13
82.18
82.23


### TODO: Value should be original value

Please note that the "High" is normalized to Int8,
but for prediction purposes it should be the original "real" value.

## Select the data to train and test

In [21]:
row_count = len(data)
# Skip the first year of data, because the 365-day averages are not complete yet
data = data[365:row_count]
# Make a train-test split
train_data, test_data = data.random_split(0.8)
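
The reason for skipping the first year can be illustrated with a small sketch. It assumes that the `Avg` columns are trailing means over the previous N days. The data-preparation notebook is not shown here, so this is an assumption, although the `DJIA_Avg005` values in the sample rows above are consistent with it. `trailing_average` is a hypothetical helper, not part of TuriCreate.

```python
def trailing_average(values, window):
    """Return trailing means; None where fewer than `window` values exist."""
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the value that left the window
        if i + 1 < window:
            averages.append(None)  # warm-up period: the average is incomplete
        else:
            averages.append(running_sum / window)
    return averages


# DJIA_Value column of the seven sample rows shown above
values = [16.0, 17.0, 17.0, 17.0, 18.0, 17.0, 18.0]
avg5 = trailing_average(values, window=5)
first_complete = next(i for i, a in enumerate(avg5) if a is not None)
print(avg5)
print("first complete row:", first_complete)
```

With a 365-day window the same warm-up effect covers the first 364 rows, which is why the slice starts at row 365.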

### TODO: Let's NOT take last few days

I need to save the last few days to see if I can really predict upcoming values.
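
One possible way to do this is sketched below in plain Python, so that it runs without TuriCreate. `split_with_holdout`, `holdout_days`, and the seed value are hypothetical names and choices, not part of the notebook. In the notebook itself the same idea could presumably be expressed by slicing the SFrame first (as done with `data[365:row_count]`) and then calling `random_split` with its `seed` argument on the remainder.

```python
import random


def split_with_holdout(rows, holdout_days, train_fraction=0.8, seed=42):
    """Reserve the last `holdout_days` rows, then randomly split the rest."""
    cutoff = len(rows) - holdout_days
    holdout = rows[cutoff:]      # most recent days, never used for training
    rng = random.Random(seed)    # a fixed seed makes the split repeatable
    train, test = [], []
    for row in rows[:cutoff]:
        # Each remaining row goes to the training set with probability train_fraction
        (train if rng.random() < train_fraction else test).append(row)
    return train, test, holdout


days = list(range(735536, 735636))  # 100 consecutive day numbers
train, test, holdout = split_with_holdout(days, holdout_days=5)
print(len(train), len(test), holdout)
```

Because the hold-out rows are removed before the random split, they can later be used to check predictions on days the model has never seen.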

## Create the model

- https://apple.github.io/turicreate/docs/api/generated/turicreate.regression.create.html
- Automatically picks the right model based on your data.
- `target`: the column to be predicted.
- `features`: the columns that we use to find patterns leading to the prediction.

In [22]:
model = tc.regression.create(
    train_data, 
    target='DJIA_Original',
    features = [
        'DJIA_Avg005',
        'DJIA_Avg030',
        'DJIA_Avg090',
        'DJIA_Avg180',
        'DJIA_Avg365'
    ],
    validation_set='auto', 
    verbose=True
)

### Predict values on data that was NOT used in training

In [23]:
#test_data.explore()
test_data

Day,Date,DJIA_Value,DJIA_Original,DJIA_Avg005,DJIA_Avg030,DJIA_Avg090,DJIA_Avg180,DJIA_Avg365
735538,2014-11-01,17.0,17395.539063,15.6,12.1,13.24,12.84,9.82
735539,2014-11-02,17.0,17395.539063,16.2,12.2,13.32,12.88,9.87
735541,2014-11-04,17.0,17397.230469,17.2,12.43,13.5,12.97,9.95
735542,2014-11-05,18.0,17486.589844,17.4,12.53,13.6,13.01,9.99
735543,2014-11-06,19.0,17560.310547,17.8,12.7,13.7,13.06,10.04
735560,2014-11-23,22.0,17894.830078,21.2,18.13,15.18,14.01,10.79
735571,2014-12-04,22.0,17937.960938,21.8,20.57,16.0,14.63,11.27
735583,2014-12-16,18.0,17427.439453,18.4,21.03,16.78,15.16,11.81
735588,2014-12-21,22.0,17874.029297,20.8,21.1,17.01,15.35,12.01
735589,2014-12-22,22.0,17962.779297,21.8,21.1,17.09,15.4,12.05

ISM_MFC_EMP_Value,ISM_MFC_EMP_Original,ISM_MFC_EMP_Avg005,ISM_MFC_EMP_Avg030,ISM_MFC_EMP_Avg090,ISM_MFC_EMP_Avg180
83.0,54.6,83.0,83.0,100.72,87.66
83.0,54.6,83.0,83.0,100.41,87.66
90.0,55.5,85.8,83.47,99.94,87.74
90.0,55.5,87.2,83.7,99.71,87.78
90.0,55.5,88.6,83.93,99.48,87.82
90.0,55.5,90.0,87.9,95.51,88.48
85.0,54.9,86.0,89.33,92.77,89.26
85.0,54.9,85.0,87.33,89.43,90.33
85.0,54.9,85.0,86.5,88.04,90.77
85.0,54.9,85.0,86.33,87.77,90.86

ISM_MFC_EMP_Avg365
82.05
82.08
82.18
82.23
82.28
83.12
83.4
83.0
82.84
82.81


In [24]:
# Save predictions to an SArray
predictions = model.predict(test_data)
#predictions

### Print example predictions

In [26]:
start = 0
end = len(predictions)
step = 50

# Compare every 50th prediction with the actual value
for i in range(start, end, step):
    predicted = round(predictions[i], 2)
    actual = test_data[i]["DJIA_Original"]  # each SFrame row is a dict
    print("predicted ", round(predicted, 0), "\t, but actual value was \t", round(actual, 0), "\t difference is \t", round(actual - predicted, 2))

predicted  17251.0 	, but actual value was 	 17396.0 	 difference is 	 144.75
predicted  18110.0 	, but actual value was 	 18121.0 	 difference is 	 10.72
predicted  18048.0 	, but actual value was 	 18085.0 	 difference is 	 36.51
predicted  19910.0 	, but actual value was 	 19952.0 	 difference is 	 41.65
predicted  23232.0 	, but actual value was 	 23329.0 	 difference is 	 97.3
predicted  24357.0 	, but actual value was 	 24319.0 	 difference is 	 -37.42
predicted  26016.0 	, but actual value was 	 26053.0 	 difference is 	 36.81
predicted  26887.0 	, but actual value was 	 26438.0 	 difference is 	 -448.48
predicted  26210.0 	, but actual value was 	 25993.0 	 difference is 	 -217.23


## "Be Less Wrong"

Evaluate how good the model is.

The prediction results appear to vary from run to run (the train-test split is random), so it is worth rerunning the notebook until you find the model with the smallest error,

or **as Elon Musk says, "Be less wrong"**.

### Previous results:

#### ^DJI averages only 

- {'max_error': 1749.5078773959249, 'rmse': 124.58897796835019}
- {'max_error': 1621.9227669335778, 'rmse': 106.39104997423203}
- {'max_error': 1297.117071650111, 'rmse': 101.14871945325757}
- {'max_error': 1122.2711616305896, 'rmse': 183.129076342891}
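
To make the two metrics concrete, the sketch below recomputes them by hand. It assumes the usual definitions: `max_error` is the largest absolute difference between prediction and actual value, and `rmse` is the square root of the mean squared difference. The pairs are the nine rounded sample predictions printed earlier, so the numbers will not match the dictionaries above, which cover the whole test set.

```python
import math

# (predicted, actual) pairs copied from the rounded sample output above
samples = [
    (17251.0, 17396.0),
    (18110.0, 18121.0),
    (18048.0, 18085.0),
    (19910.0, 19952.0),
    (23232.0, 23329.0),
    (24357.0, 24319.0),
    (26016.0, 26053.0),
    (26887.0, 26438.0),
    (26210.0, 25993.0),
]

errors = [actual - predicted for predicted, actual in samples]
max_error = max(abs(e) for e in errors)                     # worst single miss
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # typical miss size
print("max_error:", max_error)
print("rmse:", round(rmse, 2))
```

Because the errors are squared before averaging, a single large miss raises `rmse` noticeably, which is why a run can have a low `max_error` and still a high `rmse`, or the other way around.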

TODO: write this in a loop to select the best model

### TODO: find the best model

Create a "for" loop to find the best model

In [27]:
# Evaluate the model and save the results into a dictionary
results = model.evaluate( test_data ) #test_data[0:2531]
results

{'max_error': 1122.2711616305896, 'rmse': 183.129076342891}
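
The selection loop could look roughly like the sketch below. `find_best_run` and `train_and_evaluate` are hypothetical names. In this notebook the callable would presumably redo the random split with the given seed, call `tc.regression.create`, and return `model.evaluate(test_data)`. To keep the sketch runnable without TuriCreate, a stand-in callable replays the four results recorded above.

```python
def find_best_run(train_and_evaluate, seeds, metric="rmse"):
    """Call `train_and_evaluate(seed)` for each seed; keep the lowest metric."""
    best_seed, best_results = None, None
    for seed in seeds:
        results = train_and_evaluate(seed)  # expected to return a dict of metrics
        if best_results is None or results[metric] < best_results[metric]:
            best_seed, best_results = seed, results
    return best_seed, best_results


# Stand-in for real training runs: the results recorded in "Previous results"
recorded = [
    {'max_error': 1749.5078773959249, 'rmse': 124.58897796835019},
    {'max_error': 1621.9227669335778, 'rmse': 106.39104997423203},
    {'max_error': 1297.117071650111, 'rmse': 101.14871945325757},
    {'max_error': 1122.2711616305896, 'rmse': 183.129076342891},
]

best_seed, best_results = find_best_run(lambda seed: recorded[seed], range(len(recorded)))
print(best_seed, best_results)
```

Note that choosing the run with the lowest test error makes the reported error optimistic, which is another reason to keep the last few days aside as a separate check.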

## Save the model

Save the model for future use in macOS, iOS, and other applications.

In [11]:
# Export to Core ML
model.export_coreml('./DATA/models/^DJI.mlmodel')