# Machine Learning to predict Dow Jones Industrial Average

This simple Machine Learning example shows how to predict the [^DJI value](https://finance.yahoo.com/quote/%5EDJI?p=^DJI&.tsrc=fin-srch) based on its previously calculated averages.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Setup" data-toc-modified-id="Setup-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Setup</a></span></li><li><span><a href="#Read-data-into-a-SFrame" data-toc-modified-id="Read-data-into-a-SFrame-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Read data into a SFrame</a></span><ul class="toc-item"><li><span><a href="#TODO:-Value-should-be-original-value" data-toc-modified-id="TODO:-Value-should-be-original-value-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>TODO: Value should be original value</a></span></li></ul></li><li><span><a href="#Select-the-data-to-train-and-test" data-toc-modified-id="Select-the-data-to-train-and-test-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Select the data to train and test</a></span><ul class="toc-item"><li><span><a href="#TODO:-Let's-NOT-take-last-few-days" data-toc-modified-id="TODO:-Let's-NOT-take-last-few-days-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>TODO: Let's NOT take last few days</a></span></li></ul></li><li><span><a href="#Create-the-model" data-toc-modified-id="Create-the-model-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Create the model</a></span><ul class="toc-item"><li><span><a href="#Print-example-predictions" data-toc-modified-id="Print-example-predictions-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Print example predictions</a></span></li></ul></li><li><span><a href="#&quot;Be-Less-Wrong&quot;" data-toc-modified-id="&quot;Be-Less-Wrong&quot;-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>"Be Less Wrong"</a></span><ul class="toc-item"><li><span><a href="#Previous-results:" data-toc-modified-id="Previous-results:-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Previous results:</a></span><ul class="toc-item"><li><span><a href="#^DJI-averages-only" data-toc-modified-id="^DJI-averages-only-5.1.1"><span class="toc-item-num">5.1.1&nbsp;&nbsp;</span>^DJI averages only</a></span></li></ul></li><li><span><a 
href="#TODO:-find-the-best-model" data-toc-modified-id="TODO:-find-the-best-model-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>TODO: find the best model</a></span></li></ul></li><li><span><a href="#Save-the-model" data-toc-modified-id="Save-the-model-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Save the model</a></span></li></ul></div>

## Setup

In [1]:
# Install TuriCreate. Last updated November 4, 2020

# !pip install --upgrade pip
# !pip install Turicreate

In [2]:
import turicreate as tc

In [3]:
# Location of the CSV (comma-separated values) file with ^DJI info that I prepared in a separate notebook.
data_path = "./DATA/processed/^DJI.csv"

## Read data into a SFrame

In [4]:
# Load the data
data = tc.SFrame(data_path)
data[363:370]  # show data sample

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str,float,float,float,float,float,float,float]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


Day,Date,Value,Original,Avg005,Avg030,Avg090,Avg180,Avg365
725033,1986-01-27,-125.0,1548.170044,-125.2,-125.1,-125.66,-126.33,0.0
725034,1986-01-28,-125.0,1561.349976,-125.0,-125.1,-125.63,-126.32,-126.98
725035,1986-01-29,-125.0,1578.099976,-125.0,-125.1,-125.61,-126.31,-126.97
725036,1986-01-30,-125.0,1572.589966,-125.0,-125.1,-125.59,-126.29,-126.96
725037,1986-01-31,-125.0,1582.910034,-125.0,-125.1,-125.57,-126.28,-126.96
725038,1986-02-01,-125.0,1582.910034,-125.0,-125.1,-125.54,-126.27,-126.95
725039,1986-02-02,-125.0,1582.910034,-125.0,-125.1,-125.52,-126.26,-126.94


### TODO: Value should be original value

Please note that the "High" is normalized to Int8, 
but for prediction purposes it should be the original "real" value.
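
To see why a normalized number is too coarse to serve as the prediction target, here is a minimal sketch of Int8 scaling. It assumes a plain linear (min-max) mapping; the helper names and the bounds 1000 and 30000 are made up for illustration and are not taken from the notebook that prepared the data.

```python
def to_int8_scale(value, lowest, highest):
    """Linearly map value from [lowest, highest] onto the Int8 range [-128, 127]."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    fraction = (value - lowest) / (highest - lowest)
    return round(-128 + fraction * 255)


def from_int8_scale(scaled, lowest, highest):
    """Approximate inverse of to_int8_scale (the rounding loss is not recoverable)."""
    fraction = (scaled + 128) / 255
    return lowest + fraction * (highest - lowest)


# Hypothetical bounds, chosen only for this illustration
low, high = 1000.0, 30000.0

# Two "Original" values from the data sample above
a = to_int8_scale(1548.170044, low, high)
b = to_int8_scale(1582.910034, low, high)
print(a, b, a == b)
```

With these assumed bounds, two closing values roughly 35 points apart land in the same Int8 bucket, which is why the model further down is trained against `Original` rather than `Value`.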

## Select the data to train and test

In [5]:
row_count = len(data)
# Skip the first 365 rows: the longer averages (e.g. Avg365) are not complete there
data = data[365:row_count]
# Make a random 80/20 train-test split
train_data, test_data = data.random_split(0.8)
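
Why the first year is skipped: a 365-day average needs 365 values before it means anything. The sketch below assumes the averages are simple trailing means that stay at 0.0 until the window is full, which is consistent with the `Avg365` value of 0.0 in the first sample row above. The function name is my own, and the real preprocessing lives in a separate notebook, so treat this as an illustration only.

```python
def trailing_average(values, window):
    """Mean of the last `window` values at each position.

    Positions where fewer than `window` values are available get 0.0
    (an assumed placeholder for "not complete yet").
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window
            running_sum -= values[i - window]
        if i + 1 >= window:
            averages.append(round(running_sum / window, 2))
        else:
            averages.append(0.0)
    return averages


print(trailing_average([1, 2, 3, 4, 5], window=3))
```

With a window of 3, the first two positions are placeholders and only the third is a real average; with a window of 365, the same is true for the first 364 rows.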

### TODO: Let's NOT take last few days

I need to save the last few days to see if I can really predict upcoming values.
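
A minimal sketch of how that hold-out could be done before the random split. The function name and the choice of 5 rows are hypothetical; it only assumes that rows are in chronological order and that the container supports `len()` and `[start:end]` slicing, as the SFrame does in the cell above.

```python
def split_off_last_days(rows, n_last):
    """Return (earlier_rows, last_rows), keeping the final n_last rows aside."""
    total = len(rows)
    if not 0 < n_last < total:
        raise ValueError("n_last must be between 1 and len(rows) - 1")
    cut = total - n_last
    return rows[0:cut], rows[cut:total]


# Stand-in for the chronologically ordered data
days = list(range(10))
history, upcoming = split_off_last_days(days, n_last=3)
print(history, upcoming)
```

In this notebook the call could look like `data, last_days = split_off_last_days(data, 5)`, followed by `data.random_split(0.8)` on the remaining rows, so that `last_days` is never seen during training or testing.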

## Create the model

- https://apple.github.io/turicreate/docs/api/generated/turicreate.regression.create.html
- Automatically picks the right model based on your data.
- target: the number to be predicted.
- features: the values that we use to try to find a pattern leading to the prediction.

In [6]:
model = tc.regression.create(
    train_data, 
    target='Original',
    features = [
        'Avg005',
        'Avg030',
        'Avg090',
        'Avg180',
        'Avg365'
    ],
    validation_set='auto', 
    verbose=True
)

# Predict values on data that was NOT used in training

In [7]:
#test_data.explore()
test_data

Day,Date,Value,Original,Avg005,Avg030,Avg090,Avg180,Avg365
725035,1986-01-29,-125.0,1578.099976,-125.0,-125.1,-125.61,-126.31,-126.97
725042,1986-02-05,-125.0,1601.829956,-125.0,-125.1,-125.46,-126.23,-126.92
725047,1986-02-10,-125.0,1633.140015,-125.0,-125.1,-125.37,-126.17,-126.87
725055,1986-02-18,-124.0,1685.550049,-124.0,-124.83,-125.2,-126.04,-126.79
725058,1986-02-21,-124.0,1702.75,-124.0,-124.7,-125.13,-125.99,-126.76
725061,1986-02-24,-124.0,1709.75,-124.0,-124.57,-125.07,-125.94,-126.73
725070,1986-03-05,-124.0,1695.400024,-124.0,-124.27,-124.87,-125.79,-126.64
725071,1986-03-06,-124.0,1711.209961,-124.0,-124.23,-124.84,-125.78,-126.62
725072,1986-03-07,-124.0,1713.47998,-124.0,-124.2,-124.82,-125.76,-126.61
725075,1986-03-10,-124.0,1715.160034,-124.0,-124.1,-124.76,-125.71,-126.58


In [8]:
# Save predictions to an SArray
predictions = model.predict(test_data)
#predictions

### Print example predictions

In [9]:
start = 0
end = len(predictions)
step = 50

# Compare every 50th prediction with the actual value
for i in range(start, end, step):
    a = round(predictions[i], 2)  # predicted value
    b = test_data[i]["Original"]  # actual value (each SFrame row is a dict)
    print("predicted ", round(a, 0), "\t, but actual value was \t", round(b, 0), "\t difference is \t", round(b - a, 2))

predicted  1584.0 	, but actual value was 	 1578.0 	 difference is 	 -5.55
predicted  1921.0 	, but actual value was 	 1913.0 	 difference is 	 -7.79
predicted  2246.0 	, but actual value was 	 2306.0 	 difference is 	 59.31
predicted  2006.0 	, but actual value was 	 1979.0 	 difference is 	 -27.18
predicted  2140.0 	, but actual value was 	 2183.0 	 difference is 	 43.49
predicted  2689.0 	, but actual value was 	 2709.0 	 difference is 	 20.07
predicted  2813.0 	, but actual value was 	 2838.0 	 difference is 	 25.65
predicted  2703.0 	, but actual value was 	 2679.0 	 difference is 	 -23.21
predicted  3029.0 	, but actual value was 	 3032.0 	 difference is 	 2.72
predicted  3251.0 	, but actual value was 	 3251.0 	 difference is 	 -0.38
predicted  3361.0 	, but actual value was 	 3328.0 	 difference is 	 -33.4
predicted  3583.0 	, but actual value was 	 3576.0 	 difference is 	 -6.58
predicted  3781.0 	, but actual value was 	 3742.0 	 difference is 	 -39.12
predicted  4250.0 	, bu

## "Be Less Wrong"

Evaluate how good the model is.

It appears that the prediction results vary from run to run, so it is worth re-running the training until you find the model with the minimum error,

or **as Elon Musk says, "Be less wrong"**.

### Previous results:

#### ^DJI averages only 

- {'max_error': 1749.5078773959249, 'rmse': 124.58897796835019}
- {'max_error': 1621.9227669335778, 'rmse': 106.39104997423203}
- {'max_error': 1297.117071650111, 'rmse': 101.14871945325757}

TODO: write this in a loop to select the best model

### TODO: find the best model

Create a "for" loop to find the best model
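
One way the loop could look, as a sketch rather than a finished solution. Both function names are my own. `train_once` only uses calls already shown in this notebook (`tc.regression.create` and `model.evaluate`), and `find_best_model` assumes that a lower `rmse` means a better model. The train-test split is passed in, so every run is scored on the same test data.

```python
def train_once(train_data, test_data, features, target="Original"):
    """Train one Turi Create regression model and evaluate it on test_data."""
    import turicreate as tc  # imported here so find_best_model works without it

    model = tc.regression.create(
        train_data,
        target=target,
        features=features,
        validation_set='auto',
        verbose=False,
    )
    return model, model.evaluate(test_data)


def find_best_model(train_and_evaluate, n_runs=5, metric="rmse"):
    """Call train_and_evaluate() n_runs times and keep the run with the lowest metric.

    train_and_evaluate must return a (model, results_dict) pair.
    """
    best_model, best_results = None, None
    for _ in range(n_runs):
        model, results = train_and_evaluate()
        if best_results is None or results[metric] < best_results[metric]:
            best_model, best_results = model, results
    return best_model, best_results
```

In this notebook the call could look like `best_model, best_results = find_best_model(lambda: train_once(train_data, test_data, ['Avg005', 'Avg030', 'Avg090', 'Avg180', 'Avg365']))`.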

In [10]:
# Evaluate the model and save the results into a dictionary
results = model.evaluate(test_data)  # test_data[0:2531]
results

{'max_error': 1297.117071650111, 'rmse': 101.14871945325757}
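
For reference, both numbers can be reproduced by hand. A minimal sketch, assuming `rmse` is the square root of the mean squared error and `max_error` is the largest absolute error; the function name is hypothetical, and the sample numbers are the first three rounded rows from the printed predictions above.

```python
import math


def regression_errors(actual, predicted):
    """Return max_error and rmse for two equally long sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    max_error = max(abs(e) for e in errors)
    return {"max_error": max_error, "rmse": rmse}


actual = [1578.0, 1913.0, 2306.0]
predicted = [1584.0, 1921.0, 2246.0]
print(regression_errors(actual, predicted))
```

Because the errors are squared before averaging, a single large miss (60 points here) pulls `rmse` up far more than the two small ones.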

## Save the model

Save the model for future use in macOS, iOS, etc. applications.

In [11]:
# Export to Core ML
model.export_coreml('./DATA/models/^DJI.mlmodel')
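
In case the target folder does not exist yet, a small helper can create it before exporting. This is a sketch using only the Python standard library; the helper name is my own, and whether the export would fail without the folder is an assumption I have not verified.

```python
import os


def ensure_parent_dir(path):
    """Create the folder that will contain `path` (if needed) and return `path`."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return path
```

It could then wrap the path in the export call: `model.export_coreml(ensure_parent_dir('./DATA/models/^DJI.mlmodel'))`.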