# Market prediction using Flux.jl

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Market-prediction-using-Flux.jl" data-toc-modified-id="Market-prediction-using-Flux.jl-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Market prediction using Flux.jl</a></span><ul class="toc-item"><li><span><a href="#Libraries-used-in-this-notebook" data-toc-modified-id="Libraries-used-in-this-notebook-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Libraries used in this notebook</a></span></li></ul></li><li><span><a href="#Define-Hyperparameters" data-toc-modified-id="Define-Hyperparameters-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Define Hyperparameters</a></span></li><li><span><a href="#Load--Data" data-toc-modified-id="Load--Data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Load  Data</a></span><ul class="toc-item"><li><span><a href="#Define-Fetch-Data-function" data-toc-modified-id="Define-Fetch-Data-function-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Define Fetch Data function</a></span></li><li><span><a href="#ISM-Manufacturing-Employment" data-toc-modified-id="ISM-Manufacturing-Employment-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>ISM Manufacturing Employment</a></span></li><li><span><a href="#Markit---Manufacturing-PMI" data-toc-modified-id="Markit---Manufacturing-PMI-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Markit - Manufacturing PMI</a></span></li></ul></li><li><span><a href="#Visualize-Data" data-toc-modified-id="Visualize-Data-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Visualize Data</a></span><ul class="toc-item"><li><span><a href="#Prepare-x-axist-data-(time-periods)" data-toc-modified-id="Prepare-x-axist-data-(time-periods)-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Prepare x-axist data (time periods)</a></span></li><li><span><a href="#Plotting-Data" data-toc-modified-id="Plotting-Data-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Plotting Data</a></span></li></ul></li><li><span><a href="#Extract-Data" data-toc-modified-id="Extract-Data-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Extract Data</a></span><ul class="toc-item"><li><span><a href="#Extract-Independent-Variables-(i.e.-features)" data-toc-modified-id="Extract-Independent-Variables-(i.e.-features)-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Extract Independent Variables (i.e. features)</a></span></li><li><span><a href="#Extract-Dependent-Variable-(i.e.-price)" data-toc-modified-id="Extract-Dependent-Variable-(i.e.-price)-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Extract Dependent Variable (i.e. price)</a></span></li></ul></li><li><span><a href="#Normalize-the-data" data-toc-modified-id="Normalize-the-data-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Normalize the data</a></span><ul class="toc-item"><li><span><a href="#Show-mean()-values" data-toc-modified-id="Show-mean()-values-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Show mean() values</a></span></li><li><span><a href="#Show-std()-values" data-toc-modified-id="Show-std()-values-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Show std() values</a></span></li><li><span><a href="#Normalize-the-independent-variables" data-toc-modified-id="Normalize-the-independent-variables-6.3"><span class="toc-item-num">6.3&nbsp;&nbsp;</span>Normalize the independent variables</a></span></li></ul></li><li><span><a href="#Mean-Squared-Error-(MSE)" data-toc-modified-id="Mean-Squared-Error-(MSE)-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Mean Squared Error (MSE)</a></span></li></ul></div>

## Libraries used in this notebook

Uncomment the lines below if you are missing given libraries, or you want to update to the newest version.
Warning: the updates will take time and will require pre-compiling.

In [1]:
import Pkg; 

# Pkg.add("Flux")
using Flux

# Pkg.add("DelimitedFiles")
using DelimitedFiles

#Pkg.add("Statistics")
using Statistics

# Pkg.add("Flux")
using Flux.Optimise: update!

# Pkg.add("Parameters")
using Parameters #: @with_kw


#Pkg.add("Plots")
using Plots

#Pkg.add("DataFrames")
using DataFrames

#Pkg.add("CSV")
using DataFrames, CSV, Dates, Plots

#Pkg.add("Dates")
using Dates

In [2]:
 println( "Last update: ", Dates.format(Dates.now(), "u. d, yyyy HH:mm"))

Last update: Sep. 2, 2020 21:09


# Define Hyperparameters

In [3]:
# Struct to define hyperparameters

@with_kw mutable struct Hyperparams
    learning_rate::Float64 = 0.1       # learning rate
    split_ratio::Float64 = 0.1         # train/test split ratio 0.1 => 90/10
end
file_ISM_Mfc_Emp = "data/united-states.ism-manufacturing-employment.csv"
file_Markkit_Mfc_PMI = "data/united-states.markit-manufacturing-pmi.csv"

"data/united-states.markit-manufacturing-pmi.csv"

# Load  Data



## Define Fetch Data function

In the future I expect to have large data sets that are too big for the GitHub,
so I am planning for the separate downlod location i.e. [Google Drive](https://drive.google.com/drive/folders/1_cPeoIdjw-e-1llean_2OPd3HJi9-Hvc?usp=sharing), etc.

In [4]:
function fetch_data(file_path)
    isfile(file_path) || # does file exist locally?
        download( string("https://raw.githubusercontent.com/UkiDLucas/MarketIndicators.jl/master/", file_path), # URL
        file_path) # save it as name
    
    return CSV.read(file_path) # returnd DataFrame
end

fetch_data (generic function with 1 method)

## ISM Manufacturing Employment

In [5]:
df_ISM_Mfc_Emp = fetch_data(file_ISM_Mfc_Emp)

│   caller = read at CSV.jl:40 [inlined]
└ @ Core /Users/uki/.julia/packages/CSV/MKemC/src/CSV.jl:40


Unnamed: 0_level_0,Date,ActualValue,ForecastValue,PreviousValue
Unnamed: 0_level_1,String,Float64,Float64?,Float64?
1,2020.08.03,44.3,34.4,42.1
2,2020.07.01,42.1,38.1,32.1
3,2020.06.01,32.1,34.1,27.5
4,2020.05.01,27.5,44.8,43.8
5,2020.04.01,43.8,45.4,46.9
6,2020.03.02,46.9,43.9,46.6
7,2020.02.03,46.6,43.3,45.2
8,2020.01.03,45.1,53.5,46.6
9,2019.12.02,46.6,54.3,47.7
10,2019.11.01,47.7,54.7,46.3


## Markit - Manufacturing PMI

In [6]:
df_Markkit_Mfc_PMI = fetch_data(file_Markkit_Mfc_PMI)

Unnamed: 0_level_0,Date,ActualValue,ForecastValue,PreviousValue
Unnamed: 0_level_1,String,Float64,Float64?,Float64?
1,2020.08.21,53.6,51.1,50.9
2,2020.08.03,50.9,51.3,51.3
3,2020.07.24,51.3,49.7,49.8
4,2020.07.01,49.8,49.6,49.6
5,2020.06.23,49.6,42.3,39.8
6,2020.06.01,39.8,39.8,39.8
7,2020.05.21,39.8,37.9,36.1
8,2020.05.01,36.1,36.9,36.9
9,2020.04.23,36.9,47.5,48.5
10,2020.04.01,48.5,49.2,49.2


# Visualize Data

Visualization of the data is important to identify:
- possible correlations (patterns)
- missing data (e.g. before certain date)
- wrong types of data (text vs. numbers, etc.)

## Prepare x-axist data (time periods)

In [7]:
periods = data_Markkit_Mfc_PMI[:,1]

UndefVarError: UndefVarError: data_Markkit_Mfc_PMI not defined

## Plotting Data

In [8]:
gr()
plot(periods, 
    [data_Markkit_Mfc_PMI[:,2]  data_Markkit_Mfc_PMI[:,3]  ], 
    label    = ["original" "normalized"],
    legend    =:topleft, # :right, :left, :top, :bottom, :inside, :best, :legend, :topright, :topleft, :bottomleft, :bottomright
    xlabel   = "time",
    ylabel   = "indicators",
    size     = (1750, 600), # width, height
    layout = (2, 1)
    )

UndefVarError: UndefVarError: data_Markkit_Mfc_PMI not defined

# Extract Data

In [9]:
# rotate the matrix (switch columns to rows) - does not work on DataFrame)
# data = data' 

## Extract Independent Variables (i.e. features)

In [10]:
x = rawdata[1:13,:]     # independent variables: all rows before last

UndefVarError: UndefVarError: rawdata not defined

## Extract Dependent Variable (i.e. price)

In [11]:
y = rawdata[14:14,:]          # Dependent Variable (price) last ROW

UndefVarError: UndefVarError: rawdata not defined

# Normalize the data

## Show mean() values 

Calculate mean values for each feature in the 2-dimentional matrix.

In [12]:
mean(x, dims = 2)

UndefVarError: UndefVarError: x not defined

## Show std() values 

Calculate sample standard deviation (STD).

- https://docs.julialang.org/en/v1/stdlib/Statistics/

In [13]:
std(x, dims = 2) 

UndefVarError: UndefVarError: x not defined

## Normalize the independent variables

In [14]:
x = (x .- mean(x, dims = 2)) ./ std(x, dims = 2) # math on 13× Arrays

UndefVarError: UndefVarError: x not defined

In [15]:
records = size(x,2) # number of columns

UndefVarError: UndefVarError: x not defined

In [16]:
args = Hyperparams()

Hyperparams
  learning_rate: Float64 0.1
  split_ratio: Float64 0.1


In [17]:
split_ratio = args.split_ratio

split_index = floor(Int, records * split_ratio)

UndefVarError: UndefVarError: records not defined

In [18]:
x_train = x[:,1:split_index]           # training features
y_train = y[:,1:split_index]           # training results
x_test = x[:,split_index+1:records]  # testing features
y_test = y[:,split_index+1:records]  # testing results

UndefVarError: UndefVarError: split_index not defined

In [19]:
train_data = (x_train, y_train) # tuples
test_data = (x_test, y_test)
size(test_data[1])

UndefVarError: UndefVarError: x_train not defined

In [20]:
function get_processed_data(args) # expects struct Hyperparams

    isfile("housing.data") ||
        download(
            "https://raw.githubusercontent.com/MikeInnes/notebooks/master/housing.data",
            "housing.data")

    rawdata = readdlm("housing.data")'

    # The last feature is our target -- the price of the house.
    split_ratio = args.split_ratio # For the train/test split

    x = rawdata[1:13,:]
    y = rawdata[14:14,:]

    # Normalise the data
    x = (x .- mean(x, dims = 2)) ./ std(x, dims = 2)

    # Split into train and test sets
    split_index = floor(Int,size(x,2)*split_ratio)
    x_train = x[:,1:split_index]
    y_train = y[:,1:split_index]
    x_test = x[:,split_index+1:size(x,2)]
    y_test = y[:,split_index+1:size(x,2)]

    train_data = (x_train, y_train)
    test_data = (x_test, y_test)

    return train_data,test_data
end

get_processed_data (generic function with 1 method)

In [21]:
# Struct to define model
mutable struct model
    W::AbstractArray
    b::AbstractVector
end

In [22]:
# Function to predict output from given parameters

predict(x, m) = m.W*x .+ m.b

predict (generic function with 1 method)

# Mean Squared Error (MSE)

<center><span style="font-size:x-large;" >$ MSE = \sum \limits _{i=1} ^{n} {   \frac{(ŷ_i - y)^2}{n} }$</span></center>

In [23]:
n = size(y, 2) # e.g. 505 columns

# Mean Squared Error
meansquarederror(ŷ, y) = sum((ŷ .- y).^2)/n

UndefVarError: UndefVarError: y not defined

In [24]:
function train(; kws...)
    # Initialize the Hyperparamters
    args = Hyperparams(; kws...)
    
    # Load the data
    (x_train,y_train),(x_test,y_test) = get_processed_data(args)
    
    test_data = (x_test,y_test)
    
    # The model
    m = model((randn(1,13)),[0.])
    
    loss(x, y) = meansquarederror(predict(x, m), y)

    ## Training
    η = args.learning_rate
    θ = params([m.W, m.b])

    for i = 1:1000
      g = gradient(() -> loss(x_train, y_train), θ)
      for x in θ
        update!(x, -g[x]*η)
      end
      if i%100==0
          @show loss(x_train, y_train)
        end
    end
    
    # Predict the RMSE on the test set
    err = meansquarederror(predict(x_test, m),y_test)
    println("error: ", err)
    return m , test_data# model
end

train (generic function with 1 method)

In [25]:
cd(@__DIR__)
resulting_model, test_data = train()
resulting_model.W

UndefVarError: UndefVarError: meansquarederror not defined

In [26]:
resulting_model.b

UndefVarError: UndefVarError: resulting_model not defined

In [27]:
function test(model, test)
    # Testing model performance on test data 
    X_test, y_test = test
    #accuracy_score = accuracy(X_test, y_test, model)

    #println("\nAccuracy: $accuracy_score")

    # Sanity check.
    #@assert accuracy_score > 0.8

    # To avoid confusion, here is the definition of a Confusion Matrix: https://en.wikipedia.org/wiki/Confusion_matrix
    println("\nConfusion Matrix:\n")
    #display(confusion_matrix(X_test, y_test, model))
end

test (generic function with 1 method)

In [28]:
test(model, test_data)
features = test_data[1]

UndefVarError: UndefVarError: test_data not defined

In [29]:
one_record = features[:,1]

UndefVarError: UndefVarError: features not defined

In [30]:
results = test_data[1][1,:]

UndefVarError: UndefVarError: test_data not defined

In [31]:
records = size(results)[1]

UndefVarError: UndefVarError: results not defined

In [32]:
get_price(data, model) = model.W * data .+ model.b

get_price (generic function with 1 method)

In [33]:
get_price(one_record, resulting_model)

UndefVarError: UndefVarError: one_record not defined

In [34]:
for i in 1:records # 455
    record = features[:,i] # 13-element Array{Float64,1}:
    result = get_price(record, resulting_model)
    println(i, " ", result, " =? ", results[1])
end

UndefVarError: UndefVarError: records not defined