# Horse Racing Prediction: Hyperparameter Tuning with Keras Tuner

This Jupyter notebook tunes the hyperparameters of the horse racing prediction model. The tuning searches for the best values of:

- number of hidden layers
- number of neurons in each hidden layer
- activation function
- dropout rate of the dropout layer

<H2> Part 1: Data Input and Preprocessing </H2>

In this part of the program, we import the data obtained from the HKJC (Hong Kong Jockey Club). The following features were selected based on my past experience of picking horses:

- position: The starting position of the horse. Position "1" is the closest to the inner rail, which should be beneficial on courses with bends.

- load: The weight carried by the horse, in pounds. The maximum is 133.

- ON odds: The overnight odds of the horse provided by the HKJC.

- odds: The odds of the horse 15 minutes before the race.

- class: The class of the race. It is the same for all horses in a race, except in special races.

- num horses: The number of horses that participated in the race.
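The loading code reads the CSV positionally: every column except the last is treated as a feature and the last column as the label. A minimal sketch of that layout, with hypothetical column names and made-up rows (the real file's header is not shown in this notebook, and the label is assumed to be the finishing position from 1 to 14):

```python
import pandas as pd

# Hypothetical column names and toy values, for illustration only
toy = pd.DataFrame({
    "position":   [1, 5, 9],
    "load":       [133, 126, 118],
    "on_odds":    [3.5, 12.0, 25.0],
    "odds":       [2.9, 14.0, 30.0],
    "class":      [4, 4, 4],
    "num_horses": [14, 14, 14],
    "label":      [1, 7, 12],   # assumed: finishing position, 1..14
})

# Same positional split as the notebook: all columns but the last are features
X_toy = toy.iloc[:, :-1].values
y_toy = toy.iloc[:, -1].values
print(X_toy.shape, y_toy)  # (3, 6) [ 1  7 12]
```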


In [1]:
# Loading the data and preprocessing it

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

print("tensorflow version = " + str(tf.__version__))

PATH_TRAINING_DATA = 'training_data/horse_data_train_test.csv'

# All columns except the last are features; the last column is the label
dataset = pd.read_csv(PATH_TRAINING_DATA)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set.
# shuffle=False keeps the file order: the first 80% of rows go to training and
# the last 20% to testing (random_state has no effect when shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0, shuffle=False)


tensorflow version = 2.5.0
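With `shuffle=False`, `train_test_split` does not sample rows at random: the first rows form the training set and the last `ceil(test_size * n)` rows form the test set. If the CSV is sorted by race date (an assumption; the notebook does not say), this trains on older races and tests on newer ones. A minimal NumPy sketch of the same rule, using a hypothetical helper `chronological_split` on toy arrays:

```python
import math
import numpy as np

def chronological_split(X, y, test_size=0.2):
    """Hypothetical helper mirroring train_test_split(..., shuffle=False):
    the last ceil(test_size * n) rows become the test set."""
    n = len(X)
    n_test = math.ceil(test_size * n)
    n_train = n - n_test
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

X_demo = np.arange(20).reshape(10, 2)   # 10 toy rows, 2 features
y_demo = np.arange(1, 11)               # toy labels 1..10
X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(len(X_tr), len(X_te), y_te)       # 8 2 [ 9 10]
```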


In [2]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train) #use "fit_transform" for training data
X_test = sc.transform(X_test)       #use "transform" for testing data

# One-hot encode the labels: values 1..14 are shifted to 0..13 first
y_train = tf.keras.utils.to_categorical((y_train-1), 14)
y_test = tf.keras.utils.to_categorical((y_test-1), 14)

print(X_train)

[[-0.06771695  0.89160737 -0.38687433 -0.63908619 -0.45749246 -1.03163272]
 [-0.57595046 -0.56040189 -0.73466068 -0.85907985 -0.45749246 -1.03163272]
 [ 0.69463333 -0.95640441 -0.52063831 -0.66841868 -0.45749246 -1.03163272]
 ...
 [-1.33830074 -1.2204061  -0.05246437 -0.08176894 -1.4736673   0.69097648]
 [ 0.69463333 -1.35240694 -0.08590537  0.79820568 -1.4736673   0.69097648]
 [-1.08418398 -1.48440778 -0.52063831 -0.70068442 -1.4736673   0.69097648]]
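`StandardScaler` rescales each column to zero mean and unit variance using statistics computed on the training set only, and `to_categorical` turns each label into a one-hot row. A NumPy sketch of both steps on made-up data (labels are assumed to run from 1 to 14, as the `y - 1` shift implies):

```python
import numpy as np

X_tr = np.array([[1.0, 120.0], [3.0, 126.0], [5.0, 132.0]])  # toy training rows
X_te = np.array([[2.0, 129.0]])                              # toy test row

# Fit: per-column mean and population standard deviation (ddof=0, as StandardScaler uses)
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)

# Transform: training and test rows both use the training statistics
X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std

# One-hot encoding: label k (1..14) becomes a row with a 1 in column k-1
labels = np.array([1, 3, 14])
one_hot = np.eye(14)[labels - 1]
print(one_hot.argmax(axis=1))  # [ 0  2 13]
```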


<H2> Part 2: Building the Model </H2>

We use Keras Tuner to find the best number of hidden layers and neurons in each hidden layer, together with the activation function and the dropout rate.


<H2> Part 2a: Clearing the cache</H2>

In [3]:
import os
import shutil

tmp_project_folder = "untitled_project"        # Keras Tuner's default project folder
tmp_checkpoint_folder = ".ipynb_checkpoints"   # Jupyter's checkpoint folder

# Delete the folders if they exist, so the search starts from scratch
if os.path.exists(tmp_project_folder):
    shutil.rmtree(tmp_project_folder)

if os.path.exists(tmp_checkpoint_folder):
    shutil.rmtree(tmp_checkpoint_folder)
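Keras Tuner saves every trial under its project folder (here the default `untitled_project`) and, by default, reloads an existing project instead of starting over, which is why the folder is deleted before a new search. Passing `overwrite=True` to the tuner constructor is an alternative. The deletion step can also be wrapped in a small reusable function; `remove_dirs` below is a hypothetical helper, tried out on temporary folders rather than the real ones:

```python
import shutil
import tempfile
from pathlib import Path

def remove_dirs(*folders):
    """Delete each folder (with its contents) if it exists; return the names removed."""
    removed = []
    for folder in map(Path, folders):
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(folder.name)
    return removed

# Try it in a throwaway directory instead of the real project folder
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "untitled_project"
    (project / "trial_0").mkdir(parents=True)
    result = remove_dirs(project, Path(tmp) / "missing_folder")
    still_there = project.exists()

print(result, still_there)  # ['untitled_project'] False
```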
    

<H2> Part 2b: Building the model </H2>

In [6]:
from keras_tuner.tuners import RandomSearch

#create a new model
def build_model(hp):

    num_hidden_layers = hp.Choice('num_hidden_layers', values=[2, 3, 4])                   #3 choices
    dropout_rate = hp.Float('dropout_rate', min_value=0.1, max_value=0.5)                  #continuous (no step)
    #num_units = hp.Choice('num_units', values=[3, 4, 5, 6, 7, 8, 9, 10, 11, 12])           #10 choices
    # One activation shared by all tuned layers: a hyperparameter name
    # returns the same value every time it is requested within a trial
    activation = hp.Choice('activation', values=['sigmoid', 'relu', 'tanh'])               #3 choices

    # Initializing the ANN
    model = tf.keras.models.Sequential()

    # Declaring the input shape (six features) so the model is built immediately
    model.add(tf.keras.Input(shape=(X_train.shape[1],)))

    # Adding the first, fixed hidden layer: 7 sigmoid units, each with its own bias
    model.add(tf.keras.layers.Dense(units=7, activation='sigmoid'))

    # Adding the tuned hidden layers
    for i in range(num_hidden_layers):
        model.add(tf.keras.layers.Dense(units=hp.Int('units_' + str(i),                    #10 choices
                                                     min_value=3,
                                                     max_value=12,
                                                     step=1),
                                        activation=activation))

    # Adding a dropout layer
    model.add(tf.keras.layers.Dropout(dropout_rate))

    # Adding the output layer
    # There are 14 classes (the one-hot labels), so the output layer has 14 units
    model.add(tf.keras.layers.Dense(units=14, activation='softmax'))

    # Compiling the model
    #opt = tf.keras.optimizers.Adam(learning_rate=0.03)
    opt = tf.keras.optimizers.Adam(learning_rate=0.01)
    model.compile(loss = 'categorical_crossentropy', optimizer = opt, metrics = ['accuracy'])

    return model
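Each `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and the dropout layer adds none, so the size of every candidate network follows directly from the search space. A short sketch, assuming six input features as in Part 1; `count_params` is a hypothetical helper:

```python
def count_params(hidden_units, n_features=6, first_units=7, n_classes=14):
    """Trainable parameters of: input -> Dense(first_units) -> tuned Dense layers
    -> Dropout (no parameters) -> Dense(n_classes)."""
    sizes = [n_features, first_units, *hidden_units, n_classes]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

smallest = count_params([3, 3])            # 2 tuned layers of 3 units
largest = count_params([12, 12, 12, 12])   # 4 tuned layers of 12 units
print(smallest, largest)  # 141 795
```

Even the largest candidate has fewer than 800 trainable parameters, so all the networks compared by the search are small.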


In [7]:
tuner = RandomSearch(
    build_model,
    objective = 'val_accuracy',
    max_trials = 50 #180 #20 #100
)

tuner.search_space_summary()

Search space summary
Default search space size: 5
num_hidden_layers (Choice)
{'default': 2, 'conditions': [], 'values': [2, 3, 4], 'ordered': True}
dropout_rate (Float)
{'default': 0.1, 'conditions': [], 'min_value': 0.1, 'max_value': 0.5, 'step': None, 'sampling': None}
activation (Choice)
{'default': 'sigmoid', 'conditions': [], 'values': ['sigmoid', 'relu', 'tanh'], 'ordered': False}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 3, 'max_value': 12, 'step': 1, 'sampling': None}
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 3, 'max_value': 12, 'step': 1, 'sampling': None}
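The summary lists only `units_0` and `units_1` because Keras Tuner registers hyperparameters while `build_model` runs with default values, and the default for `num_hidden_layers` is 2; `units_2` and `units_3` are added once a trial samples 3 or 4 layers. Ignoring the continuous dropout rate, the number of discrete configurations can be counted directly:

```python
# Discrete part of the search space (dropout_rate is continuous and left out)
activations = 3            # sigmoid, relu, tanh
units_choices = 10         # 3..12 units per tuned layer
layer_counts = [2, 3, 4]   # values of num_hidden_layers

combinations = activations * sum(units_choices ** k for k in layer_counts)
coverage = 50 / combinations   # fraction sampled with max_trials = 50
print(combinations)            # 33300
```

With `max_trials = 50`, the random search samples about 0.15% of these configurations, so the best trial is a promising candidate rather than a guaranteed optimum.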


<H2> Part 3a: Using Keras Tuner to find the best number of layers and neurons </H2>

In this section, we run the random search and inspect the best trials.
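The search trains every trial for `epochs=1000` with no stopping rule (the 50 trials took about 70 minutes in total), and Keras Tuner scores each trial by the best `val_accuracy` reached in any epoch. One common way to shorten such runs is Keras's `tf.keras.callbacks.EarlyStopping` callback (for example with `monitor='val_loss'` and a `patience` value), passed as `callbacks=[...]` to `tuner.search`, which forwards its fit arguments to `model.fit`. The idea behind it is sketched here in plain Python; `stop_epoch` is a hypothetical helper and the loss values are made up:

```python
def stop_epoch(val_losses, patience=3):
    """Return (stopping epoch, best epoch) for patience-based early stopping.

    Training stops once the monitored loss has failed to improve on its best
    value for `patience` consecutive epochs (epochs are 0-indexed)."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: no improvement after epoch 2
losses = [2.60, 2.55, 2.52, 2.53, 2.54, 2.56, 2.58]
print(stop_epoch(losses, patience=3))  # (5, 2)
```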


In [8]:
print("rows in X_train = " + str(X_train.shape[0]))
print("rows in y_train = " + str(y_train.shape[0]))

print("Training model...")
# X_test/y_test serve as the validation data the tuner optimizes, so the
# resulting val_accuracy is a selection score, not an independent test score
tuner.search(X_train, y_train, epochs=1000, validation_data=(X_test, y_test))

print("results:")
best_model = tuner.get_best_models()[0]

tuner.results_summary()


Trial 50 Complete [00h 00m 57s]
val_accuracy: 0.16201117634773254

Best val_accuracy So Far: 0.18435753881931305
Total elapsed time: 01h 09m 57s
INFO:tensorflow:Oracle triggered exit
results:
Results summary
Results in .\untitled_project
Showing 10 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
num_hidden_layers: 4
dropout_rate: 0.39438245342289924
activation: tanh
units_0: 7
units_1: 9
units_2: 11
units_3: 4
Score: 0.18435753881931305
Trial summary
Hyperparameters:
num_hidden_layers: 3
dropout_rate: 0.4221952237820654
activation: sigmoid
units_0: 4
units_1: 11
units_2: 7
units_3: 7
Score: 0.17877094447612762
Trial summary
Hyperparameters:
num_hidden_layers: 4
dropout_rate: 0.29687287001423956
activation: sigmoid
units_0: 3
units_1: 6
units_2: 9
units_3: 11
Score: 0.16759777069091797
Trial summary
Hyperparameters:
num_hidden_layers: 4
dropout_rate: 0.1328106090509752
activation: tanh
units_0: 8
units_1: 6
units_2: 4
units_3: 6
Score: 0.167597
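For context, guessing uniformly among 14 classes gives an accuracy of 1/14, about 0.071, so the best trial (0.184) is roughly 2.6 times that. The reported scores also match whole-number counts out of 179 validation rows; that row count is an inference from the numbers, not something the notebook states. If it holds, neighbouring trials differ by only one or two correct predictions. A quick check:

```python
chance = 1 / 14
print(round(chance, 4))  # 0.0714

# Scores reported above; n_val = 179 is an inferred, not stated, validation size
scores = [0.18435753881931305, 0.17877094447612762,
          0.16759777069091797, 0.16201117634773254]
n_val = 179
counts = [round(s * n_val) for s in scores]
for s, correct in zip(scores, counts):
    print(f"{s:.4f} -> {correct} of {n_val}", abs(correct / n_val - s) < 1e-6)
```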

In [13]:
# summary() needs a built model, so build it with the number of
# input features (batch dimension left as None) before printing
best_model.build(input_shape=(None, X_train.shape[1]))
best_model.summary()
