# A loss function defined to optimize our return

So far, we have selected the bets to place by applying thresholds,
which makes no distinction between good bets and bad bets.

In this notebook, we will try to implement a new loss function, designed to optimize our return.

## Initialization

In [1]:
import dataset
import keras
import os

import pandas as pd
import numpy as np
import tensorflow as tf

# from keras import regularizers
from keras import metrics
# from keras.models import Sequential
# from keras.layers import Dense, BatchNormalization
# from keras.optimizers import Adagrad, Adam
from keras.utils import np_utils
from keras_tqdm import TQDMNotebookCallback

from sklearn.preprocessing import StandardScaler, LabelEncoder

from keras import backend as K

import sys
import import_notebook
sys.meta_path.append(import_notebook.NotebookFinder())

import common

Using TensorFlow backend.


importing Jupyter notebook from common.ipynb


## Data Preparation

In [2]:
np.set_printoptions(suppress=True)

book = dataset.Dataset('data/book.csv')
df = pd.DataFrame(book.processed_results)

# df = pd.read_csv('all_processed.csv')

TRAINING_SET_FRACTION = 0.95
train_results_len = int(TRAINING_SET_FRACTION * df.shape[0])

features, labels = common.get_feables(df)
train_features = features[:train_results_len]
test_features = features[train_results_len:]
y_train = labels[:train_results_len]
y_test = labels[train_results_len:]

scaler = StandardScaler()
X_train = scaler.fit_transform(train_features.astype(float))
X_test = scaler.transform(test_features.astype(float))

In [3]:
odds_train = train_features[['odds-home','odds-draw','odds-away']]
odds_test = test_features[['odds-home','odds-draw','odds-away']]

c_train = odds_train * (2 * y_train - 1)
c_test = odds_test * (2 * y_test - 1)
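
The signed-odds encoding used above can be illustrated on a toy example. This is only a sketch with made-up odds and results (none of these numbers come from the dataset): the winning outcome keeps its odds with a positive sign, the losing outcomes get a negative sign, and clipping at zero recovers the payout per unit staked.

```python
import numpy as np

# Hypothetical toy data: decimal odds for (home, draw, away) of two matches.
odds = np.array([[2.0, 3.5, 4.0],
                 [1.5, 4.0, 6.0]])
# One-hot results: match 1 ends in a home win, match 2 in a draw.
y = np.array([[1, 0, 0],
              [0, 1, 0]])

# Same arithmetic as in the cell above: 2*y - 1 maps 1 -> +1 and 0 -> -1.
c = odds * (2 * y - 1)
print(c)

# Clipping at zero keeps only the odds of the winning outcome.
payout = np.clip(c, 0.0, np.inf)
print(payout)
```

Because the sign carries the result and the magnitude carries the odds, a single target array is enough for a loss function to see both.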

## Some utility functions

In [4]:
import shutil  # needed for shutil.rmtree below; it is not imported in the first cell

def evaluate(loss, y_train=c_train, y_test=c_test, th1=0.05, th2=0.9, batch_size=500, normalize=False):
    output_classes = y_train.shape[1]
    name = '%s_%02d' % (str(loss.__qualname__), output_classes)
    if output_classes == 4:
        # Read the risk from the labels passed in, not from the global d_train,
        # which is only defined further down in the notebook.
        risk = y_train['no-bet'].mean()
        if risk != 1:
            name = name + '_risk_%3.2f' % risk
    # Start from an empty TensorBoard log directory for this run.
    shutil.rmtree('logs/%s' % name, ignore_errors=True)
    model = common.construct_model(X_train.shape[1], output_classes=output_classes, loss=loss, normalize=normalize, metrics=['accuracy'] + common.bet_metrics())
    _ = model.fit(X_train, y_train,
      epochs=200,
      batch_size=batch_size, verbose=0,
      validation_data=(X_test, y_test),
      callbacks=[keras.callbacks.TensorBoard(log_dir='./logs/%s' % name, write_graph=True), 
                 TQDMNotebookCallback(show_inner=False)]
     )
    return common.performance(model, X_test, y_test, th1=th1, th2=th2)

In [5]:
_EPSILON = 10e-8

def cat_loss(b_true, y_pred):
    prob_true = K.clip(b_true, 0., 1.)
    prob = K.clip(y_pred, _EPSILON, 1. - _EPSILON)
    res = K.sum(prob_true * -K.log(prob), axis=-1)
    return res

## Reference

Let's set a reference in terms of performance.

In [6]:
evaluate(cat_loss)


Unnamed: 0,Profit,Count,|Profit|,|Count|
home,8.0,11.0,0.62,5.0
draw,-2.0,2.0,0.0,0.0
away,-0.84,13.0,-0.32,4.0


Two ways to make the model generalize better are decreasing the batch_size and adding batch normalization.
Batch normalization is covered in other notebooks, but let's see what effect the batch_size has.

In [7]:
evaluate(cat_loss, batch_size=50)



Unnamed: 0,Profit,Count,|Profit|,|Count|
home,9.0,10.0,0.62,5.0
draw,0.5,3.0,0.0,0.0
away,-2.29,16.0,-0.32,4.0


Apart from slower execution, the model seems to generalize a little better in terms of profit.
The accuracy of both models is about the same, but the profit on the training set already differs considerably.
The same is true for the profit on the test set.

## Custom loss function

Because our goal is to optimize the profit, and not to predict as many outcomes as possible correctly,
we could also try to optimize the profit directly.


* I implemented the bet_loss function before reading this good article:
[Machine Learning for Sports betting: not a basic classification problem](https://medium.com/@media_73863/machine-learning-for-sports-betting-not-a-basic-classification-problem-b42ae4900782). In my opinion, both functions model the same thing, which we will check in this notebook. Our odds_loss function is also a little more compact than theirs. I am not sure that this improves readability, but it should improve performance.

In [8]:
def odds_loss(b_true, y_pred):
    profit = K.clip(b_true, 0., np.inf) 
    prob = K.clip(y_pred, _EPSILON, 1. - _EPSILON)
    res2 = K.sum(profit * prob - 1 * (1 - prob), axis=-1)
    return -res2

def bet_loss(b_true, y_pred):
    profit = K.clip(b_true, 0., np.inf) - 1
    prob = K.clip(y_pred, _EPSILON, 1. - _EPSILON)
    res2 = K.sum(profit * prob, axis=-1)
    return -res2

In [9]:
evaluate(odds_loss)



Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-8.619999,54.0,-1.019999,37.0
draw,5.290001,71.0,-9.0,56.0
away,-24.41,54.0,-20.91,46.0


In [10]:
evaluate(bet_loss)



Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-8.619999,54.0,-1.019999,37.0
draw,5.290001,71.0,-9.0,56.0
away,-24.41,54.0,-20.91,46.0


It seems that both our implementations are equivalent. 
Let's focus on our implementation bet_loss.
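
The identical tables above can be explained algebraically: when the predictions sum to 1 (as with a softmax output, which is an assumption about `common.construct_model`), the two losses differ only by a constant, so their gradients are the same. The sketch below checks this numerically with NumPy transcriptions of both functions on random, made-up odds; the `_np` helper names are hypothetical and the clipping of `y_pred` is left out for brevity.

```python
import numpy as np

def odds_loss_np(b_true, y_pred):
    # NumPy transcription of odds_loss (without the epsilon clipping of y_pred).
    profit = np.clip(b_true, 0.0, np.inf)
    return -np.sum(profit * y_pred - (1 - y_pred), axis=-1)

def bet_loss_np(b_true, y_pred):
    # NumPy transcription of bet_loss (without the epsilon clipping of y_pred).
    profit = np.clip(b_true, 0.0, np.inf) - 1
    return -np.sum(profit * y_pred, axis=-1)

rng = np.random.default_rng(0)
n_classes = 3

# Random odds and random one-hot results, encoded as signed odds.
odds = rng.uniform(1.1, 6.0, size=(5, n_classes))
y = np.eye(n_classes)[rng.integers(0, n_classes, size=5)]
b_true = odds * (2 * y - 1)

# Random softmax predictions, so every row sums to 1.
logits = rng.normal(size=(5, n_classes))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

diff = odds_loss_np(b_true, y_pred) - bet_loss_np(b_true, y_pred)
print(diff)  # approximately n_classes - 2 in every row
```

Since the difference does not depend on the predictions, both losses lead to the same optimum; any difference between two training runs would then come from random initialization rather than from the loss.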

### Adding a category

As in the same [article](https://medium.com/@media_73863/machine-learning-for-sports-betting-not-a-basic-classification-problem-b42ae4900782),
I thought about adding a category for matches on which it is not opportune to bet.

In order to do this, we design a fourth class with odds set to 1.
If the network decides to place this bet, no money will be lost or won.

In [11]:
d_train = c_train.copy()
d_test = c_test.copy()
d_train['no-bet'] = 1
d_test['no-bet'] = 1

In [12]:
evaluate(bet_loss, y_train=d_train, y_test=d_test)



Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-2.81,35.0,-1.600001,28.0
draw,10.11,53.0,7.72,35.0
away,-12.399999,47.0,-13.819999,39.0
no-bet,0.0,0.0,0.0,34.0


We are still not making a profit with this scheme. Let's add a [batch normalization](https://towardsdatascience.com/dont-use-dropout-in-convolutional-networks-81486c823c16) step.

In [13]:
evaluate(bet_loss, y_train=d_train, y_test=d_test, normalize=True)



Unnamed: 0,Profit,Count,|Profit|,|Count|
home,3.86,34.0,4.809999,29.0
draw,9.889999,66.0,8.689999,64.0
away,-15.279998,46.0,-10.309999,36.0
no-bet,0.0,0.0,0.0,25.0


That's still not very profitable :-(
We also see that, with this cost function, the predicted probabilities shift strongly towards high values.
This is most likely because of the linear nature of the cost function.
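
One way to see why a linear cost function favours extreme outputs: for a single match, the expected profit is a linear function of how the stake is spread over the outcomes, and a linear function over a set of probabilities is maximized by putting everything on one outcome. The sketch below illustrates this with made-up true probabilities and odds (both are assumptions, not values from the dataset).

```python
import numpy as np

# Hypothetical numbers: assumed true outcome probabilities and bookmaker odds.
q = np.array([0.50, 0.28, 0.22])
odds = np.array([1.9, 3.4, 4.6])

# Expected profit per unit staked on each outcome.
edge = q * odds - 1

def expected_profit(p):
    # Linear in the stake distribution p.
    return float(np.dot(p, edge))

# Try many random ways of spreading the stake over the three outcomes.
rng = np.random.default_rng(1)
random_stakes = rng.dirichlet(np.ones(3), size=1000)
best_random = max(expected_profit(p) for p in random_stakes)

# Putting the whole stake on the outcome with the largest edge.
one_hot = np.eye(3)[np.argmax(edge)]
print(edge, best_random, expected_profit(one_hot))
```

No mixed stake beats the all-in stake on the best outcome, which is consistent with the network being pushed towards near one-hot predictions.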

### Decreasing the risk

In our example, no bet is placed on only 25 matches.
Let's try to increase this number by expecting more than 10% return on each bet.

We accomplish this by raising the `reward` for not placing a bet by 10%, from 1 to 1.1.

In [14]:
d_train['no-bet'] = 1.1
d_test['no-bet'] = 1.1
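
Why a no-bet value of 1.1 corresponds to demanding more than 10% return can be sketched as follows. Under bet_loss, the no-bet class contributes its value minus 1 per unit of probability, so 0.1 here, while a real bet on outcome i contributes its probability times its odds minus 1 in expectation. The helper `best_action` and all numbers are hypothetical and only serve as an illustration.

```python
import numpy as np

def best_action(q, odds, no_bet_reward):
    """Index with the highest expected profit; the last index means 'no-bet'."""
    edge = np.append(q * odds - 1, no_bet_reward - 1)
    return int(np.argmax(edge))

# Assumed true probabilities and odds for (home, draw, away); not from the dataset.
q = np.array([0.50, 0.28, 0.22])
odds = np.array([1.9, 3.4, 4.9])   # away has an expected return of 7.8%

print(best_action(q, odds, no_bet_reward=1.0))  # away bet beats a reward of 0
print(best_action(q, odds, no_bet_reward=1.1))  # 7.8% is below the 10% bar
```

With the reward at 1, any positive expected return justifies a bet; at 1.1, the same away bet is skipped because its expected return stays below 10%.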

In [15]:
evaluate(bet_loss, y_train=d_train, y_test=d_test)



Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-4.77,23.0,0.14,14.0
draw,0.54,16.0,4.15,9.0
away,0.0,0.0,0.0,0.0
no-bet,10.900005,109.0,11.400007,114.0


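This seems to work for the absolute profits :-)
Note that the profit in the no-bet row is not money actually won: it is only the artificial reward of 0.1 per skipped match (109 × 0.1 ≈ 10.9).

Let's see if this is stable by adding a normalization layer.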

In [16]:
evaluate(bet_loss, y_train=d_train, y_test=d_test, normalize=True)



Unnamed: 0,Profit,Count,|Profit|,|Count|
home,3.25,13.0,-6.0,6.0
draw,2.14,18.0,-0.45,14.0
away,-2.63,10.0,-1.93,8.0
no-bet,12.300011,123.0,12.600012,126.0


With the normalization layer, we can switch back to relative probabilities (or is this just a coincidence?).

The only way to find out is to repeat our experiments with cross validation.

# Cross validation

In [19]:
from sklearn.model_selection import KFold
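# No shuffling: random_state only has an effect when shuffle=True,
# and recent scikit-learn versions reject it otherwise.
kfold = KFold(n_splits=20)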

from IPython.display import clear_output

def cross_validate(epochs=200, loss=bet_loss, risk=1, batch_size=500, output_classes=3, 
                   normalize=False, verbose=False, hidden_layer=[10], drop_odds=False):
    results = []
    models = []
    predictions = []
    odds = []

    for i, (trainidx, valididx) in enumerate(kfold.split(df)):
        train = df.iloc[trainidx]
        test = df.iloc[valididx]
        train_features = train.drop(columns=['result', 'date'])
        train_labels = train.result.copy()
        test_features = test.drop(columns=['result', 'date'])
        test_labels = test.result.copy()
        scaler = StandardScaler()

        if drop_odds:
            trf = train_features.drop(columns=['odds-home','odds-draw','odds-away'])
            tef = test_features.drop(columns=['odds-home','odds-draw','odds-away'])

            X_train = scaler.fit_transform(trf.astype(float))
            X_test = scaler.transform(tef.astype(float))
        else:
            X_train = scaler.fit_transform(train_features.astype(float))
            X_test = scaler.transform(test_features.astype(float))
        
        encoder = LabelEncoder()
        Y_train = -encoder.fit_transform(train_labels) +2
        Y_test = -encoder.transform(test_labels) +2
        y_train = np_utils.to_categorical(Y_train)
        y_test = np_utils.to_categorical(Y_test)    
        odds_train = train_features[['odds-home','odds-draw','odds-away']]
        odds_test = test_features[['odds-home','odds-draw','odds-away']]

        c_train = odds_train * (2 * y_train - 1)
        c_test = odds_test * (2 * y_test - 1)
        if output_classes == 4:
            c_train['no-bet'] = risk
            c_test['no-bet'] = risk

        model = common.construct_model(input_classes = X_train.shape[1], loss=loss, 
                                       hidden_layer = hidden_layer,
                                        output_classes=output_classes, metrics=['accuracy'] + common.bet_metrics(),
                                        normalize=normalize)

        _ = model.fit(X_train, c_train,
              epochs=epochs,
              batch_size=batch_size, verbose=0,
              validation_data=(X_test, c_test),
              callbacks=[keras.callbacks.TensorBoard(log_dir='./logs/fold_%02d' % i, write_graph=True), 
                         TQDMNotebookCallback(show_inner=False)]
         )    

        cm = common.performance(model, X_test, c_test)
        results.append(cm)
        models.append(model)
        
        predictions.append(model.predict(X_test))
        odds.append(c_test)

        if verbose:
            display(cm)
        else:
            clear_output()
        
    return results, (models, predictions, odds)

## Categorical loss

Let's first set a reference using the categorical loss function.

In [20]:
results, models = cross_validate(loss=cat_loss)

In [21]:
result = pd.concat({n: df for n, df in enumerate(results)},axis=0)
result.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,14.639999,269.0,2.02,32.0
draw,23.1,96.0,0.0,0.0
away,15.29,85.0,-1.81,8.0
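
The aggregation over folds used above relies on `sum(level=1)`, which recent pandas releases no longer support. A sketch of an equivalent aggregation with `groupby(level=1)` follows; the two per-fold tables are made up and only assume the same layout as the tables shown in this notebook (outcomes as index, numeric columns).

```python
import pandas as pd

# Hypothetical per-fold tables (assumed layout: one row per outcome).
fold_a = pd.DataFrame({'Profit': [1.0, -2.0], 'Count': [3, 4]}, index=['home', 'away'])
fold_b = pd.DataFrame({'Profit': [0.5, 1.5], 'Count': [2, 1]}, index=['home', 'away'])

# Stack the folds: level 0 of the index is the fold number, level 1 the outcome.
stacked = pd.concat({n: fold for n, fold in enumerate([fold_a, fold_b])}, axis=0)

# Sum over folds per outcome; equivalent to the older stacked.sum(level=1).
total = stacked.groupby(level=1).sum()
print(total)
```

Note that `groupby` sorts the outcomes alphabetically, so the row order can differ from the per-fold tables.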


### Decreased batch size

In [22]:
results, models = cross_validate(loss=cat_loss, batch_size=50)

In [21]:
result = pd.concat({n: df for n, df in enumerate(results)},axis=0)
result.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,22.93,279.0,1.99,32.0
draw,8.649999,104.0,0.0,0.0
away,6.400001,83.0,-1.03,6.0


That's a pretty decent profit. While we are using a model comparable to the one in the revamped notebook, it seems that fewer epochs result in less overfitting for this particular problem. I would still not bet my money on this.

## Custom loss function

Let's check if we can get similar profits with the loss functions designed to optimize the profit.

In [23]:
results, models = cross_validate(loss=bet_loss)

In [23]:
result = pd.concat({n: df for n, df in enumerate(results)},axis=0)
result.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-74.080002,1055.0,-48.899998,879.0
draw,-114.82,1658.0,-100.480003,1390.0
away,-149.839996,831.0,-111.849998,657.0


In [24]:
results, models = cross_validate(loss=bet_loss, output_classes=4)

In [25]:
result = pd.concat({n: df for n, df in enumerate(results)},axis=0)
result.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-46.450005,864.0,-25.850002,702.0
draw,-75.75,1118.0,-23.230003,831.0
away,-104.969994,504.0,-63.120003,283.0
no-bet,0.0,0.0,0.0,843.0


In [35]:
results, models = cross_validate(loss=odds_loss, output_classes=4)

In [36]:
result = pd.concat({n: df for n, df in enumerate(results)},axis=0)
result.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-48.350006,863.0,-25.850002,702.0
draw,-74.75,1117.0,-23.230003,831.0
away,-104.969994,504.0,-62.120003,282.0
no-bet,0.0,0.0,0.0,843.0


In [26]:
results, models = cross_validate(loss=bet_loss, output_classes=4, risk=1.1)

In [27]:
result = pd.concat({n: df for n, df in enumerate(results)},axis=0)
result.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-36.940002,155.0,-23.460001,86.0
draw,-4.999999,383.0,-7.639998,198.0
away,-20.0,20.0,-6.0,6.0
no-bet,255.800247,2558.0,270.600311,2706.0


In [28]:
results, models = cross_validate(loss=bet_loss, output_classes=4, risk=1.1, normalize=True)

In [29]:
result = pd.concat({n: df for n, df in enumerate(results)},axis=0)
result.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-24.92,275.0,-39.209999,179.0
draw,20.550001,298.0,16.57,179.0
away,-0.33,58.0,-3.23,32.0
no-bet,261.600281,2616.0,268.600311,2686.0


The `better` loss functions do not seem to generate more profit than our categorical model.

# All leagues

We need to verify whether the simple model with categorical cross-entropy is able to generate a profit
when we apply it across all leagues.

In [24]:
df = pd.read_csv('all_processed.csv')

In [25]:
all_results, all_models = cross_validate(loss=cat_loss)

In [39]:
all_results = pd.concat({n: df for n, df in enumerate(all_results)},axis=0)
all_results.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-3.670002,247.0,-4.2,332.0
draw,-3.0,3.0,0.0,0.0
away,-17.119999,185.0,1.229999,69.0


In [40]:
all_results, all_models = cross_validate(loss=cat_loss, normalize=True)

In [41]:
all_results = pd.concat({n: df for n, df in enumerate(all_results)},axis=0)
all_results.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-44.609997,544.0,-3.78,366.0
draw,0.0,0.0,0.0,0.0
away,-1.11,201.0,0.6,59.0


In [42]:
all_results, all_models = cross_validate(loss=bet_loss, output_classes=4, risk=1, normalize=True)

In [43]:
all_results = pd.concat({n: df for n, df in enumerate(all_results)},axis=0)
all_results.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-81.269974,3211.0,-49.93,2436.0
draw,-86.07,573.0,-21.199999,235.0
away,-4.679987,1059.0,50.099998,632.0
no-bet,0.0,0.0,0.0,11069.0


In [44]:
all_results, all_models = cross_validate(loss=odds_loss, output_classes=4, risk=1, normalize=True)

In [45]:
all_results = pd.concat({n: df for n, df in enumerate(all_results)},axis=0)
all_results.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-70.75,3289.0,-26.419987,2484.0
draw,-53.579998,566.0,-20.030001,220.0
away,-24.499992,1047.0,45.259998,621.0
no-bet,0.0,0.0,0.0,11021.0


In [46]:
all_results, all_models = cross_validate(loss=cat_loss, output_classes=3, epochs=1000, normalize=True, hidden_layer=[20,20])

In [47]:
all_results = pd.concat({n: df for n, df in enumerate(all_results)},axis=0)
all_results.sum(level=1)

Unnamed: 0,Profit,Count,|Profit|,|Count|
home,-42.16,561.0,-0.39,293.0
draw,-4.5,13.0,0.0,0.0
away,6.210001,151.0,-0.730001,46.0


Maybe it's time to decide that it is not so simple to make some profit with sports betting.

In another notebook, I will dive deeper into these loss functions and maybe it's also time to dive into hyperparameter optimization.