# **Football betting model**

In this notebook, we'll try to train a *neural network* to find **value bets**.


Value bets are just bets where we have an advantage over the bookie. It is when the **provided odds** are lower than the **actual odds**.


So let's try to find them!

## **1. Importing necessary files and libraries**

In [1]:
import os

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

import pandas as pd

pd.set_option('display.max_columns', None)

In [2]:
# Import our files

from google.colab import files
uploaded = files.upload()


files = [f for f in uploaded.keys() if f.endswith('.csv')]
data_frames = [pd.read_csv(f) for f in files]
full_data = pd.concat(data_frames, ignore_index=True)

Saving bu20_ratings.csv to bu20_ratings (2).csv
Saving bu21_ratings.csv to bu21_ratings (2).csv
Saving bu22_ratings.csv to bu22_ratings (2).csv
Saving bu23_ratings.csv to bu23_ratings (2).csv
Saving bu24_ratings.csv to bu24_ratings (2).csv
Saving l121_ratings.csv to l121_ratings (2).csv
Saving l122_ratings.csv to l122_ratings (2).csv
Saving l123_ratings.csv to l123_ratings (2).csv
Saving l124_ratings.csv to l124_ratings (2).csv
Saving li20_ratings.csv to li20_ratings (2).csv
Saving li21_ratings.csv to li21_ratings (2).csv
Saving li22_ratings.csv to li22_ratings (2).csv
Saving li23_ratings.csv to li23_ratings (2).csv
Saving li24_ratings.csv to li24_ratings (2).csv
Saving pl20_ratings.csv to pl20_ratings (2).csv
Saving pl21_ratings.csv to pl21_ratings (2).csv
Saving pl22_ratings.csv to pl22_ratings (2).csv
Saving pl23_ratings.csv to pl23_ratings (2).csv
Saving pl24_ratings.csv to pl24_ratings (2).csv
Saving pr20_ratings.csv to pr20_ratings (2).csv
Saving pr21_ratings.csv to pr21_ratings 

## **2. Organising and normalizing our features**

In [3]:
# We'll add the rank difference, rating difference and result features (home - away) to make it one column insteadof two and normalize them

full_data['Rank Difference'] = full_data['Home ranking'] - full_data['Away ranking']
full_data['Rank Difference'] = (full_data['Rank Difference'] - full_data['Rank Difference'].mean()) / full_data['Rank Difference'].std()

full_data['Rating Difference'] = full_data['Home Last Avg Rating'] - full_data['Away Last Avg Rating']
full_data['Rating Difference'] = (full_data['Rating Difference'] - full_data['Rating Difference'].mean()) / full_data['Rating Difference'].std()

full_data['Results'] = full_data['Home score'] - full_data['Away score']
full_data['Label'] = full_data['Results'].apply(lambda x: 1 if x > 0 else (0 if x == 0 else 2))
full_data['Results'] = (full_data['Results'] - full_data['Results'].mean()) / full_data['Results'].std()
odds_features = ['Home odds', 'Draw odds','Away odds']
odds = full_data[odds_features].values

# And make our tensors

features = ['Rank Difference', 'Rating Difference', 'Home odds', 'Draw odds', 'Away odds']

X = full_data[features].values
y = full_data['Label'].values

X_train, X_test, y_train, y_test, odds_train, odds_test = train_test_split(X, y, odds, test_size=0.2, random_state=42)

# Convert to PyTorch tensors

X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)
odds_train_tensor = torch.tensor(odds_train, dtype=torch.float32)
odds_test_tensor = torch.tensor(odds_test, dtype=torch.float32)

# Move tensors to GPU if available

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

X_train_tensor = X_train_tensor.to(device)
y_train_tensor = y_train_tensor.to(device)
X_test_tensor = X_test_tensor.to(device)
y_test_tensor = y_test_tensor.to(device)
odds_train_tensor = odds_train_tensor.to(device)
odds_test_tensor = odds_test_tensor.to(device)

In [18]:
# We'll make our Dataset class (to ensure our structure conformity)

class FootballDataset(Dataset):
    def __init__(self, X, y, odds):
        self.X = X
        self.y = y
        self.odds = odds

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx], self.odds[idx]


train_dataset = FootballDataset(X_train_tensor, y_train_tensor, odds_train_tensor)
test_dataset = FootballDataset(X_test_tensor, y_test_tensor, odds_test_tensor)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)

## **3. Making our neural network**

In [19]:
# And make our model

class BettingNN(nn.Module):
    def __init__(self, input_size, num_classes):
        super(BettingNN, self).__init__()
        self.fc1 = nn.Linear(input_size, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 32)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

In [20]:
model = BettingNN(input_size=5, num_classes=3).to(device)

We'll customise our **loss function**, because we're trying to find value bets, not just predict the outcome, because we can find a value bet and it'll lose, so we have to take that into consideration.

In [21]:

def custom_loss_function(predictions, labels, odds):
    probabilities = torch.softmax(predictions, dim=1)

    batch_size = labels.size(0)
    ev = probabilities * odds

    actual_ev = ev.gather(1, labels.unsqueeze(1)).squeeze()
    loss = -actual_ev.mean()

    return loss

def train_model(model, train_loader, optimizer, num_epochs):
    model.to(device)
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for inputs, labels, odds in train_loader:

            inputs, labels, odds = inputs.to(device), labels.to(device), odds.to(device)

            optimizer.zero_grad()

            outputs = model(inputs)
            loss = custom_loss_function(outputs, labels, odds)

            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        if (epoch + 1) % 10 == 0:
          print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {running_loss / len(train_loader):.4f}")

def evaluate_profitability(model, test_loader):
    model.eval()
    total_bets = 0
    total_profit = 0.0

    with torch.no_grad():
        for inputs, labels, odds in test_loader:
            inputs, odds = inputs.to(device), odds.to(device)
            outputs = model(inputs)

            probabilities = torch.softmax(outputs, dim=1)
            predicted_outcomes = probabilities.argmax(dim=1)
            for i in range(inputs.size(0)):
                pred = predicted_outcomes[i].item()
                if pred == labels[i].item():
                    profit = odds[i, pred] - 1
                else:
                    profit = -1
                total_profit += profit
                total_bets += 1
    roi = (total_profit / total_bets) if total_bets > 0 else 0
    print(f"Total Bets: {total_bets}, Total Profit: {total_profit:.2f}, ROI: {roi * 100:.2f}%")

In [22]:
optimizer = optim.Adam(model.parameters(), lr=0.0001)

## **4. Training our neural network**

In [23]:
num_epochs = 10000
train_model(model, train_loader, optimizer, num_epochs)
evaluate_profitability(model, test_loader)

Epoch 10/10000, Loss: -1.8725
Epoch 20/10000, Loss: -1.9157
Epoch 30/10000, Loss: -1.9300
Epoch 40/10000, Loss: -1.9415
Epoch 50/10000, Loss: -1.9462
Epoch 60/10000, Loss: -1.9495
Epoch 70/10000, Loss: -1.9512
Epoch 80/10000, Loss: -1.9524
Epoch 90/10000, Loss: -1.9545
Epoch 100/10000, Loss: -1.9554
Epoch 110/10000, Loss: -1.9568
Epoch 120/10000, Loss: -1.9573
Epoch 130/10000, Loss: -1.9582
Epoch 140/10000, Loss: -1.9592
Epoch 150/10000, Loss: -1.9603
Epoch 160/10000, Loss: -1.9608
Epoch 170/10000, Loss: -1.9611
Epoch 180/10000, Loss: -1.9618
Epoch 190/10000, Loss: -1.9618
Epoch 200/10000, Loss: -1.9626
Epoch 210/10000, Loss: -1.9629
Epoch 220/10000, Loss: -1.9628
Epoch 230/10000, Loss: -1.9628
Epoch 240/10000, Loss: -1.9635
Epoch 250/10000, Loss: -1.9643
Epoch 260/10000, Loss: -1.9646
Epoch 270/10000, Loss: -1.9645
Epoch 280/10000, Loss: -1.9649
Epoch 290/10000, Loss: -1.9654
Epoch 300/10000, Loss: -1.9655
Epoch 310/10000, Loss: -1.9655
Epoch 320/10000, Loss: -1.9664
Epoch 330/10000, 

We can clearly see that we had a *return on investment* by almost **100%** ! We almost **doubled** the amount we started with by betting on 2014 matches ! The total profit means that, if we bet 1\$ on every match, we'd end up with a profit of more than 2000\$.

Even though the data contained just *10,000 matches*, we have to take this as a **win**, because as we've seen in the previous step (EDA), it's not that easy to find a **profitable strategy**.

## **5. Saving the model**

In [25]:
torch.save({
    'epoch': 10000,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': -2.0102,
}, 'checkpoint.pth')


In [None]:
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch']