# Simple Football Arbitrage Bot

## Brief rundown on how this works:

Different sports bookmakers have different odds on the outcomes of a match (Team A wins, Draw, Team B wins). 

The true probabilities of these match outcomes add up to exactly 100%, but a bookmaker will shorten the odds so that the implied probabilities of all match outcomes add up to more than 100%. The excess is the bookmaker's edge (i.e. profit).

For example: Paddy Power prices Barnsley vs Port Vale (on 2023-07-23) at 1.62:5:3.5 (win1:win2:draw). Since implied probabilities are the inverse of the decimal odds, we have 0.617:0.200:0.286. This adds up to 1.103, so Paddy Power's edge is about 10.3%.
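
As a quick check of that arithmetic, here is a minimal sketch that turns decimal odds into implied probabilities and sums them. The function name `overround` is my own label for illustration and is not part of the API.

```python
def overround(decimal_odds):
    """Return the sum of implied probabilities (1 / odds) minus 1."""
    implied_probs = [1 / odds for odds in decimal_odds]
    return sum(implied_probs) - 1

# Paddy Power's odds for Barnsley win, Port Vale win, draw (from the example above)
paddy_power_odds = [1.62, 5.0, 3.5]
edge = overround(paddy_power_odds)
print(f"Bookmaker edge: {edge:.1%}")
```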

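To make a profit, we have to find odds from different bookmakers whose implied probabilities total under 100%. Since the three outcomes cover every possible result, backing all of them in the right proportions returns more than the total stake whatever happens, and the gap below 100% is our margin.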
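
The condition can be written as a small check. This sketch assumes decimal odds and takes one price per outcome, each possibly from a different bookmaker. `is_arbitrage` is a hypothetical helper name, and the example prices are taken from the Barnsley vs Port Vale data fetched later in this notebook.

```python
def is_arbitrage(team_1_odds, team_2_odds, draw_odds):
    """True when the implied probabilities of the three outcomes total under 1."""
    total_implied_prob = 1 / team_1_odds + 1 / team_2_odds + 1 / draw_odds
    return total_implied_prob < 1

# All three prices from one bookmaker (Paddy Power): the total is above 1
print(is_arbitrage(1.62, 5.0, 3.5))

# Prices from three different bookmakers:
# Coral (Barnsley win), Paddy Power (Port Vale win), Betfair (draw)
print(is_arbitrage(1.8, 5.0, 4.3))
```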

I will be using the-odds-api.com's free API to collect odds from different bookmakers, and do the rest of the analysis in this notebook.

In [2]:
import requests
import pandas as pd
import numpy as np
import collections
from pandas import json_normalize

In [3]:
with open('API_KEY.txt') as f:
    API_KEY = f.read().strip()  # strip any trailing newline so it doesn't end up in the request
print("API key loaded")  # avoid printing the key itself

API key loaded


The next block is straight from the-odds-api.com's sample code. No need to make it myself.

In [4]:
SPORT = 'soccer_england_league1' # use the sport_key from the /sports endpoint below, or use 'upcoming' to see the next 8 games across all sports

REGIONS = 'uk' # uk | us | eu | au. Multiple can be specified if comma delimited

MARKETS = 'h2h' # h2h | spreads | totals. Multiple can be specified if comma delimited

ODDS_FORMAT = 'decimal' # decimal | american

DATE_FORMAT = 'iso' # iso | unix


# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
#
# Now get a list of live & upcoming games for the sport you want, along with odds for different bookmakers
# This will deduct from the usage quota
# The usage quota cost = [number of markets specified] x [number of regions specified]
# For examples of usage quota costs, see https://the-odds-api.com/liveapi/guides/v4/#usage-quota-costs
#
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 

odds_response = requests.get(
    f'https://api.the-odds-api.com/v4/sports/{SPORT}/odds',
    params={
        'api_key': API_KEY,
        'regions': REGIONS,
        'markets': MARKETS,
        'oddsFormat': ODDS_FORMAT,
        'dateFormat': DATE_FORMAT,
    }
)

if odds_response.status_code != 200:
    print(f'Failed to get odds: status_code {odds_response.status_code}, response body {odds_response.text}')

else:
    odds_json = odds_response.json()
    print('Number of events:', len(odds_json))
    print(odds_json)

    # Check the usage quota
    print('Remaining requests', odds_response.headers['x-requests-remaining'])
    print('Used requests', odds_response.headers['x-requests-used'])

Number of events: 12
[{'id': '8372d6cc49af461e9a719f8b63f74055', 'sport_key': 'soccer_england_league1', 'sport_title': 'League 1', 'commence_time': '2023-08-05T14:00:00Z', 'home_team': 'Barnsley', 'away_team': 'Port Vale', 'bookmakers': [{'key': 'boylesports', 'title': 'BoyleSports', 'last_update': '2023-07-24T16:39:04Z', 'markets': [{'key': 'h2h', 'last_update': '2023-07-24T16:39:04Z', 'outcomes': [{'name': 'Barnsley', 'price': 1.7}, {'name': 'Port Vale', 'price': 4.75}, {'name': 'Draw', 'price': 3.3}]}]}, {'key': 'paddypower', 'title': 'Paddy Power', 'last_update': '2023-07-24T16:39:01Z', 'markets': [{'key': 'h2h', 'last_update': '2023-07-24T16:39:01Z', 'outcomes': [{'name': 'Barnsley', 'price': 1.62}, {'name': 'Port Vale', 'price': 5.0}, {'name': 'Draw', 'price': 3.5}]}]}, {'key': 'skybet', 'title': 'Sky Bet', 'last_update': '2023-07-24T16:40:42Z', 'markets': [{'key': 'h2h', 'last_update': '2023-07-24T16:40:42Z', 'outcomes': [{'name': 'Barnsley', 'price': 1.7}, {'name': 'Port Vale',

In [31]:
print(type(odds_json))

<class 'list'>


Now we'll make odds_json more readable using pandas.

First we have to make a function that reads odds_json into something that pandas can handle.

In [32]:
def extract_info(item):
    """
    Extracts match and bookmaker information from a given dictionary representing a single match.
    """
    extracted_data = []
    for bookmaker in item['bookmakers']: # We iterate over bookmakers and markets because they are nested in the json
        for market in bookmaker['markets']:
            outcomes = {outcome['name']: outcome['price'] for outcome in market['outcomes']}
            extracted_data.append({
                'Team 1': item['home_team'],
                'Team 2': item['away_team'],
                'Date': item['commence_time'],
                'Bookmaker': bookmaker['title'],
                'Team 1 win odds': outcomes.get(item['home_team'], None),
                'Team 2 win odds': outcomes.get(item['away_team'], None),
                'Draw odds': outcomes.get('Draw', None),
            })
    return extracted_data


In [33]:
data = odds_json

match_tables = {} # Dictionary that stores the DataFrames for every match

for item in data:
    match_name = f"{item['home_team']} vs {item['away_team']}"
    match_data = extract_info(item) 
    if match_name not in match_tables:
        match_tables[match_name] = pd.DataFrame(match_data)
    else:
        match_tables[match_name] = pd.concat([match_tables[match_name], pd.DataFrame(match_data)])
        
# if multiple occurrences of the same match are encountered, their data is concatenated

match_tables.keys()

# Display each match's DataFrame
#for match_name, match_df in match_tables.items():
#    print(f"Table for {match_name}:")
#    display(match_df)
#    print("\n")

dict_keys(['Barnsley vs Port Vale', 'Blackpool vs Burton Albion', 'Bolton vs Lincoln City', 'Portsmouth vs Bristol Rovers', 'Cambridge United vs Oxford United', 'Carlisle United vs Fleetwood Town', 'Charlton Athletic vs Leyton Orient', 'Shrewsbury Town vs Cheltenham', 'Derby County vs Wigan Athletic', 'Wycombe Wanderers vs Exeter City', 'Northampton Town vs Stevenage', 'Reading vs Peterborough United'])

Let's analyse the first game:

In [34]:
df = match_tables['Barnsley vs Port Vale']

df['Team 1 win prob'] = 1 / df['Team 1 win odds']
df['Team 2 win prob'] = 1 / df['Team 2 win odds']
df['Draw prob'] = 1 / df['Draw odds']

display(df)

| | Team 1 | Team 2 | Date | Bookmaker | Team 1 win odds | Team 2 win odds | Draw odds | Team 1 win prob | Team 2 win prob | Draw prob |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | BoyleSports | 1.7 | 4.75 | 3.3 | 0.588235 | 0.210526 | 0.30303 |
| 1 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | Paddy Power | 1.62 | 5.0 | 3.5 | 0.617284 | 0.2 | 0.285714 |
| 2 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | Sky Bet | 1.7 | 4.5 | 3.5 | 0.588235 | 0.222222 | 0.285714 |
| 3 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | Bet Victor | 1.7 | 4.33 | 3.5 | 0.588235 | 0.230947 | 0.285714 |
| 4 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | Coral | 1.8 | 4.6 | 3.4 | 0.555556 | 0.217391 | 0.294118 |
| 5 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | Ladbrokes | 1.8 | 4.6 | 3.4 | 0.555556 | 0.217391 | 0.294118 |
| 6 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | 888sport | 1.73 | 4.5 | 3.4 | 0.578035 | 0.222222 | 0.294118 |
| 7 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | Betway | 1.7 | 4.2 | 3.4 | 0.588235 | 0.238095 | 0.294118 |
| 8 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | Betfair | 1.68 | 4.7 | 3.45 | 0.595238 | 0.212766 | 0.289855 |
| 9 | Barnsley | Port Vale | 2023-08-05T14:00:00Z | Betfair | 1.82 | 6.2 | 4.3 | 0.549451 | 0.16129 | 0.232558 |


In [37]:
# Map each bookmaker to its odds for each outcome.
# Note: bookmaker titles are used as keys, so when a title appears twice
# (Betfair above), the later row overwrites the earlier one.
team_1_win_dict = dict(zip(df['Bookmaker'], df['Team 1 win odds']))
team_2_win_dict = dict(zip(df['Bookmaker'], df['Team 2 win odds']))
draw_dict = dict(zip(df['Bookmaker'], df['Draw odds']))

# Every combination of three different bookmakers, one per outcome,
# e.g. {'BoyleSports Paddy Power Sky Bet': [1.7, 5.0, 3.5]}
unique_triplets = {}

for key1, val1 in team_1_win_dict.items():
    for key2, val2 in team_2_win_dict.items():
        for key3, val3 in draw_dict.items():
            if key1 != key2 and key1 != key3 and key2 != key3:
                triplet_name = key1 + " " + key2 + " " + key3
                unique_triplets[triplet_name] = [val1, val2, val3]

# Keep only the triplets whose implied probabilities total less than 1
arbitrage_opportunities = {}

for key, val in unique_triplets.items():
    team1win, team2win, draw = val
    if 1 / team1win + 1 / team2win + 1 / draw < 1:
        arbitrage_opportunities[key] = val
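
The same search can also be written with `itertools.permutations`, which yields every ordered triple of three different bookmakers and lets the names stay as a tuple instead of one joined string. This is a sketch of an alternative layout, not the code used above. The dictionary `odds_by_bookmaker` is a hypothetical structure holding four rows copied from the table, with Betfair taken from the second of its two rows.

```python
from itertools import permutations

# bookmaker -> (team 1 win odds, team 2 win odds, draw odds), copied from the table above
odds_by_bookmaker = {
    "Paddy Power": (1.62, 5.0, 3.5),
    "Coral": (1.8, 4.6, 3.4),
    "Betfair": (1.82, 6.2, 4.3),
    "BoyleSports": (1.7, 4.75, 3.3),
}

arbitrage = {}
# permutations(..., 3) yields every ordered triple of three different bookmakers
for book_1, book_2, book_draw in permutations(odds_by_bookmaker, 3):
    prices = (
        odds_by_bookmaker[book_1][0],     # team 1 win price
        odds_by_bookmaker[book_2][1],     # team 2 win price
        odds_by_bookmaker[book_draw][2],  # draw price
    )
    # Arbitrage exists when the implied probabilities total under 1
    if sum(1 / price for price in prices) < 1:
        arbitrage[(book_1, book_2, book_draw)] = prices

for books, prices in arbitrage.items():
    print(books, prices)
```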

In [38]:
# One row per bookmaker triplet; naming the columns up front also works when no opportunities were found
arbitrage_df = pd.DataFrame.from_dict(
    arbitrage_opportunities,
    orient='index',
    columns=['Team 1 win odds', 'Team 2 win odds', 'Draw odds'],
)

# How far the combined implied probabilities fall short of 100%
arbitrage_df['Profit (%)'] = (
    1 - (1 / arbitrage_df['Team 1 win odds']
         + 1 / arbitrage_df['Team 2 win odds']
         + 1 / arbitrage_df['Draw odds'])
) * 100

arbitrage_df = arbitrage_df.sort_values(by='Profit (%)', ascending=False)

display(arbitrage_df)

| | Team 1 win odds | Team 2 win odds | Draw odds | Profit (%) |
|---|---|---|---|---|
| Coral Betfair William Hill | 1.8 | 6.2 | 3.7 | 1.288385 |
| Ladbrokes Betfair William Hill | 1.8 | 6.2 | 3.7 | 1.288385 |
| Coral Paddy Power Betfair | 1.8 | 5.0 | 4.3 | 1.18863 |
| Ladbrokes Paddy Power Betfair | 1.8 | 5.0 | 4.3 | 1.18863 |
| Coral BoyleSports Betfair | 1.8 | 4.75 | 4.3 | 0.135999 |
| Ladbrokes BoyleSports Betfair | 1.8 | 4.75 | 4.3 | 0.135999 |
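
To turn one of these rows into actual bets, one common approach is to split the total stake in proportion to each outcome's implied probability, so that every outcome pays out the same amount. This is an assumption on my part, since the notebook stops at finding the opportunities. A minimal sketch follows; `split_stake` and the total stake of 100 are illustrative.

```python
def split_stake(decimal_odds, total_stake):
    """Split total_stake so that every outcome returns the same payout."""
    implied_probs = [1 / odds for odds in decimal_odds]
    total_implied_prob = sum(implied_probs)
    stakes = [total_stake * prob / total_implied_prob for prob in implied_probs]
    payout = total_stake / total_implied_prob  # identical whichever outcome wins
    return stakes, payout

# Top row of the table: Coral (team 1 win), Betfair (team 2 win), William Hill (draw)
odds = [1.8, 6.2, 3.7]
stakes, payout = split_stake(odds, total_stake=100)

for price, stake in zip(odds, stakes):
    print(f"Stake {stake:.2f} at odds {price} returns {stake * price:.2f}")
print(f"Guaranteed payout: {payout:.2f} for a total stake of 100")
```

With stakes split this way, the return on the total stake is 1/S - 1, where S is the sum of the implied probabilities, so it comes out slightly higher than the 1 - S figure in the Profit (%) column.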
