## Initial Tipping AFL

This notebook provides an example of automated betting execution using the mean prediction of the models published on Squiggle. It uses both Python and R: Python to interface with the Betfair API and to perform some basic analysis with pandas, and R for the fitzRoy library.

The conda environment file can be found in the directory for this project.

Next steps to build on this approach are to:
- backtest the approach using 2017/2018 data
- consider weighting each model's prediction by its historical performance
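
As a starting point for the backtest, here is a minimal flat-stake sketch. It assumes historical records holding the averaged model probability for the home team, the home and away back prices, and the result; the `MatchRecord` layout and `backtest_flat` name are hypothetical, and commission, draws and liquidity are ignored.

```python
from dataclasses import dataclass


@dataclass
class MatchRecord:
    model_home_prob: float  # averaged model probability that the home team wins
    home_price: float       # decimal back price for the home team
    away_price: float       # decimal back price for the away team
    home_won: bool          # actual result (draws ignored in this sketch)


def backtest_flat(records, edge=0.01, stake=5.0):
    """Back the side the model rates above the market by more than `edge`.

    Returns (number of bets, total profit). Ignores commission and draws.
    """
    n_bets, profit = 0, 0.0
    for r in records:
        diff = r.model_home_prob - 1 / r.home_price
        if abs(diff) <= edge:
            continue  # not enough disagreement with the market
        backing_home = diff > 0
        price = r.home_price if backing_home else r.away_price
        won = r.home_won if backing_home else not r.home_won
        profit += stake * (price - 1) if won else -stake
        n_bets += 1
    return n_bets, profit
```

Feeding it the 2017/2018 results would give a first read on whether the edge threshold is worth optimising.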

The next model to build will be a score prediction model (ML) followed by statistical simulation. This will provide finer granularity on which markets are attractive in terms of betting lines and total scores.

The question remains whether the match odds deviate enough from the true likelihood to generate alpha. We may also want to explore developing an informational edge, perhaps with natural language processing.

In [1]:
# requires R kernel
# requires fitzRoy package
# devtools::install_github("jimmyday12/fitzRoy")

In [2]:
# Import libraries
import betfairlightweight
from betfairlightweight import filters
import pandas as pd
import numpy as np
import os
import datetime
import json

import rpy2.rinterface
from utils import process_runner_books, get_markets, get_sport_id, name_convert

%load_ext rpy2.ipython

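Here we set the strategy parameters. `BET_SIZE` is the stake placed on each bet, and `DAYS` is how far ahead to look for markets. `EDGE` is the minimum difference between the win probability suggested by the Squiggle models and the implied probability from the market price; it is an absolute difference, so 0.01 means one percentage point.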

In [11]:
BET_SIZE = 5
EDGE = .01
DAYS = 5
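
To make the edge rule concrete: the market's implied probability is the reciprocal of the decimal back price, and a match qualifies when the averaged model probability differs from it by more than `EDGE`. The helpers `implied_probability` and `edge_signal` are illustrative names, not part of the notebook; the inputs are the Port Adelaide v Adelaide figures that appear later (model probability 0.475163, implied probability 0.416667, i.e. a back price of 2.4).

```python
EDGE = 0.01  # same threshold as the notebook


def implied_probability(back_price: float) -> float:
    """Implied probability of a decimal back price (includes the market margin)."""
    return 1.0 / back_price


def edge_signal(model_prob: float, back_price: float, edge: float = EDGE):
    """Return (diff, qualifies) where diff = model probability - implied probability."""
    diff = model_prob - implied_probability(back_price)
    return diff, abs(diff) > edge


diff, qualifies = edge_signal(0.475163, 2.4)
print(round(diff, 6), qualifies)  # 0.058496 True
```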

We use the betfairlightweight library to interact with the Betfair Exchange. The username, password, app key and path to the SSL certificates are read from a local config file.

In [4]:
with open('secrets/config.json', 'r') as fp:
    config = json.load(fp)

trading = betfairlightweight.APIClient(username=config['my_username'],
                                       password=config['my_password'],
                                       app_key=config['my_app_key'],
                                       certs=config['certs_path'])

trading.login();
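
Because the login depends on `secrets/config.json` holding the right keys, a small check before building the `APIClient` gives a clearer error than a `KeyError` partway through the cell. This is a sketch: `load_betfair_config` is a hypothetical helper, and the key names are simply the ones this notebook reads.

```python
import json

# Keys read by the login cell above
REQUIRED_KEYS = ("my_username", "my_password", "my_app_key", "certs_path")


def load_betfair_config(path: str) -> dict:
    """Load the JSON config and fail early if any key used by the notebook is missing or empty."""
    with open(path, "r") as fp:
        config = json.load(fp)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path} is missing required keys: {', '.join(missing)}")
    return config
```

Used as `config = load_betfair_config('secrets/config.json')` in place of the bare `json.load`.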

In [5]:
afl_id = get_sport_id(trading, 'Australian Rules')

Below we collect the identifiers for each game (market ID), the identifiers for our potential bets (selection ID), and the prices and liquidity on offer. From the best back price we compute the implied probability (1 / price). We are looking to exploit incorrectly priced bets, so this implied probability is what we will base our analysis on.

We compute this only for the home teams, because the away team's probability is taken to be 1 - p(hteam). The code assumes the home team is the first runner listed in each market.
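
Taking 1 / price for the home team and treating the away probability as 1 - p(hteam) ignores the market's margin: the reciprocals of the two back prices usually sum to slightly more than 1. One common adjustment, not used in this notebook, is to divide each implied probability by that sum. A minimal sketch, assuming a two-runner market and decimal back prices; `normalised_probabilities` is an illustrative name.

```python
def normalised_probabilities(home_price: float, away_price: float) -> tuple[float, float]:
    """Remove the overround by rescaling both implied probabilities to sum to 1."""
    raw_home, raw_away = 1 / home_price, 1 / away_price
    book = raw_home + raw_away  # greater than 1 when the market carries a margin
    return raw_home / book, raw_away / book


home_p, away_p = normalised_probabilities(2.4, 1.7)
print(round(home_p, 4), round(away_p, 4))  # 0.4146 0.5854
```

Comparing the Squiggle mean against the normalised home probability, rather than the raw 1 / price, would stop the margin from counting against every bet.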

In [6]:
# Define a market filter
afl_event_filter = betfairlightweight.filters.market_filter(
    event_type_ids=[afl_id],
    market_countries=['AU'],
    market_start_time={
        'to': (datetime.datetime.utcnow() + datetime.timedelta(days=DAYS)).strftime("%Y-%m-%dT%H:%M:%SZ")
    }
)

# Get a list of all AFL events in that window as objects
afl_events = trading.betting.list_events(
    filter=afl_event_filter
)

# Create a DataFrame with all the events by iterating over each event object
afl_events_five_days = pd.DataFrame({
    'Event Name': [event_object.event.name for event_object in afl_events],
    'Event ID': [event_object.event.id for event_object in afl_events],
    'Event Venue': [event_object.event.venue for event_object in afl_events],
    'Country Code': [event_object.event.country_code for event_object in afl_events],
    'Time Zone': [event_object.event.time_zone for event_object in afl_events],
    'Open Date': [event_object.event.open_date for event_object in afl_events],
    'Market Count': [event_object.market_count for event_object in afl_events]
})

total_event_dfs = [] 
for event in afl_events_five_days['Event ID']:
    market_types_df = get_markets(event, trading)
    total_event_dfs.append(market_types_df[market_types_df['Market Name']=='Match Odds'])

total_event_df = pd.concat(total_event_dfs)

total_event_df = total_event_df.merge(
    afl_events_five_days[['Event Name', 'Event ID']],
    how='left', 
    on='Event ID')

market_ids = total_event_df['Market ID'].tolist()

# Create a price projection: request the best available back and lay offers for each runner
price_filter = betfairlightweight.filters.price_projection(
    price_data=['EX_BEST_OFFERS']
)

# Request market books
market_books = trading.betting.list_market_book(
    market_ids=market_ids,
    price_projection=price_filter
)

# Build one runner DataFrame per market book, tagged with its market ID
dfs = []
for market_book in market_books:
    runners_df = process_runner_books(market_book.runners)
    runners_df['Market ID'] = market_book.market_id
    dfs.append(runners_df)

final_df = pd.concat(dfs)

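final_df = final_df.merge(total_event_df, how='left', on='Market ID')
# Take every other row, starting with the first, as the home team
# (assumes two runners per market with the home team listed first)
hteams_df = final_df.iloc[::2].copy()
# Implied probability of the home team winning, from its best back price
hteams_df['implied odds'] = 1 / hteams_df['Best Back Price']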
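
`process_runner_books` comes from this project's `utils` module, which isn't shown here. As a rough idea of what such a helper can do, below is a hypothetical minimal version that pulls the selection ID and the best back price and size from betfairlightweight runner books via `runner.ex.available_to_back`, assuming that list is ordered best offer first. The real helper may extract more columns; `best_back_table` and the `Best Back Size` column are illustrative.

```python
import pandas as pd


def best_back_table(runner_books) -> pd.DataFrame:
    """Hypothetical stand-in for utils.process_runner_books.

    Expects objects with `selection_id` and `ex.available_to_back` (a list of
    offers with `price` and `size`). Runners with no back offers get NaN.
    """
    rows = []
    for runner in runner_books:
        offers = runner.ex.available_to_back  # assumed best offer first
        best = offers[0] if offers else None
        rows.append({
            'Selection ID': runner.selection_id,
            'Best Back Price': best.price if best is not None else float('nan'),
            'Best Back Size': best.size if best is not None else float('nan'),
        })
    return pd.DataFrame(rows)
```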

Let's write some R! Thanks to rpy2's `%%R -o tips` cell magic, one line fetches the Squiggle tips with fitzRoy and returns them to Python as a pandas DataFrame we can use to make the decisions.

In [7]:
%%R -o tips
tips <- fitzRoy::get_squiggle_data("tips", round = 8, year = 2019)


Squiggle.com.au is a site where analysts submit AFL prediction models for competitive tipping. In building this approach we are assuming that the Squiggle predictions will outperform the market, which may well be a fatal flaw.

What we are doing here is a very simple ensemble of the Squiggle predictions. In the future we will look to do something more sophisticated, but for now we just want to get our 3c into the market.

Using the Squiggle data, we compute the mean home-team confidence across all models for each match, then calculate the difference between that mean and the implied probability from the market.
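
The simple mean treats every Squiggle model equally. The next step listed at the top, weighting models by their track record, could look like the sketch below. The weights are made-up placeholders (in practice they might be derived from each model's past accuracy), and `weighted_confidence` and `example_tips` are hypothetical names.

```python
def weighted_confidence(confidences: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of model home-win probabilities.

    Models without a positive weight are skipped; weights need not sum to 1.
    """
    used = [m for m in confidences if weights.get(m, 0) > 0]
    total = sum(weights[m] for m in used)
    if total == 0:
        raise ValueError("no positively weighted models")
    return sum(confidences[m] * weights[m] for m in used) / total


example_tips = {'model_a': 0.60, 'model_b': 0.40, 'model_c': 0.50}  # hypothetical tips
print(sum(example_tips.values()) / len(example_tips))                  # simple mean
print(weighted_confidence(example_tips, {'model_a': 3, 'model_b': 1}))  # model_c gets no weight
```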

In [8]:
# Average the models' home-team confidence per match and convert from % to a probability
home_tips_df = tips[['ateam','hteam','hconfidence']].groupby(by=['ateam','hteam'], as_index=False).mean()
home_tips_df.hconfidence = home_tips_df.hconfidence / 100
# Convert team names (via the project's name_convert helper) so event names match Betfair's
home_tips_df.ateam = home_tips_df.ateam.apply(name_convert)
home_tips_df.hteam = home_tips_df.hteam.apply(name_convert)
home_tips_df['Event Name'] = home_tips_df.hteam + ' v ' + home_tips_df.ateam

# Join the market's implied probabilities and compute the model-minus-market difference
home_tips_df = home_tips_df.merge(hteams_df[['implied odds', 'Event Name', 'Market ID']], how='outer', on='Event Name')
home_tips_df['diff'] = home_tips_df.hconfidence - home_tips_df['implied odds']
home_tips_df.drop(['ateam', 'hteam'], axis=1, inplace=True)

home_tips_df

| | hconfidence | Event Name | implied odds | Market ID | diff |
|---|---|---|---|---|---|
| 0 | 0.475163 | Port Adelaide v Adelaide | 0.416667 | 1.157637279 | 0.058496 |
| 1 | 0.549513 | Western Bulldogs v Brisbane | 0.574713 | 1.157635824 | -0.0252 |
| 2 | 0.175894 | Carlton v Collingwood | 0.125 | 1.157636081 | 0.050894 |
| 3 | 0.385812 | Sydney v Essendon | | | |
| 4 | 0.268581 | North Melbourne v Geelong | 0.25641 | 1.157637604 | 0.012171 |
| 5 | 0.446731 | Hawthorn v GWS | 0.4 | 1.157638008 | 0.046731 |
| 6 | 0.344037 | Gold Coast v Melbourne | 0.307692 | 1.157636297 | 0.036345 |
| 7 | 0.524138 | Fremantle v Richmond | 0.60241 | 1.157638224 | -0.078272 |
| 8 | 0.486819 | St Kilda v West Coast | 0.454545 | 1.157636607 | 0.032273 |


The null values are due to the Sydney v Essendon match having already been completed while still appearing on Squiggle. Note also that pricing differences as large as 25% to 30% indicate matches that are already in play, and these should not be bet on. We aren't handling that right now beyond the simple try/except on execution.
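
Rather than relying on the try/except at execution time, in-play and closed markets could be dropped before any analysis. A sketch, assuming betfairlightweight `MarketBook` objects, which expose `status` and `inplay` (mirroring the fields of the same names in Betfair's listMarketBook response); `pre_play_market_books` is a hypothetical helper.

```python
def pre_play_market_books(market_books):
    """Keep only markets that are open and have not gone in play."""
    return [mb for mb in market_books if mb.status == 'OPEN' and not mb.inplay]
```

Applied as `market_books = pre_play_market_books(market_books)` straight after `list_market_book`, finished and live matches never reach `final_df`.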

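A negative difference implies the market has mispriced the away team: the models rate the home team's chances below the market's, so the value, if any, lies in backing the away team.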

Below we keep only the matches where the absolute difference exceeds the `EDGE` threshold set above, currently a fairly aggressive 1%.

In [10]:
bets = home_tips_df[home_tips_df['diff'].abs() > EDGE].copy()
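# Runner position to back: 0 = first runner (home), 1 = second runner (away).
# Caution: as written, a positive diff (home team underpriced) selects position 1,
# i.e. the away team; invert the mapping to back the side the models rate above the market.
bets['Away Team'] = bets['diff'].apply(lambda x: 0 if x < 0 else 1)

bets['Market ID'] = bets['Market ID'].astype(str)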
bets

| | hconfidence | Event Name | implied odds | Market ID | diff | Away Team |
|---|---|---|---|---|---|---|
| 0 | 0.475163 | Port Adelaide v Adelaide | 0.416667 | 1.157637279 | 0.058496 | 1 |
| 1 | 0.549513 | Western Bulldogs v Brisbane | 0.574713 | 1.157635824 | -0.0252 | 0 |
| 2 | 0.175894 | Carlton v Collingwood | 0.125 | 1.157636081 | 0.050894 | 1 |
| 4 | 0.268581 | North Melbourne v Geelong | 0.25641 | 1.157637604 | 0.012171 | 1 |
| 5 | 0.446731 | Hawthorn v GWS | 0.4 | 1.157638008 | 0.046731 | 1 |
| 6 | 0.344037 | Gold Coast v Melbourne | 0.307692 | 1.157636297 | 0.036345 | 1 |
| 7 | 0.524138 | Fremantle v Richmond | 0.60241 | 1.157638224 | -0.078272 | 0 |
| 8 | 0.486819 | St Kilda v West Coast | 0.454545 | 1.157636607 | 0.032273 | 1 |
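
A sketch of the decision rule described above: a positive difference favours backing the home team and a negative one the away team, whose model probability is taken as 1 - p(hteam) as elsewhere in the notebook. The expected profit per unit staked at decimal price `d` with win probability `p` is `p * d - 1`, which can also screen out bets that look attractive on the home-team difference but are negative once the chosen side's actual price is used. `choose_side` is an illustrative name.

```python
def choose_side(model_home_prob, home_price, away_price, edge=0.01):
    """Return (side, expected profit per unit stake), or None if the edge is too small."""
    diff = model_home_prob - 1 / home_price
    if abs(diff) <= edge:
        return None
    if diff > 0:
        # Models rate the home team above the market: back the home team
        return 'home', model_home_prob * home_price - 1
    # Models rate the home team below the market: back the away team
    return 'away', (1 - model_home_prob) * away_price - 1
```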


In [None]:
assert False, "So I don't run the whole notebook by accident"

Now we line these bets up and place them through the Betfair API.

In [12]:
final_bets = []
for market, idx in bets[['Market ID', 'Away Team']].itertuples(index=False):
    # Runners for this market, in the same order used to build hteams_df
    runners = final_df[final_df['Market ID'] == market].reset_index(drop=True)
    selection_id = runners.loc[idx, 'Selection ID']
    best_back = runners.loc[idx, 'Best Back Price']
    final_bets.append((market, selection_id, best_back))

final_bets

[('1.157637279', 39982, 1.7),
 ('1.157635824', 39986, 1.74),
 ('1.157636081', 217709, 1.13),
 ('1.157637604', 39988, 1.33),
 ('1.157638008', 5304641, 1.66),
 ('1.157636297', 298609, 1.44),
 ('1.157638224', 39992, 1.66),
 ('1.157636607', 39991, 1.81)]

In [13]:
orders = []
for market_id, selection_id, price in final_bets:
    try:
        # Define a limit order filter
        limit_order_filter = betfairlightweight.filters.limit_order(
            size=BET_SIZE, 
            price=price,
            persistence_type='LAPSE'
        )

        # Define an instructions filter
        instructions_filter = betfairlightweight.filters.place_instruction(
            selection_id=str(selection_id),
            order_type="LIMIT",
            side="BACK",
            limit_order=limit_order_filter
        )

        # Place the order
        order = trading.betting.place_orders(
            market_id=str(market_id), 
            customer_strategy_ref='simple_squiggle',
            instructions=[instructions_filter]
        )
    except Exception as e:
        order = str(e)
    
    orders.append(order)

Below we review the orders we currently have in the market.

In [14]:
# List the current orders placed under this strategy reference
current_orders = trading.betting.list_current_orders(customer_strategy_refs=['simple_squiggle'])
pd.DataFrame(current_orders._data['currentOrders']).head()

| | averagePriceMatched | betId | bspLiability | customerStrategyRef | handicap | marketId | matchedDate | orderType | persistenceType | placedDate | priceSize | regulatorCode | selectionId | side | sizeCancelled | sizeLapsed | sizeMatched | sizeRemaining | sizeVoided | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.4 | 164051644520 | 0.0 | simple_squiggle | 0.0 | 1.157637279 | 2019-05-10T08:51:25.000Z | LIMIT | LAPSE | 2019-05-10T08:51:25.000Z | {'price': 2.4, 'size': 0.1} | MALTA LOTTERIES AND GAMBLING AUTHORITY | 217710 | BACK | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | EXECUTION_COMPLETE |
| 1 | 1.7 | 164140756035 | 0.0 | simple_squiggle | 0.0 | 1.157637279 | 2019-05-11T01:04:55.000Z | LIMIT | LAPSE | 2019-05-11T01:04:55.000Z | {'price': 1.7, 'size': 5.0} | MALTA LOTTERIES AND GAMBLING AUTHORITY | 39982 | BACK | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | EXECUTION_COMPLETE |
| 2 | 0.0 | 164140757140 | 0.0 | simple_squiggle | 0.0 | 1.157635824 | | LIMIT | LAPSE | 2019-05-11T01:04:56.000Z | {'price': 1.74, 'size': 5.0} | MALTA LOTTERIES AND GAMBLING AUTHORITY | 39986 | BACK | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | EXECUTABLE |
| 3 | 1.13 | 164140758494 | 0.0 | simple_squiggle | 0.0 | 1.157636081 | 2019-05-11T01:04:58.000Z | LIMIT | LAPSE | 2019-05-11T01:04:58.000Z | {'price': 1.13, 'size': 5.0} | MALTA LOTTERIES AND GAMBLING AUTHORITY | 217709 | BACK | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | EXECUTION_COMPLETE |
| 4 | 1.33 | 164140759776 | 0.0 | simple_squiggle | 0.0 | 1.157637604 | 2019-05-11T01:05:00.000Z | LIMIT | LAPSE | 2019-05-11T01:05:00.000Z | {'price': 1.33, 'size': 5.0} | MALTA LOTTERIES AND GAMBLING AUTHORITY | 39988 | BACK | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | EXECUTION_COMPLETE |
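
The `currentOrders` records are plain dictionaries, so summarising exposure is straightforward. A sketch, assuming records shaped like the table above (`sizeMatched`, `sizeRemaining`, `averagePriceMatched`); `summarise_orders` is a hypothetical helper, and commission is ignored. Note that the profit figure adds up what each matched back bet would return if it won, so bets on opposite sides of one market cannot all realise it.

```python
def summarise_orders(records):
    """Total matched stake, unmatched stake, and summed profit of matched back bets if each won."""
    matched = sum(r['sizeMatched'] for r in records)
    unmatched = sum(r['sizeRemaining'] for r in records)
    potential_profit = sum(r['sizeMatched'] * (r['averagePriceMatched'] - 1) for r in records)
    return {'matched': matched, 'unmatched': unmatched, 'potential_profit': potential_profit}
```

For example, `summarise_orders(current_orders._data['currentOrders'])` shows how much of the stake is still waiting to be matched.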


In [17]:
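# Failed placements were stored as error strings, so fall back to the string itself
[getattr(order, 'status', order) for order in orders]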

['SUCCESS',
 'SUCCESS',
 'SUCCESS',
 'SUCCESS',
 'SUCCESS',
 'SUCCESS',
 'SUCCESS',
 'SUCCESS']

Done! Fully automated bets have been placed in a somewhat educated way. Now that we have a strategy, we really need to backtest it, and luckily we have the data to do that. Other automated logic we could add is dynamic bet sizing according to the level of risk and the potential mispricing error, so we will look at optimisation as part of the backtesting.
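
For dynamic bet sizing, one standard option (a swapped-in technique, not what this notebook does) is the Kelly criterion: for decimal price `d` and estimated win probability `p`, the fraction of bankroll to stake is `(p * d - 1) / (d - 1)`, clipped at zero. Because the Squiggle probabilities carry their own error, a fraction of Kelly is commonly used; the `fraction` default of 0.25 is an arbitrary choice for illustration, and `kelly_stake` is a hypothetical helper.

```python
def kelly_stake(p: float, price: float, bankroll: float, fraction: float = 0.25) -> float:
    """Fractional Kelly stake for a back bet at decimal `price` with win probability `p`."""
    if price <= 1:
        raise ValueError("decimal price must be greater than 1")
    full_kelly = (p * price - 1) / (price - 1)  # optimal bankroll fraction under exact p
    return max(0.0, full_kelly) * fraction * bankroll  # no bet when the edge is negative
```

With `p = 0.6` at a price of 2.0 the full Kelly fraction is 20% of the bankroll, so the quarter-Kelly default stakes 5%.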

Following optimisation and backtesting of this model, we will move on to building a two-step model that will:
1. Predict each team's score given the opposition for the upcoming match
1. Perform statistical simulation of the teams' scoring to infer how efficiently the market is currently pricing a number of markets
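
To illustrate the second step, here is a minimal Monte Carlo sketch. It assumes each team's score is normally distributed around a mean supplied by the score model (the numbers below are placeholders, not predictions) with a shared standard deviation, and reads off probabilities for the head-to-head, a line, and a total-points market from the simulated scores. Real AFL scores are discrete and correlated between teams, so a fitted model would likely use a different distribution; `simulate_match` and its defaults are hypothetical.

```python
import random


def simulate_match(home_mean, away_mean, sd=25.0, line=0.0, total=160.5, n=100_000, seed=42):
    """Estimate market probabilities from simulated (home, away) scores.

    line:  handicap added to the home score (e.g. -6.5 means the home team gives 6.5 points)
    total: points threshold for an over/under market
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    home_wins = home_covers = overs = 0
    for _ in range(n):
        home = rng.gauss(home_mean, sd)
        away = rng.gauss(away_mean, sd)
        home_wins += home > away
        home_covers += home + line > away
        overs += home + away > total
    return {'home_win': home_wins / n, 'home_covers_line': home_covers / n, 'over_total': overs / n}


probs = simulate_match(home_mean=85, away_mean=78, line=-6.5)
print(probs)
```

Each simulated probability can then be compared with the implied probability of the matching Betfair market, exactly as the head-to-head comparison does above.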

The question remains whether we will make money with this simple Squiggle strategy. It's highly unlikely, so I will be keeping my bet size to 3c for now. Just from developing this, the model seems to favour the underdog, in particular those at long odds. We will explore these potential shortfalls in more depth next week.

### References
- Betfair data scientist API example: https://github.com/betfair-datascientists/API
- FitzRoy AFL data library in R: https://jimmyday12.github.io/fitzRoy/articles/mens-stats.html
- Squiggle, AFL betting and analysis: https://squiggle.com.au/