# Betting Angles with Betfair stream data

## 0.1 Setup

[technical context]

In [9]:
import requests
import pandas as pd
from datetime import date, timedelta

## 0.2 Context

Formulating betting angles (or "strategies", as some call them) is quite a common pastime. These angles can range all the way from very simple to quite sophisticated, and could include things like:

* Betting against an AFL team coming off the bye who are playing against a team who played last week
* Backing a greyhound in boxes 1 or 2 in short sprint style races
* Backing a horse pre-race who typically runs at the front of the field and placing an order to lay the same horse if it shortens to some lower price in-play, locking in a profit
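
The third angle in the list relies on some simple hedging arithmetic. As a rough sketch (the function name `green_up` and the prices used are illustrative assumptions, and exchange commission is ignored), the lay stake that gives the same profit whether the horse wins or loses is the back stake multiplied by the back price and divided by the lay price:

```python
def green_up(back_stake, back_price, lay_price):
    """Return (lay_stake, locked_profit) for an equal-profit hedge.

    Assumes decimal odds and ignores exchange commission.
    """
    # Lay stake that equalises the outcome across win and lose
    lay_stake = back_stake * back_price / lay_price
    # If the horse loses we keep the lay stake and lose the back stake;
    # by construction the profit is the same if the horse wins
    locked_profit = lay_stake - back_stake
    return lay_stake, locked_profit


# Hypothetical trade: back 10 units at 5.0 pre-race, lay at 2.5 in-play
lay_stake, profit = green_up(back_stake=10, back_price=5.0, lay_price=2.5)
print(lay_stake, profit)
```

If the lay order is never matched (the horse doesn't shorten to the lower price) the position stays an ordinary back bet, which is why the angle depends on picking horses that tend to lead.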

Beyond the complexity of the actual concept, what really separates these angles is evidence and rigour. You might have heard TV personalities or betting ads suggest that a certain strategy (resembling one of the above) is a real-world predictive trend, but rarely are these claims derived from the right historical data or reached with the necessary statistical rigour. Most are simply formulated from intuition or from observing a trend across a relatively small sample of data.

There **are** many users on betting exchanges who profit off these angles. In fact, when most people talk about automated or sophisticated exchange betting they are often talking about automating these kinds of betting angles, rather than sophisticated bottom-up fundamental modelling. That's because profitable fundamental modelling (where your model arrives at some estimate of fair value from first principles) is very hard.

To profit off one of these strategies one must have [...]

Working through the stream data yourself is a great way to test your angles, then refine and tweak them until they suggest a statistically significant long-term profit.
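
As a rough illustration of what "statistically significant" might mean here, one simple check is a one-sample t-statistic on profit per bet against zero. This is only a sketch: the function name `profit_t_stat` and the sample results are hypothetical, and because betting returns are heavily skewed a t-statistic is a first sanity check rather than a rigorous test.

```python
import math
import statistics


def profit_t_stat(profits):
    """One-sample t-statistic for mean profit per bet against zero."""
    n = len(profits)
    mean = statistics.mean(profits)
    sd = statistics.stdev(profits)  # sample standard deviation
    return mean / (sd / math.sqrt(n))


# Hypothetical flat-stake results: 45 winners at 2.5 (+1.5 units each)
# and 55 losers (-1 unit each)
profits = [1.5] * 45 + [-1.0] * 55
t = profit_t_stat(profits)
print(round(t, 3))
```

In this made-up sample the angle shows a 12.5% return on turnover, yet the t-statistic comes out at about 1, well short of the value of roughly 2 usually treated as significant. A hundred bets is simply too small a sample to tell a real edge from luck.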


## 0.3 Examples

I'll go through an end-to-end example implementing one of these betting angles on Australian thoroughbred racing, which will include:

- Sourcing data
- Assembling data
- Formulating hypotheses
- Testing hypotheses
- Simple implementation


# 1.0 Data





## 1.1 Betfair Odds Data

We'll follow a very similar template to other tutorials, extracting key information from the Betfair stream data.

It's important to note that your hypothese

## 1.2 Race Data

If you're building a fundamental bottom-up model, finding and managing an ETL process from an appropriate data source is a large part of the exercise. If your needs are simpler (for this type of automated strategy, for example) there's plenty of good information available right inside the Betfair API itself.

The `RUNNER_METADATA` slot inside the `listMarketCatalogue` response, for example, will return a pretty good slice of metadata about the horses in an upcoming race, including but not limited to: the trainer, the jockey, the horse's age, and a class rating. The [documentation for this endpoint](https://docs.developer.betfair.com/display/1smk3cen4v3lu3yomq5qye0ni/listMarketCatalogue) will give you the full extent of what's inside this response.
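
For reference, a request for this metadata might look something like the sketch below. It only builds the JSON-RPC body and doesn't send anything; the helper name `build_catalogue_request` is an assumption for illustration, and actually sending it to the Betfair API also requires your own app key and session token in the request headers.

```python
import json


def build_catalogue_request(market_ids):
    """Build a JSON-RPC body asking listMarketCatalogue for runner metadata."""
    return {
        "jsonrpc": "2.0",
        "method": "SportsAPING/v1.0/listMarketCatalogue",
        "params": {
            # Restrict the request to the markets we care about
            "filter": {"marketIds": list(market_ids)},
            # RUNNER_METADATA is the projection that carries trainer, jockey etc.
            "marketProjection": ["RUNNER_METADATA"],
            "maxResults": len(market_ids),
        },
        "id": 1,
    }


# Example market id taken from the racing results URL format
payload = build_catalogue_request(["1.179077389"])
print(json.dumps(payload, indent=2))
```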

Our problem for this exercise is that the historical stream files don't include this `RUNNER_METADATA`, so we weren't able to extract it in the previous step. However, a sneaky workaround is to use a different back-end endpoint, one that Betfair don't mind you accessing, which they use for the Betfair racing results page.

These API endpoints are:

- Market result data:        [https://apigateway.betfair.com.au/hub/raceevent/1.154620281](https://apigateway.betfair.com.au/hub/raceevent/1.154620281)
- Day’s markets:             [https://apigateway.betfair.com.au/hub/racecard?date=2018-12-18](https://apigateway.betfair.com.au/hub/racecard?date=2018-12-18)

### Extract Betfair Racing Markets for a Given Date

First we'll hit the `https://apigateway.betfair.com.au/hub/racecard` endpoint to get the racing markets available on Betfair for a given day in the past.

In [5]:
def getBfMarkets(dte):

    # The racecard endpoint returns every meeting (and its markets) for the date
    url = 'https://apigateway.betfair.com.au/hub/racecard?date={}'.format(dte)

    responseJson = requests.get(url).json()

    marketList = []

    # Flatten the meeting -> market hierarchy into one row per market
    for meeting in responseJson['MEETINGS']:
        for market in meeting['MARKETS']:
            marketList.append(
                {
                    'date': dte,
                    'track': meeting['VENUE_NAME'],
                    'country': meeting['COUNTRY'],
                    'race_type': meeting['RACE_TYPE'],
                    'race_number': market['RACE_NO'],
                    # Prefix with '1.' to match the exchange market id format
                    'market_id': '1.' + str(market['MARKET_ID']),
                    'start_time': market['START_TIME']
                }
            )

    marketDf = pd.DataFrame(marketList)

    return marketDf
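
If you want to check the flattening logic without hitting the endpoint, one option is to separate the parsing from the request. The sketch below does that with a hand-written payload: both the helper `parse_racecard` and the `sample` dictionary are hypothetical, with the sample containing only the fields the function above reads (the real response will contain more, and its exact value formats are assumed here).

```python
def parse_racecard(response_json, dte):
    """Flatten a racecard-style payload into one dict per market."""
    rows = []
    for meeting in response_json.get("MEETINGS", []):
        for market in meeting.get("MARKETS", []):
            rows.append(
                {
                    "date": dte,
                    "track": meeting["VENUE_NAME"],
                    "country": meeting["COUNTRY"],
                    "race_type": meeting["RACE_TYPE"],
                    "race_number": market["RACE_NO"],
                    "market_id": "1." + str(market["MARKET_ID"]),
                    "start_time": market["START_TIME"],
                }
            )
    return rows


# Hypothetical payload for illustration only
sample = {
    "MEETINGS": [
        {
            "VENUE_NAME": "Ascot",
            "COUNTRY": "AUS",
            "RACE_TYPE": "R",
            "MARKETS": [
                {
                    "RACE_NO": 1,
                    "MARKET_ID": "179077389",
                    "START_TIME": "2021-02-10 04:34:00",
                }
            ],
        }
    ]
}

rows = parse_racecard(sample, "2021-02-10")
print(rows[0]["market_id"])
```

Using `.get` with an empty list as the default means a day with no meetings simply produces no rows instead of raising a `KeyError`.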

### Extract Key Race Metadata

Then (for one of these `market_id`s) we'll hit the `https://apigateway.betfair.com.au/hub/raceevent/` endpoint to get some key runner metadata for the runners in this race. It's important to note that this information is also available through the Betfair API, so we won't need to go to a secondary data source to find it at the point of implementation. Doing so would add a large layer of complexity to the project, including things like string cleaning and matching.

In [6]:
def getBfRaceMeta(market_id):

    url = 'https://apigateway.betfair.com.au/hub/raceevent/{}'.format(market_id)

    responseJson = requests.get(url).json()

    # The endpoint returns an 'error' key when the market can't be found
    if 'error' in responseJson:
        return pd.DataFrame()

    raceList = []

    for runner in responseJson['runners']:

        # Skip scratched runners
        if runner.get('isScratched'):
            continue

        # Jockey, place and trainer are not always populated
        jockey = runner.get('jockeyName', "")
        placeResult = runner.get('placedResult', "")
        trainer = runner.get('trainerName', "")

        raceList.append(
            {
                'market_id': market_id,
                'weather': responseJson['weather'],
                'track_condition': responseJson['trackCondition'],
                'race_distance': responseJson['raceLength'],
                'selection_id': runner['selectionId'],
                'selection_name': runner['runnerName'],
                'barrier': runner['barrierNo'],
                'place': placeResult,
                'trainer': trainer,
                'jockey': jockey,
                'weight': runner['weight']
            }
        )

    raceDf = pd.DataFrame(raceList)

    return raceDf

### Wrapper Function

Stitching these two functions together, we can create a wrapper function that hits both endpoints for all the thoroughbred races on a given day and extracts all the runner metadata and results.

In [7]:
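def scrapeThoroughbredBfDate(dte):

    markets = getBfMarkets(dte)

    if markets.shape[0] == 0:
        return pd.DataFrame()

    # Keep Australian thoroughbred races only
    thoMarkets = markets.query('country == "AUS" and race_type == "R"')

    if thoMarkets.shape[0] == 0:
        return pd.DataFrame()

    raceMetaList = []

    for market in thoMarkets.market_id:
        raceMetaList.append(getBfRaceMeta(market))

    raceMeta = pd.concat(raceMetaList)

    # If every lookup came back empty there is no market_id column to merge on
    if raceMeta.shape[0] == 0:
        return pd.DataFrame()

    # Inner merge, so only the thoroughbred markets with metadata survive
    return markets.merge(raceMeta, on = 'market_id')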

In [8]:
# Executing the wrapper for an example date
scrapeThoroughbredBfDate(date(2021,2,10))

Unnamed: 0,date,track,country,race_type,race_number,market_id,start_time,weather,track_condition,race_distance,selection_id,selection_name,barrier,place,trainer,jockey,weight
0,2021-02-10,Ascot,AUS,R,1,1.179077389,2021-02-10 04:34:00,,,1000,38448397,Triple Missile,3,1,Todd Harvey,Paul Harvey,60.0
1,2021-02-10,Ascot,AUS,R,1,1.179077389,2021-02-10 04:34:00,,,1000,28763768,Shock Result,5,4,P H Jordan,Craig Staples,59.5
2,2021-02-10,Ascot,AUS,R,1,1.179077389,2021-02-10 04:34:00,,,1000,8772321,Secret Plan,6,3,G & A Williams,William Pike,59.0
3,2021-02-10,Ascot,AUS,R,1,1.179077389,2021-02-10 04:34:00,,,1000,9021011,Command Force,2,0,Daniel & Ben Pearce,J Azzopardi,58.0
4,2021-02-10,Ascot,AUS,R,1,1.179077389,2021-02-10 04:34:00,,,1000,38448398,Fish Hook,7,2,M P Allan,Madi Derrick,57.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
458,2021-02-10,Warwick Farm,AUS,R,7,1.179081635,2021-02-10 06:50:00,,,1200,133456,Sedition,12,2,Richard Litt,Ms Rachel King,58.0
459,2021-02-10,Warwick Farm,AUS,R,7,1.179081635,2021-02-10 06:50:00,,,1200,38447782,Amusez Moi,9,6,Richard Litt,Josh Parr,57.0
460,2021-02-10,Warwick Farm,AUS,R,7,1.179081635,2021-02-10 06:50:00,,,1200,25388274,Savoury,1,5,Bjorn Baker,Jason Collett,57.0
461,2021-02-10,Warwick Farm,AUS,R,7,1.179081635,2021-02-10 06:50:00,,,1200,38447783,Born A Warrior,7,3,Michael & Wayne & John Hawkes,Tommy Berry,56.5


Then, to produce a historical slice of all races between two dates, we could just loop over a set of dates and append each result set:

In [10]:
# Description:
#   Will loop through a set of dates (starting July 2020 in this instance) and return race metadata from betfair 
# Estimated Time:
#   ~60 mins
# 
# dataList = []
# dateList = pd.date_range(date(2020,7,1),date.today()-timedelta(days=1),freq='d')
# for dte in dateList:
#     dte = dte.date()
#     print(dte)
#     races = scrapeThoroughbredBfDate(dte)
#     dataList.append(races)
# data = pd.concat(dataList)
# data.to_csv("[LOCAL PATH SOMEWHERE]", index=False)
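
A loop like this runs for a long time, so it can be worth making it tolerant of the odd failed request and spacing the calls out a little. The sketch below is one possible shape for that, not the approach used above: `scrape_dates`, `date_range` and `fake_scrape` are hypothetical names, and the scraping function is passed in as an argument so the loop can be tried out without touching the network.

```python
import time
from datetime import date, timedelta


def date_range(start, end):
    """Yield each date from start to end inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def scrape_dates(start, end, scrape_fn, pause_seconds=1.0):
    """Call scrape_fn for each date, collecting results and failures."""
    results, failed = [], []
    for dte in date_range(start, end):
        try:
            results.append(scrape_fn(dte))
        except Exception as exc:
            # Record the failure and carry on so one bad day doesn't end the run
            failed.append((dte, repr(exc)))
        # Pause between requests to go easy on the endpoint
        time.sleep(pause_seconds)
    return results, failed


def fake_scrape(dte):
    # Hypothetical stand-in for scrapeThoroughbredBfDate; fails on one date
    if dte == date(2021, 2, 11):
        raise ValueError("no data")
    return {"date": dte.isoformat()}


results, failed = scrape_dates(
    date(2021, 2, 10), date(2021, 2, 12), fake_scrape, pause_seconds=0
)
print(len(results), [d.isoformat() for d, _ in failed])
```

In real use you would pass `scrapeThoroughbredBfDate` as `scrape_fn`, concatenate `results` with `pd.concat`, and re-run the dates listed in `failed`.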

# 2.0 Analysis