# AFL Model - Part 2 - Feature Creation

These tutorials will walk you through how to construct your own basic AFL model. The output will be odds for each team to win, which will be shown on [The Hub](https://www.betfair.com.au/hub/tools/models/afl-prediction-model/).

In this notebook we will walk you through creating features from our dataset, which was cleaned in the first tutorial. Feature engineering is an integral part of the Data Science process. Creative and smart features can be the difference between an average performing model and a profitable model which beats the market odds.


## Grabbing Our Dataset
First, we will import our required modules, as well as the prepare_afl_data function which we created in our afl_data_cleaning script. This essentially cleans all the data for us so that we're ready to explore the data and make some features.

In [95]:
# Import modules
import warnings

import numpy as np
import pandas as pd

from afl_data_cleaning import prepare_afl_data, prepare_match_results

pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')

In [96]:
# Use the prepare_afl_data function to prepare the data for us; this function condenses what we walked through in the previous tutorial
afl_data = prepare_afl_data()

In [99]:
# Inspect any rows which are exact duplicates of an earlier row
afl_data[afl_data.duplicated()]

Unnamed: 0,Team,odds,Date,home_team,away_team,Behinds,Game,Goals,Home?,Margin,Opposition,Opposition Behinds,Opposition Goals,Opposition Points,Points,Round,Venue,Season,Status,GA,CP,UP,ED,CM,MI5,One.Percenters,BO,K,HB,D,M,G,B,T,HO,I50,CL,CG,R50,FF,FA,AF,SC,CCL,SCL,SI,MG,TO,ITC,T5
2719,Carlton,5.6983,2018-03-22,Richmond,Carlton,5,15201,15,0,-26,Richmond,19,17,121,95,1,M.C.G.,2018,Away,6,152,221,268,12,8,50,9,207,169,376,88,15,3,49,42,47,42,57,47,20,22,1508,1621,14.0,28.0,79.0,5874.0,74.0,61.0,6.0
2721,Richmond,1.2112,2018-03-22,Richmond,Carlton,19,15201,17,1,26,Carlton,5,15,95,121,1,M.C.G.,2018,Home,14,153,188,247,13,19,40,14,207,133,340,68,17,15,70,34,71,41,56,29,22,20,1484,1681,18.0,23.0,135.0,6328.0,61.0,74.0,11.0
2723,Adelaide,2.1393,2018-03-23,Essendon,Adelaide,15,15202,12,0,-12,Essendon,15,14,99,87,1,Docklands,2018,Away,7,142,242,296,11,10,60,10,204,183,387,83,12,14,50,37,53,40,48,43,12,19,1505,1553,17.0,23.0,104.0,6086.0,80.0,77.0,8.0
2725,Essendon,1.8628,2018-03-23,Essendon,Adelaide,15,15202,14,1,12,Adelaide,15,12,87,99,1,Docklands,2018,Home,10,156,246,294,16,19,52,14,214,177,391,92,14,14,69,39,60,32,48,39,19,12,1668,1748,13.0,19.0,111.0,6299.0,77.0,81.0,11.0
2727,Brisbane,3.7985,2018-03-24,St Kilda,Brisbane,10,15203,12,0,-25,St Kilda,11,16,107,82,1,Docklands,2018,Away,8,139,232,265,9,9,45,3,215,160,375,87,12,9,44,44,49,44,65,37,25,21,1489,1497,17.0,27.0,89.0,5054.0,68.0,61.0,10.0
2729,St Kilda,1.3583,2018-03-24,St Kilda,Brisbane,11,15203,16,1,25,Brisbane,10,12,82,107,1,Docklands,2018,Home,14,133,300,325,5,13,54,11,242,180,422,135,16,8,48,25,59,27,57,35,21,25,1758,1803,12.0,15.0,116.0,5829.0,61.0,70.0,5.0
2731,Fremantle,5.4459,2018-03-24,Port Adelaide,Fremantle,6,15204,9,0,-50,Port Adelaide,14,16,110,60,1,Adelaide Oval,2018,Away,7,132,228,269,10,3,35,6,198,160,358,95,9,5,44,57,49,35,52,43,17,23,1439,1414,12.0,23.0,58.0,5719.0,64.0,59.0,6.0
2733,Port Adelaide,1.2117,2018-03-24,Port Adelaide,Fremantle,14,15204,16,1,50,Fremantle,6,9,60,110,1,Adelaide Oval,2018,Home,12,146,294,333,5,18,70,10,260,179,439,112,16,13,63,26,62,45,51,38,23,17,1833,1886,14.0,31.0,150.0,6586.0,58.0,65.0,7.0
2735,Collingwood,2.2857,2018-03-24,Hawthorn,Collingwood,13,15205,9,0,-34,Hawthorn,11,15,101,67,1,M.C.G.,2018,Away,5,165,200,254,10,6,53,4,205,158,363,70,9,8,71,46,58,40,64,37,30,20,1503,1533,15.0,25.0,68.0,5398.0,88.0,90.0,16.0
2737,Hawthorn,1.7781,2018-03-24,Hawthorn,Collingwood,11,15205,15,1,34,Collingwood,13,9,67,101,1,M.C.G.,2018,Home,8,166,233,290,6,11,84,8,225,177,402,86,15,8,64,29,60,33,75,41,20,30,1600,1770,12.0,21.0,103.0,5695.0,88.0,89.0,7.0


## What Each Column Refers To
Below is a DataFrame which outlines what each column refers to.

In [75]:
column_abbreviations = pd.read_csv("data/afl_data_columns_mapping.csv")
column_abbreviations

Unnamed: 0,Feature Abbreviated,Feature
0,GA,Goal Assists
1,CP,Contested Possessions
2,UP,Uncontested Possessions
3,ED,Effective Disposals
4,CM,Contested Marks
5,MI5,Marks Inside 50
6,One.Percenters,One Percenters
7,BO,Bounces
8,K,Kicks
9,HB,Handballs


## Feature Creation
Now let's think about what features we can create. We have an enormous amount of stats to sift through. To start, let's create some simple features based on our domain knowledge of Aussie Rules.


### Creating Efficiency Features
#### Disposal Efficiency
Disposal efficiency is pivotal in Aussie Rules football. If you dispose of the ball effectively you are much more likely to score and much less likely to concede goals than if you dispose of it ineffectively.

Let's create a disposal efficiency feature by dividing Effective Disposals by Disposals.

#### Inside 50/Rebound 50 Efficiency
Similarly, one could hypothesise that teams who keep the footy in their Inside 50 regularly will be more likely to score, whilst teams who are effective at getting the ball out of their Defensive 50 will be less likely to concede. Let's use this logic to create Inside 50 Efficiency and Rebound 50 Efficiency features.

The formula used will be:
```
Inside 50 Efficiency = R50_Opponents / I50 (lower is better).
Rebound 50 Efficiency = R50 / I50_Opponents (higher is better).
```

To create these features we will need the opposition's Inside 50s/Rebound 50s. So we will split our data into two DataFrames, create a new DataFrame by joining these two DataFrames on the Game, calculate our efficiency features, then join our features with our main afl_data DataFrame.
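
As a quick sanity check of these formulas, the sketch below applies them by hand to the Richmond v Carlton game shown in the table above (Game 15201). The variable names are illustrative only; the figures are read straight from the ED, D, I50 and R50 columns of those two rows.

```python
# Stats for Game 15201, read from the afl_data rows shown earlier
richmond = {'ED': 247, 'D': 340, 'I50': 71, 'R50': 29}
carlton = {'ED': 268, 'D': 376, 'I50': 47, 'R50': 47}

# Disposal efficiency: share of disposals that were effective
richmond_disposal_eff = richmond['ED'] / richmond['D']

# Rebound 50 efficiency: our rebounds per opposition inside 50 (higher is better)
richmond_r50_eff = richmond['R50'] / carlton['I50']

# Inside 50 efficiency: opposition rebounds per our inside 50 (lower is better)
richmond_i50_eff = carlton['R50'] / richmond['I50']

print(round(richmond_disposal_eff, 3))  # 0.726
print(round(richmond_r50_eff, 3))       # 0.617
print(round(richmond_i50_eff, 3))       # 0.662
```

Note that, by construction, one team's Inside 50 Efficiency is the other team's Rebound 50 Efficiency for the same game.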

In [76]:
# Create Disposal Efficiency feature - the proportion of disposals which are 'effective disposals'
afl_data['disposal_efficiency'] = afl_data['ED'] / afl_data['D']

# Create Rebound 50 Efficiency feature - the proportion of Opposition Inside 50s which are rebounded

# First we will define a function which creates a new DataFrame with Opposition Statistics on the same row as the Team Statistics 
def get_opp_stats_df(df):
    # Filter the DataFrames by whether it was a home or away game
    home_df = df[df['Status'] == 'Home']
    away_df = df[df['Status'] == 'Away']

    # Suffix every column except 'Game' with '_Opp' so we know these stats belong to the opposition
    home_df_renamed = home_df.rename(columns={col: col + '_Opp' for col in home_df.columns if col != 'Game'})
    away_df_renamed = away_df.rename(columns={col: col + '_Opp' for col in away_df.columns if col != 'Game'})

    # Merge the two DataFrames on the Game
    merged_1 = pd.merge(home_df, away_df_renamed, on=['Game'])
    merged_2 = pd.merge(away_df, home_df_renamed, on=['Game'])

    # Concatenate the DataFrames, then sort by the Game ID and reset the index
    # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
    merged = pd.concat([merged_1, merged_2]).sort_values(by='Game').reset_index(drop=True)
    return merged

opponent_stats_df = get_opp_stats_df(afl_data)

# Create Rebound 50 Efficiency - the proportion of Rebound 50s from opposition Inside 50s
opponent_stats_df['R50_efficiency'] = opponent_stats_df['R50'] / opponent_stats_df['I50_Opp']

# Create Inside 50 Efficiency - the proportion of opposition Rebound 50s from Inside 50s
opponent_stats_df['I50_efficiency'] = opponent_stats_df['R50_Opp'] / opponent_stats_df['I50']

# Merge features to main afl_data DataFrame
afl_data = pd.merge(afl_data, opponent_stats_df[['Team', 'Game', 'R50_efficiency', 'I50_efficiency']], on=['Team', 'Game'])

### Creating Rolling Averages as Features
Next, we will create rolling averages of statistics such as Tackles, which we will use as features.

It is fair to assume that a team's performance in a given stat may help predict the overall result. More generally, if a team consistently performs well in a stat, this may help predict the result of their future games. We can't simply train a model on stats from the game which we are trying to predict (i.e. data that we don't have before the game begins), as this would leak the result. We need to train our model on past data. One way of doing this is to use each team's average stats over a set number of previous games. If a team is averaging high in a stat, this may indicate that they are a strong team. Similarly, if a team is averaging poorly in a stat (relative to the team they are playing), this may point towards a predicted loss.

To do this we will create a function, create_rolling_averages, which takes our cleaned DataFrame, the window to calculate the average over, and the columns to average as inputs. It outputs a new DataFrame with the rolling averages in place of the raw statistics.

In [77]:
# Define a function which returns a DataFrame with the rolling averages for each game. Cols refers to the columns which we want
# to create a rolling average for
def create_rolling_averages(df, window, cols):
    new_cols = [col + '_ave_{}'.format(window) for col in cols]
    df[new_cols] = df.groupby('Team')[cols].apply(lambda x: x.rolling(window).mean().shift())
    df = df.drop(columns=cols)
    return df

In [78]:
# Get index of GA - as this is the first column where we will be taking the average
cols_indx_start = afl_data.columns.get_loc("GA")

afl_avgs = create_rolling_averages(afl_data, 6, afl_data.columns[cols_indx_start:])

Note that our function creates a rolling average and then shifts this average down a row, so that each row only uses data from previous games and we don't leak the result. As a result, the first window rows for each team (where window is the size of the rolling window) have NaN values, as these rows are used to calculate the rolling average. For example, if you are calculating a rolling average with a window of 2, with the values \[3, 5, 8\], then the associated rolling average would be \[NaN, 4, 6.5\]. When we shift this down (to ensure no data leakage), we get \[NaN, NaN, 4\].
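
The worked example above can be reproduced on a toy Series. This is a minimal sketch: the two-team DataFrame and its values are made up for illustration, and it uses transform rather than the apply call in our function simply to keep the example short.

```python
import pandas as pd

# The values from the worked example, with a window of 2
values = pd.Series([3, 5, 8])
rolling_mean = values.rolling(2).mean()   # NaN, 4.0, 6.5
shifted = rolling_mean.shift()            # NaN, NaN, 4.0

# Hypothetical two-team example: grouping by Team keeps each team's history separate
toy = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'B'],
    'T': [3, 10, 5, 20, 8, 30],
})
toy['T_ave_2'] = toy.groupby('Team')['T'].transform(lambda x: x.rolling(2).mean().shift())
print(toy)
```

Each team's first two rows are NaN, and the third row holds the average of that team's first two games only.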

### Creating an Elo Feature
Another feature which we could create is an Elo feature. If you don't know what Elo is, go ahead and read our article on it [here](https://www.betfair.com.au/hub/better-betting/betting-strategies/tennis/tennis-elo-modelling/). We have also written a guide on using elo to model the 2018 FIFA World Cup [here](https://www.betfair.com.au/hub/how-to-use-elo-to-model-the-world-cup/).

Essentially, Elo ratings increase if you win. The amount the rating increases is based on how strong the opponent is relative to the team who won. Weak teams get more points for beating stronger teams than they do for beating weaker teams, and vice versa for losses (teams lose points for losses).

Mathematically, Elo ratings can also assign a probability for winning or losing based on the two Elo Ratings of the teams playing.
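
As a rough illustration of that probability and of how a rating moves after a game, here is a minimal sketch. The function names expected_score and update_elo are hypothetical helpers for this example only; the formula is the standard Elo expectation with a scale of 400, which is also what the function later in this section uses.

```python
def expected_score(team_elo, opp_elo):
    """Probability that the team beats the opposition under the Elo model."""
    return 1 / (1 + 10 ** ((opp_elo - team_elo) / 400))


def update_elo(team_elo, opp_elo, result, k_factor):
    """Return the team's new rating; result is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return team_elo + k_factor * (result - expected_score(team_elo, opp_elo))


# Two evenly rated teams: each has a 50% chance, so the winner gains k/2 points
print(expected_score(1500, 1500))      # 0.5
print(update_elo(1500, 1500, 1, 24))   # 1512.0

# An underdog (1500) beating a stronger side (1600) gains more than 12 points
print(update_elo(1500, 1600, 1, 24))
```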

So let's get into it. We will first define a function which calculates the elo for each team and applies these elos to our DataFrame.

In [79]:
# Define a function which finds the elo for each team in each game and returns a dictionary with the game ID as a key and the
# elos as the key's value, in a list. It also outputs the probabilities and a dictionary of the final elos for each team
def elo_applier(df, k_factor):
    # Initialise a dictionary with default elos for each team
    elo_dict = {team: 1500 for team in df['Team'].unique()}
    elos, elo_probs = {}, {}
    
    # Sort by Game and then only grab the Home Games so that the same game isn't repeated
    df = df.sort_values(by=['Game']).reset_index(drop=True)
    df = df[df['Home?'] == 1]
    df = df.drop_duplicates(subset='Game', keep='first')
    
    # Loop over the rows in the DataFrame
    for index, row in df.iterrows():
        # Get the Game ID
        game_id = row['Game']
        
        # If the game already has the elos for the home and away team in the elos dictionary, go to the next game
        if game_id in elos.keys():
            continue
        
        # Get the team and opposition
        home_team = row['home_team']
        away_team = row['away_team']
        
        # Get the team and opposition elo score
        home_team_elo = elo_dict[home_team]
        away_team_elo = elo_dict[away_team]
        
        # Calculate the probability of winning for the home team and the away team
        prob_win_home = 1 / (1 + 10**((away_team_elo - home_team_elo) / 400))
        prob_win_away = 1 - prob_win_home
        
        # Add the elos and probabilities to our elos dictionary and elo_probs dictionary based on the Game ID
        elos[game_id] = [home_team_elo, away_team_elo]
        elo_probs[game_id] = [prob_win_home, prob_win_away]
        
        # Calculate the new elos of each team
        if row['Margin'] > 0: # Home team wins; update both teams' elo
            new_home_team_elo = home_team_elo + k_factor*(1 - prob_win_home)
            new_away_team_elo = away_team_elo + k_factor*(0 - prob_win_away)
        elif row['Margin'] < 0: # Away team wins; update both teams' elo
            new_home_team_elo = home_team_elo + k_factor*(0 - prob_win_home)
            new_away_team_elo = away_team_elo + k_factor*(1 - prob_win_away)
        elif row['Margin'] == 0: # Drawn game; update both teams' elo
            new_home_team_elo = home_team_elo + k_factor*(0.5 - prob_win_home)
            new_away_team_elo = away_team_elo + k_factor*(0.5 - prob_win_away)
        
        # Update elos in elo dictionary
        elo_dict[home_team] = new_home_team_elo
        elo_dict[away_team] = new_away_team_elo
    
    return elos, elo_probs, elo_dict

In [80]:
# Grab the elos and elo probabilities for each game then map them to our DataFrame
elos, probs, elo_dict = elo_applier(afl_avgs, 24)
afl_avgs['home_elo'] = afl_avgs['Game'].map(elos).apply(lambda x: x[0])
afl_avgs['away_elo'] = afl_avgs['Game'].map(elos).apply(lambda x: x[1])

In [81]:
afl_avgs[['home_team', 'away_team', 'Margin', 'home_elo', 'away_elo']].tail()

Unnamed: 0,home_team,away_team,Margin,home_elo,away_elo
3123,Collingwood,West Coast,-35,1508.140954,1592.391714
3124,North Melbourne,Sydney,6,1472.869065,1662.752549
3125,North Melbourne,Sydney,-6,1472.869065,1662.752549
3126,Fremantle,Port Adelaide,-9,1426.443873,1583.704866
3127,Fremantle,Port Adelaide,9,1426.443873,1583.704866


Great! Now we have both rolling averages for stats as features, and the elo of the teams! Let's have a quick look at the current elo standings with a k-factor of 24, out of curiosity's sake.

In [82]:
for team in sorted(elo_dict, key=elo_dict.get)[::-1]:
    print(team, elo_dict[team])

Sydney 1668.777581980045
Adelaide 1633.9657573442942
Geelong 1633.1903276498356
Richmond 1609.933982518531
West Coast 1601.5375222412556
GWS 1596.4761980441315
Hawthorn 1580.2376666556431
Port Adelaide 1566.616129492136
Essendon 1501.9697554047555
Melbourne 1500.3011337455516
Collingwood 1498.995145523196
North Melbourne 1466.8440321971916
Western Bulldogs 1452.042223634068
Fremantle 1443.5326093907054
St Kilda 1429.1689613991873
Gold Coast 1287.5807756144195
Brisbane 1285.7823600203562
Carlton 1243.0478371446964


This looks extremely similar to the current AFL ladder, which is a good sign for elo being an effective predictor of winning.

### Creating an Elo Adjusted Average Margin Feature
Now that we have elo defined, we can adjust the margin for each game based on the elo for each team, and then average this margin, to create a new feature. To do this we will first restructure our elo columns so that they are associated with the actual team on the row, rather than the 'home' and 'away' elo measures we currently have.
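
To get a feel for the adjustment, here is a small sketch using made-up numbers (the function name and the ratings are hypothetical). It applies the same Margin * elo_Opp / elo scaling that we use below to a single game.

```python
def elo_adjusted_margin(margin, team_elo, opp_elo):
    """Scale a margin by the ratio of the opposition's elo to the team's elo."""
    return margin * opp_elo / team_elo


# A 20-point win counts for more against a stronger opponent...
print(round(elo_adjusted_margin(20, 1500, 1600), 2))   # 21.33
# ...and for less against a weaker one
print(round(elo_adjusted_margin(20, 1500, 1400), 2))   # 18.67
```

Note that the same scaling also enlarges a loss to a stronger opponent (a 20-point loss becomes roughly -21.33), which is worth keeping in mind when interpreting the feature.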

In [83]:
# Create a function which maps the current home and away team elos to 'Team Elo' and 'Opposition Elo'
def map_elos(df):
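    # Take copies so that adding columns doesn't modify (or warn about) slices of the original DataFrame
    home_df = df[df['Home?'] == 1].copy()
    away_df = df[df['Home?'] == 0].copy()
    home_df['elo'] = home_df['home_elo']
    home_df['elo_Opp'] = home_df['away_elo']
    away_df['elo'] = away_df['away_elo']
    away_df['elo_Opp'] = away_df['home_elo']
    # pd.concat is used because DataFrame.append was removed in pandas 2.0
    final_df = pd.concat([home_df, away_df]).sort_values(by=['Game', 'Home?'])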
    return final_df

In [84]:
# Map the elos using our function
afl_avgs = map_elos(afl_avgs)

# Create Adjusted Margin and then Average it over a 6 game window
afl_avgs['Adj_elo_ave_margin'] = afl_avgs['Margin'] * afl_avgs['elo_Opp'] / afl_avgs['elo']
afl_avgs = create_rolling_averages(afl_avgs, 6, ['Adj_elo_ave_margin'])

### Creating a 'Form Between the Teams' Feature
It is well known in Aussie Rules that some teams tend to perform better against certain opponents than others. If we restrict our features to stats from previous games against other teams, or to elo ratings, we won't capture any relationship between the two teams playing. An example is the [Kennett Curse](https://en.wikipedia.org/wiki/Kennett_curse), where Geelong won 11 consecutive games against Hawthorn, despite being similarly matched teams. Let's create a feature which calculates how many games a team has won against their opposition over a given window of games.
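
Before writing the full function, the idea can be sketched on a plain list. The helper name and the margins below are hypothetical; the list is assumed to hold one team's margins from earlier games against a single opponent, oldest first.

```python
def wins_in_last_n(previous_margins, window):
    """Count wins (positive margins) in the most recent `window` previous meetings."""
    recent = previous_margins[-window:]
    return sum(1 for margin in recent if margin > 0)


# Hypothetical margins from one team's point of view, oldest first
margins_vs_rival = [12, -3, 25, -40, 7, 9, -1]

print(wins_in_last_n(margins_vs_rival, 5))   # 3
```

Draws (a margin of 0) are not counted as wins here, which matches the Margin > 0 check used in the function below.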

To do this, we will need historical data that dates back well before our current DataFrame starts. Otherwise we would use up many of our games just calculating form, and would have to drop those rows before feeding the data into an algorithm. So let's use our prepare_match_results function, which we defined in the afl_data_cleaning tutorial, to grab a clean DataFrame of all match results since 1897. We can then calculate the form and join this to our current DataFrame.

In [85]:
match_results = prepare_match_results("data/afl_match_results.csv")
# Keep rows from position 25000 onwards, which leaves games from April 2004 onwards
match_results = match_results.iloc[25000:].reset_index(drop=True)

In [86]:
match_results.head(3)

Unnamed: 0,Behinds,Date,Game,Goals,Home?,Margin,Opposition,Opposition Behinds,Opposition Goals,Opposition Points,Points,Round,Team,Venue
0,8,2004-04-24,12501,14,1,49,West Coast,7,6,43,92,5,Carlton,Princes Park
1,7,2004-04-24,12501,6,0,-49,Carlton,8,14,92,43,5,West Coast,Princes Park
2,10,2004-04-24,12502,20,1,65,North Melbourne,17,8,65,130,5,St Kilda,Docklands


In [87]:
# Define a function which calculates the form between teams over a given window
def form_between_teams(df, window):
    num_wins_over_opposition = []
    # Iterate over rows
    for idx, row in df.iterrows():
        # Get a DataFrame of recent games between the teams
        recent_games_between_teams = df[(df['Team'] == row['Team']) & (df['Opposition'] == row['Opposition'])].loc[:idx-1][-window:]
        # Calculate the number of wins for the current team
        num_wins = recent_games_between_teams[recent_games_between_teams['Margin'] > 0].shape[0]
        # Append this to a list
        num_wins_over_opposition.append(num_wins)
    # Add the new feature as a column
    df['form_over_opposition_{}'.format(window)] = num_wins_over_opposition
    return df

In [88]:
# Find the historical form between teams and create a new DataFrame with the form_over_opposition feature in it
form = form_between_teams(match_results, 5)[['Date', 'Team', 'Opposition', 'form_over_opposition_5']]

# Join the new DataFrame to our main DataFrame
afl_avgs = pd.merge(afl_avgs, form, on=['Date', 'Team', 'Opposition'])

### Creating an Average Elo of Beaten Opposition Feature
Whilst the form between the teams feature allows us to capture form between specific teams, and the average statistics allow us to capture the general form of a team in certain areas, we currently cannot capture how teams perform against better teams versus worse teams. For example, in 2018 Essendon have beaten teams such as West Coast and Geelong (top teams), but lost to poor teams like Carlton and Western Bulldogs. Let's create a feature to capture this, and then inspect Essendon's average elo of beaten teams to check whether it's higher than other teams'.

While we're at it, we'll also create an 'average elo of opponent' feature for losses. This will allow us to capture the opposite: whether teams tend to lose to poor teams.

To do this we will create a function which loops over the DataFrame and appends the elo of opponents into a list within a dictionary based on the team name if they win/lose, depending on if we want the 'beaten' feature or 'lost' feature. Let's calculate both for a window of 6.
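
The core of that calculation can be sketched on a single team's list of opponent elos. The helper name and the ratings below are hypothetical; the NaN for a short history mirrors what the full function does when a team has fewer than window qualifying games.

```python
import math


def ave_elo_of_recent(opponent_elos, window):
    """Mean of the last `window` elos, or NaN if fewer than `window` are available."""
    if len(opponent_elos) < window:
        return math.nan
    recent = opponent_elos[-window:]
    return sum(recent) / len(recent)


# Hypothetical elos of the opponents a team has beaten so far, oldest first
beaten = [1450, 1500, 1550, 1600]

print(ave_elo_of_recent(beaten, 6))   # nan
print(ave_elo_of_recent(beaten, 3))   # 1550.0
```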

In [89]:
def create_ave_elo_opponent(df, window, beaten_or_lost='beaten'):
    elos_of_opponents = {team: [] for team in df['Team'].unique()}
    ave_elo_opponents = []
    
    # Loop over rows of the DataFrame
    for idx, row in df.iterrows():
        # Grab the mean elos of opponents beaten and append it to a list
        if len(elos_of_opponents[row['Team']]) >= window:
            ave_elo_opponents.append(np.mean(elos_of_opponents[row['Team']][-window:]))
        else:
            ave_elo_opponents.append(np.nan)
        
        if beaten_or_lost == 'beaten':
            # Update the elos of opponents beaten for this game (if the team wins, add their opponents elo to the dictionary)
            if row['Margin'] > 0 and row['Home?'] == 1:
                elos_of_opponents[row['Team']].append(row['away_elo']) 
            elif row['Margin'] > 0 and row['Home?'] == 0:
                elos_of_opponents[row['Team']].append(row['home_elo'])
        
        elif beaten_or_lost == 'lost':
            # Update the elos of opponents lost to for this game (if the team loses, add their opponent's elo to the dictionary)
            if row['Margin'] < 0 and row['Home?'] == 1:
                elos_of_opponents[row['Team']].append(row['away_elo']) 
            elif row['Margin'] < 0 and row['Home?'] == 0:
                elos_of_opponents[row['Team']].append(row['home_elo'])
                
    df['average_elo_opponents_{}_{}'.format(beaten_or_lost, window)] = ave_elo_opponents
    return df

In [90]:
afl_avgs = create_ave_elo_opponent(afl_avgs, 6, beaten_or_lost='beaten')
afl_avgs = create_ave_elo_opponent(afl_avgs, 6, beaten_or_lost='lost')

In [94]:
# Get Essendon's most recent game as of 29/06/2018
afl_avgs.iloc[2920:2925]

Unnamed: 0,Team,odds,Date,home_team,away_team,Behinds,Game,Goals,Home?,Margin,Opposition,Opposition Behinds,Opposition Goals,Opposition Points,Points,Round,Venue,Season,Status,GA_ave_6,CP_ave_6,UP_ave_6,ED_ave_6,CM_ave_6,MI5_ave_6,One.Percenters_ave_6,BO_ave_6,K_ave_6,HB_ave_6,D_ave_6,M_ave_6,G_ave_6,B_ave_6,T_ave_6,HO_ave_6,I50_ave_6,CL_ave_6,CG_ave_6,R50_ave_6,FF_ave_6,FA_ave_6,AF_ave_6,SC_ave_6,CCL_ave_6,SCL_ave_6,SI_ave_6,MG_ave_6,TO_ave_6,ITC_ave_6,T5_ave_6,disposal_efficiency_ave_6,R50_efficiency_ave_6,I50_efficiency_ave_6,home_elo,away_elo,elo,elo_Opp,Adj_elo_ave_margin_ave_6,form_over_opposition_5,average_elo_opponents_beaten_6,average_elo_opponents_lost_6
2920,Brisbane,3.8438,2018-03-31,Brisbane,Melbourne,14,15214,10,1,-26,Melbourne,16,14,100,74,2,Gabba,2018,Home,8.333333,141.333333,216.0,251.0,7.0,8.666667,49.666667,2.666667,206.666667,156.333333,363.0,77.666667,11.333333,10.333333,57.0,42.333333,50.0,44.333333,58.333333,37.0,24.666667,19.0,1482.0,1512.333333,14.666667,29.666667,87.0,5124.666667,71.0,63.666667,12.0,0.690383,0.638317,0.71978,1261.465624,1450.169432,1261.465624,1450.169432,-29.167891,2,1375.803883,1457.002376
2921,Brisbane,3.8438,2018-03-31,Brisbane,Melbourne,14,15214,10,1,-26,Melbourne,16,14,100,74,2,Gabba,2018,Home,8.5,142.5,208.0,244.0,6.0,8.5,52.0,2.5,202.5,154.5,357.0,73.0,11.0,11.0,63.5,41.5,50.5,44.5,55.0,37.0,24.5,18.0,1478.5,1520.0,13.5,31.0,86.0,5160.0,72.5,65.0,13.0,0.682242,0.643916,0.722527,1261.465624,1450.169432,1261.465624,1450.169432,-29.348259,2,1375.803883,1455.29414
2922,Brisbane,3.8438,2018-03-31,Brisbane,Melbourne,14,15214,10,1,-26,Melbourne,16,14,100,74,2,Gabba,2018,Home,8.666667,143.666667,200.0,237.0,5.0,8.333333,54.333333,2.333333,198.333333,152.666667,351.0,68.333333,10.666667,11.666667,70.0,40.666667,51.0,44.666667,51.666667,37.0,24.333333,17.0,1475.0,1527.666667,12.333333,32.333333,85.0,5195.333333,74.0,66.333333,14.0,0.6741,0.649516,0.725275,1261.465624,1450.169432,1261.465624,1450.169432,-29.528628,2,1375.803883,1453.585904
2923,Brisbane,3.8438,2018-03-31,Brisbane,Melbourne,14,15214,10,1,-26,Melbourne,16,14,100,74,2,Gabba,2018,Home,8.833333,144.833333,192.0,230.0,4.0,8.166667,56.666667,2.166667,194.166667,150.833333,345.0,63.666667,10.333333,12.333333,76.5,39.833333,51.5,44.833333,48.333333,37.0,24.166667,16.0,1471.5,1535.333333,11.166667,33.666667,84.0,5230.666667,75.5,67.666667,15.0,0.665959,0.655115,0.728022,1261.465624,1450.169432,1261.465624,1450.169432,-29.708996,2,1375.803883,1451.877668
2924,Brisbane,3.8438,2018-03-31,Brisbane,Melbourne,14,15214,10,1,-26,Melbourne,16,14,100,74,2,Gabba,2018,Home,9.0,146.0,184.0,223.0,3.0,8.0,59.0,2.0,190.0,149.0,339.0,59.0,10.0,13.0,83.0,39.0,52.0,45.0,45.0,37.0,24.0,15.0,1468.0,1543.0,10.0,35.0,83.0,5266.0,77.0,69.0,16.0,0.657817,0.660714,0.730769,1261.465624,1450.169432,1261.465624,1450.169432,-29.889364,2,1375.803883,1450.169432


As we can see, Essendon has an average elo of opponents beaten of 1539 for a window of 6, whereas Collingwood (who have been a solid performing team recently) only have a rating of 1449. This means that Essendon's recent wins have come against 'good' teams. Similarly, Essendon's average elo of opponents that they lose to is 1486, whereas Collingwood's is 1598.

In [47]:
# Create the home_win column - this will be our target variable
afl_avgs['home_win'] = afl_avgs.apply(lambda x: 1 if x['Margin'] > 0 else 0, axis=1)

# Create regular margin rolling average
afl_avgs = create_rolling_averages(afl_avgs, 6, ['Margin'])


## Wrapping it Up
We now have a fairly decent number of features. Some other features which could be added include whether the game is in a major capital city outside of Melbourne (i.e. Sydney, Adelaide or Perth), how many 'Elite' players are playing (which could be judged by average SuperCoach scores over 110, for example), as well as your own metrics for attacking and defending.
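
As a starting point for the first of those ideas, here is a minimal sketch of a capital city flag. The mapping and the function name are hypothetical: only 'Adelaide Oval' appears in the data shown in this tutorial, and the other venue names are assumptions which you would need to check against the Venue column.

```python
# Hypothetical mapping from venue name to city; only 'Adelaide Oval' is taken
# from the data shown in this tutorial, the other names are assumptions
VENUE_TO_CITY = {
    'Adelaide Oval': 'Adelaide',
    'S.C.G.': 'Sydney',
    'Perth Stadium': 'Perth',
}


def is_non_melbourne_capital(venue):
    """Return 1 if the venue is in Sydney, Adelaide or Perth, otherwise 0."""
    return int(VENUE_TO_CITY.get(venue) in {'Sydney', 'Adelaide', 'Perth'})


print(is_non_melbourne_capital('Adelaide Oval'))   # 1
print(is_non_melbourne_capital('M.C.G.'))          # 0
```

A function like this could then be applied to the Venue column with the pandas Series.map method to create the new feature.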