# AFL Model - Part 4 - Weekly Predictions
Now that we have explored different algorithms for modelling, we can implement our chosen model and predict this week's AFL games! All you need to do is run the `afl_modelling_v2` script each Thursday or Friday to predict the following week's games.

```python
# Import Modules
from afl_feature_creation_v2 import prepare_afl_features
import afl_data_cleaning_v2
import afl_feature_creation_v2
import afl_modelling_v2
import datetime
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
```

## Creating The Features For This Weekend's Games
To predict this weekend's games, we need to create the same features we built in the previous tutorials for the games that will be played this weekend. This includes all the rolling averages, efficiency features, Elo features and so on. Most of this tutorial therefore reuses previously defined functions to create features for the following weekend's games.

### Create Next Week's DataFrame
Let's first get our cleaned afl_data dataset, as well as the odds for next weekend and the 2018 fixture.

```python
# Grab the cleaned AFL dataset and the column order
afl_data = afl_data_cleaning_v2.prepare_afl_data()
ordered_cols = afl_data.columns

# Define a function which grabs the odds for each game for the following weekend
def get_next_week_odds(path):
    # Get next week's odds
    next_week_odds = pd.read_csv(path)
    next_week_odds = next_week_odds.rename(columns={"team_1": "home_team", 
                                                "team_2": "away_team", 
                                                "team_1_odds": "odds", 
                                                "team_2_odds": "odds_away"
                                               })
    return next_week_odds

# Define a function which imports the fixture and cleans it up
def get_fixture(path):
    # Get the afl fixture
    fixture = pd.read_csv(path)

    # Replace team names and reformat
    fixture = fixture.replace({'Brisbane Lions': 'Brisbane', 'Footscray': 'Western Bulldogs'})
    fixture['Date'] = pd.to_datetime(fixture['Date']).dt.date.astype(str)
    fixture = fixture.rename(columns={"Home.Team": "home_team", "Away.Team": "away_team"})
    return fixture

next_week_odds = get_next_week_odds("data/weekly_odds.csv")
fixture = get_fixture("data/afl_fixture_2018.csv")
```
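The cleaning in `get_fixture` does two jobs: it renames teams (presumably so they match the spelling used in `afl_data`, which matters for the merges later on) and it cuts the date down to a plain `'YYYY-MM-DD'` string so it can be compared against date strings. Here is a minimal sketch of that step on a toy fixture; the two rows are made up, and `clean_fixture` is a hypothetical variant of `get_fixture` that takes a DataFrame instead of a path.

```python
import io

import pandas as pd

# Toy fixture using the column layout get_fixture expects (rows are invented)
raw_csv = """Date,Season,Season.Game,Round,Home.Team,Away.Team,Venue
2018-09-21 19:50:00,2018,1,27,Richmond,Collingwood,MCG
2018-09-22 17:15:00,2018,1,27,Brisbane Lions,Footscray,Gabba
"""


def clean_fixture(raw: pd.DataFrame) -> pd.DataFrame:
    # Exact-value replacement of team names across every column
    fixture = raw.replace({'Brisbane Lions': 'Brisbane', 'Footscray': 'Western Bulldogs'})
    # Drop the time of day and keep an ISO 'YYYY-MM-DD' string
    fixture['Date'] = pd.to_datetime(fixture['Date']).dt.date.astype(str)
    return fixture.rename(columns={'Home.Team': 'home_team', 'Away.Team': 'away_team'})


fixture = clean_fixture(pd.read_csv(io.StringIO(raw_csv)))
print(fixture[['Date', 'home_team', 'away_team']])
```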

```python
fixture.tail()
```

|     | Date | Season | Season.Game | Round | home_team | away_team | Venue |
|---|---|---|---|---|---|---|---|
| 201 | 2018-09-08 | 2018 | 1 | 25 | West Coast | Collingwood | Optus Stadium |
| 202 | 2018-09-14 | 2018 | 1 | 26 | Hawthorn | Melbourne | MCG |
| 203 | 2018-09-15 | 2018 | 1 | 26 | Collingwood | GWS | MCG |
| 204 | 2018-09-21 | 2018 | 1 | 27 | Richmond | Collingwood | MCG |
| 205 | 2018-09-22 | 2018 | 1 | 27 | West Coast | Melbourne | Optus Stadium |


```python
next_week_odds
```

|   | home_team | away_team | odds | odds_away |
|---|---|---|---|---|
| 0 | Richmond | Collingwood | 1.42 | 3.4 |
| 1 | West Coast | Melbourne | 1.88 | 2.12 |


Now that we have these DataFrames, we will define a function which combines the fixture and next week's odds into a single DataFrame covering the games in the next 7 days. This function needs Game IDs for next week's games, so we will also create a helper that generates them by taking the Game ID of the last game played and counting up from it.
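The two transformations that `get_next_week_df` relies on, continuing the Game IDs and reshaping each game into one row per team, can be sketched on a toy copy of next week's odds. The sketch assumes the last Game ID in `afl_data` is 15404 (consistent with the IDs 15405 and 15406 in the output further down), and `make_game_ids` is a hypothetical stand-in for `create_next_weeks_game_ids`. It uses `pd.concat`, since `DataFrame.append` no longer exists in pandas 2.x.

```python
import pandas as pd


def make_game_ids(last_game_id: int, n_games: int) -> list:
    # Next week's IDs continue on from the last game in the historical data
    return [last_game_id + i + 1 for i in range(n_games)]


games = pd.DataFrame({
    'home_team': ['Richmond', 'West Coast'],
    'away_team': ['Collingwood', 'Melbourne'],
    'odds': [1.42, 1.88],
    'odds_away': [3.4, 2.12],
})
games['game'] = make_game_ids(15404, len(games))

# One row from the home team's point of view...
home_rows = (games[['game', 'home_team', 'away_team', 'odds']]
             .rename(columns={'home_team': 'team', 'away_team': 'opponent'})
             .assign(home_game=1))
# ...and one from the away team's, with the away odds becoming that row's 'odds'
away_rows = (games[['game', 'home_team', 'away_team', 'odds_away']]
             .rename(columns={'odds_away': 'odds', 'home_team': 'opponent', 'away_team': 'team'})
             .assign(home_game=0))

team_rows = (pd.concat([home_rows, away_rows], sort=True)
             .sort_values(by=['game', 'home_game'])
             .reset_index(drop=True))
print(team_rows)
```

Every game ends up as two rows that share a Game ID, which is the same shape as the per-team rows in `afl_data`.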

```python
# Define a function which creates game IDs for this week's footy games
def create_next_weeks_game_ids(afl_data):
    odds = get_next_week_odds("data/weekly_odds.csv")

    # Get the Game ID of the last game in afl_data
    last_afl_data_game = afl_data['game'].iloc[-1]

    # Create one consecutive Game ID per game in next week's odds file
    game_ids = [(i+1) + last_afl_data_game for i in range(odds.shape[0])]
    return game_ids


# Define a function which creates this week's footy game DataFrame
def get_next_week_df(afl_data):
    # Get the fixture and the odds for next week's footy games
    fixture = get_fixture("data/afl_fixture_2018.csv")
    next_week_odds = get_next_week_odds("data/weekly_odds.csv")
    next_week_odds['game'] = create_next_weeks_game_ids(afl_data)

    # Get today's date and the date in 7 days, then keep the fixture games in that window.
    # The dates are hardcoded so this tutorial is reproducible; for live use, swap in
    # the two commented-out lines instead.
    # todays_date = datetime.datetime.today().strftime('%Y-%m-%d')
    # date_in_7_days = (datetime.datetime.today() + datetime.timedelta(days=7)).strftime('%Y-%m-%d')
    todays_date = '2018-09-19'
    date_in_7_days = '2018-09-26'
    fixture = fixture[(fixture['Date'] >= todays_date) & (fixture['Date'] < date_in_7_days)].drop(columns=['Season.Game'])
    next_week_df = pd.merge(fixture, next_week_odds, on=['home_team', 'away_team'])

    # Split the DataFrame into two rows for each game: one from each team's perspective
    h_df = (next_week_df[['Date', 'game', 'home_team', 'away_team', 'odds', 'Season', 'Round', 'Venue']]
               .rename(columns={'home_team': 'team', 'away_team': 'opponent'})
               .assign(home_game=1))

    a_df = (next_week_df[['Date', 'game', 'home_team', 'away_team', 'odds_away', 'Season', 'Round', 'Venue']]
                .rename(columns={'odds_away': 'odds', 'home_team': 'opponent', 'away_team': 'team'})
                .assign(home_game=0))

    # pd.concat replaces DataFrame.append (removed in pandas 2.0); sort=True keeps the
    # alphabetical column order that append produced by default
    next_week = pd.concat([a_df, h_df], sort=True).sort_values(by='game').rename(columns={
        'Date': 'date',
        'Season': 'season',
        'Round': 'round',
        'Venue': 'venue'
    })
    next_week['date'] = pd.to_datetime(next_week.date)
    # Label the games as the round after the last round in afl_data
    next_week['round'] = afl_data['round'].iloc[-1] + 1
    return next_week
```
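For weekly use, the hardcoded `todays_date` and `date_in_7_days` need to come from the calendar, as the commented-out lines intend. Below is a minimal sketch of that, where `next_seven_day_window` is a hypothetical helper that also accepts an explicit date so the 2018 window can still be reproduced. The filter works on plain strings because ISO `'YYYY-MM-DD'` dates sort lexicographically in chronological order. The toy fixture rows are invented.

```python
import datetime
from typing import Optional, Tuple

import pandas as pd


def next_seven_day_window(today: Optional[datetime.date] = None) -> Tuple[str, str]:
    """Return (start, end) ISO date strings for the half-open window [today, today + 7 days)."""
    if today is None:
        today = datetime.date.today()
    end = today + datetime.timedelta(days=7)
    return today.isoformat(), end.isoformat()


# Toy fixture with dates already cleaned to 'YYYY-MM-DD' strings, as get_fixture does
fixture = pd.DataFrame({
    'Date': ['2018-09-15', '2018-09-21', '2018-09-22', '2018-09-29'],
    'home_team': ['Collingwood', 'Richmond', 'West Coast', 'West Coast'],
})

# Pass a fixed date to reproduce the tutorial; call with no argument for live use
start, end = next_seven_day_window(datetime.date(2018, 9, 19))
this_week = fixture[(fixture['Date'] >= start) & (fixture['Date'] < end)]
print(start, end, this_week['home_team'].tolist())
```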

```python
next_week_df = get_next_week_df(afl_data)
game_ids_next_round = create_next_weeks_game_ids(afl_data)
next_week_df
```



|   | date | round | season | venue | game | home_game | odds | opponent | team |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-09-21 | 26 | 2018 | MCG | 15405 | 0 | 3.4 | Richmond | Collingwood |
| 0 | 2018-09-21 | 26 | 2018 | MCG | 15405 | 1 | 1.42 | Collingwood | Richmond |
| 1 | 2018-09-22 | 26 | 2018 | Optus Stadium | 15406 | 0 | 2.12 | West Coast | Melbourne |
| 1 | 2018-09-22 | 26 | 2018 | Optus Stadium | 15406 | 1 | 1.88 | Melbourne | West Coast |



### Create Each Feature
Now let's append next week's DataFrame to our afl_data, match_results and odds DataFrames, then create all the features we used in the [AFL Feature Creation Tutorial](<0.2. afl_feature_creation_tutorial.ipynb>). The upcoming games have to be appended before they are fed into the feature function; otherwise it has no rows to create features for.
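Why appending works: a rolling-average feature for a game only needs the games *before* it, so an upcoming row can get a feature value even though its own stats are still blank. The toy sketch below assumes the rolling features are computed per team over previous games (the real `prepare_afl_features` is not shown in this tutorial, so treat this as an illustration, not its implementation). The column name `f_goals_avg_2` and the 2-game window are hypothetical.

```python
import numpy as np
import pandas as pd

# Three played games and one upcoming game whose stats are unknown (NaN)
history = pd.DataFrame({
    'game': [1, 2, 3],
    'team': ['Richmond'] * 3,
    'goals': [12.0, 15.0, 9.0],
})
upcoming = pd.DataFrame({'game': [4], 'team': ['Richmond'], 'goals': [np.nan]})

combined = pd.concat([history, upcoming], ignore_index=True).sort_values('game')

# shift(1) makes each row look only at earlier games, then average the last 2 of those
combined['f_goals_avg_2'] = (combined.groupby('team')['goals']
                             .transform(lambda s: s.shift(1).rolling(2).mean()))
print(combined)
```

The upcoming game 4 gets a feature of (15 + 9) / 2 = 12.0, built entirely from games that have already been played.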

```python
# Append next week's games to our afl_data DataFrame
# (pd.concat replaces the removed DataFrame.append; sort=True matches append's old default column sorting)
afl_data = pd.concat([afl_data, next_week_df], sort=True).reset_index(drop=True)

# Append next week's games to match results (we need to do this for our feature creation to run)
match_results = pd.concat([afl_data_cleaning_v2.get_cleaned_match_results(), next_week_df], sort=True)

# Append next week's games to odds
odds = (afl_data_cleaning_v2.get_cleaned_odds()
        .pipe(lambda df: pd.concat([df, next_week_df[df.columns]]))
        .reset_index(drop=True))
```



```python
features_df = afl_feature_creation_v2.prepare_afl_features(afl_data=afl_data, match_results=match_results, odds=odds)
```

```python
features_df.tail()
```

|   | game | home_team | away_team | date | round | venue | season | f_odds | f_form_margin_btwn_teams | f_form_past_5_btwn_teams | f_odds_away | f_elo_home | f_elo_away | f_I50_efficiency_home | f_R50_efficiency_home | f_I50_efficiency_away | f_R50_efficiency_away | f_AF_diff | f_B_diff | f_BO_diff | f_CCL_diff | f_CG_diff | f_CL_diff | f_CM_diff | f_CP_diff | f_D_diff | f_ED_diff | f_FA_diff | f_FF_diff | f_G_diff | f_GA_diff | f_GA1_diff | f_HB_diff | f_HO_diff | f_I50_diff | f_ITC_diff | f_K_diff | f_M_diff | f_MG_diff | f_MI5_diff | f_One.Percenters_diff | f_R50_diff | f_SC_diff | f_SCL_diff | f_SI_diff | f_T_diff | f_T5_diff | f_TO_diff | f_UP_diff | f_behinds_diff | f_goals_diff | f_margin_diff | f_opponent_behinds_diff | f_opponent_goals_diff | f_opponent_points_diff | f_points_diff | f_current_odds_prob | f_current_odds_prob_away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1632 | 15400 | Melbourne | Geelong | 2018-09-07 | 24 | M.C.G. | 2018 | 1.671054 | -24.2 | 1.0 | 1.608695 | 1558.566657 | 1625.423155 | 0.660402 | 0.708517 | 0.668104 | 0.759021 | 71.143537 | 1.252766 | 2.92582 | 0.959649 | 1.814545 | -1.360352 | 4.151592 | 10.761703 | 2.487297 | -17.22315 | -1.671834 | -6.469596 | -0.268716 | 0.79744 | 0.0 | -3.674919 | 25.644698 | 7.369751 | 8.626891 | 6.162216 | 6.295908 | 435.479121 | 3.741955 | -11.460528 | -7.520927 | -55.501462 | -2.319974 | 7.932475 | 4.321983 | -3.341998 | 3.822951 | -13.208704 | 1.840913 | -0.775489 | -14.337935 | -1.959517 | 2.247572 | 11.525916 | -2.812019 | 0.516662 | 0.492951 |
| 1633 | 15402 | West Coast | Collingwood | 2018-09-08 | 24 | Perth Stadium | 2018 | 2.03252 | 20.2 | 3.0 | 1.740685 | 1625.871702 | 1560.370309 | 0.69675 | 0.706745 | 0.70033 | 0.694586 | -85.634097 | -1.562483 | 1.055257 | -0.827688 | -12.787818 | -4.748463 | 1.482025 | -21.803841 | -57.649756 | -41.443506 | 2.262408 | 1.030872 | -1.409203 | -0.677635 | 0.0 | -79.254067 | -4.188338 | -3.660205 | -5.348511 | 21.604311 | 25.318926 | 158.053807 | -2.318048 | -2.547266 | 5.826568 | -95.674177 | -3.920767 | -18.229421 | -11.983321 | -0.761877 | -9.806857 | -36.203653 | -1.057973 | -0.444098 | -6.947073 | 0.123237 | 0.516879 | 3.224511 | -3.722562 | 0.606612 | 0.392218 |
| 1634 | 15404 | Collingwood | GWS | 2018-09-15 | 25 | M.C.G. | 2018 | 1.88776 | 12.6 | 3.0 | 2.011444 | 1548.165197 | 1597.41586 | 0.712035 | 0.693075 | 0.701884 | 0.73121 | -143.613203 | -2.138746 | -5.252952 | -2.285955 | 2.14792 | -9.040037 | -1.190511 | -9.344954 | -21.276848 | -15.871758 | -8.688885 | 2.762415 | -0.694853 | -2.179841 | 0.909091 | 20.498281 | 7.152118 | -5.272869 | 0.658795 | -41.775129 | -20.57375 | -1290.764048 | -0.529944 | -7.933524 | -15.488238 | -228.882754 | -6.754092 | -30.280555 | -6.809112 | -5.609639 | 3.794805 | -11.068573 | -0.927235 | 0.66619 | 5.615838 | -1.571654 | -0.16238 | -2.545936 | 3.069902 | 0.608495 | 0.393856 |
| 1635 | 15405 | Richmond | Collingwood | 2018-09-21 | 26 | MCG | 2018 | 1.378281 | 20.8 | 4.0 | 1.843331 | 1707.546289 | 1565.27739 | 0.711027 | 0.68543 | 0.726775 | 0.698589 | 17.671669 | 0.992843 | 7.211524 | 1.508542 | 10.487274 | -0.736008 | -0.567646 | 6.871611 | 14.157172 | 8.91936 | 9.82864 | -5.133199 | 3.688358 | 4.038168 | -0.016529 | -2.21911 | -20.457555 | 15.996809 | 18.803869 | 16.376281 | 5.215247 | 1173.730316 | 5.155884 | 10.893795 | 1.199255 | 132.976319 | -2.244524 | 22.412564 | -2.677754 | 2.740631 | 5.339491 | 5.732576 | -0.020483 | 1.915009 | 12.736595 | 0.172343 | -0.239895 | -1.267025 | 11.46957 | 0.704225 | 0.294118 |
| 1636 | 15406 | West Coast | Melbourne | 2018-09-22 | 26 | Optus Stadium | 2018 | 1.962699 | 21.2 | 3.0 | 1.719135 | 1638.076814 | 1576.417977 | 0.694846 | 0.717283 | 0.670632 | 0.736721 | -141.01199 | -3.669598 | 0.519964 | -3.436143 | -9.211704 | -5.194045 | 0.240276 | -25.132511 | -51.41054 | -27.439194 | -0.445089 | 6.067431 | -2.92519 | -4.016487 | 0.181818 | -71.941486 | -12.95473 | -11.139864 | -0.656193 | 20.530946 | 13.825124 | -581.725801 | -9.436123 | 2.916125 | 6.396286 | -127.657479 | -1.757928 | -44.841672 | -18.356114 | -4.653871 | -5.730648 | -27.333506 | -2.406656 | -1.356685 | -16.089375 | 0.023015 | 0.919932 | 5.542609 | -10.546767 | 0.531915 | 0.471698 |


## Create Predictions For the Upcoming Round
Now that we have our features, we can use the model we created in Part 3 to predict the next round. First we split features_df into a training DataFrame and a DataFrame containing next round's matches and their features. Then we fit the model from the last tutorial and create predictions. For simplicity, I have hardcoded the parameters we chose in the last tutorial.

```python
# Get the train df by only taking the game IDs which aren't in the next week df
train_df = features_df[~features_df.game.isin(next_week_df.game)]

# Get the result (1 = home win, 0 = away win or draw) and merge it to train_df
match_results = (pd.read_csv("data/afl_match_results.csv")
                    .rename(columns={'Game': 'game'})
                    .assign(result=lambda df: (df['Home.Points'] > df['Away.Points']).astype(int)))

train_df = pd.merge(train_df, match_results[['game', 'result']], on='game')

train_x = train_df.drop(columns=['result'])
train_y = train_df.result

# .copy() so that scaling the features below writes to a new DataFrame rather than a
# slice of features_df (this avoids pandas' SettingWithCopyWarning)
next_round_x = features_df[features_df.game.isin(next_week_df.game)].copy()
```
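One detail of the labelling step is easy to miss: `result` is 1 only when the home side scores more points, so draws are lumped in with away wins as 0. The toy sketch below shows the label and the `isin` split side by side; all games, points and Elo values are invented.

```python
import pandas as pd

features = pd.DataFrame({'game': [101, 102, 103, 104],
                         'f_elo_home': [1550.0, 1600.0, 1580.0, 1620.0]})
results = pd.DataFrame({'game': [101, 102, 103],
                        'Home.Points': [90, 75, 80],
                        'Away.Points': [70, 75, 95]})
next_week_games = pd.Series([104])

# 1 = home win; a draw (game 102) and an away win (game 103) are both 0
results['result'] = (results['Home.Points'] > results['Away.Points']).astype(int)

# Games with a known result become training rows; next week's game is held out
train = (features[~features['game'].isin(next_week_games)]
         .merge(results[['game', 'result']], on='game'))
upcoming = features[features['game'].isin(next_week_games)].copy()
print(train)
print(upcoming)
```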

```python
# Fit our logistic regression model. predict_proba returns one column per class in the
# order of lr.classes_ ([0, 1]), i.e. [away_team_prob, home_team_prob], where a draw
# counts towards the "away" column because result is 0 for draws

lr_best_params = {'C': 0.01,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'max_iter': 100,
 'multi_class': 'ovr',
 'n_jobs': 1,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'newton-cg',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}

feature_cols = [col for col in train_df if col.startswith('f_')]

# Scale features: fit the scaler on the training data only, then apply it to next round
scaler = StandardScaler()
train_x[feature_cols] = scaler.fit_transform(train_x[feature_cols])
next_round_x[feature_cols] = scaler.transform(next_round_x[feature_cols])

lr = LogisticRegression(**lr_best_params)
lr.fit(train_x[feature_cols], train_y)
prediction_probs = lr.predict_proba(next_round_x[feature_cols])

# Convert probabilities to odds (odds = 1 / probability)
modelled_home_odds = [1/i[1] for i in prediction_probs]
modelled_away_odds = [1/i[0] for i in prediction_probs]
```

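The column order of `predict_proba` is not arbitrary: scikit-learn returns one column per entry of `lr.classes_`, which is sorted, so with a 0/1 target column 1 is always the home-win probability. The sketch below checks this on synthetic data (random features, not the AFL features) and converts the probabilities to odds. Looking the column up via `classes_` is a defensive choice of this sketch, not something the tutorial does.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the scaled feature matrix and 0/1 home-win labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

lr = LogisticRegression(C=0.01, solver='newton-cg')
lr.fit(X, y)
probs = lr.predict_proba(X[:3])

# Find the home-win column from classes_ instead of assuming its position
home_col = list(lr.classes_).index(1)
home_probs = probs[:, home_col]
away_probs = probs[:, 1 - home_col]  # binary case only: the other column

modelled_home_odds = 1 / home_probs
modelled_away_odds = 1 / away_probs
print(lr.classes_, modelled_home_odds, modelled_away_odds)
```

Because the two probabilities sum to 1, the reciprocals of the modelled home and away odds also sum to exactly 1. In other words, the modelled odds carry no bookmaker margin.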


```python
# Create a predictions df
preds_df = (next_round_x[['date', 'home_team', 'away_team', 'venue', 'game']].copy()
               .assign(modelled_home_odds=modelled_home_odds,
                      modelled_away_odds=modelled_away_odds)
               .pipe(pd.merge, next_week_odds, on=['home_team', 'away_team'])
               .pipe(pd.merge, features_df[['game', 'f_elo_home', 'f_elo_away']], on='game')
               .drop(columns='game')
           )
```

```python
preds_df
```

|   | date | home_team | away_team | venue | modelled_home_odds | modelled_away_odds | odds | odds_away | f_elo_home | f_elo_away |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-09-21 | Richmond | Collingwood | MCG | 1.304895 | 4.279819 | 1.42 | 3.4 | 1707.546289 | 1565.27739 |
| 1 | 2018-09-22 | West Coast | Melbourne | Optus Stadium | 1.895771 | 2.116357 | 1.88 | 2.12 | 1638.076814 | 1576.417977 |
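One way to read this table is to turn both sets of odds back into probabilities (probability = 1 / odds) and compare them. The `f_current_odds_prob` column in the features table uses the same conversion: Richmond at 1.42 gives 0.704225. The sketch below copies the values from the table above; the `*_prob` and `*_edge` column names are hypothetical, and "edge" here simply means model probability minus market-implied probability.

```python
import pandas as pd

# Values copied from preds_df above
preds = pd.DataFrame({
    'home_team': ['Richmond', 'West Coast'],
    'away_team': ['Collingwood', 'Melbourne'],
    'modelled_home_odds': [1.304895, 1.895771],
    'modelled_away_odds': [4.279819, 2.116357],
    'odds': [1.42, 1.88],
    'odds_away': [3.4, 2.12],
})

# Convert odds to probabilities for both the model and the market
for side, model_col, market_col in [('home', 'modelled_home_odds', 'odds'),
                                    ('away', 'modelled_away_odds', 'odds_away')]:
    preds[f'model_{side}_prob'] = 1 / preds[model_col]
    preds[f'market_{side}_prob'] = 1 / preds[market_col]
    # Positive edge: the model rates this side more likely than the market price implies
    preds[f'{side}_edge'] = preds[f'model_{side}_prob'] - preds[f'market_{side}_prob']

print(preds[['home_team', 'away_team', 'home_edge', 'away_edge']])
```

For Richmond v Collingwood the model is noticeably more confident in Richmond than the market is, while for West Coast v Melbourne the two are within about half a percentage point.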


Alternatively, if you want to generate predictions using a script which uses all the above code, just run the following:

```python
afl_modelling_v2.create_predictions()
```



|   | date | home_team | away_team | venue | modelled_home_odds | modelled_away_odds | odds | odds_away | f_elo_home | f_elo_away |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-09-21 | Richmond | Collingwood | MCG | 1.304895 | 4.279819 | 1.42 | 3.4 | 1707.546289 | 1565.27739 |
| 1 | 2018-09-22 | West Coast | Melbourne | Optus Stadium | 1.895771 | 2.116357 | 1.88 | 2.12 | 1638.076814 | 1576.417977 |


## Conclusion
Congratulations! You have created AFL predictions for this week. If you are a beginner, don't be overwhelmed: the process gets easier each time you do it, and it is super rewarding. In future iterations we will update this tutorial to predict actual odds, and then integrate it with Betfair's API so that you can build an automated betting strategy that uses Machine Learning to create your predictions!