# **Notebook Information**

This notebook is set up to give you quick access to training data for a tipping model. Some suggestions for using it:<br>
<br>
- If you load a large number of seasons, save the result as a CSV once the run completes, then load that CSV into a new notebook to design your model and experiment. This saves you from downloading the data every time you open the notebook.<br>
- Each game is joined with the table information as it stood at the end of the previous round.<br>
- Even though I don't use all of the data as features in my model, I have left everything in so you can build your own feature set from it.<br>
- I have also merged in more extensive datasets to create features for my own model (it's a comp for street cred, so I'm not going to give everything away), so if you have time, have a look around and see what you can bring in.<br>
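
The "previous round" join described in the list above can be sketched roughly as follows. This is a minimal illustration on toy data, not the code inside `clean_games_data`; the `ladder` frame, its column names, and the values are assumptions made up for the example.

```python
import pandas as pd

# Toy fixtures for round 2 (illustrative columns only, not the full schema).
games = pd.DataFrame({
    "Round": [2, 2],
    "Home Team": ["Sydney", "Collingwood"],
    "Away Team": ["Geelong", "Adelaide"],
})

# Hypothetical table snapshot taken at the end of round 1.
ladder = pd.DataFrame({
    "Round": [1, 1, 1, 1],
    "Team": ["Sydney", "Geelong", "Collingwood", "Adelaide"],
    "Pos": [6, 1, 7, 10],
})

# Shift the snapshot forward one round so a round-N game
# is matched with the table as it stood after round N-1.
prev = ladder.assign(Round=ladder["Round"] + 1)

joined = (
    games
    .merge(prev.rename(columns={"Team": "Home Team", "Pos": "Pos_home"}),
           on=["Round", "Home Team"], how="left")
    .merge(prev.rename(columns={"Team": "Away Team", "Pos": "Pos_away"}),
           on=["Round", "Away Team"], how="left")
)
print(joined)
```

Joining on the shifted round means no feature for a game can include that game's own result, which is what you want for training a tipping model.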

# **Libraries**

In [5]:
# Install the notebook's dependencies (including pyAFL) from requirements.txt; pip skips any that are already installed.
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.





In [2]:
import pandas as pd

from datetime import datetime
import requests

pd.set_option('display.max_columns', None)

# Functions for data preparation
from functions.add_leading_zero import add_leading_zero
from functions.get_season_stats import get_season_stats
from functions.get_table_hist import get_table_hist
from functions.clean_table_data import clean_table_data
from functions.clean_games_data import clean_games_data


# **Data Load**

To load the base data for a model, set the `start_year` and `end_year` arguments below to your desired first and last season.

In [3]:
games = get_season_stats(start_year=2022, end_year=2022)

table_hist = get_table_hist(start_year=2022, end_year=2022)

Round 1 Season 2022 is completed
Round 2 Season 2022 is completed
Round 3 Season 2022 is completed
Round 4 Season 2022 is completed
Round 5 Season 2022 is completed
Round 6 Season 2022 is completed
Round 7 Season 2022 is completed
Round 8 Season 2022 is completed
Round 9 Season 2022 is completed
Round 10 Season 2022 is completed
Round 11 Season 2022 is completed
Round 12 Season 2022 is completed
Round 13 Season 2022 is completed
Round 14 Season 2022 is completed
Round 15 Season 2022 is completed
Round 16 Season 2022 is completed
Round 17 Season 2022 is completed
Round 18 Season 2022 is completed
Round 19 Season 2022 is completed
Round 20 Season 2022 is completed
Round 21 Season 2022 is completed
Round 22 Season 2022 is completed
Round 23 Season 2022 is completed


The next section of code will simply create a reference table to help join the table and games data together.

In [4]:
# Team Code data for joins between datasets
data = {'Team': ['Adelaide', 'Brisbane Lions', 'Brisbane Bears', 'Carlton', 'Collingwood', 'Essendon', 'Fitzroy', 'Fremantle', 'Geelong', 'Gold Coast', 'Greater Western Sydney', 'Hawthorn', 'Melbourne', 'North Melbourne', 'Port Adelaide', 'Richmond', 'St Kilda', 'Sydney', 'University', 'West Coast', 'Western Bulldogs'],
        'Code': [1, 19, 2, 3, 4, 5, 6, 8, 9, 20, 21, 10, 11, 12, 13, 14, 15, 16, 17, 18, 7],
        'Abv': ['AD', 'BL', 'BB', 'CA', 'CW', 'ES', 'FI', 'FR', 'GE', 'GC', 'GW', 'HW', 'ME', 'NM', 'PA', 'RI', 'SK', 'SY', 'UN', 'WC', 'WB']}

team_code = pd.DataFrame(data)
team_code['Code'] = team_code['Code'].apply(add_leading_zero)
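
The `add_leading_zero` helper is imported from the `functions` package and its source is not shown here. Going only by its name, one plausible reading is that it pads single-digit team codes to two characters so they match zero-padded codes in the other datasets. The stand-in below is a hypothetical sketch under that assumption, not the actual implementation.

```python
import pandas as pd


def add_leading_zero_sketch(code):
    """Hypothetical stand-in: pad a team code to two characters, e.g. 7 -> '07'."""
    return str(code).zfill(2)


# Small subset of the team reference table, for illustration.
team_code_demo = pd.DataFrame({
    "Team": ["Adelaide", "Western Bulldogs", "Brisbane Lions"],
    "Code": [1, 7, 19],
})
team_code_demo["Code"] = team_code_demo["Code"].apply(add_leading_zero_sketch)
print(team_code_demo["Code"].tolist())  # ['01', '07', '19']
```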

This next section creates a copy of the table history dataset and cleans it. The games data is then cleaned and joined with the cleaned table data.

In [5]:
# Create a copy of the table to complete data cleaning & join
table_hist_2 = table_hist.copy()

# Run the function to clean the table data ready to be joined
table_hist_2 = clean_table_data(table_hist_2)

# Run the function to clean the games data and then join the table and games data
cleaned_data = clean_games_data(games, team_code, table_hist_2)

  df['For'] = df['For'].str.replace(r" \(.*\)", "")
  df['Agn'] = df['Agn'].str.replace(r" \(.*\)", "")
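
The two lines above look like the source lines that pandas echoes alongside a warning about `Series.str.replace`; the warning text itself is not present in this output, so that reading is an assumption. If it is right, the relevant detail is that the default for `regex` changed from `True` to `False` in pandas 2.0, in which case the pattern would be treated as a literal string and nothing would be stripped. A minimal sketch of the explicit form, using made-up values:

```python
import pandas as pd

# Toy values shaped like "total (detail)"; the real contents of 'For' are an assumption.
df = pd.DataFrame({"For": ["71 (10.11)", "112 (17.10)"]})

# regex=True makes the pattern a regular expression regardless of pandas version.
df["For"] = df["For"].str.replace(r" \(.*\)", "", regex=True)
print(df["For"].tolist())  # ['71', '112']
```

The same change would apply to the `Agn` column.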


In [6]:
cleaned_data.head()

Unnamed: 0,Date,Round,Game number,Venue,Home Team,Away Team,Home team score,Away team score,Home team score detail,Away team score detail,Margin,Year stage,day,month,year,time,HomeCode,AwayCode,Pos_home,P_home,W_home,D_home,L_home,For_home,Agn_home,Max_home,Min_home,Home_W_home,Home_D_home,Home_L_home,Away_W_home,Away_D_home,Away_L_home,Stk_home,Pts_home,%_home,Stkn_home,Stkd_home,Pos_away,P_away,W_away,D_away,L_away,For_away,Agn_away,Max_away,Min_away,Home_W_away,Home_D_away,Home_L_away,Away_W_away,Away_D_away,Away_L_away,Stk_away,Pts_away,%_away,Stkn_away,Stkd_away,target
0,2022-03-24 18:20:00,2,1,Docklands,Western Bulldogs,Carlton,90,102,"[4, 1, 7, 3, 11, 5, 13, 12]","[5, 2, 12, 4, 14, 5, 16, 6]",12,Early season,24,3,2022,18:20:00,7,3,17.0,1.0,0.0,0.0,1.0,71,97,71.0,71.0,0.0,0.0,0.0,0.0,0.0,1.0,1L,0.0,73.2,-1.0,L,5.0,1.0,1.0,0.0,0.0,101,76,101.0,101.0,1.0,0.0,0.0,0.0,0.0,0.0,1W,4.0,132.89,1.0,W,0
1,2022-03-25 18:50:00,2,2,S.C.G.,Sydney,Geelong,107,77,"[4, 3, 11, 3, 15, 4, 17, 5]","[2, 4, 6, 7, 8, 13, 10, 17]",30,Early season,25,3,2022,18:50:00,16,9,6.0,1.0,1.0,0.0,0.0,112,92,112.0,112.0,0.0,0.0,0.0,1.0,0.0,0.0,1W,4.0,121.74,1.0,W,1.0,1.0,1.0,0.0,0.0,138,72,138.0,138.0,1.0,0.0,0.0,0.0,0.0,0.0,1W,4.0,191.67,1.0,W,1
2,2022-03-26 12:45:00,2,3,M.C.G.,Collingwood,Adelaide,100,58,"[5, 3, 7, 5, 14, 7, 15, 10]","[1, 5, 3, 6, 7, 6, 8, 10]",42,Early season,26,3,2022,12:45:00,4,1,7.0,1.0,1.0,0.0,0.0,102,85,102.0,102.0,0.0,0.0,0.0,1.0,0.0,0.0,1W,4.0,120.0,1.0,W,10.0,1.0,0.0,0.0,1.0,82,83,82.0,82.0,0.0,0.0,1.0,0.0,0.0,0.0,1L,0.0,98.8,-1.0,L,1
3,2022-03-26 15:35:00,2,4,Docklands,Essendon,Brisbane Lions,75,97,"[4, 5, 5, 9, 8, 13, 10, 15]","[1, 1, 7, 2, 13, 5, 15, 7]",22,Early season,26,3,2022,15:35:00,5,19,18.0,1.0,0.0,0.0,1.0,72,138,72.0,72.0,0.0,0.0,0.0,0.0,0.0,1.0,1L,0.0,52.17,-1.0,L,8.0,1.0,1.0,0.0,0.0,80,69,80.0,80.0,1.0,0.0,0.0,0.0,0.0,0.0,1W,4.0,115.94,1.0,W,0
4,2022-03-26 18:40:00,2,5,Adelaide Oval,Port Adelaide,Hawthorn,56,120,"[0, 3, 3, 6, 7, 10, 7, 14]","[3, 2, 8, 4, 14, 4, 19, 6]",64,Early season,26,3,2022,18:40:00,13,10,11.0,1.0,0.0,0.0,1.0,69,80,69.0,69.0,0.0,0.0,0.0,0.0,0.0,1.0,1L,0.0,86.25,-1.0,L,3.0,1.0,1.0,0.0,0.0,78,58,78.0,78.0,1.0,0.0,0.0,0.0,0.0,0.0,1W,4.0,134.48,1.0,W,0


In [7]:
cleaned_data.to_csv('AFL_Base_Training_Data.csv')
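
To pick the data up in a new notebook, as suggested at the top, a round trip might look like the sketch below. It uses a small stand-in frame and a temporary path so it runs on its own; `index=False` and `parse_dates` are suggestions, not settings the original cell uses.

```python
import os
import tempfile

import pandas as pd

# Stand-in for cleaned_data with a few representative columns.
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2022-03-24 18:20:00", "2022-03-25 18:50:00"]),
    "Home Team": ["Western Bulldogs", "Sydney"],
    "target": [0, 1],
})

path = os.path.join(tempfile.mkdtemp(), "AFL_Base_Training_Data.csv")

# index=False stops the row index being written as an extra unnamed column.
demo.to_csv(path, index=False)

# CSV stores dates as text, so ask pandas to parse them again on load.
reloaded = pd.read_csv(path, parse_dates=["Date"])
print(reloaded.dtypes)
```

Note that list-valued columns such as `Home team score detail` come back from a CSV as plain strings and would need parsing before use.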