# Introduction

This project predicts Australian Football League (AFL) match outcomes using machine learning techniques. The dataset was scraped from Footywire using the fitzRoy R package and covers detailed player statistics and match information across 11 seasons, from 2015 to 2025.

With rich game-level and player-level data, including player performance metrics and match results, the project builds predictive models to forecast match winners and analyze key factors influencing game outcomes.

**Goal: Develop an accurate and interpretable model to predict AFL match results.**

# 1. Exploratory Data Analysis
The player statistics dataset contains 50 variables describing player performance and match details in AFL games:

Date (Datetime) – Date when the match was played.

Season (Numeric) – Year of the AFL season.

Round (Nominal) – Round label within the season (e.g. "Round 1").

Venue (Nominal) – Stadium where the match took place.

Player (Nominal) – Name of the player.

Team (Nominal) – Player’s team.

Opposition (Nominal) – Opposing team in the match.

Status (Nominal) – Home or away status of the player’s team.

Match_id (Nominal) – Unique identifier for each match.

**Performance Metrics:**

GA (Numeric) – Goal assists.

CP (Numeric) – Contested possessions gained.

UP (Numeric) – Uncontested possessions gained.

ED (Numeric) – Effective disposals (successful passes).

DE (Numeric) – Disposal efficiency (effective disposals as a percentage of total disposals).

CM (Numeric) – Contested marks.

MI5 (Numeric) – Marks taken inside the forward 50.

One.Percenters (Numeric) – Defensive efforts like spoils or smothers.

BO (Numeric) – Number of bounces while running.

TOG (Numeric) – Time on ground (percentage of game time played).

K (Numeric) – Kicks delivered.

HB (Numeric) – Handballs delivered.

D (Numeric) – Total disposals (kicks + handballs).

M (Numeric) – Marks (catches from kicks).

G (Numeric) – Goals scored by the player.

B (Numeric) – Behinds scored.

T (Numeric) – Tackles made.

HO (Numeric) – Hitouts in ruck contests.

I50 (Numeric) – Inside 50s (entries into the forward 50-meter arc).

CL (Numeric) – Clearances from stoppages.

CG (Numeric) – Clangers (errors made).

R50 (Numeric) – Rebound 50s (exits from the defensive 50-meter arc).

FF (Numeric) – Free kicks awarded to the player (frees for).

FA (Numeric) – Free kicks awarded against the player (frees against).

AF (Numeric) – AFL Fantasy points.

SC (Numeric) – SuperCoach points.

CCL (Numeric) – Centre clearances.

SCL (Numeric) – Stoppage clearances.

SI (Numeric) – Score involvements.

MG (Numeric) – Meters gained from disposals.

TO (Numeric) – Turnovers conceded.

ITC (Numeric) – Intercepts made.

T5 (Numeric) – Tackles inside the forward 50.
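The definitions above imply a simple consistency check: total disposals (D) should equal kicks (K) plus handballs (HB) on every row. The following is a minimal sketch of such a check, assuming a DataFrame with columns named `K`, `HB`, and `D` as in the list. The helper name `check_disposals` and the toy rows are hypothetical and are not taken from the scraped data.

```python
import pandas as pd


def check_disposals(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where D does not equal K + HB (hypothetical helper).

    Rows with a missing K, HB, or D are also returned, because a comparison
    against NaN never counts as equal.
    """
    mismatch = df["D"] != (df["K"] + df["HB"])
    return df[mismatch]


# Toy rows for illustration only; the third row is deliberately inconsistent.
toy = pd.DataFrame({"K": [10, 12, 8], "HB": [5, 9, 7], "D": [15, 21, 16]})
bad_rows = check_disposals(toy)
print(bad_rows)
```

Running the same helper on the full player statistics table would show whether any scraped rows break the relationship before the columns are used as features.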

**Imports**

I'll import the core libraries needed for data handling and visualization.

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

I'll load the data that was scraped from Footywire: one player statistics file and one match results file per season.

In [12]:
stats_2015 = pd.read_csv('../data/footywire_player_stats_2015.csv')
stats_2016 = pd.read_csv('../data/footywire_player_stats_2016.csv')
stats_2017 = pd.read_csv('../data/footywire_player_stats_2017.csv')
stats_2018 = pd.read_csv('../data/footywire_player_stats_2018.csv')
stats_2019 = pd.read_csv('../data/footywire_player_stats_2019.csv')
stats_2020 = pd.read_csv('../data/footywire_player_stats_2020.csv')
stats_2021 = pd.read_csv('../data/footywire_player_stats_2021.csv')
stats_2022 = pd.read_csv('../data/footywire_player_stats_2022.csv')
stats_2023 = pd.read_csv('../data/footywire_player_stats_2023.csv')
stats_2024 = pd.read_csv('../data/footywire_player_stats_2024.csv')
stats_2025 = pd.read_csv('../data/footywire_player_stats_2025.csv')

games_2015 = pd.read_csv('../data/footywire_match_results_2015.csv')
games_2016 = pd.read_csv('../data/footywire_match_results_2016.csv')
games_2017 = pd.read_csv('../data/footywire_match_results_2017.csv')
games_2018 = pd.read_csv('../data/footywire_match_results_2018.csv')
games_2019 = pd.read_csv('../data/footywire_match_results_2019.csv')
games_2020 = pd.read_csv('../data/footywire_match_results_2020.csv')
games_2021 = pd.read_csv('../data/footywire_match_results_2021.csv')
games_2022 = pd.read_csv('../data/footywire_match_results_2022.csv')
games_2023 = pd.read_csv('../data/footywire_match_results_2023.csv')
games_2024 = pd.read_csv('../data/footywire_match_results_2024.csv')
games_2025 = pd.read_csv('../data/footywire_match_results_2025.csv')
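The per-season reads above all follow one file-naming pattern, so they could also be written as a loop. This is only a sketch of that alternative, not the approach used in this notebook. The helper name `load_seasons` is hypothetical, and it assumes every file is named `<prefix>_<year>.csv` inside a single directory.

```python
from pathlib import Path

import pandas as pd


def load_seasons(data_dir, prefix, seasons):
    """Read one CSV per season and return the frames keyed by year.

    Hypothetical helper: assumes files are named '<prefix>_<year>.csv'.
    """
    frames = {}
    for year in seasons:
        path = Path(data_dir) / f"{prefix}_{year}.csv"
        frames[year] = pd.read_csv(path)
    return frames


# Example usage with the file names from this notebook (needs the data files):
# stats_by_year = load_seasons("../data", "footywire_player_stats", range(2015, 2026))
# games_by_year = load_seasons("../data", "footywire_match_results", range(2015, 2026))
# stats_all = pd.concat(stats_by_year.values(), ignore_index=True)
```

Keeping the frames in a dictionary keyed by year still allows a single season to be inspected, for example `games_by_year[2016]`.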

In [19]:
games_2016

Unnamed: 0,Date,Time,Round,Venue,Home.Team,Away.Team,Home.Points,Away.Points
0,2016-03-24,19:20,Round 1,MCG,Richmond,Carlton,92,83
1,2016-03-26,13:40,Round 1,MCG,Melbourne,GWS,80,78
2,2016-03-26,15:35,Round 1,People First Stadium,Gold Coast,Essendon,121,60
3,2016-03-26,19:25,Round 1,Accor Stadium,Sydney,Collingwood,133,53
4,2016-03-26,19:25,Round 1,Marvel Stadium,North Melbourne,Adelaide,107,97
...,...,...,...,...,...,...,...,...
202,2016-09-16,19:50,Semi Final,MCG,Hawthorn,Western Bulldogs,84,107
203,2016-09-17,19:25,Semi Final,SCG,Sydney,Adelaide,118,82
204,2016-09-23,19:50,Preliminary Final,MCG,Geelong,Sydney,60,97
205,2016-09-24,19:15,Preliminary Final,ENGIE Stadium,GWS,Western Bulldogs,83,89


In [16]:
stats = pd.concat([stats_2015, stats_2016, stats_2017, stats_2018, stats_2019, stats_2020, stats_2021, stats_2022, stats_2023, stats_2024, stats_2025], ignore_index=True)

games = pd.concat([games_2015, games_2016, games_2017, games_2018, games_2019, games_2020, games_2021, games_2022, games_2023, games_2024, games_2025], ignore_index=True)

**Data Inspection (Games)**

In [17]:
games.head()

Unnamed: 0,Date,Time,Round,Venue,Home.Team,Away.Team,Home.Points,Away.Points,AwayWin,HomeWin,Draw,Season,Match_id
0,2015-04-02,19:20,1,MCG,Carlton,Richmond,78,105,1.0,0.0,0.0,2015.0,5964.0
1,2015-04-04,13:40,1,MCG,Melbourne,Gold Coast,115,89,0.0,1.0,0.0,2015.0,5965.0
2,2015-04-04,16:35,1,Accor Stadium,Sydney,Essendon,72,60,0.0,1.0,0.0,2015.0,5966.0
3,2015-04-04,18:20,1,Gabba,Brisbane,Collingwood,74,86,1.0,0.0,0.0,2015.0,5967.0
4,2015-04-04,19:20,1,Marvel Stadium,Western Bulldogs,West Coast,97,87,0.0,1.0,0.0,2015.0,5968.0


In [14]:
games.tail()

Unnamed: 0,Date,Time,Round,Venue,Home.Team,Away.Team,Home.Points,Away.Points,AwayWin,HomeWin,Draw,Season,Match_id
4209,2025-07-25,19:50,Round 20,ENGIE Stadium,GWS,Sydney,102,58,,,,,
4210,2025-07-26,13:20,Round 20,People First Stadium,Gold Coast,Brisbane,130,64,,,,,
4211,2025-07-26,14:15,Round 20,Optus Stadium,Fremantle,West Coast,126,77,,,,,
4212,2025-07-26,19:35,Round 20,Marvel Stadium,North Melbourne,Geelong,49,150,,,,,
4213,2025-07-26,19:40,Round 20,Adelaide Oval,Adelaide,Port Adelaide,133,35,,,,,
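The `head()` and `tail()` outputs show that `AwayWin`, `HomeWin`, `Draw`, `Season`, and `Match_id` are populated for the 2015 rows but missing for the 2025 rows. The result flags can be recomputed from `Home.Points` and `Away.Points`, and the season can be read from the match date. The sketch below shows one possible way to do this. The helper name `add_result_columns` is hypothetical, and taking `Season` from the calendar year of `Date` is an assumption. `Match_id` cannot be derived this way and is left untouched.

```python
import pandas as pd


def add_result_columns(games: pd.DataFrame) -> pd.DataFrame:
    """Recompute HomeWin, AwayWin, Draw, and Season for every row (hypothetical helper)."""
    out = games.copy()
    margin = out["Home.Points"] - out["Away.Points"]
    # Store the flags as floats to match the dtype shown in games.head().
    out["HomeWin"] = (margin > 0).astype(float)
    out["AwayWin"] = (margin < 0).astype(float)
    out["Draw"] = (margin == 0).astype(float)
    # Assumption: a season falls within one calendar year.
    out["Season"] = pd.to_datetime(out["Date"]).dt.year.astype(float)
    return out


# The first two rows come from the outputs above; the third is a made-up draw.
toy_games = pd.DataFrame({
    "Date": ["2015-04-02", "2025-07-25", "2020-07-01"],
    "Home.Team": ["Carlton", "GWS", "Team A"],
    "Away.Team": ["Richmond", "Sydney", "Team B"],
    "Home.Points": [78, 102, 80],
    "Away.Points": [105, 58, 80],
})
fixed = add_result_columns(toy_games)
print(fixed[["Home.Team", "Away.Team", "HomeWin", "AwayWin", "Draw", "Season"]])
```

Recomputing the flags for all rows, rather than only filling the gaps, keeps the target variable defined the same way across every season.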


In [None]:
# Shape of dataset
print(f"Shape: {games.shape}")

# Data types
print("\nData types:")
print(games.dtypes)

# Total missing values
total_missing = games.isna().sum().sum()
print(f"\nTotal missing values: {total_missing}")

# Duplicate rows
duplicate_count = games.duplicated().sum()
print(f"Duplicate rows: {duplicate_count}")

# Unique venue count
print(f"Unique Venues: {games['Venue'].nunique()}")

# Unique team count
print(f"Unique Teams: {games['Home.Team'].nunique()}")

# Match date range
print(f"Date range: {games['Date'].min()} → {games['Date'].max()}")

games.describe()
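The total missing-value count above does not show which columns the gaps sit in. A per-column breakdown can make that visible. The following is a minimal sketch; the helper name `missing_summary` and the toy frame are hypothetical.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """List columns with missing values, with counts and percentages (hypothetical helper)."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "pct": (counts / len(df) * 100).round(1),
    })
    # Keep only columns that have gaps, largest first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame for illustration only, not real match data.
toy = pd.DataFrame({
    "Home.Points": [92, 80, 121, 133],
    "HomeWin": [1.0, None, None, None],
    "Season": [2015.0, None, None, 2016.0],
})
report = missing_summary(toy)
print(report)
```

Calling `missing_summary(games)` on the combined match table would show whether the gaps are limited to a few columns.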


**Data Inspection (Player Stats)**

In [9]:
stats.head()

Unnamed: 0,Date,Season,Round,Venue,Player,Team,Opposition,Status,Match_id,GA,...,FA,AF,SC,CCL,SCL,SI,MG,TO,ITC,T5
0,2015-04-02,2015,Round 1,MCG,Bryce Gibbs,Carlton,Richmond,Home,5964,0.0,...,2.0,96.0,82.0,2.0,2.0,8.0,466.0,6.0,3.0,0.0
1,2015-04-02,2015,Round 1,MCG,Tom Bell,Carlton,Richmond,Home,5964,2.0,...,0.0,108.0,115.0,0.0,1.0,10.0,475.0,4.0,1.0,3.0
2,2015-04-02,2015,Round 1,MCG,Sam Docherty,Carlton,Richmond,Home,5964,1.0,...,0.0,107.0,147.0,0.0,2.0,6.0,287.0,2.0,12.0,1.0
3,2015-04-02,2015,Round 1,MCG,Chris Judd,Carlton,Richmond,Home,5964,4.0,...,2.0,93.0,108.0,4.0,2.0,8.0,474.0,5.0,2.0,1.0
4,2015-04-02,2015,Round 1,MCG,Kade Simpson,Carlton,Richmond,Home,5964,0.0,...,0.0,97.0,103.0,0.0,0.0,4.0,269.0,4.0,5.0,0.0


In [10]:
stats.tail()

Unnamed: 0,Date,Season,Round,Venue,Player,Team,Opposition,Status,Match_id,GA,...,FA,AF,SC,CCL,SCL,SI,MG,TO,ITC,T5
99263,2025-07-26,2025,Round 20,Optus Stadium,Sandy Brock,West Coast,Fremantle,Away,11357,0.0,...,0.0,30.0,41.0,0.0,0.0,2.0,163.0,1.0,1.0,0.0
99264,2025-07-26,2025,Round 20,Optus Stadium,Jobe Shanahan,West Coast,Fremantle,Away,11357,0.0,...,0.0,34.0,15.0,0.0,0.0,1.0,10.0,3.0,0.0,1.0
99265,2025-07-26,2025,Round 20,Optus Stadium,Matt Owies,West Coast,Fremantle,Away,11357,0.0,...,0.0,16.0,25.0,0.0,0.0,0.0,7.0,0.0,1.0,1.0
99266,2025-07-26,2025,Round 20,Optus Stadium,Tyrell Dewar,West Coast,Fremantle,Away,11357,0.0,...,0.0,15.0,10.0,0.0,0.0,1.0,50.0,0.0,0.0,0.0
99267,2025-07-26,2025,Round 20,Optus Stadium,Archer Reid,West Coast,Fremantle,Away,11357,0.0,...,0.0,7.0,13.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
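The player statistics table has one row per player per match, while match outcomes are recorded once per match. One possible way to bring the two to a comparable level is to sum player metrics by `Match_id` and `Team`. The sketch below illustrates that idea and is not necessarily the feature engineering used later in this project. The helper name `team_match_totals` is hypothetical. The toy rows reuse the `MG` and `ITC` values from the `head()` and `tail()` outputs above, so the totals cover only those five players per team.

```python
import pandas as pd


def team_match_totals(stats: pd.DataFrame, metrics: list) -> pd.DataFrame:
    """Sum the chosen player metrics per match and team (hypothetical helper)."""
    return stats.groupby(["Match_id", "Team"], as_index=False)[metrics].sum()


# MG and ITC values copied from the first and last five rows shown above.
toy_stats = pd.DataFrame({
    "Match_id": [5964] * 5 + [11357] * 5,
    "Team": ["Carlton"] * 5 + ["West Coast"] * 5,
    "MG": [466.0, 475.0, 287.0, 474.0, 269.0, 163.0, 10.0, 7.0, 50.0, 1.0],
    "ITC": [3.0, 1.0, 12.0, 2.0, 5.0, 1.0, 0.0, 1.0, 0.0, 0.0],
})
totals = team_match_totals(toy_stats, ["MG", "ITC"])
print(totals)
```

Summing suits count-type metrics such as `MG` or `ITC`. Percentage columns such as `DE` or `TOG` would need a mean or a weighted average instead.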
