# Exploring the Impact of Home Team Advantage in Football

## Methodology
In this analysis, we explore the impact of home team advantage in football by:
1. Collecting and preprocessing match data, including home and away team performance.
2. Calculating the maximum likelihood estimates of the probability of winning at home and the probability of winning away.
3. Running a hypothesis test to check whether the proportions of games won at home and away differ.
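
Steps 2 and 3 can be sketched with the standard library alone. The snippet below is a minimal illustration under stated assumptions, not necessarily the test used later in this analysis: it treats each match as a Bernoulli trial, so the maximum likelihood estimate of a win probability is the sample proportion, and it compares home and away wins with a two-sided z-test restricted to decisive (non-drawn) matches. The function names and the match counts are hypothetical.

```python
import math


def win_rate_mle(wins: int, matches: int) -> float:
    """MLE of a Bernoulli win probability, which is the sample proportion."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    return wins / matches


def decisive_match_z_test(home_wins: int, away_wins: int) -> tuple[float, float]:
    """Two-sided z-test of H0: a decisive match is equally likely to be a home or away win.

    Uses the normal approximation to Binomial(n, 0.5), where n is the number
    of non-drawn matches. Returns (z statistic, p-value).
    """
    n = home_wins + away_wins
    if n <= 0:
        raise ValueError("need at least one decisive match")
    z = (home_wins - n / 2) / math.sqrt(n * 0.25)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts for illustration only (not taken from the data set)
home_wins, draws, away_wins = 46, 27, 27
total = home_wins + draws + away_wins

print("P(home win) MLE:", win_rate_mle(home_wins, total))
print("P(away win) MLE:", win_rate_mle(away_wins, total))
print("z, p-value:", decisive_match_z_test(home_wins, away_wins))
```

Restricting the test to decisive matches avoids treating home wins and away wins as independent samples, since both outcomes come from the same set of games.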

### **Step 1:** Collecting and Preprocessing Match Data
In this step, we gather match data, including information about home and away team performance. The data is then cleaned and prepared for analysis by handling missing values, standardizing formats, and ensuring consistency. First, let's load the modules and data we need.

In [3]:
import soccerdata as sd
from pathlib import Path
import sys
import os
import logging

# Add the parent directory to sys.path
sys.path.append(os.path.abspath(".."))  # Adjust as needed
from utils.constants import BIG_FIVE_LEAGUES

logging.getLogger().setLevel(logging.INFO)

In [None]:
TIME_PERIOD = 20  # Number of seasons to look back
CACHE_DIR = Path("../data/home advantage cache")

# Use MatchHistory to get match-level data, such as which team played at home
match_history = sd.MatchHistory(
    leagues=BIG_FIVE_LEAGUES,  # Premier League, Serie A, La Liga, Bundesliga, Ligue 1
    seasons=range(2025 - TIME_PERIOD, 2025),
    no_cache=False,
    no_store=False,
    data_dir=CACHE_DIR
) 

games = match_history.read_games()
games.head()

| league | season | game | date | home_team | away_team | FTHG | FTAG | FTR | HTHG | HTAG | HTR | referee | ... | 1XBCH | 1XBCD | 1XBCA | BFECH | BFECD | BFECA | BFEC>2.5 | BFEC<2.5 | BFECAHH | BFECAHA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ENG-Premier League | 506 | 2005-08-13 Aston Villa-Bolton | 2005-08-13 12:00:00 | Aston Villa | Bolton | 2.0 | 2.0 | D | 2.0 | 2.0 | D | M Riley | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ENG-Premier League | 506 | 2005-08-13 Everton-Man United | 2005-08-13 12:00:00 | Everton | Man United | 0.0 | 2.0 | A | 0.0 | 1.0 | A | G Poll | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ENG-Premier League | 506 | 2005-08-13 Fulham-Birmingham | 2005-08-13 12:00:00 | Fulham | Birmingham | 0.0 | 0.0 | D | 0.0 | 0.0 | D | R Styles | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ENG-Premier League | 506 | 2005-08-13 Man City-West Brom | 2005-08-13 12:00:00 | Man City | West Brom | 0.0 | 0.0 | D | 0.0 | 0.0 | D | C Foy | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ENG-Premier League | 506 | 2005-08-13 Middlesbrough-Liverpool | 2005-08-13 12:00:00 | Middlesbrough | Liverpool | 0.0 | 0.0 | D | 0.0 | 0.0 | D | M Halsey | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
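
The `FTR` column encodes the full-time result as `H` (home win), `D` (draw), or `A` (away win), so a first cleaning pass can drop matches without a valid result and tabulate the share of each outcome. The following is a minimal sketch assuming pandas is available. The helper name `outcome_shares` and the sample rows are hypothetical, and the sample stands in for the `games` frame loaded above.

```python
import pandas as pd


def outcome_shares(games: pd.DataFrame) -> pd.Series:
    """Return the share of home wins, draws, and away wins among valid matches."""
    # Drop matches with a missing full-time result
    cleaned = games.dropna(subset=["FTR"])
    # Keep only the three expected result codes
    cleaned = cleaned[cleaned["FTR"].isin(["H", "D", "A"])]
    shares = cleaned["FTR"].value_counts(normalize=True)
    # Fix the order and report 0.0 for outcomes that never occur
    return shares.reindex(["H", "D", "A"], fill_value=0.0)


# Hypothetical rows for illustration only (not real match results)
sample = pd.DataFrame(
    {
        "home_team": ["Team A", "Team B", "Team C", "Team D", "Team E"],
        "away_team": ["Team F", "Team G", "Team H", "Team I", "Team J"],
        "FTR": ["H", "H", "A", "D", None],
    }
)

print(outcome_shares(sample))
```

On the real data, the same helper could be called as `outcome_shares(games)`, since it only relies on the `FTR` column.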
