In [1]:
# importing data manipulation libraries
import pandas as pd
import numpy as np
# from ydata_profiling import ProfileReport

# importing visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# configure notebook for inline plotting
%matplotlib inline

# configure pandas to display more than the default 20 columns
pd.set_option('display.max_columns', 250)

# set grid style 
sns.set_style('darkgrid')

# Data Understanding
The objective of this section is to learn as much as possible about our match data. We will build a high-level, practical understanding of the data so that, by the end of the process, we have a clear picture of the structure of the dataset, how to clean it, the target variables, and possible modelling techniques.

In [2]:
# read data into pandas dataframe
data = pd.read_csv('./data/Matches.csv', header=0)

In [3]:
# check the shape of dataframe
data.shape

(6460, 215)

The dataset consists of 6,460 observations and 215 features, some of which are our target variables.

In [4]:
# check types of columns
data.dtypes.unique()

array([dtype('int64'), dtype('O'), dtype('float64'), dtype('bool')],
      dtype=object)

The result indicates that our dataset consists of categorical (object), boolean and numeric (float and int) features. Before proceeding further, let's confirm that the dtypes match what is expected in the documentation.
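
One way to make this check concrete is to compare the loaded dtypes against a hand-written mapping taken from the documentation. The sketch below is only illustrative: the `expected_dtypes` mapping, the `dtype_mismatches` helper, and the toy frame standing in for `data` are assumptions rather than part of the dataset's documentation.

```python
import pandas as pd

# Toy stand-in for `data`; column names come from the dataset, values are illustrative
sample = pd.DataFrame({
    "id": [2155, 2156],
    "odds_ft_1": [3.41, 2.45],
    "over25": [True, False],
    "refereeID": [685.0, None],  # a missing value forces this column to float64
})

# Hypothetical expectations; replace with the types listed in the documentation
expected_dtypes = {
    "id": "int64",
    "odds_ft_1": "float64",
    "over25": "bool",
    "refereeID": "int64",  # assumed: IDs are meant to be integers
}

def dtype_mismatches(df, expected):
    """Return one row per column whose actual dtype differs from the expected one."""
    rows = [
        {"column": col, "expected": exp, "actual": str(df[col].dtype)}
        for col, exp in expected.items()
        if col in df.columns and str(df[col].dtype) != exp
    ]
    return pd.DataFrame(rows, columns=["column", "expected", "actual"])

mismatches = dtype_mismatches(sample, expected_dtypes)
print(mismatches)
```

In the toy frame only `refereeID` is flagged, because pandas stores an integer column that contains missing values as `float64`. The same call could be pointed at `data` once the expected types have been filled in.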

### To do (Create table of columns and their definitions) - Use datawrapper and insert link 



In [5]:
# inspect the first few rows
data.head(5)

Unnamed: 0,id,homeID,awayID,season,status,roundID,game_week,revised_game_week,homeGoals,awayGoals,homeGoalCount,awayGoalCount,totalGoalCount,team_a_corners,team_b_corners,totalCornerCount,team_a_offsides,team_b_offsides,team_a_yellow_cards,team_b_yellow_cards,team_a_red_cards,team_b_red_cards,team_a_shotsOnTarget,team_b_shotsOnTarget,team_a_shotsOffTarget,team_b_shotsOffTarget,team_a_shots,team_b_shots,team_a_fouls,team_b_fouls,team_a_possession,team_b_possession,refereeID,coach_a_ID,coach_b_ID,stadium_name,stadium_location,team_a_cards_num,team_b_cards_num,odds_ft_1,odds_ft_x,odds_ft_2,odds_ft_over05,odds_ft_over15,odds_ft_over25,odds_ft_over35,odds_ft_over45,odds_ft_under05,odds_ft_under15,odds_ft_under25,odds_ft_under35,odds_ft_under45,odds_btts_yes,odds_btts_no,odds_team_a_cs_yes,odds_team_a_cs_no,odds_team_b_cs_yes,odds_team_b_cs_no,odds_doublechance_1x,odds_doublechance_12,odds_doublechance_x2,odds_1st_half_result_1,odds_1st_half_result_x,odds_1st_half_result_2,odds_2nd_half_result_1,odds_2nd_half_result_x,odds_2nd_half_result_2,odds_dnb_1,odds_dnb_2,odds_corners_over_75,odds_corners_over_85,odds_corners_over_95,odds_corners_over_105,odds_corners_over_115,odds_corners_under_75,odds_corners_under_85,odds_corners_under_95,odds_corners_under_105,odds_corners_under_115,odds_corners_1,odds_corners_x,odds_corners_2,odds_team_to_score_first_1,odds_team_to_score_first_x,odds_team_to_score_first_2,odds_win_to_nil_1,odds_win_to_nil_2,odds_1st_half_over05,odds_1st_half_over15,odds_1st_half_over25,odds_1st_half_over35,odds_1st_half_under05,odds_1st_half_under15,odds_1st_half_under25,odds_1st_half_under35,odds_2nd_half_over05,odds_2nd_half_over15,odds_2nd_half_over25,odds_2nd_half_over35,odds_2nd_half_under05,odds_2nd_half_under15,odds_2nd_half_under25,odds_2nd_half_under35,odds_btts_1st_half_yes,odds_btts_1st_half_no,odds_btts_2nd_half_yes,odds_btts_2nd_half_no,overallGoalCount,ht_goals_team_a,ht_goals_team_b,goals_2hg_team_a,goals_2hg_team_b,GoalCount_2hg,HTGoalCount,date_unix,winningTeam,no_home_away,btts_potential,btts_fhg_potential,btts_2hg_potential,goalTimingDisabled,attendance,corner_timings_recorded,card_timings_recorded,team_a_fh_corners,team_b_fh_corners,team_a_2h_corners,team_b_2h_corners,corner_fh_count,corner_2h_count,team_a_fh_cards,team_b_fh_cards,team_a_2h_cards,team_b_2h_cards,total_fh_cards,total_2h_cards,attacks_recorded,team_a_dangerous_attacks,team_b_dangerous_attacks,team_a_attacks,team_b_attacks,team_a_xg,team_b_xg,total_xg,team_a_penalties_won,team_b_penalties_won,team_a_penalty_goals,team_b_penalty_goals,team_a_penalty_missed,team_b_penalty_missed,pens_recorded,goal_timings_recorded,team_a_0_10_min_goals,team_b_0_10_min_goals,team_a_corners_0_10_min,team_b_corners_0_10_min,team_a_cards_0_10_min,team_b_cards_0_10_min,throwins_recorded,team_a_throwins,team_b_throwins,freekicks_recorded,team_a_freekicks,team_b_freekicks,goalkicks_recorded,team_a_goalkicks,team_b_goalkicks,o45_potential,o35_potential,o25_potential,o15_potential,o05_potential,o15HT_potential,o05HT_potential,o05_2H_potential,o15_2H_potential,corners_potential,offsides_potential,cards_potential,avg_potential,home_url,home_image,home_name,away_url,away_image,away_name,home_ppg,away_ppg,pre_match_home_ppg,pre_match_away_ppg,pre_match_teamA_overall_ppg,pre_match_teamB_overall_ppg,u45_potential,u35_potential,u25_potential,u15_potential,u05_potential,corners_o85_potential,corners_o95_potential,corners_o105_potential,team_a_xg_prematch,team_b_xg_prematch,total_xg_prematch,match_url,competition_id,matches_completed_minimum,over05,over15,over25,over35,over45,over55,btts,homeGoals_timings,awayGoals_timings
0,2155,150,108,2016/2017,complete,19,1,-1,"['45+1', '57']",['47'],2,1,3,5,3,8,1,0,2,2,0,0,4,6,7,9,11,15,7,17,50,50,685.0,196.0,160.0,KCOM Stadium (Hull),,2,2,3.41,3.19,2.39,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,3,1,0,1,1,2,1,1471087800,150,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,0,2,2,0,2,2,-1,0,0,0,0,0.0,0.0,0.0,0,1,0,1,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,/clubs/hull-city-afc-150,teams/england-hull-city-afc.png,Hull City,/clubs/leicester-city-fc-108,teams/england-leicester-city-fc.png,Leicester City,1.47,0.53,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/leicester-city-fc-vs-hull-city-afc-h2...,9,38,True,True,True,False,False,False,True,"['45+1', '57']",['47']
1,2156,145,154,2016/2017,complete,19,1,-1,[],['82'],0,1,1,7,4,11,2,2,3,2,0,0,6,8,6,7,12,15,10,14,55,45,688.0,197.0,198.0,Turf Moor (Burnley),,3,2,2.45,3.22,3.26,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,1,0,0,0,1,1,0,1471096800,154,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,1,0,2,2,1,4,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,/clubs/burnley-fc-145,teams/england-burnley-fc.png,Burnley,/clubs/swansea-city-afc-154,teams/wales-swansea-city-afc.png,Swansea City,1.74,0.74,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/burnley-fc-vs-swansea-city-afc-h2h-st...,9,38,True,False,False,False,False,False,False,[],['82']
2,2157,143,142,2016/2017,complete,19,1,-1,[],['74'],0,1,1,3,6,9,0,2,2,2,0,0,7,6,5,6,12,12,12,14,54,46,360.0,199.0,200.0,Selhurst Park (London),,2,2,2.2,3.25,3.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,1,0,0,0,1,1,0,1471096800,142,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,0,1,2,1,1,3,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,/clubs/crystal-palace-fc-143,teams/england-crystal-palace-fc.png,Crystal Palace,/clubs/west-bromwich-albion-fc-142,teams/england-west-bromwich-albion-fc.png,West Bromwich Albion,1.05,0.84,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/west-bromwich-albion-fc-vs-crystal-pa...,9,38,True,False,False,False,False,False,False,[],['74']
3,2158,144,92,2016/2017,complete,19,1,-1,['5'],['59'],1,1,2,5,6,11,4,0,0,0,0,0,7,4,3,2,10,6,10,14,41,59,537.0,201.0,156.0,Goodison Park (Liverpool),,0,0,3.13,3.36,2.45,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,2,1,0,0,1,1,1,1471096800,-1,0,0,0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,1,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,/clubs/everton-fc-144,teams/england-everton-fc.png,Everton,/clubs/tottenham-hotspur-fc-92,teams/england-tottenham-hotspur-fc.png,Tottenham Hotspur,2.26,1.74,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/tottenham-hotspur-fc-vs-everton-fc-h2...,9,38,True,True,False,False,False,False,True,['5'],['59']
4,2159,147,141,2016/2017,complete,19,1,-1,['11'],['67'],1,1,2,9,6,15,1,3,3,5,0,0,3,2,5,8,8,10,18,13,46,54,693.0,202.0,203.0,Riverside Stadium (Middlesbrough),,3,5,2.49,3.2,3.21,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,2,1,0,0,1,1,1,1471096800,-1,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,2,2,1,3,4,4,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,/clubs/middlesbrough-fc-147,teams/england-middlesbrough-fc.png,Middlesbrough,/clubs/stoke-city-fc-141,teams/england-stoke-city-fc.png,Stoke City,0.95,0.89,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/stoke-city-fc-vs-middlesbrough-fc-h2h...,9,38,True,True,False,False,False,False,True,['11'],['67']


In [6]:
# inspect the last rows
data.tail()

Unnamed: 0,id,homeID,awayID,season,status,roundID,game_week,revised_game_week,homeGoals,awayGoals,homeGoalCount,awayGoalCount,totalGoalCount,team_a_corners,team_b_corners,totalCornerCount,team_a_offsides,team_b_offsides,team_a_yellow_cards,team_b_yellow_cards,team_a_red_cards,team_b_red_cards,team_a_shotsOnTarget,team_b_shotsOnTarget,team_a_shotsOffTarget,team_b_shotsOffTarget,team_a_shots,team_b_shots,team_a_fouls,team_b_fouls,team_a_possession,team_b_possession,refereeID,coach_a_ID,coach_b_ID,stadium_name,stadium_location,team_a_cards_num,team_b_cards_num,odds_ft_1,odds_ft_x,odds_ft_2,odds_ft_over05,odds_ft_over15,odds_ft_over25,odds_ft_over35,odds_ft_over45,odds_ft_under05,odds_ft_under15,odds_ft_under25,odds_ft_under35,odds_ft_under45,odds_btts_yes,odds_btts_no,odds_team_a_cs_yes,odds_team_a_cs_no,odds_team_b_cs_yes,odds_team_b_cs_no,odds_doublechance_1x,odds_doublechance_12,odds_doublechance_x2,odds_1st_half_result_1,odds_1st_half_result_x,odds_1st_half_result_2,odds_2nd_half_result_1,odds_2nd_half_result_x,odds_2nd_half_result_2,odds_dnb_1,odds_dnb_2,odds_corners_over_75,odds_corners_over_85,odds_corners_over_95,odds_corners_over_105,odds_corners_over_115,odds_corners_under_75,odds_corners_under_85,odds_corners_under_95,odds_corners_under_105,odds_corners_under_115,odds_corners_1,odds_corners_x,odds_corners_2,odds_team_to_score_first_1,odds_team_to_score_first_x,odds_team_to_score_first_2,odds_win_to_nil_1,odds_win_to_nil_2,odds_1st_half_over05,odds_1st_half_over15,odds_1st_half_over25,odds_1st_half_over35,odds_1st_half_under05,odds_1st_half_under15,odds_1st_half_under25,odds_1st_half_under35,odds_2nd_half_over05,odds_2nd_half_over15,odds_2nd_half_over25,odds_2nd_half_over35,odds_2nd_half_under05,odds_2nd_half_under15,odds_2nd_half_under25,odds_2nd_half_under35,odds_btts_1st_half_yes,odds_btts_1st_half_no,odds_btts_2nd_half_yes,odds_btts_2nd_half_no,overallGoalCount,ht_goals_team_a,ht_goals_team_b,goals_2hg_team_a,goals_2hg_team_b,GoalCount_2hg,HTGoalCount,date_unix,winningTeam,no_home_away,btts_potential,btts_fhg_potential,btts_2hg_potential,goalTimingDisabled,attendance,corner_timings_recorded,card_timings_recorded,team_a_fh_corners,team_b_fh_corners,team_a_2h_corners,team_b_2h_corners,corner_fh_count,corner_2h_count,team_a_fh_cards,team_b_fh_cards,team_a_2h_cards,team_b_2h_cards,total_fh_cards,total_2h_cards,attacks_recorded,team_a_dangerous_attacks,team_b_dangerous_attacks,team_a_attacks,team_b_attacks,team_a_xg,team_b_xg,total_xg,team_a_penalties_won,team_b_penalties_won,team_a_penalty_goals,team_b_penalty_goals,team_a_penalty_missed,team_b_penalty_missed,pens_recorded,goal_timings_recorded,team_a_0_10_min_goals,team_b_0_10_min_goals,team_a_corners_0_10_min,team_b_corners_0_10_min,team_a_cards_0_10_min,team_b_cards_0_10_min,throwins_recorded,team_a_throwins,team_b_throwins,freekicks_recorded,team_a_freekicks,team_b_freekicks,goalkicks_recorded,team_a_goalkicks,team_b_goalkicks,o45_potential,o35_potential,o25_potential,o15_potential,o05_potential,o15HT_potential,o05HT_potential,o05_2H_potential,o15_2H_potential,corners_potential,offsides_potential,cards_potential,avg_potential,home_url,home_image,home_name,away_url,away_image,away_name,home_ppg,away_ppg,pre_match_home_ppg,pre_match_away_ppg,pre_match_teamA_overall_ppg,pre_match_teamB_overall_ppg,u45_potential,u35_potential,u25_potential,u15_potential,u05_potential,corners_o85_potential,corners_o95_potential,corners_o105_potential,team_a_xg_prematch,team_b_xg_prematch,total_xg_prematch,match_url,competition_id,matches_completed_minimum,over05,over15,over25,over35,over45,over55,btts,homeGoals_timings,awayGoals_timings
6455,6689328,143,158,2023/2024,incomplete,100543,38,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,Selhurst Park (London),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1716130800,-1,0,65,25,35,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,20,35,60,80,90,40,60,90,50,12.2,1.8,4.9,2.9,/clubs/crystal-palace-fc-143,teams/england-crystal-palace-fc.png,Crystal Palace,/clubs/aston-villa-fc-158,teams/england-aston-villa-fc.png,Aston Villa,0.9,1.4,0.9,1.4,1.05,2.1,80,65,40,20,10,85,75,55,1.3,1.48,2.78,/england/crystal-palace-fc-vs-aston-villa-fc-h...,9660,20,False,False,False,False,False,False,False,[],[]
6456,6689329,151,223,2023/2024,incomplete,100543,38,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,Anfield (Liverpool),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1716130800,-1,0,60,30,30,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,25,35,70,85,95,50,60,90,50,11.0,4.2,4.3,3.25,/clubs/liverpool-fc-151,teams/england-liverpool-fc.png,Liverpool,/clubs/wolverhampton-wanderers-fc-223,teams/england-wolverhampton-wanderers-fc.png,Wolverhampton Wanderers,2.6,1.0,2.6,1.0,2.25,1.4,75,65,30,15,5,70,65,50,2.28,1.27,3.55,/england/liverpool-fc-vs-wolverhampton-wandere...,9660,20,False,False,False,False,False,False,False,[],[]
6457,6689330,271,162,2023/2024,incomplete,100543,38,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,"Kenilworth Road (Luton, Bedfordshire)",,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1716130800,-1,0,65,15,50,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,20,30,60,80,95,25,65,90,70,9.5,3.7,5.0,3.1,/clubs/luton-town-fc-271,teams/england-luton-town-fc.png,Luton Town,/clubs/fulham-fc-162,teams/england-fulham-fc.png,Fulham,0.8,0.6,0.8,0.6,0.79,1.2,80,70,40,20,5,75,55,50,1.36,1.11,2.47,/england/fulham-fc-vs-luton-town-fc-h2h-stats,9660,19,False,False,False,False,False,False,False,[],[]
6458,6689331,93,153,2023/2024,incomplete,100543,38,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,Etihad Stadium (Manchester),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1716130800,-1,0,74,21,42,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,32,47,68,95,100,43,90,89,67,11.71,3.43,4.31,3.64,/clubs/manchester-city-fc-93,teams/england-manchester-city-fc.png,Manchester City,/clubs/west-ham-united-fc-153,teams/england-west-ham-united-fc.png,West Ham United,2.33,1.6,2.33,1.6,2.11,1.7,69,53,32,6,0,79,63,63,1.67,1.21,2.88,/england/manchester-city-fc-vs-west-ham-united...,9660,19,False,False,False,False,False,False,False,[],[]
6459,6689332,251,92,2023/2024,incomplete,100543,38,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,Bramall Lane (Sheffield),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1716130800,-1,0,65,30,45,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,25,45,70,85,100,45,75,90,55,8.4,3.6,6.6,3.6,/clubs/sheffield-united-fc-251,teams/england-sheffield-united-fc.png,Sheffield United,/clubs/tottenham-hotspur-fc-92,teams/england-tottenham-hotspur-fc.png,Tottenham Hotspur,0.7,1.8,0.7,1.8,0.45,1.95,75,55,30,15,0,80,70,55,1.14,1.47,2.61,/england/tottenham-hotspur-fc-vs-sheffield-uni...,9660,20,False,False,False,False,False,False,False,[],[]


In [7]:
# inspect random sample of rows
data.sample(5)

Unnamed: 0,id,homeID,awayID,season,status,roundID,game_week,revised_game_week,homeGoals,awayGoals,homeGoalCount,awayGoalCount,totalGoalCount,team_a_corners,team_b_corners,totalCornerCount,team_a_offsides,team_b_offsides,team_a_yellow_cards,team_b_yellow_cards,team_a_red_cards,team_b_red_cards,team_a_shotsOnTarget,team_b_shotsOnTarget,team_a_shotsOffTarget,team_b_shotsOffTarget,team_a_shots,team_b_shots,team_a_fouls,team_b_fouls,team_a_possession,team_b_possession,refereeID,coach_a_ID,coach_b_ID,stadium_name,stadium_location,team_a_cards_num,team_b_cards_num,odds_ft_1,odds_ft_x,odds_ft_2,odds_ft_over05,odds_ft_over15,odds_ft_over25,odds_ft_over35,odds_ft_over45,odds_ft_under05,odds_ft_under15,odds_ft_under25,odds_ft_under35,odds_ft_under45,odds_btts_yes,odds_btts_no,odds_team_a_cs_yes,odds_team_a_cs_no,odds_team_b_cs_yes,odds_team_b_cs_no,odds_doublechance_1x,odds_doublechance_12,odds_doublechance_x2,odds_1st_half_result_1,odds_1st_half_result_x,odds_1st_half_result_2,odds_2nd_half_result_1,odds_2nd_half_result_x,odds_2nd_half_result_2,odds_dnb_1,odds_dnb_2,odds_corners_over_75,odds_corners_over_85,odds_corners_over_95,odds_corners_over_105,odds_corners_over_115,odds_corners_under_75,odds_corners_under_85,odds_corners_under_95,odds_corners_under_105,odds_corners_under_115,odds_corners_1,odds_corners_x,odds_corners_2,odds_team_to_score_first_1,odds_team_to_score_first_x,odds_team_to_score_first_2,odds_win_to_nil_1,odds_win_to_nil_2,odds_1st_half_over05,odds_1st_half_over15,odds_1st_half_over25,odds_1st_half_over35,odds_1st_half_under05,odds_1st_half_under15,odds_1st_half_under25,odds_1st_half_under35,odds_2nd_half_over05,odds_2nd_half_over15,odds_2nd_half_over25,odds_2nd_half_over35,odds_2nd_half_under05,odds_2nd_half_under15,odds_2nd_half_under25,odds_2nd_half_under35,odds_btts_1st_half_yes,odds_btts_1st_half_no,odds_btts_2nd_half_yes,odds_btts_2nd_half_no,overallGoalCount,ht_goals_team_a,ht_goals_team_b,goals_2hg_team_a,goals_2hg_team_b,GoalCount_2hg,HTGoalCount,date_unix,winningTeam,no_home_away,btts_potential,btts_fhg_potential,btts_2hg_potential,goalTimingDisabled,attendance,corner_timings_recorded,card_timings_recorded,team_a_fh_corners,team_b_fh_corners,team_a_2h_corners,team_b_2h_corners,corner_fh_count,corner_2h_count,team_a_fh_cards,team_b_fh_cards,team_a_2h_cards,team_b_2h_cards,total_fh_cards,total_2h_cards,attacks_recorded,team_a_dangerous_attacks,team_b_dangerous_attacks,team_a_attacks,team_b_attacks,team_a_xg,team_b_xg,total_xg,team_a_penalties_won,team_b_penalties_won,team_a_penalty_goals,team_b_penalty_goals,team_a_penalty_missed,team_b_penalty_missed,pens_recorded,goal_timings_recorded,team_a_0_10_min_goals,team_b_0_10_min_goals,team_a_corners_0_10_min,team_b_corners_0_10_min,team_a_cards_0_10_min,team_b_cards_0_10_min,throwins_recorded,team_a_throwins,team_b_throwins,freekicks_recorded,team_a_freekicks,team_b_freekicks,goalkicks_recorded,team_a_goalkicks,team_b_goalkicks,o45_potential,o35_potential,o25_potential,o15_potential,o05_potential,o15HT_potential,o05HT_potential,o05_2H_potential,o15_2H_potential,corners_potential,offsides_potential,cards_potential,avg_potential,home_url,home_image,home_name,away_url,away_image,away_name,home_ppg,away_ppg,pre_match_home_ppg,pre_match_away_ppg,pre_match_teamA_overall_ppg,pre_match_teamB_overall_ppg,u45_potential,u35_potential,u25_potential,u15_potential,u05_potential,corners_o85_potential,corners_o95_potential,corners_o105_potential,team_a_xg_prematch,team_b_xg_prematch,total_xg_prematch,match_url,competition_id,matches_completed_minimum,over05,over15,over25,over35,over45,over55,btts,homeGoals_timings,awayGoals_timings
221,2376,59,155,2016/2017,complete,19,23,-1,['58'],"['10', '13']",1,2,3,9,5,14,2,1,3,3,0,0,7,6,7,2,14,8,10,15,63,37,,145.0,205.0,Emirates Stadium,"Queensland Road, London",3,3,1.22,6.87,9.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,3,0,2,1,0,1,2,1485891900,155,0,55,27,27,0,60035,1,1,3,3,6,2,6,8,2,0,1,3,2,4,1,0,0,0,0,2.15,1.37,3.52,0,0,0,0,0,0,1,1,0,1,2,0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,18,36,50,78,91,41,73,87,41,11.28,5.0,3.91,2.96,/clubs/arsenal-fc-59,teams/england-arsenal-fc.png,Arsenal,/clubs/watford-fc-155,teams/england-watford-fc.png,Watford,2.37,0.63,2.36,0.82,2.14,1.09,82,64,50,23,9,78,59,50,2.28,1.48,3.76,/england/arsenal-fc-vs-watford-fc-h2h-stats#2376,9,38,True,True,True,False,False,False,True,['58'],"['10', '13']"
771,2926,152,108,2014/2015,complete,21,2,-1,"['62', '77']",[],2,0,2,8,7,15,2,2,0,1,0,0,9,5,10,1,19,6,11,12,62,38,361.0,208.0,233.0,Stamford Bridge (London),,0,1,1.26,6.4,9.99,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,2,0,0,2,0,2,0,1408802400,152,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,0,0,0,1,0,1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,/clubs/chelsea-fc-152,teams/england-chelsea-fc.png,Chelsea,/clubs/leicester-city-fc-108,teams/england-leicester-city-fc.png,Leicester City,2.58,0.79,0.0,0.0,3.0,1.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/leicester-city-fc-vs-chelsea-fc-h2h-s...,11,38,True,True,False,False,False,False,False,"['62', '77']",[]
3349,751798,154,157,2011/2012,complete,53591,32,-1,[],"['5', '69']",0,2,2,-1,-1,-1,-1,-1,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,29014.0,37.0,199.0,Liberty Stadium (Swansea),,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,2,0,1,0,1,1,1,1333726200,157,0,54,14,27,0,19874,-1,1,-1,-1,-1,-1,-1,-1,0,1,1,0,1,1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,1,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,17,34,44,77,87,33,63,83,40,0.0,0.0,3.57,2.67,/clubs/swansea-city-afc-154,teams/wales-swansea-city-afc.png,Swansea City,/clubs/newcastle-united-fc-157,teams/england-newcastle-united-fc.png,Newcastle United,1.63,1.42,1.6,1.4,1.26,1.71,84,67,57,23,14,0,0,0,0.0,0.0,0.0,/england/swansea-city-afc-vs-newcastle-united-...,3119,38,True,True,False,False,False,False,False,[],"['5', '69']"
3048,751497,93,154,2011/2012,complete,53591,1,-1,"['57', '68', '71', '90+1']",[],4,0,4,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,685.0,1097.0,37.0,Etihad Stadium (Manchester),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,4,0,0,4,0,4,0,1313434800,93,0,0,0,0,0,46802,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,/clubs/manchester-city-fc-93,teams/england-manchester-city-fc.png,Manchester City,/clubs/swansea-city-afc-154,teams/wales-swansea-city-afc.png,Swansea City,2.89,0.84,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/manchester-city-fc-vs-swansea-city-af...,3119,38,True,True,True,True,False,False,False,"['57', '68', '71', '90+1']",[]
1359,3514,142,144,2013/2014,complete,22,22,-1,['75'],['41'],1,1,2,4,2,6,1,2,3,2,0,0,6,5,1,5,7,10,18,12,49,51,,257.0,221.0,The Hawthorns (West Bromwich),"Halfords Lane, West Bromwich",3,2,3.4,3.48,2.28,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,2,0,1,1,0,1,1,1390248000,-1,0,50,5,30,0,24184,-1,1,-1,-1,-1,-1,-1,-1,2,0,1,2,2,3,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,25,35,75,90,15,50,90,45,10.7,4.9,2.7,2.35,/clubs/west-bromwich-albion-fc-142,teams/england-west-bromwich-albion-fc.png,West Bromwich Albion,/clubs/everton-fc-144,teams/england-everton-fc.png,Everton,1.11,1.58,1.2,1.7,1.0,1.95,90,75,65,25,10,65,55,50,0.0,0.0,0.0,/england/west-bromwich-albion-fc-vs-everton-fc...,12,38,True,True,False,False,False,False,True,['75'],['41']


### Columns of interest:
These are columns that were identified as containing interesting, problematic, or missing data and that warranted further inspection.
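
One way to surface such columns is to count, for every column, how many values are missing or look like placeholders. The sketch below runs on a toy frame whose values are loosely copied from the snapshots above. Treating `-1` as a "not recorded" marker is an assumption based on those snapshots, not something the documentation confirms, and the `PLACEHOLDER` and `summary` names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `data`; values loosely follow the snapshots above
sample = pd.DataFrame({
    "team_a_corners": [5, 7, -1, -1],
    "attendance": [-1, -1, 60035, 19874],
    "refereeID": [685.0, 688.0, np.nan, 29014.0],
    "homeGoalCount": [2, 0, 1, 0],
})

PLACEHOLDER = -1  # assumption: -1 marks "not recorded" rather than a real value

# Per-column counts of missing values and of placeholder values
summary = pd.DataFrame({
    "n_missing": sample.isna().sum(),
    "n_placeholder": (sample == PLACEHOLDER).sum(),
})
summary["share_flagged"] = (
    summary["n_missing"] + summary["n_placeholder"]
) / len(sample)

# Columns with the largest share of flagged values come first
print(summary.sort_values("share_flagged", ascending=False))
```

Applied to `data`, a summary like this gives a shortlist of columns to inspect by hand. It does not decide whether a flagged value is truly missing.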

#### Status Column
The first column of interest is the `status` column, which indicates whether a match was completed. Looking at the head, tail, and random-sample snapshots of the dataframe, the status of a match can be either complete or incomplete. Let's inspect the unique values in the column:

In [8]:
# check unique values in the status column
data['status'].unique()

array(['complete', 'incomplete', 'suspended'], dtype=object)

Based on the output above, match status can be `complete`, `incomplete`, or `suspended`. The dataframe snapshots, particularly the tail, suggest that most of the incomplete matches are from the 2023/2024 season. This makes sense because the 2023/2024 season is ongoing, so there are bound to be incomplete matches. For our analysis, we will want to drop entries where the match status is `incomplete`.
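
A minimal sketch of how the status values could be tallied per season and the incomplete matches dropped is shown here. The toy frame stands in for `data`, and its `suspended` row is invented for illustration. The `statuses_to_drop` list and the `kept` name are assumptions. Whether `suspended` matches should be dropped as well is left open.

```python
import pandas as pd

# Toy stand-in for `data`; the suspended row is made up for illustration
sample = pd.DataFrame({
    "id": [2155, 2156, 6689300, 6689328, 999],
    "season": ["2016/2017", "2016/2017", "2023/2024", "2023/2024", "2019/2020"],
    "status": ["complete", "complete", "incomplete", "incomplete", "suspended"],
})

# Number of matches per status, overall and broken down by season
status_counts = sample["status"].value_counts()
status_by_season = pd.crosstab(sample["season"], sample["status"])
print(status_counts)
print(status_by_season)

# Assumption: only 'incomplete' is dropped; add 'suspended' here if needed
statuses_to_drop = ["incomplete"]
kept = sample[~sample["status"].isin(statuses_to_drop)].reset_index(drop=True)
print(kept.shape)
```

The season-by-status table is a quick way to confirm that the incomplete matches really do sit in the ongoing season before any rows are removed.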

In [9]:
# Check list of incomplete matches
data[data['status'] == 'incomplete'].sample(10)

Unnamed: 0,id,homeID,awayID,season,status,roundID,game_week,revised_game_week,homeGoals,awayGoals,homeGoalCount,awayGoalCount,totalGoalCount,team_a_corners,team_b_corners,totalCornerCount,team_a_offsides,team_b_offsides,team_a_yellow_cards,team_b_yellow_cards,team_a_red_cards,team_b_red_cards,team_a_shotsOnTarget,team_b_shotsOnTarget,team_a_shotsOffTarget,team_b_shotsOffTarget,team_a_shots,team_b_shots,team_a_fouls,team_b_fouls,team_a_possession,team_b_possession,refereeID,coach_a_ID,coach_b_ID,stadium_name,stadium_location,team_a_cards_num,team_b_cards_num,odds_ft_1,odds_ft_x,odds_ft_2,odds_ft_over05,odds_ft_over15,odds_ft_over25,odds_ft_over35,odds_ft_over45,odds_ft_under05,odds_ft_under15,odds_ft_under25,odds_ft_under35,odds_ft_under45,odds_btts_yes,odds_btts_no,odds_team_a_cs_yes,odds_team_a_cs_no,odds_team_b_cs_yes,odds_team_b_cs_no,odds_doublechance_1x,odds_doublechance_12,odds_doublechance_x2,odds_1st_half_result_1,odds_1st_half_result_x,odds_1st_half_result_2,odds_2nd_half_result_1,odds_2nd_half_result_x,odds_2nd_half_result_2,odds_dnb_1,odds_dnb_2,odds_corners_over_75,odds_corners_over_85,odds_corners_over_95,odds_corners_over_105,odds_corners_over_115,odds_corners_under_75,odds_corners_under_85,odds_corners_under_95,odds_corners_under_105,odds_corners_under_115,odds_corners_1,odds_corners_x,odds_corners_2,odds_team_to_score_first_1,odds_team_to_score_first_x,odds_team_to_score_first_2,odds_win_to_nil_1,odds_win_to_nil_2,odds_1st_half_over05,odds_1st_half_over15,odds_1st_half_over25,odds_1st_half_over35,odds_1st_half_under05,odds_1st_half_under15,odds_1st_half_under25,odds_1st_half_under35,odds_2nd_half_over05,odds_2nd_half_over15,odds_2nd_half_over25,odds_2nd_half_over35,odds_2nd_half_under05,odds_2nd_half_under15,odds_2nd_half_under25,odds_2nd_half_under35,odds_btts_1st_half_yes,odds_btts_1st_half_no,odds_btts_2nd_half_yes,odds_btts_2nd_half_no,overallGoalCount,ht_goals_team_a,ht_goals_team_b,goals_2hg_team_a,goals_2hg_team_b,GoalCount_2hg,HTGoalCount,date_unix,winningTeam,no_home_away,btts_potential,btts_fhg_potential,btts_2hg_potential,goalTimingDisabled,attendance,corner_timings_recorded,card_timings_recorded,team_a_fh_corners,team_b_fh_corners,team_a_2h_corners,team_b_2h_corners,corner_fh_count,corner_2h_count,team_a_fh_cards,team_b_fh_cards,team_a_2h_cards,team_b_2h_cards,total_fh_cards,total_2h_cards,attacks_recorded,team_a_dangerous_attacks,team_b_dangerous_attacks,team_a_attacks,team_b_attacks,team_a_xg,team_b_xg,total_xg,team_a_penalties_won,team_b_penalties_won,team_a_penalty_goals,team_b_penalty_goals,team_a_penalty_missed,team_b_penalty_missed,pens_recorded,goal_timings_recorded,team_a_0_10_min_goals,team_b_0_10_min_goals,team_a_corners_0_10_min,team_b_corners_0_10_min,team_a_cards_0_10_min,team_b_cards_0_10_min,throwins_recorded,team_a_throwins,team_b_throwins,freekicks_recorded,team_a_freekicks,team_b_freekicks,goalkicks_recorded,team_a_goalkicks,team_b_goalkicks,o45_potential,o35_potential,o25_potential,o15_potential,o05_potential,o15HT_potential,o05HT_potential,o05_2H_potential,o15_2H_potential,corners_potential,offsides_potential,cards_potential,avg_potential,home_url,home_image,home_name,away_url,away_image,away_name,home_ppg,away_ppg,pre_match_home_ppg,pre_match_away_ppg,pre_match_teamA_overall_ppg,pre_match_teamB_overall_ppg,u45_potential,u35_potential,u25_potential,u15_potential,u05_potential,corners_o85_potential,corners_o95_potential,corners_o105_potential,team_a_xg_prematch,team_b_xg_prematch,total_xg_prematch,match_url,competition_id,matches_completed_minimum,over05,over15,over25,over35,over45,over55,btts,homeGoals_timings,awayGoals_timings
6427,6689300,92,59,2023/2024,incomplete,100543,35,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,Tottenham Hotspur Stadium (London),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1714226400,-1,0,60,30,25,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,15,30,60,75,100,45,70,90,45,13.3,4.8,4.6,2.9,/clubs/tottenham-hotspur-fc-92,teams/england-tottenham-hotspur-fc.png,Tottenham Hotspur,/clubs/arsenal-fc-59,teams/england-arsenal-fc.png,Arsenal,2.1,1.7,2.1,1.7,1.95,2.0,85,70,40,25,0,85,55,55,2.03,1.49,3.52,/england/arsenal-fc-vs-tottenham-hotspur-fc-h2...,9660,20,False,False,False,False,False,False,False,[],[]
6295,6689168,153,148,2023/2024,incomplete,100543,22,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,London Stadium (London),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1706815800,-1,0,60,25,35,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,15,55,70,90,95,50,80,90,60,10.0,2.8,4.6,3.35,/clubs/west-ham-united-fc-153,teams/england-west-ham-united-fc.png,West Ham United,/clubs/afc-bournemouth-148,teams/england-afc-bournemouth.png,AFC Bournemouth,1.8,1.3,1.8,1.3,1.7,1.32,85,45,30,10,5,65,65,65,1.32,1.5,2.82,/england/afc-bournemouth-vs-west-ham-united-fc...,9660,19,False,False,False,False,False,False,False,[],[]
6410,6689283,158,148,2023/2024,incomplete,100543,34,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,Villa Park (Birmingham),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1713621600,-1,0,70,20,60,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,25,70,80,90,100,50,80,95,75,10.7,2.0,5.1,3.85,/clubs/aston-villa-fc-158,teams/england-aston-villa-fc.png,Aston Villa,/clubs/afc-bournemouth-148,teams/england-afc-bournemouth.png,AFC Bournemouth,2.8,1.3,2.8,1.3,2.1,1.32,75,30,20,10,0,60,60,50,1.63,1.5,3.13,/england/afc-bournemouth-vs-aston-villa-fc-h2h...,9660,19,False,False,False,False,False,False,False,[],[]
6334,6689207,152,92,2023/2024,incomplete,100543,26,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,Stamford Bridge (London),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1708718400,-1,0,60,40,40,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,25,40,60,85,100,45,70,90,60,9.8,4.3,6.3,3.45,/clubs/chelsea-fc-152,teams/england-chelsea-fc.png,Chelsea,/clubs/tottenham-hotspur-fc-92,teams/england-tottenham-hotspur-fc.png,Tottenham Hotspur,1.5,1.8,1.5,1.8,1.4,1.95,75,60,40,15,0,75,65,55,1.51,1.47,2.98,/england/tottenham-hotspur-fc-vs-chelsea-fc-h2...,9660,20,False,False,False,False,False,False,False,[],[]
6324,6689197,271,149,2023/2024,incomplete,100543,25,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,"Kenilworth Road (Luton, Bedfordshire)",,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1708273800,-1,0,55,15,40,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,15,50,70,95,20,50,85,65,10.0,3.9,4.8,2.5,/clubs/luton-town-fc-271,teams/england-luton-town-fc.png,Luton Town,/clubs/manchester-united-fc-149,teams/england-manchester-united-fc.png,Manchester United,0.8,1.3,0.8,1.3,0.79,1.55,90,85,50,30,5,95,70,65,1.36,1.22,2.58,/england/manchester-united-fc-vs-luton-town-fc...,9660,19,False,False,False,False,False,False,False,[],[]
6448,6689321,153,271,2023/2024,incomplete,100543,37,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,London Stadium (London),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1715436000,-1,0,59,16,38,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,16,48,64,79,95,26,63,85,59,8.61,3.48,4.06,3.02,/clubs/west-ham-united-fc-153,teams/england-west-ham-united-fc.png,West Ham United,/clubs/luton-town-fc-271,teams/england-luton-town-fc.png,Luton Town,1.8,0.78,1.8,0.78,1.7,0.79,84,52,36,21,5,64,59,47,1.32,0.94,2.26,/england/west-ham-united-fc-vs-luton-town-fc-h...,9660,19,False,False,False,False,False,False,False,[],[]
6422,6689295,144,218,2023/2024,incomplete,100543,35,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,Goodison Park (Liverpool),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1714226400,-1,0,37,16,6,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,11,53,74,100,21,59,90,48,9.88,3.03,4.66,2.37,/clubs/everton-fc-144,teams/england-everton-fc.png,Everton,/clubs/brentford-fc-218,teams/england-brentford-fc.png,Brentford,1.0,0.78,1.0,0.78,1.3,1.0,100,90,47,26,0,80,75,64,1.67,1.33,3.0,/england/everton-fc-vs-brentford-fc-h2h-stats,9660,19,False,False,False,False,False,False,False,[],[]
6449,6689322,223,143,2023/2024,incomplete,100543,37,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,"Molineux Stadium (Wolverhampton, West Midlands)",,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1715436000,-1,0,70,15,35,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,5,30,60,85,100,20,75,85,50,8.9,3.9,4.5,2.8,/clubs/wolverhampton-wanderers-fc-223,teams/england-wolverhampton-wanderers-fc.png,Wolverhampton Wanderers,/clubs/crystal-palace-fc-143,teams/england-crystal-palace-fc.png,Crystal Palace,1.8,1.2,1.8,1.2,1.4,1.05,95,70,40,15,0,55,50,45,1.23,1.19,2.42,/england/crystal-palace-fc-vs-wolverhampton-wa...,9660,20,False,False,False,False,False,False,False,[],[]
6372,6689245,218,149,2023/2024,incomplete,100543,30,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,"Gtech Community Stadium (Brentford, Middlesex)",,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1711810800,-1,0,55,35,15,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,35,60,75,95,35,60,85,55,8.7,4.3,4.7,2.75,/clubs/brentford-fc-218,teams/england-brentford-fc.png,Brentford,/clubs/manchester-united-fc-149,teams/england-manchester-united-fc.png,Manchester United,1.2,1.3,1.2,1.3,1.0,1.55,90,65,40,25,5,80,55,45,1.49,1.22,2.71,/england/manchester-united-fc-vs-brentford-fc-...,9660,19,False,False,False,False,False,False,False,[],[]
6350,6689223,148,251,2023/2024,incomplete,100543,28,-1,[],[],0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,-1.0,-1.0,"Vitality Stadium (Bournemouth, Dorset)",,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,1709996400,-1,0,47,11,31,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,10,26,53,95,95,27,64,90,69,8.89,3.57,4.98,2.78,/clubs/afc-bournemouth-148,teams/england-afc-bournemouth.png,AFC Bournemouth,/clubs/sheffield-united-fc-251,teams/england-sheffield-united-fc.png,Sheffield United,1.33,0.2,1.33,0.2,1.32,0.45,90,74,47,6,6,79,74,69,1.41,0.8,2.21,/england/afc-bournemouth-vs-sheffield-united-f...,9660,19,False,False,False,False,False,False,False,[],[]


In [10]:
# Check seasons from which matches are incomplete
data[data['status'] == 'incomplete' ]['season'].unique()

array(['2023/2024'], dtype=object)

In [11]:
# Check seasons from which matches are suspended
data[data['status'] == 'suspended' ]['season'].unique()

array(['2023/2024'], dtype=object)

The above inspection confirms that all matches with the `incomplete` or `suspended` status are from the current season (`2023/2024`).
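The two status checks above can also be run in one step. The following is a minimal sketch on a small, made-up frame (`toy` and its values are illustrative stand-ins for `data`, not rows from the dataset):

```python
import pandas as pd

# Toy stand-in for the notebook's `data` frame (values are illustrative only)
toy = pd.DataFrame({
    "status": ["complete", "incomplete", "suspended", "complete", "incomplete"],
    "season": ["2022/2023", "2023/2024", "2023/2024", "2023/2024", "2023/2024"],
})

# Keep only unfinished matches, then list the seasons seen for each status
unfinished = toy[toy["status"].isin(["incomplete", "suspended"])]
seasons_by_status = unfinished.groupby("status")["season"].unique()
print(seasons_by_status)
```

On the real dataframe, replacing `toy` with `data` should give one array of seasons per status.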

#### game_week and revised_game_week columns
The `revised_game_week` column indicates whether the game week (the week of the season in which the match was played) was revised. In the sections of the dataframe inspected so far, no match appears to have a revised game week.

In [12]:
# check for unique values in the 'revised_game_week' column
data['revised_game_week'].unique()

array([-1], dtype=int64)
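The same kind of check can be applied to every column at once. The sketch below flags columns that hold a single distinct value and drops them; it uses a small hypothetical frame (`toy`) rather than the full dataset:

```python
import pandas as pd

# Toy stand-in for `data`; only `revised_game_week` is constant here
toy = pd.DataFrame({
    "game_week": [1, 2, 3],
    "revised_game_week": [-1, -1, -1],
    "homeGoalCount": [2, 0, 1],
})

# Count distinct values per column (NaN counted as a value)
n_unique = toy.nunique(dropna=False)

# Columns with exactly one distinct value carry no information
constant_cols = n_unique[n_unique == 1].index.tolist()

# Drop them in one call
toy_reduced = toy.drop(columns=constant_cols)
```

With 215 features, a scan like this may surface other constant columns besides `revised_game_week`, although that has not been verified here.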

-1 is the only unique value in the `revised_game_week` column, so the column carries no useful information and will be dropped.

As mentioned before, the `game_week` column indicates the week of the season in which the match was played. Typically, EPL games run from game week 1 to game week 38 (assuming each team plays a single EPL game every week, which is usually the case).

In [13]:
# check for unique values in the 'game_week' column
data['game_week'].unique()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38,  0], dtype=int64)

The above output mostly confirms our expectation that a season typically has 38 game weeks. However, there appear to be some entries where the column value is 0.

In [14]:
# check for entries where game_week == 0
data[data['game_week'] == 0]

Unnamed: 0,id,homeID,awayID,season,status,roundID,game_week,revised_game_week,homeGoals,awayGoals,homeGoalCount,awayGoalCount,totalGoalCount,team_a_corners,team_b_corners,totalCornerCount,team_a_offsides,team_b_offsides,team_a_yellow_cards,team_b_yellow_cards,team_a_red_cards,team_b_red_cards,team_a_shotsOnTarget,team_b_shotsOnTarget,team_a_shotsOffTarget,team_b_shotsOffTarget,team_a_shots,team_b_shots,team_a_fouls,team_b_fouls,team_a_possession,team_b_possession,refereeID,coach_a_ID,coach_b_ID,stadium_name,stadium_location,team_a_cards_num,team_b_cards_num,odds_ft_1,odds_ft_x,odds_ft_2,odds_ft_over05,odds_ft_over15,odds_ft_over25,odds_ft_over35,odds_ft_over45,odds_ft_under05,odds_ft_under15,odds_ft_under25,odds_ft_under35,odds_ft_under45,odds_btts_yes,odds_btts_no,odds_team_a_cs_yes,odds_team_a_cs_no,odds_team_b_cs_yes,odds_team_b_cs_no,odds_doublechance_1x,odds_doublechance_12,odds_doublechance_x2,odds_1st_half_result_1,odds_1st_half_result_x,odds_1st_half_result_2,odds_2nd_half_result_1,odds_2nd_half_result_x,odds_2nd_half_result_2,odds_dnb_1,odds_dnb_2,odds_corners_over_75,odds_corners_over_85,odds_corners_over_95,odds_corners_over_105,odds_corners_over_115,odds_corners_under_75,odds_corners_under_85,odds_corners_under_95,odds_corners_under_105,odds_corners_under_115,odds_corners_1,odds_corners_x,odds_corners_2,odds_team_to_score_first_1,odds_team_to_score_first_x,odds_team_to_score_first_2,odds_win_to_nil_1,odds_win_to_nil_2,odds_1st_half_over05,odds_1st_half_over15,odds_1st_half_over25,odds_1st_half_over35,odds_1st_half_under05,odds_1st_half_under15,odds_1st_half_under25,odds_1st_half_under35,odds_2nd_half_over05,odds_2nd_half_over15,odds_2nd_half_over25,odds_2nd_half_over35,odds_2nd_half_under05,odds_2nd_half_under15,odds_2nd_half_under25,odds_2nd_half_under35,odds_btts_1st_half_yes,odds_btts_1st_half_no,odds_btts_2nd_half_yes,odds_btts_2nd_half_no,overallGoalCount,ht_goals_team_a,ht_goals_team_b,goals_2hg_team_a,goals_2hg_team_b,GoalCount_2hg,HTGoalCount,dat
e_unix,winningTeam,no_home_away,btts_potential,btts_fhg_potential,btts_2hg_potential,goalTimingDisabled,attendance,corner_timings_recorded,card_timings_recorded,team_a_fh_corners,team_b_fh_corners,team_a_2h_corners,team_b_2h_corners,corner_fh_count,corner_2h_count,team_a_fh_cards,team_b_fh_cards,team_a_2h_cards,team_b_2h_cards,total_fh_cards,total_2h_cards,attacks_recorded,team_a_dangerous_attacks,team_b_dangerous_attacks,team_a_attacks,team_b_attacks,team_a_xg,team_b_xg,total_xg,team_a_penalties_won,team_b_penalties_won,team_a_penalty_goals,team_b_penalty_goals,team_a_penalty_missed,team_b_penalty_missed,pens_recorded,goal_timings_recorded,team_a_0_10_min_goals,team_b_0_10_min_goals,team_a_corners_0_10_min,team_b_corners_0_10_min,team_a_cards_0_10_min,team_b_cards_0_10_min,throwins_recorded,team_a_throwins,team_b_throwins,freekicks_recorded,team_a_freekicks,team_b_freekicks,goalkicks_recorded,team_a_goalkicks,team_b_goalkicks,o45_potential,o35_potential,o25_potential,o15_potential,o05_potential,o15HT_potential,o05HT_potential,o05_2H_potential,o15_2H_potential,corners_potential,offsides_potential,cards_potential,avg_potential,home_url,home_image,home_name,away_url,away_image,away_name,home_ppg,away_ppg,pre_match_home_ppg,pre_match_away_ppg,pre_match_teamA_overall_ppg,pre_match_teamB_overall_ppg,u45_potential,u35_potential,u25_potential,u15_potential,u05_potential,corners_o85_potential,corners_o95_potential,corners_o105_potential,team_a_xg_prematch,team_b_xg_prematch,total_xg_prematch,match_url,competition_id,matches_completed_minimum,over05,over15,over25,over35,over45,over55,btts,homeGoals_timings,awayGoals_timings
4180,753648,59,142,2008/2009,complete,53612,0,-1,['4'],[],1,0,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,145.0,422.0,Emirates Stadium (London),,-1,-1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,1,1,0,0,0,0,1,1218887100,59,0,0,0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,1,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.00,0.00,/clubs/arsenal-fc-59,teams/england-arsenal-fc.png,Arsenal,/clubs/west-bromwich-albion-fc-142,teams/england-west-bromwich-albion-fc.png,West Bromwich Albion,2.00,0.42,0.00,0.00,0.00,0.00,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/arsenal-fc-vs-west-bromwich-albion-fc...,3131,38,True,False,False,False,False,False,False,['4'],[]
4181,753649,153,221,2008/2009,complete,53612,0,-1,"['4', '10']",['48'],2,1,3,-1,-1,-1,-1,-1,2,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,32139.0,235.0,Boleyn Ground (London),"Green Street, London",2,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,3,2,0,0,1,1,2,1218895200,153,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,1,0,1,1,1,2,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,2,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.00,0.00,/clubs/west-ham-united-fc-153,teams/england-west-ham-united-fc.png,West Ham United,/clubs/wigan-athletic-fc-221,teams/england-wigan-athletic-fc.png,Wigan Athletic,1.53,0.84,0.00,0.00,0.00,0.00,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/west-ham-united-fc-vs-wigan-athletic-...,3131,38,True,True,True,False,False,False,True,"['4', '10']",['48']
4182,753650,147,92,2008/2009,complete,53612,0,-1,"['71', '87']",['90'],2,1,3,-1,-1,-1,-1,-1,1,2,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,10655.0,598.0,Riverside Stadium (Middlesbrough),,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,3,0,0,2,1,3,0,1218895200,147,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,0,0,1,2,0,3,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.00,0.00,/clubs/middlesbrough-fc-147,teams/england-middlesbrough-fc.png,Middlesbrough,/clubs/tottenham-hotspur-fc-92,teams/england-tottenham-hotspur-fc.png,Tottenham Hotspur,1.26,0.84,0.00,0.00,0.00,0.00,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/tottenham-hotspur-fc-vs-middlesbrough...,3131,38,True,True,True,False,False,False,True,"['71', '87']",['90']
4183,753651,150,162,2008/2009,complete,53612,0,-1,"['23', '81']",['8'],2,1,3,-1,-1,-1,-1,-1,3,0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,492.0,3610.0,KCOM Stadium (Hull),,3,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,3,1,1,1,0,1,2,1218895200,150,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,2,0,1,0,2,1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,1,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.00,0.00,/clubs/hull-city-afc-150,teams/england-hull-city-afc.png,Hull City,/clubs/fulham-fc-162,teams/england-fulham-fc.png,Fulham,0.74,0.89,0.00,0.00,0.00,0.00,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/hull-city-afc-vs-fulham-fc-h2h-stats#...,3131,38,True,True,True,False,False,False,True,"['23', '81']",['8']
4184,753652,144,216,2008/2009,complete,53612,0,-1,"['44', '64']","['21', '66', '90']",2,3,5,-1,-1,-1,-1,-1,2,2,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,206.0,21130.0,Goodison Park (Liverpool),,2,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,5,1,1,1,2,3,2,1218895200,216,0,0,0,0,0,-1,-1,1,-1,-1,-1,-1,-1,-1,0,1,2,1,1,3,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0.0,0.0,0.00,0.00,/clubs/everton-fc-144,teams/england-everton-fc.png,Everton,/clubs/blackburn-rovers-fc-216,teams/england-blackburn-rovers-fc.png,Blackburn Rovers,1.58,0.84,0.00,0.00,0.00,0.00,0,0,0,0,0,0,0,0,0.0,0.0,0.0,/england/everton-fc-vs-blackburn-rovers-fc-h2h...,3131,38,True,True,True,True,True,False,True,"['44', '64']","['21', '66', '90']"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4935,755509,153,158,2007/2008,complete,53621,0,-1,"['8', '88']","['14', '58']",2,2,4,-1,-1,-1,-1,-1,2,2,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,32139.0,2746.0,Boleyn Ground (London),"Green Street, London",2,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,4,1,1,1,1,2,2,1210514400,-1,0,61,17,28,0,-1,-1,1,-1,-1,-1,-1,-1,-1,2,1,0,1,3,1,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,1,0,-1,-1,2,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,9,28,56,86,97,28,78,86,48,0.0,0.0,3.29,2.94,/clubs/west-ham-united-fc-153,teams/england-west-ham-united-fc.png,West Ham United,/clubs/aston-villa-fc-158,teams/england-aston-villa-fc.png,Aston Villa,1.47,1.42,1.50,1.44,1.30,1.59,92,72,45,14,3,0,0,0,0.0,0.0,0.0,/england/west-ham-united-fc-vs-aston-villa-fc-...,3137,38,True,True,True,True,False,False,True,"['8', '88']","['14', '58']"
4936,755510,92,151,2007/2008,complete,53621,0,-1,[],"['69', '74']",0,2,2,-1,-1,-1,-1,-1,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,598.0,230.0,White Hart Lane (London),"Bill Nicholson Way, 748 High Road, London",1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,2,0,0,0,2,2,0,1210514400,151,0,58,17,33,0,-1,-1,1,-1,-1,-1,-1,-1,-1,0,0,1,1,0,2,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,17,39,58,84,89,28,75,86,56,0.0,0.0,2.72,3.20,/clubs/tottenham-hotspur-fc-92,teams/england-tottenham-hotspur-fc.png,Tottenham Hotspur,/clubs/liverpool-fc-151,teams/england-liverpool-fc.png,Liverpool,1.53,1.79,1.61,1.72,1.24,1.97,84,61,42,17,11,0,0,0,0.0,0.0,0.0,/england/tottenham-hotspur-fc-vs-liverpool-fc-...,3137,38,True,True,False,False,False,False,False,[],"['69', '74']"
4937,755511,156,59,2007/2008,complete,53621,0,-1,[],['24'],0,1,1,-1,-1,-1,-1,-1,1,1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,32140.0,145.0,Stadium of Light (Sunderland),,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,1,0,1,0,0,0,1,1210514400,59,0,64,17,31,0,-1,-1,1,-1,-1,-1,-1,-1,-1,0,0,1,1,0,2,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,12,28,53,81,95,28,73,84,39,0.0,0.0,3.97,2.75,/clubs/sunderland-afc-156,teams/england-sunderland-afc.png,Sunderland,/clubs/arsenal-fc-59,teams/england-arsenal-fc.png,Arsenal,1.58,1.89,1.67,1.83,1.05,2.16,89,72,47,20,6,0,0,0,0.0,0.0,0.0,/england/arsenal-fc-vs-sunderland-afc-h2h-stat...,3137,38,True,False,False,False,False,False,False,[],['24']
4938,755512,206,216,2007/2008,complete,53621,0,-1,"['32', '73', '90+1', '90+3']",['49'],4,1,5,-1,-1,-1,-1,-1,3,2,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,,10853.0,203.0,St Andrew's Trillion Trophy Stadium (Birmingham),,3,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,5,1,0,3,1,4,1,1210514400,206,0,67,14,42,0,-1,-1,1,-1,-1,-1,-1,-1,-1,2,1,1,1,3,2,-1,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0,1,1,0,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,9,28,50,78,95,20,61,86,47,0.0,0.0,4.00,2.67,/clubs/birmingham-city-fc-206,teams/england-birmingham-city-fc.png,Birmingham City,/clubs/blackburn-rovers-fc-216,teams/england-blackburn-rovers-fc.png,Blackburn Rovers,1.37,1.42,1.28,1.50,0.86,1.57,92,73,50,22,6,0,0,0,0.0,0.0,0.0,/england/birmingham-city-fc-vs-blackburn-rover...,3137,38,True,True,True,True,True,False,True,"['32', '73', '90+1', '90+3']",['49']


In [15]:
# check seasons where game_week == 0 occur
data[data['game_week'] == 0]['season'].unique()

array(['2008/2009', '2007/2008'], dtype=object)

In [16]:
# check unique game_week values for the 2008/2009 season
data[data['season'] == '2008/2009']['game_week'].unique()

array([0], dtype=int64)

In [17]:
# check unique game_week values for the 2007/2008 season
data[data['season'] == '2007/2008']['game_week'].unique()

array([0], dtype=int64)

In [18]:
# check unique game_week values for the 2009/2010 season
data[data['season'] == '2009/2010']['game_week'].unique()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 17, 14, 15, 16,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38], dtype=int64)
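The season-by-season checks above could be condensed into one grouped summary. A minimal sketch, using a hypothetical frame (`toy`) that mimics the pattern seen in the outputs:

```python
import pandas as pd

# Toy stand-in for `data`: two seasons with only zeros, one with real weeks
toy = pd.DataFrame({
    "season": ["2007/2008"] * 2 + ["2008/2009"] * 2 + ["2009/2010"] * 3,
    "game_week": [0, 0, 0, 0, 1, 2, 3],
})

# Minimum, maximum and number of distinct game weeks per season
summary = toy.groupby("season")["game_week"].agg(["min", "max", "nunique"])

# Seasons where every match has game_week == 0
seasons_all_zero = summary[(summary["min"] == 0) & (summary["max"] == 0)].index.tolist()
```

Running the same aggregation on `data` would show all seasons side by side instead of one cell per season.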

Further inspection shows that all matches from the `2007/2008` and `2008/2009` seasons have 0 in the `game_week` column. `game_week` information seems to have been standardized starting from the `2009/2010` season. If `game_week` data from the `2007/2008` and `2008/2009` seasons is to be used during EDA or modelling, the correct game-week values will first need to be filled in.
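One possible way to approximate the missing values is to number each team's matches chronologically within a season, using the `date_unix`, `homeID` and `awayID` columns. This is only a sketch on made-up rows (`toy`, `game_week_approx` and the helper columns are hypothetical names), and it assumes a team's n-th match corresponds to game week n, which would not hold for postponed or rearranged fixtures:

```python
import pandas as pd

# Toy stand-in for `data`: four matches between four teams in one season
toy = pd.DataFrame({
    "season": ["2008/2009"] * 4,
    "homeID": [59, 153, 142, 221],
    "awayID": [142, 221, 59, 153],
    "date_unix": [1218887100, 1218895200, 1219500000, 1219503600],
    "game_week": [0, 0, 0, 0],
})

# One row per (match, team) so home and away appearances are counted together
long = pd.concat([
    toy.assign(team=toy["homeID"], is_home=True),
    toy.assign(team=toy["awayID"], is_home=False),
])

# Number each team's matches in chronological order within the season
long = long.sort_values("date_unix", kind="stable")
long["match_no"] = long.groupby(["season", "team"]).cumcount() + 1

# Use the home team's match number as the approximate game week;
# the assignment aligns on the original row index
toy["game_week_approx"] = long.loc[long["is_home"], "match_no"]
```

An official fixture list would be a more reliable source if one is available.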

#### homeGoals and awayGoals columns
These columns consist of arrays that store goal timing data (the minutes at which goals were scored during the match). Closer scrutiny of the dataset shows that the columns `homeGoals` and `homeGoals_timings` contain identical information, as do `awayGoals` and `awayGoals_timings`. To eliminate redundancy and retain the more descriptive names, it makes sense to remove the `homeGoals` and `awayGoals` columns.
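Assuming these cells were read from the CSV as strings (for example `"['45+1', '57']"`), they would need parsing before any numeric use. The sketch below is one way to do it; `parse_timings` and `home_goal_minutes` are hypothetical names, and converting stoppage-time stamps such as `90+3` to 93 is an assumption rather than something the dataset documents:

```python
import ast
import pandas as pd

# Toy stand-in for `data` with stringified lists of goal timings
toy = pd.DataFrame({
    "homeGoals_timings": ["['45+1', '57']", "[]", "['32', '73', '90+1', '90+3']"],
})

def parse_timings(cell):
    """Turn a stringified list such as "['45+1', '57']" into integer minutes."""
    minutes = []
    for stamp in ast.literal_eval(cell):
        # Split "90+3" into base minute and added time; plain "57" has no added part
        base, _, added = stamp.partition("+")
        minutes.append(int(base) + (int(added) if added else 0))
    return minutes

toy["home_goal_minutes"] = toy["homeGoals_timings"].apply(parse_timings)
```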

In [19]:
data[['homeGoals', 'awayGoals', 'homeGoals_timings', 'awayGoals_timings' ]].head()

Unnamed: 0,homeGoals,awayGoals,homeGoals_timings,awayGoals_timings
0,"['45+1', '57']",['47'],"['45+1', '57']",['47']
1,[],['82'],[],['82']
2,[],['74'],[],['74']
3,['5'],['59'],['5'],['59']
4,['11'],['67'],['11'],['67']
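The head only shows five rows, so a full-column comparison is a safer basis for dropping anything. A minimal sketch on a hypothetical frame (`toy`), using `Series.equals` to compare each pair:

```python
import pandas as pd

# Toy stand-in for `data` with the four goal-timing columns
toy = pd.DataFrame({
    "homeGoals": ["['45+1', '57']", "[]", "['5']"],
    "homeGoals_timings": ["['45+1', '57']", "[]", "['5']"],
    "awayGoals": ["['47']", "['82']", "['59']"],
    "awayGoals_timings": ["['47']", "['82']", "['59']"],
})

pairs = [("homeGoals", "homeGoals_timings"), ("awayGoals", "awayGoals_timings")]

# True where the two columns match on every row
identical = {old: toy[old].equals(toy[new]) for old, new in pairs}

# Drop the shorter-named column only when its twin is a full duplicate
redundant = [old for old, same in identical.items() if same]
toy_reduced = toy.drop(columns=redundant)
```

If either comparison returned `False` on the real data, the differing rows would need a closer look before removing a column.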


#### corner & offside columns
Dataframe snapshots (head, tail, sample) showed that some seasons, particularly those prior to the `2013/2014` season, had missing data (encoded as -1) in the following columns:
- `team_a_corners`
- `team_b_corners`
- `totalCornerCount`
- `team_a_offsides`
- `team_b_offsides`
- `team_a_shotsOnTarget`
- `team_b_shotsOnTarget`
- `team_a_shotsOffTarget`
- `team_b_shotsOffTarget`
- `team_a_shots`
- `team_b_shots`
- `team_a_fouls`
- `team_b_fouls`
- `team_a_possession`
- `team_b_possession`
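Because missing values are encoded as -1 rather than NaN, pandas will not report them through `isna()`. The sketch below measures the share of -1 values per season and then converts them to NaN; it uses a small hypothetical frame (`toy`) with two of the listed columns, and the values are illustrative only:

```python
import numpy as np
import pandas as pd

# Toy stand-in for `data`: -1 marks a statistic that was not recorded
toy = pd.DataFrame({
    "season": ["2007/2008", "2007/2008", "2013/2014", "2013/2014"],
    "team_a_corners": [-1, -1, 6, -1],
    "team_a_fouls": [-1, -1, 11, 9],
})
stat_cols = ["team_a_corners", "team_a_fouls"]

# Share of sentinel values per season and column (1.0 means nothing was recorded)
is_missing = toy[stat_cols].eq(-1)
missing_share = is_missing.groupby(toy["season"]).mean()

# Replace the sentinel with NaN so that means and counts ignore it
cleaned = toy.copy()
cleaned[stat_cols] = cleaned[stat_cols].replace(-1, np.nan)
```

Leaving -1 in place would bias descriptive statistics such as the mean, which is why the conversion is worth doing before any aggregation.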

In [20]:
# group by season and get descriptive stats for columns suspected to contain missing data
missing_data_columns = [
                        'team_a_corners',
                        'team_b_corners',
                        'totalCornerCount',
                        'team_a_offsides',
                        'team_b_offsides',
                        'team_a_shotsOnTarget',
                        'team_b_shotsOnTarget',
                        'team_a_shotsOffTarget',
                        'team_b_shotsOffTarget',
                        'team_a_shots',
                        'team_b_shots',
                        'team_a_fouls',
                        'team_b_fouls',
                        'team_a_possession',
                        'team_b_possession'
                        ]

data.groupby('season')[missing_data_columns].describe()

Unnamed: 0_level_0,team_a_corners,team_a_corners,team_a_corners,team_a_corners,team_a_corners,team_a_corners,team_a_corners,team_a_corners,team_b_corners,team_b_corners,team_b_corners,team_b_corners,team_b_corners,team_b_corners,team_b_corners,team_b_corners,totalCornerCount,totalCornerCount,totalCornerCount,totalCornerCount,totalCornerCount,totalCornerCount,totalCornerCount,totalCornerCount,team_a_offsides,team_a_offsides,team_a_offsides,team_a_offsides,team_a_offsides,team_a_offsides,team_a_offsides,team_a_offsides,team_b_offsides,team_b_offsides,team_b_offsides,team_b_offsides,team_b_offsides,team_b_offsides,team_b_offsides,team_b_offsides,team_a_shotsOnTarget,team_a_shotsOnTarget,team_a_shotsOnTarget,team_a_shotsOnTarget,team_a_shotsOnTarget,team_a_shotsOnTarget,team_a_shotsOnTarget,team_a_shotsOnTarget,team_b_shotsOnTarget,team_b_shotsOnTarget,team_b_shotsOnTarget,team_b_shotsOnTarget,team_b_shotsOnTarget,team_b_shotsOnTarget,team_b_shotsOnTarget,team_b_shotsOnTarget,team_a_shotsOffTarget,team_a_shotsOffTarget,team_a_shotsOffTarget,team_a_shotsOffTarget,team_a_shotsOffTarget,team_a_shotsOffTarget,team_a_shotsOffTarget,team_a_shotsOffTarget,team_b_shotsOffTarget,team_b_shotsOffTarget,team_b_shotsOffTarget,team_b_shotsOffTarget,team_b_shotsOffTarget,team_b_shotsOffTarget,team_b_shotsOffTarget,team_b_shotsOffTarget,team_a_shots,team_a_shots,team_a_shots,team_a_shots,team_a_shots,team_a_shots,team_a_shots,team_a_shots,team_b_shots,team_b_shots,team_b_shots,team_b_shots,team_b_shots,team_b_shots,team_b_shots,team_b_shots,team_a_fouls,team_a_fouls,team_a_fouls,team_a_fouls,team_a_fouls,team_a_fouls,team_a_fouls,team_a_fouls,team_b_fouls,team_b_fouls,team_b_fouls,team_b_fouls,team_b_fouls,team_b_fouls,team_b_fouls,team_b_fouls,team_a_possession,team_a_possession,team_a_possession,team_a_possession,team_a_possession,team_a_possession,team_a_possession,team_a_possession,team_b_possession,team_b_possession,team_b_possession,team_b_possession,team_b_possession,team_b_poss
ession,team_b_possession,team_b_possession
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
season,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2,Unnamed: 24_level_2,Unnamed: 25_level_2,Unnamed: 26_level_2,Unnamed: 27_level_2,Unnamed: 28_level_2,Unnamed: 29_level_2,Unnamed: 30_level_2,Unnamed: 31_level_2,Unnamed: 32_level_2,Unnamed: 33_level_2,Unnamed: 34_level_2,Unnamed: 35_level_2,Unnamed: 36_level_2,Unnamed: 37_level_2,Unnamed: 38_level_2,Unnamed: 39_level_2,Unnamed: 40_level_2,Unnamed: 41_level_2,Unnamed: 42_level_2,Unnamed: 43_level_2,Unnamed: 44_level_2,Unnamed: 45_level_2,Unnamed: 46_level_2,Unnamed: 47_level_2,Unnamed: 48_level_2,Unnamed: 49_level_2,Unnamed: 50_level_2,Unnamed: 51_level_2,Unnamed: 52_level_2,Unnamed: 53_level_2,Unnamed: 54_level_2,Unnamed: 55_level_2,Unnamed: 56_level_2,Unnamed: 57_level_2,Unnamed: 58_level_2,Unnamed: 59_level_2,Unnamed: 60_level_2,Unnamed: 61_level_2,Unnamed: 62_level_2,Unnamed: 63_level_2,Unnamed: 64_level_2,Unnamed: 65_level_2,Unnamed: 66_level_2,Unnamed: 67_level_2,Unnamed: 68_level_2,Unnamed: 69_level_2,Unnamed: 70_level_2,Unnamed: 71_level_2,Unnamed: 72_level_2,Unnamed: 73_level_2,Unnamed: 74_level_2,Unnamed: 75_level_2,Unnamed: 76_level_2,Unnamed: 77_level_2,Unnamed: 78_level_2,Unnamed: 79_level_2,Unnamed: 80_level_2,Unnamed: 81_level_2,Unnamed: 82_level_2,Unnamed: 83_level_2,Unnamed: 84_level_2,Unnamed: 85_level_2,Unnamed: 86_level_2,Unnamed: 87_level_2,Unnamed: 88_level_2,Unnamed: 89_level_2,Unnamed: 90_level_2,Unnamed: 91_level_2,Unnamed: 92_level_2,Unnamed: 93_level_2,Unnamed: 94_level_2,Unnamed: 95_level_2,Unnamed: 96_level_2,Unnamed: 97_level_2,Unnamed: 98_level_2,Unnamed: 99_level_2,Unnamed: 
100_level_2,Unnamed: 101_level_2,Unnamed: 102_level_2,Unnamed: 103_level_2,Unnamed: 104_level_2,Unnamed: 105_level_2,Unnamed: 106_level_2,Unnamed: 107_level_2,Unnamed: 108_level_2,Unnamed: 109_level_2,Unnamed: 110_level_2,Unnamed: 111_level_2,Unnamed: 112_level_2,Unnamed: 113_level_2,Unnamed: 114_level_2,Unnamed: 115_level_2,Unnamed: 116_level_2,Unnamed: 117_level_2,Unnamed: 118_level_2,Unnamed: 119_level_2,Unnamed: 120_level_2
2007/2008,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0
2008/2009,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0
2009/2010,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0
2010/2011,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0
2011/2012,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,380.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0
2012/2013,380.0,6.252632,3.158833,0.0,4.0,6.0,8.0,15.0,380.0,4.792105,2.784511,0.0,3.0,4.0,6.0,17.0,380.0,11.044737,3.681315,0.0,9.0,11.0,13.25,23.0,380.0,2.339474,1.654296,0.0,1.0,2.0,3.0,9.0,380.0,2.134211,1.703756,0.0,1.0,2.0,3.0,9.0,380.0,8.581579,3.743233,2.0,6.0,8.0,11.0,21.0,380.0,7.015789,3.243791,0.0,5.0,7.0,8.0,21.0,380.0,6.555263,3.23725,0.0,4.0,6.0,9.0,21.0,380.0,4.971053,2.649579,0.0,3.0,4.0,6.0,16.0,380.0,15.136842,5.403235,2.0,11.0,15.0,18.0,36.0,380.0,11.986842,4.688991,2.0,9.0,11.0,14.0,31.0,380.0,10.265789,3.144772,2.0,8.0,10.0,12.0,23.0,380.0,10.697368,3.568477,2.0,8.0,10.0,13.0,23.0,380.0,52.013158,7.738287,27.0,47.0,53.0,57.0,74.0,380.0,47.986842,7.738287,26.0,43.0,47.0,53.0,73.0
2013/2014,380.0,6.092105,2.962286,0.0,4.0,6.0,8.0,14.0,380.0,4.665789,2.695915,0.0,3.0,4.0,6.0,14.0,380.0,10.757895,3.522909,2.0,8.0,11.0,13.0,21.0,380.0,2.073684,1.67705,0.0,1.0,2.0,3.0,10.0,380.0,1.986842,1.760707,0.0,1.0,2.0,3.0,10.0,380.0,5.163158,2.461248,0.0,3.0,5.0,6.0,14.0,380.0,4.213158,2.125194,0.0,3.0,4.0,6.0,13.0,380.0,6.660526,3.335745,0.0,4.0,6.0,9.0,22.0,380.0,5.5,3.222202,0.0,3.0,5.0,7.0,20.0,380.0,11.823684,4.708715,2.0,8.0,11.0,15.0,33.0,380.0,9.713158,4.437706,2.0,7.0,9.0,12.0,30.0,380.0,10.313158,3.1813,2.0,8.0,10.0,13.0,20.0,380.0,10.839474,3.48582,2.0,8.0,11.0,13.0,24.0,380.0,52.226316,8.941401,30.0,46.0,52.0,59.0,76.0,380.0,47.773684,8.941401,24.0,41.0,48.0,54.0,70.0
2014/2015,380.0,6.042105,3.243555,0.0,4.0,6.0,8.0,18.0,380.0,4.663158,2.568782,0.0,3.0,4.0,6.0,13.0,380.0,10.705263,3.521135,1.0,8.0,11.0,13.0,23.0,380.0,1.939474,1.617523,0.0,1.0,2.0,3.0,9.0,380.0,1.844737,1.54257,0.0,1.0,2.0,3.0,8.0,380.0,4.589474,2.450544,0.0,3.0,4.0,6.0,14.0,380.0,3.739474,2.070784,0.0,2.0,4.0,5.0,11.0,380.0,5.307895,2.803586,0.0,3.0,5.0,7.0,15.0,380.0,4.357895,2.485486,0.0,3.0,4.0,6.0,13.0,380.0,9.897368,4.216752,2.0,7.0,9.0,13.0,29.0,380.0,8.097368,3.438643,0.0,6.0,8.0,10.0,19.0,380.0,10.971053,3.349746,4.0,8.0,11.0,13.0,23.0,380.0,11.071053,3.472881,1.0,9.0,11.0,14.0,20.0,380.0,52.026316,8.840083,27.0,47.0,52.0,58.0,77.0,380.0,47.973684,8.840083,23.0,42.0,48.0,53.0,73.0
2015/2016,380.0,5.931579,3.247983,0.0,4.0,5.0,8.0,17.0,380.0,4.876316,2.50971,0.0,3.0,5.0,7.0,14.0,380.0,10.807895,3.604811,1.0,8.0,10.0,13.0,25.0,380.0,1.994737,1.515086,0.0,1.0,2.0,3.0,8.0,380.0,1.834211,1.548303,0.0,1.0,1.5,3.0,10.0,380.0,4.636842,2.424525,0.0,3.0,4.0,6.0,13.0,380.0,3.915789,2.293334,0.0,2.0,4.0,5.0,14.0,380.0,5.478947,2.922405,0.0,3.0,5.0,7.0,15.0,380.0,4.226316,2.263941,0.0,3.0,4.0,6.0,11.0,380.0,10.115789,4.211088,2.0,7.0,10.0,12.25,24.0,380.0,8.142105,3.34014,2.0,6.0,8.0,10.0,18.0,380.0,9.863158,3.481911,2.0,7.0,10.0,12.0,21.0,380.0,11.105263,3.339928,4.0,9.0,11.0,13.0,22.0,380.0,52.123684,8.212094,27.0,47.0,52.0,58.0,76.0,380.0,47.876316,8.212094,24.0,42.0,48.0,53.0,73.0
2016/2017,380.0,5.673684,3.085217,0.0,3.0,5.0,8.0,19.0,380.0,4.742105,2.705767,0.0,3.0,4.0,6.0,15.0,380.0,10.415789,3.442826,2.0,8.0,10.0,13.0,23.0,380.0,1.957895,1.612391,0.0,1.0,2.0,3.0,9.0,380.0,1.736842,1.461633,0.0,1.0,2.0,2.0,9.0,380.0,5.857895,3.039896,0.0,4.0,5.0,8.0,19.0,380.0,4.726316,2.599745,0.0,3.0,4.0,6.0,17.0,380.0,7.468421,3.966423,0.0,5.0,7.0,10.0,25.0,380.0,6.110526,3.218252,0.0,4.0,6.0,8.0,19.0,380.0,13.326316,5.921549,0.0,9.0,12.0,17.0,37.0,380.0,10.836842,4.763506,0.0,7.0,10.0,14.0,27.0,380.0,10.718421,3.421518,2.0,8.0,11.0,13.0,23.0,380.0,11.2,3.527801,2.0,9.0,11.0,14.0,23.0,380.0,50.242105,11.558926,-1.0,44.0,51.0,58.0,74.0,380.0,48.147368,11.411671,-1.0,42.0,48.0,56.0,73.0


The output above confirms that the seasons from 2007/2008 through 2011/2012 are missing data in the columns identified above: for those seasons the mean, minimum and maximum of every statistic are all -1, the placeholder used for missing values. Given that shots, shots on target, and corner kicks are occasions in a football/soccer match when the probability of scoring is high, and therefore probably have significant match outcome prediction power, we will likely retain only data from the 2013/2014 season onwards.
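
The check described above can also be expressed directly as the share of `-1` placeholders per season. The following is a minimal sketch on a toy dataframe; the values and the short `stat_columns` list are illustrative assumptions, not taken from `Matches.csv`.

```python
import pandas as pd

# Toy frame standing in for the match data (values are illustrative only)
toy = pd.DataFrame({
    'season': ['2011/2012', '2011/2012', '2013/2014', '2013/2014'],
    'team_a_corners': [-1, -1, 6, 4],
    'team_a_shots': [-1, -1, 12, 9],
})

# Columns to inspect (assumed subset of the match statistics)
stat_columns = ['team_a_corners', 'team_a_shots']

# Fraction of -1 placeholders per season and column
placeholder_share = toy[stat_columns].eq(-1).groupby(toy['season']).mean()

# Seasons in which no inspected column consists entirely of placeholders
mask = placeholder_share.lt(1.0).all(axis=1)
usable_seasons = mask[mask].index.tolist()

print(placeholder_share)
print(usable_seasons)
```

On the real data, `toy` would be replaced by `data` and `stat_columns` by the full list of affected columns.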

#### Referee ID column
The `refereeID` column is a unique identifier for the referee who officiated a match. Dataframe snapshots revealed that the referee column had some missing values:

In [21]:
data[data['refereeID'].isnull()].shape

(2784, 215)

In [22]:
data[data['refereeID'].isnull()]['season'].unique()

array(['2016/2017', '2015/2016', '2014/2015', '2013/2014', '2017/2018',
       '2012/2013', '2008/2009', '2007/2008', '2023/2024'], dtype=object)

The output indicates that 2,784 entries in the `refereeID` column (about 43% of the 6,460 rows) have null values. Given that referees are randomly selected and are often assumed to have very little influence on the match outcome (assuming match officiation is fair), we can safely drop the referee column.

#### Stadium Location
Examination of the dataframe shows that the stadium location column has some missing values. Since this column has limited relevance, we will add it to the list of columns to be dropped.

In [23]:
data[data['stadium_location'].isnull()]['season'].unique()

array(['2016/2017', '2015/2016', '2014/2015', '2013/2014', '2018/2019',
       '2019/2020', '2011/2012', '2010/2011', '2009/2010', '2008/2009',
       '2007/2008', '2020/2021', '2021/2022', '2022/2023', '2023/2024'],
      dtype=object)

#### odds columns
The dataframe snapshots (head, tail, sample) revealed that the values of most of the odds columns, with the exception of the `odds_ft_1`, `odds_ft_x`, & `odds_ft_2` columns, were zeros. Odds reflect the implied probability of a given match outcome and are therefore not expected to be zero. Given that these columns are essentially missing data, we will drop them from the dataset.

In [24]:
# get list of odds columns 
list_of_columns = list(data.columns)

# odds columns
odds_columns = [column for column in list_of_columns if column.startswith('odds')]


In [42]:
# compute the fraction of zero values for each odds column and store the results in a dictionary
zero_counts = {}
for column in odds_columns:
    zero_counts[column] = (data[column] == 0).sum() / len(data)

In [43]:
zero_counts

{'odds_ft_1': 0.3238390092879257,
 'odds_ft_x': 0.3238390092879257,
 'odds_ft_2': 0.3238390092879257,
 'odds_ft_over05': 0.6174922600619195,
 'odds_ft_over15': 0.6174922600619195,
 'odds_ft_over25': 0.6164086687306501,
 'odds_ft_over35': 0.6174922600619195,
 'odds_ft_over45': 0.6174922600619195,
 'odds_ft_under05': 0.6174922600619195,
 'odds_ft_under15': 0.6174922600619195,
 'odds_ft_under25': 0.6167182662538699,
 'odds_ft_under35': 0.6173374613003096,
 'odds_ft_under45': 0.6174922600619195,
 'odds_btts_yes': 0.6174922600619195,
 'odds_btts_no': 0.6174922600619195,
 'odds_team_a_cs_yes': 0.6188854489164086,
 'odds_team_a_cs_no': 0.6188854489164086,
 'odds_team_b_cs_yes': 0.6339009287925697,
 'odds_team_b_cs_no': 0.6335913312693499,
 'odds_doublechance_1x': 0.6195046439628483,
 'odds_doublechance_12': 0.6195046439628483,
 'odds_doublechance_x2': 0.6195046439628483,
 'odds_1st_half_result_1': 0.6185758513931888,
 'odds_1st_half_result_x': 0.6187306501547988,
 'odds_1st_half_result_2': 0.

The output above reveals that 32% of entries in the `odds_ft_1`, `odds_ft_x`, & `odds_ft_2` columns have invalid odds data (zeros), while the other odds columns consist of between 62% and 100% invalid data. Therefore, to ensure model integrity, all odds columns except `odds_ft_1`, `odds_ft_x`, & `odds_ft_2` will be dropped from the dataset. Rows with invalid data in the `odds_ft_1`, `odds_ft_x`, & `odds_ft_2` columns will also be dropped.
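
The two-step clean-up described above (drop the sparse odds columns, then drop rows with zero full-time odds) could look roughly like this. It is a sketch on a toy dataframe with made-up values; only the column names come from the dataset.

```python
import pandas as pd

# Toy frame standing in for the match data (values are illustrative only)
toy = pd.DataFrame({
    'id': [1, 2, 3],
    'odds_ft_1': [1.8, 0.0, 2.4],
    'odds_ft_x': [3.5, 0.0, 3.2],
    'odds_ft_2': [4.2, 0.0, 2.9],
    'odds_btts_yes': [0.0, 0.0, 1.7],
})

# Full-time result odds are the only odds columns we keep
odds_to_keep = ['odds_ft_1', 'odds_ft_x', 'odds_ft_2']

# Every other column whose name starts with 'odds' is dropped
odds_to_drop = [c for c in toy.columns
                if c.startswith('odds') and c not in odds_to_keep]
cleaned = toy.drop(columns=odds_to_drop)

# Keep only rows where all three full-time odds are non-zero
valid_rows = cleaned[odds_to_keep].ne(0).all(axis=1)
cleaned = cleaned[valid_rows].reset_index(drop=True)

print(cleaned)
```

Requiring all three odds to be non-zero is an assumption of this sketch; the zero fractions reported above are identical for the three columns, which suggests they are missing together.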

#### winning_team column
Inspection revealed that this column indicates the winning team using the team ID and draws using -1. For EDA it might be useful to create a column that indicates home wins with `1`, draws with `x` and away wins with `2`, which is the industry standard.
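
One possible way to derive such a column is sketched below. The column name `winning_team` follows the heading above and is an assumption (the actual name in the dataset may differ), `result` is a hypothetical name for the new column, and the toy IDs are made up.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the match data (IDs are illustrative only)
toy = pd.DataFrame({
    'homeID': [10, 20, 30],
    'awayID': [20, 30, 10],
    'winning_team': [10, -1, 10],  # assumed column name; -1 marks a draw
})

# Conditions are checked in order: home win, draw, away win
conditions = [
    toy['winning_team'] == toy['homeID'],
    toy['winning_team'] == -1,
    toy['winning_team'] == toy['awayID'],
]
labels = ['1', 'x', '2']

# 'result' is a hypothetical new column; unmatched rows are flagged explicitly
toy['result'] = np.select(conditions, labels, default='unknown')

print(toy)
```

The `'unknown'` default makes rows that match none of the three cases easy to spot instead of silently mislabelling them.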

#### no_home_away column
We observed that the `no_home_away` column appears to consist mostly of 0s, which makes sense as this would only be 1 if a match was played on neutral ground. This rarely happens in the EPL.

In [44]:
data['no_home_away'].unique()

array([0], dtype=int64)

The above output confirms that no games were played on neutral ground and thus this column can be dropped.
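
The same check generalises to any column that holds a single value. A minimal sketch on a toy dataframe (values are illustrative, and `constant_columns` is a hypothetical helper name):

```python
import pandas as pd

# Toy frame standing in for the match data (values are illustrative only)
toy = pd.DataFrame({
    'id': [1, 2, 3],
    'no_home_away': [0, 0, 0],
    'homeGoalCount': [2, 0, 1],
})

# A column with exactly one distinct value (NaN counted) carries no information
constant_columns = [c for c in toy.columns
                    if toy[c].nunique(dropna=False) == 1]

print(constant_columns)
```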

#### Attendance column
In this column we noted that zero attendance or missing attendance was encoded with -1. 

In [48]:
data[data['attendance']==-1]['attendance'].count()

1822

In [50]:
# List of seasons to filter out
season_not_relevant = ['2007/2008', '2008/2009', '2009/2010', '2010/2011', 
                       '2011/2012', '2012/2013', '2023/2024']

# Filter out rows where the "season" column is in the list
filtered_data = data[~data['season'].isin(season_not_relevant)]

# check missing attendance for relevant seasons
filtered_data[filtered_data['attendance']==-1]['attendance'].count()

729

In [54]:
filtered_data[filtered_data['attendance']==-1].groupby('season')['attendance'].count()

season
2013/2014     16
2014/2015     16
2015/2016     16
2016/2017     16
2020/2021     81
2021/2022    263
2022/2023    321
Name: attendance, dtype: int64

Further inspection reveals that for the relevant seasons (those considered to have sufficient data), very few matches between `2013` and `2019` had missing fan attendance information. While missing attendance data was expected for the `2020/2021` season due to COVID-19 restrictions, missing data in `2021/2022` and `2022/2023` is unusual.
Since attendance data may be useful in predicting match outcome, the missing values could be estimated from past attendance data, or stadium capacity could be used under the assumption that the stadium was full.
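
As one possible reading of "past attendance data", the sketch below fills missing attendance with the median attendance recorded at the same home team's matches. The toy values are made up, `attendance_imputed` is a hypothetical column name, and the choice of the median is an assumption rather than a decision taken in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the match data (values are illustrative only)
toy = pd.DataFrame({
    'homeID': [10, 10, 10, 20, 20],
    'attendance': [50000, -1, 52000, 30000, -1],
})

# Treat the -1 placeholder as a proper missing value
attendance = toy['attendance'].replace(-1, np.nan)

# Median recorded attendance for each home team, aligned to the original rows
team_median = attendance.groupby(toy['homeID']).transform('median')

# Fill gaps with the home team's median; recorded values are left untouched
toy['attendance_imputed'] = attendance.fillna(team_median)

print(toy)
```

A home team with no recorded attendance at all would stay missing under this approach, which is where a stadium-capacity fallback could come in.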

#### goalTimingDisabled, corner_timings_recorded, & card_timings_recorded columns
These columns consisted mostly of 1's or 0's. The API documentation revealed that these columns indicate whether goal timing, corner timing or card timing data was captured. This information has little relevance and can be safely dropped.

#### team_a_fh_corners, team_b_fh_corners, team_a_2h_corners, team_b_2h_corners, corner_fh_count, corner_2h_count, team_a_fh_cards, team_b_fh_cards, team_a_2h_cards, team_b_2h_cards, total_fh_cards, total_2h_cards

- Review these columns: the 2016/2017 season showed some missing data for them.

#### attacks_recorded column
The `attacks_recorded` column tracks whether or not attack data was captured, encoded as -1 (attack data not recorded) and 1 (attack data was recorded). This is a metadata column and thus can be dropped as it does not have any predictive relevance.

In [60]:
filtered_data['attacks_recorded'].unique()

array([-1,  1], dtype=int64)

In [61]:
filtered_data['attacks_recorded'].value_counts(normalize=True)

 1    0.690263
-1    0.309737
Name: attacks_recorded, dtype: float64

The output above shows that approximately 31% of matches in the filtered data are missing attack data. The affected columns include:
- `team_a_dangerous_attacks`
- `team_b_dangerous_attacks`
- `team_a_attacks`
- `team_b_attacks`

Given that a large fraction of the attack data is missing, and because this data cannot be imputed, dropping the affected columns is a sensible approach. Additionally, given that live match predictions, which rely on attack data, are not part of the project scope, dropping attack data does not significantly affect the success of the project.
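
Dropping the attack columns together with the `attacks_recorded` flag could be done as in the sketch below. The toy dataframe and its values are illustrative; only the column names come from the dataset.

```python
import pandas as pd

# Toy frame standing in for the match data (values are illustrative only)
toy = pd.DataFrame({
    'id': [1, 2],
    'attacks_recorded': [1, -1],
    'team_a_attacks': [110, -1],
    'team_b_attacks': [95, -1],
    'team_a_dangerous_attacks': [60, -1],
    'team_b_dangerous_attacks': [41, -1],
})

# Attack statistics plus the metadata flag that describes them
attack_columns = [
    'team_a_dangerous_attacks',
    'team_b_dangerous_attacks',
    'team_a_attacks',
    'team_b_attacks',
    'attacks_recorded',
]

# errors='ignore' keeps the cell re-runnable if some columns are already gone
trimmed = toy.drop(columns=attack_columns, errors='ignore')

print(trimmed.columns.tolist())
```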

In [63]:
data['competition_id'].unique()

array([   9,   10,   11,   12,  161,  246, 1625, 2012, 3119, 3121, 3125,
       3131, 3137, 4759, 6135, 7704, 9660], dtype=int64)