# EPL Data Analysis

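Problem definition
We use Premier League (EPL) match records and player statistics to analyze the factors that influence the results of a team or of a specific match, with the aim of deriving insights that can help predict future match results or player performance.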

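Goals

- Identify the main factors that affect whether the home or away team wins, based on match data.

- Design a model that predicts the match result (e.g., win/draw/loss) from each match's events and player stats.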

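Possible analysis directions

- Assess how a team's shots, shots on target, possession, pass accuracy, and similar stats affect the result

- Analyze the correlation between team performance and factors such as player position and substitute contribution

- Build a match result prediction model using machine learning models (e.g., Decision Tree, Random Forest)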
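As a rough illustration of the modeling direction, the sketch below fits a small decision tree on per-match stat differences. It is only a minimal sketch under stated assumptions: the feature names `possession_diff` and `shots_on_target_diff`, the `H`/`D`/`A` label encoding, and every value in the toy table are made up for illustration and are not taken from the dataset, and scikit-learn is assumed to be installed.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy, hand-written rows (NOT real EPL data): team1-minus-team2 stat differences.
toy = pd.DataFrame({
    "possession_diff": [20.0, -15.0, 5.0, -30.0, 0.0, 10.0],
    "shots_on_target_diff": [4, -3, 0, -5, 1, 3],
    # Hypothetical labels: "H" = home win, "D" = draw, "A" = away win.
    "result": ["H", "A", "D", "A", "D", "H"],
})

X = toy[["possession_diff", "shots_on_target_diff"]]
y = toy["result"]

# No depth limit is set; random_state makes the fit repeatable.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Accuracy on the training rows only shows that the tree memorized the toy data;
# it says nothing about predictive power on unseen matches.
train_accuracy = model.score(X, y)
print(train_accuracy)
```

With the real data, a held-out split (for example, training on earlier matchweeks and evaluating on later ones) would be needed before reading anything into the score.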

In [1]:
import pandas as pd

In [3]:
epl_df = pd.read_json('epl_2022_2023_07_02_2023.json')
epl_df.head()

Unnamed: 0,75091,75096,75098,75099,75093,75082,75088,75087,75089,75084,...,75127,75128,75129,75124,75125,75121,75122,75123,75126,75130
event,"[Full-time Match ends, Arsenal 0, Newcastle Un...","[Full-time Match ends, Everton 1, Brighton and...","[Full-time Match ends, Leicester City 0, Fulha...","[Full-time Match ends, Manchester United 3, Bo...","[Full-time Match ends, Brentford 3, Liverpool ...","[Full-time Match ends, Brighton and Hove Albio...","[Full-time Match ends, Tottenham Hotspur 0, As...","[Full-time Match ends, Nottingham Forest 1, Ch...","[Full-time Match ends, West Ham United 0, Bren...","[Full-time Match ends, Liverpool 2, Leicester ...",...,"[Full-time Match ends, Newcastle United 1, Wes...","[Full-time Match ends, Nottingham Forest 1, Le...","[Full-time Match ends, Tottenham Hotspur 1, Ma...","[Full-time Match ends, Chelsea 0, Fulham 0., S...","[Full-time Match ends, Everton 1, Arsenal 0., ...","[Full-time Match ends, Aston Villa 2, Leiceste...","[Full-time Match ends, Brentford 3, Southampto...","[Full-time Match ends, Brighton and Hove Albio...","[Full-time Match ends, Manchester United 2, Cr...","[Full-time Match ends, Wolverhampton Wanderers..."
matchweek,Matchweek 19,Matchweek 19,Matchweek 19,Matchweek 19,Matchweek 19,Matchweek 18,Matchweek 18,Matchweek 18,Matchweek 18,Matchweek 18,...,Matchweek 22,Matchweek 22,Matchweek 22,Matchweek 22,Matchweek 22,Matchweek 22,Matchweek 22,Matchweek 22,Matchweek 22,Matchweek 22
team1_name,Arsenal,Everton,Leicester City,Manchester United,Brentford,Brighton and Hove Albion,Tottenham Hotspur,Nottingham Forest,West Ham United,Liverpool,...,Newcastle United,Nottingham Forest,Tottenham Hotspur,Chelsea,Everton,Aston Villa,Brentford,Brighton & Hove Albion,Manchester United,Wolverhampton Wanderers
team1_startings,"[Aaron Ramsdale, Ben White, Gabriel Magalhães,...","[Jordan Pickford, James Tarkowski, Nathan Patt...","[Danny Ward, Wout Faes, Daniel Amartey, Timoth...","[David de Gea, Victor Lindelöf, Harry Maguire,...","[David Raya, Ethan Pinnock, Zanka, Ben Mee, Ri...","[Robert Sánchez, Tariq Lamptey, Lewis Dunk, Le...","[Hugo Lloris, Cristian Romero, Ben Davies, Clé...","[Dean Henderson, Joe Worrall, Serge Aurier, Wi...","[Lukasz Fabianski, Aaron Cresswell, Craig Daws...","[Alisson, Virgil van Dijk, Andrew Robertson, J...",...,"[Nick Pope, Kieran Trippier, Sven Botman, Fabi...","[Keylor Navas, Neco Williams, Scott McKenna, W...","[Hugo Lloris, Eric Dier, Cristian Romero, Ben ...","[Kepa Arrizabalaga, Benoît Badiashile, Thiago ...","[Jordan Pickford, James Tarkowski, Vitaliy Myk...","[Emiliano Martínez, Ezri Konsa, Tyrone Mings, ...","[David Raya, Aaron Hickey, Rico Henry, Ethan P...","[Robert Sánchez, Lewis Dunk, Pervis Estupiñán,...","[David de Gea, Lisandro Martínez, Raphaël Vara...","[José Sá, Rayan Aït-Nouri, Craig Dawson, Nélso..."
team1_subs,"[Matt Turner, Kieran Tierney, Rob Holding, Tak...","[Asmir Begovic, Yerry Mina, Ben Godfrey, Séamu...","[Daniel Iversen, Çaglar Söyüncü, Jannik Vester...","[Tom Heaton, Lisandro Martínez, Tyrell Malacia...","[Thomas Strakosha, Mads Bech Sørensen, Tristan...","[Jason Steele, Jan Paul van Hecke, Joël Veltma...","[Fraser Forster, Davinson Sánchez, Emerson Roy...","[Wayne Hennessey, Steve Cook, Neco Williams, H...","[Alphonse Aréola, Ben Johnson, Thilo Kehrer, N...","[Adrián, Joe Gomez, Ibrahima Konaté, Konstanti...",...,"[Martin Dúbravka, Paul Dummett, Jamaal Lascell...","[Wayne Hennessey, Felipe, Joe Worrall, Serge A...","[Fraser Forster, Pedro Porro, Davinson Sánchez...","[Marcus Bettinelli, Trevoh Chalobah, Ben Chilw...","[Asmir Begovic, Mason Holgate, Yerry Mina, Ben...","[Robin Olsen, Viljami Sinisalo, Matty Cash, Ál...","[Matthew Cox, Zanka, Kristoffer Ajer, Mads Roe...","[Jason Steele, Adam Webster, Jan Paul van Heck...","[Tom Heaton, Victor Lindelöf, Harry Maguire, T...","[Daniel Bentley, Nathan Collins, Jonny, Hugo B..."


In [9]:
epl_df.to_csv('epl.csv', index=True)

In [31]:
trans_epl_df = epl_df.T

In [11]:
trans_epl_df = epl_df.T
trans_epl_df.to_csv('trans_epl_df.csv')  # add the .csv extension so the file is recognized as CSV

In [32]:
trans_epl_df.head()

Unnamed: 0,event,matchweek,team1_name,team1_startings,team1_subs,team1_stat,team2_name,team2_startings,team2_subs,team2_stat
75091,"[Full-time Match ends, Arsenal 0, Newcastle Un...",Matchweek 19,Arsenal,"[Aaron Ramsdale, Ben White, Gabriel Magalhães,...","[Matt Turner, Kieran Tierney, Rob Holding, Tak...","{'possession_%': '66.8', 'shots_on_target': '4...",Newcastle United,"[Nick Pope, Kieran Trippier, Sven Botman, Fabi...","[Martin Dúbravka, Jamaal Lascelles, Jamal Lewi...","{'possession_%': '33.2', 'shots_on_target': '1..."
75096,"[Full-time Match ends, Everton 1, Brighton and...",Matchweek 19,Everton,"[Jordan Pickford, James Tarkowski, Nathan Patt...","[Asmir Begovic, Yerry Mina, Ben Godfrey, Séamu...","{'possession_%': '48.9', 'shots_on_target': '4...",Brighton and Hove Albion,"[Robert Sánchez, Lewis Dunk, Levi Colwill, Per...","[Jason Steele, Tariq Lamptey, Jan Paul van Hec...","{'possession_%': '51.1', 'shots_on_target': '8..."
75098,"[Full-time Match ends, Leicester City 0, Fulha...",Matchweek 19,Leicester City,"[Danny Ward, Wout Faes, Daniel Amartey, Timoth...","[Daniel Iversen, Çaglar Söyüncü, Jannik Vester...","{'possession_%': '61.2', 'shots_on_target': '6...",Fulham,"[Bernd Leno, Kenny Tete, Tosin Adarabioyo, Tim...","[Marek Rodák, Layvin Kurzawa, Issa Diop, Harry...","{'possession_%': '38.8', 'shots_on_target': '2..."
75099,"[Full-time Match ends, Manchester United 3, Bo...",Matchweek 19,Manchester United,"[David de Gea, Victor Lindelöf, Harry Maguire,...","[Tom Heaton, Lisandro Martínez, Tyrell Malacia...","{'possession_%': '58.5', 'shots_on_target': '6...",Bournemouth,"[Mark Travers, Lloyd Kelly, Chris Mepham, Adam...","[Cameron Plain, Jack Stephens, Jack Stacey, Jo...","{'possession_%': '41.5', 'shots_on_target': '4..."
75093,"[Full-time Match ends, Brentford 3, Liverpool ...",Matchweek 19,Brentford,"[David Raya, Ethan Pinnock, Zanka, Ben Mee, Ri...","[Thomas Strakosha, Mads Bech Sørensen, Tristan...","{'possession_%': '27', 'shots_on_target': '7',...",Liverpool,"[Alisson, Virgil van Dijk, Ibrahima Konaté, Ko...","[Caoimhín Kelleher, Joe Gomez, Andrew Robertso...","{'possession_%': '73', 'shots_on_target': '6',..."
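The first entry of each `event` list appears to hold the final score, which is the natural source for a win/draw/loss label. The sketch below parses it with a regular expression. It assumes the text always follows the pattern `Full-time Match ends, <team1> <goals1>, <team2> <goals2>.` seen in the `epl_df.head()` output; the names `SCORE_PATTERN` and `parse_result` are hypothetical helpers, not part of the notebook.

```python
import re

# Assumed format: "Full-time Match ends, <team1> <goals1>, <team2> <goals2>."
SCORE_PATTERN = re.compile(r"Match ends, (.+?) (\d+), (.+?) (\d+)\.?$")


def parse_result(full_time_text):
    """Return (team1_goals, team2_goals, label), or None if the text does not match."""
    match = SCORE_PATTERN.search(full_time_text)
    if match is None:
        return None
    goals1, goals2 = int(match.group(2)), int(match.group(4))
    if goals1 > goals2:
        label = "team1_win"
    elif goals1 < goals2:
        label = "team2_win"
    else:
        label = "draw"
    return goals1, goals2, label


# Strings copied from the epl_df.head() output (entries whose score is not truncated).
print(parse_result("Full-time Match ends, Chelsea 0, Fulham 0."))   # (0, 0, 'draw')
print(parse_result("Full-time Match ends, Everton 1, Arsenal 0."))  # (1, 0, 'team1_win')
```

Applied to the frame, this could look like `trans_epl_df['event'].str[0].map(parse_result)`, assuming the full-time line is always the first list element.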


In [33]:
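# Raw string avoids the invalid-escape warning for \d; expand=False returns a Series instead of a one-column DataFrame
trans_epl_df['matchweek'] = trans_epl_df['matchweek'].str.extract(r'(\d+)', expand=False).astype(int)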

In [34]:
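# Note: assigning a sorted Series back to a column realigns it on the index, so this line leaves the row order unchanged.
# To actually reorder the rows by matchweek, use: trans_epl_df = trans_epl_df.sort_values('matchweek')
trans_epl_df['matchweek'] = trans_epl_df['matchweek'].sort_index()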

In [43]:
# Expand each stat dict into its own columns; the values are already dicts, so no ast.literal_eval parsing is needed
team1_stats_df = trans_epl_df['team1_stat'].apply(pd.Series).add_prefix('team1_')
team2_stats_df = trans_epl_df['team2_stat'].apply(pd.Series).add_prefix('team2_')

In [44]:
team1_stats_df.head()

Unnamed: 0,team1_possession_%,team1_shots_on_target,team1_shots,team1_touches,team1_passes,team1_tackels,team1_clearances,team1_corners,team1_offsides,team1_yellow_cards,team1_foul_conceded
75091,66.8,4,17,703,536,13,17,5,1,4,10
75096,48.9,4,10,663,478,16,15,5,1,3,8
75098,61.2,6,15,732,551,17,15,6,0,2,12
75099,58.5,6,18,731,579,9,11,6,0,0,13
75093,27.0,7,10,432,237,17,37,4,7,1,5


In [45]:
team2_stats_df.head()

Unnamed: 0,team2_possession_%,team2_shots_on_target,team2_shots,team2_touches,team2_passes,team2_tackels,team2_clearances,team2_corners,team2_offsides,team2_yellow_cards,team2_foul_conceded
75091,33.2,1,8,446,253,23,29,5,4,5,16
75096,51.1,8,19,676,512,16,13,2,0,1,10
75098,38.8,2,11,566,348,13,32,3,1,6,11
75099,41.5,4,7,579,406,18,19,5,2,3,7
75093,73.0,6,16,857,652,9,15,9,4,3,8
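Since the goal is to relate stats to the result, one common way to combine the two per-team frames is to take team1-minus-team2 differences. The sketch below does this for the first three matches and two columns copied from the outputs above. The names `t1`, `t2`, `diff_df` and the `_diff` suffix are illustrative choices, not something the notebook defines.

```python
import pandas as pd

# First three rows and two columns copied from the team1/team2 outputs above.
match_ids = [75091, 75096, 75098]
team1 = pd.DataFrame(
    {"team1_possession_%": [66.8, 48.9, 61.2], "team1_shots": [17, 10, 15]},
    index=match_ids,
)
team2 = pd.DataFrame(
    {"team2_possession_%": [33.2, 51.1, 38.8], "team2_shots": [8, 19, 11]},
    index=match_ids,
)

# Strip the prefixes so both frames share column names, then subtract row by row.
# pd.to_numeric guards against the values still being stored as strings.
t1 = team1.rename(columns=lambda c: c.replace("team1_", "", 1)).apply(pd.to_numeric)
t2 = team2.rename(columns=lambda c: c.replace("team2_", "", 1)).apply(pd.to_numeric)
diff_df = (t1 - t2).add_suffix("_diff")
print(diff_df)
```

A positive `shots_diff` means team1 took more shots than team2 in that match.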


In [42]:
print(trans_epl_df['team1_stat'].iloc[0])
print(type(trans_epl_df['team1_stat'].iloc[0]))

{'possession_%': '66.8', 'shots_on_target': '4', 'shots': '17', 'touches': '703', 'passes': '536', 'tackels': '13', 'clearances': '17', 'corners': '5', 'offsides': '1', 'yellow_cards': '4', 'foul_conceded': '10'}
<class 'dict'>
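The printed dict shows that every stat value is stored as a string, so the expanded columns may need a numeric conversion before any arithmetic or modeling. The sketch below uses the dict exactly as printed above; the one-row `stat_series` is a stand-in for `trans_epl_df['team1_stat']`, and building the frame from a list of dicts is an alternative to `apply(pd.Series)`, not the notebook's own code.

```python
import pandas as pd

# Dict copied from the printed output above; note that every value is a string.
stat = {'possession_%': '66.8', 'shots_on_target': '4', 'shots': '17',
        'touches': '703', 'passes': '536', 'tackels': '13', 'clearances': '17',
        'corners': '5', 'offsides': '1', 'yellow_cards': '4', 'foul_conceded': '10'}

# One-row stand-in for trans_epl_df['team1_stat'], indexed by the match id shown earlier.
stat_series = pd.Series([stat], index=[75091])

# Build the frame directly from the list of dicts, keeping the original index.
expanded = pd.DataFrame(stat_series.tolist(), index=stat_series.index).add_prefix("team1_")

# Convert each column from strings to numbers (floats or ints as appropriate).
numeric = expanded.apply(pd.to_numeric)
print(numeric.dtypes)
```

The key `tackels` is kept as spelled in the data; renaming it is optional and would need to be applied to both teams' frames.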


In [46]:
trans_epl_df = trans_epl_df.drop(columns=['team1_stat','team2_stat'])

In [48]:
trans_epl_df.head()

Unnamed: 0,event,matchweek,team1_name,team1_startings,team1_subs,team2_name,team2_startings,team2_subs
75091,"[Full-time Match ends, Arsenal 0, Newcastle Un...",19,Arsenal,"[Aaron Ramsdale, Ben White, Gabriel Magalhães,...","[Matt Turner, Kieran Tierney, Rob Holding, Tak...",Newcastle United,"[Nick Pope, Kieran Trippier, Sven Botman, Fabi...","[Martin Dúbravka, Jamaal Lascelles, Jamal Lewi..."
75096,"[Full-time Match ends, Everton 1, Brighton and...",19,Everton,"[Jordan Pickford, James Tarkowski, Nathan Patt...","[Asmir Begovic, Yerry Mina, Ben Godfrey, Séamu...",Brighton and Hove Albion,"[Robert Sánchez, Lewis Dunk, Levi Colwill, Per...","[Jason Steele, Tariq Lamptey, Jan Paul van Hec..."
75098,"[Full-time Match ends, Leicester City 0, Fulha...",19,Leicester City,"[Danny Ward, Wout Faes, Daniel Amartey, Timoth...","[Daniel Iversen, Çaglar Söyüncü, Jannik Vester...",Fulham,"[Bernd Leno, Kenny Tete, Tosin Adarabioyo, Tim...","[Marek Rodák, Layvin Kurzawa, Issa Diop, Harry..."
75099,"[Full-time Match ends, Manchester United 3, Bo...",19,Manchester United,"[David de Gea, Victor Lindelöf, Harry Maguire,...","[Tom Heaton, Lisandro Martínez, Tyrell Malacia...",Bournemouth,"[Mark Travers, Lloyd Kelly, Chris Mepham, Adam...","[Cameron Plain, Jack Stephens, Jack Stacey, Jo..."
75093,"[Full-time Match ends, Brentford 3, Liverpool ...",19,Brentford,"[David Raya, Ethan Pinnock, Zanka, Ben Mee, Ri...","[Thomas Strakosha, Mads Bech Sørensen, Tristan...",Liverpool,"[Alisson, Virgil van Dijk, Ibrahima Konaté, Ko...","[Caoimhín Kelleher, Joe Gomez, Andrew Robertso..."
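After the two `*_stat` columns are dropped, the expanded stats live only in the separate `team1_stats_df` and `team2_stats_df` frames. If a single modeling table is wanted, one option is to concatenate them column-wise on the shared match-id index. The sketch below uses a two-row excerpt of values shown earlier; `meta`, `team1_stats`, `team2_stats`, and `matches_df` are illustrative names.

```python
import pandas as pd

# Two-row excerpt of values shown in the outputs above.
match_ids = [75091, 75096]
meta = pd.DataFrame(
    {
        "matchweek": [19, 19],
        "team1_name": ["Arsenal", "Everton"],
        "team2_name": ["Newcastle United", "Brighton and Hove Albion"],
    },
    index=match_ids,
)
team1_stats = pd.DataFrame({"team1_shots": [17, 10]}, index=match_ids)
team2_stats = pd.DataFrame({"team2_shots": [8, 19]}, index=match_ids)

# axis=1 aligns rows on the shared match-id index and places the columns side by side.
matches_df = pd.concat([meta, team1_stats, team2_stats], axis=1)
print(matches_df.columns.tolist())
```

Because the alignment is by index, this works regardless of row order, as long as all three frames keep the original match ids as their index.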


In [49]:
trans_epl_df.describe()

Unnamed: 0,matchweek
count,209.0
mean,11.880383
std,6.499264
min,1.0
25%,6.0
50%,12.0
75%,18.0
max,22.0


In [51]:
trans_epl_df.isnull().sum()

event              0
matchweek          0
team1_name         0
team1_startings    0
team1_subs         0
team2_name         0
team2_startings    0
team2_subs         0
dtype: int64