# Analysis of English Premier League 2018-2019 season
<h4>Content</h4>

+ Introduction
+ Data description
+ Formulation of research question
+ Data preparation

# Introduction
The English Premier League is the top professional football competition in England. It is contested by 20 clubs. About 580 players took part in the EPL in the 2018-2019 season.
Let's look at how a League season runs. The 20 teams are placed in a single table, and each team plays every other team twice: once at its own stadium (home) and once at the opponent's (away). A win is worth 3 points, a draw 1 point, and a loss 0 points. With n teams there are n*(n-1) games in total, which in our case is 380 games, and each team plays 38 of them. The team with the most points after 38 rounds wins the tournament. Along with points, other statistics are recorded, such as the number of wins, draws, losses, goals scored and conceded, as well as the difference between goals scored and goals conceded. For example, in the 2019-2020 season Liverpool became champions with the following record: 32 wins, 3 draws, 3 losses, 85 goals scored and 33 conceded, a goal difference of 52.
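
The scoring rules and the size of the fixture list can be checked with a short sketch. Because every pair of clubs meets twice (home and away), the number of games is n*(n-1), which agrees with the 380 rows of the matches dataset. The function names `total_matches` and `points` are illustrative only and do not come from the datasets.

```python
def total_matches(n_teams: int) -> int:
    """Games in a double round-robin: one per ordered (home, away) pair."""
    return n_teams * (n_teams - 1)


def points(wins: int, draws: int) -> int:
    """League points: 3 per win, 1 per draw, 0 per loss."""
    return 3 * wins + draws


n_teams = 20
season_matches = total_matches(n_teams)   # 20 * 19 games in a season
matches_per_team = 2 * (n_teams - 1)      # each team meets 19 opponents twice
liverpool_points = points(wins=32, draws=3)  # record quoted in the text

print(season_matches, matches_per_team, liverpool_points)
```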

A full description of the League is available at https://en.wikipedia.org/wiki/Premier_League

<img src="https://cdn-blog.scorum.com/production/mortilla007/7da1f4df8e27dd77_800" >

# Data description
Football is more than a ball game, and the final score is not the only thing that matters. After each game, detailed statistics are collected on the match and on the players, including how many decisive actions (such as goals and assists) each player made. Based on these statistics:
+ Bookmakers set the odds for a team's next games.
+ Coaches choose the best starting line-up, knowing the strengths of their players.
+ Sports analysts predict the score of a match.
+ The Premier League compiles complete statistics of the past season, such as the best teams, players, defenders, scorers, assist providers and so on.

My analysis covers 4 datasets for the 2018-2019 season. The main one holds data about players and has 572 rows. The three additional ones hold data about teams (20 rows), the league itself (1 row), and all matches (380 rows). Together they give an overall picture of the Premier League.<br>
Analysis of the <mark>players</mark> dataset will be based on the following indicators:<br>

+ position - the role of the player on the pitch (striker, midfielder, defender or goalkeeper)
+ minutes played - how much time the player spent on the field over the entire season
+ nationality
+ number of goals
+ number of assists (passes that led to a goal)
+ number of yellow cards - cards awarded for aggressive play
+ number of red cards - cards that remove the player from the game for unsportsmanlike behavior
+ rank in the top Strikers 
+ rank in the top Midfielders 
+ rank in the top Defenders
+ rank in the top Goalkeepers 



Analysis of the <mark>matches</mark> dataset consists of:
+ home team name
+ away team name
+ referee - the name of a referee in a match
+ home team goal count - goals scored by home team in a game
+ away team goal count - goals scored by away team in a game
+ home team goal timing - minutes in which goals are scored by home team
+ away team goal timing - minutes in which goals are scored by away team
+ total corners count - number of corner kicks during a game
+ total yellow card count
+ total red card count
+ home team shots on target - number of shots by the home team that were on target
+ away team shots on target - number of shots by the away team that were on target


Analysis of the <mark>teams</mark> dataset consists of:
+ matches played
+ wins
+ wins home 
+ wins away 
+ draws home
+ draws away
+ losses home
+ losses away
+ average points per game
+ goals scored
+ goals conceded
+ goals difference

Analysis of the <mark>league</mark> dataset consists of:
+ total matches
+ goals scored
+ goals conceded
+ clean sheets - games in which a team does not concede a goal
+ total corners
+ total red cards
+ total yellow cards

# Formulation of research question
My project will consist of 5 parts:
1. Analysis of points scored, goals scored, goals conceded by each team, and building a team distribution table based on it.
2. Analysis of additional club data, such as the number of wins, losses, draws and the average number of points scored per match.
3. Analysis of the performance of teams at home and away, and of how they play in the first and second halves of a match.
4. Analysis of players' decisive actions and discipline: top scorers, assist providers, defenders, goalkeepers.
5. Analysis of expected goals (xG, an estimate of how many goals a team would be expected to score given the chances it created), goals scored, and shots on target per match. These statistics help in predicting game outcomes.
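
As a rough illustration of part 1, the sketch below builds a league table from match results. It is only a sketch under stated assumptions: the three teams and their scores are made up, and the column names (`home_team_name`, `away_team_name`, `home_team_goal_count`, `away_team_goal_count`) are assumed from the field descriptions above rather than taken from the actual `matches.csv` header. The helper `league_table` is hypothetical.

```python
import pandas as pd

# Made-up results for three hypothetical teams; column names are assumptions.
matches = pd.DataFrame({
    "home_team_name": ["A", "B", "C"],
    "away_team_name": ["B", "C", "A"],
    "home_team_goal_count": [2, 1, 0],
    "away_team_goal_count": [0, 1, 3],
})


def league_table(matches: pd.DataFrame) -> pd.DataFrame:
    """Aggregate points, goals scored and goals conceded per team."""
    # One row per team per match, seen from the home side and the away side.
    home = pd.DataFrame({
        "team": matches["home_team_name"],
        "scored": matches["home_team_goal_count"],
        "conceded": matches["away_team_goal_count"],
    })
    away = pd.DataFrame({
        "team": matches["away_team_name"],
        "scored": matches["away_team_goal_count"],
        "conceded": matches["home_team_goal_count"],
    })
    rows = pd.concat([home, away], ignore_index=True)

    # 3 points for a win, 1 for a draw, 0 for a loss.
    diff = rows["scored"] - rows["conceded"]
    rows["points"] = (diff > 0) * 3 + (diff == 0) * 1

    table = rows.groupby("team")[["points", "scored", "conceded"]].sum()
    table["goal_difference"] = table["scored"] - table["conceded"]
    # Rank by points, then goal difference, then goals scored.
    return table.sort_values(
        ["points", "goal_difference", "scored"], ascending=False
    )


table = league_table(matches)
print(table)
```

Splitting each match into a home row and an away row keeps the aggregation to a single `groupby`, at the cost of doubling the number of rows.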

# Data preparation

In [106]:
import pandas as pd
import numpy as np
import requests

In [113]:
# let's open players.csv file
# the main one
csv_file = open(file = 'players.csv',
               mode = 'r',
               encoding = 'ISO-8859-1')
# three additional files
csv1_file = open(file = 'teams.csv',
               mode = 'r',
               encoding = 'ISO-8859-1')

csv2_file = open(file = 'league.csv',
               mode = 'r',
               encoding = 'ISO-8859-1')

csv3_file = open(file = 'matches.csv',
               mode = 'r',
               encoding = 'ISO-8859-1')
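
The handles opened above stay open until they are closed explicitly. A possible alternative, shown here only as a sketch, is a `with` block, which closes the file automatically. So that it runs anywhere, the example writes a small stand-in file to a temporary directory; the path `players_sample.csv` and its two-column content are made up, and the real notebook would open `players.csv` instead.

```python
import os
import tempfile

# Stand-in file so the sketch is self-contained (hypothetical name and content).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "players_sample.csv")
with open(sample_path, mode="w", encoding="ISO-8859-1") as f:
    f.write("full_name,age\nAaron Cresswell,30\n")

# The file is closed automatically when the with-block ends.
with open(sample_path, mode="r", encoding="ISO-8859-1") as csv_file:
    header = csv_file.readline().strip().split(",")

print(header)
print(csv_file.closed)
```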

In [118]:
# show csv file in txt format
csv_file.read()

"full_name,age,birthday,league,season,position,Current Club,minutes_played_overall,minutes_played_home,minutes_played_away,nationality,appearances_overall,appearances_home,appearances_away,goals_overall,goals_home,goals_away,assists_overall,assists_home,assists_away,penalty_goals,penalty_misses,clean_sheets_overall,clean_sheets_home,clean_sheets_away,conceded_overall,conceded_home,conceded_away,yellow_cards_overall,red_cards_overall,goals_involved_per_90_overall,assists_per_90_overall,goals_per_90_overall,goals_per_90_home,goals_per_90_away,min_per_goal_overall,conceded_per_90_overall,min_per_conceded_overall,min_per_match,min_per_card_overall,min_per_assist_overall,cards_per_90_overall,rank_in_league_top_attackers,rank_in_league_top_midfielders,rank_in_league_top_defenders,rank_in_club_top_scorer\nAaron Cresswell,30,629683200,Premier League,2018/2019,Defender,West Ham United,1589,888,701,England,20,10,8,0,0,0,1,1,0,0,0,3,2,1,22,12,10,1,0,0.06,0.06,0,0,0,0,1.25,72,79,1589,1589,0.06,292

In [109]:
# use pandas in order to show file as a dataframe
players = pd.read_csv('players.csv')
players

Unnamed: 0,full_name,age,birthday,league,season,position,Current Club,minutes_played_overall,minutes_played_home,minutes_played_away,...,conceded_per_90_overall,min_per_conceded_overall,min_per_match,min_per_card_overall,min_per_assist_overall,cards_per_90_overall,rank_in_league_top_attackers,rank_in_league_top_midfielders,rank_in_league_top_defenders,rank_in_club_top_scorer
0,Aaron Cresswell,30,629683200,Premier League,2018/2019,Defender,West Ham United,1589,888,701,...,1.25,72,79,1589,1589,0.06,292,193,80,20
1,Aaron Lennon,33,545529600,Premier League,2018/2019,Midfielder,Burnley,1217,487,730,...,1.48,61,76,1217,1217,0.07,198,187,-1,10
2,Aaron Mooy,29,653356800,Premier League,2018/2019,Midfielder,Huddersfield Town,2327,1190,1137,...,1.78,51,80,582,2327,0.15,147,233,-1,3
3,Aaron Ramsey,29,662169600,Premier League,2018/2019,Midfielder,Arsenal,1327,689,638,...,0.81,111,47,0,221,0.00,69,8,-1,5
4,Aaron Rowe,19,968284800,Premier League,2018/2019,Forward,Huddersfield Town,69,14,55,...,1.30,69,35,0,0,0.00,-1,-1,-1,31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
567,Youri Tielemans,23,862963200,Premier League,2018/2019,Midfielder,Leicester City,1092,575,517,...,1.07,84,84,546,273,0.16,81,13,-1,4
568,Yves Bissouma,23,841363200,Premier League,2018/2019,Midfielder,Brighton & Hove Albion,1769,747,1022,...,1.53,59,63,354,0,0.25,402,335,-1,17
569,Zechariah Medley,20,962928000,Premier League,2018/2019,Defender,Arsenal,0,0,0,...,0.00,0,0,0,0,0.00,-1,-1,-1,-1
570,Zeze Steven Sessegnon,20,958608000,Premier League,2018/2019,Defender,Fulham,0,0,0,...,0.00,0,0,0,0,0.00,-1,-1,-1,-1


In [112]:
# dataframe size
players.shape

(572, 46)

In [92]:
# delete unnecessary columns
columns_to_drop = [
    'birthday', 'league', 'season',
    'penalty_goals', 'penalty_misses',
    'minutes_played_home', 'minutes_played_away',
    'appearances_home', 'appearances_away',
    'goals_home', 'goals_away',
    'assists_home', 'assists_away',
    'clean_sheets_home', 'clean_sheets_away',
    'conceded_home', 'conceded_away',
    'goals_involved_per_90_overall',
    'goals_per_90_home', 'goals_per_90_away',
    'conceded_per_90_overall', 'cards_per_90_overall',
]
players.drop(columns=columns_to_drop, inplace=True)
players

Unnamed: 0,full_name,age,position,Current Club,minutes_played_overall,nationality,appearances_overall,goals_overall,assists_overall,clean_sheets_overall,...,goals_per_90_overall,min_per_goal_overall,min_per_conceded_overall,min_per_match,min_per_card_overall,min_per_assist_overall,rank_in_league_top_attackers,rank_in_league_top_midfielders,rank_in_league_top_defenders,rank_in_club_top_scorer
0,Aaron Cresswell,30,Defender,West Ham United,1589,England,20,0,1,3,...,0.00,0,72,79,1589,1589,292,193,80,20
1,Aaron Lennon,33,Midfielder,Burnley,1217,England,16,1,1,4,...,0.07,1217,61,76,1217,1217,198,187,-1,10
2,Aaron Mooy,29,Midfielder,Huddersfield Town,2327,Australia,29,3,1,4,...,0.12,776,51,80,582,2327,147,233,-1,3
3,Aaron Ramsey,29,Midfielder,Arsenal,1327,Wales,28,4,6,7,...,0.27,332,111,47,0,221,69,8,-1,5
4,Aaron Rowe,19,Forward,Huddersfield Town,69,England,2,0,0,0,...,0.00,0,69,35,0,0,-1,-1,-1,31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
567,Youri Tielemans,23,Midfielder,Leicester City,1092,Belgium,13,3,4,3,...,0.25,364,84,84,546,273,81,13,-1,4
568,Yves Bissouma,23,Midfielder,Brighton & Hove Albion,1769,Mali,28,0,0,5,...,0.00,0,59,63,354,0,402,335,-1,17
569,Zechariah Medley,20,Defender,Arsenal,0,England,0,0,0,0,...,0.00,0,0,0,0,0,-1,-1,-1,-1
570,Zeze Steven Sessegnon,20,Defender,Fulham,0,England,0,0,0,0,...,0.00,0,0,0,0,0,-1,-1,-1,-1
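
Running the drop cell a second time raises a `KeyError`, because the columns are already gone. One way to make such a cell safe to re-run, sketched here on a made-up one-row frame rather than on the real data, is the `errors='ignore'` option of `DataFrame.drop`. The names `toy` and `to_drop` are illustrative.

```python
import pandas as pd

# Made-up miniature of the players frame.
toy = pd.DataFrame({
    "full_name": ["Aaron Cresswell"],
    "age": [30],
    "birthday": [629683200],
    "league": ["Premier League"],
})
to_drop = ["birthday", "league"]

# errors="ignore" skips labels that are not present instead of raising.
toy = toy.drop(columns=to_drop, errors="ignore")
toy = toy.drop(columns=to_drop, errors="ignore")  # second run is a no-op

print(list(toy.columns))
```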


In [95]:
# check the dataframe for null values
players.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 572 entries, 0 to 571
Data columns (total 24 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   full_name                       572 non-null    object 
 1   age                             572 non-null    int64  
 2   position                        572 non-null    object 
 3   Current Club                    572 non-null    object 
 4   minutes_played_overall          572 non-null    int64  
 5   nationality                     572 non-null    object 
 6   appearances_overall             572 non-null    int64  
 7   goals_overall                   572 non-null    int64  
 8   assists_overall                 572 non-null    int64  
 9   clean_sheets_overall            572 non-null    int64  
 10  conceded_overall                572 non-null    int64  
 11  yellow_cards_overall            572 non-null    int64  
 12  red_cards_overall               572 

In [96]:
# another way to find null values: count them per column
players.isnull().sum()

full_name                         0
age                               0
position                          0
Current Club                      0
minutes_played_overall            0
nationality                       0
appearances_overall               0
goals_overall                     0
assists_overall                   0
clean_sheets_overall              0
conceded_overall                  0
yellow_cards_overall              0
red_cards_overall                 0
assists_per_90_overall            0
goals_per_90_overall              0
min_per_goal_overall              0
min_per_conceded_overall          0
min_per_match                     0
min_per_card_overall              0
min_per_assist_overall            0
rank_in_league_top_attackers      0
rank_in_league_top_midfielders    0
rank_in_league_top_defenders      0
rank_in_club_top_scorer           0
dtype: int64

In [98]:
# drop rows with missing values (none were found above, so all rows remain)
players1 = players.dropna()
players1

Unnamed: 0,full_name,age,position,Current Club,minutes_played_overall,nationality,appearances_overall,goals_overall,assists_overall,clean_sheets_overall,...,goals_per_90_overall,min_per_goal_overall,min_per_conceded_overall,min_per_match,min_per_card_overall,min_per_assist_overall,rank_in_league_top_attackers,rank_in_league_top_midfielders,rank_in_league_top_defenders,rank_in_club_top_scorer
0,Aaron Cresswell,30,Defender,West Ham United,1589,England,20,0,1,3,...,0.00,0,72,79,1589,1589,292,193,80,20
1,Aaron Lennon,33,Midfielder,Burnley,1217,England,16,1,1,4,...,0.07,1217,61,76,1217,1217,198,187,-1,10
2,Aaron Mooy,29,Midfielder,Huddersfield Town,2327,Australia,29,3,1,4,...,0.12,776,51,80,582,2327,147,233,-1,3
3,Aaron Ramsey,29,Midfielder,Arsenal,1327,Wales,28,4,6,7,...,0.27,332,111,47,0,221,69,8,-1,5
4,Aaron Rowe,19,Forward,Huddersfield Town,69,England,2,0,0,0,...,0.00,0,69,35,0,0,-1,-1,-1,31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
567,Youri Tielemans,23,Midfielder,Leicester City,1092,Belgium,13,3,4,3,...,0.25,364,84,84,546,273,81,13,-1,4
568,Yves Bissouma,23,Midfielder,Brighton & Hove Albion,1769,Mali,28,0,0,5,...,0.00,0,59,63,354,0,402,335,-1,17
569,Zechariah Medley,20,Defender,Arsenal,0,England,0,0,0,0,...,0.00,0,0,0,0,0,-1,-1,-1,-1
570,Zeze Steven Sessegnon,20,Defender,Fulham,0,England,0,0,0,0,...,0.00,0,0,0,0,0,-1,-1,-1,-1


There is no need to replace zeros in the DataFrame with other values, because in football many statistics can legitimately be zero. For example, goalkeepers usually do not take part in attacks, which is why they do not score goals.
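
The difference between a genuine zero and a missing value can be seen on a small made-up frame. This is only an illustration: the three rows below are invented, and only the column names `full_name` and `goals_overall` come from the players dataset.

```python
import numpy as np
import pandas as pd

# Invented rows: a keeper with zero goals, a striker, and one missing value.
toy = pd.DataFrame({
    "full_name": ["Keeper", "Striker", "Unknown"],
    "goals_overall": [0, 12, np.nan],
})

n_missing = int(toy["goals_overall"].isnull().sum())  # counts only NaN
n_zero = int((toy["goals_overall"] == 0).sum())       # counts genuine zeros

# dropna() removes the NaN row but keeps the row with zero goals.
cleaned = toy.dropna()

print(n_missing, n_zero, len(cleaned))
```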