# Hello! This is my first pet project, and I chose a dataset from my favorite football league
 ### The English Premier League (EPL) is the most popular football league in the world, so it will be fun to take a close look at the league statistics for the 2020/2021 season.
 ### This dataset is available on Kaggle: https://www.kaggle.com/rajatrc1705/english-premier-league202021


In [None]:
# Importing necessary Python libraries.
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from plotly.offline import plot, iplot, init_notebook_mode
import plotly.graph_objs as go
init_notebook_mode(connected=True)

In [None]:
# Read the CSV file with pandas and preview the first 10 rows.
epl = pd.read_csv('../input/english-premier-league202021/EPL_20_21.csv')
epl.head(10)

### The columns of the dataset are:

 #### Name: Name of the player.
 #### Club: Club the player played for this season.
 #### Nationality: Nationality of the player.
 #### Position: Role of the player on the pitch. FW - Forward, MF - Midfielder, DF - Defender, GK - Goalkeeper.
 #### Age: Age of the player during the season.
 #### Matches: The number of matches the player took part in.
 #### Starts: The number of times the player was named in the starting 11 by the manager.
 #### Mins: The number of minutes played by the player.
 #### Goals: The number of goals scored by the player.
 #### Assists: The number of times the player assisted another player in scoring a goal.
 #### Passes_Attempted: The number of passes attempted by the player.
 #### Perc_Passes_Completed: The percentage of the player's passes that accurately reached a teammate.
 #### Penalty_Goals: The number of goals scored from penalties.
 #### Penalty_Attempted: The number of penalties taken during the season.
 #### xG: Expected number of goals from the player in a match.
 #### xA: Expected number of assists from the player in a match.
 #### Yellow_Cards: The number of yellow cards the player received from the referee for indiscipline, technical fouls, or other minor fouls.
 #### Red_Cards: The number of red cards the player received, either for accumulating 2 yellow cards in a single game or for a major foul.


In [None]:
list(epl.columns)

## Let's start exploring the data by counting the players in each position (GK, DF, MF, FW) in the EPL. This shows which positions are most common among players in the league.

In [None]:
epl_position = epl['Position'].value_counts()
fig = go.Figure(data=[go.Pie(labels=epl_position.index, values=epl_position.values, hole=.3)])
fig.show()

### One third of the players in the EPL are defenders (DF). The most common combined position (8.83%), for players who had two roles on the pitch during the season, was FW-MF (Forward-Midfielder).
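The percentages quoted above are read off the pie chart. As a minimal sketch, the same shares could also be computed as numbers with `value_counts(normalize=True)`. The helper name `position_shares` and the toy rows are my own assumptions for illustration; they are not taken from the real dataset.

```python
import pandas as pd


def position_shares(df: pd.DataFrame, column: str = "Position") -> pd.Series:
    """Return the percentage share of each value in `column`, rounded to 2 decimals."""
    return (df[column].value_counts(normalize=True) * 100).round(2)


# Toy data (hypothetical rows, NOT the real EPL dataset).
toy = pd.DataFrame(
    {"Position": ["DF", "DF", "DF", "MF", "MF", "FW", "GK", "FW,MF"]}
)

shares = position_shares(toy)
print(shares)
```

In the notebook, the same helper could be called as `position_shares(epl)` to list the real shares next to the chart.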

## Let's look at the 10 most common nationalities in the league.

In [None]:
epl_nations = epl['Nationality'].value_counts().head(10)
fig = go.Figure(data=[go.Pie(labels=epl_nations.index, values=epl_nations.values, hole=.3)])
fig.show()

### Among the 10 most common nationalities, a little more than half of the players in the 2020/2021 season were from England, the home country of the league. This pie chart suggests that the EPL is mainly a home for players from Europe: Brazil is the only non-European country in the chart, with 7.16%.


## Let's move on to age insights for the EPL 2020/2021 season

In [None]:
epl.loc[epl['Age'] == epl['Age'].max()]

### The oldest player in the EPL 2020/2021 season was Willy Caballero, one of Chelsea's goalkeepers. He played only one full match during the season.

In [None]:
epl.loc[epl['Age'] == epl['Age'].min()]

### The youngest players of the season were 16-year-old forwards. Nevertheless, notice that only one 16-year-old footballer (Carney Chukwuemeka) played more than 15 minutes on the pitch. Also, all of these players are from England.

In [None]:
age_stat = sns.displot(x = epl['Age'], bins = 20)

### Notice that players aged 22-27 form the core of the league. After a peak of more than 90 players aged 27, the number of players decreases with age.

In [None]:
plt.figure(figsize=(18,8))
club_age = sns.boxplot(x='Club',y='Age', data=epl)
plt.xticks(rotation=90)
plt.show()

### We can see from this box plot that Tottenham Hotspur has the youngest team in the league, whereas West Bromwich Albion has the oldest.
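Reading "youngest" and "oldest" off a box plot by eye can be tricky when the boxes overlap. A minimal sketch of how the comparison could be backed by numbers is to aggregate the age per club and sort by the median. The function name `age_summary_by_club` and the toy clubs below are assumptions for illustration only, not real data.

```python
import pandas as pd


def age_summary_by_club(df: pd.DataFrame) -> pd.DataFrame:
    """Median, mean and player count of Age per Club, youngest (by median) first."""
    return (
        df.groupby("Club")["Age"]
        .agg(["median", "mean", "count"])
        .sort_values("median")
    )


# Toy data (hypothetical clubs and ages, NOT the real EPL dataset).
toy = pd.DataFrame(
    {
        "Club": ["Club A", "Club A", "Club A", "Club B", "Club B", "Club B"],
        "Age": [21, 23, 25, 28, 30, 32],
    }
)

summary = age_summary_by_club(toy)
print(summary)
```

With the real data, `age_summary_by_club(epl)` would put the youngest squad in the first row and the oldest in the last.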

In [None]:
# Mean age per position, plotted from the youngest position to the oldest.
age_by_position = epl.groupby('Position', as_index=False)['Age'].mean()
plt.figure(figsize = (18,8))
ax = sns.barplot(x='Position', y='Age', data=age_by_position.sort_values(by="Age"))
plt.xticks(rotation=45)
plt.show()

### From this visualization, we can conclude that goalkeepers have the highest mean age of all positions (below 27 years), while FW-DF players, the position with the lowest mean, average less than 23 years.


## Players who did not miss a single minute of the 38-game season.

In [None]:
epl.loc[epl['Mins'] == epl['Mins'].max()]

### Notice that only two (Pierre Højbjerg and James Ward-Prowse) of the six players who played every minute of the season are not goalkeepers, and both of them are midfielders. This supports the argument that, due to the high workload and injury risk, forwards cannot always be available to managers in the EPL.

In [None]:
minutes_by_position = pd.DataFrame(epl.groupby('Position', as_index=False)['Mins'].sum())
fig = go.Figure(data=[go.Pie(labels=minutes_by_position['Position'], values=minutes_by_position['Mins'])])
fig.show()

## Clubs that scored the greatest number of goals during the season

In [None]:
goal_team = pd.DataFrame(epl.groupby('Club', as_index=False)['Goals'].sum())
plt.figure(figsize = (18,8))

ax = sns.barplot(x='Club', y='Goals', data=goal_team.sort_values(by="Goals"))
plt.xticks(rotation=45)
plt.show()

### Manchester City scored the most goals during the season and became the champions of the league. Manchester United finished second both in goals scored and in the league table. However, Tottenham, which is third in this visualization, finished the season outside the top 6 (the Champions League and Europa League places). Goals scored are not the only factor behind the final place in the table, but they are an important one.

## Percentage of EPL goals scored by each position.

In [None]:
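goal_by_position = pd.DataFrame(epl.groupby('Position', as_index=False)['Goals'].sum())
go.Figure(data=[go.Pie(labels=goal_by_position['Position'], values=goal_by_position['Goals'])]).show()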


### Forwards scored 44.8% of the goals in the EPL 2020/2021 season, but the most interesting insight in this pie chart is that defenders (DF) scored more than the players listed as FW,MF and more than the players listed as MF,FW. The difference between FW,MF and MF,FW is which position the player occupied more often. So, an MF,FW player spent more of the 2020/2021 season as a midfielder than as a forward.

## Top 5 performers of the league by goals scored.

In [None]:
scorers = epl[['Name', 'Club', 'Position', 'Age', 'Matches', 'Goals']].sort_values(by=['Goals'], ascending=False).head(5)
plt.figure(figsize = (18,8))
ax = sns.barplot(x='Name', y='Goals', data=scorers)
plt.xticks(rotation=45)
plt.show()


### As we can see, Tottenham Hotspur forward Harry Kane was the best player of the league if we consider only goals scored. The runner-up, Mohamed Salah, scored one goal fewer than Kane, but Kane's dominance in the league becomes clearer if we consider only goals scored in play (excluding penalty goals). Let's look at the league's best scorers without counting penalty goals.

In [None]:
epl['Goals_without_penalties'] = (epl['Goals'] - epl['Penalty_Goals'])
goals_without_pens = epl[['Name', 'Club', 'Position', 'Goals', 'Penalty_Goals', 'Penalty_Attempted', 'Goals_without_penalties']].sort_values(by=['Goals_without_penalties', 'Penalty_Attempted'], ascending=[False, True]).head(5)
plt.figure(figsize = (18,8))
ax = sns.barplot(x='Name', y='Goals_without_penalties', data=goals_without_pens)
plt.xticks(rotation=45)
plt.show()

### After creating a column with the number of non-penalty goals and sorting by the new column and by the number of penalties attempted, it became clear that Harry Kane was the best scorer of the league. He is 3 goals ahead of his closest pursuer, Dominic Calvert-Lewin. Notice that all 4 of Kane's penalty attempts ended in goals. 19 goals + 4/4 penalty goals - top level!
### An interesting insight is the absence of Bruno Fernandes from the new table. The Manchester United midfielder scored half of his 18 goals from penalties (9), which earned him a place in the first table, but his goals from play were not enough for the new table, which counts only non-penalty goals. Nevertheless, 18 goals (9 of them penalties) is a good record for a midfielder in the league.
### Another great insight is the presence of Dominic Calvert-Lewin in the new table, straight into second position. He does not appear in the table ordered by total goals, despite his 16 goals. With no penalties and 16 goals from play during the season, he is one of the league's best scorers. His case shows that deeper statistics are crucial for evaluating EPL players.
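The penalty comparison above can be summarised in two derived numbers per player: non-penalty goals and penalty conversion rate. Here is a small sketch under my own assumptions: the helper `penalty_breakdown` and the `penalty_conversion` column are hypothetical names, and the toy players are made up. Players with no penalty attempts get a missing value instead of a division by zero.

```python
import pandas as pd


def penalty_breakdown(df: pd.DataFrame) -> pd.DataFrame:
    """Add non-penalty goals and penalty conversion rate to a copy of `df`."""
    out = df.copy()
    out["Goals_without_penalties"] = out["Goals"] - out["Penalty_Goals"]
    # Conversion is undefined with zero attempts: mask those rows to NaN first.
    attempts = out["Penalty_Attempted"].where(out["Penalty_Attempted"] > 0)
    out["penalty_conversion"] = out["Penalty_Goals"] / attempts
    return out


# Toy data (hypothetical players, NOT the real EPL dataset).
toy = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C"],
        "Goals": [10, 12, 7],
        "Penalty_Goals": [2, 3, 0],
        "Penalty_Attempted": [2, 4, 0],
    }
)

breakdown = penalty_breakdown(toy)
print(breakdown)
```

Player C, who took no penalties, keeps all 7 goals as non-penalty goals and has no conversion rate.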



## Now let's look at the top 5 players who were most productive during their time on the pitch (Mins column), measured by goals scored.

In [None]:
epl['min_per_goal'] = (epl['Mins'] / epl['Goals'])
epl[['Name', 'Club', 'Position', 'Age', 'Mins', 'Goals', 'min_per_goal']].sort_values(by=['min_per_goal'], ascending=True).head(5)

### The top 5 most efficient scorers include two Tottenham Hotspur players: Gareth Bale and Harry Kane. Bale needed only 83 minutes on the pitch per goal, while Kane needed 134 minutes. First of all, it's interesting that a 31-year-old forward (Gareth Bale) performed so efficiently in a league where the mean age of forwards is less than 24 years. Secondly, Harry Kane appears in every goal-involvement table. A great individual season by Kane!
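One caveat with a minutes-per-goal ranking: a player who scored once in a very short appearance can outrank regular starters. As a hedged sketch, goalless players could be dropped explicitly and a minimum-minutes cut-off applied before ranking. The function `minutes_per_goal` and its `min_minutes` parameter are my own hypothetical names, and the toy rows are invented.

```python
import pandas as pd


def minutes_per_goal(df: pd.DataFrame, min_minutes: int = 0) -> pd.DataFrame:
    """Rank players by minutes per goal, skipping goalless players and short cameos."""
    eligible = df[(df["Goals"] > 0) & (df["Mins"] >= min_minutes)].copy()
    eligible["min_per_goal"] = eligible["Mins"] / eligible["Goals"]
    return eligible.sort_values("min_per_goal")


# Toy data (hypothetical players, NOT the real EPL dataset).
toy = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C", "Player D"],
        "Mins": [900, 90, 3000, 500],
        "Goals": [9, 1, 20, 0],
    }
)

ranked = minutes_per_goal(toy, min_minutes=500)
print(ranked)
```

Player B (one goal in 90 minutes) would top an unfiltered ranking, but the 500-minute cut-off removes that small sample, and Player D is excluded for having no goals.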

## Top 5 assist providers of the league

In [None]:
epl[['Name', 'Club', 'Position', 'Matches', 'Assists', 'Nationality']].sort_values(by=['Assists'], ascending=False).head(5)

### Harry Kane is first again! He is the league's best scorer (23) and best assist provider (14). His teammate Son Heung-min is here with him. This duo is present in the top 5 league scorers tables (with and without penalty goals) and in the top 5 assist providers. Bruno Fernandes, from the tables mentioned above, is also here! What a season from these guys.

### Let's look at the most efficient assist providers of the league

In [None]:
# Minutes played per assist (lower is better); only players with at least 1000 minutes are ranked.
epl['min_per_assist'] = (epl['Mins'] / epl['Assists'])
epl[['Name', 'Club','Nationality', 'Position', 'Age', 'Mins', 'Assists', 'min_per_assist']].loc[epl['Mins'] >= 1000].sort_values(by=['min_per_assist'], ascending=True).head(5)

### Kevin De Bruyne is at the top here. He needed 166 minutes per assist in the league. Our friends Harry Kane and Bruno Fernandes are here too. Again, what a season from these guys. Notice that 3/5 of the players in the last two tables are not from England, despite the numerical advantage of English players in the league.

## Top 5 players by the goals+assists system.

In [None]:
epl['goal_and_assist'] = (epl['Goals'] + epl['Assists'])
epl[['Name', 'Club','Nationality', 'Position', 'Age', 'Mins', 'Assists','Goals', 'goal_and_assist']].sort_values(by=['goal_and_assist'], ascending=False).head(5)

### The same guys over and over! Son Heung-min, Mohamed Salah, Harry Kane and Bruno Fernandes are permanent participants in the last 5-6 tables.

### Let's look at the players who were most often involved in creating goals during the season.

In [None]:
# Minutes played per goal involvement (goal or assist); lower is better.
epl['min_per_goal_involvement'] = (epl['Mins'] / epl['goal_and_assist'])
epl[['Name', 'Club','Nationality', 'Position', 'Age', 'Mins', 'Assists','Goals', 'goal_and_assist', 'min_per_goal_involvement']].loc[epl['Mins'] >= 1000].sort_values(by=['min_per_goal_involvement'], ascending=True).head(5)

### Fantastic! Harry Kane was the only player in the league who, on average, scored or assisted at least once every 90 minutes on the pitch. Bruno Fernandes is second in this list.
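The "once every 90 minutes" reading can be made explicit by converting to a per-90 rate: (goals + assists) × 90 / minutes. A value of 1.0 or more means at least one goal involvement per full match. This is only a sketch: `goal_involvements_per_90` and the `ga_per_90` column are hypothetical names, and the toy rows are invented.

```python
import pandas as pd


def goal_involvements_per_90(df: pd.DataFrame, min_minutes: int = 1000) -> pd.DataFrame:
    """Goals plus assists per 90 minutes for players with at least `min_minutes`."""
    eligible = df[df["Mins"] >= min_minutes].copy()
    eligible["ga_per_90"] = (eligible["Goals"] + eligible["Assists"]) * 90 / eligible["Mins"]
    return eligible.sort_values("ga_per_90", ascending=False)


# Toy data (hypothetical players, NOT the real EPL dataset).
toy = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C"],
        "Mins": [1800, 2700, 450],
        "Goals": [15, 10, 5],
        "Assists": [5, 5, 5],
    }
)

per_90 = goal_involvements_per_90(toy)
print(per_90)
```

Player C has the best raw rate but is filtered out by the 1000-minute cut-off, the same threshold used in the cell above.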

## Pass statistics in the EPL 2020/2021. 

### First of all, let's look at how passes are distributed across positions in the EPL using a pie chart

In [None]:
position_passes = pd.DataFrame(epl.groupby('Position', as_index=False)['Passes_Attempted'].sum())
fig = go.Figure(data=[go.Pie(labels=position_passes['Position'], values=position_passes['Passes_Attempted'])])
fig.show()

### More than 40% of passes in the EPL were made by defenders. That's no surprise, given how many defenders there are in the league. Now let's look at the most accurate passers of the league.
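Since the defenders' share of passes is partly explained by their headcount, one way to separate the two effects is to divide the total passes of each position by the number of players in it. The sketch below assumes a helper I named `passes_per_player`; the toy rows are invented and only illustrate the calculation.

```python
import pandas as pd


def passes_per_player(df: pd.DataFrame) -> pd.DataFrame:
    """Total passes, player count and average passes per player for each position."""
    grouped = df.groupby("Position")["Passes_Attempted"].agg(total="sum", players="count")
    grouped["per_player"] = grouped["total"] / grouped["players"]
    return grouped.sort_values("per_player", ascending=False)


# Toy data (hypothetical rows, NOT the real EPL dataset).
toy = pd.DataFrame(
    {
        "Position": ["DF", "DF", "DF", "MF", "FW"],
        "Passes_Attempted": [1000, 800, 600, 1500, 300],
    }
)

table = passes_per_player(toy)
print(table)
```

In this toy example defenders have the largest total, but the single midfielder attempts more passes per player, which is the kind of difference a pie chart of totals cannot show.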

In [None]:
# Use the median number of attempted passes as the minimum volume for the accuracy ranking.
median_passes = epl['Passes_Attempted'].median()
epl[['Name', 'Club','Nationality', 'Position', 'Age', 'Mins', 'Passes_Attempted', 'Perc_Passes_Completed']].loc[epl['Passes_Attempted'] >= median_passes].sort_values(by=['Perc_Passes_Completed'], ascending=False).head(5)

### Wow! Manchester City's central defender duo were the most accurate passers of the league. Providing accurate passes to attacking midfielders to start a new advance into the opponent's half is a key skill for defenders. So, John Stones, Ruben Dias and Thiago Silva had a good season! Notice that there are no forwards in the list, only defenders and midfielders.

#### P.S.: I calculated the median number of passes attempted per player in the season and chose to show only players at or above it, to make the statistics more reliable.

## Yellow & Red Card statistics in EPL 2020/2021

## Players with the most yellow cards in the league

In [None]:
yellow_cards = epl[['Name', 'Club','Nationality', 'Position', 'Age', 'Mins', 'Yellow_Cards','Red_Cards']].sort_values(by=['Yellow_Cards'], ascending=False).head(5)
yellow_cards

### Wow! Only one defender in this list! That's interesting, because defenders are the players who sometimes resort to fouls to stop opponents' forward movement.

## Players with the most red cards in the league

In [None]:
red_cards = epl[['Name', 'Club','Nationality', 'Position', 'Age', 'Mins', 'Yellow_Cards','Red_Cards']].sort_values(by=['Red_Cards', 'Mins'], ascending=False).head(5)
red_cards

### Only one player received more than one red card during the season: Lewis Dunk, a defender from Brighton.

## Project Summary

#### 1) Harry Kane is the MVP of the EPL 2020/2021 season: 37 points in the goals+assists system is a huge achievement for a forward. He played as both a forward and a playmaker.
#### 2) Harry Kane & Son Heung-min are the duo of this season. Both of them had places in the scorers, assist providers and goals+assists tables. Nevertheless, their efforts didn't help Tottenham Hotspur finish in the top 5 of the league.
#### 3) Manchester City is the MVP team of this season. They were the highest-scoring team and became the champions of the season.
#### 4) John Stones & Ruben Dias helped Manchester City a lot. Their passing accuracy gave Manchester City high-quality forward passes. Truly the best defensive duo of the season.