<h2>Introduction</h2>
<p>
In this notebook we perform an exploratory data analysis (EDA) of the English Premier League player dataset covering the 2015-16 through 2020-21 seasons. The dataset includes per-player statistics such as goals scored with each foot, clearances by goalkeepers, and fouls committed.
</p>
<br>
<hr>
<br>
<h3> About the dataset </h3>

<p>
The data is pulled from premierleague.com and has been graciously compiled into CSV files, available <a href="https://www.kaggle.com/datasets/krishanthbarkav/english-premier-leagueepl-player-statistics" alt="link to kaggle data"> here </a>.
</p>


<br>
<hr>
<br>
<h4> Data Definitions </h4>

<p>

<strong> Appearances: </strong> The number of times a player has participated in a match. <br>

<strong> Clean sheets: </strong> The number of matches in which a goalkeeper has prevented the opposing team from scoring any goals. <br>

<strong> Goals conceded: </strong> The total number of goals a goalkeeper has allowed in a given period. <br>

<strong>Tackles: </strong> The total number of successful attempts made by a player to dispossess an opponent or regain possession of the ball. <br>

<strong> Tackle success %: </strong> The percentage of successful tackles out of the total attempted tackles. <br>

<strong> Last man tackles: </strong> Tackles made by a player when they are the last defender, preventing a potential goal-scoring opportunity. <br>

<strong> Blocked shots: </strong> The number of shots blocked by a player, preventing them from reaching the goal. <br>

<strong> Interceptions: </strong> The number of times a player successfully intercepts a pass or clearance from the opposing team. <br>

<strong> Clearances: </strong> The number of times a player successfully clears the ball from their own defensive area. <br>

<strong> Headed Clearance: </strong> The number of clearances made by a player using their head. <br>

<strong> Clearances off line: </strong>The number of clearances made by a player when the ball is near or on the goal line, preventing a goal. <br>

<strong>Recoveries: </strong>The number of times a player regains possession of the ball for their team. <br>

<strong>Duels won: </strong>The number of times a player successfully wins a one-on-one contest for the ball against an opponent. <br>

<strong>Duels lost: </strong>The number of times a player loses a one-on-one contest for the ball against an opponent. <br>

<strong>Successful 50/50s: </strong>The number of times a player successfully wins a contested ball that has a roughly equal chance of being won by either team. <br>

<strong>Aerial battles won: </strong>The number of times a player successfully wins an aerial duel for the ball. <br>

<strong>Aerial battles lost: </strong>The number of times a player loses an aerial duel for the ball. <br>

<strong>Own goals: </strong>The number of goals scored by a player inadvertently into their own team's net. <br>

<strong>Errors leading to goal: </strong>The number of mistakes made by a player that directly result in the opposing team scoring a goal. <br>

<strong>Assists: </strong>The number of passes or plays made by a player that directly lead to a goal scored by a teammate. <br>

<strong>Passes: </strong>The total number of successful passes made by a player. <br>

<strong>Passes per match: </strong>The average number of passes made by a player per match. <br>

<strong>Big chances created: </strong>The number of scoring opportunities created by a player that are considered high-quality or likely to result in a goal. <br>

<strong>Crosses: </strong>The number of times a player delivers a ball into the opponent's penalty area from the wide areas of the pitch. <br>

<strong>Cross accuracy %: </strong>The percentage of accurate crosses out of the total attempted crosses. <br>

<strong>Through balls: </strong>The number of passes played by a player that split the opposing team's defense and create goal-scoring opportunities. <br>

<strong>Accurate long balls: </strong>The number of successful long passes played by a player. <br>

<strong>Yellow cards: </strong>The number of cautions received by a player for committing fouls or other rule violations. <br>

<strong>Red cards: </strong>The number of dismissals received by a player for serious rule violations, resulting in their ejection from the match. <br>

<strong>Fouls: </strong>The number of rule violations committed by a player. <br>

<strong>Offsides: </strong>The number of times a player is caught offside, that is, positioned ahead of the second-last opponent (usually the last outfield defender) when the ball is played to them. <br>

<strong>Goals: </strong>The total number of goals scored by a player. <br>

<strong>Headed goals: </strong>The number of goals scored by a player using their head. <br>

<strong>Goals with right foot: </strong>The number of goals scored by a player using their right foot. <br>

<strong>Goals with left foot: </strong>The number of goals scored by a player using their left foot. <br>

<strong>Hit woodwork: </strong>The number of times a player's shot hits the goal frame (crossbar or posts) but does not result in a goal. <br>

<strong>Goals per match: </strong>The average number of goals scored by a player per match. <br>

<strong>Penalties scored: </strong>The number of penalties successfully converted into goals by a player. <br>

<strong>Freekicks scored: </strong>The number of goals scored by a player directly from a freekick.<br>

<strong>Shots: </strong>The total number of attempts a player has made to score a goal.<br>

<strong>Shots on target: </strong>The number of shots that are on target and require the opposing goalkeeper to make a save.<br>

<strong>Shooting accuracy %: </strong>The percentage of shots on target out of the total number of shots taken by a player.<br>

<strong>Big chances missed: </strong>The number of clear goal-scoring opportunities that a player fails to convert into a goal.<br>

<strong>Saves: </strong>The number of shots on target that a goalkeeper successfully stops from resulting in a goal.<br>

<strong>Penalties saved: </strong>The number of penalties stopped by a goalkeeper.<br>

<strong>Punches: </strong>The number of times a goalkeeper punches the ball away from their goal during a cross or set-piece situation.<br>

<strong>High Claims: </strong>The number of times a goalkeeper successfully catches or claims the ball in the air.<br>

<strong>Catches: </strong>The number of times a goalkeeper securely catches the ball without dropping it.<br>

<strong>Sweeper clearances: </strong>The number of times a goalkeeper clears the ball with their feet or by running out of the penalty area to prevent an opposing player from having a goal-scoring opportunity.<br>

<strong>Throw outs: </strong> The number of times a goalkeeper throws the ball to initiate a counter-attack or distribute it to a teammate.<br>

<strong>Goal Kicks: </strong> The number of times a goalkeeper takes a goal kick to restart play from their own goal area.<br>

</p>
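
<p>
Several of the definitions above are ratios of other columns, so they can be recomputed as a sanity check. The sketch below is a minimal illustration rather than part of the original analysis: it copies two rows from the 2015-16 preview shown later in this notebook and recomputes goals per match, passes per match, and shooting accuracy. The <code>*_calc</code> column names are hypothetical, and the rounding (two decimals for averages, whole numbers for percentages) is an assumption inferred from the displayed values.
</p>

```python
import pandas as pd

# Two rows copied from the 2015-16 preview shown in this notebook.
sample = pd.DataFrame({
    "Name": ["Rolando Aarons", "Almen Abdi"],
    "Appearances": [10, 32],
    "Goals": [1, 2],
    "Passes": [119, 938],
    "Shots": [2.0, 39.0],
    "Shots on target": [1.0, 10.0],
})

# Per-match averages: season total divided by appearances.
# Rounding to two decimals is an assumption based on the displayed data.
sample["goals_per_match_calc"] = (sample["Goals"] / sample["Appearances"]).round(2)
sample["passes_per_match_calc"] = (sample["Passes"] / sample["Appearances"]).round(2)

# Shooting accuracy: shots on target as a percentage of all shots,
# rounded to a whole number (also an assumption).
sample["shooting_accuracy_calc"] = (
    100 * sample["Shots on target"] / sample["Shots"]
).round(0)

print(sample[["Name", "goals_per_match_calc",
              "passes_per_match_calc", "shooting_accuracy_calc"]])
```

<p>
For these two players the recomputed values agree with the published columns (0.1 and 0.06 goals per match, 11.9 and 29.31 passes per match, 50% and 26% shooting accuracy). Players with zero appearances would need separate handling, since the division is undefined for them.
</p>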

<hr>


<h3> Setup of Data </h3>

<p> As a first step, I import the required Python libraries and set a base directory so that each season's DataFrame can be loaded with minimal changes when the data is moved between systems. Each season is analyzed separately before all seasons are merged for a roll-up view.
</p>

In [9]:
import pandas as pd
import os
import sqlite3 as sq3
import matplotlib as mplot
import glob

# Base directory for the CSV files: the notebook's current working directory
basedir = os.getcwd()

# Setting up Dataframes for each season
season_15_16 = pd.read_csv(os.path.join(basedir,'pl_15-16.csv'))
season_16_17 = pd.read_csv(os.path.join(basedir,'pl_16-17.csv'))
season_17_18 = pd.read_csv(os.path.join(basedir,'pl_17-18.csv'))
season_18_19 = pd.read_csv(os.path.join(basedir,'pl_18-19.csv'))
season_19_20 = pd.read_csv(os.path.join(basedir,'pl_19-20.csv'))
season_20_21 = pd.read_csv(os.path.join(basedir,'pl_20-21.csv'))

# Enable view of all features at once
pd.set_option('display.max_columns', None)
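
<p>
For the roll-up view across all seasons, one possible approach is to read every file that follows the <code>pl_YY-YY.csv</code> naming pattern and stack the results into a single DataFrame. This is a sketch under assumptions, not the method used later in this notebook: the helper name <code>load_all_seasons</code> and the added <code>Season</code> column are hypothetical, and the files are assumed to share the same columns.
</p>

```python
import glob
import os
import re

import pandas as pd


def load_all_seasons(data_dir):
    """Read every pl_YY-YY.csv file in data_dir into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(os.path.join(data_dir, "pl_*.csv"))):
        # Pull the season label (e.g. "15-16") out of the file name.
        match = re.search(r"pl_(\d{2}-\d{2})\.csv$", os.path.basename(path))
        if match is None:
            continue  # skip files that do not follow the naming pattern
        frame = pd.read_csv(path)
        # "Season" is a hypothetical column name added for the roll-up view.
        frame["Season"] = match.group(1)
        frames.append(frame)
    # pd.concat raises ValueError if no matching files were found.
    return pd.concat(frames, ignore_index=True)
```

<p>
Calling <code>load_all_seasons(basedir)</code> would then return one row per player per season. Tagging each row with its season keeps the per-season analysis possible after the merge, for example through <code>groupby("Season")</code>.
</p>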


In [10]:
season_15_16.head()

Unnamed: 0.1,Unnamed: 0,Name,Position,Appearances,Clean sheets,Goals conceded,Tackles,Tackle success %,Last man tackles,Blocked shots,Interceptions,Clearances,Headed Clearance,Clearances off line,Recoveries,Duels won,Duels lost,Successful 50/50s,Aerial battles won,Aerial battles lost,Own goals,Errors leading to goal,Assists,Passes,Passes per match,Big chances created,Crosses,Cross accuracy %,Through balls,Accurate long balls,Yellow cards,Red cards,Fouls,Offsides,Goals,Headed goals,Goals with right foot,Goals with left foot,Hit woodwork,Goals per match,Penalties scored,Freekicks scored,Shots,Shots on target,Shooting accuracy %,Big chances missed,Saves,Penalties saved,Punches,High Claims,Catches,Sweeper clearances,Throw outs,Goal Kicks
0,0,Rolando Aarons,Midfielder,10,,,13.0,77%,,0.0,6.0,10.0,5.0,,23.0,29.0,34.0,5.0,4.0,6.0,,1.0,1,119,11.9,1.0,12.0,25%,0.0,7.0,1,0,9,0.0,1,0.0,0.0,1.0,0.0,0.1,0.0,0.0,2.0,1.0,50%,0.0,,,,,,,,
1,1,Almen Abdi,Midfielder,32,,,83.0,78%,,10.0,32.0,24.0,9.0,,137.0,140.0,153.0,12.0,22.0,31.0,,0.0,0,938,29.31,4.0,55.0,31%,2.0,36.0,4,0,38,1.0,2,0.0,2.0,0.0,1.0,0.06,0.0,1.0,39.0,10.0,26%,1.0,,,,,,,,
2,2,Abdul Rahman Baba,Defender,15,2.0,13.0,47.0,83%,0.0,1.0,23.0,32.0,13.0,1.0,76.0,67.0,65.0,8.0,8.0,13.0,0.0,1.0,1,526,35.07,0.0,31.0,16%,0.0,18.0,1,0,14,2.0,0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,
3,3,Mehdi Abeid,Midfielder,0,,,0.0,0%,,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0,0,0.0,0.0,0.0,0%,0.0,0.0,0,0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0%,0.0,,,,,,,,
4,4,Tammy Abraham,Forward,2,,,0.0,,,1.0,0.0,0.0,0.0,,,,,,,,,,0,10,5.0,0.0,2.0,,,,0,0,2,3.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0%,0.0,,,,,,,,
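
<p>
The preview shows two things worth handling before any numeric analysis: the <code>Unnamed: 0.1</code> and <code>Unnamed: 0</code> columns, which look like row indexes left over from earlier CSV exports, and the percentage columns, which hold strings such as <code>77%</code>. The sketch below shows one way to deal with both. The helper name <code>clean_season</code> is hypothetical, and treating every column whose name ends in <code>%</code> as a percentage string is an assumption based on the preview.
</p>

```python
import pandas as pd


def clean_season(frame):
    """Return a copy with leftover index columns dropped and % strings as numbers."""
    cleaned = frame.copy()
    # Columns named "Unnamed: ..." appear to be row indexes from earlier exports.
    leftover = [col for col in cleaned.columns if col.startswith("Unnamed:")]
    cleaned = cleaned.drop(columns=leftover)
    # Percentage columns hold strings such as "77%"; strip the sign and convert.
    # Missing or unparseable values become NaN because of errors="coerce".
    for col in cleaned.columns:
        if col.endswith("%"):
            cleaned[col] = pd.to_numeric(
                cleaned[col].astype(str).str.rstrip("%"), errors="coerce"
            )
    return cleaned
```

<p>
With this in place, <code>clean_season(season_15_16)</code> would return a copy in which columns such as <code>Tackle success %</code> can be averaged or plotted directly, while the original DataFrame is left unchanged.
</p>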
