<a id='top'></a>

# Getting Started with pandas
##### Notebook to explore the basics of using the pandas library, using footballer statistics data from FBref

### By [Edd Webster](https://www.twitter.com/eddwebster)
Notebook first written: 15/06/2021<br>
Notebook last updated: 15/06/2021

![title](../../../img/pandas.jpeg)

## 1. Notebook Setup

### Import Libraries and Modules
The first thing that needs to be done before you can use any pandas functions is to import the pandas library into the notebook itself. This is done by entering the following code:

`import pandas as pd`

The common convention is to import the pandas library under the alias `pd`, which serves as shorthand when calling pandas functions in the notebook.

Below we'll import the pandas library as well as a few other libraries. Don't worry if you don't yet understand what these all are or do; just run the code and concentrate on the subsequent commands.

In [1]:
# System
import os

# Math Operations
import numpy as np
import math

# Data Preprocessing
import pandas as pd

# Display in Jupyter
from IPython.display import Image

# Ignore Warnings
import warnings
warnings.filterwarnings(action='ignore', message='^internal gelsd')

print('Setup Complete')

Setup Complete


### Notebook Settings

In [2]:
# Display all columns of pandas DataFrames
pd.set_option('display.max_columns', None)
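
Display settings like the one above can also be inspected, overridden temporarily, or reset. The following is a minimal sketch of that workflow, not part of the original notebook; the variable names `rows_inside` and `default_max_columns` are purely illustrative.

```python
import pandas as pd

# Show every column when a DataFrame is displayed (the setting used above)
pd.set_option('display.max_columns', None)
print(pd.get_option('display.max_columns'))  # None means "no limit"

# Temporarily override an option; it reverts when the with-block ends
with pd.option_context('display.max_rows', 10):
    rows_inside = pd.get_option('display.max_rows')

# Restore the option to the pandas default
pd.reset_option('display.max_columns')
default_max_columns = pd.get_option('display.max_columns')
print(default_max_columns)
```

Using `pd.option_context()` is handy when you only want wider output for a single cell and would rather not change the behaviour of the whole notebook.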

---

## 2. Introduction to pandas

So far, we have just worked with ‘vanilla’ Python, the language that allows us to leverage some of the powerful libraries and packages that are at our disposal.

Today, we will be focussing on the data manipulation library pandas.

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

When people start learning to code, they spend a lot of time focussing on the generic Python language. However, when working in Data Science, a huge proportion of your time is spent manipulating, reconciling, and cleansing messy data. This is where pandas comes in, and my recommendation is that you start using pandas as early as possible.

The main thing I want to stress is that you only need to know the basics of Python before getting started with pandas. You DO NOT have to be an expert Python programmer who has read x number of books and done y number of courses before getting started with pandas.

Before tackling this slightly long read, I'll quickly summarise the contents of this entry. 

This blog goes into detail on how to do the following:
*    Import the pandas library into our notebook.
*    Import a football data set downloaded from GitHub into our notebook using the `.read_csv()` function and create a DataFrame.
*    Analyse our data set using some important functions and attributes that are part of the pandas library, including: `.head()`, `.tail()`, `.columns`, `.index`, `.values`, `type()`, `pandas.__version__`.
*    Subset specific rows and columns using `.iloc[]` and `.loc[]`.
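
As a rough preview of these attributes and indexers, here is a minimal sketch on a tiny made-up DataFrame. The player names, squads and values are hypothetical placeholders rather than FBref data, and `df_toy` is just an illustrative name.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not taken from FBref)
df_toy = pd.DataFrame(
    {
        'player': ['Player A', 'Player B', 'Player C'],
        'squad': ['Team X', 'Team Y', 'Team X'],
        'goals': [5, 0, 2],
    },
    index=['a', 'b', 'c'],
)

print(pd.__version__)   # version of the installed pandas library
print(type(df_toy))     # the Python type of the object
print(df_toy.columns)   # column labels
print(df_toy.index)     # row labels
print(df_toy.values)    # underlying data as a NumPy array

# .iloc[] selects by integer position: first row, third column
first_row_goals = df_toy.iloc[0, 2]

# .loc[] selects by label: row 'c', columns 'player' and 'goals'
row_c = df_toy.loc['c', ['player', 'goals']]

# .loc[] slices include both end labels; .iloc[] slices exclude the end position
by_label = df_toy.loc['a':'b', 'goals']
by_position = df_toy.iloc[0:2, 2]
```

Both slices at the end return the same two rows, which highlights the main difference between the two indexers: `.loc[]` works with labels and `.iloc[]` works with positions.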

Finally, in the references at the bottom I've included some of my favourite tutorials and reference material for learning pandas, including a handy cheat sheet.

Let's begin...

---

## 3. Data Sources
The data used in this notebook is player statistics data scraped from [FBref](https://fbref.com/en/), which has in turn been provided by [StatsBomb](https://statsbomb.com/).

![title](../../../img/logos/fbref-logo-banner.png)

![title](../../../img/logos/statsbomb-logo.jpg)


More information about how to scrape this data will be covered in a separate post. However, if you do want to skip ahead and scrape your own FBref data, the notebook of code that I use can be found in my [`football_analytics`](https://github.com/eddwebster/football_analytics) GitHub repository at the following [[link](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/FBref%20Web%20Scraping%20and%20Parsing.ipynb)]. It uses [Parth Athale](https://twitter.com/ParthAthale)'s [`Scrape-FBref-data`](https://github.com/parth1902/Scrape-FBref-data) scraper, which in turn was written using code from [Christopher Martin](https://github.com/chmartin)'s [`FBref_EPL`](https://github.com/chmartin/FBref_EPL) repository.

Another great package to scrape this data in R is the [`worldfootballR`](https://github.com/JaseZiv/worldfootballR) package by [Jason Zivkovic](https://twitter.com/jaseziv) (see guide [[link](https://www.dontblamethedata.com/blog/extract-data-using-worldfootballr/)]).

### Reading in Data as a DataFrame
To import the data set into the notebook, we need to call the `.read_csv()` function that is part of the pandas library. We then assign the result of this function to a variable, named `df` (a common convention when working with DataFrames), so that we can refer to the DataFrame later on.

A pandas DataFrame is a Python object, roughly equivalent to an Excel spreadsheet: a 2-dimensional labelled data structure with rows and columns, where the columns can hold different types.
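
To make the idea of "columns of potentially different types" concrete, here is a minimal sketch that builds a small DataFrame by hand. The rows are hypothetical and `df_example` is an illustrative name, not something used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical rows, invented for illustration
df_example = pd.DataFrame({
    'player': ['Player A', 'Player B'],   # text
    'age': [26, 21],                      # whole numbers
    'goals_per90': [0.21, 0.0],           # decimals
})

print(df_example.shape)   # (number of rows, number of columns)
print(df_example.dtypes)  # one data type per column
```

Each column has a single data type, but different columns are free to hold different types, much like the columns of a spreadsheet.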

Loading data as a pandas DataFrame is achieved with commands like `df = pd.read_csv(...)` or `df = pd.read_excel(...)`, where the string within the brackets is the filepath of the file you wish to load.

This can be done as follows:

In [9]:
# Read in FBref CSV as a pandas DataFrame

## Define the filepath of the CSV file
data_dir_fbref = os.path.join('..', '..', '..', 'data', 'fbref', 'engineered', 'players_combined', 'players_combined_big5_engineered_latest.csv')

## Create pandas DataFrame 
df = pd.read_csv(data_dir_fbref)
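
The `Unnamed: 0` column that shows up in the outputs further down typically appears when a CSV has an unnamed first column, for example when it was saved together with its row index. The sketch below illustrates this with a small in-memory CSV standing in for a real file; the CSV text and the names `csv_text`, `df_default` and `df_indexed` are hypothetical, and whether you want `index_col=0` for the FBref file is a judgment call rather than something this notebook prescribes.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk;
# note the empty first field in the header row
csv_text = """,player,squad,goals
0,Player A,Team X,5.0
1,Player B,Team Y,0.0
"""

# Default behaviour: the unnamed first column is kept as 'Unnamed: 0'
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the row index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))
print(list(df_indexed.columns))
```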

### Displaying the Data
OK, so the data has been successfully loaded, but where is it? The following command displays the first few rows of the DataFrame, using the [`head()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html?highlight=head#pandas.DataFrame.head) method.

In [8]:
# Display the first five rows of the DataFrame, df
df.head()

Unnamed: 0,player,nationality,position,squad,age,birth_year,games,games_starts,minutes,goals,assists,pens_made,pens_att,cards_yellow,cards_red,goals_per90,assists_per90,goals_assists_per90,goals_pens_per90,goals_assists_pens_per90,xg,npxg,xa,xg_per90,xa_per90,xg_xa_per90,npxg_per90,npxg_xa_per90,minutes_90s,shots_total,shots_on_target,shots_free_kicks,shots_on_target_pct,shots_total_per90,shots_on_target_per90,goals_per_shot,goals_per_shot_on_target,npxg_per_shot,xg_net,npxg_net,passes_completed,passes,passes_pct,passes_total_distance,passes_progressive_distance,passes_completed_short,passes_short,passes_pct_short,passes_completed_medium,passes_medium,passes_pct_medium,passes_completed_long,passes_long,passes_pct_long,xa_net,assisted_shots,passes_into_final_third,passes_into_penalty_area,crosses_into_penalty_area,progressive_passes,passes_live,passes_dead,passes_free_kicks,through_balls,passes_pressure,passes_switches,crosses,corner_kicks,corner_kicks_in,corner_kicks_out,corner_kicks_straight,passes_ground,passes_low,passes_high,passes_left_foot,passes_right_foot,passes_head,throw_ins,passes_other_body,passes_offsides,passes_oob,passes_intercepted,passes_blocked,sca,sca_per90,sca_passes_live,sca_passes_dead,sca_dribbles,sca_shots,sca_fouled,gca,gca_per90,gca_passes_live,gca_passes_dead,gca_dribbles,gca_shots,gca_fouled,gca_og_for,tackles,tackles_won,tackles_def_3rd,tackles_mid_3rd,tackles_att_3rd,dribble_tackles,dribbles_vs,dribble_tackles_pct,dribbled_past,pressures,pressure_regains,pressure_regain_pct,pressures_def_3rd,pressures_mid_3rd,pressures_att_3rd,blocks,blocked_shots,blocked_shots_saves,blocked_passes,interceptions,clearances,errors,touches,touches_def_pen_area,touches_def_3rd,touches_mid_3rd,touches_att_3rd,touches_att_pen_area,touches_live_ball,dribbles_completed,dribbles,dribbles_completed_pct,players_dribbled_past,nutmegs,carries,carry_distance,carry_progressive_distance,pass_targets,passes_received,passes_received_pct,miscontrols,dispossessed,cards_yellow_red,fouls,fouled,offsides,pens_won,pens_conceded,own_goals,ball_recoveries,aerials_won,aerials_lost,aerials_won_pct,Season,team_name,league_name,league_country,player_lower,firstname_lower,lastname_lower,firstinitial_lower,league_country_lower,nationality_code,nationality_cleaned,position_grouped,outfielder_goalkeeper,games_gk,games_starts_gk,minutes_gk,goals_against_gk,goals_against_per90_gk,shots_on_target_against,saves,save_pct,wins_gk,draws_gk,losses_gk,clean_sheets,clean_sheets_pct,pens_att_gk,pens_allowed,pens_saved,pens_missed_gk,minutes_90s_gk,free_kick_goals_against_gk,corner_kick_goals_against_gk,own_goals_against_gk,psxg_gk,psnpxg_per_shot_on_target_against,psxg_net_gk,psxg_net_per90_gk,passes_completed_launched_gk,passes_launched_gk,passes_pct_launched_gk,passes_gk,passes_throws_gk,pct_passes_launched_gk,passes_length_avg_gk,goal_kicks,pct_goal_kicks_launched,goal_kick_length_avg,crosses_gk,crosses_stopped_gk,crosses_stopped_pct_gk,def_actions_outside_pen_area_gk,def_actions_outside_pen_area_per90_gk,avg_distance_def_actions_gk
0,Patrick van Aanholt,nl NED,DF,Crystal Palace,26,1990,28.0,25.0,2184.0,5.0,1.0,0.0,0.0,7.0,0.0,0.21,0.04,0.25,0.21,0.25,3.1,3.1,1.5,0.13,0.06,0.19,0.13,0.19,24.3,33.0,12.0,4.0,36.4,1.36,0.49,0.15,0.42,0.09,1.9,1.9,845.0,1116.0,75.7,15182.0,7109.0,363.0,411.0,88.3,377.0,471.0,80.0,88.0,183.0,48.1,-0.5,17.0,58.0,28.0,7.0,101.0,895.0,221.0,21.0,0.0,183.0,21.0,42.0,11.0,0.0,1.0,0.0,682.0,131.0,303.0,717.0,175.0,30.0,189.0,4.0,3.0,24.0,19.0,37.0,41.0,1.69,23.0,11.0,4.0,1.0,1.0,3.0,0.12,2.0,0.0,1.0,0.0,0.0,0.0,48.0,30.0,33.0,13.0,2.0,21.0,38.0,55.3,17.0,253.0,78.0,30.8,159.0,71.0,23.0,36.0,5.0,0.0,31.0,36.0,56.0,2.0,1369.0,83.0,448.0,578.0,437.0,39.0,1148.0,17.0,26.0,65.4,20.0,0.0,767.0,4832.0,2651.0,765.0,709.0,92.7,23.0,17.0,0.0,18.0,10.0,3.0,0.0,0.0,0.0,225.0,2.0,6.0,25.0,17/18,Crystal Palace,Premier League,England,patrick van aanholt,patrick,aanholt,p,england,NED,Netherlands,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,Rolando Aarons,eng ENG,"FW,MF",Newcastle Utd,21,1995,4.0,1.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.1,0.0,0.1,0.1,0.1,1.5,2.0,0.0,0.0,0.0,1.29,0.0,0.0,0.0,0.08,-0.2,-0.2,81.0,115.0,70.4,1095.0,304.0,50.0,63.0,79.4,22.0,34.0,64.7,3.0,8.0,37.5,-0.2,3.0,5.0,6.0,1.0,9.0,106.0,9.0,4.0,0.0,32.0,4.0,6.0,0.0,0.0,0.0,0.0,76.0,19.0,20.0,79.0,26.0,2.0,2.0,0.0,0.0,9.0,1.0,7.0,1.0,0.66,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12.0,9.0,4.0,5.0,3.0,1.0,5.0,20.0,4.0,117.0,32.0,27.4,23.0,60.0,34.0,4.0,0.0,0.0,4.0,1.0,0.0,0.0,173.0,2.0,20.0,82.0,87.0,7.0,167.0,9.0,18.0,50.0,11.0,0.0,136.0,993.0,583.0,212.0,126.0,59.4,10.0,15.0,0.0,8.0,11.0,0.0,0.0,0.0,0.0,22.0,7.0,15.0,31.8,17/18,Newcastle Utd,Premier League,England,rolando aarons,rolando,aarons,r,england,ENG,England,Forward,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,Rolando Aarons,eng ENG,"MF,FW",Hellas Verona,21,1995,11.0,6.0,517.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.2,0.04,0.04,0.08,0.04,0.08,5.7,3.0,0.0,0.0,0.0,0.52,0.0,0.0,0.0,0.07,-0.2,-0.2,27.0,41.0,65.9,408.0,88.0,17.0,22.0,77.3,8.0,11.0,72.7,1.0,4.0,25.0,0.0,0.0,0.0,1.0,1.0,1.0,41.0,0.0,0.0,0.0,10.0,1.0,2.0,0.0,0.0,0.0,0.0,31.0,6.0,4.0,29.0,8.0,2.0,0.0,0.0,1.0,1.0,2.0,3.0,3.0,0.52,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,4.0,3.0,1.0,1.0,2.0,4.0,50.0,2.0,34.0,8.0,23.5,13.0,16.0,5.0,4.0,0.0,0.0,4.0,2.0,0.0,0.0,68.0,0.0,14.0,33.0,27.0,3.0,68.0,2.0,3.0,66.7,2.0,0.0,48.0,328.0,225.0,59.0,39.0,66.1,4.0,5.0,0.0,1.0,3.0,1.0,0.0,0.0,0.0,12.0,2.0,2.0,50.0,17/18,Hellas Verona,Serie A,Italy,rolando aarons,rolando,aarons,r,italy,ENG,England,Midfielder,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,Ignazio Abate,it ITA,DF,Milan,30,1986,17.0,11.0,1057.0,1.0,0.0,0.0,0.0,3.0,0.0,0.09,0.0,0.09,0.09,0.09,0.2,0.2,0.6,0.02,0.05,0.07,0.02,0.07,11.7,4.0,2.0,0.0,50.0,0.34,0.17,0.25,0.5,0.06,0.8,0.8,625.0,774.0,80.7,11501.0,4742.0,255.0,280.0,91.1,294.0,337.0,87.2,69.0,133.0,51.9,-0.6,11.0,58.0,20.0,8.0,62.0,659.0,115.0,9.0,0.0,122.0,16.0,26.0,2.0,0.0,0.0,1.0,466.0,119.0,189.0,65.0,584.0,16.0,104.0,0.0,4.0,14.0,9.0,26.0,26.0,2.22,22.0,2.0,0.0,2.0,0.0,4.0,0.34,3.0,0.0,0.0,1.0,0.0,0.0,21.0,15.0,12.0,8.0,1.0,11.0,18.0,61.1,7.0,141.0,46.0,32.6,53.0,58.0,30.0,29.0,2.0,0.0,27.0,4.0,17.0,0.0,870.0,40.0,231.0,464.0,226.0,12.0,757.0,1.0,5.0,20.0,2.0,0.0,552.0,2524.0,1494.0,605.0,565.0,93.4,3.0,6.0,0.0,11.0,8.0,3.0,0.0,0.0,0.0,106.0,5.0,4.0,55.6,17/18,Milan,Serie A,Italy,ignazio abate,ignazio,abate,i,italy,ITA,Italy,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,Aymen Abdennour,tn TUN,DF,Marseille,27,1989,8.0,6.0,500.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.02,0.0,0.02,0.02,0.02,5.6,2.0,1.0,0.0,50.0,0.36,0.18,0.0,0.0,0.05,-0.1,-0.1,304.0,336.0,90.5,6306.0,1794.0,90.0,99.0,90.9,166.0,174.0,95.4,45.0,56.0,80.4,0.0,0.0,11.0,0.0,0.0,7.0,335.0,1.0,1.0,0.0,38.0,9.0,0.0,0.0,0.0,0.0,0.0,285.0,24.0,27.0,294.0,20.0,17.0,0.0,2.0,1.0,6.0,2.0,4.0,1.0,0.18,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,5.0,1.0,0.0,4.0,7.0,57.1,3.0,23.0,8.0,34.8,17.0,6.0,0.0,6.0,3.0,0.0,3.0,2.0,8.0,1.0,359.0,34.0,224.0,160.0,5.0,2.0,358.0,1.0,1.0,100.0,1.0,0.0,269.0,1840.0,762.0,267.0,259.0,97.0,0.0,0.0,0.0,7.0,6.0,0.0,0.0,0.0,0.0,57.0,3.0,0.0,100.0,17/18,Marseille,Ligue 1,France,aymen abdennour,aymen,abdennour,a,france,TUN,Tunisia,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


By default, the `head()` method returns only the first five rows of a DataFrame. This can be changed by passing the number of rows you want as a parameter (within the brackets).

In [10]:
# Display the first twenty rows of the DataFrame, df
df.head(20)

Unnamed: 0,player,nationality,position,squad,age,birth_year,games,games_starts,minutes,goals,assists,pens_made,pens_att,cards_yellow,cards_red,goals_per90,assists_per90,goals_assists_per90,goals_pens_per90,goals_assists_pens_per90,xg,npxg,xa,xg_per90,xa_per90,xg_xa_per90,npxg_per90,npxg_xa_per90,minutes_90s,shots_total,shots_on_target,shots_free_kicks,shots_on_target_pct,shots_total_per90,shots_on_target_per90,goals_per_shot,goals_per_shot_on_target,npxg_per_shot,xg_net,npxg_net,passes_completed,passes,passes_pct,passes_total_distance,passes_progressive_distance,passes_completed_short,passes_short,passes_pct_short,passes_completed_medium,passes_medium,passes_pct_medium,passes_completed_long,passes_long,passes_pct_long,xa_net,assisted_shots,passes_into_final_third,passes_into_penalty_area,crosses_into_penalty_area,progressive_passes,passes_live,passes_dead,passes_free_kicks,through_balls,passes_pressure,passes_switches,crosses,corner_kicks,corner_kicks_in,corner_kicks_out,corner_kicks_straight,passes_ground,passes_low,passes_high,passes_left_foot,passes_right_foot,passes_head,throw_ins,passes_other_body,passes_offsides,passes_oob,passes_intercepted,passes_blocked,sca,sca_per90,sca_passes_live,sca_passes_dead,sca_dribbles,sca_shots,sca_fouled,gca,gca_per90,gca_passes_live,gca_passes_dead,gca_dribbles,gca_shots,gca_fouled,gca_og_for,tackles,tackles_won,tackles_def_3rd,tackles_mid_3rd,tackles_att_3rd,dribble_tackles,dribbles_vs,dribble_tackles_pct,dribbled_past,pressures,pressure_regains,pressure_regain_pct,pressures_def_3rd,pressures_mid_3rd,pressures_att_3rd,blocks,blocked_shots,blocked_shots_saves,blocked_passes,interceptions,clearances,errors,touches,touches_def_pen_area,touches_def_3rd,touches_mid_3rd,touches_att_3rd,touches_att_pen_area,touches_live_ball,dribbles_completed,dribbles,dribbles_completed_pct,players_dribbled_past,nutmegs,carries,carry_distance,carry_progressive_distance,pass_targets,passes_received,passes_received_pct,miscontrols,dispossessed,cards_yellow_red,fouls,fouled,offsides,pens_won,pens_conceded,own_goals,ball_recoveries,aerials_won,aerials_lost,aerials_won_pct,Season,team_name,league_name,league_country,player_lower,firstname_lower,lastname_lower,firstinitial_lower,league_country_lower,nationality_code,nationality_cleaned,position_grouped,outfielder_goalkeeper,games_gk,games_starts_gk,minutes_gk,goals_against_gk,goals_against_per90_gk,shots_on_target_against,saves,save_pct,wins_gk,draws_gk,losses_gk,clean_sheets,clean_sheets_pct,pens_att_gk,pens_allowed,pens_saved,pens_missed_gk,minutes_90s_gk,free_kick_goals_against_gk,corner_kick_goals_against_gk,own_goals_against_gk,psxg_gk,psnpxg_per_shot_on_target_against,psxg_net_gk,psxg_net_per90_gk,passes_completed_launched_gk,passes_launched_gk,passes_pct_launched_gk,passes_gk,passes_throws_gk,pct_passes_launched_gk,passes_length_avg_gk,goal_kicks,pct_goal_kicks_launched,goal_kick_length_avg,crosses_gk,crosses_stopped_gk,crosses_stopped_pct_gk,def_actions_outside_pen_area_gk,def_actions_outside_pen_area_per90_gk,avg_distance_def_actions_gk
0,Patrick van Aanholt,nl NED,DF,Crystal Palace,26,1990,28.0,25.0,2184.0,5.0,1.0,0.0,0.0,7.0,0.0,0.21,0.04,0.25,0.21,0.25,3.1,3.1,1.5,0.13,0.06,0.19,0.13,0.19,24.3,33.0,12.0,4.0,36.4,1.36,0.49,0.15,0.42,0.09,1.9,1.9,845.0,1116.0,75.7,15182.0,7109.0,363.0,411.0,88.3,377.0,471.0,80.0,88.0,183.0,48.1,-0.5,17.0,58.0,28.0,7.0,101.0,895.0,221.0,21.0,0.0,183.0,21.0,42.0,11.0,0.0,1.0,0.0,682.0,131.0,303.0,717.0,175.0,30.0,189.0,4.0,3.0,24.0,19.0,37.0,41.0,1.69,23.0,11.0,4.0,1.0,1.0,3.0,0.12,2.0,0.0,1.0,0.0,0.0,0.0,48.0,30.0,33.0,13.0,2.0,21.0,38.0,55.3,17.0,253.0,78.0,30.8,159.0,71.0,23.0,36.0,5.0,0.0,31.0,36.0,56.0,2.0,1369.0,83.0,448.0,578.0,437.0,39.0,1148.0,17.0,26.0,65.4,20.0,0.0,767.0,4832.0,2651.0,765.0,709.0,92.7,23.0,17.0,0.0,18.0,10.0,3.0,0.0,0.0,0.0,225.0,2.0,6.0,25.0,17/18,Crystal Palace,Premier League,England,patrick van aanholt,patrick,aanholt,p,england,NED,Netherlands,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,Rolando Aarons,eng ENG,"FW,MF",Newcastle Utd,21,1995,4.0,1.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.1,0.0,0.1,0.1,0.1,1.5,2.0,0.0,0.0,0.0,1.29,0.0,0.0,0.0,0.08,-0.2,-0.2,81.0,115.0,70.4,1095.0,304.0,50.0,63.0,79.4,22.0,34.0,64.7,3.0,8.0,37.5,-0.2,3.0,5.0,6.0,1.0,9.0,106.0,9.0,4.0,0.0,32.0,4.0,6.0,0.0,0.0,0.0,0.0,76.0,19.0,20.0,79.0,26.0,2.0,2.0,0.0,0.0,9.0,1.0,7.0,1.0,0.66,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12.0,9.0,4.0,5.0,3.0,1.0,5.0,20.0,4.0,117.0,32.0,27.4,23.0,60.0,34.0,4.0,0.0,0.0,4.0,1.0,0.0,0.0,173.0,2.0,20.0,82.0,87.0,7.0,167.0,9.0,18.0,50.0,11.0,0.0,136.0,993.0,583.0,212.0,126.0,59.4,10.0,15.0,0.0,8.0,11.0,0.0,0.0,0.0,0.0,22.0,7.0,15.0,31.8,17/18,Newcastle Utd,Premier League,England,rolando aarons,rolando,aarons,r,england,ENG,England,Forward,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,Rolando Aarons,eng ENG,"MF,FW",Hellas Verona,21,1995,11.0,6.0,517.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.2,0.04,0.04,0.08,0.04,0.08,5.7,3.0,0.0,0.0,0.0,0.52,0.0,0.0,0.0,0.07,-0.2,-0.2,27.0,41.0,65.9,408.0,88.0,17.0,22.0,77.3,8.0,11.0,72.7,1.0,4.0,25.0,0.0,0.0,0.0,1.0,1.0,1.0,41.0,0.0,0.0,0.0,10.0,1.0,2.0,0.0,0.0,0.0,0.0,31.0,6.0,4.0,29.0,8.0,2.0,0.0,0.0,1.0,1.0,2.0,3.0,3.0,0.52,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,4.0,3.0,1.0,1.0,2.0,4.0,50.0,2.0,34.0,8.0,23.5,13.0,16.0,5.0,4.0,0.0,0.0,4.0,2.0,0.0,0.0,68.0,0.0,14.0,33.0,27.0,3.0,68.0,2.0,3.0,66.7,2.0,0.0,48.0,328.0,225.0,59.0,39.0,66.1,4.0,5.0,0.0,1.0,3.0,1.0,0.0,0.0,0.0,12.0,2.0,2.0,50.0,17/18,Hellas Verona,Serie A,Italy,rolando aarons,rolando,aarons,r,italy,ENG,England,Midfielder,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,Ignazio Abate,it ITA,DF,Milan,30,1986,17.0,11.0,1057.0,1.0,0.0,0.0,0.0,3.0,0.0,0.09,0.0,0.09,0.09,0.09,0.2,0.2,0.6,0.02,0.05,0.07,0.02,0.07,11.7,4.0,2.0,0.0,50.0,0.34,0.17,0.25,0.5,0.06,0.8,0.8,625.0,774.0,80.7,11501.0,4742.0,255.0,280.0,91.1,294.0,337.0,87.2,69.0,133.0,51.9,-0.6,11.0,58.0,20.0,8.0,62.0,659.0,115.0,9.0,0.0,122.0,16.0,26.0,2.0,0.0,0.0,1.0,466.0,119.0,189.0,65.0,584.0,16.0,104.0,0.0,4.0,14.0,9.0,26.0,26.0,2.22,22.0,2.0,0.0,2.0,0.0,4.0,0.34,3.0,0.0,0.0,1.0,0.0,0.0,21.0,15.0,12.0,8.0,1.0,11.0,18.0,61.1,7.0,141.0,46.0,32.6,53.0,58.0,30.0,29.0,2.0,0.0,27.0,4.0,17.0,0.0,870.0,40.0,231.0,464.0,226.0,12.0,757.0,1.0,5.0,20.0,2.0,0.0,552.0,2524.0,1494.0,605.0,565.0,93.4,3.0,6.0,0.0,11.0,8.0,3.0,0.0,0.0,0.0,106.0,5.0,4.0,55.6,17/18,Milan,Serie A,Italy,ignazio abate,ignazio,abate,i,italy,ITA,Italy,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,Aymen Abdennour,tn TUN,DF,Marseille,27,1989,8.0,6.0,500.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.02,0.0,0.02,0.02,0.02,5.6,2.0,1.0,0.0,50.0,0.36,0.18,0.0,0.0,0.05,-0.1,-0.1,304.0,336.0,90.5,6306.0,1794.0,90.0,99.0,90.9,166.0,174.0,95.4,45.0,56.0,80.4,0.0,0.0,11.0,0.0,0.0,7.0,335.0,1.0,1.0,0.0,38.0,9.0,0.0,0.0,0.0,0.0,0.0,285.0,24.0,27.0,294.0,20.0,17.0,0.0,2.0,1.0,6.0,2.0,4.0,1.0,0.18,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,5.0,1.0,0.0,4.0,7.0,57.1,3.0,23.0,8.0,34.8,17.0,6.0,0.0,6.0,3.0,0.0,3.0,2.0,8.0,1.0,359.0,34.0,224.0,160.0,5.0,2.0,358.0,1.0,1.0,100.0,1.0,0.0,269.0,1840.0,762.0,267.0,259.0,97.0,0.0,0.0,0.0,7.0,6.0,0.0,0.0,0.0,0.0,57.0,3.0,0.0,100.0,17/18,Marseille,Ligue 1,France,aymen abdennour,aymen,abdennour,a,france,TUN,Tunisia,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,Aly Abeid,mr MTN,DF,Levante,19,1997,1.0,1.0,76.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.19,0.19,0.0,0.19,0.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,29.0,38.0,76.3,521.0,188.0,15.0,16.0,93.8,10.0,11.0,90.9,4.0,9.0,44.4,-0.2,2.0,0.0,2.0,1.0,2.0,31.0,7.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,0.0,20.0,10.0,8.0,20.0,10.0,1.0,7.0,0.0,0.0,1.0,1.0,2.0,3.0,3.46,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,1.0,1.0,2.0,0.0,1.0,1.0,100.0,0.0,9.0,4.0,44.4,7.0,2.0,0.0,2.0,0.0,0.0,2.0,1.0,2.0,0.0,46.0,3.0,20.0,19.0,11.0,0.0,39.0,0.0,0.0,0.0,0.0,0.0,25.0,144.0,95.0,21.0,20.0,95.2,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,2.0,0.0,17/18,Levante,La Liga,Spain,aly abeid,aly,abeid,a,spain,MTN,Mauritania,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,Mehdi Abeid,dz ALG,MF,Dijon,24,1992,19.0,14.0,1176.0,0.0,1.0,0.0,0.0,2.0,0.0,0.0,0.08,0.08,0.0,0.08,1.0,1.0,0.2,0.08,0.02,0.09,0.08,0.09,13.1,21.0,6.0,2.0,28.6,1.61,0.46,0.0,0.0,0.05,-1.0,-1.0,519.0,611.0,84.9,10250.0,2390.0,177.0,196.0,90.3,250.0,281.0,89.0,81.0,109.0,74.3,0.8,6.0,48.0,2.0,0.0,47.0,598.0,13.0,8.0,0.0,78.0,20.0,7.0,1.0,0.0,0.0,0.0,494.0,45.0,72.0,47.0,520.0,36.0,4.0,1.0,1.0,16.0,15.0,10.0,15.0,1.15,11.0,1.0,0.0,2.0,0.0,1.0,0.08,1.0,0.0,0.0,0.0,0.0,0.0,21.0,14.0,11.0,9.0,1.0,4.0,23.0,17.4,19.0,206.0,68.0,33.0,63.0,120.0,23.0,23.0,4.0,0.0,19.0,15.0,14.0,0.0,762.0,30.0,257.0,453.0,129.0,16.0,749.0,11.0,22.0,50.0,13.0,1.0,551.0,3788.0,1785.0,537.0,493.0,91.8,14.0,17.0,0.0,27.0,17.0,2.0,0.0,0.0,0.0,142.0,3.0,5.0,37.5,17/18,Dijon,Ligue 1,France,mehdi abeid,mehdi,abeid,m,france,ALG,Algeria,Midfielder,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,David Abraham,ar ARG,DF,Eint Frankfurt,31,1986,27.0,27.0,2302.0,0.0,2.0,0.0,0.0,3.0,0.0,0.0,0.08,0.08,0.0,0.08,0.5,0.5,0.4,0.02,0.01,0.03,0.02,0.03,25.6,12.0,1.0,0.0,8.3,0.47,0.04,0.0,0.0,0.04,-0.5,-0.5,1081.0,1313.0,82.3,25128.0,7959.0,241.0,275.0,87.6,588.0,655.0,89.8,243.0,369.0,65.9,1.6,6.0,66.0,3.0,2.0,91.0,1274.0,39.0,32.0,1.0,147.0,68.0,6.0,0.0,0.0,0.0,0.0,823.0,165.0,325.0,208.0,923.0,135.0,7.0,2.0,4.0,29.0,10.0,10.0,16.0,0.63,12.0,1.0,0.0,3.0,0.0,2.0,0.08,1.0,0.0,0.0,1.0,0.0,0.0,43.0,28.0,31.0,11.0,1.0,24.0,41.0,58.5,17.0,221.0,74.0,33.5,148.0,66.0,7.0,39.0,16.0,0.0,23.0,23.0,130.0,1.0,1561.0,151.0,830.0,844.0,43.0,13.0,1526.0,13.0,14.0,92.9,13.0,0.0,955.0,6200.0,3493.0,903.0,890.0,98.6,5.0,6.0,0.0,11.0,20.0,0.0,0.0,0.0,1.0,378.0,62.0,26.0,70.5,17/18,Eint Frankfurt,Bundesliga,Germany,david abraham,david,abraham,d,germany,ARG,Argentina,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,Tammy Abraham,eng ENG,"FW,MF",Swansea City,19,1997,31.0,15.0,1726.0,5.0,1.0,0.0,0.0,0.0,0.0,0.26,0.05,0.31,0.26,0.31,6.3,6.3,0.7,0.33,0.04,0.37,0.33,0.37,19.2,43.0,14.0,0.0,32.6,2.24,0.73,0.12,0.36,0.15,-1.3,-1.3,208.0,309.0,67.3,3094.0,683.0,123.0,159.0,77.4,63.0,95.0,66.3,13.0,28.0,46.4,0.3,10.0,19.0,7.0,3.0,19.0,290.0,19.0,0.0,0.0,102.0,7.0,11.0,0.0,0.0,0.0,0.0,202.0,49.0,58.0,24.0,222.0,35.0,1.0,7.0,0.0,6.0,11.0,14.0,32.0,1.68,20.0,0.0,6.0,3.0,2.0,2.0,0.1,1.0,0.0,0.0,1.0,0.0,0.0,11.0,5.0,3.0,4.0,4.0,1.0,13.0,7.7,12.0,178.0,58.0,32.6,14.0,67.0,97.0,12.0,4.0,0.0,8.0,4.0,7.0,0.0,535.0,11.0,42.0,246.0,275.0,72.0,516.0,26.0,45.0,57.8,27.0,0.0,368.0,2413.0,1041.0,691.0,392.0,56.7,60.0,35.0,0.0,15.0,27.0,9.0,0.0,0.0,0.0,61.0,30.0,30.0,50.0,17/18,Swansea City,Premier League,England,tammy abraham,tammy,abraham,t,england,ENG,England,Forward,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,Amir Abrashi,al ALB,MF,Freiburg,27,1990,12.0,11.0,850.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.5,0.02,0.06,0.08,0.02,0.08,9.4,5.0,0.0,0.0,0.0,0.53,0.0,0.0,0.0,0.04,-0.2,-0.2,260.0,335.0,77.6,4482.0,1183.0,124.0,157.0,79.0,104.0,124.0,83.9,26.0,42.0,61.9,-0.5,4.0,20.0,4.0,1.0,30.0,329.0,6.0,5.0,0.0,84.0,9.0,4.0,0.0,0.0,0.0,0.0,205.0,52.0,78.0,18.0,272.0,28.0,1.0,1.0,0.0,9.0,8.0,8.0,12.0,1.27,10.0,0.0,0.0,0.0,2.0,1.0,0.11,1.0,0.0,0.0,0.0,0.0,0.0,37.0,26.0,14.0,22.0,1.0,11.0,20.0,55.0,9.0,215.0,57.0,26.5,71.0,118.0,26.0,20.0,5.0,0.0,15.0,11.0,19.0,2.0,471.0,28.0,122.0,288.0,80.0,9.0,467.0,5.0,11.0,45.5,7.0,0.0,277.0,1277.0,553.0,266.0,230.0,86.5,9.0,14.0,0.0,20.0,27.0,0.0,0.0,0.0,0.0,137.0,21.0,19.0,52.5,17/18,Freiburg,Bundesliga,Germany,amir abrashi,amir,abrashi,a,germany,ALB,Albania,Midfielder,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
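
The behaviour of `head()` with different arguments is easier to see on a small DataFrame. The following is a minimal sketch using made-up data; `df_small`, `player_id` and the goal counts are hypothetical and chosen only so that the row counts are easy to check.

```python
import pandas as pd

# Hypothetical ten-row DataFrame, invented for illustration
df_small = pd.DataFrame({
    'player_id': range(10),
    'goals': [3, 0, 1, 7, 2, 0, 4, 1, 0, 5],
})

first_five = df_small.head()          # default: first five rows
first_three = df_small.head(3)        # first three rows
all_rows = df_small.head(20)          # asking for more rows than exist returns every row
all_but_last_two = df_small.head(-2)  # a negative value drops that many rows from the end

print(len(first_five), len(first_three), len(all_rows), len(all_but_last_two))
```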


Likewise, we can also display the final rows of the DataFrame using the [`tail()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) method.

In [11]:
# Display the last five rows of the DataFrame, df
df.tail()

Unnamed: 0,player,nationality,position,squad,age,birth_year,games,games_starts,minutes,goals,assists,pens_made,pens_att,cards_yellow,cards_red,goals_per90,assists_per90,goals_assists_per90,goals_pens_per90,goals_assists_pens_per90,xg,npxg,xa,xg_per90,xa_per90,xg_xa_per90,npxg_per90,npxg_xa_per90,minutes_90s,shots_total,shots_on_target,shots_free_kicks,shots_on_target_pct,shots_total_per90,shots_on_target_per90,goals_per_shot,goals_per_shot_on_target,npxg_per_shot,xg_net,npxg_net,passes_completed,passes,passes_pct,passes_total_distance,passes_progressive_distance,passes_completed_short,passes_short,passes_pct_short,passes_completed_medium,passes_medium,passes_pct_medium,passes_completed_long,passes_long,passes_pct_long,xa_net,assisted_shots,passes_into_final_third,passes_into_penalty_area,crosses_into_penalty_area,progressive_passes,passes_live,passes_dead,passes_free_kicks,through_balls,passes_pressure,passes_switches,crosses,corner_kicks,corner_kicks_in,corner_kicks_out,corner_kicks_straight,passes_ground,passes_low,passes_high,passes_left_foot,passes_right_foot,passes_head,throw_ins,passes_other_body,passes_offsides,passes_oob,passes_intercepted,passes_blocked,sca,sca_per90,sca_passes_live,sca_passes_dead,sca_dribbles,sca_shots,sca_fouled,gca,gca_per90,gca_passes_live,gca_passes_dead,gca_dribbles,gca_shots,gca_fouled,gca_og_for,tackles,tackles_won,tackles_def_3rd,tackles_mid_3rd,tackles_att_3rd,dribble_tackles,dribbles_vs,dribble_tackles_pct,dribbled_past,pressures,pressure_regains,pressure_regain_pct,pressures_def_3rd,pressures_mid_3rd,pressures_att_3rd,blocks,blocked_shots,blocked_shots_saves,blocked_passes,interceptions,clearances,errors,touches,touches_def_pen_area,touches_def_3rd,touches_mid_3rd,touches_att_3rd,touches_att_pen_area,touches_live_ball,dribbles_completed,dribbles,dribbles_completed_pct,players_dribbled_past,nutmegs,carries,carry_distance,carry_progressive_distance,pass_targets,passes_received,passes_received_pct,miscontrols,dispossessed,cards_yellow_red,fouls,fouled,offsides,pens_won,pens_conceded,own_goals,ball_recoveries,aerials_won,aerials_lost,aerials_won_pct,Season,team_name,league_name,league_country,player_lower,firstname_lower,lastname_lower,firstinitial_lower,league_country_lower,nationality_code,nationality_cleaned,position_grouped,outfielder_goalkeeper,games_gk,games_starts_gk,minutes_gk,goals_against_gk,goals_against_per90_gk,shots_on_target_against,saves,save_pct,wins_gk,draws_gk,losses_gk,clean_sheets,clean_sheets_pct,pens_att_gk,pens_allowed,pens_saved,pens_missed_gk,minutes_90s_gk,free_kick_goals_against_gk,corner_kick_goals_against_gk,own_goals_against_gk,psxg_gk,psnpxg_per_shot_on_target_against,psxg_net_gk,psxg_net_per90_gk,passes_completed_launched_gk,passes_launched_gk,passes_pct_launched_gk,passes_gk,passes_throws_gk,pct_passes_launched_gk,passes_length_avg_gk,goal_kicks,pct_goal_kicks_launched,goal_kick_length_avg,crosses_gk,crosses_stopped_gk,crosses_stopped_pct_gk,def_actions_outside_pen_area_gk,def_actions_outside_pen_area_per90_gk,avg_distance_def_actions_gk
11257,Iván Villar,es ESP,GK,Celta Vigo,23,1997,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Celta Vigo,La Liga,Spain,ivan villar,ivan,villar,i,spain,ESP,Spain,Goalkeeper,,7.0,7.0,630.0,10.0,1.43,22.0,15.0,0.636,1.0,3.0,3.0,1.0,14.3,2.0,2.0,0.0,0.0,7.0,0.0,0.0,1.0,8.6,0.32,-0.4,-0.05,30.0,96.0,31.3,150.0,28.0,46.7,39.2,56.0,46.4,40.8,67.0,5.0,7.5,8.0,1.14,17.2
11258,Rubén Yáñez,es ESP,GK,Getafe,27,1993,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Getafe,La Liga,Spain,ruben yanez,ruben,yanez,r,spain,ESP,Spain,Goalkeeper,,2.0,2.0,180.0,1.0,0.5,2.0,2.0,1.0,1.0,1.0,0.0,1.0,50.0,1.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,1.4,0.22,0.4,0.21,18.0,42.0,42.9,37.0,2.0,86.5,55.7,12.0,83.3,61.2,18.0,2.0,11.1,1.0,0.5,13.5
11259,Robin Zentner,de GER,GK,Mainz 05,26,1994,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Mainz 05,Bundesliga,Germany,robin zentner,robin,zentner,r,germany,GER,Germany,Goalkeeper,,13.0,13.0,1170.0,26.0,2.0,70.0,47.0,0.657,1.0,3.0,9.0,1.0,7.7,2.0,2.0,0.0,0.0,13.0,1.0,3.0,1.0,20.5,0.28,-4.5,-0.35,61.0,189.0,32.3,370.0,95.0,35.1,34.8,106.0,55.7,47.3,130.0,10.0,7.7,12.0,0.92,14.0
11260,Ron-Robert Zieler,de GER,GK,Köln,31,1989,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Köln,Bundesliga,Germany,ron-robert zieler,ron-robert,zieler,r,germany,GER,Germany,Goalkeeper,,1.0,0.0,51.0,1.0,1.76,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.6,0.0,0.0,0.0,0.7,0.01,-0.3,-0.46,6.0,11.0,54.5,17.0,4.0,35.3,35.0,6.0,83.3,49.7,10.0,0.0,0.0,0.0,0.0,11.8
11261,Jeroen Zoet,nl NED,GK,Spezia,29,1991,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Spezia,Serie A,Italy,jeroen zoet,jeroen,zoet,j,italy,NED,Netherlands,Goalkeeper,,2.0,2.0,153.0,4.0,2.35,5.0,2.0,0.4,1.0,0.0,1.0,1.0,50.0,1.0,1.0,0.0,0.0,1.7,0.0,1.0,0.0,3.0,0.4,-1.0,-0.6,9.0,20.0,45.0,52.0,11.0,34.6,33.0,8.0,25.0,30.8,16.0,0.0,0.0,0.0,0.0,8.1


As with the `head()` method, the number of rows shown by the `tail()` method can be set by passing a value as a parameter (within the brackets).
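
A minimal sketch of `tail()` on a small made-up DataFrame follows. The data and the name `df_small` are hypothetical, used only to make the results easy to verify by eye.

```python
import pandas as pd

# Hypothetical ten-row DataFrame, invented for illustration
df_small = pd.DataFrame({
    'player_id': range(10),
    'goals': [3, 0, 1, 7, 2, 0, 4, 1, 0, 5],
})

last_five = df_small.tail()            # default: last five rows
last_three = df_small.tail(3)          # last three rows
all_but_first_two = df_small.tail(-2)  # a negative value drops that many rows from the start

# Rows keep their original index labels, so the last three rows are labelled 7, 8, 9
print(list(last_three.index))
```

This is why the row labels in the `tail()` output of the full data set start at a large number rather than at 0: the rows are returned with the labels they had in the original DataFrame.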

In [21]:
# Display the last twenty rows of the DataFrame, df
df.tail(20)

Unnamed: 0,player,nationality,position,squad,age,birth_year,games,games_starts,minutes,goals,assists,pens_made,pens_att,cards_yellow,cards_red,goals_per90,assists_per90,goals_assists_per90,goals_pens_per90,goals_assists_pens_per90,xg,npxg,xa,xg_per90,xa_per90,xg_xa_per90,npxg_per90,npxg_xa_per90,minutes_90s,shots_total,shots_on_target,shots_free_kicks,shots_on_target_pct,shots_total_per90,shots_on_target_per90,goals_per_shot,goals_per_shot_on_target,npxg_per_shot,xg_net,npxg_net,passes_completed,passes,passes_pct,passes_total_distance,passes_progressive_distance,passes_completed_short,passes_short,passes_pct_short,passes_completed_medium,passes_medium,passes_pct_medium,passes_completed_long,passes_long,passes_pct_long,xa_net,assisted_shots,passes_into_final_third,passes_into_penalty_area,crosses_into_penalty_area,progressive_passes,passes_live,passes_dead,passes_free_kicks,through_balls,passes_pressure,passes_switches,crosses,corner_kicks,corner_kicks_in,corner_kicks_out,corner_kicks_straight,passes_ground,passes_low,passes_high,passes_left_foot,passes_right_foot,passes_head,throw_ins,passes_other_body,passes_offsides,passes_oob,passes_intercepted,passes_blocked,sca,sca_per90,sca_passes_live,sca_passes_dead,sca_dribbles,sca_shots,sca_fouled,gca,gca_per90,gca_passes_live,gca_passes_dead,gca_dribbles,gca_shots,gca_fouled,gca_og_for,tackles,tackles_won,tackles_def_3rd,tackles_mid_3rd,tackles_att_3rd,dribble_tackles,dribbles_vs,dribble_tackles_pct,dribbled_past,pressures,pressure_regains,pressure_regain_pct,pressures_def_3rd,pressures_mid_3rd,pressures_att_3rd,blocks,blocked_shots,blocked_shots_saves,blocked_passes,interceptions,clearances,errors,touches,touches_def_pen_area,touches_def_3rd,touches_mid_3rd,touches_att_3rd,touches_att_pen_area,touches_live_ball,dribbles_completed,dribbles,dribbles_completed_pct,players_dribbled_past,nutmegs,carries,carry_distance,carry_progressive_distance,pass_targets,passes_received,passes_received_pct,miscontrols,dispossessed,cards_yellow_red,fouls,fouled,offsides,pens_won,pens_conceded,own_goals,ball_recoveries,aerials_won,aerials_lost,aerials_won_pct,Season,team_name,league_name,league_country,player_lower,firstname_lower,lastname_lower,firstinitial_lower,league_country_lower,nationality_code,nationality_cleaned,position_grouped,outfielder_goalkeeper,games_gk,games_starts_gk,minutes_gk,goals_against_gk,goals_against_per90_gk,shots_on_target_against,saves,save_pct,wins_gk,draws_gk,losses_gk,clean_sheets,clean_sheets_pct,pens_att_gk,pens_allowed,pens_saved,pens_missed_gk,minutes_90s_gk,free_kick_goals_against_gk,corner_kick_goals_against_gk,own_goals_against_gk,psxg_gk,psnpxg_per_shot_on_target_against,psxg_net_gk,psxg_net_per90_gk,passes_completed_launched_gk,passes_launched_gk,passes_pct_launched_gk,passes_gk,passes_throws_gk,pct_passes_launched_gk,passes_length_avg_gk,goal_kicks,pct_goal_kicks_launched,goal_kick_length_avg,crosses_gk,crosses_stopped_gk,crosses_stopped_pct_gk,def_actions_outside_pen_area_gk,def_actions_outside_pen_area_per90_gk,avg_distance_def_actions_gk
11242,Marco Silvestri,it ITA,GK,Hellas Verona,29,1991,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Hellas Verona,Serie A,Italy,marco silvestri,marco,silvestri,m,italy,ITA,Italy,Goalkeeper,,14.0,14.0,1260.0,14.0,1.0,58.0,44.0,0.776,5.0,5.0,4.0,4.0,28.6,2.0,1.0,0.0,1.0,14.0,0.0,2.0,1.0,12.8,0.22,-0.2,-0.02,96.0,314.0,30.6,295.0,27.0,69.2,47.1,123.0,89.4,58.4,146.0,5.0,3.4,5.0,0.36,13.0
11243,Unai Simón,es ESP,GK,Athletic Club,23,1997,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Athletic Club,La Liga,Spain,unai simon,unai,simon,u,spain,ESP,Spain,Goalkeeper,,15.0,15.0,1350.0,18.0,1.2,36.0,22.0,0.583,5.0,3.0,7.0,3.0,20.0,3.0,3.0,0.0,0.0,15.0,0.0,3.0,1.0,14.3,0.32,-2.7,-0.18,124.0,290.0,42.8,370.0,67.0,51.9,43.2,105.0,93.3,67.5,143.0,8.0,5.6,11.0,0.73,14.3
11244,Tobias Sippel,de GER,GK,M'Gladbach,32,1988,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,M'Gladbach,Bundesliga,Germany,tobias sippel,tobias,sippel,t,germany,GER,Germany,Goalkeeper,,1.0,1.0,90.0,1.0,1.0,2.0,1.0,0.5,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.2,0.59,0.2,0.18,5.0,9.0,55.6,20.0,2.0,40.0,34.9,5.0,20.0,20.2,6.0,1.0,16.7,2.0,2.0,24.0
11245,Salvatore Sirigu,it ITA,GK,Torino,33,1987,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Torino,Serie A,Italy,salvatore sirigu,salvatore,sirigu,s,italy,ITA,Italy,Goalkeeper,,12.0,12.0,1080.0,28.0,2.33,50.0,24.0,0.48,1.0,4.0,7.0,1.0,8.3,2.0,2.0,0.0,0.0,12.0,1.0,3.0,0.0,19.7,0.36,-8.3,-0.69,72.0,159.0,45.3,336.0,50.0,35.4,33.1,96.0,41.7,32.3,127.0,6.0,4.7,2.0,0.17,11.5
11246,Łukasz Skorupski,pl POL,GK,Bologna,29,1991,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Bologna,Serie A,Italy,ukasz skorupski,ukasz,skorupski,u,italy,POL,Poland,Goalkeeper,,10.0,10.0,900.0,17.0,1.7,45.0,30.0,0.644,4.0,0.0,6.0,1.0,10.0,1.0,1.0,0.0,0.0,10.0,0.0,3.0,1.0,13.3,0.27,-2.7,-0.27,46.0,117.0,39.3,276.0,48.0,34.1,33.3,71.0,32.4,28.3,78.0,13.0,16.7,5.0,0.5,14.3
11247,Yann Sommer,ch SUI,GK,M'Gladbach,32,1988,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,M'Gladbach,Bundesliga,Germany,yann sommer,yann,sommer,y,germany,SUI,Switzerland,Goalkeeper,,12.0,12.0,1080.0,21.0,1.75,45.0,27.0,0.6,4.0,5.0,3.0,1.0,8.3,3.0,3.0,0.0,0.0,12.0,0.0,3.0,0.0,15.8,0.33,-3.2,-0.29,63.0,142.0,44.4,404.0,46.0,31.2,32.3,79.0,20.3,25.7,80.0,7.0,8.8,3.0,0.27,12.6
11248,David Soria,es ESP,GK,Getafe,27,1993,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Getafe,La Liga,Spain,david soria,david,soria,d,spain,ESP,Spain,Goalkeeper,,12.0,12.0,1080.0,14.0,1.17,35.0,25.0,0.686,3.0,4.0,5.0,5.0,41.7,3.0,3.0,0.0,0.0,12.0,0.0,2.0,1.0,11.7,0.26,-1.3,-0.11,121.0,336.0,36.0,312.0,4.0,86.5,58.7,87.0,75.9,53.1,76.0,9.0,11.8,15.0,1.25,15.7
11249,Marco Sportiello,it ITA,GK,Atalanta,28,1992,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Atalanta,Serie A,Italy,marco sportiello,marco,sportiello,m,italy,ITA,Italy,Goalkeeper,,8.0,7.0,655.0,15.0,2.06,33.0,18.0,0.545,4.0,1.0,2.0,0.0,0.0,1.0,0.0,1.0,0.0,7.3,0.0,1.0,0.0,10.7,0.3,-4.3,-0.59,26.0,61.0,42.6,164.0,41.0,23.8,30.7,37.0,59.5,47.5,49.0,4.0,8.2,6.0,0.82,16.1
11250,Thomas Strakosha,al ALB,GK,Lazio,25,1995,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Lazio,Serie A,Italy,thomas strakosha,thomas,strakosha,t,italy,ALB,Albania,Goalkeeper,,5.0,5.0,450.0,11.0,2.2,23.0,12.0,0.522,1.0,1.0,3.0,1.0,20.0,0.0,0.0,0.0,0.0,5.0,0.0,1.0,0.0,8.9,0.39,-2.1,-0.43,28.0,68.0,41.2,179.0,29.0,35.2,31.6,31.0,16.1,25.7,43.0,3.0,7.0,0.0,0.0,11.5
11251,Wojciech Szczęsny,pl POL,GK,Juventus,30,1990,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Juventus,Serie A,Italy,wojciech szczesny,wojciech,szczesny,w,italy,POL,Poland,Goalkeeper,,9.0,9.0,810.0,11.0,1.22,28.0,19.0,0.643,3.0,5.0,1.0,1.0,11.1,1.0,1.0,0.0,0.0,9.0,0.0,2.0,1.0,9.0,0.29,-1.0,-0.11,29.0,45.0,64.4,239.0,37.0,15.5,27.6,51.0,15.7,23.2,67.0,2.0,3.0,9.0,1.0,15.5


These are important checks to run at the start of your notebook, to ensure that the data set has been imported properly and isn't missing any important columns, for example.

Another important use of `head()` in particular is to check that the first row imported to the DataFrame is data, and not a row of column names. This can happen depending on how your data set is structured.
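
To make that last check concrete, here is a minimal sketch of how a header row can end up as data. The two-column CSV text is made up for illustration (it is not the FBref file), and `io.StringIO` just stands in for a file on disk:

```python
import io
import pandas as pd

# Hypothetical CSV text with a header line, standing in for a real file
csv_text = "player,age\nYann Sommer,32\nDavid Soria,27\n"

# Default behaviour: the first line is used as the column names
df_ok = pd.read_csv(io.StringIO(csv_text))

# header=None: the header line is read in as an ordinary data row
df_bad = pd.read_csv(io.StringIO(csv_text), header=None)

print(df_ok.head())
print(df_bad.head())
```

In `df_bad`, the first row holds the strings `player` and `age`, and the columns are simply numbered `0` and `1`, which is exactly what a quick look at `head()` would reveal.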

---

## 4. Data Handling
The following section covers the basics of exploring the DataFrame using pandas methods and functions. 

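What is a method and what is an attribute?
*    A method is a function attached to an object. It is called with parentheses and can take arguments, e.g. `df.head()`.
*    An attribute is a value stored on an object. It is accessed without parentheses, e.g. `df.shape`.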
### Object type
The object type can be determined using Python's built-in `type()` function.

In [20]:
type(df)

pandas.core.frame.DataFrame

We can see that the object `df` is a DataFrame.

### Shape of the DataFrame
The [`shape`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html?highlight=shape#pandas.DataFrame.shape) attribute tells you the number of rows and columns in your data set. This becomes especially useful when working with larger data sets, so you can find out what sort of beast you are tackling.

In [13]:
# Print the shape of the raw DataFrame, df
df.shape

(11262, 205)
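
Because `shape` is a plain Python tuple of `(rows, columns)`, it can be unpacked into two variables. A minimal sketch on a small made-up DataFrame (`df_example`, `n_rows` and `n_cols` are illustrative names):

```python
import pandas as pd

# Small made-up DataFrame, used only for illustration
df_example = pd.DataFrame({'player': ['Yann Sommer', 'David Soria', 'Unai Simón'],
                           'age': [32, 27, 23]})

# Unpack the (rows, columns) tuple
n_rows, n_cols = df_example.shape
print(f'{n_rows} rows x {n_cols} columns')

# len() on a DataFrame gives the number of rows only
print(len(df_example) == n_rows)
```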

### Accessing individual parts of the DataFrame

A pandas DataFrame consists of three parts:
1.    the column names running across the top,
2.	  the index on the left-hand-side indicating the row number, and finally
3.	  the main body of the DataFrame with all the data.

#### Columns in the DataFrame
The columns in the DataFrame can be listed using the [`columns`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.columns.html?highlight=columns#pandas.DataFrame.columns) attribute.

In [22]:
df.columns

Index(['player', 'nationality', 'position', 'squad', 'age', 'birth_year',
       'games', 'games_starts', 'minutes', 'goals',
       ...
       'passes_length_avg_gk', 'goal_kicks', 'pct_goal_kicks_launched',
       'goal_kick_length_avg', 'crosses_gk', 'crosses_stopped_gk',
       'crosses_stopped_pct_gk', 'def_actions_outside_pen_area_gk',
       'def_actions_outside_pen_area_per90_gk', 'avg_distance_def_actions_gk'],
      dtype='object', length=205)

#### Index of the DataFrame
To access the 'row names' i.e. the index of the DataFrame, we can use the [`index`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.index.html?highlight=index#pandas.DataFrame.index) attribute:

In [23]:
df.index

RangeIndex(start=0, stop=11262, step=1)

#### Values in the DataFrame
To access the body of the DataFrame i.e. the values, we can use the [`values`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html?highlight=values#pandas.DataFrame.values) attribute:

In [24]:
df.values

array([['Patrick van Aanholt', 'nl NED', 'DF', ..., nan, nan, nan],
       ['Rolando Aarons', 'eng ENG', 'FW,MF', ..., nan, nan, nan],
       ['Rolando Aarons', 'eng ENG', 'MF,FW', ..., nan, nan, nan],
       ...,
       ['Robin Zentner', 'de GER', 'GK', ..., 12.0, 0.92, 14.0],
       ['Ron-Robert Zieler', 'de GER', 'GK', ..., 0.0, 0.0, 11.8],
       ['Jeroen Zoet', 'nl NED', 'GK', ..., 0.0, 0.0, 8.1]], dtype=object)

The `values` attribute is particularly useful if you want to take the data from a DataFrame and use it with a Python library that doesn't understand what a pandas DataFrame is. It returns a NumPy array representation of your data set.
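
As a rough sketch of that hand-off to NumPy, the example below pulls the values out of a small made-up numeric DataFrame. It also shows `to_numpy()`, the method that the pandas documentation recommends over `values` in newer code:

```python
import numpy as np
import pandas as pd

# Small made-up numeric DataFrame, used only for illustration
df_example = pd.DataFrame({'age': [32, 27, 29],
                           'games_gk': [12.0, 12.0, 10.0]})

# The values attribute returns a NumPy array with one row per DataFrame row
arr = df_example.values
print(type(arr))
print(arr.shape)

# to_numpy() gives the same array as a method call
arr_from_method = df_example.to_numpy()
print(np.array_equal(arr, arr_from_method))

# Once in NumPy, plain NumPy functions apply, e.g. the mean of each column
print(arr.mean(axis=0))
```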

### Data Types in the DataFrame
The data type of each column in the DataFrame can be determined using the [`dtypes`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html?highlight=dtypes#pandas.DataFrame.dtypes) attribute.

In [14]:
# Data types of the features of the raw DataFrame, df
df.dtypes

player                                    object
nationality                               object
position                                  object
squad                                     object
age                                        int64
                                          ...   
crosses_stopped_gk                       float64
crosses_stopped_pct_gk                   float64
def_actions_outside_pen_area_gk          float64
def_actions_outside_pen_area_per90_gk    float64
avg_distance_def_actions_gk              float64
Length: 205, dtype: object

If the DataFrame has too many columns for pandas to display them all, the following code can be used:

In [15]:
# Displays all columns
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df.dtypes)

player                                    object
nationality                               object
position                                  object
squad                                     object
age                                        int64
birth_year                                 int64
games                                    float64
games_starts                             float64
minutes                                  float64
goals                                    float64
assists                                  float64
pens_made                                float64
pens_att                                 float64
cards_yellow                             float64
cards_red                                float64
goals_per90                              float64
assists_per90                            float64
goals_assists_per90                      float64
goals_pens_per90                         float64
goals_assists_pens_per90                 float64
xg                  
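
With this many columns, a long printout is not always the most useful view. One alternative, sketched here on a small made-up DataFrame, is to count how many columns there are of each type, or to pick out only the columns of a given type with `select_dtypes()`:

```python
import pandas as pd

# Small made-up DataFrame, used only for illustration
df_example = pd.DataFrame({'player': ['Yann Sommer', 'David Soria'],
                           'age': [32, 27],
                           'save_pct': [0.6, 0.686]})

# How many columns are there of each data type?
print(df_example.dtypes.value_counts())

# Names of the numerical columns only
numeric_cols = df_example.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```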

### Info
The [`info()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html?highlight=info#pandas.DataFrame.info) method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type (int64, float64 etc.) and the number of non-null values.

In [16]:
# Info for the DataFrame, df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11262 entries, 0 to 11261
Columns: 205 entries, player to avg_distance_def_actions_gk
dtypes: float64(186), int64(2), object(17)
memory usage: 17.6+ MB


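The per-column breakdown is particularly useful when you don't have too many columns in the DataFrame. For a wide DataFrame such as this one (205 columns), pandas prints only the summary shown above.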
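
For comparison, this is a minimal sketch of `info()` on a narrow, made-up DataFrame, where every column is listed with its non-null count and type. Passing `verbose=True` asks for the per-column listing explicitly, and the `buf` argument is used here only to capture the text in a variable:

```python
import io
import pandas as pd

# Small made-up DataFrame with one missing value, used only for illustration
df_example = pd.DataFrame({'player': ['Rolando Aarons', 'Yann Sommer', 'Patrick van Aanholt'],
                           'games': [11.0, None, 28.0]})

# Capture the info() text in a buffer instead of printing it directly
buffer = io.StringIO()
df_example.info(verbose=True, buf=buffer)
info_text = buffer.getvalue()

print(info_text)
```

On the full data set, `df.info(verbose=True)` should give the same kind of listing for all 205 columns, at the cost of a very long output.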

### Describe
Summary statistics for the DataFrame can be generated using the [`describe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html?highlight=describe#pandas.DataFrame.describe) method. By default, only the numerical columns are included.

In [17]:
# Description of the DataFrame, df, showing some summary statistics for each numerical column in the DataFrame
df.describe()

Unnamed: 0,age,birth_year,games,games_starts,minutes,goals,assists,pens_made,pens_att,cards_yellow,cards_red,goals_per90,assists_per90,goals_assists_per90,goals_pens_per90,goals_assists_pens_per90,xg,npxg,xa,xg_per90,xa_per90,xg_xa_per90,npxg_per90,npxg_xa_per90,minutes_90s,shots_total,shots_on_target,shots_free_kicks,shots_on_target_pct,shots_total_per90,shots_on_target_per90,goals_per_shot,goals_per_shot_on_target,npxg_per_shot,xg_net,npxg_net,passes_completed,passes,passes_pct,passes_total_distance,passes_progressive_distance,passes_completed_short,passes_short,passes_pct_short,passes_completed_medium,passes_medium,passes_pct_medium,passes_completed_long,passes_long,passes_pct_long,xa_net,assisted_shots,passes_into_final_third,passes_into_penalty_area,crosses_into_penalty_area,progressive_passes,passes_live,passes_dead,passes_free_kicks,through_balls,passes_pressure,passes_switches,crosses,corner_kicks,corner_kicks_in,corner_kicks_out,corner_kicks_straight,passes_ground,passes_low,passes_high,passes_left_foot,passes_right_foot,passes_head,throw_ins,passes_other_body,passes_offsides,passes_oob,passes_intercepted,passes_blocked,sca,sca_per90,sca_passes_live,sca_passes_dead,sca_dribbles,sca_shots,sca_fouled,gca,gca_per90,gca_passes_live,gca_passes_dead,gca_dribbles,gca_shots,gca_fouled,gca_og_for,tackles,tackles_won,tackles_def_3rd,tackles_mid_3rd,tackles_att_3rd,dribble_tackles,dribbles_vs,dribble_tackles_pct,dribbled_past,pressures,pressure_regains,pressure_regain_pct,pressures_def_3rd,pressures_mid_3rd,pressures_att_3rd,blocks,blocked_shots,blocked_shots_saves,blocked_passes,interceptions,clearances,errors,touches,touches_def_pen_area,touches_def_3rd,touches_mid_3rd,touches_att_3rd,touches_att_pen_area,touches_live_ball,dribbles_completed,dribbles,dribbles_completed_pct,players_dribbled_past,nutmegs,carries,carry_distance,carry_progressive_distance,pass_targets,passes_received,passes_received_pct,miscontrols,dispossessed,cards_yellow_red,fouls,fouled,offsides,pen
s_won,pens_conceded,own_goals,ball_recoveries,aerials_won,aerials_lost,aerials_won_pct,games_gk,games_starts_gk,minutes_gk,goals_against_gk,goals_against_per90_gk,shots_on_target_against,saves,save_pct,wins_gk,draws_gk,losses_gk,clean_sheets,clean_sheets_pct,pens_att_gk,pens_allowed,pens_saved,pens_missed_gk,minutes_90s_gk,free_kick_goals_against_gk,corner_kick_goals_against_gk,own_goals_against_gk,psxg_gk,psnpxg_per_shot_on_target_against,psxg_net_gk,psxg_net_per90_gk,passes_completed_launched_gk,passes_launched_gk,passes_pct_launched_gk,passes_gk,passes_throws_gk,pct_passes_launched_gk,passes_length_avg_gk,goal_kicks,pct_goal_kicks_launched,goal_kick_length_avg,crosses_gk,crosses_stopped_gk,crosses_stopped_pct_gk,def_actions_outside_pen_area_gk,def_actions_outside_pen_area_per90_gk,avg_distance_def_actions_gk
count,11262.0,11262.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,10497.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0,765.0
mean,25.593323,1990.4477,16.32333,12.74688,1144.425455,1.554254,1.063447,0.150519,0.19215,2.357721,0.126322,0.115243,0.080019,0.195258,0.105706,0.18571,1.556835,1.40986,1.03542,0.127636,0.083804,0.211478,0.118295,0.202184,12.716043,14.451843,4.815662,0.607602,24.474269,1.212895,0.386176,0.065567,0.178622,0.072664,-0.002677,-0.006126,445.268553,565.570639,76.117653,8819.563399,3036.681814,176.96056,204.042298,84.504935,192.147185,225.109746,81.053253,68.995904,116.977041,56.271716,0.027932,10.508812,34.064304,9.196913,2.543203,38.833381,510.5996,54.971039,14.741736,1.013623,90.119748,17.223683,14.31247,5.682767,1.201296,1.23683,0.329713,367.121177,76.02334,122.426122,160.698962,334.506811,26.639421,23.556159,7.627322,1.988473,11.136515,9.235496,14.299514,22.794513,1.846336,16.292369,2.073259,1.431933,1.151472,1.347623,2.600267,0.198085,1.742593,0.17062,0.185672,0.213394,0.208345,0.025912,19.605316,12.309898,9.837763,7.479185,2.288368,6.477946,19.169382,28.158455,12.691436,179.519386,49.859198,26.873383,59.801658,81.030485,38.687244,18.164618,3.864914,0.081547,14.299705,9.693436,25.42174,0.2998,699.058207,71.692388,226.953987,337.66619,179.076498,25.838239,645.564352,11.720587,19.241783,53.524378,12.691245,0.617796,453.449462,2657.89759,1447.90502,523.629513,445.266171,83.897142,14.36458,13.140516,0.053634,14.65714,14.040392,2.309422,0.146232,0.177479,0.045823,106.97323,16.197771,16.197771,40.133886,16.08366,15.909804,1430.934641,22.001307,1.528732,66.065359,46.430065,0.672248,5.950327,4.002614,5.947712,4.324183,24.653072,2.656209,2.062745,0.447059,0.146405,15.898954,0.504575,2.777778,0.635294,21.129281,0.295216,-0.185752,-0.053974,103.555556,258.734641,40.692418,379.117647,64.23268,46.836209,39.81634,114.411765,64.506275,50.004444,148.098039,11.473203,7.298301,10.028758,0.623046,14.266405
std,4.602689,65.174849,11.042223,10.778573,939.635449,3.092042,1.832697,0.685917,0.809236,2.620818,0.402746,0.360014,0.307908,0.486745,0.350827,0.478557,2.725521,2.369988,1.533883,0.209283,0.287666,0.369811,0.196178,0.361194,10.440441,19.782944,7.629961,2.132088,22.684279,1.581844,0.702089,0.113071,0.256381,0.06574,1.092712,1.074711,441.631116,534.157231,12.286452,9102.108878,3599.869291,180.438872,203.027758,14.479543,207.463176,232.272203,16.398028,84.296411,144.004791,20.830186,0.923747,14.3238,41.261246,12.860725,4.545488,44.286723,489.316539,82.888233,22.68831,2.05556,86.607943,21.769254,22.359383,17.095314,4.674949,4.877279,1.401758,372.710479,78.110593,137.555866,266.106595,393.455766,30.959516,56.036699,22.742406,2.570752,11.624301,10.061889,15.867659,28.17837,2.365993,19.465842,5.706278,2.84856,1.930361,2.379619,3.869596,0.446919,2.652234,0.635238,0.599886,0.554568,0.572934,0.164189,20.905707,13.451205,11.705159,8.673675,2.959169,7.641819,20.642363,22.300769,14.072397,179.561672,48.75368,12.877891,65.123776,89.410447,52.964717,18.542437,6.093345,0.328435,14.756106,11.418777,39.644846,0.685455,630.382765,164.284226,283.635202,347.071798,215.567744,37.320497,588.052777,16.014729,26.309453,30.960976,17.29559,1.275375,428.998887,2594.318649,1485.994955,478.017903,421.414883,16.482776,18.741303,16.588092,0.236447,14.413658,15.735127,4.873094,0.465925,0.462829,0.223214,102.222188,22.476583,20.038923,25.209737,13.182203,13.290097,1189.759737,18.660405,0.980883,56.904286,40.869053,0.156737,6.233052,3.860797,5.684312,4.467978,21.264533,2.653306,2.170419,0.759845,0.37876,13.219847,0.82598,2.95211,0.945676,18.089031,0.086124,3.405907,0.377885,97.339591,243.582778,11.389631,327.84536,57.913171,17.774687,8.573005,101.460095,24.34118,13.593708,129.478453,11.372483,5.185555,10.524552,0.491983,2.838471
min,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.28,-0.28,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.33,-1.0,0.0,-7.7,-6.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-15.0,-2.18,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,22.0,1989.0,7.0,3.0,318.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.01,0.01,0.04,0.01,0.04,3.5,1.0,0.0,0.0,0.0,0.27,0.0,0.0,0.0,0.03,-0.4,-0.4,96.0,131.0,70.8,1686.0,460.0,38.0,45.0,81.4,35.0,45.0,75.6,9.0,17.0,46.6,-0.3,1.0,5.0,1.0,0.0,5.0,117.0,6.0,1.0,0.0,20.0,2.0,1.0,0.0,0.0,0.0,0.0,79.0,16.0,23.0,19.0,47.0,4.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,0.54,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,0.0,1.0,3.0,7.7,2.0,34.0,9.0,22.8,8.0,14.0,3.0,3.0,0.0,0.0,2.0,1.0,2.0,0.0,175.0,5.0,31.0,56.0,20.0,3.0,162.0,1.0,2.0,40.0,1.0,0.0,110.0,595.0,301.0,136.0,111.0,75.9,2.0,1.0,0.0,3.0,2.0,0.0,0.0,0.0,0.0,25.0,2.0,3.0,23.8,3.0,3.0,270.0,5.0,1.01,13.0,9.0,0.632,1.0,1.0,1.0,1.0,10.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,4.4,0.26,-1.9,-0.2,17.0,40.0,35.0,67.0,11.0,34.9,34.2,22.0,47.8,41.7,27.0,2.0,4.5,2.0,0.32,12.9
50%,25.0,1993.0,14.0,10.0,930.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.08,0.0,0.08,0.5,0.5,0.4,0.05,0.05,0.12,0.05,0.12,10.3,7.0,2.0,0.0,25.0,0.8,0.18,0.0,0.0,0.06,0.0,0.0,310.0,410.0,77.5,5775.0,1686.0,122.0,144.0,86.9,121.0,149.0,83.8,36.0,64.0,57.8,0.0,5.0,19.0,4.0,1.0,24.0,366.0,22.0,6.0,0.0,64.0,9.0,5.0,0.0,0.0,0.0,0.0,252.0,52.0,75.0,61.0,177.0,16.0,2.0,2.0,1.0,7.0,6.0,9.0,12.0,1.57,9.0,0.0,0.0,0.0,0.0,1.0,0.1,1.0,0.0,0.0,0.0,0.0,0.0,13.0,8.0,5.0,5.0,1.0,4.0,12.0,28.6,8.0,126.0,37.0,27.5,36.0,51.0,17.0,12.0,1.0,0.0,10.0,6.0,9.0,0.0,528.0,20.0,113.0,231.0,94.0,12.0,487.0,6.0,9.0,59.1,6.0,0.0,335.0,1898.0,991.0,397.0,331.0,89.5,7.0,7.0,0.0,11.0,9.0,0.0,0.0,0.0,0.0,77.0,8.0,10.0,43.2,13.0,13.0,1170.0,18.0,1.42,51.0,35.0,0.692,4.0,3.0,4.0,3.0,23.8,2.0,1.0,0.0,0.0,13.0,0.0,2.0,0.0,16.5,0.29,-0.2,-0.03,75.0,189.0,40.0,316.0,49.0,46.7,39.7,89.0,66.9,51.0,115.0,8.0,7.1,6.0,0.55,14.3
75%,29.0,1996.0,26.0,21.0,1842.0,2.0,1.0,0.0,0.0,4.0,0.0,0.14,0.11,0.28,0.13,0.26,1.8,1.7,1.4,0.17,0.12,0.3,0.16,0.29,20.5,19.0,6.0,0.0,38.1,1.8,0.57,0.11,0.33,0.1,0.2,0.2,667.0,864.0,83.7,13215.0,4380.0,259.0,301.0,91.3,282.0,334.0,90.8,97.0,164.0,69.2,0.2,14.0,49.0,13.0,3.0,59.0,765.0,62.0,19.0,1.0,139.0,24.0,18.0,2.0,0.0,0.0,0.0,539.0,112.0,175.0,161.0,498.0,39.0,11.0,5.0,3.0,16.0,14.0,21.0,32.0,2.6,23.0,1.0,2.0,2.0,2.0,4.0,0.29,2.0,0.0,0.0,0.0,0.0,0.0,30.0,19.0,15.0,11.0,3.0,10.0,28.0,42.3,18.0,270.0,77.0,32.4,93.0,116.0,53.0,28.0,5.0,0.0,22.0,15.0,31.0,0.0,1080.0,63.0,315.0,512.0,264.0,33.0,976.0,16.0,26.0,73.3,17.0,1.0,679.0,3934.0,2103.0,787.0,667.0,96.5,20.0,19.0,0.0,22.0,20.0,2.0,0.0,0.0,0.0,161.0,21.0,23.0,57.8,29.0,29.0,2610.0,37.0,1.77,116.0,82.0,0.75,10.0,7.0,10.0,7.0,33.3,4.0,3.0,1.0,0.0,29.0,1.0,4.0,1.0,35.9,0.33,1.3,0.14,170.0,418.0,45.8,658.0,101.0,57.4,44.7,198.0,84.9,60.5,259.0,19.0,9.5,15.0,0.88,15.7
max,43.0,2004.0,38.0,38.0,3420.0,36.0,21.0,14.0,15.0,17.0,5.0,22.5,18.0,22.5,22.5,22.5,29.7,25.7,18.4,5.25,25.4,25.4,5.25,25.4,38.0,196.0,92.0,47.0,100.0,36.0,30.0,1.0,1.0,0.88,12.7,12.5,2864.0,3229.0,100.0,65376.0,27017.0,1638.0,1745.0,100.0,1548.0,1625.0,100.0,607.0,1142.0,100.0,8.3,131.0,442.0,141.0,49.0,321.0,3153.0,718.0,232.0,48.0,527.0,227.0,223.0,190.0,85.0,78.0,25.0,2505.0,625.0,1120.0,2519.0,2900.0,237.0,477.0,234.0,34.0,98.0,90.0,122.0,238.0,90.0,168.0,68.0,39.0,27.0,32.0,43.0,18.0,27.0,9.0,11.0,6.0,7.0,2.0,160.0,109.0,98.0,67.0,28.0,56.0,141.0,100.0,98.0,1056.0,303.0,100.0,525.0,604.0,579.0,125.0,60.0,4.0,98.0,89.0,329.0,7.0,3520.0,1441.0,1683.0,2395.0,1746.0,322.0,3393.0,184.0,274.0,100.0,197.0,14.0,2617.0,17993.0,11141.0,3096.0,2833.0,100.0,138.0,172.0,2.0,98.0,167.0,56.0,6.0,5.0,4.0,597.0,266.0,290.0,100.0,38.0,38.0,3420.0,76.0,15.0,231.0,167.0,1.0,32.0,18.0,25.0,22.0,100.0,15.0,13.0,4.0,2.0,38.0,5.0,15.0,5.0,81.2,0.79,12.9,1.72,412.0,1103.0,100.0,1362.0,234.0,100.0,103.0,451.0,100.0,79.3,569.0,53.0,50.0,59.0,4.0,27.5
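
A table this wide is hard to read, so in practice it often helps to describe only a handful of columns at a time. The sketch below does this on a small made-up DataFrame, and also shows `include='all'`, which adds non-numerical columns to the summary:

```python
import pandas as pd

# Small made-up DataFrame with one missing value, used only for illustration
df_example = pd.DataFrame({'position': ['GK', 'GK', 'DF', 'DF'],
                           'age': [32, 27, 26, 30],
                           'minutes': [1080.0, 1080.0, 2184.0, None]})

# Summary statistics for a chosen subset of numerical columns
summary = df_example[['age', 'minutes']].describe()
print(summary)

# include='all' also summarises text columns (count, unique, top, freq)
all_summary = df_example.describe(include='all')
print(all_summary)
```

Note that `count` only counts non-null values, which is why `minutes` reports 3 rather than 4 here.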


### Null Values
The null values in a DataFrame can be determined using the [`isnull()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isnull.html?highlight=null#pandas.DataFrame.isnull) method, with the following code:

In [18]:
# Counts of NULL values

# Count the null values in each column
null_values = df.isnull().sum(axis=0)

# Display only the columns that contain at least one null value
null_values[null_values != 0]

games                                      765
games_starts                               765
minutes                                    765
goals                                      765
assists                                    765
                                         ...  
crosses_stopped_gk                       10497
crosses_stopped_pct_gk                   10497
def_actions_outside_pen_area_gk          10497
def_actions_outside_pen_area_per90_gk    10497
avg_distance_def_actions_gk              10497
Length: 193, dtype: int64
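
Raw counts are easier to judge as a share of all rows. Since `True` counts as 1 and `False` as 0, taking the mean of `isnull()` gives the fraction of missing values per column. A minimal sketch on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Small made-up DataFrame: outfield stats are missing for goalkeepers and vice versa
df_example = pd.DataFrame({'player': ['Patrick van Aanholt', 'Rolando Aarons',
                                      'Yann Sommer', 'David Soria'],
                           'goals': [5.0, 0.0, np.nan, np.nan],
                           'saves': [np.nan, np.nan, 27.0, 25.0]})

# Fraction of null values in each column
null_share = df_example.isnull().mean()

# Show only the columns with missing data, largest share first
print(null_share[null_share > 0].sort_values(ascending=False))
```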

### Subsetting specific column(s) of the DataFrame
To just see the information for a specified column, or columns, you can type the column name in square brackets between single or double quotation marks. If you've used pandas before, you might know that you can also perform the same action through dot notation, though only for a single column. My personal preference is square brackets, which are necessary if your column names contain a space.

#### Square Brackets

In [27]:
# Select a single column
df['player']

0        Patrick van Aanholt
1             Rolando Aarons
2             Rolando Aarons
3              Ignazio Abate
4            Aymen Abdennour
                ...         
11257            Iván Villar
11258            Rubén Yáñez
11259          Robin Zentner
11260      Ron-Robert Zieler
11261            Jeroen Zoet
Name: player, Length: 11262, dtype: object

#### Dot notation

In [34]:
# Select a single column
df.player

0        Patrick van Aanholt
1             Rolando Aarons
2             Rolando Aarons
3              Ignazio Abate
4            Aymen Abdennour
                ...         
11257            Iván Villar
11258            Rubén Yáñez
11259          Robin Zentner
11260      Ron-Robert Zieler
11261            Jeroen Zoet
Name: player, Length: 11262, dtype: object
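
Dot notation has a couple of limits worth knowing about. The sketch below uses a made-up DataFrame with two deliberately awkward column names, `save pct` (contains a space) and `count` (clashes with the existing `DataFrame.count()` method):

```python
import pandas as pd

# Made-up DataFrame with awkward column names, used only for illustration
df_example = pd.DataFrame({'player': ['Yann Sommer'],
                           'save pct': [0.6],
                           'count': [12]})

# Square brackets work for any column name, including one with a space
print(df_example['save pct'])

# Square brackets return the column called 'count'...
print(df_example['count'])

# ...whereas dot notation finds the DataFrame.count() method instead
print(callable(df_example.count))
```

Writing `df_example.save pct` is not valid Python at all, so square brackets are the only option for that column.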

When selecting only a single column, we no longer have a pandas DataFrame but a pandas Series. As we previously mentioned, a DataFrame is a Python object: a 2-dimensional labelled data structure with rows and columns. Each of its individual columns is a Series, i.e. a 1-dimensional labelled array, typically backed by a NumPy array.
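
You can confirm this with `type()`. In this minimal sketch on a small made-up DataFrame, a column name on its own returns a Series, while a one-item list of column names returns a one-column DataFrame:

```python
import pandas as pd

# Small made-up DataFrame, used only for illustration
df_example = pd.DataFrame({'player': ['Yann Sommer', 'David Soria'],
                           'age': [32, 27]})

# A single column name returns a 1-dimensional Series
single = df_example['player']
print(type(single), single.shape)

# A list with one column name returns a 2-dimensional DataFrame
as_frame = df_example[['player']]
print(type(as_frame), as_frame.shape)
```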

We can subset multiple columns as per the code below. Note that when selecting multiple columns, two sets of square brackets are required. This is because we are using a Python list, denoted by square brackets, to pull out multiple columns:

In [29]:
# Select several columns
df[['player', 'position', 'age']]

Unnamed: 0,player,position,age
0,Patrick van Aanholt,DF,26
1,Rolando Aarons,"FW,MF",21
2,Rolando Aarons,"MF,FW",21
3,Ignazio Abate,DF,30
4,Aymen Abdennour,DF,27
...,...,...,...
11257,Iván Villar,GK,23
11258,Rubén Yáñez,GK,27
11259,Robin Zentner,GK,26
11260,Ron-Robert Zieler,GK,31


We can then take this subset of columns and save it to a variable, e.g.

In [32]:
# Select several columns and assign to a new DataFrame
df_demographics = df[['player', 'position', 'age']]

In [35]:
# Display the first five rows of the DataFrame, df_demographics
df_demographics.head()

Unnamed: 0,player,position,age
0,Patrick van Aanholt,DF,26
1,Rolando Aarons,"FW,MF",21
2,Rolando Aarons,"MF,FW",21
3,Ignazio Abate,DF,30
4,Aymen Abdennour,DF,27


### Subsetting specific row(s) of the DataFrame
When subsetting rows, there are two main ways to do this:
1.    `loc[]` - label-based indexing
2.    `iloc[]` - positional indexing (remember Python indexes start at 0)

You may also come across `ix[]` in older code. It was deprecated and then removed in pandas 1.0, so I'll move on from this and just recommend you use `.loc[]` or `.iloc[]`, e.g.

In [36]:
df.loc[2]     # finds the row whose index label is 2

player                                   Rolando Aarons
nationality                                     eng ENG
position                                          MF,FW
squad                                     Hellas Verona
age                                                  21
                                              ...      
crosses_stopped_gk                                  NaN
crosses_stopped_pct_gk                              NaN
def_actions_outside_pen_area_gk                     NaN
def_actions_outside_pen_area_per90_gk               NaN
avg_distance_def_actions_gk                         NaN
Name: 2, Length: 205, dtype: object

In [37]:
df.iloc[2]    # finds the row with index position 2

player                                   Rolando Aarons
nationality                                     eng ENG
position                                          MF,FW
squad                                     Hellas Verona
age                                                  21
                                              ...      
crosses_stopped_gk                                  NaN
crosses_stopped_pct_gk                              NaN
def_actions_outside_pen_area_gk                     NaN
def_actions_outside_pen_area_per90_gk               NaN
avg_distance_def_actions_gk                         NaN
Name: 2, Length: 205, dtype: object
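
The two calls return the same row here only because the DataFrame still has its default index, where the label of each row equals its position. A minimal sketch on a small made-up DataFrame shows how they diverge once the rows are reordered:

```python
import pandas as pd

# Small made-up DataFrame, used only for illustration
df_example = pd.DataFrame({'player': ['Yann Sommer', 'David Soria', 'Unai Simón'],
                           'age': [32, 27, 23]})

# Sorting by age reorders the rows but keeps the original index labels (2, 1, 0)
df_sorted = df_example.sort_values('age')
print(df_sorted)

# loc looks up the row labelled 2, which is now the first row
print(df_sorted.loc[2, 'player'])    # Unai Simón

# iloc looks up the row in position 2, which is now the last row
print(df_sorted.iloc[2, 0])          # Yann Sommer
```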

To select multiple rows, just like we did for columns, we pass a Python list of identifiers inside the indexer's square brackets, which gives two sets of square brackets, i.e.

In [38]:
df.loc[[2, 0]] 

Unnamed: 0,player,nationality,position,squad,age,birth_year,games,games_starts,minutes,goals,assists,pens_made,pens_att,cards_yellow,cards_red,goals_per90,assists_per90,goals_assists_per90,goals_pens_per90,goals_assists_pens_per90,xg,npxg,xa,xg_per90,xa_per90,xg_xa_per90,npxg_per90,npxg_xa_per90,minutes_90s,shots_total,shots_on_target,shots_free_kicks,shots_on_target_pct,shots_total_per90,shots_on_target_per90,goals_per_shot,goals_per_shot_on_target,npxg_per_shot,xg_net,npxg_net,passes_completed,passes,passes_pct,passes_total_distance,passes_progressive_distance,passes_completed_short,passes_short,passes_pct_short,passes_completed_medium,passes_medium,passes_pct_medium,passes_completed_long,passes_long,passes_pct_long,xa_net,assisted_shots,passes_into_final_third,passes_into_penalty_area,crosses_into_penalty_area,progressive_passes,passes_live,passes_dead,passes_free_kicks,through_balls,passes_pressure,passes_switches,crosses,corner_kicks,corner_kicks_in,corner_kicks_out,corner_kicks_straight,passes_ground,passes_low,passes_high,passes_left_foot,passes_right_foot,passes_head,throw_ins,passes_other_body,passes_offsides,passes_oob,passes_intercepted,passes_blocked,sca,sca_per90,sca_passes_live,sca_passes_dead,sca_dribbles,sca_shots,sca_fouled,gca,gca_per90,gca_passes_live,gca_passes_dead,gca_dribbles,gca_shots,gca_fouled,gca_og_for,tackles,tackles_won,tackles_def_3rd,tackles_mid_3rd,tackles_att_3rd,dribble_tackles,dribbles_vs,dribble_tackles_pct,dribbled_past,pressures,pressure_regains,pressure_regain_pct,pressures_def_3rd,pressures_mid_3rd,pressures_att_3rd,blocks,blocked_shots,blocked_shots_saves,blocked_passes,interceptions,clearances,errors,touches,touches_def_pen_area,touches_def_3rd,touches_mid_3rd,touches_att_3rd,touches_att_pen_area,touches_live_ball,dribbles_completed,dribbles,dribbles_completed_pct,players_dribbled_past,nutmegs,carries,carry_distance,carry_progressive_distance,pass_targets,passes_received,passes_received_pct,miscontrols,dispossessed,cards_ye
llow_red,fouls,fouled,offsides,pens_won,pens_conceded,own_goals,ball_recoveries,aerials_won,aerials_lost,aerials_won_pct,Season,team_name,league_name,league_country,player_lower,firstname_lower,lastname_lower,firstinitial_lower,league_country_lower,nationality_code,nationality_cleaned,position_grouped,outfielder_goalkeeper,games_gk,games_starts_gk,minutes_gk,goals_against_gk,goals_against_per90_gk,shots_on_target_against,saves,save_pct,wins_gk,draws_gk,losses_gk,clean_sheets,clean_sheets_pct,pens_att_gk,pens_allowed,pens_saved,pens_missed_gk,minutes_90s_gk,free_kick_goals_against_gk,corner_kick_goals_against_gk,own_goals_against_gk,psxg_gk,psnpxg_per_shot_on_target_against,psxg_net_gk,psxg_net_per90_gk,passes_completed_launched_gk,passes_launched_gk,passes_pct_launched_gk,passes_gk,passes_throws_gk,pct_passes_launched_gk,passes_length_avg_gk,goal_kicks,pct_goal_kicks_launched,goal_kick_length_avg,crosses_gk,crosses_stopped_gk,crosses_stopped_pct_gk,def_actions_outside_pen_area_gk,def_actions_outside_pen_area_per90_gk,avg_distance_def_actions_gk
2,Rolando Aarons,eng ENG,"MF,FW",Hellas Verona,21,1995,11.0,6.0,517.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.2,0.04,0.04,0.08,0.04,0.08,5.7,3.0,0.0,0.0,0.0,0.52,0.0,0.0,0.0,0.07,-0.2,-0.2,27.0,41.0,65.9,408.0,88.0,17.0,22.0,77.3,8.0,11.0,72.7,1.0,4.0,25.0,0.0,0.0,0.0,1.0,1.0,1.0,41.0,0.0,0.0,0.0,10.0,1.0,2.0,0.0,0.0,0.0,0.0,31.0,6.0,4.0,29.0,8.0,2.0,0.0,0.0,1.0,1.0,2.0,3.0,3.0,0.52,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,4.0,3.0,1.0,1.0,2.0,4.0,50.0,2.0,34.0,8.0,23.5,13.0,16.0,5.0,4.0,0.0,0.0,4.0,2.0,0.0,0.0,68.0,0.0,14.0,33.0,27.0,3.0,68.0,2.0,3.0,66.7,2.0,0.0,48.0,328.0,225.0,59.0,39.0,66.1,4.0,5.0,0.0,1.0,3.0,1.0,0.0,0.0,0.0,12.0,2.0,2.0,50.0,17/18,Hellas Verona,Serie A,Italy,rolando aarons,rolando,aarons,r,italy,ENG,England,Midfielder,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
0,Patrick van Aanholt,nl NED,DF,Crystal Palace,26,1990,28.0,25.0,2184.0,5.0,1.0,0.0,0.0,7.0,0.0,0.21,0.04,0.25,0.21,0.25,3.1,3.1,1.5,0.13,0.06,0.19,0.13,0.19,24.3,33.0,12.0,4.0,36.4,1.36,0.49,0.15,0.42,0.09,1.9,1.9,845.0,1116.0,75.7,15182.0,7109.0,363.0,411.0,88.3,377.0,471.0,80.0,88.0,183.0,48.1,-0.5,17.0,58.0,28.0,7.0,101.0,895.0,221.0,21.0,0.0,183.0,21.0,42.0,11.0,0.0,1.0,0.0,682.0,131.0,303.0,717.0,175.0,30.0,189.0,4.0,3.0,24.0,19.0,37.0,41.0,1.69,23.0,11.0,4.0,1.0,1.0,3.0,0.12,2.0,0.0,1.0,0.0,0.0,0.0,48.0,30.0,33.0,13.0,2.0,21.0,38.0,55.3,17.0,253.0,78.0,30.8,159.0,71.0,23.0,36.0,5.0,0.0,31.0,36.0,56.0,2.0,1369.0,83.0,448.0,578.0,437.0,39.0,1148.0,17.0,26.0,65.4,20.0,0.0,767.0,4832.0,2651.0,765.0,709.0,92.7,23.0,17.0,0.0,18.0,10.0,3.0,0.0,0.0,0.0,225.0,2.0,6.0,25.0,17/18,Crystal Palace,Premier League,England,patrick van aanholt,patrick,aanholt,p,england,NED,Netherlands,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


### Subsetting specific row(s) and column(s) of the DataFrame
When selecting rows and columns, you can use `.loc[]` in the following way:

In [49]:
df.loc[:,]

Unnamed: 0,player,nationality,position,squad,age,birth_year,games,games_starts,minutes,goals,assists,pens_made,pens_att,cards_yellow,cards_red,goals_per90,assists_per90,goals_assists_per90,goals_pens_per90,goals_assists_pens_per90,xg,npxg,xa,xg_per90,xa_per90,xg_xa_per90,npxg_per90,npxg_xa_per90,minutes_90s,shots_total,shots_on_target,shots_free_kicks,shots_on_target_pct,shots_total_per90,shots_on_target_per90,goals_per_shot,goals_per_shot_on_target,npxg_per_shot,xg_net,npxg_net,passes_completed,passes,passes_pct,passes_total_distance,passes_progressive_distance,passes_completed_short,passes_short,passes_pct_short,passes_completed_medium,passes_medium,passes_pct_medium,passes_completed_long,passes_long,passes_pct_long,xa_net,assisted_shots,passes_into_final_third,passes_into_penalty_area,crosses_into_penalty_area,progressive_passes,passes_live,passes_dead,passes_free_kicks,through_balls,passes_pressure,passes_switches,crosses,corner_kicks,corner_kicks_in,corner_kicks_out,corner_kicks_straight,passes_ground,passes_low,passes_high,passes_left_foot,passes_right_foot,passes_head,throw_ins,passes_other_body,passes_offsides,passes_oob,passes_intercepted,passes_blocked,sca,sca_per90,sca_passes_live,sca_passes_dead,sca_dribbles,sca_shots,sca_fouled,gca,gca_per90,gca_passes_live,gca_passes_dead,gca_dribbles,gca_shots,gca_fouled,gca_og_for,tackles,tackles_won,tackles_def_3rd,tackles_mid_3rd,tackles_att_3rd,dribble_tackles,dribbles_vs,dribble_tackles_pct,dribbled_past,pressures,pressure_regains,pressure_regain_pct,pressures_def_3rd,pressures_mid_3rd,pressures_att_3rd,blocks,blocked_shots,blocked_shots_saves,blocked_passes,interceptions,clearances,errors,touches,touches_def_pen_area,touches_def_3rd,touches_mid_3rd,touches_att_3rd,touches_att_pen_area,touches_live_ball,dribbles_completed,dribbles,dribbles_completed_pct,players_dribbled_past,nutmegs,carries,carry_distance,carry_progressive_distance,pass_targets,passes_received,passes_received_pct,miscontrols,dispossessed,cards_ye
llow_red,fouls,fouled,offsides,pens_won,pens_conceded,own_goals,ball_recoveries,aerials_won,aerials_lost,aerials_won_pct,Season,team_name,league_name,league_country,player_lower,firstname_lower,lastname_lower,firstinitial_lower,league_country_lower,nationality_code,nationality_cleaned,position_grouped,outfielder_goalkeeper,games_gk,games_starts_gk,minutes_gk,goals_against_gk,goals_against_per90_gk,shots_on_target_against,saves,save_pct,wins_gk,draws_gk,losses_gk,clean_sheets,clean_sheets_pct,pens_att_gk,pens_allowed,pens_saved,pens_missed_gk,minutes_90s_gk,free_kick_goals_against_gk,corner_kick_goals_against_gk,own_goals_against_gk,psxg_gk,psnpxg_per_shot_on_target_against,psxg_net_gk,psxg_net_per90_gk,passes_completed_launched_gk,passes_launched_gk,passes_pct_launched_gk,passes_gk,passes_throws_gk,pct_passes_launched_gk,passes_length_avg_gk,goal_kicks,pct_goal_kicks_launched,goal_kick_length_avg,crosses_gk,crosses_stopped_gk,crosses_stopped_pct_gk,def_actions_outside_pen_area_gk,def_actions_outside_pen_area_per90_gk,avg_distance_def_actions_gk,Data_Source
0,Patrick van Aanholt,nl NED,DF,Crystal Palace,26,1990,28.0,25.0,2184.0,5.0,1.0,0.0,0.0,7.0,0.0,0.21,0.04,0.25,0.21,0.25,3.1,3.1,1.5,0.13,0.06,0.19,0.13,0.19,24.3,33.0,12.0,4.0,36.4,1.36,0.49,0.15,0.42,0.09,1.9,1.9,845.0,1116.0,75.7,15182.0,7109.0,363.0,411.0,88.3,377.0,471.0,80.0,88.0,183.0,48.1,-0.5,17.0,58.0,28.0,7.0,101.0,895.0,221.0,21.0,0.0,183.0,21.0,42.0,11.0,0.0,1.0,0.0,682.0,131.0,303.0,717.0,175.0,30.0,189.0,4.0,3.0,24.0,19.0,37.0,41.0,1.69,23.0,11.0,4.0,1.0,1.0,3.0,0.12,2.0,0.0,1.0,0.0,0.0,0.0,48.0,30.0,33.0,13.0,2.0,21.0,38.0,55.3,17.0,253.0,78.0,30.8,159.0,71.0,23.0,36.0,5.0,0.0,31.0,36.0,56.0,2.0,1369.0,83.0,448.0,578.0,437.0,39.0,1148.0,17.0,26.0,65.4,20.0,0.0,767.0,4832.0,2651.0,765.0,709.0,92.7,23.0,17.0,0.0,18.0,10.0,3.0,0.0,0.0,0.0,225.0,2.0,6.0,25.0,17/18,Crystal Palace,Premier League,England,patrick van aanholt,patrick,aanholt,p,england,NED,Netherlands,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref
1,Rolando Aarons,eng ENG,"FW,MF",Newcastle Utd,21,1995,4.0,1.0,139.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.2,0.2,0.0,0.10,0.00,0.10,0.10,0.10,1.5,2.0,0.0,0.0,0.0,1.29,0.00,0.00,0.00,0.08,-0.2,-0.2,81.0,115.0,70.4,1095.0,304.0,50.0,63.0,79.4,22.0,34.0,64.7,3.0,8.0,37.5,-0.2,3.0,5.0,6.0,1.0,9.0,106.0,9.0,4.0,0.0,32.0,4.0,6.0,0.0,0.0,0.0,0.0,76.0,19.0,20.0,79.0,26.0,2.0,2.0,0.0,0.0,9.0,1.0,7.0,1.0,0.66,1.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,12.0,9.0,4.0,5.0,3.0,1.0,5.0,20.0,4.0,117.0,32.0,27.4,23.0,60.0,34.0,4.0,0.0,0.0,4.0,1.0,0.0,0.0,173.0,2.0,20.0,82.0,87.0,7.0,167.0,9.0,18.0,50.0,11.0,0.0,136.0,993.0,583.0,212.0,126.0,59.4,10.0,15.0,0.0,8.0,11.0,0.0,0.0,0.0,0.0,22.0,7.0,15.0,31.8,17/18,Newcastle Utd,Premier League,England,rolando aarons,rolando,aarons,r,england,ENG,England,Forward,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref
2,Rolando Aarons,eng ENG,"MF,FW",Hellas Verona,21,1995,11.0,6.0,517.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.2,0.2,0.2,0.04,0.04,0.08,0.04,0.08,5.7,3.0,0.0,0.0,0.0,0.52,0.00,0.00,0.00,0.07,-0.2,-0.2,27.0,41.0,65.9,408.0,88.0,17.0,22.0,77.3,8.0,11.0,72.7,1.0,4.0,25.0,0.0,0.0,0.0,1.0,1.0,1.0,41.0,0.0,0.0,0.0,10.0,1.0,2.0,0.0,0.0,0.0,0.0,31.0,6.0,4.0,29.0,8.0,2.0,0.0,0.0,1.0,1.0,2.0,3.0,3.0,0.52,3.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,5.0,4.0,3.0,1.0,1.0,2.0,4.0,50.0,2.0,34.0,8.0,23.5,13.0,16.0,5.0,4.0,0.0,0.0,4.0,2.0,0.0,0.0,68.0,0.0,14.0,33.0,27.0,3.0,68.0,2.0,3.0,66.7,2.0,0.0,48.0,328.0,225.0,59.0,39.0,66.1,4.0,5.0,0.0,1.0,3.0,1.0,0.0,0.0,0.0,12.0,2.0,2.0,50.0,17/18,Hellas Verona,Serie A,Italy,rolando aarons,rolando,aarons,r,italy,ENG,England,Midfielder,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref
3,Ignazio Abate,it ITA,DF,Milan,30,1986,17.0,11.0,1057.0,1.0,0.0,0.0,0.0,3.0,0.0,0.09,0.00,0.09,0.09,0.09,0.2,0.2,0.6,0.02,0.05,0.07,0.02,0.07,11.7,4.0,2.0,0.0,50.0,0.34,0.17,0.25,0.50,0.06,0.8,0.8,625.0,774.0,80.7,11501.0,4742.0,255.0,280.0,91.1,294.0,337.0,87.2,69.0,133.0,51.9,-0.6,11.0,58.0,20.0,8.0,62.0,659.0,115.0,9.0,0.0,122.0,16.0,26.0,2.0,0.0,0.0,1.0,466.0,119.0,189.0,65.0,584.0,16.0,104.0,0.0,4.0,14.0,9.0,26.0,26.0,2.22,22.0,2.0,0.0,2.0,0.0,4.0,0.34,3.0,0.0,0.0,1.0,0.0,0.0,21.0,15.0,12.0,8.0,1.0,11.0,18.0,61.1,7.0,141.0,46.0,32.6,53.0,58.0,30.0,29.0,2.0,0.0,27.0,4.0,17.0,0.0,870.0,40.0,231.0,464.0,226.0,12.0,757.0,1.0,5.0,20.0,2.0,0.0,552.0,2524.0,1494.0,605.0,565.0,93.4,3.0,6.0,0.0,11.0,8.0,3.0,0.0,0.0,0.0,106.0,5.0,4.0,55.6,17/18,Milan,Serie A,Italy,ignazio abate,ignazio,abate,i,italy,ITA,Italy,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref
4,Aymen Abdennour,tn TUN,DF,Marseille,27,1989,8.0,6.0,500.0,0.0,0.0,0.0,0.0,3.0,0.0,0.00,0.00,0.00,0.00,0.00,0.1,0.1,0.0,0.02,0.00,0.02,0.02,0.02,5.6,2.0,1.0,0.0,50.0,0.36,0.18,0.00,0.00,0.05,-0.1,-0.1,304.0,336.0,90.5,6306.0,1794.0,90.0,99.0,90.9,166.0,174.0,95.4,45.0,56.0,80.4,0.0,0.0,11.0,0.0,0.0,7.0,335.0,1.0,1.0,0.0,38.0,9.0,0.0,0.0,0.0,0.0,0.0,285.0,24.0,27.0,294.0,20.0,17.0,0.0,2.0,1.0,6.0,2.0,4.0,1.0,0.18,1.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,5.0,1.0,0.0,4.0,7.0,57.1,3.0,23.0,8.0,34.8,17.0,6.0,0.0,6.0,3.0,0.0,3.0,2.0,8.0,1.0,359.0,34.0,224.0,160.0,5.0,2.0,358.0,1.0,1.0,100.0,1.0,0.0,269.0,1840.0,762.0,267.0,259.0,97.0,0.0,0.0,0.0,7.0,6.0,0.0,0.0,0.0,0.0,57.0,3.0,0.0,100.0,17/18,Marseille,Ligue 1,France,aymen abdennour,aymen,abdennour,a,france,TUN,Tunisia,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11257,Iván Villar,es ESP,GK,Celta Vigo,23,1997,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Celta Vigo,La Liga,Spain,ivan villar,ivan,villar,i,spain,ESP,Spain,Goalkeeper,,7.0,7.0,630.0,10.0,1.43,22.0,15.0,0.636,1.0,3.0,3.0,1.0,14.3,2.0,2.0,0.0,0.0,7.0,0.0,0.0,1.0,8.6,0.32,-0.4,-0.05,30.0,96.0,31.3,150.0,28.0,46.7,39.2,56.0,46.4,40.8,67.0,5.0,7.5,8.0,1.14,17.2,FBref
11258,Rubén Yáñez,es ESP,GK,Getafe,27,1993,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Getafe,La Liga,Spain,ruben yanez,ruben,yanez,r,spain,ESP,Spain,Goalkeeper,,2.0,2.0,180.0,1.0,0.50,2.0,2.0,1.000,1.0,1.0,0.0,1.0,50.0,1.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,1.4,0.22,0.4,0.21,18.0,42.0,42.9,37.0,2.0,86.5,55.7,12.0,83.3,61.2,18.0,2.0,11.1,1.0,0.50,13.5,FBref
11259,Robin Zentner,de GER,GK,Mainz 05,26,1994,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Mainz 05,Bundesliga,Germany,robin zentner,robin,zentner,r,germany,GER,Germany,Goalkeeper,,13.0,13.0,1170.0,26.0,2.00,70.0,47.0,0.657,1.0,3.0,9.0,1.0,7.7,2.0,2.0,0.0,0.0,13.0,1.0,3.0,1.0,20.5,0.28,-4.5,-0.35,61.0,189.0,32.3,370.0,95.0,35.1,34.8,106.0,55.7,47.3,130.0,10.0,7.7,12.0,0.92,14.0,FBref
11260,Ron-Robert Zieler,de GER,GK,Köln,31,1989,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20/21,Köln,Bundesliga,Germany,ron-robert zieler,ron-robert,zieler,r,germany,GER,Germany,Goalkeeper,,1.0,0.0,51.0,1.0,1.76,2.0,2.0,1.000,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.6,0.0,0.0,0.0,0.7,0.01,-0.3,-0.46,6.0,11.0,54.5,17.0,4.0,35.3,35.0,6.0,83.3,49.7,10.0,0.0,0.0,0.0,0.00,11.8,FBref


Rows are specified to the left of the comma and columns to the right, e.g.

In [47]:
df.loc[:, ['player', 'nationality']]

Unnamed: 0,player,nationality
0,Patrick van Aanholt,nl NED
1,Rolando Aarons,eng ENG
2,Rolando Aarons,eng ENG
3,Ignazio Abate,it ITA
4,Aymen Abdennour,tn TUN
...,...,...
11257,Iván Villar,es ESP
11258,Rubén Yáñez,es ESP
11259,Robin Zentner,de GER
11260,Ron-Robert Zieler,de GER


In the example above, the colon on its own selects all the rows, and the list selects the `player` and `nationality` columns.

This example is somewhat redundant, as we could have used the double square bracket notation shown previously for subsetting more than one column, but it illustrates where the row selection would go if you wanted one.
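
As a minimal sketch of selecting rows and columns together, assuming a small made-up DataFrame (`df_toy` and its values are illustrative and not taken from the FBref data):

```python
import pandas as pd

# Toy data with made-up values, used only to illustrate the .loc[] pattern
df_toy = pd.DataFrame({
    'player': ['Player A', 'Player B', 'Player C', 'Player D'],
    'nationality': ['eng ENG', 'it ITA', 'es ESP', 'de GER'],
    'age': [21, 30, 23, 26],
})

# Rows by index label (left of the comma), columns by name (right of the comma)
subset = df_toy.loc[1:2, ['player', 'age']]
print(subset)
```

Note that a label slice in `.loc[]` includes both end points, so `1:2` returns the rows labelled 1 and 2.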

This notation is great because we can now use Boolean and logical operators for more complex subsetting, e.g.

In [57]:
df.loc[df['nationality'] == 'eng ENG', ['player', 'position', 'age', 'Season']] 

Unnamed: 0,player,position,age,Season
1,Rolando Aarons,"FW,MF",21,17/18
2,Rolando Aarons,"MF,FW",21,17/18
8,Tammy Abraham,"FW,MF",19,17/18
45,Marc Albrighton,"MF,FW",27,17/18
55,Trent Alexander-Arnold,DF,18,17/18
...,...,...,...,...
11165,Sam Johnstone,GK,27,20/21
11188,Alex McCarthy,GK,31,20/21
11218,Jordan Pickford,GK,26,20/21
11219,Nick Pope,GK,28,20/21


or

In [61]:
df.loc[(df['nationality'] == 'eng ENG') & (df['age'] > 32) & (df['Season'] == '20/21'), ['player', 'position', 'age']] 

Unnamed: 0,player,position,age
8432,Gary Cahill,DF,35
8607,Scott Dann,DF,33
9120,Phil Jagielka,"DF,MF",38
9584,James Milner,"MF,DF",34
9719,Mark Noble,MF,33
9828,Lee Peltier,MF,34
10143,Billy Sharp,"FW,MF",34
10358,Jamie Vardy,FW,33
10461,Ashley Young,DF,35


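In the second example, each condition is wrapped in round brackets. This is required because `&` binds more tightly than comparison operators such as `==` and `>`, so without the brackets the expression would be evaluated in the wrong order.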
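
Conditions can also be stored in variables and combined with `&` (and), `|` (or) and `~` (not). The following is a sketch on made-up data (`df_toy` and the variable names are illustrative):

```python
import pandas as pd

# Toy data with made-up values
df_toy = pd.DataFrame({
    'player': ['Player A', 'Player B', 'Player C', 'Player D'],
    'nationality': ['eng ENG', 'eng ENG', 'es ESP', 'eng ENG'],
    'age': [35, 24, 33, 19],
})

# Each condition is a Boolean Series with one True/False value per row
is_english = df_toy['nationality'] == 'eng ENG'
is_over_32 = df_toy['age'] > 32

both = df_toy.loc[is_english & is_over_32, 'player'].tolist()    # AND
either = df_toy.loc[is_english | is_over_32, 'player'].tolist()  # OR
not_english = df_toy.loc[~is_english, 'player'].tolist()         # NOT

print(both, either, not_english)
```

Naming the conditions first avoids the bracket problem altogether and can make long filters easier to read.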

### Randomly selecting a sample
`sample(n)` randomly selects `n` rows, and `sample(frac=0.4)` randomly selects 40% of the rows.

In [68]:
df.sample(5)

Unnamed: 0,player,nationality,position,squad,age,birth_year,games,games_starts,minutes,goals,assists,pens_made,pens_att,cards_yellow,cards_red,goals_per90,assists_per90,goals_assists_per90,goals_pens_per90,goals_assists_pens_per90,xg,npxg,xa,xg_per90,xa_per90,xg_xa_per90,npxg_per90,npxg_xa_per90,minutes_90s,shots_total,shots_on_target,shots_free_kicks,shots_on_target_pct,shots_total_per90,shots_on_target_per90,goals_per_shot,goals_per_shot_on_target,npxg_per_shot,xg_net,npxg_net,passes_completed,passes,passes_pct,passes_total_distance,passes_progressive_distance,passes_completed_short,passes_short,passes_pct_short,passes_completed_medium,passes_medium,passes_pct_medium,passes_completed_long,passes_long,passes_pct_long,xa_net,assisted_shots,passes_into_final_third,passes_into_penalty_area,crosses_into_penalty_area,progressive_passes,passes_live,passes_dead,passes_free_kicks,through_balls,passes_pressure,passes_switches,crosses,corner_kicks,corner_kicks_in,corner_kicks_out,corner_kicks_straight,passes_ground,passes_low,passes_high,passes_left_foot,passes_right_foot,passes_head,throw_ins,passes_other_body,passes_offsides,passes_oob,passes_intercepted,passes_blocked,sca,sca_per90,sca_passes_live,sca_passes_dead,sca_dribbles,sca_shots,sca_fouled,gca,gca_per90,gca_passes_live,gca_passes_dead,gca_dribbles,gca_shots,gca_fouled,gca_og_for,tackles,tackles_won,tackles_def_3rd,tackles_mid_3rd,tackles_att_3rd,dribble_tackles,dribbles_vs,dribble_tackles_pct,dribbled_past,pressures,pressure_regains,pressure_regain_pct,pressures_def_3rd,pressures_mid_3rd,pressures_att_3rd,blocks,blocked_shots,blocked_shots_saves,blocked_passes,interceptions,clearances,errors,touches,touches_def_pen_area,touches_def_3rd,touches_mid_3rd,touches_att_3rd,touches_att_pen_area,touches_live_ball,dribbles_completed,dribbles,dribbles_completed_pct,players_dribbled_past,nutmegs,carries,carry_distance,carry_progressive_distance,pass_targets,passes_received,passes_received_pct,miscontrols,dispossessed,cards_ye
llow_red,fouls,fouled,offsides,pens_won,pens_conceded,own_goals,ball_recoveries,aerials_won,aerials_lost,aerials_won_pct,Season,team_name,league_name,league_country,player_lower,firstname_lower,lastname_lower,firstinitial_lower,league_country_lower,nationality_code,nationality_cleaned,position_grouped,outfielder_goalkeeper,games_gk,games_starts_gk,minutes_gk,goals_against_gk,goals_against_per90_gk,shots_on_target_against,saves,save_pct,wins_gk,draws_gk,losses_gk,clean_sheets,clean_sheets_pct,pens_att_gk,pens_allowed,pens_saved,pens_missed_gk,minutes_90s_gk,free_kick_goals_against_gk,corner_kick_goals_against_gk,own_goals_against_gk,psxg_gk,psnpxg_per_shot_on_target_against,psxg_net_gk,psxg_net_per90_gk,passes_completed_launched_gk,passes_launched_gk,passes_pct_launched_gk,passes_gk,passes_throws_gk,pct_passes_launched_gk,passes_length_avg_gk,goal_kicks,pct_goal_kicks_launched,goal_kick_length_avg,crosses_gk,crosses_stopped_gk,crosses_stopped_pct_gk,def_actions_outside_pen_area_gk,def_actions_outside_pen_area_per90_gk,avg_distance_def_actions_gk,Data_Source,data_source,data_source_season
679,Lucas Digne,fr FRA,DF,Barcelona,24,1993,12.0,8.0,709.0,0.0,1.0,0.0,0.0,2.0,0.0,0.0,0.13,0.13,0.0,0.13,0.3,0.3,1.7,0.04,0.22,0.26,0.04,0.26,7.9,2.0,1.0,0.0,50.0,0.25,0.13,0.0,0.0,0.16,-0.3,-0.3,465.0,539.0,86.3,7809.0,2589.0,211.0,238.0,88.7,223.0,247.0,90.3,30.0,42.0,71.4,-0.7,6.0,32.0,5.0,2.0,33.0,477.0,62.0,7.0,0.0,90.0,6.0,16.0,0.0,0.0,0.0,0.0,401.0,88.0,50.0,433.0,23.0,22.0,55.0,0.0,0.0,5.0,13.0,17.0,9.0,1.15,7.0,0.0,0.0,0.0,0.0,2.0,0.26,2.0,0.0,0.0,0.0,0.0,0.0,20.0,11.0,14.0,5.0,1.0,6.0,9.0,66.7,3.0,132.0,49.0,37.1,62.0,45.0,25.0,21.0,4.0,1.0,17.0,13.0,8.0,1.0,621.0,30.0,203.0,277.0,178.0,19.0,559.0,4.0,7.0,57.1,5.0,0.0,391.0,2153.0,1140.0,416.0,382.0,91.8,6.0,2.0,0.0,11.0,7.0,2.0,0.0,1.0,0.0,82.0,6.0,6.0,50.0,17/18,Barcelona,La Liga,Spain,lucas digne,lucas,digne,l,spain,FRA,France,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref,FBref,FBref-17/18
6053,Kasper Dolberg,dk DEN,FW,Nice,21,1997,23.0,22.0,1888.0,11.0,3.0,0.0,0.0,2.0,0.0,0.52,0.14,0.67,0.52,0.67,7.2,7.2,1.2,0.34,0.06,0.4,0.34,0.4,21.0,42.0,25.0,0.0,59.5,2.0,1.19,0.26,0.44,0.17,3.8,3.8,232.0,301.0,77.1,3534.0,243.0,128.0,161.0,79.5,83.0,100.0,83.0,13.0,15.0,86.7,1.8,7.0,3.0,4.0,0.0,6.0,254.0,47.0,2.0,0.0,67.0,2.0,3.0,0.0,0.0,0.0,0.0,225.0,59.0,17.0,23.0,256.0,9.0,1.0,3.0,1.0,4.0,13.0,13.0,26.0,1.24,14.0,0.0,3.0,4.0,5.0,6.0,0.29,3.0,0.0,1.0,1.0,1.0,0.0,16.0,8.0,1.0,4.0,11.0,9.0,25.0,36.0,16.0,379.0,87.0,23.0,17.0,154.0,208.0,18.0,4.0,0.0,14.0,9.0,28.0,0.0,552.0,30.0,59.0,268.0,261.0,70.0,507.0,18.0,38.0,47.4,18.0,2.0,365.0,2244.0,954.0,660.0,386.0,58.5,48.0,35.0,0.0,26.0,36.0,7.0,0.0,0.0,0.0,38.0,17.0,20.0,45.9,19/20,Nice,Ligue 1,France,kasper dolberg,kasper,dolberg,k,france,DEN,Denmark,Forward,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref,FBref,FBref-19/20
8724,Fodé Doucouré,ml MLI,DF,Reims,19,2001,4.0,2.0,221.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.2,0.07,0.1,0.17,0.07,0.17,2.5,3.0,2.0,0.0,66.7,1.22,0.81,0.0,0.0,0.06,-0.2,-0.2,104.0,135.0,77.0,1859.0,740.0,36.0,46.0,78.3,60.0,70.0,85.7,6.0,15.0,40.0,-0.2,3.0,6.0,3.0,1.0,7.0,106.0,29.0,2.0,0.0,11.0,2.0,2.0,0.0,0.0,0.0,0.0,88.0,34.0,13.0,10.0,95.0,2.0,27.0,0.0,0.0,3.0,7.0,3.0,5.0,2.04,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,1.0,1.0,4.0,0.0,0.0,3.0,0.0,3.0,36.0,7.0,19.4,19.0,14.0,3.0,3.0,0.0,0.0,3.0,3.0,9.0,0.0,163.0,11.0,58.0,94.0,34.0,2.0,134.0,1.0,1.0,100.0,1.0,0.0,96.0,792.0,580.0,92.0,89.0,96.7,2.0,3.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,23.0,1.0,2.0,33.3,20/21,Reims,Ligue 1,France,fode doucoure,fode,doucoure,f,france,MLI,Mali,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref,FBref,FBref-20/21
7923,Thibaut Vargas,fr FRA,"DF,MF",Montpellier,19,2000,3.0,2.0,151.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.05,0.05,0.0,0.05,1.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46.0,65.0,70.8,875.0,236.0,22.0,28.0,78.6,18.0,25.0,72.0,6.0,12.0,50.0,-0.1,2.0,0.0,3.0,2.0,2.0,51.0,14.0,2.0,0.0,5.0,1.0,6.0,0.0,0.0,0.0,0.0,37.0,7.0,21.0,4.0,46.0,3.0,12.0,0.0,0.0,0.0,3.0,0.0,4.0,2.38,4.0,0.0,0.0,0.0,0.0,1.0,0.6,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,4.0,25.0,3.0,33.0,6.0,18.2,16.0,11.0,6.0,2.0,2.0,0.0,0.0,0.0,3.0,0.0,88.0,4.0,27.0,30.0,38.0,2.0,74.0,1.0,4.0,25.0,1.0,0.0,50.0,344.0,172.0,55.0,47.0,85.5,6.0,3.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,17.0,0.0,2.0,0.0,19/20,Montpellier,Ligue 1,France,thibaut vargas,thibaut,vargas,t,france,FRA,France,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref,FBref,FBref-19/20
7597,Marcelo Saracchi,uy URU,DF,RB Leipzig,21,1998,4.0,1.0,179.0,1.0,0.0,0.0,0.0,1.0,0.0,0.5,0.0,0.5,0.5,0.5,0.5,0.5,0.3,0.27,0.14,0.42,0.27,0.42,2.0,3.0,1.0,0.0,33.3,1.51,0.5,0.33,1.0,0.18,0.5,0.5,91.0,127.0,71.7,1636.0,524.0,47.0,53.0,88.7,34.0,47.0,72.3,10.0,25.0,40.0,-0.3,4.0,6.0,2.0,1.0,7.0,108.0,19.0,1.0,0.0,18.0,4.0,6.0,0.0,0.0,0.0,0.0,67.0,21.0,39.0,95.0,5.0,7.0,18.0,0.0,1.0,1.0,2.0,3.0,4.0,2.01,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,3.0,2.0,3.0,1.0,3.0,5.0,60.0,2.0,37.0,13.0,35.1,19.0,15.0,3.0,3.0,1.0,0.0,2.0,4.0,5.0,0.0,157.0,6.0,46.0,87.0,39.0,6.0,138.0,5.0,7.0,71.4,6.0,0.0,94.0,636.0,402.0,96.0,89.0,92.7,3.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,26.0,4.0,3.0,57.1,19/20,RB Leipzig,Bundesliga,Germany,marcelo saracchi,marcelo,saracchi,m,germany,URU,Uruguay,Defender,Outfielder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,FBref,FBref,FBref-19/20
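
Because `sample` draws rows at random, each run returns different rows. Passing the `random_state` parameter makes the draw repeatable, which helps when sharing a notebook. A small sketch on made-up data (`df_toy` is illustrative):

```python
import pandas as pd

# Toy DataFrame with ten rows
df_toy = pd.DataFrame({'player_id': range(10)})

# The same random_state gives the same rows every time
sample_a = df_toy.sample(n=3, random_state=42)
sample_b = df_toy.sample(n=3, random_state=42)

# frac selects a proportion of the rows rather than a fixed number
fraction = df_toy.sample(frac=0.4, random_state=42)

print(sample_a.equals(sample_b), len(fraction))
```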


### Confusing an attribute for a function, and vice versa
If you confuse an attribute for a method, for example by typing `df.shape()` instead of `df.shape` (even experienced pandas users do this sometimes), Python will raise a `TypeError` like the following:

`TypeError: 'tuple' object is not callable`
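
A short sketch that reproduces the error on a made-up DataFrame (`df_toy` is illustrative), using `shape` as the attribute:

```python
import pandas as pd

df_toy = pd.DataFrame({'player': ['Player A', 'Player B'], 'age': [21, 30]})

# shape is an attribute, so it is accessed without brackets
print(df_toy.shape)

# Adding brackets tries to call the returned tuple, which raises a TypeError
try:
    df_toy.shape()
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

The reverse mistake, writing `df.head` without brackets, does not raise an error. It displays the method itself rather than the first rows.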

### Checking your version of pandas
To get the version of pandas you are using, you can type the following:

In [40]:
pd.__version__

'1.2.0'

It is important to check your version of pandas when searching for answers on sites such as StackOverflow.com. Older solutions may rely on functionality that has since been deprecated or removed.

---

## 5. Data Engineering
In the previous section we covered the functions and methods for **data handling**, i.e. understanding and examining the dataset in question.

The next step is to engineer the DataFrame: to make permanent changes and transform the dataset into a form that is ready for use.

So now we know a little bit about pandas, what it is, and how we can use it to manipulate our datasets.

The next stage is to perform some basic data engineering tasks, which include:
*    Creating new attributes
*    Converting data types
*    Selecting columns of interest
*    Joining datasets
*    Melting datasets
*    Pivot
*    Saving data for external use

### Creating new attributes

#### Add column

In [63]:
df['data_source'] = 'FBref'

In [64]:
df.columns

Index(['player', 'nationality', 'position', 'squad', 'age', 'birth_year',
       'games', 'games_starts', 'minutes', 'goals',
       ...
       'pct_goal_kicks_launched', 'goal_kick_length_avg', 'crosses_gk',
       'crosses_stopped_gk', 'crosses_stopped_pct_gk',
       'def_actions_outside_pen_area_gk',
       'def_actions_outside_pen_area_per90_gk', 'avg_distance_def_actions_gk',
       'Data_Source', 'data_source'],
      dtype='object', length=207)

We can see that a new column, `data_source`, has been created. Every row holds the value `'FBref'`.

#### Concat columns

In [65]:
df['data_source_season'] = df['data_source'] + '-' + df['Season'] 

In [66]:
df.columns

Index(['player', 'nationality', 'position', 'squad', 'age', 'birth_year',
       'games', 'games_starts', 'minutes', 'goals',
       ...
       'goal_kick_length_avg', 'crosses_gk', 'crosses_stopped_gk',
       'crosses_stopped_pct_gk', 'def_actions_outside_pen_area_gk',
       'def_actions_outside_pen_area_per90_gk', 'avg_distance_def_actions_gk',
       'Data_Source', 'data_source', 'data_source_season'],
      dtype='object', length=208)

In [67]:
df['data_source_season']

0        FBref-17/18
1        FBref-17/18
2        FBref-17/18
3        FBref-17/18
4        FBref-17/18
            ...     
11257    FBref-20/21
11258    FBref-20/21
11259    FBref-20/21
11260    FBref-20/21
11261    FBref-20/21
Name: data_source_season, Length: 11262, dtype: object
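
The `+` concatenation above works because both columns hold strings. If one of the columns is numeric, adding it to a string column raises an error, so it needs casting with `astype(str)` first. A sketch on made-up data (`df_toy` and `player_age` are illustrative names):

```python
import pandas as pd

df_toy = pd.DataFrame({
    'player': ['Player A', 'Player B'],
    'age': [21, 30],
})

# age is an integer column, so cast it to string before concatenating
df_toy['player_age'] = df_toy['player'] + ' (' + df_toy['age'].astype(str) + ')'

print(df_toy['player_age'].tolist())
```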

#### Mathematical Operations
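Arithmetic between columns is applied row by row. As a minimal sketch, the toy DataFrame below copies the goals and minutes of the first two rows shown earlier and derives a goals-per-90 figure. The name `goals_per90_calc` is illustrative, and the formula is an assumption about how such a figure could be calculated rather than FBref's documented method.

```python
import pandas as pd

# Goals and minutes copied from the first two rows of the data shown earlier
df_toy = pd.DataFrame({
    'player': ['Patrick van Aanholt', 'Rolando Aarons'],
    'goals': [5.0, 0.0],
    'minutes': [2184.0, 139.0],
})

# Goals per 90 minutes played, rounded to two decimal places
df_toy['goals_per90_calc'] = (df_toy['goals'] / df_toy['minutes'] * 90).round(2)

print(df_toy)
```

For these two rows the result agrees with the `goals_per90` values already present in the dataset.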

### Rename columns
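Columns can be renamed with the `rename` method, passing a dictionary of old names to new names. A sketch on a made-up DataFrame, assuming we wanted the capitalised `Season` and `Data_Source` columns to follow the lowercase style of the other columns:

```python
import pandas as pd

df_toy = pd.DataFrame({'Season': ['17/18'], 'Data_Source': ['FBref']})

# Keys are the existing column names, values are the new names
df_toy = df_toy.rename(columns={'Season': 'season', 'Data_Source': 'data_source'})

print(df_toy.columns.tolist())
```

Columns that are not listed in the dictionary keep their existing names.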

### Selecting columns of interest

In [None]:
# Select columns of interest

## Create a list containing the columns we wish to select
lst_cols = ['col_1', 'col_2', 'col_3', 'col_4', 'col_5']

## Keep only those columns, in the order listed
df = df[lst_cols]

### Converting data types

#### To datetime ([`to_datetime`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html))

In [None]:
# Convert birth_date from string to datetime
df['birth_date'] = pd.to_datetime(df['birth_date'])
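
Real data often contains values that are not valid dates. As a sketch on a made-up Series (the values are illustrative), the `format` parameter states the expected layout and `errors='coerce'` turns anything that cannot be parsed into `NaT` instead of raising an error:

```python
import pandas as pd

# Made-up date strings, including one that is not a date
dates = pd.Series(['1990-08-29', '1995-11-19', 'unknown'])

# Unparseable values become NaT (the datetime equivalent of NaN)
parsed = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')

print(parsed)
```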

#### To numeric ([`to_numeric`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html))

In [None]:
# Convert string to a numeric type (integer or float)
df['value'] = pd.to_numeric(df['value'])
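
By default `to_numeric` raises an error if any value cannot be converted. A sketch on a made-up Series (the values are illustrative) showing `errors='coerce'`, which replaces such values with `NaN`:

```python
import pandas as pd

# Made-up values stored as strings, including one non-numeric entry
values = pd.Series(['100', '2.5', 'n/a'])

# Non-numeric entries become NaN, so the result is a float column
numeric = pd.to_numeric(values, errors='coerce')

print(numeric)
```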

### Melting datasets
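Melting reshapes a wide DataFrame, with one column per measure, into a long one with one row per measure. A minimal sketch using `pd.melt` on made-up data (`df_wide`, `stat` and `value` are illustrative names):

```python
import pandas as pd

# Wide format: one column per statistic
df_wide = pd.DataFrame({
    'player': ['Player A', 'Player B'],
    'goals': [5, 0],
    'assists': [1, 2],
})

# Long format: one row per player and statistic
df_long = pd.melt(
    df_wide,
    id_vars='player',                 # column(s) to keep as identifiers
    value_vars=['goals', 'assists'],  # columns to unpivot
    var_name='stat',
    value_name='value',
)

print(df_long)
```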

### Pivot
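Pivoting goes the other way, spreading the values of one column across new columns. A minimal sketch using `pivot_table` on made-up data (the team names and goal totals are illustrative):

```python
import pandas as pd

# Long format: one row per squad and season
df_long = pd.DataFrame({
    'squad': ['Team X', 'Team X', 'Team Y', 'Team Y'],
    'Season': ['17/18', '18/19', '17/18', '18/19'],
    'goals': [50, 60, 40, 45],
})

# One row per squad, one column per season, summing goals in each cell
df_pivot = df_long.pivot_table(index='squad', columns='Season', values='goals', aggfunc='sum')

print(df_pivot)
```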

### String cleaning

In [None]:
# Remove accents and create lowercase name
df['player_lower'] = (df['player']
                          .str.normalize('NFKD')
                          .str.encode('ascii', errors='ignore')
                          .str.decode('utf-8')
                          .str.lower()
                     )
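
To see what this chain does, here is the same sequence of string methods applied to a small Series. The first three names appear in the data above; the fourth is added as an illustrative edge case.

```python
import pandas as pd

names = pd.Series(['Iván Villar', 'Rubén Yáñez', 'Ron-Robert Zieler', 'Martin Ødegaard'])

cleaned = (names
           .str.normalize('NFKD')                 # split accented letters into letter + accent
           .str.encode('ascii', errors='ignore')  # drop anything that is not plain ASCII
           .str.decode('utf-8')                   # convert the bytes back to strings
           .str.lower()
           )

print(cleaned.tolist())
```

Letters such as `Ø` have no unaccented decomposition, so this approach removes them entirely rather than replacing them. Names containing such letters may need a manual mapping.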

### Replace values

In [None]:
# Remove the pound sign; regex=False treats '£' as a literal string
df['value'] = df['value'].str.replace('£', '', regex=False)
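
Removing the currency symbol still leaves a string column. A sketch on made-up values, assuming the figures also contain thousands separators, that strips both and then converts the result to numbers:

```python
import pandas as pd

# Made-up monetary values stored as strings
values = pd.Series(['£1,500,000', '£250,000'])

# Strip the currency symbol and the thousands separators
cleaned = (values
           .str.replace('£', '', regex=False)
           .str.replace(',', '', regex=False))

# Convert the cleaned strings to integers
numeric = pd.to_numeric(cleaned)

print(numeric.tolist())
```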

### Map values

In [None]:
df['nationality_cleaned'] = df['nationality_code'].map(dict_countries)
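
`map` looks up each value in a dictionary and returns the matching entry. As a sketch, the `dict_countries` below is a made-up, deliberately incomplete example of such a dictionary, which shows that codes missing from it come back as `NaN`:

```python
import pandas as pd

# Illustrative lookup of nationality codes to country names
dict_countries = {'ENG': 'England', 'ITA': 'Italy'}

codes = pd.Series(['ENG', 'ITA', 'TUN'])

# Codes not found in the dictionary are returned as NaN
mapped = codes.map(dict_countries)

# One option is to fall back to the original code where no match was found
filled = mapped.fillna(codes)

print(filled.tolist())
```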

### Filtering values

In [None]:
# Highly rated players only (rating of 84 or over)
df = df.loc[(df['overall_fifa_rating'] >= 84)]

In [None]:
# Bargain players only (rating of 77 or over and valued at £1m or under)
df = df.loc[(df['overall_fifa_rating'] >= 77) & (df['value_gbp'] <= 1_000_000)]


In [None]:
# Keep only the players whose names appear in the full_squad list
df = df[df['player_name'].isin(full_squad)]

### GroupBy and Aggregate

In [None]:
# Create a grouped DataFrame with the mean value for each overall FIFA rating
df = (df
         .groupby(['overall_fifa_rating'])
         .agg({'value_gbp': 'mean'})
         .reset_index()    # reset index to get grouped columns back
     )
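
When more than one statistic is needed, `agg` also accepts named aggregations, which set the output column names at the same time. A sketch on made-up data (the values and the `player_count` name are illustrative):

```python
import pandas as pd

df_toy = pd.DataFrame({
    'overall_fifa_rating': [84, 84, 77],
    'value_gbp': [10_000_000, 20_000_000, 1_000_000],
})

# new_column_name=(source_column, aggregation)
df_grouped = (df_toy
              .groupby('overall_fifa_rating')
              .agg(average_value_gbp=('value_gbp', 'mean'),
                   player_count=('value_gbp', 'size'))
              .reset_index()
              )

print(df_grouped)
```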

### Concatenate DataFrames
If you're more familiar with SQL, you might know this as a UNION of the two tables.

DataFrames can be stacked on top of each other using [`concat`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html).

In [None]:
# Union the individual datasets
df = pd.concat([df_1, df_2])
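
By default `concat` keeps each DataFrame's original index, so the combined result can contain repeated index labels. A sketch on two made-up DataFrames showing the `ignore_index` parameter, which renumbers the rows:

```python
import pandas as pd

df_1 = pd.DataFrame({'player': ['Player A'], 'Season': ['17/18']})
df_2 = pd.DataFrame({'player': ['Player B'], 'Season': ['18/19']})

# Original index labels are kept, so 0 appears twice
stacked = pd.concat([df_1, df_2])

# ignore_index=True builds a fresh 0..n-1 index
renumbered = pd.concat([df_1, df_2], ignore_index=True)

print(stacked.index.tolist(), renumbered.index.tolist())
```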

### Merge DataFrames

In [None]:
# Join Teams DataFrame that adds the 'league_name' and 'league_country' columns
df = pd.merge(df_1, df_2, left_on='squad', right_on='team_name', how='left')
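
With a left join, rows from the left DataFrame that find no match are kept and the new columns are filled with `NaN`. A sketch on made-up data (the team and league names are illustrative) using the `indicator` parameter to flag which rows matched:

```python
import pandas as pd

df_players = pd.DataFrame({
    'player': ['Player A', 'Player B'],
    'squad': ['Team X', 'Team Z'],
})
df_teams = pd.DataFrame({
    'team_name': ['Team X', 'Team Y'],
    'league_name': ['League 1', 'League 2'],
})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
merged = pd.merge(df_players, df_teams,
                  left_on='squad', right_on='team_name',
                  how='left', indicator=True)

print(merged)
```

Checking for `left_only` rows after a join is a quick way to spot team names that are spelled differently in the two datasets.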

### Print Statements

In [None]:
# Calculated Stats for the Value squad
squad_name = 'Best Value'
total_players = 18
total_value_millions = round(((df_value_squad['value_gbp'].sum())) / 1_000_000, 2)
average_value_million = round(((df_value_squad['value_gbp'].mean()) / 1_000_000), 2) 
average_rating = int(round(df_value_squad['overall_fifa_rating'].mean(), 0))
average_rating_average_price = round(((df_fifa_tm_joined_grouped[df_fifa_tm_joined_grouped['overall_fifa_rating'] == average_rating ]['average_value_gbp'].item()) / 1_000_000), 2)
average_rating_average_squad_price = round((average_rating_average_price * total_players), 2)
squad_saving = round((average_rating_average_squad_price - total_value_millions), 2)

In [None]:
# Print out the stats for this squad
print(f"• The total cost of the {total_players} man {squad_name} squad is £{total_value_millions}mil.\n"
      f"• The average player value in this squad is £{average_value_million}mil and the average player rating is {average_rating}.\n"
      f"• The average price of players of that rating is: £{average_rating_average_price}mil which would cost £{average_rating_average_squad_price}mil for an {total_players} man squad.\n"
      f"• This equates to a total saving of £{squad_saving}mil.")

---

## 6. Export Final Dataset

In [None]:
# index=False leaves the row index out of the exported file
df.to_csv(os.path.join(dataDir, 'export', 'data_fifa_cleaned.csv'), index=False, header=True)
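
A quick way to check an export is to read the file back in. The sketch below uses a made-up DataFrame and a temporary directory, so it does not depend on `dataDir` (the file name is illustrative):

```python
import os
import tempfile

import pandas as pd

df_toy = pd.DataFrame({'player': ['Player A', 'Player B'], 'age': [21, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'data_example.csv')

    # Write without the index, then read the file straight back
    df_toy.to_csv(file_path, index=False)
    df_check = pd.read_csv(file_path)

print(df_check)
```

If the index had been written, the file read back would contain an extra `Unnamed: 0` column like the one visible in the outputs earlier in this notebook.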

---

## 7. Conclusion

This notebook explores how to use the [pandas](http://pandas.pydata.org/) library for data ingestion and manipulation.

We have also demonstrated an array of techniques in Python using the following pandas methods and attributes:
*    [head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html),
*    [tail()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html),
*    [shape](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html),
*    [columns](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.columns.html),
*    [dtypes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html),
*    [info()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html),
*    [describe()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html), and
*    [to_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html), used to export DataFrames as CSV files.

---

## 8. Bibliography

*    https://pandas.pydata.org/ 
*    [Kaggle Learn’s module for pandas](https://www.kaggle.com/learn/pandas). The estimated time of completion is four hours.
*    [pandas Tutorial playlist](https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy) by [codebasics](https://www.youtube.com/channel/UCh9nVJoWXmFb7sLApWGcLPQ)
*    [The official pandas documentation](https://pandas.pydata.org/pandas-docs/stable/). Their cheat sheet, available to download as a PDF, is an excellent reference.
*    https://aeturrell.github.io/coding-for-economists/data-analysis-quickstart.html
*    This blog entry is inspired by, and structured as a written accompaniment to, [Daniel Chen’s PyData DC 2018](https://twitter.com/chendaniel) presentation. Searching his name on YouTube will bring up a series of great presentations he’s done at Python conferences that I highly recommend, including [this 3 hour presentation at PyCon2019](https://www.youtube.com/watch?v=3qDhDXNRgHE), which goes into a little more detail than covered here.

---

***Visit my website [eddwebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***

[Back to the top](#top)