![Premier League Logo](https://wl-portfolio.s3.eu-west-2.amazonaws.com/images/premier_league_logo.png)

In the English Premier League, people often talk about the "Big Six". Let's use Pandas to find the top six finishing teams in each Premier League season from 2006-2007 to 2017-2018.

In [1]:
# Check Python version for compatibility/reference
import sys
print(sys.executable)
print(sys.version)
print(sys.version_info)

/Applications/JupyterLab.app/Contents/Resources/jlab_server/bin/python
3.8.12 | packaged by conda-forge | (default, Sep 16 2021, 01:59:00) 
[Clang 11.1.0 ]
sys.version_info(major=3, minor=8, micro=12, releaselevel='final', serial=0)


In [2]:
# Import requisite data science libraries
import pandas as pd

In [3]:
# Check the package version for compatibility/reference
print(f"Pandas version: \t{pd.__version__}")

Pandas version: 	1.3.3


In [4]:
# Display floating-point values as whole numbers with thousands separators
# (this changes the display only, not the underlying data)
pd.options.display.float_format = '{:,.0f}'.format

In [5]:
# Read the file into a Pandas dataframe
df = pd.read_csv('../data_files/prem_stats.csv', index_col=None)

In [6]:
df.head(3)

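| Unnamed: 0 | team | wins | losses | goals | total_yel_card | total_red_card | total_scoring_att | ontarget_scoring_att | hit_woodwork | att_hd_goal | ... | total_cross | corner_taken | touches | big_chance_missed | clearance_off_line | dispossessed | penalty_save | total_high_claim | punches | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manchester United | 28 | 5 | 83 | 60 | 1 | 698 | 256 | 21 | 12 | ... | 918 | 258 | 25686 | | 1 | | 2 | 37 | 25 | 2006-2007 |
| 1 | Chelsea | 24 | 3 | 64 | 62 | 4 | 636 | 216 | 14 | 16 | ... | 897 | 231 | 24010 | | 2 | | 1 | 74 | 22 | 2006-2007 |
| 2 | Liverpool | 20 | 10 | 57 | 44 | 0 | 668 | 214 | 15 | 8 | ... | 1107 | 282 | 24150 | | 1 | | 0 | 51 | 27 | 2006-2007 |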
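
The `Unnamed: 0` column in the preview above is what Pandas typically produces when a CSV was written with its index included (for example by `DataFrame.to_csv` with its default `index=True`). A minimal sketch, using a small in-memory CSV as a stand-in for `prem_stats.csv` (the rows are copied from the preview and reduced to a few columns), showing that `index_col=0` loads that column as the row index rather than as data:

```python
import io

import pandas as pd

# Illustrative stand-in for the start of prem_stats.csv: a leading column with
# a blank header holding the saved index, followed by a few of the real columns.
SAMPLE_CSV = """,team,wins,losses,goals,season
0,Manchester United,28,5,83,2006-2007
1,Chelsea,24,3,64,2006-2007
2,Liverpool,20,10,57,2006-2007
"""

# Default behaviour: the blank header becomes a data column named "Unnamed: 0".
with_extra_col = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(with_extra_col.columns.tolist())

# index_col=0 uses the first column as the row index instead.
indexed = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=0)
print(indexed.columns.tolist())
```

Because the notebook keeps only a handful of columns with `filter` a few cells later, the extra column is dropped either way; `index_col=0` simply avoids loading it in the first place.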


In [7]:
# Draws are not included in the dataset, so this constant is used to calculate them.
# Every season covered here (2006-2007 to 2017-2018) had 38 games per team; the
# Premier League has used a 38-game season since 1995-1996 (earlier seasons had 42).
GAMES_PER_SEASON = 38

In [8]:
# Keep only the columns needed for the tables
df = df.filter(['team', 'wins', 'losses', 'goals', 'season'])

In [9]:
# Insert a calculated field for the number of draws
df.insert(2, 'draws', GAMES_PER_SEASON - df['wins'] - df['losses'])
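
The draws calculation relies on every team playing exactly `GAMES_PER_SEASON` matches, so draws = 38 - wins - losses (for Manchester United in 2006-2007, 38 - 28 - 5 = 5). A minimal sketch of the same vectorised arithmetic on a tiny hand-built frame (values taken from the preview above), with an added assertion that would flag any row where wins plus losses exceed 38, which would mean the games-per-season assumption does not hold for that row:

```python
import pandas as pd

GAMES_PER_SEASON = 38

# Three rows from the 2006-2007 preview, reduced to the columns used here.
df = pd.DataFrame({
    "team": ["Manchester United", "Chelsea", "Liverpool"],
    "wins": [28, 24, 20],
    "losses": [5, 3, 10],
})

# Every match is a win, a draw or a loss, so draws are whatever remains.
df.insert(2, "draws", GAMES_PER_SEASON - df["wins"] - df["losses"])

# Sanity check: a negative value would mean the 38-game assumption is wrong.
assert (df["draws"] >= 0).all(), "wins + losses exceed GAMES_PER_SEASON"

print(df)
```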

In [10]:
# Change all of the column headers to title case
df.columns = df.columns.str.title()

In [11]:
# Number the rows within each season to give a position column
# (this assumes the CSV lists each season's teams in finishing order)
df['Pos'] = df.groupby('Season').cumcount() + 1
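
Numbering rows with `cumcount` only gives the real finishing position if the CSV already lists each season's teams in league order, and that may not hold here. In the 2006-2007 output, Bolton Wanderers (16 wins and 8 draws, so 56 points) is listed sixth, whereas in the official final table Everton finished sixth with 58 points from 15 wins and 13 draws, which suggests the rows may be ordered by wins rather than points. A minimal sketch that ranks by points instead (3 for a win, 1 for a draw), assuming the title-cased column names used in this notebook. The sample takes Tottenham and Bolton from the output above and Everton from the official 2006-2007 table (Everton's row is not shown in this excerpt). Goals scored is used as a stand-in tiebreaker because the filtered data has no goal difference column, so tied teams may not match the official order:

```python
import pandas as pd

# Three 2006-2007 teams in the notebook's column layout, listed by wins.
df = pd.DataFrame({
    "Team": ["Tottenham Hotspur", "Bolton Wanderers", "Everton"],
    "Wins": [17, 16, 15],
    "Draws": [9, 8, 13],
    "Losses": [12, 14, 10],
    "Goals": [57, 47, 52],
    "Season": ["2006-2007"] * 3,
})

# Premier League points: 3 per win, 1 per draw.
df["Points"] = 3 * df["Wins"] + df["Draws"]

# Sort within each season by points, then by goals scored as an approximate tiebreaker.
df = df.sort_values(["Season", "Points", "Goals"], ascending=[True, False, False])

# Position is now the row order within each season after sorting.
df["Pos"] = df.groupby("Season").cumcount() + 1

print(df[["Pos", "Team", "Points", "Season"]])
```

Since this sample holds only three teams, the positions are relative to the sample (1 to 3) rather than the full league table.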

In [12]:
# Organise the dataframe in the order required for presentation
df = df[['Pos', 'Team', 'Wins', 'Draws', 'Losses', 'Goals', 'Season']]

In [13]:
# Group the rows by season so each season can be handled separately
seasons = df.groupby('Season')

In [14]:
# Take the first six rows (the top 6 finishing teams) of each season
seasons_tables = [table.head(6) for _, table in seasons]
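
Since the question is which clubs make up the "Big Six", a natural follow-up is to count how many seasons each team finishes in the top six. A minimal sketch, assuming a frame shaped like the notebook's `df`: the top six of each season are the teams from the output above, and `Placeholder FC` is a hypothetical seventh-placed row added only so there is something to exclude. `groupby(...).head(6)` keeps the first six rows of every season in a single DataFrame, which `value_counts` then tallies. Like the notebook, it relies on rows already being in finishing order within each season:

```python
import pandas as pd

# Top six from the 2006-2007 and 2007-2008 outputs, plus a hypothetical seventh row per season.
df = pd.DataFrame({
    "Team": [
        "Manchester United", "Chelsea", "Liverpool", "Arsenal",
        "Tottenham Hotspur", "Bolton Wanderers", "Placeholder FC",
        "Manchester United", "Chelsea", "Arsenal", "Liverpool",
        "Everton", "Aston Villa", "Placeholder FC",
    ],
    "Season": ["2006-2007"] * 7 + ["2007-2008"] * 7,
})

# First six rows of each season, combined into one DataFrame (original index kept).
top_six = df.groupby("Season").head(6)

# Number of seasons in which each team appears in the top six.
appearances = top_six["Team"].value_counts()
print(appearances)
```

Run against the full twelve seasons, the same two lines would show which clubs appear most often in the top six.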

In [15]:
seasons_tables

[   Pos               Team  Wins  Draws  Losses  Goals     Season
 0    1  Manchester United    28      5       5     83  2006-2007
 1    2            Chelsea    24     11       3     64  2006-2007
 2    3          Liverpool    20      8      10     57  2006-2007
 3    4            Arsenal    19     11       8     63  2006-2007
 4    5  Tottenham Hotspur    17      9      12     57  2006-2007
 5    6   Bolton Wanderers    16      8      14     47  2006-2007,
     Pos               Team  Wins  Draws  Losses  Goals     Season
 20    1  Manchester United    27      6       5     80  2007-2008
 21    2            Chelsea    25     10       3     65  2007-2008
 22    3            Arsenal    24     11       3     74  2007-2008
 23    4          Liverpool    21     13       4     67  2007-2008
 24    5            Everton    19      8      11     55  2007-2008
 25    6        Aston Villa    16     12      10     71  2007-2008,
     Pos               Team  Wins  Draws  Losses  Goals     Season
