# Premier League Goals Accumulation - Bar Chart Race
By ***Ahmad Zaenun Faiz***

This notebook uses Python to visualize how goals accumulate for each club in the top tier of English football, the Premier League, over the 2019/20 season. I use the Matplotlib, Pandas, and NumPy libraries to make this visualization.

### References
* Bar Chart Race tutorial: https://www.dunderdata.com/blog/create-a-bar-chart-race-animation-in-python-with-matplotlib
* Data source: James P. Curley (2016). engsoccerdata: English Soccer Data 1871-2016. R package version 0.1.5. https://github.com/jalapic/engsoccerdata
* Python modules used: Pandas, Matplotlib, Bar Chart Race

In [1]:
!pip install bar-chart-race

import pandas as pd
import bar_chart_race as bcr



## Data Wrangling

In [2]:
fc = pd.read_csv('https://raw.githubusercontent.com/ahmadzfaiz/python-data-visual/main/data/1.%20English%20Football%20Match/england.csv')

fc



|  | Date | Season | home | visitor | hgoal | vgoal | division | tier | totgoal | goaldif | result |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1888-09-08 | 1888 | Bolton Wanderers | Derby County | 3 | 6 | 1 | 1 | 9 | -3 | A |
| 1 | 1888-09-08 | 1888 | Everton | Accrington F.C. | 2 | 1 | 1 | 1 | 3 | 1 | H |
| 2 | 1888-09-08 | 1888 | Preston North End | Burnley | 5 | 2 | 1 | 1 | 7 | 3 | H |
| 3 | 1888-09-08 | 1888 | Stoke City | West Bromwich Albion | 0 | 2 | 1 | 1 | 2 | -2 | A |
| 4 | 1888-09-08 | 1888 | Wolverhampton Wanderers | Aston Villa | 1 | 1 | 1 | 1 | 2 | 0 | D |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 199879 | 3/7/2020 | 2019 | Plymouth Argyle | Macclesfield | 3 | 0 | 4 | 4 | 3 | 3 | H |
| 199880 | 3/7/2020 | 2019 | Salford City | Bradford City | 2 | 0 | 4 | 4 | 2 | 2 | H |
| 199881 | 3/7/2020 | 2019 | Swindon Town | Forest Green Rovers | 0 | 2 | 4 | 4 | 2 | -2 | A |
| 199882 | 3/7/2020 | 2019 | Walsall | Exeter City | 3 | 1 | 4 | 4 | 4 | 2 | H |
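Besides the raw goal columns, the frame carries derived columns. Judging only from the rows shown above (the dataset documentation is not quoted here), `totgoal` looks like `hgoal + vgoal`, `goaldif` like `hgoal - vgoal`, and `result` like H/A/D from the sign of that difference. A small sketch that checks this assumption on the first five rows:

```python
import numpy as np
import pandas as pd

# First five rows of the raw output above (a subset of the columns)
sample = pd.DataFrame({
    'hgoal': [3, 2, 5, 0, 1],
    'vgoal': [6, 1, 2, 2, 1],
    'totgoal': [9, 3, 7, 2, 2],
    'goaldif': [-3, 1, 3, -2, 0],
    'result': ['A', 'H', 'H', 'A', 'D'],
})

# Assumed relationships, inferred from these rows only
total = sample['hgoal'] + sample['vgoal']
diff = sample['hgoal'] - sample['vgoal']
# H = home win, A = away win, D = draw, chosen by the sign of the difference
result = np.select([diff > 0, diff < 0], ['H', 'A'], default='D')

print((total == sample['totgoal']).all())
print((diff == sample['goaldif']).all())
print(result.tolist() == sample['result'].tolist())
```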


In [3]:
home = fc[['Date', 'Season', 'home', 'hgoal', 'tier']]
away = fc[['Date', 'Season', 'visitor', 'vgoal', 'tier']]

home = home.loc[(home['tier'] == 1) & (home['Season'] == 2019)]
away = away.loc[(away['tier'] == 1) & (away['Season'] == 2019)]

home
# home['home'].count()

|  | Date | Season | home | hgoal | tier |
|---|---|---|---|---|---|
| 198112 | 8/9/2019 | 2019 | Liverpool | 4 | 1 |
| 198113 | 8/10/2019 | 2019 | West Ham United | 0 | 1 |
| 198114 | 8/10/2019 | 2019 | AFC Bournemouth | 1 | 1 |
| 198115 | 8/10/2019 | 2019 | Burnley | 3 | 1 |
| 198116 | 8/10/2019 | 2019 | Crystal Palace | 0 | 1 |
| ... | ... | ... | ... | ... | ... |
| 198487 | 7/26/2020 | 2019 | Leicester City | 0 | 1 |
| 198488 | 7/26/2020 | 2019 | Manchester City | 5 | 1 |
| 198489 | 7/26/2020 | 2019 | Newcastle United | 1 | 1 |
| 198490 | 7/26/2020 | 2019 | Southampton | 3 | 1 |
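The two `.loc` filters keep only first-tier (`tier == 1`) matches from `Season == 2019`, i.e. the 2019/20 Premier League. Below is a minimal sketch of the same boolean-mask pattern on a hypothetical toy frame. It selects rows and columns in one `.loc` call and takes an explicit `.copy()`, so that adding a column afterwards does not trigger pandas' `SettingWithCopyWarning`:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the raw frame
fc = pd.DataFrame({
    'Date': ['8/9/2019', '8/10/2019', '8/10/2019', '8/11/2018'],
    'Season': [2019, 2019, 2019, 2018],
    'home': ['Liverpool', 'Burnley', 'Stoke City', 'Arsenal'],
    'hgoal': [4, 3, 1, 2],
    'tier': [1, 1, 2, 1],
})

# Combine conditions with & (each one in parentheses), then select
# rows and columns in a single .loc call; .copy() makes an independent frame
mask = (fc['tier'] == 1) & (fc['Season'] == 2019)
home = fc.loc[mask, ['Date', 'Season', 'home', 'hgoal', 'tier']].copy()

home['Club'] = home['home']  # safe: home is not a view of fc
print(home['Club'].tolist())  # ['Liverpool', 'Burnley']
```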


In [4]:
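# Give both frames the same column names (Club, Goal) so they can be stacked.
# rename returns a new frame, which avoids the SettingWithCopyWarning that
# assigning new columns to a filtered slice can trigger.
home = home.rename(columns={'home': 'Club', 'hgoal': 'Goal'})[['Date', 'Club', 'Goal']]
away = away.rename(columns={'visitor': 'Club', 'vgoal': 'Goal'})[['Date', 'Club', 'Goal']]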

In [5]:
frames = [home, away]
dcc = pd.concat(frames)

dcc

|  | Date | Club | Goal |
|---|---|---|---|
| 198112 | 8/9/2019 | Liverpool | 4 |
| 198113 | 8/10/2019 | West Ham United | 0 |
| 198114 | 8/10/2019 | AFC Bournemouth | 1 |
| 198115 | 8/10/2019 | Burnley | 3 |
| 198116 | 8/10/2019 | Crystal Palace | 0 |
| ... | ... | ... | ... |
| 198487 | 7/26/2020 | Manchester United | 2 |
| 198488 | 7/26/2020 | Norwich City | 0 |
| 198489 | 7/26/2020 | Liverpool | 3 |
| 198490 | 7/26/2020 | Sheffield United | 1 |
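Stacking the home and away frames turns each match into two rows, one per club. A cheap way to verify a reshape like this is to compare row counts and goal totals before and after. Here is a sketch on two hypothetical matches, using `rename` to align the column names:

```python
import pandas as pd

# Two hypothetical matches in the raw home/visitor layout
matches = pd.DataFrame({
    'Date': ['8/9/2019', '8/10/2019'],
    'home': ['Liverpool', 'Burnley'],
    'visitor': ['Norwich City', 'Southampton'],
    'hgoal': [4, 3],
    'vgoal': [1, 0],
})

home = matches.rename(columns={'home': 'Club', 'hgoal': 'Goal'})[['Date', 'Club', 'Goal']]
away = matches.rename(columns={'visitor': 'Club', 'vgoal': 'Goal'})[['Date', 'Club', 'Goal']]
dcc = pd.concat([home, away], ignore_index=True)

# Every match contributes exactly two rows (home club and away club) ...
assert len(dcc) == 2 * len(matches)
# ... and no goals are lost or double-counted
assert dcc['Goal'].sum() == (matches['hgoal'] + matches['vgoal']).sum()
print(dcc)
```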


In [6]:
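# Wide table: one row per match date, one column per club. A club plays at
# most once on a given date, so aggfunc='first' simply picks that score;
# dates on which a club did not play become NaN.
ctb = pd.crosstab(dcc['Date'], columns=dcc['Club'], values=dcc['Goal'], aggfunc='first')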
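`pd.crosstab` with `values` and `aggfunc` pivots the long (Date, Club, Goal) table into a wide one. A toy sketch with hypothetical rows shows two side effects that the next cells deal with: clubs without a match on a date get NaN, and the string dates in the index are sorted as text rather than chronologically:

```python
import pandas as pd

# Hypothetical long-format rows: one row per club per match day
dcc = pd.DataFrame({
    'Date': ['8/9/2019', '8/9/2019', '8/17/2019', '8/17/2019', '8/18/2019'],
    'Club': ['Liverpool', 'Norwich City', 'Liverpool', 'Norwich City', 'Chelsea'],
    'Goal': [4, 1, 2, 3, 1],
})

ctb = pd.crosstab(dcc['Date'], columns=dcc['Club'], values=dcc['Goal'], aggfunc='first')

print(ctb)
# The index is sorted as strings, so '8/9/2019' ends up last
print(ctb.index.tolist())  # ['8/17/2019', '8/18/2019', '8/9/2019']
```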

In [7]:
# The crosstab index holds the match dates as 'M/D/YYYY' strings, which
# sort as text ('8/17/2019' before '8/9/2019'). Parse them into real
# datetimes, keep them in a 'Date' column and sort chronologically.
dates = pd.to_datetime(ctb.index)
ctb = ctb.reset_index(drop=True)
ctb['Date'] = dates
ctb = ctb.sort_values('Date').reset_index(drop=True)

In [8]:
ctb

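| Club | AFC Bournemouth | Arsenal | Aston Villa | Brighton & Hove Albion | Burnley | Chelsea | Crystal Palace | Everton | Leicester City | Liverpool | ... | Manchester United | Newcastle United | Norwich City | Sheffield United | Southampton | Tottenham Hotspur | Watford | West Ham United | Wolverhampton Wanderers | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | | | | | | | | 4.0 | ... | | | 1.0 | | | | | | | 2019-08-09 |
| 1 | 1.0 | | 1.0 | 3.0 | 3.0 | | 0.0 | 0.0 | | | ... | | | | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | | 2019-08-10 |
| 2 | | 1.0 | | | | 0.0 | | | 0.0 | | ... | 4.0 | 0.0 | | | | | | | 0.0 | 2019-08-11 |
| 3 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | | | 1.0 | | 2.0 | ... | | 1.0 | 3.0 | | 1.0 | 2.0 | 0.0 | 1.0 | | 2019-08-17 |
| 4 | | | | | | 1.0 | 0.0 | | 1.0 | | ... | | | | 1.0 | | | | | | 2019-08-18 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 110 | 0.0 | | | | | | | | 0.0 | | ... | | | | | 2.0 | 3.0 | | | | 2020-07-19 |
| 111 | | | | 0.0 | | | 0.0 | 1.0 | | | ... | | 0.0 | | 0.0 | | | | | 2.0 | 2020-07-20 |
| 112 | | 0.0 | 1.0 | | | | | | | | ... | | | | | | | 0.0 | | | 2020-07-21 |
| 113 | | | | | | 3.0 | | | | 5.0 | ... | 1.0 | | | | | | | 1.0 | | 2020-07-22 |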
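Sorting the raw `M/D/YYYY` strings compares them character by character, so dates such as `'1/1/2020'` and `'12/26/2019'` come before August 2019. The date-conversion cell above parses the strings into datetimes before sorting. A minimal sketch of the difference on a few hypothetical dates from this season (the explicit `format` is an assumption that every string follows month/day/year):

```python
import pandas as pd

# Date strings sorted as text, the way the crosstab index is ordered
raw = sorted(['8/9/2019', '8/17/2019', '12/26/2019', '1/1/2020'])
print(raw)  # ['1/1/2020', '12/26/2019', '8/17/2019', '8/9/2019']

# Parse explicitly as month/day/year, then sort chronologically
parsed = pd.to_datetime(pd.Series(raw), format='%m/%d/%Y').sort_values()
print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['2019-08-09', '2019-08-17', '2019-12-26', '2020-01-01']
```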


In [9]:
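# Days on which a club did not play are NaN: count them as 0 goals, index the
# table by date and take a running total, so each row holds the goals scored
# so far. (Chaining also avoids modifying ctb in place through an alias.)
df = ctb.fillna(0).set_index('Date').cumsum()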

df

| Date | AFC Bournemouth | Arsenal | Aston Villa | Brighton & Hove Albion | Burnley | Chelsea | Crystal Palace | Everton | Leicester City | Liverpool | Manchester City | Manchester United | Newcastle United | Norwich City | Sheffield United | Southampton | Tottenham Hotspur | Watford | West Ham United | Wolverhampton Wanderers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-08-09 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2019-08-10 | 1.0 | 0.0 | 1.0 | 3.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 5.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 2019-08-11 | 1.0 | 1.0 | 1.0 | 3.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 5.0 | 4.0 | 0.0 | 1.0 | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 2019-08-17 | 3.0 | 3.0 | 2.0 | 4.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 6.0 | 7.0 | 4.0 | 1.0 | 4.0 | 1.0 | 1.0 | 5.0 | 0.0 | 1.0 | 0.0 |
| 2019-08-18 | 3.0 | 3.0 | 2.0 | 4.0 | 4.0 | 1.0 | 0.0 | 1.0 | 1.0 | 6.0 | 7.0 | 4.0 | 1.0 | 4.0 | 2.0 | 1.0 | 5.0 | 0.0 | 1.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-07-19 | 37.0 | 53.0 | 39.0 | 37.0 | 42.0 | 64.0 | 30.0 | 42.0 | 67.0 | 77.0 | 93.0 | 63.0 | 37.0 | 26.0 | 38.0 | 48.0 | 60.0 | 34.0 | 47.0 | 49.0 |
| 2020-07-20 | 37.0 | 53.0 | 39.0 | 37.0 | 42.0 | 64.0 | 30.0 | 43.0 | 67.0 | 77.0 | 93.0 | 63.0 | 37.0 | 26.0 | 38.0 | 48.0 | 60.0 | 34.0 | 47.0 | 51.0 |
| 2020-07-21 | 37.0 | 53.0 | 40.0 | 37.0 | 42.0 | 64.0 | 30.0 | 43.0 | 67.0 | 77.0 | 97.0 | 63.0 | 37.0 | 26.0 | 38.0 | 48.0 | 60.0 | 34.0 | 47.0 | 51.0 |
| 2020-07-22 | 37.0 | 53.0 | 40.0 | 37.0 | 42.0 | 67.0 | 30.0 | 43.0 | 67.0 | 82.0 | 97.0 | 64.0 | 37.0 | 26.0 | 38.0 | 48.0 | 60.0 | 34.0 | 48.0 | 51.0 |
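Each cell of the cumulative table is a club's running goal total up to and including that date. The sketch below reproduces the first five Liverpool and Norwich City values from the per-date goals in the crosstab output. It also shows why the NaNs are filled first: `cumsum` on its own skips NaN when summing but leaves those positions as NaN, which would show up as gaps in the race.

```python
import numpy as np
import pandas as pd

# Per-date goals for two clubs, taken from the first five dates above
# (NaN = the club did not play that day)
goals = pd.DataFrame(
    {'Liverpool': [4, np.nan, np.nan, 2, np.nan],
     'Norwich City': [1, np.nan, np.nan, 3, np.nan]},
    index=pd.to_datetime(['2019-08-09', '2019-08-10', '2019-08-11',
                          '2019-08-17', '2019-08-18']),
)

# Fill the gaps with 0, then take a running total down each column
running = goals.fillna(0).cumsum()
print(running)

# Without fillna the totals are still correct, but the NaN rows stay NaN
print(goals.cumsum()['Liverpool'].tolist())
```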


## Bar Chart Race

In [None]:
def summary(values, ranks):
    # Credit line drawn on every frame; bar_chart_race passes the current
    # period's values and ranks, which this function does not use
    text = '©2022 Ahmad Zaenun Faiz | Data source: James P. Curley (2016). engsoccerdata: English Soccer Data 1871-2016'
    return {'x': .99, 'y': .02, 's': text, 'ha': 'right', 'size': 4}

chart = bcr.bar_chart_race(
    df=df,
    # n_bars=6,
    fixed_max=True,
    steps_per_period=20,
    cmap='Set1',
    filter_column_colors=True,
    title='Goal Accumulation of Each Football Club in the Premier League, Season 19/20',
    title_size='smaller',
    period_fmt='%d %B %Y',
    period_summary_func=summary,
)

chart
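As I understand the bar_chart_race documentation, `period_summary_func` is called once per period with that period's `values` and `ranks` as pandas Series. It must return a dict with at least the keys `x`, `y` and `s`, which are passed on to Matplotlib's `text`. The `summary` function above ignores both arguments. As a sketch of using them, here is a hypothetical `summary_with_total` that reports the league-wide goal total. It can be called by hand with a Series, so it runs without rendering the animation:

```python
import pandas as pd

def summary_with_total(values, ranks):
    """Hypothetical variant of `summary` that reports the total number of
    goals at the current period. `values` holds each club's cumulative
    goals at that frame; `ranks` is not used here."""
    total = values.sum()
    text = f'Total goals: {total:.0f}'
    # x, y and s are the minimum keys; the rest are extra text options
    return {'x': .99, 'y': .02, 's': text, 'ha': 'right', 'size': 8}

# Called by hand with three of the 2020-07-22 totals shown above
values = pd.Series({'Liverpool': 82.0, 'Manchester City': 97.0, 'Chelsea': 67.0})
print(summary_with_total(values, values.rank(ascending=False))['s'])  # Total goals: 246
```

To try it in the animation, pass `period_summary_func=summary_with_total` instead of `summary`.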