
ANALYZING FPL DATA

This project accesses Premier League data through the Fantasy Premier League (FPL) API. The inspiration was my passion for making data-driven decisions, especially when picking an FPL team. To do so, I need data on the players as well as on how the teams have been performing in the current season. Shoutout to James Leslie for his Analytics Vidhya article that made this possible.

Most of the data returned by the API URLs is in JSON format, so we will use the requests and json Python libraries.
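As a small illustration of what JSON parsing produces, the sketch below decodes a hand-written string with the standard json module. The payload is made up for this example and only mimics the shape of an API response. The .json() method of a requests response returns the same kind of Python dicts and lists.

```python
import json

# Hand-written text that mimics the shape of an API response.
# The values are illustrative, not real FPL data.
sample_text = '{"total_players": 3, "teams": [{"id": 1, "name": "Arsenal"}]}'

# json.loads turns JSON text into plain Python dicts and lists
parsed = json.loads(sample_text)

print(type(parsed).__name__)       # dict
print(parsed['teams'][0]['name'])  # Arsenal
```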

In [1]:
#import libraries
import requests, json
from pprint import pprint

In [2]:
# base url for all FPL API endpoints
base_url = 'https://fantasy.premierleague.com/api/'

# get data from bootstrap-static endpoint
r = requests.get(base_url+'bootstrap-static/').json()

# show the top level fields
pprint(r, indent=2, depth=1, compact=True)

{ 'element_stats': [...],
  'element_types': [...],
  'elements': [...],
  'events': [...],
  'game_settings': {...},
  'phases': [...],
  'teams': [...],
  'total_players': 8611897}
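The request above assumes the call succeeds. A slightly more defensive variant, sketched here, adds a timeout and raises on HTTP errors before decoding. The helper name fetch_json and its session parameter are hypothetical, introduced only for this example.

```python
import requests

BASE_URL = 'https://fantasy.premierleague.com/api/'


def fetch_json(endpoint, session=None, timeout=10):
    '''GET an FPL endpoint and return the decoded JSON (hypothetical helper)'''
    # use the module-level requests API unless a session is supplied
    http = requests if session is None else session
    response = http.get(BASE_URL + endpoint, timeout=timeout)
    # raise requests.HTTPError on 4xx/5xx instead of decoding an error page
    response.raise_for_status()
    return response.json()
```

With this in place, the call in the cell above could be written as r = fetch_json('bootstrap-static/').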


The 'elements' field contains the player data.

In [3]:
# get player data from 'elements' field
players = r['elements']

# show data for first player
pprint(players[0])

{'assists': 0,
 'bonus': 0,
 'bps': 48,
 'chance_of_playing_next_round': None,
 'chance_of_playing_this_round': None,
 'clean_sheets': 0,
 'code': 80201,
 'corners_and_indirect_freekicks_order': None,
 'corners_and_indirect_freekicks_text': '',
 'cost_change_event': -1,
 'cost_change_event_fall': 1,
 'cost_change_start': -4,
 'cost_change_start_fall': 4,
 'creativity': '0.0',
 'creativity_rank': 564,
 'creativity_rank_type': 59,
 'direct_freekicks_order': None,
 'direct_freekicks_text': '',
 'dreamteam_count': 0,
 'element_type': 1,
 'ep_next': '-0.5',
 'ep_this': '1.0',
 'event_points': 0,
 'first_name': 'Bernd',
 'form': '0.0',
 'goals_conceded': 9,
 'goals_scored': 0,
 'ict_index': '7.9',
 'ict_index_rank': 335,
 'ict_index_rank_type': 23,
 'id': 1,
 'in_dreamteam': False,
 'influence': '79.0',
 'influence_rank': 256,
 'influence_rank_type': 23,
 'minutes': 270,
 'news': '',
 'news_added': None,
 'now_cost': 46,
 'own_goals': 0,
 'penalties_missed': 0,
 'penalties_order': None,
 'pe

We can now put the data into a more readable format. The pandas library will be helpful in doing that.

In [4]:
import pandas as pd
pd.set_option('display.max_columns', None)

In [5]:
# create players dataframe
players = pd.json_normalize(r['elements'])

# show some information about first five players
players[['id', 'web_name', 'team', 'element_type']].head()

Unnamed: 0,id,web_name,team,element_type
0,1,Leno,1,1
1,2,Rúnarsson,1,1
2,3,Willian,1,3
3,4,Aubameyang,1,4
4,5,Cédric,1,2
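Several fields in the raw player record, such as 'form' and 'ict_index', arrive as strings ('0.0', '7.9'). A minimal sketch of converting them before doing arithmetic follows. It uses a two-row frame whose first row copies values shown for player 1 and whose second row is made up.

```python
import pandas as pd

# first row copies values shown for player 1; second row is illustrative
sample = pd.DataFrame({
    'id': [1, 2],
    'form': ['0.0', '1.5'],
    'ict_index': ['7.9', '3.2'],
})

# the API delivers these as strings, so convert before sorting or summing
numeric_cols = ['form', 'ict_index']
sample[numeric_cols] = sample[numeric_cols].apply(pd.to_numeric)

print(sample.dtypes)
```

The same pattern should apply to the full players dataframe, with numeric_cols extended to whichever string-valued columns are needed.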


In [6]:
# create teams dataframe
teams = pd.json_normalize(r['teams'])

teams.head()

Unnamed: 0,code,draw,form,id,loss,name,played,points,position,short_name,strength,team_division,unavailable,win,strength_overall_home,strength_overall_away,strength_attack_home,strength_attack_away,strength_defence_home,strength_defence_away,pulse_id
0,3,0,,1,0,Arsenal,0,0,0,ARS,4,,False,0,1190,1220,1110,1140,1100,1170,1
1,7,0,,2,0,Aston Villa,0,0,0,AVL,3,,False,0,1130,1180,1110,1120,1130,1160,2
2,94,0,,3,0,Brentford,0,0,0,BRE,3,,False,0,1080,1100,1130,1160,1100,1150,130
3,36,0,,4,0,Brighton,0,0,0,BHA,3,,False,0,1140,1180,1160,1190,1090,1130,131
4,90,0,,5,0,Burnley,0,0,0,BUR,2,,False,0,1050,1060,1040,1070,1060,1100,43


In [7]:
# get position information from 'element_types' field
positions = pd.json_normalize(r['element_types'])

positions.head()

Unnamed: 0,id,plural_name,plural_name_short,singular_name,singular_name_short,squad_select,squad_min_play,squad_max_play,ui_shirt_specific,sub_positions_locked,element_count
0,1,Goalkeepers,GKP,Goalkeeper,GKP,2,1,1,True,[12],72
1,2,Defenders,DEF,Defender,DEF,5,3,5,False,[],211
2,3,Midfielders,MID,Midfielder,MID,5,2,5,False,[],255
3,4,Forwards,FWD,Forward,FWD,3,1,3,False,[],86


In [8]:
# join players to teams
df = pd.merge(
    left=players,
    right=teams,
    left_on='team',
    right_on='id'
)

# show joined result
df[['first_name', 'second_name', 'name']].head()

Unnamed: 0,first_name,second_name,name
0,Bernd,Leno,Arsenal
1,Rúnar Alex,Rúnarsson,Arsenal
2,Willian,Borges Da Silva,Arsenal
3,Pierre-Emerick,Aubameyang,Arsenal
4,Cédric,Soares,Arsenal
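Both players and teams carry an 'id' column (along with others, such as 'code' and 'form'), so the merge above disambiguates them with the default pandas suffixes _x and _y. The sketch below, on small frames built from rows shown in the outputs above, names the suffixes explicitly and asks pandas to check that each team id appears only once on the right-hand side. The frame names players_small and teams_small are introduced for this example.

```python
import pandas as pd

# small frames built from rows shown in the outputs above
players_small = pd.DataFrame({
    'id': [1, 4],
    'web_name': ['Leno', 'Aubameyang'],
    'team': [1, 1],
})
teams_small = pd.DataFrame({
    'id': [1, 2],
    'name': ['Arsenal', 'Aston Villa'],
})

joined = pd.merge(
    left=players_small,
    right=teams_small,
    left_on='team',
    right_on='id',
    suffixes=('_player', '_team'),  # name the two 'id' columns explicitly
    validate='many_to_one'          # raise if a team id is duplicated
)

print(joined.columns.tolist())
# ['id_player', 'web_name', 'team', 'id_team', 'name']
```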


In [9]:
# join player positions
df = df.merge(
    positions,
    left_on='element_type',
    right_on='id'
)

# rename columns
df = df.rename(
    columns={'name':'team_name', 'singular_name':'position_name'}
)

# show result
df[
    ['first_name', 'second_name', 'team_name', 'position_name']
].head()

Unnamed: 0,first_name,second_name,team_name,position_name
0,Bernd,Leno,Arsenal,Goalkeeper
1,Rúnar Alex,Rúnarsson,Arsenal,Goalkeeper
2,Karl,Hein,Arsenal,Goalkeeper
3,Aaron,Ramsdale,Arsenal,Goalkeeper
4,Arthur,Okonkwo,Arsenal,Goalkeeper


**Player Gameweek History**

We can get this from https://fantasy.premierleague.com/api/event/{GID}/ or
https://fantasy.premierleague.com/api/element-summary/{PID}/, where {PID} is a player's id.
Because we already have the list of players, we will use the second URL to get the history of every player.
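The {PID} placeholder is filled in with a player's id. A minimal sketch of building that URL with an f-string (the helper name element_summary_url is hypothetical):

```python
BASE_URL = 'https://fantasy.premierleague.com/api/'


def element_summary_url(player_id):
    '''build the element-summary URL for one player (hypothetical helper)'''
    return f'{BASE_URL}element-summary/{player_id}/'


print(element_summary_url(4))
# https://fantasy.premierleague.com/api/element-summary/4/
```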

In [10]:
# get data from 'element-summary/{PID}/' endpoint for PID=4
r = requests.get(base_url + 'element-summary/4/').json()

# show top-level fields for player summary
pprint(r, depth=1)

{'fixtures': [...], 'history': [...], 'history_past': [...]}


In [11]:
# show data for first gameweek
pprint(r['history'][0])

{'assists': 0,
 'bonus': 0,
 'bps': 0,
 'clean_sheets': 0,
 'creativity': '0.0',
 'element': 4,
 'fixture': 1,
 'goals_conceded': 0,
 'goals_scored': 0,
 'ict_index': '0.0',
 'influence': '0.0',
 'kickoff_time': '2021-08-13T19:00:00Z',
 'minutes': 0,
 'opponent_team': 3,
 'own_goals': 0,
 'penalties_missed': 0,
 'penalties_saved': 0,
 'red_cards': 0,
 'round': 1,
 'saves': 0,
 'selected': 200068,
 'team_a_score': 0,
 'team_h_score': 2,
 'threat': '0.0',
 'total_points': 0,
 'transfers_balance': 0,
 'transfers_in': 0,
 'transfers_out': 0,
 'value': 100,
 'was_home': False,
 'yellow_cards': 0}


To get information on how a player performed in each gameweek, we define the get_gameweek_history() function.

In [12]:
def get_gameweek_history(player_id):
    '''get all gameweek info for a given player_id'''
    
    # send GET request to
    # https://fantasy.premierleague.com/api/element-summary/{PID}/
    r = requests.get(
            base_url + 'element-summary/' + str(player_id) + '/'
    ).json()
    
    # extract 'history' data from response into dataframe
    df = pd.json_normalize(r['history'])
    
    return df


# show player #4's gameweek history
get_gameweek_history(4)[
    [
        'round',
        'total_points',
        'minutes',
        'goals_scored',
        'assists'
    ]
].head()

Unnamed: 0,round,total_points,minutes,goals_scored,assists
0,1,0,0,0,0
1,2,1,29,0,0
2,3,1,58,0,0
3,4,9,90,1,0
4,5,2,90,0,0


Player ID 4 is Aubameyang. From this dataframe, we can see he did not play in gameweek 1 (0 minutes) and came through gameweeks 2 and 3 without a goal or an assist. Across the first five gameweeks he scored one goal and registered no assists.
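The same reading can be checked in code. A minimal sketch, rebuilding the five rows above by hand, drops the gameweeks with zero minutes and totals the rest:

```python
import pandas as pd

# the five rows shown in the output above
history = pd.DataFrame({
    'round': [1, 2, 3, 4, 5],
    'total_points': [0, 1, 1, 9, 2],
    'minutes': [0, 29, 58, 90, 90],
    'goals_scored': [0, 0, 0, 1, 0],
    'assists': [0, 0, 0, 0, 0],
})

# keep only gameweeks in which the player was on the pitch
played = history[history['minutes'] > 0]

# total the counting stats over the gameweeks actually played
summary = played[
    ['total_points', 'minutes', 'goals_scored', 'assists']
].sum()

print(len(played))  # 4
print(summary)
```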

In [13]:
#getting information on past season
def get_season_history(player_id):
    '''get all past season info for a given player_id'''
    
    # send GET request to
    # https://fantasy.premierleague.com/api/element-summary/{PID}/
    r = requests.get(
            base_url + 'element-summary/' + str(player_id) + '/'
    ).json()
    
    # extract 'history_past' data from response into dataframe
    df = pd.json_normalize(r['history_past'])
    
    return df


# show player #1's season history
get_season_history(1)[
    [
        'season_name',
        'total_points',
        'minutes',
        'goals_scored',
        'assists'
    ]
].head(10)

Unnamed: 0,season_name,total_points,minutes,goals_scored,assists
0,2018/19,106,2835,0,0
1,2019/20,114,2649,0,0
2,2020/21,131,3131,0,0


We see that no goals or assists were registered in the past three seasons. This is because Leno (player ID 1) is a goalkeeper, and goalkeepers rarely score or assist.
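get_gameweek_history() and get_season_history() request the same endpoint, so a single response could serve both. The sketch below splits one already-fetched payload into the two dataframes. The helper name summary_to_frames and the hand-written payload are assumptions made for this example. The season row copies the 2020/21 line above.

```python
import pandas as pd


def summary_to_frames(payload):
    '''split one element-summary payload into two dataframes
    (hypothetical helper)'''
    # .get with a default keeps this working if a field is absent
    gameweeks = pd.json_normalize(payload.get('history', []))
    seasons = pd.json_normalize(payload.get('history_past', []))
    return gameweeks, seasons


# hand-written payload: the empty 'history' list stands in for a
# player with no gameweeks recorded yet
payload = {
    'history': [],
    'history_past': [
        {'season_name': '2020/21', 'total_points': 131, 'minutes': 3131}
    ],
}

gameweeks, seasons = summary_to_frames(payload)
print(gameweeks.empty)            # True
print(seasons.loc[0, 'minutes'])  # 3131
```

Note that an empty list yields a dataframe with no columns, so selecting columns such as 'season_name' from it would raise a KeyError.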

We can go further by building a table of the points each player has earned in every game they have played this season.

In [14]:
# select columns of interest from players df
players = players[
    ['id', 'first_name', 'second_name', 'web_name', 'team',
     'element_type']
]

# join team name
players = players.merge(
    teams[['id', 'name']],
    left_on='team',
    right_on='id',
    suffixes=['_player', None]
).drop(['team', 'id'], axis=1)

# join player positions
players = players.merge(
    positions[['id', 'singular_name_short']],
    left_on='element_type',
    right_on='id'
).drop(['element_type', 'id'], axis=1)

players.head()

Unnamed: 0,id_player,first_name,second_name,web_name,name,singular_name_short
0,1,Bernd,Leno,Leno,Arsenal,GKP
1,2,Rúnar Alex,Rúnarsson,Rúnarsson,Arsenal,GKP
2,532,Karl,Hein,Hein,Arsenal,GKP
3,559,Aaron,Ramsdale,Ramsdale,Arsenal,GKP
4,572,Arthur,Okonkwo,Okonkwo,Arsenal,GKP


In [15]:
from tqdm.auto import tqdm
tqdm.pandas()

In [16]:
# get gameweek histories for each player
points = players['id_player'].progress_apply(get_gameweek_history)

# combine results into single dataframe
points = pd.concat(df for df in points)

# join web_name
points = players[['id_player', 'web_name']].merge(
    points,
    left_on='id_player',
    right_on='element'
)

  0%|          | 0/624 [00:00<?, ?it/s]
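The step above issues one request per player (624 in this run). A plain-loop sketch of the same idea is shown here, with an optional pause between calls and a fresh index on the combined result. The helper collect_histories and the stand-in fake_history are hypothetical. In the notebook, get_gameweek_history would be passed as fetch.

```python
import time

import pandas as pd


def collect_histories(player_ids, fetch, pause=0.0):
    '''call fetch(player_id) for each id and stack the results
    (hypothetical helper)'''
    frames = []
    for player_id in player_ids:
        frames.append(fetch(player_id))
        # optional pause between requests to go easy on the server
        time.sleep(pause)
    # ignore_index gives the combined frame a clean 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# stand-in for get_gameweek_history so the sketch runs offline
def fake_history(player_id):
    return pd.DataFrame(
        {'element': [player_id], 'total_points': [player_id * 2]}
    )


points_small = collect_histories([1, 4], fake_history)
print(points_small)
```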

In [17]:
# get top scoring players
points.groupby(
    ['element', 'web_name']
).agg(
    {'total_points':'sum', 'goals_scored':'sum', 'assists':'sum'}
).reset_index(
).sort_values(
    'total_points', ascending=False
).head()

Unnamed: 0,element,web_name,total_points,goals_scored,assists
232,233,Salah,117,10,8
255,256,Cancelo,67,0,4
236,237,Alexander-Arnold,64,1,4
141,142,James,63,4,3
143,144,Gallagher,62,4,4
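Total points favour players who have featured in every game. One possible refinement, sketched here on made-up gameweek rows (only the ids and names are taken from the table above), is to also count the games in which a player was on the pitch and compute points per game with named aggregation:

```python
import pandas as pd

# illustrative gameweek rows, not real FPL data
points_small = pd.DataFrame({
    'element': [233, 233, 256, 256],
    'web_name': ['Salah', 'Salah', 'Cancelo', 'Cancelo'],
    'total_points': [12, 9, 6, 2],
    'minutes': [90, 90, 90, 45],
})

per_player = points_small.groupby(
    ['element', 'web_name'], as_index=False
).agg(
    total_points=('total_points', 'sum'),
    # count gameweeks with at least one minute played
    games_played=('minutes', lambda m: (m > 0).sum())
)

# points per game actually played
per_player['points_per_game'] = (
    per_player['total_points'] / per_player['games_played']
)

print(per_player.sort_values('points_per_game', ascending=False))
```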
