### Explanation:

* To get the matches of a specific season, we need to do the following:
1. Get the season code (1 request).
2. Get the basic details of all matches, i.e. the fixtures (1 request for all matches).
3. Get the lineups and more information about each match (1 request per match).
4. Get the player ratings of each match (1 request per match).
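
The request budget implied by these steps is simple arithmetic. A minimal sketch, assuming exactly one lineup call and one ratings call per match as listed above; the function name is hypothetical, and 240 is the match count that `create_df` loops over later in this notebook:

```python
def requests_needed(n_matches):
    """Total API calls for one season under the four-step plan."""
    season_lookup = 1      # step 1: find the season code
    fixtures = 1           # step 2: one call returns every fixture
    lineups = n_matches    # step 3: one call per match
    ratings = n_matches    # step 4: one call per match
    return season_lookup + fixtures + lineups + ratings


print(requests_needed(240))  # 482
```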

Things to note:
1. We must query the fixtures corrected to a specified timezone.
2. We must find a way to update matches that take place while we are working on the project.
3. Everything must be automated eventually.
4. Some entries, such as the referee, are not updated right after the match; they might be registered as None until the API updates its data.
5. Some matches are not updated right away.
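
The first note relies on the API applying the timezone for us. Should the correction ever need to happen locally, it might look like the sketch below. It assumes Riyadh sits at a fixed UTC+3 offset with no daylight saving; the helper name and the sample timestamp are illustrative and not taken from the API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Riyadh is a fixed UTC+3 offset all year round.
RIYADH = timezone(timedelta(hours=3), name="Asia/Riyadh")


def to_riyadh(iso_timestamp):
    """Convert an ISO 8601 timestamp with an explicit offset to Riyadh time."""
    return datetime.fromisoformat(iso_timestamp).astimezone(RIYADH)


# Illustrative kick-off time given in UTC.
local = to_riyadh("2020-10-17T15:00:00+00:00")
print(local.isoformat())  # 2020-10-17T18:00:00+03:00
```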

In [1]:
import requests
import pprint
import re

import pandas as pd
import numpy as np
import time

In [2]:
# I change it after every commit
apiKey = "084ef0ab41msh714bcce162e79d1p1b729ajsn45e76174f3f1"
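
Rotating the key after every commit can be avoided if the key never enters the notebook. A minimal sketch, assuming the key is exported in an environment variable; the variable name `RAPIDAPI_KEY` and the helper are hypothetical:

```python
import os


def load_api_key(env_var="RAPIDAPI_KEY"):
    """Read the API key from the environment instead of the notebook source."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With that in place, the cell above could become `apiKey = load_api_key()`.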

In [3]:
season2018 = pd.read_csv("Season2018.csv")
season2019 = pd.read_csv("Season2019.csv")
season2020 = pd.read_csv("Season2020.csv")


Getting the codes for the 2018, 2019, and 2020 seasons

In [9]:
#The response contains information of all leagues in a specific country
url = "https://api-football-v1.p.rapidapi.com/v2/leagues/country/SA"

headers = {
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com",
    'x-rapidapi-key': apiKey
    }

response = requests.request("GET", url, headers=headers)

#I'll extract the codes manually

In [10]:
pprint.pprint(response.json())

{'api': {'leagues': [{'country': 'Saudi-Arabia',
                      'country_code': 'SA',
                      'coverage': {'fixtures': {'events': True,
                                                'lineups': True,
                                                'players_statistics': True,
                                                'statistics': True},
                                   'odds': False,
                                   'players': True,
                                   'predictions': True,
                                   'standings': True,
                                   'topScorers': True},
                      'flag': 'https://media.api-sports.io/flags/sa.svg',
                      'is_current': 0,
                      'league_id': 419,
                      'logo': 'https://media.api-sports.io/football/leagues/307.png',
                      'name': 'Pro League',
                      'season': 2018,
                      'season_end': '2019-05

In [4]:
season2018code = 419
season2019code = 901
season2020code = 2984
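
The codes were read off the printed response by hand. They could also be pulled out programmatically; the sketch below assumes every entry in `api.leagues` carries the `name`, `season` and `league_id` keys visible in the output above. The helper name is hypothetical and the sample payload is trimmed and partly illustrative.

```python
def season_codes(payload, league_name):
    """Map season year -> league_id for one league in a leagues response."""
    leagues = payload["api"]["leagues"]
    return {
        league["season"]: league["league_id"]
        for league in leagues
        if league["name"] == league_name
    }


# Trimmed sample shaped like the response above. Only the 2018 entry is
# copied from the real output; the other rows are illustrative.
sample = {"api": {"leagues": [
    {"name": "Pro League", "season": 2018, "league_id": 419},
    {"name": "Pro League", "season": 2019, "league_id": 901},
    {"name": "Some Cup", "season": 2019, "league_id": 1},
]}}

print(season_codes(sample, "Pro League"))  # {2018: 419, 2019: 901}
```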


Getting the code for Riyadh's timezone so that the date variable is relative

In [7]:
url = "https://api-football-v1.p.rapidapi.com/v2/timezone"

headers = {
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com",
    'x-rapidapi-key': apiKey
    }

response = requests.request("GET", url, headers=headers)

pprint.pprint(response.json())



{'message': 'You are not subscribed to this API.'}
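
An error reply like the one above has no `api` key, so any later `["api"]` lookup would fail with a bare `KeyError`. A small guard, sketched here under the assumption that error replies carry only a `message` field as seen above (the helper name is hypothetical), surfaces the service's own message instead:

```python
def unwrap_api(payload):
    """Return payload["api"], or raise if the service sent an error message."""
    if "api" not in payload:
        # Error replies seen so far carry only a "message" field.
        raise RuntimeError(payload.get("message", "Unexpected response shape"))
    return payload["api"]
```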


Requesting the match data corrected to our timezone

In [5]:
def get_fixtures(seasonCode):
    """Returns all season fixtures as a JSON-like object.

    Args:
    seasonCode: int or String. The code specified by the
                    api for querying the details of a specific season.


    Returns:
    fixtures: the parsed JSON (a list of dicts) containing all fixtures of the desired season.
    """
    url = f"https://api-football-v1.p.rapidapi.com/v2/fixtures/league/{seasonCode}?timezone=Asia/Riyadh"

    headers = {
        'x-rapidapi-host': "api-football-v1.p.rapidapi.com",
        'x-rapidapi-key': apiKey
        }

    fixtures = requests.request("GET", url, headers=headers)

    return fixtures.json()["api"]["fixtures"]
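
Every request in this notebook repeats the same host and headers. One possible way to centralise them is sketched below; `build_request` and `API_HOST` are hypothetical names, and the function only assembles the URL and headers without sending anything.

```python
from urllib.parse import urlencode

API_HOST = "api-football-v1.p.rapidapi.com"


def build_request(path, api_key, **params):
    """Return (url, headers) for a v2 endpoint such as 'fixtures/league/419'."""
    url = f"https://{API_HOST}/v2/{path}"
    if params:
        # safe="/" keeps values like Asia/Riyadh readable in the query string.
        url += "?" + urlencode(params, safe="/")
    headers = {"x-rapidapi-host": API_HOST, "x-rapidapi-key": api_key}
    return url, headers


url, headers = build_request("fixtures/league/2984", "my-key", timezone="Asia/Riyadh")
print(url)
```

The returned pair can be passed straight to `requests.get(url, headers=headers)`.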

In [6]:
def create_df(seasonCode, year):
    """Creates a dataframe with the basic details of each match.

    Args:
    seasonCode:  int or String. The code specified by the
                    api for querying the details of a specific season.
    
    year: String. The season the dataframe is created for (e.g. 2020/2021)

    Returns:
    
    season_df: Pandas Dataframe consisting of the 'basic' details of all matches.
    
    **** Note: for creating a dataframe for a full completed season this might****
    """
    
    matches = get_fixtures(seasonCode)
    
    season_df = pd.DataFrame(columns=['fixture_id', 'home_team', 'away_team', 'home_goals',
                                      'away_goals', 'venue', 'referee',
                                      'round', 'date', 'season','status',
                                     "home_coach", "home_formation", "home_players",
                                    "away_coach", "away_formation", "away_players",
                                     "H_avg_ratings", "A_avg_ratings"])
    
    # Iterate over every fixture returned instead of assuming exactly 240 matches
    for i, match in enumerate(matches):
        season_df = addMatchInfo(season_df, year, i, match)
        season_df = addMatchLineups(season_df, i)
        season_df = addPlayerRatings(season_df, i)
            
    return season_df
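
Because some matches are not updated right away, a single missing lineup or rating can abort the whole loop in `create_df`. The sketch below shows one way to keep going and record the failures. `enrich_all` is a hypothetical helper, the list of caught exceptions is a guess at what incomplete data would raise, and the optional pause is only there in case the API plan enforces a rate limit.

```python
import time


def enrich_all(indices, steps, pause_seconds=0.0):
    """Run every step for every index; collect failures instead of stopping."""
    failed = []
    for i in indices:
        try:
            for step in steps:
                step(i)
        except (KeyError, IndexError, TypeError, ZeroDivisionError) as err:
            failed.append((i, repr(err)))
        time.sleep(pause_seconds)  # optional pause between matches
    return failed


# Demonstration with a fake step that fails for one match.
seen = []


def fake_step(i):
    if i == 2:
        raise KeyError("lineUps")
    seen.append(i)


failed = enrich_all(range(4), [fake_step])
print(seen, failed)
```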

In [20]:
season2020.columns

Index(['fixture_id', 'home_team', 'away_team', 'home_goals', 'away_goals',
       'venue', 'referee', 'round', 'date', 'season', 'status', 'home_coach',
       'home_formation', 'home_players', 'away_coach', 'away_formation',
       'away_players', 'H_avg_ratings', 'A_avg_ratings'],
      dtype='object')

In [8]:
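def addMatchInfo(season_df, year, match_index, match_info):
    """Writes the basic details of one fixture into row `match_index`.

    Args:
    season_df: Pandas Dataframe that holds one row per match.
    year: String. The season label (e.g. 2020/2021).
    match_index: index of the row to fill.
    match_info: dict. One fixture entry as returned by get_fixtures.

    Returns:
    season_df: the same dataframe with the row filled in.
    """

    fixture_id = match_info["fixture_id"]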
    home_team = match_info["homeTeam"]["team_name"]
    away_team = match_info["awayTeam"]["team_name"]
    referee = match_info["referee"]
    nRound = match_info["round"].split("-")[1].replace(" ", "")
    date = match_info["event_date"]
    venue = match_info["venue"]
    home_goals = match_info["goalsHomeTeam"]
    away_goals = match_info["goalsAwayTeam"]
    status = match_info["status"]

    season_df.loc[match_index, ["fixture_id", "home_team",
                               "away_team", "home_goals",
                               "away_goals", "venue",
                               "referee", "round",
                               "date", "season", "status"]] = [fixture_id, home_team,
                                                               away_team, home_goals,
                                                               away_goals, venue,
                                                               referee, nRound,
                                                               date, year, status]

    return season_df
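
The round number is taken from the text after the first hyphen in the `round` field. A short worked example of that parsing, assuming labels of the form "Regular Season - 7" (the raw fixture payload is not shown in this notebook, so the label format is an assumption, and `round_number` is a hypothetical helper):

```python
def round_number(round_label):
    """Mirror the parsing in addMatchInfo: keep the text after the first '-'."""
    return round_label.split("-")[1].replace(" ", "")


# Assumed label format; note that the result is a string, not an int.
print(round_number("Regular Season - 7"))  # 7
```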

In [9]:
def addMatchLineups(season_df, match_index):
    """Adds all the extra information of finished matches to the dataframe.

    Args:
    season_df: Pandas Dataframe consisting of the 'basic' details of all matches.
    
    match_index: index of the match to be updated


    Returns:
    season_df: Pandas Dataframe. The same dataframe provided but with added 
                                features to the finished matches
    """
    

    match = season_df.loc[match_index]
    
    url = f"https://api-football-v1.p.rapidapi.com/v2/lineups/{match['fixture_id']}"

    headers = {
        'x-rapidapi-host': "api-football-v1.p.rapidapi.com",
        'x-rapidapi-key': apiKey
        }

    response = requests.request("GET", url, headers=headers).json()

    #home team data

    home_coach = response["api"]["lineUps"][match["home_team"]]["coach"]
    home_formation = response["api"]["lineUps"][match["home_team"]]["formation"]

    home_players = []
    for player in response["api"]["lineUps"][match["home_team"]]["startXI"]:
        home_players.append((player["player"], player["pos"]))


    #away team data                                             
    away_coach = response["api"]["lineUps"][match["away_team"]]["coach"]
    away_formation = response["api"]["lineUps"][match["away_team"]]["formation"]

    away_players = []
    for player in response["api"]["lineUps"][match["away_team"]]["startXI"]:
        away_players.append((player["player"], player["pos"]))                                            

    season_df.loc[match_index, ["home_coach", "home_formation",
                  "home_players", "away_coach",
                  "away_formation", "away_players"]] = [home_coach, home_formation, 
                                                       home_players, away_coach,
                                                       away_formation, away_players]
    
    return season_df
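
The lineup response is keyed by team name, which is why the function looks up `match["home_team"]` and `match["away_team"]`. The parsing on its own can be sketched as below, using the `coach`, `formation`, `startXI`, `player` and `pos` keys from the code above. `parse_lineup` is a hypothetical helper and the sample payload is made up for illustration.

```python
def parse_lineup(line_ups, team_name):
    """Return (coach, formation, [(player, pos), ...]) for one team."""
    team = line_ups[team_name]
    starters = [(p["player"], p["pos"]) for p in team["startXI"]]
    return team.get("coach"), team.get("formation"), starters


# Illustrative payload; team and player names are made up.
sample = {
    "Team A": {
        "coach": None,  # the coach can be missing in the API data
        "formation": "4-3-3",
        "startXI": [{"player": "Player One", "pos": "G"},
                    {"player": "Player Two", "pos": "D"}],
    }
}
print(parse_lineup(sample, "Team A"))
```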

In [10]:
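def addPlayerRatings(season_df, match_index):
    """Adds the average player rating of each team to the match row.

    Args:
    season_df: Pandas Dataframe that holds one row per match.
    match_index: index of the match to be updated.

    Returns:
    season_df: the same dataframe with 'H_avg_ratings' and 'A_avg_ratings' filled in.
    """

    fixture_id = season_df.loc[match_index, "fixture_id"]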
    home_team = season_df.loc[match_index, "home_team"]
    
    url = f"https://api-football-v1.p.rapidapi.com/v2/players/fixture/{fixture_id}"

    headers = {
        'x-rapidapi-host': "api-football-v1.p.rapidapi.com",
        'x-rapidapi-key': apiKey
        }

    response = requests.request("GET", url, headers=headers)

    players = response.json()["api"]["players"]

    Hteam_ratings = []
    Ateam_ratings = []
    
    for player in players:

        try:  # In case the rating is null or not numeric, it can't be cast to float
            if player["team_name"] == home_team:
                Hteam_ratings.append(float(player["rating"]))
            else:
                Ateam_ratings.append(float(player["rating"]))
        except (TypeError, ValueError):
            pass

    Hteam_avg_ratings = round(sum(Hteam_ratings) / len(Hteam_ratings), 2)
    Ateam_avg_ratings = round(sum(Ateam_ratings) / len(Ateam_ratings), 2)
        
    season_df.loc[match_index, ["H_avg_ratings", "A_avg_ratings"]] = [Hteam_avg_ratings, Ateam_avg_ratings]
    
    return season_df
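
The averaging above divides by the number of usable ratings, which would raise `ZeroDivisionError` if a team had none. A standalone sketch of the same calculation with that case handled follows; `average_rating` is a hypothetical helper and the sample ratings are illustrative.

```python
def average_rating(players, team_name):
    """Mean rating for one team, or None when no usable rating exists."""
    ratings = []
    for player in players:
        if player["team_name"] != team_name:
            continue
        try:
            ratings.append(float(player["rating"]))
        except (TypeError, ValueError):
            continue  # rating was null or not numeric
    return round(sum(ratings) / len(ratings), 2) if ratings else None


# Illustrative data: ratings arrive as strings and may be null.
players = [
    {"team_name": "Team A", "rating": "7.0"},
    {"team_name": "Team A", "rating": "6.5"},
    {"team_name": "Team A", "rating": None},
    {"team_name": "Team B", "rating": None},
]
print(average_rating(players, "Team A"))  # 6.75
print(average_rating(players, "Team B"))  # None
```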


In [11]:
def update_match(season_df, fixture):
    
    match_index = season_df.loc[season_df["fixture_id"] == fixture["fixture_id"]].index[0]
    
    season_df = addMatchInfo(season_df, season_df.season.unique()[0], match_index, fixture)

    season_df = addMatchLineups(season_df, match_index)
    
    season_df = addPlayerRatings(season_df, match_index)
    
    return season_df

In [12]:
# The following code was used to extract matches that happened while I was working on the project
def update_df(season_df, season_code):

    not_started_matches = list(season_df.query("status == 'Not Started'").fixture_id)

    all_matches = get_fixtures(season_code)
    
    fixtures = [match for match in all_matches if match["fixture_id"] in not_started_matches]

    for fixture in fixtures:
        if fixture["status"] == "Match Finished":
            season_df = update_match(season_df, fixture)
        
    return season_df
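
The selection logic in `update_df` can be exercised without touching the API. The sketch below isolates it as a hypothetical helper, `fixtures_to_update`, which keeps only fixtures that were stored as 'Not Started' and are now reported as 'Match Finished'. The fixture ids in the sample are made up.

```python
def fixtures_to_update(pending_ids, latest_fixtures):
    """Fixtures stored as 'Not Started' that the API now reports as finished."""
    pending = set(pending_ids)  # a set makes each membership check cheap
    return [
        fixture for fixture in latest_fixtures
        if fixture["fixture_id"] in pending
        and fixture["status"] == "Match Finished"
    ]


# Illustrative ids and statuses.
latest = [
    {"fixture_id": 1, "status": "Match Finished"},  # already stored as finished
    {"fixture_id": 2, "status": "Match Finished"},  # finished since the last run
    {"fixture_id": 3, "status": "Not Started"},     # still pending
]
print(fixtures_to_update([2, 3], latest))
```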


-----

In [14]:
season2020 = update_df(season2020, season2020code)

In [15]:
season2020.loc[season2020["status"] == "Not Started"]

Unnamed: 0,fixture_id,home_team,away_team,home_goals,away_goals,venue,referee,round,date,season,status,home_coach,home_formation,home_players,away_coach,away_formation,away_players,H_avg_ratings,A_avg_ratings


In [16]:
season2020.loc[season2020["H_avg_ratings"].isnull() & (season2020["status"] == "Match Finished")]

Unnamed: 0,fixture_id,home_team,away_team,home_goals,away_goals,venue,referee,round,date,season,status,home_coach,home_formation,home_players,away_coach,away_formation,away_players,H_avg_ratings,A_avg_ratings


In [18]:
season2020.isnull().sum()

fixture_id         0
home_team          0
away_team          0
home_goals         0
away_goals         0
venue              0
referee           23
round              0
date               0
season             0
status             0
home_coach        42
home_formation     0
home_players       0
away_coach        40
away_formation     0
away_players       0
H_avg_ratings      0
A_avg_ratings      0
dtype: int64