
 # Serie A 2021-2022 Fixtures 
 
 #### Sepideh Nazemi

In this notebook, I extract the Serie A fixtures and provide two functions: one looks up the next game of a team chosen by the user, and the other looks up the next game(s) in the league overall.

In [1]:
import pandas as pd
import numpy as np
import requests 
from bs4 import BeautifulSoup
from datetime import date

- The data is scraped from https://understat.com


In [2]:
# Building blocks for the league/season URLs
base_url = 'https://understat.com/league' 
leagues = ['La_liga', 'EPL', 'Bundesliga', 'Serie_A', 'Ligue_1', 'RFPL'] 
seasons = ['2019', '2020','2021']

- Focusing on Serie A 2021-22, I will use the following URL: https://understat.com/league/Serie_A/2021

In [3]:
# Build the URL for Serie A (leagues[3]) in the 2021 season (seasons[2])
url = f'{base_url}/{leagues[3]}/{seasons[2]}'
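
The same lists can be combined into the page address of every league and season. This is a minimal sketch, assuming every league/season pair follows the same `base_url/league/season` pattern as the Serie A URL; `all_urls` is a name introduced here for illustration only.

```python
from itertools import product

base_url = 'https://understat.com/league'
leagues = ['La_liga', 'EPL', 'Bundesliga', 'Serie_A', 'Ligue_1', 'RFPL']
seasons = ['2019', '2020', '2021']

# One URL per (league, season) pair, keyed by that pair for easy lookup.
# Assumption: every pair uses the same URL pattern as Serie A 2021.
all_urls = {
    (league, season): f'{base_url}/{league}/{season}'
    for league, season in product(leagues, seasons)
}

print(len(all_urls))
print(all_urls[('Serie_A', '2021')])
```

Keying the dictionary by `(league, season)` avoids relying on list positions such as `leagues[3]`.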


- Inspecting the structure of the webpage, it seems that the relevant data (i.e., `datesData`) is stored as a JSON variable inside a `script` tag.

In [4]:
res = requests.get(url) 
soup = BeautifulSoup(res.content, "lxml")
scripts = soup.find_all('script')

In [5]:
import json 
string_with_json_obj = '' 
# Find data for matches 
for el in scripts: 
    if 'datesData' in el.text: 
        string_with_json_obj = el.text.strip()
# print(string_with_json_obj)
# Keep only the JSON payload found between "('" and "')"
ind_start = string_with_json_obj.index("('")+2
ind_end = string_with_json_obj.index("')")
json_data = string_with_json_obj[ind_start:ind_end]
# Turn the escape sequences in the payload into the characters they stand for
json_data = json_data.encode('utf8').decode('unicode_escape')
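
The slicing and decoding step can be tried offline on a small string. The sketch below assumes the script embeds its data as an escaped string inside a `JSON.parse('...')` call, which is what slicing on `('` and `')` implies. Both `extract_json` and `sample_script` are made up for illustration and are not taken from the site.

```python
import json


def extract_json(script_text):
    """Hypothetical helper: parse the payload of a JSON.parse('...') call."""
    start = script_text.index("('") + 2
    end = script_text.index("')")
    payload = script_text[start:end]
    # unicode_escape turns hex escape sequences into the characters they encode
    decoded = payload.encode('utf8').decode('unicode_escape')
    return json.loads(decoded)


# Made-up, shortened stand-in for the text of the script tag
sample_script = r"var datesData = JSON.parse('\x5B\x7B\x22id\x22\x3A\x2216756\x22\x7D\x5D');"

print(extract_json(sample_script))
```

Note that `unicode_escape` reads the bytes as Latin-1, so any non-ASCII character that appears literally in the payload may come out garbled.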

In [6]:
data = json.loads(json_data)
#data

- The match data is a list in which each element is a dictionary holding the details of one game.

In [7]:
data[0]

{'id': '16756',
 'isResult': True,
 'h': {'id': '94', 'title': 'Verona', 'short_title': 'VER'},
 'a': {'id': '104', 'title': 'Sassuolo', 'short_title': 'SAS'},
 'goals': {'h': '2', 'a': '3'},
 'xG': {'h': '2.25155', 'a': '1.86679'},
 'datetime': '2021-08-21 16:30:00',
 'forecast': {'w': '0.4794', 'd': '0.2373', 'l': '0.2833'}}
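
All values in a record are strings (apart from `isResult`), so they need converting before any arithmetic or date comparison. A small sketch on the record shown above, using only the standard library; the variable names are chosen here for illustration.

```python
from datetime import datetime

# Record copied from the output above
match = {
    'id': '16756',
    'isResult': True,
    'h': {'id': '94', 'title': 'Verona', 'short_title': 'VER'},
    'a': {'id': '104', 'title': 'Sassuolo', 'short_title': 'SAS'},
    'goals': {'h': '2', 'a': '3'},
    'xG': {'h': '2.25155', 'a': '1.86679'},
    'datetime': '2021-08-21 16:30:00',
    'forecast': {'w': '0.4794', 'd': '0.2373', 'l': '0.2833'},
}

# Parse the kickoff time and the goals into proper Python types
kickoff = datetime.strptime(match['datetime'], '%Y-%m-%d %H:%M:%S')
home_goals = int(match['goals']['h'])
away_goals = int(match['goals']['a'])

summary = f"{match['h']['title']} {home_goals}-{away_goals} {match['a']['title']}"
print(kickoff.date(), summary)  # 2021-08-21 Verona 2-3 Sassuolo
```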

In [8]:
# Reformatting the data list into one list per column
home_team = [item['h']['title'] for item in data]
away_team = [item['a']['title'] for item in data]
home_goal = [item['goals']['h'] for item in data]
away_goal = [item['goals']['a'] for item in data]
match_date = [item['datetime'] for item in data]
status = [item['isResult'] for item in data]


# Creating a DataFrame 
fixtures_df = pd.DataFrame({'Date':match_date, 'Home':home_team, 'Away': away_team, 'Scored Home':home_goal,
                            'Scored Away': away_goal, 'Status': status})
    

In [9]:
fixtures_df.head()

                  Date     Home         Away  Scored Home  Scored Away  Status
0  2021-08-21 16:30:00   Verona     Sassuolo            2            3    True
1  2021-08-21 16:30:00    Inter        Genoa            4            0    True
2  2021-08-21 18:45:00   Torino     Atalanta            1            2    True
3  2021-08-21 18:45:00   Empoli        Lazio            1            3    True
4  2021-08-22 16:30:00  Bologna  Salernitana            3            2    True


In [10]:
fixtures_df.dtypes

Date           object
Home           object
Away           object
Scored Home    object
Scored Away    object
Status           bool
dtype: object

In [11]:
fixtures_df['Date'] = pd.to_datetime(fixtures_df['Date']).dt.date
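
The `dtypes` output also shows the score columns stored as `object` (strings). If numeric scores are needed, a conversion along the following lines could be used. This is a sketch on a hypothetical two-row frame, `sample_df`, and it assumes that the scores of unplayed fixtures are missing (`None`), which the output above does not show.

```python
import pandas as pd

# Hypothetical sample: one played match and one unplayed fixture
sample_df = pd.DataFrame({
    'Home': ['Verona', 'Napoli'],
    'Away': ['Sassuolo', 'Udinese'],
    'Scored Home': ['2', None],
    'Scored Away': ['3', None],
})

for col in ['Scored Home', 'Scored Away']:
    # Strings become numbers; missing scores become <NA> in the nullable Int64 dtype
    sample_df[col] = pd.to_numeric(sample_df[col], errors='coerce').astype('Int64')

print(sample_df.dtypes)
```

The nullable `Int64` dtype keeps played scores as integers while still allowing missing values for games that have not happened yet.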

### Look up the next match for a specific team

In [12]:
def next_game(teamName):
    '''This function receives a team's name and returns the date
    and both teams of its next unplayed game (None if there is none).'''
    is_team = (fixtures_df.Home == teamName) | (fixtures_df.Away == teamName)
    # Rows are in chronological order, so the first unplayed row is the next game
    sliced = fixtures_df[is_team & ~fixtures_df.Status]
    if sliced.empty:
        print('No game is available')
    else:
        return sliced.iloc[0][['Date', 'Home', 'Away']]
    

In [13]:
# teams = ['Verona','Roma','Lazio','Bologna','Juventus','Udinese','Genoa','Sampdoria','Sassuolo','Napoli','Inter','Atalanta','Empoli','Fiorentina','AC Milan','Torino','Cagliari','Spezia','Salernitana','Venezia']
next_game('Napoli')

Date    2022-03-19
Home        Napoli
Away       Udinese
Name: 292, dtype: object

### Look up the next match

In [14]:
def next_games():
    '''This function prints the game(s) scheduled on the next match day after today.
    It has its own name so that it does not overwrite next_game(teamName).'''
    sliced = fixtures_df[~fixtures_df.Status & (fixtures_df.Date > date.today())]
    if sliced.empty:
        print('No game is available')
    else:
        # All games that share the date of the first upcoming fixture
        next_game_list = sliced[sliced['Date'] == sliced.iloc[0]['Date']]
        print(next_game_list[['Date', 'Home', 'Away']])

In [15]:
next_games()

           Date      Home    Away
290  2022-03-18  Sassuolo  Spezia
291  2022-03-18     Genoa  Torino
