# Scrape Soccer Data

## 0 - Introduction

In this notebook we collect data from the [FBRef](https://fbref.com/en/) website.

## 1 - Importing the libraries

To import and manipulate the data we will use the Pandas, Requests, and BeautifulSoup libraries.

In [3]:
import pandas as pd
from bs4 import BeautifulSoup
from bs4 import Comment
import requests
import datetime

## 2 - Data collection

Creating the function that extracts the tables

In [4]:
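from io import StringIO

def extrair_tabela(comments):
    """Return the first table parsed from each HTML comment that contains one."""
    tables = []
    for each in comments:
        if 'table' in each:
            try:
                # Wrap in StringIO: newer pandas versions deprecate passing literal HTML strings
                tables.append(pd.read_html(StringIO(each))[0])
            except Exception:
                # Skip comments that mention "table" but hold no parseable table
                continue

    return tables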
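
To see why the function scans comments rather than the regular page elements, the sketch below applies the same idea to a tiny hand-written HTML string. The markup and the `demo_` names are made up for illustration and are not taken from FBRef, and the example assumes a parser supported by `pd.read_html` (such as `lxml`) is installed.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup, Comment

# Hypothetical page: the table is wrapped in an HTML comment, so a normal
# soup.find("table") does not see it.
demo_html = """
<div id="all_stats">
<!--
<table>
  <thead><tr><th>Rk</th><th>Player</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Abner</td></tr>
    <tr><td>2</td><td>Adson</td></tr>
  </tbody>
</table>
-->
</div>
"""

demo_soup = BeautifulSoup(demo_html, "html.parser")

# Same filter the notebook uses: keep only Comment nodes.
demo_comments = demo_soup.find_all(string=lambda text: isinstance(text, Comment))

# Parse the first table found inside each comment that mentions "table".
demo_tables = [
    pd.read_html(StringIO(str(c)))[0] for c in demo_comments if "table" in c
]

print(demo_soup.find("table"))  # the commented-out table is not a regular element
print(demo_tables[0])
```

The comment text is only parsed as HTML once it is handed to `pd.read_html`, which is the step `extrair_tabela` performs for every comment on the page.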

Getting the current date and saving it in a variable

In [5]:
current_date = datetime.datetime.now().strftime("%Y-%m-%d")

### **Player Goal and Shot Creation**

In [6]:
response = requests.get('https://fbref.com/en/comps/24/gca/Serie-A-Stats')

In [7]:
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        player_goal_and_shot_creation = tables[0]
        
        player_goal_and_shot_creation.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "90s", 
            "SCA", "SCA90", "PassLive", "PassDead", "TO", "Sh", "Fld", "Def", 
            "GCA", "GCA90", "GCA_PassLive", "GCA_PassDead", "GCA_TO", 
            "GCA_Sh", "GCA_Fld", "GCA_Def", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        player_goal_and_shot_creation = player_goal_and_shot_creation[player_goal_and_shot_creation["Rk"] != "Rk"].copy()

        player_goal_and_shot_creation["Curr_Date"] = current_date
        
        player_goal_and_shot_creation.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")
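
The `!= "Rk"` filter in the cell above removes header rows that are repeated inside the table body. A minimal sketch on made-up data (the `demo` frame is hypothetical) shows that filter, and why calling `.copy()` on the filtered result lets a new column be assigned without pandas warning about a slice:

```python
import pandas as pd

# Hypothetical frame with a header row repeated in the body.
demo = pd.DataFrame(
    {"Rk": ["1", "2", "Rk", "3"], "Player": ["A", "B", "Player", "C"]}
)

# Boolean filtering returns a new object that pandas may treat as a slice;
# .copy() makes it an independent DataFrame before columns are added.
clean = demo[demo["Rk"] != "Rk"].copy()
clean["Curr_Date"] = "2024-05-18"
clean = clean.reset_index(drop=True)

print(clean)
```

The original `demo` frame is left untouched, and `clean` carries a fresh 0-based index.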



In [8]:
player_goal_and_shot_creation.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,SCA,SCA90,...,GCA,GCA90,GCA_PassLive,GCA_PassDead,GCA_TO,GCA_Sh,GCA_Fld,GCA_Def,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,0.1,1,11.25,...,0,0.0,0,0,0,0,0,0,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,0.8,0,0.0,...,0,0.0,0,0,0,0,0,0,Matches,2024-05-18
2,3,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,1.2,8,6.92,...,0,0.0,0,0,0,0,0,0,Matches,2024-05-18
3,4,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,0.5,0,0.0,...,0,0.0,0,0,0,0,0,0,Matches,2024-05-18
4,5,Yuri Alberto,br BRA,FW,Corinthians,23-061,2001,2.4,9,3.8,...,0,0.0,0,0,0,0,0,0,Matches,2024-05-18


In [9]:
player_goal_and_shot_creation.to_csv(f"../datasets/{current_date}/player_goal_and_shot_creation.csv", index=False, sep=';')
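
`to_csv` does not create missing folders, so the write above fails if the dated folder under `../datasets/` does not exist yet. One way to guard against that is sketched here, with a hypothetical `ensure_output_dir` helper and a temporary folder standing in for `../datasets`:

```python
import datetime
import tempfile
from pathlib import Path


def ensure_output_dir(base_dir, date_str):
    """Create <base_dir>/<date_str> if needed and return it as a Path."""
    out_dir = Path(base_dir) / date_str
    # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


current_date = datetime.datetime.now().strftime("%Y-%m-%d")

# A temporary folder is used only so the sketch runs anywhere;
# in the notebook the base would be "../datasets".
with tempfile.TemporaryDirectory() as tmp:
    out_dir = ensure_output_dir(tmp, current_date)
    created = out_dir.is_dir()
    csv_path = out_dir / "player_goal_and_shot_creation.csv"
    print(created, csv_path.name)
```

Calling the helper once, right after `current_date` is set, would cover every `to_csv` call in the notebook.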

### **Standard stats**

In [10]:
response_standard_stats = requests.get('https://fbref.com/en/comps/24/stats/Serie-A-Stats')
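
This notebook sends one request per stats page, back to back. Many sites throttle or block rapid automated requests, so a pause between calls may be worth adding. The sketch below is one hypothetical way to do it: `fetch_all`, `delay_seconds`, and the injected `fetch` and `sleep` callables are assumptions for illustration, and the 5-second default is arbitrary rather than a documented limit.

```python
import time


def fetch_all(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive calls."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first.
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results


# Stand-ins are used so the sketch runs offline; in the notebook,
# fetch would be requests.get and sleep would stay as time.sleep.
pauses = []
pages = fetch_all(
    [
        "https://fbref.com/en/comps/24/gca/Serie-A-Stats",
        "https://fbref.com/en/comps/24/stats/Serie-A-Stats",
    ],
    fetch=lambda url: f"<html>{url}</html>",
    sleep=pauses.append,
)
print(len(pages), pauses)
```

Passing `fetch` and `sleep` as arguments keeps the pacing logic testable without touching the network.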

In [11]:
if response_standard_stats.status_code == 200:
    soup = BeautifulSoup(response_standard_stats.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        standard_stats = tables[0]
        
        standard_stats.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "MP", "Starts", "Min", 
            "90s", "Gls", "Ast", "G+A", "G-PK", "PK", "PKatt", "CrdY", "CrdR", "xG", 
            "npxG", "xAG", "npxG+xAG", "PrgC", "PrgP", "PrgR", "Gls_2", "Ast_2", "G+A_2", 
            "G-PK_2", "G+A-PK", "xG_2", "xAG_2", "xG+xAG_2", "npxG_2", "npxG+xAG_2", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        standard_stats = standard_stats[standard_stats["Rk"] != "Rk"].copy()

        standard_stats["Curr_Date"] = current_date
        
        standard_stats.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")
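
When header rows are mixed into the table body as text, pandas may read whole columns as strings, so numeric columns such as `Gls` or `xG` can keep an `object` dtype even after those rows are dropped. Whether that happens here depends on the parsed page. If it does, a conversion along these lines is one option (the `demo` frame and the column selection are illustrative):

```python
import pandas as pd

# Hypothetical slice of a stats table where every value was parsed as text.
demo = pd.DataFrame(
    {"Player": ["Abner", "Adson"], "Gls": ["0", "1"], "xG": ["0.0", "0.16"]}
)

numeric_cols = ["Gls", "xG"]

# errors="coerce" turns unparseable values into NaN instead of raising.
demo[numeric_cols] = demo[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(demo.dtypes)
```

Checking `standard_stats.dtypes` before writing the CSV is a quick way to tell whether such a step is needed.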



In [12]:
standard_stats.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,MP,Starts,Min,...,G+A_2,G-PK_2,G+A-PK,xG_2,xAG_2,xG+xAG_2,npxG_2,npxG+xAG_2,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,1,0,8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,4,0,76,...,0.0,0.0,0.0,0.15,0.0,0.15,0.15,0.15,Matches,2024-05-18
2,3,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,5,0,104,...,0.0,0.0,0.0,0.16,0.19,0.35,0.16,0.35,Matches,2024-05-18
3,4,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,2,0,42,...,0.0,0.0,0.0,0.18,0.0,0.18,0.18,0.18,Matches,2024-05-18
4,5,Yuri Alberto,br BRA,FW,Corinthians,23-061,2001,4,2,213,...,0.0,0.0,0.0,0.35,0.03,0.38,0.35,0.38,Matches,2024-05-18


In [50]:
standard_stats.to_csv(f"../datasets/{current_date}/standard_stats.csv", index=False, sep=';')
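
The file above is written with `sep=';'`, so whatever reads it back has to use the same delimiter. One practical upside is that values such as `FW,MF` in `Pos` contain commas but no semicolons. A small round trip with the standard `csv` module, using one row borrowed from the output above, illustrates this:

```python
import csv
import io

rows = [
    ["Rk", "Player", "Pos", "Squad"],
    ["3", "Adson", "FW,MF", "Vasco da Gama"],
]

# Write with a semicolon delimiter, as the notebook does with sep=';'.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerows(rows)

# Read it back with the same delimiter.
buffer.seek(0)
read_back = list(csv.reader(buffer, delimiter=";"))

print(buffer.getvalue())
print(read_back[1][2])
```

With pandas, the matching call would be `pd.read_csv(path, sep=';')`.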

### **Goalkeeping**

In [14]:
response_goalkeeping = requests.get('https://fbref.com/en/comps/24/keepers/Serie-A-Stats')

In [15]:
if response_goalkeeping.status_code == 200:
    soup = BeautifulSoup(response_goalkeeping.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        goalkeeping = tables[0]
        
        goalkeeping.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "MP", "Starts", "Min", 
            "90s", "GA", "GA90", "SoTA", "Saves", "Save%", "W", "D", "L", "CS", "CS%", 
            "PKatt", "PKA", "PKsv", "PKm", "Save%_2", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        goalkeeping = goalkeeping[goalkeeping["Rk"] != "Rk"].copy()

        goalkeeping["Curr_Date"] = current_date
        
        goalkeeping.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")



In [16]:
goalkeeping.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,MP,Starts,Min,...,L,CS,CS%,PKatt,PKA,PKsv,PKm,Save%_2,Matches,Curr_Date
0,1,Alisson,br BRA,GK,Criciúma,28-345,1995,2,2,180,...,0,0,0.0,0,0,0,0,,Matches,2024-05-18
1,2,Anderson,br BRA,GK,Cruzeiro,26-074,1998,5,5,450,...,1,1,20.0,0,0,0,0,,Matches,2024-05-18
2,3,Lucas Arcanjo,br BRA,GK,Vitória,25-284,1998,5,5,450,...,4,0,0.0,0,0,0,0,,Matches,2024-05-18
3,4,Bento,br BRA,GK,Ath Paranaense,24-343,1999,6,6,540,...,1,4,66.7,1,0,1,0,100.0,Matches,2024-05-18
4,5,Rafael Cabral,br BRA,GK,Grêmio,33-364,1990,1,1,90,...,1,0,0.0,0,0,0,0,,Matches,2024-05-18


In [51]:
goalkeeping.to_csv(f"../datasets/{current_date}/goalkeeping.csv", index=False, sep=';')

### **Advanced Goalkeeping**

In [18]:
response_advanced_goalkeeping = requests.get('https://fbref.com/en/comps/24/keepersadv/Serie-A-Stats')

In [19]:
if response_advanced_goalkeeping.status_code == 200:
    soup = BeautifulSoup(response_advanced_goalkeeping.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        advanced_goalkeeping = tables[0]
        
        advanced_goalkeeping.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "90s", "GA", "PKA", 
            "FK", "CK", "OG", "PSxG", "PSxG/SoT", "PSxG+/-", "/90", "Cmp", "Att", "Cmp%", 
            "Att (GK)", "Thr", "Launch%", "AvgLen", "Att_2", "Launch%_2", "AvgLen_2", 
            "Opp", "Stp", "Stp%", "#OPA", "#OPA/90", "AvgDist", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        advanced_goalkeeping = advanced_goalkeeping[advanced_goalkeeping["Rk"] != "Rk"].copy()

        advanced_goalkeeping["Curr_Date"] = current_date
        
        advanced_goalkeeping.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")



In [20]:
advanced_goalkeeping.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,GA,PKA,...,Launch%_2,AvgLen_2,Opp,Stp,Stp%,#OPA,#OPA/90,AvgDist,Matches,Curr_Date
0,1,Alisson,br BRA,GK,Criciúma,28-345,1995,2.0,2,0,...,47.6,39.3,40,1,2.5,1,0.5,11.1,Matches,2024-05-18
1,2,Anderson,br BRA,GK,Cruzeiro,26-074,1998,5.0,7,0,...,37.5,34.0,63,6,9.5,6,1.2,12.8,Matches,2024-05-18
2,3,Lucas Arcanjo,br BRA,GK,Vitória,25-284,1998,5.0,11,0,...,43.6,40.3,85,2,2.4,5,1.0,15.2,Matches,2024-05-18
3,4,Bento,br BRA,GK,Ath Paranaense,24-343,1999,6.0,3,0,...,85.4,58.0,76,4,5.3,1,0.17,10.4,Matches,2024-05-18
4,5,Rafael Cabral,br BRA,GK,Grêmio,33-364,1990,1.0,1,0,...,50.0,38.0,15,2,13.3,0,0.0,8.5,Matches,2024-05-18


In [52]:
advanced_goalkeeping.to_csv(f"../datasets/{current_date}/advanced_goalkeeping.csv", index=False, sep=';')

### **Shooting**

In [22]:
response_shooting = requests.get('https://fbref.com/en/comps/24/shooting/Serie-A-Stats')

In [23]:
if response_shooting.status_code == 200:
    soup = BeautifulSoup(response_shooting.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        shooting = tables[0]
        
        shooting.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "90s", "Gls", "Sh", "SoT", 
            "SoT%", "Sh/90", "SoT/90", "G/Sh", "G/SoT", "Dist", "FK", "PK", "PKatt", "xG", 
            "npxG", "npxG/Sh", "G-xG", "np:G-xG", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        shooting = shooting[shooting["Rk"] != "Rk"].copy()

        shooting["Curr_Date"] = current_date
        
        shooting.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")



In [24]:
shooting.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,Gls,Sh,...,FK,PK,PKatt,xG,npxG,npxG/Sh,G-xG,np:G-xG,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,0.1,0,0,...,0,0,0,0.0,0.0,,0.0,0.0,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,0.8,0,2,...,0,0,0,0.1,0.1,0.06,-0.1,-0.1,Matches,2024-05-18
2,3,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,1.2,0,4,...,0,0,0,0.2,0.2,0.05,-0.2,-0.2,Matches,2024-05-18
3,4,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,0.5,0,1,...,0,0,0,0.1,0.1,0.08,-0.1,-0.1,Matches,2024-05-18
4,5,Yuri Alberto,br BRA,FW,Corinthians,23-061,2001,2.4,0,10,...,0,0,0,0.8,0.8,0.08,-0.8,-0.8,Matches,2024-05-18


In [53]:
shooting.to_csv(f"../datasets/{current_date}/shooting.csv", index=False, sep=';')

### **Passing**

In [26]:
response_passing = requests.get('https://fbref.com/en/comps/24/passing/Serie-A-Stats')

In [27]:
if response_passing.status_code == 200:
    soup = BeautifulSoup(response_passing.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        passing = tables[0]
        
        passing.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "90s", 
            "Cmp", "Att", "Cmp%", "TotDist", "PrgDist", 
            "Cmp_2", "Att_2", "Cmp%_2", "Cmp_3", "Att_3", "Cmp%_3", 
            "Cmp_4", "Att_4", "Cmp%_4", "Ast", "xAG", "xA", "A-xAG", 
            "KP", "1/3", "PPA", "CrsPA", "PrgP", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        passing = passing[passing["Rk"] != "Rk"].copy()

        passing["Curr_Date"] = current_date
        
        passing.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")
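
The suffixes `_2`, `_3`, and `_4` in the column list above distinguish columns that share a name, such as the repeated `Cmp`, `Att`, `Cmp%` groups. Typing them by hand is easy to get wrong, so a hypothetical helper like `dedupe_columns` below could generate the same suffixes from the raw names. This is a sketch of the naming scheme only, not how the notebook builds its lists.

```python
from collections import Counter


def dedupe_columns(names):
    """Append _2, _3, ... to the second, third, ... occurrence of a name."""
    seen = Counter()
    result = []
    for name in names:
        seen[name] += 1
        # First occurrence keeps its name; later ones get a numeric suffix.
        result.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return result


raw = ["Cmp", "Att", "Cmp%", "TotDist", "Cmp", "Att", "Cmp%", "Cmp", "Att", "Cmp%"]
print(dedupe_columns(raw))
```

Names that appear only once, such as `TotDist`, pass through unchanged.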



In [54]:
passing.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,Cmp,Att,...,xAG,xA,A-xAG,KP,1/3,PPA,CrsPA,PrgP,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,0.1,8,8,...,0.0,0.0,0.0,0,1,1,0,1,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,0.8,14,15,...,0.0,0.0,0.0,0,0,1,0,1,Matches,2024-05-18
2,3,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,1.2,60,73,...,0.2,0.1,-0.2,2,6,2,0,6,Matches,2024-05-18
3,4,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,0.5,4,7,...,0.0,0.0,0.0,0,0,0,0,0,Matches,2024-05-18
4,5,Yuri Alberto,br BRA,FW,Corinthians,23-061,2001,2.4,27,36,...,0.1,0.1,-0.1,2,1,1,0,2,Matches,2024-05-18


In [55]:
passing.to_csv(f"../datasets/{current_date}/passing.csv", index=False, sep=';')

### **Pass Types**

In [30]:
response_passing_types = requests.get('https://fbref.com/en/comps/24/passing_types/Serie-A-Stats')

In [31]:
if response_passing_types.status_code == 200:
    soup = BeautifulSoup(response_passing_types.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        passing_types = tables[0]
        
        passing_types.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "90s", 
            "Att", "Live", "Dead", "FK", "TB", "Sw", "Crs", "TI", "CK", 
            "In", "Out", "Str", "Cmp", "Off", "Blocks", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        passing_types = passing_types[passing_types["Rk"] != "Rk"].copy()

        passing_types["Curr_Date"] = current_date
        
        passing_types.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")



In [32]:
passing_types.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,Att,Live,...,TI,CK,In,Out,Str,Cmp,Off,Blocks,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,0.1,8,7,...,0,0,0,0,0,8,0,0,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,0.8,15,13,...,0,0,0,0,0,14,0,0,Matches,2024-05-18
2,3,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,1.2,73,72,...,1,0,0,0,0,60,0,2,Matches,2024-05-18
3,4,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,0.5,7,7,...,0,0,0,0,0,4,0,1,Matches,2024-05-18
4,5,Yuri Alberto,br BRA,FW,Corinthians,23-061,2001,2.4,36,32,...,0,0,0,0,0,27,0,1,Matches,2024-05-18


In [56]:
passing_types.to_csv(f"../datasets/{current_date}/passing_types.csv", index=False, sep=';')

### **Defensive Actions**

In [34]:
response_defensive_actions = requests.get('https://fbref.com/en/comps/24/defense/Serie-A-Stats')

In [35]:
if response_defensive_actions.status_code == 200:
    soup = BeautifulSoup(response_defensive_actions.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        defensive_actions = tables[0]
        
        defensive_actions.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "90s", 
            "Tkl", "TklW", "Def 3rd", "Mid 3rd", "Att 3rd", "Tkl_2", 
            "Att_2", "Tkl%", "Lost", "Blocks", "Sh", "Pass", "Int", 
            "Tkl+Int", "Clr", "Err", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        defensive_actions = defensive_actions[defensive_actions["Rk"] != "Rk"].copy()

        defensive_actions["Curr_Date"] = current_date
        
        defensive_actions.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")



In [36]:
defensive_actions.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,Tkl,TklW,...,Lost,Blocks,Sh,Pass,Int,Tkl+Int,Clr,Err,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,0.1,0,0,...,0,0,0,0,0,0,0,0,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,0.8,0,0,...,0,1,1,0,0,0,0,0,Matches,2024-05-18
2,3,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,1.2,6,6,...,4,2,0,2,1,7,1,0,Matches,2024-05-18
3,4,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,0.5,0,0,...,0,0,0,0,0,0,2,0,Matches,2024-05-18
4,5,Yuri Alberto,br BRA,FW,Corinthians,23-061,2001,2.4,2,1,...,2,0,0,0,1,3,1,0,Matches,2024-05-18


In [57]:
defensive_actions.to_csv(f"../datasets/{current_date}/defensive_actions.csv", index=False, sep=';')

### **Possession**

In [38]:
response_possession = requests.get('https://fbref.com/en/comps/24/possession/Serie-A-Stats')

In [39]:
if response_possession.status_code == 200:
    soup = BeautifulSoup(response_possession.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        possession = tables[0]
        
        possession.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "90s", 
            "Touches", "Def Pen", "Def 3rd", "Mid 3rd", "Att 3rd", "Att Pen", 
            "Live", "Att", "Succ", "Succ%", "Tkld", "Tkld%", "Carries", 
            "TotDist", "PrgDist", "PrgC", "1/3", "CPA", "Mis", "Dis", 
            "Rec", "PrgR", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        possession = possession[possession["Rk"] != "Rk"].copy()

        possession["Curr_Date"] = current_date
        
        possession.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")



In [40]:
possession.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,Touches,Def Pen,...,PrgDist,PrgC,1/3,CPA,Mis,Dis,Rec,PrgR,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,0.1,8,0,...,7,0,0,0,0,0,7,0,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,0.8,20,1,...,11,0,0,0,0,1,15,4,Matches,2024-05-18
2,3,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,1.2,100,1,...,152,5,2,2,6,0,66,16,Matches,2024-05-18
3,4,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,0.5,11,2,...,7,0,0,0,0,1,8,1,Matches,2024-05-18
4,5,Yuri Alberto,br BRA,FW,Corinthians,23-061,2001,2.4,68,1,...,36,3,1,3,12,2,47,17,Matches,2024-05-18


In [58]:
possession.to_csv(f"../datasets/{current_date}/possession.csv", index=False, sep=';')

### **Playing Time**

In [42]:
response_playing_time = requests.get('https://fbref.com/en/comps/24/playingtime/Serie-A-Stats')

In [43]:
if response_playing_time.status_code == 200:
    soup = BeautifulSoup(response_playing_time.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        playing_time = tables[0]
        
        playing_time.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "MP", 
            "Min", "Mn/MP", "Min%", "90s", "Starts", "Mn/Start", "Compl", 
            "Subs", "Mn/Sub", "unSub", "PPM", "onG", "onGA", "+/-", "+/-90", 
            "On-Off", "onxG", "onxGA", "xG+/-", "xG+/-90", "On-Off_2", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        playing_time = playing_time[playing_time["Rk"] != "Rk"].copy()

        playing_time["Curr_Date"] = current_date
        
        playing_time.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")



In [44]:
playing_time.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,MP,Min,Mn/MP,...,+/-,+/-90,On-Off,onxG,onxGA,xG+/-,xG+/-90,On-Off_2,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,1,8.0,8.0,...,1.0,11.25,12.02,0.2,0.0,0.2,2.01,2.62,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,4,76.0,19.0,...,0.0,0.0,1.44,1.3,1.9,-0.6,-0.75,0.03,Matches,2024-05-18
2,3,Adriel,br BRA,GK,Bahia,23-125,2001,0,,,...,,,,,,,,,Matches,2024-05-18
3,4,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,5,104.0,21.0,...,-1.0,-0.87,-0.04,2.2,2.5,-0.3,-0.22,-0.1,Matches,2024-05-18
4,5,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,2,42.0,21.0,...,1.0,2.14,2.14,0.9,0.2,0.7,1.55,0.85,Matches,2024-05-18


In [59]:
playing_time.to_csv(f"../datasets/{current_date}/playing_time.csv", index=False, sep=';')

### **Misc**

In [46]:
response_misc = requests.get('https://fbref.com/en/comps/24/misc/Serie-A-Stats')

In [47]:
if response_misc.status_code == 200:
    soup = BeautifulSoup(response_misc.content, 'html.parser')
    
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    
    tables = extrair_tabela(comments)
                
    if tables:
        misc = tables[0]
        
        misc.columns = [
            "Rk", "Player", "Nation", "Pos", "Squad", "Age", "Born", "90s", 
            "CrdY", "CrdR", "2CrdY", "Fls", "Fld", "Off", "Crs", "Int", 
            "TklW", "PKwon", "PKcon", "OG", "Recov", "Won", "Lost", 
            "Won%", "Matches"
        ]
        
        # Drop repeated header rows; .copy() avoids SettingWithCopyWarning on the assignment below
        misc = misc[misc["Rk"] != "Rk"].copy()

        misc["Curr_Date"] = current_date
        
        misc.reset_index(drop=True, inplace=True)
        
    else:
        print("No tables found in the HTML comments.")



In [48]:
misc.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,CrdY,CrdR,...,TklW,PKwon,PKcon,OG,Recov,Won,Lost,Won%,Matches,Curr_Date
0,1,Abner,br BRA,MF,Juventude,20-029,2004,0.1,0,0,...,0,0,0,0,0,0,0,,Matches,2024-05-18
1,2,Luiz Adriano,br BRA,FW,Vitória,37-036,1987,0.8,0,0,...,0,0,0,0,1,2,5,28.6,Matches,2024-05-18
2,3,Adson,br BRA,"FW,MF",Vasco da Gama,23-225,2000,1.2,0,0,...,6,0,0,0,6,2,0,100.0,Matches,2024-05-18
3,4,Lucas Alario,ar ARG,"FW,MF",Internacional,31-223,1992,0.5,1,0,...,0,0,0,0,1,2,2,50.0,Matches,2024-05-18
4,5,Yuri Alberto,br BRA,FW,Corinthians,23-061,2001,2.4,1,0,...,1,0,0,0,11,3,3,50.0,Matches,2024-05-18


In [61]:
misc.to_csv(f"../datasets/{current_date}/misc.csv", index=False, sep=';')
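
Every section in this notebook repeats the same steps with a different URL and output file. The URLs differ only in one path segment, so the pages could be described as data and looped over. The sketch below only builds the URL and file name pairs. `PAGES` and `build_url` are hypothetical names, while the path segments and file names are the ones used in the cells above.

```python
# Path segment on FBRef -> CSV file name used in this notebook.
PAGES = {
    "gca": "player_goal_and_shot_creation",
    "stats": "standard_stats",
    "keepers": "goalkeeping",
    "keepersadv": "advanced_goalkeeping",
    "shooting": "shooting",
    "passing": "passing",
    "passing_types": "passing_types",
    "defense": "defensive_actions",
    "possession": "possession",
    "playingtime": "playing_time",
    "misc": "misc",
}


def build_url(slug):
    """Return the Serie A stats URL for one page slug."""
    return f"https://fbref.com/en/comps/24/{slug}/Serie-A-Stats"


for slug, name in PAGES.items():
    print(build_url(slug), "->", f"{name}.csv")
```

Keeping each file name next to its path segment also makes it harder to pair a DataFrame with the wrong output file.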