# Sports Analysis

#### Diploma Program in Data Science

#### Daniel Cervantes

## Data Collection

This project collects and structures Premier League data from the last 5 seasons (2019-2024) by web scraping the FBREF platform. The goal is to build a robust database that supports both simple and complex sports analysis, and to use that data for visualizations, comparisons, and exploratory analysis.

The project also aims to develop machine learning models that classify teams' playing styles and predict results. The focus will be on applying neural networks to find patterns and trends that are useful for analyzing sporting performance.



## Problem to Solve

The main problem is the lack of accessible tools for performing complex sports analysis in an automated and reproducible way. The goal of the project is to develop a useful sports analysis platform that makes it easier to study data on Premier League football teams.

In addition, machine learning models will be added to enable deeper analysis of the data, mainly to detect playing styles, assess team performance, and predict results.

This will be a valuable tool for fans and sports analysts. The project seeks to create a space where they can, in a practical and simple way, use charts and other tools to explore the data and dig as deep as they like.




## Database

The information will be obtained by web scraping, using pd.read_html() from pandas to extract the tables available on FBREF. Team and match statistics will be collected per season to build a structured database covering the last 5 complete Premier League seasons. The data include key metrics on possession, passing, shooting, and goalkeeping, among others, for each team and its opponents.
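As a rough illustration of what the extracted tables look like, the sketch below builds a small DataFrame with a two-level header, similar in shape to what pd.read_html() tends to return for the FBREF stat tables, and flattens it into single column names. The helper name `flatten_columns` and the sample values are made up for illustration; no page is downloaded here.

```python
import pandas as pd

def flatten_columns(df, sep="_"):
    """Return a copy of df whose two-level header is joined into single names."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Join both header levels and drop spaces, e.g. ("Playing Time", "MP") -> "PlayingTime_MP"
        out.columns = [sep.join(col).replace(" ", "") for col in out.columns]
    return out

# Toy table with illustrative values, not real scraped data
raw = pd.DataFrame(
    [["Arsenal", 38, 90], ["Liverpool", 38, 85]],
    columns=pd.MultiIndex.from_tuples([
        ("Unnamed: 0_level_0", "Squad"),
        ("Playing Time", "MP"),
        ("Performance", "Gls"),
    ]),
)

# Columns without a top-level label get an "Unnamed: ..." prefix, so rename them
flat = flatten_columns(raw).rename(columns={"Unnamed:0_level_0_Squad": "Squad"})
print(list(flat.columns))
```

Flattening first keeps every later selection a plain list of strings, which is easier to read than tuples of header levels.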

### FBREF Data

The project will start by collecting team-level data. Match and player data will be added later. The data come from the FBREF website, using one link per season.
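Since there is one link per season, the links can be generated rather than typed by hand. The following is a minimal sketch that assumes the URL pattern of the five season links used in this notebook holds for every season; `season_url` is a hypothetical helper, not part of any library.

```python
def season_url(start_year):
    """Build the Premier League stats URL for the season starting in start_year.

    Assumes the pattern seen in this notebook's links (competition id 9).
    """
    season = f"{start_year}-{start_year + 1}"
    return f"https://fbref.com/en/comps/9/{season}/{season}-Premier-League-Stats"

# Most recent season first: 2023/2024 down to 2019/2020
urls = {f"{year}/{year + 1}": season_url(year) for year in range(2023, 2018, -1)}

for label, url in urls.items():
    print(label, url)
```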

**Team data**

The first table will contain data for every team that has played in the last 5 seasons. For each season, the tables on the FBREF page are combined into a single table, and a 'Season' column is added to indicate which season each row belongs to. The per-season tables are then stacked into one.
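The combining step can be sketched on toy data: merge the per-category tables of one season on the team name, tag the result with its season, then stack the seasons. `merge_on_squad` is a hypothetical helper and every value below is invented for illustration.

```python
from functools import reduce

import pandas as pd

def merge_on_squad(tables, season):
    """Inner-join all tables on 'Squad' and put a 'Season' column first."""
    merged = reduce(lambda left, right: pd.merge(left, right, on="Squad", how="inner"), tables)
    merged = merged.copy()  # avoid mutating the input when only one table is passed
    merged.insert(0, "Season", season)
    return merged

# Two toy category tables; row order differs on purpose, the merge aligns by name
standard = pd.DataFrame({"Squad": ["Arsenal", "Liverpool"], "Poss": [58.0, 61.0]})
shooting = pd.DataFrame({"Squad": ["Liverpool", "Arsenal"], "Standard_Sh": [700, 650]})

season_a = merge_on_squad([standard, shooting], "2023/2024")
season_b = merge_on_squad([standard, shooting], "2022/2023")

all_seasons = pd.concat([season_a, season_b], ignore_index=True)
print(all_seasons)
```

An inner join silently drops any team whose name differs between tables, so comparing the row count against the expected number of teams is a cheap sanity check.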

**Columns**

To make the project more accessible, the column names are in English.

**List of the 24 tables:**
1. Overall
2. Home/Away
3. Squad Standard Stats
4. Opponent Squad Standard Stats
5. Squad Goalkeeping
6. Opponent Squad Goalkeeping
7. Squad Advanced Goalkeeping
8. Opponent Squad Advanced Goalkeeping
9. Squad Shooting
10. Opponent Squad Shooting
11. Squad Passing
12. Opponent Squad Passing
13. Squad Pass Types
14. Opponent Squad Pass Types
15. Squad Goal and Shot Creation
16. Opponent Squad Goal and Shot Creation
17. Squad Defensive Actions
18. Opponent Squad Defensive Actions
19. Squad Possession
20. Opponent Squad Possession
21. Squad Playing Time
22. Opponent Squad Playing Time
23. Squad Miscellaneous Stats
24. Opponent Squad Miscellaneous Stats
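The list follows a regular pattern: two league tables, then a "Squad" and "Opponent Squad" pair for each stat category. A small sketch, assuming the page returns the tables in exactly this order, can pair names with positions so that later code does not rely on bare indices. `TABLE_NAMES` and `name_tables` are hypothetical names.

```python
# Stat categories that appear as a "Squad" / "Opponent Squad" pair
CATEGORIES = [
    "Standard Stats", "Goalkeeping", "Advanced Goalkeeping", "Shooting",
    "Passing", "Pass Types", "Goal and Shot Creation", "Defensive Actions",
    "Possession", "Playing Time", "Miscellaneous Stats",
]

TABLE_NAMES = ["Overall", "Home/Away"]
for category in CATEGORIES:
    TABLE_NAMES += [f"Squad {category}", f"Opponent Squad {category}"]

def name_tables(tables):
    """Pair each expected table name with the table at the same position."""
    if len(tables) < len(TABLE_NAMES):
        raise ValueError(f"expected at least {len(TABLE_NAMES)} tables, got {len(tables)}")
    return dict(zip(TABLE_NAMES, tables))

# Zero-based positions of the opponent tables
opponent_positions = [i for i, name in enumerate(TABLE_NAMES) if name.startswith("Opponent")]
print(len(TABLE_NAMES), opponent_positions)
```

The length check fails early if the page layout changes, instead of quietly assigning the wrong table to a name.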


In [None]:
import pandas as pd

# Function that processes the 24 tables of a single season
def procesar_temporada(enlace, temporada):
    # Read every table on the page at the given link
    todas_tablas = pd.read_html(enlace)

    # Select the 24 tables needed, in the order they appear on the page
    (df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df11, df12,
     df13, df14, df15, df16, df17, df18, df19, df20, df21, df22, df23, df24) = todas_tablas[:24]

    # Apply table-specific transformations to each DataFrame
    # df1: Overall

    if isinstance(df1.columns, pd.MultiIndex):
        df1.columns = ['_'.join(col).strip() for col in df1.columns.values]
        df1.columns = df1.columns.str.replace(' ', '')

        df1[['Top Goal Scorer', 'Goals']] = df1['Top Team Scorer'].str.split(' - ', expand=True)
        df1['Goals'] = pd.to_numeric(df1['Goals'], errors='coerce')

        df1.drop(columns=['Top Team Scorer'], inplace=True)

    # df2: Home/Away

    if isinstance(df2.columns, pd.MultiIndex):
        df2.columns = ['_'.join(col).strip() for col in df2.columns.values]
        df2.columns = df2.columns.str.replace(' ', '')

    df2.rename(columns={'Unnamed:1_level_0_Squad': 'Squad'}, inplace=True)
    df2_columns = [
                "Squad", "Home_MP", "Home_W", "Home_D", "Home_L", "Home_GF", "Home_GA", "Home_GD", "Home_Pts", "Home_Pts/MP",
                "Home_xG", "Home_xGA", "Home_xGD", "Away_MP", "Away_W", "Away_D", "Away_L", "Away_GF", "Away_GA", "Away_GD",
                "Away_Pts", "Away_Pts/MP", "Away_xG", "Away_xGA", "Away_xGD"
    ]
    df2 = df2[df2_columns]

    # df3: Squad Standard Stats

    if isinstance(df3.columns, pd.MultiIndex):
        df3.columns = ['_'.join(col).strip() for col in df3.columns.values]
        df3.columns = df3.columns.str.replace(' ', '')

    df3.rename(columns={
        "Unnamed:0_level_0_Squad": "Squad",
        "Unnamed:1_level_0_#Pl": "#Pl",
        "Unnamed:2_level_0_Age": "Age",
        "Unnamed:3_level_0_Poss": "Poss",
    }, inplace=True)
    df3_columns = [
               "Squad", "#Pl", "Age", "Poss", "PlayingTime_MP", "PlayingTime_Starts", "PlayingTime_Min", "Performance_Gls",
               "Performance_Ast", "Performance_G+A", "Performance_G-PK", "Performance_PK", "Performance_PKatt", "Performance_CrdY",
               "Performance_CrdR", "Expected_xG", "Expected_npxG", "Expected_xAG", "Expected_npxG+xAG", "Progression_PrgC", "Progression_PrgP",
    ]
    df3 = df3[df3_columns]

    # df4: Opponent Squad Standard Stats

    if isinstance(df4.columns, pd.MultiIndex):
        df4.columns = ['_Opp_'.join(col).strip() for col in df4.columns.values]
        df4.columns = df4.columns.str.replace(' ', '')

    df4.rename(columns={
        "Unnamed:0_level_0_Opp_Squad": "Squad",
        "Unnamed:1_level_0_Opp_#Pl": "Opp_#Pl",
        "Unnamed:2_level_0_Opp_Age": "Opp_Age",
        "Unnamed:3_level_0_Opp_Poss": "Opp_Poss",
    }, inplace=True)
    # Same selection as df3_columns, with the opponent prefix
    df4_columns = [
        "Squad", "Opp_#Pl", "Opp_Age", "Opp_Poss", "PlayingTime_Opp_MP", "PlayingTime_Opp_Starts", "PlayingTime_Opp_Min",
        "Performance_Opp_Gls", "Performance_Opp_Ast", "Performance_Opp_G+A", "Performance_Opp_G-PK", "Performance_Opp_PK",
        "Performance_Opp_PKatt", "Performance_Opp_CrdY", "Performance_Opp_CrdR", "Expected_Opp_xG", "Expected_Opp_npxG",
        "Expected_Opp_xAG", "Expected_Opp_npxG+xAG", "Progression_Opp_PrgC", "Progression_Opp_PrgP",
    ]
    df4 = df4[df4_columns]

    # df5: Squad Goalkeeping

    if isinstance(df5.columns, pd.MultiIndex):
        df5.columns = ['_'.join(col).strip() for col in df5.columns.values]
        df5.columns = df5.columns.str.replace(' ', '')

    df5.rename(columns={"Unnamed:0_level_0_Squad": "Squad"}, inplace=True)

    df5_columns = [
            "Squad", "Performance_SoTA", "Performance_Saves", "Performance_CS", "PenaltyKicks_PKA", "PenaltyKicks_PKsv",
             "PenaltyKicks_PKm"
    ]

    df5 = df5[df5_columns]

    # df6: Opponent Squad Goalkeeping

    if isinstance(df6.columns, pd.MultiIndex):
        df6.columns = ['_Opp_'.join(col).strip() for col in df6.columns.values]
        df6.columns = df6.columns.str.replace(' ', '')

    df6.rename(columns={"Unnamed:0_level_0_Opp_Squad": "Squad"}, inplace=True)

    df6_columns = [
        "Squad", "Performance_Opp_SoTA", "Performance_Opp_Saves", "Performance_Opp_CS", "PenaltyKicks_Opp_PKA", "PenaltyKicks_Opp_PKsv",
        "PenaltyKicks_Opp_PKm"
    ]

    df6 = df6[df6_columns]

    # df7: Squad Advanced Goalkeeping

    if isinstance(df7.columns, pd.MultiIndex):
        df7.columns = ['_'.join(col).strip() for col in df7.columns.values]
        df7.columns = df7.columns.str.replace(' ', '')

    df7.rename(columns={
        "Unnamed:0_level_0_Squad": "Squad",
        "Goals_FK": "Goals_FKA",
        "Goals_CK": "Goals_CKA",
    }, inplace=True)

    df7_columns = [
               "Squad", "Goals_FKA", "Goals_CKA", "Goals_OG", "Expected_PSxG", "Expected_PSxG/SoT", "Launched_Cmp", "Launched_Att",
               "Passes_Att(GK)", "Passes_Thr", "Passes_AvgLen", "GoalKicks_Att", "GoalKicks_AvgLen", "Crosses_Opp", "Crosses_Stp",
               "Sweeper_#OPA", "Sweeper_AvgDist"
    ]

    df7 = df7[df7_columns]

    # df8: Opponent Squad Advanced Goalkeeping

    if isinstance(df8.columns, pd.MultiIndex):
        df8.columns = ['_Opp_'.join(col).strip() for col in df8.columns.values]
        df8.columns = df8.columns.str.replace(' ', '')

    df8.rename(columns={
        "Unnamed:0_level_0_Opp_Squad": "Squad",
        "Goals_Opp_FK": "Goals_Opp_FKA",
        "Goals_Opp_CK": "Goals_Opp_CKA",
    }, inplace=True)

    df8_columns = [
               "Squad", "Goals_Opp_FKA", "Goals_Opp_CKA", "Goals_Opp_OG", "Expected_Opp_PSxG", "Expected_Opp_PSxG/SoT", "Launched_Opp_Cmp", "Launched_Opp_Att",
               "Passes_Opp_Att(GK)", "Passes_Opp_Thr", "Passes_Opp_AvgLen", "GoalKicks_Opp_Att", "GoalKicks_Opp_AvgLen", "Crosses_Opp_Opp", "Crosses_Opp_Stp",
               "Sweeper_Opp_#OPA", "Sweeper_Opp_AvgDist"
    ]

    df8 = df8[df8_columns]

    # df9: Squad Shooting

    if isinstance(df9.columns, pd.MultiIndex):
        df9.columns = ['_'.join(col).strip() for col in df9.columns.values]
        df9.columns = df9.columns.str.replace(' ', '')

    df9.rename(columns={
        'Unnamed:0_level_0_Squad': 'Squad',
        'Standard_FK': 'Standard_FKAtt',
    }, inplace=True)

    df9_columns = [
        "Squad", "Standard_Sh", "Standard_SoT", "Standard_G/Sh", "Standard_G/SoT", "Standard_Dist", "Standard_FKAtt",
        "Expected_npxG/Sh"
    ]

    df9 = df9[df9_columns]

    # df10: Opponent Squad Shooting

    if isinstance(df10.columns, pd.MultiIndex):
        df10.columns = ['_Opp_'.join(col).strip() for col in df10.columns.values]
        df10.columns = df10.columns.str.replace(' ', '')

    df10.rename(columns={
        'Unnamed:0_level_0_Opp_Squad': 'Squad',
        'Standard_Opp_FK': 'Standard_Opp_FKAtt',
    }, inplace=True)

    df10_columns = [
        "Squad", "Standard_Opp_Sh", "Standard_Opp_SoT", "Standard_Opp_G/Sh", "Standard_Opp_G/SoT", "Standard_Opp_Dist", "Standard_Opp_FKAtt",
        "Expected_Opp_npxG/Sh"
    ]

    df10 = df10[df10_columns]

    # df11: Squad Passing

    if isinstance(df11.columns, pd.MultiIndex):
        df11.columns = ['_'.join(col).strip() for col in df11.columns.values]
        df11.columns = df11.columns.str.replace(' ', '')

    df11.rename(columns={
        'Unnamed:0_level_0_Squad': 'Squad',
        'Total_Cmp': 'Total_PasCmp',
        'Total_Att': 'Total_PasAtt',
        'Short_Cmp': 'Short_PasCmp',
        'Short_Att': 'Short_PasAtt',
        'Medium_Cmp': 'Medium_PasCmp',
        'Medium_Att': 'Medium_PasAtt',
        'Long_Cmp': 'Long_PasCmp',
        'Long_Att': 'Long_PasAtt',
        'Unnamed:21_level_0_KP': 'Expected_KP',
        'Unnamed:22_level_0_1/3': 'Expected_F1/3',
        'Unnamed:23_level_0_PPA': 'Expected_PPA',
        'Unnamed:24_level_0_CrsPA': 'Expected_CrsPA',
        'Unnamed:25_level_0_PrgP': 'Expected_PrgP',
    }, inplace=True)

    df11_columns = [
        "Squad", "Total_PasCmp", "Total_PasAtt", "Total_TotDist", "Total_PrgDist", "Short_PasCmp", "Short_PasAtt",
        "Medium_PasCmp", "Medium_PasAtt", "Long_PasCmp", "Long_PasAtt", "Expected_xA", "Expected_A-xAG", "Expected_KP",
        "Expected_F1/3", "Expected_PPA", "Expected_CrsPA", "Expected_PrgP"
    ]

    df11 = df11[df11_columns]

    # df12: Opponent Squad Passing

    if isinstance(df12.columns, pd.MultiIndex):
        df12.columns = ['_Opp_'.join(col).strip() for col in df12.columns.values]
        df12.columns = df12.columns.str.replace(' ', '')

    df12.rename(columns={
        'Unnamed:0_level_0_Opp_Squad': 'Squad',
        'Total_Opp_Cmp': 'Total_Opp_PasCmp',
        'Total_Opp_Att': 'Total_Opp_PasAtt',
        'Short_Opp_Cmp': 'Short_Opp_PasCmp',
        'Short_Opp_Att': 'Short_Opp_PasAtt',
        'Medium_Opp_Cmp': 'Medium_Opp_PasCmp',
        'Medium_Opp_Att': 'Medium_Opp_PasAtt',
        'Long_Opp_Cmp': 'Long_Opp_PasCmp',
        'Long_Opp_Att': 'Long_Opp_PasAtt',
        'Unnamed:21_level_0_Opp_KP': 'Expected_Opp_KP',
        'Unnamed:22_level_0_Opp_1/3': 'Expected_Opp_F1/3',
        'Unnamed:23_level_0_Opp_PPA': 'Expected_Opp_PPA',
        'Unnamed:24_level_0_Opp_CrsPA': 'Expected_Opp_CrsPA',
        'Unnamed:25_level_0_Opp_PrgP': 'Expected_Opp_PrgP',
    }, inplace=True)

    df12_columns = [
        "Squad", "Total_Opp_PasCmp", "Total_Opp_PasAtt", "Total_Opp_TotDist", "Total_Opp_PrgDist", "Short_Opp_PasCmp", "Short_Opp_PasAtt",
        "Medium_Opp_PasCmp", "Medium_Opp_PasAtt", "Long_Opp_PasCmp", "Long_Opp_PasAtt", "Expected_Opp_xA", "Expected_Opp_A-xAG", "Expected_Opp_KP",
        "Expected_Opp_F1/3", "Expected_Opp_PPA", "Expected_Opp_CrsPA", "Expected_Opp_PrgP"
    ]

    df12 = df12[df12_columns]

    # df13: Squad Pass Types

    if isinstance(df13.columns, pd.MultiIndex):
        df13.columns = ['_'.join(col).strip() for col in df13.columns.values]
        df13.columns = df13.columns.str.replace(' ','')

    df13.rename(columns={'Unnamed:0_level_0_Squad':'Squad'}, inplace=True)

    df13_columns = [
        'Squad', "PassTypes_Live", "PassTypes_Dead", "PassTypes_FK", "PassTypes_TB", "PassTypes_Sw", "PassTypes_Crs",
        "PassTypes_TI", "PassTypes_CK", "CornerKicks_In", "CornerKicks_Out", "CornerKicks_Str", "Outcomes_Cmp",
        "Outcomes_Off", "Outcomes_Blocks"
    ]

    df13 = df13[df13_columns]

    # df14: Opponent Squad Pass Types

    if isinstance(df14.columns, pd.MultiIndex):
        df14.columns = ['_Opp_'.join(col).strip() for col in df14.columns.values]
        df14.columns = df14.columns.str.replace(' ','')

    df14.rename(columns={'Unnamed:0_level_0_Opp_Squad':'Squad'}, inplace=True)

    df14_columns = [
        'Squad', "PassTypes_Opp_Live", "PassTypes_Opp_Dead", "PassTypes_Opp_FK", "PassTypes_Opp_TB", "PassTypes_Opp_Sw", "PassTypes_Opp_Crs",
        "PassTypes_Opp_TI", "PassTypes_Opp_CK", "CornerKicks_Opp_In", "CornerKicks_Opp_Out", "CornerKicks_Opp_Str", "Outcomes_Opp_Cmp",
        "Outcomes_Opp_Off", "Outcomes_Opp_Blocks"
    ]

    df14 = df14[df14_columns]

    # df15: Squad Goal and Shot Creation

    if isinstance(df15.columns, pd.MultiIndex):
        df15.columns = ['_'.join(col).strip() for col in df15.columns.values]
        df15.columns = df15.columns.str.replace(' ', '')

    df15.rename(columns={'Unnamed:0_level_0_Squad':'Squad'}, inplace=True)

    df15_columns = [
       'Squad', "SCA_SCA", "SCATypes_PassLive", "SCATypes_PassDead", "SCATypes_TO", "SCATypes_Sh", "SCATypes_Fld",
        "SCATypes_Def", "GCA_GCA", "GCATypes_PassLive", "GCATypes_PassDead", "GCATypes_TO", "GCATypes_Sh",
        "GCATypes_Fld", "GCATypes_Def"
    ]

    df15 = df15[df15_columns]

    # df16: Opponent Squad Goal and Shot Creation

    if isinstance(df16.columns, pd.MultiIndex):
        df16.columns = ['_Opp_'.join(col).strip() for col in df16.columns.values]
        df16.columns = df16.columns.str.replace(' ', '')

    df16.rename(columns={'Unnamed:0_level_0_Opp_Squad':'Squad'}, inplace=True)

    df16_columns = [
       'Squad', "SCA_Opp_SCA", "SCATypes_Opp_PassLive", "SCATypes_Opp_PassDead", "SCATypes_Opp_TO", "SCATypes_Opp_Sh", "SCATypes_Opp_Fld",
        "SCATypes_Opp_Def", "GCA_Opp_GCA", "GCATypes_Opp_PassLive", "GCATypes_Opp_PassDead", "GCATypes_Opp_TO", "GCATypes_Opp_Sh",
        "GCATypes_Opp_Fld", "GCATypes_Opp_Def"
    ]

    df16 = df16[df16_columns]

    # df17: Squad Defensive Actions

    if isinstance(df17.columns, pd.MultiIndex):
        df17.columns = ['_'.join(col).strip() for col in df17.columns.values]
        df17.columns = df17.columns.str.replace(' ', '')

    df17.rename(columns={
        'Unnamed:0_level_0_Squad': 'Squad',
        'Unnamed:15_level_0_Int': 'Blocks_Int',
        'Unnamed:16_level_0_Tkl+Int': 'Blocks_Tkl+Int',
        'Unnamed:17_level_0_Clr': 'Blocks_Clr',
        'Unnamed:18_level_0_Err': 'Blocks_Err',
    }, inplace=True)

    df17_columns = [
       'Squad', "Tackles_Tkl", "Tackles_TklW", "Tackles_Def3rd", "Tackles_Mid3rd", "Tackles_Att3rd", "Challenges_Tkl",
        "Challenges_Att", "Challenges_Tkl%", "Challenges_Lost", "Blocks_Blocks", "Blocks_Sh", "Blocks_Pass",
        "Blocks_Int", "Blocks_Tkl+Int", "Blocks_Clr", "Blocks_Err"
    ]

    df17 = df17[df17_columns]

    # df18: Opponent Squad Defensive Actions

    if isinstance(df18.columns, pd.MultiIndex):
        df18.columns = ['_Opp_'.join(col).strip() for col in df18.columns.values]
        df18.columns = df18.columns.str.replace(' ', '')

    df18.rename(columns={
        'Unnamed:0_level_0_Opp_Squad': 'Squad',
        'Unnamed:15_level_0_Opp_Int': 'Blocks_Opp_Int',
        'Unnamed:16_level_0_Opp_Tkl+Int': 'Blocks_Opp_Tkl+Int',
        'Unnamed:17_level_0_Opp_Clr': 'Blocks_Opp_Clr',
        'Unnamed:18_level_0_Opp_Err': 'Blocks_Opp_Err',
    }, inplace=True)

    df18_columns = [
       'Squad', "Tackles_Opp_Tkl", "Tackles_Opp_TklW", "Tackles_Opp_Def3rd", "Tackles_Opp_Mid3rd", "Tackles_Opp_Att3rd", "Challenges_Opp_Tkl",
        "Challenges_Opp_Att", "Challenges_Opp_Tkl%", "Challenges_Opp_Lost", "Blocks_Opp_Blocks", "Blocks_Opp_Sh", "Blocks_Opp_Pass",
        "Blocks_Opp_Int", "Blocks_Opp_Tkl+Int", "Blocks_Opp_Clr", "Blocks_Opp_Err"
    ]

    df18 = df18[df18_columns]

    # df19: Squad Possession

    if isinstance(df19.columns, pd.MultiIndex):
        df19.columns = ['_'.join(col).strip() for col in df19.columns.values]
        df19.columns = df19.columns.str.replace(' ', '')

    df19.rename(columns={
        'Unnamed:0_level_0_Squad': 'Squad',
        'Carries_1/3': 'Carries_C1/3',
    }, inplace=True)

    df19_columns = [
        'Squad', "Touches_Touches", "Touches_DefPen", "Touches_Def3rd", "Touches_Mid3rd", "Touches_Att3rd",
        "Touches_AttPen", "Touches_Live", "Take-Ons_Att", "Take-Ons_Succ", "Take-Ons_Tkld", "Carries_Carries",
        "Carries_TotDist", "Carries_PrgDist", "Carries_PrgC", "Carries_C1/3", "Carries_CPA", "Carries_Mis",
        "Carries_Dis", "Receiving_Rec", "Receiving_PrgR"
    ]

    df19 = df19[df19_columns]

    # df20: Opponent Squad Possession

    if isinstance(df20.columns, pd.MultiIndex):
        df20.columns = ['_Opp_'.join(col).strip() for col in df20.columns.values]
        df20.columns = df20.columns.str.replace(' ', '')

    df20.rename(columns={
        'Unnamed:0_level_0_Opp_Squad': 'Squad',
        'Carries_Opp_1/3': 'Carries_Opp_C1/3',
    }, inplace=True)

    df20_columns = [
        'Squad', "Touches_Opp_Touches", "Touches_Opp_DefPen", "Touches_Opp_Def3rd", "Touches_Opp_Mid3rd", "Touches_Opp_Att3rd",
        "Touches_Opp_AttPen", "Touches_Opp_Live", "Take-Ons_Opp_Att", "Take-Ons_Opp_Succ", "Take-Ons_Opp_Tkld", "Carries_Opp_Carries",
        "Carries_Opp_TotDist", "Carries_Opp_PrgDist", "Carries_Opp_PrgC", "Carries_Opp_C1/3", "Carries_Opp_CPA", "Carries_Opp_Mis",
        "Carries_Opp_Dis", "Receiving_Opp_Rec", "Receiving_Opp_PrgR"
    ]

    df20 = df20[df20_columns]

    # df21: Squad Playing Time

    if isinstance(df21.columns, pd.MultiIndex):
        df21.columns = ['_'.join(col).strip() for col in df21.columns.values]
        df21.columns = df21.columns.str.replace(' ', '')

    df21.rename(columns={'Unnamed:0_level_0_Squad':'Squad'}, inplace=True)

    df21_columns = [
        'Squad', "Starts_Starts", "Starts_Mn/Start", "Starts_Compl", "Subs_Subs", "Subs_Mn/Sub", "Subs_unSub",
        "TeamSuccess_PPM", "TeamSuccess_onG", "TeamSuccess_onGA", "TeamSuccess(xG)_onxG", "TeamSuccess(xG)_onxGA"
    ]

    df21 = df21[df21_columns]

    # df22: Opponent Squad Playing Time

    if isinstance(df22.columns, pd.MultiIndex):
        df22.columns = ['_Opp_'.join(col).strip() for col in df22.columns.values]
        df22.columns = df22.columns.str.replace(' ', '')

    df22.rename(columns={'Unnamed:0_level_0_Opp_Squad':'Squad'}, inplace=True)

    df22_columns = [
        'Squad', "Starts_Opp_Starts", "Starts_Opp_Mn/Start", "Starts_Opp_Compl", "Subs_Opp_Subs", "Subs_Opp_Mn/Sub", "Subs_Opp_unSub",
        "TeamSuccess_Opp_PPM", "TeamSuccess_Opp_onG", "TeamSuccess_Opp_onGA", "TeamSuccess(xG)_Opp_onxG", "TeamSuccess(xG)_Opp_onxGA"
    ]

    df22 = df22[df22_columns]

    # df23: Squad Miscellaneous Stats

    if isinstance(df23.columns, pd.MultiIndex):
        df23.columns = ['_'.join(col).strip() for col in df23.columns.values]
        df23.columns = df23.columns.str.replace(' ', '')

    df23.rename(columns={'Unnamed:0_level_0_Squad':'Squad'}, inplace=True)

    df23_columns = [
        'Squad', "Performance_Fls", "Performance_Fld", "Performance_Off", "Performance_PKwon", "Performance_PKcon",
        "Performance_Recov", "AerialDuels_Won", "AerialDuels_Lost"
    ]

    df23 = df23[df23_columns]

    # df24: Opponent Squad Miscellaneous Stats

    if isinstance(df24.columns, pd.MultiIndex):
        df24.columns = ['_Opp_'.join(col).strip() for col in df24.columns.values]
        df24.columns = df24.columns.str.replace(' ', '')

    df24.rename(columns={'Unnamed:0_level_0_Opp_Squad':'Squad'}, inplace=True)

    df24_columns = [
        'Squad', "Performance_Opp_Fls", "Performance_Opp_Fld", "Performance_Opp_Off", "Performance_Opp_PKwon", "Performance_Opp_PKcon",
        "Performance_Opp_Recov", "AerialDuels_Opp_Won", "AerialDuels_Opp_Lost"
    ]

    df24 = df24[df24_columns]

    # Remove the 'vs ' prefix from team names in the opponent tables.
    # assign() returns new DataFrames, which avoids pandas' SettingWithCopyWarning.
    oponentes = [df4, df6, df8, df10, df12, df14, df16, df18, df20, df22, df24]
    df4, df6, df8, df10, df12, df14, df16, df18, df20, df22, df24 = [
        df.assign(Squad=df['Squad'].str.replace('vs ', '', regex=False)) for df in oponentes
    ]

    # List of the DataFrames to merge
    dfs = [df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df11, df12, df13, df14, df15, df16, df17, df18, df19, df20, df21, df22, df23, df24]

    # Merge the DataFrames on the 'Squad' column
    df_merged = dfs[0]
    for i in range(1, len(dfs)):
        df_merged = pd.merge(df_merged, dfs[i], on='Squad', how='inner')

    # Tag every row with its season, once all merges are done
    df_merged['Season'] = temporada

    # Make sure the 'Season' column comes first
    columnas = ['Season'] + [col for col in df_merged.columns if col != 'Season']
    df_merged = df_merged[columnas]

    return df_merged

# Seasons and their corresponding links
temporadas_enlaces = {
    '2023/2024': 'https://fbref.com/en/comps/9/2023-2024/2023-2024-Premier-League-Stats',
    '2022/2023': 'https://fbref.com/en/comps/9/2022-2023/2022-2023-Premier-League-Stats',
    '2021/2022': 'https://fbref.com/en/comps/9/2021-2022/2021-2022-Premier-League-Stats',
    '2020/2021': 'https://fbref.com/en/comps/9/2020-2021/2020-2021-Premier-League-Stats',
    '2019/2020': 'https://fbref.com/en/comps/9/2019-2020/2019-2020-Premier-League-Stats'
}

# Process each season and stack the results into a single DataFrame
df_total = pd.concat(
    [procesar_temporada(enlace, temporada) for temporada, enlace in temporadas_enlaces.items()],
    ignore_index=True,
)

# Show the first rows of the final DataFrame
df_total.head()


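|  | Season | Rk | Squad | MP | W | D | L | GF | GA | GD | ... | AerialDuels_Won | AerialDuels_Lost | Performance_Opp_Fls | Performance_Opp_Fld | Performance_Opp_Off | Performance_Opp_PKwon | Performance_Opp_PKcon | Performance_Opp_Recov | AerialDuels_Opp_Won | AerialDuels_Opp_Lost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023/2024 | 1 | Manchester City | 38 | 28 | 7 | 3 | 96 | 34 | 62 | ... | 325 | 288 | 422 | 280 | 82 | 3 | 10 | 1591 | 288 | 325 |
| 1 | 2023/2024 | 2 | Arsenal | 38 | 28 | 5 | 5 | 91 | 29 | 62 | ... | 503 | 499 | 407 | 365 | 60 | 2 | 10 | 1585 | 499 | 503 |
| 2 | 2023/2024 | 3 | Liverpool | 38 | 24 | 10 | 4 | 86 | 41 | 45 | ... | 583 | 441 | 391 | 453 | 75 | 1 | 9 | 1842 | 441 | 583 |
| 3 | 2023/2024 | 4 | Aston Villa | 38 | 20 | 8 | 10 | 76 | 61 | 15 | ... | 334 | 363 | 483 | 394 | 167 | 2 | 4 | 1825 | 363 | 334 |
| 4 | 2023/2024 | 5 | Tottenham | 38 | 20 | 6 | 12 | 74 | 61 | 13 | ... | 359 | 388 | 530 | 414 | 125 | 6 | 2 | 1880 | 388 | 359 |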


### Export the data
The resulting table has 100 rows (20 teams in each of 5 seasons) and 346 columns.


In [None]:
df_total.to_csv('data.csv', index=False)
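
A quick way to confirm that an export like this is usable is to read the file back and compare its shape and columns. The sketch below does that on a tiny stand-in table written to a temporary directory; the three rows of sample data and the file names are illustrative only. It also shows why `index=False` matters: without it, the row index is written as an extra column that comes back as `Unnamed: 0`.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for df_total, for illustration only
sample = pd.DataFrame({
    "Season": ["2023/2024", "2023/2024", "2022/2023"],
    "Squad": ["Manchester City", "Arsenal", "Arsenal"],
    "MP": [38, 38, 38],
})

with tempfile.TemporaryDirectory() as tmp:
    clean_path = os.path.join(tmp, "data.csv")
    indexed_path = os.path.join(tmp, "data_with_index.csv")

    sample.to_csv(clean_path, index=False)  # same call as in the export cell
    sample.to_csv(indexed_path)             # default: the index is written too

    restored = pd.read_csv(clean_path)
    restored_indexed = pd.read_csv(indexed_path)

print(restored.shape, list(restored.columns))
print(list(restored_indexed.columns))
```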