# Analysis of reproducibility
### Solène Lemonnier & Pauline Roches

This project is based on the paper "[COVID and Home Advantage in Football: An Analysis of Results and xG Data in European Leagues](https://blog.mathieuacher.com/FootballAnalysis-xG-COVIDHome/)" by Mathieu Acher. This study was published on May 23, 2021. 

## Analysis of the study

After reviewing the study, we were able to outline the context and understand its stakes. The goal of the study is to assess the impact of playing at home considering the presence of supporters in a football stadium. The initial hypothesis is that their presence has a positive effect on team performance, both in terms of points and goals scored, as well as in relation to expected points and goals.\
\
The COVID-19 pandemic provided data that allowed for comparisons between matches played in empty and filled stadiums. However, it is important to note that the impact extended beyond the spectators and also affected the teams themselves, potentially disrupting training sessions or sidelining players due to illness.\
\
To conduct this analysis, six European football leagues (Ligue 1, La Liga, EPL, Bundesliga, Serie A and RFPL) were studied from seasons 2014 to 2020. It is worth noting that the crisis was managed differently across leagues, which makes comparisons between them more challenging.




### Methodology 

The data were sourced from the website "[Understat](https://understat.com/)". The variables used include the number of matches, goals, expected goals (xG), expected goals conceded (xGA), points, and expected points (xPoints).\
\
To analyze the data, the following methods were used:
- The **Wilcoxon signed-rank test**: a non-parametric test used to compare two paired sets of values, often before and after a treatment. The aim is to determine whether the two measurements differ significantly without assuming a specific distribution for the differences. The test yields a **p-value**: the probability of observing results at least as extreme as those in the sample under the null hypothesis that there is no effect or difference. If the p-value is small (< 0.05), the null hypothesis is rejected, which indicates an effect.
- **Cohen's d**: a measure of effect size, used to judge whether an observed difference between two groups is practically meaningful, beyond statistical significance alone.
- The **Mann-Whitney U test**: a non-parametric test used to compare two independent groups to assess whether their distributions differ significantly.
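
To make the effect-size measure concrete, here is a minimal sketch of Cohen's d. It assumes the pooled-standard-deviation variant, which may not be the variant used in the study, and the point totals are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)


# Hypothetical per-team point totals, not taken from the study
home_points = [30, 28, 35, 25, 32]
away_points = [22, 20, 27, 21, 24]

d = cohens_d(home_points, away_points)
print(f"Cohen's d = {d:.2f}")
```

A value near 0 means the two groups barely differ, while a conventional rule of thumb treats values around 0.8 or above as a large effect.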

### Key findings

The study observed that, generally, there is a notable advantage in terms of points gained when playing at home. However, during the COVID-19 seasons, this advantage diminished significantly or even reversed.

---
---

## Reproduction of the study

### Data collection

To begin our data collection process for each league and season, we utilize the [*Understat* Python package](https://understat.readthedocs.io/en/latest/). Understat is a specialized library designed to interact with the statistical data provided by the website *Understat.com*. This package allows us to programmatically fetch and analyze data related to various leagues, seasons, teams, players, and matches.

### Table

We first reproduced the table using `./reproduction/reproduce_diff_points.py`, which allows us to observe the differences in points and xPoints between seasons for all leagues. The resulting graph is `./reproduction/results/diff_points_xpoints.png`.
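
The script itself is not shown in this notebook. As a rough sketch of the kind of aggregation it performs, assuming one row per team and venue (the row layout and the values below are hypothetical), the home-minus-away differences per league and season could be computed like this:

```python
from collections import defaultdict

# Hypothetical rows: (league, season, venue, points, xpoints)
rows = [
    ("EPL", 2019, "home", 45, 41.3),
    ("EPL", 2019, "away", 30, 33.9),
    ("EPL", 2019, "home", 38, 36.2),
    ("EPL", 2019, "away", 29, 30.4),
]


def diff_by_season(rows):
    """Sum home values and subtract away values per (league, season)."""
    totals = defaultdict(lambda: {"points": 0.0, "xpoints": 0.0})
    for league, season, venue, points, xpoints in rows:
        sign = 1 if venue == "home" else -1
        totals[(league, season)]["points"] += sign * points
        totals[(league, season)]["xpoints"] += sign * xpoints
    # Round to the nearest integer for display
    return {
        key: (round(value["points"]), round(value["xpoints"]))
        for key, value in totals.items()
    }


result = diff_by_season(rows)
print(result)
```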

In [21]:
from IPython.display import display, HTML

# Build the HTML that displays the images side by side
html_code = """
<div style="display: flex; justify-content: space-between;">
    <img src="reproduction/results/diff_points_xpoints.png" style="max-width: 48%; height: auto;" />
    <img src="results_acherm/diff_points_xpoints_acherm.png" style="max-width: 48%; height: 50%;" />
</div>
"""

# Display the images side by side
display(HTML(html_code))


By comparing our table (left) with Mathieu Acher's (right), we observe the same results once rounding is handled carefully. Since xPoints are floats, the values must be rounded to the nearest integer; by default, they were rounded down.\
\
Only one season in one league (La Liga, 2019) differs from Mathieu Acher's work: the Diff xPoints is 187 with the Understat library instead of 188. This may be caused by the way the package retrieves the data.\
\
We can see a clear home advantage in the different leagues, which diminishes during the COVID period.
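
The rounding point can be illustrated with a short sketch. The value below is hypothetical and only shows how truncation and conventional rounding diverge; it does not claim to explain the La Liga 2019 gap.

```python
import math

diff_xpoints = 187.62  # hypothetical value, not taken from the study

truncated = int(diff_xpoints)       # drops the fractional part
floored = math.floor(diff_xpoints)  # same result for positive values
rounded = round(diff_xpoints)       # nearest integer

print(truncated, floored, rounded)

# Caveat: Python's round() sends exact .5 ties to the nearest even integer
tie_examples = [round(0.5), round(1.5), round(2.5)]
print(tie_examples)
```

For negative differences, `int()` truncates toward zero while `math.floor()` rounds toward minus infinity, so the two can disagree as well.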

### Graphs

We then reproduced the graphs using `./reproduction/graphs_par_ligue.py`, which allows us to observe the evolution of points earned and expected points both at home and away for all leagues from 2014 to 2020. The outputs are stored in the `./reproduction/results/evolutions_par_ligue` folder, with one graph per league.

In [4]:
from IPython.display import display, HTML

# Build the HTML that displays the images side by side on one row
html_code = """
<div style="display: flex; justify-content: space-between; flex-wrap: wrap;">

    <!-- Bundesliga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/evolutions_par_ligue/evolution_points_Bundesliga.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_Bundesliga_acherm.png" width="40%" />
    </div>

    <!-- EPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/evolutions_par_ligue/evolution_points_EPL.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_EPL_acherm.png" width="40%" />
    </div>

    <!-- La liga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/evolutions_par_ligue/evolution_points_La_liga.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_La_liga_acherm.png" width="40%" />
    </div>

    <!-- Ligue 1 -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/evolutions_par_ligue/evolution_points_Ligue_1.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_Ligue_1_acherm.png" width="40%" />
    </div>

    <!-- RFPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/evolutions_par_ligue/evolution_points_RFPL.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_RFPL_acherm.png" width="40%" />
    </div>

    <!-- Serie A -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/evolutions_par_ligue/evolution_points_Serie_A.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_Serie_A_acherm.png" width="40%" />
    </div>

</div>
"""

# Display the images side by side
display(HTML(html_code))


By comparing our graphs (left) with those of Mathieu Acher (right), we observe the same averages and, therefore, the same variations. The curves in the left and right graphs are identical.\
\
The loss of the home advantage is particularly noticeable during the COVID seasons.

### Statistical tests

#### Non-parametric Wilcoxon signed-rank test

We reproduced the Wilcoxon test with the *wilcoxon* function from the Python library and computed Cohen's d with a function we wrote ourselves. The code is in the `./reproduction/wilcoxon_with_understat.py` file and the results are saved in `./reproduction/results/reproduction_wilcoxon.png`.

In [29]:
from IPython.display import display, HTML

# Build the HTML that displays our result table
html_code = """
<img src="reproduction/results/reproduction_wilcoxon.png" style="height: auto;" />
"""

# Display our result table
display(HTML(html_code))

# Build the HTML that displays the table from the original study
html_code_img = """
<img src="results_acherm/result-sTest-Wilco.png" style="height: auto;" />
"""

# Display the table from the original study
display(HTML(html_code_img))

By comparing our results (first table) with the results of the study (second table), we observe the same results for the Wilcoxon test and Cohen's d. In Ligue 1 and EPL, there is indeed a difference between COVID and non-COVID seasons: home advantage appears to vanish during COVID. In the Bundesliga, the hybrid season has an impact on Points and xPoints. In La Liga, there seems to be no change, and in Serie A and RFPL there is a change in actual points, but not only during the COVID seasons.
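
To show what the signed-rank test computes, here is a hand-written sketch. It is not the library function used in our script: it assumes a two-sided test with a normal approximation and no continuity or tie correction, so its p-values can differ from a library's exact computation on small samples. The point totals are made up.

```python
from math import erf, sqrt


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation)."""
    # Zero differences carry no sign information and are dropped
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Group tied absolute differences and give them their average rank
        end = start
        while end + 1 < n and abs(diffs[ordered[end + 1]]) == abs(diffs[ordered[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[ordered[position]] = average_rank
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (statistic - mean_w) / sd_w
    p_value = 1 + erf(z / sqrt(2))  # equals 2 * Phi(z) since z <= 0
    return statistic, p_value


# Hypothetical home and away points for the same ten teams
home = [30, 28, 35, 25, 32, 27, 31, 29, 26, 33]
away = [22, 20, 27, 21, 24, 25, 23, 30, 19, 28]

statistic, p_value = wilcoxon_signed_rank(home, away)
print(statistic, round(p_value, 4))
```

A small statistic means that almost all paired differences point in the same direction, which is what a strong home advantage looks like.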

#### Non-parametric Mann–Whitney U test

We finally reproduced the Mann-Whitney U test with the *mannwhitneyu* function from the Python library. The code is in the `reproduction/mannwhitneyu.py` file and the results are saved in the folder `reproduction/results/tableau_ligues`, with one graph per league.

In [1]:
from IPython.display import display, HTML

# Build the HTML that displays the images side by side on one row
html_code = """
<div style="display: flex; justify-content: space-between; flex-wrap: wrap;">

    <!-- Bundesliga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/tableau_ligues/mann_whitney_Bundesliga.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_bundesliga.png" width="40%" />
    </div>

    <!-- EPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/tableau_ligues/mann_whitney_EPL.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_epl.png" width="40%" />
    </div>

    <!-- La liga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/tableau_ligues/mann_whitney_La_liga.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_liga.png" width="40%" />
    </div>

    <!-- Ligue 1 -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/tableau_ligues/mann_whitney_Ligue_1.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_ligue1.png" width="40%" />
    </div>

    <!-- RFPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/tableau_ligues/mann_whitney_RFPL.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_rfpl.png" width="40%" />
    </div>

    <!-- Serie A -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./reproduction/results/tableau_ligues/mann_whitney_Serie_A.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_seriea.png" width="40%" />
    </div>

</div>
"""

# Display the images side by side
display(HTML(html_code))


By comparing our results (left side) with the results of the study (right side), we reach the same conclusions as the study. The results of the Wilcoxon test are confirmed for Ligue 1 and EPL. La Liga now appears to be affected by COVID. In the Bundesliga, comparing seasons suggests that there is no difference between the COVID seasons and the 2015-2016 season. In Serie A, only the 2020-2021 season is affected, and only when compared with the 2015 and 2016 seasons. RFPL is not affected at all.
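
As an illustration of how two seasons can be compared as independent groups, here is a hand-written sketch of the U statistic. It is not the library function used in our script: it assumes a two-sided test with a normal approximation and no tie correction, and the per-team differences are made up.

```python
from math import erf, sqrt


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test (normal approximation)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Count the pairs where a value from sample_a beats one from sample_b;
    # ties count for one half
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    u_b = n_a * n_b - u_a
    u = min(u_a, u_b)
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = 1 + erf(z / sqrt(2))  # equals 2 * Phi(z) since z <= 0
    return u, p_value


# Hypothetical per-team home-minus-away point differences
covid_season = [-2, 1, 0, -3, 2, -1]
earlier_season = [5, 7, 3, 8, 6, 4]

u, p_value = mann_whitney_u(covid_season, earlier_season)
print(u, round(p_value, 4))
```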

---
---

Building on the initial reproduction of the study, where we carefully revisited the original analysis to check the accuracy and consistency of its findings, we now move to the replication phase. This step goes beyond verifying the original results: it applies new approaches, which may be new datasets, statistical methods, or tools. By doing so, we aim to test the robustness of the conclusions, confirm their reliability, and potentially uncover additional insights.

---
---

## Replication of the study

---

### 1st change - Data collection process : Web Scraping

As a first change to the data collection process, we opted to use web scraping instead of the Understat library. This method extracts the data directly from the Understat website, which gives us finer control over the collection and lets us tailor the extraction.

#### Table

We added the `./replicabilite/web_scraping/scrap.py` file, which allows us to retrieve all team data from *2014* to *2020* for both *home* and *away* games. The results are stored in the CSV file `./replicabilite/web_scraping/understat_team_stats_home_away.csv`.
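
The content of `scrap.py` is not reproduced in this notebook. The sketch below shows one possible approach, assuming the page embeds its data as a hex-escaped JSON string inside a `<script>` tag. Both that page layout and the variable name `teamsData` are assumptions to check against the live site, and the HTML sample is synthetic.

```python
import json
import re


def extract_embedded_json(html, var_name):
    """Decode a JSON payload written as: var <name> = JSON.parse('...')."""
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)",
        re.S,
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"variable {var_name!r} not found in page")
    # Turn escape sequences such as \x7B back into characters.
    # This simple decoding is only safe for ASCII payloads.
    decoded = match.group(1).encode("utf-8").decode("unicode_escape")
    return json.loads(decoded)


# Synthetic page fragment, built for this example only
sample_html = r"<script>var teamsData = JSON.parse('\x7B\x22id\x22\x3A\x2289\x22\x7D');</script>"

data = extract_embedded_json(sample_html, "teamsData")
print(data)
```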

We replicated the table using `./replicabilite/web_scraping/reproduce_diff_points.py`, which allows us to observe the differences in points and xPoints between seasons for all leagues. The resulting graph is `./replicabilite/web_scraping/results/diff_points_xpoints.png`.

In [26]:
from IPython.display import display, HTML

# Build the HTML that displays the images side by side
html_code = """
<div style="display: flex; justify-content: space-between;">
    <img src="replicabilite/web_scraping/results/diff_points_xpoints.png" style="max-width: 48%; height: auto;" />
    <img src="results_acherm/diff_points_xpoints_acherm.png" style="max-width: 48%; height: 50%;" />
</div>
"""

# Display the images side by side
display(HTML(html_code))


By comparing our table (left) with Mathieu Acher's (right), we observe exactly the same results. However, compared with the results of our reproduction, which used a different data collection method, there is a one-point difference. This may mean that the Python package collects the data in a different way.
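
A small helper makes this kind of discrepancy easy to locate. In the sketch below, the La Liga 2019 values (187 and 188) are the ones reported in this notebook; the 2018 entry is a hypothetical placeholder.

```python
def find_mismatches(source_a, source_b):
    """Return {key: (value_a, value_b)} for shared keys whose values differ."""
    shared_keys = source_a.keys() & source_b.keys()
    return {
        key: (source_a[key], source_b[key])
        for key in sorted(shared_keys)
        if source_a[key] != source_b[key]
    }


# Diff xPoints per (league, season); the 2018 value is made up
package_diff_xpoints = {("La liga", 2018): 150, ("La liga", 2019): 187}
scraped_diff_xpoints = {("La liga", 2018): 150, ("La liga", 2019): 188}

mismatches = find_mismatches(package_diff_xpoints, scraped_diff_xpoints)
print(mismatches)
```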


#### Graphs

We then replicated the graphs using `./replicabilite/web_scraping/graphs_par_ligue.py`, which allows us to observe the evolution of points earned and expected points both at home and away for all leagues from 2014 to 2020. The outputs are stored in the `./replicabilite/web_scraping/results/evolutions_par_ligue` folder, with one graph per league.

In [28]:
from IPython.display import display, HTML

# Build the HTML that displays the images side by side on one row
html_code = """
<div style="display: flex; justify-content: space-between; flex-wrap: wrap;">

    <!-- Bundesliga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/evolutions_par_ligue/evolution_points_Bundesliga.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_Bundesliga_acherm.png" width="40%" />
    </div>

    <!-- EPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/evolutions_par_ligue/evolution_points_EPL.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_EPL_acherm.png" width="40%" />
    </div>

    <!-- La liga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/evolutions_par_ligue/evolution_points_La_liga.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_La_liga_acherm.png" width="40%" />
    </div>

    <!-- Ligue 1 -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/evolutions_par_ligue/evolution_points_Ligue_1.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_Ligue_1_acherm.png" width="40%" />
    </div>

    <!-- RFPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/evolutions_par_ligue/evolution_points_RFPL.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_RFPL_acherm.png" width="40%" />
    </div>

    <!-- Serie A -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/evolutions_par_ligue/evolution_points_Serie_A.png" width="45%" />
        <img src="./results_acherm/evolutions_par_ligue_acherm/evolution_points_Serie_A_acherm.png" width="40%" />
    </div>

</div>
"""

# Display the images side by side
display(HTML(html_code))


By comparing our graphs (left) with those of Mathieu Acher (right), we observe the same averages and, therefore, the same variations. The curves in the left and right graphs are identical. The variations are also identical to those observed with the Python package.\
\
The loss of the home advantage is particularly noticeable during the COVID seasons.

#### Statistical tests

##### Non-parametric Wilcoxon signed-rank test

We replicated the Wilcoxon test with the *wilcoxon* function from the Python library and computed Cohen's d with a function we wrote ourselves. The code is in `./replicabilite/web_scraping/wilcoxon_web.py` and the results are saved in `./replicabilite/web_scraping/results/wilcoxon_web.png`.

In [30]:
from IPython.display import display, HTML

html_code = """
<img src="./replicabilite/web_scraping/results/wilcoxon_web.png" style="height: auto;" />
"""

display(HTML(html_code))

html_code_img = """
<img src="results_acherm/result-sTest-Wilco.png" style="height: auto;" />
"""

display(HTML(html_code_img))

**Analysis of the results with web scraping**

Statistical results show that the COVID period had variable effects:

1. **Ligue 1 and EPL**:  
   In both leagues, the advantage of playing at home is seriously questioned in the 2020 season, with significant p-values for points but not for xPoints or xG. This suggests that home advantage was diminished during the COVID period, but expected performance metrics (xPoints and xG) were not affected.

2. **Bundesliga**:  
   The hybrid 2019 season is impacted, with a significant p-value for points but no significant results for xPoints or xG. This implies a disruption in actual results during this season, potentially reflecting external factors, but not in underlying performance metrics.

3. **Serie A**:  
   The 2019 season shows a significant p-value for points, highlighting an impact on home advantage during that year. However, no significant p-values for xPoints or xG suggest that the disruption primarily affected actual results rather than expected performance.  

4. **La Liga**:  
   Home advantage seems unaffected, as there are no significant p-values across points, xPoints, or xG. This indicates a consistent pattern where playing at home still provides a tangible advantage.  

5. **RFPL (Russian Premier League)**:  
   The 2019 season is marked by a significant p-value for points, indicating an impact on home advantage during this period. However, xPoints and xG remain stable, suggesting the disruption did not extend to expected performance metrics.  

6. **xPoints and xG across all leagues**:  
   No significant p-values are found for xPoints or xG across any league. This reinforces the conclusion that these metrics are less sensitive to external factors like the COVID period compared to actual results.

**Comparison**

1. **Cohen's d**:  
   Both methods yield identical results for Cohen's d, indicating that the effect sizes are consistent regardless of the data extraction method.

2. **Wilcoxon test for points**:  
   - **Similarities**: In **Ligue 1**, **EPL**, **Bundesliga** and **La Liga**, both methods produce the same results, with significant p-values in 2020 (Ligue 1 and EPL), in 2019 (Bundesliga) and no significant p-values in La Liga.  
   - **Differences**:   
     - For **Serie A**, the first method flags significance in 2017, 2019, and 2020, while the second method only identifies 2019.  
     - For **RFPL**, the first method highlights significance in 2015, 2017, and 2019, whereas the second method only flags 2019.  

3. **Wilcoxon test for xPoints and xG**:  
   - In the **first method**, significant p-values for xPoints are found in 2020 for Ligue 1, Bundesliga, and EPL, but none for xG.  
   - In the **second method**, no significant p-values are found for either xPoints or xG in any league.  

##### Non-parametric Mann–Whitney U test

We replicated the Mann-Whitney U test with the *mannwhitneyu* function from the Python library. The code is in `replicabilite/web_scraping/mannwhitneyu.py` and the results are saved in the folder `replicabilite/web_scraping/results/tableau_ligues`, with one graph per league.

In [1]:
from IPython.display import display, HTML

# Build the HTML that displays the images side by side on one row
html_code = """
<div style="display: flex; justify-content: space-between; flex-wrap: wrap;">

    <!-- Bundesliga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/tableau_ligues/mann_whitney_Bundesliga.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_bundesliga.png" width="40%" />
    </div>

    <!-- EPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/tableau_ligues/mann_whitney_EPL.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_epl.png" width="40%" />
    </div>

    <!-- La liga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/tableau_ligues/mann_whitney_La_liga.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_liga.png" width="40%" />
    </div>

    <!-- Ligue 1 -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/tableau_ligues/mann_whitney_Ligue_1.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_ligue1.png" width="40%" />
    </div>

    <!-- RFPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/tableau_ligues/mann_whitney_RFPL.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_rfpl.png" width="40%" />
    </div>

    <!-- Serie A -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/web_scraping/results/tableau_ligues/mann_whitney_Serie_A.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_seriea.png" width="40%" />
    </div>

</div>
"""

# Display the images side by side
display(HTML(html_code))


The results obtained with the second data extraction method are identical to those from the first method. Therefore, the conclusions drawn from the initial analysis remain fully valid.

#### Conclusion with another data extraction method:

The new data extraction method provides results that align with the initial numerical comparison of points and xPoints, confirming the consistency of these indicators across methods. However, statistical tests reveal notable differences. While the Mann-Whitney U test produces identical outcomes between the two methods, the Wilcoxon test yields divergent results, which introduces a degree of uncertainty regarding the reproducibility of the study’s conclusions.  

This discrepancy highlights the sensitivity of certain statistical tests to data extraction methods and suggests that interpretations based on Wilcoxon test results should be approached with caution. Despite these differences, the overarching findings of the study remain robust, though they may require nuanced consideration in light of these methodological variations. This emphasizes the importance of methodological consistency when studying phenomena such as the impact of COVID-19 on home advantage.

---

### 2nd change - New statistical method : Repeated Measures ANOVA

The Repeated Measures ANOVA (Analysis of Variance) is a statistical method used to evaluate whether there are significant differences in performance metrics across various conditions, such as matches with and without spectators. This method is particularly suitable when the same teams are observed under different conditions, as it accounts for within-subject variability by controlling for differences within the teams themselves.

So here, the primary goal is to assess the impact of playing conditions on football team performance, specifically examining differences between home and away matches. This analysis focuses on performance metrics such as points, goals scored, expected goals (xG), and expected goals against (xGA).

The method involves the following steps:
- Collecting performance data for teams across different seasons.
- Applying Repeated Measures ANOVA to determine whether there are statistically significant differences in performance based on playing conditions (home vs. away).
- Analyzing results for various leagues and seasons to identify patterns and potential impacts of specific factors, such as the COVID-19 pandemic.

The structure of the results includes:
- League: The football league being analyzed.
- Season: The corresponding season.
- anova-F: The F-statistic from the ANOVA test.
- anova-pvalue: The p-value associated with the test, indicating statistical significance.
- anova-eta-sq: The effect size (eta-squared), representing the magnitude of the observed differences.
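
The computation behind the F-statistic and the effect size can be sketched as follows. This is a hand-written illustration, not the code used to produce our results. It reports partial eta-squared, which is an assumption about the effect-size variant, and it leaves out the p-value, which requires the F distribution. The point totals are made up.

```python
def repeated_measures_anova(conditions):
    """One-way repeated measures ANOVA.

    `conditions` is a list of equally long lists; position i in every
    list belongs to the same subject (here, the same team).
    """
    k = len(conditions)
    n = len(conditions[0])
    grand_mean = sum(sum(c) for c in conditions) / (k * n)
    condition_means = [sum(c) / n for c in conditions]
    subject_means = [sum(c[i] for c in conditions) / k for i in range(n)]

    ss_total = sum((x - grand_mean) ** 2 for c in conditions for x in c)
    ss_condition = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subject = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Removing between-team variability is what makes the design "repeated"
    ss_error = ss_total - ss_condition - ss_subject

    df_condition = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_condition / df_condition) / (ss_error / df_error)
    partial_eta_sq = ss_condition / (ss_condition + ss_error)
    return {
        "F": f_stat,
        "df": (df_condition, df_error),
        "partial_eta_sq": partial_eta_sq,
    }


# Hypothetical points for the same four teams, at home and away
home_points = [10, 12, 9, 11]
away_points = [7, 8, 8, 6]

result = repeated_measures_anova([home_points, away_points])
print(result)
```

With only two conditions (home and away), the F-statistic equals the square of the paired t-statistic, so this design is equivalent to a paired t-test.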

In [2]:
from IPython.display import display, HTML

# Build the HTML that displays the result image
html_code = """
<img src="replicabilite/new_statistical_method/results/reproduction_anova.png" style="height: auto;" />
"""

# Display the image
display(HTML(html_code))


#### Findings from the Analysis:

The results of the ANOVA across several major European football leagues from 2014 to 2020 reveal important insights into the home advantage and its variability over time.

- **Ligue 1:**
Most seasons show statistically significant differences between home and away performances (p-value < 0.05). However, the 2020 season is an outlier with a p-value of 0.9529, indicating no significant difference. This suggests that home advantage was negligible in 2020, likely due to the absence of spectators caused by the pandemic.

- **La Liga:**
Every season exhibits significant differences, with all p-values below 0.05, suggesting a consistent home advantage. Even in 2020, despite pandemic-related disruptions, the differences remained significant, implying that home advantage persisted.

- **EPL:**
All seasons except 2020 show significant differences. The 2020 season has a p-value of 0.6022, indicating no significant difference. This suggests the pandemic had a strong impact, reducing the home advantage.

- **Bundesliga:**
Significant differences are observed in every season except 2019 (p = 0.6049). The 2020 season has a p-value of 0.0276, which is still below the 0.05 threshold. The lack of significance in 2019 coincides with the hybrid season, while the 2020 result indicates that a home advantage was still statistically detectable.

- **Serie A:**
Significant differences are found in all seasons except 2019 (p = 0.2219) and 2020 (p = 0.1086). This indicates that home advantage was not significant in these years, especially during the pandemic.

- **RFPL:**
Several seasons, including 2015, 2017, 2019, and 2020, show no significant differences. This variability suggests that the home advantage fluctuated, with the pandemic further diminishing it in 2020.

#### Conclusion with the new statistical method:
The analysis demonstrates that the impact of COVID-19 on home advantage was not uniform across all leagues. Some leagues, like La Liga, maintained significant differences even during the pandemic. However, others, such as Serie A, RFPL, and the Bundesliga, exhibited a clear reduction in home advantage in 2020. The data indicates that COVID-19 had a substantial impact on football matches in 2020, neutralizing the traditional home advantage in several cases.

While the new findings largely support the previous conclusion that home advantage diminished during the COVID-19 seasons, they also introduce the same important nuances as the Wilcoxon analysis in Mathieu Acher's work. The first observation suggested a general trend of reduced home advantage across the board, but the current analysis reveals that this was not uniformly the case for all leagues. For example, La Liga maintained significant differences between home and away performances even in 2020, suggesting that the impact of the pandemic on home advantage was less pronounced there. This indicates that while the pandemic did influence home advantage in many leagues, the extent of its impact varied, which points to league-specific factors and suggests that the generalization of a diminished home advantage might need reconsideration. As Mathieu Acher's work says, "statistical results show that the COVID period had variable effect" within the leagues.

---

### 3rd change - Extended dataset : Adding the results of 2021, 2022 and 2023

In the reproduction, we used the same dataset as Mathieu Acher's study, covering the years from 2014 to 2020. To assess whether the observed trends persist beyond this period, we extended the dataset to include data from the 2021 to 2023 seasons. This extension allows us to explore whether the patterns identified in the original study, particularly the impact of external factors like the COVID-19 pandemic on home advantage, continued, diminished, or evolved in subsequent years. By incorporating this additional data, we aim to provide a more comprehensive understanding of the dynamics at play and evaluate the long-term implications of the findings.

#### Table

We added the `./replicabilite/more_seasons/scrap _2023.py` file, which allows us to retrieve all team data from 2014 to 2023 for both home and away games. The results are stored in the CSV file `./replicabilite/more_seasons/understat_team_stats_home_away.csv`.

We replicated the table using `./replicabilite/more_seasons/reproduce_diff_points_2023.py`, which allows us to observe the differences in points and xPoints between seasons for all leagues. The resulting graph is `./replicabilite/more_seasons/results/diff_points_xpoints.png`.

In [7]:
from IPython.display import display, HTML

# Build the HTML that displays the image, centered
html_code = """
<div style="display: flex; justify-content: center;">
    <img src="replicabilite/more_seasons/results/diff_points_xpoints.png" style="max-width: 60%;" />
</div>
"""

# Display the image
display(HTML(html_code))


By adding the data from the 2021, 2022, and 2023 seasons, we gain new insights into the differences in points and expected points (xPoints) between home and away matches.

In Ligue 1, after the notable dip in home advantage during the 2020 season, when the difference in points was negative (-3), the following seasons show a partial recovery. The 2021 and 2022 seasons show 'diff points' values close to, if somewhat lower than, those seen before Covid (138 and 114 points, respectively), whereas in 2023 the difference between points earned at home and away is notably lower, at only 45 points. However, the 'diff xPoints' of that season is as high as in pre-Covid seasons, which suggests that players' performance does not depend only on whether they play with or without spectators.

La Liga presents a more robust "recovery", but we have to take into account that Covid did not completely change performances there: even though the advantage of playing at home was slightly smaller in the 2020 season (135 points), it was still present. The difference in points remains relatively stable from 2021 to 2023, with values ranging from 183 to 219 points. This indicates that home advantage has recovered well after the 2020 season, nearing or surpassing pre-pandemic figures.

For the English Premier League (EPL), the 2020 season exhibited a sharp decline in home advantage with a negative difference of -27 points, a stark contrast to other leagues. However, in 2021, the difference in points improves significantly to 102, and in 2022 and 2023, the differences further increase to 225 and 156 points, respectively. This suggests a strong rebound in home advantage post-pandemic, with the situation returning to what it was before the pandemic.

In the Bundesliga, the recovery appears steady. After the decline in 2020 (99 points), the difference in points increases consistently, reaching 159 in 2021 and 177 in 2022, before slightly dropping to 129 in 2023. 

Serie A shows a more gradual recovery. The difference in points remains lower than in the pre-pandemic period, with 42 points in 2021. However, it rises to 126 points in 2022 and climbs to 150 points by 2023. This indicates a slow but steady return of home advantage, though still not matching the levels of earlier years.

Finally, the Russian Premier League (RFPL) exhibits fluctuations. After the minimal difference in 2019 (-3 points) and a recovery in 2020 (126 points), the following seasons maintain moderate differences, with 81 points in 2021, 123 in 2022, and 156 in 2023. This pattern suggests a relatively stable but moderate home advantage post-pandemic, which ends up even stronger than before Covid-19.

Overall, this suggests a gradual restoration of home advantage, although not every league has yet returned to pre-pandemic levels.

In conclusion, the extended dataset from the 2021 to 2023 seasons provides a nuanced view of the home advantage in football across different leagues post-COVID-19. While some leagues like La Liga, the EPL and the Bundesliga demonstrate a strong and swift recovery in home advantage, returning to or even surpassing pre-pandemic levels, others such as Ligue 1 and Serie A show a more gradual or incomplete rebound. The RFPL, on the other hand, exhibits fluctuations but ultimately suggests a strengthening of home advantage compared to pre-pandemic times.

#### Graphs

We then replicated the graphs using `./replicabilite/more_seasons/graphs_par_ligue_2023.py`, which allows us to observe the evolution of points earned and expected points both at home and away for all leagues from 2014 to 2023. The outputs are stored in the `./replicabilite/more_seasons/results/evolutions_par_ligue` folder, with one graph per league.

In [6]:
from IPython.display import display, HTML

# Build the HTML that displays the league graphs in a wrapping flex container
html_code = """
<div style="display: flex; justify-content: center; flex-wrap: wrap;">

    <!-- Bundesliga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/evolutions_par_ligue/evolution_points_Bundesliga.png" width="60%" />
    </div>

    <!-- EPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/evolutions_par_ligue/evolution_points_EPL.png" width="60%" />
    </div>

    <!-- La liga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/evolutions_par_ligue/evolution_points_La_liga.png" width="60%" />
    </div>

    <!-- Ligue 1 -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/evolutions_par_ligue/evolution_points_Ligue_1.png" width="60%" />
    </div>

    <!-- RFPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/evolutions_par_ligue/evolution_points_RFPL.png" width="60%" />
    </div>

    <!-- Serie A -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/evolutions_par_ligue/evolution_points_Serie_A.png" width="60%" />
    </div>

</div>
"""

# Display the images
display(HTML(html_code))


For the Bundesliga, EPL and La Liga, the evolution of points returns to its pre-Covid pattern.

As for Serie A, after a slight dip in 2021, when the advantage of playing at home remained weaker, the advantage is greater from 2022 onwards.

In Ligue 1, the evolution of points goes back to normal in 2021 and 2022, but the difference between home and away is lower in 2023, whereas the xPoints do not show that trend.

Finally, in the RFPL, the advantage of playing at home is consistently greater than in pre-pandemic seasons.


#### Statistical tests

##### Non-parametric Wilcoxon Signed-Rank test

We replicated the Wilcoxon test with the *wilcoxon* function from the SciPy library and computed Cohen's d with a function we wrote ourselves. The code is in the `./replicabilite/more_seasons/wilcoxon_with_undestat_2023.py` file and the results are saved in the file `./replicabilite/more_seasons/results/more_seasons_wilcoxon.png`.
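
The combination of the two measures can be sketched as follows. This is only an illustration under stated assumptions: the paired values are taken to be each team's home and away totals within one season, the numbers are made up, and `cohens_d_paired` is a hypothetical helper using one common paired definition (mean of the differences divided by their sample standard deviation), which may differ from the function in our script.

```python
from statistics import mean, stdev

from scipy.stats import wilcoxon

# Hypothetical per-team totals for one league and one season (not real data).
home_points = [38, 35, 30, 29, 27, 25, 22, 20]
away_points = [30, 28, 31, 20, 21, 20, 19, 10]


def cohens_d_paired(a, b):
    """Cohen's d for paired samples: mean difference over the sample SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)


# Two-sided Wilcoxon signed-rank test on the paired home/away values.
result = wilcoxon(home_points, away_points)
d = cohens_d_paired(home_points, away_points)

print(f"W = {result.statistic}, p = {result.pvalue:.4f}, Cohen's d = {d:.2f}")
```

The p-value indicates whether the paired home and away values differ, while Cohen's d gives the size of that difference independently of the number of teams.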

In [8]:
from IPython.display import display, HTML

html_code = """
<img src="./replicabilite/more_seasons/results/more_seasons_wilcoxon.png" style="height: auto;" />
"""

display(HTML(html_code))


**Interpretation of Results from the Extended Dataset**

1. **Ligue 1:**
Significant results for both Cohen’s d and Wilcoxon p-values are observed in 2020, indicating a disruption to home advantage during the COVID-19 period. Surprisingly, 2023 also shows significant results for Wilcoxon, suggesting that the hypothesis of home advantage is not verified even with the return of spectators. This season seems to be an exception, as we did not observe the same results in 2021 and 2022, but it points to the possibility that other factors, such as changes in team tactics or shifts in crowd dynamics, influenced the outcomes in 2023.

2. **EPL:**
Significant results in 2020 for both Cohen’s d and Wilcoxon p-values confirm a reduction in home advantage during the pandemic. However, no significant results are observed for 2023, suggesting a return to more balanced conditions and the re-establishment of home advantage.

3. **La Liga:**
Consistent lack of significant results across all seasons indicates that home advantage remained stable throughout the studied period, even during the COVID-19 era. This suggests that crowd presence or absence had minimal impact on performance in this league.

4. **Bundesliga:**
Significant results in 2019 for both Cohen’s d and Wilcoxon p-values highlight disruptions to home advantage during the hybrid 2019-2020 season. However, the absence of significant results for the following seasons, including 2020, suggests that the home advantage is indeed due to spectators.

5. **Serie A:**
Significant results in multiple seasons (2017, 2019, 2020, 2021) for points indicate consistent disruptions to home advantage, extending beyond the COVID-19 period. The lack of significant xPoints or xG results implies that these disruptions are likely due to league-specific dynamics rather than the absence of spectators.

6. **RFPL:**
Significant results for points in 2015, 2017, and 2019 highlight variability in home advantage that predates the COVID-19 pandemic. The absence of significant results in subsequent seasons suggests that the pandemic and the return of spectators had little to no effect on the league’s home advantage dynamics.

**Comparison with Previous Results (2014–2020)**

- In **EPL**, results from 2020 and following seasons reaffirm the conclusions of the study: home advantage was disrupted during the COVID-19 period and is back to normal after the pandemic.

- In **Ligue 1**, the results from 2020, 2021 and 2022 confirm the hypothesis: home advantage was interrupted during Covid and then returned. However, the significant results for 2023 in Ligue 1 contrast with expectations, as home advantage is still not observed despite the return of spectators. This challenges the notion that crowd presence alone dictates home advantage.

- For **Bundesliga**, the results are consistent with the earlier conclusions. The disruption observed in 2019 aligns with the hybrid nature of the season, while the following seasons indicate a return to normal dynamics.

- **La Liga** continues to exhibit stability in home advantage, confirming prior observations that the league is less affected by external factors like crowd dynamics.

- **Serie A** and **RFPL** maintain patterns of variability in home advantage across seasons, which aligns with previous findings. These results suggest that the pandemic had a limited impact on home advantage in these leagues, with disruptions more likely attributable to other league-specific factors.

**Implications for Home Advantage and the Role of Spectators**

The extended dataset through 2023 provides critical insights into the dynamics of home advantage. While leagues like EPL and Bundesliga seem to have reverted to pre-pandemic dynamics, Ligue 1’s significant results for 2023 indicate that home advantage is still not fully re-established, even with spectators present. This could suggest that the influence of crowds may vary not only by league but also by season. The disruption of home advantage during the COVID-19 period in leagues like Ligue 1 and EPL supports the hypothesis that crowd presence plays a significant role. However, the lack of significant results for La Liga and the continued variability in Serie A and RFPL suggest that other factors, such as tactics, travel conditions, fan behavior or psychological influences, also contribute significantly to home advantage.


##### Non-parametric Mann–Whitney U test

We replicated the Mann-Whitney U test with the *mannwhitneyu* function from the SciPy library. The code is in `replicabilite/more_seasons/mannwhitneyu_2023.py` and the results are saved in the folder `replicabilite/more_seasons/results/tableau_ligues`, with one graph per league.
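
The season-by-season comparison behind these tables can be sketched as a loop over every pair of seasons. This is a minimal sketch under assumptions: the per-season samples are taken to be per-team home-minus-away point differences, the values are invented, and the 0.05 threshold matches the one used throughout this report. It does not reproduce `mannwhitneyu_2023.py`.

```python
from itertools import combinations

from scipy.stats import mannwhitneyu

# Hypothetical per-team home-minus-away point differences (not real data).
diff_by_season = {
    2018: [9, 7, 12, 5, 8, 10, 6, 11],
    2019: [8, 6, 10, 7, 9, 5, 11, 13],
    2020: [1, -2, 3, 0, -1, 2, -3, 4],
}

ALPHA = 0.05  # significance threshold
p_values = {}

# Compare every pair of seasons, as in the per-league tables.
for s1, s2 in combinations(sorted(diff_by_season), 2):
    result = mannwhitneyu(
        diff_by_season[s1], diff_by_season[s2], alternative="two-sided"
    )
    p_values[(s1, s2)] = result.pvalue
    flag = "significant" if result.pvalue < ALPHA else "not significant"
    print(f"{s1} vs {s2}: p = {result.pvalue:.4f} ({flag})")
```

Setting `alternative` explicitly keeps the result independent of the library's default, which matters when the same script is run with different SciPy versions.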

In [10]:
from IPython.display import display, HTML

# Build the HTML that displays the league tables in a wrapping flex container
html_code = """
<div style="display: flex; justify-content: center; flex-wrap: wrap;">

    <!-- Bundesliga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/tableau_ligues/mann_whitney_Bundesliga.png" width="60%" />
    </div>

    <!-- EPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/tableau_ligues/mann_whitney_EPL.png" width="60%" />
    </div>

    <!-- La liga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/tableau_ligues/mann_whitney_La_liga.png" width="60%" />
    </div>

    <!-- Ligue 1 -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/tableau_ligues/mann_whitney_Ligue_1.png" width="60%" />
    </div>

    <!-- RFPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/tableau_ligues/mann_whitney_RFPL.png" width="60%" />
    </div>

    <!-- Serie A -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: center; margin-bottom: 20px">
        <img src="./replicabilite/more_seasons/results/tableau_ligues/mann_whitney_Serie_A.png" width="60%" />
    </div>

</div>
"""

# Display the images
display(HTML(html_code))


**Result analysis per league**

1. **Bundesliga**
    - The 2020 season shows significant differences from many pre-COVID seasons, confirming a sharp drop in home advantage due to pandemic conditions.
    - By 2021, home advantage began returning to levels comparable to those seen before the pandemic, even though there is no significant difference with 2020.
    - By 2022 and 2023, the lack of significant differences with pre-COVID seasons confirms a full return to normal.

2. **EPL**
    - The 2020 season stands out as a clear anomaly. Significant differences between 2020 and subsequent seasons (2021, 2022, and 2023) confirm that the conditions during 2020 were unique.
    - The lack of significant differences between 2021 and the pre-COVID seasons (2014-2019) indicates a return to normal levels of home advantage starting in 2021. This suggests that the disruptions of 2020 did not have lasting effects on the long-term trend.
    - Significant differences between 2021 and 2020 emphasize that the extreme drop in home advantage in 2020 was temporary.
    - The 2023 season showed significant differences with 2015, 2018, 2019, and 2020. This could indicate a subtle shift in home advantage dynamics compared to these specific years. However, the difference with 2020 reinforces the notion that 2023 is not part of the COVID-induced anomaly and differences with 2015, 2018, and 2019 may point to evolving factors, such as tactical changes, improved away team preparation, or other external influences.

3. **La Liga**
    - Significant differences between 2020 and multiple pre-COVID seasons highlight the strong impact of the pandemic on home advantage.
    - Differences between 2021 and 2015 indicate a gradual normalization, but residual effects from the pandemic persisted longer than in some other leagues.
    - By 2023, the lack of significant differences with pre-COVID seasons suggests that La Liga has returned to pre-pandemic home advantage levels.

4. **Ligue 1**
    - The 2020 season in Ligue 1 exhibits significant differences compared to several pre-COVID seasons. This highlights a strong reduction in home advantage during the pandemic.
    - Significant differences between 2020 and 2021, as well as between 2020 and 2023, indicate a recovery of home advantage in the seasons following the pandemic.
    - By 2023, the lack of significant differences with pre-COVID seasons indicates a return to typical home advantage levels in Ligue 1.

5. **RFPL**
    - The RFPL did not exhibit any statistically significant differences in home advantage across the seasons analyzed. This suggests a stable trend in home advantage, with no clear impact from the pandemic or other external factors.

6. **Serie A**
    - The 2020 season shows significant differences compared to 2015 and 2016, highlighting the disruptive impact of COVID-19 on home advantage. The 2021 season also shows differences with 2015, indicating that some residual effects from the pandemic may have persisted for a short time.
    - Beyond 2021, the lack of significant differences with pre-COVID seasons suggests a stabilization in home advantage, returning to typical pre-pandemic levels.

**Conclusion for all leagues**

This study shows, for most leagues, a significant disruption in home advantage in 2020, confirming the impact of pandemic conditions such as empty stadiums.
Across the EPL, Serie A, La Liga, Bundesliga, and Ligue 1, home advantage largely returned to pre-COVID levels by 2022-2023, with only minor variations in transitional periods across leagues.
Unlike the other leagues, the RFPL did not exhibit any significant differences, maintaining a stable home advantage throughout the pandemic and post-pandemic periods.
In the EPL, the 2023 season shows significant differences with pre-COVID seasons and 2020, suggesting unique dynamics not due to the presence of spectators.



#### Conclusion with an extended dataset

Overall, this analysis highlights the impact of the pandemic on home advantage: during that period, the differences between home and away results were much lower. It confirms the findings of Mathieu Acher's study. The presence of spectators affects team performance, since in the following seasons teams again scored better at home than away. However, some dips in points in 2023 may suggest that the presence of spectators is not the only influence on results.

---

### 4th change - SciPy version

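We also performed the Mann-Whitney U test using a more recent version of SciPy, in which `mannwhitneyu` performs a two-sided test by default, whereas the older default returned a one-sided p-value. One-sided tests remain available through the `alternative` parameter, but the same code run without that parameter now yields two-sided results. This methodological change has a significant impact on the conclusions we can draw.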

In [1]:
from IPython.display import display, HTML

# Build the HTML that displays, for each league, our result next to the original one
html_code = """
<div style="display: flex; justify-content: space-between; flex-wrap: wrap;">

    <!-- Bundesliga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/last_version_python/results/tableau_ligues/mann_whitney_Bundesliga.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_bundesliga.png" width="40%" />
    </div>

    <!-- EPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/last_version_python/results/tableau_ligues/mann_whitney_EPL.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_epl.png" width="40%" />
    </div>

    <!-- La liga -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/last_version_python/results/tableau_ligues/mann_whitney_La_liga.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_liga.png" width="40%" />
    </div>

    <!-- Ligue 1 -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/last_version_python/results/tableau_ligues/mann_whitney_Ligue_1.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_ligue1.png" width="40%" />
    </div>

    <!-- RFPL -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/last_version_python/results/tableau_ligues/mann_whitney_RFPL.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_rfpl.png" width="40%" />
    </div>

    <!-- Serie A -->
    <div style="display: flex; flex-direction: row; align-items: center; justify-content: space-between; margin-bottom: 20px">
        <img src="./replicabilite/last_version_python/results/tableau_ligues/mann_whitney_Serie_A.png" width="45%" />
        <img src="./results_acherm/mann_whitney_ligues/mannw_seriea.png" width="40%" />
    </div>

</div>
"""

# Display the images side by side
display(HTML(html_code))


In the original analysis, the use of one-sided tests allowed us to specifically test for a directional effect, namely whether home advantage was significantly reduced during the COVID-19 period. This approach aligns closely with the hypothesis under investigation and provides more precise results regarding the impact of the pandemic on home advantage.  

However, with the updated version of SciPy defaulting to two-sided tests, the analysis now assesses whether there is any difference (in either direction) between the distributions being compared, rather than focusing on a specific direction of change. As a result, the statistical power to detect directional effects is reduced, and in some cases, we do not reach the same conclusions as with the one-sided test. This limitation could obscure subtle directional effects, such as the reduction in home advantage, which were previously detectable.  
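
The effect of the test direction can be illustrated on made-up data. In this sketch, `alternative` is an actual parameter of `scipy.stats.mannwhitneyu`, while the two samples are hypothetical and were chosen so that the one-sided and two-sided tests fall on opposite sides of the 0.05 threshold.

```python
from scipy.stats import mannwhitneyu

# Hypothetical home-minus-away point differences (not real data).
pre_covid = [9, 7, 12, 5, 8, 10, 6, 11]
covid = [6.5, 2, 8.5, 4, 7.5, 3, 9.5, 1]

# Two-sided: is there a difference in either direction?
two_sided = mannwhitneyu(pre_covid, covid, alternative="two-sided").pvalue

# One-sided: do pre-COVID values tend to be greater than COVID values?
one_sided = mannwhitneyu(pre_covid, covid, alternative="greater").pvalue

print(f"two-sided p = {two_sided:.4f}")
print(f"one-sided p = {one_sided:.4f}")
print("one-sided significant:", one_sided < 0.05)
print("two-sided significant:", two_sided < 0.05)
```

When the observed effect goes in the hypothesized direction, the two-sided p-value is twice the one-sided one, so a comparison with a one-sided p-value between 0.025 and 0.05 loses its significance under the two-sided test.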

#### Results of the Analysis
If this analysis were conducted as a standalone study, the results would suggest the following:  

1. **Bundesliga:**  
The lack of significant p-values for comparisons between 2019 and earlier seasons (2014, 2016, and 2018) implies that the disruptions to home advantage observed in 2019 may not be as strong as hypothesized. However, significant comparisons involving 2020 remain, indicating that the COVID-19 season had a measurable impact, likely due to the absence of fans in stadiums.  

2. **EPL:**  
The absence of significant results for 2019 vs. 2014, 2020 vs. 2015, and 2020 vs. 2019 suggests a more variable effect of the pandemic. While home advantage disruptions in 2020 are evident in comparisons with earlier seasons (e.g., 2014–2018), the results point to inconsistencies in how the home advantage was affected across seasons.  

3. **La Liga:**  
The non-significant results for 2020 vs. 2015 and 2020 vs. 2019 suggest that La Liga's home advantage remained relatively stable during the COVID-19 period. These findings reinforce the idea that this league was less influenced by the absence of fans compared to others.  

4. **Ligue 1:**  
Significant results remain consistent across comparisons, indicating that home advantage was clearly disrupted during the COVID-19 period. This league shows the most robust evidence for a pandemic-related effect on home performance.  

5. **RFPL:**  
No significant differences were observed, suggesting that home advantage in the Russian Premier League was not meaningfully affected during the COVID-19 period. This aligns with findings that this league exhibits less variability in home advantage across seasons.  

6. **Serie A:**  
The lack of significant results for 2020 vs. 2015 and 2020 vs. 2016 suggests that the disruptions to home advantage in 2020 were not uniformly observed across all comparisons. Nevertheless, significant results for comparisons with other seasons (e.g., 2017–2019) indicate that the impact of the pandemic was still present, though less pronounced.  

#### Comparison with the One-Sided Test Results
When compared to the original study that used one-sided tests, several differences emerge:  

- For **Bundesliga**, the previously significant results for 2019 vs. 2014, 2019 vs. 2016, and 2019 vs. 2018 are now non-significant, weakening the evidence that the hybrid 2019 season disrupted home advantage. However, the significant results for 2020 remain, consistent with the original conclusion that COVID-19 impacted home advantage in that season.  
- For **EPL**, the loss of significance for 2019 vs. 2014, 2020 vs. 2015, and 2020 vs. 2019 diminishes the strength of conclusions about disruptions to home advantage in 2019 and parts of 2020. This suggests that the evidence for a COVID-19 effect is less consistent under two-sided tests.  
- In **La Liga**, the non-significant results for 2020 vs. 2015 and 2020 vs. 2019 reinforce the idea that home advantage was not disrupted, aligning with the conclusions of the previous statistical tests.  
- For **Ligue 1** and **RFPL**, the results remain consistent with the one-sided tests, suggesting that the conclusions drawn in the original study are robust for these leagues.  
- In **Serie A**, the new non-significant results for 2020 vs. 2015 and 2020 vs. 2016 suggest a reduced impact of COVID-19 on home advantage in those comparisons, though significant results for other seasons still support the presence of disruptions during the pandemic.  

The shift from one-sided to two-sided tests reveals that some previously identified effects might be more nuanced or less robust than initially thought.  

#### Conclusion for another version of SciPy

This analysis highlights the sensitivity of the study’s conclusions to the choice of statistical method. The original study, which relied on one-sided tests, was more aligned with the directional hypothesis (a reduction in home advantage during COVID-19). The use of two-sided tests introduces additional scrutiny by testing for deviations in both directions, leading to differences in significance for some season comparisons.  

For the replicability of the study, this indicates:  

- **Consistency of Results:** The differences observed here suggest that conclusions about the impact of COVID-19 on home advantage may vary depending on the statistical approach used. While many findings remain consistent (e.g., Ligue 1, RFPL), the variability in other leagues (e.g., Bundesliga, EPL, Serie A) underscores the importance of methodological consistency for replicability.  
- **Reevaluation of Findings:** The results from two-sided tests indicate that some effects initially observed might not be as robust as previously thought. This necessitates caution in generalizing findings and emphasizes the need for replication with different statistical methods to validate conclusions.  

In summary, while the original conclusions remain valid in many cases, the differences highlighted here demonstrate that methodological choices significantly influence outcomes. This methodological change highlights the importance of consistent statistical tools in longitudinal studies and underlines how updates to software packages can influence the replicability of results and conclusions. Care must be taken to state the alternative hypothesis explicitly and to adapt interpretations to the behavior of the software version in use.

---

### Conclusion of replicability



Through the replication of the study, we have been able to improve the quality of our analysis by testing different changes to the protocol. Our findings reinforce the initial hypothesis: in general, the presence of supporters has a significant impact on team performance. The methods we applied provided clearer evidence that, during the unique circumstances of the pandemic, when no supporters were present, home advantage was lower than usual. This shows that the advantage is strongly influenced by whether or not fans are in the stadium.

That said, there’s still room to explore other methods that could further enhance our understanding. For instance, using a different dataset might help confirm if the results hold up across other leagues or periods. Another promising avenue could be shifting the focus of our statistical tests from individual matches to team-level performance. This might reveal new patterns or insights about how specific teams adapt to changes in crowd presence.

In summary, while this replication effort has successfully confirmed key findings and improved the analysis, it also highlights opportunities for further research. Testing additional methods could provide even deeper insights into the role that supporters play in influencing game outcomes, making this a valuable area for ongoing investigation.