***Predicting GDP/GNP Growth With Football-related Metrics***

***Primary Research Question***

To what extent can football-related metrics (e.g. FIFA rankings, tournament participation, football infrastructure in a country) predict year-over-year GDP and GNP growth across countries where football is the biggest sport?

***Organization:***

**Step 1: Defining Footballing Countries**

- First we define countries where football is culturally significant

**Step 2: Data Collection**

- Football Metrics
- Economic Metrics
    
**Step 3: Data Preprocessing**

- Cleaning and merging data into one dataset with GDP and GNP as target variables
    
**Step 4: Predictive Analysis**

- Regression model to predict GDP/GNP
- Classification model to predict whether GDP/GNP will increase or decrease
    
**Step 5: Analysis**

- Can football-related metrics be significant predictors of GDP growth?
- Random Forest can help find important features through its impurity-based feature importances (the split-quality measure is set by its `criterion` hyperparameter)
    
**Step 6: Findings**

- Key takeaways
- Findings for future research

***Step 1: Defining Footballing Countries***

In the first step of our project, we find the top 10 countries where football is the most popular sport. 

For this metric, we build upon research done by [Ticketgum](https://www.ticketgum.com/blog/most-football-crazy-countries). To find the countries that are the craziest about football, Ticketgum scored 42 countries on variables including:

- Number of football stadiums and their capacities
- Match attendance rates
- Total market value
- Interest in the World Cup
- Domestic broadcast deal values

Ticketgum combined these variables into a weighted-average index, scored out of 10, for each of the 42 countries. The top 10 countries in that ranking were:

1. England (8.37 out of 10)
2. Spain (7.83 out of 10)
3. Germany (7.83 out of 10)
4. Brazil (7.60 out of 10)
5. Italy (7.52 out of 10)
6. Argentina (7.17 out of 10)
7. Mexico (6.71 out of 10)
8. United States (6.71 out of 10)
9. France (6.63 out of 10)
10. Saudi Arabia (5.74 out of 10)
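
A weighted-average index of this kind can be sketched as follows. This is only an illustration of the technique: the country names, indicator scores, and weights are made up, and Ticketgum's actual weights and normalisation are not given in the source.

```python
# Hypothetical indicator scores, assumed to be already normalised to a 0-10 scale.
# These values and weights are illustrative only; they are NOT Ticketgum's data.
scores = {
    "Country A": {"stadiums": 9.0, "attendance": 8.0, "market_value": 9.5,
                  "world_cup_interest": 7.0, "broadcast_deal": 9.0},
    "Country B": {"stadiums": 6.0, "attendance": 7.0, "market_value": 5.0,
                  "world_cup_interest": 9.0, "broadcast_deal": 4.0},
}
weights = {"stadiums": 0.25, "attendance": 0.20, "market_value": 0.25,
           "world_cup_interest": 0.15, "broadcast_deal": 0.15}


def weighted_index(indicators, weights):
    """Weighted average of the indicators; dividing by the weight sum
    keeps the result on the same 0-10 scale even if weights do not sum to 1."""
    total_weight = sum(weights.values())
    return sum(indicators[name] * w for name, w in weights.items()) / total_weight


index = {country: round(weighted_index(vals, weights), 2)
         for country, vals in scores.items()}

# Sort countries from highest to lowest index value.
ranking = sorted(index.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```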

We use this ranking to define the countries where football is the most popular sport because of the criteria behind it. Ticketgum's criteria (stadiums, attendance, market value, World Cup interest, and broadcast deals) do not overlap with the football metrics we use as predictors in the following analysis, so the countries are not selected on the same variables we later model.
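
For the later steps it may help to keep this top 10 as a lookup and filter each data set against it. The sketch below assumes the country column is called `Team`, as in the FIFA ranking files; the helper name `keep_football_countries` is hypothetical. Country names can be spelled differently across data sets (the United States may appear as "USA", for example), so the names would need to be harmonised first.

```python
import pandas as pd

# Top 10 countries from the Ticketgum index, with their scores out of 10.
FOOTBALL_COUNTRIES = {
    "England": 8.37, "Spain": 7.83, "Germany": 7.83, "Brazil": 7.60,
    "Italy": 7.52, "Argentina": 7.17, "Mexico": 6.71,
    "United States": 6.71, "France": 6.63, "Saudi Arabia": 5.74,
}


def keep_football_countries(df, country_col="Team"):
    """Return only the rows whose country is in the top-10 list."""
    mask = df[country_col].isin(list(FOOTBALL_COUNTRIES))
    return df[mask].reset_index(drop=True)


# Small sample using three rows from the 2024 FIFA ranking.
sample = pd.DataFrame({
    "Team": ["Argentina", "France", "San Marino"],
    "Total Points": [1883.50, 1859.85, 737.04],
})
print(keep_football_countries(sample))
```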

***Step 2: Data Collection***

**Football Metrics**


In [20]:
!pip install pandas
import pandas as pd

In [3]:
# 2024
# Load your dataset 
fifa_ranking_2024 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2024.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2024['Team'] = fifa_ranking_2024['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2024 Cleaned.csv'
fifa_ranking_2024.to_csv(cleaned_file_path, index=False)

fifa_ranking_2024 = fifa_ranking_2024.drop(columns=['Unnamed: 6', 'Unnamed: 7', '+/-', 'Match window'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2024)

      RK                      Team  Total Points  Previous Points
0      1                 Argentina       1883.50          1889.02
1      2                    France       1859.85          1851.92
2      3                     Spain       1844.33          1836.42
3      4                   England       1807.83          1817.28
4      5                    Brazil       1784.37          1772.02
..   ...                       ...           ...              ...
205  206  Turks and Caicos Islands        803.98           802.81
206  207    British Virgin Islands        780.30           790.63
207  208         US Virgin Islands        779.71           792.15
208  209                  Anguilla        769.31           782.08
209  210                San Marino        737.04           746.05

[210 rows x 4 columns]
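
To see what the regular expression in the cleaning step removes, here is a small sketch on made-up strings. The raw format (country name, then a three-letter code, then a URL) is an assumption based on the code comments, and the `example.com` URLs are placeholders. The extra `.str.strip()` is an optional addition that removes any stray whitespace left behind.

```python
import pandas as pd

# Assumed raw format: "<name><3-letter code> <url>". The URLs are placeholders.
raw = pd.Series([
    "ArgentinaARG https://example.com/arg.png",
    "Turks and Caicos IslandsTCA https://example.com/tca.png",
    "USAUSA https://example.com/usa.png",
])

# Remove a three-letter uppercase code that is followed by optional
# whitespace and a URL, then trim surrounding whitespace.
cleaned = raw.str.replace(r'[A-Z]{3}\s*https.*', '', regex=True).str.strip()
print(cleaned.tolist())
```

Because the pattern requires `https` right after the code, a name that is itself three uppercase letters (such as "USA") is kept and only the trailing code is removed.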


In [4]:
# 2023
# Load your dataset 
fifa_ranking_2023 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2023.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2023['Team'] = fifa_ranking_2023['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2023 Cleaned.csv'
fifa_ranking_2023.to_csv(cleaned_file_path, index=False)

fifa_ranking_2023 = fifa_ranking_2023.drop(columns=['+/-', 'Match window', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2023)

      RK                      Team  Total Points  Previous Points
0      1                 Argentina       1855.20          1855.20
1      2                    France       1845.44          1845.44
2      3                   England       1800.05          1800.05
3      4                   Belgium       1798.46          1798.46
4      5                    Brazil       1784.09          1784.09
..   ...                       ...           ...              ...
201  202                   Bahamas        835.81           835.81
202  203             Liechtenstein        833.01           833.01
203  204                 Sri Lanka        822.03           822.03
204  205                      Guam        821.91           821.91
205  206  Turks and Caicos Islands        818.57           818.57

[206 rows x 4 columns]


In [5]:
# 2022
# Load your dataset 
fifa_ranking_2022 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2022.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2022['Team'] = fifa_ranking_2022['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2022 Cleaned.csv'
fifa_ranking_2022.to_csv(cleaned_file_path, index=False)

fifa_ranking_2022 = fifa_ranking_2022.drop(columns=['+/-', 'Match window', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2022)

      RK                    Team  Total Points  Previous Points
0      1                  Brazil       1840.77          1841.30
1      2               Argentina       1838.38          1773.88
2      3                  France       1823.39          1759.78
3      4                 Belgium       1781.30          1816.71
4      5                 England       1774.19          1728.47
..   ...                     ...           ...              ...
206  207               Sri Lanka        825.25           825.25
207  208       US Virgin Islands        823.97           823.97
208  209  British Virgin Islands        809.32           809.32
209  210                Anguilla        790.74           790.74
210  211              San Marino        763.15           762.22

[211 rows x 4 columns]


In [6]:
# 2021
# Load your dataset 
fifa_ranking_2021 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2021.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2021['Team'] = fifa_ranking_2021['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2021 Cleaned.csv'
fifa_ranking_2021.to_csv(cleaned_file_path, index=False)

fifa_ranking_2021 = fifa_ranking_2021.drop(columns=['+/-', 'Match window', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2021)

      RK                    Team  Total Points  Previous Points
0      1                 Belgium       1828.45          1828.45
1      2                  Brazil       1826.35          1826.35
2      3                  France       1786.15          1786.15
3      4                 England       1755.52          1755.52
4      5               Argentina       1750.51          1750.51
..   ...                     ...           ...              ...
205  206                    Guam        838.33           838.33
206  207       US Virgin Islands        816.13           816.13
207  208  British Virgin Islands        812.94           812.94
208  209                Anguilla        792.34           792.34
209  210              San Marino        780.33           780.33

[210 rows x 4 columns]


In [7]:
# 2020
# Load your dataset 
fifa_ranking_2020 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2020.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2020['Team'] = fifa_ranking_2020['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2020 Cleaned.csv'
fifa_ranking_2020.to_csv(cleaned_file_path, index=False)

fifa_ranking_2020 = fifa_ranking_2020.drop(columns=['+/-', 'Match window', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2020)

      RK                    Team  Total Points  Previous Points
0      1                 Belgium          1780             1780
1      2                  France          1755             1755
2      3                  Brazil          1743             1743
3      4                 England          1670             1670
4      5                Portugal          1662             1662
..   ...                     ...           ...              ...
205  206               Sri Lanka           853              853
206  207       US Virgin Islands           844              844
207  208  British Virgin Islands           842              842
208  209                Anguilla           821              821
209  210              San Marino           810              810

[210 rows x 4 columns]


In [8]:
# 2019
# Load your dataset 
fifa_ranking_2019 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2019.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2019['Team'] = fifa_ranking_2019['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2019 Cleaned.csv'
fifa_ranking_2019.to_csv(cleaned_file_path, index=False)

fifa_ranking_2019 = fifa_ranking_2019.drop(columns=['+/-', 'Match window', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2019)

      RK                    Team  Total Points  Previous Points
0      1                 Belgium          1765             1765
1      2                  France          1733             1733
2      3                  Brazil          1712             1712
3      4                 England          1661             1661
4      5                 Uruguay          1645             1645
..   ...                     ...           ...              ...
205  205                 Eritrea           856              856
206  207       US Virgin Islands           844              844
207  208  British Virgin Islands           842              842
208  209              San Marino           824              824
209  210                Anguilla           821              821

[210 rows x 4 columns]


In [9]:
# 2018
# Load your dataset 
fifa_ranking_2018 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2018.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2018['Team'] = fifa_ranking_2018['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2018 Cleaned.csv'
fifa_ranking_2018.to_csv(cleaned_file_path, index=False)

fifa_ranking_2018 = fifa_ranking_2018.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2018)

      RK                      Team  Total Points  Previous Points
0      1                   Belgium          1727             1727
1      2                    France          1726             1726
2      3                    Brazil          1676             1676
3      4                   Croatia          1634             1634
4      5                   England          1631             1631
..   ...                       ...           ...              ...
206  207    British Virgin Islands           867              867
207  208  Turks and Caicos Islands           864              864
208  208                  Anguilla           864              864
209  210                   Bahamas           858              858
210  211                San Marino           854              854

[211 rows x 4 columns]


In [10]:
# 2017
# Load your dataset 
fifa_ranking_2017 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2017.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2017['Team'] = fifa_ranking_2017['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2017 Cleaned.csv'
fifa_ranking_2017.to_csv(cleaned_file_path, index=False)

fifa_ranking_2017 = fifa_ranking_2017.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2017)

      RK       Team  Total Points  Previous Points
0      1    Germany          1602             1602
1      2     Brazil          1483             1483
2      3   Portugal          1358             1358
3      4  Argentina          1348             1348
4      5    Belgium          1325             1325
..   ...        ...           ...              ...
206  206    Bahamas             0                0
207  206    Eritrea             0                0
208  206  Gibraltar             0                0
209  206    Somalia             0                0
210  206      Tonga             0                0

[211 rows x 4 columns]


In [11]:
# 2016
# Load your dataset 
fifa_ranking_2016 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2016.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2016['Team'] = fifa_ranking_2016['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2016 Cleaned.csv'
fifa_ranking_2016.to_csv(cleaned_file_path, index=False)

fifa_ranking_2016 = fifa_ranking_2016.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2016)

      RK       Team  Total Points  Previous Points
0      1  Argentina          1634             1634
1      2     Brazil          1544             1544
2      3    Germany          1433             1433
3      4      Chile          1404             1404
4      5    Belgium          1368             1368
..   ...        ...           ...              ...
206  205   Djibouti             0                0
207  205    Somalia             0                0
208  205  Gibraltar             0                0
209  205      Tonga             0                0
210  205    Eritrea             0                0

[211 rows x 4 columns]


In [12]:
# 2015
# Load your dataset 
fifa_ranking_2015 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2015.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2015['Team'] = fifa_ranking_2015['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2015 Cleaned.csv'
fifa_ranking_2015.to_csv(cleaned_file_path, index=False)

fifa_ranking_2015 = fifa_ranking_2015.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2015)

      RK       Team  Total Points  Previous Points
0      1    Belgium          1494             1440
1      2  Argentina          1455             1383
2      3      Spain          1370             1287
3      4    Germany          1347             1388
4      5      Chile          1273             1288
..   ...        ...           ...              ...
204  204    Bahamas             0                0
205  204   Djibouti             0                0
206  204    Eritrea             0                8
207  204    Somalia             0                6
208  204      Tonga             0               17

[209 rows x 4 columns]


In [13]:
# 2014
# Load your dataset 
fifa_ranking_2014 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2014.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2014['Team'] = fifa_ranking_2014['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2014 Cleaned.csv'
fifa_ranking_2014.to_csv(cleaned_file_path, index=False)

# Drop unnecessary columns
fifa_ranking_2014 = fifa_ranking_2014.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2014)

      RK            Team  Total Points  Previous Points
0      1         Germany          1725             1725
1      2       Argentina          1538             1538
2      3        Colombia          1450             1450
3      4         Belgium          1417             1417
4      5     Netherlands          1374             1374
..   ...             ...           ...              ...
204  205  Cayman Islands             5                5
205  206        Djibouti             4                5
206  206    Cook Islands             4                4
207  208        Anguilla             2                2
208  209          Bhutan             0                0

[209 rows x 4 columns]


In [14]:
# 2013
# Load your dataset 
fifa_ranking_2013 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2013.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2013['Team'] = fifa_ranking_2013['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2013 Cleaned.csv'
fifa_ranking_2013.to_csv(cleaned_file_path, index=False)

# Drop unnecessary columns
fifa_ranking_2013 = fifa_ranking_2013.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2013)

      RK                      Team  Total Points  Previous Points
0      1                     Spain          1507             1507
1      2                   Germany          1318             1318
2      3                 Argentina          1251             1251
3      4                  Colombia          1200             1200
4      5                  Portugal          1172             1172
..   ...                       ...           ...              ...
204  205              Cook Islands             5                5
205  206                  Anguilla             3                3
206  207                    Bhutan             0                0
207  207                San Marino             0                0
208  207  Turks and Caicos Islands             0                0

[209 rows x 4 columns]


In [15]:
# 2012
# Load your dataset 
fifa_ranking_2012 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2012.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2012['Team'] = fifa_ranking_2012['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2012 Cleaned.csv'
fifa_ranking_2012.to_csv(cleaned_file_path, index=False)

# Drop unnecessary columns
fifa_ranking_2012 = fifa_ranking_2012.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2012)

      RK                      Team  Total Points  Previous Points
0      1                     Spain          1606             1564
1      2                   Germany          1437             1421
2      3                 Argentina          1290             1349
3      4                     Italy          1165             1169
4      5                  Colombia          1164             1110
..   ...                       ...           ...              ...
204  205                  Anguilla             4                4
205  206                Mauritania             3                3
206  207                    Bhutan             0                0
207  207                San Marino             0                0
208  207  Turks and Caicos Islands             0                0

[209 rows x 4 columns]


In [16]:
# 2011
# Load your dataset 
fifa_ranking_2011 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2011.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2011['Team'] = fifa_ranking_2011['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2011 Cleaned.csv'
fifa_ranking_2011.to_csv(cleaned_file_path, index=False)

# Drop unnecessary columns
fifa_ranking_2011 = fifa_ranking_2011.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2011)

      RK         Team  Total Points  Previous Points
0      1        Spain          1564             1564
1      2  Netherlands          1365             1365
2      3      Germany          1345             1345
3      4      Uruguay          1309             1309
4      5      England          1173             1173
..   ...          ...           ...              ...
203  204   Mauritania             4                4
204  205  Timor-Leste             2                2
205  206      Andorra             0                0
206  206   Montserrat             0                0
207  206   San Marino             0                0

[208 rows x 4 columns]


In [17]:
# 2010
# Load your dataset 
fifa_ranking_2010 = pd.read_csv('Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2010.csv')

# Step 1: Clean the "Team" column to extract only the country name
# Remove extra text such as codes (e.g., ARG) and URLs
fifa_ranking_2010['Team'] = fifa_ranking_2010['Team'].str.replace(r'[A-Z]{3}\s*https.*', '', regex=True)

# Step 2: Save the cleaned DataFrame to a new CSV file
cleaned_file_path = 'Data sets/Fifa Rankings (2010-2024)/FIFA Ranking 2010 Cleaned.csv'
fifa_ranking_2010.to_csv(cleaned_file_path, index=False)

# Drop unnecessary columns
fifa_ranking_2010 = fifa_ranking_2010.drop(columns=['+/-', 'More'])

# Step 3: Display the cleaned dataset
print(fifa_ranking_2010)

      RK                 Team  Total Points  Previous Points
0      1                Spain          1887             1920
1      2          Netherlands          1723             1718
2      3              Germany          1485             1489
3      4               Brazil          1446             1493
4      5            Argentina          1338             1353
..   ...                  ...           ...              ...
202  203             Anguilla             0                0
203  203       American Samoa             0                0
204  203           San Marino             0                0
205  203           Montserrat             0                0
206  208  Sao Tome e Principe             0                0

[207 rows x 4 columns]
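
The fifteen yearly cells above repeat the same steps, so they could be folded into one helper and a loop. The sketch below is one possible way to do that, not the notebook's actual code: the names `clean_ranking` and `load_all_rankings` are hypothetical, and it selects the four columns shown in the outputs instead of dropping the year-specific extras, which sidesteps the fact that different years carry different unwanted columns. It also adds a `Year` column so the yearly frames can be stacked.

```python
from pathlib import Path

import pandas as pd

RANKING_DIR = Path("Data sets/Fifa Rankings (2010-2024)")
KEEP_COLUMNS = ["RK", "Team", "Total Points", "Previous Points"]


def clean_ranking(df, year):
    """Clean one year's ranking table and tag it with its year."""
    out = df.copy()
    # Strip the three-letter code and URL that trail the country name.
    out["Team"] = (out["Team"]
                   .str.replace(r"[A-Z]{3}\s*https.*", "", regex=True)
                   .str.strip())
    # Keep only the shared columns, whatever extras this year's file has.
    return out[KEEP_COLUMNS].assign(Year=year)


def load_all_rankings(years=range(2010, 2025)):
    """Read, clean, and stack every yearly ranking file (requires the CSVs)."""
    frames = [clean_ranking(pd.read_csv(RANKING_DIR / f"FIFA Ranking {y}.csv"), y)
              for y in years]
    return pd.concat(frames, ignore_index=True)


# Demonstration on a made-up frame shaped like a raw file (placeholder URL).
demo = pd.DataFrame({
    "RK": [1], "Team": ["SpainESP https://example.com/esp.png"],
    "Total Points": [1887], "Previous Points": [1920],
    "+/-": [0], "More": [None],
})
print(clean_ranking(demo, 2010))
```

With the CSV files in place, `rankings = load_all_rankings()` would return a single long table covering 2010 to 2024.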


In [38]:
#Loading data for number of players produced by countries in Europe's top 5 leagues 
#From 2010
players2010 = pd.read_csv("Data sets/NumberPlayers/2010-2011.csv")
players2010['Nation'] = players2010['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2010.head())

#2011
players2011 = pd.read_csv("Data sets/NumberPlayers/2011-2012.csv")
players2011['Nation'] = players2011['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2011.head())

#2012
players2012 = pd.read_csv("Data sets/NumberPlayers/2012-2013.csv")
players2012['Nation'] = players2012['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2012.head())

#2013
players2013 = pd.read_csv("Data sets/NumberPlayers/2013-2014.csv")
players2013['Nation'] = players2013['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2013.head())

#2014
players2014 = pd.read_csv("Data sets/NumberPlayers/2014-2015.csv")
players2014['Nation'] = players2014['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2014.head())

#2015
players2015 = pd.read_csv("Data sets/NumberPlayers/2015-2016.csv")
players2015['Nation'] = players2015['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2015.head())

#2016
players2016 = pd.read_csv("Data sets/NumberPlayers/2016-2017.csv")
players2016['Nation'] = players2016['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2016.head())

#2017
players2017 = pd.read_csv("Data sets/NumberPlayers/2017-2018.csv")
players2017['Nation'] = players2017['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2017.head())

#2018
players2018 = pd.read_csv("Data sets/NumberPlayers/2018-2019.csv")
players2018['Nation'] = players2018['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2018.head())

#2019
players2019 = pd.read_csv("Data sets/NumberPlayers/2019-2020.csv")
players2019['Nation'] = players2019['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2019.head())

#2020
players2020 = pd.read_csv("Data sets/NumberPlayers/2020-2021.csv")
players2020['Nation'] = players2020['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2020.head())

#2021
players2021 = pd.read_csv("Data sets/NumberPlayers/2021-2022.csv")
players2021['Nation'] = players2021['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2021.head())

#2022
players2022 = pd.read_csv("Data sets/NumberPlayers/2022-2023.csv")
players2022['Nation'] = players2022['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2022.head())

#2023
players2023 = pd.read_csv("Data sets/NumberPlayers/2023-2024.csv")
players2023['Nation'] = players2023['Nation'].apply(lambda x: ' '.join(x.split()[1:]))
print(players2023.head())

   Rk   Nation  # Players      Min
0   1    Spain        366  482,043
1   2   France        315  479,081
2   3    Italy        295  432,371
3   4  Germany        214  307,004
4   5  England        192  262,464
   Rk   Nation  # Players      Min
0   1    Spain        371  498,299
1   2   France        309  471,749
2   3    Italy        297  415,552
3   4  Germany        223  308,675
4   5  England        195  274,593
   Rk   Nation  # Players      Min
0   1    Spain        367  517,994
1   2   France        316  477,803
2   3    Italy        286  390,798
3   4  Germany        229  305,124
4   5  England        179  235,848
   Rk   Nation  # Players      Min
0   1    Spain        362  526,958
1   2   France        298  435,352
2   3    Italy        277  385,095
3   4  Germany        254  314,257
4   5  England        164  233,055
   Rk   Nation  # Players      Min
0   1    Spain        415  526,786
1   2    Italy        361  359,181
2   3   France        346  422,882
3   4  Germany      
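
The player-count files can be loaded the same way in a loop. This sketch makes two assumptions: that the raw `Nation` values start with a short code followed by the country name (which is what dropping the first token in the cell above implies), and that `Min` is read as text with thousands separators, as the printed output suggests. The helper names and the `Season` column are hypothetical.

```python
from pathlib import Path

import pandas as pd

PLAYERS_DIR = Path("Data sets/NumberPlayers")


def clean_players(df, season_start):
    """Clean one season's table and tag it with the season's starting year."""
    out = df.copy()
    # Drop the first whitespace-separated token (assumed to be a country code).
    out["Nation"] = out["Nation"].str.split().str[1:].str.join(" ")
    # Turn minutes such as "482,043" into integers.
    out["Min"] = (out["Min"].astype(str)
                  .str.replace(",", "", regex=False)
                  .astype(int))
    return out.assign(Season=season_start)


def load_all_players(first=2010, last=2023):
    """Read and stack the files 2010-2011.csv ... 2023-2024.csv (requires the CSVs)."""
    frames = [clean_players(pd.read_csv(PLAYERS_DIR / f"{y}-{y + 1}.csv"), y)
              for y in range(first, last + 1)]
    return pd.concat(frames, ignore_index=True)


# Demonstration on a made-up frame; the "es"/"cz" prefixes are assumed codes.
demo_players = pd.DataFrame({
    "Rk": [1, 2],
    "Nation": ["es Spain", "cz Czech Republic"],
    "# Players": [366, 10],
    "Min": ["482,043", "9,000"],
})
print(clean_players(demo_players, 2010))
```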

**Economic Metrics**

In [21]:
# Loading GDP data (2010-2023)
gdp = pd.read_csv("Data sets/GDP/GDP.csv")

# Dropping unnecessary columns
gdp = gdp.drop(columns=['Country Code', 'Indicator Name'])

# Setting float format globally to display full digits
pd.set_option('display.float_format', '{:.0f}'.format)

# Displaying the first few rows of the dataset
print(gdp.head())

FileNotFoundError: [Errno 2] No such file or directory: 'Data sets/GDP/GDP.csv'

In [22]:
# Loading GNI data (2010-2023)
gni = pd.read_csv("Data sets/GNIPerCapita/GNI Per Capita.csv")

# Dropping unnecessary columns
gni = gni.drop(columns=['Country Code', 'Indicator Name'])


# Displaying the first few rows of the dataset
print(gni.head())

FileNotFoundError: [Errno 2] No such file or directory: 'Data sets/GNIPerCapita/GNI Per Capita.csv'
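
Since the research question is about year-over-year growth, the economic tables will need to be turned from levels into growth rates before modelling. The sketch below shows one way to do that, assuming the files are in a wide layout with a `Country Name` column and one column per year (a guess based on the dropped `Country Code` and `Indicator Name` columns). The function name `to_growth` is hypothetical and the numbers are illustrative, not real GDP figures.

```python
import pandas as pd

# Illustrative wide-format table; the values are made up.
wide = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "2010": [100.0, 200.0],
    "2011": [110.0, 190.0],
    "2012": [121.0, 209.0],
})


def to_growth(wide_df, value_name):
    """Reshape to one row per country-year and add year-over-year growth."""
    long_df = wide_df.melt(id_vars="Country Name",
                           var_name="Year", value_name=value_name)
    long_df["Year"] = long_df["Year"].astype(int)
    long_df = (long_df.sort_values(["Country Name", "Year"])
               .reset_index(drop=True))
    # Fractional change from the previous year, computed within each country;
    # the first year of each country has no previous value and stays NaN.
    long_df[f"{value_name} Growth"] = (
        long_df.groupby("Country Name")[value_name].pct_change()
    )
    return long_df


gdp_growth = to_growth(wide, "GDP")
print(gdp_growth)
```

The long layout (one row per country and year) is also the shape that merges most directly with the yearly football tables on country and year.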