<a href="https://colab.research.google.com/github/arielwendichansky/Copa-America-Model-2024/blob/main/Copa_America_final_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**CopaAnalyzer: Unveiling the next Copa America Champion with ML**

In the heart of every passionate football enthusiast lies an insatiable desire to witness the beautiful game's unpredictable drama unfold on the grandest stages. As an Argentine, my love for football runs deep, particularly when the pride of my nation takes center stage, led by the GOAT Lionel Messi. The 2022 World Cup victory only fueled my passion and sparked a new endeavor: a predictive model that forecasts the next Copa America champion.

CopaAnalyzer uses logistic regression to peer into the upcoming tournament's outcome. Drawing on three primary data sources (historical match data since 2020, FIFA rankings, and team skill), the model aims to unravel the intricacies of South American football dynamics.

This project serves as the culmination of my journey through a data analytics boot camp, showcasing the application of tools and techniques acquired along the way. From data wrangling to model evaluation, each step reflects a commitment to harnessing the power of data to unlock footballing insights.

Join me as we dive into the realm of CopaAnalyzer, where analytics meets anticipation, and together, let's celebrate the enduring magic of football.

Let the ball start rolling!

# Installs

In [None]:
! pip install selenium
! pip install requests beautifulsoup4 pandas

Collecting selenium
  Downloading selenium-4.21.0-py3-none-any.whl (9.5 MB)
Collecting trio~=0.17 (from selenium)
  Downloading trio-0.25.1-py3-none-any.whl (467 kB)
Collecting trio-websocket~=0.9 (from selenium)
  Downloading trio_websocket-0.11.1-py3-none-any.whl (17 kB)
Collecting outcome (from trio~=0.17->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl (10 kB)
Collecting wsproto>=0.14 (from trio-websocket~=0.9->selenium)
  Downloading wsproto-1.2.0-py3-none-any.whl (24 kB)
Collecting h11<1,>=0.9.0 (from wsproto>=0.14->trio-websocket~=0.9->selenium)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)

In [None]:
# Library to scrape data from several football websites (installs it, or upgrades it if already present)
! pip install --upgrade LanusStats

Collecting LanusStats
  Downloading LanusStats-1.5.1-py3-none-any.whl (27 kB)
Collecting mplsoccer (from LanusStats)
  Downloading mplsoccer-1.2.4-py3-none-any.whl (79 kB)
Collecting bs4 (from LanusStats)
  Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4, mplsoccer, LanusStats
Successfully installed LanusStats-1.5.1 bs4-0.0.2 mplsoccer-1.2.4


In [None]:
# Gemini model
!pip install -q -U google-generativeai

# Libraries

In [None]:
# To know today's date
from datetime import date

# To handle the data
import pandas as pd
import numpy as np
import random
import joblib

# For web scraping
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
import requests as re  # note: this alias shadows the name of the standard-library `re` (regex) module
import LanusStats as ls

# To visualize the data
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# To preprocess the data and divide the data
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Metrics
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix, roc_auc_score

# Machine learning model
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Gemini packages
import pathlib
import textwrap
import google.generativeai as genai
from IPython.display import display
from IPython.display import Markdown
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

# Web Scraping

Disclaimer: the website the data is scraped from is updated continuously, so the extracted data may change from one run to the next.
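Because the page keeps changing, one way to make a run reproducible is to save a dated copy of the HTML the first time it is downloaded and reuse that copy afterwards. The sketch below is only an illustration under assumptions: `cached_fetch`, the `html_snapshots` folder and the file-naming scheme are hypothetical, and the download itself is passed in as a function (for example `lambda u: re.get(u, timeout=30).text`, where `re` is this notebook's alias for `requests`), so the helper only needs the standard library.

```python
import hashlib
from datetime import date
from pathlib import Path


def cached_fetch(url, fetch, cache_dir="html_snapshots", today=None):
    """Return the HTML for `url`, downloading it at most once per day.

    `fetch` is any callable that takes a URL and returns the page text.
    The first call on a given day stores the text in `cache_dir`; later calls
    for the same URL and day read the stored copy instead of the live page.
    """
    today = today or date.today()
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)

    # File name = snapshot date + short hash of the URL (safe on any filesystem)
    url_key = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    snapshot = folder / f"{today.isoformat()}_{url_key}.html"

    if snapshot.exists():
        return snapshot.read_text(encoding="utf-8")

    html = fetch(url)
    snapshot.write_text(html, encoding="utf-8")
    return html
```

Passing a fixed date, such as `today=date(2024, 6, 5)`, reloads the snapshot saved on that day if one exists, which would let the 2024-06-05 and 2024-06-07 versions of the squad lists be compared side by side.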

In [None]:
# URL of the webpage to scrape
url = 'https://www.ole.com.ar/copa-america/copa-america-2024-listas-convocados-argentina-messi_0_7KCGeSguIO.html'

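# Send a GET request to the URL (re is the alias for requests imported above)
response = re.get(url, timeout=30)
response.raise_for_status()  # stop here with an HTTPError if the page could not be downloaded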

# Parse the HTML content of the webpage
soup = bs(response.text, 'html.parser')

# Find all div elements with class "custom-text"
news_items = soup.find_all("div", class_="custom-text")

# Create a dictionary to store the titles and content for each country
country_news = {}

# Initialize variables to store the current country and its player list
current_country = None
player_list = ""
skip_paragraph = False  # Flag to determine if the paragraph should be skipped

# Iterate through each news item
for item in news_items:
    # Find the h2 element to extract the title (country)
    title_element = item.find("h2")
    if title_element:
        # If there's a title, update the current country and reset the player list
        if current_country:
            country_news[current_country] = player_list.strip()
            player_list = ""
        current_country = title_element.text.strip()
        skip_paragraph = True  # Skip the first paragraph after the title
    else:
        # If there's no title, check if the paragraph should be skipped
        if skip_paragraph:
            skip_paragraph = False
            continue

        # Add the content to the player list
        content_paragraphs = item.find_all("p")
        player_list += "\n".join([p.text.strip() for p in content_paragraphs]) + "\n"

# Add the last country and its player list to the dictionary
if current_country:
    country_news[current_country] = player_list.strip()

# Print the dictionary with desired format
for country, players in country_news.items():
    print(country)
    print(players)
    print("----------------------")


GRUPO A
Se acerca la Copa América 2024 y las distintas selecciones definen sus planteles. Hasta acá, sólo Brasil y Ecuador anunciaron sus planteles completos con 26 jugadores para el torneo que se va a jugar en Estados Unidos. Antes y después, hubo otros seleccionados que brindaron nóminas preliminares, con jugadores por cortar como es el caso de Argentina. La fecha límite para presentar las listas es el 15 de junio.
----------------------
Argentina
Arqueros: Emiliano Martínez (Aston Villa); Franco Armani (River Plate) y Gerónimo Rulli (Ajax).
Defensores: Gonzalo Montiel (Nottingham Forest); Nahuel Molina (Atlético Madrid); Leonardo Balerdi (Olympique de Marsella); Cristian Romero (Tottenham); Germán Pezzella (Real Betis); Lucas Martínez Quarta (Fiorentina); Nicolás Otamendi (Benfica); Lisandro Martínez (Manchester United); Marcos Acuña (Sevilla); Nicolás Tagliafico (Lyon) y Valentín Barco (Brighton).
Volantes: Guido Rodríguez (Real Betis); Leandro Paredes (Roma); Alexis Mac Allister (

In [None]:
# Define a function to split a country's squad text into (country, "Name (Team)", category) tuples
def extract_players_and_teams(player_list, country):
    players = []
    category = ""
    for line in player_list.split('\n'):
        if line.strip() == "":
            continue
        if ":" in line:
            # Split on the first colon only: "Arqueros: A (X); B (Y) y C (Z)."
            category, names = (part.strip() for part in line.split(":", 1))
            # Replace ' y ' ("and") and ',' with ';' to standardize delimiters
            names = names.replace(' y ', ';').replace(',', ';')
            # Strip leading/trailing whitespace from each entry
            for player in names.split(';'):
                players.append((country, player.strip(), category))
    return players

player_data = []

# Iterate through each country and its player list
for country, players_list in country_news.items():
    if country.startswith('GRUPO'):
        continue
    if players_list:  # Ensure there's player data to process
        country_players = extract_players_and_teams(players_list, country)
        player_data.extend(country_players)
        print(f"{len(country_players)} players added for country: {country}")

# Create a DataFrame from the player data list
df = pd.DataFrame(player_data, columns=['Country', 'Name (Team)', 'Category'])
print(df)


29 players added for country: Argentina
16 players added for country: Perú
54 players added for country: Chile
27 players added for country: Canadá
29 players added for country: México
26 players added for country: Ecuador
47 players added for country: Venezuela
26 players added for country: Jamaica
27 players added for country: Estados Unidos
21 players added for country: Uruguay
27 players added for country: Panamá
28 players added for country: Bolivia
26 players added for country: Brasil
28 players added for country: Colombia
27 players added for country: Paraguay
27 players added for country: Costa Rica
        Country                          Name (Team)    Category
0     Argentina      Emiliano Martínez (Aston Villa)    Arqueros
1     Argentina          Franco Armani (River Plate)    Arqueros
2     Argentina               Gerónimo Rulli (Ajax).    Arqueros
3     Argentina  Gonzalo Montiel (Nottingham Forest)  Defensores
4     Argentina      Nahuel Molina (Atlético Madrid)  Defens

On the source page, some players are not separated by a comma (or by " y "), so several players can end up in the same row. The next step splits those rows so that each row holds exactly one player.
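As an alternative to splitting on delimiters, the pairs can be pulled out with a regular expression that matches the `Name (Team)` shape itself, so a missing comma no longer merges two players. This is a swapped-in technique, not the method used in the cell below, and it assumes every player is written as `Name (Team)` with no parentheses inside the team name and that the text passed in is the part after the category colon; `split_name_team` and `PLAYER_PATTERN` are hypothetical names.

```python
import re

# Assumed format: each player appears as "Name (Team)". Whatever sits between
# players (",", ";", " y ", or nothing at all) is skipped by the pattern.
PLAYER_PATTERN = re.compile(r"\s*(?:y\s+)?([^;,()]+?)\s*\(([^()]+)\)")


def split_name_team(text):
    """Return (name, team) tuples found in the text after a category colon."""
    return [(name.strip(), team.strip()) for name, team in PLAYER_PATTERN.findall(text)]
```

For example, `split_name_team("Facundo Pellistri (Granada) Rodrigo Bentancur (Tottenham)")` separates the two players even though no comma divides them.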

In [None]:
# Split rows that hold several players ("A (X) B (Y)") and separate each name from its team
def split_players(row):
    players = row['Name (Team)'].split(') ')
    new_rows = []
    for player in players:
        if not player:
            continue
        parts = player.split(' (')
        # Keep only well-formed "Name (Team" pieces; anything else is dropped
        if len(parts) == 2:
            name, team = parts
            new_rows.append([row['Country'], name.strip(), team.strip(), row['Category']])
    return new_rows

# Apply the function and create a new DataFrame
new_rows = []
for _, row in df.iterrows():
    new_rows.extend(split_players(row))

# Convert new rows into a DataFrame
new_df = pd.DataFrame(new_rows, columns=['Country', 'Name','Team', 'Category'])

# Remove ')' and '.' from the 'Team' column (as literal characters, not regex patterns)
new_df['Team'] = new_df['Team'].str.replace(')', '', regex=False)
new_df['Team'] = new_df['Team'].str.replace('.', '', regex=False)

new_df.head()

     Country               Name               Team    Category
0  Argentina  Emiliano Martínez        Aston Villa    Arqueros
1  Argentina      Franco Armani        River Plate    Arqueros
2  Argentina     Gerónimo Rulli               Ajax    Arqueros
3  Argentina    Gonzalo Montiel  Nottingham Forest  Defensores
4  Argentina      Nahuel Molina    Atlético Madrid  Defensores
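The per-country counts printed earlier (54 rows for Chile, 47 for Venezuela) already hinted that some rows held more than one player. A quick check after cleaning is to count players per country and flag anything above the 26-player squads mentioned in the scraped article; a flagged country may simply have a preliminary list, so the result is for review, not an error. A minimal sketch, assuming the rows are available as `(country, name)` pairs (for example from `new_df[['Country', 'Name']].itertuples(index=False)`); `oversized_squads` and the `max_size` default are illustrative assumptions.

```python
from collections import Counter


def oversized_squads(rows, max_size=26):
    """Return {country: player_count} for countries with more than `max_size` rows.

    `rows` is an iterable of (country, name) pairs. A large count usually means
    a preliminary list or rows that still contain several players.
    """
    counts = Counter(country for country, _name in rows)
    return {country: n for country, n in counts.items() if n > max_size}
```

With pandas, `new_df['Country'].value_counts()` gives the same counts directly.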


## Scraping data for teams without an official list

In [None]:
# List the countries from the article that are still missing from new_df
for country in country_news:
    if country.startswith('GRUPO'):
        continue
    if country not in new_df['Country'].unique():
        print(country)



(Date: 2024-06-05) These four countries had not yet announced their squads for the 2024 Copa América, so I take the players each one called up for its most recent international match instead.

(Date: 2024-06-07) As mentioned, the page the data is scraped from is updated frequently, and the four countries whose lists were missing (Canada, Jamaica, Panama, and Uruguay) now have their official squads on it. The cells below still show my earlier workaround, but the analysis uses the official lists.

### Uruguay

In [None]:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

# URL of the webpage to scrape
url = 'https://www.vozdeamerica.com/a/uruguay-confirma-convocados-amistoso-mexico-copa-america/7640452.html'

# Send a GET request to the URL
response = requests.get(url, timeout=30)

# Check that the request succeeded; raise_for_status() raises an HTTPError otherwise
response.raise_for_status()
print("Request successful.")

# Parse the HTML content of the webpage
soup = bs(response.text, 'html.parser')

# Create a dictionary to store the positions and players for Uruguay
uruguay_squad = {}

# Find the section containing player positions
content_section = soup.find('div', id='article-content')

# List of valid positions to ensure only these are included
valid_positions = ['Porteros:', 'Defensas:', 'Mediocampistas:', 'Delanteros:']

if content_section:
    # Get all paragraphs in the content section
    paragraphs = content_section.find_all('p')

    current_position = None
    for paragraph in paragraphs:
        # Check if the paragraph contains a position
        position_tag = paragraph.find('strong')
        if position_tag:
            position_text = position_tag.text.strip()
            if position_text in valid_positions:
                current_position = position_text
                uruguay_squad[current_position] = []
        elif current_position:
            player_text = paragraph.text.strip()
            if player_text:  # Avoid adding empty strings
                uruguay_squad[current_position].append(player_text)

# Add players missing from the online list (setdefault avoids a KeyError if a heading changes)
uruguay_squad.setdefault('Mediocampistas:', []).append('- Federico Valverde (Real Madrid)')
uruguay_squad.setdefault('Defensas:', []).append('- Ronald Araújo (Barcelona)')

# Prepare data for DataFrame
data = []
for position, players in uruguay_squad.items():
    for player in players:
        # "- Name (Team)" -> "Name" and "Team"; team is left empty if there is no " ("
        name, _, team = player.partition(' (')
        name = name.strip('-').strip()
        team = team.strip(')')
        data.append(['Uruguay', name, team, position.strip(':')])

# Creating a dataframe for the Uruguay squad
df_uruguay = pd.DataFrame(data, columns=['Country', 'Name', 'Team', 'Category'])

# Display the DataFrame
print(df_uruguay)

Request successful.
    Country                Name                 Team        Category
0   Uruguay       Sergio Rochet        Internacional        Porteros
1   Uruguay       Santiago Mele               Junior        Porteros
2   Uruguay   Sebastián Cáceres              América        Defensas
3   Uruguay  José María Giménez   Atlético de Madrid        Defensas
4   Uruguay         Lucas Olaza            Krasnodar        Defensas
5   Uruguay     Mathías Olivera               Napoli        Defensas
6   Uruguay      Nahitan Nández           Al-Qadsiah        Defensas
7   Uruguay       Ronald Araújo            Barcelona        Defensas
8   Uruguay       Manuel Ugarte  París Saint-Germain  Mediocampistas
9   Uruguay        César Araújo         Orlando City  Mediocampistas
10  Uruguay   Rodrigo Bentancur            Tottenham  Mediocampistas
11  Uruguay   Federico Valverde          Real Madrid  Mediocampistas
12  Uruguay   Facundo Pellistri              Granada      Delanteros
13  Uruguay  M

### Panama

In [None]:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

# URL of the webpage to scrape
url = 'https://www.tudn.com/futbol/copa-america-2024/copa-america-2024-seleccion-panama-convocatoria-para-amistosos-espana'

# Send a GET request to the URL
response = requests.get(url, timeout=30)

# Check that the request succeeded; raise_for_status() raises an HTTPError otherwise
response.raise_for_status()
print("Request successful.")

# Parse the HTML content of the webpage
soup = bs(response.text, 'html.parser')

# Create a dictionary to store the positions and players for Panama
panama_squad = {}

# Find the section containing player positions
content_sections = soup.find_all('div', class_='content-base articleBody col-span-full mb-8 tracking-[0.005em] sm:col-start-3 md:col-span-7 md:col-start-5 lg:col-span-6 lg:col-start-5')

# List of valid positions to ensure only these are included
valid_positions = ['Porteros', 'Defensas', 'Mediocampistas', 'Delanteros']


for section in content_sections:
    # Get all lists in the content section
    lists = section.find_all('li')
    current_position = None
    for li in lists:
        text = li.text.strip()
        if ':' in text:
            position = text.split(':')[0]
            if position in valid_positions:
                current_position = position
                panama_squad[current_position] = []
            players = text.split(':')[1].split('), ')
            for player in players:
                if current_position:
                    panama_squad[current_position].append(player)


# Prepare data for DataFrame
data = []
for position, players in panama_squad.items():
    for player in players:
        # "Name (Team" -> "Name" and "Team"; team is left empty if there is no " ("
        name, _, team = player.partition(' (')
        data.append(['Panama', name.strip(), team.strip(')'), position.strip(':')])

# Creating a dataframe for the Panama squad
df_panama = pd.DataFrame(data, columns=['Country', 'Name', 'Team', 'Category'])

# Clean the 'Team' column: remove parentheses, periods and commas in a single pass
df_panama['Team'] = df_panama['Team'].str.replace(r'[().,]', '', regex=True)

# For team names longer than two words, drop the last word; shorter names are left unchanged
def drop_last_word_if_long(team):
    words = team.split()
    if len(words) > 2:
        return ' '.join(words[:-1])
    return team


# Apply the function to the 'Team' column
df_panama['Team'] = df_panama['Team'].apply(drop_last_word_if_long)
print(df_panama)

Request successful.
   Country                 Name                  Team        Category
0   Panama     Orlando Mosquera      Maccabi Tel-Aviv        Porteros
1   Panama        Eddie Roberts      CA Independiente        Porteros
2   Panama       Andrés Andrade             LASK Linz        Defensas
3   Panama           Eric Davis             FC Kosice        Defensas
4   Panama      Michael Murillo    Olympique Marsella        Defensas
5   Panama       César Blackman     Slovan Bratislava        Defensas
6   Panama       Edgardo Fariña         Municipal GUA        Defensas
7   Panama      Roderick Miller        Turan Tovuz IK        Defensas
8   Panama          Orman Davis      CA Independiente        Defensas
9   Panama       Sergio Ramírez      CA Independiente        Defensas
10  Panama        Gabriel Brown             Dep Árabe        Defensas
11  Panama  José Luis Rodríguez          FC Famalicão  Mediocampistas
12  Panama       Édgar Bárcenas        Mazatlán FCMEX  Mediocampistas


### Canada

To try out a different tool, and for convenience, I use Gemini to build the following lists.

In [None]:
model = genai.GenerativeModel('gemini-pro')

In [None]:
response = model.generate_content(f''' For the following list of players from Canada create a dictionary (canada_players) with the following keys:
Country, Name, Team, Category.
ARQUEROS (4)
Maxime Crépeau - Portland Timbers
Thomas McGill - Brighton & Hove Albion FC
Dayne St. Clair - Minnesota United FC
Grégoire Swiderski - Girondins de Bordeaux B^
DEFENSORES (9)
Moïse Bombito - Colorado Rapids
Derek Cornelius - Malmö FF
Alphonso Davies - Bayern Munich
Luc de Fougerolles - Fulham FC
Kyle Hiebert - St. Louis CITY SC
Alistair Johnston - Celtic FC
Richie Laryea - Toronto FC
Kamal Miller - Portland Timbers
Dominick Zator - Korona Kielce
MEDIOCAMPISTAS (5)
Mathieu Choinière - CF Montréal
Stephen Eustáquio - FC Porto
Ismaël Koné - Watford FC
Jonathan Osorio - Toronto FC
Samuel Piette - CF Montréal
DELANTEROS (9)
Thelonius Bair - Motherwell
Charles-Andreas Brym - Sparta Rotterdam
Tajon Buchanan - Inter Milan
Jonathan David - LOSC Lille
Junior Hoilett - Aberdeen FC
Cyle Larin - RCD Mallorca
Liam Millar - FC Basel
Jacob Shaffelburg - Nashville SC
Iké Ugbo - ESTAC Troyes
''')
to_markdown(response.text)

> ```python
> canada_players = {
>     "Country": ["Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada", "Canada"],
>     "Name": ["Maxime Crépeau", "Thomas McGill", "Dayne St. Clair", "Grégoire Swiderski", "Moïse Bombito", "Derek Cornelius", "Alphonso Davies", "Luc de Fougerolles", "Kyle Hiebert", "Alistair Johnston", "Richie Laryea", "Kamal Miller", "Dominick Zator", "Mathieu Choinière", "Stephen Eustáquio", "Ismaël Koné", "Jonathan Osorio", "Samuel Piette", "Thelonius Bair", "Charles-Andreas Brym"],
>     "Team": ["Portland Timbers", "Brighton & Hove Albion FC", "Minnesota United FC", "Girondins de Bordeaux B", "Colorado Rapids", "Malmö FF", "Bayern Munich", "Fulham FC", "St. Louis CITY SC", "Celtic FC", "Toronto FC", "Portland Timbers", "Korona Kielce", "CF Montréal", "FC Porto", "Watford FC", "Toronto FC", "CF Montréal", "Motherwell", "Sparta Rotterdam"],
>     "Category": ["ARQUEROS", "ARQUEROS", "ARQUEROS", "ARQUEROS", "DEFENSORES", "DEFENSORES", "DEFENSORES", "DEFENSORES", "DEFENSORES", "DEFENSORES", "DEFENSORES", "DEFENSORES", "DEFENSORES", "MEDIOCAMPISTAS", "MEDIOCAMPISTAS", "MEDIOCAMPISTAS", "MEDIOCAMPISTAS", "MEDIOCAMPISTAS", "DELANTEROS", "DELANTEROS"]
> }
> ```
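The dictionary returned by the model holds only 20 names, while the prompt lists 27 players: the last seven forwards (Tajon Buchanan through Iké Ugbo) are missing. LLM output is therefore worth checking against the input before it is used. A minimal sketch of such a check for this column-oriented format follows; `check_columns` is a hypothetical helper, and comparing names as exact strings is an assumption (if the model changes an accent or spelling, that player shows up as missing too).

```python
def check_columns(data, expected_names):
    """Validate a column-oriented dict such as {"Name": [...], "Team": [...]}.

    Returns the expected names that are absent from data["Name"].
    Raises ValueError if the columns have different lengths.
    """
    lengths = {key: len(values) for key, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns have different lengths: {lengths}")
    returned = set(data["Name"])
    return [name for name in expected_names if name not in returned]
```

Called as `check_columns(canada_players, expected)`, with `expected` being the 27 names typed into the prompt, it would return the seven missing forwards.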

In [None]:
response

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "```python\ncanada_players = {\n    \"Country\": [\"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\", \"Canada\"],\n    \"Name\": [\"Maxime Cr\u00e9peau\", \"Thomas McGill\", \"Dayne St. Clair\", \"Gr\u00e9goire Swiderski\", \"Mo\u00efse Bombito\", \"Derek Cornelius\", \"Alphonso Davies\", \"Luc de Fougerolles\", \"Kyle Hiebert\", \"Alistair Johnston\", \"Richie Laryea\", \"Kamal Miller\", \"Dominick Zator\", \"Mathieu Choini\u00e8re\", \"Stephen Eust\u00e1quio\", \"Isma\u00ebl Kon\u00e9\", \"Jonathan Osorio\", \"Samuel Piette\", \"Thelonius Bair\", \"Charles-Andreas Brym\"],\n    \"Tea

In [None]:
import ast

# Text of the first candidate returned by Gemini
content = response.candidates[0].content.parts[0].text

# Find where the assignment 'canada_players = ...' starts
start_index = content.find('canada_players = ')
if start_index == -1:
    raise ValueError("Could not find 'canada_players' in the model response")

# The dictionary ends where the closing code fence (```) begins
end_index = content.find('```', start_index)
if end_index == -1:
    end_index = len(content)

# Keep only the right-hand side of the assignment (the dictionary literal) and
# parse it with ast.literal_eval instead of running model-generated code with exec()
dict_str = content[start_index:end_index].split('=', 1)[1].strip()
canada_players = ast.literal_eval(dict_str)

# Convert the dictionary to a DataFrame
df_canada = pd.DataFrame(canada_players)

# Display the DataFrame
print(df_canada)

   Country                  Name                       Team        Category
0   Canada        Maxime Crépeau           Portland Timbers        ARQUEROS
1   Canada         Thomas McGill  Brighton & Hove Albion FC        ARQUEROS
2   Canada       Dayne St. Clair        Minnesota United FC        ARQUEROS
3   Canada    Grégoire Swiderski    Girondins de Bordeaux B        ARQUEROS
4   Canada         Moïse Bombito            Colorado Rapids      DEFENSORES
5   Canada       Derek Cornelius                   Malmö FF      DEFENSORES
6   Canada       Alphonso Davies              Bayern Munich      DEFENSORES
7   Canada    Luc de Fougerolles                  Fulham FC      DEFENSORES
8   Canada          Kyle Hiebert          St. Louis CITY SC      DEFENSORES
9   Canada     Alistair Johnston                  Celtic FC      DEFENSORES
10  Canada         Richie Laryea                 Toronto FC      DEFENSORES
11  Canada          Kamal Miller           Portland Timbers      DEFENSORES
12  Canada  
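The Canada and Jamaica cells repeat the same steps: find `<name> = ` in the reply, cut at the closing code fence and turn what is in between into a dictionary. Those steps can be wrapped in one function. This sketch (the name `extract_literal` is hypothetical) parses the dictionary with `ast.literal_eval` rather than `exec`, so the model's reply is read as data and never run as code; it assumes the reply contains a plain `name = {...}` assignment made only of literals (strings, numbers, lists, dicts, `None`).

```python
import ast


def extract_literal(reply_text, var_name):
    """Return the Python literal assigned to `var_name` inside an LLM reply."""
    marker = f"{var_name} = "
    start = reply_text.find(marker)
    if start == -1:
        raise ValueError(f"'{marker.strip()}' not found in the reply")
    start += len(marker)

    # The literal ends at the closing code fence, or at the end of the text
    end = reply_text.find("```", start)
    if end == -1:
        end = len(reply_text)

    # literal_eval accepts only literals, so function calls or imports are rejected
    return ast.literal_eval(reply_text[start:end].strip())
```

With it, each cell reduces to a call such as `canada_players = extract_literal(response.text, 'canada_players')` or `jamaica_players = extract_literal(jamaica_response.text, 'jamaica_players')`.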

### Jamaica

Note that this time I do not provide the club each player currently plays for; instead, the prompt asks the model to look it up on the web.

In [None]:
jamaica_response = model.generate_content(f''' For the following list of players from Jamaica create a dictionary (jamaica_players) with the following keys:
Country, Name, Team, Category.
Search on the web for the teams where each player is listed.
Porteros: C. Boyce-Clarke(21 años),  S. Davis(23 años), J. Hibbert(19 años), J. Waite(25 años)
Defensas: J. Bell(26 años), D. Bernard(23 años), T. Gray(21 años), M. Hector(31 años), G. Irving(25 años), D. Lembikisa(20 años), D. Lowe(31 años), A. Reid(17 años), R. King (22 años), G. Leigh(29 años)
Mediocampistas: K. Anderson(19 años), D. Johnson(31 años), K. Lambert (27 años), K. Palmer (27 años), F. Reid(32 años)
Delanteros: M. Antonio(34 años), D. Beckford(26 años), D. Campbell(20 años), R. Cephas(24 años), B. De Cordova-Reid (31 años), K. Dixon(19 años), A. Marshsall(26 años), S. Nicholson(27 años)
''')
to_markdown(jamaica_response.text)




> ```python
> jamaica_players = {
>     "Country": "Jamaica",
>     "Players": [
>         {
>             "Name": "C. Boyce-Clarke",
>             "Team": None,
>             "Category": "Porteros"
>         },
>         {
>             "Name": "S. Davis",
>             "Team": None,
>             "Category": "Porteros"
>         },
>         {
>             "Name": "J. Hibbert",
>             "Team": None,
>             "Category": "Porteros"
>         },
>         {
>             "Name": "J. Waite",
>             "Team": "Harrogate Town",
>             "Category": "Porteros"
>         },
>         {
>             "Name": "J. Bell",
>             "Team": "Newport County",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "D. Bernard",
>             "Team": "Oldham Athletic",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "T. Gray",
>             "Team": "Leyton Orient",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "M. Hector",
>             "Team": "Chelsea",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "G. Irving",
>             "Team": "Swindon Town",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "D. Lembikisa",
>             "Team": "Northampton Town",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "D. Lowe",
>             "Team": "Nottingham Forest",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "A. Reid",
>             "Team": "Fulham",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "R. King",
>             "Team": "Bristol Rovers",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "G. Leigh",
>             "Team": "Newport County",
>             "Category": "Defensas"
>         },
>         {
>             "Name": "K. Anderson",
>             "Team": "Wigan Athletic",
>             "Category": "Mediocampistas"
>         },
>         {
>             "Name": "D. Johnson",
>             "Team": "Bolton Wanderers",
>             "Category": "Mediocampistas"
>         },
>         {
>             "Name": "K. Lambert",
>             "Team": "Bristol City",
>             "Category": "Mediocampistas"
>         },
>         {
>             "Name": "K. Palmer",
>             "Team": "Portsmouth",
>             "Category": "Mediocampistas"
>         },
>         {
>             "Name": "F. Reid",
>             "Team": "Crystal Palace",
>             "Category": "Mediocampistas"
>         },
>         {
>             "Name": "M. Antonio",
>             "Team": "West Ham",
>             "Category": "Delanteros"
>         },
>         {
>             "Name": "D. Beckford",
>             "Team": "Newport County",
>             "Category": "Delanteros"
>         },
>         {
>             "Name": "D. Campbell",
>             "Team": "Luton Town",
>             "Category": "Delanteros"
>         },
>         {
>             "Name": "R. Cephas",
>             "Team": "Shrewsbury Town",
>             "Category": "Delanteros"
>         },
>         {
>             "Name": "B. De Cordova-Reid",
>             "Team": "Fulham",
>             "Category": "Delanteros"
>         },
>         {
>             "Name": "K. Dixon",
>             "Team": "Dundee United",
>             "Category": "Delanteros"
>         },
>         {
>             "Name": "A. Marshsall",
>             "Team": "Queen of the South",
>             "Category": "Delanteros"
>         },
>         {
>             "Name": "S. Nicholson",
>             "Team": "Bristol Rovers",
>             "Category": "Delanteros"
>         },
>     ]
> }
> ```

In [None]:
jamaica_response

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "```python\njamaica_players = {\n    \"Country\": \"Jamaica\",\n    \"Players\": [\n        {\n            \"Name\": \"C. Boyce-Clarke\",\n            \"Team\": None,\n            \"Category\": \"Porteros\"\n        },\n        {\n            \"Name\": \"S. Davis\",\n            \"Team\": None,\n            \"Category\": \"Porteros\"\n        },\n        {\n            \"Name\": \"J. Hibbert\",\n            \"Team\": None,\n            \"Category\": \"Porteros\"\n        },\n        {\n            \"Name\": \"J. Waite\",\n            \"Team\": \"Harrogate Town\",\n            \"Category\": \"Porteros\"\n        },\n        {\n            \"Name\": \"J. Bell\",\n            \"Team\": \"Newport County\",\n            \"Category\": \"Defensas\"\n        },\n    

In [None]:
import ast

# Text of the first candidate returned by Gemini
jamaica_content = jamaica_response.candidates[0].content.parts[0].text

# Find where the assignment 'jamaica_players = ...' starts
start_index = jamaica_content.find('jamaica_players = ')
if start_index == -1:
    raise ValueError("Could not find 'jamaica_players' in the model response")

# The dictionary ends where the closing code fence (```) begins
end_index = jamaica_content.find('```', start_index)
if end_index == -1:
    end_index = len(jamaica_content)

# Keep only the right-hand side of the assignment (the dictionary literal) and
# parse it with ast.literal_eval instead of running model-generated code with exec()
dict_str = jamaica_content[start_index:end_index].split('=', 1)[1].strip()
jamaica_players = ast.literal_eval(dict_str)

# Build a DataFrame from the list of player records and add the country column
df_jamaica = pd.DataFrame(jamaica_players['Players'])
df_jamaica['Country'] = 'Jamaica'
df_jamaica = df_jamaica[['Country', 'Name', 'Team', 'Category']]

# Display the DataFrame
print(df_jamaica)

    Country                Name                Team        Category
0   Jamaica     C. Boyce-Clarke                None        Porteros
1   Jamaica            S. Davis                None        Porteros
2   Jamaica          J. Hibbert                None        Porteros
3   Jamaica            J. Waite      Harrogate Town        Porteros
4   Jamaica             J. Bell      Newport County        Defensas
5   Jamaica          D. Bernard     Oldham Athletic        Defensas
6   Jamaica             T. Gray       Leyton Orient        Defensas
7   Jamaica           M. Hector             Chelsea        Defensas
8   Jamaica           G. Irving        Swindon Town        Defensas
9   Jamaica        D. Lembikisa    Northampton Town        Defensas
10  Jamaica             D. Lowe   Nottingham Forest        Defensas
11  Jamaica             A. Reid              Fulham        Defensas
12  Jamaica             R. King      Bristol Rovers        Defensas
13  Jamaica            G. Leigh      Newport Cou

## Data Merge

Merging the DataFrames of the different countries into a single one (Date: 2024-06-05)

In [None]:
# Create a list of DataFrames
dfs = [new_df, df_uruguay, df_panama, df_canada, df_jamaica]

# Concatenate the DataFrames along the rows
players_ca_24 = pd.concat(dfs, ignore_index=True)

# Display the merged DataFrame
print(players_ca_24)

       Country                Name               Team    Category
0    Argentina   Emiliano Martínez        Aston Villa    Arqueros
1    Argentina       Franco Armani        River Plate    Arqueros
2    Argentina      Gerónimo Rulli               Ajax    Arqueros
3    Argentina     Gonzalo Montiel  Nottingham Forest  Defensores
4    Argentina       Nahuel Molina    Atlético Madrid  Defensores
..         ...                 ...                ...         ...
537    Jamaica  Joel Latibeaudiere                     Delanteros
538    Jamaica        Kasey Palmer                     Delanteros
539    Jamaica      Karoy Anderson                     Delanteros
540    Jamaica      Devon Williams                     Delanteros
541    Jamaica       Kevon Lambert                     Delanteros

[542 rows x 4 columns]


In [None]:
players_ca_24.to_csv('players_ca_24.csv') # Saving data for future analysis

(Date: 2024-06-07) The DataFrame used from here on is new_df, since it has the official list of players for the next Copa America.

# Info from Kaggle

In [None]:
from google.colab import userdata
# Retrieve the Kaggle API credential stored as a Colab secret named 'Kaggle'
userdata.get('Kaggle')

In [None]:
# Put the Kaggle API token where the kaggle CLI looks for it and make it private
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

cp: cannot stat 'kaggle.json': No such file or directory
chmod: cannot access '/root/.kaggle/kaggle.json': No such file or directory
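The `cp` above fails because no `kaggle.json` was uploaded to the session. A minimal sketch of one way to create the file directly from Colab secrets, assuming the username and API key are stored as two separate secrets (the secret names `KAGGLE_USERNAME`/`KAGGLE_KEY` and the helper `write_kaggle_credentials` are hypothetical). The Kaggle CLI reads `~/.kaggle/kaggle.json`, a JSON object with `username` and `key` fields, and the file should be readable only by its owner, which is what `chmod 600` does.

```python
import json
import os
from pathlib import Path


def write_kaggle_credentials(username: str, key: str, config_dir: Path) -> Path:
    """Write kaggle.json into config_dir and restrict it to its owner."""
    config_dir.mkdir(parents=True, exist_ok=True)
    cred_path = config_dir / "kaggle.json"
    # The Kaggle CLI expects a JSON object with "username" and "key" fields
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    # Same effect as `chmod 600`: owner read/write only
    os.chmod(cred_path, 0o600)
    return cred_path


# Hypothetical usage in Colab, assuming secrets named KAGGLE_USERNAME and KAGGLE_KEY:
# from google.colab import userdata
# write_kaggle_credentials(userdata.get("KAGGLE_USERNAME"),
#                          userdata.get("KAGGLE_KEY"),
#                          Path.home() / ".kaggle")
```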


In [None]:
! kaggle datasets download -d nyagami/fc-24-players-database-and-stats-from-easports

Dataset URL: https://www.kaggle.com/datasets/nyagami/fc-24-players-database-and-stats-from-easports
License(s): CC0-1.0
Downloading fc-24-players-database-and-stats-from-easports.zip to /content
  0% 0.00/2.55M [00:00<?, ?B/s]
100% 2.55M/2.55M [00:00<00:00, 161MB/s]


In [None]:
!unzip fc-24-players-database-and-stats-from-easports.zip

Archive:  fc-24-players-database-and-stats-from-easports.zip
  inflating: all_players.csv         
  inflating: female_players.csv      
  inflating: male_players.csv        


In [None]:
# Load the Overall rating of each player from the game FC 24, which scores every player based on their skills
player_rating = pd.read_csv('male_players.csv')
player_rating.head()

Unnamed: 0.1,Unnamed: 0,Name,Nation,Club,Position,Age,Overall,Pace,Shooting,Passing,...,Strength,Aggression,Att work rate,Def work rate,Preferred foot,Weak foot,Skill moves,URL,Gender,GK
0,0,Kylian Mbappé,France,Paris SG,ST,24,91,97,90,80,...,77,64,High,Low,Right,4,5,https://www.ea.com/games/ea-sports-fc/ratings/...,M,
1,1,Erling Haaland,Norway,Manchester City,ST,23,91,89,93,66,...,93,87,High,Medium,Left,3,3,https://www.ea.com/games/ea-sports-fc/ratings/...,M,
2,2,Kevin De Bruyne,Belgium,Manchester City,CM,32,91,72,88,94,...,74,75,High,Medium,Right,5,4,https://www.ea.com/games/ea-sports-fc/ratings/...,M,
3,3,Lionel Messi,Argentina,Inter Miami CF,CF,36,90,80,87,90,...,68,44,Low,Low,Left,4,4,https://www.ea.com/games/ea-sports-fc/ratings/...,M,
4,4,Karim Benzema,France,Al Ittihad,CF,35,90,79,88,83,...,82,63,Medium,Medium,Right,4,4,https://www.ea.com/games/ea-sports-fc/ratings/...,M,


In [None]:
! kaggle datasets download -d hamzaadhnanshakir/international-football-tournament-results

Dataset URL: https://www.kaggle.com/datasets/hamzaadhnanshakir/international-football-tournament-results
License(s): CC0-1.0
Downloading international-football-tournament-results.zip to /content
  0% 0.00/48.4k [00:00<?, ?B/s]
100% 48.4k/48.4k [00:00<00:00, 46.0MB/s]


In [None]:
! unzip international-football-tournament-results.zip # Historical results of past Copa America editions (and other tournaments)

Archive:  international-football-tournament-results.zip
  inflating: afcon.csv               
  inflating: asian_cup.csv           
  inflating: copa_america.csv        
  inflating: euros.csv               
  inflating: world_cup.csv           


In [None]:
hs_df = pd.read_csv('copa_america.csv')
hs_df.head()

Unnamed: 0,Year,Date,Home Team,Away Team,Home Score,Away Score,Shootout,Tournament,City,Country,Neutral Venue,Winning Team,first_shooter,Losing Team
0,1916,1916-07-02,Chile,Uruguay,0.0,4.0,False,Copa América,Buenos Aires,Argentina,True,Uruguay,,Chile
1,1916,1916-07-06,Argentina,Chile,6.0,1.0,False,Copa América,Buenos Aires,Argentina,False,Argentina,,Chile
2,1916,1916-07-08,Brazil,Chile,1.0,1.0,False,Copa América,Buenos Aires,Argentina,True,Draw,,Draw
3,1916,1916-07-10,Argentina,Brazil,1.0,1.0,False,Copa América,Buenos Aires,Argentina,False,Draw,,Draw
4,1916,1916-07-12,Brazil,Uruguay,1.0,2.0,False,Copa América,Buenos Aires,Argentina,True,Uruguay,,Brazil


In [None]:
! kaggle datasets download -d cashncarry/fifaworldranking

Dataset URL: https://www.kaggle.com/datasets/cashncarry/fifaworldranking
License(s): CC0-1.0
Downloading fifaworldranking.zip to /content
  0% 0.00/1.14M [00:00<?, ?B/s]
100% 1.14M/1.14M [00:00<00:00, 77.2MB/s]


In [None]:
! unzip fifaworldranking.zip # FIFA ranking history for every national team

Archive:  fifaworldranking.zip
  inflating: fifa_ranking-2023-07-20.csv  
  inflating: fifa_ranking-2024-04-04.csv  


In [None]:
rankining_df = pd.read_csv('fifa_ranking-2024-04-04.csv')
rankining_df.head()

Unnamed: 0,rank,country_full,country_abrv,total_points,previous_points,rank_change,confederation,rank_date
0,83.0,Guatemala,GUA,15.0,0.0,83,CONCACAF,1992-12-31
1,32.0,Zambia,ZAM,38.0,0.0,32,CAF,1992-12-31
2,33.0,Portugal,POR,38.0,0.0,33,UEFA,1992-12-31
3,34.0,Austria,AUT,38.0,0.0,34,UEFA,1992-12-31
4,35.0,Colombia,COL,36.0,0.0,35,CONMEBOL,1992-12-31


# Data pre-processing

The DataFrames available so far are:

* new_df = list of players obtained by web scraping.
* players_ca_24 = all the players called up for the next Copa America (NOT USED)
* player_rating = ratings of all the male players in the FC 24 game
* hs_df = historical data from past editions of the Copa America
* rankining_df = FIFA ranking history of every national team, including those in the competition

In [None]:
new_df['Country'].unique()

array(['Argentina', 'Perú', 'Chile', 'Canadá', 'México', 'Ecuador',
       'Venezuela', 'Jamaica', 'Estados Unidos', 'Uruguay', 'Panamá',
       'Bolivia', 'Brasil', 'Colombia', 'Paraguay', 'Costa Rica'],
      dtype=object)

In [None]:
# Standardize country names and other text values (remove accents) so the DataFrames can be merged
for col in new_df.columns:
    if new_df[col].dtype == object:  # Check column type 'object' (string)
        new_df[col] = new_df[col].str.replace('á', 'a')
        new_df[col] = new_df[col].str.replace('é', 'e')
        new_df[col] = new_df[col].str.replace('í', 'i')
        new_df[col] = new_df[col].str.replace('ó', 'o')
        new_df[col] = new_df[col].str.replace('ú', 'u')
        new_df[col] = new_df[col].str.replace('ï', 'i')
        new_df[col] = new_df[col].str.replace('Á', 'A')

new_df.replace('Brasil','Brazil', inplace=True)
new_df.rename(columns={'Country':'Nation'}, inplace=True)
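The same seven `str.replace` calls are repeated for each dataset that needs normalizing. A minimal sketch of a reusable helper (the name `normalize_accents` is hypothetical) that applies exactly the same character mapping in one pass with `str.maketrans` and pandas' `Series.str.translate`; like the original loop, it leaves other characters such as `ñ` untouched.

```python
import pandas as pd

# Same mapping as the replace() calls above: only these characters change
ACCENT_TABLE = str.maketrans({"á": "a", "é": "e", "í": "i", "ó": "o",
                              "ú": "u", "ï": "i", "Á": "A"})


def normalize_accents(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the accent mapping applied to every text column."""
    out = df.copy()
    for col in out.columns:
        # Text columns are object dtype (or pandas' dedicated string dtype)
        if out[col].dtype == object or pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.translate(ACCENT_TABLE)
    return out


# Example usage on the notebook's frames:
# new_df = normalize_accents(new_df)
# player_rating = normalize_accents(player_rating)
```

Returning a copy instead of modifying in place keeps the raw scraped data available if a different normalization is needed later.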

In [None]:
# Apply the same accent normalization to the player_rating dataset
for col in player_rating.columns:
    if player_rating[col].dtype == object:  # Check column type 'object' (string)
        player_rating[col] = player_rating[col].str.replace('á', 'a')
        player_rating[col] = player_rating[col].str.replace('é', 'e')
        player_rating[col] = player_rating[col].str.replace('í', 'i')
        player_rating[col] = player_rating[col].str.replace('ó', 'o')
        player_rating[col] = player_rating[col].str.replace('ú', 'u')
        player_rating[col] = player_rating[col].str.replace('ï', 'i')
        player_rating[col] = player_rating[col].str.replace('Á', 'A')

In [None]:
# Left-join the FC 24 attributes onto the squad list, matching on both name and nation
players_ = new_df.merge(player_rating[['Name', 'Nation', 'Club', 'Position', 'Age', 'Overall', 'Preferred foot']], on=['Name', 'Nation'], how='left')
players_.head()

Unnamed: 0,Nation,Name,Team,Category,Club,Position,Age,Overall,Preferred foot
0,Argentina,Emiliano Martinez,Aston Villa,Arqueros,Aston Villa,GK,31.0,85.0,Right
1,Argentina,Franco Armani,River Plate,Arqueros,River Plate,GK,36.0,77.0,Right
2,Argentina,Geronimo Rulli,Ajax,Arqueros,Ajax,GK,31.0,81.0,Right
3,Argentina,Gonzalo Montiel,Nottingham Forest,Defensores,Nott'm Forest,RB,26.0,79.0,Right
4,Argentina,Nahuel Molina,Atletico Madrid,Defensores,Atletico de Madrid,RB,25.0,82.0,Right


In [None]:
# Getting rid of the columns that won't be used
players_.drop(columns=['Team','Category'], inplace=True)

In [None]:
list_players = new_df.groupby('Nation').count().reset_index()
list_players_by_nation = list_players[['Nation', 'Name']]
print('Old list of players:')
print(list_players_by_nation)
print('-'*50)
players_nation = players_.groupby('Nation').count().reset_index()
players_count_by_nation = players_nation[['Nation', 'Name']]
print('New list of players:')
print(players_count_by_nation)

Old list of players:
            Nation  Name
0        Argentina    29
1          Bolivia    28
2           Brazil    26
3           Canada    27
4            Chile    46
5         Colombia    28
6       Costa Rica    26
7          Ecuador    26
8   Estados Unidos    27
9          Jamaica    26
10          Mexico    31
11          Panama    27
12        Paraguay    27
13            Peru    16
14         Uruguay    20
15       Venezuela    47
--------------------------------------------------
New list of players:
            Nation  Name
0        Argentina    29
1          Bolivia    28
2           Brazil    32
3           Canada    27
4            Chile    46
5         Colombia    28
6       Costa Rica    26
7          Ecuador    26
8   Estados Unidos    27
9          Jamaica    26
10          Mexico    31
11          Panama    27
12        Paraguay    27
13            Peru    16
14         Uruguay    20
15       Venezuela    47


For Brazil, the number of players increased from 26 to 32. This is likely because some names appear more than once for the same nation in the player_rating dataset (for example, several players called Danilo), so the left join duplicates those rows.
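A left join returns one output row per matching row on the right, so a name that appears several times for the same nation multiplies the corresponding squad row. A toy sketch of the effect (the data is made up for illustration, loosely modeled on the Danilo rows), of how `validate="many_to_one"` in `merge` can surface the problem as an error, and of how sorting by `Overall` before `drop_duplicates` keeps the highest-rated match regardless of input order:

```python
import pandas as pd
from pandas.errors import MergeError

# Toy squad list: one row per called-up player
squad = pd.DataFrame({"Nation": ["Brazil", "Brazil"], "Name": ["Danilo", "Wendell"]})

# Toy ratings table in which "Danilo" appears twice for the same nation
ratings = pd.DataFrame({
    "Nation": ["Brazil", "Brazil", "Brazil"],
    "Name": ["Danilo", "Danilo", "Wendell"],
    "Club": ["Rangers", "Juventus", "FC Porto"],
    "Overall": [74, 81, 76],
})

merged = squad.merge(ratings, on=["Nation", "Name"], how="left")
print(len(merged))  # 3: the single "Danilo" row was duplicated

# validate="many_to_one" raises instead of silently duplicating rows
try:
    squad.merge(ratings, on=["Nation", "Name"], how="left", validate="many_to_one")
except MergeError as err:
    print("Duplicate keys in ratings:", err)

# Keep the highest-rated match per player, independent of the input order
deduped = (merged.sort_values("Overall", ascending=False)
                 .drop_duplicates(subset=["Nation", "Name"], keep="first")
                 .sort_index())
print(deduped[["Name", "Club", "Overall"]])
```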

In [None]:
# Checking all the Brazilian players
players_[players_.Nation=='Brazil']

Unnamed: 0,Nation,Name,Club,Position,Age,Overall,Preferred foot
350,Brazil,Alisson,Liverpool,GK,31.0,89.0,Right
351,Brazil,Bento,,,,,
352,Brazil,Rafael,,,,,
353,Brazil,Danilo,Juventus,CB,32.0,81.0,Right
354,Brazil,Danilo,Nott'm Forest,CDM,22.0,76.0,Left
355,Brazil,Danilo,VfL Bochum,LB,31.0,76.0,Left
356,Brazil,Danilo,Rangers,ST,24.0,74.0,Right
357,Brazil,Yan Couto,Girona FC,RM,21.0,72.0,Right
358,Brazil,Guilherme Arana,,,,,
359,Brazil,Wendell,FC Porto,LB,30.0,76.0,Left


In [None]:
# Drop duplicated players, keeping the first match, which is the one with the highest Overall score (player_rating is ordered by Overall).
# Deduplicate within each nation so that namesakes in different squads are not dropped.
players_.drop_duplicates(subset=['Nation', 'Name'], keep='first', inplace=True)

players_[players_.Nation=='Brazil']

Unnamed: 0,Nation,Name,Club,Position,Age,Overall,Preferred foot
350,Brazil,Alisson,Liverpool,GK,31.0,89.0,Right
351,Brazil,Bento,,,,,
352,Brazil,Rafael,,,,,
353,Brazil,Danilo,Juventus,CB,32.0,81.0,Right
357,Brazil,Yan Couto,Girona FC,RM,21.0,72.0,Right
358,Brazil,Guilherme Arana,,,,,
359,Brazil,Wendell,FC Porto,LB,30.0,76.0,Left
360,Brazil,Beraldo,,,,,
361,Brazil,Éder Militão,Real Madrid,CB,25.0,86.0,Right
362,Brazil,Gabriel Magalhães,,,,,


Many players still have null values. Possible reasons:

* The player's league is not available in the FC 24 game, so they are not in the player_rating dataset.
* The player's name is written differently than in the player_rating dataset.

I'm going to check both cases and, for the second one, change the names to how they appear in the game.
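Because the join also uses `Nation`, it can be worth checking that the country labels themselves agree between the two sources; a mismatched label would leave every player of that country unmatched (the null counts per country show all 27 Estados Unidos players without a match). A minimal sketch of such a check with a set difference; the helper name `unmatched_labels` and the toy data are illustrative.

```python
import pandas as pd


def unmatched_labels(left: pd.Series, right: pd.Series) -> list:
    """Return the distinct values of `left` that never appear in `right`."""
    return sorted(set(left.dropna()) - set(right.dropna()))


# Toy example: the squad list uses a Spanish country name, the ratings table an English one
squad_nations = pd.Series(["Argentina", "Estados Unidos", "Mexico"])
rating_nations = pd.Series(["Argentina", "United States", "Mexico", "France"])
print(unmatched_labels(squad_nations, rating_nations))

# Applied to the notebook's frames:
# print(unmatched_labels(new_df['Nation'], player_rating['Nation']))
```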

In [None]:
# Count the players with null values per country
player_null = players_[players_['Club'].isnull()]
player_null = (player_null.groupby('Nation').count().reset_index())
player_null.rename(columns={'Name':'Count'}, inplace=True)
player_null = player_null[['Nation', 'Count']]
print('Total null values:', player_null['Count'].sum())
player_null

Total null values: 235


Unnamed: 0,Nation,Count
0,Argentina,1
1,Bolivia,14
2,Brazil,9
3,Canada,7
4,Chile,25
5,Colombia,11
6,Costa Rica,22
7,Ecuador,11
8,Estados Unidos,27
9,Jamaica,15


In [None]:
# Build a list of [Nation, Name] pairs for the players with null values
player_name = players_[players_['Club'].isnull()]
player_name = player_name[['Nation', 'Name']]
player_name = player_name.values.tolist()

In [None]:
# Ask the AI model for each player's full name and the name they are commonly known by
player_response = model.generate_content(f'''
For the football players in the following list {player_name}, I need you to adjust their names to their real names and to the names they are known by. For example, the real name of "Rodrigo De Paul" is "Rodrigo Javier De Paul", and "Vinícius Júnior" is known as "Vini Jr."
Display the full list, with just the full name and the short name of each player, in a dictionary format (players_full_name). Ensure that the adjustments are accurate and consistent, as this is crucial for proper data analysis.''')
to_markdown(player_response.text)

> ```
> players_full_name = {
>     "Rodrigo Javier De Paul": "Rodrigo De Paul",
>     "Anderson Santamaria": "Anderson Santamaria",
>     "Oliver Sonne": "Oliver Sonne",
>     "Pedro Quispe": "Pedro Quispe",
>     "Andre Carrillo": "Andre Carrillo",
>     "Bryan Reyna": "Bryan Reyna",
>     "Vicente Reyes": "Vicente Reyes",
>     "Mauricio Isla": "Mauricio Isla",
>     "Benjamin Kuscevic": "Benjamin Kuscevic",
>     "Igor Lichnovsky": "Igor Lichnovsky",
>     "Gary Medel": "Gary Medel",
>     "Nicolas Diaz": "Nicolas Diaz",
>     "Sebastian Vegas": "Sebastian Vegas",
>     "Erick Pulgar": "Erick Pulgar",
>     "Ulises Ortegoza": "Ulises Ortegoza",
>     "Williams Alarcon": "Williams Alarcon",
>     "Jean Meneses": "Jean Meneses",
>     "Victor Mendez": "Victor Mendez",
>     "Luciano Cabral": "Luciano Cabral",
>     "Maximiliano Guerrero": "Maximiliano Guerrero",
>     "Cesar Perez": "Cesar Perez",
>     "Felipe Loyola": "Felipe Loyola",
>     "Alexis Sanchez": "Alexis Sanchez",
>     "Eduardo Vargas": "Eduardo Vargas",
>     "Diego Valdes": "Diego Valdes",
>     "Benjamin Brereton": "Benjamin Brereton",
>     "Victor Davila": "Victor Davila",
>     "Dario Osorio": "Dario Osorio",
>     "Diego Valencia": "Diego Valencia",
>     "Steffan Pino": "Steffan Pino",
>     "Lucas Assadi": "Lucas Assadi",
>     "Thomas McGill": "Thomas McGill",
>     "Gregoire Swiderski": "Gregoire Swiderski",
>     "Moise Bombito": "Moise Bombito",
>     "Luc de Fougerolles": "Luc de Fougerolles",
>     "Kyle Hiebert": "Kyle Hiebert",
>     "Thelonius Bair": "Thelonius Bair",
>     "Junior Hoilett": "Junior Hoilett",
>     "Angel Malagon": "Angel Malagon",
>     "Jose Raul Rangel": "Jose Raul Rangel",
>     "Julio Gonzalez": "Julio Gonzalez",
>     "Israel Reyes": "Israel Reyes",
>     "Brian Garcia": "Brian Garcia",
>     "Victor Guzman": "Victor Guzman",
>     "Alexis Peña": "Alexis Peña",
>     "Jesus Orozco": "Jesus Orozco",
>     "Bryan Gonzalez": "Bryan Gonzalez",
>     "Luis Romo": "Luis Romo",
>     "Erick Sanchez": "Erick Sanchez",
>     "Roberto Alvarado": "Roberto Alvarado",
>     "Luis Chavez": "Luis Chavez",
>     "Angel Montaño": "Angel Montaño",
>     "Fernando Beltran": "Fernando Beltran",
>     "Carlos Rodriguez": "Carlos Rodriguez",
>     "Marcelo Flores": "Marcelo Flores",
>     "Cesar Huerta": "Cesar Huerta",
>     "Julian Quiñones": "Julian Quiñones",
>     "Alexis Vega": "Alexis Vega",
>     "Uriel Antuna": "Uriel Antuna",
>     "Guillermo Martinez": "Guillermo Martinez",
>     "Diego Lainez": "Diego Lainez",
>     "Felix Torres": "Felix Torres",
>     "Jose Hurtado": "Jose Hurtado",
>     "William Pacho": "William Pacho",
>     "Andres Micolta": "Andres Micolta",
>     "Layan Loor": "Layan Loor",
>     "Kendry Paez": "Kendry Paez",
>     "Angel Mena": "Angel Mena",
>     "Alan Franco": "Alan Franco",
>     "John Yeboah": "John Yeboah",
>     "Enner Valencia": "Enner Valencia",
>     "Jordy Caicedo": "Jordy Caicedo",
>     "Rafael Romo": "Rafael Romo",
>     "Joel Graterol": "Joel Graterol",
>     "Jose David Contreras": "Jose David Contreras",
>     "Nahuel Ferraresi": "Nahuel Ferraresi",
>     "Carlos Viva": "Carlos Viva",
>     "Diego Luna": "Diego Luna",
>     "Teo Quintero": "Teo Quintero",
>     "Jhon Chancellor": "Jhon Chancellor",
>     "Miguel Navarro": "Miguel Navarro",
>     "Renne Rivas": "Renne Rivas",
>     "Roberto Rosales": "Roberto Rosales",
>     "Delvin Alfonzo": "Delvin Alfonzo",
>     "Cristian Casseres": "Cristian Casseres",
>     "Bryant Ortega": "Bryant Ortega",
>     "Tomas Rincon": "Tomas Rincon",
>     "Edson Castillo": "Edson Castillo",
>     "Telasco Segovia": "Telasco Segovia",
>     "Eduard Bello": "Eduard Bello",
>     "Kervin Andrade": "Kervin Andrade",
>     "Jhon Murillo": "Jhon Murillo",
>     "Yeferson Soteldo": "Yeferson Soteldo",
>     "Enrique Peña Zuaner": "Enrique Peña Zuaner",
>     "Freddy Vargas": "Freddy Vargas",
>     "Shaquan Davis": "Shaquan Davis",
>     "Jayden Hibbert": "Jayden Hibbert",
>     "Jahmali Waite": "Jahmali Waite",
>     "Richard King": "Richard King",
>     "Tayvon Gray": "Tayvon Gray",
>     "Jon Bell": "Jon Bell",
>     "Bobby Reid": "Bobby Reid",
>     "Alex Marshall": "Alex Marshall",
>     "Adrian Reid": "Adrian Reid",
>     "Kevon Lambert": "Kevon Lambert",
>     "Shamar Nicholson": "Shamar Nicholson",
>     "Renaldo Cephas": "Renaldo Cephas",
>     "Deshane Beckford": "Deshane Beckford",
>     "Kaheim Dixon": "Kaheim Dixon",
>     "Devonte Campbell": "Devonte Campbell",
>     "Kristoffer Lund": "Kristoffer Lund",
>     "Shaq Moore": "Shaq Moore",
>     "Johnny Cardoso": "Johnny Cardoso",
>     "Gio Reyna": "Gio Reyna",
>     "Timmy Tillman": "Timmy Tillman",
>     "Tim Weah": "Tim Weah",
>     "Sergio Rochet": "Sergio Rochet",
>     "Santiago Mele": "Santiago Mele",
>     "Sebastian Caceres": "Sebastian Caceres",
>     "Lucas Olaza": "Lucas Olaza",
>     "Maximiliano Araujo": "Maximiliano Araujo",
>     "Brian Rodriguez": "Brian Rodriguez",
>     "Orlando Mosquera": "Orlando Mosquera",
>     "Luis Mejia": "Luis Mejia",
>     "Cesar Samudio": "Cesar Samudio",
>     "Fidel Escobar": "Fidel Escobar",
>     "Eduardo Anderson": "Eduardo Anderson",
>     "Jose Cordoba": "Jose Cordoba",
>     "Eric Davis": "Eric Davis",
>     "Cesar Blackman": "Cesar Blackman",
>     "Edgardo Fariña": "Edgardo Fariña",
>     "Roderick Miller": "Roderick Miller",
>     "Martin Krug": "Martin Krug",
>     "Cristian Martinez": "Cristian Martinez",
>     "Edgar Barcenas": "Edgar Barcenas",
>     "Jovani Welch": "Jovani Welch",
>     "Freddy Gondola": "Freddy Gondola",
>     "Carlos Harvey": "Carlos Harvey",
>     "Abidel Ararza": "Abidel Ararza",
>     "Cesar Yanis": "Cesar Yanis",
>     "Ismael Diaz": "Ismael Diaz",
>     "Jose Fajardo": "Jose Fajardo",
>     "Eduardo Guerrero": "Eduardo Guerrero",
>     "Cecilio Waterman": "Cecilio Waterman",
>     "Gustavo Almada": "Gustavo Almada",
>     "Diego Medina": "Diego Medina",
>     "Roberto Carlos Fernandez": "Roberto Carlos Fernandez",
>     "Marcelo Suarez": "Marcelo Suarez",
>     "Luis Haquin": "Luis Haquin",
>     "Efrain Morales": "Efrain Morales",
>     "Pablo Vaca": "Pablo Vaca",
>     "Hector Cuellar": "Hector Cuellar",
>     "Robson Matheus": "Robson Matheus",
>     "Miguel Terceros": "Miguel Terceros",
>     "Adalid Terrazas": "Adalid Terrazas",
>     "Jaume Cuellar": "Jaume Cuellar",
>     "Rodrigo Ramallo": "Rodrigo Ramallo",
>     "Bruno Miranda": "Bruno Miranda",
>     "Bento": "Bento",
>     "Rafael": "Rafael",
>     "Guilherme Arana": "Guilherme Arana",
>     "Beraldo": "Beraldo",
>     "Gabriel Magalhães": "Gabriel Magalhães",
>     "Endrick": "Endrick",
>     "Rodrygo Goes": "Rodrygo Goes",
>     "Savinho": "Savinho",
>     "Vinicius Junior": "Vini Jr.",
>     "Camilo Vargas": "Camilo Vargas",
>     "James Rodriguez": "James Rodriguez",
>     "Jorge Carrascal": "Jorge Carrascal",
>     "Juan Fernando Quintero": "Juan Fernando Quintero",
>     "Kevin Castaño": "Kevin Castaño",
>     "Matheus Uribe

In [None]:
# Access the text returned by the model
player_content = player_response.candidates[0].content.parts[0].text

# Find the starting index of the 'players_full_name = ' assignment
start_index = player_content.find('players_full_name = ')

# Find the closing code fence; find() returns -1 if the response was cut off
end_index = player_content.find('```', start_index)

# Extract the assignment. The response is truncated mid-entry, so the dangling key and the
# dict are closed by hand (with end_index == -1 the slice also drops the last character)
player_list_str = player_content[start_index:end_index].strip() + '":""}'
# Run the assignment to define the players_full_name dictionary
exec(player_list_str)

# Convert the {full name: short name} pairs into a DataFrame
players_data = list(players_full_name.items())

df_player_fullname = pd.DataFrame(players_data, columns=['Name', 'short Name'])
# Display the DataFrame
print(df_player_fullname)

                       Name              short Name
0    Rodrigo Javier De Paul         Rodrigo De Paul
1       Anderson Santamaria     Anderson Santamaria
2              Oliver Sonne            Oliver Sonne
3              Pedro Quispe            Pedro Quispe
4            Andre Carrillo          Andre Carrillo
..                      ...                     ...
168         James Rodriguez         James Rodriguez
169         Jorge Carrascal         Jorge Carrascal
170  Juan Fernando Quintero  Juan Fernando Quintero
171           Kevin Castaño           Kevin Castaño
172            Matheus Urib                        

[173 rows x 2 columns]
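The last row, `Matheus Urib` with an empty short name, comes from the response being cut off: with no closing code fence, `find()` returns -1, the slice drops the final character, and the appended `'":""}'` closes the dangling key. A minimal sketch of a more defensive parser (the helper name `parse_truncated_dict` is hypothetical) that drops an incomplete last entry and uses `ast.literal_eval`, which only evaluates Python literals, instead of `exec`. It assumes entries are separated by commas that do not occur inside the names.

```python
import ast


def parse_truncated_dict(text: str, var_name: str) -> dict:
    """Parse `var_name = {...}` from model output, tolerating a cut-off response."""
    start = text.find(f"{var_name} = ")
    if start == -1:
        raise ValueError(f"{var_name!r} not found in the response")
    end = text.find("```", start)
    if end == -1:  # no closing fence: the response was truncated
        end = len(text)
    # Keep only the literal on the right-hand side of the assignment
    literal = text[start:end].split("=", 1)[1].strip()
    if not literal.endswith("}"):
        # Drop everything after the last comma (the incomplete entry) and close the dict
        cut = literal.rfind(",")
        literal = (literal[:cut] if cut != -1 else "{") + "\n}"
    # literal_eval accepts only Python literals, so no arbitrary code is executed
    return ast.literal_eval(literal)


# Example usage:
# players_full_name = parse_truncated_dict(player_content, "players_full_name")
```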


In [None]:
# Change the players table column name to merge with full name table
players_.rename(columns={'Name':'short Name'}, inplace=True)
df_player_fullname.dropna(inplace=True)

In [None]:
# Merge the full names into the players table
players_ = players_.merge(df_player_fullname[['Name', 'short Name']], on='short Name', how='left')
players_

Unnamed: 0,Nation,short Name,Club,Position,Age,Overall,Preferred foot,Name
0,Argentina,Emiliano Martinez,Aston Villa,GK,31.0,85.0,Right,
1,Argentina,Franco Armani,River Plate,GK,36.0,77.0,Right,
2,Argentina,Geronimo Rulli,Ajax,GK,31.0,81.0,Right,
3,Argentina,Gonzalo Montiel,Nott'm Forest,RB,26.0,79.0,Right,
4,Argentina,Nahuel Molina,Atletico de Madrid,RB,25.0,82.0,Right,
...,...,...,...,...,...,...,...,...
447,Costa Rica,Anthony Contreras,,,,,,
448,Costa Rica,Kenneth Vargas,,,,,,
449,Costa Rica,Alvaro Zamora,,,,,,
450,Costa Rica,Andy Rojas,,,,,,


In [None]:
# Re-match the remaining players against the FC 24 database using their full names
df_merged = players_.merge(player_rating[['Name', 'Club', 'Position', 'Age', 'Overall', 'Preferred foot']], on='Name', how='left')

# Fill NaN values from the first match with the values found by full name
df_merged['Club'] = df_merged['Club_x'].fillna(df_merged['Club_y'])
df_merged['Position'] = df_merged['Position_x'].fillna(df_merged['Position_y'])
df_merged['Age'] = df_merged['Age_x'].fillna(df_merged['Age_y'])
df_merged['Overall'] = df_merged['Overall_x'].fillna(df_merged['Overall_y'])
df_merged['Preferred foot'] = df_merged['Preferred foot_x'].fillna(df_merged['Preferred foot_y'])
df_merged['Name'] = df_merged['Name'].fillna(df_merged['short Name'])

# Drop the redundant suffixed columns
df_merged = df_merged.drop(columns=['Club_x', 'Club_y', 'Position_x', 'Position_y', 'Age_x', 'Age_y',
                                    'Overall_x', 'Overall_y', 'Preferred foot_x', 'Preferred foot_y'])

df_merged[df_merged.Nation == 'Argentina']

Unnamed: 0,Nation,short Name,Name,Club,Position,Age,Overall,Preferred foot
0,Argentina,Emiliano Martinez,Emiliano Martinez,Aston Villa,GK,31.0,85.0,Right
1,Argentina,Franco Armani,Franco Armani,River Plate,GK,36.0,77.0,Right
2,Argentina,Geronimo Rulli,Geronimo Rulli,Ajax,GK,31.0,81.0,Right
3,Argentina,Gonzalo Montiel,Gonzalo Montiel,Nott'm Forest,RB,26.0,79.0,Right
4,Argentina,Nahuel Molina,Nahuel Molina,Atletico de Madrid,RB,25.0,82.0,Right
5,Argentina,Leonardo Balerdi,Leonardo Balerdi,OM,CB,24.0,76.0,Right
6,Argentina,Cristian Romero,Cristian Romero,Spurs,CB,25.0,82.0,Right
7,Argentina,German Pezzella,German Pezzella,Real Betis,CB,32.0,76.0,Right
8,Argentina,Lucas Martinez Quarta,Lucas Martinez Quarta,Fiorentina,CB,27.0,77.0,Right
9,Argentina,Nicolas Otamendi,Nicolas Otamendi,SL Benfica,CB,35.0,82.0,Right


In [None]:
# Count the players with null values per country
player_null = df_merged[df_merged['Overall'].isnull()]
player_null = (player_null.groupby('Nation').count().reset_index())
player_null.rename(columns={'short Name':'Count'}, inplace=True)
player_null = player_null[['Nation', 'Count']]
print('Total null values:', player_null['Count'].sum())
player_null

Total null values: 228


Unnamed: 0,Nation,Count
0,Bolivia,14
1,Brazil,9
2,Canada,7
3,Chile,24
4,Colombia,11
5,Costa Rica,22
6,Ecuador,10
7,Estados Unidos,27
8,Jamaica,13
9,Mexico,23


The number of null values is still high (228, down from 235).

### Checking FIFA 23 data

I'm going to use the data from last year's edition of the same game (FIFA 23), which has more complete information.
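Using a second source to fill only the gaps left by the first is a common pattern; the cells below do it with `fillna` on the `_x`/`_y` columns produced by `merge`. A minimal sketch of the same idea with `DataFrame.combine_first`, assuming both tables are first indexed by a shared player key (the toy frames and the choice of key are illustrative):

```python
import numpy as np
import pandas as pd

# Primary source (FC 24 role): two of the three players have no rating
primary = pd.DataFrame(
    {"Overall": [85.0, np.nan, np.nan], "Club": ["Club X", None, None]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Name"),
)

# Fallback source (FIFA 23 role): covers Player A and Player B only
fallback = pd.DataFrame(
    {"Overall": [84.0, 75.0], "Club": ["Club X", "Club Y"]},
    index=pd.Index(["Player A", "Player B"], name="Name"),
)

# combine_first keeps every non-null value from primary and fills its gaps from fallback
filled = primary.combine_first(fallback)
print(filled)
```

Player A keeps its primary rating (85, not 84), Player B is filled from the fallback, and Player C stays missing. Unlike a merge followed by `fillna`, this aligns on the index, so it works best when the key is unique on both sides (duplicate names such as the several Danilos would need resolving first).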

In [None]:
! kaggle datasets download -d sanjeetsinghnaik/fifa-23-players-dataset

In [None]:
! unzip fifa-23-players-dataset.zip

In [None]:
# Checking the info contained in this new dataset
fifa_23 = pd.read_csv('Fifa 23 Players Data.csv')
fifa_23.head()


Unnamed: 0,Known As,Full Name,Overall,Potential,Value(in Euro),Positions Played,Best Position,Nationality,Image Link,Age,...,LM Rating,CM Rating,RM Rating,LWB Rating,CDM Rating,RWB Rating,LB Rating,CB Rating,RB Rating,GK Rating
0,L. Messi,Lionel Messi,91,91,54000000,RW,CAM,Argentina,https://cdn.sofifa.net/players/158/023/23_60.png,35,...,91,88,91,67,66,67,62,53,62,22
1,K. Benzema,Karim Benzema,91,91,64000000,"CF,ST",CF,France,https://cdn.sofifa.net/players/165/153/23_60.png,34,...,89,84,89,67,67,67,63,58,63,21
2,R. Lewandowski,Robert Lewandowski,91,91,84000000,ST,ST,Poland,https://cdn.sofifa.net/players/188/545/23_60.png,33,...,86,83,86,67,69,67,64,63,64,22
3,K. De Bruyne,Kevin De Bruyne,91,91,107500000,"CM,CAM",CM,Belgium,https://cdn.sofifa.net/players/192/985/23_60.png,31,...,91,91,91,82,82,82,78,72,78,24
4,K. Mbappé,Kylian Mbappé,91,95,190500000,"ST,LW",ST,France,https://cdn.sofifa.net/players/231/747/23_60.png,23,...,92,84,92,70,66,70,66,57,66,21


In [None]:
# Apply the same accent normalization to the FIFA 23 dataset
for col in fifa_23.columns:
    if fifa_23[col].dtype == object:  # Check column type 'object' (string)
        fifa_23[col] = fifa_23[col].str.replace('á', 'a')
        fifa_23[col] = fifa_23[col].str.replace('é', 'e')
        fifa_23[col] = fifa_23[col].str.replace('í', 'i')
        fifa_23[col] = fifa_23[col].str.replace('ó', 'o')
        fifa_23[col] = fifa_23[col].str.replace('ú', 'u')
        fifa_23[col] = fifa_23[col].str.replace('ï', 'i')
        fifa_23[col] = fifa_23[col].str.replace('Á', 'A')

In [None]:
# Rename the full-name column to match the FIFA 23 dataset
df_merged.rename(columns={'Name': 'Full Name'}, inplace=True)

In [None]:
# Look up how players named Lisandro appear in FIFA 23
fifa_23[fifa_23['Full Name'].str.contains('Lisandro ', case=False, na=False)]

Unnamed: 0,Known As,Full Name,Overall,Potential,Value(in Euro),Positions Played,Best Position,Nationality,Image Link,Age,...,LM Rating,CM Rating,RM Rating,LWB Rating,CDM Rating,RWB Rating,LB Rating,CB Rating,RB Rating,GK Rating
293,L. Martinez,Lisandro Martinez,81,86,35000000,"CB,LB,CDM",CDM,Argentina,https://cdn.sofifa.net/players/239/301/23_60.png,24,...,76,79,76,81,83,81,81,83,81,20
3297,L. Lopez,Lisandro Lopez,72,72,800000,"ST,CAM",ST,Argentina,https://cdn.sofifa.net/players/142/707/23_60.png,39,...,71,71,71,61,63,61,60,60,60,18
4054,L. Magallan,Lisandro Magallan,71,71,1600000,CB,CB,Argentina,https://cdn.sofifa.net/players/211/263/23_60.png,28,...,55,59,55,65,68,65,66,71,66,19
7699,Lisandro Semedo,Lisandro Pedro Varela Semedo,67,68,1200000,RM,RM,Cape Verde Islands,https://cdn.sofifa.net/players/243/402/23_60.png,26,...,68,62,68,53,51,53,50,45,50,16


In [None]:
fifa_23.columns

Index(['Known As', 'Full Name', 'Overall', 'Potential', 'Value(in Euro)',
       'Positions Played', 'Best Position', 'Nationality', 'Image Link', 'Age',
       'Height(in cm)', 'Weight(in kg)', 'TotalStats', 'BaseStats',
       'Club Name', 'Wage(in Euro)', 'Release Clause', 'Club Position',
       'Contract Until', 'Club Jersey Number', 'Joined On', 'On Loan',
       'Preferred Foot', 'Weak Foot Rating', 'Skill Moves',
       'International Reputation', 'National Team Name',
       'National Team Image Link', 'National Team Position',
       'National Team Jersey Number', 'Attacking Work Rate',
       'Defensive Work Rate', 'Pace Total', 'Shooting Total', 'Passing Total',
       'Dribbling Total', 'Defending Total', 'Physicality Total', 'Crossing',
       'Finishing', 'Heading Accuracy', 'Short Passing', 'Volleys',
       'Dribbling', 'Curve', 'Freekick Accuracy', 'LongPassing', 'BallControl',
       'Acceleration', 'Sprint Speed', 'Agility', 'Reactions', 'Balance',
       'Shot Powe

In [None]:
# Merge the last dataset with FIFA 23 information by full name
df = df_merged.merge(fifa_23[['Full Name', 'Overall', 'Value(in Euro)', 'Best Position', 'Nationality', 'Age', 'Height(in cm)', 'Weight(in kg)', 'Club Name', 'Preferred Foot']], on='Full Name', how='left')

# Fill the remaining NaN values with the FIFA 23 data
df['Club'] = df['Club'].fillna(df['Club Name'])
df['Position'] = df['Position'].fillna(df['Best Position'])
df['Age_x'] = df['Age_x'].fillna(df['Age_y'])
df['Overall_x'] = df['Overall_x'].fillna(df['Overall_y'])
df['Preferred foot'] = df['Preferred foot'].fillna(df['Preferred Foot'])

# Drop redundant columns
df = df.drop(columns=['Overall_y', 'Best Position', 'Nationality', 'Age_y', 'Club Name', 'Preferred Foot'])

# Rename columns
df = df.rename(columns={
    'Age_x': 'Age',
    'Overall_x': 'Overall'})

df

Unnamed: 0,Nation,short Name,Full Name,Club,Position,Age,Overall,Preferred foot,Value(in Euro),Height(in cm),Weight(in kg)
0,Argentina,Emiliano Martinez,Emiliano Martinez,Aston Villa,GK,31.0,85.0,Right,29000000.0,195.0,88.0
1,Argentina,Emiliano Martinez,Emiliano Martinez,Aston Villa,GK,31.0,85.0,Right,3000000.0,184.0,64.0
2,Argentina,Franco Armani,Franco Armani,River Plate,GK,36.0,77.0,Right,1900000.0,189.0,88.0
3,Argentina,Geronimo Rulli,Geronimo Rulli,Ajax,GK,31.0,81.0,Right,20500000.0,189.0,84.0
4,Argentina,Gonzalo Montiel,Gonzalo Montiel,Nott'm Forest,RB,26.0,79.0,Right,20000000.0,175.0,70.0
...,...,...,...,...,...,...,...,...,...,...,...
458,Costa Rica,Anthony Contreras,Anthony Contreras,,,,,,,,
459,Costa Rica,Kenneth Vargas,Kenneth Vargas,,,,,,,,
460,Costa Rica,Alvaro Zamora,Alvaro Zamora,,,,,,,,
461,Costa Rica,Andy Rojas,Andy Rojas,,,,,,,,
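In the output above, Emiliano Martinez appears twice with different values, heights and weights, which suggests the name-only join also matched a different player who shares the name. A minimal sketch of a stricter join that also matches on nationality and uses `validate` to raise a `MergeError` on ambiguous keys; the toy data is made up, and in the notebook the country labels in `Nation` and FIFA 23's `Nationality` would first have to agree.

```python
import pandas as pd

# Toy squad list (the nation comes from the official call-up)
squad = pd.DataFrame({"Nation": ["Argentina"], "Full Name": ["Player A"]})

# Toy FIFA 23 table with two different players sharing the same full name
fifa = pd.DataFrame({
    "Full Name": ["Player A", "Player A"],
    "Nationality": ["Argentina", "Uruguay"],
    "Height(in cm)": [195, 184],
})

# Name-only join: both namesakes match, so the squad row is duplicated
name_only = squad.merge(fifa, on="Full Name", how="left")
print(len(name_only))  # 2

# Name + nationality join; validate raises a MergeError if a key is still ambiguous
strict = squad.merge(
    fifa,
    left_on=["Full Name", "Nation"],
    right_on=["Full Name", "Nationality"],
    how="left",
    validate="many_to_one",
).drop(columns="Nationality")
print(strict)
```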


In [None]:
fifa_23.rename(columns={'Known As':'short Name'}, inplace=True)

In [None]:
# Merge the last dataset with FIFA 23 by short name ("Known As"). Build on df (not df_merged)
# so that the replacement below and the full-name matches from the previous cell are kept.
df.replace('Vinicius Junior', 'Vini Jr.', inplace=True)
fifa_cols = ['short Name', 'Overall', 'Value(in Euro)', 'Best Position', 'Age',
             'Height(in cm)', 'Weight(in kg)', 'Club Name', 'Preferred Foot']
# Overlapping FIFA 23 columns get a '_fifa23' suffix; df keeps its own column names
df = df.merge(fifa_23[fifa_cols], on='short Name', how='left', suffixes=('', '_fifa23'))

# Fill the remaining NaN values with the FIFA 23 data
df['Club'] = df['Club'].fillna(df['Club Name'])
df['Position'] = df['Position'].fillna(df['Best Position'])
df['Preferred foot'] = df['Preferred foot'].fillna(df['Preferred Foot'])
for col in ['Age', 'Overall', 'Value(in Euro)', 'Height(in cm)', 'Weight(in kg)']:
    df[col] = df[col].fillna(df[col + '_fifa23'])

# Drop the redundant FIFA 23 columns
df = df.drop(columns=['Club Name', 'Best Position', 'Preferred Foot'] +
             [col for col in df.columns if col.endswith('_fifa23')])



players_.drop_duplicates(subset=['Name'], keep='first', inplace=True)

df[df.Nation == 'Brazil']

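|   | Nation | short Name | Full Name | Club | Position | Age_x | Overall_x | Preferred foot | Overall_y | Value(in Euro) | Best Position | Nationality | Age_y | Height(in cm) | Weight(in kg) | Club Name | Preferred Foot |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 345 | Brazil | Alisson | Alisson | Liverpool | GK | 31.0 | 89.0 | Right | 89.0 | 79000000.0 | GK | Brazil | 29.0 | 191.0 | 91.0 | Liverpool | Right |
| 346 | Brazil | Bento | Bento | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 347 | Brazil | Rafael | Rafael | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 348 | Brazil | Danilo | Danilo | Juventus | CB | 32.0 | 81.0 | Right | 80.0 | 18000000.0 | RB | Brazil | 30.0 | 184.0 | 78.0 | Juventus | Right |
| 349 | Brazil | Danilo | Danilo | Juventus | CB | 32.0 | 81.0 | Right | 77.0 | 9000000.0 | LB | Brazil | 30.0 | 170.0 | 73.0 | VfL Bochum 1848 | Left |
| 350 | Brazil | Danilo | Danilo | Juventus | CB | 32.0 | 81.0 | Right | 71.0 | 2900000.0 | ST | Brazil | 23.0 | 174.0 | 68.0 | Feyenoord | Right |
| 351 | Brazil | Yan Couto | Yan Couto | Girona FC | RM | 21.0 | 72.0 | Right | 71.0 | 4099999.0 | RM | Brazil | 20.0 | 168.0 | 60.0 | Girona FC | Right |
| 352 | Brazil | Guilherme Arana | Guilherme Arana | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 353 | Brazil | Wendell | Wendell | FC Porto | LB | 30.0 | 76.0 | Left | 75.0 | 5000000.0 | LB | Brazil | 28.0 | 176.0 | 70.0 | FC Porto | Left |
| 354 | Brazil | Beraldo | Beraldo | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |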
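
The same fill-and-drop block is repeated after each merge. A small helper can keep it in one place. This is only a sketch: `coalesce_columns` and `FILL_PAIRS` are hypothetical names, and the demo frame is toy data shaped like the merged output above. `fillna` with a Series fills row by row, aligned on the index.

```python
import pandas as pd

def coalesce_columns(df, pairs, drop=()):
    """Fill NaNs in each target column from its source column, then drop the sources.

    pairs maps target -> source, e.g. {"Age_x": "Age_y"}; drop lists extra columns to remove.
    """
    out = df.copy()
    for target, source in pairs.items():
        out[target] = out[target].fillna(out[source])  # row-wise fill, aligned on the index
    return out.drop(columns=list(pairs.values()) + list(drop))

# Hypothetical mapping mirroring the notebook's fill step (target <- FIFA 23 source)
FILL_PAIRS = {
    "Club": "Club Name",
    "Position": "Best Position",
    "Age_x": "Age_y",
    "Overall_x": "Overall_y",
    "Preferred foot": "Preferred Foot",
}

# Toy rows: the first has no squad data, the second already has it
demo = pd.DataFrame({
    "Club": [None, "Juventus"], "Club Name": ["Liverpool", "Feyenoord"],
    "Position": [None, "CB"], "Best Position": ["GK", "ST"],
    "Age_x": [None, 32.0], "Age_y": [29.0, 23.0],
    "Overall_x": [None, 81.0], "Overall_y": [89.0, 71.0],
    "Preferred foot": [None, "Right"], "Preferred Foot": ["Right", "Right"],
    "Nationality": ["Brazil", "Brazil"],
})

clean = coalesce_columns(demo, FILL_PAIRS, drop=["Nationality"]).rename(
    columns={"Age_x": "Age", "Overall_x": "Overall"})
print(clean)
```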


In [None]:
df.drop_duplicates(subset=['Full Name'], keep='first', inplace=True)
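
`drop_duplicates(keep='first')` keeps whichever candidate happens to come first. In the Brazil output above it keeps the Juventus Danilo, but only because of row order. One possible tie-breaker is to prefer the candidate whose FIFA 23 club agrees with the squad club. This sketch assumes it runs on the raw merge output, before `Club Name` is dropped and before `Club` is filled from it. The rows are toy values modelled on the Danilo rows. Club names can also be spelled differently across sources (for example `Nott'm Forest`), so an exact match is a weak signal.

```python
import pandas as pd

# Toy rows mimicking the Danilo collision: one squad player matched three FIFA 23 entries
rows = pd.DataFrame({
    "Full Name": ["Danilo", "Danilo", "Danilo", "Wendell"],
    "Club": ["Juventus", "Juventus", "Juventus", "FC Porto"],                # squad list club
    "Club Name": ["VfL Bochum 1848", "Juventus", "Feyenoord", "FC Porto"],  # FIFA 23 club
    "Overall_y": [77.0, 80.0, 71.0, 75.0],
})

# Flag candidates whose FIFA 23 club agrees with the squad club
rows["club_match"] = rows["Club"].eq(rows["Club Name"])

best = (
    rows.sort_values("club_match", ascending=False, kind="stable")  # matches first, order kept otherwise
        .drop_duplicates(subset="Full Name", keep="first")          # one row per player
        .drop(columns="club_match")
        .sort_index()                                               # restore original row order
)
print(best)
```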

In [None]:
# Count the number of players with null values per country
player_null_ = df[df['Overall'].isnull()]
player_null_ = (player_null_.groupby('Nation').count().reset_index())
player_null_.rename(columns={'short Name':'Count'}, inplace=True)
player_null_ = player_null_[['Nation', 'Count']]
print('Total null values:', player_null_['Count'].sum())
player_null_

Total null values: 158


|   | Nation | Count |
|---|---|---|
| 0 | Bolivia | 10 |
| 1 | Brazil | 9 |
| 2 | Canada | 6 |
| 3 | Chile | 17 |
| 4 | Colombia | 9 |
| 5 | Costa Rica | 20 |
| 6 | Ecuador | 6 |
| 7 | Estados Unidos | 6 |
| 8 | Jamaica | 13 |
| 9 | Mexico | 14 |
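
The `groupby(...).count()` pattern counts the non-null entries in every column and then relies on `short Name` never being null. A more direct sketch, on toy data, selects the `Nation` of each row with a missing `Overall` and counts those rows with `value_counts`:

```python
import pandas as pd

# Toy frame: Overall is NaN where no FIFA 23 match was found (values invented)
df = pd.DataFrame({
    "Nation": ["Bolivia", "Bolivia", "Brazil", "Brazil", "Chile"],
    "short Name": ["A", "B", "C", "D", "E"],
    "Overall": [None, 66.0, None, None, 70.0],
})

# Nation of every unmatched player, counted per country
null_counts = (
    df.loc[df["Overall"].isna(), "Nation"]
      .value_counts()
      .sort_index()
      .rename_axis("Nation")          # name the index so reset_index yields a Nation column
      .reset_index(name="Count")
)
print("Total null values:", null_counts["Count"].sum())
print(null_counts)
```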


Merging on the short names gives the DataFrame with the fewest null values.
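
The choice between the `Full Name` and `short Name` merges can be made explicit by counting unmatched rows with `merge(..., indicator=True)`, which adds a `_merge` column marking each row as `both`, `left_only` or `right_only`. The helper `unmatched_count` and all player names below are invented for this sketch.

```python
import pandas as pd

def unmatched_count(left, right, key):
    """Count rows of left that find no partner in right when merging on key."""
    # Deduplicate the right-hand keys so the count is not inflated by repeated matches
    merged = left.merge(right[[key]].drop_duplicates(), on=key, how="left", indicator=True)
    return int((merged["_merge"] == "left_only").sum())

# Toy squad list and toy ratings extract (invented names)
squad = pd.DataFrame({
    "short Name": ["A. Silva", "B. Costa", "C. Rocha"],
    "Full Name": ["Antonio Silva", "Bruno Costa", "Carlos Rocha"],
})
fifa = pd.DataFrame({
    "short Name": ["A. Silva", "B. Costa"],
    "Full Name": ["Antonio Silva Pereira", "Bruno Costa Lima"],
})

scores = {key: unmatched_count(squad, fifa, key) for key in ["Full Name", "short Name"]}
best_key = min(scores, key=scores.get)
print(scores)
print("Merge key with fewest misses:", best_key)
```

Fewer misses is not the whole story: the short-name merge also produced false matches such as the three Danilo rows, so the miss count is worth weighing against the number of duplicated matches.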

In [None]:
# df[df.Nation == 'Brazil']
fifa_23[fifa_23['short Name'].str.contains('paqueta', case=False, na=False)]

|   | short Name | Full Name | Overall | Potential | Value(in Euro) | Positions Played | Best Position | Nationality | Image Link | Age | ... | LM Rating | CM Rating | RM Rating | LWB Rating | CDM Rating | RWB Rating | LB Rating | CB Rating | RB Rating | GK Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 236 | Lucas Paqueta | Lucas Tolentino Coelho de Lima | 82 | 87 | 46000000 | CAM,CM,CF | CAM | Brazil | https://cdn.sofifa.net/players/233/927/23_60.png | 24 | ... | 82 | 83 | 82 | 77 | 78 | 77 | 75 | 74 | 75 | 21 |
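
The search for `paqueta` succeeds because FIFA 23 stores the name without an accent; an exact-match merge fails whenever one source writes an accent and the other does not. A sketch of a normalizer, `normalize_name` (a hypothetical helper built on the standard-library `unicodedata` module), that lower-cases names and strips accents. Applying it to the key columns on both sides before merging would catch accent and case differences; nickname differences such as `Vinicius Junior` versus `Vini Jr.` would still need explicit replacements.

```python
import unicodedata

import pandas as pd

def normalize_name(name):
    """Lower-case a name, strip accents and collapse whitespace, e.g. 'Paquetá' -> 'paqueta'."""
    if not isinstance(name, str):
        return name  # leave NaN/None and other non-strings untouched
    # NFKD splits an accented letter into its base letter plus a combining mark
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())

names = pd.Series(["Lucas Paquetá", "Vinícius Júnior", "  Gerónimo  Rulli ", None])
normalized = names.map(normalize_name).tolist()
print(normalized)
```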


In [None]:
# Average of every numeric column per nation
players_nation_mean = df.groupby('Nation').mean(numeric_only=True).reset_index()
players_nation_mean

|   | Nation | Age | Overall | Value(in Euro) | Height(in cm) | Weight(in kg) |
|---|---|---|---|---|---|---|
| 0 | Argentina | 27.310345 | 80.172414 | 28400000.0 | 179.653846 | 74.5 |
| 1 | Bolivia | 26.5 | 66.0 | 972727.3 | 181.818182 | 75.545455 |
| 2 | Brazil | 25.941176 | 80.882353 | NaN | NaN | NaN |
| 3 | Canada | 25.904762 | 71.380952 | 7476471.0 | 180.588235 | 76.352941 |
| 4 | Chile | 28.0 | 70.793103 | 3004138.0 | 179.103448 | 74.068966 |
| 5 | Colombia | 27.315789 | 75.578947 | 11958820.0 | 180.0 | 75.941176 |
| 6 | Costa Rica | 26.666667 | 67.833333 | 1135000.0 | 180.0 | 72.4 |
| 7 | Ecuador | 24.894737 | 70.947368 | 4126667.0 | 177.666667 | 71.933333 |
| 8 | Estados Unidos | 23.666667 | 72.666667 | 8177381.0 | 184.0 | 78.428571 |
| 9 | Jamaica | 26.0 | 66.153846 | 2582500.0 | 181.4 | 74.9 |
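
A per-nation mean on its own hides how many rated players stand behind each average, which matters given the unmatched players counted earlier. A sketch with toy data that uses pandas named aggregation to report the mean rating next to the number of rated players and the squad size, sorted by mean rating:

```python
import pandas as pd

# Toy player-level data (values invented for illustration)
df = pd.DataFrame({
    "Nation": ["Argentina", "Argentina", "Bolivia", "Bolivia", "Bolivia"],
    "Overall": [85.0, 79.0, 66.0, None, 64.0],
    "Age": [31.0, 26.0, 27.0, 24.0, None],
})

# Named aggregation: output column name = (input column, aggregation function)
summary = (
    df.groupby("Nation")
      .agg(
          mean_overall=("Overall", "mean"),    # NaN ratings are skipped
          rated_players=("Overall", "count"),  # count excludes NaN
          squad_size=("Overall", "size"),      # size counts every row, NaN included
          mean_age=("Age", "mean"),
      )
      .sort_values("mean_overall", ascending=False)
      .reset_index()
)
print(summary)
```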
