# ETL: FIFA World Cup 2022 Dataset

This notebook walks through an ETL (Extract, Transform, Load) process on the FIFA World Cup 2022 dataset.
The goal is to explore the raw dataset, clean it, and select the columns relevant for later analysis.

In [None]:
import kagglehub
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os

pd.set_option('display.max_columns', None)

## 1. Data Extraction and Loading
We download the latest version of the dataset from Kaggle via `kagglehub`.

In [None]:
# Download latest version
path = kagglehub.dataset_download("die9origephit/fifa-world-cup-2022-complete-dataset")

print("Path to dataset files:", path)

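# Locate the CSV anywhere under the download directory
csv_path = None
for root, dirs, files in os.walk(path):
    if "Fifa_world_cup_matches.csv" in files:
        csv_path = os.path.join(root, "Fifa_world_cup_matches.csv")
        break  # stop walking once the file is found

# Fail loudly: later cells need `df`, so a silent "not found" would only defer the error
if csv_path is None:
    raise FileNotFoundError(f"Fifa_world_cup_matches.csv not found under {path}")

df = pd.read_csv(csv_path)
print("Dataset loaded successfully.")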
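
The lookup above can also be written with `pathlib`. This is a minimal sketch, assuming a hypothetical helper `find_file` (not part of `kagglehub` or pandas) that returns the first match or raises; a throwaway directory stands in for the real Kaggle download.

```python
from pathlib import Path
import tempfile

def find_file(root, filename):
    """Return the first file named `filename` under `root`, searching recursively."""
    # sorted() makes the choice deterministic if several copies exist
    matches = sorted(Path(root).rglob(filename))
    if not matches:
        raise FileNotFoundError(f"{filename} not found under {root}")
    return matches[0]

# Demo on a temporary directory tree (hypothetical layout)
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "versions" / "1"
    nested.mkdir(parents=True)
    (nested / "Fifa_world_cup_matches.csv").write_text("team1,team2\n")
    found = find_file(tmp, "Fifa_world_cup_matches.csv")
    found_name = found.name
    found_parent = found.parent.name
```

Raising instead of printing means a missing file stops the notebook at the loading step rather than at the first use of `df`.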

### 1.2 Dataset Analysis (Exploration)
In this step we explore the structure of the data, check the column types, and look for missing values.

In [None]:
# Preview of the first rows
df.head()

In [50]:
# Column and dtype information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64 entries, 0 to 63
Data columns (total 88 columns):
 #   Column                                                 Non-Null Count  Dtype 
---  ------                                                 --------------  ----- 
 0   team1                                                  64 non-null     object
 1   team2                                                  64 non-null     object
 2   possession team1                                       64 non-null     object
 3   possession team2                                       64 non-null     object
 4   possession in contest                                  64 non-null     object
 5   number of goals team1                                  64 non-null     int64 
 6   number of goals team2                                  64 non-null     int64 
 7   date                                                   64 non-null     object
 8   hour                                                   64 non-

In [None]:
# Check for null values
nulls = df.isnull().sum()
print("Columns with null values:")
print(nulls[nulls > 0])
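
For a wide table like this one (88 columns), it can help to gather dtypes and missing-value counts into a single summary frame. A minimal sketch on a small made-up DataFrame; the values are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up sample with two deliberate gaps
toy = pd.DataFrame({
    "team1": ["QATAR", "ENGLAND", None],
    "possession team1": ["42%", "72%", "45%"],
    "number of goals team1": [0, 6, np.nan],
})

# One row per column: dtype, missing count, missing share in percent
summary = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "n_missing": toy.isnull().sum(),
    "pct_missing": (toy.isnull().mean() * 100).round(1),
})

# Keep only the columns that actually have gaps
print(summary[summary["n_missing"] > 0])
```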

In [49]:
# Summary descriptive statistics
df.describe()

Unnamed: 0,number of goals team1,number of goals team2,total attempts team1,total attempts team2,conceded team1,conceded team2,goal inside the penalty area team1,goal inside the penalty area team2,goal outside the penalty area team1,goal outside the penalty area team2,assists team1,assists team2,on target attempts team1,on target attempts team2,off target attempts team1,off target attempts team2,attempts inside the penalty area team1,attempts inside the penalty area team2,attempts outside the penalty area team1,attempts outside the penalty area team2,left channel team1,left channel team2,left inside channel team1,left inside channel team2,central channel team1,central channel team2,right inside channel team1,right inside channel team2,right channel team1,right channel team2,total offers to receive team1,total offers to receive team2,inbehind offers to receive team1,inbehind offers to receive team2,inbetween offers to receive team1,inbetween offers to receive team2,infront offers to receive team1,infront offers to receive team2,receptions between midfield and defensive lines team1,receptions between midfield and defensive lines team2,attempted line breaks team1,attempted line breaks team2,completed line breaksteam1,completed line breaks team2,attempted defensive line breaks team1,attempted defensive line breaks team2,completed defensive line breaksteam1,completed defensive line breaks team2,yellow cards team1,yellow cards team2,red cards team1,red cards team2,fouls against team1,fouls against team2,offsides team1,offsides team2,passes team1,passes team2,passes completed team1,passes completed team2,crosses team1,crosses team2,crosses completed team1,crosses completed team2,switches of play completed team1,switches of play completed team2,corners team1,corners team2,free kicks team1,free kicks team2,penalties scored team1,penalties scored team2,goal preventions team1,goal preventions team2,own goals team1,own goals team2,forced turnovers team1,forced turnovers team2,defensive pressures applied team1,defensive pressures applied team2
count,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0
mean,1.578125,1.109375,11.140625,11.28125,1.109375,1.578125,1.46875,0.984375,0.09375,0.109375,1.171875,0.734375,4.203125,3.75,4.703125,5.03125,6.9375,6.953125,4.203125,4.328125,13.625,13.5,4.921875,4.421875,4.65625,5.15625,4.265625,5.03125,11.65625,12.796875,592.375,550.21875,126.375,119.625,231.5,212.859375,234.5,217.734375,11.40625,10.5,173.46875,166.59375,114.25,106.484375,18.484375,18.265625,10.15625,9.71875,1.78125,1.75,0.0625,0.0,12.640625,12.359375,1.96875,1.96875,509.515625,492.109375,437.0,419.890625,18.09375,18.53125,4.59375,4.078125,6.453125,6.15625,4.484375,4.453125,14.09375,14.390625,0.140625,0.125,11.59375,11.359375,0.015625,0.015625,71.96875,70.125,289.75,293.265625
std,1.551289,1.055856,4.972519,5.807682,1.055856,1.551289,1.563155,0.999876,0.293785,0.314576,1.363407,0.895176,2.527184,2.713868,2.394966,2.911219,3.77912,4.459446,2.470009,2.766321,6.550173,7.287737,2.53424,3.201213,2.852004,3.296071,2.685896,3.141978,5.812463,6.544547,170.21084,169.487694,33.776812,36.660822,70.466698,59.487191,85.887893,101.472843,6.920682,5.614607,32.77822,27.965806,33.217895,27.795736,7.144744,6.183034,5.771354,5.202163,1.740906,1.511858,0.243975,0.0,5.247425,3.789573,1.727175,1.727175,156.348511,166.213681,156.9237,165.710028,8.239893,7.195609,3.298478,2.269918,3.749835,3.432888,2.777416,2.794153,4.219075,5.202616,0.350382,0.377964,5.911299,4.990045,0.125,0.125,14.394629,13.531269,88.406888,80.91623
min,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,302.0,212.0,62.0,52.0,99.0,86.0,75.0,68.0,1.0,1.0,104.0,101.0,55.0,45.0,4.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,3.0,5.0,0.0,0.0,225.0,224.0,167.0,154.0,4.0,5.0,0.0,0.0,1.0,1.0,0.0,0.0,6.0,5.0,0.0,0.0,0.0,2.0,0.0,0.0,38.0,44.0,139.0,141.0
25%,0.0,0.0,8.0,7.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,3.0,3.0,4.0,4.0,3.0,3.0,8.0,8.0,3.0,2.0,2.0,3.0,2.0,3.0,8.75,8.0,465.75,439.0,100.0,95.75,176.75,176.0,173.5,159.75,8.0,7.0,150.75,149.25,90.5,86.75,13.0,14.75,7.0,7.0,0.0,1.0,0.0,0.0,9.0,10.0,1.0,0.75,392.75,392.25,318.25,317.5,11.75,13.0,2.75,2.75,4.0,3.0,2.0,2.0,11.0,11.0,0.0,0.0,7.75,8.0,0.0,0.0,63.0,60.25,229.0,233.75
50%,1.0,1.0,10.0,10.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,4.0,3.0,4.0,5.0,6.0,6.0,4.0,4.0,12.0,11.5,5.0,4.0,4.0,4.5,4.0,4.0,11.0,12.0,611.5,544.0,125.5,115.5,232.0,207.5,235.5,196.5,10.0,9.0,171.5,167.0,111.0,99.0,17.0,17.0,8.0,9.0,1.5,2.0,0.0,0.0,13.0,12.0,2.0,2.0,508.0,466.0,437.0,396.5,19.0,18.0,4.0,4.0,6.0,6.0,4.5,4.0,14.0,14.5,0.0,0.0,11.0,10.0,0.0,0.0,71.0,72.0,281.0,292.5
75%,2.0,2.0,14.0,14.0,2.0,2.0,2.0,2.0,0.0,0.0,2.0,1.0,6.0,5.0,6.25,6.25,9.25,9.0,5.0,5.0,17.25,18.0,6.25,7.0,6.25,6.25,6.0,6.25,14.25,18.0,696.0,640.75,152.25,137.0,275.25,261.25,286.0,259.0,15.0,14.0,193.0,185.0,134.25,123.25,23.0,22.0,12.25,12.0,3.0,2.0,0.0,0.0,15.0,14.25,3.0,3.0,594.5,571.0,523.0,498.25,23.0,22.25,6.0,5.0,9.0,8.25,6.0,6.0,16.0,17.0,0.0,0.0,14.0,14.0,0.0,0.0,83.5,79.0,328.0,327.5
max,7.0,4.0,25.0,32.0,4.0,7.0,7.0,4.0,1.0,1.0,6.0,4.0,10.0,13.0,11.0,17.0,18.0,24.0,13.0,15.0,30.0,36.0,12.0,13.0,14.0,16.0,11.0,19.0,27.0,29.0,1085.0,1138.0,207.0,217.0,418.0,360.0,487.0,678.0,43.0,28.0,276.0,241.0,233.0,188.0,39.0,37.0,27.0,25.0,8.0,8.0,1.0,0.0,30.0,24.0,10.0,7.0,1061.0,1070.0,1003.0,992.0,46.0,38.0,17.0,12.0,18.0,14.0,12.0,14.0,27.0,30.0,1.0,2.0,32.0,26.0,1.0,1.0,101.0,104.0,637.0,585.0


## 2. Data Transformation (ETL)
This step cleans the raw data, selects the relevant attributes, and enriches the dataset where needed.

### 2.1 Column Selection
We keep only the columns relevant to our analysis (teams, scores, key statistics).

In [None]:
columns_to_keep = [
    'team1', 'team2', 'date', 'hour', 'category',
    'number of goals team1', 'number of goals team2',
    'possession team1', 'possession team2', 'possession in contest',
    'total attempts team1', 'total attempts team2',
    'on target attempts team1', 'on target attempts team2',
    'yellow cards team1', 'yellow cards team2',
    'red cards team1', 'red cards team2',
    'penalties scored team1', 'penalties scored team2'
]

df_selected = df[columns_to_keep].copy()
df_selected.head()
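
Indexing with a list raises a `KeyError` as soon as one name is absent, which is easy to hit with this dataset's irregular headers (the `describe()` output above shows, for example, `completed line breaksteam1` without a space). A minimal sketch of a pre-check on a made-up DataFrame; `select_columns` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def select_columns(frame, wanted):
    """Return a copy of `frame` restricted to `wanted`, reporting every missing name at once."""
    missing = [c for c in wanted if c not in frame.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    return frame[wanted].copy()

# Made-up one-row sample
toy = pd.DataFrame({"team1": ["QATAR"], "team2": ["ECUADOR"], "hour": ["17 : 00"]})
subset = select_columns(toy, ["team1", "team2"])

# 'category' is absent from the sample, so the helper names it in the error
try:
    select_columns(toy, ["team1", "category"])
except KeyError as err:
    message = str(err)
```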

### Rationale for the selection

We keep the following columns:

1.  **General information**:
    *   `team1`, `team2`: the two teams facing each other.
    *   `date`, `hour`, `category`: time context and tournament stage (e.g., group, final).

2.  **Match result**:
    *   `number of goals team1`, `number of goals team2`: required to determine the winner.

3.  **Match statistics (performance)**:
    *   `possession team1`, `possession team2`: key indicator of dominance.
    *   `total attempts team1`, `total attempts team2`: attacking volume.
    *   `on target attempts team1`, `on target attempts team2`: accuracy and actual threat.

4.  **Discipline**:
    *   `yellow cards team1`, `yellow cards team2`
    *   `red cards team1`, `red cards team2`: impact on play and fair play.
    *   `penalties scored team1`, `penalties scored team2`: used in section 2.4 as a rough proxy for matches decided on penalties. The proxy is imperfect: the preview in 2.4 shows it flagging group-stage matches, which cannot go to a shootout, so the column appears to count penalties scored during play.

These columns let us answer the main questions: Who won? Who dominated? Was the match aggressive?

### 2.2 Cleaning and Type Conversion
The possession columns contain the `%` character and are stored with the object (string) dtype, so they must be converted to numeric values.
In addition, the `date` column must be converted to `datetime` to allow time-based analysis.

In [None]:
# Convert percentages (strip the % sign and cast to float)
def clean_percentage(x):
    if isinstance(x, str):
        return float(x.replace('%', ''))
    return x

cols_percent = ['possession team1', 'possession team2', 'possession in contest']
for col in cols_percent:
    df_selected[col] = df_selected[col].apply(clean_percentage)

# Convert the date
# `format` must match the raw strings exactly, otherwise pandas raises a ValueError
df_selected['date'] = pd.to_datetime(df_selected['date'], format='%Y-%m-%d')

df_selected.info()
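
The same conversion can be done without a Python-level `apply`. A minimal sketch using pandas string methods on made-up values; passing `errors="coerce"` turns unparseable entries into `NaN`/`NaT` instead of raising, which is a design choice for this sketch rather than something the cell above does.

```python
import pandas as pd

# Made-up sample with one bad value per column
toy = pd.DataFrame({
    "possession team1": ["42%", "72%", "n/a"],
    "date": ["2022-11-20", "2022-11-21", "not a date"],
})

# Strip the trailing % and convert; "n/a" becomes NaN
toy["possession team1"] = pd.to_numeric(
    toy["possession team1"].str.rstrip("%"), errors="coerce"
)

# Parse with an explicit format; "not a date" becomes NaT
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d", errors="coerce")
```

Counting the resulting `NaN`/`NaT` values afterwards shows how many raw entries failed to parse.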

### 2.3 Feature Engineering (Enrichment)

We create new columns that are useful for the analysis:
*   `total_goals`: total number of goals in the match.
*   `goal_difference`: goal margin (absolute value).

In [None]:
df_selected['total_goals'] = df_selected['number of goals team1'] + df_selected['number of goals team2']
df_selected['goal_difference'] = (df_selected['number of goals team1'] - df_selected['number of goals team2']).abs()

print("Preview of the enriched data:")
df_selected.head()
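
As a quick worked example of the two derived columns, here is a sketch using two scorelines that appear in this notebook's previews (6-2 and 1-1):

```python
import pandas as pd

toy = pd.DataFrame({
    "number of goals team1": [6, 1],
    "number of goals team2": [2, 1],
})

# Sum of both sides, and the absolute margin between them
toy["total_goals"] = toy["number of goals team1"] + toy["number of goals team2"]
toy["goal_difference"] = (toy["number of goals team1"] - toy["number of goals team2"]).abs()

print(toy[["total_goals", "goal_difference"]])
# total_goals is [8, 2] and goal_difference is [4, 0]
```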

The data is now clean and enriched, ready for analysis.

In [47]:

equipes = pd.concat([df_selected['team1'], df_selected['team2']]).unique().tolist()
print(equipes, len(equipes))

['QATAR', 'ENGLAND', 'SENEGAL', 'UNITED STATES', 'ARGENTINA', 'DENMARK', 'MEXICO', 'FRANCE', 'MOROCCO', 'GERMANY', 'SPAIN', 'BELGIUM', 'SWITZERLAND', 'URUGUAY', 'PORTUGAL', 'BRAZIL', 'WALES', 'NETHERLANDS', 'TUNISIA', 'POLAND', 'JAPAN', 'CROATIA', 'CAMEROON', 'KOREA REPUBLIC', 'ECUADOR', 'IRAN', 'AUSTRALIA', 'SAUDI ARABIA', 'CANADA', 'COSTA RICA', 'GHANA', 'SERBIA'] 32


### 2.4 Adapting to the SQL Schema (Matches)

We transform the DataFrame so that it matches the structure of the target SQL table `matches` exactly.
This includes setting the team identifiers (team names for now), renaming columns, and creating the missing indicators.

In [51]:
# Team identifiers: the integer mapping below is left disabled, so the team
# names themselves serve as identifiers for now.
# In a real setup these IDs would come from an already populated 'teams' table.
unique_teams = sorted(pd.concat([df_selected['team1'], df_selected['team2']]).unique())
# team_mapping = {team: i+1 for i, team in enumerate(unique_teams)}

df_matches = df_selected.copy()

# Foreign keys (team names until the mapping above is enabled)
df_matches['home_team_id'] = df_matches['team1']
df_matches['away_team_id'] = df_matches['team2']

# Rename columns
df_matches = df_matches.rename(columns={
    'number of goals team1': 'home_result',
    'number of goals team2': 'away_result',
    'category': 'round'
})

# Compute the 'result' column
conditions = [
    df_matches['home_result'] > df_matches['away_result'],
    df_matches['home_result'] < df_matches['away_result']
]
choices = ['home_team', 'away_team']
df_matches['result'] = np.select(conditions, choices, default='draw')

# Boolean flags
# Caveat: 'penalties scored' appears to count penalties converted during play,
# so this flag is only a rough proxy for a shootout
df_matches['penalties'] = (df_matches['penalties scored team1'] + df_matches['penalties scored team2']) > 0
df_matches['extra_time'] = df_matches['penalties']  # Simplifying assumption
df_matches['replay'] = False

# Static or missing columns
df_matches['edition'] = 2022
df_matches['city'] = None  # Not available in this dataset
df_matches['id_stadium'] = None  # Not available

# Sequential match ID
df_matches['id_match'] = range(1, len(df_matches) + 1)

# Final column selection and ordering, following the SQL schema
final_cols = [
    'id_match', 
    'home_team_id', 'away_team_id', 
    'home_result', 'away_result', 'result', 
    'extra_time', 'penalties', 'replay', 
    'date', 'round', 'city', 'id_stadium', 'edition'
]

df_matches_final = df_matches[final_cols].copy()

print("Preview of the final DataFrame, ready for SQL insertion:")
df_matches_final.head()

Preview of the final DataFrame, ready for SQL insertion:


Unnamed: 0,id_match,home_team_id,away_team_id,home_result,away_result,result,extra_time,penalties,replay,date,round,city,id_stadium,edition
0,1,QATAR,ECUADOR,0,2,away_team,True,True,False,2022-11-20,Group A,,,2022
1,2,ENGLAND,IRAN,6,2,home_team,True,True,False,2022-11-21,Group B,,,2022
2,3,SENEGAL,NETHERLANDS,0,2,away_team,False,False,False,2022-11-21,Group A,,,2022
3,4,UNITED STATES,WALES,1,1,draw,True,True,False,2022-11-21,Group B,,,2022
4,5,ARGENTINA,SAUDI ARABIA,1,2,away_team,True,True,False,2022-11-22,Group C,,,2022
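
The preview above still carries team names in `home_team_id` and `away_team_id`, because the integer mapping is commented out. A minimal sketch of what enabling it could look like, on a made-up three-match frame; in a real pipeline the IDs would come from the `teams` table rather than from alphabetical order.

```python
import pandas as pd

# Made-up sample: three fixtures taken from the preview above
toy = pd.DataFrame({
    "team1": ["QATAR", "ENGLAND", "SENEGAL"],
    "team2": ["ECUADOR", "IRAN", "NETHERLANDS"],
})

# Alphabetical order gives a stable, reproducible numbering starting at 1
unique_teams = sorted(pd.concat([toy["team1"], toy["team2"]]).unique())
team_mapping = {team: i + 1 for i, team in enumerate(unique_teams)}

toy["home_team_id"] = toy["team1"].map(team_mapping)
toy["away_team_id"] = toy["team2"].map(team_mapping)

# .map() yields NaN for any name missing from the mapping, so check before loading
all_mapped = bool(toy[["home_team_id", "away_team_id"]].notna().all().all())
```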


In [None]:
import requests
import json
import os  # Needed to build folder paths

# 1. URL for the English-language data
url = "https://api.fifa.com/api/v3/calendar/matches?language=en&count=500&idSeason=255711"

# 2. Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Origin': 'https://www.fifa.com',
    'Referer': 'https://www.fifa.com/'
}

print("⏳ Downloading the English-language data...")

try:
    # 3. Fetch (the timeout avoids hanging forever if the server stalls)
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()

    # --- Data folder handling ---

    # File name
    nom_fichier = "matches_wc2022_en.json"

    # Target folder: ".." goes up one level, then into "data/raw".
    # Passing each component to os.path.join keeps the path portable across Windows and Mac/Linux
    chemin_dossier = os.path.join("..", "data", "raw")
    chemin_complet = os.path.join(chemin_dossier, nom_fichier)

    # Create the folder if it does not exist yet
    os.makedirs(chemin_dossier, exist_ok=True)

    # 4. Save into the target folder
    with open(chemin_complet, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)

    # Print the absolute path to confirm the location
    print("✅ Success! File saved here:")
    print(os.path.abspath(chemin_complet))

except Exception as e:
    print(f"❌ Error: {e}")

⏳ Downloading the English-language data...
✅ Success! File saved here:
C:\Users\mabed\Documents\Brief-2-ETL-donnees-footballistiques-Short-Kings\data\matches_wc2022_en.json
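
Because a later cell reads this file back, defining the location once keeps the writer and the reader from drifting apart. A minimal sketch with `pathlib`; `RAW_DIR`, `save_json` and `load_json` are hypothetical names, the payload is a made-up stand-in for the API response, and a temporary directory replaces `../data/raw`.

```python
import json
import tempfile
from pathlib import Path

def save_json(payload, folder, filename):
    """Write `payload` as UTF-8 JSON under `folder`, creating the folder if needed."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / filename
    with target.open("w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=4)
    return target

def load_json(folder, filename):
    """Read the JSON file back from the same folder/filename pair."""
    with (Path(folder) / filename).open("r", encoding="utf-8") as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as tmp:
    RAW_DIR = Path(tmp) / "data" / "raw"  # single source of truth for both steps
    payload = {"Results": [{"HomeTeamScore": 0, "AwayTeamScore": 2}]}
    written = save_json(payload, RAW_DIR, "matches_wc2022_en.json")
    restored = load_json(RAW_DIR, "matches_wc2022_en.json")
```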


In [None]:
import json
import os
import pandas as pd
import numpy as np

# 1. Load the JSON (same folder the download cell writes to: ../data/raw)
json_path = os.path.join('..', 'data', 'raw', 'matches_wc2022_en.json')
with open(json_path, 'r', encoding='utf-8') as f:
    wc_data = json.load(f)

# 2. Round-name cleaning function
def clean_round_name(r_raw):
    r = str(r_raw).upper().strip()
    # Compare in upper case to avoid the third-place bug.
    # Order matters: 'FINAL' is tested last because it also occurs in quarter-final and semi-final labels
    if 'PRELIMINARY' in r or 'FIRST' in r: return 'Preliminary'
    if 'GROUP' in r: return 'Group Stage'
    if '1/8' in r or 'ROUND OF 16' in r: return 'Round of 16'
    if '1/4' in r or 'QUARTER' in r: return 'Quarter-finals'
    if '1/2' in r or 'SEMI' in r: return 'Semi-final'
    if '3RD' in r or 'THIRD' in r or 'PLAY-OFF' in r: return 'Third Place'
    if 'FINAL' in r: return 'Final'
    return r_raw

# 3. Extraction
extracted_data = []

for match in wc_data['Results']:
    home = match['Home']['TeamName'][0]['Description'].upper()
    away = match['Away']['TeamName'][0]['Description'].upper()
    
    stage = match['StageName'][0]['Description']
    group = match['GroupName'][0]['Description'] if match['GroupName'] else None
    round_raw = group if (stage == "First stage" and group) else stage
    
    # A. Match score (90 minutes + extra time)
    # The FIFA API generally stores the score excluding the shootout in HomeTeamScore.
    # `or 0` also covers keys that are present but null, which would break the comparisons below
    score_board_home = match.get('HomeTeamScore') or 0
    score_board_away = match.get('AwayTeamScore') or 0
    
    # B. Shootout scores
    pen_home = match.get('HomeTeamPenaltyScore') or 0
    pen_away = match.get('AwayTeamPenaltyScore') or 0
    is_penalties = (pen_home > 0 or pen_away > 0)
    
    # C. TOTAL score (used only to determine the winner)
    total_calc_home = score_board_home + pen_home
    total_calc_away = score_board_away + pen_away

    extracted_data.append({
        'home_team_id': home,
        'away_team_id': away,
        'round': clean_round_name(round_raw),
        'city': match['Stadium']['CityName'][0]['Description'],
        'id_stadium': match['Stadium']['Name'][0]['Description'],
        'extra_time': is_penalties,  # Simplifying assumption: flagged only when a shootout took place
        'penalties': is_penalties,
        
        # What the user will see (match score)
        'home_result': score_board_home, 
        'away_result': score_board_away,
        
        # Temporary columns for computing the winner (including the shootout)
        'temp_total_home': total_calc_home,
        'temp_total_away': total_calc_away
    })

# 4. DataFrame built from the JSON
df_json = pd.DataFrame(extracted_data)

# 5. Merge
# Drop the columns that the JSON source will provide, so the merge does not create _x/_y duplicates
cols_to_drop = ['round', 'city', 'id_stadium', 'extra_time', 'penalties', 'home_result', 'away_result']
df_matches_final = df_matches_final.drop(columns=[c for c in cols_to_drop if c in df_matches_final.columns], errors='ignore')

# Important: make sure the IDs are upper-cased in the main DataFrame as well
df_matches_final['home_team_id'] = df_matches_final['home_team_id'].str.upper().str.strip()
df_matches_final['away_team_id'] = df_matches_final['away_team_id'].str.upper().str.strip()

df_matches_final = df_matches_final.merge(df_json, on=['home_team_id', 'away_team_id'], how='left')

# 6. Compute RESULT (winner)
# Uses the temporary columns, which include the shootout
conditions = [
    (df_matches_final['temp_total_home'] > df_matches_final['temp_total_away']),
    (df_matches_final['temp_total_home'] < df_matches_final['temp_total_away'])
]
choices = ['home_team', 'away_team']
df_matches_final['result'] = np.select(conditions, choices, default='draw')

# 7. FINAL CLEANUP
# Drop the temporary columns and keep only the "official" match scores
df_matches_final = df_matches_final.drop(columns=['temp_total_home', 'temp_total_away'])

# Display
print("Enriched data: scores shown without the shootout, winner computed with the shootout:")
# Show matches that ended in a shootout (e.g., the France-Argentina final) as a check
display(df_matches_final[df_matches_final['penalties'] == True].head())

Enriched and optimized data:


Unnamed: 0,id_match,home_team_id,away_team_id,result,replay,date,edition,round,city,id_stadium,extra_time,penalties,home_result,away_result
0,1,QATAR,ECUADOR,away_team,False,2022-11-20,2022,Group Stage,Al Khor,Al Bayt Stadium,False,False,0,2
1,2,ENGLAND,IRAN,home_team,False,2022-11-21,2022,Group Stage,Ar-Rayyan,Khalifa International Stadium,False,False,6,2
2,3,SENEGAL,NETHERLANDS,away_team,False,2022-11-21,2022,Group Stage,Doha,Al Thumama Stadium,False,False,0,2
3,4,UNITED STATES,WALES,draw,False,2022-11-21,2022,Group Stage,Ar-Rayyan,Ahmad Bin Ali Stadium,False,False,1,1
4,5,ARGENTINA,SAUDI ARABIA,away_team,False,2022-11-22,2022,Group Stage,Al Daayen,Lusail Stadium,False,False,1,2
5,6,DENMARK,TUNISIA,draw,False,2022-11-22,2022,Group Stage,Doha,Education City Stadium,False,False,0,0
6,7,MEXICO,POLAND,draw,False,2022-11-22,2022,Group Stage,Doha,Stadium 974,False,False,0,0
7,8,FRANCE,AUSTRALIA,home_team,False,2022-11-22,2022,Group Stage,Al Wakrah,Al Janoub Stadium,False,False,4,1
8,9,MOROCCO,CROATIA,draw,False,2022-11-23,2022,Group Stage,Al Khor,Al Bayt Stadium,False,False,0,0
9,10,GERMANY,JAPAN,away_team,False,2022-11-23,2022,Group Stage,Ar-Rayyan,Khalifa International Stadium,False,False,1,2
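
A left merge on team names silently leaves `NaN` wherever the two sources spell a team differently, and the `np.select` step then labels those rows `draw`. A minimal sketch of a check with `indicator=True`, on made-up frames; the `USA` spelling is a hypothetical mismatch for illustration, not a claim about what the API returns.

```python
import pandas as pd

# Made-up stand-ins for the CSV-derived frame and the JSON-derived frame
csv_side = pd.DataFrame({
    "home_team_id": ["QATAR", "UNITED STATES"],
    "away_team_id": ["ECUADOR", "WALES"],
})
json_side = pd.DataFrame({
    "home_team_id": ["QATAR", "USA"],
    "away_team_id": ["ECUADOR", "WALES"],
    "city": ["Al Khor", "Ar-Rayyan"],
})

# indicator=True adds a `_merge` column: 'both' or 'left_only' for a left merge
merged = csv_side.merge(
    json_side, on=["home_team_id", "away_team_id"], how="left", indicator=True
)

# Rows from the CSV side that found no partner in the JSON side
unmatched = merged.loc[merged["_merge"] == "left_only", ["home_team_id", "away_team_id"]]
print(unmatched)
```

An empty `unmatched` frame is a reasonable precondition before computing `result` from the merged columns.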


In [79]:
df_matches_final.to_csv('../data/processed/df_matches_final.csv', index=False)
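
CSV does not store dtypes, so a date column comes back as plain strings unless it is parsed again on reload. A minimal sketch of a round trip on a made-up frame, writing to an in-memory buffer rather than to `../data/processed`:

```python
import io
import pandas as pd

# Made-up two-row sample with the same kinds of columns as the export
toy = pd.DataFrame({
    "id_match": [1, 2],
    "date": pd.to_datetime(["2022-11-20", "2022-11-21"]),
    "penalties": [False, True],
})

# Write without the index, as in the export cell
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype for the listed columns
reloaded = pd.read_csv(buffer, parse_dates=["date"])
```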
