# "Hard" PROJECT: Ligue 1 match prediction

The goal of the project is to predict the outcome (win, draw, loss) of Ligue 1 matches for the current season (2023-2024).

In [5]:
# Import libraries
import numpy as np
import pandas as pd 
import sklearn as sk 
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

## Data import

In [2]:
clubs = pd.read_csv("data/clubs_fr.csv", sep=",") # List of French clubs with a few stats on the (recent) squad composition
game_events = pd.read_csv("data/game_events.csv", sep=",") # Set of events for each player during each match
game_lineups = pd.read_csv("data/game_lineups.csv", sep=",", low_memory=False) # Team lineups for each match
match2023 = pd.read_csv("data/match_2023.csv", sep=",") # Matches to predict
matchs = pd.read_csv("data/matchs_2013_2022.csv", sep=",") # Matches from 2013 to 2022
player_appearance = pd.read_csv("data/player_appearance.csv", sep=",") # Basic per-match information for each player
pre_season = pd.read_csv("data/player_valuation_before_season.csv", sep=",") # Market value of each player at a given date

## Cleaning

In [3]:
# dropna returns a new DataFrame, so the result must be assigned back.
# Drop rows, then columns, that are entirely empty.
clubs = clubs.dropna(axis=0, how="all").dropna(axis=1, how="all")
game_events = game_events.dropna(axis=0, how="all").dropna(axis=1, how="all")
game_lineups = game_lineups.dropna(axis=0, how="all").dropna(axis=1, how="all")
matchs = matchs.dropna(axis=0, how="all").dropna(axis=1, how="all")
player_appearance = player_appearance.dropna(axis=0, how="all").dropna(axis=1, how="all")
pre_season = pre_season.dropna(axis=0, how="all").dropna(axis=1, how="all")

# Display the cleaned valuation table
pre_season

Unnamed: 0,player_id,date,market_value_in_eur,current_club_id,player_club_domestic_competition_id
0,773,2004-10-04,3500000,14171,FR1
1,1327,2004-10-04,4000000,1159,FR1
2,1423,2004-10-04,1000000,855,FR1
3,1572,2004-10-04,1000000,162,FR1
4,1613,2004-10-04,200000,855,FR1
...,...,...,...,...,...
30210,478872,2023-07-27,500000,618,FR1
30211,550862,2023-07-27,450000,1420,FR1
30212,363717,2023-07-28,150000,1162,FR1
30213,396131,2023-07-28,150000,1421,FR1
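
`DataFrame.dropna` returns a new DataFrame by default and leaves the original untouched unless the result is assigned. The following is a minimal sketch on a toy frame (the data and the `empty_col` column are hypothetical, not taken from the project files) showing the difference between calling `dropna` with and without assignment:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): one fully empty row and one fully empty column
toy = pd.DataFrame({
    "player_id": [773, np.nan, 1423],
    "market_value_in_eur": [3500000, np.nan, 1000000],
    "empty_col": [np.nan, np.nan, np.nan],
})

# Calling dropna without assigning the result leaves the original unchanged
toy.dropna(axis=0, how="all")
shape_unassigned = toy.shape

# Assigning the result keeps the cleaned frame:
# first drop all-NaN rows, then all-NaN columns
cleaned = toy.dropna(axis=0, how="all").dropna(axis=1, how="all")
shape_cleaned = cleaned.shape

print(shape_unassigned, shape_cleaned)
```

Comparing `.shape` before and after cleaning is a quick way to confirm whether any rows or columns were actually removed.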


## Data exploration

First, let's look at what data we have available.

In [4]:
print("clubs : ",clubs.columns)
print("game_events : ",game_events.columns)
print("game_lineups : ",game_lineups.columns)
print("matchs : ",matchs.columns)
print("player_appearance : ",player_appearance.columns)
print("pre_season : ",pre_season.columns)

clubs :  Index(['club_id', 'club_code', 'name', 'domestic_competition_id', 'squad_size',
       'average_age', 'foreigners_number', 'foreigners_percentage',
       'national_team_players', 'stadium_name', 'stadium_seats',
       'net_transfer_record', 'coach_name'],
      dtype='object')
game_events :  Index(['Unnamed: 0', 'game_event_id', 'date', 'game_id', 'minute', 'type',
       'club_id', 'player_id', 'description', 'player_in_id',
       'player_assist_id'],
      dtype='object')
game_lineups :  Index(['Unnamed: 0', 'game_lineups_id', 'date', 'game_id', 'player_id',
       'club_id', 'player_name', 'type', 'position', 'number', 'team_captain'],
      dtype='object')
matchs :  Index(['Unnamed: 0', 'game_id', 'season', 'round', 'date', 'home_club_id',
       'away_club_id', 'home_club_goals', 'away_club_goals',
       'home_club_position', 'away_club_position', 'home_club_manager_name',
       'away_club_manager_name', 'stadium', 'attendance', 'referee',
       'home_club_formation
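
The `matchs` columns listed above include `home_club_goals` and `away_club_goals`, from which a three-class target (win, draw, loss) could be derived. The sketch below is one possible way to do it, assuming the outcome is expressed from the home team's point of view; the function name `add_result_column`, the `result` column, and the `H`/`D`/`A` labels are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def add_result_column(matches_df):
    """Return a copy with a 'result' column (hypothetical name):
    'H' = home win, 'D' = draw, 'A' = away win."""
    out = matches_df.copy()
    diff = out["home_club_goals"] - out["away_club_goals"]
    # Positive difference: home win; zero: draw; otherwise: away win
    out["result"] = np.select([diff > 0, diff == 0], ["H", "D"], default="A")
    return out

# Toy scores (hypothetical data) to illustrate the labelling
sample = pd.DataFrame({"home_club_goals": [2, 1, 0], "away_club_goals": [0, 1, 3]})
labelled = add_result_column(sample)
print(labelled)
```

On the real data this would be called as `add_result_column(matchs)`, provided both goal columns are present and numeric.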

### Data aggregation functions

In [9]:
def aggregate_player_stats_by_year(player_appearance_df):
    """Sum per-match player stats by player and calendar year.

    Note: adds/overwrites the 'date' and 'year' columns of the input DataFrame in place.
    """
    # Convert the date column to datetime
    player_appearance_df['date'] = pd.to_datetime(player_appearance_df['date'])
    
    # Extract the calendar year from the date column
    player_appearance_df['year'] = player_appearance_df['date'].dt.year
    
    # Group by player and year, summing each statistic
    grouped_stats = player_appearance_df.groupby(['player_id', 'year']).agg({
        'yellow_cards': 'sum',
        'red_cards': 'sum',
        'goals': 'sum',
        'assists': 'sum',
        'minutes_played': 'sum'
    }).reset_index()
    
    return grouped_stats

player_stats = aggregate_player_stats_by_year(player_appearance).sort_values(by='year')
player_stats

Unnamed: 0,player_id,year,yellow_cards,red_cards,goals,assists,minutes_played
716,23938,2012,1,0,0,1,1440
2088,66788,2012,0,0,0,0,43
722,23951,2012,2,0,0,0,1710
2086,66783,2012,0,0,3,1,667
3108,126699,2012,5,0,0,1,1238
...,...,...,...,...,...,...,...
6232,378953,2023,1,0,1,1,831
5077,263236,2023,4,1,0,2,1794
1121,37647,2023,0,0,2,0,286
2861,113707,2023,1,0,4,5,1720
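
The aggregation above groups by calendar year, so a single football season is split across two years. As a possible alternative, the sketch below groups by season instead. It assumes a season starts in July and is labelled by its starting year; both the `season_start_month` parameter and the helper name `add_season_column` are hypothetical, and the labelling convention of the `season` column in `matchs` should be checked before relying on it.

```python
import pandas as pd

def add_season_column(df, date_col="date", season_start_month=7):
    """Return a copy with a 'season' column holding the season's starting year.
    Assumption: months before `season_start_month` belong to the previous season."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["season"] = dates.dt.year - (dates.dt.month < season_start_month).astype(int)
    return out

# Toy appearances (hypothetical data): two matches in 2022-23, one in 2023-24
sample = pd.DataFrame({
    "player_id": [1, 1, 1],
    "date": ["2022-08-14", "2023-02-05", "2023-08-20"],
    "goals": [1, 2, 1],
})

by_season = (
    add_season_column(sample)
    .groupby(["player_id", "season"], as_index=False)["goals"]
    .sum()
)
print(by_season)
```

Working on a copy keeps the input DataFrame unchanged, which avoids surprises when the same table is reused in later cells.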
