Importing dependencies

In [1]:
import pandas as pd
from sqlalchemy import create_engine

Connection: creates the link between Python and the PostgreSQL database via SQLAlchemy.

In [2]:
# PostgreSQL connection
engine = create_engine('postgresql://csgo_user:csgo_pass@localhost:5432/csgo_db')
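The connection string above embeds the password directly in the notebook. A minimal sketch of one alternative, assuming the settings are supplied through environment variables: the variable names (CSGO_DB_USER, CSGO_DB_PASSWORD, and so on) and the helper build_postgres_url are hypothetical, and the defaults simply repeat the values used above.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Build a PostgreSQL URL from environment variables (hypothetical names)."""
    user = env.get("CSGO_DB_USER", "csgo_user")
    password = env.get("CSGO_DB_PASSWORD", "csgo_pass")
    host = env.get("CSGO_DB_HOST", "localhost")
    port = env.get("CSGO_DB_PORT", "5432")
    database = env.get("CSGO_DB_NAME", "csgo_db")
    # quote_plus escapes characters such as "@" or ":" that would break the URL.
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# With no variables set, the result matches the URL used in the cell above.
url = build_postgres_url({})
print(url)
```

The resulting string can be passed to create_engine in place of the literal URL.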

RAM optimization: chunksize=10000 loads the data in batches of 10,000 rows so that the machine's memory is not saturated.

Robustness: dtype=str forces every column to be read as text, so raw data is accepted as-is (including data-entry errors such as the "o") without crashing the import.

Loading: the 4 CSV files are imported automatically into raw (RAW) tables in PostgreSQL.

In [6]:
# List of CSV files to load
files = [('results.csv','results'), ('players.csv','players'),
         ('picks.csv','picks'), ('economy.csv','economy')]

for csv_file, table_name in files:
    print(f"Loading {table_name}...")
    
    # dtype=str avoids type errors (e.g. the "o" in best_of)
    # chunksize=10000 reads the file in batches of 10,000 rows
    with pd.read_csv(f'data/{csv_file}', chunksize=10000, dtype=str, low_memory=False) as reader:
        first = True
        for chunk in reader:
            # Replace the table on the first chunk, then append the following ones
            mode = 'replace' if first else 'append'
            chunk.to_sql(table_name, engine, if_exists=mode, index=False)
            first = False

Loading results...
Loading players...
Loading picks...
Loading economy...
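The replace-then-append pattern of the loading loop above can be tried without PostgreSQL or pandas. The sketch below swaps in the standard library (csv and an in-memory SQLite database) purely for illustration; the sample CSV, the helper load_in_chunks and the chunk size of 2 are made up. Every column is stored as TEXT, which plays the role of dtype=str.

```python
import csv
import io
import sqlite3
from itertools import islice

# Hypothetical sample shaped like results.csv; the "o" in best_of is a data-entry error.
SAMPLE_CSV = "match_id,best_of\n1,3\n2,o\n3,1\n4,3\n5,1\n"


def load_in_chunks(conn, table_name, csv_text, chunk_size):
    """Load CSV text into a table of TEXT columns, chunk_size rows at a time."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Counterpart of if_exists='replace' on the first chunk: start from an empty table.
    conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    chunks = 0
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        # Counterpart of if_exists='append': only chunk_size rows are held in memory.
        conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', chunk)
        chunks += 1
    conn.commit()
    return chunks


conn = sqlite3.connect(":memory:")
n_chunks = load_in_chunks(conn, "results", SAMPLE_CSV, chunk_size=2)
rows = conn.execute("SELECT match_id, best_of FROM results ORDER BY match_id").fetchall()
print(n_chunks, rows)
```

Because nothing is converted at load time, the invalid "o" reaches the raw table unchanged and can be handled later in SQL.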


Data audit


Inspection: queries information_schema to list the exact names of the imported columns.

Validation: makes it possible to check the real names (e.g. _map or player_name) before writing the SQL transformation queries.

In [3]:
from sqlalchemy import text

with engine.connect() as conn:
    # This query lists every table in the public schema together with its columns
    result = conn.execute(text("""
        SELECT table_name, column_name 
        FROM information_schema.columns 
        WHERE table_schema = 'public'
        ORDER BY table_name;
    """))
    
    for row in result:
        print(f"Table: {row[0]} | Column: {row[1]}")

Table: dim_matches | Column: date
Table: dim_matches | Column: match_id
Table: dim_matches | Column: team_2
Table: dim_matches | Column: team_1
Table: dim_matches | Column: match_winner
Table: dim_players | Column: player_name
Table: dim_players | Column: player_id
Table: dim_players | Column: country
Table: economy | Column: 20_winner
Table: economy | Column: 24_t1
Table: economy | Column: 8_t1
Table: economy | Column: 22_t1
Table: economy | Column: 18_t2
Table: economy | Column: 13_t1
Table: economy | Column: 10_t2
Table: economy | Column: 17_winner
Table: economy | Column: date
Table: economy | Column: 19_t2
Table: economy | Column: 9_t1
Table: economy | Column: 25_t1
Table: economy | Column: 3_t1
Table: economy | Column: 1_t1
Table: economy | Column: 23_winner
Table: economy | Column: 29_winner
Table: economy | Column: 14_t2
Table: economy | Column: 25_t2
Table: economy | Column: 5_t2
Table: economy | Column: 2_winner
Table: economy | Column: 14_t1
Tabl
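With one printed line per column, the listing gets long quickly. A small sketch of one way to condense it, grouping the (table_name, column_name) pairs by table: the helper group_columns is hypothetical, and the sample pairs are a handful copied from the output above.

```python
from collections import defaultdict

# A few (table_name, column_name) pairs copied from the output above.
rows = [
    ("dim_matches", "date"),
    ("dim_matches", "match_id"),
    ("dim_players", "player_name"),
    ("dim_players", "player_id"),
    ("economy", "20_winner"),
    ("economy", "date"),
]


def group_columns(pairs):
    """Group column names by table, sorting the columns of each table."""
    grouped = defaultdict(list)
    for table_name, column_name in pairs:
        grouped[table_name].append(column_name)
    return {table: sorted(cols) for table, cols in grouped.items()}


columns_by_table = group_columns(rows)
for table, cols in columns_by_table.items():
    print(f"{table}: {', '.join(cols)}")
```

In the notebook, the same function could be fed directly with the rows returned by conn.execute.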

The goal is to move from a flat model (large, redundant CSV files) to an efficient star schema.
This separates descriptive data (players, teams, maps) from performance data (per-match statistics).
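To make the idea concrete, here is a minimal sketch in plain Python of what the split does: descriptive attributes are stored once per player, while each fact row keeps only keys and measures. The rows are made up and the helper split_star is hypothetical; the actual transformation in this notebook is done in SQL.

```python
# Hypothetical flat rows, shaped like a trimmed-down players.csv (values are made up).
flat_rows = [
    {"player_id": "1", "player_name": "alpha", "country": "France", "match_id": "10", "kills": "20"},
    {"player_id": "1", "player_name": "alpha", "country": "France", "match_id": "11", "kills": "15"},
    {"player_id": "2", "player_name": "bravo", "country": "Denmark", "match_id": "10", "kills": "18"},
]


def split_star(rows):
    """Split flat rows into a player dimension and a list of fact rows."""
    players = {}
    facts = []
    for row in rows:
        pid = int(row["player_id"])
        # Descriptive data: stored once per player, however many matches were played.
        players.setdefault(pid, {"player_name": row["player_name"], "country": row["country"]})
        # Performance data: one row per player and match, referencing the player by key.
        facts.append({"id_player": pid, "id_match": int(row["match_id"]), "kills": int(row["kills"])})
    return players, facts


players, facts = split_star(flat_rows)
print(len(players), "players,", len(facts), "fact rows")
```

The name and country of player 1 appear twice in the flat rows but only once in the dimension, which is the redundancy the star schema removes.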

In [5]:
# Player dimension: one row per numeric player_id taken from the raw players table
with engine.connect() as conn:
    conn.execute(text("""
        DROP TABLE IF EXISTS player CASCADE;
        CREATE TABLE player (
            id_player INTEGER PRIMARY KEY,
            player_name VARCHAR(255),
            country VARCHAR(100)
        );

        INSERT INTO player (id_player, player_name, country)
        SELECT DISTINCT 
            CAST(NULLIF(TRIM(player_id), '') AS INTEGER),
            player_name,
            country
        FROM players
        WHERE player_id IS NOT NULL 
          AND TRIM(player_id) != ''
          AND TRIM(player_id) ~ '^[0-9]+$'
        ON CONFLICT (id_player) DO NOTHING;
    """))
    conn.commit()


In [6]:


# Map dimension: distinct map names gathered from the raw tables
with engine.connect() as conn:
    conn.execute(text("""
        DROP TABLE IF EXISTS map CASCADE;
        CREATE TABLE map (
            id_map SERIAL PRIMARY KEY,
            map_name VARCHAR(100) UNIQUE NOT NULL
        );

        INSERT INTO map (map_name)
        SELECT DISTINCT map_name
        FROM (
            -- Collect each map column separately so that second and third maps are not missed
            SELECT map_1 AS map_name FROM players
            UNION
            SELECT map_2 AS map_name FROM players
            UNION
            SELECT map_3 AS map_name FROM players
            UNION
            SELECT _map AS map_name FROM results
            UNION
            SELECT _map AS map_name FROM economy
        ) sub
        WHERE map_name IS NOT NULL AND map_name != ''
        ON CONFLICT (map_name) DO NOTHING;
    """))
    conn.commit()

In [7]:
# Team dimension: distinct team names found in the raw results and players tables
with engine.connect() as conn:
    conn.execute(text("""
        DROP TABLE IF EXISTS team CASCADE;
        CREATE TABLE team (
            id_team SERIAL PRIMARY KEY,
            team_name VARCHAR(255) UNIQUE NOT NULL
        );

        INSERT INTO team (team_name)
        SELECT DISTINCT team_name
        FROM (
            SELECT team_1 AS team_name FROM results
            UNION
            SELECT team_2 AS team_name FROM results
            UNION
            SELECT team AS team_name FROM players
            UNION
            SELECT opponent AS team_name FROM players
        ) sub
        WHERE team_name IS NOT NULL AND team_name != ''
        ON CONFLICT (team_name) DO NOTHING;
    """))
    conn.commit()



In [9]:
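# Fact table: one row per player, team and match; non-numeric raw values become NULL.
# The SQL is a raw string so that the regex backslashes (\.) reach PostgreSQL unchanged.
with engine.connect() as conn:
    conn.execute(text(r"""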
        DROP TABLE IF EXISTS player_team_match CASCADE;
        CREATE TABLE player_team_match (
            id_player INTEGER,
            id_team INTEGER,
            id_match INTEGER,
            kills INTEGER,
            deaths INTEGER,
            assists INTEGER,
            rating NUMERIC,
            adr NUMERIC,
            kast NUMERIC,
            kddiff INTEGER,
            headshots INTEGER,
            fkdiff INTEGER,
            flash_assists INTEGER,
            PRIMARY KEY (id_player, id_team, id_match),
            FOREIGN KEY (id_player) REFERENCES player(id_player),
            FOREIGN KEY (id_team, id_match) REFERENCES team_match(id_team, id_match)
        );

        INSERT INTO player_team_match (
            id_player, id_team, id_match, kills, deaths, assists, rating, 
            adr, kast, kddiff, headshots, fkdiff, flash_assists
        )
        SELECT 
            CAST(NULLIF(TRIM(p.player_id), '') AS INTEGER),
            t.id_team,
            CAST(NULLIF(TRIM(p.match_id), '') AS INTEGER),
            CASE WHEN TRIM(COALESCE(p.kills::TEXT, '')) ~ '^[0-9]+$' THEN CAST(p.kills AS INTEGER) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.deaths::TEXT, '')) ~ '^[0-9]+$' THEN CAST(p.deaths AS INTEGER) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.assists::TEXT, '')) ~ '^[0-9]+$' THEN CAST(p.assists AS INTEGER) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.rating::TEXT, '')) ~ '^[0-9]*\.?[0-9]+$' THEN CAST(p.rating AS NUMERIC) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.adr::TEXT, '')) ~ '^[0-9]*\.?[0-9]+$' THEN CAST(p.adr AS NUMERIC) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.kast::TEXT, '')) ~ '^[0-9]*\.?[0-9]+$' THEN CAST(p.kast AS NUMERIC) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.kddiff::TEXT, '')) ~ '^-?[0-9]+$' THEN CAST(p.kddiff AS INTEGER) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.hs::TEXT, '')) ~ '^[0-9]+$' THEN CAST(p.hs AS INTEGER) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.fkdiff::TEXT, '')) ~ '^-?[0-9]+$' THEN CAST(p.fkdiff AS INTEGER) ELSE NULL END,
            CASE WHEN TRIM(COALESCE(p.flash_assists::TEXT, '')) ~ '^[0-9]+$' THEN CAST(p.flash_assists AS INTEGER) ELSE NULL END
        FROM players p
        JOIN team t ON t.team_name = p.team
        JOIN team_match tm ON tm.id_team = t.id_team AND tm.id_match = CAST(NULLIF(TRIM(p.match_id), '') AS INTEGER)
        WHERE p.player_id IS NOT NULL 
          AND TRIM(p.player_id) != ''
          AND TRIM(p.player_id) ~ '^[0-9]+$'
          AND p.match_id IS NOT NULL 
          AND TRIM(p.match_id) != ''
          AND TRIM(p.match_id) ~ '^[0-9]+$'
        ON CONFLICT (id_player, id_team, id_match) DO NOTHING;
    """))
    conn.commit()
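The CASE WHEN ... ~ 'regex' THEN CAST(...) ELSE NULL END pattern used above can be mirrored in Python to check edge cases before running the SQL. This is only a sketch: the helper cast_or_none is hypothetical, the three patterns are the ones from the query without their ^ and $ anchors (fullmatch supplies them), and strip(" ") approximates PostgreSQL's TRIM.

```python
import re
from decimal import Decimal

# Same patterns as in the SQL above; fullmatch() plays the role of ^ and $.
UNSIGNED_INT = re.compile(r"[0-9]+")
SIGNED_INT = re.compile(r"-?[0-9]+")
UNSIGNED_DECIMAL = re.compile(r"[0-9]*\.?[0-9]+")


def cast_or_none(raw, pattern, convert):
    """Convert raw text when it matches the pattern, otherwise return None (SQL NULL)."""
    if raw is None:
        return None
    value = raw.strip(" ")
    return convert(value) if pattern.fullmatch(value) else None


print(cast_or_none(" 23 ", UNSIGNED_INT, int))         # valid kills value
print(cast_or_none("o", UNSIGNED_INT, int))            # data-entry error
print(cast_or_none("-4", SIGNED_INT, int))             # kddiff may be negative
print(cast_or_none("1.07", UNSIGNED_DECIMAL, Decimal)) # rating
```

Values such as "1." or "-1.2" do not match the decimal pattern and therefore end up as NULL, which is worth knowing when reading the loaded statistics.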

In [10]:
# Indexes on the join and filter columns used by the dashboard queries
with engine.connect() as conn:
    conn.execute(text("""
        CREATE INDEX IF NOT EXISTS idx_team_match_team ON team_match(id_team);
        CREATE INDEX IF NOT EXISTS idx_team_match_match ON team_match(id_match);
        
        CREATE INDEX IF NOT EXISTS idx_player_team_match_player ON player_team_match(id_player);
        CREATE INDEX IF NOT EXISTS idx_player_team_match_team ON player_team_match(id_team);
        CREATE INDEX IF NOT EXISTS idx_player_team_match_match ON player_team_match(id_match);
        
        CREATE INDEX IF NOT EXISTS idx_match_date ON match(match_date);
        CREATE INDEX IF NOT EXISTS idx_match_map ON match(id_map);
    """))
    conn.commit()

Cardinalities

A player can take part in several matches (1,N)
A team can play several matches (1,N)
A match is played on one map (1,1)
A match involves two teams via team_match (2,2)
Each player participation is linked to a team and a match via team_match
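The (2,2) and (1,1) cardinalities above can be verified with a query. The sketch below uses an in-memory SQLite database and made-up rows; since the cell that creates match and team_match is not shown in this notebook, the column layout here is an assumption inferred from the foreign keys and indexes used elsewhere (id_match, match_date, id_map, and a composite key on team_match).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Assumed layout: the real match and team_match definitions are not shown.
    CREATE TABLE map (id_map INTEGER PRIMARY KEY, map_name TEXT UNIQUE NOT NULL);
    CREATE TABLE team (id_team INTEGER PRIMARY KEY, team_name TEXT UNIQUE NOT NULL);
    CREATE TABLE "match" (
        id_match INTEGER PRIMARY KEY,
        match_date TEXT,
        id_map INTEGER NOT NULL REFERENCES map(id_map)      -- one map per match (1,1)
    );
    CREATE TABLE team_match (
        id_team INTEGER NOT NULL REFERENCES team(id_team),
        id_match INTEGER NOT NULL REFERENCES "match"(id_match),
        PRIMARY KEY (id_team, id_match)                      -- a team appears once per match
    );
    -- Made-up sample rows; match 101 deliberately has a single team.
    INSERT INTO map VALUES (1, 'Mirage');
    INSERT INTO team VALUES (1, 'Team A'), (2, 'Team B');
    INSERT INTO "match" VALUES (100, '2020-01-01', 1), (101, '2020-01-02', 1);
    INSERT INTO team_match VALUES (1, 100), (2, 100), (1, 101);
""")

# Matches that do not have exactly two teams violate the (2,2) cardinality.
bad_matches = conn.execute("""
    SELECT m.id_match, COUNT(tm.id_team) AS n_teams
    FROM "match" m
    LEFT JOIN team_match tm ON tm.id_match = m.id_match
    GROUP BY m.id_match
    HAVING COUNT(tm.id_team) <> 2
    ORDER BY m.id_match
""").fetchall()
print(bad_matches)
```

The NOT NULL foreign key on id_map enforces the one-map rule declaratively, whereas the two-teams rule is not expressible as a simple constraint and is checked after loading.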

Log in at localhost:8080 using the following credentials: epsi@gmail.com | azerty1234!

The CS GO Stats dashboard will contain the Top 10 players with kill statistics per match, as well as kills by country.
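The dashboard's own queries are not shown here. As one possible formulation, the two charts could be backed by aggregations like the ones below, sketched against an in-memory SQLite copy of player and a reduced player_team_match (only the kills measure is kept). All rows are made up, and ranking by average kills per match is an assumption about what "Top 10" means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (id_player INTEGER PRIMARY KEY, player_name TEXT, country TEXT);
    CREATE TABLE player_team_match (
        id_player INTEGER, id_team INTEGER, id_match INTEGER, kills INTEGER,
        PRIMARY KEY (id_player, id_team, id_match)
    );
    -- Made-up sample rows.
    INSERT INTO player VALUES (1, 'alpha', 'France'), (2, 'bravo', 'Denmark'), (3, 'charlie', 'France');
    INSERT INTO player_team_match VALUES
        (1, 1, 100, 20), (1, 1, 101, 30),
        (2, 2, 100, 18),
        (3, 1, 100, 10);
""")

# Top 10 players by average kills per match (assumed ranking criterion).
top_players = conn.execute("""
    SELECT p.player_name, COUNT(*) AS matches, AVG(f.kills) AS avg_kills
    FROM player_team_match f
    JOIN player p ON p.id_player = f.id_player
    GROUP BY p.id_player, p.player_name
    ORDER BY avg_kills DESC
    LIMIT 10
""").fetchall()

# Total kills per country.
kills_by_country = conn.execute("""
    SELECT p.country, SUM(f.kills) AS total_kills
    FROM player_team_match f
    JOIN player p ON p.id_player = f.id_player
    GROUP BY p.country
    ORDER BY total_kills DESC
""").fetchall()

print(top_players)
print(kills_by_country)
```

Both queries join the fact table to the player dimension on id_player, which is the column covered by idx_player_team_match_player.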