Importing dependencies

In [11]:
import pandas as pd
from sqlalchemy import create_engine

Connection: create the link between Python and the PostgreSQL database with SQLAlchemy.

In [12]:
# PostgreSQL connection (URL format: postgresql://user:password@host:port/database)
engine = create_engine('postgresql://csgo_user:csgo_pass@localhost:5432/csgo_db')
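The connection string above hard-codes the password in the notebook. A minimal sketch, assuming you prefer to read the settings from the standard libpq environment variables (`PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`, `PGDATABASE`). The helper name `build_pg_url` is hypothetical, and its defaults simply reproduce the notebook's values.

```python
import os
from urllib.parse import quote_plus


def build_pg_url(env=None):
    """Build a SQLAlchemy PostgreSQL URL from libpq-style environment variables.

    Falls back to the notebook's hard-coded values when a variable is unset.
    """
    env = os.environ if env is None else env
    user = env.get("PGUSER", "csgo_user")
    password = env.get("PGPASSWORD", "csgo_pass")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "csgo_db")
    # quote_plus escapes characters such as '@' or ':' that would break the URL
    return (f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")


# Hypothetical usage: engine = create_engine(build_pg_url())
```

Percent-escaping the user and password matters as soon as a password contains `@` or `:`, which would otherwise be read as URL separators.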

Memory optimization: chunksize=10000 loads the data in batches of 10,000 rows so that the machine's RAM is not saturated.

Robustness: forcing dtype=str accepts all raw values (even data-entry errors such as the "o" in best_of) without making the import crash.

Loading: the 4 CSV files are imported automatically into raw (RAW) tables in PostgreSQL.

In [6]:
# CSV files to load, with their target raw table
files = [('results.csv','results'), ('players.csv','players'), 
         ('picks.csv','picks'), ('economy.csv','economy')]

for csv_file, table_name in files:
    print(f"Loading {table_name}...")
    
    # dtype=str avoids type errors (e.g. the "o" in best_of)
    # chunksize=10000: read and write 10,000 rows at a time to limit memory use
    with pd.read_csv(f'data/{csv_file}', chunksize=10000, dtype=str, low_memory=False) as reader:
        first = True
        for chunk in reader:
            # The first chunk recreates the table, the following ones are appended
            mode = 'replace' if first else 'append'
            chunk.to_sql(table_name, engine, if_exists=mode, index=False)
            first = False

Loading results...
Loading players...
Loading picks...
Loading economy...
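The replace-then-append logic of the loading loop can be checked without a database. The sketch below uses only the standard library (`csv`, `itertools.islice`) to mimic pandas' `chunksize`; `iter_chunks` and `load_plan` are hypothetical helpers, and `csv.DictReader` keeps every field as a string, much like `dtype=str`.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunksize):
    """Yield lists of at most `chunksize` rows, like pandas' chunksize option."""
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        yield chunk


def load_plan(csv_text, chunksize=10000):
    """Return (if_exists mode, row count) for each chunk, mirroring the loop above."""
    reader = csv.DictReader(io.StringIO(csv_text))  # every field stays a str
    plan = []
    for i, chunk in enumerate(iter_chunks(reader, chunksize)):
        # The first chunk recreates the table, the following ones append to it
        mode = "replace" if i == 0 else "append"
        plan.append((mode, len(chunk)))
    return plan
```

With 25 data rows and a chunk size of 10, the plan is one `replace` of 10 rows followed by two `append`s (10 and 5 rows); a stray value such as `o` is simply carried as text.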


Data audit


Inspection: query the information_schema views to list the exact names of the imported columns.

Validation: this shows the real column names (e.g. _map or player_name) before the SQL transformation queries are written.

In [7]:
from sqlalchemy import text

with engine.connect() as conn:
    # List every table of the public schema with its columns
    result = conn.execute(text("""
        SELECT table_name, column_name 
        FROM information_schema.columns 
        WHERE table_schema = 'public'
        ORDER BY table_name;
    """))
    
    for row in result:
        print(f"Table: {row[0]} | Column: {row[1]}")

Table: dim_matches | Column: team_1
Table: dim_matches | Column: team_2
Table: dim_matches | Column: match_winner
Table: dim_matches | Column: date
Table: dim_matches | Column: match_id
Table: dim_players | Column: player_id
Table: dim_players | Column: country
Table: dim_players | Column: player_name
Table: economy | Column: t1_start
Table: economy | Column: t2_start
Table: economy | Column: 1_t1
Table: economy | Column: 2_t1
Table: economy | Column: 3_t1
Table: economy | Column: 4_t1
Table: economy | Column: 5_t1
Table: economy | Column: 6_t1
Table: economy | Column: 7_t1
Table: economy | Column: 8_t1
Table: economy | Column: 9_t1
Table: economy | Column: 10_t1
Table: economy | Column: 11_t1
Table: economy | Column: 12_t1
Table: economy | Column: 13_t1
Table: economy | Column: 14_t1
Table: economy | Column: 15_t1
Table: economy | Column: 16_t1
Table: economy | Column: 17_t1
Table: economy | Column: 18_t1
Table: economy | Column: 19_t1
Table: economy | Col
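The economy columns listed above (1_t1, 2_t1, ...) start with a digit, so PostgreSQL only accepts them as double-quoted identifiers, e.g. `"1_t1"`. A minimal sketch of an audit helper, assuming the `(table_name, column_name)` rows returned by the query above. `audit_columns`, `needs_quoting`, `quote_ident` and the `required` mapping are hypothetical, and the identifier rule is a simplified ASCII-only version that does not check reserved keywords.

```python
import re
from collections import defaultdict

# Simplified rule: unquoted PostgreSQL identifiers start with a letter or an
# underscore, then use letters, digits, underscores or dollar signs, and are
# folded to lower case. ASCII only; reserved keywords are NOT checked here.
_PLAIN_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_$]*$")


def needs_quoting(name):
    """True if the column name must be written as a double-quoted identifier."""
    return _PLAIN_IDENTIFIER.match(name) is None


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def audit_columns(rows, required):
    """Compare actual columns with the ones the SQL transformations expect.

    rows: (table_name, column_name) pairs from information_schema.columns.
    required: {table: [columns the transformations will use]}.
    Returns {table: {"missing": [...], "to_quote": [...]}}.
    """
    columns = defaultdict(set)
    for table, column in rows:
        columns[table].add(column)
    report = {}
    for table, wanted in required.items():
        report[table] = {
            "missing": sorted(c for c in wanted if c not in columns[table]),
            "to_quote": sorted(c for c in columns[table] if needs_quoting(c)),
        }
    return report
```

Running this on the rows fetched above before writing the transformation SQL catches a misspelled column name early and shows which columns need `quote_ident`.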

The goal is to move from a flat model (large, redundant CSV files) to an efficient star schema.
This separates descriptive data (dimensions) from performance data (facts).
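To illustrate the split on a few rows, a minimal Python sketch: descriptive attributes go to a dimension keyed by `player_id` and stored once, while each fact row keeps only keys and measures. `split_star` and the column subset it reads are hypothetical simplifications of players.csv.

```python
def split_star(flat_rows):
    """Split flat per-player rows into a player dimension and a list of facts.

    flat_rows: dicts with player_id, player_name, country, match_id, kills
    (a hypothetical subset of the players.csv columns, all as strings).
    """
    dim_players = {}
    facts = []
    for row in flat_rows:
        pid = int(row["player_id"])
        # Descriptive attributes are stored once per player (first occurrence wins)
        dim_players.setdefault(pid, {"player_name": row["player_name"],
                                     "country": row["country"]})
        # Performance data keeps only the keys and the measures
        facts.append({"player_id": pid,
                      "match_id": int(row["match_id"]),
                      "kills": int(row["kills"])})
    return dim_players, facts
```

However many matches a player appears in, their name and country are stored once; the fact rows repeat only the integer key.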

In [16]:
with engine.connect() as conn:
    conn.execute(text("""
        DROP TABLE IF EXISTS player CASCADE;
        CREATE TABLE player (
            id_player INTEGER PRIMARY KEY,
            player_name VARCHAR(255),
            country VARCHAR(100)
        );

        INSERT INTO player (id_player, player_name, country)
        SELECT DISTINCT 
            NULLIF(player_id, '')::INTEGER,
            player_name,
            country
        FROM players
        WHERE player_id IS NOT NULL AND player_id != ''
        ON CONFLICT (id_player) DO NOTHING;

        DROP TABLE IF EXISTS map CASCADE;
        CREATE TABLE map (
            id_map SERIAL PRIMARY KEY,
            map_name VARCHAR(100) UNIQUE NOT NULL
        );

        INSERT INTO map (map_name)
        SELECT DISTINCT map_name
        FROM (
            SELECT COALESCE(map_1, map_2, map_3) AS map_name
            FROM players
            WHERE COALESCE(map_1, map_2, map_3) IS NOT NULL
        ) sub
        WHERE map_name IS NOT NULL AND map_name != ''
        ON CONFLICT (map_name) DO NOTHING;

        DROP TABLE IF EXISTS team CASCADE;
        CREATE TABLE team (
            id_team SERIAL PRIMARY KEY,
            team_name VARCHAR(255) UNIQUE NOT NULL
        );

        INSERT INTO team (team_name)
        SELECT DISTINCT team_name
        FROM (
            SELECT team_1 AS team_name FROM results
            UNION
            SELECT team_2 AS team_name FROM results
            UNION
            SELECT team AS team_name FROM players
            UNION
            SELECT opponent AS team_name FROM players
        ) sub
        WHERE team_name IS NOT NULL AND team_name != ''
        ON CONFLICT (team_name) DO NOTHING;

        DROP TABLE IF EXISTS match CASCADE;
        CREATE TABLE match (
            id_match INTEGER PRIMARY KEY,
            match_date DATE,
            id_map INTEGER,
            FOREIGN KEY (id_map) REFERENCES map(id_map)
        );
    """))
    conn.commit()
    
    conn.execute(text("""
        WITH match_maps AS (
            SELECT DISTINCT
                NULLIF(match_id, '')::INTEGER AS match_id,
                FIRST_VALUE(COALESCE(map_1, map_2, map_3)) 
                    OVER (PARTITION BY NULLIF(match_id, '')::INTEGER 
                          ORDER BY COALESCE(map_1, map_2, map_3)) AS map_name
            FROM players
            WHERE match_id IS NOT NULL AND match_id != ''
              AND COALESCE(map_1, map_2, map_3) IS NOT NULL
        )
        INSERT INTO match (id_match, match_date, id_map)
        SELECT 
            NULLIF(r.match_id, '')::INTEGER AS id_match,
            MIN(NULLIF(r.date, '')::DATE) AS match_date,
            m.id_map
        FROM results r
        LEFT JOIN match_maps mm ON mm.match_id = NULLIF(r.match_id, '')::INTEGER
        LEFT JOIN map m ON m.map_name = mm.map_name
        WHERE r.match_id IS NOT NULL AND r.match_id != ''
        GROUP BY NULLIF(r.match_id, '')::INTEGER, m.id_map
        ON CONFLICT (id_match) DO NOTHING;
    """))
    conn.commit()
    
    conn.execute(text("""
        DROP TABLE IF EXISTS team_match CASCADE;
        CREATE TABLE team_match (
            id_team INTEGER,
            id_match INTEGER,
            is_winner BOOLEAN,
            score INTEGER,
            PRIMARY KEY (id_team, id_match),
            FOREIGN KEY (id_team) REFERENCES team(id_team),
            FOREIGN KEY (id_match) REFERENCES match(id_match)
        );

        INSERT INTO team_match (id_team, id_match, is_winner, score)
        SELECT 
            t.id_team,
            NULLIF(r.match_id, '')::INTEGER,
            (t.team_name = r.match_winner) AS is_winner,
            CASE 
                WHEN t.team_name = r.team_1 THEN NULLIF(r.result_1, '')::INTEGER
                WHEN t.team_name = r.team_2 THEN NULLIF(r.result_2, '')::INTEGER
            END AS score
        FROM results r
        JOIN team t ON t.team_name IN (r.team_1, r.team_2)
        WHERE r.match_id IS NOT NULL AND r.match_id != ''
        ON CONFLICT (id_team, id_match) DO NOTHING;

        DROP TABLE IF EXISTS player_team_match CASCADE;
        CREATE TABLE player_team_match (
            id_player INTEGER,
            id_team INTEGER,
            id_match INTEGER,
            kills INTEGER,
            deaths INTEGER,
            assists INTEGER,
            rating NUMERIC,
            adr NUMERIC,
            kast NUMERIC,
            kddiff INTEGER,
            headshots INTEGER,
            fkdiff INTEGER,
            flash_assists INTEGER,
            PRIMARY KEY (id_player, id_team, id_match),
            FOREIGN KEY (id_player) REFERENCES player(id_player),
            FOREIGN KEY (id_team, id_match) REFERENCES team_match(id_team, id_match)
        );

        INSERT INTO player_team_match 
        SELECT 
            NULLIF(p.player_id, '')::INTEGER,
            t.id_team,
            NULLIF(p.match_id, '')::INTEGER,
            NULLIF(p.kills, '')::INTEGER,
            NULLIF(p.deaths, '')::INTEGER,
            NULLIF(p.assists, '')::INTEGER,
            NULLIF(p.rating, '')::NUMERIC,
            NULLIF(p.adr, '')::NUMERIC,
            NULLIF(p.kast, '')::NUMERIC,
            NULLIF(p.kddiff, '')::INTEGER,
            NULLIF(p.hs, '')::INTEGER,
            NULLIF(p.fkdiff, '')::INTEGER,
            NULLIF(p.flash_assists, '')::INTEGER
        FROM players p
        JOIN team t ON t.team_name = p.team
        WHERE p.player_id IS NOT NULL AND p.player_id != ''
          AND p.match_id IS NOT NULL AND p.match_id != ''
          AND COALESCE(p.map_1, p.map_2, p.map_3) IS NOT NULL
        ON CONFLICT (id_player, id_team, id_match) DO NOTHING;
    """))
    conn.commit()

DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for type integer: "0.0"

(Background on this error at: https://sqlalche.me/e/20/9h9h)
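The error above is raised because a string such as "0.0" is not valid input for a direct `::INTEGER` cast. A quick way to locate such values is to scan the raw strings of a column. A minimal sketch, assuming the values have already been fetched into Python (for instance the distinct values of one column of the raw players table). `rejected_by_int_cast` is a hypothetical helper, and its pattern only approximates PostgreSQL's integer input rules (no range or overflow check).

```python
import re

# Approximation of what PostgreSQL's text-to-INTEGER cast accepts:
# optional surrounding whitespace, an optional sign, then ASCII digits only.
_INT_TEXT = re.compile(r"^\s*[+-]?[0-9]+\s*$")


def rejected_by_int_cast(values):
    """Return the distinct non-empty values a direct ::INTEGER cast would reject.

    None and empty strings are skipped, because NULLIF(x, '') turns '' into NULL.
    """
    return sorted({v for v in values
                   if v is not None and v != "" and not _INT_TEXT.match(v)})
```

Values such as "0.0" or "o" show up in the result, which tells you which columns need a different cast before loading them into INTEGER columns.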

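The cell above fails because some numeric columns of the raw tables contain decimal strings such as "0.0", which PostgreSQL refuses to cast directly to INTEGER. The cell below builds a simpler star schema (dim_players, dim_matches, fact_player_stats) and casts these columns through NUMERIC first (::NUMERIC::INTEGER).

In [9]: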
from sqlalchemy import text

with engine.connect() as conn:
    conn.execute(text("""
        DROP TABLE IF EXISTS dim_players CASCADE;
        CREATE TABLE dim_players AS 
        SELECT DISTINCT 
            NULLIF(player_id, '')::INTEGER AS player_id, 
            player_name, 
            country 
        FROM players
        WHERE player_id IS NOT NULL AND player_id != '';
        
        ALTER TABLE dim_players ADD PRIMARY KEY (player_id);
    """))
    conn.commit()
    
    conn.execute(text("""
        DROP TABLE IF EXISTS dim_matches CASCADE;
        CREATE TABLE dim_matches AS 
        SELECT 
            NULLIF(match_id, '')::INTEGER AS match_id,
            MIN(NULLIF(date, '')::DATE) AS date,
            MIN(team_1) AS team_1,
            MIN(team_2) AS team_2,
            MIN(match_winner) AS match_winner
        FROM results
        WHERE match_id IS NOT NULL AND match_id != ''
        GROUP BY NULLIF(match_id, '')::INTEGER;
        
        ALTER TABLE dim_matches ADD PRIMARY KEY (match_id);
    """))
    conn.commit()
    
    conn.execute(text("""
        DROP TABLE IF EXISTS fact_player_stats CASCADE;
        CREATE TABLE fact_player_stats AS 
        SELECT 
            row_number() OVER ()::INTEGER AS stats_id,
            NULLIF(p.match_id, '')::NUMERIC::INTEGER AS match_id, 
            NULLIF(p.player_id, '')::NUMERIC::INTEGER AS player_id,
            p.team AS player_team,
            p.opponent,
            COALESCE(p.map_1, p.map_2, p.map_3) AS map_name,
            NULLIF(p.kills, '')::NUMERIC::INTEGER AS kills, 
            NULLIF(p.deaths, '')::NUMERIC::INTEGER AS deaths, 
            NULLIF(p.assists, '')::NUMERIC::INTEGER AS assists, 
            NULLIF(p.rating, '')::NUMERIC AS rating,
            NULLIF(p.adr, '')::NUMERIC AS adr,
            NULLIF(p.kast, '')::NUMERIC AS kast,
            NULLIF(p.kddiff, '')::NUMERIC::INTEGER AS kddiff,
            NULLIF(p.hs, '')::NUMERIC::INTEGER AS headshots,
            NULLIF(p.fkdiff, '')::NUMERIC::INTEGER AS fkdiff,
            NULLIF(p.flash_assists, '')::NUMERIC::INTEGER AS flash_assists
        FROM players p
        WHERE COALESCE(p.map_1, p.map_2, p.map_3) IS NOT NULL
          AND p.player_id IS NOT NULL AND p.player_id != ''
          AND p.match_id IS NOT NULL AND p.match_id != ''
          -- Keep only rows whose keys exist in the dimensions (required by the foreign keys below)
          AND EXISTS (
              SELECT 1 FROM dim_matches dm
              WHERE dm.match_id = NULLIF(p.match_id, '')::NUMERIC::INTEGER
          )
          AND EXISTS (
              SELECT 1 FROM dim_players dp
              WHERE dp.player_id = NULLIF(p.player_id, '')::NUMERIC::INTEGER
          );
        
        ALTER TABLE fact_player_stats ADD PRIMARY KEY (stats_id);
        
        ALTER TABLE fact_player_stats 
            ADD CONSTRAINT fk_player 
            FOREIGN KEY (player_id) REFERENCES dim_players(player_id);
        
        ALTER TABLE fact_player_stats 
            ADD CONSTRAINT fk_match 
            FOREIGN KEY (match_id) REFERENCES dim_matches(match_id);
        
        CREATE INDEX idx_player ON fact_player_stats(player_id);
        CREATE INDEX idx_match ON fact_player_stats(match_id);
    """))
    conn.commit()
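The cell above casts the integer columns as `::NUMERIC::INTEGER`: the text is first parsed as NUMERIC, which accepts "0.0", and then converted to INTEGER. A Python sketch of the same conversion, handy for checking a few raw values by hand. `pg_numeric_to_int` is a hypothetical helper, and it assumes PostgreSQL's rounding when converting NUMERIC to INTEGER (to the nearest integer, ties away from zero), so "2.5" becomes 3 rather than being truncated.

```python
from decimal import Decimal, ROUND_HALF_UP


def pg_numeric_to_int(text):
    """Mimic NULLIF(text, '')::NUMERIC::INTEGER for one raw string value.

    Returns None for None or '' (the NULLIF step). Otherwise it parses the text
    as a decimal and rounds to the nearest integer, ties away from zero
    (Decimal's ROUND_HALF_UP). Invalid text such as 'o' raises
    decimal.InvalidOperation, just as the SQL cast would fail.
    """
    if text is None or text == "":
        return None
    value = Decimal(text.strip())
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

Going through NUMERIC, instead of a float type, avoids binary rounding surprises on values such as "0.1".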


The static player attributes are extracted into "dim_players", so that each player's name and country are stored once instead of being repeated thousands of times.

The match-level information (date, both teams, winner) is centralized in the "dim_matches" table; the map played is kept on each row of "fact_player_stats" (map_name).

The central table, "fact_player_stats", links the performance metrics (kills, deaths, rating, ...) to players and matches through the player_id and match_id foreign keys.
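Because fact_player_stats declares foreign keys to both dimensions, every fact row must point to an existing player and match, which is why its load filters the raw rows with `EXISTS`. The sketch below reproduces the idea with the standard-library `sqlite3` module instead of PostgreSQL, on hypothetical sample rows; the `raw_stats` table stands in for the raw players table, and `build_demo_star` is a hypothetical name.

```python
import sqlite3


def build_demo_star():
    """Build a tiny in-memory star schema (hypothetical rows) and load the fact
    table with EXISTS filters, as done for fact_player_stats."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE dim_players (player_id INTEGER PRIMARY KEY, player_name TEXT, country TEXT);
        CREATE TABLE dim_matches (match_id INTEGER PRIMARY KEY, date TEXT);
        CREATE TABLE fact_player_stats (
            stats_id INTEGER PRIMARY KEY,
            player_id INTEGER REFERENCES dim_players(player_id),
            match_id INTEGER REFERENCES dim_matches(match_id),
            kills INTEGER
        );
        CREATE TABLE raw_stats (player_id INTEGER, match_id INTEGER, kills INTEGER);
        INSERT INTO dim_players VALUES (1, 'alpha', 'France'), (2, 'bravo', 'Brazil');
        INSERT INTO dim_matches VALUES (100, '2020-01-01');
        -- The last raw row points to a match that is missing from dim_matches
        INSERT INTO raw_stats VALUES (1, 100, 20), (2, 100, 15), (1, 999, 30);
    """)
    # Only rows whose keys exist in both dimensions are copied into the fact table
    conn.execute("""
        INSERT INTO fact_player_stats (player_id, match_id, kills)
        SELECT r.player_id, r.match_id, r.kills
        FROM raw_stats r
        WHERE EXISTS (SELECT 1 FROM dim_players p WHERE p.player_id = r.player_id)
          AND EXISTS (SELECT 1 FROM dim_matches m WHERE m.match_id = r.match_id)
    """)
    conn.commit()
    return conn
```

The orphan row is silently skipped by the filtered load, whereas inserting it directly raises `sqlite3.IntegrityError`; PostgreSQL's foreign keys reject such a row in the same way.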

Log in at localhost:8080 with the following credentials: epsi@gmail.com | azerty1234!

The CS GO Stats dashboard will show the top 10 players with their kill statistics per match, as well as kills by country.
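The two dashboard views could be backed by queries along these lines. This is a sketch: the query shapes are assumptions about how the figures are computed. Kills per match is taken as the average of `kills` over a player's fact rows, which assumes one row per player and match. The demo runs the queries with `sqlite3` on hypothetical sample rows, and the SQL sticks to standard constructs (JOIN, GROUP BY, AVG, SUM, LIMIT) that PostgreSQL also accepts.

```python
import sqlite3

# Average kills per match for each player, best 10 first
TOP10_KILLS_PER_MATCH = """
    SELECT p.player_name, AVG(f.kills) AS kills_per_match, COUNT(*) AS matches
    FROM fact_player_stats f
    JOIN dim_players p ON p.player_id = f.player_id
    GROUP BY p.player_id, p.player_name
    ORDER BY kills_per_match DESC
    LIMIT 10
"""

# Total kills by country
KILLS_BY_COUNTRY = """
    SELECT p.country, SUM(f.kills) AS total_kills
    FROM fact_player_stats f
    JOIN dim_players p ON p.player_id = f.player_id
    GROUP BY p.country
    ORDER BY total_kills DESC
"""


def demo_dashboard_queries():
    """Run both queries on a tiny in-memory schema with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_players (player_id INTEGER PRIMARY KEY, player_name TEXT, country TEXT);
        CREATE TABLE fact_player_stats (player_id INTEGER, match_id INTEGER, kills INTEGER);
        INSERT INTO dim_players VALUES (1, 'alpha', 'France'), (2, 'bravo', 'France'), (3, 'charlie', 'Brazil');
        INSERT INTO fact_player_stats VALUES (1, 100, 20), (1, 101, 30), (2, 100, 18), (3, 101, 22);
    """)
    top10 = conn.execute(TOP10_KILLS_PER_MATCH).fetchall()
    by_country = conn.execute(KILLS_BY_COUNTRY).fetchall()
    conn.close()
    return top10, by_country
```

On PostgreSQL the same strings could be passed to `pd.read_sql(text(...), engine)` to feed the dashboard; note that `AVG` over an INTEGER column returns NUMERIC there rather than a float.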