# Data Engineering for CS2 Features

In this notebook, we will start building a pipeline to create new features from the parsed match data of CS2.  
The parsed data is available in CSV format, with one file per match and per entity (e.g., players, rounds, events).  
For example, player-level data can be found in:  ../data/parsed/match001/players.csv

The goal is to gradually enrich these raw tables with engineered features and save the results into a new folder:  ../data/features/

---

## Step 1: Focus on Core Player Metrics

As a starting point, we will reconstruct some of the most fundamental player-level statistics that are widely used in CS analysis:

- **K/D/A ratio**: Kills, Deaths, and Assists, combined into a performance metric.  
- **ADR (Average Damage per Round)**: A key measure of a player’s contribution to the team beyond just kills.  

These will serve as the foundation for more advanced features later on.

---

## Notebook Structure

1. **Setup and Imports**  
   - Import required Python libraries.  
   - Define paths to the raw data (`../data/parsed/`) and the feature output directory (`../data/features/`).  

2. **Load Parsed Data**  
   - Load player-level CSVs from one or multiple matches.  
   - Perform initial inspection of columns to understand available raw stats.  

3. **Feature Engineering: K/D/A and ADR**  
   - Reconstruct K/D/A statistics from raw data.  
   - Compute ADR at the player level.  

4. **Export Enriched Features**  
   - Save the enriched player-level features into `../data/features/` as new CSVs.  

---

Let’s begin with the setup and imports.

In [5]:
# --- Setup and Imports ---

import os
import glob
import pandas as pd

# Paths
RAW_DATA_DIR = "../data/parsed/"
FEATURES_DIR = "../data/features/"

# Create features directory if it doesn't exist
os.makedirs(FEATURES_DIR, exist_ok=True)

def load_match_tables(match_folder):
    """
    Load all CSV tables from a given match folder into a dictionary of DataFrames.
    Example:
        match_data = load_match_tables("../data/parsed/match001/")
        match_data["players"].head()
    """
    tables = {}
    csv_files = glob.glob(os.path.join(match_folder, "*.csv"))
    
    for file in csv_files:
        name = os.path.splitext(os.path.basename(file))[0]  # e.g. "players"
        tables[name] = pd.read_csv(file)
    
    return tables

# Example: load all tables for match001
match_id = "match001"
match_path = os.path.join(RAW_DATA_DIR, match_id)
match_data = load_match_tables(match_path)

# Preview available tables
print("Loaded tables:", list(match_data.keys()))
for name, df in match_data.items():
    print(f"{name}: {df.shape}")

Loaded tables: ['rounds', 'purchases', 'grenades', 'economy', 'damages', 'players', 'duels', 'game', 'kills']
rounds: (14, 7)
purchases: (435, 5)
grenades: (95, 8)
economy: (140, 6)
damages: (424, 10)
players: (10, 6)
duels: (100, 8)
game: (1, 5)
kills: (100, 14)


# 1/ Create the team

In [15]:
# --- Create Teams Table ---

# SteamID of DrRisto
drristo_steamid = 76561198870364933

# Copy players table
players_df = match_data["players"].copy()

# Determine DrRisto's team side from StartSide
drristo_row = players_df[players_df["steamid"] == drristo_steamid]
if drristo_row.empty:
    raise ValueError("DrRisto not found in players table!")

drristo_start_side = drristo_row.iloc[0]["start_side"]

# Assign teams for all players
def assign_team(start_side):
    if start_side == drristo_start_side:
        return "team_reZilienZ"
    else:
        return "opponents_team"

players_df["team"] = players_df["start_side"].apply(assign_team)

# Keep only relevant columns
teams = players_df[["name", "steamid", "team"]]

# Save teams table
teams.to_csv(os.path.join(FEATURES_DIR, "teams.csv"), index=False)
print("Teams table created:")
teams.head(10)  # preview players to verify all are included

Teams table created:


Unnamed: 0,name,steamid,team
0,Pimiento Picante,76561199643239709,team_reZilienZ
1,KurtKnusper,76561199028151244,opponents_team
2,Kolle257,76561198175180459,opponents_team
3,Valerius,76561199127458988,opponents_team
4,Tokugawa,76561199766385308,opponents_team
5,Dr_Risto,76561198870364933,team_reZilienZ
6,TheDidacte,76561198059502594,team_reZilienZ
7,Baron26,76561198988987546,team_reZilienZ
8,NedsTrex | FRUKTPEEK!,76561199272285495,opponents_team
9,anonymeTito,76561198812311227,team_reZilienZ


# 2/Kills

In [22]:
# --- Compute Player Features: Kills, Assists, ADR, Grenades ---

# Load relevant tables
kills_df = match_data["kills"]
damages_df = match_data["damages"]
players_df = match_data["players"].copy()
teams_df = pd.read_csv(os.path.join(FEATURES_DIR, "teams.csv"))
rounds_df = match_data["rounds"]

# Ajoute le steamid de l'assist à partir de son nom
kills_df = kills_df.merge(
    teams_df.rename(columns={"name": "assist_player_name", "steamid": "assist_player_steamid"}),
    on="assist_player_name", how="left"
)

# Maintenant tu peux merger pour la team de l'assist
kills_df = kills_df.merge(
    teams_df.rename(columns={"steamid": "assist_player_steamid", "team": "assist_team"}),
    on="assist_player_steamid", how="left"
)

# --- Merge team info for killer ---
kills_df = kills_df.merge(teams_df.rename(columns={"steamid": "killer_steamid", "team": "killer_team"}),
                          on="killer_steamid", how="left")

# --- Kills ---
kills_count = kills_df.groupby("killer_steamid").size().rename("kills")

# --- Assists ---
# Only count if assist exists AND assist comes from a player on the same team as the killer
kills_df["valid_assist"] = (kills_df["assist"] == True) & (kills_df["killer_team"] == kills_df["assist_team"])
assists_count = kills_df.groupby("killer_steamid")["valid_assist"].sum().rename("assists")

# --- ADR (Average Damage per Round to health only) ---
adr_total = damages_df.groupby("attacker_steamid")["health_damage"].sum().rename("total_health_damage")
rounds_played = rounds_df["round_id"].nunique()
adr = (adr_total / rounds_played).rename("adr")

# --- Grenade damage ---
# Only count damages from HE Grenade or Molotov
# --- Grenade damage (sum health damage from HE Grenade or Molotov) ---
grenade_damages_total = grenade_damages_total = damages_df[
    damages_df["weapon"].isin(["HE Grenade", "Molotov"])
].groupby("attacker_steamid")["health_damage"].sum().rename("grenade_damage")

# --- Merge all features with players ---
features = players_df.set_index("steamid").drop(columns=["kills", "deaths"])
features = features.join([kills_count, assists_count, adr, grenade_damages_total])
features = features.fillna(0)

# --- Add team info ---
features = features.merge(teams_df, on="steamid", how="left")
features = features.drop(columns=["name_y"])  # drop duplicate name column

# --- Save features table ---
features.to_csv(os.path.join(FEATURES_DIR, "player_features.csv"), index=False)
print("Player features table created:")
features.head(10)

Player features table created:


Unnamed: 0,steamid,match_id,name_x,start_side,kills,assists,adr,grenade_damage,team
0,76561199643239709,match001,Pimiento Picante,T,20,8,183.928571,2.0,team_reZilienZ
1,76561199028151244,match001,KurtKnusper,CT,5,2,76.357143,0.0,opponents_team
2,76561198175180459,match001,Kolle257,CT,13,6,139.642857,23.0,opponents_team
3,76561199127458988,match001,Valerius,CT,12,2,141.928571,0.0,opponents_team
4,76561199766385308,match001,Tokugawa,CT,4,1,65.785714,0.0,opponents_team
5,76561198870364933,match001,Dr_Risto,T,8,4,85.928571,16.0,team_reZilienZ
6,76561198059502594,match001,TheDidacte,T,21,4,170.642857,19.0,team_reZilienZ
7,76561198988987546,match001,Baron26,T,8,2,104.714286,158.0,team_reZilienZ
8,76561199272285495,match001,NedsTrex | FRUKTPEEK!,CT,4,2,49.214286,0.0,opponents_team
9,76561198812311227,match001,anonymeTito,T,5,1,71.5,0.0,team_reZilienZ
