# LoL Team Composition – v5 (Kaggle + Riot + Team Synergy)

This notebook builds an end‑to‑end pipeline that combines:

- Historical **Kaggle** match data (from the original notebook).
- Your **Riot API** export stored in `lol_full_clean.csv`.
- Additional **team synergy features** that describe the line‑up as a whole (not only individual stats).

**v5 goals**:

- Build a unified, team‑level dataset from **Kaggle + Riot**.
- Engineer features that capture:
  - How kills / damage / gold are distributed across the team.
  - **Role balance** inside each composition.
  - Variability between players (standard deviation, entropy, etc.).
- Train a **Random Forest (v5)** model on this merged data.
- Save the model and its feature list under the `model_artifacts_v5` folder.

At the end, we also query the Riot API for specific players and use the v5 model to suggest the best 5‑man team compositions for them.


# LoL Team Composition – Offline Model Training (Final Version)

This notebook trains a **team win prediction model** for League of Legends using an offline Kaggle dataset.

## Goal

Given a **team of 5 players** and their **historical performance by role**, we want to build a model that can estimate:

> **How likely is this team to win a match with this composition?**

This offline training part does **not** call the Riot API. It only:

1. Downloads and loads a clean LoL dataset from Kaggle.
2. Builds player-level and team-level features (including per-role history).
3. Trains a Random Forest model to predict whether a team wins.
4. Saves the trained model and its feature list to `model_artifacts/` to be used later in an API notebook.


## Library Imports and Environment Setup

This cell imports the necessary Python libraries for data processing and configuration.
It also adjusts pandas display settings to allow for better visibility of dataframes during exploration.


In [3]:
# Imports and basic configuration

import os
from pathlib import Path

import numpy as np
import pandas as pd

# Configure pandas to show up to 100 columns and increase display width for better readability
pd.set_option("display.max_columns", 100)
pd.set_option("display.width", 200)

# Print the current working directory
print("Working directory:", os.getcwd())


Working directory: /content


## Cell 1: Download Dataset from Kaggle

This cell uses the `kagglehub` package to download the League of Legends match history dataset.  
It pulls the latest version of the dataset and provides the local path where the data files are stored.


In [4]:
# Cell 1 - Download LoL dataset via KaggleHub

!pip install -q kagglehub

import kagglehub

# Download the latest version of the dataset.
# This returns a folder path that contains all CSV files.
path = kagglehub.dataset_download(
    "nathansmallcalder/lol-match-history-and-summoner-data-80k-matches"
)

print("Dataset downloaded successfully!")
print("Path to dataset files:", path)


Downloading from https://www.kaggle.com/api/v1/datasets/download/nathansmallcalder/lol-match-history-and-summoner-data-80k-matches?dataset_version_number=1...


100%|██████████| 3.60M/3.60M [00:00<00:00, 106MB/s]

Extracting files...
Dataset downloaded successfully!
Path to dataset files: /root/.cache/kagglehub/datasets/nathansmallcalder/lol-match-history-and-summoner-data-80k-matches/versions/1





## Cell 2: Set Data Path and List CSV Files

This cell sets the dataset path using the output from the previous step, then searches for all CSV files within that directory.  
It prints the total number of files found and displays their filenames.


In [5]:
# Cell 2 - Set data path and list CSV files

DATA_PATH = Path(path)
print("Data folder:", DATA_PATH.resolve())

csv_files = list(DATA_PATH.rglob("*.csv"))
print(f"Found {len(csv_files)} CSV file(s):")
for f in csv_files:
    print(" -", f.name)


Data folder: /root/.cache/kagglehub/datasets/nathansmallcalder/lol-match-history-and-summoner-data-80k-matches/versions/1
Found 7 CSV file(s):
 - TeamMatchTbl.csv
 - ChampionTbl.csv
 - MatchTbl.csv
 - MatchStatsTbl.csv
 - SummonerMatchTbl.csv
 - ItemTbl.csv
 - RankTbl.csv


## Cell 3: Load Core Tables by File Name

This cell defines the file paths for the core CSV tables in the dataset, including match-level and player-level information.  
It then defines a helper function `safe_read_csv()` to load these files safely, printing warnings if any file is missing.  
Finally, it loads the three key tables:
- Match-level data
- Summoner-match mapping
- Match statistics
and prints their dimensions.


In [6]:
# Cell 3 - Load core tables by file name

MATCHES_CSV        = DATA_PATH / "MatchTbl.csv"
SUMMONER_MATCH_CSV = DATA_PATH / "SummonerMatchTbl.csv"
MATCH_STATS_CSV    = DATA_PATH / "MatchStatsTbl.csv"

def safe_read_csv(path: Path, **kwargs) -> pd.DataFrame:
    """Read a CSV file safely. If the file does not exist,
    return an empty DataFrame and print a warning.
    """
    if not path.exists():
        print(f"Warning: File not found: {path.name}")
        return pd.DataFrame()
    print(f"Loading: {path.name}")
    return pd.read_csv(path, **kwargs)

matches_df = safe_read_csv(MATCHES_CSV)
summoner_match_df = safe_read_csv(SUMMONER_MATCH_CSV)
match_stats_df = safe_read_csv(MATCH_STATS_CSV)

print("\nShapes:")
print("  matches_df        :", matches_df.shape)
print("  summoner_match_df :", summoner_match_df.shape)
print("  match_stats_df    :", match_stats_df.shape)


Loading: MatchTbl.csv
Loading: SummonerMatchTbl.csv
Loading: MatchStatsTbl.csv

Shapes:
  matches_df        : (35421, 5)
  summoner_match_df : (78863, 4)
  match_stats_df    : (78863, 31)


## Cell 4: Preview Core Tables

This cell previews the three main dataframes loaded in the previous step.  
For each table (`matches_df`, `summoner_match_df`, and `match_stats_df`), it prints:
- The dataframe name
- Its shape (rows × columns)
- List of column names
- The first few rows using `.head()` for visual inspection

If any of the dataframes is empty, a warning is printed instead.


In [7]:
# Cell 4 - Preview core tables (head + columns)

tables = {
    "matches_df": matches_df,
    "summoner_match_df": summoner_match_df,
    "match_stats_df": match_stats_df,
}

for name, df in tables.items():
    print("\n" + "=" * 80)
    print(f"Table: {name}")
    if df.empty:
        print("Warning: This DataFrame is empty.")
        continue

    print("Shape:", df.shape)
    print("Columns:", list(df.columns))
    display(df.head())



Table: matches_df
Shape: (35421, 5)
Columns: ['MatchId', 'Patch', 'QueueType', 'RankFk', 'GameDuration']


Unnamed: 0,MatchId,Patch,QueueType,RankFk,GameDuration
0,EUW1_6681382047,13.22.541.9804,CLASSIC,0,1050
1,EUW1_6681412019,13.22.541.9804,CLASSIC,0,778
2,EUW1_6681445530,13.22.541.9804,ARAM,0,753
3,EUW1_6681464371,13.22.541.9804,ARAM,0,853
4,EUW1_6681718380,13.22.541.9804,ARAM,0,1226



Table: summoner_match_df
Shape: (78863, 4)
Columns: ['SummonerMatchId', 'SummonerFk', 'MatchFk', 'ChampionFk']


Unnamed: 0,SummonerMatchId,SummonerFk,MatchFk,ChampionFk
0,1,1,EUW1_7565751492,902
1,2,1,EUW1_7565549583,902
2,3,1,EUW1_7564803077,16
3,4,1,EUW1_7564368646,103
4,5,1,EUW1_7564332041,800



Table: match_stats_df
Shape: (78863, 31)
Columns: ['MatchStatsId', 'SummonerMatchFk', 'MinionsKilled', 'DmgDealt', 'DmgTaken', 'TurretDmgDealt', 'TotalGold', 'Lane', 'Win', 'item1', 'item2', 'item3', 'item4', 'item5', 'item6', 'kills', 'deaths', 'assists', 'PrimaryKeyStone', 'PrimarySlot1', 'PrimarySlot2', 'PrimarySlot3', 'SecondarySlot1', 'SecondarySlot2', 'SummonerSpell1', 'SummonerSpell2', 'CurrentMasteryPoints', 'EnemyChampionFk', 'DragonKills', 'BaronKills', 'visionScore']


Unnamed: 0,MatchStatsId,SummonerMatchFk,MinionsKilled,DmgDealt,DmgTaken,TurretDmgDealt,TotalGold,Lane,Win,item1,item2,item3,item4,item5,item6,kills,deaths,assists,PrimaryKeyStone,PrimarySlot1,PrimarySlot2,PrimarySlot3,SecondarySlot1,SecondarySlot2,SummonerSpell1,SummonerSpell2,CurrentMasteryPoints,EnemyChampionFk,DragonKills,BaronKills,visionScore
0,1,1,30,4765,12541,0,7058,BOTTOM,0,3870,2055,3107,3171,6620,2022,0,2,12,8465,8463,8473,8453,8345,8347,4,7,902,51,0,0,67
1,2,2,29,8821,14534,1,9618,BOTTOM,0,3870,2065,3107,3158,6620,3916,2,5,23,8465,8463,8473,8453,8345,8347,4,7,902,236,0,0,88
2,3,3,34,6410,19011,3,9877,BOTTOM,1,3870,3107,1011,3171,6617,3916,0,5,22,8214,8226,8210,8237,8345,8347,4,7,16,498,0,0,97
3,4,4,51,22206,14771,3,12374,NONE,1,6655,3089,4645,3020,0,0,8,4,35,8112,8143,8140,8106,8226,8210,4,14,103,54,0,0,0
4,5,5,0,39106,33572,0,15012,TOP,1,4015,223157,226653,222503,223089,447108,13,8,2,0,0,0,0,0,0,2202,2201,800,12,0,0,0


## Cell 5: Build `raw_df` — One Row per Player per Match

This cell constructs the foundational dataset (`raw_df`) which contains one row per player per match.

Steps:
1. Verifies that required columns are present in the input tables.
2. Removes duplicated player statistics by keeping the latest `MatchStatsId` per player-match entry.
3. Merges summoner data with match statistics to link player performance with their match context.
4. Merges again with match-level information (e.g., duration, queue type).
5. Renames key columns for consistency and modeling.
6. Filters for "CLASSIC" queue games only, ignoring other modes like ARAM.
7. Selects only a subset of relevant columns for further processing.
8. Converts the win/loss outcome to binary format (`0` or `1`).

The final `raw_df` is the unified player-match dataset used as the input for modeling and feature engineering.
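
The deduplicate-then-merge pattern in steps 2 and 3 can be tried on a small example first. The sketch below uses made-up toy tables (the values and the short names `summoner_match`, `match_stats`, and `stats_dedup` are illustrative assumptions, not the Kaggle data) to show why the duplicates have to be removed before a merge with `validate="one_to_one"`.

```python
import pandas as pd

# Toy stand-ins for summoner_match_df and match_stats_df (values are made up).
summoner_match = pd.DataFrame({
    "SummonerMatchId": [1, 2],
    "MatchFk": ["M1", "M2"],
})
match_stats = pd.DataFrame({
    "MatchStatsId": [10, 11, 12],
    "SummonerMatchFk": [1, 1, 2],  # SummonerMatchFk 1 appears twice
    "kills": [3, 5, 7],
})

# Keep the row with the highest MatchStatsId for each SummonerMatchFk.
stats_dedup = (
    match_stats
    .sort_values("MatchStatsId")
    .drop_duplicates(subset="SummonerMatchFk", keep="last")
)

# validate="one_to_one" makes pandas raise a MergeError if a key repeats
# on either side, so a silent row explosion cannot happen.
player_match = summoner_match.merge(
    stats_dedup,
    left_on="SummonerMatchId",
    right_on="SummonerMatchFk",
    how="inner",
    validate="one_to_one",
)
print(player_match[["SummonerMatchId", "MatchStatsId", "kills"]])
```

Merging the toy `match_stats` table without deduplicating it would fail the `one_to_one` check, which is the safety net the real cell relies on.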


In [8]:
# Cell 5 - Build raw_df: one row per player per match

# 0) Check that required columns are available in both dataframes
required_cols_summoner = {"SummonerMatchId", "SummonerFk", "MatchFk", "ChampionFk"}
required_cols_stats = {"SummonerMatchFk", "Lane", "Win", "kills", "deaths", "assists"}

print("Has required columns in summoner_match_df:", required_cols_summoner.issubset(summoner_match_df.columns))
print("Has required columns in match_stats_df   :", required_cols_stats.issubset(match_stats_df.columns))

# 1) Deduplicate match_stats_df to ensure only one row per SummonerMatchFk
if match_stats_df["SummonerMatchFk"].duplicated().any():
    print("Found duplicated SummonerMatchFk in match_stats_df. Deduplicating...")
    match_stats_dedup = (
        match_stats_df
        .sort_values("MatchStatsId")
        .drop_duplicates(subset="SummonerMatchFk", keep="last")
    )
else:
    match_stats_dedup = match_stats_df.copy()

print("Original match_stats_df shape :", match_stats_df.shape)
print("Deduplicated match_stats_df   :", match_stats_dedup.shape)

# 2) Merge summoner-match table with player statistics
player_match_df = summoner_match_df.merge(
    match_stats_dedup,
    left_on="SummonerMatchId",
    right_on="SummonerMatchFk",
    how="inner",
    validate="one_to_one",
)

print("player_match_df shape:", player_match_df.shape)

# 3) Merge with match-level data
raw_df = player_match_df.merge(
    matches_df,
    left_on="MatchFk",
    right_on="MatchId",
    how="left",
    validate="many_to_one",
)

print("raw_df shape before cleaning:", raw_df.shape)

# 4) Rename columns to standard names for modeling
raw_df = raw_df.rename(
    columns={
        "SummonerFk": "player_id",
        "MatchFk": "match_id",
        "ChampionFk": "champion_id",
        "Lane": "role",
        "Win": "win",
        "QueueType": "queue_type",
        "RankFk": "rank_id",
        "GameDuration": "game_duration",
        "TotalGold": "total_gold",
    }
)

# 5) Filter to keep only "CLASSIC" queue games
if "queue_type" in raw_df.columns:
    raw_df = raw_df[raw_df["queue_type"] == "CLASSIC"].copy()
    print("raw_df shape after filtering CLASSIC only:", raw_df.shape)

# 6) Retain only relevant columns
base_cols = [
    "player_id",
    "match_id",
    "champion_id",
    "role",
    "win",
    "kills",
    "deaths",
    "assists",
    "MinionsKilled",
    "DmgDealt",
    "DmgTaken",
    "TurretDmgDealt",
    "total_gold",
    "queue_type",
    "rank_id",
    "game_duration",
]

existing_cols = [c for c in base_cols if c in raw_df.columns]
raw_df = raw_df[existing_cols].copy()

# 7) Convert win column to binary integer type
if "win" in raw_df.columns:
    raw_df["win"] = raw_df["win"].astype(int)

print("raw_df shape after selecting columns:", raw_df.shape)
display(raw_df.head())


Has required columns in summoner_match_df: True
Has required columns in match_stats_df   : True
Found duplicated SummonerMatchFk in match_stats_df. Deduplicating...
Original match_stats_df shape : (78863, 31)
Deduplicated match_stats_df   : (43818, 31)
player_match_df shape: (43818, 35)
raw_df shape before cleaning: (43818, 40)
raw_df shape after filtering CLASSIC only: (30903, 40)
raw_df shape after selecting columns: (30903, 16)


Unnamed: 0,player_id,match_id,champion_id,role,win,kills,deaths,assists,MinionsKilled,DmgDealt,DmgTaken,TurretDmgDealt,total_gold,queue_type,rank_id,game_duration
0,1,EUW1_7565751492,902,SUPPORT,0,0,2,12,30,4765,12541,0,7058,CLASSIC,7,1751
1,1,EUW1_7565549583,902,SUPPORT,0,2,5,23,29,8821,14534,1,9618,CLASSIC,7,2092
2,1,EUW1_7564803077,16,SUPPORT,1,0,5,22,34,6410,19011,3,9877,CLASSIC,7,2332
6,1,EUW1_7564257986,902,SUPPORT,0,0,1,7,28,3775,12061,0,6344,CLASSIC,7,1676
7,1,EUW1_7563685543,267,SUPPORT,0,1,5,6,36,4217,13464,0,7403,CLASSIC,7,1749


## Cell 6: Generate `player_role_stats` — Per-Player, Per-Role Aggregations

This cell computes performance statistics for each player grouped by their role (e.g., TOP, JUNGLE, etc.).

Steps:
- Filters out any rows with missing role information.
- Groups data by player and role.
- Aggregates multiple performance metrics:
  - Number of games played
  - Win count
  - Average kills, deaths, assists
  - Average CS (minions killed) and gold earned
- Computes a win rate column as the ratio of wins to games played.

The resulting `player_role_stats` DataFrame summarizes each player's typical performance per role.
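
As a minimal sketch of the aggregation described above, the toy frame below (made-up values, with only a subset of the real columns) runs the same named-aggregation pattern and derives the win rate as a fraction between 0 and 1.

```python
import pandas as pd

# Toy player-match rows; values are illustrative only.
toy = pd.DataFrame({
    "player_id": [1, 1, 1, 2],
    "role": ["TOP", "TOP", "JUNGLE", "TOP"],
    "match_id": ["A", "B", "C", "A"],
    "win": [1, 0, 1, 0],
    "kills": [4, 2, 6, 1],
})

# Named aggregation: output_column=(input_column, aggregation).
stats = (
    toy.groupby(["player_id", "role"])
    .agg(
        games_played=("match_id", "nunique"),
        wins=("win", "sum"),
        avg_kills=("kills", "mean"),
    )
    .reset_index()
)

# Win rate is a fraction (wins / games), not a percentage.
stats["winrate"] = stats["wins"] / stats["games_played"]
print(stats)
```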


In [9]:
# Cell 6 - Build player_role_stats (per player, per role)

group_cols = ["player_id", "role"]

# Remove rows with missing role values
role_df = raw_df.dropna(subset=["role"]).copy()

# Group by player and role, compute performance metrics
player_role_stats = (
    role_df
    .groupby(group_cols)
    .agg(
        games_played=("match_id", "nunique"),
        wins=("win", "sum"),
        avg_kills=("kills", "mean"),
        avg_deaths=("deaths", "mean"),
        avg_assists=("assists", "mean"),
        avg_cs=("MinionsKilled", "mean"),
        avg_gold=("total_gold", "mean"),
    )
    .reset_index()
)

# Calculate winrate as the fraction of games won (a value between 0 and 1)
player_role_stats["winrate"] = player_role_stats["wins"] / player_role_stats["games_played"]

print("player_role_stats shape:", player_role_stats.shape)
display(player_role_stats.head(10))


player_role_stats shape: (6794, 10)


Unnamed: 0,player_id,role,games_played,wins,avg_kills,avg_deaths,avg_assists,avg_cs,avg_gold,winrate
0,1,SUPPORT,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
1,2,BOTTOM,16,4,5.875,6.75,6.9375,233.5625,13425.6875,0.25
2,2,JUNGLE,3,0,6.333333,6.0,6.0,97.0,10652.0,0.0
3,2,NONE,1,1,6.0,2.0,3.0,129.0,8356.0,1.0
4,3,BOTTOM,15,4,7.266667,7.6,6.066667,169.6,11693.066667,0.266667
5,3,JUNGLE,1,0,3.0,6.0,1.0,129.0,7406.0,0.0
6,3,NONE,1,0,1.0,5.0,3.0,19.0,5137.0,0.0
7,3,SUPPORT,2,1,4.0,4.5,8.5,27.0,7621.0,0.5
8,4,JUNGLE,8,1,9.625,8.5,6.875,94.125,12510.75,0.125
9,4,MIDDLE,1,0,5.0,12.0,2.0,172.0,10970.0,0.0


## Cell 7: Basic Sanity Checks

This cell performs simple data validation steps to ensure the dataset is structured correctly.

Actions:
- Prints the number of unique players and matches in the dataset.
- Displays the unique roles identified in the dataset.
- Selects a sample player from `player_role_stats` and displays their full row for manual inspection.

These checks help confirm that the data preparation steps ran successfully.


In [10]:
# Cell 7 - Basic sanity checks

print("Unique players:", raw_df["player_id"].nunique())
print("Unique matches:", raw_df["match_id"].nunique())
print("Roles:", raw_df["role"].unique())

# Select one player and show their role stats
sample_player = player_role_stats["player_id"].iloc[0]
print("\nSample player_id:", sample_player)
display(player_role_stats[player_role_stats["player_id"] == sample_player])


Unique players: 1842
Unique matches: 24314
Roles: ['SUPPORT' 'BOTTOM' 'NONE' 'JUNGLE' 'TOP' 'MIDDLE']

Sample player_id: 1


Unnamed: 0,player_id,role,games_played,wins,avg_kills,avg_deaths,avg_assists,avg_cs,avg_gold,winrate
0,1,SUPPORT,10,5,1.3,2.9,17.1,33.0,8923.7,0.5


## Cell 8: Load and Prepare Team-Level Dataset

This cell loads the `TeamMatchTbl.csv` file which contains team-level information for each match.

Steps:
- Loads the team-level CSV into `team_df`.
- Selects relevant columns: match ID, win outcomes for each team, and the champions selected by both blue and red sides.
- Renames `MatchFk` to `match_id` for consistency with other datasets.
- Converts the `BlueWin` column into a binary outcome column named `winner_team` (`1` if the blue side won, `0` if the red side won).

The final `team_df` contains one row per match with complete information on both teams' compositions and the match result.
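
Because `winner_team` is derived from `BlueWin` alone, it can be worth confirming that `BlueWin` and `RedWin` are complementary and that each match appears once. The sketch below shows one way to do that on a made-up toy table; the name `toy_team` and the deliberately inconsistent row are assumptions for illustration.

```python
import pandas as pd

# Toy team table; the third row is deliberately inconsistent.
toy_team = pd.DataFrame({
    "match_id": ["M1", "M2", "M3"],
    "BlueWin": [0, 1, 1],
    "RedWin": [1, 0, 1],
})

# Exactly one side should win each match.
inconsistent = toy_team[toy_team["BlueWin"] + toy_team["RedWin"] != 1]

# A repeated match_id would duplicate player rows in a later merge.
duplicated_ids = int(toy_team["match_id"].duplicated().sum())

print("Inconsistent rows:", inconsistent["match_id"].tolist())
print("Duplicated match ids:", duplicated_ids)
```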


In [11]:
# Cell 8 - Load and prepare team-level dataset

TEAM_CSV = DATA_PATH / "TeamMatchTbl.csv"

# Load the team-level match table
team_df = pd.read_csv(TEAM_CSV)
print("team_df shape:", team_df.shape)
display(team_df.head())

# Keep only relevant columns
team_df = team_df[[
    "MatchFk", "BlueWin", "RedWin",
    "B1Champ", "B2Champ", "B3Champ", "B4Champ", "B5Champ",
    "R1Champ", "R2Champ", "R3Champ", "R4Champ", "R5Champ"
]]

# Rename match ID column for consistency
team_df = team_df.rename(columns={"MatchFk": "match_id"})

# Convert BlueWin into a binary winner label
team_df["winner_team"] = team_df["BlueWin"].astype(int)

print("Cleaned team_df shape:", team_df.shape)
display(team_df.head())


team_df shape: (35045, 24)


Unnamed: 0,TeamID,MatchFk,B1Champ,B2Champ,B3Champ,B4Champ,B5Champ,R1Champ,R2Champ,R3Champ,R4Champ,R5Champ,BlueBaronKills,BlueRiftHeraldKills,BlueDragonKills,BlueTowerKills,BlueKills,RedBaronKills,RedRiftHeraldKills,RedDragonKills,RedTowerKills,RedKills,RedWin,BlueWin
0,1,EUW1_7565751492,897,154,157,51,902,164,5,25,221,497,0,1,1,3,13,1,0,3,8,26,1,0
1,2,EUW1_7565549583,82,238,157,236,89,6,254,127,42,902,1,0,3,10,39,0,1,1,3,33,0,1
2,3,EUW1_7564803077,516,28,4,498,235,23,64,38,901,16,0,1,2,7,27,2,0,3,8,37,1,0
3,4,EUW1_7564368646,54,34,59,498,103,61,25,55,106,5,0,0,0,4,55,0,0,0,0,39,0,1
4,5,EUW1_7564332041,12,800,111,150,142,141,101,55,950,4,0,0,0,0,42,0,0,0,0,0,0,1


Cleaned team_df shape: (35045, 14)


Unnamed: 0,match_id,BlueWin,RedWin,B1Champ,B2Champ,B3Champ,B4Champ,B5Champ,R1Champ,R2Champ,R3Champ,R4Champ,R5Champ,winner_team
0,EUW1_7565751492,0,1,897,154,157,51,902,164,5,25,221,497,0
1,EUW1_7565549583,1,0,82,238,157,236,89,6,254,127,42,902,1
2,EUW1_7564803077,0,1,516,28,4,498,235,23,64,38,901,16,0
3,EUW1_7564368646,1,0,54,34,59,498,103,61,25,55,106,5,1
4,EUW1_7564332041,1,0,12,800,111,150,142,141,101,55,950,4,1


## Cell 9: Merge Player-Level and Team-Level Data

This cell links the individual player statistics with the overall team results for each match.

Steps:
- Merges `raw_df` with `team_df` on `match_id`.
- Defines a function `get_team_side()` that determines whether the player was on the blue or red team by looking up their champion ID in each side's champion list. It returns `Unknown` when the champion is on neither list.
- Adds a `team_side` column identifying which side each player was on.
- Computes a binary outcome column `team_win` indicating whether the player's team won the match. Rows with an `Unknown` side get `0`.
- Selects a subset of relevant columns to form the modeling dataset `merged_df`.

This merged dataset now contains player stats enriched with team-level outcome and context.
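
Row-wise `apply` works but can be slow on larger tables. As a hedged alternative, the sketch below reproduces the same side and win logic with NumPy comparisons on a toy frame. The toy values are made up, and this is not the code the notebook runs.

```python
import numpy as np
import pandas as pd

# Toy merged rows: one player on Blue, one on Red, one on neither list.
toy = pd.DataFrame({
    "champion_id": [902, 16, 7],
    "B1Champ": [897, 516, 1], "B2Champ": [154, 28, 2], "B3Champ": [157, 4, 3],
    "B4Champ": [51, 498, 4], "B5Champ": [902, 235, 5],
    "R1Champ": [164, 23, 11], "R2Champ": [5, 64, 12], "R3Champ": [25, 38, 13],
    "R4Champ": [221, 901, 14], "R5Champ": [497, 16, 15],
    "winner_team": [0, 0, 1],  # 1 = blue side won, 0 = red side won
})

blue_cols = [f"B{i}Champ" for i in range(1, 6)]
red_cols = [f"R{i}Champ" for i in range(1, 6)]

# Compare each row's champion against its five blue and five red champions.
champ = toy["champion_id"].to_numpy()[:, None]
on_blue = (toy[blue_cols].to_numpy() == champ).any(axis=1)
on_red = (toy[red_cols].to_numpy() == champ).any(axis=1)

# Blue is checked first, matching the if/elif order of get_team_side().
toy["team_side"] = np.select([on_blue, on_red], ["Blue", "Red"], default="Unknown")

blue_won = toy["winner_team"].to_numpy() == 1
toy["team_win"] = ((on_blue & blue_won) | (~on_blue & on_red & ~blue_won)).astype(int)

print(toy[["champion_id", "team_side", "team_win"]])
```

Missing champion values (for example after a left merge with no matching team row) compare as unequal, so those rows fall through to `Unknown`, as they do in the row-wise version.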


In [12]:
# Cell 9 - Merge player-level data with team-level results

# Merge raw player data with team-level results
merged_df = raw_df.merge(team_df, on="match_id", how="left")

# Identify which side the player was on by checking their champion
def get_team_side(row):
    blue_team = [row[f"B{i}Champ"] for i in range(1, 6)]
    red_team = [row[f"R{i}Champ"] for i in range(1, 6)]
    if row["champion_id"] in blue_team:
        return "Blue"
    elif row["champion_id"] in red_team:
        return "Red"
    else:
        return "Unknown"

merged_df["team_side"] = merged_df.apply(get_team_side, axis=1)

# Determine if the player's team won
merged_df["team_win"] = merged_df.apply(
    lambda r: 1 if (r["team_side"] == "Blue" and r["winner_team"] == 1)
                 or (r["team_side"] == "Red" and r["winner_team"] == 0)
              else 0,
    axis=1
)

# Select relevant columns for modeling
keep_cols = [
    "player_id", "match_id", "role", "champion_id",
    "kills", "deaths", "assists",
    "total_gold", "MinionsKilled",
    "team_side", "team_win",
    "queue_type", "rank_id",
]

merged_df = merged_df[keep_cols].copy()

print("merged_df ready for modeling!")
print("Shape:", merged_df.shape)
display(merged_df.head(10))


merged_df ready for modeling!
Shape: (30903, 13)


Unnamed: 0,player_id,match_id,role,champion_id,kills,deaths,assists,total_gold,MinionsKilled,team_side,team_win,queue_type,rank_id
0,1,EUW1_7565751492,SUPPORT,902,0,2,12,7058,30,Blue,0,CLASSIC,7
1,1,EUW1_7565549583,SUPPORT,902,2,5,23,9618,29,Red,0,CLASSIC,7
2,1,EUW1_7564803077,SUPPORT,16,0,5,22,9877,34,Red,1,CLASSIC,7
3,1,EUW1_7564257986,SUPPORT,902,0,1,7,6344,28,Red,0,CLASSIC,7
4,1,EUW1_7563685543,SUPPORT,267,1,5,6,7403,36,Red,0,CLASSIC,7
5,1,EUW1_7563605642,SUPPORT,902,2,2,30,11905,31,Blue,1,CLASSIC,7
6,1,EUW1_7556939547,SUPPORT,16,0,2,14,7140,30,Blue,0,CLASSIC,7
7,1,EUW1_7556673814,SUPPORT,267,1,2,26,8961,34,Red,1,CLASSIC,7
8,1,EUW1_7555467853,SUPPORT,902,1,1,12,9291,29,Red,1,CLASSIC,7
9,1,EUW1_7548453301,SUPPORT,147,6,4,19,11640,49,Blue,1,CLASSIC,7


## Cell 10: Enrich `merged_df` with Player Historical Role Statistics

This cell adds historical performance statistics (per player, per role) to the merged player-team dataset.

Steps:
- Renames the columns in `player_role_stats` to clearly indicate that these are historical features (prefix: `hist_`).
- Merges the renamed `player_role_stats` with `merged_df` based on both `player_id` and `role`.
- Produces `model_player_df`, which contains player match data enriched with role-specific historical statistics.

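This merged dataset now includes both per-match stats and each player's aggregate per-role tendencies. The `hist_` values are computed over all of a player's rows in `raw_df`, so they also include the match in the current row.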


In [13]:
# Cell 10 - Enrich merged_df with player_role_stats

# Rename columns to reflect historical performance metrics
player_hist = player_role_stats.rename(
    columns={
        "games_played": "hist_games_played",
        "wins": "hist_wins",
        "winrate": "hist_winrate",
        "avg_kills": "hist_avg_kills",
        "avg_deaths": "hist_avg_deaths",
        "avg_assists": "hist_avg_assists",
        "avg_cs": "hist_avg_cs",
        "avg_gold": "hist_avg_gold",
    }
)

# Merge historical stats into the player-team match dataset
model_player_df = merged_df.merge(
    player_hist,
    on=["player_id", "role"],
    how="left"
)

print("model_player_df shape:", model_player_df.shape)
display(model_player_df.head())


model_player_df shape: (30903, 21)


Unnamed: 0,player_id,match_id,role,champion_id,kills,deaths,assists,total_gold,MinionsKilled,team_side,team_win,queue_type,rank_id,hist_games_played,hist_wins,hist_avg_kills,hist_avg_deaths,hist_avg_assists,hist_avg_cs,hist_avg_gold,hist_winrate
0,1,EUW1_7565751492,SUPPORT,902,0,2,12,7058,30,Blue,0,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
1,1,EUW1_7565549583,SUPPORT,902,2,5,23,9618,29,Red,0,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
2,1,EUW1_7564803077,SUPPORT,16,0,5,22,9877,34,Red,1,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
3,1,EUW1_7564257986,SUPPORT,902,0,1,7,6344,28,Red,0,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
4,1,EUW1_7563685543,SUPPORT,267,1,5,6,7403,36,Red,0,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5


## Cell 11: Aggregate Player Data to Team-Level Format

This cell creates a team-level dataset (`team_model_df`) by aggregating player statistics per team per match.

Steps:
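- Groups data by `match_id` and `team_side` to get one row per team side per match. Only players present in `model_player_df` contribute, so a row can aggregate fewer than five players.
- Aggregates numeric features using the `mean` (plus the `sum` for historical games played) for each team: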
  - In-game stats like kills, deaths, assists, gold, and CS
  - Historical performance features from `player_role_stats`
- Flattens the resulting multi-index columns to a single-level format.
- Merges with `team_win` to preserve the match result (`1` if team won, else `0`).
- Adds a binary indicator column `is_blue_team` for distinguishing blue vs. red side.

This team-level dataset is now ready for training models to predict team victory based on aggregated stats.
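
The column-flattening step is easier to follow on a small example. The sketch below, on made-up toy rows, shows how a dictionary of aggregations produces two-level column names and how they collapse to names such as `hist_games_played_sum`.

```python
import pandas as pd

# Toy player rows for one match; values are illustrative only.
toy = pd.DataFrame({
    "match_id": ["M1", "M1", "M1", "M1"],
    "team_side": ["Blue", "Blue", "Red", "Red"],
    "kills": [2, 4, 1, 3],
    "hist_games_played": [10, 6, 3, 5],
})

# A dict of lists yields MultiIndex columns like ("kills", "mean").
team = toy.groupby(["match_id", "team_side"]).agg(
    {"kills": ["mean"], "hist_games_played": ["mean", "sum"]}
)

# Join the two levels into a single "<column>_<aggregation>" name.
team.columns = [f"{name}_{func}" for name, func in team.columns.to_flat_index()]
team = team.reset_index()

print(team.columns.tolist())
print(team)
```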


In [14]:
# Cell 11 - Aggregate to team-level (team_model_df)

group_cols = ["match_id", "team_side"]

# Define aggregations to apply per team
agg_dict = {
    "kills": ["mean"],
    "deaths": ["mean"],
    "assists": ["mean"],
    "total_gold": ["mean"],
    "MinionsKilled": ["mean"],
    "hist_winrate": ["mean"],
    "hist_games_played": ["mean", "sum"],
    "hist_avg_kills": ["mean"],
    "hist_avg_deaths": ["mean"],
    "hist_avg_assists": ["mean"],
    "hist_avg_cs": ["mean"],
    "hist_avg_gold": ["mean"],
}

# Group player stats by team
team_model_df = (
    model_player_df
    .groupby(group_cols)
    .agg(agg_dict)
)

# Flatten multi-index column names
team_model_df.columns = [
    f"{col[0]}_{col[1]}" for col in team_model_df.columns.to_flat_index()
]

team_model_df = team_model_df.reset_index()

# Merge in the team win outcome
team_win_df = (
    model_player_df
    .groupby(group_cols)["team_win"]
    .max()
    .reset_index()
)

team_model_df = team_model_df.merge(team_win_df, on=group_cols, how="left")

# Add a binary indicator for blue team
team_model_df["is_blue_team"] = (team_model_df["team_side"] == "Blue").astype(int)

print("team_model_df shape:", team_model_df.shape)
display(team_model_df.head())


team_model_df shape: (26287, 17)


Unnamed: 0,match_id,team_side,kills_mean,deaths_mean,assists_mean,total_gold_mean,MinionsKilled_mean,hist_winrate_mean,hist_games_played_mean,hist_games_played_sum,hist_avg_kills_mean,hist_avg_deaths_mean,hist_avg_assists_mean,hist_avg_cs_mean,hist_avg_gold_mean,team_win,is_blue_team
0,EUW1_6681382047,Blue,39.0,1.0,1.0,16025.0,144.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1
1,EUW1_6681412019,Blue,11.0,0.0,8.0,8202.0,93.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1
2,EUW1_6688385247,Blue,22.0,1.0,1.0,15980.0,149.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1
3,EUW1_6688490074,Red,27.0,2.0,4.0,18290.0,154.0,1.0,1.0,1,27.0,2.0,4.0,154.0,18290.0,1,0
4,EUW1_6796881027,Blue,11.0,10.0,6.0,13082.0,143.0,0.4375,16.0,16,9.5625,12.0625,5.625,168.875,13051.75,1,1


## Cell 12: Define Generic Team Synergy Feature Builder

This cell defines a reusable function `build_team_synergy_from_players()` that constructs team-level features capturing synergy and diversity based on individual player stats.

Key Functionalities:
- Validates required input columns (kills, deaths, assists, etc.).
- Aggregates key metrics (mean, standard deviation) for kills, deaths, assists, minions, and KDA.
- Optionally includes minion stats if available.
- Computes **team-level KDA** as the mean of individual KDAs.
- Analyzes **role balance** via:
  - Number of unique roles
  - Role entropy (diversity)
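  - Role imbalance (max - min role count; counts cover every role that occurs anywhere in the data, so a role missing from a team counts as 0)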
- Computes **resource distribution**:
  - Standard deviation and max value for gold, damage, kills, assists, and minion share
  - If explicit share columns aren't present, computes them from raw totals

The output is a team-level DataFrame (`team_syn_df`) with all features prefixed by `syn_`, indexed by `match_id` and `team_side`.

This builder is useful for capturing internal team structure and coordination beyond individual skill.
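
To make the role-entropy and share features concrete, here is a small worked sketch for a single made-up team. Role entropy is computed as `-sum(p * ln(p))` over the role proportions `p`, and a share is a player's value divided by the team total. The toy roles and kill counts are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# One toy team of five players; BOTTOM is doubled, SUPPORT is missing.
team = pd.DataFrame({
    "role": ["TOP", "JUNGLE", "MIDDLE", "BOTTOM", "BOTTOM"],
    "kills": [2, 5, 8, 4, 1],
})

# Role entropy: higher means roles are spread more evenly.
role_probs = team["role"].value_counts(normalize=True)
role_entropy = float(-(role_probs * np.log(role_probs)).sum())

# Kill share: each player's fraction of the team's kills.
kill_share = team["kills"] / team["kills"].sum()
kill_share_std = float(kill_share.std(ddof=0))  # population std, as in the builder
kill_share_max = float(kill_share.max())

print("role entropy  :", round(role_entropy, 4))
print("kill share std:", round(kill_share_std, 4))
print("kill share max:", round(kill_share_max, 4))
```

A team with five distinct roles would reach the maximum entropy of ln 5 (about 1.609), so the doubled role in this toy team gives a lower value. A large `share_max` means one player accounts for most of that resource.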


In [15]:
# Cell 12 - Define Team Synergy Builder

import numpy as np
import pandas as pd

def build_team_synergy_from_players(
    df: pd.DataFrame,
    id_cols = ("match_id", "team_side"),
    role_col: str = "role",
    lane_col: str | None = None,
    kills_col: str = "kills",
    deaths_col: str = "deaths",
    assists_col: str = "assists",
    minions_col: str = "MinionsKilled",
    gold_col: str | None = None,
    dmg_col: str | None = None,
    share_cols: dict | None = None,
):
    """
    Constructs team-level synergy features from player-level data.

    Parameters
    ----------
    df : DataFrame
        One row per player per match.
    id_cols : tuple
        Identifies each team (typically match_id and team_side).
    role_col : str
        Column specifying role (Top/Jungle/Mid/etc).
    lane_col : str | None
        Optional column for lane info (not used by the current implementation).
    kills_col, deaths_col, assists_col, minions_col : str
        Columns representing basic in-game statistics.
    gold_col, dmg_col : str | None
        Raw gold or damage columns (used if share is not precomputed).
    share_cols : dict | None
        Optional precomputed share columns:
        {
          "gold": "gold_share",
          "dmg": "dmg_share",
          "kill": "kill_share",
          "assist": "assist_share",
          "minion": "minion_share",
        }

    Returns
    -------
    team_syn_df : DataFrame
        Aggregated team-level DataFrame with `syn_*` features.
    """

    df = df.copy()

    # Validate required columns
    for col in [*id_cols, kills_col, deaths_col, assists_col]:
        if col not in df.columns:
            raise ValueError(f"Missing required column: {col}")

    has_minions = minions_col in df.columns

    if share_cols is None:
        share_cols = {}

    g = df.groupby(list(id_cols))
    features = []

    # Team size
    syn_n_players = g.size().rename("syn_n_players")
    features.append(syn_n_players)

    # Basic KDA stats
    features += [
        g[kills_col].mean().rename("syn_kills_mean"),
        g[kills_col].std(ddof=0).fillna(0).rename("syn_kills_std"),
        g[deaths_col].mean().rename("syn_deaths_mean"),
        g[deaths_col].std(ddof=0).fillna(0).rename("syn_deaths_std"),
        g[assists_col].mean().rename("syn_assists_mean"),
        g[assists_col].std(ddof=0).fillna(0).rename("syn_assists_std"),
    ]

    # Minion stats (if available)
    if has_minions:
        features += [
            g[minions_col].mean().rename("syn_minions_mean"),
            g[minions_col].std(ddof=0).fillna(0).rename("syn_minions_std"),
        ]

    # Team-level KDA
    deaths_safe = df[deaths_col].replace(0, 1e-3)
    df["_kda_tmp"] = (df[kills_col] + df[assists_col]) / deaths_safe
    features += [
        g["_kda_tmp"].mean().rename("syn_kda_mean"),
        g["_kda_tmp"].std(ddof=0).fillna(0).rename("syn_kda_std"),
    ]

    # Role balance metrics
    if role_col in df.columns:
        role_counts = g[role_col].value_counts().unstack(fill_value=0)
        total_players = role_counts.sum(axis=1).replace(0, np.nan)

        role_probs = role_counts.div(total_players, axis=0).replace(0, np.nan)
        syn_role_entropy = (-(role_probs * np.log(role_probs)).sum(axis=1)).fillna(0)
        syn_role_entropy.name = "syn_role_entropy"

        features += [
            (role_counts > 0).sum(axis=1).rename("syn_role_nunique"),
            syn_role_entropy,
            role_counts.max(axis=1).rename("syn_role_max_count"),
            role_counts.min(axis=1).rename("syn_role_min_count"),
            (role_counts.max(axis=1) - role_counts.min(axis=1)).rename("syn_role_imbalance"),
        ]

    # Resource distribution shares (gold, kills, assists, etc.)
    def compute_share_and_agg(metric_name, share_col_name, base_col_name):
        col_tmp = None
        if share_col_name and share_col_name in df.columns:
            col_tmp = share_col_name
        elif base_col_name and base_col_name in df.columns:
            group_sum = g[base_col_name].transform("sum").replace(0, np.nan)
            col_tmp = f"_tmp_{metric_name}_share"
            df[col_tmp] = df[base_col_name] / group_sum
        else:
            return None
        return (
            g[col_tmp].std(ddof=0).fillna(0).rename(f"syn_{metric_name}_share_std"),
            g[col_tmp].max().rename(f"syn_{metric_name}_share_max")
        )

    share_spec = {
        "gold":   {"share": share_cols.get("gold"),   "base": gold_col},
        "dmg":    {"share": share_cols.get("dmg"),    "base": dmg_col},
        "kill":   {"share": share_cols.get("kill"),   "base": kills_col},
        "assist": {"share": share_cols.get("assist"), "base": assists_col},
        "minion": {"share": share_cols.get("minion"), "base": minions_col if has_minions else None},
    }

    for metric, spec in share_spec.items():
        res = compute_share_and_agg(metric, spec["share"], spec["base"])
        if res:
            features += list(res)

    team_syn_df = pd.concat(features, axis=1).reset_index()
    return team_syn_df
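
As a quick check of the role-balance metric, the sketch below recomputes `syn_role_entropy` for one team by hand. It assumes the same definition as the function above (Shannon entropy with the natural logarithm over the observed role frequencies). The helper name `role_entropy` and the sample role labels are illustrative and not part of the notebook.

```python
import math
from collections import Counter

def role_entropy(roles):
    """Shannon entropy (natural log) of the role distribution in one team."""
    counts = Counter(roles)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Roles that never occur are absent from the Counter, so log(0) cannot happen.
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical five-player team: one role appears twice, three roles once.
team_roles = ["SOLO", "SOLO", "CARRY", "SUPPORT", "NONE"]
entropy = role_entropy(team_roles)
print(round(entropy, 6))  # → 1.332179
```

A team with four distinct roles and a maximum role count of 2 therefore gets an entropy of about 1.332, while a single-role team gets 0.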


## 9️⃣ Add Team Synergy features to the Kaggle dataset

In this step we use `model_player_df` (one row per player per match) to build:

- `kaggle_team_synergy_df` at the **team level**.
- Then we merge it with `team_model_df` so that we end up with:
  - The original team features from Kaggle.
  - Plus synergy features whose names start with `syn_`.
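
Because this is a left join on `(match_id, team_side)`, repeated keys in the synergy table would silently duplicate team rows. The sketch below, on made-up toy frames, shows one way to make the one-row-per-team assumption explicit with the `validate` and `indicator` options of `DataFrame.merge`. The frames and values are hypothetical.

```python
import pandas as pd

# Toy team-level table (three teams) and a synergy table that misses one team.
team_df = pd.DataFrame({
    "match_id": ["M1", "M1", "M2"],
    "team_side": ["Blue", "Red", "Blue"],
    "kills_mean": [5.0, 3.8, 8.8],
})
syn_df = pd.DataFrame({
    "match_id": ["M1", "M1"],
    "team_side": ["Blue", "Red"],
    "syn_kills_std": [2.0, 2.1],
})

merged = team_df.merge(
    syn_df,
    on=["match_id", "team_side"],
    how="left",
    validate="one_to_one",  # raises MergeError if keys repeat on either side
    indicator=True,         # adds a `_merge` column: "both" or "left_only"
)

# Teams without synergy features keep NaN in the syn_ columns.
unmatched = int((merged["_merge"] == "left_only").sum())
print(len(merged), unmatched)
```

Counting `left_only` rows is a cheap way to see how many teams ended up without any `syn_` features.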


In [16]:
# ============================================
# 🧱 Build Kaggle team-level synergy features
# ============================================

if "model_player_df" not in globals():
    raise RuntimeError("❌ model_player_df is missing. Please run all Kaggle preprocessing cells above.")

if "team_model_df" not in globals():
    raise RuntimeError("❌ team_model_df is missing. Please make sure the team_model_df construction cell ran successfully.")

# Build team-level synergy features from the Kaggle player-level rows
kaggle_team_syn_df = build_team_synergy_from_players(
    model_player_df,
    id_cols=("match_id", "team_side"),
    role_col="role",
    lane_col=None,
    kills_col="kills",
    deaths_col="deaths",
    assists_col="assists",
    minions_col="MinionsKilled",
    gold_col="total_gold",
    dmg_col="DmgDealt",
    share_cols={  # Kaggle has no precomputed share columns; shares are derived from the base columns
        "gold": None,
        "dmg": None,
        "kill": None,
        "assist": None,
        "minion": None,
    },
)

print("kaggle_team_syn_df shape:", kaggle_team_syn_df.shape)
display(kaggle_team_syn_df.head())

# Attach the synergy features to the existing Kaggle team-level table
kaggle_team_full_df = team_model_df.merge(
    kaggle_team_syn_df,
    on=["match_id", "team_side"],
    how="left",
)

print("kaggle_team_full_df shape:", kaggle_team_full_df.shape)
display(kaggle_team_full_df.head())


kaggle_team_syn_df shape: (26287, 26)


Unnamed: 0,match_id,team_side,syn_n_players,syn_kills_mean,syn_kills_std,syn_deaths_mean,syn_deaths_std,syn_assists_mean,syn_assists_std,syn_minions_mean,syn_minions_std,syn_kda_mean,syn_kda_std,syn_role_nunique,syn_role_entropy,syn_role_max_count,syn_role_min_count,syn_role_imbalance,syn_gold_share_std,syn_gold_share_max,syn_kill_share_std,syn_kill_share_max,syn_assist_share_std,syn_assist_share_max,syn_minion_share_std,syn_minion_share_max
0,EUW1_6681382047,Blue,1,39.0,0.0,1.0,0.0,1.0,0.0,144.0,0.0,40.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
1,EUW1_6681412019,Blue,1,11.0,0.0,0.0,0.0,8.0,0.0,93.0,0.0,19000.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
2,EUW1_6688385247,Blue,1,22.0,0.0,1.0,0.0,1.0,0.0,149.0,0.0,23.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
3,EUW1_6688490074,Red,1,27.0,0.0,2.0,0.0,4.0,0.0,154.0,0.0,15.5,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
4,EUW1_6796881027,Blue,1,11.0,0.0,10.0,0.0,6.0,0.0,143.0,0.0,1.7,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0


kaggle_team_full_df shape: (26287, 41)


Unnamed: 0,match_id,team_side,kills_mean,deaths_mean,assists_mean,total_gold_mean,MinionsKilled_mean,hist_winrate_mean,hist_games_played_mean,hist_games_played_sum,hist_avg_kills_mean,hist_avg_deaths_mean,hist_avg_assists_mean,hist_avg_cs_mean,hist_avg_gold_mean,team_win,is_blue_team,syn_n_players,syn_kills_mean,syn_kills_std,syn_deaths_mean,syn_deaths_std,syn_assists_mean,syn_assists_std,syn_minions_mean,syn_minions_std,syn_kda_mean,syn_kda_std,syn_role_nunique,syn_role_entropy,syn_role_max_count,syn_role_min_count,syn_role_imbalance,syn_gold_share_std,syn_gold_share_max,syn_kill_share_std,syn_kill_share_max,syn_assist_share_std,syn_assist_share_max,syn_minion_share_std,syn_minion_share_max
0,EUW1_6681382047,Blue,39.0,1.0,1.0,16025.0,144.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1,1,39.0,0.0,1.0,0.0,1.0,0.0,144.0,0.0,40.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
1,EUW1_6681412019,Blue,11.0,0.0,8.0,8202.0,93.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1,1,11.0,0.0,0.0,0.0,8.0,0.0,93.0,0.0,19000.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
2,EUW1_6688385247,Blue,22.0,1.0,1.0,15980.0,149.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1,1,22.0,0.0,1.0,0.0,1.0,0.0,149.0,0.0,23.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
3,EUW1_6688490074,Red,27.0,2.0,4.0,18290.0,154.0,1.0,1.0,1,27.0,2.0,4.0,154.0,18290.0,1,0,1,27.0,0.0,2.0,0.0,4.0,0.0,154.0,0.0,15.5,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
4,EUW1_6796881027,Blue,11.0,10.0,6.0,13082.0,143.0,0.4375,16.0,16,9.5625,12.0625,5.625,168.875,13051.75,1,1,1,11.0,0.0,10.0,0.0,6.0,0.0,143.0,0.0,1.7,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0


## 🔟 Load Riot API dataset (`lol_full_clean.csv`)

Here we load the Riot dataset that contains:

- One row per **player per match**.
- Columns such as: `kills`, `deaths`, `assists`, `MinionsKilled`, `goldPerMin`, `dmgPerMin`, `gold_share`, and others.

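> Make sure that the CSV is available in the same directory as this notebook. The loading cell reads the path stored in `RIOT_CSV` (currently `lol_full_clean_1.csv`), so adjust that value if your export uses a different filename.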


In [17]:
# ============================================
# 📂 Load Riot API per-player dataset
# ============================================
from pathlib import Path
RIOT_CSV = Path("lol_full_clean_1.csv")
...
riot_df = pd.read_csv(RIOT_CSV)
...



### Code Cell 16: Derive team side and normalize the win label

This cell derives `team_side` from `teamId` (100 is Blue, anything else is Red) and converts the `win` column to integer 0/1, treating missing values as losses.
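
A `win` column can arrive as real booleans, as the strings `"True"` / `"False"`, or with missing values. The sketch below shows one possible way to normalize such a column with an explicit mapping. The toy series and the name `WIN_MAP` are assumptions for illustration, and the choice to count missing values as losses simply mirrors the `fillna(0)` in the cell.

```python
import pandas as pd

# Toy column mixing the encodings a raw export might contain.
win_raw = pd.Series([True, "False", "True", False, None])

# Explicit mapping for every accepted encoding (hypothetical helper constant).
WIN_MAP = {True: 1, False: 0, "True": 1, "False": 0}

win_int = (
    win_raw
    .map(WIN_MAP)  # values not in the mapping (including missing) become NaN
    .fillna(0)     # treat missing values as losses
    .astype(int)   # final integer 0/1 label
)
print(win_int.tolist())
```

Unlike `replace`, `map` turns any unexpected value into `NaN`, which makes stray encodings easier to spot before the `fillna` step.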


In [18]:
# Map the Riot teamId to a side label: 100 is Blue, anything else is Red
riot_df["team_side"] = np.where(riot_df["teamId"] == 100, "Blue", "Red")

# Normalize the win flag to integer 0/1
riot_df["win"] = (
    riot_df["win"]
    .replace({"True": 1, "False": 0, True: 1, False: 0})  # strings and booleans -> 1/0
    .fillna(0)                                             # treat NaN as 0 (loss)
    .astype(int)                                           # cast to int
)




## 1️⃣1️⃣ Build Riot team‑level dataset with synergy

In this section we aggregate the Riot data from **player level → team level**:

1. Compute basic numeric aggregates (e.g., `kills_mean`, `deaths_mean`, etc.).
2. Build `riot_team_syn_df` using the same synergy function that we used for Kaggle.
3. Merge them into a single DataFrame called `riot_team_full_df`.
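
The player-to-team aggregation in step 1 can also be written with pandas named aggregation, which yields flat column names directly. This is a minimal sketch on invented data, not the notebook's implementation; the output names are chosen to match the `<column>_<agg>` pattern used in the cell below.

```python
import pandas as pd

# Toy player-level rows: two players per team in one match.
players = pd.DataFrame({
    "match_id": ["M1"] * 4,
    "team_side": ["Blue", "Blue", "Red", "Red"],
    "kills": [4, 6, 2, 8],
    "team_gold": [50000, 50000, 48000, 48000],
    "win": [1, 1, 0, 0],
})

team = (
    players
    .groupby(["match_id", "team_side"], as_index=False)
    .agg(
        kills_mean=("kills", "mean"),            # per-player stat -> team mean
        team_gold_first=("team_gold", "first"),  # team-wide value repeated on each row
        team_win=("win", "max"),                 # 1 if any player row is a win
    )
)
print(team)
```

With named aggregation there is no MultiIndex to flatten afterwards, at the cost of listing each output column explicitly.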


In [19]:
# ============================================
# 🧱 Riot: aggregate to team-level + add synergy
# ============================================

group_cols = ["match_id", "team_side"]

# 1) basic team-level aggregation
agg_dict_riot = {
    "kills": ["mean"],
    "deaths": ["mean"],
    "assists": ["mean"],
    "MinionsKilled": ["mean"],
    "csPerMin": ["mean"],
    "goldPerMin": ["mean"],
    "dmgPerMin": ["mean"],
    "kda": ["mean"],
    "team_gold": ["first"],
    "team_kills": ["first"],
    "team_deaths": ["first"],
    "team_assists": ["first"],
    "team_damage": ["first"],
    "team_minions": ["first"],
    "team_baron_kills": ["first"],
    "team_dragon_kills": ["first"],
    "team_tower_kills": ["first"],
    "team_inhibitor_kills": ["first"],
    "team_riftHerald_kills": ["first"],
    "team_objectives_win": ["first"],
}

missing_cols = [c for c in agg_dict_riot.keys() if c not in riot_df.columns]
if missing_cols:
    print("⚠️ Note: these columns do not exist in riot_df and will be ignored:", missing_cols)
    for c in missing_cols:
        agg_dict_riot.pop(c, None)

riot_team_df = (
    riot_df
    .groupby(group_cols)
    .agg(agg_dict_riot)
)

riot_team_df.columns = [
    f"{col[0]}_{col[1]}" for col in riot_team_df.columns.to_flat_index()
]
riot_team_df = riot_team_df.reset_index()

# Team label: 1 if any player row on the team has win == 1
riot_team_win = (
    riot_df
    .groupby(group_cols)["win"]
    .max()
    .reset_index()
    .rename(columns={"win": "team_win"})
)

riot_team_df = riot_team_df.merge(riot_team_win, on=group_cols, how="left")

# is_blue_team
riot_team_df["is_blue_team"] = (riot_team_df["team_side"] == "Blue").astype(int)

print("riot_team_df shape:", riot_team_df.shape)
display(riot_team_df.head())

# 2) build synergy from player-level
riot_team_syn_df = build_team_synergy_from_players(
    riot_df,
    id_cols=("match_id", "team_side"),
    role_col="role",
    lane_col="lane",
    kills_col="kills",
    deaths_col="deaths",
    assists_col="assists",
    minions_col="MinionsKilled",
    gold_col="goldEarned" if "goldEarned" in riot_df.columns else None,
    dmg_col="totalDamageDealtToChampions" if "totalDamageDealtToChampions" in riot_df.columns else None,
    share_cols={  # Use Riot's precomputed share columns when they exist
        "gold": "gold_share" if "gold_share" in riot_df.columns else None,
        "dmg": "dmg_share" if "dmg_share" in riot_df.columns else None,
        "kill": "kill_share" if "kill_share" in riot_df.columns else None,
        "assist": "assist_share" if "assist_share" in riot_df.columns else None,
        "minion": "minion_share" if "minion_share" in riot_df.columns else None,
    },
)

print("riot_team_syn_df shape:", riot_team_syn_df.shape)
display(riot_team_syn_df.head())

# 3) merge base + synergy
riot_team_full_df = riot_team_df.merge(
    riot_team_syn_df,
    on=["match_id", "team_side"],
    how="left",
)

print("riot_team_full_df shape:", riot_team_full_df.shape)
display(riot_team_full_df.head())


riot_team_df shape: (13424, 24)


Unnamed: 0,match_id,team_side,kills_mean,deaths_mean,assists_mean,MinionsKilled_mean,csPerMin_mean,goldPerMin_mean,dmgPerMin_mean,kda_mean,team_gold_first,team_kills_first,team_deaths_first,team_assists_first,team_damage_first,team_minions_first,team_baron_kills_first,team_dragon_kills_first,team_tower_kills_first,team_inhibitor_kills_first,team_riftHerald_kills_first,team_objectives_win_first,team_win,is_blue_team
0,EUW1_7548714571,Blue,5.0,3.8,9.6,179.8,6.18578,419.917431,913.12844,8.353333,61028,25,19,48,132708,899,1,2,6,0,1,True,1,1
1,EUW1_7548714571,Red,3.8,5.0,5.6,178.4,6.137615,369.694954,649.12844,1.97,53729,19,25,28,94340,892,0,3,4,0,0,False,0,0
2,EUW1_7557866224,Blue,8.8,6.0,11.6,152.6,5.747646,456.580038,992.949153,3.885887,60611,44,30,58,131814,763,0,2,10,1,1,True,1,1
3,EUW1_7557866224,Red,6.0,8.8,8.8,141.4,5.3258,395.630885,641.205273,1.906753,52520,30,44,44,85120,707,0,1,2,0,0,False,0,0
4,EUW1_7557877123,Blue,7.0,3.4,9.2,164.4,6.787452,464.361328,831.25845,6.416667,56233,35,17,46,100665,822,0,1,8,1,1,True,1,1


riot_team_syn_df shape: (13424, 28)


Unnamed: 0,match_id,team_side,syn_n_players,syn_kills_mean,syn_kills_std,syn_deaths_mean,syn_deaths_std,syn_assists_mean,syn_assists_std,syn_minions_mean,syn_minions_std,syn_kda_mean,syn_kda_std,syn_role_nunique,syn_role_entropy,syn_role_max_count,syn_role_min_count,syn_role_imbalance,syn_gold_share_std,syn_gold_share_max,syn_dmg_share_std,syn_dmg_share_max,syn_kill_share_std,syn_kill_share_max,syn_assist_share_std,syn_assist_share_max,syn_minion_share_std,syn_minion_share_max
0,EUW1_7548714571,Blue,5,5.0,2.0,3.8,2.315167,9.6,3.611094,179.8,84.712219,8.353333,7.650813,4,1.332179,2,0,2,0.035245,0.231369,0.059062,0.29094,0.08,0.32,0.075231,0.291667,0.094229,0.283648
1,EUW1_7548714571,Red,5,3.8,2.135416,5.0,1.095445,5.6,3.32265,178.4,88.010454,1.97,0.763937,4,1.332179,2,0,2,0.041087,0.267826,0.066463,0.31838,0.11239,0.368421,0.118666,0.428571,0.098666,0.330717
2,EUW1_7557866224,Blue,5,8.8,6.368673,6.0,2.828427,11.6,5.885576,152.6,74.168996,3.885887,1.161042,4,1.332179,2,0,2,0.044525,0.271782,0.090272,0.314458,0.144743,0.431818,0.101475,0.37931,0.097207,0.290957
3,EUW1_7557866224,Red,5,6.0,2.607681,8.8,2.4,8.8,3.059412,141.4,57.2,1.906753,0.965058,4,1.332179,2,0,2,0.039781,0.257483,0.065379,0.29113,0.086923,0.333333,0.069532,0.295455,0.080905,0.265912
4,EUW1_7557877123,Blue,5,7.0,5.215362,3.4,2.059126,9.2,5.491812,164.4,74.802674,6.416667,3.691582,4,1.332179,2,0,2,0.035262,0.266107,0.056277,0.26163,0.14901,0.457143,0.119387,0.434783,0.091001,0.270073


riot_team_full_df shape: (13424, 50)


Unnamed: 0,match_id,team_side,kills_mean,deaths_mean,assists_mean,MinionsKilled_mean,csPerMin_mean,goldPerMin_mean,dmgPerMin_mean,kda_mean,team_gold_first,team_kills_first,team_deaths_first,team_assists_first,team_damage_first,team_minions_first,team_baron_kills_first,team_dragon_kills_first,team_tower_kills_first,team_inhibitor_kills_first,team_riftHerald_kills_first,team_objectives_win_first,team_win,is_blue_team,syn_n_players,syn_kills_mean,syn_kills_std,syn_deaths_mean,syn_deaths_std,syn_assists_mean,syn_assists_std,syn_minions_mean,syn_minions_std,syn_kda_mean,syn_kda_std,syn_role_nunique,syn_role_entropy,syn_role_max_count,syn_role_min_count,syn_role_imbalance,syn_gold_share_std,syn_gold_share_max,syn_dmg_share_std,syn_dmg_share_max,syn_kill_share_std,syn_kill_share_max,syn_assist_share_std,syn_assist_share_max,syn_minion_share_std,syn_minion_share_max
0,EUW1_7548714571,Blue,5.0,3.8,9.6,179.8,6.18578,419.917431,913.12844,8.353333,61028,25,19,48,132708,899,1,2,6,0,1,True,1,1,5,5.0,2.0,3.8,2.315167,9.6,3.611094,179.8,84.712219,8.353333,7.650813,4,1.332179,2,0,2,0.035245,0.231369,0.059062,0.29094,0.08,0.32,0.075231,0.291667,0.094229,0.283648
1,EUW1_7548714571,Red,3.8,5.0,5.6,178.4,6.137615,369.694954,649.12844,1.97,53729,19,25,28,94340,892,0,3,4,0,0,False,0,0,5,3.8,2.135416,5.0,1.095445,5.6,3.32265,178.4,88.010454,1.97,0.763937,4,1.332179,2,0,2,0.041087,0.267826,0.066463,0.31838,0.11239,0.368421,0.118666,0.428571,0.098666,0.330717
2,EUW1_7557866224,Blue,8.8,6.0,11.6,152.6,5.747646,456.580038,992.949153,3.885887,60611,44,30,58,131814,763,0,2,10,1,1,True,1,1,5,8.8,6.368673,6.0,2.828427,11.6,5.885576,152.6,74.168996,3.885887,1.161042,4,1.332179,2,0,2,0.044525,0.271782,0.090272,0.314458,0.144743,0.431818,0.101475,0.37931,0.097207,0.290957
3,EUW1_7557866224,Red,6.0,8.8,8.8,141.4,5.3258,395.630885,641.205273,1.906753,52520,30,44,44,85120,707,0,1,2,0,0,False,0,0,5,6.0,2.607681,8.8,2.4,8.8,3.059412,141.4,57.2,1.906753,0.965058,4,1.332179,2,0,2,0.039781,0.257483,0.065379,0.29113,0.086923,0.333333,0.069532,0.295455,0.080905,0.265912
4,EUW1_7557877123,Blue,7.0,3.4,9.2,164.4,6.787452,464.361328,831.25845,6.416667,56233,35,17,46,100665,822,0,1,8,1,1,True,1,1,5,7.0,5.215362,3.4,2.059126,9.2,5.491812,164.4,74.802674,6.416667,3.691582,4,1.332179,2,0,2,0.035262,0.266107,0.056277,0.26163,0.14901,0.457143,0.119387,0.434783,0.091001,0.270073


## 1️⃣2️⃣ Merge Kaggle + Riot (enriched with synergy)

At this point we have:

- `kaggle_team_full_df` from Kaggle + synergy.
- `riot_team_full_df` from Riot + synergy.

In this cell we:

1. Select the numeric columns that both datasets share.
2. Build `full_train_df` containing:
   - `team_win` as the target label.
   - All common numeric features + a `dataset_source` flag indicating Kaggle vs Riot.
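
The column-intersection step can be tried on two small frames before running it on the real data. The sketch below uses invented values and also prints the label rate per `dataset_source`, which is a simple way to notice if one source is heavily skewed. All frame names and numbers here are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the two team-level tables; each has one source-specific column.
kaggle_toy = pd.DataFrame({
    "team_win": [1, 0, 1],
    "kills_mean": [5.0, 3.0, 7.0],
    "hist_winrate_mean": [0.5, 0.4, 0.6],
})
riot_toy = pd.DataFrame({
    "team_win": [1, 0],
    "kills_mean": [6.0, 2.0],
    "kda_mean": [3.2, 1.1],
})

# Keep only numeric columns present in both frames, excluding the label.
common = sorted(
    set(kaggle_toy.select_dtypes("number").columns)
    & set(riot_toy.select_dtypes("number").columns)
)
features = [c for c in common if c != "team_win"]

full = pd.concat(
    [
        kaggle_toy[["team_win"] + features].assign(dataset_source=0),
        riot_toy[["team_win"] + features].assign(dataset_source=1),
    ],
    ignore_index=True,
)

# Row count and win rate per source.
summary = full.groupby("dataset_source")["team_win"].agg(["size", "mean"])
print(features)
print(summary)
```

Source-specific columns such as `hist_winrate_mean` or `kda_mean` drop out of the intersection, which is why the merged feature set is smaller than either input.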


In [20]:
# ============================================
# 🔗 Merge Kaggle + Riot into one training set
# ============================================

if "kaggle_team_full_df" not in globals():
    raise RuntimeError("❌ kaggle_team_full_df is missing. Make sure the Kaggle Synergy cell ran successfully.")

if "riot_team_full_df" not in globals():
    raise RuntimeError("❌ riot_team_full_df is missing. Make sure the Riot Synergy cell ran successfully.")

kaggle_df = kaggle_team_full_df.copy()
riot_df_team = riot_team_full_df.copy()

# Make sure the label is numeric in both frames
for df_tmp in (kaggle_df, riot_df_team):
    if df_tmp["team_win"].dtype not in ("int64", "float64"):
        df_tmp["team_win"] = df_tmp["team_win"].astype(int)

# Keep only the numeric (or boolean) columns that both datasets share
kaggle_num_cols = kaggle_df.select_dtypes(include=["number", "bool"]).columns.tolist()
riot_num_cols = riot_df_team.select_dtypes(include=["number", "bool"]).columns.tolist()

common_numeric = sorted(set(kaggle_num_cols) & set(riot_num_cols))

target_col = "team_win"
feature_cols_v5 = [c for c in common_numeric if c != target_col]

print("Number of shared numeric features between Kaggle and Riot:", len(feature_cols_v5))
print("Sample of features:", feature_cols_v5[:30])

kaggle_model_df = kaggle_df[[target_col] + feature_cols_v5].copy()
riot_model_df = riot_df_team[[target_col] + feature_cols_v5].copy()

kaggle_model_df["dataset_source"] = 0  # Kaggle
riot_model_df["dataset_source"] = 1    # Riot

full_train_df = pd.concat([kaggle_model_df, riot_model_df], ignore_index=True)

print("full_train_df shape:", full_train_df.shape)
display(full_train_df.head())


Number of shared numeric features between Kaggle and Riot: 29
Sample of features: ['MinionsKilled_mean', 'assists_mean', 'deaths_mean', 'is_blue_team', 'kills_mean', 'syn_assist_share_max', 'syn_assist_share_std', 'syn_assists_mean', 'syn_assists_std', 'syn_deaths_mean', 'syn_deaths_std', 'syn_gold_share_max', 'syn_gold_share_std', 'syn_kda_mean', 'syn_kda_std', 'syn_kill_share_max', 'syn_kill_share_std', 'syn_kills_mean', 'syn_kills_std', 'syn_minion_share_max', 'syn_minion_share_std', 'syn_minions_mean', 'syn_minions_std', 'syn_n_players', 'syn_role_entropy', 'syn_role_imbalance', 'syn_role_max_count', 'syn_role_min_count', 'syn_role_nunique']
full_train_df shape: (39711, 31)


Unnamed: 0,team_win,MinionsKilled_mean,assists_mean,deaths_mean,is_blue_team,kills_mean,syn_assist_share_max,syn_assist_share_std,syn_assists_mean,syn_assists_std,syn_deaths_mean,syn_deaths_std,syn_gold_share_max,syn_gold_share_std,syn_kda_mean,syn_kda_std,syn_kill_share_max,syn_kill_share_std,syn_kills_mean,syn_kills_std,syn_minion_share_max,syn_minion_share_std,syn_minions_mean,syn_minions_std,syn_n_players,syn_role_entropy,syn_role_imbalance,syn_role_max_count,syn_role_min_count,syn_role_nunique,dataset_source
0,1,144.0,1.0,1.0,1,39.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,40.0,0.0,1.0,0.0,39.0,0.0,1.0,0.0,144.0,0.0,1,-0.0,1,1,0,1,0
1,1,93.0,8.0,0.0,1,11.0,1.0,0.0,8.0,0.0,0.0,0.0,1.0,0.0,19000.0,0.0,1.0,0.0,11.0,0.0,1.0,0.0,93.0,0.0,1,-0.0,1,1,0,1,0
2,1,149.0,1.0,1.0,1,22.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,23.0,0.0,1.0,0.0,22.0,0.0,1.0,0.0,149.0,0.0,1,-0.0,1,1,0,1,0
3,1,154.0,4.0,2.0,0,27.0,1.0,0.0,4.0,0.0,2.0,0.0,1.0,0.0,15.5,0.0,1.0,0.0,27.0,0.0,1.0,0.0,154.0,0.0,1,-0.0,1,1,0,1,0
4,1,143.0,6.0,10.0,1,11.0,1.0,0.0,6.0,0.0,10.0,0.0,1.0,0.0,1.7,0.0,1.0,0.0,11.0,0.0,1.0,0.0,143.0,0.0,1,-0.0,1,1,0,1,0


## 1️⃣3️⃣ Train merged Random Forest model (v5)

Now we train the final v5 model using `full_train_df`:

- `y = team_win` (binary label).
- `X = all shared numeric features + dataset_source`.
- We fit a `RandomForestClassifier`, then print:
  - Accuracy on a held‑out validation split.
  - The classification report.
  - The confusion matrix.
- Finally, we save the trained model and the feature list under `model_artifacts_v5`.
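
Since the model is fitted on a plain NumPy array, any later use of the saved model has to supply the columns in the same order as the saved feature list. The sketch below shows one way to read such a list back and align a new frame to it, assuming the one-name-per-line format written by the training cell. The helpers `load_feature_list` and `align_features`, the temporary file, and the choice to fill missing columns with 0 are illustrative assumptions.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_feature_list(path):
    """Read one feature name per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]

def align_features(df, feature_names, fill_value=0.0):
    """Return df with exactly the training columns, in training order."""
    # Missing columns are created with fill_value; extra columns are dropped.
    return df.reindex(columns=feature_names, fill_value=fill_value)

# Write a small stand-in feature file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    features_path = Path(tmp) / "features.txt"
    features_path.write_text("kills_mean\nsyn_kda_mean\ndataset_source\n")
    feature_names = load_feature_list(features_path)

new_team = pd.DataFrame({"syn_kda_mean": [3.5], "kills_mean": [6.0], "extra_col": [1]})
X_new = align_features(new_team, feature_names)
print(X_new.columns.tolist())  # → ['kills_mean', 'syn_kda_mean', 'dataset_source']
```

Filling a missing `dataset_source` with 0 labels the row as Kaggle-like under the encoding used in this notebook, so in practice that column is better set explicitly.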


In [21]:
# ============================================
# 🤖 Train merged Random Forest model (v5)
# ============================================

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from pathlib import Path
import joblib

if "full_train_df" not in globals():
    raise RuntimeError("❌ full_train_df is missing. Please run the merge cell before training.")

target_col = "team_win"

# Use every column except the target as a feature (this includes dataset_source)
feature_cols_v5 = [c for c in full_train_df.columns if c != target_col]

X = full_train_df[feature_cols_v5].values
y = full_train_df[target_col].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf_model_v5 = RandomForestClassifier(
    n_estimators=500,
    max_depth=None,
    min_samples_split=4,
    min_samples_leaf=2,
    n_jobs=-1,
    random_state=42,
    class_weight="balanced_subsample",
)

rf_model_v5.fit(X_train, y_train)

y_pred = rf_model_v5.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(f"✅ Merged v5 Test Accuracy: {acc:.4f}\n")

print("Classification Report (v5):")
print(classification_report(y_test, y_pred))

print("Confusion Matrix (v5):")
print(confusion_matrix(y_test, y_pred))

# ============================================
# 💾 Save model + feature list
# ============================================

OUTPUT_DIR = Path("model_artifacts_v5")
OUTPUT_DIR.mkdir(exist_ok=True)

model_path = OUTPUT_DIR / "lol_team_comp_rf_model_v5.pkl"
features_path = OUTPUT_DIR / "lol_team_comp_features_v5.txt"

joblib.dump(rf_model_v5, model_path)
print(f"✅ Saved merged v5 model to: {model_path}")

with open(features_path, "w") as f:
    for col in feature_cols_v5:
        f.write(col + "\n")
print(f"✅ Saved merged v5 feature list to: {features_path}")


✅ Merged v5 Test Accuracy: 0.8458

Classification Report (v5):
              precision    recall  f1-score   support

           0       0.84      0.85      0.85      3939
           1       0.85      0.84      0.85      4004

    accuracy                           0.85      7943
   macro avg       0.85      0.85      0.85      7943
weighted avg       0.85      0.85      0.85      7943

Confusion Matrix (v5):
[[3346  593]
 [ 632 3372]]
✅ Saved merged v5 model to: model_artifacts_v5/lol_team_comp_rf_model_v5.pkl
✅ Saved merged v5 feature list to: model_artifacts_v5/lol_team_comp_features_v5.txt


### Code Cell 20: Fetch Riot players and build per‑role history

This cell calls the Riot API for a fixed set of players, fetches each player's 20 most recent ranked matches, and aggregates them into a per‑player, per‑role history table (`player_role_df_api`) with games, wins, average stats, and winrate. The cell itself stops at displaying this table; it does not yet enumerate role assignments or score them with the v5 model.
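
Enumerating role assignments for five players can be sketched with `itertools.permutations`. Everything below is hypothetical: the role labels, the player names, the winrate table, and `placeholder_score`, which stands in for a model-based score and is not the v5 model.

```python
from itertools import permutations

ROLES = ["TOP", "JUNGLE", "MID", "ADC", "SUPPORT"]  # hypothetical role labels
players = ["P1", "P2", "P3", "P4", "P5"]

# Hypothetical per-(player, role) winrates; unknown pairs default to 0.5.
winrate = {("P1", "ADC"): 0.73, ("P2", "TOP"): 0.67, ("P3", "MID"): 0.60}

def placeholder_score(assignment):
    """Stand-in for a model score: mean historical winrate of the line-up."""
    return sum(winrate.get((p, r), 0.5) for p, r in assignment) / len(assignment)

# Each permutation of players is matched position by position with ROLES,
# giving 5! = 120 candidate line-ups.
candidates = [tuple(zip(perm, ROLES)) for perm in permutations(players)]

top3 = sorted(candidates, key=placeholder_score, reverse=True)[:3]
for lineup in top3:
    print(round(placeholder_score(lineup), 3), lineup)
```

With five players and five roles the search space is small enough to score exhaustively, so no heuristic search is needed.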


In [22]:
# ============================================
# 🎯 Predict Best Role Composition for Real Players (API)
# ============================================

import os, time, requests, pandas as pd, numpy as np
from itertools import permutations
from tqdm import tqdm

# Ensure Riot API key is available
if "RIOT_API_KEY" not in os.environ:
    os.environ["RIOT_API_KEY"] = input("Paste your Riot API key: ").strip()

KEY = os.environ["RIOT_API_KEY"]
SESSION = requests.Session()
SESSION.headers.update({"X-Riot-Token": KEY})

PLATFORM = "euw1"   # LoL platform (EUW)
REGION   = "europe" # Match API region

riot_players = [
    {"gameName": "Kenal",   "tagLine": "EUW"},
    {"gameName": "Maynter", "tagLine": "EUW"},
    {"gameName": "Kozi",    "tagLine": "Z10"},
    {"gameName": "Fleshy",  "tagLine": "EU1"},
    {"gameName": "Koldo",   "tagLine": "1233"},
]

# ============================
# 🧩 Utility: Get player PUUID
# ============================
def get_puuid(gameName, tagLine):
    url = f"https://{REGION}.api.riotgames.com/riot/account/v1/accounts/by-riot-id/{gameName}/{tagLine}"
    r = SESSION.get(url)
    if r.status_code != 200:
        print(f"❌ Error {r.status_code} for {gameName}#{tagLine}")
        return None
    return r.json()["puuid"]

# ============================
# 🧩 Get recent matches + roles
# ============================
def get_recent_matches(puuid, n=20):
    url = f"https://{REGION}.api.riotgames.com/lol/match/v5/matches/by-puuid/{puuid}/ids?type=ranked&start=0&count={n}"
    r = SESSION.get(url)
    if r.status_code != 200:
        return []
    return r.json()

def get_player_stats(puuid, match_id):
    url = f"https://{REGION}.api.riotgames.com/lol/match/v5/matches/{match_id}"
    r = SESSION.get(url)
    if r.status_code != 200:
        return None
    data = r.json()
    for p in data["info"]["participants"]:
        if p["puuid"] == puuid:
            return {
                "lane": p.get("lane"),
                "role": p.get("role"),
                "win": int(p.get("win", False)),
                "kills": p.get("kills"),
                "deaths": p.get("deaths"),
                "assists": p.get("assists"),
                "cs": p.get("totalMinionsKilled", 0) + p.get("neutralMinionsKilled", 0),
                "gold": p.get("goldEarned"),
            }
    return None

# ============================
# 📊 Build player-role history (API)
# ============================
player_role_stats_api = []

for pl in tqdm(riot_players, desc="Fetching players"):
    puuid = get_puuid(pl["gameName"], pl["tagLine"])
    if not puuid:
        continue

    matches = get_recent_matches(puuid)
    all_stats = []
    for mid in matches:
        s = get_player_stats(puuid, mid)
        if s:
            all_stats.append(s)
        time.sleep(1.2)  # avoid hitting rate limit

    if not all_stats:
        continue

    df = pd.DataFrame(all_stats)
    grouped = df.groupby("role").agg(
        games=("win", "count"),
        wins=("win", "sum"),
        avg_kills=("kills", "mean"),
        avg_deaths=("deaths", "mean"),
        avg_assists=("assists", "mean"),
        avg_cs=("cs", "mean"),
        avg_gold=("gold", "mean"),
    )
    grouped["winrate"] = grouped["wins"] / grouped["games"]
    grouped["player"] = pl["gameName"]
    grouped.reset_index(inplace=True)
    player_role_stats_api.append(grouped)

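# Concatenate all players into a single DataFrame (fail early if nothing was fetched)
if not player_role_stats_api:
    raise RuntimeError("❌ No player stats were fetched. Check the API key and the Riot IDs.")
player_role_df_api = pd.concat(player_role_stats_api, ignore_index=True)
display(player_role_df_api)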


Paste your Riot API key: [REDACTED]


Fetching players: 100%|██████████| 5/5 [02:24<00:00, 28.99s/it]


Unnamed: 0,role,games,wins,avg_kills,avg_deaths,avg_assists,avg_cs,avg_gold,winrate,player
0,CARRY,11,8,9.0,5.090909,7.818182,247.909091,14884.0,0.727273,Kenal
1,DUO,4,4,4.5,3.75,10.0,205.75,11824.25,1.0,Kenal
2,SOLO,2,1,5.5,5.0,5.5,192.5,9902.5,0.5,Kenal
3,SUPPORT,3,3,4.333333,5.333333,10.0,84.0,8775.333333,1.0,Kenal
4,CARRY,1,1,5.0,5.0,11.0,190.0,11084.0,1.0,Maynter
5,DUO,5,2,4.6,5.4,7.2,221.8,12944.2,0.4,Maynter
6,NONE,1,0,7.0,7.0,9.0,155.0,11368.0,0.0,Maynter
7,SOLO,12,8,5.25,3.833333,7.083333,223.583333,12653.583333,0.666667,Maynter
8,SUPPORT,1,0,2.0,7.0,7.0,175.0,9795.0,0.0,Maynter
9,CARRY,1,1,18.0,2.0,3.0,227.0,17186.0,1.0,Kozi


## 1️⃣4️⃣ Top 3 Team Compositions by historical winrate (Riot data)

In this part we use the Riot player‑level data `riot_df` to:

1. Build, for each team in each match, a **composition** represented as a list of champions.
2. Compute the **winrate for each composition**, using only compositions that have enough matches.
3. Display the **top 3 compositions** with the highest winrate along with the number of games played.

> Note: this analysis is purely historical and depends only on the Riot data you downloaded, not on the ML model.  
> In other words, these compositions are simply the ones with the best winrate in your dataset.
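
Raw winrate rewards compositions with very few games, since 2 wins out of 2 already gives 100%. One common alternative, which this notebook does not use, is to rank by the lower bound of the Wilson score interval. The sketch below is a minimal version of that idea; the function name and the 95% level (`z = 1.96`) are assumptions.

```python
import math

def wilson_lower_bound(wins, games, z=1.96):
    """Lower end of the Wilson score interval for a win proportion."""
    if games == 0:
        return 0.0
    p = wins / games
    denom = 1 + z**2 / games
    centre = p + z**2 / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2))
    return (centre - margin) / denom

# A perfect record over 2 games ranks below a 60% record over 100 games.
for wins, games in [(2, 2), (10, 10), (60, 100)]:
    print(wins, games, round(wilson_lower_bound(wins, games), 3))
```

Sorting by this bound instead of `winrate` would push small-sample compositions down without needing a hard minimum-games cutoff.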


In [23]:
# ============================================
# 🏆 Compute top-3 team comps by historical winrate (Riot)
# ============================================

if "riot_df" not in globals():
    raise RuntimeError("❌ riot_df is missing. Please make sure you executed the Riot CSV loading cell.")

# Build one row per team per match with its sorted champion list and win flag
riot_team_champs = (
    riot_df
    .groupby(["match_id", "team_side"])
    .agg(
        team_champs=("championName", lambda s: ",".join(sorted(s.astype(str)))),
        team_win=("win", "max"),
    )
    .reset_index()
)

print("riot_team_champs shape:", riot_team_champs.shape)
display(riot_team_champs.head())

# Count games and wins for each distinct composition
comp_stats = (
    riot_team_champs
    .groupby("team_champs")
    .agg(
        games=("team_win", "count"),
        wins=("team_win", "sum"),
    )
    .reset_index()
)

comp_stats["winrate"] = comp_stats["wins"] / comp_stats["games"]

print("\n📊 Distribution of the number of games for each team composition:")
display(comp_stats["games"].value_counts().sort_index().head(10))

# Pick the strictest minimum-games threshold that still leaves at least one composition
candidates = [20, 10, 5, 3, 2, 1]
chosen_min = None
filtered_comp_stats = None

for mg in candidates:
    tmp = comp_stats[comp_stats["games"] >= mg].copy()
    if not tmp.empty:
        chosen_min = mg
        filtered_comp_stats = tmp
        break

if filtered_comp_stats is None or filtered_comp_stats.empty:
    print("⚠️ Could not find any team composition even with the >=1 game condition. Using all data without filtering.")
    filtered_comp_stats = comp_stats.copy()
    chosen_min = 0

if chosen_min == 0:
    print("\n⚠️ No team composition appears more than once, or the distribution is extremely sparse.")
elif chosen_min == 1:
    print("\nℹ️ The weakest constraint we could apply is: MIN_GAMES = 1 (most team comps were played only once).")
else:
    print(f"\n✅ We used MIN_GAMES = {chosen_min} (each team comp was played at least {chosen_min} times).")

# Rank by winrate, breaking ties by number of games
filtered_comp_stats = filtered_comp_stats.sort_values(
    ["winrate", "games"],
    ascending=[False, False]
)

print(f"\nNumber of team compositions after filtering (>= {chosen_min} games):", filtered_comp_stats.shape[0])

top3_comps = filtered_comp_stats.head(3)

print("\n🏆 Top 3 team comps by historical winrate (Riot data):")
display(top3_comps)


riot_team_champs shape: (13424, 4)


Unnamed: 0,match_id,team_side,team_champs,team_win
0,EUW1_7548714571,Blue,"Bard,KogMaw,Rumble,Smolder,Volibear",1
1,EUW1_7548714571,Red,"Darius,Nasus,Nidalee,Viego,Yunara",0
2,EUW1_7557866224,Blue,"JarvanIV,Karma,Nautilus,Syndra,Tristana",1
3,EUW1_7557866224,Red,"Caitlyn,KSante,Orianna,Sett,Sylas",0
4,EUW1_7557877123,Blue,"Janna,Jinx,Qiyana,Vayne,Viktor",1



📊 Distribution of the number of games for each team composition:


games,count
1,13400
2,12



✅ We used MIN_GAMES = 2 (each team comp was played at least 2 times).

Number of team compositions after filtering (>= 2 games): 12

🏆 Top 3 team comps by historical winrate (Riot data):


Unnamed: 0,team_champs,games,wins,winrate
1115,"Akali,Ambessa,Bard,Ezreal,JarvanIV",2,2,1.0
2548,"Alistar,LeeSin,Orianna,Riven,Yunara",2,2,1.0
4551,"Ashe,Nocturne,Orianna,Renekton,Seraphine",2,2,1.0


### Code Cell 22: Champion pair synergy analysis

This cell computes champion‑pair statistics within each team, calculates winrates, filters by minimum games played, and shows the strongest pairs.
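
The pair-counting idea can be illustrated on a tiny example with `itertools.combinations` and two counters. The three-champion teams below are invented to keep the output short (real teams have five champions), and this is a sketch of the technique rather than the cell's exact code.

```python
from collections import Counter
from itertools import combinations

# Toy teams: (comma-separated champion list, team_win).
teams = [
    ("Bard,KogMaw,Rumble", 1),
    ("Bard,KogMaw,Viego", 0),
    ("KogMaw,Rumble,Viego", 1),
]

games, wins = Counter(), Counter()
for champs_str, win in teams:
    # Sorting makes (A, B) and (B, A) count as the same pair.
    for pair in combinations(sorted(champs_str.split(",")), 2):
        games[pair] += 1
        wins[pair] += win

pair_winrate = {pair: wins[pair] / games[pair] for pair in games}
for pair, rate in sorted(pair_winrate.items()):
    print(pair, games[pair], rate)
```

A five-champion team contributes 10 pairs, so pair counts grow much faster than composition counts, which is why a minimum-games filter is workable for pairs.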


In [24]:
# ============================================
# 🤝 Champion Pair Synergy Analysis
# ============================================

from itertools import combinations

if "riot_team_champs" not in globals():
    raise RuntimeError("❌ riot_team_champs  .     .")

print("🔍       (pair synergy)...")

pair_records = []

# [comment removed to keep this notebook English-only]
for _, row in riot_team_champs.iterrows():
    champs = row["team_champs"].split(",")
    win = row["team_win"]
# [comment removed to keep this notebook English-only]
    for c1, c2 in combinations(sorted(champs), 2):
        pair_records.append((c1, c2, win))

pair_df = pd.DataFrame(pair_records, columns=["champ1", "champ2", "team_win"])

# [comment removed to keep this notebook English-only]
pair_stats = (
    pair_df
    .groupby(["champ1", "champ2"])
    .agg(
        games=("team_win", "count"),
        wins=("team_win", "sum"),
    )
    .reset_index()
)

pair_stats["winrate"] = pair_stats["wins"] / pair_stats["games"]

# Keep only pairs with enough games for the winrate to be meaningful
MIN_GAMES_PAIR = 5
filtered_pairs = pair_stats[pair_stats["games"] >= MIN_GAMES_PAIR].copy()

# Rank by winrate, breaking ties by number of games
filtered_pairs = filtered_pairs.sort_values(["winrate", "games"], ascending=[False, False])

print(f"\nNumber of pairs after filtering (>= {MIN_GAMES_PAIR} games):", filtered_pairs.shape[0])
print("\n🏆 Top 10 champion pairs by winrate:")
display(filtered_pairs.head(10))


🔍 Computing champion pair statistics (pair synergy)...

Number of pairs after filtering (>= 5 games): 7323

🏆 Top 10 champion pairs by winrate:


Unnamed: 0,champ1,champ2,games,wins,winrate
3211,Chogath,Thresh,10,10,1.0
753,Alistar,Xayah,9,9,1.0
4298,Ezreal,Thresh,8,8,1.0
5463,Hecarim,Irelia,8,8,1.0
10272,Poppy,Shen,8,8,1.0
11282,Shen,Velkoz,8,8,1.0
2482,Brand,Ryze,7,7,1.0
4143,Elise,Warwick,7,7,1.0
4206,Ezreal,Garen,7,7,1.0
5986,Ivern,Kassadin,7,7,1.0
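
Every pair in this top 10 has a perfect winrate from only 7 to 10 games, so the ranking mostly reflects small samples. One common way to temper this is to rank by the lower bound of a Wilson score interval instead of the raw winrate. The sketch below is an illustration rather than part of the pipeline: `wilson_lower_bound` is a hypothetical helper name, `z = 1.96` (roughly a 95% interval) is an assumed default, and the 80/100 record is invented for contrast.

```python
import math

def wilson_lower_bound(wins, games, z=1.96):
    """Lower bound of the Wilson score interval for a win proportion."""
    if games == 0:
        return 0.0
    p = wins / games
    z2 = z * z
    centre = p + z2 / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games))
    return (centre - margin) / (1 + z2 / games)

# Two perfect records from the table above, plus an invented larger sample
for wins, games in [(10, 10), (7, 7), (80, 100)]:
    print(f"{wins}/{games}: {wilson_lower_bound(wins, games):.3f}")
```

Applied to `pair_stats`, the same function could be mapped over the `wins` and `games` columns to produce an alternative sort key that rewards pairs with more evidence behind them.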


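### Code Cell 23: Smart top‑3 team compositions from player history

This cell samples five random players from `player_role_stats`, takes each player's two best roles (ranked by winrate, then by games played), enumerates every role assignment, and scores each one with `rf_model_v5`. It relies on `build_team_synergy_from_players`, defined in an earlier cell, to compute the team‑level synergy features.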


In [25]:
# ============================================
# 🧠 Smart Top-3 Team Comps — Based on Player History
# ============================================

import pandas as pd
import numpy as np
from itertools import product

# Make sure the objects created by the earlier cells are available
needed = ["player_role_stats", "rf_model_v5", "feature_cols_v5", "full_train_df", "build_team_synergy_from_players"]
for g in needed:
    if g not in globals():
        raise RuntimeError(f"❌ Variable {g} is missing.")

# --------------------------------------------
# 1) Pick 5 random players for testing
# --------------------------------------------
unique_players = player_role_stats["player_id"].drop_duplicates().values
PLAYERS = np.random.choice(unique_players, size=5, replace=False)

print("🎯 Selected players for testing:")
display(
    player_role_stats[player_role_stats["player_id"].isin(PLAYERS)]
    .sort_values(["player_id", "role"])
    .reset_index(drop=True)
)

# --------------------------------------------
# 2) Select each player's best roles
#    (up to 2, ranked by winrate, then by games played)
# --------------------------------------------
role_choices = {}

for pid in PLAYERS:
    subset = player_role_stats[player_role_stats["player_id"] == pid].copy()
    subset = subset[subset["role"].notna() & (subset["role"] != "NONE")]
    if subset.empty:
        continue

    # Rank roles by winrate, breaking ties by games played
    subset = subset.sort_values(["winrate", "games_played"], ascending=[False, False])
    # Keep the two best roles for this player
    top_roles = subset["role"].head(2).tolist()

    role_choices[pid] = top_roles

print("\n📊 Best roles per player (based on performance):")
for pid, roles in role_choices.items():
    print(f"- {pid}: {roles}")

players_order = list(role_choices.keys())
role_combos = list(product(*[role_choices[pid] for pid in players_order]))
print(f"\n🔢 Number of possible formations: {len(role_combos)}")

# --------------------------------------------
# 3) Build a player-level DataFrame for one role assignment
# --------------------------------------------
def build_team(player_ids, assigned_roles):
    rows = []
    for pid, role in zip(player_ids, assigned_roles):
        row_stats = player_role_stats[
            (player_role_stats["player_id"] == pid) &
            (player_role_stats["role"] == role)
        ]
        if row_stats.empty:
            return None
        rs = row_stats.iloc[0]
        rows.append({
            "player_id": pid,
            "role": role,
            "match_id": "DUMMY",
            "team_side": "Blue",
            "kills": rs["avg_kills"],
            "deaths": rs["avg_deaths"],
            "assists": rs["avg_assists"],
            "MinionsKilled": rs["avg_cs"],
            "total_gold": rs["avg_gold"],
        })
    return pd.DataFrame(rows)

# --------------------------------------------
# 4) Score every formation with the v5 model
# --------------------------------------------
feature_means = full_train_df[feature_cols_v5].mean()
all_feats = set(feature_cols_v5)
results = []

for combo in role_combos:
    team_df = build_team(players_order, combo)
    if team_df is None:
        continue

    # Team-level synergy features for this formation
    syn_df = build_team_synergy_from_players(
        team_df,
        id_cols=("match_id", "team_side"),
        role_col="role",
        kills_col="kills",
        deaths_col="deaths",
        assists_col="assists",
        minions_col="MinionsKilled",
        gold_col="total_gold",
        dmg_col=None,
        share_cols={k: None for k in ["gold", "dmg", "kill", "assist", "minion"]}
    )

    row_feats = {col: val for col, val in syn_df.iloc[0].to_dict().items() if col in all_feats}

    # Team-level means of the basic stats
    agg = team_df[["kills", "deaths", "assists", "MinionsKilled"]].mean().to_dict()
    for name, val in agg.items():
        col = name + "_mean"
        if col in all_feats:
            row_feats[col] = val

    # Context features: blue side, team size, and source flag (0 = Kaggle)
    row_feats["is_blue_team"] = 1
    row_feats["syn_n_players"] = len(team_df)
    row_feats["dataset_source"] = 0

    # Fill any feature this formation lacks with its training-set mean
    for feat in feature_cols_v5:
        if feat not in row_feats:
            row_feats[feat] = feature_means[feat]

    X_one = pd.DataFrame([row_feats])[feature_cols_v5]
    win_proba = rf_model_v5.predict_proba(X_one.values)[0, 1]

    results.append({
        "formation": ", ".join(f"{pid}→{role}" for pid, role in zip(players_order, combo)),
        "win_proba": win_proba
    })

# --------------------------------------------
# 5) Rank formations by predicted win probability
# --------------------------------------------
results_df = pd.DataFrame(results).sort_values("win_proba", ascending=False).reset_index(drop=True)

print(f"\n✅ Number of tried formations: {len(results_df)}")
print("\n🏆 Top 3 team comps (based on strongest roles for each player):")
display(results_df.head(3))


🎯 Selected players for testing:


Unnamed: 0,player_id,role,games_played,wins,avg_kills,avg_deaths,avg_assists,avg_cs,avg_gold,winrate
0,302,JUNGLE,4,1,13.25,6.0,7.0,144.5,16875.25,0.25
1,302,MIDDLE,13,6,3.153846,7.923077,6.076923,205.461538,11274.0,0.461538
2,302,TOP,3,2,4.0,6.0,5.0,228.666667,12459.333333,0.666667
3,501,JUNGLE,11,4,5.909091,8.272727,6.545455,68.909091,11163.090909,0.363636
4,501,NONE,1,0,3.0,1.0,0.0,9.0,6660.0,0.0
5,501,SUPPORT,4,3,3.0,4.0,15.75,25.25,9298.5,0.75
6,553,BOTTOM,2,2,9.5,4.5,12.0,157.0,11609.5,1.0
7,553,JUNGLE,1,0,3.0,6.0,6.0,125.0,11022.0,0.0
8,553,MIDDLE,3,0,2.0,5.333333,5.0,180.0,9101.0,0.0
9,553,SUPPORT,4,0,1.0,8.0,14.75,31.75,9663.5,0.0



📊 Best roles per player (based on performance):
- 302: ['TOP', 'MIDDLE']
- 1788: ['MIDDLE', 'SUPPORT']
- 553: ['BOTTOM', 'TOP']
- 795: ['TOP', 'BOTTOM']
- 501: ['SUPPORT', 'JUNGLE']

🔢 Number of possible formations: 32

✅ Number of tried formations: 32

🏆 Top 3 team comps (based on strongest roles for each player):


Unnamed: 0,formation,win_proba
0,"302→TOP, 1788→SUPPORT, 553→BOTTOM, 795→TOP, 50...",0.821874
1,"302→TOP, 1788→MIDDLE, 553→BOTTOM, 795→TOP, 501...",0.806502
2,"302→MIDDLE, 1788→SUPPORT, 553→BOTTOM, 795→TOP,...",0.723841
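
The top formations above assign TOP to both player 302 and player 795, because `product` enumerates role assignments without checking that the five roles differ. A minimal sketch of one way to keep only formations in which every role appears once, using the role choices printed by this cell; `distinct_role_combos` is a hypothetical helper, not part of the notebook.

```python
from itertools import product

def distinct_role_combos(role_choices):
    """Return (players, combos), keeping only combos with no repeated role."""
    players = list(role_choices)
    combos = [
        combo
        for combo in product(*(role_choices[p] for p in players))
        if len(set(combo)) == len(combo)
    ]
    return players, combos

# Role choices copied from the printed output of this cell
role_choices = {
    302: ["TOP", "MIDDLE"],
    1788: ["MIDDLE", "SUPPORT"],
    553: ["BOTTOM", "TOP"],
    795: ["TOP", "BOTTOM"],
    501: ["SUPPORT", "JUNGLE"],
}

players, combos = distinct_role_combos(role_choices)
for combo in combos:
    print(", ".join(f"{p}→{r}" for p, r in zip(players, combo)))
```

Whether duplicate roles should be excluded outright or only penalised depends on how the role-balance features behaved during training; this filter is the stricter option and can leave very few candidates.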


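### Code Cell 24: Fetch players from the Riot API and recommend team compositions

This cell resolves five Riot IDs to PUUIDs through the Riot API, downloads each player's 50 most recent matches (retrying on HTTP 429 rate‑limit responses), and aggregates them into `player_role_stats_api`. It then reuses `build_team_synergy_from_players` and `rf_model_v5` to score every role assignment and report the top‑3 team compositions.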
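
The rate-limit handling in the cell below hinges on turning the `Retry-After` header into a sleep duration. That rule can be isolated into a small pure function, which makes it easy to check without calling the API. This is only a sketch: `retry_wait_seconds` is a hypothetical name, and the 3-second fallback mirrors the value used in `riot_get`.

```python
def retry_wait_seconds(retry_after, default=3):
    """Seconds to sleep after a 429: Retry-After + 1, or `default` if unusable."""
    if retry_after is None:
        return default
    try:
        return int(retry_after) + 1
    except ValueError:
        # Header present but not an integer number of seconds
        return default

print(retry_wait_seconds("10"))
print(retry_wait_seconds(None))
print(retry_wait_seconds("soon"))
```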


In [26]:
# ============================================
# 🔌 1) Fetch 5 players from Riot API & build player_role_stats_api (with rate limit handling)
# ============================================

import requests
import pandas as pd
import numpy as np
from itertools import product
import time

# Riot API key: read it from the environment instead of hard-coding it in the notebook
import os
RIOT_API_KEY = os.environ.get("RIOT_API_KEY", "")  # set RIOT_API_KEY before running this cell

HEADERS = {"X-Riot-Token": RIOT_API_KEY}

# Players to look up, identified by Riot ID (gameName#tagLine)
riot_players = [
    {"gameName": "Kenal",   "tagLine": "EUW"},
    {"gameName": "Maynter", "tagLine": "EUW"},
    {"gameName": "Kozi",    "tagLine": "Z10"},
    {"gameName": "Fleshy",  "tagLine": "EU1"},
    {"gameName": "Koldo",   "tagLine": "1233"},
]

# Regional routing values for the account-v1 and match-v5 endpoints
# (EUW accounts are served by the "europe" cluster)
ACCOUNT_REGION = "europe"
MATCH_REGION = "europe"

def riot_get(url, params=None, max_retries=3):
    """
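    Wrapper around requests.get that handles HTTP 429 (rate limit).
    On a 429 response:
      - sleep for Retry-After + 1 seconds (3 seconds if the header is missing or invalid)
      - retry, up to max_retries attempts in total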
    """
    for attempt in range(max_retries):
        r = requests.get(url, headers=HEADERS, params=params)
        if r.status_code == 429:
            retry_after = r.headers.get("Retry-After")
            try:
                wait_sec = int(retry_after) + 1 if retry_after is not None else 3
            except ValueError:
                wait_sec = 3
            print(f"⚠️ Rate limit 429, attempt {attempt+1}/{max_retries}. Sleeping for {wait_sec}s ...")
            time.sleep(wait_sec)
            continue

        if r.status_code != 200:
            raise RuntimeError(f"Riot API error {r.status_code}: {r.text}")
        return r.json()

    raise RuntimeError("Too many 429 responses from Riot API, even after retrying.")

# Step 1: resolve each Riot ID to a PUUID
players_meta = []

for p in riot_players:
    acc_url = f"https://{ACCOUNT_REGION}.api.riotgames.com/riot/account/v1/accounts/by-riot-id/{p['gameName']}/{p['tagLine']}"
    acc_data = riot_get(acc_url)
    puuid = acc_data["puuid"]

    players_meta.append({
        "gameName": p["gameName"],
        "tagLine": p["tagLine"],
        "puuid": puuid,
    })

players_meta_df = pd.DataFrame(players_meta)
print("✅ Got PUUIDs:")
display(players_meta_df)

# Step 2: fetch recent matches for each player and collect per-match stats

# Return the IDs of the player's most recent matches
def fetch_player_matches(puuid, count=50):
    url = f"https://{MATCH_REGION}.api.riotgames.com/lol/match/v5/matches/by-puuid/{puuid}/ids"
    ids = riot_get(url, params={"start": 0, "count": count})
    return ids

all_rows = []

for row in players_meta:
    puuid = row["puuid"]
    name_tag = f"{row['gameName']}#{row['tagLine']}"

    match_ids = fetch_player_matches(puuid, count=50)
    print(f"📂 {name_tag}: fetched {len(match_ids)} matches")

    for match_id in match_ids:
        match_url = f"https://{MATCH_REGION}.api.riotgames.com/lol/match/v5/matches/{match_id}"
        match_data = riot_get(match_url)

        # Pause between requests to stay under the rate limit
        time.sleep(1.0)

        info = match_data.get("info", {})
        participants = info.get("participants", [])

        for part in participants:
            if part.get("puuid") != puuid:
                continue

            role_raw = part.get("teamPosition") or part.get("individualPosition") or "NONE"
            # Normalise Riot position names (UTILITY -> SUPPORT); anything unrecognised becomes NONE
            if role_raw == "UTILITY":
                role = "SUPPORT"
            elif role_raw == "BOTTOM":
                role = "BOTTOM"
            elif role_raw == "MIDDLE":
                role = "MIDDLE"
            elif role_raw == "JUNGLE":
                role = "JUNGLE"
            elif role_raw == "TOP":
                role = "TOP"
            else:
                role = "NONE"

            kills = part.get("kills", 0)
            deaths = part.get("deaths", 0)
            assists = part.get("assists", 0)
            total_minions = part.get("totalMinionsKilled", 0) + part.get("neutralMinionsKilled", 0)
            gold = part.get("goldEarned", 0)
            win = 1 if part.get("win", False) else 0

            all_rows.append({
                "player_id": name_tag,
                "match_id": match_id,
                "role": role,
                "kills": kills,
                "deaths": deaths,
                "assists": assists,
                "cs": total_minions,
                "gold": gold,
                "win": win,
            })

riot_players_matches_df = pd.DataFrame(all_rows)
print("\n📊 Raw per-match stats for given players:")
display(riot_players_matches_df.head())

if riot_players_matches_df.empty:
    raise RuntimeError("❌ No stats were found for these players. Please check the names and that your Riot API key is valid.")

# Step 3: aggregate per-match rows into per-player, per-role statistics
grouped = (
    riot_players_matches_df
    .groupby(["player_id", "role"])
    .agg(
        games_played=("match_id", "nunique"),
        wins=("win", "sum"),
        avg_kills=("kills", "mean"),
        avg_deaths=("deaths", "mean"),
        avg_assists=("assists", "mean"),
        avg_cs=("cs", "mean"),
        avg_gold=("gold", "mean"),
    )
    .reset_index()
)

grouped["winrate"] = grouped["wins"] / grouped["games_played"].replace(0, np.nan)
player_role_stats_api = grouped

print("\n✅ player_role_stats_api (from Riot API for these players):")
display(player_role_stats_api)


# ============================================
# 🧠 2) Use rf_model_v5 to get Top-3 team comps for these 5 players
# ============================================

needed = ["rf_model_v5", "feature_cols_v5", "full_train_df", "build_team_synergy_from_players"]
for g in needed:
    if g not in globals():
        raise RuntimeError(f"❌ Variable {g} is missing. Please run the v5 training and API preparation cells before this one.")

from itertools import product

PLAYERS = player_role_stats_api["player_id"].drop_duplicates().tolist()
print("\n🎯 Players that will be used from the API:", PLAYERS)

if len(PLAYERS) != 5:
    print(f"⚠️ Expected 5 players but found {len(PLAYERS)}; at most the first 5 will be used.")

# Keep at most the first 5 players
PLAYERS = PLAYERS[:5]

display(
    player_role_stats_api[player_role_stats_api["player_id"].isin(PLAYERS)]
    .sort_values(["player_id", "role"])
)

# Select each player's two best roles (by winrate, then by games played)
role_choices = {}

for pid in PLAYERS:
    subset = player_role_stats_api[player_role_stats_api["player_id"] == pid].copy()
    subset = subset[subset["role"].notna() & (subset["role"] != "NONE")]
    if subset.empty:
        continue

    subset = subset.sort_values(["winrate", "games_played"], ascending=[False, False])
    top_roles = subset["role"].head(2).tolist()  # keep the top 2 roles
    role_choices[pid] = top_roles

print("\n📊 Best roles per player (from API data):")
for pid, roles in role_choices.items():
    print(f"- {pid}: {roles}")

players_order = list(role_choices.keys())
role_combos = list(product(*[role_choices[pid] for pid in players_order]))
print(f"\n🔢 Number of possible formations: {len(role_combos)}")

feature_means = full_train_df[feature_cols_v5].mean()
all_feats = set(feature_cols_v5)
results = []

def build_team_from_api_role_stats(player_ids, assigned_roles):
    rows = []
    for pid, role in zip(player_ids, assigned_roles):
        row_stats = player_role_stats_api[
            (player_role_stats_api["player_id"] == pid) &
            (player_role_stats_api["role"] == role)
        ]
        if row_stats.empty:
            return None
        rs = row_stats.iloc[0]
        rows.append({
            "player_id": pid,
            "role": role,
            "match_id": "DUMMY_MATCH",
            "team_side": "Blue",
            "kills": rs["avg_kills"],
            "deaths": rs["avg_deaths"],
            "assists": rs["avg_assists"],
            "MinionsKilled": rs["avg_cs"],
            "total_gold": rs["avg_gold"],
        })
    return pd.DataFrame(rows)

for combo in role_combos:
    team_df = build_team_from_api_role_stats(players_order, combo)
    if team_df is None:
        continue

    syn_df = build_team_synergy_from_players(
        team_df,
        id_cols=("match_id", "team_side"),
        role_col="role",
        lane_col=None,
        kills_col="kills",
        deaths_col="deaths",
        assists_col="assists",
        minions_col="MinionsKilled",
        gold_col="total_gold",
        dmg_col=None,
        share_cols={k: None for k in ["gold", "dmg", "kill", "assist", "minion"]},
    )

    syn_row = syn_df.iloc[0].to_dict()
    row_feats = {col: val for col, val in syn_row.items() if col in all_feats}

    agg = team_df[["kills", "deaths", "assists", "MinionsKilled"]].mean().to_dict()
    for name, val in agg.items():
        col = name + "_mean"
        if col in all_feats:
            row_feats[col] = val

    if "is_blue_team" in all_feats:
        row_feats["is_blue_team"] = 1
    if "syn_n_players" in all_feats:
        row_feats["syn_n_players"] = len(team_df)
    if "dataset_source" in all_feats:
        row_feats["dataset_source"] = 1  # 1 = Riot

    # Fill any feature this formation lacks with its training-set mean
    for feat in feature_cols_v5:
        if feat not in row_feats:
            row_feats[feat] = feature_means[feat]

    X_one = pd.DataFrame([row_feats])[feature_cols_v5]
    win_proba = rf_model_v5.predict_proba(X_one.values)[0, 1]

    formation_desc = ", ".join(
        f"{pid}→{role}" for pid, role in zip(players_order, combo)
    )

    results.append({
        "formation": formation_desc,
        "win_proba": win_proba,
    })

results_df = pd.DataFrame(results).sort_values("win_proba", ascending=False).reset_index(drop=True)

print("\n🏆 Top 3 team compositions for these players (from Riot API + v5 model):")
display(results_df.head(3))


✅ Got PUUIDs:


Unnamed: 0,gameName,tagLine,puuid
0,Kenal,EUW,_PsCFQWIq26FGzOpSjE-ajVjoWzyLKa07sjjwCUJM3OVc0...
1,Maynter,EUW,spoTsfy1GOP0rsnmT5nhx3ZbFbr_z16NPxnoePlQEe8Yf0...
2,Kozi,Z10,pabj-LhhZiwghVqKHrYjo_UkDFvUa3i3CcLjQKe8CwPl_O...
3,Fleshy,EU1,Xe2dJQJtCaT5ZhD5ZxUygkuR0wPz9yn-kOFAD6RbC8gpFi...
4,Koldo,1233,2yKUxzr5mxEYBwfuodqgIamAECcF9ZS2S4LVQuPkA6rfym...


📂 Kenal#EUW: fetched 50 matches
📂 Maynter#EUW: fetched 50 matches
📂 Kozi#Z10: fetched 50 matches
📂 Fleshy#EU1: fetched 50 matches
📂 Koldo#1233: fetched 50 matches

📊 Raw per-match stats for given players:


Unnamed: 0,player_id,match_id,role,kills,deaths,assists,cs,gold,win
0,Kenal#EUW,EUW1_7600972506,BOTTOM,4,2,12,234,11917,1
1,Kenal#EUW,EUW1_7600763591,BOTTOM,5,1,4,129,8127,1
2,Kenal#EUW,EUW1_7600732171,BOTTOM,7,4,13,195,12577,1
3,Kenal#EUW,EUW1_7600697231,BOTTOM,6,5,7,233,12545,0
4,Kenal#EUW,EUW1_7600641366,BOTTOM,5,9,9,228,14192,1



✅ player_role_stats_api (from Riot API for these players):


Unnamed: 0,player_id,role,games_played,wins,avg_kills,avg_deaths,avg_assists,avg_cs,avg_gold,winrate
0,Fleshy#EU1,MIDDLE,1,0,4.0,10.0,2.0,329.0,16776.0,0.0
1,Fleshy#EU1,NONE,4,2,3.5,6.5,6.0,0.25,10672.5,0.5
2,Fleshy#EU1,SUPPORT,45,31,2.933333,4.355556,17.022222,32.244444,8710.555556,0.688889
3,Kenal#EUW,BOTTOM,37,28,6.675676,4.567568,7.72973,205.135135,12155.081081,0.756757
4,Kenal#EUW,JUNGLE,1,1,3.0,2.0,6.0,107.0,7307.0,1.0
5,Kenal#EUW,MIDDLE,4,2,5.0,7.25,7.75,199.0,10898.25,0.5
6,Kenal#EUW,SUPPORT,3,2,1.666667,7.333333,11.0,13.0,6001.0,0.666667
7,Kenal#EUW,TOP,5,3,7.8,5.4,8.0,211.2,12926.6,0.6
8,Koldo#1233,JUNGLE,38,24,6.368421,3.605263,9.421053,164.157895,11054.578947,0.631579
9,Koldo#1233,MIDDLE,6,3,7.5,4.666667,4.833333,189.833333,11560.666667,0.5



🎯 Players that will be used from the API: ['Fleshy#EU1', 'Kenal#EUW', 'Koldo#1233', 'Kozi#Z10', 'Maynter#EUW']


Unnamed: 0,player_id,role,games_played,wins,avg_kills,avg_deaths,avg_assists,avg_cs,avg_gold,winrate
0,Fleshy#EU1,MIDDLE,1,0,4.0,10.0,2.0,329.0,16776.0,0.0
1,Fleshy#EU1,NONE,4,2,3.5,6.5,6.0,0.25,10672.5,0.5
2,Fleshy#EU1,SUPPORT,45,31,2.933333,4.355556,17.022222,32.244444,8710.555556,0.688889
3,Kenal#EUW,BOTTOM,37,28,6.675676,4.567568,7.72973,205.135135,12155.081081,0.756757
4,Kenal#EUW,JUNGLE,1,1,3.0,2.0,6.0,107.0,7307.0,1.0
5,Kenal#EUW,MIDDLE,4,2,5.0,7.25,7.75,199.0,10898.25,0.5
6,Kenal#EUW,SUPPORT,3,2,1.666667,7.333333,11.0,13.0,6001.0,0.666667
7,Kenal#EUW,TOP,5,3,7.8,5.4,8.0,211.2,12926.6,0.6
8,Koldo#1233,JUNGLE,38,24,6.368421,3.605263,9.421053,164.157895,11054.578947,0.631579
9,Koldo#1233,MIDDLE,6,3,7.5,4.666667,4.833333,189.833333,11560.666667,0.5



📊 Best roles per player (from API data):
- Fleshy#EU1: ['SUPPORT', 'MIDDLE']
- Kenal#EUW: ['JUNGLE', 'BOTTOM']
- Koldo#1233: ['SUPPORT', 'JUNGLE']
- Kozi#Z10: ['TOP', 'MIDDLE']
- Maynter#EUW: ['TOP', 'MIDDLE']

🔢 Number of possible formations: 32

🏆 Top 3 team compositions for these players (from Riot API + v5 model):


Unnamed: 0,formation,win_proba
0,"Fleshy#EU1→SUPPORT, Kenal#EUW→JUNGLE, Koldo#12...",0.877529
1,"Fleshy#EU1→SUPPORT, Kenal#EUW→JUNGLE, Koldo#12...",0.868273
2,"Fleshy#EU1→SUPPORT, Kenal#EUW→BOTTOM, Koldo#12...",0.865593
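
Ranking roles purely by winrate lets a single game decide: the top role listed for Kenal#EUW above is JUNGLE on the strength of one win, ahead of BOTTOM with 37 games. A minimal sketch of one possible guard, requiring a minimum number of games before a role is eligible. `pick_top_roles` and the threshold `min_games=5` are assumptions for illustration; the rows are copied from the `player_role_stats_api` output above.

```python
def pick_top_roles(role_rows, min_games=5, top_n=2):
    """Rank roles by (winrate, games), ignoring roles with too few games."""
    eligible = [
        r for r in role_rows
        if r["role"] != "NONE" and r["games_played"] >= min_games
    ]
    eligible.sort(key=lambda r: (r["winrate"], r["games_played"]), reverse=True)
    return [r["role"] for r in eligible[:top_n]]

# Kenal#EUW rows from the aggregated API stats
kenal = [
    {"role": "BOTTOM", "games_played": 37, "winrate": 0.756757},
    {"role": "JUNGLE", "games_played": 1, "winrate": 1.0},
    {"role": "MIDDLE", "games_played": 4, "winrate": 0.5},
    {"role": "SUPPORT", "games_played": 3, "winrate": 0.666667},
    {"role": "TOP", "games_played": 5, "winrate": 0.6},
]

print(pick_top_roles(kenal, min_games=1))  # no effective threshold
print(pick_top_roles(kenal, min_games=5))  # only roles with at least 5 games
```

With `min_games=1` this reproduces the notebook's choice for this player; a stricter threshold falls back to roles the player has actually played repeatedly. A player with no role above the threshold would get an empty list, so a real implementation would need a fallback.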


# ============================================
# 💡 Explanation — Why we save model & data artifacts
# ============================================

"""
This cell explains why we save our artifacts.

After training and testing our model (v5), it’s essential to save:
1. The trained model (rf_model_v5)
2. The dataset used for training
3. The list of feature columns
4. The processed player statistics from Kaggle and Riot API
5. The final results (Top-3 team compositions)

Why?
- So we can reuse the trained model later without retraining.
- To keep consistency between training and future predictions.
- To make sure we can reload the same setup anytime (for testing, validation, or API integration).

In the next cell, we will export all these components into the folder:
📁 exports_v5

This makes it easy to re-load them later (for example, in a separate notebook or a new session)
without needing to re-run training again.
"""


In [27]:
# ============================================
# 💾 Save all important data & model artifacts (v5)
# ============================================

import os
import joblib
import pandas as pd

# Create a folder to store everything neatly
EXPORT_DIR = "exports_v5"
os.makedirs(EXPORT_DIR, exist_ok=True)

# --------------------------------------------
# 1) Save trained model
# --------------------------------------------
MODEL_PATH = os.path.join(EXPORT_DIR, "rf_model_v5.pkl")
joblib.dump(rf_model_v5, MODEL_PATH)
print(f"✅ Model saved to: {MODEL_PATH}")

# --------------------------------------------
# 2) Save training data
# --------------------------------------------
TRAIN_PATH = os.path.join(EXPORT_DIR, "full_train_df_v5.csv")
full_train_df.to_csv(TRAIN_PATH, index=False)
print(f"✅ Training dataset saved to: {TRAIN_PATH}")

# --------------------------------------------
# 3) Save feature columns used in the model
# --------------------------------------------
FEATURES_PATH = os.path.join(EXPORT_DIR, "feature_cols_v5.txt")
with open(FEATURES_PATH, "w") as f:
    f.write("\n".join(feature_cols_v5))
print(f"✅ Feature list saved to: {FEATURES_PATH}")

# --------------------------------------------
# 4) Save Kaggle-based player-role stats (for offline testing)
# --------------------------------------------
PLAYER_STATS_PATH = os.path.join(EXPORT_DIR, "player_role_stats_kaggle.csv")
player_role_stats.to_csv(PLAYER_STATS_PATH, index=False)
print(f"✅ player_role_stats (Kaggle) saved to: {PLAYER_STATS_PATH}")

# --------------------------------------------
# 5) Save Riot API player-role stats (if available)
# --------------------------------------------
if "player_role_stats_api" in globals():
    PLAYER_STATS_API_PATH = os.path.join(EXPORT_DIR, "player_role_stats_api.csv")
    player_role_stats_api.to_csv(PLAYER_STATS_API_PATH, index=False)
    print(f"✅ player_role_stats_api (from Riot API) saved to: {PLAYER_STATS_API_PATH}")
else:
    print("⚠️ player_role_stats_api not found — skipping API save.")

# --------------------------------------------
# 6) Optional: Save Top-3 results if already computed
# --------------------------------------------
if "results_df" in globals():
    RESULTS_PATH = os.path.join(EXPORT_DIR, "top3_team_comps.csv")
    results_df.head(3).to_csv(RESULTS_PATH, index=False)
    print(f"🏆 Top-3 team compositions saved to: {RESULTS_PATH}")
else:
    print("ℹ️ No results_df found yet — run team composition cell first.")

print("\n📁 All artifacts have been saved inside the folder:", EXPORT_DIR)


✅ Model saved to: exports_v5/rf_model_v5.pkl
✅ Training dataset saved to: exports_v5/full_train_df_v5.csv
✅ Feature list saved to: exports_v5/feature_cols_v5.txt
✅ player_role_stats (Kaggle) saved to: exports_v5/player_role_stats_kaggle.csv
✅ player_role_stats_api (from Riot API) saved to: exports_v5/player_role_stats_api.csv
🏆 Top-3 team compositions saved to: exports_v5/top3_team_comps.csv

📁 All artifacts have been saved inside the folder: exports_v5
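
To reuse these artifacts in a later session, the feature list has to be read back in the order it was written, and any row passed to the model has to be arranged in that same order. The sketch below covers that part with the standard library only. `load_feature_list` and `align_row` are hypothetical helpers, and the feature values in the demo are made up; the model itself would be restored separately with `joblib.load`.

```python
import os
import tempfile

def load_feature_list(path):
    """Read one feature name per line, preserving order."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def align_row(row_feats, feature_cols, fill_values):
    """Order a feature dict like feature_cols, filling gaps from fill_values."""
    return [row_feats.get(col, fill_values[col]) for col in feature_cols]

# Demo: a temporary file stands in for exports_v5/feature_cols_v5.txt
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feature_cols_v5.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(["kills_mean", "deaths_mean", "is_blue_team"]))
    feature_cols = load_feature_list(path)

fill_values = {"kills_mean": 5.0, "deaths_mean": 5.0, "is_blue_team": 0}  # made-up means
vector = align_row({"kills_mean": 7.2, "is_blue_team": 1}, feature_cols, fill_values)
print(feature_cols)
print(vector)
```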


## ✅ Summary and Next Steps

In this notebook you:

1. Loaded and cleaned **Kaggle** and **Riot** match data at the player and team level.
2. Engineered rich **team synergy features** that describe each composition as a whole.
3. Merged Kaggle + Riot into a single training dataset and trained a **Random Forest v5** model.
4. Analysed team compositions and champion pairs based on historical Riot statistics.
5. Queried the Riot API for a chosen set of players and used the v5 model to recommend the **top‑3 team compositions** for them.

You can now:

- Change the list of Riot players to test different squads.
- Fetch more Riot data and retrain the v5 model for better accuracy.
- Integrate this pipeline into a web dashboard or match‑planning tool for your esports project.


In [28]:
# ============================================
# 📊 Display Key DataFrames (for documentation)
# ============================================

import pandas as pd
from IPython.display import display, HTML

def show_df(title, df, n=5):
    """Utility function to display a DataFrame with a clean header"""
    display(HTML(f"<h3 style='color:#8A2BE2;'>{title}</h3>"))
    if df is not None and isinstance(df, pd.DataFrame) and not df.empty:
        display(df.head(n))
    else:
        display(HTML("<p style='color:red;'>⚠️ DataFrame is empty or not defined.</p>"))
    display(HTML("<hr style='border:1px solid #aaa;margin:20px 0;'>"))

# Show the main DataFrames
show_df("1️⃣ raw_df — Player–Match Level (Kaggle)", globals().get("raw_df"))
show_df("2️⃣ player_role_stats — Player Performance per Role", globals().get("player_role_stats"))
show_df("3️⃣ model_player_df — Player + Team + History", globals().get("model_player_df"))
show_df("4️⃣ team_model_df — Team-Level (Before Synergy)", globals().get("team_model_df"))
show_df("5️⃣ kaggle_team_full_df — Team-Level (With Synergy)", globals().get("kaggle_team_full_df"))
show_df("6️⃣ riot_team_full_df — Riot Dataset Processed", globals().get("riot_team_full_df"))
show_df("7️⃣ full_train_df — Final Training Data (Kaggle + Riot)", globals().get("full_train_df"))


Unnamed: 0,player_id,match_id,champion_id,role,win,kills,deaths,assists,MinionsKilled,DmgDealt,DmgTaken,TurretDmgDealt,total_gold,queue_type,rank_id,game_duration
0,1,EUW1_7565751492,902,SUPPORT,0,0,2,12,30,4765,12541,0,7058,CLASSIC,7,1751
1,1,EUW1_7565549583,902,SUPPORT,0,2,5,23,29,8821,14534,1,9618,CLASSIC,7,2092
2,1,EUW1_7564803077,16,SUPPORT,1,0,5,22,34,6410,19011,3,9877,CLASSIC,7,2332
6,1,EUW1_7564257986,902,SUPPORT,0,0,1,7,28,3775,12061,0,6344,CLASSIC,7,1676
7,1,EUW1_7563685543,267,SUPPORT,0,1,5,6,36,4217,13464,0,7403,CLASSIC,7,1749


Unnamed: 0,player_id,role,games_played,wins,avg_kills,avg_deaths,avg_assists,avg_cs,avg_gold,winrate
0,1,SUPPORT,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
1,2,BOTTOM,16,4,5.875,6.75,6.9375,233.5625,13425.6875,0.25
2,2,JUNGLE,3,0,6.333333,6.0,6.0,97.0,10652.0,0.0
3,2,NONE,1,1,6.0,2.0,3.0,129.0,8356.0,1.0
4,3,BOTTOM,15,4,7.266667,7.6,6.066667,169.6,11693.066667,0.266667


Unnamed: 0,player_id,match_id,role,champion_id,kills,deaths,assists,total_gold,MinionsKilled,team_side,team_win,queue_type,rank_id,hist_games_played,hist_wins,hist_avg_kills,hist_avg_deaths,hist_avg_assists,hist_avg_cs,hist_avg_gold,hist_winrate
0,1,EUW1_7565751492,SUPPORT,902,0,2,12,7058,30,Blue,0,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
1,1,EUW1_7565549583,SUPPORT,902,2,5,23,9618,29,Red,0,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
2,1,EUW1_7564803077,SUPPORT,16,0,5,22,9877,34,Red,1,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
3,1,EUW1_7564257986,SUPPORT,902,0,1,7,6344,28,Red,0,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5
4,1,EUW1_7563685543,SUPPORT,267,1,5,6,7403,36,Red,0,CLASSIC,7,10,5,1.3,2.9,17.1,33.0,8923.7,0.5


Unnamed: 0,match_id,team_side,kills_mean,deaths_mean,assists_mean,total_gold_mean,MinionsKilled_mean,hist_winrate_mean,hist_games_played_mean,hist_games_played_sum,hist_avg_kills_mean,hist_avg_deaths_mean,hist_avg_assists_mean,hist_avg_cs_mean,hist_avg_gold_mean,team_win,is_blue_team
0,EUW1_6681382047,Blue,39.0,1.0,1.0,16025.0,144.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1
1,EUW1_6681412019,Blue,11.0,0.0,8.0,8202.0,93.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1
2,EUW1_6688385247,Blue,22.0,1.0,1.0,15980.0,149.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1
3,EUW1_6688490074,Red,27.0,2.0,4.0,18290.0,154.0,1.0,1.0,1,27.0,2.0,4.0,154.0,18290.0,1,0
4,EUW1_6796881027,Blue,11.0,10.0,6.0,13082.0,143.0,0.4375,16.0,16,9.5625,12.0625,5.625,168.875,13051.75,1,1


Unnamed: 0,match_id,team_side,kills_mean,deaths_mean,assists_mean,total_gold_mean,MinionsKilled_mean,hist_winrate_mean,hist_games_played_mean,hist_games_played_sum,hist_avg_kills_mean,hist_avg_deaths_mean,hist_avg_assists_mean,hist_avg_cs_mean,hist_avg_gold_mean,team_win,is_blue_team,syn_n_players,syn_kills_mean,syn_kills_std,syn_deaths_mean,syn_deaths_std,syn_assists_mean,syn_assists_std,syn_minions_mean,syn_minions_std,syn_kda_mean,syn_kda_std,syn_role_nunique,syn_role_entropy,syn_role_max_count,syn_role_min_count,syn_role_imbalance,syn_gold_share_std,syn_gold_share_max,syn_kill_share_std,syn_kill_share_max,syn_assist_share_std,syn_assist_share_max,syn_minion_share_std,syn_minion_share_max
0,EUW1_6681382047,Blue,39.0,1.0,1.0,16025.0,144.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1,1,39.0,0.0,1.0,0.0,1.0,0.0,144.0,0.0,40.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
1,EUW1_6681412019,Blue,11.0,0.0,8.0,8202.0,93.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1,1,11.0,0.0,0.0,0.0,8.0,0.0,93.0,0.0,19000.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
2,EUW1_6688385247,Blue,22.0,1.0,1.0,15980.0,149.0,1.0,3.0,3,24.0,0.666667,3.333333,128.666667,13402.333333,1,1,1,22.0,0.0,1.0,0.0,1.0,0.0,149.0,0.0,23.0,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
3,EUW1_6688490074,Red,27.0,2.0,4.0,18290.0,154.0,1.0,1.0,1,27.0,2.0,4.0,154.0,18290.0,1,0,1,27.0,0.0,2.0,0.0,4.0,0.0,154.0,0.0,15.5,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
4,EUW1_6796881027,Blue,11.0,10.0,6.0,13082.0,143.0,0.4375,16.0,16,9.5625,12.0625,5.625,168.875,13051.75,1,1,1,11.0,0.0,10.0,0.0,6.0,0.0,143.0,0.0,1.7,0.0,1,-0.0,1,0,1,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
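
The `syn_kda_mean` value of 19000.0 in the row with index 1 belongs to a single-player team with 11 kills, 8 assists and 0 deaths. That figure is consistent with a KDA of the form (kills + assists) / deaths in which zero deaths are replaced by a small floor such as 0.001. The actual computation lives in `build_team_synergy_from_players`, defined earlier, so this is an inference from the displayed values rather than a description of that function. The sketch compares that form with a common alternative that floors deaths at 1; both function names are hypothetical.

```python
def kda_small_floor(kills, deaths, assists, floor=0.001):
    """KDA with zero deaths replaced by a tiny floor (assumed form)."""
    return (kills + assists) / max(deaths, floor)

def kda_floor_one(kills, deaths, assists):
    """KDA with deaths floored at 1, so deathless games stay on the same scale."""
    return (kills + assists) / max(deaths, 1)

# (kills, deaths, assists) taken from rows 1, 0 and 4 of the table above
for k, d, a in [(11, 0, 8), (39, 1, 1), (11, 10, 6)]:
    print((k, d, a), round(kda_small_floor(k, d, a), 3), round(kda_floor_one(k, d, a), 3))
```

A value like 19000 sits several orders of magnitude above typical KDAs and can dominate mean and standard-deviation features, which is worth keeping in mind when reading feature importances.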


Unnamed: 0,match_id,team_side,kills_mean,deaths_mean,assists_mean,MinionsKilled_mean,csPerMin_mean,goldPerMin_mean,dmgPerMin_mean,kda_mean,team_gold_first,team_kills_first,team_deaths_first,team_assists_first,team_damage_first,team_minions_first,team_baron_kills_first,team_dragon_kills_first,team_tower_kills_first,team_inhibitor_kills_first,team_riftHerald_kills_first,team_objectives_win_first,team_win,is_blue_team,syn_n_players,syn_kills_mean,syn_kills_std,syn_deaths_mean,syn_deaths_std,syn_assists_mean,syn_assists_std,syn_minions_mean,syn_minions_std,syn_kda_mean,syn_kda_std,syn_role_nunique,syn_role_entropy,syn_role_max_count,syn_role_min_count,syn_role_imbalance,syn_gold_share_std,syn_gold_share_max,syn_dmg_share_std,syn_dmg_share_max,syn_kill_share_std,syn_kill_share_max,syn_assist_share_std,syn_assist_share_max,syn_minion_share_std,syn_minion_share_max
0,EUW1_7548714571,Blue,5.0,3.8,9.6,179.8,6.18578,419.917431,913.12844,8.353333,61028,25,19,48,132708,899,1,2,6,0,1,True,1,1,5,5.0,2.0,3.8,2.315167,9.6,3.611094,179.8,84.712219,8.353333,7.650813,4,1.332179,2,0,2,0.035245,0.231369,0.059062,0.29094,0.08,0.32,0.075231,0.291667,0.094229,0.283648
1,EUW1_7548714571,Red,3.8,5.0,5.6,178.4,6.137615,369.694954,649.12844,1.97,53729,19,25,28,94340,892,0,3,4,0,0,False,0,0,5,3.8,2.135416,5.0,1.095445,5.6,3.32265,178.4,88.010454,1.97,0.763937,4,1.332179,2,0,2,0.041087,0.267826,0.066463,0.31838,0.11239,0.368421,0.118666,0.428571,0.098666,0.330717
2,EUW1_7557866224,Blue,8.8,6.0,11.6,152.6,5.747646,456.580038,992.949153,3.885887,60611,44,30,58,131814,763,0,2,10,1,1,True,1,1,5,8.8,6.368673,6.0,2.828427,11.6,5.885576,152.6,74.168996,3.885887,1.161042,4,1.332179,2,0,2,0.044525,0.271782,0.090272,0.314458,0.144743,0.431818,0.101475,0.37931,0.097207,0.290957
3,EUW1_7557866224,Red,6.0,8.8,8.8,141.4,5.3258,395.630885,641.205273,1.906753,52520,30,44,44,85120,707,0,1,2,0,0,False,0,0,5,6.0,2.607681,8.8,2.4,8.8,3.059412,141.4,57.2,1.906753,0.965058,4,1.332179,2,0,2,0.039781,0.257483,0.065379,0.29113,0.086923,0.333333,0.069532,0.295455,0.080905,0.265912
4,EUW1_7557877123,Blue,7.0,3.4,9.2,164.4,6.787452,464.361328,831.25845,6.416667,56233,35,17,46,100665,822,0,1,8,1,1,True,1,1,5,7.0,5.215362,3.4,2.059126,9.2,5.491812,164.4,74.802674,6.416667,3.691582,4,1.332179,2,0,2,0.035262,0.266107,0.056277,0.26163,0.14901,0.457143,0.119387,0.434783,0.091001,0.270073


Unnamed: 0,team_win,MinionsKilled_mean,assists_mean,deaths_mean,is_blue_team,kills_mean,syn_assist_share_max,syn_assist_share_std,syn_assists_mean,syn_assists_std,syn_deaths_mean,syn_deaths_std,syn_gold_share_max,syn_gold_share_std,syn_kda_mean,syn_kda_std,syn_kill_share_max,syn_kill_share_std,syn_kills_mean,syn_kills_std,syn_minion_share_max,syn_minion_share_std,syn_minions_mean,syn_minions_std,syn_n_players,syn_role_entropy,syn_role_imbalance,syn_role_max_count,syn_role_min_count,syn_role_nunique,dataset_source
0,1,144.0,1.0,1.0,1,39.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,40.0,0.0,1.0,0.0,39.0,0.0,1.0,0.0,144.0,0.0,1,-0.0,1,1,0,1,0
1,1,93.0,8.0,0.0,1,11.0,1.0,0.0,8.0,0.0,0.0,0.0,1.0,0.0,19000.0,0.0,1.0,0.0,11.0,0.0,1.0,0.0,93.0,0.0,1,-0.0,1,1,0,1,0
2,1,149.0,1.0,1.0,1,22.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,23.0,0.0,1.0,0.0,22.0,0.0,1.0,0.0,149.0,0.0,1,-0.0,1,1,0,1,0
3,1,154.0,4.0,2.0,0,27.0,1.0,0.0,4.0,0.0,2.0,0.0,1.0,0.0,15.5,0.0,1.0,0.0,27.0,0.0,1.0,0.0,154.0,0.0,1,-0.0,1,1,0,1,0
4,1,143.0,6.0,10.0,1,11.0,1.0,0.0,6.0,0.0,10.0,0.0,1.0,0.0,1.7,0.0,1.0,0.0,11.0,0.0,1.0,0.0,143.0,0.0,1,-0.0,1,1,0,1,0


In [56]:
from IPython.display import display, HTML
import pandas as pd

def vertical_df(df_row):
    """Turn one DataFrame row (a Series) into a two-column Column/Value table."""
    return pd.DataFrame({"Column": df_row.index, "Value": df_row.values})

# --- Step 1 ---
df1 = vertical_df(raw_df.iloc[0])

# --- Step 2 (new columns only) ---
new2 = [c for c in player_role_stats.columns if c not in raw_df.columns]
df2 = vertical_df(player_role_stats[new2].iloc[0])

# --- Step 3 (new columns only) ---
new3 = [c for c in model_player_df.columns if c not in player_role_stats.columns]
df3 = vertical_df(model_player_df[new3].iloc[0])

# --- Step 4 (new columns only) ---
new4 = [c for c in team_model_df.columns if c not in model_player_df.columns]
df4 = vertical_df(team_model_df[new4].iloc[0])

html = f"""
<style>
.box {{
    float: left;
    width: 24%;
    padding: 5px;
}}
</style>

<h2 style="text-align:center; margin-bottom:20px;">Data Transformation Overview</h2>

<div class="box">
<h3>Step 1: raw_df</h3>
{df1.to_html(index=False)}
</div>

<div class="box">
<h3>Step 2: player_role_stats</h3>
{df2.to_html(index=False)}
</div>

<div class="box">
<h3>Step 3: model_player_df</h3>
{df3.to_html(index=False)}
</div>

<div class="box">
<h3>Step 4: team_model_df</h3>
{df4.to_html(index=False)}
</div>

<div style="clear: both;"></div>
"""

display(HTML(html))


Column,Value
player_id,1
match_id,EUW1_7565751492
champion_id,902
role,SUPPORT
win,0
kills,0
deaths,2
assists,12
MinionsKilled,30
DmgDealt,4765

Column,Value
games_played,10.0
wins,5.0
avg_kills,1.3
avg_deaths,2.9
avg_assists,17.1
avg_cs,33.0
avg_gold,8923.7
winrate,0.5

Column,Value
match_id,EUW1_7565751492
champion_id,902
kills,0
deaths,2
assists,12
total_gold,7058
MinionsKilled,30
team_side,Blue
team_win,0
queue_type,CLASSIC

Column,Value
kills_mean,39.0
deaths_mean,1.0
assists_mean,1.0
total_gold_mean,16025.0
MinionsKilled_mean,144.0
hist_winrate_mean,1.0
hist_games_played_mean,3.0
hist_games_played_sum,3.0
hist_avg_kills_mean,24.0
hist_avg_deaths_mean,0.666667


## 📌 Description of All Steps

**Step 1 – raw_df**  
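One row per player per match: player, match, champion, and role identifiers plus basic in-game stats (kills, deaths, assists, minions killed, damage dealt). Every later step is derived from this table.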

**Step 2 – player_role_stats**  
Each player's performance aggregated per role: games played, wins, win rate, and average kills, deaths, assists, CS, and gold. These averages are more stable than the stats of any single match.

**Step 3 – model_player_df**  
Per-player match rows (including `team_side` and `team_win`) combined with that player's historical role statistics, so that each row is ready to be grouped by match and team.

**Step 4 – team_model_df**  
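One row per team per match, with the player-level columns aggregated into team features such as `kills_mean` and `hist_winrate_mean`. This is the final dataset used to train the win-prediction model.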
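
The four steps above can be sketched on a toy table. This is only an illustration, not the notebook's actual feature code: the rows are made up, the join keys (`player_id`, `role`) and the `hist_` prefix are assumptions inferred from the column names in the overview, and only a few of the real columns are reproduced.

```python
import pandas as pd

# Step 1: toy stand-in for raw_df, one row per player per match (made-up values).
raw_df = pd.DataFrame({
    "player_id": [1, 2, 1, 2],
    "match_id":  ["M1", "M1", "M2", "M2"],
    "team_side": ["Blue", "Blue", "Blue", "Blue"],
    "role":      ["SUPPORT", "MID", "SUPPORT", "MID"],
    "win":       [1, 1, 0, 0],
    "kills":     [0, 8, 2, 4],
})

# Step 2: aggregate each player's history per role.
# Simplification: the history is computed over all rows, including the current match.
player_role_stats = (
    raw_df.groupby(["player_id", "role"], as_index=False)
          .agg(games_played=("win", "count"),
               wins=("win", "sum"),
               avg_kills=("kills", "mean"))
)
player_role_stats["winrate"] = (
    player_role_stats["wins"] / player_role_stats["games_played"]
)

# Step 3: attach the per-role history to every match row.
# The "hist_" prefix is an assumed naming convention.
hist = player_role_stats.rename(
    columns=lambda c: c if c in ("player_id", "role") else f"hist_{c}"
)
model_player_df = raw_df.merge(hist, on=["player_id", "role"], how="left")

# Step 4: collapse to one row per team per match.
team_model_df = (
    model_player_df.groupby(["match_id", "team_side"], as_index=False)
                   .agg(kills_mean=("kills", "mean"),
                        hist_winrate_mean=("hist_winrate", "mean"),
                        hist_games_played_sum=("hist_games_played", "sum"),
                        team_win=("win", "first"))
)

print(team_model_df)
```

Each step only adds columns or reduces rows, which is why the overview cell can show "new columns only" by comparing the column sets of consecutive tables.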
