## 📦 Importing required libraries


In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import classification_report

## 📥 Reading the dataset

In [2]:
df = pd.read_csv('/kaggle/input/25-05-league-of-legends-champion-data-2025/140325_LoL_champion_data.csv')
df


Unnamed: 0.1,Unnamed: 0,id,apiname,title,difficulty,herotype,alttype,resource,stats,rangetype,...,be,rp,skill_i,skill_q,skill_w,skill_e,skill_r,skills,fullname,nickname
0,Aatrox,266.0,Aatrox,the Darkin Blade,2,Fighter,Tank,Blood Well,"{'hp_base': 650, 'hp_lvl': 114, 'mp_base': 0, ...",Melee,...,2400,880,{1: 'Deathbringer Stance'},"{1: 'The Darkin Blade', 2: 'The Darkin Blade 3'}",{1: 'Infernal Chains'},{1: 'Umbral Dash'},{1: 'World Ender'},"{1: 'Deathbringer Stance', 2: 'The Darkin Blad...",,
1,Ahri,103.0,Ahri,the Nine-Tailed Fox,2,Mage,Assassin,Mana,"{'hp_base': 590, 'hp_lvl': 104, 'mp_base': 418...",Ranged,...,1575,790,{1: 'Essence Theft'},{1: 'Orb of Deception'},{1: 'Fox-Fire'},{1: 'Charm'},{1: 'Spirit Rush'},"{1: 'Essence Theft', 2: 'Orb of Deception', 3:...",,
2,Akali,84.0,Akali,the Rogue Assassin,2,Assassin,,Energy,"{'hp_base': 600, 'hp_lvl': 119, 'mp_base': 200...",Melee,...,1575,790,"{1: ""Assassin's Mark""}",{1: 'Five Point Strike'},{1: 'Twilight Shroud'},{1: 'Shuriken Flip'},{1: 'Perfect Execution'},"{1: ""Assassin's Mark"", 2: 'Five Point Strike',...",Akali Jhomen Tethi,
3,Akshan,166.0,Akshan,the Rogue Sentinel,3,Marksman,Assassin,Mana,"{'hp_base': 630, 'hp_lvl': 107, 'mp_base': 350...",Ranged,...,2400,880,{1: 'Dirty Fighting'},{1: 'Avengerang'},{1: 'Going Rogue'},{1: 'Heroic Swing'},{1: 'Comeuppance'},"{1: 'Dirty Fighting', 2: 'Avengerang', 3: 'Goi...",,
4,Alistar,12.0,Alistar,the Minotaur,1,Tank,Support,Mana,"{'hp_base': 685, 'hp_lvl': 120, 'mp_base': 350...",Melee,...,675,585,{1: 'Triumphant Roar'},{1: 'Pulverize'},{1: 'Headbutt'},{1: 'Trample'},{1: 'Unbreakable Will'},"{1: 'Triumphant Roar', 2: 'Pulverize', 3: 'Hea...",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
167,Zeri,221.0,Zeri,The Spark of Zaun,2,Marksman,,Mana,"{'hp_base': 600, 'hp_lvl': 110, 'mp_base': 250...",Ranged,...,2400,880,{1: 'Living Battery'},{1: 'Burst Fire'},{1: 'Ultrashock Laser'},{1: 'Spark Surge'},{1: 'Lightning Crash'},"{1: 'Living Battery', 2: 'Burst Fire', 3: 'Ult...",,
168,Ziggs,115.0,Ziggs,the Hexplosives Expert,2,Mage,,Mana,"{'hp_base': 606, 'hp_lvl': 106, 'mp_base': 480...",Ranged,...,2400,880,{1: 'Short Fuse'},{1: 'Bouncing Bomb'},{1: 'Satchel Charge'},{1: 'Hexplosive Minefield'},{1: 'Mega Inferno Bomb'},"{1: 'Short Fuse', 2: 'Bouncing Bomb', 3: 'Satc...",,
169,Zilean,26.0,Zilean,the Chronokeeper,2,Support,Mage,Mana,"{'hp_base': 574, 'hp_lvl': 96, 'mp_base': 452,...",Ranged,...,675,585,{1: 'Time in a Bottle'},{1: 'Time Bomb'},{1: 'Rewind'},{1: 'Time Warp'},{1: 'Chronoshift'},"{1: 'Time in a Bottle', 2: 'Time Bomb', 3: 'Re...",,
170,Zoe,142.0,Zoe,the Aspect of Twilight,3,Mage,Support,Mana,"{'hp_base': 630, 'hp_lvl': 106, 'mp_base': 425...",Ranged,...,2400,880,{1: 'More Sparkles!'},"{1: 'Paddle Star', 2: 'Paddle Star 2'}",{1: 'Spell Thief'},{1: 'Sleepy Trouble Bubble'},{1: 'Portal Jump'},"{1: 'More Sparkles!', 2: 'Paddle Star', 3: 'Sp...",,


## 🧹 Data preparation and cleaning


To simplify the dataset and improve model performance, I dropped a series of columns that were either:
- unnecessary or irrelevant for prediction (e.g., `title`, `nickname`, `role`),
- redundant or too complex to be used directly (e.g., `stats`, `skills`, `changes`),
- impossible to realistically predict (e.g., exact skill names like `skill_q`, `skill_r`),
- or directly related to champion identity, which could cause data leakage (e.g., `fullname`, `apiname`, `style`).

The remaining columns are general, structured attributes suitable for training machine learning models.


In [3]:
columns_to_drop = ['apiname', 'title', 'alttype',  'stats', 'patch', 'changes', 'role',
    'client_positions', 'external_positions', 'adaptivetype', 'be', 'rp', 'skill_i', 'skill_q',
    'skill_e', 'skill_w', 'skill_r', 'skills', 'fullname', 'nickname', 'style',
]

df_cleaned = df.drop(columns=columns_to_drop)
df_cleaned.head()

Unnamed: 0.1,Unnamed: 0,id,difficulty,herotype,resource,rangetype,date,damage,toughness,control,mobility,utility
0,Aatrox,266.0,2,Fighter,Blood Well,Melee,2013-06-13,3,3,2,2,2
1,Ahri,103.0,2,Mage,Mana,Ranged,2011-12-14,3,1,2,3,1
2,Akali,84.0,2,Assassin,Energy,Melee,2010-05-11,3,1,1,3,1
3,Akshan,166.0,3,Marksman,Mana,Ranged,2021-07-22,3,1,1,3,2
4,Alistar,12.0,1,Tank,Mana,Melee,2009-02-21,1,3,3,1,2


In [4]:
duplicates = df_cleaned[df_cleaned.duplicated('id')]
print("Duplicate IDs:\n", duplicates)

Duplicate IDs:
 Empty DataFrame
Columns: [Unnamed: 0, id, difficulty, herotype, resource, rangetype, date, damage, toughness, control, mobility, utility]
Index: []


Next, the dataset was checked for missing values.

The check below shows that missing data exists only in the `resource` column. The five affected rows correspond to champions who do not use any conventional resource like mana or energy.

Since this design is intentional and part of the gameplay, the missing values are not errors. They were therefore replaced with the label "NoResource" to reflect this mechanic explicitly.
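A per-column null count is one way to confirm that `resource` is the only column with gaps. The sketch below uses a small made-up frame (the `toy` data is an assumption, not rows from the dataset) to show the pattern.

```python
import pandas as pd

# Toy stand-in for df_cleaned; the values are illustrative only.
toy = pd.DataFrame({
    "herotype": ["Fighter", "Mage", "Assassin"],
    "resource": ["Mana", None, "Energy"],
    "rangetype": ["Melee", "Ranged", "Melee"],
})

# Count missing values per column and keep only the columns that have any.
null_counts = toy.isnull().sum()
columns_with_gaps = null_counts[null_counts > 0].index.tolist()
print(columns_with_gaps)

# Replace the gaps with an explicit label, as done for the real data.
toy["resource"] = toy["resource"].fillna("NoResource")
print(toy["resource"].tolist())
```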

In [5]:
df_missing = df_cleaned[df_cleaned.isnull().any(axis=1)]
df_missing

Unnamed: 0.1,Unnamed: 0,id,difficulty,herotype,resource,rangetype,date,damage,toughness,control,mobility,utility
15,Bel'Veth,200.0,2,Fighter,,Melee,2022-06-09,3,1,2,3,1
38,Garen,86.0,1,Fighter,,Melee,2010-04-27,2,3,1,1,1
62,Katarina,55.0,2,Assassin,,Melee,2009-09-19,3,1,1,3,1
112,Riven,92.0,2,Fighter,,Melee,2011-09-14,3,2,2,3,1
152,Viego,234.0,3,Fighter,,Melee,2021-01-21,3,1,2,2,1


In [6]:
df_cleaned['resource'] = df_cleaned['resource'].fillna('NoResource')
df_cleaned['resource'].isnull().sum()

0

This section provides a quick overview of the unique values found in three categorical columns: `herotype`, `resource`, and `rangetype`.

- `herotype` refers to the class or primary style of the champion (e.g., Fighter, Mage, Tank).
- `resource` shows the type of energy system the champion uses, such as mana, energy, or none (represented as `"NoResource"`).
- `rangetype` indicates whether the champion attacks from range or in melee.



In [7]:
print("🔍 Unique values in 'herotype':")
print(df_cleaned['herotype'].unique())

print("\n🔍 Unique values in 'resource':")
print(df_cleaned['resource'].unique())
print(df_cleaned['resource'].value_counts())

print("\n🔍 Unique values in 'rangetype':")
print(df_cleaned['rangetype'].unique())


🔍 Unique values in 'herotype':
['Fighter' 'Mage' 'Assassin' 'Marksman' 'Tank' 'Support']

🔍 Unique values in 'resource':
['Blood Well' 'Mana' 'Energy' 'NoResource' 'Frenzy' 'Health' 'Rage'
 'Courage' 'Shield' 'Fury' 'Ferocity' 'Heat' 'Grit' 'Crimson Rush' 'Flow']
resource
Mana            142
Energy            6
NoResource        5
Rage              3
Fury              3
Health            2
Courage           2
Flow              2
Blood Well        1
Frenzy            1
Shield            1
Ferocity          1
Heat              1
Grit              1
Crimson Rush      1
Name: count, dtype: int64

🔍 Unique values in 'rangetype':
['Melee' 'Ranged']


In [8]:
resource_counts = df_cleaned['resource'].value_counts()
rare_resources = resource_counts[resource_counts <= 3].index.tolist()
rare_champions = df_cleaned[df_cleaned['resource'].isin(rare_resources)]
rare_champions[['Unnamed: 0', 'id', 'herotype', 'resource']]

Unnamed: 0.1,Unnamed: 0,id,herotype,resource
0,Aatrox,266.0,Fighter,Blood Well
19,Briar,233.0,Fighter,Frenzy
27,Dr. Mundo,36.0,Tank,Health
39,Gnar,150.0,Fighter,Rage
40,Mega Gnar,150.2,Fighter,Rage
68,Kled,240.0,Fighter,Courage
69,Kled & Skaarl,240.1,Fighter,Courage
86,Mordekaiser,82.0,Fighter,Shield
107,Rek'Sai,421.0,Fighter,Rage
110,Renekton,58.0,Fighter,Fury


Two special cases (`Mega Gnar` and `Kled & Skaarl`) were excluded from the dataset, as these do not represent independent champions but rather conditional forms or combinations.

Next, the `resource` column was simplified by keeping only three named types: **Mana**, **Energy**, and **Fury**. All other energy systems, including `"NoResource"`, were grouped under a single label: `"UniqueRes"`.

Finally, the code lists the unique values for `herotype`, `resource`, and `rangetype`. This quick check helps confirm the reduced variety and ensures consistency in the categorical variables before encoding or modeling.
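The grouping step can also be written without `apply`. The sketch below is a minimal variant on made-up data (the `toy` frame and its `name` column are assumptions); it uses `Series.where` and takes an explicit `.copy()` after filtering, which avoids pandas' `SettingWithCopyWarning` when the column is reassigned.

```python
import pandas as pd

# Toy stand-in for df_cleaned; the values are illustrative only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "resource": ["Mana", "Energy", "Rage", "NoResource", "Fury"],
})

resources_to_keep = ["Mana", "Energy", "Fury"]

# .copy() makes the filtered frame independent of the original.
subset = toy[toy["name"] != "C"].copy()

# Series.where keeps values where the condition holds and substitutes
# the second argument everywhere else.
subset["resource"] = subset["resource"].where(
    subset["resource"].isin(resources_to_keep), "UniqueRes"
)
print(subset["resource"].value_counts())
```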


In [9]:
df_cleaned = df_cleaned[~df_cleaned['Unnamed: 0'].isin(['Mega Gnar', 'Kled & Skaarl'])]

resources_to_keep = ['Mana', 'Energy', 'Fury']
df_cleaned['resource'] = df_cleaned['resource'].apply(
    lambda x: x if x in resources_to_keep else 'UniqueRes'
)

print("🔍 Unique values in 'herotype':")
print(df_cleaned['herotype'].unique())
print("\n🔍 Unique values in 'resource':")
print(df_cleaned['resource'].unique())
print(df_cleaned['resource'].value_counts())
print("\n🔍 Unique values in 'rangetype':")
print(df_cleaned['rangetype'].unique())


🔍 Unique values in 'herotype':
['Fighter' 'Mage' 'Assassin' 'Marksman' 'Tank' 'Support']

🔍 Unique values in 'resource':
['UniqueRes' 'Mana' 'Energy' 'Fury']
resource
Mana         142
UniqueRes     19
Energy         6
Fury           3
Name: count, dtype: int64

🔍 Unique values in 'rangetype':
['Melee' 'Ranged']


To prepare the categorical columns for machine learning models, a simple integer mapping was applied to convert text values into numeric codes.

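The function `map_column_to_integers()` creates a dictionary that assigns a unique integer, starting at 1 and following alphabetical order, to each distinct category in a given column. It then replaces the column's values in place and returns the dictionary. This transformation is needed because scikit-learn estimators expect numerical input.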

The following mappings were applied:

- `herotype`: mapped to distinguish different champion archetypes.
- `resource`: mapped to represent the type of energy or system each champion uses.
- `rangetype`: mapped to encode whether the champion fights in melee or at range.
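
Because the mappings are plain dictionaries, they can be inverted to turn predicted codes back into labels. The sketch below is illustrative only: `build_mapping` is a hypothetical helper that mirrors the numbering scheme described above, and the sample values are made up.

```python
import pandas as pd

def build_mapping(values):
    """Assign 1-based integer codes to the sorted unique values."""
    return {val: idx + 1 for idx, val in enumerate(sorted(set(values)))}

herotypes = pd.Series(["Mage", "Fighter", "Tank", "Mage"])
herotype_map = build_mapping(herotypes)
encoded = herotypes.map(herotype_map)

# Invert the dictionary to decode integer codes back into labels.
inverse_map = {code: label for label, code in herotype_map.items()}
decoded = encoded.map(inverse_map)

print(herotype_map)
print(decoded.tolist())
```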




In [10]:
def map_column_to_integers(df, column):
    unique_vals = sorted(df[column].unique())  
    mapping = {val: idx + 1 for idx, val in enumerate(unique_vals)}
    df[column] = df[column].map(mapping)
    return mapping

herotype_map = map_column_to_integers(df_cleaned, 'herotype')
resource_map = map_column_to_integers(df_cleaned, 'resource')
rangetype_map = map_column_to_integers(df_cleaned, 'rangetype')

print("✅ Mapping for 'herotype':", herotype_map)
print("✅ Mapping for 'resource':", resource_map)
print("✅ Mapping for 'rangetype':", rangetype_map)


✅ Mapping for 'herotype': {'Assassin': 1, 'Fighter': 2, 'Mage': 3, 'Marksman': 4, 'Support': 5, 'Tank': 6}
✅ Mapping for 'resource': {'Energy': 1, 'Fury': 2, 'Mana': 3, 'UniqueRes': 4}
✅ Mapping for 'rangetype': {'Melee': 1, 'Ranged': 2}


The next step focuses on predicting the characteristics of the last two champions added to the dataset. The cell below sorts the champions by release date and sets these two entries aside as a test set, to evaluate the model's ability to generalize to newly released champions.
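
A date-based hold-out of this kind can be sketched as a small helper that returns both parts at once, so the split is performed exactly one time. `split_latest` and the `toy` frame are hypothetical names used only for illustration.

```python
import pandas as pd

def split_latest(df, date_col, n_test):
    """Return (train, test); test holds the n_test newest rows (n_test >= 1)."""
    ordered = df.sort_values(by=date_col)
    return ordered.iloc[:-n_test].copy(), ordered.iloc[-n_test:].copy()

# Toy stand-in for df_cleaned; names and dates are illustrative only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "date": pd.to_datetime(
        ["2020-01-01", "2024-11-06", "2010-05-11", "2025-01-23"]
    ),
})

train, test = split_latest(toy, "date", n_test=2)
print("train:", train["name"].tolist())
print("test:", test["name"].tolist())
```

Keeping the result in separate `train` and `test` variables, rather than overwriting the source frame, makes it harder to apply the split twice by accident.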


In [11]:
df_cleaned['date'] = pd.to_datetime(df_cleaned['date'])
df_cleaned = df_cleaned.sort_values(by='date')
df_test = df_cleaned.tail(2).copy()
df_cleaned = df_cleaned.iloc[:-2]

print("Training set size:", df_cleaned.shape)
print("Test set:")
df_test

Training set size: (168, 12)
Test set:


Unnamed: 0.1,Unnamed: 0,id,difficulty,herotype,resource,rangetype,date,damage,toughness,control,mobility,utility
5,Ambessa,799.0,3,2,1,1,2024-11-06,3,2,1,3,1
83,Mel,800.0,2,3,3,2,2025-01-23,3,1,2,1,1


## 🌲 Prediction using Random Forest (existing champions)


The model was tested on two held-out champions that were not seen during training. For each attribute, a separate `RandomForestClassifier` was trained to predict that attribute from the remaining eight. Because the cell below repeats the date-based split from the previous cell, the two champions evaluated here are the rows with index 127 and 12, the most recent releases left after Ambessa and Mel were removed.

Out of a total of **18 values (2 champions × 9 attributes)**, the model correctly predicted **13**, resulting in an overall accuracy of:

> **✅ 72.2% prediction accuracy**

Below is a comparison table, with mapped values converted back to their original labels for better interpretation:

| Attribute   | Row 127 (Predicted) | Row 127 (Actual) | Row 12 (Predicted) | Row 12 (Actual) |
|-------------|---------------------|------------------|--------------------|-----------------|
| Difficulty  | 1                   | 2                | 2                  | 2               |
| Herotype    | Marksman (4)        | Marksman (4)     | Assassin (1)       | Mage (3)        |
| Resource    | Mana (3)            | Mana (3)         | Mana (3)           | Mana (3)        |
| Rangetype   | Ranged (2)          | Ranged (2)       | Ranged (2)         | Ranged (2)      |
| Damage      | 3                   | 3                | 3                  | 3               |
| Toughness   | 1                   | 1                | 1                  | 1               |
| Control     | 2                   | 1                | 2                  | 2               |
| Mobility    | 2                   | 1                | 1                  | 3               |
| Utility     | 1                   | 1                | 1                  | 1               |

As seen in the table, the model is correct for both champions on `resource`, `rangetype`, `damage`, `toughness`, and `utility`. It misses `mobility` for both champions, and `difficulty`, `control`, and `herotype` for one champion each. Attributes like `mobility` and `control` may be harder to capture due to their subjective or design-driven nature.
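
The overall accuracy can be recomputed by comparing the prediction and ground-truth tables element by element. The sketch below copies the values from the two tables displayed in the output of the following cell; the variable names match that cell, but the frames are rebuilt by hand here so the snippet runs on its own.

```python
import pandas as pd

cols = ["difficulty", "herotype", "resource", "rangetype", "damage",
        "toughness", "control", "mobility", "utility"]

# Values copied from the displayed prediction and ground-truth tables.
df_predictions = pd.DataFrame(
    [[1, 4, 3, 2, 3, 1, 2, 2, 1], [2, 1, 3, 2, 3, 1, 2, 1, 1]],
    index=[127, 12], columns=cols)
df_truth = pd.DataFrame(
    [[2, 4, 3, 2, 3, 1, 1, 1, 1], [2, 3, 3, 2, 3, 1, 2, 3, 1]],
    index=[127, 12], columns=cols)

matches = df_predictions.eq(df_truth)   # element-wise comparison
n_correct = int(matches.to_numpy().sum())
n_total = matches.size
accuracy = n_correct / n_total

print(f"{n_correct}/{n_total} correct ({accuracy:.1%})")
print(matches.mean())                   # share of correct values per attribute
```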


In [12]:
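# NOTE: the previous cell already sorted by date and removed the two newest
# champions (Ambessa and Mel). Repeating the split here holds out the next
# two newest rows (index 127 and 12) and leaves Ambessa and Mel in neither set.
# Skip this split to evaluate on the df_test built in the previous cell.
df_cleaned['date'] = pd.to_datetime(df_cleaned['date'])
df_cleaned = df_cleaned.sort_values(by='date')

df_test = df_cleaned.tail(2).copy()
df_cleaned = df_cleaned.iloc[:-2]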

excluded_cols = ['Unnamed: 0', 'id', 'date']
available_cols = [col for col in df_cleaned.columns if col not in excluded_cols]

target_cols = available_cols.copy()
predictions = {}
true_values = {}

for col in target_cols:
    print(f"🔧 Training model for: {col}")
    input_cols = [c for c in available_cols if c != col]
    X_train = df_cleaned[input_cols]
    X_test = df_test[input_cols]
    y_train = df_cleaned[col]
    y_test = df_test[col]
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    predictions[col] = preds
    true_values[col] = y_test.values

df_predictions = pd.DataFrame(predictions, index=df_test.index)
df_truth = pd.DataFrame(true_values, index=df_test.index)

print("🔮 Predictions for the last 2 champions:")
display(df_predictions)

print("🎯 Actual values:")
display(df_truth)


🔧 Training model for: difficulty
🔧 Training model for: herotype
🔧 Training model for: resource
🔧 Training model for: rangetype
🔧 Training model for: damage
🔧 Training model for: toughness
🔧 Training model for: control
🔧 Training model for: mobility
🔧 Training model for: utility
🔮 Predictions for the last 2 champions:


Unnamed: 0,difficulty,herotype,resource,rangetype,damage,toughness,control,mobility,utility
127,1,4,3,2,3,1,2,2,1
12,2,1,3,2,3,1,2,1,1


🎯 Actual values:


Unnamed: 0,difficulty,herotype,resource,rangetype,damage,toughness,control,mobility,utility
127,2,4,3,2,3,1,1,1,1
12,2,3,3,2,3,1,2,3,1


## 🧠 Predicting future champions using clustering (KMeans + Random Forest)


Using `df_all`, which recombines the training rows with the two hold-out rows from the previous cell, a model was trained to estimate what attributes future champions might have. The inputs for 5 hypothetical champions are built from each column's mean plus a small amount of Gaussian noise, simulating what Riot Games might release in the coming years.

Because all five inputs start from the same column means, the predicted rows are nearly identical (as expected from this statistical input method). They still reveal consistent traits that the model considers likely for new designs.

Taken at face value, the output suggests that **one of the next champions released in League of Legends** could have the following characteristics (row 4 of the table printed by the cell below):

- **Difficulty:** 1 (low skill floor)
- **Herotype:** Fighter (2)
- **Resource:** Mana (3)
- **Rangetype:** Ranged (2) (honestly, a ranged fighter with mana sounds like an Urgot clone XD)
- **Damage:** 3 (high damage)
- **Toughness:** 2 (moderate durability)
- **Control:** 2 (some form of crowd control)
- **Mobility:** 1 (limited movement abilities)
- **Utility:** 2 (moderate team support or tools)
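
The synthetic inputs depend on random noise, so reruns can give different rows unless the generator is seeded. The sketch below is one possible variant, not the notebook's exact method: it uses a seeded NumPy generator, rounds to the nearest integer (a plain integer cast truncates toward zero), and clips to the observed range. The `toy` frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the numeric attribute columns; values are illustrative.
toy = pd.DataFrame({
    "damage": [3, 3, 1, 2, 3],
    "toughness": [1, 3, 3, 2, 1],
})

rng = np.random.default_rng(42)  # fixed seed so reruns give the same rows

synthetic = pd.DataFrame()
for col in toy.columns:
    noisy = toy[col].mean() + rng.normal(0, 0.1, size=5)
    # Round to the nearest integer, then clip to the range seen in the data.
    rounded = np.clip(np.round(noisy), toy[col].min(), toy[col].max())
    synthetic[col] = rounded.astype(int)

print(synthetic)
```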


In [13]:
df_all = pd.concat([df_cleaned, df_test], axis=0).reset_index(drop=True)
excluded_cols = ['Unnamed: 0', 'id', 'date', 'style']
available_cols = [col for col in df_all.columns if col not in excluded_cols]

target_cols = available_cols.copy()
X_full = df_all[available_cols]
X_future = pd.DataFrame()

for col in available_cols:
    if df_all[col].dtype == 'object':
        most_common = df_all[col].mode()[0]
        X_future[col] = [most_common] * 5
    else:
        mean_val = df_all[col].mean()
        noise = np.random.normal(0, 0.1, size=5)
        X_future[col] = mean_val + noise

for col in X_future.columns:
    X_future[col] = X_future[col].astype(df_all[col].dtype)
future_predictions = {}

for col in target_cols:
    input_cols = [c for c in available_cols if c != col]
    X_train = df_all[input_cols]
    y_train = df_all[col]
    X_pred = X_future[input_cols]
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_pred)
    future_predictions[col] = preds

df_future_predictions = pd.DataFrame(future_predictions)
print("🔮 Prediction for the next 5 League Of Legends champions:")
display(df_future_predictions)


🔮 Prediction for the next 5 League Of Legends champions:


Unnamed: 0,difficulty,herotype,resource,rangetype,damage,toughness,control,mobility,utility
0,1,2,3,2,3,2,1,2,1
1,1,2,3,2,3,2,1,1,1
2,1,2,3,2,3,3,2,1,2
3,1,2,3,2,3,2,1,2,1
4,1,2,3,2,3,2,2,1,2


In [14]:
df_all = pd.concat([df_cleaned, df_test], axis=0).reset_index(drop=True)
excluded_cols = ['Unnamed: 0', 'id', 'date', 'style']
available_cols = [col for col in df_all.columns if col not in excluded_cols]
target_cols = available_cols.copy()
X_cluster = df_all[available_cols]
kmeans = KMeans(n_clusters=5, random_state=42)
kmeans.fit(X_cluster)
cluster_centers = pd.DataFrame(kmeans.cluster_centers_, columns=available_cols)
cluster_centers = cluster_centers.round(0).astype(int)
future_predictions = {}

for col in target_cols:
    input_cols = [c for c in available_cols if c != col]
    X_train = df_all[input_cols]
    y_train = df_all[col]
    X_pred = cluster_centers[input_cols]
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_pred)
    future_predictions[col] = preds

df_future_predictions = pd.DataFrame(future_predictions)
print("🔮 Prediction for the next 5 League Of Legends champions with clustering:")
display(df_future_predictions)




🔮 Prediction for the next 5 League Of Legends champions with clustering:


Unnamed: 0,difficulty,herotype,resource,rangetype,damage,toughness,control,mobility,utility
0,1,2,3,1,3,3,2,1,1
1,1,5,3,2,2,1,3,1,3
2,3,3,3,2,3,1,2,1,2
3,2,2,3,1,3,2,1,3,1
4,1,6,3,1,2,3,3,1,1


Based on clustering-derived feature combinations and model predictions, here are five hypothetical champion concepts, each representing a distinct gameplay archetype:

---

#### 🧤 Champion 0
- **Difficulty:** 1 (easy)
- **Herotype:** Fighter
- **Resource:** Mana
- **Rangetype:** Melee
- **Damage:** 3
- **Toughness:** 3
- **Control:** 2
- **Mobility:** 1
- **Utility:** 1

*A durable, high-damage melee fighter with little mobility. Likely excels in close-quarters skirmishes and frontline presence, potentially in the top lane.*

---

#### 🐺 Champion 1
- **Difficulty:** 1 (easy)
- **Herotype:** Support
- **Resource:** Mana
- **Rangetype:** Ranged
- **Damage:** 2
- **Toughness:** 1
- **Control:** 3
- **Mobility:** 1
- **Utility:** 3

*A fragile ranged support built around crowd control and supportive tools rather than damage.*

---

#### 🧬 Champion 2
- **Difficulty:** 3 (high)
- **Herotype:** Mage
- **Resource:** Mana
- **Rangetype:** Ranged
- **Damage:** 3
- **Toughness:** 1
- **Control:** 2
- **Mobility:** 1
- **Utility:** 2

*A high-difficulty ranged mage focused on damage, with some crowd control and teamfight presence. Likely suited for the mid lane.*

---

#### ⚔️ Champion 3
- **Difficulty:** 2 (moderate)
- **Herotype:** Fighter
- **Resource:** Mana
- **Rangetype:** Melee
- **Damage:** 3
- **Toughness:** 2
- **Control:** 1
- **Mobility:** 3
- **Utility:** 1

*A fast melee fighter that prioritizes mobility and damage over control. Ideal for jungle or top lane skirmishes.*

---

#### 🛡️ Champion 4
- **Difficulty:** 1 (easy)
- **Herotype:** Tank
- **Resource:** Mana
- **Rangetype:** Melee
- **Damage:** 2
- **Toughness:** 3
- **Control:** 3
- **Mobility:** 1
- **Utility:** 1

*A classic frontline melee tank focused on soaking damage and disrupting enemies with crowd control. Likely played as a top laner or support, this champion excels in initiation and holding the line for the team.*
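
One caveat of rounding cluster centers is that `herotype` codes are nominal, so the mean of codes 1 and 3 rounds to 2 even if no member of the cluster is a Fighter. A possible alternative, sketched below under assumptions, is to summarize each cluster with the most frequent code for nominal columns and the rounded mean for the 1 to 3 ratings. The `toy` frame and its `cluster` column (standing in for `kmeans.labels_`) are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "herotype": [1, 3, 3, 6, 6, 6],   # nominal codes
    "damage":   [3, 3, 2, 1, 2, 1],   # ordered 1-3 rating
    "cluster":  [0, 0, 0, 1, 1, 1],   # assumed labels, e.g. kmeans.labels_
})

grouped = toy.groupby("cluster")
profile = pd.DataFrame({
    # Most frequent code per cluster: always an existing category.
    "herotype": grouped["herotype"].agg(lambda s: s.mode().iloc[0]),
    # A rounded mean is reasonable for ordered ratings.
    "damage": grouped["damage"].mean().round().astype(int),
})
print(profile)
```

In cluster 0 the mean `herotype` code is about 2.33, which would round to 2, while the mode stays at 3, a category that actually occurs in the cluster.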


---



### 🙏 Thanks & closing remarks

Thank you for taking the time to explore this project. 
Feedback, suggestions, or collaborations are always welcome!

*— This notebook was built with fun and curiosity. GG & have fun!*

---
PS: don't run it down mid =)))

