# Assignment: Exploring Pokémon Data with Unsupervised Learning
## Lucien BAUER - Master 2 Data Science
### Date: February 6, 2026

---

## ⚠️ IMPORTANT: Uploading the files

**BEFORE RUNNING THE CODE:**

1. Click the 📁 icon on the left
2. Click the upload icon 📤
3. Upload these 3 files:
   - pokemon_complete.csv
   - moves_complete.csv
   - learnset_complete.csv

**OR** run the cell below to upload them from code:

In [None]:
# Upload the files (interactive method)
from google.colab import files

print("📤 Upload the 3 CSV files when the dialog opens:")
print("   1. pokemon_complete.csv")
print("   2. moves_complete.csv")
print("   3. learnset_complete.csv\n")

uploaded = files.upload()

print("\n✅ Files uploaded successfully!")
print(f"Files received: {list(uploaded.keys())}")


## Imports and Configuration

In [None]:
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Clustering
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Dimensionality Reduction
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Text processing
from sklearn.feature_extraction.text import TfidfVectorizer
import re

# Anomaly detection
from sklearn.ensemble import IsolationForest

# Distance metrics
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Plot configuration
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ All libraries loaded!")

## Part 1: Understanding the Data

In [None]:
# Load the data (files uploaded to Colab)
df_pokemon = pd.read_csv('pokemon_complete.csv')
df_moves = pd.read_csv('moves_complete.csv')
df_learnset = pd.read_csv('learnset_complete.csv')

print(f"Pokémon: {df_pokemon.shape}")
print(f"Moves: {df_moves.shape}")
print(f"Learnset: {df_learnset.shape}")

In [None]:
# Preview
print("=== Pokémon Dataset ===")
display(df_pokemon.head())

print("\n=== Moves Dataset ===")
display(df_moves.head())

In [None]:
# Basic statistics
n_pokemon = df_pokemon['pokemon_id'].nunique()
n_moves = df_moves['move_id'].nunique()
# Counts learnset rows per Pokémon; Pokémon absent from the learnset are not included
moves_per_pokemon = df_learnset.groupby('pokemon_id').size()

print(f"📊 Number of Pokémon: {n_pokemon}")
print(f"📊 Number of moves: {n_moves}")
print(f"📊 Average moves per Pokémon: {moves_per_pokemon.mean():.2f}")
print(f"   Min: {moves_per_pokemon.min()}, Max: {moves_per_pokemon.max()}")

In [None]:
# Missing values
print("Missing values (Pokémon):")
missing_poke = df_pokemon.isnull().sum()
print(missing_poke[missing_poke > 0])

print("\nMissing values (Moves):")
missing_moves = df_moves.isnull().sum()
print(missing_moves[missing_moves > 0])

In [None]:
# Handle missing values
df_pokemon['type_2'] = df_pokemon['type_2'].fillna('none')
df_pokemon['ability_2'] = df_pokemon['ability_2'].fillna('none')
df_pokemon['ability_3'] = df_pokemon['ability_3'].fillna('none')

# Assumption: a missing power means a non-damaging move; a missing accuracy means the move cannot miss
df_moves['power'] = df_moves['power'].fillna(0)
df_moves['accuracy'] = df_moves['accuracy'].fillna(100)
df_moves['effect_text'] = df_moves['effect_text'].fillna(df_moves['short_effect_text'])
df_moves['effect_text'] = df_moves['effect_text'].fillna('No description')

print("✓ Missing values handled")

In [None]:
# Type distribution
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

type1_counts = df_pokemon['type_1'].value_counts()
axes[0].bar(range(len(type1_counts)), type1_counts.values, color='skyblue')
axes[0].set_xticks(range(len(type1_counts)))
axes[0].set_xticklabels(type1_counts.index, rotation=45, ha='right')
axes[0].set_title('Primary Types', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Count')
axes[0].grid(axis='y', alpha=0.3)

damage_counts = df_moves['damage_class'].value_counts()
axes[1].bar(damage_counts.index, damage_counts.values, color='lightcoral')
axes[1].set_title('Damage Classes', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Count')
axes[1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nMost common type: {type1_counts.index[0]} ({type1_counts.values[0]})")

## Part 2: Clustering Pokémon by Statistics

In [None]:
# Select the stats
stat_columns = ['hp', 'attack', 'defense', 'special-attack', 'special-defense', 'speed']
X_stats = df_pokemon[stat_columns].values

print("Descriptive statistics:")
display(df_pokemon[stat_columns].describe())

In [None]:
# Normalization with StandardScaler
print("Normalization: StandardScaler")
print("Reason: gives every stat the same scale, suited to Euclidean distances")

scaler_stats = StandardScaler()
X_stats_scaled = scaler_stats.fit_transform(X_stats)

print(f"\nShape: {X_stats_scaled.shape}")
print(f"Mean: {X_stats_scaled.mean():.3f}")
print(f"Std: {X_stats_scaled.std():.3f}")
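
The mean and standard deviation printed by the normalization cell are computed over all six columns at once. A per-column check is a slightly stricter way to confirm that each stat was standardized on its own. The sketch below uses a small made-up matrix (`X_demo` and its values are hypothetical, not taken from the dataset).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the six base stats of four Pokémon (made-up values).
X_demo = np.array([
    [45, 49, 49, 65, 65, 45],
    [78, 84, 78, 109, 85, 100],
    [250, 5, 5, 35, 105, 50],
    [20, 10, 230, 10, 230, 5],
], dtype=float)

X_demo_scaled = StandardScaler().fit_transform(X_demo)

# After scaling, every column should have mean ~0 and population std ~1.
col_means = X_demo_scaled.mean(axis=0)
col_stds = X_demo_scaled.std(axis=0)
print(np.round(col_means, 3))
print(np.round(col_stds, 3))
```

The same two lines applied to `X_stats_scaled` with `axis=0` would give one value per stat column.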

In [None]:
# Elbow Method + Silhouette
inertias = []
silhouette_scores = []
K_range = range(2, 12)

print("Computing scores for different values of k...")
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_stats_scaled)
    inertias.append(kmeans.inertia_)
    silhouette_scores.append(silhouette_score(X_stats_scaled, kmeans.labels_))
    print(f"  k={k}: silhouette={silhouette_scores[-1]:.3f}")

# Plots
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

axes[0].plot(K_range, inertias, 'bo-', linewidth=2, markersize=8)
axes[0].set_xlabel('k', fontsize=12)
axes[0].set_ylabel('Inertia', fontsize=12)
axes[0].set_title('Elbow Method', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)

axes[1].plot(K_range, silhouette_scores, 'ro-', linewidth=2, markersize=8)
axes[1].set_xlabel('k', fontsize=12)
axes[1].set_ylabel('Silhouette Score', fontsize=12)
axes[1].set_title('Silhouette by k', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

best_k = K_range[np.argmax(silhouette_scores)]
print(f"\nBest k by silhouette: {best_k}")
print("Final choice: k=5 (good trade-off between interpretability and score)")
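
The Davies-Bouldin index is imported at the top of the notebook and could serve as a second opinion on k: lower is better, whereas a higher silhouette is better. The sketch below shows both metrics side by side on synthetic blobs. `X_demo` is a made-up stand-in for `X_stats_scaled`, so the k it favours says nothing about the Pokémon data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in for X_stats_scaled: 300 points around 4 centers in 6 dimensions.
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=6,
                       cluster_std=1.0, random_state=42)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_demo)
    scores[k] = (silhouette_score(X_demo, labels),
                 davies_bouldin_score(X_demo, labels))
    print(f"k={k}: silhouette={scores[k][0]:.3f}, davies_bouldin={scores[k][1]:.3f}")

# Silhouette: higher is better. Davies-Bouldin: lower is better.
best_sil = max(scores, key=lambda k: scores[k][0])
best_db = min(scores, key=lambda k: scores[k][1])
print(f"Best k by silhouette: {best_sil}, by Davies-Bouldin: {best_db}")
```

When the two metrics disagree, that is a hint that no single k is clearly preferred, and an interpretability-driven choice becomes easier to defend.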

In [None]:
# Final clustering
k_final = 5
kmeans_final = KMeans(n_clusters=k_final, random_state=42, n_init=20)
clusters = kmeans_final.fit_predict(X_stats_scaled)

df_pokemon['cluster'] = clusters

print(f"Clustering with k={k_final}")
print(f"Silhouette: {silhouette_score(X_stats_scaled, clusters):.3f}")
print("\nCluster sizes:")
print(df_pokemon['cluster'].value_counts().sort_index())

In [None]:
# PCA for visualization
print("Dimensionality reduction: PCA")
print("Reason: preserves variance, fast, interpretable")

pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X_stats_scaled)

print(f"\nVariance PC1: {pca.explained_variance_ratio_[0]:.2%}")
print(f"Variance PC2: {pca.explained_variance_ratio_[1]:.2%}")
print(f"Total: {pca.explained_variance_ratio_.sum():.2%}")
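
PCA is called interpretable because each component is a weighted combination of the original stats, and those weights (the loadings) can be read directly from `components_`. The sketch below shows how to tabulate them. It runs on random data (`X_demo` is hypothetical), so the loadings it prints carry no meaning for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

stat_columns = ['hp', 'attack', 'defense', 'special-attack', 'special-defense', 'speed']

# Random stand-in for the stat matrix (hypothetical values).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=70, scale=25, size=(200, 6))
X_demo_scaled = StandardScaler().fit_transform(X_demo)

pca_demo = PCA(n_components=2, random_state=42).fit(X_demo_scaled)

# Rows = original stats, columns = components; each column is a unit-length vector.
loadings = pd.DataFrame(pca_demo.components_.T,
                        index=stat_columns, columns=['PC1', 'PC2'])
print(loadings.round(2))
```

Applied to the fitted `pca` object, a table like this would show which stats drive each axis of the scatter plots.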

In [None]:
# PCA visualization
fig, axes = plt.subplots(1, 2, figsize=(18, 7))

# By cluster
scatter1 = axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=clusters, cmap='viridis', alpha=0.6, s=50, edgecolors='black', linewidth=0.5)
axes[0].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%})', fontsize=12)
axes[0].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%})', fontsize=12)
axes[0].set_title('Pokémon by Cluster', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)
plt.colorbar(scatter1, ax=axes[0], label='Cluster')

# Centroids
centroids_pca = pca.transform(kmeans_final.cluster_centers_)
axes[0].scatter(centroids_pca[:, 0], centroids_pca[:, 1], c='red', marker='X', s=300, edgecolors='black', linewidth=2, label='Centroids')
axes[0].legend()

# By type
type_map = {t: i for i, t in enumerate(df_pokemon['type_1'].unique())}
type_numeric = df_pokemon['type_1'].map(type_map)
scatter2 = axes[1].scatter(X_pca[:, 0], X_pca[:, 1], c=type_numeric, cmap='tab20', alpha=0.6, s=50, edgecolors='black', linewidth=0.5)
axes[1].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%})', fontsize=12)
axes[1].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%})', fontsize=12)
axes[1].set_title('Pokémon by Primary Type', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Mean stats per cluster
cluster_stats = df_pokemon.groupby('cluster')[stat_columns].mean()

print("Mean stats per cluster:")
display(cluster_stats.round(1))

# Heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(cluster_stats, annot=True, fmt='.1f', cmap='YlOrRd', linewidths=0.5)
plt.title('Mean Stats per Cluster', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# Name the clusters
# Rule-based labels from the cluster means; two clusters can receive the same name
cluster_names = {}
for cid in range(k_final):
    stats = cluster_stats.loc[cid]
    if stats['attack'] > 100 and stats['speed'] > 90:
        name = "Fast Sweepers"
    elif stats['defense'] > 90 and stats['special-defense'] > 90:
        name = "Defensive Walls"
    elif stats['special-attack'] > 100:
        name = "Special Attackers"
    elif stats['hp'] > 90:
        name = "Bulky Pokémon"
    else:
        name = "Balanced All-Rounders"
    cluster_names[cid] = name

df_pokemon['cluster_name'] = df_pokemon['cluster'].map(cluster_names)

print("Cluster names:")
for cid, name in cluster_names.items():
    count = (df_pokemon['cluster'] == cid).sum()
    print(f"  Cluster {cid}: {name} ({count} Pokémon)")

In [None]:
# Examples per cluster
for cid in range(k_final):
    print(f"\n{'='*60}")
    print(f"Cluster {cid}: {cluster_names[cid]}")
    print(f"{'='*60}")
    cluster_size = (df_pokemon['cluster'] == cid).sum()
    examples = df_pokemon[df_pokemon['cluster'] == cid].sample(min(5, cluster_size), random_state=42)
    display(examples[['name', 'type_1', 'type_2'] + stat_columns])

## Part 3: Analyzing Moves with Text

In [None]:
# Text preprocessing
def preprocess_text(text):
    """Lowercase, keep only letters and whitespace, and collapse repeated spaces."""
    text = str(text).lower()
    text = re.sub(r'[^a-z\s]', '', text)
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

df_moves['effect_clean'] = df_moves['effect_text'].apply(preprocess_text)
print("✓ Text preprocessed")

In [None]:
# TF-IDF
tfidf = TfidfVectorizer(max_features=200, min_df=2, max_df=0.8, stop_words='english')
X_tfidf = tfidf.fit_transform(df_moves['effect_clean'])

print(f"TF-IDF shape: {X_tfidf.shape}")
print(f"Sparsity: {(1 - X_tfidf.nnz / (X_tfidf.shape[0] * X_tfidf.shape[1])):.2%}")

In [None]:
# Characteristic words per damage_class
feature_names = tfidf.get_feature_names_out()
damage_classes = df_moves['damage_class'].dropna().unique()

for dc in damage_classes:
    # Select rows by position: X_tfidf rows follow the row order of df_moves, not its index labels
    rows = np.flatnonzero((df_moves['damage_class'] == dc).to_numpy())
    avg_tfidf = np.asarray(X_tfidf[rows].mean(axis=0)).ravel()
    top_idx = avg_tfidf.argsort()[::-1][:10]

    print(f"\n{'='*60}")
    print(f"{dc.upper()}")
    print(f"{'='*60}")
    for idx in top_idx:
        print(f"  {feature_names[idx]:20s}: {avg_tfidf[idx]:.4f}")
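
Rows of a TF-IDF matrix follow the row order of the DataFrame it was built from, not its index labels, so selecting rows by position stays correct even when the index is not 0..n-1 (for example after filtering). The sketch below illustrates this with a tiny hand-made sparse matrix. `df_demo` and `X_demo` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical frame whose index labels (10, 20, 30) differ from row positions (0, 1, 2).
df_demo = pd.DataFrame({'damage_class': ['physical', 'status', 'physical']},
                       index=[10, 20, 30])
X_demo = sparse.csr_matrix(np.array([[1.0, 0.0],
                                     [0.0, 2.0],
                                     [3.0, 0.0]]))

# Convert the boolean mask to row positions before indexing the sparse matrix.
mask = (df_demo['damage_class'] == 'physical').to_numpy()
rows = np.flatnonzero(mask)

# Column-wise mean over the selected rows, flattened to a 1-D array.
avg = np.asarray(X_demo[rows].mean(axis=0)).ravel()
print(rows, avg)
```

Indexing `X_demo` with the labels 10 and 30 instead would raise an error here, which is the failure mode positional selection avoids.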

In [None]:
# Cluster the moves
k_moves = 6
kmeans_moves = KMeans(n_clusters=k_moves, random_state=42, n_init=20)
move_clusters = kmeans_moves.fit_predict(X_tfidf)

df_moves['text_cluster'] = move_clusters

print(f"Clustering moves with k={k_moves}")
print(df_moves['text_cluster'].value_counts().sort_index())

In [None]:
# Analyze the move clusters
for cid in range(k_moves):
    print(f"\n{'='*60}")
    print(f"Move Cluster {cid}")
    print(f"{'='*60}")
    cluster_moves = df_moves[df_moves['text_cluster'] == cid]
    print("\nDamage class distribution:")
    print(cluster_moves['damage_class'].value_counts())
    print("\nExamples:")
    n_samples = min(3, len(cluster_moves))
    examples = cluster_moves.sample(n_samples, random_state=42)
    for _, m in examples.iterrows():
        # Fall back to a placeholder when short_effect_text is missing
        effect_text = str(m['short_effect_text']) if pd.notna(m['short_effect_text']) else "No description"
        # Append an ellipsis only when the text is actually truncated
        effect_preview = effect_text[:60] + "..." if len(effect_text) > 60 else effect_text
        print(f"  - {m['name']} ({m['damage_class']}): {effect_preview}")

## Part 4: Connecting Pokémon and Moves

In [None]:
# Move-based representation
print("Building the move-based representation...")
move_features_list = []

for pid in df_pokemon['pokemon_id']:
    pokemon_moves = df_learnset[df_learnset['pokemon_id'] == pid]['move_id']
    moves_info = df_moves[df_moves['move_id'].isin(pokemon_moves)]

    if moves_info.empty:
        # No known moves: use a zero vector so the means below are never NaN
        features = np.zeros(6)
    else:
        # Counts per damage class, then average power, accuracy and PP
        physical = (moves_info['damage_class'] == 'physical').sum()
        special = (moves_info['damage_class'] == 'special').sum()
        status = (moves_info['damage_class'] == 'status').sum()
        avg_power = moves_info['power'].mean()
        avg_acc = moves_info['accuracy'].mean()
        avg_pp = moves_info['pp'].mean()
        features = [physical, special, status, avg_power, avg_acc, avg_pp]

    move_features_list.append(features)

X_moves = np.array(move_features_list)
scaler_moves = StandardScaler()
X_moves_scaled = scaler_moves.fit_transform(X_moves)

print(f"✓ Move-based representation: {X_moves_scaled.shape}")
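
The same six features can also be computed without a Python loop, by merging the learnset with the move table and aggregating per Pokémon. The sketch below is one possible vectorized variant on toy tables. The `*_demo` frames and their values are hypothetical, while the column names mirror those used in this notebook.

```python
import pandas as pd

# Toy learnset: Pokémon 1 learns move 10 twice (e.g. via two learning methods).
df_learnset_demo = pd.DataFrame({'pokemon_id': [1, 1, 1, 1, 2, 2],
                                 'move_id':    [10, 10, 11, 12, 10, 12]})
df_moves_demo = pd.DataFrame({
    'move_id': [10, 11, 12],
    'damage_class': ['physical', 'special', 'status'],
    'power': [40.0, 90.0, 0.0],
    'accuracy': [100.0, 100.0, 100.0],
    'pp': [35, 15, 20],
})

# Count each (Pokémon, move) pair once, then attach the move attributes.
pairs = df_learnset_demo.drop_duplicates(['pokemon_id', 'move_id'])
merged = pairs.merge(df_moves_demo, on='move_id', how='inner')

# Number of moves per damage class, and mean power / accuracy / PP, per Pokémon.
counts = pd.crosstab(merged['pokemon_id'], merged['damage_class'])
means = merged.groupby('pokemon_id')[['power', 'accuracy', 'pp']].mean()
features = counts.join(means)
print(features)
```

Pokémon with no moves are absent from the result, so reindexing on the full list of `pokemon_id` values with a fill value of 0 would be needed to reproduce the zero vectors of the loop version.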

In [None]:
# Similarity function
def find_similar(pos, X_data, n=3):
    """Return the positions of the n rows of X_data closest to row `pos` (Euclidean distance)."""
    distances = euclidean_distances([X_data[pos]], X_data)[0]
    distances[pos] = np.inf  # exclude the query itself
    return distances.argsort()[:n]

# Select Pokémon
selected = df_pokemon.sample(5, random_state=42)

results = []
for label in selected.index:
    # The feature matrices are positional, so convert the index label to a row position
    pos = df_pokemon.index.get_loc(label)
    poke_name = df_pokemon.iloc[pos]['name']
    similar_stats = find_similar(pos, X_stats_scaled)
    similar_moves = find_similar(pos, X_moves_scaled)

    results.append({
        'Pokemon': poke_name,
        'Similar (Stats)': ', '.join(df_pokemon.iloc[similar_stats]['name'].values),
        'Similar (Moves)': ', '.join(df_pokemon.iloc[similar_moves]['name'].values)
    })

print("Similarity: Stats vs Moves")
display(pd.DataFrame(results))
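
For many queries, scikit-learn's `NearestNeighbors` offers an alternative to computing a full distance vector per Pokémon. The sketch below runs on random data (`X_demo` is hypothetical). It asks for one extra neighbor and drops the first column, on the assumption that each query point is its own nearest neighbor, which holds when there are no exact duplicate rows.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Random stand-in for a scaled feature matrix (100 rows, 6 features).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 6))

# Request 4 neighbors: the query itself plus its 3 closest other rows.
nn = NearestNeighbors(n_neighbors=4, metric='euclidean').fit(X_demo)
query_positions = [0, 5]
distances, neighbors = nn.kneighbors(X_demo[query_positions])

# Column 0 is the query itself (distance 0), so keep columns 1..3.
similar = neighbors[:, 1:]
print(similar)
```

The returned values are row positions, so they pair with `iloc` when looking names up in a DataFrame.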

In [None]:
# Similarity matrices
print("Computing similarity matrices...")
sim_stats = cosine_similarity(X_stats_scaled)
sim_moves = cosine_similarity(X_moves_scaled)

def print_first_pair(condition, n_max=200):
    """Print the first pair (i < j) among the first n_max Pokémon for which condition(i, j) holds."""
    n = min(n_max, len(df_pokemon))
    for i in range(n):
        for j in range(i + 1, n):
            if condition(i, j):
                print(f"  {df_pokemon.iloc[i]['name']} ↔ {df_pokemon.iloc[j]['name']}")
                print(f"    Stats sim: {sim_stats[i, j]:.3f}, Moves sim: {sim_moves[i, j]:.3f}")
                return
    print("  No pair found")

# Examples
print("\n1. Similar in STATS AND MOVES:")
print_first_pair(lambda i, j: sim_stats[i, j] > 0.9 and sim_moves[i, j] > 0.9)

print("\n2. Similar in STATS, different in MOVES:")
print_first_pair(lambda i, j: sim_stats[i, j] > 0.85 and sim_moves[i, j] < 0.5)

print("\n3. Different in STATS, similar in MOVES:")
print_first_pair(lambda i, j: sim_stats[i, j] < 0.6 and sim_moves[i, j] > 0.85)
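
One way to quantify how closely two similarity views agree is to correlate their off-diagonal entries, taking each pair once from the upper triangle. The sketch below does this on synthetic data, where `X_b` is a noisy copy of `X_a` (both hypothetical), and also shows a vectorized search for pairs that are similar in both views.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two synthetic feature views of the same 50 items; X_b is X_a plus noise.
rng = np.random.default_rng(42)
X_a = rng.normal(size=(50, 6))
X_b = X_a + rng.normal(scale=1.0, size=(50, 6))

sim_a = cosine_similarity(X_a)
sim_b = cosine_similarity(X_b)

# Upper triangle without the diagonal: each unordered pair appears exactly once.
iu = np.triu_indices_from(sim_a, k=1)
r = np.corrcoef(sim_a[iu], sim_b[iu])[0, 1]
print(f"Pearson correlation between the two similarity views: {r:.3f}")

# Vectorized search: all pairs (i < j) that are similar in both views.
both = np.argwhere(np.triu((sim_a > 0.9) & (sim_b > 0.9), k=1))
print(f"Pairs similar in both views: {len(both)}")
```

Replacing `sim_a` and `sim_b` with `sim_stats` and `sim_moves` would give a single number for the agreement between stats and moves, computed over all Pokémon rather than the first 200.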

## Part 5: Finding Unusual Pokémon

In [None]:
# Isolation Forest
print("Method: Isolation Forest")
print("Reason: efficient, makes no assumption about the data distribution")

# contamination=0.05 fixes the share of flagged points at about 5% by construction
iso_forest = IsolationForest(contamination=0.05, random_state=42, n_estimators=100)
anomaly_labels = iso_forest.fit_predict(X_stats_scaled)
anomaly_scores = iso_forest.score_samples(X_stats_scaled)

df_pokemon['anomaly_label'] = anomaly_labels
df_pokemon['anomaly_score'] = anomaly_scores

n_outliers = (anomaly_labels == -1).sum()
print(f"\nOutliers detected: {n_outliers} ({n_outliers/len(df_pokemon)*100:.1f}%)")
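
The -1 labels and the anomaly scores are linked by a single cut-off, which scikit-learn stores on the fitted model as `offset_`: `decision_function` equals `score_samples` minus `offset_`, and points with a negative decision value are labelled -1. The sketch below checks this relation on synthetic data (`X_demo`, with ten shifted points acting as planted outliers, is hypothetical).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# 190 "ordinary" points plus 10 shifted points acting as planted outliers.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(size=(190, 6)),
                    rng.normal(loc=6.0, size=(10, 6))])

forest = IsolationForest(contamination=0.05, random_state=42, n_estimators=100)
labels = forest.fit_predict(X_demo)
scores = forest.score_samples(X_demo)

# offset_ is the score threshold implied by the contamination rate.
threshold = forest.offset_
flagged = scores < threshold
print(f"Threshold: {threshold:.4f}, flagged: {flagged.sum()} of {len(X_demo)}")
```

Using `offset_` as the position of a threshold line on a score histogram gives the exact decision boundary rather than the score of the least extreme outlier.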

In [None]:
# Top outliers
top_outliers = df_pokemon.nsmallest(10, 'anomaly_score')

print("Top 10 outliers:")
display(top_outliers[['name', 'type_1', 'type_2', 'anomaly_score'] + stat_columns])

In [None]:
# Detailed analysis of the top 5
print("\nDetailed analysis of the top 5 outliers:\n")

for idx, row in top_outliers.head(5).iterrows():
    print(f"\n{'='*70}")
    print(f"{row['name']} (Type: {row['type_1']}/{row['type_2']})")
    print(f"Anomaly Score: {row['anomaly_score']:.4f}")
    print(f"{'='*70}")

    print("\nStatistics:")
    for stat in stat_columns:
        value = row[stat]
        mean = df_pokemon[stat].mean()
        std = df_pokemon[stat].std()
        z_score = (value - mean) / std
        status = "🔴 EXTREME" if abs(z_score) > 2.5 else "🟡 High" if abs(z_score) > 1.5 else "🟢 Normal"
        print(f"  {stat:18s}: {value:6.1f} (z={z_score:6.2f}) {status}")

In [None]:
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

colors = ['red' if l == -1 else 'blue' for l in anomaly_labels]
axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=colors, alpha=0.6, s=50, edgecolors='black', linewidth=0.5)
axes[0].set_title('Outliers in PCA Space', fontsize=14, fontweight='bold')
axes[0].set_xlabel('PC1')
axes[0].set_ylabel('PC2')
axes[0].grid(True, alpha=0.3)

axes[1].hist(anomaly_scores, bins=50, alpha=0.7, color='skyblue', edgecolor='black')
axes[1].axvline(anomaly_scores[anomaly_labels == -1].max(), color='red', linestyle='--', linewidth=2, label='Outlier threshold')
axes[1].set_title('Distribution of Anomaly Scores', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Score')
axes[1].set_ylabel('Frequency')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Legendary (proxy) vs. other Pokémon
df_pokemon['bst'] = df_pokemon[stat_columns].sum(axis=1)
# Proxy only: a BST threshold is not legendary status (pseudo-legendaries with BST 600 pass it,
# legendaries with a BST of exactly 580 do not)
df_pokemon['is_legendary'] = df_pokemon['bst'] > 580

leg_out = ((df_pokemon['is_legendary']) & (df_pokemon['anomaly_label'] == -1)).sum()
leg_tot = df_pokemon['is_legendary'].sum()
norm_out = ((~df_pokemon['is_legendary']) & (df_pokemon['anomaly_label'] == -1)).sum()
norm_tot = (~df_pokemon['is_legendary']).sum()

leg_rate = leg_out / leg_tot if leg_tot > 0 else 0
norm_rate = norm_out / norm_tot if norm_tot > 0 else 0

print("\n🌟 Legendary (proxy) vs. other Pokémon:\n")
print(f"Legendary proxy (BST>580): {leg_tot}")
print(f"  Outliers: {leg_out} ({leg_rate*100:.1f}%)")
print(f"\nOthers: {norm_tot}")
print(f"  Outliers: {norm_out} ({norm_rate*100:.1f}%)")

if leg_rate > norm_rate * 1.5:
    print("\n➜ High-BST Pokémon are flagged as outliers at more than 1.5x the rate of the others")
    print("   Consistent with their design: exceptional stats, special balancing")
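
Comparing two outlier rates by a fixed factor does not say whether the gap could be due to chance. A one-sided Fisher exact test on the 2x2 table of counts is one possible check. In the sketch below, the four counts are invented for illustration and are not results from this notebook.

```python
from scipy.stats import fisher_exact

# Hypothetical counts (NOT taken from the dataset).
high_out, high_total = 12, 80    # high-BST group: outliers, group size
rest_out, rest_total = 38, 920   # remaining Pokémon: outliers, group size

# Rows: group; columns: [outliers, non-outliers].
table = [[high_out, high_total - high_out],
         [rest_out, rest_total - rest_out]]

# alternative='greater' tests whether the first group has higher odds of being an outlier.
odds_ratio, p_value = fisher_exact(table, alternative='greater')
print(f"Odds ratio: {odds_ratio:.2f}, one-sided p-value: {p_value:.4g}")
```

With the real values of `leg_out`, `leg_tot`, `norm_out` and `norm_tot` in the table, a small p-value would support calling the difference statistically significant.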

## Conclusion

### Main findings:

**Part 1 - Understanding the Data:**
- Good-quality dataset with manageable missing values
- Imbalanced types (Water, Normal, Grass overrepresented)
- Moves well distributed across physical, special, status

**Part 2 - Clustering by Stats:**
- 5 archetypes identified (Fast Sweepers, Defensive Walls, etc.)
- Clusters capture combat ROLES, not elemental types
- PCA explains ~60% of the variance with 2 components
- Types = resistances, Clusters = play styles

**Part 3 - Text Analysis:**
- TF-IDF reveals fine-grained subcategories (healing, status effects, damage)
- Characteristic words clearly differ by damage_class
- Text adds information complementary to the numeric attributes
- Text clusters capture game MECHANICS

**Part 4 - Stats vs Moves:**
- Stat similarity and move similarity agree only partly, judging from the example pairs
- Stats = POTENTIAL, Moves = TACTICAL OPTIONS
- The two dimensions are complementary
- Example pairs found in each category

**Part 5 - Anomaly Detection:**
- About 5% of Pokémon flagged as outliers (Isolation Forest), a share set by contamination=0.05
- High-BST Pokémon (BST > 580, used as a proxy for legendaries) are flagged more often
- Outliers driven by: extreme stats, exceptional BST, unusual stat distributions
- Intentional design of legendaries visible in the data

### Key insights:

1. **Multi-faceted**: Stats, types, and moves capture different aspects
2. **Hidden structure**: Unsupervised learning reveals non-obvious archetypes
3. **Value of text mining**: Descriptions reveal mechanics not visible in the numeric attributes
4. **Balance vs Specialization**: A continuum between balanced and specialized Pokémon
5. **Intentional design**: Legendaries differ measurably in their stats, by design