
# Overwatch Hero Psychology: Archetype Analysis (Final Notebook)

This notebook extends the original *"We Are Who We Lock"* analysis by:
- Expanding to the full 45‑hero Overwatch roster (Season 20).
- Using **cosine similarity** to compare heroes on psychological attributes.
- Applying **PCA** to build a 2D "Hero Space".
- Applying **K‑Means clustering** to discover **four psychological archetypes**:
  - Archetype 0 — *The Skirmishers*
  - Archetype 1 — *The Tacticians*
  - Archetype 2 — *The Anchors*
  - Archetype 3 — *The Sharpshooters*
- Visualizing these archetypes with heatmaps, scatterplots, and radar charts.


In [None]:

# 0. Imports & global style

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from math import pi

from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

plt.rcParams["figure.figsize"] = (10, 7)
sns.set_style("whitegrid")

role_colors = {"Tank": "#1f77b4", "Damage": "#d62728", "Support": "#2ca02c"}


## 1. Build hero dataset
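We define the full 45‑hero Overwatch roster (Season 20) with four attributes per hero: Mobility, Self_Sustain, and Mechanical_Skill as scores out of 10, plus raw HP. Role is one-hot encoded into Damage, Support, and Tank columns.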

In [None]:

# 1. Build hero dataset

columns = [
    "Hero", "Mobility", "Self_Sustain", "Mechanical_Skill",
    "HP", "Damage", "Support", "Tank"
]

heroes_data = [
    # --- TANK HEROES ---
    {"Hero": "D.Va", "Mobility": 8, "Self_Sustain": 6, "Mechanical_Skill": 5, "HP": 725, "Role": "Tank"},
    {"Hero": "Doomfist", "Mobility": 10, "Self_Sustain": 7, "Mechanical_Skill": 10, "HP": 525, "Role": "Tank"},
    {"Hero": "Hazard", "Mobility": 7, "Self_Sustain": 8, "Mechanical_Skill": 6, "HP": 650, "Role": "Tank"},
    {"Hero": "Junker Queen", "Mobility": 6, "Self_Sustain": 8, "Mechanical_Skill": 7, "HP": 525, "Role": "Tank"},
    {"Hero": "Mauga", "Mobility": 4, "Self_Sustain": 9, "Mechanical_Skill": 5, "HP": 725, "Role": "Tank"},
    {"Hero": "Orisa", "Mobility": 3, "Self_Sustain": 9, "Mechanical_Skill": 4, "HP": 625, "Role": "Tank"},
    {"Hero": "Ramattra", "Mobility": 4, "Self_Sustain": 7, "Mechanical_Skill": 6, "HP": 600, "Role": "Tank"},
    {"Hero": "Reinhardt", "Mobility": 3, "Self_Sustain": 6, "Mechanical_Skill": 4, "HP": 700, "Role": "Tank"},
    {"Hero": "Roadhog", "Mobility": 2, "Self_Sustain": 10, "Mechanical_Skill": 5, "HP": 750, "Role": "Tank"},
    {"Hero": "Sigma", "Mobility": 2, "Self_Sustain": 7, "Mechanical_Skill": 8, "HP": 625, "Role": "Tank"},
    {"Hero": "Winston", "Mobility": 8, "Self_Sustain": 5, "Mechanical_Skill": 6, "HP": 625, "Role": "Tank"},
    {"Hero": "Wrecking Ball", "Mobility": 10, "Self_Sustain": 8, "Mechanical_Skill": 10, "HP": 775, "Role": "Tank"},
    {"Hero": "Zarya", "Mobility": 2, "Self_Sustain": 6, "Mechanical_Skill": 7, "HP": 550, "Role": "Tank"},

    # --- DAMAGE HEROES ---
    {"Hero": "Ashe", "Mobility": 5, "Self_Sustain": 2, "Mechanical_Skill": 9, "HP": 250, "Role": "Damage"},
    {"Hero": "Bastion", "Mobility": 2, "Self_Sustain": 4, "Mechanical_Skill": 4, "HP": 300, "Role": "Damage"},
    {"Hero": "Cassidy", "Mobility": 3, "Self_Sustain": 3, "Mechanical_Skill": 9, "HP": 275, "Role": "Damage"},
    {"Hero": "Echo", "Mobility": 9, "Self_Sustain": 4, "Mechanical_Skill": 9, "HP": 250, "Role": "Damage"},
    {"Hero": "Freja", "Mobility": 6, "Self_Sustain": 3, "Mechanical_Skill": 8, "HP": 250, "Role": "Damage"},
    {"Hero": "Genji", "Mobility": 9, "Self_Sustain": 3, "Mechanical_Skill": 10, "HP": 250, "Role": "Damage"},
    {"Hero": "Hanzo", "Mobility": 5, "Self_Sustain": 2, "Mechanical_Skill": 9, "HP": 250, "Role": "Damage"},
    {"Hero": "Junkrat", "Mobility": 7, "Self_Sustain": 2, "Mechanical_Skill": 5, "HP": 250, "Role": "Damage"},
    {"Hero": "Mei", "Mobility": 2, "Self_Sustain": 9, "Mechanical_Skill": 5, "HP": 300, "Role": "Damage"},
    {"Hero": "Pharah", "Mobility": 9, "Self_Sustain": 2, "Mechanical_Skill": 7, "HP": 250, "Role": "Damage"},
    {"Hero": "Reaper", "Mobility": 7, "Self_Sustain": 8, "Mechanical_Skill": 4, "HP": 300, "Role": "Damage"},
    {"Hero": "Sojourn", "Mobility": 8, "Self_Sustain": 2, "Mechanical_Skill": 9, "HP": 250, "Role": "Damage"},
    {"Hero": "Soldier: 76", "Mobility": 6, "Self_Sustain": 7, "Mechanical_Skill": 6, "HP": 250, "Role": "Damage"},
    {"Hero": "Sombra", "Mobility": 9, "Self_Sustain": 6, "Mechanical_Skill": 7, "HP": 250, "Role": "Damage"},
    {"Hero": "Symmetra", "Mobility": 4, "Self_Sustain": 5, "Mechanical_Skill": 6, "HP": 250, "Role": "Damage"},
    {"Hero": "Torbjörn", "Mobility": 3, "Self_Sustain": 6, "Mechanical_Skill": 5, "HP": 300, "Role": "Damage"},
    {"Hero": "Tracer", "Mobility": 10, "Self_Sustain": 4, "Mechanical_Skill": 9, "HP": 175, "Role": "Damage"},
    {"Hero": "Vendetta", "Mobility": 6, "Self_Sustain": 6, "Mechanical_Skill": 8, "HP": 250, "Role": "Damage"},
    {"Hero": "Venture", "Mobility": 8, "Self_Sustain": 6, "Mechanical_Skill": 6, "HP": 250, "Role": "Damage"},
    {"Hero": "Widowmaker", "Mobility": 4, "Self_Sustain": 1, "Mechanical_Skill": 10, "HP": 200, "Role": "Damage"},

    # --- SUPPORT HEROES ---
    {"Hero": "Ana", "Mobility": 1, "Self_Sustain": 5, "Mechanical_Skill": 9, "HP": 250, "Role": "Support"},
    {"Hero": "Baptiste", "Mobility": 6, "Self_Sustain": 7, "Mechanical_Skill": 8, "HP": 250, "Role": "Support"},
    {"Hero": "Brigitte", "Mobility": 4, "Self_Sustain": 7, "Mechanical_Skill": 4, "HP": 250, "Role": "Support"},
    {"Hero": "Illari", "Mobility": 5, "Self_Sustain": 6, "Mechanical_Skill": 8, "HP": 250, "Role": "Support"},
    {"Hero": "Juno", "Mobility": 8, "Self_Sustain": 3, "Mechanical_Skill": 6, "HP": 250, "Role": "Support"},
    {"Hero": "Kiriko", "Mobility": 8, "Self_Sustain": 7, "Mechanical_Skill": 8, "HP": 250, "Role": "Support"},
    {"Hero": "Lifeweaver", "Mobility": 6, "Self_Sustain": 6, "Mechanical_Skill": 4, "HP": 275, "Role": "Support"},
    {"Hero": "Lúcio", "Mobility": 9, "Self_Sustain": 6, "Mechanical_Skill": 8, "HP": 250, "Role": "Support"},
    {"Hero": "Mercy", "Mobility": 8, "Self_Sustain": 6, "Mechanical_Skill": 3, "HP": 250, "Role": "Support"},
    {"Hero": "Moira", "Mobility": 7, "Self_Sustain": 9, "Mechanical_Skill": 3, "HP": 250, "Role": "Support"},
    {"Hero": "Wuyang", "Mobility": 5, "Self_Sustain": 6, "Mechanical_Skill": 7, "HP": 250, "Role": "Support"},
    {"Hero": "Zenyatta", "Mobility": 1, "Self_Sustain": 4, "Mechanical_Skill": 9, "HP": 250, "Role": "Support"},
]

df = pd.DataFrame(heroes_data)

df["Damage"] = df["Role"].eq("Damage").astype(int)
df["Support"] = df["Role"].eq("Support").astype(int)
df["Tank"]   = df["Role"].eq("Tank").astype(int)

df = df[columns + ["Role"]]

print(df.shape)
df.head()


## 2. Preprocessing & basic EDA

In [None]:

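# Only the four numeric attributes feed the analysis; the one-hot role
# columns (Damage, Support, Tank) are left out of X.
features = ["Mobility", "Self_Sustain", "Mechanical_Skill", "HP"]

# Standardize to zero mean and unit variance so HP (hundreds) does not
# dominate the ratings (single digits).
X = df[features].values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)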

df_scaled = pd.DataFrame(X_scaled, columns=features)
df_scaled["Hero"] = df["Hero"]
df_scaled["Role"] = df["Role"]

print(f"Loaded {len(df)} heroes.")
df.describe()


In [None]:

# Feature distributions by role
plt.figure(figsize=(12, 8))
melted = df.melt(id_vars=["Hero", "Role"], value_vars=features,
                 var_name="Feature", value_name="Value")
sns.boxplot(data=melted, x="Feature", y="Value", hue="Role")
plt.title("Feature distributions by role")
plt.tight_layout()
plt.show()


## 3. Similarity analysis (cosine)
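Cosine similarity is computed on the standardized features rather than the raw ones. On the raw scale HP is two orders of magnitude larger than the ratings, so every pair of heroes would look almost identical. A minimal NumPy sketch of the effect, using four hypothetical stat lines (not rows from the table above) and helper names `zscore` and `cosine_matrix` that are introduced here only for illustration:

```python
import numpy as np

# Hypothetical stat lines: Mobility, Self_Sustain, Mechanical_Skill, HP.
toy = np.array([
    [9.0, 3.0, 10.0, 250.0],   # mobile, fragile, high skill
    [8.0, 2.0, 9.0, 250.0],    # similar profile
    [2.0, 10.0, 5.0, 750.0],   # slow, durable
    [3.0, 9.0, 4.0, 700.0],    # similar to the previous row
])

def zscore(x):
    # Population std (ddof=0), the same convention StandardScaler uses.
    return (x - x.mean(axis=0)) / x.std(axis=0)

def cosine_matrix(x):
    # Normalize each row to unit length, then take all pairwise dot products.
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T

raw_sim = cosine_matrix(toy)
scaled_sim = cosine_matrix(zscore(toy))

print("raw:\n", np.round(raw_sim, 3))
print("standardized:\n", np.round(scaled_sim, 3))
```

After z-scoring, the sign of the similarity says whether two heroes deviate from the roster average in the same direction, which is why the heatmap below is centered at 0.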

In [None]:

sim_matrix = cosine_similarity(X_scaled)
sim_df = pd.DataFrame(sim_matrix, index=df["Hero"], columns=df["Hero"])

plt.figure(figsize=(14, 12))
sns.heatmap(sim_df, cmap="coolwarm", center=0, square=True,
            linewidths=0.4, cbar_kws={"shrink": 0.6})
plt.title("Hero psychological similarity (cosine)")
plt.tight_layout()
plt.show()


In [None]:

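def top_matches_for(hero, n=5):
    """Return the n heroes most similar to `hero`, excluding the hero itself."""
    if hero not in sim_df.index:
        raise ValueError(f"{hero} not in matrix")
    # Drop the hero by name rather than by position: heroes with identical
    # stats (e.g. Ashe and Hanzo) tie at 1.0, so the self-match is not
    # guaranteed to sort first.
    sims = sim_df.loc[hero].drop(hero).sort_values(ascending=False)
    return sims.head(n)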

for hero in ["Hazard", "Mercy", "Genji", "Ana"]:
    print(f"\nTop 5 matches for {hero}:")
    print(top_matches_for(hero))


In [None]:

# Build full top‑5 similarity lookup table & export

rows = []
for hero in sim_df.index:
    # Exclude the self-match by name; ties at 1.0 make position unreliable.
    sims = sim_df.loc[hero].drop(hero).sort_values(ascending=False).head(5)
    row = {"Hero": hero}
    for i, (match_name, score) in enumerate(sims.items(), start=1):
        row[f"Match_{i}"] = match_name
        row[f"Score_{i}"] = round(score, 4)
    rows.append(row)

full_similarity = pd.DataFrame(rows)
full_similarity.to_csv("overwatch_full_similarity_top5.csv", index=False)

print("Saved top‑5 similarity table to overwatch_full_similarity_top5.csv")
full_similarity.head()


## 4. PCA: building the 2D hero space
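PCA on standardized data reduces to a singular value decomposition: the right singular vectors are the component axes, and the squared singular values give each component's share of variance. A sketch on synthetic data (the random matrix stands in for `X_scaled`; it is not the hero table), which should agree with scikit-learn's `PCA` up to the sign of each axis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for X_scaled: 45 rows, 4 features, with the fourth
# feature made to correlate with the second.
X = rng.normal(size=(45, 4))
X[:, 3] = 0.8 * X[:, 1] + 0.2 * X[:, 3]
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Rows of Vt are the principal axes (loadings), ordered by singular value.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)

explained_ratio = S**2 / np.sum(S**2)   # share of variance per component
coords_2d = Xs @ Vt[:2].T               # projection onto PC1 and PC2

print("Explained variance ratio:", np.round(explained_ratio, 3))
print("PC1 loadings:", np.round(Vt[0], 3))
```

Reading the loadings (in scikit-learn, `pca.components_`) is what lets the axes of the hero space be interpreted in terms of the original features.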

In [None]:

pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X_scaled)

df["PC1"] = X_pca[:, 0]
df["PC2"] = X_pca[:, 1]

print("Explained variance by PC1, PC2:", pca.explained_variance_ratio_)
print("Total variance (PC1+PC2):", round(pca.explained_variance_ratio_.sum(), 3))


In [None]:

plt.figure(figsize=(11, 8))
sns.scatterplot(data=df, x="PC1", y="PC2", hue="Role",
                palette=role_colors, s=110, alpha=0.9)
for _, row in df.iterrows():
    plt.text(row["PC1"] + 0.03, row["PC2"] + 0.03, row["Hero"], fontsize=8)
plt.title("Overwatch hero space (PCA, colored by role)")
plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.1%} variance)")
plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.1%} variance)")
plt.tight_layout()
plt.show()


## 5. K‑Means clustering & choosing k
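The cells in this section choose k with an elbow plot and the mean silhouette score. For one point, the silhouette is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster. A from-scratch sketch on four hypothetical 2D points, with `silhouette_mean` being an illustrative helper rather than part of the notebook:

```python
import numpy as np

def silhouette_mean(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            # Convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = dists[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to any other cluster.
        b = min(dists[i, labels == other].mean()
                for other in np.unique(labels) if other != own)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two tight pairs of points, far apart from each other.
pts = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_mean(pts, [0, 0, 1, 1])   # labels follow the two pairs
bad = silhouette_mean(pts, [0, 1, 0, 1])    # labels split each pair

print(f"good labeling: {good:.3f}, bad labeling: {bad:.3f}")
```

Scores near 1 indicate compact, well-separated clusters; scores near or below 0 indicate points that sit closer to another cluster than to their own.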

In [None]:

# Elbow plot (inertia vs k)
inertias = []
k_values = range(2, 9)
for k in k_values:
    km = KMeans(n_clusters=k, n_init=20, random_state=42)
    km.fit(X_pca)
    inertias.append(km.inertia_)

plt.figure(figsize=(8,5))
plt.plot(k_values, inertias, marker="o")
plt.title("K‑Means elbow plot (PCA space)")
plt.xlabel("k (number of clusters)")
plt.ylabel("Inertia (within‑cluster SSE)")
plt.tight_layout()
plt.show()


In [None]:

# Silhouette score for several k values
sil_scores = []
k_values = range(2, 9)

for k in k_values:
    km = KMeans(n_clusters=k, n_init=20, random_state=42)
    labels = km.fit_predict(X_pca)
    sil = silhouette_score(X_pca, labels)
    sil_scores.append(sil)

plt.figure(figsize=(8,5))
plt.plot(k_values, sil_scores, marker="o")
plt.title("Silhouette score vs k")
plt.xlabel("k")
plt.ylabel("Silhouette score")
plt.tight_layout()
plt.show()


In [None]:

# Final model with k = 4 archetypes
k = 4
kmeans = KMeans(n_clusters=k, n_init=20, random_state=42)
df["Cluster"] = kmeans.fit_predict(X_pca)

sil_final = silhouette_score(X_pca, df["Cluster"])
print(f"Silhouette score for k=4: {sil_final:.3f}")


In [None]:

plt.figure(figsize=(11, 8))
sns.scatterplot(data=df, x="PC1", y="PC2", hue="Cluster", style="Role",
                palette="viridis", s=120)
for _, row in df.iterrows():
    plt.text(row["PC1"] + 0.02, row["PC2"] + 0.02, row["Hero"], fontsize=8)
plt.title("K‑Means clustering of hero psychology (k=4)")
plt.tight_layout()
plt.show()


In [None]:

for c in range(k):
    members = df[df["Cluster"] == c]["Hero"].tolist()
    print(f"\n--- Cluster {c} ---")
    print(members)


## 6. Role–cluster relationships
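The crosstab counts how many heroes of each role fall in each cluster. One way to condense such a table into a single number is cluster purity: the share of heroes that belong to the majority role of their cluster. A small sketch with hypothetical labels (not the notebook's actual cluster assignments); purity is added here for illustration and is not computed elsewhere in this notebook:

```python
from collections import Counter, defaultdict

# Hypothetical role and cluster labels for seven heroes.
roles = ["Tank", "Tank", "Damage", "Damage", "Support", "Support", "Damage"]
clusters = [2, 2, 0, 3, 1, 0, 3]

# Contingency table: cluster -> Counter of roles.
table = defaultdict(Counter)
for cluster, role in zip(clusters, roles):
    table[cluster][role] += 1

# Purity: heroes in the majority role of their cluster, over all heroes.
majority_total = sum(max(counts.values()) for counts in table.values())
purity = majority_total / len(roles)

for cluster in sorted(table):
    print(cluster, dict(table[cluster]))
print(f"purity = {purity:.3f}")
```

A purity near 1 would mean the clusters mostly restate the official roles; a lower value means the attributes cut across roles.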

In [None]:

crosstab = pd.crosstab(df["Cluster"], df["Role"])
print(crosstab)

crosstab.plot(kind="bar", stacked=True,
              color=[role_colors[r] for r in crosstab.columns],
              figsize=(8,5))
plt.title("Role composition per cluster")
plt.xlabel("Cluster")
plt.ylabel("Number of heroes")
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()


## 7. Cluster profiles & archetype naming
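K-Means numbers its clusters arbitrarily, so a fixed mapping from cluster id to archetype name is only valid for the run it was written against. One hedged way to make ids reproducible is to renumber clusters by a summary statistic before naming them. The sketch below uses hypothetical labels and hypothetical per-cluster mean HP values:

```python
import numpy as np

# Hypothetical K-Means output for five heroes and three clusters.
labels = np.array([2, 0, 1, 2, 0])
# Hypothetical mean HP per cluster; position = original cluster id.
cluster_mean_hp = np.array([300.0, 650.0, 250.0])

order = np.argsort(cluster_mean_hp)      # original ids, lowest mean HP first
remap = np.empty_like(order)
remap[order] = np.arange(len(order))     # original id -> new id
stable_labels = remap[labels]

print(stable_labels)
```

With ids ordered this way, new id 0 always refers to the lowest-HP cluster, whatever numbering K-Means happened to produce.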

In [None]:

cluster_means = df.groupby("Cluster")[features].mean()
print(cluster_means.round(2))

plt.figure(figsize=(7,5))
sns.heatmap(cluster_means, annot=True, fmt=".1f", cmap="YlGnBu")
plt.title("Average stats per cluster")
plt.tight_layout()
plt.show()


In [None]:

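# Map clusters to named archetypes.
# K-Means label ids are arbitrary: check this mapping against the cluster
# profiles above whenever the data, k, or random_state changes.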
archetype_names = {
    0: "The Skirmishers",
    1: "The Tacticians",
    2: "The Anchors",
    3: "The Sharpshooters",
}
df["Archetype"] = df["Cluster"].map(archetype_names)

hero_archetypes = df[["Hero", "Role", "Cluster", "Archetype"]].sort_values("Hero")
hero_archetypes.to_csv("overwatch_hero_archetypes.csv", index=False)
hero_archetypes.head(10)


## 8. Archetype radar charts (normalized)
These show the *shape* of each archetype across Mobility, Sustain, Mechanical Skill, and HP. Each feature is min-max scaled across the four cluster means, so 0 marks the lowest archetype on that axis and 1 the highest.
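Because the scaling is min-max across the cluster means, every axis is stretched so that the lowest archetype sits at 0 and the highest at 1, however small the raw gap. A sketch with hypothetical cluster means (not the values computed in this notebook):

```python
import numpy as np

# Hypothetical cluster means.
# Columns: Mobility, Self_Sustain, Mechanical_Skill, HP
means = np.array([
    [8.5, 4.0, 8.0, 250.0],
    [5.0, 6.0, 5.5, 260.0],
    [4.0, 8.0, 5.0, 650.0],
    [3.5, 3.0, 9.0, 240.0],
])

col_min = means.min(axis=0)
col_max = means.max(axis=0)
norm = (means - col_min) / (col_max - col_min)

print(np.round(norm, 2))
```

The radar shapes are therefore relative rankings of the archetypes, not absolute stat levels.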

In [None]:

radar_features = ["Mobility", "Self_Sustain", "Mechanical_Skill", "HP"]
radar_data = cluster_means[radar_features].copy()
radar_norm = (radar_data - radar_data.min()) / (radar_data.max() - radar_data.min())

N = len(radar_features)
angles = [n / float(N) * 2 * pi for n in range(N)]
angles += angles[:1]

archetype_colors = {
    0: "#FF6B6B",   # Skirmishers
    1: "#4ECDC4",   # Tacticians
    2: "#1A73E8",   # Anchors
    3: "#9D4EDD",   # Sharpshooters
}

titles = [
    "Archetype 0 — The Skirmishers",
    "Archetype 1 — The Tacticians",
    "Archetype 2 — The Anchors",
    "Archetype 3 — The Sharpshooters",
]

fig, axes = plt.subplots(1, k, subplot_kw=dict(polar=True), figsize=(24, 6))

for i in range(k):
    ax = axes[i]
    vals = radar_norm.iloc[i].values.tolist()
    vals += vals[:1]

    ax.plot(angles, vals, linewidth=3, linestyle="solid", color=archetype_colors[i])
    ax.fill(angles, vals, color=archetype_colors[i], alpha=0.22)

    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(["Mobility", "Sustain", "Mech Skill", "HP"], fontsize=13)

    ax.set_title(titles[i], size=15, color=archetype_colors[i], y=1.15)
    ax.set_ylim(0, 1)

plt.tight_layout()
plt.show()


## 9. Cluster fingerprint radar charts (per‑hero spread)
Gray lines show individual heroes inside each archetype; the colored line shows the mean.

In [None]:

# Min-max scale each feature over the full roster first. On the raw scale HP
# (175-775) sets the radial limit and flattens the ratings, which top out
# at 10, into the center of the chart.
feat_min = df[radar_features].min()
feat_range = df[radar_features].max() - feat_min
hero_norm = (df[radar_features] - feat_min) / feat_range
mean_norm = hero_norm.groupby(df["Cluster"]).mean()

fig, axes = plt.subplots(1, k, subplot_kw=dict(polar=True), figsize=(24, 6))

for i in range(k):
    ax = axes[i]
    subset = hero_norm[df["Cluster"] == i]

    # per-hero lines
    for _, row in subset.iterrows():
        vals = row.values.tolist()
        vals += vals[:1]
        ax.plot(angles, vals, color="gray", linewidth=1, alpha=0.3)

    # cluster mean
    mean_vals = mean_norm.loc[i].values.tolist()
    mean_vals += mean_vals[:1]
    ax.plot(angles, mean_vals, color=archetype_colors[i], linewidth=3)
    ax.fill(angles, mean_vals, color=archetype_colors[i], alpha=0.18)

    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(["Mobility", "Sustain", "Mech Skill", "HP"], fontsize=12)
    ax.set_title(titles[i], color=archetype_colors[i], size=15, y=1.15)
    ax.set_ylim(0, 1.05)

plt.tight_layout()
plt.show()


## 10. Summary
We briefly summarize what the archetypes tell us about hero psychology and player identity.

In [None]:

print("""SUMMARY OF FINDINGS
===================
• We extended the original 'We Are Who We Lock' cosine similarity analysis
  to the full 45‑hero Overwatch roster.

• Using PCA, we built a 2D 'hero space' that shows how heroes cluster by
  mobility, self‑sustain, mechanical skill, and HP.

• Using K‑Means with k = 4 and silhouette analysis, we discovered four
  psychologically meaningful archetypes:

  - Archetype 0 — The Skirmishers:
    high mobility + high mechanics (dive / skirmish heroes and mobile supports).

  - Archetype 1 — The Tacticians:
    mid‑mobility, mid‑sustain heroes that win through positioning, utility,
    and cooldown management.

  - Archetype 2 — The Anchors:
    high‑HP, high‑sustain tanks and bruisers that stabilize the frontline.

  - Archetype 3 — The Sharpshooters:
    low‑sustain, high‑precision heroes like Ashe, Hanzo, Widowmaker, Ana, Zenyatta.

• These archetypes align with how players describe their own playstyles:
  creative skirmishers, strategic tacticians, protective anchors, and precise
  sharpshooters. In other words: we really are who we lock.

Artifacts saved:
- overwatch_full_similarity_top5.csv  (top‑5 matches for every hero)
- overwatch_hero_archetypes.csv       (hero → role → archetype mapping)
""")
