### Analysis Notebook

__Title__: Dota 2 Dataset Analysis\
__Subject__: IKT110 - Artificial Intelligence Architecture\
__Authors__: Cornelius Brandt, Maximilian Eckstein, Mohammad Itani\
__Date__: November 2025\
__Version__: 1.0

This notebook contains the analysis of the Dota 2 matches dataset. The analysis is structured around several questions regarding hero picks, win rates, side advantages, and game durations.

In [42]:
import json
from collections import Counter, defaultdict
from itertools import combinations

import sys
from pathlib import Path

# Absolute path of the local project checkout; change this to match your machine.
PROJECT_ROOT = Path("/home/maxge/studies/bachelor_mechatronics/semester_5/exchange_UiA/IKT110_artificial_intelligence_architecture/IKT110_eckstein/portfolio/dota_2")

SRC_PATH = PROJECT_ROOT / "src"
if str(SRC_PATH) not in sys.path:
    sys.path.append(str(SRC_PATH))

from heroes import hero_name

DATA_PATH = PROJECT_ROOT / "data" / "analysis_dataset.jsonl"
NUM_HEROES = 136

In [43]:
def iter_matches(limit=None):
    """
    Streams matches from analysis_dataset.jsonl (one JSON object per line).
    limit: if not None, stops after `limit` matches (limit=0 yields nothing).
    """
    count = 0
    with open(DATA_PATH, "r", encoding="utf-8") as f:
        for line in f:
            if limit is not None and count >= limit:
                break
            if not line.strip():
                continue  # skip blank lines
            yield json.loads(line)
            count += 1

In [44]:
# view first 3 matches

for i, m in enumerate(iter_matches(limit=3)):
    print(f"Match {i}:")
    print("  match_id:", m["match_id"])
    print("  radiant_win:", m["radiant_win"])
    print("  duration:", m["duration"])
    print("  game_mode:", m["game_mode"])
    print("  lobby_type:", m["lobby_type"])
    print("  heroes_radiant:", m["heroes_radiant"])
    print("  heroes_dire:", m["heroes_dire"])
    print()

Match 0:
  match_id: 5607724594
  radiant_win: False
  duration: 2407
  game_mode: 3
  lobby_type: 7
  heroes_radiant: [36, 27, 41, 31, 98]
  heroes_dire: [35, 103, 16, 67, 9]

Match 1:
  match_id: 5647502064
  radiant_win: True
  duration: 1929
  game_mode: 22
  lobby_type: 7
  heroes_radiant: [14, 40, 97, 10, 18]
  heroes_dire: [100, 67, 28, 63, 75]

Match 2:
  match_id: 5670735120
  radiant_win: False
  duration: 1946
  game_mode: 22
  lobby_type: 7
  heroes_radiant: [106, 22, 85, 91, 49]
  heroes_dire: [74, 14, 63, 41, 107]



#### Question 1: *"What games did I exclude/include for analysis?"*

In [45]:
game_modes = Counter()
lobby_types = Counter()
durations = []

for m in iter_matches():
    game_modes[m["game_mode"]] += 1
    lobby_types[m["lobby_type"]] += 1
    durations.append(m["duration"])

print("Game modes (after filter):", game_modes)
print("Lobby types (after filter):", lobby_types)
print("Number of matches:", sum(game_modes.values()))
print("Min. duration:", min(durations), "Max:", max(durations))

Game modes (after filter): Counter({22: 1652497, 3: 234464, 2: 76})
Lobby types (after filter): Counter({7: 1887037})
Number of matches: 1887037
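Min. duration: 600 Max: 3869

After filtering, every match is a ranked matchmaking game (lobby type 7) in game mode 22 (Ranked All Pick), 3 (Random Draft) or 2 (Captains Mode, only 76 matches), with durations between 600 s (10 min) and 3,869 s (about 64.5 min).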


#### Question 2: *"What hero is the most picked?"*

In [46]:
pick_counts = Counter()

for m in iter_matches():
    pick_counts.update(m["heroes_radiant"])
    pick_counts.update(m["heroes_dire"])

print("Top 10 most picked heroes:")
for hero_id, cnt in pick_counts.most_common(10):
    print(f"Hero: {hero_name(hero_id)}, Count: {cnt:,}")

Top 10 most picked heroes:
Hero: Pudge, Count: 463,747
Hero: Windrunner, Count: 441,630
Hero: Juggernaut, Count: 428,344
Hero: Invoker, Count: 401,865
Hero: Ogre Magi, Count: 378,114
Hero: Lion, Count: 373,756
Hero: Phantom Assassin, Count: 361,449
Hero: Faceless Void, Count: 361,040
Hero: Rubick, Count: 357,511
Hero: Antimage, Count: 328,744
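
Raw pick counts become easier to compare across datasets when divided by the number of matches. A minimal sketch, assuming `pick_counts` is the Counter built above and that a hero appears at most once per match (true for the game modes in this dataset), so the result is the share of matches in which each hero was picked; `pick_rates` is a hypothetical helper:

```python
from collections import Counter


def pick_rates(pick_counts: Counter, num_matches: int) -> dict:
    """Share of matches in which each hero was picked."""
    if num_matches <= 0:
        raise ValueError("num_matches must be positive")
    return {hid: cnt / num_matches for hid, cnt in pick_counts.items()}


# Hypothetical usage with the notebook's variables:
# rates = pick_rates(pick_counts, sum(game_modes.values()))
```

For example, Pudge's 463,747 picks over 1,887,037 matches give a pick rate of about 0.246, i.e. roughly one match in four. Because every match has ten heroes, the rates sum to 10 rather than 1.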


#### Question 3: *"What hero has the highest win rate?"*

In [47]:
games_per_hero = Counter()
wins_per_hero = Counter()

for m in iter_matches():
    radiant_won = m["radiant_win"]
    rad = m["heroes_radiant"]
    dire = m["heroes_dire"]

    # games per hero
    for hid in rad + dire:
        games_per_hero[hid] += 1

    # winning team
    if radiant_won:
        for hid in rad:
            wins_per_hero[hid] += 1
    else:
        for hid in dire:
            wins_per_hero[hid] += 1

hero_winrates = {}
for hid, games in games_per_hero.items():
    if games < 50:  # optional: minimum games threshold
        continue
    hero_winrates[hid] = wins_per_hero[hid] / games

# Top 10 by winrate
top = sorted(hero_winrates.items(), key=lambda t: t[1], reverse=True)[:10]
print("Top 10 heroes by winrate (min 50 games):")
for hid, wr in top:
    print(f"Hero: {hero_name(hid)}, Winrate: {wr:.3f}")

Top 10 heroes by winrate (min 50 games):
Hero: Broodmother, Winrate: 0.643
Hero: Underlord, Winrate: 0.601
Hero: Drow Ranger, Winrate: 0.590
Hero: Clinkz, Winrate: 0.587
Hero: Lycan, Winrate: 0.569
Hero: Bloodseeker, Winrate: 0.564
Hero: Visage, Winrate: 0.563
Hero: Meepo, Winrate: 0.563
Hero: Ogre Magi, Winrate: 0.553
Hero: Vengeful Spirit, Winrate: 0.551
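
The fixed 50-game threshold keeps rarely played heroes out of the ranking, but it treats a hero with 51 games the same as one with 400,000. One common alternative, not used in the notebook, is to rank by the lower bound of the Wilson score interval, which penalizes small samples automatically. A sketch, assuming `wins_per_hero` and `games_per_hero` are the Counters from the cell above and z = 1.96 (about 95% confidence); both helper names are hypothetical:

```python
import math
from collections import Counter


def wilson_lower_bound(wins: int, games: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for the proportion wins / games."""
    if games == 0:
        return 0.0
    p = wins / games
    denom = 1 + z * z / games
    centre = p + z * z / (2 * games)
    margin = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games))
    return (centre - margin) / denom


def rank_by_wilson(wins_per_hero, games_per_hero, top=10):
    """Heroes sorted by the pessimistic (lower-bound) estimate of their win rate."""
    scores = {
        hid: wilson_lower_bound(wins_per_hero.get(hid, 0), games)
        for hid, games in games_per_hero.items()
    }
    return sorted(scores.items(), key=lambda t: t[1], reverse=True)[:top]
```

With the large per-hero sample sizes in this dataset the ranking should stay close to the raw one; the adjustment matters mainly for rarely picked heroes.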


#### Question 4: *"Is there an advantage to playing Dire or Radiant?"* and Question 4a: *"What hero is most affected by the side?"*

In [48]:
total = 0
radiant_wins = 0

for m in iter_matches():
    total += 1
    if m["radiant_win"]:
        radiant_wins += 1

wr_radiant = radiant_wins / total
wr_dire = 1.0 - wr_radiant

print("Overall winrates by side:")
print("Radiant winrate:", f"{wr_radiant:.3f}")
print("Dire winrate:", f"{wr_dire:.3f}")

# Hero most affected by side
rad_games = Counter()
rad_wins = Counter()
dire_games = Counter()
dire_wins = Counter()

for m in iter_matches():
    radiant_won = m["radiant_win"]
    rad = m["heroes_radiant"]
    dire = m["heroes_dire"]

    for hid in rad:
        rad_games[hid] += 1
        if radiant_won:
            rad_wins[hid] += 1

    for hid in dire:
        dire_games[hid] += 1
        if not radiant_won:
            dire_wins[hid] += 1

side_diff = {}  # hero_id -> |WR_Radiant - WR_Dire|
for hid in range(1, NUM_HEROES+1):
    rg = rad_games[hid]
    dg = dire_games[hid]
    if rg < 50 or dg < 50:  # minimum games threshold
        continue
    wr_rad = rad_wins[hid] / rg
    wr_dir = dire_wins[hid] / dg
    side_diff[hid] = abs(wr_rad - wr_dir)

top_side_affected = sorted(side_diff.items(), key=lambda t: t[1], reverse=True)[:10]
print("\nHeroes most affected by side:")
for hid, diff in top_side_affected:
    print(f"Hero: {hero_name(hid)}, Difference: {diff:.3f}")

Overall winrates by side:
Radiant winrate: 0.575
Dire winrate: 0.425

Heroes most affected by side:
Hero: Sniper, Difference: 0.177
Hero: Pudge, Difference: 0.172
Hero: Drow Ranger, Difference: 0.164
Hero: Zeus, Difference: 0.164
Hero: Viper, Difference: 0.162
Hero: Arc Warden, Difference: 0.161
Hero: Medusa, Difference: 0.161
Hero: Venomancer, Difference: 0.161
Hero: Skywrath Mage, Difference: 0.161
Hero: Spectre, Difference: 0.160
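
Because Radiant wins 57.5% of all matches, a hero whose win rate shifted with side exactly like the average hero would show a raw gap of about 0.575 - 0.425 = 0.150, so the top entries above (0.160 to 0.177) deviate from that baseline by only about 0.01 to 0.03. A sketch of one simple adjustment, which is an assumption and not the notebook's method: subtract the league-wide gap and keep the sign, so positive values mean the hero gains more than average from playing Radiant and negative values mean it does comparatively better on Dire. It reuses the `rad_games`, `rad_wins`, `dire_games` and `dire_wins` Counters from the cell above; `side_effect` is a hypothetical helper:

```python
from collections import Counter


def side_effect(rad_wins, rad_games, dire_wins, dire_games, baseline_gap, min_games=50):
    """Hero's (Radiant WR - Dire WR) minus the league-wide (Radiant WR - Dire WR)."""
    effect = {}
    for hid, rg in rad_games.items():
        dg = dire_games.get(hid, 0)
        if rg < min_games or dg < min_games:
            continue  # same minimum-games rule as the notebook
        gap = rad_wins.get(hid, 0) / rg - dire_wins.get(hid, 0) / dg
        effect[hid] = gap - baseline_gap
    return effect


# Hypothetical usage with the notebook's variables:
# effect = side_effect(rad_wins, rad_games, dire_wins, dire_games, wr_radiant - wr_dire)
# most_affected = sorted(effect.items(), key=lambda t: abs(t[1]), reverse=True)[:10]
```

This subtraction works on the raw win-rate scale; comparing on the log-odds scale would treat strong and weak heroes more evenly, at the cost of a slightly less intuitive number.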


#### Question 5: *"What hero has the highest impact on the game? (Define impact yourself)."*

In [49]:
print("Impact is defined as model weight: the hero with the largest weight in the trained model has the highest impact on the game.")

Impact is defined as model weight: the hero with the largest weight in the trained model has the highest impact on the game.
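
The notebook does not show which model the weights come from. A minimal sketch, assuming a logistic regression that predicts `radiant_win` from a signed hero encoding (+1 if the hero is on Radiant, -1 if on Dire, 0 otherwise) plus a bias term that absorbs the overall Radiant advantage. `encode`, `train_logreg` and `rank_heroes` are hypothetical helpers written for illustration and trained with plain batch gradient descent:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(match, num_heroes):
    """Signed one-hot vector indexed by hero id: +1 Radiant, -1 Dire, 0 absent."""
    x = [0.0] * (num_heroes + 1)  # index 0 unused so hero ids index directly
    for hid in match["heroes_radiant"]:
        x[hid] = 1.0
    for hid in match["heroes_dire"]:
        x[hid] = -1.0
    return x


def train_logreg(matches, num_heroes, epochs=200, lr=0.1):
    """Batch gradient descent on the log loss; the bias b models the side advantage."""
    data = [(encode(m, num_heroes), 1.0 if m["radiant_win"] else 0.0) for m in matches]
    n = len(data)
    if n == 0:
        raise ValueError("no matches to train on")
    w = [0.0] * (num_heroes + 1)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            # dLoss/dz for the logistic loss is (prediction - label)
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for i, xi in enumerate(x):
                if xi:
                    grad_w[i] += err * xi
        b -= lr * grad_b / n
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
    return w, b


def rank_heroes(w, num_heroes, top=10):
    """Hero ids sorted by weight, largest (highest impact under this definition) first."""
    return sorted(range(1, num_heroes + 1), key=lambda h: w[h], reverse=True)[:top]


# Hypothetical usage on a small sample (pure Python is slow on the full dataset):
# w, b = train_logreg(iter_matches(limit=2_000), NUM_HEROES)
# for hid in rank_heroes(w, NUM_HEROES):
#     print(hero_name(hid), round(w[hid], 3))
```

Under this encoding a large positive weight means the hero's team tends to win regardless of side. For all 1.9 million matches a vectorized implementation would be needed instead of this loop.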


#### Question 6 and 7: *"What hero has the longest/shortest games?"*

In [50]:
duration_sum = Counter()
duration_count = Counter()

for m in iter_matches():
    dur = m["duration"]
    for hid in m["heroes_radiant"] + m["heroes_dire"]:
        duration_sum[hid] += dur
        duration_count[hid] += 1

avg_duration = {}
for hid, cnt in duration_count.items():
    if cnt < 50:  # minimum games
        continue
    avg_duration[hid] = duration_sum[hid] / cnt

# Longest
longest = sorted(avg_duration.items(), key=lambda t: t[1], reverse=True)[:10]
print("Heroes with longest average games:")
for hid, d in longest:
    print(f"Hero: {hero_name(hid)}, Average Duration: {d/60:.2f} minutes")

# Shortest
shortest = sorted(avg_duration.items(), key=lambda t: t[1])[:10]
print("\nHeroes with shortest average games:")
for hid, d in shortest:
    print(f"Hero: {hero_name(hid)}, Average Duration: {d/60:.2f} minutes")

Heroes with longest average games:
Hero: Tinker, Average Duration: 33.49 minutes
Hero: Zeus, Average Duration: 33.21 minutes
Hero: Spectre, Average Duration: 33.12 minutes
Hero: Sniper, Average Duration: 32.96 minutes
Hero: Ancient Apparition, Average Duration: 32.96 minutes
Hero: Razor, Average Duration: 32.79 minutes
Hero: Earthshaker, Average Duration: 32.72 minutes
Hero: Viper, Average Duration: 32.69 minutes
Hero: Sand King, Average Duration: 32.68 minutes
Hero: Windrunner, Average Duration: 32.66 minutes

Heroes with shortest average games:
Hero: Broodmother, Average Duration: 29.10 minutes
Hero: Meepo, Average Duration: 29.98 minutes
Hero: Lycan, Average Duration: 30.43 minutes
Hero: Chen, Average Duration: 30.47 minutes
Hero: Huskar, Average Duration: 30.50 minutes
Hero: Lone Druid, Average Duration: 30.71 minutes
Hero: Io, Average Duration: 30.88 minutes
Hero: Visage, Average Duration: 31.01 minutes
Hero: Beastmaster, Average Duration: 31.08 minutes
Hero: Templar Assassin, Ave

Yet to implement:

- Question 8: "What pair of heroes are the best together?"
- Question 9: "What hero is hardest countered by another hero?"
- Question 10: "What hero is the best if it is not countered by its TOP 5 counters (if not countered it will win type of hero)?"
- Question 11: "Give 2 heroes that a team safely can first pick."
- Question 12: "How can Molde Dotaklubb use the webpage to improve?"
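
Questions 8 and 9 can reuse the streaming pattern of the earlier cells together with `combinations` from the first cell. A minimal sketch, assuming the same match fields as above: it counts games and wins for every same-team hero pair (synergy) and for every ordered cross-team matchup `(a, b)`, meaning hero `a` played against hero `b`. `pair_and_matchup_stats` and `rates` are hypothetical helpers:

```python
from collections import Counter
from itertools import combinations


def pair_and_matchup_stats(matches):
    """Games/wins per same-team pair (sorted tuple) and per ordered matchup (a vs b)."""
    pair_games, pair_wins = Counter(), Counter()
    vs_games, vs_wins = Counter(), Counter()  # vs_wins[(a, b)]: wins of a's team against b
    for m in matches:
        rad, dire = m["heroes_radiant"], m["heroes_dire"]
        radiant_won = m["radiant_win"]
        # Synergy: every pair of heroes on the same team
        for team, won in ((rad, radiant_won), (dire, not radiant_won)):
            for pair in combinations(sorted(team), 2):
                pair_games[pair] += 1
                pair_wins[pair] += int(won)
        # Matchups: every Radiant hero against every Dire hero, stored in both directions
        for a in rad:
            for b in dire:
                vs_games[(a, b)] += 1
                vs_games[(b, a)] += 1
                vs_wins[(a, b)] += int(radiant_won)
                vs_wins[(b, a)] += int(not radiant_won)
    return pair_games, pair_wins, vs_games, vs_wins


def rates(wins, games, min_games=50):
    """Win rate per key, skipping keys with fewer than min_games games."""
    return {k: wins[k] / g for k, g in games.items() if g >= min_games}


# Hypothetical usage with the notebook's iterator:
# pair_games, pair_wins, vs_games, vs_wins = pair_and_matchup_stats(iter_matches())
# best_pair = max(rates(pair_wins, pair_games).items(), key=lambda t: t[1])
# hardest_counter = min(rates(vs_wins, vs_games).items(), key=lambda t: t[1])
```

Raw pair and matchup rates still include each hero's overall strength, so comparing them against the heroes' individual win rates from Question 3 would be a natural refinement.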