# 1. Data Collection  
Data collection is not done in this notebook because it can take a long time; it is done in a separate script instead. Furthermore, the API used by the Python code isn't supported by Anaconda. To collect the data set, run the following commands:
1. pip3 install -r ../dota2/requirements.txt
2. python3 ../dota2/src/api_wrapper.py  

The dataset will be saved in *../dota2/src/data/*  

## 1.1 Files  
1. match.json
2. heroes.json
3. items.json
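
Before moving on, it can help to confirm that the collection script produced all three files. The following is a minimal sketch: the helper name `missing_files` is hypothetical, and the `data` directory is an assumption based on the relative paths the notebook cells read from.

```python
from pathlib import Path

# File names listed in section 1.1.
EXPECTED_FILES = ("match.json", "heroes.json", "items.json")


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from data_dir."""
    base = Path(data_dir)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    # 'data' is an assumed location; adjust it to where the script saved the files.
    missing = missing_files("data")
    print("All files present" if not missing else f"Missing: {missing}")
```

An empty list means every expected file was found.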

# 2. Data Discovery  
The data is already well structured in *.json* format, so we start by observing the contents of each file. Python's built-in `json` module reads *.json* files, so no additional package has to be installed.

Next we read the contents of the JSON files. We start with *match.json*, since it is the focus of the study.

In [1]:
import json
with open('data/match.json') as f:
    matches = json.load(f)

# print only the first match
first_match = next(iter(matches.values()))
print(first_match)

{'positive_votes': 0, 'game_mode': 3, 'duration': 1769, 'barracks_status_radiant': 63, 'cluster_name': 'China', 'negative_votes': 0, 'lobby_name': 'Ranked', 'human_players': 10, 'tower_status_radiant': 1975, 'match_seq_num': 3412759258, 'radiant_win': True, 'tower_status_dire': 4, 'game_mode_name': 'Random Draft', 'cluster': 224, 'dire_score': 16, 'start_time': 1528135941, 'barracks_status_dire': 3, 'lobby_type': 7, 'leagueid': 0, 'players': [{'kills': 10, 'item_1_name': 'Blink Dagger', 'gold_spent': 12480, 'scaled_tower_damage': 1655, 'item_4': 34, 'last_hits': 120, 'player_slot': 0, 'deaths': 2, 'item_5_name': 'Hyperstone', 'item_4_name': 'Magic Stick', 'assists': 10, 'item_1': 1, 'item_0_name': 'Phase Boots', 'item_0': 50, 'item_2': 9, 'tower_damage': 2752, 'item_3_name': 'Blade Mail', 'gold': 2308, 'level': 19, 'backpack_2': 0, 'denies': 2, 'hero_damage': 14917, 'scaled_hero_damage': 14171, 'xp_per_min': 563, 'scaled_hero_healing': 1705, 'leaver_status_description': 'finished match

We notice that there is a huge amount of information. Most of it comes from the **players** key of the dictionary.

## 2.1 Breaking down the matches dictionary  
### 2.1.1 Analyzing the players key
We first separate the **players** key from the rest of the data.

In [2]:
# print the first player of the first match
first_match = next(iter(matches.values()))
print(first_match['players'][0])

{'kills': 10, 'item_1_name': 'Blink Dagger', 'gold_spent': 12480, 'scaled_tower_damage': 1655, 'item_4': 34, 'last_hits': 120, 'player_slot': 0, 'deaths': 2, 'item_5_name': 'Hyperstone', 'item_4_name': 'Magic Stick', 'assists': 10, 'item_1': 1, 'item_0_name': 'Phase Boots', 'item_0': 50, 'item_2': 9, 'tower_damage': 2752, 'item_3_name': 'Blade Mail', 'gold': 2308, 'level': 19, 'backpack_2': 0, 'denies': 2, 'hero_damage': 14917, 'scaled_hero_damage': 14171, 'xp_per_min': 563, 'scaled_hero_healing': 1705, 'leaver_status_description': 'finished match, no abandon', 'hero_healing': 1060, 'item_3': 127, 'account_id': 345295663, 'hero_name': 'Legion Commander', 'item_2_name': 'Platemail', 'backpack_0': 0, 'item_5': 55, 'leaver_status_name': 'NONE', 'leaver_status': 0, 'gold_per_min': 476, 'backpack_1': 0, 'hero_id': 104, 'ability_upgrades': [{'level': 1, 'ability': 5596, 'time': 423}, {'level': 2, 'ability': 5595, 'time': 488}, {'level': 3, 'ability': 5597, 'time': 710}, {'level': 4, 'ability

In [3]:
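import pandas as pd

# Take the first player of the first match and leave out the variable-length
# 'ability_upgrades' list so the remaining scalar fields fit in a single row.
# Building a filtered copy avoids modifying the original matches dictionary.
first_match = next(iter(matches.values()))
player_0 = {k: v for k, v in first_match['players'][0].items()
            if k != 'ability_upgrades'}
player = pd.DataFrame(player_0, index=[0])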

In [4]:
print(player.iloc[0])

account_id                                    345295663
assists                                              10
backpack_0                                            0
backpack_1                                            0
backpack_2                                            0
deaths                                                2
denies                                                2
gold                                               2308
gold_per_min                                        476
gold_spent                                        12480
hero_damage                                       14917
hero_healing                                       1060
hero_id                                             104
hero_name                              Legion Commander
item_0                                               50
item_0_name                                 Phase Boots
item_1                                                1
item_1_name                                Blink

### 2.1.2  Analyzing the rest of the keys 
Then we look at the remaining, match-level keys.

In [5]:
# print every key of the first match except the nested player list
first_match = next(iter(matches.values()))
for k, v in first_match.items():
    if k != 'players':
        print(k + ": " + str(v))

positive_votes: 0
game_mode: 3
duration: 1769
barracks_status_radiant: 63
cluster_name: China
negative_votes: 0
lobby_name: Ranked
human_players: 10
tower_status_radiant: 1975
match_seq_num: 3412759258
radiant_win: True
tower_status_dire: 4
game_mode_name: Random Draft
cluster: 224
dire_score: 16
start_time: 1528135941
barracks_status_dire: 3
lobby_type: 7
leagueid: 0
pre_game_duration: 90
match_id: 3933473800
engine: 1
flags: 1
first_blood_time: 47
radiant_score: 35


This is enough analysis to start creating the tables. We omit the **ability_upgrades** key, since it is variable in length and not that impactful. We also separate the player data from the rest of the match data, as the two do not have the same dimensions.
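
The split described above could be sketched as follows. This is an illustration rather than the notebook's actual table-building code: the function name `split_matches` and the `drop_keys` parameter are hypothetical, and copying `match_id` onto each player row is an assumption made so the two tables can be joined later.

```python
def split_matches(matches, drop_keys=("ability_upgrades",)):
    """Split raw matches into match-level rows and player-level rows.

    `matches` is assumed to map a match key to a match dictionary that
    holds a 'players' list, as seen in match.json.
    """
    match_rows, player_rows = [], []
    for match in matches.values():
        # Match-level row: every key except the nested player list.
        match_rows.append({k: v for k, v in match.items() if k != "players"})
        for p in match.get("players", []):
            # Player-level row without the variable-length keys.
            row = {k: v for k, v in p.items() if k not in drop_keys}
            # Keep match_id so player rows can be joined back to their match.
            row["match_id"] = match.get("match_id")
            player_rows.append(row)
    return match_rows, player_rows
```

Both return values are lists of flat dictionaries, so `pd.DataFrame(match_rows)` and `pd.DataFrame(player_rows)` would each give one table.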

## 2.2 Analyzing the items dictionary

In [6]:
with open('data/items.json') as f:
    items = json.load(f)

In [7]:
for key in items:
    print(key)

items
status


In [8]:
print(type(items['items']))

<class 'list'>


In [9]:
print(items['items'][0])

{'name': 'item_blink', 'side_shop': 1, 'cost': 2250, 'id': 1, 'secret_shop': 0, 'url_image': 'http://cdn.dota2.com/apps/dota2/images/items/blink_lg.png', 'recipe': 0, 'localized_name': 'Blink Dagger'}
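
Player records refer to items by numeric id (for example `'item_1': 1`), and each item record carries both `id` and `localized_name`, so an id-to-name lookup is one way to connect the two. The sketch below is an assumption about how that might be done; `build_name_lookup` is a hypothetical helper, and the sample record is copied from the output above with some fields omitted.

```python
def build_name_lookup(records, key="id", value="localized_name"):
    """Map each record's id to its display name."""
    return {rec[key]: rec[value] for rec in records}


# Sample record taken from the output above (other fields omitted).
sample_items = [
    {"name": "item_blink", "cost": 2250, "id": 1, "localized_name": "Blink Dagger"}
]
item_names = build_name_lookup(sample_items)
print(item_names[1])  # Blink Dagger
```

With the real data, the call would be `build_name_lookup(items['items'])`.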


## 2.3 Analyzing the heroes dictionary

In [10]:
with open('data/heroes.json') as f:
    heroes = json.load(f)

In [11]:
for key in heroes:
    print(key)

count
status
heroes


In [12]:
print(heroes['count'])

115


In [13]:
print(type(heroes['heroes']))

<class 'list'>


In [14]:
print(heroes['heroes'][0])

{'name': 'npc_dota_hero_antimage', 'url_full_portrait': 'http://cdn.dota2.com/apps/dota2/images/heroes/antimage_full.png', 'url_vertical_portrait': 'http://cdn.dota2.com/apps/dota2/images/heroes/antimage_vert.jpg', 'id': 1, 'url_small_portrait': 'http://cdn.dota2.com/apps/dota2/images/heroes/antimage_sb.png', 'localized_name': 'Anti-Mage', 'url_large_portrait': 'http://cdn.dota2.com/apps/dota2/images/heroes/antimage_lg.png'}


## 2.4 Analyzing game complexity

In [15]:
import math
item_count = len(items['items'])
heroes_count = len(heroes['heroes'])
heroes_per_game = 10
items_per_hero = 6

# hero combinations without repetition
numerator = math.factorial(heroes_count)
denominator = math.factorial(heroes_per_game) * math.factorial(heroes_count - heroes_per_game)
hero_combinations = numerator // denominator

# item combinations with repetition
numerator = math.factorial(item_count + items_per_hero - 1)
denominator = math.factorial(item_count - 1) * math.factorial(items_per_hero)
item_combinations = numerator // denominator

In [16]:
print(hero_combinations)
print(item_combinations)

74540394223878
594115882360


In [17]:
# Theorycraft combinations
theory_craft_combination = hero_combinations * item_combinations
print(theory_craft_combination)
'{:.2e}'.format(theory_craft_combination)

44285632085781525350992080


'4.43e+25'
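
The two counts used above are the binomial coefficient C(n, k) for combinations without repetition and C(n + k - 1, k) for combinations with repetition. As a sketch, assuming Python 3.8 or later, `math.comb` computes both without writing out the factorials; the function names below are hypothetical.

```python
import math


def combinations_without_repetition(n, k):
    """Number of ways to choose k distinct elements out of n: C(n, k)."""
    return math.comb(n, k)


def combinations_with_repetition(n, k):
    """Number of multisets of size k drawn from n elements: C(n + k - 1, k)."""
    return math.comb(n + k - 1, k)


# Small cases that are easy to verify by hand.
print(combinations_without_repetition(5, 2))  # 10
print(combinations_with_repetition(3, 2))     # 6
```

In the notebook's terms, `combinations_without_repetition(heroes_count, heroes_per_game)` corresponds to `hero_combinations`, and `combinations_with_repetition(item_count, items_per_hero)` corresponds to `item_combinations`.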

## 2.5 Reducing game complexity with data classification

### 2.5.1 Analyzing hero roles

In [18]:
# the with-statement closes the file automatically
with open('data/hero_role.txt', 'r') as file:
    for line in file:
        print(line)

{| class="wikitable"

!colspan=8| ■■■ Carry

|-

|{{HeroIcon|am}}{{HeroIcon|arc}}{{HeroIcon|ck}}{{HeroIcon|gyro}}{{HeroIcon|medusa}}{{HeroIcon|morph}}{{HeroIcon|naga}}{{HeroIcon|pa}}<br>{{HeroIcon|sniper}}{{HeroIcon|spectre}}{{HeroIcon|tb}}{{HeroIcon|tiny}}{{HeroIcon|troll}}

|-

!colspan=8| ■■ Carry

|-

|{{HeroIcon|alch}}{{HeroIcon|bb}}{{HeroIcon|dk}}{{HeroIcon|huskar}}{{HeroIcon|ls}}{{HeroIcon|lycan}}{{HeroIcon|mk}}{{HeroIcon|slardar}}<br>{{HeroIcon|sven}}{{HeroIcon|wk}}{{HeroIcon|clinkz}}{{HeroIcon|drow}}{{HeroIcon|ember}}{{HeroIcon|void}}{{HeroIcon|jugg}}{{HeroIcon|ld}}<br>{{HeroIcon|luna}}{{HeroIcon|meepo}}{{HeroIcon|pl}}{{HeroIcon|razor}}{{HeroIcon|riki}}{{HeroIcon|sf}}{{HeroIcon|slark}}{{HeroIcon|ta}}<br>{{HeroIcon|ursa}}{{HeroIcon|weaver}}{{HeroIcon|od}}{{HeroIcon|storm}}{{HeroIcon|pangolier}}

|-

!colspan=8| ■ Carry

|-

|{{HeroIcon|abaddon}}{{HeroIcon|brew}}{{HeroIcon|doom}}{{HeroIcon|kunkka}}{{HeroIcon|lc}}{{HeroIcon|ns}}{{HeroIcon|sb}}{{HeroIcon|bs}}<br>{{HeroIcon|brood}}
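
To use this markup for classification, the hero abbreviations have to be grouped under their header rows. The sketch below shows one possible way to do that with regular expressions; `parse_hero_table` and both patterns are hypothetical and assume the layout seen in the output above (a `!colspan=...|` header row followed by `{{HeroIcon|...}}` rows). It returns the wiki abbreviations as they appear, without mapping them to full hero names.

```python
import re
from collections import defaultdict

# Header rows look like: !colspan=8| ■■■ Carry
HEADER_RE = re.compile(r"^!colspan=\d+\|\s*(.+?)\s*$")
# Hero entries look like: {{HeroIcon|am}}
ICON_RE = re.compile(r"\{\{HeroIcon\|([^}]+)\}\}")


def parse_hero_table(text):
    """Group HeroIcon abbreviations under the header row that precedes them."""
    groups = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line.strip())
        if header:
            current = header.group(1)
        elif current is not None:
            # Rows without icons (such as '|-') contribute nothing.
            groups[current].extend(ICON_RE.findall(line))
    return dict(groups)
```

Calling it on the text of `data/hero_role.txt` would give a dictionary keyed by header, for example `'■■■ Carry'`, with a list of abbreviations as each value.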

### 2.5.2 Analyzing hero complexity

In [19]:
# the with-statement closes the file automatically
with open('data/hero_complexity.txt') as file:
    for line in file:
        print(line)

==Complexity==

Heroes can be sorted by overall complexity, taking all of their variables into account.



Complexity levels: <br> &nbsp; ■■■ Advanced <br> &nbsp; ■■ Moderate <br> &nbsp; ■ Straightforward



Heroes with advanced complexity take a high degree of skill and game knowledge to play well. <br>

Heroes with moderate complexity are fairly flexible in terms of difficulty and method of play style. <br>

Heroes with straightforward complexity more forgiving to play while still offering unique play styles.



{| class="wikitable"

!colspan=8| ■■■ Complexity

|-

|{{HeroIcon|brewmaster}}{{HeroIcon|earth spirit}}{{HeroIcon|io}}{{HeroIcon|aw}}{{HeroIcon|ld}}{{HeroIcon|meepo}}{{HeroIcon|morph}}{{HeroIcon|chen}}<br>{{HeroIcon|invoker}}{{HeroIcon|oracle}}{{HeroIcon|rubick}}{{HeroIcon|storm}}{{HeroIcon|visage}}

|-

!colspan=8| ■■ Complexity

|-

|{{HeroIcon|beastmaster}}{{HeroIcon|clock}}{{HeroIcon|doom}}{{HeroIcon|earthshaker}}{{HeroIcon|et}}{{HeroIcon|kunkka}}{{HeroIcon|lifestealer}}{