# Verification Set

Using crowdsourcing, we have manually checked the role of each player in a few hundred games. Let's take a look at the result.

In [1]:
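import pandas as pd
import pymongo

# Connect to the local MongoDB instance that holds the crowdsourced entries.
client = pymongo.MongoClient()
db = client.roleCS
gamesTable = db.entries

# Flatten each entry into one row: participant id -> role, plus the game id.
entries = []
for g in gamesTable.find():
    row = g["roles"]
    row["gameId"] = g["gameId"]
    entries.append(row)

# Display floats without decimals.
pd.set_option('display.float_format', '{:.0f}'.format)
df = pd.DataFrame(entries)
df.head()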

Unnamed: 0,1,10,2,3,4,5,6,7,8,9,gameId
0,carry,toplaner,midlaner,support,toplaner,jungler,midlaner,support,jungler,carry,3916329852
1,toplaner,midlaner,jungler,midlaner,carry,support,jungler,support,toplaner,carry,3916458918
2,support,midlaner,toplaner,carry,midlaner,jungler,support,carry,jungler,toplaner,3917694109
3,support,midlaner,midlaner,carry,jungler,toplaner,jungler,carry,toplaner,support,3915381277
4,toplaner,support,support,midlaner,carry,jungler,midlaner,carry,jungler,toplaner,3916535231


Exporting to CSV

In [2]:
df.to_csv("verification_set.csv")

Checking for duplicates

In [3]:
df["gameId"].value_counts()

3916458918    2
3916145366    2
3914501728    1
3917441258    1
3914537226    1
3914530756    1
3914526201    1
3914519318    1
3914517377    1
3915102225    1
3917478753    1
3914551256    1
3914494028    1
3915342319    1
3917463080    1
3913918589    1
3914464433    1
3913192752    1
3914543116    1
3917489441    1
3914448807    1
3914612313    1
3914641721    1
3914155289    1
3913169707    1
3914624611    1
3914620058    1
3914087004    1
3916095785    1
3914562423    1
             ..
3912866869    1
3914811331    1
3915909018    1
3915897441    1
3915965174    1
3915881122    1
3915877352    1
3915863199    1
3915757838    1
3915821925    1
3912846086    1
3915961583    1
3918348659    1
3916088750    1
3914889132    1
3915726106    1
3916060944    1
3915234471    1
3916026799    1
3916020835    1
3916014189    1
3916007624    1
3915972613    1
3915180996    1
3915997133    1
3916606566    1
3914228308    1
3915980836    1
3917663414    1
3913287185    1
Name: gameId, Length: 58

In [4]:
df[((df["gameId"] == 3916458918) | (df["gameId"] == 3916145366))].sort_values("gameId")

Unnamed: 0,1,10,2,3,4,5,6,7,8,9,gameId
497,carry,support,jungler,support,midlaner,toplaner,jungler,toplaner,midlaner,carry,3916145366
498,carry,support,jungler,support,midlaner,toplaner,jungler,toplaner,midlaner,carry,3916145366
1,toplaner,midlaner,jungler,midlaner,carry,support,jungler,support,toplaner,carry,3916458918
15,toplaner,midlaner,jungler,midlaner,carry,support,jungler,support,toplaner,carry,3916458918


Only two games appear twice, and in both cases the duplicate entries agree on every role. It's a good start.
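The same check can be automated instead of reading the rows by eye. The sketch below is a minimal illustration on a toy frame (`toy` and its values are hypothetical stand-ins for `df`, with only two role columns); it lists the game ids whose duplicate entries disagree and then keeps one row per game.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df` (two role columns only).
toy = pd.DataFrame({
    "1": ["carry", "carry", "toplaner", "support"],
    "2": ["support", "support", "jungler", "carry"],
    "gameId": [100, 100, 200, 300],
})

# All rows whose gameId occurs more than once.
dupes = toy[toy.duplicated("gameId", keep=False)]

# Distinct label rows per duplicated gameId; 1 means the copies agree.
distinct_rows = dupes.drop_duplicates().groupby("gameId").size()
divergent_ids = distinct_rows[distinct_rows > 1].index.tolist()

# Once the copies are known to agree, keep a single row per game.
deduped = toy.drop_duplicates("gameId").reset_index(drop=True)

print(divergent_ids)
print(deduped)
```

An empty `divergent_ids` list corresponds to the "no divergence" observation; a non-empty one would point at games worth reviewing by hand.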

Now we compare the verification set with the lane and role labels from the match data.

In [5]:
roleTranslate = {
    "midlaner":"MIDDLE_SOLO",
    "toplaner":"TOP_SOLO",
    "jungler":"JUNGLE_NONE",
    "carry":"BOTTOM_DUO_CARRY",
    "support":"BOTTOM_DUO_SUPPORT",
    "undefined":"undefined",
    "swaplaner":"swaplaner"
}

db = client.game
gamesTable = db.gameData_92_2

# Build one row per participant with both labels side by side.
participantEntries = []
for g in gamesTable.find({"gameId":{"$in":[int(i) for i in list(df["gameId"].values)]}}):
    for p in g["participants"]:
        row = {}
        row["gameId"] = g["gameId"]
        row["participantId"] = p["participantId"]
        # Label from the match data: lane and role joined by an underscore.
        row["label_match"] = p['timeline']['lane']+"_"+p['timeline']['role']
        # Crowdsourced label, translated to the same naming scheme.
        row["label_cs"] = roleTranslate[ df[df["gameId"] == g["gameId"]][str(p["participantId"])].iloc[0] ]
        participantEntries.append(row)
dfEntries = pd.DataFrame(participantEntries)
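As an alternative to looking up each participant's crowdsourced role row by row, the wide table can be reshaped to one row per participant with `melt` and translated in a single `map`. This is only a sketch under assumptions: `wide`, `long_df` and `role_translate` are hypothetical names, and the toy values are the first two role columns of the first two rows shown earlier.

```python
import pandas as pd

# Same mapping idea as roleTranslate, restricted to the five standard roles.
role_translate = {
    "midlaner": "MIDDLE_SOLO",
    "toplaner": "TOP_SOLO",
    "jungler": "JUNGLE_NONE",
    "carry": "BOTTOM_DUO_CARRY",
    "support": "BOTTOM_DUO_SUPPORT",
}

# Toy wide table: one column per participant id, one row per game.
wide = pd.DataFrame({
    "1": ["carry", "toplaner"],
    "2": ["midlaner", "jungler"],
    "gameId": [3916329852, 3916458918],
})

# Wide to long: one row per (gameId, participantId).
long_df = wide.melt(id_vars="gameId", var_name="participantId", value_name="role")
long_df["participantId"] = long_df["participantId"].astype(int)
long_df["label_cs"] = long_df["role"].map(role_translate)

print(long_df)
```

The resulting frame could then be joined to the match-data labels on `gameId` and `participantId` with `DataFrame.merge`.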

In [6]:
dfEntries.head()

Unnamed: 0,gameId,label_cs,label_match,participantId
0,3912729384,BOTTOM_DUO_CARRY,BOTTOM_DUO_CARRY,1
1,3912729384,BOTTOM_DUO_SUPPORT,BOTTOM_DUO_SUPPORT,2
2,3912729384,MIDDLE_SOLO,MIDDLE_SOLO,3
3,3912729384,JUNGLE_NONE,JUNGLE_NONE,4
4,3912729384,TOP_SOLO,TOP_SOLO,5


In [7]:
dfEntries[dfEntries["label_cs"] == dfEntries["label_match"]].shape[0] / dfEntries.shape[0]

0.8739352640545145

On the selected games, the label from the match data gives the same role as the verification set in 87% of cases.
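The overall rate hides which roles the match labels get wrong. A possible follow-up, sketched here on a toy frame (`toy_entries` and its values are hypothetical, shaped like `dfEntries`), computes the agreement rate as the mean of a boolean column and then breaks it down by crowdsourced role.

```python
import pandas as pd

# Hypothetical stand-in for dfEntries with only the two label columns.
toy_entries = pd.DataFrame({
    "label_cs":    ["TOP_SOLO", "TOP_SOLO", "JUNGLE_NONE", "MIDDLE_SOLO"],
    "label_match": ["TOP_SOLO", "TOP_DUO_CARRY", "JUNGLE_NONE", "MIDDLE_SOLO"],
})

# True where both labels agree; the mean of a boolean Series is the agreement rate.
agree = toy_entries["label_cs"] == toy_entries["label_match"]
overall = agree.mean()

# Agreement rate per crowdsourced role.
per_role = agree.groupby(toy_entries["label_cs"]).mean()

print(overall)
print(per_role)
```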

In [8]:
dfEntries[dfEntries["label_cs"] != dfEntries["label_match"]]

Unnamed: 0,gameId,label_cs,label_match,participantId
5,3912729384,TOP_SOLO,TOP_DUO_CARRY,6
8,3912729384,JUNGLE_NONE,TOP_DUO_SUPPORT,9
11,3912729869,TOP_SOLO,TOP_DUO_SUPPORT,2
14,3912729869,MIDDLE_SOLO,TOP_DUO_CARRY,5
31,3912755404,TOP_SOLO,JUNGLE_NONE,2
47,3912760394,MIDDLE_SOLO,MIDDLE_DUO,8
48,3912760394,TOP_SOLO,MIDDLE_DUO,9
50,3912764104,JUNGLE_NONE,NONE_DUO_SUPPORT,1
51,3912764104,MIDDLE_SOLO,NONE_DUO_SUPPORT,2
52,3912764104,TOP_SOLO,NONE_DUO_SUPPORT,3


After manually reviewing some of these cases, the verification set proved to be correct, so it can be used as reliable ground truth.