# Verification Set

Using crowdsourcing, we manually checked the role of each player in a few hundred games. Let's take a look at the result.

In [1]:
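import pandas as pd
import pymongo

# Connect to the default local MongoDB instance holding the crowdsourced labels.
client = pymongo.MongoClient()
db = client.roleCS
gamesTable = db.entries

# One row per labelled game: the role of each participant plus the game ID.
entries = []
for g in gamesTable.find():
    row = g["roles"]
    row["gameId"] = g["gameId"]
    entries.append(row)

df = pd.DataFrame(entries)
df.head()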

Unnamed: 0,1,10,2,3,4,5,6,7,8,9,gameId
0,carry,toplaner,midlaner,support,toplaner,jungler,midlaner,support,jungler,carry,3916329852
1,toplaner,midlaner,jungler,midlaner,carry,support,jungler,support,toplaner,carry,3916458918
2,support,midlaner,toplaner,carry,midlaner,jungler,support,carry,jungler,toplaner,3917694109
3,support,midlaner,midlaner,carry,jungler,toplaner,jungler,carry,toplaner,support,3915381277
4,toplaner,support,support,midlaner,carry,jungler,midlaner,carry,jungler,toplaner,3916535231


Exporting to CSV

In [2]:
df.to_csv("verification_set.csv")

Checking for duplicates

In [3]:
df["gameId"].value_counts()

3916145366    2
3916458918    2
3914358710    1
3913416015    1
3913072972    1
3913409866    1
3915075913    1
3918040390    1
3913182532    1
3912931840    1
3914651192    1
3916374334    1
3916140498    1
3917778236    1
3913259323    1
3914641721    1
3915183415    1
3913453901    1
3914932560    1
3915731253    1
3913411921    1
3916171629    1
3914231146    1
3915351267    1
3918209772    1
3917777253    1
3918090596    1
3915169275    1
3912924512    1
3915380062    1
             ..
3914344080    1
3914497679    1
3914411662    1
3917141644    1
3915749281    1
3918571535    1
3918346900    1
3914155289    1
3913265793    1
3918600711    1
3917686826    1
3914268128    1
3916899999    1
3914990332    1
3914810028    1
3913547260    1
3914376889    1
3915707295    1
3917793972    1
3913012915    1
3916105392    1
3916406443    1
3917916833    1
3915231912    1
3914116774    1
3916913317    1
3915732187    1
3917677656    1
3913226914    1
3915726106    1
Name: gameId, Length: 58

In [4]:
df[((df["gameId"] == 3916458918) | (df["gameId"] == 3916145366))].sort_values("gameId")

Unnamed: 0,1,10,2,3,4,5,6,7,8,9,gameId
497,carry,support,jungler,support,midlaner,toplaner,jungler,toplaner,midlaner,carry,3916145366
498,carry,support,jungler,support,midlaner,toplaner,jungler,toplaner,midlaner,carry,3916145366
1,toplaner,midlaner,jungler,midlaner,carry,support,jungler,support,toplaner,carry,3916458918
15,toplaner,midlaner,jungler,midlaner,carry,support,jungler,support,toplaner,carry,3916458918


There are very few duplicates, and the duplicated entries do not diverge. It's a good start.
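
The duplicate check above was done by eye on two game IDs. As a minimal sketch of how it could be automated, the snippet below flags any `gameId` whose repeated rows disagree and then keeps one row per game. The toy table `df_demo` and its values are hypothetical stand-ins for `df`, reduced to two role columns.

```python
import pandas as pd

# Hypothetical stand-in for the crowdsourced table (two role columns only).
df_demo = pd.DataFrame(
    {
        "1": ["carry", "carry", "toplaner", "support"],
        "2": ["jungler", "jungler", "jungler", "toplaner"],
        "gameId": [3916145366, 3916145366, 3916458918, 3917694109],
    }
)

# Rows whose gameId appears more than once.
dupes = df_demo[df_demo.duplicated("gameId", keep=False)]

# A gameId diverges if its repeated rows are not all identical.
distinct_rows = dupes.drop_duplicates().groupby("gameId").size()
divergent_ids = distinct_rows[distinct_rows > 1].index.tolist()

# Once the duplicates are known to agree, keep a single row per game.
df_unique = df_demo.drop_duplicates("gameId", keep="first")

print(divergent_ids)
print(len(df_unique))
```

An empty `divergent_ids` list means every duplicated game was labelled consistently, which is the situation observed in the table above.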

Now we compare the verification set with the labels from the match data.

In [5]:
roleTranslate = {
    "midlaner":"MIDDLE_SOLO",
    "toplaner":"TOP_SOLO",
    "jungler":"JUNGLE_NONE",
    "carry":"BOTTOM_DUO_CARRY",
    "support":"BOTTOM_DUO_SUPPORT",
    "undefined":"undefined",
    "swaplaner":"swaplaner"
}

# Switch to the database holding the raw match data.
db = client.game
gamesTable = db.gameData_92_2

# Cast NumPy integers to plain Python ints before passing them to pymongo.
gameIds = [int(i) for i in df["gameId"].values]

participantEntries = []
for g in gamesTable.find({"gameId": {"$in": gameIds}}):
    # Crowdsourced roles for this game (first row if the game was labelled twice).
    csRoles = df[df["gameId"] == g["gameId"]].iloc[0]
    for p in g["participants"]:
        row = {}
        row["gameId"] = g["gameId"]
        row["participantId"] = p["participantId"]
        row["label_match"] = p["timeline"]["lane"] + "_" + p["timeline"]["role"]
        row["label_cs"] = roleTranslate[csRoles[str(p["participantId"])]]
        participantEntries.append(row)
dfEntries = pd.DataFrame(participantEntries)

In [6]:
dfEntries.head()

Unnamed: 0,gameId,label_cs,label_match,participantId
0,3912729384,BOTTOM_DUO_CARRY,BOTTOM_DUO_CARRY,1
1,3912729384,BOTTOM_DUO_SUPPORT,BOTTOM_DUO_SUPPORT,2
2,3912729384,MIDDLE_SOLO,MIDDLE_SOLO,3
3,3912729384,JUNGLE_NONE,JUNGLE_NONE,4
4,3912729384,TOP_SOLO,TOP_SOLO,5
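
The per-participant table above is built with one DataFrame lookup per game. A possible alternative, sketched here on toy data, reshapes the wide crowdsourced table to long format with `melt` and joins it to the match labels in a single `merge`. The names `wide`, `match`, `long_df` and `merged`, and all values in them, are hypothetical and only illustrate the reshaping.

```python
import pandas as pd

role_translate = {"toplaner": "TOP_SOLO", "jungler": "JUNGLE_NONE"}

# Wide crowdsourced table: one string-named column per participantId, one row per game.
wide = pd.DataFrame({"1": ["toplaner"], "2": ["jungler"], "gameId": [111]})

# Match-data labels, one row per participant (hypothetical values).
match = pd.DataFrame(
    {
        "gameId": [111, 111],
        "participantId": [1, 2],
        "label_match": ["TOP_SOLO", "TOP_DUO_SUPPORT"],
    }
)

# Wide -> long: one row per (gameId, participantId).
long_df = wide.melt(id_vars="gameId", var_name="participantId", value_name="role")
long_df["participantId"] = long_df["participantId"].astype(int)
long_df["label_cs"] = long_df["role"].map(role_translate)

# Attach the crowdsourced label to each match-data row.
merged = match.merge(
    long_df[["gameId", "participantId", "label_cs"]],
    on=["gameId", "participantId"],
)
print(merged)
```

With this approach, duplicated games would need to be dropped from the wide table first, otherwise the merge would repeat their participants.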


In [7]:
dfEntries[dfEntries["label_cs"] != dfEntries["label_match"]].shape[0] / dfEntries.shape[0]

0.13013698630136986

On the selected games, the label from the match data gives the same role as the verification set in about 87% of cases (a mismatch rate of roughly 13%).
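
To see which roles account for the disagreements, the same comparison can be broken down per label. The snippet below is a small sketch on hypothetical rows shaped like `dfEntries`: it computes the agreement rate directly and cross-tabulates the two label columns with `pd.crosstab`.

```python
import pandas as pd

# Hypothetical rows with the same two label columns as dfEntries.
df_demo = pd.DataFrame(
    {
        "label_cs": ["TOP_SOLO", "JUNGLE_NONE", "MIDDLE_SOLO", "TOP_SOLO"],
        "label_match": ["TOP_SOLO", "JUNGLE_NONE", "MIDDLE_SOLO", "TOP_DUO_CARRY"],
    }
)

# Share of participants on which both labels agree.
agreement = (df_demo["label_cs"] == df_demo["label_match"]).mean()

# Rows: crowdsourced label. Columns: match-data label.
confusion = pd.crosstab(df_demo["label_cs"], df_demo["label_match"])

print(agreement)
print(confusion)
```

Off-diagonal cells of `confusion` show which match-data labels are assigned to players that the crowd placed in a different role.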

In [8]:
dfEntries[dfEntries["label_cs"] != dfEntries["label_match"]]

Unnamed: 0,gameId,label_cs,label_match,participantId
5,3912729384,TOP_SOLO,TOP_DUO_CARRY,6
8,3912729384,JUNGLE_NONE,TOP_DUO_SUPPORT,9
11,3912729869,TOP_SOLO,TOP_DUO_SUPPORT,2
14,3912729869,MIDDLE_SOLO,TOP_DUO_CARRY,5
31,3912755404,TOP_SOLO,JUNGLE_NONE,2
41,3912760394,MIDDLE_SOLO,TOP_SOLO,2
44,3912760394,TOP_SOLO,MIDDLE_SOLO,5
47,3912760394,MIDDLE_SOLO,MIDDLE_DUO,8
48,3912760394,TOP_SOLO,MIDDLE_DUO,9
50,3912764104,JUNGLE_NONE,NONE_DUO_SUPPORT,1


After a manual review of some of these cases, the verification set proved to be correct and reliable as ground truth.