## Predicting Final Round Team Compositions in Rainbow 6: Siege

#### Description of the dataset
The data used is an official datadump from Ubisoft, the developers of the game. It was released after the fifth competitive season of gameplay. (In keeping with the game's setting as a counter-terrorism first-person shooter, the seasons have operation names; this one was named "Operation Velvet Shell".) For the project, the full datadump, featuring a detailed round-by-round breakdown of matches, was used. Each round is recorded from the perspective of each involved player, and includes the operator used, the exact loadout of the operator in that round, general measures of the player's skill, and specific match performance statistics, such as the number of kills and whether the player was killed in the round.

First, a subset of approximately 200,000 entries from the data will be obtained. This subset will be sampled in such a way as to preserve round/match groupings, e.g. by date. Then, the data will be aggregated and reshaped to combine all per-player-per-round entries into a single entry per match, detailing the choices and statistics of all involved players, as well as relevant match data, e.g. map name and game mode. This aggregated form will be the final data used for training and predictions.
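
One way to sample while preserving round/match groupings is to pick whole dates and keep every row recorded on them. The sketch below is illustrative only: `sample_by_date` and the toy frame are hypothetical, and it assumes that no match spans more than one `dateid`.

```python
import pandas as pd

def sample_by_date(df: pd.DataFrame, n_dates: int, seed: int = 0) -> pd.DataFrame:
    """Keep every row belonging to a random subset of dates."""
    dates = pd.Series(df["dateid"].unique())
    keep = dates.sample(n=min(n_dates, len(dates)), random_state=seed)
    return df[df["dateid"].isin(keep)]

# Toy stand-in for the full datadump (hypothetical values).
toy = pd.DataFrame({
    "dateid":      [20170212, 20170212, 20170213, 20170213, 20170214],
    "matchid":     [1, 1, 2, 2, 3],
    "roundnumber": [1, 2, 1, 2, 1],
})
subset = sample_by_date(toy, n_dates=2)
```

Because rows are selected by date rather than individually, any match that is kept retains all of its rounds.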

#### Description of the project
Using the aggregated subset described above, a Generative Adversarial Network (GAN) will be trained to predict the team compositions of the Blue and Orange teams in the final round of a match. A player's choice of operator depends on a large number of interacting factors, such as personal preference, player skill with specific operators, the choices of other team members, and counterpicks to opponents' favourites. A generative model that can learn the joint distribution of these picks, such as a GAN, is therefore a reasonable fit for this task.

#### Description of GANs
TODO, also cite one of the neat articles on the subject to impress Dr. Miller more when the network inevitably fails to work properly.

In [1]:
import pandas as pd
import dask.dataframe as dd
import numpy as np

import matplotlib.pyplot as plt

In [2]:
data = pd.read_csv("downsampled_datadump.csv")

In [3]:
# Check size of downsampled dataset. ~217,000 is adequately reduced.
data.shape

(1675335, 31)

In [4]:
data.iloc[:5,:18]

Unnamed: 0,dateid,platform,gamemode,mapname,matchid,roundnumber,objectivelocation,winrole,endroundreason,roundduration,clearancelevel,skillrank,role,team,haswon,operator,nbkills,isdead
0,20170212,PC,PvP – HOSTAGE,CLUB HOUSE,1522380841,1,STRIP CLUB,Defender,AttackersKilledHostage,124,64,Gold,Defender,1,1,SWAT-CASTLE,0,0
1,20170212,PC,PvP – HOSTAGE,CLUB HOUSE,1522380841,4,CHURCH,Defender,AttackersEliminated,217,81,Gold,Defender,0,1,GSG9-JAGER,0,1
2,20170212,PC,PvP – HOSTAGE,CLUB HOUSE,1522380841,3,CHURCH,Defender,AttackersEliminated,160,150,Gold,Defender,1,1,JTF2-FROST,0,0
3,20170212,PC,PvP – HOSTAGE,CLUB HOUSE,1522380841,4,CHURCH,Defender,AttackersEliminated,217,94,Gold,Defender,0,1,BOPE-CAVEIRA,3,0
4,20170212,PC,PvP – HOSTAGE,CLUB HOUSE,1522380841,6,BEDROOM,Attacker,DefendersEliminated,143,81,Gold,Defender,0,0,GSG9-JAGER,0,1


#### Restructuring the Data

Given the limitations of the multi-level index in pandas, the most practical way to restructure the data into per-match rows is to construct a new dataframe. Additionally, because repeated queries against the dataframe are quite slow, the data will first be stored in an intermediate form: a nested dictionary of lists, `{<matchid>: {<roundnumber>: [[row1], [row2], ...]}}`. This intermediate representation will then be used to construct the new dataframe, as it is both more computationally efficient and makes it easier to filter out rounds that do not fit the required structure.
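
As a minimal illustration of the intermediate structure, the sketch below builds the same match-to-round-to-rows nesting with `collections.defaultdict`. The three rows are hypothetical and trimmed to `(matchid, roundnumber, operator)`; the notebook's own cell uses plain dictionaries and full rows instead.

```python
from collections import defaultdict

# Hypothetical rows, trimmed to (matchid, roundnumber, operator).
rows = [
    (1522380841, 1, "SWAT-CASTLE"),
    (1522380841, 4, "GSG9-JAGER"),
    (1522380841, 4, "BOPE-CAVEIRA"),
]

# matchid -> roundnumber -> list of rows
storage = defaultdict(lambda: defaultdict(list))
for row in rows:
    matchid, r_num = row[0], row[1]
    storage[matchid][r_num].append(row)
```

Each lookup `storage[matchid][r_num]` is then a dictionary access rather than a dataframe query, which is the source of the speed-up.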

The new structure requires that matches have exactly six rounds played, and that every round has a full complement of five players per team (ten rows in total), in order to standardise the length of input data to the GAN. Additionally, a number of details about individual player loadouts will be dropped, as the exact set of mods on each weapon is almost purely dictated by the meta. Of more interest are the general weapon types selected for each operator, even though these are still largely meta-dependent.
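
The structural requirement can also be expressed as a check on group sizes. The following is a sketch under assumptions, not the method used in the cells below: `valid_match_ids` is a hypothetical helper, and the toy frame uses two rounds of two players to keep it short. It does not verify the five-versus-five team split.

```python
import pandas as pd

def valid_match_ids(df, n_rounds=6, players_per_round=10):
    """Return matchids whose rounds are exactly 1..n_rounds, each with the expected row count."""
    sizes = df.groupby(["matchid", "roundnumber"]).size()
    valid = []
    for matchid, per_round in sizes.groupby(level="matchid"):
        rounds = per_round.index.get_level_values("roundnumber")
        has_all_rounds = sorted(rounds) == list(range(1, n_rounds + 1))
        if has_all_rounds and (per_round == players_per_round).all():
            valid.append(matchid)
    return valid

# Toy data: match 1 is complete, match 2 is missing a player in round 2.
toy = pd.DataFrame({
    "matchid":     [1, 1, 1, 1, 2, 2, 2],
    "roundnumber": [1, 1, 2, 2, 1, 1, 2],
})
ok = valid_match_ids(toy, n_rounds=2, players_per_round=2)
```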

In [5]:
# The intermediate step of aggregating by round prior to aggregating into matches is no longer being used,
# but setting up the data dictionary for it was very helpful in figuring out how to structure the new
# dataframe of aggregated data.

# per_round = {"gamemode": [], "mapname": [], "matchid": [], "roundnumber": [], "objectivelocation": [],
#              "winrole": [], "endroundreason": [], "roundduration": [], "bluerole": [], "orangerole": [],
#              "teamwin": [], "blue1skill": [], "blue1level": [], "blue1kills": [], "blue1dead": [],
#              "blue1op": [], "blue1primary": [], "blue1secondary": [], "blue2skill": [],
#              "blue2level": [], "blue2kills": [], "blue2dead": [], "blue2op": [], "blue2primary": [],
#              "blue2secondary": [], "blue3skill": [], "blue3level": [], "blue3kills": [],
#              "blue3dead": [], "blue3op": [], "blue3primary": [], "blue3secondary": [],
#              "blue4skill": [], "blue4level": [], "blue4kills": [], "blue4dead": [], "blue4op": [],
#              "blue4primary": [], "blue4secondary": [], "blue5skill": [], "blue5level": [],
#              "blue5kills": [], "blue5dead": [], "blue5op": [], "blue5primary": [], "blue5secondary": [],
#              "orange1skill": [], "orange1level": [], "orange1kills": [], "orange1dead": [],
#              "orange1op": [], "orange1primary": [], "orange1secondary": [], "orange2skill": [],
#              "orange2level": [], "orange2kills": [], "orange2dead": [], "orange2op": [],
#              "orange2primary": [], "orange2secondary": [], "orange3skill": [], "orange3level": [],
#              "orange3kills": [], "orange3dead": [], "orange3op": [], "orange3primary": [],
#              "orange3secondary": [], "orange4skill": [], "orange4level": [], "orange4kills": [],
#              "orange4dead": [], "orange4op": [], "orange4primary": [], "orange4secondary": [],
#              "orange5skill": [], "orange5level": [], "orange5kills": [], "orange5dead": [],
#              "orange5op": [], "orange5primary": [], "orange5secondary": [],
#             }

# Arbitrary decision: Blue is 0, Orange is 1.

In [6]:
# ~Elegant~ programmatic data dictionary creation
per_match = {"gamemode": [], "mapname": [], "matchid": []}
for r_num in range(1, 7):
    for detail in ["objective", "endreason", "duration", "bluerole", "orangerole", "winner", "winrole"]:
        per_match["".join(["round", str(r_num), detail])] = []
    for team in ["blue", "orange"]:
        for player in range(1,6):
            for field in ["level", "skill", "kills", "dead", "op", "primary", "secondary"]:
                per_match["".join(["round", str(r_num), team, str(player), field])] = []

In [7]:
tmp_storage = {}

In [8]:
%%time
for row in data.values:
    matchid = row[4]
    r_num = row[5]
    if matchid not in tmp_storage:
        tmp_storage[matchid] = {}
    if r_num not in tmp_storage[matchid]:
        tmp_storage[matchid][r_num] = [row]
    else:
        tmp_storage[matchid][r_num].append(row)

CPU times: user 2.69 s, sys: 940 ms, total: 3.63 s
Wall time: 3.62 s


In [9]:
%%time
for matchid in tmp_storage:
    if len(tmp_storage[matchid]) == 6:  # Check to ensure enough rounds are played, and not too many.
        match = tmp_storage[matchid]
        insuff_players = False
        for r_num in range(1, 7):  # First check to ensure that there are enough players.
            # .get() guards against matches with six rounds that are not numbered 1-6.
            if len(match.get(r_num, [])) != 10:
                insuff_players = True
        if not insuff_players:
            per_match["matchid"].append(matchid)
            frfe = match[1][0]  # First entry in the first round of the match.
            per_match["mapname"].append(frfe[3])  # Extract map name for the match.
            per_match["gamemode"].append(frfe[2])  # Extract game mode for the match.
            for r_num in range(1, 7):
                first_entry = match[r_num][0]
                # Extract per-round details
                for detail, idx in [("objective", 6), ("winrole", 7), ("duration", 9), ("endreason", 8)]:
                    per_match["".join(["round",str(r_num),detail])].append(first_entry[idx])
                
                # Begin inference of other per-round details based on first entry.
                fe_role, fe_team, fe_won = first_entry[12], first_entry[13], first_entry[14]
                if fe_won == 1:
                    per_match["".join(["round", str(r_num), "winner"])].append(fe_team)
                elif fe_team == 1:
                    per_match["".join(["round", str(r_num), "winner"])].append(0)
                else:
                    per_match["".join(["round", str(r_num), "winner"])].append(1)
                
                if fe_team == 0 and fe_role == "Defender":
                    per_match["".join(["round", str(r_num), "bluerole"])].append("Defender")
                    per_match["".join(["round", str(r_num), "orangerole"])].append("Attacker")
                elif fe_team == 1 and fe_role == "Defender":
                    per_match["".join(["round", str(r_num), "bluerole"])].append("Attacker")
                    per_match["".join(["round", str(r_num), "orangerole"])].append("Defender")
                elif fe_team == 0: # Team 0 defenders have already been matched.
                    per_match["".join(["round", str(r_num), "bluerole"])].append("Attacker")
                    per_match["".join(["round", str(r_num), "orangerole"])].append("Defender")
                else:
                    per_match["".join(["round", str(r_num), "bluerole"])].append("Defender")
                    per_match["".join(["round", str(r_num), "orangerole"])].append("Attacker")
                
                # Begin extraction of per-round-per-player details.
                blue_p_num = 1
                orange_p_num = 1
                for row in match[r_num]:
                    try:
                        for field, idx in [("level", 10), ("skill", 11), ("kills", 16), ("dead", 17),
                                           ("op", 15), ("primary", 19), ("secondary", 25)]:
                            if row[13] == 0:  # Blue team.
                                per_match["".join(["round", str(r_num), "blue", str(blue_p_num), field])].append(row[idx])
                            else:  # Orange team.
                                per_match["".join(["round", str(r_num), "orange", str(orange_p_num), field])].append(row[idx])
                        if row[13] == 0:
                            blue_p_num += 1
                        else:
                            orange_p_num += 1
                    except KeyError:
                        print(matchid, r_num, blue_p_num, orange_p_num)
                        raise

CPU times: user 3.24 s, sys: 4 ms, total: 3.25 s
Wall time: 3.25 s
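
The winner and role inference in the cell above reduces to two rules: the winner is the first entry's team if that player won, otherwise the other team; and Blue's role is the first entry's role if that player is on Blue, otherwise the opposite role. A hypothetical helper (`infer_round` is not part of the notebook) makes this explicit. Unlike the cell, which treats any role other than "Defender" as an attacker, it raises a `KeyError` on an unexpected role string.

```python
def infer_round(fe_role, fe_team, fe_won):
    """Infer (winner, bluerole, orangerole) from one player's row.

    Team coding follows the notebook: Blue is 0, Orange is 1.
    """
    other = {"Defender": "Attacker", "Attacker": "Defender"}
    winner = fe_team if fe_won == 1 else 1 - fe_team
    blue_role = fe_role if fe_team == 0 else other[fe_role]
    return winner, blue_role, other[blue_role]
```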


In [11]:
match_data = pd.DataFrame.from_dict(per_match)

In [12]:
match_data.head()

Unnamed: 0,gamemode,mapname,matchid,round1blue1dead,round1blue1kills,round1blue1level,round1blue1op,round1blue1primary,round1blue1secondary,round1blue1skill,...,round6orange5dead,round6orange5kills,round6orange5level,round6orange5op,round6orange5primary,round6orange5secondary,round6orange5skill,round6orangerole,round6winner,round6winrole
0,PvP – BOMB,HOUSE,3960340489,1,1,169,SWAT-ASH,Assault Rifles,Pistols,Gold,...,1,1,122,SPETSNAZ-FUZE,Assault Rifles,Pistols,Unranked,Attacker,0,Defender
1,PvP – SECURE AREA,KANAL,1114831601,0,0,68,SWAT-ASH,Assault Rifles,Pistols,Unranked,...,0,3,112,SAS-THATCHER,Assault Rifles,Pistols,Unranked,Attacker,1,Attacker
2,PvP – HOSTAGE,BORDER,51742361,1,0,24,SPETSNAZ-GLAZ,Marksman Rifles,Pistols,Unranked,...,1,0,51,SAS-THATCHER,Assault Rifles,Pistols,Unranked,Attacker,0,Defender
3,PvP – HOSTAGE,HEREFORD BASE,3412208309,1,1,118,SAT-HIBANA,Assault Rifles,Submachine Guns,Gold,...,1,1,100,SAS-THATCHER,Assault Rifles,Pistols,Gold,Attacker,0,Defender
4,PvP – SECURE AREA,CHALET,3494445129,1,0,64,SAT-HIBANA,Assault Rifles,Submachine Guns,Gold,...,0,0,57,SPETSNAZ-FUZE,Light Machine Guns,Pistols,Gold,Attacker,1,Attacker


In [13]:
match_data.shape

(3798, 465)
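
As a sanity check, the 465 columns follow directly from how `per_match` was constructed: three match-level fields, plus, for each of six rounds, seven round-level details and seven fields for each of five players on each of two teams.

```python
match_fields = 3      # gamemode, mapname, matchid
rounds = 6
round_details = 7     # objective, endreason, duration, bluerole, orangerole, winner, winrole
teams, players, player_fields = 2, 5, 7  # level, skill, kills, dead, op, primary, secondary

expected_columns = match_fields + rounds * (round_details + teams * players * player_fields)
print(expected_columns)  # 465
```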