# Model 3: Who is going to be the Starplayer of the WinningTeam in Brawlball matches?

The goal of this task is to predict the "StarPlayerBrawler" of the WinningTeam in the mode "BrawlBall". 

- 1. We need to read out the Brawlers of the winning team, since the star player can only come from the winning team.
- 2. The attributes "WinnerTeam" and "map" need to be encoded numerically to perform a classification. The idea is that "WinnerTeam" and "map" determine the target attribute "StarPlayerBrawler".
- 3. We will try to predict the "StarPlayerBrawler" of the match with a DecisionTreeClassifier and a RandomForestClassifier and interpret the result.

In [2]:
import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import cross_val_score
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from operator import itemgetter

# Reading the Dataset

In [3]:
df = pd.read_csv("rawdata.csv")

# Appending Winners

As mentioned, Brawl Ball is a 3 vs 3 mode. Since the "StarPlayerBrawler" can only come from the winning team, we only need to consider the winners.
As a first step, we collect the Brawler names of the winning team of each row in a list:

In [4]:
WinnerTeam = []

for index, row in df.iterrows():
    # Rows with result == 0 are skipped (presumably matches without a winner)
    if row["result"] == 0:
        continue

    # Brawlers 1-3 form team 1, Brawlers 4-6 form the other team
    if row["WinningTeam"] == 1:
        winners = [row["Brawler1Name"], row["Brawler2Name"], row["Brawler3Name"]]
    else:
        winners = [row["Brawler4Name"], row["Brawler5Name"], row["Brawler6Name"]]

    # Sort the names so the same team always yields the same key
    WinnerTeam.append(sorted(winners))

print(WinnerTeam[:10])

# Concatenate the three sorted names into a single team identifier
combinedWinnerTeam = ["".join(team) for team in WinnerTeam]
    
print(combinedWinnerTeam[:10])

[['BEA', 'GENE', 'PIPER'], ['BEA', 'GENE', 'PIPER'], ['BEA', 'GENE', 'PIPER'], ['BEA', 'GENE', 'PIPER'], ['BEA', 'GENE', 'PIPER'], ['BEA', 'GENE', 'PIPER'], ['BEA', 'GENE', 'PIPER'], ['BEA', 'GENE', 'PIPER'], ['BEA', 'GENE', 'PIPER'], ['GENE', 'SANDY', 'SPIKE']]
['BEAGENEPIPER', 'BEAGENEPIPER', 'BEAGENEPIPER', 'BEAGENEPIPER', 'BEAGENEPIPER', 'BEAGENEPIPER', 'BEAGENEPIPER', 'BEAGENEPIPER', 'BEAGENEPIPER', 'GENESANDYSPIKE']
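
The same team keys can also be built without an explicit row loop. The following is a minimal sketch on a made-up three-row frame that reuses the column names of the dataset. The helper name `winning_team_key` and the `|` separator are assumptions for illustration and are not part of the notebook. The separator only serves to keep the individual names recognisable in the key.

```python
import numpy as np
import pandas as pd

# Toy data with the same column layout as the dataset (values are made up)
toy = pd.DataFrame({
    "WinningTeam": [1, 2, 1],
    "result": [1, 1, 0],  # the last row has result == 0 and is skipped
    "Brawler1Name": ["PIPER", "BEA", "COLT"],
    "Brawler2Name": ["GENE", "PAM", "BO"],
    "Brawler3Name": ["BEA", "EMZ", "MAX"],
    "Brawler4Name": ["BROCK", "SPIKE", "NITA"],
    "Brawler5Name": ["POCO", "SANDY", "RICO"],
    "Brawler6Name": ["TARA", "GENE", "CROW"],
})


def winning_team_key(frame, sep="|"):
    """Return one sorted, joined team key per row with result != 0 (hypothetical helper)."""
    decided = frame[frame["result"] != 0]
    team1 = decided[["Brawler1Name", "Brawler2Name", "Brawler3Name"]].to_numpy()
    team2 = decided[["Brawler4Name", "Brawler5Name", "Brawler6Name"]].to_numpy()
    # Column vector of booleans, broadcast across the three name columns
    first_won = (decided["WinningTeam"] == 1).to_numpy()[:, None]
    winners = np.where(first_won, team1, team2)
    keys = [sep.join(sorted(row)) for row in winners]
    # Keep the original row index so the keys can be aligned with other columns
    return pd.Series(keys, index=decided.index)


keys = winning_team_key(toy)
print(keys.tolist())
```

Because the returned Series keeps the index of the original rows, it can be joined back to "map" and "StarPlayerBrawler" without relying on row positions.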


# Creating a binary classification table

In order to predict the "StarPlayerBrawler", we need to transform the input data to a numerical type, since the sklearn estimators cannot work with nominal (string) input values directly. This is done by converting the input data to a binary (one-hot encoded) table: each team and each map gets its own column, which holds 1 (true) if the row belongs to that team or map and 0 (false) otherwise.

In [5]:
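# Select the same rows (result != 0) that were used to build the team keys,
# so that each team lines up with the map and star player of its own match
decided = df["result"] != 0
dfFinal = pd.DataFrame({
    "WinnerTeam": combinedWinnerTeam,
    "map": df.loc[decided, "map"].to_numpy(),
    "StarPlayerBrawler": df.loc[decided, "StarPlayerBrawler"].to_numpy(),
})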

print(dfFinal.head(10))
#print(dfFinal["WinnerTeam"].nunique())
#print(dfFinal["map"].nunique())

target = dfFinal.loc[:,"StarPlayerBrawler"].copy()
dfFinal = pd.get_dummies(dfFinal.loc[:,["WinnerTeam", "map"]], dtype = "uint8")

dfFinal["StarPlayerBrawler"] = target

dfFinal[["WinnerTeam_BEAGENEPIPER", "map_Backyard Bowl", "StarPlayerBrawler"]].head(10)

       WinnerTeam             map StarPlayerBrawler
0    BEAGENEPIPER   Backyard Bowl               BEA
1    BEAGENEPIPER   Backyard Bowl               BEA
2    BEAGENEPIPER   Backyard Bowl               BEA
3    BEAGENEPIPER   Backyard Bowl             PIPER
4    BEAGENEPIPER   Backyard Bowl             PIPER
5    BEAGENEPIPER   Backyard Bowl              GENE
6    BEAGENEPIPER   Backyard Bowl             PIPER
7    BEAGENEPIPER   Backyard Bowl             PIPER
8    BEAGENEPIPER   Backyard Bowl              GENE
9  GENESANDYSPIKE  Triple Dribble             SANDY


   WinnerTeam_BEAGENEPIPER  map_Backyard Bowl StarPlayerBrawler
0                        1                  1               BEA
1                        1                  1               BEA
2                        1                  1               BEA
3                        1                  1             PIPER
4                        1                  1             PIPER
5                        1                  1              GENE
6                        1                  1             PIPER
7                        1                  1             PIPER
8                        1                  1              GENE
9                        0                  0             SANDY


The output above shows only a small subset of the whole table. The full table consists of 1113 rows and 513 columns, since every team constellation and every map needs its own column in this binary table. It will be used to create a model in the next part.
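
To make the column count easier to follow, here is a minimal sketch of the same encoding on a made-up three-row frame. The frame `toy` and its values are assumptions for illustration. The point is that `pd.get_dummies` creates one indicator column per distinct team and per distinct map, so the width of the table grows with the number of distinct values.

```python
import pandas as pd

# Toy data (made up), using the same three columns as dfFinal
toy = pd.DataFrame({
    "WinnerTeam": ["BEAGENEPIPER", "BEAGENEPIPER", "GENESANDYSPIKE"],
    "map": ["Backyard Bowl", "Backyard Bowl", "Triple Dribble"],
    "StarPlayerBrawler": ["BEA", "PIPER", "SANDY"],
})

# One-hot encode the two input attributes and re-attach the target
encoded = pd.get_dummies(toy[["WinnerTeam", "map"]], dtype="uint8")
encoded["StarPlayerBrawler"] = toy["StarPlayerBrawler"]

# Expected width: distinct teams + distinct maps + 1 target column
expected_columns = toy["WinnerTeam"].nunique() + toy["map"].nunique() + 1

print(encoded.shape)
print(encoded.columns.tolist())
```

In this toy case there are two teams and two maps, so the encoded frame has five columns including the target.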

# Model Creation 

In [6]:
x_features = dfFinal.loc[:,[x for x in dfFinal.columns if x.startswith("WinnerTeam") or x.startswith("map")]]
winners = dfFinal.loc[:,["StarPlayerBrawler"]].values.reshape(-1,)

In [7]:
tree1 = tree.DecisionTreeClassifier()
tree1 = tree1.fit(x_features, winners)

In [8]:
tree2 = RandomForestClassifier()
tree2 = tree2.fit(x_features, winners)

In [9]:
scores = cross_val_score(tree1, x_features, winners, scoring='accuracy')
np.mean(scores)



0.12132670787379307

In [10]:
scores = cross_val_score(tree2, x_features, winners, scoring='accuracy')
np.mean(scores)



0.1213347877025007

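The cross-validated accuracy is around 12.1% for both the DecisionTreeClassifier and the RandomForestClassifier, which is rather bad: it is only marginally better than always predicting the most frequent star player (PAM, 128 of 1113 matches, about 11.5%).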
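
One way to put such an accuracy into perspective is to compare it with a trivial baseline that always predicts the most frequent class. The sketch below uses sklearn's `DummyClassifier` on made-up labels. The arrays `X_toy` and `y_toy` are assumptions for illustration, and the last lines only compute the majority share from the star-player counts reported in this notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Made-up labels: 50 / 30 / 20 matches for three star players
y_toy = np.array(["PAM"] * 50 + ["BEA"] * 30 + ["BROCK"] * 20)
# The dummy model ignores the features, so a constant column is enough
X_toy = np.zeros((len(y_toy), 1))

# Always predict the most frequent class seen during training
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(baseline, X_toy, y_toy, cv=5, scoring="accuracy")
baseline_accuracy = baseline_scores.mean()
print(baseline_accuracy)

# Majority-class share for the counts in this notebook (PAM: 128 of 1113 rows)
majority_share = 128 / 1113
print(round(majority_share, 3))
```

Applied to the real data, the same call with `x_features` and `winners` would give the baseline that the two tree models have to beat.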

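The cross-validation also emitted a warning about the class sizes in the target (not shown in the output above). To analyse the cause of this warning, we need to look at the number of star-player occurrences for each Brawler: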

In [14]:
print(dfFinal.loc[:,"StarPlayerBrawler"].value_counts())

PAM         128
BEA         124
BROCK        87
EMZ          61
SANDY        53
PIPER        51
MORTIS       46
PENNY        45
SPROUT       44
COLT         44
POCO         42
BIBI         39
GALE         32
JACKY        28
BO           27
MAX          27
MR. P        25
CROW         25
GENE         24
SPIKE        21
FRANK        19
CARL         17
NITA         15
JESSIE       15
TARA         15
RICO         15
8-BIT        10
EL PRIMO      8
DARRYL        7
LEON          5
BULL          5
ROSA          4
DYNAMIKE      3
SHELLY        2
Name: StarPlayerBrawler, dtype: int64


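Three Brawlers (classes in y), namely ROSA, DYNAMIKE and SHELLY, have fewer than 5 occurrences, which is less than n_splits=5, the default number of folds used by cross_val_score. These classes therefore cannot be represented in every fold. As a possible optimization, one could require a minimum number of occurrences per Brawler (e.g. 10). This could help to achieve better performance and accuracy.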
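
A minimum-occurrence filter of this kind could look like the following sketch. The helper name `drop_rare_classes`, the toy frame and the `feature` column are assumptions for illustration. Only the threshold of 10 is taken from the text above.

```python
import pandas as pd


def drop_rare_classes(frame, target="StarPlayerBrawler", min_count=10):
    """Keep only rows whose target class occurs at least min_count times (hypothetical helper)."""
    counts = frame[target].value_counts()
    frequent = counts[counts >= min_count].index
    return frame[frame[target].isin(frequent)].reset_index(drop=True)


# Toy data (made up): two frequent and two rare star players
toy = pd.DataFrame({
    "StarPlayerBrawler": ["PAM"] * 12 + ["BEA"] * 10 + ["ROSA"] * 4 + ["SHELLY"] * 2,
})
toy["feature"] = range(len(toy))

filtered = drop_rare_classes(toy, min_count=10)
print(filtered["StarPlayerBrawler"].value_counts())
```

With the counts listed above, a threshold of 10 would drop seven Brawlers and 34 of the 1113 rows. Scores obtained after such a filter are not directly comparable to the earlier ones, because the hardest classes are no longer part of the evaluation.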