# Overview

As a reminder, the purpose of this project is to be able to predict whether a team will win given certain features. In addition, the project should also indicate which factors were most responsible in determining which team won the match. 

For the statistical section of this project, bootstrapping will be used to determine whether there is a difference between the selected features of the teams that won versus the teams that lost. The features to be analyzed are as follows: 

1. Gamelength
2. Time of first tower taken
3. Number of each type of dragon
    - Earth
    - Ocean
    - Air
    - Fire 
5. Mean gold of winning team by role
    - Top
    - Jungle
    - Mid
    - ADC 
    - Support

In [76]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')

In [92]:
league = pd.read_csv("../Dataset/LeagueofLegends.csv")

nans = lambda df: df[df.isnull().any(axis=1)]
nans(league) # Can't handle these kinds of NaN values, will instead drop 
league.dropna(inplace=True)
league.reset_index(drop=True, inplace=True)

In [78]:
league_wrangled = pd.read_csv("../Dataset/Wrangled_LeagueofLegends.csv")

In [93]:
league_wrangled.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7582 entries, 0 to 7581
Columns: 5720 entries, bResult to rbot_base
dtypes: float64(43), int64(5677)
memory usage: 330.9 MB


In [94]:
league.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7582 entries, 0 to 7581
Data columns (total 57 columns):
League              7582 non-null object
Year                7582 non-null int64
Season              7582 non-null object
Type                7582 non-null object
blueTeamTag         7582 non-null object
bResult             7582 non-null int64
rResult             7582 non-null int64
redTeamTag          7582 non-null object
gamelength          7582 non-null int64
golddiff            7582 non-null object
goldblue            7582 non-null object
bKills              7582 non-null object
bTowers             7582 non-null object
bInhibs             7582 non-null object
bDragons            7582 non-null object
bBarons             7582 non-null object
bHeralds            7582 non-null object
goldred             7582 non-null object
rKills              7582 non-null object
rTowers             7582 non-null object
rInhibs             7582 non-null object
rDragons            7582 non-null ob

In [95]:
def calculate_p(blue, red, feature, N=10000): 
    bs_mean_diff = np.empty(N)
    
    mean_total   = np.mean(feature)
    blue_shifted = blue - np.mean(blue) + mean_total
    red_shifted  = red - np.mean(red) + mean_total 
    
    for i in range(N): 
        bs_blue = np.random.choice(blue_shifted, size=len(league))
        bs_red = np.random.choice(red_shifted, size=len(league))
        bs_mean_diff[i] = np.mean(bs_blue) - np.mean(bs_red)
        
    emperical_mean_diff = np.mean(blue) - np.mean(red)
    p = np.sum(bs_mean_diff >= emperical_mean_diff) / len(gamelength_mean_diff)
    return p

## Gamelength

Null hypothesis: There is no difference in average game legnth between games where the Red Team won versus games where the Blue                  Team won.

Alternative hypothesis: There is a significant difference between the length of games where the Red Team won versus games where                         the Blue Team won. 

Testing this hypothesis will help determine if game length is a determining factor in which team is more likely to win. A possible scenario where this could be true is if the blue team is more likely to pick champions that outscale whereas the red team is more likely to pick champions that are stronger early in the game. 

In [96]:
blue_win_gamelength = league.gamelength[league.bResult == 1]
red_win_gamelength  = league.gamelength[league.bResult == 0]
gamelength = league.gamelength

p = calculate_p(blue_win_gamelength, 
                red_win_gamelength, 
                gamelength)

print(f"p-value: {p}")

p-value: 1.0


## Time first tower taken 

In [97]:
import ast 

def find_first_column(col_1, col_2, col_3): 
    if col_1 == 1:
        tower_list = ast.literal_eval(col_2)
        return tower_list[0][0]

    else: 
        tower_list = ast.literal_eval(col_3)
        return tower_list[0][0]

    
league['first_tower'] = league.apply(lambda x: find_first_column(x.bResult, x.bTowers, x.rTowers), axis=1)

In [98]:
blue_first_tower = league.first_tower[league.bResult == 1]
red_first_tower = league.first_tower[league.rResult == 0]
first_tower = league.first_tower

p = calculate_p(blue_first_tower,
                red_first_tower, 
                first_tower)

print(f"p-value: {p}")

p-value: 0.4908


## Number of Each Type of Dragon
    
Null hypothesis: There is no difference in the number of each dragon taken when the Blue Team won versus when the Red Team won. 

Alternative hypothesis: A significant difference exists between the number of each dragon taken when the Blue Team wins versus                         when the Red Team wins.


In [107]:
def find_winning_drag(col_1, col_2, col_3): 
    if col_1 == 1: 
        return col_2
    else:
        return col_3

In [100]:
league['winning_earth'] = league_wrangled.apply(lambda x: find_winning_drag(x.bResult, x.bEarth_drag, x.rEarth_drag), axis=1)
league['winning_ocean'] = league_wrangled.apply(lambda x: find_winning_drag(x.bResult, x.bWater_drag, x.rWater_drag), axis=1)
league['winning_air']   = league_wrangled.apply(lambda x: find_winning_drag(x.bResult, x.bAir_drag, x.rAir_drag), axis=1)
league['winning_fire']  = league_wrangled.apply(lambda x: find_winning_drag(x.bResult, x.bFire_drag, x.rFire_drag), axis=1)
league['winning_elder'] = league_wrangled.apply(lambda x: find_winning_drag(x.bResult, x.bElder_drag, x.rElder_drag), axis=1)

In [102]:
blue_earth  = league.winning_earth[league.bResult == 1]
red_earth   = league.winning_earth[league.bResult == 0]
earth_drags = league.winning_earth

p = calculate_p(blue_earth, 
                red_earth, 
                earth_drags)

print(f"p-value: {p}")

p-value: 0.9999


In [104]:
blue_ocean  = league.winning_ocean[league.bResult == 1]
red_ocean   = league.winning_ocean[league.bResult == 0]
ocean_drags = league.winning_ocean

p = calculate_p(blue_ocean, 
                red_ocean, 
                ocean_drags)

print(f"p-value: {p}")

p-value: 0.9999


In [105]:
blue_air  = league.winning_air[league.bResult == 1]
red_air   = league.winning_air[league.bResult == 0]
air_drags = league.winning_air

p = calculate_p(blue_air, 
                red_air, 
                air_drags)

print(f"p-value: {p}")

p-value: 1.0


In [103]:
blue_fire  = league.winning_fire[league.bResult == 1]
red_fire   = league.winning_fire[league.bResult == 0]
fire_drags = league.winning_fire

p = calculate_p(blue_fire, 
                red_fire, 
                fire_drags)

print(f"p-value: {p}")

p-value: 0.0


In [106]:
blue_elder  = league.winning_elder[league.bResult == 1]
red_elder   = league.winning_elder[league.bResult == 0]
elder_drags = league.winning_elder

p = calculate_p(blue_elder, 
                red_elder, 
                elder_drags)

print(f"p-value: {p}")

p-value: 1.0


### Additional Analysis on Infernal Dragons

In [147]:
blue_infernal = league.winning_fire[league.bResult == 1]
red_infernal  = league.winning_fire[league.bResult == 0]
print("\033[1m" + "Infernal Dragon statistics by winning team:" + "\033[0m" )

print(f"Blue sum: {blue_infernal.sum()}")
print(f"Red sum:  {red_infernal.sum()}")
print()
print(f"Blue mean: {blue_infernal.mean()}")
print(f"Red mean:  {red_infernal.mean()}")
print()
print(f"Blue std: {blue_infernal.std()}")
print(f"Red std:  {red_infernal.std()}")

[1mInfernal Dragon statistics by winning team:[0m
Blue sum: 1570.0
Red sum:  516.0

Blue mean: 0.38042161376302397
Red mean:  0.14934876989869755

Blue std: 0.6769413952055907
Red std:  0.3979303851311427


## Mean Gold of Winning Team

Null hypothesis: There is no difference between which role is most impactful based on who wins the game. 

Alternative hypothesis: One role is more likely to be more impactful depending on the side of the team. For example, perhaps                           the top laner recieved more gold throughout the game. 


In [129]:
def find_winning_gold(col_1, col_2, col_3): 
    if col_1 == 1: 
        gold = ast.literal_eval(col_2)
        return np.mean(gold)
    else:
        gold = ast.literal_eval(col_3)
        return np.mean(gold)

In [130]:
league['top_gold']     = league.apply(lambda x: find_winning_gold(x.bResult, x.goldblueTop, x.goldredTop), axis=1)
league['jungle_gold']  = league.apply(lambda x: find_winning_gold(x.bResult, x.goldblueJungle, x.goldredJungle), axis=1)
league['mid_gold']     = league.apply(lambda x: find_winning_gold(x.bResult, x.goldblueMiddle, x.goldredMiddle), axis=1)
league['adc_gold']     = league.apply(lambda x: find_winning_gold(x.bResult, x.goldblueADC, x.goldredADC), axis=1)
league['support_gold'] = league.apply(lambda x: find_winning_gold(x.bResult, x.goldblueSupport, x.goldredSupport), axis=1)

In [131]:
blue_top_gold = league.top_gold[league.bResult == 1]
red_top_gold  = league.top_gold[league.bResult == 0]
top_gold      = league.top_gold

p = calculate_p(blue_top_gold, 
                red_top_gold, 
                top_gold)

print(f"p-value: {p}")

p-value: 1.0


In [132]:
blue_jungle_gold = league.jungle_gold[league.bResult == 1]
red_jungle_gold  = league.jungle_gold[league.bResult == 0]
jungle_gold      = league.jungle_gold

p = calculate_p(blue_jungle_gold, 
                red_jungle_gold, 
                jungle_gold)

print(f"p-value: {p}")

p-value: 1.0


In [133]:
blue_mid_gold = league.mid_gold[league.bResult == 1]
red_mid_gold  = league.mid_gold[league.bResult == 0]
mid_gold      = league.mid_gold

p = calculate_p(blue_mid_gold, 
                red_mid_gold, 
                mid_gold)

print(f"p-value: {p}")

p-value: 1.0


In [134]:
blue_adc_gold = league.adc_gold[league.bResult == 1]
red_adc_gold  = league.adc_gold[league.bResult == 0]
adc_gold      = league.adc_gold

p = calculate_p(blue_adc_gold, 
                red_adc_gold, 
                adc_gold)

print(f"p-value: {p}")

p-value: 1.0


In [135]:
blue_support_gold = league.support_gold[league.bResult == 1]
red_support_gold  = league.support_gold[league.bResult == 0]
support_gold      = league.support_gold

p = calculate_p(blue_support_gold, 
                red_support_gold, 
                support_gold)

print(f"p-value: {p}")

p-value: 1.0
