# Understanding Influences on Win Rate in League of Legends
## Phase 2: Statistical Modelling

***
## Group Name: Group 52

## Student Names:

- Anton Angelo Carasig (s306344)
- Khang Nguyen (s3894597)
- Oliver Guzowski (s3897734)

***
## Table of Contents:

- [Introduction](#intro)
- [Statistical Modelling](#sm)
- [Critique & Limitations](#cl)
- [Summary & Conclusions](#sc)

***
## Introduction <a id='intro'></a>

### Phase 1 Summary

Phase 1 of our report concluded that the way players generate, maintain and grow a lead is what gives a team a higher chance of winning the match. Data visualization showed gold and experience to be the main contributing factors in building leads, while secondary components such as kills, especially `First Blood`, are used to inhibit the enemy from generating leads of their own. We also noted that common objectives such as `Dragon` and `Herald` are given lower priority within the first 10 minutes of the match despite the team-wide advantages that they offer.

On a similar note, our literature review exposed elements common to high ELO games that the dataset cannot capture directly, although the data does hint at some of them, such as the prominence of `Assists` or neutral objectives. In particular, a team's cooperation, priorities and ability to work together flexibly reflect the external "macro" aspects of the game, which cannot be calculated from a purely numerical viewpoint.

While these external factors may play an important part in winning, we still need to scrutinize and understand how the numerical aspects influence the chances of victory.

### Report Overview

This report aims to develop a multiple linear regression model that uncovers how various factors within a LoL match affect a team's chances of winning. The response variable that we originally wished to explore is `blueWins`. However, `blueWins` is a binary categorical feature, and multiple linear regression is designed for a continuous numeric response, so it is not suitable for modelling `blueWins` directly. To work around this issue, we have changed the response variable to `blueGoldDiff`, the difference in gold between the blue and red sides of the game.
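
One way to sanity-check this substitution is to compare `blueGoldDiff` between games that blue won and games that blue lost. The sketch below uses a small hand-made DataFrame (the values are invented purely for illustration and are not taken from our dataset), assuming only that the two columns `blueWins` and `blueGoldDiff` exist.

```python
import pandas as pd

# Hypothetical toy data: six made-up matches, NOT rows from ranked_games.csv.
toy = pd.DataFrame({
    "blueWins":     [1, 0, 1, 0, 1, 0],
    "blueGoldDiff": [1200, -800, 450, -1500, 300, 200],
})

# A binary response takes only two distinct values, which is why it is
# a poor fit for ordinary least squares.
n_levels = toy["blueWins"].nunique()

# Mean gold difference for blue losses (0) and blue wins (1).
mean_diff_by_outcome = toy.groupby("blueWins")["blueGoldDiff"].mean()

print(f"Distinct values of blueWins: {n_levels}")
print(mean_diff_by_outcome)
```

If `blueGoldDiff` is a sensible proxy for the match outcome, the mean for wins should sit clearly above the mean for losses when the same comparison is run on the real data.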

### Overview of Methodology

Multiple Linear Regression (MLR) will be used in this report. This technique lets us model our target variable `blueGoldDiff` as a linear function of several independent variables and quantify how each one influences the response. The end result is a linear model relating these independent variables to the response, giving a clearer picture of which aspects of a LoL match should be prioritized to maximize the chances of winning.
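
For reference, the general form of an MLR model with `blueGoldDiff` as the response can be written as follows. Here $x_1, \dots, x_p$ are placeholders for whichever independent variables end up in the model (the exact set is not fixed yet), $\beta_0, \dots, \beta_p$ are the coefficients to be estimated, and $\varepsilon$ is the error term.

```latex
\text{blueGoldDiff} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```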

***
## Statistical Modelling <a id='sm'></a>

### Model Overview

Overview of full model, including the variables and terms you are using in your regression model.

*The structure of our final dataset has not been finalized yet.*

#### Module Imports

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import patsy

import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None) 

%matplotlib inline 
%config InlineBackend.figure_format = 'retina'
plt.style.use("ggplot")

df = pd.read_csv('ranked_games.csv')

### Model Fitting

Details of assumptions check, model selection, plots of residuals, and technical analysis of regression results.

**NOTE:** The second half of [this](https://github.com/vaksakalli/stats_tutorials/blob/master/Regression_Case_Study1_web.ipynb) regression case study ("Statistical Modeling and Performance Evaluation" Section) will be **very helpful** for this Model Fitting section.
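
The feature-selection code in this section expects three objects to exist already: `data_encoded`, `formula_string_encoded` and `model_full_fitted`. A minimal sketch of how they could be created is given here, assuming synthetic stand-in data; the predictor names `blueKills`, `blueAssists` and `blueDragons` are placeholders chosen for illustration and may not match our final columns.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (randomly generated, NOT our real dataset).
rng = np.random.default_rng(0)
n = 200
data_encoded = pd.DataFrame({
    "blueKills": rng.integers(0, 15, size=n),
    "blueAssists": rng.integers(0, 20, size=n),
    "blueDragons": rng.integers(0, 2, size=n),
})
# Made-up relationship, used only so that the regression has something to fit.
data_encoded["blueGoldDiff"] = (
    300 * data_encoded["blueKills"]
    + 50 * data_encoded["blueAssists"]
    + rng.normal(0, 500, size=n)
)

# Build "response ~ x1 + x2 + ..." from every column except the response.
predictors = [c for c in data_encoded.columns if c != "blueGoldDiff"]
formula_string_encoded = "blueGoldDiff ~ " + " + ".join(predictors)

# Fit the full model with ordinary least squares.
model_full_fitted = smf.ols(formula=formula_string_encoded, data=data_encoded).fit()
print(formula_string_encoded)
print(model_full_fitted.params)
```

Building the formula string from the column list keeps it in sync with the DataFrame, so adding or dropping a predictor only requires changing the data.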

#### Feature Selection

You can use the code below to perform backward feature selection using p-values ([credit](https://github.com/vaksakalli/stats_tutorials/blob/master/Regression_Case_Study1_web.ipynb)).

In [None]:
## create the patsy model description from formula
patsy_description = patsy.ModelDesc.from_formula(formula_string_encoded)

# initialize feature-selected fit to full model
linreg_fit = model_full_fitted

# do backwards elimination using p-values
p_val_cutoff = 0.05

## WARNING 1: The code below assumes that the Intercept term is present in the model.
## WARNING 2: It will work only with main effects and two-way interactions, if any.

print('\nPerforming backwards feature selection using p-values:')

while True:

    # uncomment the line below if you would like to see the regression summary
    # in each step:
    ### print(linreg_fit.summary())

    pval_series = linreg_fit.pvalues.drop(labels='Intercept')
    pval_series = pval_series.sort_values(ascending=False)
    term = pval_series.index[0]
    pval = pval_series.iloc[0]  # use .iloc for positional access (plain [0] is deprecated on a labelled Series)
    if (pval < p_val_cutoff):
        break
    term_components = term.split(':')
    print(f'\nRemoving term "{term}" with p-value {pval:.4}')
    if (len(term_components) == 1): ## this is a main effect term
        patsy_description.rhs_termlist.remove(patsy.Term([patsy.EvalFactor(term_components[0])]))    
    else: ## this is an interaction term
        patsy_description.rhs_termlist.remove(patsy.Term([patsy.EvalFactor(term_components[0]), 
                                                        patsy.EvalFactor(term_components[1])]))    
        
    linreg_fit = smf.ols(formula=patsy_description, data=data_encoded).fit()
    
###
## this is the clean fit after backwards elimination
model_reduced_fitted = smf.ols(formula = patsy_description, data = data_encoded).fit()
###
    
#########
print("\n***")
print(model_reduced_fitted.summary())
print("***")
print(f"Regression number of terms: {len(model_reduced_fitted.model.exog_names)}")
print(f"Regression F-distribution p-value: {model_reduced_fitted.f_pvalue:.4f}")
print(f"Regression R-squared: {model_reduced_fitted.rsquared:.4f}")
print(f"Regression Adjusted R-squared: {model_reduced_fitted.rsquared_adj:.4f}")
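
Once a reduced model is available, its residuals can be checked against the usual regression assumptions. The sketch below is one possible approach, run on synthetic stand-in data (the variables `x`, `y` and the fitted object `fit` are illustrative only): it applies the Breusch-Pagan test for constant residual variance and the Jarque-Bera test for normality of residuals, both from `statsmodels`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import jarque_bera

# Synthetic stand-in data (randomly generated, NOT our real dataset).
rng = np.random.default_rng(1)
toy = pd.DataFrame({"x": rng.normal(size=300)})
toy["y"] = 2.0 + 3.0 * toy["x"] + rng.normal(size=300)

fit = smf.ols("y ~ x", data=toy).fit()
residuals = fit.resid

# Breusch-Pagan: a small p-value suggests non-constant residual variance.
bp_lm, bp_lm_pvalue, bp_f, bp_f_pvalue = het_breuschpagan(residuals, fit.model.exog)

# Jarque-Bera: a small p-value suggests the residuals are not normal.
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(residuals)

print(f"Breusch-Pagan p-value: {bp_lm_pvalue:.4f}")
print(f"Jarque-Bera p-value:   {jb_pvalue:.4f}")
```

In the notebook, the same two calls could be applied to `model_reduced_fitted.resid` and `model_reduced_fitted.model.exog`, alongside the residual plots.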

***
## Critique & Limitations <a id='cl'></a>

Critique & Limitations of your approach: strengths and weaknesses in detail.

***
## Summary & Conclusions <a id='sc'></a>

### Project Summary

A comprehensive summary of your entire project (both Phase 1 and Phase 2). That is, what exactly did you do in your project? (Example: I first cleaned the data in such and such ways. And then I applied multiple linear regression techniques in such and such ways. etc).

### Summary of Findings

A comprehensive summary of your findings. That is, what exactly did you find about your particular problem?

### Conclusions

Your detailed conclusions as they relate to your goals and objectives.