# League of Legends Positions Impact Analysis

**Name(s)**: Ahmed Mostafa and Ethan Vo

**Website Link**: (your website link)

### Which player position, when achieving more kills than their counterpart on the opposing team, has the greatest impact on boosting the overall win rate?




## Code

In [10]:
import pandas as pd
import numpy as np
import os

import plotly.express as px
pd.options.plotting.backend = 'plotly'

### Cleaning and EDA

We loaded only the columns of the League of Legends DataFrame that are relevant to our question, chiefly ```gameid```, ```side```, ```position```, ```kills```, and ```result```. We cleaned the data by removing the team-level summary rows, which are the rows with the value "team" in the ```position``` column. We then added a new column, ```has_more_kills```, that marks whether each player had more kills than their counterpart in the same position on the opposing team.

In [11]:
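# Load only the columns needed for the analysis
lol_data = pd.read_csv('lol_data.csv', usecols=["gameid", "datacompleteness", "side", "position", "kills", "teamkills", "result"])
# Drop team-level summary rows so that each remaining row is one player
lol_data = lol_data.query("position != 'team'")
# Sort both sides the same way so that row i of `red` faces row i of `blue`;
# .copy() avoids assigning a new column to a view of lol_data
red = lol_data.sort_values(by=["gameid", "position"]).query("side == 'Red'").copy()
blue = lol_data.sort_values(by=["gameid", "position"]).query("side == 'Blue'").copy()
red["has_more_kills"] = np.array(red['kills']) > np.array(blue['kills'])
blue["has_more_kills"] = np.array(red['kills']) < np.array(blue['kills'])
# Recombine the two sides into a single player-level table
col_added = red.merge(blue, how='outer').sort_values(by=["gameid", "position"])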
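
The comparison above depends on the Red and Blue rows lining up one-to-one after sorting. As a hedged alternative, the sketch below pairs each player with their lane opponent through an explicit self-merge on `gameid` and `position`. The toy DataFrame and the helper name `add_has_more_kills` are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd

def add_has_more_kills(players: pd.DataFrame) -> pd.DataFrame:
    """Flag each player row with whether they out-killed their lane opponent."""
    # Opponent view: same game and position, with side and kills renamed
    opponents = players[["gameid", "position", "side", "kills"]].rename(
        columns={"side": "opp_side", "kills": "opp_kills"}
    )
    paired = players.merge(opponents, on=["gameid", "position"])
    # Each player matches themselves and their opponent; keep only the opponent
    paired = paired[paired["side"] != paired["opp_side"]].copy()
    paired["has_more_kills"] = paired["kills"] > paired["opp_kills"]
    return paired.drop(columns=["opp_side", "opp_kills"]).reset_index(drop=True)

# Toy data (hypothetical): one game, two positions per side
toy = pd.DataFrame({
    "gameid": ["g1"] * 4,
    "side": ["Blue", "Blue", "Red", "Red"],
    "position": ["top", "sup", "top", "sup"],
    "kills": [3, 0, 1, 0],
    "result": [1, 1, 0, 0],
})
flagged = add_has_more_kills(toy)
print(flagged)
```

With this approach a tie (as in the toy `sup` matchup) leaves `has_more_kills` as `False` for both players, matching the strict comparisons used in the cell above.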

### Univariate Charts

To get a better feel for the data, we produced an interactive pie chart using ```plotly``` that shows the percentage of players who had more kills and won versus players who had more kills but lost. As the chart shows, 79.4% of the players who had more kills than their counterpart won. This suggests what to expect from the per-position proportions: having more kills than your counterpart should have a large impact on your chances of winning.

In [12]:
more_kills_won = (col_added
                  .query("result == 1 and has_more_kills == True")
                  .shape[0] / 
                  col_added
                  .query("has_more_kills == True")
                  .shape[0]
                 )
more_kills_lost = 1 - more_kills_won
data = {'rate': [more_kills_won, more_kills_lost],
        'Label': ['Had more kills and won', 'Had more kills but lost']}
df = pd.DataFrame(data)
fig = px.pie(df, values='rate', names='Label', title='Players winning rate when having more kills')
fig.show()
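
The share plotted above is the win rate among players who had more kills than their counterpart. As a minimal sketch, the same quantity can be read as the mean of `result` over those rows. The toy values below are made up for illustration and do not reproduce the 79.4% figure.

```python
import pandas as pd

# Toy data (hypothetical): six players with the flag and game result
toy = pd.DataFrame({
    "has_more_kills": [True, True, True, True, False, False],
    "result":         [1,    1,    1,    0,    0,     1],
})

# Keep only players who out-killed their counterpart
had_more = toy[toy["has_more_kills"]]

# result is 0/1, so its mean is the share of those players who won
more_kills_won = had_more["result"].mean()
more_kills_lost = 1 - more_kills_won
print(more_kills_won, more_kills_lost)
```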

This histogram shows the distribution of kill counts per position for winning vs losing players. ...

In [13]:
fig = (px.histogram(lol_data,
                    x="kills",
                    facet_col= "position",
                    facet_row = "result",
                    title="Kills Distribution Per Position",
                    histnorm="probability density")
      )
fig.show()

### Bivariate Chart

In [14]:
prop_per_position_more = (col_added
                     .query("has_more_kills == True and result == 1")
                     .groupby("position")
                     .size() / 
                     col_added
                     .groupby('position')
                     .size()
                    )
prop_per_position_more.name = "Proportions"
fig2 = px.bar(prop_per_position_more, title="Win Rate of Position with more Kills")
fig2.update_layout(
    legend=dict(title='Variable'),  # Add legend title
    yaxis_title='Win Rate', # Add y-axis label
)
fig2.show()

In [15]:
prop_per_position_less = (col_added
                          .query("has_more_kills == False and result == 1")
                          .groupby("position")
                          .size() / 
                          col_added
                          .groupby('position')
                          .size()
                         )
# Per-position difference between the "won with more kills" and
# "won without more kills" proportions
observed_stat = (prop_per_position_more - prop_per_position_less)
observed_stat.name = "Proportions"
fig2 = px.bar(observed_stat, title="Increase in Win Rate of Position With More Kills vs Less Kills")
fig2.update_layout(
    legend=dict(title='Variable'),  # Add legend title
    yaxis_title='Difference in Proportions', # Add y-axis label
)
fig2.show()
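
Both bar charts above divide by every player in a position, so each bar is a joint proportion (won and had more kills) rather than a win rate conditional on having more kills. The sketch below contrasts the two quantities on made-up toy data. The variable names `joint` and `conditional` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Toy data (hypothetical): four support rows and four mid rows
toy = pd.DataFrame({
    "position":       ["sup"] * 4 + ["mid"] * 4,
    "has_more_kills": [True, False, False, False, True, True, False, False],
    "result":         [1,    1,     0,     0,     1,    1,    0,     0],
})

# Joint proportion: share of ALL players in a position who won AND had more kills
won_with_more = toy["has_more_kills"] & (toy["result"] == 1)
joint = won_with_more.groupby(toy["position"]).mean()

# Conditional win rate: share of winners AMONG players who had more kills
conditional = toy[toy["has_more_kills"]].groupby("position")["result"].mean()

print(joint)
print(conditional)
```

The joint proportion can never exceed the share of players in the position who had more kills, which is worth keeping in mind when comparing positions that rarely out-kill their counterpart.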

### Assessment of Missingness

The data used to answer the question was complete: none of the columns we used had missing values.
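
A claim like this can be backed by counting missing values per column. The sketch below runs that check on a small stand-in DataFrame that deliberately contains one missing value. On the real data the same calls would presumably be applied to `lol_data`; the names `missing_per_column` and `is_complete` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical) with one missing kill count
toy = pd.DataFrame({
    "gameid": ["g1", "g1", "g2"],
    "position": ["top", "sup", "top"],
    "kills": [3, 0, np.nan],
})

# Number of missing entries in each column
missing_per_column = toy.isna().sum()

# True only if no column has any missing entry
is_complete = bool((missing_per_column == 0).all())
print(missing_per_column)
print(is_complete)
```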


### Hypothesis Testing

- **Null Hypothesis**: The proportion of **support** players who win and have more kills than their counterpart is equal to the proportion of support players who win and have fewer kills.
- **Alternative Hypothesis**: The proportion of support players who win and have more kills than their counterpart is less than the proportion of support players who win and have fewer kills.
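
Under the null hypothesis, each winning support is equally likely to fall in the "more kills" or the "fewer kills" group. The sketch below simulates that null model with a binomial draw, which is equivalent to a two-category multinomial. The function name `simulate_null_differences`, the seed, the game count of 1000, and the observed value of -0.2 are all made-up assumptions for illustration.

```python
import numpy as np

def simulate_null_differences(n_games, n_sims, seed=0):
    """Simulate differences in proportions under a 50/50 null split."""
    rng = np.random.default_rng(seed)
    # Winning supports with more kills, one winning support per game
    more = rng.binomial(n_games, 0.5, size=n_sims)
    less = n_games - more
    # Divide by all supports (two per game) so the two proportions sum to 0.5
    return (more - less) / (2 * n_games)

diffs = simulate_null_differences(n_games=1000, n_sims=10_000)

observed = -0.2  # hypothetical observed difference, not taken from the data

# One-sided p-value: share of simulated differences at or below the observed one
p_value = np.mean(diffs <= observed)
print(p_value)
```

A p-value computed this way is a proportion between 0 and 1, so it can be compared directly against a significance level.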

In [16]:
n_games = lol_data['gameid'].nunique()  # one winning support per game
samples = (np.random.multinomial(n_games,
                                pvals=[0.5,0.5],
                                size = 100000)
           /n_games # Converts the samples to proportions
           /2 # divides in half since the overall population (winning support) is 0.5,
              # so proportions should sum up to 0.5
          )
# p-value: share of simulated differences at or below the observed difference
pval = np.mean((samples[:, 0] - samples[:, 1]) <= observed_stat["sup"])

In [17]:
difference_in_proportions = pd.Series(samples[:, 0] - samples[:, 1])
difference_in_proportions.name = "Difference in Proportions"

In [18]:
fig = px.histogram(
    difference_in_proportions,
    histnorm='probability', 
    title='Empirical Distribution of the Differences in Proportions Between \
Winning Support with Higher Kills vs with Lower Kills')
fig.add_vline(x = observed_stat["sup"], line_color='red')
print("P-value: " + str(pval))
fig.show()

P-value: 0.0
