## League of Legends Project


### Three missions were assigned for this project:
1. Determine, based on in-game statistics of a single player, whether their team won or lost the match.
2. Identify the most important quantifiable variables that players should focus on to increase their chances of winning.
3. Determine the most "talented" players Ecamania could recruit for the next season.

Logically, the first step is to load the data and check its structure.

In [46]:
import pandas as pd

# Forward slash avoids the invalid "\g" escape sequence and works on every OS
LoL_players_file_path = 'data/game_players_stats.csv'
LoL_players_data = pd.read_csv(LoL_players_file_path) 
LoL_players_data.columns
LoL_players_data['game_id'].count()



374554
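A structural check can go beyond the row count. The sketch below uses a tiny hypothetical DataFrame (the values are invented and only a few of the real column names are reused) to show calls that could be applied to `LoL_players_data` in the same way.

```python
import pandas as pd

# Toy stand-in for the real CSV; values are invented for illustration only
toy = pd.DataFrame({
    "game_id": [1, 1, 2, 2],
    "role": ["Top", "Mid", "Top", "Support"],
    "gold_earned": [8530, 10125, 8565, 5442],
    "win": [0, 0, 1, 1],
})

n_rows, n_cols = toy.shape                  # number of rows and columns
column_names = list(toy.columns)            # column labels, in file order
dtypes = toy.dtypes.astype(str).to_dict()   # storage type of each column

print(n_rows, n_cols)
print(column_names)
print(dtypes)
```

In a notebook, only the last expression of a cell is displayed, so intermediate checks such as `.columns` need an explicit `print` to show up.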

First, we drop every row that contains at least one missing value.

In [47]:
LoL_players_data = LoL_players_data.dropna(axis=0)
LoL_players_data['game_id'].count()

374264
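Before dropping rows, it can help to see which columns hold the missing values. A minimal sketch on made-up data (the toy frame and its values are assumptions, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with two incomplete rows; values are invented
toy = pd.DataFrame({
    "game_id": [1, 2, 3, 4],
    "player_kills": [3.0, np.nan, 5.0, 1.0],
    "champion_name": ["Irelia", "Lux", None, "Aatrox"],
})

missing_per_column = toy.isna().sum()                    # missing count per column
rows_with_missing = int(toy.isna().any(axis=1).sum())    # rows that dropna() would remove

cleaned = toy.dropna(axis=0)
rows_removed = len(toy) - len(cleaned)

print(missing_per_column)
print(rows_with_missing, rows_removed)
```

If the missing values turn out to be concentrated in columns that are not used as features, `dropna(subset=[...])` is one way to keep more rows.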

In [48]:
# Separate the target (win) from the remaining columns
y = LoL_players_data['win']
LoL_players_data = LoL_players_data.drop(columns=['win'])

In [49]:
features = ['player_id','team_id','role','champion_name','team_kills','tower_kills','inhibitor_kills',
            'dragon_kills','herald_kills','baron_kills','player_kills','player_deaths',
            'player_assists','total_minions_killed','gold_earned','level','total_damage_dealt',
            'total_damage_dealt_to_champions','total_damage_taken','wards_placed',
            'largest_killing_spree','largest_multi_kill']
X = LoL_players_data[features]
print(X.shape)
X.head(5)


(374264, 22)


Unnamed: 0,player_id,team_id,role,champion_name,team_kills,tower_kills,inhibitor_kills,dragon_kills,herald_kills,baron_kills,...,player_assists,total_minions_killed,gold_earned,level,total_damage_dealt,total_damage_dealt_to_champions,total_damage_taken,wards_placed,largest_killing_spree,largest_multi_kill
0,0,0,Top,Irelia,7,3,0,0,0,0,...,1,179,8530,12,99007,7923,15326,8,0,1
1,1,1,Top,Vladimir,17,8,1,3,1,1,...,6,174,8565,14,100342,10857,16475,11,2,1
2,2,0,Bot,Kai'Sa,7,3,0,0,0,0,...,1,227,9613,12,116407,7011,5788,9,0,1
3,3,0,Support,Lux,7,3,0,0,0,0,...,2,19,5442,10,23555,4932,6151,25,0,0
4,4,1,Mid,Aatrox,17,8,1,3,1,1,...,4,188,10125,14,125022,10749,15481,10,3,2


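Split the dataset into training and validation sets. By default, `train_test_split` holds out 25% of the rows for validation.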

In [50]:
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)
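Rows from the same match share team-level statistics such as `team_kills`, so a row-wise random split can place teammates on both sides of the split. One alternative worth comparing is a group-aware split. The sketch below assumes that `game_id` identifies a match and runs on invented toy data, not on the real dataset.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 6 hypothetical matches, 2 player rows each (values invented)
toy = pd.DataFrame({
    "game_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "team_kills": [7, 7, 17, 17, 9, 9, 12, 12, 4, 4, 20, 20],
    "win": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
})
X_toy = toy[["team_kills"]]
y_toy = toy["win"]

# Hold out 25% of the matches (not 25% of the rows)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, valid_idx = next(splitter.split(X_toy, y_toy, groups=toy["game_id"]))

train_games = set(toy["game_id"].iloc[train_idx])
valid_games = set(toy["game_id"].iloc[valid_idx])
print(train_games & valid_games)  # empty set: no match appears on both sides
```

Comparing the validation accuracy under both splitting schemes would indicate how much the score depends on teammates being seen during training.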

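Define a helper that trains a random forest and returns its accuracy on the validation set, so that different preprocessing approaches can be compared.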

In [54]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Function for comparing different approaches
def score_dataset(X_train, X_valid, y_train, y_valid):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    preds = model.predict(X_valid)
    return accuracy_score(y_valid, preds)
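To illustrate how the helper is called, the sketch below repeats its definition and runs it on synthetic data from `make_classification`. The synthetic data and the names `X_demo`, `y_demo`, and `demo_accuracy` are assumptions used only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def score_dataset(X_train, X_valid, y_train, y_valid):
    """Fit a random forest on the training data and return validation accuracy."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    preds = model.predict(X_valid)
    return accuracy_score(y_valid, preds)

# Synthetic binary-classification data standing in for the match statistics
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
Xtr, Xva, ytr, yva = train_test_split(X_demo, y_demo, random_state=0)

demo_accuracy = score_dataset(Xtr, Xva, ytr, yva)
print(demo_accuracy)  # a fraction between 0 and 1
```

Fixing `random_state` in both the split and the model keeps the score reproducible, which matters when the only thing that should vary between runs is the preprocessing.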

One-hot encode the string-valued (categorical) columns so that the model receives only numerical values.

In [56]:
from sklearn.preprocessing import OneHotEncoder

# Get list of categorical variables
s = (X_train.dtypes == 'object')
object_cols = list(s[s].index)

print("Categorical variables:")
print(object_cols)

# Apply one-hot encoder to each column with categorical data
OH_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
OH_cols_train = pd.DataFrame(OH_encoder.fit_transform(X_train[object_cols]))
OH_cols_valid = pd.DataFrame(OH_encoder.transform(X_valid[object_cols]))

# One-hot encoding removed index; put it back
OH_cols_train.index = X_train.index
OH_cols_valid.index = X_valid.index

# Remove categorical columns (will replace with one-hot encoding)
num_X_train = X_train.drop(object_cols, axis=1)
num_X_valid = X_valid.drop(object_cols, axis=1)

# Add one-hot encoded columns to numerical features
OH_X_train = pd.concat([num_X_train, OH_cols_train], axis=1)
OH_X_valid = pd.concat([num_X_valid, OH_cols_valid], axis=1)

# Ensure all columns have string type
OH_X_train.columns = OH_X_train.columns.astype(str)
OH_X_valid.columns = OH_X_valid.columns.astype(str)

print("Accuracy from Approach 3 (One-Hot Encoding):")
print(score_dataset(OH_X_train, OH_X_valid, y_train, y_valid))

Categorical variables:
['role', 'champion_name']
Accuracy from Approach 3 (One-Hot Encoding):
0.9768826283051536


First model