## Logistic Regression

A notebook that applies logistic regression to predict victory in Dota 2 matches.

-------------------------------------------------------------------------------------------------------------------------------

#### If you are running this code on Google Colab, first upload the feature file *dota2_time_blowout_features.csv*

## Time blowout matches

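Useful pandas attributes and methods for exploring the data before preprocessing it and feeding it into the algorithm (`df` stands for any DataFrame):

* df.columns : the names of the columns (i.e., features)
* df.dtypes : the data type of each column
* df.head() : the first rows of the data
* df.info() : column types, non-null counts, and memory usage
* df.describe() : summary statistics of the numeric columns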
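As a rough illustration of the calls listed above, the sketch below runs them on a tiny hand-made DataFrame. The frame `toy_df` and its values are made up for this example and only borrow a few column names from this notebook; the real feature file is much wider.

```python
import pandas as pd

# Toy stand-in for the feature file (values are illustrative only)
toy_df = pd.DataFrame({
    "match_id": [1, 2, 3],
    "rad_first_pick": [True, False, True],
    "win_label": [1, 0, 1],
})

column_names = list(toy_df.columns)  # feature names
column_types = toy_df.dtypes         # one dtype per column
summary = toy_df.describe()          # statistics for the numeric columns

print(column_names)
print(column_types)
print(toy_df.head())
toy_df.info()                        # prints its report directly
print(summary)
```

Note that `dtypes` is an attribute, not a method, so it takes no parentheses.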

In [None]:
# Import necessary libraries
import os
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score, roc_curve
from sklearn import metrics
import statistics as st
from sklearn import preprocessing

In [None]:
# NOTE: uncomment this cell if you are running this code on a local machine.
# Adjust the variables below so that they point to the feature file on your machine.

# # Set directory for the time blowout match group
# cwd = os.getcwd()
# root_directory = os.path.dirname(cwd)

# # os.path.join builds the path with the correct separator for the current operating system
# time_blowout_data_dir = os.path.join(root_directory, "model_features_pre-match", "time_blowout")
# path_to_features = os.path.join(time_blowout_data_dir, "dota2_time_blowout_features.csv")

In [None]:
# NOTE: use this cell if you are running this code on Google Colab

# Set directory for the time blowout match group. Make sure the feature file is uploaded to this Colab session
path_to_features = "/content/dota2_time_blowout_features.csv"

In [None]:
# Read the data (model feature file)
feature_time_blowout_df = pd.read_csv(path_to_features)

### Exploration and preprocessing of the data

In [None]:
# Drop first column (match id)
feature_time_blowout_df = feature_time_blowout_df.drop(['match_id'], axis=1)

# Print feature names
feature_time_blowout_df.columns

In [None]:
feature_time_blowout_df.head()

In [None]:
feature_time_blowout_df.info()

In [None]:
# Fill in missing data with the median value of each feature
# (numeric_only=True skips any non-numeric columns when computing the medians)
feature_time_blowout_df = feature_time_blowout_df.fillna(feature_time_blowout_df.median(numeric_only=True))
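To make the median imputation step concrete, here is a minimal sketch on a made-up frame. The names `toy_df`, `feature_a`, and `feature_b` are hypothetical and do not come from the feature file.

```python
import numpy as np
import pandas as pd

# Hypothetical features with one missing value each
toy_df = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 10.0],
    "feature_b": [np.nan, 2.0, 4.0, 6.0],
})

# Per-column medians; NaN values are ignored when computing them
medians = toy_df.median(numeric_only=True)

# fillna with a Series fills each column with its own median
filled_df = toy_df.fillna(medians)
print(filled_df)
```

The median is less sensitive to outliers than the mean: the value 10.0 in `feature_a` does not pull the imputed value upward.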

In [None]:
feature_time_blowout_df['rad_first_pick'] = feature_time_blowout_df['rad_first_pick'].astype(int)

### Model building, training and evaluation

In [None]:
# Import logistic regression library
from sklearn.linear_model import LogisticRegression

In [None]:
# Split into features (X) and label (y); this assumes the label is the last column
X, y = feature_time_blowout_df.iloc[:, :-1], feature_time_blowout_df.iloc[:, -1]

In [None]:
features = [c for c in feature_time_blowout_df.columns if c != 'win_label']
target = 'win_label'

In [None]:
# Define the number of folds for the k-fold cross-validation
kfolds = KFold(n_splits=10, shuffle=True)
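The sketch below shows what `KFold.split` yields, using a small made-up array and 5 folds instead of 10. The `random_state` argument is an addition for this example so that the shuffle is repeatable; the cell above does not set it.

```python
import numpy as np
from sklearn.model_selection import KFold

# 10 toy samples with 2 features each (values are arbitrary)
X_toy = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_sizes = []
seen_test = []
for train_idx, test_idx in kf.split(X_toy):
    # Each iteration yields positional indices, not the rows themselves
    fold_sizes.append((len(train_idx), len(test_idx)))
    seen_test.extend(test_idx.tolist())

print(fold_sizes)
print(sorted(seen_test))
```

Every sample lands in exactly one test fold, so each row is scored once across the whole cross-validation.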

In [None]:
# Instantiate the model (using the default parameters)
logreg = LogisticRegression()

# Positional indices of the feature columns to scale (columns 238 to 455)
columns_to_scale = list(range(238, 456))

In [None]:
# NOTE: the training process might take a while to execute

auc = list()

mm_scaler = preprocessing.MinMaxScaler()

for train_idx, test_idx in kfolds.split(X):
    # Copy each fold so that scaling does not write into a slice of X
    # (this avoids the pandas SettingWithCopyWarning)
    X_train, y_train = X.iloc[train_idx].copy(), y.iloc[train_idx]
    X_test, y_test = X.iloc[test_idx].copy(), y.iloc[test_idx]

    # Fit the scaler on the training fold only, then train the model
    X_train.iloc[:, columns_to_scale] = mm_scaler.fit_transform(X_train.iloc[:, columns_to_scale])
    logreg.fit(X_train, y_train)

    # Reuse the training-fold scaling on the test fold, then score it
    X_test.iloc[:, columns_to_scale] = mm_scaler.transform(X_test.iloc[:, columns_to_scale])
    y_pred_proba = logreg.predict_proba(X_test)[:, 1]
    auc.append(metrics.roc_auc_score(y_test, y_pred_proba))

'Median AUC: {:.04f}'.format(st.median(auc))

'Median AUC: 0.7902'
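One possible alternative to the manual loop is to let scikit-learn handle the per-fold scaling. The sketch below uses synthetic data from `make_classification` rather than the Dota 2 features, so its scores say nothing about the result above. The names `X_toy`, `y_toy`, and `columns_to_scale_toy`, the `max_iter` value, and the fixed `random_state` are assumptions made for this example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the feature matrix and label
X_toy, y_toy = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Scale only a subset of columns, as the notebook does with columns_to_scale
columns_to_scale_toy = [4, 5, 6, 7]
preprocess = ColumnTransformer(
    [("minmax", MinMaxScaler(), columns_to_scale_toy)],
    remainder="passthrough",  # leave the other columns unchanged
)

# The pipeline refits the scaler on each training fold, so the
# test fold never influences the scaling parameters
model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))

cv = KFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X_toy, y_toy, cv=cv, scoring="roc_auc")

print("Median AUC: {:.04f}".format(np.median(auc_scores)))
```

Because the scaler lives inside the pipeline, no fold is modified in place, which sidesteps the copy-versus-view issue entirely.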