# League of Legends Match Outcome Prediction

This project predicts whether a League of Legends match is won or lost using a logistic regression model implemented in PyTorch. The notebook includes exploratory data analysis (EDA), data preprocessing, model training, evaluation, and visualization.

---

## 1. Import Libraries


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
```


## 2. Load Dataset
Load the dataset and inspect the first few rows and basic info.


```python
data = pd.read_csv('league_of_legends_data_large.csv')
print("Dataset Shape:", data.shape)
data.info()  # column dtypes and non-null counts
data.head()
```

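Before the EDA, it can help to confirm that no values are missing and that `win` is strictly binary, because `nn.BCELoss` expects targets in [0, 1]. The following is a minimal sketch. It uses a small hypothetical DataFrame in place of the real CSV, and the helper `check_dataset` and the columns `kills` and `gold_earned` are illustrative assumptions.

```python
import pandas as pd

def check_dataset(df: pd.DataFrame, target: str = "win") -> dict:
    """Hypothetical helper: report missing values per column and the target labels."""
    missing = df.isna().sum()
    labels = sorted(df[target].dropna().unique().tolist())
    return {
        "n_rows": len(df),
        "columns_with_missing": missing[missing > 0].to_dict(),
        "target_labels": labels,
        "target_is_binary": set(labels) <= {0, 1},
    }

# Hypothetical toy data standing in for league_of_legends_data_large.csv
toy = pd.DataFrame({
    "kills": [5, 2, None, 7],
    "gold_earned": [12000, 9000, 10500, 15000],
    "win": [1, 0, 0, 1],
})
report = check_dataset(toy)
print(report)
```

On the real data, any column listed under `columns_with_missing` would need imputing or dropping before `StandardScaler` is fit.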
## 3. Exploratory Data Analysis (EDA)

We will check:

- Target distribution
- Some basic statistics of features
- Correlation heatmap
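The heatmap picks features by variance, which depends on each feature's units. A scale-free complement ranks features by their absolute Pearson correlation with `win`. The following is a minimal sketch on hypothetical synthetic data. The helper name and the columns `gold_diff` and `noise` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def rank_by_target_correlation(df: pd.DataFrame, target: str = "win") -> pd.Series:
    """Return each feature's Pearson correlation with the target, sorted by magnitude."""
    corr = df.drop(columns=target).corrwith(df[target])
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

rng = np.random.default_rng(0)
n = 500
gold_diff = rng.normal(0, 1, n)   # hypothetical feature that drives the outcome
noise = rng.normal(0, 1, n)       # hypothetical feature unrelated to the outcome
win = (gold_diff + 0.3 * rng.normal(0, 1, n) > 0).astype(int)

toy = pd.DataFrame({"noise": noise, "gold_diff": gold_diff, "win": win})
ranking = rank_by_target_correlation(toy)
print(ranking)
```

The sign is kept in the output, so a negative value means higher feature values go with losses.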



```python
# Target distribution
sns.countplot(x='win', data=data)
plt.title("Win/Loss Distribution")
plt.show()

# Summary statistics
print(data.describe())

# Correlation heatmap for the 10 features with the highest variance
# (the target is excluded; numeric_only skips any non-numeric columns)
top_features = (
    data.drop(columns='win')
    .var(numeric_only=True)
    .sort_values(ascending=False)
    .head(10)
    .index
)
plt.figure(figsize=(10, 8))
sns.heatmap(data[top_features].corr(), cmap='coolwarm', annot=True, fmt=".2f")
plt.title("Correlation Heatmap of Top Features")
plt.show()
```


## 4. Data Preprocessing
Split the data, standardize features, and convert to PyTorch tensors.
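Fitting the scaler on the training split alone keeps test-set statistics out of training. Passing `stratify=y` to `train_test_split` also preserves the win/loss ratio in both splits. The notebook's own split does not stratify, so treat this as an optional variant. The following is a minimal sketch on hypothetical random data with an assumed 30% win rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))   # hypothetical features
y = (rng.random(1000) < 0.3).astype(int)               # hypothetical imbalanced target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler().fit(X_train)   # mean and std come from the training split only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print("train win rate:", y_train.mean().round(3), "test win rate:", y_test.mean().round(3))
print("train feature means:", X_train_s.mean(axis=0).round(6))
```

The test split is transformed with the training statistics, so its means are close to 0 but not exactly 0.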


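```python
# Separate the features from the binary target
X = data.drop('win', axis=1)
y = data['win']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to float32 tensors; targets become column vectors of shape [n, 1]
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)
y_test = torch.tensor(y_test.values, dtype=torch.float32).view(-1, 1)
```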


## 5. Define Logistic Regression Model in PyTorch
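The model maps a standardized feature vector $\mathbf{x}$ to a win probability $\hat{p}$ through one linear layer followed by a sigmoid. Equivalently, the log-odds of a win is linear in the features. As a result, $e^{w_j}$ is the factor by which the predicted odds of a win change when feature $j$ increases by one standard deviation, with the other features held fixed:

$$
\hat{p} = P(\text{win} = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}},
\qquad
\log \frac{\hat{p}}{1 - \hat{p}} = \mathbf{w}^\top \mathbf{x} + b
$$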


```python
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # A single linear layer computes the logit z = xW^T + b
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        # The sigmoid maps the logit to a win probability in (0, 1)
        return torch.sigmoid(self.linear(x))

input_dim = X_train.shape[1]
model = LogisticRegressionModel(input_dim, output_size=1)
```

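A quick sanity check confirms two shapes that later sections rely on: `model.linear.weight` is `[1, num_features]`, and the outputs fall strictly between 0 and 1. The following is a minimal sketch. It re-declares the same class so it runs on its own, and it assumes a hypothetical feature count of 8 with random inputs.

```python
import torch
import torch.nn as nn

class LogisticRegressionModel(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

torch.manual_seed(0)
num_features = 8                      # hypothetical; the notebook uses X_train.shape[1]
model = LogisticRegressionModel(num_features, output_size=1)

x = torch.randn(16, num_features)     # hypothetical batch of 16 standardized rows
with torch.no_grad():
    probs = model(x)

print(model.linear.weight.shape, probs.shape)
```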

## 6. Train the Model
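We train with binary cross-entropy (BCE) loss and the SGD optimizer, using L2 regularization via `weight_decay`. Every step uses the entire training set, so this is full-batch gradient descent rather than mini-batch SGD.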

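Take $n$ training rows with labels $y_i \in \{0, 1\}$ and predicted probabilities $\hat{p}_i$. With its default `reduction='mean'`, `nn.BCELoss` computes the first term below. With plain SGD, `weight_decay` $=\lambda$ adds $\lambda\theta$ to the gradient of every parameter $\theta$, which gives the same update as minimizing the penalized objective. Because `model.parameters()` is passed to the optimizer as a whole, the penalty covers the bias $b$ as well as the weights $\mathbf{w}$:

$$
\mathcal{L}(\mathbf{w}, b) = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \Big] + \frac{\lambda}{2} \left( \lVert \mathbf{w} \rVert_2^2 + b^2 \right),
\qquad
\theta \leftarrow \theta - \eta \left( \nabla_\theta \mathrm{BCE} + \lambda \theta \right)
$$

This notebook uses a learning rate of $\eta = 0.01$ and $\lambda = 0.01$.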

```python
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)

epochs = 1000
model.train()
for epoch in range(epochs):
    optimizer.zero_grad()
    output = model(X_train)            # predicted probabilities, shape [n, 1]
    loss = loss_fn(output, y_train)
    loss.backward()
    optimizer.step()

    # Report progress every 100 epochs
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch + 1}/{epochs}, loss: {loss.item():.4f}")
```

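Applying `torch.sigmoid` in `forward` and then `nn.BCELoss` works, but PyTorch also provides `nn.BCEWithLogitsLoss`. It fuses the sigmoid and the loss and is more numerically stable for large-magnitude logits. The following is a minimal sketch on hypothetical random logits. For moderate values the two losses agree. For an extreme logit, `BCELoss` has to fall back on its documented clamp of the log term at -100. Switching would mean returning raw logits from `forward` and applying `torch.sigmoid` only at prediction time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(32, 1)                      # hypothetical raw outputs of nn.Linear
targets = torch.randint(0, 2, (32, 1)).float()   # hypothetical 0/1 labels

loss_two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
loss_fused = nn.BCEWithLogitsLoss()(logits, targets)
print(loss_two_step.item(), loss_fused.item())

# sigmoid(40) rounds to exactly 1.0 in float32, so log(1 - p) = log(0);
# BCELoss clamps that log to -100, while the fused loss stays accurate
big = torch.tensor([[40.0]])
zero = torch.tensor([[0.0]])
clamped = nn.BCELoss()(torch.sigmoid(big), zero).item()
exact = nn.BCEWithLogitsLoss()(big, zero).item()
print(clamped, exact)
```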

## 7. Evaluate the Model


```python
model.eval()
with torch.no_grad():
    y_pred_probs = model(X_test)            # predicted win probabilities
    y_pred = (y_pred_probs > 0.5).int()     # threshold at 0.5 to get class labels

# Flatten the [n, 1] tensors to 1D NumPy arrays for scikit-learn
y_test_np = y_test.cpu().numpy().ravel().astype(int)
y_pred_np = y_pred.cpu().numpy().ravel()
y_pred_probs_np = y_pred_probs.cpu().numpy().ravel()

# Confusion Matrix
cm = confusion_matrix(y_test_np, y_pred_np)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix')
plt.show()

# Classification Report
print("\nClassification Report:")
print(classification_report(y_test_np, y_pred_np, digits=4))

# ROC Curve (uses probabilities, not thresholded labels)
fpr, tpr, _ = roc_curve(y_test_np, y_pred_probs_np)
roc_auc = auc(fpr, tpr)
plt.figure(figsize=(7, 6))
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], 'k--')  # chance-level diagonal
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

print(f"AUC Score: {roc_auc:.4f}")
```

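The per-class figures in `classification_report` follow directly from the confusion-matrix counts. For binary labels, `confusion_matrix` returns `[[TN, FP], [FN, TP]]`, with true labels as rows and predictions as columns. The following is a minimal sketch on small hypothetical label arrays. It checks the hand-computed values against scikit-learn.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # hypothetical labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])   # hypothetical predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted wins that were actual wins
recall = tp / (tp + fn)      # share of actual wins that were predicted as wins
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```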

## 8. Feature Importance Analysis

In logistic regression, each weight of the linear layer is the change in the predicted log-odds of a win for a one-unit increase in the corresponding feature, with the other features held fixed. The features were standardized, so one unit is one standard deviation. This puts the weights on a comparable scale, and larger absolute weights correspond to more influential features. A positive weight raises the predicted probability of the positive class (`win = 1`), while a negative weight lowers it. These weights describe associations learned by the model rather than causal effects, and the weights of strongly correlated features can be unstable.

This section extracts the model weights, builds a DataFrame of feature importances, and plots the 15 features with the largest absolute weights.


```python
# model.linear.weight has shape [1, num_features]; flatten it to a 1D array
feature_weights = model.linear.weight.detach().cpu().numpy().flatten()

feature_names = data.drop('win', axis=1).columns
feature_importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': feature_weights
})

# Sort by the absolute value of each weight
feature_importance_df['Abs_Importance'] = feature_importance_df['Importance'].abs()
feature_importance_df = feature_importance_df.sort_values(by='Abs_Importance', ascending=False)

print("Top 15 Features by Importance:\n")
print(feature_importance_df.head(15))

# Reverse the top 15 so the largest weight is drawn at the top of the chart
top15 = feature_importance_df.head(15).iloc[::-1]
plt.figure(figsize=(10, 8))
plt.barh(top15['Feature'], top15['Importance'])
plt.xlabel('Importance (Weight Value)')
plt.title('Top 15 Feature Importances')
plt.tight_layout()
plt.show()
```


## 9. Conclusion

- The PyTorch logistic regression model predicts match outcomes from standardized features. Its held-out performance is summarized by the metrics in Section 7.
- The EDA showed the target distribution and the correlations among the highest-variance features.
- The confusion matrix, classification report, and ROC curve with AUC describe the model's performance on the test set.
- The learned weights indicate which features push predictions toward a win or a loss.
- This notebook can be uploaded to GitHub as a portfolio-ready project.
