### Step 1: Data Loading and Preprocessing

#### Task 1: Load the League of Legends dataset and preprocess it for training.

Loading and preprocessing the dataset involves reading the data, splitting it into training and testing sets, and standardizing the features. You will utilize `pandas` for data manipulation, `train_test_split` from `sklearn` for data splitting, and `StandardScaler` for feature scaling.

Note: Please ensure all the required libraries are installed and imported.

1. Load the dataset:
   Use `pd.read_csv()` to load the dataset into a pandas DataFrame.
   
2. Split data into features and target: Separate `win` (target) from the remaining columns (features).
   - `X = data.drop('win', axis=1)`
   - `y = data['win']`
   
3. Split the Data into Training and Testing Sets:
   Use `train_test_split()` from `sklearn.model_selection` to divide the data. Set `test_size=0.2` to allocate 20% for testing and 80% for training, and use `random_state=42` to ensure reproducibility of the split.
   
4. Standardize the features:
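   Use `StandardScaler()` from `sklearn.preprocessing` to scale the features. Fit the scaler on the training set only, then apply the same transformation to the test set so that no test-set statistics leak into training.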
   
5. Convert to PyTorch tensors:
   Use `torch.tensor()` to convert the data to PyTorch tensors.
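
The five steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the real CSV is replaced by a small synthetic DataFrame, and every column name except `win` is hypothetical. In the lab you would start from `pd.read_csv()` instead.

```python
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (hypothetical columns, except 'win').
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "kills": rng.integers(0, 20, size=100),
    "deaths": rng.integers(0, 20, size=100),
    "gold_earned": rng.normal(10000, 2000, size=100),
    "win": rng.integers(0, 2, size=100),
})

# Separate features and target.
X = data.drop("win", axis=1)
y = data["win"]

# 80/20 split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# float32 matches the default dtype of nn.Linear; targets become column vectors.
X_train_t = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_t = torch.tensor(X_test_scaled, dtype=torch.float32)
y_train_t = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)
y_test_t = torch.tensor(y_test.values, dtype=torch.float32).view(-1, 1)

print(X_train_t.shape, X_test_t.shape, y_train_t.shape, y_test_t.shape)
```

Reshaping the targets to shape `(n, 1)` is a design choice made here so that they line up with the single-column output of a logistic regression layer when a loss such as `BCELoss` compares them.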

#### Exercise 1:

Write code to load the dataset, split it into training and testing sets, standardize the features, and convert the data into PyTorch tensors for use in training a PyTorch model.


### Setup
Installing required libraries:

The following required libraries are not pre-installed in the Skills Network Labs environment. You will need to run the following cell to install them:


In [None]:
# %pip install pandas scikit-learn matplotlib
# %pip install torch==2.8.0+cpu torchvision==0.23.0+cpu torchaudio==2.8.0+cpu \
#     --index-url https://download.pytorch.org/whl/cpu


In [None]:
# Write your code here
# Load the dataset, split into train/test, standardize features, and convert to PyTorch tensors


### Step 2: Logistic Regression Model

#### Task 2: Implement a logistic regression model using PyTorch.

Defining the logistic regression model involves specifying the input dimensions, the forward pass using the sigmoid activation function, and initializing the model, loss function, and optimizer.

1. Define the Logistic Regression Model:
   Create a class LogisticRegressionModel that inherits from torch.nn.Module.
   - In the `__init__()` method, define a linear layer (nn.Linear) to implement the logistic regression model.
   - The `forward()` method should apply the sigmoid activation function to the output of the linear layer.

2. Initialize the Model, Loss Function, and Optimizer:
   - Set `input_dim`: Use `X_train.shape[1]` to get the number of features from the training data (`X_train`).
   - Initialize the model: Create an instance of the `LogisticRegressionModel` class, passing `input_dim` as a parameter (e.g., `model = LogisticRegressionModel(input_dim)`).
   - Loss Function: Use `BCELoss()` from `torch.nn` (Binary Cross-Entropy Loss).
   - Optimizer: Initialize the optimizer using `optim.SGD()` with a learning rate of 0.01.
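
One possible shape for the class and its initialization is sketched below. The attribute name `linear` and the placeholder value of `input_dim` are assumptions made for illustration; in the lab, `input_dim` would come from `X_train.shape[1]`.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        # One linear layer mapping all features to a single logit.
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        # Sigmoid turns the logit into a probability in [0, 1].
        return torch.sigmoid(self.linear(x))


input_dim = 8  # placeholder; in the lab this would be X_train.shape[1]
model = LogisticRegressionModel(input_dim)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Quick shape check on random inputs (not real data).
probs = model(torch.randn(4, input_dim))
print(probs.shape)
```

`BCELoss` expects probabilities, which is why the sigmoid is applied inside `forward()` in this sketch.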

#### Exercise 2:

Define the logistic regression model using PyTorch, specifying the input dimensions and the forward pass. Initialize the model, loss function, and optimizer.


In [None]:
# Write your code here
# Define the LogisticRegressionModel class and initialize model, loss function, and optimizer


### Step 3: Model Training

#### Task 3: Train the logistic regression model on the dataset.

The training loop will run for a specified number of epochs. In each epoch, the model makes predictions, calculates the loss, performs backpropagation, and updates the model parameters.

1. Set Number of Epochs:  
   - Set the number of training epochs to 1000.

2. Training Loop:  
   For each epoch:
   - Set the model to training mode using `model.train()`.
   - Zero the gradients using `optimizer.zero_grad()`.
   - Pass the training data (`X_train`) through the model to get the predictions (`outputs`).
   - Calculate the loss using the defined loss function (`criterion`).
   - Perform backpropagation with `loss.backward()`.
   - Update the model's weights using `optimizer.step()`.

3. Print Loss Every 100 Epochs:  
   - After every 100 epochs, print the current epoch number and the loss value.

4. Model Evaluation:  
   - Set the model to evaluation mode using `model.eval()`.
   - Use `torch.no_grad()` to ensure no gradients are calculated during evaluation.
   - Get predictions on both the training set (`X_train`) and the test set (`X_test`).

5. Calculate Accuracy:  
   - For both the training and test datasets, compute the accuracy by comparing the predicted values with the true values (`y_train`, `y_test`).
   - Use a threshold of 0.5 for classification
   
6. Print Accuracy:  
   - Print the training and test accuracies after the evaluation is complete.
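
The loop and evaluation steps above might look roughly like this. The sketch assumes synthetic, roughly linearly separable data and uses `nn.Sequential` as a stand-in for the `LogisticRegressionModel` class, so the printed numbers say nothing about the real dataset.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic data for illustration only.
X = torch.randn(200, 4)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float().view(-1, 1)
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

# Stand-in for LogisticRegressionModel: linear layer followed by sigmoid.
model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

epochs = 1000
losses = []
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()              # clear gradients from the previous step
    outputs = model(X_train)           # forward pass
    loss = criterion(outputs, y_train)
    loss.backward()                    # backpropagation
    optimizer.step()                   # parameter update
    losses.append(loss.item())
    if (epoch + 1) % 100 == 0:
        print(f"Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}")

# Evaluation: no gradients needed, threshold probabilities at 0.5.
model.eval()
with torch.no_grad():
    train_pred = (model(X_train) >= 0.5).float()
    test_pred = (model(X_test) >= 0.5).float()
    train_acc = (train_pred == y_train).float().mean().item()
    test_acc = (test_pred == y_test).float().mean().item()

print(f"Train accuracy: {train_acc:.4f}, Test accuracy: {test_acc:.4f}")
```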

#### Exercise 3:

Write the code to train the logistic regression model on the dataset. Implement the training loop, making predictions, calculating the loss, performing backpropagation, and updating model parameters. Evaluate the model's accuracy on training and testing sets.


In [None]:
# Write your code here
# Train the model for 1000 epochs and evaluate accuracy


### Step 4: Model Optimization and Evaluation

#### Task 4: Implement optimization techniques and evaluate the model's performance.

Optimization techniques such as L2 regularization (the penalty used in Ridge Regression) help prevent overfitting. The model is retrained with this regularization, and its performance is evaluated on both the training and testing sets.

**Weight Decay**: In PyTorch optimizers, `weight_decay` is the parameter that applies L2 regularization to the model's parameters (weights). It penalizes large weight values, which encourages simpler solutions and helps prevent overfitting. To use L2 regularization, set the `weight_decay` parameter when you create the optimizer.

For example, when you initialize the optimizer with `optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)`, the `weight_decay=0.01` term applies L2 regularization with a strength of 0.01.

1. Set Up the Optimizer with L2 Regularization:
   - Modify the optimizer to include `weight_decay` for L2 regularization.
   - Example:
     ```python
     optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)
     ```
2. Train the Model with L2 Regularization:
   - Follow the same steps as before but use the updated optimizer with regularization during training.
   - Use epochs=1000
   
3. Evaluate the Optimized Model:
   - After training, evaluate the model on both the training and test datasets.
   - Compute the accuracy for both sets by comparing the model's predictions to the true labels (`y_train` and `y_test`).

4. Calculate and Print the Accuracy:
   - Use a threshold of 0.5 to determine whether the model's predictions are class 0 or class 1.
   - Print the training accuracy and test accuracy after evaluation.
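
To make the effect of `weight_decay` concrete, the sketch below takes a single plain SGD step (no momentum) on a toy two-element weight vector and compares it with the hand-computed update `w - lr * (grad + weight_decay * w)`. The toy loss and the starting values are assumptions chosen only to keep the arithmetic easy to follow.

```python
import torch
import torch.optim as optim

# Toy parameter vector instead of a real model.
w = torch.tensor([1.0, -2.0], requires_grad=True)
lr, weight_decay = 0.1, 0.01
optimizer = optim.SGD([w], lr=lr, weight_decay=weight_decay)

w_before = w.detach().clone()
loss = w.sum()            # toy loss: its gradient is 1 for every weight
loss.backward()
grad = w.grad.clone()
optimizer.step()

# Plain SGD with weight decay adds weight_decay * w to the gradient,
# which is the gradient of the L2 penalty (weight_decay / 2) * ||w||^2.
expected = w_before - lr * (grad + weight_decay * w_before)

print("after step:", w.detach())
print("by hand:   ", expected)
```

Each weight is pulled slightly toward zero in addition to the usual gradient step, which is how the penalty on large weights enters training.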

#### Exercise 4:

Implement optimization techniques like L2 regularization and retrain the model. Evaluate the performance of the optimized model on both training and testing sets.


In [None]:
# Write your code here
# Retrain model with L2 regularization (weight_decay=0.01) and evaluate


### Step 5: Visualization and Interpretation

Visualization tools like confusion matrices and ROC curves provide insights into the model's performance. The confusion matrix helps in understanding the classification accuracy, while the ROC curve illustrates the trade-off between sensitivity and specificity.

**Confusion Matrix**: A Confusion Matrix is a fundamental tool used in classification problems to evaluate the performance of a model. It provides a matrix showing the number of correct and incorrect predictions made by the model, categorized by the actual and predicted classes.

The four cells of the matrix are:
- **True Positive (TP)**: Correctly predicted positive class (class 1).
- **True Negative (TN)**: Correctly predicted negative class (class 0).
- **False Positive (FP)**: Incorrectly predicted as positive (class 1), but the actual class is negative (class 0). This is also called a Type I error.
- **False Negative (FN)**: Incorrectly predicted as negative (class 0), but the actual class is positive (class 1). This is also called a Type II error.

**ROC Curve (Receiver Operating Characteristic Curve)**:
The ROC Curve is a graphical representation used to evaluate the performance of a binary classification model across all classification thresholds. It plots two metrics:
- **True Positive Rate (TPR) or Recall (Sensitivity)**: It is the proportion of actual positive instances (class 1) that were correctly classified as positive by the model.
- **False Positive Rate (FPR)**: It is the proportion of actual negative instances (class 0) that were incorrectly classified as positive by the model.
  
**AUC**:
AUC stands for Area Under the Curve and is a performance metric used to evaluate the quality of a binary classification model. Specifically, it refers to the area under the ROC curve (Receiver Operating Characteristic curve), which plots the True Positive Rate (TPR) versus the False Positive Rate (FPR) for different threshold values.

**Classification Report**:
A Classification Report is a summary of various classification metrics, which are useful for evaluating the performance of a classifier on the given dataset.
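
As a small worked example, the metrics described above can be computed with `sklearn.metrics` on a handful of hand-made labels and predicted probabilities. The eight values below are invented for illustration; in the lab they would be `y_test` and the model's outputs on `X_test`.

```python
import numpy as np
from sklearn.metrics import auc, classification_report, confusion_matrix, roc_curve

# Invented labels and predicted probabilities (not from the real dataset).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.6, 0.7, 0.2])

# Threshold at 0.5 to obtain class predictions.
y_pred = (y_prob >= 0.5).astype(int)

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
print(cm)

# ROC curve uses the raw probabilities, not the thresholded predictions.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
roc_auc = auc(fpr, tpr)
print(f"AUC: {roc_auc:.3f}")

# Precision, recall, and F1-score per class.
print(classification_report(y_true, y_pred))
```

For these values the confusion matrix is `[[3, 1], [1, 3]]` and the AUC is 0.875. Passing `fpr` and `tpr` to `plt.plot()` draws the ROC curve itself.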

#### Exercise 5:

Write code to visualize the model's performance using confusion matrices and ROC curves. Generate classification reports to evaluate precision, recall, and F1-score. Retrain the model with L2 regularization and evaluate the performance.


In [None]:
# Write your code here
# Visualize confusion matrix, ROC curve, and generate classification report

# Hint:
# import matplotlib.pyplot as plt
# from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
# import itertools


### Step 6: Model Saving and Loading

#### Task 6: Save and load the trained model.

This task demonstrates the techniques to persist a trained model using `torch.save` and reload it using `torch.load`. Evaluating the loaded model ensures that it retains its performance, making it practical for deployment in real-world applications.

1. Saving the Model:
   - Save the model's learned weights and biases using torch.save(). (e.g., `torch.save(model.state_dict(), 'your_model_name.pth')`)
   - Saving only the state dictionary (model parameters) is preferred because it's more flexible and efficient than saving the entire model object.

2. Loading the Model:
   - Create a new model instance with the same input dimension (e.g., `model = LogisticRegressionModel(input_dim)`) and load the saved parameters (e.g., `model.load_state_dict(torch.load('your_model_name.pth'))`).

3. Evaluating the Loaded Model:
   - After loading, set the model to evaluation mode by calling `model.eval()`
   - Evaluate the loaded model on the test dataset to confirm that it performs the same as it did right after training.
   - Use `torch.no_grad()` to ensure that no gradients are computed.
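
A minimal round trip through `torch.save` and `torch.load` could look like the following. The sketch uses an untrained model, random inputs, a temporary directory, and a hypothetical file name, so it only demonstrates that the reloaded parameters reproduce the original outputs.

```python
import os
import tempfile

import torch
import torch.nn as nn


class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


torch.manual_seed(0)
input_dim = 4                        # placeholder feature count
model = LogisticRegressionModel(input_dim)
X_test = torch.randn(10, input_dim)  # random stand-in for the test set

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "logreg_demo.pth")  # hypothetical file name
    # Save only the parameters (state dictionary).
    torch.save(model.state_dict(), path)
    # Rebuild the architecture, then load the saved parameters into it.
    loaded_model = LogisticRegressionModel(input_dim)
    loaded_model.load_state_dict(torch.load(path))

model.eval()
loaded_model.eval()
with torch.no_grad():
    same_outputs = torch.equal(model(X_test), loaded_model(X_test))

print("Loaded model matches original:", same_outputs)
```

Because the state dictionary holds only tensors, the new instance must be created with the same `input_dim` as the saved one, otherwise `load_state_dict()` raises a shape mismatch error.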

#### Exercise 6:

Write code to save the trained model and reload it. Ensure the loaded model performs consistently by evaluating it on the test dataset.


In [None]:
# Write your code here
# Save the model


In [None]:
# Write your code here
# Load the model


In [None]:
# Write your code here
# Ensure the loaded model is in evaluation mode and evaluate it


### Step 7: Hyperparameter Tuning

#### Task 7: Perform hyperparameter tuning to find the best learning rate.

By testing different learning rates, you will identify the optimal rate that provides the best test accuracy. This fine-tuning is crucial for enhancing model performance.

1. Define Learning Rates:
   - Choose these learning rates to test: [0.01, 0.05, 0.1]

2. Reinitialize the Model for Each Learning Rate:
   - For each learning rate, you'll need to reinitialize the model and optimizer (e.g., `torch.optim.SGD(model.parameters(), lr=lr)`).
   - Reinitialize the model for each learning rate so that every run starts from fresh weights. Otherwise, later runs would continue training from the weights learned at earlier rates, and the comparison would not be fair.

3. Train the Model for Each Learning Rate:
   - Train the model for a fixed number of epochs (e.g., 50 or 100 epochs) for each learning rate, and compute the accuracy on the test set.
   - Track the test accuracy for each learning rate and identify which one yields the best performance.

4. Evaluate and Compare:
   - After training with each learning rate, compare the test accuracy for each configuration.
   - Report the learning rate that gives the highest test accuracy
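
The comparison described above can be sketched as a small loop over the candidate rates. The helper name `train_and_score`, the synthetic data, and the use of `nn.Sequential` in place of `LogisticRegressionModel` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic data for illustration only.
X = torch.randn(200, 4)
y = (X[:, 0] - X[:, 2] > 0).float().view(-1, 1)
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]


def train_and_score(lr, epochs=100):
    """Train a fresh model with the given learning rate; return test accuracy."""
    torch.manual_seed(1)  # identical initial weights for every learning rate
    model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
    criterion = nn.BCELoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        pred = (model(X_test) >= 0.5).float()
    return (pred == y_test).float().mean().item()


learning_rates = [0.01, 0.05, 0.1]
results = {lr: train_and_score(lr) for lr in learning_rates}
best_lr = max(results, key=results.get)

for lr, acc in results.items():
    print(f"lr={lr}: test accuracy={acc:.4f}")
print("Best learning rate:", best_lr)
```

Resetting the seed inside the helper is one way to keep the initial weights identical across runs, so that differences in accuracy come from the learning rate alone.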

#### Exercise 7:

Perform hyperparameter tuning to find the best learning rate. Retrain the model for each learning rate and evaluate its performance to identify the optimal rate.


In [None]:
# Write your code here
# Test different learning rates [0.01, 0.05, 0.1] and find the best one


### Step 8: Feature Importance

#### Task 8: Evaluate feature importance to understand the impact of each feature on the prediction.

For a logistic regression model, feature importance can be read from the learned weights of the linear layer, which show how strongly each feature influences the prediction.

1. Extracting Model Weights:
   - The weights of the logistic regression model represent the importance of each feature in making predictions. These weights are stored in the model's linear layer (`model.linear.weight`).
   - You can extract the weights using `model.linear.weight.data.numpy()` and flatten the resulting tensor to get a 1D array of feature importances.

2. Creating a DataFrame:
   - Create a pandas DataFrame with two columns: one for the feature names and the other for their corresponding importance values (i.e., the learned weights).
   - Ensure the features are aligned with their names in your dataset (e.g., `X.columns`, taken from the DataFrame before it was converted to tensors).

3. Sorting and Plotting Feature Importance:
   - Sort the features based on the absolute value of their importance (weights) to identify the most impactful features.
   - Use a bar plot (via `matplotlib`) to visualize the sorted feature importances, with the feature names on the y-axis and importance values on the x-axis.

4. Interpreting the Results:
   - Larger absolute weights indicate more influential features. Positive weights suggest a positive correlation with the outcome (likely to predict the positive class), while negative weights suggest the opposite.
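
The extraction and sorting steps can be illustrated with a tiny linear layer whose weights are set by hand. The feature names and weight values below are hypothetical; with a trained model you would read `model.linear.weight` instead.

```python
import pandas as pd
import torch
import torch.nn as nn

# Hand-set weights standing in for a trained model's linear layer.
linear = nn.Linear(3, 1)
with torch.no_grad():
    linear.weight.copy_(torch.tensor([[0.5, -2.0, 1.0]]))

feature_names = ["kills", "deaths", "gold_earned"]  # hypothetical names

# Shape (1, n_features) -> flat 1D array of n_features weights.
weights = linear.weight.detach().numpy().flatten()

feature_importance = pd.DataFrame(
    {"Feature": feature_names, "Importance": weights}
)

# Rank by absolute value so strong negative weights are not pushed to the bottom.
feature_importance = feature_importance.sort_values(
    by="Importance", key=lambda s: s.abs(), ascending=False
).reset_index(drop=True)

print(feature_importance)
```

Here `deaths` ranks first even though its weight is negative, which is the reason for sorting on the absolute value rather than the raw weight.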

#### Exercise 8:

Evaluate feature importance by extracting the weights of the linear layer and creating a DataFrame to display the importance of each feature. Visualize the feature importance using a bar plot.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Write your code here
# Extract the weights of the linear layer


In [None]:
# Write your code here
# Create a DataFrame for feature importance and plot it

# Hint:
# weights = model.linear.weight.data.numpy().flatten()
# features = X.columns
# feature_importance = pd.DataFrame({'Feature': features, 'Importance': weights})
# # Sort by absolute weight; barh draws the last row at the top of the plot
# feature_importance = feature_importance.sort_values(by='Importance', key=abs)
# plt.figure(figsize=(10, 6))
# plt.barh(feature_importance['Feature'], feature_importance['Importance'])
# plt.xlabel('Importance')
# plt.ylabel('Features')
# plt.title('Feature Importance')
# plt.show()


#### Conclusion:

Congratulations on completing the project! In this final project, you built a logistic regression model to predict the outcomes of League of Legends matches based on various in-game statistics. This comprehensive project involved several key steps, including data loading and preprocessing, model implementation, training, optimization, evaluation, visualization, model saving and loading, hyperparameter tuning, and feature importance analysis. This project provided hands-on experience with the complete workflow of developing a machine learning model for binary classification tasks using PyTorch.

© Copyright IBM Corporation. All rights reserved.
