# **Neural Network on MBA Admission Dataset**

## **Overview**
In this notebook, we will apply a Neural Network (a Multi-Layer Perceptron) to predict the admission status of MBA applicants based on features such as gender, GPA, GMAT score, work experience, and more. We will evaluate the model using key classification metrics (Accuracy, Precision, Recall, and F1-score), track its training and test loss curves, and interpret its predictions with SHAP values.

### **Dataset**
The dataset contains information on MBA applicants, including:
- **Gender**: Gender of the applicant
- **International**: Whether the applicant is an international student
- **GPA**: Grade point average
- **Major**: Undergraduate major
- **Race**: Race/ethnicity of the applicant
- **GMAT**: GMAT score of the applicant
- **Work Experience**: Number of years of work experience
- **Work Industry**: The industry where the applicant works
- **Admission**: Whether the applicant was admitted (Target Variable)

### **Objective**
Our goal is to use a Neural Network to predict whether an applicant will be admitted based on the available features, and to use SHAP values to understand how the model makes its decisions.


In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, ConfusionMatrixDisplay

# Load the dataset
data = pd.read_csv('MBA.csv')

# Display the first few rows of the dataset
data.head()


## Data Preprocessing

The data is processed in a similar manner to the KNN example: missing values are handled, waitlisted applicants are dropped, categorical variables are encoded, and the dataset is split and scaled.


In [None]:
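# Fill missing values: missing race is filled with 'International',
# and a missing admission status is treated as 'Deny'
data['race'] = data['race'].fillna('International')
data['admission'] = data['admission'].fillna('Deny')
print(data.isna().sum())

print(data['admission'].value_counts())
# Drop all rows where 'admission' is 'Waitlist' so the target is binary;
# .copy() avoids pandas' SettingWithCopyWarning in the encoding steps below
data = data[data['admission'] != 'Waitlist'].copy()

# Verify that the 'Waitlist' rows are dropped
print(data['admission'].value_counts())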

# Encode categorical variables
data['gender'] = data['gender'].map({'Male': 0, 'Female': 1})
data['international'] = data['international'].astype(int)
data['admission'] = data['admission'].map({'Admit': 1, 'Deny': 0})

# One-hot encode categorical columns like 'major', 'race', and 'work_industry'
data = pd.get_dummies(data, columns=['major', 'race', 'work_industry'], drop_first=True)

# Display the processed dataset
data.head()


### Splitting the Data
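We split the dataset into training (90%) and testing (10%) sets, then standardize the features with `StandardScaler`, fitting it on the training set only.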



In [None]:
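# Split the data into features (X) and target (y).
# Note: 'application_id' is an identifier rather than an applicant attribute; it is kept
# here (it appears in the SHAP plots below), but dropping it is usually advisable.
X = data.drop('admission', axis=1)
y = data['admission']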

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# Initialize the scaler
scaler = StandardScaler()

# Fit the scaler on the training data and transform both the training and test sets
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 🤖 **Understanding Neural Networks with the MBA Admission Dataset**

## **Overview:**
In this document, we will explore how neural networks work using the MBA Admission dataset. After preprocessing the data, we will:
1. Train a Multi-Layer Perceptron (MLP) model to predict whether an MBA applicant is admitted.
2. Understand the important parameters of MLP and how they affect model performance.
3. Evaluate model performance using classification metrics like Accuracy, Precision, Recall, and F1-Score.
4. Dive deeper into how **hidden layers** and **activation functions** contribute to the model's learning ability.

---

## **🗃️ Dataset Description:**
The MBA Admission dataset contains features such as academic scores, work experience, and other relevant attributes that help determine whether an applicant is admitted to the MBA program. The target variable is **`admission`**, encoded as 1 (Admit) or 0 (Deny) after preprocessing.

---

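## **🔄 Preprocessing the Data:**
The data is preprocessed as shown above: missing values are filled, waitlisted applicants are dropped, categorical variables are encoded (binary mapping and one-hot encoding), and the data is split 90/10 into training and test sets and standardized with `StandardScaler`.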

---

## **🤖 How Neural Networks Work:**

Multi-Layer Perceptrons (MLP) are a type of feedforward neural network, composed of an input layer, one or more hidden layers, and an output layer. Here’s how they work:

### 1. **Layers and Neurons**:
   - **Input Layer**: Receives input data (features) and passes it to the next layer.
   - **Hidden Layers**: Contain neurons that learn features from the input data. The number of neurons and layers determines the model's capacity to learn complex patterns.
   - **Output Layer**: Provides the final output, which, in this case, is a binary prediction (admitted or not).
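To make the layer structure concrete, here is a minimal sketch that fits an `MLPClassifier` with the same `(50, 30)` hidden layers used later in this notebook and prints the shape of each weight matrix (`coefs_`) and bias vector (`intercepts_`). It assumes synthetic data from `make_classification` with 12 features as a stand-in for the MBA features; the `_demo` names are hypothetical and chosen so they do not overwrite the notebook's variables.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the MBA features (assumption: 12 numeric features, binary target)
X_demo, y_demo = make_classification(n_samples=200, n_features=12, random_state=0)

# Same hidden-layer layout as the notebook's model: 50 units, then 30 units
mlp_demo = MLPClassifier(hidden_layer_sizes=(50, 30), max_iter=500, random_state=0)
mlp_demo.fit(X_demo, y_demo)

# coefs_[i] holds the weights from layer i to layer i + 1; intercepts_[i] the biases of layer i + 1
for i, (W, b) in enumerate(zip(mlp_demo.coefs_, mlp_demo.intercepts_)):
    print(f"Layer {i} -> {i + 1}: weights {W.shape}, biases {b.shape}")

print("Number of layers (input + hidden + output):", mlp_demo.n_layers_)
print("Output activation:", mlp_demo.out_activation_)
```

For a binary target, scikit-learn uses a single output unit with a logistic activation, which is why the last weight matrix has one column.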

### 2. **Activation Functions**:
   - Activation functions introduce **non-linearity** into the model, allowing it to learn complex relationships. Common activation functions include **ReLU** (Rectified Linear Unit) and **sigmoid**.
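A minimal NumPy sketch of the two activation functions named above. In `MLPClassifier`, ReLU is the default for hidden layers (`activation='relu'`), and for a binary target the output layer uses the logistic sigmoid.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print("ReLU:   ", relu(z))      # negative inputs are clipped to 0
print("Sigmoid:", sigmoid(z))   # 0 maps to 0.5; large inputs approach 1
```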

### 3. **Training and Backpropagation**:
   - **Forward Pass**: The input data passes through the network, and predictions are generated.
   - **Loss Calculation**: A loss function (e.g., **cross-entropy**) calculates the error between predicted and actual values.
   - **Backpropagation**: The model adjusts the weights by minimizing the loss using an optimization algorithm (e.g., **stochastic gradient descent**).
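The three training steps above can be sketched from scratch in NumPy. This is an illustrative toy, not scikit-learn's implementation: it assumes a tiny XOR dataset, one hidden layer of 8 ReLU units, a sigmoid output, and plain full-batch gradient descent, whereas `MLPClassifier` defaults to the Adam optimizer with mini-batches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): XOR, which no linear model can fit
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_toy = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 ReLU units and one sigmoid output unit
W1 = rng.normal(scale=0.5, size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))
lr = 0.5
n = X_toy.shape[0]
losses = []

for epoch in range(2000):
    # Forward pass
    z1 = X_toy @ W1 + b1
    a1 = np.maximum(0.0, z1)               # ReLU
    z2 = a1 @ W2 + b2
    p = 1.0 / (1.0 + np.exp(-z2))          # sigmoid -> P(y = 1)

    # Loss calculation: binary cross-entropy (probabilities clipped to avoid log(0))
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)
    losses.append(-np.mean(y_toy * np.log(p_safe) + (1 - y_toy) * np.log(1 - p_safe)))

    # Backpropagation: for sigmoid + cross-entropy, dL/dz2 = (p - y) / n
    dz2 = (p - y_toy) / n
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (z1 > 0)          # chain rule through the ReLU
    dW1 = X_toy.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient descent step (full batch; SGD/Adam would use mini-batches)
    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

print(f"Loss at first epoch: {losses[0]:.4f}, at last epoch: {losses[-1]:.4f}")
print("Predicted classes:", (p.ravel() > 0.5).astype(int))
```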

### 4. **Reducing Overfitting**:
   - **Regularization (`alpha`)**: Tune `alpha` for L2 regularization to penalize large weights.
   - **Reduce Model Complexity**: Adjust model depth and neuron counts to control complexity.
   - **Cross-Validation**: Apply cross-validation to find optimal parameters (like `hidden_layer_sizes` and `alpha`).
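A minimal sketch of cross-validated tuning with `GridSearchCV`, assuming synthetic data from `make_classification` in place of the scaled MBA training set; the grid values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the scaled training data (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=0)

param_grid = {
    "hidden_layer_sizes": [(10,), (50, 30)],
    "alpha": [0.0001, 0.1],
}

search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training data only
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

In the notebook you would pass `X_train, y_train` instead, so the test set stays untouched until the final evaluation.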

---

## **📉 Plotting Training vs. Testing Curves**

To assess the model’s learning dynamics and detect overfitting or underfitting, we plot the training and testing loss curves over multiple epochs:
- **Training Curve**: Tracks the model’s performance on the training data.
- **Testing Curve**: Shows how well the model generalizes to unseen data.

This plot helps identify points where overfitting may begin.
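Besides an epoch-by-epoch `partial_fit` loop, `MLPClassifier` records some of this information itself: `loss_curve_` stores the training loss per epoch, and with `early_stopping=True` it holds out part of the training data and records the validation accuracy per epoch in `validation_scores_`. A minimal sketch on synthetic `make_classification` data (an assumption standing in for the MBA data):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the MBA data (assumption)
X_demo, y_demo = make_classification(n_samples=500, n_features=12, random_state=0)

mlp_demo = MLPClassifier(
    hidden_layer_sizes=(50, 30),
    early_stopping=True,        # hold out part of the training data for validation
    validation_fraction=0.1,    # 10% of the training data is held out
    n_iter_no_change=10,        # stop after 10 epochs without sufficient improvement
    max_iter=500,
    random_state=42,
)
mlp_demo.fit(X_demo, y_demo)

# loss_curve_ and validation_scores_ each hold one entry per epoch
print("Epochs run:", mlp_demo.n_iter_)
print("Final training loss:", round(mlp_demo.loss_curve_[-1], 4))
print("Best validation accuracy:", round(mlp_demo.best_validation_score_, 4))
```

Note that the validation curve here tracks accuracy rather than log loss, so it is read "higher is better".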

---

## **🔍 Interpreting with SHAP**

### **What are SHAP Values?**
SHAP (SHapley Additive exPlanations) values help interpret the contribution of each feature in the model's prediction. SHAP values indicate how much each feature pushes a prediction higher or lower compared to the average prediction.

### **Using SHAP with Our Model**
1. **SHAP Summary Plot**: Shows feature importance by displaying the average SHAP value of each feature.
2. **SHAP Force Plot**: Visualizes individual predictions, highlighting the positive or negative contribution of each feature.
3. **SHAP Dependence Plot**: Examines the effect of individual features in the context of interactions with other features.

---

## **📊 Evaluation Metrics**

To evaluate the model, we use:
- **Accuracy**: The fraction of applicants whose admission outcome is predicted correctly.
- **Precision**: Of the applicants predicted as admitted, the fraction who were actually admitted.
- **Recall**: Of the applicants who were actually admitted, the fraction the model identifies.
- **F1-score**: The harmonic mean of precision and recall, useful when the classes are imbalanced.
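The classification metrics used in this notebook (accuracy, precision, recall, F1) can be computed with `sklearn.metrics`. A minimal sketch on hypothetical labels (1 = Admit, 0 = Deny), with comments showing the arithmetic:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels (1 = Admit, 0 = Deny) for illustration only
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 1]

print("Accuracy: ", accuracy_score(y_true_demo, y_pred_demo))      # 5 of 8 correct -> 0.625
print("Precision:", precision_score(y_true_demo, y_pred_demo))     # 3 TP / 5 predicted Admit -> 0.6
print("Recall:   ", recall_score(y_true_demo, y_pred_demo))        # 3 TP / 4 actual Admit -> 0.75
print("F1-score: ", round(f1_score(y_true_demo, y_pred_demo), 4))  # 2PR / (P + R) -> 0.6667
```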

---

## **🎯 Summary and Insights**
By combining scaling, overfitting reduction, loss-curve visualizations, and SHAP values, we gain a comprehensive understanding of the model's performance and feature importance. This project not only predicts MBA admission outcomes but also sheds light on how each feature influences the model's predictions. 🎓💼

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, log_loss
from sklearn.model_selection import GridSearchCV, learning_curve

# Helper functions: train an MLP one epoch at a time, then plot its loss curves

def our_mlp(mlp, num_epochs):
    
    # Prepare lists to track the loss
    train_losses = []
    test_losses = []
    
    # Training loop
    for epoch in range(num_epochs):
        # Train with partial fit (incremental learning)
        mlp.partial_fit(X_train, y_train, classes=np.unique(y_train))

        # Calculate training loss
        train_loss = log_loss(y_train, mlp.predict_proba(X_train))
        train_losses.append(train_loss)

        # Calculate test loss
        test_loss = log_loss(y_test, mlp.predict_proba(X_test))
        test_losses.append(test_loss)
        if epoch % 50 == 0:
            print(f"Epoch {epoch+1}/{num_epochs} - Training Loss: {train_loss:.4f}, Test Loss: {test_loss:.4f}")
    return (train_losses, test_losses)
    
def plot_mlp(train_losses, test_losses):
    # Plot the training and test loss curves
    plt.figure(figsize=(10, 6))
    plt.plot(range(1, len(train_losses) + 1), train_losses, '-', color='r', label='Training loss')
    plt.plot(range(1, len(test_losses) + 1), test_losses, '-', color='g', label='Test loss')
    plt.title('Training and Test Loss Curves (MLPClassifier)')
    plt.xlabel('Epoch')
    plt.ylabel('Log Loss')
    plt.legend(loc='best')
    plt.grid()
    plt.show()

In [None]:
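# Train the MLP with default settings (ReLU hidden layers, Adam solver)
mlp = MLPClassifier(hidden_layer_sizes=(50, 30), random_state=42)
mlp.fit(X_train, y_train)

# Predict on test data
y_pred = mlp.predict(X_test)

# Evaluate performance: accuracy, per-class report, and confusion matrix
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, y_pred, target_names=["Deny", "Admit"]))
print(confusion_matrix(y_test, y_pred))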

In [None]:
mlp = MLPClassifier(hidden_layer_sizes=(50, 30), max_iter=1, warm_start=True, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 300)
plot_mlp(train_losses, test_losses)

### Explanation
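- **Hidden Layers**: We define two hidden layers with 50 and 30 neurons, respectively. More neurons and layers can increase the model's ability to learn but may also lead to overfitting.
- **Max Iterations**: With the default `adam` solver, `max_iter` is the maximum number of epochs (passes over the training data) used by `fit`. The `our_mlp` helper instead calls `partial_fit`, which performs a single epoch per call, so the number of epochs there is set by `num_epochs` rather than `max_iter`.
- **Activation Function**: The default activation for hidden layers is **ReLU**, which lets the model capture non-linear patterns; for a binary target, the output layer uses the logistic (sigmoid) function.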

---

## **📊 Evaluating Model Performance:**

To evaluate the MLP model, we use several metrics:

1. **Accuracy**: Measures the overall correctness of predictions.
2. **Classification Report**: Provides **precision**, **recall**, and **F1-score** for each class, offering a detailed view of model performance.
3. **Confusion Matrix**: Shows the counts of **true positives**, **true negatives**, **false positives**, and **false negatives**, helping us understand where the model makes mistakes.
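To see how precision and recall follow from the confusion matrix counts, here is a small sketch on hypothetical labels (1 = Admit, 0 = Deny); it also prints a `classification_report`.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical labels for illustration only
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 1]

# For binary labels [0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

precision = tp / (tp + fp)   # how many predicted admits were real admits
recall = tp / (tp + fn)      # how many real admits were found
f1 = 2 * precision * recall / (precision + recall)
print(f"Precision={precision:.3f}, Recall={recall:.3f}, F1={f1:.3f}")

print(classification_report(y_true_demo, y_pred_demo, target_names=["Deny", "Admit"]))
```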

---

## **💡 Insights on Neural Network Performance:**

- **Hidden Layer Size**: Adding more neurons or layers can improve performance but may also require more computational resources and may lead to overfitting if not properly regularized.
- **Scaling**: Neural networks are sensitive to input scale, hence scaling is crucial for effective training.
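One way to keep scaling correct is to bundle `StandardScaler` and the network in a `Pipeline`, so the scaler is always fit on training data only (including inside cross-validation). A minimal sketch, assuming synthetic data with one feature deliberately put on a much larger scale; this is an alternative to the manual scaling used in this notebook, not what the notebook does.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption); feature 0 is rescaled to mimic a large-range feature like GMAT
X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=0)
X_demo[:, 0] = X_demo[:, 0] * 100 + 600

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# The pipeline fits the scaler on the training split only and reuses it on the test split
pipe_demo = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50, 30), max_iter=500, random_state=42),
)
pipe_demo.fit(X_tr, y_tr)
print(f"Test accuracy: {pipe_demo.score(X_te, y_te):.3f}")
```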

In this document, we explored how neural networks can be applied to predict MBA admissions. We went through data preprocessing, model training, and performance evaluation, offering insights into how MLP parameters affect model performance.

# 🌟 Reducing Overfitting in MLPClassifier: A Comprehensive Guide

When training an `MLPClassifier`, overfitting can hinder performance on unseen data. This guide explores various strategies to mitigate overfitting effectively. Let's dive into each method and make your model robust and generalizable! 🎉 

---

## 🛠️ 1. Regularization with `alpha`

- **What It Does**: The `alpha` parameter controls L2 regularization, which penalizes large weights, reducing model complexity.
- **How to Use It**:
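  ```python
  mlp = MLPClassifier(hidden_layer_sizes=(50, 30), alpha=0.01, max_iter=300, random_state=42)
  ```
- **Pro Tip**: Start with a small `alpha` (e.g., 0.0001, scikit-learn's default) and increase it gradually. Too high an `alpha` can lead to underfitting. 🎛️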
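To choose `alpha` more systematically, `validation_curve` compares training and cross-validated scores across a range of values. A minimal sketch assuming synthetic data in place of the MBA training set; the `alpha` grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the scaled training data (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=0)

alphas = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
train_scores, val_scores = validation_curve(
    MLPClassifier(hidden_layer_sizes=(50, 30), max_iter=500, random_state=42),
    X_demo, y_demo,
    param_name="alpha",
    param_range=alphas,
    cv=5,
)

# Scores have shape (len(alphas), n_folds); average over the folds
for a, tr, va in zip(alphas, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"alpha={a:<7g} train={tr:.3f} validation={va:.3f}")
```

A large gap between training and validation scores suggests overfitting; if both scores drop as `alpha` grows, the model is likely underfitting.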

In [None]:
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(50, 30), max_iter=1000, alpha=0.0001, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 300)
plot_mlp(train_losses, test_losses)

In [None]:
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(50, 30), max_iter=1000, alpha=0.1, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 300)
plot_mlp(train_losses, test_losses)

---

## 🔍 2. Reduce Model Complexity

- **Why**: Smaller networks have fewer parameters, making it harder for the model to memorize the training data.
- **Example**:
  ```python
  mlp = MLPClassifier(hidden_layer_sizes=(30,), max_iter=300, random_state=42)
  ```
- **Configurations to Try**: Test hidden layer sizes like `(50,)`, `(30, 20)`, or `(50, 50)`. Finding the balance is key! ⚖️
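Model complexity can be compared directly by counting trainable parameters: each fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A small sketch; `count_mlp_params` is a hypothetical helper and the feature count of 20 is an assumption (in practice use `X_train.shape[1]`).

```python
def count_mlp_params(n_features, hidden_layer_sizes, n_outputs=1):
    """Count weights + biases in a fully connected MLP (hypothetical helper).

    Each layer with n_in inputs and n_out units contributes
    n_in * n_out weights and n_out biases, i.e. (n_in + 1) * n_out.
    """
    sizes = [n_features, *hidden_layer_sizes, n_outputs]
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# Hypothetical feature count for illustration
n_features_demo = 20
for layers in [(10,), (30,), (50,), (30, 20), (50, 30), (50, 50)]:
    print(f"{str(layers):<10} -> {count_mlp_params(n_features_demo, layers):>5} parameters")
```

The single output unit matches how `MLPClassifier` handles a binary target, so the counts line up with the shapes in `coefs_` and `intercepts_`.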

In [None]:
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, alpha=0.1, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 300)
plot_mlp(train_losses, test_losses)

In [None]:
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, alpha=0.1, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 1000)
plot_mlp(train_losses, test_losses)


# 🌟 Understanding SHAP Values: A Quick Guide

SHAP (SHapley Additive exPlanations) values are a powerful tool to interpret machine learning models. They help us see how much each feature contributes to a specific prediction, making even complex models more understandable.

---

### ✨ Key Concepts

- **SHAP Value**: Measures each feature's contribution to a prediction.
  - **Positive SHAP value** ➡️ The feature pushes the prediction higher.
  - **Negative SHAP value** ➡️ The feature pushes the prediction lower.

### 🔍 How SHAP Values Work

1. **Base Value** 🎯: This is the average prediction of the model over a reference (background) dataset, serving as the starting point for each prediction.
2. **Feature Contributions** 📊: SHAP values show how much each feature “pushes” the prediction away from the base value, calculated by considering all possible combinations of features.
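The "all possible combinations" idea can be made concrete with a brute-force computation for a tiny hypothetical model with three features. This sketch replaces features outside each subset with a single baseline value; `shap.KernelExplainer` instead approximates Shapley values with a weighted linear regression over sampled subsets and averages over a background dataset, so treat this as an illustration of the definition rather than of SHAP's implementation.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    # Hypothetical model: an additive term plus an interaction between features 1 and 2
    return 2 * x[0] + x[1] * x[2]

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every feature subset.

    Features outside a subset S take their baseline value, so
    v(S) = f(x_S, baseline_rest). The cost grows as 2 ** n_features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

x_demo = [1.0, 2.0, 3.0]
baseline_demo = [0.0, 0.0, 0.0]
phi_demo = shapley_values(toy_model, x_demo, baseline_demo)
print("SHAP values:", [round(p, 6) for p in phi_demo])   # [2.0, 3.0, 3.0]
print("base value + sum:", round(toy_model(baseline_demo) + sum(phi_demo), 6))
print("f(x):", toy_model(x_demo))
```

The additive term gives feature 0 its full contribution, the interaction `x[1] * x[2]` is split equally between features 1 and 2, and the values sum to f(x) minus the base value.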

---

### 💡 Example

Imagine a model predicts a credit score of **0.6** (on a scale of 0 to 1), and the base value is **0.5**:

- **Salary** 🤑 with a SHAP value of **0.2** means it **increases** the score by 0.2.
- **Debt** 💸 with a SHAP value of **-0.1** means it **decreases** the score by 0.1.

So, the final prediction **0.6** is the base value **0.5** plus the contributions from salary and debt: 0.5 + 0.2 - 0.1 = 0.6.

---

### 📝 Summary

SHAP values let us open the “black box” of machine learning models, revealing how each feature influences the outcome. They provide transparency and insight, helping you trust and understand your model's predictions.

---

With SHAP values, even the most complex models become interpretable, empowering you to make informed decisions with confidence! 🎉

# 🔎 Using SHAP Values for Model Interpretation in MLPClassifier

SHAP (SHapley Additive exPlanations) values provide insights into feature contributions to model predictions, which is especially helpful in understanding and refining models trained with scikit-learn's `MLPClassifier`.

## Why Use SHAP?
- **Feature Importance**: SHAP values help interpret which features are driving the model’s predictions and can reveal if specific features overly influence the model.
- **Overfitting Detection**: If certain features dominate the predictions, it might indicate overfitting.

In [None]:
import shap
# Explain the model's predictions using SHAP

# Randomly sample 20 instances from the training set as the SHAP background data
background = shap.sample(X_train, 20)

explainer = shap.KernelExplainer(mlp.predict, background, feature_names=X.columns)


In [None]:

shap_values = explainer(X_test)  # Calculate SHAP values for the test set

# Understanding SHAP Plots for Model Interpretation

Using SHAP values provides insights into feature importance and individual predictions. This document explains how to interpret different types of SHAP plots, including summary, force, and dependence plots.

---

## 1. SHAP Summary Plot (Bar)

```python
# Summary plot of SHAP values (bar plot for feature importance)
shap.summary_plot(shap_values, X_test, plot_type="bar")
```
### What It Shows
- **Purpose**: Provides a high-level view of feature importance across the entire dataset.
- **Interpretation**: Each bar represents the average absolute SHAP value of a feature, indicating its overall importance in the model.
- **Insight**: Longer bars indicate features that have a larger impact on model predictions on average. This is useful for identifying the most influential features.

In [None]:
shap.summary_plot(shap_values, X_test, plot_type="bar", feature_names=X.columns)  # Bar plot for feature importance


---

## 2. SHAP Summary Plot (Density)

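```python
# Summary plot of SHAP values (one dot per instance and feature, colored by feature value)
shap.summary_plot(shap_values, X_test)
```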
### What It Shows
- **Purpose**: Displays the distribution of SHAP values for each feature, highlighting how they contribute to different predictions.
- **Interpretation**: Each point represents a SHAP value for a feature in a specific instance. Points are color-coded by feature values (e.g., blue for low and red for high).
- **Insight**: The spread of SHAP values for each feature shows its influence across various instances. Features with both positive and negative SHAP values indicate they contribute to increasing or decreasing the prediction based on their values.

In [None]:
# Detailed SHAP value plot for individual features (traditional summary plot)
shap.summary_plot(shap_values, X_test, feature_names=X.columns)

---

## 3. SHAP Force Plot (Single Prediction)


### What It Shows
- **Purpose**: Explains a single prediction by showing how each feature value pushes the prediction away from the expected value (baseline).
- **Interpretation**: Features that push the prediction higher are shown in red, while those pushing it lower are in blue. The length of each segment represents the strength of the feature’s impact.
- **Insight**: The force plot helps in understanding the main drivers behind a specific prediction. It is particularly useful in identifying key factors that lead to higher or lower model outputs.

In [None]:
import numpy as np
import pandas as pd

# Round SHAP values and the expected value to 2 decimals
rounded_shap_values = np.round(shap_values[0].values, 2)
rounded_expected_value = np.round(explainer.expected_value, 2)

# Convert the first instance of X_test to a pandas Series and round values
X_test_rounded = pd.Series(X_test[0], index=X.columns).apply(lambda x: f"{x:.2f}")

# Generate the force plot
shap.force_plot(rounded_expected_value, rounded_shap_values, X_test_rounded, 
                matplotlib=True, feature_names=X.columns)



# SHAP Force Plot Explanation

This document provides a brief explanation of the SHAP force plot and how each feature affects the model's prediction for a single instance.

---

### Structure of the Plot

1. **Base Value**: This is the starting point (or baseline) of the model's prediction: the average model output over the SHAP background sample (the 20 training rows passed to `KernelExplainer`). In this plot, it is near zero.

2. **f(x)**: This represents the final prediction for this specific instance after considering the contributions of each feature. Here, `f(x)` is **-0.01**.

3. **Feature Contributions (Arrows)**:
   - **Red Arrows**: Features pushing the prediction higher (positive impact on `f(x)`).
   - **Blue Arrows**: Features pushing the prediction lower (negative impact on `f(x)`).
   - The length of each arrow represents the strength of the feature’s impact.

---

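### Interpretation of Each Feature

The feature values shown are standardized (z-scores from `StandardScaler`), not raw values; for example, `gpa = 1.78` means a GPA about 1.8 standard deviations above the training mean.
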
- **`international = 1.53`**: Positively impacts `f(x)`, pushing it up by about 0.15.
- **`gpa = 1.78`**: Has a positive effect, increasing `f(x)` by around 0.10.
- **`gender = 1.33`**: Slightly increases `f(x)`, with an impact close to 0.05.
- **`application_id = 0.16`**: Has a minor negative effect, decreasing `f(x)` by around 0.02.
- **`gmat = -0.84`**: Negatively impacts `f(x)`, lowering it by about 0.10.
- **`major_Humanities = -0.82`**: Strongly decreases `f(x)`, with a negative impact of around 0.15.

### Summary

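The model's output `f(x)` of **-0.01** is the combined result of all these feature effects. Positive contributions from `international`, `gpa`, and `gender` are roughly offset by the negative contributions from `gmat` and `major_Humanities`. Because the explainer wraps `mlp.predict`, the model output is a class label (0 = Deny, 1 = Admit), so an `f(x)` close to 0 corresponds to a Deny prediction; the small negative value most likely comes from rounding the SHAP values to two decimals.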

This plot allows us to understand which features have the most influence on this particular prediction and whether their impact is positive or negative.


---

## 4. SHAP Dependence Plot (for a specific feature)

### What It Shows
- **Purpose**: Shows the effect of a specific feature on the prediction while considering the impact of other interacting features.
- **Interpretation**: The x-axis represents values of the selected feature, and the y-axis shows SHAP values for that feature. Color coding represents another interacting feature, providing context on how interactions affect predictions.
- **Insight**: This plot highlights both the individual impact of a feature and any interactions with other features. For example, if 'gmat' score has a high SHAP value at certain levels, the plot can reveal how it influences predictions in the context of other variables.

In [None]:
shap.plots.scatter(shap_values[:, "gmat"], color=shap_values[:, "gender"])

In [None]:
shap.plots.scatter(shap_values[:, "gmat"], color=shap_values[:, "race_Black"])

In [None]:
# SHAP dependence plot for 'gmat' (static matplotlib plot; the coloring feature is chosen automatically)
shap.dependence_plot('gmat', shap_values.values, X_test, feature_names=X.columns)

---

## Summary

By combining these SHAP plots, you can gain a comprehensive understanding of your model:
- The **bar summary plot** offers a quick look at overall feature importance.
- The **density summary plot** provides more detail on feature impact distributions.
- The **force plot** explains individual predictions.
- The **dependence plot** reveals interactions and the influence of specific features.

Using these plots together enables a deep dive into model behavior, helping diagnose potential overfitting, feature dominance, and instance-specific impacts.