## 🌟 **Overview**
In this project, we’ll build a Neural Network to predict the **sale price** of houses based on several key features. Using this predictive model, we’ll dive into the relationships between house characteristics and their impact on price. To evaluate our model’s performance, we’ll use essential regression metrics such as **Mean Absolute Error (MAE)**, **Mean Squared Error (MSE)**, and **R-squared (R²)**. Additionally, we’ll visualize the Neural Network structure to understand how it processes and learns from the data.

---

### 📊 **Dataset Description**
Our dataset contains rich information about houses, with various attributes that influence pricing:

- **price** 💲: Final sale price of the house (Target Variable)
- **area** 📐: Total area of the house in square feet
- **bedrooms** 🛏️: Number of bedrooms
- **bathrooms** 🛁: Number of bathrooms
- **stories** 🏢: Number of stories (floors)
- **mainroad** 🚗: Whether the house is located on the main road (`yes`/`no`)
- **guestroom** 🛋️: Indicates if there’s a guest room in the house (`yes`/`no`)
- **basement** ⬇️: Whether the house has a basement (`yes`/`no`)
- **hotwaterheating** 🔥: Availability of hot water heating (`yes`/`no`)
- **airconditioning** ❄️: Indicates if there is air conditioning (`yes`/`no`)
- **parking** 🚙: Number of parking spaces available
- **prefarea** 🌳: Whether the house is located in a preferred area (`yes`/`no`)
- **furnishingstatus** 🛠️: Furnishing status of the house (e.g., furnished, semi-furnished, unfurnished)

---

### 🎯 **Objective**
Our objective is to use a Neural Network model to predict the **price** of a house using the provided features. We aim to achieve:

- **Accurate Predictions**: Maximize the model’s performance in predicting house prices.
- **Feature Understanding**: Gain insights into how each feature impacts the sale price.
- **Model Evaluation**: Use regression metrics to ensure that the model generalizes well.



## ✨ Data Cleaning and Imputation for Missing Values ✨

In this notebook, we will handle missing data from a housing dataset. Specifically, we will identify missing values and impute them using an appropriate strategy (mode imputation for the `hotwaterheating` column).

---

## 📝 1. Load the Dataset

Let's begin by loading the dataset and displaying a few records to get an overview of the data.


In [None]:
import pandas as pd

# Load the dataset
file_path = 'Housing_V0.csv'
housing_data = pd.read_csv(file_path)

# Display the first few rows of the dataset
housing_data.head()



## 🔍 2. Check for Missing Values

Before performing any data imputation, it's essential to check for missing values in the dataset.


In [None]:
# Check for any missing values
null_values = housing_data.isnull().sum()

# Display the columns with null values
null_values[null_values > 0]


---

## 🔧 3. Impute Missing Values

In [None]:
# Impute missing values using the mode of 'hotwaterheating'
# Calculate the mode for 'hotwaterheating'
mode_value = housing_data['hotwaterheating'].mode()[0]

# Assign back to the column without chaining
housing_data['hotwaterheating'] = housing_data['hotwaterheating'].fillna(mode_value)

# Verify if any missing values remain
null_values_after_imputation = housing_data.isnull().sum()

# Display the result
null_values_after_imputation

---
## 🚫 4. Removing Outliers Based on Price

In [None]:
# Calculate Q1, Q3, and IQR for 'price'
Q1 = housing_data['price'].quantile(0.25)
Q3 = housing_data['price'].quantile(0.75)
IQR = Q3 - Q1

# Define the upper and lower bounds for outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Flag rows whose 'price' falls outside the bounds
is_outlier = (housing_data['price'] < lower_bound) | (housing_data['price'] > upper_bound)
outliers = housing_data[is_outlier]

# Remove outliers from the dataset
housing_data_cleaned = housing_data[~is_outlier]

# Verify the number of remaining rows
housing_data_cleaned.shape

In [None]:
housing_data_cleaned

## 🔄5. One-Hot Encoding the 'furnishingstatus' Column

In [None]:
# One-hot encode 'furnishingstatus', dropping the first level to avoid a redundant column
housing_data = pd.get_dummies(housing_data_cleaned, columns=['furnishingstatus'], drop_first=True)

# Map the binary 'yes'/'no' columns to 1/0
housing_data = housing_data.replace({'yes': 1, 'no': 0})

## 📏6. Scaling the 'price' Column Between 0 and 1

In [None]:
from sklearn.preprocessing import MinMaxScaler

# Initialize the MinMaxScaler
scaler = MinMaxScaler()

# Scale the 'price' column to the range [0, 1]
housing_data[['price']] = scaler.fit_transform(housing_data[['price']])

# Display the first few rows of the scaled dataset
housing_data.head()


## ✂️7. Splitting the Dataset into Training and Testing Sets

In [None]:
from sklearn.model_selection import train_test_split

# Define the feature set and target variable
X = housing_data.drop(columns=['price'])  # All columns except the target
y = housing_data['price']

# Split the dataset (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the resulting datasets
print("Training set shape (X_train, y_train):", X_train.shape, y_train.shape)
print("Testing set shape (X_test, y_test):", X_test.shape, y_test.shape)


## 📏8. Scaling the Features

In [None]:
from sklearn.preprocessing import StandardScaler
# Initialize the StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data and transform both training and testing sets
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# X_train and X_test now hold the standardized features (as NumPy arrays)



# 🏠 **Understanding Neural Networks with the Housing Dataset**

## **Overview:**
In this document, we will explore how neural networks work using the Housing dataset. After preprocessing the data, we will:
1. Train a Multi-Layer Perceptron (MLP) model to predict the sale price of a house.
2. Understand the important parameters of MLP and how they affect model performance.
3. Evaluate model performance using regression metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared.
4. Dive deeper into how **hidden layers** and **activation functions** contribute to the model's learning ability.

---

## **🗃️ Dataset Description:**
The Housing dataset contains features such as area, number of bedrooms and bathrooms, presence of a basement, and other relevant attributes that help determine the sale price of a house. The target variable is **'price'**, representing the sale price of the house.

---

## **🔄 Preprocessing the Data:**
1. **Scaling**: Standardize features to ensure better model performance.
2. **Train-Test Split**: Divide data into training and testing sets (e.g., 80%-20%).

---

## **🤖 How Neural Networks Work:**

Multi-Layer Perceptrons (MLP) are a type of feedforward neural network, composed of an input layer, one or more hidden layers, and an output layer. Here’s how they work:

### 1. **Layers and Neurons**:
   - **Input Layer**: Receives input data (features) and passes it to the next layer.
   - **Hidden Layers**: Contain neurons that learn features from the input data. The number of neurons and layers determines the model's capacity to learn complex patterns.
   - **Output Layer**: Provides the final output, which, in this case, is a continuous prediction (house price).

### 2. **Activation Functions**:
   - Activation functions introduce **non-linearity** into the model, allowing it to learn complex relationships. Common activation functions include **ReLU** (Rectified Linear Unit) and **sigmoid**.
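
As a minimal sketch of the two activations named above, written in plain NumPy and independent of scikit-learn's internals (the function names `relu` and `sigmoid` are illustrative, not library calls):

```python
import numpy as np

def relu(z):
    # ReLU keeps positive inputs and clips negative inputs to zero
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
relu_out = relu(z)
sigmoid_out = sigmoid(z)
print("ReLU:   ", relu_out)
print("Sigmoid:", sigmoid_out)
```

Both functions bend the straight line `z`, which is what lets stacked layers represent more than a single linear map.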

### 3. **Training and Backpropagation**:
   - **Forward Pass**: The input data passes through the network, and predictions are generated.
   - **Loss Calculation**: A loss function (e.g., **Mean Squared Error**) calculates the error between predicted and actual values.
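   - **Backpropagation**: Computes the gradient of the loss with respect to every weight. An optimization algorithm (e.g., **stochastic gradient descent**, or Adam, the `MLPRegressor` default) then uses these gradients to update the weights and reduce the loss.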
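
The three steps above can be sketched by hand for a tiny network. This is an illustration under stated assumptions, not what `MLPRegressor` runs internally: the data is synthetic, the network has one hidden layer of 8 ReLU units, and the update is plain full-batch gradient descent with a made-up learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (made up for illustration): 64 samples, 3 features, linear target
X_toy = rng.normal(size=(64, 3))
y_toy = X_toy @ np.array([0.5, -1.0, 2.0]) + 0.1

# One hidden layer with 8 ReLU units and a single linear output
W1 = rng.normal(scale=0.5, size=(3, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)
lr = 0.02  # learning rate (assumed value)
losses = []

for step in range(300):
    # Forward pass
    h_pre = X_toy @ W1 + b1
    h = np.maximum(0.0, h_pre)
    y_hat = (h @ W2 + b2).ravel()

    # Loss calculation (MSE)
    err = y_hat - y_toy
    losses.append(np.mean(err ** 2))

    # Backpropagation: gradients of the MSE with respect to each parameter
    grad_out = (2.0 / len(y_toy)) * err[:, None]
    grad_W2 = h.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_h_pre = (grad_out @ W2.T) * (h_pre > 0)
    grad_W1 = X_toy.T @ grad_h_pre
    grad_b1 = grad_h_pre.sum(axis=0)

    # Gradient descent update
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The recorded losses should fall over the run, which is the same signal the training curve plots.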

### 4. **Reduce Overfitting**
   - **Regularization (`alpha`)**: Tune `alpha` for L2 regularization to penalize large weights.
   - **Reduce Model Complexity**: Adjust model depth and neuron counts to control complexity.
   - **Cross-Validation**: Apply cross-validation to find optimal parameters (like `hidden_layer_sizes` and `alpha`).

---

## **📉 Plotting Training vs. Testing Curves**

To assess the model’s learning dynamics and detect overfitting or underfitting, we plot the training and testing loss curves over multiple epochs:
- **Training Curve**: Tracks the model’s performance on the training data.
- **Testing Curve**: Shows how well the model generalizes to unseen data.

This plot helps identify points where overfitting may begin.
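
One hedged way to read such curves numerically is to look for the epoch with the lowest test loss and at the gap between the two curves. The loss values below are made up for illustration:

```python
import numpy as np

# Made-up loss curves: training loss keeps falling,
# test loss bottoms out and then rises again
train_curve = [0.90, 0.50, 0.30, 0.20, 0.15, 0.12, 0.10]
test_curve = [0.95, 0.60, 0.42, 0.38, 0.40, 0.45, 0.52]

# Epoch (1-based) with the lowest test loss
best_epoch = int(np.argmin(test_curve)) + 1

# Gap between test and training loss at each epoch
gap = np.array(test_curve) - np.array(train_curve)

print(f"Lowest test loss at epoch {best_epoch}")
print("Test minus training loss per epoch:", np.round(gap, 2))
```

A test loss that rises while the training loss keeps falling, as after epoch 4 here, is the usual sign that overfitting has started.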

---

## **🔍 Interpreting with SHAP**

### **What are SHAP Values?**
SHAP (SHapley Additive exPlanations) values help interpret the contribution of each feature in the model's prediction. SHAP values indicate how much each feature pushes a prediction higher or lower compared to the average prediction.

### **Using SHAP with Our Model**
1. **SHAP Summary Plot**: Shows feature importance by displaying the mean absolute SHAP value of each feature.
2. **SHAP Force Plot**: Visualizes individual predictions, highlighting the positive or negative contribution of each feature.
3. **SHAP Dependence Plot**: Examines the effect of individual features in the context of interactions with other features.

---

## **📊 Evaluation Metrics**

To evaluate the model, we use:
- **Mean Absolute Error (MAE)**: Measures the average magnitude of errors in predictions.
- **Mean Squared Error (MSE)**: Penalizes larger errors, useful for gauging prediction accuracy.
- **R-squared (R²)**: Indicates the proportion of variance explained by the model.
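
A minimal sketch of computing these three metrics with `sklearn.metrics`. The arrays are made-up values for illustration only:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.5])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MAE: {mae:.3f}, MSE: {mse:.3f}, R²: {r2:.3f}")
```

In the notebook, `y_test` and `mlp.predict(X_test)` would take the place of the toy arrays. Because `price` was scaled to [0, 1], MAE and MSE are then expressed in scaled units.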

---

## **🎯 Summary and Insights**
By combining scaling, overfitting reduction, visualizations, and SHAP values, we gain a comprehensive understanding of the model’s performance and feature importance. This project not only predicts house prices but also provides valuable insights into the impact of each feature, making it useful for real estate valuation and analysis.

In [None]:
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Train an MLP one epoch at a time and record the train/test loss after each epoch
def our_mlp(mlp, num_epochs):
    # Prepare lists to track the loss
    train_losses = []
    test_losses = []
    
    # Training loop
    for epoch in range(num_epochs):
        # Train with partial fit (incremental learning)
        mlp.partial_fit(X_train, y_train)

        # Calculate training loss (MSE for regression)
        train_loss = mean_squared_error(y_train, mlp.predict(X_train))
        train_losses.append(train_loss)

        # Calculate test loss (MSE for regression)
        test_loss = mean_squared_error(y_test, mlp.predict(X_test))
        test_losses.append(test_loss)

        if epoch % 100 == 0:
            print(f"Epoch {epoch+1}/{num_epochs} - Training Loss: {train_loss:.4f}, Test Loss: {test_loss:.4f}")
    return train_losses, test_losses
    
def plot_mlp(train_losses, test_losses):
    # Plot the training and test loss curves
    plt.figure(figsize=(10, 6))
    plt.plot(range(1, len(train_losses) + 1), train_losses, '-', color='r', label='Training loss (MSE)')
    plt.plot(range(1, len(test_losses) + 1), test_losses, '-', color='g', label='Test loss (MSE)')
    plt.title('Training and Test Loss Curves (MLPRegressor)')
    plt.xlabel('Epoch')
    plt.ylabel('Mean Squared Error')
    plt.legend(loc='best')
    plt.grid()
    plt.show()


In [None]:
mlp = MLPRegressor(hidden_layer_sizes=(50, 30), max_iter=1000, warm_start=True, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 1000)
plot_mlp(train_losses, test_losses)

### Explanation
- **Hidden Layers**: We define two hidden layers with 50 and 30 neurons, respectively. More neurons and layers can increase the model's ability to learn but may also lead to overfitting.
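- **Max Iterations**: `max_iter` caps the number of training iterations when `fit` is called. The training loop here calls `partial_fit`, which performs a single pass over the data per call, so the number of epochs is set by `num_epochs` instead.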
- **Activation Function**: The default activation function is **ReLU** for hidden layers, allowing the model to handle non-linearity in data.

---

## **📊 Evaluating Model Performance:**

To evaluate the MLP model, we use regression metrics:

1. **Mean Absolute Error (MAE)**: The average absolute difference between predicted and actual prices.
2. **Mean Squared Error (MSE)**: The average squared difference, which penalizes large errors more heavily.
3. **R-squared (R²)**: The proportion of variance in the price that the model explains.

---

## **💡 Insights on Neural Network Performance:**

- **Hidden Layer Size**: Adding more neurons or layers can improve performance but may also require more computational resources and may lead to overfitting if not properly regularized.
- **Scaling**: Neural networks are sensitive to input scale, hence scaling is crucial for effective training.

In this document, we explored how neural networks can be applied to predict house prices. We went through data preprocessing, model training, and performance evaluation, offering insights into how MLP parameters affect model performance.

# 🌟 Reducing Overfitting in MLPRegressor: A Comprehensive Guide

When training an `MLPRegressor`, overfitting can hinder performance on unseen data. This guide explores various strategies to mitigate overfitting effectively. Let's dive into each method and make your model robust and generalizable! 🎉 

---

## 🛠️ 1. Regularization with `alpha`

- **What It Does**: The `alpha` parameter controls L2 regularization, which penalizes large weights, reducing model complexity.
- **How to Use It**:
  ```python
  mlp = MLPRegressor(hidden_layer_sizes=(50, 30), alpha=0.0001, random_state=42)
  ```
- **Pro Tip**: Start with a small `alpha` (e.g., 0.0001) and increase it gradually. Too high an `alpha` can lead to underfitting. 🎛️
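
To make the penalty concrete, here is a generic sketch of an L2-penalized loss. The helper name `l2_penalized_loss` is hypothetical, and scikit-learn applies its own scaling constants to the penalty, so this shows the general form rather than the library's exact loss:

```python
import numpy as np

def l2_penalized_loss(y_true, y_pred, weights, alpha):
    # Data term: mean squared error between targets and predictions
    mse = np.mean((y_true - y_pred) ** 2)
    # Penalty term: alpha times the sum of squared weights over all layers
    penalty = alpha * sum(np.sum(w ** 2) for w in weights)
    return mse + penalty

# Made-up numbers for illustration
y_true = np.array([1.0, 2.0])
y_pred = np.array([1.5, 1.5])
weights = [np.array([[1.0, -2.0]]), np.array([[3.0]])]

loss_small_alpha = l2_penalized_loss(y_true, y_pred, weights, alpha=0.0001)
loss_large_alpha = l2_penalized_loss(y_true, y_pred, weights, alpha=0.1)
print(loss_small_alpha, loss_large_alpha)
```

With the same weights, a larger `alpha` makes the penalty dominate the loss, so the optimizer is pushed toward smaller weights.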

In [None]:
from sklearn.neural_network import MLPRegressor
mlp = MLPRegressor(hidden_layer_sizes=(50, 30), max_iter=10000, alpha=0.0001, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 1000)
plot_mlp(train_losses, test_losses)

In [None]:
from sklearn.neural_network import MLPRegressor
mlp = MLPRegressor(hidden_layer_sizes=(50, 30), max_iter=10000, alpha=0.1, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 1000)
plot_mlp(train_losses, test_losses)

---

## 🔍 2. Reduce Model Complexity

- **Why**: Smaller networks have fewer parameters, making it harder for the model to memorize the training data.
- **Example**:
  ```python
  mlp = MLPRegressor(hidden_layer_sizes=(30,), max_iter=300, random_state=42)
  ```
- **Configurations to Try**: Test hidden layer sizes like `(50,)`, `(30, 20)`, or `(50, 50)`. Finding the balance is key! ⚖️
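
A rough way to compare configurations is to count their trainable parameters. The helper `count_mlp_parameters` below is hypothetical, and `n_inputs=13` is an assumption based on the 11 non-target columns plus 2 dummy columns from `furnishingstatus`:

```python
def count_mlp_parameters(n_inputs, hidden_layer_sizes, n_outputs=1):
    # Layer widths from input to output, e.g. [13, 50, 30, 1]
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each layer has (fan_in * fan_out) weights plus fan_out biases
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

# n_inputs=13 is an assumption about the preprocessed feature count
for hidden in [(10,), (30, 20), (50, 30), (300, 30)]:
    print(hidden, count_mlp_parameters(13, hidden))
```

Under that assumption, `(10,)` has 151 parameters while `(50, 30)` has 2261, which gives a sense of how quickly capacity grows with layer width.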

In [None]:
mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=10000, alpha=0.1, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 1000)
plot_mlp(train_losses, test_losses)

In [None]:
mlp = MLPRegressor(hidden_layer_sizes=(20,), max_iter=1000, alpha=0.1, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 1000)
plot_mlp(train_losses, test_losses)



## 🎯 3. Cross-Validation

- **Purpose**: Use cross-validation to tune hyperparameters, such as `hidden_layer_sizes`, `alpha`, and `learning_rate_init`.
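- **Implementation**: `GridSearchCV` from scikit-learn evaluates every combination in a parameter grid with k-fold cross-validation.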

- **Note**: Cross-validation provides a robust evaluation of model performance and helps find the best configuration.
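
As a small sketch of k-fold cross-validation for a single configuration: the synthetic data from `make_regression` stands in for the housing features, and the hyperparameter values are assumptions chosen only to keep the example fast.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the housing features (illustration only)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Putting the scaler in a pipeline refits it on each training fold
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(10,), alpha=0.01, max_iter=500, random_state=42),
)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2")
print("R² per fold:", scores.round(3))
print("Mean R²:", round(float(scores.mean()), 3))
```

Each fold is held out once while the model trains on the other four, so the mean score depends less on one lucky or unlucky split. A convergence warning may appear with so few iterations; it does not affect the mechanics shown here.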



Applying one or more of these strategies can help mitigate overfitting and improve the generalization of an `MLPRegressor`. Experimenting with combinations of these techniques often yields the best results.

In [None]:
from sklearn.model_selection import GridSearchCV

param_grid = {
  'hidden_layer_sizes': [(10,),  (30, 10), (50, 20), (100, 30), (300, 30)],
  'alpha': [0.1,  0.01, 0.001]
}
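grid_search = GridSearchCV(MLPRegressor(max_iter=1000, random_state=42), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Best hyperparameter combination and its mean cross-validated R² score
print(grid_search.best_params_, grid_search.best_score_)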

In [None]:
mlp = MLPRegressor(hidden_layer_sizes=(300, 30), max_iter=1000, alpha=0.1, random_state=42)
(train_losses, test_losses) = our_mlp(mlp, 1000)
plot_mlp(train_losses, test_losses)


# 🌟 Understanding SHAP Values: A Quick Guide

SHAP (SHapley Additive exPlanations) values are a powerful tool to interpret machine learning models. They help us see how much each feature contributes to a specific prediction, making even complex models more understandable.

---

### ✨ Key Concepts

- **SHAP Value**: Measures each feature's contribution to a prediction.
  - **Positive SHAP value** ➡️ The feature pushes the prediction higher.
  - **Negative SHAP value** ➡️ The feature pushes the prediction lower.

### 🔍 How SHAP Values Work

1. **Base Value** 🎯: This is the average prediction of the model across all data points, serving as the starting point for each prediction.
2. **Feature Contributions** 📊: SHAP values show how much each feature “pushes” the prediction away from the base value, calculated by considering all possible combinations of features.
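
The "all possible combinations" idea can be sketched by brute force for a tiny model. This is a simplified illustration, not how the `shap` library computes values: the toy `model` is hypothetical, and absent features are replaced by a single baseline point rather than averaged over a background dataset.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Toy model (hypothetical): two linear terms plus one interaction
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

def shapley_values(f, x, baseline):
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest stay at the baseline
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight for a coalition of this size
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Marginal contribution of feature i to this coalition
                total += weight * (value(set(subset) | {i}) - value(set(subset)))
        phi.append(total)
    return phi

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
base_value = model(baseline)
print("SHAP values:", phi)
print("base + sum:", base_value + sum(phi), "prediction:", model(x))
```

The base value plus the sum of the contributions reproduces the prediction, which is the additivity property that the plots rely on. The loop visits every subset, so the cost doubles with each added feature; this is why practical explainers approximate.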

---

### 💡 Example

Imagine a model predicts a credit score of **0.6** (on a scale of 0 to 1), and the base value is **0.5**:

- **Salary** 🤑 with a SHAP value of **0.2** means it **increases** the score by 0.2.
- **Debt** 💸 with a SHAP value of **-0.1** means it **decreases** the score by 0.1.

So, the final prediction **0.6** is the base value **0.5** plus the contributions from salary and debt: 0.5 + 0.2 - 0.1 = 0.6.

---

### 📝 Summary

SHAP values let us open the “black box” of machine learning models, revealing how each feature influences the outcome. They provide transparency and insight, helping you trust and understand your model's predictions.

---

With SHAP values, even the most complex models become interpretable, empowering you to make informed decisions with confidence! 🎉

# 🔎 Using SHAP Values for Model Interpretation in MLPRegressor

SHAP (SHapley Additive exPlanations) values provide insights into feature contributions to model predictions, which is especially helpful in understanding and refining models trained with scikit-learn's `MLPRegressor`.

## Why Use SHAP?
- **Feature Importance**: SHAP values help interpret which features are driving the model’s predictions and can reveal if specific features overly influence the model.
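- **Overfitting Detection**: If a few features dominate the predictions far more than domain knowledge suggests, the model may be relying on spurious patterns. This is a hint worth checking against the test loss, not proof of overfitting.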

In [None]:
import shap
# Explain the model's predictions using SHAP

# Randomly sample 20 instances from the training set for the SHAP background
background = shap.sample(X_train, 20)

explainer = shap.KernelExplainer(mlp.predict, background, feature_names=X.columns)
shap_values = explainer(X_test)  # Calculate SHAP values for the test set

# Understanding SHAP Plots for Model Interpretation

Using SHAP values provides insights into feature importance and individual predictions. This document explains how to interpret different types of SHAP plots, including summary, force, and dependence plots.

---

## 1. SHAP Summary Plot (Bar)

```python
# Summary plot of SHAP values (bar plot for feature importance)
shap.summary_plot(shap_values, X_test, plot_type="bar")
```
### What It Shows
- **Purpose**: Provides a high-level view of feature importance across the entire dataset.
- **Interpretation**: Each bar represents the average absolute SHAP value of a feature, indicating its overall importance in the model.
- **Insight**: Longer bars indicate features that have a larger impact on model predictions on average. This is useful for identifying the most influential features.
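
The bar lengths can be reproduced by hand as the mean absolute SHAP value per feature. The matrix below is made up for illustration; in the notebook, `shap_values.values` would take its place:

```python
import numpy as np

# Made-up SHAP matrix: rows are instances, columns are features
feature_names = ["area", "bathrooms", "parking"]
shap_matrix = np.array([
    [0.10, -0.02, 0.01],
    [-0.08, 0.05, 0.00],
    [0.12, -0.03, -0.01],
])

# Bar length in the summary bar plot: mean absolute SHAP value per feature
importance = np.abs(shap_matrix).mean(axis=0)

# Sort features from most to least important
order = np.argsort(importance)[::-1]
ranking = [feature_names[i] for i in order]
for i in order:
    print(f"{feature_names[i]}: {importance[i]:.3f}")
```

Taking the absolute value first matters: `area` has both positive and negative contributions here, and a plain mean would let them cancel.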

In [None]:
shap.summary_plot(shap_values, X_test, plot_type="bar", feature_names=X.columns)  # Bar plot for feature importance


---

## 2. SHAP Summary Plot (Density)

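```python
# Summary plot of SHAP values (one point per instance and feature)
shap.summary_plot(shap_values, X_test, feature_names=X.columns)
```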
### What It Shows
- **Purpose**: Displays the distribution of SHAP values for each feature, highlighting how they contribute to different predictions.
- **Interpretation**: Each point represents a SHAP value for a feature in a specific instance. Points are color-coded by feature values (e.g., blue for low and red for high).
- **Insight**: The spread of SHAP values for each feature shows its influence across various instances. Features with both positive and negative SHAP values indicate they contribute to increasing or decreasing the prediction based on their values.

In [None]:
# Detailed SHAP value plot for individual features (traditional summary plot)
shap.summary_plot(shap_values, X_test, feature_names=X.columns)

---

## 3. SHAP Force Plot (Single Prediction)


### What It Shows
- **Purpose**: Explains a single prediction by showing how each feature value pushes the prediction away from the expected value (baseline).
- **Interpretation**: Features that push the prediction higher are shown in red, while those pushing it lower are in blue. The length of each segment represents the strength of the feature’s impact.
- **Insight**: The force plot helps in understanding the main drivers behind a specific prediction. It is particularly useful in identifying key factors that lead to higher or lower model outputs.

In [None]:
import numpy as np
import pandas as pd

# Round SHAP values and the expected value to 2 decimals
rounded_shap_values = np.round(shap_values[0].values, 2)
rounded_expected_value = np.round(explainer.expected_value, 2)

# Convert the first instance of X_test to a pandas Series and round values
X_test_rounded = pd.Series(X_test[0], index=X.columns).apply(lambda x: f"{x:.2f}")

# Generate the force plot
shap.force_plot(rounded_expected_value, rounded_shap_values, X_test_rounded, 
                matplotlib=True, feature_names=X.columns)


# SHAP Force Plot Explanation

This document provides a brief explanation of the SHAP force plot and how each feature affects the model's prediction for a single instance.

---

### Structure of the Plot

1. **Base Value**: This is the starting point (or baseline) of the model’s prediction. With `KernelExplainer`, it is the average prediction over the background sample drawn from the training data. In this plot, the base value is approximately **0.25**.

2. **f(x)**: This represents the final prediction for this specific instance after considering the contributions of each feature. Here, `f(x)` is **0.34**, expressed on the 0-1 scaled price.

3. **Feature Contributions (Arrows)**:
   - **Red Arrows**: Features that push the prediction higher (positive impact on `f(x)`).
   - **Blue Arrows**: Features that push the prediction lower (negative impact on `f(x)`).
   - The length of each arrow represents the strength of the feature’s impact on the prediction.

---

### Interpretation of Each Feature

- **`bedrooms = 1.39`**: This feature positively impacts `f(x)`, increasing it by around 0.09.
- **`parking = 0.37`**: Adds positively to the prediction, pushing it up by approximately 0.04.
- **`area = 0.34`**: Also contributes positively, with an impact around 0.03.
- **`basement = 1.34`**: Slightly increases `f(x)`, with an impact close to 0.05.
- **`bathrooms = 1.54`**: Has a significant positive effect, increasing `f(x)` by approximately 0.07.

These features collectively increase the model's prediction.

- **`mainroad = -2.46`**: This feature has the largest negative impact, reducing `f(x)` by about 0.09.
- **`prefarea = -0.5`**: Decreases `f(x)` slightly, with an impact close to 0.02.
- **`hotwaterheating = -0.28`**: Reduces the prediction by approximately 0.01.
- **`airconditioning = -0.67`**: Has a small negative effect, lowering `f(x)` by around 0.04.

These features collectively decrease the model's prediction.

### Summary

The model’s final prediction of **0.34** is the combined result of all these feature contributions. Positive contributions from features such as `bedrooms`, `parking`, `area`, `basement`, and `bathrooms` are partially balanced by negative contributions from `mainroad`, `prefarea`, `hotwaterheating`, and `airconditioning`.

This plot provides insight into which features have the strongest influence on this prediction and whether their impact is positive or negative, helping to explain the model's decision for this particular instance.


---

## 4. SHAP Dependence Plot (for a specific feature)

### What It Shows
- **Purpose**: Shows the effect of a specific feature on the prediction while considering the impact of other interacting features.
- **Interpretation**: The x-axis represents values of the selected feature, and the y-axis shows SHAP values for that feature. Color coding represents another interacting feature, providing context on how interactions affect predictions.
- **Insight**: This plot highlights both the individual impact of a feature and any interactions with other features. For example, if `bedrooms` has a high SHAP value at certain levels, the plot can reveal how it influences predictions in the context of other variables.

In [None]:
# SHAP dependence plot for a specific feature (here 'bedrooms')
shap.dependence_plot('bedrooms', shap_values.values, X_test, feature_names=X.columns)

In [None]:
shap.plots.scatter(shap_values[:, "bedrooms"], color=shap_values[:, "bathrooms"])