# Home Loan Approval Prediction with TensorFlow in Google Colab

This notebook demonstrates how to load and preprocess a sample home loan dataset, build and train a neural network model using TensorFlow (with regularization techniques), and evaluate its performance by making predictions.

## Dataset Description

The sample home loan dataset contains the following attributes:
- **LoanAmount:** Requested loan amount in USD
- **InterestRate:** Loan interest rate in percent
- **LoanTerm:** Duration of the loan in years
- **CreditScore:** Applicant's credit score
- **AnnualIncome:** Applicant's annual income in USD
- **PropertyValue:** Estimated property value in USD
- **EmploymentStatus:** Employment status (e.g., Employed, Self-employed, Unemployed)
- **LoanPurpose:** Purpose of the loan (e.g., Purchase, Refinance)
- **ApprovalStatus:** Target variable (0 = Rejected, 1 = Approved)

In [24]:
# Import necessary libraries
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import regularizers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score

print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.18.0


## 1. Data Loading and Pre-processing

For demonstration purposes, we create a sample dataset. In a real assignment, you would load the data from a CSV file.

In [25]:
# Create a sample home loan dataset
data = {
    'LoanAmount': [200000, 150000, 250000, 180000, 220000, 130000, 175000, 205000, 160000, 190000],
    'InterestRate': [3.5, 4.0, 3.8, 4.2, 3.9, 4.1, 3.7, 3.6, 4.3, 3.9],
    'LoanTerm': [30, 15, 30, 20, 30, 15, 20, 30, 15, 20],
    'CreditScore': [700, 680, 720, 690, 710, 670, 705, 695, 680, 715],
    'AnnualIncome': [85000, 75000, 95000, 80000, 90000, 70000, 78000, 87000, 73000, 82000],
    'PropertyValue': [250000, 200000, 270000, 230000, 260000, 190000, 240000, 255000, 210000, 225000],
    'EmploymentStatus': ['Employed', 'Self-employed', 'Employed', 'Employed', 'Self-employed', 'Unemployed', 'Employed', 'Employed', 'Self-employed', 'Employed'],
    'LoanPurpose': ['Purchase', 'Refinance', 'Purchase', 'Purchase', 'Refinance', 'Purchase', 'Purchase', 'Refinance', 'Purchase', 'Refinance'],
    'ApprovalStatus': [1, 0, 1, 0, 1, 0, 1, 1, 0, 1]
}

df = pd.DataFrame(data)
df.head()

|   | LoanAmount | InterestRate | LoanTerm | CreditScore | AnnualIncome | PropertyValue | EmploymentStatus | LoanPurpose | ApprovalStatus |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 200000 | 3.5 | 30 | 700 | 85000 | 250000 | Employed | Purchase | 1 |
| 1 | 150000 | 4.0 | 15 | 680 | 75000 | 200000 | Self-employed | Refinance | 0 |
| 2 | 250000 | 3.8 | 30 | 720 | 95000 | 270000 | Employed | Purchase | 1 |
| 3 | 180000 | 4.2 | 20 | 690 | 80000 | 230000 | Employed | Purchase | 0 |
| 4 | 220000 | 3.9 | 30 | 710 | 90000 | 260000 | Self-employed | Refinance | 1 |
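
For a real dataset, the loading step could look roughly like the sketch below. It assumes the CSV uses the column names from the dataset description; the helper `load_loan_data` and the file name `home_loans.csv` are hypothetical, and an in-memory string stands in for the file so the example runs on its own.

```python
import io
import pandas as pd

# Column names taken from the dataset description above.
EXPECTED_COLUMNS = [
    'LoanAmount', 'InterestRate', 'LoanTerm', 'CreditScore', 'AnnualIncome',
    'PropertyValue', 'EmploymentStatus', 'LoanPurpose', 'ApprovalStatus',
]

def load_loan_data(source):
    """Read loan records from a CSV path or file-like object (hypothetical helper)."""
    loaded = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in loaded.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Keep only the expected columns, in a fixed order.
    return loaded[EXPECTED_COLUMNS]

# Stand-in for a real file such as 'home_loans.csv' (hypothetical name).
sample_csv = io.StringIO(
    "LoanAmount,InterestRate,LoanTerm,CreditScore,AnnualIncome,"
    "PropertyValue,EmploymentStatus,LoanPurpose,ApprovalStatus\n"
    "200000,3.5,30,700,85000,250000,Employed,Purchase,1\n"
    "150000,4.0,15,680,75000,200000,Self-employed,Refinance,0\n"
)
loaded_df = load_loan_data(sample_csv)
print(loaded_df.shape)  # (2, 9)
```

Checking the columns up front makes a mislabeled or truncated file fail immediately rather than later inside the preprocessing pipeline.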


### Preprocessing

Before training, we scale the numerical features to the [0, 1] range and one-hot encode the categorical variables. We separate the features (`X`) from the target (`y`, the `ApprovalStatus` column), then hold out 20% of the rows as a test set.

In [26]:
# Define numerical and categorical columns
numerical_cols = ['LoanAmount', 'InterestRate', 'LoanTerm', 'CreditScore', 'AnnualIncome', 'PropertyValue']
categorical_cols = ['EmploymentStatus', 'LoanPurpose']

# Create a ColumnTransformer for preprocessing
preprocessor = ColumnTransformer(
    transformers=[
        ('num', MinMaxScaler(), numerical_cols),
        ('cat', OneHotEncoder(), categorical_cols)
    ]
)

# Separate features and target
X = df.drop('ApprovalStatus', axis=1)
y = df['ApprovalStatus']

# Apply the preprocessing pipeline
X_processed = preprocessor.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.2, random_state=42)

print('Training data shape:', X_train.shape)
print('Test data shape:', X_test.shape)

Training data shape: (8, 11)
Test data shape: (2, 11)
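
The cell above fits the scaler and encoder on all ten rows before splitting, so the test rows contribute to the scaling statistics. A common alternative, sketched here as an assumption rather than as the notebook's method, is to split the raw rows first and fit the preprocessor on the training rows only. The sketch uses a three-feature subset of the sample data, and the names `demo_df` and `leak_free` are illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# A subset of the sample dataset, repeated here so the sketch is self-contained.
demo_df = pd.DataFrame({
    'LoanAmount': [200000, 150000, 250000, 180000, 220000,
                   130000, 175000, 205000, 160000, 190000],
    'CreditScore': [700, 680, 720, 690, 710, 670, 705, 695, 680, 715],
    'EmploymentStatus': ['Employed', 'Self-employed', 'Employed', 'Employed',
                         'Self-employed', 'Unemployed', 'Employed', 'Employed',
                         'Self-employed', 'Employed'],
    'ApprovalStatus': [1, 0, 1, 0, 1, 0, 1, 1, 0, 1],
})

X_raw = demo_df.drop('ApprovalStatus', axis=1)
y_all = demo_df['ApprovalStatus']

# Split the raw rows first so the test rows never influence the fitted statistics.
X_train_raw, X_test_raw, y_tr, y_te = train_test_split(
    X_raw, y_all, test_size=0.2, random_state=42
)

leak_free = ColumnTransformer(
    transformers=[
        ('num', MinMaxScaler(), ['LoanAmount', 'CreditScore']),
        # Categories not seen during fit are encoded as all zeros.
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['EmploymentStatus']),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_train_fit = leak_free.fit_transform(X_train_raw)  # statistics from training rows only
X_test_fit = leak_free.transform(X_test_raw)        # test rows reuse those statistics

print(X_train_fit.shape, X_test_fit.shape)
```

With this ordering, scaled test values can fall slightly outside [0, 1], which is expected: the minimum and maximum come from the training rows alone.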


## 2. Model Building

Now, we'll build a multi-layer neural network model using TensorFlow's Keras API. Our model will include a hidden layer with 16 neurons, incorporate L2 regularization (weight decay), and use a dropout layer to reduce overfitting.

In [27]:
from tensorflow.keras import Input

# Define the model: one regularized hidden layer, dropout, and a sigmoid output
model = Sequential([
    # Declare the input shape with an Input layer rather than passing input_shape to Dense
    Input(shape=(X_train.shape[1],)),
    Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

# Compile the model for binary classification
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Display the model summary
model.summary()


### Model Summary Explanation

The model summary displays:
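- **Hidden Layer:** A Dense layer with 16 neurons and ReLU activation that processes the input. The layer includes L2 regularization, which adds a penalty proportional to the sum of the squared weights to the loss and so discourages large weights.
- **Dropout Layer:** During training, this layer sets each hidden-layer output to zero with probability 0.5 and scales the remaining outputs by 2 so the expected activation is unchanged, which helps prevent the neurons from co-adapting too much. The layer is inactive when making predictions.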
- **Output Layer:** A Dense layer with 1 neuron using sigmoid activation, which outputs a probability for binary classification (loan approval).
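
As a rough illustration of the two regularizers described above, the NumPy sketch below computes an L2 penalty in the form that `regularizers.l2(0.01)` uses (0.01 times the sum of squared weights) and applies inverted dropout at rate 0.5. The helper names `l2_penalty` and `inverted_dropout` and the example weights are illustrative assumptions, not part of the notebook's model.

```python
import numpy as np

def l2_penalty(weights, l2=0.01):
    """Penalty added to the loss: l2 * sum of squared weights."""
    return l2 * np.sum(np.square(weights))

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)

# Made-up weights: 0.01 * (0.25 + 1.0 + 4.0 + 0.0) is approximately 0.0525
weights = np.array([[0.5, -1.0], [2.0, 0.0]])
print(l2_penalty(weights))

# With rate=0.5, every entry of the result is either 0.0 (dropped) or 2.0 (kept and rescaled)
activations = np.ones((1, 16))
dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(dropped)
```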

The summary also shows the total number of trainable parameters, allowing you to gauge the model’s complexity.
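
The parameter count can be checked by hand. The sketch below assumes the 11 input features reported by `X_train.shape`; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 11  # columns in X_train after preprocessing

hidden_params = dense_param_count(n_features, 16)  # 11*16 + 16 = 192
output_params = dense_param_count(16, 1)           # 16*1 + 1 = 17
dropout_params = 0                                 # Dropout has no trainable weights

total_params = hidden_params + dropout_params + output_params
print(total_params)  # 209
```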

## 3. Model Training

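We now train the model with early stopping. The callback monitors the validation loss, stops training once it has failed to improve for three consecutive epochs (`patience=3`), and restores the weights from the best epoch (`restore_best_weights=True`), which helps avoid overfitting.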

In [29]:
# Set up EarlyStopping callback to monitor validation loss
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train the model; validation_split=0.2 holds out the last 20% of the training rows for validation
history = model.fit(
    X_train,
    y_train,
    validation_split=0.2,
    epochs=50,
    batch_size=32,
    callbacks=[early_stop]
)

Epoch 1/50
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 166ms/step - accuracy: 0.5000 - loss: 1.1010 - val_accuracy: 0.5000 - val_loss: 0.7699
Epoch 2/50
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 248ms/step - accuracy: 0.1667 - loss: 1.0810 - val_accuracy: 0.5000 - val_loss: 0.7701
Epoch 3/50
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 128ms/step - accuracy: 0.1667 - loss: 1.2972 - val_accuracy: 0.5000 - val_loss: 0.7702
Epoch 4/50
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 138ms/step - accuracy: 0.5000 - loss: 0.9578 - val_accuracy: 0.5000 - val_loss: 0.7704
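
Training stopped after epoch 4 because the validation loss was lowest at epoch 1 and did not improve in the three epochs that followed. The pure-Python sketch below mimics that patience logic on the logged values; `early_stopping_epoch` is a simplified stand-in for illustration, not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stopped_epoch, best_epoch), both 1-based, for a loss to minimize."""
    best_loss = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ran for every epoch supplied.
    return len(val_losses), best_epoch

# Validation losses copied from the training log above.
logged_val_losses = [0.7699, 0.7701, 0.7702, 0.7704]
stopped, best = early_stopping_epoch(logged_val_losses, patience=3)
print(stopped, best)  # 4 1
```

Because `restore_best_weights=True` is set, the model used for prediction carries the weights from epoch 1, not those from epoch 4.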


## 4. Making Predictions and Evaluating the Model

After training, we use our model to predict approvals on the test set. We convert prediction probabilities to binary class labels by thresholding at 0.5 and then calculate the accuracy of the model.

In [30]:
# Make predictions on the test set
predictions = model.predict(X_test)

# Display raw prediction probabilities
print("Raw Predictions:\n", predictions)

# Convert probabilities to binary predictions (threshold = 0.5)
binary_predictions = (predictions > 0.5).astype(int)
print("Binary Predictions:\n", binary_predictions)

# Calculate test accuracy
accuracy = accuracy_score(y_test, binary_predictions)
print("Test Accuracy:", accuracy)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 70ms/step
Raw Predictions:
 [[0.41258135]
 [0.31705838]]
Binary Predictions:
 [[0]
 [0]]
Test Accuracy: 1.0
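
An accuracy of 1.0 on two test rows deserves a closer look. The NumPy sketch below reuses the probabilities printed above and compares the model with a constant majority-class baseline. The labels in `y_true` are inferred from the reported accuracy (both predictions are 0 and both are correct), and the variable names are illustrative.

```python
import numpy as np

# Probabilities copied from the output above; y_true is inferred from the
# reported accuracy of 1.0 with both predictions equal to 0.
probabilities = np.array([[0.41258135], [0.31705838]])
y_true = np.array([0, 0])

# Threshold at 0.5 and flatten from shape (n, 1) to (n,)
y_pred = (probabilities > 0.5).astype(int).ravel()

# Confusion-matrix counts
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
model_accuracy = float(np.mean(y_pred == y_true))

# Baseline: always predict the most common class in the test labels.
majority_class = int(np.bincount(y_true).argmax())
baseline_accuracy = float(np.mean(y_true == majority_class))

print(tp, tn, fp, fn)                     # 0 2 0 0
print(model_accuracy, baseline_accuracy)  # 1.0 1.0
```

Both scores are 1.0, so this test set cannot distinguish the model from one that rejects every application. A larger held-out set containing both classes would be needed for a meaningful estimate.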


## 5. Conclusion

In this assignment, we loaded and preprocessed a sample home loan dataset, built a multi-layer neural network that incorporates regularization techniques such as dropout, weight decay, and early stopping, and evaluated its performance by calculating prediction accuracy on the test set.

Submit your completed `.ipynb` file via Brightspace. Please ensure your file name includes your last name (e.g., "Smith.ipynb").