# Student Loan Risk with Deep Learning

In [51]:
# Imports
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
from pathlib import Path

---

## Prepare the data to be used on a neural network model

### Step 1: Read the `student-loans.csv` file into a Pandas DataFrame. Review the DataFrame, looking for columns that could eventually define your features and target variables.   

In [52]:
# Read the csv into a Pandas DataFrame
file_path = "https://static.bc-edx.com/ai/ail-v-1-0/m18/lms/datasets/student-loans.csv"
loans_df = pd.read_csv(file_path)

# Review the DataFrame
loans_df.head()

| | payment_history | location_parameter | stem_degree_score | gpa_ranking | alumni_success | study_major_code | time_to_completion | finance_workshop_score | cohort_ranking | total_loan_score | financial_aid_score | credit_ranking |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 0 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 0 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 0 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 1 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 0 |


In [53]:
# Review the data types associated with the columns
loans_df.dtypes

| Column | dtype |
|---|---|
| payment_history | float64 |
| location_parameter | float64 |
| stem_degree_score | float64 |
| gpa_ranking | float64 |
| alumni_success | float64 |
| study_major_code | float64 |
| time_to_completion | float64 |
| finance_workshop_score | float64 |
| cohort_ranking | float64 |
| total_loan_score | float64 |


In [54]:
# Check the credit_ranking value counts
loans_df["credit_ranking"].value_counts()

| credit_ranking | count |
|---|---|
| 1 | 855 |
| 0 | 744 |
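The two classes are close to balanced (855 vs. 744). A useful reference point is the accuracy of a trivial model that always predicts the more common class, and any worthwhile network should beat it. The following is a minimal sketch. It assumes pandas and uses a synthetic Series, rebuilt from the counts above, as a stand-in for `loans_df["credit_ranking"]`:

```python
import pandas as pd

# Synthetic stand-in for loans_df["credit_ranking"], rebuilt from the
# value counts shown above (855 ones, 744 zeros)
credit_ranking = pd.Series([1] * 855 + [0] * 744, name="credit_ranking")

# normalize=True returns each class's share instead of raw counts
class_share = credit_ranking.value_counts(normalize=True)
print(class_share.round(4))

# Accuracy of a model that always predicts the most common class
majority_baseline = class_share.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.4f}")
```

With these counts the baseline is about 53.5%, which gives context for the test accuracy reported later.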


### Step 2: Using the preprocessed data, create the features (`X`) and target (`y`) datasets. The target dataset should be defined by the preprocessed DataFrame column “credit_ranking”. The remaining columns should define the features dataset.

In [55]:
# Define the target set y using the credit_ranking column
y = loans_df["credit_ranking"]

# Display a sample of y
print(y.head())

0    0
1    0
2    0
3    1
4    0
Name: credit_ranking, dtype: int64


In [56]:
# Define features set X by selecting all columns but credit_ranking
X = loans_df.drop("credit_ranking", axis=1)

# Review the features DataFrame
print(X.head())

   payment_history  location_parameter  stem_degree_score  gpa_ranking  \
0              7.4                0.70               0.00          1.9   
1              7.8                0.88               0.00          2.6   
2              7.8                0.76               0.04          2.3   
3             11.2                0.28               0.56          1.9   
4              7.4                0.70               0.00          1.9   

   alumni_success  study_major_code  time_to_completion  \
0           0.076              11.0                34.0   
1           0.098              25.0                67.0   
2           0.092              15.0                54.0   
3           0.075              17.0                60.0   
4           0.076              11.0                34.0   

   finance_workshop_score  cohort_ranking  total_loan_score  \
0                  0.9978            3.51              0.56   
1                  0.9968            3.20              0.68   
2          

### Step 3: Split the features and target sets into training and testing datasets.


In [57]:
# Split the preprocessed data into a training and testing dataset
# Assign the function a random_state equal to 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
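The split above does not stratify by label. One optional variation, which is not what this notebook ran, passes `stratify=y` so that the training and testing sets keep the same share of each `credit_ranking` class. Here is a sketch on synthetic arrays shaped like the loans data (1,599 rows, 11 features). The random feature values are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same size and class counts as the loans data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1599, 11))
y_demo = np.array([1] * 855 + [0] * 744)

# stratify=y_demo keeps the positive share (about 53.5%) equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

print("Train/test sizes:", len(y_tr), len(y_te))
print("Positive share (train):", round(y_tr.mean(), 4))
print("Positive share (test):", round(y_te.mean(), 4))
```

With 1,599 rows and `test_size=0.2`, scikit-learn rounds the test set up to 320 rows. This matches the 320-row support in the classification report further down.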




### Step 4: Use scikit-learn's `StandardScaler` to scale the features data.

In [58]:
# Create a StandardScaler instance
scaler = StandardScaler()

# Fit the scaler to the features training dataset
X_scaler = scaler.fit(X_train)

# Transform the training and testing features with the scaler fitted on the training data
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
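`StandardScaler` subtracts each feature's training-set mean and divides by its training-set standard deviation: `z = (x - mean_train) / std_train`. The sketch below checks that formula against scikit-learn on a small hypothetical matrix. It also shows that the test set is transformed with the statistics learned from the training set rather than refitted:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small hypothetical matrices (rows = students, columns = features)
X_train_demo = np.array([[7.4, 0.70], [7.8, 0.88], [11.2, 0.28], [7.8, 0.76]])
X_test_demo = np.array([[8.0, 0.50]])

scaler_demo = StandardScaler().fit(X_train_demo)

# StandardScaler uses the population standard deviation (ddof=0)
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)

manual_train = (X_train_demo - mean) / std
# The test rows reuse the training mean and std, never their own
manual_test = (X_test_demo - mean) / std

print(np.allclose(manual_train, scaler_demo.transform(X_train_demo)))
print(np.allclose(manual_test, scaler_demo.transform(X_test_demo)))
```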

---

## Compile and Evaluate a Model Using a Neural Network

### Step 1: Create a deep neural network by assigning the number of input features, the number of layers, and the number of neurons on each layer using TensorFlow’s Keras.

> **Hint** You can start with a two-layer deep neural network model that uses the `relu` activation function for both layers.


In [59]:
# Define the number of inputs (features) to the model
number_input_features = X_train.shape[1]

# Review the number of features
print("Number of input features:", number_input_features)

Number of input features: 11


In [61]:
# Define the number of hidden nodes for the first hidden layer
hidden_nodes1 = 6

# Define the number of hidden nodes for the second hidden layer
hidden_nodes2 = 3

# Define the number of neurons in the output layer
output_neurons = 1

In [63]:
# Create the Sequential model instance
model = Sequential()

# Add the first hidden layer
model.add(Dense(hidden_nodes1, activation='relu', input_shape=(number_input_features,)))

# Add the second hidden layer
model.add(Dense(hidden_nodes2, activation='relu'))

# Add the output layer to the model specifying the number of output neurons and activation function
model.add(Dense(output_neurons, activation='sigmoid'))
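The three `Dense` layers form a chain of matrix products. Each layer multiplies its inputs by a weight matrix, adds a bias, and applies its activation. The hidden layers use `relu`, and the output layer uses `sigmoid`, which squashes the result into a probability between 0 and 1. Below is a NumPy sketch of that forward pass. Hypothetical random weights stand in for the ones Keras learns during training (Keras uses its own initializers, not these):

```python
import numpy as np

# Hypothetical random weights and zero biases for an 11 -> 6 -> 3 -> 1 network
rng = np.random.default_rng(42)
n_features, h1, h2, n_out = 11, 6, 3, 1

W1, b1 = rng.normal(size=(n_features, h1)), np.zeros(h1)
W2, b2 = rng.normal(size=(h1, h2)), np.zeros(h2)
W3, b3 = rng.normal(size=(h2, n_out)), np.zeros(n_out)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    """Dense(6, relu) -> Dense(3, relu) -> Dense(1, sigmoid)."""
    a1 = relu(X @ W1 + b1)
    a2 = relu(a1 @ W2 + b2)
    return sigmoid(a2 @ W3 + b3)

X_batch = rng.normal(size=(5, n_features))  # five hypothetical scaled rows
probs = forward(X_batch)
print(probs.shape)
print(probs.ravel())
```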

In [64]:
# Display the Sequential model summary
model.summary()
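`model.summary()` lists a parameter count for each layer. For a `Dense` layer with a bias (the Keras default), that count is `inputs * units + units`. The totals for this 11 → 6 → 3 → 1 network can therefore be checked by hand:

```python
# Layer sizes taken from the cells above: 11 inputs -> 6 -> 3 -> 1
layer_sizes = [11, 6, 3, 1]

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output unit."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [72, 21, 4]
print(sum(per_layer))  # 97
```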

### Step 2: Compile and fit the model using the `binary_crossentropy` loss function, the `adam` optimizer, and the `accuracy` evaluation metric.


In [65]:
# Compile the Sequential model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
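`binary_crossentropy` averages `-(y * log(p) + (1 - y) * log(1 - p))` over the samples, where `y` is the true label and `p` is the predicted probability. Below is a pure-Python sketch of that formula. The clipping constant `eps=1e-7` mirrors the default Keras epsilon, but Keras's internal implementation may differ in detail:

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A model that outputs 0.5 for every student scores ln(2)
print(binary_crossentropy([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))

# Confident, correct predictions push the loss toward 0
print(binary_crossentropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]))
```

A constant 0.5 prediction scores ln 2 ≈ 0.693. The first-epoch training loss of 0.7453 in the log below is therefore consistent with an untrained network starting near chance.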

In [66]:
# Fit the model using 50 epochs and the training data
model_fit = model.fit(X_train_scaled, y_train, epochs=50)

Epoch 1/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.5336 - loss: 0.7453
Epoch 2/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.5504 - loss: 0.7025
Epoch 3/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.5903 - loss: 0.6849
Epoch 4/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6025 - loss: 0.6612
Epoch 5/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6385 - loss: 0.6320
Epoch 6/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.6744 - loss: 0.6236
Epoch 7/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6659 - loss: 0.6138
Epoch 8/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6859 - loss: 0.5968
Epoch 9/50
40/40 ━━━━━━━━━━━━━━━━━━━━
...
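`model.fit` returns a `History` object. Its `history` attribute is a dictionary that maps each metric name to a list of per-epoch values, and loading it into a DataFrame makes the training curve easy to inspect. In the notebook, `pd.DataFrame(model_fit.history)` would do this directly. The sketch below instead uses a hypothetical stand-in dictionary, filled with the first eight epochs as printed in the log above:

```python
import pandas as pd

# Hypothetical stand-in for model_fit.history, using the values printed
# in the training log above for epochs 1-8
history = {
    "accuracy": [0.5336, 0.5504, 0.5903, 0.6025, 0.6385, 0.6744, 0.6659, 0.6859],
    "loss": [0.7453, 0.7025, 0.6849, 0.6612, 0.6320, 0.6236, 0.6138, 0.5968],
}

history_df = pd.DataFrame(history, index=pd.RangeIndex(1, 9, name="epoch"))
print(history_df)

# Epoch with the lowest training loss among those shown
print("Lowest loss at epoch", history_df["loss"].idxmin())
```

If matplotlib is installed, `history_df.plot()` draws both curves. Passing `validation_split` to `fit` would also record `val_loss` and `val_accuracy`, which helps spot overfitting.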

### Step 3: Evaluate the model using the test data to determine the model’s loss and accuracy.


In [67]:
# Evaluate the model loss and accuracy metrics using the evaluate method and the test data
loss, accuracy = model.evaluate(X_test_scaled, y_test)

# Display the model loss and accuracy results
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")

10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7820 - loss: 0.4932
Test Loss: 0.5351, Test Accuracy: 0.7563


### Step 4: Save and export your model to a keras file, and name the file `student_loans.keras`.


In [68]:
# Set the model's file path
model_filepath = "student_loans.keras"

# Export your model to a keras file
model.save(model_filepath)

---
## Predict Loan Repayment Success by Using your Neural Network Model

### Step 1: Reload your saved model.

In [69]:
from tensorflow.keras.models import load_model
# Set the model's file path
model_filepath = "student_loans.keras"

# Load the model to a new object
loaded_model = load_model(model_filepath)

### Step 2: Make predictions on the testing data and save the predictions to a DataFrame.

In [70]:
# Make predictions with the test data
predictions = loaded_model.predict(X_test_scaled)

# Display a sample of the predictions
print("Sample predictions:", predictions[:5])

10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
Sample predictions: [[0.31577554]
 [0.3074245 ]
 [0.7175272 ]
 [0.71387607]
 [0.97108144]]


In [41]:
# Save the predictions to a DataFrame and round the predictions to binary results
# Convert predictions to binary values using a threshold of 0.5
binary_predictions = (predictions > 0.5).astype(int)
predictions_df = pd.DataFrame(binary_predictions, columns=["Prediction"])
print(predictions_df.head())

   Prediction
0           0
1           0
2           1
3           1
4           1
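The sigmoid output is a probability, so turning it into a class label requires a cut-off. The cell above uses 0.5. The sketch below repeats that step on the five sample probabilities printed earlier, and `ravel()` flattens the `(n, 1)` array so it fits a one-column DataFrame. The 0.75 threshold is a hypothetical alternative that shows how a stricter cut-off labels fewer students as `1`:

```python
import numpy as np
import pandas as pd

# The five sample probabilities printed above, shaped (5, 1) like predict() output
predictions_demo = np.array(
    [[0.31577554], [0.3074245], [0.7175272], [0.71387607], [0.97108144]]
)

threshold = 0.5  # the cut-off used in the notebook
labels = (predictions_demo.ravel() > threshold).astype(int)

# Hypothetical stricter cut-off for comparison
strict_labels = (predictions_demo.ravel() > 0.75).astype(int)

predictions_demo_df = pd.DataFrame(
    {"Probability": predictions_demo.ravel(), "Prediction": labels, "Strict": strict_labels}
)
print(predictions_demo_df)
```

Raising the threshold usually increases precision for class `1` at the cost of recall. The right trade-off depends on the relative cost of each kind of error.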


### Step 3: Display a classification report with the y test data and predictions

In [42]:
# Print the classification report with the y test data and predictions
print(classification_report(y_test, predictions_df["Prediction"]))

              precision    recall  f1-score   support

           0       0.75      0.74      0.74       154
           1       0.76      0.77      0.76       166

    accuracy                           0.75       320
   macro avg       0.75      0.75      0.75       320
weighted avg       0.75      0.75      0.75       320
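Each row of the classification report comes from three counts for that class: true positives (TP), false positives (FP), and false negatives (FN). Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. `macro avg` is the unweighted mean across classes, while `weighted avg` weights each class by its support. Here is a pure-Python sketch on ten hypothetical labels (not the loans data):

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class, treating it as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels for ten students
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_demo = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]

for label in (0, 1):
    p, r, f = per_class_metrics(y_true_demo, y_pred_demo, label)
    support = y_true_demo.count(label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={support}")

accuracy = sum(t == p for t, p in zip(y_true_demo, y_pred_demo)) / len(y_true_demo)
print(f"accuracy={accuracy:.2f}")
```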



---
## Discuss creating a recommendation system for student loans

Briefly answer the following questions in the space provided:

**Q1:** Describe the data that you would need to collect to build a recommendation system to recommend student loan options for students. Explain why this data would be relevant and appropriate.

**Answer:**

**Data to Collect:**

**Borrower Financial Information:**
- Credit history
- Income level
- Debt-to-income ratio
- Employment status

**Educational Details:**
- Institution attended
- Program type
- Major
- Academic performance

**Loan Details:**
- Loan amount
- Interest rates
- Repayment terms
- Historical repayment behavior

**Behavioral Data:**
- Application preferences
- Search behavior on loan products
- Previous interactions with loan options

**Relevance:**

1. **Personalization:**  
   Collecting detailed financial and educational profiles allows for tailored loan recommendations that fit each borrower's unique situation.

2. **Risk Management:**  
   Historical repayment behavior and credit metrics help estimate how likely a borrower is to repay, so the system can avoid recommending loans the borrower is unlikely to manage.

3. **Enhanced Matching:**  
   Behavioral data reveals borrower preferences, helping align recommendations with each borrower's financial goals and needs.


**Q2:** Based on the data you chose to use in this recommendation system, would your model be using collaborative filtering, content-based filtering, or context-based filtering? Justify why the data you selected would be suitable for your choice of filtering method.

**Answer:**

**Recommended Method: Content-Based Filtering**

**Justification:**  
Content-based filtering utilizes detailed information about borrowers and loan products. This method relies on specific attributes such as financial metrics, educational background, and particular loan characteristics, allowing for a direct comparison between a borrower’s profile and available loan options.

**Suitability:**  
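Collaborative filtering depends on interaction histories shared across many similar users. Most students take out only a few loans, so those histories are sparse (the cold-start problem), and pooling behavior across borrowers raises privacy concerns. Content-based filtering instead matches each borrower's own attributes against the attributes of each loan product, which suits the structured, sensitive data common in financial services.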
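One common way to implement the profile-to-product comparison described above is to place borrowers and loans in the same feature space and rank loans by cosine similarity. The sketch below is illustrative only. The feature names, loan names, and values are all hypothetical, and a real system would derive them from the data listed in Q1:

```python
import numpy as np

# Hypothetical loan attributes, each scaled to 0-1:
# [affordability (low rate), repayment flexibility, max amount, grace period]
loan_products = {
    "Loan A": np.array([0.9, 0.4, 0.5, 0.6]),
    "Loan B": np.array([0.5, 0.9, 0.7, 0.8]),
    "Loan C": np.array([0.3, 0.2, 1.0, 0.1]),
}

# Hypothetical borrower profile in the same feature space: values a low rate
# and flexible repayment more than a large loan amount
borrower = np.array([0.8, 0.8, 0.3, 0.5])

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {name: cosine_similarity(borrower, vec) for name, vec in loan_products.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

Because the scores rely only on the borrower's and the products' own attributes, a new borrower can receive recommendations without any interaction history.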


**Q3:** Describe two real-world challenges that you would take into consideration while building a recommendation system for student loans. Explain why these challenges would be of concern for a student loan recommendation system.

**Answer:**

**1. Data Privacy and Regulatory Compliance:**

**Challenge:**  
Collecting and processing sensitive financial and personal data is subject to strict regulations, such as GDPR and FERPA.

**Impact:**  
A data breach or failure to comply with these regulations could lead to legal penalties, financial losses, and damage to customer trust.

**Solution:**  
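Encrypt data at rest and in transit, restrict access with role-based controls, collect only the data the recommendations actually need, and review compliance with all relevant regulations regularly.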

---

**2. Bias and Fairness:**

**Challenge:**  
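Machine learning models trained on historical lending data can learn and reproduce the biases in that data. This can happen through proxy features, such as location or institution attended, that correlate with protected characteristics, and it poses a significant risk in financial decision-making.

**Impact:**  
Biased recommendations may result in unfair loan offers and could expose the lender to regulatory action under fair-lending laws such as the U.S. Equal Credit Opportunity Act.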

**Solution:**  
Conduct regular audits of the recommendation system, employ bias mitigation strategies, and maintain transparency in how recommendations are generated.
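Part of a regular audit can be checking whether favorable recommendations reach different groups at similar rates. The sketch below computes each group's rate and the ratio of the lowest rate to the highest. The groups, outcomes, and the 0.8 cut-off are illustrative assumptions, not a compliance standard for lending. The cut-off is borrowed from the "four-fifths" rule of thumb in U.S. employment-selection guidance:

```python
from collections import defaultdict

# Hypothetical audit log: (group, whether the system recommended the
# lowest-cost loan option). Groups and outcomes are made up.
audit_log = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]

def selection_rates(records):
    """Share of favorable outcomes per group."""
    totals, favorable = defaultdict(int), defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        favorable[group] += int(outcome)
    return {g: favorable[g] / totals[g] for g in totals}

rates = selection_rates(audit_log)
# Lowest group rate divided by highest; values well below 1 signal a gap
impact_ratio = min(rates.values()) / max(rates.values())
print(rates)
print(f"impact ratio: {impact_ratio:.2f}")
if impact_ratio < 0.8:  # illustrative threshold only
    print("Flag for review")
```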