## Recommendation system for student loans

**1. Describe the data that you would need to collect to build a recommendation system to recommend student loan options for students. Explain why this data would be relevant and appropriate.**

To build a student loan recommendation system, the following data is essential:

**Student Information:**
- **Academic profile** (GPA, major, institution).
- **Financial profile** (income, savings, credit score).
- **Career goals** (desired job, expected salary).

**Loan Information:**
- **Loan terms** (interest rates, repayment periods, grace periods).
- **Eligibility criteria** (credit score requirements, income thresholds).
- **Lender reputation** (customer reviews, default rates).

**Historical Data:**
- **Past loan applications and outcomes** (approved/rejected, repayment behavior).
- **Student feedback on loans** (satisfaction ratings, complaints).

**Contextual Data:**
- **Economic conditions** (interest rate trends, job market).
- **Regional factors** (cost of living, local job opportunities).

**Why This Data is Relevant:**
- **Personalization:** Student and financial data ensure tailored recommendations.
- **Feasibility:** Loan terms and eligibility criteria help match students with viable options.
- **Adaptability:** Historical and contextual data allow the system to adjust to changing economic conditions.
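
As a rough illustration of how the student and loan data listed above might be represented, the sketch below defines two record types and a simple eligibility check. All class names, field names, and the treatment of the income threshold as a ceiling are assumptions made for this example, not part of any real lender's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudentProfile:
    """Hypothetical student record; every field name is illustrative."""
    gpa: float
    major: str
    institution: str
    annual_income: float
    savings: float
    credit_score: Optional[int]  # many students have no credit history yet
    expected_salary: float


@dataclass
class LoanProduct:
    """Hypothetical loan record; every field name is illustrative."""
    lender: str
    interest_rate: float              # annual rate, e.g. 0.055 for 5.5%
    repayment_years: int
    grace_period_months: int
    min_credit_score: Optional[int]   # None means no credit requirement
    max_income: Optional[float]       # None means no income ceiling (assumed need-based rule)


def is_eligible(student: StudentProfile, loan: LoanProduct) -> bool:
    """Apply the loan's eligibility criteria to one student."""
    if loan.min_credit_score is not None:
        if student.credit_score is None or student.credit_score < loan.min_credit_score:
            return False
    if loan.max_income is not None and student.annual_income > loan.max_income:
        return False
    return True
```

Filtering on eligibility first keeps the recommender from suggesting loans a student cannot obtain, which is the feasibility point made above.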


**2. Based on the data you chose to use in this recommendation system, would your model be using collaborative filtering, content-based filtering, or context-based filtering? Justify why the data you selected would be suitable for your choice of filtering method.**

The model would primarily use **content-based filtering**.

**Why Content-Based Filtering?**
- **Data Suitability:** It relies on the attributes of loans (terms, eligibility) and students (financial profile, academic background).
- **Personalization:** Matches students with loans based on their specific characteristics and needs.
- **Cold Start Problem:** Works well even when historical interaction data (past loan approvals) is limited, which is common for new students.

**Alternative Approach:**
If sufficient historical interaction data is available, **collaborative filtering** could also be used to recommend loans based on similar students' borrowing patterns.
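
A minimal sketch of the content-based idea, assuming both the student's preferences and each loan's attributes have already been encoded as numeric vectors on the same scale. The three features, the loan names, and the use of cosine similarity as the matching score are illustrative assumptions; a production system could use a different encoding or scoring function.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def rank_loans(student_prefs: np.ndarray, loan_features: dict) -> list:
    """Return loan names sorted from most to least similar to the student's preferences."""
    scores = {name: cosine_similarity(student_prefs, feats)
              for name, feats in loan_features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical features, each scaled to [0, 1]:
# [low interest rate, long repayment period, long grace period]
student_prefs = np.array([0.9, 0.2, 0.8])
loan_features = {
    "loan_a": np.array([0.8, 0.3, 0.9]),
    "loan_b": np.array([0.2, 0.9, 0.1]),
}
ranking = rank_loans(student_prefs, loan_features)
print(ranking)
```

No interaction history is needed for this ranking, which is why the approach copes with the cold start case described above.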

**3. Describe two real-world challenges that you would take into consideration while building a recommendation system for student loans. Explain why these challenges would be of concern for a student loan recommendation system.**

Two key challenges in building a student loan recommendation system are:

**Challenge 1: Data Privacy and Security**
- **Concern:** Student financial and academic data is highly sensitive. Any breach could result in identity theft or privacy violations.
- **Solution:** Implement robust encryption, strict access controls, and comply with regulations like **GDPR** and **FERPA**.

**Challenge 2: Bias in Recommendations**
- **Concern:** Historical data might contain biases, leading to unfair loan recommendations that favor certain demographics.
- **Solution:** Regularly audit the model for bias, use diverse training data, and apply fairness constraints to ensure equitable recommendations.
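
One simple form a bias audit could take is comparing how often each demographic group receives a favorable recommendation. The sketch below computes per-group rates and the largest gap between them (a demographic parity check). The group labels and outcomes are made-up sample data, and the function names are hypothetical.

```python
from collections import defaultdict


def recommendation_rates(groups, recommended):
    """Share of students in each group who received a favorable recommendation."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, flag in zip(groups, recommended):
        totals[group] += 1
        favorable[group] += int(flag)
    return {g: favorable[g] / totals[g] for g in totals}


def parity_gap(rates):
    """Largest difference in recommendation rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical audit sample: group labels and whether a low-rate loan was recommended
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
recommended = [1, 1, 1, 0, 1, 0, 0, 0]
rates = recommendation_rates(groups, recommended)
gap = parity_gap(rates)
print(rates, gap)
```

A large gap does not prove unfair treatment on its own, but it flags where a closer review of the training data and features is warranted.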

---

# Student Loan Risk with Deep Learning

In [13]:
# Imports
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
from pathlib import Path
import numpy as np


---

## Prepare the data to be used on a neural network model

### Step 1: Read the `student-loans.csv` file into a Pandas DataFrame. Review the DataFrame, looking for columns that could eventually define your features and target variables.   

In [2]:
# Read the csv into a Pandas DataFrame
file_path = "https://static.bc-edx.com/ai/ail-v-1-0/m18/lms/datasets/student-loans.csv"
loans_df = pd.read_csv(file_path)

# Review the DataFrame
loans_df.head()

| | payment_history | location_parameter | stem_degree_score | gpa_ranking | alumni_success | study_major_code | time_to_completion | finance_workshop_score | cohort_ranking | total_loan_score | financial_aid_score | credit_ranking |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 0 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 0 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 0 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 1 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 0 |


In [None]:
# Review the data types associated with the columns
loans_df.dtypes

payment_history           float64
location_parameter        float64
stem_degree_score         float64
gpa_ranking               float64
alumni_success            float64
study_major_code          float64
time_to_completion        float64
finance_workshop_score    float64
cohort_ranking            float64
total_loan_score          float64
financial_aid_score       float64
credit_ranking              int64
dtype: object

In [None]:
# Check the credit_ranking value counts
loans_df["credit_ranking"].value_counts()

1    855
0    744
Name: credit_ranking, dtype: int64
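
The class counts give a useful reference point for judging the model later: a classifier that always predicts the majority class sets a floor on accuracy. The short sketch below works that out from the counts printed above; the variable names are illustrative.

```python
# Class counts copied from the value_counts() output above
class_counts = {1: 855, 0: 744}

total = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / total

print(f"Total rows: {total}")
print(f"Majority class: {majority_class}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

The two classes are fairly balanced, so accuracy is a reasonable headline metric here, and any trained model should clearly beat this baseline.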

### Step 2: Using the preprocessed data, create the features (`X`) and target (`y`) datasets. The target dataset should be defined by the preprocessed DataFrame column `credit_ranking`. The remaining columns should define the features dataset.

In [8]:
# Define target (y)
y = loans_df['credit_ranking']

In [7]:
# Check the first few rows of the target (y)
print("Target (y):")
display(y.head())

Target (y):


| | credit_ranking |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| 4 | 0 |


In [9]:
# Define features (X): all columns except 'credit_ranking'
X = loans_df.drop(columns=['credit_ranking'])

# Check the first few rows of features (X)
print("Features (X):")
display(X.head())

Features (X):


| | payment_history | location_parameter | stem_degree_score | gpa_ranking | alumni_success | study_major_code | time_to_completion | finance_workshop_score | cohort_ranking | total_loan_score | financial_aid_score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 |


### Step 3: Split the features and target sets into training and testing datasets.


In [10]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Check the shape of the training and testing sets
print("Shape of X_train:", X_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_test:", y_test.shape)

Shape of X_train: (1279, 11)
Shape of X_test: (320, 11)
Shape of y_train: (1279,)
Shape of y_test: (320,)
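
A plain random split does not guarantee that the training and testing sets keep the same class mix as the full dataset. The sketch below shows the optional `stratify` argument of `train_test_split` on synthetic stand-in data with the same class counts as `credit_ranking`. The `X_demo` and `y_demo` arrays are placeholders invented for this example, not the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same class counts as credit_ranking (855 ones, 744 zeros)
y_demo = np.array([1] * 855 + [0] * 744)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

print("Positive share overall:", round(y_demo.mean(), 3))
print("Positive share in train:", round(y_tr.mean(), 3))
print("Positive share in test:", round(y_te.mean(), 3))
```

With classes this close to balanced the difference is small, so stratification is a refinement rather than a requirement for this dataset.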


### Step 4: Use scikit-learn's `StandardScaler` to scale the features data.

In [14]:
# Create a StandardScaler instance
scaler = StandardScaler()

# Fit the scaler to the features training dataset
scaler.fit(X_train)

# Transform the training and testing datasets
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Verify the scaled data
print("Mean of scaled training data:", np.mean(X_train_scaled, axis=0))
print("Standard deviation of scaled training data:", np.std(X_train_scaled, axis=0))

Mean of scaled training data: [-3.97215056e-16 -1.43052975e-16  4.16659149e-17 -1.22220017e-16
  5.83322809e-17 -1.52775021e-17 -8.33318298e-17 -4.38158761e-14
 -3.59437959e-15  1.95829800e-16 -1.15692357e-15]
Standard deviation of scaled training data: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
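
To make the fit and transform steps concrete, the sketch below reproduces standardization by hand with NumPy on synthetic data: each column has its training mean subtracted and is divided by its training standard deviation. The arrays are randomly generated placeholders, not the loan data.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # synthetic training features
test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))    # synthetic test features

# "Fit": learn per-column statistics from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

# "Transform": apply the training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print("Train means:", train_scaled.mean(axis=0))  # approximately 0
print("Train stds:", train_scaled.std(axis=0))    # approximately 1
print("Test means:", test_scaled.mean(axis=0))    # close to, but not exactly, 0
```

Fitting on the training set alone, as the cell above does, keeps information about the test set from leaking into preprocessing. The tiny nonzero means in the output above (on the order of 1e-16) are floating-point rounding, not a scaling error.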


---

## Compile and Evaluate a Model Using a Neural Network

### Step 1: Create a deep neural network by assigning the number of input features, the number of layers, and the number of neurons on each layer using TensorFlow’s Keras.

> **Hint** You can start with a two-layer deep neural network model that uses the `relu` activation function for both layers.


In [16]:
# Define the number of inputs (features) to the model
input_nodes = len(X.columns)

# Review the number of features
print("Number of input nodes (features):", input_nodes)

Number of input nodes (features): 11


In [37]:
# Define the number of hidden nodes for the first hidden layer
hidden_nodes_layer1 =  64

# Define the number of hidden nodes for the second hidden layer
hidden_nodes_layer2 = 32

# Define the number of neurons in the output layer
nb_output_neurons = 1

In [38]:

# Define the model
model = Sequential()

# Declare the input shape explicitly
model.add(tf.keras.Input(shape=(input_nodes,)))

# Add the first hidden layer
model.add(Dense(units=hidden_nodes_layer1, activation='relu'))

# Add the second hidden layer
model.add(Dense(units=hidden_nodes_layer2, activation='relu'))

# Add the output layer (sigmoid for binary classification)
model.add(Dense(units=nb_output_neurons, activation='sigmoid'))



In [19]:
# Display the Sequential model summary
model.summary()
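
The parameter counts that a summary reports for fully connected layers can be worked out by hand: each `Dense` layer has one weight per input-unit pair plus one bias per unit. The sketch below applies that rule to the layer sizes defined above (11 inputs, 64 and 32 hidden units, 1 output). The helper name `dense_params` is made up for this example.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


layer_sizes = [11, 64, 32, 1]  # inputs, hidden layer 1, hidden layer 2, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print("Parameters per layer:", per_layer)
print("Total trainable parameters:", sum(per_layer))
```

With roughly 1,300 training rows, a network of this size has more parameters than samples, which is worth keeping in mind when checking for overfitting.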


### Step 2: Compile and fit the model using the `binary_crossentropy` loss function, the `adam` optimizer, and the `accuracy` evaluation metric.


In [21]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
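
For intuition about what the loss and metric measure, here is a small NumPy sketch of binary cross-entropy and thresholded accuracy. It is a simplified re-implementation for illustration, not the Keras source; the clipping constant `eps` and the sample labels and probabilities are assumptions chosen for the example.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of predictions on the correct side of the threshold."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


# Hypothetical labels and predicted probabilities
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.6, 0.8, 0.9]
print(f"loss={binary_crossentropy(y_true, y_prob):.4f}, "
      f"accuracy={accuracy(y_true, y_prob):.2f}")
```

The loss penalizes confident wrong predictions heavily, while accuracy only checks which side of 0.5 each probability falls on, so the two can move in different directions during training.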


In [22]:
# Fit the model using 50 epochs and the training data
history = model.fit(
    X_train_scaled,  # Scaled training features
    y_train,         # Training labels
    epochs=50,       # Number of epochs
    batch_size=32,   # batch size
    validation_data=(X_test_scaled, y_test),  # Validation data
    verbose=1        # Show progress during training
)

Epoch 1/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.6228 - loss: 0.6623 - val_accuracy: 0.7625 - val_loss: 0.5683
Epoch 2/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7266 - loss: 0.5672 - val_accuracy: 0.7719 - val_loss: 0.5299
Epoch 3/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7244 - loss: 0.5304 - val_accuracy: 0.7656 - val_loss: 0.5162
Epoch 4/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7629 - loss: 0.5027 - val_accuracy: 0.7469 - val_loss: 0.5084
Epoch 5/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7661 - loss: 0.5073 - val_accuracy: 0.7594 - val_loss: 0.5053
Epoch 6/50
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7728 - loss: 0.4900 - val_accuracy: 0.7563 - val_loss: 0.5024
Epoch 7/50
...

### Step 3: Evaluate the model using the test data to determine the model’s loss and accuracy.


In [24]:
# Evaluate the model loss and accuracy
model_loss, model_accuracy = model.evaluate(X_test_scaled, y_test, verbose=2)

# Display the model loss and accuracy results
print(f"Model Loss: {model_loss:.3f}")
print(f"Model Accuracy: {model_accuracy:.3f}")

10/10 - 0s - 9ms/step - accuracy: 0.7500 - loss: 0.5508
Model Loss: 0.551
Model Accuracy: 0.750
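
Because the same test set was passed as `validation_data` during training, the final loss can be compared directly with the `val_loss` values in the training log. The sketch below does that for the epochs visible in the log; later epochs are not shown, so the true minimum may lie elsewhere. The helper `best_epoch` is a hypothetical name for this example.

```python
# val_loss values copied from the first six epochs of the training log
val_loss = [0.5683, 0.5299, 0.5162, 0.5084, 0.5053, 0.5024]
final_test_loss = 0.5508  # reported by model.evaluate() after epoch 50


def best_epoch(losses):
    """Return the 1-based epoch with the lowest loss and that loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


epoch, loss = best_epoch(val_loss)
print(f"Lowest logged val_loss: {loss:.4f} at epoch {epoch}")
print(f"Increase by the end of training: {final_test_loss - loss:.4f}")
```

A final loss above the early-epoch values suggests the model may have started to overfit before epoch 50. Keras provides an `EarlyStopping` callback (with `monitor`, `patience`, and `restore_best_weights` arguments) that can stop training automatically, ideally monitored on a validation split held out from the training data rather than on the test set.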


### Step 4: Save and export your model to a Keras file, and name the file `student_loans.keras`.


In [27]:
# Set the model's file path
file_path = Path("student_loans.keras")

# Export your model to a keras file
model.save(file_path)

---
## Predict Loan Repayment Success by Using your Neural Network Model

### Step 1: Reload your saved model.

In [28]:
# Set the model's file path
file_path = Path("student_loans.keras")

# Load the model to a new object
model_imported = tf.keras.models.load_model(file_path)

### Step 2: Make predictions on the testing data and save the predictions to a DataFrame.

In [31]:
# Make predictions with the test data
predictions = model_imported.predict(X_test_scaled, verbose=2)

# Display the predicted probabilities
predictions

10/10 - 0s - 4ms/step


array([[0.17880689],
       [0.35643202],
       [0.7203337 ],
       [0.68313956],
       [0.9911691 ],
       [0.94559366],
       [0.9776287 ],
       [0.03158369],
       [0.382902  ],
       [0.32316434],
       [0.9794761 ],
       [0.15993638],
       [0.55027604],
       [0.87282044],
       [0.70885205],
       [0.31683168],
       [0.9731146 ],
       [0.27548397],
       [0.5738531 ],
       [0.42089313],
       [0.43325144],
       [0.9079206 ],
       [0.17913748],
       [0.9642519 ],
       [0.13618152],
       [0.9737385 ],
       [0.5276321 ],
       [0.45984542],
       [0.17276724],
       [0.7964207 ],
       [0.56622857],
       [0.97736734],
       [0.1229635 ],
       [0.97664225],
       [0.16376169],
       [0.51568043],
       [0.1022571 ],
       [0.55395514],
       [0.970615  ],
       [0.06452061],
       [0.9508073 ],
       [0.03743393],
       [0.02802789],
       [0.9801637 ],
       [0.10398721],
       [0.6527821 ],
       [0.05978981],
       [0.539

In [33]:
# Save the predictions to a DataFrame and round them to binary results
predictions_df = pd.DataFrame(columns=["predictions"], data=predictions)
predictions_df["predictions"] = predictions_df["predictions"].round(0)

# Display the first five rows of the DataFrame
predictions_df.head()


| | predictions |
|---|---|
| 0 | 0.0 |
| 1 | 0.0 |
| 2 | 1.0 |
| 3 | 1.0 |
| 4 | 1.0 |
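
Rounding the sigmoid output is equivalent to applying a 0.5 decision threshold. The sketch below makes the threshold explicit on the first five probabilities from the `predict()` output above, which is one way to make the cutoff easy to change later. The variable names are illustrative.

```python
import numpy as np

# First five probabilities from the predict() output above
probs = np.array([[0.17880689], [0.35643202], [0.7203337], [0.68313956], [0.9911691]])

# Explicit threshold: probabilities above 0.5 map to class 1
labels = (probs > 0.5).astype(int).ravel()
print(labels)

# Edge case: rounding sends exactly 0.5 to 0 (round-half-to-even), same as "> 0.5"
print(np.round(0.5), int(0.5 > 0.5))
```

An explicit threshold also yields integer labels directly, and it can be raised or lowered if one type of error (for example, approving a risky loan) is costlier than the other.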


### Step 3: Display a classification report with the y test data and predictions.

In [40]:
# Generate a classification report
print("Classification Report:")
print(classification_report(y_test, predictions_df["predictions"].values))

Classification Report:
              precision    recall  f1-score   support

           0       0.73      0.77      0.75       154
           1       0.78      0.73      0.75       166

    accuracy                           0.75       320
   macro avg       0.75      0.75      0.75       320
weighted avg       0.75      0.75      0.75       320
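
To show how the figures in the report relate to each other, the sketch below derives precision, recall, and F1 from confusion-matrix counts. The actual confusion matrix is not shown in this notebook, so the four counts are hypothetical: they are one set of values that matches the class supports (154 and 166) and reproduces the rounded report, and other nearby counts could do the same.

```python
# Hypothetical counts: one confusion matrix consistent with the rounded report above.
# Rows are true classes, columns are predicted classes.
tn, fp = 119, 35   # true class 0 (support 154)
fn, tp = 45, 121   # true class 1 (support 166)


def prf(true_pos, false_pos, false_neg):
    """Precision, recall, and F1 for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


class1 = prf(tp, fp, fn)   # class 1 treated as the positive class
class0 = prf(tn, fn, fp)   # class 0 treated as the positive class
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("class 0:", [round(v, 2) for v in class0])
print("class 1:", [round(v, 2) for v in class1])
print("accuracy:", round(accuracy, 2))
```

The report's 0.75 accuracy sits well above the majority-class share of the test set (166 of 320, about 0.52), so the network has learned a real signal, though roughly one prediction in four is still wrong.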

