# 🏦 Loan Default Prediction using LendingClub Data

### 🎯 Objective:
Predict whether a borrower will default on a loan (end up charged off) from loan terms, income, debt-to-income ratio, employment length, home ownership, and loan purpose.

**Dataset**: accepted_2007_to_2018Q4.xlsx  
**Target**: `loan_status` (binary classification: Fully Paid vs Charged Off)

**Libraries**: pandas, NumPy, scikit-learn (logistic regression)


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


In [2]:
# 📥 Load the dataset
df = pd.read_excel("accepted_2007_to_2018Q4.xlsx", engine="openpyxl")
df.shape


(160803, 151)

In [3]:
# 📌 Select relevant features and prepare target
# Keep only loans with a final outcome; isin() also excludes missing statuses.
# .copy() avoids a SettingWithCopyWarning when the new column is added.
df = df[df['loan_status'].isin(['Fully Paid', 'Charged Off'])].copy()
df['loan_default'] = (df['loan_status'] == 'Charged Off').astype(int)  # 1 = default

# Model inputs plus the target column
columns = ['loan_amnt', 'term', 'int_rate', 'emp_length', 'home_ownership',
           'annual_inc', 'purpose', 'dti', 'loan_default']
df = df[columns].dropna()
df.shape


(132603, 9)

In [4]:
# 🔄 Preprocessing
# Raw strings (r'...') avoid the invalid-escape SyntaxWarning for '\d'
df['term'] = df['term'].str.extract(r'(\d+)').astype(int)
df['emp_length'] = df['emp_length'].str.extract(r'(\d+)').fillna(0).astype(int)
df = pd.get_dummies(df, columns=['home_ownership', 'purpose'], drop_first=True)
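
The preprocessing step is easier to follow on a few rows. The sketch below applies the same three operations to a toy frame; the sample strings are assumptions about how the raw `term`, `emp_length`, and `home_ownership` values look, and `toy` is a hypothetical name. It uses `expand=False` and `fillna("0")` as small variations on the cell above so that each column stays a single dtype before the cast.

```python
import pandas as pd

# Toy rows in the shape the raw columns are assumed to have
toy = pd.DataFrame({
    "term": [" 36 months", " 60 months", " 36 months"],
    "emp_length": ["10+ years", "< 1 year", None],
    "home_ownership": ["RENT", "MORTGAGE", "OWN"],
})

# Pull the first run of digits out of each string; expand=False returns a Series
toy["term"] = toy["term"].str.extract(r"(\d+)", expand=False).astype(int)
toy["emp_length"] = (
    toy["emp_length"].str.extract(r"(\d+)", expand=False).fillna("0").astype(int)
)

# One-hot encode; drop_first removes the alphabetically first category
toy = pd.get_dummies(toy, columns=["home_ownership"], drop_first=True)
print(toy)
```

Note that a value like `"< 1 year"` becomes 1 rather than 0 under this extraction, since only the digits are kept.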




In [5]:
# 🎯 Features and Target
X = df.drop('loan_default', axis=1)
y = df['loan_default']


In [6]:
# 🧪 Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ⚖️ Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


In [10]:
# 🧠 Train the model
model = LogisticRegression(max_iter=10000)
model.fit(X_train_scaled, y_train)


In [11]:
# 📊 Evaluate model performance
y_pred = model.predict(X_test_scaled)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


Accuracy: 0.8033633724218544
Confusion Matrix:
 [[20593   587]
 [ 4628   713]]
Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.97      0.89     21180
           1       0.55      0.13      0.21      5341

    accuracy                           0.80     26521
   macro avg       0.68      0.55      0.55     26521
weighted avg       0.76      0.80      0.75     26521
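
The accuracy figure is easier to judge next to a majority-class baseline. The sketch below recomputes the headline numbers from the confusion matrix printed above; the variable names are illustrative, and rows are taken as true classes and columns as predicted classes, which is scikit-learn's convention.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted)
cm = np.array([[20593, 587],
               [4628, 713]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()
baseline = (tn + fp) / cm.sum()  # accuracy of always predicting "no default"
precision = tp / (tp + fp)       # share of predicted defaults that were real
recall = tp / (tp + fn)          # share of real defaults that were caught

print(f"accuracy:  {accuracy:.4f}")
print(f"baseline:  {baseline:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
```

On these numbers the model beats the always-"no default" baseline by about half a percentage point and catches roughly 13% of actual defaults, so accuracy alone overstates how well it flags risky loans.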



In [14]:
# 🔮 Predict on a sample
sample = X.iloc[[2000]]  # double brackets keep a one-row DataFrame with the original dtypes
sample_scaled = scaler.transform(sample)
prediction = model.predict(sample_scaled)
print("Sample Prediction:", "Default" if prediction[0] == 1 else "No Default")


Sample Prediction: No Default
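
The label above comes from `model.predict`, which in effect applies a 0.5 cut-off to the predicted probability. The sketch below shows how `predict_proba` exposes that probability so a different cut-off can be tried. It runs on synthetic data from `make_classification` as a stand-in for the loan table; the `demo_*` names and the 0.3 threshold are assumptions for illustration, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data: roughly 80% class 0, 20% class 1
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=8, weights=[0.8, 0.2], random_state=42
)
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_scaler = StandardScaler()
demo_model = LogisticRegression(max_iter=10000)
demo_model.fit(demo_scaler.fit_transform(Xtr), ytr)

# Column 1 of predict_proba is the estimated probability of class 1
proba_default = demo_model.predict_proba(demo_scaler.transform(Xte))[:, 1]

# Compare the default 0.5 cut-off with a lower, illustrative one
pred_050 = (proba_default >= 0.5).astype(int)
pred_030 = (proba_default >= 0.3).astype(int)

recall_050 = recall_score(yte, pred_050)
recall_030 = recall_score(yte, pred_030)
print(f"recall at 0.5: {recall_050:.2f}, recall at 0.3: {recall_030:.2f}")
```

Lowering the cut-off can only flag more samples as default, so recall for class 1 never decreases; the cost is usually lower precision, and the right trade-off depends on the application.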


In [18]:
import numpy as np

# Example: show one test sample that the model predicted as default (loan_default = 1)

# Positions in y_pred line up with rows of X_test.
# `pick` is an arbitrary position among the predicted defaults, not the first one.
pick = 1000
default_indices = np.where(y_pred == 1)[0]
if len(default_indices) > pick:
    idx = default_indices[pick]
    example = X_test.iloc[idx]
    print("Example of predicted default:")
    print(example)
else:
    print(f"Fewer than {pick + 1} predicted defaults in the test set.")

Example of predicted default:
loan_amnt                     21000
term                             60
int_rate                      18.55
emp_length                       10
annual_inc                    62000
dti                           30.53
home_ownership_MORTGAGE       False
home_ownership_OWN            False
home_ownership_RENT            True
purpose_credit_card            True
purpose_debt_consolidation    False
purpose_home_improvement      False
purpose_house                 False
purpose_major_purchase        False
purpose_medical               False
purpose_moving                False
purpose_other                 False
purpose_renewable_energy      False
purpose_small_business        False
purpose_vacation              False
Name: 156708, dtype: object
