# **SmartLend: AI-Powered Loan Default Prediction**

Dataset link: https://www.kaggle.com/datasets/sahideseker/loan-default-prediction-dataset

**Problem Statement**

Banks and financial institutions need to predict whether a loan applicant is likely to *repay the loan* or *default*.

# Data Collection & Cleaning

import pandas as pd
import numpy as np

# 1. Load the dataset from the CSV file
df = pd.read_csv("/content/loan_default_prediction.csv")

print("Original Shape:", df.shape)
print("Missing Values:\n", df.isnull().sum())

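# 2. Handle missing values (both steps are applied, not alternatives)
# Numeric columns: fill missing values with the column median
df.fillna(df.median(numeric_only=True), inplace=True)

# Categorical columns: fill missing values with the most frequent value (mode).
# Assign the result back instead of calling fillna(inplace=True) on df[col]:
# the chained inplace call acts on a copy and raises a FutureWarning in recent pandas.
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].fillna(df[col].mode()[0])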

print("\nAfter Handling Missing Values:\n", df.isnull().sum())

# 3. Detect & cap outliers using the IQR method
# Skip loan_id (an identifier) and default (the target label): capping a 0/1
# label can overwrite the minority class whenever Q1 == Q3.
num_cols = df.select_dtypes(include=[np.number]).columns.drop(
    ['loan_id', 'default'], errors='ignore')
for col in num_cols:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5 * IQR
    upper = Q3 + 1.5 * IQR
    # Cap the outliers (replace values outside the fences with the fence values)
    df[col] = df[col].clip(lower=lower, upper=upper)

print("\nOutliers handled successfully!")
print("Final Shape:", df.shape)

# Save clean dataset
df.to_csv("loan_data_clean.csv", index=False)


Original Shape: (1000, 5)
Missing Values:
 loan_id              0
income               0
loan_amount          0
employment_status    0
default              0
dtype: int64

After Handling Missing Values:
 loan_id              0
income               0
loan_amount          0
employment_status    0
default              0
dtype: int64

Outliers handled successfully!
Final Shape: (1000, 5)
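
Because the dataset has no missing values, the imputation step above changes nothing here, and the effect of IQR capping is hard to see from the printed shapes alone. The following is a minimal sketch of both steps on a tiny made-up DataFrame. The helper names `impute_missing` and `cap_outliers_iqr` and the toy values are assumptions for illustration and are not part of the original notebook. The sketch selects text columns with `exclude="number"` rather than `include=['object']`, which is a substitution made so that it does not depend on how a given pandas version stores strings.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric NaNs with the column median and non-numeric NaNs with the mode."""
    out = df.fillna(df.median(numeric_only=True))
    for col in out.select_dtypes(exclude="number").columns:
        modes = out[col].mode()
        if not modes.empty:  # an all-missing column has no mode; leave it as is
            out[col] = out[col].fillna(modes[0])
    return out


def cap_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Clip each listed column to [Q1 - k*IQR, Q3 + k*IQR]."""
    out = df.copy()
    for col in columns:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Toy data, invented for this example (not taken from the Kaggle dataset)
toy = pd.DataFrame({
    "income": [30.0, 32.0, 34.0, 36.0, np.nan, 500.0],
    "employment_status": ["employed", None, "employed",
                          "unemployed", "employed", "employed"],
    "default": [0, 0, 0, 0, 1, 0],
})

clean = impute_missing(toy)                    # NaN income -> median 34.0
capped = cap_outliers_iqr(clean, ["income"])   # income 500.0 -> upper fence 40.0
print(capped)

# Pitfall: capping the 0/1 target. Here Q1 == Q3 == 0, so the single
# default case is clipped to 0 and the label information is lost.
wiped = cap_outliers_iqr(clean, ["default"])
print("defaults before:", int(clean["default"].sum()),
      "after:", int(wiped["default"].sum()))
```

On the toy data the filled income column has Q1 = 32.5 and Q3 = 35.5, so the fences are 28 and 40 and only the value 500 is changed. The last two lines illustrate why identifier and label columns are usually left out of outlier capping.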



# Feature Engineering & EDA