# Project Overview: Enhancing Lending Decisions through Credit Risk Analysis

## Introduction


A financial institution wants to expand its loan offerings to new customers while keeping credit risk under control. As the pool of applicants grows, the bank is increasingly concerned about a rising default rate, which directly affects its profitability and operational efficiency.

To address this, the institution has provided a dataset of past loan applicants that covers their demographics, financial behavior, and loan characteristics. The goal is to use this data to build a predictive model that supports data-driven lending decisions and improves the accuracy of risk assessment.

In [33]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

## Exploring the Dataset

In [34]:
data = pd.read_csv("credit_risk_dataset.csv")

In [35]:
data

| | person_age | person_income | person_home_ownership | person_emp_length | loan_intent | loan_grade | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_default_on_file | cb_person_cred_hist_length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22 | 59000 | RENT | 123.0 | PERSONAL | D | 35000 | 16.02 | 1 | 0.59 | Y | 3 |
| 1 | 21 | 9600 | OWN | 5.0 | EDUCATION | B | 1000 | 11.14 | 0 | 0.10 | N | 2 |
| 2 | 25 | 9600 | MORTGAGE | 1.0 | MEDICAL | C | 5500 | 12.87 | 1 | 0.57 | N | 3 |
| 3 | 23 | 65500 | RENT | 4.0 | MEDICAL | C | 35000 | 15.23 | 1 | 0.53 | N | 2 |
| 4 | 24 | 54400 | RENT | 8.0 | MEDICAL | C | 35000 | 14.27 | 1 | 0.55 | Y | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32576 | 57 | 53000 | MORTGAGE | 1.0 | PERSONAL | C | 5800 | 13.16 | 0 | 0.11 | N | 30 |
| 32577 | 54 | 120000 | MORTGAGE | 4.0 | PERSONAL | A | 17625 | 7.49 | 0 | 0.15 | N | 19 |
| 32578 | 65 | 76000 | RENT | 3.0 | HOMEIMPROVEMENT | B | 35000 | 10.99 | 1 | 0.46 | N | 28 |
| 32579 | 56 | 150000 | MORTGAGE | 5.0 | PERSONAL | B | 15000 | 11.48 | 0 | 0.10 | N | 26 |


In [36]:
new_data = data.copy()

In [37]:
new_data["person_home_ownership"].value_counts()

person_home_ownership
RENT        16446
MORTGAGE    13444
OWN          2584
OTHER         107
Name: count, dtype: int64
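
The counts above are easier to compare as shares of all applicants. The sketch below rebuilds the printed counts as a Series (the name `counts` is introduced here for illustration) and converts them to percentages; on the real frame, `new_data["person_home_ownership"].value_counts(normalize=True)` should give the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series(
    {"RENT": 16446, "MORTGAGE": 13444, "OWN": 2584, "OTHER": 107},
    name="count",
)

# Share of each category as a percentage of all rows
share = (counts / counts.sum() * 100).round(2)
print(share)
```

The `OTHER` category covers well under 1% of rows, which is worth keeping in mind when it later becomes its own one-hot column.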

In [38]:
new_data.shape

(32581, 12)

## Data Cleaning 

In [39]:
new_data.isnull().sum()

person_age                       0
person_income                    0
person_home_ownership            0
person_emp_length              895
loan_intent                      0
loan_grade                       0
loan_amnt                        0
loan_int_rate                 3116
loan_status                      0
loan_percent_income              0
cb_person_default_on_file        0
cb_person_cred_hist_length       0
dtype: int64
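
Raw missing-value counts are easier to judge as a percentage of rows. The following is a minimal sketch on a toy frame; `missing_share` and `toy` are hypothetical names introduced here, and the toy values are illustrative rather than taken from the full dataset.

```python
import numpy as np
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    share = df.isnull().mean() * 100
    # Keep only columns that actually have missing values
    return share[share > 0].sort_values(ascending=False)

# Toy frame standing in for new_data (illustrative values only)
toy = pd.DataFrame({
    "person_emp_length": [1.0, np.nan, 4.0, 8.0],
    "loan_int_rate": [11.14, np.nan, np.nan, 14.27],
    "loan_amnt": [1000, 5500, 35000, 35000],
})

missing = missing_share(toy)
print(missing)
```

Calling `missing_share(new_data)` on the real frame would report only `person_emp_length` and `loan_int_rate`, the two columns with missing values above.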

In [40]:
def impute_missing_values(df):
    """
    Imputes missing values in a DataFrame.
    - Numerical columns: Fill with the mean.
    - Categorical/object columns: Fill with the mode.

    Parameters:
    df (pd.DataFrame): Input DataFrame.

    Returns:
    pd.DataFrame: DataFrame with missing values imputed.
    """
    
    # Iterate through all columns
    for column in df.columns:
        if df[column].isnull().any():  # Check if the column has missing values
            
            # For numerical columns: fill with the column mean
            if pd.api.types.is_numeric_dtype(df[column]):
                # Assign the result back instead of calling fillna(inplace=True)
                # on a column selection, which pandas warns will stop working in 3.0
                df[column] = df[column].fillna(df[column].mean())

            # For categorical/object columns: fill with the most frequent value
            elif isinstance(df[column].dtype, pd.CategoricalDtype) or df[column].dtype == 'object':
                mode = df[column].mode()
                if not mode.empty:  # Check if a mode exists
                    df[column] = df[column].fillna(mode.iloc[0])

    return df

In [41]:
new_data = impute_missing_values(new_data)
new_data.isnull().sum()

person_age                    0
person_income                 0
person_home_ownership         0
person_emp_length             0
loan_intent                   0
loan_grade                    0
loan_amnt                     0
loan_int_rate                 0
loan_status                   0
loan_percent_income           0
cb_person_default_on_file     0
cb_person_cred_hist_length    0
dtype: int64

In [42]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess_categorical_data(df, label_encode_cols, one_hot_encode_cols):
    """
    Function to label encode specified columns and one-hot encode others.

    Parameters:
    - df (pd.DataFrame): Input DataFrame.
    - label_encode_cols (list): Columns to label encode.
    - one_hot_encode_cols (list): Columns to one-hot encode.

    Returns:
    - pd.DataFrame: Preprocessed DataFrame with categorical columns encoded.
    """
    # Create a copy to avoid modifying the original DataFrame
    df_processed = df.copy()

    # Label encode specified columns (a fresh encoder per column, so each
    # column gets its own integer mapping in sorted label order)
    for col in label_encode_cols:
        df_processed[col] = LabelEncoder().fit_transform(df_processed[col])
    
    # One-Hot Encode specified columns
    df_processed = pd.get_dummies(df_processed, columns=one_hot_encode_cols, drop_first=False)

    return df_processed

# Columns to label encode and columns to one-hot encode
label_encode_cols = ['loan_grade', 'cb_person_default_on_file']
one_hot_encode_cols = ['person_home_ownership', 'loan_intent']

# Encode the imputed DataFrame
new_data = preprocess_categorical_data(new_data, label_encode_cols, one_hot_encode_cols)

In [43]:
new_data

| | person_age | person_income | person_emp_length | loan_grade | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_default_on_file | cb_person_cred_hist_length | person_home_ownership_MORTGAGE | person_home_ownership_OTHER | person_home_ownership_OWN | person_home_ownership_RENT | loan_intent_DEBTCONSOLIDATION | loan_intent_EDUCATION | loan_intent_HOMEIMPROVEMENT | loan_intent_MEDICAL | loan_intent_PERSONAL | loan_intent_VENTURE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22 | 59000 | 123.0 | 3 | 35000 | 16.02 | 1 | 0.59 | 1 | 3 | False | False | False | True | False | False | False | False | True | False |
| 1 | 21 | 9600 | 5.0 | 1 | 1000 | 11.14 | 0 | 0.10 | 0 | 2 | False | False | True | False | False | True | False | False | False | False |
| 2 | 25 | 9600 | 1.0 | 2 | 5500 | 12.87 | 1 | 0.57 | 0 | 3 | True | False | False | False | False | False | False | True | False | False |
| 3 | 23 | 65500 | 4.0 | 2 | 35000 | 15.23 | 1 | 0.53 | 0 | 2 | False | False | False | True | False | False | False | True | False | False |
| 4 | 24 | 54400 | 8.0 | 2 | 35000 | 14.27 | 1 | 0.55 | 1 | 4 | False | False | False | True | False | False | False | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32576 | 57 | 53000 | 1.0 | 2 | 5800 | 13.16 | 0 | 0.11 | 0 | 30 | True | False | False | False | False | False | False | False | True | False |
| 32577 | 54 | 120000 | 4.0 | 0 | 17625 | 7.49 | 0 | 0.15 | 0 | 19 | True | False | False | False | False | False | False | False | True | False |
| 32578 | 65 | 76000 | 3.0 | 1 | 35000 | 10.99 | 1 | 0.46 | 0 | 28 | False | False | False | True | False | False | True | False | False | False |
| 32579 | 56 | 150000 | 5.0 | 1 | 15000 | 11.48 | 0 | 0.10 | 0 | 26 | True | False | False | False | False | False | False | False | True | False |
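
To make the encoded values in the output above easier to read, the sketch below applies the same two encodings to a small toy frame (the `toy` frame and the variable names are introduced here for illustration). `LabelEncoder` assigns integer codes in sorted label order, which is why grade `A` appears as 0 and grade `D` as 3, while `pd.get_dummies` creates one boolean column per category.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with two of the categorical columns (illustrative values)
toy = pd.DataFrame({
    "loan_grade": ["D", "B", "C", "C", "A"],
    "person_home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT", "RENT"],
})

# Label encoding: classes_ holds the sorted labels, and each label's
# position in that array is its integer code
encoder = LabelEncoder()
codes = encoder.fit_transform(toy["loan_grade"])
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# One-hot encoding: one boolean column per category value
dummies = pd.get_dummies(toy, columns=["person_home_ownership"])
print(dummies.columns.tolist())
```

Keeping the fitted encoder around, as `encoder` does here, allows the codes to be mapped back to labels later with `encoder.inverse_transform`.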
