# Loan Prediction – Data Cleaning and Preprocessing

This notebook focuses on cleaning the Loan Prediction dataset.
The objective is to handle missing values and prepare a clean dataset
for hypothesis testing and machine learning modeling.


## 1. Import Libraries and Load Dataset

Data cleaning begins by loading the raw dataset that was used
in the data exploration phase.


```python
import pandas as pd
import numpy as np

df = pd.read_csv('../data/loan_data.csv')
df.head()
```

## 2. Checking Missing Values

Before cleaning, we examine the number of missing values
present in each feature.


```python
df.isnull().sum()
```
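Raw counts are easier to compare across columns when shown alongside the share of rows they affect. The sketch below uses a hypothetical helper, `missing_summary`, on a small made-up frame with a few of the loan dataset's column names; it is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loan data (hypothetical values, not the real dataset)
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [120.0, np.nan, np.nan, 150.0],
    "Credit_History": [1.0, 0.0, np.nan, 1.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest count first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

print(missing_summary(df))
```

On the real data, `missing_summary(df)` would be called right after loading the CSV.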

## 3. Handling Missing Values

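Missing values are imputed column by column:
- Categorical features (`Gender`, `Married`, `Dependents`, `Self_Employed`): mode
- Numerical features (`LoanAmount`, `Loan_Amount_Term`): median
- `Credit_History`: mode, because although it is stored as a number it is a categorical 0/1 flag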
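A common reason for choosing the median over the mean for numerical columns is that the median is not pulled by a few extreme values, which matters for skewed amounts such as loan sizes. The toy numbers below are made up purely to illustrate the effect; they are not drawn from the dataset.

```python
import pandas as pd

# Hypothetical loan amounts with one large outlier
amounts = pd.Series([100.0, 110.0, 120.0, 130.0, 700.0])
print(amounts.mean())    # 232.0, pulled up by the outlier
print(amounts.median())  # 120.0, unaffected by it

# For a categorical column, the mode is the most frequent label.
# Series.mode() always returns a Series (sorted values, possibly several on ties),
# which is why the notebook indexes it with [0].
gender = pd.Series(["Male", "Male", "Female", "Male"])
print(gender.mode()[0])  # Male
```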


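```python
# Fill categorical missing values with the mode (most frequent value).
# Assigning the result back avoids chained-assignment fillna(..., inplace=True),
# which pandas deprecates and which stops modifying df under Copy-on-Write.
for col in ['Gender', 'Married', 'Dependents', 'Self_Employed']:
    df[col] = df[col].fillna(df[col].mode()[0])

# Fill numerical missing values with the median
df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].median())
df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(df['Loan_Amount_Term'].median())

# Credit_History is numeric but categorical (0/1), so it is filled with the mode
df['Credit_History'] = df['Credit_History'].fillna(df['Credit_History'].mode()[0])
```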
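The same imputation can also be written as a single `DataFrame.fillna` call that takes a `{column: fill value}` dict. The sketch below runs it on a small hypothetical frame that reuses the dataset's column names; the values are invented. Because each column's mode or median depends only on that column, computing every fill value up front gives the same result as the column-by-column loop.

```python
import numpy as np
import pandas as pd

# Toy frame with the loan data's column names (values are hypothetical)
df = pd.DataFrame({
    "Gender": ["Male", None, "Male", "Female"],
    "Married": ["Yes", "No", None, "Yes"],
    "Dependents": ["0", "1", "0", None],
    "Self_Employed": [None, "No", "No", "Yes"],
    "LoanAmount": [100.0, np.nan, 130.0, 700.0],
    "Loan_Amount_Term": [360.0, 360.0, np.nan, 180.0],
    "Credit_History": [1.0, np.nan, 1.0, 0.0],
})

# Credit_History is grouped with the categorical columns, matching the mode used above
mode_cols = ["Gender", "Married", "Dependents", "Self_Employed", "Credit_History"]
median_cols = ["LoanAmount", "Loan_Amount_Term"]

# Build one {column: fill value} mapping, then apply it in a single call
fill_values = {col: df[col].mode()[0] for col in mode_cols}
fill_values.update({col: df[col].median() for col in median_cols})
df = df.fillna(value=fill_values)

print(fill_values)
print(df)
```

Keeping the fill values in a dict also makes it easy to log them, or to reuse the same values on a later test split instead of recomputing statistics there.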

## 4. Verifying Missing Value Treatment

After handling missing values, the dataset is checked again
to confirm that no null values remain.


```python
df.isnull().sum()
```
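Reading the printed counts works interactively, but an assertion makes the check fail loudly if a column was missed. The helper `columns_with_missing` below is hypothetical (not from the original notebook) and is shown on two made-up frames.

```python
import numpy as np
import pandas as pd

def columns_with_missing(frame: pd.DataFrame) -> dict:
    """Return {column: null count} for every column that still has nulls."""
    counts = frame.isnull().sum()
    return {col: int(n) for col, n in counts[counts > 0].items()}

# Hypothetical frames: one fully imputed, one with a gap left in LoanAmount
clean = pd.DataFrame({"Gender": ["Male", "Female"], "LoanAmount": [120.0, 130.0]})
dirty = pd.DataFrame({"Gender": ["Male", "Female"], "LoanAmount": [120.0, np.nan]})

print(columns_with_missing(clean))  # {}
print(columns_with_missing(dirty))  # {'LoanAmount': 1}

# In the notebook, this line would stop execution if imputation missed a column
assert not columns_with_missing(clean), columns_with_missing(clean)
```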

## 5. Cleaned Dataset Overview

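The cleaned dataset is reviewed with `df.info()`, which lists each column's dtype
and non-null count. Every column should report a non-null count equal to the number
of rows before the data is used for statistical analysis and modeling.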

```python
df.info()
```

```python
# Save cleaned dataset for further modeling
df.to_csv('../data/loan_data_cleaned.csv', index=False)
```
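One way to confirm that the saved file is usable downstream is to read it back and re-check it. The sketch below writes a small hypothetical frame to a temporary directory (rather than `../data`) so that it has no side effects.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned frame standing in for the real df
clean = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "LoanAmount": [120.0, 130.0, 100.0],
    "Credit_History": [1.0, 0.0, 1.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loan_data_cleaned.csv")
    clean.to_csv(path, index=False)  # index=False: no extra unnamed index column
    reloaded = pd.read_csv(path)

# The round trip should keep the shape and column names and introduce no nulls
print(reloaded.shape, list(reloaded.columns), int(reloaded.isnull().sum().sum()))
```

CSV files carry no dtype information, so `read_csv` re-infers types on load. For example, a string column that holds only digits comes back as integers. Downstream notebooks may need to restore dtypes explicitly.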

## Conclusion

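In this notebook, missing values in both categorical and numerical features
were imputed: the mode for categorical features and `Credit_History`, and the median
for `LoanAmount` and `Loan_Amount_Term`. The resulting dataset contains no missing
values and is saved as `../data/loan_data_cleaned.csv` for hypothesis testing and
machine learning model development.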
