<!-- Google Fonts -->
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@700&display=swap" rel="stylesheet">

<div style="
    border-radius: 15px; 
    border: 2px solid #003366; 
    padding: 10px; 
    background: linear-gradient(135deg, #003366, #336699 30%, #66ccff 70%, #99ccff); 
    text-align: center; 
    box-shadow: 0px 4px 8px rgba(0, 0, 0, 0.5);
">
    <h1 style="
        color: #fff; 
        text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.7); 
        font-weight: bold; 
        margin-bottom: 10px; 
        font-size: 36px; 
        font-family: 'Roboto', sans-serif;
        letter-spacing: 1px;
    ">
        💰 Loan Approval ✅
    </h1>
</div>


# 📂 Overview

* **Background** <br>
  This dataset is part of the Kaggle Playground Series (Season 4, Episode 10). The data is synthetically generated from patterns and relationships found in real-world tabular data. It simulates a binary classification problem over the features listed under **Key Features** below, with no missing values.

  The data is **clean**, with a balanced mix of categorical and numerical features, which makes it suitable for EDA, feature engineering, feature selection, dimensionality reduction, and comparing machine learning models, including ensembles.

* **Goal of the Project** <br>
  Build a machine learning model to **predict whether a loan will be approved** (`Loan_Status`: Y/N) based on applicant information.

**Key Features**

| Feature                  | Type                                | Description                                 | Preprocessing        | Scale? |
| ------------------------ | ----------------------------------- | ------------------------------------------- | -------------------- | ------ |
| `Gender`                 | Categorical (Male/Female)           | Applicant's gender                          | `LabelEncoder`       | ❌ No   |
| `Married`                | Categorical (Yes/No)                | Marital status                              | `LabelEncoder`       | ❌ No   |
| `Dependents`             | Categorical (0/1/2/3+)              | Number of dependents                        | `Ordinal` or One-hot | ❌ No   |
| `Education`              | Categorical (Graduate/Not Graduate) | Education level                             | `LabelEncoder`       | ❌ No   |
| `Self_Employed`          | Categorical (Yes/No)                | Whether the applicant is self-employed      | `LabelEncoder`       | ❌ No   |
| `ApplicantIncome`        | Numerical                           | Applicant’s monthly income                  | `StandardScaler`     | ✅ Yes  |
| `CoapplicantIncome`      | Numerical                           | Co-applicant’s income                       | `StandardScaler`     | ✅ Yes  |
| `LoanAmount`             | Numerical                           | Loan amount requested (in thousands)        | `StandardScaler`     | ✅ Yes  |
| `Loan_Amount_Term`       | Numerical                           | Loan repayment term (in months)             | `StandardScaler`     | ✅ Yes  |
| `Credit_History`         | Categorical (0/1)                   | Whether applicant has a good credit history | No change            | ❌ No   |
| `Property_Area`          | Categorical (Urban/Rural/Semiurban) | Location of property                        | One-hot or Ordinal   | ❌ No   |
| `Loan_Status` *(target)* | Categorical (Y/N)                   | Whether loan was approved                   | `LabelEncoder`       | ❌ No   |
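
The suggestions in the table can be combined into a single preprocessing step. The block below is a minimal sketch, assuming scikit-learn and pandas are available and that the columns carry exactly the names and category labels shown above. `OrdinalEncoder` stands in for `LabelEncoder` on the feature columns, because `LabelEncoder` is intended for target labels. The helper name `build_preprocessor` and the demo rows are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Column groups taken from the Key Features table.
BINARY_COLS = ["Gender", "Married", "Education", "Self_Employed"]
NUMERIC_COLS = ["ApplicantIncome", "CoapplicantIncome", "LoanAmount", "Loan_Amount_Term"]


def build_preprocessor() -> ColumnTransformer:
    """Assemble encoders and scalers that mirror the table (hypothetical helper)."""
    return ColumnTransformer(
        transformers=[
            # Two-level categoricals become 0/1 codes.
            ("binary", OrdinalEncoder(), BINARY_COLS),
            # Dependents has a natural order, so the order is fixed explicitly.
            ("dependents", OrdinalEncoder(categories=[["0", "1", "2", "3+"]]), ["Dependents"]),
            # Property_Area has no natural order, so it is one-hot encoded.
            ("area", OneHotEncoder(handle_unknown="ignore"), ["Property_Area"]),
            ("numeric", StandardScaler(), NUMERIC_COLS),
            # Credit_History is already 0/1 and is passed through unchanged.
            ("credit", "passthrough", ["Credit_History"]),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


# Made-up rows, used only to show the shape of the transformed output.
demo = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes", "No"],
    "Dependents": ["0", "1", "2", "3+"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "Self_Employed": ["No", "No", "Yes", "No"],
    "ApplicantIncome": [5000, 3000, 4000, 6000],
    "CoapplicantIncome": [0, 1500, 0, 2000],
    "LoanAmount": [128, 66, 120, 141],
    "Loan_Amount_Term": [360, 360, 180, 360],
    "Credit_History": [1, 1, 0, 1],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
})

features = build_preprocessor().fit_transform(demo)
print(features.shape)
```

Tree-based models such as Random Forest and XGBoost do not need scaled inputs, so the `StandardScaler` step mainly matters for Logistic Regression and `MLPClassifier`.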


**Files Provided**

* `train.csv`: The training dataset (includes the target).
* `test.csv`: The test dataset (no target).
* `credit_risk_dataset.csv`: The original dataset.
* `sample_submission.csv`: Template file for submitting predictions.

(Source: [Kaggle Dataset s4e10](https://www.kaggle.com/competitions/playground-series-s4e10/data)) <br>
(Source: [Kaggle Dataset – Loan Approval Prediction](https://www.kaggle.com/datasets/chilledwanker/loan-approval-prediction))
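
As a rough sketch of how the files listed above could be loaded, the block below reads them with pandas and checks which columns appear only in the training file. It assumes all four CSV files sit in one directory; the directory path, the helper names `load_files` and `target_columns`, and the `FILE_NAMES` list are illustrative and not part of the competition material.

```python
from pathlib import Path

import pandas as pd

# File stems as listed under "Files Provided".
FILE_NAMES = ["train", "test", "credit_risk_dataset", "sample_submission"]


def load_files(data_dir):
    """Read each provided CSV into a DataFrame, keyed by file stem (hypothetical helper)."""
    data_dir = Path(data_dir)
    return {name: pd.read_csv(data_dir / f"{name}.csv") for name in FILE_NAMES}


def target_columns(train, test):
    """Return the columns present in train but absent from test."""
    return [col for col in train.columns if col not in test.columns]
```

Calling `target_columns(frames["train"], frames["test"])` on the loaded frames should return only the target column, which is a quick check that the test file carries no labels.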

**Project Objective**

The goal of this notebook is to **analyze loan applicant features and build a model to predict loan approval**.

Key components of the approach include:

* **Exploratory Data Analysis (EDA):**
  Understand the distribution of key features and how they relate to loan approval status.

* **Feature Engineering:**
  Confirm that no values are missing, encode categorical variables, and create derived features where useful (e.g., total income, income-to-loan ratio).

* **Modeling:**
  Train and compare several models, such as **Logistic Regression**, **Random Forest**, **XGBoost**, and **MLPClassifier**.

* **Evaluation Framework:**

  * Use **stratified cross-validation** so that every fold preserves the class ratio, which keeps the scores reliable under potential class imbalance.
  * Evaluate with metrics: **Accuracy**, **Precision**, **Recall**, **F1-score**, and **ROC-AUC**.
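
The feature engineering and evaluation steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's final pipeline: the data is synthetic, the derived column names `TotalIncome` and `IncomeToLoan` are made up, and Logistic Regression stands in for any of the listed models.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add total income and an income-to-loan ratio (hypothetical column names)."""
    out = df.copy()
    out["TotalIncome"] = out["ApplicantIncome"] + out["CoapplicantIncome"]
    # LoanAmount is strictly positive in this demo; real data may need a zero guard.
    out["IncomeToLoan"] = out["TotalIncome"] / out["LoanAmount"]
    return out


# Synthetic stand-in data, for illustration only.
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "ApplicantIncome": rng.uniform(1_000, 10_000, n),
    "CoapplicantIncome": rng.uniform(0, 5_000, n),
    "LoanAmount": rng.uniform(50, 500, n),
})
X = add_derived_features(demo)
# Imbalanced synthetic target: roughly 70% of rows are labeled 1.
y = (X["IncomeToLoan"] > X["IncomeToLoan"].quantile(0.3)).astype(int)

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

for metric in scoring:
    key = "test_" + metric
    print(f"{metric}: {scores[key].mean():.3f}")
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no statistics from the validation fold leak into training.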