amithvinod/APR_Assignment-1

Loan Approval Classification with SVM

This repository contains a Jupyter Notebook (svc1.ipynb) that demonstrates a complete machine learning workflow for loan approval classification using a Support Vector Machine (SVM) model. The notebook covers data loading, cleaning, exploratory data analysis (EDA), feature engineering, model training with different hyperparameters, and evaluation with visualizations.

About the Dataset

1. Data Source

The dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle, enriched with additional variables related to Financial Risk for Loan Approval. SMOTENC (the SMOTE variant for mixed nominal and continuous features) was used to synthesize new data points and increase the number of records.

2. Metadata

The dataset contains 45,000 records and 14 variables. Below is the description of each column:

| Column | Description | Type |
| --- | --- | --- |
| person_age | Age of the person | Float |
| person_gender | Gender of the person | Categorical |
| person_education | Highest education level | Categorical |
| person_income | Annual income | Float |
| person_emp_exp | Years of employment experience | Integer |
| person_home_ownership | Home ownership status (e.g., rent, own, mortgage) | Categorical |
| loan_amnt | Loan amount requested | Float |
| loan_intent | Purpose of the loan | Categorical |
| loan_int_rate | Loan interest rate | Float |
| loan_percent_income | Loan amount as a percentage of annual income | Float |
| cb_person_cred_hist_length | Length of credit history in years | Float |
| credit_score | Credit score of the person | Integer |
| previous_loan_defaults_on_file | Indicator of previous loan defaults | Categorical |
| loan_status (target variable) | Loan approval status: 1 = approved; 0 = rejected | Integer |

Notebook Overview

The notebook is organized into several key sections:

1. Data Loading and Inspection

  • Loading Data: Reads the dataset from a CSV file.
  • Initial Exploration: Displays the data, checks for null values, and prints data information and summary statistics.
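
The loading and inspection steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the CSV file name is not given in this README, so the sketch reads a tiny inline stand-in with made-up rows and only a few of the 14 columns.

```python
import io

import pandas as pd

# Tiny inline stand-in for the real CSV (made-up rows, subset of columns).
csv_text = """person_age,person_gender,person_income,loan_amnt,credit_score,loan_status
22.0,female,71948.0,35000.0,561,1
21.0,female,12282.0,1000.0,504,0
25.0,male,12438.0,5500.0,,1
23.0,male,79753.0,35000.0,675,1
"""

# With a real file this would be pd.read_csv("<path to the dataset>.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())                 # first rows of the data
null_counts = df.isnull().sum()  # number of missing values per column
print(null_counts)
df.info()                        # column dtypes and non-null counts
summary = df.describe()          # summary statistics for numeric columns
print(summary)
```

Checking `isnull().sum()` before any cleaning shows which columns actually need the missing-value handling described in the next section.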

2. Data Cleaning and Preprocessing

  • Handling Missing Values: Missing values in critical columns (e.g., loan_percent_income, cb_person_cred_hist_length, credit_score, previous_loan_defaults_on_file, loan_status) are filled using forward fill, which copies the last valid value above each gap.
  • Categorical Data Encoding:
    • Conversion of person_gender values (e.g., 'male' to 1, other values to 0).
    • Replacement of categorical education levels with numeric codes.
    • Mapping values for person_home_ownership and previous_loan_defaults_on_file into numeric formats.
    • Uniform replacement for the loan_intent category.
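
A minimal sketch of the forward fill and the categorical encoding described above, using pandas. The rows are made up, and the category labels and numeric codes (`education_codes`, `ownership_codes`, `default_codes`) are assumptions for illustration; the notebook's actual mappings are not listed in this README.

```python
import numpy as np
import pandas as pd

# Made-up rows; category labels are assumed, not taken from the notebook.
df = pd.DataFrame({
    "person_gender": ["male", "female", "male", "female"],
    "person_education": ["Bachelor", "Master", "High School", "Bachelor"],
    "person_home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"],
    "previous_loan_defaults_on_file": ["No", "Yes", None, "No"],
    "credit_score": [561.0, np.nan, 635.0, 675.0],
})

# Forward fill: each missing value takes the last valid value above it.
cols = ["credit_score", "previous_loan_defaults_on_file"]
df[cols] = df[cols].ffill()

# 'male' becomes 1, every other value becomes 0.
df["person_gender"] = (df["person_gender"] == "male").astype(int)

# Hypothetical code tables for the remaining categorical columns.
education_codes = {"High School": 0, "Bachelor": 1, "Master": 2}
ownership_codes = {"RENT": 0, "OWN": 1, "MORTGAGE": 2}
default_codes = {"No": 0, "Yes": 1}

df["person_education"] = df["person_education"].map(education_codes)
df["person_home_ownership"] = df["person_home_ownership"].map(ownership_codes)
df["previous_loan_defaults_on_file"] = (
    df["previous_loan_defaults_on_file"].map(default_codes)
)

print(df)
```

Two caveats apply to this approach in general: forward fill leaves a gap in the first row unfilled because there is no earlier value to copy, and `map` turns any label missing from the dictionary into NaN, so a final `df.isnull().sum()` check is worthwhile.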

3. Exploratory Data Analysis (EDA) and Visualization

  • Visualizations:
    • Uses Plotly for interactive pie charts (e.g., distribution of person_gender, loan_status, person_education, and person_home_ownership).
    • Seaborn and Matplotlib are used for histograms and a heatmap of the correlation matrix.
  • Correlation Analysis:
    • A correlation heatmap visualizes relationships among variables.
    • Features highly correlated with the target variable (loan_status) are selected for model training.
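
The correlation-based selection could look something like the sketch below. The data is randomly generated so that two columns are related to the target by construction, and the cutoff `threshold = 0.3` is a hypothetical value; the README does not state which threshold or which features the notebook ends up using.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
loan_status = rng.integers(0, 2, size=n)

# Synthetic columns: the first two depend on the target, person_age does not.
df = pd.DataFrame({
    "loan_int_rate": 10 + 3 * loan_status + rng.normal(0, 1, n),
    "loan_percent_income": 0.1 + 0.1 * loan_status + rng.normal(0, 0.05, n),
    "person_age": rng.normal(30, 5, n),
    "loan_status": loan_status,
})

corr = df.corr()  # Pearson correlation matrix of all numeric columns

# Correlation of every feature with the target, excluding the target itself.
target_corr = corr["loan_status"].drop("loan_status")

threshold = 0.3  # hypothetical cutoff on absolute correlation
selected = target_corr[target_corr.abs() > threshold].index.tolist()

print(target_corr.sort_values(ascending=False))
print("Selected features:", selected)
```

Using the absolute value keeps strongly negative correlations as well. The same `corr` matrix can be passed to `seaborn.heatmap(corr, annot=True)` to draw the heatmap.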

4. Feature Engineering and Data Preparation

  • Feature Selection:
    • Based on correlation analysis, a subset of features is selected (excluding the target variable).
  • Data Scaling:
    • StandardScaler is used to standardize feature values to zero mean and unit variance.
  • Train-Test Split:
    • The dataset is split into training and testing sets (80% train, 20% test).
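
A minimal sketch of the scaling and the 80/20 split with scikit-learn, on random stand-in features. One assumption to flag: this sketch splits first and fits the scaler on the training portion only, which is a common way to keep test-set statistics out of preprocessing. The notebook may apply the two steps in a different order, and `random_state=42` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in features on very different scales, plus a binary target.
X = rng.normal(loc=[50.0, 600.0], scale=[10.0, 50.0], size=(100, 2))
y = rng.integers(0, 2, size=100)

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training data, then apply the same transform to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Scaling matters for SVMs in particular because the RBF kernel is distance-based, so a feature with a large numeric range would otherwise dominate the others.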

5. Model Building and Evaluation

  • Model Training:
    • Multiple SVM models are trained:
      • Default SVC
      • SVC with a different penalty parameter (e.g., C=10)
      • SVC with a specified gamma value (e.g., gamma=0.0122)
  • Prediction and Evaluation:
    • The accuracy score of each model is computed.
    • A confusion matrix is generated and visualized with a heatmap to assess model performance.
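
The three model variants and their evaluation can be sketched as below. The data is synthetic (two classes shifted apart so they are easy to separate), so the printed accuracies say nothing about the notebook's results; only the `C=10` and `gamma=0.0122` values come from this README.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
# Synthetic features: class 1 is shifted by 2.0 in every dimension.
X = rng.normal(size=(n, 3)) + 2.0 * y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# The three configurations listed above (RBF kernel is the SVC default).
models = {
    "default": SVC(),
    "C=10": SVC(C=10),
    "gamma=0.0122": SVC(gamma=0.0122),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, y_pred),
    }
    print(name, results[name]["accuracy"])
    print(results[name]["confusion"])
```

Each confusion matrix can be visualized with `seaborn.heatmap(cm, annot=True, fmt="d")`. A larger `C` penalizes misclassified training points more heavily, while a smaller `gamma` gives a smoother decision boundary.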

Installation

To run the notebook, ensure you have the following Python packages installed:

pip install matplotlib plotly seaborn pandas numpy scikit-learn
