This repository contains a Jupyter Notebook (`svc1.ipynb`) that demonstrates a complete machine learning workflow for loan approval classification using a Support Vector Machine (SVM) model. The notebook covers data loading, cleaning, exploratory data analysis (EDA), feature engineering, model training with different hyperparameters, and evaluation with visualizations.
The dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle, enriched with additional variables related to financial risk for loan approval. SMOTENC was used to generate new data points and enlarge the dataset.
The dataset contains 45,000 records and 14 variables. Below is the description of each column:
| Column | Description | Type |
|---|---|---|
| `person_age` | Age of the person | Float |
| `person_gender` | Gender of the person | Categorical |
| `person_education` | Highest education level | Categorical |
| `person_income` | Annual income | Float |
| `person_emp_exp` | Years of employment experience | Integer |
| `person_home_ownership` | Home ownership status (e.g., rent, own, mortgage) | Categorical |
| `loan_amnt` | Loan amount requested | Float |
| `loan_intent` | Purpose of the loan | Categorical |
| `loan_int_rate` | Loan interest rate | Float |
| `loan_percent_income` | Loan amount as a percentage of annual income | Float |
| `cb_person_cred_hist_length` | Length of credit history in years | Float |
| `credit_score` | Credit score of the person | Integer |
| `previous_loan_defaults_on_file` | Indicator of previous loan defaults | Categorical |
| `loan_status` | Loan approval status (target variable): 1 = approved, 0 = rejected | Integer |
The notebook is organized into several key sections:
- Loading Data: Reads the dataset from a CSV file.
- Initial Exploration: Displays the data, checks for null values, and prints data information and summary statistics.
- Handling Missing Values: Missing data in critical columns (e.g., `loan_percent_income`, `cb_person_cred_hist_length`, `credit_score`, `previous_loan_defaults_on_file`, and `loan_status`) is filled using forward fill.
- Categorical Data Encoding:
  - Conversion of `person_gender` values (e.g., 'male' to 1, other values to 0).
  - Replacement of categorical education levels with numeric codes.
  - Mapping of `person_home_ownership` and `previous_loan_defaults_on_file` values into numeric formats.
  - Uniform replacement of the `loan_intent` categories.
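A minimal sketch of the cleaning and encoding steps, using the column names from the table above on a toy frame (the exact numeric mappings in the notebook may differ):

```python
import pandas as pd

# Toy stand-in for the loan dataset.
df = pd.DataFrame({
    "person_gender": ["male", "female", "male"],
    "person_home_ownership": ["RENT", "OWN", "MORTGAGE"],
    "loan_status": [1.0, None, 0.0],
})

# Forward fill missing values in critical columns.
df["loan_status"] = df["loan_status"].ffill()

# Binary encoding: 'male' -> 1, all other values -> 0.
df["person_gender"] = (df["person_gender"] == "male").astype(int)

# Map home-ownership categories to numeric codes (assumed mapping).
ownership_map = {"RENT": 0, "OWN": 1, "MORTGAGE": 2, "OTHER": 3}
df["person_home_ownership"] = df["person_home_ownership"].map(ownership_map)

print(df)
```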
- Visualizations:
  - Plotly is used for interactive pie charts (e.g., distributions of `person_gender`, `loan_status`, `person_education`, and `person_home_ownership`).
  - Seaborn and Matplotlib are used for histograms and a heatmap of the correlation matrix.
- Correlation Analysis:
  - A correlation heatmap visualizes relationships among variables.
  - Features highly correlated with the target variable (`loan_status`) are selected for model training.
- Feature Selection:
  - Based on correlation analysis, a subset of features is selected (excluding the target variable).
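Correlation-based feature selection can look roughly like this (the 0.1 cutoff and the tiny frame are illustrative assumptions, not the notebook's exact values):

```python
import pandas as pd

# Small numeric frame standing in for the encoded dataset.
df = pd.DataFrame({
    "loan_int_rate": [10.0, 12.5, 7.0, 15.0],
    "credit_score": [700, 620, 750, 580],
    "loan_status": [0, 1, 0, 1],
})

# Correlation of each feature with the target, target itself excluded.
corr = df.corr()["loan_status"].drop("loan_status")

# Keep features whose absolute correlation exceeds an assumed threshold.
selected = corr[corr.abs() > 0.1].index.tolist()
print(selected)
```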
- Data Scaling:
  - StandardScaler is used to standardize feature values (zero mean, unit variance).
- Train-Test Split:
  - The dataset is split into training and testing sets (80% train, 20% test).
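A sketch of the scaling and splitting steps on random stand-in data (fitting the scaler on the training split only is shown here as the leakage-safe variant; the notebook's exact order may differ):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random stand-in features and binary labels.
X = np.random.default_rng(0).normal(size=(100, 4))
y = np.random.default_rng(1).integers(0, 2, size=100)

# 80/20 split, matching the notebook's proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training set only, then apply it to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)
```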
- Model Training:
  - Multiple SVM models are trained:
    - Default SVC
    - SVC with a different penalty parameter (e.g., `C=10`)
    - SVC with a specified gamma value (e.g., `gamma=0.0122`)
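The three model variants can be sketched like this, using `make_classification` as a stand-in for the scaled loan data (the hyperparameter values come from the list above; everything else is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed loan features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The three variants mentioned above: default, larger C, explicit gamma.
models = {
    "default": SVC(),
    "C=10": SVC(C=10),
    "gamma=0.0122": SVC(gamma=0.0122),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```

A larger `C` penalizes misclassified training points more heavily, while `gamma` controls the reach of the RBF kernel; comparing the three scores shows how sensitive the model is to these choices.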
- Prediction and Evaluation:
  - The accuracy score of each model is computed.
  - A confusion matrix is generated and visualized with a heatmap to assess model performance.
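The evaluation step boils down to two scikit-learn calls, shown here on hand-picked toy labels:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy ground truth and predictions.
y_test = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print(accuracy_score(y_test, y_pred))  # fraction of correct predictions: 0.8

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The resulting matrix can then be drawn as a heatmap, e.g. with `seaborn.heatmap(cm, annot=True)`.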
To run the notebook, ensure you have the following Python packages installed:

```
pip install matplotlib plotly seaborn pandas numpy scikit-learn
```