# Project Overview: Predicting Credit Approval Decisions with Support Vector Machines (SVMs)
### Let's train and deploy SVMs on another dataset from the UCI Machine Learning Repository
### [Statlog (Australian Credit Approval) Data Set](https://archive.ics.uci.edu/ml/datasets/Statlog+%28Australian+Credit+Approval%29)

The goal of this project is to train a machine learning model that predicts credit approval decisions from the Statlog (Australian Credit Approval) Data Set. The dataset describes applicants and their financial status, and the task is to classify whether an applicant should be approved for credit.



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

**Read the datafile and display the head of your dataframe**

In [2]:
# Load the dataset and preview the first five rows
df = pd.read_csv('Aust_Credit_Approval_Data.csv')
df.head()

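| | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22.08 | 11.46 | 2 | 4 | 4 | 1.585 | 0 | 0 | 0 | 1 | 2 | 100 | 1213 | 0 |
| 1 | 22.67 | 7.0 | 2 | 8 | 4 | 0.165 | 0 | 0 | 0 | 0 | 2 | 160 | 1 | 0 |
| 2 | 29.58 | 1.75 | 1 | 4 | 4 | 1.25 | 0 | 0 | 0 | 1 | 2 | 280 | 1 | 0 |
| 3 | 21.67 | 11.5 | 1 | 5 | 3 | 0.0 | 1 | 1 | 11 | 1 | 2 | 0 | 1 | 1 |
| 4 | 20.17 | 8.17 | 2 | 6 | 4 | 1.96 | 1 | 1 | 14 | 0 | 2 | 60 | 159 | 1 |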


**Getting the overview of the data using info**

In [3]:
# Show column dtypes and non-null counts
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 690 entries, 0 to 689
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   x1      690 non-null    float64
 1   x2      690 non-null    float64
 2   x3      690 non-null    int64  
 3   x4      690 non-null    int64  
 4   x5      690 non-null    int64  
 5   x6      690 non-null    float64
 6   x7      690 non-null    int64  
 7   x8      690 non-null    int64  
 8   x9      690 non-null    int64  
 9   x10     690 non-null    int64  
 10  x11     690 non-null    int64  
 11  x12     690 non-null    int64  
 12  x13     690 non-null    int64  
 13  target  690 non-null    int64  
dtypes: float64(3), int64(11)
memory usage: 75.6 KB


**As loaded here, the Statlog (Australian Credit Approval) data has 690 rows and 14 columns: 13 feature columns (`x1` to `x13`) and a binary `target`. The features mix continuous values with integer-coded categorical values, and no column has missing entries. The target indicates whether the credit application was approved.**

In [4]:
# Summary statistics for every column
df.describe()

| | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 | 690.0 |
| mean | 31.568203 | 4.758725 | 1.766667 | 7.372464 | 4.692754 | 2.223406 | 0.523188 | 0.427536 | 2.4 | 0.457971 | 1.928986 | 184.014493 | 1018.385507 | 0.444928 |
| std | 11.853273 | 4.978163 | 0.430063 | 3.683265 | 1.992316 | 3.346513 | 0.499824 | 0.49508 | 4.86294 | 0.498592 | 0.298813 | 172.159274 | 5210.102598 | 0.497318 |
| min | 13.75 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 25% | 22.67 | 1.0 | 2.0 | 4.0 | 4.0 | 0.165 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 80.0 | 1.0 | 0.0 |
| 50% | 28.625 | 2.75 | 2.0 | 8.0 | 4.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 160.0 | 6.0 | 0.0 |
| 75% | 37.7075 | 7.2075 | 2.0 | 10.0 | 5.0 | 2.625 | 1.0 | 1.0 | 3.0 | 1.0 | 2.0 | 272.0 | 396.5 | 1.0 |
| max | 80.25 | 28.0 | 3.0 | 14.0 | 9.0 | 28.5 | 1.0 | 1.0 | 67.0 | 1.0 | 3.0 | 2000.0 | 100001.0 | 1.0 |
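The mean of `target` in the summary above is the fraction of approved applications, which gives a quick reference point for judging any classifier. The short sketch below works that arithmetic out from the two numbers printed by `describe()`. The variable names are illustrative and not part of the notebook.

```python
# Values copied from the describe() output above.
n_rows = 690
positive_rate = 0.444928  # mean of the binary target column

# Recover the class counts from the mean of a 0/1 column.
n_pos = round(n_rows * positive_rate)
n_neg = n_rows - n_pos

# Accuracy of a trivial model that always predicts the larger class.
baseline_accuracy = max(n_pos, n_neg) / n_rows

print(n_pos, n_neg, round(baseline_accuracy, 4))
```

Any trained model should clearly beat this majority-class baseline to be considered useful.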


## Machine Learning 

**Split the data into training and test sets, using test_size=0.30 and a fixed random_state (42 here) so the split is reproducible**

In [5]:
# Separate the feature columns (X) from the target column (y)
X = df.drop('target', axis=1)
y = df['target']

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
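The split above is purely random, so the share of approved applications can differ slightly between the training and test sets. One optional variation is a stratified split, sketched here on synthetic stand-in data (an assumption, since the CSV is not bundled with this text). `X_demo` and `y_demo` are hypothetical names. `stratify` is a standard `train_test_split` parameter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (assumption): 690 rows, 13 features,
# and roughly 44% positives, similar to the target mean reported earlier.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(690, 13))
y_demo = (rng.random(690) < 0.445).astype(int)

# stratify=y_demo keeps the class ratio nearly identical in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```

With the real data this would amount to adding `stratify=y` to the existing call. Doing so changes which rows land in each set, so the scores reported later in this notebook would change as well.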

In [7]:
X_train

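| | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 178 | 41.17 | 4.040 | 2 | 13 | 8 | 7.00 | 1 | 1 | 8 | 0 | 2 | 320 | 1 |
| 265 | 28.58 | 3.750 | 2 | 8 | 4 | 0.25 | 0 | 1 | 1 | 1 | 2 | 40 | 155 |
| 352 | 31.57 | 4.000 | 2 | 14 | 4 | 5.00 | 1 | 1 | 3 | 1 | 2 | 290 | 2280 |
| 529 | 17.92 | 0.540 | 2 | 8 | 4 | 1.75 | 0 | 1 | 1 | 1 | 2 | 80 | 6 |
| 409 | 25.25 | 1.000 | 2 | 6 | 4 | 0.50 | 0 | 0 | 0 | 0 | 2 | 200 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71 | 30.42 | 1.375 | 2 | 9 | 8 | 0.04 | 0 | 1 | 3 | 0 | 2 | 0 | 34 |
| 106 | 27.67 | 2.000 | 2 | 14 | 8 | 1.00 | 1 | 1 | 4 | 0 | 2 | 140 | 7545 |
| 270 | 19.17 | 5.415 | 2 | 3 | 8 | 0.29 | 0 | 0 | 0 | 0 | 2 | 80 | 485 |
| 435 | 16.08 | 0.750 | 2 | 8 | 4 | 1.75 | 1 | 1 | 5 | 1 | 2 | 352 | 691 |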


In [8]:
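X_test

| | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 286 | 65.42 | 11.000 | 2 | 10 | 9 | 20.000 | 1 | 1 | 7 | 1 | 2 | 22 | 1 |
| 511 | 18.17 | 10.250 | 2 | 8 | 8 | 1.085 | 0 | 0 | 0 | 0 | 2 | 320 | 14 |
| 257 | 48.50 | 4.250 | 2 | 7 | 4 | 0.125 | 1 | 0 | 0 | 1 | 2 | 225 | 1 |
| 336 | 36.00 | 1.000 | 2 | 8 | 4 | 2.000 | 1 | 1 | 11 | 0 | 2 | 0 | 457 |
| 318 | 15.75 | 0.375 | 2 | 8 | 4 | 1.000 | 0 | 0 | 0 | 0 | 2 | 120 | 19 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 601 | 36.67 | 3.250 | 2 | 11 | 8 | 9.000 | 1 | 0 | 0 | 1 | 2 | 102 | 640 |
| 66 | 19.50 | 0.165 | 2 | 11 | 4 | 0.040 | 0 | 0 | 0 | 1 | 2 | 380 | 1 |
| 11 | 41.42 | 5.000 | 2 | 11 | 8 | 5.000 | 1 | 1 | 6 | 1 | 2 | 470 | 1 |
| 674 | 25.75 | 0.500 | 2 | 8 | 8 | 0.875 | 1 | 0 | 0 | 1 | 2 | 491 | 1 |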


In [9]:
y_train

178    1
265    0
352    1
529    0
409    0
      ..
71     0
106    1
270    0
435    1
102    1
Name: target, Length: 483, dtype: int64

In [10]:
y_test

286    1
511    0
257    1
336    1
318    0
      ..
601    1
66     0
11     1
674    1
559    1
Name: target, Length: 207, dtype: int64

### Importing and training the Support Vector Classifier

**Import SVC and create an instance named `svm_model`**

In [12]:
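from sklearn.svm import SVC
# Linear-kernel SVM with regularization parameter C=30.
# gamma applies only to the rbf, poly and sigmoid kernels, so it has no effect here.
svm_model = SVC(kernel='linear', C=30, gamma='auto')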


**Training the model** 

In [13]:
# Fit the SVM on the training data
svm_model.fit(X_train, y_train)
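The model above is fitted on the raw features, whose ranges differ by several orders of magnitude (for example `x13` reaches 100001 while `x7` is 0 or 1). SVMs are generally sensitive to feature scale, so a common variation is to standardize the inputs inside a pipeline. The following is a minimal sketch on synthetic stand-in data, not the notebook's method. `X_demo`, `y_demo`, `scaled_svm`, and the choice of `C=1.0` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 690 rows, 13 features.
X_demo, y_demo = make_classification(
    n_samples=690, n_features=13, n_informative=6, random_state=42
)
# Stretch one column to mimic a wide-range feature such as x13.
X_demo[:, -1] *= 1000.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)

# The scaler learns its mean and variance from the training data only,
# and the same transformation is then applied to the test data.
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scaled_svm.fit(X_tr, y_tr)

test_score = scaled_svm.score(X_te, y_te)
print(test_score)
```

Putting the scaler inside the pipeline keeps test-set statistics from leaking into training. Whether scaling actually improves accuracy on the credit data would need to be checked on the real file.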


**Making a prediction**

In [14]:
# Predict the class for a single applicant given as 13 feature values (x1 to x13)
svm_model.predict([[48.50,10.250,9,8,9,5.75,1,1,7,1,2,22,1]])



array([1], dtype=int64)
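Because `svm_model` was fitted on a DataFrame, recent scikit-learn releases typically warn that a plain nested list has no feature names. One way to avoid this is to wrap the new row in a DataFrame with the same column names. The sketch below uses a stand-in model trained on synthetic data, so its prediction says nothing about the real classifier. `feature_names`, `demo_model`, and `new_applicant` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Column names matching the notebook's feature columns x1..x13.
feature_names = [f"x{i}" for i in range(1, 14)]

# Synthetic stand-in training data (assumption), labelled by a simple rule.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(100, 13)), columns=feature_names)
y_demo = (X_demo["x1"] + X_demo["x2"] > 0).astype(int)

demo_model = SVC(kernel="linear").fit(X_demo, y_demo)

# Wrap the single applicant in a DataFrame so the columns line up by name.
new_applicant = pd.DataFrame(
    [[48.50, 10.25, 9, 8, 9, 5.75, 1, 1, 7, 1, 2, 22, 1]],
    columns=feature_names,
)
pred = demo_model.predict(new_applicant)
print(pred)
```

With the real model, the same pattern would be `svm_model.predict(new_applicant)` using `X_train.columns` for the column names.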

**How is the model performing?**

In [15]:
# Mean accuracy on the held-out test set
svm_model.score(X_test, y_test)

0.8454106280193237
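Accuracy alone does not show which kind of mistake the model makes, and in credit decisions wrongly approving and wrongly rejecting an applicant usually carry different costs. A confusion matrix and per-class report break the score down. This is a hedged sketch on synthetic stand-in data, with `demo_model` and the other names chosen for illustration. `confusion_matrix` and `classification_report` are standard `sklearn.metrics` functions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 690 rows, 13 features.
X_demo, y_demo = make_classification(n_samples=690, n_features=13, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)

demo_model = SVC(kernel="linear").fit(X_tr, y_tr)
y_pred = demo_model.predict(X_te)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, y_pred)
print(cm)

# Precision, recall and F1 for each class.
print(classification_report(y_te, y_pred))
```

On the real data the equivalent calls would take `y_test` and `svm_model.predict(X_test)`.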

**Conclusion**


On the held-out test set of 207 applications, the linear SVM reaches an accuracy of 84.54%. For comparison, always predicting the majority class would score roughly 55.5% on the full dataset, since the mean of `target` is 0.4449. The result comes from a single train/test split, so it is an estimate of performance rather than a guarantee for new applicants. With further validation, a model like this could support financial institutions in their credit decisions.


**Future Work**


1. Cross-validation: Use cross-validation to obtain a more reliable performance estimate and to detect overfitting.

2. Feature Engineering: Explore additional feature engineering techniques to improve model accuracy.

3. Algorithm Comparison: Compare the performance of the SVM model with other classifiers such as Random Forest, Gradient Boosting, and Neural Networks.
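The first and third items above can be combined in a few lines. As a rough sketch under stated assumptions, the code below runs 5-fold cross-validation for a scaled linear SVM and a random forest on synthetic stand-in data. The names `candidates` and `cv_results` and the hyperparameters are illustrative choices, not results from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 690 rows, 13 features.
X_demo, y_demo = make_classification(n_samples=690, n_features=13, random_state=42)

# Candidate models to compare under the same cross-validation folds.
candidates = {
    "svm_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

cv_results = {}
for name, model in candidates.items():
    # Five accuracy scores per model, one for each held-out fold.
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Reporting the mean together with the spread across folds gives a steadier picture than the single 70/30 split used earlier.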
