![Credit card being held in hand](credit_card.jpg)

Commercial banks receive _a lot_ of applications for credit cards. Many are rejected for reasons such as high loan balances, low income levels, or too many inquiries on an individual's credit report. Manually analyzing these applications is mundane, error-prone, and time-consuming (and time is money!). Luckily, this task can be automated with machine learning, and pretty much every commercial bank does so nowadays. In this workbook, you will build an automatic credit card approval predictor using machine learning techniques, just like real banks do.

### The Data

The data is a small subset of the Credit Card Approval dataset from the UCI Machine Learning Repository showing the credit card applications a bank receives. This dataset has been loaded as a `pandas` DataFrame called `cc_apps`. The last column in the dataset is the target value.

In [58]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV

# Load the dataset
cc_apps = pd.read_csv("cc_approvals.data", header=None) 
cc_apps.head()

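|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 30.83 | 0.0 | u | g | w | v | 1.25 | t | t | 1 | g | 0 | + |
| 1 | a | 58.67 | 4.46 | u | g | q | h | 3.04 | t | t | 6 | g | 560 | + |
| 2 | a | 24.5 | 0.5 | u | g | q | h | 1.5 | t | f | 0 | g | 824 | + |
| 3 | b | 27.83 | 1.54 | u | g | w | v | 3.75 | t | t | 5 | g | 3 | + |
| 4 | b | 20.17 | 5.625 | u | g | w | v | 1.71 | t | f | 0 | s | 0 | + |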


In [59]:
# Work on an independent copy so the original cc_apps stays untouched
copy_data = cc_apps.copy()

# Checking for missing values. Result: `isnull()` finds none, because missing entries are stored as `'?'`

In [60]:
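# print(cc_apps.isnull().sum()) reports no NaNs: missing entries are stored as '?'.
# Number of '?' entries found per column (column -- count):
# 0 -- 12
# 1 -- 12
# 3 -- 6
# 4 -- 6
# 6 -- 9
list_to_check_missing = [0, 1, 3, 4, 6]
list_to_delete = []
for i in list_to_check_missing:
    # Row labels where column i holds the '?' placeholder
    index_of_question_mark = copy_data[copy_data[i] == '?'].index.tolist()

    print("in the index ", i, "the indexes are : ",index_of_question_mark)
    list_to_delete.append(index_of_question_mark)

print("the appended index : ", list_to_delete)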


in the index  0 the indexes are :  [248, 327, 346, 374, 453, 479, 489, 520, 598, 601, 641, 673]
in the index  1 the indexes are :  [83, 86, 92, 97, 254, 286, 329, 445, 450, 500, 515, 608]
in the index  3 the indexes are :  [206, 270, 330, 456, 592, 622]
in the index  4 the indexes are :  [206, 270, 330, 456, 592, 622]
in the index  6 the indexes are :  [206, 270, 330, 456, 479, 539, 592, 601, 622]
the appended index :  [[248, 327, 346, 374, 453, 479, 489, 520, 598, 601, 641, 673], [83, 86, 92, 97, 254, 286, 329, 445, 450, 500, 515, 608], [206, 270, 330, 456, 592, 622], [206, 270, 330, 456, 592, 622], [206, 270, 330, 456, 479, 539, 592, 601, 622]]
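
The loop above checks only a hand-picked list of columns. As a minimal sketch, assuming the placeholder is always the string `'?'`, the same counts can be computed for every column at once. The `toy` frame below is a hypothetical stand-in for `cc_apps`, not the real data.

```python
import pandas as pd

# Toy stand-in for cc_apps (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    0: ["b", "a", "?", "b"],
    1: ["30.83", "?", "24.5", "27.83"],
    2: [0.0, 4.46, 0.5, 1.54],
})

# eq("?") yields a boolean frame; summing counts the True values per column
question_marks = toy.eq("?").sum()
print(question_marks)

# Row labels that contain at least one '?' in any column
rows_with_missing = toy.index[toy.eq("?").any(axis=1)].tolist()
print(rows_with_missing)
```

Scanning all columns this way avoids having to know in advance which ones contain placeholders.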


In [61]:
# Flatten the list of lists into a single list
combined_list = [item for sublist in list_to_delete for item in sublist]

# Remove duplicates by converting to a set and then back to a list
unique_list = list(set(combined_list))
print(unique_list)

[641, 515, 520, 270, 539, 286, 673, 445, 450, 453, 327, 456, 329, 330, 206, 592, 83, 598, 86, 601, 346, 92, 479, 608, 97, 489, 622, 500, 374, 248, 254]
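
Because a `set` has no guaranteed order, the printed row labels come out shuffled. A small sketch of an alternative, using the lists printed above: flatten with `itertools.chain` and sort the result so the output is reproducible and easier to scan.

```python
from itertools import chain

# The per-column lists of row labels printed earlier in the notebook
list_to_delete = [
    [248, 327, 346, 374, 453, 479, 489, 520, 598, 601, 641, 673],
    [83, 86, 92, 97, 254, 286, 329, 445, 450, 500, 515, 608],
    [206, 270, 330, 456, 592, 622],
    [206, 270, 330, 456, 592, 622],
    [206, 270, 330, 456, 479, 539, 592, 601, 622],
]

# Flatten, deduplicate, and sort in one expression
unique_sorted = sorted(set(chain.from_iterable(list_to_delete)))
print(len(unique_sorted))  # 31 distinct rows
print(unique_sorted[:5])   # [83, 86, 92, 97, 206]
```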


In [62]:
print("before dropping 31 rows:", copy_data.shape)
# Drop the specified rows from the DataFrame
copy_data = copy_data.drop(index=unique_list)
print("after dropping 31 rows:", copy_data.shape)

before dropping 31 rows: (690, 14)
after dropping 31 rows: (659, 14)
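
The collect-flatten-drop steps above can also be expressed more compactly. The following is a sketch of one possible alternative, not the approach used in this notebook: convert the `'?'` placeholders to `NaN` and let `dropna()` remove the affected rows. The `toy` frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cc_apps (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    0: ["b", "a", "?", "b"],
    1: ["30.83", "?", "24.5", "27.83"],
    2: [0.0, 4.46, 0.5, 1.54],
})

# Replace the placeholder with NaN, then drop every row containing a NaN
cleaned = toy.replace("?", np.nan).dropna()

print("before:", toy.shape)
print("after:", cleaned.shape)
```

Dropping rows is the simplest option; imputing the missing values would keep more data, at the cost of extra assumptions.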


In [63]:
# Copy explicitly so the encoding steps below do not also modify copy_data
dropped_data = copy_data.copy()

In [64]:
dropped_data.shape

(659, 14)

# Label Encoding

In [65]:
# Import the preprocessing module, which provides LabelEncoder
from sklearn import preprocessing

# LabelEncoder maps each distinct string label to an integer code
label_encoder = preprocessing.LabelEncoder()


In [66]:
# Columns to encode (categorical) --> 0,3,4,5,6,8,9,11,13


In [67]:
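# Categorical columns to convert to integer codes
list_of_labels = [0, 3, 4, 5, 6, 8, 9, 11, 13]
for i in list_of_labels:
    # The encoder is refit on every column, so only the last column's mapping is retained
    dropped_data[i] = label_encoder.fit_transform(dropped_data[i])
dropped_data.dtypes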

0       int64
1      object
2     float64
3       int64
4       int64
5       int64
6       int64
7     float64
8       int64
9       int64
10      int64
11      int64
12      int64
13      int64
dtype: object
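
The loop above reuses a single `label_encoder`, so the mapping for each earlier column is overwritten on the next `fit_transform`. If the integer codes ever need to be translated back to the original labels, one option is to keep a separate encoder per column. This is a sketch on a hypothetical `toy` frame; the `encoders` dictionary is an illustrative name, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in with two categorical columns (hypothetical values)
toy = pd.DataFrame({0: ["b", "a", "a", "b"], 3: ["u", "y", "u", "u"]})

# One fitted encoder per column, keyed by column label
encoders = {}
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

print(toy)

# Codes are assigned in sorted label order, so 0 -> 'a' and 1 -> 'b' for column 0
print(encoders[0].inverse_transform([0, 1]))
```

Note that integer codes impose an arbitrary order on the categories. Distance-based models such as KNN treat that order as meaningful, so one-hot encoding may be worth comparing.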

In [68]:
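# Column 1 was read as strings because of the '?' placeholders; with those rows dropped it converts cleanly
dropped_data[1] = pd.to_numeric(dropped_data[1])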

print(dropped_data.dtypes)

0       int64
1     float64
2     float64
3       int64
4       int64
5       int64
6       int64
7     float64
8       int64
9       int64
10      int64
11      int64
12      int64
13      int64
dtype: object
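
The conversion above succeeds only because every row containing `'?'` in column 1 was dropped first. As a small sketch of what happens otherwise, `pd.to_numeric` with `errors="coerce"` turns unparseable strings into `NaN` instead of raising. The series `s` is hypothetical.

```python
import pandas as pd

# A string column that still contains the '?' placeholder (hypothetical values)
s = pd.Series(["30.83", "?", "24.5"])

# errors="coerce" converts anything unparseable to NaN rather than raising ValueError
converted = pd.to_numeric(s, errors="coerce")

print(converted)
print(converted.dtype)
```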


# Splitting data and scaling features


In [69]:
features = dropped_data.iloc[:, :-1]  # All columns except the last (the first 13)
target = dropped_data.iloc[:, -1]     # Last column: the approval label

In [70]:
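from sklearn.model_selection import train_test_split

# No test_size is given, so the default 25% of rows is held out; stratify keeps the class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(features, target, stratify=target, random_state=42)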
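
To illustrate what `stratify=target` does in the split above, here is a minimal sketch on synthetic arrays (the data is made up for the example). Both splits end up with the same share of positive labels as the full set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 20 samples, 60% class 0 and 40% class 1
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 12 + [1] * 8)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# With the default test_size, 25% of the rows go to the test set
print(len(X_tr), len(X_te))

# Share of class 1 in each split
print(y_tr.mean(), y_te.mean())
```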

In [71]:
# Import StandardScaler
from sklearn.preprocessing import StandardScaler

# Create the scaler
scaler = StandardScaler()

In [72]:
# Fit the scaler on the training data only, then apply the same transformation to the test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
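
`StandardScaler` standardizes each feature as `z = (x - mean) / std`, where the mean and standard deviation come from the data passed to `fit`. The sketch below uses a tiny made-up array to show that the test data is scaled with the training statistics, which is why `fit_transform` is called on the training set and only `transform` on the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up single-feature dataset
X_train_toy = np.array([[1.0], [2.0], [3.0]])
X_test_toy = np.array([[4.0]])

toy_scaler = StandardScaler()
X_train_toy_scaled = toy_scaler.fit_transform(X_train_toy)  # learns mean and std from training data
X_test_toy_scaled = toy_scaler.transform(X_test_toy)        # reuses the training statistics

print("learned mean:", toy_scaler.mean_)
print("learned std:", toy_scaler.scale_)
print("scaled test value:", X_test_toy_scaled)

# Manual check of the same formula
manual = (X_test_toy - X_train_toy.mean()) / X_train_toy.std()
print("manual value:", manual)
```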

# KNN model

In [73]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)

In [74]:
knn.fit(X_train_scaled, y_train)

In [75]:
# Score the model on the test data
print(knn.score(X_test_scaled, y_test))

0.8363636363636363
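
The model above fixes `n_neighbors=1` without comparing alternatives, and `GridSearchCV` is imported at the top of the notebook but never used. The following is a sketch of how the two could be combined, under stated assumptions: it runs on synthetic data from `make_classification` rather than the credit card data, and the candidate values in `param_grid` are an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; not the credit card dataset)
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Scaling lives inside the pipeline, so each CV fold is scaled using only its own training part
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# make_pipeline names each step after its lowercased class name
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 11]}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid.fit(X_tr, y_tr)

best_k = grid.best_params_["kneighborsclassifier__n_neighbors"]
test_accuracy = grid.score(X_te, y_te)

print("Best n_neighbors:", best_k)
print("Cross-validated accuracy:", grid.best_score_)
print("Test accuracy:", test_accuracy)
```

Selecting `n_neighbors` by cross-validation on the training set keeps the test set as an untouched final check.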


In [76]:
from sklearn.metrics import accuracy_score

# Make predictions on the test set
y_pred = knn.predict(X_test_scaled)

# accuracy_score on these predictions gives the same value as knn.score above
accuracy = accuracy_score(y_test, y_pred)

best_score = accuracy
print("Accuracy:", best_score)

Accuracy: 0.8363636363636363
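
Accuracy alone does not show which kind of mistake the model makes, and `confusion_matrix` is imported at the top of the notebook but never used. As a minimal sketch with made-up labels (not the notebook's actual predictions), the matrix splits the errors into wrongly approved and wrongly rejected applications. Treating class 1 as "approved" is an assumption for this example.

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels (1 = approved, 0 = rejected; an assumption for this example)
y_true_toy = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_toy = [1, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes, both in sorted label order
cm = confusion_matrix(y_true_toy, y_pred_toy)
print(cm)

# For a binary problem, ravel() unpacks the matrix as tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()
print("false positives (wrongly approved):", fp)
print("false negatives (wrongly rejected):", fn)
print("accuracy:", (tn + tp) / cm.sum())
```

For a bank the two error types usually carry different costs, so inspecting them separately can be more informative than a single accuracy figure.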
