## Milestone 3: Feature Selection and Parameter Tuning

In Milestone 3, we focus on feature selection, parameter tuning, and more advanced ML classifiers. We use mutual information to rank the features and cross-validated grid search to tune model parameters, with the goal of optimizing the performance of KNN, Decision Tree, and SVC models. The results are discussed and the models compared at the end of this section.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import mutual_info_classif


In [2]:
print(dataset.columns)

Index(['Loan_Type', 'Requested_LoanAmount', 'Approved_LoanAmount',
       'Funded_LoanAmount', 'Loan_Status', 'Borrower_Type', 'Applicant_State',
       'Credit_Score', 'Type_ofVehicle', 'Value_ofAsset', 'LTV_Ratio',
       'Loan_Class', 'Loan_Tier', 'Loan_Term', 'Monthly_Income',
       'Monthly_debt', 'DTI_Ratio', 'Monthly_LoanPayment',
       'Age_of_Employment_in_Months', 'Age_of_Employment_in_Years',
       'Loan_SubType_Cat', 'Loan_Type_Cat', 'Loan_Status_Cat',
       'Borrower_Type_Cat'],
      dtype='object')


In [3]:
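# Drop the raw categorical columns; the numeric and encoded `_Cat` columns remain.
columns_to_drop = ['Loan_Type', 'Loan_Status', 'Borrower_Type', 'Applicant_State',
                   'Type_ofVehicle', 'Loan_Class', 'Loan_Tier']
dataset = dataset.drop(columns=columns_to_drop)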

In [4]:
X_train, X_test, y_train, y_test = train_test_split(dataset.drop(labels=['Loan_SubType_Cat'], axis=1), dataset['Loan_SubType_Cat'], test_size=0.3, random_state=0)

In [5]:
mutual_info = mutual_info_classif(X_train, y_train)

In [6]:
# Label each score with its feature name, then rank from most to least informative.
mutual_info = pd.Series(mutual_info, index=X_train.columns)
mutual_info.sort_values(ascending=False)

Loan_Type_Cat                  0.388071
Requested_LoanAmount           0.110842
Loan_Term                      0.110751
Monthly_LoanPayment            0.105559
Value_ofAsset                  0.096914
Approved_LoanAmount            0.048296
Funded_LoanAmount              0.043501
DTI_Ratio                      0.028036
Credit_Score                   0.020360
Loan_Status_Cat                0.018261
Monthly_debt                   0.015346
Monthly_Income                 0.002857
LTV_Ratio                      0.000000
Age_of_Employment_in_Months    0.000000
Age_of_Employment_in_Years     0.000000
Borrower_Type_Cat              0.000000
dtype: float64
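
The ranking above can feed an explicit selection step. The following is a minimal sketch of one way to do that with `SelectKBest`, not the method used in this notebook. It runs on synthetic data rather than the loan dataset, and `X_demo`, `y_demo`, and the choice of `k=3` are illustrative assumptions.

```python
from functools import partial

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in for X_train / y_train (assumption: not the loan data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_demo = pd.DataFrame(X_demo, columns=[f"feature_{i}" for i in range(8)])

# mutual_info_classif is a randomized estimator; fixing random_state
# makes the scores reproducible from run to run.
score_func = partial(mutual_info_classif, random_state=0)

# Keep the k features with the highest mutual information (k=3 is arbitrary).
selector = SelectKBest(score_func=score_func, k=3)
selector.fit(X_demo, y_demo)

selected_columns = X_demo.columns[selector.get_support()].tolist()
X_demo_selected = X_demo[selected_columns]
print(selected_columns)
```

On the real data, the same selector would be fitted on `X_train` only and then applied to `X_test` with `selector.transform`, so that the test split does not influence which features are kept.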

# Step 3: Parameter Tuning for KNN (K-Nearest Neighbors)

In [8]:
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

In [9]:
# p is the Minkowski distance power: 1 = Manhattan, 2 = Euclidean.
param_grid = {'n_neighbors': [3, 4, 5, 6, 7], 'p': [1, 2, 5]}

knn = KNeighborsClassifier()


In [10]:
grid_search = GridSearchCV(knn, param_grid, cv=5, verbose=1, scoring='accuracy', return_train_score=True)
grid_search.fit(X_train, y_train)


Fitting 5 folds for each of 15 candidates, totalling 75 fits




In [11]:
best_params_knn = grid_search.best_params_
print("Best parameters for KNN:", best_params_knn)

Best parameters for KNN: {'n_neighbors': 6, 'p': 5}


In [12]:
best_knn = KNeighborsClassifier(n_neighbors=best_params_knn['n_neighbors'], p=best_params_knn['p'])
best_knn.fit(X_train, y_train)
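
Because the search above was run with `return_train_score=True`, its `cv_results_` attribute holds the mean training and validation scores for every candidate, which helps spot overfitting. `GridSearchCV` also refits the best candidate on the full training set by default (`refit=True`), so `best_estimator_` is an alternative to rebuilding the classifier by hand. A minimal sketch on synthetic data follows; `demo_search`, `X_demo`, and `y_demo` are illustrative names, not part of the notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training split (assumption: not the loan data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

demo_search = GridSearchCV(
    KNeighborsClassifier(),
    {'n_neighbors': [3, 5, 7], 'p': [1, 2]},
    cv=5,
    scoring='accuracy',
    return_train_score=True,
)
demo_search.fit(X_demo, y_demo)

# One row per candidate; a large gap between train and test means suggests overfitting.
columns = ['param_n_neighbors', 'param_p', 'mean_train_score',
           'mean_test_score', 'std_test_score', 'rank_test_score']
cv_table = pd.DataFrame(demo_search.cv_results_)[columns].sort_values('rank_test_score')
print(cv_table)

# With refit=True (the default), the best candidate is already fitted on all of X_demo.
best_knn_demo = demo_search.best_estimator_
print(best_knn_demo.get_params()['n_neighbors'])
```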

In [13]:
from sklearn.metrics import classification_report


y_pred_knn = best_knn.predict(X_test)


In [14]:
print("Classification Report for KNN:")
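# zero_division=0 reports 0.00 for classes that are never predicted, without raising a warning.
print(classification_report(y_test, y_pred_knn, zero_division=0))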

Classification Report for KNN:
              precision    recall  f1-score   support

           0       0.70      0.93      0.80       182
           1       0.00      0.00      0.00         1
           2       0.00      0.00      0.00        17
           3       0.40      0.25      0.31        16
           5       0.20      0.04      0.07        49

    accuracy                           0.66       265
   macro avg       0.26      0.24      0.24       265
weighted avg       0.54      0.66      0.58       265
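
The report shows a strongly imbalanced target: class 0 accounts for 182 of the 265 test rows, and the macro-averaged F1 (0.24) is far below the accuracy (0.66). One possible response, sketched below under assumptions, is to stratify the split and tune on macro F1 instead of accuracy. The data here is synthetic, and `X_demo`, `y_demo`, and the class weights are illustrative, not taken from the loan dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic imbalanced 3-class problem (assumption: stands in for the loan data).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_classes=3,
    weights=[0.8, 0.15, 0.05], random_state=0,
)

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# 'f1_macro' weights every class equally, so minority classes affect model selection.
search = GridSearchCV(
    KNeighborsClassifier(), {'n_neighbors': [3, 5, 7]}, cv=5, scoring='f1_macro'
)
search.fit(X_tr, y_tr)

macro_f1 = f1_score(y_te, search.predict(X_te), average='macro', zero_division=0)
print(search.best_params_, round(macro_f1, 3))
```

Note that `train_test_split` raises a `ValueError` when stratifying on a class with fewer than two samples, so very rare classes (class 1 has a test support of 1 above) may need to be merged or removed first.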





In [None]:
from sklearn.svm import SVC


param_grid_svm = {'C': [0.1, 1, 10, 100], 'kernel': ['linear', 'rbf', 'poly'], 'gamma': ['scale', 'auto']}


svm_classifier = SVC()


grid_search_svm = GridSearchCV(svm_classifier, param_grid_svm, cv=5, verbose=1, scoring='accuracy', return_train_score=True)
grid_search_svm.fit(X_train, y_train)

best_params_svm = grid_search_svm.best_params_
print("Best Parameters for SVM:", best_params_svm)


best_svm = SVC(C=best_params_svm['C'], kernel=best_params_svm['kernel'], gamma=best_params_svm['gamma'])
best_svm.fit(X_train, y_train)


y_pred_svm = best_svm.predict(X_test)


print("Classification Report for SVM:")
print(classification_report(y_test, y_pred_svm))


Fitting 5 folds for each of 24 candidates, totalling 120 fits
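
The output above stops after the fitting message. SVMs are sensitive to feature scale, and fitting them on unscaled columns that mix large amounts with small ratios may be slow and may give poor results. A common remedy, sketched here as an assumption and not as what this notebook does, is to wrap a `StandardScaler` and the `SVC` in a `Pipeline` so that scaling is refitted inside every cross-validation fold. The data is synthetic, and `svm_pipeline`, `param_grid_demo`, and the reduced grid are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training split (assumption: not the loan data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)
# Inflate one column to mimic mixing currency amounts with ratios.
X_demo[:, 0] *= 10_000

svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),  # fitted on the training part of each fold only
    ('svc', SVC()),
])

# Pipeline parameters are addressed as <step name>__<parameter>.
param_grid_demo = {
    'svc__C': [0.1, 1, 10],
    'svc__kernel': ['linear', 'rbf'],
    'svc__gamma': ['scale', 'auto'],
}

svm_search = GridSearchCV(svm_pipeline, param_grid_demo, cv=5, scoring='accuracy')
svm_search.fit(X_demo, y_demo)
print(svm_search.best_params_)
```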




In [None]:

print("Performance Comparison:")
print("KNN:")
print(classification_report(y_test, y_pred_knn))
print("\nDecision Tree:")
# y_pred_dt must be produced by a tuned Decision Tree; that cell is not shown in this section.
print(classification_report(y_test, y_pred_dt))
print("\nSupport Vector Machine:")
print(classification_report(y_test, y_pred_svm))
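
The comparison uses `y_pred_dt`, but no Decision Tree cell appears in this section. The sketch below shows how a Decision Tree could be tuned with the same `GridSearchCV` pattern used for KNN and SVM. It is an assumption about the missing step, not the original cell: the data is synthetic, and `param_grid_dt`, `grid_search_dt`, and `y_pred_dt_demo` are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Illustrative grid: depth and leaf size limit tree growth to curb overfitting.
param_grid_dt = {
    'max_depth': [3, 5, None],
    'min_samples_leaf': [1, 5],
    'criterion': ['gini', 'entropy'],
}

grid_search_dt = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid_dt, cv=5, scoring='accuracy'
)
grid_search_dt.fit(X_tr, y_tr)

# best_estimator_ is already refitted on the whole training split.
y_pred_dt_demo = grid_search_dt.best_estimator_.predict(X_te)
print(classification_report(y_te, y_pred_dt_demo, zero_division=0))
```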
