# Project 6.1

Project 1: Support Vector Machines (SVMs) in Credit Risk Analysis

Business understanding – A financial institution wants to minimize the risk of loan default by identifying high-risk borrowers.

Data understanding – The institution has a dataset containing the financial and personal information of past borrowers, including credit score, income, age, and loan status.

Data preparation – We will pre-process the data by removing rows with missing values, standardizing the numerical features, and one-hot encoding the categorical features.

Modeling – We will apply SVMs to classify borrowers into high-risk and low-risk categories. We will experiment with different kernel functions (for example linear, polynomial, and RBF) and different values of the regularization parameter C to optimize the model's performance.
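One way to run this experiment is a cross-validated grid search. The sketch below assumes synthetic data from `make_classification` as a stand-in for the borrower features, and the grid values are illustrative choices rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the borrower features (assumption, not the real data).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid: three kernels crossed with three regularization strengths.
param_grid = {
    'svc__kernel': ['linear', 'rbf', 'poly'],
    'svc__C': [0.1, 1, 10],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring='f1')
search.fit(X_train, y_train)

print('Best parameters:', search.best_params_)
print(f'Best cross-validated F1: {search.best_score_:.2f}')
print(f'Test-set F1: {search.score(X_test, y_test):.2f}')
```

Scoring on F1 rather than accuracy is a design choice here: in credit risk the default class is usually the minority, so accuracy alone can look good while most defaulters are missed.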

Evaluation – We will evaluate the SVM model using metrics such as accuracy, precision, recall, and F1-score. We will also compare the SVM model's performance to other classification algorithms such as logistic regression and decision trees.
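The comparison described above could look roughly like the following. It is a sketch under stated assumptions: the data is synthetic (with about 20% positives to mimic a minority default class), and the hyperparameters of the baseline models are arbitrary illustrative values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the credit data (assumption).
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# The SVM and logistic regression are scale-sensitive, so they get a scaler.
models = {
    'SVM (RBF)': make_pipeline(StandardScaler(), SVC(kernel='rbf', C=10)),
    'Logistic regression': make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    'Decision tree': DecisionTreeClassifier(max_depth=5, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred, zero_division=0),
        'recall': recall_score(y_test, y_pred, zero_division=0),
        'f1': f1_score(y_test, y_pred, zero_division=0),
    }
    summary = ', '.join(f'{metric}: {value:.2f}'
                        for metric, value in results[name].items())
    print(f'{name}: {summary}')
```

All three models are evaluated on the same held-out split, so the metric differences reflect the models rather than the sampling.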

In [None]:
# Load and pre-process the data
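import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split

df = pd.read_csv('credit_data.csv')
df.dropna(inplace=True)  # drop rows with missing values

numerical_cols = ['credit_score', 'income', 'age']
categorical_cols = ['gender', 'education', 'employment']

# Split the data into training and test sets before fitting any transformer,
# so that test-set statistics do not leak into the training features
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

# Fit the scaler and the encoder on the training set only
scaler = StandardScaler().fit(df_train[numerical_cols])
encoder = OneHotEncoder(handle_unknown='ignore').fit(df_train[categorical_cols])

def build_features(frame):
    """Scale the numerical columns, one-hot encode the categorical ones, and join them."""
    X_numerical = scaler.transform(frame[numerical_cols])
    X_categorical = encoder.transform(frame[categorical_cols]).toarray()
    return np.concatenate([X_numerical, X_categorical], axis=1)

X_train, X_test = build_features(df_train), build_features(df_test)
y_train, y_test = df_train['default'].values, df_test['default'].values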

# Train an SVM model
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

svm = SVC(kernel='rbf', C=10)
svm.fit(X_train, y_train)

# Evaluate the model
y_pred = svm.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f'Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}, F1-score: {f1:.2f}')
