# Machine Learning Project (Ques-1)

**Programme:** Master of Computer Applications <br>
**Semester:** 2nd <br>

**Members:**
1. Geetansh Khaturia (23MCA0064)
2. Shivam Verma (23MCA0159)
3. Suryansh Narang (23MCA0166)

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
import pickle

In [2]:
data = pd.read_excel("credit_rating.xls")
data.head()

|  | S.No | CHK_ACCT | Duration | History | Purpose of credit | Credit Amount | Balance in Savings A/C | Employment | Install_rate | Marital status | ... | Real Estate | Age | Other installment | Residence | Num_Credits | Job | No. dependents | Phone | Foreign | Credit classification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0DM | 6 | critical | radio-tv | 1169 | unknown | over-seven | 4 | single-male | ... | real-estate | 67 | none | own | 2 | skilled | 1 | no | yes | good. |
| 1 | 2 | less-200DM | 48 | duly-till-now | radio-tv | 5951 | less100DM | four-years | 2 | female-divorced | ... | real-estate | 22 | none | own | 1 | skilled | 1 | yes | yes | bad. |
| 2 | 3 | no-account | 12 | critical | education | 2096 | less100DM | seven-years | 2 | single-male | ... | real-estate | 49 | none | own | 1 | unskilled-resident | 2 | yes | yes | good. |
| 3 | 4 | 0DM | 42 | duly-till-now | furniture | 7882 | less100DM | seven-years | 2 | single-male | ... | building-society | 45 | none | free | 1 | skilled | 2 | yes | yes | good. |
| 4 | 5 | 0DM | 24 | delay | new-car | 4870 | less100DM | four-years | 3 | single-male | ... | none | 53 | none | free | 2 | skilled | 2 | yes | yes | bad. |


## Data Preprocessing

In [3]:
# Drop the serial-number column(s); errors="ignore" skips a spelling that is not present
data.drop(columns=["S.No", "S.No."], inplace=True, errors="ignore")

In [4]:
duplicate_rows = data.duplicated().sum()
duplicate_cols = data.columns.duplicated().sum()

print("Number of duplicate rows:", duplicate_rows)
print("Number of duplicate columns:", duplicate_cols)

Number of duplicate rows: 0
Number of duplicate columns: 0


In [5]:
null_values = data.isnull().sum()
print("Number of null values...")
print(null_values)

Number of null values...
CHK_ACCT                  0
Duration                  0
History                   0
Purpose of credit         0
Credit Amount             0
Balance in Savings A/C    0
Employment                0
Install_rate              0
Marital status            0
Co-applicant              0
Present Resident          0
Real Estate               0
Age                       0
Other installment         0
Residence                 0
Num_Credits               0
Job                       0
No. dependents            0
Phone                     0
Foreign                   0
Credit classification     0
dtype: int64


In [6]:
print("Number of values in target variable...")
print(data["Credit classification"].value_counts())

Number of values in target variable...
Credit classification
 good.    700
 bad.     300
Name: count, dtype: int64


**Encoding the discrete values of categorical attributes.**

In [7]:
label_encoders = {}

for column in data.columns:
    if data[column].dtype == 'object':
        label_encoders[column] = LabelEncoder()
        data[column] = label_encoders[column].fit_transform(data[column])
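`LabelEncoder` assigns integer codes in sorted (lexicographic) order of the distinct values, which is why the resampled target later shows codes 0 and 1: "bad." sorts before "good.". The toy frame below reuses a few values from the `head()` output; the rows and column subset are illustrative assumptions, not the real data. It lists the text columns explicitly rather than testing `dtype == 'object'`, since newer pandas versions may store strings under a dedicated string dtype.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame built from category strings that appear in the head() output (illustrative rows)
df = pd.DataFrame({
    "CHK_ACCT": ["0DM", "less-200DM", "no-account", "0DM"],
    "Duration": [6, 48, 12, 42],
    "Credit classification": ["good.", "bad.", "good.", "good."],
})

encoders = {}
for col in ["CHK_ACCT", "Credit classification"]:  # text columns listed explicitly
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Classes are sorted, so "bad." -> 0 and "good." -> 1
print(encoders["Credit classification"].classes_)  # ['bad.' 'good.']
print(df["Credit classification"].tolist())         # [1, 0, 1, 1]

# inverse_transform maps codes back to the original strings
print(encoders["CHK_ACCT"].inverse_transform([0, 1, 2]))  # ['0DM' 'less-200DM' 'no-account']
```

Because the codes follow alphabetical order, they impose an arbitrary ordering on nominal attributes such as `Purpose of credit`.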

**Separating the independent and target variables.**

In [8]:
X = data.drop(columns=['Credit classification'])
y = data['Credit classification']

**Standardizing the independent variables (zero mean, unit variance).**

In [9]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
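`StandardScaler` rescales each column independently as $z = (x - \mu)/\sigma$, where $\mu$ is the column mean and $\sigma$ the population standard deviation (`ddof=0`). A quick check on the `Duration` and `Credit Amount` values of the five rows shown earlier:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Duration and Credit Amount for the five rows in the head() output
X = np.array([[6, 1169], [48, 5951], [12, 2096], [42, 7882], [24, 4870]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Same transform by hand: subtract the column mean, divide by the population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, manual))                 # True
print(np.allclose(X_scaled.mean(axis=0), 0.0))       # True
print(np.allclose(X_scaled.std(axis=0), 1.0))        # True
```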

**Applying SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance.**

In [10]:
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_scaled, y)

In [11]:
print(y_resampled.value_counts())

Credit classification
1    700
0    700
Name: count, dtype: int64
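SMOTE creates each synthetic minority row by picking a minority sample $x_i$, one of its $k$ nearest minority neighbours $x_{nn}$, and a random $\lambda \in [0, 1)$, then taking $x_{new} = x_i + \lambda (x_{nn} - x_i)$. The from-scratch sketch below illustrates only that interpolation step; `smote_sketch` is a hypothetical helper, not imbalanced-learn's implementation, and the input is random data standing in for the 300 scaled "bad." rows.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by SMOTE-style interpolation (simplified illustration)."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours because each point is returned as its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, neighbours = nn.kneighbors(X_minority)

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for row in range(n_new):
        i = rng.integers(len(X_minority))          # random minority sample
        j = neighbours[i, rng.integers(1, k + 1)]  # one of its k neighbours, skipping itself
        lam = rng.random()                          # interpolation factor in [0, 1)
        synthetic[row] = X_minority[i] + lam * (X_minority[j] - X_minority[i])
    return synthetic


# 300 minority rows need 400 synthetic rows to reach 700, matching the counts above
rng = np.random.default_rng(1)
X_bad = rng.normal(size=(300, 20))  # random stand-in for scaled minority rows
X_new = smote_sketch(X_bad, n_new=400)
print(X_new.shape)  # (400, 20)
```

Each synthetic row lies on the segment between two real minority rows, so it stays inside the region the minority class already occupies.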


**Splitting the dataset into training and test sets.**

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2, random_state=42)

## Support Vector Machine

In [13]:
clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)
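`SVC(kernel="rbf")` leaves `gamma` at its default `"scale"`, which scikit-learn documents as `1 / (n_features * X.var())`; the kernel itself is $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. A small check on random data (not the credit data) comparing a hand-written kernel with `sklearn.metrics.pairwise.rbf_kernel`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # random stand-in data

# Value used by SVC's default gamma="scale"
gamma = 1.0 / (X.shape[1] * X.var())

# Pairwise squared Euclidean distances, then the RBF kernel
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

print(np.allclose(K_manual, rbf_kernel(X, gamma=gamma)))  # True
```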

**Evaluating the performance of the SVM classifier.**

In [14]:
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

In [15]:
print("Accuracy:", acc)
print("F1 Score:", f1)
print(cm)

Accuracy: 0.825
F1 Score: 0.8304498269896193
[[111  20]
 [ 29 120]]
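The printed scores can be checked by hand from the confusion matrix. scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns in sorted label order, so here it reads `[[TN, FP], [FN, TP]]`. `f1_score` defaults to `pos_label=1`, which under `LabelEncoder`'s sorted ordering is "good.", giving $F_1 = 2TP / (2TP + FP + FN) = 240/289$.

```python
import numpy as np

# Confusion matrix printed above, laid out as [[TN, FP], [FN, TP]]
cm = np.array([[111, 20], [29, 120]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # 231 / 280
precision = tp / (tp + fp)        # 120 / 140
recall = tp / (tp + fn)           # 120 / 149
f1 = 2 * precision * recall / (precision + recall)

print(accuracy)  # 0.825
print(f1)
```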


In [16]:
with open("model.pkl", "wb") as file:
    pickle.dump((clf, scaler, label_encoders), file)