| **CLASSIFICATION OF SPINAL CONDITIONS** |
|-----------------------------------------|
---


> **Description:**  
> A Support Vector Machine (SVM) classifier is trained to tell spinal conditions apart, with the aim of supporting diagnosis and treatment of spinal disorders. The model reaches an accuracy of 81.72 % on the held-out test set.

---

**Name:** Ayesha Siddiqua  
**Student ID:** U22103855


### INTRODUCTION

The dataset contains **310 instances** and **6 features** related to spinal health.

#### Features
1. *Pelvic Incidence*
2. *Pelvic Tilt*
3. *Lumbar Lordosis Angle*
4. *Sacral Slope*
5. *Pelvic Radius*
6. *Degree of Spondylolisthesis*
   
#### Spinal Condition Categories
1. *Normal (NO)* - Patients without any spinal issues.
2. *Disk Hernia (DH)* - Patients with a herniated disc.
3. *Spondylolisthesis (SL)* - Patients with vertebrae that have slipped.

### **1. Data Preprocessing**

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt #simple data visualization
%matplotlib inline
import seaborn as sns #some advanced data visualizations
import warnings
warnings.filterwarnings('ignore') # to get rid of warnings
plt.style.use('seaborn-v0_8-white') #defining desired style of viz

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

**Loading the Dataset**

In [None]:
# importing necessary libraries

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# loading the dataset

# read the file into a dataframe 
df = pd.read_csv('vertebral_column.csv')

# display the dataframe to check if it was loaded correctly
print(df.head())


**Handling Missing Values**

In [None]:
# check for missing values in the dataframe
missing_values = df.isnull().sum()

# display the count of missing values for each column
print(missing_values)

# check if there are any missing values
if missing_values.sum() == 0:
    print("\nNo missing values.")
else:
    print("\nMissing values found.")


**Data Preprocessing**

In [None]:
# FEATURES & TARGETS COLUMNS

features = df.columns[:-1]  
target = df.columns[-1]     

# Data preprocessing
x = df.iloc[:, :-1].values # feature values
y = df.iloc[:, -1].values # target values

labels=df[target].unique()

# print the names of the features
print("Features: \n", list(features))

# print the label type 
print("Labels: \n", labels)

**Label Encoding**

In [None]:
# Convert categorical labels to numerical values using label encoding:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

# TARGET LABELS: Disk Hernia -> 0, Normal -> 1, Spondylolisthesis -> 2.
y = le.fit_transform(y)
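
LabelEncoder assigns integer codes in sorted order of the distinct labels, which is why the mapping in the comment above comes out alphabetical. The following is a minimal, dependency-free sketch of that rule. It assumes the target column uses the abbreviations DH, NO and SL from the introduction; the actual strings in the CSV may differ.

```python
# Hypothetical label values; the real strings in vertebral_column.csv may differ.
raw_labels = ["SL", "NO", "DH", "SL", "NO", "SL"]

# LabelEncoder sorts the distinct classes and numbers them from 0.
classes = sorted(set(raw_labels))
mapping = {label: code for code, label in enumerate(classes)}

# Replace every label by its integer code.
encoded = [mapping[label] for label in raw_labels]

print(mapping)
print(encoded)
```

With the fitted encoder from the cell above, `le.classes_` lists the labels in code order and `le.inverse_transform` maps predicted codes back to label names.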

**Normalization/Scaling & Train-Test Split**

In [None]:
# Splitting the dataset into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

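# Feature scaling
# Standardizing puts all features on a comparable scale, which matters for distance-based models such as SVMs.
# The scaler is fitted on the training set only, so no test-set statistics leak into training.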
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
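
What the scaler does to each feature can be written out by hand. The sketch below is an illustration with made-up numbers for a single feature, not the notebook's data: the mean and population standard deviation are taken from the training values only and then reused for the test values.

```python
import statistics

# Made-up values for one feature; not taken from the dataset.
train_values = [40.0, 50.0, 60.0, 70.0]
test_values = [55.0, 80.0]

# Equivalent of StandardScaler.fit: learn mean and population std from training data.
mean = statistics.fmean(train_values)
std = statistics.pstdev(train_values)

# Equivalent of StandardScaler.transform: apply the same statistics to any data.
train_scaled = [(v - mean) / std for v in train_values]
test_scaled = [(v - mean) / std for v in test_values]

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and standard deviation 1 by construction; the test values generally do not, because they are scaled with the training statistics.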


**Feature selection using Recursive Feature Elimination (RFE) with SVM classifier**

In [None]:
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Print the names of features before RFE
print("Features before RFE:")
for feature in features:
    print(feature)

# Feature selection using Recursive Feature Elimination (RFE) with SVM classifier
svm = SVC(kernel="linear")
# Keeping 6 of the 6 features means RFE eliminates nothing; this setting gave the highest accuracy
rfe_selector = RFE(estimator=svm, n_features_to_select=6, step=1)
X_train_rfe = rfe_selector.fit_transform(x_train, y_train)
X_test_rfe = rfe_selector.transform(x_test)

# Train the SVM classifier on the selected features
svm.fit(X_train_rfe, y_train)

# Make predictions on the test set
y_pred = svm.predict(X_test_rfe)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("\nAccuracy with selected features:\n", accuracy)

# Get the selected feature indices
selected_feature_indices = rfe_selector.support_

# Get the names of selected features
selected_feature_names = features[selected_feature_indices]

print("\nSelected Features after RFE:")
for feature in selected_feature_names:
    print(feature)

In [None]:
# Get the ranking of each feature
feature_ranking = rfe_selector.ranking_

# Print the rank of each feature
print("Rank of each feature:")
for feature_name, rank in zip(features, feature_ranking):
    print(f"{feature_name}: {rank}")
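
Because the dataset has six features and `n_features_to_select=6`, RFE has nothing to eliminate: every feature is kept and every rank is 1. The sketch below illustrates this on synthetic data from `make_classification` (an assumption standing in for the spinal dataset) and contrasts it with keeping only three features.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the dataset: 6 features, 3 classes.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Keeping all 6 of 6 features: no elimination step runs.
rfe_all = RFE(estimator=SVC(kernel="linear"), n_features_to_select=6, step=1).fit(X, y)
print(rfe_all.support_)   # every entry is True
print(rfe_all.ranking_)   # every entry is 1

# Keeping 3 of 6 features: the three dropped features receive ranks 2, 3 and 4.
rfe_three = RFE(estimator=SVC(kernel="linear"), n_features_to_select=3, step=1).fit(X, y)
print(rfe_three.ranking_)
```

Which features end up with rank 1 in the second case depends on the data, so the printed ranking for the synthetic set says nothing about the spinal features.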

### **2. Generating SVM Model**

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


I used GridSearchCV to tune the hyperparameters.

With the `cv` parameter set to 5, GridSearchCV runs 5-fold cross-validation for every combination of hyperparameters, so a separate KFold cross-validation step was not needed.
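
To make the cost of the search concrete, the parameter grid used in this notebook can be enumerated by hand. This is a small sketch using only the standard library; it mirrors how a grid search expands its parameter grid but is not scikit-learn's own code.

```python
from itertools import product

# Same grid and fold count as in the tuning cell of this notebook.
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf'], 'gamma': ['scale', 'auto']}
cv_folds = 5

# Every combination of one value per hyperparameter.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Each combination is fitted once per fold during cross-validation.
cv_fits = len(combinations) * cv_folds

print(len(combinations), "combinations,", cv_fits, "cross-validation fits")
```

After cross-validation, the best combination is refit once on the full training set. Since `gamma` has no effect with the linear kernel, several of these combinations are equivalent.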

In [None]:
# Define the parameter grid to search
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf'], 'gamma': ['scale', 'auto']}

# Initialize SVM classifier for multiclass 
svm = SVC(decision_function_shape='ovo')

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=svm, param_grid=param_grid, cv=5)

# Perform hyperparameter tuning
grid_search.fit(x_train, y_train)

# Print the best hyperparameters found
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(x_test)
test_accuracy = best_model.score(x_test, y_test)
print("Test Accuracy:", test_accuracy)
 

**Choosing Best Hyperparameters**

In [None]:
# Create an SVM classifier with the best hyperparameters found by the grid search
clf = SVC(kernel=best_params['kernel'], C=best_params['C'], gamma=best_params['gamma'])

# Fit the model using the training sets
clf.fit(x_train, y_train)

# Predict the response for the test dataset
y_pred = clf.predict(x_test)


### **3. Evaluating the Model**

**ACCURACY**

- Test size: 0.20 -> Accuracy: 75.81 %
- Test size: 0.25 -> Accuracy: 80.77 %
- Test size: 0.30 -> Accuracy: 81.72 %

**Hence, a test size of 0.3 is chosen, as it gives the highest accuracy.**
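
The three accuracies can be translated back into counts of test samples, which helps judge how far apart they really are. The sketch assumes the 310 instances from the introduction and that `train_test_split` rounds the test-set size up, which is how scikit-learn handles a fractional `test_size`.

```python
n_samples = 310

# Reported results: test size in percent -> accuracy in percent.
reported = {20: 75.81, 25: 80.77, 30: 81.72}

counts = {}
for pct, accuracy in reported.items():
    n_test = -(-n_samples * pct // 100)         # ceiling division
    n_correct = round(accuracy / 100 * n_test)  # nearest whole number of correct predictions
    counts[pct] = (n_correct, n_test)
    print(f"test_size={pct / 100:.2f}: {n_correct}/{n_test} correct "
          f"= {100 * n_correct / n_test:.2f} %")
```

Under these assumptions the counts reproduce the reported percentages. Each figure comes from a different test set of fewer than 100 samples, so a gap of one percentage point corresponds to roughly one sample.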

In [None]:
# Import the accuracy metric from scikit-learn
from sklearn.metrics import accuracy_score

# Model Accuracy: how often is the classifier correct?
accuracy = accuracy_score(y_test, y_pred) * 100 
print(f'Accuracy of the model is equal to {round(accuracy, 2)} %.')


**CONFUSION MATRIX, F1 SCORE, PRECISION AND RECALL**

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report, f1_score, recall_score, precision_score

# confusion matrix to summarize the classification results        
CM = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", CM)

# F1 score - the harmonic mean of precision and recall
# 'weighted' averaging is used to consider the proportion of each class in the dataset
f1 = f1_score(y_test, y_pred, average='weighted')  
print(f"F1 Score: {f1:.4f}")
        
# Recall measures the ability of the model to identify positive instances       
recall = recall_score(y_test, y_pred, average='weighted')    
print(f"Recall: {recall:.4f}")

# Precision measures the accuracy of positive predictions
precision = precision_score(y_test, y_pred, average='weighted')
print(f"Precision: {precision:.4f}")

# Plotting the confusion matrix heatmap
plt.figure(figsize=(10, 7))
sns.heatmap(CM, annot=True, fmt='d', cmap='Blues', cbar=False, 
                    xticklabels=['Disk Hernia', 'Normal', 'Spondylolisthesis'], 
                    yticklabels=['Disk Hernia', 'Normal', 'Spondylolisthesis'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix Heatmap')
plt.show()

from sklearn.metrics import (
    confusion_matrix,
    ConfusionMatrixDisplay,
    classification_report,
)
labels = ["Disk Hernia", "Normal", "Spondylolisthesis"]
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot();
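
The 'weighted' averages above can be reproduced directly from a confusion matrix. The sketch below uses a made-up 3x3 matrix (rows are true classes, columns are predictions), not the notebook's results, and weights each class's score by its number of true samples.

```python
# Made-up confusion matrix for illustration; rows = true class, columns = predicted class.
cm = [
    [8, 2, 0],
    [3, 15, 2],
    [0, 1, 29],
]
n_classes = len(cm)
support = [sum(row) for row in cm]  # true samples per class
predicted = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
total = sum(support)

# Per-class scores read off the diagonal.
precision = [cm[k][k] / predicted[k] for k in range(n_classes)]
recall = [cm[k][k] / support[k] for k in range(n_classes)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

def weighted(scores):
    """Average per-class scores, weighting each class by its support."""
    return sum(s * n for s, n in zip(scores, support)) / total

accuracy = sum(cm[k][k] for k in range(n_classes)) / total
print(f"Precision: {weighted(precision):.4f}")
print(f"Recall:    {weighted(recall):.4f}")
print(f"F1 Score:  {weighted(f1):.4f}")
print(f"Accuracy:  {accuracy:.4f}")
```

Weighted recall always equals overall accuracy, because weighting each class's recall by its support simply sums the correct predictions and divides by the total.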

### **4. Saving the trained SVM model as a .pkl file for deployment**


In [None]:
import joblib

joblib.dump(clf, 'svm_spinal_classifier.pkl')
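
The saved file contains only the classifier, so any code that loads it must scale new inputs with the same fitted scaler before predicting. One way to keep the two together is to wrap them in a scikit-learn `Pipeline` and save that instead. The sketch below is an illustration on synthetic data; the file name `svm_spinal_pipeline.pkl` and the hyperparameters are placeholders, not the notebook's tuned values.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the raw (unscaled) features and encoded labels.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Scaler and classifier are fitted and stored as one object.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1, gamma="scale"))
pipeline.fit(X, y)

# Placeholder path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "svm_spinal_pipeline.pkl")
joblib.dump(pipeline, path)

# After loading, raw inputs can be passed straight to predict().
loaded = joblib.load(path)
same = bool((loaded.predict(X) == pipeline.predict(X)).all())
print("Predictions match after reload:", same)
```

In the notebook itself, the equivalent would be to fit such a pipeline on the unscaled training split, or to save the fitted `scaler` and `le` objects alongside `clf`.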