| **CLASSIFICATION OF SPINAL CONDITIONS** |
|-----------------------------------------|
---


> **Description:**  
> 
---

**Name:** Ayesha Siddiqua  
**Student ID:** U22103855


### INTRODUCTION

The dataset contains **310 instances** and **6 features** related to spinal health.

#### Features
1. *Pelvic Incidence*
2. *Pelvic Tilt*
3. *Lumbar Lordosis Angle*
4. *Sacral Slope*
5. *Pelvic Radius*
6. *Degree of Spondylolisthesis*
   
#### Spinal Condition Categories
1. *Normal (NO)* - Patients without any spinal issues.
2. *Disk Hernia (DH)* - Patients with a herniated disc.
3. *Spondylolisthesis (SL)* - Patients in whom a vertebra has slipped out of position.

### **1. Data Preprocessing**

**Loading the Dataset**

In [12]:
# importing necessary libraries

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# loading the dataset

# read the file into a dataframe 
df = pd.read_csv('vertebral_column.csv')

# display the dataframe to check if it was loaded correctly
print(df.head())


   pelvic_incidence  pelvic_tilt  lumbar_lordosis_angle  sacral_slope  \
0             63.03        22.55                  39.61         40.48   
1             39.06        10.06                  25.02         29.00   
2             68.83        22.22                  50.09         46.61   
3             69.30        24.65                  44.31         44.64   
4             49.71         9.65                  28.32         40.06   

   pelvic_radius  degree_spondylolisthesis spinal_condition  
0          98.67                     -0.25               DH  
1         114.41                      4.56               DH  
2         105.99                     -3.53               DH  
3         101.87                     11.21               DH  
4         108.17                      7.92               DH  


**Handling Missing Values**

In [14]:
# check for missing values in the dataframe
missing_values = df.isnull().sum()

# display the count of missing values for each column
print(missing_values)

# check if there are any missing values
if missing_values.sum() == 0:
    print("\nNo missing values.")
else:
    print("\nMissing values found.")


pelvic_incidence            0
pelvic_tilt                 0
lumbar_lordosis_angle       0
sacral_slope                0
pelvic_radius               0
degree_spondylolisthesis    0
spinal_condition            0
dtype: int64

No missing values.


**Separating Features and Target**

In [16]:
# FEATURES & TARGETS COLUMNS

features = df.columns[:-1]  
target = df.columns[-1]     

# Data preprocessing
x = df.iloc[:, :-1].values # feature values
y = df.iloc[:, -1].values # target values

labels = df[target].unique()

# print the names of the features
print("Features: \n", list(features))

# print the unique class labels
print("Labels: \n", labels)

Features: 
 ['pelvic_incidence', 'pelvic_tilt', 'lumbar_lordosis_angle', 'sacral_slope', 'pelvic_radius', 'degree_spondylolisthesis']
Labels: 
 ['DH' 'SL' 'NO']


**Label Encoding**

In [18]:
# Convert categorical labels to numerical values using label encoding:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

# LabelEncoder assigns codes in sorted label order:
# DH (Disk Hernia) -> 0, NO (Normal) -> 1, SL (Spondylolisthesis) -> 2.
y = le.fit_transform(y)
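
The mapping produced by the encoder can be checked rather than assumed. The following is a minimal sketch on a handful of toy labels; `sample_labels`, `encoder`, and `label_mapping` are illustrative names, not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy labels using the three class codes that appear in the dataset
sample_labels = np.array(['DH', 'SL', 'NO', 'SL', 'DH'])

encoder = LabelEncoder()
encoded = encoder.fit_transform(sample_labels)

# classes_ stores the labels in sorted order; each label's index is its integer code
label_mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}
print(label_mapping)
print(encoded)

# inverse_transform recovers the original string labels from the codes
decoded = encoder.inverse_transform(encoded)
print(decoded)
```

The same check should work on the notebook's own encoder by inspecting `le.classes_` after fitting.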

**Normalization/Scaling & Train-Test Split**

In [20]:
# Split the dataset into training (70%) and testing (30%) sets
# (train_test_split and StandardScaler were imported in the first cell)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Feature scaling
# Standardize each feature to zero mean and unit variance so that
# features measured on larger scales do not dominate the model.
# The scaler is fitted on the training set only and then applied to the
# test set, which keeps test-set statistics out of training.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
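
To illustrate what the split and scaling steps do, here is a small sketch on synthetic data (not the vertebral dataset). The array names and the `stratify` argument are assumptions added for illustration; the notebook itself does not stratify its split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 samples, 6 features, 3 imbalanced classes
rng = np.random.default_rng(0)
x_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 6))
y_demo = np.repeat([0, 1, 2], [20, 30, 50])

# stratify keeps the class proportions the same in both subsets (an optional choice)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# Fit the scaler on the training subset only, then reuse it on the test subset
demo_scaler = StandardScaler()
x_tr_scaled = demo_scaler.fit_transform(x_tr)
x_te_scaled = demo_scaler.transform(x_te)

# Training features end up with mean ~0 and standard deviation ~1;
# test features are only approximately centered because they reuse training statistics
print(x_tr_scaled.mean(axis=0).round(3))
print(x_tr_scaled.std(axis=0).round(3))
print(x_te_scaled.mean(axis=0).round(3))
```

With imbalanced classes, a stratified split may give a more representative test set, though whether it changes the results here has not been checked.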


### **2. Linear Regression Model**

In [22]:
from sklearn.linear_model import LinearRegression

In [23]:
# Create a Linear Regression model and fit it on the training data
regressor = LinearRegression(fit_intercept=True)
regressor.fit(x_train, y_train)

In [24]:
# Inspecting the fitted parameters (uncomment to print)
#print('Linear Model Coefficients (m) =', regressor.coef_)
#print('Linear Model Intercept (b) =', regressor.intercept_)

In [25]:
# Predict on the test set (the outputs are continuous values, not class labels)
y_predict = regressor.predict(x_test)
print(y_predict)

[0.97481451 1.58865549 0.59784507 1.56895605 0.52942224 1.07210691
 1.14169555 1.97487021 1.40450234 1.18451045 1.671178   1.84141921
 1.15131854 1.60638242 2.15717126 1.38445216 0.66768367 1.80713265
 1.47154595 1.47098457 0.66244721 0.80126371 0.8217946  2.22383283
 1.29831809 0.17951895 0.7665126  1.54074015 2.09869398 1.42992015
 1.97134318 1.84912831 0.93809933 1.26673498 1.11523245 1.4068466
 1.68211492 1.91925323 1.74816996 1.17182591 1.77458407 1.82654556
 0.39814664 1.54962458 1.12181545 0.76594137 1.67327251 0.94905585
 1.46485805 0.83848548 1.39131914 1.97955256 0.82777377 1.13193888
 0.88227312 0.54216868 1.62816859 1.60507425 0.82002578 0.74645138
 0.45197494 0.83095722 0.85874839 1.71811658 0.87692596 0.85227129
 1.14321044 1.68415653 1.26232615 1.53099942 0.55452919 2.22083804
 0.61431648 1.40761933 1.49728605 1.85457288 1.11260161 0.82050173
 0.83024821 1.43361458 0.62501204 1.01114956 1.44559575 0.82588681
 0.66375049 0.83539798 0.81827937 0.92963359 1.96027764 0.92166
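
Because the regressor returns continuous values, they need to be mapped back to the class codes 0, 1, 2 before the metrics imported in the first cell can be applied. The sketch below shows one possible way, rounding to the nearest integer and clipping to the valid range. It uses the first six predictions printed above; the true labels in `y_true_demo` are hypothetical and invented purely for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# First six continuous predictions copied from the printed output
y_pred_continuous = np.array([0.97481451, 1.58865549, 0.59784507,
                              1.56895605, 0.52942224, 1.07210691])

# Hypothetical true labels, invented for this example only
y_true_demo = np.array([1, 2, 0, 2, 1, 1])

# Round to the nearest integer, then clip into the valid label range 0..2
y_pred_labels = np.clip(np.rint(y_pred_continuous), 0, 2).astype(int)
print(y_pred_labels)

# Standard classification metrics on the discretized predictions
acc = accuracy_score(y_true_demo, y_pred_labels)
cm = confusion_matrix(y_true_demo, y_pred_labels, labels=[0, 1, 2])
f1 = f1_score(y_true_demo, y_pred_labels, average='macro', zero_division=0)
print(acc)
print(cm)
print(f1)
```

Rounding treats the encoded labels as points on an ordered numeric scale, which is an assumption of this sketch; the class codes come from alphabetical sorting and carry no inherent order.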

In [None]:
import pandas as pd

# Load the Excel file (placeholder file name; replace with the actual path)
data = pd.read_excel('your_file.xlsx')

# Save as CSV without the index column
data.to_csv('your_file.csv', index=False)
