# Kyphosis Disease Classification

Kyphosis is an abnormally excessive convex curvature of the spine. The kyphosis data frame has 81 rows and 4 columns; each row describes a child who has had corrective spinal surgery. The dataset contains 3 inputs and 1 output.

**Inputs:**
- Age: In months.
- Number: The number of vertebrae involved.
- Start: The number of the first (topmost) vertebra operated on.

**Output:**
- Kyphosis: Whether kyphosis (a type of deformation) was present after the operation.

Data source: https://www.kaggle.com/abbasit/kyphosis-dataset

# Importing the libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Importing the dataset

In [None]:
ds = pd.read_csv('kyphosis.csv')

In [None]:
ds.head()

# Visualising the data

In [None]:
sns.pairplot(data = ds, hue = 'Kyphosis')

In [None]:
# Pass the column by name; recent seaborn versions expect keyword arguments here.
sns.countplot(x = 'Kyphosis', data = ds)

In [None]:
ds['Age'].hist(bins = 10)

# Taking care of missing data

In [None]:
# Missing cells would show up as dark marks on the heatmap.
# The plot is uniform, so we observe no missing data.

sns.heatmap(ds.isnull(), yticklabels = False, cbar = False, cmap = 'Blues')
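
The heatmap is a visual check; a per-column count gives the same answer as a number. The sketch below is a minimal illustration on a small made-up frame named `demo` (a hypothetical stand-in for `ds`, with one value deliberately set to missing), not the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `ds`; the values are made up for illustration.
demo = pd.DataFrame({
    'Kyphosis': ['absent', 'present', 'absent', 'absent'],
    'Age': [71, 158, np.nan, 2],
    'Number': [3, 3, 4, 5],
    'Start': [5, 14, 5, 1],
})

# Count missing entries per column; all zeros would mean nothing is missing.
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# Single yes/no answer for the whole frame.
has_missing = bool(demo.isnull().values.any())
print(has_missing)
```

Running `ds.isnull().sum()` on the real data frame should print zero for every column if the heatmap reading is right.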

In [None]:
# Column 0 is the target (Kyphosis); the remaining columns are the features.
X = ds.iloc[:, 1:].values
y = ds.iloc[:, 0].values

In [None]:
X

In [None]:
y

# Encoding Categorical Variables

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)

In [None]:
y
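
It helps to know which integer each class received. `LabelEncoder` sorts the class names, so the position of a name in `classes_` is its code. A minimal sketch, assuming the target takes the two string values `absent` and `present`; the `labels` array and the `le_demo` name are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative labels; assumed to match the two values in the Kyphosis column.
labels = np.array(['absent', 'present', 'absent', 'present', 'absent'])

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(labels)

# classes_ is sorted, so the index of each class is its integer code.
mapping = {str(label): int(code) for code, label in enumerate(le_demo.classes_)}
print(mapping)
print(encoded.tolist())

# inverse_transform recovers the original strings from the codes.
decoded = le_demo.inverse_transform(encoded)
print(decoded.tolist())
```

Under this assumption, `1` stands for `present`, which is the class that precision and recall are usually read against later on.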

# Splitting the dataset into the training set and test set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
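
With a small, imbalanced dataset, a purely random split can leave the test set with very few positive cases. One possible variation is a stratified split, which keeps the class ratio roughly equal in both subsets. The sketch below uses synthetic arrays (`X_demo`, `y_demo`) rather than the kyphosis data, so the numbers are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 80 rows, 3 features, 25% positive class.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 3))
y_demo = np.array([0] * 60 + [1] * 20)

# stratify=y_demo asks for the same class proportions in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Share of positive labels in each subset.
print(y_tr.mean(), y_te.mean())
```

Applied to the notebook, this would mean adding `stratify = y` to the existing `train_test_split` call.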

# Feature Scaling - Not Required

In [None]:
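# Random forests split on feature thresholds, so they do not need scaled inputs.
# The scaler is kept (commented out) for use with scale-sensitive models.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# X_train = sc.fit_transform(X_train)
# X_test = sc.transform(X_test)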

# Fitting the Random Forest Classifier to the dataset

In [None]:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(random_state = 0)  # fixed seed so results are reproducible
rfc.fit(X_train, y_train)
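
A fitted random forest exposes `feature_importances_`, which gives a rough idea of which inputs drive the splits. The sketch below runs on synthetic data in which only the first feature determines the label; the names `X_demo`, `y_demo`, `forest` and the feature labels `f0` to `f2` are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends only on the first feature.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# Importances are normalised to sum to 1 across features.
importances = forest.feature_importances_
for name, importance in zip(['f0', 'f1', 'f2'], importances):
    print(f'{name}: {importance:.3f}')
```

For the notebook's model, the equivalent would be pairing `rfc.feature_importances_` with `ds.columns[1:]`.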

In [None]:
# Predicting the test set results

y_pred = rfc.predict(X_test)

# Model Evaluation - Confusion Matrix and K-Fold Cross Validation

In [None]:
from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(y_test, y_pred)
print(cm)
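
In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes, so for a binary problem the cells read as true negatives, false positives, false negatives and true positives. A small sketch with made-up labels (`y_true_demo` and `y_pred_demo` are illustrative, not model output):

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 is the positive class.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)

# For two classes, ravel() returns the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = cm_demo.ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
print(precision, recall)
```

These are the same precision and recall values that `classification_report` lists for the positive class.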

In [None]:
print(classification_report(y_test, y_pred))

In [None]:
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = rfc, X = X_train, y = y_train, cv = 10)
mean_accuracy = accuracies.mean()
std_accuracy = accuracies.std()

In [None]:
print(mean_accuracy)
print(std_accuracy)
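
When `cv` is an integer and the estimator is a classifier, `cross_val_score` uses stratified folds without shuffling. Accuracy can also look flattering on an imbalanced target. One possible variation, sketched here on synthetic data with hypothetical names (`X_demo`, `y_demo`, `model`), is to pass an explicit shuffled `StratifiedKFold` and score with balanced accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data: 100 rows, 3 random features, 20% positive class.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 80 + [1] * 20)

# Shuffled, stratified folds with a fixed seed for repeatable splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Balanced accuracy averages the recall of each class.
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring='balanced_accuracy')
print(f'{scores.mean():.3f} +/- {scores.std():.3f}')
```

Because the features here are random noise, the scores are expected to hover near chance level; the point is the setup, not the numbers.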