# Kyphosis Dataset Analysis

## Data Loading & Exploration

*   **Columns:** Kyphosis (absent or present; the outcome after surgery), Age (in months), Number (the number of vertebrae involved), Start (the number of the first vertebra operated on)
*   **Source:** Kaggle dataset (https://www.kaggle.com/datasets/abbasit/kyphosis-dataset)

### Import Libraries

---

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
# 1. Age: in months, 2. Number: the number of vertebrae involved, 3. Start: the number of the first (topmost) vertebra operated on.
k_df = pd.read_csv('kyphosis[1].csv')
k_df

In [None]:
k_df.head()

In [None]:
k_df.tail()

In [None]:
k_df.shape

In [None]:
k_df.info()

In [None]:
k_df.describe()

In [None]:
k_df.isnull().sum()

## Data Preprocessing

### Normalization to Rescale a Feature to the 0-1 Range

---



In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# keep the scaled values in a separate variable so k_df['Age'] stays in months for the conversion cells below
age_scaled = scaler.fit_transform(k_df[['Age']])
age_scaled
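
Min-max scaling maps each value x to (x - min) / (max - min), which is what `MinMaxScaler` computes with its default `feature_range=(0, 1)`. The sketch below reproduces that arithmetic in plain pandas. The `toy` frame, its ages, and the `Age_scaled` column are made-up illustrations, not values from the kyphosis dataset.

```python
import pandas as pd

# Toy ages in months (illustrative values, NOT taken from the kyphosis dataset).
toy = pd.DataFrame({'Age': [1, 61, 121, 181]})

# Min-max scaling: (x - min) / (max - min) maps the column onto [0, 1].
age_min = toy['Age'].min()
age_max = toy['Age'].max()
toy['Age_scaled'] = (toy['Age'] - age_min) / (age_max - age_min)

print(toy)
```

The smallest age maps to 0 and the largest to 1. Writing the result to a new column keeps the original months available for later steps.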

### Converting The Age From Months to Years

---

In [None]:
k_df['Age'].mean()/12

In [None]:
k_df['Age'].min()/12

In [None]:
k_df['Age'].max()/12

In [None]:
k_df["Age"] = k_df["Age"].astype("float64")
k_df.info()

### Creating a Function to Convert Age From Months to Years

---

In [None]:
# defining a function to change age from months to years
def month_to_year(age):
  return age / 12

In [None]:
# applying the function to the ['Age'] column
k_df['Age'] = k_df['Age'].apply(month_to_year)
k_df
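
Because the conversion is a single division, pandas can also do it on the whole column at once. A minimal sketch, assuming a made-up `toy` frame (illustrative ages only), showing that `.apply(month_to_year)` and plain column division give the same result:

```python
import pandas as pd

# Toy ages in months (illustrative values, NOT taken from the kyphosis dataset).
toy = pd.DataFrame({'Age': [12, 30, 96]})

def month_to_year(age):
    # same helper as in the notebook: months -> years
    return age / 12

via_apply = toy['Age'].apply(month_to_year)   # element-by-element call
via_division = toy['Age'] / 12                # vectorized division on the whole column

# True when both Series hold the same values with the same dtype
print(via_apply.equals(via_division))
```

The vectorized form is usually the more idiomatic choice for simple arithmetic, though either works for a dataset of this size.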

In [None]:
k_df[k_df['Age'] == k_df['Age'].max()]

In [None]:
k_df[k_df['Age'] == k_df['Age'].min()]

In [None]:
k_df.describe().round(2)

### Averages for Kyphosis & Non-Kyphosis Cases

---

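In [None]:
# split the rows by outcome; the cells below use these two frames
absent_kyphosis = k_df[k_df['Kyphosis'] == 'absent']
present_kyphosis = k_df[k_df['Kyphosis'] == 'present']

In [None]:
non_number = absent_kyphosis['Number'].mean()
non_number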

In [None]:
number = (present_kyphosis['Number'].mean())
number

In [None]:
average_developed = (present_kyphosis['Age'].mean())
average_developed

In [None]:
average_nondeveloped = (absent_kyphosis['Age'].mean())
average_nondeveloped
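
The per-group averages above can also be computed in one step with `groupby`, which covers `Start` as well. This is a sketch on a made-up `toy` frame: the rows are illustrative and are not taken from the kyphosis dataset, so the numbers it prints are not the dataset's averages.

```python
import pandas as pd

# Toy rows (illustrative values, NOT taken from the kyphosis dataset).
toy = pd.DataFrame({
    'Kyphosis': ['absent', 'present', 'absent', 'present'],
    'Age':      [24, 120, 72, 96],
    'Number':   [3, 5, 4, 6],
    'Start':    [14, 6, 12, 8],
})

# One row per outcome, one column per feature, each cell a group mean.
group_means = toy.groupby('Kyphosis')[['Age', 'Number', 'Start']].mean()
print(group_means)
```

Running the same `groupby` call on `k_df` would give the absent and present averages for all three features in a single table.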

## Data Visualization

### Histogram & Countplot

---

In [None]:
k_df.hist(bins = 10, figsize = (10,10), color = 'b');

In [None]:
sns.countplot(x = 'Kyphosis', data = k_df, label = 'Count')
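
The counts behind a countplot can be read as numbers with `value_counts`. A minimal sketch on a made-up `toy` Series (illustrative labels only, not the dataset's actual class balance):

```python
import pandas as pd

# Toy outcome labels (illustrative, NOT the real class balance of the dataset).
toy = pd.Series(['absent', 'absent', 'absent', 'present'], name='Kyphosis')

counts = toy.value_counts()                 # rows per class
shares = toy.value_counts(normalize=True)   # fraction of rows per class

print(counts)
print(shares)
```

The `normalize=True` form is handy for checking how imbalanced the two classes are.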

### Heatmap & One-Hot Encoding

---

In [None]:
sns.heatmap(k_df.drop('Kyphosis', axis=1).corr(), annot=True)
# Age shows little linear correlation with Number or Start in this dataset
# Number and Start are negatively correlated: surgeries that start higher on the spine (lower Start) tend to involve more vertebrae

In [None]:
# one-hot encode the ['Kyphosis'] column
Kyphosis_encoded = pd.get_dummies(k_df['Kyphosis'], dtype=int)
Kyphosis_encoded
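
With only two categories, the two one-hot columns carry the same information: each row has exactly one 1, so one column is the complement of the other. A sketch on a made-up `toy` frame (illustrative labels only) comparing `get_dummies` with a single 0/1 column; the names `one_hot` and `binary` are chosen here for illustration:

```python
import pandas as pd

# Toy outcome labels (illustrative, NOT taken from the kyphosis dataset).
toy = pd.DataFrame({'Kyphosis': ['absent', 'present', 'absent']})

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(toy['Kyphosis'], dtype=int)

# Single binary column: 1 where kyphosis is present, 0 otherwise.
binary = (toy['Kyphosis'] == 'present').astype(int)

print(one_hot)
print(binary)
```

A single binary column is often enough when the outcome is used as a target or added to a correlation matrix.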

## Summary

### Analysis Summary & Key Findings

Based on this dataset:

*   **Patients who developed kyphosis after surgery** had an average age of 97.8 months (about 8.2 years).
*   **Patients without kyphosis after surgery** had an average age of 79.9 months (about 6.7 years).
    *   **Finding:** Based on this dataset, patients who developed kyphosis after surgery were older on average than those who did not.

*   **Kyphosis cases:** averaged 5.2 vertebrae operated on.
*   **Non-kyphosis cases:** averaged 3.8 vertebrae.
    *   **Finding:** Based on this dataset, surgeries involving a larger number of vertebrae are more often followed by kyphosis.

*   **Kyphosis cases:** surgeries started around vertebra 7.3 (higher up the spine).
*   **Non-kyphosis cases:** surgeries started around vertebra 12.6 (lower down the spine).
    *   **Finding:** Based on this dataset, surgeries beginning higher on the spine are associated with a higher rate of kyphosis.

**Hypothesis:** Patients who undergo spinal surgeries involving more vertebrae and a higher starting vertebral position are more likely to develop kyphosis than patients with smaller spinal surgeries starting lower on the spine.

**Important Considerations:**

*   These findings assume the dataset accurately records patient age, number of vertebrae, and surgery start position.
*   The dataset contains only four variables and leaves out other factors such as surgeon experience & skill, patient health before & after surgery, patient genetics, etc.