# Score Pattern Analysis by ROM Classification

## Objective
This notebook explores how the `score` column, used here as a proxy for rehabilitation outcome, varies across patients classified by ROM (Range of Motion) severity: **High**, **Low**, or **No** ROM.

Key tasks include:
- Visualizing score distributions by ROM class
- Removing extreme high scores (above the 99th percentile) so a few outliers do not stretch the plot
- Producing plots that can support clinical interpretation or a research presentation


In [None]:
import pandas as pd

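# Load the pipe-delimited dataset
df = pd.read_csv("user_data_with_label.txt", sep="|", engine="python")
# Drop auto-named "Unnamed: N" columns (e.g. from empty header fields or a saved index)
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]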
df.head()
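The `~df.columns.str.contains('^Unnamed')` filter above removes columns that pandas auto-names `Unnamed: N`. One common cause, assumed here rather than confirmed for `user_data_with_label.txt`, is a trailing `|` on every line, which produces an empty final header field. A minimal sketch on a made-up three-row sample (the `id` column and all values are illustrative only):

```python
import io

import pandas as pd

# Hypothetical pipe-delimited sample with a trailing "|" on every line;
# the values are made up for illustration, not taken from the real dataset.
raw = (
    "id|score|Classification|\n"
    "1|42.0|High|\n"
    "2|17.5|Low|\n"
    "3|8.0|No|\n"
)

df_demo = pd.read_csv(io.StringIO(raw), sep="|", engine="python")
raw_columns = list(df_demo.columns)
print(raw_columns)  # the empty trailing header field becomes an "Unnamed: ..." column

# Same filter as in the notebook: keep only columns whose name does not start with "Unnamed"
df_demo = df_demo.loc[:, ~df_demo.columns.str.contains("^Unnamed")]
print(list(df_demo.columns))
```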


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

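# Drop rows missing either the score or the ROM class
df = df.dropna(subset=["score", "Classification"])

# Trim outliers: keep only rows at or below the 99th percentile of score
# (computed over all classes combined; rows above it are dropped, not capped)
score_threshold = df["score"].quantile(0.99)
df_clipped = df[df["score"] <= score_threshold]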
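The cell above *trims* (drops rows above the threshold) rather than *clips* (caps values at the threshold, also called winsorizing). Trimming reduces each class's sample size; clipping keeps every patient but stacks extreme scores at the threshold value. A small sketch contrasting the two on a made-up series, using the 75th percentile only because the toy series has five values:

```python
import pandas as pd

# Made-up scores for illustration only; 500.0 plays the role of an extreme outlier.
scores = pd.Series([10.0, 12.0, 15.0, 18.0, 500.0])

# With 5 values, the 0.75 quantile falls exactly on the 4th sorted value (18.0).
threshold = scores.quantile(0.75)

# Trimming: rows above the threshold are removed (what the notebook does).
trimmed = scores[scores <= threshold]

# Clipping / winsorizing: values above the threshold are capped, no rows removed.
clipped = scores.clip(upper=threshold)

print(threshold)
print(trimmed.tolist())
print(clipped.tolist())
```

For a histogram, trimming is usually the cleaner choice, since clipping would create an artificial spike in the last bin.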


In [None]:
# Plot score distribution by ROM classification
plt.figure(figsize=(10, 6))
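# stat="density" with common_norm=False normalizes each class separately,
# so classes with different patient counts can be compared by shape
sns.histplot(data=df_clipped, x="score", hue="Classification", element="step",
             stat="density", common_norm=False)
plt.title("Score Distribution by ROM Classification (Trimmed at 99th Percentile)")
plt.xlabel("Score")
plt.ylabel("Density (per class)")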
plt.grid(True)
plt.tight_layout()
plt.show()
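The histogram is easier to interpret next to a per-class numeric summary, especially when the ROM classes differ in size. A minimal sketch computing count, median, and interquartile range per class; the `df_demo` rows are made up and stand in for `df_clipped`, and the helper `iqr` is hypothetical:

```python
import pandas as pd

# Made-up rows standing in for df_clipped; real class labels and scores may differ.
df_demo = pd.DataFrame({
    "Classification": ["High", "High", "High", "Low", "Low", "No", "No", "No", "No"],
    "score": [40.0, 50.0, 60.0, 20.0, 30.0, 5.0, 10.0, 15.0, 20.0],
})


def iqr(s: pd.Series) -> float:
    """Interquartile range (Q3 - Q1) using pandas' default linear interpolation."""
    return s.quantile(0.75) - s.quantile(0.25)


# Named aggregation: one row per ROM class, one column per statistic
summary = (
    df_demo.groupby("Classification")["score"]
    .agg(n="count", median="median", iqr=iqr)
)
print(summary)
```

On the real data, replace `df_demo` with `df_clipped`. Median and IQR are used here because they are less sensitive than mean and standard deviation to whatever right tail remains after trimming.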
