# Lightweight Decision Tree Classification

Auto-generated from `healthcare_dataset.csv`.

**Target column:** `Test Results`

This notebook is optimized for speed and simplicity: it drops ID-like columns and rows with missing values, samples at most 1,000 rows, and keeps the target labels as text.

---

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

plt.rcParams['figure.figsize'] = (8,5)


In [None]:
# Load and preview data
df = pd.read_csv(r"/mnt/data/healthcare_dataset.csv")
print("Original shape:", df.shape)
display(df.head())

# Drop ID-like columns (match "id" as a whole word so names such as "Provider" are kept)
id_cols = [c for c in df.columns if 'id' in c.lower().replace('_', ' ').split()]
if id_cols:
    print("Dropping ID-like columns:", id_cols)
    df = df.drop(columns=id_cols)

# Drop missing rows
df = df.dropna()
print("After dropping NA:", df.shape)

# Use smaller random sample for light processing
if len(df) > 1000:
    df = df.sample(n=1000, random_state=42)
print("Sampled shape:", df.shape)

# Separate features and target
target = "Test Results"
X = df.drop(columns=[target])
y = df[target]

print("Features shape:", X.shape)
print("Target classes:", y.unique())


## Preprocessing
- One-hot encode categorical features
- Keep target labels as text
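
The encoding step can be illustrated on a toy frame. This is a minimal sketch: the column names and values below are hypothetical and are not read from `healthcare_dataset.csv`. It shows that numeric columns pass through unchanged, that each text column with k distinct values becomes k-1 indicator columns when `drop_first=True`, and that counting distinct values beforehand helps spot text columns that would produce a very wide feature matrix.

```python
import pandas as pd

# Toy frame with hypothetical columns; not taken from the notebook's dataset.
toy = pd.DataFrame({
    "Age": [34, 51, 29],
    "Gender": ["Female", "Male", "Female"],
    "Admission Type": ["Urgent", "Elective", "Emergency"],
})

# Numeric columns are kept as-is; each text column becomes k-1 indicator columns.
encoded = pd.get_dummies(toy, drop_first=True)
encoded_columns = list(encoded.columns)
print(encoded_columns)

# Distinct values per text column: a large count means many indicator columns.
cardinality = toy[["Gender", "Admission Type"]].nunique()
print(cardinality)
```

The text label in the target column needs no encoding, since `DecisionTreeClassifier` accepts string class labels directly.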

In [None]:
# One-hot encoding for categorical features
X = pd.get_dummies(X, drop_first=True)
print("Feature shape after encoding:", X.shape)


In [None]:
# Split data
stratify = y if y.nunique() < 20 else None  # stratify only when the target has few classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=stratify
)

# Train Decision Tree
clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)
print("Model trained successfully.")
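
The effect of the `stratify` argument used in the split can be checked on synthetic labels. This is a sketch under stated assumptions: the label names and the 60/30/10 class mix are invented for illustration and say nothing about the real class balance in the dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels (hypothetical; not the notebook's data).
labels = pd.Series(["Normal"] * 60 + ["Abnormal"] * 30 + ["Inconclusive"] * 10)
features = pd.DataFrame({"x": range(len(labels))})

# Stratifying on the labels keeps the class mix the same in both partitions.
_, _, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

train_counts = y_tr.value_counts().to_dict()
test_counts = y_te.value_counts().to_dict()
print("train:", train_counts)
print("test: ", test_counts)
```

Without stratification, a small test set can end up with very few rows of a rare class, which makes per-class metrics noisy.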


In [None]:
# Predictions and evaluation
y_pred = clf.predict(X_test)

print("Accuracy:", round(accuracy_score(y_test, y_pred), 4))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
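
The raw array printed by `confusion_matrix` does not show which row or column belongs to which class. One possible way to label it is sketched below, using made-up true and predicted labels rather than the notebook's results.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, for illustration only.
y_true_demo = ["Normal", "Abnormal", "Normal", "Inconclusive", "Abnormal", "Normal"]
y_pred_demo = ["Normal", "Normal", "Normal", "Inconclusive", "Abnormal", "Abnormal"]

# Fix the label order explicitly so rows and columns are unambiguous.
labels = sorted(set(y_true_demo))
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

# Rows are true classes, columns are predicted classes.
cm_df = pd.DataFrame(
    cm,
    index=[f"true {name}" for name in labels],
    columns=[f"pred {name}" for name in labels],
)
print(cm_df)
```

In the notebook itself, passing `labels=clf.classes_` would give the same ordering the fitted tree uses.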


In [None]:
# Simple tree visualization
plt.figure(figsize=(10,6))
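# class_names must follow clf.classes_ (sorted order), not the order of first appearance in y
plot_tree(clf, filled=True, feature_names=list(X.columns), class_names=[str(c) for c in clf.classes_])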
plt.show()


### Notes
- Lightweight version for quick testing.
- Drops ID-like columns automatically.
- Keeps text labels for target.
- Samples down to 1,000 rows when the dataset is larger.
- For a full-scale model, disable sampling and consider tuning hyperparameters.
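
As one possible starting point for the tuning mentioned in the last note, the sketch below runs a small cross-validated grid search over tree depth and leaf size. It uses synthetic data from `make_classification` so it runs on its own. The grid values are arbitrary assumptions, not recommendations derived from this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the encoded feature matrix.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=42
)

# Illustrative grid; the ranges are assumptions and should be adapted.
param_grid = {
    "max_depth": [2, 4, 6, None],
    "min_samples_leaf": [1, 5, 20],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_demo, y_demo)

best_params = search.best_params_
best_score = search.best_score_
print("Best parameters:", best_params)
print("Mean CV accuracy:", round(best_score, 3))
```

To apply this to the notebook, `X_demo` and `y_demo` would be replaced by `X_train` and `y_train`, keeping the test split untouched for the final evaluation.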