# Hospital Readmission Prediction

This notebook demonstrates the AI Development Workflow for predicting 30-day hospital readmission risk. It follows the CRISP-DM framework and includes data preprocessing, model development, evaluation, and deployment simulation.

**Objectives**:
- Predict readmission risk using patient data.
- Improve discharge planning and reduce readmission rates.

**Stakeholders**:
- Hospital administrators
- Physicians and care teams

**KPI**: Precision and recall of the readmission prediction model.
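For reference, both KPIs come from the confusion-matrix counts. TP are readmitted patients the model flags, FP are flagged patients who were not readmitted, and FN are readmitted patients the model missed:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
$$

Recall measures the share of actual readmissions that are caught. Precision measures the share of flagged patients who are actually readmitted.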


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv('data/hospital_readmissions.csv')

# Preview
df.head()


In [None]:
# Basic info
df.info()
df.describe()


In [None]:
# Visualize readmission distribution
sns.countplot(x='readmitted', data=df)
plt.title('Readmission Distribution')
plt.show()
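The plot is easier to interpret alongside exact class proportions. A minority positive class also means plain accuracy can look good for a useless model. The sketch below uses hypothetical toy labels in place of `df['readmitted']` and computes the accuracy of a trivial model that always predicts the majority class:

```python
import pandas as pd

# Hypothetical toy labels standing in for df['readmitted'].
labels = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 1], name="readmitted")

# Share of each class, ordered by label value.
shares = labels.value_counts(normalize=True).sort_index()
print(shares)

# A model that always predicts the majority class scores this accuracy.
baseline_accuracy = shares.max()
print("Majority-class baseline accuracy:", baseline_accuracy)  # 0.8
```

With 80% of patients not readmitted, an 80% accuracy says nothing about readmissions. This is why precision and recall are the better KPIs here.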


In [None]:
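from sklearn.preprocessing import StandardScaler

# Handle missing values: impute missing ages with the mean age
df['age'] = df['age'].fillna(df['age'].mean())

# One-hot encode categorical variables (drop_first avoids a redundant dummy column)
df = pd.get_dummies(df, columns=['discharge_type', 'gender'], drop_first=True)

# Standardize lab results to zero mean and unit variance.
# Note: the imputation mean and the scaler are computed on the full dataset before
# the train/test split, so test-set statistics leak into preprocessing.
scaler = StandardScaler()
df['lab_results'] = scaler.fit_transform(df[['lab_results']]).ravel()

# Split features and target (patient_id is an identifier, not a predictor)
X = df.drop(['readmitted', 'patient_id'], axis=1)
y = df['readmitted']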
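The cell above fits the mean imputation and the scaler on the full dataset before splitting, so statistics from the test rows leak into preprocessing. A minimal leakage-free sketch follows. It assumes scikit-learn's `Pipeline` and `ColumnTransformer`, which the original notebook does not use, and a small synthetic DataFrame that borrows only this notebook's column names. The category values and the random labels are hypothetical. Every preprocessing step is fit on the training split only:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data reusing this notebook's column names.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "patient_id": np.arange(n),
    "age": rng.normal(65, 12, n),
    "lab_results": rng.normal(1.0, 0.3, n),
    "discharge_type": rng.choice(["home", "rehab", "nursing_facility"], n),
    "gender": rng.choice(["F", "M"], n),
    "readmitted": rng.integers(0, 2, n),
})
demo.loc[rng.choice(n, 25, replace=False), "age"] = np.nan  # inject missing ages

features = demo.drop(columns=["readmitted", "patient_id"])
target = demo["readmitted"]

# Split first, so imputation means and scaling statistics come from training rows only.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    features, target, test_size=0.3, random_state=42, stratify=target
)

preprocess = ColumnTransformer([
    # Numeric columns: mean imputation, then standardization.
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), ["age", "lab_results"]),
    # Categorical columns: one-hot encoding; unseen categories become all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["discharge_type", "gender"]),
])

clf = Pipeline([("preprocess", preprocess),
                ("model", LogisticRegression(max_iter=1000))])
clf.fit(Xd_train, yd_train)  # all transformers are fit on the training split only

# The fitted pipeline accepts raw rows; encoding and scaling happen inside it.
print(clf.predict(Xd_test.head()))
```

Because the labels are random, the predictions themselves are meaningless. The point is the structure: calling `clf.predict` on new raw patient rows reuses the training-set statistics, which is also what a deployed model would need.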


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Split data, stratifying on the label so train and test keep the same readmission rate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)


In [None]:
from sklearn.metrics import confusion_matrix, precision_score, recall_score, classification_report

# Predict
y_pred = model.predict(X_test)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Precision and recall
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))

# Full report
print(classification_report(y_test, y_pred))
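`LogisticRegression.predict` flags a patient as high risk when the predicted probability is above 0.5. If missed readmissions are judged costlier than false alarms, the threshold can instead be chosen from the precision-recall curve. The sketch below is not run on the hospital dataset. It uses synthetic, imbalanced data from `make_classification` and a hypothetical target recall of 0.80, then picks the threshold with the highest precision among those that reach that recall:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the readmission data (about 20% positives).
X_syn, y_syn = make_classification(n_samples=2000, n_features=8,
                                   weights=[0.8, 0.2], random_state=0)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42, stratify=y_syn
)

clf_syn = LogisticRegression(max_iter=1000).fit(Xs_train, ys_train)
scores = clf_syn.predict_proba(Xs_test)[:, 1]  # estimated probability of readmission

TARGET_RECALL = 0.80  # hypothetical clinical requirement
precision, recall, thresholds = precision_recall_curve(ys_test, scores)

# precision and recall have one more entry than thresholds (a final point with
# recall 0 and no threshold), so drop that last entry before aligning them.
meets_target = recall[:-1] >= TARGET_RECALL
best = np.argmax(np.where(meets_target, precision[:-1], -1.0))
threshold = thresholds[best]

# Predict "readmitted" when the probability reaches the chosen threshold.
y_hat = (scores >= threshold).astype(int)
print("Threshold:", round(threshold, 3))
print("Precision:", precision_score(ys_test, y_hat), "Recall:", recall_score(ys_test, y_hat))
```

Lowering the threshold trades precision for recall. The target itself is a decision for the clinical stakeholders, not something the data determines.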


In [None]:
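# Simulate a new patient input using the first test-set row.
# iloc[[0]] returns a one-row DataFrame, so the feature names match those seen in training.
new_patient = X_test.iloc[[0]]
prediction = model.predict(new_patient)

print("Readmission Risk:", "HIGH" if prediction[0] == 1 else "LOW")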


## Concept Drift Monitoring

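Concept drift occurs when the relationship between patient features and readmission outcomes changes over time (for example, after changes in care practices), so a model trained on historical data gradually loses predictive power. A related problem, data drift, is a shift in the distribution of the inputs themselves. To monitor drift:
- Track precision and recall each month on recently discharged patients, once their 30-day outcomes are known.
- Retrain the model periodically on recent data.
- Raise an alert when either metric falls below an agreed threshold.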
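The monthly tracking and alerting described above can be sketched as follows. The log format is a hypothetical assumption: one row per discharged patient with `discharge_date`, the observed `readmitted` outcome, and the model's `predicted` label. The toy rows and the alert floors (0.5 precision, 0.7 recall) are also hypothetical:

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score


def monthly_performance(log: pd.DataFrame, min_precision: float = 0.5,
                        min_recall: float = 0.7) -> pd.DataFrame:
    """Compute precision and recall per calendar month and flag months below the floors."""
    rows = []
    for month, grp in log.groupby(log["discharge_date"].dt.to_period("M")):
        p = precision_score(grp["readmitted"], grp["predicted"], zero_division=0)
        r = recall_score(grp["readmitted"], grp["predicted"], zero_division=0)
        rows.append({"month": str(month), "precision": p, "recall": r,
                     "alert": bool(p < min_precision or r < min_recall)})
    return pd.DataFrame(rows)


# Hypothetical outcome log covering two months.
outcome_log = pd.DataFrame({
    "discharge_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-25",
                                      "2024-02-03", "2024-02-14", "2024-02-27"]),
    "readmitted": [1, 0, 1, 1, 1, 0],
    "predicted":  [1, 0, 1, 0, 0, 1],
})
drift_report = monthly_performance(outcome_log)
print(drift_report)
```

In this toy log, January is predicted perfectly while February misses both readmissions, so only February raises an alert. In practice, such an alert would trigger a review and possibly retraining on recent data.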

Ethical concerns include fairness (the model should perform comparably across demographic groups such as gender and age) and compliance with patient-privacy requirements.
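One concrete fairness check is to compare the KPIs across demographic groups. Below is a minimal sketch with hypothetical toy labels and a `gender` grouping. On the real data, the group labels would have to be kept from the original column, because `pd.get_dummies` replaces it with dummy columns:

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score


def metrics_by_group(y_true, y_pred, groups) -> pd.DataFrame:
    """Return sample count, precision, and recall for each group."""
    frame = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": groups})
    rows = []
    for name, g in frame.groupby("group"):
        rows.append({
            "group": name,
            "n": len(g),
            "precision": precision_score(g["y_true"], g["y_pred"], zero_division=0),
            "recall": recall_score(g["y_true"], g["y_pred"], zero_division=0),
        })
    return pd.DataFrame(rows).set_index("group")


# Hypothetical toy outcomes, predictions, and group labels.
toy_true = [1, 1, 0, 0, 1, 1, 0, 1]
toy_pred = [1, 1, 0, 1, 1, 0, 0, 0]
toy_gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

fair_report = metrics_by_group(toy_true, toy_pred, toy_gender)
print(fair_report)
recall_gap = fair_report["recall"].max() - fair_report["recall"].min()
print("Recall gap between groups:", round(recall_gap, 3))
```

Here the toy model catches every readmission in group F but only one in three in group M. That recall gap of about 0.67 would warrant investigation before deployment.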


In [None]:
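# Stronger L2 regularization to reduce overfitting (smaller C = stronger penalty; default C=1.0)
model = LogisticRegression(max_iter=1000, C=0.5)
model.fit(X_train, y_train)
y_pred_reg = model.predict(X_test)
print("Precision:", precision_score(y_test, y_pred_reg), "Recall:", recall_score(y_test, y_pred_reg))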
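The regularization strength above is fixed by hand at C=0.5. The sketch below instead chooses C by cross-validation with scikit-learn's `GridSearchCV`, using synthetic data rather than the hospital dataset. The candidate grid and the F1 scoring (the harmonic mean of the two KPIs) are assumptions, not values from the original notebook:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in data (about 20% positives).
X_cv, y_cv = make_classification(n_samples=1000, n_features=10,
                                 weights=[0.8, 0.2], random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 0.5, 1.0, 10.0]},  # smaller C = stronger L2 penalty
    scoring="f1",  # assumption: balance precision and recall in one score
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_cv, y_cv)  # refits the best C on all of X_cv by default
print("Best C:", search.best_params_["C"], "CV F1:", round(search.best_score_, 3))
```

On the notebook's data, the search would be run on `X_train` and `y_train` only, keeping the test set for a final, unbiased check.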


## Reflection

**Challenge**: Designing a fair preprocessing pipeline and selecting interpretable features.

**Improvement**: With more time, I would test multiple models and include stakeholder feedback in feature selection.
