# Frailty Analysis - Reproducible Workflow

This notebook contains the full workflow for analyzing frailty data, including:
1. **Data Cleaning**
2. **Exploratory Data Analysis (EDA)**
3. **Visualizations**

## 1. Data Cleaning
We load the raw dataset, recode the `Frailty` column as binary, and save the cleaned version.

In [None]:

import pandas as pd

# Load dataset
df = pd.read_csv("data/raw/frailty_data.csv")

# Convert Frailty column to binary (N -> 0, Y -> 1).
# Note: any value other than "N" or "Y" becomes NaN under .map().
df["Frailty"] = df["Frailty"].map({"N": 0, "Y": 1})

# Save cleaned data
df.to_csv("data/cleaned/frailty_data.csv", index=False)
print("Cleaned dataset saved.")
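
Because `Series.map` with a dict turns any value missing from the dict into `NaN`, it can be worth checking the recoded column before saving. The following is a minimal sketch on a small made-up frame; the sample values and the helper name `recode_frailty` are hypothetical and not part of the dataset or the original workflow.

```python
import pandas as pd


def recode_frailty(series: pd.Series) -> pd.Series:
    """Map 'N'/'Y' to 0/1 and fail loudly on any other value (hypothetical helper)."""
    recoded = series.map({"N": 0, "Y": 1})
    # Values not present in the mapping come back as NaN
    unmapped = series[recoded.isna()]
    if not unmapped.empty:
        bad_values = sorted(unmapped.astype(str).unique())
        raise ValueError(f"Unexpected Frailty values: {bad_values}")
    return recoded.astype(int)


# Made-up sample data for illustration only
sample = pd.DataFrame({"Frailty": ["N", "Y", "N", "Y"]})
sample["Frailty"] = recode_frailty(sample["Frailty"])
```

Raising an error is only one option; dropping or flagging the affected rows may suit the data better, depending on how missing frailty labels should be treated.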


## 2. Exploratory Data Analysis (EDA)
We compute summary statistics and the pairwise correlation matrix, then save both as CSV files.

In [None]:

# Compute summary statistics
summary_stats = df.describe()

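# Compute correlation matrix over numeric columns only
# (in pandas 2.0+, df.corr() raises an error if the frame has non-numeric columns)
correlation_matrix = df.corr(numeric_only=True)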

# Save EDA outputs
summary_stats.to_csv("analysis/reports/eda_summary.csv")
correlation_matrix.to_csv("analysis/reports/correlation_matrix.csv")

print("EDA results saved.")
summary_stats
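
To see which variables track frailty most closely, one option is to pull the `Frailty` column out of the correlation matrix and sort it by absolute value. This is a minimal sketch on made-up data: the values below are invented for illustration, and `numeric_only=True` assumes pandas 1.5 or later.

```python
import pandas as pd

# Made-up sample data for illustration only
sample = pd.DataFrame(
    {
        "Age": [25, 40, 55, 70],
        "Grip Strength": [40, 34, 28, 22],
        "Frailty": [0, 0, 1, 1],
        "Notes": ["a", "b", "c", "d"],  # non-numeric column, excluded by numeric_only
    }
)

# Pearson correlations between numeric columns only
corr = sample.corr(numeric_only=True)

# Correlation of each numeric variable with Frailty, strongest first by absolute value
frailty_corr = (
    corr["Frailty"]
    .drop("Frailty")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(frailty_corr)
```

Pearson correlation against a 0/1 variable is equivalent to the point-biserial correlation, so the sign shows the direction of the association with frailty.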


## 3. Visualizations
We create and save a histogram of grip strength and a scatter plot of grip strength against age, colored by frailty status.

In [None]:

import matplotlib.pyplot as plt
import seaborn as sns

# Histogram of Grip Strength
plt.figure(figsize=(6,4))
sns.histplot(df['Grip Strength'], bins=10, kde=True)
plt.title("Distribution of Grip Strength")
plt.xlabel("Grip Strength (kg)")
plt.ylabel("Count")
plt.savefig("reports/visualization_plots/grip_strength_distribution.png")
plt.show()

# Scatter plot of Grip Strength vs. Age
plt.figure(figsize=(6,4))
sns.scatterplot(data=df, x="Age", y="Grip Strength", hue="Frailty")
plt.title("Grip Strength vs Age")
plt.xlabel("Age (years)")
plt.ylabel("Grip Strength (kg)")
plt.savefig("reports/visualization_plots/grip_vs_age.png")
plt.show()

print("Visualizations saved.")
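
The cells above write to several output folders, and neither `to_csv` nor `savefig` creates a missing directory. A minimal sketch of creating them up front follows, assuming the notebook is run from the project root. The helper name `ensure_output_dirs` is hypothetical; the paths are the ones used in this notebook.

```python
from pathlib import Path

# Output folders used by the cells in this notebook
OUTPUT_DIRS = [
    "data/cleaned",
    "analysis/reports",
    "reports/visualization_plots",
]


def ensure_output_dirs(root="."):
    """Create each output folder under root if it does not already exist (hypothetical helper)."""
    created = []
    for rel in OUTPUT_DIRS:
        path = Path(root) / rel
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_output_dirs()` once before the first cell that writes a file would let the workflow run on a fresh checkout without manual setup.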
