Project Overview

# Statistical Analysis for Decision-Making

This project applies inferential statistical techniques to the scikit-learn diabetes
dataset to test whether body mass index (BMI) is related to disease progression. The
objective is to support evidence-based decision-making with hypothesis tests and a
clear interpretation of their results.

In [1]:
# Imports

import pandas as pd
import numpy as np

from sklearn.datasets import load_diabetes
from scipy.stats import pearsonr, ttest_ind

In [2]:
# Load dataset as a pandas DataFrame (features plus the "target" column)
diabetes = load_diabetes(as_frame=True)
df = diabetes.frame.copy()
# Give the target a descriptive name
df.rename(columns={"target": "disease_progression"}, inplace=True)

df.head()

|   | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | disease_progression |
|---|-----|-----|-----|----|----|----|----|----|----|----|---------------------|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 | 151.0 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 | 75.0 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.00567 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.02593 | 141.0 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 | 206.0 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 | 135.0 |


# Define the Statistical Question

## Research Question

Is there a statistically significant relationship between body mass index (BMI)
and disease progression?

In [3]:
# Correlation Hypothesis Test (Pearson)

x = df["bmi"]
y = df["disease_progression"]

corr, p_value = pearsonr(x, y)
corr, p_value

(np.float64(0.5864501344746889), np.float64(3.46600644516715e-42))

## Hypothesis Test (Correlation)

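- **Null Hypothesis (H₀):** There is no linear relationship between BMI and disease progression (ρ = 0).
- **Alternative Hypothesis (H₁):** There is a linear relationship between BMI and disease progression (ρ ≠ 0).

If the p-value is less than 0.05, the null hypothesis is rejected. Here r ≈ 0.586 and
p ≈ 3.5 × 10⁻⁴², so H₀ is rejected: BMI and disease progression show a moderate
positive linear correlation.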
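
The decision rule can be made concrete with a short sketch. It plugs in the two values printed by the Pearson cell and converts r into the t statistic that underlies the p-value. The sample size `n = 442` (the row count of scikit-learn's diabetes dataset) and the variable names are assumptions introduced here for illustration; they do not appear in the notebook.

```python
import math

# Values printed by pearsonr(x, y) in the cell above.
r = 0.5864501344746889
p_value = 3.46600644516715e-42

n = 442        # assumed: number of rows in the scikit-learn diabetes dataset
alpha = 0.05   # significance threshold used in this notebook

# Under H0 (no linear relationship), t = r * sqrt((n - 2) / (1 - r^2))
# follows a t-distribution with n - 2 degrees of freedom.
t_from_r = r * math.sqrt((n - 2) / (1 - r ** 2))

# r^2 is the share of variance in y that is linearly associated with x.
r_squared = r ** 2

reject_h0 = p_value < alpha
print(f"t = {t_from_r:.2f}, r^2 = {r_squared:.3f}, reject H0: {reject_h0}")
```

With these inputs the t statistic comes out near 15.2 on 440 degrees of freedom, and r² is about 0.34, so roughly a third of the variance in disease progression is linearly associated with BMI.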

In [4]:
# Group Comparison (Decision-Driven Analysis)

df["bmi_group"] = np.where(
    df["bmi"] >= df["bmi"].median(),
    "High BMI",
    "Low BMI"
)

df.groupby("bmi_group")["disease_progression"].mean()

bmi_group
High BMI    190.450893
Low BMI     112.761468
Name: disease_progression, dtype: float64
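
A median split does not always produce two equally sized groups. The sketch below mirrors the `>=` rule from the `np.where` call in plain Python, using a small set of hypothetical BMI values (not taken from the dataset) chosen so that two rows tie at the median.

```python
import statistics

# Hypothetical standardized BMI values; two of them tie at the median.
bmi_values = [-0.05, -0.02, 0.01, 0.01, 0.03, 0.06]
cutoff = statistics.median(bmi_values)

# Same rule as the np.where call: ">=" places ties in "High BMI".
groups = ["High BMI" if value >= cutoff else "Low BMI" for value in bmi_values]

high_count = groups.count("High BMI")
low_count = groups.count("Low BMI")
print(f"cutoff={cutoff}, High BMI: {high_count}, Low BMI: {low_count}")
```

Because rows equal to the median land in the "High BMI" group, that group can end up larger than the other, which is worth checking with `df["bmi_group"].value_counts()` before comparing means.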

In [5]:
# Independent Samples T-Test
# ttest_ind defaults to equal_var=True, i.e. Student's t-test with pooled variance.

high_bmi = df.loc[df["bmi_group"] == "High BMI", "disease_progression"]
low_bmi = df.loc[df["bmi_group"] == "Low BMI", "disease_progression"]

t_stat, p_val = ttest_ind(high_bmi, low_bmi)
t_stat, p_val

(np.float64(12.253090286312903), np.float64(6.670062575365566e-30))
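
To show what the t statistic measures, here is a minimal standard-library sketch of the two common variants: the pooled-variance (Student) form, which is what `ttest_ind` computes by default, and Welch's form, which `ttest_ind(..., equal_var=False)` uses when the groups may have different variances. The helper names `pooled_t` and `welch_t` and the toy samples are hypothetical and are not part of the notebook or of SciPy.

```python
import math
import statistics


def pooled_t(a, b):
    """Student's t statistic, assuming both groups share one variance."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def welch_t(a, b):
    """Welch's t statistic, which lets each group keep its own variance."""
    std_err = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / std_err


# Hypothetical toy samples, not taken from the diabetes data.
group_a = [5.0, 6.0, 7.0, 8.0, 9.0]
group_b = [1.0, 3.0, 5.0]

t_pooled = pooled_t(group_a, group_b)
t_welch = welch_t(group_a, group_b)
print(f"pooled t = {t_pooled:.3f}, Welch t = {t_welch:.3f}")
```

The two statistics coincide when group sizes and variances match and diverge otherwise. If the High BMI and Low BMI groups differ noticeably in spread, rerunning the test with `equal_var=False` is a reasonable robustness check.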

# Interpretation 

## Interpretation of Results

Both tests point in the same direction. BMI and disease progression are positively
correlated (r ≈ 0.59, p < 0.001), and the High BMI group has a markedly higher mean
disease progression score than the Low BMI group (190.5 vs. 112.8; t ≈ 12.25, p < 0.001).

This suggests that BMI is an important factor associated with health outcomes
and should be considered in planning targeted interventions or resource allocation.
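
A p-value says little about how precisely the correlation is estimated. As a rough complement, the sketch below builds an approximate 95% confidence interval for the Pearson correlation using the Fisher z-transform. It reuses the r value printed earlier and assumes `n = 442` rows (the size of scikit-learn's diabetes dataset); both the approach and the variable names are illustrative additions rather than part of the original analysis.

```python
import math
from statistics import NormalDist

r = 0.5864501344746889  # Pearson r reported for BMI vs. disease progression
n = 442                 # assumed: rows in the scikit-learn diabetes dataset
confidence = 0.95

z = math.atanh(r)               # Fisher z-transform of r
std_err = 1 / math.sqrt(n - 3)  # approximate standard error of z
z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)

# Build the interval on the z scale, then map it back to the r scale.
ci_low = math.tanh(z - z_crit * std_err)
ci_high = math.tanh(z + z_crit * std_err)
print(f"95% CI for r: ({ci_low:.3f}, {ci_high:.3f})")
```

Under these assumptions the interval runs from roughly 0.52 to 0.64, so the data are consistent with a moderate positive correlation and rule out values close to zero.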



# Limitations & Notes

## Limitations

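- The dataset is used for demonstration purposes; its features are mean-centered and scaled, so BMI values are not in their original units.
- Results do not imply causation.
- Further analysis with additional real-world data and controls for confounding variables is recommended.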

# Final Summary 

## Conclusion

This analysis demonstrates how inferential statistics can support evidence-based
decision-making. Here, a correlation test and a two-group comparison both indicate
that higher BMI goes with greater disease progression, the kind of insight that
complements exploratory data analysis.
