**Programmer:** python_scripts (Abhijith Warrier)

**PYTHON SCRIPT TO *PREDICT DIABETES PROGRESSION USING LINEAR REGRESSION*. 🧠📉🤖**

This script demonstrates how to use the **Diabetes dataset** from scikit-learn to build a **Linear Regression model** that predicts disease progression based on health metrics such as BMI, blood pressure, and serum measurements.

---

## **📦 Import Required Libraries**

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
```

---

## **🧩 Load and Explore the Diabetes Dataset**

We use the built-in dataset from scikit-learn.

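It contains **442 patients** with **10 numeric baseline features** (age, sex, BMI, average blood pressure, and six blood serum measurements, `s1` to `s6`) and a continuous target: a quantitative measure of **disease progression one year after baseline**. By default, `load_diabetes()` returns the features already mean-centered and scaled, so every column is on a common scale.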

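```python
# Load features (X) and the progression target (y)
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

print("Features shape:", X.shape)
print("Target shape:", y.shape)

# Combine features and target in a DataFrame for a quick look
df = pd.DataFrame(X, columns=diabetes.feature_names)
df["target"] = y
df.head()
```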
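scikit-learn's documentation states that each feature returned by `load_diabetes()` is mean-centered and scaled so that its sum of squares is 1. The short check below is a minimal sketch for confirming this on your installed version; it assumes a recent scikit-learn where the scaled features are the default.

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X = diabetes.data

# 442 patients, 10 baseline features
print(X.shape)
print(list(diabetes.feature_names))

# Each column should be mean-centered (column means near 0) ...
col_means = X.mean(axis=0)
# ... and scaled so that its sum of squares is near 1
col_sum_sq = (X ** 2).sum(axis=0)

print(np.round(col_means, 6))
print(np.round(col_sum_sq, 6))
```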

---

## **✂️ Train/Test Split**

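We use 80% of the data for training and 20% for testing. Setting `random_state=42` fixes the shuffle, so the split (and every result below) is reproducible.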

```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```
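For intuition, here is a minimal numpy sketch of what a shuffled 80/20 split does. The helper `shuffled_split` is hypothetical: it only imitates the idea and will not select the same rows as `train_test_split(..., random_state=42)`. It rounds the test size up, which is how scikit-learn sizes a fractional `test_size`, so 442 rows become 353 training and 89 test rows.

```python
import math
import numpy as np

def shuffled_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle row indices, then cut off a test slice."""
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)   # round the test size up
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)            # random order of row indices
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    # Index X and y with the same indices so rows stay paired with their targets
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Dummy data with the same number of rows as the Diabetes dataset
X_demo = np.arange(442 * 10, dtype=float).reshape(442, 10)
y_demo = np.arange(442, dtype=float)

X_tr, X_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
print(len(X_tr), len(X_te))  # 353 89
```

Shuffling before cutting matters: if the rows were ordered in some meaningful way, taking the last 20% without shuffling could give a test set that does not resemble the training data.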

---

## **🤖 Train a Linear Regression Model**

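```python
# Fit ordinary least squares on the training split
model = LinearRegression()
model.fit(X_train, y_train)

# Predict progression for the held-out test patients
y_pred = model.predict(X_test)
```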
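`LinearRegression` fits ordinary least squares: it chooses an intercept and one coefficient per feature that minimize the sum of squared errors on the training data. As a sketch of that idea (not scikit-learn's internal code), the same kind of fit can be computed with `np.linalg.lstsq` by adding a column of ones for the intercept. The synthetic data and true weights below are made up for illustration.

```python
import numpy as np

# Synthetic data, made up for illustration:
# y = 3.0 + 1.5*x0 - 2.0*x1 + 0.5*x2 + small Gaussian noise
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
true_intercept = 3.0
y_syn = true_intercept + X_syn @ true_coef + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the first solved weight plays the role of the intercept
A = np.column_stack([np.ones(len(X_syn)), X_syn])

# Least-squares solution of A @ w ≈ y_syn, i.e. the w minimizing the sum of squared residuals
w, *_ = np.linalg.lstsq(A, y_syn, rcond=None)
intercept, coef = w[0], w[1:]

print("intercept:", round(intercept, 3))    # close to 3.0
print("coefficients:", np.round(coef, 3))   # close to [1.5, -2.0, 0.5]
```

Running the same `np.linalg.lstsq` approach on `X_train` and `y_train` should give values matching `model.intercept_` and `model.coef_` up to floating-point error.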

---

## **📊 Evaluate the Model**

We measure performance using:

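- **Mean Squared Error (MSE):** the average squared difference between actual and predicted values, in squared target units; lower is better.
- **R² Score (coefficient of determination):** the fraction of the target's variance explained by the model; 1.0 is a perfect fit, 0.0 is no better than always predicting the mean, and test-set values can be negative.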

```python
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R² Score:", r2)
```
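Both metrics are short formulas. This minimal numpy sketch writes them out by hand, using hypothetical helper names `mse_manual` and `r2_manual` (chosen so they do not overwrite the `mse` and `r2` values above). It follows the standard definitions behind `mean_squared_error` and `r2_score` but omits their extra options, such as sample weights.

```python
import numpy as np

def mse_manual(y_true, y_pred):
    """MSE = mean of (actual - predicted)**2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2_manual(y_true, y_pred):
    """R² = 1 - (residual sum of squares) / (total sum of squares around the mean)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true_demo = [3.0, -0.5, 2.0, 7.0]
y_pred_demo = [2.5, 0.0, 2.0, 8.0]
print(mse_manual(y_true_demo, y_pred_demo))            # 0.375
print(round(r2_manual(y_true_demo, y_pred_demo), 4))   # 0.9486
```

Predicting the mean of `y_true_demo` for every row makes `ss_res` equal `ss_tot`, which is why such a baseline scores an R² of 0.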

---

## **📈 Visualize Predictions vs Actual Values**

```python
plt.figure(figsize=(7, 5))
plt.scatter(y_test, y_pred, alpha=0.7)
plt.xlabel("Actual Progression")
plt.ylabel("Predicted Progression")
plt.title("Diabetes Progression: Actual vs Predicted")
# Red dashed reference line y = x: points on it are perfect predictions
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "r--")
plt.show()
```

Points on the red dashed line (y = x) are perfect predictions; the farther a point sits above or below it, the larger that prediction's error.
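A point's vertical distance from the y = x line is its residual (actual minus predicted). The hypothetical helper `residual_summary` below is a small numpy sketch that turns what the plot shows into numbers: a nonzero mean residual suggests systematic over- or under-prediction, and the largest absolute residual is the worst miss. The example values are made up.

```python
import numpy as np

def residual_summary(y_true, y_pred):
    """Hypothetical helper: summarize residuals (actual - predicted)."""
    residuals = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return {
        "mean_residual": residuals.mean(),              # > 0 means predictions run low on average
        "mean_abs_residual": np.abs(residuals).mean(),  # typical size of an error
        "max_abs_residual": np.abs(residuals).max(),    # the single worst miss
    }

# Made-up values: residuals are -20, 0, 20, 20
summary = residual_summary([100, 150, 200, 250], [120, 150, 180, 230])
print(summary)
```

With the notebook's arrays, `residual_summary(y_test, y_pred)` gives the same summary for the test set.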

---

## **📌 Optional: Feature Importance (Coefficient Strengths)**

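```python
# Pair each feature with its coefficient and rank by absolute value,
# so strong negative effects rank as high as strong positive ones
coef_df = pd.DataFrame({
    "feature": diabetes.feature_names,
    "coefficient": model.coef_
}).sort_values(by="coefficient", key=lambda s: s.abs(), ascending=False)

coef_df
```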

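Because the features are on a common scale, the absolute size of each coefficient gives a rough ranking of how strongly that feature is linearly associated with disease progression in this model, and the sign gives the direction. These are associations, not causal effects, and correlated features (such as several of the serum measurements) can make individual coefficients unstable.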
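Comparing coefficient magnitudes only makes sense when features share a common scale, which `load_diabetes()` provides by returning centered and scaled features. The sketch below uses made-up synthetic data to show the problem otherwise: multiplying one feature's values by 1000 (as when switching its unit from kilograms to grams) shrinks its coefficient by a factor of 1000, even though the fitted predictions do not change. The helper `ols` is hypothetical.

```python
import numpy as np

# Synthetic data, made up for illustration: both features have the same true effect (2.0)
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(100, 2))
y_toy = 2.0 * X_toy[:, 0] + 2.0 * X_toy[:, 1] + rng.normal(scale=0.1, size=100)

def ols(X, y):
    """Hypothetical helper: least-squares fit with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w[0], w[1:]

b_orig, coef_orig = ols(X_toy, y_toy)

# Multiply feature 1 by 1000, as if its unit switched from kilograms to grams
X_rescaled = X_toy.copy()
X_rescaled[:, 1] *= 1000
b_resc, coef_resc = ols(X_rescaled, y_toy)

print("original coefficients:", np.round(coef_orig, 3))
print("rescaled coefficients:", np.round(coef_resc, 6))

# Same model in different units: predictions are unchanged
pred_orig = b_orig + X_toy @ coef_orig
pred_resc = b_resc + X_rescaled @ coef_resc
print(np.allclose(pred_orig, pred_resc))
```

Feature 1 now looks 1000 times less important, although nothing about its relationship with the target changed. This is why features are usually standardized before their coefficients are compared.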

---

## ‚úÖ **Key Takeaways**

- **Linear Regression is a foundational ML tool for predicting continuous values**, useful in healthcare, finance, forecasting, and more.
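- **R² Score** measures the fraction of the target's variance the model explains; values closer to 1 indicate a better fit.
- **Visualizing predictions against actual values** makes systematic errors easy to spot, such as predictions pulled toward the mean; comparing training and test scores is the more direct check for overfitting.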
- **Coefficient analysis** reveals which features have the strongest relationship with disease progression.
- The Diabetes dataset is ideal for learning regression because it is real-world, numeric, and interpretable.

---