**Programmer: python_scripts (Abhijith Warrier)**

**PYTHON SCRIPT TO *PREDICT WINE QUALITY USING RANDOM FOREST AND EXPLAIN MODEL DECISIONS WITH SHAP*. 🧠🍷📊**

This script demonstrates how to build a **Random Forest model for tabular data** and then **explain its predictions** using **SHAP (SHapley Additive exPlanations)**, which is often a key requirement for real-world ML systems.

---

## **📦 Install Required Packages**

**Install ML and explainability libraries.**

In [None]:
%pip install pandas numpy scikit-learn shap matplotlib

---

## **🧩 Load the Wine Quality Dataset**

**We use the red-wine file (`winequality-red.csv`) from the popular UCI Wine Quality dataset; it is semicolon-delimited, hence `sep=";"`.**

In [None]:
import pandas as pd

df = pd.read_csv("winequality-red.csv", sep=";")
df.head()

Features include acidity, sugar, sulphates, alcohol, and more.

Target variable: **quality** (integer score).

---

## **🔍 Basic Data Inspection**

**Understand feature distributions and target range.**

In [None]:
print(df.info())
print(df["quality"].value_counts().sort_index())

Because **quality** is an ordered integer score, this script treats wine quality prediction as a **regression problem**; it can also be framed as classification.

---

## **✂️ Train/Test Split**

**Split features and target variable.**

In [None]:
from sklearn.model_selection import train_test_split

X = df.drop("quality", axis=1)
y = df["quality"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=42
)
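As a quick check on what `test_size=0.3` means in rows, the sketch below reproduces the size arithmetic scikit-learn applies to a fractional `test_size`: the test count is rounded up and the remainder goes to training. The helper name `split_sizes` is hypothetical, and the row count of 1,599 assumes the standard red-wine file; substitute `len(df)` for your own copy.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size."""
    # The test portion is rounded up to a whole number of rows.
    n_test = math.ceil(test_size * n_samples)
    # Everything that is not in the test set is used for training.
    n_train = n_samples - n_test
    return n_train, n_test


# 1599 is an assumed row count for winequality-red.csv; use len(df) in practice.
n_train, n_test = split_sizes(1599, 0.3)
print("train rows:", n_train, "test rows:", n_test)
```

Comparing these numbers with `X_train.shape` and `X_test.shape` is a cheap way to confirm the split behaved as expected.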

---

## **🌲 Train a Random Forest Regressor**

**Random Forest captures complex, non-linear relationships.**

In [None]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    random_state=42
)

model.fit(X_train, y_train)

---

## **📊 Evaluate Model Performance**

**Evaluate predictions using regression metrics.**

In [None]:
from sklearn.metrics import mean_absolute_error, r2_score

y_pred = model.predict(X_test)

print("MAE:", mean_absolute_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))

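These metrics quantify how closely the model's predictions match the actual quality scores: MAE is the average absolute error in score points, and R² is the share of target variance the model explains.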
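To make the two metrics less of a black box, here is a minimal sketch that computes them by hand in plain Python. The function names and the four demo scores are made up for illustration and are not results from the wine model; on real data, `mean_absolute_error` and `r2_score` from scikit-learn remain the better choice.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average absolute gap between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 - SS_res / SS_tot: share of target variance the model explains."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up scores for illustration only (not output of the Random Forest).
y_true_demo = [5, 6, 7, 5]
y_pred_demo = [5.5, 6.0, 6.0, 5.5]

mae_demo = mean_absolute_error_manual(y_true_demo, y_pred_demo)
r2_demo = r2_score_manual(y_true_demo, y_pred_demo)

# A model that always predicts the mean of y_true scores R2 = 0 by definition.
mean_demo = sum(y_true_demo) / len(y_true_demo)
baseline_r2 = r2_score_manual(y_true_demo, [mean_demo] * len(y_true_demo))

print(f"MAE={mae_demo:.3f}  R2={r2_demo:.3f}  baseline R2={baseline_r2:.3f}")
```

The baseline line is a useful reference point: an R² near zero means the model is doing little better than always predicting the average score.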

---

## **🔎 Explain Predictions with SHAP**

**SHAP explains how each feature contributes to predictions.**

In [None]:
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
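For intuition about what a SHAP value is, the sketch below computes exact Shapley values by brute force for a toy three-feature function. This is not how `TreeExplainer` works internally (it uses a much faster tree-specific algorithm), and the toy model, the instance, and the all-zero baseline that stands in for SHAP's expected-value background are assumptions made purely for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i when added to the coalition.
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi


# Hypothetical toy model, unrelated to the wine data.
def toy_model(x):
    return 2 * x[0] + x[1] * x[2]


instance = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)


def coalition_value(present):
    """Model output when only features in `present` take the instance's values."""
    mixed = [instance[j] if j in present else baseline[j] for j in range(3)]
    return toy_model(mixed)


phi = exact_shapley(coalition_value, 3)
base_value = coalition_value(frozenset())

# Additivity: base value plus all contributions reproduces the prediction.
print("contributions:", phi)
print("base + sum:", base_value + sum(phi), "model output:", toy_model(instance))
```

The interaction term `x[1] * x[2]` is split evenly between the two features involved, and the contributions sum to the gap between the prediction and the baseline output. The same additivity is what lets SHAP values be read as per-feature shares of a prediction.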

---

## **📈 Visualize Feature Importance with SHAP**

**Global feature impact on wine quality.**

In [None]:
shap.summary_plot(shap_values, X_test)

This plot shows:

- which features matter most
- whether they increase or decrease predicted quality
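If you want the ranking behind the plot as numbers, one common approach is to average the absolute SHAP values per feature. The sketch below assumes a 2-D array shaped like `shap_values` (rows are samples, columns are features); the helper name `rank_features_by_mean_abs_shap` and the small demo matrix are hypothetical placeholders, not values from the wine model.

```python
import numpy as np


def rank_features_by_mean_abs_shap(shap_matrix, feature_names):
    """Sort features by mean |SHAP| across rows, largest first."""
    importance = np.abs(np.asarray(shap_matrix, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Placeholder values for illustration; in the notebook you would pass
# shap_values and list(X_test.columns) instead.
demo_shap = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.15, 0.00],
    [0.40, -0.05, -0.05],
]
demo_names = ["alcohol", "sulphates", "pH"]

ranking = rank_features_by_mean_abs_shap(demo_shap, demo_names)
for name, score in ranking:
    print(f"{name:10s} {score:.3f}")
```

Taking the absolute value first matters: a feature that pushes some predictions up and others down can have a mean SHAP value near zero while still being highly influential.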

---

## **🧪 Why Explainability Matters**

- High accuracy alone is not enough
- Stakeholders need to understand *why* predictions happen
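- SHAP provides consistent, additive explanations; the framework covers many model types, while the `TreeExplainer` used here is specific to tree ensembles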
- Critical for trust, debugging, and compliance

---

## **Key Takeaways**

1. Wine quality prediction is a strong tabular ML use case.
2. Random Forest models capture non-linear feature interactions.
3. SHAP explains model predictions at both global and local levels.
4. Explainability is essential for real-world ML systems.
5. Performance + interpretability together make models production-ready.
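The script itself only plots the global view, so here is a minimal sketch of a local explanation for a single prediction. It assumes you have a base value (for example `explainer.expected_value`, which some shap versions return as a one-element array) and one row of SHAP values; the helper name `explain_row` and the numbers passed to it are hypothetical placeholders.

```python
import numpy as np


def explain_row(base_value, shap_row, feature_names):
    """Return the reconstructed prediction and contributions sorted by |impact|."""
    shap_row = np.asarray(shap_row, dtype=float)
    # Additivity: prediction = base value + sum of per-feature contributions.
    prediction = float(base_value + shap_row.sum())
    order = np.argsort(np.abs(shap_row))[::-1]
    contributions = [(feature_names[i], float(shap_row[i])) for i in order]
    return prediction, contributions


# Placeholder inputs; in the notebook these would come from
# explainer.expected_value, shap_values[0], and list(X_test.columns).
pred, contribs = explain_row(
    5.6, [0.30, -0.10, 0.05], ["alcohol", "sulphates", "pH"]
)

print(f"reconstructed prediction: {pred:.2f}")
for name, value in contribs:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name:10s} {direction} the score by {abs(value):.2f}")
```

Reading one row this way answers "why did this wine get this score?", which complements the dataset-wide picture from the summary plot.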

---