
In [2]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error


# 1. Dataset

data = {
    "GoogleAds_(₹1000s)": [1, 2, 3, 1, 2],
    "BooksSold": [100, 130, 160, 110, 140]
}
df = pd.DataFrame(data)

In [3]:
# 2. Baseline value

baseline = df["BooksSold"].mean()
print(f"Baseline value: {baseline}")

Baseline value: 128.0


In [4]:
# 3. Linear Regression

X = df[["GoogleAds_(₹1000s)"]]
y = df["BooksSold"]

model = LinearRegression()
model.fit(X, y)

intercept = model.intercept_
coef = model.coef_[0]
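
With a single feature, the least-squares fit has a closed form, so the intercept and coefficient can be checked by hand. The sketch below is not part of the original notebook; it recomputes both values in plain Python, and the helper name `fit_line` is illustrative only.

```python
# Closed-form simple linear regression (ordinary least squares, one feature).
ads = [1, 2, 3, 1, 2]               # GoogleAds_(₹1000s)
books = [100, 130, 160, 110, 140]   # BooksSold

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line always passes through (mean_x, mean_y).
    return mean_y - slope * mean_x, slope

intercept_check, coef_check = fit_line(ads, books)
print(f"intercept = {intercept_check:.4f}, coefficient = {coef_check:.4f}")
# prints: intercept = 77.8571, coefficient = 27.8571
```

These values should agree with `model.intercept_` and `model.coef_[0]` up to floating-point error.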

In [7]:
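# 4. Predictions and SHAP values
# With a single feature, the SHAP value of a linear model is the prediction
# minus the baseline (the mean prediction, which for OLS equals mean BooksSold).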
df["Predicted_BooksSold"] = model.predict(X).round(2)
df["Baseline"] = round(baseline, 2)
df["SHAP"] = (df["Predicted_BooksSold"] - baseline).round(2)
df["Baseline+SHAP"] = (df["Baseline"] + df["SHAP"]).round(2)
df["Residual"] = (df["BooksSold"] - df["Predicted_BooksSold"]).round(2)
df["Over/Under"] = df["Residual"].apply(
    lambda r: "Underprediction" if r > 0 else ("Overprediction" if r < 0 else "Exact")
)

print("\nSHAP Table:\n")
print(df)


SHAP Table:

   GoogleAds_(₹1000s)  BooksSold  Predicted_BooksSold  Baseline   SHAP  \
0                   1        100               105.71     128.0 -22.29   
1                   2        130               133.57     128.0   5.57   
2                   3        160               161.43     128.0  33.43   
3                   1        110               105.71     128.0 -22.29   
4                   2        140               133.57     128.0   5.57   

   Baseline+SHAP  Residual       Over/Under  Residual_(Actual-Predicted)  \
0         105.71     -5.71   Overprediction                        -5.71   
1         133.57     -3.57   Overprediction                        -3.57   
2         161.43     -1.43   Overprediction                        -1.43   
3         105.71      4.29  Underprediction                         4.29   
4         133.57      6.43  Underprediction                         6.43   

                        Over_Under  
0  Overprediction (model too high)  
1  Overpre
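
The SHAP column above is computed as prediction minus baseline. For a linear model, the SHAP value of a feature can also be written as the coefficient times the feature's deviation from its mean. The sketch below assumes the five training rows serve as the background data and checks that both forms agree for this dataset. It is an illustration in plain Python, not a call into the `shap` library.

```python
# Linear-model SHAP for one feature: phi_i = slope * (x_i - mean(x)).
ads = [1, 2, 3, 1, 2]               # GoogleAds_(₹1000s)
books = [100, 130, 160, 110, 140]   # BooksSold

n = len(ads)
mean_x = sum(ads) / n
baseline = sum(books) / n           # mean BooksSold, also the mean prediction

# Least-squares slope and intercept (closed form for one feature).
slope = (sum((x - mean_x) * (y - baseline) for x, y in zip(ads, books))
         / sum((x - mean_x) ** 2 for x in ads))
intercept = baseline - slope * mean_x

predictions = [intercept + slope * x for x in ads]
shap_values = [slope * (x - mean_x) for x in ads]

for x, pred, phi in zip(ads, predictions, shap_values):
    # baseline + SHAP reproduces the prediction exactly (up to float error).
    print(f"ads={x}  prediction={pred:.2f}  SHAP={phi:+.2f}  baseline+SHAP={baseline + phi:.2f}")
```

The printed SHAP values should match the table: -22.29 for ₹1,000, +5.57 for ₹2,000, and +33.43 for ₹3,000.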

In [8]:
# 5. Residuals & Over/Under
# Same residuals as in step 4 (Actual - Predicted), stored under more descriptive labels.

df["Residual_(Actual-Predicted)"] = (df["BooksSold"] - df["Predicted_BooksSold"]).round(4)
df["Over_Under"] = df["Residual_(Actual-Predicted)"].apply(
    lambda r: "Underprediction (model too low)" if r > 0
    else ("Overprediction (model too high)" if r < 0 else "Exact")
)
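
A least-squares line fitted with an intercept has residuals that sum to zero, so unless the fit is perfect it must overpredict some rows and underpredict others. The following sketch, written in plain Python with an illustrative helper `label`, recomputes the residuals from unrounded predictions and checks that property.

```python
# Residuals (actual - predicted) and their over/under labels, without rounding.
ads = [1, 2, 3, 1, 2]               # GoogleAds_(₹1000s)
books = [100, 130, 160, 110, 140]   # BooksSold

n = len(ads)
mean_x = sum(ads) / n
mean_y = sum(books) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ads, books))
         / sum((x - mean_x) ** 2 for x in ads))
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(ads, books)]

def label(residual, tol=1e-9):
    """Positive residual: the model predicted too low; negative: too high."""
    if residual > tol:
        return "Underprediction"
    if residual < -tol:
        return "Overprediction"
    return "Exact"

labels = [label(r) for r in residuals]
for x, y, r, lab in zip(ads, books, residuals, labels):
    print(f"ads={x}  actual={y}  residual={r:+.2f}  {lab}")

# For OLS with an intercept the residuals cancel out.
print(f"|sum of residuals| = {abs(sum(residuals)):.6f}")
```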

In [9]:
# 6. Model performance
# Note: the metrics use the predictions rounded to two decimals in step 4.

r2 = r2_score(y, df["Predicted_BooksSold"])
mse = mean_squared_error(y, df["Predicted_BooksSold"])
mae = mean_absolute_error(y, df["Predicted_BooksSold"])
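
The three metrics follow directly from the residuals, so they can be reproduced without scikit-learn. The sketch below is an illustrative cross-check in plain Python. It uses unrounded predictions, so the MAE comes out as 4.2857 rather than the 4.2860 obtained from the rounded column.

```python
# Manual R², MSE and MAE from unrounded predictions.
ads = [1, 2, 3, 1, 2]               # GoogleAds_(₹1000s)
books = [100, 130, 160, 110, 140]   # BooksSold

n = len(ads)
mean_x = sum(ads) / n
mean_y = sum(books) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ads, books))
         / sum((x - mean_x) ** 2 for x in ads))
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(ads, books)]

ss_res = sum(r ** 2 for r in residuals)           # unexplained variation
ss_tot = sum((y - mean_y) ** 2 for y in books)    # total variation around the mean

mse_check = ss_res / n
mae_check = sum(abs(r) for r in residuals) / n
r2_check = 1 - ss_res / ss_tot

print(f"R-squared: {r2_check:.4f}")   # prints: R-squared: 0.9530
print(f"MSE: {mse_check:.4f}")        # prints: MSE: 21.4286
print(f"MAE: {mae_check:.4f}")        # prints: MAE: 4.2857
```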

In [10]:
# 7. Output results

print("Linear Regression Model")
print(f"Predicted_BooksSold = {intercept:.4f} + {coef:.4f} × GoogleAds_(₹1000s)")
print(f"Intercept: {intercept:.4f}")
print(f"Coefficient: {coef:.4f} (books per ₹1000 Google Ads)")
print("\nBaseline")
print(f"Baseline (mean BooksSold): {baseline:.4f}")

print("\nModel Performance")
print(f"R-squared: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"MAE: {mae:.4f}")

print("\nDetailed Table")
print(df.to_string(index=False))

Linear Regression Model
Predicted_BooksSold = 77.8571 + 27.8571 × GoogleAds_(₹1000s)
Intercept: 77.8571
Coefficient: 27.8571 (books per ₹1000 Google Ads)

Baseline
Baseline (mean BooksSold): 128.0000

Model Performance
R-squared: 0.9530
MSE: 21.4286
MAE: 4.2860

Detailed Table
 GoogleAds_(₹1000s)  BooksSold  Predicted_BooksSold  Baseline   SHAP  Baseline+SHAP  Residual      Over/Under  Residual_(Actual-Predicted)                      Over_Under
                  1        100               105.71     128.0 -22.29         105.71     -5.71  Overprediction                        -5.71 Overprediction (model too high)
                  2        130               133.57     128.0   5.57         133.57     -3.57  Overprediction                        -3.57 Overprediction (model too high)
                  3        160               161.43     128.0  33.43         161.43     -1.43  Overprediction                        -1.43 Overprediction (model too high)
                  1        110        

In [13]:
# 8. Summary
summary = f"""
Summary Analysis

1. Model Accuracy
   - R² Score: {r2:.3f}
   - Mean Squared Error (MSE): {mse:.2f}
   - Mean Absolute Error (MAE): {mae:.2f}

   The model explains about {r2*100:.1f}% of the variance in book sales. This is an in-sample figure
   computed on only five observations, so it likely overstates accuracy on new data.

2. Trend Analysis
   - Predicted book sales increase linearly with Google Ads spending.
   - In this sample, ad spend and weekly sales are positively correlated.

3. SHAP Interpretation
   - Each SHAP value is the difference between a prediction and the baseline (mean = {baseline:.2f}).
   - For example, spending ₹3,000 on ads raises the prediction by 33.43 books above the baseline,
     while spending only ₹1,000 lowers it by 22.29 books.

4. Over/Under Prediction
   - For some records (e.g., Ads = 1 with 100 actual), the model overpredicts sales.
   - For others (e.g., Ads = 2 with 140 actual), the model underpredicts.
   - This is expected: actual sales differ at the same ad spend, so a single-feature line cannot fit every row.

Conclusion:
The regression model shows that higher ad spending is associated with higher book sales in this sample.
SHAP values attribute each prediction's deviation from the baseline to ad spend, which makes the model's output easy to interpret.
"""

print(summary)


Summary Analysis

1. Model Accuracy
   - R² Score: 0.953
   - Mean Squared Error (MSE): 21.43
   - Mean Absolute Error (MAE): 4.29

   The model explains about 95.3% of the variance in book sales. This is an in-sample figure
   computed on only five observations, so it likely overstates accuracy on new data.

2. Trend Analysis
   - Predicted book sales increase linearly with Google Ads spending.
   - In this sample, ad spend and weekly sales are positively correlated.

3. SHAP Interpretation
   - Each SHAP value is the difference between a prediction and the baseline (mean = 128.00).
   - For example, spending ₹3,000 on ads raises the prediction by 33.43 books above the baseline,
     while spending only ₹1,000 lowers it by 22.29 books.

4. Over/Under Prediction
   - For some records (e.g., Ads = 1 with 100 actual), the model overpredicts sales.
   - For others (e.g., Ads = 2 with 140 actual), the model underpredicts.
   - This is expected: actual sales differ at the same ad spend, so a single-feature line cannot fit every row.

Conclusion:  
The r
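
The R² reported in the summary is computed on the same five rows used for fitting. One hedged way to gauge how the line might behave on unseen rows is leave-one-out validation: refit on four rows and predict the fifth. The sketch below is an illustrative addition in plain Python, not part of the original analysis, and the helper names `fit_line` and `loo_errors` are hypothetical.

```python
# Leave-one-out check of the single-feature regression.
ads = [1, 2, 3, 1, 2]               # GoogleAds_(₹1000s)
books = [100, 130, 160, 110, 140]   # BooksSold

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope

def loo_errors(x, y):
    """Residual (actual - predicted) for each row when it is held out of the fit."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        errors.append(y[i] - (intercept + slope * x[i]))
    return errors

# In-sample MAE: fit and evaluate on all five rows.
b0, b1 = fit_line(ads, books)
in_sample_mae = sum(abs(y - (b0 + b1 * x)) for x, y in zip(ads, books)) / len(ads)

# Leave-one-out MAE: each row is predicted by a model that never saw it.
loo_mae = sum(abs(e) for e in loo_errors(ads, books)) / len(ads)

print(f"In-sample MAE:     {in_sample_mae:.2f}")
print(f"Leave-one-out MAE: {loo_mae:.2f}")
```

On this dataset the leave-one-out MAE is about 7.05 books against 4.29 in-sample, which suggests the in-sample metrics are optimistic. With only five rows, either figure is a rough indication at best.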