# SmartExplain AI – Interpretable & Adaptive House Price Prediction Engine

## 1️⃣ Title & Abstract

**SmartExplain AI** is a production-level ML system combining gradient-descent linear regression with explainability and what-if simulation for house price prediction.

**Abstract:** We implement linear regression from scratch, trained with batch, mini-batch, and stochastic gradient descent (SGD) under L2 regularization. The system explains each prediction through per-feature contributions and supports what-if analysis.

## 2️⃣ Problem Statement

Predict median house values in California using census data. We require:
- Interpretable predictions (feature contributions)
- Flexible optimization (batch/minibatch/SGD, momentum, LR decay)
- Reproducible pipeline

## 3️⃣ Mathematical Formulation

**Linear model:**
$$y_{pred} = Xw + b$$

**Cost Function (L2 regularized):**
$$J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} (y_{pred}^{(i)} - y^{(i)})^2 + \lambda \sum_{j} w_j^2$$

**Gradients:**
$$\frac{\partial J}{\partial w} = \frac{1}{m} X^T (y_{pred} - y) + 2\lambda w$$
$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i} (y_{pred}^{(i)} - y^{(i)})$$
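
As a sanity check on the formulas above, here is a minimal NumPy sketch of the cost and its gradients, compared against a central finite-difference approximation. The function names `cost` and `gradients` and the synthetic data are illustrative assumptions, not the project's `core.model` code.

```python
import numpy as np

def cost(X, y, w, b, lam):
    """J(w, b) = (1/2m) * sum((Xw + b - y)^2) + lam * sum(w^2)."""
    m = X.shape[0]
    residual = X @ w + b - y
    return (residual @ residual) / (2 * m) + lam * np.sum(w ** 2)

def gradients(X, y, w, b, lam):
    """Analytic gradients of J with respect to w and b."""
    m = X.shape[0]
    residual = X @ w + b - y
    grad_w = X.T @ residual / m + 2 * lam * w
    grad_b = residual.mean()
    return grad_w, grad_b

# Synthetic data (assumption: any small random problem works for this check).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 0.3
w_demo = rng.normal(size=3)
b_demo = 0.1
lam_demo = 0.008

grad_w, grad_b = gradients(X_demo, y_demo, w_demo, b_demo, lam_demo)

# Central finite differences for comparison.
eps = 1e-6
num_grad_w = np.zeros_like(w_demo)
for j in range(w_demo.size):
    step = np.zeros_like(w_demo)
    step[j] = eps
    num_grad_w[j] = (cost(X_demo, y_demo, w_demo + step, b_demo, lam_demo)
                     - cost(X_demo, y_demo, w_demo - step, b_demo, lam_demo)) / (2 * eps)
num_grad_b = (cost(X_demo, y_demo, w_demo, b_demo + eps, lam_demo)
              - cost(X_demo, y_demo, w_demo, b_demo - eps, lam_demo)) / (2 * eps)

print("max |analytic - numeric| for w:", np.max(np.abs(grad_w - num_grad_w)))
print("|analytic - numeric| for b:", abs(grad_b - num_grad_b))
```

Note that the bias is left out of the penalty term, which matches the gradient for $b$ above.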

In [None]:
import sys
sys.path.insert(0, '..')  # make the project packages (core, visualization) importable
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
RANDOM_STATE = 42

## 4️⃣ Data Loading & EDA

In [None]:
df = pd.read_csv('../data/housing.csv')
print(df.shape)
df.head()

In [None]:
df.describe()

In [None]:
df['ocean_proximity'].value_counts()

## 5️⃣ Feature Engineering

In [None]:
from core.feature_engineering import FeatureEngineer

fe = FeatureEngineer(random_state=RANDOM_STATE, use_log_transform=True, cap_outliers=True)
X, y = fe.fit_transform(df, target_col='median_house_value')
feature_names = fe.get_feature_names()
print('Features:', feature_names)
print('X shape:', X.shape)
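
The internals of `FeatureEngineer` are not shown in this notebook. The sketch below only illustrates what options such as `use_log_transform` and `cap_outliers` commonly mean: percentile clipping followed by a `log1p` transform. The helper names, the percentile thresholds, and the final standardization step are all assumptions made for illustration.

```python
import numpy as np

def cap_outliers(values, lower_pct=1.0, upper_pct=99.0):
    """Clip values to a percentile range (hypothetical thresholds)."""
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, lo, hi)

def standardize(values):
    """Scale to zero mean and unit variance, which helps gradient descent converge."""
    return (values - values.mean()) / values.std()

rng = np.random.default_rng(42)
# Synthetic right-skewed column standing in for a count-like feature.
raw = rng.lognormal(mean=7.0, sigma=1.0, size=1000)

capped = cap_outliers(raw)
logged = np.log1p(capped)      # log(1 + x) compresses the long right tail
scaled = standardize(logged)

print("raw max / median:", raw.max() / np.median(raw))
print("capped max / median:", capped.max() / np.median(capped))
print("scaled mean, std:", scaled.mean(), scaled.std())
```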

## 6️⃣ Model Implementation

In [None]:
from core.model import LinearRegressionGD

model = LinearRegressionGD(
    learning_rate=0.05,
    n_iterations=3000,
    regularization=0.008,
    mode='batch',
    use_momentum=True,
    momentum=0.9,
    use_lr_decay=True,
    decay_type='time',
    early_stopping=True,
    patience=150,
    random_state=RANDOM_STATE
)
model.fit(X, y, feature_names=feature_names)
# In-sample predictions: the model is evaluated on the same data it was fitted on
y_pred = model.predict(X)
print('Weights shape:', model.weights.shape)
print('Bias:', model.bias)

from visualization.plots import plot_cost_vs_iterations, plot_actual_vs_predicted
plot_cost_vs_iterations(model.cost_history)
plt.show()
plot_actual_vs_predicted(y[:2000], y_pred[:2000])
plt.show()
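
To make the options passed to `LinearRegressionGD` above more concrete, here is a minimal sketch of batch gradient descent with momentum, time-based learning-rate decay, and patience-based early stopping. It is not the code in `core.model`. The function `fit_gd`, the `decay_rate` and `tol` parameters, and the decay formula `lr / (1 + decay_rate * t)` are assumptions chosen for illustration.

```python
import numpy as np

def fit_gd(X, y, lr=0.05, n_iter=3000, lam=0.008, momentum=0.9,
           decay_rate=0.001, patience=150, tol=1e-9):
    """Batch GD with momentum, time-based LR decay, and early stopping."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    v_w, v_b = np.zeros(n), 0.0            # velocity terms for momentum
    best_cost, stale = np.inf, 0
    history = []
    for t in range(n_iter):
        residual = X @ w + b - y
        cost = (residual @ residual) / (2 * m) + lam * np.sum(w ** 2)
        history.append(cost)

        # Early stopping: quit after `patience` iterations without improvement.
        if cost < best_cost - tol:
            best_cost, stale = cost, 0
        else:
            stale += 1
            if stale >= patience:
                break

        grad_w = X.T @ residual / m + 2 * lam * w
        grad_b = residual.mean()

        lr_t = lr / (1.0 + decay_rate * t)  # time-based decay (assumed form)
        v_w = momentum * v_w - lr_t * grad_w
        v_b = momentum * v_b - lr_t * grad_b
        w, b = w + v_w, b + v_b
    return w, b, history

# Synthetic data with known coefficients, for illustration only.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_w + 1.0 + rng.normal(scale=0.1, size=500)

w_fit, b_fit, history = fit_gd(X_demo, y_demo)
print("iterations run:", len(history))
print("weights:", w_fit.round(3), "bias:", round(float(b_fit), 3))
```

The recovered weights sit slightly below the true ones because the L2 penalty shrinks them toward zero.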

## 7️⃣ Optimization Variants

In [None]:
from visualization.plots import plot_optimizer_comparison

histories = {}
for mode, name in [('batch', 'Batch GD'), ('minibatch', 'Mini-batch GD'), ('sgd', 'SGD')]:
    m = LinearRegressionGD(
        learning_rate=0.01,
        n_iterations=200,
        batch_size=64 if mode == 'minibatch' else None,
        mode=mode,
        random_state=RANDOM_STATE,
    )
    m.fit(X[:5000], y[:5000], feature_names=feature_names)
    histories[name] = m.cost_history

plot_optimizer_comparison(histories)
plt.tight_layout()
plt.show()
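
The three modes differ only in how many samples feed each gradient step. The sketch below shows that difference on synthetic data. The helper names `iterate_batches` and `fit`, and the choice to record the cost once per epoch, are assumptions and may not match how `LinearRegressionGD` counts iterations. Regularization is omitted to keep the example short.

```python
import numpy as np

def iterate_batches(m, mode, batch_size, rng):
    """Yield index arrays covering one epoch under the given mode."""
    if mode == "batch":
        yield np.arange(m)                 # all samples, one update per epoch
        return
    size = 1 if mode == "sgd" else batch_size
    order = rng.permutation(m)             # reshuffle every epoch
    for start in range(0, m, size):
        yield order[start:start + size]

def fit(X, y, mode, lr=0.01, n_epochs=20, batch_size=64, seed=42):
    """Return the full-data cost after each epoch."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    costs = []
    for _ in range(n_epochs):
        for idx in iterate_batches(m, mode, batch_size, rng):
            residual = X[idx] @ w + b - y[idx]
            w -= lr * (X[idx].T @ residual) / idx.size
            b -= lr * residual.mean()
        full_residual = X @ w + b - y
        costs.append((full_residual @ full_residual) / (2 * m))
    return costs

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(512, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=512)
initial_cost = (y_demo @ y_demo) / (2 * len(y_demo))   # cost at w = 0, b = 0

results = {mode: fit(X_demo, y_demo, mode) for mode in ("batch", "minibatch", "sgd")}
for mode, costs in results.items():
    print(f"{mode:>9}: initial {initial_cost:.4f} -> final {costs[-1]:.4f}")
```

With the same learning rate and number of epochs, SGD makes 512 updates per epoch here and mini-batch makes 8, while batch makes one. That is why the noisier variants reach a low cost in fewer passes over the data.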

In [None]:
lr_histories = {}
for lr, name in [(0.01,'lr=0.01'), (0.1,'lr=0.1'), (0.5,'lr=0.5')]:
    m = LinearRegressionGD(learning_rate=lr, n_iterations=200, random_state=RANDOM_STATE)
    m.fit(X[:3000], y[:3000], feature_names=feature_names)
    lr_histories[name] = m.cost_history

from visualization.plots import plot_learning_rate_comparison
plot_learning_rate_comparison(lr_histories)
plt.show()
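
When comparing learning rates, it helps to know where batch gradient descent stops converging. For a quadratic cost the threshold is `2 / lambda_max`, where `lambda_max` is the largest eigenvalue of the Hessian. The sketch below checks this on synthetic data. The helper `run_gd` is hypothetical, and the L2 term is omitted for simplicity.

```python
import numpy as np

def run_gd(A, y, lr, n_iter=100):
    """Plain batch GD on the unregularized cost; A already includes a bias column."""
    m = A.shape[0]
    theta = np.zeros(A.shape[1])
    costs = []
    for _ in range(n_iter):
        residual = A @ theta - y
        costs.append((residual @ residual) / (2 * m))
        theta -= lr * (A.T @ residual) / m
    return costs

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 1.0
A = np.column_stack([X_demo, np.ones(len(X_demo))])   # append bias column

hessian = A.T @ A / len(A)
lr_max = 2.0 / np.linalg.eigvalsh(hessian).max()      # stability threshold

stable = run_gd(A, y_demo, lr=0.9 * lr_max)
unstable = run_gd(A, y_demo, lr=1.1 * lr_max)

print("lr_max:", round(float(lr_max), 4))
print("just below threshold, final cost:", stable[-1])
print("just above threshold, final cost:", unstable[-1])
```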

## 8️⃣ Explainability

In [None]:
from core.explainability import FeatureExplainer

explainer = FeatureExplainer(model.weights, model.bias, feature_names)
result = explainer.explain(X[:1], index=0)
print('Total predicted price:', result['total_prediction'][0])
print('Contributions:', list(zip(result['feature_names'], result['contributions'][0].round(2))))
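
For a linear model, a per-feature contribution is the feature value times its weight, and the contributions plus the bias add up to the prediction. The sketch below shows that decomposition and a simple what-if change. The feature names, weights, and sample values are made up for illustration and are not taken from the fitted model or from `FeatureExplainer`.

```python
import numpy as np

feature_names_demo = ["median_income", "housing_median_age", "total_rooms"]  # illustrative subset
w_demo = np.array([0.8, 0.1, -0.05])    # made-up weights
b_demo = 2.0                            # made-up bias
x_demo = np.array([1.2, -0.5, 0.3])     # one scaled sample, made up

contributions = x_demo * w_demo
prediction = contributions.sum() + b_demo

for name, c in zip(feature_names_demo, contributions):
    print(f"{name:>20}: {c:+.3f}")
print(f"{'bias':>20}: {b_demo:+.3f}")
print(f"{'prediction':>20}: {prediction:+.3f}")

# What-if: raise the first feature by 0.5 (in scaled units), hold the rest fixed.
x_whatif = x_demo.copy()
x_whatif[0] += 0.5
delta = (x_whatif @ w_demo + b_demo) - prediction   # equals w_demo[0] * 0.5
print("what-if change in prediction:", round(float(delta), 3))
```

Because the model is linear, the what-if effect of changing one feature depends only on that feature's weight and the size of the change.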

## 9️⃣ 3D Cost Surface Visualization

In [None]:
from visualization.cost_surface import plot_cost_surface_3d, get_gradient_descent_path

Xs, ys = X[:1000], y[:1000]
path = get_gradient_descent_path(Xs, ys, 0, 1, n_steps=15, lr=0.5)
fig = plot_cost_surface_3d(Xs, ys, weight_idx1=0, weight_idx2=1, path=path)
plt.tight_layout()
plt.show()
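
A cost surface over two weights can be built by evaluating the cost on a grid while holding the other parameters fixed. The sketch below is one way to do that and is not the code in `visualization.cost_surface`. The function `cost_grid` and its `span` and `n_points` parameters are assumptions, and the L2 term is left out.

```python
import numpy as np

def cost_grid(X, y, w, b, idx1, idx2, span=2.0, n_points=25):
    """Evaluate the unregularized cost on a grid over two weights, others fixed."""
    g1 = np.linspace(w[idx1] - span, w[idx1] + span, n_points)
    g2 = np.linspace(w[idx2] - span, w[idx2] + span, n_points)
    W1, W2 = np.meshgrid(g1, g2)
    Z = np.empty_like(W1)
    m = X.shape[0]
    for i in range(n_points):
        for j in range(n_points):
            w_try = w.copy()
            w_try[idx1], w_try[idx2] = W1[i, j], W2[i, j]
            residual = X @ w_try + b - y
            Z[i, j] = (residual @ residual) / (2 * m)
    return W1, W2, Z

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=200)

# Least-squares solution (weights plus bias) to center the grid on the minimum.
A = np.column_stack([X_demo, np.ones(len(X_demo))])
theta = np.linalg.lstsq(A, y_demo, rcond=None)[0]
w_star, b_star = theta[:3], theta[3]

W1, W2, Z = cost_grid(X_demo, y_demo, w_star, b_star, idx1=0, idx2=1)
min_index = np.unravel_index(Z.argmin(), Z.shape)
print("grid minimum at index:", min_index)
```

The arrays `W1`, `W2`, and `Z` have the shape expected by Matplotlib's `plot_surface` on a 3D axes.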

## 🔟 Evaluation Metrics

In [None]:
from core.metrics import mae, mse, rmse, r2_score

print('MAE:', mae(y, y_pred))
print('MSE:', mse(y, y_pred))
print('RMSE:', rmse(y, y_pred))
print('R2:', r2_score(y, y_pred))
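
For reference, the four metrics have short NumPy definitions. The functions below use a `_np` suffix to mark them as illustrative stand-ins. They are not the implementations in `core.metrics`, whose code is not shown here.

```python
import numpy as np

def mae_np(y_true, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_hat)))

def mse_np(y_true, y_hat):
    """Mean squared error."""
    return float(np.mean((y_true - y_hat) ** 2))

def rmse_np(y_true, y_hat):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(mse_np(y_true, y_hat)))

def r2_np(y_true, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_hat_demo = np.array([2.5, 0.0, 2.0, 8.0])

print("MAE:", mae_np(y_true_demo, y_hat_demo))    # 0.5
print("MSE:", mse_np(y_true_demo, y_hat_demo))    # 0.375
print("RMSE:", rmse_np(y_true_demo, y_hat_demo))
print("R2:", r2_np(y_true_demo, y_hat_demo))
```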

## 1️⃣1️⃣ Comparison With Sklearn

In [None]:
from sklearn.linear_model import LinearRegression as SklearnLR

sk_model = SklearnLR()
sk_model.fit(X, y)
sk_pred = sk_model.predict(X)

print('Sklearn R2:', r2_score(y, sk_pred))
print('Our GD R2:', r2_score(y, y_pred))
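
One caveat when reading this comparison: scikit-learn's `LinearRegression` fits ordinary least squares with no penalty, while the cost $J$ defined earlier includes an L2 term, so the two fits need not coincide exactly. The sketch below uses NumPy only and synthetic data. It computes the closed-form minimizer of $J$ and checks that the gradient vanishes there, with ordinary least squares alongside for comparison.

```python
import numpy as np

rng = np.random.default_rng(42)
m, lam = 400, 0.008
X_demo = rng.normal(size=(m, 3)) + 0.5
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=m)

# Center the data so the unpenalized bias drops out of the normal equations.
x_mean, y_mean = X_demo.mean(axis=0), y_demo.mean()
Xc, yc = X_demo - x_mean, y_demo - y_mean

# Minimizer of J with the L2 penalty: (Xc^T Xc + 2*m*lam*I) w = Xc^T yc
w_ridge = np.linalg.solve(Xc.T @ Xc + 2 * m * lam * np.eye(3), Xc.T @ yc)
b_ridge = y_mean - x_mean @ w_ridge

# Ordinary least squares (lam = 0), the target of an unregularized fit.
w_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]

# The gradient of J should be zero at the penalized solution.
residual = X_demo @ w_ridge + b_ridge - y_demo
grad_w = X_demo.T @ residual / m + 2 * lam * w_ridge
grad_b = residual.mean()

print("penalized weights:", w_ridge.round(4))
print("OLS weights:      ", w_ols.round(4))
print("max |grad_w| at penalized solution:", np.max(np.abs(grad_w)))
```

The factor `2 * m * lam` comes from multiplying $J$ by $2m$, which turns the penalty $\lambda \sum_j w_j^2$ into $2m\lambda \sum_j w_j^2$ next to the plain sum of squared errors.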

## 1️⃣2️⃣ Conclusion & Future Work

We built SmartExplain AI around a gradient-descent linear regression implemented from scratch, with feature engineering, per-feature explainability, and a comparison of optimization variants. Future work includes neural networks, tree-based models, and uncertainty quantification.