## BIOSTAT 682 HW4 - Problem 1: Bayesian Neural Networks for Crime Data

In this notebook we fit Bayesian neural networks with spike-and-slab priors to the
UScrime dataset. We perform grid search over prior types (current non-centered, current
centered, and hw3 attempt2), draws/tune values, and hidden unit counts to find optimal
hyperparameters. We then use the best configuration to compare models with different
numbers of hidden units using DIC, evaluate test set performance, and compare with
Bayesian linear regression.

## Setup

Import required libraries and set random seed for reproducibility.

In [None]:
import time
import datetime
import numpy as np
import pandas as pd
import pymc as pm
import arviz as az
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import matplotlib.pyplot as plt
import warnings
import os
import re
warnings.filterwarnings('ignore')

SEED = 2025
np.random.seed(SEED)
EXEC_START = time.time()
print(f"Started: {datetime.datetime.now().isoformat(timespec='seconds')}")

## Data loading and preprocessing

Load the UScrime dataset, standardize features and target, and split into train/test sets (50/50).

In [None]:
# Load and prepare data
df = pd.read_csv('../../data/UScrime.csv')
X = df.drop('y', axis=1).values
y = df['y'].values

scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_scaled = scaler_X.fit_transform(X)
y_scaled = scaler_y.fit_transform(y.reshape(-1, 1)).flatten()

# Train/test split (roughly half)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_scaled, test_size=0.5, random_state=SEED
)

print(f"Data: {X_scaled.shape[0]} obs, {X_scaled.shape[1]} features")
print(f"Train: {len(y_train)}, Test: {len(y_test)}")

## Model and prior specification

We define two BNN model types with spike-and-slab priors:

1. **Current priors** (`create_bnn_spike_slab`): Supports both centered and non-centered
   parameterization with Bernoulli selection variables and spike-slab standard deviations.

2. **HW3 attempt2 priors** (`create_bnn_hw3_attempt2`): Precision-based spike-and-slab
   priors using inverse precision parameters.
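
To make the prior concrete, here is a minimal NumPy sketch (not the PyMC model itself) that draws weights from the scale-mixture form used by `create_bnn_spike_slab` and passes them through a one-hidden-layer tanh network. The helper names `sample_spike_slab` and `forward` and the toy dimensions are illustrative assumptions; the spike and slab scales (0.01 and 1.0) and the inclusion probability (0.5) match the values in the model code.

```python
import numpy as np

def sample_spike_slab(rng, shape, pi=0.5, spike_sd=0.01, slab_sd=1.0):
    """Draw weights whose sd is spike_sd (gamma=0) or slab_sd (gamma=1)."""
    gamma = rng.binomial(1, pi, size=shape)        # inclusion indicators
    sd = spike_sd + gamma * (slab_sd - spike_sd)   # per-weight prior sd
    raw = rng.standard_normal(shape)               # non-centered N(0, 1) draw
    return raw * sd, gamma

def forward(X, W1, b1, W2, b2):
    """One hidden layer with tanh activation, one output per row of X."""
    hidden = np.tanh(X @ W1 + b1)
    return hidden @ W2 + b2

rng = np.random.default_rng(2025)
n, p, q = 5, 15, 3                                 # toy sizes (hypothetical)
X = rng.standard_normal((n, p))
W1, gamma1 = sample_spike_slab(rng, (p, q))
W2, gamma2 = sample_spike_slab(rng, (q,))
b1, b2 = rng.standard_normal(q), rng.standard_normal()
mu = forward(X, W1, b1, W2, b2)
print(mu.shape)                                    # (5,)
```

Weights with `gamma = 0` are shrunk to near zero by the spike, so the corresponding input or hidden unit contributes almost nothing to `mu`.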

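Both models use one hidden layer with tanh activation. The helper functions defined
after the models compute DIC and convergence diagnostics (R-hat, ESS, divergences) from
inference data, select the model constructor for a given prior type, and parse grid
search log files.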

In [None]:
def create_bnn_spike_slab(X_train, y_train, q, X_test=None, use_noncentered=True):
    """BNN with one hidden layer and spike-and-slab priors (hw4 current)"""
    n, p = X_train.shape
    
    with pm.Model() as model:
        if use_noncentered:
            pi1, pi2 = 0.5, 0.5
            spike_sd, slab_sd = 0.01, 1.0
            
            gamma1 = pm.Bernoulli("gamma1", p=pi1, shape=(p, q))
            W1_raw = pm.Normal("W1_raw", mu=0, sigma=1, shape=(p, q))
            sd1 = spike_sd + gamma1 * (slab_sd - spike_sd)
            W1 = pm.Deterministic("W1", W1_raw * sd1)
            b1 = pm.Normal("b1", mu=0, sigma=1, shape=q)
            
            hidden = pm.math.tanh(pm.math.dot(X_train, W1) + b1)
            
            gamma2 = pm.Bernoulli("gamma2", p=pi2, shape=q)
            W2_raw = pm.Normal("W2_raw", mu=0, sigma=1, shape=q)
            sd2 = spike_sd + gamma2 * (slab_sd - spike_sd)
            W2 = pm.Deterministic("W2", W2_raw * sd2)
            b2 = pm.Normal("b2", mu=0, sigma=1)
        else:
            pi1, pi2 = 0.5, 0.5
            spike_sd, slab_sd = 0.01, 1.0
            
            gamma1 = pm.Bernoulli("gamma1", p=pi1, shape=(p, q))
            sd1 = spike_sd + gamma1 * (slab_sd - spike_sd)
            W1 = pm.Normal("W1", mu=0, sigma=sd1, shape=(p, q))
            b1 = pm.Normal("b1", mu=0, sigma=1, shape=q)
            
            hidden = pm.math.tanh(pm.math.dot(X_train, W1) + b1)
            
            gamma2 = pm.Bernoulli("gamma2", p=pi2, shape=q)
            sd2 = spike_sd + gamma2 * (slab_sd - spike_sd)
            W2 = pm.Normal("W2", mu=0, sigma=sd2, shape=q)
            b2 = pm.Normal("b2", mu=0, sigma=1)
        
        mu = pm.math.dot(hidden, W2) + b2
        sigma = pm.HalfNormal("sigma", sigma=1)
        y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_train)
        
        if X_test is not None:
            hidden_test = pm.math.tanh(pm.math.dot(X_test, W1) + b1)
            mu_test = pm.math.dot(hidden_test, W2) + b2
            y_pred = pm.Normal("y_pred", mu=mu_test, sigma=sigma, shape=X_test.shape[0])
    
    return model


def create_bnn_hw3_attempt2(X_train, y_train, q, X_test=None):
    """BNN with hw3/v2 attempt2 precision-based spike-and-slab priors"""
    n, p = X_train.shape
    
    with pm.Model() as model:
        alpha = pm.Normal("alpha", mu=0.0, sigma=100.0)
        
        # Layer 1: Input -> Hidden (precision-based spike-and-slab)
        gamma1 = pm.Bernoulli("gamma1", p=0.5, shape=(p, q))
        inv_tau2_spike = 1000.0
        inv_tau2_slab = 0.01
        inv_tau2_1 = (1 - gamma1) * inv_tau2_spike + gamma1 * inv_tau2_slab
        tau2_1 = 1.0 / inv_tau2_1
        W1 = pm.Normal("W1", mu=0.0, sigma=pm.math.sqrt(tau2_1), shape=(p, q))
        b1 = pm.Normal("b1", mu=0, sigma=1, shape=q)
        
        hidden = pm.math.tanh(pm.math.dot(X_train, W1) + b1)
        
        # Layer 2: Hidden -> Output (precision-based spike-and-slab)
        gamma2 = pm.Bernoulli("gamma2", p=0.5, shape=q)
        inv_tau2_2 = (1 - gamma2) * inv_tau2_spike + gamma2 * inv_tau2_slab
        tau2_2 = 1.0 / inv_tau2_2
        W2 = pm.Normal("W2", mu=0.0, sigma=pm.math.sqrt(tau2_2), shape=q)
        b2 = pm.Normal("b2", mu=0, sigma=1)
        
        mu = pm.math.dot(hidden, W2) + b2
        
        inv_sigma2 = pm.Gamma("inv_sigma2", alpha=0.0001, beta=0.0001)
        sigma2 = 1.0 / inv_sigma2
        sigma = pm.math.sqrt(sigma2)
        
        y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_train)
        
        if X_test is not None:
            hidden_test = pm.math.tanh(pm.math.dot(X_test, W1) + b1)
            mu_test = pm.math.dot(hidden_test, W2) + b2
            y_pred = pm.Normal("y_pred", mu=mu_test, sigma=sigma, shape=X_test.shape[0])
    
    return model

In [None]:
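def _compute_dic(idata):
    """Compute DIC from the pointwise log-likelihood stored in `idata`.

    The plug-in deviance D(theta_bar) cannot be recovered from pointwise
    log-likelihood draws alone, so the effective number of parameters uses
    the variance-based estimate p_V = Var(D) / 2 instead of D_bar - D(theta_bar).
    """
    log_lik = idata.log_likelihood["y_obs"].values           # (chain, draw, obs)
    log_lik_flat = log_lik.reshape(-1, log_lik.shape[-1])    # (sample, obs)
    deviance = -2 * log_lik_flat.sum(axis=1)                 # one deviance per draw
    D_bar = deviance.mean()                                  # posterior mean deviance
    p_V = 0.5 * deviance.var(ddof=1)                         # effective parameters
    return D_bar + p_V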


def _compute_diagnostics(idata):
    """Compute R-hat, ESS, and divergence count"""
    rhat = az.rhat(idata)
    vars_to_check = [v for v in rhat.data_vars 
                    if v not in ['y_pred', 'gamma1', 'gamma2']]
    
    if len(vars_to_check) > 0:
        max_rhat = max([float(rhat[var].max()) for var in vars_to_check])
        ess_bulk = az.ess(idata, method="bulk")
        min_ess = min([float(ess_bulk[var].min()) for var in vars_to_check])
    else:
        max_rhat = np.nan
        min_ess = np.nan
    
    n_divergences = int(idata.sample_stats.diverging.values.sum()) if 'diverging' in idata.sample_stats else 0
    
    return max_rhat, min_ess, n_divergences


def _get_model_creator(prior_type):
    """Get model creation function based on prior type"""
    if prior_type == "current":
        return create_bnn_spike_slab
    else:
        return create_bnn_hw3_attempt2


def _parse_gridsearch_log(log_file):
    """Parse grid search log file to extract results"""
    results = []
    
    if not os.path.exists(log_file):
        return pd.DataFrame()
    
    with open(log_file, 'r') as f:
        for line in f:
            if '[COMBO_END]' in line and 'ERROR' not in line:
                try:
                    match = re.search(r'prior=([^,]+), use_noncentered=([^,]+), draws=([^,]+), tune=([^,]+), q=([^,]+), DIC=([^,]+), Rhat=([^,]+), minESS=([^,]+), Divergences=([^\s]+)', line)
                    if match:
                        prior_type = match.group(1)
                        use_noncentered = match.group(2)
                        draws = int(match.group(3))
                        tune = int(match.group(4))
                        q = int(match.group(5))
                        dic = float(match.group(6))
                        rhat = float(match.group(7))
                        miness = float(match.group(8))
                        divs = int(match.group(9))
                        
                        # Map the logged string to a bool; anything else (e.g. 'None') becomes None
                        use_nc = {'True': True, 'False': False}.get(use_noncentered)
                        converged = (rhat < 1.01) and (divs == 0)
                        
                        results.append({
                            'prior_type': prior_type,
                            'use_noncentered': use_nc,
                            'draws': draws,
                            'tune': tune,
                            'q': q,
                            'DIC': dic,
                            'Rhat': rhat,
                            'minESS': miness,
                            'Divergences': divs,
                            'Converged': converged
                        })
                except Exception:
                    continue
    
    return pd.DataFrame(results)


def grid_search_bnn(X_train, y_train, log_file="bnn_gridsearch.log"):
    """Grid search over hyperparameters"""
    import datetime
    
    draws_tune_values = [1000, 2000, 5000, 10000, 20000]
    q_values = [2, 3, 4, 5, 6]
    
    prior_configs = [
        ("current", True),   # current non-centered
        ("current", False),  # current centered
        ("hw3", None),       # hw3 attempt2
    ]
    
    results = []
    
    with open(log_file, "w") as f:
        start_time = datetime.datetime.now()
        f.write(f"[GRIDSEARCH_START] {start_time.isoformat()}\n")
    
    print("="*80)
    print("BNN GRID SEARCH")
    print("="*80)
    total_combos = len(prior_configs) * len(draws_tune_values) * len(q_values)
    print(f"Testing {len(draws_tune_values)} x {len(q_values)} x {len(prior_configs)} = "
          f"{total_combos} combinations")
    print(f"  Prior configs: current (non-centered), current (centered), hw3 (attempt2)")
    print(f"Logging to: {log_file}\n")
    
    combo_num = 0
    
    for prior_type, use_noncentered in prior_configs:
        model_creator = _get_model_creator(prior_type)
        
        for draws_tune in draws_tune_values:
            for q in q_values:
                combo_num += 1
                
                if prior_type == "current":
                    param_name = "noncentered" if use_noncentered else "centered"
                    prior_label = f"current-{param_name}"
                else:
                    prior_label = "hw3-attempt2"
                
                print(f"[{combo_num}/{total_combos}] Testing: prior={prior_label}, "
                      f"draws={draws_tune}, tune={draws_tune}, q={q}")
                
                combo_start = datetime.datetime.now()
                with open(log_file, "a") as f:
                    f.write(f"[COMBO_START] {combo_start.isoformat()} - "
                           f"prior={prior_type}, use_noncentered={use_noncentered}, "
                           f"draws={draws_tune}, tune={draws_tune}, q={q}\n")
            
                try:
                    # Create model once
                    if prior_type == "current":
                        model = model_creator(X_train, y_train, q, use_noncentered=use_noncentered)
                    else:
                        model = model_creator(X_train, y_train, q)
                    
                    with model:
                        idata = pm.sample(
                            draws=draws_tune,
                            tune=draws_tune,
                            target_accept=0.90,
                            random_seed=SEED,
                            return_inferencedata=True,
                            init="adapt_diag",
                            chains=4,
                            cores=1,
                            progressbar=False
                        )
                        
                        # Compute log likelihood using same model
                        pm.compute_log_likelihood(idata)
                    
                    dic = _compute_dic(idata)
                    max_rhat, min_ess, n_divergences = _compute_diagnostics(idata)
                    
                    combo_end = datetime.datetime.now()
                    converged = (max_rhat < 1.01) and (n_divergences == 0)
                    # Flag any failed criterion (high R-hat or divergences), not only divergences
                    status = "" if converged else " [NOT_CONVERGED]"
                    
                    with open(log_file, "a") as f:
                        f.write(f"[COMBO_END]   {combo_end.isoformat()} - "
                               f"prior={prior_type}, use_noncentered={use_noncentered}, "
                               f"draws={draws_tune}, tune={draws_tune}, q={q}, "
                               f"DIC={dic:.2f}, Rhat={max_rhat:.4f}, "
                               f"minESS={min_ess:.0f}, Divergences={n_divergences}"
                               f"{status}\n")
                    
                    results.append({
                        'prior_type': prior_type,
                        'use_noncentered': use_noncentered,
                        'draws': draws_tune,
                        'tune': draws_tune,
                        'q': q,
                        'DIC': dic,
                        'Rhat': max_rhat,
                        'minESS': min_ess,
                        'Divergences': n_divergences,
                        'Converged': converged,
                        'Duration': (combo_end - combo_start).total_seconds()
                    })
                    
                    print(f"  ✓ DIC={dic:.2f}, R-hat={max_rhat:.4f}, "
                          f"minESS={min_ess:.0f}, Divergences={n_divergences}")
                    
                except Exception as e:
                    combo_end = datetime.datetime.now()
                    error_msg = str(e)[:100]
                    
                    with open(log_file, "a") as f:
                        f.write(f"[COMBO_END]   {combo_end.isoformat()} - "
                               f"prior={prior_type}, use_noncentered={use_noncentered}, "
                               f"draws={draws_tune}, tune={draws_tune}, q={q}, "
                               f"ERROR: {error_msg}\n")
                    
                    print(f"  ✗ ERROR: {error_msg}")
                    
                    results.append({
                        'prior_type': prior_type,
                        'use_noncentered': use_noncentered,
                        'draws': draws_tune,
                        'tune': draws_tune,
                        'q': q,
                        'DIC': np.nan,
                        'Rhat': np.nan,
                        'minESS': np.nan,
                        'Divergences': np.nan,
                        'Converged': False,
                        'Duration': (combo_end - combo_start).total_seconds(),
                        'Error': error_msg
                    })
    
    print("\n" + "="*80)
    print("GRID SEARCH COMPLETE")
    print("="*80)
    
    results_df = pd.DataFrame(results)
    converged_results = results_df[results_df['Converged'] == True]
    
    if len(converged_results) > 0:
        best_idx = converged_results['DIC'].idxmin()
        best = converged_results.loc[best_idx]
        if best['prior_type'] == "current":
            param_name = "noncentered" if best['use_noncentered'] else "centered"
            prior_label = f"current-{param_name}"
        else:
            prior_label = "hw3-attempt2"
        print(f"\nBest converged model:")
        print(f"  Prior: {prior_label}")
        print(f"  q={int(best['q'])}, draws={int(best['draws'])}, "
              f"tune={int(best['tune'])}")
        print(f"  DIC={best['DIC']:.2f}, R-hat={best['Rhat']:.4f}, "
              f"minESS={best['minESS']:.0f}")
    else:
        print("\n⚠ No fully converged models found")
        best_idx = results_df['Rhat'].idxmin()
        best = results_df.loc[best_idx]
        if best['prior_type'] == "current":
            param_name = "noncentered" if best['use_noncentered'] else "centered"
            prior_label = f"current-{param_name}"
        else:
            prior_label = "hw3-attempt2"
        print(f"\nBest model (lowest R-hat):")
        print(f"  Prior: {prior_label}")
        print(f"  q={int(best['q'])}, draws={int(best['draws'])}, "
              f"tune={int(best['tune'])}")
        print(f"  DIC={best['DIC']:.2f}, R-hat={best['Rhat']:.4f}, "
              f"minESS={best['minESS']:.0f}, Divergences={best['Divergences']:.0f}")
    
    return results_df


## Grid search

We first check if existing grid search results are available in `bnn_gridsearch.log`.
If at least 10 results are found, we use them. Otherwise, we run a comprehensive grid
search over:
- Prior types: current (non-centered), current (centered), hw3 (attempt2)
- Draws/tune values: [1000, 2000, 5000, 10000, 20000]
- Hidden units q: [2, 3, 4, 5, 6]

Total: 75 combinations. Results are logged to `bnn_gridsearch.log`. The best
configuration (lowest DIC among converged models, or lowest R-hat if none converged)
is extracted and used for all subsequent analyses.
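
The grid and the selection rule described above can be sketched in plain Python. The `select_best` helper and the toy result rows are hypothetical; only the grid values and the convergence criteria (R-hat below 1.01 and zero divergences) come from this notebook.

```python
from itertools import product

prior_configs = ["current-noncentered", "current-centered", "hw3-attempt2"]
draws_tune_values = [1000, 2000, 5000, 10000, 20000]
q_values = [2, 3, 4, 5, 6]

grid = list(product(prior_configs, draws_tune_values, q_values))
print(len(grid))  # 75

def select_best(results):
    """Lowest DIC among converged runs; otherwise the run with the lowest R-hat."""
    converged = [r for r in results if r["rhat"] < 1.01 and r["divergences"] == 0]
    if converged:
        return min(converged, key=lambda r: r["dic"])
    return min(results, key=lambda r: r["rhat"])

# Toy results (made-up numbers, for illustration only)
toy = [
    {"q": 2, "dic": 110.0, "rhat": 1.005, "divergences": 0},
    {"q": 3, "dic": 95.0, "rhat": 1.200, "divergences": 4},
    {"q": 4, "dic": 102.0, "rhat": 1.008, "divergences": 0},
]
print(select_best(toy)["q"])  # 4
```

In the toy data the run with the lowest DIC (q=3) is skipped because it fails both convergence checks, which is the behavior the rule is meant to enforce.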

In [None]:
log_file = "bnn_gridsearch.log"
parsed_results = _parse_gridsearch_log(log_file)

if len(parsed_results) >= 10:
    print(f"Found {len(parsed_results)} existing grid search results in {log_file}")
    grid_results_df = parsed_results
else:
    print(f"Running grid search (results will be saved to {log_file})...")
    grid_results_df = grid_search_bnn(X_train, y_train, log_file=log_file)

converged = grid_results_df[grid_results_df['Converged'] == True]
if len(converged) > 0:
    best_config = converged.loc[converged['DIC'].idxmin()]
else:
    best_config = grid_results_df.loc[grid_results_df['Rhat'].idxmin()]

BEST_PRIOR_TYPE = best_config['prior_type']
BEST_USE_NONCENTERED = best_config['use_noncentered'] if pd.notna(best_config['use_noncentered']) else True
BEST_DRAWS = int(best_config['draws'])
BEST_TUNE = int(best_config['tune'])

print(f"\n✓ Best configuration:")
print(f"  Prior: {BEST_PRIOR_TYPE}, Non-centered: {BEST_USE_NONCENTERED}")
print(f"  Draws: {BEST_DRAWS}, Tune: {BEST_TUNE}")
print(f"  DIC: {best_config['DIC']:.2f}, R-hat: {best_config['Rhat']:.4f}")

## Problem 1(a): Compare models with different q

Using the best prior type and hyperparameters from grid search, we fit BNN models
on the full dataset (X_scaled, y_scaled) for each $q \in \{2,3,4,5,6\}$. We compute
DIC, R-hat, ESS, and divergence counts for each model and identify the best model
based on DIC.
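
For reference, DIC is usually defined from the deviance $D(\theta) = -2 \log p(y \mid \theta)$ as follows. This is the standard textbook form, given as background rather than as a description of the exact estimator in `_compute_dic`:

$$
\bar{D} = \mathbb{E}_{\theta \mid y}\left[D(\theta)\right], \qquad
p_D = \bar{D} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \bar{D} + p_D
$$

When the plug-in deviance $D(\bar{\theta})$ is awkward to evaluate, for example because the posterior mean of the discrete indicators is not a valid parameter value, a variance-based alternative is $p_V = \tfrac{1}{2}\operatorname{Var}_{\theta \mid y}\left[D(\theta)\right]$. In either form, lower DIC indicates a better trade-off between fit and complexity.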

In [None]:
CHAINS = 6
TARGET_ACCEPT = 0.98
q_values = [2, 3, 4, 5, 6]

full_results = {}
dic_scores = []
model_creator = _get_model_creator(BEST_PRIOR_TYPE)

for q in q_values:
    print(f"Fitting q={q}...")
    
    if BEST_PRIOR_TYPE == "current":
        model = model_creator(X_scaled, y_scaled, q, use_noncentered=BEST_USE_NONCENTERED)
    else:
        model = model_creator(X_scaled, y_scaled, q)
    
    with model:
        idata = pm.sample(
            draws=BEST_DRAWS, tune=BEST_TUNE, chains=CHAINS,
            target_accept=TARGET_ACCEPT, random_seed=SEED,
            init="adapt_diag", return_inferencedata=True,
            cores=1, progressbar=True
        )
        pm.compute_log_likelihood(idata)
    
    dic = _compute_dic(idata)
    max_rhat, min_ess, n_div = _compute_diagnostics(idata)
    
    full_results[q] = idata
    dic_scores.append({
        'q': q,
        'DIC': dic,
        'R-hat': max_rhat,
        'min_ESS': min_ess,
        'Divergences': n_div
    })
    print(f"  DIC={dic:.2f}, R-hat={max_rhat:.4f}, ESS={min_ess:.0f}, Div={n_div}\n")

dic_df = pd.DataFrame(dic_scores).sort_values('DIC')
print(dic_df.to_string(index=False))
best_q = int(dic_df.iloc[0]['q'])
print(f"\nBest model: q={best_q} (DIC={dic_df.iloc[0]['DIC']:.2f})")

## Problem 1(b): Test set prediction

Using the best prior type and hyperparameters from grid search, we fit BNN models
on the training set for each $q \in \{2,3,4,5,6\}$ and generate predictions on the
test set. We compute RMSE, MAE, R², and correlation for each model and identify the
best performing model based on test RMSE.
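
As a minimal sketch of the scoring step, the function below collapses posterior predictive samples to a pointwise median and computes the four metrics with NumPy alone. The name `summarize_predictions` and the toy arrays are assumptions for illustration; the notebook itself uses the scikit-learn metric functions.

```python
import numpy as np

def summarize_predictions(y_true, y_pred_samples):
    """Collapse (chain, draw, n_test) samples to a median and score it."""
    flat = y_pred_samples.reshape(-1, y_pred_samples.shape[-1])
    y_hat = np.median(flat, axis=0)                      # pointwise posterior median
    resid = y_true - y_hat
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = float(1.0 - np.sum(resid ** 2) / ss_tot)
    corr = float(np.corrcoef(y_true, y_hat)[0, 1])
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "Correlation": corr}

# Toy example: 2 chains x 3 draws x 4 test points (made-up numbers)
y_true = np.array([0.0, 1.0, 2.0, 3.0])
samples = np.tile(np.array([0.5, 1.0, 2.0, 2.5]), (2, 3, 1))
metrics = summarize_predictions(y_true, samples)
print(metrics)
```

Because the target was standardized, RMSE and MAE here are in standard-deviation units of the crime rate rather than in the original units.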

In [None]:
test_results = []
model_creator = _get_model_creator(BEST_PRIOR_TYPE)

for q in q_values:
    print(f"Evaluating q={q} on test set...")
    
    if BEST_PRIOR_TYPE == "current":
        model = model_creator(X_train, y_train, q, X_test, use_noncentered=BEST_USE_NONCENTERED)
    else:
        model = model_creator(X_train, y_train, q, X_test)
    
    with model:
        idata = pm.sample(
            draws=BEST_DRAWS, tune=BEST_TUNE, chains=CHAINS,
            target_accept=TARGET_ACCEPT, random_seed=SEED,
            init="adapt_diag", return_inferencedata=True,
            cores=1, progressbar=True
        )
    
    y_pred_samples = idata.posterior["y_pred"].values
    y_pred_median = np.median(y_pred_samples.reshape(-1, len(y_test)), axis=0)
    
    rmse = np.sqrt(mean_squared_error(y_test, y_pred_median))
    mae = mean_absolute_error(y_test, y_pred_median)
    r2 = r2_score(y_test, y_pred_median)
    corr = np.corrcoef(y_test, y_pred_median)[0, 1]
    
    max_rhat, _, _ = _compute_diagnostics(idata)
    
    test_results.append({
        'q': q,
        'RMSE': rmse,
        'MAE': mae,
        'R2': r2,
        'Correlation': corr,
        'R-hat': max_rhat,
        'y_pred_samples': y_pred_samples,
        'y_pred_median': y_pred_median
    })
    print(f"  RMSE={rmse:.4f}, MAE={mae:.4f}, R²={r2:.4f}, Corr={corr:.4f}\n")

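test_df = pd.DataFrame(test_results)
# Print the metrics only; the raw prediction arrays stay available in test_results
print(test_df.drop(columns=['y_pred_samples', 'y_pred_median']).to_string(index=False))
# idxmin returns an index label, so select the row with .loc rather than .iloc
best_test = test_df.loc[test_df['RMSE'].idxmin()]
print(f"\nBest test performance: q={int(best_test['q'])} (RMSE={best_test['RMSE']:.4f})")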

## Problem 1(c): Compare with Bayesian linear regression

We fit a Bayesian linear regression model with spike-and-slab priors (non-centered
parameterization) using the same hyperparameters (draws, tune, chains, target_accept)
as the BNN models. We compute test set predictions and compare performance metrics
(RMSE, MAE, R², correlation) with the BNN models.
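
The non-centered parameterization rewrites a coefficient with prior $N(0, s^2)$ as $\beta = s \cdot \beta_{raw}$ with $\beta_{raw} \sim N(0, 1)$. The short NumPy check below (an illustration with an arbitrary sample size, not part of the model) suggests that the two forms give the same spread at both the spike and the slab scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000

for s in (0.01, 1.0):                               # spike and slab scales
    centered = rng.normal(0.0, s, n_draws)          # beta ~ N(0, s^2) directly
    noncentered = s * rng.standard_normal(n_draws)  # beta = s * beta_raw
    # Both ratios of empirical sd to s should be close to 1
    print(s, round(centered.std() / s, 2), round(noncentered.std() / s, 2))
```

The prior on `beta` is the same in both forms; what changes is that the sampler works with `beta_raw` on a unit scale, which is typically easier for NUTS when the scale switches between 0.01 and 1.0.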

In [None]:
print("Fitting Bayesian linear regression with spike-and-slab priors...")

with pm.Model() as linear_model:
    pi = 0.5
    gamma_lin = pm.Bernoulli("gamma_lin", p=pi, shape=X_train.shape[1])
    spike_sd, slab_sd = 0.01, 1.0
    
    beta_raw = pm.Normal("beta_raw", mu=0, sigma=1, shape=X_train.shape[1])
    sd_beta = spike_sd + gamma_lin * (slab_sd - spike_sd)
    beta = pm.Deterministic("beta", beta_raw * sd_beta)
    
    alpha = pm.Normal("alpha", mu=0, sigma=1)
    mu = alpha + pm.math.dot(X_train, beta)
    sigma = pm.HalfNormal("sigma", sigma=1)
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_train)
    
    mu_test = alpha + pm.math.dot(X_test, beta)
    y_pred = pm.Normal("y_pred", mu=mu_test, sigma=sigma, shape=len(y_test))

with linear_model:
    trace_linear = pm.sample(
        BEST_DRAWS, tune=BEST_TUNE, target_accept=TARGET_ACCEPT,
        random_seed=SEED, return_inferencedata=True,
        chains=CHAINS, cores=1, progressbar=True
    )

max_rhat_lin, min_ess_lin, n_div_lin = _compute_diagnostics(trace_linear)

print(f"R-hat: {max_rhat_lin:.4f}, ESS: {min_ess_lin:.0f}, Divergences: {n_div_lin}")

y_pred_linear = trace_linear.posterior["y_pred"].values
y_pred_linear_median = np.median(y_pred_linear.reshape(-1, len(y_test)), axis=0)

rmse_linear = np.sqrt(mean_squared_error(y_test, y_pred_linear_median))
mae_linear = mean_absolute_error(y_test, y_pred_linear_median)
r2_linear = r2_score(y_test, y_pred_linear_median)
corr_linear = np.corrcoef(y_test, y_pred_linear_median)[0, 1]

print(f"\nLinear regression: RMSE={rmse_linear:.4f}, MAE={mae_linear:.4f}, R²={r2_linear:.4f}, Corr={corr_linear:.4f}")

## Final comparison

We compare the test set performance of Bayesian linear regression with all BNN models
and identify which model performs best based on RMSE.

In [None]:
print(f"\nLinear regression: RMSE={rmse_linear:.4f}, R²={r2_linear:.4f}")
print("\nBayesian neural networks:")
for _, row in test_df.iterrows():
    print(f"  q={int(row['q'])}: RMSE={row['RMSE']:.4f}, R²={row['R2']:.4f}")

best_bnn_rmse = test_df['RMSE'].min()
# idxmin returns an index label, so select with .loc rather than .iloc
best_bnn_q = int(test_df.loc[test_df['RMSE'].idxmin(), 'q'])

if rmse_linear < best_bnn_rmse:
    print("\nLinear regression performs best!")
else:
    print(f"\nBNN (q={best_bnn_q}) performs best!")

## Visualization

We create six plots: (1) DIC by model size, (2) Test RMSE comparison (BNN vs Linear),
(3) Test R² comparison, (4) Convergence diagnostics (R-hat), (5) Convergence diagnostics
(ESS), and (6) Predicted vs Actual scatter plot for the best BNN model. The figure is
saved as `crime_bnn_results.png`.

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# dic_df is sorted by DIC; sort by q so line plots run left to right
dic_by_q = dic_df.sort_values('q')

ax = axes[0, 0]
ax.bar(dic_by_q['q'], dic_by_q['DIC'], alpha=0.7, edgecolor='black', color='steelblue')
ax.set_xlabel('Number of hidden units (q)', fontsize=11)
ax.set_ylabel('DIC', fontsize=11)
ax.set_title('DIC by Model Size', fontsize=12, fontweight='bold')
ax.set_xticks(q_values)
ax.grid(True, alpha=0.3, axis='y')

ax = axes[0, 1]
ax.plot(test_df['q'], test_df['RMSE'], 'o-', linewidth=2, markersize=8, label='BNN')
ax.axhline(rmse_linear, color='red', linestyle='--', linewidth=2, label='Linear')
ax.set_xlabel('Number of hidden units (q)', fontsize=11)
ax.set_ylabel('RMSE', fontsize=11)
ax.set_title('Test RMSE Comparison', fontsize=12, fontweight='bold')
ax.set_xticks(q_values)
ax.legend(fontsize=9)
ax.grid(True, alpha=0.3)

ax = axes[0, 2]
ax.plot(test_df['q'], test_df['R2'], 'o-', linewidth=2, markersize=8, label='BNN', color='green')
ax.axhline(r2_linear, color='red', linestyle='--', linewidth=2, label='Linear')
ax.set_xlabel('Number of hidden units (q)', fontsize=11)
ax.set_ylabel('R²', fontsize=11)
ax.set_title('Test R² Comparison', fontsize=12, fontweight='bold')
ax.set_xticks(q_values)
ax.legend(fontsize=9)
ax.grid(True, alpha=0.3)

ax = axes[1, 0]
ax.plot(dic_by_q['q'], dic_by_q['R-hat'], 'o-', linewidth=2, markersize=8, color='purple')
ax.axhline(1.01, color='red', linestyle='--', linewidth=1, label='Target (1.01)')
ax.set_xlabel('Number of hidden units (q)', fontsize=11)
ax.set_ylabel('Max R-hat', fontsize=11)
ax.set_title('Convergence: R-hat', fontsize=12, fontweight='bold')
ax.set_xticks(q_values)
ax.legend(fontsize=9)
ax.grid(True, alpha=0.3)

ax = axes[1, 1]
ax.plot(dic_by_q['q'], dic_by_q['min_ESS'], 'o-', linewidth=2, markersize=8, color='orange')
ax.axhline(400, color='red', linestyle='--', linewidth=1, label='Target (400)')
ax.set_xlabel('Number of hidden units (q)', fontsize=11)
ax.set_ylabel('Min ESS', fontsize=11)
ax.set_title('Convergence: ESS', fontsize=12, fontweight='bold')
ax.set_xticks(q_values)
ax.legend(fontsize=9)
ax.grid(True, alpha=0.3)

ax = axes[1, 2]
# idxmin returns an index label, so select with .loc rather than .iloc
best_q_plot = int(test_df.loc[test_df['RMSE'].idxmin(), 'q'])

# Get predictions from test_results (already computed in Problem 1(b))
best_result = next(r for r in test_results if r['q'] == best_q_plot)
y_pred_best = best_result['y_pred_median']

ax.scatter(y_test, y_pred_best, alpha=0.6, edgecolors='black')
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', linewidth=2)
ax.set_xlabel('Actual Crime Rate (standardized)', fontsize=11)
ax.set_ylabel('Predicted Crime Rate (standardized)', fontsize=11)
ax.set_title(f'Best BNN (q={best_q_plot}): Predicted vs Actual', fontsize=12, fontweight='bold')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('crime_bnn_results.png', dpi=150, bbox_inches='tight')
plt.show()

## Execution summary

Display total runtime for the entire notebook execution.

In [None]:
EXEC_END = time.time()
elapsed = (EXEC_END - EXEC_START) / 60.0
print(f"\nTotal runtime: {elapsed:.1f} minutes")
