# Bayesian Variable Selection with Horseshoe Priors

This notebook demonstrates how to use the `BayesianHorseshoe` package (imported as `bayesian_horseshoe`) to perform variable selection with horseshoe priors on synthetic data.

**Workflow:**
- **Data Simulation:** Generate synthetic regression data with a sparse true coefficient vector.
- **Model Specification:** Build a Bayesian linear regression model with Horseshoe Priors.
- **Inference:** Run MCMC sampling to obtain posterior estimates.
- **Diagnostics & Visualization:** Plot trace plots and compute effective sample sizes.
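
For reference, the model behind this workflow can be written down compactly. The block below is the standard horseshoe formulation for linear regression, not a transcription of what `build_horseshoe_model` does internally; the package's exact hyperparameters (for example the global scale $\tau_0$ and the prior on $\sigma$) are not shown in this notebook and are assumptions here.

$$
\begin{aligned}
y_i \mid \beta, \sigma &\sim \mathcal{N}\!\left(x_i^\top \beta,\ \sigma^2\right), \\
\beta_j \mid \lambda_j, \tau &\sim \mathcal{N}\!\left(0,\ \lambda_j^2 \tau^2\right), \\
\lambda_j &\sim \mathrm{C}^{+}(0, 1), \\
\tau &\sim \mathrm{C}^{+}(0, \tau_0).
\end{aligned}
$$

Here $\mathrm{C}^{+}$ denotes the half-Cauchy distribution. The global scale $\tau$ pulls all coefficients toward zero, while the heavy-tailed local scales $\lambda_j$ let individual coefficients escape that shrinkage.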

Let's get started!

In [4]:
# Import necessary modules from our package
from bayesian_horseshoe.data import simulate_data
from bayesian_horseshoe.model import build_horseshoe_model
from bayesian_horseshoe.inference import run_mcmc
from bayesian_horseshoe.diagnostics import plot_trace, effective_sample_size

# PyMC provides the posterior summary. The library was renamed from
# `pymc3` to `pymc` in version 4, so fall back to the new name if needed.
try:
    import pymc3 as pm
except ModuleNotFoundError:
    import pymc as pm

In [None]:
# Generate synthetic data for the regression model:
# n observations, p predictors, n_relevant predictors with non-zero coefficients
n, p, n_relevant = 100, 50, 5  # Adjust as needed
X, y, true_beta = simulate_data(n=n, p=p, n_relevant=n_relevant, noise_std=0.5, seed=42)

print("Shape of X:", X.shape)
print("Shape of y:", y.shape)
print("True coefficients (non-zero values indicate relevant predictors):")
print(true_beta)
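
The source of `simulate_data` is not shown in this notebook, so the following is only a rough sketch of what such a generator might do, under stated assumptions: a standard normal design matrix, a coefficient vector that is zero except at `n_relevant` randomly chosen positions, and Gaussian noise. The function name `simulate_sparse_regression` and the coefficient range are hypothetical. It uses only the standard library and plain lists, whereas the real function evidently returns arrays with a `.shape` attribute.

```python
import random

def simulate_sparse_regression(n, p, n_relevant, noise_std, seed):
    """Hypothetical stand-in for simulate_data (assumed behaviour, not the package's code)."""
    rng = random.Random(seed)
    # Pick which predictors are relevant and give them non-zero coefficients
    relevant = rng.sample(range(p), n_relevant)
    beta = [0.0] * p
    for j in relevant:
        beta[j] = rng.choice([-1, 1]) * rng.uniform(1.0, 3.0)
    # Standard normal design matrix, stored as a list of rows
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Response: linear signal plus Gaussian noise
    y = [
        sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) + rng.gauss(0, noise_std)
        for row in X
    ]
    return X, y, beta

X_demo, y_demo, beta_demo = simulate_sparse_regression(
    n=100, p=50, n_relevant=5, noise_std=0.5, seed=42
)
print(len(X_demo), len(X_demo[0]), len(y_demo))
print(sum(b != 0 for b in beta_demo), "non-zero coefficients")
```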

In [None]:
# Build the Bayesian linear regression model with Horseshoe Prior
model = build_horseshoe_model(X, y)

# Confirm that the model object was built without errors
print("Horseshoe model created successfully!")
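
To see why this prior suits variable selection, it helps to look at the shrinkage weight it implies. In the simplest setting (global scale and noise scale both fixed at 1, an assumption made purely for illustration), a local scale `lambda` drawn from a half-Cauchy(0, 1) gives a shrinkage weight `kappa = 1 / (1 + lambda**2)`, where `kappa` near 1 means a coefficient is shrunk almost fully to zero and `kappa` near 0 means it is left almost untouched. The sketch below simulates that weight using only the standard library; it is independent of how `build_horseshoe_model` is implemented.

```python
import math
import random

def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) via the Cauchy inverse CDF."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))

rng = random.Random(0)
n_draws = 20000

# Shrinkage weight kappa = 1 / (1 + lambda^2), assuming tau = sigma = 1
kappas = [1.0 / (1.0 + half_cauchy(rng) ** 2) for _ in range(n_draws)]

near_zero = sum(k < 0.1 for k in kappas) / n_draws       # almost no shrinkage
near_one = sum(k > 0.9 for k in kappas) / n_draws        # almost total shrinkage
middle = sum(0.45 < k < 0.55 for k in kappas) / n_draws  # moderate shrinkage

print(f"kappa < 0.1:         {near_zero:.3f}")
print(f"kappa > 0.9:         {near_one:.3f}")
print(f"0.45 < kappa < 0.55: {middle:.3f}")
```

Most of the mass sits at the two ends of the unit interval and little in the middle, which is the U-shaped ("horseshoe") density that gives the prior its name: a coefficient is either shrunk hard or left largely alone.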

In [None]:
# Run MCMC sampling using the model
print("Starting MCMC sampling...")
trace = run_mcmc(model, draws=2000, tune=1000, target_accept=0.9, cores=1)
print("MCMC sampling completed.")

# Display a summary of the posterior samples for the regression coefficients
summary = pm.summary(trace, var_names=["beta"])
print(summary)
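
The summary reports posterior means and intervals, but the horseshoe prior does not set any coefficient exactly to zero, so an explicit rule is needed to turn the posterior into a selected set. One common heuristic, sketched below, keeps the predictors whose equal-tailed 95% credible interval excludes zero. The helper names and the stand-in draws are hypothetical; how to extract the `beta` draws from `trace` depends on the object `run_mcmc` returns, which is not shown here.

```python
import random

def credible_interval(draws, level=0.95):
    """Approximate equal-tailed credible interval from a list of posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * n)]
    upper = ordered[min(n - 1, int((1.0 - tail) * n))]
    return lower, upper

def select_predictors(beta_draws, level=0.95):
    """Return indices of coefficients whose credible interval excludes zero."""
    selected = []
    for j, draws in enumerate(beta_draws):
        lower, upper = credible_interval(draws, level)
        if lower > 0 or upper < 0:
            selected.append(j)
    return selected

# Stand-in posterior draws (fabricated for illustration, not model output)
rng = random.Random(1)
beta_draws = [
    [rng.gauss(2.0, 0.2) for _ in range(2000)],   # clearly positive
    [rng.gauss(0.0, 0.05) for _ in range(2000)],  # shrunk toward zero
    [rng.gauss(-1.5, 0.3) for _ in range(2000)],  # clearly negative
]
selected = select_predictors(beta_draws)
print("Selected predictor indices:", selected)
```

With the real posterior, the selected indices can be compared against the non-zero entries of `true_beta` to check how well the relevant predictors are recovered.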

In [None]:
# Plot trace plots for the regression coefficients ("beta")
print("Plotting trace plots for 'beta' parameters...")
plot_trace(trace, var_names=["beta"])

# Compute and display effective sample sizes for "beta"
ess = effective_sample_size(trace, var_names=["beta"])
print("Effective sample sizes for 'beta':")
for param, neff in ess.items():
    print(f"{param}: {neff}")
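
To make the effective sample size concrete: for a single chain of N draws, a basic estimator is N / (1 + 2 * sum of autocorrelations), which equals roughly N for independent draws and falls as the chain becomes more autocorrelated. The sketch below implements that simple single-chain version, truncating the sum at the first non-positive autocorrelation. It is an illustration only; it is not necessarily what `effective_sample_size` computes, and libraries such as ArviZ use more robust multi-chain estimators.

```python
import random

def autocorrelation(chain, lag):
    """Sample autocorrelation of a single chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def simple_ess(chain):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations),
    with the sum truncated at the first non-positive autocorrelation."""
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(chain, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

rng = random.Random(0)

# Independent draws: ESS should be close to the number of draws
iid_chain = [rng.gauss(0, 1) for _ in range(2000)]

# Strongly autocorrelated AR(1) chain: ESS should be far smaller
ar1_chain = [0.0]
for _ in range(1999):
    ar1_chain.append(0.9 * ar1_chain[-1] + rng.gauss(0, 1))

ess_iid = simple_ess(iid_chain)
ess_ar1 = simple_ess(ar1_chain)
print(f"Independent chain ESS: {ess_iid:.0f} of {len(iid_chain)} draws")
print(f"AR(1) chain ESS:       {ess_ar1:.0f} of {len(ar1_chain)} draws")
```

A low effective sample size relative to the 2000 draws requested above suggests that the sampler is mixing slowly for that coefficient and that more draws or a reparameterization may be needed.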

## Conclusion

In this notebook, we:
- Generated synthetic regression data with 100 observations and 50 predictors, of which only 5 have non-zero coefficients.
- Built a Bayesian linear regression model with a horseshoe prior on the coefficients.
- Ran MCMC sampling to draw from the posterior distribution.
- Drew trace plots and computed effective sample sizes to assess sampler convergence.

The horseshoe prior shrinks the coefficients of irrelevant predictors strongly toward zero while leaving large coefficients nearly untouched, so comparing the posterior summary of `beta` against `true_beta` shows which predictors the model identifies as relevant.

Feel free to experiment further by modifying parameters or extending the diagnostics!