Robust Regression: Bayesian Approach

This repository contains the project work for the Bayesian Data Analysis course, exploring Robust Regression using a Bayesian framework. The goal is to compare traditional and robust regression models when dealing with datasets containing outliers or extreme values.

📌 Summary

This repo includes:

- normal_model.stan – Traditional regression model assuming normality of residuals.
- normal_model_QR_reparametrization.stan – Normal regression model with QR reparametrization to improve convergence.
- robust_model.stan – Regression model assuming a t-distribution for residuals, improving robustness to outliers.
- demo.R – A working file to fit models and reproduce results.

🔍 Problem Definition & Methodology

This project builds a Bayesian Robust Regression model using Stan. The demo.R script demonstrates how to fit models and analyze results. Data details can be found at Springer.

Why Robust Regression?

Standard regression assumes normally distributed residuals. In real-world datasets, however, outliers can pull the fitted line away from the bulk of the data. A robust approach models the errors with a Student's t-distribution, whose heavier tails make extreme residuals less surprising and so reduce their influence on the fit.
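To make the outlier problem concrete, here is a minimal Python sketch (the repository itself uses R and Stan). It fits ordinary least squares, which is the maximum-likelihood fit under normal errors, to a toy dataset with and without one corrupted point. The data and the helper name `ols_fit` are illustrative assumptions, not taken from the project.

```python
def ols_fit(xs, ys):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (assumption): points exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
clean_ys = [1.0 + 2.0 * x for x in xs]

# Same data with the last observation shifted upward by 30.
outlier_ys = clean_ys[:-1] + [clean_ys[-1] + 30.0]

clean_intercept, clean_slope = ols_fit(xs, clean_ys)
outlier_intercept, outlier_slope = ols_fit(xs, outlier_ys)

print("slope without outlier:", clean_slope)
print("slope with one outlier:", outlier_slope)
```

A single shifted point out of six is enough to move the slope far from 2, which is the behaviour the t-distributed error model is meant to dampen.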

📊 Dataset: The training set consists of 1000 observations (highlighted in orange) from a debutanizer column dataset and contains unusual trends.

Model Comparison

Normal Regression:
- Assumes residuals follow $\epsilon \sim N(0, \sigma)$
- Model: $y \sim N(\alpha + \beta x, \sigma)$

Robust Regression (t-distribution):
- Allows heavy-tailed errors, reducing outlier impact.
- Model: $y \sim t(\nu, \alpha + \beta x, \sigma)$
- Parameter ν controls tail fatness.
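The difference between the two likelihoods can be sketched by evaluating their log-densities at increasingly large residuals. The Python below is only an illustration (the actual models live in the .stan files); the function names and the choice ν = 4 are assumptions made for the example.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log-density of N(mu, sigma), with sigma a standard deviation."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(y, nu, mu, sigma):
    """Log-density of a Student-t with nu degrees of freedom, location mu, scale sigma."""
    z = (y - mu) / sigma
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)
    )


# Compare how strongly each model penalizes residuals of growing size.
for residual in (0.0, 2.0, 6.0):
    print(
        residual,
        round(normal_logpdf(residual, 0.0, 1.0), 3),
        round(student_t_logpdf(residual, 4.0, 0.0, 1.0), 3),
    )
```

The normal log-density falls quadratically in the residual, while the t log-density falls only logarithmically, so a single extreme point costs the robust model far less and has less leverage over α and β.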

Bayesian Priors

- $\alpha \sim N(0, 10)$ (Intercept)
- $\beta \sim N(0, 10)$ (Coefficients)
- $\sigma \sim \text{Inv-}\chi^2(10)$ (Error scale)
- $\nu \sim \chi^2(5)$ (Degrees of freedom for t-distribution)
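Combining these priors with the robust likelihood gives the unnormalized posterior that Stan samples from. The expression below is a sketch that assumes the priors are independent and that $n$ denotes the number of training observations; the normal model is obtained by replacing the t density with $N(y_i \mid \alpha + \beta x_i, \sigma)$ and dropping the prior on $\nu$.

$$
p(\alpha, \beta, \sigma, \nu \mid y) \propto \left[\prod_{i=1}^{n} t(y_i \mid \nu, \alpha + \beta x_i, \sigma)\right] N(\alpha \mid 0, 10)\, N(\beta \mid 0, 10)\, \text{Inv-}\chi^2(\sigma \mid 10)\, \chi^2(\nu \mid 5)
$$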

✅ Convergence Diagnostics

Below are the trace plots of the Markov chains for the normal (left) and robust (right) regression parameters (α and β). All parameters have R-hat values close to 1, indicating that the chains have mixed well and converged.
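For intuition about what R-hat measures, here is a minimal Python sketch of the classic potential scale reduction factor, which compares between-chain and within-chain variance. It is a simplified illustration: Stan reports a more refined split-chain variant, and the simulated chains below are synthetic, not draws from this project's models.

```python
import random
import statistics


def r_hat(chains):
    """Classic (non-split) potential scale reduction for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)

# Four synthetic chains exploring the same distribution.
mixed_chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four synthetic chains where two are stuck around a different value.
stuck_chains = [
    [rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 0.0, 3.0, 3.0)
]

print("well-mixed chains:", r_hat(mixed_chains))
print("disagreeing chains:", r_hat(stuck_chains))
```

When all chains sample the same distribution, the pooled variance matches the within-chain variance and the ratio stays near 1; chains that disagree inflate the between-chain term and push R-hat well above 1.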

📊 Model Validation

Internal Validation

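We use LOO (leave-one-out cross-validation) scores and PSIS diagnostic plots. Pareto k values at or below 0.7 indicate that the PSIS-LOO estimates are reliable; larger values flag highly influential observations and can point to model misspecification.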
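The idea behind importance-sampling LOO can be sketched in a few lines of Python. Note the simplification: this is plain importance sampling on the pointwise log-likelihood draws, without the Pareto smoothing step that PSIS adds and that produces the k diagnostic. The function name and the toy draws are assumptions for illustration only.

```python
import math


def is_loo_elpd(log_lik_draws):
    """Plain importance-sampling LOO estimate of log p(y_i | y_-i) for ONE observation.

    log_lik_draws holds log p(y_i | theta_s) for each posterior draw s.
    The estimate is the log of the harmonic mean of the likelihood values,
    computed with a log-sum-exp for numerical stability.
    """
    negated = [-value for value in log_lik_draws]
    largest = max(negated)
    log_sum = largest + math.log(sum(math.exp(v - largest) for v in negated))
    return math.log(len(negated)) - log_sum


# Toy log-likelihood draws for a single observation (assumption).
toy_draws = [-1.0, -3.0]
in_sample = math.log(sum(math.exp(v) for v in toy_draws) / len(toy_draws))

print("in-sample log predictive density:", in_sample)
print("IS-LOO log predictive density:", is_loo_elpd(toy_draws))
```

The LOO estimate is never higher than the in-sample value, and the gap grows when a few draws dominate the weights. Those dominant weights are what PSIS stabilizes and what a large k warns about.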

Additionally, we compare the two models through posterior predictive checks: replicated datasets y_rep are simulated from each fitted model and compared with the observed values.
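A posterior predictive simulation can be sketched as follows. This Python snippet is illustrative only (the project generates y_rep within its R and Stan workflow); the posterior draws are hypothetical placeholders, and `simulate_y_rep` is a made-up helper name.

```python
import math
import random


def simulate_y_rep(x_values, draws, rng, robust):
    """Simulate one replicated dataset per posterior draw."""
    replicated = []
    for draw in draws:
        row = []
        for x in x_values:
            mu = draw["alpha"] + draw["beta"] * x
            noise = rng.gauss(0.0, 1.0)
            if robust:
                # Student-t noise: a standard normal divided by sqrt(chi2_nu / nu),
                # where chi2_nu is drawn as Gamma(shape=nu/2, scale=2).
                chi2 = rng.gammavariate(draw["nu"] / 2.0, 2.0)
                noise = noise / math.sqrt(chi2 / draw["nu"])
            row.append(mu + draw["sigma"] * noise)
        replicated.append(row)
    return replicated


rng = random.Random(1)

# Hypothetical posterior draws (assumption), standing in for a fitted model's output.
hypothetical_draws = [
    {"alpha": 0.10, "beta": 0.50, "sigma": 0.20, "nu": 4.0},
    {"alpha": 0.12, "beta": 0.48, "sigma": 0.22, "nu": 5.0},
]
x_values = [0.0, 0.5, 1.0]

y_rep_normal = simulate_y_rep(x_values, hypothetical_draws, rng, robust=False)
y_rep_robust = simulate_y_rep(x_values, hypothetical_draws, rng, robust=True)
print(y_rep_normal)
print(y_rep_robust)
```

Each row of y_rep is one simulated dataset; overlaying the distribution of those rows on the observed y shows whether the model can reproduce features such as the extreme values.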

Left: Normal Regression | Right: Robust Regression

External Validation

Predicting new observations using Root Mean Squared Error (RMSE):

- Normal Regression: RMSE = 0.210
- Robust Regression: RMSE = 0.144 (Lower is better!)

The robust regression model achieves the lower RMSE on new observations, consistent with its more effective handling of outliers.
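For reference, RMSE is the square root of the mean squared difference between observed and predicted values. A minimal Python sketch follows; the numbers are toy values, not the project's test data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values (assumption): errors of 3 and 4 on two observations.
toy_observed = [0.0, 0.0]
toy_predicted = [3.0, 4.0]
print(rmse(toy_observed, toy_predicted))
```

Because the errors are squared before averaging, a few badly missed points dominate the score, so a model that is misled by outliers during fitting tends to pay for it here.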

dcacciarelli/robust-regression
