---
title: "If It Bleeds, We Can Kill It"
subtitle: "Copulas in Stan: Episode 1"
description: "This is the first post in what's going to be a series on using copulas in Stan. Each post is going to be short to keep me from postponing writing them. In this post I lightly introduce the series and give a quick primer on copulas."
date: "2024/09/15"
draft: false
format: 
    html:
        code-fold: show
        toc: true
        toc-location: left
execute: 
  echo: true
  warning: false
editor: source
image: images/predator.jpg
categories:
    - stan
    - copulas
---


# Introduction

Welcome to the first post in our series on **copulas in Stan**. Copulas are a powerful statistical tool for modeling and simulating dependencies between random variables. They allow us to construct complex multivariate distributions by combining marginal distributions with a dependence structure.

If you've ever felt intimidated by copulas, just remember Arnold's famous words from *Predator*: "If it bleeds, we can kill it."

![](images/predator.jpg)

# What Are Copulas?

At their core, copulas are functions that **link univariate marginal distribution functions to form a multivariate distribution**. According to **Sklar's Theorem**, any multivariate joint distribution can be expressed in terms of its marginals and a copula that captures the dependence structure between variables.

Mathematically, if $\mathbf{X} = (x_1, \dots, x_d)$:

$$
F(\mathbf{X}) = C\left( F_1(x_1), \dots, F_d(x_d) \right)
$$

where:

* $F(\mathbf{X})$ is the joint cumulative distribution function (CDF) of the collection of random variables $\mathbf{X}$.
* $F_i(x_i)$ is the marginal CDF of the $i$-th variate.
* $C$ is **the copula function**.

## Copulas as Densities

Copulas are multivariate distribution functions for random variables with uniform marginal distributions, i.e. they are functions that map the unit cube $[0,1]^d$ to $[0,1]$. When the copula and the marginals all have densities, the joint density $h$ can be written in terms of a **copula density function** $c$ and the marginal densities $f_i$:

$$
h(\mathbf x) = c\left(F_1(x_1), \dots, F_d(x_d)\right) \prod_{i=1}^d f_i(x_i)
$$

or, on the log scale,

$$
\log h(\mathbf x) = \log c\left(F_1(x_1), \dots, F_d(x_d)\right) + \sum_{i=1}^d \log f_i(x_i)
$$
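
Both forms follow from Sklar's theorem. As a short derivation sketch, assume each $F_i$ is differentiable with density $f_i$ and that $C$ is smooth enough for the copula density $c = \partial^d C / \partial u_1 \cdots \partial u_d$ to exist. Each $u_i = F_i(x_i)$ depends only on $x_i$, so differentiating once with respect to every $x_i$ and applying the chain rule gives

$$
\begin{aligned}
h(\mathbf x) &= \frac{\partial^d}{\partial x_1 \cdots \partial x_d} C\left(F_1(x_1), \dots, F_d(x_d)\right) \\
&= \left. \frac{\partial^d C(u_1, \dots, u_d)}{\partial u_1 \cdots \partial u_d} \right|_{u_i = F_i(x_i)} \prod_{i=1}^d \frac{\mathrm{d} F_i(x_i)}{\mathrm{d} x_i} \\
&= c\left(F_1(x_1), \dots, F_d(x_d)\right) \prod_{i=1}^d f_i(x_i)
\end{aligned}
$$

Taking logs turns the product into the sum that appears in the log-density.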

Notice that $\sum_{i=1}^d \log f_i(x_i)$ is just the usual sum over marginal log-densities. Let's rewrite the other term slightly and make the parameters we're conditioning on explicit:

$$
\begin{aligned}
\log h(\mathbf x) &= \log c\left(u_1, \dots, u_d\right) + \sum_{i=1}^d \log f_i(x_i \vert \theta_i) \\
u_i &= F_i(x_i \vert \theta_i)
\end{aligned}
$$

The main differences when modeling with a copula are:

1. We need to use the CDFs $F_i(x_i \vert \theta_i)$ as well as the PDFs.
2. We need to code up some function $\log c(u_1, \dots, u_d)$ that takes as input the data $\mathbf X$ after it's been transformed to $[0,1]^d$ by our CDFs and returns a log-density.

# The Copula We All Use

The simplest copula is the independence copula:

$$
\begin{aligned}
C(\mathbf{u}) &= \prod_{i=1}^d F_i(x_i \vert \theta_i) = \prod_{i=1}^d u_i \\
c(\mathbf{u}) &= 1 \\
\log c(\mathbf{u}) &= 0
\end{aligned}
$$

We see that if we use the independence copula, we just end up with the usual log-likelihood for independent variables:

$$
\begin{aligned}
\log h(\mathbf x) &= 0 + \sum_{i=1}^d \log f_i(x_i \vert \theta_i) \\
&= \sum_{i=1}^d \log f_i(x_i \vert \theta_i)
\end{aligned}
$$

In this way, we all use copulas whether we want to or not!
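
In Stan terms, the independence copula would be a function that ignores its input and adds nothing to the target. Here is a minimal sketch; the name `independence_copula_lpdf` is made up for illustration and is not a built-in Stan function.

```{stan}
#| eval: false
#| output.var: independence_sketch
functions {
  // Hypothetical helper: log-density of the independence copula.
  // Since c(u) = 1 everywhere on the unit cube, log c(u) = 0 for any u.
  real independence_copula_lpdf(vector u) {
    return 0.0;
  }
}
```

Adding `target += independence_copula_lpdf(u);` to a model block would change the log-density by exactly zero, which is why a model without that line is already using this copula implicitly.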

# An Imaginary Stan Model



```{stan}
#| eval: false
#| output.var: fake_model
functions {

}

data {

}

parameters {

}

model {
  
}
```
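
One hypothetical way to fill in the skeleton above, offered as a sketch rather than the model this series will build, is two exponential margins tied together by a bivariate Gaussian copula. The margins, the Gaussian copula, the placeholder priors, and the helper name `bivariate_gaussian_copula_lpdf` are all assumptions made for illustration; none of them come from the post itself. The model block follows the two-step recipe from the primer: add the marginal log-densities, push the data through the marginal CDFs, then add $\log c(\mathbf u)$.

```{stan}
#| eval: false
#| output.var: gaussian_copula_sketch
functions {
  // Hypothetical helper: summed log-density of a bivariate Gaussian copula
  // with correlation rho, evaluated at the N rows of u (each in the unit square).
  real bivariate_gaussian_copula_lpdf(matrix u, real rho) {
    int N = rows(u);
    vector[N] z1 = inv_Phi(col(u, 1));  // map each u back to a normal score
    vector[N] z2 = inv_Phi(col(u, 2));
    real rho_sq = square(rho);
    return -0.5 * N * log1m(rho_sq)
           - (rho_sq * (dot_self(z1) + dot_self(z2))
              - 2 * rho * dot_product(z1, z2))
             / (2 * (1 - rho_sq));
  }
}

data {
  int<lower=1> N;
  matrix<lower=0>[N, 2] x;  // assumed: two positive-valued variables per row
}

parameters {
  vector<lower=0>[2] lambda;        // rate of each exponential margin
  real<lower=-1, upper=1> rho;      // copula correlation, implicitly uniform on (-1, 1)
}

model {
  matrix[N, 2] u;

  lambda ~ exponential(1);  // placeholder prior, chosen only for the sketch

  for (d in 1:2) {
    // Step 1: the usual marginal log-densities, sum_i log f_i(x_i | theta_i).
    target += exponential_lpdf(col(x, d) | lambda[d]);

    // Step 2a: transform the data to the unit square with the marginal CDFs.
    // The loop keeps the transform elementwise; a vector argument to a Stan
    // CDF would return the product of the CDFs instead.
    for (n in 1:N) {
      u[n, d] = exponential_cdf(x[n, d] | lambda[d]);
    }
  }

  // Step 2b: the copula term, log c(u_1, u_2).
  target += bivariate_gaussian_copula_lpdf(u | rho);
}
```

With `rho` equal to 0 the copula term is zero and the model collapses to two independent exponentials, matching the independence case above. An observation whose CDF value rounds to exactly 0 or 1 would push `inv_Phi` to an infinite value, so a real model would likely need some guarding there.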