# 13. Varying slopes

## General Principles

To model the relationship between predictor variables and a dependent (outcome) variable while allowing the effects to vary across groups or clusters, we use a *varying slopes* model.

This approach is useful when we expect the relationship between the predictors and the outcome to differ across groups (e.g., different slopes for different subjects, locations, or time periods). It allows every unit in the data to have its own response to a treatment, exposure, or event, while still improving the estimates through partial pooling.

## Considerations

-   We have the same considerations as for [12. Varying intercepts 🛈](12.%20Varying%20intercepts.qmd).

-   The idea is similar to categorical models, where a slope is specified for each category. Here, however, we also estimate how the effects covary across groups (for example, whether groups with high intercepts tend to have shallow slopes). This calls for a different mathematical approach: to model these relationships, we estimate a [covariance matrix 🛈](13.%20Varying%20slopes.qmd "A covariance matrix is a square matrix that contains the covariances between pairs of elements in a random vector. Each element in the matrix represents the covariance between two variables. The diagonal elements represent the variances of each variable, and the off-diagonal elements represent the covariances between different variables").

-   The covariance matrix requires a prior on its correlation matrix, which is modeled with an $LKJcorr$ distribution with a single parameter $\eta$. $\eta$ is usually set to 2, which defines a weakly informative prior that is skeptical of extreme correlations near −1 or 1. With $LKJcorr(1)$, the prior is flat over all valid correlation matrices; with values greater than 1, extreme correlations become less likely.

-   A Half-Cauchy prior (or an Exponential prior, as in the example below) is placed on the standard deviations that scale the covariance matrix. Because these distributions only have positive support, the variances on the diagonal of the covariance matrix are guaranteed to be positive.
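To make the role of $\eta$ concrete, here is a minimal sketch for the 2×2 case, where the correlation matrix has a single free parameter $\rho$ and the LKJ density is proportional to $\det(R)^{\eta - 1} = (1 - \rho^2)^{\eta - 1}$. The function name `lkj_unnormalized_2x2` is hypothetical, and the density is left unnormalized, so only ratios between values of $\rho$ are meaningful.

```python
def lkj_unnormalized_2x2(rho, eta):
    """Unnormalized LKJ density for a 2x2 correlation matrix: det(R)**(eta - 1)."""
    det_R = 1.0 - rho**2  # determinant of [[1, rho], [rho, 1]]
    return det_R ** (eta - 1.0)


# Compare a strong correlation (rho = 0.9) with no correlation (rho = 0)
for eta in (1.0, 2.0, 4.0):
    ratio = lkj_unnormalized_2x2(0.9, eta) / lkj_unnormalized_2x2(0.0, eta)
    print(f"eta={eta}: density at rho=0.9 relative to rho=0 is {ratio:.3f}")
```

With $\eta = 1$ the ratio is 1 (a flat prior), and it shrinks as $\eta$ grows, which is the sense in which larger values are skeptical of extreme correlations.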

## Example

Below is an example code snippet demonstrating Bayesian regression with varying effects:

### Simulated data

``` python
from main import *
# Setup device------------------------------------------------
m = bi(platform='cpu')

a = 3.5  # average morning wait time
b = -1  # average difference afternoon wait time
sigma_a = 1  # std dev in intercepts
sigma_b = 0.5  # std dev in slopes
rho = -0.7  # correlation between intercepts and slopes
Mu = jnp.array([a, b])
cov_ab = sigma_a * sigma_b * rho  # covariance between intercepts and slopes
Sigma = jnp.array([[sigma_a**2, cov_ab], [cov_ab, sigma_b**2]])  # covariance matrix, built entry by entry

# Equivalent construction: scale the correlation matrix by the standard deviations
sigmas = jnp.array([sigma_a, sigma_b])  # standard deviations
Rho = jnp.array([[1, rho], [rho, 1]])  # correlation matrix
Sigma = jnp.diag(sigmas) @ Rho @ jnp.diag(sigmas)

N_cafes = 20
seed = random.PRNGKey(5)  # used to replicate example
vary_effects = bi.dist.multivariatenormal(Mu, Sigma, shape=(N_cafes,), sample = True)
a_cafe = vary_effects[:, 0]
b_cafe = vary_effects[:, 1]

seed = random.PRNGKey(22)
N_visits = 10
afternoon = jnp.tile(jnp.arange(2), N_visits * N_cafes // 2)
cafe_id = jnp.repeat(jnp.arange(N_cafes), N_visits)
mu = a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma = 0.5  # std dev within cafes
wait = dist.normal(mu, sigma, sample = True)
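d = pd.DataFrame(dict(cafe=cafe_id, afternoon=afternoon, wait=wait))
m.df = d  # attach the simulated data; assumes `m.df` holds the data frame, as in the chimpanzee example
m.data_to_model(['cafe', 'afternoon', 'wait'])  # column names must match the data frame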
```

### Define model

``` python
def model(cafe, afternoon, wait, N_cafes=20):  # N_cafes defaults to the number of simulated cafes
    a = dist.normal(5, 2, name = 'a')
    b = dist.normal(-1, 0.5, name = 'b')
    sigma_cafe = dist.exponential(1, shape=[2], name = 'sigma_cafe')
    sigma = dist.exponential( 1, name = 'sigma')
    Rho = dist.lkj(2, 2, name = 'Rho')
    cov = jnp.outer(sigma_cafe, sigma_cafe) * Rho
    a_cafe_b_cafe = dist.multivariatenormal(jnp.stack([a, b]), cov, shape = [N_cafes], name = 'a_cafe')    

    a_cafe, b_cafe = a_cafe_b_cafe[:, 0], a_cafe_b_cafe[:, 1]
    mu = a_cafe[cafe] + b_cafe[cafe] * afternoon
    lk("y", Normal(mu, sigma), obs=wait)

# Run mcmc ------------------------------------------------
m.run(model) 

# Summary ------------------------------------------------
m.sampler.print_summary(0.89)
```

## Mathematical Details

### *Formula*

The covariance matrix of the varying intercepts and slopes is:

$$
\left(\begin{array}{cc} 
\sigma_\alpha^2 & \rho\sigma_\alpha\sigma_\beta \\
\rho\sigma_\alpha\sigma_\beta & \sigma_\beta^2
\end{array}\right)
$$

where:

-   $\sigma_\alpha^2$ is the variance of the intercepts.
-   $\sigma_\beta^2$ is the variance of the slopes.
-   $\rho\sigma_\alpha\sigma_\beta$ is the covariance between intercepts and slopes, i.e. the product of the two standard deviations and their correlation $\rho$.
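As a worked example, the sketch below builds this covariance matrix from the two standard deviations and the correlation, then recovers both pieces from it. It uses plain NumPy rather than the chapter's library, and the numeric values are the ones used in the simulation above.

```python
import numpy as np

sigma_alpha, sigma_beta, rho = 1.0, 0.5, -0.7  # values from the simulated cafes

sigmas = np.array([sigma_alpha, sigma_beta])
Rho = np.array([[1.0, rho], [rho, 1.0]])  # correlation matrix

# Covariance matrix: diag(sigmas) @ Rho @ diag(sigmas)
Sigma = np.diag(sigmas) @ Rho @ np.diag(sigmas)

# Going back: standard deviations from the diagonal, correlations by rescaling
sd_back = np.sqrt(np.diag(Sigma))
Rho_back = Sigma / np.outer(sd_back, sd_back)

print(Sigma)     # variances on the diagonal, covariance off the diagonal
print(Rho_back)  # recovered correlation matrix
```

The off-diagonal entry of `Sigma` is $\rho\sigma_\alpha\sigma_\beta = -0.7 \times 1 \times 0.5 = -0.35$.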

We model the relationship between the independent variable $X$ and the outcome variable *Y* with varying intercepts ($\alpha$) and varying slopes ($\beta$) for each group (*k*) using the following equation:

$$
Y_{ik} = \alpha_k + \beta_k X_{ik} + \varepsilon_{ik}, \qquad \varepsilon_{ik} \sim \text{Normal}(0, \sigma)
$$

Where:

-   $Y_{ik}$ is the outcome variable for observation *i* in group *k*.
-   $X_{ik}$ is the independent variable for observation *i* in group *k*.
-   $\alpha_k$ is the varying intercept for group *k*.
-   $\beta_k$ is the varying regression coefficient (slope) for group *k*.
-   $\varepsilon_{ik}$ is the error term, and $\sigma$ is its standard deviation, assumed to be strictly positive.

### *Bayesian model*

We can express the Bayesian regression model accounting for prior distribution as follows:

$$
p(Y_{ik} |\mu_{ik} , \sigma) \sim \text{Normal}(\mu_{ik} , \sigma) \\
\mu_{ik} = \alpha_k + \beta_k X_{ik} \\
\alpha_k \sim Normal(0,1) \\
\beta_k \sim Normal(0,1) \\
\sigma \sim Exponential(1)
$$

To capture the correlation between them, the varying intercepts ($\alpha_k$) and slopes ($\beta_k$) are given a joint prior, a *Multivariate Normal distribution*, in place of independent priors:

$$
\left(\begin{array}{c} \alpha_k \\ \beta_k \end{array}\right)
\sim \text{MultivariateNormal}\left(
\left(\begin{array}{c} 0 \\ 0 \end{array}\right),
\left(\begin{array}{cc} 
\sigma_\alpha^2 & \rho\sigma_\alpha\sigma_\pi \\
\rho\sigma_\alpha\sigma_\pi & \sigma_\pi^2
\end{array}\right)
\right)
$$

Where:

-   $\left(\begin{array}{c} 0 \\ 0 \end{array}\right)$ is the mean vector, i.e. the prior means of the intercepts and slopes.
-   The second argument is the covariance matrix, which specifies the variances and covariance of $\alpha_k$ and $\beta_k$:
    -   $\sigma_\alpha^2$ is the variance of $\alpha_k$.
    -   $\sigma_\pi^2$ is the variance of $\beta_k$ (written $\sigma_\beta^2$ in the covariance matrix above).
    -   $\rho\sigma_\alpha\sigma_\pi$ is the covariance between $\alpha_k$ and $\beta_k$, with $\rho$ their correlation.

For computational reasons, it is often better to implement a [non-centered version of the varying effects 🛈](12.%20Varying%20intercepts.qmd "In the non-centered version, each group-level effect is written as a standardized offset (a z-score) multiplied by the group-level standard deviation, instead of being sampled directly from its adaptive prior. The model is mathematically equivalent, but the sampler usually explores the standardized offsets more efficiently.") that is equivalent to the *Multivariate Normal distribution* approach:

$$
\left(\begin{array}{c} \alpha_k \\ \beta_k\end{array}\right)
 = 
\left(\begin{array}{c} 
\sigma_\alpha\\
\sigma_\pi
\end{array}\right) \circ 
\left( L \cdot
\left(\begin{array}{c} 
\widehat{\alpha}_k \\
\widehat{\pi}_k
\end{array}\right) \right)
$$

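-   Where:

    -   $\sigma_\alpha \sim Exponential(1)$ is the prior for the standard deviation among intercepts.
    -   $\sigma_\pi \sim Exponential(1)$ is the prior for the standard deviation among slopes.
    -   $L \sim LKJcorr(\eta)$ is the prior for the (Cholesky factor of the) correlation matrix.
    -   $\widehat{\alpha}_k$ and $\widehat{\pi}_k$ are standardized offsets (z-scores) for group *k*, each with a $Normal(0,1)$ prior.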
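The transformation above can be sketched numerically. The snippet below is a plain NumPy illustration, not the chapter's library code: it fixes illustrative values for the standard deviations and the correlation matrix (in a real model these are sampled), draws standardized offsets, and scales them with the Cholesky factor. It also checks that the implied covariance equals the one used in the Multivariate Normal formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

sigmas = np.array([1.0, 0.5])               # illustrative sigma_alpha, sigma_pi
Rho = np.array([[1.0, -0.7], [-0.7, 1.0]])  # illustrative correlation matrix
L = np.linalg.cholesky(Rho)                 # lower triangular, L @ L.T == Rho

n_groups = 5
z = rng.standard_normal((2, n_groups))      # standardized offsets, one column per group

# (alpha_k, beta_k) = sigmas * (L @ z_k), computed for all groups at once
effects = (sigmas[:, None] * L) @ z
alpha_k, beta_k = effects

# Covariance implied by the transformation: (diag(sigmas) L)(diag(sigmas) L)^T
scale = sigmas[:, None] * L
implied_cov = scale @ scale.T
print(np.allclose(implied_cov, np.diag(sigmas) @ Rho @ np.diag(sigmas)))
```

Because the offsets `z` have unit variance and no correlation, all of the scale and correlation structure comes from `sigmas` and `L`, which is why the two formulations describe the same prior.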

The full non-centered version of the model is thus:

$$
p(Y_{ik} |\mu_{ik} , \sigma) \sim \text{Normal}(\mu_{ik} , \sigma) \\
$$

$$
\mu_{ik} =   \alpha_k + \beta_k X_{ik} \\
$$

$$
\left(\begin{array}{c} \alpha_k \\ \beta_k\end{array}\right)
 = 
\left(\begin{array}{c} 
\sigma_\alpha\\
\sigma_\pi
\end{array}\right) \circ 
\left( L \cdot
\left(\begin{array}{c} 
\widehat{\alpha}_k \\
\widehat{\pi}_k
\end{array}\right) \right)
$$

$$
\widehat{\alpha}_k \sim Normal(0,1) \\
\widehat{\pi}_k \sim Normal(0,1) \\
\sigma_\alpha \sim Exponential(1) \\
\sigma_\pi \sim Exponential(1) \\
L \sim LKJcorr(2) \\
\sigma \sim Exponential(1)
$$

## Notes

-   We can use multiple predictors as in [chapter 2](/2.%20Multiple%20Regression%20for%20Continuous%20Variables.qmd). In this case, we apply the same principle, but with a covariance matrix whose dimension equals the number of varying effects we define (the intercept plus each slope). For example, to generate varying slopes for actors $i$ in a model with two independent variables $X_1$ and $X_2$, we can define the model as follows:

$$
p(Y_{i} |\mu_i , \sigma) \sim \text{Normal}(\mu_i , \sigma) \\
$$

$$
\mu_i =   \alpha_i + \beta_{1i} X_{1i}  + \beta_{2i} X_{2i} \\
$$

$$
\left(\begin{array}{c} 
\alpha_{i}\\ 
\beta_{1i}\\ 
\beta_{2i}
\end{array}\right)
= \left(\begin{array}{c} 
\sigma_{\alpha}\\
\sigma_{\pi}\\
\sigma_{\gamma}
\end{array}\right) \circ \left( L \cdot \left(\begin{array}{c} 
\widehat{\alpha}_{i} \\
\widehat{\pi}_{i} \\
\widehat{\gamma}_{i} 
\end{array}\right) \right)
$$

$$
\sigma_{\alpha} \sim Exponential(1) \\
\sigma_{\pi} \sim Exponential(1) \\
\sigma_{\gamma} \sim Exponential(1) \\
L \sim LKJcorr(2)
$$

-   We can apply interaction terms as in [chapter 3](3.%20Interaction%20between%20continuous%20variables.qmd).

-   We can apply categorical variables as in [chapter 4](4.%20Categorical%20variable.qmd).

-   We can apply varying slopes with any distribution presented in previous chapters.

-   With more than one cluster type, we apply the same principle, with one covariance matrix per cluster type; the resulting varying effects are summed to generate the intercept and the slope. For example, to generate varying slopes for $i$ actors and $k$ groups, we can define the model as follows:

$$
p(Y_{i} |\mu_i , \sigma) \sim \text{Normal}(\mu_i , \sigma) \\
$$

$$
\mu_i =   \alpha_i + \beta_{i} X_i \\
\alpha_i = \alpha + \alpha_{actor[i]} + \alpha_{group[i]} \\
\beta_{i} = \beta + \beta_{actor[i]} + \beta_{group[i]} \\
$$

$$
\alpha \sim Normal(0,1) \\
\beta \sim Normal(0,1) \\
$$

$$
\left(\begin{array}{c} 
\alpha_{actor}\\ 
\beta_{actor}
\end{array}\right)
= \left(\begin{array}{c} 
\sigma_{\alpha a}\\
\sigma_{\pi a}
\end{array}\right) \circ \left( L_{a} \cdot \left(\begin{array}{c} 
\widehat{\alpha}_{ka} \\
\widehat{\pi}_{ka} 
\end{array}\right) \right)
$$

$$
\sigma_{\alpha a} \sim Exponential(1) \\
\sigma_{\pi a} \sim Exponential(1) \\
L_{a} \sim LKJcorr(2)
$$

$$
\left(\begin{array}{c} 
\alpha_{group}\\ 
\beta_{group}
\end{array}\right)
= \left(\begin{array}{c} 
\sigma_{\alpha g}\\
\sigma_{\pi g}
\end{array}\right) \circ \left( L_{g} \cdot \left(\begin{array}{c} 
\widehat{\alpha}_{kg} \\
\widehat{\pi}_{kg}
\end{array}\right) \right)
$$

$$
\sigma_{\alpha g} \sim Exponential(1) \\
\sigma_{\pi g} \sim Exponential(1) \\
L_{g} \sim LKJcorr(2)
$$
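The way the actor and group effects are summed can be sketched as follows. This is a plain NumPy illustration under stated assumptions: the helper `varying_effects` is hypothetical, the standard deviations, correlation matrix, and global means are fixed illustrative values rather than sampled parameters, and the index arrays are made up.

```python
import numpy as np

rng = np.random.default_rng(1)


def varying_effects(sigmas, Rho, n_levels, rng):
    """Draw (intercept, slope) offsets for each level as sigmas * (L @ z)."""
    L = np.linalg.cholesky(Rho)
    z = rng.standard_normal((len(sigmas), n_levels))
    return ((sigmas[:, None] * L) @ z).T  # shape (n_levels, 2)


alpha, beta = 0.0, 0.5                    # illustrative global intercept and slope
Rho = np.array([[1.0, 0.3], [0.3, 1.0]])  # illustrative correlation matrix
actor_fx = varying_effects(np.array([1.0, 0.5]), Rho, n_levels=7, rng=rng)
group_fx = varying_effects(np.array([0.5, 0.2]), Rho, n_levels=3, rng=rng)

actor = np.array([0, 0, 3, 6])            # actor index of each observation
group = np.array([0, 2, 1, 2])            # group index of each observation
X = np.array([0.0, 1.0, 1.0, 0.0])

# Sum the global mean and both cluster-specific offsets per observation
alpha_i = alpha + actor_fx[actor, 0] + group_fx[group, 0]
beta_i = beta + actor_fx[actor, 1] + group_fx[group, 1]
mu = alpha_i + beta_i * X
print(mu)
```

Each cluster type keeps its own standard deviations and correlation matrix, so actors and groups can differ both in how much their effects vary and in how intercepts and slopes covary.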

-   Below are the formula and the code snippet for a Binomial multivariate model with an interaction between two independent variables $X_1$ and $X_2$ and multiple varying effects for each actor and each group:

$$
p(Y_{i} |n , p_i) \sim \text{Binomial}(n = 1, p_i) \\
$$

$$
\text{logit}(p_i) =   \alpha_i + (\beta_{1i}  + \beta_{2i} X_{2i})  X_{1i} \\
\alpha_i = \alpha + \alpha_{actor[i]} + \alpha_{group[i]} \\
\beta_{1i} = \beta_1 + \beta_{1 actor[i]} + \beta_{1 group[i]} \\
\beta_{2i} = \beta_2 + \beta_{2 actor[i]} + \beta_{2 group[i]} \\
$$

$$
\alpha \sim Normal(0,1) \\
\beta_1 \sim Normal(0,1) \\
\beta_2 \sim Normal(0,1) \\
$$

$$
\left(\begin{array}{c} 
\alpha_{actor}\\ 
\beta_{1 actor}\\ 
\beta_{2 actor}
\end{array}\right)
= \left(\begin{array}{c} 
\sigma_{\alpha a}\\
\sigma_{\pi a}\\
\sigma_{\gamma a}
\end{array}\right) \circ \left( L_{a} \cdot \left(\begin{array}{c} 
\widehat{\alpha}_{ka} \\
\widehat{\pi}_{ka} \\
\widehat{\gamma}_{ka}
\end{array}\right) \right)
$$

$$
\sigma_{\alpha a} \sim Exponential(1) \\
\sigma_{\pi a} \sim Exponential(1) \\
\sigma_{\gamma a} \sim Exponential(1) \\
L_{a} \sim LKJcorr(2)
$$

$$
\left(\begin{array}{c} 
\alpha_{group}\\ 
\beta_{1 group}\\ 
\beta_{2 group}
\end{array}\right)
= \left(\begin{array}{c} 
\sigma_{\alpha g}\\
\sigma_{\pi g}\\
\sigma_{\gamma g}
\end{array}\right) \circ \left( L_{g} \cdot \left(\begin{array}{c} 
\widehat{\alpha}_{kg} \\
\widehat{\pi}_{kg}\\
\widehat{\gamma}_{kg}
\end{array}\right) \right)
$$

$$
\sigma_{\alpha g} \sim Exponential(1) \\
\sigma_{\pi g} \sim Exponential(1) \\
\sigma_{\gamma g} \sim Exponential(1) \\
L_{g} \sim LKJcorr(2)
$$


``` python
from main import *
# Setup device------------------------------------------------
m = bi(platform='cpu')
# Import data
m.read_csv("../data/chimpanzees.csv", sep=";")
# Zero-based indices, so they can be used directly for array indexing
m.df["block_id"] = m.df.block - 1
m.df["actor"] = m.df.actor - 1
m.df["treatment"] = m.df.prosoc_left + 2 * m.df.condition
m.data_to_model(['pulled_left', 'treatment', 'actor', 'block_id'])


def model(pulled_left, treatment, actor, block_id, link=False):
    # fixed priors
    g = dist.normal(0, 1, name = 'g', shape = (4,))
    sigma_actor = dist.exponential(1, name = 'sigma_actor', shape = (4,))
    L_Rho_actor = dist.lkjcholesky(4, 2, name = "L_Rho_actor")
    sigma_block = dist.exponential(1, name = "sigma_block", shape = (4,))
    L_Rho_block = dist.lkjcholesky(4, 2, name = "L_Rho_block")

    # adaptive priors - non-centered
    z_actor = dist.normal(0, 1, name = "z_actor", shape = (4, 7))  # 4 treatments x 7 actors
    z_block = dist.normal(0, 1, name = "z_block", shape = (4, 6))  # 4 treatments x 6 blocks
    alpha = deterministic(
        "alpha", ((sigma_actor[..., None] * L_Rho_actor) @ z_actor).T
    )
    beta = deterministic(
        "beta", ((sigma_block[..., None] * L_Rho_block) @ z_block).T
    )

    logit_p = g[treatment] + alpha[actor, treatment] + beta[block_id, treatment]
    # likelihood, written with the same `lk` helper as in the cafe example
    lk("L", Binomial(logits=logit_p), obs=pulled_left)

    # compute ordinary correlation matrices from Cholesky factors
    if link:
        deterministic("Rho_actor", L_Rho_actor @ L_Rho_actor.T)
        deterministic("Rho_block", L_Rho_block @ L_Rho_block.T)
        deterministic("p", expit(logit_p))

# Run mcmc ------------------------------------------------
m.run(model) 

# Summary ------------------------------------------------
m.sampler.print_summary(0.89)
```

## Reference(s)

@mcelreath2018statistical