# <center> Lecture15 : Hierarchical Models (2)</center>  
 
## <center> Instructor: Dr. Hu Chuan-Peng </center> 

## Intro  

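In the previous lecture, we covered the basic concepts of hierarchical models and looked at how self-control scores vary across sites and across individuals.  

🤔 However, the question we really want to answer is: does the effect of stress on self-control differ across sites?  

* One possibility is that self-control scores differ across sites, but the effect of stress on self-control does not.  
* Another possibility is that site only moderates the effect of stress on self-control, while self-control scores are comparable across sites.  
* Finally, site may affect both the self-control scores and the effect of stress on self-control.  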

![Image Name](https://cdn.kesci.com/upload/s5sh35okpe.png?imageView2/0/w/640/h/640)  


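In this lecture, we introduce hierarchical models that include a predictor, and use different models to test different hypotheses:  
* H0 (model 0), panel A: ordinary linear model, which only considers the effect of stress on self-control.  
* H1 (model 1), panel B: varying-intercept model, which extends model 0 by letting self-control vary across sites.  
* H2 (model 2), panel C: varying-slope model, which extends model 0 by letting the effect of stress vary across sites.  
* H3 (model 3), panel D: varying-intercept and varying-slope model, which combines models 1 and 2 so that both self-control and the effect of stress vary across sites.  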

![Image Name](https://cdn.kesci.com/upload/s5sh35okpe.png?imageView2/0/w/640/h/640)
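
To make the four hypotheses concrete, here is a minimal NumPy sketch of the mean structure each model implies. The helper `site_means` and every numeric value are hypothetical, chosen only for illustration; they are not estimates from the course data.

```python
import numpy as np

def site_means(x, site, beta_0, beta_1, b0=None, b1=None):
    """Return mu_ij = (beta_0 + b0[j]) + (beta_1 + b1[j]) * x_ij.

    b0 / b1 are per-site deviations of the intercept / slope;
    leaving one as None means that parameter is shared by all sites.
    """
    n_sites = int(np.max(site)) + 1
    b0 = np.zeros(n_sites) if b0 is None else np.asarray(b0, dtype=float)
    b1 = np.zeros(n_sites) if b1 is None else np.asarray(b1, dtype=float)
    return (beta_0 + b0[site]) + (beta_1 + b1[site]) * x

# Two hypothetical sites, two stress values each
x = np.array([10.0, 20.0, 10.0, 20.0])
site = np.array([0, 0, 1, 1])
b0, b1 = [2.0, -2.0], [0.1, -0.1]

mu_m0 = site_means(x, site, 60.0, -0.5)                # model 0: one line for everyone
mu_m1 = site_means(x, site, 60.0, -0.5, b0=b0)         # model 1: intercepts differ
mu_m2 = site_means(x, site, 60.0, -0.5, b1=b1)         # model 2: slopes differ
mu_m3 = site_means(x, site, 60.0, -0.5, b0=b0, b1=b1)  # model 3: both differ
```

Models 0 to 2 are special cases of model 3 in which some or all of the site-level deviations are fixed at zero.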

In [37]:
# Import the pymc modeling package and analysis tools such as arviz
import pymc as pm
import arviz as az
import seaborn as sns
import scipy.stats as st
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import pandas as pd
import ipywidgets
import bambi as bmb

# Suppress unnecessary warnings
import warnings
warnings.filterwarnings("ignore")

In [38]:
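# Load Data_Sum_HPP_Multi_Site_Share.csv with pd.read_csv
df_raw = pd.read_csv('/home/mw/input/bayes20238001/Data_Sum_HPP_Multi_Site_Share.csv')
# Select the sites of interest
first5_site = ['Southampton','Portugal','Kassel','Tsinghua','UCSB']
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added below
df_first5 = df_raw.query("Site in @first5_site").copy()
# Create an integer index for each site
df_first5["site_idx"] = pd.factorize(df_first5.Site)[0]
# Create an index for each participant (observation)
df_first5["obs_id"] = range(len(df_first5))
# Use site and participant id as the row index (both are also kept as columns)
df_first5.set_index(['Site','obs_id'],inplace=True,drop=False)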


## Model0: Complete pooling  

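If we ignore the hierarchical structure of the data and assume that all observations come from one larger population, a single regression equation is enough to describe the relationship between the predictor and the outcome.  

This regression model uses complete pooling and corresponds to hypothesis 0 (H0) and model 0:  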

$$  
\begin{array}{lcrl}  
\text{data:} & \hspace{.05in} &   Y_i | \beta_0, \beta_1, \sigma & \stackrel{ind}{\sim} N\left(\mu_i, \sigma^2\right) \;\; \text{ with } \;\; \mu_i = \beta_0 + \beta_1X_i \\  

\text{priors:} & & \beta_{0}  & \sim N\left(0, 50^2 \right)  \\  
                    & & \beta_1  & \sim N\left(0, 5^2 \right) \\  
                    & & \sigma   & \sim \text{Exp}(1)  \\  
\end{array}  
$$ 
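
The model above can also be read as a recipe for generating data. The sketch below draws from the priors with NumPy (not PyMC) and builds the regression lines they imply. The stress grid `x_grid` is a made-up range, and the number of draws is arbitrary. Note that NumPy parameterizes the exponential by its scale, the inverse of the rate in $\text{Exp}(1)$.

```python
import numpy as np

rng = np.random.default_rng(84735)
n_draws = 50

# One draw of each parameter per simulated "world"
beta_0 = rng.normal(0.0, 50.0, size=n_draws)            # beta_0 ~ N(0, 50^2)
beta_1 = rng.normal(0.0, 5.0, size=n_draws)             # beta_1 ~ N(0, 5^2)
sigma = rng.exponential(scale=1.0 / 1.0, size=n_draws)  # sigma ~ Exp(rate=1)

x_grid = np.linspace(10, 60, 6)  # hypothetical stress values

# mu has shape (n_draws, len(x_grid)): one regression line per prior draw
mu = beta_0[:, None] + beta_1[:, None] * x_grid[None, :]
# Simulated outcomes scatter around each line with that draw's sigma
y_sim = rng.normal(mu, sigma[:, None])
```

With priors this wide, many of the simulated lines fall far outside any plausible range of self-control scores, which is the kind of thing a prior predictive check is meant to reveal.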

In [39]:
# Visualize the data under complete pooling
sns.lmplot(df_first5,
           x="stress",
           y="scontrol",
           height=4, aspect=1.5)

<seaborn.axisgrid.FacetGrid at 0x7f5e2198d0a0>

### Model definition and sampling

In [40]:
# Note: the code below may take about 2 minutes to run

coords = {"obs_id": df_first5.obs_id}
with pm.Model(coords=coords) as complete_pooled_model:

    beta_0 = pm.Normal("beta_0", mu=0, sigma=50)                # prior for the intercept beta_0
    beta_1 = pm.Normal("beta_1", mu=0, sigma=5)                 # prior for the slope beta_1
    sigma = pm.Exponential("sigma", 1)                          # prior for the residual sd sigma

    x = pm.MutableData("x", df_first5.stress, dims="obs_id")    # x is the predictor, stress

    mu = pm.Deterministic("mu",beta_0 + beta_1 * x, 
                          dims="obs_id")                        # mu combines the predictor with the parameters

    likelihood = pm.Normal("y_est", mu=mu, sigma=sigma, observed=df_first5.scontrol,
                           dims="obs_id")                       # likelihood: y follows N(mu, sigma)
                                                                # observed passes in the actual data y (self-control)
    complete_trace = pm.sample(random_seed=84735)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_0, beta_1, sigma]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 7 seconds.


In [41]:
pm.model_to_graphviz(complete_pooled_model)

### Posterior parameter estimates  

The results show:  

$\mu_i = \beta_0 + \beta_1X_i$  
- $\beta_0 = 63.32$  
- $\beta_1 = -0.58$

In [42]:
az.summary(complete_trace,
           var_names=["~mu"],
           filter_vars="like")

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| beta_0 | 63.32 | 1.89 | 59.597 | 66.761 | 0.049 | 0.034 | 1511.0 | 1453.0 | 1.0 |
| beta_1 | -0.578 | 0.046 | -0.666 | -0.492 | 0.001 | 0.001 | 1517.0 | 1572.0 | 1.0 |
| sigma | 6.472 | 0.215 | 6.06 | 6.847 | 0.005 | 0.003 | 1898.0 | 1748.0 | 1.01 |

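
The `hdi_3%` and `hdi_97%` columns bound a 94% highest density interval: the narrowest interval that contains 94% of the posterior draws. A rough NumPy sketch of that idea for a unimodal sample follows. The helper `hdi` is illustrative and runs on synthetic draws; it is not a copy of ArviZ's implementation.

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Narrowest interval containing roughly `prob` of the samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    k = int(np.floor(prob * n))   # index offset between the two endpoints
    widths = s[k:] - s[: n - k]   # width of every candidate interval
    i = int(np.argmin(widths))    # the narrowest candidate wins
    return s[i], s[i + k]

rng = np.random.default_rng(0)
draws = rng.normal(0.0, 1.0, size=20_000)  # synthetic stand-in for posterior draws
low, high = hdi(draws)
```

For a symmetric posterior the HDI is close to the equal-tailed interval; for a skewed one (for example a standard deviation near zero) the two can differ noticeably.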


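### Posterior predictive regression lines  

* The complete pooling model assumes that self-control scores are the same at every site and that the effect of stress is the same as well.  
* The figure below shows the posterior predictions for each site; the regression lines are identical.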

In [43]:
# Collect the obs_id values that belong to each site, so that posterior predictions can later be selected by site
def get_group_index(data):
    group_index = {}
    for group in data["Site"].unique():
        # select the obs_id column for the rows of this site
        group_index[group] = data.loc[data["Site"] == group, "obs_id"].to_numpy()
    return group_index

In [44]:
# Define a function that plots the posterior predictive regression line for each site
def plot_regression(data, trace, group_index):
    # Create the figure, with one column per site
    fig, ax = plt.subplots(1,len(data["Site"].unique()), 
                       sharex=True,
                       sharey=True,
                       figsize=(15,5))
    
    # Draw one panel per site
    # We need the raw data and the posterior predictive mean of each outcome value
    # All of these are stored in the posterior sampling result, i.e. the trace used here
    for i, group in enumerate(data["Site"].unique()):
        # Scatter plot of the observed data
        x = trace.constant_data.x.sel(obs_id = group_index[f"{group}"])
        y = trace.observed_data.y_est.sel(obs_id = group_index[f"{group}"])
        mu = trace.posterior.mu.sel(obs_id = group_index[f"{group}"])
        ax[i].scatter(x, y,
                color=f"C{i}",
                alpha=0.5)
        # Regression line (posterior mean of mu)
        ax[i].plot(x, mu.stack(sample=("chain","draw")).mean(dim="sample"),
                color=f"C{i}",
                alpha=0.5)
        # 95% HDI of the predicted mean
        az.plot_hdi(
            x, mu,
            hdi_prob=0.95,
            fill_kwargs={"alpha": 0.25, "linewidth": 0},
            color=f"C{i}",
            ax=ax[i])
    # x-axis label
    fig.text(0.5, 0, 'Stress', ha='center', va='center', fontsize=12)
    # y-axis label
    fig.text(0.08, 0.5, 'Self control', ha='center', va='center', rotation='vertical', fontsize=12)
    # Figure title
    plt.suptitle("Posterior regression models", fontsize=15)
        
    sns.despine()

In [45]:
# Get the observation index of each site
first5_index = get_group_index(data=df_first5)
# Plot
plot_regression(data=df_first5,
                trace=complete_trace,
                group_index=first5_index)

## No pooling  

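Next, we temporarily ignore the population-level information and consider only the group-level information.  

* Across sites, the parameters of the linear relationship (slope, intercept) are independent of one another.  

* We use $j$ to index the sites, $j \in \{1,2,\ldots,5\}$, and draw a separate set of parameters for each site from the priors.  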


$$  
\begin{array}{lcrl}  
\text{data:} & \hspace{.05in} &   Y_i | \beta_{0j}, \beta_{1j}, \sigma_j & \stackrel{ind}{\sim} N\left(\mu_i, \sigma_j^2\right) \;\; \text{ with } \;\; \mu_i = \beta_{0j} + \beta_{1j}X_i \\  
\text{priors:} & & \beta_{0j}  & \sim N\left(0, 50^2 \right)  \\  
                    & & \beta_{1j}  & \sim N\left(0, 5^2 \right) \\  
                    & & \sigma_j   & \sim \text{Exp}(2)  \\  
\end{array}  
$$ 
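
As a rough, non-Bayesian analogue of no pooling, one can fit a separate least-squares line to each site. The sketch below does this on simulated data; the site sizes and the "true" parameter values are assumptions made up for the example. It illustrates why estimates from a very small site tend to be unstable when nothing is shared across sites.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_site = [200, 200, 8]   # hypothetical sample sizes; the last site is tiny
true_b0, true_b1, true_sigma = 63.0, -0.6, 6.0

fits = []
for n in n_per_site:
    x = rng.uniform(20, 60, size=n)
    y = true_b0 + true_b1 * x + rng.normal(0, true_sigma, size=n)
    slope, intercept = np.polyfit(x, y, deg=1)   # independent fit for this site
    fits.append((intercept, slope))

fits = np.array(fits)   # rows: sites, columns: (intercept, slope)
print(fits)
```

All three sites share the same generating line here, so any spread between the fitted rows is sampling noise, and the small site is the one most exposed to it.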

In [46]:
# Create the facet grid needed for plotting
g = sns.FacetGrid(df_first5, col="Site", col_wrap=5, height=4)

# Map the plot content onto each facet
g.map(sns.regplot, "stress", "scontrol")

# Show the plot
plt.show()

### Model definition and sampling

In [47]:
# Note: the code below may take about 2 minutes to run

coords = {"site": df_first5["Site"].unique(),
          "obs_id": df_first5.obs_id}

with pm.Model(coords=coords) as no_pooled_model:

    # Intercept and slope: dims="site" creates one intercept and one slope per site
    beta_0 = pm.Normal("beta_0", mu=0, sigma=50, dims="site")
    beta_1 = pm.Normal("beta_1", mu=0, sigma=5, dims="site")    
    # sigma: dims="site" creates a separate sigma per site
    sigma = pm.Exponential("sigma", 2, dims="site") 

    # Pass in the predictor and the site index of each observation
    site = pm.MutableData("site", df_first5.site_idx, dims="obs_id") 
    x = pm.MutableData("x", df_first5.stress, dims="obs_id")

    # Linear predictor
    mu = pm.Deterministic("mu", beta_0[site]+beta_1[site]*x, dims="obs_id")
    # Likelihood
    likelihood = pm.Normal("y_est", mu=mu, sigma=sigma[site], observed=df_first5.scontrol, dims="obs_id")

    no_trace = pm.sample(random_seed=84735)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_0, beta_1, sigma]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 26 seconds.
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.


In [48]:
pm.model_to_graphviz(no_pooled_model)

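### Posterior parameter estimates  

* Each site has its own intercept (beta_0[xx]), its own slope (beta_1[xx]), and its own standard deviation sigma[xx] for the normal distribution that the observations follow.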

In [49]:
az.summary(no_trace,
           var_names=["beta","beta_1"],
           filter_vars="like")

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| beta_0[Kassel] | 65.889 | 3.855 | 58.438 | 72.926 | 0.063 | 0.045 | 3744.0 | 2828.0 | 1.0 |
| beta_0[Portugal] | 67.895 | 9.09 | 49.46 | 84.429 | 0.15 | 0.106 | 3691.0 | 2869.0 | 1.0 |
| beta_0[Southampton] | 23.493 | 21.585 | -18.565 | 63.199 | 0.336 | 0.263 | 4172.0 | 2682.0 | 1.0 |
| beta_0[Tsinghua] | 55.82 | 2.528 | 51.262 | 60.993 | 0.043 | 0.03 | 3532.0 | 2629.0 | 1.01 |
| beta_0[UCSB] | 68.193 | 3.811 | 60.7 | 75.167 | 0.062 | 0.044 | 3724.0 | 2734.0 | 1.0 |
| beta_1[Kassel] | -0.618 | 0.096 | -0.8 | -0.441 | 0.002 | 0.001 | 3725.0 | 3162.0 | 1.0 |
| beta_1[Portugal] | -0.573 | 0.237 | -1.042 | -0.129 | 0.004 | 0.003 | 3654.0 | 2825.0 | 1.0 |
| beta_1[Southampton] | 0.379 | 0.541 | -0.679 | 1.379 | 0.008 | 0.008 | 4074.0 | 2572.0 | 1.0 |
| beta_1[Tsinghua] | -0.41 | 0.063 | -0.531 | -0.293 | 0.001 | 0.001 | 3477.0 | 2511.0 | 1.01 |
| beta_1[UCSB] | -0.707 | 0.089 | -0.868 | -0.531 | 0.001 | 0.001 | 3754.0 | 2644.0 | 1.0 |


In [50]:
# Set up the plotting axes
figs, (ax1, ax2) = plt.subplots(1,2, figsize = (20,5))
# Plot the varying intercepts
az.plot_forest(no_trace,
           var_names=["~mu", "~sigma", "~offset", "~beta_1"],
           filter_vars="like",
           combined = True,
           ax=ax1)
# Plot the varying slopes
az.plot_forest(no_trace,
           var_names=["~mu", "~sigma", "~offset", "~beta_0"],
           filter_vars="like",
           combined = True,
           ax=ax2)
plt.show()

### Posterior predictive regression lines  

* The no-pooling model produces 5 regression lines, each with its own slope and intercept.  


In [51]:
first5_index = get_group_index(data=df_first5)
plot_regression(data=df_first5,
                trace=no_trace,
                group_index=first5_index)

## Partial pooling & hierarchical model  

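The no-pooling model ignores the relationship between the population and the sites and simply treats the sites as independent groups. We now turn to partial pooling to build a hierarchical model.  

* The intercept ($\beta_0$) and the slope ($\beta_1$) of the regression model may both vary across sites.  
* We first consider a model in which the intercept ($\beta_0$) varies by site (model 1, the varying-intercept model).  
* We then consider a model in which the slope ($\beta_1$) varies by site (model 2, the varying-slope model).  
* Finally, we let both the intercept ($\beta_0$) and the slope ($\beta_1$) vary by site (model 3, the varying-intercept and varying-slope model).  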

![Image Name](https://cdn.kesci.com/upload/s5eeyrh0s5.png?imageView2/0/w/960/h/960)  

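* $j$ indexes the sites, $j \in \{1,2, \ldots, 5\}$  
* $i$ indexes the observations within a site, $i \in \{1,2,\ldots,n_j\}$  
* Each participant's data is written as $Y_{ij}$, the observed self-control score of participant $i$ in site $j$  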

$$  
Y := \left((Y_{11}, Y_{21}, \ldots, Y_{n_1,1}), (Y_{12}, Y_{22}, \ldots, Y_{n_2,2}), \ldots, (Y_{1,5}, Y_{2,5}, \ldots, Y_{n_{5},5})\right)  .  
$$ 

##  Model1: Hierarchical model with varying intercepts  

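Compared with a hierarchical model without predictors, the key to building a hierarchical model with a predictor is to keep track of how the **parameters ($\beta$)** relate to the **layers**.  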

$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0j}, \beta_1, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = \beta_{0j} + \beta_1 X_{ij} & \text{(linear model within each site)} \\  
\beta_{0j} | \beta_0, \sigma_0  & \stackrel{ind}{\sim} N(\beta_0, \sigma_0^2) & \text{(variability in intercepts between sites)} \\  
\beta_{0}  & \sim N(0, 50^2) & \text{(priors on global parameters)} \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_y & \sim \text{Exp}(1)    & \\  
\sigma_0 & \sim \text{Exp}(1).    & \\  
\end{array}  
$$

### Layer 1: Variability within Site  

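**1. How does the relationship between self-control and stress vary within a site?**  

$$  
Y_{ij} | \beta_{0j}, \beta_1, \sigma_y \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ where } \; \mu_{ij} = \beta_{0j} + \beta_1 X_{ij}  .  
$$  

* $i$ denotes the $i$-th participant within site $j$, $i \in \{1,2,\ldots,n_j\}$  
* For each participant, the self-control score follows a normal distribution with mean $\mu_{ij}$ and standard deviation $\sigma_y$  

* $\mu_{ij}$ is determined by the parameters $\beta_{0j}$ and $\beta_1$  

    * $\beta_{0j}$ differs from group to group (group-specific)  

    * $\beta_1$ and $\sigma_y$ are shared by all groups (global)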

### Layer 2: Variability between Site  
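**2. How does the linear relationship between self-control and stress vary between sites?**  

* The linear relationship between self-control and stress consists of an intercept and a slope  

* We assume that the intercept varies across sites  

* Let the baseline intercept be $\beta_{0}$ and the between-site standard deviation of the intercepts be $\sigma_{0}$; the intercept of each site can then be written as:  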

$$  
\beta_{0j} | \beta_0, \sigma_0 \stackrel{ind}{\sim} N(\beta_0, \sigma_0^2)  .  
$$

### Layer 3: Global priors  
**3. Finally, we specify priors for the global parameters $\beta_{0}, \beta_1, \sigma_y, \sigma_0$**  

$$  
\begin{array}{rll}  
\beta_{0}  & \sim N(m_0, s_0^2)  \\  
\beta_1  & \sim N(m_1, s_1^2) & \\  
\sigma_y & \sim \text{Exp}(l_y)    & \\  
\sigma_0 & \sim \text{Exp}(l_0)    & \\  
\end{array}  
$$

**Model definition, in summary:**  

$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0j}, \beta_1, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = \beta_{0j} + \beta_1 X_{ij} & \text{(linear model within each site)} \\  
\beta_{0j} | \beta_0, \sigma_0  & \stackrel{ind}{\sim} N(\beta_0, \sigma_0^2) & \text{(variability in intercepts between sites)} \\  
\beta_{0}  & \sim N(0, 50^2) & \text{(priors on global parameters)} \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_y & \sim \text{Exp}(1)    & \\  
\sigma_0 & \sim \text{Exp}(1).    & \\  
\end{array}  
$$
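
To build intuition for what partial pooling does to $\beta_{0j}$: if $\beta_0$, $\beta_1$, $\sigma_0$ and $\sigma_y$ were treated as known, the conditional posterior mean of $\beta_{0j}$ would be a precision-weighted average of the site's own mean residual $\overline{Y_{j} - \beta_1 X_{j}}$ and the global intercept $\beta_0$ (the standard normal-normal result). The sketch below evaluates that formula for hypothetical numbers. In the actual model these quantities are uncertain and MCMC averages over them, so this is an illustration rather than the estimator PyMC uses.

```python
import numpy as np

def partial_pool_intercept(resid_mean_j, n_j, beta_0, sigma_0, sigma_y):
    """Conditional posterior mean of beta_0j given all other parameters.

    resid_mean_j: mean of (y_ij - beta_1 * x_ij) over the n_j rows of site j.
    """
    w_data = n_j / sigma_y**2      # precision of the site's own information
    w_prior = 1.0 / sigma_0**2     # precision of the population-level prior
    return (w_data * resid_mean_j + w_prior * beta_0) / (w_data + w_prior)

# Same site-level evidence (70), same global intercept (63), different sample sizes
small = partial_pool_intercept(70.0, n_j=4, beta_0=63.0, sigma_0=1.0, sigma_y=6.0)
large = partial_pool_intercept(70.0, n_j=400, beta_0=63.0, sigma_0=1.0, sigma_y=6.0)
print(small, large)
```

The small site is pulled most of the way toward the global intercept, while the large site stays close to its own data.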

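### Another way to think about it  

* The variation of the intercepts across sites can be expressed in another way:  

    * The intercept of each site is the global $\beta_0$ plus a site-specific deviation $b_{0j}$: $\beta_{0j} = \beta_0 + b_{0j}$  

    * where $b_{0j}$ satisfies $b_{0j} \sim N(0, \sigma_0^2)$  


* Putting this together:  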

$$  
\begin{split}  
Y_{ij} | \beta_0, b_{0j}, \beta_1, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = (\beta_0 + b_{0j}) + \beta_1 X_{ij}  \\  
b_{0j} | \sigma_0  & \stackrel{ind}{\sim} N(0, \sigma_0^2)  \\  
\beta_{0}  & \sim N(0, 50^2) \\  
\beta_1  & \sim N(0, 5^2)  \\  
\sigma_y & \sim \text{Exp}(1)  \\  
\sigma_0 & \sim \text{Exp}(1).  \\  
\end{split}  
$$
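
The two formulations describe the same distribution for $\beta_{0j}$. The following NumPy sketch checks this by simulation, using hypothetical values for $\beta_0$ and $\sigma_0$. It also uses the standardized form $\beta_0 + \sigma_0 z$ with $z \sim N(0, 1)$, so that $b_{0j} = \sigma_0 z$; this is what the `beta_0_offset` variable in the PyMC code below corresponds to.

```python
import numpy as np

rng = np.random.default_rng(1)
beta_0, sigma_0 = 63.0, 2.0   # hypothetical global intercept and between-site sd
n = 200_000

# Centered: draw beta_0j directly from N(beta_0, sigma_0^2)
centered = rng.normal(beta_0, sigma_0, size=n)

# Non-centered: draw a standard-normal offset, then shift and scale it
z = rng.normal(0.0, 1.0, size=n)
non_centered = beta_0 + sigma_0 * z

print(centered.mean(), non_centered.mean())
print(centered.std(), non_centered.std())
```

The models are equivalent, but the sampler works in different coordinates. With few groups or weakly informative data, the non-centered form often produces fewer divergences.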

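### Model definition and sampling  

* Here we translate both formulations above into PyMC code and compare the MCMC sampling results of the two.  
* First, we set the global parameters $\beta_0$ and $\beta_1$.  
	* Because $\beta_{0j}$ differs across sites, we set a global parameter $\sigma_0$ and assume that for each site $\beta_{0j} \sim N(\beta_0, \sigma_0^2)$  
	* In the other formulation, we treat $\beta_{0j}$ as the combination of $\beta_0$ and $b_{0j}$  
	* Next, the linear formula gives $\mu = \beta_{0j} + \beta_1 x$  
	* Finally, the individual-level data $y$ follow $N(\mu, \sigma_y^2)$, where $\sigma_y$ is the within-group variation.  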


In [52]:
# Define a function that builds and samples the model
def run_var_inter_model(non_centered = False):

    # Define the coordinates: sites and observation index
    coords = {"site": df_first5["Site"].unique(),
            "obs_id": df_first5.obs_id}

    with pm.Model(coords=coords) as var_inter_model:
        # Global parameters
        # Note: beta_0 uses N(40, 20^2) here, not the N(0, 50^2) written in the formulas above
        beta_0 = pm.Normal("beta_0", mu=40, sigma=20)
        beta_0_sigma = pm.Exponential("beta_0_sigma", 1)
        beta_1 = pm.Normal("beta_1", mu=0, sigma=5)
        sigma_y = pm.Exponential("sigma_y", 1) 

        # Pass in the predictor and the site index of each observation
        x = pm.MutableData("x", df_first5.stress, dims="obs_id")
        site = pm.MutableData("site", df_first5.site_idx, dims="obs_id") 
        
        # Choose between the two parameterizations
        if non_centered:
            beta_0_offset = pm.Normal("beta_0_offset", 0, sigma=1, dims="site")
            beta_0j = pm.Deterministic("beta_0j", beta_0 + beta_0_offset * beta_0_sigma, dims="site")
        else:
            beta_0j = pm.Normal("beta_0j", mu=beta_0, sigma=beta_0_sigma, dims="site")

        # Linear predictor
        mu = pm.Deterministic("mu", beta_0j[site]+beta_1*x, dims="obs_id")

        # Likelihood
        likelihood = pm.Normal("y_est", mu=mu, sigma=sigma_y, observed=df_first5.scontrol, dims="obs_id")

        var_inter_trace = pm.sample(draws=5000,           # MCMC sampling; draws is the number of posterior draws per chain
                            tune=1000,                    # tune is the number of tuning (warm-up) iterations per chain
                            chains=4,                     # number of chains
                            discard_tuned_samples= True,  # tuning samples are discarded after sampling
                            random_seed=84735,
                            target_accept=0.99)
    
    return var_inter_model, var_inter_trace

In [53]:
# Note: the code below may take about 5 minutes to run

var_inter_model_centered, var_inter_trace_centered = run_var_inter_model(non_centered = False)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_0, beta_0_sigma, beta_1, sigma_y, beta_0j]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 97 seconds.
There were 22 divergences after tuning. Increase `target_accept` or reparameterize.


In [54]:
pm.model_to_graphviz(var_inter_model_centered)

In [55]:
var_inter_model, var_inter_trace = run_var_inter_model(non_centered = True)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_0, beta_0_sigma, beta_1, sigma_y, beta_0_offset]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 95 seconds.


In [56]:
pm.model_to_graphviz(var_inter_model)

### Prior predictive check

In [57]:
var_inter_prior = pm.sample_prior_predictive(samples=50,
                                            model=var_inter_model,
                                            random_seed=84735)

Sampling: [beta_0, beta_0_offset, beta_0_sigma, beta_1, sigma_y, y_est]


In [58]:
# Define a function that plots the prior predictive regression lines; the logic mirrors the posterior version
def plot_prior(prior,group_index):
    # Create the figure, with one column per site
    fig, ax = plt.subplots(1,len(df_first5["Site"].unique()), 
                        sharex=True,
                        sharey=True,
                        figsize=(20,5))
    # Draw one panel per site
    # We need the predictor from the raw data and the prior predictive mean of each outcome value
    # All of these are stored in the prior predictive sampling result, i.e. the prior used here
    for i, group in enumerate(df_first5["Site"].unique()): 
        # Regression lines
        ax[i].plot(prior.constant_data["x"].sel(obs_id = group_index[f"{group}"]),
                prior.prior["mu"].sel(obs_id = group_index[f"{group}"]).stack(sample=("chain","draw")),
                c='gray',
                alpha=0.5)
        ax[i].set_title(f"{group}")
    # x-axis label
    fig.text(0.5, 0, 'Stress', ha='center', va='center', fontsize=12)
    # y-axis label
    fig.text(0.08, 0.5, 'Self control', ha='center', va='center', rotation='vertical', fontsize=12)
    # Figure title
    plt.suptitle("Prior regression models", fontsize=15, y=1)
        
    sns.despine()

In [59]:
plot_prior(prior=var_inter_prior,
           group_index=first5_index)

### MCMC sampling & posterior parameter estimates  

* All 5 regression lines share the same slope, $\beta_1 = -0.56$  
* The population-level intercept is $\beta_0 = 63.15$  
* The site-specific intercepts $\beta_{0j}$[xx] differ:  
    * $\beta_{0j}$[Kassel] = 63.56  
    * $\beta_{0j}$[Portugal] = 65.37  
    * $\beta_{0j}$[Southampton] = 62.58  
    * $\beta_{0j}$[Tsinghua] = 62.09  
    * $\beta_{0j}$[UCSB] = 62.37

In [60]:
# With filter_vars="like", a leading ~ excludes every variable whose name contains the given string
var_inter_para = az.summary(var_inter_trace,
           var_names=["~mu","~_sigma","~_offset","~sigma_"],
           filter_vars="like")
var_inter_para

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| beta_0 | 63.148 | 2.088 | 59.342 | 67.175 | 0.023 | 0.016 | 8390.0 | 11140.0 | 1.0 |
| beta_1 | -0.563 | 0.047 | -0.651 | -0.473 | 0.0 | 0.0 | 9881.0 | 12031.0 | 1.0 |
| beta_0j[Kassel] | 63.556 | 1.967 | 59.869 | 67.292 | 0.019 | 0.014 | 10461.0 | 12406.0 | 1.0 |
| beta_0j[Portugal] | 65.369 | 2.303 | 61.083 | 69.659 | 0.023 | 0.017 | 9717.0 | 12261.0 | 1.0 |
| beta_0j[Southampton] | 62.578 | 2.4 | 58.065 | 67.057 | 0.023 | 0.016 | 10884.0 | 13666.0 | 1.0 |
| beta_0j[Tsinghua] | 62.09 | 1.963 | 58.307 | 65.657 | 0.02 | 0.014 | 9767.0 | 12428.0 | 1.0 |
| beta_0j[UCSB] | 62.371 | 2.063 | 58.53 | 66.219 | 0.021 | 0.015 | 9843.0 | 11940.0 | 1.0 |


In [61]:
az.plot_forest(var_inter_trace,
           var_names=["~mu", "~sigma", "~offset", "~beta_1"],
           filter_vars="like",
           combined = True)

array([<Axes: title={'center': '94.0% HDI'}>], dtype=object)

### Posterior predictive regression lines  

* The 5 regression lines have different intercepts but the same slope

In [62]:
# Define a function that plots the posterior predictive regression line for each site
# (panel titles read the intercepts from the global var_inter_para table)
def plot_partial_regression(data, trace, group_index):
    # Create the figure, with one column per site
    fig, ax = plt.subplots(1,len(data["Site"].unique()), 
                       sharex=True,
                       sharey=True,
                       figsize=(15,5))
    
    # Draw one panel per site
    # We need the raw data and the posterior predictive mean of each outcome value
    # All of these are stored in the posterior sampling result, i.e. the trace used here
    for i, group in enumerate(data["Site"].unique()):
        # Scatter plot of the observed data
        x = trace.constant_data.x.sel(obs_id = group_index[f"{group}"])
        y = trace.observed_data.y_est.sel(obs_id = group_index[f"{group}"])
        mu = trace.posterior.mu.sel(obs_id = group_index[f"{group}"])
        ax[i].scatter(x, y,
                color=f"C{i}",
                alpha=0.5)
        # Regression line (posterior mean of mu)
        ax[i].plot(x, mu.stack(sample=("chain","draw")).mean(dim="sample"),
                color=f"C{i}",
                alpha=0.5)
        ax[i].set_title(f"Intercept: {var_inter_para.loc[f'beta_0j[{group}]']['mean']}", fontsize=12)
        # 95% HDI of the predicted mean
        az.plot_hdi(
            x, mu,
            hdi_prob=0.95,
            fill_kwargs={"alpha": 0.25, "linewidth": 0},
            color=f"C{i}",
            ax=ax[i])
        
    # x-axis label
    fig.text(0.5, 0, 'Stress', ha='center', va='center', fontsize=12)
    # y-axis label
    fig.text(0.08, 0.5, 'Self control', ha='center', va='center', rotation='vertical', fontsize=12)
    # Figure title
    plt.suptitle("Posterior regression models (varying intercept)", fontsize=15, y=1.05)
        
    sns.despine()

In [63]:
plot_partial_regression(data=df_first5,
                trace=var_inter_trace,
                group_index=first5_index)

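### Between-group and within-group variance  

* In this model definition, the between-group variance comes from `beta_0_offset` and the within-group variance from `sigma_y`  
* Result: the between-group share (0.067) is much smaller than the within-group share (0.933), indicating a low intraclass correlation.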

In [64]:
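# Extract the between-group and within-group variation
para_sum = az.summary(var_inter_trace,
                      var_names=["_offset","sigma_"],
                      filter_vars="like")
between_sd = (para_sum.filter(like='_offset', axis=0)["mean"]**2).sum()
within_sd = para_sum.loc['sigma_y','mean']**2
# Compute the share of each component
var = between_sd + within_sd
print("Share explained by between-group variance:", between_sd/var)
print("Share explained by within-group variance:", within_sd/var)
print("Intraclass correlation:", between_sd/var)


Share explained by between-group variance: 0.06711151565666632
Share explained by within-group variance: 0.9328884843433337
Intraclass correlation: 0.06711151565666632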
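
For comparison, a common textbook definition of the intraclass correlation in a varying-intercept model uses the variance components directly: $\text{ICC} = \sigma_0^2 / (\sigma_0^2 + \sigma_y^2)$. The cell above works from the posterior means of the standardized offsets instead, so the two numbers need not agree. The sketch below computes the variance-component version draw by draw on synthetic values. With the real trace one would presumably pass the posterior draws of `beta_0_sigma` and `sigma_y`; `icc_from_draws` is a hypothetical helper.

```python
import numpy as np

def icc_from_draws(sigma_between, sigma_within):
    """ICC = sigma_between^2 / (sigma_between^2 + sigma_within^2), elementwise."""
    var_b = np.asarray(sigma_between, dtype=float) ** 2
    var_w = np.asarray(sigma_within, dtype=float) ** 2
    return var_b / (var_b + var_w)

rng = np.random.default_rng(2)
# Synthetic stand-ins for posterior draws (NOT results from the course data)
sigma_0_draws = np.abs(rng.normal(1.5, 0.3, size=4000))
sigma_y_draws = np.abs(rng.normal(6.4, 0.2, size=4000))

icc = icc_from_draws(sigma_0_draws, sigma_y_draws)
print("posterior mean ICC:", icc.mean())
print("central 94% interval:", np.quantile(icc, [0.03, 0.97]))
```

Computing the ratio per draw, rather than from posterior means, carries the uncertainty in both standard deviations through to the ICC.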


## Model2: Hierarchical model with varying slopes  

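* The previous model let the regression intercept vary by site. In model 2, we assume that the intercept is the same across sites but the slope varies by site.  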

$$  
\beta_{1j} | \beta_1, \sigma_1 \sim N(\beta_1, \sigma_1^2)  
$$  

As with model 1, **model 2 is defined as:**  

$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0}, \beta_{1j}, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = \beta_{0} + \beta_{1j} X_{ij} & \text{(linear model within each site)} \\  
\beta_{1j} | \beta_1, \sigma_1  & \stackrel{ind}{\sim} N(\beta_1, \sigma_1^2)  & \text{(variability in slopes between sites)} \\  
\beta_{0}  & \sim N(0, 50^2) & \text{(priors on global parameters)} \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_y & \sim \text{Exp}(1)    & \\  
\sigma_1 & \sim \text{Exp}(1).    & \\  
\end{array}  
$$  

or:  

$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0}, \beta_{1}, b_{1j}, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = \beta_{0} + (\beta_{1} + b_{1j}) X_{ij} & \text{(linear model within each site)} \\  
b_{1j} | \sigma_1 & \stackrel{ind}{\sim} N(0, \sigma_1^2) & \text{(variability in slopes between sites)} \\  
\beta_{0}  & \sim N(0, 50^2) & \text{(priors on global parameters)} \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_y & \sim \text{Exp}(1)    & \\  
\sigma_1 & \sim \text{Exp}(1).    & \\  
\end{array}  
$$

In [65]:
# Define a function that builds and samples the model
def run_var_slope_model(non_centered = False):

    # Define the coordinates: sites and observation index
    coords = {"site": df_first5["Site"].unique(),
            "obs_id": df_first5.obs_id}

    with pm.Model(coords=coords) as model:
        # Global parameters
        beta_0 = pm.Normal("beta_0", mu=0, sigma=50)
        beta_1 = pm.Normal("beta_1", mu=0, sigma=5) 
        beta_1_sigma = pm.Exponential("beta_1_sigma", 1)
        sigma_y = pm.Exponential("sigma_y", 1) 

        # Pass in the predictor and the site index of each observation
        x = pm.MutableData("x", df_first5.stress, dims="obs_id")
        site = pm.MutableData("site", df_first5.site_idx, dims="obs_id") 

        # Choose between the two parameterizations
        if non_centered:
            beta_1_offset = pm.Normal("beta_1_offset", 0, sigma=1, dims="site")
            beta_1j = pm.Deterministic("beta_1j", beta_1 + beta_1_offset * beta_1_sigma, dims="site")
        else:
            beta_1j = pm.Normal("beta_1j", mu=beta_1, sigma=beta_1_sigma, dims="site")

        # Linear predictor
        mu = pm.Deterministic("mu", beta_0+beta_1j[site]*x, dims="obs_id")

        # Likelihood
        likelihood = pm.Normal("y_est", mu=mu, sigma=sigma_y, observed=df_first5.scontrol, dims="obs_id")

        trace = pm.sample(draws=5000,           # MCMC sampling; draws is the number of posterior draws per chain
                            tune=1000,                    # tune is the number of tuning (warm-up) iterations per chain
                            chains=4,                     # number of chains
                            discard_tuned_samples= True,  # tuning samples are discarded after sampling
                            random_seed=84735,
                            target_accept=0.99)
    
    return model, trace

In [66]:
# Note: the code below may take about 5 minutes to run

var_slope_model, var_slope_trace = run_var_slope_model(non_centered = True)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_0, beta_1, beta_1_sigma, sigma_y, beta_1_offset]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 110 seconds.
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.


In [67]:
pm.model_to_graphviz(var_slope_model)

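### MCMC sampling & posterior parameter estimates  

* The 5 regression lines share the same intercept $\beta_{0}$, but their slopes $\beta_{1j}$ differ  
* Each $\beta_{1j}$ adds site-specific variation around the global $\beta_{1}$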

In [68]:
var_slope_para = az.summary(var_slope_trace,
                            var_names=["beta_0","beta_1j"],
                            filter_vars="like")
var_slope_para 

| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| beta_0 | 62.554 | 1.932 | 59.002 | 66.275 | 0.021 | 0.015 | 8700.0 | 10803.0 | 1.0 |
| beta_1j[Kassel] | -0.538 | 0.05 | -0.634 | -0.446 | 0.001 | 0.0 | 8759.0 | 11890.0 | 1.0 |
| beta_1j[Portugal] | -0.469 | 0.065 | -0.594 | -0.348 | 0.001 | 0.001 | 6650.0 | 9053.0 | 1.0 |
| beta_1j[Southampton] | -0.569 | 0.068 | -0.694 | -0.44 | 0.001 | 0.0 | 10818.0 | 13388.0 | 1.0 |
| beta_1j[Tsinghua] | -0.573 | 0.048 | -0.665 | -0.483 | 0.001 | 0.0 | 9192.0 | 11322.0 | 1.0 |
| beta_1j[UCSB] | -0.574 | 0.047 | -0.662 | -0.484 | 0.0 | 0.0 | 9427.0 | 11888.0 | 1.0 |


In [69]:
az.plot_forest(var_slope_trace,
           var_names=["~mu", "~sigma", "~offset", "~beta_0"],
           filter_vars="like",
           combined = True)

array([<Axes: title={'center': '94.0% HDI'}>], dtype=object)

### Posterior predictive regression lines  

* The 5 regression lines have the same intercept but different slopes

In [70]:
# Define a function that plots the posterior predictive regression line for each site
# (panel titles read the slopes from the global var_slope_para table)
def plot_partial_regression(data, trace, group_index):
    # Create the figure, with one column per site
    fig, ax = plt.subplots(1,len(data["Site"].unique()), 
                       sharex=True,
                       sharey=True,
                       figsize=(15,5))
    
    # Draw one panel per site
    # We need the raw data and the posterior predictive mean of each outcome value
    # All of these are stored in the posterior sampling result, i.e. the trace used here
    for i, group in enumerate(data["Site"].unique()):
        # Scatter plot of the observed data
        x = trace.constant_data.x.sel(obs_id = group_index[f"{group}"])
        y = trace.observed_data.y_est.sel(obs_id = group_index[f"{group}"])
        mu = trace.posterior.mu.sel(obs_id = group_index[f"{group}"])
        ax[i].scatter(x, y,
                color=f"C{i}",
                alpha=0.5)
        # Regression line (posterior mean of mu)
        ax[i].plot(x, mu.stack(sample=("chain","draw")).mean(dim="sample"),
                color=f"C{i}",
                alpha=0.5)
        ax[i].set_title(f"Slope: {var_slope_para.loc[f'beta_1j[{group}]']['mean']}", fontsize=12)
        # 95% HDI of the predicted mean
        az.plot_hdi(
            x, mu,
            hdi_prob=0.95,
            fill_kwargs={"alpha": 0.25, "linewidth": 0},
            color=f"C{i}",
            ax=ax[i])
        
    # x-axis label
    fig.text(0.5, 0, 'Stress', ha='center', va='center', fontsize=12)
    # y-axis label
    fig.text(0.08, 0.5, 'Self control', ha='center', va='center', rotation='vertical', fontsize=12)
    # Figure title
    plt.suptitle("Posterior regression models (varying slope)", fontsize=15, y=1.05)
        
    sns.despine()

In [71]:
plot_partial_regression(data=df_first5,
                trace=var_slope_trace,
                group_index=first5_index)

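### Between-group and within-group variance  

* In this model definition, the between-group variance comes from `beta_1_offset` and the within-group variance from `sigma_y`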

In [72]:
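# Extract the between-group and within-group variation
para_sum = az.summary(var_slope_trace,
                      var_names=["_offset","sigma_"],
                      filter_vars="like")
between_sd = (para_sum.filter(like='_offset', axis=0)["mean"]**2).sum()
within_sd = para_sum.loc['sigma_y','mean']**2
# Compute the share of each component
var = between_sd + within_sd
print("Share explained by between-group variance:", between_sd/var)
print("Share explained by within-group variance:", within_sd/var)
print("Intraclass correlation:", between_sd/var)

Share explained by between-group variance: 0.04608346045572735
Share explained by within-group variance: 0.9539165395442726
Intraclass correlation: 0.04608346045572735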
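
One caveat for the varying-slope model: because $b_{1j}$ is multiplied by $X_{ij}$, the between-site variance of $\mu_{ij}$ at a given stress level is $\sigma_1^2 X_{ij}^2$, so the between-site share of variance changes with $X$ rather than being a single number. The sketch below evaluates that share on a hypothetical grid of stress values with made-up $\sigma_1$ and $\sigma_y$; `between_share` is an illustrative helper, not part of the analysis above.

```python
import numpy as np

def between_share(x, sigma_1, sigma_y):
    """Share of Var(Y | x) due to between-site slope variation.

    Assumes mu_ij = beta_0 + (beta_1 + b_1j) * x with b_1j ~ N(0, sigma_1^2),
    so the between-site variance at x is (sigma_1 * x)^2.
    """
    var_between = (sigma_1 * np.asarray(x, dtype=float)) ** 2
    return var_between / (var_between + sigma_y**2)

x = np.array([0.0, 20.0, 40.0, 60.0])                # hypothetical stress values
share = between_share(x, sigma_1=0.05, sigma_y=6.0)  # made-up standard deviations
print(share)
```

The share is zero at $X = 0$, where every site has the same predicted value, and grows as $X$ moves away from zero.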


## Model3: Hierarchical model with varying intercepts & slopes  

Models 1 and 2 let the intercept and the slope, respectively, vary by site. In model 3, we let both the intercept and the slope differ across sites.  

$$  
\beta_{0j} | \beta_0, \sigma_0 \sim N(\beta_0, \sigma_0^2)  
\;\;\;\; \text{ and } \;\;\;\;  
\beta_{1j} | \beta_1, \sigma_1 \sim N(\beta_1, \sigma_1^2)  
$$  

**Model definition, in summary:**  

$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0j}, \beta_{1j}, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = \beta_{0j} + \beta_{1j} X_{ij} & \text{(linear model within each site)} \\  
\beta_{0j} | \beta_0, \sigma_0  & \stackrel{ind}{\sim} N(\beta_0, \sigma_0^2) & \text{(variability in intercepts between sites)} \\  
\beta_{1j} | \beta_1, \sigma_1  & \stackrel{ind}{\sim} N(\beta_1, \sigma_1^2) & \text{(variability in slopes between sites)} \\  
\beta_{0}  & \sim N(0, 50^2) & \text{(priors on global parameters)} \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_0 & \sim \text{Exp}(1)    & \\  
\sigma_1 & \sim \text{Exp}(1)    & \\  
\sigma_y & \sim \text{Exp}(1).    & \\  
\end{array}  
$$  

or:  
$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0}, b_{0j}, \beta_{1}, b_{1j}, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = (\beta_{0} +b_{0j}) + (\beta_{1} + b_{1j}) X_{ij} & \text{(linear model within each site)} \\  
b_{0j} | \sigma_0 & \stackrel{ind}{\sim} N(0, \sigma_0^2) & \text{(variability in intercepts between sites)} \\  
b_{1j} | \sigma_1 & \stackrel{ind}{\sim} N(0, \sigma_1^2) & \text{(variability in slopes between sites)} \\  
\beta_{0}  & \sim N(0, 50^2) & \text{(priors on global parameters)} \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_y & \sim \text{Exp}(1)    & \\  
\sigma_0 & \sim \text{Exp}(1)    & \\  
\sigma_1 & \sim \text{Exp}(1).    & \\  
\end{array}  
$$  
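
Read top-down, the second formulation is a generative recipe: draw site-level deviations, build each observation's mean, then add observation noise. A minimal NumPy simulation of that recipe follows. Every numeric value (global parameters, standard deviations, sample sizes, stress range) is hypothetical and chosen only to make the example run.

```python
import numpy as np

rng = np.random.default_rng(3)
n_sites, n_per_site = 5, 40
beta_0, beta_1 = 63.0, -0.55                  # hypothetical global intercept / slope
sigma_0, sigma_1, sigma_y = 1.5, 0.05, 6.0    # hypothetical standard deviations

# Layer 2: site-specific deviations of intercept and slope
b0 = rng.normal(0.0, sigma_0, size=n_sites)
b1 = rng.normal(0.0, sigma_1, size=n_sites)

# Layer 1: observations within each site
site = np.repeat(np.arange(n_sites), n_per_site)   # site index of every observation
x = rng.uniform(20, 60, size=site.size)            # made-up stress scores
mu = (beta_0 + b0[site]) + (beta_1 + b1[site]) * x
y = rng.normal(mu, sigma_y)
```

The indexing `b0[site]` plays the same role as `beta_0j[site]` in the PyMC code: it expands one value per site into one value per observation.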


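**Hierarchical model vs. no-pooling model**  

$$  
\begin{array}{rl}  
\beta_0 & \sim N(0, 50^2) \\  
\sigma_0 & \sim \text{Exp}(1) \\  
\beta_1 & \sim N(0, 5^2) \\  
\sigma_1 & \sim \text{Exp}(1)  
\end{array}  
$$  

* In the no-pooling model, we assume that the intercept, the slope, and the variability all differ across sites;  
* In the hierarchical model, we still use information from the population: the site-specific slopes/intercepts are drawn from distributions centered on the global slope/intercept.  
	* Note that hierarchical models usually do not assume that *the variability changes with the grouping variable*, which is an important difference between the hierarchical and no-pooling models.  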


In [73]:
# Define a function that builds and samples the model
def run_var_both_model(non_centered = False):

    # Define the coordinates: sites and observation index
    coords = {"site": df_first5["Site"].unique(),
            "obs_id": df_first5.obs_id}

    with pm.Model(coords=coords) as model:
        # Global parameters
        beta_0 = pm.Normal("beta_0", mu=0, sigma=50)
        beta_0_sigma = pm.Exponential("beta_0_sigma", 1)
        beta_1 = pm.Normal("beta_1", mu=0, sigma=5) 
        beta_1_sigma = pm.Exponential("beta_1_sigma", 1)
        sigma_y = pm.Exponential("sigma_y", 1) 

        # Pass in the predictor and the site index of each observation
        x = pm.MutableData("x", df_first5.stress, dims="obs_id")
        site = pm.MutableData("site", df_first5.site_idx, dims="obs_id") 

        # Choose between the two parameterizations
        if non_centered:
            beta_0_offset = pm.Normal("beta_0_offset", 0, sigma=1, dims="site")
            beta_0j = pm.Deterministic("beta_0j", beta_0 + beta_0_offset * beta_0_sigma, dims="site")
            beta_1_offset = pm.Normal("beta_1_offset", 0, sigma=1, dims="site")
            beta_1j = pm.Deterministic("beta_1j", beta_1 + beta_1_offset * beta_1_sigma, dims="site")
        else:
            beta_0j = pm.Normal("beta_0j", mu=beta_0, sigma=beta_0_sigma, dims="site")
            beta_1j = pm.Normal("beta_1j", mu=beta_1, sigma=beta_1_sigma, dims="site")

        # Linear predictor
        mu = pm.Deterministic("mu", beta_0j[site]+beta_1j[site]*x, dims="obs_id")

        # Likelihood
        likelihood = pm.Normal("y_est", mu=mu, sigma=sigma_y, observed=df_first5.scontrol, dims="obs_id")

        trace = pm.sample(draws=5000,           # MCMC sampling; draws is the number of posterior draws per chain
                            tune=1000,                    # tune is the number of tuning (warm-up) iterations per chain
                            chains=4,                     # number of chains
                            discard_tuned_samples= True,  # tuning samples are discarded after sampling
                            random_seed=84735,
                            target_accept=0.99)
    
    return model, trace

In [74]:
# Note: the code below may take about 10 minutes to run

var_both_model, var_both_trace = run_var_both_model(non_centered = True)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_0, beta_0_sigma, beta_1, beta_1_sigma, sigma_y, beta_0_offset, beta_1_offset]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 184 seconds.
There were 7 divergences after tuning. Increase `target_accept` or reparameterize.


In [75]:
pm.model_to_graphviz(var_both_model)

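###  MCMC sampling & posterior parameter estimates  

* The slopes $\beta_{1j}$ and intercepts $\beta_{0j}$ of the 5 regression lines all differ.  
* Each $\beta_{1j}$ and $\beta_{0j}$ equals the global $\beta_{1}$ or $\beta_{0}$ plus a site-specific deviation.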
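
The "global value plus site-specific deviation" idea is what the non-centered branch of the model expresses. The following is a minimal NumPy sketch of that idea outside PyMC; the values of `beta_0` and `beta_0_sigma` are made up for illustration and are not estimates from the data.

```python
import numpy as np

rng = np.random.default_rng(84735)

# Hypothetical global parameters (illustrative values, not posterior estimates)
beta_0, beta_0_sigma = 63.0, 1.5
n_draws = 100_000

# Centered form: draw the site-level intercept directly
centered = rng.normal(loc=beta_0, scale=beta_0_sigma, size=n_draws)

# Non-centered form: draw a standard-normal offset, then shift and scale it
offset = rng.normal(loc=0.0, scale=1.0, size=n_draws)
non_centered = beta_0 + offset * beta_0_sigma

# Both forms describe the same distribution for beta_0j
print(centered.mean(), centered.std())
print(non_centered.mean(), non_centered.std())
```

The two forms imply the same distribution for the site-level intercept; they differ only in which quantity the sampler explores, which is why reparameterizing can reduce divergences.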

In [76]:
var_both_para = az.summary(var_both_trace,
                            var_names=["beta_0j","beta_1j"],
                            filter_vars="like")
var_both_para

Unnamed: 0,mean,sd,hdi_3%,hdi_97%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
beta_0j[Kassel],63.167,2.155,59.211,67.271,0.018,0.013,14939.0,14374.0,1.0
beta_0j[Portugal],63.805,2.611,59.138,68.908,0.027,0.019,9768.0,13221.0,1.0
beta_0j[Southampton],62.516,2.418,57.956,67.008,0.018,0.013,17485.0,14642.0,1.0
beta_0j[Tsinghua],61.829,2.152,57.911,66.012,0.017,0.012,15995.0,14840.0,1.0
beta_0j[UCSB],62.983,2.195,58.723,67.002,0.017,0.012,16092.0,14383.0,1.0
beta_1j[Kassel],-0.553,0.054,-0.653,-0.451,0.0,0.0,15507.0,14851.0,1.0
beta_1j[Portugal],-0.496,0.074,-0.63,-0.359,0.001,0.001,8478.0,13434.0,1.0
beta_1j[Southampton],-0.571,0.069,-0.701,-0.439,0.001,0.0,18669.0,15095.0,1.0
beta_1j[Tsinghua],-0.557,0.053,-0.654,-0.454,0.0,0.0,16720.0,15209.0,1.0
beta_1j[UCSB],-0.583,0.053,-0.682,-0.485,0.0,0.0,16193.0,14467.0,1.0


In [77]:
# Set up the axes
figs, (ax1, ax2) = plt.subplots(1,2, figsize = (20,5))
# Plot the varying intercepts
az.plot_forest(var_both_trace,
           var_names=["~mu", "~sigma", "~offset", "~beta_1"],
           filter_vars="like",
           combined = True,
           ax=ax1)
# Plot the varying slopes
az.plot_forest(var_both_trace,
           var_names=["~mu", "~sigma", "~offset", "~beta_0"],
           filter_vars="like",
           combined = True,
           ax=ax2)
plt.show()

### Posterior regression lines  
* The intercepts and slopes of the 5 regression lines all differ.

In [78]:
# Define a function that plots the posterior regression line for each site
def plot_partial_regression(data, trace, group_index):
    # Create the figure with one column per site
    fig, ax = plt.subplots(1,len(data["Site"].unique()), 
                       sharex=True,
                       sharey=True,
                       figsize=(15,5))
    
    # Draw one panel per site
    # We need the raw data and the posterior mean of mu for each observation
    # Both are stored in the sampling result, i.e. the trace used here
    for i, group in enumerate(data["Site"].unique()):
        # Scatter plot of the observed data
        x = trace.constant_data.x.sel(obs_id = group_index[f"{group}"])
        y = trace.observed_data.y_est.sel(obs_id = group_index[f"{group}"])
        mu = trace.posterior.mu.sel(obs_id = group_index[f"{group}"])
        ax[i].scatter(x, y,
                color=f"C{i}",
                alpha=0.5)
        # Regression line (posterior mean of mu)
        ax[i].plot(x, mu.stack(sample=("chain","draw")).mean(dim="sample"),
                color=f"C{i}",
                alpha=0.5)
        ax[i].set_title(f"Slope: {var_both_para.loc[f'beta_1j[{group}]']['mean']}\nIntercept: {var_both_para.loc[f'beta_0j[{group}]']['mean']}", 
        fontsize=12)
        # 95% HDI of the predicted mean
        az.plot_hdi(
            x, mu,
            hdi_prob=0.95,
            fill_kwargs={"alpha": 0.25, "linewidth": 0},
            color=f"C{i}",
            ax=ax[i])
        
    # x-axis label
    fig.text(0.5, 0, 'Stress', ha='center', va='center', fontsize=12)
    # y-axis label
    fig.text(0.08, 0.5, 'Self control', ha='center', va='center', rotation='vertical', fontsize=12)
    # Figure title
    plt.suptitle("Posterior regression models (varying slope and intercept)", fontsize=15, y=1.05)
        
    sns.despine()


In [79]:
plot_partial_regression(data=df_first5,
                trace=var_both_trace,
                group_index=first5_index)

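### Between-group and within-group variance  

* In this model definition, the between-group variation is taken from `beta_0_offset` and `beta_1_offset` (the standardized site deviations, which the model scales by `beta_0_sigma` and `beta_1_sigma`), and the within-group variance comes from `sigma_y`.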
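
As a general reference point, for a varying-intercept structure the intraclass correlation is commonly written as $\sigma_0^2 / (\sigma_0^2 + \sigma_y^2)$. The sketch below shows that arithmetic with a hypothetical helper `variance_shares` and made-up standard deviations; it illustrates the general idea and is not the exact computation used in the cell of this notebook.

```python
def variance_shares(between_sd, within_sd):
    """Return (between share, within share) of total variance from two standard deviations."""
    between_var = between_sd ** 2
    within_var = within_sd ** 2
    total = between_var + within_var
    return between_var / total, within_var / total

# Hypothetical standard deviations, chosen only for illustration
between_share, within_share = variance_shares(between_sd=1.0, within_sd=5.0)
print(between_share, within_share)
```

The between-group share is the intraclass correlation: values near 0 mean that observations within a site are barely more alike than observations from different sites.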

In [80]:
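# Extract between-group and within-group variation
para_sum = az.summary(var_both_trace,
                      var_names=["_offset","sigma_"],
                      filter_vars="like")
between_sd = (para_sum.filter(like='_offset', axis=0)["mean"]**2).sum()
within_sd = para_sum.loc['sigma_y','mean']**2
# Compute each share of the total variation
var = between_sd + within_sd
print("Share explained by between-group variance:", between_sd/var)
print("Share explained by within-group variance:", within_sd/var)
print("Intraclass correlation:",between_sd/var)


Share explained by between-group variance: 0.04630475826414934
Share explained by within-group variance: 0.9536952417358506
Intraclass correlation: 0.04630475826414934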


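## Evaluating posterior predictions  

* In earlier lectures we introduced two ways of evaluating posterior predictions.  

* The first is **MAE**, the median of the absolute prediction errors between the posterior predictive values and the observed values. The second is **within_95**, which checks whether each observed value falls inside its 95% posterior predictive interval.  

* Here we reuse the functions written earlier for both metrics and evaluate the posterior predictions of all five models.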
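
To make the two metrics concrete, here is a minimal NumPy sketch on synthetic data. The arrays `y_obs` and `y_pred` are made-up stand-ins for observed values and posterior predictive draws, and the interval is an equal-tailed percentile interval rather than the HDI used in the notebook's own function, so treat it as an approximation of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 observations, 1000 posterior predictive draws each
y_obs = rng.normal(loc=40, scale=5, size=200)
y_pred = rng.normal(loc=40, scale=5, size=(1000, 200))

# MAE: median of |observed - posterior predictive mean| across observations
abs_error = np.abs(y_obs - y_pred.mean(axis=0))
mae = np.median(abs_error)

# within_95: share of observations inside the central 95% predictive interval
lower, upper = np.percentile(y_pred, [2.5, 97.5], axis=0)
within_95 = np.mean((y_obs >= lower) & (y_obs <= upper))

print(mae, within_95)
```

A lower MAE means the predictive means sit closer to the data, while a within_95 value near 0.95 suggests the predictive intervals are reasonably calibrated.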

In [81]:
# Run posterior predictive sampling for each model
complete_ppc = pm.sample_posterior_predictive(complete_trace, 
                                            model = complete_pooled_model,
                                            random_seed=84735)
no_ppc = pm.sample_posterior_predictive(no_trace, 
                                        model = no_pooled_model,
                                        random_seed=84735)   
var_inter_ppc = pm.sample_posterior_predictive(var_inter_trace,
                                                model = var_inter_model,
                                                random_seed=84735)
var_slope_ppc = pm.sample_posterior_predictive(var_slope_trace,
                                                model = var_slope_model,
                                                random_seed=84735)                                                                                       
var_both_ppc = pm.sample_posterior_predictive(var_both_trace, 
                                            model = var_both_model,
                                            random_seed=84735)

Sampling: [y_est]


Sampling: [y_est]


Sampling: [y_est]


Sampling: [y_est]


Sampling: [y_est]


In [82]:
# Define a function that computes the MAE
from statistics import median
def MAE(model_ppc):
    # Posterior predictive mean for each observation
    pre_x = model_ppc.posterior_predictive["y_est"].stack(sample=("chain", "draw"))
    pre_y_mean = pre_x.mean(axis=1).values

    # Pair each observed Y with its posterior predictive mean
    MAE = pd.DataFrame({
        "scontrol_ppc_mean": pre_y_mean,
        "scontrol_original": model_ppc.observed_data.y_est.values
    })

    # Absolute prediction error
    MAE["pre_error"] = abs(MAE["scontrol_original"] -\
                            MAE["scontrol_ppc_mean"])

    # Finally, take the median of the prediction errors
    MAE = median(MAE.pre_error)
    return MAE


In [83]:
# Define a function that counts observations falling outside the posterior predictive HDI
def counter_outlier(model_ppc, hdi_prob=0.95):
    # Store the az.summary result in hdi, which is a data frame
    hdi = az.summary(model_ppc, kind="stats", hdi_prob=hdi_prob)
    lower = hdi.iloc[:,2].values
    upper = hdi.iloc[:,3].values

    # Observed self-control scores, used for the check below
    y_obs = model_ppc.observed_data["y_est"].values

    # Flag observed self-control scores that fall outside the 95% posterior predictive interval, then count them
    hdi["verify"] = (y_obs <= lower) | (y_obs >= upper)
    hdi["y_obs"] = y_obs
    hdi_num = sum(hdi["verify"])

    return hdi_num

In [84]:
# Store the PPC of each model in a list
ppc_samples_list = [complete_ppc, no_ppc, var_inter_ppc, var_slope_ppc, var_both_ppc]
model_names = ["Complete pooling", "No pooling", "Varying intercepts", "Varying slopes", "Varying intercepts and slopes"]

# Create an empty list to hold the results
results_list = []

# Loop over the models and compute the MAE and the number of values outside the 95% HDI
for model_name, ppc_samples in zip(model_names, ppc_samples_list):
    outliers = counter_outlier(ppc_samples)
    MAEs = MAE(ppc_samples)
    results_list.append({'Model': model_name, 'MAE':MAEs, 'Outliers': outliers})

# Build a DataFrame from the list of results
results_df = pd.DataFrame(results_list)

results_df


Unnamed: 0,Model,MAE,Outliers
0,Complete pooling,3.931585,24
1,No pooling,3.820402,23
2,Varying intercepts,3.946004,21
3,Varying slopes,3.96137,17
4,Varying intercepts and slopes,3.904573,19


## Model comparison  

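The model comparison results show that:  
* The model with both varying intercepts and varying slopes (model3) ranks best, which corresponds to our hypothesis 3.  
* Notably, the no-pooling model (no pool model) also allows intercepts and slopes to differ across sites, yet it is only slightly better than the complete-pooling model.  
* In addition, the elpd values of all models are very close (the differences are small relative to standard errors of roughly 16). Model comparison should therefore be treated as a reference only; checking model performance with posterior predictive checks matters more.  

Model hypotheses:  
* H0 (model 0): ordinary linear model that only considers the effect of stress on self-control.  
* H1 (model 1): varying-intercept model, which extends model 0 by letting self-control differ across sites.  
* H2 (model 2): varying-slope model, which extends model 0 by letting the effect of stress differ across sites.  
* H3 (model 3): varying-intercept and varying-slope model, which combines model 1 and model 2 so that both self-control and the effect of stress differ across sites.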
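
One way to judge how close the models are is to set each `elpd_diff` against its standard error `dse`, and against the standard error of the elpd itself. The sketch below does this arithmetic with values copied from the `az.compare` output in this section; the dictionary name `comparison` and the choice to look at these ratios are illustrative assumptions, not part of the original analysis.

```python
# (elpd_diff, dse) copied from the az.compare output in this section
comparison = {
    "model1(hierarchical intercept)": (0.608909, 0.891235),
    "model2(hierarchical slope)": (0.849875, 0.397992),
    "no pool model": (2.064112, 5.174477),
    "model0(complete pool)": (4.506958, 3.339691),
}

# Ratio of each difference to the standard error of that difference
ratios = {name: diff / dse for name, (diff, dse) in comparison.items()}

# Largest difference relative to the smallest reported se of elpd_loo (15.951833)
largest_diff = max(diff for diff, _ in comparison.values())
relative_to_se = largest_diff / 15.951833

for name, ratio in ratios.items():
    print(f"{name}: elpd_diff / dse = {ratio:.2f}")
print(f"largest elpd_diff / smallest se = {relative_to_se:.2f}")
```

Even the largest difference is a fraction of the standard error of the elpd estimates, which is consistent with treating the ranking as suggestive rather than decisive.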

In [85]:
pm.compute_log_likelihood(complete_trace, model=complete_pooled_model)
pm.compute_log_likelihood(no_trace, model=no_pooled_model)
pm.compute_log_likelihood(var_inter_trace, model=var_inter_model)
pm.compute_log_likelihood(var_slope_trace, model=var_slope_model)
pm.compute_log_likelihood(var_both_trace, model=var_both_model)

In [86]:
comparison_list = {
    "model0(complete pool)":complete_trace,
    "model1(hierarchical intercept)":var_inter_trace,
    "model2(hierarchical slope)":var_slope_trace,
    "model3(hierarchy both)":var_both_trace,
    "no pool model":no_trace
}
az.compare(comparison_list)

Unnamed: 0,rank,elpd_loo,p_loo,elpd_diff,weight,se,dse,warning,scale
model3(hierarchy both),0,-1363.340492,7.510265,0.0,0.281716,16.05533,0.0,False,log
model1(hierarchical intercept),1,-1363.949401,6.770135,0.608909,0.0,16.021236,0.891235,False,log
model2(hierarchical slope),2,-1364.190367,7.281886,0.849875,0.0,15.976502,0.397992,False,log
no pool model,3,-1365.404604,18.680394,2.064112,0.512647,17.852656,5.174477,True,log
model0(complete pool),4,-1367.84745,3.117897,4.506958,0.205637,15.951833,3.339691,False,log


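## Predicting data for a new site  

* We can use the fitted hierarchical model to predict data for a new group, such as the "Zurich" site.  

* In pymc, passing the posterior samples of a fitted model to `pm.sample_posterior_predictive` is enough to generate predictions for new data on the basis of that model.  

* The predictions are stored in `.predictions`.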
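
Conceptually, a site that contributed no data gets its offset from the $N(0, 1)$ prior, combined with the posterior draws of the global parameters. Below is a minimal NumPy sketch of that step; the arrays standing in for posterior draws are synthetic and their distributions are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(84735)
n_draws = 20_000

# Synthetic stand-ins for posterior draws of the global parameters
beta_0 = rng.normal(63.0, 2.0, size=n_draws)        # global intercept
beta_0_sigma = rng.exponential(1.0, size=n_draws)   # between-site SD of intercepts

# A new site has no data, so its offset is drawn from the N(0, 1) prior
new_offset = rng.normal(0.0, 1.0, size=n_draws)
new_beta_0j = beta_0 + new_offset * beta_0_sigma

# The new-site intercept carries global uncertainty plus between-site spread
print(beta_0.std(), new_beta_0j.std())
```

Because the new-site intercept adds between-site spread on top of the uncertainty in the global intercept, its distribution is wider than that of the global intercept alone.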

In [87]:
# Select the data for the "Zurich" site
new_group = df_raw[df_raw.Site=="Zurich"]
# Create the participant index
new_group["obs_id"] = range(len(new_group))
# Create the site index
new_group["site_idx"] = pd.factorize(new_group.Site)[0]

In [88]:
new_coords = {"site": new_group["Site"].unique(),
          "obs_id": new_group.obs_id}

with pm.Model(coords=new_coords) as hier_pred:
    # Global parameters (unchanged from the fitted model)
    beta_0 = pm.Normal("beta_0", mu=40, sigma=20)
    beta_0_sigma = pm.Exponential("beta_0_sigma", 1)
    beta_1 = pm.Normal("beta_1", mu=0, sigma=5) 
    beta_1_sigma = pm.Exponential("beta_1_sigma", 1)
    sigma_y = pm.Exponential("sigma_y", 1) 

    # Pass in the predictor
    x = pm.MutableData("x", new_group.stress, dims="obs_id")
    # Map each observation to its site
    site = pm.MutableData("site", new_group.site_idx, dims="obs_id") 
    
    # Note: the site-level parameters need new names because they belong to a new site (all other variable names are unchanged)
    new_beta_0_offset = pm.Normal("new_beta_0_offset", 0, sigma=1, dims="site")
    new_beta_0j = pm.Deterministic("new_beta_0j", beta_0 + new_beta_0_offset * beta_0_sigma, dims="site")
    new_beta_1_offset = pm.Normal("new_beta_1_offset", 0, sigma=1, dims="site")
    new_beta_1j = pm.Deterministic("new_beta_1j", beta_1 + new_beta_1_offset * beta_1_sigma, dims="site")
    # The linear predictor is a deterministic function of the parameters, as in the fitted model
    new_mu = pm.Deterministic("new_mu",  new_beta_0j[site]+new_beta_1j[site]*x, dims="obs_id")

    # Likelihood
    likelihood = pm.Normal("y_est", mu=new_mu, sigma=sigma_y, observed=new_group.scontrol, dims="obs_id")

    # Posterior predictive sampling; note that it reuses the posterior draws of the previous model, var_both_trace
    pred_trace = pm.sample_posterior_predictive(var_both_trace,
                                                var_names=["new_beta_0j","new_beta_1j"],
                                                predictions=True,
                                                extend_inferencedata=True,
                                                random_seed=84735)

Sampling: [new_beta_0_offset, new_beta_1_offset]


In [89]:
pred_trace

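### In-group vs. out-of-group prediction  

* For both the original sites and the new site, we now have sampled estimates of the site-specific slope and intercept.  

    * For the original sites, the intercept and slope are named beta_0j and beta_1j; for the new site they are named new_beta_0j and new_beta_1j.  

* Suppose we want to know the self-control score when the stress score is 40. We can predict it with $\mu_{ij} = \beta_{0j} + \beta_{1j} \cdot 40$.  

* MCMC sampling produced 20000 pairs of parameter values for each site, $\left\lbrace \beta_{0j}^{(i)}, \beta_{1j}^{(i)}\right\rbrace$. Plugging in the X value therefore yields 20000 predicted values per site.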

In [90]:
# Create an empty DataFrame to store the predictions
col = df_first5["Site"].unique()
pred_result = pd.DataFrame(columns=col)
# For each site, plug the posterior draws into the formula and store the result in a separate column
for site in df_first5["Site"].unique():
    pred = (40 * (var_both_trace.posterior['beta_1j'].sel(site = f"{site}")) +\
    (var_both_trace.posterior['beta_0j'].sel(site = f"{site}"))).stack(sample=("chain","draw")).values
    pred_result[site] = pred

In [91]:
# For the new site, extract the corresponding parameters, plug them into the formula, and store the result in a new column
pred_trace = pred_trace.predictions.stack(sample=("chain","draw","site"))
new_group_pred = (40 * (pred_trace['new_beta_1j']) + (pred_trace['new_beta_0j'])).values
pred_result[new_group["Site"].unique()[0]] = new_group_pred 

pred_result

Unnamed: 0,Kassel,Portugal,Southampton,Tsinghua,UCSB,Zurich
0,39.562814,45.112704,40.492186,39.751103,39.251714,37.331538
1,41.897816,47.252798,37.287959,39.295603,40.789557,32.677484
2,41.107089,42.565081,40.537043,39.151684,39.239426,42.179472
3,40.033072,46.059016,40.606599,39.628171,39.119121,43.557299
4,40.813528,45.432646,39.480795,38.963377,39.450079,47.701150
...,...,...,...,...,...,...
19995,39.694040,45.579483,39.947529,39.193291,39.105715,41.863919
19996,40.663213,42.510325,41.701204,40.259021,40.430954,38.299519
19997,41.125972,45.183243,40.863587,40.201528,39.839498,42.452076
19998,41.519790,46.113326,40.331410,39.348882,39.910313,44.454457


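**Plotting the predictive density**  

* Finally, we plot the distribution of predictions for each site.  

* Sites with more data points give more concentrated predictions, while sites with fewer data points give more dispersed ones.  

* The predictions for the new site show the largest variability.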

In [92]:
# Create the figure with one row per column of pred_result
fig, ax = plt.subplots(len(pred_result.columns),1, figsize = (8,8),
                       sharex=True,
                       sharey=True)
# For each site, plot the density of its predictions
for i,site in enumerate(pred_result.columns):
    az.plot_kde(pred_result[site],
                fill_kwargs={"alpha": 0.5},
                quantiles=[.25, .75],
                ax=ax[i])
    # Set the y-axis label and ticks
    ax[i].set_ylabel(f'{site}', rotation=0, labelpad=40)
    ax[i].set_yticks([])
# Set the x-axis range
plt.xlim([20,60])
# Set the title
plt.suptitle("Posterior predictive models for Self Control(X = 40)",
             y=0.95,
             fontsize=15)
sns.despine()

## bambi code  

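* When defining a hierarchical model, bambi treats each group-specific parameter (intercept or slope) as a common part plus a group-specific deviation (the non-centered definition in pymc).  

| Model  | Formula    |  
| -------- | ----------|  
| complete pooling |"scontrol ~ stress" |  
| no pooling | "scontrol ~ 0 + stress:Site" |  
| varying intercepts | "scontrol ~ stress + (1\|Site)" |  
| varying slopes  | "scontrol ~ stress + (0 + stress\|Site)" |  
| varying intercepts and slopes  |"scontrol ~ 1 + stress + (1 + stress\|Site)"  |  

* In bambi, 1 adds a regression intercept and 0 removes it.  
* (...|Site) denotes a hierarchical (group-specific) term  
	* (1|Site): group-specific variation in the intercept only  
	* (0 + stress|Site): group-specific variation in the slope only  
	* (1 + stress|Site): group-specific variation in both intercept and slope, which can also be written as (stress|Site)  
* Note that the no-pooling formula above is not a true no-pooling model  
	* "scontrol ~ 0 + stress:Site" only assumes that the slopes differ, with the slope of each site independent of the others.  
	* Therefore, from here on we use bambi only to define hierarchical models, not no-pooling models.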

In [93]:
# complete-pooled
complete_bmb = bmb.Model("scontrol ~ stress",
                         df_first5,
                         family="gaussian")
complete_bmb.build()
complete_bmb.graph()

In [94]:
# no-pooled
no_bmb = bmb.Model("scontrol ~ 0 + stress:Site",
                      df_first5)
no_bmb.build()
no_bmb.graph()

In [95]:
# group-specific intercepts
inter_bmb = bmb.Model("scontrol ~ stress + (1|Site)",
                      df_first5,
                      categorical="Site")

inter_bmb.build()
inter_bmb.graph()

In [96]:
# group-specific slopes
slope_bmb = bmb.Model("scontrol ~ stress + (0 + stress|Site)",
                      df_first5,
                      categorical="Site")
slope_bmb.build()
slope_bmb.graph()

In [97]:
# group specific intercept and slope
var_both_bmb = bmb.Model("scontrol ~ 1 + stress + (stress|Site)",
                         df_first5,
                         categorical="Site")

var_both_bmb.build()
var_both_bmb.graph()

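## Supplement: group-level predictors in hierarchical models  

🤔 In the example above, we considered the effect of individual stress on self-control, separately for each site. But have we considered how other characteristics of the sites might affect self-control?  

* For example, the core question of the original dataset is the relationship between temperature and social behavior. The sites correspond to different countries and regions, so does the average temperature of a region affect self-control?  

    * Because the data were collected in summer, we looked up the average summer temperature of each site and use it as a group-level predictor.  

* A group-level predictor is a characteristic defined at the level of the group (site), as opposed to an individual-level (within-group) predictor.  
    * For example, the individual stress score (stress) is an individual-level predictor. It takes different values for different individuals.  
    * The average summer temperature of a region is a group-level predictor. It differs between sites but is the same for all individuals within a site.  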
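
The distinction can be checked directly in a data frame: a group-level predictor has exactly one distinct value within each group. The following pandas sketch uses a toy data frame with made-up values and a hypothetical helper `is_group_level`; the column names simply mirror those used in this lecture.

```python
import pandas as pd

# Toy data frame with hypothetical values, mirroring the columns used in this lecture
toy = pd.DataFrame({
    "Site": ["A", "A", "B", "B"],
    "stress": [30, 47, 39, 45],
    "avetemp": [17.6, 17.6, 24.8, 24.8],
})

def is_group_level(df, column, group="Site"):
    """True if `column` takes a single value within every group."""
    return bool((df.groupby(group)[column].nunique() == 1).all())

print(is_group_level(toy, "avetemp"))  # constant within each site
print(is_group_level(toy, "stress"))   # varies between individuals
```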

![Image Name](https://cdn.kesci.com/upload/s5wfm7n7r8.png?imageView2/0/w/960/h/960)  





In [98]:
# Select the variables we need
df_temp =df_first5[["site_idx","obs_id","Site","stress","scontrol"]].reset_index(drop=True)
levels = df_temp['Site'].unique()

# Average summer temperature of each site
level_mapping = {
    levels[0]: 17.6,
    levels[1]: 24.8,
    levels[2]: 19.9,
    levels[3]: 30.3,
    levels[4]: 19
}

# Add the temperature information to the data frame
df_temp['avetemp'] = df_temp['Site'].map(level_mapping)
df_temp 

Unnamed: 0,site_idx,obs_id,Site,stress,scontrol,avetemp
0,0,0,Kassel,30,47,17.6
1,0,1,Kassel,30,44,17.6
2,0,2,Kassel,31,47,17.6
3,0,3,Kassel,47,37,17.6
4,0,4,Kassel,50,33,17.6
...,...,...,...,...,...,...
410,4,410,UCSB,48,36,19.0
411,4,411,UCSB,45,32,19.0
412,4,412,UCSB,27,40,19.0
413,4,413,UCSB,46,31,19.0


### Effect of average summer temperature on mean self-control  

* Because we only consider data from 5 sites, there are only 5 data points.

In [99]:
sns.regplot(x="avetemp", y="scontrol", data=df_temp.groupby("Site").mean())
sns.despine()

### Building a hierarchical model with a group-level predictor  

* We build on the **varying-intercept** model.  

* Recall the earlier hierarchical model with varying intercepts only:  

$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0j}, \beta_1, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = \beta_{0j} + \beta_1 X_{ij} & \text{(Layer 1: linear model within each site)} \\  
\beta_{0j} | \beta_0, \sigma_0  & \stackrel{ind}{\sim} N(\beta_0, \sigma_0^2) & \text{(Layer 2: variation of intercepts across sites)} \\  
\beta_{0}  & \sim N(0, 50^2) & \text{(Layer 3: priors on global parameters)} \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_y & \sim \text{Exp}(1)    & \\  
\sigma_0 & \sim \text{Exp}(1).    & \\  
\end{array}  
$$  

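🤔 If we want to include a group-level predictor, in which layer should it go?  
* Because a group-level predictor is a characteristic of the group (site $j$), it mainly affects how the intercept varies across sites.  
* The formula is: $\beta_{0j} = \gamma_0 + \gamma_1 U_j$  
  * Here $U_j$ is the group-level predictor of site $j$.  
  * $\gamma_1$ is the regression coefficient of the group-level predictor $U_j$, and $\gamma_0$ is the intercept of this site-level regression.  
  * Therefore, the intercept of each site, $\beta_{0j}$, depends on the group-level predictor $U_j$.  
* This can also be written as $\mu_{ij} = \beta_{0j} + \beta_1 X_{ij} = (\gamma_0 + \gamma_1 U_j) + \beta_1 X_{ij}$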
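
The expanded formula can be evaluated directly. The NumPy sketch below uses made-up values for $\gamma_0$, $\gamma_1$ and $\beta_1$ and two hypothetical sites; it computes only the mean structure and leaves out the site-level noise, so it is an illustration of the formula rather than a fitted model.

```python
import numpy as np

# Hypothetical parameter values, for illustration only
gamma_0, gamma_1, beta_1 = 60.0, 0.1, -0.5

u = np.array([17.6, 24.8])               # group-level predictor U_j, one value per site
site_idx = np.array([0, 0, 1, 1])        # site index j of each observation
x = np.array([30.0, 40.0, 30.0, 40.0])   # individual-level predictor X_ij

# Mean site intercept implied by the group-level regression
beta_0j = gamma_0 + gamma_1 * u

# Expected outcome for each observation
mu = beta_0j[site_idx] + beta_1 * x
print(beta_0j, mu)
```

Indexing `beta_0j` by `site_idx` is the same lookup pattern the PyMC models in this lecture use to assign each observation its site-specific intercept.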

**Full model**  

Substitute $\beta_{0j} = \gamma_0 + \gamma_1 U_j$ into the earlier formula and add priors for the new parameters $\gamma_0$ and $\gamma_1$:  

$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0j}, \beta_1, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = \beta_{0j} + \beta_1 X_{ij} & \text{(Layer 1: linear model within each site)} \\  
\beta_{0j} | \gamma_0, \gamma_1, \sigma_0  & \stackrel{ind}{\sim} N(\gamma_0 + \gamma_1 U_j, \sigma_0^2) & \text{(Layer 2: variation of intercepts across sites)} \\  
\gamma_0  & \sim N(0, 50^2) & \text{(Layer 3: priors on global parameters)} \\  
\gamma_1 & \sim N(0, 5^2) & \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_y & \sim \text{Exp}(1)    & \\  
\sigma_0 & \sim \text{Exp}(1).    & \\  
\end{array}  
$$

**An alternative definition**  

* Previously we wrote $\beta_{0j}= \beta_0 + b_{0j}$, that is, a common intercept plus a group-specific deviation.  

* In a model with a group-level predictor, the intercept additionally depends on the group-level variable.  

    * This amounts to replacing the common intercept $\beta_{0}$ with $\gamma_0 + \gamma_1 U_j$  

    * $\beta_{0j} = \gamma_0 + \gamma_1 U_j + b_{0j}$  

    * Substituting $\beta_{0j}$ into $\mu_{ij} = \beta_{0j} + \beta_1 X_{ij}$ gives:  
    
* $\mu_{ij} = (\gamma_0 + \gamma_1 U_j + b_{0j}) + \beta_1 X_{ij}$  

* Putting everything together:  

$$  
\begin{array}{rll}  
Y_{ij} | \beta_{0j}, \beta_1, \sigma_y & \sim N(\mu_{ij}, \sigma_y^2) \;\; \text{ with } \;\;  \mu_{ij} = (\gamma_0 + \gamma_1 U_j + b_{0j}) + \beta_1 X_{ij} & \text{(Layer 1: linear model within each site)} \\  
b_{0j} | \sigma_0 & \stackrel{ind}{\sim} N(0, \sigma_0^2) & \text{(Layer 2: variation of intercepts across sites)} \\  
\gamma_0  & \sim N(0, 50^2) & \text{(Layer 3: priors on global parameters)} \\  
\gamma_1 & \sim N(0, 5^2) & \\  
\beta_1  & \sim N(0, 5^2) & \\  
\sigma_y & \sim \text{Exp}(1)    & \\  
\sigma_0 & \sim \text{Exp}(1).    & \\  
\end{array}  
$$  



In [100]:
# We use the second definition

# Define the coordinates: sites and observation index
coords = {"site": df_temp["Site"].unique(),
        "obs_id": df_temp.obs_id}

with pm.Model(coords=coords) as group_pred_model:
    # Global parameters
    gamma_0 = pm.Normal("gamma_0", mu=0, sigma=50)
    gamma_1 = pm.Normal("gamma_1", mu=0, sigma=5) 
    beta_1 = pm.Normal("beta_1", mu=0, sigma=5) 
    sigma_y = pm.Exponential("sigma_y", 1) 
    sigma_0 = pm.Exponential("sigma_0", 1)

    # Pass in the predictors and map each observation to its site
    x = pm.MutableData("x", df_temp.stress, dims="obs_id")
    # unique() yields one temperature per site here because every site has a distinct value
    u = pm.MutableData("u", df_temp.avetemp.unique(), dims="site")
    site_idx = pm.MutableData("site", df_temp.site_idx, dims="obs_id") 

    # Group-level part of the intercept
    beta_0 = pm.Deterministic("beta_0", gamma_0 + gamma_1*u, dims="site")
    beta_0_offset = pm.Normal("beta_0_offset", 0, sigma=1, dims="site")
    beta_0j = pm.Deterministic("beta_0j", beta_0 + sigma_0*beta_0_offset, dims="site")

    # Linear predictor
    mu = pm.Deterministic("mu", beta_0j[site_idx]+beta_1*x, dims="obs_id")
 
    # Define the likelihood
    likelihood = pm.Normal("y_est", mu=mu, sigma=sigma_y, observed=df_temp.scontrol, dims="obs_id")

    group_pred_trace = pm.sample(draws=5000,           # number of posterior draws kept per chain
                        tune=1000,                    # number of tuning (warm-up) iterations per chain
                        chains=4,                     # number of chains
                        discard_tuned_samples= True,  # drop the tuning draws once sampling finishes
                        random_seed=84735,
                        target_accept=0.99)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [gamma_0, gamma_1, beta_1, sigma_y, sigma_0, beta_0_offset]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 256 seconds.


In [101]:
pm.model_to_graphviz(group_pred_model)

#### bambi code  

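* As models grow more complex, defining a hierarchical model in PyMC becomes harder.  

* Bambi simplifies this. The formula used here is `"scontrol ~ stress + avetemp + (stress|Site)"`  

    * The group-level predictor is simply added to the earlier hierarchical formula. Note that `(stress|Site)` lets both the intercept and the slope vary by site, whereas the PyMC model above varies only the intercept, which corresponds to `(1|Site)`.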

In [102]:
group_pred_bmb = bmb.Model("scontrol ~ stress + avetemp + (stress|Site)",
                      df_temp,
                      categorical="Site")

group_pred_bmb.build()
group_pred_bmb.graph()

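###  MCMC sampling & posterior parameter estimates  

* The summary shows the regression slope $\gamma_{1}$ and intercept $\gamma_{0}$ for the group-level predictor (average summer temperature).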

In [103]:
az.summary(group_pred_trace,
           var_names=["gamma_0","gamma_1","beta_0j","beta_1"])

Unnamed: 0,mean,sd,hdi_3%,hdi_97%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
gamma_0,62.986,5.463,51.922,72.399,0.073,0.052,6179.0,6585.0,1.0
gamma_1,0.014,0.221,-0.368,0.468,0.003,0.003,6030.0,6941.0,1.0
beta_0j[Kassel],63.709,1.978,60.006,67.449,0.016,0.011,15661.0,13706.0,1.0
beta_0j[Portugal],65.793,2.353,61.168,70.013,0.02,0.014,13425.0,13954.0,1.0
beta_0j[Southampton],62.594,2.566,57.563,67.274,0.022,0.015,14134.0,14330.0,1.0
beta_0j[Tsinghua],62.113,1.954,58.447,65.779,0.016,0.011,15072.0,13515.0,1.0
beta_0j[UCSB],62.45,2.077,58.514,66.331,0.017,0.012,14341.0,12870.0,1.0
beta_1,-0.566,0.047,-0.655,-0.477,0.0,0.0,15049.0,12525.0,1.0


In [104]:
group_pred_trace

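### Posterior regression line (effect of site-level average temperature)  

* The plot below shows the relationship between the group-level predictor (average summer temperature) and the intercept.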

In [105]:
# Extract the temperature of each site
u = group_pred_trace.constant_data.u
# Extract the relationship between the group-level intercept and temperature, captured by beta_0: beta_0 = gamma_0 + gamma_1 * u
beta_0 = group_pred_trace.posterior.beta_0.mean(dim=("chain","draw")).values
# Extract the intercept of each site
beta_0j = group_pred_trace.posterior.beta_0j.mean(dim=("chain","draw"))
temp_hdi = az.hdi(group_pred_trace.posterior.beta_0j)
# Plot the posterior mean intercept of each site
plt.scatter(u, beta_0j,
        color="black",
        alpha=0.5,
        label="Mean site-intercept")
# Plot the relationship between intercept and temperature
plt.plot(u, beta_0,
        color="red",
        alpha=0.5,
        label="Mean intercept")
# Plot the 95% HDI of each site's intercept
az.plot_hdi(
        u, group_pred_trace.posterior.beta_0j,
        hdi_prob=0.95,
        fill_kwargs={"alpha": 0.1, "color": "k", "label": "Mean intercept HPD"}
        )
# x-axis label
plt.xlabel('Site-level temperature', fontsize=12)
# y-axis label
plt.ylabel('Intercept estimate', fontsize=12)
plt.legend(loc="upper right")

sns.despine()

## Hierarchical logistic regression

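> In earlier lectures, besides the normal regression model, we also introduced logistic regression, Poisson regression and negative binomial regression. These models can likewise be combined with hierarchical models.  

* In lec13 we used avoidant attachment scores to predict whether a person is in a romantic relationship. If this relationship plays out differently across cultures, we can also take site information into account.  

* When the outcome is a discrete variable, we need a generalized linear model (GLM).  

* Its features are:  
	- The distribution family (dist) is no longer restricted to the normal distribution; other distributions are allowed, such as $y \sim Bernoulli(p)$  
	- A **link function $g()$** is needed to connect $p$ with the linear predictor $\alpha + \beta * x$; its inverse $g^{-1}()$ maps $\alpha + \beta * x$ into the range of $p$  

| General linear model | Generalized linear model |  
|---|---|  
| $y \sim Normal(\mu, \sigma)$ | $y \sim dist(p)$ |  
| $\mu = \alpha + \beta *x$ | $p = g^{-1}(z)$|  
|  | $z = \alpha + \beta *x$|  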
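
For a Bernoulli outcome the usual link is the logit, whose inverse is the logistic function. A minimal NumPy sketch of the mapping between the linear predictor $z$ and the probability $p$ follows; the coefficient values `alpha` and `beta` and the predictor values are made up for illustration.

```python
import numpy as np

def logit(p):
    """Link function: maps a probability in (0, 1) to the real line."""
    return np.log(p / (1 - p))

def inv_logit(z):
    """Inverse link: maps a linear predictor z to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Hypothetical coefficients and predictor values, for illustration only
alpha, beta = 0.2, -0.8
x = np.array([-2.0, 0.0, 2.0])

z = alpha + beta * x
p = inv_logit(z)
print(p)
```

Whatever value the linear predictor takes, the resulting `p` stays strictly between 0 and 1, which is what makes it usable as a Bernoulli parameter.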


In [106]:
# Check whether the required columns contain missing values
df_first5[df_first5[["romantic", "avoidance_r"]].isna().any(axis=1)]

Unnamed: 0_level_0,Unnamed: 1_level_0,age,anxiety,anxiety_r,artgluctot,attachhome,attachphone,AvgHumidity,avgtemp,avoidance,avoidance_r,...,sex,Site,smoke,socialdiversity,socialembedded,socTherm,soliTherm,stress,site_idx,obs_id
Site,obs_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Kassel,103,,3.166667,0.039748,0.0,4.888889,4.111111,,35.81,3.055556,0.249759,...,,Kassel,,6,4,1.6,3.875,37,0,103
Kassel,104,,0.0,-2.658639,,4.333333,3.0,,36.1,3.055556,0.249759,...,,Kassel,,3,1,3.2,3.5,42,0,104
Kassel,105,,0.0,-2.658639,,3.888889,1.777778,,36.3,2.111111,-0.811715,...,,Kassel,,0,1,3.4,3.0,39,0,105
Portugal,123,,0.0,-1.835996,,5.0,3.0,,36.2,3.055556,-0.178491,...,,Portugal,,0,1,3.6,3.875,39,1,123
Portugal,124,,3.388889,0.631906,1.0,5.0,3.666667,,36.4,4.0,0.901529,...,,Portugal,,8,3,3.2,4.0,45,1,124
Portugal,125,,0.0,-1.835996,,3.555556,3.0,74.0,37.0,3.166667,-0.05143,...,,Portugal,,11,5,3.2,3.5,39,1,125
Portugal,126,,0.0,-1.835996,0.0,3.777778,3.0,,37.1,2.111111,-1.25851,...,,Portugal,,0,1,4.0,4.25,39,1,126
UCSB,408,,3.611111,0.143634,0.0,4.777778,3.0,82.0,35.777778,3.055556,-0.160921,...,,UCSB,,6,5,3.8,2.875,47,4,408


In [107]:
# Drop rows with missing values
df_first5.dropna(subset=["romantic", "avoidance_r"], inplace=True)
# Check again whether the required column contains missing values
df_first5[df_first5["romantic"].isna()]

Unnamed: 0_level_0,Unnamed: 1_level_0,age,anxiety,anxiety_r,artgluctot,attachhome,attachphone,AvgHumidity,avgtemp,avoidance,avoidance_r,...,sex,Site,smoke,socialdiversity,socialembedded,socTherm,soliTherm,stress,site_idx,obs_id
Site,obs_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1


In [108]:
# Recode the outcome: a value of 2 becomes 0, every other value becomes 1
df_first5["romantic"] =  np.where(df_first5['romantic'] == 2, 0, 1)

In [109]:
# Create the grid of panels, one per site
g = sns.FacetGrid(df_first5, col="Site", col_wrap=5, height=4)

# Draw a regression plot in each panel
g.map(sns.regplot, "avoidance_r", "romantic")

# Adjust the y-axis range and ticks
plt.ylim(-0.5,1.5)
plt.yticks([0,1])
# Show the plot
plt.show()
plt.show()

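### Complete-pooling model  

#### Model definition  

Here, the complete-pooling model is defined in the same way as the logistic regression model introduced in lec13.  

To recap the earlier model definition:  

* The outcome, relationship status, is a binary variable  

$$  
Y_{ij} = \begin{cases}  
1 & \text{yes} \\  
0 & \text{no} \\  
\end{cases}  
$$  

* The relationship between relationship status and avoidant attachment score can be written as:  
$$  
\begin{split}  
Y_{ij}|\beta_0,\beta_1 & \stackrel{ind}{\sim} \text{Bern}(\pi_{ij}) \;\; \text{ with } \;\; \pi_{ij} = \frac{e^{\beta_0 + \beta_1 X_{ij}}}{1 + e^{\beta_0 + \beta_1 X_{ij}}}  \\  
\beta_{0}  &  \sim N\left(0, 0.5^2 \right)  \\  
\beta_1  &  \sim N\left(0, 0.5^2 \right)   \\  
\end{split}  
$$  

* Note on the meaning of $\beta_0,\beta_1$ in logistic regression: here we simply interpret them as the intercept and slope of the linear relationship on the log-odds scale.  

> Note: the model code uses bambi's default priors; the specific numbers in the prior definitions here are for illustration only.

#### MCMC sampling & posterior parameter estimates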

In [110]:
complete_logit_bmb = bmb.Model("romantic ~ avoidance_r", 
                               df_first5, 
                               family="bernoulli")
complete_logit_trace = complete_logit_bmb.fit(random_seed=84735)

Modeling the probability that romantic==1
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, avoidance_r]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.


**Interpreting the posterior parameters**  

* `avoidance_r` is the common slope  

* `Intercept` is the common intercept  


In [111]:
az.summary(complete_logit_trace)

Unnamed: 0,mean,sd,hdi_3%,hdi_97%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
Intercept,-0.055,0.1,-0.234,0.147,0.002,0.001,3799.0,2578.0,1.0
avoidance_r,-0.011,0.099,-0.191,0.173,0.002,0.001,3963.0,2960.0,1.0


### Varying-intercept model  

#### Model definition  

* Let the linear relationship (intercept) differ across sites  

$$  
\begin{array}{rll}  
Y_{ij}|\beta_{0j},\beta_{1} & \sim \text{Bern}(\pi_{ij})\; \text{ with } \;\; \pi_{ij} = \frac{e^{\beta_{0j} + \beta_{1} X_{ij}}}{1 + e^{\beta_{0j} + \beta_{1} X_{ij}}} \\  
&& \text{(linear model within each site $j$)}\\  
\beta_{0j} &= \beta_0 + b_{0j}\;\;\;b_{0j} \sim N(0, \sigma_0^2)& \text{(variation of intercepts across sites)} \\  
\beta_{0}  &  \sim N\left(0, 0.5^2 \right) & \text{(priors on global parameters)}\\  
\beta_1  &  \sim N\left(0, 0.5^2 \right) & \\  
\sigma_0 & \sim \text{Exp}(1) & \\  
\end{array}  
$$  

* In the model graph, `Intercept` is $\beta_{0}$ and `1|Site` is $b_{0j}$

In [112]:
# common slope/ common intercept + group-specific intercept
inter_logit_bmb = bmb.Model("romantic ~ avoidance_r + (1|Site)",
                               df_first5,
                               family="bernoulli")
inter_logit_bmb.build()
inter_logit_bmb.graph()

#### MCMC sampling & posterior parameter estimates

In [113]:
inter_logit_trace = inter_logit_bmb.fit(draws=5000,           
                                    tune=1000,                    
                                    chains=4,                     
                                    discard_tuned_samples= True,  
                                    random_seed=84735,
                                    target_accept=0.99)

Modeling the probability that romantic==1
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, avoidance_r, 1|Site_sigma, 1|Site_offset]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 92 seconds.


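**Interpreting the posterior parameters**  

* `avoidance_r` is the common slope  

* `Intercept` is the common intercept  

* `1|Site[xx]` is the deviation of each group's intercept  

    * For example, the intercept of the Kassel site is: 0.253 + 0.334 = 0.587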
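
A site-specific intercept is on the log-odds scale, so it can be turned into a probability with the inverse logit. The sketch below does this for Kassel using the posterior means reported in the summary output for this model; plugging in posterior means is an approximation chosen for illustration, not the posterior mean of the probability itself.

```python
import math

# Posterior means copied from the az.summary output for this model
intercept = 0.253          # Intercept
kassel_deviation = 0.334   # 1|Site[Kassel]

# Site-specific intercept on the log-odds scale
kassel_intercept = intercept + kassel_deviation

# Implied probability of romantic == 1 when avoidance_r = 0
p_kassel = 1 / (1 + math.exp(-kassel_intercept))
print(round(kassel_intercept, 3), round(p_kassel, 3))
```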

In [114]:
az.summary(inter_logit_trace)

Unnamed: 0,mean,sd,hdi_3%,hdi_97%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
Intercept,0.253,0.491,-0.618,1.232,0.007,0.005,5659.0,6990.0,1.0
avoidance_r,-0.013,0.102,-0.205,0.177,0.001,0.001,14612.0,12245.0,1.0
1|Site_sigma,0.903,0.546,0.19,1.881,0.008,0.006,4724.0,7385.0,1.0
1|Site[Kassel],0.334,0.508,-0.691,1.237,0.007,0.005,6089.0,7630.0,1.0
1|Site[Portugal],0.241,0.585,-0.825,1.41,0.007,0.005,7497.0,9230.0,1.0
1|Site[Southampton],0.592,0.766,-0.702,2.128,0.008,0.006,8948.0,10482.0,1.0
1|Site[Tsinghua],-0.732,0.51,-1.73,0.154,0.007,0.005,5953.0,7389.0,1.0
1|Site[UCSB],-0.359,0.511,-1.343,0.573,0.007,0.005,5912.0,7380.0,1.0


### Varying-slope model  

#### Model definition  

* Let the linear relationship (slope) differ across sites  

$$  
\begin{array}{rll}  
Y_{ij}|\beta_{0},\beta_{1j} & \sim \text{Bern}(\pi_{ij})\; \text{ with } \;\; \pi_{ij} = \frac{e^{\beta_{0} + \beta_{1j} X_{ij}}}{1 + e^{\beta_{0} + \beta_{1j} X_{ij}}} \\  
&& \text{(linear model within each site $j$)}\\  
\beta_{1j} &= \beta_1 + b_{1j}\;\;\;b_{1j} \sim N(0, \sigma_1^2)& \text{(variation of slopes across sites)} \\  
\beta_{0}  &  \sim N\left(0, 0.5^2 \right) & \text{(priors on global parameters)}\\  
\beta_1  &  \sim N\left(0, 0.5^2 \right) & \\  
\sigma_1 & \sim \text{Exp}(1) & \\  
\end{array}  
$$

In [115]:
# common intercept / common slope + group-specific slope
slope_logit_bmb = bmb.Model("romantic ~ avoidance_r + (0 + avoidance_r|Site)",
                               df_first5,
                               family="bernoulli")
slope_logit_bmb.build()
slope_logit_bmb.graph()

#### MCMC sampling & posterior parameter estimates

In [116]:
slope_logit_trace = slope_logit_bmb.fit(draws=5000,           
                                    tune=1000,                    
                                    chains=4,                     
                                    discard_tuned_samples= True,  
                                    random_seed=84735,
                                    target_accept=0.99)

Modeling the probability that romantic==1
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, avoidance_r, avoidance_r|Site_sigma, avoidance_r|Site_offset]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 46 seconds.
There were 4 divergences after tuning. Increase `target_accept` or reparameterize.


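**Interpreting the posterior parameters**  

* `Intercept` is the common intercept  

* `avoidance_r` is the common slope  

* `avoidance_r|Site[xx]` is the deviation of each group's slope  

    * For example, the slope of the Kassel site is: 0.004 + 0.037 = 0.041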

In [117]:
az.summary(slope_logit_trace)

| parameter | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| `Intercept` | -0.054 | 0.101 | -0.242 | 0.134 | 0.001 | 0.001 | 15593.0 | 13511.0 | 1.0 |
| `avoidance_r` | 0.004 | 0.2 | -0.361 | 0.369 | 0.003 | 0.004 | 5796.0 | 4640.0 | 1.0 |
| `avoidance_r\|Site_sigma` | 0.25 | 0.268 | 0.0 | 0.674 | 0.004 | 0.003 | 4962.0 | 5628.0 | 1.0 |
| `avoidance_r\|Site[Kassel]` | 0.037 | 0.209 | -0.354 | 0.441 | 0.003 | 0.004 | 6482.0 | 5538.0 | 1.0 |
| `avoidance_r\|Site[Portugal]` | 0.007 | 0.247 | -0.474 | 0.502 | 0.003 | 0.003 | 10296.0 | 7867.0 | 1.0 |
| `avoidance_r\|Site[Southampton]` | 0.008 | 0.291 | -0.56 | 0.553 | 0.003 | 0.004 | 14148.0 | 9954.0 | 1.0 |
| `avoidance_r\|Site[Tsinghua]` | -0.091 | 0.207 | -0.521 | 0.258 | 0.003 | 0.003 | 5603.0 | 5199.0 | 1.0 |
| `avoidance_r\|Site[UCSB]` | 0.055 | 0.208 | -0.326 | 0.464 | 0.003 | 0.004 | 6622.0 | 5663.0 | 1.0 |


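### Varying-intercept, varying-slope model  

#### Model definition  

* Allow both the intercept and the slope of the relationship to differ across sites  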

$$  
\begin{array}{rll}  
Y_{ij}|\beta_{0j},\beta_{1j} & \sim \text{Bern}(\pi_{ij})\; \text{ with } \;\; \pi_{ij} = \frac{e^{\beta_{0j} + \beta_{1j} X_{ij}}}{1 + e^{\beta_{0j} + \beta_{1j} X_{ij}}} \\  
&& \text{(logistic regression within each site $j$)}\\  

\beta_{0j} &= \beta_0 + b_{0j}\;\;\;b_{0j} \sim N(0, \sigma_0^2)& \text{(variation of the intercept across sites)} \\  

\beta_{1j} &= \beta_1 + b_{1j}\;\;\;b_{1j} \sim N(0, \sigma_1^2)& \text{(variation of the slope across sites)} \\  

\beta_{0}  &  \sim N\left(0, 0.5^2 \right) & \text{(priors for the global parameters)}\\  
\beta_1  &  \sim N\left(0, 0.5^2 \right) & \\  
\sigma_0 & \sim \text{Exp}(1) & \\  
\sigma_1 & \sim \text{Exp}(1) & \\  
\end{array}  

$$

In [118]:
# common intercept + site-specific intercepts, common slope + site-specific slopes
both_logit_bmb = bmb.Model("romantic ~ avoidance_r + (avoidance_r|Site)",
                               df_first5,
                               family="bernoulli")
both_logit_bmb.build()
both_logit_bmb.graph()

#### MCMC sampling & interpreting the posterior parameters

In [119]:
both_logit_trace = both_logit_bmb.fit(draws=5000,           
                                    tune=1000,                    
                                    chains=4,                     
                                    discard_tuned_samples= True,  
                                    random_seed=84735,
                                    target_accept=0.99)

Modeling the probability that romantic==1
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, avoidance_r, 1|Site_sigma, 1|Site_offset, avoidance_r|Site_sigma, avoidance_r|Site_offset]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 110 seconds.


In [120]:
both_logit_trace

In [121]:
az.plot_trace(both_logit_trace,
              var_names=["~sigma"],
              filter_vars="like",
              figsize=(12, 15))
plt.show()

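**Interpreting the posterior parameters**  

* `Intercept` is the common intercept  

* `avoidance_r` is the common slope  

* `1|Site[xx]` is each site's deviation from the common intercept  

* `avoidance_r|Site[xx]` is each site's deviation from the common slope  

    * The slope for the Kassel site is 0.002 + 0.041 = 0.043  

    * The intercept for the Kassel site is 0.261 + 0.327 = 0.588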
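
Intercepts and slopes are on the log-odds scale, so they are easier to read after converting them to probabilities. The sketch below is plain Python and uses the posterior means from the `az.summary` output of this model, copied by hand. It computes each site's probability of `romantic == 1` at `avoidance_r = 0`, then evaluates the Kassel curve at a few arbitrary avoidance values. Only Kassel's slope deviation is used because it is the only one visible in the truncated table.

```python
import math

def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Posterior means copied by hand from the az.summary output
common_intercept = 0.261
common_slope = 0.002
intercept_offsets = {
    "Kassel": 0.327,
    "Portugal": 0.238,
    "Southampton": 0.592,
    "Tsinghua": -0.740,
    "UCSB": -0.368,
}
kassel_slope_offset = 0.041

# Site-specific intercepts and the implied probability at avoidance_r = 0
site_intercepts = {s: common_intercept + b for s, b in intercept_offsets.items()}
baseline_prob = {s: inv_logit(a) for s, a in site_intercepts.items()}

# Kassel's own slope and its predicted probability curve
kassel_slope = common_slope + kassel_slope_offset

def kassel_prob(x):
    return inv_logit(site_intercepts["Kassel"] + kassel_slope * x)

for site, p in baseline_prob.items():
    print(f"{site:12s} P(romantic = 1 | avoidance_r = 0) = {p:.3f}")
for x in (-1.0, 0.0, 1.0):  # arbitrary illustrative values of avoidance_r
    print(f"Kassel, avoidance_r = {x:+.1f}: {kassel_prob(x):.3f}")
```

The site differences show up mainly in the intercepts; the Kassel curve is almost flat because its slope is close to zero.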

In [122]:
az.summary(both_logit_trace)

| parameter | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| `Intercept` | 0.261 | 0.488 | -0.621 | 1.201 | 0.006 | 0.005 | 6163.0 | 7414.0 | 1.0 |
| `avoidance_r` | 0.002 | 0.201 | -0.364 | 0.356 | 0.003 | 0.003 | 8305.0 | 7821.0 | 1.0 |
| `1\|Site_sigma` | 0.907 | 0.542 | 0.191 | 1.887 | 0.007 | 0.005 | 6087.0 | 9085.0 | 1.0 |
| `avoidance_r\|Site_sigma` | 0.25 | 0.261 | 0.0 | 0.675 | 0.003 | 0.002 | 5851.0 | 8085.0 | 1.0 |
| `1\|Site[Kassel]` | 0.327 | 0.506 | -0.651 | 1.273 | 0.006 | 0.005 | 6764.0 | 7839.0 | 1.0 |
| `1\|Site[Portugal]` | 0.238 | 0.58 | -0.82 | 1.385 | 0.006 | 0.005 | 9527.0 | 10616.0 | 1.0 |
| `1\|Site[Southampton]` | 0.592 | 0.775 | -0.777 | 2.089 | 0.007 | 0.005 | 12340.0 | 12683.0 | 1.0 |
| `1\|Site[Tsinghua]` | -0.74 | 0.508 | -1.717 | 0.167 | 0.007 | 0.005 | 6578.0 | 7780.0 | 1.0 |
| `1\|Site[UCSB]` | -0.368 | 0.509 | -1.405 | 0.506 | 0.007 | 0.005 | 6599.0 | 7526.0 | 1.0 |
| `avoidance_r\|Site[Kassel]` | 0.041 | 0.212 | -0.339 | 0.475 | 0.003 | 0.003 | 9559.0 | 9858.0 | 1.0 |


#### Posterior predictive distribution

In [123]:
posterior_predictive = both_logit_bmb.predict(both_logit_trace, kind="pps")

In [124]:
az.plot_ppc(both_logit_trace,
            num_pp_samples=500)

<Axes: xlabel='romantic / romantic'>
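
For a binary outcome, a posterior predictive check amounts to comparing the observed proportion of 1s with the proportions found in replicated datasets. The sketch below shows that idea in plain Python. The outcomes and the posterior draws of the success probability are made up for illustration and do not come from the HPP data; a real check would use the draws stored in `both_logit_trace`.

```python
import random

random.seed(2023)

# Hypothetical observed 0/1 outcomes (NOT the HPP data)
observed = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]

# Hypothetical posterior draws of the success probability pi,
# simplified to one shared value per draw
pi_draws = [random.betavariate(7, 5) for _ in range(500)]

def simulate_proportion(pi, n):
    """Simulate n Bernoulli(pi) outcomes and return the proportion of 1s."""
    return sum(random.random() < pi for _ in range(n)) / n

# One replicated dataset per posterior draw, summarised by its proportion of 1s
rep_props = [simulate_proportion(pi, len(observed)) for pi in pi_draws]

obs_prop = sum(observed) / len(observed)
# Share of replicated proportions at least as large as the observed one
p_value = sum(p >= obs_prop for p in rep_props) / len(rep_props)

print(f"observed proportion: {obs_prop:.3f}")
print(f"posterior predictive p-value: {p_value:.3f}")
```

A p-value near 0 or 1 would mean the model rarely reproduces the observed proportion; values in between suggest the model captures this particular summary of the data.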

### Predicting data for a new site  

* Calling `.predict` with the model's MCMC samples and a new dataset yields predictions for the new data based on the fitted model

In [125]:
# Select the data from the "Zurich" site (copy so that df_raw is left untouched)
new_group = df_raw[df_raw.Site=="Zurich"].copy()
# Create the participant index
new_group["obs_id"] = range(len(new_group))
# Create the site index
new_group["site_idx"] = pd.factorize(new_group.Site)[0]
# Drop rows with missing values in the model variables
new_group.dropna(subset=["romantic", "avoidance_r"], inplace=True)
# Recode the outcome: 2 -> 0, every other value -> 1
new_group["romantic"] =  np.where(new_group['romantic'] == 2, 0, 1)

In [126]:
both_logit_bmb.predict(both_logit_trace,
                       kind="mean",
                       data=new_group,
                       sample_new_groups=True,
                       inplace=False)
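
Zurich was not part of the fitted data, so the model has no site-specific effect for it. The Bambi documentation describes `sample_new_groups=True` as taking, for each posterior draw, the group-specific effect of a randomly selected existing group. The sketch below mimics that idea in plain Python under that reading. It is not Bambi's actual implementation, and the draws are made up (Gaussian noise around the posterior means of the intercept offsets, with an arbitrary spread of 0.5).

```python
import random

random.seed(84735)

sites = ["Kassel", "Portugal", "Southampton", "Tsinghua", "UCSB"]
n_draws = 1000

# Hypothetical posterior draws of each site's intercept offset:
# centred on the posterior means from the summary, spread chosen arbitrarily
site_means = {"Kassel": 0.327, "Portugal": 0.238, "Southampton": 0.592,
              "Tsinghua": -0.740, "UCSB": -0.368}
offset_draws = {s: [random.gauss(m, 0.5) for _ in range(n_draws)]
                for s, m in site_means.items()}

# For every posterior draw, borrow the offset of a randomly chosen existing site
new_site_offsets = []
for d in range(n_draws):
    donor = random.choice(sites)
    new_site_offsets.append(offset_draws[donor][d])

mean_new = sum(new_site_offsets) / n_draws
print(f"mean offset for the new site: {mean_new:.3f}")
```

Because a different donor site may be picked at each draw, the new site's offset reflects the variation across all existing sites, which makes predictions for a new site more uncertain than predictions for a site in the fitted data.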

### Supplement: PyMC code for hierarchical logistic regression  

* This supplement provides PyMC code for the complete-pooling model and for the varying-intercept, varying-slope model, for self-study

In [127]:
coords = {"obs_id": df_first5.obs_id}
with pm.Model(coords=coords) as complete_log:
    # Pass in the predictor and the outcome
    x = pm.MutableData("x", df_first5.avoidance_r, dims="obs_id")
    y = pm.MutableData('y', df_first5.romantic, dims = 'obs_id')

    # Priors
    beta_0 = pm.Normal("beta_0", mu=0, sigma=0.5)          # define beta_0
    beta_1 = pm.Normal("beta_1", mu=0, sigma=0.5)          # define beta_1
    # Linear predictor (on the log-odds scale)
    mu = pm.Deterministic("mu", beta_0 + beta_1 * x, dims="obs_id")
    # Note the logistic sigmoid function pm.math.invlogit used here,
    # which computes 1 / (1 + exp(-mu))
    pi = pm.Deterministic("pi", pm.math.invlogit(mu), dims="obs_id")
    # Likelihood
    likelihood = pm.Bernoulli("y_est",p=pi, observed=y,dims="obs_id")

    complete_log_trace = pm.sample(draws=5000,            # number of posterior draws per chain
                            tune=1000,                    # number of tuning (warm-up) steps per chain
                            chains=4,                     # number of chains
                            discard_tuned_samples= True,  # discard the tuning samples once sampling ends
                            random_seed=84735,
                            target_accept=0.99)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_0, beta_1]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 11 seconds.
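
The model definitions write the success probability as $e^{\mu}/(1+e^{\mu})$, while the code comment describes `pm.math.invlogit` as $1/(1+e^{-\mu})$. The two forms are algebraically identical. The sketch below checks this numerically in plain Python; the helper names `inv_logit` and `logit` are illustrative and are not part of PyMC.

```python
import math

def inv_logit(mu):
    """Logistic sigmoid, written to avoid overflow for large |mu|."""
    if mu >= 0:
        return 1.0 / (1.0 + math.exp(-mu))
    z = math.exp(mu)
    return z / (1.0 + z)

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

for mu in (-2.0, 0.0, 0.5, 3.0):
    ratio_form = math.exp(mu) / (1.0 + math.exp(mu))   # form used in the equations
    sigmoid_form = inv_logit(mu)                        # form used in the code comment
    print(f"mu = {mu:+.1f}  ratio form = {ratio_form:.6f}  sigmoid form = {sigmoid_form:.6f}")
```

Applying `logit` to the result recovers `mu`, which is why the coefficients of a logistic regression are read as changes in log-odds.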


In [128]:
# Get the observation indices of each site in the dataset
logit_index = get_group_index(data=df_first5)

In [129]:
# Same plotting logic as the previous function; only the plotted variables change for logistic regression
def plot_logit_regression(data,trace,group_index):
    # Create the figure, with one column per site
    fig, ax = plt.subplots(1,len(data["Site"].unique()), 
                        sharex=True,
                        sharey=True,
                        figsize=(15,5))
    
    # Draw one panel per site
    # We need the raw data and the posterior mean of the predicted probability
    # Both are stored in the sampling result, i.e. the trace used here
    for i, group in enumerate(data["Site"].unique()):
        x = trace.constant_data.x.sel(obs_id = group_index[f"{group}"])
        # Mean probability of being in a relationship within the site, shown in the panel title
        pi_mean = trace.posterior.pi.sel(obs_id = group_index[f"{group}"]).mean().values
        ax[i].set_title(f"Bern({pi_mean:.2f})")

        # Scatter plot of the observed data
        ax[i].scatter(x,
                trace.observed_data.y_est.sel(obs_id = group_index[f"{group}"]),
                color=f"C{i}",
                alpha=0.2)
    # Shared x-axis label
    fig.text(0.5, 0, 'Avoidance', ha='center', va='center', fontsize=12)
    # Shared y-axis label
    fig.text(0.08, 0.5, 'Romantic', ha='center', va='center', rotation='vertical', fontsize=12)
    # Only show the ticks 0 and 1
    plt.yticks([0,1])
    # Figure title
    plt.suptitle("Posterior regression models", fontsize=15,y=1.05)
    
    sns.despine()

In [130]:
plot_logit_regression(data=df_first5,
                      trace=complete_log_trace,
                      group_index=logit_index)

### Hierarchical model  

$$  

\begin{array}{rll}  
Y_{ij}|\beta_{0j},\beta_{1j} & \sim \text{Bern}(\pi_{ij})\; \text{ with } \;\; \pi_{ij} = \frac{e^{\beta_{0j} + \beta_{1j} X_{ij}}}{1 + e^{\beta_{0j} + \beta_{1j} X_{ij}}} \\  
&& \text{(logistic regression within each site $j$)}\\  
\beta_{0j} | \beta_0, \sigma_0    & \stackrel{ind}{\sim} N(\beta_0, \sigma_0^2) & \text{(variation of the intercept across sites)} \\  
\beta_{1j} | \beta_1, \sigma_1    & \stackrel{ind}{\sim} N(\beta_1, \sigma_1^2) & \text{(variation of the slope across sites)} \\  
\beta_{0}  &  \sim N\left(0, 0.5^2 \right) & \text{(priors for the global parameters)}\\  
\beta_1  &  \sim N\left(0, 0.5^2 \right) & \\  
\sigma_0 & \sim \text{Exp}(1) & \\  
\sigma_1 & \sim \text{Exp}(1) & \\  
\end{array}  

$$

In [131]:
non_centered = True
coords = {"site": df_first5["Site"].unique(),
          "obs_id": df_first5.obs_id}
with pm.Model(coords=coords) as hier_log:
    # Pass in the predictor and the outcome
    x = pm.MutableData("x", df_first5.avoidance_r, dims="obs_id")
    y = pm.MutableData('y', df_first5.romantic, dims = 'obs_id')

    # Define the global parameters
    beta_0 = pm.Normal("beta_0", mu=0, sigma=0.5)          # define beta_0
    beta_0_sigma = pm.Exponential("beta_0_sigma", 1)          
    beta_1 = pm.Normal("beta_1", mu=0, sigma=0.5)          # define beta_1
    beta_1_sigma = pm.Exponential("beta_1_sigma", 1)

    # Map each observation to its site
    site = pm.MutableData("site", df_first5.site_idx, dims="obs_id") 

    # Define the site-level intercepts and slopes with the chosen parameterization
    if non_centered:
        beta_0_offset = pm.Normal("beta_0_offset", 0, sigma=1, dims="site")
        beta_0j = pm.Deterministic("beta_0j", beta_0 + beta_0_offset * beta_0_sigma, dims="site")
        beta_1_offset = pm.Normal("beta_1_offset", 0, sigma=1, dims="site")
        beta_1j = pm.Deterministic("beta_1j", beta_1 + beta_1_offset * beta_1_sigma, dims="site")
    else:
        beta_0j = pm.Normal("beta_0j", mu=beta_0, sigma=beta_0_sigma, dims="site")
        beta_1j = pm.Normal("beta_1j", mu=beta_1, sigma=beta_1_sigma, dims="site")

    # Linear predictor (on the log-odds scale)
    mu = pm.Deterministic("mu", beta_0j[site] + beta_1j[site] * x, dims="obs_id")
    # Apply the inverse-logit transform to obtain probabilities
    pi = pm.Deterministic("pi", pm.math.invlogit(mu), dims="obs_id")
    # Likelihood
    likelihood = pm.Bernoulli("y_est",p=pi, observed=y,dims="obs_id")

    hier_log_trace = pm.sample(draws=5000,            # number of posterior draws per chain
                        tune=1000,                    # number of tuning (warm-up) steps per chain
                        chains=4,                     # number of chains
                        discard_tuned_samples= True,  # discard the tuning samples once sampling ends
                        random_seed=84735,
                        target_accept=0.99)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_0, beta_0_sigma, beta_1, beta_1_sigma, beta_0_offset, beta_1_offset]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 75 seconds.
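
The `non_centered` flag in the model above switches between two ways of writing the same prior: drawing `beta_0j` directly from $N(\beta_0, \sigma_0^2)$, or drawing a standard-normal offset and rescaling it. The sketch below checks in plain Python that both routes give the same distribution. The values of `beta_0` and `beta_0_sigma` are arbitrary illustrative numbers, not estimates from the model.

```python
import random
import statistics

random.seed(84735)

# Hypothetical fixed values for the global mean and the between-site sd
beta_0, beta_0_sigma = 0.14, 0.9
n = 20000

# Centered: draw the site-level intercept directly
centered_draws = [random.gauss(beta_0, beta_0_sigma) for _ in range(n)]

# Non-centered: draw a standard-normal offset, then shift and scale it
offsets = [random.gauss(0.0, 1.0) for _ in range(n)]
non_centered_draws = [beta_0 + off * beta_0_sigma for off in offsets]

for name, draws in (("centered", centered_draws), ("non-centered", non_centered_draws)):
    print(f"{name:13s} mean = {statistics.mean(draws):.3f}  sd = {statistics.stdev(draws):.3f}")
```

The two parameterizations define the same model. The non-centered form is often preferred with few groups because the sampler then explores `beta_0_offset`, which does not depend on `beta_0_sigma`, and this tends to reduce divergences.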


In [132]:
az.summary(hier_log_trace)

| parameter | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| `beta_0` | 0.140 | 0.280 | -0.371 | 0.694 | 0.003 | 0.002 | 8326.0 | 11056.0 | 1.0 |
| `beta_1` | 0.000 | 0.155 | -0.285 | 0.302 | 0.002 | 0.001 | 10779.0 | 9688.0 | 1.0 |
| `beta_0_offset[Kassel]` | 0.742 | 0.591 | -0.297 | 1.910 | 0.006 | 0.004 | 9882.0 | 12353.0 | 1.0 |
| `beta_0_offset[Portugal]` | 0.441 | 0.691 | -0.821 | 1.772 | 0.006 | 0.005 | 12713.0 | 12973.0 | 1.0 |
| `beta_0_offset[Southampton]` | 0.684 | 0.830 | -0.916 | 2.232 | 0.007 | 0.005 | 15204.0 | 13496.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| `pi[410]` | 0.461 | 0.061 | 0.345 | 0.573 | 0.000 | 0.000 | 24078.0 | 18584.0 | 1.0 |
| `pi[411]` | 0.495 | 0.075 | 0.352 | 0.633 | 0.000 | 0.000 | 22700.0 | 17807.0 | 1.0 |
| `pi[412]` | 0.495 | 0.075 | 0.352 | 0.633 | 0.000 | 0.000 | 22700.0 | 17807.0 | 1.0 |
| `pi[413]` | 0.491 | 0.067 | 0.364 | 0.614 | 0.000 | 0.000 | 23240.0 | 18136.0 | 1.0 |


In [133]:
plot_logit_regression(data=df_first5,
                      trace=hier_log_trace,
                      group_index=logit_index)

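## Summary  

This lecture showed how to extend hierarchical models to general linear models and to generalized linear models.  

Key points:  
* How to choose a model based on the research question and hypotheses  
* How to define hierarchical models in PyMC, including varying intercepts and varying slopes  
* How to test different hypotheses through model comparison  
* How to use a hierarchical model to predict outcomes for other groups (sites)  
* How to define generalized hierarchical linear models with Bambi  

🎉🎉🎉🎉🎉 This concludes the formal content of the course. Thank you all for taking part. We hope you will apply the knowledge and skills from this course in practice and strengthen your statistical analysis skills.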