
# Introduction

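This note compares PCR (principal component regression) and PLS (partial least squares regression) to determine which method is more suitable under which conditions.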

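The simulated dataset has n observations and p predictors with pairwise correlation $\rho$. The response is generated from an intercept $\beta_0$ and a coefficient vector $\beta_1$ plus a standard normal error term, i.e. $y = \beta_0 + X\beta_1 + \varepsilon$ with $\varepsilon \sim N(0, 1)$.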
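The data-generating process described above can be sketched as follows. This is a minimal illustration, not the code in `src/sim.py`: the function name `simulate`, the multivariate normal predictors with unit variances, and the `seed` argument are assumptions made for the example.

```python
import numpy as np


def simulate(n, p, rho, mu, beta0, beta1, seed=None):
    """Draw X with equicorrelated columns, then y = beta0 + X @ beta1 + eps.

    Assumption: X is multivariate normal with mean `mu`, unit variances,
    and the same correlation `rho` between every pair of columns.
    """
    rng = np.random.default_rng(seed)
    # Covariance matrix: 1 on the diagonal, rho everywhere else.
    sigma = np.full((p, p), rho, dtype=float)
    np.fill_diagonal(sigma, 1.0)
    X = rng.multivariate_normal(mean=mu, cov=sigma, size=n)
    eps = rng.standard_normal(n)  # standard normal error term
    y = beta0 + X @ beta1 + eps
    return X, y
```

Fixing `seed` makes runs repeatable, which helps when two settings are compared and the differences in test error are small.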

# Description

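`comparison.py` defines a `comparison` function that reports the following metrics for PCR and PLS side by side:

- Cross-validated test error
- Number of components (the number at which the cross-validated test error is minimized)
- Proportion of the variation in Y explained (with that number of components)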
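The three metrics above could be computed along the following lines. This is a numpy-only sketch, not the contents of `comparison.py`: the helper names (`pcr_coef`, `pls_coef`, `cv_mse`, `compare`), the 5-fold split, the absence of any scaling beyond centering, and the use of in-sample $R^2$ as the "variation explained" are all assumptions.

```python
import numpy as np


def pcr_coef(X, y, k):
    """Slope vector for PCR with k components; X and y must be centered."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:k].T  # loadings of the first k principal components
    gamma, *_ = np.linalg.lstsq(X @ V, y, rcond=None)  # regress y on the scores
    return V @ gamma


def pls_coef(X, y, k):
    """Slope vector for single-response PLS (NIPALS); X and y must be centered."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(k):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)  # weight: direction of maximal covariance with y
        t = Xk @ w              # scores for this component
        tt = t @ t
        p_load = Xk.T @ t / tt  # X loadings
        c = yk @ t / tt         # y loading
        Xk = Xk - np.outer(t, p_load)  # deflate X
        yk = yk - c * t                # deflate y
        W.append(w)
        P.append(p_load)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def cv_mse(coef_fn, X, y, k, n_folds=5, seed=0):
    """Mean squared test error over n_folds folds for k components."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        # Center with training-fold statistics only.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        beta = coef_fn(X[train] - x_mean, y[train] - y_mean, k)
        pred = y_mean + (X[test] - x_mean) @ beta
        errs.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errs))


def compare(X, y, max_k=None):
    """Return one dict per method: best k, its CV error, and in-sample R^2."""
    max_k = max_k or X.shape[1]
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    rows = []
    for name, fn in [("PCR", pcr_coef), ("PLS", pls_coef)]:
        errs = [cv_mse(fn, X, y, k) for k in range(1, max_k + 1)]
        best = int(np.argmin(errs)) + 1  # +1: component counts start at 1
        resid = yc - Xc @ fn(Xc, yc, best)
        r2 = 1.0 - (resid @ resid) / (yc @ yc)
        rows.append({"method": name, "n_components": best,
                     "test_error": errs[best - 1], "r2": float(r2)})
    return rows
```

Calling `compare(X, y)` on a simulated dataset returns two dictionaries, one for PCR and one for PLS, which can be passed to `pd.DataFrame` to obtain a table shaped like the outputs shown in this document.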

# Conclusions

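Compared with PCR, PLS performs better when:

- the number of variables is larger
- the correlation between variables is smaller
- the coefficients of the variables are larger (the variables have a stronger effect on the response)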

# Simulation

## Varying p



In [12]:
import numpy as np
import pandas as pd
from scipy.stats import norm
from src.scale import scale
from src.sim import sim
from model.comparison import comparison

In [13]:
n, p, rho = 1000, 10, 0.5
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 0.5, 0.5 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 6 | 1.026625 | 0.872094 |
| 1 | PLS | 2 | 1.031686 | 0.872448 |


In [14]:
n, p, rho = 1000, 30, 0.5
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 0.5, 0.5 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 19 | 1.04674 | 0.953943 |
| 1 | PLS | 2 | 1.055306 | 0.954295 |


In [15]:
n, p, rho = 1000, 50, 0.5
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 0.5, 0.5 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 19 | 1.063145 | 0.968717 |
| 1 | PLS | 2 | 1.064358 | 0.970807 |



## Varying rho



In [16]:
n, p, rho = 1000, 30, 0.25
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 0.5, 0.5 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 28 | 1.051644 | 0.923607 |
| 1 | PLS | 2 | 1.054558 | 0.923748 |


In [17]:
n, p, rho = 1000, 30, 0.5
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 0.5, 0.5 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 20 | 0.989934 | 0.950897 |
| 1 | PLS | 3 | 0.992057 | 0.951363 |


In [18]:
n, p, rho = 1000, 30, 0.75
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 0.5, 0.5 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 10 | 0.96422 | 0.980365 |
| 1 | PLS | 4 | 0.971294 | 0.980715 |



## Varying beta



In [19]:
n, p, rho = 1000, 30, 0.5
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 0.1, 0.1 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 5 | 1.029797 | 0.446099 |
| 1 | PLS | 0 | 1.028784 | 0.448567 |


In [20]:
n, p, rho = 1000, 30, 0.5
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 0.5, 0.5 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 17 | 1.105322 | 0.957875 |
| 1 | PLS | 3 | 1.101779 | 0.958351 |


In [21]:
n, p, rho = 1000, 30, 0.5
mu = norm.rvs(size=p, scale=1)
beta0, beta1 = 1, 1 * np.ones(p, dtype=float)
comparison(n, p, rho, mu, beta0, beta1)

|   | methods | n components | test error | variation explanation |
|---|---------|--------------|------------|-----------------------|
| 0 | PCR | 25 | 1.029176 | 0.988951 |
| 1 | PLS | 5 | 1.030573 | 0.988981 |
