## Estimation Methods

## The Linear Regression Model and OLS Estimation

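- Let $X$ be an $n \times k$ matrix holding the observations on $k$ explanatory variables for $n$ observations. Because the model usually includes a constant term, one column of $X$ consists entirely of ones. That column is treated exactly like every other column of $X$.
- Let $y$ be an $n \times 1$ vector of observations on the dependent variable.
- Let $\epsilon$ be an $n \times 1$ vector of disturbances (error terms).
- Let $\beta$ be a $k \times 1$ vector of unknown population parameters, which is what we want to estimate.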

Written out in full, the statistical model is

$$
\left[\begin{array}{c}
Y_1 \\
Y_2 \\
\vdots \\
Y_n
\end{array}\right]_{n \times 1}=\left[\begin{array}{ccccc}
1 & X_{11} & X_{21} & \ldots & X_{k-1,1} \\
1 & X_{12} & X_{22} & \ldots & X_{k-1,2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{1 n} & X_{2 n} & \ldots & X_{k-1, n}
\end{array}\right]_{n \times k}\left[\begin{array}{c}
\beta_1 \\
\beta_2 \\
\vdots \\
\beta_k
\end{array}\right]_{k \times 1}+\left[\begin{array}{c}
\epsilon_1 \\
\epsilon_2 \\
\vdots \\
\epsilon_n
\end{array}\right]_{n \times 1}
$$

or, more compactly,

$$
y=X \beta+\epsilon
$$

The residual vector $e$ can then be written as

$$
e=y-X \hat{\beta}
$$

The residual sum of squares (RSS) is $e^{\prime} e$:

$$
\left[\begin{array}{lllll}
e_1 & e_2 & \ldots & \ldots & e_n
\end{array}\right]_{1 \times n}\left[\begin{array}{c}
e_1 \\
e_2 \\
\vdots \\
\vdots \\
e_n
\end{array}\right]_{n \times 1}=\left[e_1 \times e_1+e_2 \times e_2+\ldots+e_n \times e_n\right]_{1 \times 1}
$$

Expanding it gives

$$
\begin{aligned}
e^{\prime} e & =(y-X \hat{\beta})^{\prime}(y-X \hat{\beta}) \\
& =y^{\prime} y-\hat{\beta}^{\prime} X^{\prime} y-y^{\prime} X \hat{\beta}+\hat{\beta}^{\prime} X^{\prime} X \hat{\beta} \\
& =y^{\prime} y-2 \hat{\beta}^{\prime} X^{\prime} y+\hat{\beta}^{\prime} X^{\prime} X \hat{\beta}
\end{aligned}
$$

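The last step uses the fact that the transpose of a scalar is the same scalar, so $y^{\prime} X \hat{\beta}=\left(y^{\prime} X \hat{\beta}\right)^{\prime}=\hat{\beta}^{\prime} X^{\prime} y$.

To find the $\hat{\beta}$ that minimizes the residual sum of squares, differentiate the expression above with respect to $\hat{\beta}$. By the rules of matrix differentiation,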

$$
\frac{\partial a^{\prime} b}{\partial b}=\frac{\partial b^{\prime} a}{\partial b}=a
$$

and, for a symmetric matrix $A$,

$$
\frac{\partial b^{\prime} A b}{\partial b}=2 A b
$$

we have

$$
\frac{\partial\, 2 \hat{\beta}^{\prime} X^{\prime} y}{\partial \hat{\beta}}=\frac{\partial\, 2 \hat{\beta}^{\prime}\left(X^{\prime} y\right)}{\partial \hat{\beta}}=2 X^{\prime} y
$$

and, with $A=X^{\prime} X$ (which is symmetric),

$$
\frac{\partial \hat{\beta}^{\prime} X^{\prime} X \hat{\beta}}{\partial \hat{\beta}}=\frac{\partial \hat{\beta}^{\prime} A \hat{\beta}}{\partial \hat{\beta}}=2 A \hat{\beta}=2 X^{\prime} X \hat{\beta}
$$

The first-order condition can therefore be written as

$$
\frac{\partial e^{\prime} e}{\partial \hat{\beta}}=-2 X^{\prime} y+2 X^{\prime} X \hat{\beta}=0
$$

which yields the normal equations

$$
\left(X^{\prime} X\right) \hat{\beta}=X^{\prime} y
$$

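Provided $X^{\prime} X$ is invertible (that is, $X$ has full column rank), premultiplying both sides by $\left(X^{\prime} X\right)^{-1}$ gives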

$$
\left(X^{\prime} X\right)^{-1}\left(X^{\prime} X\right) \hat{\beta}=\left(X^{\prime} X\right)^{-1} X^{\prime} y
$$

and finally

$$
\hat{\beta}=\left(X^{\prime} X\right)^{-1} X^{\prime} y
$$
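The closed form can be checked numerically. The following is a minimal sketch on simulated data; the sample size, the "true" coefficients, and the variable names are assumptions made for illustration and do not come from the text. It solves the normal equations directly and compares the result with `lm()`.

``` r
# Simulated data (hypothetical): intercept plus two regressors
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
X <- cbind(1, x1, x2)            # n x k design matrix; first column is the constant
beta_true <- c(1, 2, -0.5)       # assumed parameters used only to generate y
y <- drop(X %*% beta_true + rnorm(n))

# Solve (X'X) beta_hat = X'y without forming an explicit inverse
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# Compare with R's built-in least-squares fit
fit <- lm(y ~ x1 + x2)
all.equal(unname(beta_hat), unname(coef(fit)))
```

Using `solve(A, b)` rather than `solve(A) %*% b` avoids computing the inverse explicitly, which is usually the numerically safer choice.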

**Variance-covariance matrix of the OLS estimator**

Substituting $y=X \beta+\epsilon$ into the OLS formula gives

$$
\begin{aligned}
& \hat{\beta}=\left(X^{\prime} X\right)^{-1} X^{\prime}(X \beta+\epsilon) \\
& \hat{\beta}=\beta+\left(X^{\prime} X\right)^{-1} X^{\prime} \epsilon
\end{aligned}
$$

so the variance-covariance matrix of the OLS estimator is

$$
\begin{aligned}
E\left[(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\prime}\right] & =E\left[\left(\left(X^{\prime} X\right)^{-1} X^{\prime} \epsilon\right)\left(\left(X^{\prime} X\right)^{-1} X^{\prime} \epsilon\right)^{\prime}\right] \\
& =E\left[\left(X^{\prime} X\right)^{-1} X^{\prime} \epsilon \epsilon^{\prime} X\left(X^{\prime} X\right)^{-1}\right]
\end{aligned}
$$

If $X$ is non-stochastic, then

$$
E\left[(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\prime}\right]=\left(X^{\prime} X\right)^{-1} X^{\prime} E\left[\epsilon \epsilon^{\prime}\right] X\left(X^{\prime} X\right)^{-1}
$$

If we further assume $E\left[\epsilon \epsilon^{\prime}\right]=\sigma^2 I$ (homoskedastic, uncorrelated errors), then

$$
\begin{aligned}
E\left[(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\prime}\right] & =\left(X^{\prime} X\right)^{-1} X^{\prime}\left(\sigma^2 I\right) X\left(X^{\prime} X\right)^{-1} \\
& =\sigma^2 I\left(X^{\prime} X\right)^{-1} X^{\prime} X\left(X^{\prime} X\right)^{-1} \\
& =\sigma^2\left(X^{\prime} X\right)^{-1}
\end{aligned}
$$

Since $\sigma^2$ is unknown in practice, it is replaced by the estimate

$$
\hat{\sigma}^2=\frac{e^{\prime} e}{n-k}
$$
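As a quick check of the formula $\hat{\sigma}^2\left(X^{\prime} X\right)^{-1}$, the sketch below computes it by hand on simulated data and compares it with `vcov()` from an `lm()` fit. The data-generating values are assumptions chosen only for illustration.

``` r
# Simulated data (hypothetical)
set.seed(42)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))       # design matrix including the constant column
y <- drop(X %*% c(1, 2, -0.5) + rnorm(n))
k <- ncol(X)

XtX_inv  <- solve(crossprod(X))          # (X'X)^{-1}
beta_hat <- drop(XtX_inv %*% crossprod(X, y))
e        <- y - drop(X %*% beta_hat)     # residual vector
sigma2_hat  <- sum(e^2) / (n - k)        # e'e / (n - k)
vcov_manual <- sigma2_hat * XtX_inv

# X already contains the constant, so suppress lm's own intercept
fit <- lm(y ~ X - 1)
all.equal(unname(vcov_manual), unname(vcov(fit)))

# Standard errors are the square roots of the diagonal
sqrt(diag(vcov_manual))
```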

### Partitioned Regression and the Frisch-Waugh-Lovell Theorem

Consider a linear regression in which the regressors are split into two blocks, $X_1$ and $X_2$:

$$
y=X_1 \hat{\beta}_1+X_2 \hat{\beta}_2+e
$$

We want to isolate the coefficients on $X_{2}$, namely $\hat{\beta}_{2}$. The normal equations are

$$
\begin{aligned}
& \text { (1) } \\
& \text { (2) }
\end{aligned}\left[\begin{array}{ll}
X_1^{\prime} X_1 & X_1^{\prime} X_2 \\
X_2^{\prime} X_1 & X_2^{\prime} X_2
\end{array}\right]\left[\begin{array}{c}
\hat{\beta}_1 \\
\hat{\beta}_2
\end{array}\right]=\left[\begin{array}{c}
X_1^{\prime} y \\
X_2^{\prime} y
\end{array}\right]
$$

Solving equation (1) for $\hat{\beta}_{1}$:

$$
\begin{aligned}
\left(X_1^{\prime} X_1\right) \hat{\beta}_1+\left(X_1^{\prime} X_2\right) \hat{\beta}_2 & =X_1^{\prime} y \\
\left(X_1^{\prime} X_1\right) \hat{\beta}_1 & =X_1^{\prime} y-\left(X_1^{\prime} X_2\right) \hat{\beta}_2 \\
\hat{\beta}_1 & =\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} y-\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} X_2 \hat{\beta}_2 \\
\hat{\beta}_1 & =\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\left(y-X_2 \hat{\beta}_2\right)
\end{aligned}
$$

**Frisch-Waugh-Lovell theorem**

Consider the residuals

$$
\begin{aligned}
e & =y-X \hat{\beta} \\
& =y-X\left(X^{\prime} X\right)^{-1} X^{\prime} y \\
& =\left(I-X\left(X^{\prime} X\right)^{-1} X^{\prime}\right) y \\
& =M y
\end{aligned}
$$

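Here $M$ is called the residual-maker matrix because it produces the residuals from $y$. $M$ is a square, idempotent matrix. A matrix $A$ is idempotent if $A^2 = A$, that is, multiplying $A$ by itself returns $A$. For $M$ this can be verified directly: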

$$
\begin{aligned}
M M & =\left(I-X\left(X^{\prime} X\right)^{-1} X^{\prime}\right)\left(I-X\left(X^{\prime} X\right)^{-1} X^{\prime}\right) \\
& =I^2-2 X\left(X^{\prime} X\right)^{-1} X^{\prime}+X\left(X^{\prime} X\right)^{-1} X^{\prime} X\left(X^{\prime} X\right)^{-1} X^{\prime} \\
& =I-2 X\left(X^{\prime} X\right)^{-1} X^{\prime}+X\left(X^{\prime} X\right)^{-1} X^{\prime} \\
& =I-X\left(X^{\prime} X\right)^{-1} X^{\prime} \\
& =M
\end{aligned}
$$
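These properties of $M$ are easy to confirm numerically. The sketch below builds $M$ for a small simulated design matrix (the data are hypothetical) and checks idempotence, symmetry, $MX=0$, and that $My$ reproduces the OLS residuals.

``` r
# Small simulated example (hypothetical data)
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))
y <- rnorm(n)

# Residual-maker matrix M = I - X (X'X)^{-1} X'
M <- diag(n) - X %*% solve(crossprod(X)) %*% t(X)

all.equal(M %*% M, M)        # idempotent: MM = M
all.equal(M, t(M))           # symmetric: M' = M
max(abs(M %*% X))            # MX = 0 up to floating-point error

# My equals the residuals from regressing y on X
all.equal(drop(M %*% y), unname(resid(lm(y ~ X - 1))))
```

Forming the full $n \times n$ matrix is fine for illustration but wasteful for large $n$; in practice the residuals are obtained from the fitted model instead.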

To solve for $\hat{\beta}_2$, recall the expression for $\hat{\beta}_1$ obtained from equation (1):

$$
\hat{\beta}_1=\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\left(y-X_2 \hat{\beta}_2\right)
$$

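Substituting this into equation (2) of the normal equations, $X_2^{\prime} X_1 \hat{\beta}_1+X_2^{\prime} X_2 \hat{\beta}_2=X_2^{\prime} y$, gives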

$$
\begin{aligned}
X_2^{\prime} y & =X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} y-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} X_2 \hat{\beta}_2+X_2^{\prime} X_2 \hat{\beta}_2 \\
X_2^{\prime} y-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} y & =X_2^{\prime} X_2 \hat{\beta}_2-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} X_2 \hat{\beta}_2 \\
X_2^{\prime} y-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} y & =\left[X_2^{\prime} X_2-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} X_2\right] \hat{\beta}_2 \\
X_2^{\prime} y-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} y & =\left[\left(X_2^{\prime}-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\right) X_2\right] \hat{\beta}_2 \\
X_2^{\prime} y-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime} y & =\left[X_2^{\prime}\left(I-X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\right) X_2\right] \hat{\beta}_2 \\
\left(X_2^{\prime}-X_2^{\prime} X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\right) y & =\left[X_2^{\prime}\left(I-X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\right) X_2\right] \hat{\beta}_2 \\
X_2^{\prime}\left(I-X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\right) y & =\left[X_2^{\prime}\left(I-X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\right) X_2\right] \hat{\beta}_2 \\
\hat{\beta}_2 & =\left[X_2^{\prime}\left(I-X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\right) X_2\right]^{-1} X_2^{\prime}\left(I-X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}\right) y \\
& =\left(X_2^{\prime} M_1 X_2\right)^{-1}\left(X_2^{\prime} M_1 y\right)
\end{aligned}
$$

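Here $M_1=I-X_1\left(X_1^{\prime} X_1\right)^{-1} X_1^{\prime}$ is the residual-maker matrix for $X_1$. Because $M_1$ is symmetric and idempotent, $X_2^{\prime} M_1 X_2=\left(M_1 X_2\right)^{\prime}\left(M_1 X_2\right)$ and $X_2^{\prime} M_1 y=\left(M_1 X_2\right)^{\prime}\left(M_1 y\right)$, so

$$
\hat{\beta}_2=\left(X_2^{* \prime} X_2^{*}\right)^{-1} X_2^{* \prime} y^*
$$

where $X_2^*=M_1 X_2$ and $y^*=M_1 y$. In words, $\hat{\beta}_2$ equals the coefficient from regressing the residuals of $y$ on $X_1$ on the residuals of $X_2$ on $X_1$.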
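The theorem can be illustrated with a short simulation. In the sketch below, the data-generating process and variable names are assumptions for illustration: $X_1$ consists of a constant and `x1`, and $X_2$ is the single regressor `x2`. The coefficient on `x2` from the full regression is compared with the one from the residual-on-residual regression.

``` r
# Simulated data (hypothetical)
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)              # correlated with x1, so partialling out matters
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(n)

# Full regression: coefficient on x2
b2_full <- coef(lm(y ~ x1 + x2))["x2"]

# Partial out X1 = [1, x1] from y and from x2 (this is M1 y and M1 x2)
y_star  <- resid(lm(y ~ x1))
x2_star <- resid(lm(x2 ~ x1))

# Regress residuals on residuals; no intercept is needed after partialling out
b2_fwl <- coef(lm(y_star ~ x2_star - 1))["x2_star"]

all.equal(unname(b2_full), unname(b2_fwl))
```

The point estimates coincide, but the standard errors printed by the second-stage `lm()` use a different degrees-of-freedom count than the full regression and should not be compared directly.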

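### Hypothesis Testing

#### t-statistic

Let $\widehat{\theta}$ be a sample statistic and $se(\widehat{\theta})$ its standard error. The t-statistic for a hypothesized value $\theta$ is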

$$
t(\theta)=\frac{\widehat{\theta}-\theta}{se(\widehat{\theta})}
$$
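As an illustration, the sketch below computes the t-statistic for the null hypothesis that a slope coefficient equals zero and compares it with the value reported by `summary()`. The simulated data and the hypothesized value of $0$ are assumptions made for the example.

``` r
# Simulated data (hypothetical)
set.seed(3)
n <- 100
x <- rnorm(n)
y <- 0.5 + 0.3 * x + rnorm(n)
fit <- lm(y ~ x)

theta0 <- 0                                   # hypothesized value under H0
est <- coef(fit)["x"]                         # theta_hat
se  <- sqrt(diag(vcov(fit)))["x"]             # se(theta_hat)
t_manual <- (est - theta0) / se

# Same statistic as reported in the coefficient table
t_summary <- coef(summary(fit))["x", "t value"]
all.equal(unname(t_manual), t_summary)

# Two-sided p-value from the t distribution with n - k degrees of freedom
2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)
```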

#### F-statistic

Let $\widehat{\beta}_{\text {ols }}$ be the unconstrained least-squares estimator, with associated variance estimator $\widehat{\sigma}^2=n^{-1} \sum_{i=1}^n\left(Y_i-X_i^{\prime} \widehat{\beta}_{\mathrm{ols}}\right)^2$. Let $\widetilde{\beta}_{\mathrm{cls}}$ be the constrained least-squares (CLS) estimator, with associated variance estimator $\widetilde{\sigma}^2=n^{-1} \sum_{i=1}^n\left(Y_i-X_i^{\prime} \widetilde{\beta}_{\mathrm{cls}}\right)^2$. The $F$ statistic for testing the null hypothesis $\mathbb{H}_0: \beta \in B_0$ is:

$$
F=\frac{\left(\widetilde{\sigma}^2-\widehat{\sigma}^2\right) / q}{\widehat{\sigma}^2 /(n-k)}
$$

where $q$ is the number of restrictions and $k$ is the number of parameters in the unconstrained model.
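The formula can be checked against R's nested-model comparison. In the sketch below the data are simulated and the null hypothesis (two slope coefficients equal zero, so $q=2$) is an assumption chosen for illustration; the constrained estimator is obtained simply by dropping the two regressors.

``` r
# Simulated data (hypothetical); x2 and x3 are irrelevant by construction
set.seed(11)
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.8 * x1 + rnorm(n)

unres <- lm(y ~ x1 + x2 + x3)    # unconstrained model, k = 4
res   <- lm(y ~ x1)              # constrained model under H0: beta_x2 = beta_x3 = 0
k <- length(coef(unres))
q <- 2

sigma2_hat   <- mean(resid(unres)^2)   # n^{-1} * unconstrained RSS
sigma2_tilde <- mean(resid(res)^2)     # n^{-1} * constrained RSS
F_manual <- ((sigma2_tilde - sigma2_hat) / q) / (sigma2_hat / (n - k))

# anova() reports the same statistic for the nested comparison
F_anova <- anova(res, unres)$F[2]
all.equal(F_manual, F_anova)
```

Because both variance estimators divide by $n$, the factor cancels and the statistic equals the familiar $\left[(RSS_r-RSS_u)/q\right] /\left[RSS_u /(n-k)\right]$.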

## Maximum Likelihood Estimation (MLE)

### A Normal-Distribution Example

Suppose we observe a set of numbers, for example

90.46561 105.1319 117.5445 102.7179 102.7788 107.6234 94.87266 95.48918
75.63886 87.40594

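We assume these are independent draws from a normal population with parameters $\mu$ and $\sigma^{2}$, and we want to infer the most plausible parameter values from the data.

We know the probability density function of the normal distribution, and the observations are independent. What we need is the joint density of all observed data points, which by independence is the product of the individual densities.

The probability density of a single data point $x$ generated by a normal distribution is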

$$
P(x ; \mu, \sigma)=\frac{1}{\sigma \sqrt{2 \pi}} \exp \left(-\frac{(x-\mu)^2}{2 \sigma^2}\right)
$$

so the joint density, viewed as a function of the parameters (the likelihood), can be written as

$$L(x ; \mu, \sigma)=\prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left(\frac{-\left(x_{i}-\mu\right)^{2}}{2 \sigma^{2}}\right)$$

To simplify the computation, take logs:

$$\log L(x ; \mu, \sigma)=-\frac{n}{2} \log (2 \pi)-n \log (\sigma)-\frac{1}{2 \sigma^{2}} \sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}$$

Setting the partial derivatives with respect to the unknown parameters equal to $0$ and solving gives the maximum likelihood estimators.

For example, for $\mu$

$$\frac{\partial \log L(x ; \mu, \sigma)}{\partial \mu}=\frac{1}{\sigma^{2}} \sum_{i=1}^{n}\left(x_{i}-\mu\right)=0$$

Solving gives $\hat{\mu}=\frac{1}{n} \sum_{i=1}^{n} x_{i}=\bar{x}$, so the maximum likelihood estimator of $\mu$ is the sample mean.
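The same steps can be applied to the ten observations listed earlier. The sketch below computes the closed-form estimates and, as a cross-check, maximizes the log-likelihood numerically with base R's `optim()`. The first-order condition for $\sigma$, which is not derived in the text, gives $\hat{\sigma}^2=\frac{1}{n} \sum_{i}\left(x_i-\bar{x}\right)^2$. The starting values and the log-parameterization of the standard deviation are choices made for this illustration.

``` r
# The ten observations from the example
x <- c(90.46561, 105.1319, 117.5445, 102.7179, 102.7788,
       107.6234, 94.87266, 95.48918, 75.63886, 87.40594)

# Closed-form maximum likelihood estimates
mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))   # divides by n, not n - 1

# Numerical check: minimize the negative log-likelihood.
# The sd is parameterized as exp(p[2]) so that it stays positive.
negloglik <- function(p) {
  -sum(dnorm(x, mean = p[1], sd = exp(p[2]), log = TRUE))
}
opt <- optim(c(100, log(10)), negloglik, method = "BFGS")  # assumed starting values

c(mu_hat, sigma_hat)                # analytic estimates
c(opt$par[1], exp(opt$par[2]))      # numerical estimates
```

Note that `sd(x)` divides by $n-1$, so it is slightly larger than the maximum likelihood estimate of $\sigma$.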

### Implementation in R

``` r
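# Load the maxLik package for maximum likelihood estimation
library(maxLik)

# Set the random seed so the results are reproducible
set.seed(123)

# Draw 100 normal random numbers with mean 1 and standard deviation 2
x <- rnorm(100, mean = 1, sd = 2)

# Define the log-likelihood function
log_likelihood <- function(param) {
  mu <- param[1]  # extract the parameter mu
  sigma <- param[2]  # extract the parameter sigma
  # Log-likelihood of the sample at the given parameter values
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Maximize the log-likelihood with maxLik, starting from mu = 0 and sigma = 1
mle <- maxLik(log_likelihood, start = c(mu = 0, sigma = 1))

# Print a summary of the maximum likelihood estimates
summary(mle)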
```

    --------------------------------------------
    Maximum Likelihood estimation
    Newton-Raphson maximisation, 7 iterations
    Return code 1: gradient close to zero (gradtol)
    Log-Likelihood: -201.5839 
    2  free parameters
    Estimates:
          Estimate Std. error t value  Pr(> t)    
    mu      1.1808     0.1816   6.503 7.89e-11 ***
    sigma   1.8165     0.1285  14.140  < 2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    --------------------------------------------

Alternatively, the code can be rewritten with the log-density spelled out explicitly:

``` r
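# Define fn_mle, a factory that builds a log-likelihood function for the data x
fn_mle <- function(x) {
  force(x)  # evaluate x now so the closure captures it
  # Return a function that takes the parameters and computes the log-likelihood
  function(param) {
    mu <- param[1]  # extract the parameter mu
    sigma <- param[2]  # extract the parameter sigma
    # Sum of normal log-densities, written out term by term
    sum((-1/2) * log(2 * pi) - log(sigma) - (1/2) * ((x - mu)^2 / (sigma^2)))
  }
}

# Maximize the log-likelihood with maxLik, starting from mu = 0 and sigma = 1
mle <- maxLik(logLik = fn_mle(x), start = c(mu = 0, sigma = 1))

# Print a summary of the maximum likelihood estimates
summary(mle)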
```

    --------------------------------------------
    Maximum Likelihood estimation
    Newton-Raphson maximisation, 7 iterations
    Return code 1: gradient close to zero (gradtol)
    Log-Likelihood: -201.5839 
    2  free parameters
    Estimates:
          Estimate Std. error t value  Pr(> t)    
    mu      1.1808     0.1817    6.50 8.06e-11 ***
    sigma   1.8165     0.1285   14.14  < 2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    --------------------------------------------

**An example from Wooldridge's textbook**

``` r
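# Set the warn option to -1 to suppress warning messages
options(warn=-1)

# Load the wooldridge package, which ships the textbook data sets
library("wooldridge")

# Load the wage1 data set
data(wage1)

# Linear regression of log wage (lwage) on years of education (educ),
# work experience (exper), and tenure with the current employer (tenure)
lmobj <- lm(lwage ~ educ + exper + tenure, data = wage1)

# Print the model summary: coefficients, standard errors, t values, p-values
summary(lmobj)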
```

    Call:
    lm(formula = lwage ~ educ + exper + tenure, data = wage1)

    Residuals:
         Min       1Q   Median       3Q      Max 
    -2.05802 -0.29645 -0.03265  0.28788  1.42809 

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept) 0.284360   0.104190   2.729  0.00656 ** 
    educ        0.092029   0.007330  12.555  < 2e-16 ***
    exper       0.004121   0.001723   2.391  0.01714 *  
    tenure      0.022067   0.003094   7.133 3.29e-12 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.4409 on 522 degrees of freedom
    Multiple R-squared:  0.316, Adjusted R-squared:  0.3121 
    F-statistic: 80.39 on 3 and 522 DF,  p-value: < 2.2e-16

``` r
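# Define the log-likelihood function of the normal linear regression model
ols_lf <- function(param) {
  # Split the parameter vector into the error standard deviation and the coefficients
  beta <- param[-1]  # regression coefficients
  sigma <- param[1]  # error standard deviation

  # Build the dependent variable and the design matrix
  y <- as.vector(wage1$lwage)  # dependent variable: log wage
  x <- cbind(1, wage1$educ, wage1$exper, wage1$tenure)  # constant, education, experience, tenure

  # Linear predictor
  mu <- x %*% beta  # matrix product gives the fitted means

  # Sum of normal log-densities of y with mean mu and standard deviation sigma
  sum(dnorm(y, mu, sigma, log = TRUE))
}

# Estimate the parameters by maximum likelihood (requires the maxLik package loaded above)
mle_ols <- maxLik(logLik = ols_lf,
                  start = c(sigma = 1, constant = 1,
                            educ = 1, exper = 1,
                            tenure = 1))
# Print a summary of the estimates
summary(mle_ols)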
```

    --------------------------------------------
    Maximum Likelihood estimation
    Newton-Raphson maximisation, 16 iterations
    Return code 1: gradient close to zero (gradtol)
    Log-Likelihood: -313.5478 
    5  free parameters
    Estimates:
             Estimate Std. error t value  Pr(> t)    
    sigma    0.439183   0.013541  32.434  < 2e-16 ***
    constant 0.284360   0.103795   2.740  0.00615 ** 
    educ     0.092029   0.007302  12.603  < 2e-16 ***
    exper    0.004121   0.001717   2.401  0.01637 *  
    tenure   0.022067   0.003082   7.160 8.05e-13 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    --------------------------------------------

**MLE and OLS in the linear regression model**

Consider the linear regression model

$$
\begin{aligned}
& Y=X^{\prime} \beta+e \\
& e \sim \mathrm{~N}\left(0, \sigma^2\right)
\end{aligned}
$$

Because the errors are assumed to be normally distributed, the conditional density of $y$ given $x$ is

$$
f(y \mid x)=\frac{1}{\left(2 \pi \sigma^2\right)^{1 / 2}} \exp \left(-\frac{1}{2 \sigma^2}\left(y-x^{\prime} \beta\right)^2\right)
$$

The likelihood function is

$$
\begin{aligned}
f\left(y_1, \ldots, y_n \mid x_1, \ldots, x_n\right) & =\prod_{i=1}^n f\left(y_i \mid x_i\right) \\
& =\prod_{i=1}^n \frac{1}{\left(2 \pi \sigma^2\right)^{1 / 2}} \exp \left(-\frac{1}{2 \sigma^2}\left(y_i-x_i^{\prime} \beta\right)^2\right) \\
& =\frac{1}{\left(2 \pi \sigma^2\right)^{n / 2}} \exp \left(-\frac{1}{2 \sigma^2} \sum_{i=1}^n\left(y_i-x_i^{\prime} \beta\right)^2\right) \\
& \stackrel{\text { def }}{=} L_n\left(\beta, \sigma^2\right)
\end{aligned}
$$

The log-likelihood function is

$$
\log L_n\left(\beta, \sigma^2\right)=-\frac{n}{2} \log \left(2 \pi \sigma^2\right)-\frac{1}{2 \sigma^2} \sum_{i=1}^n\left(Y_i-X_i^{\prime} \beta\right)^2 \stackrel{\text { def }}{=} \ell_n\left(\beta, \sigma^2\right)
$$

The first-order conditions are

$$
\begin{aligned}
& 0=\left.\frac{\partial}{\partial \beta} \ell_n\left(\beta, \sigma^2\right)\right|_{\beta=\widehat{\beta}_{\text {mle }}, \sigma^2=\widehat{\sigma}_{\text {mle }}^2}=\frac{1}{\widehat{\sigma}_{\text {mle }}^2} \sum_{i=1}^n X_i\left(Y_i-X_i^{\prime} \widehat{\beta}_{\text {mle }}\right) \\
& 0=\left.\frac{\partial}{\partial \sigma^2} \ell_n\left(\beta, \sigma^2\right)\right|_{\beta=\widehat{\beta}_{\text {mle }}, \sigma^2=\widehat{\sigma}_{\text {mle }}^2}=-\frac{n}{2 \widehat{\sigma}_{\text {mle }}^2}+\frac{1}{2 \widehat{\sigma}_{\text {mle }}^4} \sum_{i=1}^n\left(Y_i-X_i^{\prime} \widehat{\beta}_{\text {mle }}\right)^2
\end{aligned}
$$

Therefore

$$
\widehat{\beta}_{\mathrm{mle}}=\left(\sum_{i=1}^n X_i X_i^{\prime}\right)^{-1}\left(\sum_{i=1}^n X_i Y_i\right)=\widehat{\beta}_{\mathrm{ols}}
$$

$$
\widehat{\sigma}_{\mathrm{mle}}^2=\frac{1}{n} \sum_{i=1}^n\left(Y_i-X_i^{\prime} \widehat{\beta}_{\mathrm{mle}}\right)^2=\frac{1}{n} \sum_{i=1}^n\left(Y_i-X_i^{\prime} \widehat{\beta}_{\mathrm{ols}}\right)^2=\frac{1}{n} \sum_{i=1}^n \widehat{e}_i^2=\widehat{\sigma}_{\mathrm{ols}}^2
$$
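Note that $\widehat{\sigma}_{\mathrm{ols}}^2$ here is the version that divides by $n$, which differs from the degrees-of-freedom-corrected estimator $e^{\prime} e /(n-k)$ used for the OLS variance-covariance matrix. This is consistent with the wage example above, where `maxLik` reports `sigma` of about 0.4392 while `lm` reports a residual standard error of 0.4409 on 522 degrees of freedom. The sketch below illustrates the relationship on simulated data (the data are hypothetical).

``` r
# Simulated data (hypothetical)
set.seed(5)
n <- 120
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

fit <- lm(y ~ x)
k   <- length(coef(fit))
rss <- sum(resid(fit)^2)

sigma_mle <- sqrt(rss / n)          # maximum likelihood estimate of sigma
sigma_unb <- sqrt(rss / (n - k))    # degrees-of-freedom-corrected estimate

# lm's "residual standard error" is the corrected version
all.equal(sigma_unb, summary(fit)$sigma)

# The two differ by the factor sqrt((n - k) / n), which tends to 1 as n grows
all.equal(sigma_mle, summary(fit)$sigma * sqrt((n - k) / n))
```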
