## References

1. https://ani.stat.fsu.edu/~debdeep/location-scale.pdf
2. https://zhuanlan.zhihu.com/p/103110033 

# Honor 4 Sufficient Statistics

## Exponential Family

The exponential family is a class of distributions that includes many common ones, such as the normal, binomial, Poisson, negative binomial, exponential, and Gamma distributions. Its definition is as follows.

**Definition** Suppose a distribution has parameter $\theta\in\mathbb R^n$ and PDF $f(x;\theta)$. If the PDF can be written in the form 
$$f(x;\theta) = C(\theta)\exp\{\sum_{i=1}^k Q_i(\theta)T_i(x)\}h(x)$$
where $C(\theta),Q_i(\theta)$ depend only on $\theta$, $T_i(x),h(x)$ depend only on $x$, and $C(\theta)>0$, then we say that the distribution is in the exponential family.
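As a small worked illustration of the definition, consider the Poisson distribution with rate $\lambda>0$. This is only a sketch: $f$ is read here as a probability mass function rather than a PDF, and the labels under the braces are added for illustration.
$$f(x;\lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = \underbrace{e^{-\lambda}}_{C(\lambda)}\exp\{\underbrace{\log\lambda}_{Q_1(\lambda)}\cdot\underbrace{x}_{T_1(x)}\}\underbrace{\frac{1}{x!}}_{h(x)},\qquad x = 0,1,2,\dotsc$$
Here $k = 1$ and $C(\lambda) = e^{-\lambda}>0$, so the required form holds.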

### Support

Since $C(\theta)>0$ and the exponential factor is always positive, $f(x;\theta)>0\Leftrightarrow h(x)>0$. Given $\theta$, we define 
$${\rm supp} f(x;\theta) = \{x:\ f(x;\theta)>0\} = \{x:\ h(x)>0\}$$
as the support, which is independent of $\theta$. Thus, if the support of a distribution depends on $\theta$, the distribution cannot be in the exponential family.

### Examples

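1. The multivariate normal distribution $N(\mu,\Sigma)$ is in the exponential family.

Proof: Write $\Sigma^{-1} = (\sigma^{ij})$, which is symmetric. Then
$$f(x;\mu,\Sigma) = \frac{1}{(2\pi )^\frac n2 |\Sigma|^\frac12}\exp\{-\frac 12 (x - \mu)^T\Sigma^{-1} (x - \mu)\}
 = \frac{1}{(2\pi )^\frac n2 |\Sigma|^\frac12}\exp\{-\frac12\sum_{i,j} \sigma^{ij}(x_ix_j-2\mu_ix_j + \mu_i\mu_j)\}.$$
The factor $\exp\{-\frac12\sum_{i,j}\sigma^{ij}\mu_i\mu_j\}$ depends only on the parameters and is absorbed into $C(\mu,\Sigma)$. What remains in the exponent is a sum of parameter functions multiplied by the statistics $x_ix_j$ and $x_j$, so the density has the required form with $h(x) = 1$.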

2. The uniform distribution over $[0,\theta]$ is not in the exponential family.

Proof: The support $[0,\theta]$ depends on $\theta$.

### Natural Exponential Family

The natural exponential family is a subset of the exponential family. A distribution is said to be in the natural exponential family if $Q_i(\theta) = \theta_i\in \mathbb R$, i.e. it has the following form.
$$f(x;\theta) = C(\theta)\exp\{\sum_{i=1}^k \theta_iT_i(x)\}h(x)$$

We define the **natural parameter space** by
$$\{\theta\in\mathbb R^k:\ \int \exp\{\sum_{i=1}^k \theta_i T_i(x)\}h(x)dx<\infty\}.$$
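For a concrete instance, take the exponential distribution with rate $\lambda>0$, whose density is $\lambda e^{-\lambda x}$ for $x>0$. The reparametrization $\theta = -\lambda$ below is a choice made for this sketch. With $T_1(x) = x$ and $h(x) = \mathbb I_{x>0}$,
$$\int \exp\{\theta x\}h(x)dx = \int_0^\infty e^{\theta x}dx = \left\{\begin{array}{ll} -\frac1\theta & \theta<0\\ \infty & \theta\geqslant 0\end{array}\right.$$
so the natural parameter space is $(-\infty,0)$ and the normalizing factor is $C(\theta) = -\theta$.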

One can show that the natural parameter space is a convex set and that, for a fixed $x$ in the support, the log-density $\log f(x;\theta)$ is a concave function of $\theta$ (equivalently, $-\log C(\theta)$ is convex).

## Sufficient Statistics

Informally, if a vector of statistics contains all the information that a sample carries about the parameter, we call it a sufficient statistic.

**Definition** Suppose $X_1,\dotsc,X_n$ are sampled from some $F_\theta (x)$, where $\theta\in\Theta $ is unknown, and $T_1,\dotsc,T_m$ are statistics. If the conditional distribution
$$f(X_1,\dotsc,X_n|T_1=t_1,\dotsc,T_m = t_m)$$
does not depend on $\theta$, then we call $\{T_j\}$ sufficient statistics.

### Examples

1. Bernoulli trials: Suppose $X_1,\dotsc,X_n\sim B(p)$ are i.i.d. Bernoulli trials with success probability $p$. Let $T = \sum X_i$, then $T$ is a sufficient statistic.

Proof: Suppose $T = k$, then 
$$\begin{aligned}\mathbb P(X_1 = x_1,\dotsc,X_n = x_n|T = \sum X_i = k)& =
\frac{\mathbb P(X_1 = x_1,\dotsc,X_n = x_n,\ T = \sum X_i = k)}{\mathbb P(T = \sum X_i = k)}\\ &=
\left\{\begin{array}{ll}\frac{p^{k} (1 -p)^{n-k}}{\binom nkp^k(1-p)^{n-k}} & \sum x_i = k\\ 
0 & \sum x_i \neq k\end{array}\right.\\&=
\left\{\begin{array}{ll}\frac{1}{\binom nk} & \sum x_i = k\\ 
0 & \sum x_i \neq k\end{array}\right. \end{aligned}$$
which does not depend on $p$.
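The computation above can be checked numerically by brute-force enumeration. The following is a minimal sketch, not part of the original notes; the function name `conditional_pmf` and the values of `n`, `k`, and `p` are chosen here purely for illustration.

```python
from itertools import product
from math import comb, isclose

def conditional_pmf(n, k, p):
    """Return P(X = x | T = k) for every binary x of length n with sum k."""
    def joint(x):
        # Joint probability of one sample x of n independent Bernoulli(p) trials.
        s = sum(x)
        return p**s * (1 - p)**(n - s)

    outcomes = [x for x in product((0, 1), repeat=n) if sum(x) == k]
    p_t = sum(joint(x) for x in outcomes)  # P(T = k)
    return {x: joint(x) / p_t for x in outcomes}

n, k = 5, 2
for p in (0.1, 0.5, 0.9):
    pmf = conditional_pmf(n, k, p)
    # Every admissible x has conditional probability 1 / C(n, k), whatever p is.
    assert all(isclose(v, 1 / comb(n, k)) for v in pmf.values())
```

The assertion passes for each tested $p$, which matches the claim that the conditional distribution given $T = k$ is uniform over the $\binom nk$ admissible samples.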

2. Normal distribution: Suppose $X_1,\dotsc,X_n$ are i.i.d. samples from a normal distribution $N(\mu,\sigma^2)$ where $\sigma$ is known but $\mu$ is unknown. Then $T = \frac 1n \sum X_i$ is sufficient.

Proof: Let $e = [1,\dotsc,1]^T\in \mathbb R^{n}$ be the vector of ones. Then
$$[X_1,\dotsc,X_n,\frac1n\sum X_i]^T\sim N (\mu \left[\begin{matrix}e\\ 1\end{matrix}\right],\sigma^2\left[\begin{matrix}I & \frac1n  e \\ \frac 1n e^T & \frac1n\end{matrix}\right])$$

Then we recall that the conditional multivariate Gaussian distribution is given by 
$$X_1,\dotsc,X_n|\frac 1n \sum X_i\quad \sim \quad 
N\left( \frac 1n  \sum X_i  e,\quad \sigma^2\left(I - \frac 1n ee^T\right) \right).$$

It does not depend on $\mu$ as expected.
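As a sketch of the conditioning step, the general formula for a jointly Gaussian pair can be written out. The block names $Y, Z$ and $\Sigma_{YY},\Sigma_{YZ},\Sigma_{ZY},\Sigma_{ZZ}$ are notation introduced here for illustration, with $Y = [X_1,\dotsc,X_n]^T$, $Z = \frac1n\sum X_i$, and $e$ taken as the all-ones vector in $\mathbb R^n$:
$$Y\,|\,Z = z\ \sim\ N\left(\mu_Y + \Sigma_{YZ}\Sigma_{ZZ}^{-1}(z - \mu_Z),\ \Sigma_{YY} - \Sigma_{YZ}\Sigma_{ZZ}^{-1}\Sigma_{ZY}\right).$$
Plugging in $\mu_Y = \mu e$, $\mu_Z = \mu$, $\Sigma_{YY} = \sigma^2 I$, $\Sigma_{YZ} = \frac{\sigma^2}{n}e$ and $\Sigma_{ZZ} = \frac{\sigma^2}n$ gives
$$\mu e + \frac{\sigma^2}{n}e\cdot\frac{n}{\sigma^2}(z-\mu) = ze,\qquad \sigma^2 I - \frac{\sigma^2}{n}e\cdot\frac{n}{\sigma^2}\cdot\frac{\sigma^2}{n}e^T = \sigma^2\left(I - \frac1n ee^T\right),$$
so the terms containing $\mu$ cancel in the mean and never appear in the covariance.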

### Factorization Theorem 

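(Fisher) $T$ is sufficient if and only if the joint density of the sample can be factorized as 
$$f(x;\theta) = g(T(x),\theta)h(x),$$
where $g$ depends on $x$ only through $T(x)$ and $h$ does not depend on $\theta$.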

### Examples

1. If $\phi$ is a bijection and $T$ is sufficient, then $\phi(T)$ is also sufficient.

2. If $f(x;\theta) = C(\theta)\exp\{\sum_{i=1}^k Q_i(\theta)T_i(x)\}h(x)$, then $\{T_i(x)\}$ are sufficient.

3. The uniform distribution over $[0,\theta]$, where $\Theta = \{\theta:\ \theta>0\}$, has sufficient statistic $T = \max X_i$, because the joint density of the sample is

$$f(x;\theta) = \theta^{-n}\prod \mathbb I_{0\leqslant x_i\leqslant \theta} = \theta^{-n}\mathbb I_{T\leqslant \theta}\cdot \mathbb I_{\min x_i\geqslant 0}.$$
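The factorization for the uniform case can be illustrated with a short numerical check. This is a minimal sketch under stated assumptions: the function names `likelihood_full` and `likelihood_via_max`, the seed, and the sample size are hypothetical choices made for this example.

```python
import random

def likelihood_full(xs, theta):
    """Likelihood of U[0, theta], checking every observation against theta."""
    inside = all(0 <= x <= theta for x in xs)
    return theta ** (-len(xs)) if inside else 0.0

def likelihood_via_max(xs, theta):
    """Same likelihood written as g(T, theta) * h(x) with T = max(xs)."""
    t = max(xs)
    h = 1.0 if min(xs) >= 0 else 0.0                 # h(x): no theta involved
    g = theta ** (-len(xs)) if t <= theta else 0.0   # g(T, theta): data enter only via T
    return g * h

random.seed(0)
xs = [random.uniform(0, 2.0) for _ in range(10)]
for theta in (0.5, 1.0, 2.0, 3.0):
    assert likelihood_full(xs, theta) == likelihood_via_max(xs, theta)
```

Both functions agree for every tested $\theta$, so the dependence on $\theta$ passes through the data only via the maximum.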

## Complete Statistics

A statistic $T$ is called a complete statistic if, for every function $\phi$ such that
$\mathbb E_\theta (\phi(T(X)) )= 0$ for all $\theta\in\Theta$, we have 
$$\mathbb P_\theta (\phi(T(X)) = 0) = 1$$
for all $\theta\in\Theta$.
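As an illustration, consider $T = \sum X_i$ for $n$ i.i.d. Bernoulli trials with success probability $p\in(0,1)$, so that $T$ follows a binomial distribution. The substitution $r = p/(1-p)$ is introduced here only for this sketch.
$$\mathbb E_p(\phi(T)) = \sum_{k=0}^n \phi(k)\binom nk p^k(1-p)^{n-k} = (1-p)^n\sum_{k=0}^n \phi(k)\binom nk r^k,\qquad r = \frac{p}{1-p}\in(0,\infty).$$
If this vanishes for every $p\in(0,1)$, then the polynomial $\sum_k \phi(k)\binom nk r^k$ vanishes for every $r>0$, so all of its coefficients are zero. Hence $\phi(k) = 0$ for $k = 0,\dotsc,n$, and $T$ is complete.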


### Theorem 

For a natural exponential family, if its natural parameter space has nonempty interior, then $T = (T_1,\dotsc,T_k)$ is complete.

## Estimator


### Estimator Improvement

Suppose $T= T(X)$ is a sufficient statistic and $\hat g(X)$ is an unbiased estimator of $g(\theta)$. Then we can construct the conditional estimator
$$h(T) = \mathbb E(\hat g(X)|T),$$
which does not depend on $\theta$ because $T$ is sufficient. By the Rao-Blackwell theorem, $h(T)$ is unbiased and 
$${\rm MSE}_\theta (h(T))\leqslant {\rm MSE}_\theta (\hat g(X)).$$
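The improvement can be seen in a small exact computation. The sketch below assumes $n$ i.i.d. Bernoulli trials with success probability $p$, the crude unbiased estimator $\hat g(X) = X_1$, and $T = \sum X_i$, for which $\mathbb E(X_1|T) = T/n$ by symmetry. The function names and parameter values are hypothetical and chosen for this example.

```python
from itertools import product
from math import isclose

def mse_exact(estimator, n, p):
    """Exact MSE of estimator(x) for p, summing over all 2**n Bernoulli samples."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        s = sum(x)
        prob = p**s * (1 - p)**(n - s)   # probability of this particular sample
        total += prob * (estimator(x) - p) ** 2
    return total

def crude(x):
    return x[0]               # unbiased for p, but uses only the first trial

def improved(x):
    return sum(x) / len(x)    # E[X_1 | T] = T / n with T = sum(x)

n = 6
for p in (0.2, 0.5, 0.7):
    assert isclose(mse_exact(crude, n, p), p * (1 - p))
    assert isclose(mse_exact(improved, n, p), p * (1 - p) / n)
```

In this setting the conditional estimator reduces the MSE from $p(1-p)$ to $p(1-p)/n$.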


### Lehmann-Scheffé Theorem
(Lehmann-Scheffé, L-S) Suppose the statistic $T$ is sufficient and complete. If $\hat g(T(X))$ is an unbiased estimator of $g(\theta)$, then $\hat g(T(X))$ is the unique UMVUE (uniformly minimum-variance unbiased estimator) of $g(\theta)$.