## Definition of a random sample

### Random sample
Random variables $X_1,...,X_n$ are called a random sample of size $n$ from the population $f(x)$ if
- $X_1,...,X_n$ are mutually independent, and
- the marginal pmf/pdf of each $X_i$ is $f(x)$.

Alternative name for a random sample:
- Independent and identically distributed (iid) random variables with pdf or pmf $f(x)$
- i.i.d. $f(x)$ = random sample from $f(x)$



### Random Samples and Statistical inference

- We view data as **observations of random variables**
- Usually have more than one observation
    
    $X_1, X_2,..., X_n$

    - Can often assume that $X_1,X_2,...,X_n$ is a **random sample**

- We model the data by specifying a joint distribution

    $f(x_1,x_2,...,x_n|\theta)$

    - $\theta$ is unknown
    - with the goal of 
        - learning about (estimating) $\theta$ and/or
        - predicting observations of $X_{n+1},X_{n+2},...$

- Use some summary of the data to do this
    - Need to find the distribution of that summary $\to$ **sampling distribution**

#### Example: 

- Say we collected data on temperatures at CLE airport.
- Say data points are $x_1 = 11^{\circ}C, x_2=12^{\circ}C, x_3=12.2^{\circ}C,...,x_n=17^{\circ}C$
- We will assume that $x_1,x_2,...,x_n$ are realizations of random variables $X_1,X_2,...,X_n$.

- If $X_1,X_2,...,X_n$ are a random sample from $f(x)$, then $f(x)$ can be seen as the (population) distribution of temperatures at all times at CLE.

- Calculate e.g. the sample mean $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$

- The distribution of $\bar{X}$ is its sampling distribution.

### About random samples

- Recall: Random variables $X_1,X_2,...,X_n$ are mutually independent if and only if

    $f(x_1,...,x_n)=f_1(x_1)\times...\times f_n(x_n)$

    where $f_i(x_i)$ is the marginal pdf/pmf of $X_i$.

- So, what is the joint pdf/pmf of a random sample from $f(x)$?
    - $f_i(x_i)=f(x_i)$ for all $i$, so

        $\begin{aligned}
        f(x_1,...,x_n)&=f_1(x_1)\times\cdots\times f_n(x_n)\\
                      &=f(x_1)\times\cdots\times f(x_n)\\
                      &=\prod_{i=1}^{n}f(x_i)
        \end{aligned}$
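
As a small illustration of the product form, the sketch below evaluates the joint pdf of a random sample by multiplying the marginal pdf over the observations. The standard normal population and the three observed values are assumptions chosen only for the example; `normal_pdf` and `joint_pdf` are hypothetical helper names.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def joint_pdf(sample, marginal_pdf):
    """Joint density of an iid sample: product of the marginal density at each x_i."""
    return math.prod(marginal_pdf(x) for x in sample)

# Hypothetical observed values, used only for illustration.
xs = [0.3, -1.2, 0.8]
joint = joint_pdf(xs, normal_pdf)
```

Any other marginal pdf or pmf can be passed in place of `normal_pdf`, since only the product structure matters here.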

### Mutually independent

- Random variables $X_1,...,X_n$ are (mutually) independent if and only if

    $f(x_1,...,x_n)=f_1(x_1)\times\cdots\times f_n(x_n)$

    where $f_i(x_i)$ is the marginal pdf/pmf of $X_i$.

- $\Rightarrow$ any subcollection of $X_1,...,X_n$ is also (mutually) independent.

- For example:

    $\begin{aligned}
    f(x_1,x_2)&=\int\cdots\int f(x_1,...,x_n)dx_3\cdots dx_n \\
              &=\int\cdots\int f_1(x_1)\times\cdots\times f_n(x_n) dx_3\cdots dx_n\\
              &=f_1(x_1)f_2(x_2)\int f_3(x_3)dx_3\times\cdots\times\int f_n(x_n) dx_n\\
              &=f_1(x_1)f_2(x_2)
    \end{aligned}$



### More about random samples

- Not all **collections** of random variables are random **samples**
    - Need both independence and same (marginal) distributions
- If population is finite and we sample **without replacement** we don't get a random sample.

#### Example:
- Draw cards from a standard deck of 52 cards.
- Let $X_i=$ the card we get in draw $i$, $i=1,...,10.$
- All $X_i$ have the same (marginal) distribution, but they are not independent since e.g.:

    $P(X_1=3\spadesuit)=\frac{1}{52}$

    $\text{But}~P(X_1=3\spadesuit\mid X_4=2\diamondsuit)=\frac{1}{51}$
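
The same effect can be checked exactly by brute-force counting on a smaller deck. The sketch below assumes a hypothetical deck of only $N=6$ cards (labelled 0 to 5) so that all ordered outcomes of four draws can be enumerated; `prob` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction
from itertools import permutations

N = 6  # hypothetical small deck, cards labelled 0..N-1
draws = list(permutations(range(N), 4))  # all equally likely ordered outcomes of 4 draws

def prob(event, given=lambda d: True):
    """Exact P(event | given), obtained by counting equally likely outcomes."""
    conditioning = [d for d in draws if given(d)]
    favourable = [d for d in conditioning if event(d)]
    return Fraction(len(favourable), len(conditioning))

p_first = prob(lambda d: d[0] == 3)    # marginal of draw 1
p_fourth = prob(lambda d: d[3] == 3)   # marginal of draw 4
p_conditional = prob(lambda d: d[0] == 3, given=lambda d: d[3] == 2)
```

Here the two marginals agree at $\frac{1}{N}$ while the conditional probability is $\frac{1}{N-1}$, mirroring $\frac{1}{52}$ versus $\frac{1}{51}$ for the full deck.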

### A simple random sample

- Sampling without replacement from a finite population is very common.
- A **simple random sample** of size $n$, $X_1,...,X_n$, from a finite population of size $N$ comes from a selection procedure where:
    - Any subset of $n$ elements has **the same probability** of being selected.
- Simple random sample $\neq$ random sample
- If $N$ is huge we have simple random sample $\approx$ random sample
    - $\frac{1}{N}\approx\frac{1}{N-1}$, "almost independent!"
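
A minimal sketch of the "every subset is equally likely" property, assuming a hypothetical population of $N=5$ labelled units and samples of size $n=2$ drawn one at a time without replacement. It also computes the gap $\frac{1}{N-1}-\frac{1}{N}$ to show how quickly the dependence fades as $N$ grows; `dependence_gap` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb

N, n = 5, 2  # hypothetical small population and sample size
ordered_draws = list(permutations(range(N), n))  # equally likely ordered draws

# Count how many ordered draws lead to each unordered subset of size n.
counts = Counter(frozenset(d) for d in ordered_draws)
subset_probs = {s: Fraction(c, len(ordered_draws)) for s, c in counts.items()}

def dependence_gap(pop_size):
    """Difference between the conditional (1/(N-1)) and marginal (1/N) probabilities."""
    return Fraction(1, pop_size - 1) - Fraction(1, pop_size)
```

Every subset ends up with probability $1/\binom{N}{n}$, and `dependence_gap(N)` equals $\frac{1}{N(N-1)}$, which is negligible for large populations.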

### More definitions

#### A Statistic

Let
- $X_1,...,X_n$ be a random sample of size $n$
- $T(x_1,...,x_n)$ be a real-valued (or vector-valued) function with a domain that includes the sample space of $(X_1,...,X_n)$
    - You just have to be able to evaluate $T(X_1,X_2,...,X_n)$ for any possible value of $X_1,...,X_n$.

Then
- The random variable (or random vector) $Y=T(X_1,...,X_n)$ is called a **statistic**.
- The probability distribution of $Y$ is called the sampling distribution of $Y$
- In short: A statistic is a function of a random sample.
- Note: A statistic cannot be a function of an unknown parameter; it must be computable from the data alone.
- Key points:
    - In general, a statistic is a function of a collection of random variables (does not have to be a random sample).


### Commonly seen statistics
- Sample mean:

    $\bar{X}=\frac{1}{n}(X_1+\cdots+X_n)=\frac{1}{n}\sum_{i=1}^n X_i$

- Sample variance:

    $S^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$

- Sample standard deviation:

    $S=\sqrt{S^2}$

The random variables $\bar{X}, S^2, S$ all have a sampling distribution.
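
As a quick sketch, the three statistics can be computed directly from their definitions and compared with Python's `statistics` module, whose `variance` and `stdev` also use the $n-1$ divisor. The four temperatures below are a hypothetical data set for illustration only.

```python
import math
import statistics

def sample_variance(xs):
    """S^2 with the n - 1 divisor, computed from the definition."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

temps = [11.0, 12.0, 12.2, 17.0]  # hypothetical observations in degrees Celsius

xbar = sum(temps) / len(temps)   # sample mean
s2 = sample_variance(temps)      # sample variance
s = math.sqrt(s2)                # sample standard deviation
```

A different data set would give different values of `xbar`, `s2` and `s`, which is exactly why these quantities have sampling distributions.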

### Sampling distributions

- A lot of Statistical inference is based on sampling distributions
- Hence our focus on distribution of functions of random variables!
    - A bit easier if we have a random sample
- Example: If the mgf for the population exists:

Let $X_1,...,X_n$ be a random sample of size $n$ from a population with mgf $M_X(t)$. The mgf of the sample mean is

$$M_{\bar{X}}(t)=(M_X(t/n))^n$$

- Only useful if we recognize the mgf on the right side

$$\bar{X}=\frac{1}{n}X_1+\cdots+\frac{1}{n}X_n$$

- i.e. $\bar{X}=b+a_1X_1+\cdots+a_nX_n$ with $a_i=\frac{1}{n}$ for all $i$ and $b=0$
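
For reference, the mgf of the sample mean can be derived in three steps, using only the definition of the mgf and the two defining properties of a random sample:

$$\begin{aligned}
M_{\bar{X}}(t)&=E\big(e^{t\bar{X}}\big)=E\big(e^{\frac{t}{n}X_1}\cdots e^{\frac{t}{n}X_n}\big)\\
              &=E\big(e^{\frac{t}{n}X_1}\big)\cdots E\big(e^{\frac{t}{n}X_n}\big)\quad\text{(independence)}\\
              &=\big(M_X(t/n)\big)^n\quad\text{(identical distributions)}
\end{aligned}$$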

#### Examples of sampling distributions
- Let $X_1,X_2,...,X_n$ be a random sample from $N(\mu,\sigma^2)$. What's the distribution of $\bar{X}$?

$$E(\bar{X})=E(\frac{1}{n}\sum_{i=1}^n X_i)=\frac{1}{n}\sum_{i=1}^n E(X_i)=\frac{1}{n}n\mu=\mu$$

$$\begin{aligned}
Var(\bar{X})&=Var(\frac{1}{n}\sum_{i=1}^n X_i)\\
            &=\frac{1}{n^2}Var(\sum_{i=1}^n X_i)\\
            &=\frac{1}{n^2}\sum_{i=1}^nVar(X_i)\\
            &=\frac{1}{n^2}n \sigma^{2}\\
            &=\frac{\sigma^2}{n}
\end{aligned}$$

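Since $\bar{X}$ is a linear combination of independent normal random variables, it is itself normal:

$$\bar{X}\sim N(\mu,\frac{\sigma^2}{n})$$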

- Let $X_1,X_2,...,X_n$ be a random sample from Gamma($\alpha,\beta$). What is the distribution of $\bar{X}$?

    MGF of Gamma($\alpha,\beta$): $M_X(t)=(\frac{1}{1-\beta t})^\alpha$ for $t<\frac{1}{\beta}$

    $\Rightarrow M_{\bar{X}}(t)=(M_X(\frac{t}{n}))^n=(\frac{1}{1-\frac{t\beta}{n}})^{\alpha n}$

    $\Rightarrow$ mgf of Gamma($n\alpha,\frac{\beta}{n}$)

    $\Rightarrow \bar{X}\sim$ Gamma($\alpha n,\frac{\beta}{n}$) 
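
Both examples can be sanity-checked by simulation. The sketch below is a rough Monte Carlo check, not a proof: the parameter values, seed, sample size and number of replications are arbitrary assumptions, and `simulate_means` and `gamma_mgf` are hypothetical helper names. It relies on `random.gammavariate(alpha, beta)` using the shape/scale parameterization, which matches the mgf above.

```python
import random
import statistics

def gamma_mgf(t, alpha, beta):
    """mgf of Gamma(alpha, beta) with scale beta; only valid for t < 1/beta."""
    return (1.0 / (1.0 - beta * t)) ** alpha

def simulate_means(draw, n, reps):
    """Return `reps` simulated sample means, each based on n calls to draw()."""
    return [sum(draw() for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is reproducible
n, reps = 10, 20000     # arbitrary illustrative choices

# Normal population: theory gives mean mu and variance sigma^2 / n.
mu, sigma = 5.0, 2.0
normal_means = simulate_means(lambda: rng.gauss(mu, sigma), n, reps)
normal_mean = statistics.fmean(normal_means)
normal_var = statistics.variance(normal_means)

# Gamma population: theory gives Gamma(n * alpha, beta / n) for the sample mean,
# i.e. mean alpha * beta and variance alpha * beta**2 / n.
alpha, beta = 2.0, 3.0
gamma_means = simulate_means(lambda: rng.gammavariate(alpha, beta), n, reps)
gamma_mean = statistics.fmean(gamma_means)
gamma_var = statistics.variance(gamma_means)

# Deterministic check of the mgf identity at one admissible value of t.
t = 0.1
lhs = gamma_mgf(t / n, alpha, beta) ** n
rhs = gamma_mgf(t, n * alpha, beta / n)
```

The simulated moments should land close to the theoretical ones, with the remaining discrepancy shrinking as `reps` increases.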

### Sampling distributions: convolution formula
If mgfs are not available, we can use the convolution formula instead.
#### Convolution formula
Let $X$ and $Y$ be independent random variables with pdfs $f_X(x)$ and $f_Y(y)$. Then the pdf of $Z=X+Y$ is

$$f_Z(z)=\int_{-\infty}^{\infty}f_X(w)f_Y(z-w)dw$$

- where $w=x$

- Proof:

    Use the bivariate transformation $Z=X+Y$, $W=X$, whose inverse is

    $x=w=h_1(z,w),\quad y=z-x=z-w=h_2(z,w)$

    $(X,Y)\to(Z,W)$

    $\frac{\partial x}{\partial z}=0,\frac{\partial x}{\partial w}=1,\frac{\partial y}{\partial z}=1, \frac{\partial y}{\partial w}=-1$

    $|J|=|0\cdot(-1)-1\cdot 1|=|-1|=1$

    Since $X$ and $Y$ are independent, $f_{XY}(x,y)=f_{X}(x)f_{Y}(y)$

    Therefore, $f_{ZW}(z,w)=f_{XY}(h_1(z,w),h_2(z,w))|J|=f_X(w)f_Y(z-w)$

    Integrating out $w$ gives the marginal pdf of $Z$:

    $f_Z(z)=\int_{-\infty}^{\infty}f_{ZW}(z,w)dw=\int_{-\infty}^{\infty}f_X(w)f_Y(z-w)dw$

- When $n>2$, just iterate:

    $Z_1=X_1+X_2$

    $Z_2=X_1+X_2+X_3=Z_1+X_3$

    $\vdots$

    $Z_n=Z_{n-1}+X_n$
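
The convolution integral can also be approximated numerically. The sketch below assumes two independent Exponential(1) variables, whose sum is known to have the Gamma(2, 1) density $ze^{-z}$, and uses a plain midpoint rule; `convolve_pdf` and `exp_pdf` are hypothetical helper names and the integration range is an arbitrary choice.

```python
import math

def exp_pdf(x, rate=1.0):
    """Density of an Exponential(rate) variable; zero for negative x."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def convolve_pdf(f_x, f_y, z, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f_x(w) * f_y(z - w) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        w = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += f_x(w) * f_y(z - w)
    return h * total

z = 2.0
approx = convolve_pdf(exp_pdf, exp_pdf, z, lo=0.0, hi=20.0)
exact = z * math.exp(-z)  # Gamma(2, 1) density at z
```

For $n>2$ the same function could be applied repeatedly, feeding the density of $Z_{k-1}$ back in as `f_x`, although the cost grows quickly with naive numerical integration.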

### Moments of sampling distributions

#### Lemma

Let $X_1,...,X_n$ be a random sample of size $n$ from a population and let $g(x)$ be a function such that $E(g(X_1))$ and $Var(g(X_1))$ exist. Then

$$E(\sum_{i=1}^{n}g(X_i))=nE(g(X_1))\tag{1}$$

$$Var(\sum_{i=1}^{n}g(X_i))=nVar(g(X_1))\tag{2}$$

- Proof $(1)$:

    $\begin{aligned}
    &E(\sum_{i=1}^{n}g(X_i))\\
    &=E(g(X_1)+g(X_2)+...+g(X_n))\\
    &=E(g(X_1))+E(g(X_2))+...+E(g(X_n))\\
    &=nE(g(X_1))\\
    \end{aligned}$

- Proof (2):

    $\begin{aligned}
    &Var(\sum_{i=1}^{n}g(X_i))\\
    &=E\big([\sum_{i=1}^{n}g(X_i)-E(\sum_{i=1}^{n}g(X_i))]^2\big)\\
    &=E\big([\sum_{i=1}^{n}g(X_i)-nE(g(X_1))]^2\big)\\
    &=E\big([g(X_1)-E(g(X_1))+g(X_2)-E(g(X_2))+...+g(X_n)-E(g(X_n))]^2\big)\\
    &=E\big[(g(X_1)-E(g(X_1)))^2+\cdots+(g(X_n)-E(g(X_n)))^2+2(g(X_1)-E(g(X_1)))(g(X_2)-E(g(X_2)))+\cdots+2(g(X_{n-1})-E(g(X_{n-1})))(g(X_{n})-E(g(X_{n})))\big]\\
    &=E((g(X_1)-E(g(X_1)))^2)+\cdots+E((g(X_n)-E(g(X_n)))^2),\,\text{since } Cov(g(X_i),g(X_j))=0,\ \forall i\neq j\\
    &=nVar(g(X_1))\\
    \end{aligned}$
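
Both parts of the lemma can be verified exactly on a small discrete example. The sketch below assumes a hypothetical three-point pmf, $n=3$ and $g(x)=x^2$, and enumerates every possible sample with exact fractions; `expect` and `total_g` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}  # hypothetical population pmf
n = 3

def g(x):
    return x * x

def expect(h):
    """E[h(X_1, ..., X_n)] for an iid sample, summing over all possible outcomes."""
    total = Fraction(0)
    for xs in product(pmf, repeat=n):
        p = Fraction(1)
        for x in xs:
            p *= pmf[x]  # joint pmf is the product of the marginals
        total += p * h(xs)
    return total

def total_g(xs):
    return sum(g(x) for x in xs)

mean_sum = expect(total_g)
var_sum = expect(lambda xs: total_g(xs) ** 2) - mean_sum ** 2

# Single-observation moments of g(X_1).
mean_g = sum(p * g(x) for x, p in pmf.items())
var_g = sum(p * g(x) ** 2 for x, p in pmf.items()) - mean_g ** 2
```

With this pmf, `mean_sum` equals `n * mean_g` and `var_sum` equals `n * var_g` exactly, as (1) and (2) state.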

### Theorem

Let $X_1,...,X_n$ be a random sample of size $n$ from a population with mean $\mu$ and variance $\sigma^2<\infty$. Then

1. $E(\bar{X})=\mu$
    - $E(\frac{1}{n}\sum_{i=1}^n X_i)=\frac{1}{n}n\mu=\mu$
2. $Var(\bar{X})=\frac{\sigma^2}{n}$
    - $Var(\frac{1}{n}\sum_{i=1}^{n}X_i)=\frac{1}{n^2}n\sigma^2=\frac{\sigma^2}{n}$
3. $E(S^2)=\sigma^2$
4. Useful fact: For any numbers $x_1,...,x_n$ we have
    - $\sum_{i=1}^{n}(x_i-\bar{x})^2=\sum_{i=1}^n x_{i}^2-n\bar{x}^2$
        - where $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$

- Proof $(3)$:
    
    $S^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$

    $\begin{aligned}
    E(S^2)&=\frac{1}{n-1}\big(E(\sum_{i=1}^{n}X_i^2-n\bar{X}^2)\big),\,\text{using (4)}\\
          &=\frac{1}{n-1}(\sum_{i=1}^{n}E(X_i^2)-nE(\bar{X}^2))\\
          &=\frac{1}{n-1}(n\cdot(\sigma^2+\mu^2)-n(\frac{\sigma^2}{n}+\mu^2)),\,\text{since } E(Y^2)=Var(Y)+(E(Y))^2\\
          &=\frac{1}{n-1}(n-1)\sigma^2\\
          &=\sigma^2
    \end{aligned}$
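
The unbiasedness of $S^2$ and the useful fact (4) can both be checked exactly by enumeration. This is a minimal sketch assuming a hypothetical three-point pmf and $n=3$; `sample_variance` and `identity_holds` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}  # hypothetical population pmf
n = 3

# Population mean and variance.
mu = sum(p * x for x, p in pmf.items())
sigma2 = sum(p * (x - mu) ** 2 for x, p in pmf.items())

def sample_variance(xs):
    """S^2 with the n - 1 divisor, in exact arithmetic."""
    xbar = Fraction(sum(xs), len(xs))
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def identity_holds(xs):
    """Check sum (x_i - xbar)^2 == sum x_i^2 - n * xbar^2 for one sample."""
    n_obs = len(xs)
    xbar = Fraction(sum(xs), n_obs)
    return sum((x - xbar) ** 2 for x in xs) == sum(x * x for x in xs) - n_obs * xbar ** 2

# E(S^2): weight each possible sample by its joint probability.
expected_s2 = Fraction(0)
for xs in product(pmf, repeat=n):
    p = Fraction(1)
    for x in xs:
        p *= pmf[x]
    expected_s2 += p * sample_variance(xs)
```

Here `expected_s2` matches `sigma2` exactly, which is the reason for dividing by $n-1$ rather than $n$ in the definition of $S^2$.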