# Inference for Proportions
$$\newcommand{\ffrac}{\displaystyle \frac}
\newcommand{\Tran}[1]{{#1}^{\mathrm{T}}}
\newcommand{\d}[1]{\displaystyle{#1}}
\newcommand{\EE}[1]{\mathbb{E}\left[#1\right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[#1\right]}
\newcommand{\using}[1]{\stackrel{\mathrm{#1}}{=}}
\newcommand{\I}[1]{\mathrm{I}\left( #1 \right)}
\newcommand{\N}[1]{\mathrm{N} \left( #1 \right)}$$
***
<center> Review </center>

When the sample size is sufficiently large, more specifically when $n p_0 \geq 10$ and $n(1-p_0) \geq 10$, the *count* $X$ and the sample *proportion* $\hat{p}$ are approximately Normal:

$$X \sim \N{np, \sqrt{np(1-p)}} ,\hat{p} \sim \N { p, \sqrt{\frac{p(1-p)} {n}}}$$

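Now we need to estimate the parameter $p$. As in the $z\text{-test}$, the standard deviation of $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{\ffrac{p(1-p)} {n}}$; since $p$ is unknown, we replace it with $\hat{p}$ to get the standard error $\text{SE}_{\hat{p}} = \sqrt{\ffrac{\hat{p}\left(1-\hat{p}\right)} {n}}$.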

The margin of error is $m = z^* \times \text{SE}_{\hat{p}}$, and the **confidence interval** is $\hat{p} \pm m = \hat{p} \pm z^* \times \sqrt{\ffrac{\hat{p}\left(1-\hat{p}\right)} {n}}$. Here $z^*$ is the critical value for the standard Normal density curve with area $C$ between $-z^*$ and $z^*$.
***
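
As a rough illustration of the interval above, here is a minimal Python sketch using only the standard library. The function name `proportion_ci` and the data (120 successes in 400 trials) are made up for this example; `NormalDist().inv_cdf` supplies $z^*$ in place of a Normal table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample z interval for a proportion: p_hat +/- z* x SE."""
    p_hat = successes / n
    # z* leaves area `confidence` between -z* and z* under the standard Normal curve
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 successes in 400 trials
low, high = proportion_ci(120, 400)
print(f"95% CI for p: ({low:.4f}, {high:.4f})")
```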

When these prerequisites don't hold, we apply the plus-four estimate instead.

The **plus-four estimate** of the proportion is $\tilde{p} = \ffrac{2 + \text{number of successes}} {n+4}$, and the **plus-four confidence interval** for $p$ is $\tilde{p} \pm z^* \times\sqrt{\ffrac{\tilde{p}\left( 1-\tilde{p} \right)} {n+4}}$.
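
The plus-four interval can be sketched the same way. The helper name `plus_four_ci` and the small sample (3 successes in 12 trials) are hypothetical and only meant to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(successes, n, confidence=0.95):
    """Plus-four interval: pretend there are 2 extra successes and 2 extra failures."""
    p_tilde = (successes + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - margin, p_tilde + margin

# Hypothetical small sample: 3 successes in 12 trials
low, high = plus_four_ci(3, 12)
print(f"plus-four 95% CI: ({low:.4f}, {high:.4f})")
```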

## Significance Test for a Proportion
The null hypothesis has the form $H_0: p = p_0$. Under $H_0$ the standard error is $\text{SE}_{\hat{p}} = \sqrt{\ffrac{p_0 \left( 1 - p_0 \right)} {n}}$ and the test statistic is $z = \ffrac{\hat{p} - p_0} {\text{SE}_{\hat{p}}}$. The $p\text{-value}$ depends on the alternative hypothesis:

$$\begin{array}{cc}
\hline
H_a & p\text{-value} \\ \hline
p > p_0 & P(Z \geq z) \\
p < p_0 & P(Z \leq z) \\
p \neq p_0 & 2\cdot P(Z \geq \left|z\right|) \\ \hline
\end{array}$$
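
The table of $p\text{-values}$ translates fairly directly into code. The sketch below is one possible reading of it; the function name, the `alternative` labels and the data (58 successes in 100 trials, $p_0 = 0.5$) are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alternative="two-sided"):
    """z test for H0: p = p0; returns the test statistic and the p-value."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error computed under H0
    z = (p_hat - p0) / se0
    std_normal = NormalDist()
    if alternative == "greater":           # Ha: p > p0  ->  P(Z >= z)
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":            # Ha: p < p0  ->  P(Z <= z)
        p_value = std_normal.cdf(z)
    else:                                  # Ha: p != p0 ->  2 P(Z >= |z|)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 58 successes in 100 trials, H0: p = 0.5
z, p_value = proportion_z_test(58, 100, 0.5)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```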

We can also invert the margin-of-error formula $m = z^* \sqrt{\ffrac{p^*\left(1-p^*\right)} {n}}$ to find the sample size that achieves a desired margin $m$:

$$n = \left( \frac{z^*} {m} \right)^2 p^* \left( 1-p^* \right)$$

Here $p^*$ is a guessed value for the proportion. Since $p^*(1-p^*)$ is largest at $p^* = 0.5$, that choice is the conservative one when you want the margin to be no larger than a given value. Round $n$ up to the next integer.
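
A short sketch of the sample-size calculation, assuming a 95% confidence level and the conservative guess $p^* = 0.5$; the function name `sample_size_for_margin` and the target margin of $0.03$ are illustrative only.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_margin(margin, confidence=0.95, p_star=0.5):
    """Smallest n whose margin of error is at most `margin` for the guess p_star."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = (z_star / margin) ** 2 * p_star * (1 - p_star)
    return ceil(n)   # round up so the margin is not exceeded

# Hypothetical target: margin of error at most 0.03 at 95% confidence
n_needed = sample_size_for_margin(0.03)
print(n_needed)
```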

# Inference for Regression
## Simple Linear Regression
We first see what the model looks like
- **population** part
    - $X = $ Independent (Explanatory or Predictor) variable
    - $Y = $ Dependent (Response) variable
    - Model: $Y_i = \beta_0 + \beta_1 \cdot X_i + \varepsilon_i$
    - Mean: $\mu_Y = \beta_0 + \beta_1 \cdot X$
    - Error term: $\varepsilon_i = \text{noise} \sim \N{0,\sigma}$
    - Parameters
        - $\mu_Y = $ mean response for a given $X$
        - $\beta_0 = y\text{-intercept}$
        - $\beta_1 = $ slope
        - $\sigma = $ standard deviation of the error term, and hence of $Y$ about its mean at any given $X$
- **sample** part
    - size: $n$
    - Data: $\left( x_1,y_1\right),\left( x_2,y_2\right),\dots,\left( x_n,y_n\right)$
    - Estimate: $\hat{y}_i = b_0 + b_1 \cdot x_i$
    - Residual (error): $e_i = y_i - \hat{y}_i$
    - Statistics
        - $\hat{y} = $ estimate of the mean $\mu_Y$
        - $b_0 = y\text{-intercept}$-estimate of $\beta_0$
        - $b_1 = $ slope-estimate of $\beta_1$
        - $s = \sqrt{\text{MSE}} = \text{RMSE}$, the standard error of the estimate, which estimates $\sigma$

### Assumptions
1. The error terms $\varepsilon_i$ are *independent*, and $\varepsilon_i \sim \N{0,\sigma}$
2. The underlying relationship between the $X$ and $Y$ is linear

### Estimated Regression Model
Regression Function: $\EE{Y_i\mid X_i} = \mu_Y = \beta_0 + \beta_1 \cdot X_i + \EE{\varepsilon_i} = \beta_0 + \beta_1 \cdot X_i$

Then the estimate is $\hat{Y}_{i} = b_0 + b_1 \cdot X_i$. The $\varepsilon_i$ term drops out because the individual random errors have a mean of $\mathbf{0}$.

### Estimating the Parameters
Least-squares regression gives the fitted line $\hat{y} = b_0 + b_1 \cdot x$, the best estimate of the true regression line $\mu_y = \beta_0 + \beta_1 \cdot x$.

- $\hat{y}$ is an unbiased estimate for mean response $\mu_y$
- $b_0$ is an unbiased estimate for intercept $\beta_0$
- $b_1$ is an unbiased estimate for slope $\beta_1$

The **population standard deviation** $\sigma$ for $y$ at any given value of $x$ represents the spread of the normal distribution of the $\varepsilon_i$ around the mean $\mu_y$. And for each **predicted value** $\hat{y}_i = b_0 + b_1 \cdot x_i$ there's a **residual** $y_i - \hat{y}_i$. The **regression standard error** $s$, for $n$ sample data points, is 

$$s = \sqrt{\frac{\sum \text{residual}^2} {n-2}} = \sqrt{\frac{\sum \left( y_i - \hat{y}_i \right)^2} {n-2}}$$

Here $s = \sqrt{\text{MSE}} = \text{RMSE}$ is the estimate of the regression standard deviation $\sigma$, and $s^2 = \text{MSE}$ is an unbiased estimate of $\sigma^2$.
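
To make the estimates concrete, here is a minimal least-squares sketch. The closed-form expressions $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$ are the standard least-squares solution (they are not written out in these notes), and the five data points are invented for illustration.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares intercept b0, slope b1, and regression standard error s."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # residual = observed - predicted
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return b0, b1, s

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, s = fit_line(x, y)
print(f"y_hat = {b0:.3f} + {b1:.3f} x,  s = {s:.4f}")
```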



### Checking the regression inference
- The relationship is linear in the population.
- The response varies Normally about the population regression line.
- Observations are independent.
- The standard deviation of the responses is the same for all values of $x$.

These conditions can also be checked with residual plots.

### $CI$ for regression slope $\beta_1$
The standardized estimator $\ffrac{b_1 - \beta_1} {SE_{b_1}}$ has a $t$ distribution with $n-2$ degrees of freedom. The $CI$ for $\beta_1$ has the form $b_1 \pm t^* \times SE_{b_1}$.
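
A sketch of the slope interval follows. It assumes the usual formula $SE_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$, which these notes do not state explicitly, and it takes $t^*$ as an input read from a $t$ table because the Python standard library has no $t$ distribution. The data are hypothetical.

```python
from math import sqrt

def slope_ci(x, y, t_star):
    """CI for the slope: b1 +/- t* x SE_b1, with t* taken from a t table (df = n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))          # regression standard error
    se_b1 = s / sqrt(sxx)            # standard error of the slope (assumed formula)
    return b1 - t_star * se_b1, b1 + t_star * se_b1

# Hypothetical data; t* = 3.182 is the table value for 95% confidence with df = 3
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
low, high = slope_ci(x, y, t_star=3.182)
print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")
```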

### Significance test for regression slope $\beta_1$
For the hypothesis $H_0: \beta_1 = \text{hypothesized value}$, first calculate the test statistic $t = \ffrac{b_1 - \text{hypothesized value}} {SE_{b_1}}$. Then use the $t$ table with $n-2$ degrees of freedom to find the $p\text{-value}$ by the rule

$$\begin{array}{cc}
\hline
H_a & p\text{-value} \\ \hline
\beta_1 > \text{hypothesized value} & P(T \geq t) \\
\beta_1 < \text{hypothesized value} & P(T \leq t) \\
\beta_1 \neq \text{hypothesized value} & 2\cdot P(T \geq \left|t\right|) \\ \hline
\end{array}$$
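
The rule in the table can be sketched in Python as below. Because the standard library has no $t$ distribution, the tail area is approximated by integrating the $t$ density with Simpson's rule; this is a numerical stand-in for the $t$ table, not a production routine (`math.gamma` overflows for degrees of freedom in the hundreds). All function names and the numbers in the example are hypothetical.

```python
from math import gamma, pi, sqrt

def t_pdf(u, df):
    """Density of the t distribution with `df` degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3            # P(0 <= T <= |t|)
    upper = 0.5 - central              # P(T >= |t|)
    return upper if t >= 0 else 1 - upper

def slope_t_test(b1, se_b1, n, hypothesized=0.0, alternative="two-sided"):
    """t statistic and p-value for H0: beta_1 = hypothesized, with df = n - 2."""
    df = n - 2
    t = (b1 - hypothesized) / se_b1
    if alternative == "greater":       # Ha: beta_1 > value  ->  P(T >= t)
        p_value = t_upper_tail(t, df)
    elif alternative == "less":        # Ha: beta_1 < value  ->  P(T <= t)
        p_value = 1 - t_upper_tail(t, df)
    else:                              # Ha: beta_1 != value ->  2 P(T >= |t|)
        p_value = 2 * t_upper_tail(abs(t), df)
    return t, p_value

# Hypothetical summary values: b1 = 1.99, SE_b1 = 0.0597, n = 5
t, p_value = slope_t_test(1.99, 0.0597, 5)
print(f"t = {t:.2f}, two-sided p-value = {p_value:.6f}")
```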

### Testing the hypothesis of no relationship
$H_0:\beta_1 = 0$, $H_a: \beta_1 \neq 0$. Why this test? Because the slope satisfies $b_1 = r \cdot \ffrac{s_y} {s_x}$, testing $\beta_1 = 0$ is equivalent to testing the hypothesis of no correlation between $x$ and $y$ in the population.

Besides, this test statistic is the same as the one for testing $H_0: \rho = 0$, which is $T = \ffrac{r\sqrt{n-2}} {\sqrt{1-r^2}}$.
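
The equivalence can be checked numerically. The sketch below computes the $t$ statistic two ways on made-up data, once from the slope and once from the correlation, again assuming $SE_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$.

```python
from math import sqrt

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Route 1: t statistic for the slope, t = b1 / SE_b1
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = sqrt(sse / (n - 2)) / sqrt(sxx)
t_slope = b1 / se_b1

# Route 2: t statistic for the correlation, t = r sqrt(n - 2) / sqrt(1 - r^2)
r = sxy / sqrt(sxx * syy)
t_corr = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# The slope is also r times the ratio of the sample standard deviations
s_x, s_y = sqrt(sxx / (n - 1)), sqrt(syy / (n - 1))
print(t_slope, t_corr, b1, r * s_y / s_x)
```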

Note that $\beta_0$ often has no practical interpretation, so a hypothesis about it is usually not tested.

### Analyse the JMP output
The slides follow below; here are some very important formulae:

- $\text{SST} = \text{SSM} + \text{SSE}$
    - $\text{SST}$: Sum of squares of Total
    - $\text{SSM}$: Sum of squares of Model
    - $\text{SSE}$: Sum of squares of Error
- $\text{DF}_{\mathrm{T}} = \text{DF}_{\mathrm{M}} + \text{DF}_{\mathrm{E}}$
    - $\text{DF}_{\mathrm{T}}$: Degrees of freedom of Total
    - $\text{DF}_{\mathrm{M}}$: Degrees of freedom of Model
    - $\text{DF}_{\mathrm{E}}$: Degrees of freedom of Error$\\[0.7em]$
- $\text{MSM} = \ffrac{\text{SSM}} {\text{DF}_{\mathrm{M}}}\\[0.7em]$
- $\text{MSE} = \ffrac{\text{SSE}} {\text{DF}_{\mathrm{E}}}\\[0.7em]$
- $F\text{-ratio} = \ffrac{\text{MSM}} {\text{MSE}}\\[0.7em]$
- The standard deviation $s$ of the $n$ residuals $e_i = y_i - \hat{y}_i$ is calculated from$\\[0.7em]$
$$s^2 = \frac{\sum e^2_i} {n-2} = \frac{\sum \left( y_i - \hat{y}_i \right)^2} {n-2} = \frac{\text{SSE}} {\text{DF}_{\mathrm{E}}} = \text{MSE} \\[0.7em]$$
- $R^2 = \ffrac{\text{SSM}} {\text{SST}} = \ffrac{\sum \left( \hat{y}_i - \bar{y} \right)^2} {\sum \left( y_i - \bar{y} \right)^2} \\[0.7em]$

Some other points worth noting:

1. $R = \pm \sqrt{R^2}$, with the same sign as the estimated slope $b_1$: positive for a positive relationship, negative for a negative one
2. $R^2$ is also called the **Coefficient of Determination**, $R$ is also called the **Correlation coefficient**
3. $R^2$ can also be read as the percentage of variation in the dependent variable $Y$ that is explained by the regression on the independent variable $X$
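
The quantities in this list can be reproduced by hand for a simple regression. The sketch below uses invented data and a hypothetical helper name `anova_table`; it assumes $\text{DF}_{\mathrm{M}} = 1$ and $\text{DF}_{\mathrm{E}} = n - 2$, as is the case with one explanatory variable.

```python
def anova_table(x, y):
    """Sums of squares, mean squares, F-ratio and R^2 for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    ssm = sum((yh - y_bar) ** 2 for yh in y_hat)            # model
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error
    df_m, df_e = 1, n - 2
    msm, mse = ssm / df_m, sse / df_e
    return {"SST": sst, "SSM": ssm, "SSE": sse,
            "MSM": msm, "MSE": mse, "F": msm / mse, "R2": ssm / sst}

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
table = anova_table(x, y)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

With one explanatory variable, the $F\text{-ratio}$ equals the square of the $t$ statistic for the slope, which is a quick consistency check when reading software output.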


![](./Raw/JMP_op01.png)

![](./Raw/JMP_op02.png)

![](./Raw/JMP_op03.png)

# Multiple Regression
## Inference

In multiple regression, the response variable $y$ depends on $p$ explanatory variables, $x_1, x_2, \dots, x_p$: $\mu_y = \beta_0 + \beta_1 \cdot x_1 + \cdots + \beta_p \cdot x_p$. The statistical model for this is: $y_i = \beta_0 + \beta_1 \cdot x_{i1} + \cdots +\beta_p \cdot x_{ip} + \varepsilon_i$.

The **mean response** $\mu_y$ is a linear function of the explanatory variables; the deviations $\varepsilon_i$ are independent and follow the same Normal distribution.

The estimators are $b_0, b_1, \dots, b_p$, and the degrees of freedom for error are $n-p-1$.

For the $CI$ part, the method is basically the same as in simple regression.

### Significance test
The overall test is $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ against $H_a$: at least one $\beta_j$ is not $0$. If we fail to reject $H_0$, the data give no evidence that any of the explanatory variables helps to predict $y$. The $p\text{-value}$ is found in a similar way, now from the $F\text{-ratio}$. The difference is that a significant $p\text{-value}$ doesn't mean that all $p$ explanatory variables have a significant influence on $y$, only that at least one does.
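
As a small sketch of the overall test, the $F\text{-ratio}$ can be computed from the sums of squares that regression software reports. The helper name and the numbers below are invented; the code assumes $\text{DF}_{\mathrm{M}} = p$ and $\text{DF}_{\mathrm{E}} = n - p - 1$, and leaves the $p\text{-value}$ to an $F(p,\, n-p-1)$ table or to the software.

```python
def overall_f_test(ssm, sse, n, p):
    """F-ratio for H0: beta_1 = ... = beta_p = 0, from the ANOVA sums of squares."""
    df_m = p              # one degree of freedom per explanatory variable
    df_e = n - p - 1      # error degrees of freedom
    msm = ssm / df_m
    mse = sse / df_e
    return msm / mse, df_m, df_e

# Hypothetical ANOVA values: SSM = 120, SSE = 80, n = 25 observations, p = 4 predictors
f_ratio, df_m, df_e = overall_f_test(120.0, 80.0, 25, 4)
print(f"F = {f_ratio:.2f} on ({df_m}, {df_e}) degrees of freedom")
```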

## Case study
The JMP output for this case study is shown below.



![](./Raw/JMP_op04.png)

![](./Raw/JMP_op05.png)

![](./Raw/JMP_op06.png)

![](./Raw/JMP_op07.png)

![](./Raw/JMP_op08.png)
