# $\textbf{Proportion Tests}$

## $\textbf{One sample Proportion Test}$

source: https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_One_Proportion.pdf, https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/One_Proportion-Equivalence_Tests.pdf

$\textbf{Purpose}$: The One-Sample Proportion Test is used to assess whether a population proportion $(P_{1})$ is significantly
different from a hypothesized value  $(P_{0})$.

$\textbf{Assumptions}$:
* Binomial distribution: the trials $X_{i} \in \{0,1\}$ are independent, each with the same success probability $P(X_{i} = 1) = P$ and failure probability $P(X_{i} = 0) = 1 - P$
* The hypothesized proportion, $P_{0}$, is specified in advance.

$\textbf{Required Data}$:
* $p$ - sample proportion (or $r$, the number of successes)
* $n$ - Sample size

$\textbf{Proportions}$ can be represented in the following ways:
1. $\textbf{Exact values}$ ($P_{1}$ and $P_{0}$)
2. $\textbf{Difference}$ ( $\delta$ = $P_{1} - P_{0}$)
3. $\textbf{Ratio}$ ($\phi$ = $\frac{P_{1}}{P_{0}}$)
4. $\textbf{Odds ratio}$ ($\psi$ = $\frac{O_{1}}{O_{0}}$=$\frac{P_{1}/(1-P_{1})}{P_{0}/(1-P_{0})}$)

### $\textbf{Test statistics}$


1. $\textbf{Exact tests}$
  * The test statistic is $r$, the number of successes in $n$ trials (binomial or hypergeometric distribution).
  * $\alpha$ and $\beta$ are computed by enumerating the possible values of $r$, computing the probability of each value, and then computing the corresponding value of the test statistic.
2.  $\textbf{Z-Tests}$
  * All rely on the CLT $\rightarrow$ statistic $\sim \mathcal{N}(\mu,\sigma^{2})$
  * $z = \frac{p-P_{0}}{SE}$, where $p = \frac{r}{n}$, $P_{0}$ - hypothesized proportion, $SE$ - standard error
  * The statistic comes in 4 forms:
    1. Utilizes ${P_0}$: $z = \frac{p-P_{0}}{\sqrt{\frac{P_{0}(1-P_{0})}{n}}}$
    2. Utilizes $p$: $z = \frac{p-P_{0}}{\sqrt{\frac{p(1-p)}{n}}}$
    3. Utilizes ${P_0}$ + continuity correction: $z = \frac{p-P_{0}+c}{\sqrt{\frac{P_{0}(1-P_{0})}{n}}}$
    4. Utilizes $p$ + continuity correction: $z = \frac{p-P_{0}+c}{\sqrt{\frac{p(1-p)}{n}}}$


  $c = \begin{cases}
  \frac{-1}{2n} & \quad \text{if } p > P_{0}\\
  \frac{1}{2n} & \quad \text{if } p < P_{0}\\
  0  & \quad \text{if } |p - P_{0}| < \frac{1}{2n}
  \end{cases}$
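A minimal sketch of the four variants, assuming the continuity correction $c$ is added to the numerator $p - P_{0}$ (the way the continuity-corrected power formulas later in this section use it). The helper names `continuity_term`, `one_sample_z` and `z_p_value` are illustrative, not taken from a library.

```python
import math
from scipy.stats import norm

def continuity_term(p, p0, n):
    """Continuity correction c following the case definition above."""
    if abs(p - p0) < 1 / (2 * n):
        return 0.0
    return -1 / (2 * n) if p > p0 else 1 / (2 * n)

def one_sample_z(r, n, p0, se_from="p0", continuity=False):
    """z statistic for H0: P = P0 given r successes in n trials.

    se_from="p0" puts P0 in the standard error, se_from="p" uses p = r / n.
    continuity=True adds c to the numerator, shrinking |z| toward zero.
    """
    p = r / n
    base = p0 if se_from == "p0" else p
    se = math.sqrt(base * (1 - base) / n)
    c = continuity_term(p, p0, n) if continuity else 0.0
    return (p - p0 + c) / se

def z_p_value(z, alternative="two-sided"):
    """p-value of a z statistic for 'larger', 'smaller' or 'two-sided'."""
    if alternative == "larger":
        return norm.sf(z)
    if alternative == "smaller":
        return norm.cdf(z)
    return 2 * norm.sf(abs(z))

# 60 successes in 100 trials tested against P0 = 0.5, all four variants
for se_from in ("p0", "p"):
    for cc in (False, True):
        z = one_sample_z(60, 100, 0.5, se_from=se_from, continuity=cc)
        print(se_from, cc, round(z, 4), round(z_p_value(z, "larger"), 4))
```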

### $\textbf{Power calculation}$:

#### $\textbf{Binomial Enumeration of All Possible Outcomes}$:

1. $\textbf{State the Hypotheses}$:

* $H_{0}$: $P$ = $P_{0}$ versus $H_{1}$: $P$ = $P_{1} \neq P_{0}$
* $H_{0}$: $P \leq P_{0}$ versus $H_{1}$: $P$ = $P_{1} > P_{0}$
* $H_{0}$: $P \geq P_{0}$ versus $H_{1}$: $P$ = $P_{1} < P_{0}$

2. $\textbf{Find the critical value}$:

  For an upper-tailed test with a given sample size, find the critical value, $P_{c}$, based on the binomial (or
hypergeometric) distribution, so that the probability of rejecting $H_{0}$ when $H_{0}$ is true is as close as possible to, without exceeding, a specified significance level, $\alpha$. Because the distribution is discrete, exactly $\alpha$ usually cannot be attained.

3. $\textbf{Evaluate the Sample}$:

  Select a sample of $n$ items from the population and compute the sample proportion, $p = \frac{r}{n}$. If $p > P_{c}$ then
reject the null hypothesis that $P = P_{0}$ in favor of an alternative hypothesis that $P = P_{1} > P_{0}$.

4. $\textbf{Calculate the Power}$:

  The power is the probability of rejecting $H_{0}$ when the true proportion is $P_{1}$. That is, the power is the
probability that $p > P_{c}$ calculated from a binomial (or hypergeometric) distribution in which $P = P_{1}$.
Similar steps are used for the lower-tail and two-tailed tests.
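The four steps can be sketched for the upper-tailed case with `scipy.stats.binom`, assuming an infinite population (binomial rather than hypergeometric) and working with the number of successes $r$ instead of the proportion: the critical value becomes a count $r_c$, and $H_{0}$ is rejected when $r \geq r_c$. The function name `exact_upper_power` is illustrative.

```python
from scipy.stats import binom

def exact_upper_power(n, p0, p1, alpha=0.05):
    """Exact power of the upper-tailed test H0: P <= P0 vs H1: P = P1 > P0.

    Returns (r_c, actual_alpha, power); H0 is rejected when the number of
    successes r is at least r_c.
    """
    # Step 2: smallest r_c whose rejection probability under P0 does not exceed alpha.
    # binom.sf(k, n, p) = Pr(R > k), so binom.sf(r_c - 1, n, p) = Pr(R >= r_c).
    for r_c in range(n + 2):
        actual_alpha = binom.sf(r_c - 1, n, p0)
        if actual_alpha <= alpha:
            break
    # Step 4: probability of landing in the rejection region when P = P1
    power = binom.sf(r_c - 1, n, p1)
    return r_c, actual_alpha, power

r_c, actual_alpha, power = exact_upper_power(n=50, p0=0.5, p1=0.7)
print(r_c, round(actual_alpha, 4), round(power, 4))
```

The achieved size `actual_alpha` is typically below the nominal $\alpha$, which is the price of discreteness.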

#### 1. $\textbf{Exact tests}$

$FPC = \frac{N-n}{N-1}$ if the population size, $N$, is finite; otherwise $FPC = 1$

$\textbf{Two-sided hypothesis test}$:

$\begin{cases}
H_{0}: P = P_{0}\\
H_{1}: P \neq P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α/2}\sqrt{P_{0}(1-P_{0})FPC}}{\sqrt{P_{1}(1-P_{1})FPC}}) + 1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α/2}\sqrt{P_{0}(1-P_{0})FPC}}{\sqrt{P_{1}(1-P_{1})FPC}})$

$\textbf{Lower One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \geq P_{0}\\
H_{1}: P < P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α}\sqrt{P_{0}(1-P_{0})FPC}}{\sqrt{P_{1}(1-P_{1})FPC}})$

$\textbf{Upper One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \leq P_{0}\\
H_{1}: P > P_{0}\\
\end{cases}$

Power = $1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α}\sqrt{P_{0}(1-P_{0})FPC}}{\sqrt{P_{1}(1-P_{1})FPC}})$
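The three power formulas translate directly into code. This is a sketch using `scipy.stats.norm`, where `N=None` stands for an infinite population ($FPC = 1$) and the finite population correction is taken as $(N-n)/(N-1)$; the function names are illustrative.

```python
import math
from scipy.stats import norm

def fpc(n, N=None):
    """Finite population correction; 1 for an infinite population."""
    return 1.0 if N is None else (N - n) / (N - 1)

def power_one_proportion(n, p0, p1, alpha=0.05, alternative="two-sided", N=None):
    """Normal-approximation power for H0: P = P0 when the true proportion is P1."""
    f = fpc(n, N)
    s0 = math.sqrt(p0 * (1 - p0) * f)   # spread under H0
    s1 = math.sqrt(p1 * (1 - p1) * f)   # spread under H1
    shift = math.sqrt(n) * (p0 - p1)
    if alternative == "two-sided":
        z = norm.ppf(1 - alpha / 2)
        return norm.cdf((shift - z * s0) / s1) + 1 - norm.cdf((shift + z * s0) / s1)
    z = norm.ppf(1 - alpha)
    if alternative == "smaller":
        return norm.cdf((shift - z * s0) / s1)
    return 1 - norm.cdf((shift + z * s0) / s1)  # alternative == "larger"

print(power_one_proportion(100, 0.3, 0.4, alternative="larger"))
print(power_one_proportion(100, 0.3, 0.4, alternative="larger", N=500))
```

Sampling a large share of a finite population shrinks both spreads, so the second call returns a higher power than the first.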

#### 1. $\textbf{Z-Tests}$

##### 1.1 $\textbf{Z-Test using $P_{0}$}$

$\textbf{Two-sided hypothesis test}$:

$\begin{cases}
H_{0}: P = P_{0}\\
H_{1}: P \neq P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α/2}\sqrt{P_{0}(1-P_{0})}}{\sqrt{P_{1}(1-P_{1})}}) + 1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α/2}\sqrt{P_{0}(1-P_{0})}}{\sqrt{P_{1}(1-P_{1})}})$

$\textbf{Lower One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \geq P_{0}\\
H_{1}: P < P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α}\sqrt{P_{0}(1-P_{0})}}{\sqrt{P_{1}(1-P_{1})}})$

$\textbf{Upper One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \leq P_{0}\\
H_{1}: P > P_{0}\\
\end{cases}$

Power = $1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α}\sqrt{P_{0}(1-P_{0})}}{\sqrt{P_{1}(1-P_{1})}})$

##### 1.2 $\textbf{Z-Test using $P_{0}$ with Continuity Correction}$

$c = \frac{1}{2\sqrt{n}}$ if $|P_{1} - P_{0}| > \frac{1}{2n}$, otherwise $c=0$

$\textbf{Two-sided hypothesis test}$:

$\begin{cases}
H_{0}: P = P_{0}\\
H_{1}: P \neq P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α/2}\sqrt{P_{0}(1-P_{0})} - c}{\sqrt{P_{1}(1-P_{1})}}) + 1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α/2}\sqrt{P_{0}(1-P_{0})}+ c }{\sqrt{P_{1}(1-P_{1})}})$

$\textbf{Lower One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \geq P_{0}\\
H_{1}: P < P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α}\sqrt{P_{0}(1-P_{0})} - c}{\sqrt{P_{1}(1-P_{1})}})$

$\textbf{Upper One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \leq P_{0}\\
H_{1}: P > P_{0}\\
\end{cases}$

Power = $1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α}\sqrt{P_{0}(1-P_{0})}+c}{\sqrt{P_{1}(1-P_{1})}})$

##### 1.3 $\textbf{Z-Test using $p$}$

$\textbf{Two-sided hypothesis test}$:

$\begin{cases}
H_{0}: P = P_{0}\\
H_{1}: P \neq P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α/2}\sqrt{P_{1}(1-P_{1})}}{\sqrt{P_{1}(1-P_{1})}}) + 1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α/2}\sqrt{P_{1}(1-P_{1})}}{\sqrt{P_{1}(1-P_{1})}})$

$\textbf{Lower One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \geq P_{0}\\
H_{1}: P < P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α}\sqrt{P_{1}(1-P_{1})}}{\sqrt{P_{1}(1-P_{1})}})$

$\textbf{Upper One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \leq P_{0}\\
H_{1}: P > P_{0}\\
\end{cases}$

Power = $1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α}\sqrt{P_{1}(1-P_{1})}}{\sqrt{P_{1}(1-P_{1})}})$

##### 1.4 $\textbf{Z-Test using $p$ with Continuity Correction}$

$c = \frac{1}{2\sqrt{n}}$ if $|P_{1} - P_{0}| > \frac{1}{2n}$, otherwise $c=0$

$\textbf{Two-sided hypothesis test}$:

$\begin{cases}
H_{0}: P = P_{0}\\
H_{1}: P \neq P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α/2}\sqrt{P_{1}(1-P_{1})} - c}{\sqrt{P_{1}(1-P_{1})}}) + 1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α/2}\sqrt{P_{1}(1-P_{1})}+ c }{\sqrt{P_{1}(1-P_{1})}})$

$\textbf{Lower One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \geq P_{0}\\
H_{1}: P < P_{0}\\
\end{cases}$

Power = $\varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) - z_{α}\sqrt{P_{1}(1-P_{1})} - c}{\sqrt{P_{1}(1-P_{1})}})$

$\textbf{Upper One-sided hypothesis test}$:

$\begin{cases}
H_{0}: P \leq P_{0}\\
H_{1}: P > P_{0}\\
\end{cases}$

Power = $1 - \varPhi(\frac{\sqrt{n}(P_{0}-P_{1}) + z_{α}\sqrt{P_{1}(1-P_{1})}+c}{\sqrt{P_{1}(1-P_{1})}})$
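Sections 1.1 to 1.4 differ only in which standard deviation multiplies the critical value and whether $c$ is added, so one function can cover all twelve formulas. A sketch (the name `ztest_power` and its parameters are illustrative):

```python
import math
from scipy.stats import norm

def ztest_power(n, p0, p1, alpha=0.05, alternative="two-sided", se_from="p0", continuity=False):
    """Approximate power of the one-sample z-test (sections 1.1 to 1.4).

    se_from="p0": sqrt(P0(1-P0)) multiplies the critical value (1.1, 1.2);
    se_from="p": sqrt(P1(1-P1)) is used instead (1.3, 1.4).
    continuity=True applies c = 1/(2 sqrt(n)) when |P1 - P0| > 1/(2n).
    """
    s1 = math.sqrt(p1 * (1 - p1))
    s_crit = math.sqrt(p0 * (1 - p0)) if se_from == "p0" else s1
    c = 1 / (2 * math.sqrt(n)) if continuity and abs(p1 - p0) > 1 / (2 * n) else 0.0
    shift = math.sqrt(n) * (p0 - p1)
    if alternative == "two-sided":
        z = norm.ppf(1 - alpha / 2)
        lower_tail = norm.cdf((shift - z * s_crit - c) / s1)
        upper_tail = 1 - norm.cdf((shift + z * s_crit + c) / s1)
        return lower_tail + upper_tail
    z = norm.ppf(1 - alpha)
    if alternative == "smaller":
        return norm.cdf((shift - z * s_crit - c) / s1)
    return 1 - norm.cdf((shift + z * s_crit + c) / s1)  # alternative == "larger"

for se_from in ("p0", "p"):
    for cc in (False, True):
        print(se_from, cc, round(ztest_power(100, 0.3, 0.4, se_from=se_from, continuity=cc), 4))
```

The continuity correction always lowers the computed power, which mirrors the more conservative test it describes.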

### $\textbf{Confidence Intervals}$

#### $\textbf{Direct calculation}$

$LCL = \frac{XF_{[\alpha/2],[2X,2(n-X+1)]}}{(n-X+1) + XF_{[\alpha/2],[2X,2(n-X+1)]}}$

$UCL = \frac{(X+1)F_{[1-\alpha/2],[2(X+1),2(n-X)]}}{(n-X) + (X+1)F_{[1-\alpha/2],[2(X+1),2(n-X)]}}$

$X$ - successful trials, $n$ - number of observations, $F_{[q],[d_{1},d_{2}]}$ - the $q$ quantile of the F distribution with $d_{1}$ and $d_{2}$ degrees of freedom
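These are the exact (Clopper-Pearson) limits. A sketch that evaluates them with `scipy.stats.f`, reading $F_{[q],[d_{1},d_{2}]}$ as the $q$ quantile of $F(d_{1}, d_{2})$, and cross-checks against the equivalent beta-quantile form. The edge cases $X = 0$ and $X = n$ have no F quantile, so the limits are set to 0 and 1 by convention; function names are illustrative.

```python
from scipy.stats import beta, f

def exact_ci_f(X, n, alpha=0.05):
    """Exact (Clopper-Pearson) limits from the F-distribution formulas."""
    if X == 0:
        lcl = 0.0
    else:
        f_lo = f.ppf(alpha / 2, 2 * X, 2 * (n - X + 1))
        lcl = X * f_lo / ((n - X + 1) + X * f_lo)
    if X == n:
        ucl = 1.0
    else:
        f_hi = f.ppf(1 - alpha / 2, 2 * (X + 1), 2 * (n - X))
        ucl = (X + 1) * f_hi / ((n - X) + (X + 1) * f_hi)
    return lcl, ucl

def exact_ci_beta(X, n, alpha=0.05):
    """The same limits written with beta quantiles, used as a cross-check."""
    lcl = beta.ppf(alpha / 2, X, n - X + 1) if X > 0 else 0.0
    ucl = beta.ppf(1 - alpha / 2, X + 1, n - X) if X < n else 1.0
    return lcl, ucl

print(exact_ci_f(7, 40))
print(exact_ci_beta(7, 40))
```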

#### $\textbf{Normal Approximation}$

$CI = p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

If a correction for continuity is added, the formula becomes
$CI_{cc} = p \pm (z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} + \frac{1}{2n})$

$X$ - successful trials, $n$ - number of observations, $p$  = $\frac{X}{n}$

#### $\textbf{Wilson Score}$

$CI_{Wilson Score} = \frac{(2np + z_{\alpha/2}^2) \pm z_{\alpha/2}\sqrt{z_{\alpha/2}^2 + 4np(1-p)}}{2(n+z_{\alpha/2}^2)}$

$X$ - successful trials, $n$ - number of observations, $p$  = $\frac{X}{n}$
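A sketch of the normal-approximation interval (with optional continuity correction) and the Wilson score interval, written term by term from the formulas above; the function names are illustrative.

```python
import math
from scipy.stats import norm

def wald_ci(X, n, alpha=0.05, continuity=False):
    """Normal-approximation interval, optionally widened by 1/(2n)."""
    p = X / n
    z = norm.ppf(1 - alpha / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    if continuity:
        half += 1 / (2 * n)
    return p - half, p + half

def wilson_ci(X, n, alpha=0.05):
    """Wilson score interval as written above."""
    p = X / n
    z = norm.ppf(1 - alpha / 2)
    centre = 2 * n * p + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * n * p * (1 - p))
    denom = 2 * (n + z ** 2)
    return (centre - spread) / denom, (centre + spread) / denom

# With few successes the normal-approximation interval dips below 0; the Wilson interval does not
print(wald_ci(1, 20))
print(wilson_ci(1, 20))
```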

## $\textbf{TOST (Two One-Sided Tests)}$

source: https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/One_Proportion-Equivalence_Tests.pdf

The $\textbf{two one-sided tests (TOST)}$ approach is used to test equivalence. The equivalence test
essentially reverses the roles of the null and alternative hypotheses. Assume that $P$ represents the
population proportion of the response, $S$ is a standard reference proportion, and $M$ is the so-called margin of equivalence. The null and alternative hypotheses are

$H_{0}: P<S-M$ or $P>S+M$

$H_{1}: S-M < P < S+M$

The null hypothesis is made up of two simple one-sided hypotheses:

$H_{01}: P<S - M$

$H_{02}: P>S+M$

If both of these one-sided tests are rejected, we conclude $H_{1}$: the response is equivalent to the standard proportion (their difference is confined within a small margin).

$\textbf{To obtain an equivalence test of level α, run each one-sided test at level α (not α/2)}$

### $\textbf{TOST Statistics}$

#### $\textbf{Large-sample z-test}$

$z=\frac{p-P_{0}}{\sqrt{\frac{P_{0}(1-P_{0})}{n}}}$

$X$ - successful trials, $n$ - number of observations, $p$  = $\frac{X}{n}, P_{0}$ - hypothesized proportion (in TOST, the equivalence bound being tested: $S-M$ for $H_{01}$, $S+M$ for $H_{02}$)

#### $\textbf{Approximations}$

Continuity-corrected, standard error based on $P_{0}$:

1. $z_{c} =\frac{X+0.5-nP_{0}}{\sqrt{nP_{0}(1-P_{0})}}$ if $X <nP_{0}$

2. $z_{c} =\frac{X-0.5-nP_{0}}{\sqrt{nP_{0}(1-P_{0})}}$ if $X >nP_{0}$

Continuity-corrected, standard error based on $p$:

3. $z_{c} =\frac{X+0.5-nP_{0}}{\sqrt{np(1-p)}}$ if $X <nP_{0}$

4. $z_{c} =\frac{X-0.5-nP_{0}}{\sqrt{np(1-p)}}$ if $X >nP_{0}$


$X$ - successful trials, $n$ - number of observations, $p$  = $\frac{X}{n}, P_{0}$ - hypothesized proportion (the equivalence bound being tested)
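A sketch of the TOST decision for one proportion, assuming the large-sample z statistic is evaluated at each equivalence bound ($P_{0} = S - M$ for $H_{01}$, $P_{0} = S + M$ for $H_{02}$), with the $P_{0}$-based continuity correction as an option. Both one-sided tests run at the full level $\alpha$. The function names are illustrative.

```python
import math
from scipy.stats import norm

def z_stat(X, n, p0, continuity=False):
    """Large-sample z for H0: P = P0, optionally with the +/-0.5 correction."""
    if continuity and X < n * p0:
        X = X + 0.5
    elif continuity and X > n * p0:
        X = X - 0.5
    return (X - n * p0) / math.sqrt(n * p0 * (1 - p0))

def tost_one_proportion(X, n, S, M, alpha=0.05, continuity=False):
    """TOST for equivalence H1: S - M < P < S + M.

    Returns (p_lower, p_upper, equivalent).
    """
    z_lower = z_stat(X, n, S - M, continuity)  # H01: P <= S - M vs P > S - M
    z_upper = z_stat(X, n, S + M, continuity)  # H02: P >= S + M vs P < S + M
    p_lower = norm.sf(z_lower)
    p_upper = norm.cdf(z_upper)
    equivalent = bool(max(p_lower, p_upper) <= alpha)
    return p_lower, p_upper, equivalent

print(tost_one_proportion(X=50, n=100, S=0.5, M=0.10))  # wide margin
print(tost_one_proportion(X=50, n=100, S=0.5, M=0.02))  # narrow margin
```

Equivalence is declared only when the larger of the two one-sided p-values is below $\alpha$.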


## $\textbf{Two sample Proportion Test}$

source: https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_Two_Proportions.pdf

$\textbf{Purpose}$: We use this test to check whether the proportion in the treatment group (group 1) equals the proportion in the control group (group 2).

$\textbf{Assumptions}$:
* Binomial distribution in each group: the trials $X_{i} \in \{0,1\}$ are independent, with a constant success probability within each group, and the two groups are independent of each other

$\textbf{Required Data}$:

* $p_{1}, p_{2}$ - sample proportions (or the numbers of successes)
* $n_{1}, n_{2}$ - Sample sizes

$\textbf{Technical Details}$:

| Group | Success | Failure | Total |
|----------|----------|----------|----------|
| Treatment| $a$    | $c$  | $m$ |
| Control  | $b$    | $d$   | $n$ |
| Total    | $s$    | $f$   | $N$ |

Alternative notation:

| Group | Success | Failure | Total |
|----------|----------|----------|----------|
| Treatment| $x_{11}$    | $x_{12}$  | $n_{1}$ |
| Control  | $x_{21}$    | $x_{22}$   | $n_{2}$ |
| Total    | $m_{1}$    | $m_{2}$   | $N$ |

The binomial proportions $p_{1}$ and $p_{2}$ are estimated from these data using the formulae

$p_{1} = \frac{a}{m} = \frac{x_{11}}{n_{1}}$ and $p_{2} = \frac{b}{n} = \frac{x_{21}}{n_{2}}$

$\textbf{Proportions}$ can be represented in the following ways:
1. $\textbf{Exact values}$ ($p_{1}$ and $p_{2}$)
2. $\textbf{Difference}$ ( $\delta$ = $p_{1} - p_{2}$)
3. $\textbf{Ratio}$ ($\phi$ = $\frac{p_{1}}{p_{2}}$)
4. $\textbf{Odds ratio}$ ($\psi$ = $\frac{O_{1}}{O_{2}}$=$\frac{p_{1}/(1-p_{1})}{p_{2}/(1-p_{2})}$)
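To get a numeric feel for these representations, here is a sketch using the treatment/control table notation ($a$ of $m$ treatment successes, $b$ of $n$ control successes), oriented treatment versus control as in the hypotheses that follow. The function name `effect_measures` is illustrative.

```python
def effect_measures(a, m, b, n):
    """Difference, ratio and odds ratio of treatment (p1 = a/m) vs control (p2 = b/n)."""
    p1, p2 = a / m, b / n
    odds1, odds2 = p1 / (1 - p1), p2 / (1 - p2)
    return {
        "p1": p1,
        "p2": p2,
        "difference": p1 - p2,
        "ratio": p1 / p2,
        "odds_ratio": odds1 / odds2,
    }

# 20% vs 10%: the same data read as +0.1, twice as likely, or 2.25 times the odds
print(effect_measures(a=20, m=100, b=10, n=100))
```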

### $\textbf{Hypothesis Tests}$

Several statistical tests have been developed for testing the inequality of two proportions. For large samples,
the powers of the various tests are about the same. However, for small samples, the differences in the
powers can be quite large. Hence, it is important to base the power analysis on the test statistic that will be
used to analyze the data. If you have not selected a test statistic, you may wish to determine which one
offers the best power in your situation. No single test is the champion in every situation, so you must
compare the powers of the various tests to determine which to use.

#### $\textbf{Difference}$

1. $H_{0}: p_{1} - p_{2} = 0$ versus $H_{1}: p_{1} - p_{2} \neq 0$ (Two-tailed test)
2. $H_{0}: p_{1} - p_{2} \leq 0$ versus $H_{1}: p_{1} - p_{2} > 0$ (upper-tailed test)
3. $H_{0}: p_{1} - p_{2} \geq 0$ versus $H_{1}: p_{1} - p_{2} < 0$ (lower-tailed test)

The traditional approach is to use the `Pearson chi-square test` for large samples, the `Yates chi-square test` for intermediate sample sizes, and the `Fisher Exact test` for small samples. However, this approach has been questioned. For example, based on exact enumeration, Upton (1982) and D’Agostino (1988) conclude that the `Fisher Exact test` and the `Yates test` should never be used.

#### $\textbf{Ratio}$

The (risk) ratio, $\phi = p_{1}/ p_{2}$, is often preferred to the difference when the baseline proportion is small (less
than 0.1) or large (greater than 0.9) because it expresses the difference as a percentage rather than an
amount. In this case, the hypothesized ratio of proportions, $\phi_{0}$, is one. Three sets of statistical
hypotheses can be formulated:

1. $H_{0}: p_{1}/ p_{2} = \phi_{0}$ versus $H_{1}: p_{1}/p_{2} \neq \phi_{0}$ (Two-tailed test)
2. $H_{0}: p_{1}/p_{2} \leq \phi_{0}$ versus $H_{1}: p_{1}/p_{2} > \phi_{0}$ (upper-tailed test)
3. $H_{0}: p_{1}/p_{2} \geq \phi_{0}$ versus $H_{1}: p_{1}/p_{2} < \phi_{0}$ (lower-tailed test)




#### $\textbf{Odds Ratio}$

The odds ratio, $\psi$ = $\frac{O_{1}}{O_{2}}$=$\frac{p_{1}/(1-p_{1})}{p_{2}/(1-p_{2})}$, is sometimes used to compare the two proportions because of its statistical properties and because some experimental designs require its use. In this case, the
hypothesized odds ratio, $\psi_{0}$, is one. Three sets of statistical hypotheses can be formulated:

1. $H_{0}: \psi = \psi_{0}$ versus $H_{1}: \psi \neq \psi_{0}$ (Two-tailed test)
2. $H_{0}: \psi \leq \psi_{0}$ versus $H_{1}: \psi > \psi_{0}$ (upper-tailed test)
3. $H_{0}: \psi \geq \psi_{0}$ versus $H_{1}: \psi < \psi_{0}$ (lower-tailed test)

### $\textbf{Power Calculation}$



The power for a test statistic that is based on the normal approximation can be computed exactly using two
binomial distributions. The following steps are taken to compute the power of such a test.

1. Find the critical value (or values in the case of a two-sided test) using the standard normal distribution.
The critical value, $z_{critical}$, is that value of $z$ that leaves exactly the target value of $\alpha$ in the
appropriate tail of the normal distribution. For example, for an upper-tailed test with a target $\alpha$ of 0.05, the critical value is 1.645.

2. Compute the value of the test statistic, $z_{t}$, for every combination of $x_{11}$ and $x_{21}$. Note that $x_{11}$ ranges from 0 to $n_{1}$, and $x_{21}$ ranges from 0 to $n_{2}$. A small value (around 0.0001) can be added to zero cell counts to avoid the numerical problems that occur when a cell count is zero.

3. For an upper-tailed test, if $z_{t} > z_{critical}$, the combination is in the rejection region. Call the set of all combinations of $x_{11}$ and $x_{21}$ that lead to a rejection $A$.

4. Compute the power for given values of $p_{1}$ and $p_{2}$ (with $q_{1} = 1 - p_{1}$ and $q_{2} = 1 - p_{2}$) as

$1-\beta =  \displaystyle\sum_{A}
    \left(\!
    \begin{array}{c}
      n_{1} \\
      x_{11}
    \end{array}
  \!\right) p_{1}^{x_{11}}q_{1}^{n_{1}-x_{11}}
  \left(\!
    \begin{array}{c}
      n_{2} \\
      x_{21}
    \end{array}
  \!\right) p_{2}^{x_{21}}q_{2}^{n_{2}-x_{21}}$

5. Compute the actual value of $\alpha$ achieved by the design by substituting $p_{2}$ for $p_{1}$ to obtain

$\alpha^{*} = \displaystyle\sum_{A}
  \left(\!
    \begin{array}{c}
      n_{1} \\
      x_{11}
    \end{array}
  \!\right) p_{2}^{x_{11}}q_{2}^{n_{1}-x_{11}}
  \left(\!
    \begin{array}{c}
      n_{2} \\
      x_{21}
    \end{array}
  \!\right) p_{2}^{x_{21}}q_{2}^{n_{2}-x_{21}} =
  \displaystyle\sum_{A}
  \left(\!
    \begin{array}{c}
      n_{1} \\
      x_{11}
    \end{array}
  \!\right)
  \left(\!
    \begin{array}{c}
      n_{2} \\
      x_{21}
    \end{array}
  \!\right) p_{2}^{x_{11}+x_{21}}q_{2}^{n_{1} +n_{2}-x_{11} - x_{21}}$

When the values of $n_{1}$ and $n_{2}$ are large (say over 200), these formulas may take a little time to evaluate. In
this case, a large sample approximation may be used.
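The five steps can be sketched for the upper-tailed pooled z-test ($H_{1}: p_{1} > p_{2}$). This is an illustration under that assumption, not a general implementation: the function name is hypothetical, `eps` plays the role of the small value added to empty cells, and the binomial weights are precomputed with `scipy.stats.binom.pmf`.

```python
import math
import numpy as np
from scipy.stats import binom, norm

def enumerated_power(n1, n2, p1, p2, alpha=0.05, eps=1e-4):
    """Power and actual alpha of the upper-tailed pooled z-test by full enumeration."""
    z_crit = norm.ppf(1 - alpha)                       # step 1
    w1_h1 = binom.pmf(np.arange(n1 + 1), n1, p1)       # b(x11; n1, p1)
    w1_h0 = binom.pmf(np.arange(n1 + 1), n1, p2)       # b(x11; n1, p2), for alpha*
    w2 = binom.pmf(np.arange(n2 + 1), n2, p2)          # b(x21; n2, p2)
    power = actual_alpha = 0.0
    for x11 in range(n1 + 1):
        for x21 in range(n2 + 1):
            # step 2: replace empty cells by eps so the pooled SE is never zero
            s1, f1 = max(x11, eps), max(n1 - x11, eps)
            s2, f2 = max(x21, eps), max(n2 - x21, eps)
            m1, m2 = s1 + f1, s2 + f2
            pbar = (s1 + s2) / (m1 + m2)
            se = math.sqrt(pbar * (1 - pbar) * (1 / m1 + 1 / m2))
            z_t = (s1 / m1 - s2 / m2) / se
            if z_t > z_crit:                            # step 3: outcome is in A
                power += w1_h1[x11] * w2[x21]           # step 4
                actual_alpha += w1_h0[x11] * w2[x21]    # step 5
    return power, actual_alpha

print(enumerated_power(30, 30, 0.6, 0.3))
```

The double loop has $(n_{1}+1)(n_{2}+1)$ iterations, which is why the text suggests switching to a large-sample approximation for big groups.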



### $\textbf{Test Statistics}$

#### $\textbf{Fisher's Exact Test}$

$\textbf{Test statistic}$

$T = -\ln \left[\frac{\left(\!
    \begin{array}{c}
      n_{1} \\
      x_{1}
    \end{array}
  \!\right)
  \left(\!
    \begin{array}{c}
      n_{2} \\
      x_{2}
    \end{array}
  \!\right)}
  {\left(\!
    \begin{array}{c}
      N \\
      m
    \end{array}
  \!\right)} \right]$

The distribution of $T$ is based on the hypergeometric distribution. It is given by

 $Pr(T \geq t|m,H_{0}) = \displaystyle\sum_{A(m)} \left[\frac{\left(\!
    \begin{array}{c}
      n_{1} \\
      x_{1}
    \end{array}
  \!\right)
  \left(\!
    \begin{array}{c}
      n_{2} \\
      x_{2}
    \end{array}
  \!\right)}
  {\left(\!
    \begin{array}{c}
      N \\
      m
    \end{array}
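  \!\right)} \right]$ where $m = x_{1} + x_{2}$ is the total number of successes, $A(m)$ = {all pairs $x_{1}, x_{2}$ such that $x_{1} + x_{2} = m$}, and the sum is restricted to the pairs in $A(m)$ with $T \geq t$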

  Conditional on $m$, the $\textbf{critical value, $t_{\alpha}$}$, is the smallest value of t such that

  $Pr(T \geq t_{\alpha}|m,H_{0}) \leq \alpha$

 $\textbf{Power}$  is defined as
  $1 - \beta = \displaystyle\sum_{m=0}^{N} P(m)Pr(T \geq t_{\alpha}|m,H_{1})$ where

$Pr(T \geq t_{\alpha}|m,H_{1}) =  \displaystyle\sum_{A(m,T \geq t_{\alpha})} \left[\frac{b(x_{1},n_{1},p_{1})b(x_{2},n_{2},p_{2})}{\sum_{A(m)}b(x_{1},n_{1},p_{1})b(x_{2},n_{2},p_{2})} \right]$

$P(m) = Pr(x_{1} + x_{2} = m|H_{1}) = \displaystyle\sum_{A(m)} b(x_{1},n_{1},p_{1})b(x_{2},n_{2},p_{2})$

$b(x,n,p) = \left(\!
    \begin{array}{c}
      n \\
      x
    \end{array}
  \!\right) p^{x}(1-p)^{n-x}$

When the normal approximation is used to compute power, the result is based on the pooled, continuity
corrected Z test.
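A sketch of the enumeration, assuming `scipy.stats.fisher_exact` for the conditional test. Its two-sided p-value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, which is $Pr(T \geq t|m,H_{0})$ for the statistic $T$ above, so rejecting when that p-value is at most $\alpha$ should reproduce the rule $T \geq t_{\alpha}$. Summing $b(x_{1},n_{1},p_{1})b(x_{2},n_{2},p_{2})$ over the rejected tables then equals $\sum_{m} P(m)Pr(T \geq t_{\alpha}|m,H_{1})$. The function name is illustrative, and the double loop is only practical for small samples.

```python
import numpy as np
from scipy.stats import binom, fisher_exact

def fisher_power(n1, n2, p1, p2, alpha=0.05):
    """Unconditional power of the two-sided Fisher exact test by enumeration."""
    w1 = binom.pmf(np.arange(n1 + 1), n1, p1)   # b(x1; n1, p1)
    w2 = binom.pmf(np.arange(n2 + 1), n2, p2)   # b(x2; n2, p2)
    power = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            _, p_value = fisher_exact([[x1, n1 - x1], [x2, n2 - x2]])
            if p_value <= alpha:                # table is in the rejection region
                power += w1[x1] * w2[x2]
    return power

print(fisher_power(15, 15, 0.8, 0.3))
```

Because each conditional test has size at most $\alpha$, calling `fisher_power` with $p_{1} = p_{2}$ returns the achieved size, which stays at or below $\alpha$.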

#### $\textbf{Z Test (or Chi-Square Test) (Pooled and Unpooled)}$

This test statistic was first proposed by Karl Pearson in 1900. Although this test is usually expressed directly
as a Chi-Square statistic, it is expressed here as a z statistic so that it can be more easily used for one-sided
hypothesis testing.

Both *pooled* and *unpooled* versions of this test have been discussed in statistical literature. The pooling
refers to the way in which the standard error is estimated. In the pooled version, the two proportions are
averaged, and only one proportion is used to estimate the standard error. In the unpooled version, the two
proportions are used separately.

$\textbf{Test statistic}$

$z_{t} = \frac{p_{1} - p_{2}}{\delta_{D}}$

$\textbf{Pooled Version}$

$\delta_{D} = \sqrt{p(1-p)(\frac{1}{n_{1}} + \frac{1}{n_{2}})}$

$p = \frac{n_{1}p_{1} + n_{2}p_{2}}{n_{1} + n_{2}}$

$\textbf{Unpooled Version}$

$\delta_{D} = \sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}$

$\textbf{Power}$  

The power of this test is computed using the enumeration procedure described above. For large sample
sizes, the following approximation is used:

1. Find the critical value (or values in the case of a two-sided test) using the standard normal distribution. The critical value is that value of z that leaves exactly the target value of alpha in the tail.

2. Use the normal approximation to the binomial distribution to compute the power of the pooled and unpooled upper-tailed tests ($H_{1}: p_{1} - p_{2} > 0$), respectively, using

<b>Pooled</b>: $1 - \beta = Pr(Z < \frac{(p_{1}-p_{2}) - z_{\alpha}\delta_{D,p}}{\delta_{D,u}})$

<b>Unpooled</b>: $1 - \beta = Pr(Z < \frac{(p_{1}-p_{2}) - z_{\alpha}\delta_{D,u}}{\delta_{D,u}})$

where $z_{\alpha}$ is the upper-tail critical value (1.645 for $\alpha = 0.05$),

$\delta_{D,u} = \sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}$ (unpooled SE)

$\delta_{D,p} = \sqrt{\bar{p}\bar{q}(\frac{1}{n_{1}} + \frac{1}{n_{2}})}$ with $\bar{p} = \frac{n_{1}p_{1}+n_{2}p_{2}}{n_{1}+n_{2}}, \bar{q} = 1-\bar{p}$ (pooled SE)
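A sketch of the large-sample approximation for the upper-tailed test ($H_{1}: p_{1} > p_{2}$), assuming $z_{\alpha}$ is the positive upper-tail critical value, so that $H_{0}$ is rejected when $p_{1} - p_{2} > z_{\alpha}\delta_{D}$ and the true spread of the difference is the unpooled $\delta_{D,u}$. The function name is illustrative.

```python
import math
from scipy.stats import norm

def approx_power(n1, n2, p1, p2, alpha=0.05, pooled=True):
    """Large-sample power of the upper-tailed two-proportion z-test."""
    se_u = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)      # delta_{D,u}
    pbar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_p = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))        # delta_{D,p}
    z_alpha = norm.ppf(1 - alpha)
    se_test = se_p if pooled else se_u     # SE the test statistic is divided by
    return norm.cdf(((p1 - p2) - z_alpha * se_test) / se_u)

print(approx_power(200, 100, 0.45, 0.30), approx_power(200, 100, 0.45, 0.30, pooled=False))
```

With equal group sizes the two versions are close; they diverge more when $n_{1}$ and $n_{2}$ differ, as in the example call.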

#### $\textbf{Z Test (or Chi-Square Test) with Continuity Correction (Pooled and Unpooled)}$

$\textbf{Test statistic}$

$z = \frac{(p_{1} - p_{2})+\frac{F}{2}(\frac{1}{n_{1}} + \frac{1}{n_{2}})}{\delta_{D}}$ where $F$ is $-1$ for upper-tailed, $1$ for lower-tailed, and both $-1$ and $1$ for two-sided hypotheses, so that the correction always moves $z$ toward zero.

$\textbf{Pooled Version}$

$\delta_{D} = \sqrt{p(1-p)(\frac{1}{n_{1}} + \frac{1}{n_{2}})}$

$p = \frac{n_{1}p_{1} + n_{2}p_{2}}{n_{1} + n_{2}}$

$\textbf{Unpooled Version}$

$\delta_{D} = \sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}$

$\textbf{Power}$

The power of this test is computed using the enumeration procedure described for the $z$ test above. For large samples, approximate results based on the normal approximation to the binomial are used.
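A sketch of the continuity-corrected statistic for the one-sided cases, assuming the sign of $F$ is chosen so the correction pulls $z$ toward zero ($F = -1$ for an upper-tailed test, $F = 1$ for a lower-tailed one); for a two-sided test both versions are computed and compared with their own critical values. The function name `cc_z` is illustrative.

```python
import math

def cc_z(x1, n1, x2, n2, tail="upper", pooled=True):
    """Continuity-corrected two-sample z statistic for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    F = -1 if tail == "upper" else 1
    return ((p1 - p2) + F / 2 * (1 / n1 + 1 / n2)) / se

# The uncorrected pooled z is (0.6 - 0.4) / 0.1 = 2.0 for this table
print(cc_z(30, 50, 20, 50, tail="upper"))
```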

#### $\textbf{Conditional Mantel-Haenszel Test}$

$\textbf{Test statistic}$

$z = \frac{x_{11} - E(x_{11})}{\sqrt{V_{c}(x_{11})}}$ where $E(x_{11}) = \frac{n_{1}m_{1}}{N}$, $V_{c}(x_{11}) = \frac{n_{1}n_{2}m_{1}m_{2}}{N^{2}(N-1)}$

$\textbf{Power}$

The power of this test is computed using the enumeration procedure described above.
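A sketch of the statistic in the alternative table notation ($x_{11}, x_{12}$ in the treatment row, $x_{21}, x_{22}$ in the control row); the function name is illustrative. Algebraically, $z^{2}$ equals the Pearson chi-square statistic multiplied by $(N-1)/N$, which gives a handy check.

```python
import math

def mantel_haenszel_z(x11, x12, x21, x22):
    """Conditional Mantel-Haenszel z for a 2x2 table."""
    n1, n2 = x11 + x12, x21 + x22          # row totals
    m1, m2 = x11 + x21, x12 + x22          # column totals
    N = n1 + n2
    expected = n1 * m1 / N                 # E(x11)
    variance = n1 * n2 * m1 * m2 / (N ** 2 * (N - 1))   # V_c(x11)
    return (x11 - expected) / math.sqrt(variance)

z = mantel_haenszel_z(15, 5, 10, 10)
print(z, z ** 2)
```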

#### $\textbf{Likelihood Ratio Test}$

$\textbf{Test statistic}$

$LR = 2\left[a\ln(a) + b\ln(b) + c\ln(c) + d\ln(d) + N\ln(N) - s\ln(s) - f\ln(f) - m\ln(m) - n\ln(n)\right]$

$\textbf{Power}$

The power of this test is computed using the enumeration procedure described above. When large sample
results are needed, the results for the $z$ test are used.
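A sketch of the LR statistic in the $a, b, c, d$ table notation, using the convention $0 \ln 0 = 0$ for empty cells, cross-checked against `scipy.stats.chi2_contingency` with `lambda_="log-likelihood"` (the G statistic) and no Yates correction. The function name is illustrative.

```python
import math
from scipy.stats import chi2_contingency

def likelihood_ratio(a, b, c, d):
    """LR statistic for treatment row (a, c) and control row (b, d)."""
    def xlnx(v):
        return v * math.log(v) if v > 0 else 0.0   # 0 * ln(0) taken as 0
    s, f = a + b, c + d          # success and failure column totals
    m, n = a + c, b + d          # treatment and control row totals
    N = m + n
    return 2 * (xlnx(a) + xlnx(b) + xlnx(c) + xlnx(d) + xlnx(N)
                - xlnx(s) - xlnx(f) - xlnx(m) - xlnx(n))

# scipy expects rows = groups, columns = (success, failure)
g_scipy = chi2_contingency([[15, 5], [10, 10]], correction=False, lambda_="log-likelihood")[0]
print(likelihood_ratio(15, 10, 5, 10), g_scipy)
```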

#### $\textbf{T-Test}$

$\textbf{Test statistic}$

$t_{N-2} = (ad-bc)(\frac{N-2}{N(nac+mbd)})^{\frac{1}{2}}$ ($t$ distribution with $N-2$ degrees of freedom.)

$\textbf{Power}$

The power of this test is computed using the enumeration procedure described above, except that the t
tables are used instead of the standard normal tables.
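The formula is the ordinary pooled-variance two-sample t statistic applied to the raw 0/1 outcomes, which gives a convenient check. A sketch (the function name is illustrative) comparing it with `scipy.stats.ttest_ind`:

```python
import math
import numpy as np
from scipy.stats import ttest_ind

def t_statistic(a, b, c, d):
    """t statistic with N - 2 df for treatment row (a, c) vs control row (b, d)."""
    m, n = a + c, b + d
    N = m + n
    return (a * d - b * c) * math.sqrt((N - 2) / (N * (n * a * c + m * b * d)))

# Same table expanded into individual 0/1 outcomes
treatment = np.array([1] * 15 + [0] * 5)
control = np.array([1] * 10 + [0] * 10)
print(t_statistic(15, 10, 5, 10), ttest_ind(treatment, control).statistic)
```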

### $\textbf{Confidence Intervals}$

$CI = p_{1}-p_{2} \pm z_{1 - \alpha/2}\sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}$

## $\textbf{Sample Size Calculation}$

sources: https://colab.research.google.com/drive/10DkHtLXsRHYSB7f5LMV5h5WW__uTcNHE?usp=sharing#scrollTo=PCJDTvlOeMy5, https://www.statskingdom.com/50_ci_sample_size.html

$\textbf{Total number of observations}$:

  * One-sided test:
$n \approx
\left[
\frac{\sigma(z_{\beta} + z_{\alpha})}
{\mu_0-\mu}
\right]^2$

  * Two-tailed test:
$n \approx
\left[
\frac{\sigma(z_{\beta} + z_{\alpha/2})}
{\mu_0-\mu}
\right]^2$
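A direct sketch of these two formulas. `sigma` is the per-observation standard deviation; plugging in $\sqrt{P_{0}(1-P_{0})}$ for a single proportion is an assumption made for the example, not part of the formulas. $z_{\beta}$ is read as the standard normal quantile of the target power $1-\beta$. The function name is illustrative.

```python
import math
from scipy.stats import norm

def total_n(sigma, mu0, mu, alpha=0.05, power=0.80, two_sided=False):
    """Approximate sample size from the one- or two-tailed formula above."""
    z_alpha = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    return math.ceil((sigma * (z_beta + z_alpha) / (mu0 - mu)) ** 2)

sigma = math.sqrt(0.10 * (1 - 0.10))   # assumed: binomial SD at P0 = 0.10
print(total_n(sigma, 0.10, 0.11), total_n(sigma, 0.10, 0.11, two_sided=True))
```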


$\textbf{Number of observations per group with unequal sizes}$:

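$m_{large} = \frac{(z_{1-\alpha/2}\sqrt{\bar{p}\bar{q}(r+1)} + z_{\beta}\sqrt{rp_{1}q_{1}+p_{2}q_{2}})^{2}}{r(p_{1}-p_{2})^{2}}$ where $r$ - ratio of the second group's size to the first group's size, $z_{\beta}$ - standard normal quantile of the target power $1-\beta$, $q_{i} = 1 - p_{i}$, $\bar{p} = \frac{p_{1} + rp_{2}}{r+1}, \bar{q} = 1 - \bar{p}$


$m_{small} = r \cdot m_{large}$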
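A sketch of the unequal-allocation formula, assuming the second group has $r$ times as many observations as the first (so $m_{small}$ is the smaller group only when $r < 1$) and a two-sided test. The function name `group_sizes` is illustrative.

```python
import math
from scipy.stats import norm

def group_sizes(p1, p2, r=1.0, alpha=0.05, power=0.80):
    """Return (m_large, m_small) = (size of group 1, size of group 2), rounded up."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    q1, q2 = 1 - p1, 1 - p2
    pbar = (p1 + r * p2) / (r + 1)
    qbar = 1 - pbar
    num = (z_a * math.sqrt(pbar * qbar * (r + 1)) + z_b * math.sqrt(r * p1 * q1 + p2 * q2)) ** 2
    m = num / (r * (p1 - p2) ** 2)
    return math.ceil(m), math.ceil(r * m)

print(group_sizes(0.10, 0.11, r=0.25))   # e.g. an 80/20 traffic split
```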

## $\textbf{Practice}$:

source: https://bytepawn.com/ab-testing-and-the-ztest.html

#### Lib import

In [None]:
import numpy as np
from random import random
import math
from scipy.stats import norm
import scipy
import hashlib
from statsmodels.stats.proportion import proportions_ztest, proportion_confint,confint_proportions_2indep

#### Comparison

$\textbf{Task}$: Let’s pretend we’re running an A/B test on funnel conversion. A is the current version of the funnel and B is the new one. We want to know whether B is better. From our funnel dashboard, we know that A historically converts at around 9-11%.

1. Formulate hypotheses:

  We want to know whether B has a higher conversion rate than A, so the test is one-sided.

  $\begin{cases}
  H_{0}: CR = 0.10\\
  H_{1}: CR > 0.10\\
  \end{cases}$

2. We set $\alpha$ at 0.05 and ($1 - \beta$) at 0.80. This means we accept a 5% false-positive rate and expect to detect 80% of true improvements of the chosen size.

$\textbf{Step 3.}$ Figure out how many samples we need to collect, given the historic conversion, traffic split, alpha and the kind of lift we’re looking for.

We write a custom function that computes the total number of observations required:


In [None]:
# #from stats example
# import statsmodels.stats.api as sms
# es = sms.proportion_effectsize(p1, p2)
# round(sms.NormalIndPower().solve_power(es, power=0.80, alpha=0.05, ratio=1))

def alpha_to_z(alpha, one_sided):
    if one_sided:
        pos = 1 - alpha
    else:
        pos = 1 - alpha/2.0
    return norm.ppf(pos)

def power_to_z(power):
    pos = power
    return norm.ppf(pos)

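def num_samples(alpha, mu_A, mu_delta, traffic_ratio_A, power=0.50, one_sided=True):
    # Total number of users (A + B) needed to detect a lift of mu_delta over the baseline
    # conversion mu_A when a share traffic_ratio_A of the traffic goes to A
    z_alpha = alpha_to_z(alpha, one_sided)
    z_power = power_to_z(power)
    mu_B = mu_A + mu_delta
    traffic_ratio_B = 1 - traffic_ratio_A
    # N * Var(mu_B_hat - mu_A_hat), where each group gets N times its traffic share
    N = ( mu_A*(1-mu_A)/traffic_ratio_A + mu_B*(1-mu_B)/traffic_ratio_B ) * ((z_alpha+z_power)**2) / (mu_A - mu_B)**2
    return math.ceil(N)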

$\textbf{Step 4.}$ Create a random seed for the A/B test and save it server-side. We generate a new seed for each A/B test. Let’s say we generate this string for the current test: `OkMdZa18pfr8m5sy2IL52pW9ol2EpLekgakJAIZFBbgZ`

$\textbf{Step 5.}$ Perform the test by splitting users randomly between A and B according to the traffic split above. A returning user, identified by a user_id (or cookie_id), must always land in the same funnel. We accomplish this by hashing test_id = seed + user_id:

In [None]:
test_seed = 'OkMdZa18pfr8m5sy2IL52pW9ol2EpLekgakJAIZFBbgZ'

def funnel_user(base_traffic_split, test_seed, user_id):
    test_id = hashlib.md5(test_seed.encode('ascii') + str(user_id).encode('ascii')).hexdigest()
    bits = bin(int(test_id, 16))[3:]
    r = sum([int(bit)*(0.5**(i+1)) for i, bit in enumerate(bits)])
    if r < base_traffic_split:
        return 'A'
    else:
        return 'B'

$\textbf{Step 6.}$ Run the test. Since we are simulating the real world here, we have to pick the actual conversion rates for A and B. These are unknown to the test (they are what it is trying to estimate), so we call them hidden variables:



In [None]:
hidden_conversion_params = {'A': 0.105, 'B': 0.115 }

def run_test(N, hidden_conversion_params, funnel_user_func):
    test_outcomes = {'A': {'N': 0, 'conversions': 0}, 'B': {'N': 0, 'conversions': 0}}
    for user_id in range(N):
        which_funnel = funnel_user_func(user_id) # returns 'A' or 'B'
        test_outcomes[which_funnel]['N'] += 1
        if random() < hidden_conversion_params[which_funnel]:
            test_outcomes[which_funnel]['conversions'] += 1
    return test_outcomes

$\textbf{Step 7.}$ Compute the p-value and compare it with the $\alpha$ we set to decide whether to accept or reject B. Additionally, calculate confidence intervals.

For practice, we compute both the one-sample and the two-sample tests. Since A and B are both measured in the experiment, the two-sample test is the appropriate choice.

In [None]:
def calculate_continuous_term(baseline_portion,prop_portion,nobs):
  if abs(baseline_portion - prop_portion) < 1/(2*nobs):
    c = 0
  elif baseline_portion > prop_portion:
    c = -1 / (2 * nobs)
  else:
    c = 1 / (2 * nobs)

  return c

def calculate__one_sample_z_statistics(nobs,count,value,se_formula='p0', alternative = 'two-sided'):
  # se_formula: 'p0' (SE from P0), 'p' (SE from the sample proportion),
  # 'p0_cont' / 'p_cont' (same, with the continuity correction c added to the numerator)
  # alternative: 'two-sided', 'smaller' or 'larger'
  p_hat = count/nobs
  nominator = p_hat - value
  if se_formula in ('p0', 'p0_cont'):
    se = np.sqrt(1/nobs * value * (1-value))
  else:
    se = np.sqrt(1/nobs * p_hat * (1-p_hat))
  if se_formula.endswith('_cont'):
    # the continuity correction moves the numerator toward zero
    nominator += calculate_continuous_term(p_hat, value, nobs)

  z = nominator / se

  if alternative == 'two-sided':
    p = scipy.stats.norm.sf(abs(z)) * 2
  elif alternative == 'smaller':
    p = scipy.stats.norm.cdf(z)
  else:  # 'larger'
    p = scipy.stats.norm.sf(z)

  return z,p


def calculate_two_sample_z_statistics(x1, n1, x2, n2, one_sided=False):
    p1 = x1/n1
    p2 = x2/n2

    p = (x1+x2)/(n1+n2)
    se = p*(1-p)*(1/n1+1/n2)
    se = np.sqrt(se)

    z = (p1-p2)/se
    p = 1-scipy.stats.norm.cdf(abs(z))
    p *= 2-one_sided # if not one_sided: p *= 2
    return z, p

def compute_standard_error_prop_two_samples(x1, n1, x2, n2, alpha=0.05):
    p1 = x1/n1
    p2 = x2/n2
    se = p1*(1-p1)/n1 + p2*(1-p2)/n2
    return np.sqrt(se)

def zconf_interval_two_samples(x1, n1, x2, n2, alpha=0.05):
    # Wald interval for p2 - p1 (here: conversion of B minus conversion of A)
    p1 = x1/n1
    p2 = x2/n2
    se = compute_standard_error_prop_two_samples(x1, n1, x2, n2)
    z_critical = norm.ppf(1 - 0.5 * alpha)
    return p2-p1-z_critical*se, p2-p1+z_critical*se

def zconf_interval_one_sample(X,n,alpha=0.05):
  # Wald (normal-approximation) interval for a single proportion
  p = X/n
  se = np.sqrt(p * (1-p) * 1/n)
  z_critical = norm.ppf(1 - 0.5 * alpha)

  return p - z_critical * se, p + z_critical * se

$\textbf{Step 8.}$ Run the experiment

In [None]:
alpha = 0.05
power = 0.80
base_conversion = 0.10
valuable_diff = 0.01
base_traffic_split = 0.8

N_required = num_samples(
    alpha=alpha,
    mu_A=base_conversion,
    mu_delta=valuable_diff,
    traffic_ratio_A=base_traffic_split,
    power=power,
)

# hidden_conversion_params is how our funnels actually perform:
# the difference between the two is what we're trying to establish
# with statistical confidence, using an A/B test
hidden_conversion_params = {'A': 0.105, 'B': 0.115 }
test_seed = 'OkMdZa18pfr8m5sy2IL52pW9ol2EpLekgakJAIZFBbgZ'
test_outcomes = run_test(
    N_required,
    hidden_conversion_params,
    lambda user_id: funnel_user(base_traffic_split, test_seed, user_id),
)

mu_A = test_outcomes['A']['conversions'] / test_outcomes['A']['N']
mu_B = test_outcomes['B']['conversions'] / test_outcomes['B']['N']

print(f"Measured conversion for A: {np.round(mu_A,3)} with the group size {test_outcomes['A']['N']}")
print(f"Measured conversion for B: {np.round(mu_B,3)} with the group size {test_outcomes['B']['N']}")

print('-'*50,'\nOne Sample Tests')
z,p = proportions_ztest(test_outcomes['B']['conversions'], test_outcomes['B']['N'], base_conversion,alternative='larger')
z1,p1 = calculate__one_sample_z_statistics(test_outcomes['B']['N'], test_outcomes['B']['conversions'], base_conversion, se_formula='p',alternative='larger')

lci,rci = zconf_interval_one_sample(test_outcomes['B']['conversions'], test_outcomes['B']['N'], alpha=alpha)
lci1,rci1 = proportion_confint(test_outcomes['B']['conversions'], test_outcomes['B']['N'], alpha=alpha, method='normal')

print(f'Stats function: z-score- {np.round(z,5)}, p_val- {np.round(p,5)}\nCustom function: z-score- {np.round(z1,5)}, p_val- {np.round(p1,5)}')
print(f'Stats: The {100* (1-alpha)}% CR in treatment group lies between {lci1 * 100} and {rci1 * 100}\nCustom: The {100* (1-alpha)}% CR in treatment group lies between {lci * 100} and {rci * 100}')
print('-'*50,'\nTwo Sample Tests')

count = np.array([test_outcomes['A']['conversions'],test_outcomes['B']['conversions']])
nobs = np.array([test_outcomes['A']['N'], test_outcomes['B']['N']])

z,p = proportions_ztest(count,nobs, alternative='smaller')
z1,p1 = calculate_two_sample_z_statistics(count[0], nobs[0], count[1], nobs[1], one_sided=True)

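ci_low,ci_upp = zconf_interval_two_samples(count[0], nobs[0], count[1], nobs[1], alpha=alpha)
# statsmodels defaults to the Newcomb interval for compare='diff', so it differs slightly from the Wald interval above
ci_low1,ci_upp1 = confint_proportions_2indep(count[1], nobs[1], count[0], nobs[0], compare='diff', alpha=alpha)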


print(f'Stats function: z-score- {np.round(z,5)}, p_val- {np.round(p,5)}\nCustom function: z-score- {np.round(z1,5)}, p_val- {np.round(p1,5)}')
print(f'Stats: The {100* (1-alpha)}% difference in CR between groups lies between {ci_low1 * 100} and {ci_upp1 * 100}\nCustom: The {100* (1-alpha)}% difference in CR between groups lies between {ci_low * 100} and {ci_upp * 100}')
print('-'*50)


if p1 <= alpha:
    print("""Final decision: Reject Null hypotheses -> Action: B is better""")
else:
    print("""Final decision: Not enough evidence for rejecting Null hypothesis -> We're not sure if B is better than A""")

Measured conversion for A: 0.105 with the group size 29756
Measured conversion for B: 0.113 with the group size 7463
-------------------------------------------------- 
One Sample Tests
Stats function: z-score- 3.64029, p_val- 0.00014
Custom function: z-score- 3.64029, p_val- 0.00014
Stats: The 95.0% CR in treatment group lies between 11.216300926780152 and 12.688295080187556
Custom: The 95.0% CR in treatment group lies between 11.216300926780152 and 12.688295080187556
-------------------------------------------------- 
Two Sample Tests
Stats function: z-score- -2.14683, p_val- 0.0159
Custom function: z-score- -2.14683, p_val- 0.0159
Stats: The 95.0% difference in CR between groups lies between 0.07381586948251115 and 1.6722786605542228
Custom: The 95.0% difference in CR between groups lies between 0.05833105267325604 and 1.6563988375300727
--------------------------------------------------
Final decision: Reject Null hypotheses -> Action: B is better
