# Modulus of Continuity

The modulus of continuity for Fisher's combined $p$-value is expressed in terms of $\delta$, the distance between two $\beta$ values. Because ballot-polling and ballot-comparison tests use the same set of risk functions, the formula for the modulus of continuity (in terms of $t$) does not differ between the testing procedures, as it does in SUITE.

The risk functions in the `TestNonnegMean` class are in terms of $t$, the null value of the mean. However, in a stratified audit, the null hypothesis is adjusted by the proportion of each stratum's size to test the hypothesis $$\bar{A}_s^b\leq \frac{N}{N_s}\beta_s$$ 

This notebook consists of modulus derivations for the different risk functions in terms of $t$ and the composed modulus in terms of $\beta$.

In many cases, the modulus will be expressed in terms of $n_1, n_2$, the numbers of ballots sampled from the respective strata. The actual number of ballots used to calculate the $p$-value may be less than $n_1, n_2$ if the ballots are sampled in a random order. However, using $n_1, n_2$ gives a conservative estimate of the modulus and is a better fit for the current risk functions.
 

## Modulus Properties and Definitions (CO18.pdf)

*Composition Property*

$$
\omega(f \circ g, \delta) = \omega_f(\omega_g(\delta))
$$

The modulus with respect to $x$:

*Logarithmic functions*

$$
\log(ax-b) \rightarrow \log(1+a\delta)
$$

*Linear functions*

$$
ax+b \rightarrow a\delta
$$
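
As a minimal sketch, the rules above can be transcribed into small Python helpers. The names `linear_modulus`, `log_modulus`, and `compose` are illustrative and do not appear in the notebook's code. Taking the absolute value of the slope in the linear rule is an assumption, made so that decreasing linear maps also yield a nonnegative modulus.

```python
import math


def linear_modulus(a):
    """Rule for x -> a*x + b: delta -> |a| * delta (abs() is an assumption)."""
    return lambda delta: abs(a) * delta


def log_modulus(a):
    """Rule stated above for x -> log(a*x - b): delta -> log(1 + a*delta)."""
    return lambda delta: math.log1p(a * delta)


def compose(omega_f, omega_g):
    """Composition property: modulus of (f o g) is omega_f(omega_g(delta))."""
    return lambda delta: omega_f(omega_g(delta))


# Modulus of log(.) composed with a linear map of slope 2
omega = compose(log_modulus(1.0), linear_modulus(2.0))
print(omega(0.1))  # equals log(1 + 2 * 0.1)
```

The same `compose` pattern is what the later sections apply by hand when chaining the $\beta$, $t$, and risk-function moduli.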


## CVR and no CVR strata modulus

In a stratified audit, the null $\bar{A}^b \leq 1/2$ can be rejected if the intersection hypothesis 

$$\bigg\{ \bigcap_{s \in S}\frac{N_s}{N}\bar{A}_s^b\le\beta_s\bigg\}$$

is rejected for all $(\beta_s)_{s=1}^S$ such that $\sum_{s=1}^S \beta_s \leq 1/2$. 

### CVR stratum

#### $t$ transformation

Ballot comparison audits use overstatement discrepancies between CVRs and the corresponding MVRs to compute the $p$-value. The overstatement assorter follows the derivation in SHANGRLA 3.2. To test the null hypothesis $t=1/2$, the assorter is defined as 

$$
\bar{B} =\frac{1}{N}\sum_{i=1}^N B_i^b, \qquad B_i^b =\frac{1-\omega_i/u}{2-v/u}
$$

However, in a stratified audit, $t$ varies to maximize the combined $p$-value. Thus, we can define a generic assorter to test for some null $t$ as 

$$
B_i^b =\frac{1-\omega_i/u}{1/t-v/u}
$$

To test assertions for the generic assorter using the implemented $t=1/2$ assorter, we can transform the null hypothesis $t$ tested in the CVR stratum. Writing $\tau_i = 1-\omega_i/u$ for the numerator of the assorter:

$$
\begin{aligned}
\bar{B} =\frac{1}{N}\sum_{i=1}^N\frac{\tau_i}{\frac{1}{t}-\frac{v}{u}}&<t\\
\frac{1}{N}\sum_{i=1}^N\tau_i&<t*\left(\frac{1}{t}-\frac{v}{u}\right)\\
\frac{1}{N}\sum_{i=1}^N\frac{\tau_i}{2-\frac{v}{u}}&<t*\frac{\frac{1}{t}-\frac{v}{u}}{2-\frac{v}{u}}\\
\frac{1}{N}\sum_{i=1}^N\frac{\tau_i}{2-\frac{v}{u}}&<t*\frac{u-vt}{tu}*\frac{u}{2u-v}\\
\frac{1}{N}\sum_{i=1}^N\frac{\tau_i}{2-\frac{v}{u}}&<\frac{u-vt}{2u-v}\\
\end{aligned}
$$

Thus, the transformed null tested in the ballot-comparison stratum is $t^*=(u-vt_1)/(2u-v)$, where $t_1$ is the null hypothesis originally tested in the stratum. Since $t^*$ is linear in $t_1$ with slope $-v/(2u-v)$, the modulus for $t^*$ with respect to $t_1$ is 

$$
\omega(t^*, \delta)=\frac{v}{2u-v}*\delta
$$
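
The transformation can be checked numerically with a short sketch. The function names and the values of `u` and `v` below are hypothetical and chosen only for illustration. The sketch confirms two things that follow from the derivation: $t=1/2$ maps to $t^*=1/2$, and a shift of $\delta$ in $t$ moves $t^*$ by exactly $\frac{v}{2u-v}\delta$.

```python
def transformed_null(t, u, v):
    """Null t* for the implemented t = 1/2 assorter, given the original null t."""
    return (u - v * t) / (2 * u - v)


def transformed_null_modulus(u, v):
    """Modulus of t* with respect to t: delta -> v / (2u - v) * delta."""
    return lambda delta: v / (2 * u - v) * delta


u, v = 1.0, 0.1        # hypothetical assorter upper bound and reported margin
t, delta = 0.4, 0.05   # hypothetical null value and perturbation

# Actual change in t* when t is shifted by delta
change = abs(transformed_null(t + delta, u, v) - transformed_null(t, u, v))
print(change, transformed_null_modulus(u, v)(delta))
```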

#### $\beta$ transformation

The CVR stratum defines $t_1=(N_1+N_2)/N_1*\beta$ to test the null hypothesis

$$
\bar{A}_1^b\leq \frac{N_1+N_2}{N_1}\beta
$$ 

Since $t_1$ varies linearly with $\beta$, the modulus of $t_1$ with respect to $\beta$ is 

$$\omega(t_1(\beta), \delta) = \frac{N_1+N_2}{N_1}*\delta$$

#### CVR composition

Let $t_1^* = t^* \circ t_1$, so that $t_1^*(\beta) = t^*(t_1(\beta))$. The composition property allows us to define a modulus that takes both the $t$ and $\beta$ transformations into account:

$$
\omega(t_1^*(\beta), \delta) = \frac{v}{2u-v}*\frac{N_1+N_2}{N_1}*\delta
$$

### No CVR stratum

For any $f$ such that $\beta_1+\beta_2=f\leq 1/2$, we can define $t_2=(N_1+N_2)/N_2*(f-\beta)$ to test the assertion 

$$
\bar{A}_2^b\leq \frac{N_1+N_2}{N_2}(f-\beta)
$$

Since $t_2$ varies linearly with $\beta$, the modulus of $t_2$ with respect to $\beta$ is independent of $f$:  

$$
\omega(t_2(\beta), \delta) = \frac{N_1+N_2}{N_2}*\delta
$$
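
A minimal sketch of the two stratum nulls and their moduli with respect to $\beta$ follows. The helper names and the numeric inputs are hypothetical; the formulas are transcribed from the CVR and no CVR subsections above.

```python
def stratum_nulls(beta, f, N1, N2):
    """Nulls (t1, t2) tested in the CVR and no-CVR strata for a given beta."""
    N = N1 + N2
    t1 = N / N1 * beta
    t2 = N / N2 * (f - beta)
    return t1, t2


def beta_moduli(N1, N2, u, v):
    """Moduli with respect to beta for the CVR (composed) and no-CVR strata."""
    N = N1 + N2
    cvr = lambda delta: v / (2 * u - v) * N / N1 * delta
    nocvr = lambda delta: N / N2 * delta
    return cvr, nocvr


# Hypothetical stratum sizes; beta = 0.3 and f = 0.5 give t1 = t2 = 0.5
t1, t2 = stratum_nulls(beta=0.3, f=0.5, N1=600, N2=400)
cvr, nocvr = beta_moduli(N1=600, N2=400, u=1.0, v=0.1)
print(t1, t2, cvr(0.01), nocvr(0.01))
```

Note that `nocvr` takes no `f` argument, matching the observation that the no CVR modulus is independent of $f$.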



## kaplan_markov

The formula for the `kaplan_markov` risk function is 
$$
p_{KM}\equiv 1 \land \min_{1\leq j\leq J} 
\prod_{k=1}^j \frac{t+\gamma}{x_k+\gamma}
$$

Under the assumption that the entire sample is used to calculate the $p$-value, we can define $p_{KM}$ in terms of the number of ballots for the winner and loser. For a single stratum $s$, let $n_s$ be the sample size, $n_{s, w}$ the number of sampled ballots for the winner, $n_{s, \ell}$ the number of sampled ballots for the loser, $\gamma$ the padding parameter, and $t_s$ the null mean. Let $x_{s, w}, x_{s, \ell}, x_{s, \emptyset}$ denote the assorter values for winning, losing, and invalid ballots, respectively.

$$
\begin{aligned}
p_{KM} =& \left(\frac{t+\gamma}{x_{s, w}+\gamma}\right)^{n_{s, w}}
\left(\frac{t+\gamma}{x_{s, \ell}+\gamma}\right)^{n_{s, \ell}}
\left(\frac{t+\gamma}{x_{s, \emptyset}+\gamma}\right)^{n_s-n_{s, w}-n_{s, \ell}} \\
=& (t+\gamma)^{n_s}\left(\frac{1}{x_{s, w}+\gamma}\right)^{n_{s, w}}
\left(\frac{1}{x_{s, \ell}+\gamma}\right)^{n_{s, \ell}}
\left(\frac{1}{x_{s, \emptyset}+\gamma}\right)^{n_s-n_{s, w}-n_{s, \ell}} \\
\chi_s(t_s) =& -2 n_s\log(t_s+\gamma)-2\log \left[\left(\frac{1}{x_{s, w}+\gamma}\right)^{n_{s, w}}
\left(\frac{1}{x_{s, \ell}+\gamma}\right)^{n_{s, \ell}}
\left(\frac{1}{x_{s, \emptyset}+\gamma}\right)^{n_s-n_{s, w}-n_{s, \ell}} \right]\\
\end{aligned}
$$

The resulting modulus of $\chi_s$ with respect to $t_s$ depends only on the sample size $n_s$; it is independent of $n_{s, w}$ and $n_{s, \ell}$, the winner's and loser's shares of the sample. 

$$
\omega(\chi_s(t_s), \delta_s)=2n_s \log(1+ \delta_s)
$$
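
The independence from the sample composition can be illustrated with a short sketch. It evaluates $\chi_s$ under the full-sample assumption (no minimum over $j$ and no cap at 1) and is not the `TestNonnegMean` implementation. The function name `chi_km` and the assorter values 1, 0, and 0.5 for winner, loser, and invalid ballots are assumptions made for the example.

```python
import numpy as np


def chi_km(x, t, gamma):
    """-2 * log of the full-sample kaplan_markov product for assorter values x."""
    x = np.asarray(x, dtype=float)
    return -2.0 * np.sum(np.log((t + gamma) / (x + gamma)))


gamma, t, delta = 0.01, 0.5, 0.02   # hypothetical padding, null, perturbation
sample_a = [1, 1, 0, 0.5]           # two winner, one loser, one invalid ballot
sample_b = [0, 0, 0, 1]             # same size, different composition

# Change in chi_s when t is shifted by delta, for each sample
diff_a = chi_km(sample_a, t, gamma) - chi_km(sample_a, t + delta, gamma)
diff_b = chi_km(sample_b, t, gamma) - chi_km(sample_b, t + delta, gamma)
print(diff_a, diff_b)  # equal: only the -2 n log(t + gamma) term depends on t
```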

Composing the `kaplan_markov` modulus with the CVR and no CVR derivations above, we have the modulus for the combined risk function:

$$
\omega(\chi_{KM}, \delta)=2n_1 \log\left(1+ \frac{v}{2u-v}*\frac{N_1+N_2}{N_1}*\delta\right)+2n_2\log\left( 1+\frac{N_1+N_2}{N_2}*\delta\right)
$$


```python
import numpy as np


def mod_KM(N1, N2, n1, n2, upper_bound, margin):
    """
    Modulus of continuity for the kaplan_markov risk function.
    Assumes the same risk function is used for both strata.

    Parameters
    ----------
    N1 : int
        upper bound of ballots cast in stratum 1
    N2 : int
        upper bound of ballots cast in stratum 2
    n1 : int
        ballots sampled from stratum 1
    n2 : int
        ballots sampled from stratum 2
    upper_bound : float
        assorter upper bound
    margin : float
        reported assorter margin

    Returns
    -------
    mod : callable
        kaplan_markov modulus of continuity as a function of delta
    """
    N = N1 + N2
    # Slope of the composed CVR-stratum transformation: v/(2u - v) * N/N1
    a1 = margin / (2*upper_bound - margin) * N / N1
    # Slope of the no-CVR-stratum transformation: N/N2
    a2 = N / N2

    def mod(delta):
        return 2*n1*np.log(1 + a1*delta) + 2*n2*np.log(1 + a2*delta)

    return mod
```


## kaplan_wald

The formula for the `kaplan_wald` method is: 

$$
\begin{aligned}
p_{KW}\equiv 1 \land& \left( \max_{1\leq j\leq J} \prod_{k=1}^j \left[(1-\gamma)\frac{x_k}{t}+\gamma\right]\right)^{-1} \\
\equiv 1 \land& \min_{1\leq j\leq J} \prod_{k=1}^j \frac{t}{x_k(1-\gamma)+\gamma t}
\end{aligned}
$$

Let $n_s$ be the number of ballots sampled in stratum $s$. The function for stratum $s$ can be expressed as 
$$
\begin{aligned}
\chi_s(t_s)=-2 \sum_{k_s=1}^{n_s} \left[\log(t_s)-\log(x_{k_s}(1-\gamma)+\gamma t_s)\right]
\end{aligned}
$$
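
As a sketch, $\chi_s$ can be evaluated directly from the expression above, again under the full-sample assumption (no minimum over $j$ and no cap at 1). The function name `chi_kw` and the sample values are hypothetical; this is not the `TestNonnegMean` implementation.

```python
import numpy as np


def chi_kw(x, t, g):
    """-2 * sum over the sample of log(t) - log(x*(1 - g) + g*t)."""
    x = np.asarray(x, dtype=float)
    return -2.0 * np.sum(np.log(t) - np.log(x * (1 - g) + g * t))


# When every assorter value equals the null t, each term vanishes
print(chi_kw([0.5, 0.5, 0.5], t=0.5, g=0.1))

# Values above the null push chi_s above zero
print(chi_kw([1.0, 1.0], t=0.5, g=0.1))
```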

The modulus of $\chi_s(t_s)$ is 

$$
\begin{aligned}
\omega(\chi_s(t_s), \delta_s)&=2\sum_{k_s=1}^{n_s}\left[\log(1+\delta_s) + \log(1+\gamma \delta_s) \right] \\
&=2n_s\left[\log(1+\delta_s) + \log(1+\gamma \delta_s) \right]
\end{aligned}
$$

Composing the moduli for the CVR and no CVR strata with the $\chi_s(t)$ modulus, we have 

$$
\begin{aligned}
\omega(\chi_{KW}, \delta)=&2n_1\left[\log\left(1+\frac{v}{2u-v}*\frac{N_1+N_2}{N_1}*\delta\right)+\log\left(1+\gamma*\frac{v}{2u-v}*\frac{N_1+N_2}{N_1}*\delta\right)\right]\\
&+2n_2\left[\log\left(1+\frac{N_1+N_2}{N_2}\delta\right)+\log\left(1+\gamma\frac{N_1+N_2}{N_2}\delta\right)\right]\\
\end{aligned}
$$

```python
import numpy as np


def mod_KW(N1, N2, n1, n2, g, upper_bound, margin):
    """
    Modulus of continuity for the kaplan_wald risk function.
    Assumes the same risk function is used for both strata.

    Parameters
    ----------
    N1 : int
        upper bound of ballots cast in stratum 1
    N2 : int
        upper bound of ballots cast in stratum 2
    n1 : int
        ballots sampled from stratum 1
    n2 : int
        ballots sampled from stratum 2
    g : float in [0, 1)
        padding for assorter values of 0
    upper_bound : float
        assorter upper bound
    margin : float
        reported assorter margin

    Returns
    -------
    mod : callable
        kaplan_wald modulus of continuity as a function of delta
    """
    N = N1 + N2
    # Slope of the composed CVR-stratum transformation: v/(2u - v) * N/N1
    a1 = margin / (2*upper_bound - margin) * N / N1
    # Slope of the no-CVR-stratum transformation: N/N2
    a2 = N / N2

    def mod(delta):
        T1 = 2*n1*(np.log(1 + a1*delta) + np.log(1 + g*a1*delta))
        T2 = 2*n2*(np.log(1 + a2*delta) + np.log(1 + g*a2*delta))
        return T1 + T2

    return mod
```

## kaplan_kolmogorov

The formula for the `kaplan_kolmogorov` risk function is 
$$
\begin{aligned}
p_{KK}\equiv& 1 \land \left(\max_{1 \leq j \leq J}\prod_{k=1}^j
(X_{k}+\gamma)\frac{1-(k-1)/N}{t+\gamma -\frac{1}{N}\sum_{l=1}^{k-1} 
(X_l+\gamma)}\right)^{-1} \\
\equiv& 1 \land \min_{1 \leq j \leq J}\prod_{k=1}^j
\frac{t+\gamma -\frac{1}{N}\sum_{l=1}^{k-1} 
(X_l+\gamma)}{(X_{k}+\gamma)(1-(k-1)/N)} \\
\end{aligned}
$$

Let $n_s$ be the number of ballots sampled in stratum $s$. The function for stratum $s$ can be expressed as 

$$
\chi_s(t_s)=-2 \sum_{k_s=1}^{n_s} \log\left(\frac{t_s+\gamma -\frac{1}{N_s}\sum_{l_s=1}^{k_s-1} 
(X_{l_s}+\gamma)}{(X_{k_s}+\gamma)(1-(k_s-1)/N_s)}\right)
$$
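
The running sum inside the logarithm is easy to get wrong by one index, so a short sketch may help. It evaluates $\chi_s$ for a full sample using a shifted cumulative sum for $\sum_{l_s=1}^{k_s-1}(X_{l_s}+\gamma)$. The function name `chi_kk` and the inputs are hypothetical, and the sketch assumes every numerator stays positive.

```python
import numpy as np


def chi_kk(x, t, g, N):
    """-2 * sum of log terms of the full-sample kaplan_kolmogorov product."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, len(x) + 1)
    # Sum of (X_l + g) over l < k: cumulative sum shifted right by one draw
    prior = np.concatenate(([0.0], np.cumsum(x + g)[:-1]))
    numer = t + g - prior / N            # assumed positive for every k
    denom = (x + g) * (1 - (k - 1) / N)
    return -2.0 * np.sum(np.log(numer / denom))


# Two draws from a stratum of N = 10 ballots, no padding:
# the terms are 0.5 / 1 and (0.5 - 1/10) / (1 * 0.9)
print(chi_kk([1.0, 1.0], t=0.5, g=0.0, N=10))
```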

The modulus for $\chi_s(t_s)$ is

$$
\begin{aligned}
\omega(\chi_s(t_s), \delta_s)&=2\sum_{k_s=1}^{n_s}\log\left(1+\frac{1}{(X_{k_s}+\gamma)(1-(k_s-1)/N_s)}\delta_s\right)\\
&=2\sum_{k_s=1}^{n_s}\log\left(1+\frac{N_s}{(X_{k_s}+\gamma)(N_s-(k_s-1))}\delta_s\right)\\
\end{aligned}
$$

Using the composition property, we can find the modulus for the combined strata.

$$
\begin{aligned}
\omega(\chi_{KK}, \delta)=&2\sum_{k_1=1}^{n_1}\log\left(1+\frac{N_1}{(X_{k_1}+\gamma)(N_1-(k_1-1))}*\frac{v}{2u-v}*\frac{N_1+N_2}{N_1}*\delta\right)\\
&+2\sum_{k_2=1}^{n_2}\log\left(1+\frac{N_2}{(X_{k_2}+\gamma)(N_2-(k_2-1))}*\frac{N_1+N_2}{N_2}*\delta\right)\\
=&2\sum_{k_1=1}^{n_1}\log\left(1+\frac{v(N_1+N_2)}{(2u-v)(X_{k_1}+\gamma)(N_1-(k_1-1))}\delta\right)\\
&+2\sum_{k_2=1}^{n_2}\log\left(1+\frac{N_1+N_2}{(X_{k_2}+\gamma)(N_2-(k_2-1))}\delta\right)
\end{aligned}
$$


```python
import numpy as np


def mod_KK(N1, N2, n1, n2, g, x1, x2, upper_bound, margin):
    """
    Modulus of continuity for the kaplan_kolmogorov risk function.
    Assumes the same risk function is used for both strata.

    Parameters
    ----------
    N1 : int
        upper bound of ballots cast in stratum 1
    N2 : int
        upper bound of ballots cast in stratum 2
    n1 : int
        ballots sampled from stratum 1 (unused; the sum runs over len(x1))
    n2 : int
        ballots sampled from stratum 2 (unused; the sum runs over len(x2))
    g : float in [0, 1)
        padding for assorter values of 0
    x1 : array-like
        assorter values of the ballots sampled in stratum 1
    x2 : array-like
        assorter values of the ballots sampled in stratum 2
    upper_bound : float
        assorter upper bound
    margin : float
        reported assorter margin

    Returns
    -------
    mod : callable
        kaplan_kolmogorov modulus of continuity as a function of delta
    """
    # Convert up front so that list inputs support elementwise arithmetic
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    N = N1 + N2
    # Per-draw slopes; np.arange(len(x)) plays the role of k - 1
    a1 = margin * N / ((2*upper_bound - margin) * (x1 + g) * (N1 - np.arange(len(x1))))
    a2 = N / ((x2 + g) * (N2 - np.arange(len(x2))))

    def mod(delta):
        return 2*np.sum(np.log(1 + a1*delta)) + 2*np.sum(np.log(1 + a2*delta))

    return mod
```