# Assertion RLA

**From 18 May 2022, consistent sampling seems to be working.**

**From 1 April 2022, integrating alpha_mart, p_history, and other features.**

**From 24 October 2021, dev version implementing consistent sampling to target smaller contests.**

## Overview of the assertion audit tool

The tool requires as input:

+ audit-specific and contest-specific parameters, such as
    - whether to sample with or without replacement
    - the name of the risk function to use, and any parameters it requires
    - a risk limit for each contest to be audited
    - the social choice function for each contest, including the number of winners
    - candidate identifiers for each contest
    - an upper bound on the number of ballot cards that contain each contest
    - an upper bound on the total number of cards across all contests
    - whether to use card style information to target sampling
+ a ballot manifest (see below)
+ a random seed
+ a file of cast vote records
+ reported results for each contest
+ json files of assertions for IRV contests (one file per IRV contest)
+ human reading of voter intent from the paper cards selected for audit

`use_style` controls whether the sample is drawn from all cards (`use_style == False`) or card style information is used
to target the cards that purport to contain each contest (`use_style == True`).
In the current implementation, card style information is inferred from cast-vote records, with additional 'phantom' CVRs if there could be more cards that contain a contest than is accounted for in the CVRs.
Errors in the card style information are treated conservatively using the "phantoms-to-evil-zombies" (~2EZ) approach ([Banuelos & Stark, 2012](https://arxiv.org/abs/1207.3413)) so that the risk limit remains valid, even if the CVRs misrepresent
which cards contain which contests.

The two ways of sampling are treated differently.
If the sample is drawn only from cards that, according to their CVRs, contain a particular contest, and a sampled card turns out not to
contain that contest, that is considered a discrepancy and is dealt with using the ~2EZ approach.
It is assumed that every CVR corresponds to a card in the manifest, but there might
be cards cast in the contest for which there is no corresponding CVR. In that case,
phantom records are created to ensure that the audit is still truly risk-limiting.

Given an upper bound, derived independently of the voting system, on the number of cards that contain the contest, and provided the number of CVRs that contain the contest does not exceed that bound, we can sample only from cards purported to contain the contest and use the ~2EZ approach to deal with missing CVRs. This can greatly increase the efficiency of the audit when
some contests appear on only a small percentage of the cast cards ([Glazer, Spertus, and Stark (2021)](https://dl.acm.org/doi/10.1145/3457907)).

Any sampled phantom card (i.e., a card for which there is no CVR) is treated as if its CVR were a non-vote (which it is) and as if its MVR were least favorable (an "evil zombie" producing the greatest doubt in every assertion, separately). Any sampled card for which there is a CVR is compared to its corresponding CVR.
If the card turns out not to contain the contest (even though the CVR says it does), the MVR is treated in the least favorable way for each assertion (i.e., as a zombie rather than as a non-vote).
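The ~2EZ treatment can be made concrete for a single pairwise assertion. The following is a minimal sketch, assuming a hypothetical plurality-style assorter `pairwise_assort` with upper bound u = 1 (the IRV assorters built from RAIRE assertions are different), and assuming that when `use_style` is False a card lacking the contest simply counts as a non-vote. It computes the overstatement error omega = A(CVR) - A(MVR) and the comparison assorter B = (1 - omega/u)/(2 - v/u), where v = 2*mean(A) - 1 is the reported assorter margin; the mean of B exceeds 1/2 exactly when the mean overstatement is below v/2, i.e., when the true assorter mean exceeds 1/2.

```python
from typing import Optional


def pairwise_assort(votes: Optional[dict], winner: str, loser: str) -> float:
    """Hypothetical plurality-style assorter with upper bound u = 1.

    Scores 1 for a vote for `winner` alone, 0 for a vote for `loser` alone,
    and 1/2 otherwise (including a non-vote: `votes` is None or empty).
    """
    if not votes:
        return 0.5
    for_winner = 1 if votes.get(winner) else 0
    for_loser = 1 if votes.get(loser) else 0
    return (for_winner - for_loser + 1) / 2


def overstatement(cvr_votes, mvr_votes, winner, loser, *,
                  cvr_is_phantom=False, card_has_contest=True, use_style=True):
    """~2EZ overstatement error omega = A(CVR) - A(MVR) for one sampled card."""
    # A phantom CVR is treated as a non-vote.
    a_cvr = 0.5 if cvr_is_phantom else pairwise_assort(cvr_votes, winner, loser)
    if cvr_is_phantom:
        a_mvr = 0.0  # phantom card: "evil zombie", the least favorable MVR value
    elif not card_has_contest:
        # The CVR says the card has the contest but it does not: a zombie when the
        # sample was targeted with style information (assumed non-vote otherwise).
        a_mvr = 0.0 if use_style else 0.5
    else:
        a_mvr = pairwise_assort(mvr_votes, winner, loser)
    return a_cvr - a_mvr


def comparison_assort(omega: float, v: float, u: float = 1.0) -> float:
    """Comparison assorter B = (1 - omega/u) / (2 - v/u); v is the reported margin."""
    return (1 - omega / u) / (2 - v / u)


print(overstatement(None, None, "15", "16", cvr_is_phantom=True))         # 0.5
print(overstatement({"15": 1}, None, "15", "16", card_has_contest=False))  # 1.0
```

A missing card thus costs half the worst case (0.5), while a card whose CVR wrongly claims the contest costs the full worst case (1.0) when sampling is targeted by style.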

The tool helps select cards for audit, and reports when the audit has found sufficiently strong evidence to stop.

The tool exports a log of all the audit inputs except the CVR file, but including the auditors' manually determined voter intent from the audited cards.

The pre-10/2021 version used a single sample to audit all contests. 

### Internal workflow

+ Read overall audit information and contest information, and random seed
+ Read assertions for IRV contests and construct assertions for all other contests
+ Read ballot manifest
+ Read cvrs. Every CVR should have a corresponding manifest entry.
+ Prepare ~2EZ:
    - `N_phantoms = max_cards - cards_in_manifest`
    - If `N_phantoms < 0`, complain
    - Else create `N_phantoms` phantom cards
    - For each contest `c`:
        + `N_c` is the input upper bound on the number of cards that contain `c`
        + if `N_c is None`, `N_c = max_cards - non_c_cvrs`, where `non_c_cvrs` is the number of CVRs that don't contain `c`
        + `C_c` is the number of CVRs that contain the contest
        + if `C_c > N_c`, complain
        + else if `N_c - C_c > N_phantoms`, complain
        + else:
            - Consider contest `c` to be on the first `N_c - C_c` phantom CVRs
            - Consider contest `c` to be on the first `N_c - C_c` phantom ballots
+ Calculate assorter margins for all assorters:
    - If `not use_style`, apply the assorter to all cards and CVRs, including phantoms
    - Else apply the assorter only to cards/cvrs reported to contain the contest, including phantoms that contain the contest
+ Estimate starting sample size for the specified sampling design (w/ or w/o replacement, stratified, etc.), for chosen risk function:
    - User-specified criterion, controlled by parameters. Examples:
        + expected sample size for completion, on the assumption that there are no errors
        + 90th percentile of sample size for completion, on the assumption that errors are not more frequent than specified
    - If `not use_style`, base estimate on sampling from the entire manifest, i.e., smallest assorter margin
    - Else use consistent sampling:
        + Augment each CVR (including phantoms) with a probability of selection, `p`, initially 0
        + For each contest `c`:
            - Find sample size `n_c` that meets the criterion 
            - For each non-phantom CVR that contains the contest, set `p = max(p, n_c/N_c)` 
        + Estimated sample size is the sum of `p` over all non-phantom CVRs
+ Draw the random sample:
    - Use specified design, with consistent sampling for style information
    - Express sample cards in terms of the manifest
    - Export
+ Read manual interpretations of the cards (MVRs)
+ Calculate attained risk for each assorter
    - Use ~2EZ to deal with phantom CVRs or cards; the treatment depends on whether `use_style == True`
+ Report
+ Estimate incremental sample size if any assorter nulls have not been rejected
+ Draw incremental sample; etc.
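The "Prepare ~2EZ" steps above can be sketched in plain Python. This is a minimal sketch that follows the workflow as written, assuming hypothetical inputs (a list of per-CVR contest sets and a dict of per-contest bounds); it is not the code inside `prep_manifest` or `CVR.make_phantoms`, and it raises `ValueError` wherever the workflow says "complain".

```python
def prepare_2ez(max_cards, cards_in_manifest, cvr_contests, contest_bounds):
    """Sketch of the 'Prepare ~2EZ' step of the workflow.

    cvr_contests:   list with one set per (real) CVR, naming the contests it contains
    contest_bounds: dict contest -> input upper bound N_c on cards containing it, or None
    Returns (n_phantoms, phantom_contests), where phantom_contests[i] is the set
    of contests assigned to phantom card/CVR i.
    """
    n_phantoms = max_cards - cards_in_manifest
    if n_phantoms < 0:
        raise ValueError("the manifest lists more cards than max_cards")
    phantom_contests = [set() for _ in range(n_phantoms)]
    for c, n_c in contest_bounds.items():
        c_c = sum(1 for contests in cvr_contests if c in contests)  # C_c
        if n_c is None:
            non_c_cvrs = len(cvr_contests) - c_c
            n_c = max_cards - non_c_cvrs
        if c_c > n_c:
            raise ValueError(f"contest {c}: {c_c} CVRs exceed the bound {n_c}")
        if n_c - c_c > n_phantoms:
            raise ValueError(f"contest {c}: needs {n_c - c_c} phantoms, only {n_phantoms} exist")
        # contest c is considered to be on the first N_c - C_c phantoms
        for i in range(n_c - c_c):
            phantom_contests[i].add(c)
    return n_phantoms, phantom_contests


print(prepare_2ez(10, 8, [{"a"}, {"a", "b"}, {"b"}], {"a": 3, "b": 4}))
```

Here two cards are missing from the manifest, contest `a` needs one phantom and contest `b` needs two, so the first phantom carries both contests and the second carries only `b`.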

In [26]:
import math
import json
import warnings
import numpy as np
import pandas as pd
import csv
import copy

from collections import OrderedDict
from IPython.display import display, HTML

from cryptorandom.cryptorandom import SHA256, int_from_hash
from cryptorandom.sample import sample_by_index

from assertion_audit_utils import \
    Assertion, Assorter, CVR, TestNonnegMean, check_audit_parameters, find_margins,\
    find_p_values, initial_sample_size, find_sample_size, new_sample_size, prep_comparison_sample, \
    prep_polling_sample, summarize_status, write_audit_parameters
from dominion_tools import \
    prep_manifest, sample_from_cvrs, sample_from_manifest, write_cards_sampled


# Audit parameters.

* `seed`: the seed for the pseudo-random number generator used to draw the sample. The seed need not be numeric.
* `replacement`: whether to sample with replacement. If the sample is drawn with replacement, gamma must also be specified.
* `risk_function`: the name of the function used to measure risk. Options are `kaplan_markov`, `kaplan_wald`, `kaplan_kolmogorov`, `wald_sprt`, `kaplan_mart`, and `alpha_mart`.
Not all risk functions work with every social choice function.
* `g`: a parameter to hedge against the possibility of observing a maximum overstatement. Require $g \in [0, 1)$ for `kaplan_kolmogorov`, `kaplan_markov`, and `kaplan_wald`.
* **TO DO** pass an estimator with `alpha_mart`. Perhaps generalize `g` so it can be callable, not just real
* `max_cards`: an upper bound on the total number of cards (pieces of paper) cast, across all contests. This should be derived independently of the voting system. A ballot consists of one or more cards.

----

* `cvr_file`: filename for CVRs (input)
* `manifest_file`: filename for ballot manifest (input)
* `use_style`: Boolean. If True, use card style information (inferred from CVRs) to target samples. If False, sample from all cards, regardless of the contest.
* `assertion_file`: filename of assertions for IRV contests, in RAIRE format (input)
* `sample_file`: filename for sampled card identifiers (output)
* `mvr_file`: filename for manually ascertained votes from sampled cards (input)
* `log_file`: filename for audit log (output)

----

* `error_rate`: expected rate of 1-vote overstatements. Recommended value $\ge$ 0.001 if there are hand-marked ballots. Larger values increase the initial sample size, but make it more likely that the audit will conclude in a single round if the audit finds errors.

* `contests`: a dict of contest-specific data 
    + the keys are unique contest identifiers for contests under audit
    + the values are dicts with keys:
        - `risk_limit`: the risk limit for the audit of this contest
        - `cards`: an upper bound on the number of cast cards that contain the contest
        - `choice_function`: `plurality`, `supermajority`, or `IRV`
        - `n_winners`: number of winners for plurality and supermajority contests (multi-winner IRV is not supported)
        - `share_to_win`: for supermajority contests, the fraction of valid votes required to win, e.g., 2/3 (`share_to_win*n_winners` must be less than 100%)
        - `candidates`: list of names or identifiers of candidates
        - `reported_winners` : list of identifier(s) of candidate(s) reported to have won. Length should equal `n_winners`.
        - `assertion_file`: filename for a set of json descriptors of Assertions (see technical documentation) that collectively imply the reported outcome of the contest is correct. Required for IRV; ignored for other social choice functions
        - other keys and values are added by the software, including `cvrs`, the number of CVRs that contain the contest, and `p`, the sampling fraction expected to be required to confirm the contest
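Before running the audit, it can help to check each `contests` entry against the constraints listed above. The function below is a hypothetical helper written for illustration (it is not the library's `check_audit_parameters`); it checks only the rules stated in this list, plus the assumption that a contest's `cards` cannot exceed `max_cards`.

```python
def validate_contest(name, spec, max_cards):
    """Hypothetical check of one `contests` entry; returns a list of problems."""
    problems = []
    choice = spec.get("choice_function")
    if choice not in ("plurality", "supermajority", "IRV"):
        problems.append(f"{name}: unknown choice_function {choice!r}")
    n_winners = spec.get("n_winners", 1)
    winners = spec.get("reported_winners", [])
    if len(winners) != n_winners:
        problems.append(f"{name}: {len(winners)} reported winners but n_winners={n_winners}")
    unknown = set(winners) - set(spec.get("candidates", []))
    if unknown:
        problems.append(f"{name}: reported winners not among candidates: {sorted(unknown)}")
    if choice == "supermajority":
        share = spec.get("share_to_win")
        if share is None or share * n_winners >= 1:
            problems.append(f"{name}: share_to_win*n_winners must be less than 100%")
    if choice == "IRV":
        if n_winners != 1:
            problems.append(f"{name}: multi-winner IRV is not supported")
        if "assertion_file" not in spec:
            problems.append(f"{name}: IRV contests require an assertion_file")
    cards = spec.get("cards")
    if cards is not None and cards > max_cards:
        problems.append(f"{name}: cards={cards} exceeds max_cards={max_cards}")
    return problems


measure_1 = {"risk_limit": 0.05, "cards": 65432, "choice_function": "supermajority",
             "share_to_win": 2/3, "n_winners": 1, "candidates": ["yes", "no"],
             "reported_winners": ["yes"]}
print(validate_contest("measure_1", measure_1, max_cards=293555))  # []
```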

In [2]:
seed = 12345678901234567890  # use, e.g., 20 rolls of a 10-sided die. Seed doesn't have to be numeric
replacement = False

# implied value of eta for margin m is eta=(m+1)/2
risk_function = "alpha_mart"
risk_fn = lambda x, m, N: TestNonnegMean.alpha_mart(x, eta=(m+1)/2, N=N)

# Other options for the risk function:

# risk_function = "kaplan_mart"
# risk_fn = lambda x, m, N: TestNonnegMean.kaplan_mart(x, N)

# risk_function = "kaplan_kolmogorov"
# risk_fn = lambda x, m, N: TestNonnegMean.kaplan_kolmogorov(x, N, g=g)

g=0.1
max_cards = 293555 # 146662 VBM turnout per SF Elections release 12
        # https://sfelections.sfgov.org/november-5-2019-election-results-summary

In [3]:
cvr_file = './Data/SFDA2019_PrelimReport12VBMJustDASheets.raire'
manifest_file = './Data/N19 ballot manifest with WH location for RLA Upload VBM 11-14.xlsx'
use_style = True  # Boolean, not the string 'True': use card style inferred from the CVRs to target the sample
sample_file = './Data/sample.csv'
# mvr_file = './Data/mvr_prepilot_test.json'
# mvr_file = './Data/mvrTest-PR12-DA-VBM-AllBallots-4TargetedErrors.json'
mvr_file = './Data/mvr.json'
log_file = './Data/log.json'

In [4]:
error_rate = 0.002   # to estimate sample sizes for comparison audits, 2 1-vote overstatements per 1000 ballots

In [5]:
# contests to audit. Edit with details of your contest (e.g., Contest 339 is the DA race)
contests = {'339':{'risk_limit':0.05,
                     'cards': 146662,
                     'choice_function':'IRV',
                     'n_winners':1,
                     'candidates':['15','16','17','18'],
                     'reported_winners' : ['15'],
                     'assertion_file' : './Data/SF2019Nov8Assertions.json'
                    }
           }

Example of other social choice functions:

        contests =  {'city_council':{'risk_limit':0.05,
                             'cards': None,
                             'choice_function':'plurality',
                             'n_winners':3,
                             'candidates':['Doug','Emily','Frank','Gail','Harry'],
                             'reported_winners' : ['Doug', 'Emily', 'Frank']
                            },
                        'measure_1':{'risk_limit':0.05,
                             'cards': 65432,
                             'choice_function':'supermajority',
                             'share_to_win':2/3,
                             'n_winners':1,
                             'candidates':['yes','no'],
                             'reported_winners' : ['yes']
                            }                  
                      }
              
If `cards` is `None`, the tool uses `max_cards` as the upper bound for the contest.

In [6]:
# read the assertions for the IRV contests
for c in contests:
    if contests[c]['choice_function'] == 'IRV':
        with open(contests[c]['assertion_file'], 'r') as f:
            contests[c]['assertion_json'] = json.load(f)['audits'][0]['assertions']

In [7]:
# construct the dict of dicts of assertions for each contest
Assertion.make_all_assertions(contests)

True

In [8]:
for c in contests:
    print(f'{c}\n{contests[c]["assertions"]}')

339
{'18 v 17 elim 15 16 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee9af0>, '17 v 16 elim 15 18 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee96d0>, '15 v 18 elim 16 17 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee9880>, '18 v 16 elim 15 17 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee97c0>, '17 v 16 elim 15 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee9f70>, '15 v 17 elim 16 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee9100>, '15 v 17 elim 16 18 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee98b0>, '18 v 16 elim 15 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee9610>, '15 v 16 elim 17 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee9490>, '15 v 16 elim 17 18 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee9310>, '15 v 16 elim 18 45': <assertion_audit_utils.Assertion object at 0x7fd9e0ee93a0>, '15 v 16 elim 45': <assertion_audit_utils.Assertion object at 0x7fd99064b7f

## Read the ballot manifest

In [9]:
# special for Primary/Dominion manifest format
manifest = pd.read_excel(manifest_file)

## Read the CVRs 

In [10]:
# for ballot-level comparison audits

cvr_input = []
with open(cvr_file) as f:
    cvr_reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in cvr_reader:
        cvr_input.append(row)

print("Read {} rows".format(len(cvr_input)))

Read 146664 rows


In [11]:
# Import/convert CVRs
cvr_list = CVR.from_raire(cvr_input)
print("After merging, there are CVRs for {} cards".format(len(cvr_list)))

After merging, there are CVRs for 146662 cards


In [12]:
# turn RAIRE-style card identifiers into Dominion's style by substituting "-" for "_"
for c in cvr_list:
    c.set_id(str(c.id).replace("_","-"))

In [13]:
for i in range(10):
    print(str(cvr_list[i]))

id: 99813-1-1 votes: {'339': {'17': 1}} phantom: False
id: 99813-1-3 votes: {'339': {'16': 1}} phantom: False
id: 99813-1-6 votes: {'339': {'18': 1, '17': 2, '15': 3, '16': 4}} phantom: False
id: 99813-1-8 votes: {'339': {'18': 1}} phantom: False
id: 99813-1-9 votes: {'339': {'': 1}} phantom: False
id: 99813-1-11 votes: {'339': {'16': 1, '17': 2, '15': 3, '18': 4}} phantom: False
id: 99813-1-13 votes: {'339': {'15': 1, '16': 2, '17': 3, '18': 4}} phantom: False
id: 99813-1-16 votes: {'339': {'15': 1}} phantom: False
id: 99813-1-17 votes: {'339': {'15': 1}} phantom: False
id: 99813-1-19 votes: {'339': {'16': 1}} phantom: False


In [14]:
contests

{'339': {'risk_limit': 0.05,
  'cards': 146662,
  'choice_function': 'IRV',
  'n_winners': 1,
  'candidates': ['15', '16', '17', '18'],
  'reported_winners': ['15'],
  'assertion_file': './Data/SF2019Nov8Assertions.json',
  'assertion_json': [{'winner': '18',
    'loser': '17',
    'already_eliminated': ['15', '16', '45'],
    'assertion_type': 'IRV_ELIMINATION',
    'explanation': 'Rules out outcomes with tail [... 18 17]'},
   {'winner': '17',
    'loser': '16',
    'already_eliminated': ['15', '18', '45'],
    'assertion_type': 'IRV_ELIMINATION',
    'explanation': 'Rules out outcomes with tail [... 17 16]'},
   {'winner': '15',
    'loser': '18',
    'already_eliminated': ['16', '17', '45'],
    'assertion_type': 'IRV_ELIMINATION',
    'explanation': 'Rules out outcomes with tail [... 15 18]'},
   {'winner': '18',
    'loser': '16',
    'already_eliminated': ['15', '17', '45'],
    'assertion_type': 'IRV_ELIMINATION',
    'explanation': 'Rules out outcomes with tail [... 18 16]'},


In [15]:
# check whether the manifest accounts for every card
max_cards, np.sum(manifest['Total Ballots'])

(293555, 293555)

In [16]:
# Check that there is a card in the manifest for every card (possibly) cast. If not, add phantoms.
manifest, manifest_cards, phantom_cards = prep_manifest(manifest, max_cards, len(cvr_list))
manifest

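| Unnamed: 0 | Tray # | Tabulator Number | Batch Number | Total Ballots | VBMCart.Cart number | cum_cards |
|---|---|---|---|---|---|---|
| 0 | 1 | 99808 | 78 | 116 | 3 | 116 |
| 1 | 1 | 99808 | 77 | 115 | 3 | 231 |
| 2 | 1 | 99808 | 79 | 120 | 3 | 351 |
| 3 | 1 | 99808 | 81 | 76 | 3 | 427 |
| 4 | 1 | 99808 | 80 | 116 | 3 | 543 |
| ... | ... | ... | ... | ... | ... | ... |
| 5476 | 3506 | 99815 | 86 | 2 | 19 | 292557 |
| 5477 | 3506 | 99815 | 84 | 222 | 19 | 292779 |
| 5478 | 3506 | 99815 | 83 | 346 | 19 | 293125 |
| 5479 | 3506 | 99815 | 82 | 332 | 19 | 293457 |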


## Create CVRs for phantom cards

In [17]:
# For Comparison Audits Only
#----------------------------

# If the sample draws a phantom card, these CVRs will be used in the comparison.
# phantom MVRs should be treated as zeros by the Assorter for every contest

# phantoms are generated according to use_style (see "Prepare ~2EZ" in the internal workflow)

cvr_list, phantom_vrs = CVR.make_phantoms(max_cards, cvr_list, contests, use_style=use_style, prefix='phantom-1-')
print(f"Created {phantom_vrs} phantom records")

Created 0 phantom records


In [18]:
contests

{'339': {'risk_limit': 0.05,
  'cards': 146662,
  'choice_function': 'IRV',
  'n_winners': 1,
  'candidates': ['15', '16', '17', '18'],
  'reported_winners': ['15'],
  'assertion_file': './Data/SF2019Nov8Assertions.json',
  'assertion_json': [{'winner': '18',
    'loser': '17',
    'already_eliminated': ['15', '16', '45'],
    'assertion_type': 'IRV_ELIMINATION',
    'explanation': 'Rules out outcomes with tail [... 18 17]'},
   {'winner': '17',
    'loser': '16',
    'already_eliminated': ['15', '18', '45'],
    'assertion_type': 'IRV_ELIMINATION',
    'explanation': 'Rules out outcomes with tail [... 17 16]'},
   {'winner': '15',
    'loser': '18',
    'already_eliminated': ['16', '17', '45'],
    'assertion_type': 'IRV_ELIMINATION',
    'explanation': 'Rules out outcomes with tail [... 15 18]'},
   {'winner': '18',
    'loser': '16',
    'already_eliminated': ['15', '17', '45'],
    'assertion_type': 'IRV_ELIMINATION',
    'explanation': 'Rules out outcomes with tail [... 18 16]'},


In [19]:
# find the mean of the assorters for the CVRs and check whether the assertions are met
min_margin = find_margins(contests, cvr_list, use_style=use_style)

print("minimum assorter margin {}".format(min_margin))
for c in contests:
    print("margins in contest {}".format(c))
    for a, m in contests[c]['margins'].items():
        print(a, m)

minimum assorter margin 0.019902906001554532
margins in contest 339
18 v 17 elim 15 16 45 0.045792366120740224
17 v 16 elim 15 18 45 0.019902906001554532
15 v 18 elim 16 17 45 0.028923647570604505
18 v 16 elim 15 17 45 0.0830003681935334
17 v 16 elim 15 45 0.058079120699294995
15 v 17 elim 16 45 0.08064120222007065
15 v 17 elim 16 18 45 0.10951712099930444
18 v 16 elim 15 45 0.14875018750596603
15 v 16 elim 17 45 0.13548158350492967
15 v 16 elim 17 18 45 0.1365247985163165
15 v 16 elim 18 45 0.16666893946625572
15 v 16 elim 45 0.15626406294745743
15 v 45 0.2956457705472446
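The margins printed above are assorter margins: for an assorter A whose mean over the CVRs is mean(A), the margin is v = 2*mean(A) - 1, so the `eta=(m+1)/2` used in `risk_fn` is just the assorter mean implied by the reported margin. A minimal sketch with a hypothetical plurality-style assorter (IRV assorters derived from RAIRE assertions are more involved) shows the arithmetic, including the eta implied by the smallest margin in contest 339.

```python
def pairwise_assort(votes, winner, loser):
    """Hypothetical plurality-style assorter: 1 for winner, 0 for loser, 1/2 otherwise."""
    if not votes:
        return 0.5
    return ((1 if votes.get(winner) else 0) - (1 if votes.get(loser) else 0) + 1) / 2


def assorter_margin(cvr_votes, winner, loser):
    """Reported assorter margin v = 2 * mean(A) - 1 over a list of CVR vote dicts."""
    mean = sum(pairwise_assort(v, winner, loser) for v in cvr_votes) / len(cvr_votes)
    return 2 * mean - 1


def implied_eta(margin):
    """Assorter mean implied by a margin: eta = (margin + 1) / 2."""
    return (margin + 1) / 2


cvrs = [{"A": 1}, {"A": 1}, {"B": 1}, {}]    # assorter values 1, 1, 0, 1/2
print(assorter_margin(cvrs, "A", "B"))       # 0.25
print(implied_eta(0.019902906001554532))     # about 0.50995
```

An assertion with a margin near 0.02 has an implied assorter mean of only about 0.51, barely above 1/2, so it needs the most evidence; this is why the smallest margin drives the sample-size estimate when `use_style` is False.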


In [20]:
check_audit_parameters(risk_function, g, error_rate, contests)

In [21]:
print(f'{log_file=}\n{seed=}\n{replacement=}\n{risk_function=}\n{g=}\n{max_cards=}'
      f'\n{len(cvr_list)=}\n{manifest_cards=}\n{phantom_cards=}\n{error_rate=}\n{contests=}')

log_file='./Data/log.json'
seed=12345678901234567890
replacement=False
risk_function='alpha_mart'
g=0.1
max_cards=293555
len(cvr_list)=146662
manifest_cards=293555
phantom_cards=0
error_rate=0.002
contests={'339': {'risk_limit': 0.05, 'cards': 146662, 'choice_function': 'IRV', 'n_winners': 1, 'candidates': ['15', '16', '17', '18'], 'reported_winners': ['15'], 'assertion_file': './Data/SF2019Nov8Assertions.json', 'assertion_json': [{'winner': '18', 'loser': '17', 'already_eliminated': ['15', '16', '45'], 'assertion_type': 'IRV_ELIMINATION', 'explanation': 'Rules out outcomes with tail [... 18 17]'}, {'winner': '17', 'loser': '16', 'already_eliminated': ['15', '18', '45'], 'assertion_type': 'IRV_ELIMINATION', 'explanation': 'Rules out outcomes with tail [... 17 16]'}, {'winner': '15', 'loser': '18', 'already_eliminated': ['16', '17', '45'], 'assertion_type': 'IRV_ELIMINATION', 'explanation': 'Rules out outcomes with tail [... 15 18]'}, {'winner': '18', 'loser': '16', 'already_eliminated'

In [22]:
write_audit_parameters(log_file, seed, replacement, risk_function, g, max_cards, len(cvr_list), \
                      manifest_cards, phantom_cards, error_rate, contests)

## Set up for sampling

## Find initial sample size

In [29]:
# find initial sample size
rf = lambda x, m, N: risk_fn(x,m, N)[0]   # p_history is the second returned value
ss_fn = lambda m, r, N: TestNonnegMean.initial_sample_size(\
                        risk_function=rf, N=N, margin=m, polling=False, \
                        error_rate=error_rate, alpha=r, reps=10)  # polling=False gives a comparison-audit estimate; set polling=True for ballot polling

# debugging
initial_sample_size(risk_function=rf, N=len(cvr_list), margin=0.1, \
                            polling=False, error_rate=0.001, alpha=0.05, \
                            t=1/2, u=1, reps=None,\
                            bias_up=True, quantile=0.5, seed=1234567890)

# 
sample_size, sample_sizes = find_sample_size(contests, sample_size_function=ss_fn, use_style=use_style, \
                               cvr_list=cvr_list)  
print(f'{sample_size=}\n{sample_sizes=}')

# override for testing
sample_size = 20000
print(sample_size)


NameError: name 'initial_sample_size' is not defined
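With `use_style` True, the workflow combines per-contest sample sizes by consistent sampling: each non-phantom CVR's selection probability is the largest n_c/N_c over the contests it contains, and the estimated sample size is the sum of those probabilities. A minimal sketch of that bookkeeping, assuming CVRs are represented as hypothetical `(contests, is_phantom)` pairs; it illustrates the rule in the workflow and is not the code inside `find_sample_size`.

```python
import math


def consistent_sampling_probs(cvrs, sample_sizes, contest_cards):
    """Selection probability for each CVR and the estimated total sample size.

    cvrs:          list of (contests, is_phantom) pairs
    sample_sizes:  dict contest -> n_c, the sample size that meets the criterion
    contest_cards: dict contest -> N_c, the upper bound on cards containing c
    """
    p = [0.0] * len(cvrs)                 # every CVR starts with p = 0
    for c, n_c in sample_sizes.items():
        frac = n_c / contest_cards[c]
        for i, (contests, is_phantom) in enumerate(cvrs):
            if not is_phantom and c in contests:
                p[i] = max(p[i], frac)    # p = max(p, n_c / N_c)
    expected = sum(pi for pi, (_, is_phantom) in zip(p, cvrs) if not is_phantom)
    return p, expected


cvrs = [({"a"}, False), ({"a", "b"}, False), ({"b"}, False), ({"b"}, True)]
p, expected = consistent_sampling_probs(cvrs, {"a": 1, "b": 3}, {"a": 2, "b": 4})
print(p, expected, math.ceil(expected))   # [0.5, 0.75, 0.75, 0.0] 2.0 2
```

The card carrying both contests is counted once, at the larger of the two rates, which is where consistent sampling saves work relative to drawing a separate sample for each contest.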

## Draw the first sample

In [24]:
# draw the initial sample
sample_size = 146778
prng = SHA256(seed)
sample = sample_by_index(max_cards, sample_size, prng=prng) # 1-indexed
n_phantom_sample = np.sum(sample > manifest_cards)
print(f'The sample includes {n_phantom_sample} phantom cards.')

The sample includes 0 phantom cards.
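The cell above draws a uniform sample of card indices from the whole manifest with `sample_by_index`. When `use_style` is True, the workflow instead calls for consistent sampling. A minimal sketch of one way to do that, assuming each card's priority is the SHA-256 hash of the seed and the card identifier (a hashing scheme chosen here for illustration, not necessarily the tool's): every card gets one pseudo-random priority, each contest takes the n_c lowest-priority cards that contain it, and the sample is the union.

```python
import hashlib


def card_priority(seed, card_id):
    """Pseudo-random priority for a card: SHA-256 of 'seed,card_id' as an integer."""
    return int(hashlib.sha256(f"{seed},{card_id}".encode()).hexdigest(), 16)


def consistent_sample(seed, cards, sample_sizes):
    """Sketch of consistent sampling.

    cards:        dict card_id -> set of contests the card contains (per its CVR)
    sample_sizes: dict contest -> n_c
    Each contest takes its n_c lowest-priority cards that contain it; the sample
    is the union, returned in priority order.
    """
    order = sorted(cards, key=lambda cid: card_priority(seed, cid))
    selected = set()
    for c, n_c in sample_sizes.items():
        containing = [cid for cid in order if c in cards[cid]]
        selected.update(containing[:n_c])
    return [cid for cid in order if cid in selected]


cards = {f"card-{i}": ({"a"} if i % 2 else {"a", "b"}) for i in range(20)}
first = consistent_sample(12345, cards, {"a": 3, "b": 2})
larger = consistent_sample(12345, cards, {"a": 5, "b": 4})
print(set(first) <= set(larger))   # True: enlarging n_c only adds cards
```

Because the priorities do not depend on the sample sizes, increasing any n_c in a later round only adds cards to the sample, which keeps escalation simple.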


In [25]:
len(cvr_list), manifest_cards, max_cards

(146662, 293555, 293555)

In [26]:
# for comparison audit
# cards_to_retrieve, sample_order, cvr_sample, mvr_phantoms_sample = sample_from_cvrs(cvr_list, manifest, sample)

# for polling audit
cards_to_retrieve, sample_order, mvr_phantoms_sample = sample_from_manifest(manifest, sample)

In [None]:
# write the sample
write_cards_sampled(sample_file, cards_to_retrieve, print_phantoms=False)

## Read the audited sample data

In [None]:
# for real data
with open(mvr_file) as f:
    mvr_json = json.load(f)

mvr_sample = CVR.from_dict(mvr_json['ballots'])

In [None]:
# Simulate ballot-polling data for testing/debugging, using the San Francisco IRV contest as an example
mvr_sample = []
for c in cvr_list:
    if c.id in sample_order:  # sample_order is keyed by card id
        mvr_sample.append(c)

for s in set(sample_order.keys()) - set([c.id for c in mvr_sample]):
    inx = np.random.randint(len(cvr_list))
    mvr_sample.append(CVR(id = s, votes = cvr_list[inx].votes, phantom=False))

print(f'simulated sample contains {len(mvr_sample)} mvrs')

## Find measured risks for all assertions

In [None]:
# prep_comparison_sample(mvr_sample, cvr_sample, sample_order)  # for comparison audit

prep_polling_sample(mvr_sample, sample_order)  # for polling audit
p_max = find_p_values(contests, mvr_sample, None, use_style, \
                      risk_function=risk_fn)
print("maximum assertion p-value {}".format(p_max))
done = summarize_status(contests)

In [32]:
# Log the status of the audit 
write_audit_parameters(log_file, seed, replacement, risk_function, g, max_cards, len(cvr_list), \
                       manifest_cards, phantom_cards, error_rate, contests)

# How many more cards should be audited?

Estimate how many more cards will need to be audited to confirm any remaining contests. The enlarged sample size is based on:

* cards already sampled
* the assumption that we will continue to see errors at the same rate observed in the sample

In [33]:
# Estimate sample size required to confirm the outcome, if errors continue
# at the same rate as already observed.

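# NOTE: cvr_sample is created only by the comparison-audit branch (sample_from_cvrs),
# and manifest_type is not defined in this notebook, hence the NameError below.
new_size, sams = new_sample_size(contests, mvr_sample,\
                                 cvr_sample, manifest_type,\
                                 risk_fn, quantile=0.8, reps=100)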
new_size

NameError: name 'cvr_sample' is not defined

In [None]:
# augment the sample
# reset the seed
prng = SHA256(seed)
old_sample = sample
sample = sample_by_index(max_cards, new_size, prng=prng)
incremental_sample = np.sort(list(set(sample) - set(old_sample)))
n_phantom_sample = np.sum(incremental_sample > manifest_cards)  # indices are 1-based; those beyond the manifest are phantoms
print("The incremental sample includes {} phantom cards.".format(n_phantom_sample))

In [None]:
cards_to_retrieve_new, sample_order_new, cvr_sample_new, mvr_phantoms_sample_new = \
                sample_from_cvrs(cvr_list, manifest, incremental_sample)
write_cards_sampled(sample_file, cards_to_retrieve_new, print_phantoms=False)

In [None]:
# mvr_json should contain the complete set of mvrs, including those in previous rounds

with open(mvr_file) as f:
    mvr_json = json.load(f)

mvr_sample = CVR.from_dict(mvr_json['ballots']) 

In [None]:
# compile entire sample
cards_to_retrieve, sample_order, cvr_sample, mvr_phantoms_sample = sample_from_cvrs(cvr_list, manifest, sample)

In [None]:
# add MVRs for phantoms
mvr_sample = mvr_sample + mvr_phantoms_sample

## Find measured risks for all assertions

In [None]:
prep_comparison_sample(mvr_sample, cvr_sample, sample_order)
p_max = find_p_values(contests, mvr_sample, cvr_sample, use_style, \
                      risk_function=risk_fn)
print("maximum assertion p-value {}".format(p_max))
done = summarize_status(contests)

In [None]:
# Log the status of the audit 
write_audit_parameters(log_file, seed, replacement, risk_function, g, max_cards, len(cvr_list), \
                       manifest_cards, phantom_cards, error_rate, contests)