# Batch Testing Tutorial

This tutorial has two goals:

1. Get familiar with our code.
2. Reproduce the results.



In [1]:
import numpy as np
import pandas as pd
from numba import jit
import time
import numba
import fast_btk as fbtk
import collections
from sklearn.metrics import precision_score, recall_score
%load_ext autoreload
%autoreload 2

# Data generation

The function `data_gen` generates a population of a given size and infection rate. Each row of the returned array holds a subject ID and an infection status (1 for infected, 0 otherwise).

In [6]:
np.random.seed(0)
fbtk.data_gen(size = 10, p = 0.1)

array([[0, 0],
       [1, 0],
       [2, 0],
       [3, 0],
       [4, 0],
       [5, 0],
       [6, 0],
       [7, 0],
       [8, 1],
       [9, 0]])
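
As a rough illustration of what such a generator might do, the sketch below draws an independent infection status for each subject with probability `p`. The name `sketch_data_gen` and its `seed` parameter are hypothetical and not part of `fast_btk`; the real `data_gen` returns a NumPy array and may be implemented differently.

```python
import random

def sketch_data_gen(size, p, seed=None):
    """Hypothetical stand-in for fbtk.data_gen: one [id, status] row per subject."""
    rng = random.Random(seed)
    # Assumption: each subject is infected independently with probability p.
    return [[subject_id, 1 if rng.random() < p else 0] for subject_id in range(size)]

population = sketch_data_gen(size=10, p=0.1, seed=0)
for row in population:
    print(row)
```

The first column is the subject ID and the second is the true infection status, which matches the layout of the array printed above.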

# Conventional Test

`conventional_test` returns the test results for a subject array and the number of tests consumed, given the probability of a type II error, the probability of a type I error, the number of repetitions, and whether sequential testing is used.
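
To make the roles of the two error probabilities concrete, here is a minimal sketch of a single round of individual testing. It assumes a type II error turns a true positive into a reported negative and a type I error turns a true negative into a reported positive. The function name `sketch_conventional_test` is hypothetical, and the sketch leaves out the repetition and sequential options of the real `conventional_test`.

```python
import random

def sketch_conventional_test(subjects, typeII_error, typeI_error, seed=None):
    """Test every subject once; return (reported results, number of tests used)."""
    rng = random.Random(seed)
    results = []
    for subject_id, infected in subjects:
        if infected:
            # A false negative occurs with probability typeII_error.
            observed = 0 if rng.random() < typeII_error else 1
        else:
            # A false positive occurs with probability typeI_error.
            observed = 1 if rng.random() < typeI_error else 0
        results.append([subject_id, observed])
    # One test kit per subject in this single-round sketch.
    return results, len(subjects)

subjects = [[0, 0], [1, 1], [2, 0], [3, 0]]
reported, used = sketch_conventional_test(subjects, typeII_error=0.15, typeI_error=0.01, seed=0)
print(reported, used)
```

With one test per subject the consumption equals the population size, which is consistent with the `Test_consum` column reported for the conventional test later in this tutorial.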


In [3]:
subject_array = fbtk.data_gen(10, 0.1)
test_result, consum = fbtk.conventional_test(subject_array, typeII_error = 0.15,
typeI_error=0.01, repeat= 1)
print(f'accuracy: {np.mean(subject_array[:,1] == test_result[:,1])}')
print(f'test consumption {consum}')

Compilation is falling back to object mode WITH looplifting enabled because Function "conventional_test" failed type inference due to: No implementation of function Function(<built-in function zeros>) found for signature:
 
 >>> zeros(UniTuple(int64 x 2), dtype=Function(<class 'int'>))
 
There are 2 candidate implementations:
      - Of which 2 did not match due to:
      Overload of function 'zeros': File: numba/core/typing/npydecl.py: Line 507.
        With argument(s): '(UniTuple(int64 x 2), dtype=Function(<class 'int'>))':
       No match.

During: resolving callee type: Function(<built-in function zeros>)
During: typing of call at /home/hajiang/Desktop/batch_testing/fast_btk.py (40)

File "fast_btk.py", line 40:
def conventional_test(subject_array, typeII_error, typeI_error, repeat = 1,
    <source elided>
        
        test_result = np.zeros(subject_array.shape, dtype = int)
        ^

# Multi-step Batch Testing

`seq_test` returns the test results for a subject array, the total test-kit consumption, and the number of individual tests. Its inputs are the subject array, the stopping rule, the batch size, the probability of a type II error, the probability of a type I error, the number of repetitions, the probability threshold, and whether sequential testing is used.

The following code generates a population of size 100000 with an infection rate of 0.01. The multi-step batch test is configured with `stop_rule = 3` and `repeat = 3`, that is, up to 3 sequential individual tests after 3 batch positives.
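
The batch size passed to `seq_test` comes from `one_batch_test_int_solver`, whose internals are not shown here. As a hedged illustration of why an optimal batch size exists, the sketch below minimises the expected number of tests per person for a classic one-round pooling scheme: one pooled test per batch, followed by one individual test for every member of a flagged batch. The function names and the objective are assumptions for illustration and may differ from what the `fast_btk` solver actually optimises.

```python
def expected_tests_per_person(batch_size, p, typeII_error, typeI_error):
    """Expected tests per person: one pooled test plus follow-ups for flagged batches."""
    all_negative = (1.0 - p) ** batch_size
    # A batch is flagged if it contains an infection and the test catches it,
    # or if it contains none and the test raises a false alarm.
    flagged = (1.0 - typeII_error) * (1.0 - all_negative) + typeI_error * all_negative
    return 1.0 / batch_size + flagged

def sketch_batch_size(p, typeII_error, typeI_error, batch_limit=64):
    """Integer batch size in [2, batch_limit] with the lowest expected cost."""
    return min(
        range(2, batch_limit + 1),
        key=lambda n: expected_tests_per_person(n, p, typeII_error, typeI_error),
    )

best = sketch_batch_size(p=0.01, typeII_error=0.15, typeI_error=0.01)
print(best, expected_tests_per_person(best, 0.01, 0.15, 0.01))
```

Small batches waste pooled tests, while large batches are flagged almost every time, so the expected cost has a minimum in between. The `batch_limit` argument here only caps the search range, mirroring the cap of the same name used in later cells.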

In [4]:
subject_array = fbtk.data_gen(100000, 0.01)
batch_size = fbtk.one_batch_test_int_solver(0.01, 0.15, 0.01)
test_result, consum, ind_consum = fbtk.seq_test(subject_array, batch_size = batch_size,stop_rule = 3,p = 0.01, typeII_error = 0.15, typeI_error=0.01, repeat= 3, seq = True)
print(f'accuracy: {np.mean(subject_array[:,1] == test_result[:,1])}')
print(f'test consumption {consum}')

accuracy: 0.99891
test consumption 28203.0


# Reproduce Results

The following code reproduces the results in Table 7 and Table 8. We walk through Table 7 (a) and show its output.
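
The tables report accuracy, sensitivity, specificity, PPV, and NPV for each run. The sketch below shows how these five quantities follow from the confusion-matrix counts. It is a plain-Python illustration with hypothetical function names, not the implementation of `fbtk.specificity_score` or `fbtk.npv_score`.

```python
def _ratio(numerator, denominator):
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float('nan')

def sketch_metrics(y_true, y_pred):
    """Compute acc, sens, spec, PPV, and NPV from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        'acc': _ratio(tp + tn, len(pairs)),
        'sens': _ratio(tp, tp + fn),   # recall on the infected
        'spec': _ratio(tn, tn + fp),   # recall on the non-infected
        'PPV': _ratio(tp, tp + fp),    # precision of positive calls
        'NPV': _ratio(tn, tn + fn),    # precision of negative calls
    }

metrics = sketch_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(metrics)
```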

In [6]:
# table 7 (a)
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    length = len(temp_data)
    acc = np.zeros(length)
    sens = np.zeros(length)
    spec = np.zeros(length)
    ppv = np.zeros(length)
    npv = np.zeros(length)
    test_consum = np.zeros(length)
    for i in range(length):
        pred, consum = fbtk.conventional_test(temp_data[i], typeII_error= 0.15, typeI_error=0.01)
        acc[i] = np.mean(pred[:,1] == temp_data[i][:, 1])
        sens[i] = recall_score(temp_data[i][:, 1], pred[:, 1])
        spec[i] = fbtk.specificity_score(temp_data[i][:, 1], pred[:, 1])
        ppv[i] = precision_score(temp_data[i][:, 1], pred[:, 1])
        npv[i] = fbtk.npv_score(temp_data[i][:, 1], pred[:, 1])
        test_consum[i] = consum
    result = {
        'acc': acc,
        'sens': sens,
        'spec': spec,
        'PPV': ppv,
        'NPV': npv,
        'test_consum': test_consum
    
    }
    result = pd.DataFrame(result)
    result_mean = result.mean()
    result_std = result.std()
    temp_df = [prob, result_mean['acc'], result_std['acc'], result_mean['sens'], result_std['sens'],
    result_mean['spec'], result_std['spec'], result_mean['PPV'], result_std['PPV'], result_mean['NPV'],
    result_std['NPV'], result_mean['test_consum'], result_std['test_consum']]
    temp_df = pd.DataFrame(temp_df)
    temp_df = temp_df.T
    temp_df.columns = df.columns
    df = pd.concat([df, temp_df])


  
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 153.8344419002533 s
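
The cell above assembles each mean and SD pair by hand. As one possible way to cut the repetition, the sketch below builds the same kind of summary row generically. It uses the standard library's sample standard deviation, which matches the default of pandas `std()` (`ddof=1`). The helper name `summarise_runs` and the `_SD` key suffix are hypothetical choices for illustration.

```python
from statistics import mean, stdev

def summarise_runs(runs, metric_names):
    """Collapse per-run metric dicts into one row of mean and sample-SD values."""
    row = {}
    for name in metric_names:
        values = [run[name] for run in runs]
        row[name] = mean(values)
        row[name + '_SD'] = stdev(values)  # sample SD, needs at least two runs
    return row

runs = [
    {'acc': 0.9, 'test_consum': 100.0},
    {'acc': 1.0, 'test_consum': 100.0},
]
summary = summarise_runs(runs, ['acc', 'test_consum'])
print(summary)
```

Because the row is a dictionary, the column order follows the order of `metric_names`, so the interleaved mean and SD layout of the tables falls out without listing every column twice.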


In [8]:
# Show the result
df

| | Infection_rate | Acc | Acc_SD | Sens | Sens_SD | Spec | Spec_SD | PPV | PPV_SD | NPV | NPV_SD | Test_consum | Test_consum_SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001 | 0.989861 | 0.000327 | 0.852501 | 0.035635 | 0.990001 | 0.000326 | 0.079851 | 0.008803 | 0.999849 | 3.9e-05 | 100000.0 | 0.0 |
| 0 | 0.01 | 0.988598 | 0.000316 | 0.849064 | 0.01049 | 0.990009 | 0.0003 | 0.462084 | 0.010728 | 0.998461 | 0.000114 | 100000.0 | 0.0 |
| 0 | 0.03 | 0.985887 | 0.000393 | 0.849974 | 0.006703 | 0.990073 | 0.000319 | 0.725062 | 0.007459 | 0.995355 | 0.000219 | 100000.0 | 0.0 |
| 0 | 0.05 | 0.983061 | 0.000396 | 0.85093 | 0.005089 | 0.990012 | 0.000326 | 0.817558 | 0.005445 | 0.992142 | 0.0003 | 100000.0 | 0.0 |
| 0 | 0.1 | 0.975959 | 0.000442 | 0.849704 | 0.003048 | 0.98999 | 0.000335 | 0.904148 | 0.003147 | 0.983409 | 0.000367 | 100000.0 | 0.0 |
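
The sharp drop in PPV at low infection rates follows directly from Bayes' rule. The sketch below, using hypothetical helper names, computes the PPV and NPV one would expect from a test with sensitivity 0.85 (1 minus the type II error of 0.15) and specificity 0.99 (1 minus the type I error of 0.01) at a given prevalence.

```python
def expected_ppv(p, sensitivity, specificity):
    """P(infected | test positive) at prevalence p."""
    true_pos = sensitivity * p
    false_pos = (1.0 - specificity) * (1.0 - p)
    return true_pos / (true_pos + false_pos)

def expected_npv(p, sensitivity, specificity):
    """P(not infected | test negative) at prevalence p."""
    true_neg = specificity * (1.0 - p)
    false_neg = (1.0 - sensitivity) * p
    return true_neg / (true_neg + false_neg)

for prevalence in [0.001, 0.01, 0.03, 0.05, 0.10]:
    ppv = expected_ppv(prevalence, 0.85, 0.99)
    npv = expected_npv(prevalence, 0.85, 0.99)
    print(f'{prevalence}: PPV={ppv:.4f} NPV={npv:.4f}')
```

At an infection rate of 0.001 the formula gives a PPV of about 0.078, and at 0.1 about 0.904, both close to the simulated means in the table. At low prevalence, false positives from the large uninfected majority outnumber true positives even with a 1% type I error.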


For table 7 (b)

In [9]:
# table 7 (b)
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [1]:
            for k in [1]:
                
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': 10,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 117.52977275848389 s


In [10]:
df.to_csv('table7_b.csv')

For table 7 (c)

In [4]:
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(10000, prob) for _ in range(10)]
    for i in [True]:
        for j in [1]:
            for k in [3]:
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.15, 0.01, batch_limit= 32)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i, 'batch_limit': 32}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 1.3733797073364258 s


In [12]:
# table 7 (d)
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sq_Repeat', 'Ind_Repeat', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [3]: # sq_repeat
        for j in [3]: # ind_repeat
            kwargs = {
                'side_length': 12,
                'typeII_error': 0.15,
                'typeI_error': 0.01,
                'sq_repeat': i,
                'ind_repeat': j
            }
            test_1 = fbtk.test_result(temp_data, fbtk.matrix_test, **kwargs)
            temp_mean = test_1.mean()
            temp_std = test_1.std()
            temp = [prob, kwargs['sq_repeat'], kwargs['ind_repeat'], temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
            temp_df = pd.DataFrame(temp)
            temp_df = temp_df.T
            temp_df.columns = ['Infection_rate', 'Sq_Repeat', 'Ind_Repeat', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
            df = pd.concat([df, temp_df])

            
                
               
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 152.28528094291687 s


In [5]:
df

| | Infection_rate | Sequential_test | Stop_rule | Repeat | Prob_threshold | Acc | Acc_SD | Sens | Sens_SD | Spec | ... | PPV | PPV_SD | NPV | NPV_SD | Test_consum | Test_consum_SD | Ind_consum | Ind_consum_SD | Batch_consum | Batch_consum_SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001 | True | 1 | 3 | 0.3 | 0.9989 | 0.000356 | 0.836949 | 0.096612 | 0.999089 | ... | 0.515986 | 0.112481 | 0.99981 | 0.000129 | 1349.6 | 223.522408 | 1036.6 | 223.522408 | 313.0 | 0.0 |
| 0 | 0.01 | True | 1 | 3 | 0.3 | 0.99537 | 0.00064 | 0.84868 | 0.024739 | 0.996867 | ... | 0.738856 | 0.025972 | 0.998452 | 0.000171 | 3956.8 | 269.785017 | 3122.8 | 269.785017 | 834.0 | 0.0 |
| 0 | 0.03 | True | 1 | 3 | 0.3 | 0.99097 | 0.000617 | 0.846138 | 0.015518 | 0.995473 | ... | 0.853121 | 0.011845 | 0.995217 | 0.000509 | 6100.0 | 305.3377 | 4671.0 | 305.3377 | 1429.0 | 0.0 |
| 0 | 0.05 | True | 1 | 3 | 0.3 | 0.98741 | 0.00105 | 0.850537 | 0.016774 | 0.994438 | ... | 0.887168 | 0.013225 | 0.992341 | 0.000852 | 7695.5 | 305.011566 | 6028.5 | 305.011566 | 1667.0 | 0.0 |
| 0 | 0.1 | True | 1 | 3 | 0.3 | 0.97863 | 0.000851 | 0.849407 | 0.007765 | 0.993137 | ... | 0.932939 | 0.006359 | 0.983261 | 0.000999 | 9993.7 | 245.049224 | 7493.7 | 245.049224 | 2500.0 | 0.0 |


For table 7 (e)

In [13]:
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [1]: # repeat
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.15, 0.01)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 161.998272895813 s


In [210]:
df.to_csv('table7_e.csv')

For table 7 (f)

In [22]:
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [3]: # repeat
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.15, 0.01, batch_limit= 32)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i, 'batch_limit': 32}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 170.3076992034912 s


In [33]:
df

| | Infection_rate | Sequential_test | Stop_rule | Repeat | Prob_threshold | Acc | Acc_SD | Sens | Sens_SD | Spec | ... | PPV | PPV_SD | NPV | NPV_SD | Test_consum | Test_consum_SD | Ind_consum | Ind_consum_SD | Batch_consum | Batch_consum_SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001 | True | 3 | 3 | 0.3 | 0.99984 | 3.9e-05 | 0.968935 | 0.016514 | 0.999871 | ... | 0.885238 | 0.028985 | 0.999969 | 1.6e-05 | 7053.31 | 259.415571 | 1349.47 | 157.9937 | 5703.84 | 110.743467 |
| 0 | 0.01 | True | 3 | 3 | 0.3 | 0.998984 | 0.000106 | 0.970045 | 0.005699 | 0.999274 | ... | 0.930681 | 0.008458 | 0.999699 | 5.8e-05 | 27809.02 | 404.188749 | 8372.65 | 260.833721 | 19436.37 | 161.018642 |
| 0 | 0.03 | True | 3 | 3 | 0.3 | 0.996851 | 0.000184 | 0.972962 | 0.00331 | 0.99759 | ... | 0.925891 | 0.004399 | 0.999163 | 0.0001 | 58428.4 | 609.246098 | 26619.39 | 501.235723 | 31809.01 | 154.224649 |
| 0 | 0.05 | True | 3 | 3 | 0.3 | 0.995771 | 0.000207 | 0.972293 | 0.002484 | 0.997004 | ... | 0.94457 | 0.002841 | 0.998543 | 0.000133 | 75908.21 | 599.958051 | 34377.77 | 440.692702 | 41530.44 | 188.8413 |
| 0 | 0.1 | True | 3 | 3 | 0.3 | 0.992637 | 0.000283 | 0.972528 | 0.001737 | 0.994869 | ... | 0.954653 | 0.001836 | 0.996943 | 0.000194 | 116341.38 | 718.261281 | 57730.23 | 605.723957 | 58611.15 | 164.584778 |


In [23]:
df.to_csv('table7_f_limit_32.csv')

In [24]:
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [3]: # repeat
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.15, 0.01, batch_limit= 64)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i, 'batch_limit': 64}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
running time: 167.65039229393005 s


In [25]:
df.to_csv('table7_f_limit_64.csv')

In [7]:
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [2]: # stop_rule
            for k in [2]: # repeat
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.15, 0.01, batch_limit= 32)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i, 'batch_limit': 32}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 144.58256888389587 s


In [8]:
df.to_csv('stop_rule_2_limit_32.csv')

In [15]:
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [2]: # stop_rule
            for k in [2]: # repeat
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.15, 0.01, batch_limit= 64)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i, 'batch_limit': 64} # change batch_limit
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')
df.to_csv('stop_rule_2_limit_64.csv')

running time: 144.80537056922913 s


In [15]:
# Appendix (a)
# same setup as table 7 (a), with typeII_error = 0.25 and typeI_error = 0.03
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    length = len(temp_data)
    acc = np.zeros(length)
    sens = np.zeros(length)
    spec = np.zeros(length)
    ppv = np.zeros(length)
    npv = np.zeros(length)
    test_consum = np.zeros(length)
    for i in range(length):
        pred, consum = fbtk.conventional_test(temp_data[i], typeII_error= 0.25, typeI_error=0.03)
        acc[i] = np.mean(pred[:,1] == temp_data[i][:, 1])
        sens[i] = recall_score(temp_data[i][:, 1], pred[:, 1])
        spec[i] = fbtk.specificity_score(temp_data[i][:, 1], pred[:, 1])
        ppv[i] = precision_score(temp_data[i][:, 1], pred[:, 1])
        npv[i] = fbtk.npv_score(temp_data[i][:, 1], pred[:, 1])
        test_consum[i] = consum
    result = {
        'acc': acc,
        'sens': sens,
        'spec': spec,
        'PPV': ppv,
        'NPV': npv,
        'test_consum': test_consum
    
    }
    result = pd.DataFrame(result)
    result_mean = result.mean()
    result_std = result.std()
    temp_df = [prob, result_mean['acc'], result_std['acc'], result_mean['sens'], result_std['sens'],
    result_mean['spec'], result_std['spec'], result_mean['PPV'], result_std['PPV'], result_mean['NPV'],
    result_std['NPV'], result_mean['test_consum'], result_std['test_consum']]
    temp_df = pd.DataFrame(temp_df)
    temp_df = temp_df.T
    temp_df.columns = df.columns
    df = pd.concat([df, temp_df])


  
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 152.71437311172485 s


In [238]:
df.to_csv('appendix_a.csv')

In [16]:
# Appendix (b)
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [1]:
            for k in [1]:
                
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': 10,
                'typeII_error': 0.25, 'typeI_error': 0.03, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 117.7569146156311 s


In [242]:
df.to_csv('appendix_b.csv')

In [17]:
# Appendix (c)
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [1]:
            for k in [3]:
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.25, 0.03)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.25, 'typeI_error': 0.03, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 131.44559955596924 s


In [247]:
df.to_csv('appendix_c.csv')

In [18]:
# Appendix (d)
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sq_Repeat', 'Ind_Repeat', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [3]: # sq_repeat
        for j in [3]: # ind_repeat
            kwargs = {
                'side_length': 12,
                'typeII_error': 0.25,
                'typeI_error': 0.03,
                'sq_repeat': i,
                'ind_repeat': j
            }
            test_1 = fbtk.test_result(temp_data, fbtk.matrix_test, **kwargs)
            temp_mean = test_1.mean()
            temp_std = test_1.std()
            temp = [prob, kwargs['sq_repeat'], kwargs['ind_repeat'], temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
            temp_df = pd.DataFrame(temp)
            temp_df = temp_df.T
            temp_df.columns = ['Infection_rate', 'Sq_Repeat', 'Ind_Repeat', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
            df = pd.concat([df, temp_df])

            
                
               
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 154.8913700580597 s


In [250]:
df.to_csv('appendix_d.csv')

In [19]:
# Appendix (e)
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [1]: # repeat
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.25, 0.03)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.25, 'typeI_error': 0.03, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 165.98010206222534 s


In [254]:
# Appendix f
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [3]: # repeat
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.25, 0.03)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.25, 'typeI_error': 0.03, 'repeat': k,
                'prob_threshold': 0.3, 'seq': i}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['seq'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Sequential_test', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

running time: 177.40325736999512 s


In [4]:
string = ''
string.count('+')

0

In [5]:
def helper_fun(n, string = ''):
    """Append '+' until the string holds n '+' (or already holds n '-')."""
    if string.count('+') >= n or string.count('-') >= n:
        return string
    return helper_fun(n, string + '+')

In [2]:
def name_fun(n):
    """
    input: stopping rule
    output: finish nodes, negative finish nodes, positive finish nodes
    """
    output = []
    temp = ['']
    for i in range(2*n-1):
        temp_cur = []
        for j in temp:
            candidate_pos = j + '+'
            candidate_neg = j + '-'
            # A path finishes once it holds n '+' or n '-'.
            if candidate_pos.count('+') >= n:
                output.append(candidate_pos)
            else:
                temp_cur.append(candidate_pos)

            if candidate_neg.count('-') >= n:
                output.append(candidate_neg)
            else:
                temp_cur.append(candidate_neg)

        temp = temp_cur

    # Split the finish nodes once, after the enumeration is complete.
    neg_symbol = [x for x in output if x.count('-') == n]
    pos_symbol = [x for x in output if x.count('+') == n]

    return output, neg_symbol, pos_symbol





In [3]:
a, b, c = name_fun(3)
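
As a sanity check on the enumeration, the number of finish nodes can be compared with a binomial count: a path that stops on its n-th `+` has n-1 `+` and fewer than n `-` before it, which gives C(2n-1, n) positive nodes, and by symmetry the same number of negative ones. The sketch below uses a compact re-implementation (`finish_nodes` is a hypothetical helper written for this check, not the notebook's `name_fun`).

```python
from math import comb

def finish_nodes(n):
    """Enumerate +/- strings that stop once either symbol has appeared n times."""
    done, active = [], ['']
    while active:
        nxt = []
        for s in active:
            for c in '+-':
                t = s + c
                # Only the symbol just appended can have reached n occurrences.
                (done if t.count(c) >= n else nxt).append(t)
        active = nxt
    return done

nodes = finish_nodes(3)
pos = [s for s in nodes if s.count('+') == 3]
neg = [s for s in nodes if s.count('-') == 3]
print(len(nodes), len(pos), len(neg), comb(5, 3))
```

For a stopping rule of 3 this gives 20 finish nodes, which matches the 20 distinct node labels in the `collections.Counter(node)` output further down.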

In [28]:
(1-0.15) ** 2

0.7224999999999999

In [21]:
from scipy.special import binom

In [27]:
sum([(1-0.15) ** 3 * (1-0.15 ** 3) * binom(2+i, i) * 0.15 ** i for i in range(0, 3)])

0.9701029400781247

In [19]:
(1-0.15) ** 2 * (1 - 0.15 ** 3)

0.7200615624999999

In [13]:
def sensitivity(n, beta):
    _, _, pos_node = name_fun(n)
    res = [(1-beta) * (1-beta) ** i.count('+') * beta ** (i.count('-')) for i in pos_node]
    return sum(res)
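
The sum over positive finish nodes can also be written in closed form, because every positive node with i negatives has the same weight and there are C(n-1+i, i) such nodes. The sketch below checks that both routes agree. `positive_finish_nodes`, `sensitivity_by_enumeration` and `sensitivity_closed_form` are hypothetical names, and the leading `(1 - beta)` factor is copied from the cell above without further interpretation.

```python
from itertools import combinations
from math import comb

def positive_finish_nodes(n):
    """All +/- paths that end on the n-th '+' with fewer than n '-'."""
    nodes = []
    for n_neg in range(n):
        length = n - 1 + n_neg  # symbols before the final '+'
        for neg_idx in combinations(range(length), n_neg):
            body = ''.join('-' if k in neg_idx else '+' for k in range(length))
            nodes.append(body + '+')
    return nodes

def sensitivity_by_enumeration(n, beta):
    # Same per-node weight as in the sensitivity cell above.
    return sum((1 - beta) * (1 - beta) ** s.count('+') * beta ** s.count('-')
               for s in positive_finish_nodes(n))

def sensitivity_closed_form(n, beta):
    # (1 - beta)^(n + 1) times the number-weighted powers of beta.
    return (1 - beta) ** (n + 1) * sum(comb(n - 1 + i, i) * beta ** i for i in range(n))

enumerated = sensitivity_by_enumeration(3, 0.15)
closed = sensitivity_closed_form(3, 0.15)
print(enumerated, closed)
```

The closed form avoids building the node list, which grows quickly with the stopping rule.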



In [12]:
n = 3
output = []
temp = ['']
for i in range(2*n-1):
    temp_cur = []
    for j in temp:
        candidate_pos = j + '+'
        candidate_neg = j + '-'
        if str.count(candidate_pos, '+') >= n:
            output.append(candidate_pos)
        else:
            temp_cur.append(candidate_pos)

        if str.count(candidate_neg, '-') >= n:
            output.append(candidate_neg)
        else:
            temp_cur.append(candidate_neg)

    temp = temp_cur

In [178]:
temp = fbtk.data_gen(100000, 0.1)

In [191]:
a, b, c, d, e = fbtk.seq_test_with_node(temp,stop_rule = 3,p = 0.1, batch_size = 32, typeII_error = 0.15, typeI_error = 0.01, repeat = 1, 
prob_threshold = 0.3, seq = True, batch_limit = 32)

In [165]:
10 > np.inf

False

In [190]:
a

array([[    0,     0],
       [    1,     0],
       [    2,     0],
       ...,
       [99997,     0],
       [99998,     0],
       [99999,     0]])

In [30]:
def node_summary(data, seq_test, stopping_rule, **kwargs):
    """Count finish nodes and batch consumption per stage.

    data may be a single subject array or a list of subject arrays.
    """
    # Finish-node names for this stopping rule, plus a catch-all column.
    a, _, _ = fbtk.name_fun(stopping_rule)
    a.append('other')
    df = pd.DataFrame([], columns = a)
    b = ['stage_' + str(i) for i in range(1, 2*stopping_rule)]
    df_b = pd.DataFrame([], columns = b)
    # Treat a single subject array as a one-element list.
    data_list = data if isinstance(data, list) else [data]
    for subject_array in data_list:
        _, _, _, node, batch_list = seq_test(subject_array, **kwargs)
        temp = np.zeros(len(a))
        node_count = collections.Counter(node)
        for i in node_count:
            if i in a:
                temp[a.index(i)] = node_count[i]
            else:
                temp[-1] += node_count[i]
        df.loc[len(df)] = temp
        # df already holds the new row here, so df_b is indexed from 1.
        df_b.loc[len(df)] = batch_list

    return df, df_b
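
The inner tally in `node_summary` (known node labels get their own column, anything else is added to `other`) can also be expressed with pandas indexing. This is only a sketch of the same idea on a toy label list; `tally_nodes` is a hypothetical helper, and `known` stands for the finish-node names without the `other` entry.

```python
import collections
import pandas as pd

def tally_nodes(node, known):
    """Count labels in `node`; labels outside `known` are pooled under 'other'."""
    counts = pd.Series(collections.Counter(node), dtype = float)
    # Known labels in a fixed order, with 0 for labels that never occurred.
    row = counts.reindex(known, fill_value = 0.0)
    # Everything that is not a known label goes into the catch-all entry.
    row['other'] = counts.drop(known, errors = 'ignore').sum()
    return row

row = tally_nodes(['+++', '+++', '---', '+-+', '+++'], ['+++', '---'])
print(row)
```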




In [4]:
s = 0.1
k = 10000000
time_start = time.time()
np.random.seed(0)
data = fbtk.data_gen(100000, s)
a,b, c, d, e, = fbtk.seq_test_with_node(data, 3, 0.1,32, 0.15, 0.01, 3, 0.3, batch_limit=k)
a1, b1, c1 = fbtk.seq_test(data, 3, 0.1, 32, 0.15, 0.01, 3, 0.3, seq = True, batch_limit = k)

In [6]:
e

[3125.0, 27218.0, 45978.0, 53976.0, 56385.0]
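
The list above is non-decreasing, and the later output file is called `batch_cum.csv`, so it is read here as the cumulative number of batch tests after each stage. Under that assumption, the per-stage consumption is the first difference:

```python
import numpy as np

# Values copied from the output of `e` above; treated as cumulative counts.
cum_batches = np.array([3125.0, 27218.0, 45978.0, 53976.0, 56385.0])

# prepend=0 keeps the first stage as-is and differences the rest.
per_stage = np.diff(cum_batches, prepend = 0.0)
print(per_stage)
```

The first entry, 3125, is the 100000 subjects split into batches of 32.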

In [5]:
np.mean(a[:,1] == data[:,1])
#print(b, b1, c, c1)

0.99167
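
Accuracy alone hides how the errors split between missed infections and false alarms. The sketch below computes the metrics reported in the appendix tables (accuracy, sensitivity, specificity, PPV, NPV) from two 0/1 arrays such as `data[:,1]` and `a[:,1]`. `binary_metrics` is a hypothetical helper and is not claimed to be how `fbtk.test_result` computes them.

```python
import numpy as np

def binary_metrics(truth, pred):
    """Accuracy, sensitivity, specificity, PPV and NPV for 0/1 label arrays."""
    truth = np.asarray(truth).astype(bool)
    pred = np.asarray(pred).astype(bool)
    tp = np.sum(truth & pred)    # infected, flagged
    tn = np.sum(~truth & ~pred)  # healthy, cleared
    fp = np.sum(~truth & pred)   # healthy, flagged
    fn = np.sum(truth & ~pred)   # infected, cleared
    return {
        'acc': (tp + tn) / truth.size,
        'sens': tp / (tp + fn),
        'spec': tn / (tn + fp),
        'PPV': tp / (tp + fp),
        'NPV': tn / (tn + fn),
    }

truth = np.array([1, 1, 0, 0, 0, 1])
pred = np.array([1, 0, 0, 0, 1, 1])
metrics = binary_metrics(truth, pred)
print(metrics)
```

A ratio whose denominator is zero (for example PPV when nothing is flagged) comes out as `nan` with a NumPy warning, so very small samples need a guard.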

In [20]:
time_start = time.time()
np.random.seed(0)
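node_name,_,_ = fbtk.name_fun(3)
# list.extend returns None, so build the column list before passing it to DataFrame.
# 'other' matches the catch-all column that node_summary adds.
result = pd.DataFrame([], columns = node_name + ['other', 'p', 'stop_rule', 'upper limit'])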
col_b = ['stage_' + str(i) for i in range(1, 2*3)]
col_b.extend(['p', 'stop_rule', 'upper limit'])

result_b = pd.DataFrame([], columns = col_b)
for s in [0.001, 0.01, 0.03, 0.05, 0.1]:
    temp_data = [fbtk.data_gen(100000, s) for _ in range(100)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [32, 64, 1000000]: # batch_upperlimit
                batch_size = fbtk.one_batch_test_int_solver(s, 0.15, 0.01, batch_limit= k)
                kwargs = {'stop_rule': j, 'p': s, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': 3,
                'prob_threshold': 0.3, 'seq': i, 'batch_limit': k}
                test, test_b = node_summary(temp_data, fbtk.seq_test_with_node,j,**kwargs)
                test['p'] = s
                test['stop_rule'] = 3
                test['upper limit'] = k
                test_b['p'] = s
                test_b['stop_rule'] = 3
                test_b['upper limit'] = k
                result = pd.concat([result, test])
                result_b = pd.concat([result_b, test_b])

time_end = time.time()
print(time_end - time_start)

  improvement from the last ten iterations.
  improvement from the last five Jacobian evaluations.
494.0853660106659


In [7]:
time_start = time.time()
np.random.seed(0)
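node_name,_,_ = fbtk.name_fun(3)
# list.extend returns None, so build the column list before passing it to DataFrame.
# 'other' matches the catch-all column that node_summary adds.
result = pd.DataFrame([], columns = node_name + ['other', 'p', 'stop_rule', 'upper limit'])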
col_b = ['stage_' + str(i) for i in range(1, 2*3)]
col_b.extend(['p', 'stop_rule', 'upper limit'])

result_b = pd.DataFrame([], columns = col_b)
for s in [0.001, 0.01, 0.03]:
    temp_data = [fbtk.data_gen(100000, s) for _ in range(1000)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [1000000]: # batch_upperlimit
                batch_size = fbtk.one_batch_test_int_solver(s, 0.15, 0.01, batch_limit= k)
                kwargs = {'stop_rule': j, 'p': s, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': 3,
                'prob_threshold': 1, 'seq': i, 'batch_limit': k}
                test, test_b = node_summary(temp_data, fbtk.seq_test_with_node,j,**kwargs)
                test['p'] = s
                test['stop_rule'] = 3
                test['upper limit'] = k
                test_b['p'] = s
                test_b['stop_rule'] = 3
                test_b['upper limit'] = k
                result = pd.concat([result, test])
                result_b = pd.concat([result_b, test_b])

time_end = time.time()
print(time_end - time_start)

  improvement from the last ten iterations.
612.2456967830658


In [9]:
result.to_csv('more_sim.csv')

In [17]:
result_b

Unnamed: 0,stage_1,stage_2,stage_3,stage_4,stage_5,p,stop_rule,upper limit
1,3125.0,6616.0,10040.0,10339.0,10417.0,0.001,3,32
2,3125.0,6586.0,9980.0,10257.0,10324.0,0.001,3,32
3,3125.0,6613.0,10035.0,10322.0,10388.0,0.001,3,32
4,3125.0,6574.0,9941.0,10170.0,10222.0,0.001,3,32
5,3125.0,6580.0,10007.0,10300.0,10368.0,0.001,3,32
...,...,...,...,...,...,...,...,...
6,25000.0,43650.0,52128.0,56847.0,58331.0,0.100,3,1000000
7,25000.0,43747.0,52300.0,57108.0,58674.0,0.100,3,1000000
8,25000.0,43822.0,52400.0,57318.0,58933.0,0.100,3,1000000
9,25000.0,43777.0,52308.0,57158.0,58789.0,0.100,3,1000000


In [21]:
result.to_csv('node_test.csv')
result_b.to_csv('batch_cum.csv')

In [6]:
time_start = time.time()
np.random.seed(0)
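node_name,_,_ = fbtk.name_fun(3)
# list.extend returns None, so build the column list before passing it to DataFrame.
# 'other' matches the catch-all column that node_summary adds.
result = pd.DataFrame([], columns = node_name + ['other', 'p', 'stop_rule', 'upper limit'])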
for s in [0.001, 0.01, 0.03, 0.05, 0.1]:
    temp_data = [fbtk.data_gen(100000, s) for _ in range(100)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [32, 64, 1000000]: # batch_upperlimit
                batch_size = fbtk.one_batch_test_int_solver(s, 0.15, 0.01, batch_limit= k)
                kwargs = {'stop_rule': j, 'p': s, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': 3,
                'prob_threshold': 1, 'seq': i, 'batch_limit': k}
                # node_summary returns (node counts, batch counts); keep only the node counts.
                test, _ = node_summary(temp_data, fbtk.seq_test_with_node,j,**kwargs)
                test['p'] = s
                test['stop_rule'] = 3
                test['upper limit'] = k
                result = pd.concat([result, test])

time_end = time.time()
print(time_end - time_start)

In [37]:
result.to_csv('node_test_without_threshold.csv')

In [224]:
temp = np.zeros(21)

for item in test:
    if item in a:
        idx = a.index(item)
        temp[idx] = test[item]
    else:
        temp[-1] += test[item]
df.loc[len(df)] = temp

In [225]:
df

Unnamed: 0,+++,---,++-+,+-++,+---,-+++,-+--,--+-,++--+,++---,...,+-+--,+--++,+--+-,-++-+,-++--,-+-++,-+-+-,--+++,--++-,other
0,59840.0,3392.0,8416.0,8992.0,416.0,8480.0,128.0,480.0,1152.0,288.0,...,192.0,1312.0,192.0,1376.0,160.0,1472.0,320.0,1856.0,288.0,0.0
1,59840.0,3392.0,8416.0,8992.0,416.0,8480.0,128.0,480.0,1152.0,288.0,...,192.0,1312.0,192.0,1376.0,160.0,1472.0,320.0,1856.0,288.0,0.0


In [223]:
df

Unnamed: 0,+++,---,++-+,+-++,+---,-+++,-+--,--+-,++--+,++---,...,+-+--,+--++,+--+-,-++-+,-++--,-+-++,-+-+-,--+++,--++-,other
0,59840.0,3392.0,8416.0,8992.0,416.0,8480.0,128.0,480.0,1152.0,288.0,...,192.0,1312.0,192.0,1376.0,160.0,1472.0,320.0,1856.0,288.0,0.0


In [154]:
pd.Series(collections.Counter(node))

++-+      8416
+++      59840
--+++     1856
+-++      8992
-+-+-      320
---       3392
++--+     1152
-+++      8480
+---       416
+--+-      192
-++-+     1376
--++-      288
-+-++     1472
+-+-+     1248
+--++     1312
-+--       128
+-+--      192
--+-       480
-++--      160
++---      288
dtype: int64

In [135]:
import collections

In [None]:
collections.Counter(node)

In [None]:
time_start = time.time()
np.random.seed(0)
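node_name,_,_ = fbtk.name_fun(3)
# list.extend returns None, so build the column list before passing it to DataFrame.
# 'other' matches the catch-all column that node_summary adds.
result = pd.DataFrame([], columns = node_name + ['other', 'p', 'stop_rule', 'upper limit'])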
for s in [0.001, 0.01, 0.03, 0.05, 0.1]:
    temp_data = [fbtk.data_gen(100000, s) for _ in range(100)]
    for i in [True]:
        for j in [3]: # stop_rule
            for k in [32, 64, 1000000]: # batch_upperlimit
                batch_size = fbtk.one_batch_test_int_solver(s, 0.15, 0.01, batch_limit= k)
                kwargs = {'stop_rule': j, 'p': s, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': 3,
                'prob_threshold': 1, 'seq': i, 'batch_limit': k}
                # node_summary returns (node counts, batch counts); keep only the node counts.
                test, _ = node_summary(temp_data, fbtk.seq_test_with_node,j,**kwargs)
                test['p'] = s
                test['stop_rule'] = 3
                test['upper limit'] = k
                result = pd.concat([result, test])

time_end = time.time()
print(time_end - time_start)

In [2]:
# Appendix f
time_start = time.time()
np.random.seed(0)
df = pd.DataFrame([], columns = ['Infection_rate', 'Batch_upper_limit', 'Stop_rule', 'Repeat', 'Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD'])
for prob in [0.001, 0.01, 0.03, 0.05, 0.10]:
    temp_data = [fbtk.data_gen(100000, prob) for _ in range(100)]
    for i in [32, 64, 100000]:
        for j in [3]: # stop_rule
            for k in [3]: # repeat
                batch_size = fbtk.one_batch_test_int_solver(prob, 0.15, 0.01, batch_limit = i)
                kwargs = {'stop_rule': j, 'p': prob, 'batch_size': batch_size,
                'typeII_error': 0.15, 'typeI_error': 0.01, 'repeat': k,
                'prob_threshold': 0.3, 'seq': True, 'batch_limit': i}
                test_1 = fbtk.test_result(temp_data, fbtk.seq_test, **kwargs)
                temp_mean = test_1.mean()
                temp_std = test_1.std()
                temp = [kwargs['p'], kwargs['batch_limit'], kwargs['stop_rule'], kwargs['repeat'], kwargs['prob_threshold'],temp_mean['acc'], temp_std['acc'], temp_mean['sens'], temp_std['sens'], temp_mean['spec'], temp_std['spec'], temp_mean['PPV'], temp_std['PPV'], temp_mean['NPV'], temp_std['NPV'], temp_mean['test_consum'], temp_std['test_consum'], temp_mean['ind_consum'], temp_std['ind_consum'], temp_mean['batch_consum'], temp_std['batch_consum']]
                temp_df = pd.DataFrame(temp)
                temp_df = temp_df.T
                temp_df.columns = ['Infection_rate', 'Batch_upper_limit', 'Stop_rule', 'Repeat','Prob_threshold', 'Acc', 'Acc_SD', 'Sens', 'Sens_SD', 'Spec','Spec_SD','PPV', 'PPV_SD',
    'NPV', 'NPV_SD', 'Test_consum', 'Test_consum_SD', 'Ind_consum', 'Ind_consum_SD', 'Batch_consum','Batch_consum_SD']
                df = pd.concat([df, temp_df])
            
time_end = time.time()
print('running time:', time_end - time_start, 's')

Compilation is falling back to object mode WITH looplifting enabled because Function "data_gen" failed type inference due to: No implementation of function Function(<built-in function zeros>) found for signature:

 >>> zeros(Tuple(int64, Literal[int](2)), dtype=Function(<class 'int'>))

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload of function 'zeros': File: numba/core/typing/npydecl.py: Line 507.
    With argument(s): '(UniTuple(int64 x 2), dtype=Function(<class 'int'>))':
   No match.

During: resolving callee type: Function(<built-in function zeros>)
During: typing of call at /home/hajiang/Desktop/batch_testing/fast_btk.py (396)

File "fast_btk.py", line 396:
def data_gen(size, p):
    <source elided>
    random_table = np.random.binomial(size = size, p = p, n = 1)
    test_array = np.zeros((size, 2), dtype = int)
    ^

  @jit(parallel = True)
Comp

In [4]:
df.to_csv('result_batch_limit.csv')

In [11]:
fbtk.one_batch_test_int_solver(0.05, 0.15, 0.01, batch_limit = 100000)

6

In [9]:
?fbtk.one_batch_test_int_solver

Signature:
fbtk.one_batch_test_int_solver(
    prevalence_rate,
    typeII_error,
    typeI_error,
    batch_limit,
    n_initial_guess=2,
)
Docstring:
A function gives (int) the best batch size for one batch test given the infection rate

Inputs:
    prevalence_rate(float): infection rate
    n_initial_guess(float): the initial guess 
    typeII_error(float): the prob of type II error
    typeI_error(float):  the prob of type I error
    n_initial_guess:
    batch_limit (int): the upper limit of batch size

Output:
    (int): the optimal batch size
File:      ~/Desktop/batch_testing/fast_btk.py
Type:      function


In [12]:
B = np.array([[2,-2,0],[1, 0, -1],[0, 1, -1]])

In [14]:
A = [[2, 4, 6, 8],[1, 2, 3, 4], [3, 5, 7, 9]]

In [18]:
np.matmul(np.matmul(B, A) , np.array([[1],[-1], [1], [0]]))

array([[ 4],
       [-1],
       [-3]])