# CI Width versus Sample Size for QMC and IID Beta Sampling with the Hedged (Betting) CI Method

[This JRSSB article by Ian Waudby-Smith and Aaditya Ramdas](https://academic.oup.com/jrsssb/article/86/1/1/7043257) takes $X_1, X_2, \ldots \stackrel{\text{IID}}{\sim} F$ and computes a sequential confidence interval for $\mu = \mathbb{E}(X)$.

For Quasi-Monte Carlo (QMC), which is based on low-discrepancy sequences, we take

$$
X_i = \frac{1}{n} \sum_{j=1}^n T_{ij},
$$ 

where for each $i$, $\{T_{ij}\}_{j=1}^n$ is an independently randomized QMC point set that mimics $F$. Each $X_i$ is therefore close to $\mu$, and $\{X_i\}_{i=1}^R$ is an IID sequence built from $N = nR$ samples in total.

In this notebook, $F$ is a Beta Distribution.
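
The construction above can be sketched as follows. This is a minimal illustration that uses SciPy's scrambled Sobol' generator (`scipy.stats.qmc.Sobol`) as a stand-in for the `qmcpy` generator used later in this notebook; the helper name `replicate_means` and the values of `n` and `R` are assumptions chosen for the example, while the Beta parameters (10, 30) match the parameter cell below.

```python
import numpy as np
from scipy.stats import beta, qmc

def replicate_means(n, R, a=10, b=30, seed=7):
    """Return R replicate means, each averaging n scrambled Sobol' points mapped to Beta(a, b)."""
    # One independent child seed per replication, so the R means are IID.
    child_seeds = np.random.SeedSequence(seed).spawn(R)
    X = np.empty(R)
    for i, s in enumerate(child_seeds):
        sobol = qmc.Sobol(d=1, scramble=True, seed=np.random.default_rng(s))
        u = sobol.random(n)[:, 0]      # n should be a power of 2
        T = beta(a, b).ppf(u)          # inverse-CDF map from U(0,1) to Beta(a, b)
        X[i] = T.mean()                # X_i = (1/n) * sum_j T_ij
    return X

X = replicate_means(n=64, R=32)
print("mean of replicate means:", X.mean())
print("std of replicate means: ", X.std(ddof=1))
```

Since the mean of a Beta(10, 30) distribution is $10/40 = 0.25$, every entry of `X` should sit close to 0.25, with a spread far smaller than that of a single Beta draw.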


Similarly, for $Y = f(X)$ with $X \sim U(0, 1)^d$ and $\mu = \mathbb{E}(Y) = \mathbb{E}(f(X))$, we take

$$
Y_i = \frac{1}{n} \sum_{j=1}^n f(x_{ij})
$$ 

where for each $i$, $\{x_{ij}\}_{j=1}^n$ is an independently randomized QMC point set. Therefore, $Y_i$ is close to $\mu$, and the sequence $\{Y_i\}_{i=1}^R$ is an IID sequence based on $N = nR$ samples.

In this notebook, we use three integrands:

$Y = f(X) = \frac{X e^X}{e}$

$Y = f(X_1, X_2) =
\begin{cases}
1, & \text{if } X_1 + X_2 \geq \frac{2}{3} \\
0, & \text{otherwise}
\end{cases}$

$Y = f(X) =
\begin{cases}
1, & \text{if } X < \frac{1}{3} \\
0, & \text{otherwise}
\end{cases}$
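
As a sanity check on these integrands, the exact means can be worked out by hand: $\int_0^1 x e^{x-1}\,dx = 1/e$, $P(X < 1/3) = 1/3$, and $P(X_1 + X_2 \geq 2/3) = 1 - \frac{1}{2}(2/3)^2 = 7/9$. The sketch below compares these values with plain IID Monte Carlo estimates (not QMC); the dictionary names and the sample size are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.random((200_000, 2))  # IID U(0,1)^2 points; 1-d integrands use column 0 only

integrands = {
    "smooth_1d": lambda x: x[..., 0] * np.exp(x[..., 0]) / np.e,
    "discontinuous_1d": lambda x: x[..., 0] < 1 / 3,
    "discontinuous_2d": lambda x: x[..., 0] + x[..., 1] >= 2 / 3,
}
# Closed-form means derived in the text above.
exact = {
    "smooth_1d": 1 / np.e,
    "discontinuous_1d": 1 / 3,
    "discontinuous_2d": 7 / 9,
}

estimates = {name: float(f(x).mean()) for name, f in integrands.items()}
for name in integrands:
    print(f"{name:18s} estimate = {estimates[name]:.4f}  exact = {exact[name]:.4f}")
```

All three integrands take values in $[0, 1]$, which is the boundedness the betting and empirical Bernstein intervals rely on.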

We also use the following ridge functions:

1. $g_{jmp}(w) = \mathbb{1}\{w \geq 1\}$
2. $g_{knk}(w) = \frac{\min(\max(-2, w), 1) + 2}{3}$
3. $g_{smo}(w) = \Phi(w + 1)$
4. $ g_{fin}(w) = \min(1,\sqrt{\max(w + 2, 0)}/2) $

Here $w = \frac{1}{\sqrt{d}} \sum_{j=1}^{d}\Phi^{-1}(x_{j})$, where $\Phi(\cdot)$ is the CDF of the standard normal distribution $\mathcal{N}(0,1)$ on $\mathbb{R}$ and $x \sim U(0, 1)^d$.
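
Because each $\Phi^{-1}(x_j)$ is standard normal when $x_j \sim U(0,1)$, the scaled sum $w$ is itself $\mathcal{N}(0,1)$ for every $d$. For example, $\mathbb{E}[g_{jmp}(w)] = 1 - \Phi(1) \approx 0.1587$ regardless of dimension. The following quick check uses IID uniforms rather than QMC points; the helper name `ridge_input` and the sample size are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

def ridge_input(x):
    """Map points x in (0,1)^d (last axis) to w = sum_j Phi^{-1}(x_j) / sqrt(d)."""
    d = x.shape[-1]
    return norm.ppf(x).sum(axis=-1) / np.sqrt(d)

rng = np.random.default_rng(7)
target = 1 - norm.cdf(1)  # exact mean of g_jmp
results = {}
for d in (1, 2, 4, 16):
    w = ridge_input(rng.random((200_000, d)))
    results[d] = (w.mean(), w.var(), (w >= 1).mean())
    print(f"d = {d:2d}: mean(w) = {results[d][0]:+.4f}, "
          f"var(w) = {results[d][1]:.4f}, mean(g_jmp) = {results[d][2]:.4f}")
print(f"exact mean of g_jmp = {target:.4f}")
```

The dimension therefore changes how hard the integrand is for QMC, not the value of the integral being estimated.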

We have used DigitalNetB2 (Sobol) for QMC.

Importing the necessary modules:

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm,t
from scipy.stats import beta,uniform
from confseq.betting import betting_ci_seq
from confseq.predmix import predmix_empbern_ci_seq
import qmcpy as qp
import math
import pandas as pd

The parameters used for our numerical experiments

In [8]:
alpha = 0.05 # Significance level, confidence level = 1 - alpha

# parameters used for the beta distribution simulations

beta_param = np.array([10,30]) #parameters for the beta distribution

# parameters used for the integrand problem

# The integrand functions:
fs = {
    "smooth_1d": lambda x: x[...,0]*np.exp(x[...,0])/np.exp(1), 
    "discontinuous_1d": lambda x: (x[...,0]) < (1/3),
    "discontinuous_2d": lambda x: (x[...,0]+x[...,1])>=(2/3),
}
# parameters used for the ridge functions

# The ridge functions:
gs = {
    "jmp": lambda w: w>=1, 
    "knk": lambda w: ((np.minimum(np.maximum(-2,w),1)) + 2) / 3,
    "smo": lambda w: norm.cdf(w + 1),
    "fin": lambda w: np.minimum(1,((np.sqrt(np.maximum(w+2,0)))/2)),
}
d = np.array([1,2,4,16]) # The different d's to test on
ci_methods = np.array(["CLT", "EB", "Betting"]) # The different CI methods

# parameters used for the integrand problems and ridge functions


N_vary = np.array([2**8,2**10,2**12, 2**14, 2**16])# The maximum sample size to be used. Recommended to keep a power of 2 since n must be a power of 2 (QMC rules).
n_vary = 2 ** np.arange(0, 7) # The vector of number of low discrepancy or QMC samples generated per replication
M = 20 # The number of times the computation is repeated

# seed settings

global_seed = 7
parent_seed = np.random.SeedSequence(global_seed)
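
`SeedSequence.spawn` hands out child seeds that drive statistically independent streams, and each further call to `spawn` on the same parent returns new children. That is why later cells can call `parent_seed.spawn(1)[0]` repeatedly and still obtain a fresh seed each time. A small sketch of this behavior with NumPy generators (the variable names are illustrative):

```python
import numpy as np

parent = np.random.SeedSequence(7)
first = parent.spawn(1)[0]    # first child of the parent
second = parent.spawn(1)[0]   # a different child, even though the call is identical

a = np.random.default_rng(first).random(3)
b = np.random.default_rng(second).random(3)
print("spawn keys:", first.spawn_key, second.spawn_key)
print("streams differ:", not np.allclose(a, b))

# Rebuilding the parent from the same integer reproduces the first child exactly.
replay = np.random.SeedSequence(7).spawn(1)[0]
c = np.random.default_rng(replay).random(3)
print("first child reproducible:", np.array_equal(a, c))
```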

The function to generate IID replications of QMC samples

In [9]:
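def gen_qmc_samples_iid(discrete_distrib, true_measure, n = 2**8, function = None, ridge = False):
    """Generate independent replications of an n-point (Q)MC point set.

    Returns the raw points, reshaped to (replications, n, d), if `ridge` is True;
    otherwise returns the tuple (replicate means, flattened function values).
    """
    assert isinstance(discrete_distrib,(qp.DigitalNetB2,qp.Lattice,qp.IIDStdUniform))
    assert true_measure in ["uniform","beta"]
    # axis 0: replication, axis 1: point within the replication, axis 2: dimension
    x_rld = discrete_distrib.gen_samples(n).reshape((discrete_distrib.replications,n,discrete_distrib.d))
    if true_measure=="beta":
        # map the uniform points to Beta(a, b) via the inverse CDF
        x_rld = beta(a=beta_param[0], b=beta_param[1]).ppf(x_rld)
    if ridge is True:
        return x_rld
    if function is None:
        y_rld = x_rld[...,0] # identity integrand on the first coordinate
    else:
        y_rld = function(x_rld)
    return y_rld.mean(1),y_rld.flatten()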

The function to return the sequence of CLT CI Widths

In [10]:
def clt_ci_seq(values, times, alpha = 0.05):
    """Return the Student-t (CLT-based) CI width at each sample size in `times`."""
    times = np.asarray(times)
    assert np.all(times <= len(values)), f"Invalid values in times: {times[times > len(values)]}"
    ci_arr = np.zeros(len(times))
    for k, r in enumerate(times):
        curr_val = values[0:r] # the first r replicate means
        # full width = 2 * t-quantile * sample standard deviation / sqrt(r)
        ci_arr[k] = 2 * (t.ppf(1 - alpha / 2, r - 1) * curr_val.std(ddof=1) / np.sqrt(r))
    return ci_arr
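
The width computed above is $2\, t_{1-\alpha/2,\,R-1}\, s_R / \sqrt{R}$, where $s_R$ is the sample standard deviation of the first $R$ replicate means. A self-contained sketch of the same formula on a toy array follows; the function name `t_interval_width` and the toy data are assumptions for illustration and are not part of the experiment.

```python
import numpy as np
from scipy.stats import t

def t_interval_width(values, r, alpha=0.05):
    """Full width of the two-sided Student-t interval from the first r values."""
    head = np.asarray(values, dtype=float)[:r]
    return 2 * t.ppf(1 - alpha / 2, r - 1) * head.std(ddof=1) / np.sqrt(r)

toy = np.array([1.0, 2.0, 3.0, 4.0])
width = t_interval_width(toy, r=4)
print(f"width from 4 values: {width:.4f}")
```

With so few values the $t$ quantile (about 3.18 for 3 degrees of freedom) is much larger than the normal quantile 1.96, which is why the $t$ distribution is used when the number of replications $R$ is small.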

Generating the QMC samples that will be used in both the ridge and three test functions

In [11]:
x_qmc_arr = np.empty(len(n_vary), dtype=object)
for i in range(len(n_vary)):
    R_vary = N_vary.max() // n_vary[i]
    child_seed = parent_seed.spawn(1)[0]
    x_qmc_arr[i] = gen_qmc_samples_iid(discrete_distrib=qp.DigitalNetB2(d[-1],seed = child_seed,replications=(M*R_vary)), true_measure="uniform",n = n_vary[i],ridge = True)
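
For each `n`, the largest budget `N_vary.max()` fixes the number of replications per repetition at `R = N / n`, and all `M` repetitions are generated in a single call, so the first axis of each array has length `M * R`. The arithmetic below reproduces the array shapes printed by the later cells; it only restates values already set in the parameter cell, and the names `N_max`, `d_max`, and `shapes` are introduced here for illustration.

```python
import numpy as np

N_max = 2**16                    # N_vary.max()
n_vary = 2 ** np.arange(0, 7)    # points per replication: 1, 2, ..., 64
M = 20                           # number of repetitions
d_max = 16                       # d[-1], the largest dimension

# Shape (M * R, n, d_max) for each n, with R = N_max // n
shapes = [(int(M * (N_max // n)), int(n), d_max) for n in n_vary]
for s in shapes:
    print(s)
```

Each repetition `m` later takes its own block of `R` consecutive replications from the first axis, so the repetitions do not share samples.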

# Using Ridge Functions:

In [12]:
qmc_arr_ridge = np.empty((len(N_vary),M, len(ci_methods),len(d),len(gs), len(n_vary))) # consists of CIs (CLT, EB, Betting) for QMC, the ridge functions, the different dimensions, different R's and n's, and different N's.
for i in range (len(n_vary)):
    x_qmc = norm.ppf(x_qmc_arr[i])
    R_vary = N_vary // n_vary[i]
    print("x_qmc.shape = %s"%str(x_qmc.shape))
    for j in range (len(d)):
        w_qmc = x_qmc[:, :, :d[j]].sum(axis = 2)/np.sqrt(d[j])
        m_counter = 0
        for m in range (M):
            m_qmc = w_qmc[m_counter:m_counter + R_vary.max()]
            counter = 0
            for g in gs.values():
                y = g(m_qmc).mean(axis = 1)
                qmc_arr_ridge[:,m,0,j,counter,i] = clt_ci_seq(y,times = R_vary,alpha=alpha) # CLT CI widths
                lower_bound_qmc_integrand_eb,upper_bound_qmc_integrand_eb = predmix_empbern_ci_seq(y, times=R_vary, alpha=alpha, parallel=False, truncation =1/2) 
                # Getting the sequential EB CI widths according to the code from the paper above
                qmc_arr_ridge[:,m,1,j,counter,i] = upper_bound_qmc_integrand_eb - lower_bound_qmc_integrand_eb # The EB CI based on N_vary
                lower_bound_qmc_integrand_bet,upper_bound_qmc_integrand_bet = betting_ci_seq(y, times=R_vary, alpha=alpha, parallel=False, m_trunc=True, trunc_scale=3 / 4) 
                # Getting the sequential Betting CI widths according to the code from the paper above
                qmc_arr_ridge[:,m,2,j,counter,i] = upper_bound_qmc_integrand_bet - lower_bound_qmc_integrand_bet # The Betting CI based on N_vary
                counter = counter + 1
            m_counter = m_counter + R_vary.max()

x_qmc.shape = (1310720, 1, 16)


x_qmc.shape = (655360, 2, 16)
x_qmc.shape = (327680, 4, 16)
x_qmc.shape = (163840, 8, 16)
x_qmc.shape = (81920, 16, 16)
x_qmc.shape = (40960, 32, 16)
x_qmc.shape = (20480, 64, 16)


# The Integrand Problems

In [13]:
qmc_arr_func = np.empty((len(N_vary),M, len(ci_methods),len(fs), len(n_vary))) # consists of CIs (CLT, EB, Betting) for QMC, the integrands, different R's and n's, and different N's.
for i in range (len(n_vary)):
    R_vary = N_vary // n_vary[i]
    x_qmc = x_qmc_arr[i]
    print("x_qmc.shape = %s"%str(x_qmc.shape))
    m_counter = 0
    for m in range (M):
        m_qmc = x_qmc[m_counter:m_counter + R_vary.max()]
        counter = 0
        for f in fs.values():
            y = f(m_qmc).mean(axis = 1)
            qmc_arr_func[:,m,0,counter,i] = clt_ci_seq(y,times = R_vary,alpha=alpha) # CLT CI widths
            lower_bound_qmc_integrand_eb,upper_bound_qmc_integrand_eb = predmix_empbern_ci_seq(y, times=R_vary, alpha=alpha, parallel=False, truncation =1/2) 
            # Getting the sequential EB CI widths according to the code from the paper above
            qmc_arr_func[:,m,1,counter,i] = upper_bound_qmc_integrand_eb - lower_bound_qmc_integrand_eb # The EB CI based on N_vary
            lower_bound_qmc_integrand_bet,upper_bound_qmc_integrand_bet = betting_ci_seq(y, times=R_vary, alpha=alpha, parallel=False, m_trunc=True, trunc_scale=3 / 4) 
            # Getting the sequential Betting CI widths according to the code from the paper above
            qmc_arr_func[:,m,2,counter,i] = upper_bound_qmc_integrand_bet - lower_bound_qmc_integrand_bet # The Betting CI based on N_vary
            counter = counter + 1
        m_counter = m_counter + R_vary.max()


x_qmc.shape = (1310720, 1, 16)
x_qmc.shape = (655360, 2, 16)
x_qmc.shape = (327680, 4, 16)
x_qmc.shape = (163840, 8, 16)
x_qmc.shape = (81920, 16, 16)
x_qmc.shape = (40960, 32, 16)
x_qmc.shape = (20480, 64, 16)


Saving the results to a CSV file:

In [14]:
rows = []

# Ridge Function Entries
for i, N in enumerate(N_vary):
    for k, n in enumerate(n_vary):
        for g_idx, g_name in enumerate(gs.keys()):
            for d_idx, dim in enumerate(d):
                for ci_idx, ci in enumerate(ci_methods):
                    # Extract the corresponding M values
                    M_values = qmc_arr_ridge[i, :, ci_idx, d_idx, g_idx, k]
                    row = [N, n, g_name, dim, ci] + list(M_values)
                    rows.append(row)

# Integrand Entries 
for i, N in enumerate(N_vary):
    for k, n in enumerate(n_vary):
        for f_idx, f_name in enumerate(fs.keys()):
            for ci_idx, ci in enumerate(ci_methods):
                # Extract the corresponding M values
                M_values = qmc_arr_func[i, :, ci_idx, f_idx, k]
                row = [N, n, f_name, "", ci] + list(M_values)  # Empty string for Dimension
                rows.append(row)

# Define column labels, starting M from M1 instead of M0
col_labels = ["N_vary", "n_vary", "Function", "Dimension", "CI Method"] + [f"M{m+1}" for m in range(M)]

# Create DataFrame
df = pd.DataFrame(rows, columns=col_labels)

# Save to CSV
df.to_csv("qmc_combined_results.csv", index=False)

# Print DataFrame for verification
print(df)

      N_vary  n_vary          Function Dimension CI Method        M1  \
0        256       1               jmp         1       CLT  0.087691   
1        256       1               jmp         1        EB  0.157505   
2        256       1               jmp         1   Betting  0.112000   
3        256       1               jmp         2       CLT  0.090458   
4        256       1               jmp         2        EB  0.159020   
...      ...     ...               ...       ...       ...       ...   
1990   65536      64  discontinuous_1d                  EB  0.014504   
1991   65536      64  discontinuous_1d             Betting  0.006000   
1992   65536      64  discontinuous_2d                 CLT  0.001980   
1993   65536      64  discontinuous_2d                  EB  0.014674   
1994   65536      64  discontinuous_2d             Betting  0.007000   

            M2        M3        M4        M5  ...       M11       M12  \
0     0.088633  0.085743  0.099232  0.092206  ...  0.093882  0