# How Much Accuracy Do You Need?

Sou-Cheng Choi (with some edits by Fred Hickernell)

Jun 16, 2025

## Art Owen's Reflections on Lyness and the Accuracy Question

This notebook explores a discussion about the accuracy requirements for numerical integration, particularly in the context of automatic quadrature routines. It is based on a conversation between Art Owen and Fred Hickernell, and it highlights how hard it is to translate scientific needs into quantifiable accuracy targets.

The central theme is how a scientist answers when asked what accuracy a numerical integration result should have. Let's examine the three kinds of response that Lyness describes, from the rarest to the most common:

**Case A: The "Plenty of Time" Response**

*   **Response:** "I would like 8-figure accuracy. I have quite enough computer time available for this."
*   **Explanation:** This is a relatively rare response. It indicates a situation where the scientist is primarily concerned with the *result* itself, rather than the computational cost.  It's suitable for small problems where the time spent on achieving high accuracy isn't a significant constraint.  Automatic quadrature routines are designed to handle such scenarios.

**Case B: The "Time-Constrained" Response**

*   **Response:** "I need at least 4-figure accuracy. But I don’t want to use more than 2 seconds CPU time. If this can’t be done, I shall abandon this problem. If it can be done, I should prefer 6- or 7-figure accuracy. But if the marginal cost for more figures is really small let’s go to 12 figures."
*   **Explanation:** This is a much more typical response. It reflects a realistic situation where the scientist is operating under time and resource constraints. They need a solution, but they also have a limited amount of CPU time.  Automatic quadrature routines with restart facilities (which are becoming more common) can be useful here, allowing the routine to refine its solution if it initially falls short of the desired accuracy.  The "marginal cost" refers to the additional time and effort required to increase the number of digits of accuracy.

**Case C: The "I Don't Know" Response**

*   **Response:** "I really don’t know. Let me explain..."
*   **Explanation:** This is the most common response, and it highlights the fundamental challenge. The scientist may not have a clear understanding of the required accuracy, perhaps because the problem is complex, the underlying physics is poorly understood, or the desired application doesn't demand extremely high precision.  This is where the numerical analyst's role becomes crucial – to guide the scientist in understanding the implications of accuracy for their specific problem.
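
Case B can be read as a small decision procedure: work through a ladder of targets, loosest first, and stop when the time budget runs out. The sketch below is a toy illustration of that idea and not part of QMCPy. It uses a plain trapezoid rule on a smooth one-dimensional integrand, a heuristic Richardson-style error estimate, and absolute tolerances as stand-ins for "figures of accuracy". The names `refine` and `case_b_integrate` are hypothetical.

```python
import math
import time

def refine(f, a, b, t_n, n):
    """Go from an n-panel to a 2n-panel trapezoid estimate,
    evaluating f only at the n new midpoints."""
    h = (b - a) / (2 * n)
    return 0.5 * t_n + h * math.fsum(f(a + (2 * i + 1) * h) for i in range(n))

def case_b_integrate(f, a, b, targets, budget_seconds):
    """Work through `targets` (loosest first) until the time budget runs out.

    Returns (estimate, error_estimate, tightest_target_met_or_None).
    """
    start = time.perf_counter()
    n = 1
    estimate = 0.5 * (b - a) * (f(a) + f(b))  # one-panel trapezoid rule
    err, achieved = math.inf, None
    for tol in targets:
        while err > tol:
            if time.perf_counter() - start > budget_seconds:
                return estimate, err, achieved  # out of time: report what we have
            refined = refine(f, a, b, estimate, n)
            # Heuristic error estimate for the refined value (valid for smooth f)
            err = abs(refined - estimate) / 3
            estimate, n = refined, 2 * n
        achieved = tol  # this target is met; try the next, tighter one
    return estimate, err, achieved

# "At least 1e-4, preferably 1e-7, and 1e-10 if it is cheap", within 2 seconds
estimate, err, achieved = case_b_integrate(
    math.exp, 0.0, 1.0, targets=(1e-4, 1e-7, 1e-10), budget_seconds=2.0
)
print(estimate, err, achieved)
```

If `achieved` comes back as `None`, even the loosest target was not met within the budget, which corresponds to the scientist abandoning the problem.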


 

## The Problem
Automatic quadrature routines (like those in QMCPy) require the user to specify a target accuracy. But in practice, scientists often don't know what accuracy they need, or their needs may change as they see results or as computational budgets shift.

## The Solution: Resumable Integration in QMCPy
With the new `resume` feature in QMCPy, you can start an integration with a loose tolerance, inspect the results, and then resume the computation with a tighter tolerance without starting over. This enables a flexible, iterative, and cost-effective approach to scientific computing.

### How it Works
- **Start with a loose tolerance**: Get a quick, rough answer.
- **Save the computation state**: The integration data can be saved to disk.
- **Resume with a tighter tolerance**: Continue from where you left off, using all previous samples.
- **Repeat as needed**: You can keep tightening the tolerance, or even pause and resume across sessions or machines.

This is especially useful for Case B and Case C scientists: you can explore, adapt, and only pay for more accuracy if you need it.
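
The four steps above can be sketched with a toy Monte Carlo integrator that keeps its running sums and random number generator in a small state object. This is only an illustration under simplifying assumptions (plain Monte Carlo on [0, 1], a CLT-based half-width as the stopping rule) and is not how QMCPy implements `resume`. The names `MCState` and `toy_mc_integrate` are hypothetical.

```python
import math
import pickle
import random
from dataclasses import dataclass, field

@dataclass
class MCState:
    """Everything needed to continue a run: generator, sample count, running sums."""
    rng: random.Random = field(default_factory=lambda: random.Random(7))
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def half_width(self):
        """Approximate 95% confidence half-width from the central limit theorem."""
        if self.n < 2:
            return math.inf
        mean = self.total / self.n
        var = max(self.total_sq - self.n * mean * mean, 0.0) / (self.n - 1)
        return 1.96 * math.sqrt(var / self.n)

def toy_mc_integrate(f, abs_tol, resume=None, batch=1024, n_max=2**22):
    """Sample in batches until the half-width meets abs_tol.
    Note: a state passed via `resume` is updated in place."""
    state = resume if resume is not None else MCState()
    while state.half_width() > abs_tol and state.n < n_max:
        for _ in range(batch):
            y = f(state.rng.random())
            state.total += y
            state.total_sq += y * y
        state.n += batch
    return state.total / state.n, state

f = lambda x: x * x                                  # exact integral is 1/3
estimate1, state1 = toy_mc_integrate(f, abs_tol=1e-2)    # quick, rough answer
n_first = state1.n
saved = pickle.dumps(state1)                         # stands in for a checkpoint file
estimate2, state2 = toy_mc_integrate(f, abs_tol=2e-3, resume=pickle.loads(saved))
print(n_first, state2.n, estimate2)
```

Because the generator travels with the state, the resumed run continues the same random stream instead of replaying it.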

## Implementation
The following implementation will be carried out in the subclasses of `StoppingCriterion`:
1. Add a `resume` parameter to the `integrate()` method.
2. Implement logic to restore the previous state when `resume=<AccumulateData instance>`.
    * Ensure `n_min` is set correctly based on the resumed state.
    * Support restoration of all relevant data (e.g., sample points, posterior state).
3. Write tests to verify correct resumption behavior.
 
**Bayesian Classes:**
- `CubBayesLatticeG` 
- `CubBayesNetG` 

**Monte Carlo Classes:**
- `CubMCCLT`
- `CubMC`
- Multilevel methods
  - `CubMCML`
  - `CubMCMLCont`

**Quasi Monte Carlo Classes:**
- `CubQMCCLT`
- `CubQMCLatticeG`
- `CubQMCNetG`
- Multilevel methods
  - `CubQMCML`
  - `CubQMCMLCont`
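
The `n_min` bookkeeping in item 2 and the test in item 3 can be illustrated with a toy extensible sequence. The sketch below uses the base-2 van der Corput sequence as a stand-in for a lattice or digital net; `ToyData` and `toy_qmc_integrate` are hypothetical names, not QMCPy classes. The point is that a resumed run starts at index `n_min = n_total` of the previous run and must reproduce the from-scratch answer.

```python
from dataclasses import dataclass, replace

def van_der_corput(i):
    """i-th point of the base-2 van der Corput sequence (radical inverse of i)."""
    x, denom = 0.0, 1.0
    while i:
        denom *= 2.0
        x += (i & 1) / denom
        i >>= 1
    return x

@dataclass
class ToyData:
    n_total: int = 0
    running_sum: float = 0.0

def toy_qmc_integrate(f, n_target, resume=None):
    """Average f over the first n_target points, reusing a previous run if given."""
    data = ToyData() if resume is None else replace(resume)  # copy the caller's state
    n_min = data.n_total              # resume where the previous run stopped
    for i in range(n_min, n_target):  # only the new indices are evaluated
        data.running_sum += f(van_der_corput(i))
    data.n_total = max(n_min, n_target)
    return data.running_sum / data.n_total, data

counter = {"calls": 0}
def f(x):
    counter["calls"] += 1
    return x * x

_, coarse = toy_qmc_integrate(f, 2**10)
counter["calls"] = 0
resumed_value, resumed = toy_qmc_integrate(f, 2**12, resume=coarse)
calls_on_resume = counter["calls"]
scratch_value, _ = toy_qmc_integrate(f, 2**12)

print(calls_on_resume)                 # 3072 new evaluations, not 4096
print(resumed_value == scratch_value)  # True
```

A check of this kind (resumed result equals from-scratch result, with only the new points evaluated) is one plausible shape for the resumption tests mentioned above.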


Let's see how this works in code. We will use a Genz oscillatory integrand and QMCPy's `CubQMCLatticeG` routine.

In [1]:
from qmcpy import *
import os

# Define the integrand. Genz supplies its own Uniform true measure
# (see the TrueMeasure entry in the `data1` output below), so no separate measure is needed.
dimension = 3
discrete_distrib = Lattice(dimension=dimension)
integrand = Genz(discrete_distrib, kind_func='oscillatory', kind_coeff=1)

### Step 1: Quick Estimate
Suppose you want a quick answer, so you set a loose tolerance.

In [2]:
abs_tol = 1e-4
rel_tol = 0
solver = CubQMCLatticeG(integrand, abs_tol=abs_tol, rel_tol=rel_tol)
solution1, data1 = solver.integrate()
print(f'Loose tolerance solution: {solution1[0]:.6f}, time: {data1.time_integrate:.3f} s, # samples: {data1.n_total:.0f}, error bound: {(data1.comb_bound_high[0] - data1.comb_bound_low[0])/2:.3e}')

Loose tolerance solution: -0.428931, time: 0.003 s, # samples: 4096, error bound: 9.787e-05


In [3]:
data1

LDTransformData (AccumulateData Object)
    solution        -0.429
    comb_bound_low  -0.429
    comb_bound_high -0.429
    comb_flags      1
    n_total         2^(12)
    n               2^(12)
    time_integrate  0.003
CubQMCLatticeG (StoppingCriterion Object)
    abs_tol         1.00e-04
    rel_tol         0
    n_init          2^(10)
    n_max           2^(35)
Genz (Integrand Object)
    kind_func       oscillatory
    kind_coeff      1
Uniform (TrueMeasure Object)
    lower_bound     0
    upper_bound     1
Lattice (DiscreteDistribution Object)
    d               3
    dvec            [0 1 2]
    randomize       SHIFT
    order           NATURAL
    gen_vec         [     1 182667 213731]
    entropy         325005213987133239155911009368926147147
    spawn_key       ()

The `data1` object contains all the diagnostic information from the first integration run. It includes the estimated solution, error bounds, number of samples used, and other useful statistics. This lets you see how close you are to your initial (loose) tolerance and how much work was done so far.
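
As a small convenience, the quantities mentioned above can be gathered into one summary. The helper below is a hypothetical sketch that relies only on the attribute names used in this notebook (`comb_bound_low`, `comb_bound_high`, `n_total`, `time_integrate`). It is demonstrated on a stand-in object with made-up numbers so that it runs without QMCPy; in the notebook you would pass `data1` instead.

```python
from types import SimpleNamespace

def tolerance_report(data, abs_tol):
    """Summarise how a finished run compares with the requested absolute tolerance."""
    half_width = (float(data.comb_bound_high[0]) - float(data.comb_bound_low[0])) / 2
    return {
        "half_width": half_width,
        "fraction_of_tol": half_width / abs_tol,  # below 1 means the tolerance is met
        "n_total": int(data.n_total),
        "seconds": float(data.time_integrate),
    }

# Stand-in with made-up bounds (NOT the notebook's actual values)
stand_in = SimpleNamespace(
    comb_bound_low=[-0.42900],
    comb_bound_high=[-0.42884],
    n_total=4096,
    time_integrate=0.003,
)
report = tolerance_report(stand_in, abs_tol=1e-4)
print(report)
```

A `fraction_of_tol` close to 1 means the run stopped just inside the requested tolerance, so any tighter request will need more samples.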

### Step 2: Save the State (Optional)
You can save the integration state to disk for later resumption.

In [4]:
from pathlib import Path
output_dir = Path("demo_resume_data")
output_dir.mkdir(parents=True, exist_ok=True)
save_path = output_dir / 'demo_resume_data.pkl'
data1.save(save_path)
print(f'Saved integration state to {save_path}')

Saved integration state to demo_resume_data/demo_resume_data.pkl


**Bonus: File Size Optimization with Compression**

QMCPy's `AccumulateData.save()` method supports compression to reduce file sizes. This is especially useful for large integration problems or when storage space is limited.  When `compress=True`, the `.gz` extension is automatically appended to maintain consistency with standard compression conventions.

**Example Usage:**
```python
# Save compressed - automatically creates 'data.pkl.gz'
data.save('data.pkl', compress=True)  

# Load compressed - auto-detection works
loaded_data = AccumulateData.load('data.pkl.gz')
```

Let's demonstrate the file size difference and automatic naming behavior.

In [5]:
# Save without compression (default)
save_path_uncompressed = output_dir / 'demo_resume_data_uncompressed.pkl'
data1.save(save_path_uncompressed, compress=False)

# Save with compression (note: the .gz extension is appended automatically)
save_path_compressed_base = output_dir / 'demo_resume_data_compressed.pkl'
data1.save(save_path_compressed_base, compress=True)
# with_suffix replaces '.pkl' with '.pkl.gz', matching the file name that save() wrote
save_path_compressed = save_path_compressed_base.with_suffix('.pkl.gz')

print("Files created:")
print(f"  Uncompressed: {save_path_uncompressed.name}")
print(f"  Compressed:   {save_path_compressed.name}")

# Compare file sizes
size_uncompressed = os.path.getsize(save_path_uncompressed)
size_compressed = os.path.getsize(save_path_compressed)
print(f"  Without compression: {size_uncompressed:,} bytes")
print(f"  With compression:    {size_compressed:,} bytes ({100*(1-size_compressed/size_uncompressed):.1f}% reduction)")

# Verify that both files load correctly and report the same sample count
data_uncompressed = type(data1).load(save_path_uncompressed)
data_compressed = type(data1).load(save_path_compressed)
print(f"  Uncompressed data samples: {data_uncompressed.n_total:.0f}")
print(f"  Compressed data samples:   {data_compressed.n_total:.0f}")


Files created:
  Uncompressed: demo_resume_data_uncompressed.pkl
  Compressed:   demo_resume_data_compressed.pkl.gz
  Without compression: 339,578 bytes
  With compression:    197,262 bytes (41.9% reduction)
  Uncompressed data samples: 4096
  Compressed data samples:   4096
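
For readers curious how compressed checkpoints with automatic detection can work in general, here is a standard-library sketch. It is an assumption-laden illustration, not QMCPy's implementation: `save_pickle` and `load_pickle` are hypothetical helpers, and detection relies on the two-byte gzip magic number at the start of the file.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def save_pickle(obj, path, compress=False):
    """Pickle obj to path; with compress=True, gzip it and append '.gz' if missing."""
    path = Path(path)
    if compress and path.suffix != ".gz":
        path = path.with_name(path.name + ".gz")
    opener = gzip.open if compress else open
    with opener(path, "wb") as fh:
        pickle.dump(obj, fh)
    return path

def load_pickle(path):
    """Load a pickle, detecting gzip compression from the file's first bytes."""
    path = Path(path)
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as fh:
        return pickle.load(fh)

with tempfile.TemporaryDirectory() as tmp:
    payload = {"n_total": 4096, "samples": [i / 4096 for i in range(4096)]}
    plain = save_pickle(payload, Path(tmp) / "state.pkl")
    packed = save_pickle(payload, Path(tmp) / "state.pkl", compress=True)
    packed_name = packed.name
    roundtrip_ok = load_pickle(plain) == payload == load_pickle(packed)

print(packed_name, roundtrip_ok)
```

Checking the file contents instead of the extension means a renamed checkpoint still loads correctly.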


### Step 3: Resume with Tighter Tolerance
Now suppose you want more accuracy. You can resume from the in-memory `data1` (or from the state saved in Step 2), reusing all previous samples.

In [6]:
# Optional: reload the state saved in Step 2 (e.g., in a new session)
# data1 = AccumulateData.load(save_path)
abs_tol = 1e-7
solver.set_tolerance(abs_tol=abs_tol)
solution2, data2 = solver.integrate(resume=data1)
print(f'Resumed with tighter tolerance solution: {solution2[0]:.7f}, time: {data2.time_integrate:.3f} s, # samples: {data2.n_total:.0f}, error bound: {(data2.comb_bound_high[0] - data2.comb_bound_low[0])/2:.3e}')

Resumed with tighter tolerance solution: -0.4289321, time: 0.250 s, # samples: 1048576, error bound: 4.570e-08


In [7]:
data2

LDTransformData (AccumulateData Object)
    solution        -0.429
    comb_bound_low  -0.429
    comb_bound_high -0.429
    comb_flags      1
    n_total         2^(20)
    n               2^(20)
    time_integrate  0.250
CubQMCLatticeG (StoppingCriterion Object)
    abs_tol         1.00e-07
    rel_tol         0
    n_init          2^(10)
    n_max           2^(35)
Genz (Integrand Object)
    kind_func       oscillatory
    kind_coeff      1
Uniform (TrueMeasure Object)
    lower_bound     0
    upper_bound     1
Lattice (DiscreteDistribution Object)
    d               3
    dvec            [0 1 2]
    randomize       SHIFT
    order           NATURAL
    gen_vec         [     1 182667 213731]
    entropy         325005213987133239155911009368926147147
    spawn_key       ()


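After resuming with a tighter tolerance, `data2` shows the updated integration state. The summary display rounds the solution and bounds to three decimals, so use the printed line above to compare with `data1`: the sample count grew from 2^12 to 2^20 and the error bound tightened from 9.787e-05 to 4.570e-08, while the solution moved only around the sixth decimal place (-0.428931 to -0.4289321). Because the resumed run continues from the existing state, none of the earlier work is discarded.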

### Step 4: Compare to Starting from Scratch
For reference, let's see how long it takes to get the same accuracy if we start from scratch.

In [8]:
solver2 = CubQMCLatticeG(integrand, abs_tol=abs_tol, rel_tol=rel_tol)
solution3, data3 = solver2.integrate()
print(f'Start from scratch solution: {solution3[0]:.7f}, time: {data3.time_integrate:.3f} s, # samples: {data3.n_total:.0f}, error bound: {(data3.comb_bound_high[0] - data3.comb_bound_low[0])/2:.3e}')
print(f'Resumed run, for comparison, time: {data2.time_integrate:.3f} s')

Start from scratch solution: -0.4289321, time: 0.260 s, # samples: 1048576, error bound: 4.570e-08
Resumed run, for comparison, time: 0.250 s


In [9]:
data3

LDTransformData (AccumulateData Object)
    solution        -0.429
    comb_bound_low  -0.429
    comb_bound_high -0.429
    comb_flags      1
    n_total         2^(20)
    n               2^(20)
    time_integrate  0.260
CubQMCLatticeG (StoppingCriterion Object)
    abs_tol         1.00e-07
    rel_tol         0
    n_init          2^(10)
    n_max           2^(35)
Genz (Integrand Object)
    kind_func       oscillatory
    kind_coeff      1
Uniform (TrueMeasure Object)
    lower_bound     0
    upper_bound     1
Lattice (DiscreteDistribution Object)
    d               3
    dvec            [0 1 2]
    randomize       SHIFT
    order           NATURAL
    gen_vec         [     1 182667 213731]
    entropy         325005213987133239155911009368926147147
    spawn_key       ()


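This is the diagnostic output from starting the integration from scratch with the tight tolerance. Compared with `data2` (the resumed run), the from-scratch run took slightly longer here (0.260 s versus 0.250 s) while using the same 2^20 samples and reaching the same error bound. Timings this short vary from run to run, so treat the difference as indicative only. The practical benefit of QMCPy's resume feature is that the earlier samples do not have to be recomputed.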
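
A short back-of-the-envelope calculation helps put these timings in context. It uses only the sample counts reported above and assumes that the resumed run evaluates the integrand only at the new points:

```python
n_loose = 2**12   # n_total reported for data1
n_tight = 2**20   # n_total reported for data2 and data3

new_samples = n_tight - n_loose        # points still to evaluate on resume
reused_fraction = n_loose / n_tight    # share of the work already done

print(f"new samples on resume: {new_samples:,}")               # 1,044,480
print(f"share of work already done: {reused_fraction:.2%}")    # 0.39%
```

Under that assumption, the first run covers well under 1% of the final sample count, so only a small time saving should be expected in this example. The saving becomes more noticeable when the tolerance is tightened in smaller steps or when each integrand evaluation is expensive.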

## Conclusion

- With the resume feature, you can adaptively decide how much accuracy you need, and only pay for more if you want it.
- You can pause, checkpoint, and resume long computations.
- This is a practical answer to Lyness's Case B and Case C: you don't have to know your accuracy requirement in advance!

Try it yourself: change the tolerances, or resume from a saved file in a new session.