Skip to content

Δ‑analysis v0.2: Full‑stack exact numerics — from lazy series summations to DEC on arbitrary grids

Latest

Choose a tag to compare

@aratraw aratraw released this 04 May 04:33
· 2 commits to main since this release
268c5f7

Δ‑analysis Library — v0.2 Release Notes

Date: 2026‑05‑04
DOI (paper): 10.5281/zenodo.18761044
DOI (software, v0.1): 10.5281/zenodo.18934082
License: PolyForm Small Business License 1.0.0


Version 0.2 represents a comprehensive re‑engineering of the Δ‑analysis library. The codebase has been transformed from an initial research prototype into a stable, high‑performance platform for exact constructive computation. This release incorporates a complete redesign of the arithmetic engine, integrates the geometry and numerical modules into a unified framework, and provides extensive human‑written documentation backed by over 400 test cases.

Below we provide a technical summary of the implementation work and the resulting capabilities of the library.


1. Rational Arithmetic Engine — Complete Redesign

The rational number subsystem has been rewritten from the ground up to achieve three simultaneous goals: exactness of all arithmetic operations, compactness of internal representations, and performance that meets or exceeds the raw Boost.Multiprecision backend on which it is built.

1.1 Eager Rational: Zero‑Overhead Wrapper

The Rational class is implemented as a thin wrapper over boost::multiprecision::rational_adaptor<cpp_int_backend<...>>. Backend parameters were tuned empirically:

  • MinBits = 128 — integers up to 128 bits (approximately 38 decimal digits) are stored directly on the stack via Boost's small‑object optimisation, avoiding heap allocation for the vast majority of intermediate values
  • MaxBits = 0 — unlimited precision is available when needed; no artificial bound on representable magnitude
  • signed_magnitude with unchecked — standard representation without runtime overhead for sign checks or range validation
  • et_off — expression templates are disabled, giving the library full control over when evaluation occurs and eliminating the indirection cost of Boost's own lazy layer

The result is an eager rational type whose performance is indistinguishable from raw Boost.Multiprecision (measured difference ≤ 1%, within benchmark noise), validated on workloads of up to 500 000 terms with random, dyadic, and harmonic‑series fractions.

1.2 Lazy Rational Engine: Architecture

The LazyRational class introduces a mutable, move‑only expression graph with deferred evaluation. Key design elements:

  • Mutable tree construction. Binary arithmetic operators mutate the left operand in place: a + b absorbs the tree of b into a, amortised O(1) per operation. This enables chained accumulation acc + term1 + term2 + ... without constructing intermediate objects. Copy semantics are intentionally deleted; explicit deep copies are obtained via .clone()

  • Dirty / clean state machine. A lazy expression begins in a dirty state — a local, un‑validated tree of DirtyNode objects. The canonicalize() method converts it into a clean state, where the entire expression is represented by a single integer index into the global node pool. The transition involves algebraic simplification and hash‑consing, after which the node is immutable and shared

  • Heterogeneous SUM and PRODUCT nodes. Leaf constants are stored directly in leaf_values vectors rather than as separate CONST child nodes. This reduces node count, flattens the tree, and is essential for efficient batching during evaluation

1.3 Global Node Pool and Hash‑Consing

The clean node pool (delta::internal::NodePool) is a thread‑local, append‑only vector with several hash‑consing caches:

  • constant_cache: maps Value → CONST node index
  • sum_product_cache: maps canonicalised operand sets to SUM/PRODUCT node indices
  • unary_cache: maps (LazyOp, children, eps_idx) tuples to indices for NEG, RECIP, SQRT, EXP, LOG, SIN, COS, ACOS, PI, E, POW

The caches guarantee that structurally identical sub‑expressions are represented by a single node. Structural equality between clean expressions reduces to integer comparison of their clean_index_ values.

Pool allocation uses a next_free_index hint with chunked growth of 4096 nodes. Slots are only reclaimed by the garbage collector, never reused individually. This simplifies reference counting and makes pool indices stable between GC cycles.

1.4 Garbage Collection as Deferred Computation

When the pool occupancy exceeds 90% of max_size (default 1 000 000 nodes), collect_garbage() is triggered:

  1. All live clean LazyRational objects are identified via the clean object registry (g_clean_rationals, a thread‑local unordered set)
  2. Each live root's entire subtree is evaluated to a single Value
  3. A new pool is created; each evaluated value is stored as a CONST node at the same index as the original root
  4. The old pool is discarded

Thus, garbage collection is the computational step that forces deferred evaluation. It is not an incidental cleanup; it is the moment all outstanding lazy work is settled.

The clean object registry also enables reset_pool() — a full teardown that destroys all clean trees, reinstates every LazyRational object as a dirty zero, and creates an empty pool. This is used between independent computation phases.

1.5 Algebraic Simplification

The simplify_tree() function (in simplify_impl.h) performs constructive rewrites on a TempNode tree during canonicalization:

  • Flattening of nested SUM and PRODUCT nodes
  • Removal of neutral elements (0 in sums, 1 in products)
  • Grouping of identical scalar constants (a + a → 2*a) and identical sub‑expressions (A + A → 2*A, A * A → A^2)
  • Distributivity: a*b + a*c → a*(b+c) when common factors are detected
  • Cancellation: x + NEG(x) → 0, x * RECIP(x) → 1
  • Inverse functions: NEG(NEG(x)) → x, EXP(LOG(x)) → x, LOG(EXP(x)) → x
  • Power simplifications: x^0 → 1, x^1 → x, 1^y → 1, (x^a)^b → x^(a*b) for integer exponents

Simplification is not a default — it must be requested or is triggered implicitly by eval(). The destructive eval_inplace(true) path skips it entirely, avoiding its cost for unstructured sums where it provides no benefit. Benchmarks show that simplification can reduce evaluation time by 10–1000× on structured expressions, but is a net loss for flat sums.

1.6 Evaluation and Pyramidal Compact Reduction

Dirty tree evaluation uses evaluate_tree(), which performs post‑order traversal with intermediate result caching. For SUM nodes, pyramidal compact reduction (PCR) groups terms into batches of 32 and reduces them hierarchically. This minimises intermediate rational growth by ensuring that additions occur in a balanced tree rather than a linear chain.

  • eval_inplace(true) uses destructive PCR: it moves leaf values out of the node, modifies them in place, and produces a single constant result, replacing the dirty tree with a clean CONST node
  • eval() copies the necessary data and preserves the original tree

PCR with batch size 32 was chosen because it keeps the results of summing 32 random fractions within 128 bits for typical inputs, allowing the stack‑allocated small‑integer path of Boost's backend to dominate.


2. Transcendental Functions — Implementation and Mutual Consistency

All elementary transcendental functions are provided in both eager (returning Rational) and lazy (creating LazyRational nodes) forms. They accept an explicit absolute error bound ε (default 1e‑30).

2.1 Hybrid Evaluation Strategy

A function‑by‑function decision was made on when to use a floating‑point path versus a pure rational series path, based on extensive benchmarks:

  • Float path (cpp_dec_float_100): Used for sin, cos, exp, π, acos, asin, atan, tan when ε ≥ 1e‑35. The float path is 2–3× faster at moderate precision but loses relative accuracy for large arguments or very small ε
  • Series path (pure rational): Used for sqrt, log, e at all precisions; used for all functions when ε < 1e‑35. For exp, the series path is also forced when |x| > 20 because the float path's mantissa cannot preserve relative accuracy at extreme magnitudes

2.2 Specific Implementations

  • Square root: Newton's method with an integer floor‑sqrt initial guess (isqrt(num*den) / den). This yields compact rationals (a few hundred bits for typical inputs) and avoids the representation bloat that plagued earlier versions using x/2 as the initial guess. The integer root is extracted via a fast Newton method on dumb_int, with a preliminary mod‑256 filter for quick rejection of non‑squares

  • Exponential: Scaling‑and‑squaring combined with a Padé approximant, with the order chosen adaptively based on ε. Argument reduction multiplies ε by 2^(exp_bits + k + 2) to guarantee that, after squaring, the absolute error remains within tolerance. The reduction threshold is 2.0 — lowering it to 1.0 was empirically shown to produce bloated intermediate fractions that cripple downstream operations (log(exp(x)) became catastrophically slow)

  • Logarithm: Argument reduction to [1/2, 2] via k·ln(2), then the arctanh series ln((1+y)/(1-y)) = 2(y + y³/3 + y⁵/5 + …) with y = (m‑1)/(m+1). ln(2) is computed once and cached

  • Sine and Cosine: Binary splitting of the Maclaurin series with exact rational reduction modulo 2π. The binary splitting formula:

    P, Q, T = merge((P_L, Q_L, T_L), (P_R, Q_R, T_R))
    P = P_L * P_R
    Q = Q_L * Q_R
    T = T_L * Q_R + P_L * T_R
    

    prevents intermediate rational swell far more effectively than iterative term‑by‑term summation

  • π (Pi): Chudnovsky algorithm with binary splitting, providing approximately 14.18 decimal digits per term. For small N (< 16), a recurrence is used; for larger N, full binary splitting is employed. The result is cached per ε in a thread‑local map

  • Inverse trigonometric functions: asin uses binary splitting; acos(x) = π/2 − asin(x); atan uses binary splitting with reduction to |x| ≤ 0.5 via atan(x) = π/4 + atan((x‑1)/(x+1)) and atan(x) = π/2 − atan(1/x). tan is sin/cos

  • Power function: Integer exponents use binary exponentiation. Fractional exponents use exp(exp * log(base)) with appropriately scaled ε. Exact integer roots are detected via try_exact_nth_root() before falling back to the general series path

2.3 Mutual Consistency

The complexity of ensuring that all transcendental functions are consistent with each other cannot be overstated. The problem is that each function computes its result to an absolute tolerance ε, but the size of the rational representation determines the cost of all subsequent operations.

Key measures taken:

  • The π cache ensures that sin, cos, and trigonometry‑based argument reductions use a single, consistent value of π for a given ε. This prevents contradiction between, e.g., sin(π) ≈ 0 and cos(π/2) ≈ 0 — if π were recomputed with slightly different accuracy, one would deviate
  • The reduction threshold in exp (2.0) prevents exp(log(x)) from producing bloated intermediate fractions that make the identity exp(log(x)) = x computationally unverifiable at high precision
  • The integer‑sqrt initial guess in sqrt prevents the size of sqrt(2) from inflating when it is used as input to π, sin, or cos. This was the single most impactful optimisation for the mutual consistency of the transcendental suite: a bloated sqrt(2) would silently degrade the Chudnovsky algorithm and the sin/cos binary splitting, causing tests like PiSinConsistency and PiCosConsistency to time out

All of these choices were validated by running the full correctness test suite (transcendentals_correctness.cpp) which checks fundamental identities (sin² + cos² = 1, exp(log(x)) = x, sqrt(x)² = x, cos(acos(x)) = x, acos(x) + asin(x) = π/2) for multiple arguments across ε from 1e‑12 to 1e‑100.


3. Discrete Exterior Calculus and Numerical Operators

The geometry and numerical modules have been integrated and tested.

3.1 Simplicial Complexes

SimplicialComplex<Dim, Coord> stores vertices and simplices up to dimension Dim. Incidence relations (incident_faces) are verified to produce correct signs following the (−1)^i convention. Barycentric subdivision is implemented for 2D with a subdivision map linking coarse simplices to their fine covers.

3.2 Barycentric Dual Complex

DualComplex builds the barycentric dual for 2D triangulations and 3D tetrahedralisations. Dual volumes are computed analytically. The implementation handles boundary edges/faces correctly (dual cells extend to the boundary midpoint/barycentre). Tests verify bijection of primal‑dual mappings and exact sum of dual volumes equalling the mesh volume.

3.3 Discrete Forms (DEC)

DiscreteForm<k, Value, Complex> implements:

  • Exterior derivative d via incidence matrices with exact signs
  • Diagonal Hodge star using dual and primal volumes, with distinct formulations for k=0, 0<k<n, and k=n
  • Codifferential δ = (−1)^{n(k−1)+1} ⋆ d ⋆
  • Hodge Laplacian Δ = dδ + δd
  • Wedge product for 1‑forms in 2D

All algebraic identities (d∘d = 0, constant‑in‑kernel, antisymmetry of , integral preservation of ) hold exactly in rational arithmetic. The test suite explicitly does not demand that the DEC Laplacian match the cotangent Laplacian, because this identity holds only for the circumcentric (Voronoi) dual; this architectural decision is documented in both the header file and the test file.

3.4 Finite‑Difference Operators

discrete_operators.h provides gradient, divergence, curl (2D and 3D), and Laplacian on UniformGrid, ListGrid, and ProductGrid<Grid, N> up to 4 dimensions. All operators:

  • Are metric‑aware and support forward, backward, and central differences with fallback at boundaries
  • Produce TensorField objects keyed by the grid's address type
  • Are parallelised with OpenMP when the number of grid points exceeds 1000
  • Are tested for exactness on polynomials and for second‑order convergence under refinement

3.5 Cotangent Laplacian

build_cotangent_laplacian constructs the standard cotangent‑weight Laplacian for 2D triangle meshes using exact rational arithmetic. Symmetry, row‑sum zero, constant‑kernel, and analytically derived values on a symmetric square mesh are verified.


4. Adaptive Refinement

AdaptiveDeltaPath implements a priority‑queue‑driven refinement process. Intervals are prioritised by deviation from linearity (computed via the value metric). The path supports:

  • Construction from arbitrary initial points or from a uniform path (via from_uniform)
  • Incremental update of the global maximum oscillation to avoid O(N) recomputation
  • Configurable threshold; the queue naturally empties when no interval exceeds it

Benchmarks demonstrate the power of adaptivity:

Function ε Uniform time Adaptive time Speedup
abs(x-0.5) (kink) 1e‑4 98 ms 96 µs 1021×
exp(-1000*(x-0.5)^2) (narrow peak) 1e‑4 5.3 s 8.6 ms 618×
abs(x-0.25) + abs(x-0.75) (two kinks) 1e‑4 389 ms 127 µs 3065×
sin(100*pi*x) (uniform oscillations) 1e‑4 9.3 µs 11.5 s uniform wins

5. Test Suite and Documentation


6. Performance Summary

Detailed methodology and interpretation in docs/benchmark_results.md.

Scenario v0.2 Performance
Harmonic series (50k terms) 487 ms (lazy) vs 2860 ms (Boost eager) – 5.9× faster
Random rationals (500k terms) 794 ms (lazy) vs 1847 ms (Boost eager) – 2.33× faster
sin(1.23…) at ε=1e‑80 247 µs (Delta) vs 1646 µs (naive series) – 6.7× faster
π (cached) at ε=1e‑80 2 µs (Delta) vs 547 µs (naive series) – 273× faster
Adaptive vs uniform (kink, ε=1e‑4) 96 µs vs 98 ms – 1021× faster

All benchmarks are preceded by a correctness verification that the computed values match independent reference implementations.


7. Breaking Changes from v0.1

  • License: Changed from CC‑BY‑NC‑SA 4.0 / Commercial dual licensing to PolyForm Small Business License 1.0.0
  • Rational header structure: rational.h is now a thin aggregation header; internal headers are in delta/rational/. Code relying on internal includes may need updates
  • Value type: The internal representation changed; direct manipulation of the old SmallStorage is no longer possible. Use the public Rational interface

8. Future Work (v0.3)

  • Symbolic differentiation on LazyRational trees
  • Generalised DEC with circumcentric duals, 3D wedge products, and N‑forms
  • Template‑based PDE solvers decoupled from discretisation
  • Further performance optimisations

9. Getting Started

cmake --preset x64-release
cmake --build out/build/x64-release
cd out/build/x64-release && ctest --output-on-failure

Read the User Manual and browse the test files for usage examples.


This release transforms the Δ‑analysis library from a research prototype into a stable, high‑performance, and thoroughly documented platform for exact constructive computation. We hope it serves the community well.

— Timofey Ishimtsev, May 2026