Δ‑analysis v0.2: Full‑stack exact numerics — from lazy series summations to DEC on arbitrary grids
LatestΔ‑analysis Library — v0.2 Release Notes
Date: 2026‑05‑04
DOI (paper): 10.5281/zenodo.18761044
DOI (software, v0.1): 10.5281/zenodo.18934082
License: PolyForm Small Business License 1.0.0
Version 0.2 represents a comprehensive re‑engineering of the Δ‑analysis library. The codebase has been transformed from an initial research prototype into a stable, high‑performance platform for exact constructive computation. This release incorporates a complete redesign of the arithmetic engine, integrates the geometry and numerical modules into a unified framework, and provides extensive human‑written documentation backed by over 400 test cases.
Below we provide a technical summary of the implementation work and the resulting capabilities of the library.
1. Rational Arithmetic Engine — Complete Redesign
The rational number subsystem has been rewritten from the ground up to achieve three simultaneous goals: exactness of all arithmetic operations, compactness of internal representations, and performance that meets or exceeds the raw Boost.Multiprecision backend on which it is built.
1.1 Eager Rational: Zero‑Overhead Wrapper
The Rational class is implemented as a thin wrapper over boost::multiprecision::rational_adaptor<cpp_int_backend<...>>. Backend parameters were tuned empirically:
MinBits = 128— integers up to 128 bits (approximately 38 decimal digits) are stored directly on the stack via Boost's small‑object optimisation, avoiding heap allocation for the vast majority of intermediate valuesMaxBits = 0— unlimited precision is available when needed; no artificial bound on representable magnitudesigned_magnitudewithunchecked— standard representation without runtime overhead for sign checks or range validationet_off— expression templates are disabled, giving the library full control over when evaluation occurs and eliminating the indirection cost of Boost's own lazy layer
The result is an eager rational type whose performance is indistinguishable from raw Boost.Multiprecision (measured difference ≤ 1%, within benchmark noise), validated on workloads of up to 500 000 terms with random, dyadic, and harmonic‑series fractions.
1.2 Lazy Rational Engine: Architecture
The LazyRational class introduces a mutable, move‑only expression graph with deferred evaluation. Key design elements:
-
Mutable tree construction. Binary arithmetic operators mutate the left operand in place:
a + babsorbs the tree ofbintoa, amortised O(1) per operation. This enables chained accumulationacc + term1 + term2 + ...without constructing intermediate objects. Copy semantics are intentionally deleted; explicit deep copies are obtained via.clone() -
Dirty / clean state machine. A lazy expression begins in a dirty state — a local, un‑validated tree of
DirtyNodeobjects. Thecanonicalize()method converts it into a clean state, where the entire expression is represented by a single integer index into the global node pool. The transition involves algebraic simplification and hash‑consing, after which the node is immutable and shared -
Heterogeneous SUM and PRODUCT nodes. Leaf constants are stored directly in
leaf_valuesvectors rather than as separateCONSTchild nodes. This reduces node count, flattens the tree, and is essential for efficient batching during evaluation
1.3 Global Node Pool and Hash‑Consing
The clean node pool (delta::internal::NodePool) is a thread‑local, append‑only vector with several hash‑consing caches:
constant_cache: mapsValue → CONSTnode indexsum_product_cache: maps canonicalised operand sets toSUM/PRODUCTnode indicesunary_cache: maps(LazyOp, children, eps_idx)tuples to indices forNEG,RECIP,SQRT,EXP,LOG,SIN,COS,ACOS,PI,E,POW
The caches guarantee that structurally identical sub‑expressions are represented by a single node. Structural equality between clean expressions reduces to integer comparison of their clean_index_ values.
Pool allocation uses a next_free_index hint with chunked growth of 4096 nodes. Slots are only reclaimed by the garbage collector, never reused individually. This simplifies reference counting and makes pool indices stable between GC cycles.
1.4 Garbage Collection as Deferred Computation
When the pool occupancy exceeds 90% of max_size (default 1 000 000 nodes), collect_garbage() is triggered:
- All live clean
LazyRationalobjects are identified via the clean object registry (g_clean_rationals, a thread‑local unordered set) - Each live root's entire subtree is evaluated to a single
Value - A new pool is created; each evaluated value is stored as a
CONSTnode at the same index as the original root - The old pool is discarded
Thus, garbage collection is the computational step that forces deferred evaluation. It is not an incidental cleanup; it is the moment all outstanding lazy work is settled.
The clean object registry also enables reset_pool() — a full teardown that destroys all clean trees, reinstates every LazyRational object as a dirty zero, and creates an empty pool. This is used between independent computation phases.
1.5 Algebraic Simplification
The simplify_tree() function (in simplify_impl.h) performs constructive rewrites on a TempNode tree during canonicalization:
- Flattening of nested
SUMandPRODUCTnodes - Removal of neutral elements (0 in sums, 1 in products)
- Grouping of identical scalar constants (
a + a → 2*a) and identical sub‑expressions (A + A → 2*A,A * A → A^2) - Distributivity:
a*b + a*c → a*(b+c)when common factors are detected - Cancellation:
x + NEG(x) → 0,x * RECIP(x) → 1 - Inverse functions:
NEG(NEG(x)) → x,EXP(LOG(x)) → x,LOG(EXP(x)) → x - Power simplifications:
x^0 → 1,x^1 → x,1^y → 1,(x^a)^b → x^(a*b)for integer exponents
Simplification is not a default — it must be requested or is triggered implicitly by eval(). The destructive eval_inplace(true) path skips it entirely, avoiding its cost for unstructured sums where it provides no benefit. Benchmarks show that simplification can reduce evaluation time by 10–1000× on structured expressions, but is a net loss for flat sums.
1.6 Evaluation and Pyramidal Compact Reduction
Dirty tree evaluation uses evaluate_tree(), which performs post‑order traversal with intermediate result caching. For SUM nodes, pyramidal compact reduction (PCR) groups terms into batches of 32 and reduces them hierarchically. This minimises intermediate rational growth by ensuring that additions occur in a balanced tree rather than a linear chain.
eval_inplace(true)uses destructive PCR: it moves leaf values out of the node, modifies them in place, and produces a single constant result, replacing the dirty tree with a cleanCONSTnodeeval()copies the necessary data and preserves the original tree
PCR with batch size 32 was chosen because it keeps the results of summing 32 random fractions within 128 bits for typical inputs, allowing the stack‑allocated small‑integer path of Boost's backend to dominate.
2. Transcendental Functions — Implementation and Mutual Consistency
All elementary transcendental functions are provided in both eager (returning Rational) and lazy (creating LazyRational nodes) forms. They accept an explicit absolute error bound ε (default 1e‑30).
2.1 Hybrid Evaluation Strategy
A function‑by‑function decision was made on when to use a floating‑point path versus a pure rational series path, based on extensive benchmarks:
- Float path (
cpp_dec_float_100): Used forsin,cos,exp,π,acos,asin,atan,tanwhen ε ≥ 1e‑35. The float path is 2–3× faster at moderate precision but loses relative accuracy for large arguments or very small ε - Series path (pure rational): Used for
sqrt,log,eat all precisions; used for all functions when ε < 1e‑35. Forexp, the series path is also forced when |x| > 20 because the float path's mantissa cannot preserve relative accuracy at extreme magnitudes
2.2 Specific Implementations
-
Square root: Newton's method with an integer floor‑sqrt initial guess (
isqrt(num*den) / den). This yields compact rationals (a few hundred bits for typical inputs) and avoids the representation bloat that plagued earlier versions usingx/2as the initial guess. The integer root is extracted via a fast Newton method ondumb_int, with a preliminary mod‑256 filter for quick rejection of non‑squares -
Exponential: Scaling‑and‑squaring combined with a Padé approximant, with the order chosen adaptively based on ε. Argument reduction multiplies ε by
2^(exp_bits + k + 2)to guarantee that, after squaring, the absolute error remains within tolerance. The reduction threshold is 2.0 — lowering it to 1.0 was empirically shown to produce bloated intermediate fractions that cripple downstream operations (log(exp(x))became catastrophically slow) -
Logarithm: Argument reduction to [1/2, 2] via
k·ln(2), then the arctanh seriesln((1+y)/(1-y)) = 2(y + y³/3 + y⁵/5 + …)withy = (m‑1)/(m+1).ln(2)is computed once and cached -
Sine and Cosine: Binary splitting of the Maclaurin series with exact rational reduction modulo 2π. The binary splitting formula:
P, Q, T = merge((P_L, Q_L, T_L), (P_R, Q_R, T_R)) P = P_L * P_R Q = Q_L * Q_R T = T_L * Q_R + P_L * T_Rprevents intermediate rational swell far more effectively than iterative term‑by‑term summation
-
π (Pi): Chudnovsky algorithm with binary splitting, providing approximately 14.18 decimal digits per term. For small N (< 16), a recurrence is used; for larger N, full binary splitting is employed. The result is cached per ε in a thread‑local map
-
Inverse trigonometric functions:
asinuses binary splitting;acos(x) = π/2 − asin(x);atanuses binary splitting with reduction to |x| ≤ 0.5 viaatan(x) = π/4 + atan((x‑1)/(x+1))andatan(x) = π/2 − atan(1/x).tanissin/cos -
Power function: Integer exponents use binary exponentiation. Fractional exponents use
exp(exp * log(base))with appropriately scaled ε. Exact integer roots are detected viatry_exact_nth_root()before falling back to the general series path
2.3 Mutual Consistency
The complexity of ensuring that all transcendental functions are consistent with each other cannot be overstated. The problem is that each function computes its result to an absolute tolerance ε, but the size of the rational representation determines the cost of all subsequent operations.
Key measures taken:
- The
πcache ensures thatsin,cos, and trigonometry‑based argument reductions use a single, consistent value of π for a given ε. This prevents contradiction between, e.g.,sin(π) ≈ 0andcos(π/2) ≈ 0— if π were recomputed with slightly different accuracy, one would deviate - The reduction threshold in
exp(2.0) preventsexp(log(x))from producing bloated intermediate fractions that make the identityexp(log(x)) = xcomputationally unverifiable at high precision - The integer‑sqrt initial guess in
sqrtprevents the size ofsqrt(2)from inflating when it is used as input toπ,sin, orcos. This was the single most impactful optimisation for the mutual consistency of the transcendental suite: a bloatedsqrt(2)would silently degrade the Chudnovsky algorithm and the sin/cos binary splitting, causing tests likePiSinConsistencyandPiCosConsistencyto time out
All of these choices were validated by running the full correctness test suite (transcendentals_correctness.cpp) which checks fundamental identities (sin² + cos² = 1, exp(log(x)) = x, sqrt(x)² = x, cos(acos(x)) = x, acos(x) + asin(x) = π/2) for multiple arguments across ε from 1e‑12 to 1e‑100.
3. Discrete Exterior Calculus and Numerical Operators
The geometry and numerical modules have been integrated and tested.
3.1 Simplicial Complexes
SimplicialComplex<Dim, Coord> stores vertices and simplices up to dimension Dim. Incidence relations (incident_faces) are verified to produce correct signs following the (−1)^i convention. Barycentric subdivision is implemented for 2D with a subdivision map linking coarse simplices to their fine covers.
3.2 Barycentric Dual Complex
DualComplex builds the barycentric dual for 2D triangulations and 3D tetrahedralisations. Dual volumes are computed analytically. The implementation handles boundary edges/faces correctly (dual cells extend to the boundary midpoint/barycentre). Tests verify bijection of primal‑dual mappings and exact sum of dual volumes equalling the mesh volume.
3.3 Discrete Forms (DEC)
DiscreteForm<k, Value, Complex> implements:
- Exterior derivative
dvia incidence matrices with exact signs - Diagonal Hodge star
⋆using dual and primal volumes, with distinct formulations for k=0, 0<k<n, and k=n - Codifferential
δ = (−1)^{n(k−1)+1} ⋆ d ⋆ - Hodge Laplacian
Δ = dδ + δd - Wedge product
∧for 1‑forms in 2D
All algebraic identities (d∘d = 0, constant‑in‑kernel, antisymmetry of ∧, integral preservation of ⋆) hold exactly in rational arithmetic. The test suite explicitly does not demand that the DEC Laplacian match the cotangent Laplacian, because this identity holds only for the circumcentric (Voronoi) dual; this architectural decision is documented in both the header file and the test file.
3.4 Finite‑Difference Operators
discrete_operators.h provides gradient, divergence, curl (2D and 3D), and Laplacian on UniformGrid, ListGrid, and ProductGrid<Grid, N> up to 4 dimensions. All operators:
- Are metric‑aware and support forward, backward, and central differences with fallback at boundaries
- Produce
TensorFieldobjects keyed by the grid's address type - Are parallelised with OpenMP when the number of grid points exceeds 1000
- Are tested for exactness on polynomials and for second‑order convergence under refinement
3.5 Cotangent Laplacian
build_cotangent_laplacian constructs the standard cotangent‑weight Laplacian for 2D triangle meshes using exact rational arithmetic. Symmetry, row‑sum zero, constant‑kernel, and analytically derived values on a symmetric square mesh are verified.
4. Adaptive Refinement
AdaptiveDeltaPath implements a priority‑queue‑driven refinement process. Intervals are prioritised by deviation from linearity (computed via the value metric). The path supports:
- Construction from arbitrary initial points or from a uniform path (via
from_uniform) - Incremental update of the global maximum oscillation to avoid O(N) recomputation
- Configurable threshold; the queue naturally empties when no interval exceeds it
Benchmarks demonstrate the power of adaptivity:
| Function | ε | Uniform time | Adaptive time | Speedup |
|---|---|---|---|---|
abs(x-0.5) (kink) |
1e‑4 | 98 ms | 96 µs | 1021× |
exp(-1000*(x-0.5)^2) (narrow peak) |
1e‑4 | 5.3 s | 8.6 ms | 618× |
abs(x-0.25) + abs(x-0.75) (two kinks) |
1e‑4 | 389 ms | 127 µs | 3065× |
sin(100*pi*x) (uniform oscillations) |
1e‑4 | 9.3 µs | 11.5 s | uniform wins |
5. Test Suite and Documentation
- ~45 test files, over 400 test cases. All tests pass. The test suite code volume is comparable to the library headers and covers exact invariants, edge cases, and cross‑component integration
- Human‑written documentation: User Manual, Architecture Overview, Benchmark Results, Optimal Coding Guidelines, Test Coverage Report
- Auto‑generated Doxygen API reference available via
cmake --build . --target doxygen
6. Performance Summary
Detailed methodology and interpretation in docs/benchmark_results.md.
| Scenario | v0.2 Performance |
|---|---|
| Harmonic series (50k terms) | 487 ms (lazy) vs 2860 ms (Boost eager) – 5.9× faster |
| Random rationals (500k terms) | 794 ms (lazy) vs 1847 ms (Boost eager) – 2.33× faster |
| sin(1.23…) at ε=1e‑80 | 247 µs (Delta) vs 1646 µs (naive series) – 6.7× faster |
| π (cached) at ε=1e‑80 | 2 µs (Delta) vs 547 µs (naive series) – 273× faster |
| Adaptive vs uniform (kink, ε=1e‑4) | 96 µs vs 98 ms – 1021× faster |
All benchmarks are preceded by a correctness verification that the computed values match independent reference implementations.
7. Breaking Changes from v0.1
- License: Changed from CC‑BY‑NC‑SA 4.0 / Commercial dual licensing to PolyForm Small Business License 1.0.0
- Rational header structure:
rational.his now a thin aggregation header; internal headers are indelta/rational/. Code relying on internal includes may need updates Valuetype: The internal representation changed; direct manipulation of the oldSmallStorageis no longer possible. Use the publicRationalinterface
8. Future Work (v0.3)
- Symbolic differentiation on
LazyRationaltrees - Generalised DEC with circumcentric duals, 3D wedge products, and N‑forms
- Template‑based PDE solvers decoupled from discretisation
- Further performance optimisations
9. Getting Started
cmake --preset x64-release
cmake --build out/build/x64-release
cd out/build/x64-release && ctest --output-on-failureRead the User Manual and browse the test files for usage examples.
This release transforms the Δ‑analysis library from a research prototype into a stable, high‑performance, and thoroughly documented platform for exact constructive computation. We hope it serves the community well.
— Timofey Ishimtsev, May 2026