compprov (Computational Provenance) is a Java framework that automatically builds a Calculation Provenance Graph (CPG) — a DAG that records every variable and every operation in a computation as it runs. The result is a complete, machine-readable audit trail of how each output was derived from its inputs.
- Core concepts
- Getting started
- Usage example
- Snapshot: export, replay, and diff
- Extending with custom type wrappers
- Built-in types
- Thread safety
- Visualization
- Examples
- License
| Concept | Description |
|---|---|
| CPG | Directed Acyclic Graph where nodes are variables and edges are data-flow dependencies. Produced automatically during execution. |
| Snapshot | Immutable, point-in-time capture of all variables and operations recorded in a context. Can be serialized to JSON, replayed, or compared. |
| Descriptor | Name + optional metadata (Meta) attached to a variable or operation. Used in logs, diff reports, and audit trails. |
| VariableWrapper | Factory that converts a plain value into a provenance-tracked WrappedVariable and registers it in the active context. |
| ComputationEnvironment | Shared, thread-safe configuration: registered wrappers, clock, Jackson mapper, descriptor enforcement rules. |
| ComputationContext | Per-computation scope that accumulates the CPG. Not safe to snapshot while mutating. |
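To make the table concrete, here is a tiny plain-Java model of a CPG. It is a hypothetical sketch, not the compprov API: `ToyCpg`, `Var`, `Op`, `input`, `apply`, and `replay` are invented for illustration. It records input and output nodes with `i_`/`o_` IDs, records each operation's argument IDs (the data-flow edges), and replays the recorded operations with a substituted input, which is the idea behind snapshot replay. It uses one shared counter, so its numbering differs from compprov's.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ToyCpg {
    // A recorded variable node: ID, human-readable name, value.
    record Var(String id, String name, BigDecimal value) {}

    // A recorded operation: its name, the IDs of its arguments, and the ID of its result.
    record Op(String name, List<String> argIds, String resultId) {}

    // Operation name -> computation, in the spirit of a Descriptor -> function table.
    static final Map<String, BinaryOperator<BigDecimal>> FUNCTIONS =
            Map.of("add", (x, y) -> x.add(y), "multiply", (x, y) -> x.multiply(y));

    final Map<String, Var> vars = new LinkedHashMap<>();
    final List<Op> ops = new ArrayList<>();
    private int counter = 0;

    static BinaryOperator<BigDecimal> function(String opName) {
        BinaryOperator<BigDecimal> f = FUNCTIONS.get(opName);
        if (f == null) throw new IllegalArgumentException("unknown operation: " + opName);
        return f;
    }

    // Records an input node (i_ prefix).
    Var input(String name, BigDecimal value) {
        Var v = new Var("i_" + (++counter), name, value);
        vars.put(v.id(), v);
        return v;
    }

    // Computes a result, then records the output node (o_ prefix) and the operation's edges.
    Var apply(String opName, Var a, Var b, String resultName) {
        BigDecimal result = function(opName).apply(a.value(), b.value());
        Var out = new Var("o_" + (++counter), resultName, result);
        vars.put(out.id(), out);
        ops.add(new Op(opName, List.of(a.id(), b.id()), out.id()));
        return out;
    }

    // Re-runs all recorded operations in order, substituting any overridden input values.
    Map<String, BigDecimal> replay(Map<String, BigDecimal> overrides) {
        Map<String, BigDecimal> values = new HashMap<>();
        for (Var v : vars.values()) {
            if (v.id().startsWith("i_")) {
                values.put(v.id(), overrides.getOrDefault(v.id(), v.value()));
            }
        }
        for (Op op : ops) {
            BigDecimal a = values.get(op.argIds().get(0));
            BigDecimal b = values.get(op.argIds().get(1));
            values.put(op.resultId(), function(op.name()).apply(a, b));
        }
        return values;
    }

    public static void main(String[] args) {
        ToyCpg g = new ToyCpg();
        Var price = g.input("price", new BigDecimal("100.00"));
        Var taxRate = g.input("tax-rate", new BigDecimal("0.08"));
        Var tax = g.apply("multiply", price, taxRate, "tax");
        Var total = g.apply("add", price, tax, "total");
        System.out.println(total.id() + " = " + total.value()); // o_4 = 108.0000

        // Replay with a 10% tax rate substituted for input i_2.
        System.out.println(g.replay(Map.of("i_2", new BigDecimal("0.10"))).get("o_4")); // 110.0000
    }
}
```

Keeping operations as data (name plus argument IDs) rather than as executed code is what makes the graph replayable: the values can be recomputed from the inputs alone.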
Add the dependency to your pom.xml, replacing VERSION with the latest release:

```xml
<dependency>
    <groupId>io.compprov</groupId>
    <artifactId>compprov-core</artifactId>
    <version>VERSION</version>
</dependency>
```

Requires Java 17+.
The entry points are DefaultComputationEnvironment (preconfigured with all built-in wrappers
and Jackson serializers) and DefaultComputationContext (typed convenience wrappers on top
of the base context).
```java
import io.compprov.core.*;
import io.compprov.core.meta.Descriptor;
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// --- 1. Create the environment (thread-safe; reuse across computations) ---
var env = new DefaultComputationEnvironment();

// --- 2. Create a context for this computation run ---
var ctx = new DefaultComputationContext(
        env,
        new DataContext(Descriptor.descriptor("invoice-calculation")));

// --- 3. Wrap all inputs ---
// Every wrapped value gets a unique ID and is recorded in the CPG as an INPUT node.
var mc = ctx.wrapMathContext(new MathContext(10, RoundingMode.HALF_UP),
        Descriptor.descriptor("mc"));
var price = ctx.wrapBigDecimal(new BigDecimal("100.00"),
        Descriptor.descriptor("price"));
var taxRate = ctx.wrapBigDecimal(new BigDecimal("0.08"),
        Descriptor.descriptor("tax-rate"));

// --- 4. Perform operations ---
// Each call records an operation node in the CPG and returns a wrapped result.
// Pass null as the last argument to let the framework auto-name the result.
var tax = price.multiply(taxRate, mc, Descriptor.descriptor("tax"));
var total = price.add(tax, mc, Descriptor.descriptor("total"));

// --- 5. Read the result like any other value ---
System.out.println(total.getValue()); // 108.0000

// --- 6. Export the full Calculation Provenance Graph ---
Snapshot snapshot = ctx.snapshot();
System.out.println(env.toJson(snapshot));
System.out.println(env.toHumanReadableLog(snapshot));
```

Abridged JSON output:

```json
{
  "descriptor" : { "name" : "invoice-calculation", "meta" : { } },
  "variables" : [
    { "track" : { "id" : "i_1", "descriptor" : { "name" : "mc" }, ... }, "value" : ... },
    { "track" : { "id" : "i_2", "descriptor" : { "name" : "price" }, ... }, "value" : "100.00" },
    { "track" : { "id" : "i_3", "descriptor" : { "name" : "tax-rate" }, ... }, "value" : "0.08" },
    { "track" : { "id" : "o_4", "descriptor" : { "name" : "tax" }, ... }, "value" : "8.0000" },
    { "track" : { "id" : "o_5", "descriptor" : { "name" : "total" }, ... }, "value" : "108.0000" }
  ],
  "operations" : [
    { "track" : { "id" : "o_1", "descriptor" : { "name" : "multiply" }, ... },
      "arguments" : { "a" : "i_2", "b" : "i_3", "mc" : "i_1" },
      "resultId" : "o_4" },
    { "track" : { "id" : "o_2", "descriptor" : { "name" : "add" }, ... },
      "arguments" : { "a" : "i_2", "b" : "o_4", "mc" : "i_1" },
      "resultId" : "o_5" }
  ]
}
```

Variable IDs use the prefix i_ for inputs and o_ for outputs, followed by a sequential
numeric counter that is stable within a single context run.
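Because the tracked values are ordinary BigDecimals, the digits of the result can be checked with plain java.math and no tracking. This sketch assumes, without confirming it from the framework source, that WrappedBigDecimal's multiply and add delegate to `BigDecimal.multiply(BigDecimal, MathContext)` and `BigDecimal.add(BigDecimal, MathContext)`; `InvoiceCheck` is an illustrative name. Plain BigDecimal keeps the scale implied by the operands instead of padding to the MathContext precision.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class InvoiceCheck {
    // Same precision and rounding as the wrapped MathContext in the usage example.
    static final MathContext MC = new MathContext(10, RoundingMode.HALF_UP);

    // Product scale is 2 + 2 = 4; the 5-digit result fits in precision 10, so nothing is rounded.
    static BigDecimal tax(BigDecimal price, BigDecimal taxRate) {
        return price.multiply(taxRate, MC);
    }

    // Sum scale is max(2, 4) = 4.
    static BigDecimal total(BigDecimal price, BigDecimal tax) {
        return price.add(tax, MC);
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("100.00");
        BigDecimal tax = tax(price, new BigDecimal("0.08"));
        System.out.println(tax);               // 8.0000
        System.out.println(total(price, tax)); // 108.0000
    }
}
```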
Serialize a snapshot to JSON and restore it:

```java
String json = env.toJson(ctx.snapshot());

// Deserialize back to a Snapshot
Snapshot restored = env.fromJson(json);
```

env.compute() replays all recorded operations against the given snapshot,
producing a new context with freshly computed outputs:
```java
var replayed = env.compute(restored);
BigDecimal replayedTotal = (BigDecimal) replayed.getVariable("o_5").getValue();
```

Use copyWith to substitute one or more input values, then replay:
```java
import java.util.Map;

Snapshot modified = env.copyWith(
        restored,
        Descriptor.descriptor("invoice-calculation-v2"),
        Map.of("i_3", new ValueWithDescriptor(
                Descriptor.descriptor("tax-rate"),
                new BigDecimal("0.10")))); // 10% tax instead of 8%

var updated = env.compute(modified);
// updated.getVariable("o_5") now reflects the new total
```

Adding support for a type not built into the framework requires three things:

- A `Wrapped<Type>` class that defines the tracked operations for your type.
- A `VariableWrapper<Type>` factory that instantiates it.
- Registering the factory with the environment.
```java
import io.compprov.core.ComputationContext;
import io.compprov.core.meta.Descriptor;
import io.compprov.core.variable.AbstractWrappedVariable;
import io.compprov.core.variable.VariableTrack;
import java.util.*;
import java.util.function.Function;

public final class WrappedLong extends AbstractWrappedVariable<Long> {

    // Define one Descriptor constant per operation.
    private static final Descriptor OP_ADD = Descriptor.descriptor("add");
    private static final Descriptor OP_MULTIPLY = Descriptor.descriptor("multiply");

    // Map each Descriptor to a lambda that performs the actual computation.
    private static final Map<Descriptor, Function<List<Object>, Object>> FUNCTIONS;
    static {
        Map<Descriptor, Function<List<Object>, Object>> m = new HashMap<>();
        m.put(OP_ADD, args -> (Long) args.get(0) + (Long) args.get(1));
        m.put(OP_MULTIPLY, args -> (Long) args.get(0) * (Long) args.get(1));
        FUNCTIONS = Collections.unmodifiableMap(m);
    }

    public WrappedLong(ComputationContext context, VariableTrack track, Long value) {
        super(context, track, value);
    }

    @Override
    public Function<List<Object>, Object> getFunction(Descriptor operationDescriptor) {
        return FUNCTIONS.get(operationDescriptor);
    }

    // --- Public API ---
    // Each operation comes in two overloads: with and without a result Descriptor.

    public WrappedLong add(WrappedLong augend, Descriptor resultDescriptor) {
        return (WrappedLong) execute(OP_ADD, "a", this, "b", augend, resultDescriptor);
    }

    public WrappedLong add(WrappedLong augend) {
        return add(augend, null);
    }

    public WrappedLong multiply(WrappedLong multiplicand, Descriptor resultDescriptor) {
        return (WrappedLong) execute(OP_MULTIPLY, "a", this, "b", multiplicand, resultDescriptor);
    }

    public WrappedLong multiply(WrappedLong multiplicand) {
        return multiply(multiplicand, null);
    }
}
```

```java
import io.compprov.core.ComputationContext;
import io.compprov.core.variable.VariableTrack;
import io.compprov.core.variable.VariableWrapper;
import io.compprov.core.variable.WrappedVariable;

public final class LongWrapperFactory implements VariableWrapper<Long> {
    @Override
    public WrappedVariable wrap(ComputationContext context, VariableTrack track, Long value) {
        return new WrappedLong(context, track, value);
    }
}
```

```java
var env = new DefaultComputationEnvironment();
env.registerWrapper(Long.class, new LongWrapperFactory());

var ctx = new DefaultComputationContext(env,
        new DataContext(Descriptor.descriptor("my-computation")));

// Use the base wrap() method: DefaultComputationContext does not have a wrapLong() helper.
// Cast to your concrete type after wrapping.
var a = (WrappedLong) ctx.wrap(100L, Descriptor.descriptor("a"));
var b = (WrappedLong) ctx.wrap(42L, Descriptor.descriptor("b"));
var sum = a.add(b, Descriptor.descriptor("sum"));
```

If you use a custom type frequently, extend DefaultComputationContext to add a typed
wrapLong() convenience method, the same way DefaultComputationContext does for
wrapBigDecimal, wrapBigInteger, etc.
If you need to serialize and deserialize snapshots containing your custom type, register a
Jackson deserializer with the ObjectMapper inside your custom ComputationEnvironment.
See DefaultComputationEnvironment for examples using ZonedDateTimeSerializer,
MathContextDeserializer, and VariableDeserializer.
DefaultComputationEnvironment registers the following wrappers out of the box:
| Java type | Wrapped class | Notes |
|---|---|---|
| BigDecimal | WrappedBigDecimal | Full arithmetic: add, subtract, multiply, divide, pow, sqrt, abs, negate, remainder, max, min, and more |
| BigInteger | WrappedBigInteger | Full arithmetic including modPow (ternary) |
| Integer | WrappedInteger | Parameter-only type; used as an argument to pow, scaleByPowerOfTen, etc. |
| Long | WrappedLong | Parameter-only type |
| MathContext | WrappedMathContext | Carries precision and rounding mode; passed to most BigDecimal / BigInteger ops |
ComputationEnvironment and its wrappers map are fully thread-safe — a single instance can
be shared across threads and computations.
ComputationContext is thread-safe for all wrap and executeOperation calls. The snapshot()
method is not safe to call while other threads are still recording operations into the same
context.
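The rule above amounts to a simple pattern: let worker threads record freely, and take the snapshot only after all of them have finished. The sketch below uses a hypothetical `ToyRecorder` backed by a `ConcurrentLinkedQueue` as a stand-in for a context; with compprov, the recording calls would be wrap/executeOperation and the final step ctx.snapshot().

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SnapshotAfterJoin {
    // Hypothetical stand-in for a context: record() is thread-safe; snapshot() just copies.
    static final class ToyRecorder {
        private final ConcurrentLinkedQueue<String> entries = new ConcurrentLinkedQueue<>();

        void record(String entry) {
            entries.add(entry);
        }

        // Gives a consistent picture only once no thread is still calling record().
        List<String> snapshot() {
            return List.copyOf(entries);
        }
    }

    static List<String> run(int workers, int opsPerWorker) {
        ToyRecorder recorder = new ToyRecorder();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int worker = w;
            pool.submit(() -> {
                for (int i = 0; i < opsPerWorker; i++) {
                    recorder.record("worker-" + worker + "/op-" + i);
                }
            });
        }
        // Stop accepting work and wait for all recording tasks before taking the snapshot.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("recording did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return recorder.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000).size()); // 4000
    }
}
```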
To visualize your CPG data, use compprov-render: a set of HTML pages that run locally in your web browser, no server required.
Export a snapshot to JSON and open the page:

```java
String json = env.toJson(ctx.snapshot());
// Save to a file (the name is arbitrary), then open graph.html or plot.html in your browser
java.nio.file.Files.writeString(java.nio.file.Path.of("snapshot.json"), json);
```

- graph.html renders the full CPG as an interactive node-edge graph. Variables are shown as typed nodes (input / output), operations as diamond nodes with labeled argument edges.
- plot.html plots numeric variable values across one or more datasets side by side. Supports points, line, and table views with configurable X-axis labels.
The io.compprov.examples package contains three self-contained examples that each demonstrate
a different aspect of the framework.
| Example | Package | Domain | Key technique |
|---|---|---|---|
| Net Asset Value (NAV) | io.compprov.examples.nav | Crypto-portfolio accounting | Custom domain type wrappers |
| Gauge Block Calibration | io.compprov.examples.gaugeblock | Precision length metrology | Pure BigDecimal scalar formula chain |
| Hydrological Model Evaluation | io.compprov.examples.hydrology | River discharge modelling | List-based tracked operations |
io.compprov.examples.nav · NetAssetValueCalculator.calculate()
Computes the total USD value of a multi-asset crypto portfolio (BTC, ETH, USDC positions held across Binance, staking, and Morpho DeFi) by converting each position to USD at a spot rate and summing the results.
The primary focus is showing how to wrap custom domain types. The domain model uses Amount
and Rate objects rather than raw BigDecimal, and the example integrates them with the
framework without modifying them — using the three-step pattern:
- `WrappedAmount` / `WrappedRate` extend `AbstractWrappedVariable<T>` and declare their operations (`add`, `convert`, `addBulk`) as `Descriptor` constants mapped to computation lambdas.
- `AmountWrapper` / `RateWrapper` implement `VariableWrapper<T>`, the one-method factory the framework calls to instantiate tracked variables.
- `NavComputationContext` extends `DefaultComputationContext`, registers both wrappers with the shared `ComputationEnvironment`, and exposes typed `wrap(Amount, ...)` / `wrap(Rate, ...)` convenience overloads.
After the calculation the snapshot is serialized to JSON, then deserialized and replayed via
NavComputationContext.environment.compute() — verifying that the CPG is round-trip stable and the
replayed output matches the original result.
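Stripped of tracking, the NAV arithmetic is convert-then-sum. The sketch below uses hypothetical `Amount` and `Rate` records (the example's real classes may have different fields) and illustrative positions and spot rates, not the example's data. The method names `convert` and `addBulk` mirror the operation names listed above, but their bodies here are assumptions.

```java
import java.math.BigDecimal;
import java.util.List;

public class NavSketch {
    // Hypothetical domain types; the real example's Amount and Rate may differ.
    record Amount(BigDecimal value, String currency) {}
    record Rate(String base, String quote, BigDecimal price) {}

    // Converts an amount into the rate's quote currency; the base currency must match.
    static Amount convert(Amount amount, Rate rate) {
        if (!amount.currency().equals(rate.base())) {
            throw new IllegalArgumentException("rate " + rate.base() + "/" + rate.quote()
                    + " cannot convert " + amount.currency());
        }
        return new Amount(amount.value().multiply(rate.price()), rate.quote());
    }

    // Sums amounts that are all in the same currency.
    static Amount addBulk(List<Amount> amounts, String currency) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Amount a : amounts) {
            if (!a.currency().equals(currency)) {
                throw new IllegalArgumentException("mixed currencies: " + a.currency());
            }
            sum = sum.add(a.value());
        }
        return new Amount(sum, currency);
    }

    public static void main(String[] args) {
        // Illustrative positions and spot rates only.
        Amount btc = new Amount(new BigDecimal("0.5"), "BTC");
        Amount eth = new Amount(new BigDecimal("2"), "ETH");
        Rate btcUsd = new Rate("BTC", "USD", new BigDecimal("60000"));
        Rate ethUsd = new Rate("ETH", "USD", new BigDecimal("3000"));
        Amount nav = addBulk(List.of(convert(btc, btcUsd), convert(eth, ethUsd)), "USD");
        System.out.println(nav.value() + " " + nav.currency()); // 36000.0 USD
    }
}
```

Checking currencies inside each operation means a mismatched rate fails loudly at the step where it occurs, which is also the step a provenance trail would point to.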
io.compprov.examples.gaugeblock · GaugeBlockCalibration.calibrate()
Reproduces the interferometric calibration of a 7 mm tungsten carbide gauge block (NRC 91A) from the following paper, which uses this measurement as a demonstration of metrological provenance management:
Ryan M. White, Provenance in the Context of Metrological Traceability, Metrology 2025, 5(3), 52. DOI: 10.3390/metrology5030052
The computation chain has three stages, all tracked in the CPG:
- Refractive index: the Birch–Downs modified Ciddor equation (8 tracked steps) converts air temperature, pressure, relative humidity, CO₂ concentration, and saturation vapor pressure into the refractive index n of the measurement medium.
- Interferometric length: the HeNe laser vacuum wavelength (632.99 nm) divided by n gives the air wavelength λ_air; the observed fringe order `m + f` gives the raw length `L_raw = (m + f) × λ_air / 2`.
- Thermal correction: the raw length is corrected to the ISO 1 reference temperature (20 °C) using the tungsten carbide expansion coefficient α = 4.23 × 10⁻⁶ K⁻¹ from the paper, where ΔT is the measurement temperature minus 20 °C: `L_cal = L_raw / (1 + α × ΔT)`.
The deviation from the 7 mm nominal length is asserted to round to +2 nm, matching the paper's reported result (expanded uncertainty U = 31 nm, k = 2).
This example uses only built-in WrappedBigDecimal arithmetic — no custom wrappers needed —
showing that the framework handles complex pure-scalar formula chains out of the box.
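The last two stages reduce to a few lines of BigDecimal arithmetic. The sketch below (class and method names hypothetical) implements λ_air = λ_vac / n, L_raw = (m + f) × λ_air / 2, and L_cal = L_raw / (1 + α × ΔT) with `MathContext.DECIMAL64`, working in nanometres. It takes ΔT to be the measurement temperature minus 20 °C and uses illustrative inputs, not the paper's measured fringe order, refractive index, or temperature; the Ciddor refractive-index stage is not reproduced.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class GaugeBlockSketch {
    static final MathContext MC = MathContext.DECIMAL64;
    static final BigDecimal TWO = BigDecimal.valueOf(2);

    // Air wavelength from the vacuum wavelength and the refractive index n.
    static BigDecimal airWavelength(BigDecimal lambdaVacuum, BigDecimal n) {
        return lambdaVacuum.divide(n, MC);
    }

    // Raw length L_raw = (m + f) * lambda_air / 2.
    static BigDecimal rawLength(BigDecimal m, BigDecimal f, BigDecimal lambdaAir) {
        return m.add(f).multiply(lambdaAir, MC).divide(TWO, MC);
    }

    // Thermal correction to 20 °C: L_cal = L_raw / (1 + alpha * deltaT).
    static BigDecimal correctedLength(BigDecimal lRaw, BigDecimal alpha, BigDecimal deltaT) {
        return lRaw.divide(BigDecimal.ONE.add(alpha.multiply(deltaT, MC), MC), MC);
    }

    public static void main(String[] args) {
        BigDecimal lambdaVac = new BigDecimal("632.99"); // nm, from the text
        BigDecimal alpha = new BigDecimal("4.23E-6");    // 1/K, from the text
        // Illustrative inputs only (not the paper's values): n = 1, m = 10, f = 0.5.
        BigDecimal lRaw = rawLength(new BigDecimal("10"), new BigDecimal("0.5"),
                airWavelength(lambdaVac, BigDecimal.ONE));
        System.out.println(lRaw); // 3323.1975
        System.out.println(correctedLength(lRaw, alpha, new BigDecimal("0.1")));
    }
}
```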
io.compprov.examples.hydrology · MhmDischargeEvaluation.evaluateParameterSetP1()
Evaluates the mesoscale Hydrologic Model (mHM) output against observed river discharge at the Moselle River basin upstream of Perl (~11 500 km², Luxembourg/Germany), as described in:
Villamar et al., Archivist: a metadata management tool for facilitating FAIR research, Scientific Data, 2025. DOI: 10.1038/s41597-025-04521-6
The metric is the Kling-Gupta Efficiency (KGE) (Gupta et al. 2009):
```
KGE = 1 − √[ (r−1)² + (α−1)² + (β−1)² ]
r = Pearson correlation = Σ(devObs · devSim) / √(Σ devObs² · Σ devSim²)
α = variability ratio = σ_sim / σ_obs
β = bias ratio = μ_sim / μ_obs
```
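KGE = 1 is a perfect match. Unlike the Nash-Sutcliffe efficiency, KGE = 0 is not the point where the model stops beating the observed mean: a model that always predicts the mean flow has α = 0 and β = 1 and, taking r = 0 for the constant series, scores 1 − √2 ≈ −0.41. The paper reports that parameter set P₁ outperforms P₂ with scores mostly above 0.5.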
The computation uses ArrayList<WrappedBigDecimal> with loops and addBulk, demonstrating the
pattern for list-based tracked operations where the number of time steps is dynamic.
The 8-step chain (means → deviations → squared deviations and cross products → sums → r → α
→ β → KGE) is fully recorded in the CPG, with every intermediate quantity named and traceable.
The synthetic dataset is engineered so that r = 1, β = 1, α = 0.9, giving KGE = 0.9 exactly,
verified by exact BigDecimal equality.
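For comparison outside the framework, the same metric fits in a short plain-double function using the r, α, and β definitions above. The synthetic series is an illustrative construction of the kind the text describes (sim = mean + 0.9 × (obs − mean), which gives r = 1, β = 1, α = 0.9), not the example's dataset; `KgeSketch` is a hypothetical name. Floating point yields a value only approximately equal to 0.9, which is one reason the example itself uses BigDecimal.

```java
public class KgeSketch {
    // Kling-Gupta Efficiency from the r, alpha, beta definitions.
    static double kge(double[] obs, double[] sim) {
        if (obs.length != sim.length || obs.length < 2) {
            throw new IllegalArgumentException("series must have equal length >= 2");
        }
        int n = obs.length;
        double meanObs = 0, meanSim = 0;
        for (int i = 0; i < n; i++) {
            meanObs += obs[i];
            meanSim += sim[i];
        }
        meanObs /= n;
        meanSim /= n;

        // Sums of cross products and squared deviations.
        double cross = 0, sqObs = 0, sqSim = 0;
        for (int i = 0; i < n; i++) {
            double dObs = obs[i] - meanObs;
            double dSim = sim[i] - meanSim;
            cross += dObs * dSim;
            sqObs += dObs * dObs;
            sqSim += dSim * dSim;
        }
        double r = cross / Math.sqrt(sqObs * sqSim);
        // sigma_sim / sigma_obs: the 1/n (or 1/(n-1)) factors cancel in the ratio.
        double alpha = Math.sqrt(sqSim / sqObs);
        double beta = meanSim / meanObs;
        return 1 - Math.sqrt(Math.pow(r - 1, 2) + Math.pow(alpha - 1, 2) + Math.pow(beta - 1, 2));
    }

    public static void main(String[] args) {
        double[] obs = {1, 2, 3, 4, 5};
        double[] sim = new double[obs.length];
        for (int i = 0; i < obs.length; i++) {
            sim[i] = 3 + 0.9 * (obs[i] - 3); // mean 3, deviations scaled by 0.9
        }
        System.out.println(kge(obs, sim)); // approximately 0.9
    }
}
```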
Apache License 2.0 — see LICENSE.

