compprov-core

compprov (Computational Provenance) is a Java framework that automatically builds a Calculation Provenance Graph (CPG) — a DAG that records every variable and every operation in a computation as it runs. The result is a complete, machine-readable audit trail of how each output was derived from its inputs.

Core concepts

Concept	Description
CPG	Directed Acyclic Graph where nodes are variables and edges are data-flow dependencies. Produced automatically during execution.
Snapshot	Immutable, point-in-time capture of all variables and operations recorded in a context. Can be serialized to JSON, replayed, or compared.
Descriptor	Name + optional metadata (`Meta`) attached to a variable or operation. Used in logs, diff reports, and audit trails.
VariableWrapper	Factory that converts a plain value into a provenance-tracked `WrappedVariable` and registers it in the active context.
ComputationEnvironment	Shared, thread-safe configuration: registered wrappers, clock, Jackson mapper, descriptor enforcement rules.
ComputationContext	Per-computation scope that accumulates the CPG. Not safe to snapshot while mutating.

Getting started

Add the dependency to your pom.xml (latest version: ):

<dependency>
    <groupId>io.compprov</groupId>
    <artifactId>compprov-core</artifactId>
    <version>VERSION</version>
</dependency>

Requires Java 17+.

Usage example

The entry point is DefaultComputationEnvironment (preconfigured with all built-in wrappers and Jackson serializers) and DefaultComputationContext (typed convenience wrappers on top of the base context).

import io.compprov.core.*;
import io.compprov.core.meta.Descriptor;
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// --- 1. Create the environment (thread-safe; reuse across computations) ---
var env = new DefaultComputationEnvironment();

// --- 2. Create a context for this computation run ---
var ctx = new DefaultComputationContext(
        env,
        new DataContext(Descriptor.descriptor("invoice-calculation")));

// --- 3. Wrap all inputs ---
// Every wrapped value gets a unique ID and is recorded in the CPG as an INPUT node.
var mc      = ctx.wrapMathContext(new MathContext(10, RoundingMode.HALF_UP),
                                  Descriptor.descriptor("mc"));
var price   = ctx.wrapBigDecimal(new BigDecimal("100.00"),
                                 Descriptor.descriptor("price"));
var taxRate = ctx.wrapBigDecimal(new BigDecimal("0.08"),
                                 Descriptor.descriptor("tax-rate"));

// --- 4. Perform operations ---
// Each call records an operation node in the CPG and returns a wrapped result.
// Pass null as the last argument to let the framework auto-name the result.
var tax   = price.multiply(taxRate, mc, Descriptor.descriptor("tax"));
var total = price.add(tax, mc, Descriptor.descriptor("total"));

// --- 5. Read the result like any other value ---
System.out.println(total.getValue()); // 108.0000000

// --- 6. Export the full Calculation Provenance Graph ---
Snapshot snapshot = ctx.snapshot();
System.out.println(env.toJson(snapshot));
System.out.println(env.toHumanReadableLog(snapshot));

JSON output (abbreviated)

{
  "descriptor" : { "name" : "invoice-calculation", "meta" : { } },
  "variables" : [
    { "track" : { "id" : "i_1", "descriptor" : { "name" : "mc" }, ... }, "value" : ... },
    { "track" : { "id" : "i_2", "descriptor" : { "name" : "price" }, ... }, "value" : "100.00" },
    { "track" : { "id" : "i_3", "descriptor" : { "name" : "tax-rate" }, ... }, "value" : "0.08" },
    { "track" : { "id" : "o_4", "descriptor" : { "name" : "tax" }, ... }, "value" : "8.000000000" },
    { "track" : { "id" : "o_5", "descriptor" : { "name" : "total" }, ... }, "value" : "108.0000000" }
  ],
  "operations" : [
    { "track" : { "id" : "o_1", "descriptor" : { "name" : "multiply" }, ... },
      "arguments" : { "a" : "i_2", "b" : "i_3", "mc" : "i_1" },
      "resultId" : "o_4" },
    { "track" : { "id" : "o_2", "descriptor" : { "name" : "add" }, ... },
      "arguments" : { "a" : "i_2", "b" : "o_4", "mc" : "i_1" },
      "resultId" : "o_5" }
  ]
}

Variable IDs use the prefix i_ for inputs and o_ for outputs, followed by a sequential numeric counter that is stable within a single context run.

Snapshot: export, replay, and diff

Serialize and deserialize

String json = env.toJson(ctx.snapshot());

// Deserialize back to a Snapshot
Snapshot restored = env.fromJson(json);

Replay a computation

env.compute() replays all recorded operations against the given snapshot, producing a new context with freshly computed outputs:

var replayed = env.compute(restored);
BigDecimal replayedTotal = (BigDecimal) replayed.getVariable("o_5").getValue();

Change an input and propagate

Use copyWith to substitute one or more input values, then replay:

Snapshot modified = env.copyWith(
        restored,
        Descriptor.descriptor("invoice-calculation-v2"),
        Map.of("i_3", new ValueWithDescriptor(
                Descriptor.descriptor("tax-rate"),
                new BigDecimal("0.10"))));  // 10% tax instead of 8%

var updated = env.compute(modified);
// updated.getVariable("o_5") now reflects the new total

Extending with custom type wrappers

Adding support for a type not built into the framework requires three things:

A Wrapped<Type> class that defines the tracked operations for your type.
A VariableWrapper<Type> factory that instantiates it.
Registering the factory with the environment.

Step 1 — Create the wrapped class

import io.compprov.core.ComputationContext;
import io.compprov.core.meta.Descriptor;
import io.compprov.core.variable.AbstractWrappedVariable;
import io.compprov.core.variable.VariableTrack;

import java.util.*;
import java.util.function.Function;

public final class WrappedLong extends AbstractWrappedVariable<Long> {

    // Define one Descriptor constant per operation.
    private static final Descriptor OP_ADD      = Descriptor.descriptor("add");
    private static final Descriptor OP_MULTIPLY = Descriptor.descriptor("multiply");

    // Map each Descriptor to a lambda that performs the actual computation.
    private static final Map<Descriptor, Function<List<Object>, Object>> FUNCTIONS;

    static {
        Map<Descriptor, Function<List<Object>, Object>> m = new HashMap<>();
        m.put(OP_ADD,      args -> (Long) args.get(0) + (Long) args.get(1));
        m.put(OP_MULTIPLY, args -> (Long) args.get(0) * (Long) args.get(1));
        FUNCTIONS = Collections.unmodifiableMap(m);
    }

    public WrappedLong(ComputationContext context, VariableTrack track, Long value) {
        super(context, track, value);
    }

    @Override
    public Function<List<Object>, Object> getFunction(Descriptor operationDescriptor) {
        return FUNCTIONS.get(operationDescriptor);
    }

    // --- Public API ---
    // Each operation comes in two overloads: with and without a result Descriptor.

    public WrappedLong add(WrappedLong augend, Descriptor resultDescriptor) {
        return (WrappedLong) execute(OP_ADD, "a", this, "b", augend, resultDescriptor);
    }

    public WrappedLong add(WrappedLong augend) {
        return add(augend, null);
    }

    public WrappedLong multiply(WrappedLong multiplicand, Descriptor resultDescriptor) {
        return (WrappedLong) execute(OP_MULTIPLY, "a", this, "b", multiplicand, resultDescriptor);
    }

    public WrappedLong multiply(WrappedLong multiplicand) {
        return multiply(multiplicand, null);
    }
}

Step 2 — Create the factory

import io.compprov.core.ComputationContext;
import io.compprov.core.variable.VariableTrack;
import io.compprov.core.variable.VariableWrapper;
import io.compprov.core.variable.WrappedVariable;

public final class LongWrapperFactory implements VariableWrapper<Long> {
    @Override
    public WrappedVariable wrap(ComputationContext context, VariableTrack track, Long value) {
        return new WrappedLong(context, track, value);
    }
}

Step 3 — Register and use

var env = new DefaultComputationEnvironment();
env.registerWrapper(Long.class, new LongWrapperFactory());

var ctx = new DefaultComputationContext(env,
        new DataContext(Descriptor.descriptor("my-computation")));

// Use the base wrap() method — DefaultComputationContext does not have a wrapLong() helper.
// Cast to your concrete type after wrapping.
var a = (WrappedLong) ctx.wrap(100L, Descriptor.descriptor("a"));
var b = (WrappedLong) ctx.wrap(42L,  Descriptor.descriptor("b"));
var sum = a.add(b, Descriptor.descriptor("sum"));

If you use a custom type frequently, extend DefaultComputationContext to add a typed wrapLong() convenience method, the same way DefaultComputationContext does for wrapBigDecimal, wrapBigInteger, etc.

Type deserialization

If you need to serialize and deserialize snapshots containing your custom type, register a Jackson deserializer with the ObjectMapper inside your custom ComputationEnvironment. See DefaultComputationEnvironment for examples using ZonedDateTimeSerializer, MathContextDeserializer, and VariableDeserializer.

Built-in types

DefaultComputationEnvironment registers the following wrappers out of the box:

Java type	Wrapped class	Notes
`BigDecimal`	`WrappedBigDecimal`	Full arithmetic: add, subtract, multiply, divide, pow, sqrt, abs, negate, remainder, max, min, and more
`BigInteger`	`WrappedBigInteger`	Full arithmetic including modPow (ternary)
`Integer`	`WrappedInteger`	Parameter-only type; used as an argument to `pow`, `scaleByPowerOfTen`, etc.
`Long`	`WrappedLong`	Parameter-only type
`MathContext`	`WrappedMathContext`	Carries precision and rounding mode; passed to most `BigDecimal` / `BigInteger` ops

Thread safety

ComputationEnvironment and its wrappers map are fully thread-safe — a single instance can be shared across threads and computations.

ComputationContext is thread-safe for all wrap and executeOperation calls. The snapshot() method is not safe to call while other threads are still recording operations into the same context.

Visualization

To visualize your CPG data use compprov-render — a set of HTML pages that run locally in your web browser, no server required.

Simply export a snapshot to JSON and open the page:

String json = env.toJson(ctx.snapshot());
// save to a file, then open graph.html or plot.html in your browser

Graph view — provenance graph

Renders the full CPG as an interactive node-edge graph. Variables are shown as typed nodes (input / output), operations as diamond nodes with labeled argument edges.

Plot view — multi-dataset comparison

Plots numeric variable values across one or more datasets side-by-side. Supports points, line, and table views with configurable X-axis labels.

Examples

The io.compprov.examples package contains three self-contained examples that each demonstrate a different aspect of the framework.

Example	Package	Domain	Key technique
Net Asset Value (NAV)	`io.compprov.examples.nav`	Crypto-portfolio accounting	Custom domain type wrappers
Gauge Block Calibration	`io.compprov.examples.gaugeblock`	Precision length metrology	Pure `BigDecimal` scalar formula chain
Hydrological Model Evaluation	`io.compprov.examples.hydrology`	River discharge modelling	List-based tracked operations

Net Asset Value (NAV)

io.compprov.examples.nav · NetAssetValueCalculator.calculate()

Computes the total USD value of a multi-asset crypto portfolio (BTC, ETH, USDC positions held across Binance, staking, and Morpho DeFi) by converting each position to USD at a spot rate and summing the results.

The primary focus is showing how to wrap custom domain types. The domain model uses Amount and Rate objects rather than raw BigDecimal, and the example integrates them with the framework without modifying them — using the three-step pattern:

WrappedAmount / WrappedRate extend AbstractWrappedVariable<T> and declare their operations (add, convert, addBulk) as Descriptor constants mapped to computation lambdas.
AmountWrapper / RateWrapper implement VariableWrapper<T> — the one-method factory the framework calls to instantiate tracked variables.
NavComputationContext extends DefaultComputationContext, registers both wrappers with the shared ComputationEnvironment, and exposes typed wrap(Amount, ...) / wrap(Rate, ...) convenience overloads.

After the calculation the snapshot is serialized to JSON, then deserialized and replayed via NavComputationContext.environment.compute() — verifying that the CPG is round-trip stable and the replayed output matches the original result.

Gauge Block Calibration

io.compprov.examples.gaugeblock · GaugeBlockCalibration.calibrate()

Reproduces the interferometric calibration of a 7 mm tungsten carbide gauge block (NRC 91A) from the following paper, which uses this measurement as a demonstration of metrological provenance management:

Ryan M. White, Provenance in the Context of Metrological Traceability, Metrology 2025, 5(3), 52. DOI: 10.3390/metrology5030052

The computation chain has three stages, all tracked in the CPG:

Refractive index — the Birch–Downs modified Ciddor equation (8 tracked steps) converts air temperature, pressure, relative humidity, CO₂ concentration, and saturation vapor pressure into the refractive index n of the measurement medium.
Interferometric length — the HeNe laser vacuum wavelength (632.99 nm) divided by n gives the air wavelength; the observed fringe order m + f gives the raw length L_raw = (m + f) × λ_air / 2.
Thermal correction — the raw length is corrected to the ISO 1 reference temperature (20 °C) using the tungsten carbide expansion coefficient α = 4.23 × 10⁻⁶ K⁻¹ from the paper: L_cal = L_raw / (1 + α × ΔT).

The deviation from the 7 mm nominal length is asserted to round to +2 nm, matching the paper's reported result (expanded uncertainty U = 31 nm, k = 2).

This example uses only built-in WrappedBigDecimal arithmetic — no custom wrappers needed — showing that the framework handles complex pure-scalar formula chains out of the box.

Hydrological Model Evaluation

io.compprov.examples.hydrology · MhmDischargeEvaluation.evaluateParameterSetP1()

Evaluates the mesoscale Hydrologic Model (mHM) output against observed river discharge at the Moselle River basin upstream of Perl (~11 500 km², Luxembourg/Germany), as described in:

Villamar et al., Archivist: a metadata management tool for facilitating FAIR research, Scientific Data, 2025. DOI: 10.1038/s41597-025-04521-6

The metric is the Kling-Gupta Efficiency (KGE) (Gupta et al. 2009):

KGE = 1 − √[ (r−1)² + (α−1)² + (β−1)² ]

  r = Pearson correlation = Σ(devObs · devSim) / √(Σ devObs² · Σ devSim²)
  α = variability ratio  = σ_sim / σ_obs
  β = bias ratio         = μ_sim / μ_obs

KGE = 1 is perfect; values below 0 indicate the model is worse than the observed mean as a predictor. The paper reports that parameter set P₁ outperforms P₂ with scores mostly above 0.5.

The computation uses ArrayList<WrappedBigDecimal> with loops and addBulk, demonstrating the pattern for list-based tracked operations where the number of time steps is dynamic. The 8-step chain (means → deviations → squared deviations and cross products → sums → r → α → β → KGE) is fully recorded in the CPG, with every intermediate quantity named and traceable. The synthetic dataset is engineered so that r = 1, β = 1, α = 0.9, giving KGE = 0.9 exactly, verified by exact BigDecimal equality.

License

Apache License 2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.github		.github
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

compprov-core

Contents

Core concepts

Getting started

Usage example

JSON output (abbreviated)

Snapshot: export, replay, and diff

Serialize and deserialize

Replay a computation

Change an input and propagate

Extending with custom type wrappers

Step 1 — Create the wrapped class

Step 2 — Create the factory

Step 3 — Register and use

Type deserialization

Built-in types

Thread safety

Visualization

Graph view — provenance graph

Plot view — multi-dataset comparison

Examples

Net Asset Value (NAV)

Gauge Block Calibration

Hydrological Model Evaluation

License

About

Uh oh!

Releases 5

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

compprov-core

Contents

Core concepts

Getting started

Usage example

JSON output (abbreviated)

Snapshot: export, replay, and diff

Serialize and deserialize

Replay a computation

Change an input and propagate

Extending with custom type wrappers

Step 1 — Create the wrapped class

Step 2 — Create the factory

Step 3 — Register and use

Type deserialization

Built-in types

Thread safety

Visualization

Graph view — provenance graph

Plot view — multi-dataset comparison

Examples

Net Asset Value (NAV)

Gauge Block Calibration

Hydrological Model Evaluation

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages