# VSDF End-to-End Walkthrough

This notebook demonstrates how to learn constraints from a tabular dataset, generate synthetic samples, and verify fidelity and privacy guarantees using the Verifiable Synthetic Data Forge (VSDF).

In [None]:
import pandas as pd
from vsdf import (
    SchemaLearner,
    ConstraintSpecification,
    ConstraintCompiler,
    ConstraintDrivenSampler,
    ConstraintVerifier,
)

pd.options.display.float_format = '{:.2f}'.format

In [None]:
# Create a small reference dataset
reference = pd.DataFrame({
    'age': [25, 32, 40, 28, 36, 52, 47, 30, 45, 38],
    'income': [50000, 62000, 58000, 52000, 61000, 75000, 68000, 54000, 72000, 59000],
    'segment': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B', 'A', 'B'],
    'city': ['Denver', 'Denver', 'Boulder', 'Denver', 'Boulder', 'Denver', 'Boulder', 'Denver', 'Boulder', 'Denver'],
})
reference

In [None]:
# Learn schema metadata and compile constraints
schema = SchemaLearner().learn(reference)
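spec = ConstraintSpecification(
    marginal_columns=['segment', 'city'],       # columns checked for marginal distribution distance
    correlation_pairs=[('age', 'income')],      # column pair checked for correlation delta
    marginal_tolerance=0.1,
    correlation_tolerance=0.1,
    denial_predicates=['age < 21 and income > 60000'],  # row pattern that should not occur
    denial_tolerance=0.0,                       # accepted denial-constraint violation rate
    dp_epsilon=8.0,                             # DP budget
)
compiler = ConstraintCompiler(schema)
constraints = compiler.learn(reference, spec)
# Display the DP budget carried by the compiled constraints
constraints.dp_epsilon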
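
To make the specification more concrete, the following pandas-only sketch computes the reference statistics that the three constraint families refer to: category frequencies for the marginal columns, the correlation of the (`age`, `income`) pair, and the rows matching the denial predicate. It does not call VSDF. This notebook does not show how the compiler represents these targets internally, so the use of Pearson correlation and `DataFrame.query` here is an assumption made for illustration.

```python
import pandas as pd

# Same reference data as above, repeated so this sketch runs on its own.
reference = pd.DataFrame({
    'age': [25, 32, 40, 28, 36, 52, 47, 30, 45, 38],
    'income': [50000, 62000, 58000, 52000, 61000, 75000, 68000, 54000, 72000, 59000],
    'segment': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B', 'A', 'B'],
    'city': ['Denver', 'Denver', 'Boulder', 'Denver', 'Boulder',
             'Denver', 'Boulder', 'Denver', 'Boulder', 'Denver'],
})

# Marginal targets: relative frequency of each category.
segment_freq = reference['segment'].value_counts(normalize=True)
city_freq = reference['city'].value_counts(normalize=True)

# Correlation target (assumption: Pearson, the pandas default).
age_income_corr = reference['age'].corr(reference['income'])

# Denial predicate: rows that match the forbidden pattern.
denied_rows = reference.query('age < 21 and income > 60000')

print(segment_freq.to_dict())
print(city_freq.to_dict())
print(age_income_corr)
print(len(denied_rows))
```

For this ten-row frame, both segments have frequency 0.5, Denver accounts for 0.6 of the rows, and no reference row matches the denial predicate.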

In [None]:
# Generate synthetic data under the compiled constraints
sampler = ConstraintDrivenSampler(schema, constraints, random_state=42)
synthetic = sampler.sample(200)
synthetic.head()

In [None]:
# Verify fidelity and privacy metrics
verifier = ConstraintVerifier(constraints, privacy_threshold=0.05)
report = verifier.verify(synthetic, reference)
report.to_dict()

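The verifier's report summarizes constraint adherence: marginal distribution distances, correlation deltas, denial-constraint violation rates, and the estimated privacy risk (re-identification propensity). To tighten or loosen the acceptance criteria, adjust `marginal_tolerance`, `correlation_tolerance`, and `denial_tolerance` in the `ConstraintSpecification`, the `dp_epsilon` budget (a smaller epsilon gives a stronger differential-privacy guarantee, usually at some cost in fidelity), or the verifier's `privacy_threshold`.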
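
As a sanity check on the report, the same kinds of metrics can be approximated with plain pandas. The sketch below is not VSDF's implementation. The helper names are hypothetical, and the metric definitions are assumptions: total variation distance for marginals, absolute Pearson difference for correlations, and an exact-match rate as a crude stand-in for re-identification risk. The two small frames are hand-written for illustration and are not sampler output.

```python
import pandas as pd

def marginal_tv_distance(synthetic, reference, column):
    """Total variation distance between the category frequencies of one column."""
    p = reference[column].value_counts(normalize=True)
    q = synthetic[column].value_counts(normalize=True)
    # Categories missing on one side are treated as frequency 0.
    return 0.5 * p.sub(q, fill_value=0.0).abs().sum()

def correlation_delta(synthetic, reference, col_a, col_b):
    """Absolute difference between the Pearson correlations of a column pair."""
    return abs(synthetic[col_a].corr(synthetic[col_b])
               - reference[col_a].corr(reference[col_b]))

def denial_violation_rate(synthetic, predicate):
    """Share of synthetic rows that match a denial predicate."""
    return len(synthetic.query(predicate)) / len(synthetic)

def exact_match_rate(synthetic, reference):
    """Share of synthetic rows that copy a reference row exactly."""
    unique_reference = reference.drop_duplicates()
    matches = synthetic.merge(unique_reference, how='inner',
                              on=list(reference.columns))
    return len(matches) / len(synthetic)

# Hand-written illustrative data (hypothetical, not produced by VSDF).
reference_demo = pd.DataFrame({
    'age': [25, 40, 52, 30],
    'income': [50000, 58000, 75000, 54000],
    'segment': ['A', 'A', 'B', 'B'],
})
synthetic_demo = pd.DataFrame({
    'age': [25, 19, 45, 33],
    'income': [50000, 65000, 70000, 56000],
    'segment': ['A', 'B', 'B', 'B'],
})

metrics = {
    'segment_tv': marginal_tv_distance(synthetic_demo, reference_demo, 'segment'),
    'age_income_corr_delta': correlation_delta(synthetic_demo, reference_demo,
                                               'age', 'income'),
    'denial_rate': denial_violation_rate(synthetic_demo,
                                         'age < 21 and income > 60000'),
    'exact_match_rate': exact_match_rate(synthetic_demo, reference_demo),
}
print(metrics)
```

In this toy example, one of four synthetic rows violates the denial predicate and one copies a reference row exactly, so both rates are 0.25. With `denial_tolerance=0.0`, a violation rate above zero would presumably fail the check. If these hand-computed values differ from `report.to_dict()`, the likely cause is a difference in metric definitions rather than an error in either.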