# Demonstration of Synthius

This notebook demonstrates the core functionality of Synthius through its high-level programming interface.

At this stage, Synthius consists of two steps:
1. Generation of synthetic data from the original data, using one or more generation models.
2. Evaluation of the synthetic data, with various metrics as output.

The highest-level API runs both steps in a single call. The steps are loosely coupled, however, so each one can also be executed individually.



### Step 1 - Install Synthius

Run the following command in the CLI: `pip install synthius`.
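
To confirm from inside the notebook that the install succeeded, a quick look at the package metadata can help. This is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of Synthius.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if Synthius is installed, otherwise None.
print(installed_version("synthius"))
```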

### Step 2 - Prepare Your Data

Ensure your original data is stored as a CSV file in its own directory.

For example:

Data directory: `/data`

Data filename: `data.csv`

Data absolute path: `/data/data.csv`
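
Before running anything, it can be worth checking that the file is where you expect and that its header looks right. The sketch below uses only the standard library; `read_csv_header` is a hypothetical helper, not a Synthius function, and it assumes the first row of the file holds the column names.

```python
import csv
from pathlib import Path
from typing import List


def read_csv_header(data_dir: str, filename: str) -> List[str]:
    """Return the column names of a CSV file located in `data_dir`."""
    path = Path(data_dir) / filename
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No such data file: {path}")
    with path.open(newline="", encoding="utf-8") as handle:
        # Assumption: the first row of the file is the header.
        return next(csv.reader(handle), [])
```

With the layout above, `read_csv_header("/data", "data.csv")` would return the list of column names, or raise if the file is missing.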

### Step 3 - Prepare Your Models

Note that Synthius exposes an interface that lets you use any model that follows the Python protocol defined in `synthius/model/synthesizer.py`:

```python
@runtime_checkable
class Synthesizer(Protocol):
    """Any synthetic data model should implement this interface."""
    metadata: Optional[SingleTableMetadata]
    name: str  # unique name for saving/output

    def fit(self, train_data: pd.DataFrame) -> None:
        ...
    
    def generate(self, total_samples: int, conditions: list = None) -> pd.DataFrame:
        ...
```
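
To make the protocol concrete, here is a toy class that satisfies it structurally. This is a sketch under stated assumptions: the `Synthesizer` protocol is re-declared locally (with `metadata` typed loosely) so the example runs without Synthius or SDV installed, and `BootstrapSynthesizer` is a hypothetical name. It simply resamples training rows, so it illustrates the interface only and offers no privacy protection.

```python
from typing import Any, Optional, Protocol, runtime_checkable

import pandas as pd


@runtime_checkable
class Synthesizer(Protocol):
    """Local stand-in for the Synthius protocol, so this sketch runs on its own."""

    metadata: Optional[Any]  # SingleTableMetadata in the real protocol
    name: str

    def fit(self, train_data: pd.DataFrame) -> None: ...

    def generate(self, total_samples: int, conditions: Optional[list] = None) -> pd.DataFrame: ...


class BootstrapSynthesizer:
    """Toy model: 'generates' rows by resampling the training data with replacement."""

    def __init__(self, seed: int = 0) -> None:
        self.metadata = None
        self.name = "bootstrap"  # hypothetical name used for saving/output
        self._seed = seed
        self._train: Optional[pd.DataFrame] = None

    def fit(self, train_data: pd.DataFrame) -> None:
        self._train = train_data.copy()

    def generate(self, total_samples: int, conditions: Optional[list] = None) -> pd.DataFrame:
        if self._train is None:
            raise RuntimeError("Call fit() before generate().")
        # `conditions` is accepted for interface compatibility but ignored here.
        sample = self._train.sample(n=total_samples, replace=True, random_state=self._seed)
        return sample.reset_index(drop=True)


model = BootstrapSynthesizer()
model.fit(pd.DataFrame({"Age": [25, 40, 61], "Sex": ["F", "M", "F"]}))
print(isinstance(model, Synthesizer))  # structural check enabled by @runtime_checkable
print(model.generate(5).shape)
```

Because the protocol is `runtime_checkable`, the `isinstance` check only verifies that the attributes and methods exist; it does not validate signatures or return types.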

For this demonstration, we use wrappers from `synthius.model` that implement the above interface.

In [1]:
from sdv.single_table import CopulaGANSynthesizer

from synthius.model import (
    ARFSynthesizer,
    SDVSynthesizer,
)

models = [
    SDVSynthesizer(CopulaGANSynthesizer),
    ARFSynthesizer(id_column=None),
]



### Step 4 - Prepare Field Metadata

The evaluation phase of Synthius requires some field metadata. In particular, you must provide `key_fields`, `sensitive_fields`, and `aux_cols`.

These are:
 Description here

 

In [2]:
key_fields = [
    "Age",
    "Education",
    "Occupation",
    "Income",
    "Marital-status",
    "Native-country",
    "Relationship",
]

sensitive_fields = ["Race", "Sex"]


aux_cols = [
    ["Occupation", "Education", "Education-num", "Hours-per-week", "Capital-loss", "Capital-gain"],
    ["Race", "Sex", "Fnlwgt", "Age", "Native-country", "Workclass", "Marital-status", "Relationship"],
]
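
Since every entry in these lists has to name a column in the original data, a typo is easy to miss until evaluation fails. A small pre-flight check along the following lines may help; `missing_columns` is a hypothetical helper, not part of Synthius, and it only compares names against a list of columns that you supply.

```python
from typing import Iterable, List, Sequence


def missing_columns(
    columns: Iterable[str],
    key_fields: Sequence[str],
    sensitive_fields: Sequence[str],
    aux_cols: Sequence[Sequence[str]],
) -> List[str]:
    """Return the referenced field names that are absent from `columns`, sorted."""
    available = set(columns)
    referenced = set(key_fields) | set(sensitive_fields)
    for group in aux_cols:
        referenced.update(group)
    return sorted(referenced - available)


# Example with a deliberately incomplete column list.
print(missing_columns(["Age", "Sex"], ["Age"], ["Sex"], [["Age"], ["Race"]]))
```

An empty result means every referenced field exists in the data.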

### Step 5 - Run Synthius

Now we call the API and pass in the appropriate caching directories.

The synthetic data directory is specified by `synth_dir`.

The resulting metrics directory is specified by `results_dir`.

When no models are specified, the default models in Synthius are used.

Note that the random seed only controls the train-test split of the original data.
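
To illustrate what a seed-controlled split means, the sketch below partitions row indices reproducibly with the standard library. It is not how Synthius performs its split internally; the function name and the 20% test fraction are assumptions chosen for illustration. The point is only that the same seed yields the same partition, while anything outside the split (such as model training or sampling) may still vary between runs.

```python
import random
from typing import List, Tuple


def split_indices(n_rows: int, test_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a fixed seed and return (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG, leaves global state untouched
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


train_a, test_a = split_indices(10, 0.2, seed=42)
train_b, test_b = split_indices(10, 0.2, seed=42)
print(train_a == train_b and test_a == test_b)  # same seed, same partition
```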

In [3]:
from synthius.api.high_level import run_synthius

run_synthius(
    original_data_filename="adult_subset.csv",
    data_dir="/storage/Synthius/examples/data",
    synth_dir="/storage/Synthius/examples/synthetic_data",
    models_dir="/storage/Synthius/examples/models",
    results_dir="/storage/Synthius/examples/metrics",
    target_column="Income",
    key_fields=key_fields,
    sensitive_fields=sensitive_fields,
    aux_cols=aux_cols,
    models=models,
    random_seed=42,
)

INFO:sdv.metadata.single_table:Detected metadata:
INFO:sdv.metadata.single_table:{
    "columns": {
        "Unnamed: 0": {
            "sdtype": "id"
        },
        "Age": {
            "sdtype": "numerical"
        },
        "Workclass": {
            "sdtype": "unknown",
            "pii": true
        },
        "Fnlwgt": {
            "sdtype": "unknown",
            "pii": true
        },
        "Education": {
            "sdtype": "unknown",
            "pii": true
        },
        "Education-num": {
            "sdtype": "numerical"
        },
        "Marital-status": {
            "sdtype": "unknown",
            "pii": true
        },
        "Occupation": {
            "sdtype": "unknown",
            "pii": true
        },
        "Relationship": {
            "sdtype": "unknown",
            "pii": true
        },
        "Race": {
            "sdtype": "categorical"
        },
        "Sex": {
            "sdtype": "categorical"
        },
        "Capital-gain":

Initial accuracy is 0.125

