# Demonstration of Synthius

This notebook demonstrates the core functionality of Synthius through its high-level programming interface.

At this stage, Synthius consists of two steps:
1. Generation of synthetic data from the original data, using one or more generation models.
2. Evaluation of the synthetic data, with various metrics as output.

The highest-level API allows both steps to be done with one call, although it is designed to be loosely coupled so that each step can also be executed individually.



### Step 1 - Install Synthius

Run the following command in the CLI: `pip install synthius`.

### Step 2 - Prepare Your Data

Ensure your original data is stored as a CSV file in its own directory.

For example:

Data directory: `/data`

Data filename: `data.csv`

Data absolute path: `/data/data.csv`
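
Since the API call in Step 5 receives the data directory and the filename separately, it can help to confirm up front that the two combine into an existing CSV file. The following is a minimal sketch using only the standard library; `resolve_data_path` is a hypothetical helper written for this illustration, not part of Synthius.

```python
from pathlib import Path


def resolve_data_path(data_dir: str, filename: str) -> Path:
    """Join directory and filename, and verify that a CSV file exists there.

    Hypothetical helper for illustration; not part of Synthius.
    """
    path = Path(data_dir) / filename
    # Reject anything that is not named like a CSV file.
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")
    # Fail early, with a clear message, if the file is missing.
    if not path.is_file():
        raise FileNotFoundError(f"No data file found at: {path}")
    return path
```

With the layout above, `resolve_data_path("/data", "data.csv")` would return the path `/data/data.csv` if that file exists, and raise otherwise.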

### Step 3 - Prepare Your Models

Synthius exposes an interface that lets you use any model that follows the Python protocol defined in `synthius/model/synthesizer.py`:

```
# Imports needed by this excerpt
from typing import Optional, Protocol, runtime_checkable

import pandas as pd
from sdv.metadata import SingleTableMetadata


@runtime_checkable
class Synthesizer(Protocol):
    """Any synthetic data model should implement this interface."""

    metadata: Optional[SingleTableMetadata]
    name: str  # unique name for saving/output

    def fit(self, train_data: pd.DataFrame) -> None:
        ...

    def generate(self, total_samples: int, conditions: list = None) -> pd.DataFrame:
        ...
```

For this demonstration, we use two of the wrappers in `synthius.model` that implement the above interface.

```
from synthius.model import (
    SDVSynthesizer,
    ARFSynthesizer,
)

from sdv.single_table import CopulaGANSynthesizer

models = [
    SDVSynthesizer(CopulaGANSynthesizer),
    ARFSynthesizer(id_column=None),
]
```
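
Because the interface is a structural protocol, a custom model does not need to inherit from anything; it only has to provide the same attributes and methods. The sketch below is an illustrative assumption rather than a model shipped with Synthius: `BootstrapSynthesizer` is a hypothetical class that simply resamples training rows with replacement, and the protocol is re-declared locally (with `metadata` loosened to `Optional[Any]`) so that the example runs without Synthius or SDV installed.

```python
from typing import Any, Optional, Protocol, runtime_checkable

import pandas as pd


@runtime_checkable
class Synthesizer(Protocol):
    """Local copy of the protocol shown earlier, for a self-contained example."""

    metadata: Optional[Any]  # SingleTableMetadata in Synthius; loosened here
    name: str

    def fit(self, train_data: pd.DataFrame) -> None: ...

    def generate(self, total_samples: int, conditions: list = None) -> pd.DataFrame: ...


class BootstrapSynthesizer:
    """Hypothetical baseline: resamples training rows with replacement."""

    def __init__(self, seed: int = 0) -> None:
        self.metadata = None
        self.name = "BootstrapSynthesizer"
        self._seed = seed
        self._train: Optional[pd.DataFrame] = None

    def fit(self, train_data: pd.DataFrame) -> None:
        # "Training" only stores a copy of the data.
        self._train = train_data.copy()

    def generate(self, total_samples: int, conditions: list = None) -> pd.DataFrame:
        if self._train is None:
            raise RuntimeError("Call fit() before generate().")
        # `conditions` is accepted for interface compatibility but ignored here.
        sample = self._train.sample(
            n=total_samples, replace=True, random_state=self._seed
        )
        return sample.reset_index(drop=True)


model = BootstrapSynthesizer()
model.fit(pd.DataFrame({"Age": [25, 40, 31], "Sex": ["F", "M", "F"]}))
synthetic = model.generate(total_samples=5)

# Structural check: no inheritance is involved.
print(isinstance(model, Synthesizer))
```

A bootstrap sample copies real rows verbatim, so it offers no privacy protection; it is used here only because it makes the shape of the interface easy to see.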

### Step 4 - Prepare Field Metadata

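Certain metadata must be specified for the evaluation phase of Synthius. Notably, one must provide `key_fields`, `sensitive_fields`, and `aux_cols`.

Going by their names and their usual meaning in privacy evaluation, these are roughly:

1. `key_fields`: columns an attacker is assumed to already know about a record.
2. `sensitive_fields`: columns the attacker would try to infer from the key fields.
3. `aux_cols`: two lists of columns, each treated as a separate set of auxiliary information when testing whether records can be linked.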

```
key_fields = [
    "Age",
    "Education",
    "Occupation",
    "Income",
    "Marital-status",
    "Native-country",
    "Relationship",
]

sensitive_fields = ["Race", "Sex"]

aux_cols = [
    ["Occupation", "Education", "Education-num", "Hours-per-week", "Capital-loss", "Capital-gain"],
    ["Race", "Sex", "Fnlwgt", "Age", "Native-country", "Workclass", "Marital-status", "Relationship"],
]
```
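
Column names are matched as plain strings, so a name in these lists that differs from the CSV header in case or hyphenation will presumably not be found during evaluation. The following is a small sketch of a pre-flight check, not something Synthius provides: `missing_columns` is a hypothetical helper, the `demo_*` lists are shortened stand-ins for the real ones, and the header row is written out by hand here, whereas in practice it would be read from the data file.

```python
from typing import Iterable, List


def missing_columns(header: Iterable[str], *field_groups: Iterable[str]) -> List[str]:
    """Return the requested field names (sorted) that are absent from the header.

    Hypothetical helper for illustration; not part of Synthius.
    """
    available = set(header)
    requested = {name for group in field_groups for name in group}
    return sorted(requested - available)


# Illustrative header; in practice, read it from the first row of the CSV.
header = ["Age", "Workclass", "Education", "Race", "Sex", "Income"]

demo_key_fields = ["Age", "Education", "Income"]
demo_sensitive_fields = ["Race", "Sex"]
demo_aux_cols = [["Education", "Age"], ["Race", "sex"]]  # "sex" is a deliberate typo

# aux_cols is a list of lists, so it is unpacked into separate groups.
problems = missing_columns(header, demo_key_fields, demo_sensitive_fields, *demo_aux_cols)
print(problems)  # ['sex']
```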

### Step 5 - Run Synthius

Now we simply call the API and pass in the appropriate caching directories. 

The synthetic data directory is specified by `synth_dir`.

The resulting metrics directory is specified by `results_dir`.

When no models are specified, the default models in Synthius are used.

Note that the random seed only controls the train/test split of the original data.
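
To make the role of the seed concrete: a seeded split assigns the same rows to the training and test sets on every run, independently of whatever randomness the generation models use. The sketch below relies only on the standard library and is not Synthius's internal splitting code; `split_indices` and the 80/20 ratio are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def split_indices(n_rows: int, test_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a dedicated RNG and split them into train/test."""
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


# Same seed, same split: the two calls return identical index lists.
train_a, test_a = split_indices(n_rows=20, test_fraction=0.2, seed=42)
train_b, test_b = split_indices(n_rows=20, test_fraction=0.2, seed=42)
print(train_a == train_b and test_a == test_b)  # True
```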

```
from synthius.api.high_level import run_synthius

run_synthius(
    original_data_filename="adult_subset.csv",
    data_dir="/storage/Synthius/examples/data",
    synth_dir="/storage/Synthius/examples/synthetic_data",
    models_dir="/storage/Synthius/examples/models",
    results_dir="/storage/Synthius/examples/metrics",
    target_column="Income",
    key_fields=key_fields,
    sensitive_fields=sensitive_fields,
    aux_cols=aux_cols,
    models=models,
    random_seed=42,
)
```

INFO:SingleTableSynthesizer:{'EVENT': 'Instance', 'TIMESTAMP': datetime.datetime(2025, 11, 18, 15, 20, 13, 899673), 'SYNTHESIZER CLASS NAME': 'CopulaGANSynthesizer', 'SYNTHESIZER ID': 'CopulaGANSynthesizer_1.16.1_179088bcbac746a2acbb4e8f00ba53c7'}
INFO:SingleTableSynthesizer:{'EVENT': 'Fit', 'TIMESTAMP': datetime.datetime(2025, 11, 18, 15, 20, 13, 900934), 'SYNTHESIZER CLASS NAME': 'CopulaGANSynthesizer', 'SYNTHESIZER ID': 'CopulaGANSynthesizer_1.16.1_179088bcbac746a2acbb4e8f00ba53c7', 'TOTAL NUMBER OF TABLES': 1, 'TOTAL NUMBER OF ROWS': 16, 'TOTAL NUMBER OF COLUMNS': 16}
INFO:sdv.data_processing.data_processor:Fitting table  metadata
INFO:sdv.data_processing.data_processor:Fitting formatters for table 
INFO:sdv.data_processing.data_processor:Fitting constraints for table 
INFO:sdv.data_processing.data_processor:Setting the configuration for the ``HyperTransformer`` for table 
INFO:sdv.data_processing.data_processor:Fitting HyperTransformer for table 
INFO:SingleTableSynthesizer:{'EVEN

[Info] Generating Synthetic Data


INFO:rdt.transformers.null:Guidance: There are no missing values in column Hours-per-week. Extra column not created.
INFO:rdt.transformers.null:Guidance: There are no missing values in column Age. Extra column not created.
INFO:rdt.transformers.null:Guidance: There are no missing values in column Education-num. Extra column not created.
INFO:rdt.transformers.null:Guidance: There are no missing values in column Hours-per-week. Extra column not created.
Sampling conditions:  12%|█▎        | 2/16 [00:00<00:01,  8.21it/s]INFO:sdv.single_table.base:2 valid rows remaining. Resampling 4 rows
Sampling conditions:  62%|██████▎   | 10/16 [00:00<00:00, 14.91it/s]INFO:sdv.single_table.base:6 valid rows remaining. Resampling 12 rows
Sampling conditions: 100%|██████████| 16/16 [00:01<00:00, 15.07it/s]
INFO:sdv.metadata.single_table:Detected metadata:
INFO:sdv.metadata.single_table:{
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "primary_key": "Unnamed: 0",
    "columns": {
        "Unnamed: 0"

[Info] Model CopulaGANSynthesizer finished. Saved to /storage/Synthius/examples/synthetic_data/CopulaGANSynthesizer.csv
[Error] ARFSynthesizer: name 'ARF' is not defined
[Info] Evaluating Synthetic Data
Getting metrics for the original dataset only.


ERROR:root:Evaluation failed for Original.csv
Traceback (most recent call last):
  File "/storage/Synthius/synthius/metric/basic_quality.py", line 247, in evaluate_all
    result = self.evaluate(path)
             ^^^^^^^^^^^^^^^^^^^
  File "/storage/Synthius/synthius/metric/basic_quality.py", line 216, in evaluate
    results["New Row Synthesis"] = self.evaluate_new_row(synthetic_data)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/storage/Synthius/synthius/metric/basic_quality.py", line 164, in evaluate_new_row
    for i in range(0, len(synthetic_data), chunk_size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: range() arg 3 must not be zero
ERROR:root:run_basic_quality_metrics skipped due to No valid metrics found in the results. Check the selected metrics.
INFO:synthius.metric.advanced_quality:Advanced Quality for Original Done.
INFO:root:Likelihood for Original Done.
INFO:root:Privacy Against Inference for Original Done.
INFO: