# Tutorial 2: Benchmarks

`synthcity` supports comparing the performance of multiple models on a given dataset.

In this tutorial, we will cover how to run benchmarks and which metrics are available.

The available metrics depend on the data loader and the task type.

In [1]:
!pip install synthcity
!pip uninstall -y torchaudio torchdata

Collecting torchdata==0.7.1 (from torchtext>=0.10->decaf-synthetic-data>=0.1.6->synthcity)
  Using cached torchdata-0.7.1-cp39-cp39-win_amd64.whl.metadata (13 kB)
Using cached torchdata-0.7.1-cp39-cp39-win_amd64.whl (1.3 MB)
Installing collected packages: torchdata
Successfully installed torchdata-0.7.1
Found existing installation: torchdata 0.7.1
Uninstalling torchdata-0.7.1:
  Successfully uninstalled torchdata-0.7.1




In [2]:
# stdlib
import sys
import warnings

warnings.filterwarnings("ignore")

# third party
from sklearn.datasets import load_iris

# synthcity absolute
import synthcity.logger as log
from synthcity.plugins import Plugins
from synthcity.plugins.core.dataloader import GenericDataLoader

X, y = load_iris(return_X_y=True, as_frame=True)
X["target"] = y

loader = GenericDataLoader(X, target_column="target", sensitive_columns=[])

loader.dataframe()



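|     | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|-----|-------------------|------------------|-------------------|------------------|--------|
| 0   | 5.1               | 3.5              | 1.4               | 0.2              | 0      |
| 1   | 4.9               | 3.0              | 1.4               | 0.2              | 0      |
| 2   | 4.7               | 3.2              | 1.3               | 0.2              | 0      |
| 3   | 4.6               | 3.1              | 1.5               | 0.2              | 0      |
| 4   | 5.0               | 3.6              | 1.4               | 0.2              | 0      |
| ... | ...               | ...              | ...               | ...              | ...    |
| 145 | 6.7               | 3.0              | 5.2               | 2.3              | 2      |
| 146 | 6.3               | 2.5              | 5.0               | 1.9              | 2      |
| 147 | 6.5               | 3.0              | 5.2               | 2.0              | 2      |
| 148 | 6.2               | 3.4              | 5.4               | 2.3              | 2      |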


## List the available generative models

In [3]:
# synthcity absolute
from synthcity.plugins import Plugins

plugins = Plugins().list()

plugins

[2024-11-24T20:32:45.787282+0800][1508][CRITICAL] module disabled: e:\qycache\anaconda\envs\LLM\lib\site-packages\synthcity\plugins\generic\plugin_goggle.py


['dpgan',
 'adsgan',
 'marginal_distributions',
 'bayesian_network',
 'survival_gan',
 'timegan',
 'privbayes',
 'decaf',
 'survae',
 'fflows',
 'survival_ctgan',
 'survival_nflow',
 'ddpm',
 'ctgan',
 'uniform_sampler',
 'nflow',
 'timevae',
 'aim',
 'great',
 'image_adsgan',
 'image_cgan',
 'tvae',
 'pategan',
 'arf',
 'radialgan',
 'rtvae',
 'dummy_sampler']

## Benchmarking metrics

| **Metric**                                         | **Description**                                                                                                            |
|----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| sanity.data\_mismatch.score                        | Data types mismatch between the real/synthetic features                                                                    |
| sanity.common\_rows\_proportion.score              | Real data copy-paste in the synthetic data                                                                                 |
| sanity.nearest\_syn\_neighbor\_distance.mean       | Computes the `<reduction>(distance)` from the real data to the closest neighbor in the synthetic data                      |
| sanity.close\_values\_probability.score            | the probability of close values between the real and synthetic data.                                                       |
| sanity.distant\_values\_probability.score          | the probability of distant values between the real and synthetic data.                                                     |
| stats.jensenshannon\_dist.marginal                 | the average Jensen-Shannon distance                                                                                        |
| stats.chi\_squared\_test.marginal                  | the one-way chi-square test.                                                                                               |
| stats.feature\_corr.joint                          | the correlation/strength-of-association of features in data-set with both categorical and continuous features              |
| stats.inv\_kl\_divergence.marginal                 | the average inverse of the Kullback–Leibler Divergence metric.                                                             |
| stats.ks\_test.marginal                            | the Kolmogorov-Smirnov test for goodness of fit.                                                                           |
| stats.max\_mean\_discrepancy.joint                 | Empirical maximum mean discrepancy. The lower the result the more evidence that distributions are the same.                |
| stats.prdc.precision                               | precision between the two manifolds                                                                                        |
| stats.prdc.recall                                  | recall between the two manifolds                                                                                           |
| stats.prdc.density                                 | density between the two manifolds                                                                                          |
| stats.prdc.coverage                                | coverage between the two manifolds                                                                                         |
| stats.alpha\_precision.delta\_precision\_alpha\_OC | Delta precision                                                                                                            |
| stats.alpha\_precision.delta\_coverage\_beta\_OC   | Delta coverage                                                                                                             |
| stats.alpha\_precision.authenticity\_OC            | Authenticity                                                                                                               |
| performance.linear\_model.gt.aucroc                | Train on real, test on the test real data using LogisticRegression: AUCROC                                                 |
| performance.linear\_model.syn\_id.aucroc           | Train on synthetic, test on the train real data using LogisticRegression: AUCROC                                           |
| performance.linear\_model.syn\_ood.aucroc          | Train on synthetic, test on the test real data using LogisticRegression: AUCROC                                            |
| performance.mlp.gt.aucroc                          | Train on real, test on the test real data using NN: AUCROC                                                                 |
| performance.mlp.syn\_id.aucroc                     | Train on synthetic, test on the train real data using NN: AUCROC                                                           |
| performance.mlp.syn\_ood.aucroc                    | Train on synthetic, test on the test real data using NN: AUCROC                                                            |
| performance.xgb.gt.aucroc                          | Train on real, test on the test real data using XGB: AUCROC                                                                |
| performance.xgb.syn\_id.aucroc                     | Train on synthetic, test on the train real data using XGB: AUCROC                                                          |
| performance.xgb.syn\_ood.aucroc                    | Train on synthetic, test on the test real data using XGB: AUCROC                                                           |
| performance.feat\_rank\_distance.corr              | Correlation for the rank distances between the feature importance on real and synthetic data                               |
| performance.feat\_rank\_distance.pvalue            | p-value for the rank distances between the feature importance on real and synthetic data                                   |
| detection.detection\_xgb.mean                      | The average AUCROC score for detecting synthetic data using an XGBoost.                                                    |
| detection.detection\_mlp.mean                      | The average AUCROC score for detecting synthetic data using a NN.                                                          |
| detection.detection\_gmm.mean                      | The average AUCROC score for detecting synthetic data using a GMM.                                                         |
| privacy.delta-presence.score                       | the maximum re-identification probability on the real dataset from the synthetic dataset.                                  |
| privacy.k-anonymization.gt                         | the k-anon for the real data                                                                                               |
| privacy.k-anonymization.syn                        | the k-anon for the synthetic data                                                                                          |
| privacy.k-map.score                                | the minimum value k that satisfies the k-map rule.                                                                         |
| privacy.distinct l-diversity.gt                    | the l-diversity for the real data                                                                                          |
| privacy.distinct l-diversity.syn                   | the l-diversity for the synthetic data                                                                                     |
| privacy.identifiability\_score.score               | the re-identification score on the real dataset from the synthetic dataset.                                                |
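
To make two of the entries above more concrete, here is a minimal standard-library sketch of a "common rows" check and a Jensen-Shannon distance on toy data. It is an illustration under stated assumptions, not `synthcity`'s implementation: the function names are hypothetical, the common-rows score is normalized by the number of real rows, and the distance uses base-2 logarithms on categorical counts. The library may bin, normalize, or average differently.

```python
import math
from collections import Counter


def common_rows_proportion(real_rows, synthetic_rows):
    """Fraction of real rows that reappear verbatim in the synthetic data.

    Assumption: normalized by the number of real rows.
    """
    if not real_rows:
        return 0.0
    synthetic_set = {tuple(row) for row in synthetic_rows}
    leaked = sum(1 for row in real_rows if tuple(row) in synthetic_set)
    return leaked / len(real_rows)


def jensen_shannon_distance(real_values, synthetic_values):
    """Jensen-Shannon distance (base 2, range [0, 1]) between two
    empirical categorical distributions."""
    categories = list(set(real_values) | set(synthetic_values))
    real_counts = Counter(real_values)
    syn_counts = Counter(synthetic_values)
    p = [real_counts[c] / len(real_values) for c in categories]
    q = [syn_counts[c] / len(synthetic_values) for c in categories]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the divergence.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


# Toy rows of (sepal length, target); the values are illustrative only.
real = [(5.1, 0), (4.9, 0), (6.7, 2), (6.3, 2)]
synthetic = [(5.1, 0), (5.5, 1), (6.0, 2), (6.3, 2)]

print(common_rows_proportion(real, synthetic))
print(jensen_shannon_distance([r[1] for r in real], [s[1] for s in synthetic]))
```

A common-rows score near 0 suggests little verbatim copying, and a Jensen-Shannon distance near 0 suggests the marginal distributions are close.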

## Benchmark the quality of plugins

In [4]:
# synthcity absolute
from synthcity.benchmark import Benchmarks

score = Benchmarks.evaluate(
    [("uniform_sampler", "uniform_sampler", {})],
    loader,
    synthetic_size=len(X),
    repeats=1,
)
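
The call above evaluates a single plugin. To compare several models, the first argument can hold one `(test_name, plugin_name, plugin_kwargs)` tuple per plugin. The sketch below builds that list and wraps the call. The helper names and the `test_` prefix are hypothetical, and the `metrics` keyword (a dict mapping a metric category to a list of metric names) is assumed from the library's README rather than shown in this tutorial, so check it against the installed version.

```python
def build_benchmark_tests(plugin_names, plugin_kwargs=None):
    """Build one (test_name, plugin_name, plugin_kwargs) tuple per plugin."""
    plugin_kwargs = plugin_kwargs or {}
    return [
        (f"test_{name}", name, dict(plugin_kwargs.get(name, {})))
        for name in plugin_names
    ]


def run_benchmark(loader, plugin_names, synthetic_size, metrics=None, repeats=1):
    """Evaluate several plugins on the same loader (requires synthcity)."""
    # Imported lazily so build_benchmark_tests works without synthcity installed.
    from synthcity.benchmark import Benchmarks

    return Benchmarks.evaluate(
        build_benchmark_tests(plugin_names),
        loader,
        synthetic_size=synthetic_size,
        metrics=metrics,  # assumption: None selects the default metric set
        repeats=repeats,
    )


# Example usage with the `loader` and `X` defined earlier:
# score = run_benchmark(
#     loader,
#     ["uniform_sampler", "marginal_distributions"],
#     synthetic_size=len(X),
#     metrics={"sanity": ["common_rows_proportion"], "stats": ["jensenshannon_dist"]},
# )
```

Restricting `metrics` to a few entries keeps the run short, which helps when comparing slower plugins.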

In [5]:
Benchmarks.print(score)


[4m[1mPlugin : uniform_sampler[0m[0m


|  | min | max | mean | stddev | median | iqr | rounds | errors | durations |
|---|---|---|---|---|---|---|---|---|---|
| sanity.data_mismatch.score | 0.166667 | 0.166667 | 0.166667 | 0.0 | 0.166667 | 0.0 | 1 | 0 | 0.0 |
| sanity.common_rows_proportion.score | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 | 0 | 0.0 |
| sanity.nearest_syn_neighbor_distance.mean | 0.52656 | 0.52656 | 0.52656 | 0.0 | 0.52656 | 0.0 | 1 | 0 | 0.0 |
| sanity.close_values_probability.score | 0.2 | 0.2 | 0.2 | 0.0 | 0.2 | 0.0 | 1 | 0 | 0.0 |
| sanity.distant_values_probability.score | 0.233333 | 0.233333 | 0.233333 | 0.0 | 0.233333 | 0.0 | 1 | 0 | 0.0 |
| stats.jensenshannon_dist.marginal | 0.031286 | 0.031286 | 0.031286 | 0.0 | 0.031286 | 0.0 | 1 | 0 | 0.0 |
| stats.chi_squared_test.marginal | 0.795143 | 0.795143 | 0.795143 | 0.0 | 0.795143 | 0.0 | 1 | 0 | 0.0 |
| stats.inv_kl_divergence.marginal | 0.652426 | 0.652426 | 0.652426 | 0.0 | 0.652426 | 0.0 | 1 | 0 | 0.0 |
| stats.ks_test.marginal | 0.813333 | 0.813333 | 0.813333 | 0.0 | 0.813333 | 0.0 | 1 | 0 | 0.0 |
| stats.max_mean_discrepancy.joint | 0.20307 | 0.20307 | 0.20307 | 0.0 | 0.20307 | 0.0 | 1 | 0 | 0.0 |
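
With `repeats=1`, each metric has a single round, which is why `min`, `max`, `mean`, and `median` coincide and `stddev` and `iqr` are 0 in the output above. The sketch below shows how such summary columns could be derived from per-round scores. It is an assumption about the aggregation, not the library's code: it uses the population standard deviation and linearly interpolated quartiles, and `summarize_rounds` is a hypothetical name.

```python
import statistics


def summarize_rounds(scores):
    """Summarize one metric's per-round scores into table-style columns."""
    ordered = sorted(scores)
    if len(ordered) >= 2:
        # "inclusive" interpolates linearly between the observed values.
        q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
        iqr = q3 - q1
    else:
        iqr = 0.0  # a single round has no spread
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "stddev": statistics.pstdev(ordered),
        "median": statistics.median(ordered),
        "iqr": iqr,
        "rounds": len(ordered),
    }


# One round, as in the run above (score taken from the table).
print(summarize_rounds([0.2]))
# Several rounds give a non-zero spread; these scores are made up.
print(summarize_rounds([1.0, 2.0, 3.0, 4.0]))
```

Running the benchmark with a larger `repeats` value is what makes the spread columns informative.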





In [6]:
Benchmarks.highlight(score)

|  | uniform_sampler |
|---|---|
| sanity.data_mismatch.score | 0.166667 |
| sanity.common_rows_proportion.score | 0.0 |
| sanity.nearest_syn_neighbor_distance.mean | 0.52656 |
| sanity.close_values_probability.score | 0.2 |
| sanity.distant_values_probability.score | 0.233333 |
| stats.jensenshannon_dist.marginal | 0.031286 |
| stats.chi_squared_test.marginal | 0.795143 |
| stats.inv_kl_divergence.marginal | 0.652426 |
| stats.ks_test.marginal | 0.813333 |
| stats.max_mean_discrepancy.joint | 0.20307 |


## Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed it and would like to join the movement towards machine learning and AI for medicine, you can do so in the following ways:

### Star [Synthcity](https://github.com/vanderschaarlab/synthcity) on GitHub

- The easiest way to help our community is to star the repos! This helps raise awareness of the tools we're building.


### Check out other projects from vanderschaarlab
- [HyperImpute](https://github.com/vanderschaarlab/hyperimpute)
- [AutoPrognosis](https://github.com/vanderschaarlab/autoprognosis)
