# Tutorial: Quick start

**This tutorial provides a quick introduction to using pysdg.** It assumes that pysdg is already installed in a Conda environment, that the environment has been activated from the shell, and that this notebook is running within that activated environment. For detailed instructions, refer to the pysdg documentation.

The core functions in pysdg include loading, training, generating, and unloading.
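
As a compact preview, the sketch below chains these four steps in one function. It uses only the calls shown in the cells of this tutorial; the wrapper `run_quickstart` and its parameter names are hypothetical, and the `pysdg` import is deferred so that the function can be defined without starting a generator.

```python
def run_quickstart(raw_data_path, generator_name="synthcity/bayesian_network", num_synths=2):
    """Hypothetical helper: load, train, generate, and unload in one call."""
    # Deferred import: pysdg is only needed when the function is actually called.
    from pysdg.gen import Generator

    gen = Generator(generator_name)
    real = gen.load(raw_data=raw_data_path)  # 1. load the real data (types inferred)
    gen.train()                              # 2. train the generator on it
    gen.gen(num_rows=len(real), num_synths=num_synths)  # 3. generate synthetic data
    return gen.unload()                      # 4. unload the synthetic datasets
```

Calling `run_quickstart('raw_data.csv')` would run the same sequence as the cells in this tutorial, without the intermediate previews.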

### Import 

First, import the `Generator` class from the `pysdg.gen` module.

In [4]:
from pysdg.gen import Generator  

### Create a Generator Object

Create a `Generator` object, passing the name of the generator you want to use (here, `synthcity/bayesian_network`).

In [5]:
gen = Generator("synthcity/bayesian_network")

2025-05-12 15:24:40,903 - pysdg - INFO - 1440 - generate.py:122 - **************Started logging the generator: synthcity/bayesian_network, num_cores= None.**************


### Load (Real) Training Data

**Option 1:** Infer variable types (default) when loading the data.

In [6]:
real = gen.load(raw_data='raw_data.csv')
real.head(10)

2025-05-12 15:24:43,938 - pysdg - INFO - 1440 - generate.py:461 - The dataset ['unnamed'] is loaded into the generator synthcity_bayesian_network


Unnamed: 0,outc_cod_0,event_dt,wt,wt_cod,age,age_cod,drugname_0,indi_pt_0,sex
0,,,,,,,ZANTAC,,
1,DE,201506.0,,,18.0,YR,OXYCONTIN,Drug abuse,M
2,OT,201907.0,,,,,LEMTRADA,,
3,OT,20190917.0,,,46.0,YR,COSENTYX,Psoriatic arthropathy,M
4,DE,20161201.0,110.0,KG,73.0,YR,ENTRESTO,Cardiac failure,M
5,OT,,95.0,KG,33.0,YR,Champix,Smoking cessation therapy,M
6,,,,,,,CEFTRIAXONE SODIUM,,
7,,,86.0,KG,74.0,YR,LYRICA,Nerve injury,F
8,,,,,,,XELJANZ,,
9,,20190907.0,,,57.0,YR,COSENTYX,Ankylosing spondylitis,M


**Option 2:** Explicitly define the types of variables.

In [7]:
# Load the raw dataset together with its raw info file. The returned real dataset
# has its data types converted according to the specifications in the raw info file.
real = gen.load(raw_data='raw_data.csv', raw_info='raw_info.json')
real.head(10)

2025-05-12 15:24:49,304 - pysdg - INFO - 1440 - generate.py:347 - Checking the input metadata for any conflict in variable indexes - Passed.
2025-05-12 15:24:50,060 - pysdg - INFO - 1440 - generate.py:461 - The dataset ['tutorial_data'] is loaded into the generator synthcity_bayesian_network


Unnamed: 0,outc_cod_0,event_dt,wt,wt_cod,age,age_cod,drugname_0,indi_pt_0,sex
0,,NaT,,,,,ZANTAC,,
1,DE,NaT,,,18.0,YR,OXYCONTIN,Drug abuse,M
2,OT,NaT,,,,,LEMTRADA,,
3,OT,2019-09-17,,,46.0,YR,COSENTYX,Psoriatic arthropathy,M
4,DE,2016-12-01,110.0,KG,73.0,YR,ENTRESTO,Cardiac failure,M
5,OT,NaT,95.0,KG,33.0,YR,Champix,Smoking cessation therapy,M
6,,NaT,,,,,CEFTRIAXONE SODIUM,,
7,,NaT,86.0,KG,74.0,YR,LYRICA,Nerve injury,F
8,,NaT,,,,,XELJANZ,,
9,,2019-09-07,,,57.0,YR,COSENTYX,Ankylosing spondylitis,M
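
Comparing the two outputs, `event_dt` changes from a float such as `20190917.0` to the date `2019-09-17`, while partial values such as `201506.0` become `NaT`. The sketch below reproduces that observable behaviour in plain Python, as an illustration only. It assumes full dates are encoded as eight-digit `YYYYMMDD` numbers; it is not pysdg's implementation, `parse_event_dt` is a hypothetical helper, and `None` stands in for `NaT`.

```python
import math
from datetime import datetime


def parse_event_dt(value):
    """Return a date for an 8-digit YYYYMMDD number, else None (stand-in for NaT)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None  # value is missing in the raw data
    text = str(int(value))
    if len(text) != 8:
        return None  # partial date such as 201506 (year and month only)
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        return None  # eight digits, but not a valid calendar date


# Values taken from the event_dt column in the Option 1 output above.
raw_values = [float("nan"), 201506.0, 20190917.0, 20161201.0]
parsed = [parse_event_dt(v) for v in raw_values]
print(parsed)
```

The explicit length check makes the handling of partial dates visible instead of relying on the parser to reject them.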


### Train the Generator

Train the generator on the real dataset.

In [8]:
gen.train()

[2025-05-12T15:24:54.749597-0400][1440][CRITICAL] module disabled: /home/samer/miniconda3/envs/pysdgdev/lib/python3.10/site-packages/synthcity/plugins/generic/plugin_goggle.py
2025-05-12 15:24:56,318 - pysdg - INFO - 1440 - generate.py:925 - Started training using synthcity_bayesian_network...
INFO:pysdg:Started training using synthcity_bayesian_network...
2025-05-12 15:25:00,425 - pysdg - INFO - 1440 - generate.py:930 - Completed training using synthcity_bayesian_network.
INFO:pysdg:Completed training using synthcity_bayesian_network.


### Generate Synthetic Data

Generate the desired number of synthetic datasets (`num_synths`), each containing the specified number of rows (`num_rows`).

In [9]:
gen.gen(num_rows=len(real), num_synths=2)

2025-05-12 15:25:05,123 - pysdg - INFO - 1440 - generate.py:981 - Generating synth no. 0 of size (10000, 12) -- Completed!
INFO:pysdg:Generating synth no. 0 of size (10000, 12) -- Completed!
2025-05-12 15:25:05,767 - pysdg - INFO - 1440 - generate.py:981 - Generating synth no. 1 of size (10000, 12) -- Completed!
INFO:pysdg:Generating synth no. 1 of size (10000, 12) -- Completed!


### Decode the Synthetic Datasets

Call `gen.unload()` to retrieve the decoded synthetic datasets. The result can be indexed by position: `synths[0]` is the first of the two datasets generated above.


In [10]:
synths = gen.unload()
synths[0].head(10)


Unnamed: 0,outc_cod_0,event_dt,wt,wt_cod,age,age_cod,drugname_0,indi_pt_0,sex
0,,NaT,,,,,VARGATEF,,
1,,NaT,,,,,ZESTRIL,,
2,,NaT,,,,,CASIRIVIMAB\IMDEVIMAB,Neutrophil function disorder,M
3,,NaT,,,,,XTAMPZA ER,,
4,,NaT,,,,,BRIGATINIB,Product used for unknown indication,
5,OT,NaT,,,,,UPADACITINIB,,
6,,2019-06-13,29.770553,KG,13.0,MON,Dolutegravir,Myelodysplastic syndrome,M
7,,2013-06-13,,,,,GAMMAPLEX,Metastases to bone,F
8,,NaT,,,,,DURVALUMAB,Prostatomegaly,F
9,,NaT,,,,,ORILISSA,,
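
Since each element of `synths` supports `.head()`, it can presumably be treated as a pandas DataFrame and saved like one. The sketch below works under that assumption: the two small frames are stand-ins for the output of `gen.unload()`, and the file-name pattern `synth_{i}.csv` is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-ins for the datasets returned by gen.unload(); replace with the real `synths`.
synths = [
    pd.DataFrame({"age": [13.0, None], "sex": ["M", "F"]}),
    pd.DataFrame({"age": [None, 57.0], "sex": ["F", "M"]}),
]

out_dir = Path(tempfile.mkdtemp())  # any writable directory works
written = []
for i, synth in enumerate(synths):
    path = out_dir / f"synth_{i}.csv"
    synth.to_csv(path, index=False)  # index=False: do not write the row index
    written.append(path)

print([p.name for p in written])
```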
