# Tutorial: Quick start

**This tutorial provides a quick introduction to using pysdg.** It assumes that pysdg is already installed in a Conda environment, that the environment has been activated from the shell, and that this notebook is running within that environment. For detailed instructions, refer to the pysdg documentation.

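The core operations in pysdg are loading, training, generating, and unloading. This tutorial performs them with the `load`, `train`, `gen`, and `unload` methods of the `Generator` class.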

In [11]:
# Import the Generator class.
from pysdg.synth.generate import Generator  

In [12]:
# Define your paths to the raw data and raw info files.
raw_data_path='raw_data.csv'
raw_info_path='raw_info.json'
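Both files are read by the load step, so it can help to confirm they exist before going further. The following is a minimal sketch using only the standard library. The helper name `missing_inputs` is hypothetical and is not part of pysdg.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the entries of `paths` that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Hypothetical pre-flight check using the two file names defined above.
missing = missing_inputs(["raw_data.csv", "raw_info.json"])
if missing:
    print("Missing input files:", missing)
else:
    print("Both input files were found.")
```

Running this check first gives a plain message about a wrong path instead of an error raised from inside the generator.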

In [13]:
# Create a Generator object with your generator of interest.
gen=Generator("synthcity_bayesian_network")

2025-01-30 16:22:19,243 - pysdg - INFO - 1041332 - generate.py:92 - **************Started logging the generator: synthcity_bayesian_network, num_cores= None.**************
INFO:pysdg:**************Started logging the generator: synthcity_bayesian_network, num_cores= None.**************


In [14]:
# Load the raw dataset and raw info file into the generator. The returned real dataset will have all data types correctly converted according to the specifications in the raw info file.
real=gen.load(raw_data_path, raw_info_path)
real.head(10)

2025-01-30 16:22:19,301 - pysdg - INFO - 1041332 - generate.py:219 - Checking the input metadata for any conflict in variable indexes - Passed.
INFO:pysdg:Checking the input metadata for any conflict in variable indexes - Passed.


2025-01-30 16:22:21,011 - pysdg - INFO - 1041332 - generate.py:287 - The dataset ['tutorial_data'] is loaded into the generator synthcity_bayesian_network
INFO:pysdg:The dataset ['tutorial_data'] is loaded into the generator synthcity_bayesian_network


Unnamed: 0,outc_cod_0,event_dt,wt,wt_cod,age,age_cod,drugname_0,indi_pt_0,sex
0,,NaT,,,,,ZANTAC,,
1,DE,NaT,,,18.0,YR,OXYCONTIN,Drug abuse,M
2,OT,NaT,,,,,LEMTRADA,,
3,OT,2019-09-17,,,46.0,YR,COSENTYX,Psoriatic arthropathy,M
4,DE,2016-12-01,110.0,KG,73.0,YR,ENTRESTO,Cardiac failure,M
5,OT,NaT,95.0,KG,33.0,YR,Champix,Smoking cessation therapy,M
6,,NaT,,,,,CEFTRIAXONE SODIUM,,
7,,NaT,86.0,KG,74.0,YR,LYRICA,Nerve injury,F
8,,NaT,,,,,XELJANZ,,
9,,2019-09-07,,,57.0,YR,COSENTYX,Ankylosing spondylitis,M
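
The preview contains many empty cells and `NaT` entries, so it can be useful to quantify missingness per column before training. The sketch below uses plain pandas on a small stand-in DataFrame built from three rows of the preview (rows 1, 4, and 5). The helper name `missing_share` is hypothetical. In the notebook, pass `real` instead of `demo`.

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Stand-in data copied from rows 1, 4 and 5 of the preview above.
demo = pd.DataFrame(
    {
        "wt": [None, 110.0, 95.0],
        "age": [18.0, 73.0, 33.0],
        "drugname_0": ["OXYCONTIN", "ENTRESTO", "Champix"],
    }
)

print(missing_share(demo))
```

The same helper can later be applied to a synthetic dataset to see whether its missingness pattern resembles that of the real data.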


In [15]:
# Save the loaded dataset, with data types converted per the raw info file, as real.csv.
real.to_csv('real.csv', index=False)

In [16]:
# Train the generator on the real dataset.
gen.train()

[2025-01-30T16:22:21.096546-0500][1041332][CRITICAL] module disabled: /share/personal/skababji/conda_envs/pysdg_dev/lib/python3.10/site-packages/synthcity/plugins/generic/plugin_goggle.py
2025-01-30 16:22:21,353 - pysdg - INFO - 1041332 - generate.py:672 - Started training using synthcity_bayesian_network...
INFO:pysdg:Started training using synthcity_bayesian_network...
2025-01-30 16:23:16,353 - pysdg - INFO - 1041332 - generate.py:677 - Completed training using synthcity_bayesian_network.
INFO:pysdg:Completed training using synthcity_bayesian_network.
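
The log timestamps above show that training took about 55 seconds in this run. To measure a step directly rather than reading the logs, a small generic timer can wrap any call. This is a sketch with the standard library only. The name `timed` is hypothetical and unrelated to pysdg, and `time.sleep` stands in for the real work.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print the wall-clock time of the enclosed block and optionally store it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


durations = {}
with timed("sleep", durations):
    time.sleep(0.01)  # placeholder for a long-running step
```

In the notebook, the training call could be placed inside `with timed("train", durations):` in the same way.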


In [17]:
# Generate the desired number of synthetic datasets (num_synths), each containing the specified number of rows (num_rows).
gen.gen(num_rows=len(real), num_synths=2)

2025-01-30 16:23:18,161 - pysdg - INFO - 1041332 - generate.py:725 - Generating synth no. 0 of size (10000, 12) -- Completed!
INFO:pysdg:Generating synth no. 0 of size (10000, 12) -- Completed!
2025-01-30 16:23:20,114 - pysdg - INFO - 1041332 - generate.py:725 - Generating synth no. 1 of size (10000, 12) -- Completed!
INFO:pysdg:Generating synth no. 1 of size (10000, 12) -- Completed!


In [18]:
# Unload the synthetic datasets from the generator. The result is indexed by dataset number, so synths[0] is the first one.
synths=gen.unload()
synths[0].head(10)


2025-01-30 16:23:20,392 - pysdg - ERROR - 1041332 - generate.py:1126 - Failed to remove 'None': expected str, bytes or os.PathLike object, not NoneType
ERROR:pysdg:Failed to remove 'None': expected str, bytes or os.PathLike object, not NoneType


Unnamed: 0,outc_cod_0,event_dt,wt,wt_cod,age,age_cod,drugname_0,indi_pt_0,sex
0,,NaT,,,,,VARGATEF,,
1,,NaT,,,,,ZESTRIL,,
2,,NaT,,,,,CASIRIVIMAB\IMDEVIMAB,Neutrophil function disorder,M
3,,NaT,,,,,XTAMPZA ER,,
4,,NaT,,,,,BRIGATINIB,Product used for unknown indication,
5,OT,NaT,,,,,UPADACITINIB,,
6,,2019-06-13,29.770553,KG,13.0,MON,Dolutegravir,Myelodysplastic syndrome,M
7,,2013-06-13,,,,,GAMMAPLEX,Metastases to bone,F
8,,NaT,,,,,DURVALUMAB,Prostatomegaly,F
9,,NaT,,,,,ORILISSA,,


In [19]:
# Save the first synthetic dataset to a CSV file.
synths[0].to_csv('synth.csv', index=False)
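
The cell above saves only the first dataset, although two were generated (`num_synths=2`). A sketch for saving all of them follows. The helper name `save_all` and the `synth_<i>.csv` file pattern are hypothetical choices, and the helper only assumes that each dataset has a pandas-style `to_csv` method.

```python
from pathlib import Path


def save_all(datasets, out_dir=".", stem="synth"):
    """Write each dataset to <out_dir>/<stem>_<i>.csv and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, df in enumerate(datasets):
        path = out / f"{stem}_{i}.csv"
        df.to_csv(path, index=False)  # index=False matches the cell above
        paths.append(path)
    return paths


# In the notebook: save_all(synths)
```

Numbering the files by position keeps each one matched to its index in `synths`.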