# Tutorial: Using Optuna's Bayesian optimization to tune hyperparameters 


We highly recommend running the Bayesian optimization routine in an environment **with access to CUDA and/or OpenMP**, as hardware acceleration greatly speeds up the entire process.
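
As a quick pre-flight check, the sketch below reports what acceleration the current environment appears to offer. It assumes the generator backend is built on PyTorch (an assumption about the installed stack, not something this notebook verifies) and only reads the `OMP_NUM_THREADS` environment variable, which OpenMP runtimes commonly honor. The helper name `describe_acceleration` is hypothetical.

```python
import importlib.util
import os


def describe_acceleration():
    """Collect a few hints about available hardware acceleration."""
    info = {
        "cpu_count": os.cpu_count(),
        # None means the variable is unset and the OpenMP runtime picks a default.
        "omp_num_threads": os.environ.get("OMP_NUM_THREADS"),
        # None means PyTorch is not installed or could not be imported.
        "cuda_available": None,
    }
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            info["cuda_available"] = torch.cuda.is_available()
        except Exception:
            pass  # leave as None if the installation is broken
    return info


print(describe_acceleration())
```

If `cuda_available` is `False` or `None`, the routine should still run on the CPU, only more slowly.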

This tutorial assumes that `pysdg` is already installed in a Conda environment, that the environment has been activated from the shell, and that this notebook is being run within that activated environment. For detailed instructions, please refer to the `pysdg` documentation.

The following cell sets the working directory to the location of this notebook. It is assumed that all files accessed by this notebook are stored in the same directory.
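
A minimal sketch of such a cell is given here, assuming the Jupyter kernel was started in the notebook's folder, so that `Path.cwd()` already points at it. The helper name `set_working_dir` is hypothetical; pass an explicit path if the kernel was started elsewhere.

```python
import os
from pathlib import Path


def set_working_dir(path):
    """Change the working directory to `path` and return it as an absolute Path."""
    target = Path(path).expanduser().resolve()
    if not target.is_dir():
        raise NotADirectoryError(f"{target} is not a directory")
    os.chdir(target)
    return target


# Assumption: the kernel's current directory is the notebook's directory.
NOTEBOOK_DIR = set_working_dir(Path.cwd())
print(NOTEBOOK_DIR)
```

With the working directory set this way, relative paths such as `./raw_data.csv` resolve against the notebook's folder.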

### Import

First we import the necessary classes: `Generator` from the `pysdg.gen` module and `BayesianOptimizationRoutine` from the `pysdg.optimize` module.

In [1]:
from pysdg.gen import Generator
from pysdg.optimize import BayesianOptimizationRoutine

### Choose the Generator and Load the Training Data

We use the CTGAN generator (`synthcity/ctgan`) for this demonstration and load the training data together with its metadata file.

In [2]:
gen = Generator(gen_name="synthcity/ctgan")
real = gen.load("./raw_data.csv", "./raw_info.json")

2025-06-13 13:36:16,855 - pysdg - INFO - 99291 - generate.py:132 - **************Started logging the generator: synthcity/ctgan, num_cores= None.**************
2025-06-13 13:36:16,871 - pysdg - INFO - 99291 - generate.py:349 - Checking the input metadata for any conflict in variable indexes - Passed.
2025-06-13 13:36:17,637 - pysdg - INFO - 99291 - generate.py:463 - The dataset ['tutorial_data'] is loaded into the generator synthcity_ctgan


In [3]:
# Define your own evaluation function
def my_eval_function(gen: Generator):
    real_data = gen.enc_real
    # For simplicity, assume only one synthetic dataset is generated
    # and compare the encoded real and synthetic datasets.
    synth_data = gen.enc_synths[0]
    n_mismatches = (real_data != synth_data).sum().sum()
    return n_mismatches
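
To see how this metric behaves, here is a minimal sketch on two toy pandas DataFrames (hypothetical data, not the tutorial dataset). It assumes that `gen.enc_real` and `gen.enc_synths[0]` are pandas DataFrames with identical shape, index, and column labels; the helper name `count_mismatches` is illustrative only.

```python
import pandas as pd


def count_mismatches(real_data: pd.DataFrame, synth_data: pd.DataFrame) -> int:
    """Count the cells that differ between two identically labeled frames."""
    # Element-wise `!=` yields a boolean frame; summing twice counts the True cells.
    return int((real_data != synth_data).sum().sum())


nan = float("nan")
toy_real = pd.DataFrame({"a": [1, 2, 3], "b": [0.0, 1.0, nan]})
toy_synth = pd.DataFrame({"a": [1, 5, 3], "b": [0.0, 1.0, nan]})

# One differing value in column "a", plus the NaN pair in column "b".
print(count_mismatches(toy_real, toy_synth))  # 2
```

Two caveats follow from pandas semantics: NaN never compares equal to itself, so a cell that is missing in both frames still counts as a mismatch, and `!=` raises a `ValueError` when the two frames are not identically labeled.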



In [4]:
bayes_opt = BayesianOptimizationRoutine(
    gen=gen,
    eval_function=my_eval_function,
    objective="minimize",
    n_trials=1,  # a single trial, so that the tutorial finishes faster
    study_name="mismatches_study",
    dump_csv=False,  # CSV dumping only happens at the end of the optimization
    dump_sqlite=False,  # SQLite dumping happens after each trial
)

2025-06-13 13:36:17,655 - pysdg - INFO - 99291 - generate.py:981 - Started training using synthcity_ctgan...
[2025-06-13T13:36:17.682374-0400][99291][CRITICAL] module disabled: /home/samer/miniconda3/envs/pysdg_dev_ml/lib/python3.10/site-packages/synthcity/plugins/generic/plugin_goggle.py
2025-06-13 13:36:19,515 - pysdg - INFO - 99291 - generate.py:985 - No of Iterations=25, Batch Size=512
INFO:pysdg:No of Iterations=25, Batch Size=512
100%|██████████| 25/25 [01:01<00:00,  2.45s/it]
2025-06-13 13:37:32,845 - pysdg - INFO - 99291 - generate.py:993 - Completed training using synthcity_ctgan.
INFO:pysdg:Completed training using synthcity_ctgan.
2025-06-13 13:37:33,145 - pysdg - INFO - 99291 - generate.py:1020 - Generating synth no. 0 of size (10000, 12) -- Completed!
INFO:pysdg:Generating synth no. 0 of size (10000, 12) -- Completed!
2025-06-13 13:37:33,216 - pysdg - INFO - 99291 - generate.py:981 - Started training using synthcity_ctgan...
INFO:pysdg:Started training using synthcity_ctga

In [5]:
bayes_opt.best_gen.gen(num_rows=len(real), num_synths=1)
synths = bayes_opt.best_gen.unload()
synths[0]


2025-06-13 13:37:55,189 - pysdg - INFO - 99291 - generate.py:1020 - Generating synth no. 0 of size (10000, 12) -- Completed!
INFO:pysdg:Generating synth no. 0 of size (10000, 12) -- Completed!
2025-06-13 13:37:55,242 - pysdg - INFO - 99291 - generate.py:2032 - The directory '/home/samer/projects/pysdg/tutorials/pysdgws79cc7563826d4627b95e9b6df8a98411' has been removed successfully.
INFO:pysdg:The directory '/home/samer/projects/pysdg/tutorials/pysdgws79cc7563826d4627b95e9b6df8a98411' has been removed successfully.


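| Unnamed: 0 | outc_cod_0 | event_dt | wt | wt_cod | age | age_cod | drugname_0 | indi_pt_0 | sex |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  | NaT |  |  |  |  | METHOTREXATE. |  | F |
| 1 | DE | NaT |  |  | 0 |  | warfarin |  | M |
| 2 |  | 2018-08-27 | 73.380687 |  |  |  | METHOTREXATE. |  |  |
| 3 |  | NaT |  |  | 0 |  | METHOTREXATE. |  |  |
| 4 | HO | 2018-08-27 |  |  |  | YR | METHOTREXATE. |  | M |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 |  | NaT |  |  |  | YR | DEXTROMETHORPHAN HYDROBROMIDE\QUINIDINE SULFATE |  |  |
| 9996 | OT | NaT |  |  |  |  | METHOTREXATE. |  | F |
| 9997 |  | NaT |  |  |  |  | warfarin |  | M |
| 9998 | DE | NaT |  |  |  |  | METHOTREXATE. |  |  |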
