# Robin Calibration

This notebook walks through setting up and running a calibration workflow with the `robin.calibration` library: initializing a `Calibration` object, configuring the study, running the Optuna dashboard, and performing parallel calibration.

## 0. Import Libraries

In [1]:
from robin.calibration.entities import Calibration

## 1. Calibration Setup

The `Calibration` object is initialized with the following configuration files:

- **Supply Configuration (`path_config_supply`)**: Specifies the path to the YAML file containing the supply-side configuration data.
- **Demand Configuration (`path_config_demand`)**: Specifies the path to the YAML file containing the demand-side configuration data, where the `null` values are the ones to be optimized.
- **Target Output (`target_output_path`)**: Specifies the path to the CSV file containing the target output data for calibration.

The study itself is stored in an SQLite database (`sqlite:///calibration_test.db`). SQLite is convenient when a single process runs the study, but it is not recommended for parallel calibration because concurrent writers can run into database locking issues. For parallel calibration, consider a database server such as PostgreSQL or MySQL.
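
As an illustration of switching the storage backend, the sketch below builds an SQLAlchemy-style connection URL of the kind Optuna accepts for RDB storage. The helper `build_storage_url`, the credentials, the host, and the database name are all hypothetical, and the matching database driver (for example `psycopg2` for PostgreSQL) is assumed to be installed separately.

```python
from urllib.parse import quote_plus


def build_storage_url(dialect: str, user: str, password: str,
                      host: str, port: int, database: str) -> str:
    """Build a URL such as postgresql://user:pass@host:5432/db (hypothetical helper)."""
    # Percent-encode the credentials so characters like '@' or '/' do not break the URL.
    return f"{dialect}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Placeholder values for illustration only; replace them with your own server details.
storage_url = build_storage_url(
    dialect="postgresql",
    user="robin",
    password="s3cret@pw",
    host="localhost",
    port=5432,
    database="calibration",
)
print(storage_url)
```

The resulting string would take the place of `'sqlite:///calibration_test.db'` in the `storage` argument of `calibration.create_study`.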

Finally, the demand used to generate the `target.csv` is located in `../configs/calibration/supply_data_target.yaml`.

In [2]:
calibration = Calibration(
    path_config_supply='../configs/calibration/supply_data.yaml',
    path_config_demand='../configs/calibration/demand_data.yaml',
    target_output_path='../configs/calibration/target.csv',
    departure_time_hard_restriction=False
)
calibration.create_study(
    study_name='calibration_test',
    storage='sqlite:///calibration_test.db',
    n_trials=10,
    seed=42,
    show_progress_bar=True
)

[I 2025-05-05 12:34:16,353] A new study created in RDB with name: calibration_test


  0%|          | 0/10 [00:00<?, ?it/s]

[I 2025-05-05 12:34:32,566] Trial 0 finished with value: 15138.658192090395 and parameters: {'Business_arrival_time_6': 0.31752950812868397, 'Business_arrival_time_7': 0.5286593021142155, 'Business_arrival_time_8': 0.05885321274400923, 'Business_arrival_time_9': 0.18672315304966325, 'Business_arrival_time_10': 0.2474705187173215, 'Business_arrival_time_11': 0.42563299338184324, 'Business_arrival_time_12': 0.7111480218101008, 'Business_arrival_time_13': 0.2988833260538767, 'Business_arrival_time_14': 0.9043037354010703, 'Business_arrival_time_15': 0.0554510145601691, 'Business_arrival_time_16': 0.048331978416684684, 'Business_arrival_time_17': 0.8989599062122674, 'Business_arrival_time_18': 0.7661485747703239, 'Business_arrival_time_19': 0.10989040940223083, 'Business_arrival_time_20': 0.04347828862481662, 'Business_arrival_time_21': 0.4934377107541835, 'Business_arrival_time_22': 0.4843729360785832, 'Business_arrival_time_23': 0.33728504804349446, 'Student_arrival_time_5': 0.6197108222

The `Calibration` object provides functionality to save the top `k` trials from the study, where `k` is configurable using the `keep_top_k` parameter. For each of the top trials, the following outputs are saved:

1. **Calibrated Demand YAML (`checkpoint_{trial}.yaml`)**: The demand configuration file is updated with the optimized parameters from the trial.
2. **Generated Output (`checkpoint_{trial}.csv`)**: The output generated using the calibrated demand is saved for further analysis.
3. **DataFrame for MSE Calculation (`df_target_output_{trial}.csv`)**: A DataFrame is saved that contains the data used to calculate the Mean Squared Error (MSE) for each service, providing insights into the calibration performance.
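
To make the per-service MSE concrete, here is a minimal sketch of the arithmetic. It does not reproduce how `robin` computes the error internally: the service names, the compared quantities (two made-up counts per service), and the final averaging step are assumptions for illustration.

```python
def mean_squared_error(target, simulated):
    """Return the mean of the squared differences between two equal-length sequences."""
    if len(target) != len(simulated):
        raise ValueError("target and simulated must have the same length")
    return sum((t - s) ** 2 for t, s in zip(target, simulated)) / len(target)


# Hypothetical values per service (e.g. counts for two seat types).
target_output = {"service_A": [120, 30], "service_B": [80, 10]}
simulated_output = {"service_A": [110, 34], "service_B": [80, 16]}

# One MSE per service, then a single aggregate score across services.
mse_per_service = {
    service: mean_squared_error(target_output[service], simulated_output[service])
    for service in target_output
}
overall_mse = sum(mse_per_service.values()) / len(mse_per_service)

print(mse_per_service)  # {'service_A': 58.0, 'service_B': 18.0}
print(overall_mse)      # 38.0
```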

In [3]:
!tree calibration_logs/

[01;34mcalibration_logs/[0m
├── [01;34mtrial_1[0m
│   ├── [00mcheckpoint_1.csv[0m
│   ├── [00mcheckpoint_1.yaml[0m
│   └── [00mdf_target_output_1.csv[0m
├── [01;34mtrial_4[0m
│   ├── [00mcheckpoint_4.csv[0m
│   ├── [00mcheckpoint_4.yaml[0m
│   └── [00mdf_target_output_4.csv[0m
└── [01;34mtrial_7[0m
    ├── [00mcheckpoint_7.csv[0m
    ├── [00mcheckpoint_7.yaml[0m
    └── [00mdf_target_output_7.csv[0m

3 directories, 9 files


## 2. Optuna Dashboard

The `optuna-dashboard` is a command-line tool provided by Optuna to visualize and monitor the progress of optimization studies. It provides an interactive web-based interface to explore the trials, their parameters, and the corresponding objective values.

In [5]:
!optuna-dashboard sqlite:///calibration_test.db

Listening on http://127.0.0.1:8080/
Hit Ctrl-C to quit.


## 3. Parallel Calibration

For parallel calibration, run the calibration from a script rather than from this notebook. Each worker needs to be a separate process attached to the same study, which is impractical to manage from a single Jupyter kernel.

To perform parallel calibration, save the calibration code into a Python script (e.g., `parallel_calibration.py`). Then, manually execute the script in multiple shells or terminals. Each shell will run an independent process, contributing to the parallel execution of the calibration trials.

For example, if you want to run 4 parallel processes, open 4 separate terminals and execute the script in each terminal:

```bash
python parallel_calibration.py
```

This approach ensures that the calibration study is updated concurrently by multiple processes, leveraging parallelism to speed up the optimization process.
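
As an alternative to opening several terminals by hand, the workers can be started from one small launcher. This is a sketch under assumptions: `launch_workers` is a hypothetical helper, and `parallel_calibration.py` is assumed to attach to the same study name and storage on every run, ideally backed by a database server rather than SQLite.

```python
import subprocess
import sys
from pathlib import Path


def launch_workers(script_path, n_workers):
    """Start n_workers copies of the script and return their exit codes (hypothetical helper)."""
    # Use the current interpreter so every worker runs in the same virtual environment.
    processes = [
        subprocess.Popen([sys.executable, str(script_path)])
        for _ in range(n_workers)
    ]
    # Block until every worker has finished.
    return [process.wait() for process in processes]


script = Path("parallel_calibration.py")
# Only launch when the script is present in the working directory.
if script.exists():
    exit_codes = launch_workers(script, n_workers=4)
    print(exit_codes)
```

An exit code of `0` for every worker indicates that all processes finished without an unhandled error.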