# 2 Running experiments using registries
PyDFLT includes a few tools that make it easier to run experiments. This notebook introduces registries, which hold predefined configurations of methods, models and data from the literature. Section 2.2 then shows how configs and utils can be used to run more extensive experiments.

## 2.1 Registries
To make it easy to use different models, data and DFL methods from the literature, we created registries in `pydflt.registries`. Each of these three categories is stored with the settings that appear in the literature. The registries also make it easy to specify a base version of a model, dataset or decision maker. The currently stored registries are printed below. The registry files themselves contain comments on works that use roughly the same configuration with slightly different parameters.


In [1]:
from pydflt.registries.decision_makers import decision_maker_registry
from pydflt.utils.load import print_registry

print_registry(decision_maker_registry)

Auto-Sklearn cannot be imported.
Name: differentiable
Class/function: <class 'pydflt.decision_makers.differentiable_decision_maker.DifferentiableDecisionMaker'>
Parameters: {'learning_rate': 0.001, 'device_str': 'cpu', 'loss_function_str': 'regret', 'predictor_str': 'MLP'}

Name: PFL linear
Class/function: <class 'pydflt.decision_makers.differentiable_decision_maker.DifferentiableDecisionMaker'>
Parameters: {'learning_rate': 0.001, 'device_str': 'cpu', 'loss_function_str': 'mse', 'predictor_str': 'MLP', 'predictor_kwargs': {'num_hidden_layers': 0}}

Name: SPO+ linear
Class/function: <class 'pydflt.decision_makers.differentiable_decision_maker.DifferentiableDecisionMaker'>
Parameters: {'learning_rate': 0.001, 'device_str': 'cpu', 'loss_function_str': 'SPOPlus', 'predictor_str': 'MLP', 'predictor_kwargs': {'num_hidden_layers': 0}}

Name: SFGE
Class/function: <class 'pydflt.decision_makers.sfge_decision_maker.SFGEDecisionMaker'>
Parameters: {'learning_rate': 0.01, 'batch_size': 32, 'devic

In [2]:
from pydflt.registries.models import model_registry

print_registry(model_registry)

Name: knapsack_2D_Tang2022
Class/function: <class 'pydflt.concrete_models.grbpy_knapsack.GRBPYKnapsackModel'>
Parameters: {'num_decisions': 32, 'capacity': 20, 'weights_lb': 3, 'weights_ub': 8, 'dimension': 2, 'seed': 5}

Name: shortest_path
Class/function: <class 'pydflt.concrete_models.grbpy_shortest_path.ShortestPath'>
Parameters: {'grid': (5, 5)}

Name: tsp
Class/function: <class 'pydflt.concrete_models.grbpy_tsp.TravelingSalesperson'>
Parameters: {'num_nodes': 20}

Name: WSMC_Silvestri2024
Class/function: <class 'pydflt.concrete_models.grbpy_two_stage_weighted_set_multi_cover.WeightedSetMultiCover'>
Parameters: {'num_items': 5, 'num_covers': 25, 'penalty': 5, 'cover_costs_lb': 1, 'cover_costs_ub': 100, 'Silvestri2024': True, 'seed': 5}

Name: knapsack_continuous
Class/function: <class 'pydflt.concrete_models.cvxpy_knapsack.CVXPYDiffKnapsackModel'>
Parameters: {'num_decisions': 10, 'capacity': 20, 'weights_lb': 3, 'weights_ub': 8, 'dimension': 1, 'seed': 5}

Name: knapsack_2_stage


In [3]:
from pydflt.registries.data import data_registry

print_registry(data_registry)

Name: load_data_from_dict
Class/function: <function load_data_from_dict at 0x15a703ec0>
Parameters: {'path': None}

Name: knapsack
Class/function: <function gen_data_knapsack at 0x131d0a520>
Parameters: {'seed': 5, 'num_data': 2000, 'num_features': 5, 'num_items': 10, 'dimension': 2, 'polynomial_degree': 6, 'noise_width': 0.5}

Name: shortest_path
Class/function: <function gen_data_shortest_path at 0x131d0a340>
Parameters: {'seed': 5, 'num_data': 2000, 'num_features': 5, 'grid': (5, 5), 'polynomial_degree': 6, 'noise_width': 0.5}

Name: tsp
Class/function: <function gen_data_traveling_salesperson at 0x14a387f60>
Parameters: {'seed': 5, 'num_data': 2000, 'num_features': 5, 'num_nodes': 20, 'polynomial_degree': 6, 'noise_width': 0.5}

Name: WSMC_Silvestri2024
Class/function: <function gen_data_wsmc at 0x15a703d80>
Parameters: {'seed': 5, 'num_data': 2500, 'num_features': 5, 'num_items': 10, 'degree': 5, 'noise_width': 0.5}



With these registries one can simply load a certain model, data generator and method. For example, to apply DFL to a shortest path problem using SPO+ [1] with a linear predictor, one can use:

In [4]:
from pydflt.problem import Problem
from pydflt.registries.data import get_data
from pydflt.registries.decision_makers import make_decision_maker
from pydflt.registries.models import make_model
from pydflt.runner import Runner

model, _ = make_model("shortest_path")
data, _ = get_data("shortest_path")
problem = Problem(data_dict=data, opt_model=model)
decision_maker, _ = make_decision_maker(problem, "SPO+ linear", learning_rate=0.1)  # adjust learning rate
runner = Runner(decision_maker, num_epochs=3, use_wandb=False)
result = runner.run()

Set parameter Username
Set parameter LicenseID to value 2612263
Academic license - for non-commercial use only - expires 2026-01-20
Set parameter FeasibilityTol to value 1e-06
Generating data using shortest_path
Computing optimal decisions for the entire dataset...
Optimal decisions computed and added to dataset.
Computing optimal objectives for the entire dataset...
Optimal objectives computed and added to dataset.
Shuffling indices before splitting...
Dataset split completed: Train=1400, Validation=300, Test=300
Problem mode set to: train
Problem mode set to: train
Num of cores: 1
Epoch 0/3: Starting initial validation...
Problem mode set to: validation
Epoch Results:
validation/objective_mean: 7.2163
validation/select_arc_mean: 0.2000
validation/sym_rel_regret_mean: 0.3727
validation/rel_regret_mean: 1.5864
validation/arc_costs_mean: 0.8190
validation/mse_mean: 2.5531
validation/abs_regret_mean: 4.1574
Initial best validation metric (abs_regret): 4.157423496246338
Starting training.

The `make_decision_maker`, `make_model` and `get_data` functions above return the configuration they used as a second return value, so that it can be saved with the results.
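The pattern of returning the used configuration next to the created object can be illustrated with a small stand-in. This is a minimal sketch and not PyDFLT's implementation: `toy_registry` and `make_from_registry` are hypothetical names, and plain `dict` objects stand in for the real model classes. It only shows how defaults from a registry entry could be combined with overrides and how the resulting configuration could be stored alongside results.

```python
import json
from copy import deepcopy

# Hypothetical stand-in registry: name -> (factory, default parameters).
# This is NOT PyDFLT's implementation, only an illustration of the pattern.
toy_registry = {
    "shortest_path": (dict, {"grid": (5, 5)}),
    "tsp": (dict, {"num_nodes": 20}),
}


def make_from_registry(name, **overrides):
    """Build an entry and return it together with the configuration used."""
    factory, defaults = toy_registry[name]
    # Keyword overrides take precedence over the registry defaults.
    used_config = {**deepcopy(defaults), **overrides}
    return factory(**used_config), {"name": name, **used_config}


model, model_config = make_from_registry("tsp", num_nodes=10)

# Store the configuration next to (placeholder) results so the run can be reproduced.
record = {"model": model_config, "results": {}}
serialized = json.dumps(record, indent=2)
print(serialized)
```

Keeping the registry defaults untouched (via the copy) means every call starts from the same base version, which is the property the registries are meant to provide.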

## 2.2 Configs and utils
Configs are most easily defined as a `.yml` file (though it is also possible to define dictionaries directly). Below we load an example config that specifies a name from the registries for the `model`, `data` and `decision_maker`.

In [5]:
import yaml

yaml_dir = "configs/shortest_path.yml"
with open(yaml_dir) as f:  # close the file handle after loading
    config = yaml.safe_load(f)
for key, value in config.items():
    print(f"{key}: {value}")

model: {'name': 'shortest_path'}
data: {'name': 'shortest_path', 'num_data': 250}
runner: {'num_epochs': 5, 'use_wandb': False, 'save_best': True, 'experiments_folder': 'results/', 'seed': 5}
problem: {'train_ratio': 0.75, 'val_ratio': 0.15, 'seed': 5}
decision_maker: {'name': 'SPO+ linear', 'learning_rate': 0.005}
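Since a config is a nested dictionary, the same configuration could also be written directly in Python. The sketch below copies the values from the printed output above. The variable `seeded_sections` is only an illustrative addition that lists which sections carry their own seed.

```python
# Values copied from the printed config above; the structure mirrors the YAML file.
config = {
    "model": {"name": "shortest_path"},
    "data": {"name": "shortest_path", "num_data": 250},
    "runner": {
        "num_epochs": 5,
        "use_wandb": False,
        "save_best": True,
        "experiments_folder": "results/",
        "seed": 5,
    },
    "problem": {"train_ratio": 0.75, "val_ratio": 0.15, "seed": 5},
    "decision_maker": {"name": "SPO+ linear", "learning_rate": 0.005},
}

# Sections that carry their own seed (useful when varying seeds across runs).
seeded_sections = [key for key, value in config.items() if "seed" in value]
print(seeded_sections)  # ['runner', 'problem']
```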


Using configs, we can run experiments directly with the `run` function from `pydflt.utils.experiments`.

In [6]:
from pydflt.utils.experiments import run

result = run(config)

Set parameter FeasibilityTol to value 1e-06
Generating data using shortest_path
Computing optimal decisions for the entire dataset...
Optimal decisions computed and added to dataset.
Computing optimal objectives for the entire dataset...
Optimal objectives computed and added to dataset.
Shuffling indices before splitting...
Dataset split completed: Train=187, Validation=37, Test=26
Problem mode set to: train
Problem mode set to: train
Num of cores: 1
Epoch 0/5: Starting initial validation...
Problem mode set to: validation
Epoch Results:
validation/objective_mean: 8.8257
validation/select_arc_mean: 0.2000
validation/sym_rel_regret_mean: 0.3757
validation/rel_regret_mean: 1.4209
validation/arc_costs_mean: 0.7233
validation/mse_mean: 4.1249
validation/abs_regret_mean: 4.9402
Initial best validation metric (abs_regret): 4.940208911895752
Starting training...
Epoch: 1/5
Problem mode set to: train
Epoch Results:
validation/objective_mean: 8.8257
train/rel_regret_mean: 1.2780
validation/sele

To set up a more extensive set of experiments, one can start from the base config and update only parts of it with `update_config`. Below we compare two DFL methods, the Smart "Predict, then Optimize" (SPO+) loss [1] and the perturbed Fenchel-Young loss [2], to a PFL baseline.

In [7]:
from pydflt.utils.experiments import update_config

experiment_kwargs = {
    "SPO+": {},
    "PFYL": {
        "decision_maker": {
            "loss_function_str": "perturbedFenchelYoung",
        },
    },
    "PFL": {
        "decision_maker": {
            "loss_function_str": "mse",
        }
    },
}

seeds = list(range(1))
for experiment_name, kwargs in experiment_kwargs.items():
    for seed in seeds:
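        # Reload the base config so every run starts from an unmodified copy
        yaml_dir = "configs/shortest_path.yml"
        with open(yaml_dir) as f:
            config = yaml.safe_load(f)
        config["runner"]["experiment_name"] = experiment_name
        # Override the seed in every config section that defines one
        for key in config:
            if isinstance(config[key], dict) and "seed" in config[key]:
                config[key]["seed"] = seed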
        updated_config = update_config(config, kwargs)
        run(updated_config)

Set parameter FeasibilityTol to value 1e-06
Generating data using shortest_path
Computing optimal decisions for the entire dataset...
Optimal decisions computed and added to dataset.
Computing optimal objectives for the entire dataset...
Optimal objectives computed and added to dataset.
Shuffling indices before splitting...
Dataset split completed: Train=187, Validation=37, Test=26
Problem mode set to: train
Problem mode set to: train
Num of cores: 1
Epoch 0/5: Starting initial validation...
Problem mode set to: validation
Epoch Results:
validation/objective_mean: 7.9157
validation/select_arc_mean: 0.2000
validation/sym_rel_regret_mean: 0.3156
validation/rel_regret_mean: 1.0661
validation/arc_costs_mean: 0.9063
validation/mse_mean: 2.4028
validation/abs_regret_mean: 3.9976
Initial best validation metric (abs_regret): 3.997610330581665
Starting training...
Epoch: 1/5
Problem mode set to: train
Epoch Results:
validation/objective_mean: 7.9157
train/rel_regret_mean: 1.1059
validation/sele
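The experiment overrides above only name the keys that change, which suggests that the update keeps the remaining keys of each section. The source does not show how `update_config` is implemented, so the following is only a sketch of one plausible behaviour, a recursive merge of nested dictionaries. The helper `deep_update` is a hypothetical name and the actual function may differ.

```python
from copy import deepcopy


def deep_update(base, updates):
    """Return a copy of `base` with `updates` merged in recursively (hypothetical helper)."""
    merged = deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


base = {
    "decision_maker": {"name": "SPO+ linear", "learning_rate": 0.005},
    "runner": {"num_epochs": 5, "seed": 5},
}
pfl_overrides = {"decision_maker": {"loss_function_str": "mse"}}

pfl_config = deep_update(base, pfl_overrides)
print(pfl_config["decision_maker"])
# {'name': 'SPO+ linear', 'learning_rate': 0.005, 'loss_function_str': 'mse'}
```

Under this assumption, an empty override such as the one used for `"SPO+"` leaves the base config unchanged, and the base dictionary itself is never mutated.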

To analyze the results of multiple runs like these properly, we suggest using Weights & Biases. We explain how to use this tool in the next notebook.

## References

[1] Adam N. Elmachtoub and Paul Grigas. Smart “predict, then optimize”. Management Science, 68:9–26, 2022. [doi:10.1287/mnsc.2020.3922](https://doi.org/10.1287/mnsc.2020.3922).

[2] Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, and Francis Bach. Learning with differentiable perturbed optimizers. Advances in Neural Information Processing Systems, 33:9508–9519, 2020. [doi:10.48550/arXiv.2002.08676](https://doi.org/10.48550/arXiv.2002.08676).