
Assessing uncertainty quantification quality metrics using the design-bench benchmarks #7

Open · sgbaird opened this issue Mar 24, 2022 · 2 comments


sgbaird commented Mar 24, 2022

I'm considering using this for some simple tests of a few different uncertainty quantification quality metrics to see which ones are better predictors for how successful an adaptive design scheme will be.

From a very black-box standpoint, this requires y_true, y_pred, and sigma (the true values, the predicted values, and the predicted uncertainties, respectively), plus a "notion of best" for the adaptive design task.

Does that seem like something feasible/easy to implement with this repository? Or do you think it would be better to look elsewhere or start from scratch?

brandontrabucco commented Mar 26, 2022

Hi sgbaird,

Great question! For most of the benchmarking tasks in design-bench (see the note below), we use a procedure that collects a dataset of design values whose y_true spans both high-performing and low-performing designs.

When we apply offline model-based optimization algorithms to these tasks (the algorithms are released in our https://github.com/brandontrabucco/design-baselines repository), we typically subsample the original task dataset to hide some fraction of the high-performing designs from the optimizer. This ensures there is headroom in the task objective function y_true(x) against which the optimizer can be evaluated.

For your purposes, you may find task.y helpful, which represents y_true(x) for every design value x in the subsampled task dataset that is typically passed to an optimizer.
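For example (a minimal sketch, assuming the design_bench package is installed; any task name works in place of the one shown):

import design_bench

# load a task with its default, subsampled dataset
task = design_bench.make("Superconductor-RandomForest-v0")

x = task.x  # the design values x that the optimizer sees
y = task.y  # y_true(x) for each of those designs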

In addition, if you would like to obtain the highest-performing designs and their y_true values, you can modify the dataset subsampling hyperparameters used when loading the task. For example, I'll change the parameters used to load the "Superconductor-RandomForest-v0" dataset, which would otherwise have a min_percentile of 0 and a max_percentile of 40, so that we instead obtain the held-out set of highest-performing designs:

import design_bench

# invert the default subsampling range (min_percentile=0, max_percentile=40)
# so the task dataset contains the held-out high-performing designs instead
max_percentile = 100
min_percentile = 40
task = design_bench.make(
    "Superconductor-RandomForest-v0",
    dataset_kwargs=dict(max_percentile=max_percentile,
                        min_percentile=min_percentile))
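With these kwargs, task.y now covers only the held-out, high-scoring portion of the dataset, so one simple "notion of best" for an adaptive design experiment (a sketch; you may prefer a percentile or another summary statistic) is:

# best y_true among the held-out high-performing designs
best_y_true = task.y.max()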

Obtaining y_pred and sigma may depend on the optimization algorithm you are using. Several of the baselines implemented in https://github.com/brandontrabucco/design-baselines include a probabilistic neural network that fits a distribution to the objective function y_true(x), which can be used to obtain y_pred and sigma for each task.
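As a rough stand-in for those models (an illustrative sketch only, using a scikit-learn bootstrap ensemble rather than the design-baselines networks), you could fit several regressors to task.x and task.y and use the ensemble spread as sigma:

import numpy as np
import design_bench
from sklearn.ensemble import RandomForestRegressor

task = design_bench.make("Superconductor-RandomForest-v0")
x, y = task.x, task.y.ravel()

# train a few models on bootstrap resamples of the visible dataset
rng = np.random.default_rng(0)
models = []
for seed in range(5):
    idx = rng.integers(0, len(x), size=len(x))
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(x[idx], y[idx])
    models.append(model)

preds = np.stack([m.predict(x) for m in models])  # shape (5, N)
y_pred = preds.mean(axis=0)  # point prediction of y_true(x)
sigma = preds.std(axis=0)    # ensemble spread as an uncertainty estimate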

Let me know if you have any other questions!

-Brandon

NOTE:

Our HopperController suite of tasks does not use subsampling; if the optimal performance for these tasks is desired, the performance of standard RL baselines on the Hopper-v2 MuJoCo environment can serve as a reference.


sgbaird commented Mar 26, 2022

Fantastic! Thank you for the thorough reply. This is great.
