In [1]:
import pandas as pd
import numpy as np 
import sys
import torch

from bikebench.benchmarking.public_benchmarking_utils import Benchmarker, get_unconditionally_valid_sample, get_conditionally_valid_sample, ScoreReportDashboard



In [2]:
device = "cpu"

First, we set up an instance of the benchmarker. This baseline never calls the evaluators, so we can set masked_constraints and gradient_free to True.

In [3]:
bench = Benchmarker(device=device, masked_constraints=True, gradient_free=True)

For this simple baseline, our "model" is just random sampling from the training dataset: we draw a random set of designs and then score the results.

In [4]:
data_tens = bench.get_train_data()

# "Generate" designs by drawing 10,000 random row indices (with replacement) from the training data
random_indices = torch.randint(0, data_tens.shape[0], (10000,))
generated = data_tens[random_indices]

main_scores, detailed_scores, all_evaluation_scores = bench.score(generated)
bench.save_results("results/benchmark_results/baseline_dataset")

100%|██████████| 100/100 [00:08<00:00, 12.15it/s]
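The sampling step in the cell above can be wrapped in a small reusable function. This is a minimal sketch, not part of BikeBench: the name sample_from_dataset and its replace and seed parameters are hypothetical. Note that torch.randint draws indices with replacement, so the 10,000 "generated" designs may contain repeated rows; the replace=False branch uses torch.randperm to draw distinct rows instead.

```python
from typing import Optional

import torch


def sample_from_dataset(
    data: torch.Tensor,
    n_samples: int,
    replace: bool = True,
    seed: Optional[int] = None,
) -> torch.Tensor:
    """Draw n_samples rows of data uniformly at random (hypothetical helper)."""
    # Use a dedicated seeded generator for reproducibility; otherwise fall back
    # to PyTorch's global random number generator.
    generator = torch.Generator().manual_seed(seed) if seed is not None else None
    n_rows = data.shape[0]
    if replace:
        # Same strategy as the notebook cell: indices drawn with replacement,
        # so a row may appear more than once.
        idx = torch.randint(0, n_rows, (n_samples,), generator=generator)
    else:
        # Without replacement: a random permutation truncated to n_samples.
        if n_samples > n_rows:
            raise ValueError("n_samples cannot exceed the number of rows when replace=False")
        idx = torch.randperm(n_rows, generator=generator)[:n_samples]
    return data[idx]


# Toy usage on a small tensor standing in for bench.get_train_data().
toy = torch.arange(20.0).reshape(10, 2)
print(sample_from_dataset(toy, 5, seed=0).shape)  # torch.Size([5, 2])
```

Sampling with replacement matches the notebook and works for any number of samples; sampling without replacement avoids duplicate designs but caps the sample size at the dataset size.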


We can see a variety of scores, including design quality, constraint violation, novelty, and diversity. We also see some statistics about the "model." Since we did not perform any evaluations, the evaluator was called 0 times. The model is marked as "conditional" because it never accessed the test-set conditions (unconditional models, in contrast, would train or fit using the test-set conditions). Masked Constraints and Gradient Free are both True, as we specified.
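The Evaluation Count statistic can be pictured as a wrapper that increments a counter every time a model queries the evaluator. The sketch below is a hypothetical illustration, not how BikeBench implements the count: the class CountingEvaluator and the stand-in objective are assumptions, and the real benchmark may count individual designs rather than calls.

```python
from typing import Callable

import torch


class CountingEvaluator:
    """Hypothetical wrapper (not the BikeBench API) that records evaluator calls."""

    def __init__(self, evaluate_fn: Callable[[torch.Tensor], torch.Tensor]):
        self.evaluate_fn = evaluate_fn
        self.count = 0  # number of times a model has queried the evaluator

    def __call__(self, designs: torch.Tensor) -> torch.Tensor:
        self.count += 1
        return self.evaluate_fn(designs)


# Stand-in objective: sum of design parameters (purely illustrative).
evaluator = CountingEvaluator(lambda x: x.sum(dim=1))

# A dataset-sampling "model" never queries the evaluator, so the count stays 0.
print(evaluator.count)  # 0

# An optimization-based model would query it, raising the count.
evaluator(torch.ones(4, 3))
print(evaluator.count)  # 1
```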

In [5]:
print(main_scores)

Design Quality ↑ (HV)      0.003457
Constraint Violation ↓       2.6988
Sim. to Data ↓ (MMD)       0.005056
Novelty ↑                  5.617839
Binary Validity ↑            0.0202
Diversity ↓ (DPP)         12.721894
Evaluation Count                0.0
Conditional?                   True
Masked Constraints?            True
Gradient Free?                 True
dtype: object
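The printed Series has dtype object because it mixes float metrics with boolean flags. When comparing baselines, it can help to split the metric rows from the metadata rows. Below is a minimal pandas sketch that rebuilds main_scores from the values printed above and assumes the usual arrow convention (↑ higher is better, ↓ lower is better); the names metrics, metadata, and direction are illustrative.

```python
import pandas as pd

# Values copied from the printed main_scores above.
main_scores = pd.Series({
    "Design Quality ↑ (HV)": 0.003457,
    "Constraint Violation ↓": 2.6988,
    "Sim. to Data ↓ (MMD)": 0.005056,
    "Novelty ↑": 5.617839,
    "Binary Validity ↑": 0.0202,
    "Diversity ↓ (DPP)": 12.721894,
    "Evaluation Count": 0.0,
    "Conditional?": True,
    "Masked Constraints?": True,
    "Gradient Free?": True,
})

# Rows whose label carries an arrow are metrics; the rest describe the model.
is_metric = main_scores.index.str.contains("↑|↓")
metrics = main_scores[is_metric].astype(float)
metadata = main_scores[~is_metric]

# Record the assumed optimization direction of each metric.
direction = pd.Series(
    ["higher" if "↑" in name else "lower" for name in metrics.index],
    index=metrics.index,
)
print(pd.DataFrame({"value": metrics, "better": direction}))
```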
