
# Benchmark Data Specification

All raw benchmark data must be serialized into a JSON format that closely corresponds to basic Python data types (list, dict, float, etc.). A minimal example of the expected format is as follows:

```json
[ // List of benchmarks, one per instance
  {
    // Metadata keys for the solver infrastructure
    "solver": "pysa:walksat",
    "solver_parameters": {
      "num_sweeps": 4,
      "num_replicas": 24,
      // additional parameters ...
    },
    "hardware": "CPU:Apple M2:1",
    // Keys for benchmark results
    "set": "Batch-XX",
    "instance_idx": 0,
    "cutoff_type": "iterations",
    "cutoff": 4,
    "runs_attempted": 9,
    "runs_solved": 7,
    "n_unsat_clauses": [ // Optimality gap data
      1.0,
      0.0,
      0.0,
      1.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ],
    // Keys for resource costs (one-time pre-processing and per-repetition)
    "pre_runtime_seconds": 0.0,
    "runtime_seconds": [
      0.005082,
      0.0052239999999999995,
      0.0050739999999999995,
      0.005068,
      0.005095,
      0.00528,
      0.0050739999999999995,
      0.005124999999999999,
      0.005095
    ]
  },
  // list continues ...
]
```
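JSON itself does not allow comments; the `//` comments above are annotations only and must not appear in actual data files. As a tip, a file in this format can easily be generated from a list of dictionaries using Python's `json` module. A 1D NumPy array `arr` of floating-point numbers can be converted to a JSON-serializable list via `list(float(x) for x in arr)` (or `arr.tolist()`), and similarly for an integer array using `int(x)`.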
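A minimal sketch of that tip, using only the standard library. The helper name `to_json_list` is a hypothetical illustration, and the values are adapted from the example above and shortened to three repetitions. Because the helper calls `float(x)` on each element, it accepts a 1D NumPy array as well as a plain Python list.

```python
import json


def to_json_list(values):
    """Convert a 1D sequence (e.g. a NumPy array) into a list of Python floats."""
    return [float(x) for x in values]


# One dictionary per benchmarked instance. Values are adapted from the example
# above and shortened to three repetitions for brevity.
benchmarks = [
    {
        "solver": "pysa:walksat",
        "solver_parameters": {"num_sweeps": 4, "num_replicas": 24},
        "hardware": "CPU:Apple M2:1",
        "set": "Batch-XX",
        "instance_idx": 0,
        "cutoff_type": "iterations",
        "cutoff": 4,
        "runs_attempted": 3,
        "runs_solved": 2,
        "n_unsat_clauses": to_json_list([1.0, 0.0, 0.0]),
        "pre_runtime_seconds": 0.0,
        "runtime_seconds": to_json_list([0.005082, 0.005224, 0.005074]),
    },
]

# json.dumps returns the JSON text; json.dump(benchmarks, f, indent=2) would
# write the same text to an open file object f instead.
text = json.dumps(benchmarks, indent=2)

# Parsing the text back yields an equal list of dictionaries.
loaded = json.loads(text)
```

The output of `json.dumps` contains no comments, so it is valid JSON that any standard parser can read.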

There are three major categories of keys that must be included in the benchmarking data.
#### 1. Solver and Hardware Metadata
Each element of the outermost list is the result of a benchmark on a single instance with multiple repetitions of the solver.
There are 3 metadata keys that are required for comparison across methods:
- `solver`: The name of the solver or simulator used to solve this instance. The name can be arbitrary as long as it is applied consistently: different benchmarks should have the same value for this key if and only if they were run with exactly the same algorithm.
- `solver_parameters`: A dictionary of parameters passed to the solver. Any parameter that may affect the time-to-solution (TTS), other than pseudorandom generator seeds, should be included.
- `hardware`: A list of strings (or a single string) specifying all hardware devices used.
  - CPU-only algorithms and simulations should state `"CPU:[cpu name]:[concurrency]"`, where `[cpu name]` is the name of the processor and `[concurrency]` is the maximum number of CPU cores used concurrently during the benchmark (which need not equal the total number of cores on the CPU). The CPU name only needs to be as specific as required for benchmarking purposes. Benchmarks are currently expected to run on either a single processor or a homogeneous array of processors of the same type.
  - Any additional hardware used as an accelerator may be specified with any sufficiently specific string, applied consistently.
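The CPU label convention above can be built and checked with a couple of small helpers. This is a sketch only: the function names are hypothetical, and only the `"CPU:[cpu name]:[concurrency]"` shape and the string-or-list rule come from this specification.

```python
def cpu_hardware_string(cpu_name: str, concurrency: int) -> str:
    """Build a CPU hardware label of the form "CPU:[cpu name]:[concurrency]"."""
    if concurrency < 1:
        raise ValueError("concurrency must be a positive integer")
    return f"CPU:{cpu_name}:{concurrency}"


def parse_cpu_hardware_string(label: str) -> tuple:
    """Split a "CPU:[cpu name]:[concurrency]" label into (cpu name, concurrency)."""
    kind, _, rest = label.partition(":")
    # Split on the last colon so the concurrency is always the final field.
    cpu_name, sep, concurrency = rest.rpartition(":")
    if kind != "CPU" or not sep:
        raise ValueError(f"not a CPU hardware label: {label!r}")
    return cpu_name, int(concurrency)


def hardware_list(hardware) -> list:
    """Normalize the `hardware` field, which may be one string or a list of strings."""
    return [hardware] if isinstance(hardware, str) else list(hardware)
```

For example, `cpu_hardware_string("Apple M2", 1)` produces the `"CPU:Apple M2:1"` label used in the example above. Accelerator labels are free-form, so the sketch does not try to parse them.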

#### 2. Benchmark Results
The basic keys expected in raw benchmark data are:
- `set`: The name of the problem instance set being benchmarked. This should be the folder name of the batch the instance belongs to.
- `instance_idx`: Identifier for the instance within the batch. This should be the integer given by the instance filename with the `.cnf` extension removed.
- `cutoff_type`: The type of cutoff applied to each repetition, if any. Timeout cutoffs have type `"time_seconds"`, with the cutoff given in seconds. Alternatively, the solver can be cut off after a maximum number of iterations (e.g. number of Monte Carlo sweeps), specified by type `"iterations"`.
- `cutoff`: The cutoff amount. If `cutoff_type` is `"iterations"`, this should be an integer. A timeout cutoff with `"time_seconds"` may be given as a float. May be `null` if no cutoff was applied.
- `runs_attempted`: The total number of solver repetitions completed.
- `runs_solved`: The number of successful repetitions, out of the completed repetitions.
- `configurations`: The list of configurations found by each repetition, which should also be included for completeness. For QUBO and SAT problems, each configuration should be a list of `0` or `1` values, one per variable in the problem.
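The `set` and `instance_idx` values can be derived directly from an instance's path. The sketch below assumes a layout of `<batch folder>/<integer index>.cnf`, which is how the two keys are described above. The helper names `instance_keys` and `to_configuration` are hypothetical.

```python
from pathlib import PurePath


def instance_keys(cnf_path: str) -> dict:
    """Derive the `set` and `instance_idx` keys from an instance file path.

    Assumes the layout <batch folder>/<integer index>.cnf.
    """
    path = PurePath(cnf_path)
    if path.suffix != ".cnf":
        raise ValueError(f"expected a .cnf file, got {cnf_path!r}")
    # The batch folder name is the set; the filename stem is the integer index.
    return {"set": path.parent.name, "instance_idx": int(path.stem)}


def to_configuration(assignment) -> list:
    """Convert a per-variable assignment (bools or 0/1 numbers) to a list of 0/1 ints."""
    return [1 if value else 0 for value in assignment]
```

For instance, `instance_keys("Batch-XX/0.cnf")` returns `{"set": "Batch-XX", "instance_idx": 0}`. Applying `to_configuration` to each repetition's final assignment gives one entry of the `configurations` list.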

Finally, there is a single problem-type-specific *optimality gap* key. Its value is the list of gaps, under the optimization objective function, between the true optimal solution of the instance and the solution found by each repetition (one entry per repetition). For SAT (with a satisfiability promise), the optimality gap key must be named `n_unsat_clauses`. The only strict requirement on this list is that a repetition is considered "successful" if and only if its optimality gap is equal to 0. Thus, it is critical that the value of `runs_solved` equals the number of elements in `n_unsat_clauses` that are equal to 0.
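The consistency requirement between `runs_solved` and the optimality gap list is easy to check mechanically. The sketch below also checks that the list has one entry per completed repetition. The function name `optimality_gap_errors` is hypothetical, and the example values come from the JSON example above.

```python
def optimality_gap_errors(entry: dict, gap_key: str = "n_unsat_clauses") -> list:
    """Return a list of consistency problems between runs_* counts and the gap list."""
    gaps = entry[gap_key]
    errors = []
    # One optimality gap per repetition.
    if len(gaps) != entry["runs_attempted"]:
        errors.append(
            f"{gap_key} has {len(gaps)} entries but runs_attempted is {entry['runs_attempted']}"
        )
    # A repetition is successful if and only if its gap is exactly 0.
    n_solved = sum(1 for gap in gaps if gap == 0)
    if n_solved != entry["runs_solved"]:
        errors.append(
            f"{n_solved} zero gaps in {gap_key} but runs_solved is {entry['runs_solved']}"
        )
    return errors


# Values from the example entry above: 7 of the 9 gaps are zero.
example = {
    "runs_attempted": 9,
    "runs_solved": 7,
    "n_unsat_clauses": [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}
```

An empty result, as for `example`, means the entry satisfies the requirement. Passing a different `gap_key` reuses the same check for other problem types.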

#### 3. Benchmark Resources

The resource costs of every repetition must also be specified in order to analyze the TTS scaling (or any of its analogs based on another resource metric). In general, both usage time and energy consumption should be reported for every hardware component.

Resource costs are categorized into two types:
 1. *Pre-processing resources*: These are any one-time costs required for pre-processing, programming, optimizing, or otherwise changing or tuning any aspect of the problem instance or the analog hardware, in a way that is specific to *one* problem instance.
 2. *Optimization resources*: Resource costs required to attempt repetitions of an algorithm to optimize a problem, after pre-processing resources have been spent. These should be specified as a list of numeric values, with one value for each repetition. The pre-processing resources are *not* included in the optimization resources. These resources include the CPU time and energy of a controller/hybrid algorithm, as well as analog hardware time and energy consumption. Readout times are not currently included.
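Because pre-processing costs are a single one-time value and optimization costs are per-repetition lists, the two are combined only at analysis time. The sketch below pairs a pre-processing key with its per-repetition counterpart from the key lists that follow, such as `pre_cpu_time_seconds` with `cpu_time_seconds`. The function names and the numbers in `entry` are made up for illustration.

```python
def total_resource(entry: dict, pre_key: str, per_rep_key: str) -> float:
    """One-time pre-processing cost plus the sum of all per-repetition costs."""
    return entry[pre_key] + sum(entry[per_rep_key])


def mean_per_repetition(entry: dict, per_rep_key: str) -> float:
    """Average optimization cost per repetition; pre-processing is excluded."""
    values = entry[per_rep_key]
    return sum(values) / len(values)


# Hypothetical entry with made-up numbers, for illustration only.
entry = {
    "pre_cpu_time_seconds": 0.5,
    "cpu_time_seconds": [0.25, 0.25, 0.5, 1.0],
}
```

Energy can be handled the same way, for example with `pre_cpu_energy_joules` and `cpu_energy_joules`. Keeping the per-repetition mean separate from the one-time cost matches the requirement that pre-processing is not counted as an optimization resource.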

The following keys are *pre-processing resource* keys:
- `pre_cpu_time_seconds`: CPU time used for pre-processing steps, in seconds. **(Required)**
- `pre_cpu_energy_joules`: CPU energy consumed by pre-processing steps, in joules. **(Required)**
- `pre_runtime_seconds`: Total analog hardware time used for pre-processing, in seconds. **(Required, if applicable)**
- `pre_energy_joules`: Total analog hardware energy consumed during pre-processing, in joules. **(Required, if applicable)**

The following keys specify the resource costs of each repetition of the optimization solver:
- `cpu_time_seconds`: List of the CPU time used in each repetition, in seconds. **(Required)**
- `cpu_energy_joules`: List of the CPU energy consumed in each repetition, in joules. **(Required)**
- `hardware_time_second`: List of the time spent computing on analog hardware during each repetition, in seconds. **(Required, if available)**
- `hardware_energy_joules`: List of the analog hardware energy consumed during each repetition, in joules. **(Required)**
- `hardware_calls`: List of the number of analog hardware calls made in each repetition. **(Required, if applicable for scaling)**
- `solver_iterations`: List of the number of complete iterations of the controlling co-design framework in each repetition. This may be equal to or different from the number of hardware calls, depending on the problem decomposition strategies, heuristics, etc. in use. It may be omitted for an exact solver or for other algorithms with no notion of an outer-loop "iteration" of approximately uniform duration. **(Required, if applicable)**
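Since every per-repetition list must hold one value per repetition, a benchmark entry can be checked against `runs_attempted`. This is a sketch under stated assumptions:

- Only keys that are present are checked, so the sketch does not enforce which keys are required.
- `runtime_seconds` is included because the example above uses it as a per-repetition list.
- The names `PER_REPETITION_KEYS` and `per_repetition_errors` are hypothetical.

```python
# Keys whose values are lists with one entry per repetition (from this
# specification; "runtime_seconds" appears in the JSON example).
PER_REPETITION_KEYS = (
    "n_unsat_clauses",
    "configurations",
    "runtime_seconds",
    "cpu_time_seconds",
    "cpu_energy_joules",
    "hardware_time_second",
    "hardware_energy_joules",
    "hardware_calls",
    "solver_iterations",
)


def per_repetition_errors(entry: dict) -> list:
    """Report each present per-repetition list whose length differs from runs_attempted."""
    expected = entry["runs_attempted"]
    errors = []
    for key in PER_REPETITION_KEYS:
        if key in entry and len(entry[key]) != expected:
            errors.append(f"{key}: expected {expected} values, got {len(entry[key])}")
    return errors


# Hypothetical entry: hardware_calls is missing one repetition.
entry = {
    "runs_attempted": 2,
    "n_unsat_clauses": [0.0, 1.0],
    "cpu_time_seconds": [0.1, 0.2],
    "hardware_calls": [3],
}
```

To check a whole file, load it with `json.load` and apply the function to each element of the outer list.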

