# Dataset Overview

OMMX Quantum Benchmarks provides access to optimization benchmark datasets converted to OMMX format. This page describes the current status of available dataset categories.

**Current Sources**: The initial release includes datasets from QOBLIB, with a framework designed for expansion to additional benchmark sources in the future.

## Marketsplit (`01_marketsplit`)

**Problem Type**: Market split optimization problems  
**Models**: Binary linear, Binary unconstrained  
**Instances**: 156 instances per model (ms_03_050_002 - ms_15_200_003)

In [1]:
from ommx_quantum_benchmarks.qoblib import Marketsplit

dataset = Marketsplit()
print(f"Available models: {dataset.model_names}")
for model in dataset.model_names:
    instances = dataset.available_instances[model]
    print(f"{model}: {len(instances)} instances")

Available models: ['binary_linear', 'binary_unconstrained']
binary_linear: 156 instances
binary_unconstrained: 156 instances


## Labs (`02_labs`)

**Problem Type**: Low autocorrelation binary sequences  
**Models**: Integer, Quadratic unconstrained  
**Instances**: 99 instances per model (labs002 - labs100)

In [2]:
from ommx_quantum_benchmarks.qoblib import Labs

dataset = Labs()
print(f"Available models: {dataset.model_names}")
for model in dataset.model_names:
    instances = dataset.available_instances[model]
    print(f"{model}: {len(instances)} instances")

Available models: ['integer', 'quadratic_unconstrained']
integer: 99 instances
quadratic_unconstrained: 99 instances


## Birkhoff (`03_birkhoff`)

**Problem Type**: Minimum Birkhoff decomposition  
**Models**: Integer linear  
**Instances**: 800 instances (bhD-3-001 - bhS-6-100)

In [3]:
from ommx_quantum_benchmarks.qoblib import Birkhoff

dataset = Birkhoff()
print(f"Available models: {dataset.model_names}")
for model in dataset.model_names:
    instances = dataset.available_instances[model]
    print(f"{model}: {len(instances)} instances")

Available models: ['integer_linear']
integer_linear: 800 instances


## Steiner (`04_steiner`)

**Problem Type**: Steiner tree packing problem  
**Models**: Integer linear  
**Instances**: 31 instances (stp_s020_l2_t3_h2_rs24098 - stp_s040_l2_t4_h3_rs123)

In [4]:
from ommx_quantum_benchmarks.qoblib import Steiner

dataset = Steiner()
print(f"Available models: {dataset.model_names}")
for model in dataset.model_names:
    instances = dataset.available_instances[model]
    print(f"{model}: {len(instances)} instances")

Available models: ['integer_linear']
integer_linear: 31 instances


## Independent Set (`07_independentset`)

**Problem Type**: Maximum independent set problems  
**Models**: Binary linear, Binary unconstrained  
**Instances**: 42 instances per model (various graph instances)

In [5]:
from ommx_quantum_benchmarks.qoblib import IndependentSet

dataset = IndependentSet()
print(f"Available models: {dataset.model_names}")
for model in dataset.model_names:
    instances = dataset.available_instances[model]
    print(f"{model}: {len(instances)} instances")

Available models: ['binary_linear', 'binary_unconstrained']
binary_linear: 42 instances
binary_unconstrained: 42 instances


## Network (`08_network`)

**Problem Type**: Network design  
**Models**: Integer LP  
**Instances**: 20 instances (network05 - network24)

In [6]:
from ommx_quantum_benchmarks.qoblib import Network

dataset = Network()
print(f"Available models: {dataset.model_names}")
for model in dataset.model_names:
    instances = dataset.available_instances[model]
    print(f"{model}: {len(instances)} instances")

Available models: ['integer_lp']
integer_lp: 20 instances


## Routing (`09_routing`)

**Problem Type**: Vehicle routing  
**Models**: Integer linear  
**Instances**: 55 instances (XSH-n20-k4-01 - XSH-n20-k4-55)

In [7]:
from ommx_quantum_benchmarks.qoblib import Routing

dataset = Routing()
print(f"Available models: {dataset.model_names}")
for model in dataset.model_names:
    instances = dataset.available_instances[model]
    print(f"{model}: {len(instances)} instances")

Available models: ['integer_linear']
integer_linear: 55 instances


## Topology (`10_topology`)

**Problem Type**: Topology design  
**Models**: Flow MIP, Seidel linear, Seidel quadratic  
**Instances**: 16 instances per model (topology_15_3 - topology_50_4)

In [8]:
from ommx_quantum_benchmarks.qoblib import Topology

dataset = Topology()
print(f"Available models: {dataset.model_names}")
for model in dataset.model_names:
    instances = dataset.available_instances[model]
    print(f"{model}: {len(instances)} instances")

Available models: ['flow_mip', 'seidel_linear', 'seidel_quadratic']
flow_mip: 16 instances
seidel_linear: 16 instances
seidel_quadratic: 16 instances


## Other Dataset Categories

The following dataset categories are defined in the framework but currently contain no instances. They may be populated in future releases:

- **Sports** (`05_sports`) - Mixed integer linear sports scheduling problems
- **Portfolio** (`06_portfolio`) - Binary quadratic and quadratic unconstrained portfolio optimization

**Note**: These datasets can be instantiated but will return empty instance lists. Check the `available_instances` property to see current availability.
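
The availability check can be sketched as follows. This assumes `available_instances` behaves like a mapping from model name to a list of instance names, as the cells above suggest. The dictionaries are stand-ins for real dataset objects so the snippet runs without the package installed. The helper `models_with_instances` and the `mixed_integer_linear` key are hypothetical names used only for illustration.

```python
from typing import Mapping, Sequence


def models_with_instances(available: Mapping[str, Sequence[str]]) -> list[str]:
    """Return the model names that have at least one instance."""
    return [model for model, names in available.items() if len(names) > 0]


# Stand-ins shaped like `dataset.available_instances` (assumed structure).
labs_like = {
    "integer": ["labs002", "labs003"],
    "quadratic_unconstrained": ["labs002"],
}
sports_like = {"mixed_integer_linear": []}  # hypothetical model name, no instances

print(models_with_instances(labs_like))    # ['integer', 'quadratic_unconstrained']
print(models_with_instances(sports_like))  # []
```

With a real dataset, the same helper would presumably be called as `models_with_instances(dataset.available_instances)` before looping over instances, so that empty categories are skipped.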

## Current Status Summary

| Dataset | Models | Instance Count | Status |
|---------|--------|----------------|---------|
| Marketsplit | 2 | 156 per model | ✅ Available |
| Labs | 2 | 99 per model | ✅ Available |
| Birkhoff | 1 | 800 | ✅ Available |
| Steiner | 1 | 31 | ✅ Available |
| Sports | 1 | 0 | 🚧 Defined, no instances |
| Portfolio | 2 | 0 | 🚧 Defined, no instances |
| IndependentSet | 2 | 42 per model | ✅ Available |
| Network | 1 | 20 | ✅ Available |
| Routing | 1 | 55 | ✅ Available |
| Topology | 3 | 16 per model | ✅ Available |

**Legend**: 
- ✅ Available: Instances have been converted and are accessible
- 🚧 Defined, no instances: Dataset classes exist but no instances are currently available
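
A row of the summary table could be derived from a dataset object rather than maintained by hand. The sketch below relies only on the `model_names` and `available_instances` attributes used in the cells above. The function `status_row` is a hypothetical helper, and the `SimpleNamespace` object with placeholder instance names stands in for a real `Topology()` dataset so the example runs without the package.

```python
from types import SimpleNamespace


def status_row(name, dataset):
    """Build one (name, model count, per-model instance counts, status) row."""
    counts = [len(dataset.available_instances[m]) for m in dataset.model_names]
    status = "Available" if sum(counts) > 0 else "Defined, no instances"
    return (name, len(dataset.model_names), counts, status)


# Stand-in for Topology(): three models with 16 placeholder instances each.
models = ["flow_mip", "seidel_linear", "seidel_quadratic"]
topology_like = SimpleNamespace(
    model_names=models,
    available_instances={m: [f"placeholder_{i}" for i in range(16)] for m in models},
)

print(status_row("Topology", topology_like))
# ('Topology', 3, [16, 16, 16], 'Available')
```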