diff --git a/examples/no-priors-characterization/README.md b/examples/no-priors-characterization/README.md
deleted file mode 100644
index 79fc313be..000000000
--- a/examples/no-priors-characterization/README.md
+++ /dev/null
@@ -1,281 +0,0 @@
-# Exploring Parameter Spaces with No-Priors Characterization
-
-
-
-> [!NOTE] The scenario
->
-> You have an experiment with multiple parameters,
-> and you want to understand how these parameters influence the outcome.
-> **In this example, `ado`'s no-priors characterization operator is used to
-> systematically sample and measure the target property across the parameter
-> space, using sampling strategies that aim to cover that space
-> uniformly.** Using the no-priors characterization
-> operator involves:
->
-> 1. Defining the parameter space to explore.
-> 2. Creating an `operation` that uses no-priors characterization to sample
-> points using a chosen strategy.
-> 3. Observing the sampling process as it measures the target output property with
-> the selected strategy.
-
-> [!IMPORTANT] Prerequisites
->
-> Get the example files and install dependencies:
->
-> ```commandline
-> git clone https://github.com/IBM/ado.git
-> cd ado
-> pip install plugins/operators/no-priors-characterization/
-> pip install examples/no-priors-characterization/custom_experiments/
-> ```
-
-> [!CAUTION]
->
-> All commands below assume you are running them from the
-> **top-level of the `ado` repository**.
-
-> [!TIP] TL;DR
->
-> To create a `discoveryspace` and explore it with the no-priors
-> characterization operator, execute the following from the root of the `ado`
-> repository:
->
-> ```bash
-> : # Create the space to explore based on a custom experiment
-> ado create space -f \
-> examples/no-priors-characterization/example_yamls/space_reaction.yaml \
-> --new-sample-store
-> : # Explore it with no-priors characterization!
-> ado create operation -f \
-> examples/no-priors-characterization/example_yamls/op_basic_sampling.yaml \
-> --use-latest space
-> ```
-
-
-
-## What is No-Priors Characterization?
-
-**No-Priors Characterization** is a sampling operator designed to explore a
-parameter space systematically without requiring any prior knowledge or
-existing data. It's perfect for initial exploration of a system where you want
-to gather representative samples across the entire parameter space.
-
-**Handling Existing Measurements**: If the discovery space already contains
-measured entities for the target property, the operator automatically:
-
-- Identifies which entities have already been measured
-- Excludes them from sampling, so that the operator still measures the
- requested number of new entities
-
-The operator supports three sampling strategies:
-
-1. **Random Sampling (`random`)**: Uniformly random sampling across the
- parameter space. Fast and simple, but may not provide optimal coverage.
-
-2. **Concatenated Latin Hypercube Sampling (`clhs`)**: An adaptation of Latin
- Hypercube Sampling for discrete spaces. It achieves good coverage in each
- dimension by avoiding parameter combinations that share many values.
- Particularly effective for high-dimensional spaces.
-
-3. **Sobol Sampling (`sobol`)**: A quasi-random low-discrepancy sampling
- method that provides better space-filling properties than pure random
- sampling. It has been adapted for discrete parameter spaces. It falls back
- to Concatenated Latin Hypercube Sampling when collisions are detected
- during the discretization process.
-
-
-> [!CAUTION]
->
-> In the current version of no-priors characterization, if any measurement
-> fails to produce the target output property specified in the
-> `operation.parameters.targetOutput` field, the operation may fail or produce
-> incomplete results. Ensure all experiments return the expected target property.
-
-
-
-The operator samples a specified number of points in batches, measures them
-using the configured experiment, and stores the results in the sample store.
-
-## Creating a `discoveryspace`
-
-A `discoveryspace` describes the parameters you want to explore (`entitySpace`)
-and how to measure them (`measurementSpace`). In this example, we'll use two
-custom Python functions as experiments and take inspiration from the Chemistry domain:
-
-1. **`calculate_reaction_yield`**: Calculates chemical reaction yield based on
- temperature (K), concentration (mol/L), and catalyst amount (g) using an
- Arrhenius-like equation.
-
-2. **`calculate_material_strength`**: Calculates material tensile strength (MPa)
- based on composition percentages, temperature (°C), and grain size (μm) using
- a Hall-Petch relationship.
-
-First, create the `discoveryspace` by executing this command from the repository
-root:
-
-```commandline
-ado create space -f \
- examples/no-priors-characterization/example_yamls/space_reaction.yaml \
- --new-sample-store
-```
-
-This will create a new space and a sample store to hold the measurement results.
-The output will be similar to:
-
-```terminaloutput
-Success! Created space with identifier: space-bfed2d-19b49a
-```
-
-## Exploring with a No-Priors Characterization Operation
-
-Next, we will run an `operation` that uses no-priors characterization to
-explore the `discoveryspace`. We provide three example configurations with
-different sampling strategies:
-
-### Basic Sampling with CLHS
-
-The configuration for a basic sampling operation using CLHS is in
-`op_basic_sampling.yaml`:
-
-
-
-```yaml
-{%
- include-markdown "./example_yamls/op_basic_sampling.yaml"
-%}
-```
-
-
-To run the operation, execute:
-
-
-
-```commandline
-ado create operation -f \
- examples/no-priors-characterization/example_yamls/op_basic_sampling.yaml \
- --use-latest space
-```
-
-
-
-### Exploration with Random Sampling
-
-For a quick initial exploration, use random sampling with 20 samples and a
-batch size of 5:
-
-```commandline
-ado create operation -f \
- examples/no-priors-characterization/example_yamls/op_quick_exploration.yaml \
- --use-latest space
-```
-
-**Note**: Each operation samples different points from the space based on its
-strategy and parameters, even when using the same discovery space.
-
-### Thorough Coverage with Sobol Sequence
-
-For detailed coverage of the parameter space, use Sobol sampling with 100
-samples and a batch size of 1:
-
-```commandline
-ado create operation -f \
- examples/no-priors-characterization/example_yamls/op_thorough_coverage.yaml \
- --use-latest space
-```
-
-### What to Expect in the Terminal
-
-You will see output as the no-priors characterization operator samples and
-measures points. The key stages are:
-
-#### Initialization
-
-The operator will log the start of the sampling process:
-
-
-
-```commandline
-2026-03-09 16:30:00,000 INFO MainThread no_priors_characterization.operator: Starting no-priors characterization with 30 samples using clhs strategy
-```
-
-
-
-#### Sampling and Measurement
-
-For each batch of points, you will see output indicating the experiments being
-submitted and completed:
-
-
-
-```commandline
-(RandomWalk pid=82843) Continuous batching: SUBMIT EXPERIMENT. Submitted experiment custom_experiments.calculate_reaction_yield for temperature.353-concentration.4.1-catalyst_amount.4.5. Request identifier: c72090
-(RandomWalk pid=82843)
-(RandomWalk pid=82843) Continuous batching: SUMMARY. Entities sampled and submitted: 2. Experiments completed: 1 Waiting on 1 active requests. There are 0 dependent experiments
-(RandomWalk pid=82843) Continuous Batching: EXPERIMENT COMPLETION. Received finished notification for experiment in measurement request in group 1: request-c72090-experiment-calculate_reaction_yield-entities-temperature.353-concentration.4.1-catalyst_amount.4.5 (no_priors_characterization)-requester-randomwalk-1.6.1.dev9+03a65e7b.dirty-9a277d-time-2026-03-10 11:43:11.066810+00:00
-```
-
-
-
-#### Completion
-
-The operation will end with a success message:
-
-
-
-```commandline
-Success! Created operation with identifier operation-no-priors-characterization-v0.1-8b23a245 and it finished successfully.
-```
-
-
-
-## Looking at the `operation` output
-
-After the operation completes, you can view the sampled entities and their
-measured values.
-
-You can see the relationship between the space and operations with:
-
-```commandline
-ado show related space --use-latest
-```
-
-This will show the `discoveryspace` and the operations that were run.
-To see the entities of the space that have been measured, you can run:
-
-
-
-```commandline
-ado show entities space --use-latest
-```
-
-
-
-This will display a table of the entities sampled and their measured reaction
-yield values.
-
-
-
-```text
-┌───────┬──────────────────────────────────────────────────────────┬────────────────────────────┬─────────────────────────────────────────────┬─────────────┬───────────────┬─────────────────┬──────────┐
-│ INDEX │ identifier │ generatorid │ experiment_id │ temperature │ concentration │ catalyst_amount │ yield │
-├───────┼──────────────────────────────────────────────────────────┼────────────────────────────┼─────────────────────────────────────────────┼─────────────┼───────────────┼─────────────────┼──────────┤
-│ 0 │ temperature.300-concentration.1.0-catalyst_amount.2.0 │ no_priors_characterization │ custom_experiments.calculate_reaction_yield │ 300 │ 1.0 │ 2.0 │ 45.23 │
-│ 1 │ temperature.350-concentration.2.5-catalyst_amount.5.0 │ no_priors_characterization │ custom_experiments.calculate_reaction_yield │ 350 │ 2.5 │ 5.0 │ 78.91 │
-│ 2 │ temperature.400-concentration.0.5-catalyst_amount.1.0 │ no_priors_characterization │ custom_experiments.calculate_reaction_yield │ 400 │ 0.5 │ 1.0 │ 92.15 │
-│ ... │ ... │ ... │ ... │ ... │ ... │ ... │ ... │
-└───────┴──────────────────────────────────────────────────────────┴────────────────────────────┴─────────────────────────────────────────────┴─────────────┴───────────────┴─────────────────┴──────────┘
-```
-
-
-
-## Takeaways
-
-- **Systematic Exploration**: The no-priors characterization operator provides
- systematic sampling of parameter spaces without requiring prior knowledge.
-- **Multiple Strategies**: Choose from random, Sobol, or CLHS sampling based on
- your needs for speed vs. coverage quality.
-- **Flexible Configuration**: Adjust the number of samples and batch size to
- balance thoroughness with computational resources.
-- **Foundation for Further Analysis**: The sampled data can serve as a
- foundation for building surrogate models or for use with other operators like
- TRIM.
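
Reviewer note on the `clhs` strategy described in the README removed above. The sketch below is a minimal standalone illustration, assuming each dimension keeps a shuffled pool of its indices and refills it once exhausted. The name `clhs_sketch` is hypothetical and this is not the plugin's implementation.

```python
import random


def clhs_sketch(dimensions, n_samples, seed=None):
    """Hypothetical sketch of concatenated Latin hypercube sampling.

    Each dimension draws from its own shuffled pool of indices; when a pool
    runs out, a fresh permutation is started (the "concatenation" step).
    """
    if any(d <= 0 for d in dimensions):
        raise ValueError(f"All dimensions must be >= 1, received {dimensions}")

    rng = random.Random(seed)
    pools = [[] for _ in dimensions]  # empty pools are filled on first use
    samples = []

    for _ in range(n_samples):
        point = []
        for j, d in enumerate(dimensions):
            if not pools[j]:
                # Start a new permutation block for this dimension.
                pools[j] = list(range(d))
                rng.shuffle(pools[j])
            point.append(pools[j].pop())
        samples.append(point)
    return samples


points = clhs_sketch([3, 4], n_samples=12, seed=0)
```

With dimension sizes 3 and 4, every aligned block of 3 samples covers all values of the first dimension and every aligned block of 4 samples covers all values of the second, which is the per-dimension stratification the README refers to.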
diff --git a/examples/no-priors-characterization/custom_experiments/no_priors_custom_experiments/experiments.py b/examples/no-priors-characterization/custom_experiments/no_priors_custom_experiments/experiments.py
deleted file mode 100644
index 8d426ef38..000000000
--- a/examples/no-priors-characterization/custom_experiments/no_priors_custom_experiments/experiments.py
+++ /dev/null
@@ -1,176 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-
-from typing import Literal
-
-import numpy as np
-
-from orchestrator.modules.actuators.custom_experiments import custom_experiment
-from orchestrator.schema.domain import PropertyDomain, VariableTypeEnum
-from orchestrator.schema.property import ConstitutiveProperty
-
-# ---------------------------
-# Properties for Reaction Yield
-# ---------------------------
-
-temperature = ConstitutiveProperty(
- identifier="temperature",
- propertyDomain=PropertyDomain(
- variableType=VariableTypeEnum.CONTINUOUS_VARIABLE_TYPE,
- domainRange=[273, 473], # 0-200°C in Kelvin
- ),
-)
-
-concentration = ConstitutiveProperty(
- identifier="concentration",
- propertyDomain=PropertyDomain(
- variableType=VariableTypeEnum.CONTINUOUS_VARIABLE_TYPE,
- domainRange=[0.1, 5.0], # mol/L
- ),
-)
-
-catalyst_amount = ConstitutiveProperty(
- identifier="catalyst_amount",
- propertyDomain=PropertyDomain(
- variableType=VariableTypeEnum.CONTINUOUS_VARIABLE_TYPE,
- domainRange=[0.0, 10.0], # grams
- ),
-)
-
-# ---------------------------
-# Properties for Material Strength
-# ---------------------------
-
-composition_a = ConstitutiveProperty(
- identifier="composition_a",
- propertyDomain=PropertyDomain(
- variableType=VariableTypeEnum.CONTINUOUS_VARIABLE_TYPE,
- domainRange=[0, 100], # percentage
- ),
-)
-
-composition_b = ConstitutiveProperty(
- identifier="composition_b",
- propertyDomain=PropertyDomain(
- variableType=VariableTypeEnum.CONTINUOUS_VARIABLE_TYPE,
- domainRange=[0, 100], # percentage
- ),
-)
-
-temperature_celsius = ConstitutiveProperty(
- identifier="temperature_celsius",
- propertyDomain=PropertyDomain(
- variableType=VariableTypeEnum.CONTINUOUS_VARIABLE_TYPE,
- domainRange=[-50, 200], # Celsius
- ),
-)
-
-grain_size = ConstitutiveProperty(
- identifier="grain_size",
- propertyDomain=PropertyDomain(
- variableType=VariableTypeEnum.CONTINUOUS_VARIABLE_TYPE,
- domainRange=[1, 100], # micrometers
- ),
-)
-
-# ---------------------------
-# Reaction Yield Experiment
-# ---------------------------
-
-
-@custom_experiment(
- required_properties=[temperature, concentration, catalyst_amount],
- output_property_identifiers=["yield"],
-)
-def calculate_reaction_yield(
- temperature: float, concentration: float, catalyst_amount: float
-) -> dict[Literal["yield"], float]:
- """
- Calculate chemical reaction yield using Arrhenius-like equation with catalyst effect.
-
- The yield is calculated using:
- k = A * exp(-Ea / (R * T)) * (1 + 0.1 * catalyst_amount)
- yield = 100 * (1 - exp(-k * concentration * time))
-
- where:
- A = 1e10 (pre-exponential factor)
- Ea = 50000 J/mol (activation energy)
- R = 8.314 J/(mol·K) (gas constant)
- time = 3600 s (reaction time)
-
- Args:
- temperature: Reaction temperature in Kelvin
- concentration: Reactant concentration in mol/L
- catalyst_amount: Catalyst amount in grams
-
- Returns:
- dict: Dictionary containing the calculated yield as a percentage (0-100)
- """
- A = 1e10 # pre-exponential factor
- Ea = 50000 # J/mol, activation energy
- R = 8.314 # J/(mol·K), gas constant
- time = 3600 # seconds, reaction time
-
- # Calculate rate constant with catalyst effect
- k = A * np.exp(-Ea / (R * temperature)) * (1 + 0.1 * catalyst_amount)
-
- # Calculate yield
- reaction_yield = 100 * (1 - np.exp(-k * concentration * time))
-
- # Ensure yield is between 0 and 100
- reaction_yield = np.clip(reaction_yield, 0, 100)
-
- return {"yield": float(reaction_yield)}
-
-
-# ---------------------------
-# Material Strength Experiment
-# ---------------------------
-
-
-@custom_experiment(
- required_properties=[composition_a, composition_b, temperature_celsius, grain_size],
- output_property_identifiers=["tensile_strength"],
-)
-def calculate_material_strength(
- composition_a: float,
- composition_b: float,
- temperature_celsius: float,
- grain_size: float,
-) -> dict[Literal["tensile_strength"], float]:
- """
- Calculate material tensile strength using Hall-Petch relationship with composition effects.
-
- The strength is calculated using:
- base_strength = composition_a * 500 + composition_b * 300 + (100 - composition_a - composition_b) * 200
- temp_factor = 1 - 0.002 * (temperature_celsius - 20)
- grain_factor = 1 + 100 / sqrt(grain_size)
- tensile_strength = base_strength * temp_factor * grain_factor / 1000
-
- Args:
- composition_a: Percentage of component A (0-100)
- composition_b: Percentage of component B (0-100)
- temperature_celsius: Testing temperature in Celsius
- grain_size: Grain size in micrometers
-
- Returns:
- dict: Dictionary containing the calculated tensile strength in MPa
- """
- # Calculate base strength from composition
- composition_c = 100 - composition_a - composition_b
- base_strength = composition_a * 500 + composition_b * 300 + composition_c * 200
-
- # Temperature effect (strength decreases with temperature)
- temp_factor = 1 - 0.002 * (temperature_celsius - 20)
- temp_factor = np.clip(temp_factor, 0.1, 2.0) # Prevent unrealistic values
-
- # Hall-Petch relationship (strength increases with smaller grain size)
- grain_factor = 1 + 100 / np.sqrt(grain_size)
-
- # Calculate final tensile strength in MPa
- tensile_strength = base_strength * temp_factor * grain_factor / 1000
-
- # Ensure positive strength
- tensile_strength = np.maximum(tensile_strength, 0)
-
- return {"tensile_strength": float(tensile_strength)}
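
Reviewer note on the yield formula in the file removed above. The sketch below re-implements it without the `custom_experiment` decorator so it can be evaluated in isolation. The constants are copied from the docstring; the function names `rate_exponent` and `reaction_yield_sketch` are hypothetical.

```python
import math

# Constants as documented in calculate_reaction_yield.
A = 1e10          # pre-exponential factor
EA = 50_000.0     # J/mol, activation energy
R = 8.314         # J/(mol*K), gas constant
TIME_S = 3600.0   # s, reaction time


def rate_exponent(temperature, concentration, catalyst_amount):
    """Return k * concentration * time, the argument of the decaying exponential."""
    k = A * math.exp(-EA / (R * temperature)) * (1 + 0.1 * catalyst_amount)
    return k * concentration * TIME_S


def reaction_yield_sketch(temperature, concentration, catalyst_amount):
    """Yield in percent, clipped to [0, 100] as in the original function."""
    value = 100 * (1 - math.exp(-rate_exponent(temperature, concentration, catalyst_amount)))
    return min(max(value, 0.0), 100.0)


# Coldest, most dilute, catalyst-free corner of the declared domain.
corner_exponent = rate_exponent(273.0, 0.1, 0.0)
corner_yield = reaction_yield_sketch(273.0, 0.1, 0.0)
```

The exponent grows with temperature, concentration and catalyst amount, so this corner is its minimum over the declared domain. It is already on the order of 10^3 there, which suggests the computed yield saturates at 100 for every point in the example space. That may be worth keeping in mind when interpreting sampled results.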
diff --git a/examples/no-priors-characterization/custom_experiments/pyproject.toml b/examples/no-priors-characterization/custom_experiments/pyproject.toml
deleted file mode 100644
index 2c21edf05..000000000
--- a/examples/no-priors-characterization/custom_experiments/pyproject.toml
+++ /dev/null
@@ -1,19 +0,0 @@
-[project]
-name = "no_priors_custom_experiments"
-description = "A set of custom experiments used to test No-Priors Characterization Operation"
-dependencies = [
- "ado-core",
- "numpy",
-]
-dynamic = ["version"]
-
-[project.entry-points."ado.custom_experiments"]
-# This should be the Python module containing your decorated function(s).
-no_priors_experiments = "no_priors_custom_experiments.experiments"
-
-[build-system]
-requires = ["setuptools", "setuptools_scm"]
-build-backend = "setuptools.build_meta"
-
-[tool.setuptools_scm]
-root = "../../../"
diff --git a/examples/no-priors-characterization/example_yamls/op_basic_sampling.yaml b/examples/no-priors-characterization/example_yamls/op_basic_sampling.yaml
deleted file mode 100644
index 566ecf4e1..000000000
--- a/examples/no-priors-characterization/example_yamls/op_basic_sampling.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-spaces:
- - space-c8717f-3a68bf
-operation:
- module:
- operationType: characterize
- operatorName: no_priors_characterization
- parameters:
- targetOutput: yield
- samples: 30
- batchSize: 1
- sampling_strategy: clhs
diff --git a/examples/no-priors-characterization/example_yamls/op_quick_exploration.yaml b/examples/no-priors-characterization/example_yamls/op_quick_exploration.yaml
deleted file mode 100644
index 1d5bea309..000000000
--- a/examples/no-priors-characterization/example_yamls/op_quick_exploration.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-spaces:
- - space-c8717f-3a68bf
-operation:
- module:
- operationType: characterize
- operatorName: no_priors_characterization
- parameters:
- targetOutput: yield
- samples: 20
- batchSize: 5
- sampling_strategy: random
diff --git a/examples/no-priors-characterization/example_yamls/op_thorough_coverage.yaml b/examples/no-priors-characterization/example_yamls/op_thorough_coverage.yaml
deleted file mode 100644
index a2026f891..000000000
--- a/examples/no-priors-characterization/example_yamls/op_thorough_coverage.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-spaces:
- - space-c8717f-3a68bf
-operation:
- module:
- operationType: characterize
- operatorName: no_priors_characterization
- parameters:
- targetOutput: yield
- samples: 100
- batchSize: 1
- sampling_strategy: sobol
diff --git a/examples/no-priors-characterization/example_yamls/space_reaction.yaml b/examples/no-priors-characterization/example_yamls/space_reaction.yaml
deleted file mode 100644
index eea63704b..000000000
--- a/examples/no-priors-characterization/example_yamls/space_reaction.yaml
+++ /dev/null
@@ -1,21 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-sampleStoreIdentifier: 3a68bf
-metadata:
- name: reaction_yield_space
-entitySpace:
- - identifier: temperature
- propertyDomain:
- domainRange: [273, 473]
- interval: 10
- - identifier: concentration
- propertyDomain:
- domainRange: [0.1, 5.0]
- interval: 0.2
- - identifier: catalyst_amount
- propertyDomain:
- domainRange: [0.0, 10.0]
- interval: 0.5
-experiments:
- - actuatorIdentifier: custom_experiments
- experimentIdentifier: calculate_reaction_yield
diff --git a/examples/trim/example_yamls/randomwalk_clhs_operation.yaml b/examples/trim/example_yamls/randomwalk_clhs_operation.yaml
new file mode 100644
index 000000000..35efa1b2e
--- /dev/null
+++ b/examples/trim/example_yamls/randomwalk_clhs_operation.yaml
@@ -0,0 +1,26 @@
+# Copyright IBM Corporation 2025, 2026
+# SPDX-License-Identifier: MIT
+metadata:
+ name: 'randomwalk-clhs'
+ description: 'Perform a random walk using Concatenated Latin Hypercube Sampling (CLHS) for better space coverage'
+spaces:
+ - space-2fa5d0-2905f9
+operation:
+ module:
+ operatorName: "random_walk"
+ operationType: "search"
+ parameters:
+ numberEntities: 20
+ batchSize: 5
+ singleMeasurement: true
+ samplerConfig:
+ module:
+ moduleName: trim.samplers.no_priors_sampler
+ moduleClass: NoPriorsSampleSelector
+ parameters:
+ targetOutput: pressure
+ samples: 20
+ batchSize: 1
+ sampling_strategy: clhs
+
+# Made with Bob
diff --git a/examples/trim/example_yamls/randomwalk_sobol_operation.yaml b/examples/trim/example_yamls/randomwalk_sobol_operation.yaml
new file mode 100644
index 000000000..19ba89791
--- /dev/null
+++ b/examples/trim/example_yamls/randomwalk_sobol_operation.yaml
@@ -0,0 +1,26 @@
+# Copyright IBM Corporation 2025, 2026
+# SPDX-License-Identifier: MIT
+metadata:
+ name: 'randomwalk-sobol'
+ description: 'Perform a random walk using Sobol quasi-random sampling for better space coverage'
+spaces:
+ - space-2fa5d0-2905f9
+operation:
+ module:
+ operatorName: "random_walk"
+ operationType: "search"
+ parameters:
+ numberEntities: 20
+ batchSize: 5
+ singleMeasurement: true
+ samplerConfig:
+ module:
+ moduleName: trim.samplers.no_priors_sampler
+ moduleClass: NoPriorsSampleSelector
+ parameters:
+ targetOutput: pressure
+ samples: 20
+ batchSize: 1
+ sampling_strategy: sobol
+
+# Made with Bob
diff --git a/plugins/operators/no-priors-characterization/README.md b/plugins/operators/no-priors-characterization/README.md
deleted file mode 100644
index f7bdf3c03..000000000
--- a/plugins/operators/no-priors-characterization/README.md
+++ /dev/null
@@ -1,46 +0,0 @@
-# ADO No-Priors Characterization Operator
-
-`ado-no-priors-characterization` is an operator plugin for the
-[Accelerated Discovery Orchestrator (ADO)](https://github.com/IBM/ado),
-providing initial exploration of discovery spaces using high-dimensional
-sampling strategies.
-
-**No-Priors Characterization** is designed for unbiased exploration when no
-measured data exists yet, establishing an initial dataset for subsequent
-model-based exploration.
-
-## How it Works
-
-The `No-Priors Characterization` operator uses different sampling strategies
-to ensure good coverage of the discovery space:
-
-- **`random`**: Random sampling across the space for unbiased exploration.
- This provides the baseline sampling approach.
-- **`clhs`** (Concatenated Latin Hypercube Sampling): Ensures uniform coverage
- by enforcing stratification in each dimension independently. Each dimension
- cycles through all possible values before repeating.
-- **`sobol`**: Sobol sequence sampling for quasi-random low-discrepancy coverage
-
-The operator retrieves already-measured entities from the discovery space,
-orders the unmeasured entities using the specified sampling strategy,
-and yields entities in batches
-for measurement.
-
-## Installation
-
-You can install the `No-Priors Characterization` operator and its dependencies
-(including `ado-core`) directly from PyPI:
-
-```bash
-pip install ado-no-priors-characterization
-```
-
-## More Information
-
-To learn more about No-Priors Characterization and explore the full
-capabilities of ADO, including detailed documentation, configuration guides, and
-additional examples, visit the official ADO website:
-
-- **No-Priors Quickstart**:
-- **Configuring No-Priors**:
-- **ADO Documentation**:
diff --git a/plugins/operators/no-priors-characterization/pyproject.toml b/plugins/operators/no-priors-characterization/pyproject.toml
deleted file mode 100644
index d9df0afcb..000000000
--- a/plugins/operators/no-priors-characterization/pyproject.toml
+++ /dev/null
@@ -1,30 +0,0 @@
-[project]
-name = "ado-no-priors-characterization"
-description = "No-priors characterization operator for sampling discovery spaces using high-dimensional sampling strategies"
-readme = "README.md"
-requires-python = ">=3.10,<3.14"
-dependencies = [
- "ado-core",
- "numpy",
- "pandas>=2.2.0",
- "scipy",
-]
-dynamic = ["version"]
-
-[project.entry-points."ado.operators"]
-no-priors-characterization = "no_priors_characterization.operator"
-
-[build-system]
-requires = ["setuptools", "setuptools_scm"]
-build-backend = "setuptools.build_meta"
-
-[tool.setuptools.packages.find]
-where = ["src"]
-include = ["no_priors_characterization*"]
-
-[tool.setuptools_scm]
-root = "../../../"
-local_scheme = "node-and-timestamp"
-
-[tool.uv.sources]
-ado-core = { workspace = true }
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/__init__.py b/plugins/operators/no-priors-characterization/src/no_priors_characterization/__init__.py
deleted file mode 100644
index ff303fb0e..000000000
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/__init__.py
+++ /dev/null
@@ -1,6 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-
-from no_priors_characterization.operator import no_priors_characterization
-
-__all__ = ["no_priors_characterization"]
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/operator.py b/plugins/operators/no-priors-characterization/src/no_priors_characterization/operator.py
deleted file mode 100644
index b6c26b592..000000000
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/operator.py
+++ /dev/null
@@ -1,109 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-
-import logging
-from importlib.metadata import version
-
-from no_priors_characterization.no_priors_pydantic import NoPriorsParameters
-from orchestrator.core.discoveryspace.space import DiscoverySpace
-from orchestrator.core.operation.config import FunctionOperationInfo
-from orchestrator.core.operation.operation import OperationOutput
-from orchestrator.modules.operators.collections import characterize_operation
-
-logger = logging.getLogger(__name__)
-
-
-@characterize_operation(
- name="no_priors_characterization",
- configuration_model=NoPriorsParameters,
- configuration_model_default=NoPriorsParameters(targetOutput="default_target"),
- description="""
- No-priors characterization samples a discovery space using high-dimensional
- sampling strategies (random, CLHS, Sobol, etc.) without relying on prior
- model knowledge or feature importance. This operator is useful for initial
- exploration of discovery spaces when no training data exists yet.
- """,
- version=version("ado-no-priors-characterization"),
-)
-def no_priors_characterization(
- discoverySpace: DiscoverySpace = None, # type: ignore[name-defined]
- operationInfo: FunctionOperationInfo | None = None,
- **kwargs: object,
-) -> OperationOutput:
- """
- Execute no-priors characterization on a discovery space.
-
- Samples entities using high-dimensional sampling strategies without requiring
- prior model training or feature importance information. Useful for initial
- characterization when no measured data exists.
-
- Args:
- discoverySpace: The discovery space to characterize
- operationInfo: Optional operation metadata
- **kwargs: Additional parameters validated against NoPriorsParameters model
-
- Returns:
- OperationOutput containing the operation resources and metadata
- """
- # Lazy import to avoid circular import issues during plugin loading
- import orchestrator.modules.operators.randomwalk # noqa: F401 — registers explore.random_walk
- from orchestrator.modules.operators.collections import explore
- from orchestrator.modules.operators.randomwalk import (
- CustomSamplerConfiguration,
- RandomWalkParameters,
- SamplerModuleConf,
- )
-
- random_walk = explore.operators["random_walk"].function
-
- params = NoPriorsParameters.model_validate(kwargs)
- logger.info(
- f"No-priors characterization starts. Target variable = {params.targetOutput}"
- )
- logger.info(f"Parameters: {params}")
-
- # Configure the no-priors sampler
- no_priors_module = SamplerModuleConf(
- moduleClass="NoPriorsSampleSelector",
- moduleName="no_priors_characterization.no_priors_sampler",
- )
-
- no_priors_sampler_config = CustomSamplerConfiguration(
- module=no_priors_module, parameters=params
- )
-
- no_priors_random_walk_params = RandomWalkParameters(
- samplerConfig=no_priors_sampler_config,
- batchSize=params.batchSize,
- numberEntities=params.samples,
- singleMeasurement=True,
- )
-
- # Execute the random walk with the no-priors sampler
- from orchestrator.core.metadata import ConfigurationMetadata
-
- # Create metadata with custom fields for tracking no-priors parameters
- metadata = ConfigurationMetadata(
- name="No-priors characterization",
- description=f"No-priors characterization using {params.sampling_strategy} strategy with {params.samples} samples",
- )
- # Add custom fields using extra="allow" in ConfigurationMetadata
- metadata.sampling_strategy = params.sampling_strategy # type: ignore[attr-defined]
- metadata.samples = params.samples # type: ignore[attr-defined]
-
- updated_operation_info = FunctionOperationInfo(
- metadata=metadata,
- actuatorConfigurationIdentifiers=(
- operationInfo.actuatorConfigurationIdentifiers if operationInfo else []
- ),
- )
-
- op_output = random_walk(
- discoverySpace=discoverySpace,
- operationInfo=updated_operation_info,
- **no_priors_random_walk_params.model_dump(),
- )
-
- logger.info("No-priors characterization completed")
-
- return op_output
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/__init__.py b/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/__init__.py
deleted file mode 100644
index deb2683da..000000000
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/__init__.py
+++ /dev/null
@@ -1,33 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-
-# Export commonly used utilities for easier imports
-from no_priors_characterization.utils.high_dimensional_sampling import (
- concatenated_latin_hypercube_sampling,
- get_sampling_indices_multi_dimensional,
-)
-from no_priors_characterization.utils.one_dimensional_sampling import (
- get_index_list_ordered_partitions,
- get_index_list_van_der_corput,
-)
-from no_priors_characterization.utils.space_df_connector import (
- get_df_all_entities_no_measurements,
- get_list_of_entities_from_df_and_space,
- get_project_context,
- get_source_and_target,
- get_space,
-)
-from orchestrator.utilities.pandas import sort_rows_by_column_names
-
-__all__ = [
- "concatenated_latin_hypercube_sampling",
- "get_df_all_entities_no_measurements",
- "get_index_list_ordered_partitions",
- "get_index_list_van_der_corput",
- "get_list_of_entities_from_df_and_space",
- "get_project_context",
- "get_sampling_indices_multi_dimensional",
- "get_source_and_target",
- "get_space",
- "sort_rows_by_column_names",
-]
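
Reviewer note: the `get_index_list_van_der_corput` helper re-exported above is not part of this excerpt, so its behaviour is not shown here. As a rough illustration of the underlying idea only, the sketch below orders the indices of a one-dimensional discrete axis by their first appearance in a base-2 van der Corput sequence. Both function names are hypothetical.

```python
def van_der_corput(n, base=2):
    """Return the n-th element of the van der Corput sequence in [0, 1)."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, remainder = divmod(n, base)
        denom *= base
        q += remainder / denom
    return q


def vdc_index_order(size, base=2):
    """Order range(size) by first visit of the discretized sequence.

    Each sequence value x is mapped to the bin int(x * size); repeated
    bins are skipped until every index has appeared once.
    """
    order, seen, i = [], set(), 0
    while len(order) < size:
        idx = int(van_der_corput(i, base) * size)
        if idx not in seen:
            seen.add(idx)
            order.append(idx)
        i += 1
    return order


order_8 = vdc_index_order(8)
order_5 = vdc_index_order(5)
```

Because the sequence repeatedly bisects the unit interval, early indices are spread across the axis instead of clustering at one end, which is the low-discrepancy property the Sobol strategy relies on in higher dimensions.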
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/high_dimensional_sampling.py b/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/high_dimensional_sampling.py
deleted file mode 100644
index a77bbabb4..000000000
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/high_dimensional_sampling.py
+++ /dev/null
@@ -1,338 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-
-import logging
-import math
-import random
-from typing import Literal
-
-import numpy as np
-from scipy.stats.qmc import Sobol
-
-from no_priors_characterization.utils.one_dimensional_sampling import (
- get_index_list_van_der_corput,
-)
-
-logger_high_dimensional = logging.getLogger(__name__)
-
-
-def concatenated_latin_hypercube_sampling(
- dimensions: list[int],
- final_sample_size: int,
- seed: int | None = None,
-) -> list[list[int]]:
- """
- Generates samples using a Concatenated Latin Hypercube Sampling strategy.
-
- For each dimension independently, this method enforces a 1D stratification
- (Latin Hypercube property) by generating random permutations of the
- possible values. If the number of requested samples 'final_sample_size' exceeds the cardinality
- of a dimension, new random permutations are concatenated to the sequence.
-
-    This guarantees that, for any dimension j with size d_j, each consecutive block
-    of d_j samples (aligned to multiples of d_j) contains every value in range(d_j) exactly once.
-
- Args:
- dimensions (List[int]): Cardinality (size) of each dimension. Must be positive.
- final_sample_size (int): Total number of points to sample.
- seed (Optional[int]): Optional PRNG seed for reproducibility.
-
- Returns:
- List[List[int]]: A list of final_sample_size sampled points, where each point is a
- list of indices corresponding to the dimensions.
-
- Raises:
- ValueError: If any dimension size is less than 1.
- """
- if any(d <= 0 for d in dimensions):
- raise ValueError(
- f"All dimensions must be >= 1, received dimensions={dimensions}"
- )
-
- if final_sample_size <= 0:
- return []
-
- # Use default RNG when seed is not provided, otherwise create seeded instance
- rng = random.Random() if seed is None else random.Random(seed) # noqa: S311
-
- # Per-dimension pools: active permutation for the current block.
- # We maintain the Latin Hypercube property by sampling without replacement.
- pools: list[list[int]] = [list(range(d)) for d in dimensions]
- samples: list[list[int]] = []
-
- for _ in range(final_sample_size):
- point: list[int] = []
- for j, d in enumerate(dimensions):
- # If the current permutation block is exhausted, start a new one (new cycle).
- if not pools[j]:
- pools[j] = list(range(d))
-
- # Select a random element from the remaining pool for this block.
- k = rng.randrange(len(pools[j]))
- value = pools[j].pop(k)
- point.append(value)
-
- samples.append(point)
-
- return samples
-
-
-# NOTE: preliminary tests on collision reveal that if final_sample_size is half of the product of dimensions collisions are rare
-def sobol_sampling(
- dimensions: list[int], final_sample_size: int, seed: int | None = None
-) -> list[list[int]]:
- """
- Generates Sobol sampled points scaled to integer dimensions.
-
- This function uses a Sobol sequence to generate points in the unit hypercube [0, 1)^d,
- scales them to the specified integer dimensions, and checks for collisions. If collisions
- occur (duplicate points), it falls back to Concatenated Latin Hypercube Sampling.
-
- Args:
- dimensions (list[int]): A list of integers representing the size (cardinality) of each dimension.
- final_sample_size (int): The number of points to sample.
- seed (int | None, optional): Random seed for the Sobol scrambler. Defaults to None.
-
- Returns:
- list[list[int]]: A list of final_sample_size points, where each point is a list of integer coordinates.
- """
- # Sobol generates points in [0, 1). We scale them to the integer dimensions.
-
- sampler = Sobol(d=len(dimensions), scramble=True, rng=seed)
- points = sampler.random(final_sample_size)
-
- # Scale and floor to get integer indices
- discrete_points = [
- [int(val * d) for val, d in zip(p, dimensions, strict=True)] for p in points
- ]
-
- # Check for collisions
- # Convert inner lists to tuples because lists are unhashable and cannot be used in a set
- unique_points = {tuple(p) for p in discrete_points}
- n_collisions = final_sample_size - len(unique_points)
-
- if n_collisions > 0:
- logger_high_dimensional.error(
- f"Sobol sampling failed, {n_collisions} collisions detected, defaulting to clhs sampling"
- )
- return concatenated_latin_hypercube_sampling(
- dimensions=dimensions, final_sample_size=final_sample_size, seed=seed
- )
-
- return discrete_points
-
-
-# TODO: test this function
-def distinct_sobol_sampling(
- dimensions: list[int], final_sample_size: int, seed: int | None = None
-) -> list[list[int]]:
- """
-    Generates `final_sample_size` distinct points on a grid of size `dimensions` using a Sobol sequence.
- Guarantees no collisions by skipping duplicates in the sequence.
- """
- # 1. Safety Check: Is the grid big enough?
- total_capacity = np.prod(dimensions)
- if final_sample_size > total_capacity:
- raise ValueError(
- f"Cannot generate {final_sample_size} distinct points: Grid only has {total_capacity} cells."
- )
-
- # 2. Setup Sobol
- # We scramble to get better coverage.
- sampler = Sobol(d=len(dimensions), scramble=True, rng=seed)
-
- unique_points = set()
- results = []
-
- # 3. Iterative Generation
- # We generate in batches to be efficient.
- # Start with a batch larger than N to account for potential rejections.
- batch_size = max(final_sample_size * 2, 64)
-
- while len(results) < final_sample_size:
- # Draw a batch of float points [0, 1)
- raw_points = sampler.random(batch_size)
-
- for p in raw_points:
- # Discretize: Map [0, 1) -> Integer coordinates
- # Using int(x * dim) scales it to the grid index [0, dim-1]
- coord = tuple([int(p[i] * dimensions[i]) for i in range(len(dimensions))])
-
- # Check Uniqueness
- if coord in unique_points:
- continue
-
- unique_points.add(coord)
- results.append(list(coord))
-
- # Stop immediately if we have enough
- if len(results) == final_sample_size:
- return results
-
- # If we need more points, increase batch size for next iteration
- # (helpful if the grid is nearly full and collisions are frequent)
- batch_size *= 2
-
- return results
-
-
-def random_high_dimensional_sampling(
- dimensions: list[int], final_sample_size: int, seed: int | None = None
-) -> list[list[int]]:
- """
- Generate n unique random samples from a high-dimensional space.
-
- Args:
- dimensions: Cardinality (size) of each dimension. Must be positive.
- final_sample_size: Total number of points to sample.
- seed: Optional PRNG seed for reproducibility.
-
- Returns:
- List of final_sample_size sampled points, each point is a list of indices
-
- Raises:
- ValueError: If final_sample_size exceeds the total number of possible configurations
- """
- import itertools
- import random
- from math import prod
-
- # Set the seed for the random number generator
- if seed is not None:
- random.seed(seed)
-
- # Check if the number of requested samples is valid
- num_configs = prod(dimensions)
- if final_sample_size > num_configs:
- raise ValueError(
- f"Cannot generate {final_sample_size} unique samples. "
- f"The sample space only contains {num_configs} possibilities."
- )
-
- # This still creates all combinations in memory, which is a limitation
- # for extremely large dimensional spaces.
- configs = list(itertools.product(*[range(d) for d in dimensions]))
-
- # Ensure we don't try to sample more than available
- actual_sample_size = min(final_sample_size, len(configs))
- if actual_sample_size < final_sample_size:
- import logging
-
- logger = logging.getLogger(__name__)
- logger.warning(
- f"Requested {final_sample_size} samples but only {len(configs)} unique "
- f"configurations available. Sampling {actual_sample_size} instead."
- )
-
- # random.sample is highly optimized for this task.
- # It's much faster than manually choosing and removing.
- samples = random.sample(configs, actual_sample_size)
-
- return [list(s) for s in samples]
-
-
-def get_sampling_indices_multi_dimensional(
- dimensions: list[int],
- n: int | Literal["all", "max"],
- space: dict[str, int] | None = None,
- strategy: Literal["random", "clhs", "sobol"] = "clhs",
- seed: int | None = None,
-) -> list[list[int]]:
- """
-    Generate sampling indices for a high-dimensional space using the selected strategy; `get_index_list_van_der_corput` orders are computed per dimension for validation and logging only.
-
- Args:
- dimensions (List[int]): Sizes of each dimension (e.g., [8, 5]).
- n (int | str): Number of points to sample:
- - 'all': sample all possible combinations (product of dimensions)
- - 'max': sample up to max(dimensions)
- strategy (str): sampling subroutine:
-            - 'random': uniform random sampling without replacement
- - 'clhs': refer to concatenated_latin_hypercube_sampling
- - 'sobol': sobol sampling
-
- space (Optional[Dict[str, int]]): Optional mapping of dimension names to sizes (used only for logging/debug purposes).
- Example:
- space = {'batch_size': 8, 'model_name': 5}
- seed (Optional[int]): controls the randomness
-
-    Note: a strategy may be unable to return the requested number of points while respecting its own constraints;
-    for example, 'sobol' falls back to 'clhs' when collisions are detected.
-
- Returns:
- List[List[int]]: Outer list length = n (or product of dimensions if n='all').
- Each inner list contains one sampled combination across dimensions.
- """
-
- # Set the seed for the random number generator
- if seed is not None:
- random.seed(seed)
-
- # Log space details if provided
- if space:
- indices_dict = {
- k: get_index_list_van_der_corput(v, v) for k, v in space.items()
- }
- if [len(indices) for indices in list(indices_dict.values())] != dimensions:
- logger_high_dimensional.error(
- f"A space dict has been provided ->{space}. It is inconsistent with dimensions={dimensions}"
- )
- logger_high_dimensional.warning(
- f"list(indices_dict.values()) = {list(indices_dict.values())}"
- )
- raise ValueError("Space has inconsistent dimensions!")
- logger_high_dimensional.info(
- "Sampling indices for each named dimension (ordered low to high): %s",
- indices_dict,
- )
-
- # Compute sampling orders for each dimension
- orders = [get_index_list_van_der_corput(v, v) for v in dimensions]
-
- if logger_high_dimensional.isEnabledFor(logging.DEBUG):
- logger_high_dimensional.debug("Dimensions: %s", dimensions)
- logger_high_dimensional.debug("Sampling orders for each dimension:")
- for i, o in enumerate(orders):
- logger_high_dimensional.debug("Dimension %d order: %s", i, o)
-
- # Calculate maximum possible samples
- maximum_n = 1
- for d in dimensions:
- maximum_n *= d
- lcm = math.lcm(*dimensions)
-
- if lcm != maximum_n:
-        logger_high_dimensional.debug(
-            "Periodicity detected, the sampling subroutine will ensure that you will not sample "
-            "the same configuration more than once."
-        )
-
- if isinstance(n, str):
- if n == "all":
- n = maximum_n
- elif n == "max":
- n = max(dimensions)
- else:
- raise ValueError(f"Unrecognized string for n: {n}")
-
-    if n > maximum_n:
-        logger_high_dimensional.warning(
-            f"Maximal sample size is {maximum_n}, you requested {n} samples; sampling {maximum_n} instead."
-        )
-        n = maximum_n
-
- logger_high_dimensional.debug(
- "Preparing to sample %d out of %d possible points.", n, maximum_n
- )
-
- match strategy:
- case "random":
- return random_high_dimensional_sampling(dimensions, n, seed=seed)
- case "clhs":
- return concatenated_latin_hypercube_sampling(
- dimensions=dimensions, final_sample_size=n, seed=seed
- )
- case "sobol":
- return sobol_sampling(dimensions=dimensions, final_sample_size=n, seed=seed)
- case _:
- raise NotImplementedError(f"Strategy {strategy} is unknown")
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/one_dimensional_sampling.py b/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/one_dimensional_sampling.py
deleted file mode 100644
index e5b28e625..000000000
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/one_dimensional_sampling.py
+++ /dev/null
@@ -1,293 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-#
-import logging
-
-logger = logging.getLogger(__name__)
-
-
-def get_index_list_van_der_corput(
- length_segment: int,
- tot_points_to_sample: int,
- sampled_indices: list[int] | None = None,
- sort: bool = False,
- verbose: bool = False,
-) -> list[int]:
- """
- Selects a set of indices from a 1D segment using a deterministic sampling strategy.
-    It is a modified Van der Corput sequence.
-
- :param length_segment: Total number of units in the 1D segment.
- :type length_segment: int
- :param tot_points_to_sample: Total number of indices to sample.
- :type tot_points_to_sample: int
- :param sampled_indices: List of indices already sampled. Defaults to an empty list.
- :type sampled_indices: list[int], optional
- :param sort: If True, returns the final list sorted in ascending order. Defaults to False.
- :type sort: bool, optional
-    :param verbose: If True, logs debug information during sampling. Defaults to False.
- :type verbose: bool, optional
-
- :raises ValueError: If `tot_points_to_sample` exceeds `length_segment`.
-
- :return: A list of sampled indices satisfying the distribution strategy.
- :rtype: list[int]
-
- ## Additional Observations and examples
- This function assumes that the data has been projected into a 1D segment based on feature importance,
- making it isomorphic to a 1d segment. The goal is to sample `tot_points_to_sample` indices from this segment,
- optionally considering a set of already sampled indices (`sampled_indices`). The strategy ensures that the
- selected points are well-distributed and structurally balanced, akin to placing support ropes on a beam to
- prevent collapse.
-
- The metaphor used is that of a beam suspended by ropes. Initially, ropes are placed at the extremities (indices 0 and `length_segment - 1`)
- to ensure boundary support. Additional ropes (sampled points) are added iteratively at the midpoint of the longest unsampled intervals.
- In cases of symmetry or multiple equally sparse regions, the algorithm evaluates local neighborhood density to prioritize selection.
-
-
- For example, consider a segment of 14 elements (get_index_list_van_der_corput(14,8)):
-
- ::
-
- Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
- Sample: 1 - 8 5 - 7 3 - - 4 - 6 - 2
-
- Here, numbers in the bottom row represent the order in which each point is added, and `-` indicates unsampled positions.
- The algorithm ensures that each new point is placed where it maximally improves the balance of the structure,
- often targeting the midpoint of the largest gaps.
-
- :examples:
-
-        >>> get_index_list_van_der_corput(5, 3, sampled_indices=[0, 4], sort=True)
-        [0, 2, 4]
-
-        >>> get_index_list_van_der_corput(10, 4, sampled_indices=[0, 4, 9], sort=True)
-        [0, 4, 6, 9]
-
- This strategy is particularly useful in optimization settings where boundary coverage and balanced sampling are important.
- """
-
- if tot_points_to_sample == 0:
- return []
-
- if tot_points_to_sample > length_segment:
-        raise ValueError(
-            "You are trying to sample more points than are available in the segment"
-        )
-
- if sampled_indices is None:
- sampled_indices = []
-
- if len(sampled_indices) == length_segment:
- maximal_indices_list = list(range(length_segment))
-        if sorted(sampled_indices) != maximal_indices_list:
-            logger.error(
-                "Sampled indices do not correspond to [0, ..., max_n_indices - 1]. "
-                "Returning list(range(max_n_indices))"
-            )
- return maximal_indices_list
-
- if len(sampled_indices) > tot_points_to_sample:
-        logger.warning(
-            "Number of sampled indices is greater than the number of indices you want to sample. "
-            "Returning sampled indices"
-        )
- return sampled_indices
-
- index_list = list(sampled_indices)
- sampled_set = set(index_list)
-
- for point in [0, length_segment - 1]:
- if point not in sampled_set:
- index_list.append(point)
- sampled_set.add(point)
- if len(index_list) == tot_points_to_sample:
- return sorted(index_list)
-
- def build_prefix_and_len(index_list: list[int]) -> tuple[list[int], int]:
- """
- Builds prefix sums over a truncated mask: M = max(index_list)+1.
- prefix[j] = sum(mask[0:j]) with prefix length M+1.
- """
- if not index_list:
- return [0], 0
-
- M = max(index_list) + 1
-
- # You must define sampled_set based on the input list
- sampled_set = set(index_list)
-
- prefix = [0] * (M + 1)
- s = 0
-
- for i in range(M):
- # i represents the current index in the imaginary mask array
- s += 1 if i in sampled_set else 0
- prefix[i + 1] = s
-
- return prefix, M
-
- def get_list_min_weight(
- prefix: list[int], M: int, d: int, selectable_indices: list[int]
- ) -> list[int]:
- """
- uses prefix sums instead of numpy.mean.
- Only considers indices i in selectable_indices intersected with [0, M-1],
- and preserves ascending order for ties exactly like the OG.
- """
-        # compute mean densities and track the minimum
- # We must preserve order: OG loops i = 0..M-1 and filters by membership.
- # Achieve the same by iterating selectable_indices (which we build in ascending order)
- # but breaking when i >= M.
- vals = {}
- for i in selectable_indices:
- if i >= M:
- break
- left = i - d
- right = i + d
- if left < 0:
- left = 0
- if right >= M:
- right = M - 1
- total = prefix[right + 1] - prefix[left]
- denom = right - left + 1
- mean = total / denom # float64-equivalent - matches numpy.mean on booleans
- vals[i] = mean
-
- if not vals:
- return []
-
- min_val = min(vals.values())
- # preserving order of candidates as OG: ascending index order
- out = []
- for i in selectable_indices:
- if i >= M:
- break
- if vals.get(i) == min_val:
- out.append(i)
- return out
-
- def get_selectable_indices() -> list[int]:
- # OG did O(N*m) with "i not in list", but we do O(N) with a set, but order identical.
- return [i for i in range(length_segment) if i not in sampled_set]
-
- max_d = length_segment
-
- # main loop
- while len(index_list) < tot_points_to_sample:
- selection = 0
- selectable_indices = get_selectable_indices()
-
- # prefix sums for the current (truncated) mask once per outer iteration
- prefix, M = build_prefix_and_len(index_list=index_list)
-
- d = 1
-        # keeping "previous set" semantics exactly (used when the candidate list becomes empty)
- previous_set = selectable_indices
-
- while selection == 0:
- indices = get_list_min_weight(prefix, M, d, selectable_indices)
-
- if not indices:
- # Exact OG behavior: pick first element of the previous set
- # when the intersection is empty at this d.
- if not previous_set:
- raise ValueError(
- "Previous candidate set should not be empty or None"
- )
- if verbose:
- logger.info(
- f"No intersection found with d={d}. Using the previous set "
- f"Appending to {index_list} the first element of {previous_set}"
- )
- chosen = previous_set[0]
- index_list.append(chosen)
- sampled_set.add(chosen)
- selection = 1
-
- else:
- # narrowing minimal-density set
- previous_set = selectable_indices
- selectable_indices = indices
-
- if len(selectable_indices) == 1 or d == max_d:
- # pick the first element (ascending order preserved)
- if verbose:
- logger.info(
- f"Appending to {index_list} the first element of {selectable_indices}"
- )
- chosen = selectable_indices[0]
- index_list.append(chosen)
- sampled_set.add(chosen)
- selection = 1
-
-            # OG increments d regardless; this is immaterial after a selection, but we mirror it
- d += 1
-
- if sort:
- return sorted(index_list)
- return index_list
-
-
-def get_index_list_ordered_partitions(n: int, tot_points: int) -> list[int]:
- """
- Select indices from a 1D segment using a partition-based sampling strategy.
-
- The data is treated as isomorphic to a 1D segment ordered by feature importance.
- Points are selected by iteratively finding midpoints of the largest gaps.
-
- Args:
- n: Total length of the segment (len(df)), valid indices are 0 to n-1
- tot_points: Number of points to sample
-
- Returns:
- Sorted list of sampled indices
-
- Raises:
- ValueError: If tot_points exceeds n
- """
- if tot_points == 0:
- logger.debug("No points selected from the list, return empty list")
- return []
- if tot_points > n:
-        raise ValueError("tot_points exceeds n: cannot sample more points than are available")
- if tot_points == 1:
- return [0]
- index_list = [n - 1, 0]
- number_of_inner_points_sampled = 0
- while number_of_inner_points_sampled + 2 < tot_points:
- l_copy_sorted = index_list.copy()
- l_copy_sorted.sort()
- l_copy = index_list.copy()
- for _i, el in enumerate(l_copy[1:]):
- start = el
- index_seen = l_copy_sorted.index(el)
- end = l_copy_sorted[index_seen + 1]
- mid = midpoint(start=start, end=end)
- if mid in index_list:
- continue
- number_of_inner_points_sampled += 1
- index_list.append(mid)
- if number_of_inner_points_sampled + 2 == tot_points:
- break
- index_list.sort()
- return index_list
-
-
-def midpoint(start: int, end: int) -> int:
- """
- Calculate the midpoint between two indices.
-
- Args:
- start: Starting index
- end: Ending index
-
- Returns:
- Integer midpoint index
-
- Raises:
- ValueError: If start is greater than end
- """
- if end - start < 0:
- raise ValueError("Start is greater than end!")
- return start + ((end - start) // 2)
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/order.py b/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/order.py
deleted file mode 100644
index 5ff4e320e..000000000
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/order.py
+++ /dev/null
@@ -1,247 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-
-import itertools
-import logging
-import math
-from typing import Literal
-
-import numpy as np
-import pandas as pd
-
-from no_priors_characterization.utils.high_dimensional_sampling import (
- get_sampling_indices_multi_dimensional,
-)
-
-logger = logging.getLogger(__name__)
-
-
-def order_df_for_sampling_with_no_priors(
- df: pd.DataFrame,
- constitutive_properties: list[str],
- n: int,
- strategy: Literal["random", "clhs", "sobol"],
-) -> pd.DataFrame:
- """
- Orders a DataFrame for high-dimensional sampling without prior knowledge.
-
- Deduplicates rows based on constitutive properties, orders them for sampling,
- and returns a subset of n samples using the specified strategy.
-
- Args:
- df: Input dataset containing at least the columns specified in
- constitutive_properties. May contain duplicate configurations.
- constitutive_properties: Column names defining the configuration space.
- Uniqueness is enforced over the Cartesian product of these properties.
- n: Number of samples to generate. Adjusted if larger than available
- unique configurations.
- strategy: Sampling strategy - "random", "clhs", or "sobol".
-
- Returns:
- DataFrame with n sampled rows, preserving the original column schema.
-        Index labels are the row positions in the ordered, deduplicated DataFrame.
-
-    Notes:
-        If n <= 0 after adjustment, logs an error and returns an empty DataFrame.
- """
-
- # Filtering
- len_original = len(df)
- df_unique = df.drop_duplicates(subset=constitutive_properties).reset_index(
- drop=True
- )
- delta_len = len_original - len(df_unique)
- if delta_len > 0:
-        logger.warning(
-            f"Removing {delta_len} duplicate configurations. "
-            f"They are characterized by the same combination of constitutive properties = {constitutive_properties}"
-        )
-
- if n > len(df_unique):
- logging.warning(
- f"Requested {n} samples, but DataFrame has only {len(df_unique)} rows. Adjusting n to {len(df_unique)}."
- )
- n = len(df_unique)
-
- if n <= 0:
- logging.error(
- f"No samples available to select. DataFrame has {len(df_unique)} rows and {n} samples were requested."
- )
- # Return empty DataFrame with same columns as input
- return pd.DataFrame(columns=df_unique.columns)
-
- # Build dictionaries
- def _get_sorted_uniques(prop: str) -> list:
- """Helper to safely sort unique values for a property."""
- vals = df_unique[prop].unique()
- try:
- return sorted(vals)
- except TypeError:
- logging.warning(
- f"Cannot sort mixed types for property '{prop}'. "
- "Keeping original order."
- )
- return list(vals)
-
- value_dict = {prop: _get_sorted_uniques(prop) for prop in constitutive_properties}
-
- space_dict = {prop: len(vals) for prop, vals in value_dict.items()}
-
- dimensions = list(space_dict.values())
-
- # Order DataFrame for index mapping
- df_unique = order_df_for_get_index_list_nn_high_dimensional(
- df_unique, constitutive_properties, dimensions=dimensions
- ).reset_index(drop=True)
-
- # Generate sampling orders
- orders_to_sample = get_sampling_indices_multi_dimensional(
- dimensions=dimensions, space=space_dict, n=n, strategy=strategy
- )
-
- # Map orders to DataFrame indices
- indices_to_sample = get_index_list_nn_high_dimensional(orders_to_sample, dimensions)
-
- logger.info(f"Indexes are:\n {indices_to_sample}")
- try:
- return df_unique.iloc[indices_to_sample]
- except IndexError:
-        logger.error(
-            f"Index Error detected. Length of the dataframe is {len(df_unique)}. "
-            "The indices that cause the error are:"
-        )
- max_len = len(df_unique)
- out_of_bounds_list = [i for i in indices_to_sample if i < 0 or i >= max_len]
-
- logging.error(out_of_bounds_list)
- logging.error("Returning empty dataset")
- return pd.DataFrame({})
-
-
-def order_df_for_get_index_list_nn_high_dimensional(
- df: pd.DataFrame, constitutive_properties: list[str], dimensions: list[int]
-) -> pd.DataFrame:
- """
- Ensure DataFrame is ordered and complete for high-dimensional index generation.
-
- Prepares the DataFrame so rows align with the Cartesian product implied by
- constitutive_properties and dimensions. Sorts rows, validates completeness,
- and injects missing combinations if needed.
-
- Args:
- df: Input DataFrame containing at least the columns in constitutive_properties.
- constitutive_properties: Column names defining the high-dimensional space.
- Order determines sort priority.
- dimensions: Expected cardinality for each constitutive property.
- Used to compute expected_len = product(dimensions).
-
- Returns:
- DataFrame sorted by constitutive_properties and augmented with any missing
- combinations. Injected rows have NaN for non-constitutive columns.
-
- Notes:
- If dimensions and actual unique values disagree, uses observed unique
- values to generate combinations.
- """
- # Sort by constitutive properties
- df = df.sort_values(by=constitutive_properties).reset_index(drop=True)
-
- expected_len = math.prod(dimensions)
-
- # Return early if already complete
- if len(df) == expected_len:
- return df
-
- # Generate all possible combinations based on actual unique values
- unique_values = [
- sorted(df[prop].dropna().unique()) for prop in constitutive_properties
- ]
- all_combinations = list(itertools.product(*unique_values))
- actual_expected_len = len(all_combinations)
-
- logger.warning(
- f"DataFrame length mismatch: expected {expected_len} (product of {dimensions}), "
- f"but got {len(df)}. Actual unique combinations: {actual_expected_len}."
- )
-
- # Identify existing combinations
- existing_combinations = {
- tuple(row[prop] for prop in constitutive_properties) for _, row in df.iterrows()
- }
-
- # Find missing combinations
- missing_combinations = [
- comb for comb in all_combinations if comb not in existing_combinations
- ]
-
- if missing_combinations:
- logger.info(
- f"Injecting {len(missing_combinations)} missing rows to satisfy the property."
- )
- injected_rows = []
- for comb in missing_combinations:
- row_data = dict(zip(constitutive_properties, comb, strict=False))
- # Fill other columns with NaN
- for col in df.columns:
- if col not in constitutive_properties:
- row_data[col] = pd.NA
- injected_rows.append(row_data)
-
- # Append missing rows
- df = pd.concat([df, pd.DataFrame(injected_rows)], ignore_index=True)
-
- # Sort again after injection
- df = df.sort_values(by=constitutive_properties).reset_index(drop=True)
-
- logger.info(f"Injected rows: {injected_rows}")
-
- return df
-
-
-def get_index_list_nn_high_dimensional(
- orders_to_sample: list[list[int]], dimensions: list[int]
-) -> list[int]:
- """
- Map high-dimensional sampling orders to linear (flattened) indices.
-
- Converts multi-dimensional coordinates to linear indices using row-major ordering,
- where the last dimension varies fastest.
-
- Args:
- orders_to_sample: List of multi-dimensional coordinates [i0, i1, ..., ik]
- dimensions: Size of each dimension [d0, d1, ..., dk]
-
- Returns:
- List of linear indices corresponding to the input coordinates
-
- Warns:
- If duplicate or out-of-bounds indices are detected
- """
- indices = []
- cprod = np.cumprod(np.array(dimensions), dtype=int).tolist()
- maximum_n = cprod[-1]
-
- for order in orders_to_sample:
- index = 0
- multiplier = 1
- # Iterate reversed so last dimension varies fastest
- for i in reversed(range(len(dimensions))):
- index += order[i] * multiplier
- multiplier *= dimensions[i]
-
-        if index >= maximum_n:
-            logger.warning(
- f"Out of bound index {index} computed from order {order}, dimensions are {dimensions}"
- )
- indices.append(index)
-
- if len(set(indices)) != len(indices):
- logger.error(f"{len(indices) - len(set(indices))} Duplicated indices!")
-
-    out_of_bounds_list = [i for i in indices if i >= maximum_n]
- if out_of_bounds_list:
- logger.error(
- f"The following indices are out of bound: {out_of_bounds_list}, maximum admissible value is {maximum_n-1}"
- )
-
- return indices
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/space_df_connector.py b/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/space_df_connector.py
deleted file mode 100644
index 9c29c2fa3..000000000
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/utils/space_df_connector.py
+++ /dev/null
@@ -1,524 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-
-from __future__ import annotations
-
-import logging
-from typing import TYPE_CHECKING, Any
-
-import pandas as pd
-
-from orchestrator.core.discoveryspace.space import DiscoverySpace
-from orchestrator.schema.virtual_property import PropertyAggregationMethodEnum
-
-if TYPE_CHECKING:
- from collections.abc import Hashable
-
- from orchestrator.metastore.project import ProjectContext
- from orchestrator.schema.entity import Entity
-
-logger = logging.getLogger(__name__)
-
-
-def get_project_context() -> ProjectContext:
- """
- Retrieve the current ADO project context from configuration.
-
- Returns:
- ProjectContext object for the active project
- """
- import orchestrator.cli.core.config
-
- ado_configuration = orchestrator.cli.core.config.AdoConfiguration.load()
- return ado_configuration.project_context # type: ignore[name-defined]
-
-
-def get_space(
- space_or_space_id: DiscoverySpace | str,
-) -> DiscoverySpace:
- """
- Get a DiscoverySpace object from either a space object or identifier string.
-
- Args:
- space_or_space_id: Either a DiscoverySpace object or its string identifier
-
- Returns:
- DiscoverySpace object
- """
-
- if isinstance(space_or_space_id, DiscoverySpace):
- return space_or_space_id
-
- return DiscoverySpace.from_stored_configuration(
- project_context=get_project_context(),
- space_identifier=space_or_space_id,
- )
-
-
-# %%
-
-
-def get_df_all_entities_no_measurements(
- discoverySpace: DiscoverySpace | str,
-) -> pd.DataFrame:
- """
- Return a DataFrame of all entities in the given Discovery Space, regardless of whether
-    they have any measured target outputs.
-
- - Each row represents an entity from the entity space.
- - Includes the entity identifier and all constitutive property values.
- - Does NOT include any measured target outputs (only features).
- - Useful for generating the full feature set for prediction or backfilling missing measurements.
-
- Parameters
- ----------
- discoverySpace : DiscoverySpace | str
- The Discovery Space object or its identifier.
- targetOutput_list : list, optional
- List of target output names (ignored in this function, included for API consistency).
-
- Returns
- -------
- pd.DataFrame
- DataFrame with columns: ['identifier', ].
- """
-
- space = get_space(space_or_space_id=discoverySpace)
-
- entity_space = space.entitySpace
- cp_ids = [cp.identifier for cp in entity_space.constitutiveProperties]
-
- list_of_dicts_to_convert = []
- for point_values in entity_space.sequential_point_iterator():
- point_dict = dict(zip(cp_ids, point_values, strict=True))
- entity = entity_space.entity_for_point(point_dict)
- ed = {"identifier": entity.identifier}
- ed.update(point_dict)
- list_of_dicts_to_convert.append(ed)
-
- return pd.DataFrame(list_of_dicts_to_convert)
-
-
-def get_df_at_least_one_measured_value(
- discoverySpace: DiscoverySpace | str,
- targetOutput_list: list[str] | None = None,
- add_measurement_id: bool = False,
-) -> pd.DataFrame:
- """
- Return a DataFrame of entities that have at least one measured target output from the
- provided list, aggregated across all experiments in the Discovery Space.
-
- - Each row represents an entity with measurements.
- - Includes identifier (optional), constitutive properties, and the requested target outputs.
- - Drops rows with missing values for the selected targets.
-    - May return an empty DataFrame.
-
- Parameters
- ----------
- discoverySpace : DiscoverySpace | str
- The Discovery Space object or its identifier.
- targetOutput_list : list
- List of target output names to include in the DataFrame.
- add_measurement_id : bool
- If True, include the entity identifier column in the output.
-
- Returns
- -------
- pd.DataFrame
-        DataFrame with columns: ['identifier' (optional), *constitutive property identifiers, *target outputs].
- """
-
- if not targetOutput_list:
- targetOutput_list = []
- space = get_space(space_or_space_id=discoverySpace)
- col_list = [cp.identifier for cp in space.entitySpace.constitutiveProperties]
- if add_measurement_id:
- col_list = ["identifier", *col_list]
-
-    # Use the resolved space: `discoverySpace` may be a plain identifier string
-    space.sample_store.refresh()
-
- df = pd.DataFrame(
- space.matchingEntitiesTable(
- property_type="target",
- aggregationMethod=PropertyAggregationMethodEnum.mean,
- )
- )
-
- if df.empty:
- # NOTE: this condition is hit when there are no measurements at all existing in the space
- logger.warning(
- "No measured properties found in the discovery space\nReturning empty DataFrame\n "
- )
- return df
-
- all_df_cols = list(df.columns)
- valid_targetOutput_list = []
- for el in targetOutput_list:
- if el in all_df_cols:
- valid_targetOutput_list.append(el)
- elif f"{el}-mean" in all_df_cols and el not in all_df_cols:
-            logger.warning(
-                f"Column named '{el}-mean' (instead of '{el}', which is not present) "
-                "found in the DataFrame obtained through matchingEntitiesTable. "
-                f"Renaming it to '{el}'."
-            )
- # Rename the column in the DataFrame
- df.rename(columns={f"{el}-mean": el}, inplace=True)
- valid_targetOutput_list += [el]
- elif f"{el}-mean" in all_df_cols and el in all_df_cols:
-            logger.warning(
-                f"Columns named '{el}-mean' and '{el}' "
-                "found in the DataFrame obtained through matchingEntitiesTable. "
-                f"Renaming '{el}-mean' to '{el}'."
-            )
- logger.error("Unexpected behavior can happen!")
- # Rename the column in the DataFrame
- df.rename(columns={f"{el}-mean": el}, inplace=True)
- valid_targetOutput_list += [el]
- col_list += valid_targetOutput_list
-
- # Something unexpected happened: log here about it
- if valid_targetOutput_list != targetOutput_list:
- if len(valid_targetOutput_list) == 0:
-            logger.error(
-                "No valid target in the columns of the DataFrame. "
-                f"Columns are:\t{list(df.columns)}. "
-                f"First rows are:\n{df.head(5)}"
-            )
- else:
- not_found = [
- t for t in targetOutput_list if t not in valid_targetOutput_list
- ]
- logger.error(
- f"Found measurements for the following valid targets:\t{valid_targetOutput_list}"
- )
-            logger.error(
-                f"No measurement found for the following requested targets:\t{not_found}"
-            )
-
- removed_cols = [c for c in list(df.columns) if c not in col_list]
-    logger.debug(
-        "Obtaining df with at least one measured target. "
-        f"Removed columns: {removed_cols}"
-    )
-
- df = df[col_list]
-
- # I can still have Nans here for cols in targetOutput_list,
- # because I am taking points for which I have at least one of the measured properties of the experiment
- df.dropna(inplace=True)
-
- # The resulting DataFrame can be empty
- if df.empty:
-        logger.warning(
-            "The discovery space contains measured properties, but none of them "
-            f"match the desired outputs {targetOutput_list}. "
-            "Returning empty DataFrame."
-        )
-
- return df
-
-
-def get_source_and_target(
- discoverySpace: DiscoverySpace | str,
- targetOutput: str,
- log_string: str = "",
-) -> tuple[pd.DataFrame, pd.DataFrame]:
- """
-    Build source (labeled) and target (unlabeled) DataFrames for a given target output.
-    Note that the source can be empty.
-
-    - Retrieves measured entities for `targetOutput` and all entities without measurements.
-    - Merges on common feature columns (excluding 'identifier').
-    - Splits into:
-        source_df: rows with non-null `targetOutput` (features + target).
-        target_df: rows with null `targetOutput` (features only).
-
-    Parameters
-    ----------
-    discoverySpace : DiscoverySpace | str
-        Discovery Space object or its identifier (e.g., 'space-1a2469-6a3ed5').
-    targetOutput : str
-        Target output column name.
-    log_string : str
-        Optional label included in the debug log message.
-
- Returns
- -------
- tuple
- (source_df, target_df)
- """
-
- dfm = get_df_at_least_one_measured_value(discoverySpace, [targetOutput])
- dfu = get_df_all_entities_no_measurements(discoverySpace)
- keys = [c for c in dfu.columns if c in dfm.columns and c != "identifier"]
-
- if dfm.empty:
- logger.warning("The source space is empty")
- return dfm, dfu
-
- df = dfu.merge(dfm, on=keys, how="left")
-
- # If nothing is measured you do not have the columns, so I add the column as empty to run the
- # following logic safely
- if targetOutput not in list(df.columns):
-        logger.info(
-            "The target output was not present in the columns of the measured+unmeasured DataFrame, "
-            f"meaning that '{targetOutput}' has never been measured in this space. "
-            f"df.empty = {df.empty}. Adding an empty column to the DataFrame."
-        )
- logger.debug("Adding an empty column to the DataFrame.")
- df[targetOutput] = pd.NA
-
- if targetOutput in list(df.columns):
- df_measured_drop_na = df.dropna(subset=[targetOutput])
- df_unmeasured_drop_na = df[df[targetOutput].isna()].drop(columns=[targetOutput])
- n_rows_dropped = len(df) - len(df_measured_drop_na)
- logger.debug(
- f"Dropped {n_rows_dropped} rows. Function called with log_string={log_string}"
- )
-        if df_measured_drop_na.empty:
-            logger.warning(
-                f"Empty source after dropping rows that contain NaN in {targetOutput} column"
-            )
-        if df_unmeasured_drop_na.empty:
-            logger.warning(
-                f"Empty target: no rows contain NaN in {targetOutput} column"
-            )
- return df_measured_drop_na, df_unmeasured_drop_na
- save_path = "df_with_no_targetOutput_columns.csv"
- logger.error(
- f"'{targetOutput}' column is missing, saving df in {save_path}, returning unmerged DataFrames"
- )
- df.to_csv(save_path)
- return dfm, dfu
-
-
-def validate_points_in_space(
- points: list[dict],
- space: DiscoverySpace,
-) -> tuple[list[dict], list[int]]:
- """
- Validate a list of point dictionaries against a Discovery Space entity space.
-
- A point is considered valid if `space.entitySpace.isPointInSpace(point)` returns True.
- This function returns both the subset of valid points (in original order) and
- the indices of invalid points for diagnostics.
-
- Parameters
- ----------
- points : list[dict]
- List of point dicts `{constitutive_property_id: value}` to validate.
- space : DiscoverySpace
- The Discovery Space whose entity space defines the validity constraints.
-
- Returns
- -------
- (valid_points, invalid_indices) : tuple[list[dict], list[int]]
- valid_points :
- The points that are valid under `space.entitySpace.isPointInSpace`.
- invalid_indices :
- The zero-based indices (relative to the input `points`) that were invalid.
-
- Examples
- --------
- >>> points = make_points_from_df(df, space)
- >>> valid_points, invalid_idx = validate_points_in_space(points, space)
- >>> if invalid_idx:
- ... print(f"Warning: {len(invalid_idx)} invalid rows at indices {invalid_idx}")
- """
- valid_points: list[dict] = []
- invalid_indices: list[int] = []
-
- for i, p in enumerate(points):
- if space.entitySpace.isPointInSpace(p):
- valid_points.append(p)
- else:
- invalid_indices.append(i)
- return valid_points, invalid_indices
-
-
-def df_to_points(
- df: pd.DataFrame,
- cols: list[str] | None = None,
- dropna: bool = True,
- drop_duplicates: bool = False,
-) -> list[dict[Hashable, Any]]:
- """
- Convert DataFrame rows to list of point dictionaries.
-
- Args:
- df: Input DataFrame
- cols: Columns to include. If None, uses all columns
- dropna: If True, drop rows containing any NaN values
- drop_duplicates: If True, drop duplicate rows
-
- Returns:
- List of dictionaries, each representing a point {property_id: value}
-
- Raises:
- KeyError: If requested columns are not present in DataFrame
- """
-
- if cols is None:
- cols = list(df.columns)
- missing = set(cols) - set(df.columns)
- if missing:
- raise KeyError(f"Requested columns not present in DataFrame: {missing}")
-
- sub = df[cols].copy()
- if dropna:
- sub = sub.dropna(how="any")
- if drop_duplicates:
- sub = sub.drop_duplicates()
-
- # Convert numpy scalars to python builtins for safety
- def to_py(x: object) -> object:
- import numpy as np
-
- if isinstance(x, (np.generic)):
- return x.item()
- return x
-
- # apply conversion (only if needed)
- for c in sub.columns:
- sub[c] = sub[c].map(to_py)
-
- return sub.to_dict(orient="records")
-
-
-# TODO: check if these are actually needed
-def df_to_points_parsing(
- df: pd.DataFrame,
- cols: list[str] | None = None,
- dropna: bool = True,
- parse_values: bool = False,
-) -> list[dict]:
- """
- Convert DataFrame to points with optional string value parsing.
-
- Args:
- df: Input DataFrame
- cols: Columns to include
- dropna: If True, drop rows with NaN values
- parse_values: If True, parse string values using ast.literal_eval
-
- Returns:
- List of point dictionaries with parsed values
- """
- import ast
-
- points = df_to_points(df, cols=cols, dropna=dropna)
- if not parse_values:
- return points
-
- parsed = []
- for p in points:
- newp = {}
- for k, v in p.items():
- if isinstance(v, str):
- try:
- newp[k] = ast.literal_eval(v)
- except Exception:
- newp[k] = v
- else:
- newp[k] = v
- parsed.append(newp)
- return parsed
-
-
-def make_points_from_df(
- df: pd.DataFrame,
- space: DiscoverySpace,
- cols: list[str] | None = None,
- dropna: bool = True,
- parse_values: bool = True,
-) -> list[dict]:
- """
- Convert a DataFrame of constitutive properties into a list of point dictionaries,
- using the entity-space canonical column order by default.
-
- Each point is a mapping {constitutive_property_id: value}. By default, rows with
- any NaN across the selected columns are dropped, and string values are parsed
- into Python literals where possible (e.g., "[1, 2]" -> [1, 2]) via `ast.literal_eval`.
-
- Parameters
- ----------
- df : pd.DataFrame
- Input DataFrame whose columns correspond to constitutive property identifiers.
- space : DiscoverySpace
- The Discovery Space providing the canonical order of constitutive properties.
- cols : list[str], optional
- Explicit list of columns to use. If None, uses the canonical order:
- `[cp.identifier for cp in space.entitySpace.constitutiveProperties]`.
- dropna : bool, default True
- If True, drop rows containing any NaN in the selected columns.
- parse_values : bool, default True
- If True, attempt to parse string values into Python objects using `ast.literal_eval`.
-
- Returns
- -------
- list[dict]
- A list of point dicts, one per retained row: `[{prop_id: value, ...}, ...]`.
-
- Raises
- ------
- KeyError
- If any of the requested `cols` are not present in `df`.
-
- Examples
- --------
- >>> space_cols = [cp.identifier for cp in space.entitySpace.constitutiveProperties]
- >>> points = make_points_from_df(df, space, cols=space_cols, dropna=True, parse_values=True)
- """
- # Determine canonical order if cols not provided
- if cols is None:
- cols = [cp.identifier for cp in space.entitySpace.constitutiveProperties]
-
- # Validate requested columns exist
- missing = set(cols) - set(df.columns)
- if missing:
- raise KeyError(f"Requested columns not present in DataFrame: {missing}")
-
- # Convert rows -> point dicts, with optional parsing
- return df_to_points_parsing(df, cols=cols, dropna=dropna, parse_values=parse_values)
-
-
-def get_list_of_entities_from_df_and_space(
- df: pd.DataFrame, space: DiscoverySpace
-) -> list[Entity]:
- """
- Convert DataFrame rows to Entity objects validated against a discovery space.
-
- Args:
- df: DataFrame containing constitutive property values
- space: DiscoverySpace defining the entity space constraints
-
- Returns:
- List of valid Entity objects
-
- Warns:
- If number of valid entities differs from DataFrame row count
- """
- points = make_points_from_df(df=df, space=space)
- valid_points, __ = validate_points_in_space(points, space)
-
- list_of_entities = []
- from orchestrator.schema.point import SpacePoint
-
- for p in valid_points:
- # p is a dict mapping constitutive property id -> value
- sp = SpacePoint(entity=p)
- entity = sp.to_entity(
- generatorid="no_priors_characterization"
- ) # builds an Entity from the dict without touching the sample store
- list_of_entities.append(entity)
-
- numberEntities = len(list_of_entities)
- if numberEntities != len(df):
- numberEntities_log = f"""Warning: number of valid entities {numberEntities} is different from the number of rows in the ordered df {len(df)}.
- This means that some rows in the ordered df did not correspond to valid entities in the discovery space.
- """
-        logger.warning(numberEntities_log)
- return list_of_entities
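
The string-parsing step of `df_to_points_parsing` above can be looked at in isolation. What follows is a minimal sketch using only the standard library, not the module's code: `parse_point` is a hypothetical helper name, and it catches `ValueError` and `SyntaxError` instead of the broader `Exception` used in the function above.

```python
import ast


def parse_point(point: dict) -> dict:
    """Parse string values into Python literals, keeping unparsable strings as-is."""
    parsed = {}
    for key, value in point.items():
        if isinstance(value, str):
            try:
                # "[1, 2]" becomes the list [1, 2]; "3.5" becomes the float 3.5
                parsed[key] = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                # Plain words such as "water" are not literals: keep the raw string
                parsed[key] = value
        else:
            parsed[key] = value
    return parsed


example = parse_point({"layers": "[1, 2]", "solvent": "water", "temperature": 300})
```

Here `example["layers"]` is a real list, while `example["solvent"]` stays a string because `ast.literal_eval` rejects bare names. This is presumably why the original falls back to the unparsed value rather than raising.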
diff --git a/plugins/operators/no-priors-characterization/visualize_sampling.py b/plugins/operators/no-priors-characterization/visualize_sampling.py
deleted file mode 100644
index 275f7e7a0..000000000
--- a/plugins/operators/no-priors-characterization/visualize_sampling.py
+++ /dev/null
@@ -1,135 +0,0 @@
-# Copyright IBM Corporation 2025, 2026
-# SPDX-License-Identifier: MIT
-
-"""
-Visualization script for comparing sampling strategies.
-
-This script demonstrates the distribution patterns of different sampling
-strategies (random, CLHS, Sobol) in a 2D grid space.
-"""
-
-import sys
-
-try:
- import matplotlib.pyplot as plt
- import numpy as np
- from matplotlib.axes import Axes
-except ModuleNotFoundError:
- print("matplotlib not found. Please install it to run the visualization.")
- print("pip install matplotlib")
- sys.exit(1)
-
-from no_priors_characterization.utils.high_dimensional_sampling import (
- concatenated_latin_hypercube_sampling,
- random_high_dimensional_sampling,
- sobol_sampling,
-)
-
-
-def plot_grid(
- ax: Axes,
- dimensions: list[int] | tuple[int, int],
- points: np.ndarray | list[list[int]],
- title: str,
-) -> None:
- """
- Plot a 2D grid visualization of sampled points with overlap detection.
-
- Args:
- ax: Matplotlib axes object to draw on.
- dimensions: Dimensions of the grid [width, height].
- points: List of sampled points as [x, y] coordinates.
- title: Title for the plot.
- """
- from collections import defaultdict
-
- import matplotlib.patches as patches
-
- nx, ny = dimensions[0], dimensions[1]
-
- # Setup grid
- ax.set_xlim(0, nx)
- ax.set_ylim(0, ny)
- ax.set_xticks(range(nx + 1))
- ax.set_yticks(range(ny + 1))
- ax.grid(True, color="black", linewidth=1)
- ax.set_aspect("equal")
- ax.set_title(title, fontsize=12, pad=10)
-
- # Track points in each cell to handle overlaps
- # Maps (x, y) -> list of time indices (1-based)
- grid_content = defaultdict(list)
-
- # points is a list of [x, y], enumerate gives us the time index (0-based)
- for time, point in enumerate(points):
- x, y = int(point[0]), int(point[1]) # Ensure integers
- if 0 <= x < nx and 0 <= y < ny:
-            # Store time + 1 so the first sample is labeled '1'
- grid_content[(x, y)].append(time + 1)
-
- # Draw squares and text
- for (x, y), indices in grid_content.items():
- count = len(indices)
- # Darker alpha if multiple points hit the same square
- alpha = min(0.4 + 0.2 * count, 1.0)
- rect = patches.Rectangle(
- (x, y), 1, 1, linewidth=0, facecolor="#ff0000", alpha=alpha
- )
- ax.add_patch(rect)
-
- # Label is the comma-separated list of indices
- label = ",".join(map(str, indices))
-
- # Add text with shadow effect
- ax.text(
- x + 0.52,
- y + 0.52,
- label,
- ha="center",
- va="center",
- color="#D4FF00",
- fontweight="bold",
- )
- ax.text(
- x + 0.5,
- y + 0.5,
- label,
- ha="center",
- va="center",
- color="#000000",
- fontweight="bold",
- )
-
-
-def main() -> None:
- """Run the sampling visualization comparison."""
- # Configuration
- dimensions = [20, 6] # 20 columns, 6 rows (Total 120 cells)
- N = 30 # Number of samples to draw
- SEED = 42
-
- # Plotting
- _fig, axes = plt.subplots(1, 3, figsize=(15, 5))
-
- # 1. Random Sampling
- pts_rnd = random_high_dimensional_sampling(dimensions, N, seed=SEED)
- plot_grid(axes[0], dimensions, pts_rnd, f"Random Sampling (N={N})\n(Clumps & Gaps)")
-
- # 2. Concatenated LHS
- pts_lhs = concatenated_latin_hypercube_sampling(dimensions, N, seed=SEED)
- plot_grid(
- axes[1], dimensions, pts_lhs, f"Concatenated LHS (N={N})\n(Uniform Rows/Cols)"
- )
-
- # 3. Sobol Sequence
- pts_sobol = sobol_sampling(dimensions, N, seed=SEED)
- plot_grid(
- axes[2], dimensions, pts_sobol, f"Sobol Sequence (N={N})\n(Maximal Spreading)"
- )
-
- plt.tight_layout()
- plt.show()
-
-
-if __name__ == "__main__":
- main()
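
The "Uniform Rows/Cols" label used for the concatenated LHS panel can be checked numerically without matplotlib. The sketch below is an assumption-laden re-statement of the pool-based idea (each dimension draws without replacement and refills its pool once exhausted); `clhs_sketch` is a hypothetical name, and the grid size and seed simply mirror the values used in `main()`.

```python
import random
from collections import Counter


def clhs_sketch(dimensions, n_samples, seed=None):
    """Draw points so that each dimension cycles through all its values before repeating."""
    rng = random.Random(seed)
    pools = [list(range(d)) for d in dimensions]
    samples = []
    for _ in range(n_samples):
        point = []
        for j, d in enumerate(dimensions):
            if not pools[j]:
                # Pool exhausted: start a new cycle for this dimension
                pools[j] = list(range(d))
            point.append(pools[j].pop(rng.randrange(len(pools[j]))))
        samples.append(point)
    return samples


points = clhs_sketch([20, 6], 30, seed=42)

# The first 20 samples use every column exactly once
first_cycle_x = sorted(p[0] for p in points[:20])
# 30 samples are 5 full cycles over the 6 rows, so every row is hit 5 times
y_counts = Counter(p[1] for p in points)
```

Whatever the seed, `first_cycle_x` equals `list(range(20))` and every value in `y_counts` is 5. Plain random sampling does not guarantee either property.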
diff --git a/plugins/operators/trim/pyproject.toml b/plugins/operators/trim/pyproject.toml
index 233b36c60..9aa2419b4 100644
--- a/plugins/operators/trim/pyproject.toml
+++ b/plugins/operators/trim/pyproject.toml
@@ -5,7 +5,6 @@ readme = "README.md"
requires-python = ">=3.10,<3.14"
dependencies = [
"ado-core",
- "ado-no-priors-characterization",
"autogluon-tabular[catboost,xgboost]==1.5",
"numpy",
"pandas>=2.2.0",
@@ -30,4 +29,3 @@ local_scheme = "node-and-timestamp"
[tool.uv.sources]
ado-core = { workspace = true }
-ado-no-priors-characterization = { workspace = true }
diff --git a/plugins/operators/trim/src/trim/operator.py b/plugins/operators/trim/src/trim/operator.py
index f4ad7a2ae..5873e8616 100644
--- a/plugins/operators/trim/src/trim/operator.py
+++ b/plugins/operators/trim/src/trim/operator.py
@@ -5,12 +5,11 @@
import logging
from importlib.metadata import version
-from no_priors_characterization.utils import get_source_and_target
-
from orchestrator.core.discoveryspace.space import DiscoverySpace
from orchestrator.core.operation.config import FunctionOperationInfo
from orchestrator.core.operation.operation import OperationOutput
from orchestrator.modules.operators.collections import characterize_operation
+from trim.samplers.no_priors_utils import get_source_and_target
from trim.trim_pydantic import (
TrimParameters,
) # Importing this way works when the package is installed
@@ -54,7 +53,8 @@ def trim(
Returns:
OperationOutput containing the operation resources and metadata
"""
- from orchestrator.modules.operators.collections import characterize, explore
+ # Lazy import to avoid circular import issues during plugin loading
+ from orchestrator.modules.operators.collections import explore
from orchestrator.modules.operators.randomwalk import (
CustomSamplerConfiguration,
RandomWalkParameters,
@@ -95,9 +95,23 @@ def trim(
f"Note: Trim sampler has been called with a minimum budget of {params.samplingBudget.minPoints} points."
)
- # Call the no-priors-characterization operator directly
- no_priors_operator = characterize.no_priors_characterization
- op_output_characterization_no_prior = no_priors_operator(
+ # Use random-walk with no-priors sampler instead of direct operator call
+ no_priors_module = SamplerModuleConf(
+ moduleClass="NoPriorsSampleSelector",
+ moduleName="trim.samplers.no_priors_sampler",
+ )
+ no_priors_sampler_config = CustomSamplerConfiguration(
+ module=no_priors_module,
+ parameters=params.noPriorParameters,
+ )
+ no_priors_rwparams = RandomWalkParameters(
+ samplerConfig=no_priors_sampler_config,
+ batchSize=params.noPriorParameters.batchSize,
+ numberEntities=params.samplingBudget.minPoints - len(source_df),
+ singleMeasurement=True,
+ )
+
+ op_output_characterization_no_prior = random_walk(
discoverySpace=discoverySpace,
operationInfo=FunctionOperationInfo.model_validate(
{
@@ -112,7 +126,7 @@ def trim(
),
}
),
- **params.noPriorParameters.model_dump(),
+ **no_priors_rwparams.model_dump(),
)
source_df, target_df = get_source_and_target(
@@ -157,7 +171,11 @@ def trim(
operationInfo=FunctionOperationInfo.model_validate(
{
"metadata": {"completed operation": "Iterative Modeling Operation"},
- "actuatorConfigurationIdentifiers": operationInfo.actuatorConfigurationIdentifiers,
+ "actuatorConfigurationIdentifiers": (
+ operationInfo.actuatorConfigurationIdentifiers
+ if operationInfo
+ else []
+ ),
}
),
**trim_rwparams.model_dump(),
diff --git a/examples/no-priors-characterization/custom_experiments/no_priors_custom_experiments/__init__.py b/plugins/operators/trim/src/trim/samplers/__init__.py
similarity index 100%
rename from examples/no-priors-characterization/custom_experiments/no_priors_custom_experiments/__init__.py
rename to plugins/operators/trim/src/trim/samplers/__init__.py
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/no_priors_pydantic.py b/plugins/operators/trim/src/trim/samplers/no_priors_parameters.py
similarity index 69%
rename from plugins/operators/no-priors-characterization/src/no_priors_characterization/no_priors_pydantic.py
rename to plugins/operators/trim/src/trim/samplers/no_priors_parameters.py
index 3608470df..c1240c4b1 100644
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/no_priors_pydantic.py
+++ b/plugins/operators/trim/src/trim/samplers/no_priors_parameters.py
@@ -15,8 +15,6 @@ class NoPriorsParameters(BaseModel):
strategy (str): sampling subroutine:
- 'random': selects random points from the beginning
- - 'one_shift': refer to one_shift_then_random_points_high_dimensional_sampling
- - 'recursive_aggregation': refer to recursive_aggregation_high_dimensional_sampling
- 'clhs': refer to concatenated_latin_hypercube_sampling
- 'sobol': sobol sampling
"""
@@ -48,25 +46,15 @@ class NoPriorsParameters(BaseModel):
] = 1
sampling_strategy: Annotated[
- Literal["random", "one_shift", "recursive_aggregation", "clhs", "sobol"],
+ Literal["random", "clhs", "sobol"],
BeforeValidator(lambda s: s.lower()),
Field(
description=(
"Sampling subroutine. Supported values:\n"
" - 'random': selects random points from the beginning\n"
- " - 'one_shift': see one_shift_then_random_points_high_dimensional_sampling\n"
- " - 'recursive_aggregation': see recursive_aggregation_high_dimensional_sampling\n"
" - 'clhs': dimension-wise random without replacement until each dim cycles\n"
" - 'sobol': sobol sampling via scipy\n"
- "Aliases: 'random_shifts' → 'recursive_aggregation'.\n"
"Validation is case-insensitive; value is normalized to lowercase."
),
),
] = "clhs"
-
-
-if __name__ == "__main__":
- params = NoPriorsParameters.model_validate(NoPriorsParameters(targetOutput="test"))
- print(
- f"type of model_validate output on no-priors-characterization default is {type(params)}, printing the full object gives {params}"
- )
diff --git a/plugins/operators/no-priors-characterization/src/no_priors_characterization/no_priors_sampler.py b/plugins/operators/trim/src/trim/samplers/no_priors_sampler.py
similarity index 64%
rename from plugins/operators/no-priors-characterization/src/no_priors_characterization/no_priors_sampler.py
rename to plugins/operators/trim/src/trim/samplers/no_priors_sampler.py
index 2d7c220d1..3030d4965 100644
--- a/plugins/operators/no-priors-characterization/src/no_priors_characterization/no_priors_sampler.py
+++ b/plugins/operators/trim/src/trim/samplers/no_priors_sampler.py
@@ -7,15 +7,15 @@
from pydantic import BaseModel
-from no_priors_characterization.no_priors_pydantic import NoPriorsParameters
-from no_priors_characterization.utils.order import order_df_for_sampling_with_no_priors
-from no_priors_characterization.utils.space_df_connector import (
- get_list_of_entities_from_df_and_space,
- get_source_and_target,
-)
from orchestrator.core.discoveryspace.samplers import BaseSampler
from orchestrator.core.discoveryspace.space import DiscoverySpace, Entity
from orchestrator.modules.operators.discovery_space_manager import DiscoverySpaceManager
+from trim.samplers.no_priors_parameters import NoPriorsParameters
+from trim.samplers.no_priors_utils import (
+ get_list_of_entities_from_df_and_space,
+ get_source_and_target,
+ order_df_for_sampling_with_no_priors,
+)
logger_no_priors = logging.getLogger(__name__)
@@ -113,18 +113,70 @@ async def iterator() -> typing.AsyncGenerator[list[Entity], None]: # type: igno
def entityIterator(
self, discoverySpace: DiscoverySpace, batchsize: int = 1
) -> typing.Generator[list[Entity], None, None]:
- """Returns an remoteEntityIterator that returns entities in order"""
+ """
+ Generate entities for no-priors characterization sampling (synchronous version).
+
+ Orders the target space using a high-dimensional sampling strategy (e.g., CLHS, Sobol)
+ without relying on prior model knowledge or feature importance.
+
+ Args:
+ discoverySpace: The discovery space to sample from
+ batchsize: Number of entities to yield per iteration
+
+ Yields:
+ List of Entity objects to be measured, in the determined order
+ """
def iterator_closure(
space: DiscoverySpace,
) -> typing.Callable[[], typing.Generator[list[Entity], None, None]]:
- # list_of_entities = list(...) # type: ignore[name-defined]
- # numberEntities = len(list_of_entities)
+ logger_no_priors.info("Characterization with no-priors starts.\n")
+ logger_no_priors.info(f"Parameters are:\n{self.params}\n\n")
+
+ source_df, target_df = get_source_and_target(
+ space, self.params.targetOutput
+ )
+ logger_no_priors.info(f"Target dataframe has length {len(target_df)}")
+
+ # The 'samples' parameter specifies the number of NEW entities to sample,
+ # regardless of how many entities have already been measured in the space
+ logger_no_priors.info(
+ f"Space has {len(source_df)} measured entities. "
+ f"Sampling {self.params.samples} new entities as requested."
+ )
+ target_df = order_df_for_sampling_with_no_priors(
+ target_df,
+ [cp.identifier for cp in space.entitySpace.constitutiveProperties],
+ self.params.samples,
+ strategy=self.params.sampling_strategy,
+ )
+ list_of_entities_for_no_prior_characterization = (
+ get_list_of_entities_from_df_and_space(df=target_df, space=space)
+ )
+
+            logger_no_priors.info(
+                "\n\nOrdering for no-priors characterization computed.\n"
+            )
- def iterator() -> typing.Generator[list[Entity], None, None]: # type: ignore[name-defined]
- raise NotImplementedError
- # ...for i in range(0, numberEntities, batchsize):
+ def iterator() -> typing.Generator[list[Entity], None, None]:
+ logger_no_priors.info(
+ "\n\nIteration over sorted entities for no priors characterization starts.\n"
+ )
+ for i in range(
+ 0, len(list_of_entities_for_no_prior_characterization), batchsize
+ ):
+ entities = list_of_entities_for_no_prior_characterization[
+ i : i + batchsize
+ ]
+ if len(entities) == 0:
+ logger_no_priors.info(
+ "\n\nCharacterization with no-priors finished.\n"
+ )
+ break
+ else:
+ yield entities
+ logger_no_priors.info("\n\nCharacterization with no-priors finished.\n")
return iterator
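
The batching pattern used by `iterator()` above can be sketched on a plain list. This is an illustrative stand-alone example, not the sampler's code; `iter_batches` is a hypothetical helper name.

```python
from collections.abc import Iterator


def iter_batches(items: list, batchsize: int = 1) -> Iterator[list]:
    """Yield consecutive slices of at most `batchsize` items."""
    for start in range(0, len(items), batchsize):
        # The slice is never empty because start < len(items)
        yield items[start : start + batchsize]


batches = list(iter_batches(list(range(7)), batchsize=3))
```

With seven items and a batch size of three, the last batch holds the single remaining item. Because `range(0, len(items), batchsize)` only produces valid start offsets, an explicit empty-batch guard is not needed in this form.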
diff --git a/plugins/operators/trim/src/trim/samplers/no_priors_utils.py b/plugins/operators/trim/src/trim/samplers/no_priors_utils.py
new file mode 100644
index 000000000..ccf6a2544
--- /dev/null
+++ b/plugins/operators/trim/src/trim/samplers/no_priors_utils.py
@@ -0,0 +1,953 @@
+# Copyright IBM Corporation 2025, 2026
+# SPDX-License-Identifier: MIT
+
+"""
+Utility functions for no-priors sampling, including:
+- High-dimensional sampling strategies (CLHS, Sobol, random)
+- DataFrame ordering and index mapping
+- Entity/point conversion and validation
+- Discovery space data extraction
+"""
+
+from __future__ import annotations
+
+import itertools
+import logging
+import math
+import random
+from typing import TYPE_CHECKING, Any, Literal
+
+import numpy as np
+import pandas as pd
+from scipy.stats.qmc import Sobol
+
+from orchestrator.core.discoveryspace.space import DiscoverySpace
+from orchestrator.schema.virtual_property import PropertyAggregationMethodEnum
+
+if TYPE_CHECKING:
+ from collections.abc import Hashable
+
+ from orchestrator.metastore.project import ProjectContext
+ from orchestrator.schema.entity import Entity
+
+logger = logging.getLogger(__name__)
+
+
+# ============================================================================
+# 1D Sampling Functions
+# ============================================================================
+
+
+def get_index_list_van_der_corput(
+    length_segment: int,
+    tot_points_to_sample: int,
+    sampled_indices: list[int] | None = None,
+    sort: bool = False,
+    verbose: bool = False,
+) -> list[int]:
+    """
+    Selects indices from a 1D segment using a modified Van der Corput sequence.
+
+    Args:
+        length_segment: Total number of units in the 1D segment
+        tot_points_to_sample: Total number of indices to sample
+        sampled_indices: List of indices already sampled
+        sort: If True, returns the final list sorted
+        verbose: If True, logs debug information
+
+    Returns:
+        List of sampled indices
+
+    Raises:
+        ValueError: If tot_points_to_sample exceeds length_segment
+    """
+    if tot_points_to_sample == 0:
+        return []
+
+    if tot_points_to_sample > length_segment:
+        raise ValueError(
+            "You are trying to sample more points than those that are available"
+        )
+
+    if sampled_indices is None:
+        sampled_indices = []
+
+    if len(sampled_indices) == length_segment:
+        maximal_indices_list = list(range(length_segment))
+        if sorted(sampled_indices) != maximal_indices_list:
+            logger.error(
+                "Sampled indices do not correspond to [0, ..., length_segment - 1]. "
+                "Returning list(range(length_segment))"
+            )
+        return maximal_indices_list
+
+    if len(sampled_indices) > tot_points_to_sample:
+        logger.warning(
+            "Number of sampled indices is greater than the number of indices you want to sample. "
+            "Returning sampled indices"
+        )
+        return sampled_indices
+
+    index_list = list(sampled_indices)
+    sampled_set = set(index_list)
+
+    for point in [0, length_segment - 1]:
+        if point not in sampled_set:
+            index_list.append(point)
+            sampled_set.add(point)
+        if len(index_list) == tot_points_to_sample:
+            return sorted(index_list)
+
+    def build_prefix_and_len(index_list: list[int]) -> tuple[list[int], int]:
+        if not index_list:
+            return [0], 0
+
+        M = max(index_list) + 1
+        sampled_set = set(index_list)
+        prefix = [0] * (M + 1)
+        s = 0
+
+        for i in range(M):
+            s += 1 if i in sampled_set else 0
+            prefix[i + 1] = s
+
+        return prefix, M
+
+    def get_list_min_weight(
+        prefix: list[int], M: int, d: int, selectable_indices: list[int]
+    ) -> list[int]:
+        vals = {}
+        for i in selectable_indices:
+            if i >= M:
+                break
+            left = max(0, i - d)
+            right = min(M - 1, i + d)
+            total = prefix[right + 1] - prefix[left]
+            denom = right - left + 1
+            mean = total / denom
+            vals[i] = mean
+
+        if not vals:
+            return []
+
+        min_val = min(vals.values())
+        out = []
+        for i in selectable_indices:
+            if i >= M:
+                break
+            if vals.get(i) == min_val:
+                out.append(i)
+        return out
+
+    def get_selectable_indices() -> list[int]:
+        return [i for i in range(length_segment) if i not in sampled_set]
+
+    max_d = length_segment
+
+    while len(index_list) < tot_points_to_sample:
+        selection = 0
+        selectable_indices = get_selectable_indices()
+        prefix, M = build_prefix_and_len(index_list=index_list)
+        d = 1
+        previous_set = selectable_indices
+
+        while selection == 0:
+            indices = get_list_min_weight(prefix, M, d, selectable_indices)
+
+            if not indices:
+                if not previous_set:
+                    raise ValueError(
+                        "Previous candidate set should not be empty or None"
+                    )
+                if verbose:
+                    logger.info(
+                        f"No intersection found with d={d}. Using the previous set. "
+                        f"Appending to {index_list} the first element of {previous_set}"
+                    )
+                chosen = previous_set[0]
+                index_list.append(chosen)
+                sampled_set.add(chosen)
+                selection = 1
+            else:
+                previous_set = selectable_indices
+                selectable_indices = indices
+
+                if len(selectable_indices) == 1 or d == max_d:
+                    if verbose:
+                        logger.info(
+                            f"Appending to {index_list} the first element of {selectable_indices}"
+                        )
+                    chosen = selectable_indices[0]
+                    index_list.append(chosen)
+                    sampled_set.add(chosen)
+                    selection = 1
+
+            d += 1
+
+    if sort:
+        return sorted(index_list)
+    return index_list
+
+
+# ============================================================================
+# High-Dimensional Sampling Functions
+# ============================================================================
+
+
+def concatenated_latin_hypercube_sampling(
+ dimensions: list[int],
+ final_sample_size: int,
+ seed: int | None = None,
+) -> list[list[int]]:
+ """
+ Generates samples using Concatenated Latin Hypercube Sampling.
+
+ Args:
+ dimensions: Cardinality (size) of each dimension
+ final_sample_size: Total number of points to sample
+ seed: Optional PRNG seed for reproducibility
+
+ Returns:
+ List of sampled points
+
+ Raises:
+ ValueError: If any dimension size is less than 1
+ """
+ if any(d <= 0 for d in dimensions):
+ raise ValueError(
+ f"All dimensions must be >= 1, received dimensions={dimensions}"
+ )
+
+ if final_sample_size <= 0:
+ return []
+
+ rng = random.Random() if seed is None else random.Random(seed) # noqa: S311
+ pools: list[list[int]] = [list(range(d)) for d in dimensions]
+ samples: list[list[int]] = []
+
+ for _ in range(final_sample_size):
+ point: list[int] = []
+ for j, d in enumerate(dimensions):
+ if not pools[j]:
+ pools[j] = list(range(d))
+ k = rng.randrange(len(pools[j]))
+ value = pools[j].pop(k)
+ point.append(value)
+ samples.append(point)
+
+ return samples
+
+
+def sobol_sampling(
+ dimensions: list[int], final_sample_size: int, seed: int | None = None
+) -> list[list[int]]:
+ """
+ Generates Sobol sampled points scaled to integer dimensions.
+
+ Falls back to CLHS if collisions are detected.
+
+ Args:
+ dimensions: Size of each dimension
+ final_sample_size: Number of points to sample
+ seed: Random seed for the Sobol scrambler
+
+ Returns:
+ List of sampled points
+ """
+ sampler = Sobol(d=len(dimensions), scramble=True, rng=seed)
+ points = sampler.random(final_sample_size)
+
+ discrete_points = [
+ [int(val * d) for val, d in zip(p, dimensions, strict=True)] for p in points
+ ]
+
+ unique_points = {tuple(p) for p in discrete_points}
+ n_collisions = final_sample_size - len(unique_points)
+
+ if n_collisions > 0:
+ logger.error(
+ f"Sobol sampling produced {n_collisions} colliding points, falling back to CLHS sampling"
+ )
+ return concatenated_latin_hypercube_sampling(
+ dimensions=dimensions, final_sample_size=final_sample_size, seed=seed
+ )
+
+ return discrete_points
+
+
+def random_high_dimensional_sampling(
+ dimensions: list[int], final_sample_size: int, seed: int | None = None
+) -> list[list[int]]:
+ """
+ Generate unique random samples from a high-dimensional space.
+
+ Args:
+ dimensions: Cardinality of each dimension
+ final_sample_size: Total number of points to sample
+ seed: Optional PRNG seed
+
+ Returns:
+ List of sampled points
+
+ Raises:
+ ValueError: If final_sample_size exceeds total configurations
+ """
+ if seed is not None:
+ random.seed(seed)
+
+ num_configs = math.prod(dimensions)
+ if final_sample_size > num_configs:
+ raise ValueError(
+ f"Cannot generate {final_sample_size} unique samples. "
+ f"The sample space only contains {num_configs} possibilities."
+ )
+
+ configs = list(itertools.product(*[range(d) for d in dimensions]))
+ actual_sample_size = min(final_sample_size, len(configs))
+
+ if actual_sample_size < final_sample_size:
+ logger.warning(
+ f"Requested {final_sample_size} samples but only {len(configs)} unique "
+ f"configurations available. Sampling {actual_sample_size} instead."
+ )
+
+ samples = random.sample(configs, actual_sample_size)
+ return [list(s) for s in samples]
+
+
+def get_sampling_indices_multi_dimensional(
+ dimensions: list[int],
+ n: int | Literal["all", "max"],
+ space: dict[str, int] | None = None,
+ strategy: Literal["random", "clhs", "sobol"] = "clhs",
+ seed: int | None = None,
+) -> list[list[int]]:
+ """
+ Generate sampling indices for a high-dimensional space.
+
+ Args:
+ dimensions: Sizes of each dimension
+ n: Number of points to sample ('all', 'max', or integer)
+ space: Optional mapping of dimension names to sizes
+ strategy: Sampling strategy ('random', 'clhs', or 'sobol')
+ seed: Optional PRNG seed for reproducibility
+
+ Returns:
+ List of sampled multi-dimensional coordinates
+ """
+ if seed is not None:
+ random.seed(seed)
+
+ if space:
+ indices_dict = {
+ k: get_index_list_van_der_corput(v, v) for k, v in space.items()
+ }
+ if [len(indices) for indices in list(indices_dict.values())] != dimensions:
+ logger.error(
+ f"A space dict has been provided ->{space}. It is inconsistent with dimensions={dimensions}"
+ )
+ raise ValueError("Space has inconsistent dimensions!")
+ logger.info(
+ "Sampling indices for each named dimension (ordered low to high): %s",
+ indices_dict,
+ )
+
+ orders = [get_index_list_van_der_corput(v, v) for v in dimensions]
+
+ if logger.isEnabledFor(logging.DEBUG):
+ logger.debug("Dimensions: %s", dimensions)
+ logger.debug("Sampling orders for each dimension:")
+ for i, o in enumerate(orders):
+ logger.debug("Dimension %d order: %s", i, o)
+
+ maximum_n = math.prod(dimensions)
+ lcm = math.lcm(*dimensions)
+
+ if lcm != maximum_n:
+ logger.debug(
+ "Periodicity detected: the sampling subroutine ensures that "
+ "the same configuration is not sampled more than once."
+ )
+
+ if isinstance(n, str):
+ if n == "all":
+ n = maximum_n
+ elif n == "max":
+ n = max(dimensions)
+ else:
+ raise ValueError(f"Unrecognized string for n: {n}")
+
+ if n > maximum_n:
+ logger.warning(
+ f"Maximal sample size is {maximum_n}, you requested {n} samples. "
+ f"Clamping the request to n_samples = {maximum_n}."
+ )
+ # Clamp so that the strategies below are never asked for more points than exist.
+ n = maximum_n
+
+ logger.debug("Preparing to sample %d out of %d possible points.", n, maximum_n)
+
+ match strategy:
+ case "random":
+ return random_high_dimensional_sampling(dimensions, n, seed=seed)
+ case "clhs":
+ return concatenated_latin_hypercube_sampling(
+ dimensions=dimensions, final_sample_size=n, seed=seed
+ )
+ case "sobol":
+ return sobol_sampling(dimensions=dimensions, final_sample_size=n, seed=seed)
+ case _:
+ raise NotImplementedError(f"Strategy {strategy} is unknown")
+
+
+# ============================================================================
+# DataFrame Ordering and Index Mapping
+# ============================================================================
+
+
+def get_index_list_nn_high_dimensional(
+ orders_to_sample: list[list[int]], dimensions: list[int]
+) -> list[int]:
+ """
+ Map high-dimensional sampling orders to linear (flattened) indices.
+
+ Args:
+ orders_to_sample: List of multi-dimensional coordinates
+ dimensions: Size of each dimension
+
+ Returns:
+ List of linear indices
+
+ Warns:
+ If duplicate or out-of-bounds indices are detected
+ """
+ indices = []
+ cprod = np.cumprod(np.array(dimensions), dtype=int).tolist()
+ maximum_n = cprod[-1]
+
+ for order in orders_to_sample:
+ index = 0
+ multiplier = 1
+ for i in reversed(range(len(dimensions))):
+ index += order[i] * multiplier
+ multiplier *= dimensions[i]
+
+ # Valid linear indices are 0..maximum_n-1, so maximum_n itself is out of bounds.
+ if index >= maximum_n:
+ logger.warning(
+ f"Out-of-bounds index {index} computed from order {order}, dimensions are {dimensions}"
+ )
+ indices.append(index)
+
+ if len(set(indices)) != len(indices):
+ logger.error(f"{len(indices) - len(set(indices))} Duplicated indices!")
+
+ out_of_bounds_list = [i for i in indices if i >= maximum_n]
+ if out_of_bounds_list:
+ logger.error(
+ f"The following indices are out of bounds: {out_of_bounds_list}, maximum admissible value is {maximum_n-1}"
+ )
+
+ return indices
+
+
+def order_df_for_get_index_list_nn_high_dimensional(
+ df: pd.DataFrame, constitutive_properties: list[str], dimensions: list[int]
+) -> pd.DataFrame:
+ """
+ Ensure DataFrame is ordered and complete for high-dimensional index generation.
+
+ Args:
+ df: Input DataFrame
+ constitutive_properties: Column names defining the space
+ dimensions: Expected cardinality for each property
+
+ Returns:
+ DataFrame sorted and augmented with missing combinations
+ """
+ df = df.sort_values(by=constitutive_properties).reset_index(drop=True)
+ expected_len = math.prod(dimensions)
+
+ if len(df) == expected_len:
+ return df
+
+ unique_values = [
+ sorted(df[prop].dropna().unique()) for prop in constitutive_properties
+ ]
+ all_combinations = list(itertools.product(*unique_values))
+ actual_expected_len = len(all_combinations)
+
+ logger.warning(
+ f"DataFrame length mismatch: expected {expected_len} (product of {dimensions}), "
+ f"but got {len(df)}. Actual unique combinations: {actual_expected_len}."
+ )
+
+ existing_combinations = {
+ tuple(row[prop] for prop in constitutive_properties) for _, row in df.iterrows()
+ }
+
+ missing_combinations = [
+ comb for comb in all_combinations if comb not in existing_combinations
+ ]
+
+ if missing_combinations:
+ logger.info(
+ f"Injecting {len(missing_combinations)} missing rows to complete the Cartesian product of property values."
+ )
+ injected_rows = []
+ for comb in missing_combinations:
+ row_data = dict(zip(constitutive_properties, comb, strict=False))
+ for col in df.columns:
+ if col not in constitutive_properties:
+ row_data[col] = pd.NA
+ injected_rows.append(row_data)
+
+ df = pd.concat([df, pd.DataFrame(injected_rows)], ignore_index=True)
+ df = df.sort_values(by=constitutive_properties).reset_index(drop=True)
+ logger.info(f"Injected rows: {injected_rows}")
+
+ return df
+
+
+def order_df_for_sampling_with_no_priors(
+ df: pd.DataFrame,
+ constitutive_properties: list[str],
+ n: int,
+ strategy: Literal["random", "clhs", "sobol"],
+) -> pd.DataFrame:
+ """
+ Orders a DataFrame for high-dimensional sampling without prior knowledge.
+
+ Args:
+ df: Input dataset
+ constitutive_properties: Column names defining the configuration space
+ n: Number of samples to generate
+ strategy: Sampling strategy
+
+ Returns:
+ DataFrame with n sampled rows. An empty DataFrame is returned if n <= 0
+ after adjustment or if the computed indices are out of bounds.
+ """
+ len_original = len(df)
+ df_unique = df.drop_duplicates(subset=constitutive_properties).reset_index(
+ drop=True
+ )
+ delta_len = len_original - len(df_unique)
+ if delta_len > 0:
+ logger.warning(
+ f"Removing {delta_len} duplicate configurations. "
+ f"They are characterized by the same combination of constitutive properties = {constitutive_properties}"
+ )
+
+ if n > len(df_unique):
+ logger.warning(
+ f"Requested {n} samples, but DataFrame has only {len(df_unique)} rows. Adjusting n to {len(df_unique)}."
+ )
+ n = len(df_unique)
+
+ if n <= 0:
+ logger.error(
+ f"No samples available to select. DataFrame has {len(df_unique)} rows and {n} samples were requested."
+ )
+ return pd.DataFrame(columns=df_unique.columns)
+
+ def _get_sorted_uniques(prop: str) -> list:
+ vals = df_unique[prop].unique()
+ try:
+ return sorted(vals)
+ except TypeError:
+ logger.warning(
+ f"Cannot sort mixed types for property '{prop}'. "
+ "Keeping original order."
+ )
+ return list(vals)
+
+ value_dict = {prop: _get_sorted_uniques(prop) for prop in constitutive_properties}
+ space_dict = {prop: len(vals) for prop, vals in value_dict.items()}
+ dimensions = list(space_dict.values())
+
+ df_unique = order_df_for_get_index_list_nn_high_dimensional(
+ df_unique, constitutive_properties, dimensions=dimensions
+ ).reset_index(drop=True)
+
+ orders_to_sample = get_sampling_indices_multi_dimensional(
+ dimensions=dimensions, space=space_dict, n=n, strategy=strategy
+ )
+
+ indices_to_sample = get_index_list_nn_high_dimensional(orders_to_sample, dimensions)
+
+ logger.info(f"Indexes are:\n {indices_to_sample}")
+ try:
+ return df_unique.iloc[indices_to_sample]
+ except IndexError:
+ logger.error(
+ f"Index Error detected. Length of the dataframe is {len(df_unique)}. "
+ "The indices that cause the error are:"
+ )
+ max_len = len(df_unique)
+ out_of_bounds_list = [i for i in indices_to_sample if i < 0 or i >= max_len]
+ logger.error(out_of_bounds_list)
+ logger.error("Returning empty dataset")
+ return pd.DataFrame({})
+
+
+# ============================================================================
+# Discovery Space Data Extraction
+# ============================================================================
+
+
+def get_project_context() -> ProjectContext:
+ """Retrieve the current ADO project context from configuration."""
+ import orchestrator.cli.core.config
+
+ ado_configuration = orchestrator.cli.core.config.AdoConfiguration.load()
+ return ado_configuration.project_context # type: ignore[name-defined]
+
+
+def get_space(
+ space_or_space_id: DiscoverySpace | str,
+) -> DiscoverySpace:
+ """Get a DiscoverySpace object from either a space object or identifier string."""
+ if isinstance(space_or_space_id, DiscoverySpace):
+ return space_or_space_id
+
+ return DiscoverySpace.from_stored_configuration(
+ project_context=get_project_context(),
+ space_identifier=space_or_space_id,
+ )
+
+
+def get_df_all_entities_no_measurements(
+ discoverySpace: DiscoverySpace | str,
+) -> pd.DataFrame:
+ """
+ Return a DataFrame of all entities in the Discovery Space.
+
+ Returns:
+ DataFrame with columns: ['identifier', <constitutive property identifiers>]
+ """
+ space = get_space(space_or_space_id=discoverySpace)
+ entity_space = space.entitySpace
+ cp_ids = [cp.identifier for cp in entity_space.constitutiveProperties]
+
+ list_of_dicts_to_convert = []
+ for point_values in entity_space.sequential_point_iterator():
+ point_dict = dict(zip(cp_ids, point_values, strict=True))
+ entity = entity_space.entity_for_point(point_dict)
+ ed = {"identifier": entity.identifier}
+ ed.update(point_dict)
+ list_of_dicts_to_convert.append(ed)
+
+ return pd.DataFrame(list_of_dicts_to_convert)
+
+
+def get_df_at_least_one_measured_value(
+ discoverySpace: DiscoverySpace | str,
+ targetOutput_list: list[str] | None = None,
+ add_measurement_id: bool = False,
+) -> pd.DataFrame:
+ """
+ Return a DataFrame of entities with at least one measured target output.
+
+ Returns:
+ DataFrame with columns: ['identifier' (optional), <constitutive property identifiers>, <target outputs>]
+ """
+ if not targetOutput_list:
+ targetOutput_list = []
+ space = get_space(space_or_space_id=discoverySpace)
+ col_list = [cp.identifier for cp in space.entitySpace.constitutiveProperties]
+ if add_measurement_id:
+ col_list = ["identifier", *col_list]
+
+ # Use the resolved space: discoverySpace may be an identifier string.
+ space.sample_store.refresh()
+
+ df = pd.DataFrame(
+ space.matchingEntitiesTable(
+ property_type="target",
+ aggregationMethod=PropertyAggregationMethodEnum.mean,
+ )
+ )
+
+ if df.empty:
+ logger.warning(
+ "No measured properties found in the discovery space\nReturning empty DataFrame\n "
+ )
+ return df
+
+ all_df_cols = list(df.columns)
+ valid_targetOutput_list = []
+ for el in targetOutput_list:
+ if el in all_df_cols:
+ valid_targetOutput_list.append(el)
+ elif f"{el}-mean" in all_df_cols and el not in all_df_cols:
+ logger.warning(
+ f"Column named '{el}-mean' (instead of '{el}', which is not present) "
+ "found in the DataFrame obtained through matchingEntitiesTable. "
+ f"Renaming it to '{el}'."
+ )
+ df.rename(columns={f"{el}-mean": el}, inplace=True)
+ valid_targetOutput_list += [el]
+ elif f"{el}-mean" in all_df_cols and el in all_df_cols:
+ logger.warning(
+ f"Columns named '{el}-mean' and '{el}' both "
+ "found in the DataFrame obtained through matchingEntitiesTable. "
+ f"Renaming '{el}-mean' to '{el}'."
+ )
+ logger.error("Unexpected behavior can happen!")
+ df.rename(columns={f"{el}-mean": el}, inplace=True)
+ valid_targetOutput_list += [el]
+ col_list += valid_targetOutput_list
+
+ if valid_targetOutput_list != targetOutput_list:
+ if len(valid_targetOutput_list) == 0:
+ logger.error(
+ "No valid target in the columns of the DataFrame. "
+ f"Columns are:\t{list(df.columns)}. "
+ f"First rows are:\n{df.head(5)}"
+ )
+ else:
+ not_found = [
+ t for t in targetOutput_list if t not in valid_targetOutput_list
+ ]
+ logger.error(
+ f"Found measurements for the following valid targets:\t{valid_targetOutput_list}"
+ )
+ logger.error(
+ f"No measurement found for the following requested targets:\t{not_found}"
+ )
+
+ removed_cols = [c for c in list(df.columns) if c not in col_list]
+ logger.debug(
+ "Obtaining df with at least one measured target. "
+ f"Removed columns: {removed_cols}"
+ )
+
+ df = df[col_list]
+ df.dropna(inplace=True)
+
+ if df.empty:
+ logger.warning(
+ "Some properties have been measured in the discovery space, but none of them "
+ f"match the desired outputs {targetOutput_list}. Returning empty DataFrame."
+ )
+
+ return df
+
+
+def get_source_and_target(
+ discoverySpace: DiscoverySpace | str,
+ targetOutput: str,
+ log_string: str = "",
+) -> tuple[pd.DataFrame, pd.DataFrame]:
+ """
+ Build source (labeled) and target (unlabeled) DataFrames for a target output.
+
+ Returns:
+ Tuple of (source_df, target_df)
+ """
+ dfm = get_df_at_least_one_measured_value(discoverySpace, [targetOutput])
+ dfu = get_df_all_entities_no_measurements(discoverySpace)
+ keys = [c for c in dfu.columns if c in dfm.columns and c != "identifier"]
+
+ if dfm.empty:
+ logger.warning("The source space is empty")
+ return dfm, dfu
+
+ df = dfu.merge(dfm, on=keys, how="left")
+
+ if targetOutput not in list(df.columns):
+ logger.info(
+ "The target output was not present in the columns of the measured+unmeasured DataFrame, "
+ f"meaning that '{targetOutput}' has never been measured in this space. "
+ f"df.empty = {df.empty}."
+ )
+ logger.debug("Adding an empty column to the DataFrame.")
+ df[targetOutput] = pd.NA
+
+ if targetOutput in list(df.columns):
+ df_measured_drop_na = df.dropna(subset=[targetOutput])
+ df_unmeasured_drop_na = df[df[targetOutput].isna()].drop(columns=[targetOutput])
+ n_rows_dropped = len(df) - len(df_measured_drop_na)
+ logger.debug(
+ f"Dropped {n_rows_dropped} rows. Function called with log_string={log_string}"
+ )
+ if df_measured_drop_na.empty:
+ logger.warning(
+ f"Empty source after dropping rows that contain NaN in {targetOutput} column"
+ )
+ if df_unmeasured_drop_na.empty:
+ logger.warning(
+ f"Empty target after filtering rows that contain NaN in {targetOutput} column"
+ )
+ return df_measured_drop_na, df_unmeasured_drop_na
+
+ save_path = "df_with_no_targetOutput_columns.csv"
+ logger.error(
+ f"'{targetOutput}' column is missing, saving df in {save_path}, returning unmerged DataFrames"
+ )
+ df.to_csv(save_path)
+ return dfm, dfu
+
+
+# ============================================================================
+# Entity/Point Conversion
+# ============================================================================
+
+
+def validate_points_in_space(
+ points: list[dict],
+ space: DiscoverySpace,
+) -> tuple[list[dict], list[int]]:
+ """
+ Validate point dictionaries against a Discovery Space.
+
+ Returns:
+ Tuple of (valid_points, invalid_indices)
+ """
+ valid_points: list[dict] = []
+ invalid_indices: list[int] = []
+
+ for i, p in enumerate(points):
+ if space.entitySpace.isPointInSpace(p):
+ valid_points.append(p)
+ else:
+ invalid_indices.append(i)
+ return valid_points, invalid_indices
+
+
+def df_to_points(
+ df: pd.DataFrame,
+ cols: list[str] | None = None,
+ dropna: bool = True,
+ drop_duplicates: bool = False,
+) -> list[dict[Hashable, Any]]:
+ """
+ Convert DataFrame rows to list of point dictionaries.
+
+ Args:
+ df: Input DataFrame
+ cols: Columns to include
+ dropna: If True, drop rows containing NaN
+ drop_duplicates: If True, drop duplicate rows
+
+ Returns:
+ List of point dictionaries
+ """
+ if cols is None:
+ cols = list(df.columns)
+ missing = set(cols) - set(df.columns)
+ if missing:
+ raise KeyError(f"Requested columns not present in DataFrame: {missing}")
+
+ sub = df[cols].copy()
+ if dropna:
+ sub = sub.dropna(how="any")
+ if drop_duplicates:
+ sub = sub.drop_duplicates()
+
+ def to_py(x: object) -> object:
+ if isinstance(x, np.generic):
+ return x.item()
+ return x
+
+ for c in sub.columns:
+ sub[c] = sub[c].map(to_py)
+
+ return sub.to_dict(orient="records")
+
+
+def df_to_points_parsing(
+ df: pd.DataFrame,
+ cols: list[str] | None = None,
+ dropna: bool = True,
+ parse_values: bool = False,
+) -> list[dict]:
+ """Convert DataFrame to points with optional string value parsing."""
+ import ast
+
+ points = df_to_points(df, cols=cols, dropna=dropna)
+ if not parse_values:
+ return points
+
+ parsed = []
+ for p in points:
+ newp = {}
+ for k, v in p.items():
+ if isinstance(v, str):
+ try:
+ newp[k] = ast.literal_eval(v)
+ except Exception:
+ newp[k] = v
+ else:
+ newp[k] = v
+ parsed.append(newp)
+ return parsed
+
+
+def make_points_from_df(
+ df: pd.DataFrame,
+ space: DiscoverySpace,
+ cols: list[str] | None = None,
+ dropna: bool = True,
+ parse_values: bool = True,
+) -> list[dict]:
+ """
+ Convert DataFrame of constitutive properties into point dictionaries.
+
+ Args:
+ df: Input DataFrame
+ space: Discovery Space providing canonical order
+ cols: Explicit list of columns to use
+ dropna: If True, drop rows with NaN
+ parse_values: If True, parse string values
+
+ Returns:
+ List of point dictionaries
+ """
+ if cols is None:
+ cols = [cp.identifier for cp in space.entitySpace.constitutiveProperties]
+
+ missing = set(cols) - set(df.columns)
+ if missing:
+ raise KeyError(f"Requested columns not present in DataFrame: {missing}")
+
+ return df_to_points_parsing(df, cols=cols, dropna=dropna, parse_values=parse_values)
+
+
+def get_list_of_entities_from_df_and_space(
+ df: pd.DataFrame, space: DiscoverySpace
+) -> list[Entity]:
+ """
+ Convert DataFrame rows to Entity objects validated against a discovery space.
+
+ Args:
+ df: DataFrame containing constitutive property values
+ space: DiscoverySpace defining the entity space constraints
+
+ Returns:
+ List of valid Entity objects
+ """
+ points = make_points_from_df(df=df, space=space)
+ valid_points, __ = validate_points_in_space(points, space)
+
+ list_of_entities = []
+ from orchestrator.schema.point import SpacePoint
+
+ for p in valid_points:
+ sp = SpacePoint(entity=p)
+ entity = sp.to_entity(generatorid="no_priors_characterization")
+ list_of_entities.append(entity)
+
+ numberEntities = len(list_of_entities)
+ if numberEntities != len(df):
+ logger.warning(
+ f"Number of valid entities ({numberEntities}) differs from the number of rows "
+ f"in the ordered df ({len(df)}): some rows in the ordered df did not "
+ "correspond to valid entities in the discovery space."
+ )
+ return list_of_entities
+
+
+# Made with Bob
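
The flattening loop in `get_index_list_nn_high_dimensional` treats the last dimension as the fastest-varying one, which appears to match the row order obtained by sorting on the constitutive properties. A minimal standalone sketch of that mapping follows; the helper name `flatten_row_major` and the example dimensions are illustrative assumptions, not part of this change. It checks the result against the enumeration order of `itertools.product`.

```python
import itertools


def flatten_row_major(point: list[int], dimensions: list[int]) -> int:
    """Map a multi-dimensional coordinate to its row-major linear index."""
    index = 0
    multiplier = 1
    # Walk from the last (fastest-varying) dimension to the first.
    for i in reversed(range(len(dimensions))):
        index += point[i] * multiplier
        multiplier *= dimensions[i]
    return index


# Hypothetical 3 x 4 x 2 space: valid linear indices are 0..23.
dimensions = [3, 4, 2]
points = [[0, 0, 0], [1, 2, 1], [2, 3, 1]]
flat = [flatten_row_major(p, dimensions) for p in points]

# itertools.product enumerates coordinates in the same lexicographic order
# as a table sorted by its columns from left to right.
all_points = list(itertools.product(*[range(d) for d in dimensions]))
reference = [all_points.index(tuple(p)) for p in points]
```

Because the largest valid index is `math.prod(dimensions) - 1`, an index equal to the product is already out of bounds.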
diff --git a/plugins/operators/trim/src/trim/trim_pydantic.py b/plugins/operators/trim/src/trim/trim_pydantic.py
index 0010d297b..05362a7ab 100644
--- a/plugins/operators/trim/src/trim/trim_pydantic.py
+++ b/plugins/operators/trim/src/trim/trim_pydantic.py
@@ -5,9 +5,10 @@
from typing import Annotated
import pydantic
-from no_priors_characterization.no_priors_pydantic import NoPriorsParameters
from pydantic import BaseModel, ConfigDict, Field, model_validator
+from trim.samplers.no_priors_parameters import NoPriorsParameters
+
class SamplingBudget(pydantic.BaseModel):
minPoints: Annotated[
diff --git a/plugins/operators/trim/src/trim/trim_sampler.py b/plugins/operators/trim/src/trim/trim_sampler.py
index c22b83bea..9ca0f7833 100644
--- a/plugins/operators/trim/src/trim/trim_sampler.py
+++ b/plugins/operators/trim/src/trim/trim_sampler.py
@@ -20,6 +20,11 @@
from autogluon.tabular import TabularDataset, TabularPredictor
from orchestrator.core.discoveryspace.samplers import BaseSampler
+from trim.samplers.no_priors_utils import (
+ get_index_list_van_der_corput,
+ get_list_of_entities_from_df_and_space,
+ get_source_and_target,
+)
from trim.trim_pydantic import TrimParameters
if TYPE_CHECKING:
@@ -29,11 +34,6 @@
from orchestrator.modules.operators.discovery_space_manager import (
DiscoverySpaceManager,
)
-from no_priors_characterization.utils import (
- get_index_list_van_der_corput,
- get_list_of_entities_from_df_and_space,
- get_source_and_target,
-)
from orchestrator.utilities.pandas import sort_rows_by_column_names
from trim.utils.exceptions import InsufficientDataError
diff --git a/plugins/operators/trim/src/trim/utils/order.py b/plugins/operators/trim/src/trim/utils/order.py
index 459657ade..eb7c2a8b8 100644
--- a/plugins/operators/trim/src/trim/utils/order.py
+++ b/plugins/operators/trim/src/trim/utils/order.py
@@ -9,8 +9,10 @@
import numpy as np
import pandas as pd
from autogluon.tabular import TabularPredictor
-from no_priors_characterization.utils import get_sampling_indices_multi_dimensional
+from trim.samplers.no_priors_utils import (
+ get_sampling_indices_multi_dimensional,
+)
from trim.trim_pydantic import AutoGluonArgs
from trim.utils.miscellaneous import delete_dir
diff --git a/plugins/operators/trim/tests/test_high_dimensional_sampling.py b/plugins/operators/trim/tests/test_high_dimensional_sampling.py
index 0b2c6457c..c8971692f 100644
--- a/plugins/operators/trim/tests/test_high_dimensional_sampling.py
+++ b/plugins/operators/trim/tests/test_high_dimensional_sampling.py
@@ -13,10 +13,8 @@
from typing import Any
import pytest
-from no_priors_characterization.utils.high_dimensional_sampling import (
- concatenated_latin_hypercube_sampling,
-)
from test_data_documentation import TEST_DATAFRAMES
+from trim.samplers.no_priors_utils import concatenated_latin_hypercube_sampling
class TestConcatenatedLatinHypercubeSampling:
diff --git a/plugins/operators/trim/tests/test_sampling.py b/plugins/operators/trim/tests/test_sampling.py
index a0113b1ae..4fcc79486 100644
--- a/plugins/operators/trim/tests/test_sampling.py
+++ b/plugins/operators/trim/tests/test_sampling.py
@@ -2,10 +2,7 @@
# SPDX-License-Identifier: MIT
import pytest
-from no_priors_characterization.utils.one_dimensional_sampling import (
- get_index_list_ordered_partitions,
- get_index_list_van_der_corput,
-) # Replace with actual module name
+from trim.samplers.no_priors_utils import get_index_list_van_der_corput
# --- Error Handling Tests ---
@@ -36,21 +33,3 @@ def test_get_index_list_nn_full_sampling() -> None:
def test_get_index_list_nn_sorted_sampling(points: int, expected: list[int]) -> None:
"""Should return sorted sampling for segment of length 17."""
assert get_index_list_van_der_corput(17, points, sort=True) == expected
-
-
-# --- Functional Tests for get_index_list_ordered_partitions ---
-
-
-@pytest.mark.parametrize(
- ("points", "expected"),
- [
- (7, [0, 2, 4, 8, 10, 12, 16]),
- (8, [0, 2, 4, 6, 8, 10, 12, 16]),
- (9, [0, 2, 4, 6, 8, 10, 12, 14, 16]),
- ],
-)
-def test_get_index_list_ordered_partitions_sampling(
- points: int, expected: list[int]
-) -> None:
- """Should return correct partition-based sampling for segment of length 17."""
- assert get_index_list_ordered_partitions(17, points) == expected
diff --git a/pyproject.toml b/pyproject.toml
index f1ac80459..2300b3854 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -92,7 +92,6 @@ resolution-helpers = [
# cuda dependencies.
test = [
"ado-autoconf",
- "ado-no-priors-characterization",
"ado-ray-tune",
"ado-sfttrainer; python_version < '3.13'",
"ado-trim",
@@ -152,7 +151,6 @@ members = [
[tool.uv.sources]
ado-autoconf = { workspace = true, editable = true }
-ado-no-priors-characterization = { workspace = true, editable = true }
ado-ray-tune = { workspace = true, editable = true }
ado-sfttrainer = { path = "plugins/actuators/sfttrainer", editable = true }
ado-trim = { workspace = true, editable = true }
diff --git a/requirements.txt b/requirements.txt
index 7614c34cf..199703253 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -422,7 +422,6 @@ googleapis-common-protos==1.74.0 \
# via google-api-core
greenlet==3.4.0 ; platform_machine == 'AMD64' or platform_machine == 'WIN32' or platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'ppc64le' or platform_machine == 'win32' or platform_machine == 'x86_64' \
--hash=sha256:04403ac74fe295a361f650818de93be11b5038a78f49ccfb64d3b1be8fbf1267 \
- --hash=sha256:0e1254cf0cbaa17b04320c3a78575f29f3c161ef38f59c977108f19ffddaf077 \
--hash=sha256:1054c5a3c78e2ab599d452f23f7adafef55062a783a8e241d24f3b633ba6ff82 \
--hash=sha256:16dec271460a9a2b154e3b1c2fa1050ce6280878430320e85e08c166772e3f97 \
--hash=sha256:1a54a921561dd9518d31d2d3db4d7f80e589083063ab4d3e2e950756ef809e1a \
@@ -436,27 +435,20 @@ greenlet==3.4.0 ; platform_machine == 'AMD64' or platform_machine == 'WIN32' or
--hash=sha256:5b99e87be7eba788dd5b75ba1cde5639edffdec5f91fe0d734a249535ec3408c \
--hash=sha256:5cb614ace7c27571270354e9c9f696554d073f8aa9319079dcba466bbdead711 \
--hash=sha256:636d2f95c309e35f650e421c23297d5011716be15d966e6328b367c9fc513a82 \
- --hash=sha256:6f0def07ec9a71d72315cf26c061aceee53b306c36ed38c35caba952ea1b319d \
--hash=sha256:805bebb4945094acbab757d34d6e1098be6de8966009ab9ca54f06ff492def58 \
--hash=sha256:8424683caf46eb0eb6f626cb95e008e8cc30d0cb675bdfa48200925c79b38a08 \
--hash=sha256:849f8bc17acd6295fcb5de8e46d55cc0e52381c56eaf50a2afd258e97bc65940 \
- --hash=sha256:89995ce5ddcd2896d89615116dd39b9703bfa0c07b583b85b89bf1b5d6eddf81 \
- --hash=sha256:8c5696c42e6bb5cfb7c6ff4453789081c66b9b91f061e5e9367fa15792644e76 \
--hash=sha256:90036ce224ed6fe75508c1907a77e4540176dcf0744473627785dd519c6f9996 \
--hash=sha256:9390ad88b652b1903814eaabd629ca184db15e0eeb6fe8a390bbf8b9106ae15a \
--hash=sha256:956215d5e355fffa7c021d168728321fd4d31fd730ac609b1653b450f6a4bc71 \
- --hash=sha256:98eedd1803353daf1cd9ef23eef23eda5a4d22f99b1f998d273a8b78b70dd47f \
--hash=sha256:9b2d9a138ffa0e306d0e2b72976d2fb10b97e690d40ab36a472acaab0838e2de \
--hash=sha256:a0a53fb071531d003b075c444014ff8f8b1a9898d36bb88abd9ac7b3524648a2 \
--hash=sha256:a19093fbad824ed7c0f355b5ff4214bffda5f1a7f35f29b31fcaa240cc0135ab \
--hash=sha256:a1c4f6b453006efb8310affb2d132832e9bbb4fc01ce6df6b70d810d38f1f6dc \
--hash=sha256:a70ed1cb0295bee1df57b63bf7f46b4e56a5c93709eea769c1fec1bb23a95875 \
- --hash=sha256:ac6a5f618be581e1e0713aecec8e54093c235e5fa17d6d8eb7ffc487e2300508 \
--hash=sha256:b45e45fe47a19051a396abb22e19e7836a59ee6c5a90f3be427343c37908d65b \
- --hash=sha256:b7857e2202aae67bc5725e0c1f6403c20a8ff46094ece015e7d474f5f7020b55 \
--hash=sha256:c660bce1940a1acae5f51f0a064f1bc785d07ea16efcb4bc708090afc4d69e83 \
--hash=sha256:d18eae9a7fb0f499efcd146b8c9750a2e1f6e0e93b5a382b3481875354a430e6 \
- --hash=sha256:d336d46878e486de7d9458653c722875547ac8d36a1cff9ffaf4a74a3c1f62eb \
--hash=sha256:ee407d4d1ca9dc632265aee1c8732c4a2d60adff848057cdebfe5fe94eb2c8a2 \
--hash=sha256:f38b81880ba28f232f1f675893a39cf7b6db25b31cc0a09bb50787ecf957e85e \
--hash=sha256:f50a96b64dafd6169e595a5c56c9146ef80333e67d4476a65a9c55f400fc22ff \
diff --git a/tests/fixtures/modules/operators.py b/tests/fixtures/modules/operators.py
index 7e3320bc9..557613eba 100644
--- a/tests/fixtures/modules/operators.py
+++ b/tests/fixtures/modules/operators.py
@@ -17,7 +17,7 @@
@pytest.fixture
def expected_characterize_operators() -> list[str]:
- return ["profile", "detect_anomalous_series", "trim", "no_priors_characterization"]
+ return ["profile", "detect_anomalous_series", "trim"]
@pytest.fixture
diff --git a/tests/operators/test_general_orchestration.py b/tests/operators/test_general_orchestration.py
index 4db4c904a..86c250800 100644
--- a/tests/operators/test_general_orchestration.py
+++ b/tests/operators/test_general_orchestration.py
@@ -14,7 +14,7 @@
@pytest.mark.parametrize(
"operator_name",
- ["profile", "no_priors_characterization"],
+ ["profile"],
)
def test_operator_callable_for_harness_unwraps_decorated_operator(
operator_name: str,
diff --git a/tests/operators/test_trim_example_integration.py b/tests/operators/test_trim_example_integration.py
index 4e05f9eb5..260e58074 100644
--- a/tests/operators/test_trim_example_integration.py
+++ b/tests/operators/test_trim_example_integration.py
@@ -9,7 +9,6 @@
import pytest
import trim_custom_experiments.experiments # noqa: F401 — registers ideal-gas experiment
import yaml
-from no_priors_characterization.no_priors_pydantic import NoPriorsParameters
from testcontainers.mysql import MySqlContainer
import orchestrator.modules.operators.randomwalk # noqa: F401
@@ -31,6 +30,7 @@
pytest.importorskip("autogluon")
+from trim.samplers.no_priors_parameters import NoPriorsParameters
from trim.trim_pydantic import (
AutoGluonArgs,
SamplingBudget,
diff --git a/uv.lock b/uv.lock
index 158ae9425..853607571 100644
--- a/uv.lock
+++ b/uv.lock
@@ -16,7 +16,6 @@ required-markers = [
members = [
"ado-autoconf",
"ado-core",
- "ado-no-priors-characterization",
"ado-ray-tune",
"ado-trim",
"ado-vllm-performance",
@@ -122,7 +121,6 @@ resolution-helpers = [
]
test = [
{ name = "ado-autoconf" },
- { name = "ado-no-priors-characterization" },
{ name = "ado-ray-tune" },
{ name = "ado-sfttrainer", marker = "python_full_version < '3.13'" },
{ name = "ado-trim" },
@@ -186,7 +184,6 @@ docs = [
resolution-helpers = [{ name = "urllib3", specifier = ">=2.5.0" }]
test = [
{ name = "ado-autoconf", editable = "plugins/custom_experiments/autoconf" },
- { name = "ado-no-priors-characterization", editable = "plugins/operators/no-priors-characterization" },
{ name = "ado-ray-tune", editable = "plugins/operators/ray_tune" },
{ name = "ado-sfttrainer", marker = "python_full_version < '3.13'", editable = "plugins/actuators/sfttrainer" },
{ name = "ado-trim", editable = "plugins/operators/trim" },
@@ -200,25 +197,6 @@ test = [
{ name = "trim-custom-experiments", editable = "examples/trim/custom_experiments" },
]
-[[package]]
-name = "ado-no-priors-characterization"
-source = { editable = "plugins/operators/no-priors-characterization" }
-dependencies = [
- { name = "ado-core" },
- { name = "numpy" },
- { name = "pandas" },
- { name = "scipy", version = "1.15.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
- { name = "scipy", version = "1.16.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
-]
-
-[package.metadata]
-requires-dist = [
- { name = "ado-core", editable = "." },
- { name = "numpy" },
- { name = "pandas", specifier = ">=2.2.0" },
- { name = "scipy" },
-]
-
[[package]]
name = "ado-ray-tune"
source = { editable = "plugins/operators/ray_tune" }
@@ -268,7 +246,6 @@ name = "ado-trim"
source = { editable = "plugins/operators/trim" }
dependencies = [
{ name = "ado-core" },
- { name = "ado-no-priors-characterization" },
{ name = "autogluon-tabular", extra = ["catboost", "xgboost"] },
{ name = "numpy" },
{ name = "pandas" },
@@ -278,7 +255,6 @@ dependencies = [
[package.metadata]
requires-dist = [
{ name = "ado-core", editable = "." },
- { name = "ado-no-priors-characterization", editable = "plugins/operators/no-priors-characterization" },
{ name = "autogluon-tabular", extras = ["catboost", "xgboost"], specifier = "==1.5" },
{ name = "numpy" },
{ name = "pandas", specifier = ">=2.2.0" },
@@ -2618,18 +2594,14 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/0c/bc/e30e1e3d5e8860b0e0ce4d2b16b2681b77fd13542fc0d72f7e3c22d16eff/greenlet-3.4.0-cp310-cp310-macosx_11_0_universal2.whl", hash = "sha256:d18eae9a7fb0f499efcd146b8c9750a2e1f6e0e93b5a382b3481875354a430e6", size = 284315, upload-time = "2026-04-08T17:02:52.322Z" },
{ url = "https://files.pythonhosted.org/packages/5b/cc/e023ae1967d2a26737387cac083e99e47f65f58868bd155c4c80c01ec4e0/greenlet-3.4.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:636d2f95c309e35f650e421c23297d5011716be15d966e6328b367c9fc513a82", size = 601916, upload-time = "2026-04-08T16:24:35.533Z" },
{ url = "https://files.pythonhosted.org/packages/67/32/5be1677954b6d8810b33abe94e3eb88726311c58fa777dc97e390f7caf5a/greenlet-3.4.0-cp310-cp310-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:234582c20af9742583c3b2ddfbdbb58a756cfff803763ffaae1ac7990a9fac31", size = 616399, upload-time = "2026-04-08T16:30:54.536Z" },
- { url = "https://files.pythonhosted.org/packages/82/0a/3a4af092b09ea02bcda30f33fd7db397619132fe52c6ece24b9363130d34/greenlet-3.4.0-cp310-cp310-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:ac6a5f618be581e1e0713aecec8e54093c235e5fa17d6d8eb7ffc487e2300508", size = 621077, upload-time = "2026-04-08T16:40:34.946Z" },
{ url = "https://files.pythonhosted.org/packages/74/bf/2d58d5ea515704f83e34699128c9072a34bea27d2b6a556e102105fe62a5/greenlet-3.4.0-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:523677e69cd4711b5a014e37bc1fb3a29947c3e3a5bb6a527e1cc50312e5a398", size = 611978, upload-time = "2026-04-08T15:56:31.335Z" },
- { url = "https://files.pythonhosted.org/packages/8c/39/3786520a7d5e33ee87b3da2531f589a3882abf686a42a3773183a41ef010/greenlet-3.4.0-cp310-cp310-manylinux_2_39_riscv64.whl", hash = "sha256:d336d46878e486de7d9458653c722875547ac8d36a1cff9ffaf4a74a3c1f62eb", size = 416893, upload-time = "2026-04-08T16:43:02.392Z" },
{ url = "https://files.pythonhosted.org/packages/bd/69/6525049b6c179d8a923256304d8387b8bdd4acab1acf0407852463c6d514/greenlet-3.4.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:b45e45fe47a19051a396abb22e19e7836a59ee6c5a90f3be427343c37908d65b", size = 1571957, upload-time = "2026-04-08T16:26:17.041Z" },
{ url = "https://files.pythonhosted.org/packages/4e/6c/bbfb798b05fec736a0d24dc23e81b45bcee87f45a83cfb39db031853bddc/greenlet-3.4.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:5434271357be07f3ad0936c312645853b7e689e679e29310e2de09a9ea6c3adf", size = 1637223, upload-time = "2026-04-08T15:57:27.556Z" },
{ url = "https://files.pythonhosted.org/packages/b7/7d/981fe0e7c07bd9d5e7eb18decb8590a11e3955878291f7a7de2e9c668eb7/greenlet-3.4.0-cp310-cp310-win_amd64.whl", hash = "sha256:a19093fbad824ed7c0f355b5ff4214bffda5f1a7f35f29b31fcaa240cc0135ab", size = 237902, upload-time = "2026-04-08T17:03:14.16Z" },
{ url = "https://files.pythonhosted.org/packages/fb/c6/dba32cab7e3a625b011aa5647486e2d28423a48845a2998c126dd69c85e1/greenlet-3.4.0-cp311-cp311-macosx_11_0_universal2.whl", hash = "sha256:805bebb4945094acbab757d34d6e1098be6de8966009ab9ca54f06ff492def58", size = 285504, upload-time = "2026-04-08T15:52:14.071Z" },
{ url = "https://files.pythonhosted.org/packages/54/f4/7cb5c2b1feb9a1f50e038be79980dfa969aa91979e5e3a18fdbcfad2c517/greenlet-3.4.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:439fc2f12b9b512d9dfa681c5afe5f6b3232c708d13e6f02c845e0d9f4c2d8c6", size = 605476, upload-time = "2026-04-08T16:24:37.064Z" },
{ url = "https://files.pythonhosted.org/packages/d6/af/b66ab0b2f9a4c5a867c136bf66d9599f34f21a1bcca26a2884a29c450bd9/greenlet-3.4.0-cp311-cp311-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a70ed1cb0295bee1df57b63bf7f46b4e56a5c93709eea769c1fec1bb23a95875", size = 618336, upload-time = "2026-04-08T16:30:56.59Z" },
- { url = "https://files.pythonhosted.org/packages/6d/31/56c43d2b5de476f77d36ceeec436328533bff960a4cba9a07616e93063ab/greenlet-3.4.0-cp311-cp311-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8c5696c42e6bb5cfb7c6ff4453789081c66b9b91f061e5e9367fa15792644e76", size = 625045, upload-time = "2026-04-08T16:40:37.111Z" },
{ url = "https://files.pythonhosted.org/packages/e5/5c/8c5633ece6ba611d64bf2770219a98dd439921d6424e4e8cf16b0ac74ea5/greenlet-3.4.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c660bce1940a1acae5f51f0a064f1bc785d07ea16efcb4bc708090afc4d69e83", size = 613515, upload-time = "2026-04-08T15:56:32.478Z" },
- { url = "https://files.pythonhosted.org/packages/80/ca/704d4e2c90acb8bdf7ae593f5cbc95f58e82de95cc540fb75631c1054533/greenlet-3.4.0-cp311-cp311-manylinux_2_39_riscv64.whl", hash = "sha256:89995ce5ddcd2896d89615116dd39b9703bfa0c07b583b85b89bf1b5d6eddf81", size = 419745, upload-time = "2026-04-08T16:43:04.022Z" },
{ url = "https://files.pythonhosted.org/packages/a9/df/950d15bca0d90a0e7395eb777903060504cdb509b7b705631e8fb69ff415/greenlet-3.4.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:ee407d4d1ca9dc632265aee1c8732c4a2d60adff848057cdebfe5fe94eb2c8a2", size = 1574623, upload-time = "2026-04-08T16:26:18.596Z" },
{ url = "https://files.pythonhosted.org/packages/1a/e7/0839afab829fcb7333c9ff6d80c040949510055d2d4d63251f0d1c7c804e/greenlet-3.4.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:956215d5e355fffa7c021d168728321fd4d31fd730ac609b1653b450f6a4bc71", size = 1639579, upload-time = "2026-04-08T15:57:29.231Z" },
{ url = "https://files.pythonhosted.org/packages/d9/2b/b4482401e9bcaf9f5c97f67ead38db89c19520ff6d0d6699979c6efcc200/greenlet-3.4.0-cp311-cp311-win_amd64.whl", hash = "sha256:5cb614ace7c27571270354e9c9f696554d073f8aa9319079dcba466bbdead711", size = 238233, upload-time = "2026-04-08T17:02:54.286Z" },
@@ -2637,9 +2609,7 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/65/8b/3669ad3b3f247a791b2b4aceb3aa5a31f5f6817bf547e4e1ff712338145a/greenlet-3.4.0-cp312-cp312-macosx_11_0_universal2.whl", hash = "sha256:1a54a921561dd9518d31d2d3db4d7f80e589083063ab4d3e2e950756ef809e1a", size = 286902, upload-time = "2026-04-08T15:52:12.138Z" },
{ url = "https://files.pythonhosted.org/packages/38/3e/3c0e19b82900873e2d8469b590a6c4b3dfd2b316d0591f1c26b38a4879a5/greenlet-3.4.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:16dec271460a9a2b154e3b1c2fa1050ce6280878430320e85e08c166772e3f97", size = 606099, upload-time = "2026-04-08T16:24:38.408Z" },
{ url = "https://files.pythonhosted.org/packages/b5/33/99fef65e7754fc76a4ed14794074c38c9ed3394a5bd129d7f61b705f3168/greenlet-3.4.0-cp312-cp312-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:90036ce224ed6fe75508c1907a77e4540176dcf0744473627785dd519c6f9996", size = 618837, upload-time = "2026-04-08T16:30:58.298Z" },
- { url = "https://files.pythonhosted.org/packages/44/57/eae2cac10421feae6c0987e3dc106c6d86262b1cb379e171b017aba893a6/greenlet-3.4.0-cp312-cp312-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:6f0def07ec9a71d72315cf26c061aceee53b306c36ed38c35caba952ea1b319d", size = 624901, upload-time = "2026-04-08T16:40:38.981Z" },
{ url = "https://files.pythonhosted.org/packages/36/f7/229f3aed6948faa20e0616a0b8568da22e365ede6a54d7d369058b128afd/greenlet-3.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a1c4f6b453006efb8310affb2d132832e9bbb4fc01ce6df6b70d810d38f1f6dc", size = 615062, upload-time = "2026-04-08T15:56:33.766Z" },
- { url = "https://files.pythonhosted.org/packages/6a/8a/0e73c9b94f31d1cc257fe79a0eff621674141cdae7d6d00f40de378a1e42/greenlet-3.4.0-cp312-cp312-manylinux_2_39_riscv64.whl", hash = "sha256:0e1254cf0cbaa17b04320c3a78575f29f3c161ef38f59c977108f19ffddaf077", size = 423927, upload-time = "2026-04-08T16:43:05.293Z" },
{ url = "https://files.pythonhosted.org/packages/08/97/d988180011aa40135c46cd0d0cf01dd97f7162bae14139b4a3ef54889ba5/greenlet-3.4.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:9b2d9a138ffa0e306d0e2b72976d2fb10b97e690d40ab36a472acaab0838e2de", size = 1573511, upload-time = "2026-04-08T16:26:20.058Z" },
{ url = "https://files.pythonhosted.org/packages/d4/0f/a5a26fe152fb3d12e6a474181f6e9848283504d0afd095f353d85726374b/greenlet-3.4.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:8424683caf46eb0eb6f626cb95e008e8cc30d0cb675bdfa48200925c79b38a08", size = 1640396, upload-time = "2026-04-08T15:57:30.88Z" },
{ url = "https://files.pythonhosted.org/packages/42/cf/bb2c32d9a100e36ee9f6e38fad6b1e082b8184010cb06259b49e1266ca01/greenlet-3.4.0-cp312-cp312-win_amd64.whl", hash = "sha256:a0a53fb071531d003b075c444014ff8f8b1a9898d36bb88abd9ac7b3524648a2", size = 238892, upload-time = "2026-04-08T17:03:10.094Z" },
@@ -2647,9 +2617,7 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/7a/75/7e9cd1126a1e1f0cd67b0eda02e5221b28488d352684704a78ed505bd719/greenlet-3.4.0-cp313-cp313-macosx_11_0_universal2.whl", hash = "sha256:43748988b097f9c6f09364f260741aa73c80747f63389824435c7a50bfdfd5c1", size = 285856, upload-time = "2026-04-08T15:52:45.82Z" },
{ url = "https://files.pythonhosted.org/packages/9d/c4/3e2df392e5cb199527c4d9dbcaa75c14edcc394b45040f0189f649631e3c/greenlet-3.4.0-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5566e4e2cd7a880e8c27618e3eab20f3494452d12fd5129edef7b2f7aa9a36d1", size = 610208, upload-time = "2026-04-08T16:24:39.674Z" },
{ url = "https://files.pythonhosted.org/packages/da/af/750cdfda1d1bd30a6c28080245be8d0346e669a98fdbae7f4102aa95fff3/greenlet-3.4.0-cp313-cp313-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:1054c5a3c78e2ab599d452f23f7adafef55062a783a8e241d24f3b633ba6ff82", size = 621269, upload-time = "2026-04-08T16:30:59.767Z" },
- { url = "https://files.pythonhosted.org/packages/e0/93/c8c508d68ba93232784bbc1b5474d92371f2897dfc6bc281b419f2e0d492/greenlet-3.4.0-cp313-cp313-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:98eedd1803353daf1cd9ef23eef23eda5a4d22f99b1f998d273a8b78b70dd47f", size = 628455, upload-time = "2026-04-08T16:40:40.698Z" },
{ url = "https://files.pythonhosted.org/packages/54/78/0cbc693622cd54ebe25207efbb3a0eb07c2639cb8594f6e3aaaa0bb077a8/greenlet-3.4.0-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f82cb6cddc27dd81c96b1506f4aa7def15070c3b2a67d4e46fd19016aacce6cf", size = 617549, upload-time = "2026-04-08T15:56:34.893Z" },
- { url = "https://files.pythonhosted.org/packages/7f/46/cfaaa0ade435a60550fd83d07dfd5c41f873a01da17ede5c4cade0b9bab8/greenlet-3.4.0-cp313-cp313-manylinux_2_39_riscv64.whl", hash = "sha256:b7857e2202aae67bc5725e0c1f6403c20a8ff46094ece015e7d474f5f7020b55", size = 426238, upload-time = "2026-04-08T16:43:06.865Z" },
{ url = "https://files.pythonhosted.org/packages/ba/c0/8966767de01343c1ff47e8b855dc78e7d1a8ed2b7b9c83576a57e289f81d/greenlet-3.4.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:227a46251ecba4ff46ae742bc5ce95c91d5aceb4b02f885487aff269c127a729", size = 1575310, upload-time = "2026-04-08T16:26:21.671Z" },
{ url = "https://files.pythonhosted.org/packages/b8/38/bcdc71ba05e9a5fda87f63ffc2abcd1f15693b659346df994a48c968003d/greenlet-3.4.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:5b99e87be7eba788dd5b75ba1cde5639edffdec5f91fe0d734a249535ec3408c", size = 1640435, upload-time = "2026-04-08T15:57:32.572Z" },
{ url = "https://files.pythonhosted.org/packages/a1/c2/19b664b7173b9e4ef5f77e8cef9f14c20ec7fce7920dc1ccd7afd955d093/greenlet-3.4.0-cp313-cp313-win_amd64.whl", hash = "sha256:849f8bc17acd6295fcb5de8e46d55cc0e52381c56eaf50a2afd258e97bc65940", size = 238760, upload-time = "2026-04-08T17:04:03.878Z" },
diff --git a/website/docs/examples/example_yamls/op_basic_sampling.yaml b/website/docs/examples/example_yamls/op_basic_sampling.yaml
deleted file mode 120000
index e01111145..000000000
--- a/website/docs/examples/example_yamls/op_basic_sampling.yaml
+++ /dev/null
@@ -1 +0,0 @@
-../../../../examples/no-priors-characterization/example_yamls/op_basic_sampling.yaml
\ No newline at end of file
diff --git a/website/docs/examples/example_yamls/op_quick_exploration.yaml b/website/docs/examples/example_yamls/op_quick_exploration.yaml
deleted file mode 120000
index ee9e2d0c6..000000000
--- a/website/docs/examples/example_yamls/op_quick_exploration.yaml
+++ /dev/null
@@ -1 +0,0 @@
-../../../../examples/no-priors-characterization/example_yamls/op_quick_exploration.yaml
\ No newline at end of file
diff --git a/website/docs/examples/example_yamls/op_thorough_coverage.yaml b/website/docs/examples/example_yamls/op_thorough_coverage.yaml
deleted file mode 120000
index c38ecaf28..000000000
--- a/website/docs/examples/example_yamls/op_thorough_coverage.yaml
+++ /dev/null
@@ -1 +0,0 @@
-../../../../examples/no-priors-characterization/example_yamls/op_thorough_coverage.yaml
\ No newline at end of file
diff --git a/website/docs/examples/example_yamls/space_reaction.yaml b/website/docs/examples/example_yamls/space_reaction.yaml
deleted file mode 120000
index 48a189ac5..000000000
--- a/website/docs/examples/example_yamls/space_reaction.yaml
+++ /dev/null
@@ -1 +0,0 @@
-../../../../examples/no-priors-characterization/example_yamls/space_reaction.yaml
\ No newline at end of file
diff --git a/website/docs/examples/no-priors-characterization.md b/website/docs/examples/no-priors-characterization.md
deleted file mode 120000
index 7daf43406..000000000
--- a/website/docs/examples/no-priors-characterization.md
+++ /dev/null
@@ -1 +0,0 @@
-../../../examples/no-priors-characterization/README.md
\ No newline at end of file
diff --git a/website/docs/operators/no-priors-characterization.md b/website/docs/operators/no-priors-characterization.md
deleted file mode 120000
index dee9ca30f..000000000
--- a/website/docs/operators/no-priors-characterization.md
+++ /dev/null
@@ -1 +0,0 @@
-../../../plugins/operators/no-priors-characterization/README.md
\ No newline at end of file
diff --git a/website/docs/operators/random-walk.md b/website/docs/operators/random-walk.md
index 507bd16c7..e9327a250 100644
--- a/website/docs/operators/random-walk.md
+++ b/website/docs/operators/random-walk.md
@@ -59,25 +59,6 @@ After the second operation:
replayed (as they were already measured during the first operation)
- The timeseries of this second operation is stored. It has 200 entities in it.
-## Controlling sampling and measurements: Continuous batching
-
-When a `random_walk` operation encounters an unmeasured entity in the
-`discoveryspace`, it applies the experiments defined by its `measurementspace`.
-Depending on the experiments, you may want to control how many concurrent
-experiments are being executed.
-
-`random_walk` uses continuous batching to set the number of concurrent
-**requested** experiments and ensure that, as far as possible, there is always
-this number of experiments in flight.
-
-This approach maximizes throughput compared to standard batch-wise submission.
-In the normal case the time to finish measuring batch of N entities is, at a
-minimum, the time taken for the longest experiment to complete. This means if
-one experiment is very long and the others short, there can be capacity in the
-system for (N-1) additional entities to be measured but it will not be used.
-
-The next section explains more about configuring continuous batching
-
## Configuring a `random_walk` operation
The parameters for a `random_walk` operation are (default values shown):
@@ -123,6 +104,8 @@ spaces:
- your-spaces
```
+The following sections explain the different options.
+
!!! info end
You can get a default `random_walk` operation template and the schema of its
@@ -131,10 +114,27 @@ spaces:
The information output by this command should always be preferred
over the information presented here if there is an inconsistency.
+## Continuous batching
+
+When a `random_walk` operation encounters an unmeasured entity in the
+`discoveryspace`, it applies the experiments defined by its `measurementspace`.
+Depending on the experiments, you may want to control how many concurrent
+experiments are being executed.
+
+`random_walk` uses continuous batching to set the number of concurrent
+**requested** experiments and ensure that, as far as possible, there is always
+this number of experiments in flight.
+
+This approach maximizes throughput compared to standard batch-wise submission.
+In the normal case the time to finish measuring a batch of N entities is, at a
+minimum, the time taken for the longest experiment to complete. This means if
+one experiment is very long and the others short, there can be capacity in the
+system for (N-1) additional entities to be measured but it will not be used.
+
### Batch Size and Concurrent Experiments
-When it comes to managing resources during an exploration, the key variable one
-wants to control is the number of concurrent experiments.
+When it comes to managing resources during an exploration, the key variable
+to control is the number of concurrent experiments.
For the `random_walk` operator, this number is its `batchSize` parameter (the
number of initial entities submitted) multiplied by the number of experiments in
@@ -151,7 +151,37 @@ this many concurrent experiment requests during the operation.
Hence, continuous batching can only maintain that there are
N experiments requested at any time.
-### Base Sampling Types and Modes
+### Sampling all Entities
+
+If either of the following conditions is true, you can specify a value of "all"
+for the `numberOfEntities` field in the random walk configuration:
+
+- All dimensions in the `entityspace`s are discrete and bounded or categorical
+- The sampling type is `selector` i.e. you are iterating over an existing set
+ number of entities in a `samplestore`
+
+In the first case `all` will be converted to the size of the space. In the
+second case `all` will be converted to the number of matching entities in the
+`samplestore`.
+
+If neither of these conditions is true, the `random_walk` operator will raise a
+`ValueError` when the execution starts.
+
+!!! info end
+
+    Depending on the Filter settings a `random_walk` operation may not sample "all"
+ entities even if "all" is specified. This is because the filter may filter out
+ some entities.
+
+!!! warning end
+
+ For `discoveryspaces` where one/both of the above conditions are True setting
+ `numberOfEntities` greater than the corresponding size (size of space, or number
+ of matching entities in `samplestore`) will raise a ValueError. This means you
+ cannot set `numberOfEntities` to an arbitrarily large number to ensure sampling
+ all of them - use `all` instead.
+
+## Basic Sampling
The `samplerConfig` field controls how Entities are sampled during the
operation. The base `samplerConfig` is shown in the examples above and has the
@@ -163,7 +193,7 @@ samplerType: selector
grouping: []
```
-#### Sampling Types
+### Sampling Types
There are two sampling types: `generator` and `selector`.
@@ -175,7 +205,7 @@ are bounded.
The `selector` sampling type draws _existing matching entities_ from the
`samplestore` of the `discoveryspace` i.e. it doesn't use the entity space.
-#### Sampler Modes
+### Sampler Modes
Both sampling types support four modes, which can be categorised as flat or
grouped:
@@ -230,7 +260,7 @@ for x in propertyN.values:
entity({'propertyN':x, 'propertyN_1':y, ..., 'property1':z})
```
-#### Why Grouped Modes?
+### Why Grouped Modes?
The advantage of the group modes is that they can allow
[actuators](../actuators/working-with-actuators.md) to reuse their test
@@ -248,7 +278,7 @@ allows.
See the docs of the specific actuator you are using to see if and how it can
benefit from grouping.
-#### Enabling Grouping
+### Enabling Grouping
To use the grouped modes (`randomgrouped`, `sequentialgrouped`) you need to
supply a list of constitutive properties to group by using the `grouping`
@@ -279,14 +309,10 @@ spaces:
- your-spaces
```
-### Custom Samplers
+## Custom Samplers
-It is also possible to specify that `random_walk` uses a custom sampler. This is
-a class that inherits from
-`orchestrator.core.discoveryspace.samplers.BaseSampler`. This is useful for
-implementing more complex sampling schemes. For example, for developers who want
-to use random_walk to drive an exploration but have custom logic to execute
-before choosing each sample/entity.
+`random_walk` can also use custom samplers for
+more complex sampling schemes.
For custom samplers the `samplerConfig` field has the following structure:
@@ -302,7 +328,94 @@ parameters: # A dictionary of key value pairs with the values for the custom sam
-#### Implementing a Custom Sampler
+### Available Custom Samplers
+
+#### No Priors Sample Selector
+
+To install `NoPriorsSampleSelector`, execute:
+
+```bash
+pip install plugins/operators/trim/
+```
+
+The `NoPriorsSampleSelector` provides quasi-random sampling strategies designed
+for high-dimensional discrete spaces. These strategies produce sequences where
+consecutive elements are maximally dispersed, favoring uniform coverage of the
+space:
+
+- **`sobol`**: Sobol sequences are low-discrepancy quasi-random sequences widely
+ used for space-filling designs. They provide better coverage than pure random
+ sampling by ensuring points are well-distributed across all dimensions.
+- **`clhs`**: Concatenated Latin Hypercube Sampling (CLHS) samples each dimension
+ independently without replacement, cycling through all values before repeating.
+ This ensures each dimension is uniformly covered.
+
+**Collision Handling**: Sobol sampling may produce collisions (duplicate points).
+When this happens, the sampler automatically falls back to CLHS to ensure
+that the requested number of unique samples is produced.
+
+##### Example: Sobol Sampling
+
+This example uses Sobol ordering for quasi-random,
+low-discrepancy coverage. Make sure to install the TRIM package first.
+Then install the TRIM custom experiments with:
+
+```bash
+pip install examples/trim/custom_experiments/
+```
+
+To create a `discoveryspace` and explore it with a `random_walk` operation that uses
+the Sobol sampler, execute the following from the root of the `ado` repository:
+
+```bash
+ado create space -f examples/trim/example_yamls/space_pressure.yaml --new-sample-store
+
+ado create operation -f \
+ examples/trim/example_yamls/randomwalk_sobol_operation.yaml \
+ --use-latest space
+```
+
+The configuration file `randomwalk_sobol_operation.yaml` contains the following
+to specify which points to sample:
+
+```yaml
+samplerConfig:
+ module:
+ moduleName: trim.samplers.no_priors_sampler
+ moduleClass: NoPriorsSampleSelector
+ parameters:
+ targetOutput: pressure
+ samples: 20
+ batchSize: 1
+ sampling_strategy: sobol
+```
+
+With `batchSize: 1` the operation samples one point at a time, so the sequence of
+measurements has the desired uniform coverage. To export the sampled entities, run:
+
+```bash
+ado show entities operation --use-latest -o csv --output-file your_file.csv
+```
+
+The file `your_file.csv` will contain the sequence of sampled points. You
+will see something like this:
+
+
+
+```csv
+request_index,result_index,identifier,experiment_id,generatorid,mol,temperature,volume,pressure,request_id,entity_index,valid
+0,0,mol.0.2-temperature.274-volume.8,custom_experiments.calculate_pressure_ideal_gas,no_priors_characterization,0.2,274,8,56.9540689333,c8f814,0,True
+1,0,mol.0.7-temperature.284-volume.1,custom_experiments.calculate_pressure_ideal_gas,no_priors_characterization,0.7,284,1,1652.9151684584,232c8e,0,True
+2,0,mol.0.4-temperature.294-volume.7,custom_experiments.calculate_pressure_ideal_gas,no_priors_characterization,0.4,294,7,139.6829719824,9c6ae3,0,True
+3,0,mol.0.9-temperature.284-volume.5,custom_experiments.calculate_pressure_ideal_gas,no_priors_characterization,0.9,284,5,425.03532903216,83a93d,0,True
+4,0,mol.0.5-temperature.280-volume.6,custom_experiments.calculate_pressure_ideal_gas,no_priors_characterization,0.5,280,6,194.00412775333334,9e8ecd,0,True
+5,0,mol.0.1-temperature.298-volume.4,custom_experiments.calculate_pressure_ideal_gas,no_priors_characterization,0.1,298,4,61.9427465041,db9284,0,True
+...
+```
+
+
+
+### Implementing a Custom Sampler
To implement a custom sampler create a sub-class of
`orchestrator.core.discovery.samplers.BaseSampler` and implement all required
@@ -337,37 +450,7 @@ class MySampler(BaseSampler):
...
```
-### Sampling all Entities
-
-If either of the following conditions are true you can specify a value of "all"
-for the `numberOfEntities` field in the random walk configuration:
-
-- All dimensions in the `entityspace`s are discrete and bounded or categorical
-- The sampling type is `selector` i.e. you are iterating over an existing set
- number of entities in a `samplestore`
-
-In the first case `all` will be converted to the size of the space. In the
-second case `all` will be converted to the number of matching entities in the
-`samplestore`.
-
-If both of these conditions is False the `random_walk` operator will raise a
-ValueError when the execution starts.
-
-!!! info end
-
- Depending on the Filter settings a randomwalk operation may not sample "all"
- entities even if "all" is specified. This is because the filter may filter out
- some entities.
-
-!!! warning end
-
- For `discoveryspaces` where one/both of the above conditions are True setting
- `numberOfEntities` greater than the corresponding size (size of space, or number
- of matching entities in `samplestore`) will raise a ValueError. This means you
- cannot set `numberOfEntities` to an arbitrarily large number to ensure sampling
- all of them - use `all` instead.
-
-### Filtering Entities
+## Filtering Entities
In some circumstance you may want to only sample a subset of Entities. Some
examples include
@@ -391,26 +474,29 @@ which can take the following values:
- `measured`: Only Entities fully measured by the experiments in the
`measurementspace` will be sampled
-### Multiple Measurement
+## Memoization: Reusing existing measurements
-By setting `singleMeasurement:` to False the random walk operation will measure
-ALL entities it samples, even if they already have measurements.
+If `singleMeasurement:` is False, all experiments are applied to
+ALL entities sampled, even if they already have the results for that
+experiment.
-If entities have multiple measurements e.g. you turned this off and then turned
-it on again, then if an entity has multiple measurements each one will be
-replayed.
+By setting `singleMeasurement:` to True (the default) a random walk operation
+will check if an experiment has already been applied to an entity and,
+if it has, reuse (a.k.a. replay) the result.
-Check [replayed measurements](explore_operators.md#memoization-replaying-measurements)
+If the entity has multiple results for the same experiment, each one will be
+replayed.
+See [replayed measurements](explore_operators.md#memoization-replaying-measurements)
for more details.
-### Retrying Failed Measurements
+## Retrying Failed Measurements
If the measurement of an entity by an experiment fails `random_walk` can retry
it. The parameter controlling this is `maxRetries` which by default is 0 - no
retries. If `maxRetries` is N then failing measurements will be retried up to
`N` times.
-#### Experiment request index v number of experiments requested
+### Experiment request index v number of experiments requested
To understand a `random_walk` operations logs when maxRetries is greater than 0
it's necessary to understand how it tracks the entity+experiment combinations it
diff --git a/website/mkdocs.yml b/website/mkdocs.yml
index 924dd9542..3e0adc9d9 100644
--- a/website/mkdocs.yml
+++ b/website/mkdocs.yml
@@ -158,7 +158,6 @@ nav:
- Space Characterization:
- Identify the important dimensions of a space: examples/lhu.md
- Quickly building a predictive model for a configuration space: examples/trim.md
- - Characterizing Spaces Without Prior Knowledge: examples/no-priors-characterization.md
- Fine-Tuning Throughput:
- Measure throughput of fine-tuning locally: examples/finetune-locally.md
- Measure throughput of fine-tuning on a remote RayCluster: examples/finetune-remotely.md
@@ -199,4 +198,3 @@ nav:
- The Random Walk Operator: operators/random-walk.md
- The Ray Tune Operator: operators/optimisation-with-ray-tune.md
- The TRIM Operator: operators/trim.md
- - The No-Priors Characterization Operator: operators/no-priors-characterization.md