diff --git a/website/docs/actuators/working-with-actuators.md b/website/docs/actuators/working-with-actuators.md
index 96659e703..2cdf7f535 100644
--- a/website/docs/actuators/working-with-actuators.md
+++ b/website/docs/actuators/working-with-actuators.md
@@ -13,11 +13,11 @@ You can also add [your own custom experiments](creating-custom-experiments.md)
using the special actuator
[_custom_experiments_](creating-custom-experiments.md#using-your-custom-experiment).
-!!! info end
-
- Most actuators are plugins: pieces of code that can be installed
- independently from `ado` and that `ado` can dynamically discover. Custom
- experiments are also plugins.
+> [!NOTE] Actuators and Plugins
+>
+> Most actuators are plugins: pieces of code that can be installed
+> independently from `ado` and that `ado` can dynamically discover. Custom
+> experiments are also plugins.
## Listing available Actuators
@@ -28,13 +28,83 @@ To see a list of available actuators execute
ado get actuators
```
-to see the experiments each provides
+You can also use `ado get actuators --details`, which additionally
+outputs each actuator's description, the number of experiments
+it provides, and its version. Below is an example
+of the output:
+
+
+
+```commandline
+┌────────────────────┬─────────────┬─────────────────────────────────────────────────────┬───────────────────────────┐
+│ ACTUATOR ID │ EXPERIMENTS │ DESCRIPTION │ VERSION │
+├────────────────────┼─────────────┼─────────────────────────────────────────────────────┼───────────────────────────┤
+│ SFTTrainer │ 5 │ An actuator for benchmarking fine-tuning of │ 1.5.1.dev13+ga1833142b │
+│ │ │ foundation models │ │
+│ custom_experiments │ 6 │ Actuator for applying user supplied custom │ 1.5.1.dev8+531c6444.dirty │
+│ │ │ experiments │ │
+│ mock │ 2 │ A actuator class for testing │ 1.5.1.dev8+531c6444.dirty │
+│ replay │ 0 │ Special actuator for handling externally defined │ 1.5.1.dev8+531c6444.dirty │
+│ │ │ experiments (experiments we don't have code for) │ │
+│ robotic_lab │ 1 │ A template for creating an actuator │ 1.5.1.dev13+ga1833142b │
+└────────────────────┴─────────────┴─────────────────────────────────────────────────────┴───────────────────────────┘
+```
+
+
+
+## Listing available Experiments
+
+To see the experiments each actuator provides, execute
```commandline
ado get experiments
```
+You can also see the description of each experiment (if provided)
+with `ado get experiments --details`.
+The output will be similar to:
+
+
+```terminaloutput
+┌────────────────────┬─────────────────────────────────────┬─────────────────────────────────────────────────────────┐
+│ ACTUATOR ID │ EXPERIMENT ID │ DESCRIPTION │
+├────────────────────┼─────────────────────────────────────┼─────────────────────────────────────────────────────────┤
+│ SFTTrainer │ finetune_full_benchmark-v1.0.0 │ Measures the performance of full-finetuning a model for │
+│ │ │ a given (GPU model, number GPUS, batch_size, │
+│ │ │ model_max_length, number nodes) combination. │
+│ SFTTrainer │ finetune_full_stability-v1.0.0 │ Performs 5 full finetune runs of 5 steps each on a │
+│ │ │ model and reports the fraction of those that resulted │
+│ │ │ in GPU OOM, Other error, or No Error for a given (GPU │
+│ │ │ model, number GPUS, batch_size, model_max_length) │
+│ │ │ combination. │
+│ SFTTrainer │ finetune_gptq-lora_benchmark-v1.0.0 │ Measures the performance of GPTQ-LORA tuning a model │
+│ │ │ for a given (GPU model, number GPUS, batch_size, │
+│ │ │ model_max_length, number nodes) combination. │
+│ SFTTrainer │ finetune_lora_benchmark-v1.0.0 │ Measures the performance of LORA tuning a model for a │
+│ │ │ given (GPU model, number GPUS, batch_size, │
+│ │ │ model_max_length, number nodes) combination. │
+│ SFTTrainer │ finetune_pt_benchmark-v1.0.0 │ Measures the performance of prompt-tuning a model for a │
+│ │ │ given (GPU model, number GPUS, batch_size, │
+│ │ │ model_max_length, number nodes) combination. │
+│ custom_experiments │ acid_test │ │
+│ custom_experiments │ avoid_oom_recommender │ An AutoConf recommender that suggests the minimum │
+│ │ │ number of gpus per worker and number of workers │
+│ │ │ necessary to execute a Tuning job whilekeeping the per │
+│ │ │ GPU batch size constant │
+│ custom_experiments │ calculate_density │ │
+│ custom_experiments │ min_gpu_recommender │ An AutoConf plugin that suggests the minimum number of │
+│ │ │ gpus per worker and number of workers necessary to │
+│ │ │ execute a Tuning job │
+│ custom_experiments │ ml-multicloud-cost-v1.0 │ │
+│ custom_experiments │ nevergrad_opt_3d_test_func │ │
+│ mock │ test-experiment │ │
+│ mock │ test-experiment-two │ │
+│ robotic_lab │ peptide_mineralization │ Measures adsorption of peptide lanthanide combinations │
+└────────────────────┴─────────────────────────────────────┴─────────────────────────────────────────────────────────┘
+```
+
+
## Special actuators: replay and custom_experiments
`ado` has two special builtin actuators: `custom_experiments` and `replay`.
@@ -90,7 +160,7 @@ Some additional notes about this process when you are developing an actuator:
## What's next
-
+
@@ -111,4 +181,4 @@ Some additional notes about this process when you are developing an actuator:
[Creating new Operators :octicons-arrow-right-24:](../operators/working-with-operators.md)
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/website/docs/core-concepts/actuators.md b/website/docs/core-concepts/actuators.md
index c6c8debca..1328deb50 100644
--- a/website/docs/core-concepts/actuators.md
+++ b/website/docs/core-concepts/actuators.md
@@ -1,295 +1,30 @@
## Experiments
-To find the values of certain properties of Entities we need to perform
-measurements on them. We use the term "experiment" to describe a particular type
-of measurement. This is also referred to as an "experiment protocol".
+An **Experiment**
+measures the values of a set of output properties given a set of input
+properties. Each time an Experiment is applied to an
+[Entity](entity-spaces.md) it produces a measurement result.
-An experiment will define its inputs - the set of constitutive and observed
-properties it requires entities to have. It will also define the properties it
-measures.
+### Inputs and Outputs
-You can list them with `ado get experiments --details`. The output will be
-similar to:
+Experiments define two things:
-
-```terminaloutput
-┌────────────────────┬─────────────────────────────────────┬─────────────────────────────────────────────────────────┐
-│ ACTUATOR ID │ EXPERIMENT ID │ DESCRIPTION │
-├────────────────────┼─────────────────────────────────────┼─────────────────────────────────────────────────────────┤
-│ SFTTrainer │ finetune_full_benchmark-v1.0.0 │ Measures the performance of full-finetuning a model for │
-│ │ │ a given (GPU model, number GPUS, batch_size, │
-│ │ │ model_max_length, number nodes) combination. │
-│ SFTTrainer │ finetune_full_stability-v1.0.0 │ Performs 5 full finetune runs of 5 steps each on a │
-│ │ │ model and reports the fraction of those that resulted │
-│ │ │ in GPU OOM, Other error, or No Error for a given (GPU │
-│ │ │ model, number GPUS, batch_size, model_max_length) │
-│ │ │ combination. │
-│ SFTTrainer │ finetune_gptq-lora_benchmark-v1.0.0 │ Measures the performance of GPTQ-LORA tuning a model │
-│ │ │ for a given (GPU model, number GPUS, batch_size, │
-│ │ │ model_max_length, number nodes) combination. │
-│ SFTTrainer │ finetune_lora_benchmark-v1.0.0 │ Measures the performance of LORA tuning a model for a │
-│ │ │ given (GPU model, number GPUS, batch_size, │
-│ │ │ model_max_length, number nodes) combination. │
-│ SFTTrainer │ finetune_pt_benchmark-v1.0.0 │ Measures the performance of prompt-tuning a model for a │
-│ │ │ given (GPU model, number GPUS, batch_size, │
-│ │ │ model_max_length, number nodes) combination. │
-│ custom_experiments │ acid_test │ │
-│ custom_experiments │ avoid_oom_recommender │ An AutoConf recommender that suggests the minimum │
-│ │ │ number of gpus per worker and number of workers │
-│ │ │ necessary to execute a Tuning job whilekeeping the per │
-│ │ │ GPU batch size constant │
-│ custom_experiments │ calculate_density │ │
-│ custom_experiments │ min_gpu_recommender │ An AutoConf plugin that suggests the minimum number of │
-│ │ │ gpus per worker and number of workers necessary to │
-│ │ │ execute a Tuning job │
-│ custom_experiments │ ml-multicloud-cost-v1.0 │ │
-│ custom_experiments │ nevergrad_opt_3d_test_func │ │
-│ mock │ test-experiment │ │
-│ mock │ test-experiment-two │ │
-│ robotic_lab │ peptide_mineralization │ Measures adsorption of peptide lanthanide combinations │
-└────────────────────┴─────────────────────────────────────┴─────────────────────────────────────────────────────────┘
-```
-
-
-## Actuators
-
-Experiments are provided by Actuators. An Actuator usually provides sets of
-experiments that work on the same types of entities i.e. have the same or
-similar input requirements. As such Actuators usually are related to a
-particular domain e.g., computational chemistry, foundation model inference,
-robotic biology lab.
-
-`ado get actuators --details` lists the available actuators, the number of
-experiments they provide, a description and their version. Below is an example
-of the output:
-
-
-
-```commandline
-┌────────────────────┬─────────────┬─────────────────────────────────────────────────────┬───────────────────────────┐
-│ ACTUATOR ID │ EXPERIMENTS │ DESCRIPTION │ VERSION │
-├────────────────────┼─────────────┼─────────────────────────────────────────────────────┼───────────────────────────┤
-│ SFTTrainer │ 5 │ An actuator for benchmarking fine-tuning of │ 1.5.1.dev13+ga1833142b │
-│ │ │ foundation models │ │
-│ custom_experiments │ 6 │ Actuator for applying user supplied custom │ 1.5.1.dev8+531c6444.dirty │
-│ │ │ experiments │ │
-│ mock │ 2 │ A actuator class for testing │ 1.5.1.dev8+531c6444.dirty │
-│ replay │ 0 │ Special actuator for handling externally defined │ 1.5.1.dev8+531c6444.dirty │
-│ │ │ experiments (experiments we don't have code for) │ │
-│ robotic_lab │ 1 │ A template for creating an actuator │ 1.5.1.dev13+ga1833142b │
-└────────────────────┴─────────────┴─────────────────────────────────────────────────────┴───────────────────────────┘
-```
-
-
-
-A primary way to extend `ado` is by developing new Actuators providing the
-ability to do experiments on entities in a new domain.
+- **Inputs** — the values an Experiment needs in order to run. Each input
+ restricts the values it accepts through a **Property Domain** (for example,
+ a list of allowed model names, or any integer within a range). See
+ [Properties and Domains](properties-and-domains.md) for the full list of
+ domain types.
+- **Outputs** — the properties the Experiment measures and records. Because
+ many Experiments may target the same concept (e.g. `tokens_per_second`),
+ each output is namespaced to the Experiment that produced it — see
+ [Target and Observed Properties](#target-and-observed-properties).
-### Example: Experiment from the SFTTrainer actuator
+### Example
-Here is an example (truncated) description of an experiment from the SFTTrainer
-actuator.
-
-
-
-```commandline
-Identifier: SFTTrainer.finetune_pt_benchmark-v1.0.0
-Description: Measures the performance of prompt-tuning a model for a given (GPU model, number GPUS, batch_size,
-model_max_length, number nodes) combination.
-
-
-Required Inputs:
-
- Constitutive Properties:
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- Identifier: model_name
- Description: The huggingface name or path to the model
-
- Domain:
-
- Type: CATEGORICAL_VARIABLE_TYPE
- Values: [
- 'allam-1-13b',
- 'granite-13b-v2',
- 'granite-20b-v2',
- 'granite-3-8b',
- 'granite-3.0-1b-a400m-base',
- 'granite-3.1-2b',
- 'granite-3.1-3b-a800m-instruct',
- 'granite-3.1-8b-instruct',
- 'granite-3.3-8b',
- 'granite-34b-code-base',
- 'granite-3b-1.5',
- 'granite-3b-code-base-128k',
- 'granite-4.0-1b',
- 'granite-4.0-350m',
- 'granite-4.0-h-1b',
- 'granite-4.0-h-micro',
- 'granite-4.0-h-small',
- 'granite-4.0-h-tiny',
- 'granite-4.0-micro',
- 'granite-7b-base',
- 'granite-8b-code-base',
- 'granite-8b-code-base-128k',
- 'granite-8b-code-instruct',
- 'granite-8b-japanese',
- 'granite-vision-3.2-2b',
- 'hf-tiny-model-private/tiny-random-BloomForCausalLM',
- 'llama-13b',
- 'llama-7b',
- 'llama2-70b',
- 'llama3-70b',
- 'llama3-8b',
- 'llama3.1-405b',
- 'llama3.1-70b',
- 'llama3.1-8b',
- 'llama3.2-1b',
- 'llama3.2-3b',
- 'llava-v1.6-mistral-7b',
- 'mistral-123b-v2',
- 'mistral-7b-v0.1',
- 'mixtral-8x7b-instruct-v0.1',
- 'smollm2-135m'
- ]
-
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- Identifier: model_max_length
- Description: The maximum context size. Dataset entries with more tokens they are truncated. Entries with
- fewer are padded
-
- Domain:
-
- Type: DISCRETE_VARIABLE_TYPE
- Interval: 1
- Range: [1, 131073]
-
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- Identifier: batch_size
- Description: The total batch size to use
-
- Domain:
-
- Type: DISCRETE_VARIABLE_TYPE
- Interval: 1
- Range: [1, 4097]
-
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- Identifier: number_gpus
- Description: The total number of GPUs to use
-
- Domain:
-
- Type: DISCRETE_VARIABLE_TYPE
- Interval: 1
- Range: [0, 33]
-
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
-
-Optional Inputs and Default Values:
-
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- Identifier: max_steps
- Description: The number of optimization steps to perform. Set to -1 to respect num_train_epochs instead
-
- Domain:
-
- Type: DISCRETE_VARIABLE_TYPE
- Interval: 1
- Range: [-1, 10001]
-
- Default value: -1
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
-
-Outputs:
- ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
- finetune_pt_benchmark-v1.0.0-is_valid
- finetune_pt_benchmark-v1.0.0-dataset_tokens_per_second_per_gpu
- finetune_pt_benchmark-v1.0.0-train_runtime
- finetune_pt_benchmark-v1.0.0-dataset_tokens_per_second
- finetune_pt_benchmark-v1.0.0-train_samples_per_second
- finetune_pt_benchmark-v1.0.0-train_steps_per_second
- finetune_pt_benchmark-v1.0.0-train_tokens_per_second
- finetune_pt_benchmark-v1.0.0-train_tokens_per_gpu_per_second
- finetune_pt_benchmark-v1.0.0-cpu_compute_utilization
- finetune_pt_benchmark-v1.0.0-cpu_memory_utilization
- finetune_pt_benchmark-v1.0.0-gpu_compute_utilization_min
- finetune_pt_benchmark-v1.0.0-gpu_compute_utilization_avg
- finetune_pt_benchmark-v1.0.0-gpu_compute_utilization_max
- finetune_pt_benchmark-v1.0.0-gpu_memory_utilization_min
- finetune_pt_benchmark-v1.0.0-gpu_memory_utilization_avg
- finetune_pt_benchmark-v1.0.0-gpu_memory_utilization_max
- finetune_pt_benchmark-v1.0.0-gpu_memory_utilization_peak
- finetune_pt_benchmark-v1.0.0-gpu_power_watts_min
- finetune_pt_benchmark-v1.0.0-gpu_power_watts_avg
- finetune_pt_benchmark-v1.0.0-gpu_power_watts_max
- finetune_pt_benchmark-v1.0.0-gpu_power_percent_min
- finetune_pt_benchmark-v1.0.0-gpu_power_percent_avg
- finetune_pt_benchmark-v1.0.0-gpu_power_percent_max
- ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-```
-
-
-
-The SFTTrainer actuator provides experiments which measure the performance of
-different fine-tuning techniques on a foundation model fine-tuning deployment
-configuration. Therefore, the entities it takes as input represent fine-tuning
-deployment configuration.
-
-## Experiment Inputs
-
-Experiments define their inputs they require along with valid values for those
-inputs.
-
-### Required Inputs
-
-Experiments can define required inputs. There are properties an Entity must have
-values for, for it to be a valid input to the Experiment.
-
-For example for `SFTTrainer.finetune_pt_benchmark-v1.0.0` shown above we can see
-it requires an Entity to have 4 constitutive properties defined: `model_name`,
-`model_max_length`, `batch_size` and `number_gpus`. Each one has a domain which
-defines the allowed values for that property - if an Entity has a value for a
-property that is not in the defined domain the experiment cannot run on it.
-
-For example, the `number_gpu` property can only have the values from 0 to 32
-(range is exclusive of upper bound)
-
-
-```commandline
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
- Identifier: number_gpus
- Description: The total number of GPUs to use
-
- Domain:
-
- Type: DISCRETE_VARIABLE_TYPE
- Interval: 1
- Range: [0, 33]
-
- ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
-```
-
-
-All the required inputs in the examples above are
-[constitutive properties](entity-spaces.md#entities). However, they can also be
-observed properties (see next section) i.e. properties measured by other
-experiments. If an Experiment, `B` has a required input that is an observed
-property it means the experiment measuring that property has to be run on an
-Entity before Experiment `B` can be run on it.
-
-### Optional Properties
-
-Experiments can also define optional properties. These are properties an Entity
-can have but if they don't the Experiment will give it a default value. In
-addition, the default values of optional properties can be overridden to create
-**parameterized experiments**. This is described further in the
-[`discoveryspace` resource documentation](../resources/discovery-spaces.md).
-
-An example experiment with optional properties is
+Below is the description of `robotic_lab.peptide_mineralization`, an Experiment
+that measures the adsorption of peptide and lanthanide combinations in a
+robotic biology lab:
```terminaloutput
@@ -376,26 +111,80 @@ Outputs:
```
-Here you can see three optional properties, `temperature`, `replicas` and
-`robot_identifier` that are given default values.
+The example shows:
+
+- **Required inputs** — `peptide_identifier`, `peptide_concentration`, and
+ `lanthanide_concentration` must always be provided. Each declares a domain
+ that restricts the valid values.
+- **Optional inputs** — `temperature`, `replicas`, and `robot_identifier` each
+ have a default value and can be overridden.
+- **Outputs** — two properties are measured and recorded:
+ `adsorption_timeseries` and `adsorption_plateau_value`.
+
+### Required Inputs
+
+Values must be provided for all required inputs before the Experiment can run.
+Providing a value outside the declared domain is an error.
+
+Most required inputs are **constitutive properties** — values that describe the
+Entity being measured, such as a model name or a concentration. However, an
+input can also be an **observed property** produced by another Experiment: if
+Experiment `B` requires a value that Experiment `A` produces, Experiment `A`
+must have been run on the Entity first.
+
+See [Properties and Domains](properties-and-domains.md) for a full description
+of constitutive and observed properties and all domain types.
+
+### Optional Inputs
+
+Experiments can also declare optional inputs that have default values.
+The defaults can be overridden to create **parameterized experiments** — useful
+when you want to fix certain settings while exploring others. This is described
+further in the
+[`discoveryspace` resource documentation](../resources/discovery-spaces.md).
-## Target and Observed Properties
+### Target and Observed Properties
-Experiments define properties the properties they measure. However, there may be
-many experiments that measure the same property in different ways so we need a
-way to differentiate them.
+Experiments declare the properties they intend to measure — these are called
+**target properties**. However, many different Experiments might target the
+same property (e.g. `tokens_per_second`) measured in different ways. To
+distinguish them, the actual value recorded by Experiment `A` for target
+property `X` is called an **observed property**, named `A-X`.
-The properties the experiment targets measuring are called `target properties`,
-and the properties it actually measures `observed properties`. If experiment `A`
-has target property `X`, then the observed property is `A-X` i.e. the value of
-target property `X` measured by experiment `A`.
+In the example above:
+
+- `adsorption_plateau_value` is the **target property** — the concept being
+ measured.
+- `peptide_mineralization-adsorption_plateau_value` is the **observed property**
+ — that value as recorded by this specific Experiment.
+
+For a full description of property types see
+[Properties and Domains](properties-and-domains.md).
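
The naming rule for observed properties can be illustrated with a minimal sketch. The helper `observed_property_id` is hypothetical and is not part of `ado`'s API. It only mirrors the `A-X` convention described above, using the experiment and target property names from the example.

```python
def observed_property_id(experiment_id: str, target_property: str) -> str:
    """Build the namespaced observed property name `A-X`.

    Hypothetical helper for illustration only: joins the Experiment
    identifier and the target property name with a hyphen.
    """
    return f"{experiment_id}-{target_property}"


# The target property is the concept; the observed property is that
# concept as recorded by one specific Experiment.
name = observed_property_id("peptide_mineralization", "adsorption_plateau_value")
print(name)
```

Two Experiments that target the same property therefore produce two distinct observed properties, which is what lets their results sit side by side without clashing.
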
## Measurement Space
-A measurement space is simply a set of [experiments](actuators.md#experiments).
+A Measurement Space is a collection of [Experiments](#experiments).
+As a result, a Measurement Space also defines a set of observed properties and
+a set of target properties, as follows:
+
+| Property Type | Measurement Space Definition |
+| --- | --- |
+| Observed | Union of the observed property sets of its Experiments |
+| Target | Union of the target property sets of its Experiments |
+
+When combined with an Entity Space, a Measurement Space forms a
+[Discovery Space](discovery-spaces.md).
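
As a rough sketch of the table above, the two property sets of a Measurement Space can be computed as plain set unions. The `Experiment` class and the experiment names `exp_a` and `exp_b` are hypothetical stand-ins for illustration, not `ado`'s own classes or experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    # Hypothetical stand-in for illustration; not ado's Experiment class.
    identifier: str
    target_properties: frozenset

    @property
    def observed_properties(self) -> frozenset:
        # Each observed property is the target name namespaced by the Experiment.
        return frozenset(f"{self.identifier}-{t}" for t in self.target_properties)


def measurement_space_properties(experiments):
    """Return (target, observed) property sets as unions over the Experiments."""
    targets, observed = set(), set()
    for experiment in experiments:
        targets |= experiment.target_properties
        observed |= experiment.observed_properties
    return targets, observed


space = [
    Experiment("exp_a", frozenset({"tokens_per_second"})),
    Experiment("exp_b", frozenset({"tokens_per_second", "train_runtime"})),
]
targets, observed = measurement_space_properties(space)
```

Because both Experiments target `tokens_per_second`, the target set contains it once, while the observed set keeps one entry per Experiment.
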
+
+## Actuators
+
+Experiments are grouped and provided by **Actuators**. An Actuator typically
+covers a particular domain - for example, foundation model fine-tuning,
+computational chemistry, or robotic biology - and provides a collection of
+related Experiments for that domain.
-Since each experiment has a set of observed properties, a measurement space also
-defines a set of observed properties.
+A primary way to extend `ado` is by developing new Actuators to support
+Experiments in a new domain.
-Since each observed property is an observation of a target property, a
-measurement space also defines a set of target properties.
+The [Actuator documentation](../actuators/working-with-actuators.md)
+has more detail, including how to see the Actuators and Experiments
+available in your deployment.
diff --git a/website/docs/core-concepts/concepts.md b/website/docs/core-concepts/concepts.md
index 59bebdd43..8540a71e3 100644
--- a/website/docs/core-concepts/concepts.md
+++ b/website/docs/core-concepts/concepts.md
@@ -1,40 +1,35 @@
## Discovery Space
-The core concept in `ado` is called a _Discovery Space_. In `ado` you are often
-creating and performing operations on Discovery Spaces.
-
-For users familiar with `pandas` and `dataframes`, a Discovery Space combines:
-
-- the schema of a `dataframe` i.e. the columns and what they mean
-- instructions on how to fill the `dataframe` rows
-- the current data in the `dataframe` (and what's missing!)
-
-A Discovery Space expresses the hidden metadata and contextual
-information necessary to understand and extend a dataframe. See
-[Discovery Space](discovery-spaces.md) for more details.
-
-A Discovery Space is built from:
-
-- [Entities and Entity Spaces](entity-spaces.md): The set of things in a
- Discovery Space
-- [Measurement Spaces](actuators.md#measurement-space): The set of experiments
- in a Discovery Space
-- [Experiments and Actuators](actuators.md): The available experiments and the
- tools that execute them
+`ado` is a tool for systematically exploring, measuring, and analysing a space of
+entities - for example, configurations, systems and substances.
+The core concept enabling this is a
+**Discovery Space**. It answers three questions:
+
+- **How are measurements performed?** A Discovery Space defines
+ a set of [Experiments](actuators.md). Each Experiment
+ takes defined inputs and produces measured outputs. The collection of Experiments
+ is called a [Measurement Space](actuators.md#measurement-space).
+- **What do you want to measure?** A Discovery Space defines an
+ [Entity Space](entity-spaces.md) — the
+ specific set of things, called _Entities_, you want to measure.
+- **What have you measured so far?** A Discovery Space uses
+ a **Sample Store**, a shared database, to read and store measurement
+ results.
+
+For users familiar with `pandas`, a Discovery Space is like a DataFrame that
+knows its own schema, knows how to fill in missing values, and shares data
+transparently with other DataFrames. See [Discovery Spaces](discovery-spaces.md)
+for more.
## Sample Store
-In `ado`, data on sampled entities, and the results of experiments on them, are
-kept in a **sample store**.
-
-A single sample store can be used by multiple Discovery Spaces, allowing them to
-share data. This means, for example, if an experiment has already been run,
-`ado` can reuse the existing results instead of running the experiment again,
-saving time and computational resources.
+In `ado`, Entities and the results of Experiments on them are kept in a
+**Sample Store** — a shared database that multiple Discovery Spaces can use.
-This ability to transparently share and reuse data is a core feature of `ado`.
-See [Shared Sample Stores](data-sharing.md) for more details.
+If an Experiment has already been run on an Entity, `ado` can reuse the result
+rather than running it again. This transparent data sharing is a core feature of
+`ado`. See [Shared Sample Stores](data-sharing.md) for more details.
## What's next
@@ -46,17 +41,19 @@ See [Shared Sample Stores](data-sharing.md) for more details.
---
- Next go to [resources](../resources/resources.md) to learn more about working with these core-concepts in `ado`.
+ Go to [resources](../resources/resources.md) to learn more about working
+ with these core concepts in `ado`.
[ado resources :octicons-arrow-right-24:](../resources/resources.md)
- :octicons-workflow-24:{ .lg .middle } **Try our examples**
- ---
+ ---
- Try some of our [examples](../examples/examples.md) if you want to dive straight in.
+ Try some of our [examples](../examples/examples.md) if you want to dive
+ straight in.
- [Our examples :octicons-arrow-right-24:](../examples/examples.md)
+ [Our examples :octicons-arrow-right-24:](../examples/examples.md)
\ No newline at end of file
diff --git a/website/docs/core-concepts/data-sharing.md b/website/docs/core-concepts/data-sharing.md
index b2c93d4a5..8b1075845 100644
--- a/website/docs/core-concepts/data-sharing.md
+++ b/website/docs/core-concepts/data-sharing.md
@@ -1,102 +1,80 @@
# Shared Sample Stores
-In `ado` Entities and measurement results are stored in a database called a
-**Sample Store**. This document describes how Sample Stores enable sharing of
-data. For more general information about these databases see
+In `ado`, Entities and measurement results are stored in a database called a
+**Sample Store**. For more on how Sample Stores are configured and managed see
[their dedicated page](../resources/sample-stores.md).
-There are two key points that underpin data reuse in `ado`:
+Two principles underpin data reuse in `ado`:
-- You can **share** a Sample Store between multiple Discovery Spaces
- - This allows a Discovery Space to (re)use relevant Entities and Measurements
- stored in the Sample Store by operations on other Discovery Spaces
-- **Entities are always shared**. There is only one entry in a Sample Store for
- an Entity
+- **A Sample Store can be shared across multiple Discovery Spaces.** This allows
+ any Discovery Space to access Entities and measurements recorded by operations
+ on other Discovery Spaces that use the same store.
+- **Each Entity has exactly one record in a Sample Store.** If two Discovery
+ Spaces both include the same Entity, they reference the same record — there is
+ no duplication.
> [!NOTE]
>
-> To maximize the chance of data-reuse, similar Discovery Spaces should use the
-> same Sample Store. However, Discovery Spaces do not have to be similar to use
-> the same Sample Store.
+> To maximise the chance of data reuse, similar Discovery Spaces should use the
+> same Sample Store. However, any Discovery Spaces can share a store regardless
+> of how similar they are.
-## When data can be shared in `ado`
-
-There are two situations where data can be shared between Discovery Spaces in
-`ado`:
-
-- **Data Retrieval**: retrieving data about entities and measurements from the
- Discovery Space e.g. `ado show entities space`
-- **Data Generation**: When performing an explore operation on a Discovery
- Space - this type of data reuse is called `memoization`
-
-## How `ado` determines what data can be shared
-
-As a quick recap, a Discovery Space is composed of:
-
-- an [Entity Space](entity-spaces.md) which describes a set of Entities (points)
- to be measured
-- a [Measurement Space](actuators.md#measurement-space) which describes a set of
- Experiments to apply to the points
+## How `ado` matches shared data
### Entities
-Each Entity in the Entity Space has a unique identifier, usually determined by
-its set of constitutive property values. For example, if an Entity has two
-constitutive properties `X` an `Y` with values 4 and 10, its id will be
-'X:4-Y:10'. Since the identifiers of all the Entities in the Entity Space are
-known, the Sample Store can be searched to see if it contains a record for any
-of the Entities.
+Each Entity has a unique identifier derived from its
+[constitutive property](properties-and-domains.md#property-types) values.
+For example, an Entity with properties `X=4` and `Y=10` gets the id
+`X.4-Y.10`. `ado` uses these identifiers to look up Entities in the Sample
+Store, regardless of which Discovery Space originally recorded them.
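
A minimal sketch of how such an identifier could be derived is shown below. The function `entity_identifier` is hypothetical, and ordering the properties by name is an assumption made here so that the result is deterministic; `ado`'s actual implementation may differ.

```python
def entity_identifier(constitutive_values: dict) -> str:
    """Derive an identifier of the form `X.4-Y.10` from property values.

    Hypothetical helper for illustration only. Assumption: properties are
    ordered by name so the same values always give the same identifier.
    """
    return "-".join(
        f"{name}.{value}" for name, value in sorted(constitutive_values.items())
    )


# The same property values yield the same identifier, whichever
# Discovery Space the Entity is sampled from.
print(entity_identifier({"Y": 10, "X": 4}))
```

Since the identifier depends only on the constitutive property values, two Discovery Spaces that include the same point resolve to the same Sample Store record.
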
### Measurements
-Each experiment in a Measurement Space has a unique identifier, determined from
-its base name plus any optional properties that have been explicitly set. When
-an Entity is retrieved from the Sample Store, it contains results of all the
-experiments that have been applied to it. If the identifier of a result matches
-the identifier of an Experiment in the Measurement Space, `ado` determines it
-can be reused.
+Each Experiment also has a unique identifier (its name plus any explicitly set
+optional inputs). When an Entity is retrieved from the Sample Store, it carries
+the results of all Experiments that have been applied to it. `ado` checks
+whether any of those result identifiers match an Experiment in the current
+Measurement Space — if so, the result can be reused.
+
+## Data retrieval modes
-## Data sharing and data retrieval
+When retrieving data from a Discovery Space (e.g. via `ado show entities`),
+there are two modes that control whether shared data is included:
-When retrieving data from a Discovery Space, e.g. via `ado show entities`, you
-are actually retrieving data from the Sample Store that matches the Discovery
-Space. When determining what data to retrieve there are two situations to
-consider:
+
+| Mode | What is returned |
+| --- | --- |
+| **measured** | Only Entities and measurements recorded by operations run directly on *this* Discovery Space. Compatible data from other spaces is excluded. |
+| **matching** | All Entities and measurements in the Sample Store that are compatible with this Discovery Space, regardless of which space produced them. |
+
-- **measured**: retrieve only Entities and measurements that were sampled via an
- operation on the given Discovery Space
- - this can be considered the "no sharing" mode. If an Entity or measurement
- exists in the Sample Store that's compatible with the Discovery Space, but
- no operation on the Discovery Space ever visited it, the "measured" mode
- will not show it
-- **matching**: retrieve all Entities and measurements that match the Discovery
- Space
- - this can be considered the "sharing" mode.
+Use **measured** when you want to see only the results your operations have
+produced. Use **matching** when you want the full picture including any
+compatible data from other spaces.
-## Data sharing and memoization
+## Memoization
> [!IMPORTANT]
>
> Each explore operator should provide a way to turn memoization on and off.
> Check the operator documentation.
-This section explains how data sharing and reuse works during an explore
-operation - a feature called _memoization_. It's recommended you check the
-documentation on [operations](../resources/operation.md) and
+*Memoization* is the name for data reuse that happens automatically during an
+explore operation. It's recommended you also check the documentation on
+[operations](../resources/operation.md) and
[explore operators](../operators/explore_operators.md).
-Briefly, an explore operation samples a point in the Entity Space of a Discovery
-Space and applies the experiments in the Measurement Space to it. In detail, the
-sampling process is as follows:
+When an operation samples an Entity it proceeds as follows:
-- An Entity is sampled from the Entity Space
+- The Entity is sampled from the Entity Space
- The Entity's record is retrieved from the Sample Store if present (via its
unique identifier)
- If **memoization is on**
- - for each experiment in the MeasurementSpace, `ado` checks
- if a result for it already exists (via the experiment's unique identifier)
- - if it does, the result is reused. If there is more than one result, they
- are all reused
-- if **memoization is off**
- - Existing results are ignored. Each experiment in the Measurement Space is
- applied again to the Entity. The new results are added to any existing.
+ - for each Experiment in the Measurement Space, `ado` checks
+ if a result for it already exists (via the Experiment's unique identifier)
+ - if it does, the result is reused. If there is more than one result,
+ they are all reused
+- If **memoization is off**
+ - existing results are ignored. Each Experiment in the Measurement Space is
+ applied again to the Entity. The new results are added to any existing results.
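The steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not `ado`'s implementation: a plain dict stands in for the Entity's record in the Sample Store, and `measure_entity` and the `exp-a`/`exp-b` identifiers are hypothetical names.

```python
from typing import Callable

# Results already recorded for one Entity, keyed by Experiment identifier.
# A plain dict stands in for the Entity's record from the Sample Store.
Results = dict[str, list[float]]


def measure_entity(
    existing: Results,
    experiments: dict[str, Callable[[], float]],
    memoization: bool,
) -> tuple[Results, list[str]]:
    """Return the updated results and the identifiers of Experiments that ran."""
    updated = {exp_id: list(values) for exp_id, values in existing.items()}
    executed = []
    for exp_id, run in experiments.items():
        if memoization and updated.get(exp_id):
            # Memoization on: all existing results for this Experiment are reused.
            continue
        # No reusable result, or memoization off: apply the Experiment and
        # add the new result to any existing ones.
        updated.setdefault(exp_id, []).append(run())
        executed.append(exp_id)
    return updated, executed


stored = {"exp-a": [1.0]}
experiments = {"exp-a": lambda: 2.0, "exp-b": lambda: 3.0}

reused, ran_on = measure_entity(stored, experiments, memoization=True)
fresh, ran_off = measure_entity(stored, experiments, memoization=False)
print(ran_on)   # ['exp-b']
print(ran_off)  # ['exp-a', 'exp-b']
```

With memoization on, only the Experiment without a stored result is executed; with it off, both run and the new value for `exp-a` is appended next to the stored one.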
diff --git a/website/docs/core-concepts/discovery-spaces.md b/website/docs/core-concepts/discovery-spaces.md
index ad1580d2f..544ca3563 100644
--- a/website/docs/core-concepts/discovery-spaces.md
+++ b/website/docs/core-concepts/discovery-spaces.md
@@ -1,19 +1,22 @@
-A Discovery Space is made up of an [`Entity Space`](entity-spaces.md) and a
-[`Measurement Space`](actuators.md#measurement-space). The `Entity Space`
-defines the things you want to measure and the `Measurement Space` how you want
-to measure them.
+A Discovery Space combines an [Entity Space](entity-spaces.md) and a
+[Measurement Space](actuators.md#measurement-space). The Entity Space defines
+the Entities you want to measure; the Measurement Space defines how they are
+measured. Results are stored in a [Sample Store](data-sharing.md).
-A Discovery Space is also associated with a [Sample Store](data-sharing.md)
-where measurement results and entities are recorded.
+A Discovery Space is a **view** rather than a container — data is fetched from
+the Sample Store on demand. This means multiple Discovery Spaces can share
+measurement results transparently, and any measurement made by anyone using the
+same Sample Store becomes immediately available.
-## Example: Fine-Tuning Deployment Configuration Discovery Space
+## Example: Fine-Tuning Deployment Configuration
-We can combine the Entity Space example for fine-tuning deployment configuration
-[here](entity-spaces.md#example-fine-tuning-deployment-configuration) with one
-of the experiments from the `SFTTrainer` actuator to create the following
-Discovery Space:
+We can combine the
+[Entity Space example](entity-spaces.md#example-fine-tuning-deployment-configuration)
+with one of the Experiments from the
+[`SFTTrainer` Actuator](../actuators/sft-trainer.md) to create the following
+Discovery Space:
@@ -66,98 +69,78 @@ Sample Store identifier: '2351e8'
```
-Here we can see:
+The output shows the unique Discovery Space identifier, the Entity Space (80
+Entities across 7 dimensions), and the Measurement Space (one Experiment with
+17 target properties). Together these define exactly what can be measured and
+what the resulting data will look like.
-- A unique id for the discovery space
-- The entity space
-- For each experiment in the measurement space (in this case just one) the
- target properties it measures.
+## Measurement Space and Entity Space Compatibility
-## Sampling and Measurement
-
-A Discovery Space created with an empty Sample Store has no data associated with
-it i.e. no sampled and measured entities. Adding data requires applying an
-operation, like a Random Walk, to the Discovery Space. This operation samples
-entities from the Entity Space, measures them according to the Measurement Space
-experiments, and places the results into the Sample Store.
-
-Therefore, at any given point in time a Discovery Space will have some number of
-
-- sampled and measured entities
-- sampled and unmeasured entities (because the measurements failed)
-- unsampled entities
+Since an [Experiment](actuators.md#experiments) declares the inputs it needs,
+an Entity can only be measured by that Experiment if its
+[constitutive property](properties-and-domains.md#property-types) values
+satisfy those input requirements.
-The first two will have corresponding data in the Sample Store.
+Since a [Measurement Space](actuators.md#measurement-space) is a set of
+Experiments, it defines a set of required constitutive properties. An Entity
+Space must therefore contain all those properties, and each Entity Space
+[Property Domain](properties-and-domains.md#property-domain-types) must be a
+**subdomain** of the corresponding Experiment's input domain.
-## Comparison: Discovery Space and a DataFrame
+In practice this means the Experiment's declared input domains define the
+**maximum possible extent** of any Entity Space used with that Measurement
+Space. Your Entity Space is always a focused subset within those bounds. For
+example, if an Experiment accepts `batch_size` values from 1 to 4096 (written
+as the range `[1, 4097]`, whose upper bound is exclusive), your Entity Space
+can restrict that to `[1, 2, 4, 8, 16]`, but it cannot include values outside
+that range.
-Comparing a Discovery Space with a DataFrame can help clarify the concept and
-also illustrate the benefits
+| | Full Experiment extent | Focused Entity Space subset |
+| --- | --- | --- |
+| `batch_size` | `[1, 4097]` interval 1 | `[1, 2, 4, 8, 16, 32, 64, 128]` |
+| `model_name` | 40 model names | `[granite-3-8b, llama3-8b]` |
+| `number_gpus` | `[0, 33]` interval 1 | `[2, 4]` |
-### A Discovery Space defines a DataFrame schema
+You can inspect the full extent of an Experiment's inputs with
+`ado get experiments --details`.
-When you create a Discovery Space you can imagine you have created a DataFrame
-schema where:
-
-1. There are Columns for each entity space dimension
-2. There are Columns for each measurement space property
-3. Each row is an entity
-
-If we were to look at the example fine-tuning deployment configuration Discovery
-Space this would look like (the rows and columns are truncated)
-
-
-| model_id | gpu_type | batch_size | model_max_length | number_gpus | ... | finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0.dataset_tokens_per_second | finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0.gpu_memory_utilization_peak | ... |
-| -------- | --------------------- | ---------- | ---------------- | ----------- | --- | ----------------------------------------------------------------------- | ------------------------------------------------------------------------- | --- |
-| lama3-8b | NVIDIA-A100-80GB-PCIe | 2 | 512 | 2 | ... | UNK | UNK | ... |
-| lama3-8b | NVIDIA-A100-80GB-PCIe | 4 | 512 | 2 | ... | UNK | UNK | ... |
-| lama3-8b | NVIDIA-A100-80GB-PCIe | 8 | 512 | 2 | ... | UNK | UNK | ... |
-| ... | ... | ... | ... | ... | ... | ... | ... | ... |
-
-
-This DataFrame has 80 rows, one for each entity, and (4+3+17) columns, one for
-each of the 7 constitutive properties and the 17 target properties of
-`finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0.`
-
-We can fill all the entity space columns for all the rows as we know the full
-space. No measurements have taken place so all the measurement values are
-unknown
-
-### A Discovery Space defines how to fill all the data in the DataFrame
-
-In the above example the columns associated with the measurement space have no
-data. However, the Discovery Space specifies exactly how to obtain this data, as
-it defines the actual experiments, supplied by actuators, that you can execute
-to get it.
+## Sampling and Measurement
-Using the Discovery Space at any point we can choose a row (entity) with no
-measurement and get the measurements
+Data is added to a Discovery Space by running an **operation** on it, for
+example a Random Walk or a Bayesian optimisation. The operation selects
+Entities from the Entity Space, applies the Experiments in the Measurement
+Space to them and stores the results in the Sample Store. Operations are
+described in the [resources documentation](../resources/operation.md).
-### A Discovery Space populates the schema from a shared external source
+An Entity and its measurements only become **associated with a Discovery Space**
+when an operation on that space has sampled them. Even if the underlying Sample
+Store already contains compatible measurements from another Discovery Space,
+those results are not automatically attributed to this one — attribution requires
+an explicit operation. This prevents uncontrolled inheritance of data from other
+spaces.
-A Discovery Space is a view rather than a container.
+At any point in time a Discovery Space therefore has:
-This means when you generate a DataFrame from a Discovery Space the data in the
-rows is fetched from a shared-source. If someone else measured an entity that
-corresponds to one of the rows in your DataFrame it will be automatically
-populated.
+- Entities that have been sampled and successfully measured
+- Entities that have been sampled but whose measurements failed
+- Entities that have not yet been sampled
-As operations are run on a Discovery Space the rows in the table become filled
-in. You can choose to look at:
+> [!NOTE]
+> You can still query compatible data across spaces when needed
+> — see [Shared Sample Stores](data-sharing.md).
-1. Rows filled in by operations on this space (Entities sampled and measured via
- this Discovery Space)
-2. Rows filled in by operations on other spaces (Entities sampled and
- measured via any Discovery Space using same Sample Store)
-3. Rows not filled in at all (Unmeasured entities)
+## Discovery Space vs DataFrame
-### Summary
+For users familiar with `pandas`, the table below summarises how a Discovery
+Space relates to a DataFrame. The key difference is that a Discovery Space
+*knows* its schema and how to fill it, and shares data from a common source
+rather than holding a private copy.
-| Method | Column Definition | Defines how to acquire missing data? | Data Sharing |
-| --------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------ | ------------------------------------------------------------- |
-| DataFrame | Ad-Hoc. The data-frame creator defines the columns when it is created. The meaning of the columns must be communicated separately, | Not defined. The DataFrame just holds data | Not possible. A DataFrame is a static object |
-| Discovery Space | Defined by the discovery space. A set of Entity Space columns and Measurement Space columns. | Yes ,defined by the MeasurementSpace | Yes, values are loaded from a distributed shared db on demand |
+| | DataFrame | Discovery Space |
+| --- | --- | --- |
+| Column definition | Ad-hoc — defined when created; meaning communicated separately | Defined by the Discovery Space: Entity Space dimensions + Measurement Space target properties |
+| How to fill missing data | Not defined — a DataFrame just holds data | Defined by the Measurement Space: run the Experiments |
+| Data sharing | Not possible — a DataFrame is a static, private object | Yes — values are fetched from a shared Sample Store on demand |
-
\ No newline at end of file
+
diff --git a/website/docs/core-concepts/entity-spaces.md b/website/docs/core-concepts/entity-spaces.md
index d43eb6be6..ca439dd88 100644
--- a/website/docs/core-concepts/entity-spaces.md
+++ b/website/docs/core-concepts/entity-spaces.md
@@ -1,27 +1,27 @@
## Entities
-Entities represent things that can be measured. Examples are molecules or points
-in an application configuration space.
+Entities represent the things you want to measure — for example, a molecule,
+a fine-tuning deployment configuration, or a robotic experiment setup.
-Entities all have a set of constitutive properties which define them. A
-molecule's constitutive properties might be a SMILES or INCHI string. The
-constitutive properties of a fine-tuning deployment configuration might be GPU
-model, number of GPUs and batch size.
+Every Entity is described by a set of
+[**constitutive properties**](properties-and-domains.md#property-types) whose
+values uniquely identify it. For a fine-tuning deployment configuration these
+might be GPU model, number of GPUs, and batch size. For a molecule, it might
+be a SMILES string.
-An entity will also have observed properties. These are properties measured by
-an experiment (or experiment protocol). For example, a molecule might have an
-an observed property for its `band-gap` while a fine-tuning deployment
-configuration might have an an observed property related to `tokens throughput`.
+Once an Experiment has been run on an Entity, it also gains
+[**observed properties**](actuators.md#target-and-observed-properties) — the
+measured outputs produced by that Experiment.
-### Example: FM Fine-tuning Deployment Configuration
+### Example
-Here is an example of an entity that represents a FM fine-tuning deployment
+Here is an Entity representing a fine-tuning deployment configuration:
```terminaloutput
Identifier: dataset_id.news-tokens-16384plus-entries-4096-model_name.llama3-8b-number_gpus.4.0-model_max_length.2048.0-torch_dtype.bfloat16-batch_size.16.0-gpu_model.NVIDIA-A100-80GB-PCIe
-Generator: explicit_grid_sample_generator
Constitutive properties:
name value
@@ -32,63 +32,39 @@ Constitutive properties:
4 torch_dtype bfloat16
5 batch_size 16.0
6 gpu_model NVIDIA-A100-80GB-PCIe
-
-Observed properties:
- name experiment target-property values
- 0 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... gpu_compute_utilization_min [98.14772727272727]
- 1 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... gpu_compute_utilization_avg [98.26988636363636]
- 2 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... gpu_compute_utilization_max [98.38636363636364]
- 3 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... gpu_memory_utilization_min [33.709723284090906]
- 4 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... gpu_memory_utilization_avg [33.709723284090906]
- 5 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... gpu_memory_utilization_max [33.709723284090906]
- 6 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... gpu_memory_utilization_peak [34.065475]
- 7 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... cpu_compute_utilization [98.94999999999999]
- 8 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... cpu_memory_utilization [6.3182326931818205]
- 9 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... train_runtime [887.5672]
- 10 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... train_samples_per_second [4.615]
- 11 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... train_steps_per_second [0.072]
- 12 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... train_tokens_per_second [9451.236]
- 13 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... train_tokens_per_gpu_per_second [2362.809]
- 14 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... model_load_time [-1.0]
- 15 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... dataset_tokens_per_second [9451.237044361262]
- 16 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... dataset_tokens_per_second_per_gpu [2362.8092610903154]
- 17 finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0-... SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-defa... is_valid [1.0]
-
-Associated experiments:
-
- SFTTrainer.finetune-lora-fsdp-r-4-a-16-tm-default-v1.2.0
```
-For more information about the meaning of `observed properties` see
-[target & observed properties](actuators.md#target-and-observed-properties)
-
-## Entity Spaces
+The identifier is derived from the constitutive property values — two Entities
+with the same values are the same Entity. Once Experiments have been run on
+this Entity, observed properties (measured values such as
+`train_tokens_per_second`) will also appear. See
+[Target and Observed Properties](actuators.md#target-and-observed-properties)
+for more.
-An Entity Space describes a set of entities. The set could be discrete or
-continuous, bounded or unbounded. In `ado` you normally define Entity Spaces and
-then sample Entities from them.
+> [!IMPORTANT] Measuring Entities with Experiments
+>
+> In order for an [Experiment](actuators.md#experiments) to measure an Entity,
+> the Entity's constitutive property values must fall within the input domains
+> declared by the Experiment.
-### Example: Molecules
+## Entity Spaces
-This space has a single dimension with type identifier. This is a property whose
-values are a potentially very large set of unique-ids generated in some fashion.
+An individual Entity is a single point. An **Entity Space** defines the full
+set of Entities you want to explore — all the points you could potentially
+measure.
-```commandline
- Space with non-discrete dimensions. Cannot count entities
- Identifier properties:
- name
- 0 smiles
-```
+An Entity Space is a set of constitutive properties, each with a **Property
+Domain** that constrains the values it can take. Each property is a dimension
+of the space, and every combination of values across all dimensions is an
+Entity in the space. That is, the Entity Space is the Cartesian product of the
+dimensions.
### Example: Fine-tuning Deployment Configuration
-This space has 7 dimensions, 4 categorical and 3 discrete. Each of the 4
-categorical dimensions has only a single value. The discrete dimensions each
-have a range of values they can take.
-
+
```commandline
- Number entities: 80
+Number entities: 80
Categorical properties:
name values
0 dataset_id [news-tokens-16384plus-entries-4096]
@@ -102,19 +78,17 @@ have a range of values they can take.
1 model_max_length [512, 8193] None [512, 1024, 2048, 4096, 8192]
2 batch_size [1, 129] None [1, 2, 4, 8, 16, 32, 64, 128]
```
+
-### Property Domains
-
-Each property in an entity space can be associated with a domain. The domain is
-the range of values the property can take and also the probability of those
-values. In the `Fine-tuning Deployment Configuration` example we can see the
-domains for each property. The categorical properties have a set of values and
-the discrete properties a range and also a set of values.
+This space has 7 dimensions: 4 categorical (each fixed to a single value) and
+3 discrete. The total number of Entities is the product of the number of values
+in each dimension:
-In the `Molecules` example we see there is no domain, which means any value of
-`smiles` is allowed. When there is no domain it also means the Entity Space
-alone does not contain sufficient information by itself on how to sample the
-entities.
+```text
+1 × 1 × 1 × 1 × 2 × 5 × 8 = 80 Entities
+```
-By default, the probability is uniform, every value is equally likely, but it
-could also be more complex.
+Each Property Domain constrains one dimension. The categorical properties list
+their allowed values explicitly; the discrete properties specify a range and a
+set of values within it. For the full list of domain types see
+[Properties and Domains](properties-and-domains.md).
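The count above can be reproduced with a short sketch that enumerates the Cartesian product directly. It is illustrative only: three of the single-valued categorical dimensions are left out because they contribute a factor of 1, and the two `number_gpus` values are an assumption chosen to match the factor of 2 in the product.

```python
from itertools import product
from math import prod

# One list of allowed values per dimension.
dimensions = {
    "dataset_id": ["news-tokens-16384plus-entries-4096"],
    "number_gpus": [2, 4],  # assumed values; only the count (2) is known
    "model_max_length": [512, 1024, 2048, 4096, 8192],
    "batch_size": [1, 2, 4, 8, 16, 32, 64, 128],
}

# The size of the space is the product of the dimension sizes.
count = prod(len(values) for values in dimensions.values())

# Each Entity is one combination of values, one per dimension.
entities = [
    dict(zip(dimensions, combination))
    for combination in product(*dimensions.values())
]

print(count, len(entities))  # 80 80
```

Adding one more value to any dimension multiplies the size of the space, which is why narrowing the Property Domains is the main way to keep an exploration tractable.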
diff --git a/website/docs/core-concepts/properties-and-domains.md b/website/docs/core-concepts/properties-and-domains.md
new file mode 100644
index 000000000..24aa2f33d
--- /dev/null
+++ b/website/docs/core-concepts/properties-and-domains.md
@@ -0,0 +1,250 @@
+# Properties and Domains
+
+Properties and Property Domains are what `ado` uses to describe the
+inputs and outputs of [Experiments](actuators.md),
+and the dimensions of [Entity Spaces](entity-spaces.md).
+
+## Property
+
+A **Property** is a named concept — a string identifier such as:
+
+* _gpu-model_
+* _batch-size_
+* _node-selection-method_
+* _solve-time_
+
+A Property may optionally carry metadata (a description) that explains what the
+identifier represents.
+
+Some Properties are also associated with a **Property Domain** that specifies
+the set of values the Property is allowed to take:
+
+* gpu-model → one of {A100, H100, MI300}
+* batch-size → any integer between 1 and 1024
+* node-selection-method → one of {round-robin, random, greedy}
+* solve-time → a positive floating‑point number
+
+## Property Types
+
+In `ado` there are three roles a Property can play:
+
+* **Constitutive properties** — the inputs to Experiments, and the dimensions
+ of an Entity Space. They describe inherent or assumed characteristics of the
+ Entity — the "givens". Constitutive properties usually have a Property Domain.
+* **Target properties** — the properties an Experiment _intends_ to measure,
+ e.g. `train_tokens_per_second`.
+* **Observed properties** — the values actually recorded by a specific
+ Experiment. Because many Experiments may target the same property, each
+ observed property is namespaced to the Experiment that produced it
+ (e.g. `finetune_lora-train_tokens_per_second`). See
+ [Target and Observed Properties](actuators.md#target-and-observed-properties).
+
+> [!NOTE]
+>
+> In `ado`, usually only constitutive properties have Property Domains.
+
+## Property Domain Types
+
+`ado` supports the following Property Domain types. Each is written under a
+`domain:` key in `ado` YAML.
+
+> [!NOTE]
+>
+> The different domain types are distinguished by a **Variable Type** field
+> (`variableType`). In many cases this can be omitted and `ado` will infer it
+> automatically — see [Auto-inference](#auto-inference-of-property-domain-types).
+
+### Categorical
+
+A finite, named set of values. Typically strings, though numeric values are
+also allowed.
+
+Used when the property can take one of a fixed list of labels.
+
+```yaml
+domain:
+ values: [granite-3-8b, llama3-8b, mistral-7b-v0.1]
+```
+
+### Discrete
+
+A countable set of numeric values, specified as an explicit list, as a range
+with a step interval, or as an interval alone. The list and range forms are
+alternative ways to describe a finite set of values.
+
+Used when the property takes a countable set of numbers.
+
+**Explicit list:**
+
+```yaml
+domain:
+ values: [1, 2, 4, 8, 16, 32, 64, 128]
+```
+
+**Range with interval** (lower inclusive, upper exclusive):
+
+```yaml
+domain:
+ domainRange: [1, 129]
+ interval: 1
+```
+
+**Interval only** (unbounded discrete — any multiple of the interval):
+
+```yaml
+domain:
+ interval: 1
+```
+
+### Continuous
+
+A continuous numeric domain. Use for real-valued properties.
+
+**Bounded range** — any real value within the bounds is valid:
+
+```yaml
+domain:
+ domainRange: [0, 100]
+```
+
+**Unbounded** — any real number:
+
+```yaml
+domain:
+ variableType: CONTINUOUS_VARIABLE_TYPE
+```
+
+### Binary
+
+Exactly two values: `true` and `false`.
+
+```yaml
+domain:
+ variableType: BINARY_VARIABLE_TYPE
+```
+
+### Open Categorical
+
+Categorical values where the complete set of categories is not known in advance.
+`variableType` must be set explicitly. An optional `values` field can seed a
+known subset of categories.
+
+Used for properties where new categories can appear at runtime, for example a
+molecule identifier or an AI model name.
+
+```yaml
+domain:
+ variableType: OPEN_CATEGORICAL_VARIABLE_TYPE
+```
+
+## Auto-inference of Property Domain Types
+
+When `variableType` is omitted, `ado` infers it from the other fields:
+
+| Fields present | Inferred type |
+| --- | --- |
+| `values` with all numeric entries | `DISCRETE_VARIABLE_TYPE` |
+| `values` with any non-numeric entry | `CATEGORICAL_VARIABLE_TYPE` |
+| `domainRange` only (no `interval`) | `CONTINUOUS_VARIABLE_TYPE` |
+| `domainRange` + `interval` | `DISCRETE_VARIABLE_TYPE` |
+| `interval` only (no `domainRange`) | `DISCRETE_VARIABLE_TYPE` |
+
+`BINARY_VARIABLE_TYPE` and `OPEN_CATEGORICAL_VARIABLE_TYPE` cannot be inferred
+and must always be declared explicitly.
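The inference rules in the table can be expressed as a small function. This is a sketch of the documented rules, not `ado`'s code: `infer_variable_type` is a hypothetical name, and both the handling of booleans and the error raised for an empty domain are assumptions.

```python
from numbers import Number
from typing import Any


def infer_variable_type(domain: dict[str, Any]) -> str:
    """Infer a variable type from the fields present in a domain."""
    if "variableType" in domain:
        # An explicit declaration always wins.
        return domain["variableType"]
    if "values" in domain:
        # Assumption: booleans are not treated as numeric entries.
        numeric = all(
            isinstance(v, Number) and not isinstance(v, bool)
            for v in domain["values"]
        )
        return "DISCRETE_VARIABLE_TYPE" if numeric else "CATEGORICAL_VARIABLE_TYPE"
    if "interval" in domain:
        # With or without a domainRange, an interval implies a discrete domain.
        return "DISCRETE_VARIABLE_TYPE"
    if "domainRange" in domain:
        return "CONTINUOUS_VARIABLE_TYPE"
    # Assumption: nothing to infer from, so the type must be declared.
    raise ValueError("cannot infer type: declare variableType explicitly")


print(infer_variable_type({"values": [1, 2, 4]}))        # DISCRETE_VARIABLE_TYPE
print(infer_variable_type({"values": ["a", 1]}))         # CATEGORICAL_VARIABLE_TYPE
print(infer_variable_type({"domainRange": [0, 100]}))    # CONTINUOUS_VARIABLE_TYPE
```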
+
+## Probability Functions
+
+Each domain can optionally specify a probability function that controls how
+values are sampled. The default is **uniform** — every value in the domain is
+equally likely.
+
+```yaml
+domain:
+ values: [1, 2, 4, 8, 16]
+ probabilityFunction:
+ identifier: uniform
+```
+
+A **normal** distribution is also available for continuous and discrete domains:
+
+```yaml
+domain:
+ domainRange: [0.0, 1.0]
+ probabilityFunction:
+ identifier: normal
+ parameters:
+ mean: 0.5
+ std: 0.1
+```
+
+When no `probabilityFunction` is specified, uniform sampling is used.
+
+## Property Subdomains
+
+Domain A is a **subdomain** of domain B if every value in A is also a valid
+value in B. A subdomain represents a narrowed or more specific version of a
+parent domain.
+
+The most common place this matters in `ado` is when defining an
+[Entity Space](entity-spaces.md): the domain you assign to each Entity Space
+dimension must be a subdomain of the corresponding Experiment input domain.
+This ensures that all Entities in the space are valid inputs to the Experiment.
+
+### Compatible Subdomain Types
+
+Not every combination of domain types is valid — the subdomain type must be
+compatible with the parent type:
+
+
+| Parent domain | Compatible subdomain types | Notes |
+| --- | --- | --- |
+| `CONTINUOUS` | `CONTINUOUS`, `DISCRETE` (finite), `BINARY` | Sub-range must lie within the parent range; `BINARY` requires 0 and 1 to be within the range |
+| `DISCRETE` | `DISCRETE`, `BINARY` | Sub-values must be a subset of the parent values; `BINARY` only valid if both 0 and 1 appear in the parent |
+| `CATEGORICAL` | `CATEGORICAL`, `DISCRETE` (finite), `BINARY` | Sub-values must be a subset of the parent values |
+| `BINARY` | `BINARY`, `DISCRETE` (≤2 values) | Values must be a subset of `{0, 1}` / `{false, true}` |
+| `OPEN_CATEGORICAL` | `OPEN_CATEGORICAL`, `CATEGORICAL`, `DISCRETE` (finite), `BINARY` | The most permissive categorical parent |
+
+
+### Example
+
+Suppose an experiment declares the following required input domains:
+
+```yaml
+# Experiment input domains (the maximum possible extent)
+model_name:
+ values: [granite-3-8b, llama3-8b, mistral-7b-v0.1, granite-34b-code-base]
+
+batch_size:
+ domainRange: [1, 4097]
+ interval: 1
+
+temperature:
+ domainRange: [0.0, 100.0]
+```
+
+A valid entity space could narrow each of these to a focused subdomain:
+
+```yaml
+# Entity space domains (subdomains of the experiment inputs above)
+model_name:
+ values: [granite-3-8b, llama3-8b] # CATEGORICAL ⊆ CATEGORICAL ✓
+
+batch_size:
+ values: [1, 2, 4, 8, 16] # DISCRETE ⊆ DISCRETE ✓
+
+temperature:
+ domainRange: [20.0, 40.0] # CONTINUOUS ⊆ CONTINUOUS ✓
+```
+
+The following entity space domains would be **invalid** because they are not
+subdomains of the corresponding experiment inputs:
+
+```yaml
+batch_size:
+ # Values above 4096 are not in the Experiment input domain for batch_size
+ domainRange: [4096, 8124]
+ interval: 1028
+
+model_name:
+ # granite-4-3b is not one of the allowed values
+ values: [granite-4-3b]
+```
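The checks in the examples above can be approximated with a short sketch. It covers only the finite cases (explicit `values`, or an integer `domainRange` with an `interval`); continuous, binary and open categorical domains are not handled. The function names are hypothetical and this is not how `ado` itself validates domains.

```python
def discrete_values(domain: dict) -> list:
    """Expand a finite domain into its values (upper bound exclusive)."""
    if "values" in domain:
        return list(domain["values"])
    lower, upper = domain["domainRange"]
    return list(range(lower, upper, domain["interval"]))


def is_subdomain(child: dict, parent: dict) -> bool:
    """Finite cases only: every child value must be valid in the parent."""
    child_values = discrete_values(child)
    if "values" in parent:
        return all(v in parent["values"] for v in child_values)
    lower, upper = parent["domainRange"]
    step = parent["interval"]
    return all(
        lower <= v < upper and (v - lower) % step == 0 for v in child_values
    )


batch_size_input = {"domainRange": [1, 4097], "interval": 1}
model_name_input = {
    "values": ["granite-3-8b", "llama3-8b", "mistral-7b-v0.1", "granite-34b-code-base"]
}

print(is_subdomain({"values": [1, 2, 4, 8, 16]}, batch_size_input))  # True
print(is_subdomain({"domainRange": [4096, 8124], "interval": 1028}, batch_size_input))  # False
print(is_subdomain({"values": ["granite-4-3b"]}, model_name_input))  # False
```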
diff --git a/website/docs/resources/discovery-spaces.md b/website/docs/resources/discovery-spaces.md
index cb5d0b4cc..43de3d7c7 100644
--- a/website/docs/resources/discovery-spaces.md
+++ b/website/docs/resources/discovery-spaces.md
@@ -802,7 +802,7 @@ explains how to use optional properties.
## Parameterizing Experiments
If an experiment has
-[optional properties](../core-concepts/actuators.md#optional-properties) you can
+[optional input properties](../core-concepts/actuators.md#optional-inputs) you can
define equivalent properties in the entity space. If you don't, the default
value for the property will be used.
diff --git a/website/mkdocs.yml b/website/mkdocs.yml
index c868823c8..bcd2157b0 100644
--- a/website/mkdocs.yml
+++ b/website/mkdocs.yml
@@ -157,8 +157,9 @@ nav:
- Efficiently Exploring Parameter Spaces with TRIM: examples/trim.md
- Core Concepts:
- core-concepts/concepts.md
+ - Properties and Domains: core-concepts/properties-and-domains.md
+ - Experiments & Actuators: core-concepts/actuators.md
- Entities and Entity Spaces: core-concepts/entity-spaces.md
- - Actuators, Experiments & Measurement Spaces: core-concepts/actuators.md
- Discovery Spaces: core-concepts/discovery-spaces.md
- Shared Sample Stores: core-concepts/data-sharing.md
- Resources: