diff --git a/.secrets.baseline b/.secrets.baseline index eca42867..bf0bc6aa 100644 --- a/.secrets.baseline +++ b/.secrets.baseline @@ -3,7 +3,7 @@ "files": "requirements.txt|^.secrets.baseline$", "lines": null }, - "generated_at": "2025-11-05T16:16:55Z", + "generated_at": "2025-11-10T08:32:10Z", "plugins_used": [ { "name": "AWSKeyDetector" @@ -414,7 +414,7 @@ } ] }, - "version": "0.13.1+ibm.64.dss", + "version": "0.13.1+ibm.62.dss", "word_list": { "file": null, "hash": null diff --git a/backend/kuberay/README.md b/backend/kuberay/README.md index 7e4e462c..e0a2f9b4 100644 --- a/backend/kuberay/README.md +++ b/backend/kuberay/README.md @@ -18,21 +18,26 @@ the ## Deploying a RayCluster -> [!WARNING] +> [!WARNING] Ray version compatibility > -> The `ray` versions must be compatible. For a more in depth guide refer to the -> [RayCluster configuration](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html) +> The `ray` version set in KubeRay YAML and the one +> used in the ray head and worker containers must be compatible. +> For a more in depth guide refer to the [RayCluster configuration](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html) > page. -!!! note +We provide [an example set of values](vanilla-ray.yaml) for deploying a +RayCluster via KubeRay. To deploy it run: + +``` commandline +helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 --values backend/kuberay/vanilla-ray.yaml +``` - When running multi-node measurement make sure that - all nodes in your multi-node setup have read and write access - to your HuggingFace home directory. On Kubernetes with RayCluster, - avoid S3-like filesystems as that is known to cause failures - in **transformers**. Use a NFS or GPFS-backed PersistentVolumeClaim instead. +Feel free to customize the example file provided to suit your cluster, +such as uncommenting GPU-enabled workers. 
-### Configuring a Kubernetes ServiceAccount for the RayCluster +### Enabling ado actuators to create K8s resources + +#### Configuring a ServiceAccount for the RayCluster The default Kubernetes ServiceAccount created for a RayCluster does not have enough permissions for an ado actuator to create Kubernetes resources @@ -46,46 +51,14 @@ It also provides access to the RayCluster resources. ```yaml -apiVersion: v1 -kind: ServiceAccount -metadata: - name: ray-deployer ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: RoleBinding -metadata: - name: ray-deployer -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: Role - name: ray-deployer -subjects: - - kind: ServiceAccount - name: ray-deployer ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: Role -metadata: - name: ray-deployer -rules: - - apiGroups: ["ray.io"] - resources: - - rayclusters - verbs: ["get", "patch"] - - apiGroups: ["apps"] - resources: - - pods - - deployments - verbs: ["get", "create", "delete", "list", "watch", "update"] - - apiGroups: [""] - resources: - - services - verbs: ["get", "create", "delete", "list", "watch", "update"] +{% include "./service-account.yaml" %} ``` From the root of the ado project run the below command: - kubectl apply -f backend/kuberay/service-account.yaml +```commandline +kubectl apply -f backend/kuberay/service-account.yaml +``` This will create a ServiceAccount named `ray-deployer`. We will reference this name later when @@ -94,6 +67,19 @@ We will reference this name later when More information about ServiceAccount, Role, and RoleBinding objects can be found in the [official Kubernetes RBAC documentation](https://kubernetes.io/docs/reference/access-authn-authz/rbac/). +#### Associating a RayCluster with the ServiceAccount + +The below command shows how to set the `serviceAccountName` property for head +and worker nodes. 
+ + +```bash +helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 \ + --values backend/kuberay/vanilla-ray-service-account.yaml \ + --set head.serviceAccountName=ray-deployer \ + --set worker.serviceAccountName=ray-deployer +``` + ### Best Practices for Efficient GPU Resource Utilization To maximize the efficiency of your RayCluster and minimize GPU resource @@ -124,12 +110,13 @@ Recommended worker setup: - 4 replicas of a worker with **8 GPUs** +
Example: The contents of the additionalWorkerGroups field of a RayCluster with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memory - + ```yaml one-A100-80G-gpu-WG: replicas: 0 @@ -288,34 +275,24 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memor # volumes: ... # volumeMounts: .... ``` - +
-!!! note - - Notice that the only variant with a **full-worker** custom resource - is the one with 8 GPUs. Some actuators, like SFTTrainer, use this - custom resource for measurements that involve reserving an entire GPU node. - -We provide [an example set of values](vanilla-ray.yaml) for deploying a -RayCluster via KubeRay. To deploy it, simply run: - - helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 --values backend/kuberay/vanilla-ray.yaml - -In the case the ado operation to be executed requires creating Kubernetes -resources, the RayCluster to be deployed must be associated with a properly -configured ServiceAccount like the one described [above](#configuring-a-kubernetes-serviceaccount-for-the-raycluster). -The below command shows how to set the `serviceAccountName` property for head -and worker nodes. +> [!IMPORTANT] full-worker custom resource +> +> Notice that the only variant with a **full-worker** custom resource +> is the one with 8 GPUs. Some actuators, like SFTTrainer, use this +> custom resource for measurements that involve reserving an entire GPU node. - -```bash -helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 \ - --values backend/kuberay/vanilla-ray-service-account.yaml \ - --set head.serviceAccountName=ray-deployer \ - --set worker.serviceAccountName=ray-deployer -``` +### RayClusters and SFTTrainer -Feel free to customize the example file provided to suit your cluster, -such as uncommenting GPU-enabled workers. +> [!IMPORTANT] HuggingFace home directory +> +> If you want to run multi-node measurements with +> the SFTTrainer actuator make sure that +> all nodes in your multi-node setup have read and write access +> to your HuggingFace home directory. On Kubernetes with RayClusters, +> avoid S3-like filesystems as that is known to cause failures +> in **transformers**. +> Use a NFS or GPFS-backed PersistentVolumeClaim instead. 
diff --git a/backend/kuberay/service-account.yaml b/backend/kuberay/service-account.yaml index 3da627ee..cb121a51 100644 --- a/backend/kuberay/service-account.yaml +++ b/backend/kuberay/service-account.yaml @@ -36,4 +36,5 @@ rules: - apiGroups: [""] resources: - services + - persistentvolumeclaims verbs: ["get", "create", "delete", "list", "watch", "update"] \ No newline at end of file diff --git a/plugins/actuators/vllm_performance/yamls/vllm_actuator_configuration.yaml b/plugins/actuators/vllm_performance/yamls/vllm_actuator_configuration.yaml new file mode 100644 index 00000000..f601151a --- /dev/null +++ b/plugins/actuators/vllm_performance/yamls/vllm_actuator_configuration.yaml @@ -0,0 +1,16 @@ +# Copyright (c) IBM Corporation +# SPDX-License-Identifier: MIT +actuatorIdentifier: vllm_performance +metadata: + name: "Test actuator deployment" +parameters: + benchmark_retries: 3 + hf_token: 'test' # Set if you need to access a gated model + image_secret: '' + in_cluster: false + interpreter: python3 + max_environments: 1 + namespace: null # Must set to the namespace to create deployments + node_selector: {} + retries_timeout: 5 + verify_ssl: false diff --git a/plugins/actuators/vllm_performance/yamls/discoveryspace_override_defaults.yaml b/plugins/actuators/vllm_performance/yamls/vllm_deployment_space.yaml similarity index 60% rename from plugins/actuators/vllm_performance/yamls/discoveryspace_override_defaults.yaml rename to plugins/actuators/vllm_performance/yamls/vllm_deployment_space.yaml index 898f0947..c382f2f4 100644 --- a/plugins/actuators/vllm_performance/yamls/discoveryspace_override_defaults.yaml +++ b/plugins/actuators/vllm_performance/yamls/vllm_deployment_space.yaml @@ -1,7 +1,5 @@ # Copyright (c) IBM Corporation # SPDX-License-Identifier: MIT - -sampleStoreIdentifier: 2963a5 entitySpace: - identifier: model propertyDomain: @@ -11,36 +9,26 @@ entitySpace: propertyDomain: values: - quay.io/dataprep1/data-prep-kit/vllm_image:0.1 - - identifier: n_cpus 
- propertyDomain: - values: [8] - - identifier: memory - propertyDomain: - values: ["128Gi"] - - identifier: dtype + - identifier: "number_input_tokens" propertyDomain: - values: ["auto"] - - identifier: "num_prompts" - propertyDomain: - values: [500] + values: [1024, 2048, 4096] - identifier: "request_rate" propertyDomain: - values: [-1] - - identifier: "max_concurrency" - propertyDomain: - values: [-1] - - identifier: "gpu_memory_utilization" + domainRange: [1,10] + interval: 1 + - identifier: n_cpus propertyDomain: - values: [.9] - - identifier: "cpu_offload" + domainRange: [2,16] + interval: 2 + - identifier: memory propertyDomain: - values: [0] + values: ["128Gi", "256Gi"] - identifier: "max_batch_tokens" propertyDomain: - values: [16384] + values: [1024, 2048, 4096, 8192, 16384, 32768] - identifier: "max_num_seq" propertyDomain: - values: [256] + values: [16,32,64] - identifier: "n_gpus" propertyDomain: values: [1] @@ -51,4 +39,5 @@ experiments: - actuatorIdentifier: vllm_performance experimentIdentifier: performance-testing-full metadata: - description: Parameters for VLLM performance testing + description: A space of vllm deployment configurations + name: vllm_deployments diff --git a/website/docs/actuators/vllm_performance.md b/website/docs/actuators/vllm_performance.md new file mode 100644 index 00000000..29060ee9 --- /dev/null +++ b/website/docs/actuators/vllm_performance.md @@ -0,0 +1,393 @@ +# The `vllm_performance` actuator + + + +> [!TIP] Overview +> The `vllm_performance` actuator **can +> automatically create and benchmark [vLLM](https://github.com/vllm-project/vllm) inference deployments on Kubernetes and OpenShift clusters**. +> +> It is designed for robust, repeatable, and configurable experiment execution. +> It is suitable for both simple one-off benchmarks and large parameter sweeps. 
## Key Capabilities

- **Automated LLM benchmarking:** Deploys vLLM serving endpoints
on NVIDIA GPU-enabled OpenShift/Kubernetes clusters and runs
standardized serving benchmarks.
- **Cluster integration:** Handles deployment and clean-up of vLLM inference
pods on OpenShift/Kubernetes, with configurable resource selection via namespace,
node selector, and PVC/service templates.
- **Scenario configurability:** Supports customizing models, NVIDIA GPU types,
node selection, retry behavior, concurrent deployments, and more.
- **Efficient sampling:** Supports grouped sampling, which maximises reuse
of vLLM deployments and hence minimises time spent creating them.
- **Endpoint benchmarking:** Can also be used to benchmark existing
OpenAI-compatible endpoints.

### Available experiments

The `vllm_performance` actuator implements two experiments:

- `performance-testing-full`: This experiment can test the full vLLM workload configuration,
including resource requests and server deployment configuration. It deploys
servers with the given configuration on Kubernetes and runs `vllm bench serve` on them
with the given parameters.
- `performance-testing-endpoint`: This experiment is equivalent to running
`vllm bench serve` against an endpoint.

---

## Running single experiments: Quick endpoint and deployment tests

For rapid testing and debugging, you can use the [`run_experiment`](run_experiment.md)
tool to execute individual experiments on a single point (entity).
This is ideal when you want to:

- Quickly check if your actuator installation and configuration works
- Debug a deployment scenario or endpoint using the vllm_performance actuator

### Running an endpoint test

To test the throughput or limits of an existing vLLM-compatible endpoint, create
a `point.yaml` file like this:

```yaml
entity:
  model: openai/gpt-oss-20b
  endpoint: http://localhost:8000
  request_rate: 50
experiments:
- actuatorIdentifier: vllm_performance
  experimentIdentifier: performance-testing-endpoint
```

Then run:

```shell
run_experiment point.yaml
```

This will assess how many requests per second the endpoint can handle for the given
model and configuration.

> [!TIP] Inference endpoint testing example
>
> See [the detailed endpoint scenario](../examples/vllm-performance-endpoint.md)
> for a production-style workflow exploring inference endpoint throughput.

### Running a deployment test

To launch and benchmark a temporary vLLM deployment
(including provisioning on Kubernetes/OpenShift), you must provide both:

- An entity definition (as before)
- The identifier of a valid `actuatorconfiguration` resource
  - This contains information necessary for accessing and creating
    deployments on the Kubernetes/OpenShift cluster
  - See [configuring the vllm_performance actuator](#configuring-the-vllm_performance-actuator)
    for details.

Example `point.yaml`:

```yaml
entity:
  model: ibm-granite/granite-3.3-8b-instruct
  n_cpus: 8
  memory: 128Gi
  gpu_type: NVIDIA-A100-80GB-PCIe
  max_batch_tokens: 8192
  max_num_seq: 32
  n_gpus: 1
experiments:
- actuatorIdentifier: vllm_performance
  experimentIdentifier: performance-testing-full
```

Then run:

```shell
run_experiment point.yaml --actuator-config-id my-vllm-performance-config
```

Here `my-vllm-performance-config` is the ID of an `actuatorconfiguration` resource
containing the details for accessing and running on your target cluster.
See [configuring the vllm_performance actuator](#configuring-the-vllm_performance-actuator)
for more.

This command will provision the deployment for the specified entity, using your indicated
actuator configuration, run the benchmark, and print results.

> [!TIP] vLLM deployment example
>
> See [the vLLM deployment exploration example](../examples/vllm-performance-full.md)
> for details on how to explore many deployment configurations.

---

## Configuring the vllm_performance actuator

You can configure how the `vllm_performance` actuator creates,
manages, and monitors vLLM deployments on a Kubernetes/OpenShift
cluster.
This configuration covers several needs:

- **Cluster targeting and permissions**: Specify the OpenShift/Kubernetes namespace
and optionally node selectors, secrets, and templates to match your cluster resources.
- **Secure access**: Pass required HuggingFace tokens, set up image pull secrets,
control in-cluster or remote execution, and toggle SSL verification.
- **Experiment protocol and retries**: Choose how benchmarks are run, including the interpreter,
retry logic, and the YAML templates used for deployments/services.
- **Deployment resource management**: Limit the number of concurrent deployments
and control automated clean-up.

You supply this configuration information as an `ado`
[`actuatorconfiguration` resource](../resources/actuatorconfig.md),
which is a YAML file with the configuration options.
+An example is: + + +```yaml +actuatorIdentifier: vllm_performance #The actuator the configuration is for +metadata: + description: "Actuator config for vLLM LLM benchmarking" + name: demo-vllm-perf +parameters: + benchmark_retries: 3 # Number of benchmark attempts (see Failure Handling) + hf_token: "" # Required for pulling some models + image_secret: "" # Optional image pull secret + in_cluster: false # Set to true if running from within the cluster + interpreter: python3 # Language for test drivers/benchmarks + max_environments: 1 # Max concurrent vLLM deployments + namespace: "mynamespace" # OpenShift/K8s namespace to deploy into + node_selector: # A dictionary of Kubernetes node_selector key:value pairs + "kubernetes.io/hostname":"gpunode01" + pvc_name: null # Name of existing PVC to use. If null/omitted a temporary PVC is created + retries_timeout: 5 # Seconds between retries (exponential backoff) + verify_ssl: false # Whether to verify HTTPS endpoints +``` + + +If the above YAML was saved to a file called `vllm_config.yaml` you would create +the configuration using + +```commandline +ado create actuatorconfiguration -f vllm_config.yaml +``` + +> [!WARNING] namespace +> +> The critical parameter you must set in the configuration is `namespace` + + +> [!WARNING] GPU type +> +> The GPU type to use in an experiment is set via the experiment itself (performance-testing-full). +> **Do not** set this via the `node_selector` parameter of the configuration. 
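As the GPU type warning above notes, the GPU type belongs in the experiment input rather than in the actuator configuration. For illustration, a sketch of an entity that selects the GPU type for `performance-testing-full` (the values mirror the deployment-test example earlier on this page):

```yaml
entity:
  model: ibm-granite/granite-3.3-8b-instruct
  gpu_type: NVIDIA-A100-80GB-PCIe  # GPU selection happens here, not via node_selector
  n_gpus: 1
  n_cpus: 8
  memory: 128Gi
experiments:
- actuatorIdentifier: vllm_performance
  experimentIdentifier: performance-testing-full
```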
+ + +> [!TIP] Further details +> +> For further details on specific options and advanced behavior see: +> +> - [Maximum number of deployments](#maximum-number-of-deployments) (details on `max_environments`) +> - [Handling benchmark failures](#handling-benchmark-failures) and [Deployment Clean-Up](#deployment-clean-up) +> - [Grouped sampling for efficient deployment usage](#grouped-sampling-for-efficient-deployment-usage) + +### Multiple configurations + +You can create multiple `actuatorconfiguration`s for the `vllm_performance` actuator. +Each configuration captures +the cluster-specific, security-sensitive, and experiment-relevant settings necessary +for the actuator to operate in a given environment. +Each configuration will have a different id and you can choose the one to use +when submitting an operation or single experiment that uses the `vllm_performance` +actuator. + +> [!TIP] Getting a default configuration +> +> You can generate a default configuration via the ado CLI: +> +> ```shell +> ado template actuatorconfiguration --actuator-identifier vllm_performance -o actuatorconfiguration.yaml +> ``` + +--- + +## vLLM deployment management + +### The `in_cluster` configuration option + +The `in_cluster` option in your `actuatorconfiguration` tells the `vllm_performance` +actuator how to communicate with the target Kubernetes or OpenShift cluster when +running `performance-testing-full`. + +If running `ado` from outside the Kubernetes/OpenShift cluster where +the deployments will be created, leave `in_cluster: false` (the default). + +Set `in_cluster: true` if your `ado` operation will be run on a +**remote Ray cluster that is in the same Kubernetes/OpenShift cluster** as your +vLLM deployments. +This configuration maximizes efficiency for large-scale, distributed benchmarking. +For a detailed guide on running `ado` remotely on a Ray cluster, including environment +and package setup, see [Running ado remotely](../getting-started/remote_run.md). 
> [!IMPORTANT] RayCluster permissions
>
> If running with `in_cluster: true`, your RayCluster **must** be configured so that
> jobs launched by `ado` have permissions to create and manage Kubernetes deployments,
> pods, and services.
> For configuring the necessary ServiceAccount, roles, and permissions,
> see our [documentation on deploying RayClusters for `ado`](../getting-started/installing-backend-services.md).

> [!TIP] Installing the `vllm_performance` actuator on a remote RayCluster
>
> If the `ado-vllm-performance` actuator is not installed in the
> image used by the RayCluster you can have [Ray install it by following
> this guide](../getting-started/remote_run.md).
>
> In particular, if a compatible version of vLLM is not installed
> in the image this step will require installing vLLM on each RayCluster node
> (so `vllm bench serve` is available).
> This can take some time, so you may see the `ado` `operation` output "hang"
> while this is happening.

### Maximum number of deployments

The actuator configuration parameter `max_environments` controls how
many concurrent vLLM deployments will be created. The default is 1.

When experiments are requested, if an existing deployment cannot
be used a new environment is created, as long as `max_environments` has
not been reached.
If it has been reached, the actuator waits for an existing
environment to become idle, at which point it is deleted and
the new environment is created.

Some notes:

- `max_environments` deployments are always created before any are deleted
  - This means idle environments will remain until there is a need to delete them
  - This increases the chances they can be reused and minimises the cost of redeploying
- Environment creation is serialized
  - If `max_environments` is reached and all are active, the first experiment
    that requires a new environment will block.
Subsequent experiment requests will queue behind it in FIFO order until it can
proceed (i.e. delete an existing environment and create the one it needs).

### Handling benchmark failures

Once a deployment is created, the actuator waits until the vLLM health endpoint
responds to requests (pod running, container ready), or until 20 minutes have
elapsed, and then runs `vllm bench serve` against it.
The 20-minute timeout ensures the wait cannot hang forever if something goes
wrong in Kubernetes and the health check never passes.

When running the benchmark the actuator will try up to `benchmark_retries` times,
backing off exponentially based on `retries_timeout`, to run the benchmark successfully.
Retries may be required because, for large models, 20 minutes is sometimes not
sufficient to download the model and load it for serving.
Since `vllm bench` itself waits 10 minutes for the endpoint to come up, with
`benchmark_retries=3` (the default) there is roughly a 50-minute to 1-hour window
for the endpoint to become available.

### PVCs

#### `pvc_name` not given

If no `pvc_name` is set in the `actuatorconfiguration`, when an actuator
instance is created with this configuration, e.g., via `create operation` or `run_experiment`,
it creates a PVC called `vllm-support-$UUID` that is shared by all deployments
it creates.
The `$UUID` is a randomly generated string that will vary each time the
actuator is created.
When the `operation` or `run_experiment` exits this PVC will be deleted.

#### `pvc_name` given

If a `pvc_name` is set in the `actuatorconfiguration`, when an actuator
instance is created with this configuration, e.g., via `create operation`
or `run_experiment`,
it will look for an existing PVC with the given name.
If the PVC exists it will be used for all deployments the actuator instance
creates.
When the `operation` or `run_experiment` exits this PVC will NOT be deleted.
If the PVC does not exist the actuator will exit with an error.
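The retry behavior described under "Handling benchmark failures" above can be sketched as follows. This is a simplified illustration of the documented `benchmark_retries`/`retries_timeout` semantics, not the actuator's actual implementation; `run_benchmark` is a hypothetical callable standing in for launching `vllm bench serve`:

```python
import time


def run_with_retries(run_benchmark, benchmark_retries=3, retries_timeout=5):
    """Retry a benchmark callable with exponential backoff.

    Sketch only: waits retries_timeout * 2**attempt seconds between
    attempts and re-raises the last failure once benchmark_retries
    attempts are exhausted.
    """
    for attempt in range(benchmark_retries):
        try:
            return run_benchmark()
        except Exception:
            if attempt == benchmark_retries - 1:
                raise  # out of retries: surface the failure
            time.sleep(retries_timeout * 2**attempt)
```

With the defaults this sleeps 5 then 10 seconds between the three attempts; each attempt additionally includes `vllm bench`'s own 10-minute wait for the endpoint described above.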
### Deployment Clean-Up

The `vllm_performance` actuator will automatically clean up
all Kubernetes resources associated with the vLLM deployments as it proceeds,
leaving at most `max_environments` active at a time.
On a graceful shutdown of the `ado` process running the operation
(CTRL-C, SIGTERM, SIGINT) active deployments will be deleted
before exit.
On an uncontrolled shutdown (SIGKILL) you will need to manually
clean up any K8s deployments that were running at the time.

> [!IMPORTANT] PVC Deletion
>
> If the actuator created a PVC (i.e. `vllm-support-$UUID`) it will be deleted.
>
> If the actuator used an existing PVC it will not be deleted.

### Kubernetes resource templates

The `vllm_performance` actuator creates Kubernetes resources
based on a set of template YAML files
that are distributed with the actuator.
The templates are for:

- vLLM deployment
- PVC used by deployment pod
- vLLM service

You can use your own templates by creating a vllm_performance
`actuatorconfiguration` resource with the following
fields set to the paths to your templates:

```yaml
deployment_template: $PATH_RELATIVE_TO_WORKING_DIR
service_template: $PATH_RELATIVE_TO_WORKING_DIR
pvc_template: $PATH_RELATIVE_TO_WORKING_DIR
```

Then use this `actuatorconfiguration` resource
when running operations with the actuator.

The paths given are always interpreted relative to the
working directory of the process using the actuator
(where `ado create operation` or `run_experiment` is executed).

> [!IMPORTANT] Custom templates and executing on remote RayClusters
>
> The template path must be accessible where the actuator is running.
> This is important to consider when running operations using
> `vllm_performance` on a remote RayCluster.
> To handle this we recommend:
>
> - Put custom templates in the working directory (or a subdirectory of it)
>   that you will
>   [send to the RayCluster](../getting-started/remote_run.md#other-options)
> - Create an `actuatorconfiguration` with the relative paths to the
>   templates from this working directory
>

### Grouped sampling for efficient deployment usage

Creating and deleting vLLM deployments takes time.
If you have a limited number of vLLM deployments that can be
created concurrently, say one, this can add significant
overhead if consecutive points being sampled require
different deployments.
The [grouped sampling](../operators/random-walk.md#enabling-grouping)
feature of the `random_walk` operator can be useful in this case.
This allows configuring the sampling so that points which
require a given vLLM deployment are submitted in a batch.
diff --git a/website/docs/examples/vllm-performance-endpoint.md b/website/docs/examples/vllm-performance-endpoint.md
index a6bc0452..8eb7cd22 100644
--- a/website/docs/examples/vllm-performance-endpoint.md
+++ b/website/docs/examples/vllm-performance-endpoint.md
@@ -2,7 +2,9 @@
> [!NOTE] The scenario
>
-> **In this example, the _vllm_performance_ actuator is used to find
+> **In this example,
+> the [_vllm_performance_ actuator](../actuators/vllm_performance.md)
+> is used to find
> the maximum requests per second a server can handle while maintaining
> stable maximum throughput.**
>
@@ -16,12 +18,13 @@
> To explore this space, you will:
>
> - define an endpoint, model and range of requests per second to test
-> - use an optimizer to efficiently find the maximum requests per second
+> - use [an optimizer](../operators/optimisation-with-ray-tune.md)
+> to efficiently find the maximum requests per second

> [!IMPORTANT] Prerequisites
>
-> - An endpoint serving an LLM in an OpenAI API-compatible format
+> - An endpoint serving an LLM via an OpenAI-compatible API
> - Install the following Python packages:
> > ```bash @@ -37,11 +40,15 @@ > from [our repository](https://github.com/IBM/ado/tree/main/plugins/actuators/vllm_performance/yamls). > > - `vllm_request_rate_space.yaml`: this file defines the _endpoint_, _model_, -> and _request_ _range_ to explore. -> - **You must edit the _model_ and _endpoint_ fields in this file -> to match your own.** +> and _request_ _range_ to explore. +> +> +> - **You must edit the _model_ and _endpoint_ fields in this file +> to match your own.** +> +> > - `operation_hyperopt.yaml`: this file contains the optimization parameters. -> You do not need to edit it. +> You do not need to edit it. > > Then, in a directory with these files, execute: > @@ -213,12 +220,16 @@ and the best region is unlikely to be visited. ## Next steps + - Use `ado describe experiment vllm_performance_endpoint` to see what other parameters can be explored - Try varying **`burstiness`** or **`number_input_tokens`**, or adding them as dimensions of the `entityspace`, to explore their impact on throughput - Try varying `num_samples`, `gamma` and `n_initial_points` parameters of hyperopt - - You can keep running the optimization on the same `discoveryspace`. - The previous runs will not influence new runs, but their results will - be reused, speeding experimentation up + - You can keep running the optimization on the same `discoveryspace`. 
+ The previous runs will not influence new runs, but their results will + be reused, speeding experimentation up - Measure the [performance of vLLM deployment configurations](vllm-performance-full.md) +- Check the [`vllm_performance` actuator documentation](../actuators/vllm_performance.md) + + diff --git a/website/docs/examples/vllm-performance-full.md b/website/docs/examples/vllm-performance-full.md index 161fe516..c6b83230 100644 --- a/website/docs/examples/vllm-performance-full.md +++ b/website/docs/examples/vllm-performance-full.md @@ -1,67 +1,67 @@ # Exploring vLLM deployment configurations -> [!NOTE] +> [!NOTE] The scenario +> +> **In this example, +> the [_vllm_performance_ actuator](../actuators/vllm_performance.md) +> is used to evaluate +> different vLLM server deployment configurations on Kubernetes/OpenShift.** +> +> When deploying vLLM, you must choose values for parameters like GPU type, batch +> size, and memory limits. These choices directly affect performance, cost, and +> scalability. To find the best configuration for your workload, whether you are +> optimizing for latency, throughput, or cost, you need to explore the deployment +> parameter space. 
In this example:
>
-> This example illustrates using the vllm-performance actuator to discover
-> how best to deploy vLLM for a given use-case
+> - We will define a space of vLLM deployment configurations to test with
+> the `vllm_performance` actuator's `performance-testing-full` experiment
+> - This experiment can create and characterize a vLLM deployment on Kubernetes
+> - Use the [`random_walk` operator](../operators/random-walk.md) to
+> explore the space

-> [!IMPORTANT]
+> [!IMPORTANT] Prerequisites
>
-> **Prerequisites**
+> - Be logged in to your Kubernetes/OpenShift cluster
+> - Have access to a namespace where you can create vLLM deployments
+> - Install the following Python packages locally:
>
-> - Access to a k8s namespace where you can deploy vLLM
-
-## The scenario
-
-When deploying vLLM, you must choose values for parameters like GPU type, batch
-size, and memory limits. These choices directly affect performance, cost, and
-scalability. To find the best configuration for your workload, whether you are
-optimizing for latency, throughput, or cost—you need to explore the deployment
-parameter space.
-
-In this example:
-
-- We will define a space of vLLM deployment configurations to test with
-the `vllm_performance` actuator's `performance_testing_full` experiment
-  - This experiment can create and characterize a vLLM deployment on Kubernetes
-- Use the `random_walk` operator to explore the space
-
-## Install the actuator
-
-[//]: # (If you haven't already:)
-
-[//]: # ()
-[//]: # (```commandline)
-
-[//]: # (pip install ado-vllm-performance)
-
-[//]: # (```)
-
-[//]: # ()
-[//]: # (If you have cloned the `ado` source repository you can also do:)
-
-[//]: # ()
-[//]: # (```commandline)
-
-[//]: # (# From the root of this repository )
-
-[//]: # (pip install -e plugins/actuators/vllm_performance)
-
-[//]: # (```)
-
-Execute:
-
-```commandline
-pip install -e plugins/actuators/vllm_performance
-```
-
-in the root of the `ado` source repository.
-You can clone the repository with
+> ```bash
+> pip install ado-vllm-performance
+> ```
+

-```commandline
-git clone https://github.com/IBM/ado.git
-```

+> [!TIP] TL;DR
+>
+> Get the files `vllm_deployment_space.yaml`, `vllm_actuator_configuration.yaml`
+> and `random_walk_operation_grouped.yaml` from
+> [our repository](https://github.com/IBM/ado/tree/main/plugins/actuators/vllm_performance/yamls).
+>
+> **You must edit `vllm_actuator_configuration.yaml` with your details.**
+> In particular the following two fields are important:
+>
+> ```yaml
+> hf_token: # Required to access gated models
+> namespace: vllm-testing # you MUST set this to a namespace where you can create vLLM deployments
+> ```
+>
+> Then, in a directory with these files, execute:
+>
+> ```bash
+> : # Define the configurations to explore
+> ado create space -f vllm_deployment_space.yaml
+> : # Create a configuration for the actuator - normally just once as it can be reused
+> ado create actuatorconfiguration -f vllm_actuator_configuration.yaml
+> : # Explore!
+> ado create operation -f random_walk_operation_grouped.yaml --use-latest space --use-latest actuatorconfiguration
+> ```
+>
+> See [configuring the `vllm_performance` actuator](../actuators/vllm_performance.md#configuring-the-vllm_performance-actuator)
+> for more configuration options.
+
+## Verify the installation

Verify the installation with:

```
ado get actuators --details
```

-The actuator `vllm_performance` will appear in the list of available actuators.
+The actuator `vllm_performance` should appear in the list of available actuators
+if installation completed successfully.

## Create an actuator configuration

-The vllm-performance actuator needs some information the target cluster to
+The vllm-performance actuator needs some information about the target cluster to
deploy on. This is provided via an `actuatorconfiguration`.
-First execute,
+First execute:

 ```commandline
-# Generate the template file
-ado template actuatorconfiguration --actuator-identifier vllm_performance -o actuatorconfiguration.yaml
+ado template actuatorconfiguration --actuator-identifier vllm_performance -o vllm_actuator_configuration.yaml
 ```

-This will create a file called `vllm_performance_actuatorconfiguration.yaml`
-
-Edit the file and set correct values for the following fields:
+This will create a file called `vllm_actuator_configuration.yaml`.
+Edit the file and set correct values for at least the `namespace` field.
+Also consider if you need to supply a value for `hf_token`:

 ```yaml
-hf_token:
-namespace: vllm-testing # OpenShift namespace you have write access to
-node_selector: '{"kubernetes.io/hostname":""}' # JSON string selecting a node that owns GPU
+hf_token: # Required to access gated models
+namespace: vllm-testing # you MUST set this to a namespace where you can create vLLM deployments
 ```

 Then save this configuration as an `actuatorconfiguration` resource:

 ```bash
-ado create actuatorconfiguration -f vllm_performance_actuatorconfiguration.yaml
+ado create actuatorconfiguration -f vllm_actuator_configuration.yaml
 ```

 > [!TIP]
 >
 > You can create multiple actuator configurations corresponding
-> to different clusters/target environments.
-> You choose the one to use when you launch an operation requiring the actuator
+> to different target environments.
+> You choose the one to use when you launch an operation requiring the actuator.

 ## Define the configurations to test

@@ -120,8 +119,6 @@ deployment parameters, including `max_num_seq` and
 `max_batch_tokens`, for a scenario where requests arrive between 1 and 10 per
 second with sizes around 2000 tokens.

-Save the following as `vllm_discoveryspace.yaml`:
-
 ```yaml
 entitySpace:
   - identifier: model
@@ -166,11 +163,11 @@ metadata:
   name: vllm_deployments
 ```

-Save the above as `vllm_discoveryspace.yaml`.
+Save the above as `vllm_deployment_space.yaml`.

 Then run:

 ```bash
-ado create space -f vllm_discoveryspace.yaml
+ado create space -f vllm_deployment_space.yaml
 ```

 ## Explore the space with random_walk

@@ -180,7 +177,7 @@ efficiency. The `grouped` sampler ensures we explore all
 the different benchmark configurations for a given vLLM deployment before
 creating a new deployment - minimizing the number of deployment creations.

-Save the following as `random_walk.yaml`:
+Save the following as `random_walk_operation_grouped.yaml`:

 ```yaml
 metadata:
@@ -212,11 +209,12 @@ operation:

 Then, start the operation with:

 ```commandline
-ado create operation -f random_walk.yaml \
+ado create operation -f random_walk_operation_grouped.yaml \
   --use-latest space --use-latest actuatorconfiguration
 ```

-Results will appear as they are measured.
+As the operation runs, a table of the results
+is updated live in the terminal.

 ### Monitor the optimization

@@ -227,10 +225,7 @@ While the operation is running you can monitor the deployment:

 oc get deployments --watch -n vllm-testing
 ```

-As it runs a table of the results is updated
-live in the terminal as they come in.
-
-You can also get the table be executing (in another terminal)
+You can also get the results table by executing (in another terminal)

 ```commandline
 ado show entities operation --use-latest
@@ -253,6 +248,7 @@ ado show entities space --output-format csv --use-latest

 ## Next steps

+
 - Try varying **`max_batch_tokens`** or **`gpu_memory_utilization`** to
 explore the impact on throughput.
 - Try creating a different `actuatorconfiguration` with more
@@ -261,3 +257,7 @@ explore the impact on throughput.
 - Use **RayTune** (see the
 [vLLM endpoint performance](vllm-performance-endpoint.md) example) to
 optimise the hyper‑parameters of the benchmark.
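As a sketch of how a new dimension such as `gpu_memory_utilization` could be swept, an extra entry could be added to the entity space along these lines. The values and the exact schema keys here are assumptions; mirror the structure of the `entitySpace` entries in your own `vllm_deployment_space.yaml`.

```yaml
# Hypothetical additional entity-space dimension - illustrative only
entitySpace:
  - identifier: gpu_memory_utilization
    # Values to sweep; adjust to your GPUs and model sizes
    values: [0.80, 0.90, 0.95]
```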
+- Run [the exploration on the OpenShift/Kubernetes cluster](../actuators/vllm_performance.md#the-in_cluster-configuration-option)
+you create the deployments on, so you don't have to keep your laptop open.
+- Check the [`vllm_performance` actuator documentation](../actuators/vllm_performance.md)
+
\ No newline at end of file
diff --git a/website/docs/getting-started/service-account.yaml b/website/docs/getting-started/service-account.yaml
new file mode 120000
index 00000000..3952243f
--- /dev/null
+++ b/website/docs/getting-started/service-account.yaml
@@ -0,0 +1 @@
+../../../backend/kuberay/service-account.yaml
\ No newline at end of file
diff --git a/website/docs/resources/actuatorconfig.md b/website/docs/resources/actuatorconfig.md
index ebce72bf..8da0aba7 100644
--- a/website/docs/resources/actuatorconfig.md
+++ b/website/docs/resources/actuatorconfig.md
@@ -88,11 +88,13 @@ the `operation` resource documentation for details.

 ### Other ado commands that work with actuatorconfiguration

+
 - `ado get actuatorconfigurations`
-  - list stored `actuatorconfiguration`s or retrieve their representations
+    - list stored `actuatorconfiguration`s or retrieve their representations
 - `ado show related actuatorconfiguration ID`
-  - show operations using an `actuatorconfiguration`
+    - show operations using an `actuatorconfiguration`
 - `ado edit actuatorconfiguration ID`
-  - set the name, description, and labels for an `actuatorconfiguration`
+    - set the name, description, and labels for an `actuatorconfiguration`
 - `ado delete actuatorconfiguration ID`
-  - delete an `actuatorconfiguration`
+    - delete an `actuatorconfiguration`
+
diff --git a/website/docs/resources/resources.md b/website/docs/resources/resources.md
index ce9e9085..950eddc5 100644
--- a/website/docs/resources/resources.md
+++ b/website/docs/resources/resources.md
@@ -59,26 +59,28 @@ metastore.

 Here is a list of common `ado` CLI commands for interacting with resources.
 See the [ado CLI guide](../getting-started/ado.md) for more details

+
 - `ado get [resource type]`
-  - Lists all resources of the requested type
+    - Lists all resources of the requested type
 - `ado get [resource type] [$identifier] -o yaml`
-  - Outputs the YAML of resource `$identifier`
+    - Outputs the YAML of resource `$identifier`
 - `ado create [resource type] -f [YAMLFILE]`
-  - Creates the resource of the specified type from the definition in "YAMLFILE"
+    - Creates the resource of the specified type from the definition in "YAMLFILE"
 - `ado delete [resource type] [$identifier]`
-  - Deletes the resource of the specified type with the provided identifier from
+    - Deletes the resource of the specified type with the provided identifier from
     the database. See the [deleting resources](#deleting-resources) section for
     more information and considerations to keep in mind.
 - `ado describe [resource type] [$identifier]`
-  - Outputs a human-readable description of resource `$identifier`
+    - Outputs a human-readable description of resource `$identifier`
 - `ado show related [resource type] [$identifier]`
-  - List ids of resources related to resource `$identifier`
+    - List ids of resources related to resource `$identifier`
 - `ado show details [resource type] [$identifier]`
-  - Outputs some details on the resource. Usually these are quantities that have
+    - Outputs some details on the resource. Usually these are quantities that have
     to be computed.
 - `ado template [resource type] --include-schema`
-  - Outputs a default YAML for the given resource along with a schema file
+    - Outputs a default YAML for the given resource along with a schema file
     explaining the fields.`
+

 ### Deleting resources

diff --git a/website/mkdocs.yml b/website/mkdocs.yml
index f577134e..54ec6e1a 100644
--- a/website/mkdocs.yml
+++ b/website/mkdocs.yml
@@ -191,6 +191,7 @@ nav:
       - Adding custom experiments: actuators/creating-custom-experiments.md
       - Running experiments on single entities: actuators/run_experiment.md
       - Using externally obtained data: actuators/replay.md
+      - vllm_performance - measure inference performance: actuators/vllm_performance.md
       #- ST4SD: actuators/st4sd.md
       - SFTTrainer - measure fine-tuning performance : actuators/sft-trainer.md
       #- Molformer: actuators/molformer.md