diff --git a/training/a4/llama3-1-70b/nemo-pretraining-gke/README.md b/training/a4/llama3-1-70b/nemo-pretraining-gke/README.md
deleted file mode 100644
index 4982e1a..0000000
--- a/training/a4/llama3-1-70b/nemo-pretraining-gke/README.md
+++ /dev/null
@@ -1,402 +0,0 @@
-# Pretrain Llama-3.1-70B workloads on A4 GKE Node pools with the NVIDIA NeMo Framework
-
-This recipe outlines the steps for running a Llama-3.1-70B pretraining workload
-on [A4 GKE Node pools](https://cloud.google.com/kubernetes-engine) by using the
-[NVIDIA NeMo framework](https://github.com/NVIDIA/nemo).
-
-## Orchestration and deployment tools
-
-For this recipe, the following setup is used:
-
-- Orchestration - [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine)
-- Pretraining job configuration and deployment - a Helm chart is used to configure and deploy
-  the [Kubernetes JobSet](https://kubernetes.io/blog/2025/03/23/introducing-jobset)
-  resource, which manages the execution of the
-  [NeMo pretraining workload](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_pretraining.py).
-
-## Test environment
-
-This recipe has been optimized for and tested with the following configuration:
-
-- GKE cluster
-  - [A regional standard cluster](https://cloud.google.com/kubernetes-engine/docs/concepts/configuration-overview) version 1.31.7-gke.1265000 or later.
-  - A GPU node pool with 32 or 64
-    [a4-highgpu-8g](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-high-vms) machines provisioned using the DENSE deployment type.
-  - [Workload Identity Federation for GKE](https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity) enabled.
-  - [Cloud Storage FUSE CSI driver for GKE](https://cloud.google.com/kubernetes-engine/docs/concepts/cloud-storage-fuse-csi-driver) enabled.
-  - [DCGM metrics](https://cloud.google.com/kubernetes-engine/docs/how-to/dcgm-metrics) enabled.
-  - [Kueue](https://kueue.sigs.k8s.io/docs/reference/kueue.v1beta1/) and [JobSet](https://jobset.sigs.k8s.io/docs/overview/) APIs installed.
-  - Kueue configured to support [Topology Aware Scheduling](https://kueue.sigs.k8s.io/docs/concepts/topology_aware_scheduling/).
-- A regional Google Cloud Storage (GCS) bucket to store logs generated by the recipe runs.
-
-To prepare the required environment, see the
-[GKE environment setup guide](../../../../docs/configuring-environment-gke-a4.md).
-
-## Training dataset
-
-This recipe uses a mock pretraining dataset provided by the NeMo framework.
-
-## Docker container image
-
-This recipe uses the following Docker image:
-`us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo25.02-gib1.0.5-A4`.
-
-This image is based on NVIDIA NeMo 25.02 and contains the NCCL gIB plugin
-v1.0.5, bundling all NCCL binaries validated for use with A4 GPUs.
-
-## Run the recipe
-
-From your client workstation, complete the following steps:
-
-### Configure environment settings
-
-Set the environment variables to match your environment:
-
-  ```bash
-  export PROJECT_ID=<PROJECT_ID>
-  export CLUSTER_REGION=<CLUSTER_REGION>
-  export CLUSTER_NAME=<CLUSTER_NAME>
-  export GCS_BUCKET=<GCS_BUCKET>
-  export KUEUE_NAME=<KUEUE_NAME>
-  ```
-
-Replace the following values:
-
-  - `<PROJECT_ID>`: your Google Cloud project ID.
-  - `<CLUSTER_REGION>`: the region where your cluster is located.
-  - `<CLUSTER_NAME>`: the name of your GKE cluster.
-  - `<GCS_BUCKET>`: the name of your Cloud Storage bucket. Don't include the `gs://` prefix.
-  - `<KUEUE_NAME>`: the name of the Kueue local queue. The default queue created by the cluster toolkit is `a4`.
Make sure to verify the name of the local queue in your cluster. - -Set the default project: - - ```bash - gcloud config set project $PROJECT_ID - ``` - -### Get the recipe - -Clone the `gpu-recipes` repository and set a reference to the recipe folder. - -``` -git clone https://github.com/ai-hypercomputer/gpu-recipes.git -cd gpu-recipes -export REPO_ROOT=`git rev-parse --show-toplevel` -export RECIPE_ROOT=$REPO_ROOT/training/a4/llama3-1-70b/nemo-pretraining-gke -cd $RECIPE_ROOT -``` - -### Get cluster credentials - -``` -gcloud container clusters get-credentials $CLUSTER_NAME --region $CLUSTER_REGION -``` - -### Configure and submit a pretraining job - -#### Using 32 nodes (256 GPUs) FP8 precision - -The default job setting is 15 training steps and fp8 precision. To execute the -job with the default settings, run the following command from your client: - -```bash -helm install -f $RECIPE_ROOT/values.yaml \ - --set-file workload_launcher=$REPO_ROOT/src/launchers/nemo-10-launcher.sh \ - --set-file workload_config=$REPO_ROOT/src/frameworks/a4/nemo-configs/llama3-1-70b-256gpus-a4-fp8.yaml \ - --set queue=${KUEUE_NAME} \ - --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \ - $USER-llama-3-1-70b-nemo-fp8 \ - $REPO_ROOT/src/helm-charts/a4/jobset -``` - -#### Using 32 nodes (256 GPUs) BF16 precision - -The default job setting is 15 training steps and bf16 precision. To execute the -job with the default settings, run the following command from your client: - -```bash -helm install -f $RECIPE_ROOT/values.yaml \ - --set-file workload_launcher=$REPO_ROOT/src/launchers/nemo-10-launcher.sh \ - --set-file workload_config=$REPO_ROOT/src/frameworks/a4/nemo-configs/llama3-1-70b-256gpus-a4-bf16.yaml \ - --set queue=${KUEUE_NAME} \ - --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \ - $USER-llama-3-1-70b-nemo-bf16 \ - $REPO_ROOT/src/helm-charts/a4/jobset -``` - -#### Using 64 nodes (512 GPUs) FP8 precision - -The default job setting is 15 training steps and fp8 precision. To execute the -job with the default settings, run the following command from your client: - -```bash -helm install -f $RECIPE_ROOT/values-64-128-nodes.yaml \ - --set-file workload_launcher=$REPO_ROOT/src/launchers/nemo-10-launcher.sh \ - --set-file workload_config=$REPO_ROOT/src/frameworks/a4/nemo-configs/llama3-1-70b-512gpus-a4-fp8.yaml \ - --set queue=${KUEUE_NAME} \ - --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \ - $USER-llama-3-1-70b-nemo-fp8 \ - $REPO_ROOT/src/helm-charts/a4/jobset -``` - -#### Using 64 nodes (512 GPUs) BF16 precision - -The default job setting is 15 training steps and bf16 precision. To execute the -job with the default settings, run the following command from your client: - -```bash -helm install -f $RECIPE_ROOT/values-64-128-nodes.yaml \ - --set-file workload_launcher=$REPO_ROOT/src/launchers/nemo-10-launcher.sh \ - --set-file workload_config=$REPO_ROOT/src/frameworks/a4/nemo-configs/llama3-1-70b-512gpus-a4-bf16.yaml \ - --set queue=${KUEUE_NAME} \ - --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \ - $USER-llama-3-1-70b-nemo-bf16 \ - $REPO_ROOT/src/helm-charts/a4/jobset -``` - -#### Using 128 nodes (1024 GPUs) FP8 precision - -The default job setting is 15 training steps and fp8 precision. 
To execute the
-job with the default settings, run the following command from your client:
-
-```bash
-helm install -f $RECIPE_ROOT/values-64-128-nodes.yaml \
-    --set-file workload_launcher=$REPO_ROOT/src/launchers/nemo-10-launcher.sh \
-    --set-file workload_config=$REPO_ROOT/src/frameworks/a4/nemo-configs/llama3-1-70b-1024gpus-a4-fp8.yaml \
-    --set queue=${KUEUE_NAME} \
-    --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \
-    --set workload.gpus=1024 \
-    $USER-llama-3-1-70b-nemo-fp8 \
-    $REPO_ROOT/src/helm-charts/a4/jobset
-```
-
-#### Using 128 nodes (1024 GPUs) BF16 precision
-
-The default job setting is 15 training steps and bf16 precision. To execute the
-job with the default settings, run the following command from your client:
-
-```bash
-helm install -f $RECIPE_ROOT/values-64-128-nodes.yaml \
-    --set-file workload_launcher=$REPO_ROOT/src/launchers/nemo-10-launcher.sh \
-    --set-file workload_config=$REPO_ROOT/src/frameworks/a4/nemo-configs/llama3-1-70b-1024gpus-a4-bf16.yaml \
-    --set queue=${KUEUE_NAME} \
-    --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \
-    --set workload.gpus=1024 \
-    $USER-llama-3-1-70b-nemo-bf16 \
-    $REPO_ROOT/src/helm-charts/a4/jobset
-```
-
-#### Configure job settings
-
-You can override any of the default settings for this job in the following NeMo configuration files:
-
-- [NeMo configurations 32 nodes fp8](../../../../src/frameworks/a4/nemo-configs/llama3-1-70b-256gpus-a4-fp8.yaml)
-- [NeMo configurations 32 nodes bf16](../../../../src/frameworks/a4/nemo-configs/llama3-1-70b-256gpus-a4-bf16.yaml)
-- [NeMo configurations 64 nodes fp8](../../../../src/frameworks/a4/nemo-configs/llama3-1-70b-512gpus-a4-fp8.yaml)
-- [NeMo configurations 64 nodes bf16](../../../../src/frameworks/a4/nemo-configs/llama3-1-70b-512gpus-a4-bf16.yaml)
-- [NeMo configurations 128 nodes fp8](../../../../src/frameworks/a4/nemo-configs/llama3-1-70b-1024gpus-a4-fp8.yaml)
-- [NeMo configurations 128 nodes bf16](../../../../src/frameworks/a4/nemo-configs/llama3-1-70b-1024gpus-a4-bf16.yaml)
-
-To do this, set the new arguments by using `--set workload.arguments`.
-
-**Examples**
-
-- To set the number of training steps to 100, run the following command from
-  your client:
-
-  ```bash
-  helm install -f $RECIPE_ROOT/values.yaml \
-    --set-file workload_launcher=$REPO_ROOT/src/launchers/nemo-10-launcher.sh \
-    --set-file workload_config=$REPO_ROOT/src/frameworks/a4/nemo-configs/llama3-1-70b-256gpus-a4-fp8.yaml \
-    --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \
-    --set queue=${KUEUE_NAME} \
-    --set workload.arguments[0]="trainer.max_steps=100" \
-    $USER-llama-3-1-70b-nemo-fp8 \
-    $REPO_ROOT/src/helm-charts/a4/jobset
-  ```
-
-### Monitor the job
-
-To check the status of pods in your job, run the following command:
-
-```
-kubectl get pods | grep JOB_NAME_PREFIX
-```
-
-Replace the following:
-
-- `JOB_NAME_PREFIX`: your job name prefix. For example, `$USER-llama-3-1-70b-nemo-fp8`.
-
-To get the logs for one of the pods, run the following command:
-
-```
-kubectl logs POD_NAME
-```
-
-Information about the training job's progress, including crucial details such as loss,
-step count, and step time, is generated by the rank 0 process.
-This process runs on the pod whose name begins with `JOB_NAME_PREFIX-workload-0-0`.
-For example: `user-llama-3-1-70b-nemo-fp8-workload-0-0-s9zrv`.
-
-### Analyze results
-
-When completed, the job creates several artifacts, including logs and traces,
-and places them in the configured Google Cloud Storage bucket as follows:
-
-```
-gs://${GCS_BUCKET}/nemo-experiments/
-├── hparams.yaml
-├── lightning_logs.txt
-├── nemo_error_logs.txt
-├── nemo_log_globalrank-[RANK]_localrank-[LOCAL].txt
-├── dllogger
-│   ├── rank-0
-│   │   ├── dllogger.json
-...
-```
-
-- `hparams.yaml`: the NeMo configuration used by the pretraining script. This
-  includes the combined
-  [configuration file](../../../../src/frameworks/a4/nemo-configs/llama3-1-70b-256gpus-a4-fp8.yaml)
-  and the command line overrides.
-- `lightning_logs.txt`: the log files generated by PyTorch Lightning, which is
-  used by NeMo.
-- `nemo_error_logs.txt`: the warning and error logs generated by NeMo.
-- `nemo_log_globalrank-[RANK]_localrank-[LOCAL].txt`: the NeMo logs for each
-  rank.
-- `dllogger/`: the log captured by [NVIDIA
-  DLLogger](https://github.com/NVIDIA/dllogger). DLLogger is configured to
-  store logs on the rank 0 node. The log is in JSON format and includes loss,
-  step_time, and other key metrics for each training step.
-
-The JOB_ID has the following format:
-`$USER-llama-3-1-70b-nemo-[YYYY]-[MM]-[DD]-[hh]-[mm]-[ss]`, where the suffix
-of the ID is the date and time when the job was started.
-
-Here is an example of an entry in the DLLogger log:
-
-```json
-DLLL
-{
-  "timestamp": "1742531120.867155",
-  "datetime": "2025-03-21 04:25:20.867155",
-  "elapsedtime": "416.858187",
-  "type": "LOG",
-  "step": 11,
-  "data":
-  {
-    "reduced_train_loss": 2.589764356613159,
-    "lr": 8.249999723375367e-07,
-    "global_step": 11.0,
-    "consumed_samples": 24576.0,
-    "train_backward_timing in s": 4.3010711669921876e-05,
-    "train_step_timing in s": 19.954481744766234,
-    "epoch": 0
-  }
-}
-```
-
-The DLLogger log can be used to calculate the Model FLOPS Utilization (MFU)
-metric, as described in the next section.
-
-### Calculate training performance metrics (MFU, TFLOPS, Average Step Time)
-
-This section explains how to calculate key training performance metrics, such as
-Model FLOPS Utilization (MFU), using the `dllogger.json` file generated during
-training.
-
-We provide a tool called
-[training_metrics](../../../../src/utils/training_metrics/) to help you easily
-compute these metrics. This tool can calculate the following metrics:
-
-- *MFU*: Model FLOPS Utilization
-- *Average training step time*: the average time taken for each training step
-- *TFLOPS per GPU*: the number of Tera Floating Point Operations per second
-  achieved by each GPU
-
-To calculate training performance metrics using the `training_metrics` tool,
-complete the following steps from your client:
-
-1. Download the `dllogger.json` file. The `dllogger.json` file is generated
-   during the training session.
-
-   To download the file, run the following command. Replace `<JOB_ID>` with the
-   ID of your training session.
-
-   ```bash
-   gcloud storage cp gs://${GCS_BUCKET}/nemo-experiments/megatron_gpt/<JOB_ID>/dllogger/rank-0/dllogger.json \
-     $RECIPE_ROOT/dllogger.json
-   ```
-
-2. Run the
-   [`process_training_results.py`](../../../../src/utils/training_metrics/process_training_results.py)
-   script:
-
-   ```bash
-   cd $REPO_ROOT/src/utils/training_metrics
-   python3 process_training_results.py --file $RECIPE_ROOT/dllogger.json \
-     --batch_size 2048 \
-     --num_accelerators 256 \
-     --precision fp8 \
-     --model_type llama3.1-70b \
-     --accelerator_type b200
-   ```
-
-**Note:** The `batch_size`, `num_accelerators`, `precision`, `model_type` and
-`accelerator_type` values shown are specific to this recipe running the default
-configuration. By default, the average step time is computed using steps 10 to
-30.
-
-For more detailed information and advanced usage instructions for this tool, see
-the [full documentation](../../../../src/utils/training_metrics/README.md).
-
-### Troubleshooting
-
-This section provides guidance on troubleshooting issues with the training job.
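-
-Because the training run is managed as a JobSet, a quick first check is to look
-at the JobSet resource itself before inspecting individual pods. The commands
-below are a minimal sketch; they assume the JobSet API from the test environment
-is installed, and `JOBSET_NAME` is a placeholder for the JobSet created by your
-Helm release, which typically starts with your job name prefix:
-
-```bash
-# List JobSet resources and their overall status
-kubectl get jobsets
-
-# Show detailed status and recent events for a specific JobSet
-kubectl describe jobset JOBSET_NAME
-```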
-
-To check the status of the job's pods, use the following command:
-
-```bash
-kubectl get pods | grep JOB_NAME_PREFIX
-```
-
-Replace `JOB_NAME_PREFIX` with the prefix of your job name. For example, `$USER-llama-3-1-70b-nemo`. This command will list all pods associated with the specified job, along with their current status.
-
-To get the logs from a specific pod, use the following command:
-
-```bash
-kubectl logs POD_NAME
-```
-
-Replace `POD_NAME` with the name of the pod you want to inspect.
-
-In this recipe, the training job is orchestrated by the [Kubernetes JobSet](https://jobset.sigs.k8s.io/docs/overview/). If the JobSet encounters a fatal failure, it removes all pods, making it impossible to inspect their logs directly. To analyze logs from a failed job, retrieve them from Cloud Logging using the following filter:
-
-```
-resource.type="k8s_container"
-resource.labels.project_id="PROJECT_ID"
-resource.labels.location="CLUSTER_REGION"
-resource.labels.cluster_name="CLUSTER_NAME"
-resource.labels.namespace_name="default"
-resource.labels.pod_name=~"^JOB_NAME_PREFIX.*"
-severity>=DEFAULT
-```
-
-Replace the following:
-
-- `PROJECT_ID`: your Google Cloud project ID.
-- `CLUSTER_REGION`: the region where your cluster is located.
-- `CLUSTER_NAME`: the name of your GKE cluster.
-- `JOB_NAME_PREFIX`: the prefix of your job name (for example, `$USER-llama-3-1-70b-nemo`).
-
-This filter will retrieve logs from all containers within pods that match the job with the specified name prefix.
-
-### Uninstall the Helm release
-
-You can delete the job and other resources created by the Helm chart. To
-uninstall the Helm releases, run the following commands from your client:
-
-```bash
-helm uninstall $USER-llama-3-1-70b-nemo-fp8
-helm uninstall $USER-llama-3-1-70b-nemo-bf16
-```
diff --git a/training/a4/llama3-1-70b/nemo-pretraining-gke/values-64-128-nodes.yaml b/training/a4/llama3-1-70b/nemo-pretraining-gke/values-64-128-nodes.yaml
deleted file mode 100644
index 8d7f86e..0000000
--- a/training/a4/llama3-1-70b/nemo-pretraining-gke/values-64-128-nodes.yaml
+++ /dev/null
@@ -1,68 +0,0 @@
-# Copyright 2025 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
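-
-# Values in this file configure the 64-node (512 GPU) and 128-node (1024 GPU)
-# runs described in the README. Empty fields (for example `queue` and the
-# first `bucketName`) are supplied at install time with `--set queue=...` and
-# `--set volumes.gcsMounts[0].bucketName=...`; for 128-node runs the README
-# also sets `--set workload.gpus=1024`.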
- -queue: -dwsSettings: - maxRunDurationSeconds: - -tasSettings: - topologyRequest: - kueue.x-k8s.io/podset-preferred-topology: "kubernetes.io/hostname" - -volumes: - gcsVolumes: true - psVolumes: false - gcsMounts: - - bucketName: - mountPath: "/job-logs" - - bucketName: cloud-samples-data - mountPath: "/artifacts" - mountOptions: "implicit-dirs" - -workload: - gpus: 512 # This should be one of: {<= 8, multiple of 8} - image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo25.02-gib1.0.5-A4 - defaultArguments[]: - arguments[]: - configFile: nemo-config.yaml - configPath: /workload/configs - envs: - - name: NEMO_CONFIG_PATH - value: "/workload/configs" - - name: NEMO_CONFIG_NAME - value: "nemo-config.yaml" - - name: EXPERIMENT_NAME - value: "nemo-experiments" - - name: EXPERIMENT_ROOT_DIR - value: "/job-logs" - - name: NVTE_FWD_LAYERNORM_SM_MARGIN - value: "8" - - name: NVTE_BWD_LAYERNORM_SM_MARGIN - value: "8" - - name: GLOO_SOCKET_IFNAME - value: "eth0" - - name: TOKENIZER_PATH - value: "/artifacts/third-party/tokenizers/gpt2" - - name: NEMO_LAUNCH_SCRIPT - value: "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py" - - name: TORCH_DISTRIBUTED_TRACING - value: "ALL" - -network: - hostNetwork: true - gibVersion: us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib:v1.0.5 - subnetworks[]: - ncclSettings: - - name: NCCL_DEBUG - value: "VERSION" diff --git a/training/a4/llama3-1-70b/nemo-pretraining-gke/values.yaml b/training/a4/llama3-1-70b/nemo-pretraining-gke/values.yaml deleted file mode 100644 index 3b86e81..0000000 --- a/training/a4/llama3-1-70b/nemo-pretraining-gke/values.yaml +++ /dev/null @@ -1,70 +0,0 @@ -# Copyright 2025 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -queue: -dwsSettings: - maxRunDurationSeconds: - -tasSettings: - topologyRequest: - kueue.x-k8s.io/podset-preferred-topology: "cloud.google.com/gce-topology-block" - -volumes: - gcsVolumes: true - psVolumes: false - gcsMounts: - - bucketName: - mountPath: "/job-logs" - - bucketName: cloud-samples-data - mountPath: "/artifacts" - mountOptions: "implicit-dirs" - -workload: - gpus: 256 # This should be one of: {<= 8, multiple of 8} - image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo25.02-gib1.0.5-A4 - defaultArguments[]: - arguments[]: - configFile: nemo-config.yaml - configPath: /workload/configs - envs: - - name: NEMO_CONFIG_PATH - value: "/workload/configs" - - name: NEMO_CONFIG_NAME - value: "nemo-config.yaml" - - name: EXPERIMENT_NAME - value: "nemo-experiments" - - name: EXPERIMENT_ROOT_DIR - value: "/job-logs" - - name: NVTE_FWD_LAYERNORM_SM_MARGIN - value: "8" - - name: NVTE_BWD_LAYERNORM_SM_MARGIN - value: "8" - - name: GLOO_SOCKET_IFNAME - value: "eth0" - - name: TOKENIZER_PATH - value: "/artifacts/third-party/tokenizers/gpt2" - - name: NEMO_LAUNCH_SCRIPT - value: "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py" - - name: TORCH_DISTRIBUTED_TRACING - value: "ALL" - -network: - hostNetwork: true - gibVersion: us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib:v1.0.5 - subnetworks[]: - ncclSettings: - - name: NCCL_DEBUG - value: "VERSION" - - name: NVTE_UB_SOCKET_IFNAME - value: "eth1"