From 1f7792c48641ae23ec81798aabfa4f4c41bbcecc Mon Sep 17 00:00:00 2001 From: Razvan Tanase Date: Tue, 21 Mar 2023 14:38:12 -0700 Subject: [PATCH 1/6] fix broken links --- .../Training/Bert-Pretrain/README.md | 8 ++-- .../Training/Bloom-Pretrain/README.md | 2 +- .../Training/README.md | 48 +++++++------------ 3 files changed, 22 insertions(+), 36 deletions(-) diff --git a/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md b/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md index ce8945cb7f..20e9880810 100644 --- a/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md +++ b/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md @@ -24,7 +24,7 @@ The first step in the training script is to parse the arguments passed in from t - ``--gradient_accumulation_steps`` Number of training steps to accumulate gradients before using them to compute variables. This value should match the value of ``gradient_accumulation_steps`` in your ``ds_config.json`` file if deepspeed is enabled. - ``--model_checkpoint`` the model to pretrain. In this case we are pretraining "bert-large-uncased" but this example was also run with DistilBERT and BERT-base. See below for more information. -This example also supports the interactive capabilities from JupyterLab, TensorBoard and VSCode. These are added via the ``services`` section of the yml submit files. For more information on these, see [this](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training#interactive-debugging) page. Remove these sections under ``services`` to disable these tools. +This example also supports the interactive capabilities from JupyterLab, TensorBoard and VSCode. These are added via the ``services`` section of the yml submit files. For more information on these, see [this](../README.md#interactive-debugging) page. Remove these sections under ``services`` to disable these tools. #### **DeepSpeed Configuration** As discussed above, arguments to the command job will need to match arguments in the DeepSpeed configuration file (``ds_config.json``) if DeepSpeed is being used. We use a very simple configuration for this experiment. This config is without the additional profiling + checkpointing tools added to the ``ds_config.json`` located in the ``src`` folder. @@ -42,7 +42,7 @@ As discussed above, arguments to the command job will need to match arguments in ``` Each setting here is described above, but this configuration also includes ``fp16`` to improve training speed and reduce memory usage. -This configuration was found by running [DeepSpeed Autotuning](https://www.deepspeed.ai/tutorials/autotuning/) with this training script and BERT large in [this example](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/DeepSpeed-Autotuning). DeepSpeed as it relates to this example is described in more detail [here](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training#deepspeed). +This configuration was found by running [DeepSpeed Autotuning](https://www.deepspeed.ai/tutorials/autotuning/) with this training script and BERT large in [this example](../DeepSpeed-Autotuning). DeepSpeed as it relates to this example is described in more detail [here](../README.md#deepspeed). ### **Load the dataset** Once arguments have been parsed, its time to prepare the dataset. 
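For orientation, here is a minimal sketch of how the arguments described above might be parsed. The flag names come from this README; the defaults and help strings are illustrative assumptions, not the repository's exact code.
```
# Hedged sketch of the argument parsing for the pretraining script. Flag names follow this
# README; the defaults and help strings are illustrative assumptions.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="BERT pretraining arguments")
    parser.add_argument("--model_checkpoint", type=str, default="bert-large-uncased",
                        help="Model to pretrain, e.g. bert-large-uncased or distilbert-base-uncased")
    parser.add_argument("--gradient_accumulation_steps", type=int, default=1,
                        help="Must match gradient_accumulation_steps in ds_config.json when DeepSpeed is enabled")
    parser.add_argument("--deepspeed", type=str, default=None,
                        help="Path to the DeepSpeed configuration file (ds_config.json)")
    parser.add_argument("--tensorboard_log_dir", type=str, default="/outputs/runs/",
                        help="Directory where TensorBoard and PyTorch Profiler traces are written")
    return parser.parse_args()
```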
First we prepare a tokenizer to tokenize the data: ``` @@ -56,7 +56,7 @@ encoded_dataset_train, encoded_dataset_eval = load_encoded_glue_dataset( task=task, tokenizer=tokenizer ) ``` -This is done from within the [``glue_datasets.py``](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/blob/main/Training/Bert-Pretrain/src/glue_datasets.py) file. +This is done from within the [``glue_datasets.py``](../src/glue_datasets.py) file. ``` def load_raw_glue_dataset(task: str) -> Union[DatasetDict, Dataset]: dataset = load_dataset("glue", actual_task(task)) @@ -113,7 +113,7 @@ trainer.pop_callback(MLflowCallback) result = trainer.train() ``` -The ``ProfilerCallback`` in the above code is used to integrate the experiment with [Pytorch Profiler](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html). For more information on this code, see [this page](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training#pytorch-profiler). +The ``ProfilerCallback`` in the above code is used to integrate the experiment with [Pytorch Profiler](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html). For more information on this code, see [this page](../README.md#pytorch-profiler). ## **Run the Job** ### **Submit with vanilla Pytorch** diff --git a/best-practices/largescale-deep-learning/Training/Bloom-Pretrain/README.md b/best-practices/largescale-deep-learning/Training/Bloom-Pretrain/README.md index 475a7a3f9e..ed3be11a1a 100644 --- a/best-practices/largescale-deep-learning/Training/Bloom-Pretrain/README.md +++ b/best-practices/largescale-deep-learning/Training/Bloom-Pretrain/README.md @@ -19,7 +19,7 @@ az ml environment create --file ./src/environment/env.yml The following code can be found under this directory in ``src/deepspeed-BLOOM-AML-SDKv2.yaml`` for the submit file and environment and ``src/Megatron-DeepSpeed/pretrain_gpt.py`` for the training code. ### **Job Configuration** -In the [``deepspeed-BLOOM-AML-SDKv2.yaml``](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/Bloom-Pretrain/src/deepspeed-BLOOM-AML-SDKv2.yaml) file for submitting the job, there are several arguments passed in for the pretraining, with most being settings specific to how the model will be trained. For more information on command line arguments, see [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/megatron/arguments.py). Some arguments relevant to this example are: +In the [``deepspeed-BLOOM-AML-SDKv2.yaml``](./src/deepspeed-BLOOM-AML-SDKv2.yaml) file for submitting the job, there are several arguments passed in for the pretraining, with most being settings specific to how the model will be trained. For more information on command line arguments, see [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/megatron/arguments.py). Some arguments relevant to this example are: - ``--data-path`` - The paths to the data the model is trained on. The format should be the weight of the dataset and the path and name of the file that references .bin and .idx file (without the extension). For example, command below will add weight of 0.033178301 to ar language data, and inside the ar folder should be ar_text_document.bin and ar_text_document.idx. - ``--deepspeed`` and other deepspeed related arguments. These arguments are specific to DeepSpeed. The ``ds_config.json`` file passed in gives the configuration settings for DeepSpeed. 
Notice that the argument ``global_batch_size`` matches the ``train_batch_size`` setting in the ds_config. Similarly, the ``--zero_stage`` command line argument matches the ``zero_optimization`` setting in the ``ds_config.json`` file. diff --git a/best-practices/largescale-deep-learning/Training/README.md b/best-practices/largescale-deep-learning/Training/README.md index 3e327b2ab7..0de065c9ac 100644 --- a/best-practices/largescale-deep-learning/Training/README.md +++ b/best-practices/largescale-deep-learning/Training/README.md @@ -8,28 +8,14 @@ Large scale training has led to state-of-the-art accuracies across a range of ta This guide will show best practices to allow you to train large models very efficiently with high throughput in AzureML, leveraging full utilization of GPU to keep the cost low. -- [Setup](#setup) - - [Estimate Your Memory Requirements](#estimate-memory-requirements) - - [Compute Cluster](#compute-cluster) - - [Linear Scaling with Infiniband Enabled SKUs](linear-scaling-with-infiniband-enabled-skus) - - [Environment](#environment) - - [Data Loading](#data-loading) -- [Training Optimizations for Compute and Memory Efficiency](#optimizations) - - [DeepSpeed](#deepspeed) - - [DeepSpeed Autotuning](#deepspeed-autotuning) - - [Onnx Runtime (ORT)](#onnx-runtime-ort) -- [Monitoring and Debugging](#monitoring) - - [Interactive Debugging](#interactive-debugging) - - [JupyterLab](#jupyterlab) - - [VSCode](#vscode) - - [Tensorboard](#tensorboard) - - [Pytorch Profiler](#pytorch-profiler) - - [Flops Profiler](#flops-profiler) -- [Resiliency](#resiliency) - - [Nebula Checkpointing](#nebula-checkpointing) -- [Examples](#examples) - - [BERT Pretrain](#bert-pretrain) - - [BLOOM Pretrain](#bloom-pretrain) +- [Large Scale Distributed Training](#large-scale-distributed-training) + - [Setup](#setup) + - [**Linear Scaling with Infiniband Enabled SKUs**](#linear-scaling-with-infiniband-enabled-skus) + - [Optimizations](#optimizations) + - [**DeepSpeed Autotuning**](#deepspeed-autotuning) + - [Monitoring](#monitoring) +- [Create path for logging to tensorboard](#create-path-for-logging-to-tensorboard) + - [**Examples**](#examples) @@ -87,7 +73,7 @@ This guide will show best practices to allow you to train large models very effi - ### **Environment** - The recommended environment for a large scale distributed training job is an Azure Container for PyTorch (ACPT) environment with several built in optimizers and is described in more detail [here](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/blob/main/Environment/ACPT.md). This environment is built and ready to use under the 'Environments' tab in AzureML studio. Some optimizers included in the environment are: + The recommended environment for a large scale distributed training job is an Azure Container for PyTorch (ACPT) environment with several built in optimizers and is described in more detail [here](../Environment/ACPT.md). This environment is built and ready to use under the 'Environments' tab in AzureML studio. 
Some optimizers included in the environment are: - Onnx Runtime, Built-in optimizations that deliver up to 1.4X faster training - Deepspeed allows to train trillion model parameter at low cost by achieving excellent system throughput and efficiently scale to thousands of GPUs - MSCCL, an inter-accelerator communication framework that is built on top of NCCL @@ -95,7 +81,7 @@ This guide will show best practices to allow you to train large models very effi - ### **Data Loading** - To load data in the most efficient way with large scale distributed training jobs, follow [this guide](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/blob/main/Data-loading/data-loading.md). + To load data in the most efficient way with large scale distributed training jobs, follow [this guide](../Data-loading/data-loading.md). ## Optimizations To achive the best possible performance and resource utilization of jobs on AzureML, we employ several different optimization tools showcased below. - ### **DeepSpeed** @@ -128,7 +114,7 @@ To achive the best possible performance and resource utilization of jobs on Azur DeepSpeed features can be enabled, disabled, or configured using a config JSON file that should be specified as args.deepspeed_config. - To include DeepSpeed in a job using the HuggingFace ``Trainer`` class, simply include the argument ``--deepspeed ds_config.json`` as part of the ``TrainerArguments`` class passed into the Trainer. Example code for Bert Pretraining with Deepspeed and the HuggingFace Trainer class is shown at [BERT pretraining guide](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/Bert-Pretrain). + To include DeepSpeed in a job using the HuggingFace ``Trainer`` class, simply include the argument ``--deepspeed ds_config.json`` as part of the ``TrainerArguments`` class passed into the Trainer. Example code for Bert Pretraining with Deepspeed and the HuggingFace Trainer class is shown at [BERT pretraining guide](./Bert-Pretrain). To include DeepSpeed in a job using a custom training loop, DeepSpeed will have to be initialized before the training loop as shown here: @@ -160,7 +146,7 @@ To achive the best possible performance and resource utilization of jobs on Azur | DeBERTa | 1.5B | Not runnable | 140.587 (z = 1, gas = 1 mbs = 8) | 162.395 (z1_gas1_tmbspg11) | inf | 40 | 12 | - To learn how to use DeepSpeed Autotuning with AzureML, see [this tutorial](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/DeepSpeed-Autotuning). + To learn how to use DeepSpeed Autotuning with AzureML, see [this tutorial](./DeepSpeed-Autotuning/README.md). When running the Bloom and BERT examples in this repo, the following results were found: | Metrics | Vanilla Pytorch | DeepSpeed + Autotuning| @@ -192,7 +178,7 @@ To achive the best possible performance and resource utilization of jobs on Azur ``` --optim adamw_ort_fused ``` - This is an extra argument added with ORTTrainingArguments that applies the Fused Adam Optimizer to give a little extra performance gain. For a training example that uses ORT, See the [BERT Pretrain example](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/Bert-Pretrain). + This is an extra argument added with ORTTrainingArguments that applies the Fused Adam Optimizer to give a little extra performance gain. For a training example that uses ORT, See the [BERT Pretrain example](./Bert-Pretrain/README.md). 
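 Putting these optimizations together, a hedged sketch of a HuggingFace training setup that enables both ORT and DeepSpeed could look like the following. The ``optimum.onnxruntime`` classes and the ``adamw_ort_fused`` value are part of the public Optimum API; the model, datasets, tokenizer and batch size are placeholders or reuse names from the BERT example above.
 ```
 # Minimal sketch combining ONNX Runtime (ORT) training and DeepSpeed through HuggingFace Optimum.
 # The optimum.onnxruntime classes and the "adamw_ort_fused" optimizer value are public Optimum API;
 # the datasets, tokenizer and batch size below are assumptions reusing names from the BERT example.
 from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
 from transformers import AutoModelForSequenceClassification

 model = AutoModelForSequenceClassification.from_pretrained(
     "bert-large-uncased", num_labels=3  # e.g. 3 labels for the GLUE MNLI task (assumed)
 )

 training_args = ORTTrainingArguments(
     output_dir="outputs",
     optim="adamw_ort_fused",        # ORT fused Adam optimizer discussed above
     deepspeed="ds_config.json",     # hand the DeepSpeed config to the trainer
     fp16=True,
     per_device_train_batch_size=8,  # placeholder; tune to available GPU memory
 )

 trainer = ORTTrainer(
     model=model,
     args=training_args,
     train_dataset=encoded_dataset_train,  # assumed: encoded GLUE datasets from the BERT example
     eval_dataset=encoded_dataset_eval,
     tokenizer=tokenizer,                  # assumed: the AutoTokenizer created in the BERT example
 )
 trainer.train()
 ```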
## Monitoring - ### **Interactive Debugging** Machine learning model training is usually an iterative process and requires significant experimentation. With the Azure Machine Learning interactive job experience, we can access the container where the job is running and iterate on training scripts, monitor progress and even debug the job remotely on local machines. @@ -216,7 +202,7 @@ To achive the best possible performance and resource utilization of jobs on Azur SSH Connections - For an example that enables these tools, see [here](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/Bert-Pretrain). + For an example that enables these tools, see [here](./Bert-Pretrain/README.md). #### **JupyterLab** With JupyterLab, you can open a terminal and interact with the job container as well as iterate on your training script. @@ -270,7 +256,7 @@ To achive the best possible performance and resource utilization of jobs on Azur self.prof.step() ``` > NOTE: To make sure the Pytorch Profiler is visible with Tensorboard, we create a variable called `my_logs` (as shown in the above code) from passing an additional argument ``--tensorboard_log_dir "/outputs/runs/"`` to our training script. This path matches the ``logDir`` property under ``my_tensorboard`` in our yaml file for submitting the job. - See the [BERT Pretrain example](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/Bert-Pretrain) for the full implementation of this code. + See the [BERT Pretrain example](./Bert-Pretrain/README.md) for the full implementation of this code. After the job starts running, go to the TensorBoard as described above and click on 'Pytorch Profiler'. This page will show the relevant resource utilization information. @@ -326,7 +312,7 @@ Nebula Checkpointing improves on standard model checkpointing by saving models 1 - ### **Pretraining a model** Pretraining a language model is a process of training a model on a large corpus of unlabeled text using self-supervision, which means that the model learns to predict some parts of the text from other parts. Pretraining helps the model learn general language knowledge and skills that can be useful for various downstream tasks. Pretraining from scratch means training a model from random initialization without using any existing pretrained models. Pretraining from scratch can be beneficial when you have a large amount of domain-specific data that differs significantly from general text corpora, or when you want to customize your model architecture or hyperparameters. However, pretraining from scratch can also be more costly and time-consuming than finetuning an existing pretrained model. - ### **BERT Pretrain** - [This example](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/Bert-Pretrain) shows how to run a BERT pretraining job on AzureML. + [This example](./Bert-Pretrain/README.md) shows how to run a BERT pretraining job on AzureML. The following results were found using 2 ND40rs nodes with 8 V100 GPUs each. 
| Optimizations | Model size | GPU | MBS | Samples/Second | GPU memory utilized | @@ -334,7 +320,7 @@ Nebula Checkpointing improves on standard model checkpointing by saving models 1 | Vanilla Pytorch| 330M | 16 | 64 | 2431.02 | 49.4%​ | | DeepSpeed + Autotuning| 330M | 16 | 93 | 3369.37 | 64.5%​ | - ### **Bloom Pretrain** - [This example](https://github.com/microsoft/azureml-largescale-deeplearning-bestpractices/tree/main/Training/Bloom-Pretrain) shows how to pretrain the Bloom model in AzureML. The following results were found using 16 NVIDIA A100 80GB GPUs (2 nodes NVLink enabled). + [This example](./Bloom-Pretrain/README.md) shows how to pretrain the Bloom model in AzureML. The following results were found using 16 NVIDIA A100 80GB GPUs (2 nodes NVLink enabled). |Experiment |Model size|GPU Count | TP| PP | MBS | TFlops| Samples per second | GPU memory Utillized |----|----|----|----|----|----|----|----|----| |1|25B|16| 8| 1| 1| 119.42| 4.173 |69.7%| From 902d098de7307f534cfe4b5a5e0154b177b95eb7 Mon Sep 17 00:00:00 2001 From: Razvan Tanase Date: Tue, 21 Mar 2023 14:40:48 -0700 Subject: [PATCH 2/6] more broken links --- best-practices/largescale-deep-learning/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/largescale-deep-learning/README.md b/best-practices/largescale-deep-learning/README.md index 783a4fe2b0..726e0c8454 100644 --- a/best-practices/largescale-deep-learning/README.md +++ b/best-practices/largescale-deep-learning/README.md @@ -56,7 +56,7 @@ AzureML supports thre data asset types: Follow this [guide](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets) to learn more how to create any of the supported data assets. ## Create Training environment -An Environment is useful to track and reproduce the projects' software dependencies as they evolve over time. In this [section](./Environment/Readme.md) you can learn more about Environments in AzureML, how to quickly get started and validate the setup before we begin training. +An Environment is useful to track and reproduce the projects' software dependencies as they evolve over time. In this [section](./Environment/README.md) you can learn more about Environments in AzureML, how to quickly get started and validate the setup before we begin training. ## Efficient data loading for large training workloads From f26bedaa6c87a1ea3684b22035f965d95a6be735 Mon Sep 17 00:00:00 2001 From: Razvan Tanase Date: Tue, 21 Mar 2023 14:52:32 -0700 Subject: [PATCH 3/6] Fixing ToC --- .../Training/Bert-Pretrain/README.md | 4 +-- .../Training/README.md | 30 ++++++++++++++----- 2 files changed, 24 insertions(+), 10 deletions(-) diff --git a/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md b/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md index 20e9880810..91aeb4eb9b 100644 --- a/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md +++ b/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md @@ -10,9 +10,9 @@ V100 GPUs (STANDARD_ND40RS_V2) are recommended for this job. This example was or To attain linear scaling for large model, one important step can be to use InfiniBand. InfiniBand enables low-latency, GPU-to-GPU communication across nodes in a cluster. InfiniBand requires specialized hardware to operate. Only some VM SKUs on Azure contain this required hardware. 
You can view the full list of InfiniBand-enabled machine SKUs [here](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-hpc#rdma-capable-instances). ### **Environment** -The environment found at ``src/envrionments`` is an ACPT environment with multiple accelerators to boost the training job. If you would like to add additional packages, edit the appropriate files in ``src/environments`` with your changes, then create the custom environment using the following command: +The environment found at ``src/envrionment`` is an ACPT environment with multiple accelerators to boost the training job. If you would like to add additional packages, edit the appropriate files in ``src/environment`` with your changes, then create the custom environment using the following command: ``` -az ml environment create --file ./src/environments/env.yml +az ml environment create --file ./src/environment/env.yml ``` ## **Code** All of the code described in this document can be found either in one of the submit yml files or in the ``src`` folder of this directory. diff --git a/best-practices/largescale-deep-learning/Training/README.md b/best-practices/largescale-deep-learning/Training/README.md index 0de065c9ac..a00ca73b7f 100644 --- a/best-practices/largescale-deep-learning/Training/README.md +++ b/best-practices/largescale-deep-learning/Training/README.md @@ -8,14 +8,28 @@ Large scale training has led to state-of-the-art accuracies across a range of ta This guide will show best practices to allow you to train large models very efficiently with high throughput in AzureML, leveraging full utilization of GPU to keep the cost low. -- [Large Scale Distributed Training](#large-scale-distributed-training) - - [Setup](#setup) - - [**Linear Scaling with Infiniband Enabled SKUs**](#linear-scaling-with-infiniband-enabled-skus) - - [Optimizations](#optimizations) - - [**DeepSpeed Autotuning**](#deepspeed-autotuning) - - [Monitoring](#monitoring) -- [Create path for logging to tensorboard](#create-path-for-logging-to-tensorboard) - - [**Examples**](#examples) +- [Setup](#setup) + - [Estimate Your Memory Requirements](#estimate-memory-requirements) + - [Compute Cluster](#compute-cluster) + - [Linear Scaling with Infiniband Enabled SKUs](linear-scaling-with-infiniband-enabled-skus) + - [Environment](#environment) + - [Data Loading](#data-loading) +- [Training Optimizations for Compute and Memory Efficiency](#optimizations) + - [DeepSpeed](#deepspeed) + - [DeepSpeed Autotuning](#deepspeed-autotuning) + - [Onnx Runtime (ORT)](#onnx-runtime-ort) +- [Monitoring and Debugging](#monitoring) + - [Interactive Debugging](#interactive-debugging) + - [JupyterLab](#jupyterlab) + - [VSCode](#vscode) + - [Tensorboard](#tensorboard) + - [Pytorch Profiler](#pytorch-profiler) + - [Flops Profiler](#flops-profiler) +- [Resiliency](#resiliency) + - [Nebula Checkpointing](#nebula-checkpointing) +- [Examples](#examples) + - [BERT Pretrain](#bert-pretrain) + - [BLOOM Pretrain](#bloom-pretrain) From 37496d44e443d92923dd9e819bfe94cbe27cc5ee Mon Sep 17 00:00:00 2001 From: Razvan Tanase Date: Tue, 21 Mar 2023 15:03:17 -0700 Subject: [PATCH 4/6] more fixes --- best-practices/largescale-deep-learning/README.md | 3 +-- .../largescale-deep-learning/Training/Bert-Pretrain/README.md | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/best-practices/largescale-deep-learning/README.md b/best-practices/largescale-deep-learning/README.md index 726e0c8454..cc3a87ea93 100644 --- a/best-practices/largescale-deep-learning/README.md +++ 
b/best-practices/largescale-deep-learning/README.md @@ -3,7 +3,6 @@ ## Table of Contents - [AzureML Large Scale Deep Learning Best Practices](#azureml-large-scale-deep-learning-best-practices) - - [Table of Contents](#table-of-contents) - [Welcome](#welcome) - [Optimizations for Deep Learning in AzureML](#optimizations-for-deep-learning-in-azureml) - [Create ML resources to get started](#create-ml-resources-to-get-started) @@ -34,7 +33,7 @@ The host OS is updated with the latest drivers and patches to ensure smooth oper The AzureML Compute layer abstracts the complexities for managing the cloud scale infrastructure for compute, storage and networking. -AzureML supports curated environments for training execution on cached Docker images reducing the run preparation cost and consistency for experiment runs. The Azure Container for PyTorch ([ACPT](https://learn.microsoft.com/azure/machine-learning/reference-azure-container-for-pytorch)) Curated Environment is the built-in setup for running pytorch training experiments on the Azure AI hardware. ACPT includes a curated set of optimizer libraries to improve the training throughput with DeepSpeed for GPU memory optimization, ONNX Runtime Training for efficient op-level execution and NebulaML for fast checkpointing. +AzureML supports curated environments for training execution on cached Docker images reducing the run preparation cost and consistency for experiment runs. The Azure Container for PyTorch ([ACPT](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-azure-container-for-pytorch-environment)) Curated Environment is the built-in setup for running pytorch training experiments on the Azure AI hardware. ACPT includes a curated set of optimizer libraries to improve the training throughput with DeepSpeed for GPU memory optimization, ONNX Runtime Training for efficient op-level execution and NebulaML for fast checkpointing. The AzureML PaaS offers capabilities for the enterprise MLOps lifecycle to manage all aspects of the experimentation and deployment loops. diff --git a/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md b/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md index 91aeb4eb9b..a3b33a2df5 100644 --- a/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md +++ b/best-practices/largescale-deep-learning/Training/Bert-Pretrain/README.md @@ -56,7 +56,7 @@ encoded_dataset_train, encoded_dataset_eval = load_encoded_glue_dataset( task=task, tokenizer=tokenizer ) ``` -This is done from within the [``glue_datasets.py``](../src/glue_datasets.py) file. +This is done from within the [``glue_datasets.py``](./src/glue_datasets.py) file. ``` def load_raw_glue_dataset(task: str) -> Union[DatasetDict, Dataset]: dataset = load_dataset("glue", actual_task(task)) From f25fa18ec0e94a2fcd48bde82191aec4882e24e7 Mon Sep 17 00:00:00 2001 From: Razvan Tanase Date: Tue, 21 Mar 2023 15:04:42 -0700 Subject: [PATCH 5/6] more broken links. 
--- best-practices/largescale-deep-learning/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/largescale-deep-learning/README.md b/best-practices/largescale-deep-learning/README.md index cc3a87ea93..21ecd567ac 100644 --- a/best-practices/largescale-deep-learning/README.md +++ b/best-practices/largescale-deep-learning/README.md @@ -33,7 +33,7 @@ The host OS is updated with the latest drivers and patches to ensure smooth oper The AzureML Compute layer abstracts the complexities for managing the cloud scale infrastructure for compute, storage and networking. -AzureML supports curated environments for training execution on cached Docker images reducing the run preparation cost and consistency for experiment runs. The Azure Container for PyTorch ([ACPT](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-azure-container-for-pytorch-environment)) Curated Environment is the built-in setup for running pytorch training experiments on the Azure AI hardware. ACPT includes a curated set of optimizer libraries to improve the training throughput with DeepSpeed for GPU memory optimization, ONNX Runtime Training for efficient op-level execution and NebulaML for fast checkpointing. +AzureML supports curated environments for training execution on cached Docker images reducing the run preparation cost and consistency for experiment runs. The Azure Container for PyTorch ([ACPT](https://learn.microsoft.com/en-us/azure/machine-learning/resource-azure-container-for-pytorch)) Curated Environment is the built-in setup for running pytorch training experiments on the Azure AI hardware. ACPT includes a curated set of optimizer libraries to improve the training throughput with DeepSpeed for GPU memory optimization, ONNX Runtime Training for efficient op-level execution and NebulaML for fast checkpointing. The AzureML PaaS offers capabilities for the enterprise MLOps lifecycle to manage all aspects of the experimentation and deployment loops. From 132fdd3527cece9eef7ae238dbaf9d2b39eed3b6 Mon Sep 17 00:00:00 2001 From: Razvan Tanase Date: Tue, 21 Mar 2023 15:06:20 -0700 Subject: [PATCH 6/6] bloom environment link --- .../largescale-deep-learning/Training/Bloom-Pretrain/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/largescale-deep-learning/Training/Bloom-Pretrain/README.md b/best-practices/largescale-deep-learning/Training/Bloom-Pretrain/README.md index ed3be11a1a..f14cc79b8a 100644 --- a/best-practices/largescale-deep-learning/Training/Bloom-Pretrain/README.md +++ b/best-practices/largescale-deep-learning/Training/Bloom-Pretrain/README.md @@ -10,7 +10,7 @@ NVIDIA A100 80GB GPUs are recommended for this job. This experiment was original To attain linear scaling for large model, one important step can be to use InfiniBand. InfiniBand enables low-latency, GPU-to-GPU communication across nodes in a cluster. InfiniBand requires specialized hardware to operate. Only some VM SKUs on Azure contain this required hardware. You can view the full list of InfiniBand-enabled machine SKUs [here](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-hpc#rdma-capable-instances). ### **Environment** -The environment found [here](https://github.com/savitamittal1/Megatron-DeepSpeed-AML/blob/353db918a3a061552aa541e8d67d9b55a35b2f3d/examples/azureml/environment/context/Dockerfile) is an ACPT environment with multiple accelerators to boost the training job. Also included are HuggingFace packages used for this training. 
If you would like to add additional packages, edit the appropriate files in that directory with your changes, then create the custom environment using the following command: +The environment found [here](./src/environment/context/Dockerfile) is an ACPT environment with multiple accelerators to boost the training job. Also included are HuggingFace packages used for this training. If you would like to add additional packages, edit the appropriate files in that directory with your changes, then create the custom environment using the following command: ``` az ml environment create --file ./src/environment/env.yml ```
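For completeness, the same environment registration can also be expressed with the AzureML Python SDK v2. The sketch below is an assumed equivalent of the CLI call above, not code from this repository; the workspace details are placeholders.
```
# Hedged SDK v2 sketch: registers a custom environment built from the Docker context that
# contains the Dockerfile referenced above. Workspace details are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Environment, BuildContext
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

env = Environment(
    name="bloom-pretrain-acpt",                             # assumed name
    build=BuildContext(path="./src/environment/context"),   # folder holding the Dockerfile
    description="ACPT-based environment with Megatron-DeepSpeed and HuggingFace packages",
)
ml_client.environments.create_or_update(env)
```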