# (OPTIONAL) Convert a NeuronX distributed checkpoint to Hugging Face format for inference

<div class="alert alert-block alert-warning"> 

<b>NOTE: This notebook is optional.</b> Run it only if you are experimenting with the **continuous pretraining** process on NeuronX and have already executed `Notebook 1` and `Notebook 2`. 
</div>

The output of the training job is saved as a [NeuronX checkpoint](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/save_load_developer_guide.html). In this notebook, we convert the NeuronX distributed checkpoint into a `.pt` weights file that can be used for inference.
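Conceptually, a distributed checkpoint stores the model in pieces: tensor parallelism splits individual weight matrices across ranks, and pipeline parallelism places different layers on different stages. In the simplest case, converting to a full model means concatenating the tensor-parallel pieces of each weight along the dimension they were split on and then collecting the parameters from every stage. The pure-Python sketch below illustrates that idea with nested lists standing in for 2-D tensors. The function names are hypothetical, and this is not the logic of `convert_checkpoints.py`, which handles real tensors and layer-specific details.

```python
# Hypothetical illustration of merging sharded weights.
# Nested lists stand in for 2-D tensors; real converters work on torch tensors.


def concat_dim0(shards):
    """Merge tensor-parallel shards that were split along dim 0 (rows)."""
    merged = []
    for shard in shards:
        merged.extend(row[:] for row in shard)
    return merged


def concat_dim1(shards):
    """Merge tensor-parallel shards that were split along dim 1 (columns)."""
    n_rows = len(shards[0])
    return [[value for shard in shards for value in shard[r]] for r in range(n_rows)]


def merge_pipeline_stages(stage_state_dicts):
    """Collect the parameters held by each pipeline stage into one state dict."""
    full = {}
    for state in stage_state_dicts:
        full.update(state)  # each stage holds a different set of layers
    return full


full = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(concat_dim0([[[1, 2, 3, 4]], [[5, 6, 7, 8]]]) == full)      # True
print(concat_dim1([[[1, 2], [5, 6]], [[3, 4], [7, 8]]]) == full)  # True
```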

To begin, we retrieve the path to the checkpoint from the model output, as well as the path to the Llama 70B config file. Both can be retrieved from `Notebook 2`.

## Prerequisites

---
This Jupyter notebook can be run on an `ml.t3.medium` instance. However, to run the training job that prepares the pre-trained weights for the continuous pre-training process, you may need to request a quota increase. The number of instances to request depends on how quickly you want the training job to complete, and ranges between **8** and **32** instances.

To request a quota increase, follow these steps:

1. Navigate to the [Service Quotas console](https://console.aws.amazon.com/servicequotas/).
2. Choose Amazon SageMaker.
3. Review your default quota for the following resources:
   - `ml.trn1.32xlarge` for training job usage
   - `ml.trn1.32xlarge` for training warm pool usage
   - `Maximum number of instances per training job`
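If you prefer to review these quotas programmatically instead of in the console, the following is a sketch using the boto3 Service Quotas client. It assumes your credentials allow `servicequotas:ListServiceQuotas` and matches the quota names exactly as listed above. Note that `list_service_quotas` returns only quotas whose applied value is available, so a quota missing from the result may still exist at its default value (see `list_aws_default_service_quotas`). The helper names are hypothetical.

```python
# Sketch: look up the SageMaker quotas listed above with boto3's Service Quotas client.
# Quota names are matched exactly as they appear in the Service Quotas console.

QUOTA_NAMES = {
    "ml.trn1.32xlarge for training job usage",
    "ml.trn1.32xlarge for training warm pool usage",
    "Maximum number of instances per training job",
}


def select_quotas(quotas, names=QUOTA_NAMES):
    """Return {quota name: value} for the quotas whose name is in `names`."""
    return {q["QuotaName"]: q["Value"] for q in quotas if q["QuotaName"] in names}


def fetch_sagemaker_quotas(region_name=None):
    """Collect the applied SageMaker quotas, following pagination."""
    import boto3  # imported here so select_quotas can be used without boto3

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


# Usage (requires AWS credentials):
# print(select_quotas(fetch_sagemaker_quotas()))
```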

<div class="alert alert-block alert-warning"> 

<b>NOTE:</b> To make sure that you have enough quota for your usage requirements, it's a best practice to monitor and manage your service quotas. Requests for Amazon SageMaker service quota increases are subject to review by AWS engineering teams and aren't processed immediately when you submit them. After your request is processed, you receive an email notification.
</div>

## Contents

---
The example has the following main sections:

1. [Requirements](#Requirements)
2. [Setup](#Setup)
3. [Training job parameters](#Training-job-parameters)
4. [Run training job](#Run-training-job)

## Requirements
---

1. Create an Amazon SageMaker Notebook Instance - [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html)
    - For Notebook Instance type, choose ml.t3.medium.
2. For Select Kernel, choose [conda_python3](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-prepare.html).
3. Install the required packages.

<div class="alert alert-block alert-info"> 

<b>NOTE:</b> For <a href="https://aws.amazon.com/sagemaker/studio/" target="_blank">Amazon SageMaker Studio</a>, select kernel "<span style="color:green;">Base Python 3.0</span>"

</div>

To run this notebook, install the following dependencies:

In [None]:
!pip install -U sagemaker boto3 --quiet

## Setup
---

In [None]:
import sagemaker 

sess = sagemaker.Session()
# SageMaker session bucket -> used for uploading data, models, and logs
# SageMaker automatically creates this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
region_name = sess.boto_region_name

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {region_name}")

Retrieve the checkpoint S3 URI, using the value set in `Notebook 2`.

In [None]:
# Retrieve the checkpoint S3 URI stored with the %store magic
%store -r checkpoint_s3_uri 

In [None]:
if 'checkpoint_s3_uri' not in vars():
    print("The variable checkpoint_s3_uri does not exist. Before continuing with this notebook, check the value for checkpoint_s3_uri in Notebook 2 and define the variable within this notebook")
else:
    print(checkpoint_s3_uri)

In [None]:
# S3 checkpoint directory that contains the weights and other relevant data from the continuous pre-training model
nxd_checkpoint_path = f"s3://{checkpoint_s3_uri}/neuronx_llama_experiment/checkpts/step10/model/" # Checkpoint is saved as part of Notebook 2
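The cell above prepends `s3://` to `checkpoint_s3_uri`, which assumes the stored value is a bucket and prefix without the scheme. If your value from `Notebook 2` already starts with `s3://`, the path would become `s3://s3://...`. The hypothetical helper below accepts either form and tolerates a trailing slash. The default suffix is copied from the cell above; the `step10` directory corresponds to the checkpoint saved in `Notebook 2`, so adjust it if your run saved a different step.

```python
def build_checkpoint_path(checkpoint_s3_uri,
                          suffix="neuronx_llama_experiment/checkpts/step10/model/"):
    """Join a checkpoint location and a suffix into a single s3:// URI.

    Hypothetical helper: accepts `checkpoint_s3_uri` with or without the
    "s3://" scheme and with or without a trailing slash.
    """
    base = checkpoint_s3_uri
    if base.startswith("s3://"):
        base = base[len("s3://"):]
    base = base.rstrip("/")
    return f"s3://{base}/{suffix.lstrip('/')}"


# Usage:
# nxd_checkpoint_path = build_checkpoint_path(checkpoint_s3_uri)
```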

In [None]:
# Docker image for training models on AWS Trainium
docker_image = f"763104351884.dkr.ecr.{region_name}.amazonaws.com/pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.18.0-ubuntu20.04"

For more details about Neuron Docker images, see:
- [AWS Neuron Deep Learning Containers](https://github.com/aws-neuron/deep-learning-containers/tree/main)
- [Available Deep Learning Containers Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md)
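Reading the image tag helps when you pick a different image from the lists above. Judging from the URI used in this notebook, the tag encodes the PyTorch version, the `neuronx` label, the Python version, the Neuron SDK version, and the base OS. The hypothetical helper below splits a URI into those parts, assuming that five-part tag layout; `us-east-1` in the example is just a sample region.

```python
def describe_neuron_dlc_image(image_uri):
    """Split a Neuron DLC image URI into registry, repository, and tag fields.

    Hypothetical helper; assumes a five-part tag such as
    "1.13.1-neuronx-py310-sdk2.18.0-ubuntu20.04".
    """
    repo_path, tag = image_uri.rsplit(":", 1)
    registry, repository = repo_path.split("/", 1)
    framework, accelerator, python_tag, sdk_tag, os_tag = tag.split("-")
    return {
        "registry": registry,
        "repository": repository,
        "framework_version": framework,
        "accelerator": accelerator,
        "python": python_tag,
        "neuron_sdk": sdk_tag[len("sdk"):] if sdk_tag.startswith("sdk") else sdk_tag,
        "os": os_tag,
    }


uri = ("763104351884.dkr.ecr.us-east-1.amazonaws.com/"
       "pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.18.0-ubuntu20.04")
print(describe_neuron_dlc_image(uri)["neuron_sdk"])  # 2.18.0
```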

Replace the placeholder below with your Hugging Face [access token](https://huggingface.co/docs/hub/en/security-tokens).

In [None]:
access_token = "hf_xxxx"
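Hard-coding the token in the notebook makes it easy to leak when the notebook is shared. As an alternative, the hypothetical helper below reads the token from the `HF_TOKEN` environment variable (the same variable `huggingface_hub` reads) and rejects the `hf_xxxx` placeholder. The `hf_` prefix check reflects the current format of Hugging Face user access tokens.

```python
import os


def resolve_hf_token(default=None):
    """Return the token from the HF_TOKEN environment variable, else `default`.

    Raises ValueError if no token is found, if it is still the "hf_xxxx"
    placeholder, or if it lacks the "hf_" prefix.
    """
    token = os.environ.get("HF_TOKEN", default)
    if not token or token == "hf_xxxx":
        raise ValueError("Set HF_TOKEN or replace the hf_xxxx placeholder with a real token")
    if not token.startswith("hf_"):
        raise ValueError("Hugging Face user access tokens start with 'hf_'")
    return token


# Usage:
# access_token = resolve_hf_token(default=access_token)
```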

## Training job parameters
---

Define the hyperparameters that the conversion script uses to save the model weights in Hugging Face format.

In [None]:
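hyperparameters = {
    "n_layers": 80,  # number of transformer layers in Llama 70B
    "pp_size": 8,  # pipeline-parallel degree; should match the checkpoint from Notebook 2
    "tp_size": 8,  # tensor-parallel degree; should match the checkpoint from Notebook 2
    "input_dir": "/opt/ml/input/data/checkpoint",  # mount path of the "checkpoint" input channel
    "convert_to_full_model": "",  # flag passed through to convert_checkpoints.py
    "output_dir": "/opt/ml/model",  # SageMaker uploads this directory to S3 when the job ends
    "access_token": access_token,  # Hugging Face access token
}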
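To see what `n_layers`, `pp_size`, and `tp_size` describe, the arithmetic below assumes the 80 layers were split evenly and contiguously across the 8 pipeline stages during training. That is a common default, but this notebook does not confirm it. Under that assumption each stage holds 10 layers, and if the checkpoint has one shard per (tensor-parallel rank, pipeline stage) pair, the converter has 8 * 8 = 64 shards to merge. The function names are hypothetical.

```python
def layers_per_stage(n_layers, pp_size):
    """Map each pipeline stage to the layer indices it holds (even, contiguous split assumed)."""
    if n_layers % pp_size != 0:
        raise ValueError("n_layers must be divisible by pp_size for an even split")
    per_stage = n_layers // pp_size
    return {stage: range(stage * per_stage, (stage + 1) * per_stage) for stage in range(pp_size)}


def stage_of_layer(layer, n_layers, pp_size):
    """Pipeline stage that holds `layer`, under the same even-split assumption."""
    return layer // (n_layers // pp_size)


stages = layers_per_stage(n_layers=80, pp_size=8)
print(len(stages[0]))             # 10
print(stage_of_layer(75, 80, 8))  # 7
print(8 * 8)                      # 64, assuming one shard per (tp rank, pp stage)
```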

## Run training job
---

In [None]:
from sagemaker.pytorch import PyTorch

# Handle end-to-end Amazon SageMaker training and deployment tasks.
# NOTE: Multi-node execution with torchrun is a work in progress; use a single node.
estimator = PyTorch(
    base_job_name="neuronx-convert-checkpoint-to-hf",
    source_dir="./scripts",
    entry_point="convert_checkpoints.py",
    role=role,
    image_uri=docker_image,
    instance_count=1,
    instance_type="ml.trn1.32xlarge",
    sagemaker_session=sess,
    volume_size=1024,
    hyperparameters=hyperparameters,
    debugger_hook_config=False,
    disable_output_compression=True,
    keep_alive_period_in_seconds=600,
)

In [None]:
# Start SageMaker job
estimator.fit({"checkpoint": nxd_checkpoint_path})
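The key `checkpoint` in the dictionary passed to `fit()` is the input channel name, and SageMaker makes that channel's data available in the container under `/opt/ml/input/data/<channel_name>`. That is why `input_dir` is set to `/opt/ml/input/data/checkpoint`; if you rename the channel, both values must change together. A small hypothetical check you could run before calling `fit()`:

```python
def channel_path(channel_name):
    """Path at which SageMaker makes a training input channel available in the container."""
    return f"/opt/ml/input/data/{channel_name}"


def check_input_dir(hyperparameters, inputs):
    """Raise ValueError unless hyperparameters["input_dir"] matches a channel passed to fit()."""
    expected = {channel_path(name) for name in inputs}
    actual = hyperparameters["input_dir"].rstrip("/")
    if actual not in expected:
        raise ValueError(f"input_dir {actual!r} does not match any channel path: {sorted(expected)}")


# Usage before calling fit():
# check_input_dir(hyperparameters, {"checkpoint": nxd_checkpoint_path})
```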

In [None]:
model_path = estimator.model_data['S3DataSource']['S3Uri']
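Because the estimator sets `disable_output_compression=True`, `estimator.model_data` is a dictionary describing an `S3DataSource` (an S3 prefix) rather than the `s3://.../model.tar.gz` string returned for compressed output, and the cell above relies on that. The hypothetical helper below handles both shapes, which is useful if you reuse this code with compression enabled.

```python
def model_output_uri(model_data):
    """Return the S3 URI of a training job's model output.

    Accepts either the string form (compressed model.tar.gz) or the
    dict form with an "S3DataSource" entry (uncompressed output prefix).
    """
    if isinstance(model_data, str):
        return model_data
    return model_data["S3DataSource"]["S3Uri"]


# Usage:
# model_path = model_output_uri(estimator.model_data)
```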

In [None]:
print(f"You can find the converted weights here: {model_path}")

# Thank You!