## Convert a NeuronX Distributed checkpoint to Hugging Face format for inference

The output from the training job is saved as a [NeuronX Distributed checkpoint](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/save_load_developer_guide.html). In this notebook we convert that checkpoint into a .pt weights file that can be used for inference.

To begin, we need the path to the checkpoint in the model output and the path to the Llama 70B config file. Both can be retrieved from `Notebook 2`.

### Contents

The example has the following main sections:

- [Install required packages](#Install-required-packages)
- [Convert Neuron X checkpoints to HF format](#Convert-Neuron-X-checkpoints-to-HF-format)

### Instance type quota increase

Complete the following steps:

- Open the [Service Quotas console](https://console.aws.amazon.com/servicequotas/).
- Choose Amazon EC2.
- Choose the service quota.
- Choose Request quota increase.

**Notes**: *To make sure that you have enough quotas to support your usage requirements, it's a best practice to monitor and manage your service quotas. Requests for Amazon EC2 service quota increases are subject to review by AWS engineering teams. Also, service quota increase requests aren't immediately processed when you submit a request. After your request is processed, you receive an email notification.*

*This Jupyter Notebook can be run on a t3.medium instance (`ml.t3.medium`). However, to save the pre-trained weights into a .pt weights file, we use a `trn1.32xlarge` instance type.*

*Before you run this notebook, you'll need to request a `quota increase of 32` from Amazon SageMaker for the following resources:*

1. *ml.trn1.32xlarge instance type for training job usage*

2. *ml.trn1.32xlarge instance type for training warm pool usage*

3. *Maximum number of instances per training job*

### Install required packages

In [None]:
!pip install -U sagemaker boto3 --quiet

### Convert Neuron X checkpoints to HF format

In [None]:
import sagemaker 

sess = sagemaker.Session()
# SageMaker session bucket -> used for uploading data, models and logs
# SageMaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
region_name = sess.boto_region_name

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {region_name}")

Set your Hugging Face [access token](https://huggingface.co/docs/hub/en/security-tokens):

In [None]:
access_token = "hf_xxxx"

Update the checkpoint S3 URI with the value used in `Notebook 2`:

In [None]:
# S3 checkpoint directory that contains the weights and other relevant data from the fine-tuned model
# Note: "s3://" is prepended below, so give the location here without that scheme prefix
checkpoint_s3_uri = "<fine-tuning-checkpoint-s3-uri>"
nxd_checkpoint_path = f"s3://{checkpoint_s3_uri}/neuronx_llama_experiment/checkpts/step10/model/" # Checkpoint is saved as part of Notebook 2
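
Because the cell above interpolates the placeholder after a literal `s3://`, pasting a full URI would produce `s3://s3://...`. The following is a minimal sketch of a helper that accepts either form. The name `build_checkpoint_path` is hypothetical and not part of the notebook or the SageMaker SDK, and the default suffix simply repeats the path used above.

```python
def build_checkpoint_path(
    base: str,
    suffix: str = "neuronx_llama_experiment/checkpts/step10/model/",
) -> str:
    """Join a checkpoint location and a key suffix into a single S3 URI.

    `base` may be given as "bucket/prefix" or "s3://bucket/prefix".
    """
    # Drop the scheme if the caller already included it.
    if base.startswith("s3://"):
        base = base[len("s3://"):]
    # Remove stray slashes so the join never yields "//".
    base = base.strip("/")
    if not base:
        raise ValueError("checkpoint location is empty")
    return f"s3://{base}/{suffix.lstrip('/')}"
```

With this helper, `nxd_checkpoint_path = build_checkpoint_path(checkpoint_s3_uri)` yields the same value whether or not the scheme was included.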

Hyperparameters for saving model weights to HF

In [None]:
hyperparameters = {}
hyperparameters["n_layers"] = 80
hyperparameters["pp_size"] = 8
hyperparameters["tp_size"] = 8
hyperparameters["input_dir"] = "/opt/ml/input/data/checkpoint"
hyperparameters["convert_to_full_model"] = ""
hyperparameters["output_dir"] = "/opt/ml/model"
hyperparameters["access_token"] = access_token
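
Before launching a job on a large instance, it can help to sanity-check the parallelism values. The sketch below is only an illustration: `check_parallel_layout` is a hypothetical helper, and it assumes the layers are split evenly across pipeline stages, which the conversion script itself may or may not require.

```python
def check_parallel_layout(n_layers: int, pp_size: int, tp_size: int) -> dict:
    """Return basic layout figures for a tensor/pipeline-parallel checkpoint."""
    if pp_size <= 0 or tp_size <= 0:
        raise ValueError("pp_size and tp_size must be positive")
    # Assumption: each pipeline stage holds the same number of layers.
    if n_layers % pp_size != 0:
        raise ValueError(
            f"n_layers={n_layers} is not divisible by pp_size={pp_size}"
        )
    return {
        "layers_per_stage": n_layers // pp_size,
        "model_parallel_ranks": pp_size * tp_size,
    }


# Values taken from the hyperparameters cell above.
layout = check_parallel_layout(n_layers=80, pp_size=8, tp_size=8)
print(layout)  # {'layers_per_stage': 10, 'model_parallel_ranks': 64}
```

These values should match the ones used when the checkpoint was written in `Notebook 2`.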

In [None]:
# Docker image for training models on AWS Trainium
docker_image = f"763104351884.dkr.ecr.{region_name}.amazonaws.com/pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.17.0-ubuntu20.04"

For more details about Neuron Docker images:
- [AWS Neuron Deep Learning Containers](https://github.com/aws-neuron/deep-learning-containers/tree/main0)
- [Available Deep Learning Containers Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md)

In [None]:
from sagemaker.pytorch import PyTorch

# Handle end-to-end Amazon SageMaker training and deployment tasks.
# NOTE: Multi-node with torchrun is a work in progress. Use a single node.
estimator = PyTorch(
    base_job_name="neuronx-convert-checkpoint-to-hf",
    source_dir="./scripts",
    entry_point="convert_checkpoints.py",
    role=role,
    image_uri=docker_image,
    instance_count=1,
    instance_type="ml.trn1.32xlarge",
    sagemaker_session=sess,
    volume_size=1024,
    hyperparameters=hyperparameters,
    debugger_hook_config=False,
    disable_output_compression=True,
    keep_alive_period_in_seconds=600,
)

In [None]:
# Start SageMaker job
estimator.fit({"checkpoint": nxd_checkpoint_path})

In [None]:
model_path = estimator.model_data['S3DataSource']['S3Uri']

In [None]:
print(f"You can find the converted weights here {model_path}")
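
To work with the converted weights later (for example, to list or download them), the URI usually has to be split into a bucket and a key prefix. Here is a minimal sketch using only the standard library; `split_s3_uri` is a hypothetical helper name, and the URI in the example is made up.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split "s3://bucket/key/prefix" into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path keeps its leading slash; S3 keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


# Example with a made-up URI; in the notebook you would pass model_path.
bucket, prefix = split_s3_uri("s3://my-bucket/job-name/output/model/")
print(bucket, prefix)
```

The resulting pair can then be passed to, for instance, the boto3 S3 client's `list_objects_v2(Bucket=bucket, Prefix=prefix)`.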