Let's export the trained model in ONNX and safetensors formats for compatibility with downstream inference engines. First, we'll define the model name, the checkpoint paths, and the export directory.

In [33]:
model_name = "lightgpt-small"
checkpoint_path = "./checkpoints/checkpoint.pt"
lora_path = None  # Set to e.g. "./checkpoints/lora_instruction.pt" to merge a LoRA checkpoint
exports_path = "./exports"

Then, we'll load the base model checkpoint into memory from disk.

In [34]:
import torch

from model import LightGPT

checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=True)

model = LightGPT(**checkpoint["model_args"])

model = torch.compile(model)

model.load_state_dict(checkpoint["model"])

print("Base checkpoint loaded successfully")

Base checkpoint loaded successfully
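
Note that we compile the model before calling `load_state_dict`. `torch.compile` wraps the module, and the wrapper's state dict keys carry an `_orig_mod.` prefix, so a checkpoint saved from a compiled model only lines up with another compiled model. If you'd rather load such a checkpoint into an uncompiled model, one option is to strip the prefix first. The sketch below assumes the checkpoint was saved from a compiled model, which this notebook doesn't state explicitly; the helper name `strip_compile_prefix` and the parameter names are hypothetical.

```python
def strip_compile_prefix(state_dict, prefix="_orig_mod."):
    """Return a copy of state_dict with the torch.compile key prefix removed."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Plain values stand in for tensors to keep the example self-contained.
# The parameter names are made up and not taken from model.py.
compiled_state = {
    "_orig_mod.token_embeddings.weight": 1,
    "_orig_mod.output_layer.weight": 2,
}

print(strip_compile_prefix(compiled_state))
```

With a real checkpoint, the cleaned dictionary could then be passed to `load_state_dict` on an uncompiled `LightGPT` instance.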


Now, we'll load any LoRA checkpoints we wish to incorporate into the exported model.

In [35]:
from model import LightGPTInstruct

if lora_path is not None:
    checkpoint = torch.load(lora_path, map_location="cpu", weights_only=True)

    model = LightGPTInstruct(model, **checkpoint["lora_args"])

    model = torch.compile(model)

    model.load_state_dict(checkpoint["lora"], strict=False)

    model.merge_lora_parameters()

    print("LoRA checkpoint loaded successfully")
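
To see what merging does, here's a minimal NumPy sketch of the standard LoRA update, where the scaled low-rank product is added to the frozen weight so that the adapter adds no extra cost at inference time. We're assuming `merge_lora_parameters()` does something along these lines; the function name `merge_lora`, the shapes, and the `alpha / rank` scaling are illustrative and not taken from `model.py`.

```python
import numpy as np


def merge_lora(weight, lora_a, lora_b, alpha):
    """Fold a low-rank update into a dense weight: W + (alpha / rank) * B @ A."""
    rank = lora_a.shape[0]

    return weight + (alpha / rank) * (lora_b @ lora_a)


rng = np.random.default_rng(0)

weight = rng.normal(size=(8, 16))  # Frozen base weight (out_features, in_features)
lora_a = rng.normal(size=(2, 16))  # Down-projection (rank, in_features)
lora_b = rng.normal(size=(8, 2))  # Up-projection (out_features, rank)

merged = merge_lora(weight, lora_a, lora_b, alpha=4.0)

x = rng.normal(size=(16,))

# Output of the base path plus the separately applied adapter path.
separate = weight @ x + (4.0 / 2) * (lora_b @ (lora_a @ x))

# The merged weight reproduces the combined output with a single matmul.
assert np.allclose(merged @ x, separate)
```

After merging, the exported graph only contains the dense weights, which is why we merge before exporting.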

For the ONNX export, we'll use TorchDynamo to trace the model's FX graph with an example input and then translate that intermediate representation to ONNX.

In [36]:
from os import path

from model import ONNXModel

from torch.onnx import dynamo_export, ExportOptions

example_input = torch.randint(0, model.vocabulary_size, (1, 1024))  # Upper bound is exclusive

onnx_model = ONNXModel(model)  # Nicer inferencing API

onnx_model.eval()  # Turn off dropout and other train-time operations

export_options = ExportOptions(
    dynamic_shapes=True
)  # Necessary for variable batch and sequence lengths

onnx_model = dynamo_export(onnx_model, example_input, export_options=export_options)

onnx_path = path.join(exports_path, f"{model_name}.onnx")

onnx_model.save(onnx_path)

print(f"Model saved to {onnx_path}")



Applied 73 of general pattern rewrite rules.
Model saved to ./exports/lightgpt-small.onnx


Next, we'll compare the logits from PyTorch with those from ONNX Runtime to check that they agree within a small tolerance.

In [37]:
import onnxruntime

from numpy.testing import assert_allclose

pytorch_logits = model.predict(example_input).detach().numpy()

session = onnxruntime.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name  # "l_x_" in this export

onnx_input = {input_name: example_input.numpy()}

onnx_logits = session.run(None, onnx_input)

onnx_logits = onnx_logits[0]

assert_allclose(pytorch_logits, onnx_logits, rtol=1e-2, atol=1e-03)

print("Looks good!")

Looks good!
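
For reference, `assert_allclose` passes when every element satisfies `|actual - desired| <= atol + rtol * |desired|`, where `desired` is the second argument. The sketch below reproduces that check by hand with the same tolerances to show how much drift they allow. The helper `within_tolerance` and the values are made up for illustration.

```python
import numpy as np


def within_tolerance(actual, desired, rtol=1e-2, atol=1e-3):
    """Element-wise version of the check that assert_allclose performs."""
    return np.abs(actual - desired) <= atol + rtol * np.abs(desired)


desired = np.array([10.0, 0.0, -2.0])
actual = np.array([10.05, 0.0005, -2.5])

# Only the last element drifts by more than the allowed amount.
print(within_tolerance(actual, desired))
```

Because the allowed error scales with the magnitude of `desired`, large logits get more absolute slack than logits near zero, where `atol` dominates.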


Next, let's export the model in Hugging Face format so that it can be used with the Hugging Face ecosystem.

In [40]:
from os import path

hf_path = path.join(exports_path, model_name)

model.save_pretrained(hf_path)

print(f"Model saved to {hf_path}")

Model saved to ./exports/lightgpt-small


Lastly, we'll work around the Hugging Face Hub's poor support for multiple models by uploading each model to its own repository.

In [41]:
model.push_to_hub("lightgpt-small")

model.safetensors: 100%|██████████| 1.41G/1.41G [08:10<00:00, 2.88MB/s]  


CommitInfo(commit_url='https://huggingface.co/andrewdalpino/lightgpt-small/commit/be1a6b528bb2e58a95756cb604c9c4ac459085ce', commit_message='Push model using huggingface_hub.', commit_description='', oid='be1a6b528bb2e58a95756cb604c9c4ac459085ce', pr_url=None, repo_url=RepoUrl('https://huggingface.co/andrewdalpino/lightgpt-small', endpoint='https://huggingface.co', repo_type='model', repo_id='andrewdalpino/lightgpt-small'), pr_revision=None, pr_num=None)