Let's export the trained model in the safetensors format for compatibility with downstream inference engines. First, we'll define a few variables: the model name, the path to the training checkpoint, and the directory to write exports to.

In [None]:
model_name = "NoPE-GPT-400M-Base"
checkpoint_path = "./checkpoints/checkpoint.pt"
exports_path = "./exports"

Then, we'll load the base model checkpoint from disk, rebuild the model from its saved arguments, and restore its weights.

In [2]:
import torch

from src.nope_gpt.model import NoPEGPT

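# weights_only=False allows unpickling non-tensor objects (such as the tokenizer),
# so only load checkpoints from sources you trust.
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)

tokenizer = checkpoint["tokenizer"]

# Rebuild the model with the same hyperparameters it was trained with.
model = NoPEGPT(**checkpoint["model_args"])

# Compile before loading so the parameter names match the checkpoint, which was
# presumably saved from a compiled model (keys prefixed with "_orig_mod.").
model = torch.compile(model)

model.load_state_dict(checkpoint["model"])

print("Base checkpoint loaded successfully")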

Base checkpoint loaded successfully
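The cell above compiles the model before calling `load_state_dict`. That ordering matters if the checkpoint was saved from a compiled model: `torch.compile` wraps the module in an `OptimizedModule` that keeps the original under the attribute `_orig_mod`, so every key in its state dict carries an `_orig_mod.` prefix. As a minimal sketch, assuming that is how this checkpoint was written, the prefix can instead be stripped so the weights load into a plain, uncompiled model. `strip_compile_prefix` is a hypothetical helper, and a tiny `nn.Linear` stands in for `NoPEGPT`.

```python
import torch
from torch import nn


def strip_compile_prefix(state_dict, prefix="_orig_mod."):
    """Return a copy of state_dict with the torch.compile prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Hypothetical stand-in for NoPEGPT.
source = nn.Linear(4, 2)

# Simulate a checkpoint saved from a compiled model by prefixing every key.
compiled_state = {f"_orig_mod.{key}": value for key, value in source.state_dict().items()}

# Load the cleaned weights into a fresh, uncompiled model.
target = nn.Linear(4, 2)
target.load_state_dict(strip_compile_prefix(compiled_state))

print(sorted(strip_compile_prefix(compiled_state)))  # ['bias', 'weight']
print(torch.equal(target.weight, source.weight))  # True
```

Loading into the plain module this way avoids exporting through the compiled wrapper. PyTorch also provides `torch.nn.modules.utils.consume_prefix_in_state_dict_if_present`, which removes a prefix from a state dict in place.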


Next, let's export the model in Hugging Face format so that it can be used with the Hugging Face ecosystem.

In [3]:
from os import path

hf_path = path.join(exports_path, model_name)

# Write the model weights to hf_path (as model.safetensors).
model.save_pretrained(hf_path)

print(f"Model saved to {hf_path}")

Model saved to ./exports/NoPE-GPT-400M-Base
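For a sense of why safetensors is easy for other runtimes to consume, the sketch below reads a file's header using only the Python standard library. It relies on the published layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes). The helper names `read_safetensors_header` and `summarize_tensors` are hypothetical, and the demo file is built by hand rather than taken from the export above.

```python
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_header(file_path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(file_path, "rb") as file:
        # First 8 bytes: header size as an unsigned little-endian 64-bit integer.
        (header_size,) = struct.unpack("<Q", file.read(8))

        # Next header_size bytes: a UTF-8 JSON object describing every tensor.
        return json.loads(file.read(header_size).decode("utf-8"))


def summarize_tensors(header):
    """Map each tensor name to (dtype, shape), skipping the optional metadata entry."""
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


# Demo: hand-write a tiny file holding one float32 tensor named "bias" with shape [2].
header = {
    "__metadata__": {"format": "pt"},
    "bias": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
}
header_bytes = json.dumps(header).encode("utf-8")
tensor_bytes = struct.pack("<2f", 0.5, -1.0)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "tiny.safetensors"
    file_path.write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes)

    summary = summarize_tensors(read_safetensors_header(file_path))

print(summary)  # {'bias': ('F32', [2])}
```

Because the header is plain JSON and the tensor data is raw bytes, a reader never has to execute pickled Python code. Pointing `read_safetensors_header` at the `model.safetensors` file inside `hf_path` should list the exported parameter names and shapes without loading the weights into memory.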


Lastly, we'll log in to the Hugging Face Hub and upload the model to a repository under our account.

In [4]:
from huggingface_hub import notebook_login

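# Prompt for a Hugging Face access token (it needs write access to upload).
notebook_login()

# Create the repository under the logged-in account if needed, then upload the weights.
model.push_to_hub(model_name)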

CommitInfo(commit_url='https://huggingface.co/andrewdalpino/NoPE-GPT-400M-Base/commit/865a878d37ffdd21d27209f7a4cc3ae9523e5d3f', commit_message='Push model using huggingface_hub.', commit_description='', oid='865a878d37ffdd21d27209f7a4cc3ae9523e5d3f', pr_url=None, repo_url=RepoUrl('https://huggingface.co/andrewdalpino/NoPE-GPT-400M-Base', endpoint='https://huggingface.co', repo_type='model', repo_id='andrewdalpino/NoPE-GPT-400M-Base'), pr_revision=None, pr_num=None)