# Lab 7 – Reproducible Packaging and Release (Local Version)
This notebook is adapted to run locally. Before running it, install the dependencies with `pip install -r requirements.txt`.
All outputs are written under `./slm-labs/`.

This is the final step in our Small Language Model lab series. Here we package the tuned model into a reusable, shareable form. The goal is to save the adapters, tokenizer, and metadata in a clean folder structure and, optionally, push them to the Hugging Face Hub.

## Step 0 Stable installs

In [1]:
%pip install -q --force-reinstall numpy==2.0.2 pandas==2.2.2 pyarrow==17.0.0
# Quote the version specifiers: unquoted, the shell treats ">" as output redirection.
%pip install -q "datasets>=3.0.0" "transformers>=4.41.0" "peft>=0.11.0" "accelerate>=0.29.0" "sentencepiece>=0.1.99" "tqdm>=4.66.0" bitsandbytes
print("If imports fail, restart runtime and re-run this cell.")

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m60.9/60.9 kB[0m [31m3.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m19.2/19.2 MB[0m [31m65.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.7/12.7 MB[0m [31m79.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m39.9/39.9 MB[0m [31m16.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m229.9/229.9 kB[0m [31m15.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m509.2/509.2 kB[0m [31m35.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m347.8/347.8 kB[0m [31m26.6 MB/s[0m eta [36m0:00:00[0m
[?25hIf imports fail, restart runtime and re-run this cell.
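
The `>=` ranges above allow different versions on different machines, which works against reproducibility. A minimal sketch, assuming you want to record the exact versions that actually got installed: `freeze_versions` and `PACKAGES` are hypothetical names, and the list is simply the Step 0 packages plus `torch`. It uses only the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: the packages installed in Step 0, plus torch (used in Step 2).
PACKAGES = ["numpy", "pandas", "pyarrow", "datasets", "transformers", "peft",
            "accelerate", "sentencepiece", "tqdm", "bitsandbytes", "torch"]

def freeze_versions(packages):
    """Return (pinned 'name==version' lines, names that are not installed)."""
    pinned, missing = [], []
    for name in packages:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return pinned, missing

pinned, missing = freeze_versions(PACKAGES)
print("\n".join(pinned))
if missing:
    print("Not installed:", ", ".join(missing))
```

The printed lines use `requirements.txt` syntax, so they can be saved next to the release (for example as a hypothetical `requirements-lock.txt`) and reinstalled later with `pip install -r`.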


## Step 1 Auto-detect best adapters
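
The cell for this step is not included here, but Step 2 expects `BEST_DIR` (the adapter folder to load) and Step 3 expects `BASE` (the lab output root). A minimal sketch under stated assumptions: PEFT's `save_pretrained` writes an `adapter_config.json` into every adapter folder, so we search `./slm-labs/` for that file. Treating the **most recently written** adapter as the "best" one is an assumption of this sketch, not necessarily the selection rule the series used in Lab 5; `find_adapter_dirs` is a hypothetical helper.

```python
from pathlib import Path

def find_adapter_dirs(root):
    """Return folders under root that contain adapter_config.json,
    newest first by the config file's modification time."""
    root = Path(root)
    configs = sorted(root.rglob("adapter_config.json"),
                     key=lambda p: p.stat().st_mtime, reverse=True)
    return [cfg.parent for cfg in configs]

# Lab output root, as stated at the top of this notebook.
BASE = Path("./slm-labs")
candidates = find_adapter_dirs(BASE) if BASE.exists() else []
# Assumption: the newest adapter folder is the one to package.
BEST_DIR = candidates[0] if candidates else None
print("Adapter candidates:", [str(c) for c in candidates[:5]])
print("Using BEST_DIR =", BEST_DIR)
```

If Lab 5 recorded which run scored best, set `BEST_DIR` to that folder directly instead of relying on modification times.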

## Step 2 Reload base model and adapters

In [3]:
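import importlib.util
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = 'HuggingFaceH4/zephyr-7b-beta'
# BEST_DIR (the adapter folder chosen in Step 1) must already be defined.

# Loading options: 4-bit NF4 on a GPU when bitsandbytes is installed,
# plain fp16 on a GPU otherwise, and fp32 on CPU.
has_gpu = torch.cuda.is_available()
has_bnb = importlib.util.find_spec('bitsandbytes') is not None
if has_gpu and has_bnb:
    kw = dict(
        device_map='auto',
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type='nf4',
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
        ),
        torch_dtype=torch.float16,
    )
elif has_gpu:
    kw = dict(device_map='auto', torch_dtype=torch.float16)
else:
    kw = dict(torch_dtype=torch.float32)

Tok = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
if Tok.pad_token is None:
    Tok.pad_token = Tok.eos_token  # reuse EOS as the padding token

Base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, **kw)
# Attach the saved LoRA adapters on top of the base weights.
Tuned = PeftModel.from_pretrained(Base, str(BEST_DIR))
Tuned.eval()
print("Model with adapters ready.")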

`torch_dtype` is deprecated! Use `dtype` instead!

Model with adapters ready.


## Step 3 Save adapters and tokenizer

In [4]:
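# BASE (a pathlib.Path to the lab output root, ./slm-labs/ here) must be defined before this cell.
REL_DIR = BASE / 'lab7_release'
REL_DIR.mkdir(parents=True, exist_ok=True)

# Saves the tokenizer files, then only the LoRA adapter weights and
# adapter_config.json (not the full base model).
Tok.save_pretrained(str(REL_DIR))
Tuned.save_pretrained(str(REL_DIR))
print("Saved adapters and tokenizer to", REL_DIR)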

Saved adapters and tokenizer to /content/drive/MyDrive/slm-labs/lab7_release
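
Before moving on, it is worth checking that the release folder actually contains everything needed to reload the model. A minimal sketch, with the expected file names as assumptions: recent PEFT versions write `adapter_model.safetensors` (older ones wrote `adapter_model.bin`), and a fast tokenizer writes `tokenizer.json` and may also write the SentencePiece `tokenizer.model`. `EXPECTED` and `missing_release_files` are hypothetical names.

```python
from pathlib import Path

# Assumed layout after Tok.save_pretrained + Tuned.save_pretrained.
# Each label is satisfied if ANY one of its alternative file names exists.
EXPECTED = {
    "adapter config": ("adapter_config.json",),
    "adapter weights": ("adapter_model.safetensors", "adapter_model.bin"),
    "tokenizer config": ("tokenizer_config.json",),
    "tokenizer vocab": ("tokenizer.json", "tokenizer.model"),
}

def missing_release_files(rel_dir, expected=EXPECTED):
    """Return the labels for which none of the alternative files exist in rel_dir."""
    rel_dir = Path(rel_dir)
    return [label for label, names in expected.items()
            if not any((rel_dir / n).is_file() for n in names)]
```

In the notebook, `print(missing_release_files(REL_DIR) or "Release folder complete.")` prints either the missing pieces or a confirmation.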


## Step 4 Write a model card

In [5]:
card = REL_DIR / 'README.md'
# Write as UTF-8 so the en dash in "Lab 1–7" is saved the same way on every OS.
with open(card, 'w', encoding='utf-8') as f:
    f.write("# Domain-Tuned Small Language Model\n")
    f.write("This model was fine-tuned with LoRA adapters as part of the Lab 1–7 SLM training series.\n")
    f.write("\n")
    f.write("## Base Model\n")
    f.write(f"{BASE_MODEL}\n")
    f.write("\n")
    f.write("## Training Data\n")
    f.write("Domain text from ncbi/Open-Patients, prepared in Lab 3.\n")
    f.write("\n")
    f.write("## Method\n")
    f.write("LoRA fine-tuning with Unsloth; adapters attached in Lab 4, optimized in Lab 5, and evaluated in Lab 6.\n")
    f.write("\n")
    f.write("## Intended Use\n")
    f.write("For experimentation and research. Not for clinical or production use without further validation.\n")
print("Model card saved to", card)

Model card saved to /content/drive/MyDrive/slm-labs/lab7_release/README.md
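
On the Hugging Face Hub, a model card can begin with a YAML metadata block between `---` lines, which the Hub reads for fields such as `base_model` and `library_name`. A minimal sketch, assuming you want the card above to carry that metadata: `with_front_matter` is a hypothetical helper, and any `tags` you pass are your own choice, not something the series prescribes.

```python
def with_front_matter(card_text, base_model, library_name="peft", tags=()):
    """Prepend a YAML metadata block to a model card unless it already starts with one."""
    if card_text.startswith("---"):
        return card_text  # front matter already present; do not add a second block
    lines = ["---", f"base_model: {base_model}", f"library_name: {library_name}"]
    if tags:
        lines.append("tags:")
        lines.extend(f"- {tag}" for tag in tags)
    lines.append("---")
    return "\n".join(lines) + "\n" + card_text
```

In the notebook this could be applied in place with `card.write_text(with_front_matter(card.read_text(encoding='utf-8'), BASE_MODEL), encoding='utf-8')`. The early return makes re-running the cell harmless.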


## Step 5 Optional push to Hugging Face Hub

In [6]:
# To push to the Hugging Face Hub, first log in. Calling login() without arguments
# prompts for a token; setting the HF_TOKEN environment variable also works and keeps
# the token out of the notebook.
# from huggingface_hub import login
# login()

# Then push the tokenizer and the LoRA adapters to the same repository:
# REPO_ID = 'your-username/your-model-name'
# Tok.push_to_hub(REPO_ID)
# Tuned.push_to_hub(REPO_ID)
# print("Pushed to Hugging Face Hub.")
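
For a reproducible release it also helps to record a checksum of every file, so anyone who downloads the folder (locally or from the Hub) can confirm it is byte-for-byte what you packaged. A minimal sketch using SHA-256 from the standard library; `write_manifest` and the `manifest.json` file name are hypothetical choices. Run it after Step 4 so the model card is included.

```python
import hashlib
import json
from pathlib import Path

def write_manifest(rel_dir, name="manifest.json"):
    """Hash every file under rel_dir with SHA-256 and save {relative_path: hex digest} as JSON."""
    rel_dir = Path(rel_dir)
    digests = {}
    for path in sorted(rel_dir.rglob("*")):
        # Skip directories and any file named like the manifest itself.
        if path.is_file() and path.name != name:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large weight files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(rel_dir).as_posix()] = h.hexdigest()
    (rel_dir / name).write_text(json.dumps(digests, indent=2), encoding="utf-8")
    return digests
```

Usage in the notebook: `write_manifest(REL_DIR)`. Re-running it later and comparing the returned dictionary with the saved JSON shows whether any file changed.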