# 🔬 Fine-Tuning Llama 3.1 8B on PubMedQA 🩺

This notebook fine-tunes Meta's Llama 3.1 8B Instruct model on the PubMedQA dataset for medical question answering. It uses Unsloth with a 4-bit quantized base model and LoRA adapters, which keeps training within the memory of a single Tesla T4 GPU.

**Crafted by:** David E. Girges (for exploratory purposes)

**Key Resources:**

  * Llama 3.1 Instruct fine-tuning template steps from Gemini 2.5 Pro
  * [PubMedQA Dataset on Hugging Face](https://huggingface.co/datasets/qiaojin/PubMedQA)
  * [Unsloth GitHub Repository](https://github.com/unslothai/unsloth)

-----

## 1\. Install Dependencies



In [2]:
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

-----

## 2\. Initialize Model and Tokenizer with Unsloth


In [3]:
from unsloth import FastLanguageModel
import torch

# Configuration for model loading
max_seq_length = 2048
dtype = None
load_in_4bit = True

# List of Unsloth's 4-bit pre-quantized models (for reference)
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit",
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",
]

# Load the Llama 3.1 8B Instruct model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit", # Using the 4bit instruct model
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
print("✅ Model and tokenizer loaded.")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.5.8: Fast Llama patching. Transformers: 4.52.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



✅ Model and tokenizer loaded.


-----

## 3\. Configure Parameter-Efficient Fine-Tuning (PEFT) with LoRA



In [4]:
# Configure the model for PEFT using LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",

    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
print("✅ PEFT LoRA configuration applied.")

Unsloth 2025.5.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


✅ PEFT LoRA configuration applied.
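
The trainable-parameter count that Unsloth reports later (41,943,040) can be reproduced by hand from the LoRA settings above. The sketch below assumes the layer dimensions of the published Llama 3.1 8B configuration (hidden size 4096, MLP width 14336, 8 key/value heads of dimension 128, 32 layers); these values are not read from the notebook, and the helper name `lora_param_count` is hypothetical.

```python
# Llama 3.1 8B dimensions, assumed from the published model config
# (they are not read from the loaded model in this sketch).
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size (MLP width)
KV_DIM = 1024          # num_key_value_heads (8) * head_dim (128)
NUM_LAYERS = 32
RANK = 16              # r passed to get_peft_model

# (in_features, out_features) of each linear layer listed in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """LoRA adds two matrices per adapted layer: A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


total = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
print(f"{total:,}")  # 41,943,040
```

Because the count scales linearly with `r`, halving the rank to 8 would halve the number of trainable parameters.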


-----

## 4\. Load and Inspect the PubMedQA Dataset



In [5]:
# Import necessary libraries for dataset handling
from datasets import load_dataset

# Load the 'pqa_labeled' configuration of the PubMedQA dataset (train split)
dataset_name = "qiaojin/PubMedQA"
dataset_config = "pqa_labeled"
dataset_split = "train"

dataset = load_dataset(dataset_name, dataset_config, split=dataset_split)

# Display the first 10 raw examples from the dataset
print(f"🔍 First 10 raw examples from {dataset_name} ({dataset_config} config, {dataset_split} split):\n")
for i in range(10):
    print(f"Example {i+1}")
    print("Question:", dataset[i]["question"])
    print("Long Answer:", dataset[i]["long_answer"])
    print("="*50)

🔍 First 10 raw examples from qiaojin/PubMedQA (pqa_labeled config, train split):

Example 1
Question: Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?
Long Answer: Results depicted mitochondrial dynamics in vivo as PCD progresses within the lace plant, and highlight the correlation of this organelle with other organelles during developmental PCD. To the best of our knowledge, this is the first report of mitochondria and chloroplasts moving on transvacuolar strands to form a ring structure surrounding the nucleus during developmental PCD. Also, for the first time, we have shown the feasibility for the use of CsA in a whole plant system. Overall, our findings implicate the mitochondria as playing a critical and early role in developmentally regulated PCD in the lace plant.
Example 2
Question: Landolt C and snellen e acuity: differences in strabismus amblyopia?
Long Answer: Using the charts described, there was only a slight overestimation of visu

-----

## 5\. Prepare Data for Llama 3.1 Chat Format



In [6]:
from unsloth.chat_templates import get_chat_template


tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3.1",
)
print("🗣️ Llama 3.1 chat template applied to tokenizer.")

# Define the formatting function
def formatting_prompts_func(examples):
    texts = []
    convos = []
    for question, answer in zip(examples['question'], examples['long_answer']):
        # Create a conversation structure for Llama 3.1
        convo = [
            {'role': 'user', 'content': question},
            {'role': 'assistant', 'content': answer}
        ]
        convos.append(convo)

        # Apply the chat template to format the conversation
        formatted_text = tokenizer.apply_chat_template(
            convo,
            tokenize=False,
            add_generation_prompt=False # We are providing the full conversation
        )
        texts.append(formatted_text)
    return {'text': texts, 'conversations': convos}

# Apply the formatting function to the dataset
formatted_dataset = dataset.map(formatting_prompts_func, batched=True)

# Display the first 10 formatted examples
print("\n📜 First 10 formatted examples for Llama 3.1:\n")
for i in range(10):
    print(f"Example {i+1}")
    print("Conversation:", formatted_dataset[i]["conversations"])
    print("Formatted Text:\n", formatted_dataset[i]["text"])
    print("="*50)

🗣️ Llama 3.1 chat template applied to tokenizer.



📜 First 10 formatted examples for Llama 3.1:

Example 1
Conversation: [{'content': 'Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?', 'role': 'user'}, {'content': 'Results depicted mitochondrial dynamics in vivo as PCD progresses within the lace plant, and highlight the correlation of this organelle with other organelles during developmental PCD. To the best of our knowledge, this is the first report of mitochondria and chloroplasts moving on transvacuolar strands to form a ring structure surrounding the nucleus during developmental PCD. Also, for the first time, we have shown the feasibility for the use of CsA in a whole plant system. Overall, our findings implicate the mitochondria as playing a critical and early role in developmentally regulated PCD in the lace plant.', 'role': 'assistant'}]
Formatted Text:
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id
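
To make the layout of the formatted text easier to see, here is a hand-written approximation of what the chat template produces for a single question/answer pair. It is only an illustration based on the strings printed in this notebook, not the tokenizer's actual Jinja template; `SYSTEM_HEADER`, `render_turn`, and `render_conversation` are hypothetical names.

```python
# Default system block as it appears in the notebook's printed prompts.
SYSTEM_HEADER = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "Cutting Knowledge Date: December 2023\n"
    "Today Date: 26 July 2024\n\n"
    "<|eot_id|>"
)


def render_turn(role, content):
    """Wrap one message in its role header and end-of-turn token."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"


def render_conversation(messages, add_generation_prompt=False):
    """Concatenate the system block and all turns; optionally open an assistant turn."""
    text = SYSTEM_HEADER + "".join(
        render_turn(m["role"], m["content"]) for m in messages
    )
    if add_generation_prompt:
        # Leave the assistant header open so the model writes the answer.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


training_text = render_conversation([
    {"role": "user", "content": "Example question?"},
    {"role": "assistant", "content": "Example answer."},
])
print(training_text)
```

For training, the full conversation is rendered (`add_generation_prompt=False`); for inference, only the user turn is rendered and the assistant header is left open.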

-----

## 6\. Tokenize the Formatted Dataset



In [9]:
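# Tokenize the 'text' field of the formatted dataset.
# Note: the chat template has already inserted <|begin_of_text|>, so tokenizing with the default
# settings prepends a second BOS token (the repeated 128000 in the output below).
# Passing add_special_tokens=False to the tokenizer call would avoid the duplicate.
tokenized_dataset = formatted_dataset.map(
    lambda examples: tokenizer(examples["text"]),
    batched=True,
    remove_columns=formatted_dataset.column_names # Drops every original column, including 'text'
)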

print("✅ Dataset tokenized.")
print("\nSample of tokenized dataset (first example):")
print(tokenized_dataset[0].keys())
print("Input IDs (sample):", tokenized_dataset[0]['input_ids'][:20])

✅ Dataset tokenized.

Sample of tokenized dataset (first example):
dict_keys(['input_ids', 'attention_mask'])
Input IDs (sample): [128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220]
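
The first two input IDs above are both 128000, the `<|begin_of_text|>` token: one copy comes from the chat template and one from the tokenizer. A quick way to check for this, and to repair sequences that are already tokenized, is sketched below. The helper `strip_duplicate_bos` is hypothetical and works on plain Python lists.

```python
BOS_ID = 128000  # <|begin_of_text|>, as seen in the sample input IDs in this notebook


def strip_duplicate_bos(input_ids, bos_id=BOS_ID):
    """Return input_ids with repeated leading BOS tokens collapsed to a single one."""
    start = 0
    while (
        start + 1 < len(input_ids)
        and input_ids[start] == bos_id
        and input_ids[start + 1] == bos_id
    ):
        start += 1
    return input_ids[start:]


sample = [128000, 128000, 128006, 9125, 128007, 271]
print(strip_duplicate_bos(sample))  # [128000, 128006, 9125, 128007, 271]
```

Avoiding the duplicate at tokenization time is usually simpler than stripping it afterwards; this helper is mainly useful as a check.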


-----

## 7\. Set Up the SFTTrainer



In [10]:
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
from transformers import TrainingArguments
# from unsloth.chat_templates import train_on_responses_only # Unsloth utility, alternative for collator

INSTRUCTION_TEMPLATE_START = "<|start_header_id|>user<|end_header_id|>\n\n"
RESPONSE_TEMPLATE_START = "<|start_header_id|>assistant<|end_header_id|>\n\n"

# Set up TrainingArguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    max_steps=100,
    learning_rate=2e-5,
    fp16=True,
    bf16=False,
    logging_steps=1,
    output_dir="./llama3_pubmedqa_model",
    optim="adamw_8bit",
    seed=3407,
)

# Set up the SFTTrainer
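trainer = SFTTrainer(
    model=model,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text", # Likely unused here: the dataset is already tokenized and has no 'text' column
    max_seq_length=max_seq_length,
    # Mask everything except the assistant response so the loss covers only the answer tokens
    data_collator=DataCollatorForCompletionOnlyLM(
        instruction_template=INSTRUCTION_TEMPLATE_START,
        response_template=RESPONSE_TEMPLATE_START,
        tokenizer=tokenizer,
    ),
)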
print("⚙️ SFTTrainer configured.")


⚙️ SFTTrainer configured.
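
The idea behind completion-only training is that prompt tokens get the label -100, so the loss is computed only on the assistant's answer. The sketch below illustrates that masking on plain lists for a single-turn conversation. It is a simplified illustration, not TRL's implementation, and `mask_before_response` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_before_response(input_ids, response_template_ids):
    """Copy input_ids to labels, masking everything up to and including the response template."""
    n = len(response_template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            end = start + n
            return [IGNORE_INDEX] * end + input_ids[end:]
    # Template not found: there is no answer to learn from, so mask the whole sequence.
    return [IGNORE_INDEX] * len(input_ids)


# Token IDs as they appear in this notebook's sample input IDs:
# 128006, 78191, 128007, 271 correspond to "<|start_header_id|>assistant<|end_header_id|>\n\n".
response_ids = [128006, 78191, 128007, 271]
ids = [128006, 882, 128007, 271, 5519, 30, 128009,   # user turn
       128006, 78191, 128007, 271,                   # assistant header
       10001, 44894, 128009]                         # start of the answer
labels = mask_before_response(ids, response_ids)
print(labels)
```

Only the last three positions keep their token IDs; every earlier position is -100 and contributes nothing to the loss.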


-----

## 8\. Sanity Check: Inputs and Labels


In [11]:
print("🕵️ Sanity Check: Inspecting a sample from the training dataset...")

# Get the first example from the processed train dataset
sample_idx = 0
if hasattr(trainer, "train_dataset") and trainer.train_dataset is not None and len(trainer.train_dataset) > sample_idx:
    first_example = trainer.train_dataset[sample_idx]

    print("\nSample Input IDs (first 100 tokens):\n", first_example.get("input_ids", [])[:100])
    print("\nSample Decoded Input:\n", tokenizer.decode(first_example.get("input_ids", [])))

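    # The dataset rows only contain input_ids and attention_mask; the labels are created by the
    # data collator at batch time, so run the collator on this example to inspect them.
    batch = trainer.data_collator([first_example])
    labels = batch["labels"][0].tolist()

    # Decode labels, replacing -100 with a space for readability (or a pad token if available)
    space_token_id = tokenizer(" ", add_special_tokens=False).input_ids[0]
    labels_to_decode = [space_token_id if x == -100 else x for x in labels]
    print("\nSample Decoded Labels (what the model learns to predict, -100s replaced):\n", tokenizer.decode(labels_to_decode))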
else:
    print("Could not retrieve a sample from trainer.train_dataset. Please check dataset preparation steps.")

🕵️ Sanity Check: Inspecting a sample from the training dataset...

Sample Input IDs (first 100 tokens):
 [128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 1627, 5887, 220, 2366, 19, 271, 128009, 128006, 882, 128007, 271, 5519, 55042, 4298, 1514, 264, 3560, 304, 1323, 347, 6427, 46793, 6136, 11141, 2391, 56168, 2849, 4648, 30, 128009, 128006, 78191, 128007, 271, 10001, 44894, 72061, 30295, 304, 41294, 439, 393, 6620, 68711, 2949, 279, 46793, 6136, 11, 323, 11415, 279, 26670, 315, 420, 1262, 2444, 273, 449, 1023, 1262, 2444, 645, 2391, 48006, 393, 6620, 13, 2057, 279, 1888, 315, 1057, 6677, 11, 420, 374, 279, 1176, 1934]

Sample Decoded Input:
 <|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Do mitochondria play a role in remodelling lace plant leaves during programmed cell

-----

## 9\. Check GPU Memory Before Training



In [12]:
import torch # Ensure torch is imported

if torch.cuda.is_available():
    gpu_stats = torch.cuda.get_device_properties(0)
    start_gpu_memory_gb = round(torch.cuda.memory_reserved(0) / 1024**3, 3)
    max_memory_gb = round(gpu_stats.total_memory / 1024**3, 3)

    print("GPU Information:")
    print(f"  Name:          {gpu_stats.name}")
    print(f"  Total Memory:  {max_memory_gb} GB")
    print(f"  Reserved Memory (before training): {start_gpu_memory_gb} GB")
else:
    print("⚠️ CUDA (GPU) is not available. Training will be very slow on CPU.")
    start_gpu_memory_gb = 0
    max_memory_gb = 0

GPU Information:
  Name:          Tesla T4
  Total Memory:  14.741 GB
  Reserved Memory (before training): 5.516 GB


-----

## 10\. Start Fine-Tuning



In [13]:
print("🚀 Starting model training...")
trainer_stats = trainer.train()
print("🎉 Training completed!")

🚀 Starting model training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,000 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|------|---------------|
| 1 | 2.7203 |
| 2 | 2.8199 |
| 3 | 2.6284 |
| 4 | 2.8804 |
| 5 | 2.7979 |
| 6 | 2.638 |
| 7 | 2.7251 |
| 8 | 2.5355 |
| 9 | 2.6062 |
| 10 | 2.6268 |


🎉 Training completed!
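
The numbers in the training banner follow directly from the `TrainingArguments`. The short sketch below works through the arithmetic; `training_schedule` is a hypothetical helper, and the inputs are the values reported in this run.

```python
def training_schedule(num_examples, per_device_batch, grad_accum, max_steps, num_gpus=1):
    """Return (effective batch size, examples processed, fraction of an epoch covered)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    examples_seen = effective_batch * max_steps
    epochs = examples_seen / num_examples
    return effective_batch, examples_seen, epochs


effective_batch, examples_seen, epochs = training_schedule(
    num_examples=1000, per_device_batch=2, grad_accum=4, max_steps=100
)
print(effective_batch, examples_seen, epochs)  # 8 800 0.8
```

With 100 optimizer steps at an effective batch size of 8, the run processes 800 of the 1,000 examples, or about 0.8 of one epoch; the banner's "Num Epochs = 1" is presumably this value rounded up.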


-----

## 11\. Review Training Metrics and GPU Memory Usage



In [14]:
if torch.cuda.is_available():


    used_memory_gb = round(torch.cuda.max_memory_reserved(0) / 1024**3, 3)
    used_memory_for_lora_gb = round(used_memory_gb - start_gpu_memory_gb, 3)
    used_percentage = round(used_memory_gb / max_memory_gb * 100, 3) if max_memory_gb > 0 else 0
    lora_percentage = round(used_memory_for_lora_gb / max_memory_gb * 100, 3) if max_memory_gb > 0 else 0

    print("\n📊 Training Performance & GPU Usage:")
    if hasattr(trainer_stats, 'metrics'):
        train_runtime = trainer_stats.metrics.get('train_runtime', 0)
        print(f"  Training time: {train_runtime:.2f} seconds")
        print(f"  Training time: {train_runtime/60:.2f} minutes")

    else:
        print("  Trainer stats metrics not available.")

    print(f"  Peak memory used during training: {used_memory_gb} GB")
    print(f"  Memory used specifically for LoRA training (approx): {used_memory_for_lora_gb} GB")
    print(f"  Peak GPU memory usage: {used_percentage}% of total GPU memory")
    print(f"  LoRA training memory usage: {lora_percentage}% of total GPU memory")
else:
    print("GPU not used for training. Memory metrics are for CUDA devices.")
    if hasattr(trainer_stats, 'metrics'):
        train_runtime = trainer_stats.metrics.get('train_runtime', 0)
        print(f"  Training time: {train_runtime:.2f} seconds ({train_runtime/60:.2f} minutes)")




📊 Training Performance & GPU Usage:
  Training time: 816.08 seconds
  Training time: 13.60 minutes
  Peak memory used during training: 6.883 GB
  Memory used specifically for LoRA training (approx): 1.367 GB
  Peak GPU memory usage: 46.693% of total GPU memory
  LoRA training memory usage: 9.273% of total GPU memory


-----

## 12\. Verify Checkpoint Contents



In [15]:
import os

checkpoint_step = training_args.max_steps
checkpoint_path = os.path.join(training_args.output_dir, f"checkpoint-{checkpoint_step}")

print(f"🔎 Verifying contents of checkpoint directory: '{checkpoint_path}'")

if os.path.exists(checkpoint_path) and os.path.isdir(checkpoint_path):
    print(f"\nContents of '{checkpoint_path}':")
    for item in sorted(os.listdir(checkpoint_path)):
        item_path = os.path.join(checkpoint_path, item)
        item_type = "📄 File" if os.path.isfile(item_path) else "📁 Directory"
        print(f"  - {item} ({item_type})")

    # Specific check for LoRA adapter files
    adapter_config_file = os.path.join(checkpoint_path, "adapter_config.json")
    adapter_model_file = os.path.join(checkpoint_path, "adapter_model.safetensors") # Or .bin

    if os.path.exists(adapter_config_file):
        print(f"\n✅ Found 'adapter_config.json'.")
    else:
        print(f"\n⚠️ 'adapter_config.json' NOT found in checkpoint. This is essential for LoRA.")

    if os.path.exists(adapter_model_file):
        print(f"✅ Found 'adapter_model.safetensors' (or .bin).")
    else:
        # Check for .bin as an alternative name
        adapter_model_file_bin = os.path.join(checkpoint_path, "adapter_model.bin")
        if os.path.exists(adapter_model_file_bin):
             print(f"✅ Found 'adapter_model.bin'.")
        else:
            print(f"\n⚠️ 'adapter_model.safetensors' (or .bin) NOT found. This contains LoRA weights.")

else:
    print(f"\n❌ Directory '{checkpoint_path}' does NOT exist or is not a directory!")
    print("Please check your `TrainingArguments` (output_dir, save_steps) and training completion.")

🔎 Verifying contents of checkpoint directory: './llama3_pubmedqa_model/checkpoint-100'

Contents of './llama3_pubmedqa_model/checkpoint-100':
  - README.md (📄 File)
  - adapter_config.json (📄 File)
  - adapter_model.safetensors (📄 File)
  - chat_template.jinja (📄 File)
  - optimizer.pt (📄 File)
  - rng_state.pth (📄 File)
  - scaler.pt (📄 File)
  - scheduler.pt (📄 File)
  - special_tokens_map.json (📄 File)
  - tokenizer.json (📄 File)
  - tokenizer_config.json (📄 File)
  - trainer_state.json (📄 File)
  - training_args.bin (📄 File)

✅ Found 'adapter_config.json'.
✅ Found 'adapter_model.safetensors' (or .bin).


-----

## 13\. Inference: Load Fine-Tuned Model from Checkpoint



In [16]:
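%%capture
# Note: %%capture suppresses all output from this cell, including the status messages printed below.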

from unsloth import FastLanguageModel
import torch
from transformers import TextStreamer
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template




fine_tuned_model_path = checkpoint_path
print(f"⏳ Attempting to load fine-tuned model from: {fine_tuned_model_path}")


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = fine_tuned_model_path, # Point to your checkpoint directory
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
model.eval()

# Ensure the Llama 3.1 chat template is correctly applied after loading.
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
print(f"✅ Fine-tuned model and tokenizer loaded successfully from: {fine_tuned_model_path}")

-----

## 14\. Inference: Prepare a Sample Prompt



In [22]:
# Select a sample question for inference
sample_index = 15
if len(dataset) > sample_index:
    test_question = dataset[sample_index]["question"]
    true_long_answer = dataset[sample_index]["long_answer"] # For reference

    print(f"\n📝 Test Question (from dataset index {sample_index}):")
    print(test_question)
    print(f"\n💡 Reference Answer (True Long Answer):")
    print(true_long_answer)
else:
    print(f"⚠️ Sample index {sample_index} is out of bounds for the dataset.")
    test_question = "What are the treatments for hypertension?" # Fallback question
    print(f"\n📝 Using fallback test question: {test_question}")


# Format the prompt using the Llama 3.1 chat template for a user query
messages_for_inference = [
    {"role": "user", "content": test_question},
]


formatted_prompt_for_inference = tokenizer.apply_chat_template(
    messages_for_inference,
    tokenize=False,
    add_generation_prompt=True
)
print("\n💬 Formatted Prompt for Llama 3.1 (for inference):\n", formatted_prompt_for_inference)


📝 Test Question (from dataset index 15):
Patient-Controlled Therapy of Breathlessness in Palliative Care: A New Therapeutic Concept for Opioid Administration?

💡 Reference Answer (True Long Answer):
Opioid PCT is a feasible and acceptable therapeutic method to reduce refractory breathlessness in palliative care patients.

💬 Formatted Prompt for Llama 3.1 (for inference):
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Patient-Controlled Therapy of Breathlessness in Palliative Care: A New Therapeutic Concept for Opioid Administration?<|eot_id|><|start_header_id|>assistant<|end_header_id|>




-----

## 15\. Inference: Generate and Stream Response



In [23]:
# Tokenize the formatted prompt for inference
inputs = tokenizer([formatted_prompt_for_inference], return_tensors="pt").to("cuda" if torch.cuda.is_available() else "cpu")

# Setup TextStreamer for displaying the output token by token
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

print("\n🤖 Model Generating Response (streaming):")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
    )
print("\n✅ Generation complete (streaming finished or full output below).")


output_text_full = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\n📜 Full Decoded Output (after streaming, if needed for variable assignment):")
print(output_text_full)


🤖 Model Generating Response (streaming):
The results of the present study suggest that PCT is a feasible and safe method for the management of breathlessness in palliative care. It is also a promising therapeutic concept for opioid administration in patients with refractory breathlessness.

✅ Generation complete (streaming finished or full output below).

📜 Full Decoded Output (after streaming, if needed for variable assignment):
The results of the present study suggest that PCT is a feasible and safe method for the management of breathlessness in palliative care. It is also a promising therapeutic concept for opioid administration in patients with refractory breathlessness.


-----

## 16\. Hugging Face Hub: Login



In [27]:
!pip install huggingface_hub

from huggingface_hub import login

# This will prompt you to enter your Hugging Face Hub token
print("🔑 Please log in to Hugging Face Hub.")
login()
print("✅ Logged in to Hugging Face Hub (if token was valid).")

🔑 Please log in to Hugging Face Hub.


✅ Logged in to Hugging Face Hub (if token was valid).


-----

## 17\. Hugging Face Hub: Push LoRA Adapters


In [30]:
# Define your Hugging Face username and the desired model name on the Hub
hf_username = "DavidElks"
lora_model_name_on_hub = "LamaFineTuned"


print(f"🚀 Pushing LoRA adapters to Hugging Face Hub: {hf_username}/{lora_model_name_on_hub}")
try:

    model.push_to_hub_merged(
        repo_id=f"{hf_username}/{lora_model_name_on_hub}",
        tokenizer=tokenizer,
        save_method="lora",
        # token="YOUR_HF_TOKEN_IF_NOT_LOGGED_IN_VIA_CLI", # Add if you didn't use login() or login() failed
        commit_message="Upload fine-tuned Llama3.1 8B LoRA adapters for PubMedQA"
    )
    print(f"✅ Successfully pushed LoRA adapters to: https://huggingface.co/{hf_username}/{lora_model_name_on_hub}")
except Exception as e:
    print(f"❌ Error pushing LoRA adapters: {e}")
    print("⚠️ Make sure you are logged in via `huggingface_hub.login()` or have provided a valid token.")
    print("⚠️ Also ensure your repository name is valid (e.g., 'username/modelname') and the repo doesn't exist with incompatible content if you're not creating it new.")
    print("⚠️ Check that the `model` variable is indeed the Unsloth PEFT model.")

🚀 Pushing LoRA adapters to Hugging Face Hub: DavidElks/LamaFineTuned
Unsloth: Saving LoRA adapters. Please wait...


Saved lora model to https://huggingface.co/DavidElks/LamaFineTuned
✅ Successfully pushed LoRA adapters to: https://huggingface.co/DavidElks/LamaFineTuned


-----


## Next Step: Merge the LoRA Adapters with the Base Model

A separate Colab notebook covers merging the trained LoRA adapters with the base model and using the result: [Google Colab](https://colab.research.google.com/drive/17KcinhPLGg3b54djjru87jXiEtF66qyf?usp=sharing)