## Installing the required packages

In [2]:
!pip install transformers peft datasets torch bitsandbytes accelerate wandb


Collecting datasets
  Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.0-py3-none-manylinux_2_24_x86_64.whl.metadata (2.9 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.1.0-py3-none-any.whl (480 kB)
Downloading bitsandbytes-0.45.0-py3-none-manylinux_2_24_x86_64.whl (69.1 MB)

In [3]:
!pip install datasets



## Cleaning the dataset

In [None]:
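import json

def clean_jsonl(input_file, output_file):
    """Copy a JSONL file, skipping invalid lines and serializing dict responses to strings."""
    cleaned_data = []
    with open(input_file, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            if not line.strip():
                continue  # Ignore blank lines instead of reporting them as errors
            try:
                data = json.loads(line)
            except json.JSONDecodeError as e:
                print(f"Skipping line {line_num}: invalid JSON ({e})")
                continue
            # Store nested responses as JSON strings so the 'response' column has a single type
            if isinstance(data, dict) and isinstance(data.get('response'), dict):
                data['response'] = json.dumps(data['response'])
            cleaned_data.append(data)

    # Write one JSON object per line
    with open(output_file, 'w', encoding='utf-8') as f:
        for item in cleaned_data:
            f.write(json.dumps(item) + '\n')


# Uncomment to regenerate the cleaned file that the training cell loads
#clean_jsonl('finetune_data1.jsonl', 'cleaned_finetune_data.jsonl')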
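To make the effect of the cleaning step concrete, here is a minimal in-memory sketch of the same per-line logic. The helper name `normalize_line` and the three sample records are made up for illustration; they are not taken from the actual dataset.

```python
import json

def normalize_line(line):
    """Parse one JSONL line; return the cleaned record, or None if the line is not valid JSON."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    # Serialize a nested response so every record stores 'response' as a string
    if isinstance(record, dict) and isinstance(record.get("response"), dict):
        record["response"] = json.dumps(record["response"])
    return record

# Hypothetical sample lines: a dict response, a string response, and a truncated line
sample_lines = [
    '{"instruction": "Move forward 1 meter", "response": {"action": "move", "distance": 1}}',
    '{"instruction": "Stop", "response": "stop"}',
    '{"instruction": "broken"',
]

cleaned = [normalize_line(line) for line in sample_lines]
for record in cleaned:
    print(record)
```

The first record's response becomes a JSON string, the second is left unchanged, and the truncated third line yields `None`, which corresponds to a skipped line in the file-based version.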

## Fine-tuning Llama 3.2-1B for Robotics

This code fine-tunes the pre-trained Llama 3.2-1B language model on a robotics instruction dataset. It uses:

* **LoRA (Low-Rank Adaptation):** Freezes the base weights and trains small low-rank matrices added to selected layers (here, the attention projections), so far fewer parameters need gradients and optimizer state.
* **4-bit Quantization:** Loads the base model's weights in 4-bit NF4 format through `bitsandbytes`, reducing the GPU memory needed to hold the model during training.
* **Hugging Face Transformers and Datasets:** Provide tools for loading the model, preprocessing the robotics dataset, and running training.
* **Weights & Biases (wandb):** Tracks training progress and logs metrics.
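As a rough illustration of why LoRA is cheap, the sketch below counts the trainable parameters that rank-64 adapters add to the four attention projections. The rank and alpha come from the LoRA configuration used later in this notebook; the layer shapes (hidden size 2048, key/value width 512, 16 layers) are assumptions about Llama 3.2-1B that the notebook itself does not state.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters LoRA adds to one d_out x d_in linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

# Values from the LoRA configuration used later in this notebook
r, lora_alpha = 64, 16

# ASSUMED Llama 3.2-1B shapes (not stated in this notebook)
hidden, kv_dim, num_layers = 2048, 512, 16
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
}

per_layer = sum(lora_param_count(d_in, d_out, r) for d_in, d_out in shapes.values())
total = per_layer * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"Update scaling (alpha/r):  {scaling}")
```

Under these assumed shapes the adapters add about 13.6 million trainable parameters, on the order of 1% of a 1B-parameter model.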

**Process:**

1. **Imports necessary libraries:** Including `transformers`, `peft`, `datasets`, and `wandb`.
2. **Defines a preprocessing function:** Formats each instruction-response pair as a single prompt string and tokenizes it to a fixed length of 512 tokens.
3. **Defines the training function:**
    * Loads the pre-trained Llama 3.2-1B model and tokenizer.
    * Applies 4-bit quantization for memory efficiency.
    * Loads and preprocesses the robotics dataset.
    * Configures LoRA for fine-tuning.
    * Sets up training parameters using `TrainingArguments`.
    * Creates a `Trainer` instance to manage the training process.
    * Fine-tunes the model and saves the results.
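The prompt template from the preprocessing step can be tried out without loading a tokenizer. This is a small sketch; the function name `build_prompt` and the sample batch are hypothetical, and only the template string is taken from the notebook.

```python
import json

def build_prompt(instruction, response=""):
    """Apply the notebook's template; an empty response leaves the prompt open for generation."""
    return f"### Instruction: {instruction}\n### Response: {response}"

# Hypothetical batch in the column layout that datasets.map(batched=True) passes in
batch = {
    "instruction": ["Move forward 1 meter", "Stop"],
    "response": [json.dumps({"action": "move", "distance": 1}), "stop"],
}

texts = [
    build_prompt(instruction, response)
    for instruction, response in zip(batch["instruction"], batch["response"])
]
print(texts[0])
```

At inference time the same template would presumably be used with the response left empty, so that the model continues the text after `### Response:`.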

In [5]:
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    TaskType
)
import logging
import wandb
from huggingface_hub import login

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

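def preprocess_function(examples):
    """Format a batch as instruction-response prompts and tokenize it.

    Relies on the global `tokenizer` that train() sets before calling dataset.map().
    """
    texts = [
        f"### Instruction: {instruction}\n### Response: {response}"
        for instruction, response in zip(examples["instruction"], examples["response"])
    ]

    # Pad or truncate every example to exactly 512 tokens
    tokenized = tokenizer(
        texts,
        truncation=True,
        max_length=512,
        padding="max_length",
        return_tensors="pt"
    )

    # For causal LM training the labels are a copy of the inputs. Note that
    # DataCollatorForLanguageModeling(mlm=False), used in train(), rebuilds the labels
    # from input_ids and sets pad-token positions to -100 so they are ignored by the loss.
    tokenized["labels"] = tokenized["input_ids"].clone()
    return tokenized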

def train():
    wandb.init(project="llm-robotics-finetuning")

    # Module-level tokenizer: used by preprocess_function and by the later push-to-Hub cell
    global tokenizer
    model_id = "NousResearch/Llama-3.2-1B"

    # Configure 4-bit quantization
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True
    )

    # Load tokenizer with padding token
    tokenizer = AutoTokenizer.from_pretrained(
        model_id,
        padding_side="right",
        model_max_length=512
    )
    tokenizer.pad_token = tokenizer.eos_token

    # Load model with quantization
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
        torch_dtype=torch.float16
    )

    # Load and preprocess dataset
    dataset = load_dataset("json", data_files="cleaned_finetune_data.jsonl")
    tokenized_dataset = dataset.map(
        preprocess_function,
        batched=True,
        remove_columns=dataset["train"].column_names
    )

    # Configure LoRA
    lora_config = LoraConfig(
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias="none",
        task_type=TaskType.CAUSAL_LM,
        target_modules=["q_proj", "v_proj", "k_proj", "o_proj"]
    )

    # Prepare model
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, lora_config)

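    # Training arguments
    training_args = TrainingArguments(
        output_dir="./robotics_mistral",  # Name only; the model trained here is Llama 3.2-1B
        num_train_epochs=3,  # Has no effect: max_steps below takes precedence
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,  # Effective batch size per device: 2 * 8 = 16
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        max_steps=500,  # Total optimizer steps; overrides num_train_epochs
        optim="paged_adamw_32bit",
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
        save_strategy="steps",
        save_steps=100,  # Write a checkpoint every 100 steps
        evaluation_strategy="no",  # No evaluation split is used
        warmup_ratio=0.03
    )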

    # Data collator
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False
    )

    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset["train"],
        data_collator=data_collator
    )

    # Train and save
    trainer.train()
    trainer.save_model("./robotics_mistral_final")

if __name__ == "__main__":
    train()

Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/630 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


| Step | Training Loss |
|------|---------------|
| 10   | 2.1774        |
| 20   | 1.8776        |
| 30   | 1.6016        |
| 40   | 1.5387        |
| 50   | 1.3956        |
| 60   | 1.3355        |
| 70   | 1.2757        |
| 80   | 1.4256        |
| 90   | 1.1857        |
| 100  | 1.2546        |
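The warning in the training output notes that `max_steps` overrides `num_train_epochs`. The arithmetic below is a back-of-the-envelope sketch of what 500 steps means for this run. It assumes a single GPU and that the 630 examples shown in the `Map` progress line make up the whole training split.

```python
num_examples = 630               # from the "Map: 0/630" progress line (assumed to be the full train split)
per_device_batch_size = 2        # per_device_train_batch_size
gradient_accumulation_steps = 8
max_steps = 500
num_devices = 1                  # ASSUMPTION: a single GPU

# Examples contributing to each optimizer step
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices

# Examples processed over the whole run, and how many passes over the data that represents
examples_seen = effective_batch * max_steps
approx_epochs = examples_seen / num_examples

print(f"Effective batch size: {effective_batch}")
print(f"Examples processed:   {examples_seen}")
print(f"Approximate epochs:   {approx_epochs:.1f}")
```

Under these assumptions a full run makes roughly 12.7 passes over the data rather than the 3 epochs that `num_train_epochs` suggests.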


In [None]:
import torch
torch.cuda.empty_cache()

## Logging in to Hugging Face and creating a repo


In [None]:
from huggingface_hub import create_repo, login
from google.colab import userdata

# Log in to Hugging Face Hub with a token stored in Colab secrets under the name 'huggingface'
login(token=userdata.get('huggingface'))

repo_name = "SolomonMartin/robotics-llama-3.2-1b-finetuned"
create_repo(repo_name, private=False)  # Public repo; set private=True to keep it private

RepoUrl('https://huggingface.co/SolomonMartin/robotics-llama-3.2-1b-finetuned', endpoint='https://huggingface.co', repo_type='model', repo_id='SolomonMartin/robotics-llama-3.2-1b-finetuned')

## Merging and Pushing Fine-tuned Llama Model

This code merges the LoRA adapter weights into the base Llama 3.2-1B model, then pushes the merged model and the tokenizer to the Hugging Face Hub.

**Process:**

1. **Imports necessary libraries:** Includes `peft` and `transformers`.
2. **Loads the base and LoRA models:**
    * Loads the pre-trained Llama 3.2-1B model in float16.
    * Loads the LoRA adapter saved by the fine-tuning step on top of it.
3. **Merges the models:**
    * Folds the LoRA weights into the base weights with `merge_and_unload()`.
    * Removes the adapter layers, leaving a plain Transformers model that can be loaded without `peft`.
4. **Pushes to Hugging Face Hub:**
    * Uploads the merged model to the specified repository.
    * Uploads the tokenizer to the same repository.
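What merging does to a single linear layer can be checked on a toy example. The sketch below uses plain Python lists and made-up numbers (a 2x3 weight, rank 1, alpha 2) rather than the real model. It assumes the standard LoRA formulation, in which the adapter output is scaled by `lora_alpha / r`, and shows that folding the scaled product `B @ A` into `W` gives the same output as running the base layer and adapter separately.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]

# Toy values, chosen only for illustration
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # frozen base weight (out=2, in=3)
B = [[0.5], [-1.0]]                       # LoRA factor B (out x r)
A = [[2.0, 0.0, 4.0]]                     # LoRA factor A (r x in)
r, lora_alpha = 1, 2
scaling = lora_alpha / r
x = [[1.0], [1.0], [1.0]]                 # input as a column vector

# Before merging: base output plus the scaled adapter output
adapter_out = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# After merging: a single weight matrix W' = W + scaling * (B @ A)
W_merged = add(W, scale(matmul(B, A), scaling))
merged_out = matmul(W_merged, x)

print(adapter_out, merged_out)
```

Both paths produce the same vector, which is why the merged checkpoint no longer needs the adapter at inference time.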

In [None]:
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load base model and LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-3.2-1B",
    torch_dtype=torch.float16,
    device_map="auto"
)
peft_model = PeftModel.from_pretrained(
    base_model,
    "./robotics_mistral_final"
)

# Merge the LoRA weights into the base model so the pushed checkpoint is standalone
merged_model = peft_model.merge_and_unload()

# Push to Hub
merged_model.push_to_hub(
    repo_name,
    use_temp_dir=True,
    commit_message="Add fine-tuned robotics LLaMA model"
)

# Push tokenizer
tokenizer.push_to_hub(
    repo_name,
    use_temp_dir=True,
    commit_message="Add tokenizer"
)

model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/SolomonMartin/robotics-llama-3.2-1b-finetuned/commit/e50bd26da7fd2459d57eb53a01761222183e977a', commit_message='Add tokenizer', commit_description='', oid='e50bd26da7fd2459d57eb53a01761222183e977a', pr_url=None, repo_url=RepoUrl('https://huggingface.co/SolomonMartin/robotics-llama-3.2-1b-finetuned', endpoint='https://huggingface.co', repo_type='model', repo_id='SolomonMartin/robotics-llama-3.2-1b-finetuned'), pr_revision=None, pr_num=None)