(olmo_peft)=
# QLoRA on OLMo-1B

In this notebook, we show how to fine-tune OLMo-1B with QLoRA.

## Intro to QLoRA

QLoRA is a parameter-efficient fine-tuning approach. It involves loading a quantized model (with 8-bit or 4-bit weights), freezing the weights, and backpropagating the gradients through the frozen model into low-rank adapters (LoRA).
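
To make the adapter idea concrete, here is a minimal pure-Python sketch of a single LoRA-adapted linear layer. It is illustrative only and does not reflect how `peft` implements adapters: the toy matrices and the `matvec` and `lora_forward` helpers are hypothetical, and quantization is left out entirely. The frozen weight `W` is never modified; only the low-rank factors `A` and `B` would receive gradient updates.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x) for one input vector x."""
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank path: A is r x d_in, B is d_out x r
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy sizes (assumed for illustration): d_in = 3, d_out = 2, rank r = 1
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]    # r x d_in
B_zero = [[0.0], [0.0]]  # d_out x r, zero-initialized
x = [1.0, 2.0, 3.0]

# With B at zero, the adapted layer matches the frozen layer exactly.
y_start = lora_forward(W, A, B_zero, x, alpha=2, r=1)

# After (hypothetical) training, B is non-zero and shifts the output.
B_trained = [[1.0], [-1.0]]
y_tuned = lora_forward(W, A, B_trained, x, alpha=2, r=1)

print(y_start, y_tuned)
```

Because `B` starts at zero, adding the adapter does not change the model's behavior until training begins.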

For more, check out the [GitHub Repo](https://github.com/artidoro/qlora?tab=readme-ov-file), which also links to the QLoRA paper and other resources.

## Fine-tuning task

In this example, we fine-tune OLMo-1B on a task many models struggle with: replacing every letter e (upper- or lowercase) with 3 in arbitrary input text. We use the WikiText dataset and process each entry so that it pairs the original text, used as the input, with a copy in which every e has been replaced by 3, used as the output.
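
As a quick illustration of the target transformation, the sketch below builds one prompt/target pair from a short string. The names `PROMPT` and `make_example` are hypothetical helpers for this illustration; the prompt wording follows the preprocessing cell later in the notebook.

```python
PROMPT = (
    "Replace all es or Es with 3s in the following text."
    "\n\n### Input:\n{text}\n\n### Output:\n"
)


def make_example(text):
    """Return the (prompt, target) pair for one piece of text."""
    prompt = PROMPT.format(text=text)
    # Replace both lowercase and uppercase e with the digit 3
    target = text.replace("e", "3").replace("E", "3")
    return prompt, target


prompt, target = make_example("Elephants never forget.")
print(prompt + target)
```

The model is trained on the concatenation of prompt and target, so at inference time it only needs the prompt and should continue with the transformed text.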

## Setup

First, we'll install the required dependencies.


In [None]:
%pip install -r ./olmo_peft_requirements.txt

Next, we set up the [`LoraConfig`](https://huggingface.co/docs/peft/en/package_reference/lora#peft.LoraConfig). This config specifies which layers we apply the LoRA adapters to (`target_modules`); the rank and scaling factor for the adapters (`r` and `lora_alpha`); and the dropout rate (`lora_dropout`).

In [None]:
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=32,
    target_modules=["att_proj", "ff_proj"],
    task_type=TaskType.CAUSAL_LM,
    lora_alpha=16,
    lora_dropout=0.05
)
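
To give a feel for what `r` and `lora_alpha` control, the sketch below computes the adapter scaling factor and the number of trainable parameters a rank-`r` adapter adds to one linear layer. It assumes the default LoRA scaling of `lora_alpha / r`; the 2048 x 2048 layer shape is a made-up example rather than the actual shape of OLMo-1B's `att_proj` or `ff_proj`, and both helper functions are hypothetical.

```python
def lora_scaling(lora_alpha, r):
    """Scaling applied to the low-rank update (default LoRA scaling assumed)."""
    return lora_alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


scaling = lora_scaling(lora_alpha=16, r=32)

# Hypothetical layer shape, chosen only for illustration
d_in, d_out = 2048, 2048
adapter_params = lora_param_count(d_in, d_out, r=32)
full_params = d_in * d_out

print(f"scaling factor: {scaling}")
print(f"adapter params: {adapter_params} ({adapter_params / full_params:.2%} of the full layer)")
```

Raising `r` increases adapter capacity and parameter count linearly, while the ratio `lora_alpha / r` determines how strongly the adapter output is weighted relative to the frozen layer.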


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import hf_olmo  # importing hf_olmo registers the OLMo classes with the transformers Auto* classes

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B")

In [None]:

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B",
    trust_remote_code=True,
    cache_dir="/Volumes/daniel_liden/fine_tuning/assets",
    device_map="auto",
    load_in_8bit=True,  # load the base model with 8-bit quantized weights
)

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Some weights of OLMoForCausalLM were not initialized from the model checkpoint at allenai/OLMo-1B and are newly initialized: ['model.transformer.ff_out.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
model.add_adapter(lora_config)

In [None]:
from datasets import Dataset, DatasetDict, load_dataset

# Load the WikiText-2 dataset
wikitext = load_dataset("wikitext", "wikitext-2-raw-v1")

# Build a prompt/output pair from each example and tokenize the combined text
def tokenize_function(example):
    # Split the example into individual lines
    lines = example["text"].split("\n")
    
    # Remove empty lines and lines starting with ' ='
    filtered_lines = [line for line in lines if line.strip() and not line.startswith(' =')]
    
    # Join the filtered lines back into a single string
    text = "\n".join(filtered_lines)
    
    input_text = "Replace all es or Es with 3s in the following text.\n\n### Input:\n" + text + "\n\n### Output:\n"
    output_text = text.replace("e", "3").replace("E", "3") # + "<|endoftext|>"
    
    return tokenizer(input_text + output_text, padding=True, truncation=True, max_length=512)

# Tokenize the train and validation splits
tokenized_train = wikitext["train"].map(tokenize_function, num_proc=4, remove_columns=["text"])
tokenized_validation = wikitext["validation"].map(tokenize_function, num_proc=4, remove_columns=["text"])

# Shuffle the datasets
tokenized_train = tokenized_train.shuffle(seed=42)
tokenized_validation = tokenized_validation.shuffle(seed=42)

# Select the desired number of examples
train_dataset = tokenized_train.select(range(8000))
eval_dataset = tokenized_validation.select(range(2000))

# Create a DatasetDict with the selected subsets
dataset_dict = DatasetDict({
    "train": train_dataset,
    "eval": eval_dataset
})




In [None]:
dataset_dict['train']

In [None]:
from transformers import DataCollatorForLanguageModeling, TrainingArguments, Trainer

# Define the data collator
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Define the training arguments
training_args = TrainingArguments(
    output_dir="/Volumes/daniel_liden/fine_tuning/assets",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    weight_decay=0.01,
    logging_steps=1,
    save_steps=250,
    save_total_limit=3,
    evaluation_strategy="steps",
    eval_steps=50,
)

# Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset_dict['train'],
    eval_dataset=dataset_dict['eval'],
    data_collator=data_collator,
)

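# Note: this configuration is not passed to the Trainer above, so it has no effect on training.
from accelerate.utils import DataLoaderConfiguration

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False)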
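
As a rough sanity check on the training arguments, the sketch below estimates the effective batch size and the total number of optimizer steps. It assumes a single device and uses a simplified calculation, so the exact count reported by the `Trainer` may differ slightly; `estimate_steps` is a hypothetical helper.

```python
import math


def estimate_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Approximate the effective batch size and total optimizer steps."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Values taken from the training arguments and dataset selection above
effective_batch, total_steps = estimate_steps(
    num_examples=8000,
    per_device_batch=8,
    grad_accum=4,
    epochs=3,
)

print(f"effective batch size: {effective_batch}")
print(f"estimated optimizer steps: {total_steps}")
```

With `eval_steps=50`, this means an evaluation pass runs roughly every 1,600 training examples.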


In [None]:
import mlflow

# Start training and track with MLflow
with mlflow.start_run(log_system_metrics=True):
    trainer.evaluate() # eval before starting tuning
    trainer.train()
    mlflow.log_params(training_args.to_dict())

trainer.save_model("/Volumes/daniel_liden/fine_tuning/assets" + "/final")

2024/04/10 14:10:23 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.




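| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 50   | 1.2524        | 1.297852        |
| 100  | 0.97          | 1.192383        |
| 150  | 1.1025        | 1.148438        |
| 200  | 1.0403        | 1.120117        |
| 250  | 1.1328        | 1.105469        |
| 300  | 1.1266        | 1.09375         |
| 350  | 1.0475        | 1.083984        |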




In [None]:
final_model = AutoModelForCausalLM.from_pretrained("/Volumes/daniel_liden/fine_tuning/assets/final/", device_map="auto", load_in_8bit=True)

from mlflow.models import infer_signature

# Define the prompt template
prompt_template = "Replace all es or Es with 3s in the following text.\n\n### Input:\n{prompt}\n\n### Output:\n"

# Define the sample input/output
sample_input = "The quick brown fox jumps over the lazy dog.\nElephants are the largest land mammals on Earth.\nThe Earth revolves around the Sun, which is a star."
sample_output = prompt_template.format(prompt=sample_input) + "Th3 quick brown fox jumps ov3r th3 lazy dog.\n3l3phants ar3 th3 larg3st land mammals on 3arth.\nTh3 3arth r3volv3s around th3 Sun, which is a star.<|endoftext|>"

# Define the sample parameters
sample_params = {
    "max_new_tokens": 512,
    "repetition_penalty": 1.1,
}

# MLflow infers schema from the provided sample input/output/params
signature = infer_signature(
    model_input=sample_input,
    model_output=sample_output,
    params=sample_params,
)

In [None]:
# Get the ID of the MLflow Run that was automatically created above
last_run_id = mlflow.last_active_run().info.run_id

with mlflow.start_run(run_id=last_run_id):
    mlflow.log_params(lora_config.to_dict())
    mlflow.transformers.log_model(
        transformers_model={"model": final_model, "tokenizer": tokenizer},
        signature=signature,
        artifact_path="model",  # This is a relative path to save model files within MLflow run
    )

In [None]:
import hf_olmo
from transformers import AutoModelForCausalLM

In [None]:
peft_model = AutoModelForCausalLM.from_pretrained("/Volumes/daniel_liden/datasets/h2o_rag/output/checkpoint-500/", load_in_8bit=True,
                                                  device_map="auto")

In [None]:
peft_model

In [None]:
def generate(input_text, max_new_tokens=100):
    # Create the prompt template
    prompt_template = "Replace all es or Es with 3s in the following text.\n\n### Input:\n{input_text}\n\n### Output:\n"
    
    # Format the prompt with the input text
    formatted_prompt = prompt_template.format(input_text=input_text)
    
    # Tokenize the formatted prompt
    input_ids = tokenizer(formatted_prompt, return_tensors="pt").input_ids.to(peft_model.device)
    
    # Generate the output using the trained model
    gen_tokens = peft_model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        eos_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.1,
    )
    
    # Decode the generated output
    generated_text = tokenizer.batch_decode(gen_tokens, skip_special_tokens=False)[0]
    
    # Extract the generated output after "### Output:"
    generated_output = generated_text.split("### Output:")[-1].strip()
    
    return generated_output

# Example usage
example_text = """
The quick brown fox jumps over the lazy dog.
"""

# Generate the output using the trained model
generated_output = generate(example_text)

print("Generated Output:")
print(generated_output)

In [None]:
generate("""Tear the emblems from our sleeves just like the others / Apostate Brothers please stay for the dawn of a new day / Watch the sun come up from the mud / Our cups are empty / Our wine has turned to ether that's good and fine""")
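
One simple way to check a generation is to compare it character by character with the exact replacement computed in Python. The sketch below is a hypothetical scoring helper and not part of the original workflow; it compares raw strings, so any trailing special tokens such as `<|endoftext|>` should be stripped from the model output first.

```python
def expected_output(text):
    """The exact target: every e or E replaced with 3."""
    return text.replace("e", "3").replace("E", "3")


def char_accuracy(generated, source):
    """Fraction of character positions where `generated` matches the expected target."""
    target = expected_output(source)
    if not target and not generated:
        return 1.0
    matches = sum(g == t for g, t in zip(generated, target))
    # Dividing by the longer length penalizes outputs that are too short or too long
    return matches / max(len(generated), len(target))


source = "The quick brown fox jumps over the lazy dog."
perfect = "Th3 quick brown fox jumps ov3r th3 lazy dog."

print(char_accuracy(perfect, source))  # a perfect generation
print(char_accuracy(source, source))   # an output that left the text unchanged
```

In practice, `generated` would be the string returned by `generate(source)`.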