
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>


# Low-Rank Adaption (LoRA)
This lab introduces how to apply low-rank adaptation (LoRA) to your model of choice using [Parameter-Efficient Fine-Tuning (PEFT) library developed by Hugging Face](https://huggingface.co/docs/peft/index). 


### ![Dolly](https://files.training.databricks.com/images/llm/dolly_small.png) Learning Objectives
1. Apply LoRA to a model
1. Fine-tune on your provided dataset
1. Save your model
1. Conduct inference using the fine-tuned model

In [0]:
%pip install peft==0.4.0

[43mNote: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.[0m
Collecting peft==0.4.0
  Downloading peft-0.4.0-py3-none-any.whl (72 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 72.9/72.9 kB 2.1 MB/s eta 0:00:00
Installing collected packages: peft
Successfully installed peft-0.4.0
[43mNote: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.[0m


In [0]:
%run ../Includes/Classroom-Setup

[43mNote: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.[0m
[43mNote: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.[0m


Resetting the learning environment:
| enumerating serving endpoints...found 7...(0 seconds)
| No action taken

Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/llm-foundation-models/v01-raw"

Validating the locally installed datasets:
| listing local files...(4 seconds)
| removing extra path: /datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/...(0 seconds)
| removing extra path: /datasets/downloads/...(0 seconds)
| removing extra file: /datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96.incomplete_info.lock...(0 seconds)
| removing extra file: /datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96_builder.lock...(0 seconds)
| removing extra file: /datasets/_dbfs_mnt_dbacademy-datasets_llm-foundation-models_v01-raw_

Importing lab testing framework.



Using the "default" schema.

Predefined paths variables:
| DA.paths.working_dir: /dbfs/mnt/dbacademy-users/labuser5958453@vocareum.com/llm-foundation-models
| DA.paths.user_db:     dbfs:/mnt/dbacademy-users/labuser5958453@vocareum.com/llm-foundation-models/database.db
| DA.paths.datasets:    /dbfs/mnt/dbacademy-datasets/llm-foundation-models/v01-raw

Setup completed (17 seconds)

The models developed or used in this course are for demonstration and learning purposes only.
Models may occasionally output offensive, inaccurate, biased information, or harmful instructions.


We will re-use the same dataset and model from the demo notebook. 

In [0]:
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
foundation_model = AutoModelForCausalLM.from_pretrained(model_name)

data = load_dataset("Abirate/english_quotes", cache_dir=DA.paths.datasets+"/datasets")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_sample = data["train"].select(range(50))
display(train_sample) 

Downloading tokenizer_config.json:   0%|          | 0.00/222 [00:00<?, ?B/s]

Downloading tokenizer.json:   0%|          | 0.00/14.5M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/85.0 [00:00<?, ?B/s]

Downloading config.json:   0%|          | 0.00/715 [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/1.12G [00:00<?, ?B/s]



Downloading readme:   0%|          | 0.00/5.55k [00:00<?, ?B/s]



Downloading and preparing dataset json/Abirate--english_quotes to /local_disk0/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...


Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/647k [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Dataset json downloaded and prepared to /local_disk0/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.


  0%|          | 0/1 [00:00<?, ?it/s]

Map:   0%|          | 0/2508 [00:00<?, ? examples/s]

Dataset({
    features: ['quote', 'author', 'tags', 'input_ids', 'attention_mask'],
    num_rows: 50
})

## Define LoRA configurations

By using LoRA, you are unfreezing the attention `Weight_delta` matrix and only updating `W_a` and `W_b`. 

<img src="https://files.training.databricks.com/images/llm/lora.png" width=300>

You can treat `r` (rank) as a hyperparameter. Recall from the lecture that, LoRA can perform well with very small ranks based on [Hu et a 2021's paper](https://arxiv.org/abs/2106.09685). GPT-3's validation accuracies across tasks with ranks from 1 to 64 are quite similar. From [PyTorch Lightning's documentation](https://lightning.ai/pages/community/article/lora-llm/):

> A smaller r leads to a simpler low-rank matrix, which results in fewer parameters to learn during adaptation. This can lead to faster training and potentially reduced computational requirements. However, with a smaller r, the capacity of the low-rank matrix to capture task-specific information decreases. This may result in lower adaptation quality, and the model might not perform as well on the new task compared to a higher r.

Other arguments:
- `lora_dropout`: 
  - Dropout is a regularization method that reduces overfitting by randomly and temporarily removing nodes during training. 
  - It works like this: <br>
    * Apply to most type of layers (e.g. fully connected, convolutional, recurrent) and larger networks
    * Temporarily and randomly remove nodes and their connections during each training cycle
    ![](https://files.training.databricks.com/images/nn_dropout.png)
    * See the original paper here: <a href="http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf" target="_blank">Dropout: A Simple Way to Prevent Neural Networks from Overfitting</a>
- `target_modules`:
  - Specifies the module names to apply to 
  - This is dependent on how the foundation model names its attention weight matrices. 
  - Typically, this can be:
    - `query`, `q`, `q_proj` 
    - `key`, `k`, `k_proj` 
    - `value`, `v` , `v_proj` 
    - `query_key_value` 
  - The easiest way to inspect the module/layer names is to print the model, like we are doing below.

### Question 1

Fill in `r=1` and `target_modules`. 

Note:
- For `r`, any number is valid. The smaller the r is, the fewer parameters there are to update during the fine-tuning process.

Hint: 
- For `target_modules`, what's the name of the **first** module within each `BloomBlock`'s `self_attention`? 

Read more about [`LoraConfig` here](https://huggingface.co/docs/peft/conceptual_guides/lora#common-lora-parameters-in-peft).

In [0]:
# TODO
import peft
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=1,
    lora_alpha=1, # a scaling factor that adjusts the magnitude of the weight matrix. Usually set to 1
    target_modules=["query_key_value"],
    lora_dropout=0.05, 
    bias="none", # this specifies if the bias parameter should be trained. 
    task_type="CAUSAL_LM"
)

In [0]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_1(lora_config.r, lora_config.target_modules)

[32mPASSED[0m: All tests passed for lesson2, question1
[32mRESULTS RECORDED[0m: Click `Submit` when all questions are completed to log the results.


###  Question 2

Add the adapter layers to the foundation model to be trained

In [0]:
help(get_peft_model)

Help on function get_peft_model in module peft.mapping:

get_peft_model(model: 'PreTrainedModel', peft_config: 'PeftConfig', adapter_name: 'str' = 'default') -> 'PeftModel'
    Returns a Peft model object from a model and a config.
    
    Args:
        model ([`transformers.PreTrainedModel`]): Model to be wrapped.
        peft_config ([`PeftConfig`]): Configuration object containing the parameters of the Peft model.



In [0]:
# TODO
peft_model = get_peft_model(foundation_model, lora_config)
print(peft_model.print_trainable_parameters())

trainable params: 98,304 || all params: 559,312,896 || trainable%: 0.01757585078102687
None


In [0]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_2(peft_model)

[32mPASSED[0m: All tests passed for lesson2, question2
[32mRESULTS RECORDED[0m: Click `Submit` when all questions are completed to log the results.


## Define `Trainer` class for fine-tuning

### Question 3 

Fill out the `Trainer` class. Feel free to tweak the `training_args` we provided, but remember that lowering the learning rate and increasing the number of epochs will increase training time significantly. If you change none of the defaults we set below, it could take ~15 mins to fine-tune.

In [0]:
# TODO
import transformers
from transformers import TrainingArguments, Trainer
import os
import mlflow

# Tell MLflow Tracking to use this explicit experiment path,
# which is located on the left hand sidebar under Machine Learning -> Experiments 
mlflow.set_experiment(f"/Users/{DA.username}/LLM 02L - LoRA with PEFT")

output_directory = os.path.join(DA.paths.working_dir, "peft_lab_outputs")
training_args = TrainingArguments(
    output_dir=output_directory,
    auto_find_batch_size=True,
    learning_rate= 3e-2, # Higher learning rate than full fine-tuning.
    num_train_epochs=5,
    no_cuda=True
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_sample,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
trainer.train()

2024/05/24 10:14:17 INFO mlflow.tracking.fluent: Experiment with name '/Users/labuser5958453@vocareum.com/LLM 02L - LoRA with PEFT' does not exist. Creating a new experiment.
You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss


TrainOutput(global_step=35, training_loss=7.041696166992187, metrics={'train_runtime': 1094.9289, 'train_samples_per_second': 0.228, 'train_steps_per_second': 0.032, 'total_flos': 58346118414336.0, 'train_loss': 7.041696166992187, 'epoch': 5.0})

In [0]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_3(trainer)

[32mPASSED[0m: All tests passed for lesson2, question3
[32mRESULTS RECORDED[0m: Click `Submit` when all questions are completed to log the results.


## Load model

### Question 4 

Load the PEFT model using pre-defined LoRA configs and foundation model. We set `is_trainable=False` to avoid further training.

In [0]:
import time

time_now = time.time()

username = spark.sql("SELECT CURRENT_USER").first()[0]
peft_model_path = os.path.join(output_directory, f"peft_model_{time_now}")
trainer.model.save_pretrained(peft_model_path)

In [0]:
# TODO
from peft import PeftModel, PeftConfig

loaded_model = PeftModel.from_pretrained(peft_model_path,
                                         model_id=model_id,
                                         is_trainable=False)

In [0]:
help(PeftConfig)

Help on class PeftConfig in module peft.utils.config:

class PeftConfig(PeftConfigMixin)
 |  PeftConfig(peft_type: Union[str, peft.utils.config.PeftType] = None, auto_mapping: Optional[dict] = None, base_model_name_or_path: str = None, revision: str = None, task_type: Union[str, peft.utils.config.TaskType] = None, inference_mode: bool = False) -> None
 |  
 |  This is the base configuration class to store the configuration of a [`PeftModel`].
 |  
 |  Args:
 |      peft_type (Union[[`~peft.utils.config.PeftType`], `str`]): The type of Peft method to use.
 |      task_type (Union[[`~peft.utils.config.TaskType`], `str`]): The type of task to perform.
 |      inference_mode (`bool`, defaults to `False`): Whether to use the Peft model in inference mode.
 |  
 |  Method resolution order:
 |      PeftConfig
 |      PeftConfigMixin
 |      transformers.utils.hub.PushToHubMixin
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __eq__(self, other)
 |      Return self==value.
 |  


In [0]:
help(PeftModel)

Help on class PeftModel in module peft.peft_model:

class PeftModel(transformers.utils.hub.PushToHubMixin, torch.nn.modules.module.Module)
 |  PeftModel(model: 'PreTrainedModel', peft_config: 'PeftConfig', adapter_name: 'str' = 'default')
 |  
 |  Base model encompassing various Peft methods.
 |  
 |  Args:
 |      model ([`~transformers.PreTrainedModel`]): The base transformer model used for Peft.
 |      peft_config ([`PeftConfig`]): The configuration of the Peft model.
 |  
 |  
 |  **Attributes**:
 |      - **base_model** ([`~transformers.PreTrainedModel`]) -- The base transformer model used for Peft.
 |      - **peft_config** ([`PeftConfig`]) -- The configuration of the Peft model.
 |      - **modules_to_save** (`list` of `str`) -- The list of sub-module names to save when
 |      saving the model.
 |      - **prompt_encoder** ([`PromptEncoder`]) -- The prompt encoder used for Peft if
 |      using [`PromptLearningConfig`].
 |      - **prompt_tokens** (`torch.Tensor`) -- The virtu

In [0]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_4(loaded_model)

[0;31m---------------------------------------------------------------------------[0m
[0;31mNameError[0m                                 Traceback (most recent call last)
File [0;32m<command-1174370809808757>, line 3[0m
[1;32m      1[0m [38;5;66;03m# Test your answer. DO NOT MODIFY THIS CELL.[39;00m
[0;32m----> 3[0m dbTestQuestion2_4([43mloaded_model[49m)

[0;31mNameError[0m: name 'loaded_model' is not defined

## Inference

### Question 5

Generate output tokens to the same input we provided in the demo notebook before. How do the outputs compare?

In [0]:
# TODO
inputs = tokenizer("Two things are infinite: ", return_tensors="pt")
outputs = peft_model.generate(
    input_ids=<FILL_IN>, 
    attention_mask=<FILL_IN>, 
    max_new_tokens=<FILL_IN>, 
    eos_token_id=tokenizer.eos_token_id
    )
print(tokenizer.batch_decode(<FILL_IN>, skip_special_tokens=True))

In [0]:
# Test your answer. DO NOT MODIFY THIS CELL.

dbTestQuestion2_5(outputs)

&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>