# Fine-tune Falcon-7B with QLoRA and SageMaker remote decorator

## Question Answering

---

In this demo notebook, we demonstrate how to fine-tune the Falcon-7B model using QLoRA, Hugging Face PEFT, and bitsandbytes.

We use the SageMaker remote decorator (`@remote`) to run the fine-tuning as an Amazon SageMaker Training job.

---
SageMaker Studio Kernel: PyTorch 2.0.0 Python 3.10

Instance Type: ml.g5.8xlarge

Install the required libraries, including the Hugging Face libraries, and restart the kernel.

In [10]:
# !pip install virtualenv

In [5]:
# !python -m virtualenv myenv

created virtual environment CPython3.10.8.final.0-64 in 15761ms
  creator CPython3Posix(dest=/root/amazon-sagemaker-remote-decorator-generative-ai/myenv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==23.3.1, setuptools==69.0.2, wheel==0.41.3
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator


In [None]:
# Note: `!source` runs in a subshell, so it does not activate the environment for this kernel.
# !source myenv/bin/activate

In [9]:
# %pip install -r requirements.txt

In [11]:
# !pip install --upgrade pip

In [12]:
# %pip install -q -U transformers
# %pip install -q -U datasets
# %pip install -q -U peft
# %pip install -q -U accelerate
# %pip install -q -U bitsandbytes
# %pip install -q -U boto3
# %pip install -q -U sagemaker
# %pip install -q -U scikit-learn



## Set up the configuration file path

Point the SageMaker Python SDK at the directory that contains `config.yaml`, so that the remote decorator picks up the settings defined there.


In [55]:
import os

# Set path to config file
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = os.getcwd()
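The contents of `config.yaml` are not shown in this notebook. As a minimal sketch, assuming the SageMaker Python SDK configuration schema for `RemoteFunction`, a file like the one below would ship `requirements.txt` with the job (the job log further down confirms that file is copied) and choose the training instance. The instance type and the helper name `write_remote_config` are assumptions for illustration, not the author's actual configuration.

```python
from pathlib import Path

# Hypothetical minimal config for the @remote decorator. The keys follow the
# SageMaker Python SDK config schema; the instance type is an assumption.
CONFIG_YAML = """\
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: ./requirements.txt
        IncludeLocalWorkDir: true
        InstanceType: ml.g5.12xlarge
"""


def write_remote_config(directory: str) -> Path:
    """Write config.yaml into `directory` and return its path (hypothetical helper)."""
    path = Path(directory) / "config.yaml"
    path.write_text(CONFIG_YAML)
    return path
```

Calling `write_remote_config(os.getcwd())` before setting `SAGEMAKER_USER_CONFIG_OVERRIDE` would place the file where the SDK is told to look.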

## Visualize and upload the dataset

Read the DORA question-answer files (semicolon-separated CSVs) into pandas DataFrames.

In [5]:
import pandas as pd
df1 = pd.read_csv('./dora/dora_main.csv', sep=';')
df2 = pd.read_csv('./dora/DSID_101.csv', sep=';')
df3 = pd.read_csv('./dora/DSID_102.csv', sep=';')
df4 = pd.read_csv('./dora/dora_preamble.csv', sep=';')
df5 = pd.read_csv('./dora/dora_EU.csv', sep=';')

In [6]:
# Give every DataFrame the column names of df1, then concatenate them
for frame in (df2, df3, df4, df5):
    frame.columns = df1.columns

df = pd.concat([df1, df2, df3, df4, df5], ignore_index=True)

# To verify the result
df.tail()

|      | question | answers |
|------|----------|---------|
| 5799 | What significance does DORA place on the inter... | Maintaining a strong interaction ensures consi... |
| 5800 | What implications does DORA's categorization a... | For entities under the NIS 2 Directive, DORA r... |
| 5801 | What specific challenges did policymakers iden... | Policymakers identified limitations in existin... |
| 5802 | What challenges might financial entities face ... | Entities might encounter challenges in alignin... |
| 5803 | How does DORA aim to address the challenges ar... | DORA aims to mitigate inconsistencies from unc... |


In [7]:
from sklearn.model_selection import train_test_split

# A fixed random_state makes the 70/30 split reproducible across runs
train, test = train_test_split(df, test_size=0.3, random_state=42)

In [51]:
train.shape

(4062, 2)

In [79]:
test.shape

(1742, 2)
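The shapes above follow from how `train_test_split` sizes the split when only a float `test_size` is given: the test set gets `ceil(test_size * n)` rows and the train set gets the remainder. The sketch below reproduces that arithmetic for the 5,804 rows in `df` (index 0 to 5803 in the `tail()` output); `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a float test_size, mirroring scikit-learn's rounding."""
    n_test = math.ceil(test_size * n_samples)  # test split is rounded up
    n_train = n_samples - n_test               # train split takes the remainder
    return n_train, n_test


print(split_sizes(5804, 0.3))  # (4062, 1742)
```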



To train the model, we need to convert the input text to token IDs, which is done by a Hugging Face Transformers tokenizer. Following QLoRA, we use bitsandbytes to quantize the frozen LLM to 4-bit precision and attach LoRA adapters to it.
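To see why 4-bit quantization matters on a single GPU (ml.g5 instances use NVIDIA A10G GPUs with 24 GB of memory each), here is back-of-the-envelope arithmetic for the frozen weights alone. It assumes a round 7 billion parameters and ignores activations, the LoRA adapters and their optimizer state, and the small overhead of quantization constants (which double quantization reduces further).

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory (GB, with 1 GB = 1e9 bytes) needed to store the weights."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # assumption: round figure for Falcon-7B
for bits, label in [(32, "fp32"), (16, "bf16/fp16"), (4, "NF4 (4-bit)")]:
    print(f"{label:>12}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Going from 16-bit to 4-bit cuts the weight footprint from about 14 GB to about 3.5 GB, leaving most of the GPU memory for activations during training.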



In [10]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

Create a prompt template that turns each question-answer pair into a single training text ending with the tokenizer's end-of-sequence token.

In [11]:
from random import randint

# Custom instruction prompt: question, separator, answer, then the EOS token
prompt_template = "{question}\n---\nAnswer:\n{answer}{eos_token}"

# Add the formatted prompt to each sample as a "text" field.
# Note: uses the global `tokenizer`, which is loaded in a later cell.
def template_dataset(sample):
    sample["text"] = prompt_template.format(question=sample["question"],
                                            answer=sample["answers"],
                                            eos_token=tokenizer.eos_token)
    return sample
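To make the template concrete, the sketch below renders one training text and the matching inference prompt with plain `str.format`. It hard-codes `<|endoftext|>`, the end-of-sequence string visible in the formatted sample printed further down, instead of reading `tokenizer.eos_token`; the helper names and the example answer are illustrative.

```python
TRAIN_TEMPLATE = "{question}\n---\nAnswer:\n{answer}{eos_token}"
INFERENCE_TEMPLATE = "{question}\n---\nAnswer:\n"  # the answer is left for the model to generate
EOS_TOKEN = "<|endoftext|>"  # Falcon's end-of-sequence string


def render_train_text(question: str, answer: str) -> str:
    return TRAIN_TEMPLATE.format(question=question, answer=answer, eos_token=EOS_TOKEN)


def render_inference_prompt(question: str) -> str:
    return INFERENCE_TEMPLATE.format(question=question)


text = render_train_text("What is TLPT?", "Threat-led penetration testing.")
print(text)
```

Because the inference prompt is a prefix of the training text, the fine-tuned model learns to continue it with the answer and then emit the EOS token.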

Load the tokenizer for the base model. The training function defined below uses the Hugging Face `Trainer` class to fine-tune the model, with a `DataCollatorForLanguageModeling` that pads the inputs and builds the labels.

In [36]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Falcon's tokenizer has no pad token. Use id 0 rather than the EOS id: the
# language-modeling data collator masks every label equal to pad_token_id,
# so padding with EOS would also hide the EOS that ends each answer.
tokenizer.pad_token_id = 0
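Why not reuse the EOS token for padding? `DataCollatorForLanguageModeling(tokenizer, mlm=False)` copies `input_ids` into `labels` and replaces every position equal to `tokenizer.pad_token_id` with -100, which the loss ignores. If padding reused EOS (id 11 for Falcon, per the generation warnings below), the EOS at the end of each answer would be masked too, and the model would not learn to stop. The pure-Python sketch below mimics only that masking step; it is a simplification, not the collator's code, and the token ids other than 0 and 11 are made up.

```python
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss


def causal_lm_labels(input_ids: list[int], pad_token_id: int) -> list[int]:
    """Simplified mimic of the collator: copy inputs, mask positions equal to pad_token_id."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


EOS_ID = 11
# A made-up tokenized answer ending in EOS, right-padded to length 6 with the chosen pad id.
seq_with_pad_0 = [501, 502, 503, EOS_ID, 0, 0]
seq_with_pad_eos = [501, 502, 503, EOS_ID, EOS_ID, EOS_ID]

print(causal_lm_labels(seq_with_pad_0, pad_token_id=0))         # EOS kept as a target
print(causal_lm_labels(seq_with_pad_eos, pad_token_id=EOS_ID))  # EOS masked away
```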


In [37]:
model = AutoModelForCausalLM.from_pretrained(model_id)

Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00,  2.75s/it]


Before fine-tuning, try some prompts about DORA (the EU Digital Operational Resilience Act) to see how the base model answers:

In [40]:
prompts = [
    "What is DORA?",
    "When was DORA approved?",
    "By when does DORA need to be implemented?",
    "How many Chapters does DORA have?",
    "What is CSIRT?",
    "What is the number of preamble in DORA?",
    "What is the scope of DORA?",
    "What is the definition of ICT system?",
    "What is the 'Proportionality  Principle'?",
    "What is TLPT?",
]

for prompt in prompts:
    model_inputs = tokenizer(prompt, return_tensors="pt")
    # max_length counts the prompt tokens as well as the generated ones
    output = model.generate(**model_inputs, max_length=500)[0]
    print(tokenizer.decode(output, skip_special_tokens=True))
    print("-" * 40)  # separator between answers

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


What is DORA?
DORA is an acronym for "Data Object Request for Applications". It is a standard for data access that allows applications to access data from a variety of data sources.
----------------------------------------


When was DORA approved?
DORA was approved in 2015.
----------------------------------------


By when does DORA need to be implemented?
DORA needs to be implemented as soon as possible to ensure that all data is properly stored and organized. It is important to have a system in place that can handle large amounts of data and can be easily accessed and updated.
----------------------------------------


How many Chapters does DORA have?
Dora has 12 chapters in total.
----------------------------------------


What is CSIRT?
CSIRT stands for Computer Security Incident Response Team. It is a team of professionals who are responsible for responding to security incidents and mitigating the impact of these incidents on the organization.
----------------------------------------


What is the number of preamble in DORA?
I'm sorry, I cannot provide an accurate answer as DORA is not a specific term or acronym. Can you please provide more context or information?
----------------------------------------


What is the scope of DORA?
DORA (Data Object Request for Applications) is a programming language that is used to create and manage data objects. It is used to create and manage data objects in a variety of applications, including databases, data warehouses, and data visualization tools. The scope of DORA is to provide a standardized way of accessing and manipulating data objects, regardless of the specific application or platform they are being used on.
----------------------------------------


What is the definition of ICT system?
The definition of ICT system is a combination of hardware, software, and network components that work together to process, store, and communicate information.
----------------------------------------


What is the 'Proportionality  Principle'?
The Proportionality Principle is a legal doctrine that requires a proportionality between the severity of the punishment and the crime committed. It is often used in criminal law to determine whether a punishment is excessive or disproportionate to the crime.
----------------------------------------
What is TLPT?
TLPT stands for "Tales from the Public Transport" and is a series of short stories written by a group of writers from the UK. The stories are based on the experiences of the writers on public transport and are often humorous.
----------------------------------------


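The base model's answers show that it does not know DORA: it expands the acronym incorrectly and invents an approval date.

Convert the train and test splits into a Hugging Face `DatasetDict` and apply the prompt template to every sample: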

In [56]:
from datasets import Dataset, DatasetDict

train_dataset = Dataset.from_pandas(train)
test_dataset = Dataset.from_pandas(test)

dataset = DatasetDict({"train": train_dataset, "test": test_dataset})

train_dataset = dataset["train"].map(template_dataset, remove_columns=list(dataset["train"].features))

# Print one random formatted training sample (randint's upper bound is inclusive)
print(train_dataset[randint(0, len(train_dataset) - 1)]["text"])

test_dataset = dataset["test"].map(template_dataset, remove_columns=list(dataset["test"].features))

Map: 100%|██████████| 4062/4062 [00:00<00:00, 20503.82 examples/s]


What approaches are available to competent authorities for implementing administrative penalties and remedial measures in Article 50, paragraph 1?
---
Answer:
Direct implementation, collaboration with other authorities, delegation of responsibility, or seeking intervention from judicial authorities.<|endoftext|>


Map: 100%|██████████| 1742/1742 [00:00<00:00, 22730.17 examples/s]


Utility function that finds the names of all 4-bit linear layers so that LoRA adapters can be attached to them (the `lm_head` output layer is excluded). See [this](https://github.com/artidoro/qlora/blob/main/qlora.py) QLoRA reference implementation for additional info.

In [90]:
import bitsandbytes as bnb

def find_all_linear_names(hf_model):
    lora_module_names = set()
    for name, module in hf_model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
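Because `find_all_linear_names` keeps only the last component of each dotted module name, the 32 decoder layers collapse to a handful of distinct names, which `LoraConfig(target_modules=...)` then matches in every layer. The sketch below replays that string logic on a hypothetical list of Falcon-style names (`transformer.h.<i>.self_attention.query_key_value`, as in the model printout further down), so no model or GPU is needed. The names `dense`, `dense_h_to_4h`, and `dense_4h_to_h` are assumptions about Falcon's architecture that this notebook does not show.

```python
def leaf_names(module_names: list[str]) -> list[str]:
    """Replay find_all_linear_names' string handling on already-filtered 4-bit linear names."""
    names = set()
    for name in module_names:
        parts = name.split(".")
        names.add(parts[0] if len(parts) == 1 else parts[-1])
    names.discard("lm_head")  # the output head is not adapted
    return sorted(names)


# Hypothetical Falcon-style names for the 32 decoder layers plus the output head.
falcon_like = [
    f"transformer.h.{i}.{suffix}"
    for i in range(32)
    for suffix in (
        "self_attention.query_key_value",
        "self_attention.dense",
        "mlp.dense_h_to_4h",
        "mlp.dense_4h_to_h",
    )
] + ["lm_head"]

print(leaf_names(falcon_like))
```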

Define the training function. First, reduce the verbosity of the SageMaker config logger:

In [91]:
import logging

sagemaker_config_logger = logging.getLogger("sagemaker.config")
sagemaker_config_logger.setLevel(logging.WARNING)

In [92]:
from huggingface_hub import login
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training
from sagemaker.remote_function import remote
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import transformers

# Start training
@remote(volume_size=50)
def train_fn(
        model_name,
        train_ds,
        test_ds,
        lora_r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        merge_weights=False,
        token=None
):
    if token is not None:
        login(token=token)

    # tokenize the datasets with the notebook's global `tokenizer`
    lm_train_dataset = train_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, batch_size=24, remove_columns=list(train_ds.features)
    )

    lm_test_dataset = test_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(test_ds.features)
    )

    # Print total number of samples
    print(f"Total number of train samples: {len(lm_train_dataset)}")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto")

    model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    # get lora target modules
    modules = find_all_linear_names(model)
    print(f"Found {len(modules)} LoRA target modules: {modules}")

    config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        target_modules=modules,
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM"
    )

    model = get_peft_model(model, config)
    print_trainable_parameters(model)

    trainer = transformers.Trainer(
        model=model,
        train_dataset=lm_train_dataset,
        eval_dataset=lm_test_dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=per_device_train_batch_size,
            per_device_eval_batch_size=per_device_eval_batch_size,
            logging_steps=2,
            num_train_epochs=num_train_epochs,
            learning_rate=learning_rate,
            bf16=True,
            save_strategy="no",
            output_dir="outputs"
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False

    trainer.train()
    trainer.evaluate()

    if merge_weights:
        output_dir = "/tmp/model"

        # save the LoRA adapter weights, then reload them on top of an fp16
        # copy of the base model and merge the two
        trainer.model.save_pretrained(output_dir, safe_serialization=False)
        # clear memory
        del model
        del trainer
        torch.cuda.empty_cache()

        # load PEFT model in fp16
        model = AutoPeftModelForCausalLM.from_pretrained(
            output_dir,
            low_cpu_mem_usage=True,
            torch_dtype=torch.float16,
        )
        # Merge LoRA and base model and save
        model = model.merge_and_unload()
        model.save_pretrained(
            "/opt/ml/model", safe_serialization=True, max_shard_size="2GB"
        )
    else:
        model.save_pretrained("/opt/ml/model", safe_serialization=True)

    tmp_tokenizer = AutoTokenizer.from_pretrained(model_name)
    tmp_tokenizer.save_pretrained("/opt/ml/model")
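When `merge_weights=True`, `merge_and_unload()` folds each adapter back into its frozen weight. In the standard LoRA formulation used by PEFT (ignoring the rank-stabilized variant), a target layer with frozen weight $W_0 \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$ and adapter matrices $A$ and $B$ becomes:

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\; A \in \mathbb{R}^{r \times d_{\text{in}}}
$$

With `lora_alpha=32` and `lora_r=8` (the values passed to `train_fn` below), the update is scaled by $32 / 8 = 4$. The merged model has the same shape as the base model, so it can be served without PEFT, at the cost of storing a full copy of the weights instead of a small adapter.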

Define the training and LoRA hyperparameters. These values override the defaults in the `train_fn` signature.

In [93]:
learning_rate=2e-4
num_train_epochs=1
per_device_train_batch_size=8
per_device_eval_batch_size=8

lora_r=8
lora_alpha=32
lora_dropout=0.05
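These settings determine how small the trainable adapter is. Each adapted linear layer gains $r \cdot (d_{\text{in}} + d_{\text{out}})$ parameters (matrices $A$ and $B$). The sketch below applies that formula to Falcon-7B under two assumptions not verified in this notebook: that `find_all_linear_names` returns the four per-layer modules `query_key_value`, `dense`, `dense_h_to_4h`, and `dense_4h_to_h`, and that their shapes are 4544→4672 (consistent with the adapter printout further down), 4544→4544, 4544→18176, and 18176→4544 across 32 layers. `print_trainable_parameters` reports the real figure during training.

```python
# Assumed (d_in, d_out) of the four adapted linear layers in each Falcon-7B decoder layer.
FALCON_7B_LINEARS = {
    "query_key_value": (4544, 4672),
    "dense": (4544, 4544),
    "dense_h_to_4h": (4544, 18176),
    "dense_4h_to_h": (18176, 4544),
}
NUM_LAYERS = 32


def lora_param_count(r: int) -> int:
    """Parameters added by LoRA: r * (d_in + d_out) per adapted layer, over all decoder layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in FALCON_7B_LINEARS.values())
    return per_layer * NUM_LAYERS


print(f"lora_r=8:  {lora_param_count(8):,} trainable parameters")
print(f"lora_r=64: {lora_param_count(64):,} trainable parameters (train_fn default)")
```

Under these assumptions, `lora_r=8` trains about 16.3 million parameters, well under 1% of the base model.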

In [94]:
train_fn(
    model_id,
    train_dataset,
    test_dataset,
    lora_r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    learning_rate=learning_rate,
    num_train_epochs=num_train_epochs,
    merge_weights=False
)

2023-12-07 23:45:03,465 sagemaker.remote_function INFO     Serializing function code to s3://sagemaker-us-east-1-848055118036/train-fn-2023-12-07-23-45-03-464/function
2023-12-07 23:45:03,839 sagemaker.remote_function INFO     Serializing function arguments to s3://sagemaker-us-east-1-848055118036/train-fn-2023-12-07-23-45-03-464/arguments
2023-12-07 23:45:04,447 sagemaker.remote_function INFO     Copied dependencies file at './requirements.txt' to '/tmp/tmpopm61zhx/temp_workspace/sagemaker_remote_function_workspace/requirements.txt'
2023-12-07 23:45:04,448 sagemaker.remote_function INFO     Successfully created workdir archive at '/tmp/tmpopm61zhx/workspace.zip'
2023-12-07 23:45:04,496 sagemaker.remote_function INFO     Successfully uploaded workdir to 's3://sagemaker-us-east-1-848055118036/train-fn-2023-12-07-23-45-03-464/sm_rf_user_ws/workspace.zip'
2023-12-07 23:45:04,497 sagemaker.remote_function INFO     Creating job: train-fn-2023-12-07-23-45-03-464


ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: Cannot find repository: huggingface-pytorch-training in registry ID: 848055118036 Please check if your ECR repository exists and role arn:aws:iam::848055118036:role/service-role/SageMaker-DataScientist-EKAI has proper pull permissions for SageMaker: ecr:BatchCheckLayerAvailability, ecr:BatchGetImage, ecr:GetDownloadUrlForLayer
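The job failed before training started: SageMaker looked for the `huggingface-pytorch-training` repository in the notebook's own account (848055118036). Since `@remote` here sets only `volume_size`, the image URI most likely comes from `config.yaml`, and it appears to point at the wrong registry: AWS hosts its Hugging Face Deep Learning Containers in AWS-owned accounts (763104351884 in us-east-1), not in yours. The stdlib sketch below splits an ECR image URI into its parts so the registry account can be checked before submitting a job. The helper and the example tag are illustrative assumptions; in the SageMaker SDK, `sagemaker.image_uris.retrieve(...)` is the way to look up a valid container URI.

```python
import re
from typing import NamedTuple


class EcrImage(NamedTuple):
    account: str
    region: str
    repository: str
    tag: str


# <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_uri(uri: str) -> EcrImage:
    """Split an ECR image URI into account, region, repository and tag."""
    match = _ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri!r}")
    return EcrImage(**match.groupdict())


# Illustrative URI; the tag is an assumption, not taken from this notebook.
image = parse_ecr_uri(
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04"
)
print(image.account, image.region, image.repository)
```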

## Load Fine-Tuned model

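Note: this section assumes `train_fn` was run with `merge_weights=False`, so the job output contains only the LoRA adapter weights and the tokenizer.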

### Download model

In [2]:
import boto3

s3_client = boto3.client("s3")

In [3]:
bucket_name = "<S3_BUCKET>"
job_name = "<JOB_NAME>"

In [4]:
s3_client.download_file(bucket_name, f"{job_name}/{job_name}/output/model.tar.gz", "model.tar.gz")

In [5]:
! rm -rf ./model && mkdir -p ./model && tar -xf model.tar.gz -C ./model
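The shell cell above needs `tar` on the PATH. A pure-Python equivalent using the standard-library `tarfile` module is sketched below; `extract_model_archive` is a hypothetical helper name. It also refuses archive members whose paths would land outside the target directory, a cheap safeguard when extracting downloaded archives.

```python
import shutil
import tarfile
from pathlib import Path


def extract_model_archive(archive: str, dest: str) -> list[str]:
    """Recreate `dest` and extract `archive` into it (like rm -rf && mkdir -p && tar -xf)."""
    dest_path = Path(dest)
    shutil.rmtree(dest_path, ignore_errors=True)  # rm -rf ./model
    dest_path.mkdir(parents=True)                 # mkdir -p ./model
    with tarfile.open(archive) as tar:            # compression (e.g. gzip) is auto-detected
        root = dest_path.resolve()
        for member in tar.getmembers():
            target = (dest_path / member.name).resolve()
            # Refuse member paths that escape dest, such as "../x" or absolute paths.
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_path)
    return sorted(p.name for p in dest_path.iterdir())
```

Usage: `extract_model_archive("model.tar.gz", "./model")` returns the top-level file names that were extracted.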

Now load the trained PEFT (LoRA) adapter weights on top of the base model.

In [6]:
from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Set the Falcon tokenizer
tokenizer.pad_token = tokenizer.eos_token

In [7]:
from peft import PeftModel, PeftConfig
import torch
from transformers import AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

config = PeftConfig.from_pretrained("./model")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "./model")
model.to(device)

  warn("The installed version of bitsandbytes was compiled without GPU support. "


'NoneType' object has no attribute 'cadam32bit_grad_fp32'


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 4544)
        (h): ModuleList(
          (0-31): 32 x DecoderLayer(
            (input_layernorm): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear(
                in_features=4544, out_features=4672, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4544, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4672, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()

Load a test dataset and try a random sample for Q&A.

In [8]:
import pandas as pd
df = pd.read_csv('train.csv.gz', compression='gzip', sep=';')

sample = df.sample()

# format sample
prompt_template = f"{{question}}\n---\nAnswer:\n"

test_sample = prompt_template.format(question=sample.iloc[0]["question"])

print(test_sample)

print("Original answer:\n", sample.iloc[0]["answers"])

How do I get my device listed in the AWS Partner Device Catalog?
---
Answer:

Original answer:
 If you are an AWS partner, the AWS Device Qualification Program defines the process to get your device listed in the catalog. A high level overview of the process is as follows: Pass the AWS IoT Device Tester for AWS IoT Greengrass test Log into the AWS Partner Network Portal Upload the AWS IoT Device Tester report. Once the report is verified by AWS, and other device related artifacts such as picture and data sheet have been submitted, the device is listed in the AWS Partner Device Catalog.


In [9]:
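model_inputs = tokenizer(test_sample, return_tensors="pt")
input_ids = model_inputs.input_ids

In [10]:
# set the number of tokens to generate for the answer
tokens_for_answer = 100
output_tokens = input_ids.shape[1] + tokens_for_answer

# pass the attention mask and an explicit pad token id to avoid the warnings shown below
outputs = model.generate(
    input_ids=input_ids.to(device),
    attention_mask=model_inputs.attention_mask.to(device),
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    max_length=output_tokens,
)
gen_text = tokenizer.batch_decode(outputs)[0]

print(gen_text)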

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


How do I get my device listed in the AWS Partner Device Catalog?
---
Answer:
To get started, simply submit a request to the AWS Device Farm team. Once we confirm your eligibility and receive the required information, your devices will be listed. If there is anything else we require, we will let you know. Please contact us if you’d like to get started. Please reference the AWS API reference for more details on the Partner Device Catalog operations and documentation. If you intend to integrate with one of the AWS Partner Device Catalog API endpoints, we recommend you first validate the process by
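`tokenizer.batch_decode(outputs)` returns the prompt followed by the continuation, and because `skip_special_tokens` is not set, an `<|endoftext|>` marker can appear if the model stops before the token limit. The hypothetical helper below keeps only the generated answer: it takes the text after the first `\n---\nAnswer:\n` separator from the prompt template and cuts at the first end-of-sequence string. With `tokens_for_answer = 100`, generation can also stop mid-sentence, as in the output above, so an answer without an EOS marker may be truncated.

```python
ANSWER_MARKER = "\n---\nAnswer:\n"  # separator used by the prompt template
EOS_TOKEN = "<|endoftext|>"


def extract_answer(generated: str) -> str:
    """Return the text between the first answer marker and the first EOS token (if any)."""
    _, marker, answer = generated.partition(ANSWER_MARKER)
    if not marker:
        raise ValueError("answer marker not found in generated text")
    answer, _, _ = answer.partition(EOS_TOKEN)  # drop EOS and anything sampled after it
    return answer.strip()
```

Usage: `print(extract_answer(gen_text))`.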
