# Fine-tune Falcon-7B with QLoRA and SageMaker remote decorator

## Question Answering

---

In this demo notebook, we demonstrate how to fine-tune the Falcon-7B model using QLoRA, Hugging Face PEFT, and bitsandbytes.

We use the SageMaker remote decorator to run the fine-tuning job as an Amazon SageMaker training job.

---

Install the required libraries, including the Hugging Face libraries, and restart the kernel.

In [None]:
%pip install -r requirements.txt

In [None]:
%pip install -q -U transformers==4.28.1
%pip install -q -U git+https://github.com/huggingface/peft.git@e2b8e3260d3eeb736edf21a2424e89fe3ecf429d
%pip install -q -U git+https://github.com/huggingface/accelerate.git@b76409ba05e6fa7dfc59d50eee1734672126fdba
%pip install -q -U bitsandbytes==0.39.1
%pip install -q -U boto3
%pip install -q -U sagemaker==2.154.0
%pip install -q -U scikit-learn


## Set up the configuration file path

We set the directory that contains the config.yaml file so that the remote decorator can pick up its settings.


In [2]:
import os

# Set path to config file
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = os.getcwd()
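The config.yaml file itself is not shown in this notebook. The sketch below renders one plausible layout for it, using the config keys that appear in the job log further down (`ImageUri`, `Dependencies`, `InstanceType`, `RoleArn`). The `SchemaVersion` line and the placeholder role ARN are assumptions, and `to_yaml` is a hypothetical stand-in for a real YAML library.

```python
# Sketch: render a config.yaml for the SageMaker remote decorator.
# Key names follow the "[Sagemaker Config - applied value]" log lines;
# the role ARN is a placeholder and SchemaVersion is an assumption.

def to_yaml(node, indent=0):
    """Render a nested dict of strings as indented YAML text (minimal helper)."""
    lines = []
    pad = " " * indent
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)

remote_function_defaults = {
    "Dependencies": "./requirements.txt",
    "ImageUri": (
        "763104351884.dkr.ecr.eu-west-1.amazonaws.com/"
        "huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04"
    ),
    "InstanceType": "ml.g5.12xlarge",
    "RoleArn": "<EXECUTION_ROLE_ARN>",  # placeholder, replace with your own role
}

config = {
    "SchemaVersion": "'1.0'",
    "SageMaker": {"PythonSDK": {"Modules": {"RemoteFunction": remote_function_defaults}}},
}

config_text = to_yaml(config)
print(config_text)
```

Saving the printed text as config.yaml in the notebook's working directory would match the path set through `SAGEMAKER_USER_CONFIG_OVERRIDE` above.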

## Visualize and upload the dataset

Read the training dataset into a pandas DataFrame.

In [3]:
import pandas as pd
df = pd.read_csv('train.csv.gz', compression='gzip', sep=';')
df.head()

| Unnamed: 0 | service | question | answers |
|---|---|---|---|
| 0 | /ec2/autoscaling/faqs/ | What is Amazon EC2 Auto Scaling? | Amazon EC2 Auto Scaling is a fully managed ser... |
| 1 | /ec2/autoscaling/faqs/ | When should I use Amazon EC2 Auto Scaling vs. ... | You should use AWS Auto Scaling to manage scal... |
| 2 | /ec2/autoscaling/faqs/ | How is Predictive Scaling Policy different fro... | Predictive Scaling Policy brings the similar p... |
| 3 | /ec2/autoscaling/faqs/ | What are the benefits of using Amazon EC2 Auto... | Amazon EC2 Auto Scaling helps to maintain your... |
| 4 | /ec2/autoscaling/faqs/ | What is fleet management and how is it differe... | If your application runs on Amazon EC2 instanc... |


In [4]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.3)
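With `test_size=0.3`, the rows are shuffled and 30% of them are held out for evaluation. The stdlib-only sketch below mimics that behaviour to show where the split sizes come from; it is an illustration, not scikit-learn's implementation. The row count of 7,288 is inferred from the 5,101 training and 2,187 test examples reported later in this notebook, and the helper name `split_indices` and the seed are hypothetical.

```python
import math
import random

def split_indices(n_rows, test_size=0.3, seed=0):
    """Shuffle row indices and hold out a test share (rounded up)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_size)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(7288)
print(len(train_idx), len(test_idx))  # 5101 2187
```

Because the split is random, the rows in each set change between runs; passing `random_state` to `train_test_split` makes the split reproducible.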



To train our model, we need to convert our inputs (text) to token IDs. This is done by a Hugging Face Transformers tokenizer. Following the QLoRA approach, we use bitsandbytes to quantize our frozen LLM to 4-bit precision and attach LoRA adapters to it.
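To give an intuition for what quantizing frozen weights to 4 bits means, the sketch below maps a small block of weights to 15 signed integer levels with a per-block scale and then restores approximate values. It deliberately uses plain uniform absmax quantization, not the NF4 data type or the double quantization that bitsandbytes applies in this notebook; all names and values are hypothetical.

```python
def quantize_block(weights, bits=4):
    """Map floats to signed integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    levels = 2 ** (bits - 1) - 1          # 7 for symmetric 4-bit quantization
    scale = max(abs(w) for w in weights) / levels
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate floats from integer codes and the block scale."""
    return [c * scale for c in codes]

block = [0.12, -0.46, 0.03, 0.70, -0.21, 0.33]   # made-up weights
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_error = max(abs(a - b) for a, b in zip(block, restored))

print(codes)
print(f"scale={scale:.3f}, max abs error={max_error:.3f}")
```

Each weight is stored as a small integer plus one shared scale per block, so the rounding error per weight is at most half the scale. The LoRA adapters are trained in higher precision on top of these frozen, quantized weights.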



In [5]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
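As a rough check on what this helper reports, the number of parameters that a LoRA adapter adds to one linear layer can be worked out by hand. The sketch below uses the `query_key_value` shape (4544 inputs, 4672 outputs) and rank 8 that appear in the model printout later in this notebook; the function name is hypothetical.

```python
def lora_param_count(in_features, out_features, rank):
    """LoRA adds two matrices: A (rank x in_features) and B (out_features x rank)."""
    return rank * in_features + out_features * rank

in_features, out_features, rank = 4544, 4672, 8
frozen = in_features * out_features                      # base weight, not trained
added = lora_param_count(in_features, out_features, rank)

print(f"frozen: {frozen:,} | LoRA: {added:,} | ratio: {100 * added / frozen:.2f}%")
# frozen: 21,229,568 | LoRA: 73,728 | ratio: 0.35%
```

For this layer, the trainable adapter is well under one percent of the size of the frozen weight it sits on, which is why the trainable percentage printed by the helper is small.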

Create a prompt template and a function that applies it to every sample in the dataset. A random formatted sample is printed later as a sanity check.

In [6]:
from random import randint

# custom instruct prompt start
prompt_template = f"{{question}}\n---\nAnswer:\n{{answer}}{{eos_token}}"

# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = prompt_template.format(question=sample["question"],
                                            answer=sample["answers"],
                                            eos_token=tokenizer.eos_token)
    return sample
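For illustration, here is the same template applied to a single record without loading a tokenizer. The end-of-sequence string `<|endoftext|>` is taken from the sample output later in this notebook; `build_text` is a hypothetical standalone variant that takes the token as an argument instead of reading the global `tokenizer`, and the record is a shortened copy of that sample.

```python
prompt_template = "{question}\n---\nAnswer:\n{answer}{eos_token}"

def build_text(sample, eos_token):
    """Return the training text for one question/answer record."""
    return prompt_template.format(
        question=sample["question"],
        answer=sample["answers"],
        eos_token=eos_token,
    )

record = {
    "question": "How do I get started with EMR Studio?",
    "answers": "Your administrator must first set up an EMR Studio.",
}
text = build_text(record, eos_token="<|endoftext|>")
print(text)
```

Appending the end-of-sequence token to every training text gives the model a signal for where an answer ends.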

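We fine-tune the model with the Hugging Face Trainer class, together with a DataCollator that takes care of padding our inputs and labels; both are configured with their hyperparameters inside the training function below. First, load the Falcon tokenizer and reuse its end-of-sequence token as the padding token.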

In [7]:
from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Use the end-of-sequence token as the padding token
tokenizer.pad_token = tokenizer.eos_token

In [8]:
from datasets import Dataset, DatasetDict

train_dataset = Dataset.from_pandas(train)
test_dataset = Dataset.from_pandas(test)

dataset = DatasetDict({"train": train_dataset, "test": test_dataset})

train_dataset = dataset["train"].map(template_dataset, remove_columns=list(dataset["train"].features))

# randint is inclusive on both ends, so the upper bound is the last valid index
print(train_dataset[randint(0, len(train_dataset) - 1)]["text"])

test_dataset = dataset["test"].map(template_dataset, remove_columns=list(dataset["test"].features))

Map:   0%|          | 0/5101 [00:00<?, ? examples/s]

How do I get started with EMR Studio?
---
Answer:
Your administrator must first set up an EMR Studio. When you receive a unique sign-on URL for your Amazon EMR Studio from your administrator, you can log in to the Studio directly using your corporate credentials.<|endoftext|>


Map:   0%|          | 0/2187 [00:00<?, ? examples/s]

Define the train function

In [9]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from sagemaker.remote_function import remote
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import transformers

# Define the training function; @remote runs it as a SageMaker training job when called
@remote(volume_size=50)
def train_fn(
        model_name,
        train_ds,
        test_ds
):
    # tokenize the datasets passed in as arguments (not the notebook globals)
    lm_train_dataset = train_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, batch_size=24, remove_columns=list(train_ds.features)
    )


    lm_test_dataset = test_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(test_ds.features)
    )

    # Print total number of samples
    print(f"Total number of train samples: {len(lm_train_dataset)}")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    # Falcon requires you to allow remote code execution. This is because the model uses a new architecture that is not part of transformers yet.
    # The code is provided by the model authors in the repo.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto")

    model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model)

    config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules=[
            "query_key_value",
            "dense",
            "dense_h_to_4h",
            "dense_4h_to_h",
            ],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM"
    )

    model = get_peft_model(model, config)
    print_trainable_parameters(model)

    trainer = transformers.Trainer(
        model=model,
        train_dataset=lm_train_dataset,
        eval_dataset=lm_test_dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=8,
            per_device_eval_batch_size=8,
            logging_steps=2,
            num_train_epochs=1,
            learning_rate=2e-4,
            bf16=True,
            save_strategy="no",
            output_dir="outputs"
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False

    trainer.train()
    trainer.evaluate()

    model.save_pretrained("/opt/ml/model")
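The data collator configured with `mlm=False` prepares batches for causal language modeling: it pads the sequences to a common length and copies `input_ids` into `labels`, replacing padding positions with -100 so that the loss ignores them. The sketch below imitates that behaviour with plain lists. It is an illustration rather than the Transformers implementation; right-side padding and the token IDs are assumptions, except for the pad/EOS ID 11, which appears in the generation warning later in this notebook.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips

def collate_causal_lm(sequences, pad_token_id):
    """Pad token-ID lists to equal length and build labels for causal LM training."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids = seq + [pad_token_id] * n_pad
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Every position holding the pad ID is ignored, including a real EOS
        # token when pad and EOS share the same ID, as they do in this notebook.
        batch["labels"].append(
            [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]
        )
    return batch

batch = collate_causal_lm([[5, 6, 7, 11], [8, 9]], pad_token_id=11)
print(batch["labels"])  # [[5, 6, 7, -100], [8, 9, -100, -100]]
```

One consequence of reusing the end-of-sequence token for padding is visible in the first sequence: its final EOS position is also excluded from the loss.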


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...


  warn("The installed version of bitsandbytes was compiled without GPU support. "


In [11]:
train_fn(model_id, train_dataset, test_dataset)

[Sagemaker Config - applied value]
 config key = SageMaker.PythonSDK.Modules.RemoteFunction.ImageUri
 config value that will be used = 763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04

[Sagemaker Config - applied value]
 config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
 config value that will be used = ./requirements.txt

[Sagemaker Config - applied value]
 config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType
 config value that will be used = ml.g5.12xlarge

[Sagemaker Config - applied value]
 config key = SageMaker.PythonSDK.Modules.RemoteFunction.RoleArn
 config value that will be used = arn:aws:iam::691148928602:role/mlops-sagemaker-execution-role



2023-07-21 08:42:56,753 sagemaker.remote_function INFO     Copied dependencies file at './requirements.txt' to '/tmp/tmphxs4fyyv/temp_workspace/sagemaker_remote_function_workspace/requirements.txt'
2023-07-21 08:42:56,754 sagemaker.remote_function INFO     Successfully created workdir archive at '/tmp/tmphxs4fyyv/workspace.zip'
2023-07-21 08:42:56,817 sagemaker.remote_function INFO     Successfully uploaded workdir to 's3://sagemaker-eu-west-1-691148928602/train-fn-2023-07-21-08-42-56-582/sm_rf_user_ws/workspace.zip'
2023-07-21 08:42:56,818 sagemaker.remote_function INFO     Serializing function code to s3://sagemaker-eu-west-1-691148928602/train-fn-2023-07-21-08-42-56-582/function
2023-07-21 08:42:57,048 sagemaker.remote_function INFO     Serializing function arguments to s3://sagemaker-eu-west-1-691148928602/train-fn-2023-07-21-08-42-56-582/arguments
2023-07-21 08:42:57,190 sagemaker.remote_function INFO     Creating job: train-fn-2023-07-21-08-42-56-582


2023-07-21 08:42:57 Starting - Starting the training job...
2023-07-21 08:43:23 Starting - Preparing the instances for training......
2023-07-21 08:44:27 Downloading - Downloading input data...
2023-07-21 08:44:52 Training - Downloading the training image...........................
2023-07-21 08:49:18 Training - Training image download completed. Training in progress.......INFO: CONDA_PKGS_DIRS is set to '/opt/ml/sagemaker/warmpoolcache/sm_remotefunction_user_dependencies_cache/conda/pkgs'
INFO: PIP_CACHE_DIR is set to '/opt/ml/sagemaker/warmpoolcache/sm_remotefunction_user_dependencies_cache/pip'
INFO: Bootstraping runtime environment.
2023-07-21 08:50:11,141 sagemaker.remote_function INFO     Successfully unpacked workspace archive at '/'.
2023-07-21 08:50:11,141 sagemaker.remote_function INFO     '/sagemaker_remote_function_workspace/pre_exec.sh' does not exist. Assuming no pre-execution commands to run
2023-07-21 08:50:11,141 sagema

## Load Fine-Tuned model

### Download model

In [12]:
import boto3

s3_client = boto3.client("s3")

In [13]:
bucket_name = "<S3_BUCKET>"
job_name = "<JOB_NAME>"

In [14]:
s3_client.download_file(bucket_name, f"{job_name}/{job_name}/output/model.tar.gz", "model.tar.gz")

In [15]:
! rm -rf ./model && mkdir -p ./model && tar -xf model.tar.gz -C ./model
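The shell command above can also be expressed with the Python standard library, which helps when the notebook runs somewhere without a Unix shell. This is a minimal sketch: `extract_archive` is a hypothetical helper, and the demo builds a tiny stand-in archive in a temporary directory (with a made-up `adapter_config.json`) so that the snippet runs without the real model.tar.gz.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

def extract_archive(archive_path, target_dir):
    """Recreate target_dir from scratch and unpack the archive into it."""
    target = Path(target_dir)
    shutil.rmtree(target, ignore_errors=True)     # same effect as `rm -rf`
    target.mkdir(parents=True)
    with tarfile.open(archive_path) as archive:   # compression is auto-detected
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())

# Demo with a stand-in archive instead of the real training output
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "adapter_config.json"
    source.write_text("{}")
    archive_path = Path(workdir) / "model.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source, arcname=source.name)
    extracted = extract_archive(archive_path, Path(workdir) / "model")
    print(extracted)  # ['adapter_config.json']
```

In the notebook, the equivalent call would be `extract_archive("model.tar.gz", "./model")`. As with `tar -xf`, only unpack archives from a source you trust.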

Now load the base model and apply the trained PEFT (LoRA) adapter weights to it.

In [16]:
from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Use the end-of-sequence token as the padding token
tokenizer.pad_token = tokenizer.eos_token

In [24]:
from peft import PeftModel, PeftConfig
import torch
from transformers import AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

config = PeftConfig.from_pretrained("./model")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "./model")
model.to(device)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 4544)
        (h): ModuleList(
          (0-31): 32 x DecoderLayer(
            (input_layernorm): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear(
                in_features=4544, out_features=4672, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4544, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4672, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()

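Load the dataset again and pick a random sample to try question answering. Note that the sample is drawn from the full train.csv.gz file, so the model may have seen it during fine-tuning.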

In [33]:
import pandas as pd
df = pd.read_csv('train.csv.gz', compression='gzip', sep=';')

sample = df.sample()

sample

# format the sample with the prompt template, leaving the answer empty
prompt_template = "{question}\n---\nAnswer:\n"

row = sample.iloc[0]
test_sample = prompt_template.format(question=row["question"])

print(test_sample)

print("Original answer:\n", row["answers"])

When should I use AppFlow or AWS Glue?
---
Answer:

Original answer:
 AWS Glue provides a managed ETL service that makes it easy for data engineers to prepare and load data stored on AWS for analytics. It creates a data catalog from JDBC-compliant data sources (i.e. databases) that makes metadata available for ETL as well as querying via Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum. AppFlow connects to API-based data sources and enables users in lines of business to build data integration without writing code.


In [34]:
input_ids = tokenizer(test_sample, return_tensors="pt").input_ids

In [36]:
# number of new tokens to generate for the answer
tokens_for_answer = 100
output_tokens = input_ids.shape[1] + tokens_for_answer

outputs = model.generate(inputs=input_ids.to(device), do_sample=True, max_length=output_tokens)
gen_text = tokenizer.batch_decode(outputs)[0]

print(gen_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


When should I use AppFlow or AWS Glue?
---
Answer:
AppFlow is ideal for moving data from one storage system to another for long-term storage or archival purposes. For example, AppFlow can be used to create a new S3 bucket and automatically push the data into it. While this is a common data integration scenario, you may need to preserve the original source’s format and structure. For example, if you wanted to move a file system to AWS Glue from a legacy database that is organized in nested folders, you would use AppFlow. However
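The decoded text contains the prompt followed by the generated answer, and it is cut off mid-sentence here because generation stopped at the length limit. A small post-processing step can isolate the answer. The sketch below is one way to do it with plain string operations; `extract_answer` is a hypothetical helper, the `<|endoftext|>` marker is taken from the sample output earlier in this notebook, and the generated string in the demo is made up.

```python
def extract_answer(generated_text, prompt, eos_token="<|endoftext|>"):
    """Drop the echoed prompt and anything after the first EOS marker."""
    answer = generated_text
    if answer.startswith(prompt):
        answer = answer[len(prompt):]
    return answer.split(eos_token, 1)[0].strip()

prompt = "When should I use AppFlow or AWS Glue?\n---\nAnswer:\n"
# Made-up generation: prompt echo, a short answer, EOS, then trailing text
generated = prompt + "AppFlow is ideal for moving data.<|endoftext|>Unrelated text"

answer = extract_answer(generated, prompt)
print(answer)  # AppFlow is ideal for moving data.
```

In the notebook, the call would be `extract_answer(gen_text, test_sample)`.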
