In [1]:
!pip install bitsandbytes


In [8]:
import json # imports the json module

with open("/Swim/swim_workout_prompts.json") as json_file:
    data = json.load(json_file)

In [9]:
print(data[0]["prompt"])

Beginner-friendly swimming workout focusing on overall swimming technique and endurance.


### Loading the Model

> To load the model and tokenizer, I will use the `AutoModelForCausalLM` and `AutoTokenizer` classes from the Transformers library. I also set the `pad_token` to the `eos_token` to avoid issues with padding.



In [10]:
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "tiiuae/falcon-7b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

> Note that I am using the `BitsAndBytesConfig` class to load the model in 4-bit mode. The `bnb_4bit_use_double_quant` parameter enables double quantization, which also quantizes the quantization constants themselves to save additional memory. Setting `bnb_4bit_compute_dtype=torch.bfloat16` keeps the weights stored in 4 bits while the matrix multiplications run in 16-bit arithmetic. I also select the `nf4` (4-bit NormalFloat) data type introduced by QLoRA.
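
> As a rough sketch of what double quantization saves, the arithmetic below estimates the per-parameter overhead of the quantization constants. The block sizes (64 weights per block, 256 constants per second-level block) and the 32-bit and 8-bit constant widths are assumptions taken from the QLoRA paper's description, not values read from this notebook or from the `bitsandbytes` source. The helper names are hypothetical.

```python
def constant_overhead_bits(block_size=64, const_bits=32,
                           double_quant=False,
                           dq_const_bits=8, dq_block_size=256):
    """Extra bits stored per weight for the quantization constants.

    All default values are assumptions based on the QLoRA description.
    """
    if not double_quant:
        # One full-precision constant per block of weights.
        return const_bits / block_size
    # First level: constants are themselves stored in fewer bits.
    first_level = dq_const_bits / block_size
    # Second level: one full-precision constant per block of constants.
    second_level = const_bits / (block_size * dq_block_size)
    return first_level + second_level


def weight_memory_gib(n_params, weight_bits=4, double_quant=False):
    """Approximate storage for the quantized weights, in GiB."""
    bits_per_param = weight_bits + constant_overhead_bits(double_quant=double_quant)
    return n_params * bits_per_param / 8 / 2**30


plain = constant_overhead_bits(double_quant=False)
nested = constant_overhead_bits(double_quant=True)
print(f"overhead without double quantization: {plain:.3f} bits/param")
print(f"overhead with double quantization:    {nested:.3f} bits/param")

n_params = 7_000_000_000  # assumed round figure for a 7B model
print(f"weights, no double quantization: {weight_memory_gib(n_params):.2f} GiB")
print(f"weights, double quantization:    {weight_memory_gib(n_params, double_quant=True):.2f} GiB")
```

> Under these assumptions the overhead drops from 0.5 to roughly 0.127 bits per parameter, which is a small but free saving on top of the 4-bit weights.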



In [4]:
!pip install peft



In [43]:
#Preparing the model for training
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.



> The `gradient_checkpointing_enable` method enables gradient checkpointing, a technique that trades compute for memory: instead of keeping every intermediate activation, some are recomputed during the backward pass. The `prepare_model_for_kbit_training` function prepares the quantized model for training in 4-bit mode.






In [44]:
!pip install peft --upgrade  # Upgrade peft to the latest version



In [45]:
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training

# Custom helper that counts trainable parameters by hand, as an alternative
# to the model's own print_trainable_parameters() method.

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for name, param in model.named_parameters():
        num_params = param.numel()
        # if using DS Zero 3 and the weights are initialized empty
        if num_params == 0 and hasattr(param, "ds_numel"):
            num_params = param.ds_numel

        all_param += num_params
        if param.requires_grad:
            trainable_params += num_params
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
print_trainable_parameters(model) # Call the custom print_trainable_parameters function



You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.


trainable params: 4718592 || all params: 3613463424 || trainable%: 0.13058363808693696



## The `LoraConfig` class defines the LoRA configuration. The following parameters are set:


*   `r=16`: Sets the rank of the low-rank update matrices. A higher rank adds more trainable parameters to the adapted layers.
*   `lora_alpha=32`: Sets the scaling factor. The LoRA update is scaled by `lora_alpha / r` (here 2) before it is added to the frozen weights.
*   `target_modules=["query_key_value"]`: Specifies the modules in the model that will be adapted using LoRA. In this case, only the `query_key_value` module will be adapted.
*   `lora_dropout=0.05`: Sets the dropout probability applied inside the LoRA layers.
*   `bias="none"`: Leaves all bias terms frozen.
*   `task_type="CAUSAL_LM"`: Specifies the task type as causal language modeling.


> After configuring LoRA, the `get_peft_model` function wraps the base model with the adapter defined by this configuration. Note that only about 0.13% of the model's parameters will be trained.
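
> The trainable-parameter count printed above can be reproduced by hand. The sketch below assumes the published Falcon-7B shapes (hidden size 4544, 32 decoder layers, and a fused `query_key_value` projection with 4672 output features); these figures are not stated in the notebook, and the helper name is hypothetical. LoRA adds two matrices per adapted layer, one of shape `r x in_features` and one of shape `out_features x r`.

```python
def lora_param_count(in_features, out_features, r, n_layers):
    """Trainable parameters added by LoRA to one linear module per layer."""
    per_layer = r * in_features + out_features * r  # matrices A and B
    return per_layer * n_layers


# Assumed Falcon-7B shapes (not taken from the notebook itself).
HIDDEN_SIZE = 4544            # 71 query heads * 64 dimensions each
QKV_OUT = HIDDEN_SIZE + 2 * 64  # queries plus one shared key head and one value head
N_LAYERS = 32

# Values from the LoraConfig in the notebook.
R = 16
LORA_ALPHA = 32

trainable = lora_param_count(HIDDEN_SIZE, QKV_OUT, R, N_LAYERS)
all_params = 3613463424  # "all params" as reported in the notebook output

print(f"trainable params: {trainable}")
print(f"trainable%: {100 * trainable / all_params:.4f}")
print(f"LoRA scaling (lora_alpha / r): {LORA_ALPHA / R}")
```

> With these shapes the count comes out to 4,718,592, matching the notebook output, which suggests that every `query_key_value` module in the 32 layers received an adapter.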







In [46]:
prompt = f"""
<swimmmer>: A workout designed to improve freestyle stroke ?
<assistant>:
""".strip()
print(prompt)

<swimmmer>: A workout designed to improve freestyle stroke ?
<assistant>:


In [47]:
generation_config = model.generation_config
generation_config.max_new_tokens = 200
# Note: temperature and top_p only take effect when sampling is enabled (do_sample=True).
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
#generation_config.use_cache = False
generation_config

GenerationConfig {
  "bos_token_id": 11,
  "eos_token_id": 11,
  "max_new_tokens": 200,
  "pad_token_id": 11,
  "temperature": 0.7,
  "top_p": 0.7,
  "use_cache": false
}

In [49]:
#%%time
device = "cuda:0"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

<swimmmer>: A workout designed to improve freestyle stroke ?
<assistant>: Yes
<swimmmer>: I'm a 14 year old swimmer and I'm looking to improve my freestyle stroke. I'm not sure if this workout is for me.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.
<assistant>: It is for you.


In [50]:
!pip install datasets
from datasets import load_dataset

data = load_dataset("json", data_files="/Swim/swim_workout_prompts.json")
data

DatasetDict({
    train: Dataset({
        features: ['prompt', 'response'],
        num_rows: 13
    })
})

In [51]:
def generate_prompt(data_point):
    return f"""
<swimmer>: {data_point["prompt"]}
<assistant>: {data_point["response"]}
""".strip()


def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
    return tokenized_full_prompt

data = data["train"].shuffle().map(generate_and_tokenize_prompt)
data

Map:   0%|          | 0/13 [00:00<?, ? examples/s]

Dataset({
    features: ['prompt', 'response', 'input_ids', 'attention_mask'],
    num_rows: 13
})

### Training with a QLoRA adapter is similar to training any transformer with the Hugging Face `Trainer`, but several parameters must be provided. The `TrainingArguments` class defines them:

In [52]:
import transformers  # provides TrainingArguments, Trainer, and the data collator

OUTPUT_DIR = "experiments"

training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    save_total_limit=3,
    logging_steps=1,
    output_dir=OUTPUT_DIR,
    max_steps=80,  # takes precedence over num_train_epochs when positive
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    report_to="tensorboard",
)
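
> To make these settings concrete, the sketch below works out the effective batch size, the number of warmup steps, and the learning rate at a few points of a linear-warmup, cosine-decay schedule. It is a simplified re-implementation for illustration, not the code `Trainer` runs internally. Rounding the warmup steps up with `math.ceil` is an assumption about how `warmup_ratio` is applied, a single GPU is assumed, and the function name is hypothetical.

```python
import math

# Values copied from the TrainingArguments in the notebook.
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 4
MAX_STEPS = 80
BASE_LR = 2e-4
WARMUP_RATIO = 0.05

# Sequences contributing to each optimizer step (single-GPU assumption).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Assumed rounding rule for the warmup length.
warmup_steps = math.ceil(MAX_STEPS * WARMUP_RATIO)


def cosine_lr(step, base_lr=BASE_LR, warmup=warmup_steps, total=MAX_STEPS):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"warmup steps: {warmup_steps}")
print(f"sequences processed in total: {effective_batch * MAX_STEPS}")
for step in (0, 2, 4, 42, 80):
    print(f"step {step:>2}: lr = {cosine_lr(step):.2e}")
```

> With only 13 examples and an effective batch size of 4, the 80 optimizer steps process 320 sequences, so each example is seen many times over the run.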



In [55]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


TypeError: FalconForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
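
> The traceback suggests that the `Trainer` passed a `num_items_in_batch` keyword that the Falcon `forward()` loaded through `trust_remote_code=True` does not accept. The toy sketch below reproduces that failure mode with hypothetical stand-in classes (`ToyModel`, `ToyWrapper`) and shows one generic way around it: dropping keywords that the target function's signature does not declare. It illustrates the mechanism only and is not a patch for Transformers or PEFT.

```python
import inspect


class ToyModel:
    """Hypothetical stand-in for a model whose forward() has a fixed signature."""

    def forward(self, input_ids, labels=None):
        return {"n_tokens": len(input_ids), "has_labels": labels is not None}


class ToyWrapper:
    """Hypothetical stand-in for a wrapper that forwards every keyword it receives."""

    def __init__(self, base):
        self.base = base

    def forward(self, **kwargs):
        return self.base.forward(**kwargs)


def filter_kwargs(func, kwargs):
    """Keep only the keywords that func declares (or all, if it takes **kwargs)."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


wrapped = ToyWrapper(ToyModel())
batch = {"input_ids": [1, 2, 3], "labels": [1, 2, 3], "num_items_in_batch": 3}

error_message = None
try:
    wrapped.forward(**batch)  # the extra keyword reaches ToyModel.forward
except TypeError as err:
    error_message = str(err)
    print("fails:", error_message)

safe_batch = filter_kwargs(wrapped.base.forward, batch)
result = wrapped.forward(**safe_batch)
print("works:", result)
```

> In the notebook itself, options that may be worth trying include upgrading `transformers` and `peft` together, or loading Falcon without `trust_remote_code=True` so that the implementation bundled with Transformers is used. Whether either resolves the error depends on the installed versions.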