# Efficient Large Language Model training with LoRA and Hugging Face

This notebook is for developing, understanding, and debugging code. 

In [None]:
!pip install git+https://github.com/huggingface/peft.git
!pip install git+https://github.com/huggingface/accelerate.git
!pip install git+https://github.com/huggingface/transformers.git

## 2. Load and prepare the dataset

We will use the [samsum](https://huggingface.co/datasets/samsum) dataset, a collection of about 16k messenger-like conversations with summaries. Conversations were created and written down by linguists fluent in English.

```python
{
  "id": "13818513",
  "summary": "Amanda baked cookies and will bring Jerry some tomorrow.",
  "dialogue": "Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)"
}
```

To load the `samsum` dataset, we use the `load_dataset()` function from the 🤗 Datasets library.

In [3]:
from datasets import load_dataset

# Load dataset from the hub
dataset = load_dataset("samsum", split="train")
print(f"Train dataset size: {len(dataset)}")
# Train dataset size: 14732

train_num_samples = 100
dataset = dataset.shuffle(seed=42).select(range(train_num_samples))
print(f"Train dataset size (sampled): {len(dataset)}")

Found cached dataset samsum (/home/ec2-user/.cache/huggingface/datasets/samsum/samsum/0.0.0/f1d7c6b7353e6de335d444e424dc002ef70d1277109031327bc9cc6af5d3d46e)
Loading cached shuffled indices for dataset at /home/ec2-user/.cache/huggingface/datasets/samsum/samsum/0.0.0/f1d7c6b7353e6de335d444e424dc002ef70d1277109031327bc9cc6af5d3d46e/cache-d721b14e5b344d80.arrow


Train dataset size: 14732
Train dataset size (sampled): 100


To train our model, we need to convert our inputs (text) to token IDs. This is done by a 🤗 Transformers Tokenizer. If you are not sure what this means, check out **[chapter 6](https://huggingface.co/course/chapter6/1?fw=tf)** of the Hugging Face Course.

In [4]:
from transformers import AutoTokenizer

model_id = "bigscience/bloomz-7b1"

# Load tokenizer of BLOOMZ
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.model_max_length = 2048 # overwrite wrong value

Before we can start training, we need to preprocess our data. Abstractive summarization is a text-generation task: the model takes a text as input and generates a summary as output. We need to know how long our inputs and outputs are so that we can batch the data efficiently.

We define a `prompt_template` that we use to construct an instruction prompt, which helps the model perform better. The `prompt_template` has a fixed start and end, with the document in the middle. This means the fixed template parts plus the document must not exceed the maximum length of the model.
We preprocess our dataset before training and save it to disk so that it can be uploaded to S3. You could also run this step on your local machine or a CPU instance and upload the result to the [Hugging Face Hub](https://huggingface.co/docs/hub/datasets-overview).
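
The length constraint on the template can be sketched as follows. This is a minimal illustration, not the notebook's preprocessing: `toy_tokenize` and `dialogue_budget` are hypothetical helpers, and whitespace splitting stands in for the real BLOOMZ tokenizer, whose subword counts will differ.

```python
def toy_tokenize(text):
    # Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
    return text.split()


def dialogue_budget(template, summary, eos_token, max_length, tokenize=toy_tokenize):
    # Count the tokens used by everything except the dialogue.
    fixed_text = template.format(dialogue="", summary=summary, eos_token=eos_token)
    return max_length - len(tokenize(fixed_text))


template = "Summarize the chat dialogue:\n{dialogue}\n---\nSummary:\n{summary}{eos_token}"
dialogue = "Amanda: I baked cookies. Do you want some?\nJerry: Sure!"
summary = "Amanda baked cookies."

# max_length=16 is an artificially small limit chosen so that truncation is visible.
budget = dialogue_budget(template, summary, "</s>", max_length=16)

# Truncate the dialogue so that template + dialogue fits into max_length tokens.
truncated = " ".join(toy_tokenize(dialogue)[:budget])
prompt = template.format(dialogue=truncated, summary=summary, eos_token="</s>")
print(budget, len(toy_tokenize(prompt)))
```

With a real subword tokenizer, token counts are not exactly additive across concatenated strings, so a small safety margin on the budget is a reasonable precaution.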

In [5]:
from random import randint
from itertools import chain
from functools import partial

# custom instruct prompt start
prompt_template = "Summarize the chat dialogue:\n{dialogue}\n---\nSummary:\n{summary}{eos_token}"

# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = prompt_template.format(dialogue=sample["dialogue"],
                                            summary=sample["summary"],
                                            eos_token=tokenizer.eos_token)
    return sample


# apply prompt template per sample
dataset = dataset.map(template_dataset, remove_columns=list(dataset.features))

# randint is inclusive on both ends, so the upper bound must be len(dataset) - 1
print(dataset[randint(0, len(dataset) - 1)]["text"])

# empty list to save remainder from batches to use in next batch
remainder = {"input_ids": [], "attention_mask": []}


def chunk(sample, chunk_length=2048):
    # define global remainder variable to save remainder from batches to use in next batch
    global remainder
    # Concatenate all texts and add remainder from previous batch
    concatenated_examples = {k: list(chain(*sample[k])) for k in sample.keys()}
    concatenated_examples = {k: remainder[k] + concatenated_examples[k] for k in concatenated_examples.keys()}
    # get total number of tokens for batch
    batch_total_length = len(concatenated_examples[list(sample.keys())[0]])

    # number of tokens that fit into whole chunks (0 if the batch is shorter than one chunk)
    batch_chunk_length = (batch_total_length // chunk_length) * chunk_length

    # Split by chunks of max_len.
    result = {
        k: [t[i : i + chunk_length] for i in range(0, batch_chunk_length, chunk_length)]
        for k, t in concatenated_examples.items()
    }
    # add remainder to global variable for next batch
    remainder = {k: concatenated_examples[k][batch_chunk_length:] for k in concatenated_examples.keys()}
    # prepare labels
    result["labels"] = result["input_ids"].copy()
    return result


# tokenize and chunk dataset
lm_dataset = dataset.map(
    lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(dataset.features)
).map(
    partial(chunk, chunk_length=1536),
    batched=True,
)

# Print total number of samples
print(f"Total number of samples: {len(lm_dataset)}")

Summarize the chat dialogue:
Sandy: wanna join? <file_other>
Tina: no, I need to keep writing:/:/
Sandy: writing what?
Tina: a stupid essay for Monday :/ :/
Sandy: about?
Tina: some psychological shit..:P
Sandy: uh, that sucks ;(
Sandy: how many pages?
Tina: 30............
Sandy: how many u have?
Tina: don ask!
Sandy: 3? :D
Tina: 5...  
Sandy: fuck!!close fb and  u gonna make it,  fingers x!!
Sandy: thx, enjoy tonite!
---
Summary:
Tina can't join Sandy as she needs to keep writing an essay on psychology for Monday. She has to write 30 pages, but has only 5 now.  </s>


Total number of samples: 12
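
The drop from 100 dialogues to 12 samples comes from packing: tokenized texts are concatenated and cut into fixed-length chunks, and leftover tokens are carried into the next batch. The toy sketch below shows the same idea on small integer lists. The `pack` helper is hypothetical and keeps the remainder in a local variable instead of the global used above.

```python
from itertools import chain


def pack(batches, chunk_length):
    """Concatenate token lists across batches and cut them into fixed-size chunks."""
    remainder = []
    chunks = []
    for batch in batches:
        # Prepend what was left over from the previous batch.
        tokens = remainder + list(chain.from_iterable(batch))
        # Largest multiple of chunk_length that fits into this batch.
        usable = (len(tokens) // chunk_length) * chunk_length
        chunks.extend(tokens[i : i + chunk_length] for i in range(0, usable, chunk_length))
        remainder = tokens[usable:]
    return chunks, remainder


# Two batches of "tokenized" samples with different lengths.
batches = [[[1, 2, 3], [4, 5]], [[6, 7, 8, 9]]]
chunks, leftover = pack(batches, chunk_length=4)
print(chunks)    # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(leftover)  # [9]
```

Tokens still in the remainder after the last batch never form a full chunk, so they are not part of the training set.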


In [6]:
def create_peft_config(model):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_kbit_training,
    )

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["query_key_value"],
    )

    # prepare int-8 model for training
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model

import os
import argparse
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    set_seed,
    default_data_collator,
)
from datasets import load_from_disk
import torch
from transformers import Trainer, TrainingArguments
from peft import PeftConfig, PeftModel
import shutil

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[2023-06-20 07:29:35,964] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)

Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)




In [11]:
!rm -rf logs output
# -p avoids an error when the cache directory already exists
!mkdir -p /home/ec2-user/SageMaker/hf_cache logs output


In [12]:
set_seed(42)

# load model from the hub
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_cache=False,
    device_map="auto",
    load_in_8bit=True,
    cache_dir="/home/ec2-user/SageMaker/hf_cache"
)

# create peft config
model = create_peft_config(model)

trainable params: 3,932,160 || all params: 7,072,948,224 || trainable%: 0.055594355783029126
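
The trainable parameter count can be reproduced by hand. The sketch below assumes the `bigscience/bloomz-7b1` architecture has a hidden size of 4096 and 30 transformer layers (values not stated in this notebook), and that each `query_key_value` layer is a fused projection from the hidden size to three times the hidden size. LoRA adds two matrices per target layer, one of shape `r x in_features` and one of shape `out_features x r`.

```python
# Assumed architecture values for bloomz-7b1 (not taken from this notebook).
hidden_size = 4096
num_layers = 30

# LoRA rank from the LoraConfig above.
r = 8

# The fused query_key_value projection maps hidden_size -> 3 * hidden_size.
in_features = hidden_size
out_features = 3 * hidden_size

# LoRA adds A (r x in_features) and B (out_features x r) to each target layer.
lora_params_per_layer = r * (in_features + out_features)
trainable_params = lora_params_per_layer * num_layers

# Total parameter count as printed by print_trainable_parameters().
all_params = 7_072_948_224
trainable_pct = 100 * trainable_params / all_params

print(f"{trainable_params:,} trainable parameters ({trainable_pct:.4f}%)")
```

Under these assumptions the result matches the 3,932,160 trainable parameters reported above.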


In [13]:
# Define training args
training_args = TrainingArguments(
    report_to="none",
    output_dir="output",
    overwrite_output_dir=True,
    per_device_train_batch_size=1,
    bf16=True,  # requires a GPU with bfloat16 support
    learning_rate=2e-4,
    num_train_epochs=1,
    gradient_checkpointing=True,
    gradient_accumulation_steps=2,
    # logging strategies
    logging_dir="logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
    optim="adafactor",
)

In [14]:
# Create Trainer instance
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_dataset,
    data_collator=default_data_collator,
)

# Start training
trainer.train()



TrainOutput(global_step=6, training_loss=2.9970054626464844, metrics={'train_runtime': 43.1713, 'train_samples_per_second': 0.278, 'train_steps_per_second': 0.139, 'total_flos': 668566655336448.0, 'train_loss': 2.9970054626464844, 'epoch': 1.0})
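
The reported `global_step=6` and throughput figures follow from the training arguments. The arithmetic below is a sketch that assumes a single GPU, which is not stated explicitly but is consistent with the numbers in `TrainOutput`.

```python
# Values taken from the cells above.
num_samples = 12                  # packed samples in lm_dataset
chunk_length = 1536               # tokens per packed sample
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
num_train_epochs = 1
train_runtime = 43.1713           # seconds, from TrainOutput

# Assumption: training runs on a single device.
num_devices = 1

# One optimizer step consumes this many samples and tokens.
samples_per_step = per_device_train_batch_size * num_devices * gradient_accumulation_steps
tokens_per_step = samples_per_step * chunk_length

# 12 samples divide evenly into steps of 2 samples each.
optimizer_steps = (num_samples // samples_per_step) * num_train_epochs

print(f"optimizer steps: {optimizer_steps}")
print(f"tokens per optimizer step: {tokens_per_step}")
print(f"samples/s: {num_samples / train_runtime:.3f}")
print(f"steps/s: {optimizer_steps / train_runtime:.3f}")
```

Because only 6 optimizer steps are taken and `logging_steps=10`, no intermediate loss values are logged during this run.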

In [15]:
del model
del trainer

torch.cuda.empty_cache()