In [1]:
from transformers import (
    AutoConfig, 
    AutoTokenizer, 
    AutoModelForCausalLM, 
    TextStreamer, 
    GenerationConfig, 
    logging,
    TrainingArguments,
    Trainer,
)
import datasets
import json
import pandas as pd
from pathlib import Path
import torch
import transformers
from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import DataLoader, Dataset
import torch.nn as nn
from peft import get_peft_config, get_peft_model, LoraConfig, TaskType

## Setup Dataset

In [2]:
# _OPENFUNCTIONS_TEST = "datasets/gorilla_openfunctions/test.jsonl"
# _OPENFUNCTIONS_TRAIN = "datasets/gorilla_openfunctions/train.jsonl"

Base Zephyr Model Prompt Template:
```text
<|system|>
You are a friendly chatbot who always responds in the style of a pirate.</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!
```
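The template above can be assembled with a small helper. This is a minimal sketch: `build_zephyr_prompt` is a hypothetical name that does not appear elsewhere in this notebook, and the layout is simply copied from the example above. With a real tokenizer, the chat template shipped with the model (via `apply_chat_template`) would normally be preferred over hand-built strings.

```python
def build_zephyr_prompt(system: str, user: str) -> str:
    """Format one system/user turn in the layout shown above (hypothetical helper)."""
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"  # generation continues from here
    )


prompt = build_zephyr_prompt(
    "You are a friendly chatbot who always responds in the style of a pirate.",
    "How many helicopters can a human eat in one sitting?",
)
print(prompt)
```

The prompt ends right after the `<|assistant|>` line so that the model's continuation is the assistant reply.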

In [3]:
# test_data = pd.read_json(_OPENFUNCTIONS_TEST, lines=True)
# train_data = pd.read_json(_OPENFUNCTIONS_TRAIN, lines=True)

In [4]:
# column_types = {
#     'question': 'string',
#     'function': 'string',
#     'model_answer': 'string',
# }
# test_data = test_data.astype(column_types)
# train_data = train_data.astype(column_types)

In [5]:
# train_data['Functions'][432]

## Train Model

In [6]:
logging.set_verbosity_info()

In [7]:
_BASE_MODEL_PATH = Path('../models/zephyr-7b-beta/')
_LORA_OUTPUT_PATH = Path('../models/loras/')

In [8]:
base_model_tokenizer = AutoTokenizer.from_pretrained(_BASE_MODEL_PATH, use_fast=False)
base_model_config = AutoConfig.from_pretrained(_BASE_MODEL_PATH)

loading file tokenizer.model
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file tokenizer.json
loading configuration file ../models/zephyr-7b-beta/config.json
Model config MistralConfig {
  "_name_or_path": "../models/zephyr-7b-beta",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 2,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
}



In [9]:
base_model_config.torch_dtype = torch.float16

In [10]:
base_model = AutoModelForCausalLM.from_pretrained(
    _BASE_MODEL_PATH,
    config=base_model_config, 
    device_map='auto', 
    torch_dtype=base_model_config.torch_dtype,
    low_cpu_mem_usage=True
)

loading weights file ../models/zephyr-7b-beta/pytorch_model.bin.index.json
Instantiating MistralForCausalLM model under default dtype torch.float16.
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2
}




All model checkpoint weights were used when initializing MistralForCausalLM.

All the weights of MistralForCausalLM were initialized from the model checkpoint at ../models/zephyr-7b-beta.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
loading configuration file ../models/zephyr-7b-beta/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}



In [11]:
for param in base_model.parameters():
    # Freeze the base model weights: only the LoRA adapter is trained, not the base model.
    param.requires_grad = False

In [12]:
base_model.config.use_cache = False

In [13]:
base_model_tokenizer.bos_token, base_model_tokenizer.pad_token, base_model_tokenizer.eos_token, base_model_tokenizer.unk_token

('<s>', '</s>', '</s>', '<unk>')

### [Gradient Accumulation](https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-accumulation)
Gradient accumulation computes the gradients for a batch in smaller steps instead of all at once. Each smaller batch goes through a forward and backward pass, and the resulting gradients are accumulated. Once enough gradients have been accumulated, the model's optimization step runs. This makes it possible to raise the effective batch size to values that would never fit into GPU memory. The cost is that the extra forward and backward passes can slow training down a bit.
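The equivalence behind gradient accumulation can be checked on a toy problem. The sketch below is not the `Trainer` implementation; it assumes a one-parameter model `y_hat = w * x` with a mean squared error loss, and all data values are made up for illustration. Note that this notebook sets `gradient_accumulation_steps=1`, so no accumulation actually takes place in the training run.

```python
import math

# Toy data for a one-parameter model y_hat = w * x (values are illustrative).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w = 0.5


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Full batch of 8 examples in a single pass.
full_grad = mse_grad(w, xs, ys)

# Same 8 examples as 4 micro-batches of 2. Each micro-batch gradient is scaled
# by 1 / accumulation_steps so that the running sum equals the full-batch mean.
micro_batch_size = 2
accumulation_steps = len(xs) // micro_batch_size
accumulated_grad = 0.0
for step in range(accumulation_steps):
    start = step * micro_batch_size
    end = start + micro_batch_size
    accumulated_grad += mse_grad(w, xs[start:end], ys[start:end]) / accumulation_steps

# A single optimizer step would follow here, using accumulated_grad.
print(full_grad, accumulated_grad)
```

Only one micro-batch is held in memory at a time, yet the update matches the one computed from the full batch.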

### [Gradient Checkpointing](https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing)
Even with a batch size of 1 and gradient accumulation, large models can still run out of memory. To compute the gradients during the backward pass, all activations from the forward pass are normally saved, which can create a large memory overhead. The alternative is to discard all activations during the forward pass and recompute them on demand during the backward pass, but this adds significant computational overhead and slows down training.

Gradient checkpointing strikes a compromise between the two approaches: it saves strategically selected activations throughout the computational graph, so only a fraction of the activations need to be recomputed for the gradients. The Hugging Face guide linked in the heading points to an article that explains the ideas behind gradient checkpointing.
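The trade-off can be illustrated with a toy chain of scalar layers. This is a sketch of the idea only, not what `gradient_checkpointing_enable()` does internally: the layers `tanh(w * x)`, the weights, and the segment length are all assumptions made for the example.

```python
import math

weights = [0.9, -1.1, 0.7, 1.3, -0.8, 1.2, 0.6, -1.4]  # one scalar "layer" each


def layer(w, x):
    return math.tanh(w * x)


def layer_grad(w, x):
    """Derivative of layer(w, x) with respect to its input x."""
    return w * (1.0 - math.tanh(w * x) ** 2)


def grad_store_all(x0):
    """Keep every layer input from the forward pass, then backpropagate."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad *= layer_grad(w, x_in)
    return grad, len(inputs)


def grad_checkpointed(x0, segment=4):
    """Keep only the input of every `segment`-th layer and recompute the rest."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(w, x)
    grad = 1.0
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        # Recompute the inputs inside this segment from its checkpoint.
        seg_inputs = []
        x = checkpoints[start]
        for w in seg_weights:
            seg_inputs.append(x)
            x = layer(w, x)
        for w, x_in in zip(reversed(seg_weights), reversed(seg_inputs)):
            grad *= layer_grad(w, x_in)
    return grad, len(checkpoints)


full_grad, full_stored = grad_store_all(0.3)
ckpt_grad, ckpt_stored = grad_checkpointed(0.3)
print(full_grad, full_stored)
print(ckpt_grad, ckpt_stored)
```

Both functions return the same gradient. The checkpointed version keeps 2 long-lived activations instead of 8, at the price of a second forward pass through each segment (whose inputs are held only while that segment is processed).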

In [14]:
base_model.gradient_checkpointing_enable()
base_model.enable_input_require_grads()

In [15]:
def print_trainable_parameters(model: nn.Module):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for name, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable(%): {100 * trainable_params / all_param}"
    )

Background reading on `LoraConfig`: https://medium.com/@manyi.yim/more-about-loraconfig-from-peft-581cf54643db

In [16]:
lora_config = LoraConfig(
    # peft_type: str | PeftType = None,
    # auto_mapping: dict | None = None,
    # base_model_name_or_path: str = None,
    # revision: str = None,
    task_type = TaskType.CAUSAL_LM,
    # inference_mode: bool = False,
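    r = 64,  # LoRA rank; candidate values: 8, 16, 32, 64
    target_modules = ["q_proj", "v_proj"],  # attention projections that receive adapters
    lora_alpha = 16,  # scaling numerator (scale = lora_alpha / r); candidate values: 8, 16, 32
    lora_dropout = 0.1,  # dropout on the LoRA branch; alternative: 0.05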
    # fan_in_fan_out: bool = False,
    bias = "none",
    # modules_to_save: List[str] | None = None,
    # init_lora_weights: bool = True,
    # layers_to_transform: List[int] | int | None = None,
    # layers_pattern: str | None = None
)
peft_model = get_peft_model(base_model, lora_config)
print_trainable_parameters(peft_model)

trainable params: 27262976 || all params: 7268995072 || trainable(%): 0.3750583915652433
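The trainable parameter count printed above can be reproduced by hand from the model config and the LoRA settings. The sketch below assumes the usual Mistral attention layout, which is not stated in this notebook: `q_proj` maps `hidden_size` to `num_attention_heads * head_dim`, `v_proj` maps `hidden_size` to `num_key_value_heads * head_dim`, and each adapted layer adds two bias-free matrices `A` (`r x in_features`) and `B` (`out_features x r`).

```python
# Values copied from the MistralConfig and LoraConfig shown earlier.
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8
num_hidden_layers = 32
r = 64

head_dim = hidden_size // num_attention_heads  # 128
q_out = num_attention_heads * head_dim         # q_proj output features
v_out = num_key_value_heads * head_dim         # v_proj output features (grouped-query attention)


def lora_params(in_features, out_features, rank):
    """Parameters added by one LoRA pair: A is rank x in, B is out x rank."""
    return rank * in_features + out_features * rank


per_layer = lora_params(hidden_size, q_out, r) + lora_params(hidden_size, v_out, r)
trainable = per_layer * num_hidden_layers

all_params = 7268995072  # total reported by print_trainable_parameters above
print(trainable)
print(100 * trainable / all_params)
```

Under these assumptions the result is 27,262,976, which matches the reported figure. Doubling `r` doubles the trainable parameter count, since both matrices scale linearly with the rank.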


In [17]:
trainig_parms = TrainingArguments(
    output_dir=_LORA_OUTPUT_PATH,
    num_train_epochs=1,
    gradient_accumulation_steps=1,
    per_device_train_batch_size=4,
    
    logging_steps=25, # Default: 500
    # fp16=True,
    
    save_steps=25,
    save_safetensors=True,
    report_to="tensorboard",
)

PyTorch: setting up devices


In [18]:
# Sample dataset
data = datasets.load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: base_model_tokenizer(samples['quote']), batched=True)


In [22]:
data['train'] = data['train'].select(range(100))
data

DatasetDict({
    train: Dataset({
        features: ['quote', 'author', 'tags', 'input_ids', 'attention_mask'],
        num_rows: 100
    })
})

In [25]:
trainer = Trainer(
    model=peft_model,
    train_dataset=data['train'],
    # eval_dataset=data['validation'],
    args=trainig_parms,
    tokenizer=base_model_tokenizer,
    # callbacks=[],
    data_collator=transformers.DataCollatorForLanguageModeling(base_model_tokenizer, mlm=False),
)

You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set to `True` to avoid any unexpected behavior such as device placement mismatching.
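With `mlm=False`, the data collator pads each batch and derives the labels from the input IDs, replacing padding positions with `-100` so the loss ignores them. The following is a simplified pure-Python sketch of that behavior, not the library code: `collate_causal_lm` is a hypothetical name, the token IDs are made up, and right padding is chosen only for the illustration.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips
PAD_TOKEN_ID = 2     # pad_token_id from the config shown earlier (same ID as EOS)


def collate_causal_lm(examples, pad_token_id=PAD_TOKEN_ID):
    """Pad to the longest example and build labels from input_ids (simplified sketch)."""
    max_len = max(len(ids) for ids in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids in examples:
        pad = max_len - len(ids)
        input_ids = ids + [pad_token_id] * pad       # right padding, chosen for the sketch
        attention_mask = [1] * len(ids) + [0] * pad
        # Labels copy the inputs; every position holding the pad ID is ignored.
        labels = [IGNORE_INDEX if t == pad_token_id else t for t in input_ids]
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append(attention_mask)
        batch["labels"].append(labels)
    return batch


# Made-up token IDs for two sequences of different lengths.
batch = collate_causal_lm([[1, 345, 678, 910], [1, 222]])
print(batch["labels"])
```

Because the pad token and the EOS token share ID 2 in this setup, masking by pad ID would also hide any genuine EOS token in the inputs from the loss.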


In [26]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: quote, author, tags. If quote, author, tags are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 100
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 25
  Number of trainable parameters = 27,262,976


Step,Training Loss
25,1.1936


Saving model checkpoint to ../models/loras/tmp-checkpoint-25
tokenizer config file saved in ../models/loras/tmp-checkpoint-25/tokenizer_config.json
Special tokens file saved in ../models/loras/tmp-checkpoint-25/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=25, training_loss=1.193631134033203, metrics={'train_runtime': 52.6901, 'train_samples_per_second': 1.898, 'train_steps_per_second': 0.474, 'total_flos': 348616162836480.0, 'train_loss': 1.193631134033203, 'epoch': 1.0})
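The step count and throughput figures in the log follow from the training arguments. The arithmetic below is a sketch that assumes a single data-parallel worker, which is consistent with the reported total train batch size of 4.

```python
import math

# Values taken from the TrainingArguments and the training log above.
num_examples = 100
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
num_train_epochs = 1
train_runtime = 52.6901  # seconds

batches_per_epoch = math.ceil(num_examples / per_device_train_batch_size)
steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_steps = steps_per_epoch * num_train_epochs

samples_per_second = round(num_examples * num_train_epochs / train_runtime, 3)
steps_per_second = round(total_steps / train_runtime, 3)

print(total_steps, samples_per_second, steps_per_second)
```

With `logging_steps=25` and `save_steps=25`, the single logged loss value and the single checkpoint both land on the final step.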

In [54]:
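# The prompt is the entire tokenized quote, so the decoded output echoes the quote.
prompt = data['train'][76]['input_ids']
prompt = torch.tensor(prompt).unsqueeze(0)  # add a batch dimension
outputs = base_model.generate(prompt)
base_model_tokenizer.decode(outputs[0], skip_special_tokens=True), data['train'][76]['quote']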

Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
  "use_cache": false
}



('“Love is that condition in which the happiness of another person is essential to your own.”',
 '“Love is that condition in which the happiness of another person is essential to your own.”')

In [62]:
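# Save the adapter weights and the model config to the LoRA output directory.
peft_model.save_pretrained(_LORA_OUTPUT_PATH)
peft_model.config.save_pretrained(_LORA_OUTPUT_PATH)

# Reload the model from the LoRA output directory.
peft_model = AutoModelForCausalLM.from_pretrained(
    _LORA_OUTPUT_PATH,
    config=base_model_config,
    device_map='auto',
    torch_dtype=base_model_config.torch_dtype,
    low_cpu_mem_usage=True
)

# Disable the KV cache, as was done for the base model.
peft_model.config.use_cache = False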


Configuration saved in ../models/loras/generation_config.json
Detected adapters on the model, saving the model in the PEFT format, only adapter weights will be saved.
To match the expected format of the PEFT library, all keys of the state dict of adapters will be pre-pended with `base_model.model`.
Model weights saved in ../models/loras/pytorch_model.bin
Configuration saved in ../models/loras/config.json
loading weights file ../models/zephyr-7b-beta/pytorch_model.bin.index.json
Instantiating MistralForCausalLM model under default dtype torch.float16.
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2
}




All model checkpoint weights were used when initializing MistralForCausalLM.

All the weights of MistralForCausalLM were initialized from the model checkpoint at ../models/zephyr-7b-beta.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
loading configuration file ../models/zephyr-7b-beta/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}

Could not locate the tokenizer configuration file, will try to use the model config instead.
loading configuration file ../models/loras/config.json
Model config MistralConfig {
  "_name_or_path": "../models/loras",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mi

OSError: Can't load tokenizer for '../models/loras'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '../models/loras' is the correct path to a directory containing all relevant files for a LlamaTokenizer tokenizer.

In [65]:
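# The LoRA output directory contains no tokenizer files (see the OSError above),
# so the tokenizer is loaded from the base model directory instead.
peft_model_tokenizer = AutoTokenizer.from_pretrained(_BASE_MODEL_PATH, use_fast=False)

# Use only the first 5 tokens of the quote as the prompt.
prompt = data['train'][76]['input_ids'][:5]
prompt = torch.tensor(prompt).unsqueeze(0)  # add a batch dimension
outputs = peft_model.generate(prompt)
peft_model_tokenizer.decode(outputs[0], skip_special_tokens=True), data['train'][76]['quote']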


loading file tokenizer.model
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file tokenizer.json
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
  "use_cache": false
}



('“Love is that',
 '“Love is that condition in which the happiness of another person is essential to your own.”')