<a href="https://colab.research.google.com/github/MakovChen/LLMs-Development-Kit/blob/main/PEFT_experiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PEFT experiment
Take GPT2+LoRA as an example

In [1]:
#huggingface APIs (for obtaining model parameters, datasets, and other resources)
!pip install transformers
!pip install datasets

#Parameter-Efficient Fine-Tuning methods (including tools for adding Adapters, training Adapters, etc. to the base model)
!pip install peft

Collecting transformers
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.2/7.2 MB[0m [31m55.6 MB/s[0m eta [36m0:00:00[0m
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m236.8/236.8 kB[0m [31m25.3 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.8/7.8 MB[0m [31m107.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m75.0 MB/s[0m eta [36m0:00:

### Load related resources

In [2]:
import torch, transformers, peft, datasets

To get the base model, you can select your own model from the model pool (https://huggingface.co/models) <br>
Here I use the smallest one, gpt-2, as an example, and put the model name into `model_name`.

In [3]:
model_name = "gpt2"
base_model = transformers.AutoModelForCausalLM.from_pretrained(model_name)

#Load the model into the pdft module and reduce the floating point precision of its parameters to reduce the computational load
model = peft.prepare_model_for_int8_training(base_model)
print(f"Total parameters: {sum(p.numel() for p in model.parameters())}")

Downloading (…)lve/main/config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Total parameters: 124439808


Get tokenizer<br>
Each base model has its corresponding tokenizer, just call it by the name of the base model `model_name`.


In [4]:
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

#Set the token to be filled when padding the sample (here use the default EPS of gpt-2, you can also set it according to your own needs, e.g. <EOS>)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Get the dataset, you can select the dataset you need from the dataset pool (https://huggingface.co/datasets?sort=downloads)<br>

In [5]:
dataset = datasets.load_dataset("cnn_dailymail", '3.0.0')
dataset

Downloading builder script:   0%|          | 0.00/8.33k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/9.88k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/15.1k [00:00<?, ?B/s]

Downloading and preparing dataset cnn_dailymail/3.0.0 to /root/.cache/huggingface/datasets/cnn_dailymail/3.0.0/3.0.0/1b3c71476f6d152c31c1730e83ccb08bcf23e348233f4fcc11e182248e6bf7de...


Downloading data files:   0%|          | 0/5 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/159M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/376M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/12.3M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/661k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/572k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/287113 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/13368 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/11490 [00:00<?, ? examples/s]

Dataset cnn_dailymail downloaded and prepared to /root/.cache/huggingface/datasets/cnn_dailymail/3.0.0/3.0.0/1b3c71476f6d152c31c1730e83ccb08bcf23e348233f4fcc11e182248e6bf7de. Subsequent calls will reuse this data.


  0%|          | 0/3 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 287113
    })
    validation: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 13368
    })
    test: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 11490
    })
})

### Perform fine tuning training

Organizing datasets<br>
The sample formats supported by each model are different, and can be adjusted by referring to a few compatible datasets or open source examples.

In [None]:
def preprocess(example):
  example["input_ids"] = tokenizer(example["article"], truncation=True, padding="max_length", return_tensors="pt").input_ids
  example["labels"] = tokenizer(example["highlights"], truncation=True, padding="max_length", return_tensors="pt").input_ids
  return example

train_dataset = dataset["validation"].map(preprocess) #train set太大，以val set和 test set代替
val_dataset = dataset["test"].map(preprocess)

Map:   0%|          | 0/13368 [00:00<?, ? examples/s]

Adding a trainable Adapter network to the base model<br>
The official documentation (https://github.com/huggingface/peft) lists the types of Adapters supported by the different models


In [None]:
model = peft.get_peft_model(model, peft.LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"))

#Save the parameters before LoRA training to verify the correct update at a later time.
params_dict = {name: params for name, params in [p for p in model.named_parameters()] if "lora" in name}
print(list(params_dict.keys()))
model.print_trainable_parameters()

Testing the mod's performance before training

In [None]:
def generate(input, model):
  input_ids = tokenizer(input, return_tensors="pt").input_ids
  output_ids = model.generate(input_ids = input_ids, max_length=100, generation_config=transformers.GenerationConfig(temperature=0, top_p=0.75, top_k=40, num_beams=1, pad_token_id=tokenizer.eos_token_id))
  output = tokenizer.batch_decode(output_ids)[0]
  return output

print(generate("what is GPT?", model))

Set the training hyperparameters and start the training

In [None]:
#update gradient for every batch
args = transformers.TrainingArguments(output_dir="./results", learning_rate=1e+2, per_device_train_batch_size=2, num_train_epochs=1, gradient_accumulation_steps=1)
trainer = transformers.Trainer(model = model, train_dataset = train_dataset, eval_dataset = val_dataset, args=args)

In [None]:
trainer.train()

### Function Test

Check if the LoRA parameters have been updated compared to the pre-training period

# ★ the question is here, the weight of LoRA is same as before training

In [None]:
import numpy as np
diff = 0
for name, params in [p for p in trainer.model.named_parameters()]:
  if name in list(params_dict.keys()):
        diff += torch.sum(params - params_dict[name]).detach().numpy()
print('LoRA weights change:',diff)

Make inferences

In [None]:
print(generate("what is GPT?", trainer.model))

Save Model

In [None]:
torch.save(trainer.model, "lora_gpt2.pth")

Reloading Models and Inference

In [None]:
reload_model = torch.load("lora_gpt2.pth")

In [None]:
generate("what is GPT?", reload_model)