# QLoRA
Repo: https://github.com/artidoro/qlora  
Paper: https://arxiv.org/abs/2305.14314  
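QLoRA is a method that substantially reduces the memory required to fine-tune large models: the base weights are quantized to 4 bits and kept frozen, and only small low-rank adapters (LoRA) are trained. With the Hugging Face libraries, the fine-tuning process itself is straightforward.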
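
To make the memory claim concrete, here is a rough back-of-the-envelope sketch. It only counts weight storage (no activations, optimizer state, or adapters), uses a nominal 7e9 parameter count, and takes the block sizes (64 and 256) and the 8-bit quantized constants from the QLoRA paper's description of double quantization. The function names are made up for this note.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Storage needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def quant_constant_bits(block_size=64, constant_bits=32, double_quant=False,
                        second_block_size=256, quantized_constant_bits=8):
    """Extra bits per parameter spent on quantization constants.

    Without double quantization: one 32-bit constant per block of 64 weights.
    With double quantization: the constants are themselves quantized to 8 bits,
    plus one 32-bit constant per 256 first-level constants.
    """
    if not double_quant:
        return constant_bits / block_size
    return (quantized_constant_bits / block_size
            + constant_bits / (block_size * second_block_size))


n_params = 7e9  # nominal "7B" model, an assumption for illustration
print("16-bit weights:", weight_memory_gb(n_params, 16), "GB")
print("4-bit weights: ", weight_memory_gb(n_params, 4), "GB")
print("constant overhead (bits/param):",
      quant_constant_bits(), "->", round(quant_constant_bits(double_quant=True), 3))
```

Under these assumptions the weights shrink from about 14 GB to about 3.5 GB, and double quantization cuts the per-parameter overhead of the quantization constants from 0.5 bits to roughly 0.127 bits.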

The dataset used can be found here: https://drive.google.com/drive/folders/10Ky8BBFkLBENIA7PItdt6YV6YgLWDsHi?usp=drive_link

More experimental toys here: https://github.com/YanJiaHuan/LLMs_for_yjh

## Falcon-7b
Falcon-7B is a causal decoder-only model with 7B parameters. Its original pre-training used 384 A100 40GB GPUs with a 2D parallelism strategy (PP=2, DP=192) combined with ZeRO.

Checkpoint: https://huggingface.co/tiiuae/falcon-7b  
Blog tutorial: https://vilsonrodrigues.medium.com/run-your-private-llm-falcon-7b-instruct-with-less-than-6gb-of-gpu-using-4-bit-quantization-ff1d4ffbabcc

In [None]:
# Dependencies
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install datasets
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U einops
!pip install -q -U safetensors
!pip install -q -U torch
!pip install -q -U xformers
!pip install sacrebleu


In [None]:
# libs
import torch
import pandas as pd
import transformers
from transformers import(
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)
from peft import (
    prepare_model_for_kbit_training,
    LoraConfig,
    get_peft_model
)
from datasets import Dataset, load_dataset, load_metric

In [None]:
# model_name
model_id = "tiiuae/falcon-7b"

In [None]:
# LoRA Config
nf4_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.bfloat16
)
lora_config = LoraConfig(
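    r=8,  # in theory, the higher the better; 8 is a dividing line
    lora_alpha=32, # roughly analogous to a learning rate (the LoRA update is scaled by lora_alpha / r)
    target_modules=["query_key_value"], # the layers that receive LoRA adapters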
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
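
To illustrate how `r` and `lora_alpha` interact, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It assumes the standard LoRA formulation, where the frozen weight `W` is combined with a low-rank update `B @ A` scaled by `lora_alpha / r`. The helper names and the tiny matrices are invented for illustration and are not PEFT code.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha):
    """y = W x + (lora_alpha / r) * B (A x), where r is the number of rows of A."""
    r = len(A)
    scale = lora_alpha / r
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]          # frozen weight (2 x 2)
A = [[1.0, 0.0], [0.0, 1.0]]          # r x d_in, here r = 2
B_zero = [[0.0, 0.0], [0.0, 0.0]]     # B starts at zero, so the adapter is a no-op
B_trained = [[0.5, 0.0], [0.0, 0.0]]  # pretend values after some training
x = [2.0, 3.0]

print(lora_forward(W, A, B_zero, x, lora_alpha=8))     # same as the base output
print(lora_forward(W, A, B_trained, x, lora_alpha=8))  # base output plus scaled update
```

With `r=8` and `lora_alpha=32` as in the config above, the scale factor is 4, the same as in this toy example (8 / 2). Raising `lora_alpha` at a fixed `r` amplifies the adapter's contribution, which is why it behaves somewhat like a learning-rate multiplier.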

In [None]:
# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    padding_side="right",
    use_fast=False,
)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config, trust_remote_code=True)
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()


In [None]:
# load data
Data_path = "./background_train.json"
data = load_dataset('json', data_files=Data_path)
data = data['train'].train_test_split(test_size=0.1)

def preprocess_function(examples):
    inputs = ["### Instruction:\n" + instruction + "\n\n" + "### Input:\n" + context + "\n\n" for instruction, context in
              zip(examples["instruction"], examples["input"])]
    model_inputs = tokenizer(inputs, padding="max_length", max_length=1024, truncation=True)
    labels = tokenizer(examples["output"], padding="max_length", max_length=1024, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_data = data.map(preprocess_function, batched=True)
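
One caveat for a decoder-only model such as Falcon: the labels normally have to line up position by position with `input_ids`, whereas the function above tokenizes prompt and answer separately. A common alternative, sketched below on plain lists of token ids, is to concatenate prompt and answer and set the label to `-100` wherever the loss should be ignored (prompt and padding). This is a sketch, not the notebook's method: `build_causal_example` is a hypothetical helper, and it assumes the model shifts the labels internally, as Hugging Face causal LMs generally do.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def build_causal_example(prompt_ids, answer_ids, eos_id, pad_id, max_length):
    """Build aligned input_ids / attention_mask / labels for causal LM fine-tuning."""
    # Inputs are prompt + answer + EOS, truncated to max_length.
    input_ids = (prompt_ids + answer_ids + [eos_id])[:max_length]
    # Only the answer (and EOS) contributes to the loss; prompt positions are ignored.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + answer_ids + [eos_id])[:max_length]
    attention_mask = [1] * len(input_ids)

    # Right-pad everything to max_length; padded positions are ignored in the loss.
    n_pad = max_length - len(input_ids)
    input_ids += [pad_id] * n_pad
    attention_mask += [0] * n_pad
    labels += [IGNORE_INDEX] * n_pad
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Toy ids; pad_id equals eos_id here because the notebook sets pad_token = eos_token.
example = build_causal_example([11, 12, 13], [21, 22], eos_id=0, pad_id=0, max_length=8)
print(example["input_ids"])
print(example["labels"])
```

In a real pipeline the id lists would come from the tokenizer (for example `tokenizer(text)["input_ids"]`), applied inside the function passed to `data.map`.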


In [None]:
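# load metric
import numpy as np

bleu_metric = load_metric('sacrebleu')

def compute_metrics(eval_preds):
    preds, labels = eval_preds.predictions, eval_preds.label_ids
    if isinstance(preds, tuple):
        preds = preds[0]
    # -100 marks ignored/padded positions and cannot be decoded, so map it to the pad token first
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # sacrebleu expects one list of reference strings per prediction
    decoded_labels = [[text] for text in tokenizer.batch_decode(labels, skip_special_tokens=True)]
    print(decoded_preds[:5])  # quick sanity check of the generations
    result = bleu_metric.compute(predictions=decoded_preds, references=decoded_labels)
    return {"bleu_score": result['score']}  # Rename the score as 'bleu_score'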


In [None]:
# train

training_args = Seq2SeqTrainingArguments(
    logging_dir="./logs_for_falcon_7b",  # Path to directory to save logs
    output_dir="./Checkpoints/falcon/1890",
    evaluation_strategy="steps",
    eval_steps=20,
    learning_rate=1e-4,
    weight_decay=1e-5,
    save_strategy='steps',
    save_steps=600,
    num_train_epochs=500,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    eval_accumulation_steps=1,
    fp16=True,
    predict_with_generate=True,
    logging_strategy='steps',   # Log after every X steps
    logging_steps=100           # Set X to be 100
)

trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=tokenized_data['train'],
    eval_dataset=tokenized_data['test'],
    compute_metrics=compute_metrics,
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

## FLAN-T5
FLAN-T5 is an enhanced version of T5 that has been fine-tuned on a mixture of tasks. The model used here is `flan-t5-base`, which has around 0.25B parameters (about 248M).  
Paper: https://arxiv.org/pdf/2210.11416.pdf  
Checkpoint: https://huggingface.co/google/flan-t5-base

In [None]:
# dependencies
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install datasets
!pip install sentencepiece  # for t5tokenizer
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U einops
!pip install -q -U safetensors
!pip install -q -U torch
!pip install -q -U xformers
!pip install sacrebleu



In [None]:
# libs (same)
import torch
import pandas as pd
import transformers
from transformers import(
    T5Tokenizer,
    MT5ForConditionalGeneration,
    BitsAndBytesConfig,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)
from peft import (
    prepare_model_for_kbit_training,
    LoraConfig,
    get_peft_model
)
from datasets import Dataset, load_dataset, load_metric

In [None]:
# model_name
model_id = "google/flan-t5-base"

In [None]:
# LoRA Config
nf4_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_2_SEQ_LM"  # T5 is an encoder-decoder model, not a causal LM
)

In [None]:
# load model and tokenizer
from transformers import T5ForConditionalGeneration  # FLAN-T5 is a T5 checkpoint, so load it with the T5 class rather than mT5

model = T5ForConditionalGeneration.from_pretrained(model_id, quantization_config=nf4_config)
tokenizer = T5Tokenizer.from_pretrained(model_id)
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=True`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


trainable params: 884,736 || all params: 248,462,592 || trainable%: 0.3560841867092814
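
The trainable-parameter count above can be reproduced by hand. The sketch below assumes the published `flan-t5-base` architecture (hidden size 768, 12 encoder and 12 decoder layers, with `q` and `v` projections of shape 768 x 768) and that each adapted projection gets two LoRA matrices of shapes `r x d_in` and `d_out x r`. The variable names are made up for this note.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


d_model = 768          # assumed hidden size of flan-t5-base
encoder_layers = 12    # each has self-attention: q and v -> 2 adapted projections
decoder_layers = 12    # each has self- and cross-attention: 2 x (q, v) -> 4 adapted projections
r = 8

adapted_projections = encoder_layers * 2 + decoder_layers * 4
trainable = adapted_projections * lora_param_count(d_model, d_model, r)

all_params = 248_462_592  # total reported by print_trainable_parameters() above
print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / all_params:.4f}")
```

Under these assumptions the 72 adapted projections contribute 12,288 parameters each, which matches the 884,736 reported by `print_trainable_parameters()`.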


In [None]:
# load data
Data_path = "./background_train.json"
data = load_dataset('json', data_files=Data_path)
data = data['train'].train_test_split(test_size=0.1)

def preprocess_function(examples):
    inputs = ["### Instruction:\n" + instruction + "\n\n" + "### Input:\n" + context + "\n\n" for instruction, context in
              zip(examples["instruction"], examples["input"])]
    model_inputs = tokenizer(inputs, padding="max_length", max_length=1024, truncation=True)
    labels = tokenizer(examples["output"], padding="max_length", max_length=1024, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_data = data.map(preprocess_function, batched=True)

Map:   0%|          | 0/1555 [00:00<?, ? examples/s]

Map:   0%|          | 0/173 [00:00<?, ? examples/s]

In [None]:
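# load metric
import numpy as np

bleu_metric = load_metric('sacrebleu')

def compute_metrics(eval_preds):
    preds, labels = eval_preds.predictions, eval_preds.label_ids
    if isinstance(preds, tuple):
        preds = preds[0]
    # -100 marks ignored/padded positions and cannot be decoded, so map it to the pad token first
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # sacrebleu expects one list of reference strings per prediction
    decoded_labels = [[text] for text in tokenizer.batch_decode(labels, skip_special_tokens=True)]
    print(decoded_preds[:5])  # quick sanity check of the generations
    result = bleu_metric.compute(predictions=decoded_preds, references=decoded_labels)
    return {"bleu_score": result['score']}  # Rename the score as 'bleu_score'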



In [None]:
%load_ext tensorboard

In [None]:
# train

training_args = Seq2SeqTrainingArguments(
    logging_dir="./logs_for_flan_t5",  # Path to directory to save logs
    output_dir="./Checkpoints/flan_t5/1890",
    evaluation_strategy="steps",
    eval_steps=20,
    learning_rate=1e-4,
    weight_decay=1e-5,
    save_strategy='steps',
    save_steps=600,
    num_train_epochs=500,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    eval_accumulation_steps=1,
    fp16=True,
    predict_with_generate=True,
    logging_strategy='steps',   # Log after every X steps
    logging_steps=100           # Set X to be 100
)

trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=tokenized_data['train'],
    eval_dataset=tokenized_data['test'],
    compute_metrics=compute_metrics,
)

trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Step,Training Loss,Validation Loss


KeyboardInterrupt: ignored
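
Before launching a long run, it can help to estimate how much work the training arguments imply. The sketch below is simple arithmetic, assuming a single device and the 1555 training and 173 evaluation examples reported by the `Map` progress output above. `schedule_summary` is a hypothetical helper, not part of `transformers`.

```python
import math


def schedule_summary(n_train, n_eval, train_batch_size, eval_batch_size,
                     epochs, eval_steps, save_steps):
    """Rough step counts for a single-device run with gradient_accumulation_steps=1."""
    steps_per_epoch = math.ceil(n_train / train_batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "evaluations": total_steps // eval_steps,
        "checkpoints": total_steps // save_steps,
        "batches_per_evaluation": math.ceil(n_eval / eval_batch_size),
    }


summary = schedule_summary(n_train=1555, n_eval=173, train_batch_size=4,
                           eval_batch_size=8, epochs=500, eval_steps=20, save_steps=600)
for name, value in summary.items():
    print(f"{name}: {value:,}")
```

With `eval_steps=20` and `predict_with_generate=True`, generation runs over the whole evaluation split thousands of times during the run, so a larger `eval_steps` or fewer epochs may be worth considering.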

In [None]:
# Tensorboard
%tensorboard --logdir logs_for_flan_t5

