## Fine-tune Falcon-7B (sharded version) in a Google Colab notebook

Project C - Team 4

## Setup

The libraries used are `accelerate`, `peft`, `transformers`, `datasets` and `trl` (for its recent `SFTTrainer`). We use `bitsandbytes` to quantize the base model to 4-bit (the QLoRA approach). We also install `einops`, which is required to load Falcon models.

In [None]:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:50"

In [None]:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb evaluate huggingface_hub nltk
!pip install torch


## Dataset load

We use PubMedQA, which contains more than 200,000 artificially generated instances.

The dataset can be found [here](https://huggingface.co/datasets/pubmed_qa)

Alternative dataset: CPGQA

In [None]:
from datasets import load_dataset

dataset_name = "pbaoo2705/cpgqa_processed-2"
dataset = load_dataset(dataset_name)

ModuleNotFoundError: ignored

Alternative dataset: BiomedQA

In [None]:
from datasets import load_dataset

dataset_name = "Shushant/BiomedicalQuestionAnsweringDataset"
raw_dataset = load_dataset(dataset_name)['train'].train_test_split(train_size=3000, test_size=100)
dataset = raw_dataset['train']
eval_dataset = raw_dataset['test']

# Remove unusable rows (those whose answers field mentions "answer_category")
dataset = dataset.filter(lambda example: "answer_category" not in example['answers'])
eval_dataset = eval_dataset.filter(lambda example: "answer_category" not in example['answers'])


In [None]:
from datasets import load_dataset

dataset_name = "pbaoo2705/biomedqa_processed"
dataset = load_dataset(dataset_name)
eval_dataset_name = 'pbaoo2705/biomedqa_processed_eval'
eval_dataset = load_dataset(eval_dataset_name)


## Loading the model

In this section we load the [Falcon 7B model](https://huggingface.co/tiiuae/falcon-7b), quantize it to 4-bit and attach LoRA adapters to it. Let's get started!
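To see why 4-bit quantization matters on a single Colab GPU, the following back-of-the-envelope sketch estimates the memory needed for the weights alone at different precisions. The parameter count of 7 billion is a round-number assumption, and the estimate ignores activations, optimizer state and the small overhead of the quantization constants.

```python
# Rough weight-memory estimate; N_PARAMS is an assumed round figure, not a measured count.
N_PARAMS = 7_000_000_000

BITS_PER_PARAM = {"float32": 32, "float16/bfloat16": 16, "nf4 (4-bit)": 4}


def weight_memory_gib(n_params: int, bits: int) -> float:
    """Return the memory needed to store n_params weights at the given bit width, in GiB."""
    return n_params * bits / 8 / 1024**3


for name, bits in BITS_PER_PARAM.items():
    print(f"{name:>18}: {weight_memory_gib(N_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the 4-bit weights take a little over 3 GiB, which leaves most of a 16 GiB GPU free for activations and adapter training.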

In [None]:
from transformers import AutoTokenizer, BartForConditionalGeneration

# Initialize the summarizer (used by the archived PubMedQA preprocessing further down)
summary_model = BartForConditionalGeneration.from_pretrained("sshleifer/distilbart-cnn-12-6")
summary_tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")



In [None]:
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
# Configure 4-bit NF4 quantization with float16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

#Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token


In [None]:
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, BitsAndBytesConfig, BartForConditionalGeneration, AutoConfig

# Only the second assignment takes effect; the first is kept for reference
model_name = 'pbaoo2705/falcon-7b-sharded-2'
model_name = 'hung200504/falcon-7b-sharded-bf16-finetuned'
base_model_name = "tiiuae/falcon-7b"
# Load the configuration from the base model repository
config = AutoConfig.from_pretrained(base_model_name, trust_remote_code=True, max_new_tokens=2048)

# Load the 4-bit quantized model with a question-answering head
model = AutoModelForQuestionAnswering.from_pretrained(
    model_name,
    config=config,
    quantization_config=bnb_config,
    trust_remote_code=True,
    torch_dtype=torch.float32
)


model.config.use_cache = False



A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- configuration_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.




A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- modeling_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



Some weights of FalconForQuestionAnswering were not initialized from the model checkpoint at ybelkada/falcon-7b-sharded-bf16 and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Below we define the adapter configuration used to create the LoRA model. According to the QLoRA paper, it is important to adapt all linear layers in the transformer block for maximum performance. We therefore add the `dense`, `dense_h_to_4h` and `dense_4h_to_h` layers to the target modules, in addition to the fused `query_key_value` layer.

LoRA configuration:

In [None]:
from peft import LoraConfig

#Setup numerical value for LoRA
lora_alpha = 16
lora_dropout = 0.2
lora_r = 32

#Use LoRA config for fine-tuning this model
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="QUESTION_ANSWERING",
    inference_mode=False,
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)
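To get a feel for what this LoRA configuration costs, the sketch below counts the trainable adapter parameters: each adapted linear layer with shape (in, out) gains two low-rank matrices with `r * (in + out)` parameters in total. The layer shapes (hidden size 4544, 32 layers, a fused query-key-value width of 4672) are assumptions about Falcon-7B and should be checked against the model's `config.json`.

```python
# Count LoRA parameters: each adapted Linear(in, out) gains A (r x in) and B (out x r).
# The layer shapes below are assumptions about Falcon-7B; verify them against the model config.
HIDDEN = 4544
N_LAYERS = 32
LINEAR_SHAPES = {
    "query_key_value": (HIDDEN, 4672),  # fused multi-query projection (assumed width)
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, 4 * HIDDEN),
    "dense_4h_to_h": (4 * HIDDEN, HIDDEN),
}


def lora_params(r: int) -> int:
    """Total LoRA parameters across all layers for rank r."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in LINEAR_SHAPES.values())
    return per_layer * N_LAYERS


total = lora_params(r=32)
print(f"trainable LoRA parameters: {total:,}")
print(f"adapter size at 4 bytes/param: {total * 4 / 1e6:.0f} MB")
```

With `lora_r = 32` this comes to roughly 65 million parameters, or about 261 MB in float32, which is consistent with the `adapter_model.bin` size shown in the upload output later in the notebook. The update is scaled by `lora_alpha / r`, which is 0.5 for the values above.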

AdaLoRA configuration:

In [None]:
from peft import AdaLoraConfig

peft_config = AdaLoraConfig(
    init_r=12,
    target_r=8,
    beta1=0.85,
    beta2=0.85,
    tinit=200,
    tfinal=1000,
    deltaT=10,
    lora_alpha=32,
    lora_dropout=0.2,
    task_type="QUESTION_ANSWERING",
    inference_mode=False,
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)

IA3 configuration:

In [None]:
from peft import IA3Config

peft_config = IA3Config(
    task_type="QUESTION_ANSWERING",
    feedforward_modules=[
        "dense_h_to_4h",
        "dense_4h_to_h"
    ],
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h"
    ],
)

In [None]:
print(peft_config)

IA3Config(peft_type=<PeftType.IA3: 'IA3'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='QUESTION_ANSWERING', inference_mode=False, target_modules=['query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h'], feedforward_modules=['dense_h_to_4h', 'dense_4h_to_h'], fan_in_fan_out=False, modules_to_save=None, init_ia3_weights=True)


## Loading the trainer

The [`SFTTrainer` from the TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer) wraps the transformers `Trainer` to make it easy to fine-tune models on instruction-based datasets with PEFT adapters. For the question-answering task here, the cells further down use the plain `Trainer` with a `PeftModelForQuestionAnswering` model instead. Let's first define the training arguments.

In [None]:
from transformers import TrainingArguments

# Arguments for the training process
output_dir = "falcon-7b-sharded-2"
gradient_accumulation_steps = 1
# Paged 32-bit AdamW optimizer
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 30
eval_steps = 200
learning_rate = 5e-4
max_grad_norm = 0.3
max_steps = 1500
# Use max_steps = 1500 when training on BiomedQA
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=True,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)
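The arithmetic behind these settings can be checked with a short sketch. The values are copied from the training arguments; the train split size of 2992 rows comes from the notebook output, and a single GPU is assumed.

```python
import math

# Values copied from the training arguments above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
max_steps = 1500
warmup_ratio = 0.03
train_rows = 2992  # processed BiomedQA train split size, as reported in the notebook output

# num_devices = 1 is an assumption (a single Colab GPU).
num_devices = 1
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = effective_batch * max_steps
epochs = samples_seen / train_rows
warmup_steps = math.ceil(max_steps * warmup_ratio)

print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen} (~{epochs:.2f} epochs)")
print(f"warmup steps implied by warmup_ratio: {warmup_steps}")
```

So 1500 steps at batch size 2 is about one pass over the training set, which matches the `'epoch': 1.0` reported after training. To my understanding, the plain `constant` scheduler does not apply any warmup, so `warmup_ratio` only takes effect with a schedule such as `constant_with_warmup`.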

(Archived) Data processing for PubMedQA:

In [None]:
#Summary text
def context_summarize(row):
    full_context = ''.join(row['context']['contexts'])
    inputs = summary_tokenizer(full_context, max_length=1024, truncation=True, return_tensors="pt")
    summary_tokens = summary_model.generate(inputs["input_ids"], num_beams=2, min_length=100, max_length=300)
    row['context'] = summary_tokenizer.batch_decode(summary_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    return row

#Format the text
def get_text(row):
  row['text'] = "### Human: " + row['context'] + " " + row['question'] + "### Assistant: " + row['long_answer']
  return row

#Summary answer
def get_answer(row):
  answer = row['long_answer']
  inputs = summary_tokenizer(answer, max_length=1024, truncation=True, return_tensors="pt")
  summary_tokens = summary_model.generate(inputs["input_ids"], num_beams=2, min_length=10, max_length=30)
  row['answers'] = summary_tokenizer.batch_decode(summary_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
  return row

# #Sample the output
# sample_row = dataset_reformat[0]
# print(' '.join(sample_row['context']['contexts']))
# sample_row = context_summarize(sample_row)
# print(sample_row['context'])

dataset = dataset.map(get_answer)




Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

ValueError: ignored

In [None]:
# add_column returns a new dataset, so the result has to be assigned
eval_dataset = eval_dataset.add_column(name="answers", column=eval_dataset['long_answer'])
eval_dataset = eval_dataset.map(get_answer)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [None]:
dataset.add_column(name="", column=dataset['long_answer'])
dataset = dataset.map(get_text,  remove_columns=dataset_reformat["train"].column_names)

In [None]:
#Preprocess function cpgqa
def get_input_ids(row):
  question = row["question"] + " "
  context = row["context"]
  inputs = tokenizer(
      question,
      context,
      truncation="only_second",
      return_offsets_mapping=True,
      padding="max_length",
  )
  row['input_ids'] = inputs['input_ids']
  row['attention_mask'] = inputs['attention_mask']
  row['answer'] = row['answer_text']
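  offset_mapping = inputs.pop("offset_mapping")  # not used below

  # Character offsets of the answer within the context
  start_char = row["answer_start"]
  end_char = row["answer_start"] + len(row['answer'])

  # Stored as-is: these are character offsets, not token indices
  row["start_positions"] = start_char
  row["end_positions"] = end_char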
  return row
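Hugging Face question-answering heads compute their loss over token positions, so `start_positions` and `end_positions` are normally token indices rather than character offsets. The sketch below shows the usual conversion using the offset mapping that the tokenizer already returns. The helper name and the hand-made offsets are hypothetical; with a real fast tokenizer, `sequence_ids` would come from `inputs.sequence_ids()`. Returning `(0, 0)` for answers that fall outside the truncated context is a common convention, assumed here.

```python
from typing import List, Optional, Tuple


def char_span_to_token_span(
    offsets: List[Tuple[int, int]],
    sequence_ids: List[Optional[int]],
    start_char: int,
    end_char: int,
) -> Tuple[int, int]:
    """Map a character span in the context to (start_token, end_token) indices.

    offsets[i] is the (start, end) character range of token i; sequence_ids[i] is 1
    for context tokens, 0 for question tokens and None for special/padding tokens.
    Returns (0, 0) when the answer is not fully inside the (possibly truncated) context.
    """
    context_tokens = [i for i, sid in enumerate(sequence_ids) if sid == 1]
    if not context_tokens:
        return 0, 0
    first, last = context_tokens[0], context_tokens[-1]
    if offsets[first][0] > start_char or offsets[last][1] < end_char:
        return 0, 0
    # First context token that ends after the answer starts
    start_token = next(i for i in context_tokens if offsets[i][1] > start_char)
    # Last context token that starts before the answer ends
    end_token = next(i for i in reversed(context_tokens) if offsets[i][0] < end_char)
    return start_token, end_token


# Hand-made example: question "Who ?" followed by context "Ada wrote it"
offsets = [(0, 3), (4, 5), (0, 3), (4, 9), (10, 12)]
sequence_ids = [0, 0, 1, 1, 1]
print(char_span_to_token_span(offsets, sequence_ids, start_char=4, end_char=9))  # (3, 3)
```

Here the answer "wrote" spans characters 4 to 9 of the context and maps to token index 3 in the combined sequence.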

In [None]:
#Preprocess cpgqa (train set)
ds_col = dataset.column_names
dataset.add_column(name="input_ids", column=dataset['question'])
dataset.add_column(name="attention_mask", column=dataset['question'])
dataset.add_column(name="answer", column=dataset['answer_text'])
dataset.add_column(name="end_position", column=dataset['answer_text'])
dataset.add_column(name="start_position", column=dataset['answer_text'])
dataset = dataset.map(get_input_ids, remove_columns=ds_col)




Map:   0%|          | 0/884 [00:00<?, ? examples/s]

Map:   0%|          | 0/109 [00:00<?, ? examples/s]

In [None]:
#Preprocess cpgqa (eval set)
# eval_dataset.add_column(name="input_ids", column=eval_dataset['question'])
# eval_dataset.add_column(name="attention_mask", column=eval_dataset['question'])
# eval_dataset.add_column(name="answer", column=eval_dataset['answer_text'])
# eval_dataset.add_column(name="end_position", column=eval_dataset['answer_text'])
# eval_dataset.add_column(name="start_position", column=eval_dataset['answer_text'])
eval_dataset = eval_dataset.map(get_input_ids)

Map:   0%|          | 0/109 [00:00<?, ? examples/s]

In [None]:
#preprocess function for biomedqa
def get_input_ids_2(row):
  question = row["question"] + " "
  context = row["context"]
  inputs = tokenizer(
      question,
      context,
      truncation="only_second",
      max_length=2048,
      return_offsets_mapping=True,
      padding="max_length",
  )
  offset_mapping = inputs.pop("offset_mapping")
  row['input_ids'] = inputs['input_ids']
  row['attention_mask'] = inputs['attention_mask']
  split_ind = row['answers'].rfind(',')
  text = row['answers'][:split_ind].replace("{'text': '", '')[:-1]
  start_pos = row['answers'][split_ind:].replace(", 'answer_start':","").replace("}", "")
  row['answer'] = text
  start_char = int(start_pos)
  end_char = start_char + len(row['answer'])
  row["start_positions"] = start_char
  row["end_positions"] = end_char
  return row
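The string slicing in `get_input_ids_2` suggests that the `answers` field is a stringified dictionary of the form `{'text': '...', 'answer_start': N}`. Assuming that format holds (it is inferred from the code, not confirmed), a less brittle way to parse it is `ast.literal_eval`, which also copes with answers that contain commas or apostrophes. The helper name is hypothetical.

```python
import ast
from typing import Tuple


def parse_answer(raw: str) -> Tuple[str, int]:
    """Parse a string such as "{'text': 'aspirin', 'answer_start': 42}".

    Assumes the field is a valid Python dict literal with these two keys.
    """
    record = ast.literal_eval(raw)
    return record["text"], int(record["answer_start"])


text, start_char = parse_answer("{'text': 'aspirin', 'answer_start': 42}")
end_char = start_char + len(text)
print(text, start_char, end_char)  # aspirin 42 49
```

If some rows are not valid literals, `ast.literal_eval` raises an exception (typically `ValueError` or `SyntaxError`), which makes malformed rows easy to detect and filter out.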

In [None]:
ds_col = dataset.column_names
dataset = dataset.map(get_input_ids_2, remove_columns=ds_col)

Map:   0%|          | 0/2992 [00:00<?, ? examples/s]

In [None]:
eval_dataset = eval_dataset.map(get_input_ids_2)

Map:   0%|          | 0/99 [00:00<?, ? examples/s]

In [None]:
from huggingface_hub import notebook_login

# Authenticate interactively; never hard-code an access token in the notebook
notebook_login()


HTTPError: ignored

In [None]:

dataset.push_to_hub("biomedqa_processed")
eval_dataset.push_to_hub("biomedqa_processed_eval")

Pushing dataset shards to the dataset hub:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/3 [00:00<?, ?ba/s]

Pushing dataset shards to the dataset hub:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]

Define evaluation metrics and re-size evaluation dataset



In [None]:
import evaluate
import numpy as np

# Macro F1 over predicted positions
def compute_metrics(eval_preds):
    metric_f1 = evaluate.load("f1")
    logits, labels = eval_preds
    # Assumes logits and labels arrive as (start, end) pairs; only index 0 is scored
    predictions = np.argmax(logits, axis=-1)
    return metric_f1.compute(predictions=predictions[0], references=labels[0], average="macro")


In [None]:
#Break down the test set to get 200 instance for evaluation
#eval_dataset_breakdown["test"] is the desired evaluation set
eval_dataset_breakdown = eval_dataset
print(eval_dataset_breakdown)

DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0.1', 'Unnamed: 0', 'id', 'title', 'context', 'question', 'answers'],
        num_rows: 99
    })
})


In [None]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()

In [None]:
from transformers import Trainer
from peft import PeftModelForQuestionAnswering

max_seq_length = 2048
peft_model = PeftModelForQuestionAnswering(model=model, peft_config=peft_config)
peft_model.to(torch.device("cuda"))

trainer = Trainer(
    model=peft_model,
    train_dataset=dataset["train"],
    eval_dataset=eval_dataset_breakdown,
    tokenizer=tokenizer,
    args=training_arguments,
    compute_metrics=compute_metrics,
)

In [None]:
for param in model.parameters():
    # Upcast half-precision (float16) parameters to float32
    if param.dtype == torch.float16:
        param.data = param.data.to(torch.float32)

In [None]:
for param in model.parameters():
  if (param.dtype != torch.float32):
    param.data = param.data.to(torch.float32)




OutOfMemoryError: ignored

In [None]:
print(eval_dataset_breakdown["test"])

Dataset({
    features: ['title', 'id', 'question', 'answer_text', 'answer_start', 'context', 'input_ids', 'attention_mask', 'answer', 'start_positions', 'end_positions'],
    num_rows: 89
})


In [None]:
#Get pipeline
from transformers import QuestionAnsweringPipeline
pipeline = QuestionAnsweringPipeline(model=model, tokenizer=tokenizer)

In [None]:
print(eval_dataset)

DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0.1', 'Unnamed: 0', 'id', 'title', 'context', 'question', 'answers'],
        num_rows: 99
    })
})


In [None]:
# Compute token-overlap F1 and precision (reported below as "accuracy")
total_f1 = 0
total_accuracy = 0
for ins in eval_dataset['train']:
  ans = pipeline(question=ins['question'], context=ins['context'], max_answer_len=50, max_question_len=300)
  ref_tokens = tokenizer(" " + ins["answer"])["input_ids"]
  ans_tokens = tokenizer(ans["answer"])["input_ids"]
  common_tokens = set(ans_tokens) & set(ref_tokens)
  precision = len(common_tokens) / len(ans_tokens)
  recall = len(common_tokens) / len(ref_tokens)
  total_accuracy += precision
  print(tokenizer.decode(ans_tokens), "|", tokenizer.decode(ref_tokens), "|")
  if (len(common_tokens) == 0):
    total_f1 += 0
    print(0)
  else:
    f1 = 2 * precision * recall / (precision + recall)
    total_f1 += f1
    print(f1)

print("F1 average score:", total_f1 / eval_dataset['train'].num_rows)
print("Accuracy average score: ", total_accuracy / eval_dataset['train'].num_rows)
# #Generated text
# print(pipeline(question=question, context=context, max_answer_len=300, max_question_len=300))

 Lehman High School and High School of American Studies. |  Fieldston, Horace Mann, and Riverdale Country School |
0.17391304347826086
 become increasingly common. Generally, in the United States, Gibson F-hole F-5 mandolins and mandolins influenced by that design are strongly associated with bluegrass, while the A- |  complicated woodwork |
0
 just recently being conducted as of 201 |  very small fees |
0
. " |  18th |
0.3333333333333333
 a result of exposure to natural or depleted uranium, exposure to uranium and its decay products, especially radon, are widely known and significant health threats. Exposure to strontium-90, iodine-131, and other fission products is unrelated |  alpha radiation |
0
 is so aligned, supportive, and conducive with the United States, that it is like a U.S. state. It can also be used in a pejorative |  their local or national culture has become too Americanized |
0
-equipment effort has resulted in the acquisition |  C-130 Hercules |
0.14285714285714285
 t
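The loop above intersects token sets, so repeated tokens are counted once in the numerator but every time in the denominator, and an empty prediction would cause a division by zero. A minimal sketch of the multiset variant used in SQuAD-style scoring is shown below; the function name is hypothetical and it works on any token sequence (token IDs or words).

```python
from collections import Counter
from typing import Sequence, Tuple


def overlap_scores(pred_tokens: Sequence, ref_tokens: Sequence) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using multiset token overlap."""
    if not pred_tokens or not ref_tokens:
        return 0.0, 0.0, 0.0
    # Counter intersection keeps the minimum count of each shared token
    common = Counter(pred_tokens) & Counter(ref_tokens)
    n_common = sum(common.values())
    if n_common == 0:
        return 0.0, 0.0, 0.0
    precision = n_common / len(pred_tokens)
    recall = n_common / len(ref_tokens)
    return precision, recall, 2 * precision * recall / (precision + recall)


# An exact match with a repeated token scores 1.0; the set-based version would give 2/3
print(overlap_scores("very very small".split(), "very very small".split()))
```

In the evaluation loop this could replace the manual precision, recall and F1 computation, for example `precision, recall, f1 = overlap_scores(ans_tokens, ref_tokens)`.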

In [None]:
#Get BLEU score
from nltk.translate.bleu_score import sentence_bleu

total_bleu = 0
for ins in eval_dataset['train']:
  ans = pipeline(question=ins['question'], context=ins['context'], max_answer_len=350, max_question_len=50)
  # Both arguments are plain strings, so NLTK compares character n-grams here
  bleu = sentence_bleu([ins["answer"]], ans["answer"])
  total_bleu += bleu

print("BLEU average score:", total_bleu / eval_dataset["train"].num_rows)

The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


BLEU average score: 0.04041604598087647


We also pre-process the model by upcasting the layer norms to float32 for more stable training.

In [None]:
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)  # nn.Module.to() casts the module in place

## Train the model

Now let's train the model! Simply call `trainer.train()`

In [None]:
import wandb
# Prompts for the API key (or reads WANDB_API_KEY); never hard-code it in the notebook
wandb.login()
wandb.init()

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: phambao2705. Use `wandb login --relogin` to force relogin


In [None]:
import numpy as np
trainer.train()

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
30,7.6511
60,7.0866
90,7.0886
120,6.9513
150,6.9813
180,6.8821
210,6.9062
240,6.9532
270,6.864
300,6.7677


TrainOutput(global_step=1500, training_loss=6.957622395833333, metrics={'train_runtime': 9150.4666, 'train_samples_per_second': 0.328, 'train_steps_per_second': 0.164, 'total_flos': 2.46677066366976e+17, 'train_loss': 6.957622395833333, 'epoch': 1.0})
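A useful sanity check on these numbers is the loss of a model that guesses the answer position uniformly at random. The sketch below assumes that the question-answering head scores every one of the 2048 padded positions and that the reported loss is the mean of the start and end cross-entropy terms; both are assumptions about the setup rather than facts stated in the notebook.

```python
import math

# Assumption: logits cover all 2048 padded positions (max_length in the preprocessing).
seq_len = 2048
uniform_loss = math.log(seq_len)  # cross-entropy of a uniform guess over positions

first_logged_loss = 7.6511  # step 30, from the training log above
final_mean_loss = 6.957622395833333  # training_loss reported in TrainOutput

print(f"uniform-guess loss: {uniform_loss:.4f}")
print(f"first logged loss:  {first_logged_loss:.4f}")
print(f"mean training loss: {final_mean_loss:.4f}")
```

Under these assumptions the first logged loss sits almost exactly at the chance level of about 7.62, and the run average is only around 0.7 nats below it, which suggests the answer head learned little. One possible contributor is that the preprocessing stores character offsets as labels where token indices are expected.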

In [None]:
#Replace current model with trained model
model = trainer.model

Log in with a Hugging Face access token that has write access (enter it in the prompt rather than storing it in the notebook):

In [None]:
#Login to huggingface
from huggingface_hub import notebook_login

notebook_login()


In [None]:
#Push model to hub
trainer.push_to_hub('QLoRA applied #5')

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.bin:   0%|          | 0.00/261M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.09k [00:00<?, ?B/s]

'https://huggingface.co/pbaoo2705/falcon-7b-sharded-2/tree/main/'

In [None]:
import numpy as np
trainer.evaluate()



{'eval_loss': nan,
 'eval_f1': 0.0,
 'eval_runtime': 83.2243,
 'eval_samples_per_second': 1.069,
 'eval_steps_per_second': 1.069}

Free RAM so the session can be reused:

In [None]:
import gc
gc.collect()

2443

In [None]:
from evaluate import evaluator
from datasets import load_dataset
from transformers import AutoModelForCausalLM

f1_calculator = evaluator('question-answering')

def reformat_dataset(row):
  answers = { 'text': row['text'], 'answer_start': [1]}
  row['id'] = row['pubid']
  row['answers'] = answers
  return row

#Validation set load
dataset_name = "pbaoo2705/processed_dataset_v2"
test_dataset = load_dataset(dataset_name, split='test')
print(test_dataset)
test_dataset = test_dataset.map(reformat_dataset)
test_model = AutoModelForCausalLM.from_pretrained(
    'pbaoo2705/falcon-7b-sharded',
    # quantization_config=bnb_config,
    trust_remote_code=True
)
results = f1_calculator.compute(
    model_or_pipeline=test_model,
    tokenizer=tokenizer,
    data=test_dataset,
)

Dataset({
    features: ['pubid', 'question', 'context', 'long_answer', 'final_decision', 'text'],
    num_rows: 1000
})


In [None]:
print(results)

In [None]:
!nvidia-smi

Mon Oct 16 06:12:29 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    41W / 300W |   5858MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces