## Finetune Falcon-7b (sharded version) on a Google colab notebook

Project C - Team 4

## Setup

The used libraries are `accelerate`, `peft`, `transformers`, `datasets` and TRL to leverage the recent `SFTTrainer`. We will use `bitsandbytes` to quantize the base model into 4bit (QLoRA approach). We will also install `einops` as it is a requirement to load Falcon models.

In [1]:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb evaluate huggingface_hub
!pip install torch

  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m118.0/118.0 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.7/7.7 MB[0m [31m65.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m258.1/258.1 kB[0m [31m23.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m519.6/519.6 kB[0m [31m39.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m295.0/295.0 kB[0m [31m25.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.8/3.8 MB[0m [31m91.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m65.4 MB/s[0m eta [36m

## Dataset load

We make use of PubMedQA, which has more than 200000 instances (Artificially created)

The dataset can be found [here](https://huggingface.co/datasets/pubmed_qa)

In [2]:
from datasets import load_dataset

dataset_name = "pbaoo2705/qa_processed"
eval_dataset_name = 'pbaoo2705/qa_processed_eval'
dataset = load_dataset(dataset_name, split='train')
eval_dataset = load_dataset(eval_dataset_name, split='test')


Downloading readme:   0%|          | 0.00/665 [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/5.67M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Downloading readme:   0%|          | 0.00/667 [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/1.13M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating test split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Alternative dataset: CPGQA

In [3]:
from datasets import load_dataset

dataset_name = "pbaoo2705/cpgqa_processed"
eval_dataset_name = 'pbaoo2705/cpgqa_processed_eval'
dataset = load_dataset(dataset_name)
eval_dataset = load_dataset(dataset_name)

Downloading readme:   0%|          | 0.00/830 [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/202k [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/884 [00:00<?, ? examples/s]

## Loading the model

In this section we will load the [Falcon 7B model](https://huggingface.co/tiiuae/falcon-7b), quantize it in 4bit and attach LoRA adapters on it. Let's get started!

In [None]:
from transformers import AutoTokenizer, BartForConditionalGeneration

#Summarizer initialize
summary_model = BartForConditionalGeneration.from_pretrained("sshleifer/distilbart-cnn-12-6")
summary_tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")


Downloading (…)lve/main/config.json:   0%|          | 0.00/1.80k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

In [4]:
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, BartForConditionalGeneration
#Quantizer initialize
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

#Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

Downloading (…)okenizer_config.json:   0%|          | 0.00/287 [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.73M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/281 [00:00<?, ?B/s]

In [5]:
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, BitsAndBytesConfig, BartForConditionalGeneration, AutoConfig

model_name = 'pbaoo2705/falcon-7b-sharded-2'
base_model_name = "tiiuae/falcon-7b"

#Get config.json file from the base model name directory
config = AutoConfig.from_pretrained(base_model_name, trust_remote_code=True,  max_new_tokens=2048)

#Get model from HuggingFace's transformers library
model = AutoModelForQuestionAnswering.from_pretrained(
    model_name,
    config=config,
    quantization_config=bnb_config,
    trust_remote_code=True
)


model.config.use_cache = False


Downloading (…)lve/main/config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

Downloading (…)figuration_falcon.py:   0%|          | 0.00/7.16k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- configuration_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



Downloading (…)/adapter_config.json:   0%|          | 0.00/445 [00:00<?, ?B/s]

Downloading (…)n/modeling_falcon.py:   0%|          | 0.00/56.9k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- modeling_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Downloading (…)model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/8 [00:00<?, ?it/s]

Downloading (…)l-00001-of-00008.bin:   0%|          | 0.00/1.92G [00:00<?, ?B/s]

Downloading (…)l-00002-of-00008.bin:   0%|          | 0.00/1.99G [00:00<?, ?B/s]

Downloading (…)l-00003-of-00008.bin:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Downloading (…)l-00004-of-00008.bin:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Downloading (…)l-00005-of-00008.bin:   0%|          | 0.00/1.99G [00:00<?, ?B/s]

Downloading (…)l-00006-of-00008.bin:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Downloading (…)l-00007-of-00008.bin:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Downloading (…)l-00008-of-00008.bin:   0%|          | 0.00/921M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

Some weights of FalconForQuestionAnswering were not initialized from the model checkpoint at ybelkada/falcon-7b-sharded-bf16 and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Downloading adapter_model.bin:   0%|          | 0.00/4.13M [00:00<?, ?B/s]

Below we will load the configuration file in order to create the LoRA model. According to QLoRA paper, it is important to consider all linear layers in the transformer block for maximum performance. Therefore we will add `dense`, `dense_h_to_4_h` and `dense_4h_to_h` layers in the target modules in addition to the mixed query key value layer.

Lora configuration:

In [6]:
from peft import LoraConfig

#Setup numerical value for LoRA
lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

#Use LoRA config for fine-tuning this model
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="QUESTION_ANSWERING",
    inference_mode=False,
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)

AdaLora configuration:

In [None]:
from peft import AdaLoraConfig

peft_config = AdaLoraConfig(
    init_r=12,
    target_r=8,
    beta1=0.85,
    beta2=0.85,
    tinit=200,
    tfinal=1000,
    deltaT=10,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    inference_mode=False,
        target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)

IA3 configuration:

In [5]:
from peft import IA3Config

peft_config = IA3Config(
    task_type="QUESTION_ANSWERING",
    feedforward_modules=[
        "dense_h_to_4h",
        "dense_4h_to_h"
    ],
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h"
    ],
)

In [None]:
print(peft_config)

IA3Config(peft_type=<PeftType.IA3: 'IA3'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='QUESTION_ANSWERING', inference_mode=False, target_modules=['query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h'], feedforward_modules=['dense_h_to_4h', 'dense_4h_to_h'], fan_in_fan_out=False, modules_to_save=None, init_ia3_weights=True)


## Loading the trainer

Here we will use the [`SFTTrainer` from TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer) that gives a wrapper around transformers `Trainer` to easily fine-tune models on instruction based datasets using PEFT adapters. Let's first load the training arguments below.

In [7]:
from transformers import TrainingArguments

#Arguments needed for training process
output_dir = "falcon-7b-sharded-2"
gradient_accumulation_steps = 1
#Paged adamw 32 bits optimization algorithm
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 10
eval_steps = 200
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 300
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=True,
    eval_accumulation_steps=50,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    eval_steps=eval_steps,
    do_eval=True,
    do_predict=True,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    evaluation_strategy='epoch',
    num_train_epochs=1
)

(Archived) Data processing for PubMedQA:

In [None]:
#Summary text
def context_summarize(row):
    full_context = ''.join(row['context']['contexts'])
    inputs = summary_tokenizer(full_context, max_length=1024, truncation=True, return_tensors="pt")
    summary_tokens = summary_model.generate(inputs["input_ids"], num_beams=2, min_length=100, max_length=300)
    row['context'] = summary_tokenizer.batch_decode(summary_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    return row

#Format the text
def get_text(row):
  row['text'] = "### Human: " + row['context'] + " " + row['question'] + "### Assistant: " + row['long_answer']
  return row

#Summary answer
def get_answer(row):
  answer = row['long_answer']
  inputs = summary_tokenizer(answer, max_length=1024, truncation=True, return_tensors="pt")
  summary_tokens = summary_model.generate(inputs["input_ids"], num_beams=2, min_length=10, max_length=30)
  row['answers'] = summary_tokenizer.batch_decode(summary_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
  return row

# #Sample the output
# sample_row = dataset_reformat[0]
# print(' '.join(sample_row['context']['contexts']))
# sample_row = context_summarize(sample_row)
# print(sample_row['context'])

dataset = dataset.map(get_answer)




Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

ValueError: ignored

In [None]:
eval_dataset.add_column(name="answers", column=eval_dataset['long_answer'])
eval_dataset = eval_dataset.map(get_answer)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [None]:


dataset.add_column(name="", column=dataset['long_answer'])
dataset = dataset.map(get_text,  remove_columns=dataset_reformat["train"].column_names)



In [46]:
def get_input_ids(row):
  question = dataset[0]["question"] + " "
  context = dataset[0]["context"]
  inputs = tokenizer(
      question,
      context,
      truncation="only_second",
      return_offsets_mapping=True,
      padding="max_length",
  )
  row['input_ids'] = inputs['input_ids']
  row['attention_mask'] = inputs['attention_mask']
  row['answer'] = row['answer_text']
  offset_mapping = inputs.pop("offset_mapping")

  start_char = row["answer_start"]
  end_char = row["answer_start"] + len(row['answer'])

  row["start_positions"] = start_char
  row["end_positions"] = end_char
  return row

dataset.add_column(name="input_ids", column=dataset['question'])
dataset.add_column(name="attention_mask", column=dataset['question'])
dataset.add_column(name="answer", column=dataset['answer_text'])
dataset.add_column(name="end_position", column=dataset['answer_text'])
dataset.add_column(name="start_position", column=dataset['answer_text'])
dataset = dataset.map(get_input_ids)

eval_dataset.add_column(name="input_ids", column=eval_dataset['question'])
eval_dataset.add_column(name="attention_mask", column=eval_dataset['question'])
eval_dataset.add_column(name="answer", column=eval_dataset['answer_text'])
eval_dataset.add_column(name="end_position", column=eval_dataset['answer_text'])
eval_dataset.add_column(name="start_position", column=eval_dataset['answer_text'])
eval_dataset = eval_dataset.map(get_input_ids)


Map:   0%|          | 0/884 [00:00<?, ? examples/s]

Map:   0%|          | 0/104 [00:00<?, ? examples/s]

In [50]:
from huggingface_hub import notebook_login

token = "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"
notebook_login()

dataset.push_to_hub("cpgqa_processed")
eval_dataset.push_to_hub("cpgqa_processed_eval")



VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

Pushing dataset shards to the dataset hub:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]

Pushing dataset shards to the dataset hub:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]

In [36]:
print(dataset[0]['answer'])

assist Primary Care providers in determining if an opioid taper is necessary for a specific patient, in performing the taper, and in providing follow-up and support during the taper


In [37]:
for row in dataset:
  if row['input_ids'] == None:
    print(row)
  if row['attention_mask'] == None:
    print(row)
  if row['answer'] == None:
    print(row)


In [None]:
#Format the input
def combine_context(row):
  row['context'] = ''.join(row['context']['contexts'])
  return row

def get_text(row):
  row['text'] = "### Context: " + row['context'] +  "### Question: " + row['question'] + "### Answer: " + row['long_answer']
  return row

dataset_train = dataset_reformat['train']
dataset_train.add_column(name="text", column=dataset_train['long_answer'])
dataset_train = dataset_train.map(combine_context)
dataset_train = dataset_train.map(get_text,  remove_columns=dataset_reformat["train"].column_names)

NameError: ignored

Define evaluation metrics and re-size evaluation dataset
> Indented block



In [8]:
import evaluate

#Metric using f1
def compute_metrics(eval_preds):
    metric_f1 = evaluate.load("f1")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric_f1.compute(predictions=predictions[0], references=labels[0], average="weighted")


In [9]:
#Break down the test set to get 200 instance for evaluation
#eval_dataset_breakdown["test"] is the desired evaluation set
eval_dataset_breakdown = eval_dataset['train'].train_test_split(train_size=0.9, test_size=0.1)
print(eval_dataset_breakdown)

DatasetDict({
    train: Dataset({
        features: ['title', 'id', 'question', 'answer_text', 'answer_start', 'context', 'input_ids', 'attention_mask', 'answer', 'start_positions', 'end_positions'],
        num_rows: 795
    })
    test: Dataset({
        features: ['title', 'id', 'question', 'answer_text', 'answer_start', 'context', 'input_ids', 'attention_mask', 'answer', 'start_positions', 'end_positions'],
        num_rows: 89
    })
})


In [9]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()

In [10]:
from transformers import Trainer
from peft import PeftModelForQuestionAnswering

max_seq_length = 2048
peft_model = PeftModelForQuestionAnswering(model=model, peft_config=peft_config)

trainer = Trainer(
    model=peft_model,
    train_dataset=dataset["train"],
    eval_dataset=eval_dataset_breakdown["test"],
    tokenizer=tokenizer,
    args=training_arguments,
    compute_metrics=compute_metrics,
)

In [11]:
#Datatype preprocessing for inference
for param in model.parameters():
    # Check if parameter dtype is Float (float32)
    if param.dtype == torch.float32:
        param.data = param.data.to(torch.float16)

print(model)

FalconForQuestionAnswering(
  (transformer): FalconModel(
    (word_embeddings): Embedding(65024, 4544)
    (h): ModuleList(
      (0-31): 32 x FalconDecoderLayer(
        (self_attention): FalconAttention(
          (maybe_rotary): FalconRotaryEmbedding()
          (query_key_value): Linear4bit(
            in_features=4544, out_features=4672, bias=False
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.1, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4544, out_features=64, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=64, out_features=4672, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
          )
          (dense): Linear4bit(
            in_features=4544, out_features=4544, bias=False
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.1, 

In [11]:
for param in model.parameters():
  if (param.dtype != torch.float32):
    param.data = param.data.to(torch.float32)




OutOfMemoryError: ignored

In [26]:
from transformers import QuestionAnsweringPipeline

pipeline = QuestionAnsweringPipeline(model=model, tokenizer=tokenizer)
#Sample instance from dataset"
#text= 'Here is the context: "The aim of the study was to investigate the association between fat distribution and asthma severity in children. Anthropometric measures including height, weight, wider neck circumference, waist circumference, and hip circumference were obtained. The prevalence of children with wider neck circumference higher than 90th percentile was also more frequent in children with severe asthma (15 [41.7%) vs. 21 [23.1%)) A total of 127 children (82 male, 64.6%) with a median age of 8.3 (6.4-11.3) years were included.". Answer this question based on the given context: Is wider neck circumference related to severe asthma in children?'
context = "The Opioid Taper Decision Tool is designed to assist Primary Care providers in determining if an opioid taper is necessary for a specific patient, in performing the taper, and in providing follow-up and support during the taper."
question = "What is the purpose of Opioid Taper Decision Tool?"

#Generated text
print(pipeline(question=question, context=context, max_answer_len=300, max_question_len=300))

{'score': 0.009610171429812908, 'start': 65, 'end': 228, 'answer': ' providers in determining if an opioid taper is necessary for a specific patient, in performing the taper, and in providing follow-up and support during the taper.'}


In [16]:
print(trainer.args)

TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=50,
eval_delay=0,
eval_steps=200,
evaluation_strategy=epoch,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=N

We will also pre-process the model by upcasting the layer norms in float 32 for more stable training

In [None]:
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module = module.to(torch.float32)

## Train the model

Now let's train the model! Simply call `trainer.train()`

In [None]:
import wandb
wandb.login(key="2cae7567987969ff534ca431429f53beb6f16fdf")
wandb.init()

In [None]:
import numpy as np
trainer.train()

During training, the model should converge nicely as follows:

![image](https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/loss-falcon-7b.png)

The `SFTTrainer` also takes care of properly saving only the adapters during training instead of saving the entire model.

Access token: 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

In [None]:
#Login to huggingface
from huggingface_hub import notebook_login

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
#Push model to hub
trainer.push_to_hub('QLoRA applied #2')

adapter_model.bin:   0%|          | 0.00/522M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/4.09k [00:00<?, ?B/s]

'https://huggingface.co/pbaoo2705/falcon-7b-sharded-qlora/tree/main/'

In [2]:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:50"

In [13]:
import numpy as np
trainer.evaluate()

Downloading builder script:   0%|          | 0.00/6.77k [00:00<?, ?B/s]

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


{'eval_loss': nan,
 'eval_f1': 0.00024968789013732833,
 'eval_runtime': 77.6589,
 'eval_samples_per_second': 1.146,
 'eval_steps_per_second': 1.146}

Clear RAM for re-use session

In [None]:
import gc
gc.collect()

2443

In [None]:
from evaluate import evaluator
from datasets import load_dataset
from transformers import AutoModelForCausalLM

f1_calculator = evaluator('question-answering')

def reformat_dataset(row):
  answers = { 'text': row['text'], 'answer_start': [1]}
  row['id'] = row['pubid']
  row['answers'] = answers
  return row

#Validation set load
dataset_name = "pbaoo2705/processed_dataset_v2"
test_dataset = load_dataset(dataset_name, split='test')
print(test_dataset)
test_dataset = test_dataset.map(reformat_dataset)
test_model = AutoModelForCausalLM.from_pretrained(
    'pbaoo2705/falcon-7b-sharded',
    # quantization_config=bnb_config,
    trust_remote_code=True
)
results = f1_calculator.compute(
    model_or_pipeline=test_model,
    tokenizer=tokenizer,
    data=test_dataset,
)

Dataset({
    features: ['pubid', 'question', 'context', 'long_answer', 'final_decision', 'text'],
    num_rows: 1000
})


In [None]:
print(results)

In [None]:
!nvidia-smi

Thu Oct  5 04:02:00 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P0    45W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces