# Fine-Tuning with Falcon 7B, Bits and Bytes, and QLoRA

Today we'll explore fine-tuning the Falcon-7B model available on Kaggle Models using QLoRA, Bits and Bytes, and PEFT.

- QLoRA: [Quantized Low Rank Adapters](https://arxiv.org/pdf/2305.14314.pdf) - this is a method for fine-tuning LLMs that keeps the base model frozen in a quantized (4-bit) format and trains only a small number of low-rank "adapter" parameters, which limits the cost of training. Because the adapters are small and separate from the base weights, you can fine-tune on many data sets and swap the matching adapter into your model when necessary.
- [Bits and Bytes](https://github.com/TimDettmers/bitsandbytes): An excellent package by Tim Dettmers et al., which provides a lightweight wrapper around custom CUDA functions that make LLMs go faster - optimizers, matrix mults, and quantization. In this notebook we'll be using the library to load our model as efficiently as possible.
- [PEFT](https://github.com/huggingface/peft): An excellent Hugging Face library that enables a number of Parameter-Efficient Fine-Tuning (PEFT) methods, which again make it less expensive to fine-tune LLMs - especially on more lightweight hardware like that present in Kaggle notebooks.

The task is claim detection: given a claim, the model should say whether it is true, false, a mixture, or unknown.

## Package Installation

Note that we're loading very specific versions of these libraries. Dependencies in this space can be quite difficult to untangle, and simply taking the latest version of each library can lead to conflicting version requirements. It's a good idea to take note of which versions work for your particular use case, and `pip install` them directly.

In [None]:
# !pip install -qqq bitsandbytes==0.39.0

!pip install bitsandbytes==0.40.2

Collecting bitsandbytes==0.40.2
  Downloading bitsandbytes-0.40.2-py3-none-any.whl (92.5 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.40.2


In [None]:
!pip install -qqq torch==2.0.1

In [None]:
!pip install -qqq -U git+https://github.com/huggingface/transformers.git@e03a9cc

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for transformers (pyproject.toml) ... done


In [None]:
!pip install -qqq -U git+https://github.com/huggingface/peft.git@42a184f

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for peft (pyproject.toml) ... done


In [None]:
!pip install -qqq -U git+https://github.com/huggingface/accelerate.git@c9fbb71

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for accelerate (pyproject.toml) ... done


In [None]:
!pip install -qqq datasets==2.12.0

In [None]:
!pip install -qqq loralib==0.1.1
!pip install -qqq einops==0.6.1

# Import

In [None]:
import pandas as pd
import json
import os
from pprint import pprint
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset, Dataset
from huggingface_hub import notebook_login

from peft import LoraConfig, PeftConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

In [None]:
# !python -m bitsandbytes

# Loading and preparing our model

We're going to use the Falcon-7B-Instruct model (`tiiuae/falcon-7b-instruct`) for our test. We'll be using Bits and Bytes to load it in 4-bit format, which should reduce memory consumption considerably, at a cost of some accuracy.

Note the parameters in `BitsAndBytesConfig`: this is a fairly standard 4-bit quantization configuration. The weights are stored in 4-bit `nf4` (NormalFloat4) format, and double quantization also quantizes the quantization constants themselves, which saves a little more memory. For computation, the weights are dequantized to `bfloat16` on the fly; the stored weights stay in 4-bit and are frozen, so only the LoRA adapters are updated during training.
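
To get a rough feel for the savings, the back-of-the-envelope sketch below estimates weight memory at 16-bit and at 4-bit. The parameter count is an assumption (roughly 7 billion, not an exact figure for Falcon-7B), and the block sizes (64 weights per quantization block, 256 constants per second-level block) follow the QLoRA paper's description of double quantization rather than anything read from this notebook's configuration.

```python
# Rough weight-memory estimate for 16-bit vs. 4-bit storage.
# ASSUMPTION: ~7e9 parameters; the real Falcon-7B count differs slightly.
N_PARAMS = 7e9


def weight_gib(bits_per_param: float, n_params: float = N_PARAMS) -> float:
    """Memory needed to store the weights, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Quantization constants: one 32-bit absmax value per block of 64 weights.
BLOCK = 64
overhead_plain = 32 / BLOCK                        # bits per parameter

# Double quantization: the constants are stored in 8 bits, plus one 32-bit
# second-level constant per 256 first-level constants.
overhead_double = 8 / BLOCK + 32 / (BLOCK * 256)   # bits per parameter

print(f"bf16 weights:             {weight_gib(16):.2f} GiB")
print(f"4-bit, plain constants:   {weight_gib(4 + overhead_plain):.2f} GiB")
print(f"4-bit, double quantized:  {weight_gib(4 + overhead_double):.2f} GiB")
```

This only counts the stored weights. Activations, the LoRA adapters, optimizer state, and any layers left unquantized all add to the real footprint.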

In [None]:
!huggingface-cli login --token HF_Token

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
#model_name = "meta-llama/Llama-2-7b-chat-hf"
model_name = "tiiuae/falcon-7b-instruct"
MODEL_NAME = model_name

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

Below, we'll use a PEFT helper, `prepare_model_for_kbit_training`, to set up our quantized model for fine-tuning. It freezes the base model's weights, casts some of the remaining non-quantized parameters (such as the layer norms) to higher precision for stability, and makes sure gradients can flow back from the inputs, so the model is ready for adapter training. The exact steps vary between PEFT versions.

In [None]:
model = prepare_model_for_kbit_training(model)

Below, we define some helper functions - their purpose is to properly identify our update layers so we can... update them!

In [None]:
import re

def get_num_layers(model):
    # Collect every integer that appears in a parameter name; the largest one
    # is taken to be the index of the last decoder layer.
    numbers = set()
    for name, _ in model.named_parameters():
        for number in re.findall(r'\d+', name):
            numbers.add(int(number))
    return max(numbers)

def get_last_layer_linears(model):
    # Names of all linear modules whose name contains the last layer's index.
    names = []

    num_layers = get_num_layers(model)
    for name, module in model.named_modules():
        if str(num_layers) in name and "encoder" not in name:
            if isinstance(module, torch.nn.Linear):
                names.append(name)
    return names
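
To see what these helpers pick up, the sketch below runs the same regular-expression logic on a few hand-written parameter names. The names are illustrative, modeled on Falcon's module layout rather than dumped from the loaded model, and `max_number_in_names` is a hypothetical stand-in for `get_num_layers`.

```python
import re

# Hypothetical parameter names, modeled on Falcon's naming scheme.
sample_names = [
    "transformer.word_embeddings.weight",
    "transformer.h.0.self_attention.query_key_value.weight",
    "transformer.h.31.mlp.dense_h_to_4h.weight",
    "transformer.h.31.mlp.dense_4h_to_h.weight",
]


def max_number_in_names(names):
    """Largest integer appearing anywhere in the given names."""
    numbers = {int(n) for name in names for n in re.findall(r"\d+", name)}
    return max(numbers)


last = max_number_in_names(sample_names)
# Substring match, mirroring the test used in get_last_layer_linears.
selected = [name for name in sample_names if str(last) in name]

print(last)
print(selected)
```

Two things are worth noticing. Digits inside module names (the `4` in `dense_h_to_4h`) are collected too, which is harmless here because the layer index is larger. The substring test is also loose: in a model with more than a hundred layers, `"31"` would match layer 131 as well.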

## LoRA config

Some key elements from this configuration:
1. `r` is the rank of the low-rank update matrices, i.e. the width of the small update layer. In theory, this should be set wide enough to capture the complexity of the problem you're attempting to fine-tune for. Simpler problems may be able to get away with a smaller `r`. In our case, we'll go very small, largely for the sake of speed.
2. `target_modules` is set using our helper functions: every layer identified by `get_last_layer_linears` will be included in the PEFT update.
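
As a rough illustration of how small the update is, the sketch below counts the LoRA parameters added for `r=2`. It assumes the helper selects exactly the four linear layers of the last decoder block, with the shapes shown in the model printout later in this notebook; the real selection depends on what `get_last_layer_linears` returns for the loaded model.

```python
R = 2             # LoRA rank, as in the config below
LORA_ALPHA = 32   # LoRA scaling numerator, as in the config below

# ASSUMPTION: (in_features, out_features) of the targeted linear layers,
# copied from the model printout for one Falcon decoder block.
LAYER_SHAPES = {
    "query_key_value": (4544, 4672),
    "dense": (4544, 4544),
    "dense_h_to_4h": (4544, 18176),
    "dense_4h_to_h": (18176, 4544),
}


def lora_param_count(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) per layer."""
    return r * (in_features + out_features)


per_layer = {name: lora_param_count(i, o, R) for name, (i, o) in LAYER_SHAPES.items()}
total = sum(per_layer.values())
wrapped = sum(i * o for i, o in LAYER_SHAPES.values())

print(per_layer)
print(f"trainable LoRA parameters: {total:,}")
print(f"parameters in the wrapped layers: {wrapped:,}")
print(f"update scaling (lora_alpha / r): {LORA_ALPHA / R}")
```

With such a low rank, the ratio `lora_alpha / r` is large, so the learned update is scaled up considerably before it is added to the frozen layer's output.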

In [None]:
config = LoraConfig(
    r=2,
    lora_alpha=32,
    target_modules=get_last_layer_linears(model),
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
config.inference_mode = False  # keep the adapter weights trainable
model = get_peft_model(model, config)

## Load some data

Here, we're loading the `health_fact` dataset of fact-checked health claims, which has 9,832 training examples. In the interests of time we won't use all of them, just the first 1000, and we'll fine-tune our model on the claims and their labels. Note that what we're training the model to do is use its existing knowledge (plus whatever little it learns from our claim-label pairs) to answer in the *format* we want, specifically a short answer.

In [None]:
#df = pd.read_csv("/kaggle/input/200000-jeopardy-questions/JEOPARDY_CSV.csv", nrows=1000)
#df.columns = [str(q).strip() for q in df.columns]

#data = Dataset.from_pandas(df)

In [None]:
from datasets import load_dataset

data = load_dataset("health_fact")

data

DatasetDict({
    train: Dataset({
        features: ['claim_id', 'claim', 'date_published', 'explanation', 'fact_checkers', 'main_text', 'sources', 'label', 'subjects'],
        num_rows: 9832
    })
    test: Dataset({
        features: ['claim_id', 'claim', 'date_published', 'explanation', 'fact_checkers', 'main_text', 'sources', 'label', 'subjects'],
        num_rows: 1235
    })
    validation: Dataset({
        features: ['claim_id', 'claim', 'date_published', 'explanation', 'fact_checkers', 'main_text', 'sources', 'label', 'subjects'],
        num_rows: 1225
    })
})

In [None]:
df = data['train'].to_pandas().head(1000)
df = df.rename(columns={'label':'Answer'})
data = Dataset.from_pandas(df)

In [None]:
df

Unnamed: 0,claim_id,claim,date_published,explanation,fact_checkers,main_text,sources,Answer,subjects
0,15661,"""The money the Clinton Foundation took from fr...","April 26, 2015","""Gingrich said the Clinton Foundation """"took m...",Katie Sanders,"""Hillary Clinton is in the political crosshair...",https://www.wsj.com/articles/clinton-foundatio...,0,"Foreign Policy, PunditFact, Newt Gingrich,"
1,9893,Annual Mammograms May Have More False-Positives,"October 18, 2011",This article reports on the results of a study...,,While the financial costs of screening mammogr...,,1,"Screening,WebMD,women's health"
2,11358,SBRT Offers Prostate Cancer Patients High Canc...,"September 28, 2016",This news release describes five-year outcomes...,"Mary Chris Jaklevic,Steven J. Atlas, MD, MPH,K...",The news release quotes lead researcher Robert...,https://www.healthnewsreview.org/wp-content/up...,1,"Association/Society news release,Cancer"
3,10166,"Study: Vaccine for Breast, Ovarian Cancer Has ...","November 8, 2011","While the story does many things well, the ove...",,"The story does discuss costs, but the framing ...",http://clinicaltrials.gov/ct2/results?term=can...,2,"Cancer,WebMD,women's health"
4,11276,Some appendicitis cases may not require ’emerg...,"September 20, 2010",We really don’t understand why only a handful ...,,"""Although the story didn’t cite the cost of ap...",,2,
...,...,...,...,...,...,...,...,...,...
995,10774,Focus is on deep brain stimulation to treat di...,"April 30, 2007","""The story reports on deep brain stimulation (...",,"""The story does not mention the cost of this t...",../view_content/detail.php?type=Theme&id=26,1,
996,3902,Ex-officer accused of shooting neighbor claims...,,A former California Highway Patrol officer cla...,,The documents were filed in support of Trever ...,https://www.vcstar.com/story/news/local/commun...,2,"Shootings, Mental health, Health, General News..."
997,14104,"In 2006, Donald Trump was hoping for a real es...","May 26, 2016",Two charities will pay $6 million to resolve c...,C. Eugene Emery Jr.,The settlements with the patient assistance ch...,https://www.theglobeandmail.com/report-on-busi...,2,"National, Bankruptcy, Candidate Biography, Deb..."
998,10287,New valve procedure doesn’t open heart,"February 26, 2007","""This was a story about evolving options for t...",,There was no cost estimate provided for this m...,,0,


In [None]:
prompt = "Given the following claim: " + df["claim"].values[0]+""" pick one of the following option
(a) true
(b) false
(c) mixture
(d) unknown?
"""
prompt

'Given the following claim: "The money the Clinton Foundation took from from foreign governments while Hillary Clinton was secretary of state ""is clearly illegal. … The Constitution says you can’t take this stuff." pick one of the following option\n(a) true\n(b) false\n(c) mixture\n(d) unknown?\n'

## Let's generate!

Below we're setting up our generative model:
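- Top P: nucleus sampling, which samples from the smallest set of most probable tokens whose cumulative probability reaches `top_p`, as opposed to greedily taking the single most probable token.
- Temperature: a scaling applied to the logits before the softmax; values below 1 sharpen the distribution toward the most probable tokens.
- We limit the return sequences to 1 (only one answer is allowed!) and deliberately force the answer to be short.

Note that `top_p` and `temperature` only take effect when sampling is enabled (`do_sample=True`). The settings below leave `do_sample` untouched, so generation is greedy unless the model's default generation config enables sampling.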
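
The two sampling knobs can be illustrated on a toy vocabulary. The sketch below is a plain-Python illustration of temperature scaling and top-p filtering, not the implementation used inside `transformers`; the logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]   # made-up scores for a 4-token vocabulary
for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], top_p_filter(probs, top_p=0.9))
```

At a temperature of 0.2 nearly all of the probability mass lands on the top token, so the top-p filter keeps a single candidate and sampling behaves almost like greedy decoding.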

In [None]:
generation_config = model.generation_config
generation_config.max_new_tokens = 10
generation_config.temperature = 0.2
generation_config.top_p = 0.9
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

Now, we'll generate an answer for our first claim, just to see how the model does before any fine-tuning!

It's fascinatingly wrong. :-)

In [None]:
%%time
device = "cuda"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(
        input_ids = encoding.input_ids,
        attention_mask = encoding.attention_mask,
        generation_config = generation_config
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Given the following claim: "The money the Clinton Foundation took from from foreign governments while Hillary Clinton was secretary of state ""is clearly illegal. … The Constitution says you can’t take this stuff." pick one of the following option
(a) true
(b) false
(c) mixture
(d) unknown?
(e) unknown? pick one of the following
CPU times: user 1.78 s, sys: 2.49 ms, total: 1.78 s
Wall time: 1.82 s


## Format our fine-tuning data

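We'll reuse the prompt setup from above, with two changes: an extra `(e) not_applicable` option, and the expected answer appended after `Answer:` so the model learns to complete it.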

In [None]:
data

Dataset({
    features: ['claim_id', 'claim', 'date_published', 'explanation', 'fact_checkers', 'main_text', 'sources', 'Answer', 'subjects'],
    num_rows: 1000
})

In [None]:
# Map the integer labels to answer strings; -1 presumably marks rows without a label.
class_map = {0: 'false', 1: 'mixture', 2: 'true', 3: 'unknown', -1: 'not_applicable'}

In [None]:
def generate_prompt(data_point):
    return f"""Given the following claim:
            {data_point["claim"]}.
            pick one of the following option
            (a) true
            (b) false
            (c) mixture
            (d) unknown
            (e) not_applicable? Answer: {class_map[data_point["Answer"]]}""".strip()


def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
    return tokenized_full_prompt

data = data.shuffle().map(generate_and_tokenize_prompt)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [None]:
data

Dataset({
    features: ['claim_id', 'claim', 'date_published', 'explanation', 'fact_checkers', 'main_text', 'sources', 'Answer', 'subjects', 'input_ids', 'attention_mask'],
    num_rows: 1000
})

In [None]:
data['claim_id'][0]

'1068'

## Train!

Now, we'll use our data to update our model. Using the Huggingface `transformers` library, let's set up our training loop and then run it. Note that we are ONLY making one pass on all this data.

In [None]:
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=1e-4,
    fp16=True,
    output_dir="finetune_jeopardy",
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    report_to="none"
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False

In [None]:
trainer.train()

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss


TrainOutput(global_step=250, training_loss=1.420882568359375, metrics={'train_runtime': 1379.9444, 'train_samples_per_second': 0.725, 'train_steps_per_second': 0.181, 'total_flos': 1268251486941696.0, 'train_loss': 1.420882568359375, 'epoch': 1.0})
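
The `global_step=250` in the output follows from the batch settings above. The sketch below reproduces that arithmetic under the assumption that the `Trainer` saw a single device; it mirrors the usual step calculation in simplified form and is not the `Trainer`'s own code.

```python
import math

num_examples = 1000               # rows kept from the training split
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_train_epochs = 1
num_devices = 1                   # ASSUMPTION: one device from the Trainer's point of view

# One optimizer update happens every `gradient_accumulation_steps` batches.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
batches_per_epoch = math.ceil(num_examples / (per_device_train_batch_size * num_devices))
steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_steps = steps_per_epoch * num_train_epochs

print(effective_batch, total_steps)
```

So each optimizer update averages gradients over four examples, and one epoch over 1000 examples gives 250 updates.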

In [None]:
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): FalconForCausalLM(
      (transformer): FalconModel(
        (word_embeddings): Embedding(65024, 4544)
        (h): ModuleList(
          (0-30): 31 x FalconDecoderLayer(
            (self_attention): FalconAttention(
              (maybe_rotary): FalconRotaryEmbedding()
              (query_key_value): Linear4bit(in_features=4544, out_features=4672, bias=False)
              (dense): Linear4bit(in_features=4544, out_features=4544, bias=False)
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): FalconMLP(
              (dense_h_to_4h): Linear4bit(in_features=4544, out_features=18176, bias=False)
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear4bit(in_features=18176, out_features=4544, bias=False)
            )
            (input_layernorm): LayerNorm((4544,), eps=1e-05, elementwise_affine=True)
          )
          (31): FalconDecoderLayer(
   

In [None]:
import locale
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

In [None]:
!huggingface-cli login --token HF_Token

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
model.push_to_hub(
     "Your-HF-Repo/falcon-7b-qlora-chat-claim-data", use_auth_token=True
)


## Loading and using the model later

With the adapter pushed to the Hub, we'll now use the fine-tuned model to generate answers for claims from the test split, this time allowing longer outputs.
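
For a fresh session, one way to get the fine-tuned model back is to rebuild the 4-bit base model and attach the adapter from the Hub. The sketch below is an assumption about how that could look with the pinned `peft` and `transformers` versions; `load_finetuned_model` is a hypothetical helper, and the repo id is the placeholder from the `push_to_hub` cell.

```python
def load_finetuned_model(adapter_repo: str):
    """Rebuild the 4-bit base model and attach the LoRA adapter stored in adapter_repo."""
    # Imports live inside the function so that defining it stays cheap;
    # calling it needs a GPU, the libraries installed above, and network access.
    import torch
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # The adapter config records which base model it was trained on.
    peft_config = PeftConfig.from_pretrained(adapter_repo)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    base = AutoModelForCausalLM.from_pretrained(
        peft_config.base_model_name_or_path,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)
    tokenizer.pad_token = tokenizer.eos_token

    # Attach the LoRA weights on top of the frozen, quantized base model.
    model = PeftModel.from_pretrained(base, adapter_repo)
    model.eval()
    return model, tokenizer


# Placeholder repo id; replace it with your own before calling.
# model, tokenizer = load_finetuned_model("Your-HF-Repo/falcon-7b-qlora-chat-claim-data")
```

Only the adapter weights are downloaded from the adapter repo; the base model still comes from its original location, and a private adapter repo requires being logged in first.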

In [None]:
generation_config = model.generation_config
generation_config.max_new_tokens = 200
generation_config.temperature = 0.5
generation_config.top_p = 0.9
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [None]:
import numpy as np

In [None]:
# `data` now holds the tokenized training subset, so reload the test split.
testdf = load_dataset("health_fact")["test"].to_pandas().head(1000)
testdf = testdf.rename(columns={'label':'Answer'})
testdata = Dataset.from_pandas(testdf)

In [None]:
testdf

Unnamed: 0,claim_id,claim,date_published,explanation,fact_checkers,main_text,sources,Answer,subjects
0,33456,A mother revealed to her child in a letter aft...,"November 6, 2011",The one-eyed mother story expounds upon two mo...,David Mikkelson,"In April 2005, we spotted a tearjerker on the ...",,0,Glurge Gallery
1,2542,Study says too many Americans still drink too ...,"February 25, 2013","On any given day in the United States, 18 perc...",,That means the great majority of Americans sta...,http://bit.ly/X1NVtW,2,Health News
2,26678,Viral image Says 80% of novel coronavirus case...,"March 13, 2020",The website Information is Beautiful published...,Paul Specht,"Amid the spread of the novel coronavirus, many...",https://www.facebook.com/informationisbeautifu...,2,"Facebook Fact-checks, Coronavirus, Viral image,"
3,40705,An email says that 9-year old Craig Shergold o...,"March 16, 2015",Send greeting or business cards to cancer vict...,Rich Buhler & Staff,Craig Shergold is real and in 1989...,https://www.reddit.com/submit?url=https%3A%2F%...,0,"Inspirational, Pleas"
4,35718,"Employees at a Five Guys restaurant in Daphne,...","July 15, 2020","What's undetermined: As of this writing, Five ...",Dan MacGuill,"In July 2020, amid a new wave of nationwide pr...",,3,Law Enforcement
...,...,...,...,...,...,...,...,...,...
995,24605,The health care reform plan would set limits s...,"August 6, 2009",Club for Growth's health care ad campaign is m...,Catharine Richert,Like other groups criticizing health care refo...,https://online.wsj.com/article/SB1246929734353...,0,"National, Health Care, Club for Growth,"
996,32770,A man killed himself over the Treasury's decis...,"April 21, 2016",Empire Herald carries no disclaimer warning re...,Kim LaCapria,"On 21 April 2016, the web site Empire Herald p...","http://www.snopes.com/bodies-blm-carved-skin/,...",0,"Junk News, empire herald, harriet tubman, suicide"
997,10499,Extracting the facts about pomegranate pills,"June 13, 2011",This was not one of the LA Times’ Healthy Skep...,,The story includes the cost of pomegranate pil...,,2,"Los Angeles Times,Supplements"
998,11327,Estrogen Patch in Newly Postmenopausal Women M...,"July 21, 2016",A small pilot study has found that giving rece...,"Sharon Dunwoody, PhD,Karen Carlson, MD,Kathlyn...",Costs for hormone therapy are readily availabl...,https://www.healthnewsreview.org/wp-content/up...,1,"Alzheimer's disease,Hospital news release"


In [None]:
def generate_test_prompt(data_point):
    return f"""Given the following claim:
            {data_point["claim"]}.
            pick one of the following option
            (a) true
            (b) false
            (c) mixture
            (d) unknown
            (e) not_applicable?""".strip()


# Preview the prompt for the first test row
generate_test_prompt(testdf.iloc[0])

'Given the following claim:\n            A mother revealed to her child in a letter after her death that she had just one eye because she had donated the other to him..\n            pick one of the following option\n            (a) true\n            (b) false\n            (c) mixture\n            (d) unknown\n            (e) not_applicable?'

In [None]:
%%time

prompt = generate_test_prompt(testdf.iloc[0])
device = "cuda"
encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids = encoding.input_ids,
        attention_mask = encoding.attention_mask,
        generation_config = generation_config
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Given the following claim:
            A mother revealed to her child in a letter after her death that she had just one eye because she had donated the other to him..
            pick one of the following option
            (a) true
            (b) false
            (c) mixture
            (d) unknown
            (e) not_applicable? Answer: false. 
            pick one of the following
CPU times: user 1.84 s, sys: 0 ns, total: 1.84 s
Wall time: 1.87 s
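
The generation above gives the label and then keeps going, so scoring predictions needs a small post-processing step. The sketch below is one possible way to pull the label out of the decoded text; `extract_answer` and `LABELS` are hypothetical names, and the sample string is a shortened version of the output shown above.

```python
import re

# Answer strings used in the fine-tuning prompts.
LABELS = ("true", "false", "mixture", "unknown", "not_applicable")


def extract_answer(generated_text):
    """Return the first known label after 'Answer:', or None if there is none."""
    match = re.search(r"Answer:\s*([a-z_]+)", generated_text)
    if match and match.group(1) in LABELS:
        return match.group(1)
    return None


sample = "(e) not_applicable? Answer: false. \n            pick one of the following"
print(extract_answer(sample))
```

Lowering `max_new_tokens` for evaluation would also cut the trailing text and speed up generation.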


In [None]:
outputs

tensor([[  780,   585,  5456,   362,  2396,  4410,    23, 33367, 30869,    23,
           273, 39245, 30869,   455,  5795, 14008, 23688,   345, 16551,   345,
          1777,    37,   513,   513,   513,   513,   513,   513,   513,   513,
           513,   513,   513,   513,   513,   513,   513,   513,   513,   513,
           513,   513,   513,   513,   513,   513,   513,   513,   513,   513,
           513,   513]], device='cuda:0')

### Next, we will go to the Hugging Face Hub to deploy this model using Gradio/Streamlit