# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

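* PEFT technique: LoRA (`LoraConfig` with default settings and `task_type="SEQ_CLS"`)
* Model: GPT-2 (`gpt2`) with a two-label sequence-classification head (`AutoModelForSequenceClassification`)
* Evaluation approach: accuracy on the first 500 validation examples, computed with `Trainer.evaluate()`
* Fine-tuning dataset: BoolQ (`google/boolq`), yes/no questions paired with supporting passages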

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

### Loading the dataset and tokenizer

In [1]:
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding, Trainer, TrainingArguments
import torch
import torch.nn.functional as F
import pandas as pd
import numpy as np

In [2]:
dataset = load_dataset("google/boolq")
dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'passage'],
        num_rows: 9427
    })
    validation: Dataset({
        features: ['question', 'answer', 'passage'],
        num_rows: 3270
    })
})

In [3]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

In [72]:
def process_rows(batch, tokenizer):
    tmp_list = []
    label_list = []
    for i in range(len(batch['question'])):
        concatenated =  batch['passage'][i] + '<|endoftext|>' +\
                        batch['question'][i] + '<|endoftext|>' +\
                        'Yes or No?<|endoftext|>'
        tmp_list.append(concatenated)
        
        # Convert the boolean answer to an integer label: True -> 1, False -> 0
        answer = batch['answer'][i]
        label = int(answer)
        label_list.append(label)
        
    # Tokenize the concatenated text
    tokenized = tokenizer(tmp_list, truncation=True, padding=True, return_tensors="pt")
    tokenized["labels"] = torch.tensor(label_list)
    return tokenized

dataset_train = dataset['train'].map(
    lambda batch: process_rows(batch, tokenizer), batched=True)
dataset_validation = dataset['validation'].map(
    lambda batch: process_rows(batch, tokenizer), batched=True)



In [6]:
print(dataset_train)

Dataset({
    features: ['question', 'answer', 'passage', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 9427
})
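
To make the input format concrete, the following minimal sketch rebuilds the string that `process_rows` produces for a single example, without tokenizing it. The helper names `build_prompt` and `to_label` and the sample passage and question are hypothetical, for illustration only; the separator and the trailing "Yes or No?" cue are taken from the cell above.

```python
# Separator used in process_rows; it is also GPT-2's end-of-text token.
EOS = "<|endoftext|>"

def build_prompt(passage: str, question: str) -> str:
    """Hypothetical helper mirroring the concatenation in process_rows."""
    return passage + EOS + question + EOS + "Yes or No?" + EOS

def to_label(answer: bool) -> int:
    """Map a BoolQ answer to a class id: True -> 1, False -> 0."""
    return int(answer)

# Made-up example, not taken from the dataset.
prompt = build_prompt("The sky appears blue in daylight.", "is the sky blue")
print(prompt)
print(to_label(True), to_label(False))
```

Each formatted example therefore contains the separator three times, and the label is the integer form of the dataset's boolean `answer` field.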


### Loading the foundation model

In [7]:
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification

In [8]:
#model = AutoModelForCausalLM.from_pretrained("gpt2")
model = AutoModelForSequenceClassification.from_pretrained('gpt2', 
        num_labels=2,
        # id2label and label2id must be inverses of each other; label 1 corresponds to answer == True
        id2label={0: "wrong", 1: "right"},
        label2id={"wrong": 0, "right": 1}
        )

model.config.pad_token_id = model.config.eos_token_id


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
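
Because the padding token is set to the end-of-text token, and the prompt format built in `process_rows` also uses `<|endoftext|>` as a separator, separator and padding tokens share one token id (50256 in GPT-2's vocabulary). A sequence-classification head has to read the hidden state of the last real token, and how it locates that position is an implementation detail that may differ between `transformers` versions. The sketch below is not the library's code. It is a plain-Python illustration, with made-up token ids, of two plausible strategies landing on different positions for such an input, which is worth checking against the installed version.

```python
PAD_ID = 50256  # GPT-2's <|endoftext|> id, reused here as the padding id

# Made-up ids: passage, separator, question, separator, cue, separator, padding
input_ids = [11, 12, 13, PAD_ID, 21, 22, PAD_ID, 31, 32, PAD_ID, PAD_ID, PAD_ID]

def position_before_first_pad(ids, pad_id):
    """Strategy A (assumed): the token just before the first occurrence of pad_id."""
    return ids.index(pad_id) - 1 if pad_id in ids else len(ids) - 1

def last_non_pad_position(ids, pad_id):
    """Strategy B (assumed): the rightmost token whose id differs from pad_id."""
    return max(i for i, tok in enumerate(ids) if tok != pad_id)

a = position_before_first_pad(input_ids, PAD_ID)
b = last_non_pad_position(input_ids, PAD_ID)
print(a, b)
```

If the installed version behaves like strategy A, the pooled position would be the end of the passage, before the question has been read. A separator string that is not the padding token, or a dedicated padding token, would remove the ambiguity.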


In [9]:
model

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)

### Evaluating the original foundation model

Select the first 500 validation examples and measure the model's accuracy before fine-tuning.

In [10]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

In [11]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="steps",
    eval_steps=10,
    per_device_eval_batch_size=5,
    seed=42,
    disable_tqdm=False,  # set to True to disable the tqdm progress bar
    
)

validation_sample = dataset_validation.select(range(0, 500))

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    train_dataset=dataset_train,
    eval_dataset = validation_sample,
    #eval_dataset=tokenized_dataset["validation"],
)

In [12]:
trainer.evaluate()
#trainer.evaluate(eval_dataset=validation_sample)

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'eval_loss': 0.7548133134841919,
 'eval_accuracy': 0.628,
 'eval_runtime': 34.6784,
 'eval_samples_per_second': 14.418,
 'eval_steps_per_second': 2.884}
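
The accuracy reported by `compute_metrics` is the fraction of examples whose highest-scoring logit matches the label. A minimal sketch with made-up logits and labels (not results from this notebook) shows the computation in plain Python, together with a majority-class baseline, which is a common reference point for a binary task. The function names are hypothetical.

```python
from collections import Counter

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose argmax equals the label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Made-up values for illustration only.
logits = [[0.2, 1.1], [0.9, -0.3], [0.1, 0.4], [1.5, 0.2], [-0.2, 0.3]]
labels = [1, 0, 0, 1, 1]

acc = accuracy_from_logits(logits, labels)
base = majority_baseline(labels)
print(acc, base)
```

Computing the majority baseline on the same 500-example sample would show how much of the measured accuracy a constant prediction already achieves.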

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [13]:
from peft import LoraConfig, get_peft_model

In [14]:
config = LoraConfig(fan_in_fan_out = True, task_type="SEQ_CLS")
lora_model = get_peft_model(model, config)
lora_model.config.pad_token_id = model.config.eos_token_id

In [15]:
lora_model.print_trainable_parameters()
lora_model

trainable params: 297,984 || all params: 124,737,792 || trainable%: 0.23888830740245906


PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
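
The trainable-parameter count can be reproduced by hand from the shapes in the printout. The sketch below assumes what the printout suggests: rank-8 adapters on `c_attn` only, in each of the 12 blocks. The remainder after subtracting the adapter weights equals twice the size of the 768×2 `score` head. Reading that as two trainable copies of the head is an inference from the arithmetic, not something the notebook states.

```python
hidden, qkv_out, rank, n_layers = 768, 2304, 8, 12

lora_a = hidden * rank      # lora_A: Linear(768 -> 8), no bias
lora_b = rank * qkv_out     # lora_B: Linear(8 -> 2304), no bias
lora_total = n_layers * (lora_a + lora_b)

# Values reported by print_trainable_parameters() above.
reported_trainable = 297_984
reported_all = 124_737_792

score_head = hidden * 2     # score: Linear(768 -> 2), no bias
remainder = reported_trainable - lora_total
trainable_pct = 100 * reported_trainable / reported_all

print(lora_total)
print(remainder, remainder // score_head)
print(trainable_pct)
```

The adapter matrices account for 294,912 of the trainable parameters, and the percentage matches the reported `trainable%` of about 0.239.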

In [54]:
len(dataset_train)

9427

In [80]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = torch.from_numpy(predictions)
    labels = torch.from_numpy(labels)
    
    loss = F.cross_entropy(predictions, labels)
    accuracy = (torch.argmax(predictions, dim=1) == labels).float().mean()
    
    return {"eval_loss": loss.item(), "eval_accuracy": accuracy.item()}

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    learning_rate=1e-4,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    num_train_epochs=1,
    weight_decay=0.01,
    warmup_steps=50,
    load_best_model_at_end=True,
    disable_tqdm=False,
)

In [69]:
validation_sample = dataset_validation.select(range(0, 500))
trainer_sample = dataset_train.select(range(0, 9000))

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=trainer_sample,
    eval_dataset=validation_sample,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    
)
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6504 | 0.625678 | 0.668 |


Checkpoint destination directory ./results/checkpoint-1000 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=1000, training_loss=0.6281192016601562, metrics={'train_runtime': 866.9169, 'train_samples_per_second': 5.768, 'train_steps_per_second': 1.154, 'total_flos': 2198322585354240.0, 'train_loss': 0.6281192016601562, 'epoch': 1.0})
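
The logged step count can be checked against the training configuration. Assuming a single device and no gradient accumulation (the defaults, which the notebook does not override), one optimizer step consumes one batch of 5 examples. The sketch below compares the step count expected for the 9,000-example subset with the number of examples implied by the logged `global_step=1000`. The helper name is hypothetical.

```python
import math

def steps_per_epoch(n_examples: int, batch_size: int) -> int:
    """Optimizer steps per epoch on one device without gradient accumulation."""
    return math.ceil(n_examples / batch_size)

batch_size, epochs = 5, 1

expected_steps = steps_per_epoch(9000, batch_size) * epochs
logged_steps = 1000                      # from TrainOutput above
implied_examples = logged_steps * batch_size

print(expected_steps)
print(implied_examples)
```

Under these assumptions the 9,000-example subset would take 1,800 steps, while the logged run corresponds to about 5,000 examples. Together with the out-of-order execution counts, this suggests the displayed output may come from an earlier run on a smaller subset.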

In [82]:
trainer.evaluate()

{'eval_loss': 0.625678300857544,
 'eval_accuracy': 0.6679999828338623,
 'eval_runtime': 29.5822,
 'eval_samples_per_second': 16.902,
 'eval_steps_per_second': 3.38}

### Saving the model

In [126]:
tokenizer.save_pretrained("lora-tokenizer")

('lora-tokenizer/tokenizer_config.json',
 'lora-tokenizer/special_tokens_map.json',
 'lora-tokenizer/vocab.json',
 'lora-tokenizer/merges.txt',
 'lora-tokenizer/added_tokens.json',
 'lora-tokenizer/tokenizer.json')

In [70]:
lora_model.save_pretrained("gpt2-lora")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [121]:
from peft import AutoPeftModelForSequenceClassification
lora_model_load = AutoPeftModelForSequenceClassification.from_pretrained(
    "gpt2-lora", 
#     num_labels=2,
#     id2label={0: "right", 1: "wrong"},
#     label2id={"wrong": 0, "right": 1}
)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [122]:
tokenizer_load = AutoTokenizer.from_pretrained("lora-tokenizer")
# Set the padding token on the reloaded model itself instead of copying the in-memory model's config
lora_model_load.config.pad_token_id = tokenizer_load.eos_token_id

In [123]:
validation_sample = dataset_validation.select(range(0, 500))
#trainer_sample = dataset_validation.select(range(0, 3000))

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    learning_rate=1e-4,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    num_train_epochs=2,
    weight_decay=0.01,
    #warmup_steps=100,
    load_best_model_at_end=True,
    disable_tqdm=False,
)

trainer = Trainer(
    model=lora_model_load,
    args=training_args,
    #train_dataset=trainer_sample,
    eval_dataset=validation_sample,
    tokenizer=tokenizer_load,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer_load),
    compute_metrics=compute_metrics,
    
)

In [124]:
trainer.evaluate()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'eval_loss': 1.3073432445526123,
 'eval_accuracy': 0.3700000047683716,
 'eval_runtime': 30.5296,
 'eval_samples_per_second': 16.378,
 'eval_steps_per_second': 16.378}
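
To compare the three evaluations, all run on the same 500 validation examples, the reported accuracies can be collected in one place. The sketch below only restates numbers already printed in this notebook and computes each difference from the pre-fine-tuning baseline. The stage names are labels chosen for illustration.

```python
# Accuracies copied from the trainer.evaluate() outputs in this notebook.
results = {
    "base model (before fine-tuning)": 0.628,
    "LoRA model (in memory, after training)": 0.668,
    "LoRA model (reloaded from disk)": 0.370,
}

baseline = results["base model (before fine-tuning)"]
deltas = {name: round(acc - baseline, 3) for name, acc in results.items()}

for name, acc in results.items():
    print(f"{name:40s} accuracy={acc:.3f}  delta={deltas[name]:+.3f}")
```

The reloaded model scores below the baseline while the in-memory model scores above it. This suggests the reloaded weights differ from the trained ones (the loading cell warns that `score.weight` was newly initialized), so the save and load path is worth verifying before drawing conclusions about LoRA itself.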

In [125]:
items_for_manual_review = dataset_validation.select(
    [0, 1, 5, 10, 20, 30]
)

results = trainer.predict(items_for_manual_review)
df = pd.DataFrame(
    {
        "passage": [item["passage"] for item in items_for_manual_review],
        "question": [item["question"] for item in items_for_manual_review],
        "answer": [item["answer"] for item in items_for_manual_review],
        "predictions": results.predictions.argmax(axis=1),
        "labels": results.label_ids,
    }
)
# Show full cell contents instead of truncating long text
pd.set_option("display.max_colwidth", None)
df

Unnamed: 0,passage,question,answer,predictions,labels
0,"All biomass goes through at least some of these steps: it needs to be grown, collected, dried, fermented, distilled, and burned. All of these steps require resources and an infrastructure. The total amount of energy input into the process compared to the energy released by burning the resulting ethanol fuel is known as the energy balance (or ``energy returned on energy invested''). Figures compiled in a 2007 report by National Geographic Magazine point to modest results for corn ethanol produced in the US: one unit of fossil-fuel energy is required to create 1.3 energy units from the resulting ethanol. The energy balance for sugarcane ethanol produced in Brazil is more favorable, with one unit of fossil-fuel energy required to create 8 from the ethanol. Energy balance estimates are not easily produced, thus numerous such reports have been generated that are contradictory. For instance, a separate survey reports that production of ethanol from sugarcane, which requires a tropical climate to grow productively, returns from 8 to 9 units of energy for each unit expended, as compared to corn, which only returns about 1.34 units of fuel energy for each unit of energy expended. A 2006 University of California Berkeley study, after analyzing six separate studies, concluded that producing ethanol from corn uses much less petroleum than producing gasoline.",does ethanol take more energy make that produces,False,0,0
1,"Property tax or 'house tax' is a local tax on buildings, along with appurtenant land. It is and imposed on the Possessor (not the custodian of property as per 1978, 44th amendment of constitution). It resembles the US-type wealth tax and differs from the excise-type UK rate. The tax power is vested in the states and is delegated to local bodies, specifying the valuation method, rate band, and collection procedures. The tax base is the annual rental value (ARV) or area-based rating. Owner-occupied and other properties not producing rent are assessed on cost and then converted into ARV by applying a percentage of cost, usually four percent. Vacant land is generally exempt. Central government properties are exempt. Instead a 'service charge' is permissible under executive order. Properties of foreign missions also enjoy tax exemption without requiring reciprocity. The tax is usually accompanied by service taxes, e.g., water tax, drainage tax, conservancy (sanitation) tax, lighting tax, all using the same tax base. The rate structure is flat on rural (panchayat) properties, but in the urban (municipal) areas it is mildly progressive with about 80% of assessments falling in the first two brackets.",is house tax and property tax are same,True,0,1
2,"Barq's /ˈbɑːrks/ is an American soft drink. Its brand of root beer is notable for having caffeine. Barq's, created by Edward Barq and bottled since the turn of the 20th century, is owned by the Barq family but bottled by the Coca-Cola Company. It was known as Barq's Famous Olde Tyme Root Beer until 2012.",is barq's root beer a pepsi product,False,0,0
3,"In response to the National Minimum Drinking Age Act in 1984, which reduced by up to 10% the federal highway funding of any state which did not have a minimum purchasing age of 21, the New York Legislature raised the drinking age from 19 to 21, effective December 1, 1985. (The drinking age had been 18 for many years before the first raise on December 4th, 1982, to 19.) Persons under 21 are prohibited from purchasing alcohol or possessing alcohol with the intent to consume, unless the alcohol was given to that person by their parent or legal guardian. There is no law prohibiting where people under 21 may possess or consume alcohol that was given to them by their parents. Persons under 21 are prohibited from having a blood alcohol level of 0.02% or higher while driving.",can minors drink with parents in new york,True,0,1
4,"Street Addressing will have the same street address of the post office, plus a ``unit number'' that matches the P.O. Box number. As an example, in El Centro, California, the post office is located at 1598 Main Street. Therefore, for P.O. Box 9975 (fictitious), the Street Addressing would be: 1598 Main Street Unit 9975, El Centro, CA. Nationally, the first five digits of the zip code may or may not be the same as the P.O. Box address, and the last four digits (Zip + 4) are virtually always different. Except for a few of the largest post offices in the U.S., the 'Street Addressing' (not the P.O. Box address) nine digit Zip + 4 is the same for all boxes at a given location.",does p o box come before street address,False,0,0
5,"As part of Marvel's Marvel NOW! initiative a new Deadpool ongoing series was launched, written by Brian Posehn and Gerry Duggan and illustrated by Tony Moore. He is also a member of the Thunderbolts. In the 27th issue of his new series, as part of ``All-New Marvel NOW!'', Deadpool was married for the third time. Initially a secret, his bride was revealed in the webcomic Deadpool: The Gauntlet to be Shiklah, Queen of the Undead. Deadpool also discovers that he has a daughter by the name of Eleanor from a former flame of Deadpool named Carmelita.",does deadpool have a kid in the comics,True,0,1
