# Lightweight Fine-Tuning Project

* PEFT technique: LoRA
* Model: GPT-2
* Evaluation approach: Evaluate the pretrained model (frozen GPT-2 with a trained classification head) and the LoRA fine-tuned model on the test split and compare their accuracy
* Fine-tuning dataset: dair-ai/emotion from HuggingFace

## Loading and Evaluating a Foundation Model


### Load the dataset dair-ai/emotion and explore the data

In [1]:
import os
import torch
import numpy as np
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModelForSequenceClassification, DataCollatorWithPadding, Trainer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model, PeftModelForTokenClassification

temp_path = "/tmp"
save_path = "./data"

In [2]:
ds = load_dataset("dair-ai/emotion", "split")
ds

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 16000
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})

In [3]:
import random

# Print some random examples together with their labels
print("Features:")
indices = random.sample(range(len(ds["train"])), 10)
for i in indices:
    example = ds["train"][i]  # fetch one row instead of materializing whole columns
    print("{} : {}".format(example["text"], example["label"]))

print("\nLabels: {}".format(ds["train"].features["label"].names))

Features:
i feel the other person is unimportant but it is my interpretation see the trend that i have been misunderstood and that instead of wasting time hence the impatience part having them explain what i feel is already a misunderstanding i try to reexplain my intent : 0
i actually thought i would feel bothered being their since ehb and the other woman ow spent quite a bit of time together there but i didnt feel much of anything : 3
i feel the cold more than him : 3
i always feel reassured after my appts : 1
i moved into uni today and i feel so homesick and lonely and useless and part of mes saying fuck it go home and get a job and sod the degree : 0
i can t believe all the newborns that i ve photographed with heads full of dark hair but i am feeling just a little envious because my babies are bald and blonde as they come : 3
i kinda feel more relaxed with this blog than with the other one : 1
i am feeling super excited as the weeks seem to be flying by and we are getting closer an

In [4]:
# create data structures for further processing

# Names of the splits
splits = list(ds.keys())
# Number of classes
num_classes = len(ds["train"].features["label"].names)

# Dictionaries to translate between label name and label id
id2label = dict(zip(range(num_classes), ds['train'].features['label'].names))
label2id = dict(zip(ds['train'].features['label'].names, range(num_classes)))
print(id2label)
print(label2id)

{0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}
{'sadness': 0, 'joy': 1, 'love': 2, 'anger': 3, 'fear': 4, 'surprise': 5}
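The two dictionaries are inverse mappings of each other, which is what the classification head needs to report readable class names. A minimal sketch of the same idea, with the label names copied from the output above and a hypothetical `decode` helper that is not part of the notebook:

```python
# Label names copied from the dataset features printed above
names = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

# Equivalent construction using enumerate
id2label = dict(enumerate(names))
label2id = {name: idx for idx, name in id2label.items()}

def decode(label_ids):
    """Hypothetical helper: turn integer labels into readable names."""
    return [id2label[i] for i in label_ids]

# Integer labels as they appear next to the sample texts printed earlier
print(decode([0, 3, 1]))
```

The round trip `label2id[id2label[i]] == i` holds for every class id, so either dictionary can be rebuilt from the other.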


### Create a base model with added padding token

In [5]:
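# Pick the available accelerator; fall back to CUDA or CPU on older PyTorch versions
if hasattr(torch, "accelerator") and torch.accelerator.is_available():
    device = torch.accelerator.current_accelerator().type
else:
    device = "cuda" if torch.cuda.is_available() else "cpu"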

# Create a base model variant with classification head
def create_base_model(model_id):
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, 
        num_labels=num_classes,
        id2label=id2label,
        label2id=label2id,
        device_map=device)
    if model.config.pad_token_id is None:
        model.config.pad_token_id = model.config.eos_token_id

    return model

In [6]:
# Use GPT-2 as a small base model
model_id = "openai-community/gpt2"
model = create_base_model(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize every split (padding is applied later, per batch, by the data collator)
tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(
        lambda x: tokenizer(x["text"], truncation=True), batched=True
    )

# Freeze the pretrained transformer so that only the classification head is trained
for param in model.base_model.parameters():
    param.requires_grad = False

# GPT-2 has no padding token, so reuse the end-of-sequence token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Metric function: accuracy of the arg-max class prediction
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
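Because the texts are tokenized without padding, each batch is padded to its own longest sequence at collation time, and GPT-2 needs a pad id so the classifier can locate the last real token. The following is a simplified illustration of that dynamic padding, not the `DataCollatorWithPadding` implementation; the `pad_batch` helper and the token ids are made up for the example, while 50256 is GPT-2's end-of-text id reused as the pad id.

```python
# Simplified illustration of per-batch (dynamic) padding
EOS_TOKEN_ID = 50256  # GPT-2 end-of-text token id, reused here as the pad id

def pad_batch(sequences, pad_id=EOS_TOKEN_ID):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[72, 1254, 922], [72, 716]])  # made-up token ids
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch rather than to a global maximum keeps short batches small, which matters here because most texts in the dataset are single sentences.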


In [7]:
model_name = "gpt2_classification"
checkpoint_dir = os.path.join(temp_path, model_name)
save_dir_base = os.path.join(save_path, model_name)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir=checkpoint_dir,
        learning_rate=2e-3,
        per_device_train_batch_size=100,
        per_device_eval_batch_size=100,
        num_train_epochs=4,
        weight_decay=0.01,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.310627,0.497
2,No log,1.241377,0.5335
3,No log,1.226692,0.5425
4,1.334000,1.220351,0.544


TrainOutput(global_step=640, training_loss=1.2996052742004394, metrics={'train_runtime': 58.6993, 'train_samples_per_second': 1090.302, 'train_steps_per_second': 10.903, 'total_flos': 1848843351244800.0, 'train_loss': 1.2996052742004394, 'epoch': 4.0})
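The step count and the `No log` entries in the table above can be checked with a little arithmetic. This sketch assumes a single device, no gradient accumulation, and the `Trainer` default of logging the training loss every 500 steps (`logging_steps=500`); the helper names are hypothetical.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

def first_logged_epoch(num_samples, batch_size, logging_steps=500):
    """1-based epoch in which the first training-loss value is logged."""
    return math.ceil(logging_steps / steps_per_epoch(num_samples, batch_size))

# 16000 training rows, batch size 100, 4 epochs
total_steps = steps_per_epoch(16000, 100) * 4
print(total_steps)
print(first_logged_epoch(16000, 100))
```

With 160 steps per epoch, step 500 falls into epoch 4, which is consistent with the training loss only appearing in the last row. With a batch size of 50 the same calculation gives epoch 2.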

In [8]:
# Evaluate the model
original_performance=trainer.evaluate()
print(original_performance)

model.save_pretrained(save_dir_base)

{'eval_loss': 1.220351219177246, 'eval_accuracy': 0.544, 'eval_runtime': 1.3128, 'eval_samples_per_second': 1523.444, 'eval_steps_per_second': 15.234, 'epoch': 4.0}


## Performing Parameter-Efficient Fine-Tuning

Create a PEFT model from a freshly loaded base model, run a training loop, and save the PEFT adapter weights.

In [9]:
# Use LoRA for PEFT
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.TOKEN_CLS,
    fan_in_fan_out=True,  # GPT-2's Conv1D layers store weights as (fan_in, fan_out)
)

# create PEFT model, use a fresh pretrained base model
model_lora = get_peft_model(create_base_model(model_id), peft_config)
model_lora.print_trainable_parameters()

model_name = "gpt2_classification_lora"
checkpoint_dir = os.path.join(temp_path, model_name)
save_dir = os.path.join(save_path, model_name)

trainable params: 594,432 || all params: 125,038,848 || trainable%: 0.4754
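The reported number of trainable parameters can be reproduced by hand. The sketch below assumes that LoRA is applied only to GPT-2's fused attention projection `c_attn` (768 to 2304) in each of the 12 blocks and that the classification head `score` (768 x 6, no bias) stays trainable; both are assumptions about the defaults, supported here only by the fact that the arithmetic matches the printed value.

```python
def lora_trainable_params(r, hidden=768, n_layers=12, num_labels=6):
    """Estimate trainable parameters: LoRA on c_attn plus the score head."""
    d_in, d_out = hidden, 3 * hidden        # c_attn fuses query, key and value
    lora_per_layer = r * d_in + r * d_out   # A is (r x d_in), B is (d_out x r)
    head = hidden * num_labels              # score.weight, no bias
    return n_layers * lora_per_layer + head

trainable = lora_trainable_params(16)
print(trainable)
print(round(100 * trainable / 125_038_848, 4))  # share of all parameters in percent
```

Since the LoRA term grows linearly with `r`, halving the rank roughly halves the number of trainable parameters while the head contribution stays constant.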


In [10]:
trainer_lora = Trainer(
    model=model_lora,
    args=TrainingArguments(
        output_dir=checkpoint_dir,
        learning_rate=2e-3,
        per_device_train_batch_size=100,
        per_device_eval_batch_size=100,
        num_train_epochs=4,
        weight_decay=0.01,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer_lora.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.243477,0.9085
2,No log,0.168843,0.9275
3,No log,0.144285,0.927
4,0.319800,0.125742,0.9285


TrainOutput(global_step=640, training_loss=0.2757429122924805, metrics={'train_runtime': 111.8946, 'train_samples_per_second': 571.967, 'train_steps_per_second': 5.72, 'total_flos': 1861763687424000.0, 'train_loss': 0.2757429122924805, 'epoch': 4.0})

### Save the model


In [11]:
# Saving the model
model_lora.save_pretrained(save_dir)

## Performing Inference with a PEFT Model

Load the saved PEFT adapter weights and evaluate the performance of the trained PEFT model. Compare the results with those obtained before fine-tuning.

In [12]:
model_id = "openai-community/gpt2"
model_base = create_base_model(model_id)

model_loaded = PeftModelForTokenClassification.from_pretrained(model_base, save_dir)

In [13]:
trainer_evaluate = Trainer(
    model=model_loaded,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_lora_evaluate",
        per_device_train_batch_size=100,
        per_device_eval_batch_size=100,
        do_train=False,
        do_eval=True,
    ),
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

fine_tuned_performance=trainer_evaluate.evaluate()

In [14]:

#print("Original Model:  ", original_performance)
print("Fine-Tuned Model:", fine_tuned_performance)

#print("Original Model accuracy:   ", original_performance['eval_accuracy'])
print("Fine-Tuned Model accuracy: ", fine_tuned_performance['eval_accuracy'])

Fine-Tuned Model: {'eval_loss': 0.1257418692111969, 'eval_model_preparation_time': 0.0019, 'eval_accuracy': 0.9285, 'eval_runtime': 1.4194, 'eval_samples_per_second': 1409.083, 'eval_steps_per_second': 14.091}
Fine-Tuned Model accuracy:  0.9285


### Use 4-bit quantization: QLoRA

In [15]:
# temp_path and save_path are defined in the first cell
model_id = "openai-community/gpt2"

model_name = "gpt2_classification_4bit_lora"
checkpoint_dir = os.path.join(temp_path, model_name)
save_dir_base = os.path.join(save_path, model_name)

# Load the base model weights in 4-bit and compute in float16
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model4b = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    num_labels=num_classes,
    id2label=id2label,
    label2id=label2id,
    torch_dtype="auto")

# Use this model's own EOS id as the padding id
model4b.config.pad_token_id = model4b.config.eos_token_id

for param in model4b.base_model.parameters():
    param.requires_grad = False

# peft model
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.TOKEN_CLS,
    fan_in_fan_out=True,
)

model4bl = get_peft_model(model4b, peft_config)
model4bl.print_trainable_parameters()

trainer = Trainer(
    model=model4bl,
    args=TrainingArguments(
        output_dir=checkpoint_dir,
        learning_rate=2e-3,
        per_device_train_batch_size=100,
        per_device_eval_batch_size=100,
        num_train_epochs=5,
        weight_decay=0.01,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        fp16=True
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()
validation_lora_q4 = trainer.evaluate()
#model4bl.save_pretrained(save_dir_base)

trainable params: 594,432 || all params: 125,038,848 || trainable%: 0.4754


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.24846,0.9105
2,No log,0.197286,0.927
3,No log,0.146974,0.9275
4,0.397200,0.126689,0.925
5,0.397200,0.120459,0.9285


In [16]:
print("Original Model accuracy:         ", original_performance['eval_accuracy'])
print("Fine-Tuned Model accuracy:       ", fine_tuned_performance['eval_accuracy'])
print("Fine-Tuned Model 4 bit accuracy: ", validation_lora_q4['eval_accuracy'])

Original Model accuracy:          0.544
Fine-Tuned Model accuracy:        0.9285
Fine-Tuned Model 4 bit accuracy:  0.9285
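To put the three numbers above in relation, the gain can be expressed both as an absolute accuracy difference and as the share of the baseline's errors that fine-tuning removes. This is a small arithmetic sketch on the reported values; the variable names are illustrative only.

```python
baseline_acc = 0.544   # frozen GPT-2 with a trained classification head
lora_acc = 0.9285      # LoRA fine-tuned (non-quantized and 4-bit runs report the same value)

abs_gain = lora_acc - baseline_acc
# Fraction of the baseline's misclassifications that the fine-tuned model fixes
rel_error_reduction = abs_gain / (1 - baseline_acc)

print(f"absolute gain: {abs_gain:.4f}")
print(f"relative error reduction: {rel_error_reduction:.1%}")
```

The 4-bit run trained for five epochs instead of four, so the identical final accuracy should be read as "comparable", not as a controlled comparison.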


### Experiment with different LoRA parameters

In [17]:
def create_lora_config(r, lora_alpha, lora_dropout):
    peft_config = LoraConfig(
        r=r,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        task_type=TaskType.TOKEN_CLS,
        fan_in_fan_out=True,
    )

    return peft_config

def create_lora_model(peft_config):
    model_base = create_base_model(model_id)
    model_lora = get_peft_model(model_base, peft_config)

    return model_lora
    
def create_trainer(model, learning_rate, weight_decay):
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir='/tmp',
            per_device_train_batch_size=50,
            per_device_eval_batch_size=50,
            num_train_epochs=4,
            learning_rate=learning_rate,
            weight_decay=weight_decay,
            eval_strategy="epoch",
            save_strategy="epoch",
            load_best_model_at_end=True,
        ),
        train_dataset=tokenized_ds["train"],
        eval_dataset=tokenized_ds["test"],
        processing_class=tokenizer,
        data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
        compute_metrics=compute_metrics,
    )

    return trainer

def evaluate_model(trainer):
    # Evaluate on the trainer's eval dataset and return the metrics dictionary
    return trainer.evaluate()

results = []

In [18]:
for r in [8, 4, 2]:
    dropout = 0.1
    learning_rate = 2e-3
    weight_decay = 0.01
    alpha = 2 * r
    config = create_lora_config(r, alpha, dropout)
    model = create_lora_model(config)
    trainer = create_trainer(model, learning_rate, weight_decay)

    print("Start training a model with R={}, alpha={} dropout={}".format(r, alpha, dropout))

    trainer.train()
    metrics = trainer.evaluate()  # avoid shadowing the built-in eval()

    accuracy = metrics['eval_accuracy']
    results.append({'r': r, 'alpha': alpha, 'dropout': dropout, 'learning_rate':learning_rate, 'weight_decay': weight_decay, 'accuracy': accuracy})
    df_results = pd.DataFrame(results)
    print(df_results)

Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.203967,0.9175
2,0.481400,0.164355,0.929
3,0.481400,0.152113,0.9295
4,0.165000,0.131185,0.931


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1          0.002          0.01     0.931


Start training a model with R=4, alpha=8 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.218007,0.915
2,0.494800,0.167399,0.929
3,0.494800,0.152778,0.9275
4,0.182300,0.13154,0.928


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1          0.002          0.01     0.931
1  4      8      0.1          0.002          0.01     0.928


Start training a model with R=2, alpha=4 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.243588,0.9115
2,0.528500,0.180413,0.9225
3,0.528500,0.171336,0.9225
4,0.196700,0.148653,0.9215


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1          0.002          0.01    0.9310
1  4      8      0.1          0.002          0.01    0.9280
2  2      4      0.1          0.002          0.01    0.9215
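The rank sweep above keeps `alpha = 2 * r`, so the LoRA scaling factor `alpha / r` stays at 2 and only the capacity of the low-rank update changes. A toy, dependency-free sketch of the LoRA forward pass `y = W x + (alpha / r) * B (A x)` with made-up matrices (the helper names are hypothetical, and real implementations operate on tensors):

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, r, alpha):
    """y = W x + (alpha / r) * B (A x), with W frozen and A, B trainable."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy sizes: d_in = 3, d_out = 2, r = 2 (all values made up)
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 0.0], [0.0, 1.0, 1.0]]   # shape (r x d_in)
B_zero = [[0.0, 0.0], [0.0, 0.0]]         # shape (d_out x r), zero at initialization
B = [[1.0, 0.0], [0.0, 1.0]]              # some non-zero value after training
x = [1.0, 2.0, 3.0]

y_init = lora_forward(W, A, B_zero, x, r=2, alpha=4)
y_trained = lora_forward(W, A, B, x, r=2, alpha=4)
print(y_init)
print(y_trained)
```

With `B` at zero the adapted layer reproduces the frozen layer exactly, which is why training starts from the pretrained behaviour.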


In [19]:
for learning_rate in [2e-4, 2e-3, 2e-2]:
    dropout = 0.1
    r = 8
    weight_decay = 0.01
    alpha = 2 * r
    config = create_lora_config(r, alpha, dropout)
    model = create_lora_model(config)
    trainer = create_trainer(model, learning_rate, weight_decay)

    print("Start training a model with R={}, alpha={} dropout={}".format(r, alpha, dropout))

    trainer.train()
    metrics = trainer.evaluate()  # avoid shadowing the built-in eval()

    accuracy = metrics['eval_accuracy']
    results.append({'r': r, 'alpha': alpha, 'dropout': dropout, 'learning_rate':learning_rate, 'weight_decay': weight_decay, 'accuracy': accuracy})
    df_results = pd.DataFrame(results)
    print(df_results)

Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.514827,0.8065
2,1.004500,0.336314,0.8755
3,1.004500,0.292412,0.8935
4,0.397000,0.268815,0.899


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1         0.0020          0.01    0.9310
1  4      8      0.1         0.0020          0.01    0.9280
2  2      4      0.1         0.0020          0.01    0.9215
3  8     16      0.1         0.0002          0.01    0.8990


Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.208435,0.9205
2,0.475500,0.16255,0.9315
3,0.475500,0.148615,0.926
4,0.160000,0.121173,0.9305


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1         0.0020          0.01    0.9310
1  4      8      0.1         0.0020          0.01    0.9280
2  2      4      0.1         0.0020          0.01    0.9215
3  8     16      0.1         0.0002          0.01    0.8990
4  8     16      0.1         0.0020          0.01    0.9305


Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,2.046261,0.2955
2,2.357200,2.046261,0.2955
3,2.357200,2.046261,0.2955
4,2.089400,2.046261,0.2955


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1         0.0020          0.01    0.9310
1  4      8      0.1         0.0020          0.01    0.9280
2  2      4      0.1         0.0020          0.01    0.9215
3  8     16      0.1         0.0002          0.01    0.8990
4  8     16      0.1         0.0020          0.01    0.9305
5  8     16      0.1         0.0200          0.01    0.2955


In [20]:
for dropout in [0.01, 0.1, 0.5]:
    learning_rate = 2e-3
    r = 8
    weight_decay = 0.01
    alpha = 2 * r
    config = create_lora_config(r, alpha, dropout)
    model = create_lora_model(config)
    trainer = create_trainer(model, learning_rate, weight_decay)

    print("Start training a model with R={}, alpha={} dropout={}".format(r, alpha, dropout))

    trainer.train()
    metrics = trainer.evaluate()  # avoid shadowing the built-in eval()

    accuracy = metrics['eval_accuracy']
    results.append({'r': r, 'alpha': alpha, 'dropout': dropout, 'learning_rate':learning_rate, 'weight_decay': weight_decay, 'accuracy': accuracy})
    df_results = pd.DataFrame(results)
    print(df_results)

Start training a model with R=8, alpha=16 dropout=0.01


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.222568,0.9135
2,0.470900,0.146687,0.9335
3,0.470900,0.136879,0.929
4,0.156000,0.117551,0.9345


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16     0.10         0.0020          0.01    0.9310
1  4      8     0.10         0.0020          0.01    0.9280
2  2      4     0.10         0.0020          0.01    0.9215
3  8     16     0.10         0.0002          0.01    0.8990
4  8     16     0.10         0.0020          0.01    0.9305
5  8     16     0.10         0.0200          0.01    0.2955
6  8     16     0.01         0.0020          0.01    0.9345


Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.208435,0.9205
2,0.475500,0.16255,0.9315
3,0.475500,0.148615,0.926
4,0.160000,0.121173,0.9305


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16     0.10         0.0020          0.01    0.9310
1  4      8     0.10         0.0020          0.01    0.9280
2  2      4     0.10         0.0020          0.01    0.9215
3  8     16     0.10         0.0002          0.01    0.8990
4  8     16     0.10         0.0020          0.01    0.9305
5  8     16     0.10         0.0200          0.01    0.2955
6  8     16     0.01         0.0020          0.01    0.9345
7  8     16     0.10         0.0020          0.01    0.9305


Start training a model with R=8, alpha=16 dropout=0.5


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.224562,0.917
2,0.526600,0.179373,0.9245
3,0.526600,0.161703,0.924
4,0.203900,0.133277,0.927


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16     0.10         0.0020          0.01    0.9310
1  4      8     0.10         0.0020          0.01    0.9280
2  2      4     0.10         0.0020          0.01    0.9215
3  8     16     0.10         0.0002          0.01    0.8990
4  8     16     0.10         0.0020          0.01    0.9305
5  8     16     0.10         0.0200          0.01    0.2955
6  8     16     0.01         0.0020          0.01    0.9345
7  8     16     0.10         0.0020          0.01    0.9305
8  8     16     0.50         0.0020          0.01    0.9270


In [21]:
for weight_decay in [0.001, 0.01, 0.1]:
    learning_rate = 2e-3
    dropout = 0.1
    r = 8
    alpha = 2 * r
    config = create_lora_config(r, alpha, dropout)
    model = create_lora_model(config)
    trainer = create_trainer(model, learning_rate, weight_decay)

    print("Start training a model with R={}, alpha={} dropout={}".format(r, alpha, dropout))

    trainer.train()
    metrics = trainer.evaluate()  # avoid shadowing the built-in eval()

    accuracy = metrics['eval_accuracy']
    results.append({'r': r, 'alpha': alpha, 'dropout': dropout, 'learning_rate':learning_rate, 'weight_decay': weight_decay, 'accuracy': accuracy})
    df_results = pd.DataFrame(results)
    print(df_results)

Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.232432,0.9125
2,0.490000,0.167103,0.9275
3,0.490000,0.159356,0.926
4,0.161000,0.125772,0.9295



   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16     0.10         0.0020         0.010    0.9310
1  4      8     0.10         0.0020         0.010    0.9280
2  2      4     0.10         0.0020         0.010    0.9215
3  8     16     0.10         0.0002         0.010    0.8990
4  8     16     0.10         0.0020         0.010    0.9305
5  8     16     0.10         0.0200         0.010    0.2955
6  8     16     0.01         0.0020         0.010    0.9345
7  8     16     0.10         0.0020         0.010    0.9305
8  8     16     0.50         0.0020         0.010    0.9270
9  8     16     0.10         0.0020         0.001    0.9295


Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.208435,0.9205
2,0.475500,0.16255,0.9315
3,0.475500,0.148615,0.926
4,0.160000,0.121173,0.9305


    r  alpha  dropout  learning_rate  weight_decay  accuracy
0   8     16     0.10         0.0020         0.010    0.9310
1   4      8     0.10         0.0020         0.010    0.9280
2   2      4     0.10         0.0020         0.010    0.9215
3   8     16     0.10         0.0002         0.010    0.8990
4   8     16     0.10         0.0020         0.010    0.9305
5   8     16     0.10         0.0200         0.010    0.2955
6   8     16     0.01         0.0020         0.010    0.9345
7   8     16     0.10         0.0020         0.010    0.9305
8   8     16     0.50         0.0020         0.010    0.9270
9   8     16     0.10         0.0020         0.001    0.9295
10  8     16     0.10         0.0020         0.010    0.9305


Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.229873,0.9155
2,0.488500,0.194819,0.926
3,0.488500,0.143995,0.9295
4,0.165700,0.125075,0.935


    r  alpha  dropout  learning_rate  weight_decay  accuracy
0   8     16     0.10         0.0020         0.010    0.9310
1   4      8     0.10         0.0020         0.010    0.9280
2   2      4     0.10         0.0020         0.010    0.9215
3   8     16     0.10         0.0002         0.010    0.8990
4   8     16     0.10         0.0020         0.010    0.9305
5   8     16     0.10         0.0200         0.010    0.2955
6   8     16     0.01         0.0020         0.010    0.9345
7   8     16     0.10         0.0020         0.010    0.9305
8   8     16     0.50         0.0020         0.010    0.9270
9   8     16     0.10         0.0020         0.001    0.9295
10  8     16     0.10         0.0020         0.010    0.9305
11  8     16     0.10         0.0020         0.100    0.9350
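When reading the table above, it may help to keep the sampling noise of the test set in mind. The sketch below estimates the binomial standard error of an accuracy measured on the 2000 test examples; treating the predictions as independent draws is a simplifying assumption, and `accuracy_std_error` is a hypothetical helper.

```python
import math

def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy measured on n examples."""
    return math.sqrt(acc * (1 - acc) / n)

n_test = 2000
for acc in (0.9215, 0.9305, 0.9350):
    se = accuracy_std_error(acc, n_test)
    print(f"acc={acc:.4f}  std_err={se:.4f}")
```

Under this assumption the standard error is a little under 0.006, so differences of a few tenths of a percentage point between most configurations are probably within noise. The runs with learning rates 2e-4 and 2e-2 are the clear exceptions.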
