# Lightweight Fine-Tuning Project

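Choices for this project:

* PEFT technique: LoRA (rank 16, alpha 32), followed by a 4-bit quantized variant (QLoRA) and a small hyperparameter sweep
* Model: GPT-2 (`openai-community/gpt2`) with a sequence-classification head
* Evaluation approach: accuracy on the test split, computed with the Hugging Face `Trainer` and a custom `compute_metrics` function
* Fine-tuning dataset: `dair-ai/emotion` (16,000 training, 2,000 validation, and 2,000 test examples across six emotion labels)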

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

### Load the dataset dair-ai/emotion and explore the data

In [1]:
from datasets import load_dataset

ds = load_dataset("dair-ai/emotion", "split")
ds

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 16000
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})

In [2]:
import random

# Print some random texts and their labels
print("Features:")
indices = random.sample(range(len(ds["train"])), 10)
for i in indices:
    # Index the row first so only one example is read, not the whole column
    example = ds["train"][i]
    print("{} : {}".format(example["text"], example["label"]))

print("\nLabels: {}".format(ds["train"].features["label"].names))

Features:
i hate not feeling useful : 1
i reflect back on all the beer i drank i feel shamed : 0
i had begun to feel apprehensive when thick black rain clouds stormed into the sky above town : 4
i do now as compared with years ago is that i no longer feel i have to be accepted by others only those who matter to me : 2
i feel like each kid left school this year with at least three pieces they were really proud of : 1
i actually feel solidarity with the americans who went on to cry for blood in iraq tortured prisoners and the stripping of the bill of rights : 4
i feel that when i run i that is me sarah the mind am supporting this body : 1
i feel suspicious when i see this redundant use of the credential : 4
i do feel lonely at times and at times i still feel that i am alone : 0
i miss the feeling of loving : 2

Labels: ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
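
Before training, it can help to check how the labels are distributed. The sketch below counts the ten label ids from the random sample printed above and maps them to their names. The helper `label_distribution` is a hypothetical name introduced only for this illustration; on the real data one would presumably pass `ds["train"]["label"]` instead of the hard-coded sample.

```python
from collections import Counter

# Label names as printed above (list index = label id).
label_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# The ten label ids from the random sample printed above.
# Assumption: on the full split one would use ds["train"]["label"] here.
sample_labels = [1, 0, 4, 2, 1, 4, 1, 4, 0, 2]


def label_distribution(labels, names):
    """Return {label name: count}, including classes with zero examples."""
    counts = Counter(labels)
    return {name: counts.get(i, 0) for i, name in enumerate(names)}


distribution = label_distribution(sample_labels, label_names)
for name, count in distribution.items():
    print(f"{name:<10}{count}")
```

A sample of ten is far too small to say anything about class balance; the point is only the counting pattern.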


In [3]:
# create data structures for further processing

# names of the splits
splits=list(ds.keys())
# number of classes
num_classes=len(ds["train"].features["label"].names)

# Dictionaries to translate between label string and label number
id2label = dict(zip(range(num_classes), ds['train'].features['label'].names))
label2id = dict(zip(ds['train'].features['label'].names, range(num_classes)))
print(id2label)
print(label2id)

{0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}
{'sadness': 0, 'joy': 1, 'love': 2, 'anger': 3, 'fear': 4, 'surprise': 5}


In [4]:
import numpy as np  # used by compute_metrics below
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Use GPT-2 as a small base model
# Create a variant with classification head
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "openai-community/gpt2"
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, 
    num_labels=num_classes,
    id2label=id2label,
    label2id=label2id,
    device_map=device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Add tokens to the dataset
tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(
        lambda x: tokenizer(x["text"], truncation=True), batched=True
    )

for param in model.base_model.parameters():
    param.requires_grad = False

# Add the padding token which is missing in GPT-2
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model.resize_token_embeddings(len(tokenizer))
    model.config.pad_token_id = model.config.eos_token_id
    print("Padding token: {}".format(tokenizer.pad_token))

# Metric function: accuracy of the argmax prediction over the class logits
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


Padding token: [PAD]
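
To make the metric concrete, here is a dependency-free restatement of what `compute_metrics` does: take the argmax of each row of logits and compare it with the label. The names `argmax` and `accuracy_from_logits` and the toy logits are made up for this illustration; the notebook itself uses the NumPy version defined above.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Pure-Python equivalent of the argmax-accuracy metric."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Four toy examples with six class logits each (hypothetical values).
toy_logits = [
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0],  # predicts class 0
    [0.0, 3.0, 0.0, 0.0, 0.0, 0.0],  # predicts class 1
    [0.0, 0.0, 0.0, 0.0, 1.5, 0.0],  # predicts class 4
    [0.2, 0.9, 0.0, 0.0, 0.0, 0.0],  # predicts class 1
]
toy_labels = [0, 1, 4, 2]  # the last example is misclassified

result = accuracy_from_logits(toy_logits, toy_labels)
print(result)  # {'accuracy': 0.75}
```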


In [5]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
import os

temp_path = "/tmp"
save_path = "./data"

model_name = "gpt2_classification"
checkpoint_dir = os.path.join(temp_path, model_name)
save_dir_base = os.path.join(save_path, model_name)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir=checkpoint_dir,
        learning_rate=2e-3,
        per_device_train_batch_size=100,
        per_device_eval_batch_size=100,
        num_train_epochs=5,
        weight_decay=0.01,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.222557,0.5505
2,No log,1.140038,0.571
3,No log,1.10735,0.5785
4,1.250500,1.125446,0.583
5,1.250500,1.092814,0.5815


TrainOutput(global_step=800, training_loss=1.1946510696411132, metrics={'train_runtime': 67.4856, 'train_samples_per_second': 1185.438, 'train_steps_per_second': 11.854, 'total_flos': 2309089289011200.0, 'train_loss': 1.1946510696411132, 'epoch': 5.0})
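
The step count and the "No log" entries in the table above can be reproduced with a little arithmetic. This is a sketch under stated assumptions: a single device, no gradient accumulation, and the `Trainer` default of logging the training loss every 500 steps (`logging_steps` is not set in the cell above). The function names are hypothetical.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Optimizer steps per epoch, assuming one device and no accumulation."""
    return math.ceil(num_examples / batch_size)


def logged_epochs(num_examples, batch_size, num_epochs, logging_steps=500):
    """For each epoch, report whether a training-loss log exists by its end."""
    per_epoch = steps_per_epoch(num_examples, batch_size)
    return [(epoch, epoch * per_epoch >= logging_steps)
            for epoch in range(1, num_epochs + 1)]


per_epoch = steps_per_epoch(16000, 100)
total_steps = per_epoch * 5
print(per_epoch, total_steps)  # 160 800

for epoch, has_log in logged_epochs(16000, 100, 5):
    print(epoch, "logged" if has_log else "No log")
```

With 160 steps per epoch, step 500 is first reached during epoch 4, which is consistent with epochs 1 to 3 showing "No log". Passing a smaller `logging_steps` to `TrainingArguments` would give a training loss for every epoch.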

In [6]:
# Evaluate the model
original_performance=trainer.evaluate()
print(original_performance)

model.save_pretrained(save_dir_base, save_embedding_layers=True)

{'eval_loss': 1.0928144454956055, 'eval_accuracy': 0.5815, 'eval_runtime': 1.3331, 'eval_samples_per_second': 1500.268, 'eval_steps_per_second': 15.003, 'epoch': 5.0}


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [7]:
from peft import LoraConfig, TaskType, get_peft_model

torch.cuda.empty_cache()

# Use LoRA for PEFT
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.TOKEN_CLS,
    fan_in_fan_out=True,
)

# adding PEFT modifies the base model in-place
# so it should be saved for restoring the PEFT model later
model_lora = get_peft_model(model, peft_config)
model_lora.print_trainable_parameters()

model_name = "gpt2_classification_lora"
checkpoint_dir = os.path.join(temp_path, model_name)
save_dir = os.path.join(save_path, model_name)

trainable params: 594,432 || all params: 125,039,616 || trainable%: 0.4754
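
The trainable parameter count printed above can be checked by hand. The sketch assumes that PEFT's default LoRA target for GPT-2 is the fused attention projection `c_attn` (768 inputs, 3 x 768 outputs) in each of the 12 transformer blocks, and that the bias-free classification head `score` (768 x 6) stays trainable. These are assumptions about library defaults rather than something stated in this notebook, but they reproduce the reported number.

```python
def lora_param_count(r, in_features, out_features, num_layers):
    """Parameters added by LoRA: A is (r x in), B is (out x r), per layer."""
    return num_layers * r * (in_features + out_features)


HIDDEN = 768             # GPT-2 small hidden size
NUM_LAYERS = 12          # GPT-2 small transformer blocks
C_ATTN_OUT = 3 * HIDDEN  # fused query/key/value projection (assumed target)
NUM_CLASSES = 6          # emotion labels

lora_params = lora_param_count(16, HIDDEN, C_ATTN_OUT, NUM_LAYERS)
head_params = HIDDEN * NUM_CLASSES  # score head without bias
trainable = lora_params + head_params

print(f"LoRA matrices: {lora_params:,}")  # 589,824
print(f"Score head:    {head_params:,}")  # 4,608
print(f"Trainable:     {trainable:,}")    # 594,432

# Share of the total reported by print_trainable_parameters()
trainable_pct = 100 * trainable / 125_039_616
print(f"Trainable %:   {trainable_pct:.4f}")  # 0.4754
```

Because the count scales linearly with `r`, halving the rank roughly halves the number of LoRA parameters while the head stays the same size.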


In [8]:
trainer_lora = Trainer(
    model=model_lora,
    args=TrainingArguments(
        output_dir=checkpoint_dir,
        learning_rate=2e-3,
        per_device_train_batch_size=100,
        per_device_eval_batch_size=100,
        num_train_epochs=5,
        weight_decay=0.01,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer_lora.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.235753,0.91
2,No log,0.160215,0.9295
3,No log,0.137495,0.935
4,0.254300,0.125307,0.934
5,0.254300,0.124479,0.9325




TrainOutput(global_step=800, training_loss=0.2025426959991455, metrics={'train_runtime': 145.967, 'train_samples_per_second': 548.069, 'train_steps_per_second': 5.481, 'total_flos': 2325225977856000.0, 'train_loss': 0.2025426959991455, 'epoch': 5.0})

###  ⚠️ IMPORTANT ⚠️

Due to workspace storage constraints, you should not store the model weights in the same directory but rather use `/tmp` to avoid workspace crashes which are irrecoverable.
Ensure you save it in /tmp always.

In [9]:
# Saving the model

model_lora.save_pretrained(save_dir, save_embedding_layers=True)

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [10]:
from peft import PeftModelForTokenClassification

# loading the model
model_base = AutoModelForSequenceClassification.from_pretrained(save_dir_base)
model_loaded = PeftModelForTokenClassification.from_pretrained(model_base, save_dir)

In [None]:
trainer_evaluate = Trainer(
    model=model_loaded,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_lora_evaluate",
        per_device_train_batch_size=100,
        per_device_eval_batch_size=100,
        do_train=False,
        do_eval=True,
    ),
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

fine_tuned_performance = trainer_evaluate.evaluate()

In [12]:

print("Original Model:  ", original_performance)
print("Fine-Tuned Model:", fine_tuned_performance)

print("Original Model accuracy:   ", original_performance['eval_accuracy'])
print("Fine-Tuned Model accuracy: ", fine_tuned_performance['eval_accuracy'])

Original Model:   {'eval_loss': 1.0895805358886719, 'eval_accuracy': 0.582, 'eval_runtime': 1.328, 'eval_samples_per_second': 1506.029, 'eval_steps_per_second': 15.06, 'epoch': 5.0}
Fine-Tuned Model: {'eval_loss': 0.12447859346866608, 'eval_model_preparation_time': 0.0019, 'eval_accuracy': 0.9325, 'eval_runtime': 1.43, 'eval_samples_per_second': 1398.587, 'eval_steps_per_second': 13.986}
Original Model accuracy:    0.582
Fine-Tuned Model accuracy:  0.9325


### Use 4-bit Quantization: QLoRA

In [13]:
from transformers import BitsAndBytesConfig

torch.cuda.empty_cache()

temp_path = "/tmp"
save_path = "./data"

model_name = "gpt2_classification_4bit_lora"
checkpoint_dir = os.path.join(temp_path, model_name)
# Use a separate adapter directory so that save_dir_base keeps pointing at the saved base model
save_dir = os.path.join(save_path, model_name)

model_id = "openai-community/gpt2"
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model4b = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    num_labels=num_classes,
    id2label=id2label,
    label2id=label2id,
    torch_dtype="auto")

model4b.resize_token_embeddings(len(tokenizer))
model4b.config.pad_token_id = model4b.config.eos_token_id

for param in model4b.base_model.parameters():
    param.requires_grad = False

# peft model
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.TOKEN_CLS,
    fan_in_fan_out=True,
)

model4bl = get_peft_model(model4b, peft_config)
model4bl.print_trainable_parameters()

trainer = Trainer(
    model=model4bl,
    args=TrainingArguments(
        output_dir=checkpoint_dir,
        learning_rate=2e-3,
        per_device_train_batch_size=100,
        per_device_eval_batch_size=100,
        num_train_epochs=5,
        weight_decay=0.01,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        fp16=True
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()
validation_lora_q4 = trainer.evaluate()
model4bl.save_pretrained(save_dir, save_embedding_layers=True)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 594,432 || all params: 125,039,616 || trainable%: 0.4754




Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.245774,0.911
2,No log,0.163552,0.927
3,No log,0.13895,0.929
4,0.402400,0.127697,0.9325
5,0.402400,0.122325,0.9345




In [14]:
print("Original Model accuracy:         ", original_performance['eval_accuracy'])
print("Fine-Tuned Model accuracy:       ", fine_tuned_performance['eval_accuracy'])
print("Fine-Tuned Model 4 bit accuracy: ", validation_lora_q4['eval_accuracy'])

Original Model accuracy:          0.582
Fine-Tuned Model accuracy:        0.9325
Fine-Tuned Model 4 bit accuracy:  0.9345
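
To put the three accuracies in perspective, the sketch below converts them into counts of correctly classified test examples (the test split has 2,000 rows) and computes the relative reduction in error rate. The dictionary keys and helper name are chosen for this illustration; the accuracy values are the ones printed above.

```python
TEST_SIZE = 2000  # rows in the test split

# Accuracies as printed above.
accuracies = {
    "frozen base + trained head": 0.582,
    "LoRA": 0.9325,
    "4-bit LoRA": 0.9345,
}


def correct_count(accuracy, n):
    """Number of correctly classified examples implied by an accuracy."""
    return round(accuracy * n)


counts = {name: correct_count(acc, TEST_SIZE) for name, acc in accuracies.items()}
for name, count in counts.items():
    print(f"{name:<28}{count} / {TEST_SIZE}")

# Difference between the two LoRA variants, in test examples.
gap = counts["4-bit LoRA"] - counts["LoRA"]
print("4-bit vs. full-precision LoRA:", gap, "examples")

# Relative error reduction of LoRA over the frozen-base model.
error_reduction = 1 - (1 - accuracies["LoRA"]) / (1 - accuracies["frozen base + trained head"])
print(f"Relative error reduction: {error_reduction:.1%}")
```

The two LoRA variants differ by only four test examples, so this single run does not show that either one is better; it mainly suggests that 4-bit quantization did not hurt accuracy here.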


### Experiment with different LoRA parameters

In [30]:
from peft import PeftModelForTokenClassification, LoraConfig, TaskType, get_peft_model
import pandas as pd

torch.cuda.empty_cache()


def create_lora_config(r, lora_alpha, lora_dropout):
    peft_config = LoraConfig(
        r=r,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        task_type=TaskType.TOKEN_CLS,
        fan_in_fan_out=True,
    )

    return peft_config

def create_lora_model(peft_config):
    model_base = AutoModelForSequenceClassification.from_pretrained(save_dir_base)
    model_lora = get_peft_model(model_base, peft_config)

    return model_lora
    
def create_trainer(model, learning_rate, weight_decay):
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir='/tmp',
            per_device_train_batch_size=50,
            per_device_eval_batch_size=50,
            num_train_epochs=4,
            learning_rate=learning_rate,
            weight_decay=weight_decay,
            eval_strategy="epoch",
            save_strategy="epoch",
            load_best_model_at_end=True,
        ),
        train_dataset=tokenized_ds["train"],
        eval_dataset=tokenized_ds["test"],
        processing_class=tokenizer,
        data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
        compute_metrics=compute_metrics,
    )

    return trainer

def evaluate_model(trainer):
    # Run evaluation on the trainer's eval dataset and return the metrics dictionary
    return trainer.evaluate()

results = []

In [None]:
for r in [8, 4, 2]:
    dropout = 0.1
    learning_rate = 2e-3
    weight_decay = 0.01
    alpha = 2 * r
    config = create_lora_config(r, alpha, dropout)
    model = create_lora_model(config)
    trainer = create_trainer(model, learning_rate, weight_decay)

    print("Start training a model with R={}, alpha={} dropout={}".format(r, alpha, dropout))

    trainer.train()
    metrics = trainer.evaluate()  # named metrics to avoid shadowing the built-in eval()

    accuracy = metrics['eval_accuracy']
    results.append({'r': r, 'alpha': alpha, 'dropout': dropout, 'learning_rate':learning_rate, 'weight_decay': weight_decay, 'accuracy': accuracy})
    df_results = pd.DataFrame(results)
    print(df_results)

Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.200721,0.924
2,0.330700,0.158218,0.9255
3,0.330700,0.138393,0.9335
4,0.162700,0.116763,0.9345


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1          0.002          0.01    0.9345
Start training a model with R=4, alpha=8 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.202029,0.92
2,0.349900,0.161765,0.931
3,0.349900,0.142318,0.928
4,0.170900,0.131283,0.9275


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1          0.002          0.01    0.9345
1  4      8      0.1          0.002          0.01    0.9275
Start training a model with R=2, alpha=4 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.225922,0.9125
2,0.380200,0.175321,0.9315
3,0.380200,0.15973,0.9295
4,0.191200,0.138409,0.928


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1          0.002          0.01    0.9345
1  4      8      0.1          0.002          0.01    0.9275
2  2      4      0.1          0.002          0.01    0.9280
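
One detail of the rank sweep is worth spelling out: because `alpha = 2 * r`, the LoRA scaling factor `lora_alpha / r` is the same for every rank tried, so only the rank of the update changes. The sketch below shows this with the standard LoRA formulation, where the weight update is `(lora_alpha / r) * B @ A`. The tiny matrices and the helper names are hypothetical and use plain Python lists to stay dependency-free.

```python
def lora_scaling(r, lora_alpha):
    """Scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / r


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(A, B, r, lora_alpha):
    """Weight update contributed by one LoRA adapter: scale * (B @ A)."""
    scale = lora_scaling(r, lora_alpha)
    return [[scale * value for value in row] for row in matmul(B, A)]


# With alpha = 2 * r the scaling is identical for every rank in the sweep.
scalings = {r: lora_scaling(r, 2 * r) for r in [8, 4, 2]}
print(scalings)  # {8: 2.0, 4: 2.0, 2: 2.0}

# Toy rank-1 adapter for a layer with 3 inputs and 2 outputs.
A = [[1.0, 0.0, -1.0]]  # shape (r, in_features) = (1, 3)
B = [[0.5], [2.0]]      # shape (out_features, r) = (2, 1)
delta = lora_delta(A, B, r=1, lora_alpha=2)
print(delta)  # [[1.0, 0.0, -1.0], [4.0, 0.0, -4.0]]
```

Keeping the scaling constant makes the runs easier to compare, since a change in accuracy cannot be attributed to a different effective step size of the adapter.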


In [None]:
for learning_rate in [2e-4, 2e-3, 2e-2]:
    dropout = 0.1
    r = 8
    weight_decay = 0.01
    alpha = 2 * r
    config = create_lora_config(r, alpha, dropout)
    model = create_lora_model(config)
    trainer = create_trainer(model, learning_rate, weight_decay)

    print("Start training a model with R={}, alpha={} dropout={}".format(r, alpha, dropout))

    trainer.train()
    metrics = trainer.evaluate()  # named metrics to avoid shadowing the built-in eval()

    accuracy = metrics['eval_accuracy']
    results.append({'r': r, 'alpha': alpha, 'dropout': dropout, 'learning_rate':learning_rate, 'weight_decay': weight_decay, 'accuracy': accuracy})
    df_results = pd.DataFrame(results)
    print(df_results)

Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.384705,0.852
2,0.554400,0.276423,0.888
3,0.554400,0.234788,0.907
4,0.318600,0.219001,0.9105


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1         0.0020          0.01    0.9345
1  4      8      0.1         0.0020          0.01    0.9275
2  2      4      0.1         0.0020          0.01    0.9280
3  8     16      0.1         0.0002          0.01    0.9105
Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.174814,0.917
2,0.330200,0.200212,0.924
3,0.330200,0.137183,0.9285
4,0.159300,0.114883,0.925


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1         0.0020          0.01    0.9345
1  4      8      0.1         0.0020          0.01    0.9275
2  2      4      0.1         0.0020          0.01    0.9280
3  8     16      0.1         0.0002          0.01    0.9105
4  8     16      0.1         0.0020          0.01    0.9250
Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.941357,0.2905
2,1.926900,1.66673,0.291
3,1.926900,1.634083,0.332
4,1.712900,1.559309,0.351


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16      0.1         0.0020          0.01    0.9345
1  4      8      0.1         0.0020          0.01    0.9275
2  2      4      0.1         0.0020          0.01    0.9280
3  8     16      0.1         0.0002          0.01    0.9105
4  8     16      0.1         0.0020          0.01    0.9250
5  8     16      0.1         0.0200          0.01    0.3510


In [33]:
for dropout in [0.01, 0.1, 0.5]:
    learning_rate = 2e-3
    r = 8
    weight_decay = 0.01
    alpha = 2 * r
    config = create_lora_config(r, alpha, dropout)
    model = create_lora_model(config)
    trainer = create_trainer(model, learning_rate, weight_decay)

    print("Start training a model with R={}, alpha={} dropout={}".format(r, alpha, dropout))

    trainer.train()
    metrics = trainer.evaluate()  # named metrics to avoid shadowing the built-in eval()

    accuracy = metrics['eval_accuracy']
    results.append({'r': r, 'alpha': alpha, 'dropout': dropout, 'learning_rate':learning_rate, 'weight_decay': weight_decay, 'accuracy': accuracy})
    df_results = pd.DataFrame(results)
    print(df_results)

Start training a model with R=8, alpha=16 dropout=0.01


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.191566,0.9215
2,0.329600,0.145977,0.9355
3,0.329600,0.151537,0.9275
4,0.155400,0.127282,0.9255


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16     0.10         0.0020          0.01    0.9345
1  4      8     0.10         0.0020          0.01    0.9275
2  2      4     0.10         0.0020          0.01    0.9280
3  8     16     0.10         0.0002          0.01    0.9105
4  8     16     0.10         0.0020          0.01    0.9250
5  8     16     0.10         0.0200          0.01    0.3510
6  8     16     0.01         0.0020          0.01    0.9255
Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.174814,0.917
2,0.330200,0.200212,0.924
3,0.330200,0.137183,0.9285
4,0.159300,0.114883,0.925


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16     0.10         0.0020          0.01    0.9345
1  4      8     0.10         0.0020          0.01    0.9275
2  2      4     0.10         0.0020          0.01    0.9280
3  8     16     0.10         0.0002          0.01    0.9105
4  8     16     0.10         0.0020          0.01    0.9250
5  8     16     0.10         0.0200          0.01    0.3510
6  8     16     0.01         0.0020          0.01    0.9255
7  8     16     0.10         0.0020          0.01    0.9250
Start training a model with R=8, alpha=16 dropout=0.5


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.20493,0.9185
2,0.371700,0.163888,0.925
3,0.371700,0.139484,0.93
4,0.193500,0.120642,0.937


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16     0.10         0.0020          0.01    0.9345
1  4      8     0.10         0.0020          0.01    0.9275
2  2      4     0.10         0.0020          0.01    0.9280
3  8     16     0.10         0.0002          0.01    0.9105
4  8     16     0.10         0.0020          0.01    0.9250
5  8     16     0.10         0.0200          0.01    0.3510
6  8     16     0.01         0.0020          0.01    0.9255
7  8     16     0.10         0.0020          0.01    0.9250
8  8     16     0.50         0.0020          0.01    0.9370


In [34]:
for weight_decay in [0.001, 0.01, 0.1]:
    learning_rate = 2e-3
    dropout = 0.1
    r = 8
    alpha = 2 * r
    config = create_lora_config(r, alpha, dropout)
    model = create_lora_model(config)
    trainer = create_trainer(model, learning_rate, weight_decay)

    print("Start training a model with R={}, alpha={} dropout={}".format(r, alpha, dropout))

    trainer.train()
    metrics = trainer.evaluate()  # named metrics to avoid shadowing the built-in eval()

    accuracy = metrics['eval_accuracy']
    results.append({'r': r, 'alpha': alpha, 'dropout': dropout, 'learning_rate':learning_rate, 'weight_decay': weight_decay, 'accuracy': accuracy})
    df_results = pd.DataFrame(results)
    print(df_results)

Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.198457,0.9185
2,0.341400,0.142298,0.9285
3,0.341400,0.136052,0.9285
4,0.164700,0.118024,0.933


   r  alpha  dropout  learning_rate  weight_decay  accuracy
0  8     16     0.10         0.0020         0.010    0.9345
1  4      8     0.10         0.0020         0.010    0.9275
2  2      4     0.10         0.0020         0.010    0.9280
3  8     16     0.10         0.0002         0.010    0.9105
4  8     16     0.10         0.0020         0.010    0.9250
5  8     16     0.10         0.0200         0.010    0.3510
6  8     16     0.01         0.0020         0.010    0.9255
7  8     16     0.10         0.0020         0.010    0.9250
8  8     16     0.50         0.0020         0.010    0.9370
9  8     16     0.10         0.0020         0.001    0.9330
Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.174814,0.917
2,0.330200,0.200212,0.924
3,0.330200,0.137183,0.9285
4,0.159300,0.114883,0.925


    r  alpha  dropout  learning_rate  weight_decay  accuracy
0   8     16     0.10         0.0020         0.010    0.9345
1   4      8     0.10         0.0020         0.010    0.9275
2   2      4     0.10         0.0020         0.010    0.9280
3   8     16     0.10         0.0002         0.010    0.9105
4   8     16     0.10         0.0020         0.010    0.9250
5   8     16     0.10         0.0200         0.010    0.3510
6   8     16     0.01         0.0020         0.010    0.9255
7   8     16     0.10         0.0020         0.010    0.9250
8   8     16     0.50         0.0020         0.010    0.9370
9   8     16     0.10         0.0020         0.001    0.9330
10  8     16     0.10         0.0020         0.010    0.9250
Start training a model with R=8, alpha=16 dropout=0.1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.194818,0.922
2,0.337200,0.152118,0.93
3,0.337200,0.137306,0.928
4,0.162800,0.116735,0.9315


    r  alpha  dropout  learning_rate  weight_decay  accuracy
0   8     16     0.10         0.0020         0.010    0.9345
1   4      8     0.10         0.0020         0.010    0.9275
2   2      4     0.10         0.0020         0.010    0.9280
3   8     16     0.10         0.0002         0.010    0.9105
4   8     16     0.10         0.0020         0.010    0.9250
5   8     16     0.10         0.0200         0.010    0.3510
6   8     16     0.01         0.0020         0.010    0.9255
7   8     16     0.10         0.0020         0.010    0.9250
8   8     16     0.50         0.0020         0.010    0.9370
9   8     16     0.10         0.0020         0.001    0.9330
10  8     16     0.10         0.0020         0.010    0.9250
11  8     16     0.10         0.0020         0.100    0.9315
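
The final table contains the same configuration (r=8, alpha=16, dropout 0.1, learning rate 2e-3, weight decay 0.01) four times, which gives a rough idea of run-to-run variation. The sketch below groups the rows by configuration and checks which other settings fall outside the accuracy range of that repeated configuration. The rows are copied from the table above; the variable names and the "inside the range" criterion are choices made for this illustration, not a statistical test.

```python
from collections import defaultdict

# (r, alpha, dropout, learning_rate, weight_decay, accuracy), from the final table.
runs = [
    (8, 16, 0.10, 0.0020, 0.010, 0.9345),
    (4, 8, 0.10, 0.0020, 0.010, 0.9275),
    (2, 4, 0.10, 0.0020, 0.010, 0.9280),
    (8, 16, 0.10, 0.0002, 0.010, 0.9105),
    (8, 16, 0.10, 0.0020, 0.010, 0.9250),
    (8, 16, 0.10, 0.0200, 0.010, 0.3510),
    (8, 16, 0.01, 0.0020, 0.010, 0.9255),
    (8, 16, 0.10, 0.0020, 0.010, 0.9250),
    (8, 16, 0.50, 0.0020, 0.010, 0.9370),
    (8, 16, 0.10, 0.0020, 0.001, 0.9330),
    (8, 16, 0.10, 0.0020, 0.010, 0.9250),
    (8, 16, 0.10, 0.0020, 0.100, 0.9315),
]

# Group accuracies by configuration.
by_config = defaultdict(list)
for *config, accuracy in runs:
    by_config[tuple(config)].append(accuracy)

# The configuration that was trained several times.
baseline_config = (8, 16, 0.10, 0.0020, 0.010)
baseline_accs = by_config[baseline_config]
low, high = min(baseline_accs), max(baseline_accs)
spread = high - low
print(f"Baseline runs: {baseline_accs}, spread: {spread:.4f}")

# Settings whose accuracy lies outside the range seen for the baseline.
outside = {
    config: accs[0]
    for config, accs in by_config.items()
    if config != baseline_config and not (low <= accs[0] <= high)
}
for config, accuracy in outside.items():
    print(config, accuracy)
```

In these results only the two changed learning rates (2e-4 and 2e-2) fall clearly below the baseline range, and dropout 0.5 lands slightly above it. The differences between ranks and weight decays are smaller than the spread among repeated baseline runs, so each of them would need several seeds before drawing a conclusion.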
