# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA (Low Rank Adaptation)
* Model: 
* Evaluation approach: 
* Fine-tuning dataset: 

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [2]:
# Install the required versino of datasets if needed (uncomment to run)
# You may need to restart the kernel after running this cell
# ! pip install -q "datasets==2.15.0"

In [1]:
# Load the Climate Sentiment dataset from Hugging Face
# Link for more info: https://huggingface.co/datasets/climatebert/climate_sentiment?row=12
from datasets import load_dataset

# Load the train and test splits of the climate_sentiment dataset
splits = ["train", "test"]
ds = {split: ds for split, ds in zip(splits, load_dataset("climatebert/climate_sentiment", split=splits))}

# Show the dataset
ds

  from .autonotebook import tqdm as notebook_tqdm


{'train': Dataset({
     features: ['text', 'label'],
     num_rows: 1000
 }),
 'test': Dataset({
     features: ['text', 'label'],
     num_rows: 320
 })}

In [2]:
# Inspect the first element. For labels, 0 is risk, 1 is neutral, and 2 is opportunity
ds['train'][0]

{'text': '− Scope 3: Optional scope that includes indirect emissions associated with the goods and services supply chain produced outside the organization. Included are emissions from the transport of products from our logistics centres to stores (downstream) performed by external logistics operators (air, land and sea transport) as well as the emissions associated with electricity consumption in franchise stores.',
 'label': 1}

### Pre-process datasets
The dataset needs to be processed by converting all of the text into tokens for the models.

In [3]:
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # Assign padding token
tokenizer.build_inputs_with_special_tokens(tokenizer.all_special_tokens)  # Rebuild vocabulary

# Tokenize dataset
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = ds[split].map(
        lambda x: tokenizer(x['text'], padding = 'max_length', truncation=True), batched=True
    )

In [4]:
tokenized_dataset

{'train': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 1000
 }),
 'test': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 320
 })}

In [5]:
# Display the first element. The tokenized elements are stored in 'input_ids'
tokenized_dataset['train'][0]

{'text': '− Scope 3: Optional scope that includes indirect emissions associated with the goods and services supply chain produced outside the organization. Included are emissions from the transport of products from our logistics centres to stores (downstream) performed by external logistics operators (air, land and sea transport) as well as the emissions associated with electricity consumption in franchise stores.',
 'label': 1,
 'input_ids': [14095,
  41063,
  513,
  25,
  32233,
  8354,
  326,
  3407,
  12913,
  8971,
  3917,
  351,
  262,
  7017,
  290,
  2594,
  5127,
  6333,
  4635,
  2354,
  262,
  4009,
  13,
  34774,
  389,
  8971,
  422,
  262,
  4839,
  286,
  3186,
  422,
  674,
  26355,
  19788,
  284,
  7000,
  357,
  2902,
  5532,
  8,
  6157,
  416,
  7097,
  26355,
  12879,
  357,
  958,
  11,
  1956,
  290,
  5417,
  4839,
  8,
  355,
  880,
  355,
  262,
  8971,
  3917,
  351,
  8744,
  7327,
  287,
  8663,
  7000,
  13,
  50256,
  50256,
  50256,
  50256,
  50256,
  

In [6]:
from transformers import GPT2ForSequenceClassification

model = GPT2ForSequenceClassification.from_pretrained(
    'gpt2',
    num_labels=3,
    id2label={0: 'risk', 1: 'neutral', 2: 'opportunity'},
    label2id={'risk': 0, 'neutral': 1, 'opportunity': 2}
)

# Freeze all the parameters of the base model using param.requires_grad = False
# more info here: https://huggingface.co/transformers/v4.2.2/training.html
for param in model.base_model.parameters():
    param.requires_grad = False

# Use model.score to output the final classification layer for GPT2. In others it may be model.classifier
model.score

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Linear(in_features=768, out_features=3, bias=False)

In [7]:
# Print full model parameters
print(model)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=3, bias=False)
)


### Evaluate base model on test set

In [8]:
model.eval()

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=3, bias=False)
)

In [9]:
import torch

input_ids = tokenized_dataset["test"][0]["input_ids"]
attention_mask = tokenized_dataset["test"][0]["attention_mask"]

# Convert to tensors (assuming attention_mask is a list)
input_ids = torch.tensor([input_ids])
attention_mask = torch.tensor([attention_mask])

with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask)
    logits = outputs.logits
    predicted_label_index = torch.argmax(logits, dim=-1)
    predicted_label = model.config.id2label[predicted_label_index.item()]


print('Predicted label: ', predicted_label)
print('Predicted index: ', predicted_label_index)



Predicted label:  neutral
Predicted index:  tensor([1])


In [10]:
tokenized_dataset['test'][1]

{'text': 'Verizon’s environmental, health and safety management system provides a framework for identifying, controlling, and reducing the risks associated with the environments in which we operate. Besides regular management system assessments, internal and third-party compliance audits and inspections are performed annually at hundreds of facilities worldwide. The goal of these assessments is to identify and correct site-specific issues, and to educate and empower facility managers and supervisors to implement corrective actions. Verizon’s environment, health and safety efforts are directed and supported by experienced experts around the world that support our operations and facilities.',
 'label': 1,
 'input_ids': [13414,
  8637,
  447,
  247,
  82,
  6142,
  11,
  1535,
  290,
  3747,
  4542,
  1080,
  3769,
  257,
  9355,
  329,
  13720,
  11,
  12755,
  11,
  290,
  8868,
  262,
  7476,
  3917,
  351,
  262,
  12493,
  287,
  543,
  356,
  8076,
  13,
  16238,
  3218,
  4542,
  1

In [11]:
from datasets import load_metric

metric = load_metric('accuracy')

def predict(model, tokenized_dataset, split="test"):
  """
  Function to make predictions on a specific split of the tokenized dataset.
  """

  predictions = [] # Collect predictions
  labels = [] # Collect true labels for evaluation
  for i, datapoint in enumerate(tokenized_dataset[split]):
    if i >= 5:
      break
    input_ids = torch.tensor([datapoint["input_ids"]])
    attention_mask = torch.tensor([datapoint["attention_mask"]])

    with torch.no_grad():
      outputs = model(input_ids, attention_mask=attention_mask)
      logits = outputs.logits
      predicted_label_index = torch.argmax(logits, dim=-1)
      predicted_label = model.config.id2label[predicted_label_index.item()]
      predictions.append(predicted_label_index.item())
      labels.append(datapoint['label'])
      
  # Assuming model outputs integers 0, 1, 2 for labels
  # predictions_int = [model.config.label2id[pred] for pred in predictions]
  labels_string = [model.config.id2label[label] for label in labels]

  # # Use converted integer labels for metric calculation
  metric.add_batch(predictions=predictions, references=labels)
  accuracy = metric.compute()
  # accuracy_alt = compute_metrics(predictions, labels)

  return predictions, labels, accuracy#, accuracy_alt

# Make predictions on the test set and calculate accuracy
base_predictions, base_labels, base_accuracy = predict(model, tokenized_dataset, split="test")
print("Test Set Predictions: ", base_predictions)
print("Test Set Labels: ", base_labels)
print("Test Set Accuracy: ", base_accuracy)

  metric = load_metric('accuracy')


Test Set Predictions:  [1, 1, 1, 1, 1]
Test Set Labels:  [0, 1, 1, 0, 0]
Test Set Accuracy:  {'accuracy': 0.4}


False

### Train model without Fine-Tuning

In [12]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {'accuracy': (predictions == labels).mean()}

# Use HuggingFace Trainer class for training and evaluating mbase model
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis",
        learning_rate=2e-3,
        per_device_train_batch_size=1, # Keeping low for low memory
        per_device_eval_batch_size=1,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy='epoch',
        save_strategy='epoch',
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['test'],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    )

trainer.train()

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
 50%|█████     | 500/1000 [03:52<03:57,  2.11it/s]

{'loss': 2.0265, 'grad_norm': 0.27658286690711975, 'learning_rate': 0.001, 'epoch': 0.5}


100%|██████████| 1000/1000 [07:45<00:00,  2.12it/s]

{'loss': 1.9409, 'grad_norm': 14.449419975280762, 'learning_rate': 0.0, 'epoch': 1.0}


                                                   
100%|██████████| 1000/1000 [09:43<00:00,  2.12it/s]

{'eval_loss': 1.417924165725708, 'eval_accuracy': 0.509375, 'eval_runtime': 118.4536, 'eval_samples_per_second': 2.701, 'eval_steps_per_second': 2.701, 'epoch': 1.0}


100%|██████████| 1000/1000 [09:44<00:00,  1.71it/s]

{'train_runtime': 584.3157, 'train_samples_per_second': 1.711, 'train_steps_per_second': 1.711, 'train_loss': 1.9837210083007812, 'epoch': 1.0}





TrainOutput(global_step=1000, training_loss=1.9837210083007812, metrics={'train_runtime': 584.3157, 'train_samples_per_second': 1.711, 'train_steps_per_second': 1.711, 'train_loss': 1.9837210083007812, 'epoch': 1.0})

In [13]:
trainer.evaluate()

100%|██████████| 320/320 [01:50<00:00,  2.89it/s]


{'eval_loss': 1.3843942880630493,
 'eval_accuracy': 0.509375,
 'eval_runtime': 111.0946,
 'eval_samples_per_second': 2.88,
 'eval_steps_per_second': 2.88,
 'epoch': 1.0}

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [7]:
import torch
device = torch.device("mps")

In [8]:
from peft import LoraConfig
config = LoraConfig(
    r=2,  # Rank of LoRA decomposition; low rank to start out quickly training
    lora_alpha=16,  # Learning rate for LoRA weights
    lora_dropout=0.05,  # Dropout rate for LoRA weights
    bias="none",  # Remove biases from the original model
    task_type="CLASSIFICATION"  # Text classification task
    ) #Look at LoRA adapter documentation for additional hyperparameters

In [None]:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained('gpt2')

In [9]:
from peft import get_peft_model
lora_model = get_peft_model(model, config)



In [10]:
lora_model.print_trainable_parameters()

trainable params: 73,728 || all params: 124,515,840 || trainable%: 0.059211743662493065


In [11]:
lora_model.to(device)

PeftModel(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D()
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=2, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=2, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
      

In [60]:
# from codecarbon import EmissionsTracker

# # Initialize carbon tracker to track energy consumption
# tracker = EmissionsTracker(
#     project_name='lora_climate_train',
#     output_dir='./data/carbon_tracking',
#     output_file='emissions.csv',
#     log_level='error'
# )
# tracker.start()
# float = tracker.stop()


FileNotFoundError: [Errno 2] No such file or directory: '/Users/andrewwrist/Documents/Projects/AI_ML/venv/lib/python3.11/site-packages/codecarbon/data/hardware/cpu_power.csv'

In [12]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {'accuracy': (predictions == labels).mean()}

# Use HuggingFace Trainer class for training and evaluating mbase model
lora_trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/lora",
        learning_rate=2e-3,
        per_device_train_batch_size=1, # Keeping low for low memory
        per_device_eval_batch_size=1,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy='epoch',
        save_strategy='epoch',
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['test'],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    )

lora_trainer.train()

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
 50%|█████     | 500/1000 [08:06<08:01,  1.04it/s]

{'loss': 1.4601, 'grad_norm': 0.10177532583475113, 'learning_rate': 0.001, 'epoch': 0.5}


100%|██████████| 1000/1000 [16:27<00:00,  1.12s/it]

{'loss': 1.1495, 'grad_norm': 2.6937546730041504, 'learning_rate': 0.0, 'epoch': 1.0}


                                                   
100%|██████████| 1000/1000 [18:51<00:00,  1.12s/it]

{'eval_runtime': 144.1881, 'eval_samples_per_second': 2.219, 'eval_steps_per_second': 2.219, 'epoch': 1.0}


KeyError: 'eval_loss'

In [15]:
lora_trainer.evaluate()

                                                   
100%|██████████| 1000/1000 [24:51<00:00,  1.12s/it]

{'eval_runtime': 134.6331, 'eval_samples_per_second': 2.377, 'eval_steps_per_second': 2.377, 'epoch': 1.0}


{'eval_runtime': 134.6331,
 'eval_samples_per_second': 2.377,
 'eval_steps_per_second': 2.377,
 'epoch': 1.0}

In [20]:
lora_model.save_pretrained('gpt_lora_climate_sentiment')

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [None]:
from peft import AutoPeftModelForCausalLM
lora_model = AutoPeftModelForCausalLM.from_pretrained('gpt_lora_climate_sentiment')

In [None]:
from transformers import AutoTokenizer

tokenizer_lora = AutoTokenizer.from_pretrained('gpt2')
inputs = tokenizer_lora(...<mask>)