<a href="https://colab.research.google.com/github/UritiSrikanth/Assignment-3--LLM-coding-and-report-submission/blob/main/Assignment_3_LLM_coding_and_report_submission.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Assignment_3_LLM_coding_and_report_submission**

**Dataset Link:** https://huggingface.co/datasets/fka/awesome-chatgpt-prompts

**GitHub Link:** https://github.com/UritiSrikanth/Assignment-3--LLM-coding-and-report-submission


## **CODING PART:**

In [20]:
!pip install datasets
!pip install evaluate



In [21]:
import matplotlib.pyplot as plt
from datasets import load_dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments, DataCollatorWithPadding, TrainerCallback
import evaluate
import numpy as np
import torch



We start by loading the dataset using the load_dataset function from the datasets library. In this case, we load the "**fka/awesome-chatgpt-prompts**" dataset, which contains columns 'act' and 'prompt'.

In [22]:
# Step 1: Load the dataset_prompts
dataset_prompts = load_dataset("fka/awesome-chatgpt-prompts")
# Print the dataset structure
print("dataset_prompts structure:", dataset_prompts)

Using the latest cached version of the dataset since fka/awesome-chatgpt-prompts couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /root/.cache/huggingface/datasets/fka___awesome-chatgpt-prompts/default/0.0.0/7baf3f8a5f3d38acc585d42d12193b27baf8cf79 (last modified on Tue Aug 13 14:38:46 2024).


dataset_prompts structure: DatasetDict({
    train: Dataset({
        features: ['act', 'prompt'],
        num_rows: 153
    })
})


In [23]:
# Check the first few entries to understand the data
print(dataset_prompts['train'][:5])

# Analyze the 'act' column to understand the distribution
acts = [entry['act'] for entry in dataset_prompts['train']]
unique_acts, counts = np.unique(acts, return_counts=True)

# Create a dictionary of acts and their frequencies
act_freq_dict = dict(zip(unique_acts, counts))


{'act': ['Linux Terminal', 'English Translator and Improver', '`position` Interviewer', 'JavaScript Console', 'Excel Sheet'], 'prompt': ['I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. do not write explanations. do not type commands unless I instruct you to do so. when i need to tell you something in english, i will do so by putting text inside curly brackets {like this}. my first command is pwd', 'I want you to act as an English translator, spelling corrector and improver. I will speak to you in any language and you will detect the language, translate it and answer in the corrected and improved version of my text, in English. I want you to replace my simplified A0-level words and sentences with more beautiful and elegant, upper level English words and sentences. Keep the meaning same, but make them more literary. I want y

Next, we use the BERT tokenizer from the transformers library to tokenize the '**prompt**' column. The tokenize_function pads every sequence to the maximum length (padding="max_length") and truncates longer ones (truncation=True), so that all tokenized sequences have the same length. We then apply this function to the dataset with the map method, using batched=True so that examples are processed in batches for efficiency.

In [24]:
# Step 2: Tokenize the data
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def tokenize_function(examples):
    return tokenizer(examples['prompt'], padding="max_length", truncation=True)

processed_dataset_prompts = dataset_prompts.map(tokenize_function, batched=True)
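To make the effect of padding and truncation concrete, here is a minimal pure-Python sketch that does not call the tokenizer. The helper `pad_or_truncate`, the sample token ids, and the tiny `max_length` are illustrative assumptions. The real BERT tokenizer also handles special tokens during truncation and, with these settings, pads to the model's maximum length (512 for `bert-base-uncased`).

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), each exactly max_length long.

    Hypothetical helper for illustration only; it simply cuts the tail,
    whereas the real tokenizer keeps the closing special token.
    """
    ids = list(token_ids)[:max_length]      # truncation: drop anything past max_length
    mask = [1] * len(ids)                   # 1 marks a real token
    n_pad = max_length - len(ids)           # number of padding positions to append
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


# A short sequence is padded, a long one is truncated.
short_ids, short_mask = pad_or_truncate([101, 7592, 102], max_length=5)
long_ids, long_mask = pad_or_truncate([101, 1, 2, 3, 4, 5, 102], max_length=5)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because every example is already padded to a fixed length at this step, the data collator used later receives sequences that are all the same length.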

Since the dataset does not contain labels, we define a function add_labels that assigns a random binary label **(0 or 1)** to each example. This simulates a classification problem so that the training and evaluation pipeline can be exercised end to end; because the labels are random, the resulting accuracy reflects chance rather than anything learned from the prompts. We map this function over the tokenized dataset to produce a labeled dataset.

We use DataCollatorWithPadding to assemble batches in which every sequence has the same length, which is required to stack the examples into tensors. The labeled dataset is then split into training and evaluation subsets using a 90-10 split, providing separate data for training and validation.

In [25]:
# Step 3: Add dummy labels to the dataset_prompts (e.g., binary classification)
import random

def add_labels(examples):
    examples['labels'] = [random.randint(0, 1) for _ in range(len(examples['prompt']))]
    return examples

processed_dataset_prompts = processed_dataset_prompts.map(add_labels, batched=True)

# Prepare data for training
sequence_data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Split dataset_prompts into training and evaluation
processed_dataset_prompts = processed_dataset_prompts['train'].train_test_split(test_size=0.1)
train_dataset_prompts = processed_dataset_prompts['train']
eval_dataset_prompts = processed_dataset_prompts['test']
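The labels above come from the global `random` module without a seed, so each run produces a different labeling. The following sketch shows one way to make random labels reproducible and to compute the majority-class baseline that any accuracy figure should be compared against. The helper name `make_random_labels` and the seed value are assumptions for illustration, not part of the notebook.

```python
import random
from collections import Counter


def make_random_labels(n, seed):
    """Draw n labels from {0, 1} using a local, seeded generator."""
    rng = random.Random(seed)   # local generator; the global random state is untouched
    return [rng.randint(0, 1) for _ in range(n)]


# 153 is the number of rows reported for the dataset above.
labels_a = make_random_labels(153, seed=42)
labels_b = make_random_labels(153, seed=42)
print("same labels on both runs:", labels_a == labels_b)

# Accuracy obtained by always predicting the more frequent label.
counts = Counter(labels_a)
majority_baseline = max(counts.values()) / len(labels_a)
print("label counts:", dict(counts))
print("majority-class baseline:", round(majority_baseline, 3))
```

The `train_test_split` method also accepts a `seed` argument, which fixes the split in the same way.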

We initialize a BertForSequenceClassification model with two output labels, suitable for binary classification. We define TrainingArguments, specifying the output directory, evaluation strategy, logging settings, learning rate, batch sizes, number of epochs, and weight decay for regularization.

We load the accuracy metric using the evaluate library and define the compute_metrics function, which calculates accuracy from the model's predictions and the true labels. This function is used during evaluation to track model performance.

To record training and evaluation metrics, we create a custom MetricsCallback class that inherits from TrainerCallback. The on_log method stores the training loss, evaluation loss, and evaluation accuracy whenever the Trainer logs them. The on_epoch_end method computes training accuracy at the end of each epoch by iterating over the training dataloader and comparing predictions to the true labels.




In [26]:
# Step 4: Set up the Trainer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    logging_dir='./logs',
    logging_steps=10,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    save_total_limit=1,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [27]:
# Define the evaluation metric
accuracy_metric = evaluate.load("accuracy")



Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]

In [28]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
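As a rough illustration of what compute_metrics does, the sketch below performs the same argmax-and-compare step with plain NumPy. The logits and labels are made-up values chosen for the example, not output from this notebook.

```python
import numpy as np

# Hypothetical logits for four examples and two classes.
logits = np.array([
    [0.32, 0.09],
    [0.01, 0.13],
    [0.20, 0.10],
    [-0.40, 0.70],
])
labels = np.array([0, 1, 1, 1])

# The predicted class is the index of the largest logit in each row.
predictions = logits.argmax(axis=-1)

# Accuracy is the fraction of predictions that match the labels.
accuracy = float((predictions == labels).mean())
print(predictions, accuracy)
```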

In [29]:
class MetricsCallback(TrainerCallback):
    def __init__(self):
        self.train_loss = []
        self.eval_loss = []
        self.train_acc = []
        self.eval_acc = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None:
            if 'loss' in logs:
                self.train_loss.append((state.global_step, logs['loss']))
            if 'eval_loss' in logs:
                self.eval_loss.append((state.global_step, logs['eval_loss']))
            if 'eval_accuracy' in logs:
                self.eval_acc.append((state.global_step, logs['eval_accuracy']))

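    def on_epoch_end(self, args, state, control, **kwargs):
        # Compute accuracy on the training set at the end of every epoch.
        # Note: this uses the global `trainer` and `model` objects, so the
        # Trainer must be created before training starts.
        train_dataloader = trainer.get_train_dataloader()
        model.eval()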
        total_correct = 0
        total_samples = 0
        for batch in train_dataloader:
            inputs = {k: v.to(args.device) for k, v in batch.items()}
            with torch.no_grad():
                outputs = model(**inputs)
            predictions = outputs.logits.argmax(dim=-1)
            total_correct += (predictions == inputs['labels']).sum().item()
            total_samples += predictions.size(0)
        train_accuracy = total_correct / total_samples
        self.train_acc.append((state.global_step, train_accuracy))


In [30]:
metrics_callback = MetricsCallback()

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset_prompts,
    eval_dataset=eval_dataset_prompts,
    tokenizer=tokenizer,
    data_collator=sequence_data_collator,
    compute_metrics=compute_metrics,
    callbacks=[metrics_callback],
)

In [31]:
# Train the model
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.805422 | 0.375 |
| 2 | 0.743500 | 0.702645 | 0.5 |
| 3 | 0.666100 | 0.714033 | 0.6875 |


TrainOutput(global_step=27, training_loss=0.6872752684134024, metrics={'train_runtime': 58.4727, 'train_samples_per_second': 7.029, 'train_steps_per_second': 0.462, 'total_flos': 108138643752960.0, 'train_loss': 0.6872752684134024, 'epoch': 3.0})
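The step count and the "No log" entry in the first epoch can be checked with a short calculation. This is a sketch under stated assumptions: a single device, no gradient accumulation, and a test split whose size is rounded up. The last assumption is consistent with the reported accuracies, which are all multiples of 1/16.

```python
import math

n_rows = 153          # rows in the dataset, from the structure printed above
test_fraction = 0.1   # test_size passed to train_test_split
batch_size = 16       # per_device_train_batch_size
epochs = 3            # num_train_epochs
logging_steps = 10    # logging_steps

# Assumed rounding: the evaluation split size is rounded up.
n_eval = math.ceil(n_rows * test_fraction)
n_train = n_rows - n_eval

# The last, partially filled batch still counts as one optimizer step.
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs

# Global steps at which the training loss is logged.
logged_steps = list(range(logging_steps, total_steps + 1, logging_steps))

print(n_eval, n_train, steps_per_epoch, total_steps, logged_steps)
```

Under these assumptions the total matches the reported `global_step=27`. The first training-loss log falls at step 10, after the first epoch has already ended at step 9, which would explain why the first epoch shows no training loss.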

After training, we evaluate the model on the validation set using trainer.evaluate(). This method returns the evaluation results, including validation loss and accuracy, which are printed out for review.

In [35]:
# Step 5: Evaluate the model
evaluation_summary = trainer.evaluate()

# Print the evaluation results
print("Evaluation__results:", evaluation_summary)

Evaluation__results: {'eval_loss': 0.7140331864356995, 'eval_accuracy': 0.6875, 'eval_runtime': 0.6224, 'eval_samples_per_second': 25.707, 'eval_steps_per_second': 1.607, 'epoch': 3.0}
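Because the labels were assigned at random, it is worth asking how often an accuracy of 0.6875 would arise by guessing alone. The sketch below computes that binomial tail probability. It assumes 16 evaluation examples (inferred from the accuracies being multiples of 1/16) and treats each prediction as an independent fair coin flip.

```python
from math import comb

n_eval = 16                                  # assumed evaluation set size
n_correct = round(0.6875 * n_eval)           # number of correct predictions

# Probability of getting at least n_correct right out of n_eval by fair guessing.
favourable = sum(comb(n_eval, k) for k in range(n_correct, n_eval + 1))
p_at_least = favourable / 2 ** n_eval

print(n_correct, round(p_at_least, 3))
```

Under these assumptions, a result at least this good occurs in roughly one run out of ten by chance, so the reported accuracy is not evidence that the model learned anything from the prompts.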


# **Predictions**

In [36]:
# Provided sentences
sentences = [
    "I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. do not write explanations. do not type commands unless I instruct you to do so. when i need to tell you something in english, i will do so by putting text inside curly brackets {like this}. my first command is pwd.",
    "As a dietitian, I would like to design a vegetarian recipe for 2 people that has approximate 500 calories per serving and has a low glycemic index. Can you please provide a suggestion?",
    "I want you to act a psychologist. i will provide you my thoughts. I want you to give me scientific suggestions that will make me feel better. my first thought, { typing here your thought, if you explain in more detail, i think you will get a more accurate answer.",

]

# Hand-assigned labels for the provided sentences, for illustration only.
# The model was trained on random 0/1 labels, so these names are not classes it learned.
true_labels = ["command", "statement", "statement"]

# Tokenize the sentences
tokenized_sentences = tokenizer(sentences, padding="max_length", truncation=True, return_tensors="pt")

# Move tensors to the same device as the model
input_ids = tokenized_sentences['input_ids'].to(model.device)
attention_mask = tokenized_sentences['attention_mask'].to(model.device)

# Predict with the model
model.eval()
with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask)
    logits = outputs.logits
    predictions = outputs.logits.argmax(dim=-1).cpu().numpy()

# Map numeric predictions to labels.
# The model has num_labels=2, so only indices 0 and 1 can be predicted; index 2 is never used.
label_mapping = {0: "command", 1: "statement", 2: "question"}
predicted_labels = [label_mapping[pred] for pred in predictions]

# Print the results
for i, sentence in enumerate(sentences):
    print(f"Sentence {i+1}:")
    print(f"Text: {sentence}")
    print(f"True Label: {true_labels[i]}")
    print(f"Predicted Label: {predicted_labels[i]}")
    print()

Sentence 1:
Text: I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. do not write explanations. do not type commands unless I instruct you to do so. when i need to tell you something in english, i will do so by putting text inside curly brackets {like this}. my first command is pwd.
True Label: command
Predicted Label: command

Sentence 2:
Text: As a dietitian, I would like to design a vegetarian recipe for 2 people that has approximate 500 calories per serving and has a low glycemic index. Can you please provide a suggestion?
True Label: statement
Predicted Label: statement

Sentence 3:
Text: I want you to act a psychologist. i will provide you my thoughts. I want you to give me scientific suggestions that will make me feel better. my first thought, { typing here your thought, if you explain in more detail, i think you will g
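The predicted label is simply the index of the larger logit, which says nothing about how confident the model is. As a sketch, the code below applies a softmax to the logits that this notebook prints for the first two sentences in its final output. The `softmax` helper is written here for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)   # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Logits reported by the notebook for sentences 1 and 2.
logits = np.array([
    [0.3216548, 0.08792661],
    [0.01182105, 0.13202178],
])

probs = softmax(logits)
print(probs.round(3))
print(probs.argmax(axis=-1))
```

Both winning probabilities are only slightly above 0.5, which is what one would expect from a classifier trained on random labels.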

# **Final Predictions**





In [37]:
# Map numeric predictions to labels.
# The model has num_labels=2, so only indices 0 and 1 can be predicted; index 2 is never used.
label_mapping = {0: "command", 1: "statement", 2: "question"}
predicted_labels = [label_mapping[pred] for pred in predictions]

# Print raw logits, true labels, and predicted labels
for i, sentence in enumerate(sentences):
    print(f"Sentence {i+1}:")
    print(f"Text: {sentence}")
    print(f"Logits: {logits[i].cpu().numpy()}")  # Print raw logits
    print(f"True Label: {true_labels[i]}")
    print(f"Predicted Label: {predicted_labels[i]}")
    print()

# Check for any discrepancies in the label mapping
print("Predictions:", predictions)
print("True Labels:", true_labels)
print("Predicted Labels:", predicted_labels)

Sentence 1:
Text: I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. do not write explanations. do not type commands unless I instruct you to do so. when i need to tell you something in english, i will do so by putting text inside curly brackets {like this}. my first command is pwd.
Logits: [0.3216548  0.08792661]
True Label: command
Predicted Label: command

Sentence 2:
Text: As a dietitian, I would like to design a vegetarian recipe for 2 people that has approximate 500 calories per serving and has a low glycemic index. Can you please provide a suggestion?
Logits: [0.01182105 0.13202178]
True Label: statement
Predicted Label: statement

Sentence 3:
Text: I want you to act a psychologist. i will provide you my thoughts. I want you to give me scientific suggestions that will make me feel better. my first thought, { typing here