# Assignment 3

Metadata tagging for specification documents is a focused and manageable task that is highly relevant in the context of Retrieval-Augmented Generation (RAG). This task can significantly improve the efficiency and accuracy of information retrieval by ensuring that documents are properly indexed and filtered based on their main components or subjects. Given the scope and capabilities of current models, this task is both feasible and impactful, providing meaningful improvements in practical applications.

## Problem Statement
**Task:** Metadata Tagging for Specification Documents in the Automotive Industry

**Description:** The task involves automatically identifying and tagging the main components or subjects within each specification document. This ensures that the documents, when stored in a vector database, can be easily filtered and retrieved for downstream tasks such as Retrieval-Augmented Generation (RAG).

**Motivation**: Proper metadata tagging is crucial for efficient document retrieval. It enhances the capability of RAG systems by enabling more precise filtering and retrieval of relevant documents. This is particularly important in scenarios where large volumes of documents need to be processed and where specific information needs to be accessed quickly. This is a problem faced in most companies developing a RAG framework that work with confidential and domain-specific data.

## Training and Evaluation Data:

We will create a dataset consisting of specification documents from the automotive industry and their corresponding metadata tags. The training data will be used to fine-tune the model with LoRA (Low-Rank Adaptation), and the evaluation data will test the model's performance. The specifications will include varied components and details to ensure a robust training dataset.

### GPT4o-mini

Prompted GPT4o mini to generate the dataset required for training and evaluation. The prompt is provided below and was used to curate a dataset of 160 records.

### Data Generation Prompt

**Prompt for Generating Metadata-Tagged Specifications**  

SYSTEM  
You are tasked with generating a dataset of 160 specifications from the automotive industry. This dataset will be used for metadata tagging, a specific subtask within requirement normalization and Retrieval-Augmented Generation (RAG). The goal of metadata tagging is to identify the main components or subjects in each specification document. This ensures that when the specifications are stored in a vector database and later retrieved for RAG, they can be easily filtered.

GUIDELINES  
Please follow these guidelines to create the dataset:

1. Specification Content: Each specification should describe a component or feature of a vehicle in detail. Include relevant technical details or requirements for the component, ensuring that the description is lengthy and reflective of real-world situations.

2. Metadata Tags: Each specification should have tags that best describe it. Tags should include the main component of the vehicle and what the specification is describing. Ensure that the tags cover all significant aspects of the specification. Avoid including dimensions or metrics in the metadata tags as they will not be used for filtering retrievals.

3. Variety and Realism: Ensure the dataset is varied in terms of components and specifications. Include different models, brands, and features to cover a broad range of automotive specifications. The model numbers and brand names should be believable but do not need to be accurate.

4. Format: Provide the dataset in JSON format. Each entry should include the specification and its corresponding metadata tags.

EXAMPLE
```
{
    "dataset": [
        {
            "specification": "The BMW X3’s M40i trim includes a turbocharged inline-six engine, adaptive M suspension, and performance brakes. This trim is designed to deliver high levels of performance and sporty handling.",
            "metadata_tags": ["M40i Trim", "BMW X3"]
        },
        {
            "specification": "The Audi Q7’s Premium Plus Package includes a 19-speaker Bang & Olufsen audio system, a panoramic sunroof, and adaptive air suspension. This package enhances both comfort and performance, providing a luxurious driving experience.",
            "metadata_tags": ["Premium Plus Package", "Audi Q7"]
        },
        {
            "specification": "The Infiniti QX80’s Sensory Package includes a 17-speaker audio system, advanced climate control with individual temperature settings, and premium leather seats. This package provides superior comfort and high-quality audio for all passengers.",
            "metadata_tags": ["Sensory Package", "Infiniti QX80"]
        }
    ]
}
```

## Evaluation Metrics

**Target Metric:** We will use Cosine Similarity for evaluating the performance of the model in correctly identifying and tagging the metadata.

**Justification:** Cosine Similarity measures the cosine of the angle between two vectors, providing a robust measure of similarity between the reference and generated labels. It is particularly useful in this context because it captures the semantic similarity between the tags, even if the exact wording differs. This is the same metric used for retrieval in Vector DBs and will provide a practical means of evaluating the performance.

## Methodology

### Data Creation

* **Specification Generation:** We will generate 160 specifications from the automotive industry. Each specification will include a component and its description, with tags identifying the main elements.
* **JSON Format:** The dataset will be provided in JSON format, where each entry includes the specification and its corresponding metadata tags.

### Data Preparation

* **Train-Test Split:** The dataset will be split by a 80:20 ratio for training and testing respectively.
* **Tokenization:** The dataset will be tokenized using a pre-trained tokenizer.
* **Embedding:** We will use a pre-trained model (e.g., SentenceTransformer) to generate embeddings for the specifications and tags.

### Fine-Tuning with LoRA

* **Model:** We will use a pre-trained model (Qwen/Qwen2-0.5B-Instruct) and apply LoRA to adapt the model for the specific task.
* **Training:** The fine-tuning process will involve training the model on the prepared dataset using the Hugging Face Trainer API with specific configurations for batch size, gradient accumulation, and mixed precision training to handle memory constraints.

### Evaluation:
* **Cosine Similarity Calculation:** We will calculate cosine similarity between the reference tags and the generated tags using embeddings from the pre-trained model.
* **Comparison:** We will compare the performance of the base model, the LoRA-adapted model, and the merged model after unloading the LoRA components.

# Assignment 4

## Install Necessary Libraries

In [1]:
!pip install datasets
!pip install peft --no-deps
!pip install sacrebleu
!pip install sentence-transformers

Collecting datasets
  Downloading datasets-2.20.0-py3-none-any.whl.metadata (19 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.5.0,>=2023.1.0 (from fsspec[http]<=2024.5.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.5.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-2.20.0-py3-none-any.whl (547 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m547.8/547.8 kB[0m [31m10.1 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## Import Libraries

In [2]:
import json
import os
import pandas as pd
import torch
import random
import numpy as np
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, EarlyStoppingCallback
from datasets import Dataset
from sklearn.metrics.pairwise import cosine_similarity
from peft import LoraConfig, TaskType, get_peft_model
from sentence_transformers import SentenceTransformer
from sklearn.model_selection import KFold

## Load Data and Initialize Models

Ensure the data file is placed in the same directory as the notebook and pass the name of the file to the variable `file_name`.

In [5]:
file_name = "gpt4o-mini-ner-dataset.json"

In [6]:
# Set seed for reproducibility
torch.manual_seed(42)
random.seed(42)
np.random.seed(42)

# Load the dataset
with open(file_name, "r") as f:
    data = json.load(f)["dataset"]

# Split data into training and evaluation sets
split = round(len(data)*0.8)
train_data = data[1:80]
eval_data = data[80:]

# Model name
model_name = "Qwen/Qwen2-0.5B-Instruct"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", attn_implementation="eager", device_map="auto", temperature=0.5, do_sample=True)
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

tokenizer_config.json:   0%|          | 0.00/1.29k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


config.json:   0%|          | 0.00/659 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/988M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [7]:
model.generation_config

GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}

## Prepare Prompt

In [8]:
def prepare_prompt(example, task):
    prompt = f"""You are a Senior Engineer in an automotive manufacturing company. Your job is to identify metadata tags relating to the from a specification document.

    INSTRUCTIONS
    1. Tags should include the vehicle brand and component that is being described.
    2. Generate tags as comma separated values enclosed within curly braces {{}} for the given Task Specification .
    3. Do not add any additional text or preamble after the tags.

    EXAMPLES
    Example Specification: "{example['specification']}"
    Example Metadata Tags: {{{', '.join(example['metadata_tags'])}}}

    TASK
    Task Specification: "{task['specification']}"
    Generate Metadata Tags: {{"""
    return prompt

# Prepare example and task
example = train_data[0]
eval_tasks = eval_data

# Create evaluation prompts
eval_prompts = [prepare_prompt(example, task) for task in eval_tasks]

## Before Fine-Tuning

### Generate Labels

In [9]:
def generate_labels(model, prompts):
    generated_labels = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_length=500, num_return_sequences=1)
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        generated_labels.append(response)
    return generated_labels

# Generate labels for evaluation data using the base model
generated_labels_base = generate_labels(model, eval_prompts)

In [10]:
generated_labels_base[0]



### Parse Generated Labels

In [11]:
import re

def parse_generated_labels(generated_labels):
    parsed_labels = []
    for label in generated_labels:
        match = re.search(r"Generate Metadata Tags: {(.*?)}", label, re.DOTALL)
        if match:
            tags = match.group(1)
            tags_list = [tag.strip() for tag in tags.split(',')]
            parsed_labels.append(tags_list)
        else:
            parsed_labels.append([])
    return parsed_labels

# Parse the generated labels
parsed_labels_base = parse_generated_labels(generated_labels_base)

### Evaluation using Cosine Similarity

In [12]:
def calculate_cosine_similarity(reference_labels, generated_labels, model):
    # Join labels into space-separated strings
    reference_texts = [' '.join(labels) for labels in reference_labels]
    generated_texts = [' '.join(labels) for labels in generated_labels]

    # Use the model to encode the texts
    reference_embeddings = model.encode(reference_texts)
    generated_embeddings = model.encode(generated_texts)

    # Calculate cosine similarity
    similarities = []
    for ref_emb, gen_emb in zip(reference_embeddings, generated_embeddings):
        similarity = cosine_similarity([ref_emb], [gen_emb])
        similarities.append(similarity[0][0])

    # Return the mean cosine similarity
    return np.mean(similarities)

# Prepare reference labels
reference_labels = [item['metadata_tags'] for item in eval_data]

# Calculate cosine similarity for the base model
cosine_similarity_score = calculate_cosine_similarity(reference_labels, parsed_labels_base, embedding_model)
print(f'Cosine Similarity: {cosine_similarity_score}')

Cosine Similarity: 0.6464983820915222


## Fine-Tuning

### Prepare Data (Tokenization)

In [13]:
def prepare(data, tokenizer):
    """
    Prepares the dataset for Causal Language Modeling by appending example and task specifications with labels,
    tokenizing the text, and creating input tensors.

    Args:
    - data: List of dictionaries, where each dictionary contains 'specification' and 'metadata_tags' keys.
    - tokenizer: Tokenizer to be used for tokenizing the text.

    Returns:
    - tokenized_dataset: Hugging Face Dataset object containing tokenized inputs and attention masks.
    """
    example = data[0]
    input_sequences = []
    labels = []

    for datum in data:
        # Construct the input prompt
        prompt = f"""You are a Senior Engineer in an automotive manufacturing company. Your job is to identify metadata tags relating to the from a specification document.

        INSTRUCTIONS
        1. Tags should include the vehicle brand and component that is being described.
        2. Generate tags as comma separated values enclosed within curly braces {{}} for the given Task Specification .
        3. Do not add any additional text or preamble after the tags.

        EXAMPLES
        Example Specification: "{example['specification']}"
        Example Metadata Tags: {{{', '.join(example['metadata_tags'])}}}

        TASK
        Task Specification: "{datum['specification']}"
        Generate Metadata Tags: {{"""

        # Expected output (metadata tags)
        tags = ', '.join(datum['metadata_tags']) + "}"

        # Concatenate the prompt and the tags to form the full input sequence
        full_sequence = prompt + tags
        input_sequences.append(full_sequence)

    # Tokenize the full sequences
    tokenized_output = tokenizer(input_sequences, return_tensors="pt", padding=True, truncation=True)

    input_ids = tokenized_output["input_ids"]
    attention_mask = tokenized_output["attention_mask"]

    # Create labels by copying input_ids
    labels = input_ids.clone()

    # Mask the input prompt part in the labels by setting them to -100 (ignored in loss calculation)
    for i, datum in enumerate(data):
        prompt_length = len(tokenizer(prompt)["input_ids"]) - 1  # -1 to exclude the last '{'
        labels[i, :prompt_length] = -100

    # Create a Dataset object
    dataset_dict = {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }

    return Dataset.from_dict(dataset_dict)

# Prepare training and evaluation datasets
tokenized_train_dataset = prepare(train_data, tokenizer)
tokenized_eval_dataset = prepare(eval_data, tokenizer)

# Decode the first set of input_ids to verify the tokenization
print(tokenizer.decode(tokenized_train_dataset["input_ids"][0]))

You are a Senior Engineer in an automotive manufacturing company. Your job is to identify metadata tags relating to the from a specification document.

        INSTRUCTIONS
        1. Tags should include the vehicle brand and component that is being described. 
        2. Generate tags as comma separated values enclosed within curly braces {} for the given Task Specification .
        3. Do not add any additional text or preamble after the tags.

        EXAMPLES
        Example Specification: "The Toyota Prius, model year 2024, is designed to achieve a fuel efficiency rating of at least 25 miles per gallon in city driving conditions. This hybrid vehicle combines a highly efficient 1.8L four-cylinder engine with an electric motor, utilizing regenerative braking to optimize fuel consumption and reduce overall environmental impact."
        Example Metadata Tags: {Hybrid Vehicle, Toyota Prius}

        TASK
        Task Specification: "The Toyota Prius, model year 2024, is designed to ac

### LoRA Fine-Tuning Configuration

In [14]:
# Define LoRA configuration
lora_config = LoraConfig(
    init_lora_weights="gaussian",
    r=16,
    use_rslora=True,
    task_type=TaskType.CAUSAL_LM,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"]
)

# Apply LoRA to the model
lora_model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir='./results_lora',            # Directory to save model checkpoints and other outputs
    num_train_epochs=5,                     # Number of training epochs
    per_device_train_batch_size=8,          # Batch size per device (GPU/CPU) during training
    per_device_eval_batch_size=8,           # Batch size per device (GPU/CPU) during evaluation
    gradient_accumulation_steps=1,          # Number of updates steps to accumulate gradients
    warmup_steps=200,                       # Number of warmup steps for learning rate scheduler
    learning_rate=1e-4,                     # Learning rate for optimizer
    weight_decay=0.01,                      # Weight decay for regularization
    logging_dir='./logs',                   # Directory to save training logs
    fp16=True,                              # Enable mixed precision training (16-bit floating point)
    logging_steps=10,                       # Log training progress every 10 steps
    load_best_model_at_end=True,            # Load the best model at the end of training based on evaluation metric
    eval_strategy="steps",            # Evaluate the model at regular intervals (defined by logging_steps)
    save_steps=10,                          # Save checkpoint every 10 steps (you can adjust this based on your needs)
    save_total_limit=2,                     # Limit the total number of saved checkpoints (older ones will be deleted)
)

### LoRA Training with Cross Validation

In [15]:
# Initialize K-Fold Cross-Validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Placeholder to store evaluation results for each fold
evaluation_results = []

# Loop through each fold
for fold, (train_idx, eval_idx) in enumerate(kf.split(tokenized_train_dataset)):
    print(f"Starting fold {fold + 1}/{kf.n_splits}")

    # Split dataset into training and evaluation sets for this fold
    train_dataset = tokenized_train_dataset.select(train_idx)
    eval_dataset = tokenized_train_dataset.select(eval_idx)

    # Apply LoRA to the model for each fold
    lora_model = get_peft_model(model, lora_config)

    # Define Trainer with EarlyStoppingCallback
    trainer = Trainer(
        model=lora_model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # Stop if no improvement after 3 eval steps
    )

    # Train the model
    trainer.train()

    # Evaluate the model
    metrics = trainer.evaluate()
    evaluation_results.append(metrics)

    print(f"Metrics for fold {fold + 1}: {metrics}")

# After cross-validation, you may want to average the metrics across all folds
average_metrics = {key: torch.tensor([result[key] for result in evaluation_results]).mean().item() for key in evaluation_results[0]}
print(f"Average Metrics Across Folds: {average_metrics}")


Starting fold 1/5


We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)


Step,Training Loss,Validation Loss
10,10.5299,9.372646
20,8.7166,5.854324
30,3.819,0.911113
40,0.4403,0.202496


Metrics for fold 1: {'eval_loss': 0.20249615609645844, 'eval_runtime': 0.5215, 'eval_samples_per_second': 30.683, 'eval_steps_per_second': 3.835, 'epoch': 5.0}
Starting fold 2/5


Step,Training Loss,Validation Loss
10,10.484,9.723928
20,8.4191,5.754099
30,3.2496,0.571081
40,0.2909,0.163233


Metrics for fold 2: {'eval_loss': 0.16323253512382507, 'eval_runtime': 0.5116, 'eval_samples_per_second': 31.272, 'eval_steps_per_second': 3.909, 'epoch': 5.0}
Starting fold 3/5


Step,Training Loss,Validation Loss
10,10.428,9.626274
20,8.475,5.727683
30,3.3157,0.664175
40,0.2839,0.215983


Metrics for fold 3: {'eval_loss': 0.2159833461046219, 'eval_runtime': 0.5256, 'eval_samples_per_second': 30.443, 'eval_steps_per_second': 3.805, 'epoch': 5.0}
Starting fold 4/5


Step,Training Loss,Validation Loss
10,10.0957,10.623943
20,8.4148,6.19755
30,3.1489,0.502957
40,0.3215,0.090625


Metrics for fold 4: {'eval_loss': 0.09062479436397552, 'eval_runtime': 0.5249, 'eval_samples_per_second': 30.48, 'eval_steps_per_second': 3.81, 'epoch': 5.0}
Starting fold 5/5


Step,Training Loss,Validation Loss
10,10.45,10.176892
20,8.2547,5.998636
30,3.2372,0.587929
40,0.3147,0.157185


Metrics for fold 5: {'eval_loss': 0.1571851670742035, 'eval_runtime': 0.4847, 'eval_samples_per_second': 30.947, 'eval_steps_per_second': 4.126, 'epoch': 5.0}
Average Metrics Across Folds: {'eval_loss': 0.16590438783168793, 'eval_runtime': 0.5136600136756897, 'eval_samples_per_second': 30.764999389648438, 'eval_steps_per_second': 3.8970000743865967, 'epoch': 5.0}


In [16]:
# Optional: Train on the entire dataset after cross-validation
final_trainer = Trainer(
    model=lora_model,  # The LoRA model initialized earlier
    args=training_args,  # Reuse the same training arguments
    train_dataset=tokenized_train_dataset,  # Use the entire dataset for final training
    eval_dataset=tokenized_eval_dataset,  # Optionally, you can still evaluate on a validation set if needed
)

final_trainer.train()

Step,Training Loss,Validation Loss
10,0.1423,0.227111
20,0.098,0.16157
30,0.0595,0.135748
40,0.0421,0.133133
50,0.0308,0.133619


TrainOutput(global_step=50, training_loss=0.07453902781009675, metrics={'train_runtime': 43.4741, 'train_samples_per_second': 9.086, 'train_steps_per_second': 1.15, 'total_flos': 230874691392000.0, 'train_loss': 0.07453902781009675, 'epoch': 5.0})

## After Fine-Tuning

### Evaluation using Cosine Similarity

In [17]:
# Generate labels for evaluation data using the fine-tuned model
generated_labels_lora = generate_labels(lora_model, eval_prompts)

# Parse the generated labels
parsed_labels_lora = parse_generated_labels(generated_labels_lora)

# Calculate cosine similarity for the fine-tuned model
cosine_similarity_lora = calculate_cosine_similarity(reference_labels, parsed_labels_lora, embedding_model)
print(f'Fine-Tuned Model - Cosine Similarity: {cosine_similarity_lora}')


Fine-Tuned Model - Cosine Similarity: 0.9584165811538696


In [18]:
generated_labels_lora[0]



In [19]:
parsed_labels_lora[0]

['Active Driving Assistant', 'BMW 3 Series']

In [20]:
reference_labels[0]

['Active Driving Assistant', 'BMW 3 Series']

### Merge and Unload LoRA model

In [21]:
trained_model = lora_model.merge_and_unload()

### Evaluation using Cosine Similarity after Merging

In [22]:
# Generate labels for evaluation data using the fine-tuned model
generated_labels_trained = generate_labels(trained_model, eval_prompts)

# Parse the generated labels
parsed_labels_trained = parse_generated_labels(generated_labels_trained)

# Calculate cosine similarity for the fine-tuned model
cosine_similarity_trained = calculate_cosine_similarity(reference_labels, parsed_labels_trained, embedding_model)
print(f'Fine-Tuned Model - Cosine Similarity: {cosine_similarity_trained}')

Fine-Tuned Model - Cosine Similarity: 0.9521085619926453


In [23]:
parsed_labels_trained[0]

['Active Driving Assistant', 'BMW 3 Series']

In [24]:
for indx, data in enumerate(eval_tasks):
  data['gen_metadata_tags'] = parsed_labels_base[indx]
  data['lora_metadata_tags'] = parsed_labels_lora[indx]
  data['trained_metadata_tags'] = parsed_labels_trained[indx]
lora_df = pd.DataFrame(eval_tasks)
lora_df['metadata_tags'] = lora_df['metadata_tags'].apply(lambda x: ', '.join(x))
lora_df['gen_metadata_tags'] = lora_df['gen_metadata_tags'].apply(lambda x: ', '.join(x))
lora_df['lora_metadata_tags'] = lora_df['lora_metadata_tags'].apply(lambda x: ', '.join(x))
lora_df['trained_metadata_tags'] = lora_df['trained_metadata_tags'].apply(lambda x: ', '.join(x))
lora_df.head()

Unnamed: 0,specification,metadata_tags,gen_metadata_tags,lora_metadata_tags,trained_metadata_tags
0,The BMW 3 Series’ Active Driving Assistant sho...,"Active Driving Assistant, BMW 3 Series","Blind Spot Monitoring, Lane Departure Warning,...","Active Driving Assistant, BMW 3 Series","Active Driving Assistant, BMW 3 Series"
1,The Mazda CX-5’s i-Activ AWD system should inc...,"i-Activ AWD System, Mazda CX-5","i-Activ AWD System, Mazda CX-5, Wheel Slippage...","i-Activ AWD System, Mazda CX-5","i-Activ AWD System, Mazda CX-5"
2,The Hyundai Sonata’s remote start system shoul...,"Remote Start System, Hyundai Sonata","Remote Start System, Hyundai Sonata","Remote Start System, Hyundai Sonata","Remote Start System, Hyundai Sonata"
3,The Jeep Grand Cherokee’s Quadra-Lift air susp...,"Quadra-Lift Air Suspension System, Jeep Grand ...","Quadra-Lift Air Suspension System, Jeep Grand ...","Quadra-Lift Air Suspension System, Jeep Grand ...","Quadra-Lift Air Suspension, Jeep Grand Cherokee"
4,The Nissan 370Z’s NISMO performance exhaust sy...,"NISMO Performance Exhaust System, Nissan 370Z","NISMO Performance Exhaust System, Dual Exhaust...","NISMO Performance Exhaust System, Nissan 370Z","NISMO Exhaust System, Nissan 370Z"


## Results

**Evaluation Metric (Cosine Similarity):**

In [25]:
print(f'Pre-Trained Model - Cosine Similarity: {cosine_similarity_score}')
print(f'Fine-Tuned Model - Cosine Similarity: {cosine_similarity_lora}')
print(f'Merged Fine-Tuned Model - Cosine Similarity: {cosine_similarity_trained}')

Pre-Trained Model - Cosine Similarity: 0.6464983820915222
Fine-Tuned Model - Cosine Similarity: 0.9584165811538696
Merged Fine-Tuned Model - Cosine Similarity: 0.9521085619926453
