<a href="https://colab.research.google.com/github/KaifAhmad1/code-test/blob/main/Llumo_AI_Assignment_Mohd_Kaif.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Fine Tuning Meta's Llama 3.2B Model on Meta Review Summarization Task**
This notebook demonstrates the process of fine-tuning the Meta LLaMA 3.2B model for summarizing academic paper meta-reviews. We'll go through the entire pipeline, from setting up the environment to evaluating the model's performance.


**First, let's install the necessary libraries**

In [1]:
!pip install -qU transformers datasets evaluate rouge_score trl peft bitsandbytes accelerate xformer bert-score

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/44.4 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.4/44.4 kB[0m [31m1.3 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.9/9.9 MB[0m [31m74.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m471.6/471.6 kB[0m [31m30.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.0/84.0 kB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m316.6/316.6 kB[0m [31m19.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m320.7/320.7 kB[0m [31m20.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m122.4/122.4 MB[0m [31m13.4 MB/s[0m eta [36m

In [2]:
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer
import evaluate
import matplotlib.pyplot as plt
from accelerate import Accelerator
from huggingface_hub import notebook_login
from transformers import pipeline
import os
import plotly.express as px
import plotly.graph_objects as go

In [3]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [5]:
# Enable xformers for optimized attention
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# Initialize accelerator
accelerator = Accelerator()

In [6]:
# Load and explore the dataset
dataset = load_dataset("zqz979/meta-review")
print(f"Dataset size: {len(dataset['train'])} train, {len(dataset['validation'])} validation, {len(dataset['test'])} test")

print("\nSample Meta-Review:")
print(dataset['train'][0]['Input'][:500] + "...")
print("\nSample Summary:")
print(dataset['train'][0]['Output'])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/1.65k [00:00<?, ?B/s]

train.csv:   0%|          | 0.00/83.9M [00:00<?, ?B/s]

validation.csv:   0%|          | 0.00/18.0M [00:00<?, ?B/s]

test.csv:   0%|          | 0.00/18.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/7692 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/1648 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1649 [00:00<?, ? examples/s]

Dataset size: 7692 train, 1648 validation, 1649 test

Sample Meta-Review:
In this paper, the author investigates how to utilize large-scale human video to train dexterous robot manipulation skills. To leverage the information from the Internet videos, the author proposes a handful of techniques to pre-process the video data to extract the action information. Then the network is trained on the extracted hand data and deployed to the real robot with some human demonstration collected by teleoperation for fine-tuning. Experiments show that the proposed pipeline can solve...

Sample Summary:
This paper studies how to learn dexterous manipulation from human videos.    In the initial review, the reviewer appreciated the direction and real-world experiment but also raised  concerns about the need of special sensor for tracking. During rebuttal, the authors effectively addressed this concern by providing additional experiment results, and reviewers were satisfied with the response.  AC would l

In [7]:
# Load tokenizer
model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Configure quantization for faster training and lower memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16  # Use bf16 for computation
)

# Load model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True
)

# Enable gradient checkpointing and disable caching for memory efficiency
model.config.use_cache = False
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

tokenizer_config.json:   0%|          | 0.00/50.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/301 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/843 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/185 [00:00<?, ?B/s]

In [8]:
# Improved prompt for summarization
def generate_summary_prompt(meta_review):
    return f"""As an AI trained to summarize meta-reviews of academic papers, your task is to provide a concise and informative summary that captures the key points of the following meta-review. Focus on these aspects:

1. Overall assessment: The general consensus on the paper's quality and contribution.
2. Strengths: The main positive points highlighted by reviewers.
3. Weaknesses: Primary concerns or criticisms raised.
4. Recommendations: Any suggestions for improvement or future work.
5. Decision: The final verdict (e.g., accept, reject, revise).

Ensure your summary is objective, clear, and captures the essence of the meta-review without specific details about individual reviewers' comments. Aim for a length of 3-5 sentences.

Meta-review:
{meta_review}

Summary:"""

# Preprocess Data
def preprocess_function(examples):
    inputs = [generate_summary_prompt(review) for review in examples["Input"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
    labels = tokenizer(examples["Output"], max_length=128, truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_train = dataset['train'].map(preprocess_function, batched=True, remove_columns=dataset['train'].column_names, num_proc=4)
tokenized_eval = dataset['validation'].map(preprocess_function, batched=True, remove_columns=dataset['validation'].column_names, num_proc=4)

Map (num_proc=4):   0%|          | 0/7692 [00:00<?, ? examples/s]

Map (num_proc=4):   0%|          | 0/1648 [00:00<?, ? examples/s]

In [9]:
# Define LoRA Configuration with smaller rank for faster training
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",]
)

In [12]:
# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=10,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=0,
    logging_steps=500,
    learning_rate=2e-4,
    weight_decay=0.001,
    bf16=True,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="cosine",
    report_to="tensorboard",
    gradient_checkpointing=True
)



In [13]:
# Define evaluation metric
rouge = evaluate.load('rouge')

def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions

    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    labels_ids[labels_ids == -100] = tokenizer.pad_token_id
    label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)

    rouge_output = rouge.compute(predictions=pred_str, references=label_str, use_stemmer=True)
    return {
        'rouge1': rouge_output['rouge1'].mid.fmeasure,
        'rouge2': rouge_output['rouge2'].mid.fmeasure,
        'rougeL': rouge_output['rougeL'].mid.fmeasure,
    }

# Set up trainer
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    peft_config=lora_config,
    dataset_text_field="Input",
    max_seq_length=256,
    compute_metrics=compute_metrics,
)

# Train the model
print("Starting fast model training...")
trainer.train()


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Starting fast model training...


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss
500,2.0173
1000,1.9063
1500,1.8517
2000,1.8371
2500,1.7723
3000,1.7573
3500,1.693
4000,1.6788
4500,1.623
5000,1.5994


TrainOutput(global_step=9620, training_loss=1.6241921343575396, metrics={'train_runtime': 5211.8325, 'train_samples_per_second': 14.759, 'train_steps_per_second': 1.846, 'total_flos': 2.3128451309371392e+17, 'train_loss': 1.6241921343575396, 'epoch': 10.0})

In [15]:
# Save the fine-tuned model
model.save_pretrained("./fine_tuned_model")
tokenizer.save_pretrained("./fine_tuned_model")
print("Fine-tuned model saved.")

Fine-tuned model saved.


In [None]:
from nltk.translate.bleu_score import corpus_bleu
from bert_score import score

In [None]:
import torch
from tqdm import tqdm

# Evaluate the model
test_dataset = dataset['test']

# Generate summaries using the model's generate method
generated_summaries = []
references = []

print("Generating summaries...")
for i, review in enumerate(tqdm(test_dataset['Input'][:100])):  # Limit to first 100 for testing
    try:
        prompt = generate_summary_prompt(review)
        inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True, padding="max_length").to(model.device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=50, num_return_sequences=1)
        summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
        generated_summary = summary.split("Summary:")[-1].strip()
        generated_summaries.append(generated_summary)
        references.append(test_dataset['Output'][i])

        if i % 10 == 0:
            print(f"Generated summary {i}: {generated_summary[:100]}...")
    except Exception as e:
        print(f"Error generating summary for review {i}: {str(e)}")

print(f"Generated {len(generated_summaries)} summaries")

# Calculate ROUGE scores
print("Calculating ROUGE scores...")
rouge_scores = rouge.compute(predictions=generated_summaries, references=references, use_stemmer=True)
print("Test Set ROUGE Scores:", rouge_scores)

# Calculate BLEU score
print("Calculating BLEU score...")
bleu_score = calculate_bleu(references, generated_summaries)
print("BLEU Score:", bleu_score)

# Calculate BERTScore
print("Calculating BERTScore...")
bert_scores = calculate_bertscore(references, generated_summaries)
print("BERTScore:", bert_scores)

Generating summaries...


  0%|          | 0/100 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
  1%|          | 1/100 [00:15<24:54, 15.09s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Generated summary 0: As an AI trained to summarize meta-reviews of academic papers, your task is to provide a concise and...


  2%|▏         | 2/100 [00:30<24:34, 15.04s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
  3%|▎         | 3/100 [00:45<24:17, 15.02s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
  4%|▍         | 4/100 [01:00<24:03, 15.04s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
  5%|▌         | 5/100 [01:15<23:46, 15.02s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
  6%|▌         | 6/100 [01:30<23:30, 15.01s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
  7%|▋         | 7/100 [01:45<23:15, 15.00s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
  8%|▊         | 8/100 [02:00<23:01, 15.02s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
  9%|▉         | 9/100 [02:15<22:45, 15.01s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 10%|█         | 10/100 [02:30<22:30, 15.00s/it]Setting 

Generated summary 10: As an AI trained to summarize meta-reviews of academic papers, your task is to provide a concise and...


 12%|█▏        | 12/100 [03:00<21:59, 15.00s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 13%|█▎        | 13/100 [03:15<21:46, 15.01s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 14%|█▍        | 14/100 [03:30<21:30, 15.00s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 15%|█▌        | 15/100 [03:45<21:14, 15.00s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 16%|█▌        | 16/100 [04:00<20:59, 14.99s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 17%|█▋        | 17/100 [04:15<20:45, 15.01s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 18%|█▊        | 18/100 [04:30<20:30, 15.00s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 19%|█▉        | 19/100 [04:45<20:15, 15.00s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 20%|██        | 20/100 [05:00<19:59, 15.00s/it]

Generated summary 20: As an AI trained to summarize meta-reviews of academic papers, your task is to provide a concise and...


In [None]:
# Postprocess summaries
def postprocess_summary(summary, max_length=100, required_keywords=None):
    # Truncate to max_length
    summary = summary[:max_length]

    # Ensure the summary ends with a complete sentence
    last_period = summary.rfind('.')
    if last_period != -1:
        summary = summary[:last_period + 1]

    # Check for required keywords
    if required_keywords:
        missing_keywords = [kw for kw in required_keywords if kw.lower() not in summary.lower()]
        if missing_keywords:
            summary += f" Key points: {', '.join(missing_keywords)}."

    # Remove any trailing whitespace
    summary = summary.strip()

    return summary

# Example usage
required_keywords = ["accept", "reject", "revise"]
processed_summaries = [postprocess_summary(summary, max_length=120, required_keywords=required_keywords) for summary in generated_summaries]

# Print a few processed summaries
print("\nSample Processed Summaries:")
for i in range(3):
    print(f"\nOriginal Summary: {generated_summaries[i]}")
    print(f"Processed Summary: {processed_summaries[i]}")

In [None]:
# Visualizations using Plotly

# ROUGE Scores Visualization
rouge_data = {
    'Metric': ['ROUGE-1', 'ROUGE-2', 'ROUGE-L'],
    'Score': [rouge_scores['rouge1'].mid.fmeasure, rouge_scores['rouge2'].mid.fmeasure, rouge_scores['rougeL'].mid.fmeasure]
}

fig_rouge = px.bar(rouge_data, x='Metric', y='Score', title='ROUGE Scores')
fig_rouge.show()

# BLEU Score Visualization
bleu_data = {
    'Metric': ['BLEU'],
    'Score': [bleu_score]
}

fig_bleu = px.bar(bleu_data, x='Metric', y='Score', title='BLEU Score')
fig_bleu.show()

# BERTScore Visualization
bert_data = {
    'Metric': ['Precision', 'Recall', 'F1'],
    'Score': [bert_scores['precision'], bert_scores['recall'], bert_scores['f1']]
}

fig_bert = px.bar(bert_data, x='Metric', y='Score', title='BERTScore')
fig_bert.show()

# Loss over Epochs Visualization
epochs = list(range(1, len(training_losses) + 1))
loss_data = {
    'Epoch': epochs,
    'Loss': training_losses
}

fig_loss = px.line(loss_data, x='Epoch', y='Loss', title='Training Loss Over Epochs')
fig_loss.show()