<a href="https://colab.research.google.com/github/KaifAhmad1/code-test/blob/main/Llumo_AI_Assignment_Mohd_Kaif.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Fine Tuning Meta's Llama 3.2B Model on Meta Review Summarization Task**
This notebook demonstrates the process of fine-tuning the Meta LLaMA 3.2B model for summarizing academic paper meta-reviews. We'll go through the entire pipeline, from setting up the environment to evaluating the model's performance.


**First, let's install the necessary libraries**

In [None]:
!pip install -qU transformers datasets evaluate rouge_score trl peft bitsandbytes accelerate xformer

In [None]:
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer
import evaluate
import matplotlib.pyplot as plt
from accelerate import Accelerator
from huggingface_hub import notebook_login
from transformers import pipeline
import os

In [None]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
# Enable xformers for optimized attention
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# Initialize accelerator
accelerator = Accelerator()

In [None]:
# Load and explore the dataset
dataset = load_dataset("zqz979/meta-review")
print(f"Dataset size: {len(dataset['train'])} train, {len(dataset['validation'])} validation, {len(dataset['test'])} test")

print("\nSample Meta-Review:")
print(dataset['train'][0]['Input'][:500] + "...")
print("\nSample Summary:")
print(dataset['train'][0]['Output'])

Dataset size: 7692 train, 1648 validation, 1649 test

Sample Meta-Review:
In this paper, the author investigates how to utilize large-scale human video to train dexterous robot manipulation skills. To leverage the information from the Internet videos, the author proposes a handful of techniques to pre-process the video data to extract the action information. Then the network is trained on the extracted hand data and deployed to the real robot with some human demonstration collected by teleoperation for fine-tuning. Experiments show that the proposed pipeline can solve...

Sample Summary:
This paper studies how to learn dexterous manipulation from human videos.    In the initial review, the reviewer appreciated the direction and real-world experiment but also raised  concerns about the need of special sensor for tracking. During rebuttal, the authors effectively addressed this concern by providing additional experiment results, and reviewers were satisfied with the response.  AC would l

In [33]:
# Load tokenizer
model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Configure quantization for faster training and lower memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16  # Use bf16 for computation
)

# Load model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True
)

# Enable gradient checkpointing and disable caching for memory efficiency
model.config.use_cache = False
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()



In [34]:
# Improved prompt for summarization
def generate_summary_prompt(meta_review):
    return f"""As an AI trained to summarize meta-reviews of academic papers, your task is to provide a concise and informative summary that captures the key points of the following meta-review. Focus on these aspects:

1. Overall assessment: The general consensus on the paper's quality and contribution.
2. Strengths: The main positive points highlighted by reviewers.
3. Weaknesses: Primary concerns or criticisms raised.
4. Recommendations: Any suggestions for improvement or future work.
5. Decision: The final verdict (e.g., accept, reject, revise).

Ensure your summary is objective, clear, and captures the essence of the meta-review without specific details about individual reviewers' comments. Aim for a length of 3-5 sentences.

Meta-review:
{meta_review}

Summary:"""

# Preprocess Data
def preprocess_function(examples):
    inputs = [generate_summary_prompt(review) for review in examples["Input"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
    labels = tokenizer(examples["Output"], max_length=128, truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_train = dataset['train'].map(preprocess_function, batched=True, remove_columns=dataset['train'].column_names, num_proc=4)
tokenized_eval = dataset['validation'].map(preprocess_function, batched=True, remove_columns=dataset['validation'].column_names, num_proc=4)

In [38]:
# Define LoRA Configuration with smaller rank for faster training
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",]
)

In [43]:
# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=10,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=0,
    logging_steps=1000,
    learning_rate=2e-4,
    weight_decay=0.001,
    bf16=True,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="cosine",
    report_to="tensorboard",
    gradient_checkpointing=True
)

In [44]:
# Define evaluation metric
rouge = evaluate.load('rouge')

def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions

    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    labels_ids[labels_ids == -100] = tokenizer.pad_token_id
    label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)

    rouge_output = rouge.compute(predictions=pred_str, references=label_str, use_stemmer=True)
    return {
        'rouge1': rouge_output['rouge1'].mid.fmeasure,
        'rouge2': rouge_output['rouge2'].mid.fmeasure,
        'rougeL': rouge_output['rougeL'].mid.fmeasure,
    }

# Set up trainer
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    peft_config=lora_config,
    dataset_text_field="Input",
    max_seq_length=256,
    compute_metrics=compute_metrics,
)

# Train the model
print("Starting fast model training...")
trainer.train()


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Starting fast model training...


  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss
1000,1.962
2000,1.8446
3000,1.7646
4000,1.6853
5000,1.6102
6000,1.5438
7000,1.4828
8000,1.4419
9000,1.4147


TrainOutput(global_step=9620, training_loss=1.6236451083558017, metrics={'train_runtime': 5204.2254, 'train_samples_per_second': 14.78, 'train_steps_per_second': 1.848, 'total_flos': 2.3128451309371392e+17, 'train_loss': 1.6236451083558017, 'epoch': 10.0})

In [48]:
# Save the fine-tuned model
model.save_pretrained("./fine_tuned_model")
tokenizer.save_pretrained("./fine_tuned_model")
print("Fine-tuned model saved.")

Fine-tuned model saved.


In [56]:
!pip install -q bert-score
from nltk.translate.bleu_score import corpus_bleu
from bert_score import score

In [57]:
test_dataset = dataset['test']

summarizer = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=150,
    do_sample=True,
    top_p=0.95,
    top_k=50,
    num_return_sequences=1
)

generated_summaries = []
for review in test_dataset['Input']:
    prompt = generate_summary_prompt(review)
    summary = summarizer(prompt)[0]['generated_text']
    generated_summaries.append(summary.split("Summary:")[-1].strip())

# Calculate ROUGE scores
rouge_scores = rouge.compute(predictions=generated_summaries, references=test_dataset['Output'], use_stemmer=True)
print("Test Set ROUGE Scores:", rouge_scores)

# Calculate BLEU score
def calculate_bleu(references, hypotheses):
    return corpus_bleu([[ref.split()] for ref in references], [hyp.split() for hyp in hypotheses])

bleu_score = calculate_bleu(test_dataset['Output'], generated_summaries)
print("BLEU Score:", bleu_score)

# Calculate BERTScore
def calculate_bertscore(references, hypotheses):
    P, R, F1 = score(hypotheses, references, lang="en", verbose=True)
    return {"precision": P.mean().item(), "recall": R.mean().item(), "f1": F1.mean().item()}

bert_scores = calculate_bertscore(test_dataset['Output'], generated_summaries)
print("BERTScore:", bert_scores)

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FalconMambaForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'Mamba2ForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCaus

ValueError: Input length of input_ids is 150, but `max_length` is set to 150. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.

In [58]:
def postprocess_summary(summary, max_length=100, required_keywords=None):
    # Truncate to max_length
    summary = summary[:max_length]

    # Ensure the summary ends with a complete sentence
    last_period = summary.rfind('.')
    if last_period != -1:
        summary = summary[:last_period + 1]

    # Check for required keywords
    if required_keywords:
        missing_keywords = [kw for kw in required_keywords if kw.lower() not in summary.lower()]
        if missing_keywords:
            summary += f" Key points: {', '.join(missing_keywords)}."

    # Remove any trailing whitespace
    summary = summary.strip()

    return summary

# Example usage
required_keywords = ["accept", "reject", "revise"]
processed_summaries = [postprocess_summary(summary, max_length=120, required_keywords=required_keywords) for summary in generated_summaries]

# Print a few processed summaries
print("\nSample Processed Summaries:")
for i in range(3):
    print(f"\nOriginal Summary: {generated_summaries[i]}")
    print(f"Processed Summary: {processed_summaries[i]}")


Sample Processed Summaries:


IndexError: list index out of range