<a href="https://colab.research.google.com/github/anshupandey/Working_with_Large_Language_models/blob/main/WWL_C11_LLM_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
pip install nltk rouge-score --quiet

  Preparing metadata (setup.py) ... done
  Building wheel for rouge-score (setup.py) ... done


### Perplexity:

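Example: An average perplexity of 20 means that, on average, the model is as uncertain about the next word as if it were choosing uniformly among 20 candidates. Lower is better. Whether 20 is acceptable depends on the task, the vocabulary, and the tokenizer, so compare perplexities only between models evaluated on the same data with the same tokenization.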
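To make the interpretation above concrete, here is a minimal sketch that computes perplexity as the exponential of the average negative log-probability per token. The function name `perplexity` and the probability lists are hypothetical and chosen only for illustration; in practice the probabilities would come from the model being evaluated.

```python
import math

def perplexity(token_probs):
    """Return exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    neg_log_likelihood = -sum(math.log(p) for p in token_probs)
    return math.exp(neg_log_likelihood / len(token_probs))

# Hypothetical per-token probabilities, for illustration only.
uniform_over_20 = [1 / 20] * 5             # every token predicted with p = 0.05
more_confident = [0.5, 0.8, 0.9, 0.7, 0.6]  # higher probabilities on each token

print(perplexity(uniform_over_20))  # approximately 20
print(perplexity(more_confident))   # lower, because the model is more confident
```

A model that assigns probability 1/20 to every correct token has a perplexity of about 20, which is where the "choosing among 20 candidates" reading comes from.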


### BLEU:

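Example: An average BLEU score of 0.45 means the model's predictions share a substantial fraction of their n-grams with the reference sentences. BLEU ranges from 0 to 1 and higher is better. Scores around 0.5 or above are often considered good, although what counts as good depends on the task and on how many references are available.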
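The idea behind BLEU can be sketched in plain Python: clipped n-gram precisions are combined with a geometric mean and multiplied by a brevity penalty. This is a simplified single-reference illustration, not the NLTK implementation used later in this notebook; the names `modified_precision` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total

def simple_bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one zero precision zeroes the whole score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean

reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()

print(simple_bleu(reference, reference))           # identical sentences score 1.0
print(simple_bleu(candidate, reference, max_n=2))  # uses 1-grams and 2-grams only
print(simple_bleu(candidate, reference))           # 0.0: no 4-gram matches
```

The last call shows why short sentences often get a BLEU of zero when 4-grams are included: a single changed word can remove every matching 4-gram, which is the situation that smoothing methods are meant to handle.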

### ROUGE:

Example: ROUGE-1 = 0.6, ROUGE-2 = 0.4, and ROUGE-L = 0.5 indicate that the generated text shares many unigrams (ROUGE-1), a fair number of bigrams (ROUGE-2), and a long common subsequence (ROUGE-L) with the reference text, which is a positive indication of performance.
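The three granularities can be illustrated with a small pure-Python sketch that computes F-measures from n-gram overlap (ROUGE-N) and from the longest common subsequence (ROUGE-L). It is a simplified illustration without stemming or tokenization rules, so its numbers may differ from those of the `rouge-score` package used later; the function names are hypothetical.

```python
from collections import Counter

def f_measure(overlap, pred_len, ref_len):
    """Harmonic mean of precision (overlap/pred_len) and recall (overlap/ref_len)."""
    if overlap == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)

def rouge_n(prediction, reference, n=1):
    """F-measure over overlapping n-grams of two token lists."""
    pred = Counter(tuple(prediction[i:i + n]) for i in range(len(prediction) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum((pred & ref).values())  # multiset intersection clips counts
    return f_measure(overlap, sum(pred.values()), sum(ref.values()))

def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(prediction, reference):
    """F-measure based on the longest common subsequence."""
    return f_measure(lcs_length(prediction, reference), len(prediction), len(reference))

reference = "the cat is on the mat".split()
prediction = "the cat sat on the mat".split()

print(rouge_n(prediction, reference, n=1))  # 5 of 6 unigrams match
print(rouge_n(prediction, reference, n=2))  # 3 of 5 bigrams match
print(rouge_l(prediction, reference))       # LCS is "the cat on the mat"
```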

In [None]:
import math
import nltk
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu

In [None]:
# Sample data
predictions = ["the cat is on the mat", "there is a cat on the mat"]
references = [["the cat is on the mat"], ["there is a cat on the mat"]]

In [4]:
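# Perplexity Calculation
# Note: this is a toy proxy, not true language-model perplexity. It treats the
# predicted sentence as a unigram distribution and scores the reference words
# against it. Real perplexity uses the model's own token probabilities.
def calculate_perplexity(predicted_sentence, reference_sentence):
    pred_words = predicted_sentence.split()
    ref_words = reference_sentence.split()
    neg_log_prob_sum = 0.0
    for word in ref_words:
        # Words missing from the prediction get a floor count of 1,
        # so their probability is small but never zero.
        count = max(pred_words.count(word), 1)
        neg_log_prob_sum += math.log(1 / (count / len(pred_words)))
    return math.exp(neg_log_prob_sum / len(ref_words))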

perplexities = [calculate_perplexity(pred, ref[0]) for pred, ref in zip(predictions, references)]
average_perplexity = sum(perplexities) / len(perplexities)
print(f"Average Perplexity: {average_perplexity}")

Average Perplexity: 5.881101577952299


In [5]:
# BLEU Score Calculation
def calculate_bleu(predicted_sentence, reference_sentence):
    return sentence_bleu([reference_sentence.split()], predicted_sentence.split())

bleu_scores = [calculate_bleu(pred, ref[0]) for pred, ref in zip(predictions, references)]
average_bleu = sum(bleu_scores) / len(bleu_scores)

print(f"Average BLEU Score: {average_bleu}")

Average BLEU Score: 1.0


In [8]:
# ROUGE Score Calculation
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

def calculate_rouge(predicted_sentence, reference_sentence):
    scores = scorer.score(reference_sentence, predicted_sentence)
    return scores

rouge_scores = [calculate_rouge(pred, ref[0]) for pred, ref in zip(predictions, references)]

average_rouge = {
    'rouge1': sum([score['rouge1'].fmeasure for score in rouge_scores]) / len(rouge_scores),
    'rouge2': sum([score['rouge2'].fmeasure for score in rouge_scores]) / len(rouge_scores),
    'rougeL': sum([score['rougeL'].fmeasure for score in rouge_scores]) / len(rouge_scores),
}

print(f" Average ROUGE Scores: {average_rouge}")

 Average ROUGE Scores: {'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0}


In [None]:
# Sample data
inputs = [
    "Translate the following English text to French: 'Hello, how are you?'",
    "Summarize the following text: 'The quick brown fox jumps over the lazy dog.'"
]
references = [
    ["Bonjour, comment ça va?"],
    ["The quick brown fox jumps over the lazy dog."]
]

### Install Vertex AI SDK for Python


In [None]:
! pip3 install --upgrade --user --quiet google-cloud-aiplatform

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the cell below to authenticate your environment.


In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()

### Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
PROJECT_ID = "jrproject-402905"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}
MODEL_ID = "gemini-1.5-flash-preview-0514"  # @param {type:"string"}
import vertexai
vertexai.init(project=PROJECT_ID, location=LOCATION)

## Gemini 1.5 Flash

In [None]:
from vertexai.generative_models import GenerationConfig, GenerativeModel

# Load the model
model = GenerativeModel(
    MODEL_ID,
    system_instruction=[
        "You are a helpful assistant.",
        "You answer questions in a concise way.",
    ],
)

# Set model parameters
generation_config = GenerationConfig(temperature=0.9, top_k=32)

def generate_response(prompt, model=model):
    # Send a single-turn prompt and return the text of the response
    response = model.generate_content([prompt], generation_config=generation_config)
    return response.text

In [None]:
# Get predictions from the Gemini model
predictions = [generate_response(input_text) for input_text in inputs]
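Because `references` holds a list of acceptable answers per input, one possible way to score model output is to compare each prediction against every reference and keep the best score. The sketch below assumes a simple token-overlap F1 as the metric; `token_f1`, `best_reference_score`, and the example strings are hypothetical placeholders, not real model output, and any of the metrics above could be passed in as `metric` instead.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (lowercased, whitespace-split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def best_reference_score(prediction, reference_list, metric=token_f1):
    """Score a prediction against each reference and return the highest score."""
    return max(metric(prediction, ref) for ref in reference_list)

# Placeholder strings standing in for model responses (assumed, not real output).
example_predictions = ["Bonjour, comment allez-vous?", "A fox jumps over a dog."]
example_references = [
    ["Bonjour, comment ça va?", "Bonjour, comment allez-vous?"],
    ["The quick brown fox jumps over the lazy dog."],
]

scores = [
    best_reference_score(pred, refs)
    for pred, refs in zip(example_predictions, example_references)
]
print(scores)
```

Taking the maximum over references avoids penalizing a prediction that matches one valid phrasing but not another, which matters for tasks such as translation where several answers are correct.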