# 🤖 BERTScore Demonstration

This notebook shows how to compute **BERTScore** using Hugging Face's `evaluate` library.

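BERTScore compares model-generated text against reference text by matching their tokens in the contextual embedding space of a pretrained transformer such as BERT or RoBERTa. It reports three values:
- **Precision**: how similar each generated token is to its best-matching reference token, averaged over the generated tokens
- **Recall**: how similar each reference token is to its best-matching generated token, averaged over the reference tokens
- **F1-score**: harmonic mean of precision and recall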
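
To make the three quantities above concrete, here is a minimal sketch of the greedy matching step on made-up 2-dimensional vectors. The helper names (`cosine`, `greedy_bertscore`) and the toy vectors are assumptions for illustration only. The actual metric takes its embeddings from a transformer layer and offers options such as idf weighting and baseline rescaling, which this sketch leaves out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_bertscore(cand_vecs, ref_vecs):
    """Greedy-matching precision, recall, and F1 over token vectors (hypothetical helper)."""
    # sim[i][j] = similarity between candidate token i and reference token j
    sim = [[cosine(c, r) for r in ref_vecs] for c in cand_vecs]
    # Precision: each candidate token takes its best reference match
    precision = sum(max(row) for row in sim) / len(cand_vecs)
    # Recall: each reference token takes its best candidate match
    recall = sum(
        max(sim[i][j] for i in range(len(cand_vecs)))
        for j in range(len(ref_vecs))
    ) / len(ref_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy "embeddings" (assumed values, not produced by any model)
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

p, r, f = greedy_bertscore(candidate, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case every candidate token has an exact match in the reference, so precision is perfect, while the third reference token is only partially covered, which lowers recall.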

In [None]:
# Install the evaluate library and the bert_score package it depends on for this metric, if needed
# !pip install evaluate bert_score

In [None]:
import evaluate

# Load the BERTScore metric (the metric script is fetched on first use)
bertscore = evaluate.load("bertscore")

In [None]:
# Sample data: model prediction vs reference
predictions = [
    "The cat sat on the mat."
]

references = [
    "A cat was sitting on a mat."
]

In [None]:
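# Compute BERTScore with roberta-base as the embedding model.
# The result is a dict with per-example "precision", "recall", and "f1" lists plus a "hashcode" string.
results = bertscore.compute(
    predictions=predictions,
    references=references,
    model_type="roberta-base",
)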
results
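
The dictionary returned by `compute` holds one score per prediction, so a corpus-level number needs an explicit average. The sketch below shows one way to do that. `summarize_bertscore` is a hypothetical helper, and `example_results` contains placeholder values in the same shape as the metric's output, not real scores.

```python
from statistics import mean

def summarize_bertscore(results):
    """Average the per-example precision, recall, and F1 lists (hypothetical helper)."""
    return {key: mean(results[key]) for key in ("precision", "recall", "f1")}

# Placeholder values for illustration only; not real BERTScore output
example_results = {
    "precision": [0.9, 0.8],
    "recall": [0.7, 0.9],
    "f1": [0.79, 0.85],
    "hashcode": "placeholder",
}

summary = summarize_bertscore(example_results)
print(summary)
```

In the notebook, the same helper could be called as `summarize_bertscore(results)`.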

### 🧠 Notes:
- BERTScore uses contextual embeddings instead of surface n-gram overlap.
- It works well for evaluating **semantic similarity**, especially in generation tasks like summarization and translation.
- You can pass a different `model_type` (e.g., `bert-base-uncased`, `roberta-large`) to trade scoring quality against speed; larger models are slower to download and run.
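
One way to explore the model trade-off is to score the same data with several models and compare the mean F1. The following is a sketch under assumptions: `mean_f1_by_model` is a hypothetical helper, and it only assumes that the metric object exposes `compute(predictions=..., references=..., model_type=...)` returning a dict with an `"f1"` list, as in the cell above.

```python
def mean_f1_by_model(metric, predictions, references, model_types):
    """Return the mean F1 for each model type (hypothetical helper)."""
    scores = {}
    for model_type in model_types:
        results = metric.compute(
            predictions=predictions,
            references=references,
            model_type=model_type,
        )
        f1 = results["f1"]
        scores[model_type] = sum(f1) / len(f1)
    return scores

# Example call (downloads each model on first use, so it is left commented out):
# mean_f1_by_model(bertscore, predictions, references, ["roberta-base", "roberta-large"])
```

Scores from different models are on different scales, so this comparison is most useful for checking whether the models rank a set of candidate outputs the same way.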