### 🔹 CodeBERTScore

**CodeBERTScore** is an automatic evaluation metric for code generation tasks, adapted from **BERTScore**.  
It leverages **CodeBERT**, a transformer model pre-trained on large-scale code and natural language data, to measure **semantic and syntactic similarity** between predicted and reference code.

#### 🧮 How It Works
1. Tokenize both predicted and reference code snippets using CodeBERT's tokenizer.  
2. Compute contextual embeddings for all tokens.  
3. Build a **cosine similarity matrix** between tokens of the prediction and the reference.  
4. Derive:
   - **Precision (P):** how well predicted tokens are covered by reference tokens.  
   - **Recall (R):** how well reference tokens are captured by predicted tokens.  
   - **F1-score (F1):** harmonic mean of Precision and Recall.

Here $X$ is the set of reference token embeddings and $Y$ is the set of predicted token embeddings:

$$
P = \frac{1}{|Y|} \sum_{y \in Y} \max_{x \in X} \cos(x, y)
$$
$$
R = \frac{1}{|X|} \sum_{x \in X} \max_{y \in Y} \cos(x, y)
$$
$$
F_1 = 2 \times \frac{P \times R}{P + R}
$$
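
The greedy matching in the formulas above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the 2-D vectors are made-up stand-ins for CodeBERT's contextual embeddings, and the helper names `cosine` and `greedy_match_scores` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def greedy_match_scores(pred_emb, ref_emb):
    """Return (precision, recall, f1) from token embeddings via greedy matching."""
    # sim[i][j] = cosine similarity between predicted token i and reference token j
    sim = [[cosine(y, x) for x in ref_emb] for y in pred_emb]
    # Precision: each predicted token takes its best-matching reference token
    precision = sum(max(row) for row in sim) / len(pred_emb)
    # Recall: each reference token takes its best-matching predicted token
    recall = sum(
        max(sim[i][j] for i in range(len(pred_emb)))
        for j in range(len(ref_emb))
    ) / len(ref_emb)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1

# Toy embeddings (assumed values): the prediction has one extra token
# that matches neither reference token exactly.
pred = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ref = [[1.0, 0.0], [0.0, 1.0]]

p, r, f1 = greedy_match_scores(pred, ref)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

In this toy case every reference token has an exact match, so recall is 1, while the extra predicted token pulls precision below 1. The Java example that follows shows the same pattern: the predictions carry an additional `public` modifier, and precision comes out lower than recall.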

In [1]:
import code_bert_score
predictions = [
    "public int add(int a, int b) { return a + b; }",
    "public void greet() { System.out.println('Hi'); }"
]

refs = [
    "int add(int a, int b) { return a + b; }",
    "void greet() { System.out.println('Hi'); }"
]

# Example usage for Java
pred_results = code_bert_score.score(
    cands=predictions,  # list of generated code snippets
    refs=refs,           # list of reference (ground truth) code snippets
    lang='java'          # specify language as Java
)




In [2]:
pred_results

(tensor([0.9539, 0.9514]),
 tensor([0.9801, 0.9818]),
 tensor([0.9668, 0.9663]),
 tensor([0.9774, 0.9786]))
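
As a sanity check, assuming the returned tuple is ordered (precision, recall, F1, F3), the last two tensors can be recomputed from the first two with the general F-beta formula. The sketch below uses the rounded values printed above, so the results should agree with the reported scores only up to rounding error; `f_beta` is a hypothetical helper, not part of `code_bert_score`.

```python
def f_beta(p, r, beta=1.0):
    """F-beta score: weighted harmonic mean of precision p and recall r."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

# Rounded values copied from the first two tensors in the output above
precisions = [0.9539, 0.9514]
recalls = [0.9801, 0.9818]

# beta=1 gives the usual F1; beta=3 weights recall more heavily than precision
f1_values = [f_beta(p, r, beta=1) for p, r in zip(precisions, recalls)]
f3_values = [f_beta(p, r, beta=3) for p, r in zip(precisions, recalls)]

print("F1:", [round(v, 4) for v in f1_values])
print("F3:", [round(v, 4) for v in f3_values])
```

Because F3 weights recall more heavily, and recall is higher than precision here, the fourth tensor is larger than the third.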

In [None]:
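# score() returns a 4-tuple (precision, recall, F1, F3); F3 is discarded here
precision, recall, f1, _ = code_bert_score.score(
    cands=predictions, refs=refs, lang='java'
)
# Average the per-snippet scores over the whole batch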
print("Precision:", precision.mean().item())
print("Recall:", recall.mean().item())
print("F1:", f1.mean().item())


Precision: 0.9526264667510986
Recall: 0.9809467792510986
F1: 0.9665781259536743


