# BERT Score
-  BERTScore is an automatic evaluation metric for text generation. It computes a similarity score between each token in the candidate sentence and each token in the reference sentence, using pre-trained contextual embeddings from BERT models, and matches tokens across the two sentences by cosine similarity.
-  BERTScore takes 3 mandatory arguments:
    -  predictions (a list of strings of candidate sentences),
    -  references (a list of strings, or a list of lists of strings, of reference sentences), and
    -  either lang (a string of two letters indicating the language of the sentences, in ISO 639-1 format) or model_type (a string specifying which model to use, according to the BERT specification). By default, the metric uses the suggested model for the target language when lang is specified; otherwise it uses the model_type indicated.
-  https://huggingface.co/spaces/evaluate-metric/bertscore
-  https://paperswithcode.com/paper/bertscore-evaluating-text-generation-with
-  https://www.geeksforgeeks.org/explanation-of-bert-model-nlp/
-  https://www.techtarget.com/searchenterpriseai/definition/BERT-language-model
-  https://colab.research.google.com/drive/1kpL8Y_AnUUiCxFjhxSrxCsc6-sDMNb_Q
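
The token matching described above can be illustrated with a minimal sketch. It assumes toy 2-D vectors in place of real BERT embeddings, and the helper names `cosine` and `greedy_bertscore` are hypothetical, not part of any library. Each candidate token is matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and the two are combined into F1.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_bertscore(cand_vecs, ref_vecs):
    """Greedy matching over token vectors (hypothetical helper, no idf weighting)."""
    # Similarity matrix: rows are candidate tokens, columns are reference tokens.
    sim = [[cosine(c, r) for r in ref_vecs] for c in cand_vecs]
    # Precision: average best match for each candidate token.
    precision = sum(max(row) for row in sim) / len(cand_vecs)
    # Recall: average best match for each reference token.
    recall = sum(max(row[j] for row in sim) for j in range(len(ref_vecs))) / len(ref_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy vectors standing in for contextual embeddings (assumption for illustration).
cand_vecs = [[1.0, 0.0], [0.0, 1.0]]
ref_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
precision, recall, f1 = greedy_bertscore(cand_vecs, ref_vecs)
print(precision, recall, f1)
```

Here every candidate token has an exact counterpart in the reference, so precision is 1, while the extra reference token lowers recall.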

## In progress

In [5]:
#!pip install evaluate
#!pip install bert_score

In [6]:
from evaluate import load
bertscore = load("bertscore")
predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]

In [7]:
results = bertscore.compute(predictions=predictions, references=references, lang="en")

In [None]:
results
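
The `references` argument can also be a list of lists of strings, giving several acceptable references per prediction. The sketch below shows that input shape. The wrapper name `score_with_multiple_references` and the extra reference sentences are hypothetical, and the import sits inside the function so that nothing is downloaded until it is called.

```python
def score_with_multiple_references(predictions, references,
                                   model_type="distilbert-base-uncased"):
    """Hypothetical wrapper around the evaluate BERTScore metric."""
    # Imported lazily: loading the metric needs the evaluate and bert_score packages.
    from evaluate import load
    bertscore = load("bertscore")
    return bertscore.compute(
        predictions=predictions,
        references=references,
        model_type=model_type,
    )

predictions = ["hello there", "general kenobi"]
# One inner list of references per prediction (illustrative sentences).
references = [
    ["hello there", "hi there"],
    ["general kenobi", "master kenobi"],
]
```

Calling `score_with_multiple_references(predictions, references)` downloads the model on first use, so it needs network access. The underlying bert_score package is expected to keep the best score across the references of each prediction; treat that as an assumption to verify against its documentation.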

## Maximal values with the distilbert-base-uncased model:

In [8]:
#from evaluate import load
#bertscore = load("bertscore")
predictions = ["hello world", "general kenobi"]
references = ["hello world", "general kenobi"]
results = bertscore.compute(predictions=predictions, references=references, model_type="distilbert-base-uncased")

In [9]:
print(results)

{'precision': [1.000000238418579, 1.0000001192092896], 'recall': [1.000000238418579, 1.0000001192092896], 'f1': [1.000000238418579, 1.0000001192092896], 'hashcode': 'distilbert-base-uncased_L5_no-idf_version=0.3.12(hug_trans=4.26.1)'}


## Partial match with the distilbert-base-uncased model:

In [10]:
#from evaluate import load
#bertscore = load("bertscore")
predictions = ["hello world", "general kenobi"]
references = ["goodnight moon", "the sun is shining"]
results = bertscore.compute(predictions=predictions, references=references, model_type="distilbert-base-uncased")
print(results)

{'precision': [0.7899678945541382, 0.5584040284156799], 'recall': [0.7899678945541382, 0.5889027714729309], 'f1': [0.7899678349494934, 0.573248028755188], 'hashcode': 'distilbert-base-uncased_L5_no-idf_version=0.3.12(hug_trans=4.26.1)'}
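
The metric returns one precision, recall, and F1 value per sentence pair. A corpus-level figure is often wanted as well; the sketch below averages the per-sentence scores printed above. The helper name `summarize` is hypothetical, and a plain mean is one possible aggregation, not something the metric prescribes.

```python
from statistics import mean

# Per-sentence scores copied from the output above.
results = {
    "precision": [0.7899678945541382, 0.5584040284156799],
    "recall": [0.7899678945541382, 0.5889027714729309],
    "f1": [0.7899678349494934, 0.573248028755188],
    "hashcode": "distilbert-base-uncased_L5_no-idf_version=0.3.12(hug_trans=4.26.1)",
}

def summarize(results):
    """Average each per-sentence score list (hypothetical helper)."""
    return {key: mean(results[key]) for key in ("precision", "recall", "f1")}

summary = summarize(results)
print(summary)
```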


### Limitations
-  The original BERTScore paper showed that BERTScore correlates well with human judgment on sentence-level and system-level evaluation, but this depends on the model and language pair selected.
-  Calculating the BERTScore metric involves downloading the BERT model used to compute the score. The default model for en, roberta-large, takes over 1.4GB of storage space, and downloading it can take a significant amount of time depending on the speed of your internet connection. If this is an issue, choose a smaller model; for instance, distilbert-base-uncased is 268MB. A full list of compatible models is available at:
-  https://docs.google.com/spreadsheets/d/1RKOVpselB98Nnh_EOC4A2BYn8_201tmPODpNWu4w7xI/edit#gid=0

##  Example
-  https://torchmetrics.readthedocs.io/en/stable/text/bert_score.html

In [12]:
from pprint import pprint
from torchmetrics.text.bert import BERTScore

In [14]:
bertscore2 = BERTScore()

In [13]:
preds = ["hello there", "general kenobi"]
target = ["hello there", "master kenobi"]

In [15]:
pprint(bertscore2(preds, target))

{'f1': [0.9999998807907104, 0.9960543513298035],
 'precision': [0.9999998807907104, 0.9960543513298035],
 'recall': [0.9999998807907104, 0.9960543513298035]}


In [18]:
from torchmetrics.text.bert import BERTScore
preds = ["hello there", "general kenobi"]
target = ["hello there", "master kenobi"]

metric = BERTScore()
metric.update(preds, target)
fig_, ax_ = metric.plot()

## X2

In [None]:
# Import the needed libraries
import pandas as pd
import numpy as np
# pip install google  # installs Google Search by Mario Vilas, see
# https://python-googlesearch.readthedocs.io/en/latest/
import googlesearch  # scrape SERPs
import random  # to randomize the pause
import time  # to measure page download time
from datetime import date
import sys  # for sys variables

import requests  # to read URL contents
from bs4 import BeautifulSoup  # to parse HTML
from bs4.element import Comment


In [None]:
# Return True for visible text nodes; skip HTML comments and text inside non-visible tags
def tag_visible(element):
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, Comment):
        return False
    return True