# BERTScore

BERTScore computes the similarity of two sentences as a sum of cosine similarities between their tokens’ embeddings.

https://arxiv.org/pdf/1904.09675.pdf

Given a reference sentence $x = \langle x_1, \ldots, x_k \rangle$ and a candidate sentence $\hat{x} = \langle \hat{x}_1, \ldots, \hat{x}_l \rangle$, we use
contextual embeddings to represent the tokens, and compute matching using cosine similarity, optionally
weighted with inverse document frequency scores.
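The optional inverse document frequency (idf) weighting can be illustrated with a minimal sketch. It assumes the plain definition idf(w) = -log(df(w) / M) over M reference sentences, with no smoothing, and it assumes sentences are already split into tokens. The function name `idf_weights` and the toy sentences are hypothetical and are not part of the `bert_score` package.

```python
import math
from collections import Counter


def idf_weights(reference_sentences):
    """Map each token to -log(fraction of reference sentences containing it)."""
    num_sentences = len(reference_sentences)
    doc_freq = Counter()
    for tokens in reference_sentences:
        # Count each token at most once per sentence.
        doc_freq.update(set(tokens))
    return {tok: -math.log(df / num_sentences) for tok, df in doc_freq.items()}


# Toy, pre-tokenized reference sentences (illustrative only).
references = [["the", "cat"], ["the", "dog"]]
weights = idf_weights(references)
```

In this toy corpus a token that occurs in every reference sentence ("the") gets weight zero, while rarer tokens get larger weights, so rare words contribute more to the final score.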

The complete score matches each token in $x$ to a token in $\hat{x}$ to compute recall,
and each token in $\hat{x}$ to a token in $x$ to compute precision. We use greedy matching to maximize
the matching similarity score, where each token is matched to the most similar token in the other
sentence.
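The greedy matching described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the `bert_score` implementation: the token embeddings are made-up 2-D vectors rather than BERT outputs, no idf weighting is applied, and the helper names `cosine` and `greedy_match_scores` are hypothetical. Recall averages, over reference tokens, the best similarity to any candidate token; precision does the same over candidate tokens; F1 is their harmonic mean.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(ref_emb, cand_emb):
    """Return (precision, recall, f1) using greedy best-match similarity."""
    # sim[i][j] is the similarity of reference token i and candidate token j.
    sim = [[cosine(r, c) for c in cand_emb] for r in ref_emb]
    # Recall: each reference token takes its most similar candidate token.
    recall = sum(max(row) for row in sim) / len(ref_emb)
    # Precision: each candidate token takes its most similar reference token.
    precision = sum(
        max(sim[i][j] for i in range(len(ref_emb))) for j in range(len(cand_emb))
    ) / len(cand_emb)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy embeddings: two reference tokens and two candidate tokens.
ref = [[1.0, 0.0], [0.0, 1.0]]
cand = [[1.0, 0.0], [1.0, 1.0]]
precision, recall, f1 = greedy_match_scores(ref, cand)
```

Note that a token may be the best match for several tokens on the other side; greedy matching does not enforce a one-to-one alignment.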

In [1]:
!pip install bert_score==0.2.2
!pip install torch

Collecting bert_score==0.2.2
  Downloading bert_score-0.2.2-py3-none-any.whl (14 kB)
Collecting transformers>=2.2.0
  Downloading transformers-2.11.0-py3-none-any.whl (674 kB)
Collecting torch>=1.0.0
  Downloading torch-1.5.0-cp37-cp37m-manylinux1_x86_64.whl (752.0 MB)
Collecting regex!=2019.12.17
  Downloading regex-2020.6.8-cp37-cp37m-manylinux2010_x86_64.whl (661 kB)
Collecting tokenizers==0.7.0
  Downloading tokenizers-0.7.0-cp37-cp37m-manylinux1_x86_64.whl (5.6 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.91-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB)
Collecting sacremoses
  Downloading sac

In [2]:
!pip install git+https://github.com/Tiiiger/bert_score
!git clone https://github.com/Tiiiger/bert_score
%cd bert_score
!pip install .

Collecting git+https://github.com/Tiiiger/bert_score
  Cloning https://github.com/Tiiiger/bert_score to /tmp/pip-req-build-o5wpazio
  Running command git clone -q https://github.com/Tiiiger/bert_score /tmp/pip-req-build-o5wpazio
Building wheels for collected packages: bert-score
  Building wheel for bert-score (setup.py) ... done
  Created wheel for bert-score: filename=bert_score-0.3.3-py3-none-any.whl size=52503 sha256=1d02ac4291724e13064e89ef3113118cce5b6b3bb2e0580bb100f2e9a3a86b52
  Stored in directory: /tmp/pip-ephem-wheel-cache-k0vv27gg/wheels/14/b1/f2/908224271508d2ab483b7537445f2b05e7076c144fc25e0fb1
Successfully built bert-score
Installing collected packages: bert-score
  Attempting uninstall: bert-score
    Found existing installation: bert-score 0.2.2
    Uninstalling bert-score-0.2.2:
      Successfully uninstalled bert-score-0.2.2
Successfully installed bert-score-0.3.3
Cloning into 'bert_score'...
remote: Enumerating objects: 164, done.
remote: Counting obj

In [3]:
!python -m unittest discover

Downloading: 100%|██████████████████████████████| 482/482 [00:00<00:00, 756kB/s]
Downloading: 100%|███████████████████████████| 899k/899k [00:00<00:00, 53.6MB/s]
Downloading: 100%|███████████████████████████| 456k/456k [00:00<00:00, 25.3MB/s]
Downloading: 100%|█████████████████████████| 1.43G/1.43G [00:20<00:00, 70.7MB/s]
...
----------------------------------------------------------------------
Ran 12 tests in 159.204s

OK


In [4]:
import pandas as pd
import numpy as np
from bert_score import score

In [5]:
!wget https://sota-ydata.s3.amazonaws.com/Paraphrase.csv

--2020-06-09 12:32:12--  https://sota-ydata.s3.amazonaws.com/Paraphrase.csv
Resolving sota-ydata.s3.amazonaws.com (sota-ydata.s3.amazonaws.com)... 52.216.98.179
Connecting to sota-ydata.s3.amazonaws.com (sota-ydata.s3.amazonaws.com)|52.216.98.179|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44220 (43K) [text/csv]
Saving to: ‘Paraphrase.csv’


2020-06-09 12:32:12 (38.2 MB/s) - ‘Paraphrase.csv’ saved [44220/44220]



In [6]:
df = pd.read_csv("Paraphrase.csv")

In [7]:
[df['text_2'][0]]

['take additional measures to']

In [8]:
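scores = []
for i in range(df.shape[0]):
  # score(cands, refs) returns precision, recall, and F1 tensors; keep only F1.
  _, _, F1 = score([df['text_1'][i]], [df['text_2'][i]], lang="en", verbose=True)
  # .item() converts the one-element tensor to a plain Python float.
  scores.append(F1.mean().item())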

calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.72 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.71 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.91 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.44 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 13.99 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.41 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.51 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.46 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.74 sentences/sec
calculating scores...
computing bert embedding.
computing greedy matching.
done in 0.07 seconds, 14.57 sentences/sec


In [9]:
len(scores)

998
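Scoring one pair per call is simple but slow, since each call to `score` does its own setup. Because `score` accepts whole lists of candidates and references and returns precision, recall, and F1 tensors, a single batched call is a possible alternative. The sketch below is a minimal illustration: the wrapper `score_pairs` and its `scorer` parameter are hypothetical names introduced here, and `bert_score` is imported lazily so the helper can be tried with a stand-in scorer.

```python
def score_pairs(cands, refs, scorer=None):
    """Return one F1 value per (candidate, reference) pair as plain floats."""
    if len(cands) != len(refs):
        raise ValueError("cands and refs must have the same length")
    if scorer is None:
        # Imported lazily so the helper can be exercised without the model.
        from bert_score import score as scorer
    # A single call scores every pair in one pass.
    _, _, f1 = scorer(cands, refs, lang="en")
    # float() works for both tensor elements and ordinary numbers.
    return [float(value) for value in f1]


# Possible usage with the DataFrame loaded earlier (not executed here):
# scores = score_pairs(df["text_1"].tolist(), df["text_2"].tolist())
```

Returning plain floats also keeps later steps, such as writing the scores to CSV, free of tensor objects.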

In [11]:
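# Linearly rescale scores from [old_min, old_max] to [new_min, new_max].
# This assumes the F1 values lie in [0, 1].
old_min, old_max = 0.0, 1.0
new_min, new_max = 0.0, 5.0

# Use `s` as the loop variable so the imported `score` function is not shadowed.
new_scores = [(s - old_min) * (new_max - new_min) / (old_max - old_min) + new_min for s in scores]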
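The same linear map can be wrapped in a small reusable function. This is a sketch for illustration: the name `rescale` and its default bounds are assumptions chosen to mirror the cell above, and the example inputs are made up rather than taken from the dataset.

```python
def rescale(value, old_min=0.0, old_max=1.0, new_min=0.0, new_max=5.0):
    """Map value linearly from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old_min and old_max must differ")
    return (value - old_min) * (new_max - new_min) / (old_max - old_min) + new_min


# Illustrative inputs at the bottom, middle, and top of the old range.
example_scores = [0.0, 0.5, 1.0]
rescaled = [rescale(s) for s in example_scores]
print(rescaled)  # [0.0, 2.5, 5.0]
```

Values outside the old range are extrapolated rather than clipped, so a score slightly below 0 would map slightly below `new_min`.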

In [20]:
!pwd

/home/ubuntu/bert_score


In [21]:
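# Convert to plain floats so the CSV contains numbers rather than tensor reprs.
pd.Series([float(s) for s in scores]).to_csv("BERTscore_vanilla.csv")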

In [22]:
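# Convert to plain floats so the CSV contains numbers rather than tensor reprs.
pd.Series([float(s) for s in new_scores]).to_csv("BERTscore_updatedrange.csv")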