# SBERT doc vs. quote simple test

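Can semantic search match a written sentence to the spoken quote behind it? A simple test using SBERT (`all-MiniLM-L6-v2`). The number after each label below is the cosine similarity between the anchor sentence and that quote.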

Anchor (doc): "In 2021, Quinto’s family — with a coalition of 75 Filipino-American organizations — helped push through a California bill banning police restraints that impair breathing in Quinto’s name."

Positive (raw transcript), 0.3975: "We did that, and as soon as that went public. About 170 more than 170 Filipino organizations Filipino American organizations actually ended up coming together. to form a coalition because of my brother's death."

Positive (manually fixed), 0.4084: "We did that, and as soon as that went public about 170 more than 170 Filipino American organizations actually ended up coming together to form a coalition because of my brother's death."

Negative (raw transcript), 0.0472: "we do base it very specifically on our experiences and experiences and those around us, who who've experienced some experience. who have gone through similar tragedies"

Negative (manually fixed), 0.0619: "We do base it very specifically on our experiences and experiences of those around us who have gone through similar tragedies."

In [1]:
!pip install sentence_transformers

Collecting sentence_transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: sentence_transformers
  Building wheel for sentence_transformers (setup.py) ... done
  Created wheel for sentence_transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125938 sha256=71ebb869da9243b5c6708fec37562488f5f394cb22502bebe20c57a5c7fa2465
  Stored in directory: /root/.cache/pip/wheels/bf/06/fb/d59c1e5bd1dac7f6cf61ec0036cc3a10ab8fecaa6b2c3d3ee9
Successfully built sentence_transformers
Installing collected packages: sentence_transformers
Successfully installed sentence_transformers-2.2.2

In [2]:
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')

In [3]:
# anchor (written) string
s_doc = "In 2021, Quinto’s family — with a coalition of 75 Filipino-American organizations — helped push through a California bill banning police restraints that impair breathing in Quinto’s name."

# related quote
s_quote = "We did that, and as soon as that went public. About 170 more than 170 Filipino organizations Filipino American organizations actually ended up coming together. to form a coalition because of my brother's death."

# related quote manually fixed for grammar and wording
s_quote_fixed = "We did that, and as soon as that went public about 170 more than 170 Filipino American organizations actually ended up coming together to form a coalition because of my brother's death."

# unrelated quote
s_quote_negative = "we do base it very specifically on our experiences and experiences and those around us, who who've experienced some experience. who have gone through similar tragedies"

# unrelated quote manually fixed for grammar and wording
s_quote_negative_fixed = "We do base it very specifically on our experiences and experiences of those around us who have gone through similar tragedies."

In [4]:
# pack strings into list
quotes = [s_doc, s_quote, s_quote_fixed, s_quote_negative, s_quote_negative_fixed]

# find embeddings for quotes and unpack into separate variables
e_doc, e_quote, e_quote_fixed, e_quote_negative, e_quote_negative_fixed = model.encode(quotes)

In [5]:
# cosine similarity between the doc embedding and each quote embedding
sim_doc_quote = util.cos_sim(e_doc, e_quote)
sim_doc_quote_fixed = util.cos_sim(e_doc, e_quote_fixed)
sim_doc_quote_negative = util.cos_sim(e_doc, e_quote_negative)
sim_doc_quote_negative_fixed = util.cos_sim(e_doc, e_quote_negative_fixed)
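
For reference, `util.cos_sim` returns the cosine of the angle between two embedding vectors: their dot product divided by the product of their norms. Here is a minimal pure-Python sketch of that calculation. The function name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the notebook or the library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 2-D vectors, not real sentence embeddings
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # perpendicular vectors
```

Because the score depends only on direction, not on vector length, parallel vectors score close to 1 and perpendicular ones score 0.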

In [6]:
print("sim doc quote: " + str(sim_doc_quote))
print("sim doc quote fixed: " + str(sim_doc_quote_fixed))
print("sim doc quote negative: " + str(sim_doc_quote_negative))
print("sim doc quote negative fixed: " + str(sim_doc_quote_negative_fixed))

sim doc quote: tensor([[0.3975]])
sim doc quote fixed: tensor([[0.4084]])
sim doc quote negative: tensor([[0.0472]])
sim doc quote negative fixed: tensor([[0.0619]])
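
To read these numbers as a search result, the candidates can be ranked by score. The sketch below hard-codes the four scores printed above and does not re-run the model. The dictionary labels and the `gap` variable are illustrative names. With one anchor and four candidates, this is a sanity check rather than an evaluation.

```python
# Scores copied from the printed output above
scores = {
    "quote (raw)": 0.3975,
    "quote (fixed)": 0.4084,
    "negative (raw)": 0.0472,
    "negative (fixed)": 0.0619,
}

# Rank candidates from most to least similar to the doc sentence
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for label, score in ranked:
    print(f"{score:.4f}  {label}")

# Gap between the weakest related quote and the strongest unrelated one
gap = min(scores["quote (raw)"], scores["quote (fixed)"]) - max(
    scores["negative (raw)"], scores["negative (fixed)"]
)
```

In this test both related quotes rank well above both unrelated ones, and the manual cleanup shifts each score by only about 0.01.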
