* **Semantic search** retrieves results by meaning rather than by exact keyword matching.

In [7]:
import pandas as pd
import numpy as np
from tqdm import tqdm
import os

In [2]:
from sentence_transformers import SentenceTransformer




# Dense Retrieval

<center><img src="img/dense.png"></center>

In [13]:
import cohere
api_key=os.environ["COHERE_API"]
co=cohere.Client(api_key)

In [14]:
text = """
Interstellar is a 2014 epic science fiction film co-written, directed, and produced by
Christopher Nolan.
It stars Matthew McConaughey, Anne Hathaway, Jessica Chastain, Bill Irwin, Ellen
Burstyn, Matt Damon, and Michael Caine.
Set in a dystopian future where humanity is struggling to survive, the film follows a group
of astronauts who travel through a wormhole near Saturn in search of a new home for
mankind.
Brothers Christopher and Jonathan Nolan wrote the screenplay, which had its origins in
a script Jonathan developed in 2007.
Caltech theoretical physicist and 2017 Nobel laureate in Physics[4] Kip Thorne was an
executive producer, acted as a scientific consultant, and wrote a tie-in book, The Science
of Interstellar.
Cinematographer Hoyte van Hoytema shot it on 35 mm movie film in the Panavision
anamorphic format and IMAX 70 mm.
Principal photography began in late 2013 and took place in Alberta, Iceland, and Los
Angeles.
Interstellar uses extensive practical and miniature effects and the company Double
Negative created additional digital effects.
Interstellar premiered on October 26, 2014, in Los Angeles.
In the United States, it was first released on film stock, expanding to venues using digital
projectors.
The film had a worldwide gross over $677 million (and $773 million with subsequent rereleases),
making it the tenth-highest grossing film of 2014.
It received acclaim for its performances, direction, screenplay, musical score, visual
effects, ambition, themes, and emotional weight.
It has also received praise from many astronomers for its scientific accuracy and
portrayal of theoretical astrophysics. Since its premiere, Interstellar gained a cult
following,[5] and now is regarded by many sci-fi experts as one of the best science fiction
films of all time.
Interstellar was nominated for five awards at the 87th Academy Awards, winning Best
Visual Effects, and received numerous other accolades"""

In [15]:
chunks=text.split(".")
chunks

['\nInterstellar is a 2014 epic science fiction film co-written, directed, and produced by\nChristopher Nolan',
 '\nIt stars Matthew McConaughey, Anne Hathaway, Jessica Chastain, Bill Irwin, Ellen\nBurstyn, Matt Damon, and Michael Caine',
 '\nSet in a dystopian future where humanity is struggling to survive, the film follows a group\nof astronauts who travel through a wormhole near Saturn in search of a new home for\nmankind',
 '\nBrothers Christopher and Jonathan Nolan wrote the screenplay, which had its origins in\na script Jonathan developed in 2007',
 '\nCaltech theoretical physicist and 2017 Nobel laureate in Physics[4] Kip Thorne was an\nexecutive producer, acted as a scientific consultant, and wrote a tie-in book, The Science\nof Interstellar',
 '\nCinematographer Hoyte van Hoytema shot it on 35 mm movie film in the Panavision\nanamorphic format and IMAX 70 mm',
 '\nPrincipal photography began in late 2013 and took place in Alberta, Iceland, and Los\nAngeles',
 '\nInterstellar u

In [19]:
# Strip surrounding whitespace (newlines and stray spaces) from each chunk.
chunks=[chunk.strip() for chunk in chunks]
chunks

['Interstellar is a 2014 epic science fiction film co-written, directed, and produced by\nChristopher Nolan',
 'It stars Matthew McConaughey, Anne Hathaway, Jessica Chastain, Bill Irwin, Ellen\nBurstyn, Matt Damon, and Michael Caine',
 'Set in a dystopian future where humanity is struggling to survive, the film follows a group\nof astronauts who travel through a wormhole near Saturn in search of a new home for\nmankind',
 'Brothers Christopher and Jonathan Nolan wrote the screenplay, which had its origins in\na script Jonathan developed in 2007',
 'Caltech theoretical physicist and 2017 Nobel laureate in Physics[4] Kip Thorne was an\nexecutive producer, acted as a scientific consultant, and wrote a tie-in book, The Science\nof Interstellar',
 'Cinematographer Hoyte van Hoytema shot it on 35 mm movie film in the Panavision\nanamorphic format and IMAX 70 mm',
 'Principal photography began in late 2013 and took place in Alberta, Iceland, and Los\nAngeles',
 'Interstellar uses extensive pr

In [20]:
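# Embed every chunk with Cohere; "search_document" marks these as the texts to be searched.
doc_embeddings=co.embed(
    texts=chunks,
    input_type="search_document",
).embeddings
# One row per chunk, one column per embedding dimension.
embeds=np.array(doc_embeddings)
print(embeds.shape)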

(15, 4096)


In [24]:
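import faiss
# IndexFlatL2 does exact (brute-force) L2 search over vectors of the given dimension.
index=faiss.IndexFlatL2(embeds.shape[1])
# A flat index needs no training, so this prints True.
print(index.is_trained)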

True


In [25]:
index.add(np.float32(embeds))
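
For intuition about what a flat L2 index computes, here is a minimal NumPy sketch of exhaustive search. It is not FAISS's implementation; the function name `flat_l2_search` and the toy vectors are made up for illustration. Like `IndexFlatL2`, it ranks by squared Euclidean distance, which is why the distances returned by such a search can be large numbers rather than values between 0 and 1.

```python
import numpy as np

def flat_l2_search(vectors, query, top_k=3):
    """Exhaustive nearest-neighbour search by squared L2 distance (illustrative only)."""
    vectors = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Squared Euclidean distance from the query to every stored vector.
    dists = ((vectors - query) ** 2).sum(axis=1)
    # Indices of the top_k smallest distances, closest first.
    ids = np.argsort(dists)[:top_k]
    return dists[ids], ids

# Toy 2-D vectors (hypothetical values, not real embeddings).
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
toy_dists, toy_ids = flat_l2_search(toy_vectors, [0.9, 0.1], top_k=2)
print(toy_ids, toy_dists)
```

Because every stored vector is compared with the query, the result is exact, but the cost grows linearly with the number of vectors.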

In [30]:
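def search_top_k(query,top_k=3):
    """Return the top_k chunks closest to the query, with their L2 distances."""
    # Embed the query with the matching "search_query" input type.
    q_embed=co.embed(
        texts=[query],
        input_type="search_query"
    ).embeddings[0]
    # FAISS expects a 2-D float32 array of shape (n_queries, dim).
    distances,ids=index.search(np.float32([q_embed]),top_k)
    chunks_np=np.array(chunks)
    results=pd.DataFrame({
        "texts":chunks_np[ids[0]],
        "dists":distances[0]  # smaller distance = closer match
    })
    return results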

In [31]:
query="how precise was the science"
results=search_top_k(query)
results

                                               texts         dists
0  It has also received praise from many astronom...  11104.859375
1  Interstellar uses extensive practical and mini...  11975.107422
2  Caltech theoretical physicist and 2017 Nobel l...  12704.984375


## Chunking Long Text

<center>
<img src="img/chunking.png">
<img src="img/chunk_app.png">
</center>
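
Splitting on every period, as done earlier, gives one sentence per chunk. One alternative is to group several sentences per chunk and let neighbouring chunks overlap so that context is not cut off at chunk boundaries. The sketch below is one possible way to do that; the helper name `chunk_sentences` and its `size` and `overlap` parameters are assumptions for illustration, not part of the notebook.

```python
def chunk_sentences(sentences, size=3, overlap=1):
    """Group sentences into chunks of `size`, sharing `overlap` sentences between neighbours."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    grouped = []
    for start in range(0, len(sentences), step):
        window = sentences[start:start + size]
        grouped.append(". ".join(window))
        # Stop once the window has reached the last sentence.
        if start + size >= len(sentences):
            break
    return grouped

demo = chunk_sentences(["a", "b", "c", "d", "e"], size=3, overlap=1)
print(demo)
```

Larger chunks carry more context per embedding, while smaller chunks make each match more specific; the right balance depends on the text and the queries.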

- A vector database allows you to add or delete vectors without having to rebuild the index
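
To make the add/delete idea concrete, here is a toy in-memory store, assuming nothing beyond the Python standard library. The class `ToyVectorStore` is hypothetical and uses brute-force search; real vector databases rely on approximate nearest-neighbour indexes and persistent storage, which this sketch does not attempt to model.

```python
import math

class ToyVectorStore:
    """Illustrative store: vectors can be added or deleted by id at any time."""

    def __init__(self):
        self._vectors = {}

    def add(self, vec_id, vector):
        self._vectors[vec_id] = list(vector)

    def delete(self, vec_id):
        # Removing an entry needs no rebuild; the dict simply forgets it.
        self._vectors.pop(vec_id, None)

    def search(self, query, top_k=3):
        # Brute-force Euclidean distance to every stored vector, closest first.
        scored = sorted(
            (math.dist(query, vector), vec_id)
            for vec_id, vector in self._vectors.items()
        )
        return [(vec_id, dist) for dist, vec_id in scored[:top_k]]

store = ToyVectorStore()
store.add("a", [0.0, 0.0])
store.add("b", [1.0, 1.0])
store.add("c", [5.0, 5.0])
before = store.search([0.9, 0.9], top_k=1)
store.delete("b")
after = store.search([0.9, 0.9], top_k=1)
print(before, after)
```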

# Reranking

<center>
    <img src="img/rerank_pipe.png">
</center>

In [33]:
# Score every chunk against the query and keep the 3 most relevant ones.
results=co.rerank(
    query=query,documents=chunks,top_n=3,return_documents=True
)
results



In [36]:
for result in results.results:
    print(result.document.text,result.relevance_score)

It has also received praise from many astronomers for its scientific accuracy and
portrayal of theoretical astrophysics 0.16981852
The film had a worldwide gross over $677 million (and $773 million with subsequent rereleases),
making it the tenth-highest grossing film of 2014 0.07030385
Caltech theoretical physicist and 2017 Nobel laureate in Physics[4] Kip Thorne was an
executive producer, acted as a scientific consultant, and wrote a tie-in book, The Science
of Interstellar 0.0043994132


<center>
    <img src="img/rerank_schema.png">
</center>
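
The rerank cell above scores every chunk directly. A common arrangement is to rerank only a shortlist produced by a cheaper first-stage retriever. The sketch below shows that two-stage flow under stated assumptions: `retrieve_then_rerank`, `keyword_retrieve`, and `overlap_score` are hypothetical stand-ins, with simple word overlap taking the place of both the dense retriever and the reranking model.

```python
def retrieve_then_rerank(query, documents, retrieve, score, shortlist_k=10, top_n=3):
    """First stage narrows the candidates; second stage re-scores and reorders them."""
    shortlist = retrieve(query, documents, shortlist_k)
    ranked = sorted(shortlist, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_n]

def keyword_retrieve(query, documents, k):
    # Stand-in first stage: keep documents sharing at least one word with the query.
    words = set(query.lower().split())
    hits = [doc for doc in documents if words & set(doc.lower().split())]
    return hits[:k]

def overlap_score(query, doc):
    # Stand-in relevance score: number of query words that appear in the document.
    words = set(query.lower().split())
    return len(words & set(doc.lower().split()))

docs = [
    "the science was accurate",
    "the film grossed millions",
    "science and the film",
]
reranked = retrieve_then_rerank(
    "science film", docs, keyword_retrieve, overlap_score, top_n=2
)
print(reranked)
```

In practice the first stage could be a function like `search_top_k` and the scorer a call to a reranking model, so that the more expensive model only sees a handful of candidates.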