## FAISS (Facebook AI Similarity Search)

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS 
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("alien.txt")
documents = loader.load()
documents

[Document(metadata={'source': 'alien.txt'}, page_content='Extraterrestrial life, or alien life (colloquially, alien), is life that originates from another world rather than on Earth. No extraterrestrial life has yet been scientifically conclusively detected. Such life might range from simple forms such as prokaryotes to intelligent beings, possibly bringing forth civilizations that might be far more, or far less, advanced than humans.[1][2][3] The Drake equation speculates about the existence of sapient life elsewhere in the universe. The science of extraterrestrial life is known as astrobiology.\n\nSpeculation about the possibility of inhabited worlds beyond Earth dates back to antiquity. Early Christian writers discussed the idea of a "plurality of worlds" as proposed by earlier thinkers such as Democritus; Augustine references Epicurus\'s idea of innumerable worlds "throughout the boundless immensity of space" in The City of God.[4]\n\nPre-modern writers typically assumed extraterre

In [2]:
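# CharacterTextSplitter cuts only at its separator ("\n\n" by default), so longer paragraphs stay whole.
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=30)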

In [10]:
documents = text_splitter.split_documents(documents)

Created a chunk of size 545, which is longer than the specified 200
Created a chunk of size 341, which is longer than the specified 200
Created a chunk of size 767, which is longer than the specified 200
Created a chunk of size 696, which is longer than the specified 200
Created a chunk of size 392, which is longer than the specified 200
Created a chunk of size 712, which is longer than the specified 200
Created a chunk of size 1508, which is longer than the specified 200
Created a chunk of size 931, which is longer than the specified 200
Created a chunk of size 1383, which is longer than the specified 200
Created a chunk of size 1156, which is longer than the specified 200
Created a chunk of size 526, which is longer than the specified 200
Created a chunk of size 469, which is longer than the specified 200
Created a chunk of size 1163, which is longer than the specified 200
Created a chunk of size 669, which is longer than the specified 200
Created a chunk of size 740, which is longer
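
The warnings above appear because a separator-based splitter can only cut where the separator occurs; a paragraph longer than `chunk_size` has no cut point and is emitted whole. The following is a simplified sketch of that behavior, not LangChain's actual implementation: `split_on_separator` is a hypothetical helper, and it ignores `chunk_overlap` entirely.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Toy splitter: cut only at `separator`, then merge small pieces.

    Hypothetical helper for illustration; not LangChain's implementation.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate          # still fits, keep merging
        else:
            if current:
                chunks.append(current)   # flush what has been merged so far
            current = piece              # an oversized piece is kept whole
    if current:
        chunks.append(current)
    return chunks


# Three paragraphs; the middle one is longer than chunk_size.
text = "short paragraph" + "\n\n" + "x" * 300 + "\n\n" + "another short one"
chunks = split_on_separator(text, chunk_size=200)
oversized = [len(c) for c in chunks if len(c) > 200]

print([len(c) for c in chunks])  # chunk lengths
print(oversized)                 # lengths that exceed chunk_size
```

If strict size limits matter, a splitter that falls back to finer separators (for example LangChain's `RecursiveCharacterTextSplitter`) is one option to consider.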

In [11]:
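# gemma2:2b is a general-purpose chat model; a dedicated embedding model may retrieve more relevant chunks.
embeddings = OllamaEmbeddings(model="gemma2:2b")
db = FAISS.from_documents(documents, embeddings)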


In [15]:
query = "what is an extraterrestrial life?"
docs = db.similarity_search(query)
docs[0].page_content

"The aspects of these harsh environments that make them ideal for the origin of life on Earth, as well as the possibility of creation of life on other planets, is the chemical reactions forming spontaneously. For example, the hydrothermal vents found on the ocean floor are known to support many chemosynthetic processes[9] which allow organisms to utilize energy through reduced chemical compounds that fix carbon.[64] In return, these reactions will allow for organisms to live in relatively low oxygenated environments while maintaining enough energy to support themselves. The early Earth environment was reducing[65] and therefore, these carbon fixing compounds were necessary for the survival and possible origin of life on Earth. With the little amount of information that scientists have found regarding the atmosphere on other planets in the Milky Way galaxy and beyond, the atmospheres are most likely reducing or with very low oxygen levels,[66] especially when compared with Earth's atmos

We can also convert the vector store into a retriever. This makes it easy to use in other LangChain components, which largely work with retrievers.

In [16]:
retriever = db.as_retriever()
ans = retriever.invoke(query)
ans[0].page_content

"The aspects of these harsh environments that make them ideal for the origin of life on Earth, as well as the possibility of creation of life on other planets, is the chemical reactions forming spontaneously. For example, the hydrothermal vents found on the ocean floor are known to support many chemosynthetic processes[9] which allow organisms to utilize energy through reduced chemical compounds that fix carbon.[64] In return, these reactions will allow for organisms to live in relatively low oxygenated environments while maintaining enough energy to support themselves. The early Earth environment was reducing[65] and therefore, these carbon fixing compounds were necessary for the survival and possible origin of life on Earth. With the little amount of information that scientists have found regarding the atmosphere on other planets in the Milky Way galaxy and beyond, the atmospheres are most likely reducing or with very low oxygen levels,[66] especially when compared with Earth's atmos
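
To make the retriever idea concrete, here is a toy sketch of the pattern: a retriever is a thin wrapper that exposes a uniform `invoke(query)` call on top of a store's similarity search. Everything here is hypothetical and for illustration only (`ToyVectorStore`, `ToyRetriever`, the keyword-count `embed` function, and the `k` argument on `as_retriever`); it is not how LangChain or FAISS are implemented.

```python
VOCAB = ["alien", "life", "vents", "ocean", "drake"]


def embed(text):
    """Toy embedding: count how often each vocabulary word occurs."""
    lowered = text.lower()
    return [lowered.count(word) for word in VOCAB]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyVectorStore:
    def __init__(self, docs):
        # Store each document next to its precomputed vector.
        self.entries = [(doc, embed(doc)) for doc in docs]

    def similarity_search(self, query, k=4):
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: squared_distance(e[1], query_vec))
        return [doc for doc, _ in ranked[:k]]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Uniform interface: callers only need invoke(query)."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore([
    "alien life on other worlds",
    "hydrothermal vents on the ocean floor",
    "the drake equation",
])
retriever = store.as_retriever(k=2)
results = retriever.invoke("what is alien life")
print(results)
```

In LangChain itself, the number of returned documents is typically set through `search_kwargs`, for example `db.as_retriever(search_kwargs={"k": 2})`.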

### Similarity search with score
FAISS also provides some store-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance between the query and each of them. The returned score is an L2 distance, so a lower score means a closer match.
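
As a small worked example of why lower is better, the sketch below computes L2 (Euclidean) distances by hand and ranks a few made-up 2-D vectors against a query. The vectors and names (`doc_a`, `doc_b`, `doc_c`) are invented for illustration; real embeddings have hundreds or thousands of dimensions. Note also that FAISS's flat L2 index is documented as reporting squared L2 distance, which does not change the ranking.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query_vec = [1.0, 0.0]
doc_vecs = {
    "doc_a": [0.9, 0.1],   # almost the same direction as the query
    "doc_b": [0.0, 1.0],   # far from the query
    "doc_c": [1.0, 0.5],   # in between
}

# Sort ascending: the smallest distance is the best match.
ranked = sorted(
    ((name, l2_distance(query_vec, vec)) for name, vec in doc_vecs.items()),
    key=lambda pair: pair[1],
)

for name, score in ranked:
    print(f"{name}: {score:.4f}")
```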

In [17]:
ans_and_score = db.similarity_search_with_score(query)
ans_and_score

[(Document(id='089ba740-b9ce-40a9-969c-b97f72a06da9', metadata={'source': 'alien.txt'}, page_content="The aspects of these harsh environments that make them ideal for the origin of life on Earth, as well as the possibility of creation of life on other planets, is the chemical reactions forming spontaneously. For example, the hydrothermal vents found on the ocean floor are known to support many chemosynthetic processes[9] which allow organisms to utilize energy through reduced chemical compounds that fix carbon.[64] In return, these reactions will allow for organisms to live in relatively low oxygenated environments while maintaining enough energy to support themselves. The early Earth environment was reducing[65] and therefore, these carbon fixing compounds were necessary for the survival and possible origin of life on Earth. With the little amount of information that scientists have found regarding the atmosphere on other planets in the Milky Way galaxy and beyond, the atmospheres are

In [18]:
# Search with a precomputed query embedding instead of raw text
embedding_vector = embeddings.embed_query(query)
db.similarity_search_by_vector(embedding_vector)

[Document(id='089ba740-b9ce-40a9-969c-b97f72a06da9', metadata={'source': 'alien.txt'}, page_content="The aspects of these harsh environments that make them ideal for the origin of life on Earth, as well as the possibility of creation of life on other planets, is the chemical reactions forming spontaneously. For example, the hydrothermal vents found on the ocean floor are known to support many chemosynthetic processes[9] which allow organisms to utilize energy through reduced chemical compounds that fix carbon.[64] In return, these reactions will allow for organisms to live in relatively low oxygenated environments while maintaining enough energy to support themselves. The early Earth environment was reducing[65] and therefore, these carbon fixing compounds were necessary for the survival and possible origin of life on Earth. With the little amount of information that scientists have found regarding the atmosphere on other planets in the Milky Way galaxy and beyond, the atmospheres are 

In [19]:
# Save the index to a local folder
db.save_local("faiss_index")

In [24]:
# Load the index back; the saved docstore is a pickle file, so only enable this flag for files you trust
loaded_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
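
The `allow_dangerous_deserialization` flag exists because part of the saved index is a pickle file, and unpickling lets the file decide which callable runs during loading. The sketch below shows the mechanism with a deliberately harmless callable (`int`); `Payload` is a hypothetical class made up for this illustration.

```python
import pickle


class Payload:
    # __reduce__ tells pickle which callable to invoke, and with which
    # arguments, when the data is loaded. Here it is the harmless int("42").
    def __reduce__(self):
        return (int, ("42",))


blob = pickle.dumps(Payload())
restored = pickle.loads(blob)   # runs int("42") instead of rebuilding a Payload

print(type(restored).__name__, restored)
```

A malicious file could name a far less friendly callable in the same way, which is why the flag should only be set for index files you created yourself or otherwise trust.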

In [25]:
loaded_db.similarity_search(query)

[Document(id='089ba740-b9ce-40a9-969c-b97f72a06da9', metadata={'source': 'alien.txt'}, page_content="The aspects of these harsh environments that make them ideal for the origin of life on Earth, as well as the possibility of creation of life on other planets, is the chemical reactions forming spontaneously. For example, the hydrothermal vents found on the ocean floor are known to support many chemosynthetic processes[9] which allow organisms to utilize energy through reduced chemical compounds that fix carbon.[64] In return, these reactions will allow for organisms to live in relatively low oxygenated environments while maintaining enough energy to support themselves. The early Earth environment was reducing[65] and therefore, these carbon fixing compounds were necessary for the survival and possible origin of life on Earth. With the little amount of information that scientists have found regarding the atmosphere on other planets in the Milky Way galaxy and beyond, the atmospheres are 