Faiss

Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.
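To make "similarity search" concrete, here is a minimal brute-force sketch in plain Python. It is not how Faiss is implemented (Faiss uses optimized and approximate index structures); the function names and toy vectors below are hypothetical and only illustrate the idea of ranking stored vectors by their distance to a query.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, vectors, k=2):
    """Return the indices of the k stored vectors closest to the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]

# Hypothetical toy data: three 2-D "embeddings".
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
nearest = brute_force_search([0.9, 1.2], vectors, k=2)
print(nearest)
```

A linear scan like this compares the query against every stored vector, which is the cost that a dedicated library such as Faiss is designed to reduce on large collections.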

In [9]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

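# Load the raw text file and split it into chunks for embedding.
loader = TextLoader("speech.txt")
documents = loader.load()
# CharacterTextSplitter splits on its separator (a blank line by default), so text
# without that separator stays in one chunk even if it is longer than chunk_size.
text_splitter = CharacterTextSplitter(chunk_size=20, chunk_overlap=5)
docs = text_splitter.split_documents(documents)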


In [10]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Nigatana Gokuda....i love to watch this anime.\nit was childhood anime which was use t watch in my childhood days.')]

In [12]:
# Embed each chunk with the default Ollama embedding model and index the vectors in FAISS.
embeddings = OllamaEmbeddings()
db = FAISS.from_documents(docs, embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x28a42f2eab0>

In [14]:
# Query the index for the chunks most similar to the question.
query = "who is goku"
docs = db.similarity_search(query)
docs

[Document(id='67e6a9b4-985f-4404-9817-38bf4e9fb916', metadata={'source': 'speech.txt'}, page_content='Nigatana Gokuda....i love to watch this anime.\nit was childhood anime which was use t watch in my childhood days.')]

As a Retriever

We can also convert the vector store into a retriever. This allows us to easily use it in other LangChain methods, which largely work with retrievers.
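The idea of wrapping a store in a retriever can be sketched with two toy classes. This is not LangChain's implementation: `ToyVectorStore` and `ToyRetriever` are hypothetical names, and the store ranks texts by shared words instead of embeddings so that the example runs without any model. The point is only that callers depend on a single `invoke(query)` method rather than on store-specific search methods.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=1):
        query_words = set(query.lower().split())

        def overlap(text):
            # Number of distinct words the text shares with the query.
            return len(query_words & set(text.lower().split()))

        return sorted(self.texts, key=overlap, reverse=True)[:k]

    def as_retriever(self, k=1):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Exposes one invoke(query) method and hides the store behind it."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore(["goku is a saiyan", "faiss indexes vectors"])
retriever = store.as_retriever(k=1)
hits = retriever.invoke("who is goku")
print(hits)
```

In LangChain itself, the number of returned documents is usually configured when creating the retriever, for example `db.as_retriever(search_kwargs={"k": 1})`.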

In [16]:
retriever = db.as_retriever()
# Capture the retriever's results so the next line shows what it actually returned.
docs = retriever.invoke(query)
docs[0].page_content

'Nigatana Gokuda....i love to watch this anime.\nit was childhood anime which was use t watch in my childhood days.'

Similarity Search With Score

There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance between the query and each of them. The returned score is an L2 distance, so a lower score means a closer match.
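The shape of the result and the "lower is better" rule can be illustrated with a small pure-Python sketch. The helper names and the toy store are hypothetical. The sketch uses the squared Euclidean distance because, to my understanding, Faiss's flat L2 index reports squared distances; treat that as an assumption and verify it against your Faiss version. Squaring does not change the ranking.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_with_score(query_vec, store, k=2):
    """Return the k closest (text, score) pairs, smallest distance first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Hypothetical toy store of (text, embedding) pairs.
store = [
    ("about cats", [1.0, 0.0]),
    ("about dogs", [0.0, 1.0]),
    ("about cars", [4.0, 4.0]),
]
results = search_with_score([0.5, 0.0], store)
for text, score in results:
    print(text, score)
```

Because the score is a distance rather than a similarity, its absolute size depends on the embedding model and on whether the vectors are normalized; only the relative ordering of scores for the same query is directly comparable.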

In [18]:
docs_and_score=db.similarity_search_with_score(query)
docs_and_score

[(Document(id='67e6a9b4-985f-4404-9817-38bf4e9fb916', metadata={'source': 'speech.txt'}, page_content='Nigatana Gokuda....i love to watch this anime.\nit was childhood anime which was use t watch in my childhood days.'),
  np.float32(23567.156))]

In [19]:
emedding_vector= embeddings.embed_query(query)
emedding_vector

[3.365841865539551,
 -2.3504855632781982,
 2.5376017093658447,
 1.0542936325073242,
 -0.18533244729042053,
 -3.205479383468628,
 0.36722174286842346,
 -0.13830003142356873,
 0.9207285642623901,
 -0.16606572270393372,
 -1.1659460067749023,
 -2.1147124767303467,
 -0.06883677840232849,
 2.277165651321411,
 -0.3825697600841522,
 -0.8641270399093628,
 -2.7251927852630615,
 -2.063044786453247,
 0.2719815671443939,
 -0.09785901755094528,
 2.650308609008789,
 -2.707223653793335,
 -1.295717716217041,
 0.7363823652267456,
 -3.0589122772216797,
 -3.172492027282715,
 0.6781783699989319,
 1.1879762411117554,
 2.0177409648895264,
 1.3804348707199097,
 0.6171729564666748,
 -1.5168284177780151,
 -1.1690789461135864,
 1.258673906326294,
 1.1426163911819458,
 -1.2247453927993774,
 -2.0534253120422363,
 -0.9062257409095764,
 0.27469369769096375,
 -1.3172731399536133,
 -1.7639544010162354,
 0.5497037172317505,
 -0.1823546290397644,
 0.1870373636484146,
 -1.9884331226348877,
 3.783402919769287,
 0.12092823

In [20]:
# Search with the precomputed query vector and inspect the top hit.
docs_by_vector = db.similarity_search_by_vector(emedding_vector)
docs_by_vector[0].page_content

'Nigatana Gokuda....i love to watch this anime.\nit was childhood anime which was use t watch in my childhood days.'

In [21]:
## Saving and Loading
db.save_local("faiss_index")

In [22]:
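# load_local unpickles the saved docstore, so only enable
# allow_dangerous_deserialization for index files you created or trust.
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)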

In [23]:
docs

[Document(id='67e6a9b4-985f-4404-9817-38bf4e9fb916', metadata={'source': 'speech.txt'}, page_content='Nigatana Gokuda....i love to watch this anime.\nit was childhood anime which was use t watch in my childhood days.')]