
Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors.
It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM.
It also contains supporting code for evaluation and parameter tuning.



In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../Data-Ingestion/speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=30)
docs = text_splitter.split_documents(documents)
docs

[Document(metadata={'source': '../Data-Ingestion/speech.txt'}, page_content="Speech is the use of the human voice as a medium for language.\nSpoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon.\nThere are many different intentional speech acts, such as informing, declaring, asking, persuading, directing; acts may vary in various aspects like enunciation, intonation, loudness, and tempo to convey meaning.\nIndividuals may also unintentionally communicate aspects of their social position through speech, such as sex, age, place of origin, physiological and mental condition, education, and experiences.")]

In [4]:
embeddings = OllamaEmbeddings(model="nomic-embed-text")
db = FAISS.from_documents(docs, embeddings)
db


<langchain_community.vectorstores.faiss.FAISS at 0x27477908dc0>

In [5]:
### Querying

query = "What spoken language combines?"

docs = db.similarity_search(query)
docs

[Document(id='f963ab3a-aa36-4a84-8783-f2fccaa2e7e1', metadata={'source': '../Data-Ingestion/speech.txt'}, page_content="Speech is the use of the human voice as a medium for language.\nSpoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon.\nThere are many different intentional speech acts, such as informing, declaring, asking, persuading, directing; acts may vary in various aspects like enunciation, intonation, loudness, and tempo to convey meaning.\nIndividuals may also unintentionally communicate aspects of their social position through speech, such as sex, age, place of origin, physiological and mental condition, education, and experiences.")]

#### As a Retriever

We can also convert the vector store into a retriever. This lets us use it with other LangChain
components, most of which work with retrievers.

LangChain chains that pass retrieved context to an LLM expect a retriever rather than a raw vector store, so we convert the store with as_retriever() before using it with an LLM.
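To illustrate why a retriever is convenient, here is a minimal sketch in plain Python. It is not LangChain's implementation: ToyVectorStore, ToyRetriever, and build_prompt are hypothetical names, and word overlap stands in for real embedding similarity. The point is only that downstream code needs a single invoke(query) method and does not care which store sits behind it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyVectorStore:
    """Hypothetical stand-in for a vector store (not the LangChain FAISS class)."""
    texts: List[str]

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # Rank texts by the number of words shared with the query.
        # This is a crude proxy for embedding similarity, used only for illustration.
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, k: int = 4) -> "ToyRetriever":
        return ToyRetriever(store=self, k=k)


@dataclass
class ToyRetriever:
    """Wraps a store behind a single invoke(query) method."""
    store: ToyVectorStore
    k: int

    def invoke(self, query: str) -> List[str]:
        return self.store.similarity_search(query, k=self.k)


def build_prompt(retriever, question: str) -> str:
    # Downstream code depends only on invoke(), not on the store type.
    context = "\n".join(retriever.invoke(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore(texts=[
    "Spoken language combines vowel and consonant sounds.",
    "Faiss searches sets of dense vectors.",
])
retriever = store.as_retriever(k=1)
prompt = build_prompt(retriever, "What does spoken language combine?")
print(prompt)
```

Any object exposing invoke(query) could be passed to build_prompt, which is roughly the role a retriever plays when a vector store is wired into an LLM pipeline.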

In [6]:
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs[0].page_content

"Speech is the use of the human voice as a medium for language.\nSpoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon.\nThere are many different intentional speech acts, such as informing, declaring, asking, persuading, directing; acts may vary in various aspects like enunciation, intonation, loudness, and tempo to convey meaning.\nIndividuals may also unintentionally communicate aspects of their social position through speech, such as sex, age, place of origin, physiological and mental condition, education, and experiences."

#### Similarity Search with Score

There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns
not only the documents but also the distance between the query and each of them. The returned score is an L2
distance, so a lower score means a closer match.
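As a rough illustration of why a lower score is better, the sketch below computes the L2 (Euclidean) distance by hand and ranks two toy vectors against a query vector. The vectors and names (l2_distance, query_vec, doc_vecs) are made up for this example and are not produced by the embedding model above. Faiss's flat L2 index is documented as reporting the squared distance, which leaves the ranking unchanged.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 2-dimensional "embeddings" (illustrative values only).
query_vec = [1.0, 0.0]
doc_vecs = {
    "near": [1.0, 1.0],  # distance 1.0 from the query
    "far": [4.0, 4.0],   # distance 5.0 from the query
}

# Sort ascending: the smallest distance is the best match.
ranked = sorted(doc_vecs, key=lambda name: l2_distance(query_vec, doc_vecs[name]))
print(ranked)
```

Sorting in ascending order puts the closest vector first, which matches the ordering of the results returned by similarity_search_with_score.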

In [7]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='f963ab3a-aa36-4a84-8783-f2fccaa2e7e1', metadata={'source': '../Data-Ingestion/speech.txt'}, page_content="Speech is the use of the human voice as a medium for language.\nSpoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon.\nThere are many different intentional speech acts, such as informing, declaring, asking, persuading, directing; acts may vary in various aspects like enunciation, intonation, loudness, and tempo to convey meaning.\nIndividuals may also unintentionally communicate aspects of their social position through speech, such as sex, age, place of origin, physiological and mental condition, education, and experiences."),
  np.float32(373.6297))]

We can also search with an embedding vector instead of a query string.

In [8]:
embeddings_vector = embeddings.embed_query(query)
embeddings_vector

[0.15798084437847137,
 0.5120458006858826,
 -4.082484245300293,
 -0.4784327745437622,
 1.1038262844085693,
 0.10434720665216446,
 -0.9318997263908386,
 -0.4123402237892151,
 -0.5134251117706299,
 -0.6889792084693909,
 -0.09757763147354126,
 -0.1335199773311615,
 0.9321622252464294,
 1.8160008192062378,
 -0.22814412415027618,
 -1.3205331563949585,
 -0.3057699203491211,
 0.33232754468917847,
 -1.2231028079986572,
 1.8165100812911987,
 -1.9196696281433105,
 1.1816108226776123,
 1.0094308853149414,
 -1.167564034461975,
 2.0028231143951416,
 0.4723759591579437,
 -0.3366633355617523,
 0.8622792363166809,
 -1.731506586074829,
 -1.3744410276412964,
 0.6180493831634521,
 -1.1953856945037842,
 0.45701128244400024,
 1.3643853664398193,
 -0.6380380988121033,
 -1.439465880393982,
 0.8053198456764221,
 -0.7258009314537048,
 -0.8465796709060669,
 -0.18383577466011047,
 0.3750064969062805,
 0.7111011743545532,
 0.2821485698223114,
 -0.6023359298706055,
 2.1714305877685547,
 -0.6370998024940491,
 0.131

In [9]:
# similarity_search_by_vector returns documents only (no scores)
docs_by_vector = db.similarity_search_by_vector(embeddings_vector)
docs_by_vector

[Document(id='f963ab3a-aa36-4a84-8783-f2fccaa2e7e1', metadata={'source': '../Data-Ingestion/speech.txt'}, page_content="Speech is the use of the human voice as a medium for language.\nSpoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon.\nThere are many different intentional speech acts, such as informing, declaring, asking, persuading, directing; acts may vary in various aspects like enunciation, intonation, loudness, and tempo to convey meaning.\nIndividuals may also unintentionally communicate aspects of their social position through speech, such as sex, age, place of origin, physiological and mental condition, education, and experiences.")]

In [10]:
### Saving and loading the vector DB locally

db.save_local("faiss_local")

In [11]:
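# allow_dangerous_deserialization=True is needed because the saved store includes a pickle file;
# only load indexes from sources you trust.
new_db = FAISS.load_local("faiss_local", embeddings, allow_dangerous_deserialization=True)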
docs = new_db.similarity_search(query)
docs

[Document(id='f963ab3a-aa36-4a84-8783-f2fccaa2e7e1', metadata={'source': '../Data-Ingestion/speech.txt'}, page_content="Speech is the use of the human voice as a medium for language.\nSpoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon.\nThere are many different intentional speech acts, such as informing, declaring, asking, persuading, directing; acts may vary in various aspects like enunciation, intonation, loudness, and tempo to convey meaning.\nIndividuals may also unintentionally communicate aspects of their social position through speech, such as sex, age, place of origin, physiological and mental condition, education, and experiences.")]