FAISS (Facebook AI Similarity Search) is an open-source library from Meta for fast similarity search and clustering of high-dimensional vectors. It is widely used in large-scale AI applications such as image and video retrieval, recommendation systems, and NLP. FAISS builds specialized indexes so that nearest-neighbor queries stay fast even over billions of vectors, and it reduces memory use and latency through techniques such as quantization and GPU acceleration. This lets it find the items most similar to a query vector in datasets where scanning every row of a traditional database would be too slow.
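
To make "nearest-neighbor search" concrete, here is a minimal sketch of the exact, brute-force version of the problem that FAISS indexes are designed to speed up. It is plain Python, not FAISS code; the helper names `squared_l2` and `brute_force_search` and the toy vectors are made up for illustration.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(vectors, query, k=2):
    """Return the k (index, distance) pairs closest to query, nearest first.

    Every stored vector is compared with the query, so the cost grows
    linearly with the number of vectors.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return [(idx, dist) for dist, idx in heapq.nsmallest(k, scored)]

# Toy 3-dimensional "embeddings" (made-up values for illustration)
vectors = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query = [1.0, 0.0, 0.0]

# Vector 1 is identical to the query and vector 2 is the next closest
print(brute_force_search(vectors, query, k=2))
```

Comparing the query against every vector is fine for a handful of items, but it does not scale to billions, which is the gap that index structures and quantization are meant to close.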

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings


# Load the document
loader = TextLoader("speech.txt")
documents = loader.load()

# Split the document into chunks (CharacterTextSplitter splits on "\n\n" by default)
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20)
texts = text_splitter.split_documents(documents)

Created a chunk of size 482, which is longer than the specified 100
Created a chunk of size 477, which is longer than the specified 100
Created a chunk of size 108, which is longer than the specified 100
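
The warnings above appear because a separator-based splitter only cuts at its separator, so a paragraph longer than `chunk_size` comes through whole. The sketch below is a simplified model of that behaviour, assuming a `"\n\n"` separator and ignoring `chunk_overlap`; it is not LangChain's actual implementation, and `split_on_separator` is a hypothetical helper.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A single piece that is already longer than chunk_size is kept whole,
    because this strategy never cuts inside a piece.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may itself exceed chunk_size
    if current:
        chunks.append(current)
    return chunks

# Made-up text: two short paragraphs around one 150-character paragraph
text = "short one\n\n" + "x" * 150 + "\n\nshort two"
for chunk in split_on_separator(text, chunk_size=100):
    status = "over limit" if len(chunk) > 100 else "ok"
    print(len(chunk), status)
```

If chunks close to the configured size are needed, one option is a splitter that falls back to finer separators, such as `RecursiveCharacterTextSplitter` from the same `langchain_text_splitters` package.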


In [4]:
texts

[Document(metadata={'source': 'speech.txt'}, page_content='Text to speech (TTS) is the use of software to create a sound output in the form of a spoken voice. The program that is used by programs to change text on the page to an audio output of the spoken voice is normally a text to speech engine. Blind people, people who do not see well, and people with reading disabilities can rely on good text-to-speech systems. That way they can listen to pieces of the text. TTS engines are needed for an audio output of machine translation results.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Up until about 2010, there was the analytic approach: This approach uses multiply steps to convert the text to speech. Usually, an input text is transformed into phonetic writing. This says how the words are pronounced, and not how they are written. In the phonetic writing, phonemes can be identified. The system can then produce speech by putting together prerecorded or synthesized diphones. A

In [6]:
# Create embeddings
embeddings = OllamaEmbeddings(model="llama2")
# Create vector store
vectorstore = FAISS.from_documents(texts, embeddings)

In [None]:
# Query the vector store with a similarity search
query = "What is Text-to-Speech (TTS)?"
docs = vectorstore.similarity_search(query)


In [9]:
docs[0].page_content

'As of 2022, deep learning is used. To get a good result, neural networks are trained with many good samples.'

#### As a retriever
Retrieving the most relevant (most similar) documents from the vector database.


What is a normal query?

A normal query means:

Searching the vector database directly

Returning only the similar documents





What is a retriever query?

A retriever query means:

Passing the query through a standard search interface

Access to advanced options (MMR, filters, tuning k)
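
MMR (maximal marginal relevance) is the least obvious of these options, so here is a minimal pure-Python sketch of the idea: pick results that are relevant to the query but not near-duplicates of results already picked. The helper names (`unit`, `cosine`, `mmr`) and the toy 2-D vectors are illustrative assumptions, not the library's code.

```python
import math

def unit(degrees):
    """2-D unit vector at the given angle (toy stand-in for an embedding)."""
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Select k candidate indices, trading relevance against redundancy.

    lambda_mult=1.0 ranks purely by relevance; lower values penalize
    candidates that resemble ones already selected.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query_vec = unit(0)
candidates = [
    unit(10),   # most relevant
    unit(15),   # near-duplicate of the first candidate
    unit(-40),  # less relevant, but points in a different direction
]

print(mmr(query_vec, candidates, k=2, lambda_mult=1.0))  # relevance only
print(mmr(query_vec, candidates, k=2, lambda_mult=0.5))  # relevance balanced with diversity
```

With a LangChain vector store, this behaviour is typically requested through the retriever, for example `vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 2})`.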





What is similarity search with score?

This function:

Returns the matching documents

Returns a score along with each document

The score indicates how similar the query and the document are
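
How to read the score depends on the metric. With LangChain's default FAISS settings the score is an L2 distance, so a lower score means a closer match. The sketch below assumes the score is a squared L2 distance and shows, on made-up vectors, how that relates to cosine similarity for unit-length vectors: `squared_l2 = 2 - 2 * cosine`. All names here are hypothetical helpers.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

# Made-up vectors standing in for a query embedding and two document embeddings
query_vec = normalize([1.0, 2.0, 0.5])
toy_docs = {
    "close": normalize([1.0, 2.1, 0.4]),
    "far": normalize([-1.0, 0.2, 3.0]),
}

for name, vec in toy_docs.items():
    dist = squared_l2(query_vec, vec)
    cos = cosine_similarity(query_vec, vec)
    # The last two columns agree: for unit vectors, distance = 2 - 2 * cosine
    print(name, round(dist, 4), round(2 - 2 * cos, 4))
```

The practical point is that distance-style scores sort in the opposite direction from similarity-style scores, so check which one a store returns before applying a threshold.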

In [11]:
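retriever = vectorstore.as_retriever()
# Keep the retriever's results so the next line inspects them rather than the earlier search
docs = retriever.invoke(query)
docs[0].page_content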

'As of 2022, deep learning is used. To get a good result, neural networks are trained with many good samples.'

In [16]:
docs_and_score = vectorstore.similarity_search_with_score(query)

docs_and_score[0][0].page_content


'As of 2022, deep learning is used. To get a good result, neural networks are trained with many good samples.'

In [18]:
# Save the vector store to disk
vectorstore.save_local("faiss_vectorstore")


In [21]:
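# Load it back. The flag is required because part of the saved store is a pickle file, so only load files you trust.
new_vectorstore = FAISS.load_local("faiss_vectorstore", embeddings, allow_dangerous_deserialization=True)
docs = new_vectorstore.similarity_search(query)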

In [22]:
docs

[Document(id='1b677c87-de52-49ac-8900-75961e1f1a75', metadata={'source': 'speech.txt'}, page_content='As of 2022, deep learning is used. To get a good result, neural networks are trained with many good samples.'),
 Document(id='93a48199-8e72-4f5f-b3f8-7ac440c5c90c', metadata={'source': 'speech.txt'}, page_content='Text to speech (TTS) is the use of software to create a sound output in the form of a spoken voice. The program that is used by programs to change text on the page to an audio output of the spoken voice is normally a text to speech engine. Blind people, people who do not see well, and people with reading disabilities can rely on good text-to-speech systems. That way they can listen to pieces of the text. TTS engines are needed for an audio output of machine translation results.'),
 Document(id='8f497c38-27bd-4b10-9c0f-3b52cd2e0140', metadata={'source': 'speech.txt'}, page_content='Up until about 2010, there was the analytic approach: This approach uses multiply steps to conve