# FAISS

Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=30)
docs = text_splitter.split_documents(documents)

Created a chunk of size 422, which is longer than the specified 200
Created a chunk of size 426, which is longer than the specified 200
Created a chunk of size 376, which is longer than the specified 200
Created a chunk of size 270, which is longer than the specified 200
Created a chunk of size 310, which is longer than the specified 200
Created a chunk of size 283, which is longer than the specified 200
Created a chunk of size 268, which is longer than the specified 200
Created a chunk of size 276, which is longer than the specified 200
Created a chunk of size 247, which is longer than the specified 200
Created a chunk of size 343, which is longer than the specified 200
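
The warnings above appear because CharacterTextSplitter splits only on a single separator (by default a blank line, "\n\n") and never cuts inside a piece, so a paragraph longer than chunk_size is kept whole. The sketch below is a simplified pure-Python illustration of that behavior, not LangChain's implementation: the function name split_on_separator is hypothetical, and chunk overlap is left out.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most
    chunk_size characters; a single oversized piece is kept whole."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may exceed chunk_size: it is never cut
    if current:
        chunks.append(current)
    return chunks


# Two short paragraphs around one 300-character paragraph
text = "Short intro." + "\n\n" + "x" * 300 + "\n\n" + "Short outro."
chunks = split_on_separator(text, chunk_size=200)
oversized = [len(c) for c in chunks if len(c) > 200]
print(oversized)
```

If chunks must stay under the limit, RecursiveCharacterTextSplitter from langchain_text_splitters is one option, since it falls back to finer separators when a piece is too long.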


In [2]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Ladies and gentlemen,\n\nThank you all for being here today. It is both an honor and a privilege to address you at this pivotal moment in our shared journey toward progress and transformation.'),
 Document(metadata={'source': 'speech.txt'}, page_content="Over the last decade, we have witnessed a remarkable evolution in the world of technology. Artificial intelligence, machine learning, blockchain, and quantum computing are no longer just buzzwords—they are active drivers of change, disrupting traditional systems and reimagining what's possible across every industry. Yet, amid all this innovation, we must ask ourselves: are we building systems that truly serve humanity?"),
 Document(metadata={'source': 'speech.txt'}, page_content='The potential for good is enormous. AI has begun revolutionizing healthcare—diagnosing diseases faster and with greater accuracy than ever before. In education, intelligent tutoring systems are adaptin

In [4]:
embeddings = OllamaEmbeddings(model="gemma:2b")
db = FAISS.from_documents(docs, embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x15dfdc90550>

In [7]:
query = "What is the main topic about the text?"
query_result = db.similarity_search(query)
query_result

[Document(id='2af9de96-0ef3-4c8f-9962-1e5cc7f674ff', metadata={'source': 'speech.txt'}, page_content='Because progress without purpose is just motion. But progress with intention? That’s transformation.\n\nThank you.'),
 Document(id='29b84866-fc19-4cdb-9768-7ef759bfbb63', metadata={'source': 'speech.txt'}, page_content='Finally, let us imagine boldly. Imagine a world where rural clinics use AI to detect illnesses early, saving lives before symptoms emerge. Imagine classrooms that adapt in real-time to student needs, eliminating the achievement gap. Imagine cities that run on renewable power, orchestrated by intelligent grids, serving both people and planet.'),
 Document(id='c949310d-40d3-44bb-99bd-9aee1be2e45d', metadata={'source': 'speech.txt'}, page_content='The potential for good is enormous. AI has begun revolutionizing healthcare—diagnosing diseases faster and with greater accuracy than ever before. In education, intelligent tutoring systems are adapting to individual learning sty

# As a Retriever

We can also convert the vector store into a retriever. This makes it easy to use in other LangChain components, most of which work with retrievers rather than with vector stores directly.

In [8]:
retriever = db.as_retriever()
retriever.invoke(query)

[Document(id='2af9de96-0ef3-4c8f-9962-1e5cc7f674ff', metadata={'source': 'speech.txt'}, page_content='Because progress without purpose is just motion. But progress with intention? That’s transformation.\n\nThank you.'),
 Document(id='29b84866-fc19-4cdb-9768-7ef759bfbb63', metadata={'source': 'speech.txt'}, page_content='Finally, let us imagine boldly. Imagine a world where rural clinics use AI to detect illnesses early, saving lives before symptoms emerge. Imagine classrooms that adapt in real-time to student needs, eliminating the achievement gap. Imagine cities that run on renewable power, orchestrated by intelligent grids, serving both people and planet.'),
 Document(id='c949310d-40d3-44bb-99bd-9aee1be2e45d', metadata={'source': 'speech.txt'}, page_content='The potential for good is enormous. AI has begun revolutionizing healthcare—diagnosing diseases faster and with greater accuracy than ever before. In education, intelligent tutoring systems are adapting to individual learning sty
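
The retriever returns the same documents as similarity_search because it is a thin wrapper around the store that exposes a single invoke(query) method. The sketch below illustrates that pattern in plain Python. ToyVectorStore and ToyRetriever are hypothetical stand-ins, and ranking by word overlap replaces real embeddings. In LangChain itself, the number of results is usually set with db.as_retriever(search_kwargs={"k": 2}).

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store: ranks texts by word overlap."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda t: len(query_words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Exposes one method, invoke(query), hiding how the search is done."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore([
    "AI is changing healthcare",
    "Cities can run on renewable power",
    "Progress with intention is transformation",
])
retriever = store.as_retriever(k=1)
result = retriever.invoke("renewable power for cities")
print(result)
```

Code that only depends on invoke can switch between vector stores, or to a non-vector retriever, without changes.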

# Similarity Search With Score

There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance between the query and each document. The returned score is an L2 (Euclidean) distance, so a lower score means a closer match.

In [9]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='2af9de96-0ef3-4c8f-9962-1e5cc7f674ff', metadata={'source': 'speech.txt'}, page_content='Because progress without purpose is just motion. But progress with intention? That’s transformation.\n\nThank you.'),
  np.float32(2417.4385)),
 (Document(id='29b84866-fc19-4cdb-9768-7ef759bfbb63', metadata={'source': 'speech.txt'}, page_content='Finally, let us imagine boldly. Imagine a world where rural clinics use AI to detect illnesses early, saving lives before symptoms emerge. Imagine classrooms that adapt in real-time to student needs, eliminating the achievement gap. Imagine cities that run on renewable power, orchestrated by intelligent grids, serving both people and planet.'),
  np.float32(2566.4956)),
 (Document(id='c949310d-40d3-44bb-99bd-9aee1be2e45d', metadata={'source': 'speech.txt'}, page_content='The potential for good is enormous. AI has begun revolutionizing healthcare—diagnosing diseases faster and with greater accuracy than ever before. In education, intelligent t
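
To make the scores above concrete, here is a minimal sketch of distance-based ranking in plain Python. It assumes the squared Euclidean form, which is what FAISS flat L2 indexes report. The helper names and the tiny 2-dimensional vectors are hypothetical. The absolute size of a score depends on the scale of the embedding vectors, so only the ordering is meaningful when comparing results for one query.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_distance(query_vec, doc_vecs):
    """Return (doc_index, distance) pairs, closest first."""
    scored = [(i, squared_l2(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1])


query_vec = [1.0, 0.0]
doc_vecs = [[4.0, 4.0], [1.0, 1.0], [3.0, 0.0]]
ranking = rank_by_distance(query_vec, doc_vecs)
print(ranking)  # [(1, 1.0), (2, 4.0), (0, 25.0)]
```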

In [10]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[-0.10199008136987686,
 -0.031137611716985703,
 -0.5863866209983826,
 1.1169493198394775,
 0.932097315788269,
 1.6089903116226196,
 1.1856186389923096,
 0.6142663955688477,
 -0.7910444736480713,
 -0.24905449151992798,
 2.0573456287384033,
 0.6989849805831909,
 0.13383749127388,
 1.0903658866882324,
 -0.9834638237953186,
 0.020579509437084198,
 5.106143474578857,
 1.486456036567688,
 -0.39522382616996765,
 0.5108317732810974,
 2.443732500076294,
 0.9214428663253784,
 0.44091397523880005,
 -0.40331393480300903,
 -0.5745734572410583,
 -0.8763605356216431,
 -0.23952682316303253,
 0.7903960347175598,
 0.6244751811027527,
 -1.2474660873413086,
 1.3760820627212524,
 0.07469666749238968,
 -0.34819188714027405,
 0.2391585409641266,
 0.32903605699539185,
 -0.02615470066666603,
 -0.29221928119659424,
 -0.494477778673172,
 0.6345770955085754,
 -0.7289779782295227,
 -0.6959054470062256,
 -1.5868711471557617,
 1.3576053380966187,
 -1.6192067861557007,
 0.015007962472736835,
 -0.33966293931007385,
 1

In [11]:
# Search with a precomputed embedding; returns documents only, without scores
docs_score = db.similarity_search_by_vector(embedding_vector)

In [12]:
docs_score

[Document(id='2af9de96-0ef3-4c8f-9962-1e5cc7f674ff', metadata={'source': 'speech.txt'}, page_content='Because progress without purpose is just motion. But progress with intention? That’s transformation.\n\nThank you.'),
 Document(id='29b84866-fc19-4cdb-9768-7ef759bfbb63', metadata={'source': 'speech.txt'}, page_content='Finally, let us imagine boldly. Imagine a world where rural clinics use AI to detect illnesses early, saving lives before symptoms emerge. Imagine classrooms that adapt in real-time to student needs, eliminating the achievement gap. Imagine cities that run on renewable power, orchestrated by intelligent grids, serving both people and planet.'),
 Document(id='c949310d-40d3-44bb-99bd-9aee1be2e45d', metadata={'source': 'speech.txt'}, page_content='The potential for good is enormous. AI has begun revolutionizing healthcare—diagnosing diseases faster and with greater accuracy than ever before. In education, intelligent tutoring systems are adapting to individual learning sty
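
The results match the earlier text query because searching by text amounts to embedding the query and then searching by vector. The sketch below shows that relationship with a hypothetical toy embedding (counts of the letters a, e, and o). None of these helper names come from LangChain or FAISS.

```python
def toy_embed(text):
    """Hypothetical embedding: counts of the letters a, e, and o."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeo"]


def search_by_vector(vector, index, k=1):
    """index is a list of (text, vector) pairs; closest vectors come first."""
    def distance(other):
        return sum((x - y) ** 2 for x, y in zip(vector, other))

    ranked = sorted(index, key=lambda item: distance(item[1]))
    return [text for text, _ in ranked[:k]]


def search(query, index, k=1):
    """Text search is just: embed the query, then search by vector."""
    return search_by_vector(toy_embed(query), index, k)


texts = ["banana", "cheese", "voodoo"]
index = [(t, toy_embed(t)) for t in texts]

by_text = search("eee", index)
by_vector = search_by_vector(toy_embed("eee"), index)
print(by_text, by_vector)
```

Searching by vector is useful when the query embedding is already available, for example when it was cached or computed in a batch.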

In [13]:
db.save_local("faiss_index")

In [14]:
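# Load the saved index from disk, using the same embedding model that built it.
# allow_dangerous_deserialization=True is needed because part of the index is
# stored with pickle; only enable it for files you created or trust.
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)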

In [15]:
docs

[Document(id='2af9de96-0ef3-4c8f-9962-1e5cc7f674ff', metadata={'source': 'speech.txt'}, page_content='Because progress without purpose is just motion. But progress with intention? That’s transformation.\n\nThank you.'),
 Document(id='29b84866-fc19-4cdb-9768-7ef759bfbb63', metadata={'source': 'speech.txt'}, page_content='Finally, let us imagine boldly. Imagine a world where rural clinics use AI to detect illnesses early, saving lives before symptoms emerge. Imagine classrooms that adapt in real-time to student needs, eliminating the achievement gap. Imagine cities that run on renewable power, orchestrated by intelligent grids, serving both people and planet.'),
 Document(id='c949310d-40d3-44bb-99bd-9aee1be2e45d', metadata={'source': 'speech.txt'}, page_content='The potential for good is enormous. AI has begun revolutionizing healthcare—diagnosing diseases faster and with greater accuracy than ever before. In education, intelligent tutoring systems are adapting to individual learning sty
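
The allow_dangerous_deserialization flag exists because save_local stores the vectors in a FAISS index file and the documents in a pickle file (index.faiss and index.pkl with the default index name), and unpickling an untrusted file can execute arbitrary code. The sketch below shows only the pickle round trip, using the standard library. The docstore dictionary is a hypothetical stand-in for what LangChain stores.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the document store kept next to the vectors
docstore = {"doc-1": {"page_content": "Thank you.", "source": "speech.txt"}}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "index.pkl"
    path.write_bytes(pickle.dumps(docstore))     # save
    restored = pickle.loads(path.read_bytes())   # load: trusted files only

print(restored == docstore)
```

Because the query is embedded again at load time, the loaded index should be paired with the same embedding model that was used to build it.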