## **Vector Database Tutorial**

### 1. FAISS

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

documents = TextLoader('speech.txt').load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=30)
docs = text_splitter.split_documents(documents)
docs

[Document(metadata={'source': 'speech.txt'}, page_content="The Birth of Vigyan Dhara\n\nThe Vigyan Dhara scheme came into force with effect from 16.01.2025.  It merges three key umbrella schemes into one, focusing on:\n\n❖ Science and Technology (S&T) Institutional and Human Capacity Building: This component focuses on strengthening India's scientific infrastructure and human resource pool. It aims to build and enhance research and development (R&D) labs across academic institutions, creating a robust environment for scientific research.\n\n❖ Research and Development (R&D): Vigyan Dhara emphasises research in various critical areas, including basic research, translational research in sustainable energy and water, and access to international mega facilities. This component also fosters collaborative research through international bilateral and multilateral cooperation."),
 Document(metadata={'source': 'speech.txt'}, page_content='❖ Innovation, Technology Development, and Deployment: Thi

In [6]:
embeddings = OllamaEmbeddings(model='nomic-embed-text')
db = FAISS.from_documents(documents=docs, embedding=embeddings)

In [8]:
query = "What are the Key Focus Areas of Vigyan Dhara?"
result = db.similarity_search(query=query)
result[0].page_content

'❖ Innovation, Technology Development, and Deployment: This segment of the scheme aims to drive innovation at all levels, from schools to higher education and the industry. It seeks to promote technology development and deployment, with a particular focus on increasing collaboration between academia, government, and industry, as well as supporting startups.\n\nThis strategic integration enhances efficiency in fund utilization and establishes synchronization among the sub-schemes and programs, ensuring a more streamlined approach to achieving scientific progress in India.\n\nKey Focus Areas of Vigyan Dhara\n\n1. Capacity Building\n\nEstablishing advanced research laboratories in academic institutions\nSupporting faculty development and student research\nPromoting international scientific collaborations\n \n\n2. Research and Development'

## **DB as a Retriever**

We can convert the vector store into a retriever. This allows us to easily use it in other LangChain methods, which largely work with retrievers.
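
To make the idea concrete, here is a minimal pure-Python sketch of what a retriever is conceptually: a store plus fixed search settings, exposed through a single `invoke(query)` call. `ToyVectorStore` and `ToyRetriever` are hypothetical names made up for illustration, and the word-overlap scoring is a stand-in for real embeddings. This is not LangChain's implementation.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store; scores by shared words, not embeddings."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        # Sort by the number of words shared with the query, most shared first.
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class ToyRetriever:
    """Hypothetical retriever: a store plus fixed search settings behind invoke()."""

    def __init__(self, store, search_kwargs=None):
        self.store = store
        self.search_kwargs = search_kwargs or {}

    def invoke(self, query):
        # Callers pass only the query; the search settings were fixed at construction.
        return self.store.similarity_search(query, **self.search_kwargs)


store = ToyVectorStore([
    "capacity building in academic institutions",
    "research and development funding",
    "innovation and technology deployment",
])
toy_retriever = ToyRetriever(store, search_kwargs={"k": 1})
print(toy_retriever.invoke("funding for research"))
# ['research and development funding']
```

With the real store, the analogous setting should be `db.as_retriever(search_kwargs={"k": 1})`, which limits the retriever to the single closest chunk.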

In [10]:
retriever = db.as_retriever()
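# Use a new name so the chunk list in `docs` is not overwritten (it is reused for Chroma below)
retrieved_docs = retriever.invoke(query)
retrieved_docs[0].page_content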

'❖ Innovation, Technology Development, and Deployment: This segment of the scheme aims to drive innovation at all levels, from schools to higher education and the industry. It seeks to promote technology development and deployment, with a particular focus on increasing collaboration between academia, government, and industry, as well as supporting startups.\n\nThis strategic integration enhances efficiency in fund utilization and establishes synchronization among the sub-schemes and programs, ensuring a more streamlined approach to achieving scientific progress in India.\n\nKey Focus Areas of Vigyan Dhara\n\n1. Capacity Building\n\nEstablishing advanced research laboratories in academic institutions\nSupporting faculty development and student research\nPromoting international scientific collaborations\n \n\n2. Research and Development'

## **Similarity Search with score**

There are some FAISS-specific methods. One of them is **similarity_search_with_score**, which returns each result document together with its distance from the query. The score is an L2 (Euclidean) distance, so a lower score means a closer match.
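
As a quick illustration of why lower is better, the sketch below computes the L2 (Euclidean) distance by hand and contrasts it with the Manhattan (L1) distance, which is a different metric. The 2-D vectors are made-up values for illustration; real embeddings have hundreds of dimensions. To my understanding, FAISS's flat L2 index reports the squared distance (it skips the square root), which leaves the ranking unchanged.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """Manhattan (L1) distance: sum of absolute differences (shown for contrast)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical 2-D "embeddings", chosen so the arithmetic is easy to check.
query_vec = [0.0, 0.0]
near_doc = [3.0, 4.0]
far_doc = [6.0, 8.0]

print(l2_distance(query_vec, near_doc))  # 5.0
print(l1_distance(query_vec, near_doc))  # 7.0

# Rank documents by ascending L2 distance: the smallest score is the best match.
ranked = sorted(
    [("far_doc", far_doc), ("near_doc", near_doc)],
    key=lambda item: l2_distance(query_vec, item[1]),
)
print([name for name, _ in ranked])  # ['near_doc', 'far_doc']
```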

In [11]:
docs_score = db.similarity_search_with_score(query=query)
docs_score

[(Document(id='3d82c5ef-e692-48fa-8166-75545e1264c7', metadata={'source': 'speech.txt'}, page_content='❖ Innovation, Technology Development, and Deployment: This segment of the scheme aims to drive innovation at all levels, from schools to higher education and the industry. It seeks to promote technology development and deployment, with a particular focus on increasing collaboration between academia, government, and industry, as well as supporting startups.\n\nThis strategic integration enhances efficiency in fund utilization and establishes synchronization among the sub-schemes and programs, ensuring a more streamlined approach to achieving scientific progress in India.\n\nKey Focus Areas of Vigyan Dhara\n\n1. Capacity Building\n\nEstablishing advanced research laboratories in academic institutions\nSupporting faculty development and student research\nPromoting international scientific collaborations\n \n\n2. Research and Development'),
  298.94128),
 (Document(id='db900e50-5bc8-4ae8-

## **Similarity search by Vector**

In [13]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[-0.6059664487838745,
 1.2476463317871094,
 -3.421314001083374,
 0.24086515605449677,
 1.2123832702636719,
 -0.8107025027275085,
 1.0161339044570923,
 -0.7363887429237366,
 -0.32108306884765625,
 0.4040510356426239,
 1.2819466590881348,
 -0.6576157808303833,
 1.9543966054916382,
 1.5466151237487793,
 0.522363007068634,
 -1.3982549905776978,
 -0.9713996052742004,
 -0.45335933566093445,
 0.022120380774140358,
 0.36779680848121643,
 -1.2631123065948486,
 0.6028529405593872,
 -0.1806616336107254,
 0.12188220024108887,
 0.5596415996551514,
 0.18307507038116455,
 0.9520277976989746,
 -0.18862928450107574,
 -0.44390544295310974,
 0.3130866587162018,
 1.0988706350326538,
 -0.47393953800201416,
 -0.030148614197969437,
 -0.35082685947418213,
 0.2964194715023041,
 -0.4599565863609314,
 1.0285028219223022,
 0.6201717853546143,
 -0.27293863892555237,
 -0.9632949829101562,
 0.4748021960258484,
 -0.6098228693008423,
 1.097834587097168,
 -1.4263043403625488,
 0.5968238115310669,
 0.33043015003204346,


In [14]:
docs_vector = db.similarity_search_by_vector(embedding_vector)
docs_vector

[Document(id='3d82c5ef-e692-48fa-8166-75545e1264c7', metadata={'source': 'speech.txt'}, page_content='❖ Innovation, Technology Development, and Deployment: This segment of the scheme aims to drive innovation at all levels, from schools to higher education and the industry. It seeks to promote technology development and deployment, with a particular focus on increasing collaboration between academia, government, and industry, as well as supporting startups.\n\nThis strategic integration enhances efficiency in fund utilization and establishes synchronization among the sub-schemes and programs, ensuring a more streamlined approach to achieving scientific progress in India.\n\nKey Focus Areas of Vigyan Dhara\n\n1. Capacity Building\n\nEstablishing advanced research laboratories in academic institutions\nSupporting faculty development and student research\nPromoting international scientific collaborations\n \n\n2. Research and Development'),
 Document(id='db900e50-5bc8-4ae8-abb6-b681b1f3428
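
Searching by vector returns the same documents as searching by text because a text search embeds the query first and then runs the vector search. The toy sketch below shows that relationship with a brute-force index in plain Python. `toy_embed` and `ToyIndex` are hypothetical names for illustration only; the letter-count "embedding" is not a real model, and a real FAISS index is far more efficient than this linear scan.

```python
def toy_embed(text):
    """Hypothetical embedding: counts of the letters a, e, and i (not a real model)."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aei"]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIndex:
    """Brute-force nearest-neighbour index over a small list of texts."""

    def __init__(self, texts):
        # Embed every text once, at build time.
        self.entries = [(text, toy_embed(text)) for text in texts]

    def search_by_vector(self, vector, k=2):
        # Compare the query vector against every stored vector, closest first.
        ranked = sorted(self.entries, key=lambda entry: squared_l2(vector, entry[1]))
        return [text for text, _ in ranked[:k]]

    def search(self, query, k=2):
        # A text search is just "embed the query, then search by vector".
        return self.search_by_vector(toy_embed(query), k=k)


index = ToyIndex(["science scheme", "capacity building", "innovation"])
toy_query = "key focus areas"

by_text = index.search(toy_query)
by_vector = index.search_by_vector(toy_embed(toy_query))
print(by_text == by_vector)  # True
```

Searching by vector is useful when the query embedding is already available, for example when it was computed once and cached.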

### 2. ChromaDB

In [16]:
from langchain_community.vectorstores import Chroma

chroma_db = Chroma.from_documents(documents=docs, embedding=embeddings)
chroma_db

<langchain_community.vectorstores.chroma.Chroma at 0x1b9fab21fd0>

In [17]:
query = "What are the Key Focus Areas of Vigyan Dhara?"
result = chroma_db.similarity_search(query=query)
result[0].page_content

'❖ Innovation, Technology Development, and Deployment: This segment of the scheme aims to drive innovation at all levels, from schools to higher education and the industry. It seeks to promote technology development and deployment, with a particular focus on increasing collaboration between academia, government, and industry, as well as supporting startups.\n\nThis strategic integration enhances efficiency in fund utilization and establishes synchronization among the sub-schemes and programs, ensuring a more streamlined approach to achieving scientific progress in India.\n\nKey Focus Areas of Vigyan Dhara\n\n1. Capacity Building\n\nEstablishing advanced research laboratories in academic institutions\nSupporting faculty development and student research\nPromoting international scientific collaborations\n \n\n2. Research and Development'

In [None]:
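# Chroma can also save the database to local disk by passing persist_directory.
# Uncomment the line below to build a persisted copy under ./chroma_db:

# chroma_db = Chroma.from_documents(documents=docs, embedding=embeddings, persist_directory="./chroma_db")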