
In [1]:
!pip install -q chromadb sentence-transformers langchain langchain_core

## **LOAD TEXT FROM .TXT FILE**

In [2]:
# Path to the text file; change this to point at your own file
file_path = '/content/sample_rag_10_pages.txt'

try:
    # Read the whole file into a single string
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
        print("File content:")
        print(text)
except FileNotFoundError:
    print(f"Error: The file '{file_path}' was not found.")
except Exception as e:
    print(f"An error occurred: {e}")

File content:
The History of Artificial Intelligence
Artificial Intelligence (AI) is a branch of computer science focused on creating machines capable of performing tasks that typically require human intelligence. 
The concept dates back to the mid-20th century, when pioneers such as Alan Turing explored whether machines could "think." 
In 1956, the Dartmouth Conference marked the official birth of AI as a field. Over the decades, AI research has gone through cycles of optimism, funding booms, and periods known as "AI winters" where interest waned due to limited progress. 
Key milestones include the development of expert systems in the 1970s, the rise of machine learning in the 1990s, and the explosion of deep learning in the 2010s. Today, AI is applied in natural language processing, computer vision, robotics, and decision-making systems across industries.

The Great Wall of China
The Great Wall of China is one of the most remarkable architectural achievements in history. 
Constructio
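The loading cell above prints a message when the file is missing but leaves `text` undefined, so later cells would fail with a `NameError`. A minimal alternative sketch that fails at the loading step instead is shown below. The helper name `load_text` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def load_text(path: str, encoding: str = "utf-8") -> str:
    """Return the file's contents, raising FileNotFoundError if it is missing."""
    file = Path(path)
    if not file.is_file():
        # Stop here rather than letting later cells run without `text`
        raise FileNotFoundError(f"No text file at {file}")
    return file.read_text(encoding=encoding)
```

In the notebook this would be called as `text = load_text(file_path)`.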

## **SPLITTING THE TEXT**

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Default settings; pass chunk_size / chunk_overlap to control chunk length
splitter = RecursiveCharacterTextSplitter()
chunks = splitter.split_text(text)

In [4]:
len(chunks)  # not displayed: a notebook only echoes a cell's last expression; wrap in print() to see it
print(chunks)

['The History of Artificial Intelligence\nArtificial Intelligence (AI) is a branch of computer science focused on creating machines capable of performing tasks that typically require human intelligence. \nThe concept dates back to the mid-20th century, when pioneers such as Alan Turing explored whether machines could "think." \nIn 1956, the Dartmouth Conference marked the official birth of AI as a field. Over the decades, AI research has gone through cycles of optimism, funding booms, and periods known as "AI winters" where interest waned due to limited progress. \nKey milestones include the development of expert systems in the 1970s, the rise of machine learning in the 1990s, and the explosion of deep learning in the 2010s. Today, AI is applied in natural language processing, computer vision, robotics, and decision-making systems across industries.\n\nThe Great Wall of China\nThe Great Wall of China is one of the most remarkable architectural achievements in history. \nConstruction be
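To make the role of chunk size and overlap concrete, here is a simplified sketch of fixed-size splitting with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraph and line breaks). The function name `split_fixed` is hypothetical, and the default values 4000 and 200 are assumed to mirror LangChain's defaults.

```python
def split_fixed(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, which helps keep a sentence that straddles a boundary retrievable from at least one chunk.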

## **1. CREATING VECTOR STORE BY ADDING EMBEDDINGS MANUALLY**

**STEPS**

1.   Define the embedding model
2.   Embed the chunks
3.   Create a client
4.   Create a collection
5.   Add documents and embeddings (and, optionally, metadata) to the collection





In [5]:
from sentence_transformers import SentenceTransformer
embedding_model=SentenceTransformer('all-mpnet-base-v2')



In [6]:
embeddings = embedding_model.encode(chunks)  # NumPy array with one row (vector) per chunk

print("Shape of chunk embeddings:", embeddings.shape)

Shape of chunk embeddings: (3, 768)


In [7]:
import chromadb
client = chromadb.Client()  # in-memory client; data is lost when the session ends

In [8]:
collection=client.create_collection(name='my_collections')

In [9]:
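collection.add(
    ids=[f"id{i}" for i in range(1, len(chunks) + 1)],  # one unique id per chunk: id1, id2, ...
    embeddings=embeddings.tolist(),  # precomputed vectors as plain Python lists
    documents=chunks
    )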
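The step list mentions metadata, but the call above stores only ids, embeddings and documents. The sketch below shows one way to build matching id and metadata lists for any number of chunks. The helper `build_records` and the metadata keys `source`, `chunk_index` and `n_chars` are hypothetical choices for illustration.

```python
def build_records(chunks: list[str], source: str) -> tuple[list[str], list[dict]]:
    """Return one id and one metadata dict per chunk, in the same order as chunks."""
    ids = [f"id{i}" for i in range(1, len(chunks) + 1)]
    metadatas = [
        {"source": source, "chunk_index": i, "n_chars": len(chunk)}
        for i, chunk in enumerate(chunks)
    ]
    return ids, metadatas


ids, metadatas = build_records(["first chunk", "second"], source="sample.txt")
print(ids)           # ['id1', 'id2']
print(metadatas[1])  # {'source': 'sample.txt', 'chunk_index': 1, 'n_chars': 6}
```

Used instead of the call above, the lists would be passed as `collection.add(ids=ids, embeddings=embeddings, documents=chunks, metadatas=metadatas)`.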

## **2. CREATING VECTOR STORE USING embedding_function**

**STEPS**

1. Define embedding model
2. Create client
3. Create collection and define embedding in it
4. Add docs to the collection



Define the embedding model

In [10]:
from chromadb.utils import embedding_functions

# Chroma calls this function itself to embed documents and query texts
embedding_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name='all-mpnet-base-v2')

In [11]:
client=chromadb.Client()

In [12]:
new_collections=client.create_collection(name='auto_collection',embedding_function=embedding_func)

In [13]:
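# No embeddings argument: the collection computes them with embedding_func
new_collections.add(ids=[f"id{i}" for i in range(1, len(chunks) + 1)], documents=chunks)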

## **QUERYING THE VECTOR STORE (MANUAL)**

*By default the `embeddings` field in the result is `None`; to return the embeddings, uncomment the `include` line.*

In [14]:
# Example query
query = "What is Artificial Intelligence?"

# Create embedding for the query
query_embedding = embedding_model.encode(query).tolist()

# Perform a similarity search (uncomment `include` below to also return embeddings)
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=1, # You can adjust the number of results
    #include=['embeddings', 'documents', 'distances'] # Explicitly include embeddings
)

print("Query Results:")
print(results)

Query Results:
{'ids': [['id1']], 'embeddings': None, 'documents': [['The History of Artificial Intelligence\nArtificial Intelligence (AI) is a branch of computer science focused on creating machines capable of performing tasks that typically require human intelligence. \nThe concept dates back to the mid-20th century, when pioneers such as Alan Turing explored whether machines could "think." \nIn 1956, the Dartmouth Conference marked the official birth of AI as a field. Over the decades, AI research has gone through cycles of optimism, funding booms, and periods known as "AI winters" where interest waned due to limited progress. \nKey milestones include the development of expert systems in the 1970s, the rise of machine learning in the 1990s, and the explosion of deep learning in the 2010s. Today, AI is applied in natural language processing, computer vision, robotics, and decision-making systems across industries.\n\nThe Great Wall of China\nThe Great Wall of China is one of the most
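The result is a dict whose fields are nested lists: the outer list has one entry per query and the inner list one entry per hit. The sketch below flattens the hits for the first query into plain records. The helper name `first_query_hits` is hypothetical, and the sample dict is hand-written (including a made-up distance) on the assumption that `distances` is returned by default.

```python
def first_query_hits(results: dict) -> list[dict]:
    """Flatten the hits for the first query into one dict per hit."""
    ids = results["ids"][0]
    # Fields that were not requested come back as None, so fall back to placeholders
    documents = (results.get("documents") or [[None] * len(ids)])[0]
    distances = (results.get("distances") or [[None] * len(ids)])[0]
    return [
        {"id": i, "document": doc, "distance": dist}
        for i, doc, dist in zip(ids, documents, distances)
    ]


# Hand-written stand-in for a query result; the distance value is made up
sample = {
    "ids": [["id1"]],
    "embeddings": None,
    "documents": [["The History of Artificial Intelligence ..."]],
    "distances": [[0.5]],
}
hits = first_query_hits(sample)
print(hits[0]["id"], hits[0]["distance"])  # id1 0.5
```

With the notebook's own output, `first_query_hits(results)` would be called on the dict returned by `collection.query(...)`.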

## **QUERYING THE VECTOR STORE (EMBEDDING_FUNCTION)**

In [17]:
# Example query
query = "What is Artificial Intelligence?"

# No manual query embedding is needed: the collection embeds query_texts with its embedding_function

# Perform a similarity search (uncomment `include` below to also return embeddings)
results = new_collections.query(
    query_texts=[query],
    n_results=1, # You can adjust the number of results
    #include=['embeddings', 'documents', 'distances'] # Explicitly include embeddings
)

print("Query Results:")
print(results)

Query Results:
{'ids': [['id1']], 'embeddings': None, 'documents': [['The History of Artificial Intelligence\nArtificial Intelligence (AI) is a branch of computer science focused on creating machines capable of performing tasks that typically require human intelligence. \nThe concept dates back to the mid-20th century, when pioneers such as Alan Turing explored whether machines could "think." \nIn 1956, the Dartmouth Conference marked the official birth of AI as a field. Over the decades, AI research has gone through cycles of optimism, funding booms, and periods known as "AI winters" where interest waned due to limited progress. \nKey milestones include the development of expert systems in the 1970s, the rise of machine learning in the 1990s, and the explosion of deep learning in the 2010s. Today, AI is applied in natural language processing, computer vision, robotics, and decision-making systems across industries.\n\nThe Great Wall of China\nThe Great Wall of China is one of the most
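Both queries return `id1` because its embedding lies closest to the query embedding. The brute-force sketch below shows that idea on toy 2-dimensional vectors. It is not Chroma's implementation: Chroma uses an approximate index, and the squared L2 distance here is assumed to match its default metric. The names `squared_l2` and `nearest` and all vector values are hypothetical.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query_vec, ids, vectors, n_results=1):
    """Return the n_results (id, distance) pairs with the smallest distance."""
    scored = [(doc_id, squared_l2(query_vec, vec)) for doc_id, vec in zip(ids, vectors)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance means more similar
    return scored[:n_results]


# Toy stand-ins for the three stored chunk embeddings
toy_ids = ["id1", "id2", "id3"]
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(nearest([0.9, 0.1], toy_ids, toy_vectors)[0][0])  # id1
```

Raising `n_results` returns more pairs in order of increasing distance, which mirrors the `n_results` argument of `query`.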