### 🌟 What is Chroma DB?

🧠 Chroma DB is an AI-native, open-source vector database.

🎯 It focuses on developer productivity and happiness.

🪪 Licensed under Apache 2.0.

🛠️ Used to store and search vector embeddings efficiently.

🔗 LangChain vector store integrations: https://python.langchain.com/v0.2/docs/integrations/vectorstores/

In [None]:
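# 🛠️ Step 1: Install Chroma DB
# Run once in your environment (prefix with "!" inside a notebook cell):
#   pip install -U langchain-chroma langchain-community langchain-ollama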

# 🧪 Step 2: Use langchain_chroma
from langchain_chroma import Chroma
# ✅ Now you’re ready to use Chroma inside your LangChain apps!

# 📄 Step 3: Load the .txt File
from langchain_community.document_loaders import TextLoader

loader = TextLoader("speech.txt")
data = loader.load()
# •	📂 File: speech.txt
# •	📥 Loaded using TextLoader
# •	✅ Now data contains all the document content


# ✂️ Step 4: Split Text into Chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
# •	🧩 Breaks large text into chunks of 500 characters
# •	❌ No overlap in this case (chunk_overlap=0)
# •	📄 splits contains the small document chunks
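To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of plain fixed-width chunking. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally (the real splitter prefers to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper written only for this illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Simplified illustration only; the real splitter also respects separators.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghijklmnopqrstuvwxyz"
no_overlap = chunk_text(sample, chunk_size=10)
with_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=3)

print(no_overlap)    # ['abcdefghij', 'klmnopqrst', 'uvwxyz']
print(with_overlap)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

With an overlap of 3, each chunk repeats the last three characters of the previous one, which can help keep a sentence that straddles a boundary searchable from both sides.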

# 🧠 Step 5: Create Embeddings (Ollama)
from langchain_ollama import OllamaEmbeddings

embedding = OllamaEmbeddings(model="nomic-embed-text")

# ⚠️ Note: `langchain_community.embeddings.OllamaEmbeddings` was deprecated in LangChain 0.3.1
#   and will be removed in 1.0.0. Import the class from the `langchain-ollama` package instead
#   (`pip install -U langchain-ollama`).


# •	🧠 Uses Ollama (open-source) for embeddings
# •	🚫 No need for OpenAI key

# 🧱 Step 6: Create Chroma Vector DB
vector_db = Chroma.from_documents(splits, embedding)
vector_db
# <langchain_chroma.vectorstores.Chroma at 0x1b00c89dd00>

# •	🔁 Converts all document chunks into vectors
# •	🏠 Stores them inside Chroma Vector Store

# ❓ Step 7: Query the Vector DB
query = "What does the speaker believe is the main reason United States should enter the war?"
docs = vector_db.similarity_search(query)
print(docs[0].page_content)
# It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy,


# •	❓ Ask question based on speech.txt
# •	📦 Uses similarity search to find relevant chunks
# •	✅ Displays the best matching result
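Conceptually, a similarity search embeds the query, measures its distance to every stored chunk vector, and returns the closest chunks. The sketch below shows that idea with brute force over hand-made 2-D vectors. It assumes a squared L2 distance, and `toy_store`, `squared_l2`, and `top_k` are hypothetical names for this illustration only. Chroma itself uses an approximate index rather than a full scan.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Sum of squared coordinate differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (distance, text) pairs closest to query_vec, nearest first."""
    scored = [(squared_l2(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0])  # smaller distance = better match
    return scored[:k]


# Hand-made stand-ins for embedded chunks (real embeddings have hundreds of dimensions).
toy_store = [
    ("chunk about war", [1.0, 0.0]),
    ("chunk about peace", [0.0, 1.0]),
    ("chunk about democracy", [0.9, 0.2]),
]

hits = top_k([1.0, 0.1], toy_store, k=2)
for distance, text in hits:
    print(f"{distance:.3f}  {text}")
```

The query vector sits closest to the "war" chunk and next closest to the "democracy" chunk, so those two come back in that order.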

# 💾 Step 8: Save Chroma DB to Disk
vector_db = Chroma.from_documents(
    splits,
    embedding,
    persist_directory="chroma_db"
)
vector_db
# <langchain_chroma.vectorstores.Chroma at 0x1b00c8ac980>


# •	💽 persist_directory="chroma_db" saves the vector store to disk
# •	🧠 Internally, Chroma writes a SQLite file (chroma.sqlite3) plus its vector index files into that folder
# •	✅ Can reuse without reprocessing every time

# 📂 Step 9: Load Chroma DB from Disk
db2 = Chroma(persist_directory="chroma_db", embedding_function=embedding)

docs = db2.similarity_search(query)
print(docs[0].page_content)
# It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy,


# •	🧠 Loads existing vector DB from chroma_db folder
# •	❓ Query it the same way using similarity_search

# 📉 Step 10: Similarity Search with Score
docs_with_score = db2.similarity_search_with_score(query)
docs_with_score
# [(Document(id='86871e88-3cd0-45ad-9f4b-ee294542c6bc', metadata={'source': 'speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy,'),
#   310.3403625488281),
#  (Document(id='51825906-465f-407f-90df-2f15401f308b', metadata={'source': 'speech.txt'}, page_content='Just because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.\n\n…'),
#   337.8399658203125),
#  (Document(id='e31ac907-00b2-4c8c-8d02-23563cba62aa', metadata={'source': 'speech.txt'}, page_content='reestablishment of intimate relations of mutual advantage between us—however hard it may be for them, for the time being, to believe that this is spoken from our hearts.'),
#   353.76361083984375),
#  (Document(id='05029b26-b052-4b03-b994-a22e07ccf6c7', metadata={'source': 'speech.txt'}, page_content='To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.'),
#   354.73211669921875)]

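# •	🧮 Returns a list of (document, score) pairs:
# o	Matching text chunk
# o	L2 (Euclidean) distance to the query; Chroma's default "l2" space reports the squared value. Manhattan distance is L1, not L2.
# •	📉 Lower score = more relevant result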
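Since the distance names are easy to mix up, here is a small worked comparison on a 3-4-5 triangle. The function names are chosen for this illustration, and nothing here calls Chroma.

```python
import math


def manhattan(a: list[float], b: list[float]) -> float:
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a: list[float], b: list[float]) -> float:
    """Squared L2 distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: list[float], b: list[float]) -> float:
    """L2 distance: square root of the squared L2 distance."""
    return math.sqrt(squared_euclidean(a, b))


origin = [0.0, 0.0]
point = [3.0, 4.0]

print(manhattan(origin, point))          # 7.0
print(euclidean(origin, point))          # 5.0
print(squared_euclidean(origin, point))  # 25.0
```

All three agree on which of two candidates is nearer less often than one might expect: L2 and squared L2 always produce the same ranking, while L1 can rank neighbors differently.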

# 🔁 Step 11: Convert to Retriever
retriever = db2.as_retriever()
results = retriever.invoke(query)

for doc in results:
    print(doc.page_content)
# It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy,
# Just because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.

# …
# reestablishment of intimate relations of mutual advantage between us—however hard it may be for them, for the time being, to believe that this is spoken from our hearts.
# To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend her blood and her might for the principles that gave her birth and happiness and the peace which she has treasured. God helping her, she can do no other.



# •	🔄 Converts Chroma DB into a retriever
# •	🧩 Useful in LangChain pipelines with LLMs
# •	✅ Lets you plug directly into RAG systems
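In a RAG pipeline, the chunks a retriever returns are typically pasted into the prompt that goes to the LLM. The following is a minimal sketch of that assembly step under stated assumptions: `FakeDoc` is a hypothetical stand-in for LangChain's `Document` (only the `page_content` attribute is mimicked), and `build_prompt` and its prompt wording are illustrative, not a LangChain API.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document; only page_content is modeled."""
    page_content: str


def build_prompt(question: str, docs: list[FakeDoc]) -> str:
    """Join retrieved chunks into a context block and append the question."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


retrieved = [FakeDoc("first chunk"), FakeDoc("second chunk")]
prompt = build_prompt("What is in the chunks?", retrieved)
print(prompt)
```

In the notebook above, the list returned by `retriever.invoke(query)` could be passed in place of `retrieved`, since each result exposes `page_content`.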



