### Chroma 

Chroma is an AI-native, open-source vector database. It is designed to store large collections of embedding vectors and to run efficient similarity searches over them. It works with machine learning models and integrates easily into AI applications.
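To make "similarity search" concrete, here is a minimal pure-Python sketch of the idea: score every stored vector against the query with cosine similarity and keep the best `k`. The names `cosine_similarity`, `top_k`, and `toy_store`, as well as the 3-dimensional vectors, are made up for illustration. This is not how Chroma is implemented, and a real vector database would typically use an index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    # Highest cosine similarity first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical 3-dimensional "embeddings"; real models produce far more dimensions.
toy_store = [
    ("curiosity drives progress", [0.9, 0.1, 0.0]),
    ("thank you all", [0.0, 0.2, 0.9]),
    ("ask bold questions", [0.8, 0.3, 0.1]),
]

results = top_k([1.0, 0.2, 0.0], toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The `k` argument here plays the same role as `k=2` in the `similarity_search` call used later in this notebook.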

In [1]:
## Building a sample vectordb
from langchain_chroma import Chroma


In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter



In [3]:
loader = TextLoader('speech.txt', encoding='utf8')
data = loader.load()
data 

[Document(metadata={'source': 'speech.txt'}, page_content='“The Power of Curiosity”\n\nGood [morning/afternoon/evening] everyone,\n\nHave you ever wondered why we keep asking questions, even when answers seem just out of reach? Curiosity is not just a spark in our minds—it’s the engine of progress. Every great invention, every bold idea, every story of triumph started with a simple question: “What if…?”\n\nCuriosity pushes us beyond comfort, beyond routine, and into discovery. It encourages us to fail, to stumble, and then to rise stronger. It reminds us that knowledge is not a destination, but a journey.\n\nSo today, I urge each of you: embrace your curiosity. Ask questions that scare you. Explore paths no one has walked before. Because in the act of seeking, in the courage to wonder, we find our greatest potential.\n\nRemember, the world doesn’t change because we play it safe—it changes because we dared to ask, dared to learn, and dared to act.\n\nThank you.')]

In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=0
)

splits = text_splitter.split_documents(data)
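As a rough illustration of what `chunk_size=500` with `chunk_overlap=0` means, the sketch below greedily packs paragraphs into chunks that stay under a character limit. It is a simplified stand-in, not the library's algorithm: the function name `pack_paragraphs` is hypothetical, it splits only on blank lines, and it keeps an oversized paragraph whole, whereas a recursive splitter would fall back to finer separators.

```python
def pack_paragraphs(text, chunk_size=500, separator="\n\n"):
    """Greedily merge paragraphs into chunks of at most chunk_size characters.

    Simplification: a single paragraph longer than chunk_size is kept whole.
    """
    chunks, current = [], ""
    for paragraph in text.split(separator):
        candidate = paragraph if not current else current + separator + paragraph
        if len(candidate) <= chunk_size or not current:
            # Still fits (or the chunk is empty): keep growing the current chunk
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: start a new chunk
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks

sample = "aaaa\n\nbbbb\n\ncccc"
chunks = pack_paragraphs(sample, chunk_size=10)
print(chunks)
```

With no overlap, each paragraph ends up in exactly one chunk, which matches the two non-overlapping excerpts returned by the search further down.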


In [8]:
embedding = OllamaEmbeddings(model="nomic-embed-text")

vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
vectordb

<langchain_chroma.vectorstores.Chroma at 0x2fab564e510>

In [9]:
query = "What did the president say about K-12 education?"
docs = vectordb.similarity_search(query, k=2)
docs

[Document(id='72b7ec8e-d8f6-4b12-9c8b-f2f7ead1b356', metadata={'source': 'speech.txt'}, page_content='Remember, the world doesn’t change because we play it safe—it changes because we dared to ask, dared to learn, and dared to act.\n\nThank you.'),
 Document(id='14b0deb8-3769-4530-a203-8f0794201b67', metadata={'source': 'speech.txt'}, page_content='Curiosity pushes us beyond comfort, beyond routine, and into discovery. It encourages us to fail, to stumble, and then to rise stronger. It reminds us that knowledge is not a destination, but a journey.\n\nSo today, I urge each of you: embrace your curiosity. Ask questions that scare you. Explore paths no one has walked before. Because in the act of seeking, in the courage to wonder, we find our greatest potential.')]
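Note that the query asks about K-12 education, which the speech never mentions, yet two chunks still come back: a nearest-neighbour search returns the `k` closest chunks whether or not they are relevant. One common guard is to inspect the scores (LangChain vector stores expose `similarity_search_with_score` for this) and drop weak matches. The sketch below assumes distance-style scores where lower means closer; the function name `filter_by_distance`, the distances, and the `0.8` cutoff are all hypothetical values for illustration.

```python
def filter_by_distance(scored_results, max_distance):
    """Keep only (text, distance) pairs at or below max_distance."""
    return [(text, dist) for text, dist in scored_results if dist <= max_distance]

# Hypothetical (chunk, distance) pairs; real values depend on the embedding model.
scored = [
    ("Remember, the world doesn't change because we play it safe", 1.21),
    ("Curiosity pushes us beyond comfort", 1.27),
]

# With a strict cutoff, an off-topic query yields no results instead of weak ones.
relevant = filter_by_distance(scored, max_distance=0.8)
print(relevant)
```

A suitable cutoff depends on the embedding model and distance metric, so it is best chosen by checking scores for a few known-relevant and known-irrelevant queries.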

In [11]:
docs[0].page_content

'Remember, the world doesn’t change because we play it safe—it changes because we dared to ask, dared to learn, and dared to act.\n\nThank you.'

In [13]:
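## save to disk
## persist_directory tells Chroma where to write the collection; "./chroma_db" is an example path

vectordb = Chroma.from_documents(documents=splits, embedding=embedding, persist_directory="./chroma_db")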

In [19]:
docs = vectordb.similarity_search(query)


In [20]:
docs[0].page_content

'Remember, the world doesn’t change because we play it safe—it changes because we dared to ask, dared to learn, and dared to act.\n\nThank you.'

In [23]:
## retriever option
retriever = vectordb.as_retriever()
retriever.invoke(query)[0].page_content

'Remember, the world doesn’t change because we play it safe—it changes because we dared to ask, dared to learn, and dared to act.\n\nThank you.'