#### Chroma
Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.

https://python.langchain.com/v0.2/docs/integrations/vectorstores/

In [2]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [3]:
loader = TextLoader("./data/speech.txt")
data = loader.load()
data

[Document(metadata={'source': './data/speech.txt'}, page_content='Hello!\nMy name is Eniola, and I am excited to be your tutor for this program.\nI am a Data Scientist, Open-source Contributor, and Frontend Engineer with years of experience in the tech industry. Throughout my career, I’ve been privileged to build innovative products, solve real-world problems, and empower over 5,000 individuals to grow into better versions of themselves. My work and insights have also connected me with an audience of over 10,000 followers across social media platforms.\nLooking ahead, my vision is to create groundbreaking projects and support millions—if not billions—of people in achieving their goals and thriving in the tech industry.\nIn this program, I am here to guide you every step of the way, ensuring you have an unforgettable learning experience.\nLet’s embark on this transformative journey together!')]

In [4]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=10)
splits = text_splitter.split_documents(data)

In [5]:
splits

[Document(metadata={'source': './data/speech.txt'}, page_content='Hello!\nMy name is Eniola, and I am excited to be your tutor for this program.'),
 Document(metadata={'source': './data/speech.txt'}, page_content='I am a Data Scientist, Open-source Contributor, and Frontend Engineer with years of experience in the tech industry. Throughout my career, I’ve been privileged to build innovative products, solve'),
 Document(metadata={'source': './data/speech.txt'}, page_content='solve real-world problems, and empower over 5,000 individuals to grow into better versions of themselves. My work and insights have also connected me with an audience of over 10,000 followers across'),
 Document(metadata={'source': './data/speech.txt'}, page_content='across social media platforms.'),
 Document(metadata={'source': './data/speech.txt'}, page_content='Looking ahead, my vision is to create groundbreaking projects and support millions—if not billions—of people in achieving their goals and thriving in the
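
The two splitter parameters above control how large each chunk may be (`chunk_size`, in characters) and how much text neighbouring chunks share (`chunk_overlap`). The sketch below is a simplified fixed-width sliding window, not the algorithm `RecursiveCharacterTextSplitter` actually uses; the function name `split_with_overlap` and the sample string are made up for illustration. The real splitter prefers to break on separators such as newlines and spaces, which is why the overlaps in the output above are whole words ("solve", "across") rather than exactly 10 characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Simplified sliding-window chunker (illustrative only).

    Each chunk starts `chunk_size - chunk_overlap` characters after the
    previous one, so consecutive chunks share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which helps a sentence that straddles a boundary stay retrievable from either side.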

In [6]:
embeddings = OllamaEmbeddings(model="gemma2:2b")
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings)
vectordb

<langchain_chroma.vectorstores.Chroma at 0x1695cbdf0>

In [7]:
query = "WHat's the speaker future ambition?"
docs = vectordb.similarity_search(query)
docs[0].page_content

'Looking ahead, my vision is to create groundbreaking projects and support millions—if not billions—of people in achieving their goals and thriving in the tech industry.'
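
Conceptually, `similarity_search` embeds the query with the same embedding model and returns the stored chunks whose vectors lie closest to it. The toy example below illustrates that idea with hand-written 3-dimensional vectors and a brute-force scan; the vectors, labels, and helper names are hypothetical, and a real vector database such as Chroma uses an approximate nearest-neighbour index rather than a full sort.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical "embeddings"; real ones have hundreds or thousands of dimensions
toy_index = {
    "future vision": [0.9, 0.1, 0.0],
    "greeting": [0.0, 0.8, 0.2],
    "social media": [0.1, 0.2, 0.9],
}


def toy_similarity_search(query_vector, index, k=2):
    """Return the k labels whose vectors are closest to the query vector."""
    ranked = sorted(index.items(), key=lambda item: squared_l2(query_vector, item[1]))
    return [label for label, _ in ranked[:k]]


query_vector = [1.0, 0.0, 0.0]  # stand-in for the embedded query
results = toy_similarity_search(query_vector, toy_index, k=2)
print(results)  # ['future vision', 'social media']
```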

In [8]:
# Build the store again, this time persisting it to disk
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings, persist_directory="./chroma_db")

In [9]:
# Reload the persisted database from disk
database = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
docs = database.similarity_search(query)
print(docs[0].page_content)

Looking ahead, my vision is to create groundbreaking projects and support millions—if not billions—of people in achieving their goals and thriving in the tech industry.
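
Because the store is now on disk, later sessions can load it instead of embedding the documents again. One possible guard, sketched below with only the standard library, is to check whether the persist directory already holds files before deciding to build or load. The helper `has_persisted_index` is hypothetical and not part of Chroma or LangChain, and the placeholder file merely stands in for real index files.

```python
import os
import tempfile


def has_persisted_index(persist_directory: str) -> bool:
    """Return True if the directory exists and contains at least one entry."""
    return os.path.isdir(persist_directory) and len(os.listdir(persist_directory)) > 0


# Demonstrate on a throwaway directory rather than the real ./chroma_db
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "chroma_db")
    before = has_persisted_index(db_path)  # directory does not exist yet
    os.makedirs(db_path)
    with open(os.path.join(db_path, "placeholder.bin"), "w") as f:
        f.write("stand-in for index files")
    after = has_persisted_index(db_path)   # directory now has content

print(before, after)  # False True
```

In the notebook, the same check could choose between `Chroma(persist_directory=..., embedding_function=embeddings)` and `Chroma.from_documents(...)`. This is worth considering because calling `from_documents` repeatedly against the same directory may add the same chunks a second time.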


In [10]:
# Similarity search with scores
docs = vectordb.similarity_search_with_score(query)
docs

[(Document(id='7714b987-028f-41e4-8885-2ee2420c2cac', metadata={'source': './data/speech.txt'}, page_content='Looking ahead, my vision is to create groundbreaking projects and support millions—if not billions—of people in achieving their goals and thriving in the tech industry.'),
  4657.192218740981),
 (Document(id='fc0c8c51-cf78-4d06-9e1e-79a5d01ed1b3', metadata={'source': './data/speech.txt'}, page_content='across social media platforms.'),
  4934.824689992004),
 (Document(id='514f151a-e62b-40ef-aa56-da5debeca6b3', metadata={'source': './data/speech.txt'}, page_content='In this program, I am here to guide you every step of the way, ensuring you have an unforgettable learning experience.\nLet’s embark on this transformative journey together!'),
  5340.430681062898),
 (Document(id='69721802-3025-4289-a741-9159282a6acb', metadata={'source': './data/speech.txt'}, page_content='Hello!\nMy name is Eniola, and I am excited to be your tutor for this program.'),
  5920.618651977435)]
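
The scores returned here are distances, so a lower score means a closer match; that is why the results are listed in ascending order. The absolute magnitudes depend on the embedding model and the distance metric, so they are best read relative to one another. The sketch below reuses the four scores from the output above (with page contents shortened by hand) to pick the best match and apply a cutoff; the threshold `max_distance` is an arbitrary value chosen only for illustration.

```python
# (label, score) pairs copied from the output above; labels are abbreviated
scored = [
    ("Looking ahead, my vision ...", 4657.192218740981),
    ("across social media platforms.", 4934.824689992004),
    ("In this program, I am here ...", 5340.430681062898),
    ("Hello! My name is Eniola ...", 5920.618651977435),
]

# Lower distance means a closer match, so the best hit has the minimum score
best_label, best_score = min(scored, key=lambda pair: pair[1])

# Hypothetical cutoff: keep only chunks at or below this distance
max_distance = 5000.0
kept = [label for label, score in scored if score <= max_distance]

print(best_label)
print(kept)
```

A fixed cutoff like this only makes sense after looking at the score distribution for a given model and corpus, since the scale differs between embedding models.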

In [11]:
# Retrieval: wrap the vector store as a retriever
retriever = vectordb.as_retriever()
retriever_result = retriever.invoke(query)

In [12]:
retriever_result[0].page_content

'Looking ahead, my vision is to create groundbreaking projects and support millions—if not billions—of people in achieving their goals and thriving in the tech industry.'