#### Chroma
Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.

https://python.langchain.com/v0.2/docs/integrations/vectorstores/

In [1]:
## Build a sample vector store
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [12]:
loader = TextLoader("speech.txt")
data = loader.load()
data

[Document(metadata={'source': 'speech.txt'}, page_content='What Is Amazon SageMaker?\nAmazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data\nscientists and developers can quickly and easily build and train machine learning models, and then\ndirectly deploy them into a production-ready hosted environment. It provides an integrated Jupyter\nauthoring notebook instance for easy access to your data sources for exploration and analysis, so you\ndon\'t have to manage servers. It also provides common machine learning algorithms that are optimized\nto run efficiently against extremely large data in a distributed environment. With native support for\nbring-your-own-algorithms and frameworks, Amazon SageMaker offers flexible distributed training\noptions that adjust to your specific workflows. Deploy a model into a secure and scalable environment by\nlaunching it with a single click from the Amazon SageMaker console. Training and hosting are billed by\nminutes
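`TextLoader` reads the entire file into a single Document, which is why `data` is a one-element list whose metadata records the source path. A minimal standard-library sketch of that behaviour follows. The `Doc` dataclass and `load_text` function are hypothetical stand-ins, not LangChain's classes, and the sketch assumes a UTF-8 file.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list[Doc]:
    """Read the whole file into one Doc and record the path as metadata['source']."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    return [Doc(page_content=text, metadata={"source": path})]
```

Splitting happens in a separate step, so a long file still arrives here as one large Document.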

In [15]:
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=70)
splits = text_splitter.split_documents(data)

In [16]:
len(splits)

18
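`chunk_size=500` caps each chunk at 500 characters, and `chunk_overlap=70` lets neighbouring chunks share up to 70 characters, so text cut at a boundary still appears whole in one of them. The sketch below shows only that windowing idea with fixed character offsets. The real `RecursiveCharacterTextSplitter` first tries to break on separators (paragraphs, then newlines, then spaces), which is why it produced 18 chunks here rather than whatever a fixed stride would give. `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 70) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous
    one, so neighbouring chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 1,000-character text gives windows starting at 0, 430 and 860
print(len(sliding_chunks("x" * 1000)))  # prints 3
```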

In [19]:
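# embedding = OllamaEmbeddings()  # local alternative; needs a running Ollama server
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

# Calls the Hugging Face Inference API; expects a Hugging Face token
# (e.g. HUGGINGFACEHUB_API_TOKEN) to be set in the environment
embedding = HuggingFaceEndpointEmbeddings()
# Embed every chunk and index it in an in-memory Chroma collection
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)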
vectordb

<langchain_chroma.vectorstores.Chroma at 0x19ba9bbf190>
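`from_documents` sends every chunk through the embedding model, which maps text of any length to a fixed-length vector, and stores those vectors alongside the chunks. The toy below only illustrates that text-to-vector contract with a hashed bag of words. `toy_embed` is a hypothetical stand-in with no semantic understanding; the model behind `HuggingFaceEndpointEmbeddings` is a trained sentence-embedding network.

```python
import hashlib
import math
import re


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hash each lowercase word into one of `dim` buckets, then L2-normalize.

    Hypothetical illustration only: the same text always gives the same vector,
    and any text gives exactly `dim` floats.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 is stable across Python runs, unlike the built-in hash() for str
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

Fixed length is what makes vectors comparable: a query and a stored chunk can only be ranked against each other if both pass through the same embedding function.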

In [20]:
## Query the store
query = "What is AMazon Sagemanker"
docs = vectordb.similarity_search(query)
docs[0].page_content

'What Is Amazon SageMaker?\nAmazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data\nscientists and developers can quickly and easily build and train machine learning models, and then\ndirectly deploy them into a production-ready hosted environment. It provides an integrated Jupyter\nauthoring notebook instance for easy access to your data sources for exploration and analysis, so you'
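`similarity_search` embeds the query with the same model and returns the stored chunks whose vectors lie closest to it. LangChain's default is `k=4`, and a Chroma collection uses squared Euclidean (L2) distance unless configured otherwise. Chroma answers this with an approximate HNSW index. The brute-force sketch below computes the exact ranking over a handful of vectors; the 2-D vectors and the `top_k` helper are made up for illustration.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the metric Chroma uses by default ("l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 4):
    """Return the k (text, distance) pairs closest to query_vec, nearest first.

    This exhaustive scan stands in for Chroma's approximate index.
    """
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


stored = [("intro", [1.0, 0.0]), ("pricing", [0.0, 1.0]), ("training", [0.7, 0.7])]
print([text for text, _ in top_k([0.9, 0.1], stored, k=2)])  # prints ['intro', 'training']
```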

In [21]:
## Save to disk
# Same as above, but the collection is also written to ./chroma_db
vectordb = Chroma.from_documents(documents=splits, embedding=embedding, persist_directory="./chroma_db")


In [25]:
# Load from disk
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embedding)
docs = db2.similarity_search("explain me the working of machine learning amzon sagemaker")
print(docs[0].page_content)

How Amazon SageMaker Works
Amazon SageMaker is a fully managed service that enables you to quickly and easily integrate machine
learning-based models into your applications. This section provides an overview of machine learning
and explains how Amazon SageMaker works. If you are a first-time user of Amazon SageMaker, we
recommend that you read the following sections in order:
Topics
• Machine Learning with Amazon SageMaker (p. 2)
• Explore and Preprocess Data (p. 4)
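Persistence has two easy traps. Pointing `Chroma(persist_directory=...)` at a path that does not exist yet typically creates a new, empty collection rather than raising an error, so a typo in the path tends to surface only as empty search results. And re-running `Chroma.from_documents(..., persist_directory=...)` against an existing directory adds the chunks a second time under freshly generated IDs. One way to guard against both is to reopen the store when the directory already has content and build it only otherwise. `load_or_build` is a hypothetical helper; it takes the load and build steps as callables, so the sketch runs without Chroma installed.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(persist_directory: str, load: Callable[[], T], build: Callable[[], T]) -> T:
    """Call `load` if persist_directory exists and is non-empty, otherwise call `build`.

    This is a heuristic: any file in the directory counts as an existing store.
    """
    if os.path.isdir(persist_directory):
        with os.scandir(persist_directory) as entries:
            if any(True for _ in entries):
                return load()
    return build()
```

In the notebook this might be called as `vectordb = load_or_build("./chroma_db", load=lambda: Chroma(persist_directory="./chroma_db", embedding_function=embedding), build=lambda: Chroma.from_documents(documents=splits, embedding=embedding, persist_directory="./chroma_db"))`.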


In [27]:
## Similarity search with score
# Returns (Document, distance) pairs; for Chroma, a lower score means a closer match
docs = vectordb.similarity_search_with_score(query)
docs

[(Document(metadata={'source': 'speech.txt'}, page_content='What Is Amazon SageMaker?\nAmazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data\nscientists and developers can quickly and easily build and train machine learning models, and then\ndirectly deploy them into a production-ready hosted environment. It provides an integrated Jupyter\nauthoring notebook instance for easy access to your data sources for exploration and analysis, so you'),
  0.5895249186473444),
 (Document(metadata={'source': 'speech.txt'}, page_content="don't have to manage servers. It also provides common machine learning algorithms that are optimized\nto run efficiently against extremely large data in a distributed environment. With native support for\nbring-your-own-algorithms and frameworks, Amazon SageMaker offers flexible distributed training\noptions that adjust to your specific workflows. Deploy a model into a secure and scalable environment by"),
  0.885385487295426),
 ...
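The numbers paired with each Document are distances, not similarities: under Chroma's default squared L2 metric, 0.59 marks a closer match than 0.89. If the embedding model returns unit-length vectors (many sentence-transformer models do, but this is an assumption worth checking for yours), squared L2 distance d and cosine similarity are linked by ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a . b) = 2 - 2 cos(theta), so cos(theta) = 1 - d/2. Under that assumption, the two scores above correspond to cosine similarities of about 0.71 and 0.56. `l2_squared_to_cosine` is a hypothetical helper applying that conversion.

```python
def l2_squared_to_cosine(distance: float) -> float:
    """Convert a squared-L2 distance between unit vectors into cosine similarity.

    Assumes both vectors have length 1; the result is meaningless otherwise.
    """
    return 1.0 - distance / 2.0


# Scores copied from the output above
for d in (0.5895249186473444, 0.885385487295426):
    print(round(l2_squared_to_cosine(d), 2))  # prints 0.71, then 0.56
```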

In [28]:
### Use the store as a retriever
retriever = vectordb.as_retriever()
retriever.invoke(query)[0].page_content

'What Is Amazon SageMaker?\nAmazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data\nscientists and developers can quickly and easily build and train machine learning models, and then\ndirectly deploy them into a production-ready hosted environment. It provides an integrated Jupyter\nauthoring notebook instance for easy access to your data sources for exploration and analysis, so you'
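`as_retriever()` wraps the vector store in LangChain's retriever interface, so it can be plugged into chains that call `.invoke(query)`. By default it runs the same similarity search (top 4 results), which is why its first result matches `docs[0]` above. The number of results can be changed with `vectordb.as_retriever(search_kwargs={"k": 2})`. The sketch below mimics that wrapper in plain Python; the `SimpleRetriever` class and the word-overlap search function are hypothetical stand-ins.

```python
from typing import Callable


class SimpleRetriever:
    """Hypothetical stand-in for a vector-store retriever.

    Holds a search function plus a fixed k and exposes invoke(query), mirroring
    how as_retriever() fixes search settings such as k up front.
    """

    def __init__(self, search: Callable[[str, int], list], k: int = 4):
        self.search = search
        self.k = k

    def invoke(self, query: str) -> list:
        return self.search(query, self.k)


# Fake search: rank chunks by how many words they share with the query
chunks = ["What Is Amazon SageMaker?", "Training and hosting are billed by minutes", "Deploy a model"]


def word_overlap_search(query: str, k: int) -> list[str]:
    q = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]


retriever = SimpleRetriever(word_overlap_search, k=2)
print(retriever.invoke("what is sagemaker")[0])  # prints What Is Amazon SageMaker?
```

Fixing the search settings inside the retriever means downstream code only ever passes a query string.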