Chroma (ChromaDB) is an open-source embedding database designed specifically for AI and LLM applications.

In [2]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [3]:
# Read the file as UTF-8 so non-ASCII punctuation is decoded correctly
loader = TextLoader("speech.txt", encoding="utf-8")
data = loader.load()

In [4]:
data

[Document(metadata={'source': 'speech.txt'}, page_content="Generative AI (Gen AI) is a subset of artificial intelligence focused on creating new content—such as text, images, music, videos, and code—based on patterns learned from large datasets. Unlike traditional AI, which typically analyzes and interprets existing data to make predictions or decisions, generative AI models are trained to generate novel outputs that resemble the data they were trained on .\nThe Verge\n+2\nMcKinsey & Company\n+2\nExamples.com\n+2\nMIT News\n How Generative AI Works\nGenerative AI models utilize advanced machine learning techniques, particularly deep learning, to understand and replicate the underlying structures of their training data. These models learn to generate new content by identifying patterns and relationships within the data. Common architectures include:\nTechRadar\n\nGenerative Adversarial Networks (GANs): Consist of two neural networks—a generator and a discriminator—that work in o

In [5]:
# Split the document into chunks of at most 500 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
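
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of a fixed-width splitter. It is a simplified stand-in, not the algorithm RecursiveCharacterTextSplitter uses: the real splitter tries to break on separators such as paragraph and line breaks, so its chunks are usually shorter than chunk_size. The function name split_text and the sample text are hypothetical.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Naive fixed-width splitter (illustration only, not LangChain's algorithm)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghij" * 120  # 1200 characters of placeholder text
chunks = split_text(sample, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])  # [500, 500, 200]
```

With chunk_overlap=0, as in the cell above, joining the chunks reproduces the input exactly. A non-zero overlap repeats the tail of each chunk at the head of the next, which can help when a relevant sentence straddles a chunk boundary.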

In [6]:
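# Embed with the class's default model; this needs a running local Ollama server
embedding = OllamaEmbeddings()
# Embed every chunk and store it in an in-memory Chroma collection
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
vectordb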

<langchain_chroma.vectorstores.Chroma at 0x22c3b3b1520>

In [7]:
# Query the vector store and show the closest chunk
query = "what is Gen aI"
docs = vectordb.similarity_search(query)
docs[0].page_content

'Generative AI models utilize advanced machine learning techniques, particularly deep learning, to understand and replicate the underlying structures of their training data. These models learn to generate new content by identifying patterns and relationships within the data. Common architectures include:\nTechRadar'
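
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force search over hand-written two-dimensional toy vectors. It is an assumption-laden illustration: Chroma uses an approximate nearest-neighbour index rather than a full scan, and the helper names and toy data are hypothetical. The default k=4 mirrors the default of similarity_search.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_search(query_vec, store, k=4):
    """Return the texts of the k entries in `store` nearest to `query_vec`.

    `store` is a list of (text, vector) pairs; smaller distance means closer.
    """
    ranked = sorted(store, key=lambda item: squared_l2(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "embeddings" written by hand, purely for illustration
toy_store = [
    ("generative models", [1.0, 0.0]),
    ("quality control", [0.0, 1.0]),
    ("gen ai overview", [0.9, 0.1]),
]
results = similarity_search([1.0, 0.0], toy_store, k=2)
print(results)  # ['generative models', 'gen ai overview']
```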

In [8]:
# Save to disk: build the collection and persist it under ./chroma_db
vectordb = Chroma.from_documents(documents=splits, embedding=embedding, persist_directory="./chroma_db")

In [9]:
# Load the persisted collection from disk and run the same query
db2 = Chroma(persist_directory="./chroma_db", embedding_function=embedding)
docs = db2.similarity_search(query)
print(docs[0].page_content)

Generative AI models utilize advanced machine learning techniques, particularly deep learning, to understand and replicate the underlying structures of their training data. These models learn to generate new content by identifying patterns and relationships within the data. Common architectures include:
TechRadar
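
One thing to watch with a persisted directory: re-running the save cell can add the same chunks again under fresh random IDs, so searches may start returning duplicates. A common way around this is to derive a deterministic ID from each chunk. The sketch below covers only the ID step, using the standard library; the helper stable_id and the sample chunks are hypothetical. Passing the resulting list as ids= to Chroma.from_documents is an assumption to check against the version of langchain_chroma you have installed.

```python
import hashlib


def stable_id(source: str, content: str) -> str:
    """Derive a repeatable ID from a chunk's source path and text."""
    digest = hashlib.sha256(f"{source}\n{content}".encode("utf-8")).hexdigest()
    return digest[:32]  # shortened for readability; still effectively unique here


# Placeholder (source, text) pairs standing in for the real chunks
sample_chunks = [("speech.txt", "first chunk"), ("speech.txt", "second chunk")]
ids = [stable_id(source, text) for source, text in sample_chunks]
print(ids)
```

Because the ID depends only on the source and the text, running the cell twice yields the same IDs, which lets the store recognise a chunk it has already seen.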


In [11]:
# Return each match together with its distance score.
# Chroma's default metric is squared L2 (Euclidean) distance, not Manhattan distance;
# a lower score means a closer match.
docs = vectordb.similarity_search_with_score(query)
docs

[(Document(id='ba39615a-2a25-45f9-9dac-4c78af6267b1', metadata={'source': 'speech.txt'}, page_content='Generative AI models utilize advanced machine learning techniques, particularly deep learning, to understand and replicate the underlying structures of their training data. These models learn to generate new content by identifying patterns and relationships within the data. Common architectures include:\nTechRadar'),
  20159.91796875),
 (Document(id='bd869a39-815f-42d1-9132-723911f2ea6d', metadata={'source': 'speech.txt'}, page_content='Quality Control: Ensuring the accuracy and reliability of generated content is crucial, especially in critical fields like healthcare.'),
  24083.6484375),
 (Document(id='5aab0695-8146-411d-89db-3d35dfa7deec', metadata={'source': 'speech.txt'}, page_content='Generative AI (Gen AI) is a subset of artificial intelligence focused on creating new content—such as text, images, music, videos, and code—based on patterns learned from large datasets. Unlike
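
The scores above are distances, so the smallest value marks the best match. Values in the tens of thousands are plausible when the embedding vectors are long and not scaled to unit length, although that is an inference from the magnitudes rather than something the output states. The sketch below uses hand-picked toy vectors to compare squared L2 with Manhattan distance, and shows that normalising the vectors to unit length keeps squared L2 between 0 and 4. All names and numbers are illustrative.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


a = [30.0, 40.0]
b = [0.0, 50.0]

raw_l2 = squared_l2(a, b)   # 30**2 + 10**2
raw_l1 = manhattan(a, b)    # 30 + 10
unit_l2 = squared_l2(normalize(a), normalize(b))  # roughly 0.4

print(raw_l2, raw_l1, unit_l2)
```

For unit vectors, squared L2 equals 2 - 2·cos(θ), where θ is the angle between them, so ranking by it gives the same order as ranking by cosine similarity.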

In [12]:
# Retriever option: wrap the vector store in the retriever interface
retriever = vectordb.as_retriever()
retriever.invoke(query)[0].page_content

'Generative AI models utilize advanced machine learning techniques, particularly deep learning, to understand and replicate the underlying structures of their training data. These models learn to generate new content by identifying patterns and relationships within the data. Common architectures include:\nTechRadar'