# Creating a document 

LangChain implements a Document abstraction, which is intended to represent a unit of text and associated metadata. It has two attributes:

- page_content: a string representing the content;
- metadata: a dict containing arbitrary metadata.

The metadata attribute can capture information about the source of the document, its relationship to other documents, and other information. Note that an individual Document object often represents a chunk of a larger document.
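
To make the "chunk of a larger document" idea concrete, here is a minimal sketch that splits one longer text into several Document objects sharing a source. The helper chunk_text, the chunk_index metadata key, and the word-based splitting rule are assumptions made for illustration; they are not part of LangChain. The dataclass fallback is only a stand-in used when langchain_core is not installed.

```python
from dataclasses import dataclass, field

try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in with the same two attributes, used only if langchain_core is missing.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def chunk_text(text: str, source: str, max_words: int = 8) -> list:
    """Hypothetical helper: split text into Documents of at most max_words words."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        piece = " ".join(words[start:start + max_words])
        chunks.append(
            Document(
                page_content=piece,
                # Every chunk records where it came from and its position.
                metadata={"source": source, "chunk_index": len(chunks)},
            )
        )
    return chunks


long_text = (
    "Dogs are loyal companions. Cats value their independence. "
    "Rabbits need plenty of room to hop around."
)
chunks = chunk_text(long_text, source="mammal-pets-doc")
for doc in chunks:
    print(doc.metadata["chunk_index"], doc.page_content)
```

Each chunk keeps the same source value, so results retrieved later can still be traced back to the document they were cut from.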

In [1]:
from langchain_core.documents import Document 

documents = [
    Document(
        page_content="Dogs are great comapnions, Known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are indepndent pets that often enjoy their own sapce.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Goldfish are popolar pets for biginners, requiring relatively simple care.",
        metadata={"source": "fish-pets-doc"},
    ),
    Document(
        page_content="parrots are intelligent birds capable of mimmicking human speech.",
        metadata={"source": "bird-pets-doc"},
    ),
    Document(
        page_content="Rabbits are social animal that need plent of space to hop around.",
        metadata={"source": "mammal-pets-doc"},
    ),
]

In [2]:
documents

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great comapnions, Known for their loyalty and friendliness.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are indepndent pets that often enjoy their own sapce.'),
 Document(metadata={'source': 'fish-pets-doc'}, page_content='Goldfish are popolar pets for biginners, requiring relatively simple care.'),
 Document(metadata={'source': 'bird-pets-doc'}, page_content='parrots are intelligent birds capable of mimmicking human speech.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animal that need plent of space to hop around.')]

## Vector store

In [3]:
import os

from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_groq import ChatGroq

# Load GROQ_API_KEY and HF_TOKEN from a local .env file.
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

# Make the Hugging Face token available to downstream libraries.
os.environ["HF_TOKEN"] = os.getenv("HF_TOKEN")

In [4]:
# The keyword must be `model`. A misspelled `mode=` is not rejected here:
# it is moved into model_kwargs with a warning and only fails later, at invoke
# time, with "create() got an unexpected keyword argument 'mode'".
llm = ChatGroq(groq_api_key=groq_api_key, model="Llama3-8b-8192")
llm

## Embeddings using Hugging Face

In [5]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings= HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")


In [6]:
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(documents,embedding=embeddings)
vectorstore

<langchain_chroma.vectorstores.Chroma at 0x743ba8fe4730>

In [7]:
vectorstore.similarity_search("cat")

[Document(id='1ba031da-e1fe-4e3d-9918-78f8b5a5b329', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are indepndent pets that often enjoy their own sapce.'),
 Document(id='98b09684-d1bb-40a0-a54a-bf7eca43c476', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animal that need plent of space to hop around.'),
 Document(id='351e6dd5-5bcb-41ca-872a-2745bed04e62', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great comapnions, Known for their loyalty and friendliness.'),
 Document(id='6a264db2-ffc5-4217-abb9-57e5272b8470', metadata={'source': 'bird-pets-doc'}, page_content='parrots are intelligent birds capable of mimmicking human speech.')]

In [8]:
## async query
await vectorstore.asimilarity_search("cat")

[Document(id='1ba031da-e1fe-4e3d-9918-78f8b5a5b329', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are indepndent pets that often enjoy their own sapce.'),
 Document(id='98b09684-d1bb-40a0-a54a-bf7eca43c476', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animal that need plent of space to hop around.'),
 Document(id='351e6dd5-5bcb-41ca-872a-2745bed04e62', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great comapnions, Known for their loyalty and friendliness.'),
 Document(id='6a264db2-ffc5-4217-abb9-57e5272b8470', metadata={'source': 'bird-pets-doc'}, page_content='parrots are intelligent birds capable of mimmicking human speech.')]

In [9]:
vectorstore.similarity_search_with_score("cat")

[(Document(id='1ba031da-e1fe-4e3d-9918-78f8b5a5b329', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are indepndent pets that often enjoy their own sapce.'),
  0.9109034538269043),
 (Document(id='98b09684-d1bb-40a0-a54a-bf7eca43c476', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animal that need plent of space to hop around.'),
  1.5528877973556519),
 (Document(id='351e6dd5-5bcb-41ca-872a-2745bed04e62', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great comapnions, Known for their loyalty and friendliness.'),
  1.6033623218536377),
 (Document(id='6a264db2-ffc5-4217-abb9-57e5272b8470', metadata={'source': 'bird-pets-doc'}, page_content='parrots are intelligent birds capable of mimmicking human speech.'),
  1.6298046112060547)]
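
The scores above are distances, so a lower value means a closer match, which is why the cat sentence comes first. The sketch below shows how such a score relates to cosine similarity. It assumes the store uses Chroma's default squared L2 distance and that the embedding vectors have unit length; neither is stated in the notebook, and the three-dimensional vectors are made up for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for real embeddings.
query = normalize([1.0, 2.0, 0.5])
close = normalize([1.0, 1.8, 0.7])
far = normalize([-0.5, 0.2, 2.0])

for name, vec in [("close", close), ("far", far)]:
    distance = squared_l2(query, vec)
    cosine = cosine_similarity(query, vec)
    # For unit vectors, squared L2 distance equals 2 - 2 * cosine similarity.
    print(name, round(distance, 4), round(2 - 2 * cosine, 4))
```

Under those assumptions, a score of about 0.91 would correspond to a cosine similarity of roughly 0.54, and scores near 1.6 to a cosine similarity of roughly 0.2.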

## Retrievers


---

LangChain VectorStore objects do not subclass Runnable, and so cannot immediately be integrated into LangChain Expression Language chains.

LangChain Retrievers are Runnables, so they implement a standard set of methods (e.g., synchronous and asynchronous invoke and batch operations) and are designed to be incorporated in LCEL chains.

We can create a simple version of this ourselves, without subclassing Retriever. If we choose what method we wish to use to retrieve documents, we can create a runnable easily. Below we will build one around the similarity_search method:


In [10]:
from typing import List
from langchain_core.documents import Document
from langchain_core.runnables import RunnableLambda

In [12]:
retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)
retriever.batch(["cat","dog"])

[[Document(id='1ba031da-e1fe-4e3d-9918-78f8b5a5b329', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are indepndent pets that often enjoy their own sapce.')],
 [Document(id='351e6dd5-5bcb-41ca-872a-2745bed04e62', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great comapnions, Known for their loyalty and friendliness.')]]
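
To show what wrapping a search function as a runnable buys us, here is a plain-Python sketch of the three operations used above: bind, invoke, and batch. ToyRunnable, keyword_search, and the PETS list are hypothetical stand-ins, not LangChain classes, and this batch runs inputs one after another for simplicity.

```python
from typing import Any, Callable, List


class ToyRunnable:
    """Hypothetical stand-in: wraps a function and adds invoke/batch/bind."""

    def __init__(self, func: Callable[..., Any], **bound_kwargs: Any) -> None:
        self.func = func
        self.bound_kwargs = bound_kwargs

    def bind(self, **kwargs: Any) -> "ToyRunnable":
        # Return a new wrapper that always passes these keyword arguments.
        return ToyRunnable(self.func, **{**self.bound_kwargs, **kwargs})

    def invoke(self, value: Any) -> Any:
        # Call the wrapped function on one input.
        return self.func(value, **self.bound_kwargs)

    def batch(self, values: List[Any]) -> List[Any]:
        # One result per input, in input order.
        return [self.invoke(v) for v in values]


PETS = ["Cats enjoy their own space.", "Dogs are loyal.", "Parrots mimic speech."]


def keyword_search(query: str, k: int = 4) -> List[str]:
    """Toy search: sentences containing the query come first."""
    ranked = sorted(PETS, key=lambda s: query.lower() not in s.lower())
    return ranked[:k]


toy_retriever = ToyRunnable(keyword_search).bind(k=1)
print(toy_retriever.batch(["cat", "dog"]))
```

The point of the pattern is that the search function itself stays unchanged: the wrapper supplies the uniform calling interface, and bind fixes k once instead of at every call.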

Vectorstores implement an as_retriever method that will generate a Retriever, specifically a VectorStoreRetriever. These retrievers include specific search_type and search_kwargs attributes that identify what methods of the underlying vector store to call, and how to parameterize them. For instance, we can replicate the above with the following:


In [21]:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k":1}
)

retriever.batch(["cat","dog"])

[[Document(id='1ba031da-e1fe-4e3d-9918-78f8b5a5b329', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are indepndent pets that often enjoy their own sapce.')],
 [Document(id='351e6dd5-5bcb-41ca-872a-2745bed04e62', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great comapnions, Known for their loyalty and friendliness.')]]
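
The role of search_type and search_kwargs can be sketched as a simple dispatch: the retriever picks a method on the store by name and forwards the keyword arguments to it. ToyStore, ToyRetriever, threshold_search, and the word-overlap scoring are hypothetical and exist only for this illustration; only the search type names "similarity" and "similarity_score_threshold" are taken from LangChain.

```python
from typing import Any, Dict, List, Optional


class ToyStore:
    """Hypothetical in-memory store with two search methods."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def _overlap(self, query: str, text: str) -> float:
        # Fraction of query words that also appear in the text.
        query_words = set(query.lower().split())
        text_words = set(text.lower().split())
        return len(query_words & text_words) / len(query_words)

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        ranked = sorted(self.texts, key=lambda t: self._overlap(query, t), reverse=True)
        return ranked[:k]

    def threshold_search(self, query: str, k: int = 4, score_threshold: float = 0.5) -> List[str]:
        ranked = self.similarity_search(query, k=len(self.texts))
        return [t for t in ranked if self._overlap(query, t) >= score_threshold][:k]


class ToyRetriever:
    """Hypothetical retriever that dispatches on search_type."""

    def __init__(self, store: ToyStore, search_type: str = "similarity",
                 search_kwargs: Optional[Dict[str, Any]] = None) -> None:
        self.store = store
        self.search_type = search_type
        self.search_kwargs = search_kwargs or {}

    def invoke(self, query: str) -> List[str]:
        if self.search_type == "similarity":
            return self.store.similarity_search(query, **self.search_kwargs)
        if self.search_type == "similarity_score_threshold":
            return self.store.threshold_search(query, **self.search_kwargs)
        raise ValueError(f"unknown search_type: {self.search_type}")

    def batch(self, queries: List[str]) -> List[List[str]]:
        return [self.invoke(q) for q in queries]


store = ToyStore(["cats enjoy their own space", "dogs are loyal companions",
                  "parrots mimic human speech"])
top_one = ToyRetriever(store, search_type="similarity", search_kwargs={"k": 1})
print(top_one.batch(["cats", "dogs"]))

filtered = ToyRetriever(store, search_type="similarity_score_threshold",
                        search_kwargs={"score_threshold": 0.5})
print(filtered.invoke("loyal dogs"))
```

Keeping the method choice and its parameters as data means the same retriever object can be reconfigured without touching the code that calls it.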

In [22]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using the provided context only.

{question}

context:
{context} 
"""

prompt = ChatPromptTemplate.from_messages([("human",message)])

rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm

response=rag_chain.invoke("hey tell me about dogs")
print(response.content)


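TypeError: create() got an unexpected keyword argument 'mode'

This error is raised when ChatGroq is constructed with the misspelled keyword mode= instead of model=. The unrecognized keyword is moved into model_kwargs and forwarded to the Groq client's create() call, which rejects it. Constructing the model with model="Llama3-8b-8192" avoids the error.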
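
The data flow of the chain in In [22] can be traced without any API key. In this plain-Python sketch, the same input string is fanned out to a retriever and a passthrough, the two results fill the prompt template, and the formatted prompt goes to the model. toy_retriever, fan_out, and toy_llm are hypothetical stand-ins with canned behavior, not LangChain components.

```python
def toy_retriever(question: str) -> list:
    """Hypothetical retriever: returns canned context regardless of the question."""
    return ["Dogs are great companions, known for their loyalty and friendliness."]


def passthrough(value):
    """Return the input unchanged, like a passthrough step."""
    return value


def fan_out(steps: dict, value) -> dict:
    # Each key receives the result of calling its step on the same input.
    return {key: step(value) for key, step in steps.items()}


TEMPLATE = """Answer this question using the provided context only.

{question}

context:
{context}
"""


def format_prompt(inputs: dict) -> str:
    """Fill the template with the question and the retrieved context."""
    return TEMPLATE.format(**inputs)


def toy_llm(prompt_text: str) -> str:
    # Stand-in for a chat model: only reports the size of the prompt it received.
    return f"(model would answer from a {len(prompt_text)}-character prompt)"


inputs = fan_out({"context": toy_retriever, "question": passthrough},
                 "hey tell me about dogs")
prompt_text = format_prompt(inputs)
print(prompt_text)
print(toy_llm(prompt_text))
```

Because the retrieved context is pasted into the prompt, the quality of the final answer depends directly on what the retriever returns for the question.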