# Installation of Packages

**LangChain**
- a framework for developing LLM-powered applications. It gives developers a streamlined approach to integrating AI into their applications.

**RAG**
- (Retrieval-Augmented Generation) a technique that retrieves relevant documents from an external source and passes them to a generative AI model as context for its answer.

**LangSmith**
- used for observability and traceability in LangChain applications.
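
To make the RAG definition above concrete, here is a minimal sketch of the retrieve-then-augment flow in plain Python. It is an illustration only: the word-overlap scoring, the helper names (`tokenize`, `retrieve`, `build_prompt`), and the two-passage corpus are all assumptions made for this example. The notebook itself uses embeddings and Pinecone for retrieval instead.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k passages that share the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(corpus, key=lambda p: len(q_words & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Augment the question with the retrieved passages."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical two-passage corpus, for illustration only.
corpus = [
    "Pinecone is a managed vector database.",
    "LangSmith records traces of LangChain runs.",
]
question = "What is Pinecone?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

The generation step is left out on purpose: in a real pipeline the printed prompt would be passed to a chat model.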




In [None]:
!pip install langchain
!pip install langchain-openai
!pip install langchain-pinecone
!pip install langchain-community
!pip install langchainhub
!pip install langsmith



# Loading Environment Variables

In [None]:
import os
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["INDEX_NAME"] = userdata.get('INDEX_NAME')
os.environ["PINECONE_API_KEY"] = userdata.get('PINECONE_API_KEY')

In [None]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "langchain-assignment"
os.environ["LANGCHAIN_API_KEY"] = userdata.get("LANGSMITH_API_KEY")
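
Because every later cell depends on these settings, it can help to check them once up front. The following is a small sketch, not part of the original notebook: `missing_env_vars` and the `REQUIRED` list are hypothetical names introduced here, and the list simply mirrors the variables set in the cells above.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumed to match the variables set in the cells above.
REQUIRED = ["OPENAI_API_KEY", "INDEX_NAME", "PINECONE_API_KEY", "LANGCHAIN_API_KEY"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```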

# Ingesting Data and Storing Vectors in the Pinecone Vector DB

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

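# Load the blog post as a list of Document objects.
loader = TextLoader("/content/sample_data/mediumblog1.txt")
document = loader.load()

# Split the text into chunks of roughly 1,000 characters with no overlap between chunks.
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(document)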
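
To give a feel for what `chunk_size` controls, here is a simplified sketch of separator-based chunking in plain Python. It approximates the idea and is not the library's implementation: `split_text` is a hypothetical helper, it assumes a blank-line separator, and it ignores overlap (matching `chunk_overlap=0` above).

```python
def split_text(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters. A single piece longer than chunk_size is kept whole."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
        else:
            chunks.append(current)  # close the current chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccccccccccc\n\ndd"
print(split_text(sample, chunk_size=10))
# ['aaaa\n\nbbbb', 'cccccccccccc', 'dd']
```

Note that the 12-character piece exceeds `chunk_size=10` yet stays in one chunk, because this sketch only cuts at separators.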

In [None]:
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

# Embed each chunk with OpenAI and store the vectors in the Pinecone index named by INDEX_NAME.
embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
PineconeVectorStore.from_documents(texts, embeddings, index_name=os.environ["INDEX_NAME"])

print("Vectors stored to the Pinecone vector database!")

Vectors stored to the Pinecone vector database!
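
Once the vectors are stored, retrieval comes down to finding the stored vectors closest to the query vector. The sketch below shows that idea with hand-written 3-dimensional vectors. It assumes cosine similarity as the metric (the metric is configured on the Pinecone index, and the notebook does not state which one is used), and the `index` contents and helper names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the ids of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "index": real embeddings have hundreds or thousands of dimensions.
index = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
print(top_k([0.9, 0.1, 0.0], index))  # ['chunk-a', 'chunk-b']
```

A vector database performs the same kind of ranking, but uses approximate nearest-neighbor indexes so it does not have to compare the query against every stored vector.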


# Creating a Query using LCEL

In [None]:
from langchain_openai import ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
llm = ChatOpenAI(model="gpt-4o-mini")
vectorstore = PineconeVectorStore(index_name=os.environ["INDEX_NAME"], embedding=embeddings)

In [None]:
template = """Use the following pieces of context to answer the question at the end.
If information about it is not available, say "I don't know".
Use three sentences maximum and keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}
Helpful Answer:
"""

In [None]:
def format_docs(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

custom_rag_prompt = PromptTemplate.from_template(template)
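# The dict fans the query out: one branch retrieves and formats context,
# the other passes the question through unchanged.
rag_chain = (
    {"context": vectorstore.as_retriever() | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
)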
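
The `|` syntax in the chain above can look like magic, so here is a toy re-creation of the idea in plain Python. It is only a sketch under stated assumptions: `Step` and `coerce` are hypothetical names, not LangChain classes, and the behavior shown (composing steps left to right, and turning a dict into parallel branches that all receive the same input) only roughly mirrors how LCEL treats runnables, dicts, and plain functions.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # self | other: run self first, then feed its output to other.
        nxt = coerce(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Handles `dict | Step` and `function | Step`.
        return coerce(other) | self


def coerce(obj):
    """Turn a Step, dict, or plain callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        branches = {key: coerce(val) for key, val in obj.items()}
        # Every branch receives the same input; results are collected by key.
        return Step(lambda value: {key: b.invoke(value) for key, b in branches.items()})
    return Step(obj)


# Stand-ins for the retriever, formatter, passthrough, and prompt.
toy_retriever = Step(lambda q: ["doc about " + q])
join_docs = lambda docs: "\n\n".join(docs)
toy_passthrough = Step(lambda x: x)
toy_prompt = Step(lambda d: f"{d['context']}\n\nQuestion: {d['question']}")

toy_chain = {"context": toy_retriever | join_docs, "question": toy_passthrough} | toy_prompt
print(toy_chain.invoke("Pinecone"))
```

Invoking `toy_chain` sends the same query string to both branches of the dict, which is why the real chain can use the query both for retrieval and as the `{question}` in the prompt.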


In [None]:
query = "What is Pinecone?"
res = rag_chain.invoke(query)

print(res)

content='Pinecone is a vector database designed to handle high-dimensional data and facilitate efficient similarity search and retrieval. It is particularly useful for applications involving machine learning and AI, where understanding complex datasets is essential. Thanks for asking!' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 953, 'total_tokens': 998, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_72ed7ab54c', 'finish_reason': 'stop', 'logprobs': None} id='run-0e112e40-6a8e-4cde-9059-d9a2c1be332c-0' usage_metadata={'input_tokens': 953, 'output_tokens': 45, 'total_tokens': 998, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}


# Alternate Method - Using LangChain's Built-in Methods (chains)

In [None]:
from langchain import hub

retrieval_qa_chat_prompt = hub.pull('langchain-ai/retrieval-qa-chat')

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

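# "Stuff" all retrieved documents into the prompt and pass it to the LLM.
combine_docs_chain = create_stuff_documents_chain(llm, retrieval_qa_chat_prompt)

# Wire the retriever in front of the document-combining chain.
retrieval_chain = create_retrieval_chain(
    retriever=vectorstore.as_retriever(),
    combine_docs_chain=combine_docs_chain,
)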

In [None]:
query = "What are Vector Databases?"
result = retrieval_chain.invoke(input={"input": query})
print(result)

{'input': 'What are Vector Databases?', 'context': [Document(id='b27061ee-4009-4e5b-80d0-c0a2844cdbd4', metadata={'source': '/content/sample_data/mediumblog1.txt'}, page_content='Understanding Vector Databases\nVector databases have gained significant importance in various fields due to their unique ability to efficiently store, index, and search high-dimensional data points, often referred to as vectors. These databases are designed to handle data where each entry is represented as a vector in a multi-dimensional space. The vectors can represent a wide range of information, such as numerical features, embeddings from text or images, and even complex data like molecular structures.\n\nLet’s represent the vector database using a 2D grid where one axis represents the color of the animal (brown, black, white) and the other axis represents the size (small, medium, large).\n\n\nIn this representation:'), Document(id='794c46c1-1576-4f1a-96b8-fb59507007aa', metadata={'source': '/content/sampl