# Building a RAG pipeline

## Environment Variables

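Create a `.env` file in your project directory with the following contents. `LANGCHAIN_PROJECT` is read by the loading cell further down, so it needs a value as well.

```
PINECONE_API_KEY = "<your api key>"
OPENAI_API_KEY = "<your api key>"
LANGCHAIN_API_KEY = "<your api key>"
LANGCHAIN_PROJECT = "<your project name>"
```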

### Load environment variables

The `python-dotenv` package loads the `.env` file we just created; the `os` module is then used to set the environment variables.

To install: `pip install python-dotenv`

In [1]:
import os

from dotenv import load_dotenv
load_dotenv()

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["PINECONE_API_KEY"] = os.getenv("PINECONE_API_KEY")
os.environ['LANGCHAIN_PROJECT'] = os.getenv("LANGCHAIN_PROJECT")
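
Assigning `None` to an `os.environ` key raises a `TypeError`, so the cell above fails with an unhelpful message if any key is missing from `.env`. The following is a minimal sketch of an up-front check. The helper name `missing_env_vars` and the `REQUIRED_VARS` list are illustrative, not part of the original notebook; the variable names are copied from the cell above.

```python
import os

# Names taken from the loading cell above; adjust to match your own .env file.
REQUIRED_VARS = [
    "PINECONE_API_KEY",
    "OPENAI_API_KEY",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
]


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not environ.get(name)]


# Hypothetical usage: report missing keys before any assignment is attempted.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` points at the exact key that needs to be added to `.env`.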

## Importing the required packages

In [2]:
import json

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings # To create embeddings
from langchain_openai import ChatOpenAI
from langchain_pinecone import PineconeVectorStore # To connect with the Vectorstore

### Defining global variables

In [3]:
INDEX_NAME = 'earning-calls'
TOP_K = 6
QUARTER = "Q1"
FILENAME = "Adani Enterprises Ltd.pdf"
YEAR = "FY24"
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

## Loading Vectorstore

In [4]:
index = PineconeVectorStore(index_name=INDEX_NAME, embedding=embeddings) # loading the index
retriver = index.as_retriever(search_kwargs={"filter": {"quarter": QUARTER, "filename": FILENAME, "year": YEAR}, "k": TOP_K})
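
To make the `search_kwargs` above more concrete, here is a plain-Python sketch of what a metadata filter combined with `k` does conceptually: keep only the records whose metadata matches every key in the filter, rank what is left by similarity to the query, and return the top `k`. This is an illustration under simplifying assumptions, not Pinecone's implementation. The filtering actually runs inside Pinecone, the similarity metric depends on how the index was configured, and the function names and toy vectors below are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_top_k(query_vec, records, metadata_filter, k):
    """records is a list of (vector, metadata, text) tuples.

    Keeps records whose metadata equals the filter on every key,
    then returns the texts of the k most similar records.
    """
    candidates = [
        record
        for record in records
        if all(record[1].get(key) == value for key, value in metadata_filter.items())
    ]
    ranked = sorted(candidates, key=lambda record: cosine(query_vec, record[0]), reverse=True)
    return [text for _, _, text in ranked[:k]]


# Toy data: two-dimensional "embeddings" with quarter/year metadata.
records = [
    ([1.0, 0.0], {"quarter": "Q1", "year": "FY24"}, "capex Q1"),
    ([0.9, 0.1], {"quarter": "Q2", "year": "FY24"}, "capex Q2"),
    ([0.0, 1.0], {"quarter": "Q1", "year": "FY24"}, "income Q1"),
]

top = filtered_top_k([1.0, 0.0], records, {"quarter": "Q1", "year": "FY24"}, k=2)
print(top)  # the Q2 record is excluded by the filter even though it is similar to the query
```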

In [5]:
retriver.invoke("what is the capax?")

[Document(page_content='mining is, all our commercial mining activities, be it India or overseas will fall under the \ncommercial mining which fall under our natural resource s region of which my colleague Vinay \nis the CEO . \nModerator:  Thank you. The  next question is from the line of Gaurav Singhal from Aspex Management \nLimited . Please go ahead.  \nGaurav Singhal:  Two questions from me. So, one is, can you help give a break up of the $3.7 billion CAPEX plan \nfor this year  across the different segment s? Thank you.  \nRobbie Singh : Approximately about $300 million for Green Hydrogen, $1.1 billi on for airports, approximately \n$1.7 billion for the road network, just under $100 million  for water , just under $200 million for \nthe Data Center, and then small completion cost for the copper project just under $200 million.  \nGaurav Singhal:  And then secondly in terms of financing the CAPEX, the board had passed resolution to al low \nthe group  to raise about 12,500 crores 

## Creating a prompt template

In [6]:
chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert Q&A system that is trusted around the world.\nAlways answer the query using the provided context information, and not prior knowledge.\nSome rules to follow:\n1. Never directly reference the given context in your answer.\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines."),
        ("human", "Context information is below.\n---------------------\n{context}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query}\nAnswer: "),
    ]
)

In [7]:
print(chat_template.format(context="THIS IS A SAMPLE CONTEXT TO SEE HOW THE PROMPT LOOKS", query="THIS IS A SAMPLE QUERY?"))

System: You are an expert Q&A system that is trusted around the world.
Always answer the query using the provided context information, and not prior knowledge.
Some rules to follow:
1. Never directly reference the given context in your answer.
2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines.
Human: Context information is below.
---------------------
THIS IS A SAMPLE CONTEXT TO SEE HOW THE PROMPT LOOKS
---------------------
Given the context information and not prior knowledge, answer the query.
Query: THIS IS A SAMPLE QUERY?
Answer: 


## Creating the RAG chain

In [8]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | chat_template
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriver, "query": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
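
The data flow through `rag_chain_with_source` can be hard to read from the pipe syntax alone. The sketch below mimics it with ordinary functions and dictionaries: the parallel step builds `{"context": ..., "query": ...}`, the inner chain sees the context joined into one string, and `.assign(answer=...)` adds an `answer` key while keeping the original document list. It is an approximation for illustration only. `fake_retriever`, `fake_answer`, `join_texts`, and `rag_with_source` are hypothetical stand-ins and do not call LangChain, Pinecone, or OpenAI.

```python
def join_texts(texts):
    """Stand-in for format_docs: join retrieved chunks with blank lines."""
    return "\n\n".join(texts)


def fake_retriever(query):
    """Stand-in for the retriever: always returns the same two chunks."""
    return ["Total income was 100.", "CAPEX was 50."]


def fake_answer(inputs):
    """Stand-in for `prompt | llm | StrOutputParser()`; receives the formatted context."""
    return f"Answer to {inputs['query']!r} from {len(inputs['context'])} characters of context"


def rag_with_source(query):
    # Step 1 (RunnableParallel): run the retriever and pass the query through unchanged.
    state = {"context": fake_retriever(query), "query": query}
    # Step 2 (inner RunnablePassthrough.assign): only the inner chain sees the joined string.
    inner_inputs = {**state, "context": join_texts(state["context"])}
    # Step 3 (outer .assign): add the answer next to the untouched context and query.
    return {**state, "answer": fake_answer(inner_inputs)}


result = rag_with_source("What was the income?")
print(sorted(result))
```

Because the retrieved chunks stay under `context` as a list, the same response can be used to show sources alongside the answer.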

In [9]:
response = rag_chain_with_source.invoke("What was the income?")
print(response['answer'])

The consolidated total income was at Rs. 25,810 crores.


## Prompt Versioning

Loading a Specific Version of a Prompt:

1. **Version Tracking in Repositories:**
   - Each push to a prompt repository saves a new version, identified by a unique commit hash.

2. **Loading the Latest Version:**
   - By default, accessing the repo will load the most recent version of a given prompt.

3. **Loading a Specific Version:**
   - To load a specific version, include its commit hash with the prompt name.
   - Example: to load the `earning-call-rag` prompt at version `6214c98a`, append the hash to the prompt name after a colon, as in `earning-call-rag:6214c98a`.

In [10]:
from langchain import hub

# A prompt can be created locally and pushed to the hub, or created directly on the hub and then pulled.
prompt = hub.pull("lovepreet/earning-call-rag:6214c98a")

In [11]:
print(prompt.format(context="THIS IS A SAMPLE CONTEXT TO SEE HOW THE PROMPT LOOKS", query="THIS IS A SAMPLE QUERY?"))

System: You are an expert Q&A system that is trusted around the world.
Always answer the query using the provided context information, and not prior knowledge.
Some rules to follow:
1. Never directly reference the given context in your answer.
2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines.
Human: Context information is below.
---------------------
THIS IS A SAMPLE CONTEXT TO SEE HOW THE PROMPT LOOKS
---------------------
Given the context information and not prior knowledge, answer the query.
Query: THIS IS A SAMPLE QUERY?
Answer: 


In [12]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt # placing the newly loaded prompt here
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriver, "query": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

In [13]:
response = rag_chain_with_source.invoke("What was the income?")
print(response['answer'])

The consolidated total income was at Rs. 25,810 crores.


### How to Share Prompts on LangChain Hub

1. **Getting Started:**
   - Pulling prompts from LangChain Hub is straightforward, and so is pushing your own, which lets you share and manage them in one place.

2. **Making a Prompt:**

   - First, create a prompt that fits what you need.

   - Make sure it follows the rules of LangChain Hub.

3. **Sharing Process:**

   - A shared prompt is identified by two parts:
     - **Account Handle:** Your unique handle on LangChain Hub, like `me-langchain-user`.
     - **Prompt Name:** A clear name for your prompt that describes what it does.

4. **How to Share with Code:**
   - This is a simple way to share:
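     ```python
     from langchain import hub
     from langchain_core.prompts import ChatPromptTemplate

     # Placeholders: replace with your own handle and prompt name
     account_handle = "me-langchain-user"
     prompt_name = "my-prompt"

     # Define your prompt as a LangChain prompt object
     my_prompt = ChatPromptTemplate.from_messages(
         [("human", "...")]  # Your prompt content goes here
     )

     # Share it on LangChain Hub
     hub.push(f"{account_handle}/{prompt_name}", my_prompt)
     ```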
     - Replace `account_handle` with your LangChain Hub handle and `prompt_name` with your prompt's name.
     
     - Make sure `my_prompt` is a LangChain prompt object, such as a `PromptTemplate` or `ChatPromptTemplate`.

5. **Using Your Shared Prompt:**
   - Once it's shared, you can use it in different apps through LangChain Hub.
   - This makes it easy to share with others, work together, and manage your prompts.

In [14]:
len(prompt.messages), prompt.messages

(2,
 [SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template="You are an expert Q&A system that is trusted around the world.\nAlways answer the query using the provided context information, and not prior knowledge.\nSome rules to follow:\n1. Never directly reference the given context in your answer.\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines.")),
  HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'query'], template='Context information is below.\n---------------------\n{context}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query}\nAnswer: '))])

In [15]:
prompt.messages[1]

HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'query'], template='Context information is below.\n---------------------\n{context}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query}\nAnswer: '))

In [16]:
prompt.messages[1].prompt

PromptTemplate(input_variables=['context', 'query'], template='Context information is below.\n---------------------\n{context}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query}\nAnswer: ')

In [17]:
print(prompt.messages[1].prompt.template)

Context information is below.
---------------------
{context}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query}
Answer: 


In [18]:
prompt.messages[1].prompt.template = 'Context information is below.\n---------------------\n{context}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nUse bullet points whenever possible in the answer.\nQuery: {query}\nAnswer: '

In [19]:
print(prompt.messages[1].prompt.template)

Context information is below.
---------------------
{context}
---------------------
Given the context information and not prior knowledge, answer the query.
Use bullet points whenever possible in the answer.
Query: {query}
Answer: 


The cell above adds the following line to the prompt: `Use bullet points whenever possible in the answer.`
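
Retyping the whole template to add one line makes it easy to introduce an accidental change elsewhere in the string. As an alternative, the new instruction can be spliced in just before the `Query: {query}` line. The sketch below works on a plain string copy of the template printed earlier; the helper `insert_before_query` is a hypothetical name, and the approach assumes the marker line appears in the template.

```python
# Plain-string copy of the human message template shown earlier in this notebook.
ORIGINAL_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query}\n"
    "Answer: "
)


def insert_before_query(template, instruction, marker="Query: {query}"):
    """Return a copy of template with instruction on its own line before the marker."""
    if marker not in template:
        raise ValueError(f"marker {marker!r} not found in template")
    # Replace only the first occurrence so the rest of the template is untouched.
    return template.replace(marker, f"{instruction}\n{marker}", 1)


updated_template = insert_before_query(
    ORIGINAL_TEMPLATE, "Use bullet points whenever possible in the answer."
)
print(updated_template)
```

The returned string can then be assigned to `prompt.messages[1].prompt.template` in the same way as in the cell above.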

In [51]:
handle = os.getenv("LANGSMITH_USER_HANDLE")  # your LangChain Hub handle; add this variable to your .env file

prompt_url = hub.push(f"{handle}/earning-call-rag", prompt)

In [52]:
prompt_url

'https://smith.langchain.com/hub/lovepreet/earning-call-rag/359db5ba'
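
The last path segment of the returned URL is the commit hash of the version that was just pushed, which is what the versioned `hub.pull` call earlier expects after the colon. The sketch below converts such a URL into a `handle/name:hash` reference. It assumes the `/hub/<handle>/<name>/<hash>` layout seen in the single URL printed above, which is an observation rather than a documented contract, and `hub_url_to_ref` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def hub_url_to_ref(url):
    """Turn a hub URL of the form .../hub/<handle>/<name>/<hash> into '<handle>/<name>:<hash>'."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "hub":
        raise ValueError(f"unexpected hub URL layout: {url!r}")
    _, handle, name, commit = parts
    return f"{handle}/{name}:{commit}"


ref = hub_url_to_ref("https://smith.langchain.com/hub/lovepreet/earning-call-rag/359db5ba")
print(ref)  # lovepreet/earning-call-rag:359db5ba
```

Passing the resulting reference to `hub.pull` pins the chain to this exact prompt version instead of whatever is latest.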