# RAG with LCEL
* How to build a RAG pipeline quickly and concisely with LCEL (LangChain Expression Language).

## Setup

#### After you download the code from the GitHub repository to your computer
In terminal:
* cd project_name
* pyenv local 3.11.4
* poetry install
* poetry shell

#### To open the notebook with JupyterLab
In terminal:
* jupyter lab

Go to the notebooks folder and open the notebook for this lesson.

#### To see the code in Visual Studio Code or your editor of choice
* Open Visual Studio Code or your editor of choice.
* Open the project folder.
* Open the 007-rag-with-lcel.py file.

## Create your .env file
* The GitHub repo includes a file named .env.example.
* Rename that file to .env. This is where you will add your confidential API keys. Remember to include:
* OPENAI_API_KEY=your_openai_api_key
* LANGCHAIN_TRACING_V2=true
* LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
* LANGCHAIN_API_KEY=your_langchain_api_key
* LANGCHAIN_PROJECT=your_project_name

We will call our LangSmith project **007-rag-with-lcel**.

## Track operations
From now on, we can track the operations **and the cost** of this project from LangSmith:
* [smith.langchain.com](https://smith.langchain.com)

## Connect with the .env file located in the same directory as this notebook

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [None]:
#pip install python-dotenv

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
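
Reading `os.environ["OPENAI_API_KEY"]` raises a bare `KeyError` when the key is missing, so it can help to check all expected variables up front. The following is a minimal sketch, assuming the variable names listed in the `.env` section; `missing_env_vars` is a hypothetical helper, not part of python-dotenv.

```python
import os

# Variable names taken from the .env section of this lesson.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# A plain dict stands in for os.environ so the example runs anywhere.
example_env = {"OPENAI_API_KEY": "sk-placeholder", "LANGCHAIN_TRACING_V2": "true"}
print(missing_env_vars(REQUIRED_VARS, example_env))
```

In the notebook itself you would call `missing_env_vars(REQUIRED_VARS)` right after `load_dotenv` and stop early if the returned list is not empty.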

#### Install LangChain

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [3]:
#!pip install langchain

## Connect with an LLM

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [4]:
#!pip install langchain-openai

* NOTE: We use OpenAI by default because its models are, at the time of writing, among the strongest LLMs available. A later lesson shows how to connect with open-source LLMs such as Llama3 or Mistral.

## Vector databases (aka vector stores): store and search embeddings
* See the documentation page [here](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/).
* See the list of vector stores [here](https://python.langchain.com/v0.1/docs/integrations/vectorstores/).

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [None]:
#!pip install langchain-chroma

In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
loaded_document = TextLoader('./data/state_of_the_union.txt').load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

chunks_of_text = text_splitter.split_documents(loaded_document)

vector_db = Chroma.from_documents(chunks_of_text, OpenAIEmbeddings())
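
To make the `chunk_size=1000` setting more concrete, here is a simplified sketch of character-based splitting: the text is cut at blank lines and the pieces are greedily merged until the next piece would exceed the size limit. This is an illustration under stated assumptions (blank-line separator, no overlap), not LangChain's actual implementation; `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most
    chunk_size characters. A single piece longer than chunk_size is kept
    whole as its own chunk."""
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            # The next piece does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in split_into_chunks(sample, chunk_size=40):
    print(repr(chunk))
```

With a limit of 40 characters, the first two paragraphs fit into one chunk and the third starts a new one. Each chunk is what gets embedded and stored in the vector database.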

In [4]:
question = "What did the president say about the John Lewis Voting Rights Act?"

response = vector_db.similarity_search(question)

print(response[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
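
Conceptually, `similarity_search` embeds the question and returns the chunks whose vectors lie closest to it. The following toy sketch shows that ranking with made-up 3-dimensional vectors and cosine similarity as an assumed metric (the metric a real vector store uses depends on its configuration). The names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical and are not part of Chroma's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=4):
    """Return the k chunk texts whose vectors are most similar to query_vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up "embeddings"; real ones have hundreds or thousands of dimensions.
toy_index = [
    ("voting rights", [0.9, 0.1, 0.0]),
    ("supreme court", [0.1, 0.9, 0.1]),
    ("economy", [0.0, 0.2, 0.9]),
]
print(top_k([0.8, 0.3, 0.0], toy_index, k=2))
```

The query vector points mostly in the same direction as the "voting rights" vector, so that chunk ranks first.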


## Retriever: returns a response given a question
* A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store.
* A retriever does not need to be able to store documents, only to return (or retrieve) them.
* Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
* See the documentation page [here](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/).
* See the list of third-party retrievers [here](https://python.langchain.com/v0.1/docs/integrations/retrievers/).
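
To illustrate that a retriever only has to return documents for a query, here is a toy retriever that uses plain keyword overlap and no vector store at all. It is a sketch, not a LangChain class: `Retriever`, `KeywordRetriever`, and the sample texts are hypothetical, and only the method name `invoke` mirrors the call used in this lesson.

```python
from typing import List, Protocol


class Retriever(Protocol):
    """Anything with an invoke(query) method that returns texts."""

    def invoke(self, query: str) -> List[str]: ...


class KeywordRetriever:
    """Toy retriever: ranks texts by how many query words they contain."""

    def __init__(self, texts: List[str], k: int = 2):
        self.texts = texts
        self.k = k

    def invoke(self, query: str) -> List[str]:
        words = set(query.lower().split())
        scored = [(len(words & set(t.lower().split())), t) for t in self.texts]
        hits = [(score, t) for score, t in scored if score > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in hits[: self.k]]


sample_texts = [
    "the senate should pass the voting rights act",
    "judge jackson was nominated to the supreme court",
    "inflation is a concern for families",
]
toy_retriever: Retriever = KeywordRetriever(sample_texts, k=1)
print(toy_retriever.invoke("who was nominated to the court"))
```

Any object with this shape could take the retriever's place in a pipeline, which is why the retriever interface is more general than a vector store.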

#### Vector store as retriever

In [2]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("./data/state_of_the_union.txt")

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [None]:
#!pip install faiss-cpu

In [3]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loaded_document = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

chunks_of_text = text_splitter.split_documents(loaded_document)

embeddings = OpenAIEmbeddings()

vector_db = FAISS.from_documents(chunks_of_text, embeddings)

In [4]:
retriever = vector_db.as_retriever()

#### Simple use without LCEL

In [93]:
response = retriever.invoke("what did he say about ketanji brown jackson?")

In [94]:
len(response)

4

In [95]:
response[0]

Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': './data/state_of_the_union.txt'})

#### Specifying top k

In [6]:
retriever = vector_db.as_retriever(search_kwargs={"k": 3})

In [7]:
response = retriever.invoke("what did he say about ketanji brown jackson?")

In [18]:
len(response)

3

In [8]:
response

[Document(id='12b5ba4c-0b46-40e9-b0bc-9f8f8ba830d6', metadata={'source': './data/state_of_the_union.txt'}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.'),
 Document(id='1235ae17-872f-44a6-be85-6b00be9bfbb6', metadata={'source': './data/s

#### Simple use with LCEL and input and output formatters

In [9]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI(model="gpt-4o-mini")

def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

question = "what did he say about ketanji brown jackson?"

response = chain.invoke(question)

In [10]:
print(question)

what did he say about ketanji brown jackson?

Here’s what each part of the chain does, in simple terms:

1. **Retrieving Relevant Documents**: First, the system retrieves documents that are relevant to the question you ask. It uses a retriever which is designed to find the most relevant information based on the query.

2. **Formatting Documents**: Once the relevant documents are retrieved, the next step is to combine them into a single readable string. The function `format_docs` does this by taking the `page_content` of each document and joining the pieces with two newline characters, which leaves a blank line between them. The result is a clear, organized block of text that serves as the context for answering the question.

3. **Preparing the Prompt**: With the context formatted, the system uses a template to prepare the prompt for the AI model. The template structures the input by placing the formatted context first and then the question. This way, the AI model knows exactly what the background information is and what it needs to answer.

4. **Generating the Answer**: The structured prompt is then fed into an AI model, in this case, ChatOpenAI. The AI reads the combined context and question and generates an answer based on what it knows from the provided text.

5. **Extracting the Final Answer**: Finally, the response from the AI is parsed into a straightforward text format using `StrOutputParser`, making it easy to read and understand. This step ensures that what you get as the output is just the answer, cleaned up from any formatting or raw data that the AI might output.

In essence, this code automates the process of fetching relevant information, preparing it, and asking an AI to provide a clear answer based on that information. It's like setting up a mini question-answering system where the AI has all the necessary context to provide accurate responses.
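
The five steps can be imitated with plain Python to show how the `|` operator passes each step's output to the next one. This is a toy sketch, not LangChain's implementation: `Step`, `parallel`, the sample documents, and the fake model are all hypothetical stand-ins, and no LLM is called.

```python
class Step:
    """Wraps a function so steps can be chained with `|`, loosely mimicking LCEL."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # The output of this step becomes the input of the next one.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Stand-ins for the real components (all made up for illustration).
toy_docs = ["Doc about Justice Breyer.", "Doc about Judge Jackson."]
toy_retrieve = Step(lambda question: toy_docs)        # 1. retriever
toy_format = Step(lambda docs: "\n\n".join(docs))     # 2. format_docs
toy_passthrough = Step(lambda value: value)           # passes the question through unchanged
toy_prompt = Step(                                    # 3. prompt template
    lambda d: "Answer the question based only on the following context:\n\n"
    f"{d['context']}\n\nQuestion: {d['question']}\n"
)
toy_model = Step(lambda text: {"content": f"[model saw {len(text)} characters]"})  # 4. model
toy_parse = Step(lambda message: message["content"])  # 5. output parser

toy_chain = (
    parallel(context=toy_retrieve | toy_format, question=toy_passthrough)
    | toy_prompt
    | toy_model
    | toy_parse
)
print(toy_chain.invoke("what did he say about ketanji brown jackson?"))
```

The dictionary at the start of the real chain plays the role of `parallel` here: both branches receive the same question, and their results fill the `{context}` and `{question}` slots of the prompt.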

In [11]:
response

'He said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson four days ago, describing her as one of the nation’s top legal minds who will continue Justice Stephen Breyer’s legacy of excellence. He highlighted her experience as a former top litigator in private practice, a former federal public defender, and noted her background from a family of public school educators and police officers. He also praised her as a consensus builder who has received broad support from various organizations, including the Fraternal Order of Police and former judges from both parties.'

## How to execute the code from Visual Studio Code
* In Visual Studio Code, see the file 007-rag-with-lcel.py
* In terminal, make sure you are in the directory of the file and run:
    * python 007-rag-with-lcel.py