# Advanced RAG with LangChain and Rerankers

In this notebook we will enhance the previous "naive RAG" using a **re-ranking model**.

## Environment setup
Before executing the following cells, make sure to set the following environment variables in the `.env` file or export them:
* `AZURE_OPENAI_KEY`
* `AZURE_OPENAI_ENDPOINT`
* `MODEL_DEPLOYMENT_NAME`
* `EMBEDDING_DEPLOYMENT_NAME`
* `COHERE_API_KEY` (create one [here](https://dashboard.cohere.com/api-keys))

<br/>
<img src="../assets/keys_endpoint.png" width="800"/>
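
A quick way to catch a missing key early is to check the variables listed above before any client is created. The following is a minimal sketch, not part of the original notebook: `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here, and the check should be run after the `.env` file has been loaded.

```python
import os
from typing import Iterable, List, Mapping

# Names taken from the list above; the helper itself is illustrative.
REQUIRED_VARS = [
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "MODEL_DEPLOYMENT_NAME",
    "EMBEDDING_DEPLOYMENT_NAME",
    "COHERE_API_KEY",
]


def missing_env_vars(
    names: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing `env` explicitly makes the helper easy to test without touching the real environment.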

In [2]:
!pip install -q langchain==0.1.0 cohere python-dotenv langchain_openai




In [3]:
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv()) # read local .env file

In [4]:
import os

from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader, DirectoryLoader
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter

## Reranking
A reranker is a supervised model that, given a query and a document, outputs a similarity score. It is trained on a dataset of (query, tag) pairs created via self-supervision. We use this score to reorder the documents by relevance to our query, which gives us a two-stage retrieval system: a first-stage model (an embedding model/retriever) retrieves a set of candidate documents from a larger dataset, and a second-stage model (the reranker) then reorders the candidates returned by the first stage.

<br/>
<img src="../assets/reranking.png" width="800"/>
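
The two-stage flow described above can be sketched in plain Python. This is a toy illustration only: `two_stage_retrieve`, `word_overlap`, and `overlap_ratio` are hypothetical names, and the two scoring functions are simple word-overlap stand-ins for an embedding retriever and a reranking model, not the models used in this notebook.

```python
from typing import Callable, List, Tuple


def two_stage_retrieve(
    query: str,
    corpus: List[str],
    first_stage_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    k: int = 25,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Retrieve k candidates with a cheap scorer, then rerank and keep top_n."""
    # Stage 1: score the whole corpus cheaply and keep the k best candidates.
    candidates = sorted(
        corpus, key=lambda doc: first_stage_score(query, doc), reverse=True
    )[:k]
    # Stage 2: apply the (usually more expensive) scorer to the candidates only.
    scored = [(doc, rerank_score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def word_overlap(query: str, doc: str) -> float:
    """Toy first stage: number of words shared by query and document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    """Toy second stage: shared words divided by distinct words in the document."""
    doc_words = set(doc.lower().split())
    if not doc_words:
        return 0.0
    return len(set(query.lower().split()) & doc_words) / len(doc_words)
```

The point of the split is cost: the second scorer only ever sees `k` documents, so it can afford to be slower and more accurate than the first.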


In [5]:
openai_api_version = "2023-08-01-preview"

embedder = AzureOpenAIEmbeddings(
    deployment=os.getenv('EMBEDDING_DEPLOYMENT_NAME'),
    openai_api_version=openai_api_version,
)

For this notebook, we've scraped LangChain's blog posts (see `data/`).

In [6]:
text_loader_kwargs={'autodetect_encoding': True}

loader = DirectoryLoader(
    '../data/langchain_blog_posts/',
    glob="**/*.txt",
    loader_cls=TextLoader,
    loader_kwargs=text_loader_kwargs,
    #silent_errors=True
)

documents = loader.load()

# split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
)
texts = text_splitter.split_documents(documents)

len(texts)

429
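
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to cut on separators such as paragraph and line breaks first); `sliding_chunks` is a hypothetical helper that only illustrates how consecutive chunks share characters.

```python
from typing import List


def sliding_chunks(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap
    by `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the price of storing some text twice.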

In [7]:
persist_directory = '../data/embeddings/'

if not os.path.exists(persist_directory):
    vectordb = Chroma.from_documents(
        documents=texts,
        embedding=embedder,
        persist_directory=persist_directory
    )
    vectordb.persist()
else:
    # load from disk
    vectordb = Chroma(
        embedding_function=embedder,
        persist_directory=persist_directory
    )

In [8]:
query = "What is Langsmith?"

Using the approach described in the previous notebook, we retrieve the 25 chunks most similar to the query.

In [9]:
retriever = vectordb.as_retriever(search_kwargs={"k": 25})
docs = retriever.get_relevant_documents(query)

len(docs)

25

In [11]:
from langchain.retrievers.document_compressors import CohereRerank

cohere_client = CohereRerank(
    model="rerank-english-v2.0",
    top_n=5,
) # API key is read from the COHERE_API_KEY environment variable

In [12]:
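def compare(query: str):
    """Print how the Cohere reranker reorders the retriever's results for `query`."""
    docs = retriever.get_relevant_documents(query)
    docs_content = [doc.page_content for doc in docs]
    # map retriever position <-> chunk text (assumes chunk texts are unique)
    id2doc = {idx : doc for idx, doc in enumerate(docs_content)}
    doc2id = {doc : idx for idx, doc in enumerate(docs_content)}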

    rerank_docs = cohere_client.compress_documents(
        query=query,
        documents=docs
    )
    original_docs = []
    reranked_docs = []
    # for each new rank i, print the position the document had in the retriever output
    for i, doc in enumerate(rerank_docs):
        rerank_i = doc2id[doc.page_content]
        print(f"{i}\t->\t{rerank_i}")
        if i != rerank_i:
            reranked_docs.append(f"[{rerank_i}]\n{doc.page_content}")
            original_docs.append(f"[{i}]\n{id2doc[i]}")

    for orig, rerank in zip(original_docs, reranked_docs):
        print(f"ORIGINAL:\n{orig}\n\nRE-RANKED:\n{rerank}\n\n---\n")

In [13]:
query = "What is Langsmith?"
compare(query)

0	->	0
1	->	1
2	->	14
3	->	2
4	->	22
ORIGINAL:
[2]
"LangChain's (the company's) goal is to make it as easy as possible to develop LLM applications"

said Harrison Chase, co-founder and CEO of LangChain.

"To that end, we realized pretty early that what was needed - and missing - wasn't just an open source tool like LangChain, but also a complementary platform for managing these new types of applications. To that end, we built LangSmith - which is usable with or without LangChain and let's users easily debug, monitor, test, evaluate, and now (with the recently launched Hub) share and collaborate on their LLM applications.”



What Are LangSmith Traces?

RE-RANKED:
[14]
As a very simple example, we considered it to be table stakes for LangSmith to help users easily create datasets from existing logs and use them immediately for testing and evaluation, seamlessly connecting the logging/debugging workflows to the testing/evaluation ones.



Fintual, a Latin American startup with big dreams

In [14]:
compare("Explain how to connect langchain to mysql")

0	->	2
1	->	11
2	->	1
3	->	0
4	->	12
ORIGINAL:
[0]
The LangChain library has multiple SQL chains and even an SQL agent aimed at making interacting with data stored in SQL as easy as possible. Here are some relevant links:

Introduction

Most of an enterprise’s data is traditionally stored in SQL databases. With the amount of valuable data stored there, business intelligence (BI) tools that make it easy to query and understand the data present there have risen in popularity. But what if you could just interact with a SQL database in natural language? With LLMs today, that is possible. LLMs have an understanding of SQL and are able to write it pretty well. However, there are several issues that make this a non-trivial task.

The Problems

So LLMs can write SQL - what more is needed?

Unfortunately, a few things.

RE-RANKED:
[2]
URL: https://blog.langchain.dev/llms-and-sql/
Title: LLMs and SQL

Francisco Ingham and Jon Luo are two of the community members leading the change on the SQL int

## Question-answering using Reranking
A comparison of the naive approach with re-ranking.

In [15]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.prompts import PromptTemplate

llm = AzureChatOpenAI(
    deployment_name=os.getenv('MODEL_DEPLOYMENT_NAME'),
    openai_api_version=openai_api_version,
    temperature=0.,
    max_tokens=1024
)

retriever = vectordb.as_retriever(search_kwargs={"k": 3})

template = """Use the following pieces of context to answer the question at the end.
Use three sentences maximum and keep the answer as concise as possible.
Don't try to make up the answer, only use the context to answer the question.
The pieces of context refer to langchain.

Context:
{context}

Question: {question}
Helpful Answer:"""

prompt = PromptTemplate.from_template(template)

qa_chain_base = (
    RunnableParallel(
        {"context": retriever, "question": RunnablePassthrough()}
    )
    | prompt
    | llm
    | StrOutputParser()
)

question = "Explain how to connect langchain to sql. Show me the code to do that"
print(qa_chain_base.invoke(question))

The LangChain library provides tools to interact with SQL databases, including SQL chains and an SQL agent. However, the specific code to connect LangChain to SQL is not mentioned in the given context.


In [16]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

retriever = vectordb.as_retriever(search_kwargs={"k": 30})
compressor = CohereRerank(top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

qa_chain_rerank = (
    RunnableParallel(
        {"context": compression_retriever, "question": RunnablePassthrough()}
    )
    | prompt
    | llm
    | StrOutputParser()
)

print(qa_chain_rerank.invoke(question))

To connect LangChain to SQL, you can use the following code:

```
from langchain.agents import AgentExecutor, create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.llms.openai import OpenAI
from langchain.sql_database import SQLDatabase

def create_agent(db_uri, agent_type=AgentType.OPENAI_FUNCTIONS, verbose=VERBOSE_LANGCHAIN, temperature=0, model="gpt-3.5-turbo-0613"):
    db = SQLDatabase.from_uri(db_uri)
    toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=temperature))
    return create_sql_agent(
        llm=ChatOpenAI(temperature=temperature, model=model),
        toolkit=toolkit,
        verbose=verbose,
        agent_type=agent_type,
    )
```

This code initializes the LangChain Agent and connects it to your SQL database. You need to provide the `db_uri` parameter with the URI of your SQL database.


## Extra #1: Open-source Rerankers

Cohere's reranker is a proprietary model: you need an access token to use it and a paid subscription for production applications. However, there are open-source alternatives that you can use for free.


In [17]:
!pip install -q flashrank




In [23]:
from langchain_core.documents import Document
from typing import List
from flashrank import Ranker, RerankRequest
from functools import lru_cache

@lru_cache
def get_ranker(model_name: str = "ms-marco-MultiBERT-L-12"):
    return Ranker(model_name=model_name)

def rerank_with_multiBERT(
        query: str,
        docs: List[Document],
        top_n: int = 5,
):
    reranker = get_ranker()
    passages = [{"id": i, "text": doc.page_content} for i, doc in enumerate(docs)]

    rerank_request = RerankRequest(
        query=query,
        passages=passages
    )
    rerank_response = reranker.rerank(rerank_request)
    return rerank_response[:top_n]


query = "What is Langsmith?"
docs = retriever.get_relevant_documents(query)

rerank_docs = rerank_with_multiBERT(query, docs)

In [24]:
rerank_docs

[{'id': 27,
  'text': "What Are LangSmith Traces?\n\nTraces in the world of LangSmith are analogous to logs when programming; they allow us to easily see what text came in and out of chains and LLMs. Think of them as detailed breadcrumbs illuminating the AI's journey. Each trace, like a footprint on a sandy beach, represents a pivotal AI decision. Traces don't merely depict the path taken; they shed light on the underlying thought process and actions taken at each juncture.\n\nHere’s what one of our traces looks like inside LangSmith:\n\nAll the individual traces are consolidated into datasets:\n\n\n\nDo You Really Need To Use LangSmith?\n\nWhen generative AI works, it feels like watching a viral “satisfying video” montage - so delightful. But when it doesn’t, it sucks, and sometimes it sucks real bad.",
  'score': 0.9682064},
 {'id': 0,
  'text': 'Since the launch of HelpHub, we were trying to do things on hard mode when it came to iterating and improving functionality. That is, of co
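
The output above is a list of dictionaries, not `Document` objects, so the original metadata is lost. Since each entry's `id` is the position of the passage in the list passed to the reranker, the documents can be recovered by index. The helper below is a small sketch under that assumption; `select_reranked` is a hypothetical name, not part of FlashRank or LangChain.

```python
from typing import Any, Dict, List, Sequence


def select_reranked(
    docs: Sequence[Any], rerank_response: List[Dict[str, Any]]
) -> List[Any]:
    """Return the original documents in reranked order.

    Assumes every item in `rerank_response` has an integer 'id' that is the
    document's index in `docs`, as set when the passages were built.
    """
    return [docs[item["id"]] for item in rerank_response]
```

With the variables from the cell above, `select_reranked(docs, rerank_docs)` would give back the top documents with their metadata intact.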

## Extra #2: Rerank with an LLM
We can also use an LLM to rerank the documents by asking it to predict how relevant each document is to the question. This approach is likely to be less effective than using a model designed for this task, but it is interesting to see how it works.

In [21]:
from langchain.output_parsers.openai_functions import PydanticOutputFunctionsParser
from langchain_core.utils.function_calling import convert_pydantic_to_openai_function
from pydantic import Field
from pydantic import BaseModel

map_prompt_str = """

Context of the task : Technical documents related to langchain.
Copy the parts of the following document that are related to the statement: "{question}".

==== Document : ====
{context}
==== End of Document ====
"""

def get_rerank_chain(
        llm,
        doc_acceptance_threshold: float = 0.5,
        max_rerank_docs: int = 4
):
    map_prompt = PromptTemplate(template=map_prompt_str, input_variables=["context", "question"])

    class AnalysisAndContextEvaluation(BaseModel):
        """Return the answer to the question and a relevance score."""

        analysis: str = Field(description="How can the context help to answer the question?")
        evaluation: str = Field(
            description="One word, chosen among these adjectives: perfect, high, good, ok, irrelevant."
        )
        score: float = Field(
            description="A 0.0-1.0 relevance score, where 1.0 indicates the provided context answers the question completely and 0.0 indicates the provided context does not answer the question at all."
        )

    function = convert_pydantic_to_openai_function(AnalysisAndContextEvaluation)
    chain_rerank_doc = (
        map_prompt
        | llm.bind(
            temperature=0, functions=[function], function_call={"name": "AnalysisAndContextEvaluation"}
        )
        | PydanticOutputFunctionsParser(pydantic_schema=AnalysisAndContextEvaluation)
    ).with_config(run_name="Rerank")

    # Here we define the mapping chain (chain_map_docs). It takes as input
    # a question ('question') and the list of documents ('documents').

    def chain_get_context_and_questions(__input):
        question = __input["question"]
        return [{"question": question, "context": d.page_content} for d in __input["documents"]]

    chain_docs_and_ranks = RunnableParallel(
        documents=lambda x: x["documents"],
        ranks=chain_get_context_and_questions | (chain_rerank_doc | (lambda x: x.score)).map(),
    )

    def keep_best_docs(__input):
        float_ranks = __input["ranks"]
        docs_sorted = [
            x[0]
            for x in sorted(zip(__input["documents"], float_ranks), key=lambda x: x[1], reverse=True)
            if x[1] >= doc_acceptance_threshold
        ]
        return docs_sorted[0:max_rerank_docs]

    # the mapper chain
    chain_rerank_docs = (chain_docs_and_ranks | keep_best_docs).with_config(
        run_name="chain_map_docs: Rerank docs"
    )

    return chain_rerank_docs


rerank_chain_llm = get_rerank_chain(llm, max_rerank_docs=5)
qa_chain_llm = (
        RunnableParallel(question=RunnablePassthrough(), documents=retriever)
        | rerank_chain_llm
)

In [None]:
ranked_docs_llm = qa_chain_llm.invoke(question)