
### Simple Extensions to RAG Application

In this notebook, we'll build on [Cohere's](https://txt.cohere.com/) [impressive](https://txt.cohere.com/introducing-embed-v3/) [work](https://txt.cohere.com/using-llms-for-search/) to augment our simple RAG application by adding:

- their v3 embeddings model
- a reranking system

We'll also extend our simple examples to work over the entire blog corpus of [Coding Temple](https://www.codingtemple.com/blog/coding-in-public-help-battle-imposter-syndrome-and-inspire-others/).

## Dependencies

We'll grab some dependencies for our RAG application, then our web scraping system!

In [1]:
!pip install langchain openai cohere tiktoken -qU


You'll need to provide an OpenAI API key as well as a Cohere API key!

The Cohere trial API key will be more than enough for this notebook!

In [2]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Open AI API Key: ")

Open AI API Key: ··········


In [3]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key: ")

Cohere API Key: ··········


In [4]:
!pip install nest_asyncio selenium unstructured -qU



### Boilerplate

We need to use `nest_asyncio` to avoid any issues with Jupyter and async methods.

In [5]:
import nest_asyncio

nest_asyncio.apply()

## Web Scraping

Here we'll scrape the Coding Temple blog, starting from its post sitemap, to collect every post it currently lists!

In [6]:
from langchain.document_loaders import SeleniumURLLoader

# Load the post sitemap first; its rendered text lists the URL of every blog post.
loaders = SeleniumURLLoader(urls=["https://www.codingtemple.com/post-sitemap.xml"])
data = loaders.load()


In [7]:
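import re

# Pull every URL-shaped token out of the rendered sitemap text.
# The raw string keeps the regex backslashes from being read as string escapes.
urls = re.findall(r'(?:(?:https?|ftp)://)?[\w/\-?=%.]+\.[\w/\-&?=%.]+', data[0].page_content)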
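To see what the pattern above keeps and what it discards, here is a small self-contained sketch run against a made-up sitemap snippet. The `example.com` URLs and the `sample_text` string are illustrative assumptions, not the real Coding Temple sitemap, and filtering on the `.xml` suffix is just one possible way to drop the sitemap entry itself.

```python
import re

# Same pattern as in the notebook cell, written as a raw string.
URL_PATTERN = re.compile(r"(?:(?:https?|ftp)://)?[\w/\-?=%.]+\.[\w/\-&?=%.]+")

# Hypothetical stand-in for data[0].page_content: URLs mixed with dates.
sample_text = (
    "https://example.com/post-sitemap.xml "
    "https://example.com/blog/first-post/ 2024-01-02 "
    "https://example.com/blog/second-post/ 2024-01-03 "
    "https://example.com/blog/first-post/"
)

matches = URL_PATTERN.findall(sample_text)

# Drop duplicates while preserving order, then skip the sitemap entry itself.
unique_urls = list(dict.fromkeys(matches))
post_urls = [url for url in unique_urls if not url.endswith(".xml")]

print(post_urls)
```

The bare dates are skipped because the pattern requires at least one literal dot, which is also why it can pick up other dotted tokens (file names, version numbers) on a messier page.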

In [8]:
loaders = SeleniumURLLoader(urls=urls[1:])
documents = loaders.load()

In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

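text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=750,            # maximum characters per chunk
    chunk_overlap=50,          # characters shared between neighboring chunks
    length_function=len,       # measure chunk length in characters
    is_separator_regex=False,  # treat separators as literal strings
)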

In [10]:
split_documents = text_splitter.split_documents(documents)
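To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sliding-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, whereas the hypothetical `sliding_window_chunks` helper below cuts at fixed character offsets.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 750, chunk_overlap: int = 50) -> List[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Hypothetical 2,000-character document made of repeating digits.
sample_document = "".join(str(i % 10) for i in range(2000))
chunks = sliding_window_chunks(sample_document)

print([len(chunk) for chunk in chunks])
```

The last 50 characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole by at least one chunk.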

## RAG Pipeline

Now that we have our documents, and we've split them - let's create our `VectorStore`!

In [11]:
!pip install faiss-cpu -qU


### Cohere Embeddings

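We'll use Cohere's embeddings, which Cohere reported as leading the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) when its v3 models launched. Leaderboard rankings change over time, so check the current standings before relying on that claim.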

In [12]:
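from langchain.embeddings import CohereEmbeddings

# Name the v3 model explicitly; if `model` is omitted, LangChain falls back to
# its own default, which may be an older embedding model.
embeddings = CohereEmbeddings(
    model="embed-english-v3.0",
    cohere_api_key=os.environ.get("COHERE_API_KEY"),
)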


In [13]:
from langchain.vectorstores import FAISS

vectorstore = FAISS.from_documents(
    split_documents, embedding=embeddings
)
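Conceptually, the vector store holds one embedding per chunk and answers a query by returning the chunks whose embeddings sit closest to the query embedding. The sketch below shows that idea as a brute-force search in plain Python. The two-dimensional vectors and chunk texts are made up for illustration, and using L2 distance is an assumption about how a basic flat index behaves, not a description of FAISS internals.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(
    query_vector: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the text of the k chunks whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vector, item[1]))
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs; real embeddings have many more dimensions.
toy_index = [
    ("imposter syndrome tips", [0.9, 0.1]),
    ("ai training programs", [0.1, 0.9]),
    ("career change advice", [0.6, 0.4]),
]

closest = nearest_chunks([0.8, 0.2], toy_index, k=2)
print(closest)
```

A real index avoids recomputing every distance in Python, but the contract is the same: vectors in, nearest neighbors out.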

### Reranking Pipeline

We're going to fetch a lot of initial documents for reranking.

The basic idea is to cast a wide net (here, the 20 nearest chunks), then rerank that subset by relevance and keep only the `top_n` documents (default of 3) as context to augment the prompt.

In [14]:
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

In [15]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

compressor = CohereRerank(user_agent="coding_temple_demo", cohere_api_key=os.environ.get("COHERE_API_KEY"))
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=base_retriever
)
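The two-stage pattern can be sketched without any external services. In the sketch below, a cheap scorer shortlists `k` candidates and a second scorer reorders the shortlist and keeps `top_n`. Both scorers (`word_overlap` and `jaccard`) are toy stand-ins chosen for illustration; they are not how Cohere's embeddings or reranker actually score text, and the sample documents are made up.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def word_overlap(query: str, doc: str) -> float:
    """Cheap first-stage score: number of shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def jaccard(query: str, doc: str) -> float:
    """Second-stage score: shared words divided by all distinct words."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    k: int = 20,
    top_n: int = 3,
) -> List[str]:
    # Stage 1: cast a wide net and keep the k best candidates.
    shortlist = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    # Stage 2: reorder only the shortlist and keep the top_n results.
    return sorted(shortlist, key=lambda d: reranker(query, d), reverse=True)[:top_n]


sample_docs = [
    "imposter syndrome affects many new developers",
    "how to overcome imposter syndrome as a developer",
    "best pizza recipes for busy weeknights",
    "python tips for new developers",
]

top_docs = retrieve_then_rerank(
    "overcome imposter syndrome", sample_docs, word_overlap, jaccard, k=3, top_n=2
)
print(top_docs)
```

The design point is that the expensive scorer only ever sees `k` documents, so it can afford to be slower and more accurate than the first stage.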


### Setting Up RAG

Now we can build some helper functions to create our simple RAG chain!

We'll use the [LangChain Expression Language (LCEL)](https://python.langchain.com/docs/expression_language/) to build it.

In [16]:
from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

In [17]:
from langchain.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)

In [18]:
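from typing import List, Tuple, Union

from langchain_core.messages import BaseMessage, get_buffer_string

def _format_chat_history(
    chat_history: Union[List[Tuple[str, str]], List[BaseMessage]]
) -> str:
    # Memory created with return_messages=True yields message objects rather
    # than (human, ai) tuples, so format those with LangChain's own helper.
    if chat_history and isinstance(chat_history[0], BaseMessage):
        return get_buffer_string(chat_history)
    buffer = ""
    for dialogue_turn in chat_history:
        human = "Human: " + dialogue_turn[0]
        ai = "Assistant: " + dialogue_turn[1]
        buffer += "\n" + "\n".join([human, ai])
    return buffer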

In [19]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)

In [20]:
from langchain.schema import format_document

DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")

def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)

In [21]:
from operator import itemgetter
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda
from langchain.schema.output_parser import StrOutputParser
from langchain.chat_models import ChatOpenAI


loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)


standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: _format_chat_history(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | ChatOpenAI(temperature=0)
    | StrOutputParser(),
}


retrieved_documents = {
    "docs": itemgetter("standalone_question") | compression_retriever,
    "question": lambda x: x["standalone_question"],
}


final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}


answer = {
    "answer": final_inputs | ANSWER_PROMPT | ChatOpenAI(),
    "docs": itemgetter("docs"),
}


final_chain = loaded_memory | standalone_question | retrieved_documents | answer
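Each dict-shaped stage in the chain runs all of its values against the same input and passes the resulting dict to the next stage. The plain-Python sketch below mimics that data flow so the shape of each intermediate result is easier to follow. The `run_stage` and `run_pipeline` helpers and the stand-in lambdas are hypothetical; they imitate the behavior described here and are not LangChain's implementation.

```python
from typing import Any, Callable, Dict, List

Stage = Dict[str, Callable[[Any], Any]]


def run_stage(stage: Stage, value: Any) -> Dict[str, Any]:
    """Apply every callable in a stage to the same input and collect the results."""
    return {key: fn(value) for key, fn in stage.items()}


def run_pipeline(stages: List[Stage], value: Any) -> Any:
    """Feed the output dict of each stage into the next one."""
    for stage in stages:
        value = run_stage(stage, value)
    return value


# Stand-ins for the LLM and retriever calls, so the sketch runs offline.
standalone_question_stage: Stage = {
    "standalone_question": lambda x: x["question"].strip(),
}
retrieved_documents_stage: Stage = {
    "docs": lambda x: ["chunk about " + x["standalone_question"], "another chunk"],
    "question": lambda x: x["standalone_question"],
}
answer_stage: Stage = {
    "answer": lambda x: f"{len(x['docs'])} docs used for: {x['question']}",
    "docs": lambda x: x["docs"],
}

sketch_result = run_pipeline(
    [standalone_question_stage, retrieved_documents_stage, answer_stage],
    {"question": "  What is RAG?  ", "chat_history": []},
)
print(sketch_result["answer"])
```

Note how a key survives to a later stage only if some stage explicitly re-emits it, which is why `"docs"` is passed through in the final step.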


In [22]:
inputs = {"question" : "What are some ways to avoid imposter syndrome?"}

In [23]:
result = final_chain.invoke(inputs)

In [24]:
result

{'answer': AIMessage(content='Some strategies to overcome imposter syndrome include acknowledging your achievements, seeking support from communities like DEV Community and Stack Overflow, embracing continuous learning, and showcasing your programming skills publicly.', response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 369, 'total_tokens': 403}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'stop', 'logprobs': None}, id='run-05841ed7-e866-4594-816e-2b4c5607561a-0'),
 'docs': [Document(page_content='Overcoming Imposter Syndrome\n\nEncourage someone experiencing imposter syndrome by validating their feelings while emphasizing their achievements. Offer constructive feedback and remind them of past successes. Encourage them to join programming communities like DEV Community for support, networking opportunities, and skill development.\n\nThe secret to overcoming imposter syndrome lies in self-awareness, acceptance of imper

In [25]:
inputs = {"question" : "What does an AI business leader need to know about building internal training programs?"}

In [26]:
result = final_chain.invoke(inputs)

In [27]:
result

{'answer': AIMessage(content='An AI business leader needs to know that building internal training programs is essential for ensuring long-term success and competitiveness in the age of AI. They should prioritize investing in reskilling their workforce and embracing the opportunities and challenges that generative AI brings. Additionally, utilizing generative AI to create personalized training programs tailored to the specific needs and skill sets of each worker can help improve performance and job satisfaction. It is important to assess current employee skills, predict future skill requirements, and design and manage programs to bridge the skills gap effectively.', response_metadata={'token_usage': {'completion_tokens': 105, 'prompt_tokens': 439, 'total_tokens': 544}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'stop', 'logprobs': None}, id='run-904b1203-f00c-4fe2-9b05-f033a1dc1ab0-0'),
 'docs': [Document(page_content='Some insightful resource

In [28]:
memory.save_context(inputs, {"answer": result["answer"].content})

In [29]:
memory.load_memory_variables({})

{'history': [HumanMessage(content='What does an AI business leader need to know about building internal training programs?'),
  AIMessage(content='An AI business leader needs to know that building internal training programs is essential for ensuring long-term success and competitiveness in the age of AI. They should prioritize investing in reskilling their workforce and embracing the opportunities and challenges that generative AI brings. Additionally, utilizing generative AI to create personalized training programs tailored to the specific needs and skill sets of each worker can help improve performance and job satisfaction. It is important to assess current employee skills, predict future skill requirements, and design and manage programs to bridge the skills gap effectively.')]}