## LangChain x Qdrant: RAG Demo with Web Scraping

This notebook demonstrates how to use LangChain and Qdrant to build a retrieval-augmented generation (RAG) pipeline over a scraped web page. A RAG pipeline pairs a retriever, which finds the documents relevant to a question, with a generator, which writes an answer from them. Here the retriever is backed by Qdrant, an open-source vector search engine, and the generator is GPT-3.5, a language model developed by OpenAI.

## Setting Up

Install the required libraries with [Poetry](https://python-poetry.org/docs/):

```bash
poetry install
```

In [None]:
import getpass
import os

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Qdrant
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [None]:
os.environ["OPENAI_API_KEY"] = getpass.getpass()
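
If the key is already exported in your shell, the prompt above will overwrite it. A minimal sketch of a guard that only prompts when the variable is missing follows; `ensure_api_key` and its `prompt_fn` parameter are hypothetical helpers introduced here for illustration, not part of LangChain or OpenAI's client.

```python
import getpass
import os


def ensure_api_key(name="OPENAI_API_KEY", prompt_fn=getpass.getpass):
    """Set `name` in the environment only if it is missing or empty.

    `prompt_fn` is an assumption added for testability; by default it is
    getpass.getpass, which reads the key without echoing it.
    """
    if not os.environ.get(name):
        os.environ[name] = prompt_fn(f"{name}: ")
    return os.environ[name]


# In the notebook you would call it once before creating the model:
# ensure_api_key()
```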

In [None]:
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

## Download and Index

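We first need to load the blog post contents. `WebBaseLoader` fetches the page and parses the HTML with BeautifulSoup; the `SoupStrainer` filter keeps only the elements whose class is `post-content`, `post-title`, or `post-header`. We then index the blog post contents using Qdrant.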

In [None]:
# Load the contents of the blog post (chunking and indexing happen in the next cells).
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()


### Chunking before Indexing

Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for those models that could fit the full post in their context window, models can struggle to find information in very long inputs.

To handle this we’ll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.
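
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch of overlapping chunking. It uses plain fixed-size character windows, which is an assumption made for brevity: `RecursiveCharacterTextSplitter` instead tries to split on natural separators such as paragraphs and lines, so its chunk boundaries will differ. `chunk_text` is a hypothetical helper, not a LangChain function.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows; consecutive windows share
    `chunk_overlap` characters so context is not cut off at a boundary."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap costs some duplicated storage, but it means a sentence that straddles a boundary still appears whole in at least one chunk.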

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

vectorstore = Qdrant.from_documents(
    documents=splits, embedding=OpenAIEmbeddings(), location=":memory:", collection_name="lilianweng"
)
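
Conceptually, the vector store keeps one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The sketch below illustrates that idea with a brute-force scan over toy two-dimensional vectors. The vectors, the `top_k` helper, and the choice of cosine similarity are all assumptions made for illustration; a real engine such as Qdrant uses an index rather than scanning every vector, and its distance metric is configurable.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k entries most similar to query_vec.

    `indexed` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy index: made-up vectors standing in for real embeddings.
toy_index = [
    ("planning", [1.0, 0.0]),
    ("memory", [0.0, 1.0]),
    ("tool use", [0.7, 0.7]),
]
print(top_k([0.9, 0.1], toy_index, k=2))  # ['planning', 'tool use']
```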

We use LangChain's LCEL Runnable protocol to define the chain: context retrieval, document formatting (`format_docs`), the prompt, the model call, and output parsing. We then run the chain to get the answer.

We download a pre-written prompt from the LangChain hub. At run time it is filled with the retrieved chunks of the blog post and the user's question, so that the model answers from the supplied context.
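
To show what "filling" a RAG prompt means, here is a small sketch using a plain Python string template. The template text is a hypothetical stand-in written for this example; the actual wording of `rlm/rag-prompt` may differ. Only the two placeholder names, `context` and `question`, are taken from the chain defined in this notebook.

```python
# Hypothetical template for illustration; not the text of the hub prompt.
RAG_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def render_prompt(context, question):
    """Substitute the retrieved context and the question into the template."""
    return RAG_TEMPLATE.format(context=context, question=question)


rendered = render_prompt("Chunk one.\n\nChunk two.", "What is Task Decomposition?")
print(rendered)
```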

In [None]:
# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
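
The first step of the chain can be hard to read at a glance: the same question is sent to the retriever (whose results are joined by `format_docs`) and is also passed through unchanged. The sketch below reproduces that data flow in plain Python with stand-ins, so it runs without an API key. `FakeDoc`, `fake_retriever`, and `build_prompt_inputs` are hypothetical names introduced for illustration; only `format_docs` is copied from the cell above.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain document; only page_content is modeled."""
    page_content: str


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    # A real retriever would embed the question and query Qdrant;
    # this one returns fixed chunks regardless of the question.
    return [FakeDoc("Chunk one."), FakeDoc("Chunk two.")]


def build_prompt_inputs(question):
    # Mirrors the dict at the start of rag_chain: "context" comes from
    # retriever + format_docs, "question" is passed through as-is.
    return {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }


inputs = build_prompt_inputs("What is Task Decomposition?")
print(inputs["context"])
```

The resulting dictionary is what the prompt receives; the model and the output parser then turn the filled prompt into a plain string answer.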

In [None]:
rag_chain.invoke("What is Task Decomposition?")