References

https://python.langchain.com/docs/use_cases/question_answering/quickstart

https://colab.research.google.com/github/langchain-ai/langchain/blob/master/cookbook/Multi_modal_RAG.ipynb#scrollTo=140580ef-5db0-43cc-a524-9c39e04d4df0

https://www.youtube.com/watch?v=cBpdiQ3gljM

https://github.com/dorianbrown/rank_bm25

Other:

https://python.langchain.com/docs/expression_language/cookbook/retrieval

https://python.langchain.com/docs/use_cases/question_answering/

In [1]:
# %pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-openai bs4

In [2]:
# %pip install chromadb 

In [3]:
import getpass
import os

# os.environ["OPENAI_API_KEY"] = getpass.getpass()

# import dotenv

# dotenv.load_dotenv()


In [5]:
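# Optional: enable LangSmith tracing. This needs LANGCHAIN_API_KEY to be set,
# so uncomment the next-but-one line (or export the key) before running the chain.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()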

In [6]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [9]:
# Load the blog post, keeping only the title, header, and body content.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

In [21]:
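# Split the post into chunks of up to 1000 characters with 200 characters of overlap,
# then embed the chunks and index them in Chroma.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())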

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

def format_docs(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


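# The input question is sent to the retriever (whose results are formatted as
# context) and passed through unchanged as "question"; the filled prompt then
# goes to the model, and the reply is parsed to a plain string.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)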
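
To make the `chunk_size` / `chunk_overlap` settings above more concrete, here is a minimal sketch of overlapping chunks using a plain fixed-width sliding window. This is an assumption-laden simplification: `chunk_with_overlap` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, so its real chunk boundaries will differ.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Hypothetical helper: fixed-width windows that share `chunk_overlap` characters.

    Not LangChain's algorithm; it only illustrates what the two parameters mean.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last 4 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.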

In [22]:
rag_chain.invoke("What is Task Decomposition?")

'Task Decomposition is a process where complex tasks are broken down into smaller, simpler steps. This can be achieved by using techniques like Chain of Thought (CoT) and Tree of Thoughts, which transform big tasks into multiple manageable tasks. The process can be guided by simple prompts, task-specific instructions, or human inputs.'
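
The retrieval step behind an answer like the one above can be sketched without any external services. The example below is an illustration under stated assumptions, not how Chroma is implemented: the three-dimensional vectors are made-up stand-ins for real embeddings, `cosine_similarity` and `top_k` are hypothetical helpers, and cosine similarity is assumed as the ranking metric (the notebook does not say which metric the vector store uses).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k):
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy index: (chunk text, made-up embedding). Real embeddings have far more dimensions.
toy_index = [
    ("task decomposition", [1.0, 0.0, 0.0]),
    ("memory", [0.0, 1.0, 0.0]),
    ("planning", [0.8, 0.2, 0.0]),
]

retrieved = top_k([1.0, 0.1, 0.0], toy_index, k=2)
# Same joining rule as format_docs: chunks separated by a blank line.
context = "\n\n".join(retrieved)
print(context)
```

The joined `context` string plays the role of the `"context"` input to the prompt, alongside the unchanged question.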

In [15]:
# Clean up: delete the Chroma collection created above.
vectorstore.delete_collection()