## User Guide on LangChain custom Retriever and LLMChain
This guide introduces how to use IRRetriever and IRChain.

- Its purpose is to integrate the IR model with the LangChain interface.
- IRRetriever wraps the IR model in LangChain's BaseRetriever. (see retrievers.py)
- IRChain inherits from LangChain's LLMChain, which enables the user to seamlessly connect IRRetriever to an LLM chain. (see chains.py)


[TODO]
- Fix the error raised when the prompt exceeds the max_token limit.
- Integrate the IRRetriever with LangChain RetrievalQA.

```
- Writer: Eungis
- last update: 23.11.10
```

## IRRetriever

In [None]:
import warnings
import logging

import IR
from retrievers import IRRetriever

# Use a separate name so the `logging` module itself is not shadowed.
logger = logging.getLogger(__name__)
warnings.filterwarnings("ignore")

ir_model = IR()
retriever = IRRetriever(ir_model=ir_model, top_k=5)
# IRRetriever assumes the documents have already been indexed.
# Load a project that contains indexed documents before retrieving.
retriever.load_project("{YOUR_PROJECT}", "{WHERE}")
# Call the public method; `_get_relevant_documents` is the internal hook that LangChain invokes.
docs = retriever.get_relevant_documents("{YOUR_QUESTION}")
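
The wrapping idea behind IRRetriever can be illustrated with a small adapter. This is only a library-free sketch under loud assumptions: `FakeIR`, its `search` method, `Document`, and `SketchRetriever` are all invented for illustration, and the real IR model's interface is not shown in this guide. The method name `get_relevant_documents` mirrors LangChain's public retriever method.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a retrieved document (not the LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeIR:
    """Invented stand-in for the IR model; its real interface is an assumption."""

    def __init__(self, corpus):
        self.corpus = corpus  # list of {"title": ..., "text": ...}

    def search(self, query, top_k):
        # Rank entries by how many query words appear in their text.
        words = set(query.lower().split())
        scored = [
            (len(words & set(item["text"].lower().split())), item)
            for item in self.corpus
        ]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:top_k]]


class SketchRetriever:
    """Adapter that turns raw IR hits into document objects."""

    def __init__(self, ir_model, top_k=5):
        self.ir_model = ir_model
        self.top_k = top_k

    def get_relevant_documents(self, query):
        hits = self.ir_model.search(query, self.top_k)
        return [
            Document(page_content=hit["text"], metadata={"title": hit["title"]})
            for hit in hits
        ]


corpus = [
    {"title": "Chains", "text": "chains compose llm calls"},
    {"title": "Retrievers", "text": "retrievers return relevant documents"},
    {"title": "Memory", "text": "memory stores past documents and turns"},
]
sketch_retriever = SketchRetriever(FakeIR(corpus), top_k=2)
sketch_docs = sketch_retriever.get_relevant_documents(
    "which documents do retrievers return"
)
print([doc.metadata["title"] for doc in sketch_docs])
```

Keeping the ranking inside the IR model and the conversion inside the adapter means the chain only ever sees document objects, whatever the IR backend returns.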

In [2]:
import yaml

# CONFIG holds the API settings (e.g. OPENAI_API_KEY) needed to call the LLM.
# Load your own API keys here.
with open("../config.yaml") as f:
    CONFIG = yaml.safe_load(f)
ANTHROPIC_CONFIG = CONFIG.get("anthropic")
OPENAI_CONFIG = CONFIG.get("openai")

## Simple usage of IRChain

In [3]:
from chains import IRChain
from utils import load_ir_chain
from langchain.chat_models import AzureChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema.messages import SystemMessage
from langchain.prompts import HumanMessagePromptTemplate

# Prepare your prompt
system_template = """You are an assistant chatbot. Answer the question as best as you can."""

human_template = """Below are documents provided.
{context}

-------------
Question: {question}
Answer:"""

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=system_template),
        HumanMessagePromptTemplate.from_template(human_template),
    ]
)

# Initialize llm and chain
chat_model = AzureChatOpenAI(**OPENAI_CONFIG)
chain = load_ir_chain(
    llm=chat_model,
    retriever=retriever,
    prompt=prompt,
    input_key="question",
    document_variable_name="context",
)

In [None]:
answer = chain("{YOUR_QUESTION}")
print(answer["answer"])

In [5]:
# You can also add memory
from langchain.memory import ConversationBufferWindowMemory

# Prepare your prompt
system_template = """You are an assistant chatbot. Answer the question as best as you can."""

human_template = """Below are documents provided.
{context}

-------------
Previous conversation:
{chat_history}
Human: {question}
AI:"""

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=system_template),
        HumanMessagePromptTemplate.from_template(human_template),
    ]
)

# Add memory
# **Attention**
# 1. The prompt above has several input variables (context, chat_history, question), so you must specify which input_key and output_key the memory should store.
# 2. The memory_key must match the name of the memory variable in the prompt.
# 3. The input_key must match the name of the user-input variable in the prompt.
memory = ConversationBufferWindowMemory(
    k=3,
    memory_key="chat_history",
    input_key="question",
    output_key="answer",
    return_messages=True,
)

chain = load_ir_chain(
    llm=chat_model,
    retriever=retriever,
    prompt=prompt,
    input_key="question",
    document_variable_name="context",
    memory=True,
    memory_key="chat_history",
)

In [None]:
answer = chain("{YOUR_QUESTION}")
print(answer["answer"])

In [None]:
answer = chain("{YOUR_QUESTION}")
print(answer["answer"])

In [None]:
print(chain.memory.buffer_as_str)
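
The role of `k`, `input_key`, and `output_key` in the memory configured above can be pictured with a simplified model. This is a sketch only, not LangChain's `ConversationBufferWindowMemory`; the `WindowMemory` class and its methods are hypothetical.

```python
from collections import deque


class WindowMemory:
    """Hypothetical, simplified model of a k-turn window memory."""

    def __init__(self, k=3, input_key="question", output_key="answer"):
        self.input_key = input_key
        self.output_key = output_key
        # A bounded deque drops the oldest turn once k turns are stored.
        self.turns = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        # Only the configured keys are stored; other inputs
        # (for example the retrieved context) are ignored.
        self.turns.append((inputs[self.input_key], outputs[self.output_key]))

    def as_str(self):
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


window_memory = WindowMemory(k=3)
for i in range(5):
    window_memory.save_context(
        {"question": f"q{i}", "context": "retrieved docs"},
        {"answer": f"a{i}"},
    )
history = window_memory.as_str()
print(history)
```

Because the inputs carry more than one key, the memory cannot guess which one is the user's turn, which is why the keys have to be named explicitly.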

## IRChain: without memory
- Below is a more detailed guide to using IRChain without attaching memory to the chain.

    - 1. Customize document_prompt
        - See the `format_document` function in langchain.schema to learn how each document is formatted.
        - An example document_prompt is given below.
        - All documents are joined using the `document_separator` parameter of the IRChain.
    - 2. Add callbacks
        - You can add callbacks to the IRChain as well as to the llm.
        - You can also pass callbacks when running the chain.
        - For details on how callbacks work, please refer to the LangChain docs. IRChain uses exactly the same mechanism as langchain.chains.LLMChain.
    - 3. Add output_parser
    - 4. Different ways to call IRChain
        - `__call__`, predict, run, generate, apply, etc.
        - All of these functions are built on `generate`, but differ in output format.
        - See the LangChain documentation to learn how LLMChain works. IRChain uses exactly the same mechanism.
    - 5. Asynchronous support
        - It is useful when you combine callbacks that implement `on_llm_new_token` with frontend-side streaming.
        - It is also useful when you want to speed up testing.
            - Suppose you have 1000 questions to test.
            - You can run the chain asynchronously to accelerate the process.
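
How `document_prompt` and `document_separator` interact can be sketched without LangChain. The snippet is a simplified illustration, not the actual `format_document` or IRChain implementation: `Doc`, `format_doc`, and `build_context` are hypothetical names, and `if_empty` merely plays the role of a fallback used when nothing is retrieved.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not the LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_doc(doc, template):
    # Fill the per-document template with the page content and the metadata fields.
    return template.format(page_content=doc.page_content, **doc.metadata)


def build_context(docs, template, separator="\n\n", if_empty=""):
    # Join the formatted documents; fall back to `if_empty` when none were retrieved.
    if not docs:
        return if_empty
    return separator.join(format_doc(doc, template) for doc in docs)


example_docs = [
    Doc("LangChain chains compose LLM calls.", {"title": "Chains"}),
    Doc("Retrievers return relevant documents.", {"title": "Retrievers"}),
]
context = build_context(example_docs, "Title {title}\n{page_content}")
print(context)
```

The resulting string is what would be substituted for `{context}` in the prompt, so every metadata field named in the template must exist on every document.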

In [4]:
from chains import IRChain
from langchain.chat_models import AzureChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.messages import SystemMessage
from langchain.prompts import HumanMessagePromptTemplate

# This controls how each document will be formatted. Specifically,
# it will be passed to `format_document` - see that function for more
# details.
document_prompt = PromptTemplate.from_template("Title {title}\n{page_content}")

# Prepare your prompt
system_template = """You are an assistant chatbot. Answer the question as best as you can."""

human_template = """Below are documents provided.
{context}

-------------
Question: {question}
Answer:"""

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=system_template),
        HumanMessagePromptTemplate.from_template(human_template),
    ]
)

# Initialize llm and IRChain

# You can attach streaming callbacks to the LLM
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# OPENAI_CONFIG.update({
#     "streaming": True,
#     "callback_manager": CallbackManager([StreamingStdOutCallbackHandler()])
# })

chat_model = AzureChatOpenAI(**OPENAI_CONFIG)

# Because IRChain inherits from LLMChain, it exposes the same interface.
# That means you can add callbacks, an output_parser, or any other module
# provided by langchain.
from langchain.callbacks import StdOutCallbackHandler

chain = IRChain(
    prompt=prompt,
    document_prompt=document_prompt,
    llm=chat_model,
    retriever=retriever,
    document_variable_name="context",
    document_separator="\n\n",
    output_key="answer",
    return_final_only=False,
    output_parser=StrOutputParser(),
    # callbacks=[StdOutCallbackHandler()]
)

In [None]:
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import Document
from typing import Sequence, Optional, Any
from uuid import UUID


# One benefit of the LangChain interface is that callback handlers can also be attached to the retriever.
# Instead of hard-coding logic around the retrieved documents, such as filtering on some condition,
# we can simply implement hooks like on_retriever_start and on_retriever_end to do whatever we want.
# The example below just prints the number of documents retrieved through IR.
class DocumentCallbackHandler(BaseCallbackHandler):
    def on_retriever_end(
        self,
        documents: Sequence[Document],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        **kwargs: Any,
    ):
        print(f"on_retriever_end() called with {len(documents)} documents")


# When no related documents are found,
# the formatted document_prompt (= context) is set to "".
# You can change this by setting the `document_prompt_if_no_docs_found` parameter of the chain.
# "안녕" is a casual Korean greeting ("Hi"), used here as a query that presumably matches no indexed document.
answer = chain("안녕", callbacks=[DocumentCallbackHandler()])
print(f"__call__ output: {type(answer)}\noutput: {answer}")

In [None]:
# Instead of __call__, you can also use predict, apply, or run
# (run works only when len(output_keys) == 1), just as with LLMChain.
print(f"length of output_keys: {len(chain.output_keys)}")
answer = chain.run({"question": "{YOUR_QUESTION}"}, callbacks=[DocumentCallbackHandler()])
print(f"run output: {type(answer)}\noutput: {answer}")

In [None]:
answer = chain.predict(question="{YOUR_QUESTION}", callbacks=[DocumentCallbackHandler()])
print(f"predict output: {type(answer)}\noutput: {answer}")

In [None]:
answer = chain.generate([{"question": "{YOUR_QUESTION}"}, {"question": "{YOUR_QUESTION}"}])
print(f"generate output: {type(answer)}, \noutput: {answer}")

In [None]:
answer = chain.apply([{"question": "{YOUR_QUESTION}"}, {"question": "{YOUR_QUESTION}"}])
print(f"apply output: {type(answer)}, \noutput[0] contains: {answer[0]}")

In [None]:
# The async variants are supported as well.
answer = await chain.apredict(question="{YOUR_QUESTION}")
print(f"apredict output: {type(answer)}\noutput: {answer}")

answer = await chain.agenerate(
    [{"question": "{YOUR_QUESTION}"}, {"question": "{YOUR_QUESTION}"}]
)
print(f"agenerate output: {type(answer)}, \noutput: {answer}")

answer = await chain.aapply(
    [{"question": "{YOUR_QUESTION}"}, {"question": "{YOUR_QUESTION}"}]
)
print(f"aapply output: {type(answer)}, \noutput[0] contains: {answer[0]}")
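
For the batch-testing case mentioned earlier (many questions at once), one common pattern is to bound concurrency with a semaphore and collect results with `asyncio.gather`. The sketch below is an assumption-laden illustration: `fake_apredict` is a hypothetical stand-in for `chain.apredict`, and `run_all` and `max_concurrency` are names invented here.

```python
import asyncio


async def fake_apredict(question):
    # Hypothetical stand-in for `await chain.apredict(question=question)`.
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


async def run_all(questions, max_concurrency=8):
    # Limit the number of in-flight requests so the LLM API is not flooded.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(question):
        async with semaphore:
            return await fake_apredict(question)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(q) for q in questions))


batch_questions = [f"question {i}" for i in range(20)]
batch_answers = asyncio.run(run_all(batch_questions))
print(len(batch_answers), batch_answers[0])
```

Inside a notebook an event loop is already running, so `await run_all(batch_questions)` would be used there instead of `asyncio.run(...)`.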

## IRChain: with memory

In [25]:
from chains import IRChain
from langchain.chat_models import ChatAnthropic
from langchain.prompts import PromptTemplate, MessagesPlaceholder
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.messages import SystemMessage
from langchain.prompts import HumanMessagePromptTemplate
from langchain.memory import ConversationBufferWindowMemory

# This controls how each document will be formatted. Specifically,
# it will be passed to `format_document` - see that function for more
# details.
document_prompt = PromptTemplate.from_template("Title {title}\n{page_content}")

# Prepare your prompt
system_template = """You are an assistant chatbot. Answer the question as best as you can."""

human_template = """Below are documents provided.
{context}

-------------
Previous conversation:
{chat_history}
Human: {question}
AI:"""

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=system_template),
        HumanMessagePromptTemplate.from_template(human_template),
    ]
)

# Initialize llm and IRChain
# You can attach streaming callbacks to the LLM
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

chat_model = ChatAnthropic(**ANTHROPIC_CONFIG)

# Because IRChain inherits from LLMChain, it exposes the same interface.
# That means you can add callbacks, an output_parser, or any other module
# provided by langchain.
from langchain.callbacks import StdOutCallbackHandler

# Add memory
# **Attention**
# 1. The prompt above has several input variables (context, chat_history, question), so you must specify which input_key and output_key the memory should store.
# 2. The memory_key must match the name of the memory variable in the prompt.
# 3. The input_key must match the name of the user-input variable in the prompt.
memory = ConversationBufferWindowMemory(
    k=3,
    memory_key="chat_history",
    input_key="question",
    output_key="answer",
    return_messages=True,
)

chain = IRChain(
    memory=memory,
    prompt=prompt,
    document_prompt=document_prompt,
    llm=chat_model,
    retriever=retriever,
    document_variable_name="context",
    document_separator="\n\n",
    output_key="answer",
    return_final_only=False,
    output_parser=StrOutputParser(),
    # callbacks=[StdOutCallbackHandler()]
)

In [None]:
answer = chain("{YOUR_QUESTION}", callbacks=[DocumentCallbackHandler()])
print(f"__call__ output: {type(answer)}\noutput contains: {answer.keys()}")

In [None]:
# Check memory
print(chain.memory.buffer_as_str)

In [None]:
from utils import timeit


# This decorator measures how long the wrapped function takes to run.
@timeit
def run_chain(chain, question):
    return chain.predict(question=question)


run_chain(chain, "{YOUR_QUESTION}")