# LLM Background

In [None]:
from transformers import AutoTokenizer
from dotenv import load_dotenv
import os
load_dotenv()

Before working with LLMs, you need to understand one thing:

Every chat LLM expects its own ```chat_template```. If you do not format your prompt with the model's specific template, it will not generate tokens the way you expect.

For example, look at ```meta-llama/Meta-Llama-3-8B-Instruct```. To run the cells below you need an access token from Hugging Face (loaded here from the `HF_KEY` environment variable) and access to this gated model.

In [None]:
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct",
                                          token=os.getenv("HF_KEY"))

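Applying the template to a system message and a user message (see the next cell) gives:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the weather like?<|eot_id|>
```


So this is what an input prompt for ```meta-llama/Meta-Llama-3-8B-Instruct``` looks like. The prompt starts with the ```<|begin_of_text|>``` token, and each message is wrapped in ```<|start_header_id|>{role}<|end_header_id|>\n\n``` and ```<|eot_id|>```, which mark where the system message and the user message begin and end. Other models use other markers; ChatML-style templates, for example, use ```<|im_start|>``` and ```<|im_end|>```.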

In [None]:
print(
    tokenizer.apply_chat_template([
        {"role":"system",
        "content":"You are a helpful assistant."},
        {"role":"user",
        "content":"What is the weather like?"}],
    add_generation_prompt=False,
    tokenize=False 
    )
)
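To make the template less magical, here is a simplified, hand-written sketch that renders a message list in the Llama 3 Instruct format. It is only an illustration under the assumption that the Llama 3 special tokens are used as above; the real template is a Jinja string stored with the tokenizer (it also trims whitespace), so in practice always call `apply_chat_template`. The function name `render_llama3` is hypothetical.

```python
# Simplified sketch of the Llama 3 Instruct chat format (assumption: mirrors the
# structure of the official template, without its whitespace trimming).
def render_llama3(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render chat messages into a single Llama-3-style prompt string."""
    prompt = "<|begin_of_text|>"  # BOS token, added once at the start
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather like?"},
]
print(render_llama3(messages))
```

Setting `add_generation_prompt=True` appends the empty assistant header, which is what you want when asking the model to generate the next reply.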

You can add more messages to see how the template handles a multi-turn conversation. Passing the earlier turns like this is how you give the LLM a chat memory.

In [None]:
print(
    tokenizer.apply_chat_template([
    {"role":"system",
    "content":"You are a helpful assistant."},
    {"role":"user",
    "content":"What is the weather like?"},
    {"role":"assistant",
     "content":"It is sunny today."},
    {"role":"user",
    "content":"And how are you doing?"},
    ],add_generation_prompt=False,
    tokenize=False)
)

When you use OpenAI or LLMs from other providers, the template is applied for you in the background; you only need to specify the roles.

# LangChain

In [None]:
from dotenv import load_dotenv
import os
load_dotenv()

## BASIC

### invoke

Calls the chain on a single input.

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# The parser converts a LangChain AIMessage (the output of a LangChain chat model) to a string.
# If its input is already a string, it is returned as is.
parser = StrOutputParser()

llm = ChatOpenAI(model="gpt-3.5-turbo-0125",
                 temperature=0,
                 api_key=os.getenv("OPENAI_API_KEY"))

# Create a chat prompt template containing a single `user`-role (human) message.
prompt = ChatPromptTemplate.from_template("{question}")

In [None]:
llm.invoke("What is the capital of Germany?")

Calling `invoke` on the prompt returns a `ChatPromptValue` that wraps a single `HumanMessage`, LangChain's equivalent of the `user` role.

In [None]:
prompt.invoke("What is the capital of Germany?")

In [None]:
chain = prompt | llm 

chain.invoke({"question":"What is the capital of Germany?"})

In [None]:
chain = prompt | llm | parser

chain.invoke({"question":"What is the capital of Germany?"})
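The `|` operator composes components so that the output of one becomes the input of the next. The toy class below is a hypothetical, stripped-down illustration of that idea (it is not LangChain's `Runnable`), and `toy_llm` is a stand-in that returns a fixed answer instead of calling a model.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Composition: run self first, then feed its output into `other`
        return Step(lambda x: other.func(self.func(x)))

    def invoke(self, x: Any) -> Any:
        return self.func(x)


# Hypothetical components mirroring prompt | llm | parser
toy_prompt = Step(lambda d: [{"role": "user", "content": d["question"]}])
toy_llm = Step(lambda msgs: {"type": "ai", "content": "Berlin"})  # no real model call
toy_parser = Step(lambda ai_message: ai_message["content"])

toy_chain = toy_prompt | toy_llm | toy_parser
print(toy_chain.invoke({"question": "What is the capital of Germany?"}))  # Berlin
```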

### streaming

Every LangChain component and chain has a ``stream`` method.

In [None]:
for chunk in chain.stream({"question":"Write a poem about water."}):
    print(chunk, end="", flush=True)

## Special cases of Runnable

In [None]:
# This prompt requires 2 inputs: `question` and `answer`.
# The second line starts at column 0 so no indentation ends up in the prompt text.
test_prompt = ChatPromptTemplate.from_template("""Here is the question: {question}
Here is the answer: {answer}""")

### dictionary as component

In [None]:
# A dictionary can be chained with a LangChain component.
# Consider the following problem:
# `test_prompt` requires a dictionary with keys `question` and `answer`, but our input has keys `a` and `b`.
({"question":lambda x: x["a"],
  "answer": lambda x: x["b"]} | test_prompt).invoke({"a":"What is the capital of Germany?","b":"Berlin"})

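Conceptually, the dictionary in front of `test_prompt` calls each of its values with the same input and collects the results under the corresponding keys. A plain-Python sketch of that mapping step follows; the helper name `apply_mapping` is hypothetical and, unlike LangChain, it runs the functions one after another.

```python
from typing import Any, Callable


def apply_mapping(mapping: dict[str, Callable[[Any], Any]], x: Any) -> dict[str, Any]:
    """Call each function on the same input and key the results like the mapping."""
    return {key: func(x) for key, func in mapping.items()}


remap = {
    "question": lambda x: x["a"],
    "answer": lambda x: x["b"],
}
print(apply_mapping(remap, {"a": "What is the capital of Germany?", "b": "Berlin"}))
# {'question': 'What is the capital of Germany?', 'answer': 'Berlin'}
```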
### function as a component

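I recommend wrapping your function with

```
from langchain_core.runnables import RunnableLambda
```
so that it also has `invoke`, `stream`, etc.

When a plain function is piped after a LangChain component, LangChain wraps it in a `RunnableLambda` automatically, so for now let's just see what happens.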


In [None]:
def test_string_parsing(input_prompt)->str:
    """Convert LangChain ChatPromptValue to a string."""
    return input_prompt.messages[0].content

In [None]:
({"question":lambda x: x["a"],
  "answer": lambda x: x["b"]} | 
  test_prompt | 
  test_string_parsing).invoke({"a":"What is the capital of Germany?","b":"Berlin"})

### multiple inputs

A function in a chain receives a single input, so if it needs multiple arguments, you have to pass them as one dictionary!

In [None]:
def test_func1(input_dict:dict):
    question = input_dict["question"]
    answer = input_dict["answer"]
    return {"question":"here is the question: "+question, "answer":"here is the answer: "+answer}

In [None]:
(test_func1 | test_prompt).invoke({"question":"What is the capital of Germany?","answer":"Berlin"})

### Adding config to function

If your function has a parameter named `config`, LangChain passes the runtime config to it!

In [None]:
def test_func2(input_dict:dict,config:dict):
    print(config)
    question = input_dict["question"]
    answer = input_dict["answer"]
    return {"question":"here is the question: "+question, "answer":"here is the answer: "+answer}

In [None]:
(test_func2 | test_prompt).invoke({"question":"What is the capital of Germany?","answer":"Berlin"})

In [None]:
(test_func2 | test_prompt).invoke({"question":"What is the capital of Germany?","answer":"Berlin"},{"tags": ["my-tag"]})
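LangChain decides whether to pass `config` by inspecting the function's signature for a parameter with that name. The sketch below imitates that check with `inspect.signature`; `call_with_optional_config`, `ignores_config` and `needs_config` are hypothetical names, and LangChain's internal code handles more cases than this.

```python
import inspect
from typing import Any, Callable, Optional


def call_with_optional_config(func: Callable[..., Any], x: Any, config: Optional[dict] = None) -> Any:
    """Call `func(x)`, also passing `config` if `func` declares a parameter named `config`."""
    if "config" in inspect.signature(func).parameters:
        return func(x, config=config or {})
    return func(x)


def ignores_config(input_dict: dict) -> str:
    return input_dict["question"]


def needs_config(input_dict: dict, config: dict) -> str:
    # `config` arrives here only because the parameter is named exactly `config`
    return f"{input_dict['question']} tags={config.get('tags', [])}"


print(call_with_optional_config(ignores_config, {"question": "Q"}, {"tags": ["my-tag"]}))  # Q
print(call_with_optional_config(needs_config, {"question": "Q"}, {"tags": ["my-tag"]}))  # Q tags=['my-tag']
```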

## Simple RAG

![Example Image](pics/rag.png)

In [None]:
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.messages.system import SystemMessage
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
import platform

In [None]:
def check_os()->str:
    """Check the OS of the system"""
    os_name = platform.system()
    return os_name
if check_os() == "Windows":
    embedding_path="huggingface_models\\BAAI\\bge-large-en-v1.5"
else:
    embedding_path="huggingface_models/BAAI/bge-large-en-v1.5"
embeddings = HuggingFaceEmbeddings(model_name=embedding_path)

### LOADING VECTOR DATABASE

For `search_type` you have the following 2 options:

- `similarity`: ranks documents by their similarity score, in this case the cosine similarity.

- `mmr`: Maximal Marginal Relevance. It selects the documents that are most similar to the input query while optimizing for diversity among the selected documents:

  $$\text{MMR} = \arg\max_{D_i \in R \setminus S} \left[ \lambda \cos(D_i, Q) - (1 - \lambda) \max_{D_j \in S} \cos(D_i, D_j) \right]$$

  where $Q$ is the query, $R$ the candidate documents, $S$ the documents selected so far, and $\lambda \in [0, 1]$ trades relevance against diversity.

In `search_kwargs`:

- `k`: number of returned documents
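A minimal pure-Python sketch of greedy MMR selection following the formula above, assuming the query and documents are already embedded as vectors. `lambda_mult` plays the role of $\lambda$ (LangChain's MMR retriever accepts a parameter of the same name, together with `fetch_k`, in `search_kwargs`); the vectors are made up for illustration.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(query: list[float], docs: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of k documents chosen greedily by Maximal Marginal Relevance."""
    selected: list[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(docs[i], query)
            # Redundancy: highest similarity to any already-selected document
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


toy_query = [1.0, 0.1]
toy_docs = [[1.0, 0.0], [0.99, 0.01], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
print(mmr(toy_query, toy_docs, k=2, lambda_mult=0.5))  # [1, 2]: diversity wins over the duplicate
print(mmr(toy_query, toy_docs, k=2, lambda_mult=1.0))  # [1, 0]: pure similarity ranking
```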

In [None]:
qdrant_db = Qdrant.from_existing_collection(embedding=embeddings,
                                         path="./qdrant-database",
                                        collection_name="llm_papers")
retriever = qdrant_db.as_retriever(search_type="similarity",search_kwargs={'k': 5,})

In [None]:
retriever.invoke("How does self-rag work?")

### ADDING CONTEXT TO USER PROMPT

In [None]:
# this template will require a dictionary with keys `question` and `context`
template = """Question: {question}

Context: ```{context}```
"""
def formatting_page_content(page_contents)->str:
    """Convert a list of langchain Document to a string."""
    context = ""
    for page in page_contents:
        context += "\n--------------NEW DOCUMENT----------------\n"+page.page_content
    return context

In [None]:
# using `from_messages` method to create a ChatPromptTemplate from a list of messages.
rag_prompt = ChatPromptTemplate.from_messages([
    SystemMessage("You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question."\
                  " If you don't know the answer, just say that you don't know."\
                " You must answer the question using only the provided context and do not include any information from outside the given context."),
    ("human",template)]
)

## RunnableParallel

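`RunnableParallel` runs its branches concurrently instead of sequentially: every branch receives the same input, and the result is a dictionary with one entry per branch. If several branches call LLMs, those calls are made in parallel.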
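Conceptually, this is like submitting every branch with the same input to a thread pool and waiting for all results. A minimal sketch with `concurrent.futures`; `run_parallel` and the two slow functions are hypothetical stand-ins for a retriever and an LLM call, not LangChain code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[Any], Any]], x: Any) -> dict[str, Any]:
    """Run every branch on the same input concurrently and collect results by key."""
    with ThreadPoolExecutor() as executor:
        futures = {key: executor.submit(func, x) for key, func in branches.items()}
        return {key: future.result() for key, future in futures.items()}


def slow_upper(text: str) -> str:
    time.sleep(0.5)  # simulate a slow call such as an LLM request
    return text.upper()


def slow_length(text: str) -> int:
    time.sleep(0.5)  # simulate a slow call such as a database lookup
    return len(text)


start = time.perf_counter()
result = run_parallel({"upper": slow_upper, "length": slow_length}, "self-rag")
elapsed = time.perf_counter() - start
print(result, f"{elapsed:.1f}s")  # both branches overlap, so roughly 0.5 s rather than 1.0 s
```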

In [None]:
simple_rag_chain = RunnableParallel({
    "context": retriever | formatting_page_content,
    "question": RunnablePassthrough()
})| rag_prompt | llm | parser

![Example Image](pics/rag2.png)

### RAG STREAMING EXAMPLE

In [None]:
for chunk in simple_rag_chain.stream("How does self-rag work?"):
    print(chunk, end="", flush=True)

## create reranker

In [None]:
from FlagEmbedding import FlagReranker
if check_os() == "Windows":
    ranker_path="huggingface_models\\BAAI\\bge-reranker-large"
else:
    ranker_path="huggingface_models/BAAI/bge-reranker-large"
reranker = FlagReranker(ranker_path, use_fp16=True) 

In [None]:
from functools import partial
def ranking_documents(input_dict:dict,
                      k:int=3,
                      query_key:str="question",
                      documents_key:str="documents")->list:
    """Ranking the documents based on the scores computed by the reranker model."""
    
    query = input_dict[query_key]
    documents = input_dict[documents_key]
    # compute scores between query and documents
    scores = [reranker.compute_score([query, doc.page_content]) for doc in documents]
    # ranking and sorting documents by scores
    zip_docs= list(zip(documents,scores))
    zip_docs.sort(key=lambda x: x[1],reverse=True)
    # return the top k documents by score
    return [doc[0] for doc in zip_docs][:k]
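The ranking logic itself does not depend on the model. In the sketch below, a hypothetical word-overlap score stands in for `reranker.compute_score` so the top-k selection runs without downloading `bge-reranker-large`; `heapq.nlargest` replaces the full sort, which gives the same result when only the best k are needed.

```python
import heapq


def overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query words found in text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def top_k_by_score(query: str, texts: list[str], k: int = 3) -> list[str]:
    """Score every (query, text) pair and keep the k highest-scoring texts."""
    return heapq.nlargest(k, texts, key=lambda text: overlap_score(query, text))


texts = [
    "Self-RAG trains a model to retrieve and reflect with special tokens.",
    "Spaghetti carbonara uses eggs, cheese and pancetta.",
    "RAPTOR does retrieval over a tree of recursive summaries.",
]
print(top_k_by_score("how does self-rag retrieve", texts, k=2))
```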

In [None]:
from langchain_core.runnables import ConfigurableField
from langchain_core.runnables import RunnableLambda
from operator import itemgetter


### Set configurable

First, we need to introduce `configurable` fields, which let you change a component's parameters at run time.

```k``` is the number of documents retrieved from the database.

In [None]:
retriever = qdrant_db.as_retriever(search_type="similarity",
                                   search_kwargs={'k': 5,}).configurable_fields(
    # make search_kwargs field configurable
    search_kwargs=ConfigurableField(
        id="retriever_kwargs",
        name="retriever_kwargs",
        description="Search kwargs of the retriever, e.g. the number k of retrieved documents",
    )
)

In [None]:
# we can change the number of retrieved documents by changing the value of `retriever_kwargs` field.
# we have the following 2 options to change the value of `retriever_kwargs` field.
docs = retriever.with_config(configurable={"retriever_kwargs": {"k":3}}).invoke("How does self-rag work?")

In [None]:
docs

In [None]:
retriever.invoke("How does self-rag work?",config={"configurable": {"retriever_kwargs": {"k":7}}})

### define chain with ranker

In [None]:
ranker = RunnableLambda(partial(ranking_documents,
                                query_key="question",
                                documents_key="documents"))

In [None]:
ranker.invoke({"question":"How does self-rag work?","documents":docs})

In [None]:
rag_with_ranker = RunnableParallel({
    "documents": retriever,
    "question": RunnablePassthrough()
})| RunnableParallel({
    "context": ranker|formatting_page_content,
    "question": itemgetter("question")}) | rag_prompt | llm | parser

![Example Image](pics/rag3.png)

In [None]:
# the graph can be also generated by calling `get_graph` method.
rag_with_ranker.get_graph().print_ascii()

In [None]:
rag_with_ranker.invoke("How does self-rag work?",{"configurable": {"retriever_kwargs": {"k":5}}})

In [None]:
rag_with_ranker.with_config(configurable={"retriever_kwargs": {"k":5}}).invoke("How does self-rag work?")

## Add memory

In [None]:
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.messages.human import HumanMessage
from langchain_core.messages.ai import AIMessage
from langchain_core.prompts import MessagesPlaceholder

# MessagesPlaceholder is a placeholder for a list of messages.

# The system message is changed so that the LLM does not have to use the context when it is not relevant.
rag_prompt_With_memory = ChatPromptTemplate.from_messages([
    SystemMessage("You are a helpful assistant. Use the context given from the user to answer the question if needed. Otherwise you can ignore it."),
    MessagesPlaceholder(variable_name="history"), # a placeholder for the history
    ("human",template)]
)


### Example of the placeholder
The prompt requires the key `history` (in addition to `context` and `question`).

In [None]:
rag_prompt_With_memory.invoke({"history":[],
                               "context":"This is the context",
                               "question":"This is the question"}).messages

In [None]:
rag_prompt_With_memory.invoke({"history":[HumanMessage("hello"),
                                          AIMessage("hi")],
                               "context":"This is the context",
                               "question":"This is the question"}).messages

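What `MessagesPlaceholder` does can be pictured as splicing the history list between the system message and the new human message. A plain-Python sketch using `(role, content)` tuples in place of LangChain message objects; `build_messages` is a hypothetical helper.

```python
def build_messages(system: str, history: list[tuple[str, str]], human: str) -> list[tuple[str, str]]:
    """Expand a chat prompt: system message, then the full history, then the new user turn."""
    return [("system", system), *history, ("human", human)]


toy_history = [("human", "hello"), ("ai", "hi")]
for role, content in build_messages("You are a helpful assistant.", toy_history, "This is the question"):
    print(f"{role}: {content}")
```

With an empty history the result is just the system message followed by the human message, matching the first example cell.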
### APPLY MEMORY TO RANKER PIPELINE

In [None]:
rag_memory_with_ranker = RunnableParallel({
    "documents": itemgetter("input")|retriever,
    "question": itemgetter("input"),
    "history": itemgetter("history"),
}) | RunnableParallel({
    "context": ranker|formatting_page_content,
    "question": itemgetter("question"),
    "history":itemgetter("history")}) | rag_prompt_With_memory | llm | parser

With this chain, we have to build the list of `human` and `ai` messages manually and pass it as `history`.

In [None]:
rag_memory_with_ranker.invoke({"history":[HumanMessage("My Name is Long."),
                                          AIMessage("Hello Long.")],
                               "input":"What is my Name?"})

### RunnableWithMessageHistory

We can also use `RunnableWithMessageHistory`. It automatically appends the messages of each call to the chat history.

The history is managed via:

`input_messages_key`: the key in the input dictionary whose value is added to the history as the user message

`history_messages_key`: the key in the input dictionary under which the chat history (the list of human/AI messages) is passed to the chain

Later on:
`output_messages_key`: the key in the output dictionary (the output at the end of the chain) whose value is added to the history as the AI message; it tells the wrapper which entry to use when the chain returns a dictionary
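The pattern behind such a wrapper is simple: look up the session's history, inject it into the input, call the chain, then append the user input and the model output. The sketch below shows that pattern with plain lists; `with_history`, `toy_store` and `echo_chain` are hypothetical names, not LangChain APIs.

```python
from typing import Callable

toy_store: dict[str, list[tuple[str, str]]] = {}


def with_history(chain: Callable[[dict], str], input_key: str = "input", history_key: str = "history"):
    """Wrap `chain` so each call sees and extends the history of its session."""

    def invoke(inputs: dict, session_id: str) -> str:
        history = toy_store.setdefault(session_id, [])
        output = chain({**inputs, history_key: list(history)})  # inject a copy of the history
        history.append(("human", inputs[input_key]))  # record the user turn
        history.append(("ai", output))  # record the model turn
        return output

    return invoke


def echo_chain(inputs: dict) -> str:
    """Stand-in chain: reports how many earlier messages it received."""
    return f"I have seen {len(inputs['history'])} earlier messages."


chat = with_history(echo_chain)
print(chat({"input": "Hello"}, session_id="demo"))  # I have seen 0 earlier messages.
print(chat({"input": "Hi again"}, session_id="demo"))  # I have seen 2 earlier messages.
```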


In [None]:
store = {}
def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


rag_memory_with_ranker_chain = RunnableWithMessageHistory(
    rag_memory_with_ranker, # the chain
    get_session_history, # the function to get the history
    input_messages_key="input",
    history_messages_key="history",
)

In [None]:
rag_memory_with_ranker_chain.with_config(configurable={"retriever_kwargs": {"k":5}}).invoke({"input":"Hello"},config={"configurable": {"session_id": "demorag"}})

In [None]:
rag_memory_with_ranker_chain.with_config(configurable={"retriever_kwargs": {"k":5}}).invoke({"input":"How do I make spaghetti carbonara?"},config={"configurable": {"session_id": "demorag"}})

In [None]:
rag_memory_with_ranker_chain.with_config(configurable={"retriever_kwargs": {"k":5}}).invoke({"input":"Can you give me more details about the recipe for 4 people?"},config={"configurable": {"session_id": "demorag"}})

In [None]:
rag_memory_with_ranker_chain.with_config(configurable={"retriever_kwargs": {"k":5}}).invoke({"input":"What is self-rag about?"},config={"configurable": {"session_id": "demorag"}})

# EVALUATING THE LLM APPLICATION

In [None]:
from langfuse.callback import CallbackHandler
from langfuse import Langfuse
load_dotenv()

# If you run into an error, replace these values with your own instead of loading them from environment variables.
langfuse_handler = CallbackHandler(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST"),
)
langfuse_client = Langfuse(    
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST"),)

def scoring_run(trace_id,metric_name,metric_value):
    """Scoring the langfuse-run."""
    langfuse_client.score(
        trace_id=trace_id,
        name=metric_name,
        value=metric_value,
    )

## tracing run

In [None]:
# Alternatively, pass your own credentials directly (placeholders shown).
langfuse_handler = CallbackHandler(
    secret_key="<your-langfuse-secret-key>",
    public_key="<your-langfuse-public-key>",
    host="https://cloud.langfuse.com",
)
langfuse_client = Langfuse(    
    secret_key="<your-langfuse-secret-key>",
    public_key="<your-langfuse-public-key>",
    host="https://cloud.langfuse.com",)

In [None]:
rag_memory_with_ranker_chain.with_config(configurable={"retriever_kwargs": {"k":5}}).invoke({"input":"Hello."},
                                                                                                config={"configurable": 
                                                                                                        {"session_id": "demorag2"},
                                                                                                        "callbacks": [langfuse_handler]})

In [None]:
rag_memory_with_ranker_chain.with_config(configurable={"retriever_kwargs": {"k":5}}).invoke({"input":"How do I make spaghetti?"},
                                                                                                config={"configurable": 
                                                                                                        {"session_id": "demorag2"},
                                                                                                        "callbacks": [langfuse_handler]})

In [None]:
rag_memory_with_ranker_chain.with_config(configurable={"retriever_kwargs": {"k":5}}).invoke({"input":"How do I use the recipe to make it for 4 people?"},
                                                                                                config={"configurable": 
                                                                                                        {"session_id": "demorag2"},
                                                                                                        "callbacks": [langfuse_handler]})

## EVALUATING THE RAG PIPELINE

### Create Evaluation Dataset

In [None]:
evaluation_questions = ["How does Self-Reflective Retrieval-Augmented Generation (SELF-RAG) work?",
             "What is RAPTOR and how does it work?",
             "How can I improve my LLM on domain specific data such that it performs better than GPT-4?",
             "How does Direct Preference Optimization (DPO) work?",
             "What is the difference between RAG and LLM?",
             "How can I evaluate a Large Language Model (LLM)?",
             "Is there a way to build an LLM-agent for medical data?"]
dataset_name = "dida-workshop"

In [None]:
langfuse_client.create_dataset(
    name=dataset_name,
    # optional description
    description="dataset for evaluating faithfulness of RAG system",
    # optional metadata
    metadata={
        "author": "test-user",
        "type": "benchmark demo"
    }
)

### Adding dataset items

In [None]:
for question in evaluation_questions:
    langfuse_client.create_dataset_item(
        dataset_name=dataset_name,
        input={
            "question": question
        },
        expected_output={
        },
        metadata={
            "info": "dataset contains only question.",
        }
    )

## Evaluation

### Gather all the components again

In [None]:
template = """Question: {question}

Context: ```{context}```
"""
rag_prompt = ChatPromptTemplate.from_messages([
    SystemMessage("You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question."\
                  " If you don't know the answer, just say that you don't know."\
                " You must answer the question using only the provided context and do not include any information from outside the given context."),
    ("human",template)]
)

### Alternative Component
This makes it easier to compare different models/components.

In [None]:
# `run_name` sets the name under which the component appears in traces
llm = ChatOpenAI(model="gpt-3.5-turbo-0125",
                 temperature=0,
                 api_key=os.getenv("OPENAI_API_KEY")).with_config(
                     {"run_name": "GPT-3.5"}).configurable_alternatives(
    # choose the alternative at run time via configurable={"llm": ...}
    ConfigurableField(id="llm"),
    default_key="gpt-3.5",
    gpt4=ChatOpenAI(model="gpt-4o",  # gpt-4o is cheaper than gpt-4-turbo
                    temperature=0,
                    api_key=os.getenv("OPENAI_API_KEY")).with_config(
                        {"run_name": "gpt-4o"}),
)
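`configurable_alternatives` essentially keeps a registry of interchangeable components keyed by name and picks one at run time, falling back to `default_key`. A toy sketch of that selection logic with plain functions instead of chat models; `make_switchable` and `switchable_llm` are hypothetical, and unlike LangChain the default is registered under its key alongside the alternatives.

```python
from typing import Callable, Optional


def make_switchable(default_key: str, **alternatives: Callable[[str], str]) -> Callable[..., str]:
    """Return a callable that dispatches to the alternative named in `config["llm"]`."""

    def invoke(prompt: str, config: Optional[dict] = None) -> str:
        key = (config or {}).get("llm", default_key)  # fall back to the default alternative
        if key not in alternatives:
            raise ValueError(f"Unknown alternative {key!r}; choose from {sorted(alternatives)}")
        return alternatives[key](prompt)

    return invoke


switchable_llm = make_switchable(
    "gpt-3.5",
    **{
        "gpt-3.5": lambda p: f"[small model] {p}",  # stand-in for the default model
        "gpt4": lambda p: f"[large model] {p}",  # stand-in for the alternative
    },
)
print(switchable_llm("Hi"))  # [small model] Hi
print(switchable_llm("Hi", config={"llm": "gpt4"}))  # [large model] Hi
```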

### Evaluate the RAG chain without memory

In [None]:
rag_chain = RunnableParallel({
    # 1. retrieve candidate documents for the question
    "documents": retriever,
    "question": RunnablePassthrough(),
}) | RunnableParallel({
    # 2. rerank and keep the top documents
    "retrieved_documents": ranker,
    "question": itemgetter("question"),
}) | RunnableParallel({
    # 3. format the documents as context, keeping them for the evaluation
    "context": itemgetter("retrieved_documents") | RunnableLambda(formatting_page_content),
    "question": itemgetter("question"),
    "retrieved_documents": itemgetter("retrieved_documents"),
}) | RunnableParallel({
    # 4. generate the answer and return it together with the documents
    "completion": rag_prompt | llm | parser,
    "retrieved_documents": itemgetter("retrieved_documents"),
})

In [None]:
rag_chain.get_graph().print_ascii()

In [None]:
x=rag_chain.with_config(configurable={"llm": "gpt4",
                                      "retriever_kwargs": {"k":5}}).invoke("How does self-rag work?",config={"callbacks": [langfuse_handler]})

In [None]:
x

Create completions for every item of the dataset on Langfuse.

In [None]:
from datetime import datetime

dataset = langfuse_client.get_dataset(dataset_name)
ids = []
contexts = []
questions = []
answers = []
model_ids = ["gpt-3.5", "gpt4"]
# timestamp suffix so that each experiment gets a unique run name
exp_time = datetime.now().strftime("_%Y-%m-%d_%H-%M-%S")
for model_id in model_ids:
    for item in dataset.items:
        # the handler links this run to the dataset item in Langfuse
        handler = item.get_langchain_handler(run_name=model_id + exp_time)
        answer = rag_chain.with_config(configurable={"llm": model_id, "retriever_kwargs": {"k": 5}}).invoke(
            item.input["question"],
            config={"callbacks": [handler]})

        trace_id = handler.get_trace_id()
        ids.append(trace_id)
        answers.append(answer["completion"])
        questions.append(item.input["question"])
        contexts.append([d.page_content for d in answer["retrieved_documents"]])


### USING RAGAS TO EVALUATE RAG

`faithfulness`: is the response supported by the retrieved context?

`answer_relevancy`: is the response relevant to the query?

Neither metric requires ground truth.
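As described in the RAGAS documentation, both scores are computed with the help of an LLM (and, for answer relevancy, an embedding model). Faithfulness is the share of claims in the answer that can be inferred from the retrieved context; answer relevancy is the mean cosine similarity between the embedding $E_o$ of the original question and the embeddings $E_{g_i}$ of $N$ questions that the LLM generates back from the answer:

$$
\text{faithfulness} = \frac{\left|\{\text{claims in the answer supported by the context}\}\right|}{\left|\{\text{claims in the answer}\}\right|},
\qquad
\text{answer\_relevancy} = \frac{1}{N} \sum_{i=1}^{N} \cos\left(E_{g_i}, E_o\right)
$$

This is why neither metric needs a ground-truth answer: both only compare the answer with the question and the retrieved context.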

In [None]:
from datasets import Dataset 
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

### Compute the metrics

In [None]:
data_samples = {
    'question': questions,
    'answer': answers,
    'contexts' : contexts,
    # 'ground_truth': [...],  # only needed for metrics that compare against a reference answer
}

eval_dataset = Dataset.from_dict(data_samples)

score = evaluate(eval_dataset,metrics=[faithfulness,answer_relevancy])
df = score.to_pandas()
df["ids"] = ids

## Adding score to each run

In [None]:
for i in range(len(df)):
    exp = df.iloc[i]
    exp_id = exp["ids"]
    # new variable names so the imported `answer_relevancy` metric is not overwritten
    faithfulness_score = exp["faithfulness"]
    answer_relevancy_score = exp["answer_relevancy"]
    scoring_run(exp_id, "faithfulness", faithfulness_score)
    scoring_run(exp_id, "answer_relevance", answer_relevancy_score)

# ROUTING

BACKGROUND:

Our chain always calls the retriever. Instead, we want the LLM to

- for questions related to the database: stay faithful to the documents.

- for questions not related to the database: not use retrieval.

### Routing Prompt

In [None]:
routing_message = """You are a helpful assistant. Your task is to decide whether to use information from a database to answer the question. The database contains information related to machine learning.

Here is a list of the short names of the papers in the database:
- Direct Preference Optimization (DPO)
- Self-Reflective Retrieval-Augmented Generation (SELF-RAG)
- RAPTOR: Recursive Abstractive Processing For Tree-Organized Retrieval
- Medagents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
- Iterative Reasoning Preference Optimization (IRPO)
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Decide whether to use the retrieved documents or not. Return only one word: `YES` or `NO`."""

`router` chain

In [None]:
router_prompt = ChatPromptTemplate.from_messages([
    SystemMessage(routing_message),
    ("human","{question}")]
)
router_chain = router_prompt | llm | parser

In [None]:
router_chain.invoke("What is DPO?")

In [None]:
router_chain.with_config(configurable={"llm": "gpt4"}).invoke("What is DPO?")

### Define a router function that selects the `chain` based on its decision

In [None]:
def routing_function(input_dict):
    if "YES" in input_dict["decision"]:
        return itemgetter("question")|rag_chain
    return ChatPromptTemplate.from_template("{question}")|llm | parser
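`routing_function` checks whether the string `YES` occurs anywhere in the router's reply, so a reply such as `NO, not YES` would be sent to retrieval. A slightly stricter parser, sketched below as a hypothetical alternative, normalises the reply and falls back to the non-retrieval branch for anything that does not start with `YES`; `route` returns a branch name here instead of a chain.

```python
def parse_decision(reply: str) -> bool:
    """Return True only if the router's reply starts with YES (case-insensitive)."""
    cleaned = reply.strip().strip("`'\".").upper()  # drop whitespace, quotes, backticks, dots
    return cleaned.startswith("YES")


def route(reply: str) -> str:
    """Pick a branch name from the router reply (stand-in for returning a chain)."""
    return "rag" if parse_decision(reply) else "plain_llm"


for reply in ["YES", " yes.", "`NO`", "NO, not YES", "Maybe"]:
    print(repr(reply), "->", route(reply))
```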

In [None]:
full_rag_chain = {"decision":router_chain,"question":RunnablePassthrough()}|RunnableLambda(routing_function)

In [None]:
full_rag_chain.invoke("How does self-rag work?")

In [None]:
full_rag_chain.invoke("Hello")

# FULL RAG CHAIN WITH MEMORY

## Collect what we did

### ROUTER PROMPT

In [None]:
routing_message = """You are a helpful assistant. Your task is to decide whether to use information from a database to answer the question. The database contains information related to machine learning.

Here is a list of the short names of the papers in the database:
- Direct Preference Optimization (DPO)
- Self-Reflective Retrieval-Augmented Generation (SELF-RAG)
- RAPTOR: Recursive Abstractive Processing For Tree-Organized Retrieval
- Medagents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
- Iterative Reasoning Preference Optimization (IRPO)
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Decide whether to use the retrieved documents or not. Return only one word: `YES` or `NO`."""
router_prompt = ChatPromptTemplate.from_messages([
    SystemMessage(routing_message),
    ("human","{question}")]
)

### RAG PROMPT

In [None]:
rag_template = """Question: {question}

Context: ```{context}```
"""
rag_prompt_messages = ChatPromptTemplate.from_messages([
    SystemMessage("You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question."\
                  " If you don't know the answer, just say that you don't know."\
                " You must answer the question using only the provided context and do not include any information from outside the given context."),
    MessagesPlaceholder(variable_name="history"),
    ("human",rag_template)]
)

### QA BOT PROMPT

In [None]:
qa_prompt_messages = ChatPromptTemplate.from_messages([
    SystemMessage("You are a helpful assistant. If you do not know the answer, just say that you don't know."),
    MessagesPlaceholder(variable_name="history"),
    ("human","{question}")]
)

### RAG

In [None]:
retriever_component = RunnableParallel({
    "documents": itemgetter("question")|retriever,
    "question": itemgetter("question"),
    "history": itemgetter("history"),
})

ranker_component = RunnableParallel({
    "retrieved_documents": ranker,
    "question": itemgetter("question"),
    "history": itemgetter("history"),
    })

context_formatter_component = RunnableParallel({"context": itemgetter("retrieved_documents") | RunnableLambda(formatting_page_content),
                         "question": itemgetter("question"),
                         "history": itemgetter("history"),
                         "retrieved_documents":itemgetter("retrieved_documents")})

rag_llm_component = RunnableParallel({"completion": rag_prompt_messages | llm | parser,
                                        "retrieved_documents": itemgetter("retrieved_documents")})

rag_chain_with_memory_placeholder = retriever_component | ranker_component | context_formatter_component | rag_llm_component

### ROUTER

In [None]:
router_chain = router_prompt | llm | parser

### QA LLM

Use a runnable with `completion` and `retrieved_documents` to make sure this chain returns the same structure as `rag_chain_with_memory_placeholder`.

In [None]:
qa_llm_component = RunnableParallel({"completion":qa_prompt_messages|llm | parser,
                                     "retrieved_documents":lambda x: []})

## CHATBOT

In [None]:
def routing_function(input_dict):
    if "YES" in input_dict["decision"]:
        return rag_chain_with_memory_placeholder
    return qa_llm_component
workshop_chatbot = RunnableParallel({"decision":router_chain,
                         "question":itemgetter("question"),
                         "history":itemgetter("history")}) | RunnableLambda(routing_function)

Here we also need `output_messages_key`, because `RunnableLambda(routing_function)` returns a dictionary of the form

```python
{"completion":...,
"retrieved_documents":...
}

```

In [None]:
# Reset the store (alternatively, use a different `session_id`).
store = {}
workshop_chatbot_with_memory = RunnableWithMessageHistory(
    workshop_chatbot,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
    output_messages_key="completion"
)

In [None]:
for x in workshop_chatbot_with_memory.stream({"question":"Hello I am Long"},config={"configurable": {"session_id": "demorag"}}):
    print(x.get("completion",""), end="", flush=True)

In [None]:
for x in workshop_chatbot_with_memory.stream({"question":"How does Self-Rag work?"},config={"configurable": {"session_id": "demorag"}}):
    print(x.get("completion",""), end="", flush=True)

In [None]:
for x in workshop_chatbot_with_memory.stream({"question":"What is my name?"},config={"configurable": {"session_id": "demorag"}}):
    print(x.get("completion",""), end="", flush=True)

In [None]:
for x in workshop_chatbot_with_memory.stream({"question":"How do I make spaghetti for 4 people?"},config={"configurable": {"session_id": "demorag"}}):
    print(x.get("completion",""), end="", flush=True)