# Read OpenAI API key

In [1]:
import os
from dotenv import load_dotenv

load_dotenv("secrets.env")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

# Call to OpenAI API

**Pros of Using OpenAI Chat API in a Company's Chatbot:**

1. **Advanced Natural Language Processing (NLP):** The API uses state-of-the-art NLP, allowing the chatbot to understand and respond to user queries more effectively, even with complex or ambiguous language.
2. **Versatility:** OpenAI's API can handle a wide variety of topics, making the chatbot adaptable to diverse user needs across multiple domains without extensive retraining.
3. **Scalability:** The API can handle high volumes of requests, enabling businesses to scale their chatbot solutions without worrying about performance degradation.
4. **Continuous Improvement:** OpenAI continually updates and improves its models, so your chatbot can benefit from cutting-edge advancements in AI without needing significant internal resources.
5. **Rapid Deployment:** Implementing the OpenAI Chat API can accelerate the development and deployment of a chatbot, reducing time-to-market compared to building a custom NLP system from scratch.

**Cons of Using OpenAI Chat API in a Company's Chatbot:**

1. **Cost:** Usage of the API incurs ongoing costs based on the volume of requests, which can add up, especially for high-traffic applications.
2. **Limited Customization:** Although versatile, the API may not always meet highly specific business requirements, requiring additional layers of customization that can be complex to implement.
3. **Data Privacy Concerns:** Using a third-party API may raise concerns about data privacy and security, particularly in industries that handle sensitive customer information.
4. **Dependency on External Service:** Relying on an external service like OpenAI means potential downtime, rate-limiting, or API changes that are beyond the company's control.
5. **Ethical Considerations:** AI-generated responses can sometimes produce unexpected or inappropriate content, which may require careful monitoring and mitigation strategies to maintain brand integrity.

In [2]:
from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is AI?",
        }
    ],
    model="gpt-3.5-turbo",
)

print(chat_completion.choices[0].message.content)

AI, or Artificial Intelligence, refers to the simulation of human intelligence processes by machines, typically computer systems. This includes learning, reasoning, problem solving, perception, and language understanding. AI technologies include machine learning, natural language processing, computer vision, and robotics. AI is used in various applications, including self-driving cars, speech recognition, chatbots, medical diagnosis, and many others.


# Do I need to write utility classes on my own? - Utilise LangChain

In [3]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

model = ChatOpenAI(model="gpt-3.5-turbo", api_key=OPENAI_API_KEY)
response = model.invoke([HumanMessage(content="What is AI?")])

print(response.content)

AI, or artificial intelligence, refers to the simulation of human intelligence processes by machines, especially computer systems. These processes include learning, reasoning, problem-solving, perception, and language understanding. AI technologies are used to automate tasks, make predictions, and assist in decision-making across various industries and applications.


In [4]:
response = model.invoke([HumanMessage(content="It is really interesting what you have written. Can you say more about it?")])
print(response.content)

Of course! What specifically would you like to know more about?


As you can see - the model by itself doesn't remember the history of messages. We need to pass all messages to get the desired answers.

In [5]:
from langchain_core.messages import AIMessage

response = model.invoke(
    [
        HumanMessage(content="What is AI?"),
        AIMessage(
            content="AI, or artificial intelligence, refers to the development of computer systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI technology aims to mimic human cognitive functions and improve efficiency and accuracy in tasks that were previously only achievable by humans."
        ),
        HumanMessage(content="It is really interesting what you have written. Can you say more about it?"),
    ]
)

print(response.content)

Of course! Artificial intelligence can be categorized into two main types: narrow AI and general AI. Narrow AI, also known as weak AI, is designed to perform specific tasks or solve specific problems, such as facial recognition, speech recognition, or autonomous vehicles. General AI, also known as strong AI, is a hypothetical form of AI that can understand, learn, and apply knowledge across a wide range of tasks – essentially possessing human-like intelligence.

AI technologies rely on various techniques, including machine learning, deep learning, natural language processing, and computer vision. Machine learning is a subset of AI that enables systems to learn from data and improve their performance without being explicitly programmed. Deep learning is a subset of machine learning that involves artificial neural networks inspired by the structure and function of the human brain.

AI applications are vast and have been integrated into various industries, including healthcare, finance, t

# Message History

In [6]:
from langchain_core.chat_history import (
    BaseChatMessageHistory,
    InMemoryChatMessageHistory,
)
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]


chain_with_history = RunnableWithMessageHistory(model, get_session_history)

In [7]:
config = {"configurable": {"session_id": "session_1"}}

response = chain_with_history.invoke(
    [HumanMessage(content="What is AI?")],
    config=config,
)

print(response.content)

Parent run 18d1ab39-6cec-4f1d-ac20-623be21b31bd not found for run aed2acd9-be75-4596-bcfe-7b7f70df05d5. Treating as a root run.


AI, or artificial intelligence, refers to the simulation of human intelligence in machines that are programmed to think and act like humans. This includes tasks such as learning, reasoning, problem-solving, perception, and language understanding. AI technologies are used in various applications such as virtual assistants, self-driving cars, medical diagnosis, and many others.


In [8]:
response = chain_with_history.invoke(
    [HumanMessage(content="It is really interesting what you have written. Can you say more about it?")],
    config=config,
)

print(response.content)

Parent run e9b2683c-ebcb-41e4-bf30-0a2085b4d60a not found for run 349e05f7-c276-4cbf-9478-75a5801a8a6a. Treating as a root run.


Certainly! Artificial intelligence is a broad field that encompasses various subfields and technologies. Some of the key concepts and techniques used in AI include:

1. Machine Learning: This is a subset of AI that involves developing algorithms that allow computers to learn from and make predictions or decisions based on data. Machine learning techniques include neural networks, deep learning, and reinforcement learning.

2. Natural Language Processing (NLP): NLP is a branch of AI that focuses on enabling computers to understand, interpret, and generate human language. This technology is used in applications such as chatbots, language translation, and sentiment analysis.

3. Computer Vision: Computer vision is the field of AI that enables machines to interpret and understand visual information from the real world. This technology is used in facial recognition, object detection, autonomous vehicles, and medical image analysis.

4. Robotics: AI-powered robots are designed to perform tas

After a session_id change, another user will not have access to the previous conversation history.

In [9]:
config = {"configurable": {"session_id": "session_2"}}

response = chain_with_history.invoke(
    [HumanMessage(content="What was I asking about earlier?")],
    config=config,
)

print(response.content)

Parent run ca97334c-89cf-4a21-96d0-67b14dfd16c7 not found for run 7f3ca574-f525-44d3-8ab3-03d9ae8c4ac7. Treating as a root run.


I'm sorry, but I do not have the ability to remember previous interactions or conversations. Can you please provide more context or details about what you were asking about earlier?


## Different types of memories

- all messages (presented earlier)
- trim messages
    - (+) helps to reduce context
    - (-) older parts of conversation will be discarded and not passed to LLM
- summarize conversation
    - (+) helps to reduce context
    - (-) some parts of conversation during summarization might be lost, but will better preserve older parts of conversation than trimming

### Trim messages

In [10]:
from langchain_core.runnables import RunnablePassthrough


# Trim to four messages
def trim_messages(chain_input):
    current_store = store[config["configurable"]["session_id"]]
    if len(current_store.messages) <= 4:
        return False

    current_store.clear()

    for message in current_store.messages[-4:]:
        current_store.add_message(message)

    return True


chain_with_trimming = RunnablePassthrough.assign(messages_trimmed=trim_messages) | chain_with_history

response = chain_with_trimming.invoke(
    {"input": "My name is Bob."},
    config=config,
)
print(f"1: {response.content}")

response = chain_with_trimming.invoke(
    {"input": "What is my name?"},
    config=config,
)
print(f"2: {response.content}")

response = chain_with_trimming.invoke(
    {"input": "What is my name?"},
    config=config,
)
print(f"3: {response.content}")

Parent run 075978dc-2121-4add-bfa6-0e22c41b6de6 not found for run 3629406b-e7da-44d4-b3e3-179671541d71. Treating as a root run.
Parent run b7d81374-584f-494d-97ca-3afc66f554ed not found for run 9a728fdd-dff0-47cf-a5b3-a884ef0bf7d0. Treating as a root run.


1: Hello Bob! How can I assist you today?


Parent run 8b759e47-73e1-41cd-a70f-90adde3b8906 not found for run b3663c06-3075-48f4-8f30-e433ff846b10. Treating as a root run.


2: Your name is Bob. Is there anything else you would like to know or discuss?
3: I'm sorry, I do not have the capability to know your name as I am an AI assistant.


### Summarize conversation

In [11]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


def summarize_messages(chain_input):
    current_store = store[config["configurable"]["session_id"]]
    if len(current_store.messages) == 0:
        return False

    summarization_prompt = ChatPromptTemplate.from_messages(
        [
            MessagesPlaceholder(variable_name="chat_history"),
            (
                "user",
                "Distill the above chat messages into a single summary message. Include as many specific details as you can.",
            ),
        ]
    )

    summarization_chain = summarization_prompt | model
    summary_message = summarization_chain.invoke({"chat_history": current_store.messages})

    current_store.clear()
    current_store.add_message(summary_message)

    return True


chain_with_summarization = RunnablePassthrough.assign(messages_summarized=summarize_messages) | chain_with_history

response = chain_with_summarization.invoke(
    {"input": "My name is Bob."},
    config=config,
)
print(f"1: {response.content}")

response = chain_with_summarization.invoke(
    {"input": "With what name did I introduce myself?"},
    config=config,
)
print(f"2: {response.content}")

response = chain_with_summarization.invoke(
    {"input": "With what name did I introduce myself?"},
    config=config,
)
print(f"3: {response.content}")
print(store[config["configurable"]["session_id"]].messages)

Parent run 97b320bd-4d1c-4dc9-b274-650501d3a932 not found for run 999d6deb-6afd-4384-bebb-18584a247880. Treating as a root run.


1: Nice to meet you, Bob! How can I assist you today?


Parent run 31c65c8d-ff19-4428-91f1-939174b3ee61 not found for run 175454ca-3f1f-4caa-9270-2fa5bb6e8036. Treating as a root run.


2: You introduced yourself as Bob.


Parent run af22a2ae-6125-472d-a521-f352949e611a not found for run 98175ee6-bda5-435c-864f-ce3d861259d1. Treating as a root run.


3: You introduced yourself as "Bob."
[AIMessage(content='Bob introduced himself to the AI assistant as "Bob," and the AI assistant welcomed him and offered assistance.', response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 75, 'total_tokens': 96}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-056f493d-d046-46a3-925d-dfff606b36f8-0', usage_metadata={'input_tokens': 75, 'output_tokens': 21, 'total_tokens': 96}), HumanMessage(content='With what name did I introduce myself?'), AIMessage(content='You introduced yourself as "Bob."', response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 40, 'total_tokens': 47}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-98175ee6-bda5-435c-864f-ce3d861259d1-0', usage_metadata={'input_tokens': 40, 'output_tokens': 7, 'total_tokens': 47})]


# Prompting techniques

- Use delimiters to clearly indicate distinct parts of the input
- Ask for a structured output
- Ask the model to check whether conditions are satisfied
- "Few-shot" prompting
- Specify the steps required to complete a task

## Use delimiters to clearly indicate distinct parts of the input

In [12]:
text = f"""
You should express what you want a model to do by
providing instructions that are as clear and
specific as you can possibly make them.
This will guide the model towards the desired output,
and reduce the chances of receiving irrelevant
or incorrect responses. Don't confuse writing a
clear prompt with writing a short prompt.
In many cases, longer prompts provide more clarity
and context for the model, which can lead to
more detailed and relevant outputs.
"""
prompt = f"""
Summarize the text delimited by triple backticks \
into a single sentence.
```{text}```
"""

response = model.invoke([HumanMessage(content=prompt)])
print(response.content)

Clear and specific instructions should be provided to guide a model towards the desired output and reduce the chances of irrelevant or incorrect responses, emphasizing that longer prompts can provide more clarity and context for more detailed and relevant outputs.


## Ask for a structured output

In [13]:
prompt = """
Generate a list of three made-up book titles along with their number of pages (int), release date (yyyy-mm-dd) and if cover is hard or not (boolean).
Provide them in a yaml structured output format where title name is a key with three elemnts: number of pages, release date and if cover type.
"""

response = model.invoke([HumanMessage(content=prompt)])
print(response.content)

```yaml
- Title 1:
    - Number of pages: 320
    - Release date: 2023-07-15
    - Cover type: true
- Title 2:
    - Number of pages: 250
    - Release date: 2024-02-29
    - Cover type: false
- Title 3:
    - Number of pages: 400
    - Release date: 2022-10-10
    - Cover type: true
```


### Task - try to rewrite this prompt to get the results in JSON format

In [14]:
prompt = """
Place for your prompt
"""

response = model.invoke([HumanMessage(content=prompt)])
print(response.content)

What is your favorite childhood memory and why?


## Ask the model to check whether conditions are satisfied

In [15]:
text_1 = f"""
Preheat the oven to 350°F (175°C).
Mix 1 cup of softened butter, 1 cup of sugar, 2 cups of flour, and 1 tsp vanilla extract until combined.
Scoop spoonfuls onto a baking sheet and bake for 10-12 minutes, until golden brown.
Let cool, and enjoy!
"""
prompt = f"""
You will be provided with text delimited by triple quotes.
If it contains a sequence of instructions,
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions,
then simply write \"No steps provided.\"

\"\"\"{text_1}\"\"\"
"""

response = model.invoke([HumanMessage(content=prompt)])
print(response.content)


Step 1 - Preheat the oven to 350°F (175°C).
Step 2 - Mix 1 cup of softened butter, 1 cup of sugar, 2 cups of flour, and 1 tsp vanilla extract until combined.
Step 3 - Scoop spoonfuls onto a baking sheet and bake for 10-12 minutes, until golden brown.
Step 4 - Let cool, and enjoy!


In [16]:
text_2 = f"""
A short story is a piece of prose fiction. It can typically be read
in a single sitting and focuses on a self-containedincident or series
of linked incidents, with the intent of evoking a single effect or mood.
"""
prompt = f"""
You will be provided with text delimited by triple quotes.
If it contains a sequence of instructions,
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions,
then simply write \"No steps provided.\"

\"\"\"{text_2}\"\"\"
"""

response = model.invoke([HumanMessage(content=prompt)])
print(response.content)

No steps provided.


## "Few-shot" prompting

In [17]:
prompt = f"""
Your task is to answer in a consistent style.

<child>: Teach me about patience.

<grandparent>: The river that carves the deepest
valley flows from a modest spring; the
grandest symphony originates from a single note;
the most intricate tapestry begins with a solitary thread.

<child>: Teach me about perseverance.
"""

response = model.invoke([HumanMessage(content=prompt)])
print(response.content)

<grandparent>: Just as the oak tree grows strong
by enduring every storm, and the bird builds its nest
brick by brick, so too must we persevere through challenges
to achieve our goals and dreams.


## Specify the steps required to complete a task

In [18]:
text = """
Preheat the oven to 350°F (175°C).
Mix 1 cup of softened butter, 1 cup of sugar, 2 cups of flour, and 1 tsp vanilla extract until combined.
Scoop spoonfuls onto a baking sheet and bake for 10-12 minutes, until golden brown.
Let cool, and enjoy!
"""

prompt = f"""
I will give you text delimited with ###.
Perform the following actions:

1 - If it contains the instruction, rewrite them in the following format:

A - <instruction 1>
B - <instruction 2>
C - <instruction 3>

2 - Translate each instruction to one of the following languages: spanish, french, german, polish, norwegian.
3 - Output a list of objects in a json format. Each object in a list is for one translated sentence and contains the following keys:
original_sentence, translated_sentence, translation_language. There should be as many objects as instructions in the previous step.
4 - If it does not contin instructions output json object with a key \"error" and value for it which will be an error message

###{text}###
"""

response = model.invoke([HumanMessage(content=prompt)])
print(response.content)

{
  "objects": [
    {
      "original_sentence": "Preheat the oven to 350°F (175°C).",
      "translated_sentence": "Vorheizen Sie den Ofen auf 350°F (175°C).",
      "translation_language": "german"
    },
    {
      "original_sentence": "Mix 1 cup of softened butter, 1 cup of sugar, 2 cups of flour, and 1 tsp vanilla extract until combined.",
      "translated_sentence": "Mélanger 1 tasse de beurre ramolli, 1 tasse de sucre, 2 tasses de farine et 1 cuillère à café d'extrait de vanille jusqu'à ce qu'ils soient combinés.",
      "translation_language": "french"
    },
    {
      "original_sentence": "Scoop spoonfuls onto a baking sheet and bake for 10-12 minutes, until golden brown.",
      "translated_sentence": "Scoop löffelweise auf ein Backblech und backen Sie 10-12 Minuten, bis sie goldbraun sind.",
      "translation_language": "german"
    },
    {
      "original_sentence": "Let cool, and enjoy!",
      "translated_sentence": "Lassen Sie abkühlen und genießen Sie!",
      "t

### Task - Write a prompt that generates three different made-up movies and their descriptions. Each movie name and description should be in a different style (poem, pirate-like, William Shakespeare). The output should be an HTML table. Use all of the aforementioned techniques.

In [19]:
prompt = """
Place for your prompt.
"""

response = model.invoke([HumanMessage(content=prompt)])
print(response.content)

Write a story about a young artist who discovers a hidden talent for painting and finds themselves torn between pursuing their passion and fulfilling their parents' expectations of going to college and pursuing a more stable career.


# Prompt templates

In [20]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant and a pirate. Answer all questions to the best of your ability, but remember to use pirate-like english.",
        ),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{input}"),
    ]
)

chain = prompt | model
pirate_chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

response = pirate_chain_with_history.invoke({"input": "What is AI?"}, config=config)
print(response.content)

Parent run 93f2fc22-9765-487b-a414-c4e599be8a94 not found for run 530378d9-cc3b-49a6-8922-8bce42913bbf. Treating as a root run.


Arrr, AI be short for Artificial Intelligence, matey. It be the brains behind this here assistant, helpin' it think and respond to yer questions.


# Build RAG - add vector storage
## Initialize vector storage

In [21]:
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-large", api_key=OPENAI_API_KEY)

vector_store = Chroma(
    collection_name="shrek_collection", embedding_function=embeddings, persist_directory="./chroma_langchain_db"
)

## Add documents

Let's start by creating storage where text chunks are quite big...
Creating larger chunks results in more text being passed as context. It may improve the accuracy of our LLM's answer, but it increases the cost of the API call.

In [22]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader("./docs/shrek-script.pdf")
document = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunked_documents = text_splitter.split_documents(document)

vector_store.add_documents(chunked_documents)

['84f22cd0-ea3f-49b2-9cbc-df29c669b871',
 '51963e35-dcb8-4ccf-ab93-94287ab07aae',
 '7974aab8-f95b-45e7-bc68-1928f60ae876',
 '2d591904-70a6-43e5-8d58-a7ce29bd3557',
 '036e6678-df99-4086-aca2-f9ae39f44e1d',
 'fb203dad-0703-47e9-8086-aa7c534ba1e8',
 '4db61c65-9b1c-408b-aad2-2c6e08c21355',
 '8486a6fe-65a4-4c2f-99d6-69e82a8394a6',
 '1eeffbb0-8e2f-4168-a985-b9bd2ab09945',
 'cd172100-fb3a-45b6-aa7b-2d42a29b57c8',
 '1ac14c5e-6b07-4f1d-91db-fecf518aac7a',
 '451ed942-567b-413a-8bb5-d2a392dcffa2',
 '0b5a4c0b-bc6a-4b7f-a295-1b9fc419bc45',
 'f414bf88-7813-40ce-aee7-21c7b097cb08',
 '6be533f7-86b7-4e95-a6d0-701fd29ec22d',
 'b88650ba-bf9c-40ca-a30d-dfe4dd005e1e',
 '5851e2a2-2069-4282-84fd-6f8fe23cd5a4',
 '6d4224f4-b7b0-40e5-913e-5bb320d5a946',
 'c03d1f40-3758-49aa-8c7b-309c063c3403',
 'eecd2036-5f7e-454b-88a7-8dcf1f58f33e',
 'c714b92a-5b29-45e7-9dab-413dda5b77a9',
 '050125f2-7e64-45a3-ad32-89b317c9052a',
 '2392e179-73bd-4917-8d6f-05a2191b9652',
 '451e2104-b7fb-4549-abb8-a69457cf5928',
 '65093ec5-fa56-

## Ask and wait for answears :)

In [23]:
config = {"configurable": {"session_id": "rag_session_1"}}

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a helpful assistant and a literature specialist. Answer all questions to the best of your ability.
            During answearing use this context from documents: {context}. If you do not know the answear - do not provide it.
            Answear as short as possible, preferably in one sentence""",
        ),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{input}"),
    ]
)

chain = prompt | model

rag_chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)


retriever = vector_store.as_retriever()
rag_chain = (
    {
        "context": retriever | format_docs,
        "input": RunnablePassthrough(),
    }
    | rag_chain_with_history
)

response = rag_chain.invoke("What was the name of the princess?", config=config)
print(response.content)

Parent run e7028ba7-94e9-4f1e-9c57-98ea4b553ed5 not found for run f7b5be19-1be6-4a31-a8c5-0a3d30f99810. Treating as a root run.


Princess Fiona.


Let's try to make the chunk size smaller to reduce costs by providing fewer tokens in context. As you will see, larger chunks do not always give better results.
You have to find a chunk size that is not too big (we don't want our context to be too big) and that is sufficient for our LLM to get an answer from it.

In [24]:
vector_store = Chroma(
    collection_name="shrek_collection_small", embedding_function=embeddings, persist_directory="./chroma_langchain_db"
)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
chunked_documents = text_splitter.split_documents(document)

vector_store.add_documents(chunked_documents)

config = {"configurable": {"session_id": "rag_session_2"}}

retriever = vector_store.as_retriever()
rag_chain = {
    "context": retriever | format_docs,
    "input": RunnablePassthrough(),
} | rag_chain_with_history

response = rag_chain.invoke("What was the name of the princess?", config=config)
print(response.content)

Parent run f612fd2a-248b-4d31-b693-657a23959d43 not found for run 2b26ca12-63e0-4934-938e-d4721f028b14. Treating as a root run.


Princess Fiona.


## Vector storage search types
- similarity (default)
- mmr
- similarity_score_threshold

### Task - set up how many documents (3) should be retrieved from our vector storage using "mmr" search type

Try to set up how many documents should be retrieved from our vector storage (and passed to our prompt). It is another hyperparameter which when well tuned can help us reduce api call costs without losing answear precision. Also - change used search algorithm to "mmr".
[Link to docs](https://python.langchain.com/v0.2/docs/integrations/vectorstores/chroma/#query-directly)

In [25]:
# Place for your code

# Put it all together - Demo with Frontend

In [26]:
import panel as pn

pn.extension()


async def callback(contents: str, user: str, instance: pn.chat.ChatInterface):
    message = ""
    for response in rag_chain.stream(contents, config=config):
        message += response.content
        yield message


config = {"configurable": {"session_id": "rag_session_2"}}
chat_interface = pn.chat.ChatInterface(callback=callback, callback_user="Shrek director")
chat_interface.servable()

# Additional task - use HuggingFace model

In [31]:
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from huggingface_hub import login

HF_TOKEN = os.environ.get("HUGGINGFACEHUB_API_TOKEN")

login(token=HF_TOKEN)

llm = HuggingFacePipeline.from_model_id(
    model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        return_full_text=False,
        do_sample=True,
        repetition_penalty=1.03,
        temperature=0.1
    ),
)

hg_model = ChatHuggingFace(llm=llm)
chain = prompt | hg_model
hg_chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)
rag_chain = {
    "context": retriever | format_docs,
    "input": RunnablePassthrough(),
} | hg_chain_with_history

response = rag_chain.invoke("Who was Shrek's partner when rescuing Fiona?", config=config)
print(response.content)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /Users/wiktorsmura/.cache/huggingface/token
Login successful


Parent run acd4971b-4e15-4e32-bdfd-686be8751834 not found for run 22620778-c886-4e3b-a3f4-d245a2dad4a7. Treating as a root run.


In "Shrek Goes To Town," Shrek is partnered with Donkey when rescuing Fiona. Donkey is Shrek's faithful companion and loyal friend.
