# Chatbot with Memory

Let's add some history to the interaction and build a chatbot. Contrary to what many people think, LLMs are fixed in their state. They are trained up to a certain cutoff date and know nothing after that point unless you feed them current information. For the same reason, LLMs do not remember anything about you or the prompts you send to the model. If the model seems to remember you and what you said, it is always because the application you are using (e.g. ChatGPT or the chat function in SAP AI Launchpad) sends the chat history to the model as context with each request.
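
To make the point concrete before touching the SDK, here is a minimal, library-free sketch. `fake_model`, `send` and `keep_history` are hypothetical names invented for this illustration; `fake_model` is only a stand-in for a real LLM call and can see nothing except the messages passed to it.

```python
from typing import Dict, List


def fake_model(messages: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for an LLM call: it only sees `messages`."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    if len(user_turns) > 1:
        return f"Your first message was: {user_turns[0]!r}"
    return "I have no earlier messages from you."


# The history lives in the application, not in the model.
history: List[Dict[str, str]] = []


def send(text: str, keep_history: bool) -> str:
    """Send one user message; the application decides whether to include history."""
    message = {"role": "user", "content": text}
    context = history + [message] if keep_history else [message]
    reply = fake_model(context)
    history.append(message)
    history.append({"role": "assistant", "content": reply})
    return reply


send("My name is Ada.", keep_history=True)
with_memory = send("What did I say first?", keep_history=True)
without_memory = send("What did I say first?", keep_history=False)
print(with_memory)
print(without_memory)
```

The only difference between the last two calls is whether the application passes the earlier messages along. The stand-in model itself never stores anything.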

Below you can find a simple implementation of a chatbot with memory.

The code in this exercise is based on the [help documentation](https://help.sap.com/doc/generative-ai-hub-sdk/CLOUD/en-US/_reference/orchestration-service.html) of the Generative AI Hub Python SDK.

In [None]:
import os
import json
import variables
from typing import List

with open('/home/user/projects/generative-ai-codejam/.aicore-config.json', 'r') as config_file:
    config_data = json.load(config_file)

os.environ["AICORE_AUTH_URL"]=config_data["url"]+"/oauth/token"
os.environ["AICORE_CLIENT_ID"]=config_data["clientid"]
os.environ["AICORE_CLIENT_SECRET"]=config_data["clientsecret"]
os.environ["AICORE_BASE_URL"]=config_data["serviceurls"]["AI_API_URL"]

os.environ["AICORE_RESOURCE_GROUP"]=variables.RESOURCE_GROUP
os.environ["AICORE_ORCHESTRATION_DEPLOYMENT_URL"] = variables.AICORE_ORCHESTRATION_DEPLOYMENT_URL

In [None]:
from gen_ai_hub.orchestration.models.llm import LLM
from gen_ai_hub.orchestration.models.message import Message, SystemMessage, UserMessage
from gen_ai_hub.orchestration.models.template import Template, TemplateValue
from gen_ai_hub.orchestration.models.config import OrchestrationConfig
from gen_ai_hub.orchestration.service import OrchestrationService

llm = LLM(
    name="gpt-4o-mini",
    version="latest",
    parameters={"max_tokens": 500, "temperature": 1},
)

template = Template(
    messages=[
        SystemMessage("You are a helpful translation assistant."),
        UserMessage(
            "Translate the following text to {{?to_lang}}: {{?text}}",
        ),
    ],
    defaults=[
        TemplateValue(name="to_lang", value="English"),
    ],
)

config = OrchestrationConfig(
    template=template,
    llm=llm,
)

orchestration_service = OrchestrationService(
    api_url=os.environ["AICORE_ORCHESTRATION_DEPLOYMENT_URL"],
    config=config,
)

In [None]:
class ChatBot:
    def __init__(self, orchestration_service: OrchestrationService):
        self.service = orchestration_service
        self.config = OrchestrationConfig(
            template=Template(
                messages=[
                    SystemMessage("You are a helpful chatbot assistant."),
                    UserMessage("{{?user_query}}"),
                ],
            ),
            llm=LLM(name="gpt-4o"),
        )
        self.history: List[Message] = []

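    def chat(self, user_input: str) -> str:
        # Store the user's message so that later turns can see it.
        self.history.append(Message(role="user", content=user_input))
        response = self.service.run(
            config=self.config,
            template_values=[
                # The model is stateless: the whole history (as a string) is
                # sent along with the new input on every single call.
                TemplateValue(name="user_query", value=user_input+str(self.history)),
            ]
        )

        # The first choice holds the assistant's reply.
        bot_message = response.orchestration_result.choices[0].message

        # Store the reply as well, so the next turn contains both sides.
        self.history.append(Message(role="assistant", content=bot_message.content))

        return bot_message.content
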
    def reset(self):
        self.history = []
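
The `chat` method above passes `str(self.history)` to the model, which pastes the Python representation of the message list into the prompt. As a rough sketch of a more readable alternative, the history could be rendered as a plain transcript first. `ChatMessage`, `format_history` and `build_prompt` are hypothetical names for this illustration, and the sketch assumes that messages expose `role` and `content` as plain strings, which may not match the SDK's `Message` class exactly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChatMessage:
    """Hypothetical stand-in with `role` and `content` fields."""
    role: str
    content: str


def format_history(history: List[ChatMessage]) -> str:
    """Render the history as one 'role: content' line per message."""
    return "\n".join(f"{m.role}: {m.content}" for m in history)


def build_prompt(user_input: str, history: List[ChatMessage]) -> str:
    """Combine the earlier turns and the new input into one prompt string."""
    transcript = format_history(history)
    if not transcript:
        # Nothing to remember yet, so send the input on its own.
        return user_input
    return f"Conversation so far:\n{transcript}\n\nCurrent message: {user_input}"


earlier = [ChatMessage("user", "Hi"), ChatMessage("assistant", "Hello!")]
print(build_prompt("Can you remember what I first asked you?", earlier))
```

In this sketch the prompt would be built before the new user message is appended to the history, so the current input appears only once.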

In [None]:
bot = ChatBot(orchestration_service=orchestration_service)
print(bot.chat("Hello, how are you?"))

In [None]:
print(bot.chat("What's the weather like today?"))
bot.history

In [None]:
print(bot.chat("Can you remember what I first asked you?"))

And to prove to you that the model really does not remember you, let's delete the history and try again :)

In [None]:
bot.reset()
print(bot.chat("Can you remember what I first asked you?"))

# Streaming
For very long outputs, streaming is a powerful addition to make sure your users do not get bored waiting for the result! Streaming lets you print the output as it is created instead of waiting for the model to finish the entire response and then sending it all at once.

In [None]:
from gen_ai_hub.proxy.native.openai import chat

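def stream_openai(prompt, model_name='gpt-4o'):
    """Send the prompt and print the answer piece by piece as it arrives."""
    messages = [
        {"role": "system", "content": "You love to write poems."},
        {"role": "user", "content": prompt}
    ]

    # stream=True makes the call return an iterator of chunks
    # instead of one finished response.
    kwargs = dict(model_name=model_name, messages=messages, max_tokens=500, stream=True)
    stream = chat.completions.create(**kwargs)

    for chunk in stream:
        # Some chunks carry no choices or no text, so check before printing.
        if chunk.choices:
            content = chunk.choices[0].delta.content
            if content:
                # flush=True shows each piece immediately instead of buffering it.
                print(content, end='', flush=True)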

In [None]:
stream_openai("Why is the sky blue?")
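
`stream_openai` only prints the pieces. If you also want the complete text afterwards, for example to store it in a chat history, the pieces can be collected while they are printed. The sketch below runs without any model call: `fake_stream` and `print_and_collect` are hypothetical helpers, and the fake chunks only imitate the `chunk.choices[0].delta.content` shape that the loop above reads.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def fake_stream(pieces: Iterable[str]) -> Iterator[SimpleNamespace]:
    """Hypothetical stand-in yielding chunks shaped like the ones read above."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
    # A final chunk without text, which the consumer has to tolerate.
    empty = SimpleNamespace(content=None)
    yield SimpleNamespace(choices=[SimpleNamespace(delta=empty)])


def print_and_collect(stream: Iterable[SimpleNamespace]) -> str:
    """Print each piece as it arrives and return the full text at the end."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            parts.append(content)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


full_text = print_and_collect(fake_stream(["The sky ", "is ", "blue."]))
```

With a real stream, the same loop would work on the object returned by `chat.completions.create(...)`, assuming its chunks have the shape used in `stream_openai`.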

[Next exercise - OPTIONAL](11-semantic-chunking.ipynb)