[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb)

In [1]:
# !pip install -qU \
#     langchain==0.0.292 \
#     openai==0.28.0 \
#     datasets==2.10.1 \
#     pinecone-client==2.2.4 \
#     tiktoken==0.5.1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
flagembedding 1.1.8 requires transformers==4.34.0, but you have transformers 4.36.2 which is incompatible.
ragas 0.0.22 requires openai>1, but you have openai 0.28.0 which is incompatible.



We will rely heavily on the LangChain library to bring together the components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation by initializing a `ChatOpenAI` object. This requires an [OpenAI API key](https://platform.openai.com/account/api-keys).

OpenAI Exploration for Amharic Language

In [27]:
import os
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")


In [28]:
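import os
from langchain.chat_models import ChatOpenAI

# Read the key from the environment and fail early with a clear message if it is missing
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise RuntimeError("Set the OPENAI_API_KEY environment variable first")

chat = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model='gpt-3.5-turbo'
)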

Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically structured (in plain text) like this:

```
System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:
```

The final `"Assistant:"` without a response is what would prompt the model to continue the conversation. In the official OpenAI `ChatCompletion` endpoint these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
```
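To make the correspondence between the two representations concrete, here is a minimal sketch that renders a list of role/content dictionaries as the plain-text transcript shown earlier. The names `ROLE_LABELS` and `render_transcript` are hypothetical and are not part of the OpenAI or LangChain APIs; the sketch only illustrates the mapping and makes no claim about how the model consumes the messages internally.

```python
# Hypothetical mapping from API roles to the labels used in the plain-text view
ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}


def render_transcript(messages):
    """Render messages as 'Label: content' blocks, ending with an open 'Assistant:' turn."""
    blocks = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    # The trailing label with no content marks where the model would continue
    blocks.append("Assistant:")
    return "\n\n".join(blocks)


example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."},
]

transcript = render_transcript(example_messages)
print(transcript)
```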

In LangChain there is a slightly different format. We use three _message_ objects, `SystemMessage`, `HumanMessage`, and `AIMessage`, and pass a list of them to `chat`. In response we get another `AIMessage` object, whose text is available in its `content` attribute. Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.
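The append-and-ask-again pattern can be sketched without an API key. The example below is an assumption-laden stand-in: it uses plain dictionaries instead of LangChain message objects, and `fake_chat` and `send` are hypothetical helpers, with `fake_chat` replacing the real model call so that the history bookkeeping is the only thing being shown.

```python
def fake_chat(messages):
    """Stand-in for the real model call: reports how many messages it received."""
    return {"role": "assistant", "content": f"(reply after {len(messages)} messages)"}


def send(history, user_text, chat_fn):
    """Append the user turn, call the model, append its reply, and return the reply."""
    history.append({"role": "user", "content": user_text})
    reply = chat_fn(history)
    # Storing the reply keeps the full conversation available for the next turn
    history.append(reply)
    return reply


history = [{"role": "system", "content": "You are a helpful assistant."}]

first = send(history, "Hi AI, how are you today?", fake_chat)
second = send(history, "I'd like to understand string theory.", fake_chat)

print(first["content"])
print(second["content"])
print(len(history))
```

Each call sees every earlier turn, which is why the second reply is computed from a longer list than the first.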

In [29]:
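from langchain.schema import SystemMessage, HumanMessage, AIMessage

messages = [
    SystemMessage(content="You are a helpful assistant. we make Amharic questions and you make a brief answers in amharic language "),
    HumanMessage(content="ሃይ ሰላም ነው??"),  # "Hi, how are you?"
    AIMessage(content="እግዚአብሔር ይመስገን ሰላም ነኝ? ምን ልርዳዎ??"),  # "Thank God, I am fine. How can I help you?"
    HumanMessage(content="ሰለ ኢትዮጵያ ታሪክ ንገረኝ")  # "Tell me about Ethiopian history"
]
res = chat(messages)
res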

AIMessage(content='ኢትዮጵያ ታሪክ ከአንድ ዓመት በላይ ነው የሚጠራው?')

In [31]:
messages = [
    SystemMessage(content="You are a helpful assistant. we make Amharic questions and you make a brief answers in amharic language "),
    HumanMessage(content="ሃይ ሰላም ነው?"),
    AIMessage(content="እግዚአብሔር ይመስገን ሰላም ነኝ? ምን ልርዳዎ??"),
    HumanMessage(content="ሃይ ሰላም ነው?")
]
res = chat(messages)
res

AIMessage(content='እግዚአብሔር ይመስገን ሰላም ነኝ።')

In [32]:
messages = [
    SystemMessage(content="You are a helpful assistant. we make Amharic questions and you make a brief answers in amharic language "),
    HumanMessage(content="ሃይ ሰላም ነው?"),
    AIMessage(content="እግዚአብሔር ይመስገን ሰላም ነኝ? ምን ልርዳዎ??"),
    HumanMessage(content="tell me two bible verse in amharic?")
]
res = chat(messages)
res

AIMessage(content='እግዚአብሔር ከአንተ ጋር ነው (ሴላም 23:4)\nእግዚአብሔር መልካም ነው (መጽሐፍ ቅዱስ ዮሐንስ 11:28)')

In [33]:
messages = [
    SystemMessage(content="You are a helpful assistant. we make Amharic questions and you make a brief answers in amharic language "),
    HumanMessage(content="ሃይ ሰላም ነው?"),
    AIMessage(content="እግዚአብሔር ይመስገን ሰላም ነኝ? ምን ልርዳዎ??"),
    HumanMessage(content="tell me two ethiopian premier league club  in amharic?")
]
res = chat(messages)
res

AIMessage(content='ሁሉም ኢትዮጵያውያንን የሰላም አሰራር አድራሻ ያለው ስፕራንት ቡጥ ሊገኝ እንደሚችል ይጠቀሙ።')

In [34]:
messages = [
    SystemMessage(content="You are a helpful assistant. we make Amharic questions and you make a brief answers in amharic language "),
    HumanMessage(content="ሃይ ሰላም ነው?"),
    AIMessage(content="እግዚአብሔር ይመስገን ሰላም ነኝ? ምን ልርዳዎ??"),
    HumanMessage(content="tell me two humanitarian law  in amharic?")
]
res = chat(messages)
res

AIMessage(content='ሁሉም ሰው በሁሉም በኩል እንዲሁም በሁሉም በኩል በመጠበቅ ለመስራት የሚደረግ የህብረት ምልክት\nበሰላምና በጥርስ አለው።')

In [35]:
messages = [
    SystemMessage(content="You are a helpful assistant. we make Amharic questions and you make a brief answers in amharic language "),
    HumanMessage(content="ሃይ ሰላም ነው?"),
    AIMessage(content="እግዚአብሔር ይመስገን ሰላም ነኝ? ምን ልርዳዎ??"),
    HumanMessage(content="የኢትዮጵያ መሪ ማን ነው??")
]
res = chat(messages)
res

AIMessage(content='የኢትዮጵያ መሪ አለማየሁሽ ብዙ ሰዎች የሚታየውን ማንም አይነቱን ሊያውቅ ይችላል። አንድነት አይደለም።')

There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and refers to any information fed into the LLM via the prompt. We can try that with a question about a short Amharic passage on the execution of Abune Petros.

In [36]:
amh_information = [
    "ሐምሌ 22 ቀን 1928 ስምንት ወታደሮች በስተጀርባቸው ሃያ እርምጃ ርቀው ተንበርክከው በተንጠቀቅ ቆሙ።\nወዲያው አዛዡ «ተኩስ» በማለት ትእዛዝ ሲሰጥ ስምንቱም ተኩሰው መቷቸው። ግን በስምንት ጥይት ሳይሞቱ ቀሩ። መሞታቸውንና አለመሞታቸውን ለማረጋገጥ ዶክተር ተጠራ። ዶክተሩም እንዳልሞቱ አረጋገጠ። ከዚያም ሌላ ወታደር በሦስት የሽጉጥ ጥይት ራስቅላቸውን መትቶ ገደላቸው።\n\nአቡነ ጴጥሮስ ለእናት ሀገራቸው ኢትዮጵያ ክብር ሲሉ የሞትን ፅዋ ተጎነጩ ።\n\nለዛሬ እኛነታችን ያለፉት ትውልዶች ብዙ ዋጋ ከፍለዋል እና ሊዘከሩ፣ሊመሰገኑና ሊከበሩ ይገባል። የኃላው ከሌለ የለም የፊቱ ! "

]

source_knowledge = "\n".join(amh_information)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [37]:
query = "ሐምሌ 22 ምን ተፈጠረ??"  # "What happened on Hamle 22?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""
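The same prompt template can be wrapped in a small reusable function. This is a minimal sketch, assuming the contexts arrive as a list of strings; `build_augmented_prompt` is a hypothetical helper name, and the English query and contexts are illustrative placeholders rather than data from this notebook.

```python
def build_augmented_prompt(query, contexts):
    """Return a prompt that places the joined contexts ahead of the query."""
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )


# Illustrative placeholder inputs
example_prompt = build_augmented_prompt(
    query="What is the capital of Ethiopia?",
    contexts=[
        "Addis Ababa is the capital of Ethiopia.",
        "Ethiopia is in the Horn of Africa.",
    ],
)
print(example_prompt)
```

Keeping the template in one function makes it easier to reuse the same instructions for every query and to change the wording in a single place.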

Now we feed this into our chatbot as before.

In [38]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [39]:
print(res.content)

ሐምሌ 22 ምን ተፈጠረ? ተፈጠረው የኢትዮጵያ መሪ ነው።
