# Movie Chatbot with vCore Azure Cosmos DB for MongoDB and LangChain

In this sample, we'll demonstrate how to build a RAG pattern application using a subset of the MovieLens dataset. The sample uses the MongoDB SDKs to perform vector search and cache the results, and Azure OpenAI to generate embeddings and LLM completions.

There are two implementations in this project: this one, which uses LangChain, and a simple implementation. The simple implementation connects directly to Azure Cosmos DB for MongoDB to perform vector search and cache responses, and directly to Azure OpenAI to generate embeddings and completions. That version requires you to define and build the payloads for LLM generation and to define the RAG pattern request pipeline yourself. The cache must be consulted manually in the pipeline, and responses must also be cached manually.

Vector search uses Azure Cosmos DB for MongoDB's [vector similarity search](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/vector-search) functionality to search over the vectorized movie data as well as the conversation history, which also serves as a cache.
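
To make the idea of vector similarity search concrete, here is a minimal plain-Python sketch that ranks a few toy vectors by cosine similarity to a query vector. The titles, the 3-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are made up for illustration; the service itself uses an index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, documents, k=2):
    """Return (score, title) pairs for the k most similar documents, best first."""
    scored = [(cosine_similarity(query_vector, d["vector"]), d["title"]) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
toy_movies = [
    {"title": "Space adventure", "vector": [0.9, 0.1, 0.0]},
    {"title": "Courtroom drama", "vector": [0.0, 0.2, 0.9]},
    {"title": "Robot comedy", "vector": [0.7, 0.6, 0.1]},
]

results = top_k([1.0, 0.2, 0.0], toy_movies, k=2)
for score, title in results:
    print(f"{score:.3f}  {title}")
```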

At the end, we will create a simple UX using Gradio that lets users type in questions and displays responses generated by a GPT model or served from the cache. The responses will also display an elapsed time so you can see the impact caching has on performance versus generating a response.
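
The effect of caching on elapsed time can be sketched without any external services. The example below is an assumption-laden illustration: it uses an exact-match dictionary instead of a semantic cache, and `slow_generate` is a hypothetical stand-in for retrieval plus an LLM call.

```python
import time

def answer_with_cache(question, cache, generate):
    """Return (answer, source, elapsed_seconds) using a cache-aside lookup."""
    start = time.perf_counter()
    if question in cache:
        answer, source = cache[question], "cache"
    else:
        answer, source = generate(question), "llm"
        cache[question] = answer  # store the response for later requests
    return answer, source, time.perf_counter() - start

def slow_generate(question):
    """Hypothetical stand-in for retrieval plus an LLM completion."""
    time.sleep(0.05)
    return f"Generated answer for: {question}"

demo_cache = {}
first = answer_with_cache("Which movies feature toys?", demo_cache, slow_generate)
second = answer_with_cache("Which movies feature toys?", demo_cache, slow_generate)
print(f"first:  {first[1]} in {first[2]:.4f}s")
print(f"second: {second[1]} in {second[2]:.4f}s")
```

The second call should report `cache` as its source and a much smaller elapsed time, which is the difference the UI is meant to make visible.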

**Important Note**: This sample requires you to have a vCore Azure Cosmos DB for MongoDB account set up. To get started, visit:
- [vCore Azure Cosmos DB for MongoDB Quickstart](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/quickstart-portal)
- [vCore Azure Cosmos DB for MongoDB Vector Search](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/vector-search)

# Preliminaries <a class="anchor" id="preliminaries"></a>
First, let's start by installing the packages that we'll need later. 

In [None]:
# ! pip install gradio
# ! pip install langchain
# ! pip install langchain_community
# ! pip install langchain_openai
# ! pip install openai
# ! pip install pymongo
# ! pip install python-dotenv

In [None]:
from dotenv import dotenv_values
import gradio as gr
from langchain.chains import ConversationalRetrievalChain
from langchain.globals import set_llm_cache
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_community.cache import AzureCosmosDBSemanticCache
from langchain_community.chat_message_histories import MongoDBChatMessageHistory
from langchain_community.vectorstores.azure_cosmos_db import (
    AzureCosmosDBVectorSearch,
    CosmosDBSimilarityType,
    CosmosDBVectorSearchType)
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
import pymongo

Please use the example.env as a template to provide the necessary keys and endpoints in your own .env file.
Make sure to modify the env_name accordingly. 

In [None]:
# Specify the name of your .env file (use example.env as a template)
env_name = "../fabconf.env"  # change to your own .env file name
config = dotenv_values(env_name)
cosmos_conn = config['cosmos_for_mongodb_connection_string']
cosmos_database = config['cosmos_database_name']
cosmos_collection = config['cosmos_collection_name']
cosmos_vector_property = config['cosmos_vector_property_name']
cosmos_semcache = config['cosmos_semcache_collection_name']
cosmos_chat_history = config['cosmos_chathistory_collection_name']

openai_endpoint = config['openai_endpoint']
openai_key = config['openai_key']
openai_api_version = config['openai_api_version']
openai_embeddings_deployment = config['openai_embeddings_deployment']
openai_embeddings_model = config['openai_embeddings_model']
openai_embeddings_dimensions = int(config['openai_embeddings_dimensions'])
openai_completions_deployment = config['openai_completions_deployment']
openai_completions_model = config['openai_completions_model']
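
Because every setting is read with `config[...]`, a missing entry surfaces as a `KeyError` on the first absent key. A small check like the sketch below can report all missing or empty settings at once. The helper `missing_keys` is a hypothetical addition, and the example dictionary holds placeholder values rather than real settings.

```python
REQUIRED_KEYS = [
    "cosmos_for_mongodb_connection_string",
    "cosmos_database_name",
    "cosmos_collection_name",
    "cosmos_vector_property_name",
    "cosmos_semcache_collection_name",
    "cosmos_chathistory_collection_name",
    "openai_endpoint",
    "openai_key",
    "openai_api_version",
    "openai_embeddings_deployment",
    "openai_embeddings_model",
    "openai_embeddings_dimensions",
    "openai_completions_deployment",
    "openai_completions_model",
]

def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in config."""
    return [key for key in required if not config.get(key)]

# Placeholder config with one key removed to show the check in action.
example_config = {key: "placeholder" for key in REQUIRED_KEYS}
del example_config["openai_key"]
print(missing_keys(example_config))
```

In the notebook, the same function could be called on the dictionary returned by `dotenv_values` before any value is read.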

# Azure Cosmos DB for MongoDB Connections

In [None]:
# Establish connection to db
cosmos_client = pymongo.MongoClient(cosmos_conn)

# Get the database
database = cosmos_client[cosmos_database]

# Get the movie collection
movies = database[cosmos_collection]

# Get the cache collection
cache = database[cosmos_semcache]

# Get the chat history collection
chathistory = database[cosmos_chat_history]

# Generate Embeddings with Azure OpenAI

In [None]:
azure_openai_embeddings = AzureOpenAIEmbeddings(
    azure_deployment = openai_embeddings_deployment,
    api_key= openai_key,
    azure_endpoint= openai_endpoint,
    model= openai_embeddings_model,
    dimensions= openai_embeddings_dimensions,
)

In [None]:
azure_openai_embeddings.embed_query("test")

# Configure Vector Search with LangChain

In [None]:
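# Optional: create an HNSW vector index on the collection.
# This call needs `vectorstore`, which is created in the next cell,
# so uncomment and run it only after that cell has run.
# similarity_algorithm = CosmosDBSimilarityType.COS
# kind = CosmosDBVectorSearchType.VECTOR_HNSW
# m=10
# ef_construction = 64
# ef_search = 40
# score_threshold = 0.7
# vectorstore.create_index(
#      0, openai_embeddings_dimensions, similarity_algorithm, kind, m, ef_construction)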

In [None]:
# Build the vector store from the connection string.
# `embedding_key` names the document field that holds the vector and
# `text_key` names the field returned as the document text.
vectorstore = AzureCosmosDBVectorSearch.from_connection_string(
    connection_string=cosmos_conn,
    namespace=cosmos_database + "." + cosmos_collection,
    embedding=azure_openai_embeddings,
    embedding_key=cosmos_vector_property,
    text_key="overview")

In [None]:
vectorstore.similarity_search_with_score("Buzz Lightyear", k=5, score_threshold=0.2)

# Setup RAG, Semantic Caching, and History with your LLM

In [None]:
prompt_template = """
You are an intelligent assistant for movies. You are designed to provide helpful answers to user questions about movies in your database.
You are friendly, helpful, and informative and can be lighthearted. Be concise in your responses, but still friendly.
    - Only answer questions related to the information provided below. Provide at least 3 candidate movie answers in a list.
    - Write two lines of whitespace between each answer in the list.

You can use this context

{context},

or this chat history

{chat_history},

to answer this question.

Question: {question}
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
"""
chatbot_prompt = PromptTemplate(
    template = prompt_template, input_variables = ["context", "question", "chat_history"])
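
To see what the model receives once the placeholders are filled, the substitution can be approximated with plain `str.format`. This is a sketch under assumptions: the template is shortened, the documents and history are placeholder strings, and `fill_prompt` is a hypothetical helper rather than what the chain does internally.

```python
template = (
    "You can use this context\n\n{context}\n\n"
    "or this chat history\n\n{chat_history}\n\n"
    "to answer this question.\n\nQuestion: {question}\n"
)

def fill_prompt(template, context_docs, chat_history, question):
    """Join retrieved documents and history turns, then substitute the placeholders."""
    return template.format(
        context="\n\n".join(context_docs),
        chat_history="\n".join(f"{role}: {text}" for role, text in chat_history),
        question=question,
    )

filled = fill_prompt(
    template,
    context_docs=["Overview of movie A.", "Overview of movie B."],
    chat_history=[("Human", "Hi"), ("AI", "Hello! Ask me about movies.")],
    question="Which movie should I watch?",
)
print(filled)
```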

In [None]:
def prepare_chain():
    # Chat model used for completions; cache=True opts in to the global LLM cache.
    llm = AzureChatOpenAI(
        azure_endpoint=openai_endpoint,
        api_key=openai_key,
        api_version=openai_api_version,
        azure_deployment=openai_completions_deployment,  # deployment name from the .env file
        cache=True,
        n=1)

    # Retriever over the movie collection: up to 5 results above the score threshold.
    retriever = vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 5, "score_threshold": 0.2})

    # RAG chain that stuffs the retrieved documents into the chatbot prompt.
    sem_qa = ConversationalRetrievalChain.from_llm(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
        combine_docs_chain_kwargs={"prompt": chatbot_prompt})

    # Semantic cache settings: only near-identical prompts count as a cache hit.
    similarity_algorithm = CosmosDBSimilarityType.COS
    kind = CosmosDBVectorSearchType.VECTOR_IVF
    num_lists = 1
    score_threshold = 0.99

    sem_cache = AzureCosmosDBSemanticCache(
        cosmosdb_connection_string=cosmos_conn,
        cosmosdb_client=None,
        embedding=azure_openai_embeddings,
        database_name=cosmos_database,
        collection_name=cosmos_semcache,
        similarity=similarity_algorithm,
        num_lists=num_lists,
        kind=kind,
        dimensions=openai_embeddings_dimensions,
        score_threshold=score_threshold)

    # Register the semantic cache as the global LangChain LLM cache.
    set_llm_cache(sem_cache)

    return retriever, llm, sem_qa, sem_cache
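
The high `score_threshold` of 0.99 means a cached response is reused only when a new prompt is almost identical to a stored one. The sketch below illustrates that lookup rule in plain Python. `ToySemanticCache` and the word-count `toy_embed` function are hypothetical illustrations and do not reflect how `AzureCosmosDBSemanticCache` is implemented.

```python
import math

VOCABULARY = ["buzz", "lightyear", "movies", "romantic", "comedy", "space"]

def toy_embed(text):
    """Hypothetical embedding: counts of a few vocabulary words (not a real model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

class ToySemanticCache:
    """Stores (vector, response) pairs and serves the best match above a threshold."""

    def __init__(self, embed, score_threshold):
        self.embed = embed
        self.score_threshold = score_threshold
        self.entries = []

    def update(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

    def lookup(self, prompt):
        query = self.embed(prompt)
        scored = [(cosine(query, vector), response) for vector, response in self.entries]
        if not scored:
            return None
        best_score, best_response = max(scored, key=lambda pair: pair[0])
        return best_response if best_score >= self.score_threshold else None

toy_cache = ToySemanticCache(toy_embed, score_threshold=0.99)
toy_cache.update("movies with buzz lightyear", "cached answer")
hit = toy_cache.lookup("Movies with Buzz Lightyear")  # same words, different casing
miss = toy_cache.lookup("romantic comedy movies")     # different topic
print(hit, miss)
```

Lowering the threshold would let loosely related questions share a cached answer, which trades accuracy for speed.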

In [None]:
retriever, llm, chain, semantic_cache = prepare_chain()

In [None]:
# Clear the semantic cache between tests
cache.drop_indexes()
database.drop_collection(cache)

In [None]:
# test
chain.invoke({'question': "Tell me about movies with Buzz Lightyear.", 'chat_history': []})

# Gradio / UI integration

In [None]:
chatbot_chain_retriever, chatbot_chain_llm, chatbot_chain, chatbot_chain_cache = prepare_chain()

# Clear the semantic cache and chat history between tests
cache.drop_indexes()
database.drop_collection(cache)
database.drop_collection(chathistory)

cosmos_message_history = MongoDBChatMessageHistory(
    session_id = "test_session",
    connection_string = cosmos_conn,
    database_name = cosmos_database,
    collection_name = cosmos_chat_history)

conversational_memory = ConversationBufferMemory(
    chat_memory=cosmos_message_history,
    memory_key='chat_history',
    return_messages=True)

# Load history locally. Grab last 
hist = []
with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("Clear")

    def user(user_message, chat_history):
        # Get response from QA chain
        response = chatbot_chain.invoke({"question": user_message, "chat_history":conversational_memory.buffer_as_messages[-6:]},temperature=0.2)
        # Append user message and response to chat history
        hist.append(["User: "+user_message, "Chatbot: "+response['answer']])
        cosmos_message_history.add_user_message(user_message)
        cosmos_message_history.add_ai_message(response['answer'])
        return gr.update(value=""), hist

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False)
    clear.click(lambda: None, None, chatbot, queue=False)
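
The handler above passes only the last six stored messages to the chain while the UI keeps the full transcript. That windowing logic can be exercised without Gradio or a database, as in this sketch. `make_chat_handler` and `fake_answer` are hypothetical names, and the in-memory lists stand in for the Cosmos DB message history.

```python
def make_chat_handler(answer_fn, window=6):
    """Build a handler that sends only the most recent messages as context."""
    messages = []    # flat (role, text) history, standing in for stored chat history
    transcript = []  # [user, bot] pairs shown in the chat window

    def handle(user_message):
        context = messages[-window:]  # last `window` messages only
        answer = answer_fn(user_message, context)
        messages.extend([("human", user_message), ("ai", answer)])
        transcript.append([user_message, answer])
        return "", transcript  # empty string clears the textbox

    return handle

def fake_answer(question, context):
    """Hypothetical stand-in for the QA chain; reports how much context it saw."""
    return f"Saw {len(context)} earlier messages"

handle = make_chat_handler(fake_answer)
for turn in range(5):
    cleared_box, transcript = handle(f"question {turn}")
print([pair[1] for pair in transcript])
```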


In [None]:
demo.launch(debug=True)

In [None]:
demo.close()