# Semantic Search with Azure Cosmos DB for MongoDB (vCore)

This notebook illustrates native vector search in Azure Cosmos DB for MongoDB (vCore). Native vector search lets you integrate your AI-based applications with the data that is already stored in Azure Cosmos DB for MongoDB vCore. The integration is an all-in-one solution, unlike other vector search solutions that send your data between separate services.
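
To make the idea of vector search concrete before the notebook calls the managed service, here is a minimal pure-Python sketch of ranking texts by cosine similarity. The three-dimensional vectors and the `top_k` helper are hypothetical illustrations; the service uses an approximate index over much larger embeddings rather than this brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy "embeddings"; a real embedding model returns far more dimensions.
corpus = {
    "health plan coverage": [0.9, 0.1, 0.0],
    "vacation policy": [0.1, 0.9, 0.0],
    "office locations": [0.0, 0.1, 0.9],
}
results = top_k([0.8, 0.2, 0.0], corpus, k=2)
print([text for text, _ in results])
```

The query vector points mostly along the first axis, so the text whose vector does the same ranks first.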

In [1]:
# Import Python libraries
import os
import openai
from Utilities.envVars import *

# Set OpenAI API key and endpoint
openai.api_type = "azure"
openai.api_version = OpenAiVersion
openai_api_key = OpenAiKey
assert openai_api_key, "ERROR: Azure OpenAI Key is missing"
openai.api_key = openai_api_key
openAiEndPoint = f"{OpenAiEndPoint}"
openai.api_base = openAiEndPoint

In [2]:
# Import required libraries
from langchain.llms.openai import AzureOpenAI, OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import (
    PDFMinerLoader,
    UnstructuredFileLoader,
    PyPDFLoader, 
    DirectoryLoader
)

In [3]:
# Flexibility to change the call to OpenAI or Azure OpenAI
embeddingModelType = "azureopenai"

In [4]:
# For single pdf 
# fileName = "employee_handbook.pdf"
# pdfFilePath = "Data/Contoso/" + fileName
# Load the PDF with Document Loader available from Langchain
# loader = PDFMinerLoader(pdfFilePath)

# For all files in the directory
loader = DirectoryLoader('Data/Contoso', glob="*.pdf", loader_cls=PyPDFLoader)

rawDocs = loader.load()
textSplitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
documents = textSplitter.split_documents(rawDocs)
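
The `chunk_size` and `chunk_overlap` arguments are easier to reason about with a toy splitter. The sketch below is a simplified fixed-width window and is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs and lines); `split_text` is a hypothetical helper that only shows how the two numbers interact.

```python
def split_text(text, chunk_size=1500, chunk_overlap=0):
    """Naive splitter: take windows of chunk_size characters,
    advancing by chunk_size - chunk_overlap each step."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij"
no_overlap = split_text(sample, chunk_size=4, chunk_overlap=0)
with_overlap = split_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With an overlap of 0, as in the cell above, no text is shared between neighbouring chunks, so a sentence that straddles a boundary is split across two embeddings. A small overlap trades some extra storage for more context at the edges.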

In [5]:
# Number of chunks produced by the splitter
print(len(documents))

515


In [6]:
from langchain.embeddings import AzureOpenAIEmbeddings

openai_embeddings = AzureOpenAIEmbeddings(
    openai_api_version=openai.api_version,
    openai_api_key=openai.api_key,
    azure_deployment = "embedding",
    azure_endpoint=openai.api_base)

In [7]:
# Create a vector index 
from pymongo import MongoClient

from langchain.vectorstores.azure_cosmos_db import AzureCosmosDBVectorSearch,CosmosDBSimilarityType

CONNECTION_STRING = "mongodb+srv://<username>:<password>@rpopenai.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
INDEX_NAME = "contoso-employee-handbook-index1"
NAMESPACE = "contoso-employee-db1.contoso-employee-collection1"
DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")
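
`NAMESPACE.split(".")` works for the namespace above because it contains exactly one dot. MongoDB collection names may themselves contain dots, in which case the two-name unpacking would raise an error. A slightly more defensive sketch splits on the first dot only; `parse_namespace` and the `hr.handbook.v2` namespace are hypothetical.

```python
def parse_namespace(namespace):
    """Split '<database>.<collection>' on the first dot only."""
    db_name, sep, collection_name = namespace.partition(".")
    if not sep or not db_name or not collection_name:
        raise ValueError(f"expected '<database>.<collection>', got {namespace!r}")
    return db_name, collection_name

print(parse_namespace("contoso-employee-db1.contoso-employee-collection1"))
print(parse_namespace("hr.handbook.v2"))  # hypothetical dotted collection name
```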

In [8]:
client: MongoClient = MongoClient(CONNECTION_STRING)
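
The connection string above embeds `<username>` and `<password>` placeholders. When real credentials contain reserved characters such as `@` or `/`, they need to be percent-encoded before being placed in a MongoDB URI. The sketch below shows one way to do that with the standard library; the environment variable names and the fallback values are hypothetical, and the query options simply repeat those used in the notebook.

```python
import os
from urllib.parse import quote_plus

def build_connection_string(user, password, host):
    """Percent-encode the credentials and fill in the URI used in this notebook."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"
        "?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
    )

# COSMOS_USER and COSMOS_PASSWORD are assumed names; the defaults are placeholders.
user = os.environ.get("COSMOS_USER", "demo-user")
password = os.environ.get("COSMOS_PASSWORD", "p@ss/word")
uri = build_connection_string(user, password, "rpopenai.mongocluster.cosmos.azure.com")
# The resulting string can be passed to MongoClient(uri) as in the cell above.
```

Reading the credentials from the environment also keeps them out of the saved notebook.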

In [9]:
collection = client[DB_NAME][COLLECTION_NAME]

In [10]:
print(collection)

Collection(Database(MongoClient(host=['c.rpopenai.mongocluster.cosmos.azure.com:10260'], document_class=dict, tz_aware=False, connect=True, tls=True, authmechanism='SCRAM-SHA-256', retrywrites=False, maxidletimems=120000), 'contoso-employee-db1'), 'contoso-employee-collection1')


In [11]:
vectorstore = AzureCosmosDBVectorSearch.from_documents(
    documents,
    openai_embeddings,
    collection=collection,
    index_name=INDEX_NAME,
)

In [12]:
num_lists = 100
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS

vectorstore.create_index(num_lists, dimensions, similarity_algorithm)

{'raw': {'defaultShard': {'numIndexesBefore': 2,
   'numIndexesAfter': 2,
   'createdCollectionAutomatically': False,
   'note': 'all indexes already exist',
   'ok': 1}},
 'ok': 1}
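
For readers wondering what `create_index` asks the database to do, the sketch below builds the kind of `createIndexes` command document that, to my understanding of the Azure documentation, defines an IVF vector index on Cosmos DB for MongoDB vCore. The helper `ivf_index_command` is hypothetical, and the field names (`cosmosSearchOptions`, `vector-ivf`, and the `vectorContent` embedding key that I believe is the LangChain default) are assumptions to verify against the current docs rather than a copy of the library's implementation.

```python
import json

def ivf_index_command(collection_name, index_name, num_lists, dimensions,
                      similarity="COS", embedding_key="vectorContent"):
    """Build a createIndexes command for an IVF vector index (assumed schema)."""
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": index_name,
                "key": {embedding_key: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-ivf",
                    "numLists": num_lists,      # number of IVF clusters
                    "similarity": similarity,   # "COS" = cosine similarity
                    "dimensions": dimensions,   # must match the embedding length
                },
            }
        ],
    }

command = ivf_index_command(
    "contoso-employee-collection1", "contoso-employee-handbook-index1", 100, 1536
)
print(json.dumps(command, indent=2))
```

The `dimensions` value has to equal the length of the vectors returned by the embedding deployment. The `'all indexes already exist'` note in the output above indicates that this run found the index already in place from an earlier execution.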

In [13]:
from langchain.chains import RetrievalQA
from langchain.memory import RedisChatMessageHistory,MongoDBChatMessageHistory
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI,AzureChatOpenAI
from langchain.agents import ZeroShotAgent, Tool, AgentExecutor,initialize_agent

In [14]:
llm = AzureChatOpenAI(    
    openai_api_version=openai.api_version,
    openai_api_key=openai.api_key,
    azure_deployment = "chat",
    azure_endpoint=openai.api_base)

In [15]:
vector_store = AzureCosmosDBVectorSearch.from_connection_string(
    connection_string=CONNECTION_STRING,
    embedding=openai_embeddings,
    namespace=NAMESPACE,
)

In [16]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(),
    verbose=True
)
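
The `chain_type="stuff"` option means the retrieved chunks are placed ("stuffed") into a single prompt together with the question. The sketch below illustrates that idea only; `build_stuff_prompt` is a hypothetical helper and the prompt wording is not LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved chunks into one context block, then ask."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved chunks standing in for the retriever's results.
chunks = [
    "Northwind Health Plus covers medical, vision, and dental services.",
    "Northwind Standard is a basic plan.",
]
stuff_prompt = build_stuff_prompt("How do the two plans differ?", chunks)
print(stuff_prompt)
```

Because every chunk goes into one prompt, this approach is simple but limited by the model's context window, which is one reason the chunk size chosen during splitting matters.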

In [17]:
# You can create the tool to pass to an agent
tools = [
    Tool(
        name="Contoso Electronics Handbook and benefits Search",
        description="use this tool to search for information related to contoso electronics employee benefits and handbook",
        func=qa.run,
    )
]
sid = "your_session_id"
message_history = MongoDBChatMessageHistory(
    connection_string=CONNECTION_STRING,
    database_name="chatmemorydb",
    collection_name="chatemorycoll",
    session_id=sid,
)
memory = ConversationBufferMemory(
    memory_key="chat_history", return_messages=True, chat_memory=message_history
)

In [18]:
prefix = """Have a conversation with a human, answering the following questions based on LLM knowledge, current events, or the Contoso Electronics Benefit Knowledge Base."""
suffix = """Begin!

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools=tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input", "chat_history", "agent_scratchpad"],
)

In [19]:
agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=tools,
    llm=llm,
    verbose=True,
    prompt=prompt,
    memory=memory)

agent.run(input="what are difference between stand and plus plans offered to contoso employees?")
# agent.run(input="what are the company values for Contoso Electronics?")



> Entering new AgentExecutor chain...
{
    "action": "Contoso Electronics Handbook and benefits Search",
    "action_input": "difference between stand and plus plans"
}

> Entering new RetrievalQA chain...

> Finished chain.

Observation: The main difference between the Northwind Standard and Northwind Health Plus plans is the level of coverage they offer.

Northwind Health Plus is a comprehensive plan that provides coverage for medical, vision, and dental services. It also includes prescription drug coverage, mental health and substance abuse coverage, and coverage for preventive care services. This plan allows you to choose from a variety of in-network providers, including primary care physicians, specialists, hospitals, and pharmacies. It also offers coverage for emergency services, both in-network and out-of-network.

On the other hand, Northwind Standard is a basic plan that provides coverage for medical, vision, and dental

'The main difference between the Northwind Standard and Northwind Health Plus plans offered to Contoso employees is that Northwind Health Plus is a comprehensive plan that provides coverage for medical, vision, and dental services, prescription drugs, mental health and substance abuse services, and coverage for emergency services both in-network and out-of-network. On the other hand, Northwind Standard is a basic plan that provides coverage for medical, vision, and dental services, prescription drugs, and coverage for preventive care services, but does not include coverage for mental health and substance abuse services or emergency services outside of the network.'
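
The trace above shows the agent emitting a JSON object that names a tool (`action`) and the text to send to it (`action_input`). As a rough illustration of that step, the sketch below parses such an object and calls the matching function. The `dispatch` helper and the stand-in search function are hypothetical; the real agent's output parsing is more involved than this.

```python
import json

def dispatch(action_json, tools):
    """Parse the model's JSON action and call the tool it names."""
    action = json.loads(action_json)
    name = action["action"]
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](action["action_input"])

# Stand-in for qa.run; it only echoes the query it receives.
tools_by_name = {
    "Contoso Electronics Handbook and benefits Search": lambda q: f"searched for: {q}",
}
raw_action = json.dumps({
    "action": "Contoso Electronics Handbook and benefits Search",
    "action_input": "difference between stand and plus plans",
})
observation = dispatch(raw_action, tools_by_name)
print(observation)
```

The value returned by the tool becomes the `Observation` that the agent reads before producing its final answer.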