# How to use Azure Cosmos DB for Mongo DB v-Core as vector database for RAG (retrieval augmented generation ) Part 2


we'll demonstrate how to leverage a sample dataset stored in Azure Cosmos DB for MongoDB vCore to ground OpenAI models. We'll do this taking advantage of Azure Cosmos DB for Mongo DB vCore's [vector similarity search](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/vector-search) functionality. In the end, we'll create an interatice chat session with the GPT-3.5 completions model to answer questions about Azure services informed by our dataset. This process is known as Retrieval Augmented Generation, or RAG.




# Preliminaries <a class="anchor" id="preliminaries"></a>
First, let's start by installing the packages that we'll need later. 

In [None]:
! pip install numpy
! pip install openai 
! pip install pymongo
! pip install python-dotenv
! pip install azure-core
! pip install azure-cosmos
! pip install tenacity

In [5]:
import json
import datetime
import time

from azure.core.exceptions import AzureError
from azure.core.credentials import AzureKeyCredential
import pymongo

import openai
from dotenv import load_dotenv
from tenacity import retry, wait_random_exponential, stop_after_attempt
from openai import AzureOpenAI


Please use the example.env as a template to provide the necessary keys and endpoints in your own .env file.
Make sure to modify the env_name accordingly. 

In [6]:
print(openai.VERSION)

1.13.3


In [7]:
from dotenv import dotenv_values

# specify the name of the .env file name 
env_name = "example.env" # following example.env template change to your own .env file name
config = dotenv_values(env_name)

cosmosdb_endpoint = config['cosmos_db_api_endpoint']
cosmosdb_key = config['cosmos_db_api_key']
cosmosdb_connection_str = config['cosmos_db_connection_string']

COSMOS_MONGO_USER = config['cosmos_db_mongo_user']
COSMOS_MONGO_PWD = config['cosmos_db_mongo_pwd']
COSMOS_MONGO_SERVER = config['cosmos_db_mongo_server']

openai.api_type = config['openai_api_type']
openai.api_key = config['openai_api_key']
openai.api_base = config['openai_api_endpoint']
openai.api_version = config['openai_api_version']
embeddings_deployment = config['openai_embeddings_deployment']
completions_deployment = config['openai_completions_deployment']

## Create an Azure Cosmos DB for MongoDB vCore resource<a class="anchor" id="cosmosdb"></a>
Let's start by creating an Azure Cosmos DB for MongoDB vCore Resource following this quick start guide: https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/quickstart-portal

Then copy the connection details (server, user, pwd) into the config.json file.

## Azure OpenAI <a class="anchor" id="azureopenai"></a>

Finally, let's setup our Azure OpenAI resource Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. Once you have access, complete the following steps:

- Create an Azure OpenAI resource following this quickstart: https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource?pivots=web-portal
- Deploy a `completions` and `embeddings` model 
    - For more information on `completions`, go here: https://learn.microsoft.com/azure/ai-services/openai/how-to/completions
    - For more information on `embeddings`, go here: https://learn.microsoft.com/azure/ai-services/openai/how-to/embeddings
- Copy the endpoint, key, deployment names for (embeddings model, completions model) into the config.json file.

# Load data and create embeddings <a class="anchor" id="loaddata"></a>
Here we load a sample dataset containing descriptions of Azure services, then we use Azure OpenAI to create vector embeddings from this data.

In [8]:
# Load text-sample.json data file. Embeddings will need to be generated using the function below.
#data_file = open(file="../../DataSet/AzureServices/text-sample.json", mode="r")

# OR Load text-sample_w_embeddings.json which has embeddings pre-computed
data_file = open(file="./text-sample_w_embeddings.json", mode="r") 
data = json.load(data_file)
data_file.close()

In [9]:
# Take a peek at one data item
print(json.dumps(data[0], indent=2))

{
  "id": "1",
  "title": "Azure App Service",
  "content": "Azure App Service is a fully managed platform for building, deploying, and scaling web apps. You can host web apps, mobile app backends, and RESTful APIs. It supports a variety of programming languages and frameworks, such as .NET, Java, Node.js, Python, and PHP. The service offers built-in auto-scaling and load balancing capabilities. It also provides integration with other Azure services, such as Azure DevOps, GitHub, and Bitbucket.",
  "category": "Web",
  "titleVector": [
    -0.0017071267357096076,
    -0.01391641329973936,
    0.0017036213539540768,
    -0.018410328775644302,
    -0.007154508493840694,
    0.01852250099182129,
    -0.00961179006844759,
    -0.0291648767888546,
    -0.00613093376159668,
    -0.014722653664648533,
    0.011820187792181969,
    0.007168530020862818,
    0.001400404842570424,
    -0.011764101684093475,
    -0.007151003461331129,
    0.01304707583039999,
    0.04686010256409645,
    0.000935

# Creating Open AI Client for Library 1.xx

In [None]:
client = AzureOpenAI(
  api_key =config['openai_api_key'],  
  api_version = "2023-05-15",
  azure_endpoint =config['openai_api_endpoint']
)

In [10]:
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(10))
def generate_embeddings(text):
    '''
    Generate embeddings from string of text.
    This will be used to vectorize data and user input for interactions with Azure OpenAI.
    '''
   
   
    embeddings = client.embeddings.create(input = "test", model="myembeddingmodel").data[0].embedding 
    time.sleep(0.2) # rest period to avoid rate limiting on AOAI for free tier
    return embeddings

In [12]:
# Generate embeddings for title and content fields
n = 0
for item in data:
    n+=1
    title = item['title']
    content = item['content']
    title_embeddings = generate_embeddings(title)
    content_embeddings = generate_embeddings(content)
    item['titleVector'] = title_embeddings
    item['contentVector'] = content_embeddings
    item['@search.action'] = 'upload'
    print("Creating embeddings for item:", n, "/" ,len(data), end='\r')
# Save embeddings to sample_text_w_embeddings.json file
with open("./text-sample_w_embeddings.json", "w") as f:
    json.dump(data, f)

Creating embeddings for item: 108 / 108

# Connect and setup Cosmos DB for MongoDB vCore

## Set up the connection

In [15]:
mongo_conn = "mongodb+srv://"+COSMOS_MONGO_USER+":"+COSMOS_MONGO_PWD+"@"+COSMOS_MONGO_SERVER+"?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
mongo_client = pymongo.MongoClient("mongodb+srv://evoadmin:tech=123456789@techdaysdemolive.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000")

##  Set up the DB and collection

In [16]:
# create a database called TutorialDB
db = mongo_client['ExampleDB']

# Create collection if it doesn't exist
COLLECTION_NAME = "ExampleCollection"

collection = db[COLLECTION_NAME]

if COLLECTION_NAME not in db.list_collection_names():
    # Creates a unsharded collection that uses the DBs shared throughput
    db.create_collection(COLLECTION_NAME)
    print("Created collection '{}'.\n".format(COLLECTION_NAME))
else:
    print("Using collection: '{}'.\n".format(COLLECTION_NAME))

Using collection: 'ExampleCollection'.



In [None]:
## Use only if re-reunning code and want to reset db and collection
collection.drop_index("VectorSearchIndex")
mongo_client.drop_database("ExampleDB")

## Create the vector index

**IMPORTANT: You can only create one index per vector property.** That is, you cannot create more than one index that points to the same vector property. If you want to change the index type (e.g., from IVF to HNSW) you must drop the index first before creating a new index.

### IVF
IVF is the default vector indexing algorithm, which works on all cluster tiers. It's an approximate nerarest neighbors (ANN) approach that uses clustering to speeding up the search for similar vectors in a dataset. 

In [17]:
db.command({
  'createIndexes': 'ExampleCollection',
  'indexes': [
    {
      'name': 'VectorSearchIndex',
      'key': {
        "contentVector": "cosmosSearch"
      },
      'cosmosSearchOptions': {
        'kind': 'vector-ivf',
        'numLists': 1,
        'similarity': 'COS',
        'dimensions': 1536
      }
    }
  ]
})

{'raw': {'defaultShard': {'numIndexesBefore': 2,
   'numIndexesAfter': 2,
   'createdCollectionAutomatically': False,
   'note': 'all indexes already exist',
   'ok': 1}},
 'ok': 1}

### HNSW (preview)

NSW is a graph-based data structure that organizes vectors into clusters and subclusters. It facilitates fast approximate nearest neighbor search, achieving higher speeds with improved accuracy. As a preview feature, you can enable HNSW using Azure Feature Enablement Control (AFEC) by selecting the “mongoHnswIndex” feature. For detailed instructions, refer to the enable preview features documentation.

Keep in mind that HNSW operates on M50 cluster tiers and higher while in preview. 🚀

In [None]:
db.command(
{ 
    "createIndexes": "ExampleCollection",
    "indexes": [
        {
            "name": "VectorSearchIndex",
            "key": {
                "contentVector": "cosmosSearch"
            },
            "cosmosSearchOptions": { 
                "kind": "vector-hnsw", 
                "m": 16, # default value 
                "efConstruction": 64, # default value 
                "similarity": "COS", 
                "dimensions": 1536
            } 
        } 
    ] 
}
)

## Upload data to the collection
A simple `insert_many()` to insert our data in JSON format into the newly created DB and collection.

In [19]:
collection.insert_many(data)

InsertManyResult([ObjectId('65e4601100f73962b871a23e'), ObjectId('65e4601100f73962b871a23f'), ObjectId('65e4601100f73962b871a240'), ObjectId('65e4601100f73962b871a241'), ObjectId('65e4601100f73962b871a242'), ObjectId('65e4601100f73962b871a243'), ObjectId('65e4601100f73962b871a244'), ObjectId('65e4601100f73962b871a245'), ObjectId('65e4601100f73962b871a246'), ObjectId('65e4601100f73962b871a247'), ObjectId('65e4601100f73962b871a248'), ObjectId('65e4601100f73962b871a249'), ObjectId('65e4601100f73962b871a24a'), ObjectId('65e4601100f73962b871a24b'), ObjectId('65e4601100f73962b871a24c'), ObjectId('65e4601100f73962b871a24d'), ObjectId('65e4601100f73962b871a24e'), ObjectId('65e4601100f73962b871a24f'), ObjectId('65e4601100f73962b871a250'), ObjectId('65e4601100f73962b871a251'), ObjectId('65e4601100f73962b871a252'), ObjectId('65e4601100f73962b871a253'), ObjectId('65e4601100f73962b871a254'), ObjectId('65e4601100f73962b871a255'), ObjectId('65e4601100f73962b871a256'), ObjectId('65e4601100f73962b871a2

# Vector Search in Cosmos DB for MongoDB vCore

In [20]:
# Simple function to assist with vector search
def vector_search(query, num_results=3):
    query_embedding = generate_embeddings(query)
    embeddings_list = []
    pipeline = [
        {
            '$search': {
                "cosmosSearch": {
                    "vector": query_embedding,
                    "path": "contentVector",
                    "k": num_results #, "efsearch": 40 # optional for HNSW only 
                },
                "returnStoredSource": True }},
        {'$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document' : '$$ROOT' } }
    ]
    results = collection.aggregate(pipeline)
    return results

Let's run a test query below.

In [21]:
query = "What are the services for running ML models?"
results = vector_search(query)
for result in results: 
#     print(result)
    print(f"Similarity Score: {result['similarityScore']}")  
    print(f"Title: {result['document']['title']}")  
    print(f"Content: {result['document']['content']}")  
    print(f"Category: {result['document']['category']}\n")  

Similarity Score: 1.0
Title: Azure Functions
Content: Azure Functions is a serverless compute service that enables you to run code on-demand without having to manage infrastructure. It allows you to build and deploy event-driven applications that automatically scale with your workload. Functions support various languages, including C#, F#, Node.js, Python, and Java. It offers a variety of triggers and bindings to integrate with other Azure services and external services. You only pay for the compute time you consume.
Category: Compute

Similarity Score: 1.0
Title: Azure Cognitive Services
Content: Azure Cognitive Services are a set of AI services that enable you to build intelligent applications with powerful algorithms using just a few lines of code. These services cover a wide range of capabilities, including vision, speech, language, knowledge, and search. They are designed to be easy to use and integrate into your applications. Cognitive Services are fully managed, scalable, and co

# Q&A over the data with GPT-3.5

Finally, we'll create a helper function to feed prompts into the `Completions` model. Then we'll create interactive loop where you can pose questions to the model and receive information grounded in your data.

In [29]:
#This function helps to ground the model with prompts and system instructions.

def generate_completion(prompt):
    system_prompt = '''
    You are an intelligent assistant for Microsoft Azure services.
    You are designed to provide helpful answers to user questions about Azure services given the information about to be provided.
        - Only answer questions related to the information provided below, provide 3 clear suggestions in a list format.
        - Write two lines of whitespace between each answer in the list.
        - Only provide answers that have products that are part of Microsoft Azure.
        - If you're unsure of an answer, you can say ""I don't know"" or ""I'm not sure"" and recommend users search themselves."
    '''

    listmessages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

    for item in results:
        listmessages.append({"role": "system", "content": prompt['content']})
    
            
    response = client.chat.completions.create(
    model="demo35", # model = "deployment_name".
    messages=listmessages
)
    
    return response.choices[0].message.content

In [30]:
generate_completion("Where i can host container in azure")

"I'm sorry, but I need some more information or a specific question in order to provide a helpful answer about Azure services."

In [32]:
# Create a loop of user input and model output. You can now perform Q&A over the sample data!

user_input = ""
print("*** Please ask your model questions about Azure services. Type 'end' to end the session.\n")
user_input = input("Prompt: ")
while user_input.lower() != "end":
    results_for_prompt = vector_search(user_input)
   # print(f"User Prompt: {user_input}")
    completions_results = generate_completion(results_for_prompt)
    print("\n")
    print(completions_results)
    user_input = input("Prompt: ")


*** Please ask your model questions about Azure services. Type 'end' to end the session.



1. Azure Machine Learning: Azure Machine Learning is a cloud-based machine learning service that allows you to build, deploy, and manage machine learning models at scale. It provides a complete set of tools and services for data preparation, model training, and model deployment.

2. Azure Databricks: Azure Databricks is an Apache Spark-based analytics platform that helps you build and deploy large-scale machine learning models. It provides a collaborative environment for data engineers, data scientists, and machine learning engineers to work together on ML projects.

3. Azure Cognitive Services: Azure Cognitive Services are pre-built AI models that allow you to add intelligence to your applications without having to build and train your own models. They offer a wide range of services including computer vision, natural language processing, and speech recognition that can be used for running ML mo