# Introduction

In this tutorial, we'll demonstrate how to use a sample dataset stored in Azure Cosmos DB for MongoDB vCore to ground OpenAI models. We'll do this by taking advantage of Azure Cosmos DB for MongoDB vCore's [vector similarity search](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/vector-search) functionality. In the end, we'll create an interactive chat session with a GPT-3.5 chat completions model to answer questions about Azure services informed by our dataset. This process is known as Retrieval Augmented Generation, or RAG.

This tutorial borrows some code snippets and example data from the Azure Cognitive Search Vector Search demo.

# Preliminaries <a class="anchor" id="preliminaries"></a>
First, let's start by installing the packages that we'll need later. 

In [None]:
! pip install numpy
! pip install openai
! pip install pymongo
! pip install python-dotenv
! pip install azure-core
! pip install azure-cosmos
! pip install azure-storage-blob
! pip install ijson
! pip install tenacity

In [3]:
import json
import datetime
import time

from azure.core.exceptions import AzureError
from azure.core.credentials import AzureKeyCredential
import pymongo

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, PublicAccess
import ijson

from openai import AzureOpenAI
from dotenv import load_dotenv
from tenacity import retry, wait_random_exponential, stop_after_attempt

Use `example.env` as a template to provide the necessary keys and endpoints in your own .env file.
Then set `env_name` in the cell below to the name of that file.

In [4]:
from dotenv import dotenv_values

# Name of your .env file (create it from the example.env template)
env_name = "fabcondemo.env"  # change to your own .env file name
config = dotenv_values(env_name)


mongo_conn = config['mongo_connection_string']
mongo_database = config['mongo_database_name']
mongo_collection = config['mongo_collection_name']


storage_account_url = config['storage_account_url']
storage_container_name = config['storage_container_name']
storage_file_name = config['storage_file_name']


openai_endpoint = config['openai_endpoint']
openai_key = config['openai_key']
openai_version = config['openai_version']
openai_embeddings_deployment = config['openai_embeddings_deployment']
openai_embeddings_model = config['openai_embeddings_model']
openai_embeddings_dimensions = int(config['openai_embeddings_dimensions'])
openai_completions_deployment = config['openai_completions_deployment']
openai_completions_model = config['openai_completions_model']
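
A missing or blank entry in the .env file only surfaces as a `KeyError` (or a failed connection) at the point where the value is first used. As a minimal sketch, you could check the loaded values up front. The helper name `missing_keys` and the `REQUIRED_KEYS` list are illustrative and not part of the original notebook; the key names themselves are the ones read in the cell above.

```python
# Hypothetical helper: report which required settings are absent or empty.
REQUIRED_KEYS = [
    "mongo_connection_string", "mongo_database_name", "mongo_collection_name",
    "storage_account_url", "storage_container_name", "storage_file_name",
    "openai_endpoint", "openai_key", "openai_version",
    "openai_embeddings_deployment", "openai_embeddings_model",
    "openai_embeddings_dimensions",
    "openai_completions_deployment", "openai_completions_model",
]

def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are missing from config or have empty values."""
    return [key for key in required if not config.get(key)]

# A plain dict stands in here for the result of dotenv_values(env_name).
example_config = {key: "placeholder" for key in REQUIRED_KEYS}
example_config["openai_key"] = ""          # simulate a blank entry
del example_config["storage_file_name"]    # simulate an absent entry
print(missing_keys(example_config))
```

In the notebook you would call `missing_keys(config)` right after `dotenv_values(env_name)` and stop if the returned list is not empty.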


# Connect to and set up Cosmos DB for MongoDB vCore

In [5]:
mongo_client = pymongo.MongoClient(mongo_conn)

##  Set up the DB and collection

In [None]:
## Use only if you are re-running the code and want to reset the database and collection
db = mongo_client[mongo_database]
collection = db[mongo_collection]
collection.drop_index("VectorSearchIndex")
mongo_client.drop_database(mongo_database)

In [6]:
# Get a handle to the database named in the .env file
db = mongo_client[mongo_database]

collection = db[mongo_collection]

# Create collection if it doesn't exist
if mongo_collection not in db.list_collection_names():
    # Creates a collection
    db.create_collection(mongo_collection)
    print("Created collection '{}'.\n".format(mongo_collection))
else:
    print("Using collection: '{}'.\n".format(mongo_collection))

Using collection: 'FabConfCollection'.



## Create the vector index

**IMPORTANT: You can only create one index per vector property.** That is, you cannot create more than one index that points to the same vector property. If you want to change the index type (e.g., from IVF to HNSW) you must drop the index first before creating a new index.
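
The note above mentions IVF as the alternative index type. As a hedged sketch, the helper below builds (but does not run) a `createIndexes` command for an IVF index on the same `contentVector` property. The function name `build_ivf_index_command` is hypothetical, and the option names `vector-ivf` and `numLists` reflect my understanding of the vector search documentation linked in the introduction; please verify them against that page before use. The collection name and the dimension value of 1536 in the example call are placeholders.

```python
def build_ivf_index_command(collection_name, dimensions, num_lists=1, similarity="COS"):
    """Build (but do not execute) a createIndexes command for an IVF vector index."""
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": "VectorSearchIndex",
                "key": {"contentVector": "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-ivf",      # assumed kind name for IVF
                    "numLists": num_lists,     # assumed name for the cluster-count option
                    "similarity": similarity,
                    "dimensions": dimensions,
                },
            }
        ],
    }

# Placeholder values; in the notebook these would come from the .env settings.
command = build_ivf_index_command("FabConfCollection", 1536)
print(command["indexes"][0]["cosmosSearchOptions"])
```

To switch index types in the notebook, you would first drop the existing index with `collection.drop_index("VectorSearchIndex")` and then pass the built command to `db.command(...)`.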

### HNSW (preview)

HNSW stands for Hierarchical Navigable Small World, a graph-based data structure that partitions vectors into clusters and subclusters. With HNSW, you can perform fast approximate nearest neighbor (ANN) search with high accuracy. As a preview feature, this must be enabled using Azure Feature Enablement Control (AFEC) by selecting the "mongoHnswIndex" feature. For more information, see [enable preview features](https://learn.microsoft.com/azure/azure-resource-manager/management/preview-features).

HNSW works on M50 cluster tiers and higher while in preview.

In [None]:
db.command(
{ 
    "createIndexes": mongo_collection,
    "indexes": [
        {
            "name": "VectorSearchIndex",
            "key": {
                "contentVector": "cosmosSearch"
            },
            "cosmosSearchOptions": { 
                "kind": "vector-hnsw", 
                "m": 16, # default value 
                "efConstruction": 64, # default value 
                "similarity": "COS", 
                "dimensions": openai_embeddings_dimensions
            } 
        } 
    ] 
}
)

# Initialize the Azure OpenAI Client <a class="anchor" id="initopenai"></a>


In [7]:
openai_client = AzureOpenAI(azure_endpoint=openai_endpoint, api_key=openai_key, api_version=openai_version)

# Define a function to generate embeddings

In [8]:
#@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(10))
def generate_embeddings(text):
    '''
    Generate embeddings from string of text.
    This will be used to vectorize data and user input for interactions with Azure OpenAI.
    '''
    # The `model` argument takes the name of your Azure OpenAI deployment, not the underlying model name.
    response = openai_client.embeddings.create(input=text, model=openai_embeddings_deployment, dimensions=openai_embeddings_dimensions)
    embeddings = response.model_dump()
    #time.sleep(0.5) # rest period to avoid rate limiting on AOAI for free tier
    return embeddings['data'][0]['embedding']
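
The cell above leaves the `tenacity` `@retry` decorator commented out. To illustrate what such a decorator does, here is a minimal standard-library sketch of retrying with randomized exponential backoff. The names `call_with_backoff` and `flaky_embedding` are hypothetical, and the sketch retries on any exception, whereas in practice you would likely retry only on rate-limit or transient network errors.

```python
import random
import time

def call_with_backoff(func, *args, max_attempts=5, base_delay=1.0, max_delay=20.0, sleep=time.sleep):
    """Call func(*args), retrying on any exception with randomized exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Upper bound doubles each attempt, capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))

# Stand-in for generate_embeddings that fails twice before succeeding.
calls = {"count": 0}

def flaky_embedding(text):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return [0.0, 1.0]

# The no-op sleep keeps the demonstration instant.
result = call_with_backoff(flaky_embedding, "hello", sleep=lambda seconds: None)
print(result, calls["count"])
```

In the notebook, the equivalent call would be `call_with_backoff(generate_embeddings, text)` with the default `sleep`.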

# Ingest, vectorize & store

Read the data out of blob storage, generate a vector for each item, then store the items in Azure Cosmos DB for MongoDB vCore.

In [32]:
# Create a blob service client
blob_service_client = BlobServiceClient(account_url = storage_account_url)

# Create a blob client for the JSON file
blob_client = blob_service_client.get_blob_client(storage_container_name, storage_file_name)

# Download the blob's full contents into memory (readall() returns bytes)
stream = blob_client.download_blob().readall()

# Use ijson to iterate over the elements of the top-level JSON array
objects = ijson.items(stream, 'item')

for obj in objects:
    # serialize the object to a string
    sObject = json.dumps(obj)
    # generate an embedding for each object
    vectorArray = generate_embeddings(sObject)

    # add the embedding to the object
    obj["contentVector"] = vectorArray

    #print(obj)

    # insert the object into the collection
    collection.insert_one(obj)

print("Data inserted into collection: '{}'.\n".format(mongo_collection))

Data inserted into collection: 'FabConfCollection'.



# Vector Search in Cosmos DB for MongoDB vCore

In [9]:
# Simple function to assist with vector search
def vector_search(query, num_results=3):
    # Embed the query with the same deployment used for the stored documents
    query_embedding = generate_embeddings(query)
    pipeline = [
        {
            '$search': {
                "cosmosSearch": {
                    "vector": query_embedding,
                    "path": "contentVector",
                    "k": num_results  # , "efSearch": 40  # optional, HNSW only
                },
                "returnStoredSource": True }},
        {'$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document' : '$$ROOT' } }
    ]
    results = collection.aggregate(pipeline)
    return results

Let's run a test query below.

In [None]:
query = "What are the services for running ML models?"
results = vector_search(query)
for result in results: 
#     print(result)
    print(f"Similarity Score: {result['similarityScore']}")  
    print(f"Title: {result['document']['title']}")  
    print(f"Content: {result['document']['content']}")  
    print(f"Category: {result['document']['category']}\n")  
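
The index in this notebook is created with `"similarity": "COS"`. To make the printed score more concrete, here is a small sketch of cosine similarity computed by hand. I am assuming the reported `similarityScore` corresponds to this measure; the service may compute or scale it differently. The function name `cosine_similarity` and the toy vectors are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 2-dimensional vectors; real embeddings have many more dimensions.
same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Vectors pointing in the same direction score close to 1, unrelated (orthogonal) vectors score close to 0, and opposite vectors score close to -1, regardless of their lengths.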

# Q&A over the data with GPT

Finally, we'll create a helper function to feed prompts into the chat completions model. Then we'll create an interactive loop where you can pose questions to the model and receive information grounded in your data.

In [1]:
# This function sends the system instructions and a prompt string to the chat completions deployment.

def generate_completion(prompt):
    system_prompt = '''
    You are an intelligent assistant for Microsoft Azure services.
    You are designed to provide helpful answers to user questions about Azure services given the information about to be provided.
        - Only answer questions related to the information provided below, provide 3 clear suggestions in a list format.
        - Write two lines of whitespace between each answer in the list.
        - Only provide answers that have products that are part of Microsoft Azure.
        - If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users search themselves.
    '''

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]

    # The `model` argument takes the name of your Azure OpenAI deployment.
    response = openai_client.chat.completions.create(model=openai_completions_deployment, messages=messages)
    return response.model_dump()

In [None]:
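# Create a loop of user input and model output. You can now perform Q&A over the sample data!

print("*** Please ask your model questions about Azure services. Type 'end' to end the session.\n")
user_input = input("Prompt: ")
while user_input.lower() != "end":
    # Retrieve the documents most similar to the question
    results_for_prompt = vector_search(user_input)
    # Turn the retrieved documents into plain text the model can read
    context = "\n\n".join(
        f"Title: {result['document']['title']}\nContent: {result['document']['content']}"
        for result in results_for_prompt
    )
    # Send the retrieved information together with the user's question
    prompt = f"Information:\n{context}\n\nQuestion: {user_input}"
    completions_results = generate_completion(prompt)
    print("\n")
    print(completions_results['choices'][0]['message']['content'])
    user_input = input("Prompt: ")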
