# Introduction

In this tutorial, we'll demonstrate how to use a sample dataset stored in Azure Cosmos DB for MongoDB vCore to ground OpenAI models. We'll do this by taking advantage of Azure Cosmos DB for MongoDB vCore's [vector similarity search](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/vector-search) functionality. In the end, we'll create an interactive chat session with the GPT-3.5 completions model to answer questions about Azure services informed by our dataset. This process is known as Retrieval Augmented Generation, or RAG.

This tutorial borrows some code snippets and example data from the Azure Cognitive Search Vector Search demo.

# Preliminaries <a class="anchor" id="preliminaries"></a>
First, let's start by installing the packages that we'll need later. 

In [None]:
! pip install numpy
! pip install openai
! pip install pymongo
! pip install python-dotenv
! pip install azure-core
! pip install azure-cosmos
! pip install langchain-community
! pip install requests

In [1]:
import json
import datetime
import time
import urllib.parse

from azure.core.exceptions import AzureError
from azure.core.credentials import AzureKeyCredential
import pymongo

from openai import AzureOpenAI
from dotenv import load_dotenv

Please use the example.env as a template to provide the necessary keys and endpoints in your own .env file.
Make sure to modify the env_name accordingly. 

In [2]:
from dotenv import dotenv_values

# Specify the name of the .env file
env_name = "example.env" # follows the example.env template; change to your own .env file name
config = dotenv_values(env_name)

COSMOS_MONGO_USER = config['cosmos_db_mongo_user']
COSMOS_MONGO_PWD = config['cosmos_db_mongo_pwd']
COSMOS_MONGO_SERVER = config['cosmos_db_mongo_server']

AOAI_client = AzureOpenAI(api_key=config['openai_api_key'], azure_endpoint=config['openai_api_endpoint'], api_version=config['openai_api_version'],)
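
Before going further, it can help to confirm that the .env file defines every key the cell above reads. The following is only a sketch of one way to do that: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, the key list is simply the set of keys referenced in that cell, and the sample values are placeholders.

```python
# Keys read from the .env file by the configuration cell above.
REQUIRED_KEYS = [
    "cosmos_db_mongo_user",
    "cosmos_db_mongo_pwd",
    "cosmos_db_mongo_server",
    "openai_api_key",
    "openai_api_endpoint",
    "openai_api_version",
]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in a config mapping."""
    return [key for key in required if not config.get(key)]


# Hypothetical, partially filled configuration (placeholder values only).
sample_config = {
    "cosmos_db_mongo_user": "demo_user",
    "cosmos_db_mongo_pwd": "",
    "cosmos_db_mongo_server": "demo.example.com",
}
print(missing_keys(sample_config))
```

Calling `missing_keys(config)` right after `dotenv_values(env_name)` would surface a forgotten entry before any connection is attempted.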

## Create an Azure Cosmos DB for MongoDB vCore resource<a class="anchor" id="cosmosdb"></a>
Let's start by creating an Azure Cosmos DB for MongoDB vCore Resource following this quick start guide: https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/quickstart-portal

Then copy the connection details (server, user, pwd) into your .env file.

## Azure OpenAI <a class="anchor" id="azureopenai"></a>

Finally, let's set up our Azure OpenAI resource. Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access. Once you have access, complete the following steps:

- Create an Azure OpenAI resource following this quickstart: https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource?pivots=web-portal
- Deploy a `completions` and `embeddings` model 
    - For more information on `completions`, go here: https://learn.microsoft.com/azure/ai-services/openai/how-to/completions
    - For more information on `embeddings`, go here: https://learn.microsoft.com/azure/ai-services/openai/how-to/embeddings
- Copy the endpoint, key, and deployment names (embeddings model, completions model) into your .env file.

# Load data and create embeddings <a class="anchor" id="loaddata"></a>
Here we load a sample dataset containing descriptions of Azure services, then we use Azure OpenAI to create vector embeddings from this data.

In [3]:
# Load text-sample_w_embeddings.json, which has embeddings pre-computed
data_path = "../../DataSet/AzureServices/text-sample_w_embeddings.json"

# OR load text-sample.json. Embeddings will then need to be generated using the function below.
# data_path = "../../DataSet/AzureServices/text-sample.json"

with open(data_path, mode="r") as data_file:
    data = json.load(data_file)

In [4]:
# Take a peek at one data item
print(json.dumps(data[0], indent=2))

{
  "id": "1",
  "title": "Azure App Service",
  "content": "Azure App Service is a fully managed platform for building, deploying, and scaling web apps. You can host web apps, mobile app backends, and RESTful APIs. It supports a variety of programming languages and frameworks, such as .NET, Java, Node.js, Python, and PHP. The service offers built-in auto-scaling and load balancing capabilities. It also provides integration with other Azure services, such as Azure DevOps, GitHub, and Bitbucket.",
  "category": "Web",
  "titleVector": [
    -0.010636103339493275,
    -0.021644677966833115,
    0.0019778874702751637,
    -0.014540146104991436,
    -0.021975763142108917,
    0.011774207465350628,
    -0.026569565758109093,
    -0.008615105412900448,
    0.013195114210247993,
    -0.025355586782097816,
    0.014636713080108166,
    -0.01378141064196825,
    0.005511184688657522,
    0.0016166255809366703,
    -0.023838115856051445,
    0.014595326967537403,
    0.014305627904832363,
    0.

In [5]:

from langchain_community.embeddings import OllamaEmbeddings


In [8]:
ollama_emb = OllamaEmbeddings(
    model="llama3",
)
r2 = ollama_emb.embed_query(
    "What is the second letter of Greek alphabet"
)
len(r2)

4096

In [9]:
def generate_embeddings(text):
    '''
    Generate embeddings from a string of text.
    This is used to vectorize both the data and the user input.
    '''
    # Azure OpenAI alternative, left commented out in this run:
    # response = AOAI_client.embeddings.create(input=text, model=config['openai_embeddings_deployment'])
    # embeddings = response.model_dump()
    # time.sleep(0.5)
    ollama_emb = OllamaEmbeddings(
        model="llama3"
    )
    r2 = ollama_emb.embed_query(text)
    # llama3 returns 4096 values (see above); keep the first 1536 to match the index 'dimensions' setting
    return r2[:1536]
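
The slice at the end of `generate_embeddings` silently shortens each vector to 1536 values. A slightly more defensive variant could make that step explicit and fail loudly when a vector is too short. This is only a sketch: `fit_dimensions` is a hypothetical helper, and the all-zero vector stands in for a real embedding. Truncation discards information, and stored vectors and query vectors need to be cut the same way for the search to be meaningful.

```python
def fit_dimensions(vector, dims=1536):
    """Truncate a vector to `dims` entries, raising if it is too short."""
    if len(vector) < dims:
        raise ValueError(f"expected at least {dims} values, got {len(vector)}")
    return list(vector[:dims])


# Stand-in for a 4096-dimensional embedding (placeholder values, not real model output).
fake_embedding = [0.0] * 4096
print(len(fit_dimensions(fake_embedding)))
```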


In [10]:
# Generate embeddings for title and content fields
n = 0
for item in data:
    n+=1
    title = item['title']
    content = item['content']
    title_embeddings = generate_embeddings(title)
    content_embeddings = generate_embeddings(content)
    item['titleVector'] = title_embeddings
    item['contentVector'] = content_embeddings
    item['@search.action'] = 'upload'
    print("Creating embeddings for item:", n, "/" ,len(data), end='\r')
# Save embeddings to text-sample_w_embeddings.json file
with open("../../DataSet/AzureServices/text-sample_w_embeddings.json", "w") as f:
    json.dump(data, f)

Creating embeddings for item: 108 / 108

In [None]:
# Take a peek at one data item with embeddings created
print(json.dumps(data[0], indent=2))

# Connect and set up Cosmos DB for MongoDB vCore

## Set up the connection

In [11]:
# Percent-encode the credentials so reserved characters such as '/' or '@' don't break the URI
mongo_conn = "mongodb+srv://"+urllib.parse.quote_plus(COSMOS_MONGO_USER)+":"+urllib.parse.quote_plus(COSMOS_MONGO_PWD)+"@"+COSMOS_MONGO_SERVER+"?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
mongo_client = pymongo.MongoClient(mongo_conn)
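
To see why the credentials are percent-encoded before being placed in the URI, the sketch below assembles the same connection string from hypothetical values. `build_mongo_uri` is an illustrative helper, not part of PyMongo, and the user, password, and server shown are made up.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, server):
    """Assemble a mongodb+srv URI with percent-encoded credentials."""
    options = "tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{server}?{options}"


# Hypothetical credentials containing characters that are reserved in URIs.
uri = build_mongo_uri("demo@user", "p@ss/word", "demo.example.com")
print(uri)
```

Without the encoding, the extra `@` and `/` characters would make it ambiguous where the credentials end and the host name begins.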



##  Set up the DB and collection

In [12]:
# Create a database called ExampleDB
db = mongo_client['ExampleDB']

# Create collection if it doesn't exist
COLLECTION_NAME = "ExampleCollection"

collection = db[COLLECTION_NAME]

if COLLECTION_NAME not in db.list_collection_names():
    # Creates an unsharded collection that uses the DB's shared throughput
    db.create_collection(COLLECTION_NAME)
    print("Created collection '{}'.\n".format(COLLECTION_NAME))
else:
    print("Using collection: '{}'.\n".format(COLLECTION_NAME))

Using collection: 'ExampleCollection'.



In [None]:
# Use only if re-running the code and you want to reset the db and collection
collection.drop_indexes()
mongo_client.drop_database("ExampleDB")

## Create the vector index

**IMPORTANT: You can only create one index per vector property.** That is, you cannot create more than one index that points to the same vector property. If you want to change the index type (e.g., from IVF to HNSW) you must drop the index first before creating a new index.
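
The drop-then-create sequence for switching index types could be sketched as follows. This is an assumption-laden illustration: `switch_index_commands` is a hypothetical helper that only builds the two command documents, and it assumes the standard MongoDB `dropIndexes` command is accepted through `db.command()`. PyMongo's `collection.drop_index(name)` would be an alternative for the first step.

```python
def switch_index_commands(collection_name, index_name, new_options, path="contentVector"):
    """Return the two commands, in order, for replacing a vector index."""
    drop = {"dropIndexes": collection_name, "index": index_name}
    create = {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": index_name,
                "key": {path: "cosmosSearch"},
                "cosmosSearchOptions": new_options,
            }
        ],
    }
    return [drop, create]


hnsw_options = {"kind": "vector-hnsw", "m": 16, "efConstruction": 64,
                "similarity": "COS", "dimensions": 1536}
commands = switch_index_commands("ExampleCollection", "VectorSearchIndex", hnsw_options)
for command in commands:
    # With a live connection, each command would be run as db.command(command).
    print(list(command)[0])
```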

### IVF index
IVF is an approximate nearest neighbors (ANN) approach that uses clustering to speed up the search for similar vectors in a dataset. It's a good choice for proofs of concept and smaller datasets (under a few thousand documents). However, it's not recommended at scale or when higher throughput is needed.

IVF is supported on all cluster tiers, including the free tier. 
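
To make the clustering idea concrete, here is a toy, pure-Python illustration of an inverted-file search. It is not how Cosmos DB implements IVF: the centroids are picked by hand (a real index would learn them from the data), and `build_lists`, `ivf_search`, and `n_probe` are hypothetical names used only for this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_lists(vectors, centroids):
    """Assign each vector index to the list of its most similar centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
        lists[best].append(idx)
    return lists


def ivf_search(query, vectors, centroids, lists, k=1, n_probe=1):
    """Score only the vectors stored in the n_probe lists closest to the query."""
    by_closeness = sorted(range(len(centroids)),
                          key=lambda c: cosine(query, centroids[c]), reverse=True)
    candidates = [idx for c in by_closeness[:n_probe] for idx in lists[c]]
    ranked = sorted(candidates, key=lambda idx: cosine(query, vectors[idx]), reverse=True)
    return ranked[:k]


vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked for the example
lists = build_lists(vectors, centroids)
print(lists)
print(ivf_search([0.8, 0.3], vectors, centroids, lists, k=2))
```

Only the vectors in the probed list are scored, which is where the speed-up comes from. With a single list, every vector falls into the same list and the search degenerates into a full scan.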

In [None]:
db.command({
  'createIndexes': 'ExampleCollection',
  'indexes': [
    {
      'name': 'VectorSearchIndex',
      'key': {
        "contentVector": "cosmosSearch"
      },
      'cosmosSearchOptions': {
        'kind': 'vector-ivf',
        'numLists': 1,
        'similarity': 'COS',
        'dimensions': 1536
      }
    }
  ]
})

### HNSW Index

HNSW stands for Hierarchical Navigable Small World, a graph-based index that partitions vectors into clusters and subclusters. With HNSW, you can perform approximate nearest neighbor search at higher speeds with greater accuracy. HNSW is available on M40 and higher cluster tiers.

In [None]:
db.command({ 
    "createIndexes": "ExampleCollection",
    "indexes": [
        {
            "name": "VectorSearchIndex",
            "key": {
                "contentVector": "cosmosSearch"
            },
            "cosmosSearchOptions": { 
                "kind": "vector-hnsw", 
                "m": 16, # default value 
                "efConstruction": 64, # default value 
                "similarity": "COS", 
                "dimensions": 1536
            } 
        } 
    ] 
}
)

## Upload data to the collection
A simple `insert_many()` call inserts our data in JSON format into the newly created DB and collection.

In [13]:
collection.insert_many(data)

InsertManyResult([ObjectId('664b83ddab8c9066e10d8e17'), ObjectId('664b83ddab8c9066e10d8e18'), ObjectId('664b83ddab8c9066e10d8e19'), ObjectId('664b83ddab8c9066e10d8e1a'), ObjectId('664b83ddab8c9066e10d8e1b'), ObjectId('664b83ddab8c9066e10d8e1c'), ObjectId('664b83ddab8c9066e10d8e1d'), ObjectId('664b83ddab8c9066e10d8e1e'), ObjectId('664b83ddab8c9066e10d8e1f'), ObjectId('664b83ddab8c9066e10d8e20'), ObjectId('664b83ddab8c9066e10d8e21'), ObjectId('664b83ddab8c9066e10d8e22'), ObjectId('664b83ddab8c9066e10d8e23'), ObjectId('664b83ddab8c9066e10d8e24'), ObjectId('664b83ddab8c9066e10d8e25'), ObjectId('664b83ddab8c9066e10d8e26'), ObjectId('664b83ddab8c9066e10d8e27'), ObjectId('664b83ddab8c9066e10d8e28'), ObjectId('664b83ddab8c9066e10d8e29'), ObjectId('664b83ddab8c9066e10d8e2a'), ObjectId('664b83ddab8c9066e10d8e2b'), ObjectId('664b83ddab8c9066e10d8e2c'), ObjectId('664b83ddab8c9066e10d8e2d'), ObjectId('664b83ddab8c9066e10d8e2e'), ObjectId('664b83ddab8c9066e10d8e2f'), ObjectId('664b83ddab8c9066e10d8e
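
The sample dataset is small enough for a single `insert_many()` call. For a larger dataset, one option is to insert in batches. The sketch below assumes a hypothetical `chunked` helper and an arbitrary batch size of 3, and it only prints batch sizes instead of writing to the database.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder documents; with a live connection each batch could be passed
# to collection.insert_many(batch).
docs = [{"id": str(i)} for i in range(7)]
print([len(batch) for batch in chunked(docs, 3)])
```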

# Vector Search in Cosmos DB for MongoDB vCore

In [14]:
# Simple function to assist with vector search
def vector_search(query, num_results=5):
    query_embedding = generate_embeddings(query)
    pipeline = [
        {
            '$search': {
                "cosmosSearch": {
                    "vector": query_embedding,
                    "path": "contentVector",
                    "k": num_results
                    # "efsearch": 40  # optional, for HNSW only
                    # "filter": {"title": {"$ne": "Azure Cosmos DB"}}
                },
                "returnStoredSource": True }},
        {'$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document' : '$$ROOT' } }
    ]
    results = collection.aggregate(pipeline)
    return results
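
`collection.aggregate()` returns a cursor, which is consumed as it is iterated. If the results are needed more than once, they can be materialized first. The sketch below assumes documents shaped like the output of the `$project` stage above; `summarize_results` is a hypothetical helper, and the titles and scores are taken from the sample run shown in this notebook.

```python
def summarize_results(results):
    """Materialize search results as (title, score) pairs, best match first."""
    pairs = [(r["document"]["title"], r["similarityScore"]) for r in results]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# An iterator stands in for the cursor: it can only be walked once.
sample_results = iter([
    {"similarityScore": 0.7164377865219214, "document": {"title": "Azure Cosmos DB"}},
    {"similarityScore": 0.7185315020145158, "document": {"title": "Azure Table Storage"}},
])
summary = summarize_results(sample_results)
for title, score in summary:
    print(f"{score:.4f}  {title}")
```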

Let's run a test query below.

In [15]:
query = "What are some NoSQL databases in Azure?"
# Alternative query: "What are the services for running ML models?"
results = vector_search(query)
for result in results: 
#     print(result)
    print(f"Similarity Score: {result['similarityScore']}")  
    print(f"Title: {result['document']['title']}")  
    print(f"Content: {result['document']['content']}")  
    print(f"Category: {result['document']['category']}\n")  

Similarity Score: 0.7185315020145158
Title: Azure Table Storage
Content: Azure Table Storage is a fully managed, NoSQL datastore that enables you to store and query large amounts of structured, non-relational data. It provides features like automatic scaling, schema-less design, and a RESTful API. Table Storage supports various data types, such as strings, numbers, and booleans. You can use Azure Table Storage to store and manage your data, build scalable applications, and reduce the cost of your storage. It also integrates with other Azure services, such as Azure Functions and Azure Cosmos DB.
Category: Storage

Similarity Score: 0.7164377865219214
Title: Azure Cosmos DB
Content: Azure Cosmos DB is a globally distributed, multi-model database service that enables you to build and manage NoSQL applications in Azure. It provides features like automatic scaling, low-latency access, and multi-master replication. Cosmos DB supports various data models, such as key-value, document, graph, a

## Filtered vector search (Preview)
You can add query filters to your vector search by creating a filtered index and specifying it in the search pipeline.

**Note:** Filtered vector search is in preview and needs to be enabled via Azure Preview Features for your subscription. Search for the preview feature "filtering on vector search". Learn more about it here: https://learn.microsoft.com/azure/azure-resource-manager/management/preview-features?tabs=azure-portal

In [None]:
# Add a filter index
db.command( {
    "createIndexes": "ExampleCollection",
    "indexes": [ {
        "key": { 
            "title": 1 
               }, 
        "name": "title_filter" 
    }
    ] 
} 
)

In [None]:
# Verify all indexes are present
for i in collection.list_indexes():
    print(i)

In [None]:
# Vector search that excludes documents whose title matches the filter
def filtered_vector_search(query, num_results=5):
    query_embedding = generate_embeddings(query)
    pipeline = [
        {
            '$search': {
                "cosmosSearch": {
                    "vector": query_embedding,
                    "path": "contentVector",
                    "k": num_results,
                    "filter": {"title": {"$nin": ["Azure SQL Database", "Azure Database for MySQL"]}}
                },
                "returnStoredSource": True }},
        {'$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document' : '$$ROOT' } }
    ]
    results = collection.aggregate(pipeline)
    return results
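
The filtered and unfiltered searches differ only in the `filter` entry, so the pipeline could be built by one function that adds the filter when it is supplied. This is a sketch rather than a replacement for the functions above: `build_search_pipeline` and `query_filter` are hypothetical names, and the three-value vector is a placeholder for a real 1536-dimensional embedding.

```python
def build_search_pipeline(query_embedding, num_results=5, query_filter=None, path="contentVector"):
    """Build the aggregation pipeline, adding a filter only when one is given."""
    cosmos_search = {"vector": query_embedding, "path": path, "k": num_results}
    if query_filter is not None:
        cosmos_search["filter"] = query_filter
    return [
        {"$search": {"cosmosSearch": cosmos_search, "returnStoredSource": True}},
        {"$project": {"similarityScore": {"$meta": "searchScore"}, "document": "$$ROOT"}},
    ]


excluded = {"title": {"$nin": ["Azure SQL Database", "Azure Database for MySQL"]}}
pipeline = build_search_pipeline([0.1, 0.2, 0.3], query_filter=excluded)
print("filter" in pipeline[0]["$search"]["cosmosSearch"])
```

The resulting list can be passed to `collection.aggregate()` in the same way as the hand-written pipelines.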

In [None]:
query = "What are some NoSQL databases in Azure?"
# Alternative query: "What are the services for running ML models?"
results = filtered_vector_search(query)
for result in results: 
#     print(result)
    print(f"Similarity Score: {result['similarityScore']}")  
    print(f"Title: {result['document']['title']}")  
    print(f"Content: {result['document']['content']}")  
    print(f"Category: {result['document']['category']}\n")  

# Q&A over the data with GPT-3.5

Finally, we'll create a helper function to feed prompts into the `Completions` model. Then we'll create an interactive loop where you can pose questions to the model and receive information grounded in your data.

In [None]:
import json
import requests

In [None]:
def chat(messages):
    # Send the conversation to a local Ollama server and stream the reply
    r = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama3", "messages": messages, "stream": True},
        stream=True,
    )
    r.raise_for_status()
    output = ""
    # Fallback in case the final chunk arrives before any content chunk
    message = {"role": "assistant"}

    for line in r.iter_lines():
        if not line:
            continue  # skip empty keep-alive lines
        body = json.loads(line)
        if "error" in body:
            raise Exception(body["error"])
        if body.get("done") is False:
            message = body.get("message", {})
            content = message.get("content", "")
            output += content
            # The response streams one token at a time; print each one as we receive it
            print(content, end="", flush=True)

        if body.get("done", False):
            message["content"] = output
            return message
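
The accumulation logic in `chat()` can be exercised without a running server by feeding it lines directly. The sketch below is an offline approximation: `accumulate_stream` is a hypothetical helper, and the three chunks are made-up examples in the newline-delimited JSON shape that `chat()` expects.

```python
import json


def accumulate_stream(lines):
    """Join the content fragments of newline-delimited JSON chat chunks."""
    output = ""
    for line in lines:
        if not line:
            continue  # skip empty keep-alive lines
        body = json.loads(line)
        if "error" in body:
            raise RuntimeError(body["error"])
        output += body.get("message", {}).get("content", "")
        if body.get("done"):
            break
    return {"role": "assistant", "content": output}


# Hypothetical chunks, not captured from a real server.
fake_lines = [
    b'{"message": {"role": "assistant", "content": "Azure "}, "done": false}',
    b'{"message": {"role": "assistant", "content": "Cosmos DB"}, "done": false}',
    b'{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(accumulate_stream(fake_lines))
```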

In [None]:
#This function helps to ground the model with prompts and system instructions.

def generate_completion(vector_search_results, user_prompt):
    system_prompt = '''
    You are an intelligent assistant for Microsoft Azure services.
    You are designed to provide helpful answers to user questions about Azure services given the information about to be provided.
        - Only answer questions related to the information provided below, provide at least 3 clear suggestions in a list format.
        - Write two lines of whitespace between each answer in the list.
        - If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users search themselves.
        - Only provide answers that have products that are part of Microsoft Azure and part of these following prompts.
    '''

    messages=[{"role": "system", "content": system_prompt}]
    for item in vector_search_results:
        messages.append({"role": "system", "content": item['document']['content']})
    messages.append({"role": "user", "content": user_prompt})
    # print(messages)
    r = chat(messages)
    # response = AOAI_client.chat.completions.create(model=config['openai_completions_deployment'], messages=messages,temperature=0)
    # print(r.json())
    return r
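
The grounding step amounts to placing the retrieved documents between the instructions and the user's question. The sketch below isolates that assembly so the message order can be inspected without calling a model. `build_messages` is a hypothetical helper, the shortened system prompt is a placeholder, and the two documents are abbreviated from the sample data.

```python
def build_messages(system_prompt, search_results, user_prompt):
    """Instructions first, then one system message per retrieved document, then the question."""
    messages = [{"role": "system", "content": system_prompt}]
    for item in search_results:
        messages.append({"role": "system", "content": item["document"]["content"]})
    messages.append({"role": "user", "content": user_prompt})
    return messages


# Placeholder results shaped like the output of vector_search().
fake_results = [
    {"document": {"content": "Azure Table Storage is a fully managed, NoSQL datastore."}},
    {"document": {"content": "Azure Cosmos DB is a globally distributed, multi-model database service."}},
]
messages = build_messages("You are an assistant for Azure services.", fake_results,
                          "What are some NoSQL databases in Azure?")
print([m["role"] for m in messages])
```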

In [None]:
# Create a loop of user input and model output. You can now perform Q&A over the sample data!

user_input = ""
print("*** Please ask your model questions about Azure services. Type 'end' to end the session.\n")
user_input = input("User prompt: ")
while user_input.lower() != "end":
    search_results = vector_search(user_input)
    # chat() streams the answer to the screen as it is generated
    completions_results = generate_completion(search_results, user_input)
    print("\n")
    user_input = input("User prompt: ")

In [18]:
user_input = ""
print("*** Please enter your vector search query. Type 'end' to end the session.\n")
user_input = input("User prompt: ")
while user_input.lower() != "end":
    print(user_input)
    results = vector_search(user_input)
    for result in results: 
        print(f"Similarity Score: {result['similarityScore']}")  
        print(f"Title: {result['document']['title']}")  
        print(f"Content: {result['document']['content']}")  
        print(f"Category: {result['document']['category']}\n")  
    user_input = input("User prompt: ")


*** Please enter your vector search query. Type 'end' to end the session.

tell me the best service to deploy apache storm application
Similarity Score: 0.5362366021641527
Title: Azure App Configuration
Content: Azure App Configuration is a fully managed configuration service that enables you to centrally manage and distribute your application settings and feature flags. It provides features like key-value storage, versioning, and access control. App Configuration supports various platforms, such as .NET, Java, Node.js, and Python. You can use Azure App Configuration to build and deploy your applications, ensure the consistency of your settings, and improve your application lifecycle. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Similarity Score: 0.5314138049898438
Title: Azure App Service
Content: Azure App Service is a fully managed platform for building, deploying, and scaling web apps. You can host web apps, mobile app backends, and RESTful API

KeyboardInterrupt: Interrupted by user