# Introduction

In this tutorial, we'll demonstrate how to use a sample dataset stored in Azure Cosmos DB for MongoDB vCore to ground OpenAI models. We'll do this by taking advantage of the [vector similarity search](https://learn.microsoft.com/azure/cosmos-db/mongodb/vcore/vector-search) functionality in Azure Cosmos DB for MongoDB vCore. By the end, we'll have an interactive chat session with the GPT-3.5 completions model that answers questions about Azure services, informed by our dataset. This process is known as Retrieval Augmented Generation, or RAG.

This tutorial borrows some code snippets and example data from the Azure Cognitive Search Vector Search demo.

# Preliminaries <a class="anchor" id="preliminaries"></a>
First, let's start by installing the packages that we'll need later. 

In [None]:
! pip install openai
! pip install pymongo
! pip install python-dotenv
! pip install azure-storage-blob
# json is part of the Python standard library, so it does not need to be installed
! pip install ijson

In [1]:
import json
import ijson
from dotenv import dotenv_values
import pymongo
#from azure.storage.blob import BlobServiceClient
from openai import AzureOpenAI


Use `example.env` as a template to provide the necessary keys and endpoints in your own `.env` file.
Then set `env_name` below to the name of that file.
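
Before loading the configuration, it can help to confirm that the `.env` file defines every setting the notebook reads. The sketch below is one way to do that. The key names are the ones this notebook looks up in `config`; the helper name `missing_settings` is hypothetical and not part of the original notebook.

```python
# Keys that the configuration cell in this notebook reads from the .env file.
REQUIRED_SETTINGS = [
    "mongo_connection_string",
    "mongo_database_name",
    "mongo_collection_name",
    "mongo_cache_collection_name",
    "openai_endpoint",
    "openai_key",
    "openai_version",
    "openai_embeddings_deployment",
    "openai_embeddings_model",
    "openai_embeddings_dimensions",
    "openai_completions_deployment",
    "openai_completions_model",
]


def missing_settings(config, required=REQUIRED_SETTINGS):
    """Return the required keys that are absent or empty in a dict-like config."""
    return [key for key in required if not config.get(key)]
```

For example, `missing_settings(dotenv_values(env_name))` returns an empty list when the file is complete, and otherwise lists the settings that still need values.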

In [2]:

# Name of your .env file (create it from the example.env template)
env_name = "fabcondemo.env"
config = dotenv_values(env_name)

mongo_conn = config['mongo_connection_string']
mongo_database = config['mongo_database_name']
mongo_collection = config['mongo_collection_name']
mongo_cache_collection = config['mongo_cache_collection_name']
mongo_client = pymongo.MongoClient(mongo_conn)

openai_endpoint = config['openai_endpoint']
openai_key = config['openai_key']
openai_version = config['openai_version']
openai_embeddings_deployment = config['openai_embeddings_deployment']
openai_embeddings_model = config['openai_embeddings_model']
openai_embeddings_dimensions = int(config['openai_embeddings_dimensions'])
openai_completions_deployment = config['openai_completions_deployment']
openai_completions_model = config['openai_completions_model']
openai_client = AzureOpenAI(azure_endpoint=openai_endpoint, api_key=openai_key, api_version=openai_version)


# Set up the MongoDB vCore database and collection

In [3]:
# Get the database FabConfDB
db = mongo_client[mongo_database]

# Get the collection FabConfCollection
collection = db[mongo_collection]
# Get the collection CacheCollection
cache = db[mongo_cache_collection]
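
The searches later in this notebook query a `contentVector` field, which presupposes that the collection already has a vector index. The notebook does not show how that index was created. The sketch below builds a `createIndexes` command of the shape described in the vector search documentation linked in the introduction. The index name, the `vector-ivf` kind, the `numLists` value, and the cosine similarity setting are all assumptions, not settings taken from this notebook.

```python
def build_vector_index_command(collection_name, dimensions,
                               path="contentVector",
                               index_name="VectorSearchIndex",  # hypothetical name
                               kind="vector-ivf",               # assumed index type
                               num_lists=1,                     # assumed; tune for larger datasets
                               similarity="COS"):               # assumed cosine similarity
    """Build (but do not run) a createIndexes command for a vector index."""
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": index_name,
                "key": {path: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": kind,
                    "numLists": num_lists,
                    "similarity": similarity,
                    "dimensions": dimensions,
                },
            }
        ],
    }
```

If the index does not exist yet, the command could be run with `db.command(build_vector_index_command(mongo_collection, openai_embeddings_dimensions))`. The `dimensions` value has to match the length of the vectors stored in `contentVector`.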


# Define a function to generate embeddings

This function vectorizes the user input for the vector search.

In [4]:
#@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(10))
def generate_embeddings(text):
    '''
    Generate embeddings from string of text.
    This will be used to vectorize data and user input for interactions with Azure OpenAI.
    '''
    # The `model` argument takes the Azure OpenAI deployment name, not the underlying model name.
    response = openai_client.embeddings.create(input=text, model=openai_embeddings_deployment, dimensions=openai_embeddings_dimensions)

    embeddings = response.model_dump()
    return embeddings['data'][0]['embedding']
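
The commented-out `@retry(...)` decorator above suggests that embedding calls were meant to be retried on transient failures such as rate limits. As a rough stand-in that needs no extra package, the sketch below retries a call with randomized exponential backoff using only the standard library. The helper name `call_with_backoff` and its defaults are hypothetical, and catching every `Exception` is a simplification; a real notebook would likely retry only on specific API errors.

```python
import random
import time


def call_with_backoff(func, *args, max_attempts=5, base_delay=1.0,
                      max_delay=20.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with randomized exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait a random time up to base_delay * 2^(attempt - 1), capped at max_delay.
            upper_bound = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, upper_bound))
```

With this helper, `call_with_backoff(generate_embeddings, "some text")` would retry the embedding request a few times before giving up.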

# Vector Search in Cosmos DB for MongoDB vCore

This section defines a function for performing a vector search over data in Azure Cosmos DB for MongoDB vCore.

In [16]:
# Simple function to assist with vector search.
# Expects an embedding vector (see generate_embeddings), not the raw query text.
def vector_search(query_embedding, num_results=3):

    pipeline = [
        {
            '$search': {
                "cosmosSearch": {
                    "vector": query_embedding,
                    "path": "contentVector",
                    "k": num_results #, "efSearch": 40 # optional for HNSW only
                },
                "returnStoredSource": True }},
        {'$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document' : '$$ROOT' } }
    ]

    results = collection.aggregate(pipeline)
    
    return results
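
To build intuition for the `similarityScore` values, the sketch below ranks a handful of in-memory documents by cosine similarity to a query vector. It assumes the vector index uses cosine similarity, which this notebook does not state, and it is a brute-force illustration rather than how the service evaluates `$search`. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query_vector, documents, num_results=3):
    """Rank documents (dicts with a 'contentVector' field) by similarity to the query."""
    scored = [
        {
            "similarityScore": cosine_similarity(query_vector, doc["contentVector"]),
            "document": doc,
        }
        for doc in documents
    ]
    # Highest similarity first, mirroring the shape of the pipeline's output.
    scored.sort(key=lambda item: item["similarityScore"], reverse=True)
    return scored[:num_results]
```

A score near 1 means the query and document vectors point in nearly the same direction, while a score near 0 means they are unrelated under this measure.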

Let's run a test query below.

In [21]:
query = "What are the services for running ML models?"
embeddings = generate_embeddings(query)
results = vector_search(embeddings)
for result in results: 
    #print(result)
    print(f"Similarity Score: {result['similarityScore']}")  
    print(f"Title: {result['document']['title']}")  
    print(f"Content: {result['document']['content']}")  
    print(f"Category: {result['document']['category']}\n")  

Similarity Score: 0.5570008226634062
Title: Azure Machine Learning
Content: Azure Machine Learning is a cloud-based service for building, training, and deploying machine learning models. It offers a visual interface for creating and managing experiments, as well as support for popular programming languages like Python and R. Machine Learning supports various algorithms, frameworks, and data sources, making it easy to integrate with your existing data and workflows. You can deploy your models as web services, and scale them based on your needs. It also integrates with other Azure services, such as Azure Databricks and Azure Data Factory.
Category: AI + Machine Learning

Similarity Score: 0.529827014500708
Title: Azure Machine Learning
Content: Azure Machine Learning is a fully managed, end-to-end platform that enables you to build, train, and deploy machine learning models at scale. It provides features like automated machine learning, data labeling, and model management. Machine Learni

# Q&A over the data with GPT

Finally, we'll create a helper function to feed prompts into the `Completions` model. Then we'll create an interactive loop where you can pose questions to the model and receive information grounded in your data.

In [13]:
def get_conversation_history(completions=3):

    # Sort by _id in descending order (newest first) and limit the results to `completions` entries
    results = cache.find({}, {"prompt": 1, "completion": 1}).sort([("_id", -1)]).limit(completions)
    
    return results
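
Because the query sorts by `_id` in descending order, the cursor yields the most recent exchange first (assuming default `ObjectId` values, which increase roughly with insertion time). If the model should see earlier exchanges in the order they happened, one option is to reverse them and express each as a user/assistant pair. This is a sketch of an alternative, not what the notebook does; `history_to_messages` is a hypothetical helper.

```python
def history_to_messages(cached_items):
    """Turn newest-first cached exchanges into oldest-first chat messages."""
    messages = []
    for item in reversed(list(cached_items)):
        messages.append({"role": "user", "content": item["prompt"]})
        messages.append({"role": "assistant", "content": item["completion"]})
    return messages
```

The result could be appended to a message list with `messages.extend(history_to_messages(get_conversation_history(3)))`.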

In [18]:
# This function helps to ground the model with prompts and system instructions.

def generate_completion(vector_search_results, user_prompt):
    
    system_prompt = '''
    You are an intelligent assistant for Microsoft Azure services.
    You are designed to provide helpful answers to user questions about Azure services given the information about to be provided.
        - Only answer questions related to the information provided below, provide 3 clear suggestions in a list format.
        - Write two lines of whitespace between each answer in the list.
        - Only provide answers that have products that are part of Microsoft Azure and based on the content items.
        - If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users search themselves.
    '''
    # Create a list of messages as a payload to send to the OpenAI API

    # System Prompt
    messages=[{"role": "system", "content": system_prompt}]

    # Add the conversation history
    conversation_history = get_conversation_history(3)
    for item in conversation_history:
        messages.append({"role": "system", "content": item['prompt'] + " " + item['completion']})

    # User Prompt
    messages.append({"role": "user", "content": user_prompt})

    # Add the vector search results
    for item in vector_search_results:
        messages.append({"role": "system", "content": item['document']['content']})

    # `user` is an optional end-user identifier passed to the API; "Mark" is a placeholder value.
    response = openai_client.chat.completions.create(model=openai_completions_deployment, messages=messages, user="Mark")
    
    return response.model_dump()
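
The function above sends each retrieved document as its own system message. A possible variation is to join the documents into one numbered context block, which keeps the titles next to the content and makes the grounding text easy to inspect. The sketch below assumes the `title` and `content` fields seen in the test query output; `format_context` is a hypothetical helper.

```python
def format_context(vector_search_results):
    """Join retrieved documents into one numbered text block for a single message."""
    sections = []
    for number, item in enumerate(vector_search_results, start=1):
        doc = item["document"]
        sections.append(f"[{number}] {doc['title']}\n{doc['content']}")
    return "\n\n".join(sections)
```

The returned string could then be sent as a single message, for example `{"role": "system", "content": format_context(search_results)}`.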

In [11]:
def cache_generation(user_prompt, user_embeddings, response):

    chat = [
        {
            'prompt': user_prompt,
            'completion': response['choices'][0]['message']['content'],
            'completionTokens': str(response['usage']['completion_tokens']),
            'promptTokens': str(response['usage']['prompt_tokens']),
            'totalTokens': str(response['usage']['total_tokens']),
            'model': response['model'],
            'vectorContent': user_embeddings
         }
    ]

    cache.insert_one(chat[0])
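
Storing `vectorContent` with each completion makes it possible to reuse an earlier answer when a new question is nearly identical to a cached one. The notebook does not implement that lookup. The sketch below shows only the selection step, assuming results in the same shape that `vector_search` returns (`similarityScore` plus `document`), produced by a comparable search over the cache collection. That search would need its own vector index on `vectorContent`, and the 0.95 threshold is an arbitrary illustration.

```python
def pick_cached_completion(cache_search_results, threshold=0.95):
    """Return the best cached completion scoring at or above threshold, else None."""
    best = None
    for item in cache_search_results:
        score = item["similarityScore"]
        if score >= threshold and (best is None or score > best["similarityScore"]):
            best = item
    return best["document"]["completion"] if best is not None else None
```

When the function returns `None`, the loop would fall back to the normal path of vector search followed by `generate_completion`.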


In [19]:
# Create a loop of user input and model output. You can now perform Q&A over the sample data!

user_input = ""

print("*** Please ask your model questions about Azure services. Type 'end' to end the session.\n")

user_input = input("Prompt: ")

while user_input.lower() != "end":
    
    # Generate embeddings from the user input
    user_embeddings = generate_embeddings(user_input)
    
    # Perform a vector search on the user input
    search_results = vector_search(user_embeddings)
    
    # Generate completions based on the search results and user input
    completions_results = generate_completion(search_results, user_input)

    # print the user input
    print("\n" + user_input)

    # Print the generated LLM completions
    print(completions_results['choices'][0]['message']['content'])

    # Cache the conversation
    cache_generation(user_input, user_embeddings, completions_results)
    
    # Ask for more user input
    user_input = input("Prompt: ")

*** Please ask your model questions about Azure services. Type 'end' to end the session.


what is a good nosql database?
I don't know. It would be best to search for a good NoSQL database on the Microsoft Azure website as they offer a variety of database services to meet different needs and requirements.
