# RAG with Azure Data Explorer

The first step in building a RAG pattern is generating embeddings for the content.  
Run the notebook [RAG - Azure Data Explorer - create embeddings](./RAG%20-%20Azure%20Data%20Explorer%20-%20create%20embeddings.ipynb) before this one, since the searches below rely on the embeddings it generates.


In [1]:
# Import required libraries
import os
import json
from dotenv import load_dotenv

from tenacity import retry, wait_random_exponential, stop_after_attempt
from openai import AzureOpenAI

In [2]:
# Configure environment variables
load_dotenv()
AAD_TENANT_ID = os.getenv("AAD_TENANT_ID")
KUSTO_CLUSTER = os.getenv("KUSTO_CLUSTER")
KUSTO_DATABASE = os.getenv("KUSTO_DATABASE")
KUSTO_MANAGED_IDENTITY_APP_ID = os.getenv("KUSTO_MANAGED_IDENTITY_APP_ID")
KUSTO_MANAGED_IDENTITY_SECRET = os.getenv("KUSTO_MANAGED_IDENTITY_SECRET")

OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME")
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("OPENAI_DEPLOYMENT_ENDPOINT")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
azure_openai_embedding_dimensions = 1536
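
`os.getenv` returns `None` for any variable that is not set, so a missing entry in the `.env` file may only surface later, when a client is built or the first request is sent. A minimal sketch of an early check, assuming the variable names read in the cell above; `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names, not part of the notebook:

```python
import os

# Names read by the configuration cell of this notebook
REQUIRED_SETTINGS = [
    "AAD_TENANT_ID",
    "KUSTO_CLUSTER",
    "KUSTO_DATABASE",
    "KUSTO_MANAGED_IDENTITY_APP_ID",
    "KUSTO_MANAGED_IDENTITY_SECRET",
    "OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME",
    "OPENAI_DEPLOYMENT_ENDPOINT",
    "OPENAI_API_KEY",
]

def missing_settings(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Hypothetical usage after load_dotenv():
# missing = missing_settings(REQUIRED_SETTINGS)
# if missing:
#     raise RuntimeError("Missing settings: " + ", ".join(missing))
```

Passing `env` explicitly keeps the helper easy to try out with a plain dictionary instead of the real process environment.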

In [3]:
# Configure the Azure OpenAI client
aoai_client = AzureOpenAI(
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_key=OPENAI_API_KEY,
    api_version="2023-05-15",
)

In [4]:
# Generate embeddings with the Azure OpenAI Ada deployment.
# Used for the title and content fields, and also for query embeddings.
# Retries with random exponential backoff (waits capped at 20 seconds), at most 6 attempts.
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def calc_embeddings(text):
    # With Azure OpenAI, `model` takes the deployment name
    response = aoai_client.embeddings.create(input=[text], model=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME)
    return response.data[0].embedding


In [5]:
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.exceptions import KustoServiceError
from azure.kusto.data.helpers import dataframe_from_result_table

# Connect to Azure Data Explorer (ADX) using an AAD app registration
cluster = KUSTO_CLUSTER
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    cluster, KUSTO_MANAGED_IDENTITY_APP_ID, KUSTO_MANAGED_IDENTITY_SECRET, AAD_TENANT_ID
)
print(kcsb)
client = KustoClient(kcsb)
kusto_db = KUSTO_DATABASE

Data Source=https://aidemos-adx.westeurope.kusto.windows.net;Initial Catalog=NetDefaultDB;AAD Federated Security=True;Application Client Id=22870bd2-cd57-4696-9420-9699f9bdc0c1;Application Key=****;Authority Id=16b3c013-d300-468d-ac64-7eda0820b6d3


In [7]:
# Test the connection to Kusto with a sample query that takes 2 rows from the table
# (`take` returns an arbitrary sample of rows, not a ranked top 2)
table_name = "embeddingscsv"
query = f"{table_name} | take 2"

response = client.execute(kusto_db, query)
for row in response.primary_results[0]:
    txt = row["name"][0:10]
    print("Name :{}".format(txt))

Name :Sony Turnt
Name :Bose Acous


In [14]:
def do_search(question, nr_of_answers=1):
    # Embed the question, then rank rows by cosine similarity to that embedding
    searched_embedding = calc_embeddings(question)
    kusto_query = (
        f"{table_name}"
        f" | extend similarity = series_cosine_similarity(dynamic({searched_embedding}), description_embedding)"
        f" | top {nr_of_answers} by similarity desc"
    )
    response = client.execute(kusto_db, kusto_query)

    for row in response.primary_results[0]:
        print(row["name"] + " : " + row["description"])
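
The query ranks rows with `series_cosine_similarity`, which scores two vectors by the cosine of the angle between them: the dot product divided by the product of the two vector norms. As a rough illustration of that formula, not of how Kusto implements it, here is a pure-Python sketch. `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and raising an error for zero vectors is a choice made for this sketch only:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Sketch-only choice: the formula is undefined for a zero vector
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, rows, k=1):
    """Return the names of the k (name, vector) rows most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]
```

Running the ranking inside Azure Data Explorer, as `do_search` does, avoids pulling every stored embedding back to the client; the local version is only meant to make the scoring step concrete.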

In [15]:
# Pure Vector Search
do_search("nonstick grills", 3)
# The nonstick grill ranks first; the second and third results are cast iron griddles


Cuisinart Countertop Griddler - GR4 : Cuisinart Countertop Griddler - GR4/ Nonstick Grill/ Knob Selector/ Light Indicator/ Temperature Controls/ Cleaning/Scraping Tool Included
Weber Cast Iron Griddle - 7531 : Weber Cast Iron Griddle - 7531/ Heavy-Duty Cast Iron Griddle/ Fits Weber Genesis Silver A & Spirit 500 Gas Grills
Weber Cast Iron Griddle - 7542 : Weber Cast Iron Griddle - 7542/ Heavy-Duty Cast Iron Griddle/ Two-Sided For Cooking A Variety Of Foods/ Fits Several Weber Grills
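
In a RAG pattern, the retrieved rows are typically passed to a chat model as grounding context. `do_search` above only prints its results, so the sketch below assumes a variant that returns `(name, description)` pairs instead. `build_grounded_prompt` and the prompt wording are hypothetical, not part of this notebook:

```python
def build_grounded_prompt(question, results):
    """Format retrieved (name, description) pairs into a single prompt string."""
    if not results:
        context = "(no matching products found)"
    else:
        context = "\n".join(
            f"{i}. {name}: {description}"
            for i, (name, description) in enumerate(results, start=1)
        )
    return (
        "Answer the question using only the product information below.\n\n"
        f"Products:\n{context}\n\n"
        f"Question: {question}"
    )

# Example using the top row returned by the search above
prompt = build_grounded_prompt(
    "Which nonstick grill do you recommend?",
    [(
        "Cuisinart Countertop Griddler - GR4",
        "Cuisinart Countertop Griddler - GR4/ Nonstick Grill/ Knob Selector/ "
        "Light Indicator/ Temperature Controls/ Cleaning/Scraping Tool Included",
    )],
)
print(prompt)
```

The resulting string could then be sent as the user message of a chat completion request; which chat deployment to use is left open here because the notebook does not configure one.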
