# Access Azure OpenAI LLMs from a notebook 

## Overview
Models you deploy to Azure OpenAI can be accessed via API calls. This tutorial covers the basics of creating local embeddings from custom data and querying them. It also shows how to query your own data indexed in Azure AI Search.

## Prerequisites
We assume you have access to Azure AI Studio, Azure AI Search, and have already deployed an LLM.

## Learning objectives
+ Get familiar with Azure OpenAI APIs
+ Learn how to create embeddings from custom data
+ Learn how to query over those embeddings
+ Learn how to access deployed LLMs outside of the Azure console

## Get started

### Install packages

In [None]:
%pip install openai requests python-dotenv numpy pandas

### Run a query on a local csv file by creating local embeddings

Import required libraries

In [None]:
import os
from openai import AzureOpenAI
import dotenv
import requests
import numpy as np
import pandas as pd

You also need to [deploy a new model](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal#deploy-a-model). Select and deploy `text-embedding-ada-002`. The code below assumes the deployment is also named `text-embedding-ada-002`, because with Azure OpenAI the `model` argument is your deployment name. If you get an error downstream saying your model is not ready, allow up to five minutes for everything to sync.

For simplicity, we use a Microsoft example here. You can use any CSV file, as long as it matches the format the downstream code expects: a `text` column with one passage per row. This example is a recent earnings report given by the CEO of Microsoft.

In [None]:
# read the data file to be embedded
df = pd.read_csv('microsoft-earnings.csv')
print(df)
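To use your own data instead, the downstream cells only require a `text` column with one passage per row. The following is a minimal sketch that writes a compatible file. The passages and the file name `my-data.csv` are placeholders, not part of the original tutorial.

```python
import pandas as pd

# Placeholder passages: replace these with your own text, one passage per row
passages = [
    "First passage of your document.",
    "Second passage of your document.",
]

# The downstream cells only read the 'text' column
my_df = pd.DataFrame({"text": passages})
my_df.to_csv("my-data.csv", index=False)  # hypothetical file name

# Read the file back to confirm the expected layout
check = pd.read_csv("my-data.csv")
print(check.columns.tolist())  # ['text']
```

You would then pass `my-data.csv` to `pd.read_csv` in the cell above in place of `microsoft-earnings.csv`.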

In [None]:
# set keys and configure Azure OpenAI
# (alternatively, keep these in a .env file and load them with dotenv.load_dotenv())
os.environ["AZURE_OPENAI_ENDPOINT"] = "<YOUR OPENAI ENDPOINT>"
os.environ["AZURE_OPENAI_KEY"] = "<YOUR OPENAI KEY>"

# create the Azure OpenAI client used by the embedding calls below
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2023-05-15",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)
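Hard-coding keys in a notebook makes them easy to leak, for example when you share or commit the file. The following is a minimal sketch of an alternative. It assumes you keep the same variable names (`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`) in your environment, or in a `.env` file loaded with `dotenv.load_dotenv()`. The helper `require_env` is hypothetical. It only fails early with a clear message when a value is missing, instead of letting the client fail later with a less obvious error.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper: call dotenv.load_dotenv() first if you keep
    your settings in a .env file.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Usage (after dotenv.load_dotenv()):
# client = AzureOpenAI(
#     api_key=require_env("AZURE_OPENAI_KEY"),
#     api_version="2023-05-15",
#     azure_endpoint=require_env("AZURE_OPENAI_ENDPOINT"),
# )
```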


Create a **get_embedding** function that transforms text into a numerical vector (an embedding). Texts with similar meanings map to vectors that point in similar directions. Then use the **cosine_similarity** function to score how similar two embeddings are. This lets us find the texts most similar to the query we submit later.

In [None]:
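def get_embedding(text, model="text-embedding-ada-002"):
    # replace newlines with spaces before embedding
    text = text.replace("\n", " ")
    # with Azure OpenAI, `model` is the name of your embedding deployment
    return client.embeddings.create(input=[text], model=model).data[0].embedding

def cosine_similarity(a, b):
    # dot product divided by the product of the vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))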
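To see what the score means, here is a quick check of `cosine_similarity` on toy 2-D vectors. Real embeddings have many more dimensions, but the math is the same. Vectors pointing the same way score 1, perpendicular vectors score 0, and opposite vectors score -1.

```python
import numpy as np


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


same = cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
perpendicular = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0]))
opposite = cosine_similarity(np.array([1.0, 1.0]), np.array([-1.0, -1.0]))

print(round(same, 6), round(perpendicular, 6), round(opposite, 6))  # 1.0 0.0 -1.0
```

Only the direction of the vectors matters, not their length. That is why `[1, 2]` and `[2, 4]` score as identical.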

In [None]:
# calculate an embedding for each row of text (one API call per row)
df['embedding'] = df['text'].apply(get_embedding)
# cache the embeddings so you don't have to recompute them
df.to_csv('microsoft-earnings_embeddings.csv', index=False)
print(df)
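The cell above makes one API call per row, which can be slow, and can hit rate limits, on larger files. The embeddings endpoint also accepts a list of inputs, so the following sketch sends rows in batches instead. The helper `embed_in_batches` and the batch size of 16 are illustrative assumptions. Check your deployment's limit on inputs per request before relying on a particular size.

```python
def embed_in_batches(client, texts, model="text-embedding-ada-002", batch_size=16):
    """Embed a list of strings with one API call per batch (hypothetical helper).

    `client` is an AzureOpenAI client; `model` is your embedding deployment name.
    """
    embeddings = []
    for start in range(0, len(texts), batch_size):
        # Same preprocessing as get_embedding: replace newlines with spaces
        batch = [t.replace("\n", " ") for t in texts[start:start + batch_size]]
        response = client.embeddings.create(input=batch, model=model)
        # Sort by index so the output order matches the input order
        for item in sorted(response.data, key=lambda d: d.index):
            embeddings.append(item.embedding)
    return embeddings

# Usage in this notebook:
# df['embedding'] = pd.Series(embed_in_batches(client, df['text'].tolist()), index=df.index)
```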

Query the embeddings. The cell prompts you for a search term in an input box. To run another query, rerun the cell.

In [None]:
import ast

# read in the embeddings .csv
# convert elements in 'embedding' column back to numpy arrays
# (ast.literal_eval only parses literals, so it is safer than eval)
df = pd.read_csv('microsoft-earnings_embeddings.csv')
df['embedding'] = df['embedding'].apply(ast.literal_eval).apply(np.array)

# calculate the user query embedding
search_term = input("Enter a search term: ")
if search_term:
    search_term_vector = get_embedding(search_term, model='text-embedding-ada-002')

    # find the similarity between the query and each row's vector
    df['similarities'] = df['embedding'].apply(lambda x: cosine_similarity(x, search_term_vector))
    df1 = df.sort_values("similarities", ascending=False).head(5)

    # output the best-matching text and its score
    print('\n')
    print('Answer: ', df1['text'].iloc[0])
    print('\n')
    print('Similarity Score: ', df1['similarities'].iloc[0])
    print('\n')
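The query cell scores rows one at a time with `apply`. A vectorized alternative is to parse the stored strings once, stack them into a single matrix, normalize the rows, and rank every row with one matrix-vector product. The following sketch uses toy 3-D vectors in place of real embeddings. It parses with `json.loads`, assuming the CSV stores each embedding as a plain list of numbers such as `"[0.1, -0.2]"`. `top_k_matches` is a hypothetical helper name.

```python
import json

import numpy as np
import pandas as pd


def top_k_matches(df, query_vector, k=5):
    """Return the k rows of `df` most similar to `query_vector` (hypothetical helper).

    Expects df['embedding'] to hold embeddings as strings such as "[0.1, 0.2]",
    which is how they come back from pd.read_csv.
    """
    # Parse every stored string into a list of floats and stack into a matrix
    matrix = np.array(df["embedding"].map(json.loads).tolist(), dtype=float)
    # Normalize the rows and the query so a dot product equals cosine similarity
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    query = np.asarray(query_vector, dtype=float)
    query = query / np.linalg.norm(query)
    scores = matrix @ query
    # Indices of the k highest scores, best first
    order = np.argsort(scores)[::-1][:k]
    result = df.iloc[order].copy()
    result["similarities"] = scores[order]
    return result


# Toy example: three stored "embeddings" and a query vector
toy = pd.DataFrame({
    "text": ["about cloud", "about gaming", "about devices"],
    "embedding": ["[1.0, 0.0, 0.0]", "[0.0, 1.0, 0.0]", "[0.7, 0.7, 0.0]"],
})
best = top_k_matches(toy, [0.9, 0.1, 0.0], k=2)
print(best[["text", "similarities"]])
```

In this notebook, you could call it as `top_k_matches(pd.read_csv('microsoft-earnings_embeddings.csv'), get_embedding(search_term))`.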

### Query your own data

The README shows how to add your own data. In the Azure portal, go to **Azure AI Search** and click the service you created to find the values for the Azure AI Search variables below. After you fill them in, run the query. As in the previous section, you get a response, this time grounded in your indexed data.

In [None]:
endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_key = os.getenv("AZURE_OPENAI_KEY")
deployment_id = "<YOUR DEPLOYMENT ID>"  # Add the name of your chat model deployment here
# Azure AI Search setup
search_endpoint = "https://{}.search.windows.net".format("<YOUR SEARCH SERVICE NAME>")  # Add your Azure AI Search service name here
# This is different from the Azure OpenAI key above: it's the key for Azure AI Search
search_key = "<YOUR SEARCH KEY>"  # Add your Azure AI Search admin key here
search_index_name = "<YOUR SEARCH INDEX>"  # Add your Azure AI Search index name here


Now run the query. Note that the query is defined in the `content` field of the `messages` parameter in the cell below.

In [None]:
dotenv.load_dotenv()

client = AzureOpenAI(
    base_url=f"{endpoint}/openai/deployments/{deployment_id}/extensions",
    api_key=api_key,
    api_version="2023-08-01-preview",
)

completion = client.chat.completions.create(
    model=deployment_id,
    messages=[
        {
            "role": "user",
            "content": "What were some of the phenotypic presentations of MPOX on patients with HIV?",
        },
    ],
    stream=True,
    extra_body={
        "dataSources": [
            {
                "type": "AzureCognitiveSearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "key": search_key,
                    "indexName": search_index_name
                }
            }
        ]
    }
)

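for chunk in completion:
    # some streamed chunks (for example, content-filter results) may carry no choices
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")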
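The cell above targets the `2023-08-01-preview` extensions endpoint, using `dataSources` with type `AzureCognitiveSearch`. Later Azure OpenAI API versions, such as `2024-02-01`, instead accept On Your Data settings as a `data_sources` list sent to the regular chat completions endpoint. The following is a minimal sketch of building that request body, assuming API-key authentication to Azure AI Search. `build_search_data_sources` is a hypothetical helper. Confirm the field names against the On Your Data API reference for the version you use.

```python
def build_search_data_sources(search_endpoint, search_index_name, search_key):
    """Build the extra_body for an Azure OpenAI On Your Data request (hypothetical helper).

    Assumes the data_sources schema used by newer API versions (e.g. 2024-02-01)
    with API-key authentication to Azure AI Search.
    """
    return {
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": search_index_name,
                    "authentication": {"type": "api_key", "key": search_key},
                },
            }
        ]
    }

# Usage (assumes the variables from the setup cell):
# client = AzureOpenAI(azure_endpoint=endpoint, api_key=api_key, api_version="2024-02-01")
# completion = client.chat.completions.create(
#     model=deployment_id,
#     messages=[{"role": "user", "content": "Your question here"}],
#     extra_body=build_search_data_sources(search_endpoint, search_index_name, search_key),
# )
# print(completion.choices[0].message.content)

body = build_search_data_sources("https://example.search.windows.net", "my-index", "<key>")
print(body["data_sources"][0]["type"])
```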

## Conclusion
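In this notebook you learned how to create embeddings from a local CSV file with Azure OpenAI and query them with cosine similarity. You also learned how to query your own data, indexed in Azure AI Search, through an LLM that you deployed in Azure.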

## Clean up
Make sure to shut down your Azure ML compute. If desired, you can also delete your Azure OpenAI model deployments and your Azure AI Search index and service.