# Access Azure OpenAI LLMs from a notebook 

## Overview
Models you deploy to Azure OpenAI can be accessed via API calls. This tutorial covers the basics of creating embeddings locally from custom data and running similarity queries against them.

## Prerequisites
We assume you have access to Azure AI Studio and have already deployed an LLM.

## Learning objectives
+ Get familiar with Azure OpenAI APIs
+ Learn how to create embeddings from custom data
+ Learn how to query over those embeddings
+ Learn how to access deployed LLMs outside of the Azure console

## Get started

### Install packages

In [None]:
%pip install -r requirements.txt

### Run a query on a local csv file by creating local embeddings

Import required libraries

In [None]:
import os
import openai
import requests
import numpy as np
import pandas as pd
from openai.embeddings_utils import get_embedding, cosine_similarity

You also need to [deploy a new model](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal#deploy-a-model) for embeddings. Select and deploy `text-embedding-ada-002`. If a later cell reports that your model is not ready, wait up to five minutes for the deployment to sync.

For simplicity, we use a Microsoft example here, but you can use any csv file as long as it matches the format the downstream code expects (the code reads a `text` column). This example is a recent earnings report given by the CEO of Microsoft.

In [None]:
# read the data file to be embedded
df = pd.read_csv('microsoft-earnings.csv')
print(df)

In [None]:
# set keys and configure Azure OpenAI
openai.api_type = "azure"
openai.api_base = "<YOUR BASE URL>"
openai.api_version = "2023-07-01-preview"
# get the key by following the instructions in the README of this repo
# you can also click View Code in the chat playground
openai.api_key = "<YOUR KEY>"
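
The cell above hard-codes the base URL and key. One alternative is to read them from environment variables so the key never lands in the notebook itself. The following is a minimal sketch under assumptions: the variable names `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_KEY` and the helper `load_azure_openai_config` are hypothetical and are not defined by this repo or by the SDK.

```python
import os

# Hypothetical variable names; use whatever names your environment defines.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY")


def load_azure_openai_config(env=os.environ):
    """Return the endpoint and key from `env`, or raise if either is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        # Drop a trailing slash so URLs built from the base do not contain "//".
        "api_base": env["AZURE_OPENAI_ENDPOINT"].rstrip("/"),
        "api_key": env["AZURE_OPENAI_KEY"],
    }
```

With this in place, you could assign `openai.api_base` and `openai.api_key` from the returned dictionary instead of pasting the values into the cell.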


In [None]:
# calculate an embedding for the text in each row
df['embedding'] = df['text'].apply(lambda x: get_embedding(x, engine='text-embedding-ada-002'))
df.to_csv('microsoft-earnings_embeddings.csv')
print(df)
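
Writing the DataFrame to csv stores each embedding as the string form of a list, which is why the embeddings need to be parsed again when the file is read back. The sketch below illustrates that round trip with the standard library only. The rows and vectors are made up for illustration, and `ast.literal_eval` is used here as a stricter alternative to `eval` because it only accepts Python literals.

```python
import ast
import csv
import io

# Toy rows standing in for the real data; the vectors are invented and far
# shorter than a real embedding.
rows = [
    {"text": "revenue grew", "embedding": [0.1, -0.2, 0.3]},
    {"text": "cloud demand", "embedding": [0.0, 0.5, -0.5]},
]

# Write the rows to an in-memory csv; each list is stored as its string form.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embedding"])
writer.writeheader()
for row in rows:
    writer.writerow({"text": row["text"], "embedding": str(row["embedding"])})

# Read the csv back: the embedding column now holds plain strings.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

# Parse each string back into a list of floats without executing code.
restored = [ast.literal_eval(row["embedding"]) for row in loaded]
print(restored[0])
```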

Query the embeddings. The cell prompts for a search term in an input box. Rerun the cell each time you want to enter a new query.

In [None]:
import ast

# read in the embeddings .csv
# convert elements in the 'embedding' column back to numpy arrays
# (ast.literal_eval parses the stored list strings without executing code)
df = pd.read_csv('microsoft-earnings_embeddings.csv')
df['embedding'] = df['embedding'].apply(ast.literal_eval).apply(np.array)

# calculate user query embedding
search_term = input("Enter a search term: ")
if search_term:
    search_term_vector = get_embedding(search_term, engine='text-embedding-ada-002')

    # find similarity between query and vectors
    df['similarities'] = df['embedding'].apply(lambda x: cosine_similarity(x, search_term_vector))
    df1 = df.sort_values("similarities", ascending=False).head(5)

    # output the response 
    print('\n')
    print('Answer: ', df1['text'].loc[df1.index[0]])
    print('\n')
    print('Similarity Score: ', df1['similarities'].loc[df1.index[0]]) 
    print('\n')
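
The query cell ranks rows by cosine similarity between each stored embedding and the query embedding. As a rough illustration of what that ranking does, here is a standard-library sketch of the usual cosine formula (dot product divided by the product of the vector lengths). The function `cosine` and the two-dimensional vectors are made up for this example; real embeddings have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented vectors standing in for stored embeddings.
documents = {
    "same direction": [2.0, 0.0],
    "diagonal": [1.0, 1.0],
    "orthogonal": [0.0, 3.0],
}
query = [1.0, 0.0]

# Sort document names from most to least similar to the query.
ranked = sorted(documents, key=lambda name: cosine(documents[name], query), reverse=True)
for name in ranked:
    print(name, round(cosine(documents[name], query), 3))
```

Vectors pointing the same way score 1.0 regardless of length, and perpendicular vectors score 0.0, so the top-ranked row is the one whose embedding points most nearly in the direction of the query.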

### Query your own data

The README shows how to add your own data. Once you have done that, type a query in the Chat Playground and click **View Code**, as in the previous section. The code it displays contains all the values you need to fill in here.

In [None]:
openai.api_type = "azure"
openai.api_version = "2023-08-01-preview"
# Azure OpenAI setup
openai.api_base = "<YOUR BASE URL>" # Add your endpoint here
deployment_id = "<YOUR DEPLOYMENT ID>" # Add your deployment ID here
# Azure Cognitive Search setup
search_endpoint = "<YOUR COG SEARCH BASE URL>"  # Add your Azure Cognitive Search endpoint here
# This is different from the key above; it is the key for Cognitive Search
search_key = "<YOUR SEARCH KEY>"  # Add your Azure Cognitive Search admin key here
search_index_name = "<YOUR SEARCH INDEX>"  # Add your Azure Cognitive Search index name here


Now run the query. The query text is defined in the cell below, and the response is printed in JSON format.

In [None]:
def setup_byod(deployment_id: str) -> None:
    """Sets up the OpenAI Python SDK to use your own data for the chat endpoint.

    :param deployment_id: The deployment ID for the model to use with your own data.

    To remove this configuration, simply set openai.requestssession to None.
    """

    class BringYourOwnDataAdapter(requests.adapters.HTTPAdapter):

        def send(self, request, **kwargs):
            request.url = f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}"
            return super().send(request, **kwargs)

    session = requests.Session()

    # Mount a custom adapter which will use the extensions endpoint for any call using the given `deployment_id`
    session.mount(
        prefix=f"{openai.api_base}/openai/deployments/{deployment_id}",
        adapter=BringYourOwnDataAdapter()
    )

    openai.requestssession = session

setup_byod(deployment_id)

completion = openai.ChatCompletion.create(
    messages=[{"role": "user", "content": "What were some of the phenotypic presentations of MPOX on patients with HIV?"}],
    deployment_id=deployment_id,
    dataSources=[  # camelCase is intentional, as this is the format the API expects
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": search_endpoint,
                "key": search_key,
                "indexName": search_index_name,
            }
        }
    ]
)
print(completion)
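
The adapter in `setup_byod` works by rewriting the request URL so that calls for the given deployment go to the extensions chat endpoint. To make that rewrite easier to inspect, the sketch below rebuilds the same URL as a plain function. The helper name `build_extensions_url` and the example values are hypothetical; the URL pattern is copied from the f-string in `send`.

```python
def build_extensions_url(api_base, deployment_id, api_version):
    """Mirror the f-string used in the adapter's send() method."""
    return (
        f"{api_base}/openai/deployments/{deployment_id}"
        f"/extensions/chat/completions?api-version={api_version}"
    )


# Placeholder values for illustration only.
url = build_extensions_url(
    "https://example.openai.azure.com",
    "my-deployment",
    "2023-08-01-preview",
)
print(url)
```

Printing the URL this way can help when debugging: a base URL that ends in a slash, for instance, would produce a double slash in the path.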


## Conclusion
In this notebook you learned how to create embeddings from a local csv file, query them by similarity, and query your own data through an LLM that you deployed in Azure OpenAI.

## Clean up
Make sure to shut down your Azure ML compute. If desired, you can also delete your deployed model in Azure OpenAI.