# Azure OpenAI Quickstart

## Overview  
> "Large Language Models (LLMs) are models designed to generate text token by token based on the context provided. Given an input sequence, an LLM predicts the next piece of text."  

This `Azure OpenAI Quickstarts` notebook introduces the concept of LLMs, essential package requirements for utilizing LLMs, a brief guide on prompt design, and a few simple examples across various use cases. For more quickstart examples, refer to the [official Azure OpenAI quickstart guide](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?pivots=programming-language-studio).  

The content runs in a container-based environment so you can start hands-on practice immediately. The environment is pre-configured as a [DevContainer](https://code.visualstudio.com/docs/devcontainers/containers). When you open the repository in GitHub Codespaces, or download it locally with Docker installed, Visual Studio Code sets up the development environment automatically (Python Runtime 3.11.4, Azure OpenAI 1.13.3). After adding the required API credentials to the `.env` file, you can begin using the environment.

### Getting started with Azure OpenAI Service

New customers need to [request access](https://aka.ms/oai/access) to the Azure OpenAI service.
After approval, they can log in to the Azure Portal, create Azure OpenAI resources, and start experimenting with models via the studio.

Here is a [great starting resource](https://techcommunity.microsoft.com/t5/educator-developer-blog/azure-openai-is-now-generally-available/ba-p/3719177) to explore more.

## Build your first prompt  
This short exercise introduces submitting a prompt to an OpenAI model for a simple sentiment-classification task.

![](images/generative-AI-models-reduced.jpg)  

**Steps**:  
1. Install the `openai` library into your Python environment
2. Load standard helper libraries and set OpenAI secure credentials
3. Choose the appropriate model for the task
4. Create a simple prompt for the model
5. Submit the request to the model's API!

## 1. Install the `openai` library into your Python environment
When the DevContainer starts, the libraries listed in the `requirements.txt` file are installed automatically, so no additional installation steps are needed.
Versions: **Python 3.11.4**, **Azure OpenAI 1.13.3**.

## 2. Load standard helper libraries and set OpenAI secure credentials
Copy the `.env.sample` file in the root directory and rename the copy to `.env`. Enter your Azure OpenAI endpoint URL (`AZURE_OPENAI_ENDPOINT`) and API key (`AZURE_OPENAI_API_KEY`) in the file. The code below also reads `OPENAI_API_VERSION` from the same environment.  
If the code below doesn't run correctly, make sure the file is saved, then close and reopen it before continuing. (Alternatively, restart the kernel using the `Restart` button at the top.)

In [1]:
import os
import json
from openai import AzureOpenAI
from dotenv import load_dotenv
load_dotenv()

client = AzureOpenAI(
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key        = os.getenv("AZURE_OPENAI_API_KEY"),
    api_version    = os.getenv("OPENAI_API_VERSION")
)
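
If the client fails to authenticate, a quick check of the variables read in the cell above can narrow the problem down. The sketch below is one possible way to do that; `missing_env_vars` is a hypothetical helper written for this illustration and is not part of the `openai` or `python-dotenv` packages.

```python
import os

# Variable names read by the client setup above
REQUIRED_VARS = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "OPENAI_API_VERSION"]

def missing_env_vars(names, env=None):
    """Return the names that are unset or blank in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not (env.get(name) or "").strip()]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings in .env:", ", ".join(missing))
```

Run it after `load_dotenv()`; an empty result means all three values were found.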

## 3. Select an Appropriate OpenAI Model
As of December 2024, this exercise utilizes the following models. For consistency, we recommend deploying the models under the same names as listed below:  
- LLM Model: `gpt-4o-mini`  
- Embedding Model: `text-embedding-3-large`  

The `gpt-4o` and `gpt-4o-mini` models use a tokenizer that is better optimized for Korean, and they are faster, more accurate, and cheaper than the standard `gpt-4`.  
For more details on the models, refer to: [Azure OpenAI Service models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)  

In [2]:
deployment_name = os.getenv("DEPLOYMENT_NAME")
deployment_embedding_name = os.getenv("DEPLOYMENT_EMBEDDING_NAME")

## 4. Prompt Design  

The power of large language models (LLMs) lies in their training on massive amounts of text data to minimize prediction errors for the next piece of text (token).
For example, they learn concepts like:

* How to write
* How grammar works
* How to paraphrase
* How to answer questions
* How to lead conversations
* How to write in multiple languages
* How to code
* And more

### Controlling Large Language Models (LLMs)  
Among all inputs to an LLM, the most impactful is the **text prompt**.

There are several ways to guide a language model to generate desired outputs: 

* **Instruction**: Provide a clear explanation of what you want the model to do.  

* **Completion**: Guide the model to complete the beginning of the desired output.  

* **Demonstration**: Show the model what you want using one of these methods:  
Include a few examples inside the prompt or use a dataset of hundreds or thousands of examples through fine-tuning.  
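
As an illustration of the demonstration approach, the sketch below assembles a few-shot message list in the chat format used later in this notebook. The helper name `few_shot_messages` and the example reviews are hypothetical and only serve the illustration.

```python
def few_shot_messages(instruction, examples, query):
    """Build chat messages: an instruction, worked examples as earlier turns, then the real input."""
    messages = [{"role": "system", "content": instruction}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

# Hypothetical examples that demonstrate the desired output style
examples = [
    ("The battery died after one day.", "Negative"),
    ("Setup took two minutes and everything worked.", "Positive"),
]
messages = few_shot_messages(
    "Classify the sentiment of each review as Positive or Negative.",
    examples,
    "The screen is bright and sharp.",
)
print(len(messages))  # 6: one system message, two example pairs, one query
```

The resulting list can be passed as the `messages` argument of a chat completion request.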


### Three Basic Guidelines for Crafting Prompts

* **Show and tell**: Provide clear and concise instructions and clarify expectations using examples or a combination of both.  

* **Provide quality data**: Present high-quality examples to encourage the desired outcome. The most effective way to minimize hallucinations is to begin with accurate data (leveraging the RAG pattern).  

* **Check your settings**: Temperature and Top-P settings influence how the model generates responses. For tasks with a single correct answer, consider lowering the temperature (0). For more diverse, creative responses, increase the temperature (1).
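
To make the effect of temperature more concrete, the sketch below applies the usual softmax-with-temperature formulation to a few made-up token scores. This is an assumption about the general technique; the notebook does not describe how the service implements sampling internally.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At the lower temperature almost all probability mass sits on the top-scoring token, which is why low settings give repeatable answers and higher settings give more varied ones.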

In [15]:
# Create your first prompt
system_message = """You are an agent capable of distinguishing between positive and negative sentiments. Please return the results in a compact JSON format. Response example: {"1": "Positive", "2": "Negative"}"""
user_message = """1. The monitor is too hot. 2. The monitor's market reaction is too hot."""

## 5. Submit!

In [16]:
# Simple API Call
response = client.chat.completions.create(
    model=deployment_name,
    max_tokens=60,
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
)

response.choices[0].message.content

'{"1": "Negative", "2": "Positive"}'
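
Because the system message asks for JSON, the reply string can be parsed before it is used elsewhere. The sketch below shows one defensive way to do this; `parse_sentiments` is a hypothetical helper, and returning `None` on malformed output is a design choice made for this illustration.

```python
import json

def parse_sentiments(content):
    """Parse the model's reply into {sentence number: label}, or None if it is not a JSON object."""
    try:
        data = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    return {str(key): str(value) for key, value in data.items()}

reply = '{"1": "Negative", "2": "Positive"}'  # the reply shown above
print(parse_sentiments(reply))  # {'1': 'Negative', '2': 'Positive'}
```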

### Analyze the Output Data
In addition to the result of the API, let us also explore what other information is included in the response.

In [17]:
print(json.dumps(response.model_dump(), indent=2))

{
  "id": "chatcmpl-AlEPOsLvdYgnGTzrEd0EpJMQUPOq1",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{\"1\": \"Negative\", \"2\": \"Positive\"}",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null,
        "refusal": null
      }
    }
  ],
  "created": 1735820166,
  "model": "gpt-4o-2024-11-20",
  "object": "chat.completion",
  "system_fingerprint": "fp_82ce25c0d4",
  "usage": {
    "completion_tokens": 12,
    "prompt_tokens": 69,
    "total_tokens": 81,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}
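
The fields most often needed from this structure are the reply text, the finish reason, and the token counts. The sketch below extracts them from a trimmed copy of the dictionary printed above; `summarize_response` is a hypothetical helper for illustration.

```python
def summarize_response(dump):
    """Pull the reply text, truncation flag, and token counts out of a response dict."""
    choice = dump["choices"][0]
    usage = dump["usage"]
    return {
        "text": choice["message"]["content"],
        # A finish_reason of "length" means the reply was cut off by max_tokens
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
    }

# Trimmed copy of the response printed above
sample = {
    "choices": [
        {"finish_reason": "stop",
         "message": {"content": '{"1": "Negative", "2": "Positive"}'}}
    ],
    "usage": {"completion_tokens": 12, "prompt_tokens": 69, "total_tokens": 81},
}
print(summarize_response(sample))
```

With a live response, the same function can be called as `summarize_response(response.model_dump())`.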


### Check Azure OpenAI Usage Costs
Here, we calculate the [cost](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/#pricing) of using the `gpt-4o` series.  
We compare the following versions: gpt-4o (0806) and gpt-4o-mini (0718).  
Costs may vary depending on the deployed model version.  
(Azure OpenAI is continuously working to reduce costs.)

| Model | Version | Prompt (USD per 1K tokens) | Completion (USD per 1K tokens) |
|-----|-----|-----|-----|
| ***gpt-4o*** | 1120 | $0.00250 | $0.0100 |
| gpt-4o | 0806 | $0.00250 | $0.0100 |
| gpt-4o | 0513 | $0.00500 | $0.0150 |
| ***gpt-4o-mini*** | 0718 | $0.00015 | $0.0006 |

Comparatively:
- gpt-35-turbo is approximately 1/10 the cost of gpt-4o (0513).
- gpt-4o-mini is approximately 1/30 the cost of gpt-4o (0513).

Below is an example calculation of costs using gpt-4o.

In [27]:
# Function to calculate OpenAI API usage costs
def openai_cost(model_name):
    # Prices in USD per 1,000 tokens, taken from the table above.
    # The deployment is assumed to have the same name as the model.
    if model_name == "gpt-4o":
        model_prompt_cost = 0.0025
        model_complet_cost = 0.01
    elif model_name == "gpt-4o-mini":
        model_prompt_cost = 0.00015
        model_complet_cost = 0.0006
    else:
        # Fail before spending tokens on a model with no price entry
        raise ValueError(f"No pricing defined for model: {model_name}")

    # Call the deployment passed in, not the global deployment_name
    response = client.chat.completions.create(
        model=model_name,
        max_tokens=60,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ]
    )

    print(response.choices[0].message.content)

    print("Prompt tokens:", response.usage.prompt_tokens)
    print("Completion tokens:", response.usage.completion_tokens)
    print("Total tokens:", response.usage.total_tokens)

    # Token counts are billed per 1,000 tokens
    prompt_cost  = round(response.usage.prompt_tokens * model_prompt_cost / 1000, 9)
    complet_cost = round(response.usage.completion_tokens * model_complet_cost / 1000, 9)

    print(model_name, "Cost of usage: $", format(prompt_cost, '.10f').rstrip('0'),
          " + $", format(complet_cost, '.10f').rstrip('0'), " = $", format(prompt_cost + complet_cost, '.10f').rstrip('0'))

    return prompt_cost + complet_cost

### Call the `gpt-4o-mini` Model
If the `gpt-4o-mini` model is not deployed in the Azure OpenAI Studio, an error may occur.  
In case of an error, ensure that the model is deployed with the name `gpt-4o-mini`.

In [28]:
deployment_name = "gpt-4o-mini"
gpt_4o_mini_pricing = openai_cost(deployment_name)

{"1": "Negative", "2": "Positive"}
Prompt tokens: 69
Completion tokens: 12
Total tokens: 81
gpt-4o-mini Cost of usage: $ 0.00001035  + $ 0.0000072  = $ 0.00001755


### Call the `gpt-4o` Model
If the `gpt-4o` model is not deployed in the Azure OpenAI Studio, an error may occur.  
In case of an error, ensure that the model is deployed with the name `gpt-4o`.  
(Costs may vary depending on the version; we use version 0806 for illustration here.)

In [29]:
deployment_name = "gpt-4o"
gpt_4o_pricing = openai_cost(deployment_name)

{"1": "Negative", "2": "Positive"}
Prompt tokens: 69
Completion tokens: 12
Total tokens: 81
gpt-4o Cost of usage: $ 0.0001725  + $ 0.00012  = $ 0.0002925
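
The two totals above can be reproduced offline from the token counts and the per-1K prices in the table. The sketch below does only the arithmetic; `PRICES_PER_1K` and `estimate_cost` are hypothetical names, and the prices are copied from the table above, so they may be out of date.

```python
# (prompt price, completion price) in USD per 1,000 tokens, from the table above
PRICES_PER_1K = {
    "gpt-4o": (0.0025, 0.01),
    "gpt-4o-mini": (0.00015, 0.0006),
}

def estimate_cost(model_name, prompt_tokens, completion_tokens):
    """Return the cost in USD for one request with the given token counts."""
    prompt_price, completion_price = PRICES_PER_1K[model_name]
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1000

mini = estimate_cost("gpt-4o-mini", 69, 12)
full = estimate_cost("gpt-4o", 69, 12)
print(f"gpt-4o-mini: ${mini:.8f}")
print(f"gpt-4o:      ${full:.8f}")
print(f"ratio:       {full / mini:.1f}x")
```

At these prices, the same request costs roughly 16.7 times more on `gpt-4o` than on `gpt-4o-mini`.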


The `gpt-4o` and `gpt-4o-mini` models are more cost-efficient and perform faster due to optimized Korean tokenizers.

## Applications for Various Use Cases
1. Summarization
2. Classification
3. Product Name Generation
4. Embedding

### 1. Summarization
LLMs can be used across diverse use cases, and summarization is one of them.
To check the token length of a text, you can use the tokenizer tool: https://platform.openai.com/tokenizer

In [32]:
original_text = "Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.\n\nTl;dr"
system_message = "Summarize the text below as a bullet point list of the most important points"
user_message = "Text: ```" + original_text + "```"

In [33]:
# deployment_name = os.getenv("DEPLOYMENT_NAME")

# Set a few additional, typical parameters for the API call
response = client.chat.completions.create(
  model=deployment_name,
  messages=[
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
  ],
  temperature=0.7,
  max_tokens=300
)

print(response.choices[0].message.content)

- Pre-training on a large text corpus followed by task-specific fine-tuning has shown significant improvements in NLP tasks.  
- This approach typically requires large fine-tuning datasets (thousands or tens of thousands of examples).  
- Humans can perform new language tasks with only a few examples or instructions, a capability current NLP systems lack.  
- Scaling up language models significantly improves few-shot, task-agnostic performance.  
- Larger models can sometimes rival state-of-the-art fine-tuning methods without requiring extensive datasets.  


#### When No Reference Text is Provided
Let’s see how the model behaves when no reference text is available. In a RAG (Retrieval-Augmented Generation) pattern, the model is expected to generate output based only on the provided data rather than on its own knowledge, so with empty input it should decline to summarize instead of inventing content.

In [34]:
original_text = ""
system_message = "Summarize the text below as a bullet point list of the most important points"
user_message = "Text: ```" + original_text + "```"

# Set a few additional, typical parameters for the API call
response = client.chat.completions.create(
  model=deployment_name,
  messages=[
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
  ],
  temperature=0.7,
  max_tokens=300
)

print(response.choices[0].message.content)

It seems there is no text provided. Could you please share the text you'd like summarized?
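
In an application, the empty case can also be caught before any request is sent. The sketch below is one way to do that under the same prompt format as above; `build_summary_messages` is a hypothetical helper, and skipping the call for blank input is a design choice rather than something the service requires.

```python
FENCE = "`" * 3  # the triple-backtick delimiter used in the prompt above

def build_summary_messages(reference_text):
    """Return chat messages for summarizing `reference_text`, or None when it is empty or blank."""
    if not reference_text or not reference_text.strip():
        return None
    return [
        {"role": "system",
         "content": "Summarize the text below as a bullet point list of the most important points"},
        {"role": "user", "content": "Text: " + FENCE + reference_text + FENCE},
    ]

print(build_summary_messages(""))  # None
```

When the function returns `None`, the caller can show its own message and save the cost of a request.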


### 2. Classification

Infer the proper classification for an item based on given categories. In the following example, the prompt includes both the categories to classify into and the text to be classified.

Customer Inquiry: Hello, one of the keys on my laptop keyboard broke recently, and I need a replacement.

Classification Categories: Pricing, Hardware Support, Software Support

In [35]:
system_message = """Classify the following inquiry into one of the following:
categories: [Pricing, Hardware Support, Software Support]
"""
user_message = """inquiry: Hello, one of the keys on my laptop keyboard broke recently and I'll need a replacement.
Classified category:
"""

In [36]:
response = client.chat.completions.create(
  model=deployment_name,
  messages=[
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
  ],
  temperature=0,
  max_tokens=60
)

print(response.choices[0].message.content)

Hardware Support
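
Since the model returns free text, it can be worth checking the reply against the allowed categories before acting on it. The sketch below is a minimal illustration; `match_category` is a hypothetical helper, and treating an ambiguous reply as `None` is an assumption about what the calling code would want.

```python
CATEGORIES = ["Pricing", "Hardware Support", "Software Support"]

def match_category(reply, categories=CATEGORIES):
    """Return the single category named in `reply` (case-insensitive), or None if none or several match."""
    text = reply.strip().lower()
    hits = [category for category in categories if category.lower() in text]
    return hits[0] if len(hits) == 1 else None

print(match_category("Hardware Support"))  # Hardware Support
```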


### 3. Product Name Generation
Generate product names based on seed words. The prompt contains a product description and seed words that steer the naming. The runs below compare `temperature=0` with `temperature=0.8`; a higher temperature encourages more creative, varied names.

Product Description: A pair of shoes that can fit any foot size.  
Seed words: adaptable, fit, omni-fit.  

> Adjust the `temperature` value between 0 and 1 to observe varying results upon repeated execution.

In [37]:
system_message = """Come up with five product names that fit the given product description and seed words."""
user_message = """Product description: A pair of shoes that can fit any foot size.
Seed words: adaptable, fit, omni-fit.
Product names:"""

In [39]:
response = client.chat.completions.create(
  model=deployment_name,
  messages=[
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
  ],
  # temperature=0.8,
  temperature=0,
  max_tokens=60
)

print(response.choices[0].message.content)

1. **AdaptiFit**  
2. **OmniStep**  
3. **FlexiFit Shoes**  
4. **UniSize Soles**  
5. **EverFit Footwear**  


Run it once more with the temperature still at 0; the output is identical.

In [40]:
response = client.chat.completions.create(
    model=deployment_name,
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
    # temperature=0.8,
    temperature=0,
    max_tokens=60
)

print(response.choices[0].message.content)

1. **AdaptiFit**  
2. **OmniStep**  
3. **FlexiFit Shoes**  
4. **UniSize Soles**  
5. **EverFit Footwear**  


Now set the temperature to 0.8.

In [42]:
response = client.chat.completions.create(
    model=deployment_name,
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
    temperature=0.8,
    # temperature=0,
    max_tokens=60
)

print(response.choices[0].message.content)

1. FlexiFit  
2. OmniStep  
3. Adaptasole  
4. UniSize  
5. EverFit


Run it once more with the temperature at 0.8; this time the names differ from the previous run.

In [43]:
response = client.chat.completions.create(
    model=deployment_name,
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
    temperature=0.8,
    # temperature=0,
    max_tokens=60
)

print(response.choices[0].message.content)

1. **Adaptix**  
2. **OmniStep**  
3. **FitMorph**  
4. **UniSole**  
5. **FlexaFit**  


When the temperature of an LLM (Large Language Model) is close to:  
>- 0: The model becomes more deterministic and focused, generating predictable and less diverse responses. It tends to choose the most likely words based on training.
>- 1: The model becomes more creative and random, generating diverse and less predictable responses. It explores a wider range of possibilities.

In short:  
>- Low temperature (0): Focused and reliable.
>- High temperature (1): Creative and varied.

### 4. Embedding
This section showcases how to retrieve embeddings using the API to find similarities between words, sentences, and documents.  
For the `text-embedding-ada-002` model, the default embedding size is 1,536 dimensions, whereas `text-embedding-3-large` doubles it to 3,072 dimensions.  
(Update 2024.12: The embedding API has transitioned from `text-embedding-ada-002` to `text-embedding-3-large`.)

In [44]:
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

In [45]:
text = 'the quick brown fox jumped over the lazy dog'

In [46]:
vector = client.embeddings.create(input = [text], model=deployment_embedding_name).data[0].embedding
print(vector)
len(vector)

[-0.005379660055041313, -0.006699045188724995, 0.0018830852350220084, -0.001022727694362402, 0.007556849624961615, ...]

3072

In [53]:
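# Reference sentence. Note the trailing period: the strings in sentences1 and sentences2 omit it,
# which likely explains why the last pair below scores slightly under 1.0.
text = 'The new movie is awesome.'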
vector = client.embeddings.create(input=[text], model=deployment_embedding_name).data[0].embedding

sentences1 = ['The new movie is awesome',  
              'The new movie is awesome',  
              'The new movie is awesome']  
  
sentences2 = ['The dog plays in the garden',  
              'This recent movie is so good',  
              'The new movie is awesome']  

embeddings1 = [vector for _ in range(len(sentences1))]
embeddings2 = [client.embeddings.create(input = s, model=deployment_embedding_name).data[0].embedding for s in sentences2]  
  
for i in range(len(sentences1)):  
    print("Score: {:.4f}\t{}\t{}".format(cosine_similarity(embeddings1[i], embeddings2[i]), sentences1[i], sentences2[i]))

Score: 0.1309	The new movie is awesome	The dog plays in the garden
Score: 0.5902	The new movie is awesome	This recent movie is so good
Score: 0.9593	The new movie is awesome	The new movie is awesome
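
The cell above sends one request per sentence. Because the `input` parameter of the embeddings call also accepts a list of strings, the sentences can be sent together. The sketch below assumes the same `client` object as above; `embed_batch` is a hypothetical helper, and sorting by `index` is a defensive step to keep the vectors in input order.

```python
def embed_batch(client, texts, model):
    """Embed all `texts` in a single request and return the vectors in input order."""
    response = client.embeddings.create(input=texts, model=model)
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

# Usage with the objects defined in this notebook:
# embeddings2 = embed_batch(client, sentences2, deployment_embedding_name)
```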


Below is an example of how embedding APIs are used to convert Korean sentences into vector values, followed by a comparison to analyze similarity scores.

In [56]:
text = 'Banana milk tastes better after taking a bath.'
vector = client.embeddings.create(input=[text], model=deployment_embedding_name).data[0].embedding

sentences1 = [
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after taking a bath.',
]
  
sentences2 = [
    'Banana milk tastes better after taking a bath.',
    'Banana milk tastes better after a bath.',
    'After taking a bath, drinking banana milk tastes better.',
    'Tastes better banana bath milk after taking.',
    'Strawberry milk tastes better when you drink it before taking a bath.',
    'Milk tastes richer when eaten with cereal.',
    'Yesterday, the weather was colder because it rained and snowed.',
    'VWXYZ FGHI ABCDE JKL MNO PQR STU',
    'The development of LLMs is boundless.',    
]

embeddings1 = [vector for _ in range(len(sentences1))]  
embeddings2 = [client.embeddings.create(input = s, model=deployment_embedding_name).data[0].embedding for s in sentences2]  

for i in range(len(sentences1)):  
    print("Score: {:.4f}\t{}\t{}".format(cosine_similarity(embeddings1[i], embeddings2[i]), sentences1[i], sentences2[i]))

Score: 1.0000	Banana milk tastes better after taking a bath.	Banana milk tastes better after taking a bath.
Score: 0.9829	Banana milk tastes better after taking a bath.	Banana milk tastes better after a bath.
Score: 0.9205	Banana milk tastes better after taking a bath.	After taking a bath, drinking banana milk tastes better.
Score: 0.8675	Banana milk tastes better after taking a bath.	Tastes better banana bath milk after taking.
Score: 0.7919	Banana milk tastes better after taking a bath.	Strawberry milk tastes better when you drink it before taking a bath.
Score: 0.5203	Banana milk tastes better after taking a bath.	Milk tastes richer when eaten with cereal.
Score: 0.1262	Banana milk tastes better after taking a bath.	Yesterday, the weather was colder because it rained and snowed.
Score: 0.0682	Banana milk tastes better after taking a bath.	VWXYZ FGHI ABCDE JKL MNO PQR STU
Score: -0.0015	Banana milk tastes better after taking a bath.	The development of LLMs is boundless.


Cosine similarity is the cosine of the angle between two vectors, yielding values between -1 and 1.  
However, depending on the characteristics of embedding models, the actual range of these values can be more restricted.  
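
The theoretical range can be checked on small hand-made vectors. The sketch below is a pure-Python version of the same formula used by `cosine_similarity` earlier; the function name `cosine` and the two-dimensional vectors are chosen only for illustration.

```python
import math

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine([1, 0], [2, 0]))   # 1.0  (same direction)
print(cosine([1, 0], [0, 3]))   # 0.0  (orthogonal)
print(cosine([1, 0], [-1, 0]))  # -1.0 (opposite direction)
```

Vector length does not affect the score; only direction does.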

For the `text-embedding-ada-002` (1536 dimensions) model:
>- Cosine similarity scores between embedding vectors typically range from approximately 0.68 to 1.  
>This suggests that even unrelated text pairs may exhibit a similarity score above 0.68. 

For the `text-embedding-3-large` (3072 dimensions) model:
>- Cosine similarity scores span a broader spectrum, often yielding lower values.  
>This indicates an enhanced ability to distinguish between texts, allowing for more nuanced similarity assessments.

# Feel free to reach out with any questions.  
If you have questions related to Prompt Engineering, please contact the following email:  
MS Korea, Hyounsoo Kim: [hyounsookim@microsoft.com](mailto:hyounsookim@microsoft.com)