<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/customization/llms/AzureOpenAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Azure OpenAI

Azure OpenAI resources differ from standard OpenAI resources in one important way: you can't generate embeddings unless you deploy an embedding model yourself. The regions where these models are available are listed here: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models#embeddings-models

Furthermore, when this notebook was first written, the regions that supported embedding models did not support the latest (`*-003`) versions of OpenAI models, so you may be forced to use one region for embeddings and another for text generation.
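If your embedding and chat deployments end up in different regions, they also live under different resource endpoints (and usually use different keys). A minimal way to keep that straight is one config record per deployment. The `AzureDeployment` class, its field names, and the example endpoints and keys below are hypothetical placeholders, not part of LlamaIndex; `as_kwargs()` simply returns the same keyword arguments that `AzureOpenAI` and `AzureOpenAIEmbedding` are given later in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzureDeployment:
    """Hypothetical container for the settings of one Azure OpenAI deployment."""

    model: str            # underlying model name, e.g. "gpt-4o"
    deployment_name: str  # the name you chose when deploying the model in Azure
    azure_endpoint: str   # resource endpoint; each resource lives in one region
    api_key: str
    api_version: str = "2024-12-01-preview"

    def as_kwargs(self) -> dict:
        """Keyword arguments in the shape passed to AzureOpenAI / AzureOpenAIEmbedding."""
        return {
            "model": self.model,
            "deployment_name": self.deployment_name,
            "azure_endpoint": self.azure_endpoint,
            "api_key": self.api_key,
            "api_version": self.api_version,
        }


# Placeholder resources in two different regions (replace with your own values).
chat_deployment = AzureDeployment(
    model="gpt-4o",
    deployment_name="gpt-4o",
    azure_endpoint="https://my-chat-resource.openai.azure.com/",
    api_key="<chat-resource-key>",
)
embedding_deployment = AzureDeployment(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    azure_endpoint="https://my-embedding-resource.openai.azure.com/",
    api_key="<embedding-resource-key>",
)
```

With LlamaIndex installed, `AzureOpenAI(**chat_deployment.as_kwargs())` and `AzureOpenAIEmbedding(**embedding_deployment.as_kwargs())` should then point each client at its own region.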

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [1]:
%pip install llama-index-embeddings-azure-openai
%pip install llama-index-llms-azure-openai



In [2]:
%pip install llama-index


In [3]:
import logging
import sys

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

# Send log records to stdout at INFO level (use logging.DEBUG for more verbose output).
# force=True replaces any handlers the notebook environment already installed;
# adding a second StreamHandler on top of basicConfig would print every record twice.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, force=True)

Here, we set up the embedding model (for retrieval) and the LLM (for text generation).
Note that you need not only the model names (e.g. "text-embedding-ada-002") but also the model deployment names (the names you chose when deploying the models in Azure).
You must pass the deployment name as a parameter when you initialize `AzureOpenAI` and `AzureOpenAIEmbedding`.
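The deployment name matters because Azure routes requests by deployment, not by model: the deployment name is a path segment in the REST URL, and the model name does not appear there at all. The hypothetical helper `azure_openai_url` below only illustrates the URL shape documented for the Azure OpenAI REST API; you never need to build it yourself, since the `openai` SDK that LlamaIndex wraps does this for you.

```python
from urllib.parse import urlencode


def azure_openai_url(endpoint: str, deployment_name: str, operation: str, api_version: str) -> str:
    """Build the REST URL for one Azure OpenAI deployment.

    `operation` is the route after the deployment segment,
    e.g. "chat/completions" or "embeddings".
    """
    base = endpoint.rstrip("/")  # tolerate endpoints with or without a trailing slash
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment_name}/{operation}?{query}"


print(azure_openai_url(
    "https://azureopenai-pj.openai.azure.com/",
    "text-embedding-ada-002",
    "embeddings",
    "2024-12-01-preview",
))
# → https://azureopenai-pj.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2024-12-01-preview
```

If the deployment name is wrong, the request goes to a path that does not exist, which is why a correct `model` value alone is not enough.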

In [4]:
from google.colab import userdata

# Read the Azure OpenAI key stored in Colab's Secrets panel under this name
AzOpenAPI_Key = userdata.get('AzOpenAPI_Key')
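`google.colab.userdata` only exists inside Colab. To run the same notebook elsewhere, one option is to fall back to an environment variable. The helper name `load_azure_key` is hypothetical, and `AZURE_OPENAI_API_KEY` is assumed as the fallback because it is the variable the `openai` SDK's Azure client reads by default.

```python
import os


def load_azure_key(secret_name: str = "AzOpenAPI_Key", env_var: str = "AZURE_OPENAI_API_KEY") -> str:
    """Return the Azure OpenAI key from Colab Secrets, else from an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None
    if userdata is not None:
        return userdata.get(secret_name)
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable or the Colab secret {secret_name!r}.")
    return key
```

Usage: `AzOpenAPI_Key = load_azure_key()` works unchanged in Colab and, outside it, after `export AZURE_OPENAI_API_KEY=...`.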

In [19]:
api_version = "2024-12-01-preview"
# Replace with the endpoint of your own Azure OpenAI resource
azure_endpoint = "https://azureopenai-pj.openai.azure.com/"
api_key = AzOpenAPI_Key

llm = AzureOpenAI(
    model="gpt-4o",
    deployment_name="gpt-4o",  # the deployment name you chose in Azure
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",  # the deployment name you chose in Azure
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

In [20]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model

In [21]:
documents = SimpleDirectoryReader(
    input_files=["/content/paul_graham_essay.txt"]
).load_data()
index = VectorStoreIndex.from_documents(documents)

In [22]:
query = "What is most interesting about this essay?"
query_engine = index.as_query_engine()
answer = query_engine.query(query)

print(answer.get_formatted_sources())
print("query was:", query)
print("answer was:", answer)

> Source (Doc id: 5ddacebf-5558-4d94-90b4-0be7f3768eda): Notes

[1] My experience skipped a step in the evolution of computers: time-sharing machines with...

> Source (Doc id: e5fe439c-679c-46e8-92d1-c6d9b4ddbf53): A lot of Lisp hackers dream of building a new Lisp, partly because one of the distinctive feature...
query was: What is most interesting about this essay?
answer was: The essay highlights the transformative power of pursuing unprestigious work, emphasizing how it can lead to genuine discoveries and align with pure motives. It also explores the shift in publishing brought about by the internet, enabling a new generation of essays and democratizing access to audiences. The author's reflections on personal experiences, such as building a new Lisp dialect, starting Y Combinator, and embracing online essay writing, provide a compelling narrative about innovation, independence, and the value of unconventional paths.


In [25]:
# Ask a second question against the same query engine

query = "Give a 50-word summary of this essay."
answer = query_engine.query(query)

print("query was:", query)
print("answer was:", answer)


query was: Give a 50-word summary of this essay.
answer was: The essay reflects on the author's journey through writing, programming, and career choices. It explores early experiences with computers, transitioning from batch processing to microcomputers, and the excitement of programming. The author discusses shifting academic interests from philosophy to AI, influenced by cultural inspirations, and highlights the evolution of technology and personal growth.


In [26]:
# Embed a single sentence and inspect the first few dimensions of the vector

embeddings = embed_model.get_text_embedding("This is a test sentence.")
print(embeddings[:5])  # print the first 5 elements of the embedding

[-0.0011391325388103724, -0.003206387162208557, 0.002380132209509611, -0.004501554183661938, -0.010328996926546097]
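Retrieval works by comparing vectors like the one printed above. A common measure is cosine similarity; the pure-Python sketch below computes it and picks the top-k most similar chunks. The vectors are toy 3-dimensional values made up for illustration (real `text-embedding-ada-002` embeddings have 1,536 dimensions), and `top_k` is a hypothetical helper; `VectorStoreIndex` performs its own similarity ranking for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks whose vectors are most similar to query_vec."""
    ranked = sorted(chunks, key=lambda cid: cosine_similarity(query_vec, chunks[cid]), reverse=True)
    return ranked[:k]


# Toy vectors for illustration only; not real embeddings.
chunks = {
    "lisp": [0.9, 0.1, 0.0],
    "yc": [0.1, 0.9, 0.1],
    "painting": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.3, 0.0]
print(top_k(query_vec, chunks))  # → ['lisp', 'yc']
```

Because a query engine only passes the top-ranked chunks to the LLM, answers such as the "50-word summary" above reflect the retrieved passages rather than the whole essay.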
