# Azure AI Models Quotas and Usages Discovery Recipe

This Jupyter notebook serves as a guide for users looking to understand how to discover Azure AI Models, Quotas and subscription Usages across regions.

## Pre-requisites

Authenticate to Azure CLI

```console
az login --use-device-code
```

## Get Azure Subscription

In [1]:
import requests
import json
import pandas as pd
from datetime import datetime
from azure.mgmt.subscription import SubscriptionClient
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
sub_client = SubscriptionClient(credential)
subscription = next(sub_client.subscriptions.list(), None)
if not subscription:
    raise Exception("Authenticate using the az cli")
subscriptionId = subscription.subscription_id

print(f"Using subscription {"x" * 8  + "-" + ("x" * 4 + "-")*3}{subscriptionId[-12:]}")

Using subscription xxxxxxxx-xxxx-xxxx-xxxx-2f5e9419c1cd


## Get API token and headers

In [2]:
token = credential.get_token('https://management.azure.com/.default')
headers = {'Authorization': 'Bearer ' + token.token}

## Locations API: Discover regions and geography groups

### Regions

In [3]:
def get_regions():
    locations_request = f"https://management.azure.com/subscriptions/{subscriptionId}/locations?api-version=2021-04-01"
    response = requests.get(locations_request, headers=headers)
    data = json.loads(response.text)
    return data["value"]

regions = get_regions()

def get_region_names(regions):
    return set([r['name'] for r in regions])

regions_list = get_region_names(regions)
print(f"{len(regions_list)} regions:")
print(", ".join(regions_list))

92 regions:
japan, brazilsoutheast, southafricawest, westus3, norwaywest, eastus2stage, eastus2euap, unitedstateseuap, canadacentral, southcentralusstage, centralindia, germanywestcentral, southeastasia, israel, swedencentral, ukwest, canada, eastus2, australiaeast, uk, switzerlandwest, canadaeast, australia, italynorth, australiacentral, northcentralusstage, westus2, southindia, uae, poland, uksouth, northcentralus, uaecentral, westeurope, qatarcentral, europe, westindia, germany, koreasouth, korea, southafrica, southafricanorth, newzealandnorth, france, southcentralus, westus, switzerlandnorth, brazilus, eastasia, polandcentral, unitedstates, australiasoutheast, westusstage, qatar, westus2stage, japanwest, global, northeurope, eastusstage, asiapacific, australiacentral2, westcentralus, koreacentral, eastasiastage, eastusstg, jioindiacentral, sweden, brazil, israelcentral, eastus, italy, singapore, centralusstage, centraluseuap, asia, brazilsouth, spaincentral, switzerland, newzealand

### Geography groups

In [4]:
def get_geography_groups(regions):
    return set([region['metadata']['geographyGroup'] for region in regions if 'geographyGroup' in region['metadata']])

geography_groups = get_geography_groups(regions)
print(", ".join(geography_groups))

Asia Pacific, US, Mexico, Canada, Middle East, Europe, South America, Africa


### Let's focus on US regions

In [5]:
def filter_by_geography_group(regions, *args):
    return filter(lambda r: r['metadata']['geographyGroup'] in args, [r for r in regions if 'geographyGroup' in r['metadata']])
regions_list = get_region_names(filter_by_geography_group(regions, 'US'))
print(", ".join(regions_list))

northcentralusstage, centralusstage, westusstage, centraluseuap, westus2, westus2stage, westus3, northcentralus, centralus, eastusstage, eastus2stage, eastus2euap, westcentralus, southcentralusstage, eastusstg, southcentralusstg, southcentralus, eastus2, westus, eastus


### Helper to parallelize calls per region

Most of the API calls we need to make going forward are region-specific, and we aim to aggregate results across multiple regions. Depending on the number of regions being queried, this could involve up to 90 regions. The following helper function enables parallel execution of queries for each region.

In [6]:
def parallel_map(fn, *iterables, executor=None, **kwargs):
    from concurrent.futures import ThreadPoolExecutor, as_completed
    from tqdm import tqdm

    """
    Equivalent to executor.map(fn, *iterables), but displays a tqdm-based progress bar.
    
    **kwargs is passed to tqdm.
    """
    with executor if executor else ThreadPoolExecutor(max_workers=len(len(*iterables))) as ex:
        futures = []
        for iterable in iterables:
            futures += [ex.submit(fn, i) for i in iterable]
        for f in tqdm(as_completed(futures), total=len(futures), **kwargs):
            yield f.result()

### Discover regions where Azure AI is supported using the Models API

Not all Azure regions support Azure AI, let's filter out not supported regions

In [7]:
def get_models_response(region):
    url = f"https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/locations/{region}/models?api-version=2023-05-01"
    return (region, requests.get(url, headers=headers))

def find_azureai_supported_regions():
    regions_successful = []
    regions_failed = []
    
    for (region, result) in parallel_map(get_models_response, regions_list):
        if result.status_code == 200:
            regions_successful.append(region)
        else:
            regions_failed.append(region)
    return (regions_successful, regions_failed)

regions_successful, regions_failed = find_azureai_supported_regions()
print("Potential Azure OpenAI regions based on control plane response:")
print(", ".join(regions_successful))

100%|██████████| 20/20 [00:00<00:00, 33.02it/s]

Potential Azure OpenAI regions based on control plane response:
westus, westus2, westcentralus, westus3, centralus, southcentralus, northcentralus, eastus2, eastus





In [8]:
print("Azure OpenAI Not Supported:")
print(", ".join(regions_failed))

Azure OpenAI Not Supported:
centralusstage, northcentralusstage, eastus2euap, westusstage, centraluseuap, eastus2stage, eastusstage, westus2stage, southcentralusstage, southcentralusstg, eastusstg


## Models API: Discover models and SKUs

### List models

In [9]:
def get_models(region):
    (_, response) = get_models_response(region)
    parsed = json.loads(response.text)
    return parsed['value']

def get_models_regions(regions):
    def get_models_region(region):
        return [{
            "model": model,
            "region": region
        } for model in get_models(region)]

    import itertools
    return itertools.chain.from_iterable(parallel_map(get_models_region, regions))

models_regions = list(get_models_regions(regions_successful))

100%|██████████| 9/9 [00:00<00:00, 13.74it/s]


In [10]:
def format_model_region(model_region):
    model = model_region['model']
    return {
        'region': model_region['region'],
        'kind': model['kind'],
        'modelName': model['model']['name'],
        'modelVersion': model['model']['version'],
        'skus': ', '.join([sku['name'] for sku in model['model']['skus']]),
    }

## Convert the list to a DataFrame
df = pd.DataFrame(map(format_model_region, models_regions))
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 100)
df.head()

Unnamed: 0,region,kind,modelName,modelVersion,skus
0,westus,OpenAI,gpt-35-turbo,0613,GlobalBatch
1,westus,OpenAI,gpt-35-turbo,1106,"Standard, GlobalBatch, ProvisionedManaged"
2,westus,OpenAI,gpt-35-turbo,0125,"Standard, GlobalBatch, ProvisionedManaged"
3,westus,OpenAI,gpt-4,0125-Preview,ProvisionedManaged
4,westus,OpenAI,gpt-4,1106-Preview,"Standard, ProvisionedManaged"


> **Note**: Only the top 5 are displayed

### List models by region

In [11]:
region_model_data = {}  

excluded_models =  ['text-similarity-ada-001', 'text-babbage-001', 'text-curie-001', 'text-similarity-curie-001', 'text-davinci-002','text-davinci-003', 'text-davinci-fine-tune-002', 'code-davinci-002', 'code-davinci-fine-tune-002','text-ada-001', 'text-search-ada-doc-001', 'text-search-ada-query-001', 'code-search-ada-code-001','code-search-ada-text-001', 'text-similarity-babbage-001', 'text-search-babbage-doc-001','text-search-babbage-query-001', 'code-search-babbage-code-001', 'code-search-babbage-text-001', 'text-search-curie-doc-001', 'text-search-curie-query-001', 'text-davinci-001','text-similarity-davinci-001', 'text-search-davinci-doc-001', 'text-search-davinci-query-001','code-cushman-001']

for region in regions_successful:
    data_test = []

    for item in get_models(region):
        model_name = None
        version = None
        sku_name = None
        if item["model"]["capabilities"].get("scaleType") == "Manual": #skip legacy models
            continue
        model_name = item["model"]["name"]
        if model_name in excluded_models: # if in list skip
            continue
        version = item["model"]["version"]
        rdate = item["model"]["deprecation"]
        for sku in item["model"]["skus"]:
            sku_name = sku["name"]
        if sku_name == "Standard": # This example is only targeting Standard Model deployments SKUI
            data_test.append({"Model Name": model_name, "Version": version, "SKU Name": sku_name})
                #print(data_test)

    region_model_data[region] = data_test  # store the model data under corresponding region name

In [12]:
rows = []
for region, models in region_model_data.items():
    for model in models:
        row = model.copy()  
        row['Region'] = region  
        rows.append(row)

df = pd.DataFrame(rows)
df = df[['Region', 'Model Name', 'Version', 'SKU Name']]
pd.set_option('display.max_rows', None)

df['Exist'] = True 
pivot_df = df.pivot_table(index='Region', columns=['Model Name', 'Version'], values='Exist', fill_value=False, aggfunc='any')
pivot_df.reset_index(inplace = True)

pivot_df

Model Name,Region,babbage-002,dall-e-2,dall-e-3,davinci-002,gpt-35-turbo,gpt-35-turbo-16k,gpt-35-turbo-instruct,gpt-4,text-embedding-3-large,text-embedding-3-small,text-embedding-ada-002,text-embedding-ada-002,tts,tts-hd,whisper
Version,Unnamed: 1_level_1,1,2.0,3.0,1,0301,0613,0914,vision-preview,1,1,1,2,001,001,001
0,eastus,False,True,True,False,True,True,True,False,True,True,True,True,False,False,False
1,eastus2,False,False,False,False,False,True,False,False,True,True,False,True,False,False,True
2,northcentralus,True,False,False,True,False,True,False,False,False,False,False,True,True,True,True
3,southcentralus,False,False,False,False,True,False,False,False,False,False,True,True,False,False,False
4,westus,False,False,False,False,False,False,False,True,False,True,False,True,False,False,False
5,westus3,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False


## Usages API: Discover quotas and usages

The Usages API allows to query current usage and limits and deduce the remaining quota available

In [13]:
def get_usages(region):
    url = f"https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/locations/{region}/usages?api-version=2023-05-01"
    response = requests.get(url, headers=headers)
    return json.loads(response.text)['value']

### Example for eastus2

In [14]:
usages = get_usages('eastus2')

In [15]:
def format_usage(usage):
    return {
        'usageName': usage['name']['value'],
        'localizedUsageName': usage['name']['localizedValue'],
        'currentUsage': usage['currentValue'],
        'usageLimit': usage['limit'],
        'remainingQuota': usage['limit'] - usage['currentValue'],
        'usageUnit': usage['unit'],
    }

## Convert the list to a DataFrame
df = pd.DataFrame(map(format_usage, usages))
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 100)
df.sort_values(by='currentUsage', ascending=False).head(10)

Unnamed: 0,usageName,localizedUsageName,currentUsage,usageLimit,remainingQuota,usageUnit
20,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
23,OpenAI.DataZoneStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - DataZoneStandard,10,30,20,Count
34,AccountCount,Maximum Resources per Region,4,200,196,Count
32,AIServices.S0.AccountCount,Maximum resources for AIServices S0 sku.,4,50,46,Count
18,OpenAI.Standard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o,1,8,7,Count
0,OpenAI.ProvisionedManaged,Provisioned Managed Throughput Unit,0,0,0,Count
1,OpenAI.GlobalProvisionedManaged,Global Provisioned Managed Throughput Unit,0,0,0,Count
2,OpenAI.FineTuned.Deployments,Standard Fine-Tuned Deployments,0,5,5,Count
3,OpenAI.S0.AccountCount,Maximum resources for OpenAI S0 sku.,0,30,30,Count
5,OpenAI.Standard.gpt-35-turbo,Tokens Per Minute (thousands) - GPT-35-Turbo,0,30,30,Count


> **Note**: Only the top 5 are displayed

> **Important**: The `usageName` is the value we'll use to join models with usages

### Discover usages for all regions

In [16]:
def get_regions_usages(regions):
    import itertools
    def get_region_usages(region):
        return (region, get_usages(region))
    return dict(parallel_map(get_region_usages, regions))

regional_usages = get_regions_usages(regions_successful)

100%|██████████| 9/9 [00:00<00:00, 18.91it/s]


### Index usages by name and region

We will use this index to join usages with the models on the **region** and **usage name**.

In [17]:
def index_usages_by_name(usages):
    from functools import reduce
    def usage_name_reducer(j, u):
        j[u['name']['value']] = u
        return j
    return reduce(usage_name_reducer, usages, {})

regional_usages_by_name = dict([(region, index_usages_by_name(usages)) for (region, usages) in regional_usages.items()])

## Stitch APIs together

### Flatten models with SKUs

SKUs are listed as an array under each model. We want one SKU per line

In [18]:
def flatten_models_sku(models_region):
    for model_join in models_region:
        model = model_join['model']
        skus = model['model']['skus']
        if skus:
            for sku in skus:
                yield model_join | {
                    "sku": sku
                }
model_region_skus = list(flatten_models_sku(models_regions))

In [19]:
def format_model_region_sku(model_region_sku):
    sku = model_region_sku['sku']
    return format_model_region(model_region_sku) | {
        'skuName': sku['name'],
        'usageName': sku['usageName'],
    }

## Convert the list to a DataFrame
df = pd.DataFrame(map(format_model_region_sku, model_region_skus))
df.loc[:, df.columns != 'skus'].head()

Unnamed: 0,region,kind,modelName,modelVersion,skuName,usageName
0,westus,OpenAI,gpt-35-turbo,613,GlobalBatch,OpenAI.GlobalBatch.gpt-35-turbo
1,westus,OpenAI,gpt-35-turbo,1106,Standard,OpenAI.Standard.gpt-35-turbo
2,westus,OpenAI,gpt-35-turbo,1106,GlobalBatch,OpenAI.GlobalBatch.gpt-35-turbo
3,westus,OpenAI,gpt-35-turbo,1106,ProvisionedManaged,OpenAI.ProvisionedManaged
4,westus,OpenAI,gpt-35-turbo,125,Standard,OpenAI.Standard.gpt-35-turbo


> **Note**: Only the top 5 are displayed

> **Important**: The `region` and `usageName` are the keys we'll use to join models with usages

### Join models with usages

Use the **region** and the **SKU**'s `usageName` to join models with usages

In [20]:
def join_models_usages(models_flattened, regional_usages_by_name):
    for model_join in models_flattened:
        model = model_join['model']
        sku = model_join['sku']
        region = model_join['region']
        usage = None
        if sku:
            usageName = sku['usageName']
            usages_by_name = regional_usages_by_name[region]
            usage = usages_by_name[usageName]
        yield model_join | {
            "usage": usage
        }

joined_model_sku_usages = list(join_models_usages(model_region_skus, regional_usages_by_name))

In [21]:
def format_model_region_usage(model_region_usage):
    usage = model_region_usage['usage']
    return format_model_region_sku(model_region_usage) | format_usage(usage)

## Convert the list to a DataFrame
df = pd.DataFrame(map(format_model_region_usage, joined_model_sku_usages))
df.sort_values(by='currentUsage', ascending=False).loc[:, df.columns != 'skus'].head()

Unnamed: 0,region,kind,modelName,modelVersion,skuName,usageName,localizedUsageName,currentUsage,usageLimit,remainingQuota,usageUnit
951,eastus2,AIServices,gpt-4o,2024-08-06,GlobalStandard,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
946,eastus2,AIServices,gpt-4o,2024-05-13,GlobalStandard,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
907,eastus2,OpenAI,gpt-4o,2024-08-06,GlobalStandard,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
902,eastus2,OpenAI,gpt-4o,2024-05-13,GlobalStandard,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
905,eastus2,OpenAI,gpt-4o,2024-05-13,DataZoneStandard,OpenAI.DataZoneStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - DataZoneStandard,10,30,20,Count


> **Note**: Only the top 5 are displayed

## Find models matching requirements

### Find models matching name, TPM and SKU requirements

Let's define a helper function to help us filter models by name, remaining tpm and SKU:

In [22]:
def model_matches(model_names, tpm, sku_names):
    def criteria(model, sku, usage, **kwargs):
        if not usage:
            return False
        limit = usage['limit']
        current = usage['currentValue']
        remaining = limit - current
        model_name = model['model']['name']
        sku_name = sku['name']
        matches = True
        if model_names:
            matches = matches and model_name in model_names
        if sku_names:
            matches = matches and sku_name in sku_names
        if tpm:
            matches = matches and remaining >= tpm
        return matches
    return criteria

def filter_models_sku_usages(criteria, models_sku_usages):
    return list(filter(lambda msu: criteria(**msu), models_sku_usages))

### Example of the Contoso Creative Writer application

Let's take the example for the [contoso-creative-writer](https://github.com/Azure-Samples/contoso-creative-writer) sample Azure AI application:

| Requirement	| Model	| SKU	| TPM |
| --- | --- | --- | --- |
| editor	| gpt-35-turbo or gpt-35-turbo-16k	| Standard or Global Standard	| 10k |
| eval	| gpt-4 or gpt-4-32k	| Standard or Global Standard	| 20k |
| writer	| gpt-4o	| Standard or Global Standard	| 15k |
| embeddings	| text-embedding-3-small or text-embedding-ada-002	| Standard or Global Standard	| 30k |


Here's how to encode those requirements using the previous `model_matches` function:

In [23]:
editor_req      = model_matches(model_names = ['gpt-35-turbo', 'gpt-35-turbo-16k'], tpm = 10, sku_names = ['Standard', 'GlobalStandard'])
eval_req        = model_matches(model_names = ['gpt-4', 'gpt-4-32k'], tpm = 5, sku_names = ['Standard', 'GlobalStandard'])
writer_req      = model_matches(model_names = ['gpt-4o'], tpm = 15, sku_names = ['Standard', 'GlobalStandard'])
embedding_req   = model_matches(model_names = ['text-embedding-3-small', 'text-embedding-ada-002'], tpm = 30, sku_names = ['Standard', 'GlobalStandard'])

requirements = [editor_req, eval_req, writer_req, embedding_req]

### Find all models matching requirements

Let's say you want to get an overview of all models matching any of the requirements

For this, we'll need a helper to build a predicate that returns true if any of the requirements is true

In [24]:
def __any__(predicates):
    from functools import reduce
    def pred(*args, **kwargs):
        return reduce(lambda bool_value, predicate: bool_value or predicate(*args, **kwargs), predicates, False)
    return pred

In [25]:
any_of_requirements = __any__(requirements)

models_sku_usages_filtered = filter_models_sku_usages(any_of_requirements, joined_model_sku_usages)

df = pd.DataFrame(map(format_model_region_usage, models_sku_usages_filtered))
df.sort_values(by='currentUsage', ascending=False).loc[:, df.columns != 'skus'].head()

Unnamed: 0,region,kind,modelName,modelVersion,skuName,usageName,localizedUsageName,currentUsage,usageLimit,remainingQuota,usageUnit
76,eastus2,OpenAI,gpt-4o,2024-05-13,GlobalStandard,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
77,eastus2,OpenAI,gpt-4o,2024-08-06,GlobalStandard,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
87,eastus2,AIServices,gpt-4o,2024-08-06,GlobalStandard,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
86,eastus2,AIServices,gpt-4o,2024-05-13,GlobalStandard,OpenAI.GlobalStandard.gpt-4o,Tokens Per Minute (thousands) - gpt-4o - GlobalStandard,11,30,19,Count
4,westus,OpenAI,gpt-4,turbo-2024-04-09,Standard,OpenAI.Standard.gpt-4-turbo,Tokens Per Minute (thousands) - GPT-4-Turbo,0,8,8,Count


> **Note**: Only the top 5 are displayed

## Finding regions matching all model requirements

Sometimes, you're interested in deploying an AI application in a region where all requirements are met.

In [26]:
def unique_regions(model_sku_usages):
    return set([model_sku_usage['region'] for model_sku_usage in model_sku_usages])

In [27]:
def regions_matching_all(requirements, joined_model_sku_usages):
    filter_msu_sets = [filter_models_sku_usages(req, joined_model_sku_usages) for req in requirements]
    regions_sets = [unique_regions(model_sku_usages) for model_sku_usages in filter_msu_sets]
    return set.intersection(*regions_sets)

final_regions = regions_matching_all(requirements, joined_model_sku_usages)
print(", ".join(final_regions))

southcentralus, westus, eastus2, eastus, westus3, northcentralus


## Useful links

- [Azure REST API Browser](https://learn.microsoft.com/en-us/rest/api/)
- [Azure Locations API MS Learn](https://learn.microsoft.com/en-us/rest/api/resources/subscriptions/list-locations)
- [Azure AI Models API MS Learn](https://learn.microsoft.com/en-us/rest/api/aiservices/accountmanagement/models/list)
- [Azure AI Usages API MS Learn](https://learn.microsoft.com/en-us/rest/api/aiservices/accountmanagement/usages/list) (Quotas)