# Setting Usage Limits with the MLflow AI Gateway
This notebook shows how to set usage limits with the MLflow AI Gateway in Databricks. Usage limits can be set per endpoint or per user. They are an effective way to control costs and to avoid exceeding the usage limits imposed by hosted model providers.

The notebook includes examples for both OpenAI and MosaicML endpoints.

In [0]:
!pip install --upgrade 'mlflow[gateway]'
dbutils.library.restartPython()

In [0]:
# configure keys
import os
from dotenv import load_dotenv
import mlflow.gateway

load_dotenv()

OPENAI_API_KEY = dbutils.secrets.get(scope="daniel.liden", key="OPENAI_API_KEY")
MOSAIC_API_KEY = dbutils.secrets.get(scope="daniel.liden", key="MOSAIC_API_KEY")

In [0]:
mlflow.gateway.set_gateway_uri("databricks")

In [0]:

# Create a chat route backed by OpenAI's gpt-3.5-turbo.
# If a route with this name already exists, delete it first:
# mlflow.gateway.delete_route("dl-gpt-3_5-turbo")
mlflow.gateway.create_route(
    name="dl-gpt-3_5-turbo",
    route_type="llm/v1/chat",
    model={
        "name": "gpt-3.5-turbo",
        "provider": "openai",
        "openai_config": {
            "openai_api_key": OPENAI_API_KEY,
        },
    },
)

# Create a chat route backed by MosaicML's llama2-70b-chat.
# If a route with this name already exists, delete it first:
# mlflow.gateway.delete_route("dl-llama-70b-chat-mosaic")
mlflow.gateway.create_route(
    name="dl-llama-70b-chat-mosaic",
    route_type="llm/v1/chat",
    model={
        "name": "llama2-70b-chat",
        "provider": "mosaicml",
        "mosaicml_config": {
            "mosaicml_api_key": MOSAIC_API_KEY,
        },
    },
)

## Set limits with `set_limits`
- For the OpenAI route, we'll set a limit of 3 requests per user per minute.
- For the MosaicML route, we'll set a limit of 2 requests per minute for the endpoint as a whole.
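
To make the difference between the two kinds of limit concrete, here is a minimal sketch of a rate limiter that counts calls per minute either per user or for the whole route. It is a toy model for illustration only, not the gateway's implementation: the `FixedWindowLimiter` class is hypothetical, the fixed one-minute window is an assumption about how renewal works, and the `"route"` key name is assumed by analogy with the `"user"` key.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Toy fixed-window rate limiter (hypothetical; not the gateway's code)."""

    def __init__(self, calls, key="route", window_seconds=60):
        self.calls = calls                    # allowed calls per window
        self.key = key                        # "user" or "route"
        self.window_seconds = window_seconds  # length of the renewal period
        self._counts = defaultdict(int)

    def allow(self, user, now):
        # Per-user limits keep a separate counter for each user;
        # per-route limits share one counter across all users.
        bucket = user if self.key == "user" else "__route__"
        window = int(now // self.window_seconds)
        if self._counts[(bucket, window)] >= self.calls:
            return False
        self._counts[(bucket, window)] += 1
        return True


per_user = FixedWindowLimiter(calls=3, key="user")
per_route = FixedWindowLimiter(calls=2, key="route")

# One user makes four calls in the same minute: the fourth is rejected.
user_results = [per_user.allow("alice", now=0) for _ in range(4)]
# A different user still has a full allowance.
other_user = per_user.allow("bob", now=0)
# Three different users share the route-level allowance of two calls.
route_results = [per_route.allow(u, now=0) for u in ("alice", "bob", "carol")]
# Once the next minute starts, the route allowance renews.
after_renewal = per_route.allow("alice", now=60)
```

With a per-user limit, one heavy user cannot exhaust the allowance of others. With a per-route limit, total traffic to the endpoint is capped regardless of who sends it.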

In [0]:
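# OpenAI route: each user may make up to 3 calls per minute.
mlflow.gateway.set_limits(
    "dl-gpt-3_5-turbo", [{"key": "user", "renewal_period": "minute", "calls": 3}]
)

# MosaicML route: the endpoint as a whole accepts up to 2 calls per minute.
# The "route" key applies the limit across all users of the route.
mlflow.gateway.set_limits(
    "dl-llama-70b-chat-mosaic",
    [{"key": "route", "renewal_period": "minute", "calls": 2}],
)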

In [0]:
mlflow.gateway.get_limits("dl-gpt-3_5-turbo")

In [0]:
mlflow.gateway.query(
    "dl-gpt-3_5-turbo",
    {"messages": [{"role": "user", "content": "Very concisely explain MLflow runs."}]},
)

In [0]:
mlflow.gateway.query(
    "dl-gpt-3_5-turbo",
    {
        "messages": [
            {"role": "user", "content": "Very concisely explain MLflow artifacts."}
        ]
    },
)

In [0]:
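# Send three requests in quick succession. Requests beyond the route's
# configured per-minute limit are rejected.
mlflow.gateway.query(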
    "dl-llama-70b-chat-mosaic",
    {"messages": [{"role": "user", "content": "Very concisely explain MLflow runs."}]},
)
mlflow.gateway.query(
    "dl-llama-70b-chat-mosaic",
    {
        "messages": [
            {"role": "user", "content": "Very concisely explain MLflow experiments."}
        ]
    },
)
mlflow.gateway.query(
    "dl-llama-70b-chat-mosaic",
    {
        "messages": [
            {"role": "user", "content": "Very concisely explain MLflow models."}
        ]
    },
)
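
Once a limit is in place, client code needs a way to cope with rejected requests. One option is to wait for the renewal period to pass and try again. The helper below is a minimal sketch of that idea. `call_with_retry` is a hypothetical name, and this notebook does not show which exception type the gateway raises when a limit is exceeded, so the exception to retry on is left as a parameter. The demonstration uses a stand-in function instead of a real gateway call.

```python
import time


def call_with_retry(fn, retries=3, wait_seconds=60.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait for the limit to renew and retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            sleep(wait_seconds)


# Demonstration with a stand-in that fails twice and then succeeds.
attempts = []


def flaky_query():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limit exceeded")  # hypothetical error
    return "ok"


waits = []  # record the waits instead of actually sleeping
result = call_with_retry(flaky_query, retry_on=(RuntimeError,), sleep=waits.append)
```

In this notebook, `fn` could be a lambda wrapping one of the `mlflow.gateway.query` calls above, with `retry_on` narrowed to whatever exception the gateway raises in your environment.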