# Setting Usage Limits with the MLflow AI Gateway
This notebook shows how to set usage limits with the MLflow AI Gateway in Databricks. Usage limits can be set per route or per user. They are an effective way to control costs and to avoid exceeding the usage limits of hosted model providers.

The examples cover both OpenAI and MosaicML endpoints.

## 1. Set up MLflow with AI Gateway

In [0]:
%pip install --upgrade 'mlflow[gateway]'
# Restart Python so the upgraded MLflow package is picked up
dbutils.library.restartPython()

## 2. Load keys

Store the API keys as Databricks secrets. With the Databricks CLI, create a secret scope and add each key to it:

```bash
databricks secrets create-scope --scope <scope-name>
databricks secrets put --scope <scope-name> --key <key-name>
```

In [0]:
# Configure API keys by reading them from a Databricks secret scope.
# Replace the scope name with the scope you created above.
import mlflow.gateway

OPENAI_API_KEY = dbutils.secrets.get(scope="daniel.liden", key="OPENAI_API_KEY")
MOSAIC_API_KEY = dbutils.secrets.get(scope="daniel.liden", key="MOSAIC_API_KEY")
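
Outside a Databricks notebook, `dbutils` is not defined. The sketch below is a hypothetical helper, `get_secret` (not part of MLflow or Databricks), that reads from a secret scope when a `dbutils` object is passed in and otherwise falls back to an environment variable with the same name as the key. It assumes the keys are exported as environment variables when running locally.

```python
import os


def get_secret(scope: str, key: str, dbutils=None) -> str:
    """Hypothetical helper: read `key` from a Databricks secret scope when
    a dbutils object is supplied, else from an environment variable."""
    if dbutils is not None:
        # On Databricks, read the secret from the given scope.
        return dbutils.secrets.get(scope=scope, key=key)
    # Assumed local fallback: an environment variable named after the key.
    value = os.environ.get(key)
    if value is None:
        raise KeyError(f"Secret {key!r} not found in the environment")
    return value
```

On Databricks you might call `get_secret("my-scope", "OPENAI_API_KEY", dbutils=dbutils)`; locally, omit `dbutils` and export `OPENAI_API_KEY` first.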

In [0]:
# Point the gateway client at the Databricks workspace
mlflow.gateway.set_gateway_uri("databricks")

## 3. Search available routes and add new routes if needed

In [0]:
mlflow.gateway.search_routes()

In [0]:
# Creating a route whose name already exists may fail;
# uncomment the delete_route call to recreate the route.
# mlflow.gateway.delete_route("dl-gpt-3_5-turbo")
mlflow.gateway.create_route(
    name="dl-gpt-3_5-turbo",
    route_type="llm/v1/chat",
    model={
        "name": "gpt-3.5-turbo",
        "provider": "openai",
        "openai_config": {
            "openai_api_key": OPENAI_API_KEY,
        },
    },
)

# mlflow.gateway.delete_route("dl-llama-70b-chat-mosaic")
mlflow.gateway.create_route(
    name="dl-llama-70b-chat-mosaic",
    route_type="llm/v1/chat",
    model={
        "name": "llama2-70b-chat",
        "provider": "mosaicml",
        "mosaicml_config": {
            "mosaicml_api_key": MOSAIC_API_KEY,
        },
    },
)
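
Both routes pass a `model` dictionary with the same shape: a model `name`, a `provider`, and a `<provider>_config` block holding a `<provider>_api_key`. A hypothetical helper, `model_config`, that builds that dictionary is sketched below. It assumes the naming pattern seen for `openai` and `mosaicml` holds; other providers may use different configuration keys, so check the MLflow AI Gateway documentation before reusing it for them.

```python
from typing import Any, Dict


def model_config(name: str, provider: str, api_key: str) -> Dict[str, Any]:
    """Hypothetical helper: build the `model` dict for create_route.

    Assumes a "<provider>_config" block containing "<provider>_api_key",
    as used by the "openai" and "mosaicml" routes in this notebook.
    """
    return {
        "name": name,
        "provider": provider,
        f"{provider}_config": {f"{provider}_api_key": api_key},
    }


# Placeholder keys, for illustration only
openai_model = model_config("gpt-3.5-turbo", "openai", "sk-placeholder")
mosaic_model = model_config("llama2-70b-chat", "mosaicml", "mosaic-placeholder")
```

Passing `model=model_config("gpt-3.5-turbo", "openai", OPENAI_API_KEY)` to `mlflow.gateway.create_route` should then be equivalent to the first route definition.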

## 4. Set limits with `set_limits`
See the [`set_limits` documentation](https://mlflow.org/docs/latest/python_api/mlflow.gateway.html#mlflow.gateway.MlflowGatewayClient.set_limits). Each limit is a dictionary with a `renewal_period`, a number of `calls`, and an optional `key` that selects a per-route (`"route"`) or per-user (`"user"`) limit. The example below sets per-user limits. For demonstration purposes, the limit is very low: one call per user per minute.

In [0]:
# Allow each user one call per minute on each route
mlflow.gateway.set_limits(
    "dl-gpt-3_5-turbo",
    [{"key": "user", "renewal_period": "minute", "calls": 1}],
)

mlflow.gateway.set_limits(
    "dl-llama-70b-chat-mosaic",
    [{"key": "user", "renewal_period": "minute", "calls": 1}],
)
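
Each limit passed to `set_limits` is a dictionary. The hypothetical helper below, `make_limit`, builds one and checks it against the constraints as we read them in the MLflow documentation: `renewal_period` must be `"minute"`, `calls` is a non-negative integer, and `key` is `"user"`, `"route"`, or omitted (assumed to mean a per-route limit). Treat these rules as assumptions and confirm them against your MLflow version.

```python
from typing import Dict, Optional, Union

Limit = Dict[str, Union[str, int]]


def make_limit(
    calls: int, key: Optional[str] = None, renewal_period: str = "minute"
) -> Limit:
    """Hypothetical helper: build one limit dict for set_limits,
    validating the constraints assumed from the MLflow docs."""
    if renewal_period != "minute":
        raise ValueError("only 'minute' renewal periods are assumed to be supported")
    # bool is a subclass of int, so reject it explicitly
    if not isinstance(calls, int) or isinstance(calls, bool) or calls < 0:
        raise ValueError("calls must be a non-negative integer")
    if key not in (None, "user", "route"):
        raise ValueError("key must be 'user', 'route', or omitted")
    limit: Limit = {"renewal_period": renewal_period, "calls": calls}
    if key is not None:
        limit["key"] = key
    return limit


per_user = [make_limit(1, key="user")]
per_route = [make_limit(100, key="route")]
```

The resulting list can be passed straight to the API, for example `mlflow.gateway.set_limits("dl-gpt-3_5-turbo", per_user)`.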

In [0]:
mlflow.gateway.get_limits("dl-gpt-3_5-turbo")

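## 5. Query the route with the Python client

With a limit of one call per user per minute, the second query below is expected to be rejected if it runs within a minute of the first.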

In [0]:
mlflow.gateway.query(
    "dl-gpt-3_5-turbo",
    {
        "messages": [{"role": "user", "content": "Very concisely explain MLflow runs."}],
        "max_tokens": 40,
    },
)

In [0]:
mlflow.gateway.query(
    "dl-gpt-3_5-turbo",
    {
        "messages": [{"role": "user", "content": "Very concisely explain MLflow experiments."}],
        "max_tokens": 40,
    },
)
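
A toy fixed-window counter shows what a limit of one call per user per minute implies for repeated queries. This is an illustration under an assumption: the notebook does not describe how the gateway enforces limits internally, and the real gateway may count windows differently (for example, with a sliding window). `FixedWindowLimiter` is a hypothetical class, not an MLflow API.

```python
from collections import defaultdict
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Toy model of a per-user limit: at most `calls` requests per
    `window_seconds`. Illustrative only, not the gateway's implementation."""

    def __init__(self, calls: int, window_seconds: float = 60.0) -> None:
        self.calls = calls
        self.window_seconds = window_seconds
        # user -> (index of the current window, calls made in that window)
        self._counts: Dict[str, Tuple[int, int]] = defaultdict(lambda: (-1, 0))

    def allow(self, user: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        current_window, count = self._counts[user]
        if window != current_window:
            current_window, count = window, 0  # new window: reset the counter
        if count >= self.calls:
            return False  # over the limit: the call would be rejected
        self._counts[user] = (current_window, count + 1)
        return True


limiter = FixedWindowLimiter(calls=1)
results = [
    limiter.allow("alice", now=0.0),   # first call in the minute
    limiter.allow("alice", now=10.0),  # second call in the same minute
    limiter.allow("bob", now=10.0),    # a different user has a separate budget
    limiter.allow("alice", now=61.0),  # the next minute starts a new window
]
print(results)  # [True, False, True, True]
```

Because the limit key is `"user"`, a second user querying the same route within the same minute would not be affected by the first user's call in this model.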

## 6. Query the route with the REST API

In [0]:
route = mlflow.gateway.get_route("dl-gpt-3_5-turbo")
route_url = route.route_url

In [0]:
# Use the notebook's own API token to authenticate REST requests
token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().getOrElse(None)

In [0]:
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}
data = {
    "messages": [
        {
            "role": "user",
            "content": "Tell me about the MLflow REST API.",
        }
    ],
    "max_tokens": 40,
}

# json= serializes the payload to JSON
response = requests.post(route_url, headers=headers, json=data)
response, response.json()
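
When a call exceeds its limit, one option is to wait for the renewal period and retry. The sketch below assumes the REST endpoint signals a rate limit with HTTP status 429 (Too Many Requests), the conventional status for this case; verify the status your gateway actually returns. The hypothetical helper `post_with_retry` takes any zero-argument `send` callable returning `(status_code, body)` and accepts an injectable `sleep` so it can be tested without waiting.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Hypothetical helper: call `send` until it returns a non-429 status
    or attempts run out. Waiting 60 s assumes a one-minute limit window."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(wait_seconds)  # wait for the limit window to renew
    raise ValueError("max_attempts must be at least 1")


# Demo with canned responses instead of real HTTP calls
responses = iter([(429, {}), (200, {"ok": True})])
waits = []
status, body = post_with_retry(lambda: next(responses), sleep=waits.append)
print(status, body, waits)  # 200 {'ok': True} [60.0]
```

In the notebook, `send` could be a small function that calls `requests.post(route_url, headers=headers, json=data)` and returns `(response.status_code, response.json())`.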