In this notebook, I explore API rate limits using the mathematical approach described in `README.md`.

### Function to calculate API gateway maximum rate limit based on the max input token size

In [4]:
def max_rate_limit_per_minute(
    total_llm_tokens_per_minute: float,
    max_input_tokens_per_request: float,
    max_output_tokens_per_request: float = 4096
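) -> float:
    """Highest requests-per-minute rate the token budget allows if every request uses its full input and output allowance."""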
    return total_llm_tokens_per_minute / (max_input_tokens_per_request + max_output_tokens_per_request)
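
As a quick check of the formula above, here is a minimal worked example. The 1,000,000 tokens-per-minute budget and the two input sizes are illustrative assumptions, and the function is repeated only so the cell runs on its own.

```python
def max_rate_limit_per_minute(
    total_llm_tokens_per_minute: float,
    max_input_tokens_per_request: float,
    max_output_tokens_per_request: float = 4096
) -> float:
    # Same formula as above: minute budget divided by the worst-case tokens per request
    return total_llm_tokens_per_minute / (max_input_tokens_per_request + max_output_tokens_per_request)

# 4096 input + 4096 output = 8192 tokens per request
print(max_rate_limit_per_minute(1_000_000, 4096))   # 122.0703125

# 12288 input + 4096 output = 16384 tokens per request (double the budget per request)
print(max_rate_limit_per_minute(1_000_000, 12288))  # 61.03515625
```

Doubling the worst-case tokens per request halves the sustainable request rate, which is the trade-off the rest of the notebook tabulates.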

### Function to calculate maximum input token limit based on the required rate limit

In [6]:
def max_input_token_limit_per_request(
    total_llm_tokens_per_minute: float,
    max_rate_limit_per_minute: float,
    max_output_tokens_per_request: float = 4096
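) -> float:
    """Largest input size per request that still sustains the required rate; a negative result means the rate is not achievable."""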
    return (total_llm_tokens_per_minute / max_rate_limit_per_minute) - max_output_tokens_per_request
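
The sketch below exercises the inverse formula with an assumed budget of 1,000,000 tokens per minute. It also shows one case worth watching for: when the required rate is too high, the result is negative, which I read as "this rate cannot be sustained with the reserved output size". The function is repeated only so the cell runs on its own.

```python
def max_input_token_limit_per_request(
    total_llm_tokens_per_minute: float,
    max_rate_limit_per_minute: float,
    max_output_tokens_per_request: float = 4096
) -> float:
    # Tokens available per request, minus the part reserved for the output
    return (total_llm_tokens_per_minute / max_rate_limit_per_minute) - max_output_tokens_per_request

# 100 requests per minute leaves 10,000 tokens per request, 4096 of them reserved for output
input_limit = max_input_token_limit_per_request(1_000_000, 100)
print(input_limit)  # 5904.0

# Round trip: plugging the input limit back into the rate formula recovers 100 requests per minute
print(1_000_000 / (input_limit + 4096))  # 100.0

# 500 requests per minute leaves only 2000 tokens per request, less than the 4096 output reserve
print(max_input_token_limit_per_request(1_000_000, 500))  # -2096.0
```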

### Function to convert number of tokens to approx. number of words

This conversion follows the rule of thumb in the OpenAI [documentation](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them): one token is roughly three quarters of an English word.

In [None]:
def token_to_word(tokens: float) -> float:
    # Rule of thumb for English text: 1 token is about 0.75 words
    return 0.75 * tokens
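
A short usage sketch of the conversion, repeated here so the cell runs on its own. The ratio is only an approximation for English text, so treat the results as rough estimates rather than exact counts.

```python
def token_to_word(tokens: float) -> float:
    # Rule of thumb for English text: 1 token is about 0.75 words
    return 0.75 * tokens

print(token_to_word(100))      # 75.0
print(token_to_word(180_000))  # 135000.0
```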

# Model Exploration

In this section, I pick a model from a provider and tabulate its API constraints using the formulas above.

## AWS Bedrock - Claude 3 Sonnet

Claude 3 Sonnet on AWS Bedrock has a total token limit of 1,000,000 tokens per minute as per the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html), and a maximum Bedrock API rate limit of 500 requests per minute. I use a maximum input window of 180,000 tokens per request (the model can actually take 200,000, but I am leaving a buffer) and a maximum output limit of 4096 tokens per request.
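
To see how the two quotas interact, here is a minimal sketch that takes the lower of the token-based rate and the 500 requests-per-minute cap. The helper name `effective_requests_per_minute` is hypothetical, and treating the two quotas as a simple minimum is my assumption, not something the Bedrock documentation states.

```python
def effective_requests_per_minute(
    total_llm_tokens_per_minute: float,
    input_tokens: float,
    output_tokens: float = 4096,
    provider_requests_per_minute: float = 500
) -> float:
    # Rate allowed by the token budget if every request uses its full allowance
    token_bound = total_llm_tokens_per_minute / (input_tokens + output_tokens)
    # Assumption: the provider's request cap applies on top of the token budget
    return min(token_bound, provider_requests_per_minute)

# Full 180,000-token input window: the token budget is the binding constraint
print(round(effective_requests_per_minute(1_000_000, 180_000), 2))  # 5.43

# Small requests (1000 in, 500 out): the 500 requests-per-minute cap binds instead
print(effective_requests_per_minute(1_000_000, 1000, 500))  # 500

# The cap only binds when a request uses at most this many tokens in total
print(1_000_000 / 500)  # 2000.0
```

With 4096 output tokens reserved per request, every request already exceeds 2000 tokens, so under these assumptions the token budget is always the tighter of the two limits.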

In [None]:
import pandas as pd

# AWS Bedrock - Claude 3 Sonnet constraints
total_llm_tokens_per_minute = 1_000_000
# Provider request cap; named so it does not shadow the max_rate_limit_per_minute() function
bedrock_max_requests_per_minute = 500
max_output_tokens_per_request = 4096
max_input_tokens_per_request = 180_000

# Generate input sizes in steps of 4096 tokens, up to 180,000
input_tokens_list = list(range(4096, max_input_tokens_per_request + 1, 4096))

# Calculate the rate limits and word equivalents
data = []
for input_tokens in input_tokens_list:
    rate_limit = max_rate_limit_per_minute(
        total_llm_tokens_per_minute,
        input_tokens,
        max_output_tokens_per_request
    )
    words = token_to_word(input_tokens)
    data.append((input_tokens, rate_limit, words))

# Create a DataFrame for better visualization
df = pd.DataFrame(data, columns=['Input Tokens', 'Max Rate Limit Per Minute', 'Approx. Words'])

# Display the DataFrame (rendered as a table when it is the last expression in a notebook cell)
df
