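# How to handle rate limits

When you call the OpenAI API repeatedly, you may encounter error messages that say `429: 'Too Many Requests'` or `RateLimitError`. These errors mean you have exceeded the API's rate limits.

This guide shares tips for avoiding and handling rate limit errors.

To see an example script for throttling parallel requests to avoid rate limit errors, see [api_request_parallel_processor.py](https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py).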

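## Why rate limits exist

Rate limits are a common practice for APIs, and they are put in place for a few different reasons.

- First, they help protect against abuse or misuse of the API. For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, OpenAI can prevent this kind of activity.
- Second, rate limits help ensure that everyone has fair access to the API. If one person or organization makes an excessive number of requests, it could slow the API down for everyone else. By limiting the number of requests a single user can make, OpenAI ensures that everyone has an opportunity to use the API without experiencing slowdowns.
- Lastly, rate limits help OpenAI manage the aggregate load on its infrastructure. If requests to the API increase dramatically, they could tax the servers and cause performance issues. By setting rate limits, OpenAI can help maintain a smooth and consistent experience for all users.

Although hitting rate limits can be frustrating, rate limits exist to protect the reliable operation of the API for its users.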

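## Default rate limits

Your rate limit and spending limit (quota) are adjusted automatically based on a number of factors. As your usage of the OpenAI API goes up and you successfully pay your bills, we automatically raise your usage tier. You can find specific information about rate limits in the resources below.

### Other rate limit resources

Read more about OpenAI's rate limits in these resources:

- [Guide: Rate limits](https://platform.openai.com/docs/guides/rate-limits?context=tier-free)
- [Help Center: Is API usage subject to any rate limits?](https://help.openai.com/en/articles/5955598-is-api-usage-subject-to-any-rate-limits)
- [Help Center: How can I solve 429: 'Too Many Requests' errors?](https://help.openai.com/en/articles/5955604-how-can-i-solve-429-too-many-requests-errors)

### Requesting a rate limit increase

If you would like your organization's rate limit increased, visit your [Limits settings page](https://platform.openai.com/account/limits) to see how you can raise your usage tier.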

In [None]:
import openai
import os

client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

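## Example rate limit error

A rate limit error occurs when API requests are sent too quickly. If you are using the OpenAI Python library, the error looks something like this: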

```
RateLimitError: Rate limit reached for default-codex in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min. Contact support@openai.com if you continue to have issues or if you’d like to request an increase.
```

Below is example code that triggers a rate limit error.

In [3]:
# request a bunch of completions in a loop
for _ in range(100):
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=10,
    )

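## How to avoid rate limit errors

### Retrying with exponential backoff

One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff. Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request succeeds or a maximum number of retries is reached.

This approach has several benefits:

- Automatic retries mean you can recover from rate limit errors without crashes or missing data
- Exponential backoff means your first retries can be tried quickly, while you still benefit from longer delays if your first few retries fail
- Adding random jitter to the delay helps keep retries from all hitting at the same time

Note that unsuccessful requests count toward your per-minute limit, so continuously resending a request will not work.

Below are a few example solutions.

#### Example #1: Using the Tenacity library

[Tenacity](https://tenacity.readthedocs.io/en/latest/) is an Apache 2.0 licensed general-purpose retrying library, written in Python, that simplifies the task of adding retry behavior to just about anything.

To add exponential backoff to your requests, you can use the `tenacity.retry` [decorator](https://peps.python.org/pep-0318/). The following example uses the `tenacity.wait_random_exponential` function to add random exponential backoff to a request.

Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.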
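To make the retry schedule concrete, the sketch below computes the sleep durations that a jittered exponential backoff would use. It is a minimal illustration, not the algorithm used by any particular library: the function name `backoff_delays` and all of its parameters are hypothetical, and the jitter shown (a uniform draw between 0 and the current ceiling) is only one common variant.

```python
import random


def backoff_delays(
    initial_delay=1.0,
    exponential_base=2.0,
    max_delay=60.0,
    max_retries=6,
    jitter=True,
    rng=None,
):
    """Return the sleep durations (in seconds) for successive retries.

    Hypothetical helper for illustration only.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(max_retries):
        # The ceiling grows geometrically with each attempt, capped at max_delay
        ceiling = min(max_delay, initial_delay * exponential_base**attempt)
        # With jitter, draw uniformly in [0, ceiling] so that clients
        # retrying at the same moment spread out instead of colliding
        delays.append(rng.uniform(0, ceiling) if jitter else ceiling)
    return delays


# Without jitter the schedule is deterministic
print(backoff_delays(jitter=False))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

With `jitter=True` each entry is a random value no larger than the corresponding entry above, which is what keeps many clients from retrying in lockstep.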

In [6]:
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)  # for exponential backoff

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)


completion_with_backoff(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Once upon a time,"}])

ChatCompletion(id='chatcmpl-8PAu6anX2JxQdYmJRzps38R8u0ZBC', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='in a small village nestled among green fields and rolling hills, there lived a kind-hearted and curious young girl named Lily. Lily was known for her bright smile and infectious laughter, bringing joy to everyone around her.\n\nOne sunny morning, as Lily played in the meadows, she stumbled upon a mysterious book tucked away beneath a tall oak tree. Intrigued, she picked it up and dusted off its weathered cover to reveal intricate golden patterns. Without hesitation, she opened it, discovering that its pages were filled with magical tales and enchanting adventures.\n\nAmong the stories she found, one particularly caught her attention—a tale of a long-lost treasure hidden deep within a mysterious forest. Legend had it that whoever found this hidden treasure would be granted one wish, no matter how big or small. Excited by the prospect of findin

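#### Example #2: Using the backoff library

Another library that provides function decorators for backoff and retry is [backoff](https://pypi.org/project/backoff/).

Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.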

In [10]:
import backoff  # for exponential backoff

@backoff.on_exception(backoff.expo, openai.RateLimitError)
def completions_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)


completions_with_backoff(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Once upon a time,"}])


ChatCompletion(id='chatcmpl-8PAwkg7Q9pPeAkvVuAZ8AyA108WhR', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="in a small village, there lived a young girl named Lily. She had fiery red hair, lively green eyes, and a spirit as wild as the rushing river nearby. Lily was known for her curious nature and her desire to explore the world beyond the village boundaries.\n\nOne day, while playing near the river, Lily spotted an injured bird nested on a branch. Its wing was broken, and it seemed unable to fly away. Lily's heart filled with sadness, and she knew she couldn't leave the bird alone.\n\nCarefully, she climbed up the tree and gently placed the bird inside her pocket. Lily brought it home and made a cozy bed for it in a small wooden box. She named the bird Ruby, after its shimmering red feathers.\n\nDays turned into weeks, and Ruby's wing slowly healed under Lily's constant care and attention. As they spent time together, a deep bond grew between the

#### Example #3: Manual backoff implementation

If you don't want to use third-party libraries, you can implement your own backoff logic.

In [11]:
# imports
import random
import time

# define a retry decorator
def retry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 10,
    errors: tuple = (openai.RateLimitError,),
):
    """Retry a function with exponential backoff."""

    def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay

        # Loop until a successful response, max_retries is hit, or an exception is raised
        while True:
            try:
                return func(*args, **kwargs)

            # Retry on the specified errors
            except errors as e:
                # Increment the retry count
                num_retries += 1

                # Check whether the maximum number of retries has been reached
                if num_retries > max_retries:
                    raise Exception(
                        f"Maximum number of retries ({max_retries}) exceeded."
                    )

                # Grow the delay; with jitter the factor is random in [base, 2 * base)
                delay *= exponential_base * (1 + jitter * random.random())

                # Sleep for the delay
                time.sleep(delay)

            # Re-raise any error that is not in `errors`
            except Exception as e:
                raise e

    return wrapper


@retry_with_exponential_backoff
def completions_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)


completions_with_backoff(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Once upon a time,"}])

ChatCompletion(id='chatcmpl-8PAxGvV3GbLpnOoKSvJ00XCUdOglM', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="in a faraway kingdom, there lived a young princess named Aurora. She was known for her beauty, grace, and kind heart. Aurora's kingdom was filled with lush green meadows, towering mountains, and sparkling rivers. The princess loved spending time exploring the enchanting forests surrounding her castle.\n\nOne day, while Aurora was wandering through the woods, she stumbled upon a hidden clearing. At the center stood a majestic oak tree, its branches reaching towards the sky. Aurora approached the tree with curiosity, and as she got closer, she noticed a small door at its base.\n\nIntrigued, she gently pushed open the door and was amazed to find herself in a magical realm. The forest transformed into a breathtaking wonderland, with colorful flowers blooming in every direction and woodland creatures frolicking joyously. Aurora's eyes widened with

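## How to maximize throughput of batch processing given rate limits

If you are processing real-time requests from users, backoff and retry is a great strategy for minimizing latency while avoiding rate limit errors.

However, if you are processing large volumes of batch data, where throughput matters more than latency, there are a few other things you can do in addition to backoff and retry.

### Proactively adding delay between requests

If you are constantly hitting the rate limit, then backing off, then hitting the rate limit again, then backing off again, a good fraction of your request budget may be "wasted" on requests that need to be retried. This limits your processing throughput for a given rate limit.

Here, one potential solution is to calculate your rate limit and add a delay equal to its reciprocal to each request (for example, if your rate limit is 20 requests per minute, add a delay of 3 to 6 seconds to each request; 3 seconds is the exact reciprocal, and the upper end leaves headroom). This can help you operate near the rate limit ceiling without hitting it and incurring wasted requests.

#### Example of adding a delay to a request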

In [12]:
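# imports
import time

# Define a function that adds a delay to a Completion API call
def delayed_completion(delay_in_seconds: float = 1, **kwargs):
    """Delay a completion by a specified amount of time."""

    # Sleep for the delay
    time.sleep(delay_in_seconds)

    # Call the Completion API and return the result
    return client.chat.completions.create(**kwargs)


# Calculate the delay based on your rate limit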
rate_limit_per_minute = 20
delay = 60.0 / rate_limit_per_minute

delayed_completion(
    delay_in_seconds=delay,
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Once upon a time,"}]
)


ChatCompletion(id='chatcmpl-8PAyCR1axKsomV0e349XiCN1Z81pH', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="in a small village, there lived a young girl named Maya. Maya was known for her kindness and love for nature. She spent hours exploring the forests surrounding the village, admiring the vibrant flowers and talking to the animals.\n\nOne sunny day, as Maya was picking wildflowers, she stumbled upon a wounded blackbird with a broken wing. Feeling sorry for the bird, Maya gently picked it up and cradled it in her hands. She knew she had to help the bird, so she hurried back to her cottage.\n\nMaya set up a cozy nest for the blackbird and carefully splinted its wing. She fed it worms and berries, doing everything she could to nurse it back to health. Each day, she would sing lullabies and tell stories to keep the blackbird company. Slowly, the bird's wing healed, and before long, it was ready to fly again.\n\nOn a beautiful morning, Maya opened t
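A fixed sleep before every call also waits when the previous request already took longer than the target interval. One possible refinement, sketched below, is to sleep only for whatever part of the interval has not yet elapsed. The class name `MinIntervalThrottle` and its parameters are hypothetical; the `clock` and `sleep` arguments are there only so the behavior can be checked without real waiting.

```python
import time


class MinIntervalThrottle:
    """Block so that successive calls start at least `interval` seconds apart.

    Hypothetical helper for illustration only.
    """

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            # Sleep only for the part of the interval that has not elapsed yet
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.interval


throttle = MinIntervalThrottle(requests_per_minute=20)

# Intended usage (commented out so the sketch runs without an API key):
# for prompt in prompts:
#     throttle.wait()
#     client.chat.completions.create(
#         model="gpt-3.5-turbo",
#         messages=[{"role": "user", "content": prompt}],
#     )
```

Because the wait is measured from the start of the previous request, time spent inside the API call counts toward the interval rather than being added on top of it.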

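### Batching requests

The OpenAI API has separate limits for requests per minute and tokens per minute.

If you are hitting the limit on requests per minute but have headroom on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This lets you process more tokens per minute, especially with the smaller models.

Sending a batch of prompts works exactly like a normal API call, except that you pass a list of strings to the `prompt` parameter instead of a single string. Note that `prompt` is a parameter of the Completions endpoint (`client.completions.create`); the Chat Completions endpoint takes `messages` instead.

**Warning:** the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the `index` field.

#### Example without batching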
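As a rough way to see when batching helps, the sketch below estimates how many tasks per minute fit under both limits for a given batch size. The function name and the limits used (20 requests per minute, 40,000 tokens per minute, about 100 tokens per task) are made-up numbers for illustration, not actual OpenAI limits, and the estimate ignores details such as how `max_tokens` is counted.

```python
def max_tasks_per_minute(rpm_limit, tpm_limit, tokens_per_task, tasks_per_request):
    """Estimate the tasks per minute achievable under both limits.

    Hypothetical helper for illustration only.
    """
    by_requests = rpm_limit * tasks_per_request  # ceiling set by requests/min
    by_tokens = tpm_limit // tokens_per_task     # ceiling set by tokens/min
    return min(by_requests, by_tokens)


# Hypothetical limits: 20 requests/min, 40,000 tokens/min, ~100 tokens per task
for batch_size in (1, 10, 50):
    print(batch_size, max_tasks_per_minute(20, 40_000, 100, batch_size))
# 1 20
# 10 200
# 50 400
```

In this made-up scenario the request limit is the bottleneck until the batch size reaches 20 tasks per request; beyond that the token limit takes over and larger batches no longer add throughput.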

In [13]:
num_stories = 10
content = "Once upon a time,"

# serial example, with one story completion per request
for _ in range(num_stories):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": content}],
        max_tokens=20,
    )

    # print story
    print(content + response.choices[0].message.content)


Once upon a time,in a small village nestled between rolling green hills, there lived a young girl named Lily. She had
Once upon a time,in a small village nestled in the heart of a lush forest, lived a young girl named Evelyn.
Once upon a time,in a faraway kingdom, there lived a young princess named Aurora. She was known for her kind
Once upon a time,in a faraway kingdom called Enchantia, there lived a young girl named Ella. Ella was
Once upon a time,in a small village nestled among the rolling hills, lived a young woman named Lucy. Lucy was known
Once upon a time,in a small village nestled between rolling hills, there lived a young girl named Ava. Ava was a
Once upon a time,in a faraway kingdom, there lived a wise and just king named Arthur. King Arthur ruled over
Once upon a time,in a small village nestled among towering mountains, lived a young girl named Lily. She was known for
Once upon a time,in a small village nestled in the heart of a lush forest, there lived a young girl named 

#### Example with batching

In [15]:
num_stories = 10
prompts = ["Once upon a time,"] * num_stories

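# batched example, with 10 story completions per request
# A list-valued `prompt` is a Completions-endpoint feature, so this calls
# client.completions.create rather than the chat endpoint. The model is swapped
# from the original `curie`, which has been retired, to a completions model.
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompts,
    max_tokens=20,
)

# match completions to prompts by index
stories = [""] * len(prompts)
for choice in response.choices:
    stories[choice.index] = prompts[choice.index] + choice.text

# print stories
for story in stories:
    print(story)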


Once upon a time, I lived in hope. I convinced myself I knew best, because, naive as it might sound,
Once upon a time, Thierry Henry was invited to have a type of frosty exchange with English fans, in which
Once upon a time, and a long time ago as well, PV was passively cooled because coils cooled by use of metal driving
Once upon a time, there was a land called Texas. It was about the size of Wisconsin. It contained, however,
Once upon a time, there was an old carpenter who had three sons. The locksmith never learned to read or write
Once upon a time, there was a small farming town called Moonridge Village, far West across the great vast plains that lay
Once upon a time, California’s shorelines, lakes, and valleys were host to expanses of untamed wilderness
Once upon a time, she said. It started with a simple question: Why don’t we know any stories?
Once upon a time, when I was a young woman, there was a movie named Wuthering Heights. Stand by alleges
Once upon a time, a very long tim

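## Example parallel processing script

We have written an example script for parallel processing of large quantities of API requests: [api_request_parallel_processor.py](https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py).

The script combines some handy features:
- Streams requests from a file, to avoid running out of memory on large jobs
- Makes requests concurrently, to maximize throughput
- Throttles both request and token usage, to stay under rate limits
- Retries failed requests, to avoid missing data
- Logs errors, to diagnose problems with requests

Feel free to use it as is or modify it to suit your needs.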
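To illustrate just the concurrency part of such a script, here is a minimal sketch that caps the number of in-flight requests with an `asyncio.Semaphore`. It is not taken from `api_request_parallel_processor.py`: the names `run_bounded` and `fake_request` are hypothetical, and `fake_request` is a stand-in that sleeps briefly instead of calling the API.

```python
import asyncio


async def run_bounded(jobs, worker, max_concurrency=5):
    """Run `worker(job)` for every job with at most `max_concurrency` in flight.

    Hypothetical helper for illustration only.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job):
        # Each task waits here until one of the slots is free
        async with semaphore:
            return await worker(job)

    # gather returns results in the same order as the input jobs
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_request(prompt):
    """Stand-in for an API call; replace with a real async client call."""
    await asyncio.sleep(0.01)
    return prompt.upper()


results = asyncio.run(run_bounded(["a", "b", "c"], fake_request, max_concurrency=2))
print(results)  # ['A', 'B', 'C']
```

Inside a notebook, where an event loop is already running, `await run_bounded(...)` would be used in place of `asyncio.run(...)`. A concurrency cap alone does not enforce a per-minute limit, so in practice it would be combined with throttling and retries like those described above.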