[Fix] Enforce rate limits on 429s #2642

Merged 5 commits into discord-net:dev on Apr 14, 2023

Conversation

@ben-reilly (Contributor) commented Mar 29, 2023

Prior to this change, webhook requests were not being properly rate limited. In RequestQueueBucket.cs, when a 429 occurred, the UpdateRateLimit function only set a delay timer for the rate limit; it did not block further requests until that timer expired.

See #2592, which may be resolved by this PR.

Problem 1: 429s not being enforced

In that class, the integer _semaphore contains the number of requests remaining before the rate limit will be activated. If that value is non-positive, then according to the API we are being rate limited.

The problem was that _semaphore was not being set to zero when a 429 occurred, so requests weren't actually being limited.
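
To make the intent concrete, here is a minimal, self-contained sketch of the idea (the class and the EnterAsync/On429 names are hypothetical, for illustration only, not the actual RequestQueueBucket code): when a 429 comes back, the remaining-request counter is zeroed and a reset time is recorded, so the next caller actually waits instead of immediately retrying.

using System;
using System.Threading.Tasks;

// Illustrative bucket only -- not the real RequestQueueBucket.
class RateLimitBucket
{
    private readonly object _lock = new();
    private int _semaphore = 5;                         // requests left in the current window
    private DateTimeOffset _resetAt = DateTimeOffset.MinValue;

    // Called before sending a request: blocks while the bucket is exhausted.
    public async Task EnterAsync()
    {
        while (true)
        {
            TimeSpan wait;
            lock (_lock)
            {
                if (_semaphore > 0)
                {
                    _semaphore--;
                    return;
                }
                wait = _resetAt - DateTimeOffset.UtcNow;
            }

            if (wait > TimeSpan.Zero)
                await Task.Delay(wait);                 // sleep out the rate limit

            lock (_lock)
                _semaphore = 1;                         // window elapsed; let one request through
        }
    }

    // Called when a response comes back as a 429.
    public void On429(double retryAfterSeconds)
    {
        lock (_lock)
        {
            _semaphore = 0;                             // the missing piece: actually stop requests
            _resetAt = DateTimeOffset.UtcNow.AddSeconds(retryAfterSeconds);
        }
    }
}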

Problem 2: Rate limits being updated twice, with different values

It was also theoretically possible for the Remaining value coming back from the response to be zero, indicating that we are out of remaining requests, and in that case _semaphore was likewise not being set to zero. To avoid any problems that could have stemmed from that, I added code to update _semaphore to the Remaining value on every request (prior to checking for a 429).

That change uncovered another lurking problem: on 429s, the UpdateRateLimit function was being called twice -- first for the 429, and then a second time with values that made it look as though a 429 had not occurred, which undid my changes to _semaphore.
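
Extending the hypothetical bucket above, the ordering looks roughly like this (the UpdateRateLimit parameters here are illustrative; the real method works from the library's rate-limit info, with RetryAfter, Limit, and Remaining as seen in the test output below). The Remaining value is applied on every response, and the 429 branch runs afterwards, so the hard stop wins -- provided the method is only called once per response.

// Illustrative only; the fields (_lock, _semaphore, _resetAt) are from the sketch above.
public void UpdateRateLimit(int? remaining, double? retryAfterSeconds)
{
    lock (_lock)
    {
        if (remaining.HasValue)
            _semaphore = remaining.Value;       // always apply the server's Remaining value

        if (retryAfterSeconds.HasValue)         // a 429 was returned
        {
            _semaphore = 0;                     // hard stop until Retry-After expires
            _resetAt = DateTimeOffset.UtcNow.AddSeconds(retryAfterSeconds.Value);
        }
    }
    // Call this once per response; a second call with "no 429" values would
    // overwrite _semaphore and undo the hard stop.
}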

Problem 3: Using WindowCount == 0 to mean "unlimited"

I didn't dig too deeply into the cause of this problem, but somewhere in the midst of fixing the other two I began encountering situations where the API was returning a Limit value of zero. I think this happened because the incorrect rate limiting resulted in the API banning requests for a short time.

The problem there was that in RequestQueueBucket.cs, a limit of zero meant "no limit", and requests stopped being throttled. See this line.

I fixed this by changing the "no limit" sentinel to -1, so zero is now treated the way the API intends: as a limit of zero requests.
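
A small sketch of the sentinel change (the names are mine, not the exact diff): -1 now stands for "no known limit", so a limit of 0 coming back from the API is honored as "zero requests allowed" rather than switching throttling off.

// Illustrative only.
static class WindowLimits
{
    public const int NoLimit = -1;              // previously, 0 doubled as "no limit"

    public static bool IsUnlimited(int windowCount) => windowCount == NoLimit;

    public static int InitialSemaphore(int windowCount) =>
        IsUnlimited(windowCount)
            ? int.MaxValue                      // effectively unthrottled
            : windowCount;                      // 0 from the API now means zero requests allowed
}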

Test code

Any code that sends a bunch of requests on a webhook will do. Note that webhooks have an undocumented limit of 30/min/channel, which kicks in after 40 or so requests. This bug is only triggered after you hit that limit and start receiving 429s.

using Discord;
using Discord.Rest;
using Discord.WebSocket;
using Discord.Webhook;

// botToken and guildId are placeholders for your own bot token and test guild ID.
DiscordSocketConfig _config = new()
{
    GatewayIntents = GatewayIntents.DirectMessages | GatewayIntents.GuildMessages | GatewayIntents.Guilds,
};

var client = new DiscordRestClient(_config);

// Log in and set up a channel and webhook
await client.LoginAsync(TokenType.Bot, botToken);
var guild = await client.GetGuildAsync(guildId);
var channel = await guild.CreateTextChannelAsync("webhook-test-zone");

var webhook = await channel.CreateWebhookAsync("Test Webhook");
var webhookClient = new DiscordWebhookClient(webhook);

// The Webhook log will capture when rate-limiting occurs via 429 response.
webhookClient.Log += (LogMessage lm) =>
{
    return Task.Run(() => Console.WriteLine($"{DateTimeOffset.UtcNow.ToString("HH:mm:ss.ffff")} WEBHOOK: {lm.Message}"));
};

// We'll log every IRateLimitInfo we get; it is generated on every request.
var requestOptions = new RequestOptions()
{
    RatelimitCallback = (IRateLimitInfo info) =>
    {
        Console.WriteLine($"{DateTimeOffset.UtcNow.ToString("HH:mm:ss.ffff")} | RETRY AFTER: {info.RetryAfter?.ToString() ?? "-"} LIMIT: {info.Limit} REMAINING: {info.Remaining}");
        return Task.CompletedTask;
    }
};

for (int i = 0; i < 100; ++i)
{
    Console.WriteLine();
    Console.WriteLine($"[{i}] vvv Start");
    await webhookClient.SendMessageAsync($"Message {i}", options: requestOptions);
    Console.WriteLine($"[{i}] ^^^ End");
    Console.WriteLine();
}

// Clean up
await webhookClient.DeleteWebhookAsync();
await channel.DeleteAsync();

Bad example

Prior to the fix, the output on a failed request looks like this:

[43] vvv Start
03:03:57.3270 | RETRY AFTER: 1 LIMIT: 5 REMAINING: 3
03:03:57.3271 WEBHOOK: Rate limit triggered: POST <snip>
03:03:57.4340 | RETRY AFTER: 1 LIMIT: 5 REMAINING: 2
03:03:57.4340 WEBHOOK: Rate limit triggered: POST <snip>
03:03:57.5296 | RETRY AFTER: 2 LIMIT: 5 REMAINING: 2
03:03:57.5296 WEBHOOK: Rate limit triggered: POST <snip>
03:03:57.6251 | RETRY AFTER: 2 LIMIT: 5 REMAINING: 1
03:03:57.6252 WEBHOOK: Rate limit triggered: POST <snip>
03:03:59.8739 | RETRY AFTER: - LIMIT: 5 REMAINING: 4
[43] ^^^ End

Notice that the request immediately retries, eating up all of the remaining requests until the hard stop is hit (_semaphore == 0).

Good example

With the fix, the request is tried only once; the client then sleeps for the required amount of time before retrying (and succeeding).

[40] vvv Start
03:02:49.3677 | RETRY AFTER: 1 LIMIT: 5 REMAINING: 3
03:02:49.3677 WEBHOOK: Rate limit triggered: POST <snip>
03:02:50.2219 | RETRY AFTER: - LIMIT: 5 REMAINING: 4
[40] ^^^ End

@ben-reilly (Contributor, Author) commented:

I haven't had a chance to test it yet, but actually I think the _semaphore = info.Remaining.Value; line here should probably be moved out of the if altogether. Some brief testing I did with the current change showed that messages still got spammed when the Remaining value was down to zero.

@ben-reilly (Contributor, Author) commented:

I'm no longer so sure that this is the solution. But I still see the same problems with requests -- they're being retried even when the rate limiting has kicked in.

Will close this PR, but hopefully someone else finds a solution.

@ben-reilly closed this on Mar 31, 2023
@ben-reilly deleted the fix/rate-limits branch on Mar 31, 2023 at 00:50
@ben-reilly reopened this on Mar 31, 2023
@ben-reilly (Contributor, Author) commented:

I gave up too soon. I've done more testing and I think I've solved the problem. I'll annotate my changes and share my test code.

@csmir merged commit 69cce5b into discord-net:dev on Apr 14, 2023