
Allow to delay/limit the number of GitHub repo checks and Pull requests in time #7690

Closed
Apollon77 opened this issue May 24, 2022 · 22 comments


Apollon77 commented May 24, 2022

Describe the problem

We have an on-premise Weblate installation which is connected to more than 80 different GitHub repositories, because we have a very decentralized plugin approach for a smart home system. They are configured as described in https://docs.weblate.org/en/latest/vcs.html?highlight=github#github-pull-requests

Today, after a reboot of the Weblate container, we received many emails from Weblate telling us that

or later also

  • Rate limits for user reached

It seems to me that something was hanging, so Weblate did not update the repositories over time, and many of them were outdated at startup. Now it has started doing the updates and pull requests one after the other (judging from the process list, it felt like partially two in parallel?).
Also: repositories that ran into such an error seem to be retried every 20 seconds! (at least from what Weblate shows me in "last seen" on the warnings page)

Describe the solution you'd like

It would be great to have an option to delay such "mass updates" so that they can be matched to the GitHub rate limits ... e.g. one per 5 minutes or something like that.

Additionally, there should be a configurable "delay after a repo got an error before it is retried" ... it seems we have some repos that are retried nearly once every 20 seconds!
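Purely as an illustration of the request, such knobs could look roughly like this in settings.py; these setting names are hypothetical and do not exist in Weblate:

```python
# Purely hypothetical settings sketch - these names do NOT exist in Weblate,
# they only illustrate the kind of configuration being requested.
VCS_PUSH_MIN_INTERVAL = 5 * 60    # seconds between two "mass update" pushes
VCS_ERROR_RETRY_DELAY = 30 * 60   # seconds to wait before retrying a failed repo
```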

Describe alternatives you've considered

None so far, because none came to our mind ... We also need to find out what actually happened that made it stop processing the updates in general ...

Screenshots

No response

Additional context

No response


nijel commented May 24, 2022

Seems like your Celery workers were not running, and the queued tasks are now being processed. I don't think this is a scenario we should support. In normal operation, rate limiting is not needed as commits don't happen often enough to hit any limits.

@nijel nijel added the question This is more a question for the support than an issue. label May 24, 2022
@github-actions

This issue looks more like a support question than an issue. We strive to answer these reasonably fast, but purchasing the support subscription is not only more responsible and faster for your business but also makes Weblate stronger.

In case your question is already answered, making a donation is the right way to say thank you!

@Apollon77

You could be right about the Celery workers ... now they run, but simply too fast ... also, as said, they are sometimes retrying every 25 seconds ... in my eyes this should be slowed down in general ... with many repositories and an active translation community it could be possible to run into rate limits in normal cases too :-)


nijel commented May 24, 2022

We're not running into GitHub limits on Hosted Weblate with thousands of projects, but indeed it could possibly happen.

@nijel nijel added enhancement Adding or requesting a new feature. undecided These features might not be implemented. Can be prioritized by sponsorship. and removed question This is more a question for the support than an issue. labels May 24, 2022
@github-actions

This issue has been put aside. It is currently unclear if it will ever be implemented as it seems to cover too narrow of a use case or doesn't seem to fit into Weblate.

Please try to clarify the use case or consider proposing something more generic to make it useful to more users.


Apollon77 commented May 24, 2022

We're not running into GitHub limits on Hosted Weblate with thousands of projects, but indeed it could possibly happen.

We also did not for the last 2 years :-) Now we had that issue and no way to fix it besides:

  • stop Weblate
  • wait 1-x hours for the rate limits to recover
  • start again ... see what comes through, and stop again as soon as we see rate limits ...
  • restart at step 1 :-)

We see that the same repo is hammered every 5-20 seconds ... this does not make the rate limit any better :-) and it "blocks the Weblate repo" but still tries to push because it thinks it can. Why does it not detect the rate limit response and change the retry strategy?

In fact, it is completely up to you what you do ... I can just say: I have a real-life issue and try to discuss/bring up ideas on how to prevent such issues for others - even if they are rare ones that "should" never happen.
Also: There was nothing - no info, nothing - that made me think that Celery was somehow not running as it should for a long time ... so I have no idea (besides a weekly restart or such) how to prevent this in the future.


nijel commented May 25, 2022

There is no reason why a single repo should get a pull request that often. See https://docs.weblate.org/en/latest/admin/continuous.html#lazy-commit for info on when Weblate commits changes to Git (which in the default configuration triggers pushing to the upstream repository).

There was nothing - no info, nothing - that made me think that Celery was somehow not running as it should for a long time

In case the queue is long, there should be an exclamation mark in the top navigation for all superusers, or it can be seen in the performance view. See also https://docs.weblate.org/en/latest/admin/install.html#monitoring-weblate, https://docs.weblate.org/en/latest/faq.html#how-can-i-check-whether-my-weblate-is-set-up-properly, https://docs.weblate.org/en/latest/admin/install.html#monitoring-celery-status

nijel added a commit that referenced this issue May 25, 2022
@UncleSamSwiss

There is no reason why a single repo should get a pull request that often.

The issue here is that the repo operation fails (due to the rate limit), so it is retried after about 25 seconds.
Now, this happens for all of our ~80 projects at once. This of course does not help with the rate limit, as we are now making a request roughly every 0.3 seconds (25/80 seconds). So, in my opinion there should be some kind of (exponential) back-off or a limit on GitHub requests per minute.

Note: I'm working on the same issue together with @Apollon77
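For illustration, a minimal sketch of the suggested exponential back-off with jitter (function names and delays are made up; this is not Weblate's actual code):

```python
import random
import time

# Hypothetical sketch: retry a failed GitHub push with exponentially
# growing, jittered delays instead of a fixed ~25 second retry.
def push_with_backoff(push, max_attempts=5, base_delay=60.0):
    for attempt in range(max_attempts):
        try:
            return push()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 60 s, 120 s, 240 s, ... plus jitter so that ~80 repositories
            # do not all retry at the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```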

@Apollon77

@nijel > there should be an exclamation mark in the top navigation for all superusers, or it can be seen in the performance view.

Hm ... as admin I'm not in the tool daily ... I would have expected an email ...
We also have Sentry linked to it, and I checked it and only found one entry: https://sentry.iobroker.net/share/issue/fe33d940637d40b4be4a9a616855f68f/ ... but because it had also happened earlier I did not expect it to be an issue ... I expected it to be retried :-)

BTW, regarding the requests ...
[Screenshot: Sentry error overview, 2022-05-25]

Here you can see Sentry reporting it ... since yesterday we have had this error 12,000 (!!) times for roughly 80 repositories ... and for most of the last 24 h Weblate was offline, in the hope that the rate limits would somehow recover - so it was effectively running for only about 5 h in that timeframe.


nijel commented May 25, 2022

Hm ... as admin I'm not in the tool daily ... I would have expected an email ...

The e-mails are being sent using Celery as well, so it would be a bit impractical to try to notify about non-working Celery using e-mail.

The best approach is to add Weblate metrics to whatever monitoring you are using. There is a metrics API endpoint exposing all the important info - the most important thing to look for is configuration_errors, but if you want more insight, looking at individual celery_queues is helpful as well. There is an existing integration for Munin.
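For example, an external monitor could poll that metrics endpoint roughly like this (URL, token handling and thresholds are assumptions; adjust to your installation):

```python
# Rough sketch of polling the Weblate metrics API from an external monitor;
# endpoint path, token handling and thresholds are assumptions.
import requests

WEBLATE_URL = "https://weblate.example.com"  # hypothetical instance
API_TOKEN = "..."                            # a Weblate API token

resp = requests.get(
    f"{WEBLATE_URL}/api/metrics/",
    headers={"Authorization": f"Token {API_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
metrics = resp.json()

# Alert when configuration errors appear or the Celery queues pile up.
if metrics.get("configuration_errors", 0) > 0:
    print("Weblate reports configuration errors!")
for queue, length in metrics.get("celery_queues", {}).items():
    if length > 100:
        print(f"Celery queue {queue} is backed up: {length} tasks")
```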

Anyway, back to the original topic - there had to be a huge queue of tasks which caused this. There is no retrying of failed pushes; it is just tried again when a commit is triggered. Also, it's not caused by Weblate being offline, but by part of Weblate being alive and part not.

nijel added a commit that referenced this issue May 25, 2022
@shun2wang

Seems we also have this problem: https://hosted.weblate.org/projects/jasp/jaspcircular-qml/#alerts
This is from the JASP project, and we also have an annual subscription.


nijel commented Dec 19, 2022

@shun2wang Does it happen regularly? I've just manually pushed the repo (you could have done that as well).


shun2wang commented Dec 19, 2022

@nijel Yes, I'm not sure how regularly it happens, but we've had this problem many times. It clears the translated strings on Weblate.

Here we are using a GitHub workflow to update translations; this work is done automatically.

EDIT: I just learned that my colleague from JASP has contacted you, thank you


nijel commented Dec 19, 2022

That action can always lose translations – you don't force Weblate to push changes before trying to merge. So, there is always the possibility that Weblate has pending changes which were not committed yet (see https://docs.weblate.org/en/latest/admin/continuous.html#lazy-commit). To be on the safe side, invoke wlc push before finding existing pull requests in findAndMergeWeblate.sh. That way, you will also better handle the failed pull request situation – the script will fail instead of discarding translations.
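As a rough sketch, such a guard could look like this if the workflow step were written in Python (the component path is hypothetical; the point is only to abort when wlc push fails):

```python
# Sketch: force Weblate to push pending changes before merging, and fail
# loudly if that does not work, instead of silently discarding translations.
import subprocess
import sys

# Hypothetical component path; adjust to the actual Weblate project/component.
result = subprocess.run(["wlc", "push", "jasp/jaspcircular-qml"])
if result.returncode != 0:
    sys.exit("wlc push failed - aborting merge to avoid losing translations")
```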


nijel commented Dec 19, 2022

Back to the original topic – the problem is that GitHub doesn't give any information about what is triggering this:

Requests that create content which triggers notifications, such as issues, comments and pull requests, may be further limited and will not include a Retry-After header in the response. Please create this content at a reasonable pace to avoid further limiting.

I will ask their support for more info.


nijel commented Jan 6, 2023

Okay, there is nothing better than trying and slowing down if we hit this:

Secondary rate limits are there to prevent problematic traffic patterns which cause performance and reliability concerns on our infrastructure. It helps ensure the stability of our platform for all customers and applications; not just one.

We have different abuse rate limiters and the one you're hitting does not return the retry-after header as you expected.

The algorithm behind anti-abuse rate limits is not that simple (they observe multiple factors, and we also tweak them over time). In addition to that, since these are designed specifically to prevent abuse and performance problems on our end -- we can't really share all the details about how they work. I'm sure you would also agree that having them published defeats their purpose.

As referenced in our documentation, our recommendation for situations like this is that integrators create this content at a reasonable pace to avoid being further limited (i.e slow down the request their App is making to create PRs significantly, then gradually increase the request rate until the error appears again. At that point, you can then slow down a bit more.)
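For illustration, the pacing strategy described above could look roughly like this (class and parameter names are made up; this is not Weblate's implementation):

```python
import time

# Illustrative sketch of GitHub's suggested pacing: start slow, speed up
# gradually, and back off again when the secondary rate limit error appears.
class AdaptivePacer:
    def __init__(self, initial_delay=10.0, min_delay=1.0):
        self.delay = initial_delay
        self.min_delay = min_delay

    def before_request(self):
        # Wait before every content-creating request (e.g. opening a PR).
        time.sleep(self.delay)

    def on_success(self):
        # Gradually increase the request rate (shrink the delay).
        self.delay = max(self.min_delay, self.delay * 0.9)

    def on_rate_limited(self):
        # The error appeared again: slow down a bit more.
        self.delay *= 2
```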


mcm1957 commented Jan 8, 2023

The problem occurred again.

This time a single string was changed, which caused updates in approx. 100 components. Weblate tried to create pull requests for all of them (nearly) at once, hitting the GitHub limit again.

According to GitHub (https://docs.github.com/en/rest/guides/best-practices-for-integrators?apiVersion=2022-11-28#dealing-with-secondary-rate-limits) there should be a delay of at least 1 s between two GitHub requests.

So please consider simply adding a possibility to configure/add some delay between two GitHub requests.
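As a rough illustration, such a fixed spacing between consecutive GitHub requests could look like this (a simplified single-process sketch, not Weblate's actual code; Celery workers would need a shared lock and timestamp, e.g. in the cache):

```python
import threading
import time

# Illustrative sketch: enforce a minimum gap between consecutive GitHub
# requests across threads.
_lock = threading.Lock()
_last_request = 0.0
MIN_INTERVAL = 1.0  # GitHub recommends at least 1 second between requests

def throttled_request(do_request):
    global _last_request
    with _lock:
        wait = MIN_INTERVAL - (time.monotonic() - _last_request)
        if wait > 0:
            time.sleep(wait)
        try:
            return do_request()
        finally:
            # Update the timestamp before releasing the lock so the next
            # consumer sees it.
            _last_request = time.monotonic()
```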

nijel added a commit that referenced this issue Jan 9, 2023
Share the code among classes to have consistent error handling and to
reuse the code.

This will allow adding more features to the shared code later (rate limiting as described in #7690).
nijel added a commit that referenced this issue Jan 9, 2023
Perform API requests with the lock held to ensure we do not perform more than one of them at a time.

Issue #7690
@nijel nijel closed this as completed in 08fb1ea Jan 9, 2023
@nijel nijel added this to the 4.15.1 milestone Jan 9, 2023
nijel added a commit that referenced this issue Jan 9, 2023
This should reduce the number of issues with GitHub secondary rate limiting and make Weblate behave nicer towards the services.

Fixes #7690
@nijel nijel self-assigned this Jan 9, 2023
@UncleSamSwiss

Thanks a lot for the improvement, we are looking forward to version 4.15.1!


nijel commented Jan 10, 2023

I'm curious whether it really addresses the issue in all cases, or whether some additional modifications will be needed. Feedback is welcome.


mcm1957 commented Jan 10, 2023

I'm curious whether it really addresses the issue in all cases, or whether some additional modifications will be needed. Feedback is welcome.

Thanks very much for the change.

If the issue still occurs, one improvement could be to delay retries significantly. As Weblate normally does not update in real time, it could be a good idea to delay retries AFTER errors by several minutes or longer, to avoid triggering any rate limits due to (failing) retries. But this is not a priority - at the moment we will simply evaluate whether the problem arises again.

@shun2wang

I think there are still problems on 4.15.1-dev, see here.

nijel added a commit that referenced this issue Jan 12, 2023
nijel added a commit that referenced this issue Jan 12, 2023
nijel added a commit that referenced this issue Jan 12, 2023
This way we better handle sleeping in concurrent contexts.

Fixes #7690
nijel added a commit that referenced this issue Jan 12, 2023
Otherwise the next consumer might not get the updated timestamp.

Fixes #7690

nijel commented Jan 12, 2023

Okay, I will try some more tweaking.
