Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rate Limit API Calls issue when building an application #587

Open
guellaeq opened this issue Nov 8, 2022 · 25 comments · May be fixed by #592
Open

Rate Limit API Calls issue when building an application #587

guellaeq opened this issue Nov 8, 2022 · 25 comments · May be fixed by #592

Comments

@guellaeq
Copy link

guellaeq commented Nov 8, 2022

I am currently facing an issue when building an Eclipse Theia based application which downloads VScode extensions. I get some 429 http error when downloading those plugins at build time (more than 80 extensions).

This error only started to happen recently without much change on the application source code itself. I'm pretty sure it is related to #509

Is this kind of use cases in the scope of this API call rate limiter ?

@amvanbaren
Copy link
Contributor

cc @kineticsquid

@marcdumais-work
Copy link

We are also seeing 429's in GitHub CI builds for Theia-based applications.

Is the API rate limit applicable per client IP address? If so, it might disproportionately impact builds/users behind a proxy, that will all seem to originate from the same IP.

When a similar issue happened with the GitHub API rate limit, we were able to authenticate by using a Personal Access Token during the build, and then the rate limit for that build would be accounted for that specific user, instead of all users behind the same IP address.

@kineticsquid
Copy link

@amvanbaren Will the approach outlined by @marcdumais-work work?

@amvanbaren
Copy link
Contributor

@kineticsquid It will work, but there's currently no limit on the amount of active personal access tokens a user can have.

@kineticsquid
Copy link

@amvanbaren I think that's fine for now.

@marcdumais-work @guellaeq OK with this approach?

@marcdumais-work
Copy link

AFAIK, only "publisher" PAT are available in openvsx? For the above to be a practical solution, I think we would need the ability to obtain a token that does not require signing the Eclipse Foundation publisher agreement (or enable publishing).

A basic "read-only" token would only serve to identify a user and permit applying API rate limits towards that user, instead of using the IP address as a proxy to account usage.

@marcdumais-work
Copy link

@guellaeq as a mitigation you can use the following option to force your Theia application's builtin extensions to be downloaded sequentially - it seems to help in our CI:

yarn -s download:plugins --parallel=false

@amvanbaren
Copy link
Contributor

@marcdumais-work You need to sign the Eclipse publisher agreement to be able to create a namespace or publish an extension.

public ResultJson createNamespace(NamespaceJson json, UserData user) {
var namespaceIssue = validator.validateNamespace(json.name);
if (namespaceIssue.isPresent()) {
throw new ErrorResultException(namespaceIssue.get().toString());
}
eclipse.checkPublisherAgreement(user);

public ExtensionJson publish(InputStream content, String tokenValue) throws ErrorResultException {
var token = users.useAccessToken(tokenValue);
if (token == null || token.getUser() == null) {
throw new ErrorResultException("Invalid access token.");
}
// Check whether the user has a valid publisher agreement
eclipse.checkPublisherAgreement(token.getUser());

You can create a basic "read-only" token by using a user that hasn't signed the publisher agreement.
public ResponseEntity<AccessTokenJson> createAccessToken(@RequestParam(required = false) String description) {
if (description != null && description.length() > TOKEN_DESCRIPTION_SIZE) {
var json = AccessTokenJson.error("The description must not be longer than " + TOKEN_DESCRIPTION_SIZE + " characters.");
return new ResponseEntity<>(json, HttpStatus.BAD_REQUEST);
}
var user = users.findLoggedInUser();
if (user == null) {
return new ResponseEntity<>(HttpStatus.FORBIDDEN);
}
var token = new PersonalAccessToken();
token.setUser(user);
token.setValue(users.generateTokenValue());
token.setActive(true);
token.setCreatedTimestamp(TimeUtil.getCurrentUTC());
token.setDescription(description);
entityManager.persist(token);
var json = token.toAccessTokenJson();
// Include the token value after creation so the user can copy it
json.value = token.getValue();
var serverUrl = UrlUtil.getBaseUrl();
json.deleteTokenUrl = createApiUrl(serverUrl, "user", "token", "delete", Long.toString(token.getId()));
return new ResponseEntity<>(json, HttpStatus.CREATED);
}

@marcdumais-work
Copy link

You can create a basic "read-only" token by using a user that hasn't signed the publisher agreement.

Here's what I see:
image

@kineticsquid
Copy link

@mmatthewkhouzam FYI...

@amvanbaren
Copy link
Contributor

Ah, oke. There's a check in the webui:

const agreement = this.context.user?.publisherAgreement;
if (agreement && (agreement.status === 'none' || agreement.status === 'outdated')) {
return <Box>
<Typography variant='body1' className={this.props.classes.empty}>
Access tokens cannot be created as you currently do not have an Eclipse Foundation Open VSX
Publisher Agreement signed. Please return to
your <RouteLink className={this.props.classes.link} to={UserSettingsRoutes.PROFILE}>Profile</RouteLink> page
to sign the Publisher Agreement. Should you believe this is in error, please
contact <Link className={this.props.classes.link} href='mailto:license@eclipse.org'>license@eclipse.org</Link>.
</Typography>
</Box>;
}

I think we can remove this. Just have to make sure to check for a valid publisher agreement when a publishing operation is performed.

@MatthewKhouzam
Copy link

Hey, I was just mulling an idea: could we have a separate vsx for bots... something with awful (10 minute) timeouts... and a token so only member projects can use it. I think VSX is owned by CDT, it would be interesting to separate the robots from the humans.

@guellaeq
Copy link
Author

@amvanbaren I think that's fine for now.

@marcdumais-work @guellaeq OK with this approach?

In the CI context, I'm not sure that a PAT will be sufficient to avoid all errors in case of peak load. At least, it would differentiate calls made by other developers from CI calls.

@kineticsquid
Copy link

@MatthewKhouzam Your idea is a special access token for bots (CI piplines) that doesn't have the same rate limits? One would have to be a member of the Cloud DevTools working group to get such a token?

@amvanbaren what'd the work effort be like to implement this kind of special member token? Could there be light weight, even manual, admin process for generating and assigning the tokens?

@amvanbaren
Copy link
Contributor

@kineticsquid I don't know. What you propose is a complete new process.

@MatthewKhouzam
Copy link

@MatthewKhouzam Your idea is a special access token for bots (CI piplines) that doesn't have the same rate limits? One would have to be a member of the Cloud DevTools working group to get such a token?

Depends on your decision as you own the product. I would say a tiered approach is not bad. Focus on the paying customers first and all.

@kineticsquid
Copy link

Well my desire would be to keep this as simple as possible :). Let's assume we have a manual process for generating these bot keys for work group member companies. This bot access key would have a different set of rate limiting constraints. What would those constraints need to be? Where would this token live?

@amvanbaren
Copy link
Contributor

@kineticsquid We can add the premium role to users. Users with the premium role can make a higher number of requests per second than regular users. This requires a database lookup to find the user that is associated to the token. The database query can slow down the request rate. It might be better to use JWT as token implementation, because it lets you define claims as part of the token.

@marcdumais-work
Copy link

Starting in Theia v1.32.0 released yesterday, the download of built-in extensions will be sequential by default. This should mitigate the triggering of openvsx API limit when Theia-based applications are build, for example during CI.

https://github.com/eclipse-theia/theia/pull/11860/files#diff-11e38961bd54fcce86babf5ea40cf9bf4538cb2f519763c8fa7a897e94c35f71

@amvanbaren
Copy link
Contributor

Ok, so this issue can be closed?

@guellaeq
Copy link
Author

In case of multiple builds in parallel, I'm not sure the sequentialization of the download will be enough. I work with a multi repo environment, where each new feature will trigger a Theia build. All this behind one single IP/user. Maybe it will be a temporary fix, but in the end, this issue can come back during peak loads on CI.

@kineticsquid
Copy link

Let's explore this a bit. @amvanbaren, what are the current rate limits? I see the 429 in the API doc but I couldn't find if/where we document what those limits are.

@amvanbaren
Copy link
Contributor

@kineticsquid
Copy link

If I'm reading the .yml file correctly,

  • /api/(?!(.*/.*/review(/delete)?)|(-/(namespace/create|publish))).* is set for 15 calls/sec?
  • /vscode/asset/.*/.*/.*/Microsoft.VisualStudio.Services.Icons.Default is set for 75 calls/sec?

Once that is exceeded, where does the wait period get specified?

@amvanbaren
Copy link
Contributor

amvanbaren commented Nov 29, 2022

If a client does 15 calls in the first 15 ms, then the client has to wait 985 ms to do the 16th call. There's no additional wait or timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: On Hold
Development

Successfully merging a pull request may close this issue.

5 participants