Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRESOLVER-464] Workaround for JDK-822647 bug #408

Merged
merged 3 commits into from Jan 17, 2024

Conversation

cstamas
Copy link
Member

@cstamas cstamas commented Jan 17, 2024

Artificially limit the max concurrent request count doable with
JDK HttpClient, as otherwise you may end up with IOEx.

The value is exposed as config param, so users can tweak it.
Value 100 is verified to work with Maven Central.

JDK Bug: https://bugs.openjdk.org/browse/JDK-8225647


https://issues.apache.org/jira/browse/MRESOLVER-464

Artificially limit the max concurrent request count doable with
JDK HttpClient, as otherwise you may end up with IOEx.

The value is exposed as config param, so users can tweak it.
Value 100 is verified to work with Maven Central.

JDK Bug: https://bugs.openjdk.org/browse/JDK-8225647

---

https://issues.apache.org/jira/browse/MRESOLVER-464
@cstamas cstamas self-assigned this Jan 17, 2024
@cstamas cstamas requested a review from gnodet January 17, 2024 16:59
@rmannibucau
Copy link
Contributor

why not using jdk.httpclient.maxstreams? I kind of fail to see how a semaphore helps, we should just ensure to respect server settings no, so a "warmup" call then we know and just use the right number of connections so sounds like we misuse http2 and delegate the fact it works to end user, hope im missing something

@cstamas cstamas merged commit fcf2894 into apache:master Jan 17, 2024
4 checks passed
@cstamas cstamas deleted the MRESOLVER-464 branch January 17, 2024 17:21
@cstamas
Copy link
Member Author

cstamas commented Jan 17, 2024

@rmannibucau sorry, we did it at same moment 😄

Can you provide a PR to show what you mean? Also, the fact HttpClient inits this value to Integer.MAX_VALUE and lowers it on first response, while client MAY create thousands of requests. Also, this never happened in Maven, but in JBang, here jbangdev/jbang#1716 that is way more aggressive than Maven is.

@cstamas
Copy link
Member Author

cstamas commented Jan 17, 2024

Ah, so if you mean the system property, that will set it, but will it queue requests over 100?

@rmannibucau
Copy link
Contributor

the fact HttpClient inits this value to Integer.MAX_VALUE and lowers it on first response

just means each connection much await the first response, a semaphore does not make it right IMO, assuming each connection can get a single concurrent request and your pool is #10, how does your config helps? HttpClient does not enable to control connections nor to access http2 stream/frame data so looks like the workaround has the same bug than before to me.

The system property does not queue but enables to avoid the first request (useless IMHO but similarly to the semaphore) and makes inits this value to Integer.MAX_VALUE no more true.

What you can do to scale more is to create more clients and limit one request per client, even if I don't like it much and I would just forbid http2 for maven case, it would be saner than using a semaphone.

ultimately, using the async flavor would probably also make it smoother since queueing will be simple to impl (IMHO - just a chain) but resolver is not greatly friendly with that option so maybe the multiple clients option will fit better short term?

but from my window we kind of want to violate http2 so i'm really mixed and would just version(HTTP_1_1) since perf boost is not insane in practise.

@cstamas
Copy link
Member Author

cstamas commented Jan 17, 2024

Wrong, as if remote server on first response tells client "my max_concurrent_stream=12" then client will have to obey it AFAIK.

Again, see first paragraph. Yes, instead of Integer.MAX_VALUE it will be set to 100, but on first response to 12. So what gives?

Creating more clients does not help (as I read). Same thing happens.

Async flavor does not helps (as I read), same thing happens.

Finally, this "works" as without this change the JBang UT fails with "java.io.IOException: too many concurrent streams", while with this change the error is gone and UT passes OK. Again, this was never hit in Maven (nor can, as Resolver operates with pools of size 4-8, not 100s).

@rmannibucau
Copy link
Contributor

It works cause it is per connection but agree it is bad from a design standpoint - but it is http2, we cant help.

As a side note my thread count for resolver is always 512 (just to enable as much concurrent downloads but having 4 threads is sufficient for an async impl - just to highlight it is what we should target in terms of concurrent downloads, not threads).

About async helping it is just cause you chain more easily without adding a queue but agree it is an impl detail.

Now about your fix we still have the same issue, you set it to 100 but then it is set to 12 so maybe we should revert and default to http1?

@cstamas
Copy link
Member Author

cstamas commented Jan 17, 2024

Now about your fix we still have the same issue, you set it to 100 but then it is set to 12 so maybe we should revert and default to http1?

Hence the "workaround": exposed config settable globally but also per repository. So the workaround is that if you hit this issue with remote repository "foo", then you can check in .mvn/maven.config that sets proper value for repository "foo" (check the config page diff for new option this PR introduces).

As checked with curl site, the value is usually 100 or 128 or above. I expect exotic values in "closed environments" like companies etc, but there again, Maven dev can consult IT about these values (or try, fail, adapt, loop)...

OR as you propose, fall back to HTTP/1.1 (there is already a config for that), fallback the transport to "apache", many options, really...

@rmannibucau
Copy link
Contributor

rmannibucau commented Jan 17, 2024

So the workaround is that if you hit this issue with remote repository "foo", then you can check in .mvn/maven.config that sets proper value for repository "foo" (check the config page diff for new option this PR introduces).

But the limit is per connection so a limit per repo is pointless 🤔

As checked with curl site, the value is usually 100 or 128 or above.

Spec mentions 100 but it is the concurrent streams per connection, not the concurrent connections, see why the fix does not help?

So overall if the system property I mentionned which is already available with .mvn/jvm.config is not sufficient as workaround then it means the client does not work - I don't see why.
Would be great to get a test cause when reading the code or using a thread dump the first thing you will do is to drop that semaphore - even if there is a comment saying there could be a bug maybe.

OR as you propose, fall back to HTTP/1.1

If we can't make http2 reliable by default - without too much reflection technically speaking - let's ensure the default always works, guess it is the minimum we should provide.
As mentionned, perf diff is not key enough for maven to be noticeable so no point defaulting to http2.

@cstamas
Copy link
Member Author

cstamas commented Jan 19, 2024

Usually remote reposes have different URLs (usually different hosts), so yes, a repo is one distinct connection. Usually. I may imagine someone defines central1, central2, etc and all point to central, but I find it highly not plausible.

This workaround is about "concurrent requests", that in HTTP/2 is "concurrent streams" while in HTTP/1.1 is usually "concurrent requests". Also, unsure why you keep repeating "not work": check out the reproducer, w/o this commit it will reproduce the error (IOEx: too many concurrent streams) and with this commit it will work OK. So it works. From where you get that "the fix does not help?".

The http2 is reliable by default, did you try it out? Of course, currently it is only Maven Central that supports it fully, as all the MRMs out there are HTTP/1.1 only.

@rmannibucau
Copy link
Contributor

@cstamas can you point out the reproducer? The thing is that http2 without ssl uses a pool whereas it does not over ssl and there the system property is a similar workaround than a semaphore but with a more relevant limitation than the semaphore which is global whatever the number of connection is - you dont know from consumer side, so yes semaphore is still as buggy as before even if issues pops up less often (didnt check on 11 and 21 but it is the bahvior on jre17, dont know if it changed)

@cstamas
Copy link
Member Author

cstamas commented Jan 19, 2024

Reproducer: is JBang test on this branch: jbangdev/jbang#1716
If you use resolver w/o this patch, it consistently ends up with IOEx: too many streams. With this patch it passes OK

Just build all of jbang and if unpatched resolver used, it will be the only failing UT, do not remember test name from top of my head.

@rmannibucau
Copy link
Contributor

rmannibucau commented Jan 19, 2024

@cstamas are we talking about testGAVCliReposAndDepsSingleRepo (https://gist.github.com/rmannibucau/8a68d01504b2ab02ef8aacf6a1e26e65), it is the only issue I get with the PR (I assume this merge is not in the PR since it uses an alpha release)? I'm not sure it is the same error than yours and then rebuilds seem to pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants