-
Notifications
You must be signed in to change notification settings - Fork 879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build downloads fail due to response code 503 #4206
base: master
Are you sure you want to change the base?
Conversation
The 'openConnection' method of 'ConfigureProxy' makes up to three connections to a remote host for each external file, yet only one is ever used. The extra connections cause the build to fail when the proxy or host responds with status code "503 Service Unavailable." Serialize the tasks so that connections are attempted one at a time: first using the 'http_proxy' environment variable, if present; then using the 'https_proxy' environment variable, if present; and finally using a direct connection with no proxy. Make a subsequent connection only if a previous one failed or was skipped.
I discovered this issue while trying to build NetBeans on Launchpad, which uses a local proxy server for all external connections. The bug report Bug 1948501, "HTTPS proxy returns 503 Service Unavailable," describes the original problem. For my local tests, I ran Tinyproxy on my workstation at $ printenv | grep http
https_proxy=http://192.168.1.11:8888/
http_proxy=http://192.168.1.11:8888/ On my workstation, I modified the Tinyproxy configuration at 120c120
< LogLevel Info
---
> LogLevel Connect
224a225
> Allow 10.203.206.0/24 A row in each of the tables below represents a build with:
Note that all connections made to an external host used a secure HTTPS connection. The type of connection in the columns below indicates only which proxy was used: HTTP for the proxy defined by the Before the fixThe table below shows the number and types of connections made before the fix. Connections made:
The table below shows the types of connections actually used before the fix. Connections used:
After the fixThe table below shows the number and types of connections made after the fix. Connections made:
After the fix, each of the 565 connections made were in fact used to transfer a file. |
Hi @jgneff,
I am not sure why it connects in parallel, I tracked it down to the pre-apache repo to this commit: emilianbold/netbeans-releases@3050498, unfortunately that doesn't help a lot. Maybe @jtulach remembers. It might have been performance reasons, e.g one thread timing out would not influence the others - but that is only speculation. However I have to wonder: wouldn't the safer fix be to simply ignore errors unless all tasks fail? I can't imagine 1500 connections would be a problem in a ~10min build. Thats probably similar to reading an article online without an adblocker active :) If we end up deciding to change this to an sequential approach, I believe we should use one task which does all sequentially. Having a countdown latch, thee tasks on one thread is probably not necessary. |
@mbien Thank you, Michael, for the link to the old history. I was wondering how to go back even further.
I would like the number and type of connections to be deterministic. As it is now, it's almost random. You can define proxy environment variables but end up using mostly direct connections. It's a race, but instead of the first one winning, the last one to get to the instruction
I decided on the most conservative change for now. I kept the same structure as the current code, but just made the tasks sequential. Eventually, I would like to rethink the entire approach. For example, the values of the proxy variables are not going to change during the build, so the code should get them just once from the environment instead of 565 times. Furthermore, I think the build should fail when the I thought we could follow-up with a bigger design change once we have some experience with this small change. |
Please leave the connection attempts in place. In a corporate environment, the proxy setup is peculiar. In particular |
Just to clarify, this pull request leaves the connection attempts in place:
The difference is that it tries them one at a time, in order, and uses every connection it makes. Before this change, the build:
This pull request fixes all four issues. For example, after this change, I see only three connections from the proxy server to
Before this change, I saw 85 such connections, with 81 of them unnecessary and abandoned (just 137 bytes transferred):
When testing locally, before this change there were 617 connections made but only 17 of them actually used. The build made 600 unnecessary connections through the proxy: 136 to |
Below are a few more notes from my tests while they're still fresh in my mind. Proxy connection reuseThe connection counts in the set of tables in my prior comment above are from the perspective of the The tables below, on the other hand, show the counts of actual connections made through the proxy server to an external host, obtained directly from the access log of the Squid proxy server. The proxy access logs for each of the six tests below are available in the following archive: squid-access-logs.tar.gz - Squid access logs for the tables below BEFOREThe table below lists the number of connection made through the proxy server to each external host before the changes in this pull request. The columns contain the connection counts when
AFTERThe table below lists the same information after the changes in this pull request.
Notice that there's a large reduction in the number of proxy connections made, and no excess unused connections are made at all. I suspect that the single thread used in the pull request is allowing the Java runtime to reuse the keep-alive proxy connections, keeping them to a minimum. Making the requests sequential through the proxy keep-alive connections, though, does not affect the build time:
A partial workaroundThe first table above suggests a workaround for the original problem of the build failing when both environment variables are set. The build fails because the external host tells it to back off on the excess connections by sending a response of "503 Service Unavailable." The workaround is to remove one of the proxy variables from the environment before starting the build: unset https_proxy
ant -Dmetabuild.branch=master It's only a partial workaround because:
Frequency of failuresNow that I have a workaround, I expect the original problem to be rare. The screenshot below shows that the NetBeans builds on Launchpad almost always failed over the past six months: For future searches, the error messages in the build logs are illustrated by the following two samples:
|
I tested this (DO NOT RUN YOURSELF!):
These are 2000 requests in parallel and the statistics is: 840 Requests were rejected, 1160 requests went through cleanly. This test was executed without a proxy in between. The reply is pretty explicit:
So we should respect that. I understand, that there are (mostly) corporate setups, that are broken beyond imagination, but this change looks sane (as already indicated by @jgneff it could be simplified as the parallel nature of the connections is gone). I suggest to raise this on the mailing list and if noone comes up with an existing setup, that is broken by this, I suggest to go ahead. We can't prepare for all broken setups administrators can imagine. |
What a good test! I never thought to try it directly. Thank you, Matthias.
I started the thread with the subject "Anyone else building NetBeans behind a proxy server?" but no reply yet. |
@jtulach if I remember correctly you were affected by the proxy problems when building netbeans. Could you please see if you can still build netbeans with this PR? |
Build NetBeans with only one of the following proxy environment variables set, preferably just the 'http_proxy' variable: export http_proxy=http://10.10.10.1:8222/ export https_proxy=http://10.10.10.1:8222/ Removing one of the variables works around the following problem: HTTPS proxy returns 503 Service Unavailable https://bugs.launchpad.net/launchpad-buildd/+bug/1948501 For details, including a fix, see the following NetBeans pull request: Build downloads fail due to response code 503 apache/netbeans#4206
TestingTo test this pull request, clone the pull request branch as follows: $ git clone -b go-gentle-on-proxies --depth 1 https://github.com/jgneff/netbeans.git
$ cd netbeans Clear the repository and the external cache of all downloaded files before each test with the commands: $ rm -r ~/.hgexternalcache/*
$ git clean -xdf Set one, both, or none of the standard proxy environment variables as appropriate for each specific test, as in the following example: $ export https_proxy=http://10.10.10.1:8222/
$ export http_proxy=http://10.10.10.1:8222/ Then run Ant with the target $ ant -quiet -Dmetabuild.branch=master download-all-extbins See the full mailing list thread for details. |
See the links below for more information about the problem identified in this pull request:
Now that I have great testing environment in jgneff/netbeans-proxies, I am prepared to provide a more thorough fix, if that's what we decide. |
Sorry, I forgot about this. I plan to merge this shortly, just would like to see the rest of the checks become green. I have a local build running with this merged to master. |
If you wait, I can add the one-line fix to remove all those unnecessary threads, too. I would prefer getting in a more complete fix, but I could add that with a follow-up pull request, if you prefer. |
Of course I can wait - also for the full followup - ping me when you think this is ready. |
@jgneff when you update this PR, could you then please rebase it onto the then current master? That way it will be checked with the then current tests. |
+1 for a code cleanup since this looks like it could now run on one task without the need of a CountDownLatch which would make it less confusing to read |
what is the status on this? still baking? ready to review? if its ready, please squash to one commit / rebase to latest master, this will also trigger a CI build with latest changes - I enabled all jobs just to be sure. |
Thanks for the reminder, Michael. It's next on my list. I just got a bit side-tracked this year. The code is done, but I'll need to remind myself what I did 🙄 and test it again. |
The
openConnection
method ofConfigureProxy
makes up to three connections to a remote host for each external file, yet only one is ever used. The extra connections cause the build to fail when the proxy or host responds with status code "503 Service Unavailable."Serialize the tasks so that connections are attempted one at a time: first using the
http_proxy
environment variable, if present; then using thehttps_proxy
environment variable, if present; and finally using a direct connection with no proxy. Make a subsequent connection only if a previous one failed or was skipped.