Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build downloads fail due to response code 503 #4206

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jgneff
Copy link
Contributor

@jgneff jgneff commented Jun 7, 2022

The openConnection method of ConfigureProxy makes up to three connections to a remote host for each external file, yet only one is ever used. The extra connections cause the build to fail when the proxy or host responds with status code "503 Service Unavailable."

Serialize the tasks so that connections are attempted one at a time: first using the http_proxy environment variable, if present; then using the https_proxy environment variable, if present; and finally using a direct connection with no proxy. Make a subsequent connection only if a previous one failed or was skipped.

The 'openConnection' method of 'ConfigureProxy' makes up to three
connections to a remote host for each external file, yet only one
is ever used. The extra connections cause the build to fail when the
proxy or host responds with status code "503 Service Unavailable."

Serialize the tasks so that connections are attempted one at a time:
first using the 'http_proxy' environment variable, if present; then
using the 'https_proxy' environment variable, if present; and finally
using a direct connection with no proxy. Make a subsequent connection
only if a previous one failed or was skipped.
@jgneff
Copy link
Contributor Author

jgneff commented Jun 7, 2022

I discovered this issue while trying to build NetBeans on Launchpad, which uses a local proxy server for all external connections. The bug report Bug 1948501, "HTTPS proxy returns 503 Service Unavailable," describes the original problem.

For my local tests, I ran Tinyproxy on my workstation at 192.168.1.11 and built NetBeans in an LXD container at 10.203.206.244. I ran Ant with the target download-all-extbins after removing any previously downloaded files. Each build ran with none, both, or one of the following environment variables defined in the container:

$ printenv | grep http
https_proxy=http://192.168.1.11:8888/
http_proxy=http://192.168.1.11:8888/

On my workstation, I modified the Tinyproxy configuration at /etc/tinyproxy/tinyproxy.conf as follows:

120c120
< LogLevel Info
---
> LogLevel Connect
224a225
> Allow 10.203.206.0/24

A row in each of the tables below represents a build with:

  • None - No proxy environment variables
  • Both - Both proxy environment variables
  • HTTP - Only the http_proxy environment variable
  • HTTPS - Only the https_proxy environment variable

Note that all connections made to an external host used a secure HTTPS connection. The type of connection in the columns below indicates only which proxy was used: HTTP for the proxy defined by the http_proxy environment variable, and HTTPS for the proxy defined by the https_proxy environment variable.

Before the fix

The table below shows the number and types of connections made before the fix.

Connections made:

Variables Made Direct HTTP HTTPS
None 565 565 0 0
Both 1263 565 349 349
HTTP 1130 565 565 0
HTTPS 1130 565 0 565

The table below shows the types of connections actually used before the fix.

Connections used:

Variables Used Direct HTTP HTTPS
None 565 565 0 0
Both 565 550 14 1
HTTP 565 119 446 0
HTTPS 565 550 0 15

After the fix

The table below shows the number and types of connections made after the fix.

Connections made:

Variables Made Direct HTTP HTTPS
None 565 565 0 0
Both 565 0 565 0
HTTP 565 0 565 0
HTTPS 565 0 0 565

After the fix, each of the 565 connections made were in fact used to transfer a file.

@mbien
Copy link
Member

mbien commented Jun 7, 2022

Hi @jgneff,

The openConnection method of ConfigureProxy makes up to three connections to a remote host for each external file, yet only one is ever used. The extra connections cause the build to fail when the proxy or host responds with status code "503 Service Unavailable."

I am not sure why it connects in parallel, I tracked it down to the pre-apache repo to this commit: emilianbold/netbeans-releases@3050498, unfortunately that doesn't help a lot. Maybe @jtulach remembers. It might have been performance reasons, e.g one thread timing out would not influence the others - but that is only speculation.

However I have to wonder: wouldn't the safer fix be to simply ignore errors unless all tasks fail? I can't imagine 1500 connections would be a problem in a ~10min build. Thats probably similar to reading an article online without an adblocker active :)

If we end up deciding to change this to an sequential approach, I believe we should use one task which does all sequentially. Having a countdown latch, thee tasks on one thread is probably not necessary.

@jgneff
Copy link
Contributor Author

jgneff commented Jun 7, 2022

@mbien Thank you, Michael, for the link to the old history. I was wondering how to go back even further.

However I have to wonder: wouldn't the safer fix be to simply ignore errors unless all tasks fail?

I would like the number and type of connections to be deterministic. As it is now, it's almost random. You can define proxy environment variables but end up using mostly direct connections. It's a race, but instead of the first one winning, the last one to get to the instruction conn[0] = test before being terminated wins!

If we end up deciding to change this to an sequential approach, I believe we should use one task which does all sequentially. Having a countdown latch, thee tasks on one thread is probably not necessary.

I decided on the most conservative change for now. I kept the same structure as the current code, but just made the tasks sequential. Eventually, I would like to rethink the entire approach. For example, the values of the proxy variables are not going to change during the build, so the code should get them just once from the environment instead of 565 times. Furthermore, I think the build should fail when the http_proxy doesn't work, for example, instead of moving on to try the value of https_proxy and then even a direct connection.

I thought we could follow-up with a bigger design change once we have some experience with this small change.

@sdedic
Copy link
Member

sdedic commented Jun 9, 2022

Please leave the connection attempts in place. In a corporate environment, the proxy setup is peculiar. In particular no_proxy evaluated by PAC (javascript) is +- impossible to mimic in this environment, so at least proxy + no_proxy combo ensures that the connection happens - somehow.

@jgneff
Copy link
Contributor Author

jgneff commented Jun 10, 2022

Please leave the connection attempts in place.

Just to clarify, this pull request leaves the connection attempts in place:

  • http_proxy if set and working;
  • otherwise, https_proxy if set and working;
  • otherwise, a direct connection.

The difference is that it tries them one at a time, in order, and uses every connection it makes.

Before this change, the build:

  • makes up to three connections for every one connection used,
  • has a race condition that determines which connection to use,
  • doesn't properly close the unused connections it makes, and
  • makes direct connections even when proxies are defined and working.

This pull request fixes all four issues.

For example, after this change, I see only three connections from the proxy server to netbeans.osuosl.org during a NetBeans build on Launchpad:

"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 570446 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 227491 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 50266718 "-" "Java/11.0.15"

Before this change, I saw 85 such connections, with 81 of them unnecessary and abandoned (just 137 bytes transferred):

"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 6331 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 570446 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 137 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 137 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 227491 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 137 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 137 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 137 "-" "Java/11.0.15"
     ^^^ 75 more lines just like the one above ^^^
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 137 "-" "Java/11.0.15"
"CONNECT netbeans.osuosl.org:443 HTTP/1.1" 200 50266718 "-" "Java/11.0.15"

When testing locally, before this change there were 617 connections made but only 17 of them actually used. The build made 600 unnecessary connections through the proxy: 136 to netbeans.osuosl.org and 464 to repo1.maven.org.

@jgneff
Copy link
Contributor Author

jgneff commented Jun 13, 2022

Below are a few more notes from my tests while they're still fresh in my mind.

Proxy connection reuse

The connection counts in the set of tables in my prior comment above are from the perspective of the ConfigureProxy class, obtained from Ant debug log messages. Yet multiple ConfigureProxy connections can be carried by a single persistent keep-alive proxy connection.

The tables below, on the other hand, show the counts of actual connections made through the proxy server to an external host, obtained directly from the access log of the Squid proxy server. The proxy access logs for each of the six tests below are available in the following archive:

squid-access-logs.tar.gz - Squid access logs for the tables below

BEFORE

The table below lists the number of connection made through the proxy server to each external host before the changes in this pull request. The columns contain the connection counts when http_proxy is set, when https_proxy is set, and when both environment variables are set.

Proxy Connections http_proxy https_proxy Both Variables
gitbox.apache.org 2 2 3
netbeans.osuosl.org 11 22 109
services.gradle.org 1 1 2
downloads.gradle-dn.com 1 1 1
repo.gradle.org 1 1 2
repo1.maven.org 1 20 468
Total connections 17 47 585
Unused connections none none 546

AFTER

The table below lists the same information after the changes in this pull request.

Proxy Connections http_proxy https_proxy Both Variables
gitbox.apache.org 2 2 2
netbeans.osuosl.org 11 11 11
services.gradle.org 1 1 1
downloads.gradle-dn.com 1 1 1
repo.gradle.org 1 1 1
repo1.maven.org 1 1 1
Total connections 17 17 17
Unused connections none none none

Notice that there's a large reduction in the number of proxy connections made, and no excess unused connections are made at all. I suspect that the single thread used in the pull request is allowing the Java runtime to reuse the keep-alive proxy connections, keeping them to a minimum.

Making the requests sequential through the proxy keep-alive connections, though, does not affect the build time:

Build Time http_proxy https_proxy Both Variables
Before this PR 10m 42s 9m 39s 9m 33s
After this PR 9m 36s 9m 43s 9m 5s

A partial workaround

The first table above suggests a workaround for the original problem of the build failing when both environment variables are set. The build fails because the external host tells it to back off on the excess connections by sending a response of "503 Service Unavailable."

The workaround is to remove one of the proxy variables from the environment before starting the build:

unset https_proxy
ant -Dmetabuild.branch=master

It's only a partial workaround because:

  • the build still tries to make direct connections even when it's told to use a proxy server, and
  • the build might not take full advantage of the HTTP 1.1 keep-alive connections because up to three different threads are used to open connections through the proxy server.

Frequency of failures

Now that I have a workaround, I expect the original problem to be rare. The screenshot below shows that the NetBeans builds on Launchpad almost always failed over the past six months:

latest-builds

For future searches, the error messages in the build logs are illustrated by the following two samples:

[downloadbinaries] Could not download ...
  ...
java.io.IOException: Skipping download from
  https://netbeans.osuosl.org/binaries/
    EC2751668FDF959F31F88FAF3A03C9A28F3E6746-
    wordlist-en_GB-large-2017.08.24.zip
  due to response code 503

[downloadbinaries] Could not download ...
  ...
java.io.IOException: Skipping download from
  https://netbeans.osuosl.org/binaries/
    89BC047153217F5254506F4C622A771A78883CBC-
    ValidationAPI-b26b94cc001a41ab9138496b11e2ae256a159ffd.jar
  due to response code 503

@jgneff jgneff changed the title Make one connection to download a file, not three Build downloads fail due to response code 503 Jun 13, 2022
@matthiasblaesing
Copy link
Contributor

I tested this (DO NOT RUN YOURSELF!):

for i in `seq 1 2000`; do curl -o $i "https://netbeans.osuosl.org/binaries/EC2751668FDF959F31F88FAF3A03C9A28F3E6746-wordlist-en_GB-large-2017.08.24.zip" & done

These are 2000 requests in parallel and the statistics is: 840 Requests were rejected, 1160 requests went through cleanly.

This test was executed without a proxy in between. The reply is pretty explicit:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Unavailable</title>
</head><body>
<h1>Service Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
<hr>
<address>Apache Server at netbeans.osuosl.org Port 443</address>
</body></html>

So we should respect that. I understand, that there are (mostly) corporate setups, that are broken beyond imagination, but this change looks sane (as already indicated by @jgneff it could be simplified as the parallel nature of the connections is gone).

I suggest to raise this on the mailing list and if noone comes up with an existing setup, that is broken by this, I suggest to go ahead. We can't prepare for all broken setups administrators can imagine.

@jgneff
Copy link
Contributor Author

jgneff commented Jun 14, 2022

These are 2000 requests in parallel and the statistics is: 840 Requests were rejected, 1160 requests went through cleanly.

What a good test! I never thought to try it directly. Thank you, Matthias.

I suggest to raise this on the mailing list and if noone comes up with an existing setup, that is broken by this, I suggest to go ahead.

I started the thread with the subject "Anyone else building NetBeans behind a proxy server?" but no reply yet.

@matthiasblaesing
Copy link
Contributor

@jtulach if I remember correctly you were affected by the proxy problems when building netbeans. Could you please see if you can still build netbeans with this PR?

jgneff added a commit to jgneff/strictly-netbeans that referenced this pull request Jun 14, 2022
Build NetBeans with only one of the following proxy environment
variables set, preferably just the 'http_proxy' variable:

  export http_proxy=http://10.10.10.1:8222/
  export https_proxy=http://10.10.10.1:8222/

Removing one of the variables works around the following problem:

  HTTPS proxy returns 503 Service Unavailable
  https://bugs.launchpad.net/launchpad-buildd/+bug/1948501

For details, including a fix, see the following NetBeans pull request:

  Build downloads fail due to response code 503
  apache/netbeans#4206
@mbien mbien added the build label Jun 15, 2022
@jgneff
Copy link
Contributor Author

jgneff commented Jun 24, 2022

Testing

To test this pull request, clone the pull request branch as follows:

$ git clone -b go-gentle-on-proxies --depth 1 https://github.com/jgneff/netbeans.git
$ cd netbeans 

Clear the repository and the external cache of all downloaded files before each test with the commands:

$ rm -r ~/.hgexternalcache/*
$ git clean -xdf 

Set one, both, or none of the standard proxy environment variables as appropriate for each specific test, as in the following example:

$ export https_proxy=http://10.10.10.1:8222/
$ export http_proxy=http://10.10.10.1:8222/

Then run Ant with the target download-all-extbins, which downloads 564 files into the repository under the netbeans directory and caches them in the external ~/.hgexternalcache directory:

$ ant -quiet -Dmetabuild.branch=master download-all-extbins

See the full mailing list thread for details.

@jgneff
Copy link
Contributor Author

jgneff commented Oct 30, 2022

See the links below for more information about the problem identified in this pull request:

Now that I have great testing environment in jgneff/netbeans-proxies, I am prepared to provide a more thorough fix, if that's what we decide.

@matthiasblaesing
Copy link
Contributor

Sorry, I forgot about this. I plan to merge this shortly, just would like to see the rest of the checks become green. I have a local build running with this merged to master.

@jgneff
Copy link
Contributor Author

jgneff commented Oct 30, 2022

I plan to merge this shortly, just would like to see the rest of the checks become green.

If you wait, I can add the one-line fix to remove all those unnecessary threads, too. I would prefer getting in a more complete fix, but I could add that with a follow-up pull request, if you prefer.

@matthiasblaesing
Copy link
Contributor

Of course I can wait - also for the full followup - ping me when you think this is ready.

@matthiasblaesing
Copy link
Contributor

@jgneff when you update this PR, could you then please rebase it onto the then current master? That way it will be checked with the then current tests.

@mbien
Copy link
Member

mbien commented Oct 30, 2022

+1 for a code cleanup since this looks like it could now run on one task without the need of a CountDownLatch which would make it less confusing to read

@mbien
Copy link
Member

mbien commented Sep 25, 2023

what is the status on this? still baking? ready to review?

if its ready, please squash to one commit / rebase to latest master, this will also trigger a CI build with latest changes - I enabled all jobs just to be sure.

@jgneff
Copy link
Contributor Author

jgneff commented Sep 26, 2023

what is the status on this? still baking? ready to review?

Thanks for the reminder, Michael. It's next on my list. I just got a bit side-tracked this year. The code is done, but I'll need to remind myself what I did 🙄 and test it again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build ci:all-tests [ci] enable all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Building with proxies causes DoS attack on OSU
4 participants