[Issue 6231][Build] Configure maven http connection pool setting and maven http retry setting in Github Flows #8386

Merged 1 commit into apache:master on Oct 27, 2020

lhotari
Member

@lhotari lhotari commented Oct 27, 2020

Fixes #6231

Motivation

See #6231. Intermittent Maven dependency download failures are a common problem in Pulsar CI running on GitHub Actions.

Modifications

  • Set `MAVEN_OPTS` to `-Dmaven.wagon.httpconnectionManager.ttlSeconds=25 -Dmaven.wagon.http.retryHandler.count=3`
    in the GitHub Actions environment.

This sets the HTTP connection pool TTL to 25 seconds. In addition, HTTP requests are retried up to 3 times.

https://issues.apache.org/jira/browse/WAGON-545 is a close match to issue #6231.
Its last comment says "Azure users shall set the TTL to 240 seconds or less."
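
As an illustration of the modification above, here is a minimal shell sketch of the environment setting. It assumes a POSIX shell. `WAGON_OPTS` is a name introduced here for readability, and the `mvn` call is shown only as a comment because the goals used by the Pulsar workflows are not listed in this description.

```shell
#!/bin/sh
# Flags quoted from the PR description; WAGON_OPTS is an illustrative name.
# ttlSeconds=25: pooled HTTP connections are dropped after 25 seconds.
# retryHandler.count=3: HTTP requests are retried up to 3 times.
WAGON_OPTS="-Dmaven.wagon.httpconnectionManager.ttlSeconds=25 -Dmaven.wagon.http.retryHandler.count=3"

# The mvn launcher passes MAVEN_OPTS to the JVM it starts.
export MAVEN_OPTS="$WAGON_OPTS"
echo "MAVEN_OPTS=$MAVEN_OPTS"

# A later build step would then pick the options up, for example:
#   mvn -B install
```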

Contributor

@eolivelli eolivelli left a comment

Good idea.
But it would be better to apply this change to the mvn invocations in the GitHub Actions workflow files.
If we commit this file, the change will apply to every local environment and to other CI systems (for people who test Pulsar on their own CI systems).
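
One way to scope the settings as suggested here, sketched in shell. This is an assumption about how it could be done, not the change that was merged: it relies on the `GITHUB_ACTIONS` environment variable, which GitHub Actions sets to `true`, and on a hypothetical helper named `resolve_maven_opts`.

```shell
#!/bin/sh
# Flags taken from this PR's description.
WAGON_FLAGS="-Dmaven.wagon.httpconnectionManager.ttlSeconds=25 -Dmaven.wagon.http.retryHandler.count=3"

# Hypothetical helper: print the MAVEN_OPTS value to use.
#   $1 = value of GITHUB_ACTIONS ("true" when running on GitHub Actions)
#   $2 = the MAVEN_OPTS value that is already set, possibly empty
resolve_maven_opts() {
  if [ "$1" = "true" ]; then
    # Append the wagon flags, keeping whatever was already configured.
    printf '%s\n' "${2:+$2 }$WAGON_FLAGS"
  else
    # Local builds and other CI systems keep their own settings untouched.
    printf '%s\n' "$2"
  fi
}

MAVEN_OPTS="$(resolve_maven_opts "${GITHUB_ACTIONS:-}" "${MAVEN_OPTS:-}")"
export MAVEN_OPTS
echo "MAVEN_OPTS=$MAVEN_OPTS"
```

Outside GitHub Actions the helper returns the existing value unchanged, so a committed script of this shape would not affect local environments.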

@lhotari
Member Author

lhotari commented Oct 27, 2020

It seems like the setting -Dmaven.wagon.http.retryHandler.count=3 isn't effective for the download problem since one of the jobs for this PR failed: https://github.com/apache/pulsar/pull/8386/checks?check_run_id=1313003063

@lhotari
Member Author

lhotari commented Oct 27, 2020

Documentation for maven.wagon.http.retryHandler.count can be found in https://maven.apache.org/wagon/wagon-providers/wagon-http/ .

It mentions: "Any retry handler can only react to exceptions when executing the request and receiving the response head. It will not salvage in-flight failures of ongoing response body streams." So it seems the setting won't help with downloads that fail while the response body is being transferred.
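
Because the retry handler does not cover failures in the middle of a response body, retrying a failed download would have to happen at a higher level. The following shell sketch shows that idea with a hypothetical `retry` helper that reruns a whole command. It is not part of this PR, and the `mvn` goal in the comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper, not part of this PR: rerun a whole command
# until it succeeds or the given number of attempts is used up.
#   retry ATTEMPTS COMMAND [ARGS...]
retry() {
  attempts=$1
  shift
  n=1
  while :; do
    # Run the command; stop as soon as it succeeds.
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying: $*" >&2
    n=$((n + 1))
  done
}

# Example use in a CI step (left as a comment so the sketch runs without Maven):
#   retry 3 mvn -B dependency:go-offline
```

Retrying the whole command is coarse, since it repeats work that already succeeded, but it also covers transfers that break partway through.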

@lhotari
Member Author

lhotari commented Oct 27, 2020

> But it would be better to apply this change to mvn invocation in github actions files.
> If we commit this file the change will apply to every local environment and to other CI systems (for people who test Pulsar on their own CI systems)

@eolivelli In the end I will go with this proposal, so that the settings apply only to GitHub Actions. It turns out that -Dmaven.wagon.http.retryHandler.count=3 doesn't help, and the correct solution seems to be -Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false. This was originally mentioned in an Azure-related issue: https://developercommunity.visualstudio.com/content/problem/357696/maven-project-build-failing-with-connection-reset.html (linked from the SO Q&A https://stackoverflow.com/a/58769870).

@lhotari
Member Author

lhotari commented Oct 27, 2020

There is a related issue: https://issues.apache.org/jira/browse/WAGON-486 . Instead of disabling the connection pool completely, it would also be possible to tweak the maven.wagon.httpconnectionManager.ttlSeconds setting.

Fixes apache#6231

Maven's wagon-http documentation:
https://maven.apache.org/wagon/wagon-providers/wagon-http/

Set maven.wagon.httpconnectionManager.ttlSeconds to
25 seconds.
maven.wagon.httpconnectionManager.ttlSeconds documentation in source code:
https://github.com/apache/maven-wagon/blob/a7c8e3470dd968961e87a7cb9a3829d3bec77383/wagon-providers/wagon-http-shared/src/main/java/org/apache/maven/wagon/shared/http/AbstractHttpClientWagon.java#L297-L305

Also add -Dmaven.wagon.http.retryHandler.count=3 which
will retry http calls 3 times. However this has no impact on
retrying downloads. (documentation: "Any retry handler can
only react to exceptions when executing the request and receiving
the response head. It will not salvage in-flight failures of
ongoing response body streams.")
@lhotari lhotari changed the title from "[Issue 6231][Build] Configure maven to retry downloads 3 times" to "[Issue 6231][Build] Configure maven http connection pool setting and maven http retry setting in Github Flows" on Oct 27, 2020
@lhotari
Member Author

lhotari commented Oct 27, 2020

I also found https://issues.apache.org/jira/browse/WAGON-545 which is a close match to the issue we are having.
In the last comment, it says "Azure users shall set the TTL to 240 seconds or less."

@lhotari
Member Author

lhotari commented Oct 27, 2020

/pulsarbot run-failure-checks

Contributor

@eolivelli eolivelli left a comment

LGTM

@eolivelli
Contributor

eolivelli commented Oct 27, 2020

@lhotari I see the same problem in the BookKeeper repository:
https://github.com/apache/bookkeeper/pull/1901/checks?check_run_id=1314277589

Error: Failed to execute goal on project bookkeeper-common: Could not resolve dependencies for project org.apache.bookkeeper:bookkeeper-common:jar:4.12.0-SNAPSHOT: Could not transfer artifact org.jctools:jctools-core:jar:2.1.2 from/to central (https://repo.maven.apache.org/maven2): Transfer failed for https://repo.maven.apache.org/maven2/org/jctools/jctools-core/2.1.2/jctools-core-2.1.2.jar: Connection reset -> [Help 1]

Do you mind porting this PR to the Apache BookKeeper repository as well?

@lhotari
Member Author

lhotari commented Oct 27, 2020

> Do you mind porting this PR to the Apache BookKeeper repository as well?

@eolivelli sure, I can do that. It's apache/bookkeeper#2460 .

@lhotari
Member Author

lhotari commented Oct 27, 2020

/pulsarbot run-failure-checks

@sijie sijie added this to the 2.7.0 milestone Oct 27, 2020
@lhotari
Member Author

lhotari commented Oct 27, 2020

/pulsarbot run-failure-checks

@lhotari
Member Author

lhotari commented Oct 27, 2020

By the way, GitHub Actions workflows do run on Azure:
https://docs.github.com/en/free-pro-team@latest/actions/reference/specifications-for-github-hosted-runners#cloud-hosts-for-github-hosted-runners

> GitHub hosts Linux and Windows runners on Standard_DS2_v2 virtual machines in Microsoft Azure with the GitHub Actions runner application installed. The GitHub-hosted runner application is a fork of the Azure Pipelines Agent. Inbound ICMP packets are blocked for all Azure virtual machines, so ping or traceroute commands might not work. For more information about the Standard_DS2_v2 machine resources, see "Dv2 and DSv2-series" in the Microsoft Azure documentation.

@lhotari
Member Author

lhotari commented Oct 27, 2020

/pulsarbot run-failure-checks

@merlimat merlimat merged commit 03f71d9 into apache:master Oct 27, 2020
@lhotari lhotari deleted the lh-retry-maven-downloads branch October 27, 2020 22:58
@lhotari
Member Author

lhotari commented Oct 29, 2020

Actually, I just now noticed yet another connection reset when using MAVEN_OPTS=-Dmaven.wagon.httpconnectionManager.ttlSeconds=25 -Dmaven.wagon.http.retryHandler.count=3
https://github.com/apache/pulsar/runs/1325014611?check_suite_focus=true

Perhaps we need to switch to -Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false and disable the connection pool completely. That's what others are doing in actions/runner-images#1499

jiazhai pushed a commit to apache/bookkeeper that referenced this pull request Nov 4, 2020
### Motivation

Fixes failures of the type "Transfer failed for https://repo.maven.apache.org/...
.jar: Connection reset" in the GitHub Actions environment, such as:
```
Error:  Failed to execute goal on project bookkeeper-common: Could not resolve dependencies for project org.apache.bookkeeper:bookkeeper-common:jar:4.12.0-SNAPSHOT: Could not transfer artifact org.jctools:jctools-core:jar:2.1.2 from/to central (https://repo.maven.apache.org/maven2): Transfer failed for https://repo.maven.apache.org/maven2/org/jctools/jctools-core/2.1.2/jctools-core-2.1.2.jar: Connection reset -> [Help 1]
```

### Changes

Set `maven.wagon.httpconnectionManager.ttlSeconds` to 25 seconds.
Besides this, set `maven.wagon.http.retryHandler.count` to 3 retries.

https://issues.apache.org/jira/browse/WAGON-545
contains a recommendation "Azure users shall set the TTL to 240 seconds or less."

The TTL is set to 25 seconds so that it is shorter than any common firewall or
NAT idle timeout. Some NATs have a 30-second idle timeout, although that is very
rare. A 25-second TTL should do no harm, since connections can still be reused
effectively within that window.
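
The reasoning can be checked with a small shell sketch. It assumes that the TTL bounds the total lifetime of a pooled connection, so a connection that is reused has been idle for less than the TTL. `ttl_is_below` is a hypothetical helper, and the two limits are the 30-second NAT timeout and the 240-second Azure bound quoted in this message.

```shell
#!/bin/sh
# ttl_is_below TTL LIMIT: succeeds when TTL (seconds) is strictly below LIMIT.
ttl_is_below() {
  [ "$1" -lt "$2" ]
}

TTL=25
# 30: the rare NAT idle timeout; 240: the bound quoted from WAGON-545.
for limit in 30 240; do
  if ttl_is_below "$TTL" "$limit"; then
    echo "TTL ${TTL}s is below ${limit}s"
  else
    echo "TTL ${TTL}s is NOT below ${limit}s"
  fi
done
```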

The documentation for `maven.wagon.httpconnectionManager.ttlSeconds` is
available in the source code:
https://github.com/apache/maven-wagon/blob/wagon-3.4.1/wagon-providers/wagon-http-shared/src/main/java/org/apache/maven/wagon/shared/http/AbstractHttpClientWagon.java#L297-L305

Documentation for `maven.wagon.http.retryHandler.count` is in
https://maven.apache.org/wagon/wagon-providers/wagon-http/
"Any retry handler can only react to exceptions when executing the request and
receiving the response head. It will not salvage in-flight failures of ongoing
response body streams." Therefore the retry count setting behaves a bit
differently than expected. WAGON-545 explains that ConnectionExceptions
aren't among the exceptions retried by default. If such failures become
a problem, it's possible to configure the retry handler in a more
fine-grained way.

This is similar to the change made in Pulsar: apache/pulsar#8386
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Nov 13, 2020
flowchartsman pushed a commit to flowchartsman/pulsar that referenced this pull request Nov 17, 2020
v1v added a commit to v1v/apm-agent-java that referenced this pull request Dec 7, 2020
lhotari added a commit to lhotari/pulsar that referenced this pull request Dec 11, 2020
…" issues

- previously the http connection pooling timeout was reduced to 25 seconds in
  PR apache#8386. However the issue has persisted. The mitigation used in other
  projects is to disable maven's http connection pooling completely.
  The solution is proposed here:
  - actions/runner-images#1499 (comment)
lhotari added a commit to lhotari/pulsar that referenced this pull request Dec 12, 2020
sijie pushed a commit that referenced this pull request Dec 13, 2020
…" issues (#8921)

### Motivation

PR #8386 configured the http connection pooling timeout to 25 seconds.
However, the "connection reset" issue with Maven dependency downloads has persisted.
The mitigation used in other projects is to disable Maven's http connection pooling completely.
The solution is proposed here:
- actions/runner-images#1499 (comment)

### Modifications

Disable Maven's http connection pool by passing `-Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false` in the `MAVEN_OPTS` environment variable.
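
A minimal shell sketch of this modification, assuming a POSIX shell. Appending to the file named by `GITHUB_ENV` is how a GitHub Actions step passes a variable to later steps. Whether the Pulsar workflows use that mechanism or a workflow-level `env` entry is not stated here, so treat that part as an assumption. `POOL_OFF_FLAGS` is an illustrative name.

```shell
#!/bin/sh
# Flags quoted from the modification above; POOL_OFF_FLAGS is an illustrative name.
POOL_OFF_FLAGS="-Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false"

export MAVEN_OPTS="$POOL_OFF_FLAGS"

# On GitHub Actions, GITHUB_ENV names a file; NAME=value lines appended to it
# become environment variables for the following steps of the job.
if [ -n "${GITHUB_ENV:-}" ]; then
  echo "MAVEN_OPTS=$MAVEN_OPTS" >> "$GITHUB_ENV"
fi

echo "MAVEN_OPTS=$MAVEN_OPTS"
```
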
Development

Successfully merging this pull request may close these issues.

Intermittent Maven build errors - Could not resolve dependencies / Could not transfer artifact