Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait_for_connection timeout/retries fixes #75388

Open
wants to merge 15 commits into
base: devel
Choose a base branch
from

Conversation

lilatomic
Copy link
Contributor

SUMMARY

fixes #74237

Sets the timeout and disables retries on the connection.
Also improves the timeout behaviour so it won't exceed the maximum timeout specified.

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

wait_for_connection

ADDITIONAL INFORMATION

Before:

time ansible -i 10.99.99.99, -m wait_for_connection -a '{{ {"timeout": 3, "connect_timeout": 1} }}' all
10.99.99.99 | FAILED! => {
    "changed": false,
    "elapsed": 11,
    "msg": "timed out waiting for ping module test: Failed to connect to the host via ssh: ssh: connect to host 10.99.99.99 port 22: Connection timed out"
}
Command exited with non-zero status 2
2.42user 0.20system 0:12.24elapsed 21%CPU (0avgtext+0avgdata 46592maxresident)k
0inputs+16outputs (0major+30820minor)pagefaults 0swaps

After:

time ansible -i 10.99.99.99, -m wait_for_connection -a '{{ {"timeout": 3, "connect_timeout": 1} }}' all
10.99.99.99 | FAILED! => {
    "changed": false,
    "elapsed": 2,
    "msg": "timed out waiting for ping module test: Failed to connect to the host via ssh: ssh: connect to host 10.99.99.99 port 22: Connection timed out"
}
Command exited with non-zero status 2
1.30user 0.14system 0:03.35elapsed 43%CPU (0avgtext+0avgdata 46864maxresident)k
41912inputs+16outputs (117major+30787minor)pagefaults 0swaps

We can see that the elapsed time no longer exceeds the timeout

@ansibot ansibot added affects_2.12 bug This issue/PR relates to a bug. core_review In order to be merged, this PR must follow the core review workflow. module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. new_contributor This PR is the first contribution by a new community member. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Aug 2, 2021
@ansibot ansibot added needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. core_review In order to be merged, this PR must follow the core review workflow. and removed core_review In order to be merged, this PR must follow the core review workflow. needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Aug 2, 2021
@bcoca bcoca changed the title Issue 74237 wait_for_connection timeout/retries fixes Aug 2, 2021
@bcoca bcoca added P3 Priority 3 - Approved, No Time Limitation and removed needs_triage Needs a first human triage before being processed. labels Aug 3, 2021
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Aug 11, 2021
@ansibot ansibot added needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed core_review In order to be merged, this PR must follow the core review workflow. labels Sep 28, 2021
@ansibot ansibot added core_review In order to be merged, this PR must follow the core review workflow. and removed needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Oct 2, 2021
Copy link
Contributor

@s-hertel s-hertel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added tests are awesome.

max_end_time = datetime.utcnow() + timedelta(seconds=timeout)

e = None
while datetime.utcnow() < max_end_time:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could simplify this. The _total_seconds shim doesn't seem necessary since those Python versions aren't supported anyway.

Suggested change
while datetime.utcnow() < max_end_time:
while datetime.utcnow() < max_end_time:
if (datetime.utcnow() + timedelta(seconds=connect_timeout)) > max_end_time:
# connect_timeout is longer than the max time left, so use the remaining time instead
connect_timeout = int((max_end_time - datetime.utcnow()).total_seconds())
if connect_timeout < 1:
# invalid timeout or only a fraction of a second is left
break

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh that's much simpler. I modified the suggestion slightly so it doesn't reassign to the connect_timeout argument and so it uses the same value for 'now' throughout an iteration of the while loop (good practice for guarding against timing edge cases).

@ansibot
Copy link
Contributor

ansibot commented Oct 6, 2021

@lilatomic This PR contains @ mentions in at least one commit message. Those mentions can cause cascading notifications through GitHub and need to be removed. Please squash or amend your commits to remove the mentions.

click here for bot help

@ansibot ansibot removed the core_review In order to be merged, this PR must follow the core review workflow. label Oct 6, 2021
@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jun 25, 2022
@webknjaz
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lilatomic
Copy link
Contributor Author

looks like a flaky CI? I don't think this would affect the galaxy setup
looks like this hung:

03:53 TASK [ansible-galaxy-collection : setup test collections for install and download test] ***
50:02 ERROR: Tests aborted after exceeding the 50 minute time limit.

@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed core_review In order to be merged, this PR must follow the core review workflow. labels Jun 27, 2022
@webknjaz
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ansibot ansibot added core_review In order to be merged, this PR must follow the core review workflow. and removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Jun 28, 2022
@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. core_review In order to be merged, this PR must follow the core review workflow. and removed core_review In order to be merged, this PR must follow the core review workflow. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Jul 6, 2022
@webknjaz
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jul 21, 2022
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jul 29, 2022
@ansibot ansibot added the needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html label Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects_2.12 bug This issue/PR relates to a bug. core_review In order to be merged, this PR must follow the core review workflow. has_issue module This issue/PR relates to a module. needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html P3 Priority 3 - Approved, No Time Limitation stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. support:core This issue/PR relates to code supported by the Ansible Engineering Team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wait_for_connection ignores short timeouts
5 participants