Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase default HTTP timeout to 180 seconds #1121

Closed

Conversation

adamtheturtle
Copy link
Contributor

@adamtheturtle adamtheturtle commented Dec 7, 2017

See https://jira.mesosphere.com/browse/DCOS_OSS-1957

5 seconds is too slow to pull the universe on my current network connection.
As per the JIRA issue, this matches the documented default.

https://jira.mesosphere.com/browse/DCOS_OSS-1956 may cover using the value from the config but this is an improvement anyways, in my opinion.

@adamtheturtle
Copy link
Contributor Author

cc @bamarni

@@ -796,7 +796,7 @@ def _get_api_url(path):

def _get_timeout():
"""
:returns: timout value for API calls
:returns: timeout value for API calls
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, feel free to update the PR to only keep this commit and I'll merge.

@adamtheturtle
Copy link
Contributor Author

run linux integration tests

@adamtheturtle
Copy link
Contributor Author

@bamarni the test failure looks unrelated to me:

[linux-tests] =================================== FAILURES ===================================
[linux-tests] __________________________ test_show_bad_app_version ___________________________
[linux-tests] 
[linux-tests]     def test_show_bad_app_version():
[linux-tests]         with _zero_instance_app():
[linux-tests]             _update_app(
[linux-tests]                 'zero-instance-app',
[linux-tests]                 'tests/data/marathon/apps/update_zero_instance_sleep.json')
[linux-tests]     
[linux-tests]             returncode, stdout, stderr = exec_command(
[linux-tests]                 ['dcos', 'marathon', 'app', 'show', '--app-version=20:39:32.972Z',
[linux-tests]                  'zero-instance-app'])
[linux-tests]             assert returncode == 1
[linux-tests]             assert stdout == b''
[linux-tests] >           assert stderr.startswith(b'Error while fetching')
[linux-tests] E           assert False
[linux-tests] E            +  where False = <built-in method startswith of bytes object at 0x7f51cae07bd0>(b'Error while fetching')
[linux-tests] E            +    where <built-in method startswith of bytes object at 0x7f51cae07bd0> = b"Error: Text '20:39:32.972Z' could not be parsed at index 0\n".startswith
[linux-tests] 
[linux-tests] tests/integrations/test_marathon.py:172: AssertionError

@bamarni
Copy link
Contributor

bamarni commented Dec 7, 2017

@adamtheturtle : Thanks for the PR. This timeout is a hard topic as increasing and decreasing it affect a lot of different scenarios. To give you some background and suggestions about this :

  • Here is when we introduced a timeout limit for almost all requests [http] always use a timeout by default #1052
  • If we were to increase the default timeout, we should hardcode 5 seconds in all the places that were changed here (Remove the hardcoded 1 second timeout for some HTTP calls #1084, I should probably have done this at the time though).
  • https://jira.mesosphere.com/browse/DCOS_OSS-1956 is a very good point and an embarassing bug, instead of increasing the default timeout, could we just take it in account? It should just be a matter of making a call to config.get_config_val('core.timeout') if no timeout is provided and before using the default.
  • I wouldn't be so quick saying the test failure is unrelated to your changes, the fixture file associated to the failing test contains a sleep 100 command in the app definition.

Personally I find 180 quite long, this would leave some commands hanging when a cluster is unreachable. To clarify also about this timeout, it is not a timeout for the whole request / response round trip, quoting the requests doc :

timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.

So I think we should leave it low by default, if there are specific requests which require a longer one it should be set to None explicitly or to something high (like in #1099 for example). For automated scripts it'd make sense to set the core.timeout config value.

@adamtheturtle
Copy link
Contributor Author

@bamarni Thank you for your considered response.

I'm in a situation where the 5 second timeout is not enough.
That situation is that I'm trying to run the Enterprise integration tests on a cluster which cannot download the universe in 5 seconds.

Given that https://jira.mesosphere.com/browse/DCOS_OSS-1956 exists as a bug, and I am blocked apart from the most hacky hacks - I'd really like to fix the issue that I'm facing.

Options I can think of:

  • Keep the status quo, for now - this would make me sad as I spent a lot of time and effort on fighting the bug.
  • Add a larger timeout just for the cosmos commands. I think that this is possible, and I'd happily help to do that with your guidance.
  • Get to https://jira.mesosphere.com/browse/DCOS_OSS-1956 now. However, I do feel that this is beyond the scope of what I can do, at least now. I tried to add a fix that helped me and didn't add too much of a burden.
  • Increase the timeout now and forever
  • Increase the timeout now, and revert that change when https://jira.mesosphere.com/browse/DCOS_OSS-1956 is done

If https://jira.mesosphere.com/browse/DCOS_OSS-1956 is done and nothing else, I'd want the integration test suite to increase the limit for this command.

@bamarni
Copy link
Contributor

bamarni commented Dec 13, 2017

@adamtheturtle : see #1126

@adamtheturtle adamtheturtle deleted the adamdangoor/timeout-DCOS-19836 branch December 20, 2017 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants