Ecs service module health check settings #47217

Merged

stefanhorning
Contributor

SUMMARY

Added health check grace period to the ecs_service module. This is a new attempt after I closed PR #39289 way back and none of the other PRs got merged.

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

module ecs_service.py

ADDITIONAL INFORMATION

This time I added backwards compatibility via a check for the botocore version. The change also makes sure the healthCheckGracePeriodSeconds param is only applied when load balancers are configured (as creation otherwise fails). I also added some initial tests.
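
For readers skimming the thread, the gating described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's code: the function name `build_create_service_params` and the boolean `botocore_new_enough` are hypothetical, and the version check itself is left abstract.

```python
def build_create_service_params(service_name, load_balancers, grace_period, botocore_new_enough):
    """Assemble keyword arguments for an ECS create_service call (illustrative only)."""
    params = {
        'serviceName': service_name,
        'loadBalancers': load_balancers,
    }
    # Only send the grace period when a load balancer is configured and the
    # installed botocore is known to accept the parameter.
    if load_balancers and botocore_new_enough:
        params['healthCheckGracePeriodSeconds'] = grace_period
    return params


# Hypothetical load balancer definition, used only to exercise the function.
lb = [{'targetGroupArn': 'arn:example', 'containerName': 'web', 'containerPort': 80}]
with_lb = build_create_service_params('web', lb, 30, botocore_new_enough=True)
without_lb = build_create_service_params('web', [], 30, botocore_new_enough=True)
old_botocore = build_create_service_params('web', lb, 30, botocore_new_enough=False)
```

In this sketch the parameter is silently left out when either condition fails, so older botocore installs and services without load balancers keep working as before.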

@ansibot
Contributor

ansibot commented Oct 17, 2018

Hi @stefanhorning, thank you for submitting this pull-request!

@ansibot
Contributor

ansibot commented Oct 17, 2018

@stefanhorning, just so you are aware we have a dedicated Working Group for aws.
You can find other people interested in this in #ansible-aws on Freenode IRC
For more information about communities, meetings and agendas see https://github.com/ansible/community

@ansibot ansibot added affects_2.8 This issue/PR affects Ansible v2.8 aws cloud community_review In order to be merged, this PR must follow the community review workflow. feature This issue/PR relates to a feature request. module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. support:community This issue/PR relates to code supported by the Ansible community. labels Oct 17, 2018
@ansibot
Contributor

ansibot commented Oct 17, 2018

The test ansible-test sanity --test pep8 failed with 2 errors:

lib/ansible/modules/cloud/amazon/ecs_service.py:461:39: E272 multiple spaces before keyword
lib/ansible/modules/cloud/amazon/ecs_service.py:463:1: E302 expected 2 blank lines, found 1

@ansibot ansibot added ci_verified Changes made in this PR are causing tests to fail. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed community_review In order to be merged, this PR must follow the community review workflow. labels Oct 17, 2018
    def health_check_setable(self, params):
        load_balancers = params.get('loadBalancers', [])
        # check if botocore (and thus boto3) is new enough for using the healthCheckGracePeriodSeconds parameter
        return len(load_balancers) > 0 and LooseVersion(botocore.__version__) >= LooseVersion('1.9.0')
Contributor

AnsibleAWSModule has a function to check boto3/botocore versions you can use here.

Suggested change
-        return len(load_balancers) > 0 and LooseVersion(botocore.__version__) >= LooseVersion('1.9.0')
+        return len(load_balancers) > 0 and self.module.botocore_at_least('1.9.0')

@ansibot ansibot removed the needs_triage Needs a first human triage before being processed. label Oct 17, 2018
@ansibot ansibot removed the ci_verified Changes made in this PR are causing tests to fail. label Oct 17, 2018
@stefanhorning
Contributor Author

stefanhorning commented Oct 17, 2018

@ryansb I applied your recommended changes, and the build is green now. Anything else I should do?

@ansibot ansibot added stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. stale_review Updates were made after the last review and the last review is more than 7 days old. labels Nov 1, 2018
@ezmac
Contributor

ezmac commented Nov 16, 2018

ready_for_review
@stefanhorning, thanks for your work. After reading https://github.com/ansible/ansibullbot/blob/master/ISSUE_HELP.md#commands, I was hoping to kick this back into the review queue, but you may need to be the one who issues the command. I'm not sure.

@stefanhorning
Contributor Author

ready_for_review

@ansibot ansibot added community_review In order to be merged, this PR must follow the community review workflow. and removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Nov 16, 2018
@@ -466,7 +481,8 @@ def main():
             security_groups=dict(type='list'),
             assign_public_ip=dict(type='bool'),
         )),
-        launch_type=dict(required=False, choices=['EC2', 'FARGATE'])
+        launch_type=dict(required=False, choices=['EC2', 'FARGATE']),
+        health_check_grace_period_seconds=dict(required=False, type='int', default=30)
Contributor

Isn't healthcheck grace period 0 seconds by default? Why would it default to 30 here?

Contributor Author

Couldn't find any docs on what values are allowed. However, I assume I had a reason for setting it to 30, probably that there is a minimum when setting it. It's been a long time since I actually implemented this, but I can recheck if you want.

Contributor

From https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.create_service

healthCheckGracePeriodSeconds (integer) -- The period of time, in seconds, that the Amazon ECS service scheduler should ignore unhealthy Elastic Load Balancing target health checks after a task has first started. This is only valid if your service is configured to use a load balancer. If your service's tasks take a while to start and respond to Elastic Load Balancing health checks, you can specify a health check grace period of up to 7,200 seconds during which the ECS service scheduler ignores health check status. This grace period can prevent the ECS service scheduler from marking tasks as unhealthy and stopping them before they have time to come up.

I've also checked some of my services that were created before health check grace period was added and the return for it is 0. Having 30 seconds be the default wouldn't be a "problem" for me, but it would be a change from what I think is current behavior and not what I would expect.
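
To make the limits quoted above concrete, here is a small validation sketch. It is not part of the module and rests on assumptions: the helper name `validate_grace_period` is hypothetical, the upper bound of 7,200 comes from the quoted docs, and the lower bound of 0 is inferred from the value existing services report rather than from any documented minimum.

```python
MAX_GRACE_PERIOD_SECONDS = 7200  # upper bound quoted from the boto3 docs


def validate_grace_period(seconds, has_load_balancer):
    """Return seconds unchanged if it looks usable; raise ValueError otherwise."""
    if seconds is None:
        return None  # nothing requested, nothing to validate
    if not has_load_balancer:
        raise ValueError('health check grace period requires a load balancer')
    # Assumption: 0 is the smallest accepted value (it is the observed default).
    if not 0 <= seconds <= MAX_GRACE_PERIOD_SECONDS:
        raise ValueError('grace period must be between 0 and %d seconds'
                         % MAX_GRACE_PERIOD_SECONDS)
    return seconds
```

Raising on a missing load balancer is just one possible choice here; silently dropping the value is another.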

Contributor Author

Yes, I saw those docs already; unfortunately they didn't say anything about the minimum value. But if you say it's 0, I will simply change that.

Contributor Author

However, looking at the code again, it's actually an optional value, so if you don't provide the parameter health_check_grace_period_seconds it won't actually be set (no matter what the default value is). I guess I just followed the pattern of the few lines above to always set a sane default if the param is required=False.

Contributor

Just to clarify, I ran through a quick test omitting health check grace period with the default at 30, and it looks like it gets set even when I omit it in the YAML.

But I see you dropped it, so I think this looks good. I'll give it another look over shortly and put a full review in if I can.

Thanks.
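
The behavior observed in that quick test can be modeled with a deliberately simplified sketch of how argument-spec defaults get filled in. The function `apply_defaults` is hypothetical and is not Ansible's actual implementation; it only illustrates that an omitted option takes its declared default, and resolves to None when no default is declared.

```python
def apply_defaults(argument_spec, user_params):
    """Simplified model: fill in each option the user omitted with its default."""
    resolved = {}
    for name, spec in argument_spec.items():
        if name in user_params:
            resolved[name] = user_params[name]
        else:
            resolved[name] = spec.get('default')  # None when no default is declared
    return resolved


with_default = {'health_check_grace_period_seconds': dict(required=False, type='int', default=30)}
without_default = {'health_check_grace_period_seconds': dict(required=False, type='int')}

print(apply_defaults(with_default, {}))     # {'health_check_grace_period_seconds': 30}
print(apply_defaults(without_default, {}))  # {'health_check_grace_period_seconds': None}
```

So with `default=30` the value is always present, and without a default the module code has to be prepared to see None.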

@ansibot ansibot removed stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. stale_review Updates were made after the last review and the last review is more than 7 days old. labels Nov 22, 2018
Contributor

@ezmac ezmac left a comment

Everything looks good to me except the required botocore version.
I wouldn't have noticed except I looked at implementing this feature too. A little testing and searching through botocore commits suggested it needed 1.8.20 instead of 1.9.0. Not that it would break anything, but to be thorough :)

@@ -121,6 +121,11 @@
         required: false
         version_added: 2.7
         choices: ["EC2", "FARGATE"]
+    health_check_grace_period_seconds:
+        description:
+          - Seconds to wait before health checking the freshly added/updated services. This option requires botocore >= 1.9.0.
Contributor

Not sure if something else happened in 1.9.0, but I think health check grace period was added to botocore in 1.8.20.
boto/botocore@a110e8d#diff-48d02f400bc4e82bd35bf470ad45ef20R1145
This is the blame for 1.8.20 showing the addition of HealthCheckGracePeriodSeconds to ecs service. I couldn't find anything newer that would make it need 1.9.0.

    def health_check_setable(self, params):
        load_balancers = params.get('loadBalancers', [])
        # check if botocore (and thus boto3) is new enough for using the healthCheckGracePeriodSeconds parameter
        return len(load_balancers) > 0 and self.module.botocore_at_least('1.9.0')
Contributor

Same thing here as above. Testing with botocore 1.8.19 fails when trying to use healthcheck grace period and succeeds on 1.8.20
Otherwise, looks good.
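
As a rough illustration of the 1.8.19 versus 1.8.20 boundary, the sketch below compares versions numerically. The helpers `version_tuple` and `at_least` are hypothetical stand-ins, not the helper the module uses, and they only handle plain dotted numeric versions (a suffix such as `rc1` would make `int()` fail).

```python
def version_tuple(version):
    """Turn '1.8.20' into (1, 8, 20) so components compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def at_least(installed, minimum):
    """True when the installed version is the minimum or newer."""
    return version_tuple(installed) >= version_tuple(minimum)


MINIMUM = '1.8.20'
print(at_least('1.8.19', MINIMUM))  # False
print(at_least('1.8.20', MINIMUM))  # True
print(at_least('1.9.0', MINIMUM))   # True
print('1.8.3' >= MINIMUM)           # True: plain string comparison gets this wrong
```

The last line is the reason for comparing tuples rather than raw strings: '1.8.3' sorts after '1.8.20' as text even though it is the older release.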

Contributor Author

Thanks for testing, I will update it now

@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed community_review In order to be merged, this PR must follow the community review workflow. labels Nov 22, 2018
Contributor

@ezmac ezmac left a comment

Looks good to me. Hopefully this helps.

shipit

@stefanhorning
Contributor Author

stefanhorning commented Nov 22, 2018

I hope so! It's probably the best-reviewed handful of lines by now, if you also include the initial PR #39289 😉

Also, once this gets merged I guess we can also close #36035 and #38965

@ansibot ansibot added community_review In order to be merged, this PR must follow the community review workflow. and removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Nov 22, 2018
Contributor

@willthames willthames left a comment

This looks good to me - there are a couple of longer lines in this PR which I'd welcome shortening just to keep consistent (they go off edge of the github PR review which makes them noticeable) but that is not a blocker.

I'll run the test suite before giving it my full approval, but if anyone does that before me, then I'm happy for this to be merged as is.

@willthames
Contributor

TASK [ecs_cluster : create ECS service definition with network configuration] ***
task path: /root/ansible/test/integration/targets/ecs_cluster/playbooks/roles/ecs_cluster/tasks/main.yml:417
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390 `" && echo ansible-tmp-1542968685.9855068-144333392449390="` echo /root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390 `" ) && sleep 0'
Using module file /root/ansible/lib/ansible/modules/cloud/amazon/ecs_service.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-883hl1l4e9r/tmp6nkzmr_9 TO /root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/ /root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/tmp/ansible-test-coverage-r9e6n06y/coverage/injector.py /root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py", line 113, in <module>
    _ansiballz_main()
  File "/root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py", line 105, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py", line 48, in invoke_module
    imp.load_module('__main__', mod, module, MOD_DESC)
  File "/usr/lib/python3.6/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 170, in load_source
    module = _exec(spec, sys.modules[name])
  File "<frozen importlib._bootstrap>", line 618, in _exec
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/tmp/ansible_ecs_service_payload_1qonwwiu/__main__.py", line 621, in <module>
  File "/tmp/ansible_ecs_service_payload_1qonwwiu/__main__.py", line 566, in main
  File "/tmp/ansible_ecs_service_payload_1qonwwiu/__main__.py", line 411, in create_service
  File "/usr/lib/python3.6/site-packages/botocore/client.py", line 320, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python3.6/site-packages/botocore/client.py", line 597, in _make_api_call
    api_params, operation_model, context=request_context)
  File "/usr/lib/python3.6/site-packages/botocore/client.py", line 633, in _convert_to_request_dict
    api_params, operation_model)
  File "/usr/lib/python3.6/site-packages/botocore/validate.py", line 291, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter healthCheckGracePeriodSeconds, value: None, type: <class 'NoneType'>, valid types: <class 'int'>

fatal: [localhost]: FAILED! => {
    "changed": false,
    "module_stderr": "Traceback (most recent call last):\n  File \"/root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py\", line 113, in <module>\n    _ansiballz_main()\n  File \"/root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py\", line 105, in _ansiballz_main\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n  File \"/root/.ansible/tmp/ansible-tmp-1542968685.9855068-144333392449390/AnsiballZ_ecs_service.py\", line 48, in invoke_module\n    imp.load_module('__main__', mod, module, MOD_DESC)\n  File \"/usr/lib/python3.6/imp.py\", line 235, in load_module\n    return load_source(name, filename, file)\n  File \"/usr/lib/python3.6/imp.py\", line 170, in load_source\n    module = _exec(spec, sys.modules[name])\n  File \"<frozen importlib._bootstrap>\", line 618, in _exec\n  File \"<frozen importlib._bootstrap_external>\", line 678, in exec_module\n  File \"<frozen importlib._bootstrap>\", line 219, in _call_with_frames_removed\n  File \"/tmp/ansible_ecs_service_payload_1qonwwiu/__main__.py\", line 621, in <module>\n  File \"/tmp/ansible_ecs_service_payload_1qonwwiu/__main__.py\", line 566, in main\n  File \"/tmp/ansible_ecs_service_payload_1qonwwiu/__main__.py\", line 411, in create_service\n  File \"/usr/lib/python3.6/site-packages/botocore/client.py\", line 320, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n  File \"/usr/lib/python3.6/site-packages/botocore/client.py\", line 597, in _make_api_call\n    api_params, operation_model, context=request_context)\n  File \"/usr/lib/python3.6/site-packages/botocore/client.py\", line 633, in _convert_to_request_dict\n    api_params, operation_model)\n  File \"/usr/lib/python3.6/site-packages/botocore/validate.py\", line 291, in serialize_to_request\n    raise ParamValidationError(report=report.generate_report())\nbotocore.exceptions.ParamValidationError: Parameter validation failed:\nInvalid type for parameter healthCheckGracePeriodSeconds, value: None, type: <class 'NoneType'>, valid types: <class 'int'>\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}

@@ -401,19 +406,24 @@ def create_service(self, service_name, cluster_name, task_definition, load_balan
             params['networkConfiguration'] = network_configuration
         if launch_type:
             params['launchType'] = launch_type
+        if self.health_check_setable(params):
+            params['healthCheckGracePeriodSeconds'] = health_check_grace_period_seconds
Contributor

I suspect this would pass tests if this only set the value if the parameter wasn't None

Contributor Author

I guess this was broken by removing the default value as requested by @ezmac

Will try to add a better check soonish.
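
One way such a check could look, sketched under assumptions and not necessarily the fix that ended up on the branch: the helper `add_grace_period` and its `settable` flag are hypothetical names. The point is to test for None explicitly, since 0 is a legitimate grace period and a plain truthiness test would drop it.

```python
def add_grace_period(params, grace_period, settable):
    """Add healthCheckGracePeriodSeconds only when usable and actually provided."""
    # 'is not None' rather than truthiness, so an explicit 0 is still passed on.
    if settable and grace_period is not None:
        params['healthCheckGracePeriodSeconds'] = grace_period
    return params


omitted = add_grace_period({'serviceName': 'web'}, None, settable=True)
explicit_zero = add_grace_period({'serviceName': 'web'}, 0, settable=True)
not_settable = add_grace_period({'serviceName': 'web'}, 60, settable=False)
```

With the option omitted, the key never reaches the API call, which avoids the NoneType validation error shown in the traceback.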

@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed community_review In order to be merged, this PR must follow the community review workflow. labels Nov 23, 2018
@willthames
Contributor

No worries @stefanhorning - I'm just testing the fix now, I'll push it to your branch if it fixes it

@willthames
Contributor

rebuild_merge

@ansibot ansibot added community_review In order to be merged, this PR must follow the community review workflow. and removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Nov 23, 2018
@willthames willthames merged commit c3b059d into ansible:devel Nov 23, 2018
@willthames
Contributor

Merged for inclusion in 2.8. Thanks @stefanhorning for getting this over the line and @ezmac for the reviews

mjmayer pushed a commit to mjmayer/ansible that referenced this pull request Nov 30, 2018
* Added feature health_check_grace_period_seconds to ecs_service, this time with a botocore version check and some initial testing

* Only set health_check_grace_period_seconds when loadbalancers are defined

* Removed leftover commas and fix in test

* Removed blank line

* Minor improvements for ecs_service module

* Removed default (30) for health_check_grace_period_seconds param

* Changed botocore version allowed to 1.8.20 for health check param.

* Fix empty healthcheck failure
@ansible ansible locked and limited conversation to collaborators Jul 22, 2019
Labels
affects_2.8 This issue/PR affects Ansible v2.8 aws cloud community_review In order to be merged, this PR must follow the community review workflow. feature This issue/PR relates to a feature request. module This issue/PR relates to a module. support:community This issue/PR relates to code supported by the Ansible community.
5 participants