Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: ecs service hotswap deployment does not wait for deployment to complete #27882

Closed
tomwwright opened this issue Nov 7, 2023 · 4 comments · Fixed by #28448
Closed

cli: ecs service hotswap deployment does not wait for deployment to complete #27882

tomwwright opened this issue Nov 7, 2023 · 4 comments · Fixed by #28448
Labels
bug This issue is a bug. effort/medium Medium work item – several days of effort p1 package/tools Related to AWS CDK Tools or CLI

Comments

@tomwwright
Copy link
Contributor

tomwwright commented Nov 7, 2023

Describe the bug

ECS hotswap deployments report success immediately, instead of waiting for the deployment to succeed (or fail).

Example showing complete deploy time of 16s:

> pnpm exec cdk deploy --hotswap --exclusively my-cool-stack

✨  Synthesis time: 4.43s

⚠️ The --hotswap and --hotswap-fallback flags deliberately introduce CloudFormation drift to speed up deployments
⚠️ They should only be used for development - never use them for your production Stacks!

my-cool-stack:  start: Building 763ed553d17755524a692452e0dbdc4aac573b775b6003699d978e3a3c5d9297:current_account-current_region
my-cool-stack:  success: Built 763ed553d17755524a692452e0dbdc4aac573b775b6003699d978e3a3c5d9297:current_account-current_region
my-cool-stack:  start: Publishing 763ed553d17755524a692452e0dbdc4aac573b775b6003699d978e3a3c5d9297:current_account-current_region
my-cool-stack:  success: Published 763ed553d17755524a692452e0dbdc4aac573b775b6003699d978e3a3c5d9297:current_account-current_region
my-cool-stack: deploying... [1/1]

✨ hotswapping resources:
   ✨ ECS Task Definition 'my-cool-stack-api'
   ✨ ECS Service 'my-cool-stack-backendServiceC9D5DD77-jJXtgE5oL9az'
   ✨ ECS Task Definition 'my-cool-stack-frontend'
   ✨ ECS Service 'my-cool-stack-frontendService12C63704-yOwzQjJgpvjX'
✨ ECS Task Definition 'my-cool-stack-frontend' hotswapped!
✨ ECS Service 'my-cool-stack-frontendService12C63704-yOwzQjJgpvjX' hotswapped!
✨ ECS Task Definition 'my-cool-stack-api' hotswapped!
✨ ECS Service 'my-cool-stack-backendServiceC9D5DD77-jJXtgE5oL9az' hotswapped!

 ✅  my-cool-stack

✨  Deployment time: 12.54s

Stack ARN:
xxx

✨  Total time: 16.96s

I note this behaviour looks to have been the same since the hotswap was initially implemented and so any users of this feature might expect that it is behaving as expected

Expected Behavior

I expect that the CDK hotswap deployment monitors the state of the triggered deployment via the DescribeServices API to ensure it completes successfully before continuing

Current Behavior

Currently the CDK pushes the ECS hotswap deployment and then immediately reports it as a success and continues.

The CDK does set up a custom waiter to await the successful deployment but the success acceptor is configured as the expression:

length(services[].deployments[? status == 'PRIMARY' && runningCount < desiredCount][]) == `0`

This doesn't wait correctly as the new PRIMARY deployment is first created with an intermediate state of runningCount: 0 and desiredCount: 0. It is then populated correctly with a desired and pending count as the scheduler gets to work. But in that initial zero state runningCount < desiredCount is false and therefore the waiter matches on it for success and continues.

Reproduction Steps

Perform any ECS hotswap deployment

Possible Solution

The following waiter acceptor expression should more accurately interrogate the DescribeServices state. I can raise a PR if we agree this is an issue that needs to be fixed.

length(services[].deployments[? status == 'PRIMARY' && rolloutState == 'COMPLETED'][]) == `1`

Additional Information/Context

Running this command I observed the following deployment state changes:

watch -n 1 aws ecs describe-services --cluster $cluster --services $service --query 'services[].deployments'

New deployment created in "zero" state

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 0,
            "pendingCount": 0,
            "runningCount": 0,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        },
        {
            "status": "ACTIVE",
            ....
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ....
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/5831249761506821993 completed."
        }
    ]
]

Deployment gets correct counts

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 1,
            "runningCount": 0,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        },
        {
            "status": "ACTIVE",
            ....
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ....
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/5831249761506821993 completed."
        }
    ]
]

Deployment launches new task successfully, previous deployment scaled down

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        },
        {
            "status": "ACTIVE",
            ....
            "desiredCount": 0,
            "pendingCount": 0,
            "runningCount": 1,
            ....
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/5831249761506821993 completed."
        }
    ]
]

Previous deployment scaled down, moves into DRAINING state

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        },
        {
            "status": "DRAINING",
            ....
            "desiredCount": 0,
            "pendingCount": 0,
            "runningCount": 0,
            ....
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/5831249761506821993 completed."
        }
    ]
]

Previous deployment removed

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ...
            "rolloutState": "IN_PROGRESS",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 in progress."
        }
    ]
]

New deployment completed

[
    [
        {
            "status": "PRIMARY",
            ...
            "desiredCount": 1,
            "pendingCount": 0,
            "runningCount": 1,
            ...
            "rolloutState": "COMPLETED",
            "rolloutStateReason": "ECS deployment ecs-svc/9717487399336357090 completed."
        }
    ]
]

CDK CLI Version

2.103.0 (build d0d7547)

Framework Version

No response

Node.js Version

18.16.0

OS

MacOS

Language

TypeScript

Language Version

4.9.5

Other information

No response

@tomwwright tomwwright added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 7, 2023
@github-actions github-actions bot added the package/tools Related to AWS CDK Tools or CLI label Nov 7, 2023
@tomwwright tomwwright changed the title (cli): ecs service hotswap deployment does not wait for deployment to complete cli: ecs service hotswap deployment does not wait for deployment to complete Nov 10, 2023
@pahud pahud added p1 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Nov 10, 2023
@pahud
Copy link
Contributor

pahud commented Nov 10, 2023

Thanks for the report. Yes we definitely should fix that. We'll be happy to review your PR when it's ready.

@tomwwright
Copy link
Contributor Author

Thanks @pahud please find my PR now linked to this issue :)

@tomwwright
Copy link
Contributor Author

@pahud Can you please re-open my linked PR? It has been closed for staleness despite me refreshing the PR recently

mergify bot pushed a commit that referenced this issue Dec 26, 2023
Because of PRs that require cli integ tests run, some PRs have the automated review failing while awaiting review (and integ tests run). We shouldn't auto-close these as stale ever, since the ball is in our court. 

See #27882, where we were bad.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
paulhcsun pushed a commit to paulhcsun/aws-cdk that referenced this issue Jan 5, 2024
…8481)

Because of PRs that require cli integ tests run, some PRs have the automated review failing while awaiting review (and integ tests run). We shouldn't auto-close these as stale ever, since the ball is in our court. 

See aws#27882, where we were bad.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@mergify mergify bot closed this as completed in #28448 Mar 26, 2024
mergify bot pushed a commit that referenced this issue Mar 26, 2024
…re (#28448)

Reraising #27943 as it was incorrectly closed as stale. See original PR for details.

Closes #27882. See linked issue for further details.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. effort/medium Medium work item – several days of effort p1 package/tools Related to AWS CDK Tools or CLI
Projects
None yet
2 participants