Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use waiters in ECS Operators instead of inner sensors #29761

Merged
merged 4 commits into from
Mar 1, 2023

Conversation

Taragolis
Copy link
Contributor

closes: #29556

Use custom botocore.Waiters instead of inner sensors in ECS Operators

Comment on lines 239 to 241
if task_definition_state == EcsTaskDefinitionStates.INACTIVE:
# In some circumstances ECS Task Definition deleted immediately,
# and there is no reason wait for completion.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest this happen almost always 🤣

@@ -55,7 +55,7 @@ def should_retry_eni(exception: Exception):
return False


class EcsClusterStates(Enum):
class EcsClusterStates(str, Enum):
Copy link
Contributor Author

@Taragolis Taragolis Feb 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for allow comparison EcsClusterStates.ACTIVE == "ACTIVE" unfortunetly class StrEnum available only in python 3.11

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was hoping to use StrEnum there but... not yet. :( I didn't know you could do it this way, nice.

Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of suggestions in the waiter config file specifically. Non-blocking.

@@ -55,7 +55,7 @@ def should_retry_eni(exception: Exception):
return False


class EcsClusterStates(Enum):
class EcsClusterStates(str, Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was hoping to use StrEnum there but... not yet. :( I didn't know you could do it this way, nice.

airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
@@ -0,0 +1,81 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love seeing this put to use. I think it's so much cleaner in the implementation. 👍

Comment on lines +28 to +31
"expected": "MISSING",
"matcher": "pathAny",
"state": "failure",
"argument": "failures[].reason"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting catch. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, I've looked into builtin botocore waiters for ECS: https://github.com/boto/botocore/blob/develop/botocore/data/ecs/2014-11-13/waiters-2.json

And after that I checked which state available for different API calls https://docs.aws.amazon.com/AmazonECS/latest/developerguide/api_failures_messages.html

}
]
},
"task_definition_active": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No failure states for this one? I think if the status goes to 'DELETE_IN_PROGRESS' it can fail early. Not sure where 'INACTIVE' is on the state flow on this call though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah 'DELETE_IN_PROGRESS' it could be some kind failure step, I can't imagine how it possible that state changed to this state during create task definition, but why not to add it as failure.

Copy link
Contributor

@ferruzzi ferruzzi Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much my thought as well. I don't how it would get there, but if it did then something is clearly wrong so there's no point waiting for the timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

}
]
},
"task_definition_inactive": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would 'DELETE_IN_PROGRESS' be a success state here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather say it intermediate state in case of delete task definition.

@@ -39,7 +39,7 @@
EcsTaskDefinitionStates,
should_retry_eni,
)
from airflow.providers.amazon.aws.sensors.ecs import EcsClusterStateSensor, EcsTaskDefinitionStateSensor
from airflow.utils.helpers import prune_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I have a bunch of EMR waiters I have been workigb on converting over but put on pause to verify how they behave if None is passed in, but this would solve that nncely. I'll get the EMR ones up this week and tag you in them. 👍

Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it. Thanks!

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition!

Just some wording suggestions and a question or two.

airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ferruzzi ferruzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like everything got fixed. Approved 👍

@o-nikolas o-nikolas merged commit 181a825 into apache:main Mar 1, 2023
@ferruzzi
Copy link
Contributor

ferruzzi commented Mar 1, 2023

ECS System test is failing after this merge, the await_cluster task (which calls EcsClusterStateSensor) is timing out after logging INFO airflow.task.operators:ecs.py:98 Cluster state: EcsClusterStates.ACTIVE, waiting for: EcsClusterStates.ACTIVE. I'll try to take a look at it tomorrow

@Taragolis Taragolis deleted the ecs-operators-waiters branch March 2, 2023 08:26
@Taragolis
Copy link
Contributor Author

ECS System test is failing after this merge, the await_cluster task (which calls EcsClusterStateSensor) is timing out after logging INFO airflow.task.operators:ecs.py:98 Cluster state: EcsClusterStates.ACTIVE, waiting for: EcsClusterStates.ACTIVE. I'll try to take a look at it tomorrow

And this a bit strange, because after this logger record, as I could see there is no external calls (to AWS) and only simple validation which should finished in a milliseconds

@Taragolis
Copy link
Contributor Author

Yep locally it also checked forever ¯\_(ツ)_/¯

I guess it is combination of airflow serializer, templated fields and multi inheritance (which I add for enums)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Different AWS ECS Operators use inner Sensor and do not propagate connection arguments
4 participants