Add option to run EcsRunLauncher without pulling network config from the current ECS task, and kwargs to customize the task config #9678

gibsondan · 2022-09-13T22:09:25Z

Right now the only way to use the EcsRunLauncher involves pulling permissions and other configuration from the task that is launching the run. This creates a problem in situations where you might want system code to have different IAM roles than user code, or even launch runs from outside of ECS. In Cloud, it creates an awkward situation where the grpc server tasks are configured based on fields on the instance, but runs are configured by pulling from the current task.

This PR creates a configuration option that lets you pass in all the configuration you need when launching a run, with nothing pulled from the current container. The old version is kept as well.

gibsondan · 2022-09-21T20:32:31Z

Current dependencies on/for this PR:

master
- PR Add option to run EcsRunLauncher without pulling network config from the current ECS task, and kwargs to customize the task config #9678 👈
  - PR Add launch_type config to EcsRunLauncher #9762

This comment was auto-generated by Graphite.

jmsanders · 2022-09-26T16:16:20Z

...n_modules/libraries/dagster-aws/dagster_aws_tests/ecs_tests/launcher_tests/test_launching.py

+    assert container_definition["image"] == image
+    assert not container_definition.get("entryPoint")
+    assert not container_definition.get("dependsOn")
+    # But other stuff is inherited from the parent task definition


I know this is largely copy-pasted from other tests, but isn't this the exact opposite of what we're trying to support here?

Can we maybe reduce the test assertions to just the behavior that diverges from the default?

jmsanders · 2022-09-26T16:18:42Z

...n_modules/libraries/dagster-aws/dagster_aws_tests/ecs_tests/launcher_tests/test_launching.py

-def test_reuse_task_definition(instance):
-    image = "image"
-    secrets = []
+def test_reuse_task_definition(instance, ecs):


It's a little unclear to me if this change is meant to:

- force registration of a new task definition revision for the same family
- use a task definition without any pre-populated defaults from the parent task

Which probably points to both some poor decisions with how I initially wrote this and also some naming ambiguities that could be cleaned up in this PR?

jmsanders · 2022-09-26T16:32:25Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

-        task_definition = {}
-        with suppress(ClientError):
-            task_definition = self.ecs.describe_task_definition(taskDefinition=family)[
+        if self.use_current_task_definition:


What if we instead let people set an optional "TaskDefinition" dict in config - and we merged that dict with our existing defaults? That might help squash "impossible" states where you set use_current_task_definition to False but also don't provide everything needed to define a new task definition.

I'm also just having difficulty following what even happens differently if you set this to True - what do we still pull from the parent's task definition and not from the launcher config that couldn't be represented as a "default" value?

I'm excited to offer something like a task_definition_kwargs field that lets you specify arbitrary task definition config so that we don't have to re-implement the whole spec as Dagster config, but I don't think it would be sufficient here. The pieces of config that we're offering are a mixture of things on the dagster container (log configuration / sidecars), the task definition (execution_role_arn, task_role_arn), and the task (cluster / security groups / subnets).

Generally I'm trying to follow the playbook/philosophy here (https://elementl.quip.com/9K0bAWhc6t9n/Agent-Configuration-Options) - where we have a relatively simple and blessed way to set the most common pieces of config we think people are likely to want (secrets etc.) and then eventually also an escape hatch where we expose the full raw config for the power users. I think there's value in not needing to send people to the boto spec if they want to do something like 'set env vars' or 'point at an existing cluster' or 'change the IAM role that's used'. But which things are considered 'the most common pieces of config' is very subjective.
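
The "optional dict merged with existing defaults" idea raised above could look roughly like the sketch below. This is only an illustration: `DEFAULT_TASK_DEFINITION` and `merge_task_definition` are hypothetical names, the default values are placeholders rather than what the launcher actually uses, and the merge is shallow (a user-supplied key replaces the default for that key wholesale).

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Hypothetical defaults for illustration only; not the launcher's real values.
DEFAULT_TASK_DEFINITION: Dict[str, Any] = {
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "256",
    "memory": "512",
}


def merge_task_definition(
    defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a new dict: defaults first, then user-supplied keys on top.

    The merge is shallow, so a user-supplied key replaces the default entirely.
    Neither input is mutated.
    """
    merged = deepcopy(defaults)
    merged.update(deepcopy(overrides or {}))
    return merged


user_config = {"cpu": "1024", "taskRoleArn": "arn:aws:iam::123456789012:role/example"}
task_definition = merge_task_definition(DEFAULT_TASK_DEFINITION, user_config)
```

With this shape there is no separate boolean flag, so a "flag is off but required fields are missing" state cannot arise: anything the user omits simply falls back to the default.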

gibsondan · 2022-09-26T18:01:01Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/tasks.py

        for key in expected_keys
-        if key in metadata.task_definition.keys()
+        if key in current_task_definition_dict.keys()


@jmsanders my understanding is that the answer to your question of what else can end up in the task definition if you set use_current_task_definition=True can be found here - it's the set of things that are a) params going into register_task_definition and b) found on the output of describe_task_definition

But then we merge a bunch of stuff onto it based on what's in your launcher config.

I guess my question is how much is even left that we actually inherit vs. override.

There are many random fields here on the taskDefinition arg that would no longer be copied over (placementConstraints etc.): https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.register_task_definition

Whether they are actually in use by anybody, I do not know
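
As a rough illustration of the "inherit, then override" behavior described above: keep only the keys that are both accepted by register_task_definition and present on the describe_task_definition output, then layer launcher config on top. `EXPECTED_KEYS` below is an illustrative subset of the register_task_definition parameters, and `inherit_then_override` is a hypothetical helper, not the code in tasks.py.

```python
from typing import Any, Dict, Iterable

# Illustrative subset of register_task_definition parameters (an assumption,
# not the exact list used by the launcher).
EXPECTED_KEYS = (
    "family",
    "taskRoleArn",
    "executionRoleArn",
    "networkMode",
    "containerDefinitions",
    "placementConstraints",
    "requiresCompatibilities",
    "cpu",
    "memory",
)


def inherit_then_override(
    current_task_definition: Dict[str, Any],
    launcher_config: Dict[str, Any],
    expected_keys: Iterable[str] = EXPECTED_KEYS,
) -> Dict[str, Any]:
    """Copy registrable keys from the current task definition, then apply overrides."""
    inherited = {
        key: current_task_definition[key]
        for key in expected_keys
        if key in current_task_definition
    }
    # Launcher config wins over anything inherited from the parent task.
    return {**inherited, **launcher_config}


described = {
    "family": "dagster",
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/dagster:3",
    "revision": 3,
    "taskRoleArn": "parent-role",
    "placementConstraints": [],
}
result = inherit_then_override(described, {"taskRoleArn": "run-role"})
```

Output-only fields such as `revision` and `taskDefinitionArn` are dropped, while a field like `placementConstraints` is silently carried over unless the launcher config replaces it, which is the kind of implicit inheritance the comment is asking about.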

mostly discussed offline, but did a smallish pass on some naming as well

Ramshackle-Jamathon · 2022-09-28T15:56:44Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/tasks.py

+            ("environment", Optional[List[Dict[str, str]]]),
+            ("execution_role_arn", Optional[str]),
+            ("task_role_arn", Optional[str]),
+            ("sidecars", List[Dict[str, Any]]),


tags should also be included here (and passed along to run_task)

might be better in a separate PR though

Ramshackle-Jamathon · 2022-09-28T16:04:26Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/tasks.py

+            sidecars=sidecars,
+        )
+
+    def task_definition_dict(self):


@jmsanders @gibsondan instead of managing/wrapping task_defs could we create a base task_def in the cloudformation that users get (and that users can also define) and lean more on overrides? It's a common pattern to define task defs in terraform/cloudformation and it kinda feels like we're creeping into IaC with this.

Also the API rate limits for the describe/set task-def APIs aren't that high by default, and not checking/setting this stuff at runtime would help with that.

gibsondan · 2022-09-29T13:10:51Z

Actually maybe I’m on board, let me run down that option a bit more

jmsanders · 2022-09-28T15:11:50Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

+            )
+            check.invariant(
+                not self.include_sidecars,
+                "can only set include_sidecars if use_current_task_definition is True",


Why? Because if you're providing your own task definition, presumably you've already defined the sidecars inside it?

jmsanders · 2022-09-28T15:12:16Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

+            )
+            check.invariant(
+                self.execution_role_arn,
+                "Must set execution_role_arn if not pulling from current task definition",


Why? Because if you're providing your own task definition, presumably you've already defined the execution role arn inside it?

jmsanders · 2022-09-28T15:13:12Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

+                is_required=False,
+                default_value=True,
+                description=(
+                    "Whether to use our current task definition to initialize the task definition "


Can we include an example of why you'd choose not to use the default behavior?

jmsanders · 2022-09-28T15:32:09Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

+                image=image,
+                container_name=self.container_name,
+                command=None,
+                log_configuration={


Are we still painting ourselves into a corner here by introducing another opinionated task definition? What happens when we want to support other logDriver configs?

jmsanders · 2022-09-29T13:44:59Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/tasks.py

+            sidecars=sidecars,
+        )
+
+    def task_definition_dict(self):


I think the original motivations for the way we did things still hold true. Namely that we can't override the image and we want to enable the simple case where a user doesn't need to set up their own infrastructure (and can instead just use docker compose to launch):

https://dagster.phacility.com/D8404
https://dagster.phacility.com/D8486

But the more I've talked with Daniel, the more I think we were overzealous in how much information we copy forward from the original task definition.

gibsondan · 2022-09-29T16:01:37Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

-            "executionRoleArn"
-        ) and task_definition.get("taskRoleArn") == metadata.task_definition.get("taskRoleArn"):
-            task_definitions_match = True
+        return existing_task_definition_config == desired_task_definition_config


note to self - verify that this handles deep equality correctly
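
A quick way to check the deep-equality question: NamedTuple instances compare like tuples, field by field, and nested dicts and lists compare by value. List order still matters, though, so two configs that differ only in the order of environment entries compare as unequal. `TaskDefinitionConfig` and `normalized` below are hypothetical stand-ins for illustration, not the class in tasks.py.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class TaskDefinitionConfig(NamedTuple):
    """Hypothetical, trimmed-down stand-in for the config tuple in the diff."""

    family: str
    image: str
    environment: Optional[List[Dict[str, str]]]
    sidecars: List[Dict[str, Any]]


def normalized(config: TaskDefinitionConfig) -> TaskDefinitionConfig:
    """Sort environment entries by name so ordering does not affect equality."""
    if config.environment is None:
        return config
    return config._replace(
        environment=sorted(config.environment, key=lambda entry: entry["name"])
    )


existing = TaskDefinitionConfig(
    "run", "image", [{"name": "A", "value": "1"}, {"name": "B", "value": "2"}], []
)
# Equal values held in distinct objects.
desired = TaskDefinitionConfig(
    "run", "image", [{"name": "A", "value": "1"}, {"name": "B", "value": "2"}], []
)
# Same entries, different order.
reordered = TaskDefinitionConfig(
    "run", "image", [{"name": "B", "value": "2"}, {"name": "A", "value": "1"}], []
)

same_values_match = existing == desired
reordered_matches = existing == reordered
reordered_matches_after_sort = normalized(existing) == normalized(reordered)
```

If ECS can return list fields in a different order than they were registered, comparing normalized copies avoids registering a new revision unnecessarily; whether that reordering actually happens is not established in this thread.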

gibsondan · 2022-09-29T16:16:37Z

@jmsanders and @Ramshackle-Jamathon latest rev responds to your feedback about task definitions by changing the plan - instead of having a path where the run launcher constructs a new task def from scratch, the grpc server would specify the task definition arn to use via container_context (likely using its own task definition). That avoids a situation where we have two totally separate task defs to use for user code.

Still need to do a pass on the tests to reflect that new version, but curious for overall thoughts on that plan

jmsanders

Love this new direction.

I think we need to sanitize run_task_kwargs a bit though, don't we?

jmsanders · 2022-09-29T17:15:31Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

+                description="Additional arguments to include while running the task. See "
+                "https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.run_task "
+                "for the available parameters. The overrides and taskDefinition arguments will always "
+                "be set by the tun launcher. If this field is not set, the arguments to run_task "


Suggested change

-                "be set by the tun launcher. If this field is not set, the arguments to run_task "
+                "be set by the run launcher. If this field is not set, the arguments to run_task "

Also, let's make sure to add a test for this statement:

The overrides and taskDefinition arguments will always be set by the tun launcher.

jmsanders · 2022-09-29T17:21:44Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

+
+            task_definition = family
+
+        if self.run_task_kwargs != None:


👨‍🍳 💋

Much nicer if/else logic.

jmsanders · 2022-09-29T17:28:09Z

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py

            launchType="FARGATE",
+            overrides=overrides,
+            **task_config.run_task_kwargs,


Don't we need to munge task_config.run_task_kwargs to not include taskDefinition, launchType, and overrides? Otherwise, we can end up in a situation where the same kwarg is passed to the function twice.

>>> def foo(**kwargs):
...     pass
...
>>> kwargs = {"bar": 1}
>>> foo(**kwargs)
>>> foo(bar=2, **kwargs)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() got multiple values for keyword argument 'bar'

addressed
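
One way to avoid the duplicate-keyword problem is to build a single argument dict and control precedence explicitly, rather than mixing literal keywords with `**kwargs`. The sketch below is an assumption about how this could be done, not the fix that landed: `build_run_task_args` and `RESERVED_RUN_TASK_KEYS` are hypothetical names, and treating `launchType` as an overridable default is also an assumption.

```python
from typing import Any, Dict, Optional

# Keys the launcher always sets itself, per the config description in the diff.
RESERVED_RUN_TASK_KEYS = ("taskDefinition", "overrides")


def build_run_task_args(
    task_definition: str,
    overrides: Dict[str, Any],
    run_task_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge user kwargs with launcher-owned arguments into one dict."""
    user_kwargs = {
        key: value
        for key, value in (run_task_kwargs or {}).items()
        if key not in RESERVED_RUN_TASK_KEYS
    }
    return {
        "launchType": "FARGATE",  # default; user kwargs may replace it (assumption)
        **user_kwargs,
        "taskDefinition": task_definition,  # always set by the launcher
        "overrides": overrides,  # always set by the launcher
    }


args = build_run_task_args(
    "my-family",
    {"containerOverrides": []},
    {"taskDefinition": "ignored", "cluster": "my-cluster", "launchType": "EC2"},
)
# The launcher would then call something like: ecs.run_task(**args)
```

Because every key appears exactly once in the final dict, the call can never raise "got multiple values for keyword argument", and the same helper is easy to cover with a test for the "always set by the run launcher" statement.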

gibsondan · 2022-09-29T22:24:37Z

Well this has been a journey! @jmsanders i think this is ready for perusal again

gibsondan · 2022-10-11T09:04:41Z

Joe, can the image be overridden? If not, I think that's the big blocker here, although maybe the gRPC servers could expose a task definition to use rather than an image to use. I'm not sure I want to block this change on that, though, since it's fairly substantial and there are some real wins here (we're not doing any more task definition registration after this PR than we were before).

gibsondan force-pushed the ecs3september branch 7 times, most recently from 3bbaa62 to 96e869a Compare September 20, 2022 02:33

gibsondan changed the title ~~WIP Add ability to run EcsRunLauncher without using the current task definitino [september 2022]~~ Add option to run EcsRunLauncher without pulling config from the current task definition Sep 20, 2022

gibsondan marked this pull request as ready for review September 20, 2022 02:52

gibsondan requested review from jmsanders and Ramshackle-Jamathon September 20, 2022 02:52

gibsondan force-pushed the ecs3september branch 4 times, most recently from 7b1af28 to ff1ca4b Compare September 21, 2022 20:32

gibsondan mentioned this pull request Sep 21, 2022

Add launch_type config to EcsRunLauncher #9762

Closed

jmsanders previously requested changes Sep 26, 2022

gibsondan commented Sep 26, 2022

gibsondan force-pushed the ecs3september branch from ff1ca4b to 5b47142 Compare September 27, 2022 18:57

gibsondan requested a review from jmsanders September 27, 2022 18:58

gibsondan changed the title ~~Add option to run EcsRunLauncher without pulling config from the current task definition~~ Add option to run EcsRunLauncher without pulling config from the current ECS task Sep 28, 2022

gibsondan force-pushed the ecs3september branch from 5b47142 to ed6b235 Compare September 28, 2022 15:42

Ramshackle-Jamathon reviewed Sep 29, 2022

jmsanders reviewed Sep 29, 2022

gibsondan force-pushed the ecs3september branch from ed6b235 to 1d7ab33 Compare September 29, 2022 15:58

gibsondan commented Sep 29, 2022

gibsondan force-pushed the ecs3september branch from 1d7ab33 to 62821bd Compare September 29, 2022 16:04

jmsanders previously requested changes Sep 29, 2022

gibsondan force-pushed the ecs3september branch 2 times, most recently from 77a64cf to 7cc3605 Compare September 29, 2022 22:14

gibsondan requested review from Ramshackle-Jamathon and jmsanders September 29, 2022 22:15

gibsondan force-pushed the ecs3september branch from 7cc3605 to 8970632 Compare September 29, 2022 22:23

gibsondan changed the title ~~Add option to run EcsRunLauncher without pulling config from the current ECS task~~ Add option to run EcsRunLauncher without pulling network config from the current ECS task, and kwargs to customize the task config Sep 29, 2022

gibsondan force-pushed the ecs3september branch 2 times, most recently from 8a90370 to 3b1dec8 Compare September 29, 2022 22:32

jmsanders approved these changes Sep 30, 2022

WIP Add ability to run EcsRunLauncher without using the current task …

725b384

…definitino [september 2022]

gibsondan force-pushed the ecs3september branch from 3b1dec8 to 725b384 Compare October 3, 2022 15:21

gibsondan merged commit 9a2d194 into master Oct 3, 2022

gibsondan deleted the ecs3september branch October 3, 2022 18:18
