New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to run EcsRunLauncher without pulling network config from the current ECS task, and kwargs to customize the task config #9678
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ |
3bbaa62
to
96e869a
Compare
7b1af28
to
ff1ca4b
Compare
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
assert container_definition["image"] == image | ||
assert not container_definition.get("entryPoint") | ||
assert not container_definition.get("dependsOn") | ||
# But other stuff is inherited from the parent task definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is largely copy-pasted from other tests, but isn't this the exact opposite of what we're trying to support here?
Can we maybe reduce the test assertions to just the behavior that diverges from the default?
def test_reuse_task_definition(instance): | ||
image = "image" | ||
secrets = [] | ||
def test_reuse_task_definition(instance, ecs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little unclear to me if this change is meant to:
- force registration of a new task definition revision for the same family
- use a task definition without any pre-populated defaults from the parent task
Which probably points to both some poor decisions with how I initially wrote this and also some naming ambiguities that could be cleaned up in this PR?
task_definition = {} | ||
with suppress(ClientError): | ||
task_definition = self.ecs.describe_task_definition(taskDefinition=family)[ | ||
if self.use_current_task_definition: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we instead let people set an optional "TaskDefinition" dict in config - and we merged that dict with our existing defaults? That might help squash "impossible" states where you set use_current_task_definition
to False
but also don't provide everything needed to define a new task definition.
I'm also just having difficulty following what even happens differently if you set this to True
- what do we still pull from the parent's task definition and not from the launcher config that couldn't be represented as a "default" value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm excited to offer something like a task_definition_kwargs field or something that lets you specify arbitrary task definition config so that we don't have to re-implement the whole spec as Dagster config, but I don't think it would be sufficient here. The pieces config that we're offering are a mixture of things on the dagster container (log configuration / sidecars), the task definition (execution_role_arn, task_role_arn), and the task (cluster / security groups / subnets).
Generally I'm trying to follow the playbook/philosphy here (https://elementl.quip.com/9K0bAWhc6t9n/Agent-Configuration-Options) - where we have a relatively simple and blessed way to set the most common pieces of config we think people are likely to want(secrets etc.) and then eventually also an escape hatch where we expose the full raw config for the power users. I think there's value in not needing to send people to the boto spec if they want to do something like 'set env vars' or 'point at an existing cluster' or 'change the IAM role that's used'. But which things are considered 'the most common pieces of config' is very subjective.
for key in expected_keys | ||
if key in metadata.task_definition.keys() | ||
if key in current_task_definition_dict.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmsanders my understanding is that the answer to your question of what else can end up in the task definition if you set use_current_task_definition=True can be found here - it's the set of things that are a) params going into register_task_definition and b) found on the output of describe_task_definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then we merge a bunch of stuff onto it based on what's in your launcher config.
I guess my question is how much is even left that we actually inherit vs. override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many random fields here on the taskDefinition arg to that would no longer be copied over (placementConstraints etc.): https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.register_task_definition
Whether they are actually in use by anybody, I do not know
ff1ca4b
to
5b47142
Compare
mostly discussed offline, but did a smallish pass on some naming as well
5b47142
to
ed6b235
Compare
("environment", Optional[List[Dict[str, str]]]), | ||
("execution_role_arn", Optional[str]), | ||
("task_role_arn", Optional[str]), | ||
("sidecars", List[Dict[str, Any]]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tags
should also be included here (and passed along to run_task)
might be better in a seperate pr though
sidecars=sidecars, | ||
) | ||
|
||
def task_definition_dict(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmsanders @gibsondan instead of managing/wrapping task_defs could we create a base task_def in the cloudformation that users get (and that users can also define) and lean more on overrides? Its a common pattern to define task defs in terraform/cloudformation and it kinda feels like we're creeping into IaC with this.
Also the api ratelimits for the describe/set task-def api's aren't that high by default and not checking/setting this stuff at runtime would help that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original motivations for the way we did things still hold true. Namely that we can't override the image and we want to enable the simple case where a user doesn't need to set up their own infrastructure (and can instead just use docker compose to launch):
https://dagster.phacility.com/D8404
https://dagster.phacility.com/D8486
But the more I've talked with Daniel, the more I think we were overzealous in how much information we copy forward from the original task definition.
Actually maybe I’m on board, let me run down that option a bit more
…On Thu, Sep 29, 2022 at 7:36 AM Daniel Gibson ***@***.***> wrote:
Joe can the image be overridden? If not I think that’s the big blocker
here - although maybe the gRPC servers could expose a task definition to
use rather than an image to use
I’m not sure I want to block this change on that though since it’s fairly
substantial and there are some real wins here (we’re not doing any more
task definition registration after this PR than we were before)
On Thu, Sep 29, 2022 at 7:31 AM Joe Van Drunen ***@***.***>
wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In python_modules/libraries/dagster-aws/dagster_aws/ecs/tasks.py
> <#9678 (comment)>:
>
> > - container_definition: typing.Dict[str, typing.Any]
> - assign_public_ip: bool
> +class DagsterEcsTaskDefinitionConfig(
> + NamedTuple(
> + "_DagsterEcsTaskDefinitionConfig",
> + [
> + ("family", str),
> + ("image", str),
> + ("container_name", str),
> + ("command", Optional[str]),
> + ("log_configuration", Optional[Dict[str, Any]]),
> + ("secrets", Optional[List[Dict[str, str]]]),
> + ("environment", Optional[List[Dict[str, str]]]),
> + ("execution_role_arn", Optional[str]),
> + ("task_role_arn", Optional[str]),
> + ("sidecars", List[Dict[str, Any]]),
>
> tags should also be included here (and passed along to run_task)
>
> might be better in a seperate pr though
> ------------------------------
>
> In python_modules/libraries/dagster-aws/dagster_aws/ecs/tasks.py
> <#9678 (comment)>:
>
> > + ]
> +
> + return DagsterEcsTaskDefinitionConfig(
> + family=task_definition_dict["family"],
> + image=container_definition["image"],
> + container_name=container_name,
> + command=container_definition.get("command"),
> + log_configuration=task_definition_dict.get("logConfiguration"),
> + secrets=container_definition.get("secrets"),
> + environment=container_definition.get("environment"),
> + execution_role_arn=container_definition.get("executionRoleArn"),
> + task_role_arn=container_definition.get("taskRoleArn"),
> + sidecars=sidecars,
> + )
> +
> + def task_definition_dict(self):
>
> @jmsanders <https://github.com/jmsanders> @gibsondan
> <https://github.com/gibsondan> instead of managing/wrapping task_defs
> could we create a base task_def in the cloudformation that users get (and
> that users can also define) and lean more on overrides? Its a common
> pattern to define task defs in terraform/cloudformation and it kinda feels
> like we're creeping into IaC with this.
>
> Also the api ratelimits for the describe/set task-def api's aren't that
> high by default and not checking/setting this stuff at runtime would help
> that.
>
> —
> Reply to this email directly, view it on GitHub
> <#9678 (review)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ACAPJCYJMMSHMFSBIYWZ6DTWAWD23ANCNFSM6AAAAAAQL3I2YU>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
|
) | ||
check.invariant( | ||
not self.include_sidecars, | ||
"can only set include_sidecars if use_current_task_definition is True", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Because if you're providing your own task definition, presumably you've already defined the sidecars inside it?
) | ||
check.invariant( | ||
self.execution_role_arn, | ||
"Must set execution_role_arn if not pulling from current task definition", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Because if you're providing your own task definition, presumably you've already defined the execution role arn inside it?
is_required=False, | ||
default_value=True, | ||
description=( | ||
"Whether to use our current task definition to initialize the task definition " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include an example of why you'd chose not to use the default behavior?
image=image, | ||
container_name=self.container_name, | ||
command=None, | ||
log_configuration={ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we still painting ourselves into a corner here by introducing another opinionated task definition? What happens when we want to support other logDriver configs?
sidecars=sidecars, | ||
) | ||
|
||
def task_definition_dict(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original motivations for the way we did things still hold true. Namely that we can't override the image and we want to enable the simple case where a user doesn't need to set up their own infrastructure (and can instead just use docker compose to launch):
https://dagster.phacility.com/D8404
https://dagster.phacility.com/D8486
But the more I've talked with Daniel, the more I think we were overzealous in how much information we copy forward from the original task definition.
ed6b235
to
1d7ab33
Compare
"executionRoleArn" | ||
) and task_definition.get("taskRoleArn") == metadata.task_definition.get("taskRoleArn"): | ||
task_definitions_match = True | ||
return existing_task_definition_config == desired_task_definition_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self - verify that this handles deep equality correctly
1d7ab33
to
62821bd
Compare
@jmsanders and @Ramshackle-Jamathon latest rev responds to your feedback about task definitions by changing the plan - instead of having a path where the run launcher constructs a new task def from scratch, the grpc server would specify the task definition arn to use via container_context (likely using its own task definition). That avoids a situation where we have two totally separate task defs to use for user code. Still need to do a pass on the tests to reflect that new version, but curious for overall thoughts on that plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this new direction.
I think we need to sanitize run_task_kwargs
a bit though, don't we?
description="Additional arguments to include while running the task. See " | ||
"https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.run_task " | ||
"for the available parameters. The overrides and taskDefinition arguments will always " | ||
"be set by the tun launcher. If this field is not set, the arguments to run_task " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"be set by the tun launcher. If this field is not set, the arguments to run_task " | |
"be set by the run launcher. If this field is not set, the arguments to run_task " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let's make sure to add a test for this statement:
The overrides and taskDefinition arguments will always be set by the tun launcher.
|
||
task_definition = family | ||
|
||
if self.run_task_kwargs != None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👨🍳 💋
Must nicer if/else logic.
launchType="FARGATE", | ||
overrides=overrides, | ||
**task_config.run_task_kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to munge task_config.run_task_kwargs
to not include taskDefinition
, launchType
, and overrides
? Otherwise, we can end up in a situation where the same kwarg is passed to the function twice.
>>> def foo(**kwargs):
... pass
...
>>> kwargs = {"bar": 1}
>>> foo(**kwargs)
>>> foo(bar=2, **kwargs)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: foo() got multiple values for keyword argument 'bar'
77a64cf
to
7cc3605
Compare
7cc3605
to
8970632
Compare
Well this has been a journey! @jmsanders i think this is ready for perusal again |
8a90370
to
3b1dec8
Compare
…definitino [september 2022]
3b1dec8
to
725b384
Compare
Joe can the image be overridden? If not I think that’s the big blocker here
- although maybe the gRPC servers could expose a task definition to use
rather than an image to use
I’m not sure I want to block this change on that though since it’s fairly
substantial and there are some real wins here (we’re not doing any more
task definition registration after this PR than we were before)
…On Thu, Sep 29, 2022 at 7:31 AM Joe Van Drunen ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In python_modules/libraries/dagster-aws/dagster_aws/ecs/tasks.py
<#9678 (comment)>:
> - container_definition: typing.Dict[str, typing.Any]
- assign_public_ip: bool
+class DagsterEcsTaskDefinitionConfig(
+ NamedTuple(
+ "_DagsterEcsTaskDefinitionConfig",
+ [
+ ("family", str),
+ ("image", str),
+ ("container_name", str),
+ ("command", Optional[str]),
+ ("log_configuration", Optional[Dict[str, Any]]),
+ ("secrets", Optional[List[Dict[str, str]]]),
+ ("environment", Optional[List[Dict[str, str]]]),
+ ("execution_role_arn", Optional[str]),
+ ("task_role_arn", Optional[str]),
+ ("sidecars", List[Dict[str, Any]]),
tags should also be included here (and passed along to run_task)
might be better in a seperate pr though
------------------------------
In python_modules/libraries/dagster-aws/dagster_aws/ecs/tasks.py
<#9678 (comment)>:
> + ]
+
+ return DagsterEcsTaskDefinitionConfig(
+ family=task_definition_dict["family"],
+ image=container_definition["image"],
+ container_name=container_name,
+ command=container_definition.get("command"),
+ log_configuration=task_definition_dict.get("logConfiguration"),
+ secrets=container_definition.get("secrets"),
+ environment=container_definition.get("environment"),
+ execution_role_arn=container_definition.get("executionRoleArn"),
+ task_role_arn=container_definition.get("taskRoleArn"),
+ sidecars=sidecars,
+ )
+
+ def task_definition_dict(self):
@jmsanders <https://github.com/jmsanders> @gibsondan
<https://github.com/gibsondan> instead of managing/wrapping task_defs
could we create a base task_def in the cloudformation that users get (and
that users can also define) and lean more on overrides? Its a common
pattern to define task defs in terraform/cloudformation and it kinda feels
like we're creeping into IaC with this.
Also the api ratelimits for the describe/set task-def api's aren't that
high by default and not checking/setting this stuff at runtime would help
that.
—
Reply to this email directly, view it on GitHub
<#9678 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACAPJCYJMMSHMFSBIYWZ6DTWAWD23ANCNFSM6AAAAAAQL3I2YU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Right now the only way to use the EcsRunLauncher involves pulling permissions and other configuration from the task that is launching the run. This creates a problem in situations where you might want system code to have different IAM roles than user code, or even launch runs from outside of ECS. In Cloud, it creates an awkward situation where the grpc server tasks are configured based on fields on the instance, but runs are configured by pulling from the current task.
This PR creates a configuration option that lets you pass in all the configuration you need when launching run, with nothing pulled from the current container. The old version is kept as well.