
Adding 'disk' parameter to @resources decorator #1500

Merged

Conversation

jaypond (Contributor) commented Aug 11, 2023

We would like to use the @resources decorator instead of @kubernetes to specify resource requirements for our jobs, so we can run flows locally or on Kubernetes by using --with kubernetes instead of modifying the code. The @resources decorator currently only supports cpu, gpu, memory, and shared_memory.

This PR adds the disk parameter. The default value (10240 MB) is the same as in @kubernetes.
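
For illustration, a minimal sketch of what this enables (the flow name, file name, and values below are hypothetical, not taken from the PR):

from metaflow import FlowSpec, step, resources


class DiskFlow(FlowSpec):
    # disk is given in MB, matching the @kubernetes(disk=...) convention
    @resources(cpu=2, memory=8192, disk=20480)
    @step
    def start(self):
        print("Crunching with extra disk")
        self.next(self.end)

    @step
    def end(self):
        print("Done")


if __name__ == "__main__":
    DiskFlow()

The same file then runs locally with python disk_flow.py run, or on Kubernetes (picking up the disk request) with python disk_flow.py run --with kubernetes.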

@saikonen added the 'in review' label Aug 21, 2023
metaflow/plugins/resources_decorator.py
"cpu": "1",
"gpu": "0",
"memory": "4096",
"shared_memory": None,
saikonen (Collaborator) commented Aug 21, 2023

#1504 opened for this; shared_memory is not yet supported for Kubernetes.

Collaborator:

See https://outerbounds-community.slack.com/archives/C020U025QJK/p1691751137791999 for earlier discussion with context on why disk has been missing.

saikonen (Collaborator):

Added some context and an addition to the docstring. I'll give this a more thorough spin tomorrow morning :)
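
For reference, a hedged sketch of what the docstring addition for disk might look like at this point (the wording is illustrative, not the exact text added in the PR):

    disk : int, default: 10240
        Disk size (in MB) required for this step. Only applies on Kubernetes.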

saikonen previously approved these changes Aug 29, 2023
@saikonen added the 'testable' label and removed the 'in review' label Aug 29, 2023
nflx-mf-bot (Collaborator):

Testing[479] @ c5a7817

nflx-mf-bot (Collaborator):

Testing[479] @ c5a7817 had 1 FAILURE.

saikonen (Collaborator):

As one last discussion point, we need to consider the impact of this change: it is a breaking one in the following scenario, due to how attribute overriding works between the resources and kubernetes decorators. Consider someone who has an existing flow like this:

from metaflow import step, FlowSpec, resources, kubernetes


class HelloFlow(FlowSpec):
    @resources
    @kubernetes(disk=4096)
    @step
    def start(self):
        print("Starting 👋")
        self.next(self.end)

    @step
    def end(self):
        print("Done! 🏁")


if __name__ == "__main__":
    HelloFlow()

Prior to this change they would allocate only 4096 MB of disk; after this PR, the 10240 default from resources overrides that value because the larger of the two wins.
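
To make the breakage concrete, a hypothetical sketch of the "larger value wins" merge described above (merged_disk is an illustrative helper, not Metaflow's actual code):

def merged_disk(resources_disk, kubernetes_disk):
    # Take the larger of the two requested disk sizes (in MB);
    # None means that decorator did not request disk at all.
    values = [int(v) for v in (resources_disk, kubernetes_disk) if v is not None]
    return max(values) if values else None

print(merged_disk(None, 4096))    # before this PR: only @kubernetes sets disk -> 4096
print(merged_disk(10240, 4096))   # with a 10240 default on @resources -> 10240 silently wins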

saikonen (Collaborator) commented Aug 30, 2023

An easy solution for the overriding behavior would be to instead have

"disk": None,

as the default for the resources decorator. If this is acceptable, also update the docstring to mark the attribute as optional.
Was there a specific reason to use the kubernetes default for the resources decorator as well?
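
Under this proposal, the defaults for the resources decorator would extend the snippet quoted earlier roughly as follows (ordering illustrative):

defaults = {
    "cpu": "1",
    "gpu": "0",
    "disk": None,
    "memory": "4096",
    "shared_memory": None,
}

With disk left as None, the larger-value merge never overrides an explicit @kubernetes(disk=...) setting.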

jaypond (Contributor, Author) commented Aug 30, 2023

Good catch. Just tested and pushed the update you recommended.

No particular reason for using the same default disk value for the resources decorator. I have no issue with having disk default to None.

nflx-mf-bot (Collaborator):

Testing[479] @ 4d30041

saikonen previously approved these changes Aug 31, 2023
nflx-mf-bot (Collaborator):

Testing[479] @ 4d30041 had 2 FAILUREs.

@saikonen dismissed their stale review September 6, 2023 10:43

Found one issue with more thorough testing.

saikonen (Collaborator) commented Sep 6, 2023

Encountered one last issue with this feature while testing on other platforms. Specifically, when running on AWS Batch with the resources decorator, it now fails with the following error:

Decorator 'batch' does not support the attribute 'disk'. These attributes are supported: cpu, gpu, memory, image, queue, iam_role, execution_role, shared_memory, max_swap, swappiness, inferentia, host_volumes, use_tmpfs, tmpfs_tempdir, tmpfs_size, tmpfs_path.

This seems to be due to the way we inject attributes for the batch decorator via aws_utils.py#compute_resource_attributes(), which does not distinguish between supported and unsupported attributes.

The same issue affects step-functions as well.
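
As a rough illustration (not the actual aws_utils.py code), one possible shape of a fix is to merge only the attributes that the target compute decorator declares, dropping the rest instead of failing later:

def merge_supported_attributes(resource_attrs, compute_defaults):
    # Hypothetical helper: start from the compute decorator's defaults
    # (e.g. @batch) and only merge @resources attributes it declares.
    merged = dict(compute_defaults)
    for key, value in resource_attrs.items():
        if key not in compute_defaults:
            # e.g. 'disk' is not a Batch attribute, so skip it here
            continue
        if value is not None:
            merged[key] = value
    return merged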

jaypond (Contributor, Author) commented Sep 13, 2023

I found that the logic for checking decorator attributes lives in the decorator base class in decorator.py. Would it be feasible to simply drop attributes that a decorator doesn't support instead of raising an error? Maybe emit a warning instead?

Something like this:

import warnings


class Decorator(object):
    """
    Base class for all decorators.
    """

    name = "NONAME"
    defaults = {}
    # `allow_multiple` allows setting many decorators of the same type to a step/flow.
    allow_multiple = False

    def __init__(self, attributes=None, statically_defined=False):
        self.attributes = self.defaults.copy()
        self.statically_defined = statically_defined

        if attributes:
            for k, v in attributes.items():
                if k in self.defaults:
                    self.attributes[k] = v
                else:
                    # raise InvalidDecoratorAttribute(self.name, k, self.defaults)
                    # Skip attributes this decorator does not support instead of
                    # raising, and surface a warning so the user still sees it.
                    warnings.warn(
                        "Decorator '%s' does not support attribute '%s'; ignoring it."
                        % (self.name, k)
                    )

saikonen (Collaborator) left a comment:

Should be good to go with the merged fixes to batch/sfn resource decorator handling.

@saikonen merged commit 15cd63f into Netflix:master Oct 26, 2023
21 checks passed