
Enable Custom Specs for Dask Scheduler and Workers using the DaskKubernetesEnvironment #1534

Merged · 10 commits · Sep 19, 2019

Conversation

@joshmeek joshmeek commented Sep 19, 2019

Thanks for contributing to Prefect!

Please describe your work and make sure your PR:

  • adds new tests (if appropriate)
  • updates CHANGELOG.md (if appropriate)
  • updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

Note that your PR will not be reviewed unless all three boxes are checked.

What does this PR change?

Closes #1533
Closes #1445

This PR allows for custom scheduler and worker YAML specs. Here's how it works:
A file path is provided to the environment:

        - scheduler_spec_file (str, optional): Path to a scheduler spec YAML file
        - worker_spec_file (str, optional): Path to a worker spec YAML file
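For example, a worker spec file might contain a Kubernetes pod spec along these lines (all names, the image, and the resource values below are illustrative placeholders, not a layout Prefect mandates):

```yaml
# Hypothetical worker pod spec; field values are illustrative only.
kind: Pod
metadata:
  labels:
    app: prefect-dask-worker
spec:
  restartPolicy: Never
  containers:
    - name: dask-worker
      image: my-registry/my-flow:latest   # placeholder image
      args: [dask-worker, --nthreads, "2", --memory-limit, 4GB]
      resources:
        requests:
          memory: "4G"
          cpu: "1"
```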

Those files are loaded and stored on the environment as _scheduler_spec and _worker_spec.
When the environment is serialized, only the file paths are stored; the loaded specs remain on the object itself when it is serialized into byte code.
When a flow is executed using Cloud, the environment is loaded off of the flow object inside the container and the specs are retrieved there (so they are never sent to Cloud).
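The split between the two serialization paths can be sketched with a toy class (this is an illustrative stand-in, not the actual Prefect implementation; `SpecEnvironment` and its methods only mirror the description above):

```python
import pickle

class SpecEnvironment:
    """Toy sketch of the described behavior: specs are loaded from
    file paths and kept as private attributes, while serialize()
    exposes only the paths (what gets sent to Cloud)."""

    def __init__(self, scheduler_spec_file, worker_spec_file):
        self.scheduler_spec_file = scheduler_spec_file
        self.worker_spec_file = worker_spec_file
        # The real environment parses these files as YAML; reading
        # raw text is enough to illustrate the round-trip here.
        with open(scheduler_spec_file) as f:
            self._scheduler_spec = f.read()
        with open(worker_spec_file) as f:
            self._worker_spec = f.read()

    def serialize(self):
        # Only the file paths travel to Cloud; spec contents never do.
        return {
            "scheduler_spec_file": self.scheduler_spec_file,
            "worker_spec_file": self.worker_spec_file,
        }
```

Pickling an instance (as cloudpickle does when the flow is stored as byte code) carries the loaded `_scheduler_spec`/`_worker_spec` along, while the `serialize()` output never contains them.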
There are some environment variables that users do not have to provide in their specs; they are instead placed into the environment automatically for Cloud use:

    PREFECT__CLOUD__GRAPHQL, PREFECT__CLOUD__AUTH_TOKEN, PREFECT__CONTEXT__FLOW_RUN_ID,
    PREFECT__CONTEXT__NAMESPACE, PREFECT__CONTEXT__IMAGE, PREFECT__CONTEXT__FLOW_FILE_PATH,
    PREFECT__CLOUD__USE_LOCAL_SECRETS, PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS,
    PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS, PREFECT__ENGINE__EXECUTOR__DEFAULT_CLASS,
    PREFECT__LOGGING__LOG_TO_CLOUD
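As a rough illustration of that injection step (the helper `inject_prefect_env` below is hypothetical, not Prefect's actual code), the Prefect-managed variables can be appended to a container's env list without overriding anything the user already set:

```python
# Sketch only: names below come from the PR description; the merge
# logic is an assumption about how injection could work.
PREFECT_MANAGED_VARS = (
    "PREFECT__CLOUD__GRAPHQL",
    "PREFECT__CLOUD__AUTH_TOKEN",
    "PREFECT__CONTEXT__FLOW_RUN_ID",
    "PREFECT__CONTEXT__NAMESPACE",
    "PREFECT__CONTEXT__IMAGE",
    "PREFECT__CONTEXT__FLOW_FILE_PATH",
    "PREFECT__CLOUD__USE_LOCAL_SECRETS",
    "PREFECT__ENGINE__FLOW_RUNNER__DEFAULT_CLASS",
    "PREFECT__ENGINE__TASK_RUNNER__DEFAULT_CLASS",
    "PREFECT__ENGINE__EXECUTOR__DEFAULT_CLASS",
    "PREFECT__LOGGING__LOG_TO_CLOUD",
)

def inject_prefect_env(container_env, values):
    """Append Prefect-managed variables (Kubernetes-style name/value
    dicts) to a container's env list, leaving user entries alone."""
    present = {entry["name"] for entry in container_env}
    merged = list(container_env)
    for name in PREFECT_MANAGED_VARS:
        if name in values and name not in present:
            merged.append({"name": name, "value": values[name]})
    return merged
```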

Why is this PR important?

Allowing completely customizable scheduler and worker pods for the DaskKubernetesEnvironment is a big improvement: previously, only the minimum and maximum number of workers could be configured.

@joshmeek joshmeek added the enhancement An improvement of an existing feature label Sep 19, 2019
codecov bot commented Sep 19, 2019

Codecov Report

Merging #1534 into master will decrease coverage by 0.02%.
The diff coverage is 84.48%.

@cicdw cicdw (Member) left a comment


Could you add a cloudpickle serialization test for the situation in which a user provides a custom spec?

Something like:

import cloudpickle

def test_roundtrip_cloudpickle():
    env = DaskKubernetesEnvironment(...)  # constructed with custom spec files
    new = cloudpickle.loads(cloudpickle.dumps(env))
    assert isinstance(new, DaskKubernetesEnvironment)
    assert new._scheduler_spec == env._scheduler_spec
    assert new._worker_spec == env._worker_spec

something like that?

src/prefect/cli/execute.py (review comment, resolved)
Successfully merging this pull request may close these issues.

Add fully customizable ability to DaskKubernetesEnvironment
How should environments be loaded in Cloud?
2 participants