Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New executor #1434

Merged
merged 10 commits into from
Aug 30, 2019
Merged

New executor #1434

merged 10 commits into from
Aug 30, 2019

Conversation

cicdw
Copy link
Member

@cicdw cicdw commented Aug 30, 2019

Thanks for contributing to Prefect!

Please describe your work and make sure your PR:

  • adds new tests (if appropriate)
  • updates CHANGELOG.md (if appropriate)
  • updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

Note that your PR will not be reviewed unless all three boxes are checked.

What does this PR change?

This PR implements a new LocalDaskExecutor for running Flows with various dask schedulers. In addition, now that this executor supersedes SynchronousExecutor (and is easier to type), we will be deprecating the SynchronousExecutor.

Why is this PR important?

Closes #1336 and allows for more fine-tuning of how your workflows should be run.

Copy link
Member

@jlowin jlowin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor clarifications!

This enables but does not test a threaded executor, is that correct?

CHANGELOG.md Outdated Show resolved Hide resolved
@@ -71,7 +71,7 @@ checkpointing = false
[engine.executor]

# the default executor, specified using a full path
default_class = "prefect.engine.executors.SynchronousExecutor"
default_class = "prefect.engine.executors.LocalExecutor"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to revert this, but with the known possibility for double execution in deep mapped pipelines, this felt safer. I actually thought we had already done this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fine, not asking for reversion was just curious why

with dask.config.set(scheduler=self.scheduler, **self.kwargs) as cfg:
yield cfg

def queue(self, maxsize: int = 0) -> Queue:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not important for this PR, but are we using these queues anywhere? Can we kill them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can kill them in this PR, we are not using them anywhere!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although maybe a user is? I'll add this to breaking changes...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Co-Authored-By: Jeremiah Lowin <153965+jlowin@users.noreply.github.com>
@cicdw
Copy link
Member Author

cicdw commented Aug 30, 2019

And yes, @jlowin this doesn't test the threaded executor. I could update all of our executor tests to parametrize over this if you'd like. My reasoning for it being "OK" (but maybe not 100% ideal) is that it's only changing dask's internal scheduling mechanism, and the fully distributed Dask executor tests for threading / process problems that could arise, so I didn't feel too uncomfortable with it.

@codecov
Copy link

codecov bot commented Aug 30, 2019

Codecov Report

Merging #1434 into master will increase coverage by 0.05%.
The diff coverage is 100%.

@cicdw cicdw requested a review from jlowin August 30, 2019 21:41
@cicdw cicdw merged commit 8f8432c into master Aug 30, 2019
@cicdw cicdw deleted the new-exec branch August 30, 2019 21:47
zanieb added a commit that referenced this pull request Mar 17, 2022
Fix issue where gather can fail when a task returns a pandas object
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SynchronousExecutor for other dask schedulers
2 participants