Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage Wars #956

Merged
merged 22 commits into from
Apr 21, 2019
Merged

Storage Wars #956

merged 22 commits into from
Apr 21, 2019

Conversation

cicdw
Copy link
Member

@cicdw cicdw commented Apr 21, 2019

Thanks for contributing to Prefect!

Please describe your work and make sure your PR:

  • adds new tests (if appropriate)
  • updates CHANGELOG.md (if appropriate)
  • updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

Note that your PR will not be reviewed unless all three boxes are checked.

What does this PR change?

This PR builds off @joshmeek 's work and beefs up our Storage classes as follows:

  • adds PIN 7 (closes Proposal PIN: Storage & Execution #921 )
  • adds a new Memory storage option for local testing
  • implements a new and expanded Storage interface for storing multiple flows, and accessing flows contained in Storage
  • adds storage submodule to our exposed API documentation

I also removed a few things from the initial implementation for the sake of simplicity / exposing a cleaner interface until we know how we want these things to be used.

Assuming this PR is approved and merged, it would be very easy from here to implement a LocalEnvironment and a DaskClusterEnvironment using this interface. I avoided taking it that far because it's already a large PR.

For the full details of what this PR is trying to achieve, please read the PIN-7 that is included.

@codecov
Copy link

codecov bot commented Apr 21, 2019

Codecov Report

Merging #956 into master will decrease coverage by 0.31%.
The diff coverage is 83.47%.

docs/guide/PINs/PIN-7-Storage-Execution.md Outdated Show resolved Hide resolved
docs/guide/PINs/PIN-7-Storage-Execution.md Show resolved Hide resolved
src/prefect/environments/storage/base.py Show resolved Hide resolved
self.image_tag, # type: ignore
)

def __contains__(self, obj: Any) -> bool:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also a really good idea!

env_vars=env_vars,
)
)

dockerfile.write(file_contents)

def create_dockerfile_object_from_dockerfile(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intention behind removing the entire full dockerfile option? Too many extra required steps i.e. healthcheck, copying flows, etc...?

Copy link
Member Author

@cicdw cicdw Apr 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the intent was to simplify the interface for now, and the only difference between create_dockerfile_object_from_dockerfile and the current method is that the current method's docker file is:

FROM {base_image}

# DO pip stuff

so I figured when we need this feature we can easily generalize our current method (and avoid having two different methods).

src/prefect/environments/storage/memory.py Show resolved Hide resolved
src/prefect/environments/storage/memory.py Show resolved Hide resolved
src/prefect/environments/storage/memory.py Show resolved Hide resolved
@joshmeek
Copy link

This is awesome!

Copy link
Member

@jlowin jlowin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great!

One suggestion is perhaps to change the signature of get_flow_location(flow: Flow) to get_flow_location(flow_name: str). The reason is that in almost all circumstances, get_flow_location will be called from a different process than the one that built the storage, and so the Flow object will be necessarily different (in a "Python ID" sense) than the one that was stored. Therefore a string attribute like its name or some other key will ultimately be the only way to identify it to the Storage. (Memory seems to be an exception to this rule, but would be compatible with using the name as a key instead of the flow itself). Otherwise this will require instantiating a Flow object just to use Storage which is supposed to contain the Flow... and that feels circular.

CHANGELOG.md Outdated Show resolved Hide resolved
src/prefect/environments/storage/base.py Show resolved Hide resolved
Co-Authored-By: cicdw <white.cdw@gmail.com>
@cicdw
Copy link
Member Author

cicdw commented Apr 21, 2019

@jlowin happy to update the signature of that method. Just for the record, the only reason I left it like that was in case new types of storage relied on other permanent attributes of Flows other than name, but I do think it's better to keep it simple for now.

@jlowin
Copy link
Member

jlowin commented Apr 21, 2019

This is such a clean implementation and advances so many of our goals that we thought would take longer, so let's start with the simpler version to get off the ground

@cicdw
Copy link
Member Author

cicdw commented Apr 21, 2019

@jlowin and @joshmeek : I've addressed all of your feedback and updated Docker storage to serialize its flows attribute. That way, knowledge of containment survives serialization of flows / storage objects without any additional work! Additionally, all the agent will have to do when it picks up a flow run is something like:

environment = flow.environment
storage = flow.storage
environment.setup(storage)
environment.execute(storage, storage.flows[flow.name])

@cicdw
Copy link
Member Author

cicdw commented Apr 21, 2019

Actually, writing that makes me wonder whether get_flow_location is even a necessary method at this point, since the flows attribute has been elevated. Thoughts?

jlowin
jlowin previously approved these changes Apr 21, 2019
@cicdw cicdw merged commit 219e410 into master Apr 21, 2019
@cicdw cicdw deleted the local-env branch April 21, 2019 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal PIN: Storage & Execution
3 participants