feat: Support building and pushing container images shared within a dataset folder #27
Merged
Conversation
adlersantos added the labels `revision: readme` (Improvements or additions to the README) and `feature request` (New feature or request) on May 17, 2021
adlersantos changed the title from "feat: Supports building and pushing container images" to "feat: Support building and pushing container images per dataset" on May 17, 2021
adlersantos changed the title from "feat: Support building and pushing container images per dataset" to "feat: Support building and pushing container images shared within a dataset folder" on May 17, 2021
leahecole requested changes on May 17, 2021
Some questions
- How are we expecting users to represent the requirements for the scripts that will be in their images (a requirements file? a Pipfile?)? Are we going to be prescriptive, or is that out of scope? Either way, we may want to mention it and show an example of our preferred way, because their Dockerfile will need to copy in those requirements and handle the installation as well.
leahecole approved these changes on May 18, 2021
Description
Based on a discussion in #5.
We need to support operators such as `KubernetesPodOperator` and other GKE operators that allow tasks to run in specific clusters, isolating their resource usage from other data pipelines. As a prerequisite for such tasks, Docker images need to be defined, built, and pushed to GCR. This PR adds a workflow for building and pushing those images.
1. Create an `_images` folder under your dataset folder if it doesn't exist.
2. Inside the `_images` folder, create another folder and name it after what the image is expected to do, e.g. `process_shapefiles`, `read_cdf_metadata`.
3. In that subfolder, create a Dockerfile and any scripts you need to process the data. Use the `COPY` command in your `Dockerfile` to include your scripts in the image.

The resulting file tree for a dataset that uses two container images may look like the following.
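As an illustration, such a tree might look like this (the dataset and script names are hypothetical; only the `_images` layout and the subfolder names from the examples above come from this PR):

```
example_dataset/
└── _images/
    ├── process_shapefiles/
    │   ├── Dockerfile
    │   └── process_shapefiles.py
    └── read_cdf_metadata/
        ├── Dockerfile
        └── read_cdf_metadata.py
```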
Now, running `scripts/generate_dag.py` will build and push the container images automatically. Here's a screenshot of a successful run on the Airflow webserver (Cloud Composer):
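A minimal sketch of how a build script might discover these image folders and derive image tags. This is an illustration only, not this repo's actual implementation: the helper names and the `gcr.io/PROJECT/DATASET__IMAGE` naming convention are assumptions.

```python
from pathlib import Path


def find_image_dirs(dataset_dir: Path) -> list:
    """Return subfolders of the dataset's `_images` folder that contain a Dockerfile."""
    images_dir = dataset_dir / "_images"
    if not images_dir.is_dir():
        return []
    return sorted(
        child
        for child in images_dir.iterdir()
        if child.is_dir() and (child / "Dockerfile").exists()
    )


def image_tag(project_id: str, dataset: str, image_name: str) -> str:
    # Hypothetical GCR naming scheme: gcr.io/PROJECT/DATASET__IMAGE
    return f"gcr.io/{project_id}/{dataset}__{image_name}"
```

Each discovered folder would then be built and pushed, e.g. by shelling out to `docker build` / `docker push` or by submitting a Cloud Build job.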
Checklist
- Updated the `README` accordingly.