[BEAM-8165]Change default docker images name #9487
Run Python Dataflow ValidatesContainer
Force-pushed from 294f466 to 473377e.
Run Python Dataflow ValidatesContainer
Run Python PreCommit
R: @aaltay
LGTM modulo one open comment.
'{version_suffix}:latest'.format(
user=os.environ['USER'],
version_suffix=version_suffix))
image = ('apachebeam/python{version_suffix}_sdk:latest'.format(
We may need to give people a way to run with their locally built containers for testing purposes, or for using custom containers.
We could reuse the worker_harness_container_image flag or introduce a new flag, or perhaps find some other easy way to do this.
@tweise might have suggestions.
I have mixed feelings about this. When working with master, wouldn't I expect the locally built image to be used? What does it mean to refer to hub for something that wasn't released?
Perhaps do that depending on whether the current version is dev/snapshot or a release?
And thanks for the ping @aaltay !
To build on Thomas's idea, we could do something similar to what dataflowrunner does today in _get_required_container_version(use_fnapi):
- If this is a released version on a release branch (i.e. no 'dev' in the version), default to the image in hub. (The release manager will have to build this container as part of the release anyway.)
- If it is a dev version, this is an open question. Dataflowrunner's solution is to have a fixed image that is occasionally updated as needed, or, if a flag is provided, to use that as an override. (Here we might just simplify and require a locally built version. The occasionally built image is convenient but error-prone and a hassle.)
Thanks @aaltay and @tweise for the comments.
The --environment_config option can be used to pass a customized image when using the portable runner. The default image is used when no docker image is specified.
My understanding is that, from the user's perspective, they shouldn't need to worry about building docker images locally when they write pipelines, so it's fine to pull the image from remote. From the developer's perspective, people working on the SDKs are expected to know how the portable runner works, including how docker is used. If they want to test an SDK under development, they should build an image locally and use it. The image name can be the same as the default one, in which case the local image is used instead of being pulled from remote; if it has a different name, it can be passed with the --environment_config option.
I can add more comments here to explain how to pass customized images.
What scenarios am I missing?
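A minimal sketch of the two cases described in this comment, assuming the portable pipeline options --runner, --environment_type and --environment_config. The helper name portable_runner_args and the custom_image parameter are hypothetical and exist only for illustration.

```python
def portable_runner_args(custom_image=None):
    """Build command-line flags for a pipeline run on the portable runner.

    custom_image is a hypothetical parameter. When it is given, the image is
    passed through --environment_config. When it is omitted, no image flag is
    emitted and the SDK is left to fall back to its default image.
    """
    args = ['--runner=PortableRunner']
    if custom_image:
        # Developer case: point the runner at a custom or locally built image.
        args += ['--environment_type=DOCKER',
                 '--environment_config=' + custom_image]
    return args
```

A user would call portable_runner_args() and rely on the default image, while an SDK developer would pass the name of the image they built locally.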
When the Beam version is not a development version (i.e. it is a release), use the image corresponding to that release by default. Not latest, because that could be anything and may not be compatible with the release version. If the Python Beam version is 2.16, it should not pick up something that could be 2.17, for example.
If the Beam version is a development version, continue to use the locally built image as before.
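The rule proposed in this comment could be sketched roughly as follows. This is an illustration, not the code in the PR: the function name default_container_image and the local_image parameter are hypothetical, and the check for 'dev' in the version string follows the convention mentioned earlier in this thread.

```python
def default_container_image(beam_version, local_image, version_suffix=''):
    """Pick a default SDK container image from the Beam version string.

    beam_version:   e.g. '2.16.0' (release) or '2.16.0.dev' (development).
    local_image:    hypothetical name of the locally built image, supplied
                    by the caller.
    version_suffix: Python version suffix used in the image name.
    """
    if 'dev' in beam_version:
        # Development version: keep using the locally built image.
        return local_image
    # Release version: pin the tag to the release instead of using latest.
    return 'apachebeam/python{suffix}_sdk:{tag}'.format(
        suffix=version_suffix, tag=beam_version)
```

With this shape, a 2.16.0 SDK can never pick up a 2.17 image, because the tag is derived from the SDK's own version.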
But if a locally built container would automatically be picked up, then the distinction won't be necessary. Is that already the case with these changes?
Ah, I see, it's possible that the Beam version is not up to date.
It will check locally first; if the image is not available, then pull it from remote.
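The local-first behaviour is what docker run does by default. A sketch that makes the two steps explicit is shown below, using the docker image inspect and docker pull commands. The function name ensure_image and the injectable run parameter are assumptions for illustration, so that the logic can be exercised without a docker daemon.

```python
import subprocess


def ensure_image(image, run=subprocess.call):
    """Make sure a docker image is available, preferring a local copy.

    run is a callable that takes a command list and returns an exit code.
    It defaults to subprocess.call and can be replaced in tests.
    Returns 'local' if the image was already present, 'pulled' otherwise.
    """
    # docker image inspect exits with a non-zero code if the image is absent.
    if run(['docker', 'image', 'inspect', image]) == 0:
        return 'local'
    # Not available locally, so fall back to pulling from the remote registry.
    if run(['docker', 'pull', image]) != 0:
        raise RuntimeError('could not pull image: ' + image)
    return 'pulled'
```

This is also why a locally built image with the same name and tag as the default one takes precedence over the published image.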
Force-pushed from e1f9f7a to e86f1bf.
Run Python Dataflow ValidatesContainer
@aaltay, I changed it to use the version as the tag when pulling the default image. As a quick workaround, I hardcoded the version for the docker tasks and created a ticket for later improvement: https://issues.apache.org/jira/browse/BEAM-8192
The version is available as property
@@ -51,8 +52,13 @@ task copyLauncherDependencies(type: Copy) {
}
}

def sdk_version = '2.16.0'
This is probably not a good idea, for these reasons:
- This needs to match https://github.com/apache/beam/blob/master/sdks/python/apache_beam/version.py, so it needs to be 2.16.0.dev here.
- The release process needs to be updated to change these versions along with the other versions. The release process automatically changes the version in the release branch to remove the ".dev" part and changes the master branch to the next version.
- If we have to do this, is it at least possible to put it in a single gradle file?
- Another idea: we can execute or parse https://github.com/apache/beam/blob/master/sdks/python/apache_beam/version.py and extract the version from there. That should be possible with gradle.
@yifanzou could you help with this?
/cc @markflyhigh as the release manager.
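As a rough illustration of the parsing idea, here is a Python sketch of the extraction step. It assumes version.py contains a single assignment of the form __version__ = '...'; the function name read_sdk_version is hypothetical, and a Gradle build would need to invoke something like this or replicate the same logic in Groovy.

```python
import re


def read_sdk_version(version_py_text):
    """Extract the version string from the text of a version.py file.

    Assumes the file contains a line such as: __version__ = '2.16.0.dev'
    """
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]",
        version_py_text,
        re.MULTILINE)
    if match is None:
        raise ValueError('no __version__ assignment found')
    return match.group(1)
```

Reading the value from one place would avoid keeping a second hardcoded copy in sync during the release process.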
It would also be good to retain the distinction between -SNAPSHOT/.dev version and release version in the container tag. It can lead to confusing results when a locally built container from snapshot source overrides the image of a release.
+1 on not adding another version in this gradle file. As Ahmet mentioned, we define the version number in version.py, and keeping hardcoded copies of the version number consistent across different places would be a problem. Gradle scripts use Groovy, which can read files: https://discuss.gradle.org/t/read-project-version-number-dynamically-from-a-file/22607. Hopefully this helps.
Thanks for reviewing and for the comments.
I wanted to make sdk_version consistent with the one from version.py, but forgot to change it back after testing. Thanks for catching it.
After evaluating many options, I think introducing a new variable sdk_version in gradle.properties is the easiest way for now. This version will be consistent with the one in version.py, so we need to update it when we release. It would be better to read it from version.py and assign it to sdk_version; however, as far as I know, gradle.properties is not supposed to read variables from another file. If it's possible, I am happy to try.
I made a commit with this change. Can you please review and comment?
Thanks,
Hannah
I investigated it, this would be the best way to go with, but this version is always
It is indeed very odd that the version is set to -SNAPSHOT in release tags: https://github.com/apache/beam/blob/v2.15.0/gradle.properties#L26
Version changes are applied partially: c071613
Force-pushed from e86f1bf to 482017f.
I also investigated this option over the weekend, and there is a scenario we cannot support with string replacement. We have the version defined in several different locations and the suffixes are not consistent, which should be improved. I changed the scope of the Jira ticket to include this work.
Force-pushed from cb9fb90 to 273b4c7.
OK. I am fine with this change. I would only rename 'sdk_version' to 'python_sdk_version', since it is only applicable to Python. And let's plan to clean it up after 2.16. LGTM. @tweise - What do you think?
Force-pushed from 273b4c7 to 53f069d.
Thanks, I addressed your comment.
I think it is reasonable to go with this for 2.16 and work on a long term solution in master as follow-up.
Run Java PreCommit
Run CommunityMetrics PreCommit
Run Python Dataflow ValidatesContainer
Can we merge it?
Thank you @aaltay.
@Hannah-Jiang Could you update https://github.com/apache/beam/blob/master/sdks/CONTAINERS.md to reflect these changes?
We are already working on it; a tech writer is on it and it will be released as part of 2.16.
As we decided to add _sdk to the image name, this PR changes the default image name for the Python, Java and Go images. Publishing docker images as part of the release is tackled in the following three PRs.