Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-8165]Change default docker images name #9487

Merged
merged 2 commits into from
Sep 11, 2019

Conversation

Hannah-Jiang
Copy link
Contributor

@Hannah-Jiang Hannah-Jiang commented Sep 5, 2019

As we decided to add _sdk to image name, this PR is changing default image name for Python, Java and Go images.

Publishing docker images as part of release is tackled with following three PRs.

  1. Change default image name. (current PR, [BEAM-8165]Change default docker images name #9487)
  2. Add staging and publishing scripts for docker images to release procedure. ([BEAM-8105] Docker images release scripts #9506 )
  3. Updating release_guide.md ([BEAM-8105] update release guide with docker images #9510)

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@Hannah-Jiang
Copy link
Contributor Author

Run Python Dataflow ValidatesContainer

@Hannah-Jiang Hannah-Jiang changed the title [WIP] Add docker images to release [WIP] Change default docker images name Sep 5, 2019
@Hannah-Jiang Hannah-Jiang changed the title [WIP] Change default docker images name [WIP] [BEAM-6185]Change default docker images name Sep 6, 2019
@Hannah-Jiang
Copy link
Contributor Author

Run Python Dataflow ValidatesContainer

@Hannah-Jiang
Copy link
Contributor Author

Run Python PreCommit

@Hannah-Jiang
Copy link
Contributor Author

R: @aaltay
Can you please review or find someone else to review it? Thanks.

@Hannah-Jiang Hannah-Jiang changed the title [WIP] [BEAM-6185]Change default docker images name [BEAM-6185]Change default docker images name Sep 6, 2019
Copy link
Member

@aaltay aaltay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo one open comment.

'{version_suffix}:latest'.format(
user=os.environ['USER'],
version_suffix=version_suffix))
image = ('apachebeam/python{version_suffix}_sdk:latest'.format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to give people a way to run with their locally built containers for testing purposes, or for using custom containers.

We could reuse worker_harness_container_image flag, introduce a new flag. Or perhaps something else to do this in an easy way.

@tweise might have suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings about this. When working with master, wouldn't I expect the locally built image to be used? What does it mean to refer to hub for something that wasn't released.

Perhaps do that depending on whether current version is dev/snapshot or a release?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And thanks for the ping @aaltay !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To build on Thomas's idea, we could do something similar to what dataflowrunner does today (

def _get_required_container_version(use_fnapi):
)

  • If this is a released version on release branch (i.e. no 'dev' in version') default to image in hub. (release manager will have to build this container as part of the release anyway.)
  • If it is a dev version --> this is a question. Dataflowrunner's solution is to have a fixed image that is occasionally updated as needed or if provided a flag use that as an override. (Here we might just simplify and require a locally built version. This occasionally built image is convenient but is error-prone and a hassle.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aaltay and @tweise for comments.

--environment_config option can be used to pass customized images when use Portable runner. The default image is used when no docker image is specified.

My understanding is, from users perspective, they don't need to worry about creating docker images at local when they write pipelines, so it's ok to pull it from remote. From developers perspective, developers who are working on sdks are expected to know how portable runner is working, so as how docker is used. If they want to test their developing sdks, they should build an image at local and use it. The image name can be same as the default one, then the local image is used instead of pulling it from remote, or can pass it with --environment_config option if it has different name.

I can add more comments here to explain how to pass customized images.

What scenarios am I missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the Beam version is not a development version (a release), then by default use the image corresponding to that release. Not latest, because that could be anything and may not be compatible with the release version? If the Python Beam version is 2.16, then it should not pick up something that could be 2.17, for example.

If the version of Beam is a development version, continue to use the locally built image as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if a locally built container would automatically be picked up, then the distinction won't be necessary. Is that already the case with these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, it’s possible that Beam version is not up to date.
It will check local first, if the image is not available, them pull it from remote.

@Hannah-Jiang Hannah-Jiang changed the title [BEAM-6185]Change default docker images name [BEAM-8165]Change default docker images name Sep 9, 2019
@Hannah-Jiang Hannah-Jiang force-pushed the add_containers_to_release branch 9 times, most recently from e1f9f7a to e86f1bf Compare September 10, 2019 16:13
@Hannah-Jiang
Copy link
Contributor Author

Run Python Dataflow ValidatesContainer

@Hannah-Jiang
Copy link
Contributor Author

@aaltay , I fixed to use version as tag when pull default image. For quick workaround, I hardcoded version for docker tasks. Created a ticket for later improvement. https://issues.apache.org/jira/browse/BEAM-8192

@tweise
Copy link
Contributor

tweise commented Sep 10, 2019

The version is available as property project.version

@@ -51,8 +52,13 @@ task copyLauncherDependencies(type: Copy) {
}
}

def sdk_version = '2.16.0'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not a good idea. The reason is:

@yifanzou could you help with this?
/cc @markflyhigh as the release manager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be good to retain the distinction between -SNAPSHOT/.dev version and release version in the container tag. It can lead to confusing results when a locally built container from snapshot source overrides the image of a release.

Copy link
Contributor

@yifanzou yifanzou Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on not adding another version in this gradle file. Like Ahmet mentioned, we define the version number in the version.py, and it would be a problem to make the version# in different places being constant. Gradle script uses groovy, which has power to interact with files, https://discuss.gradle.org/t/read-project-version-number-dynamically-from-a-file/22607. Hopefully this helps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing and comments.
I wanted to make sdk_version consistent with the one from version.py, forgot to change it back after testing. Thanks for catching it.

After evaluating many options, I think introducing a new variable sdk_version to gradle.properties is the easiest way for now. This version will be consistent to the one from version.py, so we need to update it when we release it. It would be better to read it from version.py and assign to sdk_version, however, as far as I know, gradle.properties is not supposed to read variables from another file. If it's possible, I am happy to try.

I made a commit with this change, can you please review and make comments?

Thanks,
Hannah

@aaltay
Copy link
Member

aaltay commented Sep 10, 2019

project.version does not match Python version in master branch. It will be 2.16.0-SNAPSHOT vs 2.16.0.dev. I am guessing we can still use it and with a simple string raplace change '-SNAPSHOT' to '.dev' for python.

@Hannah-Jiang
Copy link
Contributor Author

The version is available as property project.version

I investigated it, this would be the best way to go with, but this version is always {version}-SNAPSHOT, both for released and developing version. How do we know which tag we should use between {version}, {version}.dev and {version}-SNAPSHOT?

@tweise
Copy link
Contributor

tweise commented Sep 10, 2019

It is indeed very odd that the version is set to -SNAPSHOT in release tags: https://github.com/apache/beam/blob/v2.15.0/gradle.properties#L26

@tweise
Copy link
Contributor

tweise commented Sep 10, 2019

Versions changes are applied partially: c071613

@Hannah-Jiang
Copy link
Contributor Author

project.version does not match Python version in master branch. It will be 2.16.0-SNAPSHOT vs 2.16.0.dev. I am guessing we can still use it and with a simple string raplace change '-SNAPSHOT' to '.dev' for python.

I also investigated this option during weekend and there is a scenarios we cannot support with string replacement.
For released versions, it would be {version}-SNAPSHOT at gradle.properties, but {version} at version.py. So if a user build an image from released version, it would be tagged with {version}.dev, however, an image with {version} tag will be used as a default image, which would make difficult to investigate for users who are not familiar with how tagging works with Beam.
This is the same issue as I mentioned previously, given {version}-SNAPSHOT, we cannot tell if it's developing version or released version. I also tried to change {version}-SNAPSHOT to {version} for released versions, and {version}.dev for developing versions, but 2.16 cutoff date is so close and the impact with the change would be wide, so I decided to introduce a new variable for now and make the change later.

We have the version defined at several different locations and suffixes are not consistent, which should be improved. I changed scope of the Jira ticket to include this work.

@Hannah-Jiang Hannah-Jiang force-pushed the add_containers_to_release branch 2 times, most recently from cb9fb90 to 273b4c7 Compare September 10, 2019 20:53
@aaltay
Copy link
Member

aaltay commented Sep 11, 2019

OK. I am fine with this change. I will only call 'sdk_version', 'python_sdk_version', since that is only applicable to python. And let's plan to clean it after 2.16. - LGTM.

@tweise - What do you think?

@Hannah-Jiang
Copy link
Contributor Author

Hannah-Jiang commented Sep 11, 2019

OK. I am fine with this change. I will only call 'sdk_version', 'python_sdk_version', since that is only applicable to python. And let's plan to clean it after 2.16. - LGTM.

Thanks, I addressed your comment.

@tweise
Copy link
Contributor

tweise commented Sep 11, 2019

I think it is reasonable to go with this for 2.16 and work on a long term solution in master as follow-up.

@Hannah-Jiang
Copy link
Contributor Author

Run Java PreCommit

@Hannah-Jiang
Copy link
Contributor Author

Run CommunityMetrics PreCommit

@Hannah-Jiang
Copy link
Contributor Author

Run Python Dataflow ValidatesContainer

@Hannah-Jiang
Copy link
Contributor Author

Can we merge it?

@aaltay aaltay merged commit 8790822 into apache:master Sep 11, 2019
@Hannah-Jiang
Copy link
Contributor Author

Thank you @aaltay .

@ibzib
Copy link
Contributor

ibzib commented Sep 18, 2019

@Hannah-Jiang Could you update https://github.com/apache/beam/blob/master/sdks/CONTAINERS.md to reflect these changes?

@Hannah-Jiang
Copy link
Contributor Author

Hannah-Jiang commented Sep 18, 2019

@Hannah-Jiang Could you update https://github.com/apache/beam/blob/master/sdks/CONTAINERS.md to reflect these changes?

We are already working on it, a tech writer is on it and it will be released as part of 2.16.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants