[BEAM-8273] Expand portability environment documentation #10116

Merged: tweise merged 6 commits into apache:master from ibzib:process-env on Dec 13, 2019

ibzib commented Nov 14, 2019

Document the EXTERNAL environment.
Add additional instructions for using the PROCESS environment.

cc @functicons


@ibzib ibzib requested review from mxm, robertwb and tweise November 14, 2019 21:44
tweise (Contributor) commented Nov 14, 2019

Thanks for taking this up! For the external environment, here are some notes in case you want to include them:

External environment / worker pool can be used like this:

docker run --rm -p=50000:50000 gcr.io/apache-beam-testing/beam/sdks/snapshot/python3.6:20190904 --worker_pool=true

(or docker run --rm -p=50000:50000 localusername-docker-apache.bintray.io/beam/python:latest --worker_pool=true)

Add to pipeline options: --environment_type=EXTERNAL --environment_config=localhost:50000 

If you run the jobserver outside Docker, export BEAM_WORKER_POOL_IN_DOCKER_VM=1
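
As a rough illustration of the options in the comment above, the two flags could be assembled in Python as follows. This is only a sketch: the helper name `external_environment_flags` is hypothetical, and the defaults simply mirror the `localhost:50000` address used in the `docker run` example. Handing the list to the Beam SDK is shown as a comment because it needs the `apache_beam` package installed.

```python
def external_environment_flags(host="localhost", port=50000):
    """Build pipeline flags for a worker pool listening on host:port.

    Hypothetical helper for illustration; the defaults mirror the
    `docker run -p=50000:50000 ... --worker_pool=true` command above.
    """
    if not 0 < int(port) < 65536:
        raise ValueError(f"invalid port: {port}")
    return [
        "--environment_type=EXTERNAL",
        f"--environment_config={host}:{port}",
    ]


flags = external_environment_flags()
print(" ".join(flags))

# With the Beam Python SDK installed, the list could then be handed to the
# SDK (kept as a comment so this sketch runs without apache_beam):
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(flags)
```

Keeping the address in one place makes it harder for the port published by the container and the port in `--environment_config` to drift apart.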

ibzib (Author) commented Nov 14, 2019

Thanks for the suggestions Thomas. It looks like BEAM_WORKER_POOL_IN_DOCKER_VM is only needed on macOS. Is that correct?

Use Python 3.6 as example because of recent difficulties with 3.7 (BEAM-8651).
tweise (Contributor) commented Nov 14, 2019

> Thanks for the suggestions Thomas. It looks like BEAM_WORKER_POOL_IN_DOCKER_VM is only needed on macOS. Is that correct?

It is probably also needed for Windows.

ibzib (Author) commented Nov 15, 2019

> It is probably also needed for Windows.

Good point. I added your suggestions, PTAL.

mxm (Contributor) commented Nov 15, 2019

>> Thanks for the suggestions Thomas. It looks like BEAM_WORKER_POOL_IN_DOCKER_VM is only needed on macOS. Is that correct?
>
> It is probably also needed for Windows.

Originally, it was a macOS-specific workaround, but it looks like it applies to Windows as well: https://docs.docker.com/docker-for-windows/networking/
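
One way to apply the workaround only where this thread says it matters is sketched below. It assumes the job server is started outside Docker from a Python launcher script, and it treats Windows as affected even though the comments above only say "probably". The function names are hypothetical; `platform.system()` and `os.environ` are standard-library calls.

```python
import os
import platform


def needs_docker_vm_workaround(system=None):
    """Return True on hosts where Docker runs inside a VM (macOS, Windows)."""
    system = system or platform.system()
    return system in ("Darwin", "Windows")


def job_server_env(base_env=None, system=None):
    """Copy the environment, adding the workaround variable when needed."""
    env = dict(os.environ if base_env is None else base_env)
    if needs_docker_vm_workaround(system):
        # Same effect as `export BEAM_WORKER_POOL_IN_DOCKER_VM=1` in a shell.
        env["BEAM_WORKER_POOL_IN_DOCKER_VM"] = "1"
    return env
```

The returned mapping could be passed as the `env` argument of `subprocess.run` when launching the job server.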

tweise (Contributor) commented Nov 15, 2019

@ibzib this entire section doesn't belong in the roadmap. I think this is a good opportunity to add a portability page to https://beam.apache.org/documentation/ and start adding info there.

ibzib (Author) commented Nov 15, 2019

> @ibzib this entire section doesn't belong in the roadmap. I think this is a good opportunity to add a portability page to https://beam.apache.org/documentation/ and start adding info there.

There's a page dedicated to "Runtime environments," but currently it only focuses on how to "customize, build, and push Beam SDK container images." I suppose that might be a logical place for this?

I should also probably update the pipeline options source with some of this information.

tweise (Contributor) commented Nov 15, 2019

Yep, good find. "Runtime Environments" should probably be renamed "Containers" or something like that, and then this could be added as "Portable Pipeline Environments".

`export BEAM_WORKER_POOL_IN_DOCKER_VM=1`.
- `LOOPBACK`: User code is executed within the same process that submitted the pipeline. This
option is useful for local testing. However, it is not suitable for a production environment,
as it requires a connection between the original Python process and the worker nodes, and
Contributor commented:

This isn't specific to Python.

Author replied:

Removed the reference to Python. Also added a note that while these are Python options, they might apply to other SDKs as well. (Maybe a follow-up could be to fill this in for Java and Go.)

- `LOOPBACK`: User code is executed within the same process that submitted the pipeline. This
option is useful for local testing. However, it is not suitable for a production environment,
as it requires a connection between the original Python process and the worker nodes, and
performs work on the machine the job originated from, *not the worker nodes*.
Contributor commented:

There aren't any workers in this case. I would phrase this as "rather than starting up worker nodes, it calls back to the original process that submitted the job to process the data" or something like that.

Author replied:

Removed the references to worker nodes.
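
The point settled in this review thread, that `LOOPBACK` calls back to the submitting process and so suits local testing only, could be encoded as a small guard. This is a sketch under assumptions: the helper `loopback_flags` and its `for_production` parameter are hypothetical, and only the `LOOPBACK` value itself comes from the documentation under review.

```python
def loopback_flags(for_production=False):
    """Return the flag selecting LOOPBACK, refusing production use.

    With LOOPBACK, user code runs in the process that submitted the
    pipeline, so the data is processed on the submitting machine.
    """
    if for_production:
        raise ValueError(
            "LOOPBACK executes user code in the submitting process; "
            "it is intended for local testing only"
        )
    return ["--environment_type=LOOPBACK"]


print(loopback_flags())
```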

mxm (Contributor) commented Dec 2, 2019

@ibzib Do you want to merge/address the remaining suggestions?

ibzib (Author) commented Dec 13, 2019

@tweise I moved this section to its own page.

tweise (Contributor) left a comment

Thanks for moving the page!

@tweise tweise merged commit bc18874 into apache:master Dec 13, 2019
tweise (Contributor) commented Dec 13, 2019

@ibzib please check if the JIRA should be closed: https://issues.apache.org/jira/browse/BEAM-8273

bumblebee-coming pushed a commit to bumblebee-coming/beam that referenced this pull request Dec 13, 2019