Skip to content

[BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config#9730

Merged
mxm merged 2 commits intoapache:masterfrom
violalyu:external_payload_params
Oct 7, 2019
Merged

[BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config#9730
mxm merged 2 commits intoapache:masterfrom
violalyu:external_payload_params

Conversation

@violalyu
Copy link
Contributor

@violalyu violalyu commented Oct 4, 2019

Originally, the environment config for environment type of EXTERNAL only support passing in an url for the external worker pool; We want to support passing in arbitrary KV pairs to sdk worker via external environment config, so that the when starting the sdk harness we could get the values from StartWorkerRequest.params.
Jira issue: https://issues.apache.org/jira/browse/BEAM-8351


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@violalyu
Copy link
Contributor Author

violalyu commented Oct 4, 2019

R: @robertwb
R: @chadrik

@chadrik
Copy link
Contributor

chadrik commented Oct 4, 2019

R: @mxm

elif environment_urn == common_urns.environments.EXTERNAL.urn:
def _looks_like_json(environment_config):
import re
return re.match(r'\{.+\}', environment_config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use re.search(). also, I don't think this needs to be private, so remove the leading underscore.

maybe add a comment like: "we don't use json.loads to test validity because we don't want to propagate json syntax errors downstream to the runner"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! For the re.search() part, do we want it to be not searching from the start? I was thinking that if it is valid json string it should start and end with '{}', maybe re.match(r'\{.+\}$') or re.search(r'^\{.+\}$')?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also whitespace to consider. So if we're using re.match we have to do re.match(r'\s*\{.*\}\s*$').

We just need a simple heuristic that tells us that it looks json and not a url. The most important thing is that we don't end up with a regex that incorrectly returns False for something that is actually json, so I suggested the re.search option because I think it's sufficient at detecting something that's not a url and thus probably json. That said, I think the re.match regex above looks pretty safe. Obviously, we're only talking about json objects (i.e. dict) and not arrays and other scalars.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to reiterate, we want json syntax errors to occur here, at submission time, and not later on, so we want to answer "did the user attempt to pass a json map or a url?" So I was even considering return '{' in config or '}' in config, since curly braces should not be in a url.

I'll defer the final answer on this to the reviewer.

Copy link
Contributor Author

@violalyu violalyu Oct 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Chad! Updated for now in 7fed22b!

@violalyu violalyu force-pushed the external_payload_params branch from d239472 to 7fed22b Compare October 4, 2019 18:17
@chadrik
Copy link
Contributor

chadrik commented Oct 4, 2019

11:58:18 ************* Module apache_beam.runners.portability.portable_runner_test
11:58:18 W:328, 0: Bad indentation. Found 8 spaces, expected 6 (bad-indentation)
11:58:18 W:333, 0: Bad indentation. Found 8 spaces, expected 6 (bad-indentation)
11:58:18 C:337, 0: Line too long (82/80) (line-too-long)

@violalyu violalyu force-pushed the external_payload_params branch from 77db088 to a0679ef Compare October 4, 2019 21:33
@chadrik
Copy link
Contributor

chadrik commented Oct 4, 2019

14:36:52 apache_beam/runners/portability/portable_runner_test.py:340:1: E302 expected 2 blank lines, found 1

to run the lint tests locally you do something like:

python -m virtualenv .venv
. .venv/bin/activate
pip install tox
cd sdks/python
tox -e py27-lint

@violalyu violalyu force-pushed the external_payload_params branch 2 times, most recently from e822ff7 to 3a4b335 Compare October 5, 2019 00:01
@chadrik
Copy link
Contributor

chadrik commented Oct 5, 2019

Run Python2_PVR_Flink PreCommit

Copy link
Contributor

@mxm mxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @violalyu. Looks good to me. Just some minor comments.


if looks_like_json(portable_options.environment_config):
config = json.loads(portable_options.environment_config)
url = config.pop('url', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
url = config.pop('url', None)
url = config.get('url', None)

Why pop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Maximilian! I've updated in this commit: 9578559

self.assertEqual(
PortableRunner._create_environment(PipelineOptions.from_dictionary({
'environment_type': "EXTERNAL",
'environment_config': ' {"url":"localhost:50000", '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also be testing the case without any space at the beginning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the test for this part in this commit: 916e170

@violalyu violalyu force-pushed the external_payload_params branch from 3a4b335 to 916e170 Compare October 5, 2019 20:18
@mxm mxm merged commit 64e5a83 into apache:master Oct 7, 2019
@mxm
Copy link
Contributor

mxm commented Oct 7, 2019

Thanks!

@mxm
Copy link
Contributor

mxm commented Oct 7, 2019

@violalyu One minor note for future PRs, please also include the JIRA issue in the commit subject, just like in the GitHub issue title.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants