[BEAM-8351] Support passing in arbitrary KV pairs to sdk worker via external environment config #9730
R: @mxm
elif environment_urn == common_urns.environments.EXTERNAL.urn:
def _looks_like_json(environment_config):
  import re
  return re.match(r'\{.+\}', environment_config)
Use re.search(). Also, I don't think this needs to be private, so remove the leading underscore.
Maybe add a comment like: "We don't use json.loads to test validity because we don't want to propagate JSON syntax errors downstream to the runner."
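A minimal sketch of the suggested change, assuming the helper is renamed to looks_like_json and still receives the raw environment_config string. It shows why re.search() is more forgiving than the original re.match(): match() is anchored at the start of the string, so a config with leading whitespace is rejected.

```python
import re


def looks_like_json(environment_config):
    # We don't use json.loads to test validity because we don't want to
    # propagate JSON syntax errors downstream to the runner.
    return re.search(r'\{.+\}', environment_config) is not None


# re.match() anchors at position 0, so leading whitespace defeats it.
print(bool(re.match(r'\{.+\}', ' {"url": "localhost:50000"}')))  # False
print(looks_like_json(' {"url": "localhost:50000"}'))             # True
print(looks_like_json('localhost:50000'))                         # False
```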
Thanks! For the re.search() part, do we want it to search anywhere in the string rather than only from the start? I was thinking that a valid JSON string should start with '{' and end with '}', so maybe re.match(r'\{.+\}$') or re.search(r'^\{.+\}$')?
There's also whitespace to consider, so if we're using re.match, we have to do re.match(r'\s*\{.*\}\s*$').
We just need a simple heuristic that tells us the config looks like JSON and not a URL. The most important thing is that we don't end up with a regex that incorrectly returns False for something that actually is JSON, so I suggested the re.search option because I think it's sufficient to detect something that's not a URL and is thus probably JSON. That said, I think the re.match regex above looks pretty safe. Obviously, we're only talking about JSON objects (i.e. dicts), not arrays or scalars.
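A sketch of the whitespace-tolerant re.match() variant proposed above, assuming the same looks_like_json name. One caveat worth knowing: without re.DOTALL, '.' does not match a newline, so this pattern rejects pretty-printed, multi-line JSON. Adding the flag is an assumption beyond what was discussed here.

```python
import re

# Anchored at both ends, but tolerant of surrounding whitespace.
_JSON_OBJECT_RE = re.compile(r'\s*\{.*\}\s*$')


def looks_like_json(environment_config):
    return _JSON_OBJECT_RE.match(environment_config) is not None


print(looks_like_json('  {"url": "localhost:50000"}  '))    # True
print(looks_like_json('localhost:50000'))                   # False
# '.' skips '\n' by default, so multi-line JSON is not recognized:
print(looks_like_json('{\n  "url": "localhost:50000"\n}'))  # False
```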
Just to reiterate: we want JSON syntax errors to occur here, at submission time, and not later on, so the question to answer is "did the user attempt to pass a JSON map or a URL?" I was even considering return '{' in config or '}' in config, since curly braces should not appear in a URL.
I'll defer the final answer on this to the reviewer.
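A sketch of the brace-containment heuristic floated above, assuming the same helper name. Its point is that even malformed JSON is classified as an attempted JSON map, so json.loads fails at submission time instead of the string being passed on as a URL.

```python
import json


def looks_like_json(environment_config):
    # Curly braces should not appear in a worker-pool URL such as
    # 'localhost:50000', so any brace suggests the user attempted a JSON map.
    return '{' in environment_config or '}' in environment_config


config = '{"url": "localhost:50000",}'  # trailing comma: invalid JSON
if looks_like_json(config):
    try:
        json.loads(config)
    except ValueError as e:  # json.JSONDecodeError subclasses ValueError
        print('rejected at submission time:', e)
```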
Thanks Chad! Updated for now in 7fed22b!
Force-pushed from d239472 to 7fed22b.
Force-pushed from 77db088 to a0679ef.
To run the lint tests locally, you can do something like:
Force-pushed from e822ff7 to 3a4b335.
Run Python2_PVR_Flink PreCommit
if looks_like_json(portable_options.environment_config):
  config = json.loads(portable_options.environment_config)
  url = config.pop('url', None)
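A minimal sketch of what this parsing step does, assuming (as in the diff) that url is the only reserved key and every remaining key is meant to reach the worker as a parameter. The helper name parse_external_config and the key param1 are hypothetical.

```python
import json


def parse_external_config(environment_config):
    """Hypothetical helper: split a JSON external-environment config into
    the worker-pool URL and the remaining key/value params."""
    config = json.loads(environment_config)
    # pop() removes 'url' from the dict, so only the extra keys remain.
    url = config.pop('url', None)
    return url, config


url, params = parse_external_config(
    '{"url": "localhost:50000", "param1": "value1"}')
print(url)     # localhost:50000
print(params)  # {'param1': 'value1'}
```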
Suggested change:
- url = config.pop('url', None)
+ url = config.get('url', None)
Why pop?
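The difference behind the question: pop() mutates the parsed dict, while get() leaves it intact. A small illustration with a hypothetical config string. With get(), url stays in the dict, so whether that matters depends on what the code does with the rest of the dict afterwards.

```python
import json

raw = '{"url": "localhost:50000", "param1": "value1"}'

config = json.loads(raw)
url = config.pop('url', None)
print(sorted(config))  # ['param1']  ('url' was removed)

config = json.loads(raw)
url = config.get('url', None)
print(sorted(config))  # ['param1', 'url']  ('url' is still present)
```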
Thanks Maximilian! I've updated it in this commit: 9578559
self.assertEqual(
    PortableRunner._create_environment(PipelineOptions.from_dictionary({
        'environment_type': "EXTERNAL",
        'environment_config': ' {"url":"localhost:50000", '
Should we also be testing the case without any space at the beginning?
I updated the test for this part in this commit: 916e170
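A self-contained sketch of the extra coverage asked for above, exercising configs with and without leading whitespace. Because PortableRunner._create_environment needs Beam installed, this test targets a hypothetical stand-in, parse_environment_config, that mirrors the logic in the diff (whitespace-tolerant match, then json.loads and pop of url). It is not the actual test from 916e170.

```python
import json
import re
import unittest


def parse_environment_config(environment_config):
    """Hypothetical stand-in for the runner logic: returns (url, params)."""
    if re.match(r'\s*\{.*\}\s*$', environment_config):
        # json.loads itself ignores leading and trailing whitespace.
        config = json.loads(environment_config)
        return config.pop('url', None), config
    # Anything that does not look like a JSON object is treated as a bare URL.
    return environment_config, {}


class ParseEnvironmentConfigTest(unittest.TestCase):

    def test_json_with_and_without_leading_whitespace(self):
        for prefix in ('', ' ', '\t  '):
            with self.subTest(prefix=prefix):
                url, params = parse_environment_config(
                    prefix + '{"url": "localhost:50000", "k1": "v1"}')
                self.assertEqual(url, 'localhost:50000')
                self.assertEqual(params, {'k1': 'v1'})

    def test_plain_url(self):
        self.assertEqual(
            parse_environment_config('localhost:50000'),
            ('localhost:50000', {}))
```

Saved as a module, it can be run with python -m unittest.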
Force-pushed from 3a4b335 to 916e170.
Thanks!
@violalyu One minor note for future PRs: please also include the JIRA issue in the commit subject, just like in the GitHub issue title.
Originally, the environment config for the EXTERNAL environment type only supported passing in a URL for the external worker pool. We want to support passing arbitrary KV pairs to the SDK worker via the external environment config, so that when starting the SDK harness we can get the values from StartWorkerRequest.params.
Jira issue: https://issues.apache.org/jira/browse/BEAM-8351
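As an illustration of the feature described above, a sketch of building the pipeline arguments for the EXTERNAL environment, assuming the JSON form this PR introduces: url names the worker pool and every other key is forwarded to the worker. The key param1 and its value are hypothetical.

```python
import json

# Hypothetical extra key/value pairs meant for the SDK worker.
worker_params = {'param1': 'value1'}

# 'url' addresses the external worker pool; the other keys ride along.
environment_config = json.dumps(dict(url='localhost:50000', **worker_params))

pipeline_args = [
    '--environment_type=EXTERNAL',
    '--environment_config=' + environment_config,
]
print(pipeline_args)
```

These strings could then be handed to PipelineOptions. On a shell command line, the JSON value has to be quoted so the shell does not split or mangle it.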
See .test-infra/jenkins/README for the trigger phrase, status, and link of all Jenkins jobs.