[BEAM-6067] In Python SDK, specify pipeline_proto_coder_id property in non-Beam-standard CloudObject coders #7081
Conversation
Run Python PostCommit
robertwb left a comment:
This change looks fine to me.
@CraigChambersG Can you please rebase the PR so that we can run the test on it?
@angoenka How do I do that? Can you give me specific git command(s) to run? Thanks.
Specific commands you can run are:
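For illustration, a typical sequence for rebasing a fork branch on upstream master, with placeholder remote and branch names, offered as a sketch rather than the exact commands from this thread:

```bash
# Fetch the latest apache/beam master (assuming the remote is named "upstream")
# and replay this branch's commits on top of it.
git fetch upstream
git rebase upstream/master
# After resolving any conflicts, update the PR branch on the fork
# (assuming the fork's remote is named "origin").
git push --force-with-lease origin <your-branch-name>
```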
Force-pushed from e873c47 to 9db854d: …n non-Beam-standard CloudObject coders
Run Python PostCommit
How can I figure out why the postcommit test failed? I think it's a build failure or something, but I don't know what.
The most informative UI for that is to click "details" in the row that failed and then click "Gradle Build Scan". Here it is: https://scans.gradle.com/s/fgs32qpduwaug. It appears that the Python postcommit at https://scans.gradle.com/s/fgs32qpduwaug/console-log?task=:beam-sdks-python:postCommitIT runs in a way that is not reported/parsed as a collection of test methods, so you just have to scrape the logs.
Python SDK PostCommit Tests
Run Python PostCommit
Thanks. It's hard to see how my change would affect just that one BigQuery integration test. Maybe it's flaky, or sick for some other reason? But Robert looks to have rerun the Python postcommit tests and got a failure, so maybe there's something real here. I'll try running the tests again.
Run Python PostCommit |
…ther the FnAPI is being used, to avoid changing earlier behaviors
Run Python PostCommit |
# propagated everywhere. Returning an 'Any' as type hint will trigger
# usage of the fallback coder (i.e., cPickler).
element_type = typehints.Any
use_fnapi = False  # TODO(chambers): XXX do the right thing for this
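For context on what the TODO is standing in for: in the Dataflow runner, whether the FnAPI is in use is normally derived from the pipeline options via the beam_fn_api experiment. A minimal sketch of that check, assuming a PipelineOptions instance were reachable at this point (which it is not in the code above, which is the issue discussed below):

```python
from apache_beam.options.pipeline_options import DebugOptions

def _uses_fnapi(pipeline_options):
  # True when the pipeline was launched with --experiments=beam_fn_api.
  debug_options = pipeline_options.view_as(DebugOptions)
  return bool(debug_options.experiments and
              'beam_fn_api' in debug_options.experiments)
```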
This is unfortunate. Is there something better I can/should do here?
In general, passing around the use_fnapi flag is yucky. I'd much rather have the pipeline or the pipeline options be available in an instance variable. Is there a way to do that? Or a reason not to do that?
If we make options into an instance variable, then that cuts off the option for runners to run multiple pipelines with different options. Unconditionally setting it to false here seems the wrong thing to do, though; do we have any idea why this is needed (other than that the test fails otherwise)?
This is just a placeholder. I don't know how to get access to the pipeline options otherwise. If you tell me how, I'll fix it.
I'd like to understand why setting the pipeline_proto_coder_id attribute unconditionally breaks things. If that's not workable, I'd rather name this something other than use_fnapi if we don't need the coder id in this case.
One post-commit test failed, on a checksum comparison. I don't have any deeper understanding of what the test is doing or why there was a failure. I have had other experiences where tests were (brittlely) checking for equivalence against some expected representation which can be adversely affected by adding an otherwise unused property to CloudObjects.
To be clear, we do need the coder id in this case, at least when we support multi-output DoFns over the fnapi using the worker code that reads this property. We're not running such tests now. I need advice on how to get a hold of the pipeline object in this branch in order to put in the proper code. Also, the TODO in this branch suggests that the branch may be going away, so maybe it doesn't need to be fixed.
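For reference, the shape of the change under discussion is roughly the following. This is a hedged sketch with hypothetical names, not the PR's exact diff; as_cloud_object is the Python SDK's coder-to-CloudObject conversion, and the pipeline_proto_coder_id property is the one named in the PR title.

```python
def _attach_proto_coder_id(coder, coder_id, use_fnapi):
  # Hypothetical helper: convert a coder to its CloudObject form and, only
  # when the FnAPI is in use, record the id of the corresponding coder in
  # the portable pipeline proto. Keeping the property conditional leaves
  # non-FnAPI payloads byte-for-byte unchanged.
  cloud_object = coder.as_cloud_object()
  if use_fnapi:
    # FnAPI-aware worker code can read this property to look the coder up
    # in the pipeline proto instead of re-deriving it from the CloudObject.
    cloud_object['pipeline_proto_coder_id'] = coder_id
  return cloud_object
```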
Ping? What should I do to make progress on this PR? I'm OK submitting something that (a) doesn't break anything that already exists, and (b) makes some incremental progress on runners using Beam portability. I'm also happy to improve this CL, if given guidance on what to do.
I'd like to understand why this particular test failed and not others, since that may be indicative of other problems, rather than add a bunch of code to work around it; but at this point we probably shouldn't be blocking on that.
Let's just rename this something like emit_coder_ids and get it in.
(I also hope this code, and this whole file, won't live on much longer.)
I could rename this local variable, but that would be masking the intent. The intent of the local variable is indeed whether the CloudObject is being generated for a backend using the FnAPI. This particular line is "I don't know how to figure out if we're using the FnAPI, so for now just assume we're not, to preserve behavior for non-experimental backends; TODO: figure out how to tell". The comment is intending to capture that. The rest of the code in this function is acting as intended.
The one test that failed, which motivated adding all this use_fnapi stuff, doesn't take this branch, so its failure is unrelated to this line.
(Out of curiosity, what is your wish for how this code becomes obsolete?)
I read this as "if there are multiple outputs, don't use the fn api" but I understand your intent now.
Just thinking about this again, another option would be to simply choose any output on which to check for the fn api flag. This shouldn't(?) be called if there are no outputs.
As for the test failure, it feels like we're working around a still-present bug. But as you say, it may just be brittle testing.
As for how this code becomes obsolete, the Dataflow service should just accept FnApi protos directly, rather than have each SDK translate it to cloud objects just to try to get them translated to the right DFE objects on the other end.
I've updated this code to just pick the first output to get the pipeline options from. PTAL.
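A minimal sketch of that "pick the first output" approach, with hypothetical names (transform_node stands for the applied transform being translated; the actual change lives in the Dataflow runner's translation code):

```python
from apache_beam.options.pipeline_options import DebugOptions

def _uses_fnapi_for(transform_node):
  # Hypothetical helper: reach the pipeline options through any output of
  # the applied transform (they all belong to the same pipeline) and check
  # for the beam_fn_api experiment.
  outputs = list(transform_node.outputs.values())
  if not outputs:
    # Nothing to consult; fall back to the pre-existing non-FnAPI behavior.
    return False
  options = outputs[0].pipeline.options
  debug_options = options.view_as(DebugOptions)
  return bool(debug_options.experiments and
              'beam_fn_api' in debug_options.experiments)
```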
Run Python PostCommit
OK, it looks like the extra conditionalizing worked to get the one failing IT test to pass now. What's the process for reviewing and merging from here?
Run Python PostCommit
Run Python PostCommit
1 similar comment
Run Python PostCommit
This looks OK to me. (As an aside, I wonder why the transform nodes themselves don't have a reference to the pipeline...) I resolved the merge conflict, and will merge assuming all tests pass.
Is there something I need to do at this point to get this PR checked in?
Tests look good. Done.
R: @robertwb