
Conversation

@CraigChambersG (Contributor) commented Nov 19, 2018

R: @robertwb

Post-Commit Tests Status (on master branch): [build status badge table for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners]

@lukecwik (Member)

Run Python PostCommit

@robertwb (Contributor) left a comment


This change looks fine to me.

@angoenka (Contributor)

@CraigChambersG Can you please rebase the PR so that we can run the test on it?

@CraigChambersG (Contributor, Author)

@angoenka How do I do that? Can you give me specific git command(s) to run? Thanks.

@robertwb (Contributor)

Specific commands you can run are:

git fetch upstream          # Assuming you followed https://cwiki.apache.org/confluence/display/BEAM/Git+Tips
git rebase upstream/master  # at any time you can git rebase --abort
git push -f

@CraigChambersG (Contributor, Author)

Run Python PostCommit

@CraigChambersG (Contributor, Author)

How can I figure out why the postcommit test failed? I think it's a build failure of something, but I don't know what.

@kennknowles (Member)

The most informative UI for that is to click "details" in the row that failed and then click "Gradle Build Scan". Here it is: https://scans.gradle.com/s/fgs32qpduwaug. It appears that the Python postcommit at https://scans.gradle.com/s/fgs32qpduwaug/console-log?task=:beam-sdks-python:postCommitIT runs in a way that is not reported/parsed as a collection of test methods, so you just have to scrape the logs.

@kennknowles (Member) commented Nov 21, 2018

test_bigquery_tornadoes_it (apache_beam.examples.cookbook.bigquery_tornadoes_it_test.BigqueryTornadoesIT) ... FAIL

Expected checksum is 83789a7c1bca7959dcf23d3bc37e9204e594330f Actual checksum is d860e636050c559a16a791aff40d6ad809d4daf0
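
(Aside, for readers unfamiliar with these ITs: a checksum assertion of this kind generally hashes a canonical, order-independent rendering of the query results and compares the digest against a recorded value, so any change in the emitted rows shows up as a mismatch. A minimal sketch of the idea follows; the function and constant names are hypothetical, not the actual Beam test code.)

import hashlib

EXPECTED_CHECKSUM = '83789a7c1bca7959dcf23d3bc37e9204e594330f'  # value quoted from the failure above

def result_checksum(rows):
    # Hash a sorted, stringified view of the rows so the digest does not
    # depend on the order in which the runner produced them.
    sha = hashlib.sha1()
    for row in sorted(str(row) for row in rows):
        sha.update(row.encode('utf-8'))
    return sha.hexdigest()

def assert_matches(rows):
    actual = result_checksum(rows)
    assert actual == EXPECTED_CHECKSUM, (
        'Expected checksum is %s Actual checksum is %s' % (EXPECTED_CHECKSUM, actual))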

@robertwb (Contributor)

Python SDK PostCommit Tests

@robertwb (Contributor)

Run Python PostCommit

@CraigChambersG (Contributor, Author)

Thanks. It's hard to see how my change would affect just that one BigQuery integration test. Maybe it's flaky, or sick for some other reason? But Robert appears to have rerun the Python postcommit tests and got a failure, so maybe there's something real here. I'll try running the tests again.

@CraigChambersG (Contributor, Author)

Run Python PostCommit

…ther the FnAPI is being used, to avoid changing earlier behaviors
@CraigChambersG (Contributor, Author)

Run Python PostCommit

# propagated everywhere. Returning an 'Any' as type hint will trigger
# usage of the fallback coder (i.e., cPickler).
element_type = typehints.Any
use_fnapi = False # TODO(chambers): XXX do the right thing for this
Contributor Author


This is unfortunate. Is there something better I can/should do here?

In general, passing around the use_fnapi flag is yucky. I'd much rather have the pipeline or the pipeline options be available in an instance variable. Is there a way to do that? Or a reason not to do that?

Contributor


If we make options into an instance variable, then that cuts off the option for runners to run multiple pipelines with different options. Unconditionally setting it to false here seems the wrong thing to do, though; do we have any idea why this is needed (other than that the test fails otherwise)?

Contributor Author


This here is just a placeholder. I don't know how to get access to the pipeline options otherwise. If you tell me how, I'll fix it.

Contributor


I'd like to understand why setting the pipeline_proto_coder_id attribute unconditionally breaks things. If that's not workable, I'd rather name this something other than use_fnapi if we don't need the coder id in this case.

Contributor Author


One post-commit test failed, on a checksum comparison. I don't have any deeper understanding of what the test is doing or why it failed. I've had other experiences where tests were checking (brittlely) for equivalence against some expected representation, which can be adversely affected by adding an otherwise unused property to CloudObjects.

To be clear, we do need the coder id in this case, at least when we support multi-output DoFns over the fnapi using the worker code that reads this property. We're not running such tests now. I need advice on how to get a hold of the pipeline object in this branch in order to put in the proper code. Also, the TODO in this branch suggests that the branch may be going away, so maybe it doesn't need to be fixed.

Contributor Author


Ping? What should I do to make progress on this PR? I'm OK submitting something that (a) doesn't break anything that already exists, and (b) makes some incremental progress on runners using Beam portability. I'm also happy to improve this CL, if given guidance on what to do.

Contributor


I'd like to understand why this particular test failed and not others, since that may be indicative of other problems, rather than adding a bunch of code to work around it. But at this point we probably shouldn't be blocking on that.

Let's just rename this something like emit_coder_ids and get it in.

(I also hope this code in this whole file doesn't live on much longer.)

Contributor Author


I could rename this local variable, but that would be masking the intent. The intent of the local variable is indeed whether the CloudObject is being generated for a backend using the FnAPI. This particular line says "I don't know how to figure out if we're using the FnAPI, so for now just assume we're not, to preserve behavior for non-experimental backends; TODO: figure out how to tell." The comment is intended to capture that. The rest of the code in this function is acting as intended.

The one test that failed, which motivated adding all this use_fnapi stuff, doesn't take this branch, so its failure is unrelated to this line.

(Out of curiosity, what is your wish for how this code becomes obsolete?)

Contributor


I read this as "if there are multiple outputs, don't use the fn api" but I understand your intent now.

Just thinking about this again, another option would be to simply choose any output on which to check for the fn api flag. This shouldn't(?) be called if there are no outputs.

As for the test failure, it feels like we're working around a still-present bug. But as you say, it may just be brittle testing.

As for how this code becomes obsolete, the Dataflow service should just accept FnApi protos directly, rather than have each SDK translate it to cloud objects just to try to get them translated to the right DFE objects on the other end.

Contributor Author


I've updated this code to just pick the first output to get the pipeline options from. PTAL.
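
(Illustrative sketch only, not the actual diff: the idea is to reach the pipeline options through the first output PCollection instead of threading a use_fnapi flag through every call site, and then check the experiments flag that turns on the FnAPI. The helper name and the attribute paths into Beam internals below are assumptions, not quoted from the PR.)

from apache_beam.options.pipeline_options import DebugOptions

def _infer_use_fnapi(transform_node):
    # Hypothetical helper: look at the first output of the transform, walk
    # back to its pipeline's options, and check for the FnAPI experiment.
    outputs = list(transform_node.outputs.values())
    if not outputs:
        # Preserve the previous behavior when there is nothing to inspect.
        return False
    options = outputs[0].pipeline._options
    experiments = options.view_as(DebugOptions).experiments or []
    return 'beam_fn_api' in experiments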

@CraigChambersG (Contributor, Author)

Run Python PostCommit

@CraigChambersG (Contributor, Author)

OK, it looks like the extra conditionalizing worked and the one failing IT test now passes. What's the process for reviewing and merging from here?

@CraigChambersG (Contributor, Author)

Run Python PostCommit

@CraigChambersG (Contributor, Author)

Run Python PostCommit

@CraigChambersG (Contributor, Author)

Run Python PostCommit

@robertwb (Contributor) commented Dec 5, 2018

This looks OK to me. (As an aside, I wonder why the transform nodes themselves don't have a reference to the pipeline...) I resolved the merge conflict, and will merge assuming all tests pass.

@CraigChambersG (Contributor, Author)

Is there something I need to do at this point to get this PR checked in?

@robertwb merged commit 0edc85e into apache:master on Dec 6, 2018
@robertwb (Contributor) commented Dec 6, 2018

Tests look good. Done.
