[BEAM-7210] Fix non-deterministic row access #8474

Merged
merged 1 commit into from May 6, 2019

reuvenlax
Contributor

No description provided.

@reuvenlax reuvenlax requested a review from mxm May 2, 2019 18:07
Contributor

@mxm mxm left a comment

Looks good. Just wondering whether we should throw an error in case of non-deterministic schema extraction?

@@ -388,7 +388,7 @@ public void testInferredSchemaPipeline() {
             new DoFn<Inferred, String>() {
               @ProcessElement
               public void process(@Element Row row, OutputReceiver<String> r) {
-                r.output(row.getString(0) + ":" + row.getInt32(1));
+                r.output(row.getString("stringField") + ":" + row.getInt32("integerField"));
Contributor

So basically it is not safe to use AutoValue with Schemas. Should we convert the example to use a POJO instead (like it previously did)? Should we throw an error that schemas do not work deterministically in this case?

Contributor Author

It is perfectly safe. What's not safe is assuming that the order of the schema fields will match the order of the methods in your class. In fact, it's technically not safe for POJOs either, as AFAICT Java makes no guarantees about member order there.
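
To illustrate the point about ordering: the Javadoc for `Class.getDeclaredMethods()` states that the returned array is not sorted and is in no particular order. The sketch below is only an illustration, not Beam's schema-inference code; `GetterOrder`, the nested `Inferred` class, and `sortedGetterNames` are hypothetical names. It shows one way to derive an order that does not depend on what reflection happens to return, by sorting on the method name.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GetterOrder {
    // Hypothetical stand-in for a class whose schema is inferred from its getters.
    public abstract static class Inferred {
        public abstract String getStringField();
        public abstract int getIntegerField();
    }

    // Collects zero-argument "get*" methods and sorts them by name, so the
    // result is the same regardless of the order reflection returns them in.
    static List<String> sortedGetterNames(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.getName().startsWith("get")
                    && m.getParameterCount() == 0
                    && !m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Prints [getIntegerField, getStringField]: alphabetical order,
        // which differs from the declaration order in the class above.
        System.out.println(sortedGetterNames(Inferred.class));
    }
}
```

Note that the sorted order differs from the declaration order, which is exactly why code that reads a field by position can pick up the wrong one.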

Contributor

I just wondered whether we should throw an error when users try to access fields via a non-deterministic position? It seems like this could cause subtle bugs when two fields have the same type.

Contributor

@mxm mxm left a comment

Thanks for the fix @reuvenlax! I still wonder whether we should prevent non-deterministic positional row access, but this can be handled independently of this PR.

@mxm mxm merged commit bc1a5e2 into apache:master May 6, 2019
@reuvenlax
Contributor Author

reuvenlax commented May 6, 2019 via email

@mxm
Contributor

mxm commented May 8, 2019

I see. So on Rows users need to ensure that they either know the structure of the Row or go through the Schema which returns the correct index for a field name. Thanks!
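
A dependency-free sketch of that conclusion, under stated assumptions: `TinySchema` and `TinyRow` are hypothetical stand-ins written for this example and are not Beam's `Schema` and `Row` classes. The two rows hold the same data under two possible inferred field orders. Both fields are strings, so positional access silently returns the wrong value in one case, while lookup by name goes through the schema and is stable.

```java
import java.util.Arrays;
import java.util.List;

class NameLookupSketch {
    // Hypothetical minimal schema: an ordered list of field names.
    static final class TinySchema {
        private final List<String> fieldNames;

        TinySchema(String... names) {
            this.fieldNames = Arrays.asList(names);
        }

        // Resolves a field name to its position in this particular schema.
        int indexOf(String name) {
            int idx = fieldNames.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("No such field: " + name);
            }
            return idx;
        }
    }

    // Hypothetical minimal row: values stored in schema order.
    static final class TinyRow {
        private final TinySchema schema;
        private final Object[] values;

        TinyRow(TinySchema schema, Object... values) {
            this.schema = schema;
            this.values = values;
        }

        Object get(int index) {
            return values[index];
        }

        Object get(String name) {
            return values[schema.indexOf(name)];
        }
    }

    public static void main(String[] args) {
        // Same logical record, two possible field orders.
        TinyRow a = new TinyRow(new TinySchema("first", "last"), "Ada", "Lovelace");
        TinyRow b = new TinyRow(new TinySchema("last", "first"), "Lovelace", "Ada");

        System.out.println(a.get(0));       // Ada
        System.out.println(b.get(0));       // Lovelace: same index, different field
        System.out.println(a.get("first")); // Ada
        System.out.println(b.get("first")); // Ada: the name resolves through the schema
    }
}
```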

@reuvenlax
Contributor Author

reuvenlax commented May 8, 2019 via email


2 participants