Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12442] Make sure that row renames work properly in nested rows. #14960

Merged
merged 2 commits into from
Jun 10, 2021

Conversation

reuvenlax
Copy link
Contributor

No description provided.

@reuvenlax
Copy link
Contributor Author

R: @mouyang

@reuvenlax
Copy link
Contributor Author

Run Java PreCommit

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer if we could just test this through the public interface rather than exposing internals, but this LG overall

Schema expectedSchema =
Schema.builder()
.addStringField("top1")
.addRowField("top_nested", expectedNestedSchema)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional super nitty nit: I think this would be easier to grok if the nested schemas were defined inline.

}
}))
.setRowSchema(outputSchema);
}
}

// TODO(reuvenlax): For better performance, we should reuse functionality in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a jira for this

default:
return inputType;
}
}

// Apply the user-specified renames to the input schema.
private static Schema renameSchema(Schema inputSchema, Collection<RenamePair> renames) {
@VisibleForTesting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just test through the public interface, rather than exposing renameSchema and renameRows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how. The problem with the public interface is that it uses the DirectRunner, and the characteristic of this bug is that the DirectRunner tends to obscure the bug by serializing/deserializing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah. Shoot.

@mouyang
Copy link
Contributor

mouyang commented Jun 10, 2021

R: @mouyang

@reuvenlax LGTM. I have two general knowledge questions.

  1. I didn't get a chance to try out your user mailing list suggestion of reshuffle, but will this PR mean that I won't have to do a reshuffle anymore? It seems like it but I'm not 100% sure.
  2. Traversing the Schema and Row is preferred over reshuffling, correct?

@reuvenlax
Copy link
Contributor Author

@mouyang correct on both counts

@reuvenlax reuvenlax merged commit 9afaed7 into apache:master Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants