Skip to content

[BEAM-4461] Add Selected.flattenedSchema#10766

Merged
reuvenlax merged 3 commits intoapache:masterfrom
reuvenlax:make_unnest_great_again
Feb 8, 2020
Merged

[BEAM-4461] Add Selected.flattenedSchema#10766
reuvenlax merged 3 commits intoapache:masterfrom
reuvenlax:make_unnest_great_again

Conversation

@reuvenlax
Copy link
Contributor

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just some minor suggestions

Also: 💯 for your branch naming

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a minute to understand what's going on with "selectFields" here and in process. I had no idea that the value on FieldAccess could be an id referencing a member FieldAccessDescriptor until I dug into the code. Let's document that as part of BEAM-9217 as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does BEAM-4457 unblock this? it doesn't mention anything about wildcard descriptors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an old comment. I don't know that we've fully decided what to do about BEAM-4457

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field names can contain an underscore, right? So in theory we could still end up with a non-unique name here if the schema had two nested fields like:

  • layer1.layer2_layer3 -> "layer1_layer2_layer3"
  • layer1.layer2.layer3 -> "layer1_layer2_layer3"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, though I think we should maybe make underscore a reserved character in fields. We also don't prevent users from putting dots in field names, but all sorts of things will break if they do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added withFieldNameAs to allow the user to rename fields

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe keepMostNestedFieldName for consistency? up to you

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably indicate it will concat field names with underscores

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bike-shed: Nothing in this file actually makes this a default, seems to be a hold-over from when the function was in Unnest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar bike-shed: This mentions unnesting which isn't a thing after this change, maybe change to flattening?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nifty, but maybe made it into this PR by accident? I don't see it used anywhere

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's no concern with removing Unnest outright (rather than deprecating for a period and pointing to Select.flattenedSchema) since this is all Experimental?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - should be fine I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test for this in FieldAccessDescriptorTest, and adding a docstring

@TheNeuralBit TheNeuralBit self-requested a review February 7, 2020 02:08
@reuvenlax
Copy link
Contributor Author

retest this please

2 similar comments
@reuvenlax
Copy link
Contributor Author

retest this please

@reuvenlax
Copy link
Contributor Author

retest this please

@reuvenlax reuvenlax force-pushed the make_unnest_great_again branch from 4618ae8 to dca23a2 Compare February 7, 2020 22:20
@reuvenlax reuvenlax merged commit 2663354 into apache:master Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants