improve exception message for mismatched types when merging CapabilitiesBasedFormat#17760
Conversation
gianm
left a comment
There was a problem hiding this comment.
error message change LGTM, but I'm not sure about the logic change.
| merged.setHasMultipleValues(merged.hasMultipleValues().or(otherSnapshot.hasMultipleValues()).isTrue()); | ||
| merged.setDictionaryValuesSorted( | ||
| merged.areDictionaryValuesSorted().and(otherSnapshot.areDictionaryValuesSorted()).isTrue() | ||
| merged.areDictionaryValuesSorted().or(otherSnapshot.areDictionaryValuesSorted()).isTrue() |
There was a problem hiding this comment.
Doesn't the correct choice depend on how the actual segment merging is expected to happen? Like:
- if the merging preserves sortedness as long as both inputs are sorted, it should be
and - if the merging sorts if either input is sorted, it should be
or - if the merging always sorts regardless of input sortedness, it should just be hard coded to
TRUE - if the merging requires that both inputs have the same sortedness, it should be an error if the two inputs have different sortedness
this change should be in a separate PR anyway, since it's a behavioral change beyond improving the exception message.
There was a problem hiding this comment.
I think what I really need to do is update the javadoc on CapabilitiesBasedFormat to indicate that it's purpose is to capture the 'legacy' column formats that previously relied on merging ColumnCapabilities to work correctly, which in this case the change in this PR doesn't matter because it doesn't affect anything because these 2 properties are not used at all for decisions persisting stuff since they are just computed when the segment is loaded.
if the merging always sorts regardless of input sortedness, it should just be hard coded to TRUE
This is probably the most correct interpretation of the existing behavior for the things that rely on this ColumnFormat though it would be kind of odd to hard code to true because these settings are only applicable to STRING typed columns, which is why they are all using 'OR' logic instead of 'AND'.
Everything going forward should just implement ColumnFormat directly to capture what makes sense for the columns physical storage details, which would be the appropriate place to decide how to merge differing physical formats instead of using this capabilities based fallback thingy, so I should add javadoc to make this clear that CapabilitiesBasedFormat is not a "good" implementation of ColumnFormat.
There was a problem hiding this comment.
To add, your comment highlights why ColumnFormat exists in the first place, which is because simply merging ColumnCapabilities in a generic/universal manner isn't really powerful enough to model these things in a satisfying way.
There was a problem hiding this comment.
TY for the commentary, that makes sense.
Improves exception message when merging
CapabilitiesBasedFormatinIndexMergerV9when faced with mismatched complex types. Prior to this change merging a complex type with a string type for example would create an exception message with a message likeCannot merge columns of type[someComplex] and [null]since it was using the complex type name of both types, which will be null for non-complex types. After this change this exception message for complex types will be something likeCannot merge columns of type[COMPLEX<someComplex>] and [STRING]since it will now use the type strings.I also modified a merge behavior of
ColumnCapabilities.areDictionaryValuesSortedandColumnCapabilities.areDictionaryValuesUniqueinCapabilitiesBasedFormat.mergethat probably doesn't really matter since these are computed for query time stuff and they do not actually affect segment persistence but seemed odd behavior considering everything else in the merge, so now it is consistent and uses 'or' instead of 'and'.