-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix indexMerger to respect the includeAllDimensions flag #12428
Conversation
…ormat should set keepNullColumns if useFieldDiscovery is set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just nitpicking so approved, make the changes if you'd like, or not 🤷
@@ -1460,4 +1456,33 @@ private static <T extends TimeAndDimsPointer> T reorderRowPointerColumns( | |||
); | |||
} | |||
} | |||
|
|||
private static class DimensionMergerUtil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit, this doesn't really seem like a 'util', but does sort of remind me of some other thingswe call Inspector
though not sure what exactly would call this, naming is hard.
Also, it seems like only one place creates this, maybe should push dimensionsSpec
into the constructor of this thing and do the logic there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
In that vein, maybe DimensionsSpecInspector
or DimensionStorageChecker
would be a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to DimensionsSpecInspector
, but am not sure what the right name is for this. I think the code structure being not well-organized in IndexMerger
is probably one of the reasons that it's hard to find a good name. Perhaps, we should add a wrapper of DimensionMerger
and DimensionsSpecInspector
that creates ColumnDescriptor
. Or, there could be a even better refactoring idea than that. But I would like to do such refactoring not in this PR but in others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Very minor style suggestions.
FYI, there also seems to be a checkstyle error due to an extra import.
} | ||
|
||
@Test | ||
public void testUseFieldDiscovery_doNotChangeKeepNullColumnsUserSets() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we also add a test where JsonPathSpec.useFieldDiscovery = false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one.
|
||
@RunWith(Parameterized.class) | ||
public class HashPartitionMultiPhaseParallelIndexingWithNullColumnTest extends AbstractMultiPhaseParallelIndexingTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We should probably rename this test class now that it is a parameterized test and runs for DimensionRangePartitionsSpec
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks. I forgot to fix it before making this PR. Renamed it now.
"dim2", "val13" | ||
) | ||
); | ||
final List<Map<String, Object>> rows = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this part might be more readable if we add a private method that returns a row given the input values i.e.
private Map<String, Object> createTestRow(String ts, String... dims) {
// return map with "ts" -> ts, "dim1" -> dims[0] and so on.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored as suggested.
{ | ||
private final List<String> data; | ||
|
||
@JsonCreator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these tests need to serialize/deserialize to run the underlying tasks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbstractParallelIndexSupervisorTaskTest
provides a semi-integration testing environment. It serializes and deserializes the given task to mimic the actual task processing.
@@ -1460,4 +1456,33 @@ private static <T extends TimeAndDimsPointer> T reorderRowPointerColumns( | |||
); | |||
} | |||
} | |||
|
|||
private static class DimensionMergerUtil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
In that vein, maybe DimensionsSpecInspector
or DimensionStorageChecker
would be a better name?
@clintropolis @kfaraz thank you for your review! |
Description
includeAllDimensions
was added recently in #12276. WhenuseFieldDiscovery
andincludeAllDimensions
are both set,IndexMerger
should store every column even if it is empty. However, it currently doesn't respect theincludeAllDimensions
flag but only checks if the given column is explicitly specified in the dimensionsSpec. This PR fixes this bug by checking the flag properly.This PR also fixes another bug in
JsonInputFormat
that implicitly filters null fields out even whenuseFieldDiscovery
is set. Instead of filtering them out in an early stage in ingestion,keepNullColumns
will be set automatically ifuseFieldDiscovery
is set, so thatMapInputRowParser
can check the dimensionsSpec to decide whether to keep the null fields or not. This change will not make any user-facing change as 1)keepNullColumns
is undocumented and used only in the sampler and 2) even if there is someone usingkeepNullColumns
, the existing behavior should remain the same. The behavior changes only whenincludeAllDimensions
is set as well asuseFieldDiscovery
.Key changed/added classes in this PR
IndexMergerV9
JsonInputFormat
This PR has: