Skip to content

Address Bug Related to Map fields Containing Multiple Entries in DataTypeTransformer#13265

Closed
aishikbh wants to merge 2 commits into
apache:masterfrom
aishikbh:fixMultiValueBug
Closed

Address Bug Related to Map fields Containing Multiple Entries in DataTypeTransformer#13265
aishikbh wants to merge 2 commits into
apache:masterfrom
aishikbh:fixMultiValueBug

Conversation

@aishikbh
Copy link
Copy Markdown
Contributor

@aishikbh aishikbh commented May 29, 2024

Issue

In standardize method of DataTypeTransformer, we loop through the values if the value itself is an instance of object[] , we process each of the values of object[] and set the isSingleValue factor true assuming every value here is singleValue and again call standardize . We will get an exception in the case where the value field of a map is multivalued.

We will get an exception when 1. The map has multiple key entries and 2. If either values of the key entries are multivalue (List or Object[])

Example Code to Recreate the Issue:

public void testMap() {
    Map<String, List<String>> testMapWithMultiValueList = new HashMap<>();
    testMapWithMultiValueList.put("testKey1", Arrays.asList("testValue1", "testValue2"));
    Object[] values = new Object[]{
        new Object[0], testMapWithMultiValueList
    };
    Object[] expectedValues = new Object[]{"testValue1", "testValue2"};
    assertEqualsNoOrder((Object[]) DataTypeTransformer.standardize(COLUMN, values, false), expectedValues);
  }

Thrown exception :

java.lang.IllegalStateException: Cannot read single-value from Collection: [testValue1, testValue2] for column: testColumn

	at com.google.common.base.Preconditions.checkState(Preconditions.java:838)
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardizeCollection(DataTypeTransformer.java:197)
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:138)
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardizeCollection(DataTypeTransformer.java:181)
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:141)
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:154)
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformerTest.testMap(DataTypeTransformerTest.java:211)

Solution

Detect multivalued entries in the Object[] and modify the isSingleValue field accordingly.

* Fix for a bug where map type is passed as singlevalue when the value
  of the map contains a list or Object[]
@aishikbh aishikbh force-pushed the fixMultiValueBug branch from aa57ce6 to 35f8265 Compare May 30, 2024 05:20
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 30, 2024

Codecov Report

Attention: Patch coverage is 73.07692% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 62.17%. Comparing base (59551e4) to head (c6f28a4).
Report is 519 commits behind head on master.

Files Patch % Lines
...t/local/recordtransformer/DataTypeTransformer.java 73.07% 3 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13265      +/-   ##
============================================
+ Coverage     61.75%   62.17%   +0.42%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2534      +98     
  Lines        133233   139051    +5818     
  Branches      20636    21554     +918     
============================================
+ Hits          82274    86458    +4184     
- Misses        44911    46121    +1210     
- Partials       6048     6472     +424     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.14% <73.07%> (+0.43%) ⬆️
java-21 62.05% <73.07%> (+0.42%) ⬆️
skip-bytebuffers-false 62.16% <73.07%> (+0.42%) ⬆️
skip-bytebuffers-true 62.02% <73.07%> (+34.29%) ⬆️
temurin 62.17% <73.07%> (+0.42%) ⬆️
unittests 62.17% <73.07%> (+0.42%) ⬆️
unittests1 46.69% <0.00%> (-0.20%) ⬇️
unittests2 27.82% <73.07%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aishikbh aishikbh force-pushed the fixMultiValueBug branch from 7eadc1c to 874b395 Compare May 30, 2024 09:08
* Add ways to handle further multivalued lists/maps/collections
@aishikbh aishikbh force-pushed the fixMultiValueBug branch from 874b395 to c6f28a4 Compare May 30, 2024 09:10

// Check if the value itself is multivalued.
if (singleValue instanceof Object[] || singleValue instanceof List) {
standardizedValue = standardize(column, singleValue, false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. This can result in nested arrays being generated, and break the following part.

What you can do is to generate all standardized values for every elements, and continue when:

  • There is no Object[]
  • There is only one Object[], no other value (all nulls)

The same logic also applies to collections and map. But before getting into that, I'd invest why are we getting multiple nested arrays

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. On further inspection ComplexTypeConfig should be used to decipher such objects. Closing the PR since by the time the data comes to DataTypeTransformer it shouldn't be in this state (nested array, multi key multivalue map etc)

@aishikbh aishikbh closed this Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants