fix issue with nested column processing/storage of empty fields #19072

Merged
capistrant merged 7 commits into apache:master from clintropolis:fix-json-empty-field-name
Mar 4, 2026

@clintropolis
Member

Description

changes:
* fix bug in `NestedPathFinder.toNormalizedJsonPath` creating incorrect paths when given an empty field name
* fix `NestedPathFinder.parseJsonPath` to correctly detect illegal empty path segments produced by consecutive `.` characters
* added `NestedPathFinder.parseBadJsonPath` to read nested column field dictionaries, detect illegal path expressions, and fix them up, using a newly added `FieldsFixupIndexed` to swap the bad values with the corrected values
* `NestedDataColumnSupplier` attempts, on column read, to detect bad paths written by the bugged version of `NestedPathFinder.toNormalizedJsonPath`
* added a `pathParserVersion` field to the nested column part serde so that nested columns written after the bug fix can skip checking for the bug
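To make the first bullet concrete: the `parseBadJsonPath` javadoc in the review below names `$..a` and `$.[0].a` as bad paths produced by the bugged `toNormalizedJsonPath`. The following is a minimal sketch, not Druid's code, of how a normalizer that always emits dot notation produces exactly those strings for an empty field name, and how falling back to bracket-quoted notation (`$['']`) avoids them. `PathPart`, `FieldPart`, `ArrayPart`, and `PathNormalizerSketch` are hypothetical stand-ins for `NestedPathPart` and `NestedPathFinder`, and the bracket-quoting rule is an assumption about the shape of the fix.

```java
import java.util.List;

// Hypothetical path-part model; Druid's NestedPathPart types differ.
interface PathPart {}

final class FieldPart implements PathPart {
  final String name;

  FieldPart(String name) {
    this.name = name;
  }
}

final class ArrayPart implements PathPart {
  final int index;

  ArrayPart(int index) {
    this.index = index;
  }
}

final class PathNormalizerSketch {
  // Buggy shape: always appends ".name", so an empty name leaves a bare "."
  // and [field "", field "a"] becomes "$..a".
  static String toPathBuggy(List<PathPart> parts) {
    StringBuilder sb = new StringBuilder("$");
    for (PathPart part : parts) {
      if (part instanceof FieldPart) {
        sb.append('.').append(((FieldPart) part).name);
      } else {
        sb.append('[').append(((ArrayPart) part).index).append(']');
      }
    }
    return sb.toString();
  }

  // Assumed fix: use bracket-quoted notation whenever dot notation cannot
  // represent the name, including the empty string.
  static String toPathFixed(List<PathPart> parts) {
    StringBuilder sb = new StringBuilder("$");
    for (PathPart part : parts) {
      if (part instanceof FieldPart) {
        String name = ((FieldPart) part).name;
        if (isSimpleName(name)) {
          sb.append('.').append(name);
        } else {
          // Minimal escaping: only single quotes are escaped.
          sb.append("['").append(name.replace("'", "\\'")).append("']");
        }
      } else {
        sb.append('[').append(((ArrayPart) part).index).append(']');
      }
    }
    return sb.toString();
  }

  // Assumed rule: non-empty and made only of letters, digits, or underscores.
  static boolean isSimpleName(String name) {
    if (name.isEmpty()) {
      return false;
    }
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      if (!Character.isLetterOrDigit(c) && c != '_') {
        return false;
      }
    }
    return true;
  }
}
```

With these definitions, `toPathBuggy` turns `[FieldPart(""), FieldPart("a")]` into `$..a` and `[FieldPart(""), ArrayPart(0), FieldPart("a")]` into `$.[0].a`, while `toPathFixed` turns the first list into `$[''].a`.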
@clintropolis clintropolis force-pushed the fix-json-empty-field-name branch from 28f89c0 to fb1546c March 3, 2026 01:40
@JsonProperty("bitmapSerdeFactory") BitmapSerdeFactory bitmapSerdeFactory,
@JsonProperty("columnFormatSpec") @Nullable FormatSpec columnFormatSpec
@JsonProperty("columnFormatSpec") @Nullable FormatSpec columnFormatSpec,
@JsonProperty("pathParserVersion") @Nullable Byte pathParserVersion
Contributor

I don't see a `@JsonProperty` getter for this. Is it going to make it into the serialized form?

Member Author

good catch, fixed

Member Author

added a serde test too
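A minimal sketch of the version gate described in the change list, assuming a hypothetical `FIXED_PATH_PARSER_VERSION` constant (the real name and value are not shown in this PR page). Because the constructor parameter is `@Nullable`, a column serialized before this field existed would presumably deserialize with `null`, which has to be treated as "possibly written by the bugged code".

```java
// Hypothetical sketch; the constant name and value are assumptions, not Druid's.
final class PathParserVersionSketch {
  // Assumed version written by code that includes the empty-field fix.
  static final byte FIXED_PATH_PARSER_VERSION = 1;

  // A null version means the column predates the field, so it may have been
  // written by the bugged toNormalizedJsonPath and its fields must be checked.
  static boolean needsBadPathCheck(Byte pathParserVersion) {
    return pathParserVersion == null || pathParserVersion < FIXED_PATH_PARSER_VERSION;
  }
}
```

Columns written after the fix carry the newer version and skip the scan of their fields dictionary entirely.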



@VisibleForTesting
static Supplier<? extends Indexed<ByteBuffer>> getAndFixFieldsSupplier(
Contributor

Should have some javadoc explaining what "fix" means. Having a short description and then linking to a GitHub issue or PR for more details is a useful way to do it.

}

@VisibleForTesting
public static class FieldsFixupIndexed implements Indexed<ByteBuffer>
Contributor

Similar to the static creator method, this should have some javadoc explaining what "fix" means. Having a short description and then linking to a GitHub issue or PR for more details is a useful way to do it.

Member Author

added javadocs
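As a rough illustration of what "fix" means for `FieldsFixupIndexed`: the wrapper keeps every position of the fields dictionary and only substitutes the corrected path at positions that held a bad one, so field positions referenced elsewhere in the column presumably stay valid. This sketch wraps a `List<String>` instead of Druid's `Indexed<ByteBuffer>`, and `FixupListSketch` is a hypothetical name. Because corrected values can sort differently from the values they replace, it makes no sortedness claim, in line with the `isSorted` discussion below.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FieldsFixupIndexed: wraps List<String> rather than
// Indexed<ByteBuffer> so the sketch stays self-contained.
final class FixupListSketch extends AbstractList<String> {
  private final List<String> delegate;
  // position in the fields dictionary -> corrected path for that position
  private final Map<Integer, String> fixups;

  FixupListSketch(List<String> delegate, Map<Integer, String> fixups) {
    this.delegate = delegate;
    this.fixups = fixups;
  }

  @Override
  public String get(int index) {
    // Swap in the corrected path if this position held a bad one.
    String fixed = fixups.get(index);
    return fixed != null ? fixed : delegate.get(index);
  }

  @Override
  public int size() {
    // Same size as the original dictionary: nothing is added or removed.
    return delegate.size();
  }
}
```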

@Override
public boolean isSorted()
{
return delegate.isSorted();
Contributor

Is this always going to be true (will the fixed-up keys sort the same way as the bad keys)? Also, does it matter (does anything care if this indexed is sorted)?

Member Author

nope, good catch: removed this since we can't guarantee it, and added a note in the javadocs that sortedness doesn't matter here

return entry.getIntKey();
}
}
return delegate.indexOf(value);
Contributor

Should return this only if the key is not present in fixup. Otherwise it will return a nonnegative index for the bad keys.

Member Author

Do you mean if a caller is searching for the bad keys? I don't think most callers can do that: before calling this method we generally run the path through `NestedPathFinder.toNormalizedJsonPath`, because by that point the path expressions have usually been converted to `List<NestedPathPart>` during SQL planning.

Also, I believe it should be harmless for `indexOf` to report the positions of the bad paths when searching with the bad path expressions.

Added explanation to javadocs

Contributor

Yes, I meant if someone is searching for the bad keys. In that case the behavior here doesn't adhere to the Indexed interface. If we're going to deviate from it, that can be fine given it's only used in this specific place, but any intentional deviation should be javadoc'd.
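The `indexOf` trade-off in this thread can be sketched with a flag: in strict mode a search for a replaced bad path reports "not found", as the reviewer suggests the `Indexed` contract requires; in lenient mode it still resolves to the original position, which is the behavior the author argued is harmless and documented. `FixupIndexOfSketch` and its `strict` flag are hypothetical, and `List<String>` stands in for `Indexed<ByteBuffer>`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the indexOf trade-off; not Druid's FieldsFixupIndexed.
final class FixupIndexOfSketch {
  private final List<String> delegate;
  // corrected path -> position of the bad entry it replaced
  private final Map<String, Integer> fixedToIndex = new HashMap<>();
  // positions whose original (bad) value has been replaced
  private final Set<Integer> fixedPositions;
  private final boolean strict;

  FixupIndexOfSketch(List<String> delegate, Map<Integer, String> fixups, boolean strict) {
    this.delegate = delegate;
    this.strict = strict;
    this.fixedPositions = new HashSet<>(fixups.keySet());
    for (Map.Entry<Integer, String> e : fixups.entrySet()) {
      fixedToIndex.put(e.getValue(), e.getKey());
    }
  }

  /** Returns the position of {@code value}, or a negative number if it is not present. */
  int indexOf(String value) {
    // Corrected paths resolve to the position of the bad entry they replaced.
    Integer fixed = fixedToIndex.get(value);
    if (fixed != null) {
      return fixed;
    }
    int raw = delegate.indexOf(value);
    if (strict && raw >= 0 && fixedPositions.contains(raw)) {
      // The bad path is no longer a value of this view, so report "not found".
      return -1;
    }
    // Lenient: a bad path still maps to its original position.
    return raw;
  }
}
```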

}

/**
* split a JSONPath path into a series of extractors to find things in stuff. This method is mostly a duplicate of
Contributor

Surely something can make more sense than "find things in stuff".

Member Author

updated javadocs to be a bit clearer

/**
* split a JSONPath path into a series of extractors to find things in stuff. This method is mostly a duplicate of
* {@link #parseJsonPath(String)} fixing up any bugs encountered from bad paths like '$..a' or '$.[0].a' from
* previously bugged versions of {@link #toNormalizedJsonPath(List)} and should be kept in sync.
Contributor

Can these be made to share a common parse function that is parameterized with some boolean like allowBadPaths? If not, at least add a comment in parseJsonPath too, so someone modifying that can remember to update this one too.

Member Author

done, named the parameter `allowFixEmptyFieldsBadPaths` since it is specific about which bad paths it allows
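A simplified sketch of the shared parse function requested in this thread, using the parameter name from the reply. It handles only `.name`, `['name']`, and `[n]` segments, unlike the real `NestedPathFinder`. With `allowFixEmptyFieldsBadPaths` unset, an empty segment from consecutive `.` (or a `.` directly before `[`) is rejected; with it set, that segment is read back as the empty field name the bugged normalizer failed to quote. `JsonPathParseSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified parser; Druid's real parseJsonPath supports more syntax.
final class JsonPathParseSketch {
  // Returns parts as String (object field) or Integer (array index).
  static List<Object> parse(String path, boolean allowFixEmptyFieldsBadPaths) {
    if (!path.startsWith("$")) {
      throw new IllegalArgumentException("path must start with '$': " + path);
    }
    List<Object> parts = new ArrayList<>();
    int i = 1;
    while (i < path.length()) {
      char c = path.charAt(i);
      if (c == '.') {
        // Dot segment: the name runs until the next '.' or '[' or the end.
        int end = i + 1;
        while (end < path.length() && path.charAt(end) != '.' && path.charAt(end) != '[') {
          end++;
        }
        String name = path.substring(i + 1, end);
        if (name.isEmpty() && !allowFixEmptyFieldsBadPaths) {
          throw new IllegalArgumentException("empty field in path: " + path);
        }
        // With the flag set, "$..a" yields ["", "a"] instead of failing.
        parts.add(name);
        i = end;
      } else if (c == '[' && i + 1 < path.length() && path.charAt(i + 1) == '\'') {
        // Quoted field such as ['']; escapes inside the quotes are not handled.
        int close = path.indexOf("']", i + 2);
        if (close < 0) {
          throw new IllegalArgumentException("unterminated quoted field: " + path);
        }
        parts.add(path.substring(i + 2, close));
        i = close + 2;
      } else if (c == '[') {
        int close = path.indexOf(']', i);
        if (close < 0) {
          throw new IllegalArgumentException("unterminated array index: " + path);
        }
        parts.add(Integer.parseInt(path.substring(i + 1, close)));
        i = close + 1;
      } else {
        throw new IllegalArgumentException("unexpected '" + c + "' in path: " + path);
      }
    }
    return parts;
  }
}
```

For example, `parse("$..a", true)` yields `["", "a"]`, while `parse("$..a", false)` throws `IllegalArgumentException`; a correctly quoted path such as `$[''].a` parses to `["", "a"]` without the flag.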

@capistrant capistrant merged commit 9bb5fed into apache:master Mar 4, 2026
37 checks passed
@clintropolis clintropolis deleted the fix-json-empty-field-name branch March 4, 2026 02:07
4 participants