Add CaseInsensitive JSONExtract variants #83770
antaljanosbenjamin merged 13 commits into ClickHouse:master
Workflow [PR], commit [63d5ec8] Summary: ❌
Ah, didn't know about the aspell stuff; looks like the function names need excluding. Will fix.
@antaljanosbenjamin, I believe I've fixed all CI issues related to my changes; it seems the remaining failing tests may also be failing on other PRs and/or be flaky? I didn't want to just blindly push a commit to retry them. Let me know if there's anything more you'd like me to do.
Thanks, you are right. No reason to push just for that, they might be flaky. I will check your PR today.
tests/queries/0_stateless/02415_all_new_functions_must_have_version_information.reference
**Implementation Notes**

- When multiple keys match with different cases, the first match is returned
- Case-insensitive matching only applies to object keys, not to array indices or the extracted values
What do you mean by "not to array indices or the extracted values"? I don't see how key matching is relevant for those.
Fair, doesn't really need to be said.
The following functions perform case-insensitive key matching when extracting values from JSON objects. They work identically to their case-sensitive counterparts, except that object keys are matched without regard to case.

> These functions may be less performant than their case-sensitive counterparts, so use the regular JSONExtract functions if possible.
> These functions may be less performant than their case-sensitive counterparts, so use the regular JSONExtract functions if possible.
:::note
These functions may be less performant than their case-sensitive counterparts, so use the regular JSONExtract functions if possible.
:::
tests/queries/0_stateless/03567_json_extract_case_insensitive_edge_cases.sql
Thanks @antaljanosbenjamin; will let you know when everything is addressed.
Thanks for the fixes, I will check them tomorrow.
Hi @antaljanosbenjamin, how's it looking? 🙏
Sorry for the delay, I got swamped with higher-priority tasks. Checking it now.
@antaljanosbenjamin, should I push a new commit to trigger a new build? Looks like the failures may be environmental or flaky?
7f3a2fd
Hi @alistairjevans @antaljanosbenjamin, while reviewing this PR I found the following:
Happy to discuss; close anything that's wrong or already addressed.
Adding case-insensitive JSONExtract variants makes it much easier to search unstructured/loosely-structured data in a performant manner.
For example, we know that we want to get the "level" value from some JSON in a record, but some of the JSON data has "Level".
We can take advantage of the simdjson function `at_key_case_insensitive` (internally it iterates over the keys and compares each one with a case-insensitive `strncasecmp`). A compatibility function was added for rapidjson. This won't do full multi-byte-encoding-aware case-insensitive comparisons on JSON keys; I'd add explicit UTF8 variants only if those were really needed (not sure they are for JSON keys).
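For reviewers who want the lookup semantics in one place, here is a minimal standalone sketch of the behaviour described above: a linear scan over the object's keys, ASCII-only case folding, first match wins. It is not the code from this PR and uses neither simdjson nor rapidjson; `JsonObject`, `equalsIgnoreAsciiCase`, and `findKeyCaseInsensitive` are hypothetical names chosen for illustration, and the object is modelled as a plain vector of key/value strings in document order.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed JSON object: key/value pairs in document order.
using JsonObject = std::vector<std::pair<std::string, std::string>>;

// Lowercase a single byte if it is an ASCII letter; every other byte is left untouched.
inline unsigned char toLowerAscii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A')) : c;
}

// Byte-wise, ASCII-only case-insensitive equality. Multi-byte UTF-8 letters
// (for example "É" vs "é") are deliberately NOT treated as equal.
inline bool equalsIgnoreAsciiCase(std::string_view lhs, std::string_view rhs)
{
    if (lhs.size() != rhs.size())
        return false;
    for (std::size_t i = 0; i < lhs.size(); ++i)
    {
        if (toLowerAscii(static_cast<unsigned char>(lhs[i])) != toLowerAscii(static_cast<unsigned char>(rhs[i])))
            return false;
    }
    return true;
}

// Linear scan over the keys; returns the value of the FIRST key that matches,
// or std::nullopt when no key matches.
inline std::optional<std::string> findKeyCaseInsensitive(const JsonObject & object, std::string_view key)
{
    for (const auto & [candidate, value] : object)
    {
        if (equalsIgnoreAsciiCase(candidate, key))
            return value;
    }
    return std::nullopt;
}
```

With an object holding both `"Level"` and `"level"`, a lookup for `"LEVEL"` in this sketch returns the value stored under `"Level"`, since that key appears first. The scan is O(number of keys) per lookup, which is consistent with the note that these variants may be slower than the case-sensitive ones.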
I added a CaseInsensitive variant of all JSONExtract functions that accept a sub-path.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Users can now do case-insensitive JSON key lookups using `JSONExtractCaseInsensitive` (and other variants of `JSONExtract`).
Documentation entry for user-facing changes