Skip to content

Add CaseInsensitive JSONExtract variants#83770

Merged
antaljanosbenjamin merged 13 commits intoClickHouse:masterfrom
alistairjevans:alistairjevans/case-insensitive-json
Jul 23, 2025
Merged

Add CaseInsensitive JSONExtract variants#83770
antaljanosbenjamin merged 13 commits intoClickHouse:masterfrom
alistairjevans:alistairjevans/case-insensitive-json

Conversation

@alistairjevans
Copy link
Copy Markdown
Contributor

@alistairjevans alistairjevans commented Jul 15, 2025

Adding case-insensitive JSONExtract variants makes it much easier to search unstructured/loosely-structured data in a performant manner.

For example, we know that we want to get the "level" value from some JSON in a record, but some of the JSON data has "Level".

We can take advantage of the simdjson function at_key_case_insensitive (internally it does an iteration over keys with a case-insensitive strncasecmp). Compatibility function added for rapidjson.

This won't do full multi-byte-encoding-aware case-insensitive comparisons on JSON keys, but I'd add explicit UTF8 variants if those were really needed, and only if (not sure they're needed for JSON keys).

I added a CaseInsensitive variant of all JSONExtract functions that accept a sub-path.

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Users can now do case-insensitive JSON key lookups using JSONExtractCaseInsensitive (and other variants of JSONExtract).

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Jul 15, 2025

CLA assistant check
All committers have signed the CLA.

@antaljanosbenjamin antaljanosbenjamin self-assigned this Jul 15, 2025
@antaljanosbenjamin antaljanosbenjamin added the can be tested Allows running workflows for external contributors label Jul 15, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jul 15, 2025

Workflow [PR], commit [63d5ec8]

Summary:

job_name test_name status info comment
Integration tests (aarch64, distributed plan, 1/4) error
Stress test (arm_asan) failure
Server died FAIL
Hung check failed, possible deadlock found (see hung_check.log) FAIL
Killed by signal (in clickhouse-server.log) FAIL
Fatal message in clickhouse-server.log (see fatal_messages.txt) FAIL
Killed by signal (output files) FAIL

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Jul 15, 2025
@alistairjevans
Copy link
Copy Markdown
Contributor Author

Ah, didn't know about the aspell stuff, looks like the function names need excluding. Will fix.

@alistairjevans
Copy link
Copy Markdown
Contributor Author

alistairjevans commented Jul 16, 2025

@antaljanosbenjamin, I believe I've fixed all CI issues related to my changes; seems like the failing tests may be failing on other PRs and/or be flaky? I didn't want to just blindly push a commit to retry them.

Let me know if there's anything more you'd like me to do.

@antaljanosbenjamin
Copy link
Copy Markdown
Member

Thanks, you are right. No reason to push just for that, they might be flaky. I will check your PR today.

**Implementation Notes**

- When multiple keys match with different cases, the first match is returned
- Case-insensitive matching only applies to object keys, not to array indices or the extracted values
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "not to array indices or the extracted values"? I don't see how key matching is relevant for those.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, doesn't really need to be said.


The following functions perform case-insensitive key matching when extracting values from JSON objects. They work identically to their case-sensitive counterparts, except that object keys are matched without regard to case.

> These functions may be less performant than their case-sensitive counterparts, so use the regular JSONExtract functions if possible.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> These functions may be less performant than their case-sensitive counterparts, so use the regular JSONExtract functions if possible.
:::note
These functions may be less performant than their case-sensitive counterparts, so use the regular JSONExtract functions if possible.
:::

@alistairjevans
Copy link
Copy Markdown
Contributor Author

Thanks @antaljanosbenjamin; will let you know when everything is addressed.

@alistairjevans
Copy link
Copy Markdown
Contributor Author

alistairjevans commented Jul 16, 2025

A few more checks to run all checks are good barring that one flake @antaljanosbenjamin, but all your requested changes have been applied.

@antaljanosbenjamin
Copy link
Copy Markdown
Member

Thanks for the fixes, I will check them tomorrow.

@alistairjevans
Copy link
Copy Markdown
Contributor Author

Hi @antaljanosbenjamin, how's it looking? 🙏

@antaljanosbenjamin
Copy link
Copy Markdown
Member

Sorry for the delay, I got swamped with more priority tasks. Checking it now.

@alistairjevans
Copy link
Copy Markdown
Contributor Author

@antaljanosbenjamin, should I push a new commit to trigger a new build? Looks like the failing may be environmental/flake?

@antaljanosbenjamin
Copy link
Copy Markdown
Member

Stress test - #81144
Integration tests (aarch64, distributed plan, 1/4) - all tests succeeded, but the report is broken. Will merge this.

@antaljanosbenjamin antaljanosbenjamin added this pull request to the merge queue Jul 23, 2025
Merged via the queue into ClickHouse:master with commit 7f3a2fd Jul 23, 2025
238 of 243 checks passed
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 23, 2025
@clickgapai
Copy link
Copy Markdown
Contributor

Hi @alistairjevans @antaljanosbenjamin — while reviewing this PR I found the following:

Happy to discuss — close anything that's wrong or already addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants