[Improvement](JSONB) improve performance JSONB initial json parsing using simdjson #15219

Merged: 7 commits, Dec 29, 2022

@xiaokang xiaokang commented Dec 21, 2022

Proposed changes

Issue Number: close #xxx

Problem summary

Describe your changes.

test data: https://data.gharchive.org/2020-11-13-18.json.gz, 2GB, 197696 lines
before: String 13s vs. JSONB 28s
after: String 13s vs. JSONB 16s
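
To put the before/after figures side by side, here is a small sketch of the arithmetic. It assumes the four numbers are wall-clock seconds for the same workload on the test file (the thread does not say exactly what was timed); the `Timing` struct and helper names are made up for illustration.

```cpp
#include <cstdio>

// Hypothetical container for one measurement pair from the PR description.
struct Timing {
    double string_s;  // seconds with a String column
    double jsonb_s;   // seconds with a JSONB column
};

// Figures quoted in the PR description.
constexpr Timing kBefore{13.0, 28.0};
constexpr Timing kAfter{13.0, 16.0};

// Extra seconds JSONB costs on top of String in one measurement.
constexpr double extra_cost(Timing t) { return t.jsonb_s - t.string_s; }

// How many times faster the JSONB path became.
constexpr double speedup(Timing before, Timing after) {
    return before.jsonb_s / after.jsonb_s;
}

// Prints the derived figures; call from any driver program.
inline void print_summary() {
    std::printf("JSONB speedup: %.2fx\n", speedup(kBefore, kAfter));
    std::printf("Extra cost over String: %.0fs -> %.0fs\n",
                extra_cost(kBefore), extra_cost(kAfter));
}
```

Under that assumption, the JSONB path is 1.75x faster and its extra cost over String drops from 15 s to 3 s.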

NOTICE: simdjson needs to be patched because its BOOL identifier conflicts with the BOOL macro defined in ODBC's sqltypes.h.
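
The kind of conflict described in the notice can be reproduced in isolation. The sketch below is not the actual patch and does not use simdjson's real declarations: the `demo_json` namespace, the enum, and the renamed `BOOLEAN` enumerator are hypothetical, and the `#define BOOL int` form of the macro is an assumption about what sqltypes.h provides.

```cpp
#include <string>

// Assumed stand-in for the BOOL macro that the notice attributes to
// ODBC's sqltypes.h.
#ifndef BOOL
#define BOOL int
#endif

namespace demo_json {  // hypothetical library code, not simdjson itself

// With the macro in scope, an enumerator spelled BOOL cannot compile:
//   enum class element_type { STRING, BOOL, NULL_VALUE };
// is preprocessed into
//   enum class element_type { STRING, int, NULL_VALUE };
// Renaming the enumerator (one possible shape for a patch) avoids the clash.
enum class element_type { STRING, BOOLEAN, NULL_VALUE };

}  // namespace demo_json

// Maps the hypothetical enum to a readable name.
inline std::string type_name(demo_json::element_type t) {
    switch (t) {
        case demo_json::element_type::STRING: return "string";
        case demo_json::element_type::BOOLEAN: return "boolean";
        case demo_json::element_type::NULL_VALUE: return "null";
    }
    return "unknown";
}

// The macro still works as a type name for code that relies on it.
inline BOOL is_boolean(demo_json::element_type t) {
    return t == demo_json::element_type::BOOLEAN ? 1 : 0;
}
```

Renaming on the library side keeps the ODBC macro untouched for code that depends on it, which is one plausible reason to patch the library rather than `#undef` the macro; the thread itself does not show which approach the patch takes.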

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/sql/function and area/vectorization labels Dec 21, 2022

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

be/src/util/jsonb_parser_simd.h (3 suggestions, outdated and resolved)
@xiaokang xiaokang force-pushed the simdjson_for_jsonb branch 2 times, most recently from 7d90d79 to 0890799 Compare December 23, 2022 00:25

hello-stephen commented Dec 23, 2022

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.84 seconds
load time: 660 seconds
storage size: 17123687835 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221228124743_clickbench_pr_70100.html

dataroaring previously approved these changes Dec 26, 2022

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved label Dec 26, 2022
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved label Dec 26, 2022
@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@morningman morningman left a comment

LGTM

@morningman morningman merged commit 0f3c0b7 into apache:master Dec 29, 2022
morningman pushed a commit that referenced this pull request Dec 29, 2022
…sing simdjson (#15219)

test data: https://data.gharchive.org/2020-11-13-18.json.gz, 2GB, 197696 lines
before: String 13s vs. JSONB 28s
after: String 13s vs. JSONB 16s

**NOTICE: simdjson needs to be patched because its BOOL identifier conflicts with the BOOL macro defined in ODBC's sqltypes.h.**
4 participants