[improvement](catalog) avoid calling checksum when replaying creating jdbc catalog and fix ranger issue #22369

Merged
merged 4 commits into from
Aug 30, 2023

Conversation

morningman
Contributor

@morningman morningman commented Jul 30, 2023

Proposed changes

  1. jdbc
    Previously, the constructor of the JDBC catalog could trigger a checksum computation on the JDBC driver.
    However, the driver's download link may no longer be available during replay, which causes a replay error.

    This PR changes the logic so that the checksum is not computed when replaying the creation of a JDBC catalog.

  2. ranger
    After this PR, creating a catalog also tries to initialize its access controller, to make sure the configuration is valid.

  3. catalog priv check
    When creating, dropping, or altering a catalog, Doris uses only the internal access controller to check catalog-level privileges.
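
The idea behind the jdbc change can be illustrated with a minimal sketch. This is not the code from this PR: the class `JdbcCatalogSketch`, the `create` method, the `isReplay` flag, the `checksumOf` callback, and the property keys `driver_url` and `checksum` are all hypothetical names chosen for illustration. It only shows the general pattern of skipping a side-effecting step (fetching the driver to compute a checksum) when an object is rebuilt from a persisted log entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class JdbcCatalogSketch {
    /** Hypothetical stand-in for a catalog definition; not the Doris class. */
    static final class JdbcCatalogDef {
        final String name;
        final Map<String, String> props;

        JdbcCatalogDef(String name, Map<String, String> props) {
            this.name = name;
            this.props = props;
        }
    }

    /**
     * Builds a catalog definition from its properties.
     * checksumOf stands in for "download the driver and hash it" (assumed helper).
     * It is invoked only on first creation, never on replay.
     */
    static JdbcCatalogDef create(String name, Map<String, String> props,
                                 boolean isReplay, Function<String, String> checksumOf) {
        Map<String, String> copy = new HashMap<>(props);
        if (!isReplay) {
            // First creation: the driver URL is expected to be reachable.
            copy.put("checksum", checksumOf.apply(copy.get("driver_url")));
        }
        // Replay: trust the persisted properties and do not touch the URL.
        return new JdbcCatalogDef(name, copy);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("driver_url", "http://example.invalid/driver.jar");

        // Replay path: the callback would fail, but it is never called.
        JdbcCatalogDef replayed = create("jdbc_ctl", props, true, url -> {
            throw new IllegalStateException("driver link unavailable: " + url);
        });
        System.out.println("replayed props: " + replayed.props);
    }
}
```

Passing the checksum step in as a callback keeps the sketch free of network access; in a real system the replay flag would more likely come from the journal replay path itself.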


@morningman
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.29 seconds
stream load tsv: 542 seconds loaded 74807831229 Bytes, about 131 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17160189504 Bytes

zy-kkk
zy-kkk previously approved these changes Jul 30, 2023
Member

@zy-kkk zy-kkk left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 30, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Aug 28, 2023
@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.19 seconds
stream load tsv: 534 seconds loaded 74807831229 Bytes, about 133 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162010503 Bytes

@morningman morningman changed the title [improvement](jdbc-catalog) avoid calling checksum when replaying creating jdbc catalog [improvement](catalog) avoid calling checksum when replaying creating jdbc catalog and fix ranger issue Aug 28, 2023
@morningman
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.27 seconds
stream load tsv: 538 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17161948680 Bytes

@morningman
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.18 seconds
stream load tsv: 535 seconds loaded 74807831229 Bytes, about 133 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162301498 Bytes

@morningman
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.69 seconds
stream load tsv: 548 seconds loaded 74807831229 Bytes, about 130 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162433892 Bytes

Contributor

@Jibing-Li Jibing-Li left a comment

LGTM

Member

@zy-kkk zy-kkk left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 30, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@zy-kkk zy-kkk merged commit b740489 into apache:master Aug 30, 2023
25 checks passed
xiaokang pushed a commit that referenced this pull request Aug 30, 2023
… jdbc catalog and fix ranger issue (#22369)
@xiaokang xiaokang mentioned this pull request Sep 30, 2023
@xiaokang xiaokang mentioned this pull request Dec 3, 2023
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.2-merged reviewed

5 participants