Conversation

BiteTheDDDDt
Contributor

@BiteTheDDDDt BiteTheDDDDt commented Apr 30, 2025

No description provided.

update

update
@rschu1ze rschu1ze changed the title support apache doris Add Apache Doris May 3, 2025
@BiteTheDDDDt
Contributor Author

@rschu1ze Hi, I updated the SQL and got new results on the correct machine (AWS m6i.8xlarge).
The error message "Same backend already exists[127.0.0.1:9050]" appears because ALTER SYSTEM ADD BACKEND was run twice; it does not affect the subsequent tests.

@rschu1ze
Member

@BiteTheDDDDt Thanks for updating the results. Perhaps you would also like to add measurements from your AWS m6i.8xlarge instance for the 1m, 10m and 100m cases?

To double-check, I re-ran the script locally and got this:

sysctl: permission denied on key "vm.max_map_count", ignoring
vm.max_map_count = 2000000
./start.sh: line 5: ulimit: open files: cannot modify limit: Operation not permitted
Set kernel parameter 'vm.max_map_count' to a value greater than 2000000, example: 'sysctl -w vm.max_map_count=2000000'
Sleep 30 sec
Sleep 10 sec
Create database
Execute DDL
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = replication num should be less than the number of available backends. replication num is 1, available backend num is 0
Load data
Processing file: /home/ubuntu/data/bluesky/file_0001.json.gz
null
null
null
null
null
null
null
null
null
null
ERROR 1051 (42S02) at line 1: errCode = 7, detailMessage = table not found, tableName=bluesky
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Running queries on database: bluesky_1m_materialized
Clearing file system cache...
File system cache cleared.
Running query: SELECT collection AS event, COUNT(*) AS count FROM bluesky GROUP BY event ORDER BY count DESC;
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
Clearing file system cache...
File system cache cleared.
Running query: SELECT collection AS event, COUNT(*) AS count, COUNT(DISTINCT did) AS users FROM bluesky WHERE kind = 'commit' AND operation = 'create' GROUP BY event ORDER BY count DESC;
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
Clearing file system cache...
File system cache cleared.
Running query: SELECT collection AS event, HOUR(time) AS hour_of_day, COUNT(*) AS count FROM bluesky WHERE kind = 'commit' AND operation = 'create' AND collection IN ('app.bsky.feed.post', 'app.bsky.feed.repost', 'app.bsky.feed.like') GROUP BY event, hour_of_day ORDER BY hour_of_day, event;
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
Clearing file system cache...
File system cache cleared.
Running query: SELECT did AS user_id, MIN(time) AS first_post_ts FROM bluesky WHERE kind = 'commit' AND operation = 'create' AND collection = 'app.bsky.feed.post' GROUP BY user_id ORDER BY first_post_ts ASC LIMIT 3;
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
Clearing file system cache...
File system cache cleared.
Running query: SELECT did AS user_id, MILLISECONDS_DIFF(MAX(time),MIN(time)) AS activity_span FROM bluesky WHERE kind = 'commit' AND operation = 'create' AND collection = 'app.bsky.feed.post' GROUP BY user_id ORDER BY activity_span DESC LIMIT 3;
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_materialized].
Response time:  s
Result written to _m6i.8xlarge_bluesky_1m_materialized.results_runtime
Dropping table: bluesky_1m_materialized.bluesky
Create database
Execute DDL
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = replication num should be less than the number of available backends. replication num is 1, available backend num is 0
Load data
Processing file: /home/ubuntu/data/bluesky/file_0001.json.gz
null
null
null
null
null
null
null
null
null
null
ERROR 1051 (42S02) at line 1: errCode = 7, detailMessage = table not found, tableName=bluesky
ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = Table [bluesky] does not exist in database [bluesky_1m_default].
Running queries on database: bluesky_1m_default
Clearing file system cache...
File system cache cleared.
Running query: SELECT cast(data['commit']['collection'] AS TEXT ) AS event, COUNT(*) AS count FROM bluesky GROUP BY event ORDER BY count DESC;
[...]

As a result, all result files (e.g. "doris/_m6i.8xlarge_bluesky_100m_default.results_runtime") were empty.

I suppose the error is this:

ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = replication num should be less than the number of available backends. replication num is 1, available backend num is 0

What does that mean exactly, and how can the error be fixed?

Also, doris/run_queries.sh contains this:

mysql -P 9030 -h 127.0.0.1 -u root $DB_NAME -e "set global parallel_pipeline_task_num=32;"
mysql -P 9030 -h 127.0.0.1 -u root $DB_NAME -e "set global enable_parallel_scan=false;"

Ideally, all databases in JSONBench run with default settings. If that is not possible, please document what each special setting does and why it is necessary. Thanks!

@rschu1ze
Member

Sorry to ask again: when I run the script in a fresh Ubuntu 24.04 environment, I get this error:

ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = replication num should be less than the number of available backends. replication num is 1, available backend num is 0

and all result files are empty. How can this be fixed?

@BiteTheDDDDt
Contributor Author

It seems the Doris backend is not ready. Can you check $DORIS_FULL_NAME/be/log/be.out and $DORIS_FULL_NAME/be/log/be.warning to find the reason?
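A script can also check for a live backend before it runs the DDL, so that a missing BE fails early instead of producing empty result files. The sketch below is an illustration under assumptions, not code from this PR: the function names are hypothetical, and it assumes that `SHOW BACKENDS\G` prints one `Alive: true` line per live backend. The port, host and user match the ones used in doris/run_queries.sh.

```shell
# Count backends reported as alive in `SHOW BACKENDS\G` output read from stdin.
# Assumption: each live backend appears as a line of the form "Alive: true".
count_alive_backends() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c -E '^[[:space:]]*Alive:[[:space:]]*true' || true
}

# Poll the FE until at least one backend is alive.
# $1 is the number of attempts (default 30), with 2 seconds between attempts.
wait_for_backend() {
    wfb_tries="${1:-30}"
    wfb_i=0
    while [ "$wfb_i" -lt "$wfb_tries" ]; do
        wfb_alive=$(mysql -P 9030 -h 127.0.0.1 -u root -e 'SHOW BACKENDS\G' 2>/dev/null | count_alive_backends)
        if [ "${wfb_alive:-0}" -ge 1 ]; then
            return 0
        fi
        wfb_i=$((wfb_i + 1))
        sleep 2
    done
    return 1
}
```

A caller could then write `wait_for_backend 30 || { echo "no Doris backend is alive" >&2; exit 1; }` before the "Create database" step.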

@BiteTheDDDDt
Contributor Author

@rschu1ze Hi, did you get a chance to check the logs? If there's still a problem, feel free to share the output; I'd be happy to help debug further. Thanks again for your time and help reviewing this PR!

@rschu1ze
Member

The BE logs were empty and the reason was that the BE node didn't start. When I started it manually, it complained (like above) about

Set max number of open file descriptors to a value greater than 60000.
Ask your system manager to modify /etc/security/limits.conf and append content like
  * soft nofile 655350
  * hard nofile 655350
and then run 'ulimit -n 655350' to take effect on current session.

I then commented out that check in start_be.sh, which finally worked.

Just started the full test suite for reproduction ...
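An alternative to commenting out the check is to verify, and where permitted raise, the open-file limit in the shell that launches the BE. This is a rough sketch under assumptions: the function names are hypothetical, the threshold 60000 and the target 655350 are taken from the start-up message quoted above, and raising the limit only succeeds if the hard limit allows it.

```shell
# Minimum taken from the BE start-up message ("greater than 60000").
REQUIRED_NOFILE=60000

# Succeed when the given limit (a number or "unlimited") exceeds the minimum.
nofile_ok() {
    case "$1" in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input
    esac
    [ "$1" -gt "$REQUIRED_NOFILE" ]
}

# Try to raise the soft limit for the current shell, then re-check it.
# Returns non-zero if the limit is still too low afterwards.
ensure_nofile() {
    if nofile_ok "$(ulimit -n)"; then
        return 0
    fi
    ulimit -n 655350 2>/dev/null || true
    nofile_ok "$(ulimit -n)"
}
```

Running `ensure_nofile || echo "raise nofile in /etc/security/limits.conf first" >&2` ahead of start_be.sh would surface the problem before the BE silently fails to start.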

@rschu1ze rschu1ze self-assigned this May 19, 2025
@BiteTheDDDDt
Contributor Author

Hi, I noticed that you've already uploaded your test results. Is there anything else I need to do? If so, please let me know.

@rschu1ze
Member

@BiteTheDDDDt It turns out there are now a number of submissions for JSONBench (this PR, but also others) where the structure of the original JSON document is not retained (i.e. flattening, see here). This is fine; however, to let users distinguish the two cases, we'll need a button in the JSONBench UI. I asked a colleague to take a look, and then I am happy to merge. Thanks for this PR!

@rschu1ze
Member

Following up, the same consideration applies here as in #77. Made the corresponding adjustments (removed the materialized results).

@rschu1ze rschu1ze merged commit 92302e2 into ClickHouse:main May 26, 2025