
[Bug] [StreamLoad] writing JSON has become exceptionally slow #35306

Closed
15767714253 opened this issue May 23, 2024 · 2 comments
Comments

@15767714253
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.3.5

What's Wrong?

My Table

CREATE TABLE dwd_ess_big_cell_inc
(
    time                          DATETIME    NOT NULL COMMENT '',
    namespace_code                VARCHAR(64) NOT NULL COMMENT '',
    device_instance_property_code VARCHAR(64) NOT NULL COMMENT '',
    device_instance_code          VARCHAR(64) NOT NULL COMMENT '',
    value                         VARCHAR(64) NULL COMMENT '',
    kafka_time                    DATETIME    NOT NULL COMMENT 'creation time',
    create_time                   DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time'
) ENGINE = OLAP
UNIQUE KEY(time, namespace_code, device_instance_property_code, device_instance_code)
COMMENT ''
PARTITION BY RANGE (time) ()
DISTRIBUTED BY HASH(time, namespace_code, device_instance_property_code, device_instance_code)
PROPERTIES
(
    "min_load_replica_num" = "1",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "HOUR",
    "dynamic_partition.start" = "-24",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "24",
    "dynamic_partition.replication_num" = "3",
    "compaction_policy" = "time_series",
    "enable_unique_key_merge_on_write" = "false"
);

Flink Doris connector config
"properties": {
    "format": "json",
    "timezone": "Asia/Shanghai",
    "read_json_by_line": "true",
    "send_batch_parallelism": 10,
    "memtable_on_sink_node": "true",
    "columns": "time,time=from_unixtime(round(time/1000,0)),namespace_code,device_instance_property_code,device_instance_code,value,kafka_time,kafka_time=from_unixtime(round(kafka_time/1000,0))"
},
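For reference, the `columns` mapping above converts the epoch-millisecond `time` and `kafka_time` fields to DATETIME values during load. A minimal Python sketch of the same transform (field names come from the config; the conversion logic here is only an illustration of what `from_unixtime(round(x/1000,0))` computes, not the connector's actual code — also note that SQL `from_unixtime` evaluates in the session time zone, Asia/Shanghai per this config, while UTC is used below so the example is deterministic):

```python
from datetime import datetime, timezone

def from_unixtime_millis(ms: int) -> str:
    """Mirror from_unixtime(round(ms/1000, 0)): epoch millis -> 'YYYY-MM-DD HH:MM:SS'.

    Python's round() uses banker's rounding, so exact .5 cases may
    differ from SQL's round(); this is close enough for illustration.
    """
    seconds = round(ms / 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Hypothetical row as it would arrive from Kafka, with millisecond timestamps.
row = {"time": 1716422400123, "kafka_time": 1716422400789}
row["time"] = from_unixtime_millis(row["time"])
row["kafka_time"] = from_unixtime_millis(row["kafka_time"])
print(row)  # both fields are now DATETIME-formatted strings
```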

My FE Config
enable_single_replica_load = true
fetch_stream_load_record_interval_second = 30

My BE Config
number_tablet_writer_threads = 48
streaming_load_json_max_mb = 1024
enable_single_replica_load = true
jsonb_type_length_soft_limit_bytes = 2147483643
string_type_length_soft_limit_bytes = 2147483643
enable_stream_load_record = true
max_send_batch_parallelism_per_job = 20
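With `format=json` and `read_json_by_line=true`, each Stream Load batch is newline-delimited JSON: one object per line with keys matching the source fields. A small Python sketch of building such a payload (column names are taken from the table above; the values are made up for illustration):

```python
import json

# Hypothetical source rows; time/kafka_time are epoch milliseconds,
# which the connector's "columns" mapping converts to DATETIME on load.
rows = [
    {"time": 1716422400123, "namespace_code": "ns1",
     "device_instance_property_code": "p1", "device_instance_code": "d1",
     "value": "3.14", "kafka_time": 1716422400456},
    {"time": 1716422401123, "namespace_code": "ns1",
     "device_instance_property_code": "p2", "device_instance_code": "d1",
     "value": "2.71", "kafka_time": 1716422401456},
]

# One JSON object per line, as read_json_by_line expects.
payload = "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)
print(payload)
```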

Cluster: 1 FE and 3 BE (4 nodes, each with 64 GB RAM and 32 vCPUs).
(screenshot of cluster resources)

StreamLoad Result

(screenshot of the Stream Load result)

Sometimes the result looks like this instead:
(screenshot of another Stream Load result)

What You Expected?

In my earlier tests, 10 concurrent processes each committing one million rows finished in under 10 seconds. I don't understand why writes to this new cluster have become so slow: it contains only this single table and has plenty of spare resources. I would like to find out what the problem is.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@JNSimba (Member) commented May 24, 2024

Can you upgrade to 1.6.1?

@15767714253 (Contributor, Author)

> Can you upgrade to 1.6.1?

OK!
