
[Bug] Broker load label has already been used #12628

@yongjinhou

Search before asking

  • I have searched the issues and found no similar issues.

Version

5b6d48e

What's Wrong?

This bug was introduced by #12275.


CREATE TABLE test_sys_load_func_strict_test_md5sum_db.test_sys_load_func_strict_test_md5sum_tb_s
(k1 BIGINT NULL, k2 BIGINT NULL, k3 BIGINT NULL, k4 BIGINT NULL, k5 CHAR(64) NULL)
DUPLICATE KEY(k1, k2, k3)
PARTITION BY RANGE(k1)
(
    PARTITION partition_a VALUES LESS THAN ("100000"),
    PARTITION partition_b VALUES LESS THAN ("1000000000"),
    PARTITION partition_c VALUES LESS THAN ("10000000000"),
    PARTITION partition_d VALUES LESS THAN MAXVALUE
)
DISTRIBUTED BY HASH(k1) BUCKETS 13;

LOAD LABEL test_sys_load_func_strict_test_md5sum_db.label_13_03_51_28_096733_745838666
(
    DATA INFILE("hdfs://xxx/user/palo/test/data/sys/verify/timestamp_load_file")
    INTO TABLE `test_sys_load_func_strict_test_md5sum_tb_s`
    PARTITION (partition_a, partition_b, partition_c, partition_d)
    COLUMNS TERMINATED BY ","
    (`k1`, `k2`, `k3`, `k4`)
    SET (k5 = md5sum(k1))
)
WITH BROKER "ahdfs" ("username"="xxx", "password"="xxx")
PROPERTIES ("max_filter_ratio"="0.05", "strict_mode"="True");

SHOW LOAD FROM test_sys_load_func_strict_test_md5sum_db
WHERE LABEL="label_13_03_51_28_096733_745838666";

When the SQL above is executed to import data through a broker, the error "Label has already been used" is always reported, even for labels that have never been used:

         JobId: 12010
         Label: label_13_03_51_28_096733_745838666
         State: CANCELLED
      Progress: ETL:N/A; LOAD:N/A
          Type: BROKER
       EtlInfo: NULL
      TaskInfo: cluster:N/A; timeout(s):14400; max_filter_ratio:0.05
      ErrorMsg: type:ETL_RUN_FAIL; msg:errCode = 2, detailMessage = Label [label_13_03_51_28_096733_745838666] has already been used, relate to txn [1004]
    CreateTime: 2022-09-15 17:15:05
  EtlStartTime: NULL
 EtlFinishTime: NULL
 LoadStartTime: NULL
LoadFinishTime: 2022-09-15 17:15:08
           URL: NULL
    JobDetails: {"Unfinished backends":{},"ScannedRows":0,"TaskNumber":0,"LoadBytes":0,"All backends":{},"FileNumber":0,"FileSize":0}
 TransactionId: 0
  ErrorTablets: {}

The FE logs are as follows:

2022-09-15 17:15:08,432 WARN (pending-load-task-scheduler-pool-5|290) [LoadTask.exec():73] LOAD_JOB=12010, error_msg={Unexpected failed to execute load task}
java.lang.NullPointerException: null
	at org.apache.doris.load.Load.initColumns(Load.java:988) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.Load.initColumns(Load.java:813) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.planner.BrokerScanNode.initColumns(BrokerScanNode.java:286) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.planner.BrokerScanNode.initParams(BrokerScanNode.java:248) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.planner.BrokerScanNode.init(BrokerScanNode.java:189) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.LoadingTaskPlanner.plan(LoadingTaskPlanner.java:158) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.LoadLoadingTask.init(LoadLoadingTask.java:109) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.BrokerLoadJob.createLoadingTask(BrokerLoadJob.java:207) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.BrokerLoadJob.onPendingTaskFinished(BrokerLoadJob.java:164) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.BrokerLoadJob.onTaskFinished(BrokerLoadJob.java:123) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.LoadTask.exec(LoadTask.java:65) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.task.MasterTask.run(MasterTask.java:31) ~[doris-fe.jar:1.0-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_275]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_275]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_275]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_275]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_275]
2022-09-15 17:15:08,433 INFO (pending-load-task-scheduler-pool-6|291) [BrokerLoadPendingTask.executeTask():59] begin to execute broker pending task. job: 12010
2022-09-15 17:15:08,472 INFO (pending-load-task-scheduler-pool-6|291) [BrokerLoadPendingTask.getAllFileStatus():122] get 1 files in file group 0 for table [10009: [10005, 10006, 10007, 10008]]. size: 487. job: 12010, broker: TNetworkAddress(hostname:10.216.181.34, port:8000) 
2022-09-15 17:15:08,472 INFO (pending-load-task-scheduler-pool-6|291) [BrokerLoadPendingTask.getAllFileStatus():134] get 1 files to be loaded. total size: 487. cost: 39 ms, job: 12010
2022-09-15 17:15:08,472 WARN (pending-load-task-scheduler-pool-6|291) [LoadTask.exec():69] LOAD_JOB=12010, error_msg={Failed to execute load task}
org.apache.doris.common.LabelAlreadyUsedException: errCode = 2, detailMessage = Label [label_13_03_51_28_096733_745838666] has already been used, relate to txn [1004]
	at org.apache.doris.transaction.DatabaseTransactionMgr.beginTransaction(DatabaseTransactionMgr.java:322) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.transaction.GlobalTransactionMgr.beginTransaction(GlobalTransactionMgr.java:145) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.BrokerLoadJob.beginTxn(BrokerLoadJob.java:98) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.BrokerLoadPendingTask.executeTask(BrokerLoadPendingTask.java:61) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.load.loadv2.LoadTask.exec(LoadTask.java:63) ~[doris-fe.jar:1.0-SNAPSHOT]
	at org.apache.doris.task.MasterTask.run(MasterTask.java:31) ~[doris-fe.jar:1.0-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_275]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_275]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_275]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_275]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_275]
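A simplified, hypothetical Java model of how a task failure followed by a retry can surface as the label error seen in these traces. LabelReuseDemo, TxnManager, planLoadingTask and runJob are illustrative names, not Doris FE classes; the retry loop is an assumption about how the failed pending task is rescheduled, and txn ids start at 1004 only to mirror the log.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the failure sequence in the FE log; not Doris code.
public class LabelReuseDemo {

    static class LabelAlreadyUsedException extends Exception {
        LabelAlreadyUsedException(String label, long txnId) {
            super("Label [" + label + "] has already been used, relate to txn [" + txnId + "]");
        }
    }

    // Stand-in for a per-database transaction manager: label -> txn id.
    static class TxnManager {
        private final Map<String, Long> labelToTxn = new HashMap<>();
        private long nextTxnId = 1004; // chosen only to mirror the log

        long beginTransaction(String label) throws LabelAlreadyUsedException {
            Long existing = labelToTxn.get(label);
            if (existing != null) {
                throw new LabelAlreadyUsedException(label, existing);
            }
            long txnId = nextTxnId++;
            labelToTxn.put(label, txnId);
            return txnId;
        }
    }

    // Stand-in for planning the loading task with a srcSlotIds that was never created.
    static void planLoadingTask(List<Integer> srcSlotIds) {
        srcSlotIds.add(1); // throws NullPointerException when srcSlotIds is null
    }

    // Assumed retry behaviour: an unexpected error re-runs the whole pending
    // task, which calls beginTransaction again with the same label.
    static String runJob(TxnManager mgr, String label, int maxAttempts) {
        String lastError = "unknown";
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                mgr.beginTransaction(label);
                planLoadingTask(null);
                return "FINISHED";
            } catch (LabelAlreadyUsedException e) {
                return "CANCELLED: " + e.getMessage();
            } catch (NullPointerException e) {
                lastError = "NPE"; // only logged; the job is retried
            }
        }
        return "CANCELLED: " + lastError;
    }

    public static void main(String[] args) {
        TxnManager mgr = new TxnManager();
        System.out.println(runJob(mgr, "label_13_03_51_28_096733_745838666", 3));
    }
}
```

In this model the first attempt registers the label under txn 1004 and then hits the NPE; the second attempt is rejected by beginTransaction, so the job's final status carries only the label error and not the NPE that caused it.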

What You Expected?

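When data is imported through a broker with an unused label, the data should be loaded normally.
srcSlotIds should be created before it is used; otherwise, a NullPointerException (NPE) is thrown in Load.initColumns. The FE log suggests that "Label has already been used" is only a secondary symptom of this NPE: after the loading task fails, the pending task of job 12010 runs again, and its beginTxn call is rejected because the label is already bound to txn 1004.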
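To illustrate the fix direction, here is a minimal Java sketch, assuming srcSlotIds is a list that the column-initialization step appends to. SrcSlotIdsInit and its initColumns method are hypothetical stand-ins and do not reproduce the real Load.initColumns signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: the caller must create srcSlotIds before
// handing it to a method that fills it in. Not the Doris implementation.
public class SrcSlotIdsInit {

    // Stand-in for column initialization: appends one slot id per source column.
    static void initColumns(List<String> srcColumns, List<Integer> srcSlotIds) {
        int nextSlotId = 0;
        for (String column : srcColumns) {
            srcSlotIds.add(nextSlotId++); // NPE here if srcSlotIds is null
        }
    }

    public static void main(String[] args) {
        List<String> srcColumns = Arrays.asList("k1", "k2", "k3", "k4");

        // Buggy order: the list is used before it is created.
        List<Integer> uninitialized = null;
        try {
            initColumns(srcColumns, uninitialized);
        } catch (NullPointerException e) {
            System.out.println("NPE: srcSlotIds was not created");
        }

        // Fixed order: create the list first, then let initColumns fill it.
        List<Integer> srcSlotIds = new ArrayList<>();
        initColumns(srcColumns, srcSlotIds);
        System.out.println(srcSlotIds); // [0, 1, 2, 3]
    }
}
```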

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
