Attempted to collect local articles and HDFS text files into Doris. The working speed is very slow whether it is collecting locally or from HDFS #6240

Closed
3 tasks done
duanmuyh opened this issue Jan 18, 2024 · 2 comments

duanmuyh commented Jan 18, 2024

Search before asking

  • I have searched the issues and found no similar issues.

What happened

Loading local text files into Doris is very slow.
The job log shows that the read rate and the write rate are the same, between about 16 and 50 rows per second. We suspect that the source connector's reading and transfer of text files is inefficient.

SeaTunnel Version

seatunnel 2.3.3
Doris 2.0.2
Java 1.8.0_333

SeaTunnel Config

env {
  # You can set flink configuration here
  execution.parallelism = 10
  job.mode = "BATCH"
}

source {
  localFile {
    file_format_type = "text"
    path = "/home/hadoop/yangst/wudao_20240111.txt"
  }
}

sink {
  jdbc {
    url = "jdbc:mysql://xxx.xx.xx:port/wudao?rewriteBatchedStatements=true"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "xxxx"
    query = "insert into wudao(line) values(?)"
  }
}

Running Command

nohup sh bin/seatunnel.sh -c config/wudao.config -e local > nohup.out 2>&1 &

Error Exception

Job Progress Information
***********************************************
Job Id                    :  799900012245942273
Read Count So Far         :               35192
Write Count So Far        :               26996
Average Read Count        :                50/s
Average Write Count       :                50/s
Last Statistic Time       : 2024-01-17 15:35:05
Current Statistic Time    : 2024-01-17 15:36:05
***********************************************

2024-01-17 15:37:02,846 INFO  org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-478968] [5.1] 
***********************************************
     CoordinatorService Thread Pool Status
***********************************************
activeCount               :                   1
corePoolSize              :                   0
maximumPoolSize           :          2147483647
poolSize                  :                   1
completedTaskCount        :                  67
taskCount                 :                  68
***********************************************

2024-01-17 15:37:02,848 INFO  org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-478968] [5.1] 
***********************************************
                Job info detail
***********************************************
createdJobCount           :                   0
scheduledJobCount         :                   0
runningJobCount           :                   1
failingJobCount           :                   0
failedJobCount            :                   0
cancellingJobCount        :                   0
canceledJobCount          :                   0
finishedJobCount          :                   0
restartingJobCount        :                   0
suspendedJobCount         :                   0
reconcilingJobCount       :                   0
***********************************************

2024-01-17 15:37:05,614 INFO  org.apache.seatunnel.engine.client.job.JobMetricsRunner - 
***********************************************
           Job Progress Information
***********************************************
Job Id                    :  799900012245942273
Read Count So Far         :               37192
Write Count So Far        :               28996
Average Read Count        :                33/s
Average Write Count       :                33/s
Last Statistic Time       : 2024-01-17 15:36:05
Current Statistic Time    : 2024-01-17 15:37:05
***********************************************

2024-01-17 15:38:02,845 INFO  org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-478968] [5.1] 
***********************************************
     CoordinatorService Thread Pool Status
***********************************************
activeCount               :                   1
corePoolSize              :                   0
maximumPoolSize           :          2147483647
poolSize                  :                   1
completedTaskCount        :                  67
taskCount                 :                  68
***********************************************

2024-01-17 15:38:02,849 INFO  org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-478968] [5.1] 
***********************************************
                Job info detail
***********************************************
createdJobCount           :                   0
scheduledJobCount         :                   0
runningJobCount           :                   1
failingJobCount           :                   0
failedJobCount            :                   0
cancellingJobCount        :                   0
canceledJobCount          :                   0
finishedJobCount          :                   0
restartingJobCount        :                   0
suspendedJobCount         :                   0
reconcilingJobCount       :                   0
***********************************************

2024-01-17 15:38:05,613 INFO  org.apache.seatunnel.engine.client.job.JobMetricsRunner - 
***********************************************
           Job Progress Information
***********************************************
Job Id                    :  799900012245942273
Read Count So Far         :               38192
Write Count So Far        :               29996
Average Read Count        :                16/s
Average Write Count       :                16/s
Last Statistic Time       : 2024-01-17 15:37:05
Current Statistic Time    : 2024-01-17 15:38:05
***********************************************

2024-01-17 15:39:02,846 INFO  org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-478968] [5.1] 
***********************************************
     CoordinatorService Thread Pool Status
***********************************************
activeCount               :                   1
corePoolSize              :                   0
maximumPoolSize           :          2147483647
poolSize                  :                   1
completedTaskCount        :                  67
taskCount                 :                  68
***********************************************

2024-01-17 15:39:02,849 INFO  org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-478968] [5.1] 
***********************************************
                Job info detail
***********************************************
createdJobCount           :                   0
scheduledJobCount         :                   0
runningJobCount           :                   1
failingJobCount           :                   0
failedJobCount            :                   0
cancellingJobCount        :                   0
canceledJobCount          :                   0
finishedJobCount          :                   0
restartingJobCount        :                   0
suspendedJobCount         :                   0
reconcilingJobCount       :                   0
***********************************************

2024-01-17 15:39:05,613 INFO  org.apache.seatunnel.engine.client.job.JobMetricsRunner - 
***********************************************
           Job Progress Information
***********************************************
Job Id                    :  799900012245942273
Read Count So Far         :               41192
Write Count So Far        :               32996
Average Read Count        :                50/s
Average Write Count       :                50/s
Last Statistic Time       : 2024-01-17 15:38:05
Current Statistic Time    : 2024-01-17 15:39:05
***********************************************

2024-01-17 15:39:45,040 INFO  org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - wait checkpoint completed: 4
2024-01-17 15:40:02,845 INFO  org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-478968] [5.1] 
***********************************************
     CoordinatorService Thread Pool Status
***********************************************
activeCount               :                   1
corePoolSize              :                   0
maximumPoolSize           :          2147483647
poolSize                  :                   4
completedTaskCount        :                  72
taskCount                 :                  73
***********************************************

2024-01-17 15:40:02,847 INFO  org.apache.seatunnel.engine.server.CoordinatorService - [localhost]:5801 [seatunnel-478968] [5.1] 
***********************************************
                Job info detail
***********************************************
createdJobCount           :                   0
scheduledJobCount         :                   0
runningJobCount           :                   1
failingJobCount           :                   0
failedJobCount            :                   0
cancellingJobCount        :                   0
canceledJobCount          :                   0
finishedJobCount          :                   0
restartingJobCount        :                   0
suspendedJobCount         :                   0
reconcilingJobCount       :                   0
***********************************************

2024-01-17 15:40:05,609 INFO  org.apache.seatunnel.engine.client.job.JobMetricsRunner - 
***********************************************
           Job Progress Information
***********************************************
Job Id                    :  799900012245942273
Read Count So Far         :               44192
Write Count So Far        :               35996
Average Read Count        :                50/s
Average Write Count       :                50/s
Last Statistic Time       : 2024-01-17 15:39:05
Current Statistic Time    : 2024-01-17 15:40:05
***********************************************
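
The per-minute averages in the log can be cross-checked against the cumulative counters. Below is a minimal Python sketch for doing so; it is not part of SeaTunnel, and the helper names `parse_snapshots` and `interval_rates` are made up for illustration. The `LOG` string is an excerpt of the counters from the "Job Progress Information" blocks above, and the script assumes that block layout.

```python
import re
from datetime import datetime

# Counters copied from the "Job Progress Information" blocks in the log above.
LOG = """
Read Count So Far         :               35192
Write Count So Far        :               26996
Current Statistic Time    : 2024-01-17 15:36:05
Read Count So Far         :               37192
Write Count So Far        :               28996
Current Statistic Time    : 2024-01-17 15:37:05
Read Count So Far         :               38192
Write Count So Far        :               29996
Current Statistic Time    : 2024-01-17 15:38:05
Read Count So Far         :               41192
Write Count So Far        :               32996
Current Statistic Time    : 2024-01-17 15:39:05
Read Count So Far         :               44192
Write Count So Far        :               35996
Current Statistic Time    : 2024-01-17 15:40:05
"""

# One match per progress block; ".*?" skips any lines in between
# (for example the "Average ..." and "Last Statistic Time" lines).
PATTERN = re.compile(
    r"Read Count So Far\s*:\s*(\d+).*?"
    r"Write Count So Far\s*:\s*(\d+).*?"
    r"Current Statistic Time\s*:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})",
    re.S,
)


def parse_snapshots(text):
    """Return a list of (timestamp, rows_read, rows_written) tuples."""
    return [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), int(read), int(write))
        for read, write, ts in PATTERN.findall(text)
    ]


def interval_rates(snapshots):
    """For each pair of consecutive snapshots, return
    (read rows/s, write rows/s, rows read but not yet written)."""
    rows = []
    for (t0, r0, w0), (t1, r1, w1) in zip(snapshots, snapshots[1:]):
        seconds = (t1 - t0).total_seconds()
        rows.append(((r1 - r0) / seconds, (w1 - w0) / seconds, r1 - w1))
    return rows


if __name__ == "__main__":
    for read_rate, write_rate, backlog in interval_rates(parse_snapshots(LOG)):
        print(f"read {read_rate:.1f}/s  write {write_rate:.1f}/s  backlog {backlog}")
```

For this excerpt the computed rates are about 33.3, 16.7, 50.0 and 50.0 rows per second for both reading and writing, which matches the averages printed in the log. The difference between rows read and rows written stays at 8196 in every interval. That may mean a fixed number of rows is buffered between source and sink, but the log alone does not confirm where the bottleneck is.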

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

@duanmuyh duanmuyh added the bug label Jan 18, 2024
@duanmuyh duanmuyh changed the title from "The collection rate of local files and HDFS files is very slow when doing" to "Attempted to collect local articles and HDFS text files into Doris. The working speed is very slow whether it is collecting locally or from HDFS" Jan 30, 2024

This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

@github-actions github-actions bot added the stale label Mar 13, 2024

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.
