[Bug] [CheckPoint] increase checkpoint timeout #5722

EngonVHKxZ · 2023-10-26T10:41:03Z

Search before asking

I had searched in the issues and found no similar issues.

What happened

When I synchronize data in batch mode, the job always reports a checkpoint timeout.

SeaTunnel Version

2.3.3

SeaTunnel Config

seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://hadoop-001:8020


env {
    job.mode="BATCH",
    execution.parallelism=5
},
source {
    Jdbc {
        url="jdbc:oracle:thin:@//IP:PORT/ServiceName",
        driver="oracle.jdbc.OracleDriver",
        user="username",
        password="pwd",
        query="select CLASS_CODE,SPELL_CODE,MODIFY_OPERATOR,WBZX_CODE,OPERATION_GRADE,CREATE_OPERATOR,NAME,SEX_LIMIT,CODE,AGE_LIMIT_L,SOURCE,VERSION,AGE_LIMIT_H,ID,CREATE_TIME,REMARK,SORT_NO,VALID_FLAG,MODIFY_TIME from username.KOHD order by ID",
        fetch_size=10000
    }
},
transform  {},
sink {
    Jdbc {
        url="jdbc:oracle:thin:@//IP:PORT/ServiceName",
        driver="oracle.jdbc.OracleDriver",
        user="username",
        password="pwd",
        generate_sink_sql=true,
        database="ORCL",
        table="username.KOHD2",
        xa_data_source_class_name="oracle.jdbc.xa.client.OracleXADataSource",
        is_exactly_once="true",
        auto_commit="true"
    }
}

Running Command

seatunnel.sh -c oracle2oracle.config

Error Exception

2023-10-26 15:27:04,899 INFO  org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - wait checkpoint completed: 7
2023-10-26 15:27:04,903 INFO  org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - timeout checkpoint: 769824035470049281/1/1, CHECKPOINT_TYPE
2023-10-26 15:27:04,904 INFO  org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - start clean pending checkpoint cause Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml
2023-10-26 15:27:04,904 ERROR org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - trigger checkpoint failed
org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml
        at org.apache.seatunnel.engine.server.checkpoint.PendingCheckpoint.abortCheckpoint(PendingCheckpoint.java:172) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$cleanPendingCheckpoint$19(CheckpointCoordinator.java:645) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4770) ~[?:?]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.cleanPendingCheckpoint(CheckpointCoordinator.java:643) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:261) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:532) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2023-10-26 15:27:04,904 ERROR org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - trigger checkpoint failed
org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml
        at org.apache.seatunnel.engine.server.checkpoint.PendingCheckpoint.abortCheckpoint(PendingCheckpoint.java:172) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$cleanPendingCheckpoint$19(CheckpointCoordinator.java:645) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4770) ~[?:?]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.cleanPendingCheckpoint(CheckpointCoordinator.java:643) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:261) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:532) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2023-10-26 15:27:04,904 ERROR org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - trigger checkpoint failed
org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml
        at org.apache.seatunnel.engine.server.checkpoint.PendingCheckpoint.abortCheckpoint(PendingCheckpoint.java:172) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$cleanPendingCheckpoint$19(CheckpointCoordinator.java:645) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4770) ~[?:?]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.cleanPendingCheckpoint(CheckpointCoordinator.java:643) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:261) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:532) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]

Zeta or Flink or Spark Version

No response

Java or Scala Version

JDK11

Screenshots

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

lukeyan2023 · 2023-10-28T04:11:28Z

This is probably not a bug; just increase the checkpoint timeout as the log message suggests. I also ran into this issue, and increasing the timeout resolved it.
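
For reference, the timeout named in the log message is the `timeout` key under `seatunnel.engine.checkpoint` in seatunnel.yaml, as shown in the reporter's config. A minimal sketch of raising it follows; the value 600000 is purely illustrative, and the unit is assumed to be milliseconds:

```yaml
seatunnel:
  engine:
    checkpoint:
      interval: 10000   # unchanged from the reporter's config
      timeout: 600000   # raised from 60000; illustrative value, assumed to be milliseconds
```

The other keys from the reporter's file (storage, slot-service, and so on) are omitted here only for brevity and would stay as they are.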

EngonVHKxZ · 2023-11-01T02:45:27Z

I tried, but this solution did not work for me :(

mengyueyue · 2023-11-07T03:21:39Z

I hit this in version 2.3.4 as well:
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml or jobConfig env.
I tried increasing checkpoint.timeout and checkpoint.interval in the job config env block, and it worked!
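
The change described above might look like the following in the job config. This is a sketch only: the option names are taken from the comment and the error message, the values are illustrative and assumed to be in milliseconds, and whether `env` accepts these keys may depend on the SeaTunnel version (this comment concerns 2.3.4, while the original report is on 2.3.3).

```hocon
env {
    job.mode = "BATCH"
    execution.parallelism = 5
    # Option names as given in the comment above; values are illustrative only
    checkpoint.interval = 600000
    checkpoint.timeout = 1200000
}
```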

EngonVHKxZ · 2023-11-08T02:44:51Z

Version 2.3.4 has not been released yet, and its config may change at any time. My company only allows stable versions, so I have to wait for the next stable release or for a patch that fixes this.

Avoid reading large files or using scroll queries in Elasticsearch, as the pollNext method tends to hold the checkpointLock indefinitely, leading to checkpoint timeout.

codeDing18 · 2023-11-27T12:46:18Z

Hi @mengyueyue. How much data do you synchronize? I have a 7.4 GB table and have already increased the timeout, but I still get this error.

mengyueyue · 2023-11-28T01:57:08Z

My data size is 4.7 GB, and I set both options to the int max value.

codeDing18 · 2023-11-28T12:45:21Z

Thanks for the reply.

github-actions · 2024-01-22T00:25:22Z

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

EngonVHKxZ added the bug label Oct 26, 2023

xuqi1633 mentioned this issue Nov 23, 2023: [Bugfix][Zeta] Fix the checkpoint timeout (#5722) #5904 (Closed, 4 tasks)

github-actions bot added the stale label Jan 22, 2024
