[Bug] [CheckPoint] increase checkpoint timeout #5722

Open
2 of 3 tasks
EngonVHKxZ opened this issue Oct 26, 2023 · 8 comments

Comments

@EngonVHKxZ

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When I synchronize data in batch mode, the job always fails with a checkpoint timeout.

SeaTunnel Version

2.3.3

SeaTunnel Config

seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://hadoop-001:8020


env {
    job.mode="BATCH",
    execution.parallelism=5
},
source {
    Jdbc {
        url="jdbc:oracle:thin:@//IP:PORT/ServiceName",
        driver="oracle.jdbc.OracleDriver",
        user="username",
        password="pwd",
        query="select CLASS_CODE,SPELL_CODE,MODIFY_OPERATOR,WBZX_CODE,OPERATION_GRADE,CREATE_OPERATOR,NAME,SEX_LIMIT,CODE,AGE_LIMIT_L,SOURCE,VERSION,AGE_LIMIT_H,ID,CREATE_TIME,REMARK,SORT_NO,VALID_FLAG,MODIFY_TIME from username.KOHD order by ID",
        fetch_size=10000
    }
},
transform  {},
sink {
    Jdbc {
        url="jdbc:oracle:thin:@//IP:PORT/ServiceName",
        driver="oracle.jdbc.OracleDriver",
        user="username",
        password="pwd",
        generate_sink_sql=true,
        database="ORCL",
        table="username.KOHD2",
        xa_data_source_class_name="oracle.jdbc.xa.client.OracleXADataSource",
        is_exactly_once="true",
        auto_commit="true"
    }
}

Running Command

seatunnel.sh -c oracle2oracle.config

Error Exception

2023-10-26 15:27:04,899 INFO  org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - wait checkpoint completed: 7
2023-10-26 15:27:04,903 INFO  org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - timeout checkpoint: 769824035470049281/1/1, CHECKPOINT_TYPE
2023-10-26 15:27:04,904 INFO  org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - start clean pending checkpoint cause Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml
2023-10-26 15:27:04,904 ERROR org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - trigger checkpoint failed
org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml
        at org.apache.seatunnel.engine.server.checkpoint.PendingCheckpoint.abortCheckpoint(PendingCheckpoint.java:172) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$cleanPendingCheckpoint$19(CheckpointCoordinator.java:645) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4770) ~[?:?]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.cleanPendingCheckpoint(CheckpointCoordinator.java:643) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:261) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:532) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2023-10-26 15:27:04,904 ERROR org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - trigger checkpoint failed
org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml
        at org.apache.seatunnel.engine.server.checkpoint.PendingCheckpoint.abortCheckpoint(PendingCheckpoint.java:172) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$cleanPendingCheckpoint$19(CheckpointCoordinator.java:645) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4770) ~[?:?]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.cleanPendingCheckpoint(CheckpointCoordinator.java:643) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:261) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:532) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2023-10-26 15:27:04,904 ERROR org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator - trigger checkpoint failed
org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml
        at org.apache.seatunnel.engine.server.checkpoint.PendingCheckpoint.abortCheckpoint(PendingCheckpoint.java:172) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$cleanPendingCheckpoint$19(CheckpointCoordinator.java:645) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4770) ~[?:?]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.cleanPendingCheckpoint(CheckpointCoordinator.java:643) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:261) ~[seatunnel-starter.jar:2.3.3]
        at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:532) ~[seatunnel-starter.jar:2.3.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]

Zeta or Flink or Spark Version

No response

Java or Scala Version

JDK11

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

EngonVHKxZ added the bug label Oct 26, 2023
@lukeyan2023

This is probably not a bug; just increase the checkpoint timeout as the log message suggests. I also ran into this issue, and it was resolved after I increased the timeout.

@EngonVHKxZ
Author

I tried increasing the checkpoint timeout, but it did not work for me :(

@mengyueyue

I hit this in version 2.3.4:
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml or jobConfig env.
I tried increasing checkpoint.timeout and checkpoint.interval in the job config env, and it worked!
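
For readers who want to try the same thing, here is a minimal sketch of what the env block might look like with both settings raised. It assumes the key names are exactly checkpoint.timeout and checkpoint.interval as written in the comment above, and that both values are in milliseconds like the seatunnel.yaml settings in the original report. The numbers are illustrative placeholders, not recommended values.

```hocon
env {
    job.mode = "BATCH"
    # Assumed to be milliseconds; placeholder values, tune for your data volume.
    checkpoint.interval = 60000
    checkpoint.timeout = 600000
}
```

According to the 2.3.4 error message quoted above, the timeout can be set either in seatunnel.yaml or in the job config env. The 2.3.3 message in the original report mentions only seatunnel.yaml.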

@EngonVHKxZ
Author

Version 2.3.4 has not been released yet, and its config may still change. My company only allows stable versions, so I have to wait for the next stable release or for a patch.

xuqi1633 added a commit to xuqi1633/seatunnel that referenced this issue Nov 22, 2023
Avoid reading large files or using scroll queries in Elasticsearch,
as the pollNext method tends to hold the checkpointLock indefinitely,
leading to checkpoint timeout.
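
The commit message above attributes the timeout to pollNext holding the checkpointLock for the whole read. Below is a minimal Java sketch of the general alternative, in which each call emits a bounded batch and then releases the lock so that a checkpoint waiting on the same lock can proceed between calls. The class BoundedPollSketch, its fields, and the batch size are hypothetical and only illustrate the idea; this is not SeaTunnel's reader API and not the code in the referenced commit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

/** Hypothetical illustration only; not SeaTunnel's actual source reader. */
final class BoundedPollSketch {
    // Stand-in for the lock that a checkpoint would also need to acquire.
    private final Object checkpointLock = new Object();
    private final Iterator<Integer> rows;
    private final int batchSize;
    private final List<Integer> emitted = new ArrayList<>();

    BoundedPollSketch(Iterator<Integer> rows, int batchSize) {
        this.rows = rows;
        this.batchSize = batchSize;
    }

    /**
     * Emits at most batchSize rows while holding the lock, then releases it.
     * Returns true if more rows remain, false once the input is exhausted.
     */
    boolean pollNext() {
        synchronized (checkpointLock) {
            int count = 0;
            while (count < batchSize && rows.hasNext()) {
                emitted.add(rows.next());
                count++;
            }
            return rows.hasNext();
        }
    }

    /** Number of rows emitted so far. */
    int emittedCount() {
        synchronized (checkpointLock) {
            return emitted.size();
        }
    }

    public static void main(String[] args) {
        BoundedPollSketch reader =
                new BoundedPollSketch(IntStream.range(0, 25).iterator(), 10);
        int polls = 1;
        // The lock is free between iterations, which is where a checkpoint could run.
        while (reader.pollNext()) {
            polls++;
        }
        System.out.println(polls + " polls, " + reader.emittedCount() + " rows");
    }
}
```

With 25 rows and a batch size of 10, the loop makes three short calls instead of one long one. A reader that drains its whole input inside a single synchronized block would keep the lock for the full duration of the read.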
xuqi1633 added a commit to xuqi1633/seatunnel that referenced this issue Nov 23, 2023
xuqi1633 added a commit to xuqi1633/seatunnel that referenced this issue Nov 24, 2023
xuqi1633 added a commit to xuqi1633/seatunnel that referenced this issue Nov 27, 2023
@codeDing18

Hi @mengyueyue, how much data do you synchronize? I have a 7.4 GB table, and I still get this error even after increasing the timeout.

@mengyueyue

My data size is 4.7 GB, and I set both checkpoint.timeout and checkpoint.interval to the int max value.

xuqi1633 added a commit to xuqi1633/seatunnel that referenced this issue Nov 28, 2023
@codeDing18

Thanks for the reply.


This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions bot added the stale label Jan 22, 2024