[Bug] [Zeta Engine] The checkpoint lock causes the checkpoint flow to block for a long time #5694

Closed
3 tasks done
happyboy1024 opened this issue Oct 24, 2023 · 3 comments

happyboy1024 (Contributor) commented Oct 24, 2023

Search before asking

  • I have searched the issues and found no similar issues.

What happened

I want to stop a synchronization task by triggering a savepoint. However, the task never stops early: it always runs until the data synchronization is complete. Tracing the logs, I found that the checkpoint triggered by the savepoint always failed to acquire the checkpointLock.

Note that my task runs in an environment with a single CPU core and 4 GB of memory.

After analyzing the problem, I found that checkpointLock is acquired through synchronized, and synchronized is an unfair lock. In a single-core environment under high CPU load, thread starvation is more likely, so the checkpoint flow cannot obtain the checkpointLock.
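To illustrate the fairness point, here is a minimal sketch, assuming one possible mitigation is to guard the checkpoint with a fair `java.util.concurrent.locks.ReentrantLock` instead of a `synchronized` monitor. This is not necessarily what the fix in #5695 changed, and it is not SeaTunnel code: the class and method names are hypothetical, and the loop only stands in for the task thread re-acquiring the lock around each pollNext call. With `new ReentrantLock(true)`, once the checkpoint thread is queued, the task thread cannot re-acquire the lock ahead of it, so it runs zero further batches.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical demo (not SeaTunnel code): a "task" thread re-acquires the
 * checkpoint lock for every batch, standing in for pollNext, while a
 * "checkpoint" thread is queued on the same lock.
 */
class FairCheckpointLockDemo {

    /**
     * Returns how many batches the task thread ran after the checkpoint
     * thread was already queued on the lock and before it obtained the lock.
     */
    static int batchesBeforeCheckpoint(ReentrantLock lock) {
        AtomicBoolean checkpointDone = new AtomicBoolean(false);
        int batchesWhileWaiting = 0;

        Thread checkpoint = new Thread(() -> {
            lock.lock();
            try {
                checkpointDone.set(true); // the "checkpoint" work
            } finally {
                lock.unlock();
            }
        });

        // Hold the lock first so the checkpoint thread is forced to queue behind us.
        lock.lock();
        checkpoint.start();
        while (!lock.hasQueuedThread(checkpoint)) {
            Thread.yield();
        }
        lock.unlock();

        // Task loop: each iteration is one batch processed under the lock.
        while (!checkpointDone.get()) {
            lock.lock();
            try {
                if (!checkpointDone.get()) {
                    batchesWhileWaiting++;
                }
            } finally {
                lock.unlock();
            }
        }

        try {
            checkpoint.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return batchesWhileWaiting;
    }

    public static void main(String[] args) {
        // Fair lock: the queued checkpoint thread is served before the task thread can re-acquire.
        System.out.println("fair:   " + batchesBeforeCheckpoint(new ReentrantLock(true)));
        // Non-fair lock: like a synchronized monitor, it gives no ordering guarantee,
        // so the task thread may barge in ahead of the queued checkpoint thread.
        System.out.println("unfair: " + batchesBeforeCheckpoint(new ReentrantLock(false)));
    }
}
```

The non-fair run may report a non-zero count, and the exact number depends on the scheduler. Fairness has a throughput cost, since every release hands the lock to the longest-waiting thread instead of letting the current thread continue.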

SeaTunnel Version

2.3.3

SeaTunnel Config

env {
    job.mode=BATCH
    job.name=DEMO
}
source {
    Jdbc {
        url="jdbc:mysql://xxxx/transfer_source"
        driver="com.mysql.cj.jdbc.Driver"
        user="root"
        password="xxxx"
        query="select * from order_info"
        partition_column=id
        partition_num=20
        parallelism=2
    }
}
transform {
}
sink {
    Jdbc {
        url="jdbc:mysql://xxxx/transfer_sink?rewriteBatchedStatements=true"
        driver="com.mysql.cj.jdbc.Driver"
        user="root"
        password="xxxx"
        database="transfer_sink"
        table="order_info_sink"
        batch_size=1000
        enable_upsert=true
        generate_sink_sql=true
        primary_keys = [id]
        query = ""
    }
}

Running Command

./bin/seatunnel-local.sh -c config/savepoint.config

./bin/seatunnel-local.sh -s {jobid}

Error Exception

no exception

Zeta or Flink or Spark Version

zeta

Java or Scala Version

1.8

Screenshots

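The main task thread acquires the checkpoint lock here:

[Screenshot 1: code where the main thread acquires the checkpoint lock]

The checkpoint thread tries to acquire the checkpoint lock here:

[Screenshot 2: code where the checkpoint thread tries to acquire the checkpoint lock]

When the savepoint is triggered, the main thread is executing pollNext. The checkpoint thread stays blocked for a long time at the point marked in the first screenshot, until the main thread completes.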
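The blocking behavior can be reproduced in isolation with a minimal sketch, assuming (as the screenshots suggest) that the task thread holds the checkpoint monitor for the entire duration of a pollNext call. Everything here is hypothetical: `CHECKPOINT_LOCK` stands in for checkpointLock and `simulatePollNext` for a slow pollNext call. The checkpoint thread's measured wait is at least as long as the simulated pollNext.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical demo (not SeaTunnel code): the task thread holds a monitor for
 * the whole duration of one slow "pollNext" call, and a checkpoint thread that
 * arrives meanwhile stays BLOCKED until that call returns.
 */
class LongPollBlocksCheckpointDemo {

    private static final Object CHECKPOINT_LOCK = new Object();

    /** Returns how long, in milliseconds, the checkpoint thread waited for the monitor. */
    static long checkpointWaitMillis(long pollNextMillis) {
        long[] waitedNanos = new long[1];
        Thread checkpoint = new Thread(() -> {
            long start = System.nanoTime();
            synchronized (CHECKPOINT_LOCK) {
                waitedNanos[0] = System.nanoTime() - start;
            }
        });

        synchronized (CHECKPOINT_LOCK) {
            checkpoint.start();
            // Wait until the checkpoint thread is actually blocked on the monitor.
            while (checkpoint.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            simulatePollNext(pollNextMillis); // the lock is held for the whole call
        }

        try {
            checkpoint.join(); // join() makes waitedNanos[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(waitedNanos[0]);
    }

    /** Stand-in for a slow pollNext call. */
    private static void simulatePollNext(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("checkpoint waited ~" + checkpointWaitMillis(500) + " ms");
    }
}
```

Note that lock fairness alone does not shorten this wait: even a fair lock makes the waiter wait for the current holder to release, so a long pollNext under the lock still delays the checkpoint by at least that long.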

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Hisoka-X (Member)

Fixed by #5695

Jetiaime (Contributor) commented May 9, 2024

Sadly, I still hit this bug when running my job; please check 6738.

Hisoka-X (Member) commented May 9, 2024

cc @hailin0 @happyboy1024
