Scheduler & LeaseCoordinator run/shutdown race condition #427

Closed
danielcerutti opened this issue Sep 28, 2018 · 3 comments
Labels
bug, v2.x (Issues related to the 2.x version)
Milestone
v2.0.4

Comments

@danielcerutti

Scheduler::shutdown (or Worker::shutdown in older revisions) can be executed after calling Scheduler::run but before the Scheduler::initialize phase has completed. This causes LeaseCoordinator::stop (where the LeaseTaker is canceled and shut down) to be called before LeaseCoordinator::start, where the LeaseTaker is actually initialized. The Scheduler and LeaseCoordinator eventually end up in a shutdown state while the LeaseCoordinator's leaseCoordinatorThreadPool and takerFuture keep executing, taking leases away from other Schedulers that are actually running.

Simply running the following test can reproduce the problem:

    @Test
    public void runShutdownRaceCondition() throws Exception
    {
        // Start the scheduler on its own thread, then request shutdown immediately,
        // before Scheduler::initialize has had a chance to complete.
        new Thread(scheduler_).start();
        scheduler_.shutdown();

        // Keep the JVM alive so the lease-taker threads that were left running can be observed.
        while (true) {
            Thread.sleep(Long.MAX_VALUE);
        }
    }
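
For illustration only (this is not the actual KCL 2.0.4 change), the race boils down to a missing start/stop guard: once stop() has run, a late start() must never spawn the lease-taker pool. A minimal sketch of that guard, where GuardedLeaseCoordinator and takeLeases are hypothetical stand-ins for the real leaseCoordinatorThreadPool/takerFuture handling:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    final class GuardedLeaseCoordinator {
        private final Object lock = new Object();
        private ScheduledExecutorService takerPool;  // stand-in for leaseCoordinatorThreadPool
        private ScheduledFuture<?> takerFuture;      // stand-in for takerFuture
        private boolean stopped;

        void start() {
            synchronized (lock) {
                if (stopped) {
                    // Shutdown was already requested: never start taking leases.
                    return;
                }
                takerPool = Executors.newSingleThreadScheduledExecutor();
                takerFuture = takerPool.scheduleAtFixedRate(this::takeLeases, 0, 10, TimeUnit.SECONDS);
            }
        }

        void stop() {
            synchronized (lock) {
                stopped = true;
                if (takerFuture != null) {
                    takerFuture.cancel(true);
                }
                if (takerPool != null) {
                    takerPool.shutdownNow();
                }
            }
        }

        private void takeLeases() {
            // lease-taking work would go here
        }
    }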
@pfifer added the bug and v2.x (Issues related to the 2.x version) labels Oct 9, 2018
@sahilpalvia sahilpalvia added this to the v2.0.4 milestone Oct 9, 2018
@sahilpalvia
Contributor

sahilpalvia commented Oct 18, 2018

Version 2.0.4 is now available. Closing this issue. Feel free to reopen if the problem persists.

@matiaslb

Are there any plans to provide a fix for v1?

@dharmeshspatel4u

@pfifer I'm seeing the issue below with client v1.8.1, and this issue looks close to mine.


2019-02-09 21:10:01.875  INFO 26971 --- [      Thread-29] c.a.s.k.clientlibrary.lib.worker.Worker  : Worker shutdown requested.
2019-02-09 21:10:01.876  INFO 26971 --- [      Thread-29] c.a.s.k.leases.impl.LeaseCoordinator     : Worker ip-1234. has successfully stopped lease-tracking threads
2019-02-09 21:10:01.877  INFO 26971 --- [dProcessor-0000] c.c.d.v.s.p.KinesisRecordProcessor       : Checkpointing shard shardId-000000000000
2019-02-09 21:10:01.878  INFO 26971 --- [dProcessor-0000] k.c.l.w.KinesisClientLibLeaseCoordinator : Worker ip-1234. could not update checkpoint for shard shardId-000000000000 because it does not hold the lease
2019-02-09 21:10:01.878  INFO 26971 --- [dProcessor-0000] c.c.d.v.s.p.KinesisRecordProcessor       : Caught shutdown exception, skipping checkpoint.

com.amazonaws.services.kinesis.clientlibrary.exceptions.ShutdownException: Can't update checkpoint - instance doesn't hold the lease for this shard
        at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibLeaseCoordinator.setCheckpoint(KinesisClientLibLeaseCoordinator.java:174) ~[amazon-kinesis-client-1.8.1.jar!/:na]

Any clue whether this is my issue? Sometimes the checkpoint gets updated, and sometimes it throws the error above and those messages are delivered to the consumer again. Is this fixed in 2.0.4? The upgrade would be a very major one for me, so I'm looking to see whether 1.x has a fix for the above issue.

I'd appreciate your quick response.
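
For context, the "Caught shutdown exception, skipping checkpoint" line above reflects the usual KCL 1.x pattern: a ShutdownException from checkpoint() means this worker no longer holds the lease, so the checkpoint is skipped and the records are redelivered to whichever worker holds the lease now. A minimal sketch of that handling, assuming the standard KCL 1.x checkpointer interface (the helper class and method names are hypothetical):

    import com.amazonaws.services.kinesis.clientlibrary.exceptions.InvalidStateException;
    import com.amazonaws.services.kinesis.clientlibrary.exceptions.KinesisClientLibDependencyException;
    import com.amazonaws.services.kinesis.clientlibrary.exceptions.ShutdownException;
    import com.amazonaws.services.kinesis.clientlibrary.exceptions.ThrottlingException;
    import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorCheckpointer;

    final class CheckpointHelper {
        // Hypothetical helper: checkpoint, but treat a lost lease as non-fatal.
        static void checkpointIgnoringLostLease(IRecordProcessorCheckpointer checkpointer)
                throws KinesisClientLibDependencyException, InvalidStateException, ThrottlingException {
            try {
                checkpointer.checkpoint();
            } catch (ShutdownException e) {
                // The lease moved to another worker (or this worker is shutting down);
                // skip the checkpoint and let the new lease holder reprocess the records.
            }
        }
    }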
