Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vertx.cancelTimer() can return true when timer fires #2912

Closed
matthewdallen opened this issue Apr 15, 2019 · 18 comments
Closed

Vertx.cancelTimer() can return true when timer fires #2912

matthewdallen opened this issue Apr 15, 2019 · 18 comments
Assignees
Labels
Milestone

Comments

@matthewdallen
Copy link

matthewdallen commented Apr 15, 2019

The API contract for Vertx.cancelTimer() appears to indicate that if it returns true, the timer has been successfully canceled. However, the timer handler is not properly synchronized with the timer flag, so the handler can see the cancelled flag is false and start the handler execution before the CAS in the cancel() method runs.

Currently, we have to do our own synchronization around timers to ensure that when a timer fires indicating a timeout, we are guaranteed the handler does not also execute. It would be better if InternalTimerHandler.handle() also did a cancelled.compareAndSet(false, true) and only executed the handler when the CAS is successful, ensuring that canceled timers are guaranteed not to execute their handlers.

@vietj
Copy link
Member

vietj commented Apr 16, 2019

can you provide a reproducer ?

@vietj vietj added this to the 3.7.1 milestone Apr 16, 2019
@vietj vietj added the bug label Apr 16, 2019
@vietj vietj self-assigned this Apr 16, 2019
@matthewdallen
Copy link
Author

matthewdallen commented Apr 16, 2019

I don't think a reproducer is warranted, given that it is clear by inspection of the following code (taken from VertxImpl.InternalTimerHandler) that cancelTimer is not linearized with respect to handle:

        boolean cancel() {
            if (this.cancelled.compareAndSet(false, true)) {
                if (VertxImpl.this.metrics != null) {
                    VertxImpl.this.metrics.timerEnded(this.timerID, true);
                }

                this.future.cancel(false);
                return true;
            } else {
                return false;
            }
        }
        public void handle(Void v) {
            if (!this.cancelled.get()) {
                try {
                    this.handler.handle(this.timerID);
                } finally {
                    if (!this.periodic) {
                        this.cleanupNonPeriodic();
                    }

                }
            }
        }

As written, the compareAndSet in cancel might as well be an ordinary load and store, since there is no other atomic operation on the cancelled flag to establish a linearization. Another way of explaining the issue is that there is a possible interleaving where handle sees cancelled is false, then cancel is called, the CAS succeeds in setting cancelled to true and thus the function returns true, and then the handler proceeds to execute.

Sorry if this is an over-explanantion, I don't mean to insult anybody's intelligence. Let me know if I am missing something.

@vietj
Copy link
Member

vietj commented Apr 17, 2019

this is a fine explanation.

I believe that when the timer fires, the handler should perform a CAS operation as well to execute the operation and signal the caller of cancel that the task executed.

@vietj
Copy link
Member

vietj commented Apr 17, 2019

I think that changing the handle implementation to:

    public void handle(Void v) {
      if (periodic) {
        if (!cancelled.get()) {
          handler.handle(timerID);
        }
      } else if (cancelled.compareAndSet(false, true)) {
        try {
          handler.handle(timerID);
        } finally {
          // Clean up after it's fired
          cleanupNonPeriodic();
        }
      }
    }

should fix the problem, I will come up with tests that reproduce this issue in the coming days.

@vietj
Copy link
Member

vietj commented Apr 17, 2019

I think this could even be simplified to remove the cancelled and instead rely on the timeout map, i.e cancel removes from the map and executing a non periodic will also remove from the map, so replacing the CAS by the timeouts.remove(timerID) != null expression will achieve the same effect

@vietj
Copy link
Member

vietj commented Apr 17, 2019

that being said, I think that even before this could have relied on the future.cancel(false) operation that returns a boolean saying whether or not the task was executed and avoid the race condition

@vietj
Copy link
Member

vietj commented Apr 17, 2019

in practice actually we cannot rely on the result if future.cancel(false) because this return the future and we actually measure the cancellation of the timer which might be a task executed on a worker thread

@vietj vietj closed this as completed in 614db39 Apr 18, 2019
vietj added a commit that referenced this issue Apr 18, 2019
… in a racy manner, instead use the existing timeouts map and rely on the timeouts map removal to grant ownership of the timer termination (cancellation or timeout). - fixes #2912
@matthewdallen
Copy link
Author

Looks good to me, thank you!

slinkydeveloper pushed a commit to slinkydeveloper/vert.x that referenced this issue May 20, 2019
… in a racy manner, instead use the existing timeouts map and rely on the timeouts map removal to grant ownership of the timer termination (cancellation or timeout). - fixes eclipse-vertx#2912
@nareshsoni02
Copy link

The API contract for Vertx.cancelTimer() appears to indicate that if it returns true, the timer has been successfully canceled. However, the timer handler is not properly synchronized with the timer flag, so the handler can see the cancelled flag is false and start the handler execution before the CAS in the cancel() method runs.

Currently, we have to do our own synchronization around timers to ensure that when a timer fires indicating a timeout, we are guaranteed the handler does not also execute. It would be better if InternalTimerHandler.handle() also did a cancelled.compareAndSet(false, true) and only executed the handler when the CAS is successful, ensuring that canceled timers are guaranteed not to execute their handlers.

Has this been fixed? We are facing similar issue.

@nareshsoni02
Copy link

Looks good to me, thank you!

Has this been fixed? We are facing similar issue.

@vietj
Copy link
Member

vietj commented Apr 26, 2020

@nareshsoni02 it should be fixed, if it's not then it's a bug

@nareshsoni02
Copy link

When i call Vertx.cancelTimer(id) it returns true. But even after that it run handler code 1 or 2 times. This happens under load testing.
As i can see it had been fixed in 3.7.1. We are using vert 3.8.
Not sure whats wrong here. I would really appreciate you, if can help or direct me.

@vietj
Copy link
Member

vietj commented Apr 27, 2020

can you provide a reproducer project ?

@vietj
Copy link
Member

vietj commented Apr 27, 2020

I think that the current code is incorrect and racy, do you have a reproducer ?

@vietj
Copy link
Member

vietj commented Apr 27, 2020

actually it seems correct but too complicated :-)

@vietj
Copy link
Member

vietj commented Apr 27, 2020

@nareshsoni02 if you can provide a reproducer it would be great . Otherwise the code itself is not complicated, the timer cancellation will return whether the timer was still in the timeouts map. Before the timer is executed, this map is probed with a remove, hence there should be no race:

  public boolean cancelTimer(long id) {
    InternalTimerHandler handler = timeouts.remove(id);
    if (handler != null) {
      handler.cancel();
      return true;
    } else {
      return false;
    }
  }
if (timeouts.remove(timerID) != null) {
   handler.handle(timerID);
}

As the timer can only be removed once, it is either executed or cancelled from any thread perspective as the timeouts map is a concurrent map that will not return true twice then removing the same key concurrently.

@nareshsoni02
Copy link

public Future<Task> pollForTaskCompletion(RoutingContext context, String taskId) {
    Future<Task> completedTaskFuture = Future.future();
    long timerId = vertx.setPeriodic(POLLING_INTERVAL, timer -> {
        Future<Task> taskFuture = service.getTask(taskId);
        taskFuture.setHandler(taskAsync -> {
            if(taskAsync.succeeded()){
                Task  task = taskAsync.result();
                completedTaskFuture.complete(task);
            } else{
                //Cancel task if more than polling time
                cancelTask();
            }
        });
    });

    Future<Task> timerCompletionFuture = Future.future();
    completedTaskFuture.setHandler(taskAsyncResult ->{
        vertx.cancleTimer(timerId);
        if(taskAsyncResult.succeeded()){
            timerCompletionFuture.complete(taskAsyncResult.result());
        } else{
            timerCompletionFuture.fail(taskAsyncResult.cause());
        }
    });

    return timerCompletionFuture;
}

@vietj
Copy link
Member

vietj commented Apr 27, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants