
Changed EventQueue::cancel to return boolean value #10750

Merged
merged 1 commit into ARMmbed:master from jarvte:eventqueue_cancel_return on Aug 27, 2019

Conversation

@jarvte
Contributor

commented Jun 4, 2019

Description

It's important in some cases to know whether EventQueue::cancel succeeded. The cancel method now returns a boolean value indicating whether the cancellation succeeded.

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[x] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@kjbracey-arm
@kivaisan

Release Notes

EventQueue::cancel now returns a boolean value indicating whether the cancellation was successful.

@ciarmcom ciarmcom requested review from kivaisan, kjbracey-arm and ARMmbed/mbed-os-maintainers Jun 4, 2019
@ciarmcom

Member

commented Jun 4, 2019

@@ -183,9 +183,9 @@ int equeue_post(equeue_t *queue, void (*cb)(void *), void *event);
 // The equeue_cancel function is irq safe.
 //
 // If called while the event queue's dispatch loop is active, equeue_cancel
-// does not guarantee that the event will not not execute after it returns as
+// does not guarantee that the event will not execute after it returns as

@kjbracey-arm

kjbracey-arm Jun 4, 2019

Contributor

This comment needs a bit of a rewrite now that we do return something. A return of true guarantees that it will not execute.

@jarvte

jarvte Jun 5, 2019

Author Contributor

updated comment

@@ -127,8 +127,10 @@ class EventQueue : private mbed::NonCopyable<EventQueue> {
  * returns, as the event may have already begun executing.
  *
  * @param id Unique id of the event
+ * @return true if event was successfully cancelled

@kjbracey-arm

kjbracey-arm Jun 4, 2019

Contributor

It may be worth adjusting the final paragraph above - "If called while the event queue's dispatch loop is active in another thread, ...". A call made from the same thread as the dispatch loop can always succeed.

@jarvte

jarvte Jun 5, 2019

Author Contributor

updated comment

@kjbracey-arm

Contributor

commented Jun 4, 2019

@geky, please take a look. This is required to fully fix #10684

@@ -47,7 +47,7 @@ unsigned EventQueue::tick()
     return EventQueue_stub::unsigned_value;
 }

-void EventQueue::cancel(int id)
+bool EventQueue::cancel(int id)
 {
 }

@SeppoTakalo

SeppoTakalo Jun 4, 2019

Contributor

bool should return true or false

@jarvte

jarvte Jun 5, 2019

Author Contributor

fixed to return true

@@ -602,7 +605,7 @@ static void equeue_chain_dispatch(void *p)
 static void equeue_chain_update(void *p, int ms)
 {
     struct equeue_chain_context *c = (struct equeue_chain_context *)p;
-    equeue_cancel(c->target, c->id);
+    (void)equeue_cancel(c->target, c->id);

@SeppoTakalo

SeppoTakalo Jun 4, 2019

Contributor

What happens if the cancellation would fail in this case? Is it possible, and should we stop then?

@jarvte

jarvte Jun 5, 2019

Author Contributor

I'm not sure so I left it as it is, maybe @geky could help with this one?

@geky

geky Jun 5, 2019

Member

Failure is ok in this case. If we don't cancel the old chain event the worst thing that happens is our chained equeue gets a spurious wakeup. But equeue handles spurious wakeups by always checking the time and going back to sleep if it's too early.

@0xc0170 0xc0170 requested a review from geky Jun 4, 2019
@jarvte jarvte force-pushed the jarvte:eventqueue_cancel_return branch from 7b896f1 to de555af Jun 5, 2019
@jarvte

Contributor Author

commented Jun 5, 2019

Force pushed review fixes

@jarvte jarvte force-pushed the jarvte:eventqueue_cancel_return branch from de555af to f329c0b Jun 5, 2019
@geky

Member

commented Jun 6, 2019

Thanks for the PR @jarvte, sorry about the delay.

I think it would be better to use the event's destructor:
#10684 (comment)

Some of the reason for not returning an error is that it's deceptive and tricky. With multiple threads you can't rely on an error to mean that the event is still running.

Though the real reason is that I haven't seen a use case for cancel errors that isn't handled better by relying on the event's destructor. (Though IMO the destructor hook is a bit cleaner in C). But I'd be happy to be wrong.

@kjbracey-arm

Contributor

commented Jun 6, 2019

With multiple threads you can't rely on an error to mean that the event is still running.

Can't you? Assuming the event has some synchronisation mechanism with the canceller (here a mutex), the canceller can know it has been queued but hasn't reached its synchronised section yet. So it either hasn't been dispatched yet, or is currently being dispatched and will soon reach its synchronised section.

If cancel returns an error, that means it wasn't found in the event queue, so we can conclude it is in the process of being dispatched, which in turn means it must be on another thread, so we can block this thread and wait for it to run.

Is there a flaw in that reasoning?

@geky

Member

commented Jun 6, 2019

Assuming the event has some synchronisation mechanism with the canceller (here a mutex)

With additional synchronization you can.

But otherwise you can't be sure that the event hasn't already expired.

@kjbracey-arm

Contributor

commented Jun 7, 2019

With additional synchronization you can.
But otherwise you can't be sure that the event hasn't already expired.

I think it's a fair assumption that anyone trying to cancel their events should have some synchronisation mechanism with the event itself. They must be sharing some sort of state, or they wouldn't need or want to cancel it, or care whether it had expired, hence they should have some way of synchronising with it. (So probably a mutex, an atomic, or knowledge that they're in the same event queue.)

In the example we're aiming for here, there is a mutex, and the routine clears the mutex-protected "event_queued" flag.

If we were to also clear in the destructor, the destructor would also need to take the mutex. But then we've got the problem that we've got a new state to worry about - between the routine and the destructor. If the routine doesn't clear the flag, we can have "event_queued" true, but the routine's already run. If the routine does clear the flag, then someone can queue another routine, but the destructor for the first clears it.

This is making my head hurt a bit. I would like to think about improving Event+EventQueue to forward better, to maybe make more complex objects with destructors usable as the functors, but I'm not sure borrowing the destructor is the answer for the routine synchronisation, at least in this example where we have to synchronise an immediate delete. If the whole thing was reference counted with shared pointers, it might be the answer.

@bulislaw

Member

commented Jul 2, 2019

That is a breaking change, isn't it? So it should target a major release.

@kjbracey-arm

Contributor

commented Jul 2, 2019

Adding a return value to a void member function is only very slightly breaking.

The only failure I can see is breaking anyone passing it as a member function pointer somewhere. That would be a pretty weird thing to do with this function, but I guess it's possible.

@geky

Member

commented Jul 3, 2019

Hi, sorry for my terribly bad response times. If this is blocking something don't let me stop it.

I still believe using the equeue destructor is the correct solution. Cleaning up an event's state is what the destructor was designed for. It is less risky than relying on knowledge of when the event is alive.

If you need a temporary solution that doesn't create an API change, consider using the SharedPtr.

The unfortunate fact is this is the C++03 solution to lifetime problems until we adopt C++11. Just look at the STL functions before emplace_back and friends.

If we were to also clear in the destructor, the destructor would also need to take the mutex. But then we've got the problem that we've got a new state to worry about - between the routine and the destructor. If the routine doesn't clear the flag, we can have "event_queued" true, but the routine's already run. If the routine does clear the flag, then someone can queue another routine, but the destructor for the first clears it.

Isn't the sharing of state what encapsulation solves?

Keep in mind the destructor runs if either 1. the event is canceled, or 2. the routine runs. So I don't believe you will run into the overlapping issue.

For #10684, I think you just need the destructor to set the _event_id to 0 in any situation.

@kjbracey-arm

Contributor

commented Jul 4, 2019

I'm still believe using the equeue destructor is the correct solution. Cleaning up an event's state is what the destructor was designed for. It is less risky than relying on knowledge of when the event is alive.

It's pretty heavy though. There's a possible optimisation in Callback to disable storage of non-trivial types, which I'm planning to expose as a config option, and that saves 800 bytes of ROM on a client build. (On top of another 400 bytes gained from case-by-case detection of triviality with type traits.) We can currently enable it for low-memory builds because nobody is storing anything other than (member) function pointers.

The unfortunate fact is this is the C++03 solution to lifetime problems until we adopt C++11. Just look the STL functions before emplace_back and friends.

Ah, well the handy thing there is we have adopted C++11. So if you can cook up a C++11 answer, I'm all for it.

Isn't the sharing of state what encapsulation solves?

The overlap I'm considering is for where the event is not encapsulating information, but is just a trigger to start work. (Most common case?)

For #10684, I think you just need the destructor to set the _event_id to 0 in any situation.

I can't figure out how to make that work. Failure scenario:

  • Thread A: Queue event. _event_id = 1
  • Thread B: start to dispatch event 1,
  • Thread B: event 1 runs, clears event_id
  • Thread A: Queue event, event_id = 2
  • Thread B: destroys event 1, sets event_id = 0
  • Thread A: Destruction - sees event_id = 0, does not cancel event 2
  • Thread B: Event 2 runs, crashes claiming deleted mutex.

That's assuming the event running clears the event ID inside its synchronisation logic. If it doesn't, then you get this failure scenario:

  • Thread A: Queue event. _event_id = 1
  • Thread B: start to dispatch event 1,
  • Thread B: event 1 runs,
  • Thread A: Has new work, sees event_id set, decides it doesn't need to queue
  • Thread B: destroys event 1, sets event_id = 0
  • New work never processed.

I'm sure there's some trick to make that work by putting logic into the destructor to retrigger a copy of itself, but this is getting incredibly complicated.

@kjbracey-arm

Contributor

commented Jul 8, 2019

Another way of looking at it - we're not at all interested in the lifetime of the event object itself. It could linger on forever as far as we care. It's only a trigger. The only thing we care about is whether the event routine's synchronised section is due to run. And the event object itself can't determine that - only the event queue has that information available.

It seems like it would make sense to use the destructor of the event object to deal with any lifetime issues it has itself - eg destroying a shared pointer inside it. But we need synchronisation with the event queue and in the event routine.

Certainly you could make this work by using a shared (or weak) pointer in the event object, but that's forcing even more dynamic allocation into the design, which I want to avoid. In this scenario, ideally both the network interface and the event itself would be entirely static. At the minute we can at least have the interface be entirely static, and there's still potential to permit static events.

@geky

Member

commented Jul 9, 2019

Oh ok, so if I understand correctly you're dealing with a chain of multiple events, where the thing you need to clean up persists across the chain?

This does actually seem like a case where using reference counting would protect everything quite nicely, but I'll give you that it does get more complicated.

Let's go with adding the boolean value to cancel if it makes life easier. I don't have that strong of an argument against it. As long as it comes in via the normal API change runway in Mbed OS's release schedule (minor release?), public API changes shouldn't be easy.


Ah, well the handy thing there is we have adopted C++11. So if you can cook up a C++11 answer, I'm all for it.

Then you just need to modify EventQueue to do perfect forwarding :)

@geky
geky approved these changes Jul 9, 2019
@kjbracey-arm

Contributor

commented Jul 9, 2019

Oh ok, so if I understand correctly you're dealing with a chain of multiple events, where the thing you need to clean up persists across the chain?

Well, it's even simpler than that, really. The event is just a singleton trigger from "arbitrary thread context" to start "work on event queue". It contains no information, it's just the queued request to "start work".

The model (which is very typical in various Mbed components) is

* arbitrary context (eg `NetworkSocket::sigio`) - queue event if "event queued flag" clear 
* event queue - clear "event queued flag", process work

Notionally, this model would settle for a totally static handler routine and a squashable "event flag" type trigger - we don't really need to queue an object in the event queue.

We need to cancel in the destructor of the subsystem - the destructor has to (synchronously) cancel any pending events. (Or otherwise ensure they would be no-ops that don't touch our destroyed object.)

This does actually seem like a case where using reference counting would protect everything quite nicely, but I'll give you that it does get more complicated.

Yes, if the "event queued flag" and a "cancel flag" were actually accessed via a shared pointer, rather than being embedded in the subsystem object, and that shared pointer was passed in the event object (as weak?), then it would work, but I'd rather avoid the dynamic allocation (and the need to explain the mechanism).

Then you just need to modify EventQueue to do perfect forwarding :)

I don't think perfect forwarding is what we're thinking about here - we have to capture.

But we could upgrade to move capture rather than copy. (But I note std::function itself always copy-captures its FunctionObject - it doesn't offer move capture)

@geky geky referenced this pull request Jul 21, 2019
@geky

Member

commented Jul 23, 2019

We need to cancel in the destructor of the subsystem - the destructor has to (synchronously) cancel any pending events. (Or otherwise ensure they'd be no-ops not touching our destroyed object).

But if the events are not chained, does the class need to know whether or not canceling the events succeeded? (well, it knows no events will run after cancel).

Yes, if the "event queued flag" and a "cancel flag" were actually accessed via a shared pointer, rather than being embedded in the subsystem object, and that shared pointer was passed in the event object (as weak?), then it would work, but I'd rather avoid the dynamic allocation (and the need to explain the mechanism).

I'm not sure how dynamic memory got involved. If the flags are static/class allocated, then in theory so can the reference count + pointer.

I don't think perfect forwarding is what we're thinking about here - we have to capture.

Both work, from what I understand. With perfect forwarding you can construct the object directly in the event queue without needing a copy.


I think this PR is good to go 👍

@0xc0170

Member

commented Aug 20, 2019

@jarvte Please rebase. Shall we move this forward now, as we are getting close to 5.14?

@kjbracey-arm Can you approve or not?

@jarvte jarvte force-pushed the jarvte:eventqueue_cancel_return branch from f329c0b to 719117e Aug 20, 2019
@jarvte

Contributor Author

commented Aug 20, 2019

@0xc0170 rebase done

@0xc0170 0xc0170 added needs: CI and removed needs: work labels Aug 20, 2019
@0xc0170

Member

commented Aug 21, 2019

CI started

@mbed-ci


commented Aug 21, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 1
Build artifacts

@0xc0170 0xc0170 added needs: review and removed needs: CI labels Aug 21, 2019
@0xc0170 0xc0170 requested a review from bulislaw Aug 21, 2019
@0xc0170

Member

commented Aug 26, 2019

Adding a return value to a void member function is only very slightly breaking.

Only failure I can see is breaking anyone passing it as a member function pointer somewhere. Would be pretty weird thing to do for this function, but I guess it's possible.

@bulislaw Based on the description above, is this going in for 5.14?

@bulislaw

Member

commented Aug 27, 2019

Yeah, I think it's ok to go as is to minor.

@0xc0170 0xc0170 merged commit 6add979 into ARMmbed:master Aug 27, 2019
25 checks passed
continuous-integration/jenkins/pr-head This commit looks good
Details
jenkins-ci/build-ARM Success
Details
jenkins-ci/build-GCC_ARM Success
Details
jenkins-ci/build-IAR Success
Details
jenkins-ci/cloud-client-test Success
Details
jenkins-ci/dynamic-memory-usage RTOS ROM(+0 bytes) RAM(+0 bytes)
Details
jenkins-ci/exporter Success
Details
jenkins-ci/greentea-test Success
Details
jenkins-ci/mbed2-build-ARM Success
Details
jenkins-ci/mbed2-build-GCC_ARM Success
Details
jenkins-ci/mbed2-build-IAR Success
Details
jenkins-ci/unittests Success
Details
travis-ci/astyle Success!
Details
travis-ci/docs Success!
Details
travis-ci/doxy-spellcheck Success!
Details
travis-ci/events Success! Runtime is 8617 cycles.
Details
travis-ci/gitattributestest Success!
Details
travis-ci/include_check Success!
Details
travis-ci/licence_check Success!
Details
travis-ci/littlefs Success! Code size is 8464B.
Details
travis-ci/psa-autogen Success!
Details
travis-ci/tools-py2.7 Success!
Details
travis-ci/tools-py3.5 Success!
Details
travis-ci/tools-py3.6 Success!
Details
travis-ci/tools-py3.7 Success!
Details