@jarvte jarvte commented Jun 4, 2019

Description

In some cases it's important to know whether EventQueue::cancel succeeded. The cancel method now returns a boolean value indicating whether the cancel succeeded.

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[x] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@kjbracey-arm
@kivaisan

Release Notes

EventQueue::cancel now returns a boolean value indicating whether the cancel was successful.
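
A minimal usage sketch of the new return value (illustrative only; `blink` and the delay are hypothetical, not from this PR):

```cpp
#include "mbed_events.h"

events::EventQueue queue;

void blink()
{
    // some periodic work
}

int main()
{
    int id = queue.call_in(1000, blink); // schedule; returns a unique event id

    if (!queue.cancel(id)) {
        // cancel failed: the id was invalid, or the dispatch loop has
        // already started (or finished) executing the event
    }

    queue.dispatch_forever();
}
```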

@ciarmcom ciarmcom requested review from kivaisan, kjbracey and a team June 4, 2019 07:00

ciarmcom commented Jun 4, 2019

@jarvte, thank you for your changes.
@kjbracey-arm @kivaisan @ARMmbed/mbed-os-test @ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers please review.

 //
 // If called while the event queue's dispatch loop is active, equeue_cancel
-// does not guarantee that the event will not not execute after it returns as
+// does not guarantee that the event will not execute after it returns as

Contributor

This comment needs a bit of a rewrite now that we do return something. A return of true guarantees that it will not execute.

Contributor Author

updated comment

* returns, as the event may have already begun executing.
*
* @param id Unique id of the event
* @return true if event was successfully cancelled

Contributor

It may be worth adjusting the final paragraph above - "If called while the event queue's dispatch loop is active in another thread, ...". A call made from the same thread as the dispatch loop can always succeed.

Contributor Author

updated comment

kjbracey commented Jun 4, 2019

@geky, please take a look. This is required to fully fix #10684

-void EventQueue::cancel(int id)
+bool EventQueue::cancel(int id)
 {
 }

Contributor

bool should return true or false

Contributor Author

fixed to return true

 {
     struct equeue_chain_context *c = (struct equeue_chain_context *)p;
-    equeue_cancel(c->target, c->id);
+    (void)equeue_cancel(c->target, c->id);

Contributor

What happens if the cancellation would fail in this case? Is it possible, and should we stop then?

Contributor Author

I'm not sure, so I left it as it is; maybe @geky could help with this one?

Contributor

Failure is ok in this case. If we don't cancel the old chain event the worst thing that happens is our chained equeue gets a spurious wakeup. But equeue handles spurious wakeups by always checking the time and going back to sleep if it's too early.
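
For illustration, a minimal sketch of that pattern, using std::condition_variable in place of equeue's actual semaphore (not the real equeue code):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

using Clock = std::chrono::steady_clock;

std::mutex m;
std::condition_variable cv;
Clock::time_point next_deadline = Clock::now() + std::chrono::seconds(1);

void dispatch_loop()
{
    std::unique_lock<std::mutex> lk(m);
    for (;;) {
        // may return early: a spurious wakeup, or a stale signal from a
        // cancelled chain event
        cv.wait_until(lk, next_deadline);
        if (Clock::now() < next_deadline) {
            continue; // woke too early: just go back to sleep
        }
        // ... deadline reached: dispatch the due event here ...
        next_deadline = Clock::now() + std::chrono::seconds(1);
    }
}
```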

jarvte commented Jun 5, 2019

Force-pushed review fixes

@jarvte jarvte force-pushed the eventqueue_cancel_return branch from de555af to f329c0b Compare June 5, 2019 06:32

geky commented Jun 6, 2019

Thanks for the PR @jarvte, sorry about the delay.

I think it would be better to use the event's destructor:
#10684 (comment)

Part of the reason for not returning an error is that it's deceptive and tricky. With multiple threads you can't rely on an error to mean that the event is still running.

Though the real reason is that I haven't seen a use case for cancel errors that isn't handled better by relying on the event's destructor. (Though IMO the destructor hook is a bit cleaner in C). But I'd be happy to be wrong.

kjbracey commented Jun 6, 2019

With multiple threads you can't rely on an error to mean that the event is still running.

Can't you? Assuming the event has some synchronisation mechanism with the canceller (here a mutex), the canceller can know it has been queued but hasn't reached its synchronised section yet. So it either hasn't been dispatched yet, or is currently being dispatched and will soon reach its synchronised section.

If cancel returns an error, that means it wasn't found in the event queue, so we can conclude it is in the process of being dispatched, which in turn means it must be on another thread, so we can block this thread and wait for it to run.
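
Sketching that reasoning (illustrative names; std::mutex and std::condition_variable standing in for whatever synchronisation the canceller and the event share):

```cpp
#include <condition_variable>
#include <mutex>
#include "mbed_events.h"

std::mutex mtx;
std::condition_variable done_cv;
bool event_done = false;

void event_routine()
{
    std::lock_guard<std::mutex> lk(mtx); // the synchronised section
    // ... touch the shared state ...
    event_done = true;
    done_cv.notify_all();
}

void cancel_or_wait(events::EventQueue &queue, int id)
{
    if (queue.cancel(id)) {
        return; // found and removed: the event will never run
    }
    // not found: it is being dispatched on another thread right now,
    // so block until its synchronised section has completed
    std::unique_lock<std::mutex> lk(mtx);
    done_cv.wait(lk, [] { return event_done; });
}
```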

Is there a flaw in that reasoning?

geky commented Jun 6, 2019

Assuming the event has some synchronisation mechanism with the canceller (here a mutex)

With additional synchronization you can.

But otherwise you can't be sure that the event hasn't already expired.

kjbracey commented Jun 7, 2019

With additional synchronization you can.
But otherwise you can't be sure that the event hasn't already expired.

I think it's a fair assumption that anyone trying to cancel their events should have some synchronisation mechanism with the event itself. They must be sharing some sort of state, or they wouldn't be needing/wanting to cancel it, and caring whether it had expired or not; hence they should have some way of synchronising with it. (So probably a mutex, an atomic, or knowledge that they're in the same event queue.)

In the example we're aiming for here, there is a mutex, and the routine clears the mutex-protected "event_queued" flag.

If we were to also clear in the destructor, the destructor would also need to take the mutex. But then we've got the problem that we've got a new state to worry about - between the routine and the destructor. If the routine doesn't clear the flag, we can have "event_queued" true, but the routine's already run. If the routine does clear the flag, then someone can queue another routine, but the destructor for the first clears it.

This is making my head hurt a bit. I would like to think about improving Event+EventQueue to forward better, to maybe make more complex objects with destructors usable as the functors, but I'm not sure borrowing the destructor is the answer for the routine synchronisation, at least in this example where we have to synchronise an immediate delete. If the whole thing was reference counted with shared pointers, it might be the answer.

bulislaw commented Jul 2, 2019

That is a breaking change, isn't it? So it should target a major release.

kjbracey commented Jul 2, 2019

Adding a return value to a void member function is only very slightly breaking.

Only failure I can see is breaking anyone passing it as a member function pointer somewhere. Would be pretty weird thing to do for this function, but I guess it's possible.
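
For concreteness, the kind of code that would break (hypothetical; nothing in Mbed OS is known to do this):

```cpp
#include "mbed_events.h"

// before this PR:
// void (events::EventQueue::*cancel_fn)(int) = &events::EventQueue::cancel;

// after this PR, the pointer type must change to match:
bool (events::EventQueue::*cancel_fn)(int) = &events::EventQueue::cancel;
```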

geky commented Jul 3, 2019

Hi, sorry for my terribly bad response times. If this is blocking something don't let me stop it.

I still believe using the equeue destructor is the correct solution. Cleaning up an event's state is what the destructor was designed for. It is less risky than relying on knowledge of when the event is alive.

If you need a temporary solution that doesn't create an API change, consider using the SharedPtr.

The unfortunate fact is this is the C++03 solution to lifetime problems until we adopt C++11. Just look at the STL functions before emplace_back and friends.

If we were to also clear in the destructor, the destructor would also need to take the mutex. But then we've got the problem that we've got a new state to worry about - between the routine and the destructor. If the routine doesn't clear the flag, we can have "event_queued" true, but the routine's already run. If the routine does clear the flag, then someone can queue another routine, but the destructor for the first clears it.

Isn't the sharing of state what encapsulation solves?

Keep in mind the destructor runs if either 1. the event is canceled, or 2. the routine runs. So I don't believe you will run into the overlapping issue.

For #10684, I think you just need the destructor to set the _event_id to 0 in any situation.

kjbracey commented Jul 4, 2019

I still believe using the equeue destructor is the correct solution. Cleaning up an event's state is what the destructor was designed for. It is less risky than relying on knowledge of when the event is alive.

It's pretty heavy though. There's a possible optimisation in Callback to disable storage of non-trivial types, which I'm planning to expose as a config option; that saves 800 bytes of ROM on a client build (on top of another 400 bytes gained by case-by-case detection of triviality with type traits). We can currently enable it for low memory because nobody is storing anything other than (member) function pointers.

The unfortunate fact is this is the C++03 solution to lifetime problems until we adopt C++11. Just look at the STL functions before emplace_back and friends.

Ah, well the handy thing there is we have adopted C++11. So if you can cook up a C++11 answer, I'm all for it.

Isn't the sharing of state what encapsulation solves?

The overlap I'm considering is for where the event is not encapsulating information, but is just a trigger to start work. (Most common case?)

For #10684, I think you just need the destructor to set the _event_id to 0 in any situation.

I can't figure out how to make that work. Failure scenario:

  • Thread A: Queue event. _event_id = 1
  • Thread B: start to dispatch event 1,
  • Thread B: event 1 runs, clears event_id
  • Thread A: Queue event, event_id = 2
  • Thread B: destroys event 1, sets event_id = 0
  • Thread A: Destruction - sees event_id = 0, does not cancel event 2
  • Thread B: Event 2 runs, crashes claiming deleted mutex.

That's assuming the event running clears the event ID inside its synchronisation logic. If it doesn't, then you get this failure scenario:

  • Thread A: Queue event. _event_id = 1
  • Thread B: start to dispatch event 1,
  • Thread B: event 1 runs,
  • Thread A: Has new work, sees event_id set, decides it doesn't need to queue
  • Thread B: destroys event 1, sets event_id = 0
  • New work never processed.

I'm sure there's some trick to make that work by putting logic into the destructor to retrigger a copy of itself, but this is getting incredibly complicated.

kjbracey commented Jul 8, 2019

Another way of looking at it - we're not at all interested in the lifetime of the event object itself. It could linger on forever as far as we care. It's only a trigger. The only thing we care about is whether the event routine's synchronised section is due to run. And the event object itself can't determine that - only the event queue has that information available.

It seems like it would make sense to use the destructor of the event object to deal with any lifetime issues it has itself - eg destroying a shared pointer inside it. But we need synchronisation with the event queue and in the event routine.

Certainly you could make this work by using a shared (or weak) pointer in the event object, but that's forcing even more dynamic allocation into the design, which I want to avoid. In this scenario, ideally both the network interface and the event itself would be entirely static. At the minute we can at least have the interface be entirely static, and there's still potential to permit static events.

geky commented Jul 9, 2019

Oh ok, so if I understand correctly you're dealing with a chain of multiple events, where the thing you need to clean up persists across the chain?

This does actually seem like a case where using reference counting would protect everything quite nicely, but I'll give you that it does get more complicated.

Let's go with adding the boolean return value to cancel if it makes life easier. I don't have that strong of an argument against it. As long as it comes in through the normal API-change runway in Mbed OS's release schedule (minor release?), that's fine; public API changes shouldn't be easy.


Ah, well the handy thing there is we have adopted C++11. So if you can cook up a C++11 answer, I'm all for it.

Then you just need to modify EventQueue to do perfect forwarding :)

kjbracey commented Jul 9, 2019

Oh ok, so if I understand correctly you're dealing with a chain of multiple events, where the thing you need to clean up persists across the chain?

Well, it's even simpler than that, really. The event is just a singleton trigger from "arbitrary thread context" to start "work on event queue". It contains no information, it's just the queued request to "start work".

The model (which is very typical in various Mbed components) is

* arbitrary context (eg `NetworkSocket::sigio`) - queue event if "event queued flag" clear
* event queue - clear "event queued flag", process work

Notionally, this model would settle for a totally static handler routine and a squashable "event flag" type trigger - we don't really need to queue an object in the event queue.

We need to cancel in the destructor of the subsystem - the destructor has to (synchronously) cancel any pending events. (Or otherwise ensure they'd be no-ops not touching our destroyed object).
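
A hedged sketch of this model under those assumptions (`Subsystem`, `trigger` and `process` are illustrative names, not Mbed API):

```cpp
#include <mutex>
#include "mbed_events.h"

class Subsystem {
public:
    explicit Subsystem(events::EventQueue &queue) : _queue(queue) {}

    // arbitrary context (e.g. a sigio callback): queue only if no
    // trigger event is already pending
    void trigger()
    {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_event_id == 0) {
            _event_id = _queue.call(this, &Subsystem::process);
        }
    }

    ~Subsystem()
    {
        _mutex.lock();
        int id = _event_id;
        _mutex.unlock();
        // the destructor must synchronously ensure no pending event
        // will touch this object once it is gone
        if (id != 0 && !_queue.cancel(id)) {
            // the event is already dispatching on another thread; a
            // real implementation would block here until process()
            // has left its synchronised section
        }
    }

private:
    // runs on the event queue
    void process()
    {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _event_id = 0; // clear the flag in the synchronised section
        }
        // ... do the actual work ...
    }

    events::EventQueue &_queue;
    std::mutex _mutex;
    int _event_id = 0;
};
```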

This does actually seem like a case where using reference counting would protect everything quite nicely, but I'll give you that it does get more complicated.

Yes, if the "event queued flag" and a "cancel flag" were actually accessed via a shared pointer, rather than being embedded in the subsystem object, and that shared pointer was passed in the event object (as weak?), then it would work, but I'd rather avoid the dynamic allocation (and the need to explain the mechanism).

Then you just need to modify EventQueue to do perfect forwarding :)

I don't think perfect forwarding is what we're thinking about here - we have to capture.

But we could upgrade to move capture rather than copy. (But I note std::function itself always copy-captures its FunctionObject - it doesn't offer move capture)

@geky geky mentioned this pull request Jul 21, 2019

geky commented Jul 23, 2019

We need to cancel in the destructor of the subsystem - the destructor has to (synchronously) cancel any pending events. (Or otherwise ensure they'd be no-ops not touching our destroyed object).

But if the events are not chained, does the class need to know whether or not canceling the events succeeded? (well, it knows no events will run after cancel).

Yes, if the "event queued flag" and a "cancel flag" were actually accessed via a shared pointer, rather than being embedded in the subsystem object, and that shared pointer was passed in the event object (as weak?), then it would work, but I'd rather avoid the dynamic allocation (and the need to explain the mechanism).

I'm not sure how dynamic memory got involved. If the flags are static/class allocated, then in theory so can the reference count + pointer.

I don't think perfect forwarding is what we're thinking about here - we have to capture.

Both work, from what I understand. With perfect forwarding you can construct the object directly in the event queue without needing a copy.
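
A standard-library illustration of the distinction (nothing EventQueue-specific):

```cpp
#include <utility>
#include <vector>

struct Big {
    // imagine something expensive to copy
};

int main()
{
    std::vector<Big> v;
    Big b;

    v.push_back(b);            // copy-capture style: copies b into storage
    v.push_back(std::move(b)); // C++11 move capture: no copy
    v.emplace_back();          // perfect forwarding: constructed in place
}
```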


I think this PR is good to go 👍

0xc0170 commented Aug 20, 2019

@jarvte Please rebase. Shall we move this forward now, as we are getting close to 5.14?

@kjbracey-arm Can you approve or not?

@jarvte jarvte force-pushed the eventqueue_cancel_return branch from f329c0b to 719117e Compare August 20, 2019 12:57

jarvte commented Aug 20, 2019

@0xc0170 rebase done

0xc0170 commented Aug 21, 2019

CI started

mbed-ci commented Aug 21, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number: 1
Build artifacts

0xc0170 commented Aug 26, 2019

Adding a return value to a void member function is only very slightly breaking.

Only failure I can see is breaking anyone passing it as a member function pointer somewhere. Would be pretty weird thing to do for this function, but I guess it's possible.

@bulislaw Based on the description above, is this going in for 5.14?

@bulislaw
Member

Yeah, I think it's ok to go as is to minor.

@0xc0170 0xc0170 merged commit 6add979 into ARMmbed:master Aug 27, 2019
@jarvte jarvte deleted the eventqueue_cancel_return branch January 17, 2020 12:21