Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref: make Context::alloc_ongoing return a guard #4248

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

flub
Copy link
Member

@flub flub commented Mar 30, 2023

This guard can be awaited and will resolve when Context::stop_ongoing
is called, i.e. the ongoing process is cancelled. The guard will also
free the ongoing process when dropped, making it RAII and easier to
not accidentally free the ongoing process.

#skip-changelog

This guard can be awaited and will resolve when Context::stop_ongoing
is called, i.e. the ongoing process is cancelled.  The guard will also
free the ongoing process when dropped, making it RAII and easier to
not accidentally free the ongoing process.
This is now handled better by the Drop from the OngoingGuard returned
by Context::alloc_ongoing.
@flub flub requested review from link2xt and iequidoo March 30, 2023 09:03
@flub
Copy link
Member Author

flub commented Mar 30, 2023

This also found a real bug: #4249

@@ -536,21 +553,24 @@ impl Context {
/// Signal an ongoing process to stop.
pub async fn stop_ongoing(&self) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe cancel_ongoing()? Because it doesn't actually wait for the ongoing process to stop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that cancel_ongoing() would be a nicer name. I guess I could change that as well. I didn't touch it because it is a pub API though and I feel like if we try and rename it maybe it should also be renamed in the FFI and JSON-RPC APIs and now we're breaking all clients. That seemed like a bit too much work for a slightly unfortunate name.

RunningState::ShallStop | RunningState::Stopped => {
// Put back the current state
*s = current_state;
info!(self, "No ongoing process to stop.",);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this log correct for ShallStop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind off, yes. The ongoing process is already requested to stop so there's no ongoing process to stop. 🤷

@flub
Copy link
Member Author

flub commented Apr 4, 2023

I am completely stumped by the test failure since merging master... it's not a flaky failure.

@link2xt
Copy link
Collaborator

link2xt commented Oct 9, 2023

@flub Could you rebase it or redo it on top of the current stable? I am going through oldest PRs and this one seems to be stale.

@flub
Copy link
Member Author

flub commented Oct 9, 2023

oh wow, i completely lost sight of this. yes, i'll update it

@link2xt link2xt deleted the branch main October 25, 2023 21:22
@link2xt link2xt closed this Oct 25, 2023
@link2xt link2xt reopened this Oct 25, 2023
@iequidoo
Copy link
Collaborator

@flub Could you rebase it or redo it on top of the current stable? I am going through oldest PRs and this one seems to be stale.

I guess now this PR also should go to main instead.

@link2xt
Copy link
Collaborator

link2xt commented Dec 17, 2023

Yes, master is outdated, currently "292 commits behind main".

@link2xt link2xt changed the base branch from master to main December 17, 2023 13:27
@flub
Copy link
Member Author

flub commented Dec 17, 2023

I looked a bit at the failures now. And while I could fix them I'm a bit unsure how to proceed:

The failures occurring are because async drop is difficult. So this PR works around that by spawning a task when the drop guard is created and the drop guard itself only sends a message to that task to do run the drop impl.

The major problem with this approach is that drop is no longer deterministic. It could run any other time. And this is what makes the tests falky now: sometimes drop has run, sometimes it hasn't and the context is not yet freed.

One could argue this is a matter of more synchronisation: I can add code that sends a signal when drop is finished and we can await until the mutex is truly freed. However this makes using it again brittle: because it is easy enough to write code that may suddenly need to rely on this signal, but nothing forces you to do it. And often you'll only realise with difficult to track bugs. This kind of was the entire point of the drop guard: to make it easier to write correct code. But this drop guard impl does not achieve that because it moves some other subtle synchronisation issues to the user.

So really, the only way to make this drop guard work is to really release the mutex in the sync Drop code. Maybe that's doable somehow, but it's so easy.

@iequidoo
Copy link
Collaborator

I looked a bit at the failures now. And while I could fix them I'm a bit unsure how to proceed:

The failures occurring are because async drop is difficult. So this PR works around that by spawning a task when the drop guard is created and the drop guard itself only sends a message to that task to do run the drop impl.

The major problem with this approach is that drop is no longer deterministic. It could run any other time. And this is what makes the tests falky now: sometimes drop has run, sometimes it hasn't and the context is not yet freed.

One could argue this is a matter of more synchronisation: I can add code that sends a signal when drop is finished and we can await until the mutex is truly freed. However this makes using it again brittle: because it is easy enough to write code that may suddenly need to rely on this signal, but nothing forces you to do it. And often you'll only realise with difficult to track bugs. This kind of was the entire point of the drop guard: to make it easier to write correct code. But this drop guard impl does not achieve that because it moves some other subtle synchronisation issues to the user.

So really, the only way to make this drop guard work is to really release the mutex in the sync Drop code. Maybe that's doable somehow, but it's so easy.

Btw, the same problem as i had in accounts.rs:Config::create_lock_task(). I think this can be solved by introducing a Stopping state. So, if it's Stopping, alloc_ongoing() should wait until it stops, but if it's Running, it must fail as now. Btw, u already have ShallStop, maybe it fits?

@@ -250,7 +252,7 @@ pub struct InnerContext {
#[derive(Debug)]
enum RunningState {
/// Ongoing process is allocated.
Running { cancel_sender: Sender<()> },
Running { cancel_sender: oneshot::Sender<()> },

/// Cancel signal has been sent, waiting for ongoing process to be freed.
ShallStop { request: Instant },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure we want to use Instant anywhere because it may freeze during the deep sleep on some systems e.g. Android. In #5108 i'm removing it

@link2xt link2xt force-pushed the main branch 2 times, most recently from 1abb12e to 2af9ff1 Compare March 4, 2024 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants