ref: make Context::alloc_ongoing return a guard #4248

flub · 2023-03-30T08:54:25Z

This guard can be awaited and will resolve when Context::stop_ongoing
is called, i.e. the ongoing process is cancelled. The guard will also
free the ongoing process when dropped, making it RAII and harder to
accidentally forget to free the ongoing process.
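
A minimal sketch of the idea, not the code from this PR: it assumes a plain `std::sync::Mutex` instead of the async state the real `Context` uses, and the `Shared` struct, `is_allocated` and `poll_once` are hypothetical names made up for illustration. The guard resolves once `stop_ongoing` has been called and frees the ongoing process when dropped.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context as TaskContext, Poll, Wake, Waker};

/// Hypothetical shared state; the real Context tracks a RunningState enum.
#[derive(Default)]
struct Shared {
    allocated: bool,
    cancelled: bool,
    waker: Option<Waker>,
}

#[derive(Clone, Default)]
struct Context {
    shared: Arc<Mutex<Shared>>,
}

/// RAII guard: awaiting it waits for cancellation, dropping it frees the process.
struct OngoingGuard {
    shared: Arc<Mutex<Shared>>,
}

impl Context {
    /// Returns None if an ongoing process is already allocated.
    fn alloc_ongoing(&self) -> Option<OngoingGuard> {
        let mut s = self.shared.lock().unwrap();
        if s.allocated {
            return None;
        }
        s.allocated = true;
        s.cancelled = false;
        Some(OngoingGuard { shared: Arc::clone(&self.shared) })
    }

    /// Signals the ongoing process to stop; does not wait for it to finish.
    fn stop_ongoing(&self) {
        let mut s = self.shared.lock().unwrap();
        if s.allocated {
            s.cancelled = true;
            if let Some(waker) = s.waker.take() {
                waker.wake();
            }
        }
    }

    fn is_allocated(&self) -> bool {
        self.shared.lock().unwrap().allocated
    }
}

impl Future for OngoingGuard {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut TaskContext<'_>) -> Poll<()> {
        let mut s = self.shared.lock().unwrap();
        if s.cancelled {
            Poll::Ready(())
        } else {
            // Remember who to wake once stop_ongoing() is called.
            s.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

impl Drop for OngoingGuard {
    fn drop(&mut self) {
        // Synchronous release: the process is free as soon as drop returns.
        let mut s = self.shared.lock().unwrap();
        s.allocated = false;
        s.waker = None;
    }
}

/// Helper for the sketch: poll the guard once with a waker that does nothing.
fn poll_once(guard: &mut OngoingGuard) -> Poll<()> {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = TaskContext::from_waker(&waker);
    Pin::new(guard).poll(&mut cx)
}
```

In an async function the guard would simply be awaited, typically inside a `select!` next to the actual work, so that cancellation interrupts it.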

#skip-changelog

This is now handled better by the Drop from the OngoingGuard returned by Context::alloc_ongoing.

flub · 2023-03-30T09:04:11Z

This also found a real bug: #4249

iequidoo · 2023-03-30T13:51:23Z

src/context.rs

@@ -536,21 +553,24 @@ impl Context {
    /// Signal an ongoing process to stop.
    pub async fn stop_ongoing(&self) {


Maybe cancel_ongoing()? Because it doesn't actually wait for the ongoing process to stop

Yeah, I agree that cancel_ongoing() would be a nicer name. I guess I could change that as well. I didn't touch it because it is a pub API though and I feel like if we try and rename it maybe it should also be renamed in the FFI and JSON-RPC APIs and now we're breaking all clients. That seemed like a bit too much work for a slightly unfortunate name.

iequidoo · 2023-03-30T13:52:44Z

src/context.rs

            RunningState::ShallStop | RunningState::Stopped => {
+                // Put back the current state
+                *s = current_state;
                info!(self, "No ongoing process to stop.",);


Is this log correct for ShallStop?

Kind of, yes. The ongoing process has already been requested to stop, so there's no ongoing process to stop. 🤷
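
For context, the "put back the current state" pattern from the hunk above can be sketched as follows. This is an illustration under assumptions: `std::sync::mpsc` stands in for the oneshot channel, `ShallStop` is simplified to a unit variant, and the free function `stop_ongoing`, its `bool` return value and `new_running` are hypothetical.

```rust
use std::mem;
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug)]
enum RunningState {
    /// Ongoing process is allocated.
    Running { cancel_sender: Sender<()> },
    /// Cancel signal has been sent, waiting for ongoing process to be freed.
    ShallStop,
    /// No ongoing process is allocated.
    Stopped,
}

/// Hypothetical helper: a Running state plus the receiving end of its cancel channel.
fn new_running() -> (RunningState, Receiver<()>) {
    let (cancel_sender, cancel_receiver) = channel();
    (RunningState::Running { cancel_sender }, cancel_receiver)
}

/// Returns true if this call sent the cancel signal.
fn stop_ongoing(s: &mut RunningState) -> bool {
    // Take the state out so the sender can be consumed by value.
    let current_state = mem::replace(s, RunningState::ShallStop);
    match current_state {
        RunningState::Running { cancel_sender } => {
            // The receiver may already be gone; that is fine for the sketch.
            let _ = cancel_sender.send(());
            true
        }
        RunningState::ShallStop | RunningState::Stopped => {
            // Put back the current state: nothing left to stop.
            *s = current_state;
            false
        }
    }
}
```

With this shape a second call while in `ShallStop` is a no-op, which is the case the log message question is about.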

flub · 2023-04-04T13:00:30Z

I am completely stumped by the test failure since merging master... it's not a flaky failure.

link2xt · 2023-10-09T16:02:14Z

@flub Could you rebase it or redo it on top of the current stable? I am going through oldest PRs and this one seems to be stale.

flub · 2023-10-09T18:32:31Z

oh wow, i completely lost sight of this. yes, i'll update it

iequidoo · 2023-12-16T18:25:09Z

> @flub Could you rebase it or redo it on top of the current stable? I am going through oldest PRs and this one seems to be stale.

I guess now this PR also should go to main instead.

link2xt · 2023-12-17T13:27:27Z

Yes, master is outdated, currently "292 commits behind main".

flub · 2023-12-17T16:27:20Z

I looked a bit at the failures now. And while I could fix them I'm a bit unsure how to proceed:

The failures occur because async drop is difficult. This PR works around that by spawning a task when the drop guard is created; the drop guard itself only sends a message to that task to run the drop impl.

The major problem with this approach is that drop is no longer deterministic: it could run at any later time. And this is what makes the tests flaky now: sometimes drop has run, sometimes it hasn't and the context is not yet freed.

One could argue this is a matter of more synchronisation: I can add code that sends a signal when drop is finished and we can await until the mutex is truly freed. However, this makes using it brittle again, because it is easy enough to write code that suddenly needs to rely on this signal, but nothing forces you to do so. And often you'll only realise this through difficult-to-track bugs. This was kind of the entire point of the drop guard: to make it easier to write correct code. But this drop guard impl does not achieve that because it moves some other subtle synchronisation issues to the user.

So really, the only way to make this drop guard work is to actually release the mutex in the sync Drop code. Maybe that's doable somehow, but it's not so easy.
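
The deferred-drop workaround and the extra "done" signal described above can be sketched roughly like this. It is an illustration only: a thread stands in for the spawned async task, a `Mutex<bool>` stands in for the context state, and `DeferredGuard` and `alloc_deferred` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Guard whose Drop only *requests* the cleanup.
struct DeferredGuard {
    drop_tx: Sender<()>,
}

impl Drop for DeferredGuard {
    fn drop(&mut self) {
        // The real cleanup runs later on the worker; the order is not deterministic.
        let _ = self.drop_tx.send(());
    }
}

/// Marks the flag as allocated and returns the guard plus a receiver that
/// fires once the deferred cleanup has actually run.
fn alloc_deferred(allocated: Arc<Mutex<bool>>) -> Option<(DeferredGuard, Receiver<()>)> {
    {
        let mut a = allocated.lock().unwrap();
        if *a {
            return None;
        }
        *a = true;
    }
    let (drop_tx, drop_rx) = channel::<()>();
    let (done_tx, done_rx) = channel::<()>();
    thread::spawn(move || {
        // Wait until the guard is dropped, then do the real cleanup.
        let _ = drop_rx.recv();
        *allocated.lock().unwrap() = false;
        // The caller may have stopped listening; ignore the error.
        let _ = done_tx.send(());
    });
    Some((DeferredGuard { drop_tx }, done_rx))
}
```

Between `drop(guard)` and receiving on the second channel the flag may or may not be cleared yet. Nothing forces a caller to wait on that channel, which is the brittleness described above.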

iequidoo · 2023-12-17T19:12:51Z

Btw, this is the same problem I had in accounts.rs:Config::create_lock_task(). I think this can be solved by introducing a Stopping state: if it's Stopping, alloc_ongoing() should wait until it stops, but if it's Running, it must fail as it does now. Btw, you already have ShallStop, maybe it fits?
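
A rough sketch of that suggestion, assuming a blocking `Mutex` plus `Condvar` instead of async primitives. The `Ongoing` struct and the method names `alloc`, `request_stop`, `free` and `state` are hypothetical and not part of the PR.

```rust
use std::sync::{Arc, Condvar, Mutex};

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Stopped,
    Running,
    Stopping,
}

struct Ongoing {
    state: Mutex<State>,
    changed: Condvar,
}

impl Ongoing {
    fn new() -> Arc<Self> {
        Arc::new(Ongoing {
            state: Mutex::new(State::Stopped),
            changed: Condvar::new(),
        })
    }

    /// Waits while a previous process is Stopping, fails while one is Running.
    fn alloc(&self) -> Result<(), &'static str> {
        let mut s = self.state.lock().unwrap();
        while *s == State::Stopping {
            s = self.changed.wait(s).unwrap();
        }
        if *s == State::Running {
            return Err("ongoing process already running");
        }
        *s = State::Running;
        Ok(())
    }

    /// Marks the running process as being stopped.
    fn request_stop(&self) {
        let mut s = self.state.lock().unwrap();
        if *s == State::Running {
            *s = State::Stopping;
        }
    }

    /// Called once the process is really gone; wakes up waiting allocators.
    fn free(&self) {
        *self.state.lock().unwrap() = State::Stopped;
        self.changed.notify_all();
    }

    fn state(&self) -> State {
        *self.state.lock().unwrap()
    }
}
```

The waiting happens inside `alloc`, so callers cannot forget it; whether `ShallStop` can play the role of `Stopping` here is left open, as in the comment.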

iequidoo · 2023-12-17T18:51:52Z

src/context.rs

@@ -250,7 +252,7 @@ pub struct InnerContext {
 #[derive(Debug)]
 enum RunningState {
    /// Ongoing process is allocated.
-    Running { cancel_sender: Sender<()> },
+    Running { cancel_sender: oneshot::Sender<()> },

    /// Cancel signal has been sent, waiting for ongoing process to be freed.
    ShallStop { request: Instant },


I'm unsure we want to use Instant anywhere because it may freeze during deep sleep on some systems, e.g. Android. In #5108 I'm removing it.
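
To illustrate the concern, here is a small sketch that measures the stopping time with wall-clock `SystemTime` instead of `Instant`. This is an assumption about one possible replacement, not necessarily what #5108 does, and `elapsed_since` is a hypothetical helper. Unlike `Instant`, `SystemTime` can jump backwards, so the sketch clamps to zero.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical helper: time elapsed between `request` and `now`,
/// clamped to zero if the wall clock went backwards in between.
fn elapsed_since(request: SystemTime, now: SystemTime) -> Duration {
    now.duration_since(request).unwrap_or(Duration::ZERO)
}
```

A `ShallStop { request: SystemTime }` variant could then log `elapsed_since(request, SystemTime::now())` when the process is finally freed.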

flub added 2 commits March 30, 2023 10:46

Remove Context::free_ongoing function

201d05d

This is now handled better by the Drop from the OngoingGuard returned by Context::alloc_ongoing.

flub requested review from link2xt and iequidoo March 30, 2023 09:03

iequidoo reviewed Mar 30, 2023

iequidoo mentioned this pull request Mar 30, 2023

fix(imex): transfer::get_backup must always free ongoing process #4249

Merged

flub added 2 commits April 4, 2023 13:03

Merge branch 'master' into flub/ongoing-guard

32629b9

typo

61b00f9

Merge branch 'master' into flub/ongoing-guard

b7edd4e

link2xt force-pushed the master branch from cfacebb to 514074d Compare April 22, 2023 16:43

link2xt force-pushed the master branch from a1cc433 to 5b435d1 Compare June 1, 2023 12:31

link2xt deleted the branch main October 25, 2023 21:22

link2xt closed this Oct 25, 2023

link2xt reopened this Oct 25, 2023

flub added 2 commits December 16, 2023 16:29

Merge branch 'master' into flub/ongoing-guard

4aa248d

Re-add info message using elapsed stopping time

b9fd529

link2xt changed the base branch from master to main December 17, 2023 13:27

iequidoo reviewed Dec 17, 2023

link2xt force-pushed the main branch 2 times, most recently from 1abb12e to 2af9ff1 Compare March 4, 2024 21:10
