Skip to content

Create Monitor to support js-contexts @monitored decorator #26

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 15, 2023

Conversation

CMCDragonkai
Copy link
Member

@CMCDragonkai CMCDragonkai commented Jun 5, 2023

Description

I want to create a Monitor class that resembles how DBTransaction works by creating a "re-entrant" object that can be used to lock things.

Combined with a @monitored decorator in js-contexts, we can enable ContextMonitored that enables declarative mutual exclusion contexts for arbitrary concurrent state mutation.

This can be useful for js-quic and other projects which require concurrency control but don't have any use of js-db which is the only thing that provides this construct.

This is a proper realisation of the js-async-monitor idea first explained here: MatrixAI/js-async-monitor#1 and obviates that project, since you wouldn't need it.

Issues Fixed

Tasks

  • 1. Create a Monitor class that resembles DBTransaction - 0.5 day
  • 2. Integrate timer and cancellability into the standard locks
    • - Apply ContextTimedInput to Lock
    • - Apply ContextTimedInput to Semaphore
    • - Apply ContextTimedInput to Barrier
    • - Apply ContextTimedInput to RWLockWriter
    • - Apply ContextTimedInput to RWLockReader
    • - Apply ContextTimedInput to LockBox
  • 3. Test Monitor similar to LockBox - 0.5 day
  • 4. Test Monitor deadlock detection, cancellation and time expiry - 0.5 day
  • 5. Update project framework to use SWC to test

Minimal testing will be done for RWLockReader as this is not something we use in PK, but it was there for completeness.

These 2 tasks will be part of js-contexts:

  • 1. Install this into js-contexts
  • 2. Test out @monitored decorator and monitored HOF in js-contexts

Should go into the js-contexts PR.

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@ghost
Copy link

ghost commented Jun 5, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@CMCDragonkai
Copy link
Member Author

Should the Monitor implement the Lockable interface? It seems that it would suit it. Right now DBTransaction doesn't bother with that, it provides a more convenient method called lock which can be used directly with strings like tran.lock('a', 'b').

Right now LockBox which does implement Lockable, it had to also provide a separate function called lockMulti that exposed the array of resource acquisitions which was then used by DBTransaction.

The DBTransaction.lock however returned Promise<void> probably because it doesn't expect the resource to be combined with anything else. I think it should be possible to do this.

@CMCDragonkai
Copy link
Member Author

As a minimal prototype, I've extracted the logic out of DBTransaction into Monitor. The implementation is surprisingly thin. It was easy to extract this factor out.

However it of course doesn't fit the Lockable interface... and I wonder if it makes sense to do so...

We shall see.

@CMCDragonkai CMCDragonkai self-assigned this Jun 5, 2023
@CMCDragonkai
Copy link
Member Author

So it turns out that js-contexts doesn't actually use js-async-locks at all.

[nix-shell:~/Projects/js-contexts]$ npm list --depth=4 | gr
ep @matrixai
@matrixai/contexts@1.0.0 /home/cmcdragonkai/Projects/js-contexts
├── @matrixai/async-cancellable@1.0.6
├─┬ @matrixai/errors@1.1.7
├─┬ @matrixai/timer@1.1.0
│ └── @matrixai/async-cancellable@1.0.6 deduped

So we have an opportunity to either bring in the js-contexts into js-async-locks, or to bring js-async-locks into js-contexts.

I think in this case it makes more sense for js-async-locks to be a dependency of js-contexts.

@CMCDragonkai
Copy link
Member Author

I integrated ContextTimed into the Lock, and now we have a problem.

The problem is that currently due to DirtyHairy/async-mutex#64, when we attempt to race between lock.acquire() and say an abortP, there's no way to cancel only a specific pending lock.

That means in this example:

  const lock = new Lock();
  const [release] = await lock.lock()();

  // Ok now so this should fail after 500 ms
  try {
    const [release2] = await lock.lock({
      timer: new Timer({ delay: 500 })
    })();
  } catch (e) {
    console.log('TIMED OUT', e instanceof errors.ErrorAsyncLocksTimeout);
  }
  await release();
  console.log(lock.isLocked());

We will end up with true at the end, even though the original lock awaiter has failed due to timing out after 500ms.

A proper solution is to ensure that we can cancel just a specific pending lock.

If we use lock.cancel() it cancels ALL pending locks which is not what we want.

Because of issue 63, we may have to reimplement our own async-mutex underneath. It shouldn't be too difficult since we did most of the work already.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Jun 5, 2023

A quick hack is to add this:

            // This is quick hack due to
            // https://github.com/DirtyHairy/async-mutex/issues/64
            // Here we chain it to release immediately
            void acquireP.then((release) => release());

To immediately chain the release when it does acquire it. But that seems redundant, and also I'm not sure how that would work with the await yieldMicro().

Plus it's probably not that efficient.

@CMCDragonkai
Copy link
Member Author

Another thing is that I should change js-resources to change ResourceAcquire to use PromiseLike instead of Promise. This can allow the public lock to return PromiseCancellable type.

@CMCDragonkai
Copy link
Member Author

Ok so basically we're going need to build out the async-mutex with the ability to cancel a specific one.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Jun 6, 2023

Ok so everything is actually built on top of a semaphore construct. In the async-mutex, the Mutex itself is on built on top of Semaphore.

In which case I have reimplemented the async mutex's semaphore inside src/Semaphore.ts.

It's actually improved in the sense that it supports "weighted semaphores", I had look at the current async-mutex implementation, but it seemed a bit too complicated, so asked chatgpt to give me a better idea.

Not only that, it now supports prioritised weighted semaphores. This means that by default the priority is set to false. Which means it biases towards fairness where a if there is a semaphore of limit 3 with an existing lock of weight 1, if locking attempts respectively try to lock a weight of 3 and 2, the 3 will block the 2. If the priority is set to true, it will bias towards more concurrency over fairness and thus induce the possibility of starvation of heavier locks.

So the semaphore now:

  1. Supports timers and abort signal
  2. Supports literal timers with just numbers
  3. Supports weighted semaphores
  4. Supports prioritised weighted semaphores
  5. Supports returning PromiseCancellable from both lock and waitForUnlock, but not withF and withG.

All existing tests pass, and I did add a few new tests into the semaphore to try it out.

Now we can proceed to adapt Lock so that it uses an underlying Semaphore with a limit of 1 (which is basically what Mutex was).

It's important to also test whether we still need yieldMicro now.

                  // TODO: NOT SURE IF THIS IS NEEDED ANYMORE
                  // NEED TO TEST RWLOCK build up
                  await utils.yieldMicro();

Primarily because the releasing functions is actually just synchronous now. But I remember there was some ordering problems with RWLockWriter.

Oh and I had to get 1.1.1 of js-async-locks since I found that the fix using the symbol doesn't cover all cases, instead we just check for DOMException with name property of AbortError and swap to undefined. This means intermediate signal.reason can still be DOMException, not an undefined.

@CMCDragonkai
Copy link
Member Author

The addition of weights means Semaphore is breaking API since the signature is now like await semaphore.lock(weight, ctx)(). required. The weight is defaulted to 1. Which means by default you are locking only of weight 1.

@CMCDragonkai
Copy link
Member Author

Everything is built on top of the Semaphore now. Even a Lock is just a Semaphore with a limit of 1. Now both Lock and Semaphore pass tests.

@CMCDragonkai
Copy link
Member Author

I've fixed up a bug in semaphore. Basically both lockers and waiters can use the same queue, it doesn't require separate queues unlike what chatgpt said. This keeps everything ordered properly and fixed the RWLockWriter order of operations issue.

Additionally for RWLockWriter in order to synchronise the order of the readers, I've made the second and subsequent readers end up waiting for the first reader to acquire the read lock first. This is more robust than adding 3 microyields which may not work well if the underlying operations change in the future.

@CMCDragonkai
Copy link
Member Author

Removed micro yields, so far everything is working.

I'm noting the fact that I didn't await on the releasing functions in RWLockReader. It seems to work fine though.

@CMCDragonkai
Copy link
Member Author

The isLocked is actually taking the type for RWLockReader and RWLockWriter now.

@CMCDragonkai
Copy link
Member Author

Just realised that I could have just wrapped all the async await in a const f = async () => { ... } and then at the very end wrap the f() in PromiseCancellable.from, it would have been simpler than having to convert it all the promise chaining syntax.

@CMCDragonkai
Copy link
Member Author

The LockBox ends up calling the the underlying lock function, potentially with only the ctx. This means all lock methods needs to be capable of taking a sole ctx parameter.

So right now we're having quite a flexible bunch of parameters that has to be propagated.

@CMCDragonkai
Copy link
Member Author

Looks like all existing concurrency control mechanisms have been ported over and now support cancellability.

Now it's time to get back to the Monitor.

@CMCDragonkai
Copy link
Member Author

I've had to change the API for LockBox a bit.

  1. We cannot use ToString, the lock key must be string. So any users must convert toString ahead of time. This was due to ambiguity between the ToString and the Partial<ContextTimedInput>.
  2. The waitForUnlock now can take the key, ctx or just take ctx or just take key.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Jun 13, 2023

Monitor due to wrapping the Lockbox behaves differently when we synchronously attempt to lock a key then ask it whether that key is locked.

The problem is because Monitor wraps lockbox and its own _locks map isn't updated until the lock has been fully acquired.

In order to maintain consistent behaviour, the solution is to extend the _locks map with pending variant.

Something like:

  protected _locks: Map<
    string,
    { status: 'acquiring', type: 'read' | 'write' } |
    { status: 'acquired', type: 'read' | 'write'; lock: RWLock; release: ResourceRelease; }
  > = new Map();

We can then use acquired status to know that the key is pending with respect to that specific monitor.

An important realisation is that isLocked should return true upon synchronously attempting to lock. This is the case for all other locking mechanisms currently.

At the same time, it is important that the pending lock being set up above must happen after the const lockAcquireP = lockAcquire() is called, but before we await lockAcquireP.

This ensures that by the time the pending lock is setup in this._locks, that it is guaranteed that the this.lockBox.locks map will also have the same key and lock object. This is important for all subsequent uses of the pending lock. That's why we will end up using this.lockBox.locks.get(key)! with the ! asserting that it exists.

@CMCDragonkai
Copy link
Member Author

The deadlock detection system is working. I'm putting the last test cases for the Monitor and we should see everything working.

@CMCDragonkai
Copy link
Member Author

Ok all deadlock tests pass now. The deadlock detection revealed a very tricky race condition when using PromiseCancellable.

Basically, when creating derived promises, PromiseCancellable, we often need to use combinators like then. When doing this, the signal chain is not automatically propagated. It has to be done manually by connecting it with the PromiseCancellableController type.

Usually it is simple "chain" the signal cancellation. For example this is the lock method of Lock.

  public lock(ctx?: Partial<ContextTimedInput>): ResourceAcquireCancellable<Lock> {
    const acquire = this.semaphore.lock(1, ctx);
    return () => {
      const acquireP = acquire();
      return acquireP.then(
        ([release]) => [release, this],
        undefined,
        (signal) => {
          // Propagate cancellation to `acquireP`
          signal.addEventListener(
            'abort',
            () => {
              acquireP.cancel(signal.reason);
            },
            { once: true }
          );
        }
      );
    };
  };

This works since it can be guaranteed that acquireP would be available to be cancelled by the time any user would have access to the returned promise of lock()().

However in other places I'm using a pattern where I create a currentP that gets reassigned to whichever promise may current be blocked. This seems like it would work, but it's in fact a bit of a footgun.

Imagine instead the above function had to wait for 2 promises, 2 separate blocking operations that may require cancellation. This is what happened in RWLockWriter.write and RWLockReader.read.

In this case, a possible race condition occurs when you call p.cancel, but it ends up cancelling in the middle between the first blocking operation and the second blocking operation. That means it may be cancelling a promise that has already been resolved.

The solution to this problem is to not do this. It's not a good idea basically. Instead one should re-use the utils.setupTimedCancellable utility which will give you a ctx: ContextTimed that can be shared between the 2 blocking operations, such that if any signal cancellation occurs, even if it were to occur in between the 2 blocking operations, the fact that the ctx.signal is shared between the 2 operations ends up causing the second operation to reject with the abortion error.

So the only time we should use the pattern of currentP is to use it like we do above in Lock.lock, where there's only 1 assignment. So that way the whole asynchronous operation either resolves or rejects, there's no intermediate state where cancellation occurs against something that's already resolved, and thus the partial async operation ends up blocking and causing a deadlock.

I've added tests to both RWLockReader and RWLockWriter called promise cancellation for read then write and promise cancellation for write then read to ensure no regressions on this.

@CMCDragonkai
Copy link
Member Author

Beware of the above @amydevs @tegefaulkes when using the PromiseCancellable.

@CMCDragonkai
Copy link
Member Author

This is ready to be squashed.

@CMCDragonkai CMCDragonkai changed the title WIP: Create Monitor to support js-contexts @monitored decorator Create Monitor to support js-contexts @monitored decorator Jun 15, 2023
… timers

* replaced `async-mutex` with native `Semaphore`
* integrated `PromiseCancellable`
* integrated `ContextTimerInput` allow timers and abortion
* created `Monitor` which builds on top of `LockBox` to provide PCC transactions
* changed to using `swc` for testing
@CMCDragonkai
Copy link
Member Author

Had to update to 1.1.1 for js-timer as there was a bug there. That needs to propagate to all other libraries @tegefaulkes @amydevs

@CMCDragonkai CMCDragonkai merged commit d2aeb98 into staging Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Integrate js-contexts and js-async-cancellable
1 participant