Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Futures returned in library are not send #165

Open
LLFourn opened this issue Nov 16, 2020 · 21 comments
Open

Futures returned in library are not send #165

LLFourn opened this issue Nov 16, 2020 · 21 comments

Comments

@LLFourn
Copy link
Contributor

LLFourn commented Nov 16, 2020

Send is explicitly opted out of here:

#[async_trait(?Send)]
.

This is a bit of a problem. For example, after potentially receiving some funds you might like to do:

tokio::spawn(wallet.sync(...).then(|res| info!(the result of the sync was "{:?}", res))

To sync the wallet in the background. This is not possible since the future must be Send to send to tokio::spawn.
It will take some re-engineering of the internals to make this work.

@afilini do you have opinions on how to address this?

@afilini
Copy link
Member

afilini commented Nov 16, 2020

Isn't this somewhat connected to the discussion we had about making the wallet thread safe? If the wallet was thread safe then you could spawn a background thread and call sync() in that thread, without necessarily having to spawn a task using tokio and going through the async interface.

Actually, I think having a thread-safe wallet is probably the first requirement to then make the futures Send, otherwise it doesn't make sense to have a background task run sync() if you can't do anything else on the wallet because there's a Mutex that locks everything.

I guess the biggest problem with thread-safe wallets in general is that the database is in an inconsistent state during sync. We would probably need to have transactions of some sorts, so that you only commit the full changes at the end of sync and always keep the database in a consistent state.

@afilini
Copy link
Member

afilini commented Nov 16, 2020

So unless I have misunderstood something, I guess there are a few steps before we can get to working on Send futures:

  1. Make Database thread-safe
  2. Add transactional operations to Database
  3. Update all the Blockchain types to use the transactions with Database
  4. Make Blockchain thread-safe, or wrap it in a Mutex. Here we need some kind of locking mechanism to avoid having multiple sync running at the same time
  5. Make Wallet thread-safe, remove RefCells. At this point you can already achieve the "sync in background" by spawning a thread and calling sync inside
  6. ??? to make futures Send

@notmandatory
Copy link
Member

I agree the first priorities should be to make the Database, Blockchain and Wallet thread safe. For our bdk-jni android example the app code is spawning it's own threads from the JVM to sync the wallet or get balance updates, etc. So if bdk blocks on those calls it's not a problem.

@LLFourn
Copy link
Contributor Author

LLFourn commented Nov 16, 2020

The difficulty with the approach is that now you have to have two impls -- one using tokio::sync::Mutex and one using RefCell to account for no_std/WASM. I don't see how to get around this but I haven't got much experience finding work arounds. I feel it will be further complicated by maybe_await stuff because you now have to maybe_await the lock on the components as well (or just borrow_mut if using RefCell).

I think a better solution is to just remove interior mutability from Wallet (i.e. remove RefCell) and make &self mutable in methods like sync.
This would circumvent the need for all the rather involved steps @afilini mentions above.

Then if you want to do a sync in a background thread:

let wallet: Arc<Mutex<Wallet<...>>> = Arc::clone(&wallet);
tokio::spawn(async move {
    let mut wallet = wallet.lock().await;
    wallet.sync(...).await;
})

I think this will work. Although rust has a way of foiling my plans in this area.

The disadvantage of this is that now anyone wanting to use the wallet while sync is running is going to have to wait.
There could be ways of easing this in the future -- for example you expose a sync_next_batch method which only does a little bit of work (and indicates whether the sync is done). This would let another thread do something between successive calls to sync_next_batch.

@thomaseizinger
Copy link
Contributor

I just ran into this while using bdk::Wallet within an actor framework.

I have one actor had controls access to the wallet and thus, the wallet itself does not need to be thread-safe. The message handlers are all async though and will be run on the tokio runtime.

At the moment, the lack of Send futures makes it impossible to use the EsploraBlockchain because without async-interface it panics that it would block a worker thread:

thread 'main' panicked at 'Cannot start a runtime from within a runtime. This happens because a function (like block_on) attempted to block the current thread while the thread is being used to drive asynchronous tasks.', /home/thomas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.15.0/src/runtime/enter.rs:39:9

and with async-interface, the code doesn't compile because the async-trait implementation of Blockchain on EsploraBlockchain returns Futures that are not Send.

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Jan 10, 2022

I think a better solution is to just remove interior mutability from Wallet (i.e. remove RefCell) and make &self mutable in methods like sync.

I would also advocate for this. Accessing the wallet from multiple threads / tasks should be an application decision and can be solved in various ways (actors, mutex, etc).

@LLFourn
Copy link
Contributor Author

LLFourn commented Jan 10, 2022

I've changed my thinking on this topic. After working with various database APIs they are not at all consistent about why they need to be &mut or not. For example, a database API may have &mut self for every call because it has only one connection and the underlying connection is mutable. Others may only have &mut self for writing operations but not for reading (this is what BDK database assumes). Furthermore, others may only have &mut self when starting and ending a transaction/batch update. Finally, some have no mutability at all and handle everything internally or using thread local variables like sled.

Therefore I don't think we can manage when the database should and shouldn't bee mutable and we should force the Database itself to handle thread safety i.e. make the database trait Sync. I don't think there is much of a cost to doing this and it sure would simplify a lot . Perhaps we'd need to make an exception for wasm32 so as to avoid using Mutex in MemoryDatabase. But the better solution here is just to make a proper wasm databse that uses web apis!

Side note: #461 made database updates atomic (for esplora and electrum) -- the sync logic no longer updates the database while syncing but puts everything into one big batch update at the end (see script_sync.rs). This makes things easier to reason about and means even if doing an async sync the database will never be locked across an .await point.
As far as I can tell it is sound to do multiple syncs at the same time and if not it is a problem with the sync logic. After #461 the db state should be consistent after a sync with esplora and electrum (I think!). It should probably be avoided anyway because it's redundant but it is not the responsibility of the database to ensure this.

@thomaseizinger
Copy link
Contributor

Is making the DB thread-safe going to make the entire wallet thread-safe? Because if not, then why bother with internal mutability when users need to wrap it in a Mutex anyway?

@LLFourn
Copy link
Contributor Author

LLFourn commented Jan 18, 2022

Is making the DB thread-safe going to make the entire wallet thread-safe? Because if not, then why bother with internal mutability when users need to wrap it in a Mutex anyway?

Yes I think so. Well this would be necessary condition for going in this direction.

@thomaseizinger
Copy link
Contributor

An alternative design could be to decouple the wallet and the database from each other. A call to sync could:

  • Require a SyncStatus struct as a parameter that can be obtained from the database
  • Return a SyncResult struct that can be applied to a database

This makes doing a sync a bit more verbose as users would have to do something like:

let status = db.get_sync_status();
let result = wallet.sync(status);
db.apply_sync_result(result);

However, the benefits would be that the wallet does not care whether or not the DB is mutable / immutable or sync / async. Using such a design may remove the need for Database trait entirely.

It is an ergonomics hit although I don't think it would be particularly bad. In my experience with BDK, I always wrapped the bdk::Wallet in a Wallet struct myself to provide slightly different APIs and/or compose more logic on top of it. Having to compose a few more pieces from bdk in my own Wallet struct wouldn't be a big deal here.

Going down this route would result in removal of all type parameters from Wallet and users would have to compose the wallet with a database and a client by passing data back and forth between those components. WDYT?

@LLFourn
Copy link
Contributor Author

LLFourn commented Jan 20, 2022

@thomaseizinger thanks for thinking more deeply about this problem. Here's the issue with your suggestion:

However, the benefits would be that the wallet does not care whether or not the DB is mutable / immutable or sync / async. Using such a design may remove the need for Database trait entirely.

The wallet would need the database trait still:

  1. The wallet needs to use it to create transactions.
  2. The wallet actually mutates it when generating new addresses.
  3. In general you need to be able to query the database for stuff -- it's handy if it has the same API regardless of the underlying concrete DB.

Your line of questioning about what exactly should be the relationship between components is really important. What is a wallet? Is it even a good abstraction to base this library around? I think a lot about this but I don't think it's directly relevant here. In any case it seems to me that we do need a Database trait and making it Sync makes sense and would fix the problem at hand. I think this is true even if we adopted a sync API more like what you were suggesting unless we can stop all database queries during the sync. This might be possible but requires some careful design work!

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Feb 4, 2022

In any case it seems to me that we do need a Database trait and making it Sync makes sense and would fix the problem at hand. I think this is true even if we adopted a sync API more like what you were suggesting unless we can stop all database queries during the sync. This might be possible but requires some careful design work!

I agree with you. Is someone currently working on this? If not, can we have some design notes on what is expected so it can be picked up?
Not being able to use an async-native backend with bdk is mildly annoying. I'd really like to move away from the blocking interface. Our work-around for #521 is to scan for 1000 addresses and together with a blocking client, this means we are blocking one of tokio's executor threads for ~4 seconds every time a sync is triggered.

Happy to contribute a PR if we can get some mentoring notes on how to change things :)

@LLFourn
Copy link
Contributor Author

LLFourn commented Feb 18, 2022

Ok so I will paint my grand vision of how this should go:

  1. We separate the Blockchain traits into separate synchronous and asynchronous versions e.g. under bdk::async. The async version will of course require the futures to be Send. This is just a necessary cleanup that I think we want to do.
  2. Instead of calling wallet.sync(&blockchain, options) (which will be the API after Remove blockchain from wallet #535) we make the API something more like what you suggested above:
let sync_session = wallet.start_sync(options)?;
let result = blockchain.sync(&sync_session)?; // this is the one that might be async
wallet.apply_sync(&result)?;

So this would fix things because we wouldn't care about whether the database is Sync we just care the sync_session is. Which isn't to say that the various databases shouldn't be Sync but it just means that we don't care about that here.

What is a SyncSession? I don't know yet. It depends on how much each sync needs to know about the database. Here's some points on this:

  1. Right now we always do a "full" sync on esplora and electrum. Which means we don't look at what the blockheight was when we did the last sync and only get new stuff and we always sync for every address. The blockchain is provided with a database so that it doesn't download more transactions that it needs to.
  2. If we were to at least improve this by only caring about transactions that have appeared after the last sync then maybe we wouldn't need to database at all since we can avoid downloading it by just looking at its height.

So consider the following structures

// given to the blockchain
struct SyncSession {
    // txs that were previously in the mempool.
    // So we don't need to download them again and can tell if we need to delete them.
    prev_mempool: Vec<Txid>,
    // The blockid of the last sync
    last_sync_block: (BlockHash, u32)
    // the list of addresses to look at for each keychain
    script_pubkeys_interested: HashMap<KeychainKind, Box<dyn Iterator<Item=Script> + Send>>
}

// returned from the blockchain
struct SyncResult {
     transactions: SyncTxs,
     // this will be set to last_sync_block next time
     sync_block: (BlockHahsh, u32)
     // where we are up to for each keychain
     keychain_indexes: HashMap<KeychainKind, u32>
}

enum SyncTxs {
    Partial {
       // txs we haven't seen before that are in mempool or chain now
       new: Vec<TransactionDetails>,
       // txs that were in the mempool but are now gone
       mempool_confirmed: Vec<Txid>,
       // these txs have disappeared from the mempool
       mempool_gone: Vec<Txid>
    },
    Full {
        // all transactions ever associated with this wallet.
        // wallet should hose the current database and replace it with the data here.
        txs: Vec<TransactionDetails>
    }
}

This stops us needing a reference to the Database during sync and allows us to just exchange data back and forth with the blockchain. The slight downside is we do a full download in the case where we previously synced on a stale block. But this will happen very rarely in practice like once a month if you are syncing all the time. If this is actually a problem we could store the last two blockhashes which would decrease this.

On a side not I think #521 could be fixed at the same time by architecting the script pubkey iterators properly.

Thoughts?

@thomaseizinger
Copy link
Contributor

Thoughts?

If we can make this work then I believe this would be a really clean way of resolving the async/sync duality of blockchain operations. Pre-condition for that is that we can gather all required information in one go, do the sync and come back with a result.

With #535, perhaps the API should actually be the other way round:

blockchain.sync(&wallet).await?;

Maybe we can do the same thing for the database:

database.sync(&wallet).await?;

A wallet could by default have an in-memory storage and "sync" with a persistent backend like sqlite or seld.

@notmandatory
Copy link
Member

I was discussing with @rajarshimaitra and he brought up a good point to consider, which is if always sync the database to an in-memory struct in the wallet could this be a memory problem for very big wallets, or even not so big wallets on memory constrained devices?

I like the blockchain.sync api and finding a way to split the blockchain and database from the core wallet lib.

@notmandatory
Copy link
Member

notmandatory commented Feb 25, 2022

Using something like an async feature flag, could we enable async versions of wallet functions, so if enabled would be called something like:

let blockchain: AsyncBlockchain = ...; // impl an async version of a blockchain
let database: AsyncDatabase = ...; // impl an async version of a database
wallet.sync(&blockchain, &database).await?;  // async version of the functions
wallet.get_new_address(&database).await?; 

And if async feature not enabled:

let blockchain: Blockchain = ...; // impl a blocking version of a blockchain
let database: Database = ...; // impl a blocking version of  a database
wallet.sync(&blockchain, &database); // blocking versions of the functions
wallet.get_new_address(&database);

Though in this approach the Wallet and core lib crate would need to know about async and the code would be less clear with async and blocking versions of everything...so maybe not really an improvement. But food for thought.

EDIT:

I should have refreshed my memory on how the async-interface feature currently works. The #[maybe_async] and #[maybe_await] macros @afilini created are used to generate "async" versions of Blockchain and related Wallet methods, but only for the wasm32 target. Could this approach be extended to the Database trait and all wallet functions and not only the wasm32 target?

@thomaseizinger
Copy link
Contributor

Feature flags really should never change APIs because they are evaluated additively by cargo.

Assume the following situation:

  • crate A depends on bdk without async-interface
  • crate B depends on bdk with async-interface

You cannot use crate A and B in the same dependency graph now because A will fail to compile because its code is written against the non-async API.

Feature flags work best if they add code, like additional functions. (Be careful with enum variants, those are also nasty).

@LLFourn
Copy link
Contributor Author

LLFourn commented Feb 26, 2022

Just wanted to second what @thomaseizinger said. One of the main goals of this refactoring is to remove all this maybe_async stuff and await_or_block etc so the features are additive. It may be possible to get rid of AnyBlockchain stuff in favour of just using trait objects too so we remove most of the other conditional enum variants too.

@notmandatory
Copy link
Member

Thanks for the explanation, making features additive makes sense and sounds like the right way to design things. It's even in the Cargo Book which apparently I need to read more closely! 😵‍💫

@cryptoquick
Copy link

Ideally this would be implemented in such a way that the same code that works on web can also be easily targeted to work on desktop. Imagine a use-case where the isomorphic UI library Iced can compile to both using the bulk of the same application logic and UI code, with platform-specific edge cases handled through conditional compilation macros.

@MaxFangX
Copy link
Contributor

MaxFangX commented Jan 25, 2023

FWIW I hacked out a workaround using a number of refactors. Definitely too ugly to be upstreamed but it gets the job done if you really really need BDK's futures to be Send today (which we did).

Here's the branch: https://github.com/lexe-tech/bdk/tree/max/thread-safe

And the two commits that implement it:

which at this point can be cherry-picked onto the 0.26 release.

You can confirm that the futures are indeed Send with the following:

cargo clippy --all-targets --features=async-interface,use-esplora-async --no-default-features -- --allow=warnings

Important to read the commit messages to understand the caveats for using this patch correctly (i.e. bdk::Wallet must be wrapped in an async Mutex to prevent concurrent access).

luckysori added a commit to get10101/10101 that referenced this issue Jun 8, 2023
An actor that manages the `bdk::Wallet` resource. It allows us to use
the wallet whilst the inevitably expensive on-chain sync is happening
in the background.

We would like to use the async version of `EsploraBlockchain`, but
bitcoindevkit/bdk#165 is still an issue.

The only hacky bit stems from the fact that we still have to implement
certain non-async foreign traits and to access any `bdk::Wallet`
resource we now have to go via async methods, since the actor is
async.

To do so, at some point we have to call an async function "from a sync
context". The natural way to do so (for me) would be to use
`runtime.block_on`, which schedules an async task and waits for it to
resolve, blocking the thread. *But*, these non-async foreign trait
methods are actually called elswhere in _async_ contexts. This leads
to `runtime.block_on` panicking because `tokio` is trying to prevent
us from blocking a thread in an async context. This is analogous to us
misusing async and doing an expensive CPU-bound computation in an
async context, but here `tokio` is able to "aid" us since `block_on`
is provided by `tokio`.

The solution is to use the following pattern:

```rust
tokio::task::block_in_place(|| {
    tokio::runtime::Handle::current().block_on(async {
        // async code
    })
});
```

From the documentation of `block_in_place`, we are able to run "the
provided blocking function on the current thread without blocking the
executor". We therefore avoid the panic, as we no longer block the
executor when calling `block_on`.

This has one final side-effect, which is that all `ln-dlc-node` async
tests now need to go back to using the `multi_thread` flavour.
`block_in_place` works by moving all scheduled tasks to a different
worker thread and without `multi_thread` there is only one worker
thread, so it just panics.

Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>
luckysori added a commit to get10101/10101 that referenced this issue Jun 8, 2023
An actor that manages the `bdk::Wallet` resource. It allows us to use
the wallet whilst the inevitably expensive on-chain sync is happening
in the background.

We would like to use the async version of `EsploraBlockchain`, but
bitcoindevkit/bdk#165 is still an issue.

The only hacky bit stems from the fact that we still have to implement
certain non-async foreign traits and to access any `bdk::Wallet`
resource we now have to go via async methods, since the actor is
async.

To do so, at some point we have to call an async function "from a sync
context". The natural way to do so (for me) would be to use
`runtime.block_on`, which schedules an async task and waits for it to
resolve, blocking the thread. *But*, these non-async foreign trait
methods are actually called elswhere in _async_ contexts. This leads
to `runtime.block_on` panicking because `tokio` is trying to prevent
us from blocking a thread in an async context. This is analogous to us
misusing async and doing an expensive CPU-bound computation in an
async context, but here `tokio` is able to "aid" us since `block_on`
is provided by `tokio`.

The solution is to use the following pattern:

```rust
tokio::task::block_in_place(|| {
    tokio::runtime::Handle::current().block_on(async {
        // async code
    })
});
```

From the documentation of `block_in_place`, we are able to run "the
provided blocking function on the current thread without blocking the
executor". We therefore avoid the panic, as we no longer block the
executor when calling `block_on`.

This has one final side-effect, which is that all `ln-dlc-node` async
tests now need to go back to using the `multi_thread` flavour.
`block_in_place` works by moving all scheduled tasks to a different
worker thread and without `multi_thread` there is only one worker
thread, so it just panics.

Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

6 participants