
feat(zebra-scan): Create a key storage database in RAM #7904

Closed · 2 tasks · Tracked by #7728
oxarbitrage opened this issue Nov 3, 2023 · 11 comments · Fixed by #7942

Labels: A-blockchain-scanner (Area: Blockchain scanner of shielded transactions) · C-feature (Category: New features) · S-needs-triage (Status: A bug report needs triage)
@oxarbitrage (Contributor) commented Nov 3, 2023

As part of the scanning project, we will hold user-defined viewing keys and scanning results in a storage database.

Proposed structure:

use std::collections::HashMap;
use zebra_chain::{block::Height, transaction::Hash};

pub struct Storage {
    /// The key and an optional birthday for it.
    keys: HashMap<String, Option<Height>>,

    /// The key and the related transaction ids.
    results: HashMap<String, Vec<Hash>>,
}

  • Create the storage module.
  • Create a basic test for it.

This key storage database should always be ready to be traversed by the scanning task.

As the keys are initially added via the config, no delete methods need to be implemented as part of this ticket.
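
For illustration, here is a minimal, self-contained sketch of what the module and its basic test could look like. It uses `u32` heights and `[u8; 32]` hashes as stand-ins for the real Zebra types, and the `add_key`/`add_result`/`results` method names are hypothetical:

use std::collections::HashMap;

/// Stand-ins for zebra_chain's `block::Height` and `transaction::Hash`.
type Height = u32;
type Hash = [u8; 32];

#[derive(Default)]
pub struct Storage {
    /// The key and an optional birthday for it.
    keys: HashMap<String, Option<Height>>,

    /// The key and the related transaction ids.
    results: HashMap<String, Vec<Hash>>,
}

impl Storage {
    /// Adds a viewing key with an optional birthday height.
    pub fn add_key(&mut self, key: String, birthday: Option<Height>) {
        self.keys.insert(key, birthday);
    }

    /// Records a transaction id found while scanning with `key`.
    pub fn add_result(&mut self, key: &str, tx_hash: Hash) {
        self.results.entry(key.to_string()).or_default().push(tx_hash);
    }

    /// Returns the transaction ids recorded for `key`.
    pub fn results(&self, key: &str) -> &[Hash] {
        self.results.get(key).map(Vec::as_slice).unwrap_or(&[])
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn stores_keys_and_results() {
        let mut storage = Storage::default();
        storage.add_key("zxviews-example".to_string(), None);
        storage.add_result("zxviews-example", [0u8; 32]);
        assert_eq!(storage.results("zxviews-example").len(), 1);
    }
}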

oxarbitrage added the A-blockchain-scanner (Area: Blockchain scanner of shielded transactions) label Nov 3, 2023
teor2345 changed the title from "feat(zebra-scan): Create a key storage database" to "feat(zebra-scan): Create a key storage database in RAM" Nov 5, 2023
mpguerra added the S-needs-triage (Status: A bug report needs triage), P-Medium ⚡, and C-feature (Category: New features) labels Nov 6, 2023
@arya2 (Contributor) commented Nov 7, 2023

This looks like a significant change. I was hoping a HashMap would be enough here; is there a specific motivation?

@mpguerra (Contributor) commented Nov 7, 2023

Maybe a HashMap would be enough to begin with.

@oxarbitrage (Contributor, Author) commented:

I think a HashMap is fine to be the underlying data structure if we want to. But I also think we need a structure on top of it, something like: https://github.com/ZcashFoundation/zebra/blob/main/zebrad/src/components/mempool/storage.rs#L115

I am pretty sure that the storage will get complicated enough to have its own sub-module (errors, constants like the max number of keys, etc.).
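
For example, the sub-module's constants and errors could take roughly this shape (all names are hypothetical, loosely following the mempool storage module):

/// Hypothetical upper bound on the number of stored viewing keys.
pub const MAX_KEYS: usize = 100;

/// Hypothetical errors for the scanner storage sub-module.
#[derive(Debug, PartialEq, Eq)]
pub enum StorageError {
    /// No more than `MAX_KEYS` viewing keys can be stored.
    FullKeys,
    /// The requested viewing key is not in storage.
    UnknownKey,
}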

oxarbitrage self-assigned this Nov 7, 2023
@arya2 (Contributor) commented Nov 7, 2023

> I think a HashMap is fine to be the underlying data structure if we want to. But I also think we need a structure on top of it, something like: https://github.com/ZcashFoundation/zebra/blob/main/zebrad/src/components/mempool/storage.rs#L115

Ah, that sounds good; I misunderstood the scope of this ticket.

Something similar might also be great for storing the initial scan tasks and adding methods to check on the progress of existing initial scan tasks if we want to implement cancelling & restarting those, or sending them new viewing keys for them to scan down the line.

@teor2345 (Contributor) commented Nov 7, 2023

> I think a HashMap is fine to be the underlying data structure if we want to. But I also think we need a structure on top of it, something like: main/zebrad/src/components/mempool/storage.rs#L115

> ...

> Something similar might also be great for storing the initial scan tasks and adding methods to check on the progress of existing initial scan tasks if we want to implement cancelling & restarting those, or sending them new viewing keys for them to scan down the line.

Let's open a separate ticket to design the scanning strategy?
This seems like something to discuss with the ECC engineers - I've already had some initial discussions on Discord here:
https://discord.com/channels/809218587167293450/972649509651906711/1163259715363549204

@teor2345 (Contributor) commented Nov 7, 2023

> I think a HashMap is fine to be the underlying data structure if we want to. But I also think we need a structure on top of it, something like: main/zebrad/src/components/mempool/storage.rs#L115

> I am pretty sure that the storage will get complicated enough to have its own sub-module (errors, constants like the max number of keys, etc.).

Optional suggestion about swapping out storage layers:

It might be helpful to have another crate for private key and secret scanning storage. That way, people can swap it out with another crate with the same interface. For example, some applications might just want to store secrets in RAM, and avoid the dependency on a disk-based database. But we could also do this using a Rust feature, so we don't need to make that decision now.
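
Sketched one way, under the assumption of a trait-based interface, swapping storage layers could look like this (all names hypothetical, with `u32` and `[u8; 32]` as stand-ins for the real height and hash types):

/// A storage backend for viewing keys and scan results. Implementors could
/// keep everything in RAM, or persist to a disk-based database.
pub trait ScanStorage {
    /// Stores a viewing key with an optional birthday height.
    fn add_key(&mut self, key: String, birthday: Option<u32>);

    /// Returns all stored viewing keys.
    fn keys(&self) -> Vec<String>;

    /// Records a transaction id found while scanning with `key`.
    fn add_result(&mut self, key: &str, tx_hash: [u8; 32]);
}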

I also think we should consider re-using RocksDB here, because we already have well-tested low-level interfaces to it. That would involve splitting out:

That refactor would need a separate ticket.

But for now, can we just pass a HashMap of keys to the scanner task, and make the detailed decisions about the scanner interface later? We had a lot of rework in the mempool storage implementation because we did it early, so we had to change it every time we changed how the mempool worked.
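
As a rough illustration of how small that interim step could be, assuming a hypothetical `scan_task` entry point:

use std::collections::HashMap;

/// Hypothetical scanner entry point: it receives the configured keys once,
/// at startup, and owns them for the lifetime of the task.
async fn scan_task(keys: HashMap<String, Option<u32>>) {
    // ... scan new and cached blocks with each configured key ...
    let _ = keys;
}

// At startup, the keys from the config would be passed straight in, e.g.:
// tokio::spawn(scan_task(keys_from_config));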

@teor2345 (Contributor) commented Nov 7, 2023

I estimated this ticket based on our decision at the meeting: we'll just pass a HashMap to the scanner task for now, and split the rest of the disk database design into a separate ticket.

@oxarbitrage (Contributor, Author) commented:

Well, that means there is nothing to do in this ticket. I think we should add all the database stuff here instead of opening more tickets.

@arya2 (Contributor) commented Nov 8, 2023

> But for now, can we just pass a HashMap of keys to the scanner task, and make the detailed decisions about the scanner interface later?

Could we do this as part of #7905 and estimate this issue for re-using rocksdb here?

> We had a lot of rework in the mempool storage implementation because we did it early, so we had to change it every time we changed how the mempool worked.

I think we should block this on #7905 to avoid that.

@teor2345 (Contributor) commented Nov 8, 2023

> But for now, can we just pass a HashMap of keys to the scanner task, and make the detailed decisions about the scanner interface later?

> Could we do this as part of #7905 and estimate this issue for re-using rocksdb here?

I think there are two separate changes here:

Since we want to be able to do some work in parallel, splitting the work into those two tickets would help.

> We had a lot of rework in the mempool storage implementation because we did it early, so we had to change it every time we changed how the mempool worked.

> I think we should block this on #7905 to avoid that.

In my experience, blocking on code sometimes causes more rework, because that code includes assumptions that aren't in the design. (And those assumptions don't work with other code from other tickets that are happening in parallel.)

I'd like to look at our design over the next week and work out how to avoid blocking across the key, scanning task, and results storage work, because that can get really complicated to manage. Defining simple interfaces will reduce those dependencies.

Analysis

But for now, let's focus on the specific design question in this ticket. So what's the design question here?

Here's one way to find missing parts of the design:

Create a list of key operations, and say what happens to the keys, scanning tasks, and results with each operation.

For example:

| Operation  | Key Storage | Block Scanning Tasks          | Results Storage                      |
| ---------- | ----------- | ----------------------------- | ------------------------------------ |
| Add key    | Key stored  | New and cached blocks scanned | (nothing)                            |
| Delete key | Key deleted | Stop scanning with that key   | Stored and in-flight results deleted |

Questions

I think our current design covers everything except deleting results, and maybe how the scanner task finds out about new or deleted keys.

So the design questions are:

  • How do the scanning tasks find out about new and deleted keys?
  • How does the results storage find out about deleted keys?

My Suggestions

In the meeting we said that we'd just pass a HashMap to the task when it starts up. That seems like enough for #7905 to move forward. But as Alfredo said, that doesn't leave much work in this ticket.

So here are some things we could do in this ticket:

  • create in-memory key storage which uses a HashMap internally
  • define key operation requests and responses
  • provide the key set via a watch channel (depends on moving the watch receiver into zebra-chain, but that can be done in the same PR or fixed later)
  • provide the deleted keys via a broadcast channel

(Or these things can be done in multiple tickets, that's up to Alfredo and Pili.)
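
For illustration, the watch and broadcast channels above could be wired up roughly like this sketch (all names hypothetical, using tokio's sync primitives and `u32` heights as a stand-in for `Height`):

use std::collections::HashMap;
use tokio::sync::{broadcast, watch};

/// Stand-in for the key set: viewing key -> optional birthday height.
type KeySet = HashMap<String, Option<u32>>;

struct KeyStorage {
    keys: KeySet,
    /// Scanning tasks watch this channel for the latest full key set.
    key_set_tx: watch::Sender<KeySet>,
    /// Results storage listens on this channel for deleted keys.
    deleted_tx: broadcast::Sender<String>,
}

impl KeyStorage {
    fn new() -> (Self, watch::Receiver<KeySet>, broadcast::Receiver<String>) {
        let (key_set_tx, key_set_rx) = watch::channel(KeySet::new());
        let (deleted_tx, deleted_rx) = broadcast::channel(16);
        let storage = Self { keys: KeySet::new(), key_set_tx, deleted_tx };
        (storage, key_set_rx, deleted_rx)
    }

    fn add_key(&mut self, key: String, birthday: Option<u32>) {
        self.keys.insert(key, birthday);
        // Publish the updated key set to all scanning tasks.
        let _ = self.key_set_tx.send(self.keys.clone());
    }

    fn delete_key(&mut self, key: &str) {
        if self.keys.remove(key).is_some() {
            let _ = self.key_set_tx.send(self.keys.clone());
            // Tell results storage to drop stored results for this key.
            let _ = self.deleted_tx.send(key.to_string());
        }
    }
}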

And here are some things that depend on larger work:

What do you think? Is that a good size for this ticket? Are there any missing dependencies?

@oxarbitrage (Contributor, Author) commented:

Yes, that is what I want to do. I opened new tickets for database persistence: #7926 and #7928.
