
feat(zebra-scan): Create a key storage database in RAM #7904

Closed · 2 tasks · Tracked by #7728
oxarbitrage opened this issue Nov 3, 2023 · 11 comments · Fixed by #7942

Labels: A-blockchain-scanner (Area: Blockchain scanner of shielded transactions) · C-feature (Category: New features) · S-needs-triage (Status: A bug report needs triage)
@oxarbitrage (Contributor) commented Nov 3, 2023

As part of the scanning project, we will hold user-defined viewing keys and scanning results in a storage database.

Proposed structure:

use std::collections::HashMap;
use zebra_chain::{block::Height, transaction::Hash};

pub struct Storage {
    /// The key and an optional birthday for it.
    keys: HashMap<String, Option<Height>>,

    /// The key and the related transaction ids.
    results: HashMap<String, Vec<Hash>>,
}

  • Create the storage module.
  • Create a basic test for it.

This key storage database should always be ready to be traversed by the scanning task.

As the keys are initially added via the config, no delete methods need to be implemented as part of this ticket.
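
For illustration, here is a minimal, self-contained sketch of what the module and its basic test could look like. It uses `u32` heights and `[u8; 32]` hashes as stand-ins for the real Zebra types, and the `add_key`/`add_result`/`results` method names are hypothetical:

use std::collections::HashMap;

/// Stand-ins for zebra_chain's `block::Height` and `transaction::Hash`.
type Height = u32;
type Hash = [u8; 32];

#[derive(Default)]
pub struct Storage {
    /// The key and an optional birthday for it.
    keys: HashMap<String, Option<Height>>,

    /// The key and the related transaction ids.
    results: HashMap<String, Vec<Hash>>,
}

impl Storage {
    /// Adds a viewing key with an optional birthday height.
    pub fn add_key(&mut self, key: String, birthday: Option<Height>) {
        self.keys.insert(key, birthday);
    }

    /// Records a transaction id found while scanning with `key`.
    pub fn add_result(&mut self, key: &str, tx_hash: Hash) {
        self.results.entry(key.to_string()).or_default().push(tx_hash);
    }

    /// Returns the transaction ids recorded for `key`.
    pub fn results(&self, key: &str) -> &[Hash] {
        self.results.get(key).map(Vec::as_slice).unwrap_or(&[])
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn stores_keys_and_results() {
        let mut storage = Storage::default();
        storage.add_key("zxviews-example".to_string(), None);
        storage.add_result("zxviews-example", [0u8; 32]);
        assert_eq!(storage.results("zxviews-example").len(), 1);
    }
}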

oxarbitrage added the A-blockchain-scanner (Area: Blockchain scanner of shielded transactions) label Nov 3, 2023
teor2345 changed the title from "feat(zebra-scan): Create a key storage database" to "feat(zebra-scan): Create a key storage database in RAM" Nov 5, 2023
mpguerra added the S-needs-triage (Status: A bug report needs triage), P-Medium ⚡, and C-feature (Category: New features) labels Nov 6, 2023
@arya2 (Contributor) commented Nov 7, 2023

This looks like a significant change. I was hoping a HashMap would be enough here; is there a specific motivation?

@mpguerra (Contributor) commented Nov 7, 2023

Maybe a HashMap would be enough to begin with.

@oxarbitrage (Contributor, Author) commented:

I think a HashMap is fine to be the underlying data structure if we want to. But I also think we need a structure on top of it, something like: https://github.com/ZcashFoundation/zebra/blob/main/zebrad/src/components/mempool/storage.rs#L115

I am pretty sure that the storage will get complicated enough to have its own sub-module (errors, constants like the max number of keys, etc.).
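
For example, the sub-module's constants and errors could take roughly this shape (all names are hypothetical, loosely following the mempool storage module):

/// Hypothetical upper bound on the number of stored viewing keys.
pub const MAX_KEYS: usize = 100;

/// Hypothetical errors for the scanner storage sub-module.
#[derive(Debug, PartialEq, Eq)]
pub enum StorageError {
    /// No more than `MAX_KEYS` viewing keys can be stored.
    FullKeys,
    /// The requested viewing key is not in storage.
    UnknownKey,
}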

oxarbitrage self-assigned this Nov 7, 2023
@arya2 (Contributor) commented Nov 7, 2023

> I think a HashMap is fine to be the underlying data structure if we want to. But I also think we need a structure on top of it, something like: https://github.com/ZcashFoundation/zebra/blob/main/zebrad/src/components/mempool/storage.rs#L115

Ah, that sounds good; I misunderstood the scope of this ticket.

Something similar might also be great for storing the initial scan tasks and adding methods to check on the progress of existing initial scan tasks if we want to implement cancelling & restarting those, or sending them new viewing keys for them to scan down the line.

@teor2345 (Contributor) commented Nov 7, 2023

> I think a HashMap is fine to be the underlying data structure if we want to. But I also think we need a structure on top of it, something like: main/zebrad/src/components/mempool/storage.rs#L115

> ...

> Something similar might also be great for storing the initial scan tasks and adding methods to check on the progress of existing initial scan tasks if we want to implement cancelling & restarting those, or sending them new viewing keys for them to scan down the line.

Let's open a separate ticket to design the scanning strategy?
This seems like something to discuss with the ECC engineers - I've already had some initial discussions on Discord here:
https://discord.com/channels/809218587167293450/972649509651906711/1163259715363549204

@teor2345 (Contributor) commented Nov 7, 2023

> I think a HashMap is fine to be the underlying data structure if we want to. But I also think we need a structure on top of it, something like: main/zebrad/src/components/mempool/storage.rs#L115

> I am pretty sure that the storage will get complicated enough to have its own sub-module (errors, constants like the max number of keys, etc.).

Optional suggestion about swapping out storage layers:

It might be helpful to have another crate for private key and secret scanning storage. That way, people can swap it out with another crate with the same interface. For example, some applications might just want to store secrets in RAM, and avoid the dependency on a disk-based database. But we could also do this using a Rust feature, so we don't need to make that decision now.
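
Sketched one way, under the assumption of a trait-based interface, swapping storage layers could look like this (all names hypothetical, with `u32` and `[u8; 32]` as stand-ins for the real height and hash types):

/// A storage backend for viewing keys and scan results. Implementors could
/// keep everything in RAM, or persist to a disk-based database.
pub trait ScanStorage {
    /// Stores a viewing key with an optional birthday height.
    fn add_key(&mut self, key: String, birthday: Option<u32>);

    /// Returns all stored viewing keys.
    fn keys(&self) -> Vec<String>;

    /// Records a transaction id found while scanning with `key`.
    fn add_result(&mut self, key: &str, tx_hash: [u8; 32]);
}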

I also think we should consider re-using RocksDB here, because we already have well-tested low-level interfaces to it. That would involve splitting out:

That refactor would need a separate ticket.

But for now, can we just pass a HashMap of keys to the scanner task, and make the detailed decisions about the scanner interface later? We had a lot of rework in the mempool storage implementation because we did it early, so we had to change it every time we changed how the mempool worked.
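
As a rough illustration of how small that interim step could be, assuming a hypothetical `scan_task` entry point:

use std::collections::HashMap;

/// Hypothetical scanner entry point: it receives the configured keys once,
/// at startup, and owns them for the lifetime of the task.
async fn scan_task(keys: HashMap<String, Option<u32>>) {
    // ... scan new and cached blocks with each configured key ...
    let _ = keys;
}

// At startup, the keys from the config would be passed straight in, e.g.:
// tokio::spawn(scan_task(keys_from_config));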

@teor2345 (Contributor) commented Nov 7, 2023

I estimated this ticket based on our decision at the meeting: we'll just pass a HashMap to the scanner task for now, and split the rest of the disk database design into a separate ticket.

@oxarbitrage (Contributor, Author) commented:

Well, that means there is nothing to do in this ticket. I think we should add all the database stuff here instead of opening more tickets.

@arya2 (Contributor) commented Nov 8, 2023

> But for now, can we just pass a HashMap of keys to the scanner task, and make the detailed decisions about the scanner interface later?

Could we do this as part of #7905 and estimate this issue for re-using rocksdb here?

> We had a lot of rework in the mempool storage implementation because we did it early, so we had to change it every time we changed how the mempool worked.

I think we should block this on #7905 to avoid that.

@teor2345 (Contributor) commented Nov 8, 2023

> But for now, can we just pass a HashMap of keys to the scanner task, and make the detailed decisions about the scanner interface later?

> Could we do this as part of #7905 and estimate this issue for re-using rocksdb here?

I think there are two separate changes here:

Since we want to be able to do some work in parallel, splitting the work into those two tickets would help.

> We had a lot of rework in the mempool storage implementation because we did it early, so we had to change it every time we changed how the mempool worked.

> I think we should block this on #7905 to avoid that.

In my experience, blocking on code sometimes causes more rework, because that code includes assumptions that aren't in the design. (And those assumptions don't work with other code from other tickets that are happening in parallel.)

I'd like to look at our design over the next week and work out how to avoid blocking across the key, scanning task, and results storage work, because that can get really complicated to manage. Defining simple interfaces will reduce those dependencies.

Analysis

But for now, let's focus on the specific design question in this ticket. So what's the design question here?

Here's one way to find missing parts of the design:

Create a list of key operations, and say what happens to the keys, scanning tasks, and results with each operation.

For example:

| Operation  | Key Storage | Block Scanning Tasks          | Results Storage                      |
| ---------- | ----------- | ----------------------------- | ------------------------------------ |
| Add key    | Key stored  | New and cached blocks scanned | (nothing)                            |
| Delete key | Key deleted | Stop scanning with that key   | Stored and in-flight results deleted |

Questions

I think our current design covers everything except deleting results, and maybe how the scanner task finds out about new or deleted keys.

So the design questions are:

  • How do the scanning tasks find out about new and deleted keys?
  • How does the results storage find out about deleted keys?

My Suggestions

In the meeting we said that we'd just pass a HashMap to the task when it starts up. That seems like enough for #7905 to move forward. But as Alfredo said, that doesn't leave much work in this ticket.

So here are some things we could do in this ticket:

  • create in-memory key storage which uses a HashMap internally
  • define key operation requests and responses
  • provide the key set via a watch channel (depends on moving the watch receiver into zebra-chain, but that can be done in the same PR or fixed later)
  • provide the deleted keys via a broadcast channel

(Or these things can be done in multiple tickets, that's up to Alfredo and Pili.)
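
For illustration, the watch and broadcast channels above could be wired up roughly like this sketch (all names hypothetical, using tokio's sync primitives and `u32` heights as a stand-in for `Height`):

use std::collections::HashMap;
use tokio::sync::{broadcast, watch};

/// Stand-in for the key set: viewing key -> optional birthday height.
type KeySet = HashMap<String, Option<u32>>;

struct KeyStorage {
    keys: KeySet,
    /// Scanning tasks watch this channel for the latest full key set.
    key_set_tx: watch::Sender<KeySet>,
    /// Results storage listens on this channel for deleted keys.
    deleted_tx: broadcast::Sender<String>,
}

impl KeyStorage {
    fn new() -> (Self, watch::Receiver<KeySet>, broadcast::Receiver<String>) {
        let (key_set_tx, key_set_rx) = watch::channel(KeySet::new());
        let (deleted_tx, deleted_rx) = broadcast::channel(16);
        let storage = Self { keys: KeySet::new(), key_set_tx, deleted_tx };
        (storage, key_set_rx, deleted_rx)
    }

    fn add_key(&mut self, key: String, birthday: Option<u32>) {
        self.keys.insert(key, birthday);
        // Publish the updated key set to all scanning tasks.
        let _ = self.key_set_tx.send(self.keys.clone());
    }

    fn delete_key(&mut self, key: &str) {
        if self.keys.remove(key).is_some() {
            let _ = self.key_set_tx.send(self.keys.clone());
            // Tell results storage to drop stored results for this key.
            let _ = self.deleted_tx.send(key.to_string());
        }
    }
}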

And here are some things that depend on larger work:

What do you think? Is that a good size for this ticket? Are there any missing dependencies?

@oxarbitrage (Contributor, Author) commented:

Yes, that is what I want to do. I opened new tickets for database persistence: #7926 and #7928.
