Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark addresses as used while discovering them. #2033

Merged
merged 5 commits into from
Aug 20, 2020

Conversation

KtorZ
Copy link
Member

@KtorZ KtorZ commented Aug 12, 2020

Issue Number

#2032

Overview

  • dd3cff5
    📍 mark addresses as used as they are discovered
    The main issue of the current list address handler is that it loads in memory and crawls the ENTIRE transaction history of the wallet. This is done merely to know whether addresses have been used or not. This is unnecessary for random addresses because used and unused addresses are already separated in two different Maps. For sequential wallets it's a bit more subtle but, we can mark addresses as "Used" during discovery and store this information in the database.

  • 61bbd7b
    📍 add two properties for Random & Sequential wallets showing the effect of IsOurs on the address state

  • eed460c
    📍 fix random address import to not mark every imported address as 'Used'

Comments

  • NOTE 1: This change will require a database migration forcing wallets to re-synchronize from scratch.

  • NOTE 2: I haven't yet fixed unit tests which need some adjustment since some of the interfaces have changed.

  • NOTE 3: I'll try measuring and comparing the behavior with a stress-wallet that owns many addresses.

@KtorZ KtorZ added the RESOLVING ISSUE Mark a PR as resolving issues, for auto-generated CHANGELOG label Aug 12, 2020
@KtorZ KtorZ requested a review from rvl August 12, 2020 15:38
@KtorZ KtorZ self-assigned this Aug 12, 2020
@KtorZ KtorZ force-pushed the KtorZ/2032/mark-address-while-discovering branch 3 times, most recently from 8ba5a3f to eed460c Compare August 13, 2020 13:05
Copy link
Contributor

@rvl rvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wholly approve of the change to mark sequential addresses as used once they are discovered, so that this information is easily known when listing addresses.

I also like the idea of streaming responses, but the topic needs further discussion in a separate PR.

-- separator), and decoding them as a JSON list.
decodeOne = Aeson.decodeStrict . mconcat
decodeStream chunks = do
xs <- traverse (Aeson.decodeStrict @Aeson.Value . B8.init) chunks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can necessarily assume that the chunks are split nicely at the newline.
The OS might interrupt the socket read early, and you will have a line in two pieces, neither of which are well-formed json.

This streaming change is worthy of its own PR, where the details can be sorted out.

I like the idea of streaming in general, because it removes one case of unbounded memory usage.

But on the other hand, if you have 1 million addresses, that would be perhaps 80MB of encoded json, as an indication of size. So the memory cost of not streaming is smallish. Without streaming there is a delay while the server is reading the list from sqlite, but unless the clients also have stream processing, so what?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvl actually.. there's one test failing because of this which shows that sometimes, there are chunks that only contains half of an address (it fails on the "import 15.000 addresses and list them" scenario, I haven't noticed this below 100 addresses).

So yeah, this require some additional thoughts maybe. I'll split the streaming approach into a separate PR.

  The main issue of the current list address handler is that it loads in memory and crawls the ENTIRE transaction history of the wallet. This is done merely to know whether addresses have been used or not. This is unnecessary for random addresses because used and unused addresses are already separated in two different Maps. For sequential wallets it's a bit more subtle but, we can mark addresses as "Used" during discovery and store this information in the database.
@KtorZ KtorZ force-pushed the KtorZ/2032/mark-address-while-discovering branch from 02b126e to b94f9f8 Compare August 19, 2020 12:41
@KtorZ KtorZ marked this pull request as ready for review August 19, 2020 12:43
@KtorZ
Copy link
Member Author

KtorZ commented Aug 19, 2020

@rvl I ended up adding a simple migration for this. Reviewing the migration approach of the wallet can be done as a separate task / debt. I've tested the query with a hand-crafted database with 100k address and txout rows arbitrarily generated (with however existing relationships between tables) and the query is executed almost instantly.

  In the end, migrating an existing database was quite easy given the right SQL incantation. We need to mark any 'known' address as used where a known address is simply an address that appear in one of the wallet known txout. We don't care in the end if it's an outgoing or incoming transaction. Just seeing an address there immediately qualifies it as 'used'.
@KtorZ KtorZ force-pushed the KtorZ/2032/mark-address-while-discovering branch from 29ed552 to dc6db76 Compare August 19, 2020 16:31
Copy link
Contributor

@rvl rvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration looks OK

@rvl rvl force-pushed the KtorZ/2032/mark-address-while-discovering branch from 1486920 to 3a01713 Compare August 20, 2020 02:49
@rvl
Copy link
Contributor

rvl commented Aug 20, 2020

bors r+

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Aug 20, 2020

Build succeeded

@iohk-bors iohk-bors bot merged commit 9864ac3 into master Aug 20, 2020
@iohk-bors iohk-bors bot deleted the KtorZ/2032/mark-address-while-discovering branch August 20, 2020 04:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RESOLVING ISSUE Mark a PR as resolving issues, for auto-generated CHANGELOG
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants