Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Legacy Indexing Overhaul #42

Merged
merged 1 commit into from
May 23, 2024
Merged

Conversation

jeffro256
Copy link

@jeffro256 jeffro256 commented Apr 7, 2024

This PR removes "universal"-style indexing for legacy CLSAG rings, and replaces it with a reference set scheme that uses
(amount, index in amount) indexing pairs to reference on-chain enotes. This is the same method that Cryptonote txs use, and is how the current Monero Core LMDB database is referenced. Doing things this way means that the database will not have to be re-indexed, saving at a very minimum 1.6 GB (100M on-chain enotes * (16 bytes for extra table keys)) of storage space, and an expensive database migration involving moving all existing enote data to a new table. We change the MockLedgerContext to support this indexing scheme.

In practice, serialized txs under this method shouldn't take up much more space than pre-PR if compressed clever-ly, and assuming most ring members will RingCT enotes.

We also add LegacyEnoteOriginContext for contextualized enote records so we can better keep tracked of scanned legacy enotes under the legacy indexing scheme.

@jeffro256 jeffro256 marked this pull request as ready for review April 8, 2024 12:22
Copy link

@j-berman j-berman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass review

It's unfortunate that this duplicates some work from #40 and has similar problems that PR addresses. This PR seems less complex and less complexity would be my preference though (apologies @SNeedlewoods :/)

src/seraphis_core/legacy_output_index.h Outdated Show resolved Hide resolved
src/seraphis_core/legacy_output_index.h Show resolved Hide resolved
@@ -72,14 +72,15 @@ static bool try_view_scan_legacy_enote_v1(const rct::key &legacy_base_spend_pubk
const std::uint64_t block_index,
const std::uint64_t block_timestamp,
const rct::key &transaction_id,
const std::uint64_t total_enotes_before_tx,
const std::function<std::uint64_t(rct::xmr_amount)> &total_enotes_before_tx,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be simpler to just pass the enote_ledger_index here and get rid of this std::function, and also get rid of enote_tx_index_by_amount_inout. This function param is going to be a little annoying to re-implement in the scanner.

Using enote_tx_index_by_amount_inout like this also conflicts with #43 since the same enote can be re-scanned with different tx pub keys. Seems simpler to just get rid of that by going with a simple enote_ledger_index passed into the function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total_enotes_before_tx doesn't just return one result for the whole tx, it will return different results based on which amounts you pass to it. We could pass a map, maybe? That would require we pre-scan the tx to look for the ledger indexing amounts in the enotes before calling this function.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could pass a map, maybe?

Was suggesting pass a vector of enote_ledger_index's into try_find_legacy_enotes_in_tx over here

Then when iterating over enotes, reference the specific enote_ledger_index to pass into try_view_scan_legacy_enote_v1

But a map is fine too

require we pre-scan the tx to look for the ledger indexing amounts in the enotes before calling this function

The daemon currently returns a vector of global output indices per tx (source); the respective amount is inferred from the tx output. So technically "pre-scanning" is already done on the server-side today

contextual_record_out.origin_context =
SpEnoteOriginContextV1{
.block_index = block_index,
.block_timestamp = block_timestamp,
.transaction_id = transaction_id,
.enote_tx_index = enote_index,
.enote_ledger_index = total_enotes_before_tx + enote_index,
.enote_ledger_index = enote_ledger_index,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This messes up the is_older_than implementation when comparing SpEnoteOriginContextV1's enote_ledger_index's

enotes with different transparent amounts can't be compared using this to determine age

Same as this comment: #40 (comment)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal for is_older_than: add tx_index_in_block to SpEnoteOriginContextV1

With this, we can then determine is_older_than for SpEnoteOriginContextV1 using the block_index, tx_index_in_block, and enote_tx_index

Would be useful not only to guarantee correctness of is_older_than, but also to guarantee a consistent order between the enote store and a wallet2 m_transfers container. The latter is marginally useful for populating a wallet2 instance from the enote store.

Potentially worth a separate PR

@@ -784,7 +821,7 @@ void MockLedgerContext::get_onchain_chunk_legacy(const std::uint64_t chunk_start
total_output_count_before_tx,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A vector of the enote ledger indexes would be much appreciated here instead of total_output_count_before_tx. This function just seems like a pain

if (variant.is_type<LegacyEnoteV1>())
return variant.unwrap<LegacyEnoteV1>().amount;
else
return 0;
Copy link

@j-berman j-berman Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An unfortunate caveat here: LegacyEnoteV4 can either be pre-RCT or coinbase. This needs to return the amount for the former and 0 for the latter

EDIT: so we don't forget in this PR, I forgot to mention here LegacyEnoteV1 can also either be pre-RCT or post-RCT pre-view tag coinbase. Also needs amount for the former and 0 for the latter

Copy link
Author

@jeffro256 jeffro256 Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I updated the PR with changes shared from PR #40, and together they tackle the problem nicely.

/// on-chain indices of the proof's ring members (serializable as index offsets)
std::vector<std::uint64_t> reference_set_COMPACT;
/// on-chain indices of the proof's ring members
std::vector<std::pair<std::uint64_t, std::uint64_t>> reference_set;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: the reason you got rid of compaction is because this LegacyRingSignatureV4 could have {pre-RCT, RCT} enotes in the same ring, right? If instead each ring was tied to a single ledger_indexing_amount, then it would be trivial to just keep the same reference_set_COMPACT as is and add a field for the ledger_indexing_amount

IIUC I think this is fine as is

Copy link
Author

@jeffro256 jeffro256 Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly I got rid of compaction because I'm overhauling serialization in #39 anyways. Here's how I would do compaction:

  1. Order pairs by increasing amounts
  2. Order each pair for a given amount by increasing index number
  3. Decumulate the list of unique amounts
  4. Decumulate the list of index numbers for a given amount
  5. Store number of unique amounts, then entries of amount offset and a list of index number offets

For example, if your reference set entries (a_j, i_k) is {(0, 0}, (0, 5), (0, 8), (1, 4), (1, 5), (3, 0), (3, 10)}, the encoding could look like so:

3            // number of unique amounts
0            // base amount
3            // number of indexes for amount '0'
0            // base index for amount '0'
5            // index offset
3            // index offset
1            // amount offset
2            // number of indexes for amount '1'
4            // base index for amount '1'
1            // index offset
2            // amount offset
2            // number of indexes for amount '3'
0            // base index for amount '3'
10          // index offset

You can see how this can get pretty compact if there are not many unique ledger indexing amounts. For example, if we want to specify the first 20 RingCT enotes for our reference set (ledger indexing amount is 0), then our encoding would be: 1 0 20 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1.

Edit: It's in this PR now

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely important to include this, with good tests. For follow-up PR.

src/seraphis_main/tx_component_types_legacy.cpp Outdated Show resolved Hide resolved
src/seraphis_main/tx_builders_multisig.cpp Show resolved Hide resolved
@jeffro256 jeffro256 changed the title Use legacy-compatible indexing (amount, index in amount) pairs in txs Legacy Indexing Overhaul Apr 26, 2024
Copy link

@SNeedlewoods SNeedlewoods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very glad to see how much cleaner this is.

Apart from the few minor things I commented on, this looks good to me and I approve this PR.

src/seraphis_core/legacy_output_index.h Outdated Show resolved Hide resolved
src/seraphis_main/contextual_enote_record_types.h Outdated Show resolved Hide resolved
src/seraphis_main/contextual_enote_record_utils.cpp Outdated Show resolved Hide resolved
src/seraphis_main/scan_balance_recovery_utils.cpp Outdated Show resolved Hide resolved
src/seraphis_main/tx_component_types_legacy.cpp Outdated Show resolved Hide resolved
src/seraphis_mocks/mock_ledger_context.cpp Outdated Show resolved Hide resolved
@jeffro256
Copy link
Author

Thanks @SNeedlewoods !

src/common/container_helpers.h Outdated Show resolved Hide resolved
src/common/container_helpers.h Outdated Show resolved Hide resolved
src/seraphis_core/legacy_decoy_selector_flat.cpp Outdated Show resolved Hide resolved
src/seraphis_core/legacy_decoy_selector_flat.h Outdated Show resolved Hide resolved
src/seraphis_core/legacy_output_index.h Outdated Show resolved Hide resolved
src/seraphis_main/scan_balance_recovery_utils.cpp Outdated Show resolved Hide resolved
src/seraphis_main/tx_builders_legacy_inputs.h Outdated Show resolved Hide resolved
src/seraphis_main/tx_validators.cpp Outdated Show resolved Hide resolved
src/seraphis_mocks/mock_ledger_context.h Outdated Show resolved Hide resolved
src/seraphis_mocks/mock_ledger_context.h Outdated Show resolved Hide resolved
@jeffro256
Copy link
Author

jeffro256 commented May 4, 2024

Rebased and added compact legacy reference set serialization. See seraphis_serialization.h for the changes

src/seraphis_impl/seraphis_serialization.h Outdated Show resolved Hide resolved
src/seraphis_impl/seraphis_serialization.h Outdated Show resolved Hide resolved
src/seraphis_impl/seraphis_serialization.h Show resolved Hide resolved
@UkoeHB
Copy link
Owner

UkoeHB commented May 15, 2024

Can you get it to pass CI?

@jeffro256
Copy link
Author

Rebased to integrate with the async wallet scanner changes

@jeffro256
Copy link
Author

Windows build and MacOS builds are broken because of wrong Boost versions

Copy link
Owner

@UkoeHB UkoeHB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one comment then merge!

src/seraphis_mocks/scan_context_async_mock.cpp Outdated Show resolved Hide resolved
This PR removes "universal"-style indexing for legacy CLSAG rings, and replaces it with a reference set scheme that uses
(amount, index in amount) indexing pairs to reference on-chain enotes. This is the same method that Cryptonote txs use, and is how
the current Monero Core LMDB database is referenced. Doing things this way means that the database will not have to be re-indexed,
saving at a very minimum 1.6 GB (100M on-chain enotes * (16 bytes for extra table keys)) of storage space, and an expensive
database migration involving moving all existing enote data to a new table. We change the MockLedgerContext to support this indexing
scheme.

In practice, serialized txs under this method shouldn't take up much more space than pre-PR if compressed clever-ly, and assuming
most ring members will RingCT enotes.

We also add LegacyEnoteOriginContext for contextualized enote records so we can better keep tracked of scanned legacy enotes under
the legacy indexing scheme.

Co-authored-by: SNeedlewoods <sneedlewoods_1@protonmail.com>
@jeffro256
Copy link
Author

Rebased to resolve merge conflicts with #43 and edited for @UkoeHB's latest comment

@UkoeHB UkoeHB merged commit ace9228 into UkoeHB:seraphis_lib May 23, 2024
16 of 18 checks passed
@jeffro256 jeffro256 mentioned this pull request May 24, 2024
UkoeHB pushed a commit that referenced this pull request Aug 2, 2024
This PR removes "universal"-style indexing for legacy CLSAG rings, and replaces it with a reference set scheme that uses
(amount, index in amount) indexing pairs to reference on-chain enotes. This is the same method that Cryptonote txs use, and is how
the current Monero Core LMDB database is referenced. Doing things this way means that the database will not have to be re-indexed,
saving at a very minimum 1.6 GB (100M on-chain enotes * (16 bytes for extra table keys)) of storage space, and an expensive
database migration involving moving all existing enote data to a new table. We change the MockLedgerContext to support this indexing
scheme.

In practice, serialized txs under this method shouldn't take up much more space than pre-PR if compressed clever-ly, and assuming
most ring members will RingCT enotes.

We also add LegacyEnoteOriginContext for contextualized enote records so we can better keep tracked of scanned legacy enotes under
the legacy indexing scheme.

Co-authored-by: SNeedlewoods <sneedlewoods_1@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants