Legacy Indexing Overhaul #42

jeffro256 · 2024-04-07T01:40:07Z

This PR removes "universal"-style indexing for legacy CLSAG rings and replaces it with a reference set scheme that uses (amount, index in amount) indexing pairs to reference on-chain enotes. This is the same method that Cryptonote txs use, and it is how the current Monero Core LMDB database is indexed. Doing things this way means that the database will not have to be re-indexed, which saves at least 1.6 GB of storage space (100M on-chain enotes * 16 bytes for extra table keys) and avoids an expensive database migration that would move all existing enote data to a new table. We change the MockLedgerContext to support this indexing scheme.

In practice, serialized txs under this method shouldn't take up much more space than pre-PR if compressed cleverly, assuming most ring members will be RingCT enotes.

We also add LegacyEnoteOriginContext for contextualized enote records so we can better keep track of scanned legacy enotes under the legacy indexing scheme.
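
For readers new to the scheme, here is a minimal sketch of what an (amount, index in amount) pair could look like, plus the storage arithmetic from the description. The type and function names are hypothetical; the actual definition lives in src/seraphis_core/legacy_output_index.h and may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for the pair type used by this PR (not the real definition).
struct LegacyOutputIndexSketch
{
    // 0 for RingCT enotes, the cleartext amount for pre-RingCT enotes
    std::uint64_t ledger_indexing_amount;
    // position among on-chain enotes that share the same ledger indexing amount
    std::uint64_t index;
};

inline bool operator==(const LegacyOutputIndexSketch &a, const LegacyOutputIndexSketch &b)
{
    return a.ledger_indexing_amount == b.ledger_indexing_amount && a.index == b.index;
}

// Order by amount first, then by index within the amount.
inline bool operator<(const LegacyOutputIndexSketch &a, const LegacyOutputIndexSketch &b)
{
    return std::tie(a.ledger_indexing_amount, a.index)
         < std::tie(b.ledger_indexing_amount, b.index);
}

// Storage that a universal index would need for extra table keys.
constexpr std::uint64_t extra_table_key_bytes(const std::uint64_t num_enotes,
    const std::uint64_t bytes_per_key)
{
    return num_enotes * bytes_per_key;
}

// 100M enotes * 16 bytes = 1.6 GB, matching the estimate in the description
static_assert(extra_table_key_bytes(100000000ULL, 16) == 1600000000ULL,
    "storage estimate mismatch");
```

Note that this ordering only groups enotes by amount; it says nothing about which of two enotes with different amounts appeared on-chain first.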

j-berman

First pass review

It's unfortunate that this duplicates some work from #40 and has some of the same problems that #40 addresses. This PR seems less complex, though, and less complexity would be my preference (apologies @SNeedlewoods :/)

src/seraphis_core/legacy_output_index.h

j-berman · 2024-04-19T04:13:07Z

src/seraphis_main/scan_balance_recovery_utils.cpp

@@ -72,14 +72,15 @@ static bool try_view_scan_legacy_enote_v1(const rct::key &legacy_base_spend_pubk
    const std::uint64_t block_index,
    const std::uint64_t block_timestamp,
    const rct::key &transaction_id,
-    const std::uint64_t total_enotes_before_tx,
+    const std::function<std::uint64_t(rct::xmr_amount)> &total_enotes_before_tx,


I think it would be simpler to just pass the enote_ledger_index here and get rid of this std::function, and also get rid of enote_tx_index_by_amount_inout. This function param is going to be a little annoying to re-implement in the scanner.

Using enote_tx_index_by_amount_inout like this also conflicts with #43 since the same enote can be re-scanned with different tx pub keys. Seems simpler to just get rid of that by going with a simple enote_ledger_index passed into the function

total_enotes_before_tx doesn't just return one result for the whole tx, it will return different results based on which amounts you pass to it. We could pass a map, maybe? That would require we pre-scan the tx to look for the ledger indexing amounts in the enotes before calling this function.

> We could pass a map, maybe?

I was suggesting passing a vector of enote_ledger_index values into try_find_legacy_enotes_in_tx over here

Then when iterating over enotes, reference the specific enote_ledger_index to pass into try_view_scan_legacy_enote_v1

But a map is fine too

> require we pre-scan the tx to look for the ledger indexing amounts in the enotes before calling this function

The daemon currently returns a vector of global output indices per tx (source); the respective amount is inferred from the tx output. So technically "pre-scanning" is already done on the server-side today
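
To make the thread above concrete, here is a rough sketch of why a per-tx "total enotes before tx" value depends on the amount, and what a precomputed vector of enote ledger indices would contain. The helper name and its parameters are hypothetical and are not part of the PR; the per-amount counts stand in for whatever the ledger or daemon would supply.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// (ledger indexing amount, index within that amount)
using AmountIndexPair = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical helper: assign a ledger index to every enote in one tx.
// - enote_amounts: ledger indexing amount of each enote, in tx order
// - counts_before_tx: number of on-chain enotes per amount before this tx
//   (an amount that is missing from the map is treated as a count of 0)
std::vector<AmountIndexPair> compute_enote_ledger_indices(
    const std::vector<std::uint64_t> &enote_amounts,
    const std::map<std::uint64_t, std::uint64_t> &counts_before_tx)
{
    // how many enotes of each amount we have already passed inside this tx
    std::map<std::uint64_t, std::uint64_t> seen_in_tx;

    std::vector<AmountIndexPair> indices;
    indices.reserve(enote_amounts.size());

    for (const std::uint64_t amount : enote_amounts)
    {
        const auto it = counts_before_tx.find(amount);
        const std::uint64_t base = (it == counts_before_tx.end()) ? 0 : it->second;

        // index = enotes of this amount before the tx + enotes of this amount earlier in the tx
        indices.emplace_back(amount, base + seen_in_tx[amount]++);
    }

    return indices;
}
```

With a vector like this computed once per tx, the scanning function could take the i-th entry for the i-th enote instead of a std::function plus a mutable per-amount counter.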

j-berman · 2024-04-19T04:15:14Z

src/seraphis_main/scan_balance_recovery_utils.cpp

    contextual_record_out.origin_context =
        SpEnoteOriginContextV1{
                .block_index        = block_index,
                .block_timestamp    = block_timestamp,
                .transaction_id     = transaction_id,
                .enote_tx_index     = enote_index,
-                .enote_ledger_index = total_enotes_before_tx + enote_index,
+                .enote_ledger_index = enote_ledger_index,


This messes up the is_older_than implementation when comparing SpEnoteOriginContextV1's enote_ledger_index's

enotes with different transparent amounts can't be compared using this to determine age

Same as this comment: #40 (comment)

Proposal for is_older_than: add tx_index_in_block to SpEnoteOriginContextV1

With this, we can then determine is_older_than for SpEnoteOriginContextV1 using the block_index, tx_index_in_block, and enote_tx_index

Would be useful not only to guarantee correctness of is_older_than, but also to guarantee a consistent order between the enote store and a wallet2 m_transfers container. The latter is marginally useful for populating a wallet2 instance from the enote store.

Potentially worth a separate PR
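
A minimal sketch of the proposed comparison, assuming a tx_index_in_block field is added as suggested. The struct below is a simplified, hypothetical stand-in for SpEnoteOriginContextV1 and ignores any other fields the real is_older_than may consult.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Simplified, hypothetical stand-in for the origin context.
struct OriginContextSketch
{
    std::uint64_t block_index;
    std::uint64_t tx_index_in_block;  // the proposed new field
    std::uint64_t enote_tx_index;
};

// Lexicographic comparison: block first, then tx position in the block, then enote position
// in the tx. Unlike comparing per-amount ledger indices, this is well defined for enotes
// with different ledger indexing amounts.
bool is_older_than_sketch(const OriginContextSketch &a, const OriginContextSketch &b)
{
    return std::tie(a.block_index, a.tx_index_in_block, a.enote_tx_index)
         < std::tie(b.block_index, b.tx_index_in_block, b.enote_tx_index);
}
```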

j-berman · 2024-04-19T04:45:05Z

src/seraphis_mocks/mock_ledger_context.cpp

@@ -784,7 +821,7 @@ void MockLedgerContext::get_onchain_chunk_legacy(const std::uint64_t chunk_start
                            total_output_count_before_tx,


A vector of the enote ledger indexes would be much appreciated here instead of total_output_count_before_tx. This function just seems like a pain

src/seraphis_main/tx_builders_legacy_inputs.cpp

j-berman · 2024-04-19T05:34:25Z

src/seraphis_core/legacy_enote_types.cpp

+    if (variant.is_type<LegacyEnoteV1>())
+        return variant.unwrap<LegacyEnoteV1>().amount;
+    else
+        return 0;


An unfortunate caveat here: LegacyEnoteV4 can either be pre-RCT or coinbase. This needs to return the amount for the former and 0 for the latter

EDIT: so we don't forget in this PR, I forgot to mention here LegacyEnoteV1 can also either be pre-RCT or post-RCT pre-view tag coinbase. Also needs amount for the former and 0 for the latter
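
To spell out the caveat: the enote type alone does not determine the ledger indexing amount, so some extra context is needed. A hedged sketch, with a hypothetical flattened struct and flag (how the caller learns whether the enote is indexed alongside RingCT enotes is deliberately left out):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flattened view of a legacy enote; not a type from the PR.
struct LegacyEnoteViewSketch
{
    // amount stored in the clear (present on LegacyEnoteV1 and LegacyEnoteV4)
    std::uint64_t cleartext_amount;
    // true when the ledger indexes this enote with RingCT enotes,
    // e.g. a coinbase enote created after RingCT activation
    bool indexed_as_rct;
};

// Pre-RingCT enotes are indexed under their cleartext amount;
// enotes indexed with RingCT enotes use amount 0.
std::uint64_t ledger_indexing_amount_sketch(const LegacyEnoteViewSketch &enote)
{
    return enote.indexed_as_rct ? 0 : enote.cleartext_amount;
}
```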

Okay I updated the PR with changes shared from PR #40, and together they tackle the problem nicely.

j-berman · 2024-04-19T05:53:02Z

src/seraphis_impl/serialization_demo_types.h

-    /// on-chain indices of the proof's ring members (serializable as index offsets)
-    std::vector<std::uint64_t> reference_set_COMPACT;
+    /// on-chain indices of the proof's ring members
+    std::vector<std::pair<std::uint64_t, std::uint64_t>> reference_set;


Question: the reason you got rid of compaction is because this LegacyRingSignatureV4 could have {pre-RCT, RCT} enotes in the same ring, right? If instead each ring was tied to a single ledger_indexing_amount, then it would be trivial to just keep the same reference_set_COMPACT as is and add a field for the ledger_indexing_amount

IIUC I think this is fine as is

Mainly I got rid of compaction because I'm overhauling serialization in #39 anyways. Here's how I would do compaction:

1. Order pairs by increasing amounts
2. Order each pair for a given amount by increasing index number
3. Decumulate the list of unique amounts (store each value as its difference from the previous one)
4. Decumulate the list of index numbers for a given amount
5. Store number of unique amounts, then entries of amount offset and a list of index number offsets

For example, if your reference set entries (a_j, i_k) are {(0, 0), (0, 5), (0, 8), (1, 4), (1, 5), (3, 0), (3, 10)}, the encoding could look like so:

    3  // number of unique amounts
    0  // base amount
    3  // number of indexes for amount '0'
    0  // base index for amount '0'
    5  // index offset
    3  // index offset
    1  // amount offset
    2  // number of indexes for amount '1'
    4  // base index for amount '1'
    1  // index offset
    2  // amount offset
    2  // number of indexes for amount '3'
    0  // base index for amount '3'
    10 // index offset

You can see how this can get pretty compact if there are not many unique ledger indexing amounts. For example, if we want to specify the first 20 RingCT enotes for our reference set (ledger indexing amount is 0), then our encoding would be: 1 0 20 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1.
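
The steps above can be sketched roughly as follows. This is an illustration only, not the code in seraphis_serialization.h: the function names are made up, and the output is a plain vector of integers rather than whatever integer encoding the real serializer writes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// (ledger indexing amount, index within that amount)
using AmountIndexPair = std::pair<std::uint64_t, std::uint64_t>;

// Encode a reference set as: number of unique amounts, then for each amount
// [amount offset, number of indexes, base index, index offsets...].
std::vector<std::uint64_t> encode_reference_set_sketch(const std::vector<AmountIndexPair> &refs)
{
    // std::map keeps the unique amounts in increasing order
    std::map<std::uint64_t, std::vector<std::uint64_t>> by_amount;
    for (const AmountIndexPair &ref : refs)
        by_amount[ref.first].push_back(ref.second);

    std::vector<std::uint64_t> out;
    out.push_back(by_amount.size());

    std::uint64_t prev_amount = 0;
    for (auto &entry : by_amount)
    {
        std::vector<std::uint64_t> &indices = entry.second;
        std::sort(indices.begin(), indices.end());

        // the first amount is stored as-is (offset from 0), later ones as offsets
        out.push_back(entry.first - prev_amount);
        prev_amount = entry.first;

        out.push_back(indices.size());

        // same idea for the indexes within this amount
        std::uint64_t prev_index = 0;
        for (const std::uint64_t index : indices)
        {
            out.push_back(index - prev_index);
            prev_index = index;
        }
    }

    return out;
}

// Inverse of the encoder; .at() throws std::out_of_range on truncated input.
std::vector<AmountIndexPair> decode_reference_set_sketch(const std::vector<std::uint64_t> &enc)
{
    std::vector<AmountIndexPair> refs;
    std::size_t pos = 0;

    const std::uint64_t num_amounts = enc.at(pos++);
    std::uint64_t amount = 0;
    for (std::uint64_t a = 0; a < num_amounts; ++a)
    {
        amount += enc.at(pos++);  // re-accumulate the amount
        const std::uint64_t num_indices = enc.at(pos++);

        std::uint64_t index = 0;
        for (std::uint64_t i = 0; i < num_indices; ++i)
        {
            index += enc.at(pos++);  // re-accumulate the index
            refs.emplace_back(amount, index);
        }
    }

    return refs;
}
```

Feeding the seven-entry example into encode_reference_set_sketch yields 3 0 3 0 5 3 1 2 4 1 2 2 0 10, the same sequence as listed above, and decoding that sequence returns the pairs in sorted order.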

Edit: It's in this PR now

Definitely important to include this, with good tests. For follow-up PR.

src/seraphis_main/tx_component_types_legacy.cpp

src/seraphis_main/tx_builders_multisig.cpp

SNeedlewoods

Very glad to see how much cleaner this is.

Apart from the few minor things I commented on, this looks good to me and I approve this PR.

src/seraphis_core/legacy_output_index.h

src/seraphis_main/contextual_enote_record_types.h

src/seraphis_main/contextual_enote_record_utils.cpp

src/seraphis_main/scan_balance_recovery_utils.cpp

src/seraphis_main/tx_component_types_legacy.cpp

src/seraphis_mocks/mock_ledger_context.cpp

jeffro256 · 2024-05-01T13:37:59Z

Thanks @SNeedlewoods !

src/common/container_helpers.h

src/seraphis_core/legacy_decoy_selector_flat.cpp

src/seraphis_core/legacy_decoy_selector_flat.h

src/seraphis_core/legacy_output_index.h

src/seraphis_main/scan_balance_recovery_utils.cpp

src/seraphis_main/tx_builders_legacy_inputs.h

src/seraphis_main/tx_validators.cpp

src/seraphis_mocks/mock_ledger_context.h

src/common/container_helpers.h

jeffro256 · 2024-05-04T15:56:24Z

Rebased and added compact legacy reference set serialization. See seraphis_serialization.h for the changes

src/seraphis_impl/seraphis_serialization.h

UkoeHB · 2024-05-15T21:21:39Z

Can you get it to pass CI?

jeffro256 · 2024-05-16T17:13:19Z

Rebased to integrate with the async wallet scanner changes

jeffro256 · 2024-05-16T17:13:44Z

Windows build and MacOS builds are broken because of wrong Boost versions

UkoeHB

Sorry, one comment then merge!

src/seraphis_mocks/scan_context_async_mock.cpp

This PR removes "universal"-style indexing for legacy CLSAG rings and replaces it with a reference set scheme that uses (amount, index in amount) indexing pairs to reference on-chain enotes. This is the same method that Cryptonote txs use, and it is how the current Monero Core LMDB database is indexed. Doing things this way means that the database will not have to be re-indexed, which saves at least 1.6 GB of storage space (100M on-chain enotes * 16 bytes for extra table keys) and avoids an expensive database migration that would move all existing enote data to a new table. We change the MockLedgerContext to support this indexing scheme. In practice, serialized txs under this method shouldn't take up much more space than pre-PR if compressed cleverly, assuming most ring members will be RingCT enotes. We also add LegacyEnoteOriginContext for contextualized enote records so we can better keep track of scanned legacy enotes under the legacy indexing scheme. Co-authored-by: SNeedlewoods <sneedlewoods_1@protonmail.com>

jeffro256 · 2024-05-23T19:08:29Z

Rebased to resolve merge conflicts with #43 and edited for @UkoeHB's latest comment

jeffro256 marked this pull request as ready for review April 8, 2024 12:22

j-berman reviewed Apr 19, 2024

jeffro256 commented Apr 23, 2024

j-berman mentioned this pull request Apr 24, 2024

add LegacyEnoteOriginContext seraphis-migration/monero#16

Closed

jeffro256 force-pushed the leg_clsag_indexing branch from 1fe2d30 to 5dd9885 Compare April 26, 2024 15:16

jeffro256 changed the title ~~Use legacy-compatible indexing (amount, index in amount) pairs in txs~~ Legacy Indexing Overhaul Apr 26, 2024

SNeedlewoods approved these changes Apr 28, 2024

SNeedlewoods mentioned this pull request Apr 29, 2024

add LegacyEnoteOriginContext #40

Closed

j-berman mentioned this pull request Apr 29, 2024

Nice-to-have data for the scanner to identify (or pass back to a consumer) in order of preference #48

Open

jeffro256 mentioned this pull request May 1, 2024

Async wallet scanner #23

Merged

UkoeHB requested changes May 2, 2024

UkoeHB reviewed May 3, 2024

jeffro256 force-pushed the leg_clsag_indexing branch from 104d91b to 8bd201b Compare May 4, 2024 15:55

jeffro256 force-pushed the leg_clsag_indexing branch from 8bd201b to f26ce71 Compare May 4, 2024 16:01

UkoeHB requested changes May 7, 2024

jeffro256 force-pushed the leg_clsag_indexing branch from 8c13298 to e9b159a Compare May 16, 2024 16:54

UkoeHB requested changes May 21, 2024

jeffro256 force-pushed the leg_clsag_indexing branch from e9b159a to 9e9aa1b Compare May 23, 2024 19:07

UkoeHB approved these changes May 23, 2024

UkoeHB merged commit ace9228 into UkoeHB:seraphis_lib May 23, 2024
16 of 18 checks passed

jeffro256 mentioned this pull request May 24, 2024

jamtis changes #26

Open
