feat: fetch commitments only #1347

Merged: bobbinth merged 5 commits into next from igamigo-optimize-query on Nov 12, 2025

@igamigo
Collaborator

@igamigo igamigo commented Nov 10, 2025

Addresses #1338 (comment). The main motivation is to avoid fetching and deserializing more data from the DB than needed.

@igamigo igamigo marked this pull request as ready for review November 10, 2025 00:06
@igamigo igamigo added the no changelog This PR does not require an entry in the `CHANGELOG.md` file label Nov 10, 2025
@igamigo igamigo requested review from Mirko-von-Leipzig and bobbinth and removed request for bobbinth November 10, 2025 02:58
Collaborator

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment

LGTM, with a potential impl optimization

Collaborator

Would it be possible to check for existence, and then retain the commitment from the inputs instead of re-deserializing it?

I'm unsure how to do this with Diesel, but in raw SQLite I would loop over the inputs with an EXISTS query, filter them in Rust, and build the output hashset from the inputs that exist.

Collaborator Author

Something like this, maybe?

Collaborator

Not quite; I'm hoping for the database to return a bool per input indicating its existence within the table. @drahnr maybe knows how?

Contributor

I am unsure where this pointed to in the past. Both the link and the comment attachment seem to no longer exist.

Collaborator Author

I believe the comment was attached to the file, and the link does exist: it points to a commit I've just reverted. Basically, @Mirko-von-Leipzig is referring to this section of the code, where the input is serialized and then deserialized back to its original form, and he's suggesting it could be optimized a bit.

Collaborator

@drahnr as a summary:

We want to know which of the given input set of note commitments exist in the DB table. This is currently implemented by returning the intersection of the notes table and the input set. That returns the full data for each commitment, which we already have in the input set, when all we actually want is an existence test.

I'm proposing we instead return a bool for each input indicating whether it exists in the table.

In raw SQL terms:

-- current
SELECT commitment FROM notes WHERE commitment IN ?;

-- proposed (but maybe there is a better way for multiple rows?)
SELECT EXISTS (SELECT 1 FROM notes WHERE commitment = ?);
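
The proposal above can be sketched in plain Rust, independent of Diesel. This is only an illustration under stated assumptions: `CommitmentBytes`, `existence_flags`, and `retain_existing` are hypothetical names, and a `HashSet` stands in for the indexed commitment column, where a real implementation would ask the database for the flags.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a serialized note commitment (32 bytes).
type CommitmentBytes = [u8; 32];

/// Returns one flag per input, in input order: `true` if the commitment is
/// present in `table`. `table` is an in-memory stand-in for the database
/// table; it is an assumption made for this sketch.
fn existence_flags(table: &HashSet<CommitmentBytes>, inputs: &[CommitmentBytes]) -> Vec<bool> {
    inputs.iter().map(|commitment| table.contains(commitment)).collect()
}

/// Keeps the caller's original values for the inputs flagged as existing,
/// so nothing read back from storage has to be deserialized.
fn retain_existing<T: Copy>(inputs: &[T], flags: &[bool]) -> Vec<T> {
    inputs
        .iter()
        .zip(flags)
        .filter_map(|(&value, &exists)| exists.then_some(value))
        .collect()
}
```

With flags in input order, the caller can zip them against its own values, which is the "retain the commitment from the inputs" idea raised earlier in the thread.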

Contributor

The diesel query would look like this:

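use diesel::prelude::*;

// Subquery: select a constant for every row matching the commitment.
let sub_query = schema::notes::table
    .select(0.into_sql::<diesel::sql_types::Integer>())
    .filter(schema::notes::commitment.eq(desired_commitment));
// SELECT EXISTS (...) yields exactly one boolean row.
let exists: bool = diesel::select(diesel::dsl::exists(sub_query)).get_result(&mut conn)?;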

We should make sure to have an index for it.

The alternative is an intermediate left join, but I wouldn't go there unless needed.

Collaborator

Not for this PR, but just as a thought. Separate (but similar) to the sql query, we may also want to consider minimizing the return message size by returning true/false for each commitment. This is under the assumption that we expect large note sets and that sometimes there will be many, and sometimes there will be few existing notes.

Though maybe this is premature and we should wait until this shows up as a hotspot. Options as I see them:

  1. Return existing notes
  2. Return non-existing notes
  3. Return flag for each
  4. Do any of the above with an enum to optimize dynamically based on whichever is smallest.

cc @bobbinth
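
As a rough illustration of the four options above, here is a minimal sketch in plain Rust. Everything in it is an assumption made for the example: the `NoteCheckResponse` enum, the `smallest_encoding` and `select` helpers, the 32-byte `Commitment` stand-in, and the size estimate (one bit per flag versus 32 bytes per listed commitment). It is not the node's actual response type.

```rust
/// Hypothetical stand-in for a note commitment (32 bytes).
type Commitment = [u8; 32];

/// Hypothetical response encodings matching options 1 to 3.
#[derive(Debug, PartialEq)]
enum NoteCheckResponse {
    /// Option 1: list the inputs that exist.
    Existing(Vec<Commitment>),
    /// Option 2: list the inputs that do not exist.
    Missing(Vec<Commitment>),
    /// Option 3: one flag per input, in input order.
    Flags(Vec<bool>),
}

/// Collects the inputs whose flag equals `wanted`.
fn select(inputs: &[Commitment], flags: &[bool], wanted: bool) -> Vec<Commitment> {
    inputs
        .iter()
        .zip(flags)
        .filter_map(|(&commitment, &flag)| (flag == wanted).then_some(commitment))
        .collect()
}

/// Option 4: pick whichever encoding has the smallest estimated size.
/// Flags are costed as bit-packed; listed commitments at their byte size.
fn smallest_encoding(inputs: &[Commitment], flags: &[bool]) -> NoteCheckResponse {
    assert_eq!(inputs.len(), flags.len(), "one flag per input");
    let entry_bytes = std::mem::size_of::<Commitment>();
    let existing = flags.iter().filter(|&&flag| flag).count();
    let missing = flags.len() - existing;

    let flags_cost = (flags.len() + 7) / 8;
    let existing_cost = existing * entry_bytes;
    let missing_cost = missing * entry_bytes;

    if flags_cost <= existing_cost && flags_cost <= missing_cost {
        NoteCheckResponse::Flags(flags.to_vec())
    } else if existing_cost <= missing_cost {
        NoteCheckResponse::Existing(select(inputs, flags, true))
    } else {
        NoteCheckResponse::Missing(select(inputs, flags, false))
    }
}
```

Under this cost model the flag encoding wins for most mixed inputs, and the list encodings only win when almost none or almost all of the inputs exist. Whether that matches real traffic is exactly the open question raised above.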

Contributor

@bobbinth bobbinth left a comment

Looks good! Thank you! I left a couple of comments inline.

> Not for this PR, but just as a thought. Separate (but similar) to the sql query, we may also want to consider minimizing the return message size by returning true/false for each commitment. This is under the assumption that we expect large note sets and that sometimes there will be many, and sometimes there will be few existing notes.
>
> Though maybe this is premature and we should wait until this shows up as a hotspot. Options as I see them:
>
>   1. Return existing notes
>   2. Return non-existing notes
>   3. Return flag for each
>   4. Do any of the above with an enum to optimize dynamically based on whichever is smallest.

I think it is a bit too early to work on optimizations like this - but let's create an issue for this (in my mind, this falls into a similar category as #1200).

Comment on lines 234 to 240
let commitments: HashSet<Word> = note_commitments
    .iter()
    .zip(&serialized)
    .filter_map(|(&word, serialized)| {
        existing_bytes.contains(serialized.as_slice()).then_some(word)
    })
    .collect();

Contributor

Question: why do we need to compare against serialized here? Wouldn't the data returned from the database already include only the commitments in the serialized vec? If so, then the only thing we'd need to do is map bytes to words.

Collaborator Author

This was mostly an attempt to address this comment: it avoids deserializing all the commitments again after having already serialized them once.

Contributor

I would probably hold off on making this optimization as it is not immediately clear to me that repeated searching through existing_bytes would be more performant than just deserializing them.
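
To make the trade-off concrete, the two approaches can be put side by side in a small, self-contained sketch. The `Word` alias, the little-endian `to_bytes`/`from_bytes` encoding, and the function names are all assumptions for illustration; they do not reproduce the real `Word` type or its serialization. The sketch shows only that both approaches produce the same set, not which one is faster.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `Word`: four 64-bit elements.
type Word = [u64; 4];

/// Hypothetical serialization: little-endian bytes of each element.
fn to_bytes(word: &Word) -> Vec<u8> {
    word.iter().flat_map(|element| element.to_le_bytes()).collect()
}

/// Inverse of `to_bytes`; returns `None` unless the input is exactly 32 bytes.
fn from_bytes(bytes: &[u8]) -> Option<Word> {
    if bytes.len() != 32 {
        return None;
    }
    let mut word = [0u64; 4];
    for (slot, chunk) in word.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        *slot = u64::from_le_bytes(buf);
    }
    Some(word)
}

/// Approach A: deserialize every row returned by the database.
fn by_deserializing(rows: &[Vec<u8>]) -> HashSet<Word> {
    rows.iter().filter_map(|row| from_bytes(row)).collect()
}

/// Approach B (the snippet under review): keep the original inputs whose
/// already-serialized form appears among the returned rows.
fn by_lookup(inputs: &[Word], serialized: &[Vec<u8>], rows: &[Vec<u8>]) -> HashSet<Word> {
    let existing: HashSet<&[u8]> = rows.iter().map(Vec::as_slice).collect();
    inputs
        .iter()
        .zip(serialized)
        .filter_map(|(&word, bytes)| existing.contains(bytes.as_slice()).then_some(word))
        .collect()
}
```

Approach B trades one deserialization per returned row for one hash lookup per input, so which is cheaper depends on the relative cost of hashing 32 bytes versus decoding them and on how many inputs actually exist. That would need a benchmark rather than a guess.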

Contributor

@bobbinth bobbinth left a comment

All looks good! Thank you!

@bobbinth bobbinth merged commit e693d57 into next Nov 12, 2025
6 checks passed
@bobbinth bobbinth deleted the igamigo-optimize-query branch November 12, 2025 07:53

Labels

no changelog This PR does not require an entry in the `CHANGELOG.md` file


4 participants