Conversation
Mirko-von-Leipzig
left a comment
There was a problem hiding this comment.
LGTM, with a potential impl optimization
There was a problem hiding this comment.
Would it be possible to check for existence, and then retain the commitment from the inputs instead of re-deserializing it?
I'm unsure how to do this with Diesel, but in raw sqlite I would have done a loop with exists to filter the input (in Rust) and build the output hashset.
There was a problem hiding this comment.
Not quite; I'm hoping for the database to return a bool per input indicating its existence within the table. @drahnr maybe knows how?
There was a problem hiding this comment.
I am unsure where this pointed to in the past, both the link and the comment attachment seem to not exist anymore
There was a problem hiding this comment.
I believe the comment was for the file, and the link does exist, it's a link to a commit I've just reverted. Basically @Mirko-von-Leipzig is referring to this section of the code, where the input is serialized and then deserialized back to its original form and he's suggesting it could be optimized a bit.
There was a problem hiding this comment.
@drahnr as a summary:
We want to know which of the given input set of note commitments exist in the DB table. This is currently implemented by returning the union of the notes and the input set. This returns the full data for each commitment, which we already have in the input set - and we only want to test for existence actually.
I'm proposing we instead return a bool for each input if it exists in the table.
In raw sql terms:
-- current
SELECT commitment FROM notes WHERE commitment IN ?;
-- proposed (but maybe there is a better way for multiple rows?)
SELECT EXISTS (SELECT 1 FROM notes where commitment = ?);There was a problem hiding this comment.
The diesel query would look like this:
let sub_query = schema::notes::table
.select(0.into_sql::<diesel::sql_types::Integer>())
.filter(schema::notes::commitment.eq(desired_commitment));
let result = schema::notes::table.exists(diesel::dsl::exists(sub_query)).get_results::<bool>(&mut conn)?;
We should make sure to have an index for it.
The alt is using an intermediate left join, but I'd not go there unless needed.
There was a problem hiding this comment.
Not for this PR, but just as a thought. Separate (but similar) to the sql query, we may also want to consider minimizing the return message size by returning true/false for each commitment. This is under the assumption that we expect large note sets and that sometimes there will be many, and sometimes there will be few existing notes.
Though maybe this is premature and we should wait until this shows up as a hotspot. Options as I see them:
- Return existing notes
- Return non-existing notes
- Return flag for each
- Do any of the above with an enum to optimize dynamically based on whichever is smallest.
cc @bobbinth
bobbinth
left a comment
There was a problem hiding this comment.
Looks good! Thank you! I left a couple of comments inline.
Not for this PR, but just as a thought. Separate (but similar) to the sql query, we may also want to consider minimizing the return message size by returning
true/falsefor each commitment. This is under the assumption that we expect large note sets and that sometimes there will be many, and sometimes there will be few existing notes.Though maybe this is premature and we should wait until this shows up as a hotspot. Options as I see them:
- Return existing notes
- Return non-existing notes
- Return flag for each
- Do any of the above with an enum to optimize dynamically based on whichever is smallest.
I think it is a bit too early to work on optimizations like this - but let's create an issue for this (in my mind, this falls into a similar category as #1200).
| let commitments: HashSet<Word> = note_commitments | ||
| .iter() | ||
| .zip(&serialized) | ||
| .filter_map(|(&word, serialized)| { | ||
| existing_bytes.contains(serialized.as_slice()).then_some(word) | ||
| }) | ||
| .collect(); |
There was a problem hiding this comment.
Question: why do we need to compare against serialized here? Wouldn't the data returned from the database already include only the commitments in the serialized vec? If so, then the only thing we'd need to do is map bytes to words.
There was a problem hiding this comment.
Was mostly an attempt to tackle this comment, where we avoid deserializing all the commitments again (after having serialized them once already)
There was a problem hiding this comment.
I would probably hold off on making this optimization as it is not immediately clear to me that repeated searching through existing_bytes would be more performant than just deserializing them.
This reverts commit a66baa4.
bobbinth
left a comment
There was a problem hiding this comment.
All looks good! Thank you!
Addresses #1338 (comment), mainly with the motivation to avoid getting and deserializing more data than needed from the DB.