bump curve25519-dalek from 3.2.1 to 4.1.3 #1693
Conversation
@samkim-crypto do you have any idea why https://github.com/solana-labs/solana-program-library/blob/646efc13449fa6e1a8f69d767283478d2a0002e4/token/cli/tests/command.rs#L2756-L2770 couldn't pass with this patch 🤔

This looks suspicious: https://github.com/anza-xyz/agave/pull/1693/files#diff-d660b342bff7546a265b36794627b9a07221e9356049a4848418ab7c696c30df If the generators change, a proof generated with old code won't verify with new code.

Hm, I did check this specifically. I believe it is the same Sha3 generators, just with different syntax. The tests seem to work successfully on my local machine 🤔 for some reason, but let me dig into it a bit more.
```diff
 let payer = &context.payer;
 let recent_blockhash = context.last_blockhash;

 // verify a valid proof (without creating a context account)
 let instructions = vec![proof_instruction.encode_verify_proof(None, success_proof_data)];
 let transaction = Transaction::new_signed_with_payer(
     &instructions.with_max_compute_unit_limit(),
     Some(&payer.pubkey()),
     &[payer],
-    recent_blockhash,
+    client.get_latest_blockhash().await.unwrap(),
```
minor
While this is just tests, why request the blockhash over and over again?
Considering that the current version is already caching it, it seems easier to just update the line that was requesting the blockhash in the first place.
Suggested change:

```diff
 let payer = &context.payer;
-let recent_blockhash = context.last_blockhash;
+let recent_blockhash = client.get_latest_blockhash().await.unwrap();
 // verify a valid proof (without creating a context account)
 let instructions = vec![proof_instruction.encode_verify_proof(None, success_proof_data)];
 let transaction = Transaction::new_signed_with_payer(
     &instructions.with_max_compute_unit_limit(),
     Some(&payer.pubkey()),
     &[payer],
     recent_blockhash,
-    client.get_latest_blockhash().await.unwrap(),
```
it hit https://buildkite.com/anza/agave/builds/5647#018ff290-ded1-4ce0-a248-ef109173cc6a/74-6193
the line number changed but it is this func:
Lines 3146 to 3153 in da36ce7
```rust
pub fn get_blockhash_last_valid_block_height(&self, blockhash: &Hash) -> Option<Slot> {
    let blockhash_queue = self.blockhash_queue.read().unwrap();
    // This calculation will need to be updated to consider epoch boundaries if BlockhashQueue
    // length is made variable by epoch
    blockhash_queue
        .get_hash_age(blockhash)
        .map(|age| self.block_height + MAX_PROCESSING_AGE as u64 - age)
}
```
I found the age is increasing more than before, so it failed. You should be able to reproduce the error by:
- cloning this PR
- reverting the blockhash change
- running `cargo test -p solana-zk-token-proof-program-tests -- --exact test_withdraw`
also I think it's harmless, and even better, to always create a tx with the latest blockhash 🤔
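The expiry arithmetic in the quoted function can be sketched with illustrative numbers. The constant value matches Solana's `MAX_PROCESSING_AGE`; the block heights and the age-gating here are simplified stand-ins for what `BlockhashQueue` actually tracks, not the real implementation:

```rust
// Sketch of the last-valid-height calculation from the quoted function.
// MAX_PROCESSING_AGE is 150 in the Solana runtime; heights are illustrative.
const MAX_PROCESSING_AGE: u64 = 150;

fn last_valid_block_height(block_height: u64, age: u64) -> Option<u64> {
    // `age` would come from BlockhashQueue::get_hash_age; None stands in for
    // a blockhash that has fallen out of the queue entirely.
    (age <= MAX_PROCESSING_AGE).then(|| block_height + MAX_PROCESSING_AGE - age)
}

fn main() {
    // A freshly issued blockhash (age 0) is valid for the full window.
    assert_eq!(last_valid_block_height(1_000, 0), Some(1_150));
    // A cached, aging blockhash expires much sooner, which is why slower
    // proof generation can push a cached hash past its validity window.
    assert_eq!(last_valid_block_height(1_000, 140), Some(1_010));
}
```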
Let's move it into a separate PR then.
With a proper explanation as to why this change is being made.
sdk/program/src/pubkey.rs (Outdated)
```diff
 curve25519_dalek::edwards::CompressedEdwardsY::from_slice(_bytes.as_ref())
-    .decompress()
-    .is_some()
+    .map(|compressed_edwards_y| compressed_edwards_y.decompress().is_some())
+    .unwrap_or(false)
```
While it is more verbose, I find the following easier to follow:
Suggested change:

```rust
use curve25519_dalek::edwards::CompressedEdwardsY;
let bytes = _bytes.as_ref();
let Ok(compressed_edwards_y) = CompressedEdwardsY::from_slice(bytes) else {
    return false;
};
compressed_edwards_y.decompress().is_some()
```
In particular, the original version was putting a `false` result into the `Ok` part of the `Result` value. Which was really an indication of a failure. It is harder to make sure that it was indeed correct, I think.
what do you mean "putting a `false` result into the `Ok` part of the `Result` value"? afaik the original version of the code will only return `true` or `false`.
If you follow the logic of the expression, you'll see the following:

`CompressedEdwardsY::from_slice(_bytes.as_ref())` produces `Result<CompressedEdwardsY, Error>` - here the "good" value is in the `Ok()` part of the result, and the error is in `Err()`. This is expected.

But after `.map(|compressed_edwards_y| compressed_edwards_y.decompress().is_some())` you get `Result<bool, Error>`. Where `Ok(true)` indicates a successful decompression, while `Ok(false)` indicates a decompression failure. With `Err(_)` still holding the parsing error.

This is somewhat confusing, as the failed decompression is represented by an `Ok(_)` value, even while we still have an `Err(_)`.

Finally, `.unwrap_or(false)` will combine the parsing error and the decompression failure into a single `false`.

The whole thing works. But when I try to follow it, it just feels like following a puzzle, rather than reading something that is supposed to explain the intent of the original author.
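The shape of this argument can be reproduced with plain stdlib types. Here `str::parse` is a stand-in for `from_slice` and a sign check stands in for `decompress`; both stand-ins are hypothetical, not the real curve25519 API:

```rust
// Chained style: a failed "decompression" becomes Ok(false), sitting next
// to Err(_) which also means failure; two encodings of one outcome.
fn is_valid_chained(s: &str) -> bool {
    s.parse::<i32>()          // Result<i32, ParseIntError>
        .map(|n| n >= 0)      // Result<bool, ParseIntError>
        .unwrap_or(false)     // Err(_) and Ok(false) both collapse to false
}

// let-else style: each failure exits immediately, so `false` never rides
// inside an Ok(_).
fn is_valid_let_else(s: &str) -> bool {
    let Ok(n) = s.parse::<i32>() else {
        return false;
    };
    n >= 0
}

fn main() {
    // Both styles agree on every input; they differ only in readability.
    for s in ["42", "-1", "not a number"] {
        assert_eq!(is_valid_chained(s), is_valid_let_else(s));
    }
}
```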
okay! thank you!
```diff
 fn validate_point(&self) -> bool {
     CompressedEdwardsY::from_slice(&self.0)
-        .decompress()
-        .is_some()
+        .map(|compressed_edwards_y| compressed_edwards_y.decompress().is_some())
+        .unwrap_or(false)
```
Similar argument here. I think using `Ok(false)` is a bit misleading.
Suggested change:

```rust
fn validate_point(&self) -> bool {
    let Ok(compressed_edwards_y) = CompressedEdwardsY::from_slice(&self.0) else {
        return false;
    };
    compressed_edwards_y.decompress().is_some()
```
In another case below, the same conversion was done like this:

```rust
fn validate_point(&self) -> bool {
    CompressedRistretto::from_slice(&self.0)
        .ok()
        .and_then(|compressed_ristretto| compressed_ristretto.decompress())
        .is_some()
}
```
While I find it a bit harder to actually verify a version that uses a chain of `Option`s, I think it is an OK alternative from the readability standpoint. But it is better to use the same style everywhere. So either use `let / else` everywhere, or `.ok()/.and_then()/.is_some()` everywhere.
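The two candidate styles can be modeled side by side with stdlib types: `str::parse` stands in for `from_slice` (a `Result`), and `checked_sub` stands in for `decompress` (an `Option`). These stand-ins are illustrative only:

```rust
// Chained style: convert the Result to an Option, flat-map the second
// fallible step, then test for presence.
fn validate_chained(s: &str) -> bool {
    s.parse::<u32>()
        .ok()                            // Result<u32, _> -> Option<u32>
        .and_then(|n| n.checked_sub(1))  // second fallible step
        .is_some()
}

// let-else style: bail out on the first failure, keep the happy path flat.
fn validate_let_else(s: &str) -> bool {
    let Ok(n) = s.parse::<u32>() else {
        return false;
    };
    n.checked_sub(1).is_some()
}

fn main() {
    // "5" passes both steps; "0" parses but underflows; "x" fails to parse.
    for s in ["5", "0", "x"] {
        assert_eq!(validate_chained(s), validate_let_else(s));
    }
}
```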
do we have any preference between `let / else` and `.ok()/.and_then()/.is_some()`? I think I tried to do the latter in this PR.
Hm, I think both syntaxes have pros and cons. I personally preferred `.ok()/.and_then()/.is_some()` since it is more concise, but @ilya-bobyr's point about `let / else` being more readable is also quite true, since there is no point in having `Ok(false)` when we can just return `false` earlier in the logic.
I find `.ok()/.and_then()/.is_some()` chains harder to follow.

Just recently, we had an issue when `ok_or_else()` was used instead of `unwrap_or_else()` and it went unnoticed: #1689

I was suggesting we rewrite it as an `if/else`, and even during a discussion of the very same block nobody noticed that the original logic is incorrect: #1301
okay. I will try to refactor all logic in this PR to the `let / else` style 🫡
```diff
 fn validate_point(&self) -> bool {
     CompressedRistretto::from_slice(&self.0)
-        .decompress()
+        .ok()
+        .and_then(|compressed_ristretto| compressed_ristretto.decompress())
         .is_some()
 }
```
Currently, this conversion is inconsistent with the other two that perform an identical transformation.
Please choose one style and use it everywhere.
oh. I think now I understand what you're referring to. I will check the code again.
```diff
 Ok(bytes) => Ok(ElGamalSecretKey::from(
-    Scalar::from_canonical_bytes(bytes)
+    Option::<Scalar>::from(Scalar::from_canonical_bytes(bytes))
         .ok_or(ElGamalError::SecretKeyDeserialization)?,
```
minor
Unrelated to this PR, as it is existing code, but this conversion might also be written as:
Suggested change:

```rust
Ok(bytes) => Option::from(Scalar::from_canonical_bytes(bytes))
    .ok_or(ElGamalError::SecretKeyDeserialization)
    .map(ElGamalSecretKey::from),
```
It would be a bit more aligned with how similar conversions are written elsewhere in this codebase.
okay. I will check the code again
ensure using the latest blockhash to process tx. We see tests failing in CI due to the blockhash being invalid. Details: #1693 (comment)
perf/src/sigverify.rs (Outdated)
```rust
let compressed_edwards_y = CompressedEdwardsY::from_slice(&input);
assert!(compressed_edwards_y.is_ok());
```
`unwrap()` is better than an `assert!()` with no message. `assert!()` with no message will print very little context if it fails. In particular, it will not print the error that caused the failure.
Suggested change:

```rust
let compressed_edwards_y = CompressedEdwardsY::from_slice(&input).unwrap();
```
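The difference in panic context can be demonstrated with a stdlib-only sketch; the error string below is a hypothetical stand-in, not the real `from_slice` error:

```rust
use std::panic;

/// Returns the panic message produced by calling unwrap() on the given Err.
fn unwrap_panic_message(result: Result<i32, String>) -> String {
    // Silence the default hook so the caught panic does not spam stderr.
    panic::set_hook(Box::new(|_| {}));
    let payload = panic::catch_unwind(|| {
        result.unwrap();
    })
    .expect_err("unwrap on Err must panic");
    let _ = panic::take_hook();
    // unwrap()'s panic payload is a formatted String holding the Err value.
    *payload.downcast::<String>().unwrap()
}

fn main() {
    // assert!(result.is_ok()) would only print "assertion failed: result.is_ok()".
    // unwrap() embeds the actual error in the panic message:
    let message = unwrap_panic_message(Err("slice has wrong length".to_string()));
    assert!(message.contains("Result::unwrap()"));
    assert!(message.contains("slice has wrong length"));
}
```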
perf/src/sigverify.rs (Outdated)
```rust
let compressed_edwards_y = compressed_edwards_y.unwrap();
if let Some(ref_element) = compressed_edwards_y.decompress() {
```
minor
I think it is OK to combine `unwrap()` with the `decompress()` call. The extra line does not seem to make it any clearer:
Suggested change:

```rust
if let Some(ref_element) = compressed_edwards_y.unwrap().decompress() {
```
zk-sdk/src/encryption/elgamal.rs (Outdated)
```rust
let Ok(compressed_ristretto) = CompressedRistretto::from_slice(bytes) else {
    return Err(ElGamalError::PubkeyDeserialization);
};

let Some(ristretto_point) = compressed_ristretto.decompress() else {
    return Err(ElGamalError::PubkeyDeserialization);
};

Ok(ElGamalPubkey(ristretto_point))
```
It is common to use `ok_or()` or `map_err()` to add error reporting. It was just confusing to put an error into the `Ok()` part of a `Result`. So, I think, it is fine to write it like this, and there is no need to destructure the `decompress()` result explicitly:
Suggested change:

```rust
let Ok(compressed_ristretto) = CompressedRistretto::from_slice(bytes) else {
    return Err(ElGamalError::PubkeyDeserialization);
};
compressed_ristretto
    .decompress()
    .ok_or(ElGamalError::PubkeyDeserialization)
    .map(ElGamalPubkey)
```
zk-sdk/src/encryption/elgamal.rs (Outdated)
```rust
let Ok(compressed_ristretto) = CompressedRistretto::from_slice(bytes) else {
    return None;
};

let ristretto_point = compressed_ristretto.decompress()?;

Some(DecryptHandle(ristretto_point))
```
Same here, and elsewhere - `ok_or()` and other helper functions are fine when used sensibly. People who write Rust should be able to read the following with no problem; it is a very common pattern:
Suggested change:

```rust
let Ok(compressed_ristretto) = CompressedRistretto::from_slice(bytes) else {
    return None;
};
compressed_ristretto.decompress().map(DecryptHandle)
```
zk-sdk/src/sigma_proofs/mod.rs (Outdated)
```rust
let Some(slice) = optional_slice else {
    return Err(SigmaProofVerificationError::Deserialization);
};

if slice.len() != RISTRETTO_POINT_LEN {
    return Err(SigmaProofVerificationError::Deserialization);
}

let Ok(compressed_ristretto) = CompressedRistretto::from_slice(slice) else {
    return Err(SigmaProofVerificationError::Deserialization);
};

Ok(compressed_ristretto)
```
An explicit `len()` check, I think, is improving readability. But the last step can be shorter and still be readable:
Suggested change:

```rust
let Some(slice) = optional_slice else {
    return Err(SigmaProofVerificationError::Deserialization);
};
if slice.len() != RISTRETTO_POINT_LEN {
    return Err(SigmaProofVerificationError::Deserialization);
}
CompressedRistretto::from_slice(slice)
    .map_err(|_| SigmaProofVerificationError::Deserialization)
```
Note that it is considered suboptimal to combine changes that are unrelated in a single commit.
This PR is already pretty big, and it is about adjusting the code base due to the API changes. Not about refactoring to make this code more readable.
While it is reasonable to fix locations that become more complex due to the API updates, we should probably keep other changes to a minimum.
And if there is a desire to refactor something, it would be much better to do it in a separate change.
Thank you for making all the changes.
I think it is a bit easier to follow those conversions now.
this upgrade is significantly increasing the execution time for some txs, but it appears to be necessary for improved security. Will ignore some spl tests temporarily for this PR. Below is a before/after comparison.

before: (benchmark screenshot not captured)

after: (benchmark screenshot not captured)
@yihau any way you can get some data about where performance regressed? Flame graphs or other measurements? Also, what versions are you noticing the regression between? 3.x and 4.x? Or between two different 4.x versions?
@tarcieri thank you for asking! I will check this one and post more details here! (btw, the regression I found is between 3.x and 4.x)

updated: in fact, I found
Any update on when this will be merged?

I will check this on our edge node to see if it affects the chain.
Sorry to ask this question this late, but doesn't this break our backward compatibility now? Updating a major version of a dependency that is publicly exposed needs to cause a major version bump in the library. So this should only be possible in the next major version.

I also do not see a confirmation from @joncinque that he is OK with the downstream tests being patched as part of the verification run. I did not expect this to merge without Jon's approval, to be honest, so I was not in a rush to ask.
yes, it does. I thought we were planning to backport this one into 2.0.

I'm just thinking that those spl tests were created by Sam and he is aware of this issue, so we should be able to merge this one and debug later. I tried to do some benchmarking and confirmed with Sam about the curve25519 syscall; all of them show that the performance has increased. I think the only thing affected is the zk feature, which we haven't published on any chain. Looks like we have concerns about this one, so I just sent a revert PR to gather more consensus: #2055
If there's a performance problem, we can take a look. Where is the code?
We cannot backport an API-breaking change if we want to follow the semver rules.
My concern is not the performance, but the backward incompatible changes. If the SDK exports any of the types from `curve25519-dalek`, updating it is a breaking change. If it turns out that none of the types are exposed in the SDK API, or all of the exposed types are in the "unreleased" surface, it should be fine.

P.S. I have no experience using either of them. But I've seen both mentioned in a few articles.
Wait, so does this mean this will only be bumped once 3.0 is released? Is there an ETA for this? 😅
@tarcieri thank you for asking! I guess we found the culprit. It looks like the auto-detected backend is what is affecting performance. I will do more tests to verify it later. Very much appreciated that you're here.
thanks so much for all the hard work here. This PR solves a ton of issues for us; let me know if I can be of any help.
Hi folks, is there an ETA? Our crates still don't compile with the latest master:
Unfortunately, the upgrade to 4.1.3 is a tricky problem, as it might not be backward compatible.
While an upgrade will solve your problem, the issue is actually with the `curve25519-dalek` dependency itself. One workaround is described in the SDK:

```toml
[patch.crates-io.curve25519-dalek]
git = "https://github.com/anza-xyz/curve25519-dalek.git"
rev = "b500cdc2a920cd5bff9e2dd974d7b97349d61464"
```
Sorry for the late response here. I think we've narrowed down the performance issue to the SIMD backend for curve25519-dalek performing poorly in a test-validator setting. I'll let @samkim-crypto confirm. We should not merge this change until we have adequate performance for the downstream tests, since we do want to enable the syscalls with 2.0. Considering this has been long-awaited, even though the change breaks semver, I think the benefits outweigh the potential harms, since 2.0 adoption is still new, and we've been backporting breaking changes to 2.0 in the meantime. So to summarize, I would like to see the performance issue resolved, and then we can also backport to 2.0. Please let me know if there are any specific concerns about the breaking change or otherwise.
Sorry, I am just getting back to investigating the slow-down with the dalek-v4 upgrade, but from what I have found:

If we add the following in

@tarcieri, I just wanted to double check with you 🙏 :
@samkim-crypto what opt-level are you using for debug builds? I would probably suggest increasing that first if you haven’t already |
@tarcieri Thanks! The opt-level seems to have been the issue. With
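Cargo supports raising the optimization level for individual dependencies in debug and test builds via per-package profile overrides. The exact setting used here was not posted in the thread, so the following is only a hypothetical sketch of that mechanism:

```toml
# Hypothetical sketch: optimize only curve25519-dalek in dev/test builds,
# leaving the rest of the workspace fast to compile. Placed in the
# workspace-root Cargo.toml.
[profile.dev.package.curve25519-dalek]
opt-level = 3
```

Because the scalar and point arithmetic in curve25519-dalek is extremely hot, an unoptimized build of just this one crate can dominate test-validator runtimes, which matches the slow-down described above.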
Why is this marked as merged when it isn't done? Will it be solved soon? |
…3335) CHANGELOG: Add entry for breaking change to curve25519-dalek

Problem: #1693 introduced a breaking change between v2.0 and v2.1 of the Solana crates by bumping a dependency to a new major version, but it isn't reflected in the changelog.

Summary of changes: Add a line in the changelog about the breaking change.