Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NCC-E005955-DBV] zebra-chain: Unbounded Rejection Sampling with Possibility of Panics #6338

Closed
Tracked by #6277
mpguerra opened this issue Mar 16, 2023 · 8 comments · Fixed by #6385
Closed
Tracked by #6277
Assignees
Labels
C-audit Category: Issues arising from audit findings C-cleanup Category: This is a cleanup C-security Category: Security issues I-panic Zebra panics with an internal error message S-needs-triage Status: A bug report needs triage

Comments

@mpguerra
Copy link
Contributor

mpguerra commented Mar 16, 2023

Impact

The zebra-chain module implements random number generation for some types which are primarily used for testing, without guarding against possible panics.

Description

The use of fill_bytes() in the code snippet below (from zebra-chain/src/sapling/keys.rs) can potentially cause panics. It is recommended to use try_fill_bytes() to catch the random number generator’s failures1.

In addition, the number of sampling retries by the loop has to be bounded.

impl Diversifier {
/// Generate a new _Diversifier_ that has already been confirmed
/// as a preimage to a valid diversified base point when used to
/// derive a diversified payment address.
///
/// <https://zips.z.cash/protocol/protocol.pdf#saplingkeycomponents>
/// <https://zips.z.cash/protocol/protocol.pdf#concretediversifyhash>
pub fn new<T>(csprng: &mut T) -> Self
where
T: RngCore + CryptoRng,
{
loop {
let mut bytes = [0u8; 11];
csprng.fill_bytes(&mut bytes);
if diversify_hash(bytes).is_some() {
break Self(bytes);
}
}
}
}

There is one instance where this can potentially be problematic for a full node with mining support. Generating a random Rho for an Orchard note uses the same unchecked RNG API.

A node with mining capabilities might use this API to generate a dummy Orchard input note as a nullifier for the output note when creating a miner’s reward note:

impl Rho {
pub fn new<T>(csprng: &mut T) -> Self
where
T: RngCore + CryptoRng,
{
let mut bytes = [0u8; 32];
csprng.fill_bytes(&mut bytes);
Self(extract_p(pallas::Point::from_bytes(&bytes).unwrap()))
}
}


  1. For instance OsRng can, in rare occasions, fail on some platforms, and thread_rng might return a
    warning if it fails to reseed itself. See https://rust-random.github.io/book/guide-err.html for more
    details.

Recommendation

Consider replacing the fill_bytes() usage with try_fill_bytes() and properly handle its possible failure.

Location

@mpguerra mpguerra added P-Medium ⚡ I-panic Zebra panics with an internal error message C-audit Category: Issues arising from audit findings S-needs-triage Status: A bug report needs triage C-cleanup Category: This is a cleanup labels Mar 16, 2023
@mpguerra
Copy link
Contributor Author

@oxarbitrage oxarbitrage self-assigned this Mar 20, 2023
@teor2345 teor2345 added C-testing Category: These are tests C-security Category: Security issues and removed C-testing Category: These are tests labels Mar 22, 2023
@teor2345
Copy link
Contributor

This doesn't block starting work on Orchard shielded coinbase mining, but it does block running on mainnet.

@daira
Copy link
Contributor

daira commented Apr 19, 2023

The documentation of when fill_bytes can panic (which is documented to be exactly when try_fill_bytes can return an error) does not convince me that zebra shouldn't actually panic in this situation:

This method should guarantee that dest is entirely filled with new data, and may panic if this is impossible (e.g. reading past the end of a file that is being used as the source of randomness).

We shouldn't read a file to get randomness, or anything similarly unreliable, for a cryptographic RNG. Note that this method is defined on RngCore, and the design of the rand crate makes CryptoRng just a marker trait, so it can't change the API. But it's likely that the design of RngCore was intended to allow a wide variety of possible sources of randomness that are not necessarily cryptographic, and that informed the error handling in ways that are less than ideal for a cryptographic RNG.

The Zcash protocols assume that it is always possible to get randomness. If an RNG returns an error, then either:

  • The RNG is always going to fail for the rest of this process execution. In that case the process can't reasonably continue, since it's going to need randomness very frequently and there aren't many useful things it can do without it. Since zebra is designed to tolerate crashes, the best thing to do is to crash immediately.
  • The failure is intermittent. In that case, how do we know that it isn't correlated with the output that the RNG would have returned? If it is, then that potentially biases future output. So again zebra should crash, in order to minimize the damage and make sure that the user knows something is seriously wrong.
  • The failure could be a symptom of something else, like memory corruption / undefined behaviour. Again zebra should crash to minimize possible damage.

@dconnolly
Copy link
Contributor

dconnolly commented Apr 20, 2023

The documentation of when fill_bytes can panic (which is documented to be exactly when try_fill_bytes can return an error) does not convince me that zebra shouldn't actually panic in this situation:

This method should guarantee that dest is entirely filled with new data, and may panic if this is impossible (e.g. reading past the end of a file that is being used as the source of randomness).

We shouldn't read a file to get randomness, or anything similarly unreliable, for a cryptographic RNG. Note that this method is defined on RngCore, and the design of the rand crate makes CryptoRng just a marker trait, so it can't change the API. But it's likely that the design of RngCore was intended to allow a wide variety of possible sources of randomness that are not necessarily cryptographic, and that informed the error handling in ways that are less than ideal for a cryptographic RNG.

The Zcash protocols assume that it is always possible to get randomness. If an RNG returns an error, then either:

  • The RNG is always going to fail for the rest of this process execution. In that case the process can't reasonably continue, since it's going to need randomness very frequently and there aren't many useful things it can do without it. Since zebra is designed to tolerate crashes, the best thing to do is to crash immediately.
  • The failure is intermittent. In that case, how do we know that it isn't correlated with the output that the RNG would have returned? If it is, then that potentially biases future output. So again zebra should crash, in order to minimize the damage and make sure that the user knows something is seriously wrong.
  • The failure could be a symptom of something else, like memory corruption / undefined behaviour. Again zebra should crash to minimize possible damage.

All of these sources of randomness are happening in a library, which in general terms should not panic, but raise an error. The consumer of the library should determine their behavior on catching such an error, such as panic, for example if they are a wallet. We don't have a wallet yet. If zebrad is generating Rho's for Orchard notes for mining, or generating note commitments, we should catch the error at the application level and panic, but I don't think we are doing that yet: we are generating block templates that miners use. For shielded coinbase (Sapling and Orchard), we (zebrad / zebra-consensus, the consumer of zebra-chain) will catch any of these errors and then panic.

@mpguerra
Copy link
Contributor Author

Can we document this in the code for future use cases?

@mergify mergify bot closed this as completed in #6385 Apr 20, 2023
@daira
Copy link
Contributor

daira commented Apr 23, 2023

All of these sources of randomness are happening in a library, which in general terms should not panic, but raise an error. The consumer of the library should determine their behavior on catching such an error, such as panic, for example if they are a wallet. We don't have a wallet yet. If zebrad is generating Rho's for Orchard notes for mining, or generating note commitments, we should catch the error at the application level and panic, but I don't think we are doing that yet: we are generating block templates that miners use. For shielded coinbase (Sapling and Orchard), we (zebrad / zebra-consensus, the consumer of zebra-chain) will catch any of these errors and then panic.

I disagree, and I've told you my considered professional opinion as a cryptographic engineer. The auditors are wrong, and a panic is the appropriate behaviour. It does not matter whether the panic happens at the library or application layer, except that the latter has slightly greater risk of the panic not actually happening, or of there being negative consequences of undefined behaviour if the failure is due to that.

(If zebra didn't follow crash-only design, it might be appropriate to do a controlled shutdown that flushes data to disk for example. But it does, so that argument doesn't apply.)

@teor2345
Copy link
Contributor

(If zebra didn't follow crash-only design, it might be appropriate to do a controlled shutdown that flushes data to disk for example. But it does, so that argument doesn't apply.)

At this point a controlled shutdown isn't required in Zebra, because we commit each verified block to disk in a database transaction. So either the transaction commits and the database is valid with the new block, or it is rolled back on restart and it's valid with the old block. Losing a block doesn't matter, because we drop the last 100 blocks in memory every time we restart anyway.

This would be different if we had a wallet, or any other kind of non-transactional persistent data. In that case, we'd probably want to do an atomic switch to the new version of the file (or directory):
https://users.rust-lang.org/t/how-to-write-replace-files-atomically/42821
https://docs.rs/atomicwrites/latest/atomicwrites/struct.AtomicFile.html#method.new

@teor2345
Copy link
Contributor

(This is just for info, it's not related to the specific decision about cryptographic panics.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-audit Category: Issues arising from audit findings C-cleanup Category: This is a cleanup C-security Category: Security issues I-panic Zebra panics with an internal error message S-needs-triage Status: A bug report needs triage
Projects
Archived in project
5 participants