Partial interpolation #61
Implements barycentric Lagrange interpolation. Uses algorithm (3.1) from the paper "Polynomial Interpolation: Lagrange vs. Newton" by Wilhelm Werner to find the barycentric weights, and then evaluates at `Gf256::zero()` using the second or "true" form of the barycentric interpolation formula.

I also earlier implemented a variant of this algorithm, Algorithm 2 from "A new efficient algorithm for polynomial interpolation," which uses fewer total operations than Werner's version. However, because it uses many more multiplications or divisions (depending on how you choose to write it), it runs slower given the relative running times of subtraction/addition (equal), multiplication, and especially division in the `Gf256` module.

The new algorithm takes n^2 / 2 divisions and n^2 subtractions to calculate the barycentric weights, and another n divisions, n multiplications, and 2n additions to evaluate the polynomial.* The old algorithm runs in n^2 - n divisions, n^2 multiplications, and n^2 subtractions. Without knowing the exact running time of each of these operations we can't say for sure, but a good guess is that the new algorithm trends toward about 1/3 of the running time as n -> infinity. It's also easy to see theoretically that for small n the original Lagrange algorithm is faster. This is backed up by benchmarks, which show that for n >= 5 the new algorithm is faster, which is more or less what we should expect given the running times in n of these algorithms. To ensure we always run the faster algorithm, I've kept both versions and only use the new one when 5 or more points are given.

Previously the tests in the lagrange module were allowed to pass nodes with x = 0 to the interpolation algorithms. Genuine shares will never be evaluated at x = 0, since then they would just be the secret, so:

1. Nodes in tests now start at x = 1, just as `scheme::secret_share` deals them out.
2. I have added assert statements to reinforce this fact and guard against division-by-0 panics.

This meant getting rid of the `evaluate_at_works` test, but `interpolate_evaluate_at_0_eq_evaluate_at` provides a similar test.

Further work will include the use of barycentric weights in the `interpolate` function.

A couple more interesting things to note about barycentric weights:

* Barycentric weights can be partially computed if fewer than threshold shares are present. When additional shares come in, computation can resume with no penalty to the total runtime.
* They can be determined totally independently of the y values of our points and the x value we want to evaluate for. We only need to know the x values of our interpolation points.
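For a sense of the computation, here is a minimal, self-contained sketch of the two steps (weights built up node by node, then the second-form evaluation at x = 0). It is written over `f64` purely for readability, and the helper names are mine; the PR performs the same arithmetic over `Gf256`.

```rust
// Barycentric weights, adding one node at a time: each new node x_k divides
// the existing weights by (x_j - x_k) and gets w_k = prod_j 1/(x_k - x_j).
fn barycentric_weights(xs: &[f64]) -> Vec<f64> {
    let mut w: Vec<f64> = Vec::with_capacity(xs.len());
    for (k, &xk) in xs.iter().enumerate() {
        let mut wk = 1.0;
        for j in 0..k {
            w[j] /= xs[j] - xk; // update the old weights against the new node
            wk /= xk - xs[j];   // accumulate the new node's weight
        }
        w.push(wk);
    }
    w
}

// Second ("true") form of the barycentric formula, evaluated at x = 0.
// Assumes no interpolation node sits at x = 0 (shares never do).
fn evaluate_at_zero(xs: &[f64], ys: &[f64], w: &[f64]) -> f64 {
    let (mut num, mut den) = (0.0, 0.0);
    for ((&x, &y), &wk) in xs.iter().zip(ys).zip(w) {
        let t = wk / (0.0 - x);
        num += t * y;
        den += t;
    }
    num / den
}

fn main() {
    // p(x) = 3x^2 + 2x + 7 sampled at x = 1, 2, 3; the "secret" is p(0) = 7.
    let xs = [1.0, 2.0, 3.0];
    let ys: Vec<f64> = xs.iter().map(|&x| 3.0 * x * x + 2.0 * x + 7.0).collect();
    let w = barycentric_weights(&xs);
    println!("p(0) = {}", evaluate_at_zero(&xs, &ys, &w)); // ≈ 7
}
```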
While this is a slight regression in performance in the case where k < 5, in absolute terms it is small enough to be negligible.
Horner's method is an algorithm for evaluating polynomials that transforms the monomial form into a computationally efficient form. It is pretty easy to understand: https://en.wikipedia.org/wiki/Horner%27s_method#Description_of_the_algorithm

This implementation has resulted in a noticeable secret-share generation speedup, as the RustySecrets benchmarks show, especially when calculating larger polynomials.

Before:

test sss::generate_1kb_10_25 ... bench: 3,104,391 ns/iter (+/- 113,824)
test sss::generate_1kb_3_5 ... bench: 951,807 ns/iter (+/- 41,067)

After:

test sss::generate_1kb_10_25 ... bench: 2,071,655 ns/iter (+/- 46,445)
test sss::generate_1kb_3_5 ... bench: 869,875 ns/iter (+/- 40,246)
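To illustrate the idea (over `f64` rather than `Gf256`, and not the crate's actual code), Horner's rule folds the coefficients from the highest power down, costing one multiplication and one addition per coefficient:

```rust
// Evaluate a0 + a1*x + a2*x^2 + ... by Horner's rule.
// coeffs[i] is the coefficient of x^i.
fn horner(coeffs: &[f64], x: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * x + c)
}

fn main() {
    // 7 + 2x + 3x^2 at x = 2  ->  ((3 * 2) + 2) * 2 + 7 = 23
    assert_eq!(horner(&[7.0, 2.0, 3.0], 2.0), 23.0);
}
```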
RustySecrets makes minimal use of the rand library. It only initializes the `ChaChaRng` with a seed and `OsRng` in the standard way, and then calls their `fill_bytes` methods, which are provided by the same trait and whose function signature has not changed. I have confirmed by looking at the code changes that there have been no changes to the relevant interfaces this library uses.
Since `id` is a `u8`, it will never be greater than 255.
It's possible for two different points to have the same data. To give a concrete example, consider the secret polynomial `x^2 + x + s`, where `s` is the secret byte. Plugging in 214 and 215 (both elements of the cyclic subgroup of order 2) for `x` will give the same result, `1 + s`. More broadly, for any polynomial `b*x^t + b*x^(t-1) + ... + x + s`, where `t` is the order of at least one subgroup of GF(256), all elements of any subgroup of order `t`, when chosen for `x`, will produce the same result. There are certainly other types of polynomials that have "share collisions." This type was just easy to find because it exploits the nature of finite fields.
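For a concrete check of the collision, here is a small self-contained example. The `gf_mul` below reduces by the AES polynomial 0x11b, which is only an assumption for illustration (the specific shared value depends on the field representation), but the equality p(214) = p(215) holds in any GF(256), because squaring distributes over addition in characteristic 2.

```rust
// Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (0x11b).
// The reduction polynomial is assumed here for illustration only.
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut p = 0u8;
    for _ in 0..8 {
        if b & 1 != 0 {
            p ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1b; // reduce after the overflowed bit is shifted out
        }
        b >>= 1;
    }
    p
}

fn main() {
    let s = 0x2a; // an arbitrary secret byte
    // Secret polynomial p(x) = x^2 + x + s; addition in GF(256) is XOR.
    let p = |x: u8| gf_mul(x, x) ^ x ^ s;
    // Two distinct x coordinates produce the same share data.
    assert_eq!(p(214), p(215));
    println!("p(214) = {:#04x}, p(215) = {:#04x}", p(214), p(215));
}
```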
Ensures that threshold > 2 during the parsing process, since we ensure the same during the splitting process.
Since the validation already confirms `shares` is not empty, `k_sets` will never match 0.
The arguments were provided in the wrong order.
Force-pushed from c106fc9 to 0f22824.
* Pass a ref to `Vec<Shares>` instead of recreating and moving the object through several functions.
* Return `slen`/`data_len`, since we'll be using it anyway in `recover_secrets`.
I think that using hash maps and hash sets was overkill and made the code much longer and more complicated than it needed to be. The new code also produces more useful error messages that will hopefully help users identify which share(s) are causing the inconsistency.
The best place to catch share problems is immediately, during parsing from `&str`. However, because `validate_shares` takes any type that implements the `IsShare` trait, and nothing about that trait guarantees that the share id, threshold, and secret length will be valid, I thought it best to leave those three checks in `validate_shares` as a defensive coding practice.
This should be useful when validating very large sets of shares. Wouldn't want to print out up to 254 shares.
Force-pushed from 0f22824 to 8466096.
* Update rustfmt compliance. Looks like rustfmt has made some improvements recently, so wanted to bring the code up to date.
* Add rustfmt to the nightly item in the Travis matrix.
* Use the Travis Cargo cache.
* Allow fast_finish in Travis. Items that match the `allow_failures` predicate (right now, just Rust nightly) will still finish, but Travis won't wait for them to report a result if the other builds have already finished.
* Run kcov in a separate matrix build in Travis.
* Rework `allow_failures` logic. We don't want rustfmt to match `allow_failures` just because it needs to use nightly, while we do want nightly to match `allow_failures`. Env vars provide a solution.
* Add the --all switch to rustfmt in Travis.
* Test building docs in Travis.
* Use the exact Ubuntu dependencies listed for kcov. Some of the dependencies we were installing were not listed on https://github.com/SimonKagstrom/kcov/blob/master/INSTALL.md, and we were missing one dependency that was listed there. When `sudo: true`, Travis uses Ubuntu Trusty.
* No need to build before running kcov; kcov builds its own test executables.
* Generate `Cargo.lock` with `cargo update` before running kcov. As noted in aeb3906, it is not necessary to build the project before running kcov, but kcov does require a `Cargo.lock` file, which can be generated with `cargo update`.
Force-pushed from 8466096 to e3ba2d5.
This refactor makes the code a lot clearer, and separates barycentric interpolation into parts that can be reused, such as in the partial interpolation functionality I intend to implement.
In this PR:

* Introduces the `PartialSecret` struct and associated methods for interpolating and evaluating polynomials incrementally (or all at once, for that matter); see the sketch at the end of this comment.
* Implements strict input validation for all public functions. With private ones we can reason about their inputs.
* Uses this struct behind the scenes in `interpolate_at`.

Problems to be addressed later:

* There should be a higher-level interface in sss.
* Error handling right now is mostly for example. We should probably create some new `ErrorKinds`; I just used the most analogous ones as placeholders. Validation is comprehensive, I believe, which is good, but it should be DRYed out.
* Numeric overflow is possible when we cast some `len()` to `u8` in order to satisfy the function signatures of certain `ErrorKinds`. This is a general bug that I will make a separate PR for.

Future work:

* It is possible to pre-compute all barycentric weights for a given secret after receiving the first share(s) if `shares_count` is equal to `threshold`, but `Share`s don't include a `shares_count` field (presumably because this is unnecessary information for reconstruction, and in the case of a share being compromised would provide the bad actor with more information).
* Use barycentric Lagrange interpolation to find coefficients (incrementally and all at once).
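To make the shape of this concrete, here is a minimal, self-contained sketch of the incremental idea, written over `f64` for readability. The struct, fields, and method names below are illustrative only, not the PR's actual definitions; the real `PartialSecret` works over `Gf256` and uses proper validation and error types rather than asserts.

```rust
/// Toy stand-in for an incremental interpolator: points are added one at a
/// time, the barycentric weights are kept up to date as they arrive, and once
/// `threshold` points are in, the value at x = 0 is one cheap evaluation away.
struct PartialSecret {
    threshold: usize,
    xs: Vec<f64>,
    ys: Vec<f64>,
    weights: Vec<f64>,
}

impl PartialSecret {
    fn new(threshold: usize) -> Self {
        PartialSecret { threshold, xs: Vec::new(), ys: Vec::new(), weights: Vec::new() }
    }

    /// Interpolate one more point, updating the existing weights in O(n).
    fn update(&mut self, x: f64, y: f64) {
        assert!(self.xs.len() < self.threshold, "already have threshold points");
        assert!(x != 0.0, "shares are never evaluated at x = 0");
        let mut w = 1.0;
        for (xj, wj) in self.xs.iter().zip(self.weights.iter_mut()) {
            *wj /= *xj - x; // adjust old weights for the new node
            w /= x - *xj;   // accumulate the new node's weight
        }
        self.xs.push(x);
        self.ys.push(y);
        self.weights.push(w);
    }

    fn shares_needed(&self) -> usize {
        self.threshold - self.xs.len()
    }

    /// Second-form barycentric evaluation at x = 0, available once
    /// `threshold` points have been interpolated.
    fn secret(&self) -> Option<f64> {
        if self.shares_needed() > 0 {
            return None;
        }
        let (mut num, mut den) = (0.0, 0.0);
        for ((&x, &y), &w) in self.xs.iter().zip(&self.ys).zip(&self.weights) {
            let t = w / (0.0 - x);
            num += t * y;
            den += t;
        }
        Some(num / den)
    }
}

fn main() {
    // Secret polynomial p(x) = 5 + 4x + x^2; the "secret" is p(0) = 5.
    let p = |x: f64| 5.0 + 4.0 * x + x * x;
    let mut ps = PartialSecret::new(3);
    ps.update(1.0, p(1.0));
    ps.update(2.0, p(2.0));
    assert_eq!(ps.shares_needed(), 1);
    ps.update(3.0, p(3.0));
    println!("recovered: {:?}", ps.secret()); // ≈ Some(5.0)
}
```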
Changes the `differences` field name to `diffs`, adds/improves some documentation, makes sure `update` fails if we've already evaluated sufficient points to compute the secret, and adds the `shares_needed` convenience method.
This way we can reuse the computational work we've done if, for some reason, we want to evaluate the same set of interpolated points at a value other than `Gf256::zero()`. As noted, a slight sacrifice to efficiency was made when implementing this function, in order to reduce the `PartialSecret` size and increase precomputation in the standard case of evaluating at `Gf256::zero()`.
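For illustration, the second-form evaluation from the earlier sketches generalizes from x = 0 to an arbitrary point while reusing the same precomputed weights (again over `f64`, not the crate's `Gf256`; the evaluation point must not coincide with an interpolation node):

```rust
// General second-form barycentric evaluation using precomputed weights.
fn evaluate_at(xs: &[f64], ys: &[f64], w: &[f64], x: f64) -> f64 {
    let (mut num, mut den) = (0.0, 0.0);
    for ((&xj, &yj), &wj) in xs.iter().zip(ys).zip(w) {
        let t = wj / (x - xj); // x must differ from every node xj
        num += t * yj;
        den += t;
    }
    num / den
}

fn main() {
    // p(x) = x + 1 through (1, 2) and (2, 3); weights are 1/(1-2) and 1/(2-1).
    let (xs, ys, w) = ([1.0, 2.0], [2.0, 3.0], [-1.0, 1.0]);
    println!("p(3) = {}", evaluate_at(&xs, &ys, &w, 3.0)); // prints 4
}
```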
Makes the `secret` and `threshold` fields public for easy access. Refines `shares_needed` and adds the `shares_evaluated` convenience function. Refines the example error handling.*

*Note: these are still just temporary values to illustrate what type of validation we will be doing.
These should be considered illustrative at this point, but I wanted to start to flesh out a higher-level way to interact with the `PartialSecret` struct. Besides more conceptual changes in terms of how to make this interface more user-friendly, I think the validation needs to be DRYed out and the error handling refined. In particular, all the functions that follow `begin_partial_secret_recovery` are basically analogues of methods, and it feels like it would be nicer to call them as such instead of as functions. Mostly, I'm unsure of how the repository maintainers would like such an interface to work, so I only took my best jab at fleshing this out.
* Creates a new `NoMoreSharesNeeded` `ErrorKind`, to be used when a `PartialSecret` already holds a complete secret.
* Replaces large `if else` validation blocks with reusable methods and functions.
* Creates an `update_diffs` function to DRY out code shared between the `new` and `update` methods.
* Introduces the `IncrementalRecovery` struct, a struct that creates and updates many `PartialSecret`s from `Share`s, essentially introducing a higher-level interface.
* New validation functions were created to handle the case where not all shares arrive at once. Only the metadata necessary for validating further shares (the threshold, the secret length, the IDs that have been verified so far, and optionally the root hash that signed the shares validated so far) is stored by the `IncrementalRecovery` struct.
* Most of the changes were recommended by clippy; a few very small changes were my own initiative.
* The most common change is using references instead of moving values when the value is not consumed by the function body.
* Using `&Vec<_>` instead of `&[_]` requires one more reference and cannot be used with non-Vec-based slices.
* Added clippy linter directives to the `Add` and `Sub` implementations for `gf256` (sketched below). The functions are fine, but they use what look like the wrong binary operators, because XOR, add, and subtract are all the same operation in GF(256). I had to add the directives to stop the linter from erroring out.
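For context, this is roughly what such a directive looks like on an XOR-based `Add`/`Sub` impl. The toy `Gf256` newtype stands in for the crate's type, and `clippy::suspicious_arithmetic_impl` is the relevant lint; the exact attribute spelling used in the crate may differ (older setups spelled it via `cfg_attr(feature = "cargo-clippy", ...)`).

```rust
use std::ops::{Add, Sub};

#[derive(Clone, Copy, Debug, PartialEq)]
struct Gf256(u8);

// In GF(256), addition and subtraction are both XOR, which clippy's
// `suspicious_arithmetic_impl` lint flags unless silenced.
impl Add for Gf256 {
    type Output = Gf256;
    #[allow(clippy::suspicious_arithmetic_impl)]
    fn add(self, rhs: Gf256) -> Gf256 {
        Gf256(self.0 ^ rhs.0)
    }
}

impl Sub for Gf256 {
    type Output = Gf256;
    #[allow(clippy::suspicious_arithmetic_impl)]
    fn sub(self, rhs: Gf256) -> Gf256 {
        Gf256(self.0 ^ rhs.0)
    }
}

fn main() {
    assert_eq!(Gf256(0x53) + Gf256(0xca), Gf256(0x99));
    assert_eq!(Gf256(0x53) - Gf256(0xca), Gf256(0x99));
}
```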
Since `interpolate_at` had been reduced to a 2-line convenience function used in a single place, I thought it better to simply move those two lines to that single place.
Force-pushed from e3ba2d5 to de0250b.
Rebased on master, so the diff is much nicer to look at now.
Error messages from `verify_signatures` will temporarily suffer, but this will be rectified soon using `error_chain`.
Work still to be done: decide how to refactor the validation module once more so that we don't have to separately check that we have threshold shares (see TODOs in diff).
I think this refactor finally streamlines the `validation` module (at least for what it needs to do now). Gets rid of those TODOs and creates two "types" of validation function for incremental and all-at-once secret recovery.
`threshold` and `ids` are no longer stored in `PartialSecret`, reducing the size of a `Recovery` object by up to 33% (as the number of shares interpolated grows). Optimizing for speed (since we will call methods on `PartialSecret`s much more often than functions in `validate`), the `ids` field of `Recovery` is now of type `Vec<Gf256>`. `PartialSecret` no longer holds all of its own state and relies on that state being managed from the outside. While this is not ideal for using `PartialSecret` independently of `Recovery`, I feel comfortable tying the two classes together. I have added additional assertions to `PartialSecret` to catch errors.
Having crippled `PartialSecret` sufficiently from its original form, I realized it was best to carry this process out as far as possible and make it even more explicit. Storing a `secret: Option<u8>` didn't make sense when the struct no longer stored `ids` and `threshold` and was dependent on outside forces providing that information so it could know when to compute the secret. The `compute_secret` method, now `evaluate_at_zero`, its own function in the `lagrange` module, also seemed out of place. Now `threshold` doesn't need to be passed to `BarycentricWeights`, and the type is more accurately described by its name. An outside force is still responsible for managing the `ids`, and when `threshold` shares have been interpolated, it will need to call `evaluate_at_zero` with the `BarycentricWeights` and `ids` (`Recovery` now does this). The last change to `Recovery` was to make it hold its own `secret` (it automatically computes this secret as soon as threshold shares have been interpolated via calls to `new` and `update`). Thus `get_secret` will be faster, because we're just unwrapping a `Some(Vec<u8>)` and rewrapping it in `Ok`, instead of needing to construct that `Vec<u8>` by iterating over `Option<u8>`s that we need to unwrap and `collect`.
Not planning on making more changes except documentation/comments/tests/benchmarks, and maybe a very small diff regarding error handling in one function. I would say it's stable for the purposes of reviewing now and getting feedback. Sorry for the churn, I didn't think I would refactor this PR another 5 times after opening it.
Closing because I don't think this PR means much with respect to the intended use cases of this library. The speedup won't matter for the low-degree polynomials this library expects its users to be working with, and anyway FFT should be used for high-degree polynomials.
Status
Having shown an earlier version of this branch to @romac, I'm now putting this up for public review. It is still a work in progress, but the main components are in place.
Description
This PR implements most of a complete interface for incrementally computing a secret using barycentric Lagrange interpolation. In cases where not all shares are available at once, it is still possible to start interpolating the secret polynomial, such that when threshold shares are finally present, the final computation time (i.e., the time between receiving the threshold-th share and the secret being recovered) is much smaller. This will be especially noticeable when the secret is very long, or the threshold is very high. (E.g., in Sunder, shares are entered one at a time and this could be useful.)
It is still missing its highest-level, public-facing interface that would accept `&[String]`s (shares in string form), parse them, and return `Recovery` objects. `Recovery` objects store and update the state of the partially recovered secret.

Under the hood, `Recovery` objects store:

1. The `barycentric` weights and diffs computed to evaluate the polynomial.
2. The `ids` processed so far.
3. The `threshold`.
4. The secret length (`slen`).
5. The `root_hash` of the Merkle tree (i.e., the public key the shares were signed with).
6. The `secret`.

1-3 are needed to correctly compute the secret and know when it's ready. Since there would be a lot of redundant data among the `BarycentricWeights`s if we were to store the `ids` and `threshold` in that struct, we depend on `Recovery` to store and update the `ids`, and to provide new points to update each `BarycentricWeights`. When `threshold` shares have been processed, the `secret`, which is initialized as `None`, is automatically computed (`Some(Vec<u8>)`), and can then be retrieved with `get_secret`.

2-5 are needed for validation of the shares and verification of the signatures. If `verify_signatures` is true, `Recovery::new()` will set the `root_hash` to the root hash of the initial share(s) used to create the `Recovery` object, and this `root_hash` value is automatically used during subsequent `Recovery::update`s to ensure consistency among the signatures. Likewise, during updates, share validation "picks up where it left off" by passing the already processed `ids`, along with the new shares to be validated, to the new function `validate_additional_signed_shares`, in order to prevent duplicate shares from being processed.

Thoughts/ideas:

I'm not sure if it is necessary to create `{to,from}_string` methods for `Recovery` in order to make a functional interface with regard to the node library. If it is not necessary, I assume this means the node interface can act on a `Recovery` object in memory, without it having to be in a printable or JS-intelligible/interoperable form. In this case, one could simply wait until threshold shares have been interpolated and then call `get_secret()` on the `Recovery` object, which will return a `Result<Vec<u8>>` (just as `Recovery::recover_secret` does, which, though changed under the hood, provides the same interface that `SSS::recover_secret` used to).

Whether or not such methods are necessary, they may be useful. The idea is that you could partially interpolate a secret from fewer than threshold shares, storing a result that is a fraction of the size of the combined shares (and, maybe more importantly/conveniently, a single piece of data), and then later interpolate more shares to get the final result. To do this we'd need some way to serialize a `Recovery` for long-term storage (string, or binary for that matter). As a use case, imagine you want to recover a secret using Sunder, and you expect to get shares in person over the course of some time. A useful feature would be the ability to save a serialized `Recovery` object to disk, so that you don't have to save each share to disk and then enter them all in once there are sufficient. A good UI would keep you updated on how many shares you've interpolated so far and how many more are needed to fully recover the secret, using the `shares_interpolated()` and `shares_needed()` methods. Especially when recovering multiple secrets at a time in an incremental effort, a good UI built around this functionality could save you a lot of organizational effort and help prevent mistakes.

TODO

* Highest-level interface accepting `&[String]`.
* `{to,from_string}` for `Recovery`?
* `InconsistentSignature` errors: correct the `ids` argument when `S::verify_signature` is called in `validate::validate_additional_signed_shares`, before re-raising (or whatever the Rustaceans call this) with `match` and whatever else from the `error_chain` crate (see this commit message).