Partial interpolation #61

Closed
wants to merge 42 commits

Conversation


@psivesely commented Apr 1, 2018

Status

Having shown an earlier version of this branch to @romac, I'm now putting this up for public review. It is still a work in progress, but the main components are in place.

Description

This PR implements most of a complete interface for incrementally computing a secret using barycentric Lagrange interpolation. In cases where not all shares are available at once, it is still possible to start interpolating the secret polynomial, such that when threshold shares are finally present, the final computation time (i.e., the time between receiving the threshold-th share and the secret being recovered) is much smaller. This will be especially noticeable when the secret is very long, or the threshold is very high. (E.g., in Sunder, shares are entered one at a time and this could be useful.)

It is still missing its highest-level, public-facing interface that would accept &[String]s (shares in string form), parse them, and return Recovery objects. Recovery objects store and update the state of the partially recovered secret.

Under the hood, Recovery objects store:

  1. Barycentric weights and diffs computed to evaluate the polynomial.
  2. The ids processed so far.
  3. The threshold.
  4. The secret length (slen).
  5. Optionally, the root_hash of the Merkle tree (i.e., the public key the shares were signed with).
  6. Optionally, the secret.

1-3 are needed to correctly compute the secret and to know when it's ready. Since there would be a lot of redundant data among the BarycentricWeights structs if we were to store the ids and threshold in each of them, we depend on Recovery to store and update the ids, and to provide new points to update each BarycentricWeights. When threshold shares have been processed, the secret, which is initialized to None, is automatically computed (becoming Some(Vec<u8>)), and can then be retrieved with get_secret.

2-5 are needed for validation of the shares and verification of the signatures. If verify_signatures is true, Recovery::new() will set root_hash to the root hash of the initial share(s) used to create the Recovery object, and this root_hash value is automatically used during subsequent Recovery::update calls to ensure consistency among the signatures. Likewise, during updates share validation "picks up where it left off" by passing the already-processed ids, along with the new shares to be validated, to the new function validate_additional_signed_shares, in order to prevent duplicate shares from being processed.
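
For illustration, here is roughly the shape I have in mind; this is only a sketch, and the field names and types are approximate rather than the exact definitions in the branch:

```rust
// Sketch only -- approximate shape, not the actual definitions in this branch.

/// Incremental interpolation state for one byte of the secret: the
/// barycentric weights and the diffs used to update them.
struct BarycentricWeights {
    weights: Vec<u8>, // stand-ins for Gf256 values
    diffs: Vec<u8>,
}

struct Recovery {
    /// One partial interpolation per byte of the secret (1).
    partials: Vec<BarycentricWeights>,
    /// Share ids (x-coordinates) processed so far (2).
    ids: Vec<u8>,
    /// (3)
    threshold: u8,
    /// Secret length in bytes (4).
    slen: usize,
    /// Merkle root the shares were signed with, if verify_signatures is set (5).
    root_hash: Option<Vec<u8>>,
    /// Computed automatically once threshold shares have been processed (6).
    secret: Option<Vec<u8>>,
}
```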

Thoughts/ ideas:

I'm not sure if it is necessary to create {to,from}_string methods for Recovery in order to make a functional interface with regard to the node library. If it is not necessary, I assume this means the node interface can act on a Recovery object in memory, without it having to be in a printable or JS-intelligible/interoperable form. In this case, one could simply wait until threshold shares have been interpolated, and then call get_secret() on the Recovery object, which will return a Result<Vec<u8>> (just as Recovery::recover_secret does, which, though changed under the hood, provides the same interface that SSS::recover_secret used to).

Whether or not such methods are necessary, they may be useful. The idea is that you could partially interpolate a secret from fewer than threshold shares, storing a result that is a fraction of the size of the combined shares (and maybe more importantly/ conveniently, a single piece of data), and then later interpolate more shares to get the final result. To do this we'd need some way to serialize a Recovery for long-term storage (string, or binary for that matter).

As a use case, imagine you want to recover a secret using Sunder, and you expect to get shares in person over the course of some time. A useful feature would be the ability to save a serialized Recovery object to disk, so that you don't have to save each share to disk and then enter them all at once when there are enough. Good UI would keep you updated on how many shares you've interpolated so far and how many more are needed to fully recover the secret, using the shares_interpolated() and shares_needed() methods. Especially when recovering multiple secrets at a time in an incremental effort, a good UI built around this functionality could save you a lot of organizational effort and help prevent mistakes.
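
To make the intended flow concrete, here is the sort of thing I picture a caller doing; the method names follow the description above, but the exact signatures are my guesses rather than the actual API in this branch:

```rust
// Hypothetical usage sketch -- signatures are approximations, not the real API.
fn recover_in_batches(batches: &[Vec<Share>]) -> Result<Vec<u8>> {
    let (first, rest) = batches.split_first().expect("at least one batch");
    let mut recovery = Recovery::new(first, /* verify_signatures */ true)?;

    for batch in rest {
        if recovery.shares_needed() == 0 {
            break; // threshold reached; the secret has already been computed
        }
        recovery.update(batch)?;
        println!(
            "{} shares interpolated, {} more needed",
            recovery.shares_interpolated(),
            recovery.shares_needed()
        );
    }

    recovery.get_secret() // Ok(Vec<u8>) once threshold shares have been processed
}
```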

TODO

  • Improve documentation and code comments.
  • Add a lot more tests.
  • Add a few more benchmarks.
  • Create highest-level (user) interface that takes &[String].
    • Protobuf/ {to,from}_string for Recovery?
  • Catch and modify any InconsistentSignature errors to correct the ids argument when S::verify_signature is called in validate::validate_additional_signed_shares before re-raising (or whatever the Rustaceans call this) with match and whatever else from the error_chain crate (see this commit message).

psivesely and others added 17 commits March 19, 2018 21:22
Implements barycentric Lagrange interpolation. Uses algorithm (3.1) from the
paper "Polynomial Interpolation: Langrange vs Newton" by Wilhelm Werner to find
the barycentric weights, and then evaluates at `Gf256::zero()` using the second
or "true" form of the barycentric interpolation formula.

I also earlier implemented a variant of this algorithm, Algorithm 2, from "A new
efficient algorithm for polynomial interpolation," which uses fewer total
operations than Werner's version. However, because it uses a lot more
multiplications or divisions (depending on how you choose to write it), it runs
slower, given the relative cost of subtraction/ addition (equal) versus
multiplication, and especially division, in the Gf256 module.

The new algorithm takes n^2 / 2 divisions and n^2 subtractions to calculate the
barycentric weights, and another n divisions, n multiplications, and 2n
additions to evaluate the polynomial*. The old algorithm runs in n^2 - n
divisions, n^2 multiplications, and n^2 subtractions. Without knowing the exact
running time of each of these operations we can't say for sure, but I think a
good guess would be that the new algorithm trends toward about 1/3 of the
running time of the old one as n -> infinity. It's also easy to see
theoretically that for small n the original Lagrange algorithm is faster. This
is backed up by benchmarks, which showed that for n >= 5 the new algorithm is
faster. This is more or less what we should expect given the running times of
these algorithms in n.

To ensure we always run the faster algorithm, I've kept both versions and only
use the new one when 5 or more points are given.

Previously the tests in the lagrange module were allowed to pass nodes to the
interpolation algorithms with x = 0. Genuine shares will not be evaluated at x =
0, since then they would just be the secret, so:

1. Now nodes in tests start at x = 1 like `scheme::secret_share` deals them out.
2. I have added assert statements to reinforce this fact and guard against
   division by 0 panics.

This meant getting rid of the `evaluate_at_works` test, but
`interpolate_evaluate_at_0_eq_evaluate_at` provides a similar test.

Further work will include the use of barycentric weights in the `interpolate`
function.

A couple more interesting things to note about barycentric weights:

* Barycentric weights can be partially computed if less than threshold
  shares are present. When additional shares come in, computation can resume
  with no penalty to the total runtime.
* They can be determined totally independently from the y values of our points,
  and the x value we want to evaluate for. We only need to know the x values of
  our interpolation points.
While this is a slight regression in performance in the case
where k < 5, in absolute terms it is small enough to be negligible.
Horner's method is an algorithm for evaluating polynomials, which consists of
transforming the monomial form into a computationally efficient nested form. It
is pretty easy to understand:
https://en.wikipedia.org/wiki/Horner%27s_method#Description_of_the_algorithm
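
As a sketch of the idea (my illustration, not the code from this commit; `F` stands in for the library's `Gf256` element type):

```rust
use std::ops::{Add, Mul};

/// Evaluate a_0 + a_1*x + ... + a_n*x^n as ((a_n*x + a_{n-1})*x + ...)*x + a_0,
/// i.e. one multiplication and one addition per coefficient.
fn horner<F>(coeffs: &[F], x: F) -> F
where
    F: Copy + Add<Output = F> + Mul<Output = F>,
{
    let mut rev = coeffs.iter().rev().copied();
    let highest = rev.next().expect("at least one coefficient");
    rev.fold(highest, |acc, c| acc * x + c)
}
```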

This implementation has resulted in a noticeable secret share generation speedup
as the RustySecrets benchmarks show, especially when calculating larger
polynomials:

Before:
test sss::generate_1kb_10_25 ... bench: 3,104,391 ns/iter (+/- 113,824)
test sss::generate_1kb_3_5 ... bench: 951,807 ns/iter (+/- 41,067)

After:
test sss::generate_1kb_10_25        ... bench:   2,071,655 ns/iter (+/- 46,445)
test sss::generate_1kb_3_5          ... bench:     869,875 ns/iter (+/- 40,246)
RustySecrets makes minimal use of the rand library. It only initializes
the `ChaChaRng` with a seed and `OsRng` in the standard way, and then calls
their `fill_bytes` methods, which are provided by the same trait and whose
signature has not changed. I have confirmed by looking at the code changes
that there have been no changes to the relevant interfaces this library uses.
Since id is a `u8` it will never be greater than 255.
It's possible that two different points have the same data.

To give a concrete example consider the secret polynomial `x^2 + x + s`, where
`s` is the secret byte. Plugging in 214 and 215 (both elements of the cyclic
subgroup of order 2) for `x` will give the same result, `1 + s`.
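
Independent of the subgroup framing, here is a quick way to see the collision (my addition, not in the original commit message): 215 = 214 + 1, and in a field of characteristic 2 the cross term of a square vanishes, so for f(x) = x^2 + x + s:

```latex
f(x + 1) = (x + 1)^2 + (x + 1) + s
         = x^2 + 1 + x + 1 + s
         = x^2 + x + s
         = f(x)
```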

More broadly, for any polynomial `b*x^t + b*x^(t-1) + ... + x + s`, where `t` is
the order of at least one subgroup of GF(256), for all subgroups of order `t`,
all elements of that subgroup, when chosen for `x`, will produce the same
result.

There are certainly other types of polynomials that have "share collisions."
This type was just easy to find because it exploits the nature of finite fields.
Ensures that threshold > 2 during the parsing process, since we ensure the same
during the splitting process.
Since the validation already confirms `shares` is not empty, `k_sets` will never
match 0.
The arguments were provided in the wrong order.
* Pass a ref to `Vec<Shares>` instead of recreating and moving the object
  through several functions.
* Return `slen`/ `data_len`, since we'll be using it anyway in `recover_secrets`
I think that using hashmaps and hash sets was overkill and made the code much
longer and more complicated than it needed to be.

The new code also produces more useful error messages that will hopefully help
users identify which share(s) are causing the inconsistency.
The best place to catch share problems is immediately during parsing from
`&str`. However, because `validate_shares` takes any type that implements the
`IsShare` trait, and there's nothing about that trait that guarantees that the
share id, threshold, and secret length will be valid, I thought it best to leave
those three tests in `validate_shares` as a defensive coding practice.
This should be useful when validating very large sets of shares. Wouldn't want
to print out up to 254 shares.
* Update rustfmt compliance

Looks like rustfmt has made some improvements recently, so wanted to bring the
code up to date.

* Add rustfmt to nightly item in Travis matrix

* Use Travis Cargo cache

* Allow fast_finish in Travis

Items that match the `allow_failures` predicate (right now, just Rust nightly),
will still finish, but Travis won't wait for them to report a result if the
other builds have already finished.

* Run kcov in a separate matrix build in Travis

* Rework allowed_failures logic

We don't want rustfmt to match `allow_failures` just because it needs to use
nightly, while we do want nightly to match `allow_failures`. Env vars provide a
solution.

* Add --all switch to rustfmt Travis

* Test building docs in Travis

* Use exact Ubuntu dependencies listed for kcov

Some of the dependencies we were installing were not listed on
https://github.com/SimonKagstrom/kcov/blob/master/INSTALL.md, and we were
missing one dependency that was listed there. When `sudo: true` Travis uses
Ubuntu Trusty.

* No need to build before running kcov

kcov builds its own test executables.

* Generate `Cargo.lock` w/ `cargo update` before running kcov

As noted in aeb3906 it is not necessary to
build the project before running kcov, but kcov does require a `Cargo.lock`
file, which can be generated with `cargo update`.
This refactor makes the code a lot clearer, and separates barycentric
interpolation into parts that can be reused, such as in the partial
interpolation functionality I intend to implement.
In this PR:

* Introduces `PartialSecret` struct and associated methods for interpolating and
  evaluating polynomials incrementally (or all-at-once for that matter).
  * Implements strict input validation for all public functions. With private
    ones we can reason about their inputs.
  * Uses this struct behind-the-scenes with `interpolate_at`.

Problems to be addressed later:

* There should be a higher level interface in sss.
* Error handling right now is mostly for example. Probably we should create some
  new `ErrorKinds`. I just used the most analogous ones as placeholders.
  Validation is comprehensive, I believe, which is good, but it should be DRYed
  out.
  * Numeric overflow is possible when we cast some `len()` to `u8` in order to
    satisfy the function signatures of certain `ErrorKinds`. This is a general
    bug that I will make a separate PR for.

Future work:

* It is possible to pre-compute all barycentric weights for a given secret
  after receiving the first share(s) if `shares_count` is equal to `threshold`,
  but `Share`s don't include a `shares_count` field (presumably because this is
  unnecessary information for reconstruction, and in the case of a share being
  compromised would provide the bad actor with more information).
* Use barycentric Lagrange interpolation to find coefficients (incrementally and
  all at once).
Changes the `differences` field name to `diffs`, adds/ improves some
documentation, makes sure `update` fails if we've already evaluated sufficient
points to compute the secret, and adds the `shares_needed` convenience method.
This way we can reuse the computational work we've done if for some reason we
want to evaluate the same set of interpolated points at a value other than
`Gf256::zero()`.

As noted, a slight sacrifice to efficiency was made when implementing this
function, in order to reduce the `PartialSecret` size, and increase
precomputation in the standard case of evaluating at `Gf256::zero()`.
Makes `secret` and `threshold` fields public for easy access. Refines
`shares_needed` and adds `shares_evaluated` convenience functions. Refines
example error handling*.

* Note these are still just temporary values to illustrate what type of
validation we will be doing.
These should be considered exemplary at this point, but I wanted to start to
flesh out a higher-level way to interact with the `PartialSecret` struct.
Besides more conceptual changes in terms of how to make this interface more
user-friendly, I think the validation needs to be DRYed out, and the error
handling refined.

In particular, all the functions that follow `begin_partial_secret_recovery` are
basically analogues to methods, and it feels like it would be nicer to call them
as such instead of as functions.

Mostly, I'm unsure of how the repository maintainers would like such an
interface to work, so I only took my best jab at fleshing this out.
* Create new `NoMoreSharesNeeded` `ErrorKind` to be used when a `PartialSecret`
  already holds a complete secret.
* Replaced large `if else` validation blocks with re-usable methods and
  functions.
* Created `update_diffs` function to DRY out code shared between `new` and
  `update` methods.
* Introduces the `IncrementalRecovery` struct, which creates and updates many
  `PartialSecret`s from `Share`s, essentially providing a higher-level
  interface.
* New validation functions were created to handle the case where not all shares
  arrive at once. Only the necessary metadata for validating further shares (the
  threshold, the secret length, the IDs that have been verified so far, and
  optionally the root hash that signed the shares validated so far) is stored by
  the `IncrementalRecovery` struct.
* Most of the changes were recommended by clippy. A few very small changes were
  my own initiative.
* The most common change was using references instead of moving values when the
  value is not consumed by the function body.
* Using `&Vec<_>` instead of `&[_]` requires one more reference and cannot be
  used with non-Vec-based slices.
* Added clippy linter directives to the `Add` and `Subtract` implementations for
  `gf256`. The functions are fine, but use weird binary operators because
  XOR, add, and subtract are all the same in GF(256). I had to add these to stop
  the linter from erroring out.
Since `interpolate_at` had been reduced to a 2-line convenience function used in
a single place, I thought it better to simply move those two lines to that
single place.
@psivesely

Rebased on master, so diff is much nicer to look at now.

Temporarily, error messages from `verify_signatures` will suffer, but this will
be rectified soon using `error_chain`.
Work still to be done: decide how to refactor the validation module once more so
that we don't have to separately check that we have threshold shares (see TODOs
in diff).
I think this refactor finally streamlines the `validation` module (at least for
what it needs to do now). Gets rid of those TODOs and creates two "types" of
validation function for incremental and all-at-once secret recovery.
`threshold` and `ids` are no longer stored in `PartialSecret`, reducing the size
of a `Recovery` object by up to 33% (as the number of shares interpolated
grows).

Optimizing for speed (since we will call methods on `PartialSecrets` much more
often than functions in `validate`), the `ids` field of `Recovery` is now of
type `Vec<Gf256>`.

`PartialSecret` no longer holds all its own state and relies on that state being
managed from the outside. While this is not ideal for using `PartialSecret`
independently of `Recovery`, I feel comfortable tying the two structs together. I have
added additional assertions to `PartialSecret` to catch errors.
Having crippled `PartialSecret` sufficiently from its original form, I realized
it was best to carry this process out as far as possible, and make this even
more explicit. Storing a `secret: Option<u8>` didn't make sense when it no
longer stored `ids` and `thresholds` and depended on outside forces providing
that information so that it could know when to compute the secret. The
`compute_secret` method, now `evaluate_at_zero`, its own function in the
`lagrange` module, also seemed out of place.

Now `threshold` doesn't need to be passed to `BarycentricWeights`, and the
type is more accurately described by its name. An outside force is still
responsible for managing the `ids`, and when `threshold` shares have been
interpolated, it will need to call `evaluate_at_zero` with the
`BarycentricWeights` and `ids` (`Recovery` now does this).

The last change to `Recovery` was to make it hold its own `secret` (it
automatically computes this secret as soon as threshold shares have been
interpolated via calls to `new` and `update`). Thus `get_secret` will be faster
because we're just unwrapping a `Some(Vec<u8>)` and rewrapping it in `Ok`,
instead of needing to construct that `Vec<u8>` by iterating over `Option<u8>`s
we need to unwrap and `collect`.
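
To summarize the division of labor, the update path I have in mind looks roughly like this; every name and signature here is my approximation, not the code in the branch:

```rust
// Rough sketch of Recovery::update -- names and signatures are approximate.
impl Recovery {
    fn update(&mut self, shares: &[Share]) -> Result<()> {
        // Validation "picks up where it left off": the ids already processed
        // (and the stored root hash, if any) are passed along, so duplicates
        // and inconsistent signatures are rejected.
        validate_additional_signed_shares(shares, &self.ids, self.root_hash.as_ref())?;

        for share in shares {
            let x = Gf256::from_byte(share.id);
            self.ids.push(x);
            // Each byte of the secret is its own polynomial; feed the new
            // point into the corresponding BarycentricWeights.
            for (weights, &y) in self.partials.iter_mut().zip(&share.data) {
                weights.update(x, Gf256::from_byte(y), &self.ids);
            }
        }

        // Once threshold shares have been interpolated, Recovery computes and
        // stores the secret itself.
        if self.secret.is_none() && self.ids.len() >= self.threshold as usize {
            self.secret = Some(
                self.partials
                    .iter()
                    .map(|w| evaluate_at_zero(w, &self.ids).to_byte())
                    .collect(),
            );
        }
        Ok(())
    }
}
```
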
@psivesely

Not planning on making more changes except documentation/ comments/ tests/ benchmarks, and maybe a very small diff regarding error handling in one function. I would say it's stable for the purposes of reviewing now and getting feedback. Sorry for the churn, I didn't think I would refactor this PR another 5 times after opening it.

@psivesely

Closing because I don't think this PR means much with respect to the intended use cases of this library. The speedup won't matter for the low-degree polynomials this library expects its users to be working with, and anyway FFT should be used for high-degree polynomials.

@psivesely closed this Jul 18, 2019