Re-implement randomness server in rust #50

Merged

merged 93 commits into from Apr 24, 2023

Conversation

@rillian (Contributor) commented Feb 17, 2023

Eliminate the C ABI FFI bindings between the golang nitriding daemon and the Rust ppoprf implementation by serving the /randomness and /info endpoints over HTTP from Rust, and by using the nitriding v2 proxy to terminate TLS and forward requests from the public interface (a routing sketch follows the task list below).

  • Implement a local webserver in Rust
  • Port /randomness endpoint for ppoprf evaluation
  • Port /info endpoint with current epoch and zk public keys
  • Port next epoch timestamp to /info endpoint
  • Replace old go webservice with the nitriding daemon
  • Command-line switches for configuration
  • Tests

Resolves #49, #16, and #14.
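
A minimal sketch of the routing described above, assuming axum 0.6 behind the nitriding proxy; the handler bodies, bind address, and state wiring are placeholders rather than the actual star-randsrv code:

    use std::net::SocketAddr;
    use axum::{routing::{get, post}, Router};

    // Placeholder handlers; the real ones evaluate the ppoprf and report epoch info.
    async fn randomness() -> &'static str {
        "randomness stub"
    }

    async fn info() -> &'static str {
        "info stub"
    }

    #[tokio::main]
    async fn main() {
        // Plain HTTP inside the enclave; nitriding terminates TLS and forwards
        // requests from the public interface to this listener.
        let app = Router::new()
            .route("/randomness", post(randomness))
            .route("/info", get(info));

        let addr: SocketAddr = "127.0.0.1:8080".parse().unwrap();
        axum::Server::bind(&addr)
            .serve(app.into_make_service())
            .await
            .unwrap();
    }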

renovate bot and others added 27 commits November 3, 2022 17:41
The new version of the nitriding framework can proxy http traffic
to a separate server running inside the enclave. This allows us
to replace the ffi calls from golang with a pure-rust server
implementation, which is less error-prone.
Use the `info!` convenience macro with the default logger,
which reads the `RUST_LOG` environment variable. So run with:

    RUST_LOG=tower_http=trace,star_randsrv=debug cargo run

for reasonable output.

By default it's `ERROR` level only, so there's no convenient info about requests or the listening port; it would be nice to default to `star_randsrv=info` or something like that.
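
For reference, a sketch of the kind of subscriber setup that reads `RUST_LOG`, assuming the tracing and tracing-subscriber crates (with the env-filter feature); the actual initialization in the PR may differ:

    use tracing::info;
    use tracing_subscriber::EnvFilter;

    fn main() {
        // Filter directives come from RUST_LOG, e.g.
        //     RUST_LOG=tower_http=trace,star_randsrv=debug
        tracing_subscriber::fmt()
            .with_env_filter(EnvFilter::from_default_env())
            .init();

        info!("listening for requests");
    }
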
Define structs for the request and response and plumb them
through with a dummy response value.
FIXME: only processes the first point.

Note that the `State` argument must come *before* the request,
not after!
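
A hypothetical handler signature illustrating the argument order; the struct fields and state type are placeholders, not the real request format:

    use std::sync::Arc;
    use axum::{extract::State, Json};
    use serde::{Deserialize, Serialize};

    #[derive(Deserialize)]
    struct RandomnessRequest {
        epoch: u8,
        points: Vec<String>,
    }

    #[derive(Serialize)]
    struct RandomnessResponse {
        epoch: u8,
        points: Vec<String>,
    }

    // `State` must be listed before `Json`: axum requires the extractor that
    // consumes the request body to be the last handler argument.
    async fn randomness(
        State(_state): State<Arc<()>>, // placeholder state type
        Json(request): Json<RandomnessRequest>,
    ) -> Json<RandomnessResponse> {
        // Dummy echo standing in for the ppoprf evaluation.
        Json(RandomnessResponse {
            epoch: request.epoch,
            points: request.points,
        })
    }
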
Needs better error handling.
Use the current version of the ppoprf crate, which cleans up some unneeded dependencies. Bump everything else to the latest compatible version.
This allows multiple concurrent reads of the ppoprf struct for evaluations. We only need exclusive access to reset or puncture.
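
A std-library sketch of the locking pattern, with a Vec standing in for the ppoprf server state:

    use std::sync::{Arc, RwLock};

    // Stand-in for the shared OPRF server state.
    type SharedState = Arc<RwLock<Vec<u8>>>;

    fn evaluate(state: &SharedState) -> usize {
        // Evaluation requests can hold read locks concurrently.
        let server = state.read().expect("lock poisoned");
        server.len()
    }

    fn puncture(state: &SharedState) {
        // Puncturing or resetting the key needs exclusive access.
        let mut server = state.write().expect("lock poisoned");
        server.clear();
    }

    fn main() {
        let state: SharedState = Arc::new(RwLock::new(vec![1, 2, 3]));
        assert_eq!(evaluate(&state), 3);
        puncture(&state);
        assert_eq!(evaluate(&state), 0);
    }
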
Make the API docs slightly less sparse
Spawn a background task to advance the current epoch on a fixed
schedule. When it runs out of epochs it eventually panics the
main thread instead of re-initializing the ppoprf state.
Instead of leaving all epochs punctured and later panicking, create a fresh OPRFServer instance with the same set of epochs and continue with that.

Abstract the struct initialization into OPRFServer::new() so it can be called from both locations. Move the `future_epochs` initialization inside the background loop so the same code can be used each time a new state is initialized.
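
A hypothetical sketch of the background task described above; the type, field names, and epoch interval are illustrative, and the real code also punctures old epochs and creates a fresh ppoprf key on rotation:

    use std::ops::Range;
    use std::sync::{Arc, RwLock};
    use std::time::Duration;

    // Minimal stand-in for the shared OPRF state.
    struct OprfState {
        epoch: u8,
    }

    impl OprfState {
        fn new(epochs: &Range<u8>) -> Self {
            // The real constructor would also create a fresh ppoprf key here.
            OprfState { epoch: epochs.start }
        }
    }

    fn spawn_epoch_loop(state: Arc<RwLock<OprfState>>, epochs: Range<u8>, interval: Duration) {
        tokio::spawn(async move {
            let mut future_epochs = epochs.clone().skip(1);
            loop {
                tokio::time::sleep(interval).await;
                let mut s = state.write().expect("lock poisoned");
                match future_epochs.next() {
                    // Advance to the next epoch (the real code punctures the old one).
                    Some(next) => s.epoch = next,
                    // Out of epochs: rotate to a fresh instance instead of panicking.
                    None => {
                        *s = OprfState::new(&epochs);
                        future_epochs = epochs.clone().skip(1);
                    }
                }
            }
        });
    }
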
Define a response type for errors and convert various unwrap()
and expect() calls to return that instead.

Replace the iterator chain in the randomness handler with an
explicit loop so we can return early if there's an error in
the request data.
Turns out enum variants with inner values don't just look like functions, they actually work as an `FnOnce`!

Also stronger camelcasing.
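
For example, a payload-carrying variant can be passed directly to combinators like `map_err`; the error type here is illustrative, not the PR's:

    #[derive(Debug)]
    enum Error {
        BadPort(std::num::ParseIntError),
    }

    fn parse_port(input: &str) -> Result<u16, Error> {
        // `Error::BadPort` works as an `FnOnce(ParseIntError) -> Error`,
        // so it can be handed to `map_err` without a closure.
        input.parse::<u16>().map_err(Error::BadPort)
    }

    fn main() {
        assert!(parse_port("8080").is_ok());
        assert!(matches!(parse_port("not a port"), Err(Error::BadPort(_))));
    }
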
The previous code reset the `future_epochs` vector before the
empty check that was supposed to trigger re-initialization.
So OprfServer.epoch was updated, but the actual set of viable
epochs was punctured, resulting in perpetual "No prefix found"
errors.

Instead, assume we'll use contiguous blocks of epoch numbers
and store them as a Range, which is smaller and more convenient.
`EpochRange` never needs to change and can be passed to
`OprfServer::new` each time the key is rotated.

Note that because `Range` and `RangeInclusive` are different types, we can't both keep the range as `u8` and include epoch 255. I thought code simplicity was more important here.
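
A small illustration of the half-open `Range<u8>` trade-off described above:

    use std::ops::Range;

    // Contiguous epoch block stored as a half-open range.
    type EpochRange = Range<u8>;

    fn main() {
        // 0..255 covers epochs 0 through 254. Including epoch 255 would require
        // a RangeInclusive<u8> (0..=255), which is a different type, or widening to u16.
        let epochs: EpochRange = 0..255;
        assert!(epochs.contains(&0));
        assert!(epochs.contains(&254));
        assert!(!epochs.contains(&255));
    }
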
Instead of linking over FFI from the Go nitriding framework, build it as a separate proxy daemon which runs inside the same container as star-randsrv.
Having `nitriding` as a submodule complicates copying just
that subtree to the go-builder, so we copy everything, which
is slow if there are local build files.
Build and lint the local application. Assume the nitriding
repo takes care of itself.
Rather than installing golangci-lint so the default Makefile
target in nitriding/cmd can run it, just build the `nitriding`
target directly. This is a faster build and follows our earlier
policy of treating the submodule as a separate unit for testing.
Copy `star-randsrv` from the correct path so both executables
are present in the final container.
This is 5x larger than the alpine containers, but provides
the expected execution environment.
Go back to using alpine linux for the runtime, which requires
building with CGO disabled, and on the rust:alpine build
container to avoid dependencies on glibc.

This is about 25% the size of the debian:slim runtime container.
Switch to a revision of `nitriding` that builds a static executable by default, so we can invoke the Makefile but still use it with our alpine runtime container.
Doesn't return the timestamp at the end of the epoch.

We don't support returning the zk proofs, so returning the public
key isn't useful, but neither does the current golang version,
and I wanted to match its api. I expect that to change when we
implement batch proofs anyway.
Instead of accepting any unpunctured epoch, and letting the
ppoprf eval fail for punctured ones, filter out non-current
epochs when the request is processed.

Also enforce the MAX_POINTS limit on requests. By this point
the point vec has already been constructed, so this doesn't
affect the amount of memory consumed, but it still limits
the cpu cost of each request, making any denial-of-service
mitigation based on network flows more effective.
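
An illustrative version of the request checks; the limit value and error variants are placeholders:

    // Placeholder limit; the real value differs.
    const MAX_POINTS: usize = 1024;

    #[derive(Debug)]
    enum Error {
        BadEpoch(u8),
        TooManyPoints,
    }

    fn validate(request_epoch: u8, current_epoch: u8, num_points: usize) -> Result<(), Error> {
        // Only the current epoch is accepted, instead of letting the ppoprf
        // evaluation fail on punctured epochs.
        if request_epoch != current_epoch {
            return Err(Error::BadEpoch(request_epoch));
        }
        // Bounds the per-request CPU cost even though the points were already parsed.
        if num_points > MAX_POINTS {
            return Err(Error::TooManyPoints);
        }
        Ok(())
    }

    fn main() {
        assert!(validate(7, 7, 10).is_ok());
        assert!(matches!(validate(6, 7, 10), Err(Error::BadEpoch(6))));
        assert!(matches!(validate(7, 7, 10_000), Err(Error::TooManyPoints)));
    }
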
Set up a testing framework inside the main implementation to check
the handlers without the overhead of a network socket. Based on
the `testing` example from the axum repository.

Test that the `/` route returns some text.
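
A minimal sketch in the style of the axum `testing` example; the route body and expected status are placeholders:

    use axum::{body::Body, http::{Request, StatusCode}, routing::get, Router};
    use tower::ServiceExt; // for `oneshot`

    fn app() -> Router {
        Router::new().route("/", get(|| async { "placeholder index text" }))
    }

    #[tokio::test]
    async fn index_returns_success() {
        // Drive the router directly with a request; no network socket needed.
        let response = app()
            .oneshot(Request::builder().uri("/").body(Body::empty()).unwrap())
            .await
            .unwrap();
        assert_eq!(response.status(), StatusCode::OK);
    }
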
@rillian self-assigned this Feb 17, 2023
@NullHypothesis (Contributor) left a comment

Left a bunch of comments. I'm not in a good position to review the correctness of the Rust code. For that, @DJAndries is probably the best fit.

Philipp Winter and others added 3 commits April 11, 2023 11:39
This commit makes two changes:

1. Invoke kaniko with the flag '--custom-platform linux/amd64'.  This is
   necessary when building star-randsrv on non-Linux, non-amd64
   platforms like macOS.

2. Use an intermediate build layer to add start.sh.  If we don't do
   this, we may end up with a build layer that contains inconsistent
   file permissions from the host operating system.

With the above two changes, it's now possible to arrive at identical
image IDs, even when building star-randsrv on Linux (amd64) and macOS
(arm64).
Make star-randsrv build reproducibly.
If this happens it indicates an internal problem with the state
management, and the server likely needs to be restarted. Signal
this so the more serious error case is obvious.

Other error variants should be the result of bad client input
and don't affect the usability of the service.
@rillian mentioned this pull request Apr 18, 2023
The ppoprf epoch tag is a `u8`. When using the full epoch range
0..=255, the final increment would overflow. In debug builds this
would panic. In release builds, it would roll over unchecked, then
panic at the end of the next interval when it tried to puncture
the already-punctured starting epoch value. In either case, the
panic happened while the write lock was held, poisoning it.

The tokio thread runner would catch the panic, but with the
`RwLock` poisoned, it couldn't respond to further queries.

Instead, check for overflow and use that to trigger key rotation, just as we do when the epoch is out of the configured range. This is more localized than promoting the epoch counter and range to u16 and narrowing at the ppoprf calls.
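
A sketch of the overflow handling with `checked_add`; the names and the rotation signal are illustrative:

    // Advance the u8 epoch; `None` signals that a key rotation is needed,
    // whether we ran past the configured range or the counter overflowed.
    fn next_epoch(current: u8, end: u8) -> Option<u8> {
        match current.checked_add(1) {
            Some(next) if next < end => Some(next),
            _ => None,
        }
    }

    fn main() {
        assert_eq!(next_epoch(3, 255), Some(4));
        assert_eq!(next_epoch(254, 255), None); // ran off the end of the range
        assert_eq!(next_epoch(255, 255), None); // u8 overflow handled the same way
    }
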
@NullHypothesis (Contributor)

Stumbled upon one more thing: we should modify (or remove) .github/workflows/makefile.yaml, which is based on the old Go code.

Addresses a clippy lint.
@rillian (Contributor, Author) commented Apr 18, 2023

> Stumbled upon one more thing: we should modify (or remove) .github/workflows/makefile.yaml, which is based on the old Go code.

Don't we still need a golang environment to build the nitriding daemon?

The default makefile target runs cargo test, cargo clippy, and cargo audit, so it should give adequate coverage for the new Rust code, although it's just using the default Rust toolchain in the image.

@NullHypothesis (Contributor)

> Don't we still need a golang environment to build the nitriding daemon?

Do we really need these lines when simply running make?

The default makefile target just builds, tests, lints and
audits the rust application, so we no longer need a go
environment for this job.
@rillian (Contributor, Author) commented Apr 19, 2023

> Do we really need these lines when simply running make?

Ah, you're right. The default makefile target doesn't build the nitriding daemon, so this is unnecessary. Removed in e75c152.


# Build the web server application itself.
# Use the -alpine variant so it will run in an alpine-based container.
FROM public.ecr.aws/docker/library/rust:1.68.2-alpine as rust-builder
@fmarier (Member)

Is the pin on the minor version needed here? Normally we'd ask to use 1.68-alpine here in order to get any minor fixes / patches, but maybe there are attestation requirements that mean we need to pin to the full version?

Contributor

Assume we start an enclave based on x.y-alpine. Shortly after, x.y.z-alpine comes out. A user is trying to verify our deployment based on the newly-released x.y.z-alpine and notices that it fails because the enclave still runs x.y-alpine.

We could work around that by updating the enclave whenever a new patch comes out.

An even better solution would be to get rid of alpine altogether and only have the statically-compiled nitriding and star-randsrv in the image. For now, we only need alpine because of the shell script that starts nitriding and star-randsrv — a very heavy dependency for a simple task. I filed brave/nitriding#58 for this improvement.

@rillian (Contributor, Author)

Hopefully TLS-ALPN means we don't need a trusted CA certificate package either.

@rillian (Contributor, Author)

@fmarier Generally I agree that specifying a particular point-release tag bitrots quickly and misses out on useful fixes. But because reproducible builds are important in this particular application, tracking the specific versions of our dependencies, as we do with go.sum and Cargo.lock, is a worthwhile trade-off.

Hopefully we can use renovatebot or similar to make sure we bump regularly.

@fmarier (Member) commented Apr 20, 2023

Sounds good. Let's lean on Renovate for now, and then hopefully eliminate the alpine image entirely later.

@rillian (Contributor, Author) commented Apr 20, 2023

I think we're ready to merge here. Asking for final review; remaining issues should be follow-ups.

NullHypothesis previously approved these changes Apr 20, 2023

@NullHypothesis (Contributor) left a comment

Left comments for a couple minor changes. Otherwise looks good!

This was for testing with the deployable container build pipeline
during development. Remove before merging into the default branch
since it will no longer be needed there.
I'm told this is no longer necessary as the pipeline is using
a different solution.
@rillian (Contributor, Author) commented Apr 20, 2023

Security CI job failure looks like problems with the Ubuntu package mirror.

@NullHypothesis (Contributor) left a comment

👌

@rillian (Contributor, Author) commented Apr 20, 2023

CI failure resolved by brave/security-action#164. Green now.

@DJAndries (Contributor) left a comment

LGTM. Please address #54 soon, as it will act as an important fail-safe for crashes; it shouldn't require much effort, just one small func.

Successfully merging this pull request may close these issues: Support nitriding v2?

4 participants