
feat #126: Added testnet and image builds with pinned Rust and Solana… #133

Merged: 5 commits into main from 126-reproducible-builds on Jun 8, 2021

Conversation

@glottologist (Contributor)

I have added the tooling for building the artefacts inside the container. The Rust version and the Solana toolchain version are pinned, and the artefacts are packaged inside the container in the following directories:

  • solido/cli -> the solido CLI
  • solido/deploy -> the .so packages that can be deployed on-chain

There are two ways of building the image:

  • Using a local testnet via testnetup.sh
  • Building the chorusone/solido container image via buildimage.sh

I will update the front-facing docs in the widget repo to reflect the new structure and how to use it, along with the docs in the forthcoming multisig repo for the dry-run.

There are two outstanding issues:

  1. Using cargo build (rather than cargo build-bpf) or cargo test inside the container results in one of the dependencies, jemalloc, failing to build. It isn't an issue outside of the container build, so a system dependency is likely missing from the container. As a workaround I am building the CLI outside the container via the scripts and copying the result into the container. I will raise an issue to fix this, but didn't want to lose time in the run-up to the dry-run.
  2. When cargo build-bpf is run inside the container, it downloads the build-bpf extension to cargo, so that extension isn't pinned. I will raise a further issue to find a way to pin it.

After the build, the CLI and the deploy .so files are hashed, and the results are stored alongside the artefacts in the container. @ruuda I couldn't find a way of getting a program hash from an on-chain program; do you remember where you saw it? If not, then maybe we should just publish those hashes in the docs somewhere at the point of deployment onto the mainnet. Thoughts?

@ruuda (Contributor) commented Jun 3, 2021

@ruuda I couldn't find a way of getting a program hash from an on-chain program, do you remember where you saw it? If not, then maybe we should just publish those hashes in the docs somewhere at the point of deployment onto the mainnet. Throughts?

I haven’t found an easy way, but what you can do is solana program dump «program-id» «outfile» and then hash outfile. A caveat here is that it will dump the entire account, which is probably larger than the program itself (you need to pick the size at create time and add some room to allow for upgrades), so the result is padded with zeros.

One way to work around that is to truncate the programs that we build to be the same size as the one obtained from solana program dump (if the on-chain one was the larger one), and then we can hash both. Alternatively we can truncate the downloaded program to be the same size as the one we built, and then we need to confirm that the excess consists only of zeros.
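For concreteness, a minimal sketch of that second approach, assuming the locally built program is lido.so (the program ID and file names are placeholders, and stat -c and truncate are the GNU variants):

solana program dump «program-id» onchain.so              # dump the zero-padded account
size=$(stat -c %s lido.so)                               # size of the local build
tail -c +$((size + 1)) onchain.so | tr -d '\0' | wc -c   # must print 0: the excess is all zeros
truncate -s "$size" onchain.so                           # cut the dump down to the local size
sha256sum lido.so onchain.so                             # the hashes should now match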

@ruuda (Contributor) commented Jun 3, 2021

I have one silly question ... where do the release artifacts end up? I ran buildimage.sh successfully, and Docker printed:

Successfully built d2dbed8a2020
Successfully tagged chorusone/solido:d65d608

But nothing in my working directory changed. If I want to upload the .so file using solana program deploy, then what is the next step? Do I need to manually extract the programs from the image? Or am I supposed to start a shell in there through Docker and run solana program deploy inside the container?
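One way to extract the artifacts without starting a shell, as a sketch (docker create makes a stopped container whose filesystem docker cp can read; the in-image path is the one mentioned in the next comment):

docker create --name solido-extract chorusone/solido:d65d608
docker cp solido-extract:/root/.local/share/solana/install/active_release/bin/solido/deploy ./deploy
docker rm solido-extract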

@ruuda (Contributor) commented Jun 3, 2021

I started a shell in the container and navigated to /root/.local/share/solana/install/active_release/bin/solido/deploy. These are the hashes I got out:

sha256sum *.so
1f3faecc820f788684ea1b70170d96a279d6fe6f9d63faeffeb46713aa623ef8  lido.so
0137f35edf794529c3c52e7d9b3192349b2dd18f06365bd737e06c50fb1fa4ca  multisig.so
dd85659dff0d3d00be284684efdcd118ff3f5f53aca36265f5971b816c3959d6  spl_math.so
8aff32e2c1965d73762b238521d7012a481dca58544ba590d23643133ad5f849  spl_stake_pool.so
da9ffe1a176d80e56be39d828b600afd49e852bce638a6aee1ef2ef70bf1a93d  spl_token.so

Do they match yours? They differ from the ones that I get when I build locally, without the container, but that was expected.

Dockerfile
&& sha256sum multisig.so >> multisig.hash \
&& sha256sum spl_math.so >> spl_math.hash \
&& sha256sum spl_stake_pool.so >> spl_stake_pool.hash \
&& sha256sum spl_token.so >> spl_token.hash
@ruuda (Contributor):

These hashes by themselves are not so useful, because unfortunately Solana zero-pads programs when you upload them (because accounts have a fixed length that you need to decide on up front). So when we want to compare checksums of the on-chain programs, they will differ due to the zero padding.

One thing we could do is to truncate the programs we download from the chain to be the same length as these files here, and then confirm that we only sliced off zeros at the end. Or we could do the opposite and zero-pad these files, but then we need to know the length of the on-chain programs.
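A sketch of the zero-padding variant, assuming the on-chain dump from solana program dump is onchain.so, and working on a copy so the build artefact stays untouched (GNU truncate extends a file with zero bytes):

cp lido.so lido-padded.so
truncate -s $(stat -c %s onchain.so) lido-padded.so   # pad to the on-chain account length
sha256sum lido-padded.so onchain.so                   # should match if the contents agree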

@glottologist (Contributor, Author):

Yes, we definitely need a better way. I'll raise a ticket to look into the program dump approach you suggested.

Dockerfile Outdated
&& sha256sum solido >> solido.hash


FROM debian:stable-slim@sha256:463cabea60abc361ab670570fe30549a6531cd9af4a1b8577b1c93e9b5a1d369
@ruuda (Contributor):

The part below is no longer needed for building the programs, right? Only for running the test validator. Can we split that out, so we have a way of just building the programs?

@glottologist (Contributor, Author):

Sorry, it is a little unclear what you mean. Do you mean why we use debian:stable-slim for the second-stage container? If so, it is mostly to reduce image size (a saving of 2 GB).

@ruuda (Contributor):

If I understand correctly, there are two things this container is doing:

  1. Build the BPF programs and CLI client.
  2. Start solana-test-validator

At this point, part 1 is done, and the parts below are there to make part 2 work. But couldn’t we separate those two things into separate containers? For multisig participants, part 2 is not needed; that’s only for local development, if you don’t want to install solana-test-validator directly on your host.
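As an aside, one way to get a "build the programs only" flow without a second Dockerfile would be named build stages, since docker build can stop at a given stage; the stage name here is an assumption:

docker build --target builder -t chorusone/solido-build .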

@glottologist (Contributor, Author):

Not quite. The container itself doesn't run the validator. The first stage installs Solana, copies in the already-built CLI (because of the jemalloc issue), and also builds the BPF programs. These are then just copied into a Debian image in order to reduce image size. When you are using the testnet, it is Tilt that runs the validator when it spins up the solido.yml manifest.
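A condensed, illustrative sketch of that structure (not the actual Dockerfile; the image tags, install URL, and paths are assumptions):

FROM rust:1.52 AS builder
# Install the pinned Solana toolchain.
RUN sh -c "$(curl -sSfL https://release.solana.com/v1.6.9/install)"
# Copy in the CLI built outside the container (jemalloc workaround).
COPY ./cli/solido /build/cli/solido
# Build the BPF programs; the .so files land in target/deploy.
COPY . /build/src
RUN cd /build/src && cargo build-bpf

# Second stage: only the artefacts, on a slim base, to keep the image small.
FROM debian:stable-slim
COPY --from=builder /build/src/target/deploy /solido/deploy
COPY --from=builder /build/cli /solido/cli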

@ruuda (Contributor):

> These are then just copied into a Debian image in order to reduce image size.

Ah, I see. But since you need to have the Rust image anyway to build, what is the advantage of this? I can see why this is desirable if you build the image once and then distribute it to many people, but in this case, we want others to build it themselves and not download something pre-built, so they can verify that the artifact was really produced from the same source.

@glottologist (Contributor, Author):

"we want others to build it" -> It is still not clear to me that this is the only case, In conversations I have had with @malikankit previously it could be the case that we will publish this as a pre-built image for people to use. @malikankit your thoughts on this?

In any case, the multi-stage build (i.e. using the Rust base container to build and then copying into a Debian slim container) is purely to reduce the size of the final image.

@ruuda (Contributor):

> it could be the case that we will publish this as a pre-built image for people to use

But who would use it? Lido users interact with the on-chain program; they never need the BPF program or the CLI. The uploader needs to build the BPF program anyway. And the multisig participants should confirm that the uploader uploaded the right thing, so they would need to build from source too.

For the CLI program, and in the future the maintenance command, I suppose multisig participants should build those from source as well; otherwise we could just provide a rigged binary that e.g. prints that a proposed transaction adds validator X, even when the transaction really adds validator Y.

@glottologist (Contributor, Author):

Yeah, as I said, it is a bit unclear to me what the use case for a publishable image would be down the line. However, reducing image size is generally good practice, although I appreciate your point that if you are building the image locally, you won't reduce overall disk use, due to the base containers used during the build.

nix/rust.nix Outdated
@@ -0,0 +1,10 @@
{ sources ? import ./sources.nix }:
@ruuda (Contributor):

This is no longer used, is it? We now get the Rust toolchain from the container image. In fact, we don’t use Nix at all to build the BPF programs and CLI, right? Can we remove it for now?

@glottologist (Contributor, Author):

I use the nix-shell for dev and prefer pinned sources.

testnet/Tiltfile Outdated
@@ -0,0 +1,12 @@
load('ext://namespace', 'namespace_yaml')
@ruuda (Contributor):

I didn’t look at the testnet directory, but it seems to me we don’t need it to build the programs. Can we remove it for now?

@glottologist (Contributor, Author):

It is only needed for using the testnet. You can build in two ways (with the same Dockerfile):

  • just building the image with the buildimage script, or
  • spinning up the testnet, which builds the image and packages the artefacts into the container as part of the testnet.

@ruuda (Contributor) commented Jun 3, 2021

I was curious what the difference would be between the .so files built locally and in the container. One difference is that rustc embeds source file paths in panic messages, and they are different between my local build and the container. The remaining differences might be just offsets being different due to the different string lengths. Rustc does have --remap-path-prefix for this, so maybe eventually the output could be made reproducible even without containers.

@glottologist (Contributor, Author)

> I was curious what the difference would be between the .so files built locally and in the container. One difference is that rustc embeds source file paths in panic messages, and they are different between my local build and the container. The remaining differences might be just offsets being different due to the different string lengths. Rustc does have --remap-path-prefix for this, so maybe eventually the output could be made reproducible even without containers.

It might be worth doing a bindiff on the two files built in different ways at some point to deep dive into the differences. Would be interesting to know.

@ruuda (Contributor) commented Jun 3, 2021

> It might be worth doing a bindiff on the two files built in different ways at some point to deep dive into the differences. Would be interesting to know.

Yeah, this is what diffoscope does. These are the differences: solido-diff.html
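For reference, producing such a report looks roughly like this (the input paths are placeholders):

diffoscope --html solido-diff.html local-build/lido.so container-build/lido.so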

@glottologist (Contributor, Author)

>> It might be worth doing a bindiff on the two files built in different ways at some point to deep dive into the differences. Would be interesting to know.
>
> Yeah, this is what diffoscope does. These are the differences: solido-diff.html

Did you try with --remap-path-prefix?

@ruuda (Contributor) commented Jun 3, 2021

> Did you try with --remap-path-prefix?

I tried briefly, but I couldn’t find a way to make cargo build-bpf pass flags to rustc; it doesn’t seem to respect RUSTFLAGS.
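For a plain cargo build the flag would normally go through RUSTFLAGS, as in this sketch (the path mapping is illustrative; as noted, cargo build-bpf appears to ignore it):

RUSTFLAGS="--remap-path-prefix=$PWD=/build" cargo build --release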

@glottologist (Contributor, Author)

>> Did you try with --remap-path-prefix?
>
> I tried briefly, but I couldn’t find a way to make cargo build-bpf pass flags to rustc; it doesn’t seem to respect RUSTFLAGS.

Maybe worth a ticket to explore further down the line.

@glottologist (Contributor, Author)

@ruuda I fixed the issue with building inside the container. Now everything is done in the container, so there is no cargo clean in the local repo.
The pinned image on this branch after a build should be chorusone/solido:f7b0192.
Hashes in my image are:
/deploy
1f3faecc820f788684ea1b70170d96a279d6fe6f9d63faeffeb46713aa623ef8 lido.so
0137f35edf794529c3c52e7d9b3192349b2dd18f06365bd737e06c50fb1fa4ca multisig.so
dd85659dff0d3d00be284684efdcd118ff3f5f53aca36265f5971b816c3959d6 spl_math.so
8aff32e2c1965d73762b238521d7012a481dca58544ba590d23643133ad5f849 spl_stake_pool.so
da9ffe1a176d80e56be39d828b600afd49e852bce638a6aee1ef2ef70bf1a93d spl_token.so
/cli
fb8fd6f6efa57df9027ec1e3510ce10f665339a71b78d4322d4dce7a927b0beb solido

@ruuda (Contributor) commented Jun 4, 2021

> Hashes in my image are:

That matches what I got above 🎉

@ruuda (Contributor) left a review:

The Dockerfile and buildimage.sh look good to me! But can we remove nix/ and testnet/ from this PR? They are not needed to build the programs.

@glottologist requested a review from ruuda on June 7, 2021 at 14:39.
@ruuda (Contributor) left a review:

LGTM!

@glottologist merged commit 8b22c19 into main on Jun 8, 2021.
@ruuda deleted the 126-reproducible-builds branch on June 22, 2021 at 07:56.