feat #126: Added testnet and image builds with pinned Rust and Solana… #133
Conversation
… inside the container build
I haven’t found an easy way, but one way to work around that is to …
I have one silly question ... where do the release artifacts end up? I ran …
But nothing in my working directory changed. If I want to upload the …
I started a shell in the container and navigated to …
Do they match yours? They differ from the ones that I get when I build locally, without the container, but that was expected.
&& sha256sum multisig.so >> multisig.hash \
&& sha256sum spl_math.so >> spl_math.hash \
&& sha256sum spl_stake_pool.so >> spl_stake_pool.hash \
&& sha256sum spl_token.so >> spl_token.hash
These hashes by themselves are not so useful, because unfortunately Solana zero-pads programs when you upload them (because accounts have a fixed length that you need to decide on up front). So when we want to compare checksums of the on-chain programs, they will differ due to the zero padding.
One thing we could do is to truncate the programs we download from the chain to be the same length as these files here, and then confirm that we only sliced off zeros at the end. Or we could do the opposite and zero-pad these files, but then we need to know the length of the on-chain programs.
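The truncate-and-verify idea could be sketched in shell. This is a hypothetical helper, not part of the PR; it assumes the on-chain dump was fetched separately (e.g. with `solana program dump <PROGRAM_ID> onchain.so`):

```shell
# Hypothetical helper: compare a locally built program with a zero-padded
# on-chain dump. Truncates the dump to the local file's length, verifies
# that only zero bytes were sliced off, then compares the checksums.
check_onchain_matches() {
  local_so=$1
  onchain_so=$2
  local_len=$(wc -c < "$local_so")

  # Slice the on-chain dump down to the length of the local build.
  head -c "$local_len" "$onchain_so" > "$onchain_so.truncated"

  # Everything past the local length must be zero padding.
  nonzero=$(tail -c "+$((local_len + 1))" "$onchain_so" | tr -d '\0' | wc -c)
  if [ "$nonzero" -ne 0 ]; then
    echo "non-zero bytes beyond local length; refusing to compare" >&2
    return 1
  fi

  # Now the two files have equal length and the hashes are comparable.
  a=$(sha256sum "$local_so" | cut -d ' ' -f 1)
  b=$(sha256sum "$onchain_so.truncated" | cut -d ' ' -f 1)
  if [ "$a" = "$b" ]; then echo "match"; else echo "MISMATCH"; fi
}
```

The same check could of course go the other way (zero-pad the local file), but then we'd need to know the on-chain account length up front.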
Yes, we definitely need a better way. I'll raise a ticket to look into the program dump you suggested.
Dockerfile
Outdated
&& sha256sum solido >> solido.hash
FROM debian:stable-slim@sha256:463cabea60abc361ab670570fe30549a6531cd9af4a1b8577b1c93e9b5a1d369
The part below is no longer needed for building the programs, right? Only for running the test validator. Can we split that out, so we have a way of just building the programs?
Sorry, it is a little unclear what you mean. Do you mean why we use debian:stable-slim for the second-stage container?
If so, it is mostly to reduce the image size (a saving of about 2 GB).
If I understand correctly, there are two things this container is doing:
- Build the BPF programs and CLI client.
- Start solana-test-validator.

At this point, part 1 is done, and the below is there to make part 2 work. But couldn’t we separate those two things into separate containers? For multisig participants, part 2 is not needed; that’s only for local development if you don’t want to install solana-test-validator directly on your host.
Not quite. The container itself doesn't run the validator. The first stage installs Solana, copies in the already-built CLI (because of the jemalloc issue), and also builds the BPF programs. These are then just copied into a Debian image to reduce image size. When you are using the testnet, it is Tilt that runs the validator when it spins up the solido.yml manifest.
> These are then just copied into a debian image in order to reduce image size.
Ah, I see. But since you need to have the Rust image anyway to build, what is the advantage of this? I can see why this is desirable if you build the image once and then distribute it to many people, but in this case, we want others to build it themselves and not download something pre-built, so they can verify that the artifact was really produced from the same source.
"we want others to build it" -> It is still not clear to me that this is the only case. In conversations I have had with @malikankit previously, it came up that we might publish this as a pre-built image for people to use. @malikankit, your thoughts on this?
In any case, the multi-stage build (i.e. using the Rust base container to build and then copying into a Debian slim container) is purely to reduce the size of the final image.
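As a rough illustration of the layout being described (hypothetical tags and paths, not the actual Dockerfile from this PR):

```dockerfile
# Stage 1: full Rust toolchain, used only to build.
FROM rust:1.57-slim AS builder
WORKDIR /src
COPY . .
RUN cargo build-bpf

# Stage 2: slim runtime image; only the built artifacts are copied over,
# so the final image does not carry the toolchain layers.
FROM debian:stable-slim
COPY --from=builder /src/target/deploy/*.so /opt/solido/deploy/
```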
> it could be the case that we will publish this as a pre-built image for people to use
But who would use it? Lido users interact with the on-chain program, they never need the BPF program or the CLI. The uploader needs to build the BPF program anyway. And the multisig participants should confirm that the uploader uploaded the right thing, so they would need to build from source too.
For the CLI program and in the future the maintenance command, I suppose multisig participants should build it from source as well, otherwise we could just provide a rigged binary that e.g. prints that a proposed transaction adds validator X, even when the transaction really adds validator Y.
Yeah, as I said, it is a bit unclear to me what the use case for a publishable image would be down the line. However, reducing image size is generally good practice, although I appreciate your point that if you are building the image locally, you won't reduce the space used overall, due to the base containers used during the build.
nix/rust.nix
Outdated
@@ -0,0 +1,10 @@
{ sources ? import ./sources.nix }:
This is no longer used, is it? We now get the Rust toolchain from the container image. In fact, we don’t use Nix at all to build the BPF programs and CLI, right? Can we remove it for now?
I use the nix-shell for dev and prefer pinned sources.
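For reference, the pinned-sources pattern mentioned here usually looks something like this (a hypothetical shell.nix; names follow the niv convention implied by `{ sources ? import ./sources.nix }` above):

```nix
{ sources ? import ./nix/sources.nix }:
let
  # nixpkgs is pinned in nix/sources.nix, so every dev shell
  # resolves to the same toolchain versions.
  pkgs = import sources.nixpkgs {};
in
pkgs.mkShell {
  buildInputs = [ pkgs.rustc pkgs.cargo ];
}
```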
testnet/Tiltfile
Outdated
@@ -0,0 +1,12 @@
load('ext://namespace', 'namespace_yaml')
I didn’t look at the testnet directory, but it seems to me we don’t need it to build the programs; can we remove it for now?
It is only needed for using the testnet. You can build in two ways (with the same Dockerfile):
- just building the image with the buildimage script, or
- spinning up the testnet, which builds and packages everything into the container.
I was curious what the difference would be between the …
It might be worth doing a bindiff on the two files built in different ways at some point, to deep-dive into the differences. Would be interesting to know.
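For a quick bindiff without extra tooling, `cmp -l` already lists every differing byte; a hypothetical helper:

```shell
# Hypothetical helper: report the first byte offset (1-based) at which
# two builds of the same program diverge. `cmp -l` prints one line per
# differing byte: "offset value1 value2" (byte values in octal).
first_divergence() {
  cmp -l "$1" "$2" | head -n 1
}
```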
Yeah, this is what …
Did you try with --remap-path-prefix?
I tried briefly, but I couldn’t find a way to make …
Maybe worth a ticket to explore further down the line.
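If this gets picked up again: rustc's --remap-path-prefix flag can also be set through Cargo configuration, which may make it easier to apply uniformly across the workspace. A hypothetical `.cargo/config.toml` entry (the paths are illustrative):

```toml
[build]
# Replace the absolute build directory embedded in debug info and panic
# messages with a fixed prefix, so local and container builds agree.
rustflags = ["--remap-path-prefix=/home/user/solido=/build"]
```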
@ruuda I fixed the issue with building inside the container. Now everything is done in the container, so no cargo clean in the local repo.
That matches what I got above 🎉
The Dockerfile and buildimage.sh look good to me! But can we remove nix/ and testnet/ from this PR? They are not needed to build the programs.
LGTM!
I have added the tooling for building the artefacts inside the container. The Rust version and the version of the Solana toolchain are pinned, and the artefacts are packaged inside the container in the following dirs:
- solido -> the Solido CLI
- deploy -> the .so packages that can be deployed on-chain

There are two ways of building the image: with the buildimage script, or by spinning up the testnet with Tilt.
I will update the front-facing docs in the widget repo to reflect the new structure and how to use it, along with docs in the forthcoming multisig repo for the dry run.
There are two outstanding issues:
After the build, the CLI and the deploy .so files are hashed and the result is stored alongside the artefacts in the container. @ruuda I couldn't find a way of getting a program hash from an on-chain program, do you remember where you saw it? If not, then maybe we should just publish those hashes in the docs somewhere at the point of deployment onto the mainnet. Thoughts?