Release v0.9.0 Planning #200

Open
fitzthum opened this issue Apr 2, 2024 · 11 comments

Comments

@fitzthum
Member

fitzthum commented Apr 2, 2024

Planning issue for release v0.9.0 and releases involving Kata main in general.

Basic Process

The release process is defined by CoCo's somewhat complex dependency graph, which is approximately:

flowchart LR   
    Trustee --> Versions.yaml
    Guest-Components --> Versions.yaml
    Kata --> kustomization.yaml
    Guest-Components --> Client-tool

    subgraph Kata
        Versions.yaml
    end
    subgraph Guest-Components
    end
    subgraph Trustee
        Client-tool
    end
    subgraph Operator
        kustomization.yaml
    end
    

At least three things are left out from the above:

  1. Peer pods. This would mainly go to the right of Kata.
  2. Enclave-cc. I am assuming the release process for this will be mostly the same so it is not discussed here.
  3. Nydus-snapshotter. The snapshotter is a dependency of both the operator and the kata tests, but since the snapshotter itself isn't part of CoCo, it's left off the diagram.

Our release checklist mainly follows from this graph. One added complexity is that we go through most of the steps twice, creating a staging build to test the release before creating each release tag.
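
As a rough illustration of how a checklist order falls out of the graph, the repo-level edges can be fed to a topological sort. This is only a sketch under assumptions: the edges are one reading of the diagram, collapsed from file level to the repo that owns each file, the three omitted items (peer pods, enclave-cc, nydus-snapshotter) are left out here too, and names such as `DEPENDS_ON` and `release_order` are hypothetical labels rather than anything in the release tooling.

```python
from graphlib import TopologicalSorter

# Repo-level view of the diagram: each key lists the repos it pulls in.
# Assumption: edges are collapsed from file level (versions.yaml,
# kustomization.yaml, client-tool) to the repo that owns the file.
DEPENDS_ON = {
    "Guest-Components": set(),
    "Trustee": {"Guest-Components"},          # via the client-tool
    "Kata": {"Trustee", "Guest-Components"},  # via versions.yaml
    "Operator": {"Kata"},                     # via kustomization.yaml
}


def release_order(depends_on):
    """Return repos ordered so each one comes after everything it pulls in."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(release_order(DEPENDS_ON)))
```

With these edges there is only one valid order: Guest-Components, then Trustee, then Kata, then the Operator. The staging pass and the tagging pass would each walk the same sequence.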

Practical Challenges

Individually none of the steps on the release checklist are particularly difficult, but overall the release tends to be challenging. Two of the biggest hurdles tend to be getting approvals quickly for all the PRs and getting the CI to pass on all of the PRs. The release requires coordination between multiple maintainers and the process can be very slow if this is not well-synchronized.

The release process also tends to uncover integration bugs that require fixing.

Changes for v0.9.0

I don't yet know the full extent of the changes that are needed for v0.9.0 using Kata main. One of the first steps for that release will be figuring this out. One approach might be to look through the release checklist and try to update it before we start.

At the very least, we'll need to add some steps to handle the Trustee dependency in Kata. Kata did not pull in Trustee for v0.8.0, but it will for v0.9.0 because our tests actually use it.

Kata Main

In some ways using Kata main doesn't change much, but there are at least two things to think about.

Freezing

The release process involves making a number of PRs to kata to update various dependencies. In CCv0 this wasn't really an issue. We could make a bunch of random PRs to CCv0 and crucially we could freeze the branch while the release was happening.

To be more specific, in step 2 of the current release checklist, we update the version of the guest-components that kata pulls in. We make sure that the tests pass here. This is essentially our release candidate. Later in step 12 we do this again, but this time with the tagged version of guest components. What happens if Kata changes in between those steps?

One way to cope with this would be to create a branch in Kata corresponding to each release. This solves some problems but it also introduces some new questions, like what to do if we need to add a fixup to Kata during the release process. I'm not sure what the best solution is here.

Synchronization with Kata Releases

Another somewhat thorny question is how we should relate to Kata releases. I guess the ideal situation would be for the bundle that we release via the operator to correspond with a release of Kata. I'm not sure how the timing would work out for this, especially for v0.9.0. This question might also be related to the last one.

Optimizations

We should always be on the lookout for ways to simplify the release process, although maybe for v0.9.0 we shouldn't make too many changes at once.

Since v0.8.0 we have made some changes to the way that the Trustee release works. The new process is documented here. Previously when we tagged the repo it triggered a workflow that built the release images. Now we reuse the most recently built image from the CI as the release image. Essentially this avoids having to build the image again.

This hints at a deeper question of whether there are more release steps that we could skip. Rather than building new images for the release, can we just tag the existing candidate image? Taking this a step further, do we need to point to dependencies by release tag at all? Currently in step 2, we point Kata to the candidate image, and then later in step 12 we point to the release image, but if the tests pass in steps 2 and 3, do we even need to do step 12?

@fitzthum
Member Author

fitzthum commented Apr 2, 2024

maybe of interest to @jepio and @gkurz and @portersrc

@fitzthum
Member Author

fitzthum commented Apr 4, 2024

Some more details about changes from v0.8.0

Nydus-snapshotter

The snapshotter actually was part of the last release, but it isn't mentioned anywhere in the release checklist. Do we need to do something more for v0.9.0? The nydus snapshotter isn't part of CoCo, so we won't be cutting a release of it as part of our release. I also don't think we will necessarily be updating to the latest version during the release. I would expect these kinds of updates to happen during the normal development cycle. One crucial thing to watch out for is that both the kata tests and the operator depend on the nydus snapshotter. During the release we should make sure that they are using the same version.

For the snapshotter, I propose a simple item on the checklist, where we check this against this. Do we need to do anything else?
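
A minimal sketch of what such a checklist item could boil down to, assuming the two pinned versions have already been read out of wherever the kata tests and the operator record them. The function name and the sample version strings are hypothetical placeholders, not real pins.

```python
def snapshotter_versions_match(kata_tests_version, operator_version):
    """Compare two pinned versions, ignoring whitespace and a leading 'v'."""
    def normalize(version):
        return version.strip().removeprefix("v")

    return normalize(kata_tests_version) == normalize(operator_version)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    if not snapshotter_versions_match("v1.2.3", "1.2.3"):
        raise SystemExit("nydus-snapshotter versions differ")
    print("nydus-snapshotter versions agree")
```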

The Tests

For v0.8.0 the operator tests were the ultimate signal for the release. To run these we needed the Kata CI to generate a bundle. Now the Kata CI itself runs on the bundle (rather than separately generated CI artifacts) and the Kata CI has tests to exercise all the supported CoCo features. This means that we probably do not have to go through our whole checklist twice. Once the tests are passing on Kata, we can be pretty sure that the bundle is valid. This potentially solves our issue with freezing. Also note that we can use one PR to try out a few different things. For instance we can make an initial version with the candidate hashes of each dependency, then if the CI passes, we can tag the dependencies, and update the PR to use the tags.
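
The single-PR flow described above could be sketched roughly as follows. The extra guard, that each tag must resolve to exactly the commit the CI tested, is an assumption added for illustration and not something the checklist currently states. The function name, dependency names and hashes are hypothetical.

```python
def promote_to_tags(candidate_pins, tag_commits, tag):
    """Swap CI-tested commit hashes for a release tag.

    candidate_pins: dependency name -> commit hash that passed CI
    tag_commits:    dependency name -> commit the new tag resolves to
    Raises ValueError if a tag does not point at the tested commit.
    """
    promoted = {}
    for name, tested in candidate_pins.items():
        tagged = tag_commits.get(name)
        if tagged != tested:
            raise ValueError(
                f"{name}: tag {tag} is at {tagged}, but CI tested {tested}"
            )
        promoted[name] = tag
    return promoted


if __name__ == "__main__":
    # Placeholder hashes for illustration only.
    candidates = {"guest-components": "abc123", "trustee": "def456"}
    print(promote_to_tags(candidates, dict(candidates), "v0.9.0"))
```

If the guard holds, updating the PR from hashes to tags does not change what was tested, which is the reason a second full pass may be unnecessary.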

Trustee

I mentioned in the first note that we now have a trustee dependency in kata. This should be relatively easy to handle. We just need to reorder the checklist a bit. In trustee we have a dependency on the guest-components. This is part of the client-tool (and not even a feature that we use in kata). We should continue to use the hash of the guest-components repo for this dependency.

Tomorrow I will try to put all this together into a PR to the checklist.

@mythi
Contributor

mythi commented Apr 5, 2024

We should always be on the lookout for ways to simplify the release process, although maybe for v0.9.0 we shouldn't make too many changes at once.

It looks to me like the sync to Kata releases is left a bit open in this proposal. Given Kata is moving to a monthly release cadence, would we be OK to follow it or, let's say, release every odd/even Kata minor version? I find it risky to do "in-between" releases.

Another comment I have is related to the dependency diagram and versions.yaml. It's rather minor, but AFAUI guest-components' KBS protocol has a (weak) dependency on Trustee's version, but from Kata's perspective the Trustee version is only implicit and should be defined by which guest-components version is used. So, any bump to Trustee should in theory have the corresponding validated guest-components KBS protocol bump as well. It would be good to have this at least documented and perhaps for clarity, have all of these bundled to a single versions.yaml entry.

In trustee we have a dependency on the guest-components. This is part of the client-tool (and not even a feature that we use in kata)

This is a rather annoying circular dependency. Does it make sense to keep the client-tool in trustee repo or move it to somewhere else?

@fitzthum
Member Author

fitzthum commented Apr 5, 2024

Given Kata is moving to a monthly release cadence, would we be OK to follow it or, let's say, release every odd/even Kata minor version? I find it risky to do "in-between" releases.

I think we should do something like this. We just need to figure out what that would actually look like in terms of the Kata release process (i.e. when would it be best to create our PR to kata updating the coco stuff).

So, any bump to Trustee should in theory have the corresponding validated guest-components KBS protocol bump as well. It would be good to have this at least documented and perhaps for clarity, have all of these bundled to a single versions.yaml entry.

The relationship between guest-components and Trustee is interesting. I guess the coupling isn't really that strong given that the KBS protocol itself doesn't change very frequently, but from the POV of a user it is true that they should choose the version of Trustee they're using based on what platforms/features they are expecting to use with it.

For Kata, versions.yaml will always specify a pair that work together, but maybe we should also give users some hint about what to use. I guess the Trustee repo will have a corresponding release tag. Maybe that is enough info.

This is a rather annoying circular dependency. Does it make sense to keep the client-tool in trustee repo or move it to somewhere else?

We have talked about moving it to guest components, but it's a little bit thorny since we use the client-tool in the KBS tests and now in the kata tests. I am still open to this, though.

@fitzthum
Member Author

fitzthum commented Apr 5, 2024

I think we can probably ignore the client-tool dependency for the release actually. It should be updated during the normal course of development. Even if there were a fatal mismatch it would not affect Kata.

@mythi
Contributor

mythi commented Apr 8, 2024

Given Kata is moving to a monthly release cadence, would we be OK to follow it or, let's say, release every odd/even Kata minor version? I find it risky to do "in-between" releases.

I think we should do something like this. We just need to figure out what that would actually look like in terms of the Kata release process (i.e. when would it be best to create our PR to kata updating the coco stuff).

I believe it depends on whether we are ready to follow their monthly cadence or we skip every other release. If the agreement is to go with a monthly cadence, it should be possible to submit Kata SHA versions.yaml updates on a regular basis and then have our repos tagged after each Kata release.

So, any bump to Trustee should in theory have the corresponding validated guest-components KBS protocol bump as well. It would be good to have this at least documented and perhaps for clarity, have all of these bundled to a single versions.yaml entry.

The relationship between guest-components and Trustee is interesting. I guess the coupling isn't really that strong given that the KBS protocol itself doesn't change very frequently, but from the POV of a user it is true that they should choose the version of Trustee they're using based on what platforms/features they are expecting to use with it.

It's more than the KBS protocol. Between 0.8.0 and 0.9.0 there's a breakage in the evidence format the attesters provide vs what AS can handle and that's transparent to the protocol itself. For the time being we may have to be explicit that there's no "version skew".

For Kata, versions.yaml will always specify a pair that work together, but maybe we should also give users some hint about what to use. I guess the Trustee repo will have a corresponding release tag. Maybe that is enough info.

This is a rather annoying circular dependency. Does it make sense to keep the client-tool in trustee repo or move it to somewhere else?

We have talked about moving it to guest components, but it's a little bit thorny since we use the client-tool in the KBS tests and now in the kata tests. I am still open to this, though.

Moving it to guest-components would relax the cross-repo dependency problem at least.

@fitzthum
Member Author

fitzthum commented Apr 8, 2024

See #201 for updates to the release checklist

@fitzthum
Member Author

fitzthum commented Apr 8, 2024

I believe it depends on whether we are ready to follow their monthly cadence or we skip every other release. If the agreement is to go with a monthly cadence, it should be possible to submit Kata SHA versions.yaml updates on a regular basis and then have our repos tagged after each Kata release.

Yeah, I think we should do something like this. I'm just not sure if we want to do our part during their release (i.e. during some freeze) or right after.

Moving it to guest-components would relax the cross-repo dependency problem at least.

Yes, but we would have a dependency on guest-components for the Trustee tests.

@mythi
Contributor

mythi commented Apr 9, 2024

Moving it to guest-components would relax the cross-repo dependency problem at least.

Yes, but we would have a dependency on guest-components for the Trustee tests.

Was there a plan to have the client tool released as a binary?

@bpradipt
Member

bpradipt commented Apr 9, 2024

I can think of a few other requirements for the 0.9.0 release:

  1. Having a runtimeClass to support non-TEE h/w but with all the CoCo features (like pulling images in the guest, etc.)
  2. Guest image with sample attester
  3. Instructions to enable sample attester in KBS (policy config)

@fitzthum @stevenhorsman @fidencio @wainersm does this make sense? Let me know if these need to be discussed in some other thread and I'll move them there.

@fitzthum
Member Author

fitzthum commented Apr 9, 2024

The sample attester is now always baked into the CDH/AA so as long as we are using the confidential rootfs we will have it. No need for any config there either.
