Release 0.1.0, What can we improve? #61

Open

magowan opened this issue Sep 29, 2022 · 20 comments

Comments

@magowan
Member

magowan commented Sep 29, 2022

Please reflect on our journey to 0.1.0 and leave your comments here on areas you feel we can improve.

Please add ideas on how we can improve these areas in your comment.
Thanks

@fitzthum
Member

We didn't do a very good job at tagging our projects.

@fitzthum
Member

fitzthum commented Oct 3, 2022

We were fairly informal regarding the freeze. Was it a code freeze (no)? Was it a feature freeze (sort of)? What were the goals of the freeze? Did we achieve them? Do we have a plan for future freezes?

@stevenhorsman
Member

Slightly related to the tagging and code freezes, I think there was a lack of clarity and communication about which versions of the different components got into the release and when they were tested. For example, we cut the final payload on Tuesday, so any PRs merged to components after that weren't included, and we pulled in specific commits of some components (e.g. image-rs), so some of them hadn't been updated in a few weeks.
We also waited until the operator CI had finished before merging in the RC1 payload, at 12:51 UTC on Wednesday, so any testing run before then was against something we didn't release.

@stevenhorsman
Member

The current process to create the payloadImage is pretty reliable, but manual to kick off, so it is susceptible to human error (e.g. not correctly updating the branch you are building from).
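
A minimal sketch of what scripting the kick-off could look like, assuming the build is (or becomes) a workflow_dispatch-triggered GitHub Actions workflow; the repository, workflow file name and branch below are placeholders, not the actual setup:

```python
#!/usr/bin/env python3
# Sketch: trigger the payload-image build workflow via the GitHub API instead of
# starting it by hand, so the branch to build from is always passed explicitly.
# The repository, workflow file name and branch are placeholders/assumptions.
import os
import requests

OWNER = "confidential-containers"       # assumption
REPO = "operator"                       # assumption
WORKFLOW = "build-payload-image.yaml"   # hypothetical workflow file name
BRANCH = "CCv0"                         # branch to build from (placeholder)

resp = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/actions/workflows/{WORKFLOW}/dispatches",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"ref": BRANCH},
)
resp.raise_for_status()
print(f"Triggered {WORKFLOW} on {BRANCH}")
```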

@stevenhorsman
Member

We didn't manage to include the signature verification features in the sample payload, and the payload was just created from one of my branches, so it is not tracked or repeatable. For the next release, if we are still producing a sample payload, we should have some automated/scriptable process to add the required artefacts to the kata image.
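
A rough sketch of the kind of scripted step that could make the sample payload repeatable: copy the required artefacts into the guest rootfs before the image is built. All paths and file names below are hypothetical examples, not the actual layout:

```python
#!/usr/bin/env python3
# Sketch: copy the artefacts needed for signature verification into the guest
# rootfs before the image build step, so the sample payload is reproducible from
# a script rather than a personal branch. Paths/file names are hypothetical.
import shutil
from pathlib import Path

ROOTFS = Path("rootfs")  # rootfs directory produced by the rootfs build step

# (artefact on the build host, destination inside the guest rootfs)
ARTEFACTS = [
    ("artefacts/policy.json", "etc/containers/policy.json"),
    ("artefacts/cosign.pub", "etc/containers/cosign.pub"),
    ("artefacts/agent-config.toml", "etc/agent-config.toml"),
]

for src, dst in ARTEFACTS:
    target = ROOTFS / dst
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, target)
    print(f"installed {src} -> {target}")
```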

@stevenhorsman
Member

Other agent configuration is a bit of a mess at the moment. Some things are only customisable in the agent-config.toml (the allowed endpoints and aa_kbc_params). These have to be ‘baked’ into the image, which makes it difficult for users to change them. The agent config also overrides all the other kernel_params settings, so we should consider adding the ability to pass these settings through via kernel_params, and/or update the prioritisation so that the kernel_params override the baked-in agent-config.toml. We also have a ‘default/sample’ agent-config template duplicated in a few places, so we should try to unify them if possible.
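
To illustrate the proposed prioritisation (kernel_params overriding the baked-in agent-config.toml), here is a small sketch of the merge order; the real agent is written in Rust, and the parameter names and values here are only illustrative:

```python
# Sketch of the proposed precedence: values baked into agent-config.toml act as
# defaults, and matching agent.* kernel parameters override them at boot time.
# The parameter names/values are illustrative, not the agent's actual option set.
def merge_agent_config(baked_config: dict, kernel_cmdline: str) -> dict:
    config = dict(baked_config)
    for token in kernel_cmdline.split():
        if token.startswith("agent.") and "=" in token:
            key, value = token[len("agent."):].split("=", 1)
            config[key] = value  # kernel param wins over the baked-in value
    return config

# Example: the baked-in KBC params are overridden from the kernel command line.
baked = {"aa_kbc_params": "offline_fs_kbc::null", "log_level": "info"}
cmdline = "console=ttyS0 agent.aa_kbc_params=eaa_kbc::10.0.0.1:50000"
print(merge_agent_config(baked, cmdline))
# {'aa_kbc_params': 'eaa_kbc::10.0.0.1:50000', 'log_level': 'info'}
```

With something like this, the baked-in agent-config.toml becomes a set of defaults rather than a hard override, so users can adjust behaviour without rebuilding the image.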

@stevenhorsman
Member

Probably bigger than the scope of a single release, but having a proper CD process (we’ll need some CNCF resource first, so starting to try to get that might help) would help us keep quality high and minimise code freezes.
I think the utopian goal here is that when a dev updates something in ocicrypt-rs/image-rs/attestation-agent/kata-containers we are able to run the E2E operator-based tests with that change included. I’d expect this to take much longer than a single release cycle, but the sooner we start, the sooner we finish!
If we had this system then I think the release process would be simplified a lot - on release date we'd just take the last passing 'integration' build and ship that (we'd need to ensure that the release notes still matched it, I guess).
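
One possible building block for that, sketched under the assumption that the operator repo has (or gains) a workflow listening for a repository_dispatch event: when a component repo merges a change, it notifies the operator repo so the E2E tests can run with that change included. The event type and repository names below are assumptions:

```python
#!/usr/bin/env python3
# Sketch: notify the operator repo that a component (e.g. image-rs) changed, so
# its E2E workflow can run against the new commit. The event type and repo names
# are assumptions; a workflow listening for this repository_dispatch type is assumed.
import os
import sys
import requests

TARGET = "confidential-containers/operator"  # repo that runs the E2E operator tests

component, commit = sys.argv[1], sys.argv[2]  # e.g. "image-rs" and a commit SHA

resp = requests.post(
    f"https://api.github.com/repos/{TARGET}/dispatches",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "event_type": "component-updated",  # hypothetical event type
        "client_payload": {"component": component, "commit": commit},
    },
)
resp.raise_for_status()
print(f"Asked {TARGET} to run E2E tests against {component}@{commit}")
```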

@stevenhorsman
Member

Add some CI automated tests for all the features we document and therefore 'support', e.g. secure ephemeral storage
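
A very rough skeleton of what one such CI check could look like; the manifest path, pod name and the "pod reached Ready" check are placeholders, and a real test would also need to verify the feature's actual behaviour (e.g. that the ephemeral mount really is encrypted):

```python
#!/usr/bin/env python3
# Sketch: a placeholder CI check for one documented feature. The manifest path
# and pod name are hypothetical; the only assertion here is that the pod starts,
# which a real feature test would extend with behaviour checks.
import subprocess

MANIFEST = "tests/e2e/secure-ephemeral-storage-pod.yaml"  # hypothetical manifest
POD = "secure-ephemeral-storage-test"                     # hypothetical pod name

subprocess.run(["kubectl", "apply", "-f", MANIFEST], check=True)
try:
    subprocess.run(
        ["kubectl", "wait", "--for=condition=Ready", f"pod/{POD}", "--timeout=300s"],
        check=True,
    )
    print("feature smoke test passed")
finally:
    subprocess.run(["kubectl", "delete", "-f", MANIFEST, "--ignore-not-found"], check=True)
```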

@fidencio
Member

fidencio commented Oct 6, 2022

The current process to create the payloadImage is pretty reliable, but manual to kick off, so it is susceptible to human error (e.g. not correctly updating the branch you are building from).

kata-containers/kata-containers#5330

@fidencio
Member

fidencio commented Oct 6, 2022

Probably bigger than the scope of a single release, but having a proper CD process (we’ll need some CNCF resource first, so starting to try to get that might help) would help us keep quality high and minimise code freezes. I think the utopian goal here is that when a dev updates something in ocicrypt-rs/image-rs/attestation-agent/kata-containers we are able to run the E2E operator-based tests with that change included. I’d expect this to take much longer than a single release cycle, but the sooner we start, the sooner we finish! If we had this system then I think the release process would be simplified a lot - on release date we'd just take the last passing 'integration' build and ship that (we'd need to ensure that the release notes still matched it, I guess).

#62

@fidencio
Member

fidencio commented Oct 6, 2022

We didn't do a very good job at tagging our projects.

More than that, it seems there's a general lack of understanding that we don't necessarily need to tag all the projects for a release, and that just tagging a project doesn't automatically make it part of the payloadImage used by the Operator.

I wonder if we should have some educational sessions on how things are being done on the Kata Containers side, so folks can fully understand what to test / what to use / what to expect.

@fidencio
Member

fidencio commented Oct 6, 2022

We were fairly informal regarding the freeze. Was it a code freeze (no)? Was it a feature freeze (sort of)? What were the goals of the freeze? Did we achieve them? Do we have a plan for future freezes?

As far as I understand, we had requests for "new things" coming in until the last week, such as the sample payload image (please don't take me wrong, I'm NOT blaming the ones who requested that, I'm just using this as an example).

In the future we should consider having a week or so for stabilisation and be strict that no new requests will be accepted. The moment a new thing pops up, all the validation and testing done so far is basically thrown away.

@fidencio
Member

fidencio commented Oct 6, 2022

There's a sincere need for more people to get involved with the projects we rely on. At the end of the day the reviews on Kata Containers fell mostly on the backs of a very small group of folks, who were also working on a huge set of different tasks.

We need people to get more involved with Kata Containers, as we're relying on the project, and with enough contributions become official reviewers / members of the project; then we can start spreading the load.

@sameo
Member

sameo commented Oct 6, 2022

  • 100% manual release notes generation (may want to automate some of it; see the sketch after this list)
  • Inconsistent tagging across repos
  • Lack of coordination between repo maintainers
  • Missing contributors list from the release notes
  • A few gaps compared to what we were planning (threat model, full removal of umoci/skopeo)
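
A sketch of how the first and fourth points (manual release notes, missing contributors list) could be partly automated with the GitHub API; the repository list and tag names below are assumptions and would need to match what the release actually ships:

```python
#!/usr/bin/env python3
# Sketch: generate draft release notes and a contributors list with the GitHub
# API instead of writing them fully by hand. The repo list and tag names are
# assumptions; a human would still curate the final text.
import os
import requests

HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}
REPOS = [  # assumption: the repos that feed the release
    "confidential-containers/operator",
    "confidential-containers/attestation-agent",
    "confidential-containers/image-rs",
]

for repo in REPOS:
    # Draft notes between the previous and the new tag (tag names are placeholders).
    notes = requests.post(
        f"https://api.github.com/repos/{repo}/releases/generate-notes",
        headers=HEADERS,
        json={"tag_name": "v0.2.0", "previous_tag_name": "v0.1.0"},
    )
    notes.raise_for_status()
    print(f"## {repo}\n{notes.json()['body']}\n")

    # Contributors list, which the 0.1.0 notes were missing (all-time here;
    # filtering to just the release range is left out of this sketch).
    contributors = requests.get(
        f"https://api.github.com/repos/{repo}/contributors", headers=HEADERS
    )
    contributors.raise_for_status()
    print("Contributors: " + ", ".join(c["login"] for c in contributors.json()) + "\n")
```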

@wainersm
Member

wainersm commented Oct 6, 2022

There's a sincere need for more people to get involved with the projects we rely on. At the end of the day the reviews on Kata Containers fell mostly on the backs of a very small group of folks, who were also working on a huge set of different tasks.

We need people to get more involved with Kata Containers, as we're relying on the project, and with enough contributions become official reviewers / members of the project; then we can start spreading the load.

I'd like to emphasize those points as being critical for the CoCo project long term.

On this first release very few people were reviewing the changes on Kata Containers, and a portion of those doing so didn't count towards the merge policy (2 or more reviewers who are members of the Kata Containers organisation on GitHub).

Not just for reviews; we need more developers in general to help with feature development, build, release, and CI tasks.

@sameo
Member

sameo commented Oct 6, 2022

There's a sincere need for more people to get involved with the projects we rely on. At the end of the day the reviews on Kata Containers fell mostly on the backs of a very small group of folks, who were also working on a huge set of different tasks.
We need people to get more involved with Kata Containers, as we're relying on the project, and with enough contributions become official reviewers / members of the project; then we can start spreading the load.

I'd like to emphasize those points as being critical for the CoCo project long term.

On this first release very few people were reviewing the changes on Kata Containers, and a portion of those doing so didn't count towards the merge policy (2 or more reviewers who are members of the Kata Containers organisation on GitHub).

Not just for reviews; we need more developers in general to help with feature development, build, release, and CI tasks.

I concur. That actually emphasizes the need to work harder on upstreaming. The more tightly we integrate with Kata, the higher the incentive, imho.

@wainersm
Member

wainersm commented Oct 6, 2022

Probably bigger than the scope of a single release, but having a proper CD process (we’ll need some CNCF resource first, so starting to try to get that might help) would help us keep quality high and minimise code freezes. I think the utopian goal here is that when a dev updates something in ocicrypt-rs/image-rs/attestation-agent/kata-containers we are able to run the E2E operator-based tests with that change included. I’d expect this to take much longer than a single release cycle, but the sooner we start, the sooner we finish! If we had this system then I think the release process would be simplified a lot - on release date we'd just take the last passing 'integration' build and ship that (we'd need to ensure that the release notes still matched it, I guess).

This is a must-have if we are serious about releases every 6 weeks. Otherwise the releases will come with tears and pain. :)

@wainersm
Member

wainersm commented Oct 6, 2022

Add some CI automated tests for all the features we document and therefore 'support', e.g. secure ephemeral storage

I was going to ask about how to have an automated test for secure storage in kata-containers/kata-containers#5314 :)

Still regarding tests:

  • Assess the current tests in Kata Containers, their coverage and the gaps, and of course implement more tests afterwards.
  • Perhaps have a policy like: new features don't get merged without e2e automated tests running in CI, unless there is a good technical justification not to.
  • The way the operator CI currently reuses tests is a hack. We need to figure out a better way to share tests between Kata and the operator CI.

@magowan
Member Author

magowan commented Oct 6, 2022

Scope crept a little, but that is understandable as everyone wanted to see their efforts in the first release.
Hopefully we can get into a good cadence with frequent releases, so features get added to the next release after they are ready, rather than being rushed into a release.

@dcmiddle
Member

dcmiddle commented Mar 1, 2023

@magowan, I recommend we close this issue.
