Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI Workflow Refactoring #1637

Merged
merged 218 commits into from
Feb 12, 2024
Merged

CI Workflow Refactoring #1637

merged 218 commits into from
Feb 12, 2024

Conversation

bennibbelink
Copy link
Contributor

This branch uses a custom action (currently in my repo) to build docker images using the Dockerfiles we have in our repos. The action is as follows:

  • optionally download an docker image artifact (as a tarball) and load the image
  • checks out a specified repository and a specified ref (defaults to the commit that triggered the workflow)
  • builds a docker image via the Dockerfile specified and optionally tags it according to an input argument and uploads a tarball artifact

I separated the Dockerfile into one that just builds the dependency images and one that builds cyclus. I also moved code coverage into its own workflow since it is more in-depth than just building a Dockerfile

Changes to the old workflows:

build_test.yml

  • uses the custom action to build cyclus from cyclus.dockerfile
  • includes downstream testing of cycamore and cymetric, which is only triggered on PR and "soft" fails

build_test_publish.yml

I couldn't find a great way to implement this without redundancy (or the multi-stage-build action we used before), so I'm open to ideas (I did find an action that might allow us to share artifacts between workflows but I fear that may end up adding much more complexity than its worth)

  • builds the dependency image from deps.dockerfile using the custom action and uploads a tarball artifact
  • builds cyclus using the custom action (after loading the tarball) and push image if the workflow was triggered on push (to main)
  • IF the cyclus build succeeds, push the dependency image to ghcr

publish_release.yml

  • uses the custom action to build and push a cyclus image tagged as the github.ref_name

I'm sure there must be some details I'm forgetting. @gonuke what thoughts do you have?

@bennibbelink
Copy link
Contributor Author

Ah shoot looks like I need to fix the default tag. Will find fix soon

@gonuke
Copy link
Member

gonuke commented Dec 20, 2023

I think this PR is prematurely pushing updated docker images to GHCR as latest. Most of our workflows have only done this when pushing to main (aka merging a PR). I haven't been able to confirm how that logic works here, but I do know that there is a sudden failure due to a lack of a libxml library in #1634, and I think I've traced it to this.

@gonuke
Copy link
Member

gonuke commented Dec 20, 2023

Thanks for building this system for us to review. I think there's a lot to discuss and so many different factors to consider.

@bennibbelink
Copy link
Contributor Author

I think this PR is prematurely pushing updated docker images to GHCR as latest. Most of our workflows have only done this when pushing to main (aka merging a PR). I haven't been able to confirm how that logic works here, but I do know that there is a sudden failure due to a lack of a libxml library in #1634, and I think I've traced it to this.

My fault, sorry about that.

What is the backtrace you've found? I'm seeing the image sha used by failing workflow in #1634 as 4481a2f20cc68136982538b3a2a8a263dc818f282fd9374c2d135d8dada4855e, which is consistent with the version of latest in ghcr. But the latest successful build/push workflow associated with this PR pushes a 22.04-conda-deps image with sha 190ff26d1e527a489547b2dbf088760e9beab6662315340a9aa9490ae6ec9083.

If you want to close this PR until we fix the problem so that CI works for other PRs, that's fine with me.

@gonuke
Copy link
Member

gonuke commented Dec 23, 2023

I'm not sure if we need to close it, or just prevent it from pushing with the latest tag?

@gonuke
Copy link
Member

gonuke commented Dec 23, 2023

I retagged an older version with the latest tag - hopefully that fixes it for now

Copy link
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look forward to chatting about some of these questions.

conda,
]

name: Push Dependency Image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only happen on a push to develop right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. Right now this workflow only runs on PR and a push to main, do you want to finally introduce a develop branch workflow like we've discussed in the past?

.github/workflows/build_test.yml Outdated Show resolved Hide resolved
docker/cyclus.dockerfile Outdated Show resolved Hide resolved
CHANGELOG.rst Outdated Show resolved Hide resolved
@bennibbelink
Copy link
Contributor Author

I'm not convinced that my manual caching between jobs is working correctly, I need to dig into this more.

Also... just stumbled across this which might allow us to simplify the custom action

@bennibbelink
Copy link
Contributor Author

Still working through this downstream testing PR, but it looks like #1634 might've broken Cycamore @gonuke @nuclearkatie

@nuclearkatie
Copy link
Contributor

Yes, it did, see cycamore#569. #1634 changed the behavior of material buy policy, including the init process so cycamore also has to change.

I have a PR open that will fix it, cycamore#568. Also, there's a stray unrelated (sink, vs that PR is about storage) failing test that I have cycamore#570 open for

@bennibbelink
Copy link
Contributor Author

Thanks or the explanation @nuclearkatie

This was helpful in demonstrating this PRs capabilities (or lack thereof). Thanks to the failing cycamore build we now have an example of failing downstream tests. I fixed the workflow so that downstream failures don't inhibit us from merging a Cyclus PR. Unfortunately, the failing downstream tests dont present themselves in this PR discussion in any way, you have to view the workflow summary to see the warnings.

@gonuke do you have any thoughts about this? I wish it were more convenient to return warnings instead of success/failure but Github doesn't seem to have that functionality in place natively

@gonuke
Copy link
Member

gonuke commented Jan 24, 2024

Hmmm.... that's annoying. What about something like this: https://github.com/marketplace/actions/add-pr-comment

@cyclus cyclus deleted a comment from github-actions bot Jan 24, 2024
@cyclus cyclus deleted a comment from github-actions bot Jan 29, 2024
@cyclus cyclus deleted a comment from github-actions bot Jan 29, 2024
@cyclus cyclus deleted a comment from github-actions bot Jan 29, 2024
@cyclus cyclus deleted a comment from github-actions bot Jan 29, 2024
@cyclus cyclus deleted a comment from github-actions bot Jan 29, 2024
Copy link

github-actions bot commented Jan 29, 2024

Downstream build statuses using cyclus_22.04_apt

  • Cycamore: Success
  • Cymetric: Success

Copy link

github-actions bot commented Jan 29, 2024

Downstream build statuses using cyclus_20.04_conda

  • Cycamore: Success
  • Cymetric: Success

Copy link

github-actions bot commented Jan 29, 2024

Downstream build statuses using cyclus_20.04_apt

  • Cycamore: Success
  • Cymetric: Success

Copy link

github-actions bot commented Jan 29, 2024

Downstream build statuses using cyclus_22.04_conda

  • Cycamore: Success
  • Cymetric: Success

@bennibbelink
Copy link
Contributor Author

Please see cycamore PR #573 for notes on some of the changes that I've made (also involved is cymetric PR #190). I am finally confident that the workflows are caching layers and images appropriately, and downstream testing is using correct images to build from.

We are still seeing cycamore build failures but this is expected until cycamore PR #568 is merged (and it currently gives us a good example of what happens when downstream testing fails).

Also, since this PR now utilizes build arguments in the cycamore and cymetric Dockerfiles, those two PRs must be merged first before this can be merged. Currently the workflow checks out the tag-build-arg ref instead of main to allow it to work. This should be fixed before this PR gets approved

@bennibbelink
Copy link
Contributor Author

bennibbelink commented Jan 29, 2024

Also... thanks to using the registry as a cache backend, we no longer need to maintain <pkg_mgr>-deps images to speed up our workflows. Thus, I renamed the build_test_publish.yml workflow to publish_latest.yml in all 3 repos and only have it push the cyclus image. It is nearly identical to the publish_release.yml workflow except - publish_latest.yml publishes the latest image on a push to main and publish_release.yml publishes the stable image on a push to main with a tag (I think this will work for a release, we can test it out prior to a release by creating an arbitrary tag. There are release triggers but the different types seem somewhat convoluted and this solution feels simpler yet still correct).

@bennibbelink bennibbelink marked this pull request as ready for review February 1, 2024 14:27
@bennibbelink
Copy link
Contributor Author

With that - marking this as ready for review. Some things that I changed since we met on Wednesday:

  • I changed some triggers... mostly to do with the Dockerfile. I may be forgetting something but I think a big reason we had special cases for Dockerfile changes were because we were pushing updated dependency images. Now the workflows are triggered as follows:
    • publish_latest.yml runs on a PR that changes publish_latest.yml or on a push to main
    • build_test.yml runs on a PR that changes pretty much anything relevant, i.e. everything except the other workflows, docs, and changelog.
    • the others are pretty straightforward
  • I modified publish_latest.yml to build and push downstream images. This way if there is a cyclus update the downstream latest images will get updated with the latest cyclus core library. It will tag them as latest according to the usual conditions and as ci-image-cache otherwise (i.e. when someone has a PR that modifies the workflow). The downstream image builds fail softly

Note: I marked this ready for review but as we discussed we must merge cymetric #573 first, update cycamore PR #190 and merge, then update this PR and merge

@bennibbelink bennibbelink marked this pull request as ready for review February 9, 2024 04:26
Copy link
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some repeated some new comments... mostly minor

.github/workflows/build_test.yml Outdated Show resolved Hide resolved
.github/workflows/build_test.yml Outdated Show resolved Hide resolved
.github/workflows/code_coverage.yml Outdated Show resolved Hide resolved
.github/workflows/publish_latest.yml Outdated Show resolved Hide resolved
.github/workflows/publish_latest.yml Outdated Show resolved Hide resolved
.github/workflows/publish_release.yml Outdated Show resolved Hide resolved
docker/Dockerfile Outdated Show resolved Hide resolved
@gonuke
Copy link
Member

gonuke commented Feb 12, 2024

Merging because the code coverage reduction is an artifact of increased testing scope

@gonuke gonuke merged commit e205a12 into main Feb 12, 2024
14 of 15 checks passed
@bennibbelink bennibbelink deleted the downstream-testing branch February 16, 2024 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add downstream testing on merge
4 participants