bb.org: Add docker library cleanup on dependent failure #81

Closed
wants to merge 1 commit into from

Conversation

grooverdan
Member

`results` appears to be a BuildStep attribute per
master/buildbot/process/buildstep.py in the buildbot code.

The unused (broken?) `ifStagingSucceeding` uses:
`step.build.results in ('SUCCESS', 'WARNINGS')`

@cvicentiu
Member

Hi @grooverdan. I don't fully understand what the problem is, but I assume it's some cleanup that is not running. It would be good if you could document this failure a little bit better in the commit message, so we know the history behind it.

Also, why does this have to be a new builder? Why isn't this a simple step?
The current code, if I understand it correctly, assumes that the build will be run on the same worker and if at any point in the future we have two workers running docker_library builds the current setup won't work as the cleanup might just run on a different machine.


@fauust fauust left a comment


This looks OK to me, but I am not sure how to test it. Also, I am wondering if we could not add a cleanup based on the `until` filter:
https://docs.podman.io/en/latest/markdown/podman-image-prune.1.html

This is maybe not needed but we have implemented a more drastic clean on docker workers, see: https://gitlab.com/mariadb/sysadmin/-/blob/master/ansible/roles/bb_worker_docker/tasks/docker.yml#L27-34
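For reference, the `until` filter mentioned above would look something like this (a sketch only; the 24h retention window is an illustrative choice, not something the PR specifies):

```shell
# Remove dangling images created more than 24 hours ago
# (24h is a hypothetical retention window).
podman image prune --filter "until=24h" --force
```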


commit=${2:-}
branch=${3:-${master_branch}}

Contributor


I am wondering if we should not use getopts more often for these scripts? It makes them easier to debug and, above all, more robust when they are not called correctly:

  usage() {
    cat >&2 <<EOF
  Usage: $0 -v <mariadb-version> -c <commit> -b <branch>
    -v mariadb version
    -c commit hash
    -b branch
    -h help
EOF
  }

  typeset mariadb_version=""
  typeset commit=""
  typeset branch=""

  while getopts "v:c:b:h" OPTION; do
    case $OPTION in
      v)
        mariadb_version="$OPTARG"
        ;;
      c)
        commit="$OPTARG"
        ;;
      b)
        branch="$OPTARG"
        ;;
      h)
        usage
        exit 0
        ;;
      *)
        usage
        exit 1
        ;;
    esac
  done

  [[ $mariadb_version != "" ]] || {
    usage
    exit 1
  }
  [[ $commit != "" ]] || {
    usage
    exit 1
  }
  [[ $branch != "" ]] || {
    usage
    exit 1
  }

  mariadb_version=${mariadb_version#*-}
  master_branch=${mariadb_version%.*}
  commit=${commit:-}
  branch=${branch:-${master_branch}}
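To make the version-parsing tail concrete: with a hypothetical input of `mariadb-10.6.5`, the two expansions behave as follows (the sample value is illustrative, not from the PR):

```shell
mariadb_version="mariadb-10.6.5"       # hypothetical input
mariadb_version=${mariadb_version#*-}  # strip through the first '-': 10.6.5
master_branch=${mariadb_version%.*}    # strip the last '.' component: 10.6
echo "version=$mariadb_version master_branch=$master_branch"
```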

f_dockerlibrary_cleanup.addStep(getScript('docker_cleanup.sh'))
f_dockerlibrary_cleanup.addStep(steps.ShellCommand(
name="cleanup manifest/images because one dependency failed to build",
command=["bash", "-xc", util.Interpolate("./docker_cleanup.sh \"%(prop:mariadb_version)s\" \"%(prop:revision)s\" \"%(prop:branch)s\"")]))

@fauust fauust Sep 27, 2021


command=["bash", "-xc", util.Interpolate("./docker_cleanup.sh -v \"%(prop:mariadb_version)s\" -c \"%(prop:revision)s\" -b \"%(prop:branch)s\"")]))

See above

while read line; do
id=${line% *}
digest=${line#* }
echo id=$id digest=$digest
Contributor


Not mandatory, but good habit IMO:
[shellcheck] SC2086: Double quote to prevent globbing and word splitting. [Info]
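A sketch of the quoted form shellcheck is asking for (the sample id/digest pair is illustrative):

```shell
line="abc123 sha256:deadbeef"   # hypothetical "id digest" pair
id=${line% *}                   # text before the last space
digest=${line#* }               # text after the first space
echo "id=$id digest=$digest"    # quoted, so no globbing/word splitting (SC2086)
```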


t=$(mktemp)
for m in mariadb-devel-${container_tag}-$commit mariadb-debug-${container_tag}-$commit
do

@fauust fauust Sep 27, 2021


for foo in bar; do

@grooverdan
Member Author

> Hi @grooverdan. I don't fully understand what the problem is, but I assume it's some cleanup that is not running. It would be good if you could document this failure a little bit better in the commit message, so we know the history behind it.
>
> Also, why does this have to be a new builder? Why isn't this a simple step?
> The current code, if I understand it correctly, assumes that the build will be run on the same worker and if at any point in the future we have two workers running docker_library builds the current setup won't work as the cleanup might just run on a different machine.

It's adding a fail-only step that triggers the cleanup script on a different worker, i.e. if ppc64le-ubuntu-2004-deb-autobake fails to build, it triggers the cleanup of the images produced by the other workers (which, as I realize now, assumes this failure is the last step).

Better suggestions welcome.

@cvicentiu
Member

@grooverdan I still don't understand. Why is this necessary? If an image fails to build, what gets pushed to the registry that needs to be cleaned up?

And you still haven't answered what prevents you from running the cleanup script on the same worker?

Container builds are based on 3 or 4 dependent builders that trigger
the amd64-rhel8-docker builder to run the builds. These represent
the number of architectures that can be built, depending on the major
version at the time.

The container and manifest are only pushed to quay.io once the full
complement of architectures is reached. The two constraints that
led to this approach are:
* you cannot push an image without a tag
* if you pushed a manifest of a particular tag, users fetching that
  tag with an architecture that hasn't finished building will
  receive a non-functioning image (for their architecture).

If a deb-autobake builder for ubuntu-2004 (or 1804 for 10.2 branch)
fails, this results in a number of images and their manifest waiting
for a trigger that won't happen.

As such we create a trigger step with alwaysRun=True that only
runs if the current build result is failure. This triggers the
cleanup script on the amd64-rhel8-dockerlibrary builder to remove
the images and manifests of those that won't complete.
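The "run only on failure" condition described above can be sketched as a `doStepIf` predicate; the result codes below are copied from `buildbot.process.results` so the sketch stands alone (the function name is illustrative, not the PR's actual code):

```python
# Result codes as defined in buildbot.process.results
# (integers, not the strings 'SUCCESS'/'WARNINGS').
SUCCESS, WARNINGS, FAILURE = 0, 1, 2

def cleanup_needed(build_results):
    """True when the cleanup trigger should fire: the build failed,
    so the staged images/manifests for it must be removed."""
    return build_results == FAILURE

# In the PR this predicate would be paired with alwaysRun=True so the
# step is still scheduled even after an earlier step has failed.
```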

"results" appears to be BuildStep attribute per
master/buildbot/process/buildstep.py in the buildbot code.

The unused (broken?) ifStagingSucceeding uses step.build.results in
('SUCCESS', 'WARNINGS'), so I'm unsure which invocation is correct.
@grooverdan
Member Author

commit message description updated.

@cvicentiu
Member

Hi @grooverdan ! Thank you for the updated commit message. This made it much clearer.

A better proposal coming up in a few hours.

@fauust fauust force-pushed the master branch 2 times, most recently from 77b6092 to ed394b7 Compare April 28, 2022 21:12
@fauust fauust force-pushed the master branch 13 times, most recently from 7c3b2ca to cd5321a Compare May 9, 2022 16:44
@grooverdan grooverdan marked this pull request as draft June 8, 2022 08:35
@grooverdan grooverdan closed this Dec 21, 2022
@grooverdan grooverdan deleted the dockerlibrary_cleanup branch December 21, 2022 01:48