This repository has been archived by the owner on Aug 6, 2021. It is now read-only.

Test rollback-health (broken balena) #556

Open
balenaio opened this issue Feb 11, 2021 · 0 comments


Description

Note: The OLD OS must be v2.30+

Provision a device in production with the latest balenaOS version available there. If that version is hostapp-update enabled (above resinOS 2.5.1), use it. Otherwise, take the latest hostapp-update enabled version from staging and provision it on production (that is, take a config.json from the production device and use that config.json when provisioning).

If you are testing an ESR release, run cat /etc/os-release and note the META_BALENA_VERSION that the ESR is based on. Then use the release preceding META_BALENA_VERSION as the old OS instead.
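As a minimal sketch of the ESR check above, the helper below prints META_BALENA_VERSION from an os-release style file. The function name `meta_balena_version` and its optional path argument are illustrative assumptions, not part of balenaOS, and the value is assumed to be stored as a `KEY="value"` line, which is the usual os-release layout.

```shell
#!/bin/sh
# Hypothetical helper: print META_BALENA_VERSION from an os-release file.
# Defaults to /etc/os-release; a different path can be passed for testing.
meta_balena_version() {
    file="${1:-/etc/os-release}"
    # Print only the value of the META_BALENA_VERSION line, without quotes.
    sed -n 's/^META_BALENA_VERSION=//p' "$file" | tr -d '"'
}
```

On the device, calling `meta_balena_version` with no argument should print the version the ESR is based on, provided the file contains that key.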

rollback-health recovers the device if, after a HUP, the new OS cannot run balena.

Steps to reproduce issue

Make sure you are running the old OS on the device. Not the new OS.

1. Make sure the host OS update bundle is available in a docker hub repo.

( https://hub.docker.com/r/resin/resinos/tags/ for public released boards; custom dockerhub repo for private boards, see steps below)

The hostOS update bundle should be available in resin/resinos-staging:<VERSION>-<BOARD>

For example:

resin/resinos-staging:2.7.2_rev2-intel-edison

(note that “2.7.2+rev2” becomes “2.7.2_rev2” for the purpose of the tag.)
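The tag convention in the note above can be sketched as a small helper. The function name `hup_tag` and its two arguments are hypothetical; only the "+" to "_" substitution and the `version-board` layout come from the example in this issue.

```shell
#!/bin/sh
# Hypothetical helper: build the Docker tag for a host OS update bundle.
# Usage: hup_tag <os-version> <board>
hup_tag() {
    version="$1"
    board="$2"
    # "+" is not a valid character in a Docker tag, so it becomes "_".
    printf '%s-%s\n' "$(printf '%s' "$version" | tr '+' '_')" "$board"
}

hup_tag "2.7.2+rev2" "intel-edison"   # prints 2.7.2_rev2-intel-edison
```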

If you are working on a board available in the resin dashboard, skip to step number 2.

If the update bundle is not available in a docker hub repo, but you have a docker hub account and also have the corresponding resinHUP .docker image, then follow these instructions:

$ docker load --quiet -i <PATH_TO_.docker_HUP_BUNDLE>

The above command will yield an IMAGE_ID.

$ docker tag <IMAGE_ID from above> <DOCKERHUB_ACCOUNT>/<DOCKERHUB_REPO>:<TAG>

$ docker push <DOCKERHUB_ACCOUNT>/<DOCKERHUB_REPO>:<TAG>

2. Run HUP manually (without rebooting!)

hostapp-update -i <DOCKERHUB_ACCOUNT>/<DOCKERHUB_REPO>:<TAG>

Note: This is without the -r flag. We don't want to reboot the device. We would like to mess up the new OS before rebooting.

3. Break new OS in inactive partition (break balena) and reduce timer so the healthcheck trigger happens faster:

cp /bin/bash `find /mnt/sysroot/inactive/ | grep "/usr/bin/balena-engine$"` && sed -i "s/COUNT=.*/COUNT=2/g" `find /mnt/sysroot/inactive/ | grep "bin/rollback-health"` && reboot
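The one-liner in step 3 fails silently if either `find` returns nothing. A more defensive sketch splits it into checked stages. The function name `break_inactive_os` and its root-directory argument are hypothetical, and the sketch assumes a single match for each file; it deliberately leaves the reboot to the caller.

```shell
#!/bin/sh
# Hypothetical helper: break balena-engine and shorten the rollback-health
# retry count under the given root (on a device: /mnt/sysroot/inactive).
# It does NOT reboot, so the result can be inspected first.
break_inactive_os() {
    root="$1"
    # Same search patterns as the one-liner; only the first match is used.
    engine=$(find "$root" | grep "/usr/bin/balena-engine$" | head -n 1)
    health=$(find "$root" | grep "bin/rollback-health" | head -n 1)

    if [ -z "$engine" ] || [ -z "$health" ]; then
        echo "balena-engine or rollback-health not found under $root" >&2
        return 1
    fi

    # Overwrite the engine binary with bash so balena cannot start.
    cp /bin/bash "$engine" || return 1
    # Lower the healthcheck attempt count so the rollback triggers sooner.
    sed -i "s/COUNT=.*/COUNT=2/g" "$health" || return 1
    echo "patched $engine and $health"
}
```

On the device this would be run as `break_inactive_os /mnt/sysroot/inactive && reboot`, which matches the original command once both files are found.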

4. Check for rollback-*-breadcrumb files:

ls -al /mnt/state/rollback-*-breadcrumb

5. Check the rollback-health.service:

journalctl -f -u rollback-health.service

6. Check for breadcrumbs:

ls -al /mnt/state/rollback-health-triggered

Expected result

3.

After the command is executed, board shall reboot and the new OS shall boot.

4.

There should be breadcrumb files:

root@47dcdb4:~# ls -al /mnt/state/rollback-*-breadcrumb
-rw-r--r-- 1 root root 0 Dec 24 22:59 /mnt/state/rollback-altboot-breadcrumb
-rw-r--r-- 1 root root 0 Dec 24 22:59 /mnt/state/rollback-health-breadcrumb

5.

The device should trigger a rollback about 2 minutes after balena fails to start and the balena healthcheck fails:

root@balena:~# journalctl -f -u rollback-health.service
-- Logs begin at Wed 2019-01-09 12:50:34 UTC. --
Jan 30 12:27:40 balena sh[752]: Rollback: ERROR: Balena is not healthy!
Jan 30 12:27:40 balena sh[752]: Trying healthcheck again 0 of 2 attempts
Jan 30 12:28:40 balena sh[752]: Rollback: Running tests
Jan 30 12:28:40 balena sh[752]: Rollback: VPN used to be offline before HUP. Not using VPN healthcheck for rollback
Jan 30 12:28:40 balena sh[752]: Rollback: ERROR: Balena is not healthy!
Jan 30 12:28:40 balena sh[752]: Trying healthcheck again 1 of 2 attempts
...
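To check a captured log without watching it live, a small filter can count the failed healthchecks. This is only a sketch: the function name `count_unhealthy` is hypothetical, and the matched string is copied from the log excerpt above, so it may differ in other OS versions.

```shell
#!/bin/sh
# Hypothetical helper: count failed balena healthchecks in journal text
# read from stdin. The message text is taken from the log in this issue.
count_unhealthy() {
    grep -c "Rollback: ERROR: Balena is not healthy!"
}
```

A possible use on the device is `journalctl --no-pager -u rollback-health.service | count_unhealthy`. With COUNT=2 set in step 3, one failure per attempt would be expected before the rollback.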

The OS should reboot into the previous OS.

6.

Once rebooted in the previous OS, check that the rollback-health-triggered file exists in the state partition:

root@a12e085:~# ls -al /mnt/state/rollback-health-triggered
-rw-r--r-- 1 root root 0 Dec 24 11:17 /mnt/state/rollback-health-triggered
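The file checks in steps 4 and 6 can be combined into one report. The sketch below assumes the three file names used in this issue; the function name `report_rollback_files` and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which rollback marker files exist in a state
# directory (on a device: /mnt/state). File names are those from this issue.
report_rollback_files() {
    state_dir="${1:-/mnt/state}"
    for name in rollback-altboot-breadcrumb rollback-health-breadcrumb rollback-health-triggered; do
        if [ -e "$state_dir/$name" ]; then
            echo "present: $name"
        else
            echo "missing: $name"
        fi
    done
}
```

Run with no argument on the device, it prints one line per file, which makes it easy to paste the state into the test report.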

Actual result

Fails because we have removed vmlinux from the boot partition

Other information
