This repository has been archived by the owner on Aug 6, 2021. It is now read-only.

Test rollback-health (broken balena) #538

Open
balenaio opened this issue May 22, 2019 · 0 comments


@balenaio
Collaborator

Description

Note: The OLD OS must be v2.30+

rollback-health recovers the device if, after a HUP, the new OS can't run balena.

Steps to reproduce issue

Make sure the device is running the old OS, not the new OS.

0 Note down old OS version.
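
One way to capture the version for step 0 is sketched below. It assumes the OS exposes a VERSION field in /etc/os-release; the helper name os_version is made up for this example and is not part of the OS.

```shell
#!/bin/sh
# Print the VERSION field of an os-release style file.
# Assumption: the device has /etc/os-release with a VERSION= line.
os_version() {
    release_file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$release_file" && printf '%s\n' "$VERSION" )
}
```

Running os_version with no argument on the device prints the value to note down before the HUP.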

1 Run HUP

hostapp-update -i <DOCKERHUB_ACCOUNT>/<DOCKERHUB_REPO>:<TAG>

Note: Run this without the -r flag so the device does not reboot. The new OS has to be broken before the reboot.

2 Break new OS in inactive partition (break balena)

Find balena-engine binary
find /mnt/sysroot/inactive | grep balena-engine

root@balena:~# find /mnt/sysroot/inactive/ | grep balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/etc/systemd/system/balena-engine.service
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/etc/balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/home/root/.balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/var/lib/balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-containerd-ctr
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-proxy
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-daemon
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-runc
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-containerd-shim
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-containerd
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/lib/systemd/system/balena-engine.socket
root@balena:~# 

Replace the balena-engine binary with bash
cp /bin/bash PATH_TO_BALENA_BINARY
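
The find and cp steps can be combined as in the sketch below. The function names find_engine_binary and break_engine are hypothetical, and the sketch assumes exactly one usr/bin/balena-engine exists under the inactive partition, as in the listing above.

```shell
#!/bin/sh
# Print the path of the balena-engine binary under the given root.
# Anchoring the grep on "/usr/bin/balena-engine$" skips the service files and
# the helper binaries (balena-engine-daemon, balena-engine-runc, ...).
find_engine_binary() {
    find "$1" | grep '/usr/bin/balena-engine$'
}

# Overwrite the engine binary with another executable (bash by default).
# Assumption: find_engine_binary returns a single path.
break_engine() {
    root="$1"
    replacement="${2:-/bin/bash}"
    target=$(find_engine_binary "$root") || {
        echo "balena-engine not found under $root" >&2
        return 1
    }
    cp "$replacement" "$target"
}
```

On the device this would be called as break_engine /mnt/sysroot/inactive.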

3 Reduce timer

By default, rollback-health triggers after 15 minutes. To shorten the test, make it trigger after 2 minutes.

Find the rollback-health script
root@balena:~# find /mnt/sysroot/inactive/ | grep "bin/rollback-health"
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/rollback-health
root@balena:~#

Edit COUNT=15 and make it COUNT=2

sed -i "s/COUNT=.*/COUNT=2/g" PATH_TO_rollback-health_FILE
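
A sketch that locates the script, applies the same sed substitution, and then confirms the change took effect. The function names are hypothetical, and it assumes the sed on the device supports -i, as the command above does.

```shell
#!/bin/sh
# Print the path of the rollback-health script under the given root.
find_rollback_health() {
    find "$1" | grep '/usr/bin/rollback-health$'
}

# Set COUNT in the given script and fail if the new value is not present afterwards.
set_rollback_count() {
    script="$1"
    minutes="$2"
    sed -i "s/COUNT=.*/COUNT=$minutes/g" "$script"
    grep -q "COUNT=$minutes\$" "$script"
}
```

For example: set_rollback_count "$(find_rollback_health /mnt/sysroot/inactive)" 2. A non-zero exit status means the script did not contain a COUNT= line to edit.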

4 Reboot

sync
reboot

Expected result

The new OS should boot.

  • Check ls /mnt/state, there should be rollback-*-breadcrumb files

  • Check journalctl -f -u rollback-health.service. Because balena can't run, the balena healthcheck keeps failing and the device should trigger a rollback after about 2 minutes

root@balena:~# journalctl -f -u rollback-health.service
-- Logs begin at Wed 2019-01-09 12:50:34 UTC. --
Jan 30 12:27:40 balena sh[752]: Rollback: ERROR: Balena is not healthy!
Jan 30 12:27:40 balena sh[752]: Trying healthcheck again 0 of 2 attempts
Jan 30 12:28:40 balena sh[752]: Rollback: Running tests
Jan 30 12:28:40 balena sh[752]: Rollback: VPN used to be offline before HUP. Not using VPN healthcheck for rollback
Jan 30 12:28:40 balena sh[752]: Rollback: ERROR: Balena is not healthy!
Jan 30 12:28:40 balena sh[752]: Trying healthcheck again 1 of 2 attempts
...
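
The breadcrumb check from the first bullet can be scripted as below. This is only a sketch: has_breadcrumbs is a made-up name, and the rollback-*-breadcrumb pattern is taken from the bullet rather than from the OS sources.

```shell
#!/bin/sh
# Succeed if at least one rollback-*-breadcrumb file exists in the state directory.
has_breadcrumbs() {
    state_dir="${1:-/mnt/state}"
    for f in "$state_dir"/rollback-*-breadcrumb; do
        # If nothing matches, the pattern stays literal and -e is false.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

Run has_breadcrumbs && echo "breadcrumbs present" right after the new OS boots, before the rollback triggers.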

The device should then reboot into the previous OS.
A rollback-health-triggered file should exist in the state partition.

root@balena:~# ls /mnt/state/
lost+found  machine-id  remove_me_to_reset  rollback-health-triggered  root-overlay
root@balena:~#
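
Both final checks can be wrapped in one pass/fail helper, sketched below. verify_rollback is a hypothetical name, and the sketch assumes the running version can be read from the VERSION field of /etc/os-release and compared with the version noted in step 0.

```shell
#!/bin/sh
# Check that the rollback marker exists and that the running OS is the old one.
# Arguments: expected (old) version, state directory, os-release file.
verify_rollback() {
    expected_version="$1"
    state_dir="${2:-/mnt/state}"
    release_file="${3:-/etc/os-release}"
    if [ ! -e "$state_dir/rollback-health-triggered" ]; then
        echo "FAIL: rollback-health-triggered missing in $state_dir"
        return 1
    fi
    # Read VERSION in a subshell so the file's variables do not leak.
    running_version=$( . "$release_file" && printf '%s' "$VERSION" )
    if [ "$running_version" != "$expected_version" ]; then
        echo "FAIL: running $running_version, expected $expected_version"
        return 1
    fi
    echo "OK: rolled back to $expected_version"
}
```

Call it as verify_rollback <OLD_VERSION>, with <OLD_VERSION> replaced by the value noted in step 0.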

Other information
