Test rollback-health (broken balena) #538

balenaio · 2019-05-22T14:21:08Z

Description

Note: The OLD OS must be v2.30+

rollback-health recovers the device if, after a host OS update (HUP), the new OS cannot run balena.
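Before starting, the v2.30+ requirement on the old OS can be checked on the device. This is a minimal sketch, not part of balenaOS. The helper name `os_version_at_least` is hypothetical. It assumes version strings of the form MAJOR.MINOR[.PATCH][+revN], optionally prefixed with `v`, and it compares only the major and minor fields.

```shell
#!/bin/sh
# Hypothetical helper (not part of balenaOS): succeeds (exit 0) if version $1
# is at least MAJOR.MINOR given as $2 and $3.
# Assumes versions look like 2.31.5+rev1 (an optional leading "v" is stripped).
os_version_at_least() {
    version=${1#v}
    want_major=$2
    want_minor=$3
    major=${version%%.*}        # text before the first dot
    rest=${version#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt "$want_major" ] && return 0
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}
```

For example, `os_version_at_least "$(sed -n 's/^VERSION_ID=//p' /etc/os-release | tr -d '"')" 2 30 && echo "old OS is v2.30+"`, assuming the device's /etc/os-release has a VERSION_ID line.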

Steps to reproduce issue

Make sure the device is running the old OS, not the new OS.

0. Note down the old OS version.
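The old OS version can be read from os-release instead of being copied by hand. This is a minimal sketch that assumes the device's /etc/os-release contains a VERSION_ID line, as standard os-release files do. The helper name `read_os_version` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print VERSION_ID from an os-release file
# (default /etc/os-release), with surrounding double quotes removed.
read_os_version() {
    os_release=${1:-/etc/os-release}
    sed -n 's/^VERSION_ID=//p' "$os_release" | tr -d '"'
}
```

For example, run `read_os_version` on the device and write the printed value down. A shell variable would not survive the reboot in step 4.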

1. Run the HUP

hostapp-update -i <DOCKERHUB_ACCOUNT>/<DOCKERHUB_REPO>:<TAG>

Note: Run this without the -r flag so that the device does not reboot yet. The new OS must be broken before rebooting.

2. Break the new OS in the inactive partition (break balena)

Find the balena-engine binary:
find /mnt/sysroot/inactive | grep balena-engine

root@balena:~# find /mnt/sysroot/inactive/ | grep balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/etc/systemd/system/balena-engine.service
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/etc/balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/home/root/.balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/var/lib/balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-containerd-ctr
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-proxy
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-daemon
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-runc
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-containerd-shim
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/balena-engine-containerd
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/lib/systemd/system/balena-engine.socket
root@balena:~#

Replace the balena-engine binary (the path ending in usr/bin/balena-engine) with bash:
cp /bin/bash PATH_TO_BALENA_BINARY
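The find and copy in this step can be combined. That way only the engine binary itself is overwritten, and a backup is kept so the test can be undone. This is a minimal sketch. The helper name `break_balena_engine`, its arguments, and the `.orig` backup suffix are assumptions. If several aufs layers contain the binary, only the first match is replaced.

```shell
#!/bin/sh
# Hypothetical helper: overwrite the balena-engine binary under ROOT
# (default /mnt/sysroot/inactive) with SHELL_BIN (default /bin/bash),
# keeping a backup at <path>.orig. Prints the replaced path.
break_balena_engine() {
    root=${1:-/mnt/sysroot/inactive}
    shell_bin=${2:-/bin/bash}
    # Match only .../usr/bin/balena-engine, not balena-engine-daemon, -runc, etc.
    target=$(find "$root" -path '*/usr/bin/balena-engine' | head -n 1)
    if [ -z "$target" ]; then
        echo "balena-engine binary not found under $root" >&2
        return 1
    fi
    cp "$target" "$target.orig"   # backup so the change can be reverted
    cp "$shell_bin" "$target"     # replace the engine with the shell
    echo "$target"
}
```

On the device, `break_balena_engine /mnt/sysroot/inactive /bin/bash` prints the path it replaced.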

3. Reduce the timer

By default, rollback-health triggers a rollback after 15 minutes. To speed up testing, reduce this to 2 minutes.

Find the rollback-health script:
root@balena:~# find /mnt/sysroot/inactive/ | grep "bin/rollback-health"
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/rollback-health
root@balena:~#

Change COUNT=15 to COUNT=2:

sed -i "s/COUNT=.*/COUNT=2/g" PATH_TO_rollback-health_FILE
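You can also find the script and edit it in one step, printing the resulting line so the change is visible. This is a minimal sketch that reuses the sed expression from this step. The helper name `set_rollback_count` is hypothetical. Like the original command, it replaces everything from COUNT= to the end of each matching line.

```shell
#!/bin/sh
# Hypothetical helper: set COUNT=<n> in the rollback-health script found under
# ROOT (default /mnt/sysroot/inactive) and print the edited line(s).
set_rollback_count() {
    root=${1:-/mnt/sysroot/inactive}
    count=${2:-2}
    script=$(find "$root" -path '*/bin/rollback-health' | head -n 1)
    if [ -z "$script" ]; then
        echo "rollback-health not found under $root" >&2
        return 1
    fi
    # Same substitution as the sed command in this step.
    sed -i "s/COUNT=.*/COUNT=$count/" "$script"
    grep 'COUNT=' "$script"
}
```

On the device, `set_rollback_count /mnt/sysroot/inactive 2` should print `COUNT=2`.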

4. Reboot

sync
reboot

Expected result

The new OS should boot.

Run ls /mnt/state; there should be rollback-*-breadcrumb files.
Run journalctl -f -u rollback-health.service. Because balena cannot run, the balena healthcheck fails, and the device should trigger a rollback about 2 minutes later:

root@balena:~# journalctl -f -u rollback-health.service
-- Logs begin at Wed 2019-01-09 12:50:34 UTC. --
Jan 30 12:27:40 balena sh[752]: Rollback: ERROR: Balena is not healthy!
Jan 30 12:27:40 balena sh[752]: Trying healthcheck again 0 of 2 attempts
Jan 30 12:28:40 balena sh[752]: Rollback: Running tests
Jan 30 12:28:40 balena sh[752]: Rollback: VPN used to be offline before HUP. Not using VPN healthcheck for rollback
Jan 30 12:28:40 balena sh[752]: Rollback: ERROR: Balena is not healthy!
Jan 30 12:28:40 balena sh[752]: Trying healthcheck again 1 of 2 attempts
...
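The breadcrumb check can be scripted instead of read from the ls output. This is a minimal sketch; the helper name `check_breadcrumbs` is hypothetical, and the /mnt/state default is taken from this issue.

```shell
#!/bin/sh
# Hypothetical helper: list rollback-*-breadcrumb files in STATE_DIR
# (default /mnt/state); returns non-zero if there are none.
check_breadcrumbs() {
    state_dir=${1:-/mnt/state}
    found=1
    for f in "$state_dir"/rollback-*-breadcrumb; do
        [ -e "$f" ] || continue   # the glob stays unexpanded when nothing matches
        echo "${f##*/}"           # print the file name only
        found=0
    done
    return $found
}
```

To count healthcheck failures without following the log, you can use `journalctl -u rollback-health.service --no-pager | grep -c 'Balena is not healthy'`.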

The device should then reboot into the previous OS.
A rollback-health-triggered file should exist in the state partition:

root@balena:~# ls /mnt/state/
lost+found machine-id remove_me_to_reset rollback-health-triggered root-overlay
root@balena:~#
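Both expected results can be checked together by comparing the running version with the one noted in step 0. This is a minimal sketch that assumes /etc/os-release has a VERSION_ID line. The helper name `verify_rollback` and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: confirm that rollback-health fired and the device runs
# the old OS again. $1 = old OS version noted in step 0,
# $2 = state directory (default /mnt/state), $3 = os-release file (default /etc/os-release).
verify_rollback() {
    old_version=$1
    state_dir=${2:-/mnt/state}
    os_release=${3:-/etc/os-release}
    if [ ! -e "$state_dir/rollback-health-triggered" ]; then
        echo "FAIL: $state_dir/rollback-health-triggered is missing" >&2
        return 1
    fi
    current=$(sed -n 's/^VERSION_ID=//p' "$os_release" | tr -d '"')
    if [ "$current" != "$old_version" ]; then
        echo "FAIL: running $current, expected old OS $old_version" >&2
        return 1
    fi
    echo "PASS: rolled back to $current"
}
```

For example, run `verify_rollback <OLD_OS_VERSION>`, where the argument is the version noted in step 0.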

Other information

Reported by: Vicentiu Galanopulo
Test case number: TC46
Test run: https://resinio.testlodge.com/projects/16238/runs/396972?tab=2&run_section_id=578221#executed_case_17140937
