This repository has been archived by the owner on Aug 6, 2021. It is now read-only.

Test rollback-health (broken VPN) #539

Open
balenaio opened this issue May 23, 2019 · 0 comments

Description

Note: The OLD OS must be v2.30+

rollback-health recovers the device by rolling back to the old OS if, after a HUP (host OS update), the new OS can't connect to the VPN, provided the device was able to connect to the VPN before the update.

Steps to reproduce issue

Make sure the device is running the old OS, not the new one.

0. Note down the old OS version and make sure the device is connected to the VPN.
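A minimal shell sketch for recording the version in step 0 and checking the "v2.30+" note, assuming the device's /etc/os-release carries the version in VERSION_ID (for example 2.31.5+rev1). The helper names os_version and is_at_least_2_30 are hypothetical, and the check compares only the major and minor numbers. It does not cover the VPN check.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): read the OS version and check it is v2.30+.
# Assumes an os-release file with VERSION_ID="2.x.y[+revN]".

# Print VERSION_ID from an os-release file (defaults to /etc/os-release).
os_version() {
    file=${1:-/etc/os-release}
    # Source in a subshell so the os-release variables don't leak out.
    ( . "$file" && printf '%s\n' "${VERSION_ID:-unknown}" )
}

# Return 0 if the version string in $1 is 2.30 or newer, 1 otherwise.
is_at_least_2_30() {
    v=$1
    major=${v%%.*}
    rest=${v#*.}
    minor=${rest%%.*}
    # Drop non-numeric suffixes such as "+rev1".
    major=${major%%[!0-9]*}
    minor=${minor%%[!0-9]*}
    # Treat anything we could not parse as "too old".
    [ -n "$major" ] && [ -n "$minor" ] || return 1
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 30 ]; }
}
```

Example use on the device, before running the HUP:
v=$(os_version); echo "$v"; is_at_least_2_30 "$v" || echo "old OS is older than v2.30"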

1. Run the HUP:

hostapp-update -i <DOCKERHUB_ACCOUNT>/<DOCKERHUB_REPO>:<TAG>

Note: run this without the -r flag. We don't want the device to reboot yet, because we need to break the new OS before rebooting.

2. Break the new OS in the inactive partition (break the VPN)

Find the openvpn binary:
root@balena:~# find /mnt/sysroot/inactive/ | grep "bin/openvpn"
/mnt/sysroot/inactive/balena/aufs/diff/a3deab251bad69187f6b0f30ea7b701f757497fb584682689add89125ff39613/usr/sbin/openvpn

Replace the openvpn binary with bash:
cp /bin/bash PATH_TO_VPN_BINARY_IN_SYSROOT_INACTIVE
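The two commands in step 2 can be combined into one helper. This is a sketch under assumptions: break_vpn is a hypothetical name, it uses find's -path test in place of the grep pipeline, and it refuses to act unless exactly one match is found, since an aufs tree could in principle contain more than one layer with an openvpn binary.

```shell
#!/bin/sh
# Sketch of step 2 (hypothetical helper, not part of balenaOS):
# find openvpn under the inactive root and overwrite it with a replacement
# (bash by default) so the VPN cannot start on the new OS.
# Both paths are parameters so the helper can be tried on a scratch directory.

break_vpn() {
    root=${1:-/mnt/sysroot/inactive}
    replacement=${2:-/bin/bash}

    # Count paths ending in bin/openvpn (arithmetic strips wc's padding).
    found=$(( $(find "$root" -path '*bin/openvpn' | wc -l) ))

    # Refuse to guess if the search is empty or ambiguous.
    if [ "$found" -ne 1 ]; then
        echo "expected one openvpn binary under $root, found $found" >&2
        return 1
    fi

    target=$(find "$root" -path '*bin/openvpn')
    cp "$replacement" "$target" || return 1

    # Print the path that was overwritten.
    printf '%s\n' "$target"
}
```

On the device, running break_vpn with no arguments searches /mnt/sysroot/inactive and copies /bin/bash over the match, like the manual cp above.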

3. Reduce the timer

By default, rollback-health triggers after 15 minutes. To speed up testing, make it trigger after 2 minutes instead.

Find the rollback-health script:
root@balena:~# find /mnt/sysroot/inactive/ | grep "bin/rollback-health"
/mnt/sysroot/inactive/balena/aufs/diff/22304dc93919b27d31a1dac635a7fed7215021bb6ef365c9ea56c7a65a16a8b9/usr/bin/rollback-health

Change COUNT=15 to COUNT=2:

sed -i "s/COUNT=.*/COUNT=2/g" PATH_TO_rollback-health_FILE
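A sketch that folds the find and sed steps together and prints the result for a quick check before rebooting. set_rollback_count is a hypothetical name; the sed expression is the same one used in the manual step, and the helper follows the issue in treating COUNT as the number of minutes before rollback-health triggers.

```shell
#!/bin/sh
# Sketch of step 3 (hypothetical helper): rewrite COUNT= in the inactive
# partition's copy of rollback-health.

set_rollback_count() {
    root=${1:-/mnt/sysroot/inactive}
    new_count=${2:-2}

    # Require exactly one rollback-health script under the root.
    found=$(( $(find "$root" -path '*bin/rollback-health' | wc -l) ))
    if [ "$found" -ne 1 ]; then
        echo "expected one rollback-health script under $root, found $found" >&2
        return 1
    fi

    script=$(find "$root" -path '*bin/rollback-health')

    # Same substitution as the manual step (sed -i is a GNU/BusyBox extension).
    sed -i "s/COUNT=.*/COUNT=$new_count/g" "$script" || return 1

    # Print every COUNT= line so the edit can be checked before rebooting.
    grep 'COUNT=' "$script"
}
```

On the device, set_rollback_count with no arguments edits the script under /mnt/sysroot/inactive and should print COUNT=2.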

4. Reboot

sync
reboot

Expected result

The new OS should boot.

  • Run ls /mnt/state; there should be rollback-*-breadcrumb files.

  • Check journalctl -f -u rollback-health.service. The device should trigger a rollback within about 2 minutes once balena is not running and the balena healthcheck fails.

root@balena:~# journalctl -f -u rollback-health.service
-- Logs begin at Wed 2019-01-09 12:50:34 UTC. --
Jan 30 12:27:40 balena sh[752]: Rollback: ERROR: Balena is not healthy!
Jan 30 12:27:40 balena sh[752]: Trying healthcheck again 0 of 2 attempts
Jan 30 12:28:40 balena sh[752]: Rollback: Running tests
Jan 30 12:28:40 balena sh[752]: Rollback: VPN used to be offline before HUP. Not using VPN healthcheck for rollback
Jan 30 12:28:40 balena sh[752]: Rollback: ERROR: Balena is not healthy!
Jan 30 12:28:40 balena sh[752]: Trying healthcheck again 1 of 2 attempts
...

The OS should reboot into the previous OS.
The rollback-health-triggered file should exist in the state partition.

root@balena:~# ls /mnt/state/
lost+found  machine-id  remove_me_to_reset  rollback-health-triggered  root-overlay
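The two state-partition checks in the expected result can be scripted as a sketch. has_breadcrumbs and rollback_triggered are hypothetical helper names; the file names come from the listings in this issue, and /mnt/state is assumed as the default state directory.

```shell
#!/bin/sh
# Sketch (hypothetical helpers) of the expected-result checks on the
# state partition.

# After the new OS boots: rollback-*-breadcrumb files should exist.
has_breadcrumbs() {
    state=${1:-/mnt/state}
    for f in "$state"/rollback-*-breadcrumb; do
        # An unmatched glob stays literal, so -e fails and we fall through.
        [ -e "$f" ] && return 0
    done
    return 1
}

# After the device has rolled back to the old OS.
rollback_triggered() {
    state=${1:-/mnt/state}
    [ -e "$state/rollback-health-triggered" ]
}
```

Run has_breadcrumbs right after the new OS boots and rollback_triggered once the device should be back on the old OS. Against the listing under Actual result, the first check succeeds and the second fails.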

Actual result

journalctl -f -u rollback-health.service
-- Logs begin at Thu 2019-05-23 07:06:02 UTC. --
May 23 07:06:07 28c1a02 systemd[1]: Starting Balena rollback checks health...
May 23 07:06:07 28c1a02 sh[1538]: Rollback: Parsing bootloader configuration
May 23 07:06:07 28c1a02 sh[1538]: Rollback: Could not find upgrade_available variable in bootloader environment
May 23 07:06:07 28c1a02 systemd[1]: rollback-health.service: Main process exited, code=exited, status=1/FAILURE
May 23 07:06:07 28c1a02 systemd[1]: Failed to start Balena rollback checks health.
May 23 07:06:07 28c1a02 systemd[1]: rollback-health.service: Unit entered failed state.
May 23 07:06:07 28c1a02 systemd[1]: rollback-health.service: Failed with result 'exit-code'.

ls /mnt/state/
lost+found rollback-altboot-breadcrumb root-overlay
machine-id rollback-health-breadcrumb
remove_me_to_reset rollback-health-variables

Other information
