Auto-reboot if kernel upgraded immediately after idr-00-preinstall.yml #108

manics · 2018-04-16T14:05:40Z

This reduces but does not eliminate some long-standing issues with race conditions during reboots. The original aim of this repository was to support a full deployment run-through from scratch, with reboots postponed to the very end. In practice this causes problems when installed services are unnecessarily interrupted, or when a service is installed against an old running kernel that changes after the reboot. Instead, this PR executes the reboot at the end of the pre-install stage, after system packages have been upgraded but before any applications are installed.

As a result, the first full run-through will fail after the preinstall stage: the combination of multiple VM reboots via a proxy and a reboot of the proxy itself makes it too complicated to auto-detect when all VMs have resumed. However, this is preferable to the problems otherwise caused later.
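
To illustrate the "reboot only if the kernel was upgraded" decision: it amounts to comparing the running kernel release with the newest installed one. The PR itself does this in the Ansible playbook; the shell sketch below is only an illustration. The function name kernel_changed is hypothetical, and the commented usage assumes an RPM-based host where `uname -r` reports the running kernel and `rpm -q --last kernel` lists installed kernels newest first.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): succeeds (exit 0) when the
# running kernel release differs from the newest installed kernel release,
# i.e. when a reboot is needed to pick up the upgraded kernel.
kernel_changed() {
    running="$1"
    newest="$2"
    [ "$running" != "$newest" ]
}

# Assumed usage on an RPM-based host (an assumption, not taken from the PR):
#   running=$(uname -r)
#   newest=$(rpm -q --last kernel | head -n 1 | awk '{print $1}' | sed 's/^kernel-//')
#   if kernel_changed "$running" "$newest"; then echo "reboot required"; fi
```

Keeping the comparison separate from the commands that gather the two release strings means the decision can be checked without touching a real host.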

joshmoore · 2018-04-17T07:12:18Z

The description here, with its planned failure, seems at best like a workaround. Is deploy-idr.sh trying to do too much? Is there a two-stage way of calling it that would not fail?

manics · 2018-04-17T09:35:47Z

deploy-idr.sh currently has 7 steps:

- 'galaxy': Install galaxy roles
- 'provision': Provision OpenStack instances, storage and networks
- 'network': Miscellaneous network configuration inside instances
- 'deploy-pre': Initialise instances, including updating packages
- 'deploy': Deploy the IDR
- 'deploy-apps': Deploy public IDR apps
- 'deploy-post': Additional setup including OMERO accounts and monitoring

We could remove 'all' and recommend running the stages individually, or add additional groupings?
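
One way the "additional groupings" idea could look, as a rough sketch only: a helper that expands a group name into the individual stages listed above. The function name expand_stage and the group names 'bootstrap' and 'deploy-all' are hypothetical and are not existing options of deploy-idr.sh.

```shell
#!/bin/sh
# Hypothetical grouping of the seven stages listed above. The group names
# "bootstrap" and "deploy-all" are illustrative, not existing options.
expand_stage() {
    case "$1" in
        bootstrap)  echo "galaxy provision network deploy-pre" ;;
        deploy-all) echo "deploy deploy-apps deploy-post" ;;
        galaxy|provision|network|deploy-pre|deploy|deploy-apps|deploy-post)
            # Individual stages expand to themselves.
            echo "$1" ;;
        *)
            echo "unknown stage: $1" >&2
            return 1 ;;
    esac
}
```

With this shape, a caller would loop over the output of `expand_stage bootstrap`, reboot if required, and only then run `expand_stage deploy-all`, so that no single invocation has to survive a reboot.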

joshmoore · 2018-04-17T13:59:22Z

I tested this locally by using:

$ deploy-idr.sh prod50 all  # failed for some unrelated issue
$ deploy-idr.sh prod50 expert deployment/ansible/idr-00-preinstall.yml

For me, the above, which amounts to galaxy + provision + network + pre-install + the necessary reboot, is a pretty good function for this script (though I'd still say it needs to be in this repo). This amounts to "get me started" or bootstrap. The rest is pretty straightforward on a playbook-by-playbook basis, e.g. I'm currently testing with this full.yml:


# Common usage:
# ansible-playbook playbooks/full.yml -u centos -e idr_environment=testXX

- hosts: localhost
  connection: local
  gather_facts: False
  pre_tasks:
   - fail: msg="Variable 'idr_environment' is not defined"
     when: idr_environment is not defined

- hosts: localhost
  connection: local
  gather_facts: False
  tasks:
   - include_vars:
       file: vars/os-idr-create-{{ idr_environment }}.yml

- import_playbook: ../deployment/ansible/idr-01-install-idr.yml
- import_playbook: ../deployment/ansible/idr-02-services.yml
- import_playbook: ../deployment/ansible/idr-03-postinstall.yml
- import_playbook: idr-links.yml
- import_playbook: idr-oneoff-steps.yml
- import_playbook: ../deployment/ansible/idr-09-monitoring.yml
- import_playbook: notify-slack.yml

joshmoore · 2018-04-17T14:19:53Z

Relaunched travis with gh-109

joshmoore · 2018-04-17T14:55:02Z

Now green with travis in addition to being run against prod50 as outlined above.

sbesson

Given all the ongoing work and the existing issues, I would propose to merge with the caveats discussed above. My understanding is that there is no way to safely get a single script deploying everything in one run with the guarantee that the system will be usable and that we have to evolve towards a multi-stage deployment:

bootstrap (galaxy + provision + network + upgrade + reboot)
deploy (core + vae + ftp depending on the flags)

More discussion will be necessary about the future of the deploy-idr.sh script discussed above. If we are splitting the single script into self-contained groups of phases, an additional thought is whether we should handle the decoupled components via separate standalone deployment phases (instead of using flags), i.e.:

bootstrap (galaxy + provision + network + upgrade + reboot)
deploy_core
deploy_ftp
deploy_vae

Auto-reboot if kernel upgraded immediately after idr-00-preinstall.yml

4c50045

manics added this to the 0.4.8 milestone Apr 16, 2018

manics mentioned this pull request Apr 16, 2018

First reboot of omero* instances after updates may fail #71

Open

sbesson approved these changes Apr 19, 2018


sbesson merged commit 7ce8da3 into IDR:master Apr 19, 2018

manics deleted the reboot-in-preinstall branch April 19, 2018 16:19
