Auto-reboot if kernel upgraded immediately after idr-00-preinstall.yml #108

Merged
merged 1 commit into from
Apr 19, 2018

Contributor

@manics manics commented Apr 16, 2018

This reduces but does not eliminate some long-standing issues with race conditions during reboots. The original aim of this repository was to support a full deployment run-through from scratch, with reboots postponed to the very end. In practice this causes problems when installed services are unnecessarily interrupted, or when a service is installed against an old running kernel that changes after the reboot. Instead, this PR executes the reboot at the end of the pre-install stage, after system packages have been upgraded but before any applications are installed.

This will cause the first full run-through to fail after the pre-install stage: the combination of multiple VM reboots via a proxy and rebooting the proxy itself makes it too complicated to auto-detect when all VMs have resumed. However, this is preferable to the other problems caused later.
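
For illustration only, the condition in the title (reboot only when the package upgrade installed a kernel newer than the running one) can be sketched in shell. This is not the Ansible implementation in this PR: the function name `kernel_changed` is hypothetical, and the commented `rpm` query assumes an RPM-based host such as CentOS.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the newest installed kernel
# differs from the running one, i.e. a reboot is needed to pick it up.
kernel_changed() {
    running="$1"   # e.g. the output of: uname -r
    newest="$2"    # e.g. the newest installed kernel's version-release.arch
    [ "$running" != "$newest" ]
}

# Assumed usage on an RPM-based host (not executed here):
#   newest=$(rpm -q --last kernel | head -n 1 | awk '{print $1}' | sed 's/^kernel-//')
#   kernel_changed "$(uname -r)" "$newest" && echo "reboot required"
```

Keeping the comparison separate from the package query means the decision can be checked without touching a real host.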

@joshmoore
Member

The description here, with its planned failure, seems at best like a workaround. Is deploy-idr.sh trying to do too much? Is there a two-stage way of calling it that would not fail?

@manics
Contributor Author

manics commented Apr 17, 2018

deploy-idr.sh currently has 7 steps:

- 'galaxy': Install galaxy roles
- 'provision': Provision OpenStack instances, storage and networks
- 'network': Miscellaneous network configuration inside instances
- 'deploy-pre': Initialise instances, including updating packages
- 'deploy': Deploy the IDR
- 'deploy-apps': Deploy public IDR apps
- 'deploy-post': Additional setup including OMERO accounts and monitoring

We could remove 'all' and recommend running the stages individually, or add additional groupings?
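
A minimal sketch of what "additional groupings" might look like, assuming a helper that expands a group name into the stage names listed above. The function name `expand_stages` and the group names `bootstrap` and `deploy-all` are hypothetical; they are not taken from deploy-idr.sh.

```shell
#!/bin/sh
# The seven stage names come from the list above; the "bootstrap" and
# "deploy-all" group names are hypothetical.
expand_stages() {
    case "$1" in
        all)        echo "galaxy provision network deploy-pre deploy deploy-apps deploy-post" ;;
        bootstrap)  echo "galaxy provision network deploy-pre" ;;
        deploy-all) echo "deploy deploy-apps deploy-post" ;;
        galaxy|provision|network|deploy-pre|deploy|deploy-apps|deploy-post)
                    echo "$1" ;;   # a single stage expands to itself
        *)          echo "unknown stage or group: $1" >&2; return 1 ;;
    esac
}
```

With this, `expand_stages bootstrap` prints `galaxy provision network deploy-pre`, and an unrecognised name returns a non-zero status.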

@joshmoore
Member

I tested this locally by using:

$ deploy-idr.sh prod50 all  # failed for some unrelated issue
$ deploy-idr.sh prod50 expert deployment/ansible/idr-00-preinstall.yml

For me, the above, which amounts to galaxy + provision + network + pre-install + the necessary reboot, is a pretty good function for this script (though I'd still say it needs to be in this repo). This amounts to "get me started" or bootstrap. The rest is pretty straightforward on a playbook-by-playbook basis, e.g. I'm currently testing with this full.yml:


# Common usage:
# ansible-playbook playbooks/full.yml -u centos -e idr_environment=testXX

- hosts: localhost
  connection: local
  gather_facts: False
  pre_tasks:
   - fail: msg="Variable 'idr_environment' is not defined"
     when: idr_environment is not defined

- hosts: localhost
  connection: local
  gather_facts: False
  tasks:
   - include_vars:
       file: vars/os-idr-create-{{ idr_environment }}.yml

- import_playbook: ../deployment/ansible/idr-01-install-idr.yml
- import_playbook: ../deployment/ansible/idr-02-services.yml
- import_playbook: ../deployment/ansible/idr-03-postinstall.yml
- import_playbook: idr-links.yml
- import_playbook: idr-oneoff-steps.yml
- import_playbook: ../deployment/ansible/idr-09-monitoring.yml
- import_playbook: notify-slack.yml

@joshmoore
Member

Relaunched travis with gh-109

@joshmoore
Member

Now green with travis in addition to being run against prod50 as outlined above.

Member

@sbesson sbesson left a comment

Given all the ongoing work and the existing issues, I would propose to merge with the caveats discussed above. My understanding is that there is no way to safely get a single script deploying everything in one run with the guarantee that the system will be usable and that we have to evolve towards a multi-stage deployment:

bootstrap (galaxy + provision + network + upgrade + reboot)
deploy (core + vae + ftp depending on the flags)

More discussion will be necessary about the future of the deploy-idr.sh script discussed above. If we are splitting the single script into self-contained groups of phases, an additional thought is whether we should handle the decoupled components via separate standalone deployment phases (instead of using flags), i.e.:

bootstrap (galaxy + provision + network + upgrade + reboot)
deploy_core
deploy_ftp
deploy_vae
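
As a rough sketch of the standalone-phase idea, assuming each phase is a shell function selected by name rather than by flag: the phase names are the ones proposed above, while `run_phases` and the placeholder bodies are hypothetical and only echo their own name.

```shell
#!/bin/sh
# Placeholder phase bodies: in practice each would run the relevant playbooks.
bootstrap()   { echo "bootstrap"; }
deploy_core() { echo "deploy_core"; }
deploy_ftp()  { echo "deploy_ftp"; }
deploy_vae()  { echo "deploy_vae"; }

# Run the named phases in order, stopping at the first failure or unknown name.
run_phases() {
    for phase in "$@"; do
        case "$phase" in
            bootstrap|deploy_core|deploy_ftp|deploy_vae)
                "$phase" || return 1 ;;
            *)
                echo "unknown phase: $phase" >&2
                return 1 ;;
        esac
    done
}
```

For example, `run_phases bootstrap deploy_core deploy_vae` would deploy without the FTP component, with no flag needed to switch it off.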

@sbesson sbesson merged commit 7ce8da3 into IDR:master Apr 19, 2018
@manics manics deleted the reboot-in-preinstall branch April 19, 2018 16:19