
OSD restart failure while upgrading from Luminous to Nautilus with openstack-ansible #4981

Closed
jftalta opened this issue Jan 23, 2020 · 4 comments


jftalta commented Jan 23, 2020

Bug Report

What happened:
While upgrading OpenStack from Stein to Train and Ceph from Luminous to Nautilus with OSA (and thus ceph-ansible), the OSDs fail to restart and the systemd journal shows these error messages:

Jan 23 10:57:30 dm1osstorage1a ceph-osd[4250]: server name not found: [v2:xx.xxx.210.11:3300 (Name or service not known)
Jan 23 10:57:30 dm1osstorage1a ceph-osd[4250]: unable to parse addrs in '[v2:xx.xxx.210.11:3300,v1:xx.xxx.210.11:6789],[v2:xx.xxx.210.73:3300,v1:xx.xxx.210.73:6789],[v2:xx.xxx.210.196:3300,v1:xx.xxx.210.196:6789]'

Three OSDs are running on three dedicated hosts; the MONs and MGRs are deployed on the three OpenStack controller/infra nodes.

What I expected to happen:
OSDs upgraded to Nautilus, up and running.

How to reproduce it:
Upgrade an OpenStack/Ceph deployment from Stein to Train with OSA 20.0.1:
openstack-ansible ceph-install.yml

Share your group_vars files, inventory:

openstack_user_config.yml:

---
_infrastructure_hosts: &infrastructure_hosts
  dm1osinfra1a:
    ip: xx.xxx.210.200
  dm1osinfra1b:
    ip: xx.xxx.210.201
  dm1osinfra1c:
    ip: xx.xxx.210.202
ceph-osd_hosts:
  dm1osstorage1a:
    ip: xx.xxx.210.210
  dm1osstorage1b:
    ip: xx.xxx.210.211
  dm1osstorage1c:
    ip: xx.xxx.210.212
ceph-mon_hosts: *infrastructure_hosts

user_variables.yml:

## Ceph cluster fsid (must be generated before first run)
## Generate a uuid using: python -c 'import uuid; print(str(uuid.uuid4()))'
generate_fsid: false
fsid: 636ebc87-997d-4c59-96cf-eea3b4e3f506 # Replace with your generated UUID
## ceph-ansible settings
monitor_address_block: "{{ cidr_networks.container }}"
public_network: "{{ cidr_networks.container }}"
cluster_network: "{{ cidr_networks.storage }}"
devices:
  - /dev/sdb
osd_objectstore: bluestore
lvm_volumes:
  - data: /dev/sdb
osd_scenario: lvm
journal_size: 10240 # size in MB
openstack_config: true
cinder_ceph_client: cinder
glance_ceph_client: glance
glance_default_store: rbd
glance_rbd_store_pool: images
nova_libvirt_images_rbd_pool: vms
cinder_backends:
  RBD:
    volume_driver: cinder.volume.drivers.rbd.RBDDriver
    rbd_pool: volumes
    rbd_ceph_conf: /etc/ceph/ceph.conf
    rbd_store_chunk_size: 8
    volume_backend_name: rbddriver
    rbd_user: "{{ cinder_ceph_client }}"
    rbd_secret_uuid: "{{ cinder_ceph_client_uuid }}"
    report_discard_supported: true

Environment:

  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 3.10.0-957.12.2.el7.x86_64
  • Docker version: n/a
  • Ansible version (e.g. ansible-playbook --version): 2.8.5
  • ceph-ansible version (e.g. git head or tag or stable branch): 1906865
  • Ceph version (e.g. ceph -v): upgrading from 12.2.12 to 14.2.6
@jftalta jftalta closed this as completed Jan 23, 2020
@jftalta jftalta reopened this Jan 23, 2020
jftalta (Author) commented Jan 23, 2020

Full ceph-ansible log file.
ceph-ansible.log

dsavineau (Contributor) commented:

I don't know about OSA and the openstack-ansible ceph-install.yml playbook, but the upgrade process is completely different from an initial deployment.
And we have a dedicated playbook for that [1]

AFAIK OSA doesn't use ceph-ansible playbooks but only roles.

[1] https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/rolling_update.yml
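
For reference, a typical invocation of that playbook looks like the following sketch; the inventory file name and the branch are assumptions, adjust them to your deployment:

# From a ceph-ansible checkout on the stable branch matching your target release.
# The -e ireallymeanit=yes flag skips the playbook's interactive confirmation prompt.
ansible-playbook -i hosts infrastructure-playbooks/rolling_update.yml -e ireallymeanit=yes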

dsavineau (Contributor) commented:

To be more precise on your issue: the Luminous to Nautilus upgrade requires enabling msgr2 and updating the ceph.conf file (with v2+v1 mon addresses) [1][2] after the services/packages are upgraded, and this isn't done by the OSA ceph-install.yml playbook.

[1] https://github.com/ceph/ceph-ansible/blob/stable-4.0/infrastructure-playbooks/rolling_update.yml#L979-L997
[2] https://docs.ceph.com/docs/nautilus/releases/nautilus/#upgrading-from-mimic-or-luminous
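
Per the Nautilus upgrade notes [2], that step boils down to something like the following sketch; the mon addresses below are taken from the error log above and are illustrative:

# Once all daemons are running Nautilus, enable the v2 wire protocol on the monitors
ceph mon enable-msgr2

# Then update mon_host in ceph.conf on every node to the v2+v1 form, e.g.:
# mon_host = [v2:xx.xxx.210.11:3300,v1:xx.xxx.210.11:6789],[v2:xx.xxx.210.73:3300,v1:xx.xxx.210.73:6789],[v2:xx.xxx.210.196:3300,v1:xx.xxx.210.196:6789]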

dsavineau (Contributor) commented:

Closing as it's not a ceph-ansible issue.
