Disk snapshots disappearing #2687

Closed
tkald opened this Issue Dec 5, 2018 · 3 comments


tkald commented Dec 5, 2018

Description
Somewhere after the upgrade to OpenNebula 5.6, my VM disk snapshots on Ceph storage started to misbehave.
Disk snapshots previously taken with 5.4 are overwritten by new snapshots: the snapshot index starts again from zero. Sometimes duplicate snapshot indexes also appear.
[screenshot: snapshot list showing duplicate indexes]

Snapshots taken with 5.2 disappear completely from Sunstone if I try to take a new snapshot with 5.6.
Before taking a new disk snapshot:
[screenshot: snapshot list before the new snapshot]
And after:
[screenshot: snapshot list after the new snapshot]
I am also unable to take any new disk snapshots on that disk image.

Listing the RBD disk snapshots on the Ceph cluster shows that the snapshots are still present:

rbd snap ls one/one-93
SNAPID NAME     SIZE
    81 0    24576 MB
   922 1    24576 MB
   923 2    24576 MB

The VM log also shows that OpenNebula is trying to overwrite existing snapshots:

Wed Dec 5 20:21:19 2018 [Z0][VM][I]: New LCM state is DISK_SNAPSHOT
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: Command execution failed (exit code: 17): /var/lib/one/remotes/tm/ceph/snap_create_live r620-4:/var/lib/one//datastores/100/1587/disk.0 0 1587 101
Wed Dec 5 20:21:21 2018 [Z0][VMM][E]: snap_create_live: Command " set -e -o pipefail
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: 
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: if virsh -c qemu:///system domfsfreeze one-1587 ; then
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: trap "virsh -c qemu:///system domfsthaw one-1587" EXIT TERM INT HUP
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: fi
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: 
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: RBD="rbd --id libvirt"
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: 
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: rbd_check_2 one/one-93
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: 
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: rbd --id libvirt snap create one/one-93@0
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: rbd --id libvirt snap protect one/one-93@0" failed: rbd: failed to create snapshot: (17) File exists
Wed Dec 5 20:21:21 2018 [Z0][VMM][E]: Error creating snapshot one/one-93@0
Wed Dec 5 20:21:21 2018 [Z0][VMM][I]: Failed to execute transfer manager driver operation: tm_snap_create_live.
Wed Dec 5 20:21:21 2018 [Z0][VMM][E]: Error creating new disk snapshot: Error creating snapshot one/one-93@0
Wed Dec 5 20:21:21 2018 [Z0][VM][I]: New LCM state is RUNNING
Wed Dec 5 20:21:21 2018 [Z0][LCM][E]: Could not take disk snapshot.
Wed Dec 5 20:23:16 2018 [Z0][VM][I]: New state is ACTIVE
Wed Dec 5 20:23:16 2018 [Z0][VM][I]: New LCM state is DISK_SNAPSHOT
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: Command execution failed (exit code: 17): /var/lib/one/remotes/tm/ceph/snap_create_live r620-4:/var/lib/one//datastores/100/1587/disk.0 1 1587 101
Wed Dec 5 20:23:18 2018 [Z0][VMM][E]: snap_create_live: Command " set -e -o pipefail
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: 
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: if virsh -c qemu:///system domfsfreeze one-1587 ; then
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: trap "virsh -c qemu:///system domfsthaw one-1587" EXIT TERM INT HUP
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: fi
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: 
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: RBD="rbd --id libvirt"
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: 
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: rbd_check_2 one/one-93
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: 
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: rbd --id libvirt snap create one/one-93@1
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: rbd --id libvirt snap protect one/one-93@1" failed: rbd: failed to create snapshot: (17) File exists
Wed Dec 5 20:23:18 2018 [Z0][VMM][E]: Error creating snapshot one/one-93@1
Wed Dec 5 20:23:18 2018 [Z0][VMM][I]: Failed to execute transfer manager driver operation: tm_snap_create_live.
Wed Dec 5 20:23:18 2018 [Z0][VMM][E]: Error creating new disk snapshot: Error creating snapshot one/one-93@1
Wed Dec 5 20:23:18 2018 [Z0][VM][I]: New LCM state is RUNNING
Wed Dec 5 20:23:18 2018 [Z0][LCM][E]: Could not take disk snapshot.

To Reproduce
Steps to reproduce the behavior.

Expected behavior
New disk snapshots are created with unused indexes, and previously taken snapshots remain visible in Sunstone.

Details

  • Affected Component: [Sunstone, Storage]
  • Hypervisor: [KVM on Ubuntu 16.04]
  • Version: [5.6.2]

Additional context
Ceph version 10.2.11

Progress Status

  • Branch created
  • Code committed to development branch
  • Testing - QA
  • Documentation
  • Release notes - resolved issues, compatibility, known issues
  • Code committed to upstream release/hotfix branches
  • Documentation committed to upstream release/hotfix branches

tkald commented Dec 5, 2018

UPDATE:
The 4th snapshot attempt succeeded:
[screenshot: snapshot list after the 4th attempt]

rbd snap ls one/one-93
SNAPID NAME     SIZE
    81 0    24576 MB
   922 1    24576 MB
   923 2    24576 MB
  1613 3    24576 MB

But the old snapshots are still not visible in Sunstone.

For the snapshots taken with 5.4 where duplicate indexes were shown (1st picture of the original post), the old snapshots are no longer present on the Ceph cluster:

rbd snap ls one/one-339
SNAPID NAME     SIZE
  1612 0    81920 MB

vholer commented Dec 13, 2018

In 5.4, there was a problem where the next snapshot ID was always calculated as maximum+1 over the current list of snapshots, which led to snapshot IDs being reused (if some of them were deleted from the tail and OpenNebula was restarted in the meantime). As part of the fix for #2189, we started using a persistent value, NEXT_SNAPSHOT (id), which is now part of the VM / image templates.
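
The reuse problem can be sketched in a few lines of Ruby (illustrative only, not OpenNebula source):

```ruby
# Sketch of the 5.4 ID-reuse problem: the next snapshot ID was
# recomputed as max+1 over the *current* snapshot list, so deleting
# the tail snapshot and restarting OpenNebula hands out an ID that
# may still exist as an RBD snapshot name on the Ceph cluster.
snapshot_ids = [0, 1, 2]

next_id = snapshot_ids.max + 1   # => 3, fine so far

snapshot_ids.pop                 # snapshot 2 deleted from the tail
# after a restart, the ID is recomputed from the remaining list:
next_id = snapshot_ids.max + 1   # => 2, reused; "rbd snap create ...@2"
                                 # then fails with EEXIST if the old
                                 # RBD snapshot "2" was left behind
```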

On database upgrade, NEXT_SNAPSHOT (id) should be calculated from the list of existing snapshots. But it looks to me like there is a wrong condition check in the database migrator, and the calculated NEXT_SNAPSHOT isn't persisted into the templates. This leads to snapshot IDs being reused again from 0 for all existing VMs.

sxml = doc.xpath("//SNAPSHOTS")
if !sxml
    ns = doc.create_element("NEXT_SNAPSHOT")
    ns.content = next_snapshot
    sxml = sxml.first.add_child(ns)
end
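
If that diagnosis is right, the corrected guard would test whether the XPath result is empty rather than nil, since a node set is truthy even when it contains no elements. The sketch below is illustrative only: it uses stdlib REXML to stay self-contained (the real migrator uses Nokogiri, so method names differ), and `persist_next_snapshot` is a hypothetical helper name.

```ruby
require 'rexml/document'

# Illustrative sketch of the corrected migrator logic. Assumption:
# next_snapshot has already been computed as max(snapshot ID) + 1.
# The suspected bug in the quoted code: `if !sxml` never fires because
# the node set is truthy even when empty, so NEXT_SNAPSHOT is never
# written. The guard must test for non-emptiness instead.
def persist_next_snapshot(doc, next_snapshot)
  sxml = REXML::XPath.match(doc, "//SNAPSHOTS")

  unless sxml.empty?                       # not: if !sxml
    ns = REXML::Element.new("NEXT_SNAPSHOT")
    ns.text = next_snapshot.to_s
    sxml.first.add_element(ns)
  end
end

body = REXML::Document.new(
  "<VM><SNAPSHOTS><SNAPSHOT><ID>2</ID></SNAPSHOT></SNAPSHOTS></VM>"
)
persist_next_snapshot(body, 3)
```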

Just a quick note, I might be wrong.

vholer added a commit that referenced this issue Dec 13, 2018

vholer added a commit that referenced this issue Dec 17, 2018

vholer added a commit that referenced this issue Dec 17, 2018

vholer added a commit that referenced this issue Dec 18, 2018

vholer added a commit that referenced this issue Dec 18, 2018


vholer commented Dec 18, 2018

We have:

  1. a database fix when upgrading to 5.7.80, to be merged into master (#2739)
  2. a documentation update (OpenNebula/docs#448) for one-5.6 and one-5.6-maintenance, with workarounds for 5.6 users
  3. TBD - inform users in the forum