
Delete snapshot api complains that there is no space left on device #26730


Description

@ppf2

Reproducible on 5.5.

  • Start a node with a file system repository path (path.repo) pointed at a mount on a small volume.
  • Register a snapshot repository at this mount path.
  • Create a few snapshots.
  • Use dd or other means to generate a large file on that volume so that all remaining space is used.
  • Use the delete snapshot API to delete one of the snapshots. The call fails, complaining that there is no space left on the device (a repro sketch with concrete commands follows the error below).
{"error":{"root_cause":[{"type":"repository_exception","reason":"[backup] failed to delete snapshot [snapshot4/fv_jj1JOQMilPvL4jHjExQ]"}],"type":"repository_exception","reason":"[backup] failed to delete snapshot [snapshot4/fv_jj1JOQMilPvL4jHjExQ]","caused_by":{"type":"i_o_exception","reason":"No space left on device"}},"status":500}

The only indication that the delete snapshot call actually creates files is that it appears to write a new "index generation" blob to the snapshot metadata store:

[2017-09-21T03:46:42,183][DEBUG][o.e.s.SnapshotsService   ] [TwOmS-G] deleted snapshot is not running - deleting files
[2017-09-21T03:46:42,184][DEBUG][o.e.c.s.ClusterService   ] [TwOmS-G] processing [delete snapshot]: took [3ms] done applying updated cluster_state (version: 7, uuid: VlTYntZNRdmhxQBSB0i8yQ)
[2017-09-21T03:46:42,200][DEBUG][o.e.r.f.FsRepository     ] [TwOmS-G] Repository [backup] writing new index generational blob [index-6]
[2017-09-21T03:46:42,202][DEBUG][o.e.c.s.ClusterService   ] [TwOmS-G] processing [remove snapshot deletion metadata]: execute
[2017-09-21T03:46:42,202][DEBUG][o.e.c.s.ClusterService   ] [TwOmS-G] cluster state updated, version [8], source [remove snapshot deletion metadata]
[2017-09-21T03:46:42,202][DEBUG][o.e.c.s.ClusterService   ] [TwOmS-G] publishing cluster state version [8]
[2017-09-21T03:46:42,202][DEBUG][o.e.c.s.ClusterService   ] [TwOmS-G] applying cluster state version 8
[2017-09-21T03:46:42,202][DEBUG][o.e.c.s.ClusterService   ] [TwOmS-G] set local cluster state to version 8
[2017-09-21T03:46:42,206][WARN ][r.suppressed             ] path: /_snapshot/backup/snapshot4, params: {repository=backup, snapshot=snapshot4}
org.elasticsearch.repositories.RepositoryException: [backup] failed to delete snapshot [snapshot4/fv_jj1JOQMilPvL4jHjExQ]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:494) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$6(SnapshotsService.java:1263) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.5.0.jar:5.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.io.IOException: No space left on device
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
	at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[?:?]
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:?]
	at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:?]
	at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[?:?]
	at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_144]
	at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_144]
	at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_144]
	at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_144]
	at org.elasticsearch.common.io.Streams.copy(Streams.java:80) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.blobstore.fs.FsBlobContainer.writeBlob(FsBlobContainer.java:131) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.writeAtomic(BlobStoreRepository.java:953) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.writeIndexGen(BlobStoreRepository.java:842) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:441) ~[elasticsearch-5.5.0.jar:5.5.0]
	... 5 more
[2017-09-21T03:46:42,216][DEBUG][o.e.c.s.ClusterService   ] [TwOmS-G] processing [remove snapshot deletion metadata]: took [13ms] done applying updated cluster_state (version: 8, uuid: NByDoHeYRl2rCXeiPn2LTA)
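
For context, the top level of the fs repository looks roughly like this (a sketch assuming the standard blobstore layout; the generation number and snapshot UUID are taken from the logs above, the rest is illustrative). Each snapshot operation writes the next index-N generation file, and index.latest records the current generation:

$ ls -F /mnt/small-volume/backup
index-5  index.latest  indices/
meta-fv_jj1JOQMilPvL4jHjExQ.dat  snap-fv_jj1JOQMilPvL4jHjExQ.dat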

This behavior seems counterintuitive. Naturally, when the repository path fills up, admins will try to delete older snapshots, expecting the delete to remove the segment files in the repository that are no longer referenced by any remaining snapshot, thereby reclaiming space. Because the delete snapshot API actually ends up creating new files, they cannot quickly recover space using our APIs. Are we somehow creating the new index generation in the snapshot metadata store first (before deleting the older index generation files), instead of doing it in the reverse order?
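
If that ordering is right, a stopgap (untested sketch) is to free a small amount of space outside the repository so the new index-N blob can be written, then retry the delete to reclaim the unreferenced segment files:

# Free a little space so writeIndexGen can write the next index-N blob
rm /mnt/small-volume/filler

# Retry the delete; it should now be able to remove the unreferenced files
curl -XDELETE 'localhost:9200/_snapshot/backup/snapshot4'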
