Curator failing to delete old snapshots #37548

Closed
mojamal opened this issue Jan 16, 2019 · 6 comments
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs

Comments

@mojamal

mojamal commented Jan 16, 2019

Elasticsearch version (bin/elasticsearch --version):
$ /usr/share/elasticsearch/bin/elasticsearch -version
Version: 5.6.14, Build: f310fe9/2018-12-05T21:20:16.416Z, JVM: 1.8.0_191

Plugins installed:
x-pack 5.6.14
repository-s3 5.6.14
kibana 5.6.14
cerebro 0.8.1
license 5.x

JVM version (java -version):
$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

OS version (uname -a if on a Unix-like system):
$ uname -a
Linux ip-10-20-8-162 4.4.0-1057-aws #66-Ubuntu SMP Thu May 3 12:49:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"

Description: Curator fails to clean up old snapshots. The delete_snapshots action aborts with an HTTP 500 error from Elasticsearch.

A related GitHub issue, now closed: #27061

Snapshot setup (to reproduce)
Snapshots are scheduled by the following Chef-managed crontab entry (Chef Name: Snapshot ES indices using Curator), which runs daily at 01:30:
30 1 * * * /usr/bin/curator --config /etc/logstash/curator_client_masteronly.yml /etc/logstash/curator_snapshot.yml && sh /etc/logstash/gotel_snapshot_checkin.sh

$ sudo cat /etc/logstash/curator_client_masteronly.yml
client:
  hosts:
    -
  port: 9200
  use_ssl: False
  certificate: '/etc/ssl/certs/ROOT-CA.pem'
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: True

logging:
  loglevel: INFO
  logfile: /mnt/log/elasticsearch/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

$ sudo cat /etc/logstash/curator_snapshot.yml
actions:
  1:
    action: snapshot
    description: >-
      Snapshot selected indices to 'repository' with the snapshot name or name
      pattern in 'name'. Use all other options as assigned
    options:
      repository: brepository
      skip_repo_fs_check: False
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name:
      wait_for_completion: True
      max_wait: 7200
      ignore_unavailable: False
      include_global_state: True
      partial: False
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: regex
      value: .*
  2:
    action: delete_snapshots
    description: >-
      Delete snapshots from repository: brepository older than 15 days.
    options:
      repository: brepository
      disable_action: False
      ignore_empty_list: True
      timeout_override: 3600
    filters:
    - filtertype: pattern
      kind: prefix
      value: curator-
      exclude:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 15
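The two actions are linked through snapshot names: action 1 leaves `name` blank, so Curator uses the default `curator-%Y%m%d%H%M%S` pattern, and action 2 deletes only snapshots that pass both chained filters (the `curator-` prefix and an age of more than 15 days). The Python sketch below approximates that selection under stated assumptions; it is not Curator's implementation, the helper names and the `(name, creation_datetime)` input shape are hypothetical, and Curator's handling of a snapshot exactly 15 days old may differ.

```python
from datetime import datetime, timedelta

# Default name used when 'name' is blank, per the comment in action 1.
DEFAULT_NAME_PATTERN = "curator-%Y%m%d%H%M%S"


def default_snapshot_name(when: datetime) -> str:
    """Render the default snapshot name for a given timestamp."""
    return when.strftime(DEFAULT_NAME_PATTERN)


def select_for_deletion(snapshots, now: datetime, prefix: str = "curator-", days: int = 15):
    """Approximate action 2: keep only names that pass BOTH filters.

    snapshots: iterable of (name, creation_datetime) pairs (hypothetical shape).
    """
    cutoff = now - timedelta(days=days)
    selected = []
    for name, created in snapshots:
        if not name.startswith(prefix):  # filtertype: pattern, kind: prefix
            continue
        if created < cutoff:  # filtertype: age, source: creation_date, direction: older
            selected.append(name)
    return selected


if __name__ == "__main__":
    now = datetime(2019, 1, 15, 1, 33, 36)  # timestamp of the failed run in the logs
    created = [datetime(2018, 12, 31, 1, 30, 2), datetime(2019, 1, 14, 1, 30, 1)]
    snaps = [(default_snapshot_name(t), t) for t in created]
    snaps.append(("manual-backup", datetime(2018, 11, 1)))  # hypothetical non-Curator snapshot
    print(select_for_deletion(snaps, now))  # ['curator-20181231013002']
```

With the reference time set to the failing run (2019-01-15 01:33:36), the only snapshot selected is `curator-20181231013002`, which is the snapshot named in the Elasticsearch log below.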

Relevant logs from triaging the failed snapshot cleanup:
Curator error:

2019-01-15 01:33:36,480 ERROR Failed to complete action: delete_snapshots. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(500, 'index_shard_snapshot_failed_exception', 'error deleting index file [pending-index-29] during cleanup')

Elasticsearch error:
[2019-01-15T01:33:36,473][WARN ][r.suppressed ] path: /_snapshot/brepository/curator-20181231013002, params: {repository=brepository, snapshot=curator-20181231013002}
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: error deleting index file [pending-index-29] during cleanup
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1149) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.delete(BlobStoreRepository.java:1114) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.delete(BlobStoreRepository.java:1042) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:467) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$6(SnapshotsService.java:1309) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576) ~[elasticsearch-5.6.14.jar:5.6.14]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: java.nio.file.NoSuchFileException: Blob [pending-index-29] does not exist
    at org.elasticsearch.repositories.s3.S3BlobContainer.deleteBlob(S3BlobContainer.java:122) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1145) ~[elasticsearch-5.6.14.jar:5.6.14]
    ... 8 more

MAIN ERROR:
Caused by: java.nio.file.NoSuchFileException: Blob [pending-index-29] does not exist
    at org.elasticsearch.repositories.s3.S3BlobContainer.deleteBlob(S3BlobContainer.java:122) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1145) ~[elasticsearch-5.6.14.jar:5.6.14]
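The `path` in the Elasticsearch log shows the request behind the failure: a delete of `curator-20181231013002` from `brepository` through the snapshot API (`DELETE /_snapshot/<repository>/<snapshot>`). To check whether the error reproduces without Curator in the loop, the same call can be issued directly. The Python sketch below only builds that request with the standard library; the base URL `http://localhost:9200` is a hypothetical stand-in because the host is blank in the client config above, and `use_ssl: False` is taken to mean plain HTTP. Sending the request really deletes the snapshot, so the call is left commented out.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical address: the real host is blank in curator_client_masteronly.yml.
BASE_URL = "http://localhost:9200"


def delete_snapshot_request(repository: str, snapshot: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) the DELETE request seen in the ES log path."""
    url = f"{base_url}/_snapshot/{quote(repository, safe='')}/{quote(snapshot, safe='')}"
    return urllib.request.Request(url, method="DELETE")


def send(req: urllib.request.Request, timeout: float = 3600) -> str:
    """Send the request and return the response body.

    timeout mirrors timeout_override: 3600 in action 2. An HTTP 500 like the
    one logged above raises urllib.error.HTTPError.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = delete_snapshot_request("brepository", "curator-20181231013002")
    print(req.get_method(), req.full_url)
    # send(req)  # uncomment only to actually delete the snapshot on a live cluster
```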

polyfractal added the :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Jan 16, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@jmorph99
Contributor

Could this be a duplicate of #30332?

@mojamal
Author

mojamal commented Jan 16, 2019 via email

@jmorph99
Contributor

jmorph99 commented Jan 16, 2019

What version? The pull request said that 5.6.5 may have fixed it.
Edit:
Never mind, you are on 5.6.1. I'd suggest an upgrade.

@mojamal
Author

mojamal commented Jan 16, 2019 via email

@mojamal
Author

mojamal commented Jan 31, 2019

We are no longer able to reproduce the issue.

mojamal closed this as completed Jan 31, 2019