Cluster metadata files destroyed when using blob store gateway causing data loss #1564

Closed
alambert opened this issue Dec 22, 2011 · 3 comments

Comments

@alambert
Contributor

I am using ElasticSearch v0.18.5 with the S3 shared storage gateway. Last night, I restarted my single-node cluster; when it came back up, all indexes were lost (no shards or metadata).

The debug logs are below. ES shuts down at 21:36:16,783 and starts up at 21:36:45,830. Both before and after the restart, multiple threads run Gateway#write() concurrently (elasticsearch/modules/elasticsearch/src/main/java/org/elasticsearch/gateway/shared/SharedStorageGateway.java). In the blob store implementation (elasticsearch/modules/elasticsearch/src/main/java/org/elasticsearch/gateway/blobstore/BlobStoreGateway.java), once the metadata write completes, every metadata file other than the one just written is deleted. Before the restart, each thread writes its own metadata file and then deletes the others', so no metadata file survives and the cluster starts up with no metadata (causing index loss). After the restart, the two threads write to the same metadata file.
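
To make the interleaving concrete, here is a minimal, simplified sketch of the race as I understand it from the logs. It is not the actual BlobStoreGateway code: FakeBlobStore, writeStep, cleanupStep and the two replay methods are hypothetical names invented for illustration, and the blob store is just an in-memory map. The first replay mirrors the order seen at 21:34 (two writers each PUT their own blob, then each deletes everything but its own); the second shows the same steps with the writes serialized, which is only one possible remedy and not necessarily what the eventual fix does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MetadataRaceSketch {

    /** Hypothetical in-memory stand-in for the bucket's metadata/ prefix. */
    static final class FakeBlobStore {
        private final Map<String, String> blobs = new TreeMap<>();

        void put(String name, String content) { blobs.put(name, content); }
        void delete(String name) { blobs.remove(name); }
        /** Returns a snapshot of the blob names, so callers can delete while iterating. */
        List<String> list() { return new ArrayList<>(blobs.keySet()); }
    }

    /** Step 1 of a gateway write: PUT the new metadata blob. */
    static void writeStep(FakeBlobStore store, String name) {
        store.put(name, "cluster metadata");
    }

    /** Step 2 of a gateway write: delete every blob except the one just written. */
    static void cleanupStep(FakeBlobStore store, String keep) {
        for (String blob : store.list()) {
            if (!blob.equals(keep)) {
                store.delete(blob);
            }
        }
    }

    /** Replays the problematic order: both writers PUT before either one cleans up. */
    static List<String> replayInterleaved() {
        FakeBlobStore store = new FakeBlobStore();
        store.put("metadata-913", "previous metadata");
        writeStep(store, "metadata-914");   // writer A
        writeStep(store, "metadata-916");   // writer C
        cleanupStep(store, "metadata-916"); // C deletes 913 and 914
        cleanupStep(store, "metadata-914"); // A deletes 916, nothing is left
        return store.list();
    }

    /** Same steps, but each write finishes its cleanup before the next one starts. */
    static List<String> replaySerialized() {
        FakeBlobStore store = new FakeBlobStore();
        store.put("metadata-913", "previous metadata");
        writeStep(store, "metadata-914");
        cleanupStep(store, "metadata-914");
        writeStep(store, "metadata-916");
        cleanupStep(store, "metadata-916");
        return store.list();
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + replayInterleaved());
        System.out.println("serialized:  " + replaySerialized());
    }
}
```

With the interleaved order the store ends up empty, which matches the "Latest metadata found at index [-1]" line after the restart; with the serialized order only metadata-916 remains.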

root@aws-e1b-12:/md/elasticsearch/app/elasticsearch-0.18.5/logs# egrep "(to gateway|metadata%2F|metadata found|stopping|initializing)" dev.log.2011-12-21 | grep " 21:3" | head -26
[2011-12-21 21:34:13,468][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] writing to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@2547c6ef ...
[2011-12-21 21:34:13,471][INFO ][com.amazonaws.request    ] Sending Request: PUT http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-914 Headers: (Content-Length: 109057, Content-Type: application/octet-stream, ) 
[2011-12-21 21:34:13,636][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] writing to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@ab78020 ...
[2011-12-21 21:34:13,641][INFO ][com.amazonaws.request    ] Sending Request: PUT http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-914 Headers: (Content-Length: 109223, Content-Type: application/octet-stream, ) 
[2011-12-21 21:34:13,747][INFO ][com.amazonaws.request    ] Sending Request: DELETE http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-913 Headers: (Content-Type: application/x-www-form-urlencoded; charset=utf-8, ) 
[2011-12-21 21:34:13,769][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] wrote to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@ab78020, took 133ms
[2011-12-21 21:34:13,854][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] writing to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@5b8b8615 ...
[2011-12-21 21:34:13,857][INFO ][com.amazonaws.request    ] Sending Request: PUT http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-916 Headers: (Content-Length: 111257, Content-Type: application/octet-stream, ) 
[2011-12-21 21:34:14,026][INFO ][com.amazonaws.request    ] Sending Request: DELETE http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-914 Headers: (Content-Type: application/x-www-form-urlencoded; charset=utf-8, ) 
[2011-12-21 21:34:14,045][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] wrote to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@5b8b8615, took 191ms
[2011-12-21 21:34:15,451][INFO ][com.amazonaws.request    ] Sending Request: DELETE http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-916 Headers: (Content-Type: application/x-www-form-urlencoded; charset=utf-8, ) 
[2011-12-21 21:34:15,472][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] wrote to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@2547c6ef, took 2s
[2011-12-21 21:36:16,783][INFO ][node                     ] [aws-e1b-12.xxxxxxxx.net] {0.18.5}[2514]: stopping ...
[2011-12-21 21:36:45,830][INFO ][node                     ] [aws-e1b-12.xxxxxxxx.net] {0.18.5}[30336]: initializing ...
[2011-12-21 21:36:49,640][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] Latest metadata found at index [-1]
[2011-12-21 21:36:53,510][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] writing to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@2ef9748f ...
[2011-12-21 21:36:53,653][INFO ][com.amazonaws.request    ] Sending Request: PUT http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-0 Headers: (Content-Length: 43, Content-Type: application/octet-stream, ) 
[2011-12-21 21:36:53,776][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] wrote to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@2ef9748f, took 266ms
[2011-12-21 21:37:37,034][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] writing to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@161e14f0 ...
[2011-12-21 21:37:37,084][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] writing to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@b5a191e ...
[2011-12-21 21:37:37,775][INFO ][com.amazonaws.request    ] Sending Request: PUT http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-1 Headers: (Content-Length: 210, Content-Type: application/octet-stream, ) 
[2011-12-21 21:37:37,781][INFO ][com.amazonaws.request    ] Sending Request: PUT http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-1 Headers: (Content-Length: 376, Content-Type: application/octet-stream, ) 
[2011-12-21 21:37:37,861][INFO ][com.amazonaws.request    ] Sending Request: DELETE http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-0 Headers: (Content-Type: application/x-www-form-urlencoded; charset=utf-8, ) 
[2011-12-21 21:37:37,872][INFO ][com.amazonaws.request    ] Sending Request: DELETE http://xxxxxxxx-app-dev-es.s3.amazonaws.com /dev%2Fmetadata%2Fmetadata-0 Headers: (Content-Type: application/x-www-form-urlencoded; charset=utf-8, ) 
[2011-12-21 21:37:37,878][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] wrote to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@161e14f0, took 844ms
[2011-12-21 21:37:37,887][DEBUG][gateway.s3               ] [aws-e1b-12.xxxxxxxx.net] wrote to gateway org.elasticsearch.gateway.shared.SharedStorageGateway$2@b5a191e, took 802ms
root@aws-e1b-12:/md/elasticsearch/app/elasticsearch-0.18.5/logs# 

I can provide additional logs if needed. Thank you!

@kimchy
Member

kimchy commented Dec 22, 2011

Yes, you are right. I will push a fix (nice catch!).

kimchy closed this as completed in 3d9e872 Dec 22, 2011
@kimchy
Member

kimchy commented Dec 22, 2011

Also backported to the 0.18 branch; if 0.18.7 is released, it will include the fix.

@alambert
Contributor Author

Thank you!
