Snapshots are taking more space even if no changes happened #8119
Is it possible that some primary shards for this index switched between snapshots?
The cluster is healthy, so I think not. Do I understand this correctly: if a shard is snapshotted, then a replica becomes primary, then a snapshot is taken again, and this procedure is repeated 10 times, the snapshot is going to be 10 times bigger? And what about when a replica is rebuilt from the primary (the previous replica is totally lost)? Let me know if I can add more info to help you with this issue.
We copy only files that changed since the last snapshot. If you had a replica that was never synced with the primary and the primary went down, it's possible to end up with another copy, but it would stop there. So that can explain a 2x difference in size, but not a 10x difference. Could you send us these two files:
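The dedup logic described above can be sketched as follows. This is a toy model, not Elasticsearch code: a "file" is just a name-to-bytes mapping, and the per-segment checksum is stood in for by SHA-1. A file is copied into a new snapshot only when its name and checksum are not already recorded in the previous snapshot's metadata.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in for the per-file checksum stored in snapshot metadata."""
    return hashlib.sha1(data).hexdigest()


def incremental_snapshot(files: dict, previous_meta: dict):
    """Decide which files to copy into the new snapshot.

    `files` maps file name -> contents on disk; `previous_meta` maps
    file name -> checksum recorded by the previous snapshot.  A file
    is skipped only when both name and checksum match.  Returns
    (names to copy, metadata for the new snapshot).
    """
    to_copy, new_meta = [], {}
    for name, data in files.items():
        c = checksum(data)
        new_meta[name] = c
        if previous_meta.get(name) != c:
            to_copy.append(name)
    return to_copy, new_meta


# First snapshot copies everything; a second one with no changes copies nothing.
index = {"seg_1": b"aaa", "seg_2": b"bbb"}
copied, meta = incremental_snapshot(index, {})
copied_again, _ = incremental_snapshot(index, meta)
print(copied, copied_again)  # ['seg_1', 'seg_2'] []
```

With working metadata, snapshotting an unchanged read-only index should therefore copy zero files, which is why the sizes reported in this issue look wrong.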
Do you know which version of elasticsearch you had when the index statistics-20140108 was created?
2x is better than 10x, but can we do better? Is there an API to sync replicas with primaries at the byte level, so that even if a replica becomes primary the snapshot is a no-op? https://gist.github.com/bobrik/c7ab1e0df88f0585f274 here are the files you requested. The Elasticsearch version was from the 0.90.x line, but those specific indices were probably restored from a snapshot on 1.3.2. Actually, it looks like all problematic indices were restored from a snapshot with renaming, and the good indices weren't restored at all (they have been here since 0.90.x). That should help.
Which version of elasticsearch did you use to restore these indices from snapshot? |
1.3.2 was used for the snapshot and restore. The cluster has been on 1.3.4 since the day of its release, so in all comments above you should assume 1.3.4 instead of 1.3.2. Current snapshots are made on 1.3.4.
So, the index was created with 0.90 and upgraded to 1.3.2, then you created a snapshot with 1.3.2, restored this index while renaming it in 1.3.2, and now you are creating snapshots with 1.3.4 and they are duplicated. Is this description correct? Could you also send us files
Yes, this looks correct, but the index was upgraded from 0.90 to 1.3.2 through many intermediate versions (1.0, 1.1, and 1.2 held those indices too). Here are the files from the root of the repository: https://gist.github.com/bobrik/d1deb9239c59db998f24
I see. One last piece of information (hopefully). Could you also post these two files:
They are the same:

```json
{"statistics-20140108":{"version":8,"state":"open","settings":{"index.number_of_replicas":"1","index.version.created":"900999","index.number_of_shards":"5","index.uuid":"7OLwrzjOSFemAoXS1XB2qg","index.codec.bloom.load":"false"},"mappings":[{"markers":{"_all":{"enabled":false},"properties":{"@message":{"type":"string"},"@timestamp":{"type":"date","format":"dateOptionalTime"}}}},{"precise":{"_all":{"enabled":false},"_routing":{"required":true,"path":"@key"},"properties":{"@key":{"type":"string","index":"not_analyzed"},"@precise":{"type":"double"},"@timestamp":{"type":"date","format":"dateOptionalTime"}}}},{"events":{"_all":{"enabled":false},"_routing":{"required":true,"path":"@key"},"properties":{"---":{"type":"long"},"@key":{"type":"string","index":"not_analyzed"},"@timestamp":{"type":"date","format":"dateOptionalTime"},"@value":{"type":"long"},"ad":{"type":"string"},"age":{"type":"long"},"app":{"type":"string","index":"not_analyzed"},"cit":{"type":"string","index":"not_analyzed"},"cnt":{"type":"string","index":"not_analyzed"},"con":{"type":"string","index":"not_analyzed"},"cor":{"type":"long"},"cvn":{"type":"string","index":"not_analyzed"},"lng":{"type":"string","index":"not_analyzed"},"mob":{"type":"long"},"mtd":{"type":"string","index":"not_analyzed"},"nic":{"type":"long"},"nov":{"type":"long"},"plc":{"type":"string","index":"not_analyzed"},"plt":{"type":"string","index":"not_analyzed"},"pwr":{"type":"string","index":"not_analyzed"},"ref":{"type":"string","index":"not_analyzed"},"sbs":{"type":"long"},"sex":{"type":"long"},"spc":{"type":"long"},"spl":{"type":"string","index":"not_analyzed"},"tag":{"type":"string","index":"not_analyzed"},"tgt":{"type":"string","index":"not_analyzed"},"trs":{"type":"string","index":"not_analyzed"},"val":{"type":"string","index":"not_analyzed"},"wsh":{"type":"string"}}}}],"aliases":{}}}
```

All of them:
@bobrik I was able to reproduce the issue. It turned out that the cleanup process in v1.3.0+ mistakenly deletes information about legacy checksums (checksums for segments created with old versions of elasticsearch) at the end of restore. As a result, consecutive snapshots don't find the checksum in the snapshot metadata and have to fall back to creating copies of these old segments again and again. To reproduce this issue:
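The failure mode described above can be illustrated with a toy model (hypothetical names; this is not Elasticsearch code). Once the restore cleanup drops the legacy checksum entry from the metadata, every subsequent snapshot sees the unchanged legacy segment as "new" and copies it again:

```python
def files_to_copy(files: dict, previous_meta: dict) -> list:
    """`files` maps segment name -> checksum on disk.  Copy whatever the
    previous snapshot's metadata does not record with the same checksum."""
    return [name for name, c in files.items() if previous_meta.get(name) != c]


segments = {"legacy_seg": "abc123", "new_seg": "def456"}

# Healthy flow: metadata from snapshot N feeds snapshot N+1 -> nothing copied.
meta = dict(segments)
healthy = files_to_copy(segments, meta)

# Buggy flow: the restore cleanup deletes the legacy checksum entry, so every
# later snapshot re-copies the unchanged legacy segment.
meta_after_restore = {k: v for k, v in meta.items() if k != "legacy_seg"}
buggy = files_to_copy(segments, meta_after_restore)
print(healthy, buggy)  # [] ['legacy_seg']
```

Note that the re-copied segment never changes, so the wasted copy repeats on every snapshot, which matches the ever-growing snapshots reported in this issue.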
Great! Any thoughts on which release the fix will land in? And what will happen once it's applied: just one final snapshot with checksums, or something else?
It's going to land in 1.3.6 and 1.4.1. The fix is not going to restore checksums for old segments that were restored with elasticsearch v1.3.0-1.3.5, though. You will need to restore indices with such segments again in v1.3.6+, or upgrade them to the new version using the upgrade API.
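The re-restore workaround goes through the restore API (`POST /_snapshot/<repo>/<snapshot>/_restore`). A minimal sketch of building the request body, assuming the 1.x `indices`/`rename_pattern`/`rename_replacement` fields; the index name comes from this thread, while the rename pattern and replacement are placeholder examples:

```python
import json


def restore_request(indices, rename_pattern=None, rename_replacement=None):
    """Build a body for POST /_snapshot/<repo>/<snapshot>/_restore (ES 1.x).

    Re-restoring on a fixed version (1.3.6+) regenerates the checksum
    metadata that the buggy cleanup deleted.
    """
    body = {"indices": ",".join(indices)}
    if rename_pattern is not None:
        body["rename_pattern"] = rename_pattern
        body["rename_replacement"] = rename_replacement
    return body


# Restore the problematic index under a new name, e.g. restored_statistics-20140108.
body = restore_request(["statistics-20140108"],
                       rename_pattern="(.+)",
                       rename_replacement="restored_$1")
print(json.dumps(body))
```

This only constructs the JSON body; sending it (with curl or an HTTP client) and deleting or re-aliasing the old index afterwards is left out of the sketch.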
…estore This commit fixes an issue caused by the restore process deleting all legacy checksum files at the end of the restore. Instead, it now keeps the latest version of the checksum intact. The issue manifests itself as lost checksums for all legacy files restored into a post-1.3.0 cluster, which in turn causes unnecessary snapshotting of files that didn't change. Fixes elastic#8119
I took 2 snapshots of read-only indices with curator, and some indices were snapshotted again even though they didn't have any changes.
Look at the first backup (50 oldest indices):
And the subsequent backup, same indices:
Segments are here:
Those indices should be roughly the same size in each snapshot.
I also took a dir diff between the two backups:
Cluster consists of 5 nodes on 1.3.2.
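The size comparison described above can be sketched as follows. The index name is taken from this thread, but the byte counts are made-up illustration values, not real measurements:

```python
def grown_indices(before: dict, after: dict) -> dict:
    """Given per-index snapshot sizes in bytes for two consecutive backups,
    return the indices that grew and by how much.  For read-only indices
    with working snapshot dedup, this should be empty."""
    return {name: after[name] - before[name]
            for name in before
            if after.get(name, before[name]) > before[name]}


first = {"statistics-20140108": 100, "statistics-20140109": 120}
second = {"statistics-20140108": 1000, "statistics-20140109": 120}
print(grown_indices(first, second))  # {'statistics-20140108': 900}
```

Any non-empty result for indices that received no writes is a sign of the duplication reported here.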
cc @imotov