
snapshot should work when cluster is in read_only mode. #8102

Closed
webmstr opened this issue Oct 15, 2014 · 11 comments
Assignee: imotov
Labels: >bug, :Distributed/Snapshot/Restore, v2.0.0-beta1

Comments

webmstr commented Oct 15, 2014

I was trying to make a full, consistent backup before an upgrade. A snapshot is taken at a single moment in time, which doesn't work if clients are still updating your indexes.

I tried putting the cluster into read_only mode by setting cluster.blocks.read_only: true, but running a snapshot returned this error:

{"error":"ClusterBlockException[blocked by: [FORBIDDEN/6/cluster read-only (api)];]","status":403}

Please consider allowing snapshots to provide a consistent backup by running when in read-only mode.

@clintongormley

@webmstr Snapshots are still point-in-time even while updates are happening. You don't need to lock anything. A snapshot will only back up the state of the index at the point that the backup starts; it won't take any later changes into account.


webmstr commented Oct 16, 2014

As I mentioned, snapshots, as currently implemented, are an unreasonable way to take a consistent backup prior to an upgrade. This enhancement would have allowed that option.

Without the enhancement, snapshots should not be used before an upgrade, because the indexes may have changed while the snapshot was running. As such, the upgrade documentation should be changed so that it no longer proposes snapshots as backups, and a "full" backup procedure should be documented in its place.

@clintongormley

Out of interest, why don't you just stop writing to your cluster? Reopening for discussion.

@clintongormley

@imotov what are your thoughts?


webmstr commented Oct 17, 2014

I could turn off logstash, but that's just one potential client. Someone could be curl'ing, or using an ES plugin (like head), etc. If you need a consistent backup, you have to disconnect and lock out the clients from the server side.


imotov commented Oct 17, 2014

@clintongormley see #5876; I think this one is similar.

@clintongormley

@imotov thanks, so setting index.blocks.write to true on all indices would be a reasonable workaround, at least until #5855 is resolved.
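As a rough sketch of that workaround (the host is illustrative; the flat settings syntax is accepted by the update-settings API):

```
# Block writes on all existing indices before taking the snapshot
curl -XPUT 'localhost:9200/_all/_settings' -d '{
  "index.blocks.write": true
}'

# ...take the snapshot...

# Re-enable writes afterwards
curl -XPUT 'localhost:9200/_all/_settings' -d '{
  "index.blocks.write": false
}'
```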


saahn commented Nov 12, 2014

@clintongormley Actually, I discovered that the index.blocks.write setting only prevents writes to existing indices. If a client tries to create a new index, that request succeeds, which brings us back to the same problem. My workaround was to shut down the proxy node through which our clients access our ES cluster.

I am running into the same issue as @webmstr, but for a different reason: I cannot create a consistent backup for a restore to a secondary datacenter, because each snapshot takes ~1 hour to complete and we cannot afford to block writes from our clients for such a long period of time.

I am still trying to root-cause why snapshots are taking so long; the time required for snapshot completion increases with each snapshot. However, when I restore the same data to a new cluster, snapshotting that data to a new S3 bucket takes less than a minute.

EDIT: I may have a theory on why the snapshots were taking so long: I was taking a snapshot every two hours, and the S3 bucket has a LOT of snapshots now (49). I'm thinking that the calls the ES AWS plugin makes to the S3 endpoint slow down over time as the number of snapshots increases.

Or maybe it's just the number of snapshots that's causing the slowness, i.e. regardless of whether the backend repository is S3 or fs? I guess I should have an additional cron job that deletes older snapshots. Is there a good rule of thumb on the number of snapshots to retain?
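In case it helps, the cleanup I have in mind would be something like this sketch (my_backup is a placeholder repository name; which snapshots to delete is up to your retention policy):

```
# List all snapshots in the repository
curl -XGET 'localhost:9200/_snapshot/my_backup/_all?pretty'

# Delete an old snapshot by name; the repository then prunes any files
# that no remaining snapshot still references
curl -XDELETE 'localhost:9200/_snapshot/my_backup/snapshot_old'
```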

@colings86

@imotov we discussed this issue but were unclear on what the differences between the index.blocks.* options are, and why the snapshot fails with read_only set to true?


imotov commented Feb 20, 2015

@colings86 there is an ongoing effort to resolve this issue in #9203

tlrx added a commit to tlrx/elasticsearch that referenced this issue Apr 9, 2015
This commit splits the current ClusterBlockLevel.METADATA into two distinct blocks, ClusterBlockLevel.METADATA_READ and ClusterBlockLevel.METADATA_WRITE. This makes it possible to distinguish between operations that modify the index or cluster metadata and operations that do not change any metadata.

Before this commit, many operations were blocked when the cluster was read-only: Cluster Stats, Get Mappings, Get Snapshot, Get Index Settings, etc. Now those operations are allowed even when the cluster or the index is read-only.

Related to elastic#8102, elastic#2833

Closes elastic#3703
Closes elastic#5855
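In practical terms, the METADATA_READ/METADATA_WRITE split means a read-only cluster still answers metadata reads while continuing to reject metadata writes. A rough sketch of the observable difference (the host and the my_backup repository name are illustrative):

```
# With cluster.blocks.read_only set to true:

# Metadata reads now succeed (they only require METADATA_READ)
curl -XGET 'localhost:9200/_mapping'
curl -XGET 'localhost:9200/_snapshot/my_backup/_all'

# Metadata writes are still rejected with a ClusterBlockException
curl -XPUT 'localhost:9200/new-index'
```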
tlrx added a commit to tlrx/elasticsearch that referenced this issue Apr 10, 2015
@javanna javanna added the :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Apr 10, 2015
tlrx added a commit to tlrx/elasticsearch that referenced this issue Apr 23, 2015
tlrx added a commit that referenced this issue Apr 23, 2015
This commit splits the current ClusterBlockLevel.METADATA into two distinct blocks, ClusterBlockLevel.METADATA_READ and ClusterBlockLevel.METADATA_WRITE. This makes it possible to distinguish between operations that modify the index or cluster metadata and operations that do not change any metadata.

Before this commit, many operations were blocked when the cluster was read-only: Cluster Stats, Get Mappings, Get Snapshot, Get Index Settings, etc. Now those operations are allowed even when the cluster or the index is read-only.

Related to #8102

Closes #3703
Closes #5855
Closes #10521
Closes #10522
Closes #2833

imotov commented May 19, 2015

After discussing this with @tlrx, it looks like the best way to address this issue is to move the snapshot and restore elements of the cluster state from the cluster metadata to a custom cluster state part, where they seem to belong (information about currently running snapshots and restores hardly qualifies as metadata).
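For context, the in-progress information referred to here is what the snapshot status API exposes, for example (output omitted):

```
# Show all currently running snapshots, cluster-wide
curl -XGET 'localhost:9200/_snapshot/_status?pretty'
```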

@s1monw s1monw added v1.6.1 and removed v1.6.0 labels Jun 3, 2015
@s1monw s1monw removed the v1.5.3 label Jun 3, 2015
imotov added a commit to imotov/elasticsearch that referenced this issue Jun 4, 2015
…rom custom metadata to custom cluster state part

Information about in-progress snapshot and restore processes is not really metadata and should be represented as a part of the cluster state, similar to discovery nodes, the routing table, and cluster blocks. Since in-progress snapshot and restore information is no longer part of the metadata, this refactoring also enables us to handle cluster blocks in a more consistent manner and allows creating snapshots of a read-only cluster.

Closes elastic#8102
@imotov imotov assigned imotov and unassigned tlrx Jun 4, 2015
imotov added a commit to imotov/elasticsearch that referenced this issue Jun 11, 2015
…rom custom metadata to custom cluster state part