Shadow replicas on shared filesystems #9727
Conflicts: src/main/java/org/elasticsearch/gateway/GatewayMetaState.java
Also adds a nocommit
make flush to a refresh factor out ShadowIndexShard to have IndexShard be identical to the master and least intrusive cleanup abstractions
Conflicts: src/main/java/org/elasticsearch/index/engine/Engine.java
…dler that skips most phases and enforces shard closing on the source before the target opens its engine
// immediately return
if (IndexMetaData.isIndexUsingShadowReplicas(indexMetaData.settings())) {
// this doesn't replicate mappings changes, so can fail if mappings are not predefined
// It was successful on the replica, although we never actually executed - in the future we will
I would clarify this statement, because mapping updates do get replicated; it just takes longer since the update needs to go to the master and then be published to the replicas, so there is a delay in mapping introduction.
I will clarify this comment
* with these settings allocates it's shards on a shared filesystem. Otherwise <code>false</code>. The default
* setting for this is the returned value from {@link #isIndexUsingShadowReplicas(org.elasticsearch.common.settings.Settings)}.
*/
public static boolean usesSharedFilesystem(Settings settings) {
can we use the same method structure between this method and the following? I like isIndex...
Yep, I'll rename.
@s1monw I added a
/** Return true if a shadow engine should be used */
protected boolean useShadowEngine() {
return primary == false && settings.getAsBoolean(IndexMetaData.SETTING_SHADOW_REPLICAS, false);
should we use the IndexMetaData#isIndexUsingShadowReplicas helper method here?
Yeah, I will change this to use the helper
left really minor comments, it looks great. One note: should we mention in the docs the second phase tasks, like doing primary promotion without failing an engine? If so, I would also add a task so that on get, we automatically set the "go to primary" flag if a shadow replica is used and realtime get is used.
left two more comments, other than that LGTM
Pushed more commits hooking up the
indexMeta != null && // and we have the index
IndexMetaData.isIndexUsingShadowReplicas(indexMeta.settings())) { // and the index uses shadow replicas
// set the preference for the request to use "_primary" automatically
request.request().preference("_primary");
can we use org.elasticsearch.cluster.routing.operation.plain.Preference.PRIMARY.type() here instead?
left one minor comment! LGTM feel free to push!
[[indices-shadow-replicas]]
== Shadow replica indices

experimental[]
Thanks for this!
pushed to 1.x and master!
These commits add the shadow replicas feature for use on shared filesystems
(it does not include segment replication for non-shared filesystems yet).
If we assume that the data in the index path will already be shared across
multiple nodes, we can create an index with shadow replicas, where each replica
shard simply contains an `IndexReader` that periodically refreshes to pick up
new segments.
All indexing operations will be executed on the primary shard, and will not be
replicated to each replica, since the data will be replicated in a different
way.
During this phase, creating an index with `index.shadow_replicas: true` and
`number_of_replicas` greater than 0 will cause operations not to undergo
replication to replica shards. An index can have either regular replicas or
shadow replicas; they are mutually exclusive for an index. The
`index.shadow_replicas` setting is set at index creation time and cannot be
changed dynamically.
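To make the rule above concrete, here is a minimal, self-contained sketch of the decision it describes. It is an illustration only and not the code in this PR: the class name `ShadowReplicaSettingsSketch`, the plain `Map` standing in for `Settings`, and the `replicatesOperations` method are all hypothetical. The helper names `isIndexUsingShadowReplicas` and `useShadowEngine` merely mirror the ones discussed in the review.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model only; a plain Map stands in for Elasticsearch Settings. */
final class ShadowReplicaSettingsSketch {
    // Setting keys as named in the description above.
    static final String SHADOW_REPLICAS = "index.shadow_replicas";
    static final String NUMBER_OF_REPLICAS = "index.number_of_replicas";

    /** True if the index was created with index.shadow_replicas: true. */
    static boolean isIndexUsingShadowReplicas(Map<String, String> settings) {
        return Boolean.parseBoolean(settings.getOrDefault(SHADOW_REPLICAS, "false"));
    }

    /** Hypothetical helper: operations are replicated only for regular replicas. */
    static boolean replicatesOperations(Map<String, String> settings) {
        int replicas = Integer.parseInt(settings.getOrDefault(NUMBER_OF_REPLICAS, "0"));
        return replicas > 0 && !isIndexUsingShadowReplicas(settings);
    }

    /** Only non-primary shards of a shadow-replica index use the shadow engine. */
    static boolean useShadowEngine(boolean primary, Map<String, String> settings) {
        return !primary && isIndexUsingShadowReplicas(settings);
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put(SHADOW_REPLICAS, "true");
        settings.put(NUMBER_OF_REPLICAS, "2");
        System.out.println("replicates operations: " + replicatesOperations(settings));
        System.out.println("replica uses shadow engine: " + useShadowEngine(false, settings));
    }
}
```

In this sketch an index with shadow replicas never replicates operations, regardless of how many replicas it has, which matches the mutually exclusive behaviour described above.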
The Elasticsearch cluster will still detect the loss of a primary shard, and
transform the replica into a primary in this situation. This transformation will
take slightly longer, since no `IndexWriter` will be maintained for each shadow
replica.
In order to ensure the data is being synchronized in a fast enough manner, the
user will need to tune the flush threshold for the index to a desired number. A
flush is needed to fsync segment files to disk, so they will be visible to all
other replica nodes. Users should test what flush threshold levels they are
comfortable with, as increased flushing can impact indexing performance. This
testing can be performed at any time; there is no need to wait for this feature
to be available first.
Once segments are available on the filesystem where the shadow replica resides,
a regular refresh (governed by the `index.refresh_interval`) can be used to make
the new data searchable.
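The flush-then-refresh ordering described above can be sketched with a small toy model. This is an assumption-laden illustration, not how the engine is implemented: the class `SharedFilesystemSketch` and all of its methods are hypothetical, and segments are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of segment visibility on a shared filesystem; not Elasticsearch code. */
final class SharedFilesystemSketch {
    // Segments the primary has produced but not yet flushed (fsynced).
    private final List<String> unflushed = new ArrayList<>();
    // Segments that are durably on the shared filesystem.
    private final List<String> onDisk = new ArrayList<>();
    // Segments the shadow replica's reader currently exposes to search.
    private List<String> replicaView = new ArrayList<>();

    /** The primary indexes data, producing a segment that is not yet flushed. */
    void index(String segment) {
        unflushed.add(segment);
    }

    /** A flush on the primary makes pending segments visible on the shared filesystem. */
    void flush() {
        onDisk.addAll(unflushed);
        unflushed.clear();
    }

    /** A refresh on the shadow replica reopens its reader on whatever is on disk. */
    void refreshReplica() {
        replicaView = new ArrayList<>(onDisk);
    }

    List<String> searchableOnReplica() {
        return replicaView;
    }

    public static void main(String[] args) {
        SharedFilesystemSketch fs = new SharedFilesystemSketch();
        fs.index("seg1");
        fs.refreshReplica();
        System.out.println("before flush: " + fs.searchableOnReplica());
        fs.flush();
        fs.refreshReplica();
        System.out.println("after flush and refresh: " + fs.searchableOnReplica());
    }
}
```

In this model a replica refresh before the primary flushes picks up nothing, which is why the flush threshold bounds how stale a shadow replica can be.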
See #8976 for the overall shadow replica plan