
Introduce Local checkpoints #15111

Closed
wants to merge 29 commits into from

Conversation

@bleskes bleskes (Contributor) commented Nov 30, 2015

This PR introduces the notion of a local checkpoint on the shard level. A local checkpoint is defined as the highest sequence number for which all previous operations (i.e., those with a lower seq#) have been processed.

The current implementation is based on a fixed in-memory bit array which is used in a round-robin fashion. This introduces a limit to the spread between in-flight indexing operations. We are still discussing options to work around this, but I think we should move forward toward a working system and optimize from there (and either remove this limitation or better understand its implications).
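A minimal sketch of the mechanism described above, under the assumption of a single bit set indexed modulo its length (class and method names are hypothetical, not the PR's actual code):

```java
import java.util.BitSet;

// Hypothetical sketch: a fixed-size bit set, addressed modulo its length
// (round robin), records which sequence numbers have been processed; the
// checkpoint advances while the next sequence number's bit is set. The
// fixed size is exactly the limit on in-flight spread mentioned above.
public class LocalCheckpointSketch {
    private final BitSet processedSeqNo;
    private final int bitArraySize; // limits the spread between in-flight operations
    private long checkpoint = -1;   // highest seq# such that all lower seq# are processed

    public LocalCheckpointSketch(int bitArraySize) {
        this.bitArraySize = bitArraySize;
        this.processedSeqNo = new BitSet(bitArraySize);
    }

    /** Marks seqNo as processed and advances the checkpoint as far as possible. */
    public synchronized void markSeqNoAsProcessed(long seqNo) {
        if (seqNo - checkpoint > bitArraySize) {
            throw new IllegalStateException("in-flight spread exceeds the bit array size");
        }
        processedSeqNo.set((int) (seqNo % bitArraySize));
        // advance while the next sequence number has been processed, clearing
        // each bit as we pass it so the slot can be reused (round robin)
        while (processedSeqNo.get((int) ((checkpoint + 1) % bitArraySize))) {
            checkpoint++;
            processedSeqNo.clear((int) (checkpoint % bitArraySize));
        }
    }

    public synchronized long getCheckpoint() {
        return checkpoint;
    }
}
```

Note how an out-of-order completion (e.g. seq# 2 finishing before seq# 1) leaves the checkpoint parked until the gap is filled.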

relates to #10708

Every shard group in Elasticsearch has a selected copy called a primary. When a primary shard fails, a new primary is selected from the existing replica copies. This PR introduces `primary terms` to track the number of times this has happened. This will allow us, as follow-up work and among other things, to identify operations that come from old stale primaries. It is also the first step on the road toward sequence numbers.
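The term check described above can be sketched as follows (a hypothetical standalone class, not the PR's actual code): a shard tracks the highest primary term it has seen and refuses writes stamped with an older one.

```java
// Hypothetical sketch of primary-term enforcement: operations from a primary
// whose term is lower than the highest term this shard has seen are stale
// and must be rejected; a higher term means a newer primary was elected.
public class PrimaryTermCheckSketch {
    private long currentPrimaryTerm;

    public PrimaryTermCheckSketch(long initialTerm) {
        this.currentPrimaryTerm = initialTerm;
    }

    /** Returns true if the operation is accepted; stale-term operations are refused. */
    public synchronized boolean acceptOperation(long operationPrimaryTerm) {
        if (operationPrimaryTerm < currentPrimaryTerm) {
            return false; // write came from an old, stale primary
        }
        currentPrimaryTerm = operationPrimaryTerm;
        return true;
    }
}
```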

Relates to elastic#10708
Closes elastic#14062
Adds a counter to each write operation on a shard. This sequence number is indexed into Lucene using doc values, for now (we will probably require indexing to support range searches in the future).

On top of this, primary term semantics are enforced and shards will refuse write operations coming from an older primary.
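Combining the two ideas above, the primary's write path can be sketched as stamping every operation with the pair (primary term, next seq#); the class and method names below are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the primary stamps each write with its term and a
// monotonically increasing sequence number. In the actual PR the seq# is
// then stored with the document as a Lucene doc value; replicas refuse
// stamps carrying an older term.
public class SequenceStampSketch {
    private final long primaryTerm;
    private final AtomicLong seqNoGenerator = new AtomicLong(-1);

    public SequenceStampSketch(long primaryTerm) {
        this.primaryTerm = primaryTerm;
    }

    /** Assigns the next { term, seqNo } stamp to a write operation. */
    public long[] stamp() {
        return new long[] { primaryTerm, seqNoGenerator.incrementAndGet() };
    }
}
```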

Other notes:
- The added SequenceServiceNumber is just a skeleton and will be replaced with a much heavier one once we have all the building blocks (i.e., checkpoints).
- I completely ignored recovery - for this we will need checkpoints as well.
- A new base class is introduced for all single-doc write operations. This is handy to unify common logic (like toXContent).
- For now, we don't use seq# for versioning. We could in the future.

Relates to elastic#10708
Closes elastic#14651
* this is a temporary fix until a more permanent fix is done on master *

During primary relocation, some operations can be performed on the source primary but reach the target primary only after the relocation has completed. At the moment the new primary will have a new primary term and will therefore reject the operations from the old one, causing data loss.

This changes relocations to move the source primary to a relocated state, prevent any new operations from starting on it, and wait for ongoing operations to complete.

Long term we may also consider not incrementing the primary term on relocation.
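The hand-off described above can be sketched as a small state machine (hypothetical names, not the PR's code): the source flips to a relocated state so no new operations start, and the hand-off finishes only once the in-flight counter drains.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the relocation hand-off: once the source primary is
// RELOCATED it refuses new operations, and the caller completes the hand-off
// only after all in-flight operations have finished.
public class RelocationHandOffSketch {
    enum State { STARTED, RELOCATED }

    private State state = State.STARTED;
    private final AtomicInteger inFlightOps = new AtomicInteger();

    /** Tries to start a new operation; fails once the shard is relocated. */
    public synchronized boolean tryStartOperation() {
        if (state == State.RELOCATED) {
            return false; // new writes must go to the target primary
        }
        inFlightOps.incrementAndGet();
        return true;
    }

    public void finishOperation() {
        inFlightOps.decrementAndGet();
    }

    /** Blocks new operations and reports whether ongoing ones have drained. */
    public synchronized boolean relocate() {
        state = State.RELOCATED;
        return inFlightOps.get() == 0; // caller waits/retries until this is true
    }
}
```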
/**
 * Reads a potentially null value.
 */
public <T extends StreamableReader<T>> T readOptionalStreamableReader(StreamableReader<T> streamableReader) throws IOException {
Member:
Why not just make SeqNoStats implement Streamable and use the existing readOptionalStreamable and writeOptionalStreamable? That's consistent with most (all?) of the existing stats objects.

Contributor Author:
We are trying to move to Writeable and StreamableReader, which allow us to construct the object while reading from the stream and use final members. That said, I think with Java-8-only changes we can have something like readOptionalStreamableReader here where we pass a "factory" method which takes the stream as input and returns an object. This can typically be a public constructor. That means we can decouple writing from reading and have Writeable not inherit from StreamableReader, which will save a useless method and a prototype. @s1monw what are your thoughts here?
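The factory-method pattern discussed above can be sketched with plain java.io (not Elasticsearch's StreamInput; all names here are hypothetical): the caller passes a reader functional interface, typically a constructor reference, so reading is decoupled from writing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of an optional read driven by a "factory" that takes
// the stream and returns an object, allowing the target class to keep
// final members initialized in its reading constructor.
public class OptionalReaderSketch {
    @FunctionalInterface
    interface Reader<T> {
        T read(DataInput in) throws IOException;
    }

    /** Reads a nullable value: a presence byte followed by the value itself. */
    public static <T> T readOptional(DataInput in, Reader<T> reader) throws IOException {
        return in.readBoolean() ? reader.read(in) : null;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeBoolean(true); // value present
        out.writeLong(42L);

        DataInput in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        Long value = readOptional(in, DataInput::readLong);
        System.out.println(value); // prints 42
    }
}
```

A real reading constructor would slot in the same way, e.g. a `SeqNoStats::new` taking the stream as its only argument.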

public LocalCheckpointService(ShardId shardId, IndexSettings indexSettings) {
super(shardId, indexSettings);
indexLagThreshold = indexSettings.getSettings().getAsInt(SETTINGS_INDEX_LAG_THRESHOLD, DEFAULT_INDEX_LAG_THRESHOLD);
indexLagMaxWait = indexSettings.getSettings().getAsTime(SETTINGS_INDEX_LAG_MAX_WAIT, DEFAULT_INDEX_LAG_MAX_WAIT);
Member:
Validate these settings?
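The validation asked for above could look like this (a hypothetical helper, not the PR's code): reject non-positive values before the service starts using them.

```java
// Hypothetical sketch of settings validation: fail fast on construction
// rather than letting a zero or negative lag threshold corrupt the
// round-robin offset arithmetic later.
public class SettingsValidationSketch {
    static int validatePositive(String name, int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be positive but was [" + value + "]");
        }
        return value;
    }
}
```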

Change IndexShard counters for the new simplified ReplicationAction
do {
// clear the flag as we are making it free for future operations. do so before we expose it
// by moving the checkpoint
processedSeqNo.clear(offset);
Member:
Maybe just clear a range of bits with FixedBitSet#clear(int, int) instead of clearing bit by bit? The implementation looks to be more efficient and would just require care around the offset wrapping.
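The wrap-around care mentioned above can be sketched as follows, using java.util.BitSet as a stand-in for Lucene's FixedBitSet (the helper name is hypothetical):

```java
import java.util.BitSet;

// Hypothetical sketch of the range-clear suggestion: clear a whole run of
// slots at once, splitting the range in two when it wraps around the end
// of the fixed-size array. Note BitSet#clear(from, to) is to-exclusive.
public class RangeClearSketch {
    /** Clears `length` slots starting at `fromOffset`, wrapping modulo `size`. */
    static void clearRange(BitSet bits, int size, int fromOffset, int length) {
        int end = fromOffset + length;
        if (end <= size) {
            bits.clear(fromOffset, end);  // no wrap: one contiguous clear
        } else {
            bits.clear(fromOffset, size); // tail of the array
            bits.clear(0, end - size);    // wrapped head
        }
    }
}
```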

@bleskes bleskes (Contributor Author) commented Dec 11, 2015

@jasontedor I pushed a new approach. Can you take another look?

@bleskes bleskes (Contributor Author) commented Dec 11, 2015

closing... github made this a mess.

@bleskes bleskes closed this Dec 11, 2015
@clintongormley clintongormley added :Engine :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Sequence IDs labels Feb 14, 2018