[IOTDB-860] Emend the async log applier #1635

Merged

Conversation

@neuyilan (Member)

No description provided.

@jt2594838 jt2594838 added the Module - Cluster PRs for the cluster module label Aug 20, 2020
@jt2594838 (Contributor) left a comment

The fix itself has a potential problem: when new logs keep coming in, it is very likely to time out.
Now I see you want to make sure that when the snapshot is taken, all previous logs have been applied. This side effect had been noticed, but back then AsyncApplier was merely at an experimental stage. Since you have fixed one of the problems, I would like to offer some advice to make AsyncApplier complete (a sketch of items 1-3 follows the list):

  1. When starting to take a snapshot, record the current commitIndex.
  2. Block until all logs whose indices <= the recorded commitIndex are applied. (Use RaftLogManager to do so instead of LogApplier.)
  3. Prevent the log cleaner thread from cleaning logs that are not yet applied.
  4. Change StorageEngine.getInstance().syncCloseAllProcessor(); in takeSnapshot() to send a flush plan within the group. (So the file sequence will not be broken by snapshots.)
  5. When committed logs are recovered during start-up, re-apply all of them. (Note that operation sequences in IoTDB are idempotent.)
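As a reference for items 1-3, a minimal, self-contained sketch of the waiting step is shown below. The nested RaftLogManager interface and the accessor names (getCommitLogIndex(), getAppliedIndex()) are simplified assumptions for illustration, not the actual cluster-module API.

```java
// Sketch of items 1-2 with assumed names, not the real IoTDB classes.
public class SnapshotBarrierSketch {

  /** Simplified stand-in for RaftLogManager: only the two accessors this sketch needs. */
  interface RaftLogManager {
    long getCommitLogIndex(); // highest committed log index
    long getAppliedIndex();   // highest applied log index
  }

  private static final long TIMEOUT_MS = 60_000;
  private static final long CHECK_INTERVAL_MS = 100;

  /** Returns true once all logs up to the recorded commitIndex are applied, false on timeout. */
  static boolean waitUntilApplied(RaftLogManager logManager) throws InterruptedException {
    long snapshotCommitIndex = logManager.getCommitLogIndex(); // item 1: record the commitIndex
    long deadline = System.currentTimeMillis() + TIMEOUT_MS;
    // item 2: block until every log with index <= snapshotCommitIndex has been applied
    while (logManager.getAppliedIndex() < snapshotCommitIndex) {
      if (System.currentTimeMillis() > deadline) {
        return false; // caller decides whether to abort or retry the snapshot
      }
      Thread.sleep(CHECK_INTERVAL_MS);
    }
    // item 3 (not shown): the log cleaner must be prevented from removing logs whose
    // index is above getAppliedIndex() while this wait is in progress.
    return true;
  }
}
```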

@neuyilan (Member, Author)

> The fix itself has a potential problem: when new logs keep coming in, it is very likely to time out.
> Now I see you want to make sure that when the snapshot is taken, all previous logs have been applied. This side effect had been noticed, but back then AsyncApplier was merely at an experimental stage. Since you have fixed one of the problems, I would like to offer some advice to make AsyncApplier complete:
>
>   1. When starting to take a snapshot, record the current commitIndex.
>   2. Block until all logs whose indices <= the recorded commitIndex are applied. (Use RaftLogManager to do so instead of LogApplier.)
>   3. Prevent the log cleaner thread from cleaning logs that are not yet applied.
>   4. Change StorageEngine.getInstance().syncCloseAllProcessor(); in takeSnapshot() to send a flush plan within the group. (So the file sequence will not be broken by snapshots.)
>   5. When committed logs are recovered during start-up, re-apply all of them. (Note that operation sequences in IoTDB are idempotent.)

Great suggestion. I think items 1-3 are meant to ensure that while a snapshot is being taken, no new logs can be added and the snapshot task does not take too long. However, the current implementation already blocks new incoming logs: taking a snapshot requires the logManager lock, and committing a log also requires that lock, so new logs cannot be committed during the snapshot. We only need to wait for all previously committed logs to be applied when taking the snapshot (see the sketch below).

I'm going to rethink suggestions 4 and 5.
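A self-contained illustration of this locking argument is sketched below; the class and method names are simplified stand-ins, not the actual LogManager code. Because both committing a log and taking a snapshot need the same logManager lock, new logs cannot be committed while the snapshot runs, and the snapshot only has to wait for the logs that were committed before it started.

```java
// Simplified stand-in (not the real IoTDB LogManager) showing why holding the
// logManager lock during a snapshot keeps new logs from being committed.
class LogManagerLockSketch {
  private final Object logManagerLock = new Object();
  private long commitIndex = -1;
  private volatile long appliedIndex = -1; // advanced by the single async applier thread

  void commitLog(long index) {
    synchronized (logManagerLock) { // committing a log needs the logManager lock ...
      commitIndex = Math.max(commitIndex, index);
    }
  }

  void takeSnapshot() throws InterruptedException {
    synchronized (logManagerLock) { // ... and so does the snapshot, so new commits block here
      long target = commitIndex;    // only logs committed before the snapshot matter
      while (appliedIndex < target) { // the applier does not need the lock, so it can catch up
        Thread.sleep(100);
      }
      // build the snapshot from the applied state here
    }
  }

  // called by the async applier thread after each committed log is applied
  void onLogApplied(long index) {
    appliedIndex = Math.max(appliedIndex, index);
  }
}
```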

@neuyilan neuyilan changed the title [Distributed] fix async applier bug when do snapshpot [Distributed] fix async applier bug when do snapshpot[in progress] Aug 20, 2020
@neuyilan neuyilan force-pushed the apache_cluster_new_0819_async_applier branch 2 times, most recently from b82d6cc to 8cf4003 on August 21, 2020 12:29
@neuyilan neuyilan changed the title [Distributed] fix async applier bug when do snapshpot[in progress] [Distributed] remend async applier Aug 24, 2020
@neuyilan neuyilan changed the title [Distributed] remend async applier [Distributed] emend async applier Aug 24, 2020
@neuyilan neuyilan changed the title [Distributed] emend async applier [Distributed] emend async applier[in progress] Aug 24, 2020
@neuyilan neuyilan changed the title [Distributed] emend async applier[in progress] [Distributed] emend async applier Aug 25, 2020
@neuyilan neuyilan changed the title [Distributed] emend async applier [Distributed] Emend the async applier Aug 25, 2020
@neuyilan neuyilan changed the title [Distributed] Emend the async applier [Distributed] Emend the async applier[in progress] Aug 26, 2020
@neuyilan neuyilan marked this pull request as draft August 26, 2020 02:01
@neuyilan neuyilan changed the title [Distributed] Emend the async applier[in progress] [Distributed] Emend the async applier Aug 26, 2020
@neuyilan neuyilan marked this pull request as ready for review August 26, 2020 02:44
@neuyilan neuyilan force-pushed the apache_cluster_new_0819_async_applier branch from d639d8b to 5c3f239 on August 27, 2020 05:40
Comment on lines 62 to 76
public void syncFlushAllProcessor() {
  logger.info("{}: Start flush all storage group processor in one data group", getName());
  ConcurrentHashMap<String, StorageGroupProcessor> processorMap = StorageEngine.getInstance()
      .getProcessorMap();
  if (processorMap.size() == 0) {
    logger.info("{}: no need to flush processor", getName());
    return;
  }
  // collect every storage group currently open in the storage engine
  List<Path> storageGroups = new ArrayList<>();
  for (Map.Entry<String, StorageGroupProcessor> entry : processorMap.entrySet()) {
    Path path = new Path(entry.getKey());
    storageGroups.add(path);
  }
  // flush all of them via a FlushPlan sent within the data group
  FlushPlan plan = new FlushPlan(null, true, storageGroups);
  dataGroupMember.flushFileWhenDoSnapshot(plan);
}
@jt2594838 (Contributor)
I am afraid that we cannot flush the whole storage group when time partitioning is enabled, because in that case a storage group will be managed by several data groups: if you flush one storage group without notifying the other data groups, the file integrity of those data groups will be broken.
So you should either flush the other related groups (which are rather hard to find), or only flush the partitions that belong to this data group.

@neuyilan (Member, Author)
Sure, thanks for your advice. I'd like to flush only the partitions that belong to the data group (see the sketch below).
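For reference, a minimal, self-contained sketch of that partition-scoped flush is shown below. The partition-table interface and method names are simplified stand-ins; the exact FlushPlan constructor and partition lookup differ in the real code.

```java
import java.util.*;

// Sketch (simplified stand-in types, not the real IoTDB classes): collect, per storage
// group, only the time partitions this data group owns, instead of whole storage groups.
class PartitionScopedFlushSketch {

  /** Stand-in for the partition table: which time partitions of a storage group this data group owns. */
  interface PartitionTable {
    List<Long> ownedTimePartitions(String storageGroup);
  }

  /** Builds storageGroup -> owned partition ids, the payload a partition-scoped flush plan would carry. */
  static Map<String, List<Long>> collectFlushTargets(Collection<String> storageGroups,
                                                     PartitionTable table) {
    Map<String, List<Long>> targets = new HashMap<>();
    for (String sg : storageGroups) {
      List<Long> owned = table.ownedTimePartitions(sg);
      if (!owned.isEmpty()) {
        targets.put(sg, owned); // flush only these partitions, not the whole group
      }
    }
    return targets;
  }
}
```

Flushing only these partitions keeps the file sequence of other data groups that share the same storage group intact.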

@neuyilan neuyilan changed the title [Distributed] Emend the async applier [IOTDB-860] Emend the async log applier Sep 1, 2020
@jt2594838 jt2594838 merged commit 8fda671 into apache:cluster_new Sep 4, 2020