OAK-9576: Multithreaded download synchronization issues #383

averma21 · 2021-10-05T05:04:40Z

No description provided.

* Fixing a problem with test

* Fixing synchronization issues
* Fixing OOM issue
* Adding delay between download retries

thomasmueller · 2021-10-07T09:50:11Z

...ain/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/DefaultMemoryManager.java

                "if memory drop below {} GB (max {})", pool.getName(), minMemoryBytes/ONE_GB, humanReadableByteCount(maxMemory));
        pool.setCollectionUsageThreshold(minMemoryBytes);
-        checkMemory(usage);
+        // todo - should we check and block in the beginning? This creates problem in case of download resume.


Why does it create problems?

I don't see the purpose of checking memory at the beginning. There is nothing to do in this case. The only thing we could do is not proceed further, but that kind of check could be added even before this stage.

This one creates a problem during resume, in case memory is low due to the previous run. It won't have any registered clients that still need to dump data (since the previous run's task objects would be unreachable), and checkMemory waits for clients to dump data. So the process blocks.

Ok, in this case could you remove the TODO and write e.g. "We don't check memory here, as there is no good way here to free up memory in case it is low."?

thomasmueller · 2021-10-07T10:00:01Z

...ain/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/DefaultMemoryManager.java

-                            .from(cd);
-                    checkMemory(info.getUsage());
+                    synchronized (sufficientMemory) {
+                        if (sufficientMemory.get()) {


sufficientMemory is an AtomicBoolean. I don't see why you would want to synchronize on it. Is the problem that you want to protect against concurrent calls to checkMemory? If yes, then checkMemory should be synchronized instead.

As per the current code, we don't want to call checkMemory again if sufficientMemory is already false, hence this approach.

Hm, this is not clear... I don't currently see an answer to the question "why you would want to synchronize on it".
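For illustration, one way to get "run the low-memory handling only once while the flag is false" without locking on the AtomicBoolean is a single compareAndSet. This is a minimal sketch, not the code from this PR: the class `LowMemoryGate`, its methods, and the counter standing in for the real handling are all hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the low-memory flag discussed above; not the Oak class. */
final class LowMemoryGate {
    private final AtomicBoolean sufficientMemory = new AtomicBoolean(true);
    private final AtomicInteger handlerRuns = new AtomicInteger();

    /** Returns true only for the one caller that flipped the flag from true to false. */
    boolean onLowMemory() {
        // compareAndSet is atomic, so at most one thread wins until the flag is reset
        if (sufficientMemory.compareAndSet(true, false)) {
            handlerRuns.incrementAndGet(); // placeholder for the real low-memory handling
            return true;
        }
        return false;
    }

    /** Resets the flag once memory is available again. */
    void onMemoryRecovered() {
        sufficientMemory.set(true);
    }

    int handlerRuns() {
        return handlerRuns.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LowMemoryGate gate = new LowMemoryGate();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(gate::onLowMemory);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("handler runs: " + gate.handlerRuns());
    }
}
```

The check and the state change happen in one atomic step, so there is no window between `get()` and the update that a lock would otherwise have to cover.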

thomasmueller · 2021-10-07T10:01:15Z

...ain/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/StoreAndSortStrategy.java

+            File storeFile = writeToStore(storeDir, getStoreFileName());
+            return sortStoreFile(storeFile);
+        } finally {
+            nodeStates.close();


It sounds weird that createSortedStoreFile has a side effect of closing the nodeStates. Why?

Yes, this looks weird, since this class is not the owner of nodeStates. It was done to prevent OOM in the multithreaded download: we need to keep closing the nodeStates as the download tasks finish. This is mainly for TraverseAndSortTask (the tasks created for parallel download), but because of the code flow the same pattern had to be followed here.
We could improve this, but some more refactoring would be needed.

Oh, I see, you mean that particular method, createSortedStoreFile.
I will see if we can add a close method as you suggested.
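As a sketch of the ownership pattern being asked for, the code that creates a closeable traverser can close it with try-with-resources, while the method that builds the sorted output only consumes entries. Everything below is hypothetical (the `OwnerClosesSketch` class, the nested `Traverser`, and both method names); it only illustrates the shape and is not the refactoring that was merged.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class OwnerClosesSketch {

    /** Hypothetical stand-in for a traverser that holds resources until it is closed. */
    static final class Traverser implements Closeable, Iterator<String> {
        private final Iterator<String> delegate;
        private boolean closed;

        Traverser(List<String> paths) {
            this.delegate = paths.iterator();
        }

        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public String next() { return delegate.next(); }
        @Override public void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Consumes the entries and sorts them; it does not close anything it does not own. */
    static List<String> createSortedStore(Iterator<String> entries) {
        List<String> sorted = new ArrayList<>();
        entries.forEachRemaining(sorted::add);
        Collections.sort(sorted);
        return sorted;
    }

    /** The owner closes the traverser, even if sorting throws. */
    static List<String> run(Traverser traverser) {
        try (Traverser t = traverser) {
            return createSortedStore(t);
        }
    }

    public static void main(String[] args) {
        Traverser traverser = new Traverser(Arrays.asList("/b", "/a"));
        System.out.println(run(traverser) + " closed=" + traverser.isClosed());
    }
}
```

With this split, the resource is still released as soon as each task finishes, which is what the OOM fix needs, but the close is visible at the call site instead of being hidden inside a method named for something else.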

thomasmueller · 2021-10-07T10:04:40Z

...main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/TraverseAndSortTask.java

+                memoryManager.deregisterClient(registrationID);
+            }
+            try {
+                nodeStates.close();


Here again we close the nodeStates... Why here? What about having a separate close() method in this class?

see #383 (comment)

thomasmueller · 2021-10-07T10:08:46Z

...main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/TraverseAndSortTask.java

        try (BufferedWriter writer = FlatFileStoreUtils.createWriter(newtmpfile, compressionEnabled)) {
+            // no concurrency issue with this traversal because addition to this list is only done in #addEntry which, for
+            // a given TraverseAndSortTask object will only be called from same thread
            for (NodeStateHolder h : entryBatch) {


Ah, no, here we don't synchronize on entryBatch! That's inconsistent synchronization, and can cause big problems I think. (The way around it would be to clone entryBatch... but again, I would prefer if we don't need any synchronization).
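To make the "clone entryBatch" workaround concrete, here is a minimal sketch in which every access to the list, including the copy taken before writing, goes through the same lock. The class `EntryBatchSketch` and its `drain` method are hypothetical and use plain strings instead of NodeStateHolder.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batch holder: every access to entryBatch uses the same lock. */
final class EntryBatchSketch {
    private final List<String> entryBatch = new ArrayList<>();

    void addEntry(String line) {
        synchronized (entryBatch) {
            entryBatch.add(line);
        }
    }

    /** Copies and clears the batch under the lock; the caller iterates the copy lock-free. */
    List<String> drain() {
        synchronized (entryBatch) {
            List<String> snapshot = new ArrayList<>(entryBatch);
            entryBatch.clear();
            return snapshot;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        EntryBatchSketch batch = new EntryBatchSketch();
        int total = 10_000;
        Thread producer = new Thread(() -> {
            for (int i = 0; i < total; i++) {
                batch.addEntry("entry-" + i);
            }
        });
        producer.start();
        int written = 0;
        while (producer.isAlive()) {
            written += batch.drain().size(); // stands in for writing the snapshot to a file
        }
        producer.join();
        written += batch.drain().size(); // pick up anything added after the last drain
        System.out.println("written=" + written);
    }
}
```

If the list really is touched by only one thread per task, as the code comment in the diff states, the simpler option the reviewer prefers is to drop the locking entirely and rely on that confinement.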

thomasmueller · 2021-10-07T10:10:34Z

...java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/TraverseWithSortStrategy.java

+            writeToSortedFiles();
+            return sortStoreFile();
+        } finally {
+            nodeStates.close();


Here again, closing the nodeStates as a side effect of another method... I find it weird.

* Using linkedlist in tasks for freeing memory early
* Dumping if data is greater than one MB

* Closing node state entry traversors using try with

* Incorporating some feedback from review comments

thomasmueller · 2021-10-27T12:28:04Z

...ain/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/DefaultMemoryManager.java

                "if memory drop below {} GB (max {})", pool.getName(), minMemoryBytes/ONE_GB, humanReadableByteCount(maxMemory));
        pool.setCollectionUsageThreshold(minMemoryBytes);
-        checkMemory(usage);
+        // todo - should we check and block in the beginning? This creates problem in case of download resume.


Ok, in this case could you remove the TODO and write e.g. "We don't check memory here, as there is no good way here to free up memory in case it is low."?

thomasmueller · 2021-10-27T12:30:48Z

...ain/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/DefaultMemoryManager.java

-                            .from(cd);
-                    checkMemory(info.getUsage());
+                    synchronized (sufficientMemory) {
+                        if (sufficientMemory.get()) {


Hm, this is not clear... I don't currently see an answer to the question "why you would want to synchronize on it".

thomasmueller · 2021-10-27T12:33:30Z

...main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/TraverseAndSortTask.java

+            try {
+                nodeStates.close();
+            } catch (IOException e) {
+                log.error("{} could not close NodeStateEntryTraverser", taskID);


Could you log the stack trace as well? Just add ", e" to the log.error call.

thomasmueller · 2021-10-27T12:36:29Z

...main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/TraverseAndSortTask.java

        //Holder line consist only of json and not 'path|json'
        NodeStateHolder h = new StateInBytesHolder(e.getPath(), jsonText);
-        entryBatch.add(h);
+        synchronized (this) {


I don't understand the exact use for synchronization... which data structure needs to be protected from concurrent operations?

* Replacing explicit synchronization with atomic operations

* Using same memory manager across retries

* Moving retry delay to exception block
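As a rough illustration of what "retry delay in the exception block" can look like, the sketch below sleeps only after a failed attempt, so a successful first attempt is never delayed. The helper `RetrySketch.withRetries` and its parameters are hypothetical and are not taken from the PR.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical retry helper; the delay lives in the catch block, not before each attempt. */
final class RetrySketch {

    static <T> T withRetries(Callable<T> attempt, int maxAttempts, long delayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(delayMs); // wait only when another attempt will follow
                }
            }
        }
        throw last; // all attempts failed; surface the most recent failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated download failure " + calls[0]);
            }
            return "downloaded";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```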

amrverma added 4 commits September 16, 2021 17:36

OAK-9576 - Multithreaded download synchronization issues

6211dc6

* Fixing a problem with test

Merge branch 'trunk' into OAK-9576

ddf6fab

OAK-9576: Multithreaded download synchronization issues

29e8ef7

* Fixing synchronization issues
* Fixing OOM issue
* Adding delay between download retries

Merge branch 'trunk' into OAK-9576

64b9042

thomasmueller requested changes Oct 7, 2021

amrverma added 3 commits October 20, 2021 19:05

OAK-9576: Multithreaded download synchronization issues

68857aa

* Using linkedlist in tasks for freeing memory early
* Dumping if data is greater than one MB

OAK-9576: Multithreaded download synchronization issues

2e8d089

* Closing node state entry traversors using try with

trivial - removing unused object

2cf0b30

averma21 requested review from fabriziofortino, nit0906, thomasmueller and tihom88 October 25, 2021 12:27

fabriziofortino requested changes Oct 26, 2021

OAK-9576: Multithreaded download synchronization issues

022844a

* Incorporating some feedback from review comments

averma21 requested a review from fabriziofortino October 27, 2021 06:03

fabriziofortino approved these changes Oct 27, 2021

averma21 added the indexing label Oct 27, 2021

thomasmueller requested changes Oct 27, 2021

amrverma added 5 commits October 28, 2021 13:45

OAK-9576: Multithreaded download synchronization issues

29b8a1e

* Replacing explicit synchronization with atomic operations

OAK-9576: Multithreaded download synchronization issues

9b80957

* Using same memory manager across retries

trivial - removing unwanted method

c711be6

OAK-9576: Multithreaded download synchronization issues

c51f707

* Moving retry delay to exception block

trivial - correcting variable name

ddb2a0d

thomasmueller self-requested a review November 30, 2021 15:16

thomasmueller approved these changes Nov 30, 2021

averma21 merged commit c04aff5 into apache:trunk Dec 1, 2021
