
Make persists concurrent with adding rows in batch ingestion #11536

Merged: 21 commits, Sep 8, 2021

Conversation

@loquisgon loquisgon commented Aug 3, 2021

Recent testing has shown a potential performance regression in the bounded-memory work for segment creation in batch ingestion. The most likely reason is that intermediate persists are blocking and run serially with ingestion. We will add back concurrent persists and test performance to validate that the previous performance is restored.

Changes:
Intermediate persists are made concurrent with adding rows in the batch appenderator.

We are also interested in supporting three main modes for the batch appenderator, so this PR also implements the following requirements:

  • Fix the performance regression by making persists & push concurrent
  • Have a rollback flag called batchProcessingMode. This configuration setting can take three string values: LEGACY, CLOSED_SEGMENTS, CLOSED_SEGMENTS_SINKS (see table below). The default value is CLOSED_SEGMENTS. Note that most existing tests exercise the default, but there are specific tests for the other two modes (see the configuration docs for what each of these means).
  • Update docs to document the new processing mode

This PR has:

  • [x] been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@loquisgon loquisgon changed the title Make persists concurrent with ingestion Make persists concurrent with adding rows in batch ingestion Aug 3, 2021
@jihoonson jihoonson added the Bug label Aug 3, 2021
@loquisgon
Author

I replaced the semaphore with the style of concurrency used in the appenderator. Persists and pushes now run in the background.
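A minimal sketch of that pattern, assuming a single-threaded executor so persists stay ordered. Class and method names here are illustrative, not the actual BatchAppenderator code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentPersistSketch
{
  // Single-threaded executor: persists run concurrently with add(), but
  // still one at a time and in submission order. Daemon thread so this
  // sketch does not keep the JVM alive.
  private final ExecutorService persistExecutor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "persist-sketch");
    t.setDaemon(true);
    return t;
  });
  private final AtomicInteger completedPersists = new AtomicInteger();

  // add() hands the in-memory rows off and returns immediately, so
  // ingestion is no longer blocked by the disk write.
  public CompletableFuture<Void> persistInBackground(Runnable writeToDisk)
  {
    return CompletableFuture.runAsync(() -> {
      writeToDisk.run();
      completedPersists.incrementAndGet();
    }, persistExecutor);
  }

  public int getCompletedPersists()
  {
    return completedPersists.get();
  }

  // close() must wait for outstanding persists before shutting down,
  // mirroring the "close() waits for persists & pushes" change in this PR.
  public void close() throws InterruptedException
  {
    persistExecutor.shutdown();
    persistExecutor.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```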

@@ -158,6 +169,11 @@

maxBytesTuningConfig = tuningConfig.getMaxBytesInMemoryOrDefault();
skipBytesInMemoryOverheadCheck = tuningConfig.isSkipBytesInMemoryOverheadCheck();
if (tuningConfig.getMaxPendingPersists() < 1) {
maxPendingPersists = DEFAULT_PENDING_PERSISTS;
Contributor

What is the rationale for the default of 2? The previous default was 0 which is infinite. I don't think we ever need to change this in production. The doc for maxPendingPersists was not updated in #11294, so whatever we change here, we should fix the doc too.

Author (@loquisgon), Aug 10, 2021

I see it is already documented in external docs that its default is zero. Good catch thanks!


private void initializeExecutors()
{
log.info("There will be up to[%d] pending persists", maxPendingPersists);
Contributor

How could this be useful except for debugging? If this is only useful for debugging, it should be not info.

Author (@loquisgon), Aug 10, 2021

Changed to debug in next commit

/**
* The following sinks metadata map and associated class are the way to retain metadata now that sinks
* are being completely removed from memory after each incremental persist.
*/
private final Map<SegmentIdWithShardSpec, SinkMetadata> sinksMetadata = new HashMap<>();
private final ConcurrentHashMap<SegmentIdWithShardSpec, SinkMetadata> sinksMetadata = new ConcurrentHashMap<>();
Contributor

Please document details of the concurrent access pattern.

Author

Added comments about the usage for this in the javadoc (next commit)
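A hedged sketch of what such a documented concurrent access pattern might look like; the key/value types and method names are simplified stand-ins for SegmentIdWithShardSpec and SinkMetadata, not the code that landed:

```java
import java.util.concurrent.ConcurrentHashMap;

public class SinkMetadataSketch
{
  // Illustrative stand-in for Druid's SinkMetadata value class.
  static final class SinkMetadata
  {
    final long numRowsInSegment;

    SinkMetadata(long numRowsInSegment)
    {
      this.numRowsInSegment = numRowsInSegment;
    }
  }

  /**
   * Written by the ingestion thread in add(); read by the background persist
   * thread when it spills a sink to disk. A ConcurrentHashMap makes each
   * individual operation safe without external locking.
   */
  private final ConcurrentHashMap<String, SinkMetadata> sinksMetadata = new ConcurrentHashMap<>();

  public void recordRows(String segmentId, long rows)
  {
    // merge() performs the read-modify-write atomically; a plain HashMap, or a
    // separate get() + put(), would not guarantee this under concurrency.
    sinksMetadata.merge(
        segmentId,
        new SinkMetadata(rows),
        (oldMeta, add) -> new SinkMetadata(oldMeta.numRowsInSegment + add.numRowsInSegment)
    );
  }

  public long rowsFor(String segmentId)
  {
    SinkMetadata meta = sinksMetadata.get(segmentId);
    return meta == null ? 0 : meta.numRowsInSegment;
  }
}
```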


// This variable updated in add(), persist(), and drop()
private int rowsCurrentlyInMemory = 0;
private final AtomicInteger rowsCurrentlyInMemory = new AtomicInteger();
Contributor

What threads can access this and bytesCurrentlyInMemory concurrently? If there are any, please document details of the concurrent access pattern.

Author

Removed unnecessary atomics in the next commit.
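For context on when the atomic is actually needed: if only the single ingestion thread mutates a counter, a plain field suffices; an AtomicInteger only pays off when multiple threads update it concurrently. A hypothetical illustration of the two cases (not the actual appenderator fields):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterSketch
{
  // Single-writer case (this PR after the fix): only the thread running
  // add()/persist()/drop() touches the counter, so a plain field suffices.
  private int rowsCurrentlyInMemory = 0;

  public void addRowsSingleThreaded(int n)
  {
    rowsCurrentlyInMemory += n;
  }

  public int getRows()
  {
    return rowsCurrentlyInMemory;
  }

  // Multi-writer case: if background persists also decremented the counter,
  // an AtomicInteger (or synchronization) would be required to avoid lost updates.
  private final AtomicInteger sharedCounter = new AtomicInteger();

  public void addRowsConcurrently(int n)
  {
    sharedCounter.addAndGet(n);
  }

  public int getSharedCount()
  {
    return sharedCounter.get();
  }
}
```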

private volatile Throwable persistError;


private final ConcurrentHashMap<SegmentIdWithShardSpec, Sink> sinks = new ConcurrentHashMap<>();
Contributor

Can multiple threads access this map at the same time? I don't see any unless I'm missing something. If there are any, please document details of the concurrent access pattern. It helps people a lot, including reviewers, other developers, and your future self, to understand and remember how things work. Also, please check out https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md. We have reasons for including it in the PR template; I would highly suggest reading it and marking the concurrency self-review item in your PR checklist.

Author

No concurrency control required for this map, removed in next commit.

@@ -806,7 +831,8 @@ public void testTotalRowCount() throws Exception
appenderator.add(IDENTIFIERS.get(2), createInputRow("2001", "bob", 1), null);
Assert.assertEquals(4, appenderator.getTotalRowCount());

appenderator.persistAll(null).get();
appenderator.persistAll(null);
waitForPersists();
Contributor

Why not persistAll(null).get()?

Author

Typo; fixed in next commit.

@@ -913,6 +939,10 @@ static InputRow createInputRow(String ts, String dim, Object met)
);
}

private void waitForPersists() throws InterruptedException
{
Thread.sleep(500);
Contributor

Sleeps are bad. These will make the unit testing slower. Also, I bet all the sleeps you added will make these tests quite flaky.

Author

The use case here is to wait for a concurrent task so we can check on its effect, which seems like a legitimate use for sleep. Let me know what you are thinking that could be better.

Author

oh I see your suggestion below...

@@ -382,12 +387,14 @@ public void testTaskDoesNotFailAsExceededMemoryWithSkipBytesInMemoryOverheadChec

appenderator.startJob();
appenderator.add(IDENTIFIERS.get(0), createInputRow("2000", "foo", 1), null);
waitForPersists();
Contributor

I think you could keep the future of persist triggered in add() in a variable. Then you can add a method used only for testing that returns the persist future. Then you can finally wait for the future to be done instead of sleeping.

Author

I was trying to avoid more changes to the class (exposing methods, etc.), but I can pursue your suggestion.
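The reviewer's suggestion above can be sketched roughly like this; names such as getLastPersistFuture are hypothetical, not the API that actually landed:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PersistFutureSketch
{
  // Daemon thread so this sketch does not keep the JVM alive.
  private final ExecutorService persistExecutor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "persist-future-sketch");
    t.setDaemon(true);
    return t;
  });

  // Keep the future of the persist most recently triggered by add().
  private volatile Future<?> lastPersistFuture = CompletableFuture.completedFuture(null);

  public void add(Runnable persistWork)
  {
    // In the real appenderator the persist fires only when memory limits are
    // hit; here every add() persists, to keep the sketch short.
    lastPersistFuture = persistExecutor.submit(persistWork);
  }

  // Test-only accessor: lets a unit test wait deterministically on the
  // persist instead of sleeping an arbitrary amount of time.
  Future<?> getLastPersistFuture()
  {
    return lastPersistFuture;
  }
}
```

A test would then replace Thread.sleep(500) with getLastPersistFuture().get(), which returns exactly when the persist completes, making the test both faster and non-flaky.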

@jihoonson
Contributor

BTW, I assumed that the core logic is not changed and reviewed only the concurrency part. Let me know if this is not true.

@loquisgon
Author

You are correct, @jihoonson, core logic is not changed.

@loquisgon loquisgon closed this Sep 2, 2021
@loquisgon loquisgon reopened this Sep 2, 2021
@loquisgon
Author

loquisgon commented Sep 2, 2021

Changes I added in the latest commit (507c2f6):

  • Make sure close() waits for persists & pushes before closing
  • Merged master branch to pick up fixes to a previous issue (serialization/deserialization of JSON interval when it contains UTC timestamps)
  • Renamed BatchAppenderator#clearSinkMetadata to BatchAppenderator#clearSinkMemoryCountersAndDiskStoredData to make intent clearer

DatasourceBundle::new
);

Appenderator appenderator = Appenderators.createLegacyOffline(
Author

This is wrong; it should be createClosedSegmentsOffline. I will fix it.

Author

Done

@lgtm-com

lgtm-com bot commented Sep 4, 2021

This pull request introduces 1 alert when merging 37e0856 into 59d2578 - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

@lgtm-com

lgtm-com bot commented Sep 6, 2021

This pull request introduces 1 alert when merging 291420e into 60efbb5 - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

@lgtm-com

lgtm-com bot commented Sep 7, 2021

This pull request introduces 1 alert when merging 1b1fd7b into 60efbb5 - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

@@ -43,6 +44,15 @@
"org.apache.hadoop:hadoop-client:2.8.5"
);

public enum BatchProcesingMode
Contributor

BatchProcesingMode -> BatchProcessingMode

@@ -1343,7 +1343,8 @@ Additional peon configs include:
|`druid.peon.mode`|Choices are "local" and "remote". Setting this to local means you intend to run the peon as a standalone process (Not recommended).|remote|
|`druid.indexer.task.baseDir`|Base temporary working directory.|`System.getProperty("java.io.tmpdir")`|
|`druid.indexer.task.baseTaskDir`|Base temporary working directory for tasks.|`${druid.indexer.task.baseDir}/persistent/task`|
|`druid.indexer.task.useLegacyBatchProcessing`|If false, native batch ingestion will use a new, recommended, code path with memory optimized code for the segment creation phase. If true it will use the previous code path for the create segments phase of batch ingestion. This does not apply to streaming ingestion, just to batch. This setting should only be used when a bug is suspected or found in the new optimized batch ingestion code. If a bug is suspected or found, you can set this flag to `true` to fall back to previous, working but more memory intensive, code path.|`false`|
|`druid.indexer.task.batchMemoryMappedIndex`|DEPRECATED: Use `druid.indexer.task.batchProcessingMode` instead. If false, native batch ingestion will use a new, recommended, code path with memory optimized code for the segment creation phase. If true it will use the previous code path for the create segments phase of batch ingestion. This does not apply to streaming ingestion, just to batch. This setting should only be used when a bug is suspected or found in the new optimized batch ingestion code. If a bug is suspected or found, you can set this flag to `true` to fall back to previous, working but more memory intensive, code path.|`false`|
Contributor

Maybe call out that setting batchMemoryMappedIndex to true will set batchProcessingMode to LEGACY, overwriting the configured batchProcessingMode value.
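That precedence rule can be sketched in one method; the class and method names are illustrative, not Druid's TaskConfig:

```java
public class ModePrecedenceSketch
{
  // The deprecated batchMemoryMappedIndex=true flag forces the LEGACY mode,
  // overriding whatever batchProcessingMode was configured.
  public static String effectiveMode(boolean batchMemoryMappedIndex, String batchProcessingMode)
  {
    return batchMemoryMappedIndex ? "LEGACY" : batchProcessingMode;
  }
}
```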

} else {
// batchProcessingMode input string is invalid, just use the default...log message somewhere???
this.batchProcessingMode = BatchProcesingMode.CLOSED_SEGMENTS; // Default
}
Contributor

Can we log the batchProcessingMode value after this if block?

Author

yeah
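The agreed change can be sketched as logging once after the whole fallback branch, so both valid and defaulted values are reported; the class, logger, and enum values here are illustrative, not the exact Druid code:

```java
import java.util.logging.Logger;

public class ModeLoggingSketch
{
  private static final Logger log = Logger.getLogger(ModeLoggingSketch.class.getName());

  public enum Mode { OPEN_SEGMENTS, CLOSED_SEGMENTS, CLOSED_SEGMENTS_SINKS }

  private final Mode batchProcessingMode;

  public ModeLoggingSketch(String configured)
  {
    Mode resolved;
    try {
      resolved = Mode.valueOf(configured);
    } catch (IllegalArgumentException | NullPointerException e) {
      // Invalid or missing input: fall back to the default rather than failing.
      resolved = Mode.CLOSED_SEGMENTS;
    }
    this.batchProcessingMode = resolved;
    // Log after the fallback logic, per the review comment, so operators can
    // see the final mode even when a bad value was silently replaced.
    log.info("Batch processing mode: " + batchProcessingMode);
  }

  public Mode getBatchProcessingMode()
  {
    return batchProcessingMode;
  }
}
```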

@clintropolis clintropolis added this to the 0.22.0 milestone Sep 8, 2021
@@ -314,7 +314,8 @@ public File getTaskReportsFile()
);

final TaskToolbox box = new TaskToolbox(
new TaskConfig(null, null, null, null, null, false, null, null, null, false, false),
new TaskConfig(null, null, null, null, null, false, null, null, null, false, false,
TaskConfig.BatchProcessingMode.CLOSED_SEGMENTS.name()),
Member

nit: should this use default instead of this explicit value?

Author

done in next


/**
* This class is to support LEGACY and CLOSED_SEGMENTS appenderators. It is copied as-is
* from 0.21 and it is meant to keep for backward compatibility. For now though this class
Member

nit: this isn't actually a copy of 0.21... it's of some intermediate state between 0.21 and 0.22. I think StreamAppenderator is technically a copy of AppenderatorImpl in 0.21.

Author

Updated in next

import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

public class LegacyAndClosedSegmentsAppenderatorTester implements AutoCloseable
Member

super-nit: OpenAndClosed... instead of LegacyAndClosed

Author

yeah

import java.util.function.Function;
import java.util.stream.Collectors;

public class LegacyAndClosedSegmentsBatchAppenderatorDriverTest extends EasyMockSupport
Member

same super nit replacing Legacy with Open

Author

check

import java.util.List;
import java.util.stream.Collectors;

public class LegacyAndClosedSegmentsBatchAppenderatorTest extends InitializedNullHandlingTest
Member

same nit legacy/open

Author

check

@lgtm-com

lgtm-com bot commented Sep 8, 2021

This pull request introduces 1 alert when merging c85a201 into dcee99d - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

@lgtm-com

lgtm-com bot commented Sep 8, 2021

This pull request introduces 1 alert when merging 86973a1 into dcee99d - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

@lgtm-com

lgtm-com bot commented Sep 8, 2021

This pull request introduces 1 alert when merging bd83203 into dcee99d - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

@lgtm-com

lgtm-com bot commented Sep 8, 2021

This pull request introduces 1 alert when merging 7f004f1 into dcee99d - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments"
@lgtm-com

lgtm-com bot commented Sep 8, 2021

This pull request introduces 1 alert when merging d97615f into dcee99d - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

@clintropolis clintropolis merged commit 9efa6cc into apache:master Sep 8, 2021
clintropolis added a commit to clintropolis/druid that referenced this pull request Sep 8, 2021
…11536)

* Make persists concurrent with ingestion

* Remove semaphore but keep concurrent persists (with add) and add push in the background as well

* Go back to documented default persists (zero)

* Move to debug

* Remove unnecessary Atomics

* Comments on synchronization (or not) for sinks & sinkMetadata

* Some cleanup for unit tests but they still need further work

* Shutdown & wait for persists and push on close

* Provide support for three existing batch appenderators using batchProcessingMode flag

* Fix reference to wrong appenderator

* Fix doc typos

* Add BatchAppenderators class test coverage

* Add log message to batchProcessingMode final value, fix typo in enum name

* Another typo and minor fix to log message

* LEGACY->OPEN_SEGMENTS, Edit docs

* Minor update legacy->open segments log message

* More code comments, mostly small adjustments to naming etc

* fix spelling

* Exclude BatchAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage

* Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments"

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
clintropolis added a commit that referenced this pull request Sep 9, 2021
…#11679)

* Make persists concurrent with ingestion

* Remove semaphore but keep concurrent persists (with add) and add push in the background as well

* Go back to documented default persists (zero)

* Move to debug

* Remove unnecessary Atomics

* Comments on synchronization (or not) for sinks & sinkMetadata

* Some cleanup for unit tests but they still need further work

* Shutdown & wait for persists and push on close

* Provide support for three existing batch appenderators using batchProcessingMode flag

* Fix reference to wrong appenderator

* Fix doc typos

* Add BatchAppenderators class test coverage

* Add log message to batchProcessingMode final value, fix typo in enum name

* Another typo and minor fix to log message

* LEGACY->OPEN_SEGMENTS, Edit docs

* Minor update legacy->open segments log message

* More code comments, mostly small adjustments to naming etc

* fix spelling

* Exclude BatchAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage

* Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments"

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

Co-authored-by: Agustin Gonzalez <agustin.gonzalez@imply.io>