Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) #11294
Conversation
Force-pushed from 77f4c49 to a3bd5dd
…had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that
…ally after sink has been merged
Force-pushed from e423a99 to 0312763
metrics.incrementNumPersists();
metrics.incrementPersistTimeMillis(persistStopwatch.elapsed(TimeUnit.MILLISECONDS));
persistStopwatch.stop();
}

final long startDelay = runExecStopwatch.elapsed(TimeUnit.MILLISECONDS);
metrics.incrementPersistBackPressureMillis(startDelay);

This will report a wrong metric because there is no start delay now. I think we don't have to report it since we don't use the executor anymore. You can remove runExecStopwatch too.

Done.
@@ -764,7 +691,7 @@ private DataSegment mergeAndPush(
SinkMetadata sm = sinksMetadata.get(identifier);
if (sm == null) {
  log.warn("Sink metadata not found just before merge for identifier [%s]", identifier);
} else if (numHydrants != sm.getNumHydrants()) {
  throw new ISE("Number of restored hydrants[%d] for identifier[%s] does not match expected value[%d]",
                numHydrants, identifier, sinksMetadata.get(identifier).getNumHydrants());

This should use sm too, because sinksMetadata.get(identifier) can return null if drop() is called for some reason after you get sm above.

Suggested change:
  numHydrants, identifier, sinksMetadata.get(identifier).getNumHydrants());
→ numHydrants, identifier, sm.getNumHydrants());

Done
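The capture-once pattern the reviewer is asking for can be sketched in isolation. This is a standalone illustration, not Druid code: SinkMetadata and sinksMetadata are simplified stand-ins mirroring the PR's names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Read the map once into a local, then use only that captured reference.
// A concurrent drop() removing the entry between two get() calls can no
// longer cause an NPE, because the second get() never happens.
public class CaptureOnce {
    public static class SinkMetadata {
        private final int numHydrants;
        public SinkMetadata(int n) { numHydrants = n; }
        public int getNumHydrants() { return numHydrants; }
    }

    public static final Map<String, SinkMetadata> sinksMetadata = new ConcurrentHashMap<>();

    public static String check(String identifier, int numHydrants) {
        SinkMetadata sm = sinksMetadata.get(identifier); // single read
        if (sm == null) {
            return "warn: no metadata for " + identifier;
        } else if (numHydrants != sm.getNumHydrants()) {
            // use sm, not a second sinksMetadata.get(identifier), which may now be null
            return "error: expected " + sm.getNumHydrants() + " but got " + numHydrants;
        }
        return "ok";
    }

    public static void main(String[] args) {
        sinksMetadata.put("seg1", new SinkMetadata(1));
        System.out.println(check("seg1", 1));
        System.out.println(check("missing", 1));
    }
}
```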
@@ -17,7 +17,7 @@
 * under the License.
 */

-package org.apache.druid.segment.realtime.appenderator;
+package org.apache.druid.indexing.appenderator;

The way the coverage bot currently works is by running all tests and finding the lines and branches in the classes corresponding to those tests. One requirement is that the class under test and its test class must be in the same package. So I would suggest not moving this class if possible, because you would need to move lots of other classes along with it.

Class BatchAppenderatorTester, which must also be moved if we want to move the test classes to the server module, is very difficult to move because it uses the constructor of IndexTask.IndexTuningConfig, which is not available in the server module. Moving that class would demand moving a lot of classes that do not make sense in the server module.

I am excluding the class again. I see no easy way to move them all to the same package. We are fighting primarily against a tool (coverage) and secondarily against module design (i.e. "server" has some batch stuff and it should only have "realtime" stuff). Resolving the latter, which takes non-trivial effort and should go in another ticket, will also resolve the former.

Removed the class from the exclusion after finding a way to move the test classes to the server module.
@@ -172,7 +172,7 @@ SegmentWithState getAppendingSegment()
/**
 * Allocated segments for a sequence
 */
-static class SegmentsForSequence
+public static class SegmentsForSequence

BatchAppenderatorDriver is in the same package as BaseAppenderatorDriver. I assume you meant BatchAppenderatorDriverTest, which is the class you moved to another package. As I said in my other comment, the class under test and its corresponding test class should be in the same package to help the test coverage bot. I suggested not moving the package of BatchAppenderatorDriverTest, and thus you would not need to change this access modifier either. Same for the other access-modifier changes in this class.
FireDepartmentMetrics metrics,
DataSegmentPusher dataSegmentPusher,
ObjectMapper objectMapper,
@Nullable SinkQuerySegmentWalker sinkQuerySegmentWalker,

This parameter must always be null per the argument checker below. Can we just remove it?

Done
* This constructor allows the caller to provide its own SinkQuerySegmentWalker.
* <p>
* The sinkTimeline is set to the sink timeline of the provided SinkQuerySegmentWalker.
* If the SinkQuerySegmentWalker is null, a new sink timeline is initialized.
* <p>
* It is used by UnifiedIndexerAppenderatorsManager which allows queries on data associated with multiple
* Appenderators.

This javadoc is no longer correct. I think you can simply delete it.

Done
identifiersAndSinks = getIdentifierAndSinkForPersistedFile(identifier);
}
catch (IOException e) {
  throw new ISE(e, "Failed to retrieve sinks for identifier", identifier);

Suggested change:
  throw new ISE(e, "Failed to retrieve sinks for identifier", identifier);
→ throw new ISE(e, "Failed to retrieve sinks for identifier[%s]", identifier);

Done
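The missing [%s] placeholder matters because Druid's ISE formats its message with the trailing varargs, so without a placeholder the identifier argument is silently dropped from the message. A minimal stand-in (using String.format, which under the assumption here mirrors that behavior for well-formed format strings) illustrates the difference:

```java
// Hypothetical stand-in for Druid's ISE: formats the message with the varargs.
public class IseDemo {
    public static class ISE extends IllegalStateException {
        public ISE(String format, Object... args) {
            super(String.format(format, args)); // extra args without placeholders are ignored
        }
    }

    public static void main(String[] args) {
        ISE bad = new ISE("Failed to retrieve sinks for identifier", "seg1");
        ISE good = new ISE("Failed to retrieve sinks for identifier[%s]", "seg1");
        System.out.println(bad.getMessage());  // identifier lost from the message
        System.out.println(good.getMessage()); // identifier included
    }
}
```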
(dir, fileName) -> !(Ints.tryParse(fileName) == null)
);
if (sinkFiles == null) {
  throw new ISE("Problem reading persisted sinks in path", identifierPath);

Suggested change:
  throw new ISE("Problem reading persisted sinks in path", identifierPath);
→ throw new ISE("Problem reading persisted sinks in path[%s]", identifierPath);

Done
// to pull it from there....
SinkMetadata sm = sinksMetadata.get(identifier);
if (sm == null) {
  throw new ISE("Sink must not be null for identifier when persisting hydrant", identifier);

Suggested change:
  throw new ISE("Sink must not be null for identifier when persisting hydrant", identifier);
→ throw new ISE("Sink must not be null for identifier[%s] when persisting hydrant", identifier);

Done
This pull request introduces 2 alerts when merging 62e4ac3 into a9c4b47 - view on LGTM.com new alerts:

LGTM overall, but please remove unnecessary code and fix the typo before this PR is merged.
int numHydrants = 0;
for (FireHydrant hydrant : sink) {
  synchronized (hydrant) {

No need for synchronization.

Done
*/
private int persistHydrant(FireHydrant indexToPersist, SegmentIdWithShardSpec identifier)
{
  synchronized (indexToPersist) {

No need for synchronization.

Done
// the invariant of exactly one, always swappable, sink with exactly one unpersisted hydrant must hold
int totalHydrantsForSink = hydrants.size();
if (totalHydrantsForSink != 1) {
  throw new ISE("There should be only onw hydrant for identifier[%s] but there are[%s]",

Suggested change:
  throw new ISE("There should be only onw hydrant for identifier[%s] but there are[%s]",
→ throw new ISE("There should be only one hydrant for identifier[%s] but there are[%s]",

Done
private DataSegment mergeAndPush(
  final SegmentIdWithShardSpec identifier,
  final Sink sink,
  final boolean useUniquePath

useUniquePath is always false so you can remove it.

done
// copied from druid-indexing as is for testing since it is not accessible from server module,
// we could simplify since not all its functionality is being used
// but leaving as is, it could be useful later
private static class IndexTuningConfig implements AppenderatorConfig

Sorry I missed this change in my last review. Please clean up all the unused code in this class because there is no reason to keep it. It should really be a simple POJO. We can add some code back if we really need it later.

Cleaned up, but most of the code is setting defaults (when nulls are passed to the constructor) that are necessary for the appenderator to function.

I don't agree that it's already simple enough. Why is it Jackson-serializable? It doesn't seem to be used in any test. Besides this, I also see lots of methods and parameters that are deprecated, not in use at all, or not used in any test. I can leave comments on them to help you identify them if you want. I don't think we should keep this code if the reason is just that we might need it in the future. It will be easy to add it back if we need it, or we can do even better than now.

this change seems to be really held back by the baggage of using the Appenderator interface, which just really doesn't seem appropriate in the long term if we are going to have dedicated batch processing. The obvious things off the top of my head:
- don't need to be a QuerySegmentWalker
- maybe we don't need to persist everything when any hydrant overflows since there is no checkpointing, but could either only persist the full one, or use some sort of threshold-based persist to prevent tiny segments

But, since i think it should still be an improvement for batch processing, and doesn't seem to strictly make doing the more dramatic change any harder, i think it lgtm overall 👍
* for functionality, depending on the field that is used. More info about the
* fields is annotated as comments in the class
*/
private static class SinkMetadata

nit: maybe put the inner class at the end of the file so it's not in the middle of the private field declarations

Done
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

public class BatchAppenderator implements Appenderator

could you please add javadocs clearly documenting the expected concurrency model of this thing? Comparing with the stream appenderator, things like persistHydrant are synchronized there, but that method is not synchronized here, which makes me think it should only be called by a single thread. However, there are things like concurrent maps and atomic integer counters in use, which makes me curious if there is some concurrency, and what might be affected. If concurrency is never expected, please remove the concurrent types because they are confusing.

Such a javadoc would save myself and any others from having to dig deep to try and trace this out for ourselves, so it would be helpful to clarify the concurrency model and whether this diverges anywhere from the base Appenderator contract or not.

Also, comparing side by side with StreamAppenderator, it looks like this class shares some lineage with it; it might be worth describing the differences in the javadocs here as well and linking to it (javadocs for StreamAppenderator would be nice too, but it didn't previously have any, so those are probably ok to add later...).

Added javadoc and removed remaining concurrency constructs
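A hypothetical shape for such a class-level javadoc (the wording below is purely illustrative and not taken from the actual PR):

```java
/**
 * An {@link Appenderator} tailored for native batch ingestion.
 *
 * Concurrency model (illustrative sketch): unlike StreamAppenderator, all
 * operations -- add, persist, merge, and push -- are expected to be invoked
 * from a single thread. No internal locking is performed and plain
 * (non-concurrent) collections are used. Methods that return futures complete
 * synchronously and exist only to satisfy the Appenderator contract.
 */
public class BatchAppenderator implements Appenderator
```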
docs/configuration/index.md (Outdated)

@@ -1334,7 +1334,7 @@ Additional peon configs include:
|`druid.peon.mode`|Choices are "local" and "remote". Setting this to local means you intend to run the peon as a standalone process (Not recommended).|remote|
|`druid.indexer.task.baseDir`|Base temporary working directory.|`System.getProperty("java.io.tmpdir")`|
|`druid.indexer.task.baseTaskDir`|Base temporary working directory for tasks.|`${druid.indexer.task.baseDir}/persistent/task`|
-|`druid.indexer.task.batchMemoryMappedIndex`|If false, native batch ingestion will not map indexes thus saving heap space. This does not apply to streaming ingestion, just to batch. This setting should only be used when a bug is suspected or found in the new batch ingestion code that avoids memory mapping indices. If a bug is suspected or found, you can set this flag to `true` to fall back to previous, working but more memory intensive, code path.|`false`|
+|`druid.indexer.task.batchFallback`|If false, native batch ingestion will use memory optimized code. This does not apply to streaming ingestion, just to batch. This setting should only be used when a bug is suspected or found in the new optimized batch ingestion code. If a bug is suspected or found, you can set this flag to `true` to fall back to previous, working but more memory intensive, code path.|`false`|

I think this isn't a very intuitive name. How about something like druid.indexer.task.useDedicatedBatchProcessing, defaulting to true (which is a hassle because it inverts the config):

"If true, native batch ingestion will use dedicated, memory optimized processing. When set to false, native batch indexing will revert to its legacy mode, which shares the same code-path as streaming ingestion but has a higher memory footprint."

If you would rather not invert usages, because I admit it might be sort of painful to change, I guess something like useLegacyBatchProcessing or similar could also work; it would still allow defaulting to false and would be more clear about the role of the config when encountered in the properties file.

I intentionally left that variable with a vague name... if the variable that I just removed had had a vague name like that, then I could have just re-used it and maybe edited the description in the docs. I feel this name is fine (I accept it is vague, which is intentional) since it is really there for an exceptional situation. We want to get rid of this asap, potentially before the next open source release.

I'm making the 0.22 branch likely in the next week, so it seems to me that the flag must exist to provide a way to revert to 0.21 behavior for at least 1 release cycle. batchMemoryMappedIndex is not in 0.21, so it has never been released. Since this flag will be released, I still assert that it should have a better name, inverted or not; I don't see a good argument for leaving it intentionally vague.

Renamed to useLegacyBatchProcessing
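For operators, the resulting fallback configuration would look something like the sketch below. The flag name is confirmed by the rename above; the assumption that it goes in the peon/MiddleManager runtime.properties follows from the surrounding docs table.

```properties
# Revert native batch ingestion to the legacy (streaming-shared) code path.
# The default is false, i.e. use the new memory-optimized batch processing.
druid.indexer.task.useLegacyBatchProcessing=true
```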
public static final int ROUGH_OVERHEAD_PER_SINK = 5000;
// Rough estimate of memory footprint of empty FireHydrant based on actual heap dumps
public static final int ROUGH_OVERHEAD_PER_HYDRANT = 1000;

nit: these seem to be dupes of constants in StreamAppenderator; should it just use them directly?

I don't think we want to use stuff from StreamAppenderator but I could move them to the Appenderator interface

Left them as is since I don't think it is a good idea to start coupling back either to the interface or StreamAppenderator

It seems better to me to leave them separate since the overhead per sink and hydrant can be different from stream ingestion in the future. Maybe even for now, since the sink can have up to only one hydrant before it is persisted, so memory pressure per sink could be different.
* physical segments (i.e. hydrants) do not need to memory map their persisted
* files. In this case, the code will avoid memory mapping them thus ameliorating the occurrence
* of OOMs.
*/
private final boolean isRealTime;

it seems like this should just be removed? Since this functionality hasn't been released, should it still be here? I guess it could be removed in a follow-up, but I guess it also means there is no way to turn this functionality off for batch tasks in case closing the segments in StreamAppenderator itself has a bug.

It is used by the fall back flag.

It is not really though; when the fallback is set to true, StreamAppenderator is made with isRealtime hard-coded to false, instead of controlled by a flag as was introduced in #11123. This means there is no way to revert the behavior of that PR since it isn't operator controllable anymore. Rather than introduce a 2nd flag, I strongly think we should consider removing isRealtime from StreamAppenderator before 0.22, since the behavior was previously unreleased and now there is no way to not use it (I don't think there is enough time to have no setting at all and always use BatchAppenderator, so that flag will still need to exist to choose between them).
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;

public class BatchAppenderatorTester implements AutoCloseable

what is the difference between this and StreamAppenderatorTester, besides Appenderators.createRealtime vs Appenderators.createOffline and a different tuning config? Also, does it need these json annotations?

I decided to create different tester classes for stream & batch so as not to couple them together. I will remove the json annotations.
// Memory footprint includes count integer in FireHydrant, shorts in ReferenceCountingSegment,
// Objects in SimpleQueryableIndex (such as SmooshedFileMapper, each ColumnHolder in column map, etc.)
int total;
total = Integer.BYTES + (4 * Short.BYTES) + ROUGH_OVERHEAD_PER_HYDRANT;

nit: seems like a lot of lines for a constant. If I understand correctly there is only 1 hydrant per sink, so would it just make sense to factor this into the sink calculation? Also the comment doesn't seem relevant and should be fixed or removed.
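For reference, the per-hydrant estimate in the snippet above works out as follows; this is a standalone recomputation assuming ROUGH_OVERHEAD_PER_HYDRANT = 1000 as shown there.

```java
// Standalone recomputation of the per-hydrant footprint estimate:
// a 4-byte count int, four 2-byte shorts, plus the rough overhead constant.
public class FootprintDemo {
    static final int ROUGH_OVERHEAD_PER_HYDRANT = 1000;

    static int perHydrantFootprint() {
        return Integer.BYTES + (4 * Short.BYTES) + ROUGH_OVERHEAD_PER_HYDRANT;
    }

    public static void main(String[] args) {
        System.out.println(perHydrantFootprint()); // 4 + 8 + 1000 = 1012
    }
}
```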
…cts in batch appenderator, rename feature flag, remove real time flag stuff from stream appenderator, etc.)
private final SegmentWriteOutMediumFactory segmentWriteOutMediumFactory;

@Nullable
private static PartitionsSpec getPartitionsSpec(

PartitionsSpec in this class is not used by any test. Please stop copying the business logic into the test. All tests must pass a proper partitionsSpec if they test partitioning-related behavior, unless they verify the default partitionsSpec.

Removed
Long maxBytesInMemory,
Boolean skipBytesInMemoryOverheadCheck,
Long maxTotalRows,
Integer rowFlushBoundary_forBackCompatibility,

Many parameters including this one are deprecated in IndexTuningConfig. They only exist in IndexTuningConfig for compatibility. It doesn't seem reasonable to copy them here.

Cleaned up the config as much as I could, take a look.
@Deprecated
@Nullable
public Integer getNumShards()

Why do you want to add a deprecated method?

Removed
*/
@Nullable
@Override
@Deprecated

This method is not deprecated.

Fixed
*/
@Override
@Nullable
@Deprecated

This method is not deprecated.

Fixed
}

public long getAwaitSegmentAvailabilityTimeoutMillis()

This method is not in use.

Removed
public boolean isLogParseExceptions()
{
  return logParseExceptions;
}

public int getMaxParseExceptions()
{
  return maxParseExceptions;
}

public int getMaxSavedParseExceptions()
{
  return maxSavedParseExceptions;
}

These methods are not in use.

Removed
return maxPendingPersists;
}

public boolean isForceGuaranteedRollup()

This method is not in use.

Removed
return partitionsSpec;
}

public PartitionsSpec getGivenOrDefaultPartitionsSpec()

This method is not in use.

removed
}

@Deprecated
public List<String> getPartitionDimensions()

Same here. Why do you want to add a deprecated method?

removed
lgtm after you get squared up with @jihoonson

Thanks for the cleanup. +1 after CI.
Fixes Issue #11231

Description

Refactor the segment creation (and merge) phase of Appenderator for batch ingestion in such a way as to decouple it from streaming data structures and make memory growth bounded regardless of input file size. This PR contains the changes that implement the proposal "Minimize memory utilization in Sinks/Hydrants for native batch ingestion".

The main changes were to separate AppenderatorImpl into StreamAppenderator (which is AppenderatorImpl unchanged) and BatchAppenderator. The latter is where the persistence of Sink and FireHydrant as described in the proposal is implemented. The rest of the changes were in associated interfaces and client code, mainly to make the concurrency in the appenderator synchronous. The batch appenderator requires that push, persist, and merge be synchronous in order to facilitate the handling of Sink and FireHydrant when they are persisted and restored. As the proposal indicates, future work is suggested to remove the concurrent APIs from BatchAppenderator (i.e. any API that returns a ListenableFuture or a Future).

One area that we did not explore was the memory-mapping behavior impact when maxColumnsToMerge is set. This is also left for future work.

Feature flag

The middle manager (i.e. Peon) configuration flag druid.indexer.task.useLegacyBatchProcessing, when set to true, will fall back to the older, known-to-be-working code path. This is to have a fallback in place just in case there is some issue with the code in this PR after it is merged. We will remove this flag in the future. The default for this flag, false, is to use the new code (i.e. do not memory map FireHydrant indices, and persist and clear from memory -- but not from disk -- all Sink and FireHydrant instances at each intermediate persist).

Future work

Refactor the Appenderator interface: see the proposal for more about this.

Study the impact when maxColumnsToMerge is set: this setting also impacts the memory mapping of FireHydrant. Even though the references to the mappings are only kept in local variables that can be garbage collected, we should look into it at some point.

This PR has: