More error reporting and stats for ingestion tasks #5418

Merged
merged 12 commits into apache:master on Apr 6, 2018

Conversation

@jon-wei (Contributor) commented Feb 23, 2018

This PR adds more ingestion status and error reporting for IndexTask, KafkaIndexTask, and HadoopIndexTask.

Things changed:

  • ParseException handling:
    • reportParseExceptions and ignoreInvalidRows are now deprecated
    • Three new configurations have been added to the various tuningConfig classes:
      • logParseExceptions - log an error (via log.error()) when a parse exception occurs
      • maxParseExceptions - The maximum number of parse exceptions allowed before a task fails
      • maxSavedParseExceptions - Save the last N parse exceptions to a circular buffer, retrievable via a new unparseableEvents endpoint on the tasks, and also exposed in the completed task status
      • When reportParseExceptions is true, this is treated as maxParseExceptions set to 0. (See the example tuningConfig after this list.)
  • New row counter type: processedWithErrors -> counts rows that were parseable but contained a column value that could not be converted to the expected type (e.g., a row with "helloworld" in a long column). These rows count against the maxParseExceptions limit.
  • TaskStatus changes
    • The task status now has an additional error message:
      • errorMsg -> Message showing cause for task failure, truncated to 100 characters. The full message is available in the TaskReport described below.
  • rowStats peon endpoint (http://localhost:<task port>/druid/worker/v1/chat/<task id>/rowStats)
    • Shows the current total row stats for an ingestion task. By default this shows stats only for the current ingestion phase. Adding ?full will return stats for all phases.
  • unparseableEvents peon endpoint (http://localhost:<task port>/druid/worker/v1/chat/<task id>/unparseableEvents)
    • Returns a list of the last N parse exceptions encountered by a task
  • Ingestion stats and errors TaskReport:
    • Available at http://<overlord>/druid/indexer/v1/task/<task_id>/reports after a task completes. Contains the following:
    • ingestionState -> shows which ingestion phase the task is in; possible values are NOT_STARTED, DETERMINE_PARTITIONS, BUILD_SEGMENTS, and COMPLETED
    • rowStats
    • unparseableEvents
    • errorMsg -> shows the failure cause of a task
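
For reference, a minimal tuningConfig snippet using the new parse exception options might look like the following (shown for a native index task; the values are illustrative, not defaults):

"tuningConfig": {
  "type": "index",
  "logParseExceptions": true,
  "maxParseExceptions": 100,
  "maxSavedParseExceptions": 10
}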

Example task report:

{
  "ingestionStatsAndErrors": {
    "taskId": "index_wikiticker_2018-04-03T20:33:12.782Z",
    "payload": {
      "ingestionState": "COMPLETED",
      "unparseableEvents": {
        "determinePartitions": [
          "Encountered row with timestamp that cannot be represented as a long: [MapBasedInputRow{timestamp=246140482-04-24T15:36:27.903Z, event={isRobot=true, channel=#ca.wikipedia, timestamp=246140482-04-24T15:36:27.903Z, flags=B, isUnpatrolled=false, page=Exèrcit dels guàrdies de la revolució islàmica, diffUrl=https://ca.wikipedia.org/w/index.php?diff=17129537&oldid=17127917&rcid=29640873, added=18, comment=Robot posa data a plantilles de manteniment, commentLength=43, isNew=false, isMinor=false, delta=18, isAnonymous=false, user=ArnauBot, deltaBucket=0.0, deleted=0, namespace=Main}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, metroCode, namespace, page, regionIsoCode, regionName, user, commentLength, deltaBucket, flags, diffUrl]}]",
          "Unable to parse row [and so is this lol]",
          "Unable to parse row [this is total garbage]",
          "Unable to parse row [this is total garbag34]",
          "Unable to parse row [this is total garbage3]",
          "Unable to parse row [this is total garbage2]"
        ],
        "buildSegments": [
          "Encountered row with timestamp that cannot be represented as a long: [MapBasedInputRow{timestamp=246140482-04-24T15:36:27.903Z, event={isRobot=true, channel=#ca.wikipedia, timestamp=246140482-04-24T15:36:27.903Z, flags=B, isUnpatrolled=false, page=Exèrcit dels guàrdies de la revolució islàmica, diffUrl=https://ca.wikipedia.org/w/index.php?diff=17129537&oldid=17127917&rcid=29640873, added=18, comment=Robot posa data a plantilles de manteniment, commentLength=43, isNew=false, isMinor=false, delta=18, isAnonymous=false, user=ArnauBot, deltaBucket=0.0, deleted=0, namespace=Main}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, metroCode, namespace, page, regionIsoCode, regionName, user, commentLength, deltaBucket, flags, diffUrl]}]",
          "Unable to parse row [and so is this lol]",
          "Found unparseable columns in row: [MapBasedInputRow{timestamp=2016-06-27T00:02:20.008Z, event={isRobot=false, channel=#id.wikipedia, timestamp=2016-06-27T00:02:20.008Z, flags=!, isUnpatrolled=true, page=Ibnu Sina, diffUrl=https://id.wikipedia.org/w/index.php?diff=11687177&oldid=11444059&rcid=20812462, added=106, comment=gambar, commentLength=6, isNew=false, isMinor=false, delta=106, isAnonymous=false, user=Ftihikam, deltaBucket=bad-delta-bucket, deleted=0, namespace=Main}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, metroCode, namespace, page, regionIsoCode, regionName, user, commentLength, deltaBucket, flags, diffUrl]}], exceptions: [could not convert value [bad-delta-bucket] to long,]",
          "Unable to parse row [this is total garbage]",
          "Found unparseable columns in row: [MapBasedInputRow{timestamp=2016-06-27T00:01:07.347Z, event={isRobot=true, channel=#ceb.wikipedia, timestamp=2016-06-27T00:01:07.347Z, flags=NB, isUnpatrolled=false, page=Neqerssuaq, diffUrl=https://ceb.wikipedia.org/w/index.php?oldid=9563239&rcid=36193146, added=4150, comment=Paghimo ni bot Greenland, commentLength=24, isNew=true, isMinor=false, delta=4150, isAnonymous=false, user=Lsjbot, deltaBucket=helloworld, deleted=0, namespace=Main}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, metroCode, namespace, page, regionIsoCode, regionName, user, commentLength, deltaBucket, flags, diffUrl]}], exceptions: [could not convert value [helloworld] to long,]",
          "Unable to parse row [this is total garbag34]"
        ]
      },
      "rowStats": {
        "determinePartitions": {
          "rowsProcessed": 29,
          "rowsProcessedWithErrors": 0,
          "rowsThrownAway": 0,
          "rowsUnparseable": 7
        },
        "buildSegments": {
          "rowsProcessed": 27,
          "rowsProcessedWithErrors": 2,
          "rowsThrownAway": 0,
          "rowsUnparseable": 7
        }
      },
      "errorMsg": null
    },
    "type": "ingestionStatsAndErrors"
  }
}

Example rowStats output:

{
  "totals": {
    "buildSegments": {
      "rowsProcessedWithErrors": 0,
      "rowsProcessed": 22479,
      "rowsUnparseable": 0,
      "rowsThrownAway": 0
    }
  }
}

Example unparseableEvents output:

{
  "determinePartitions": [
    "Encountered row with timestamp that cannot be represented as a long: [MapBasedInputRow{timestamp=246140482-04-24T15:36:27.903Z, event={isRobot=true, channel=#ca.wikipedia, timestamp=246140482-04-24T15:36:27.903Z, flags=B, isUnpatrolled=false, page=Exèrcit dels guàrdies de la revolució islàmica, diffUrl=https://ca.wikipedia.org/w/index.php?diff=17129537&oldid=17127917&rcid=29640873, added=18, comment=Robot posa data a plantilles de manteniment, commentLength=43, isNew=false, isMinor=false, delta=18, isAnonymous=false, user=ArnauBot, deltaBucket=0.0, deleted=0, namespace=Main}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, metroCode, namespace, page, regionIsoCode, regionName, user, commentLength, deltaBucket, flags, diffUrl]}]",
    "Unable to parse row [and so is this lol]",
    "Unable to parse row [this is total garbage]",
    "Unable to parse row [this is total garbag34]",
    "Unable to parse row [this is total garbage3]",
    "Unable to parse row [this is total garbage2]"
  ],
  "buildSegments": [
    "Encountered row with timestamp that cannot be represented as a long: [MapBasedInputRow{timestamp=246140482-04-24T15:36:27.903Z, event={isRobot=true, channel=#ca.wikipedia, timestamp=246140482-04-24T15:36:27.903Z, flags=B, isUnpatrolled=false, page=Exèrcit dels guàrdies de la revolució islàmica, diffUrl=https://ca.wikipedia.org/w/index.php?diff=17129537&oldid=17127917&rcid=29640873, added=18, comment=Robot posa data a plantilles de manteniment, commentLength=43, isNew=false, isMinor=false, delta=18, isAnonymous=false, user=ArnauBot, deltaBucket=0.0, deleted=0, namespace=Main}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, metroCode, namespace, page, regionIsoCode, regionName, user, commentLength, deltaBucket, flags, diffUrl]}]",
    "Unable to parse row [and so is this lol]",
    "Found unparseable columns in row: [MapBasedInputRow{timestamp=2016-06-27T00:02:20.008Z, event={isRobot=false, channel=#id.wikipedia, timestamp=2016-06-27T00:02:20.008Z, flags=!, isUnpatrolled=true, page=Ibnu Sina, diffUrl=https://id.wikipedia.org/w/index.php?diff=11687177&oldid=11444059&rcid=20812462, added=106, comment=gambar, commentLength=6, isNew=false, isMinor=false, delta=106, isAnonymous=false, user=Ftihikam, deltaBucket=bad-delta-bucket, deleted=0, namespace=Main}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, metroCode, namespace, page, regionIsoCode, regionName, user, commentLength, deltaBucket, flags, diffUrl]}], exceptions: [could not convert value [bad-delta-bucket] to long,]",
    "Unable to parse row [this is total garbage]",
    "Found unparseable columns in row: [MapBasedInputRow{timestamp=2016-06-27T00:01:07.347Z, event={isRobot=true, channel=#ceb.wikipedia, timestamp=2016-06-27T00:01:07.347Z, flags=NB, isUnpatrolled=false, page=Neqerssuaq, diffUrl=https://ceb.wikipedia.org/w/index.php?oldid=9563239&rcid=36193146, added=4150, comment=Paghimo ni bot Greenland, commentLength=24, isNew=true, isMinor=false, delta=4150, isAnonymous=false, user=Lsjbot, deltaBucket=helloworld, deleted=0, namespace=Main}, dimensions=[channel, cityName, comment, countryIsoCode, countryName, isAnonymous, isMinor, isNew, isRobot, isUnpatrolled, metroCode, namespace, page, regionIsoCode, regionName, user, commentLength, deltaBucket, flags, diffUrl]}], exceptions: [could not convert value [helloworld] to long,]",
    "Unable to parse row [this is total garbag34]"
  ]
}

@jon-wei changed the title from "[WIP] More error reporting and stats for ingestion tasks" to "More error reporting and stats for ingestion tasks" on Mar 14, 2018
@jon-wei removed the WIP label Mar 14, 2018
@clintropolis (Member) left a comment

reviewed through KafkaSupervisorTest.java

return false;
}

if (!Objects.equals(location, that.location)) {
Member:

dupe

Contributor Author:

Fixed


public CircularBuffer(int capacity)
{
buffer = (E[]) new Object[capacity];
Member:

Maybe fail fast here with a precondition check that capacity is larger than 0, instead of blowing up with an out-of-bounds error later

Contributor Author:

Added a preconditions check


import com.google.common.base.Preconditions;

public class CircularBuffer<E>
Member:

Any reason not to use https://google.github.io/guava/releases/23.0/api/docs/com/google/common/collect/EvictingQueue.html? However, it would require a different strategy for getMessagesFromSavedParseExceptions where getLatest is used.

Contributor Author:

Decided to keep CircularBuffer for now, since it was already in the codebase and I do want getLatest

Contributor:

CircularBuffer is used in this as well as ChangeRequestHistory and ChangeRequestHistory requires a randomly-accessible circular array. I think it's fine to keep this.

However, would you add some javadocs to this class? I also think we need some unit tests for this class, but it's not mandatory for this PR.
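
For reference, a minimal sketch of a CircularBuffer along the lines discussed here, with the capacity precondition and the getLatest accessor (an illustration only, not the exact class in the codebase):

import com.google.common.base.Preconditions;

public class CircularBuffer<E>
{
  private final E[] buffer; // backing array; once full, the oldest entry is overwritten
  private int start = 0;    // index of the oldest element
  private int size = 0;     // number of elements currently stored

  @SuppressWarnings("unchecked")
  public CircularBuffer(int capacity)
  {
    Preconditions.checkArgument(capacity > 0, "Capacity must be greater than 0.");
    buffer = (E[]) new Object[capacity];
  }

  public void add(E item)
  {
    if (size < buffer.length) {
      buffer[(start + size++) % buffer.length] = item;
    } else {
      buffer[start] = item;               // overwrite the oldest entry
      start = (start + 1) % buffer.length;
    }
  }

  // getLatest(0) is the most recently added element, getLatest(1) the one before it, etc.
  public E getLatest(int index)
  {
    Preconditions.checkArgument(index >= 0 && index < size, "invalid index");
    return buffer[(start + size - 1 - index) % buffer.length];
  }

  public int size()
  {
    return size;
  }
}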

private final Map<String, Object> metrics;

@Nullable
private final String errorMsg;
Contributor:

I really wanted this!


package io.druid.indexer;

public enum IngestionState
@jihoonson (Contributor), Mar 15, 2018:

Is this the same for all types of tasks? If so, I think it's better to expand TaskState to include these new states, because every task is an ingestion task and we wouldn't have to keep two states for them.

Contributor Author:

I decided to keep them separate, since I mean for IngestionState to be an additional qualifier on the existing states (RUNNING, FAILED, SUCCESS). For example, a task could be RUNNING and in DETERMINE_PARTITIONS, or RUNNING and in BUILD_SEGMENTS, or similarly with FAILED.

Contributor:

Sounds good.

@@ -138,13 +175,37 @@ public boolean equals(Object o)
if (!Objects.equals(duration, that.duration)) {
return false;
}
return location.equals(that.location);

if (!Objects.equals(location, that.location)) {
Contributor:

location can't be null.

return TaskStatus.failure(
getId(),
getTaskCompletionMetrics(),
e.getMessage(),
Contributor:

How about passing the full stack trace rather than a simple message? It would be much more helpful for understanding what's going on. If some applications need only error messages, they can parse the stack trace on their own.

@jon-wei (Contributor Author), Apr 3, 2018:

I changed this to the following:

  • The stack trace is passed in now, but TaskStatus will truncate the error message to 100 characters. This is to avoid large ZK/metadata objects.
  • The full error + stack trace is provided in the TaskReport that's uploaded when the task completes (a rough sketch of the truncation follows below)
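
A rough sketch of that approach, assuming a hypothetical helper (the 100-character limit and the use of the full stack trace are as described above; names are made up for illustration):

import com.google.common.base.Throwables;

public class TaskStatusErrorMsgSketch
{
  private static final int MAX_ERROR_MSG_LENGTH = 100;

  // Capture the full stack trace as the error message; the untruncated version
  // goes into the task report, while TaskStatus keeps only a truncated copy so
  // the object stored in ZK/metadata stays small.
  static String errorMsgFrom(Throwable t)
  {
    return Throwables.getStackTraceAsString(t);
  }

  static String truncateForTaskStatus(String errorMsg)
  {
    if (errorMsg != null && errorMsg.length() > MAX_ERROR_MSG_LENGTH) {
      return errorMsg.substring(0, MAX_ERROR_MSG_LENGTH) + "...";
    }
    return errorMsg;
  }
}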

} else {
fireDepartmentMetrics.incrementThrownAway();
}
catch (ParseException e) {
Contributor:

Looks like this catch clause isn't necessary and handleParseException() can be called directly like

if (addResult.getParseException() != null) {
  handleParseException(e, record);
}

Contributor Author:

Changed this and similar places to use handleParseException

Objects.equals(intermediateHandoffPeriod, that.intermediateHandoffPeriod) &&
Objects.equals(logParseExceptions, that.logParseExceptions) &&
Objects.equals(maxParseExceptions, that.maxParseExceptions) &&
Objects.equals(maxSavedParseExceptions, that.maxSavedParseExceptions);
Contributor:

New variables can't be null.

Contributor Author:

Changed these, though I don't think it matters much since these comparisons aren't in any hot path

@@ -102,6 +103,9 @@
@JsonIgnore
private IndexTask indexTaskSpec;

@JsonIgnore
private final AuthorizerMapper authorizerMapper;
Contributor:

Thanks for adding this.

}
}
catch (TimeoutException | ExecutionException e) {
throw Throwables.propagate(e);
}
}

@Nullable
public static List<String> getMessagesFromSavedParseExceptions(CircularBuffer<Throwable> savedParseExceptions)
Contributor:

The IndexTask class is already long enough. I suggest extracting this as a method of a separate util class.

Contributor Author:

Moved this to IndexTaskUtils
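
For illustration, the extracted helper could look roughly like this (a sketch that reuses the CircularBuffer's getLatest accessor; not the exact implementation):

import java.util.ArrayList;
import java.util.List;
import javax.annotation.Nullable;

public class IndexTaskUtilsSketch
{
  // Returns the messages of the saved parse exceptions, newest first,
  // or null if nothing has been saved yet.
  @Nullable
  public static List<String> getMessagesFromSavedParseExceptions(CircularBuffer<Throwable> savedParseExceptions)
  {
    if (savedParseExceptions == null) {
      return null;
    }
    List<String> messages = new ArrayList<>();
    for (int i = 0; i < savedParseExceptions.size(); i++) {
      messages.add(savedParseExceptions.getLatest(i).getMessage());
    }
    return messages;
  }
}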

@@ -451,7 +453,7 @@ public boolean isRollup()
);

// Note: This method needs to be thread safe.
protected abstract Integer addToFacts(
protected abstract Pair<Integer, List<String>> addToFacts(
Contributor:

I think Pair makes the code difficult to understand because I have to figure out what makes up the pair. It would be much better to add a new class.

Contributor Author:

Changed this to use a new class instead of Pair
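
For illustration, the kind of small result class that replaces the Pair could look like this (names are assumed and may differ from the actual code):

import java.util.List;
import javax.annotation.Nullable;

// Holds the row count produced by addToFacts together with any parse exception
// messages encountered while processing the row.
public class AddToFactsResult
{
  private final int rowCount;
  private final List<String> parseExceptionMessages;

  public AddToFactsResult(int rowCount, @Nullable List<String> parseExceptionMessages)
  {
    this.rowCount = rowCount;
    this.parseExceptionMessages = parseExceptionMessages;
  }

  public int getRowCount()
  {
    return rowCount;
  }

  @Nullable
  public List<String> getParseExceptionMessages()
  {
    return parseExceptionMessages;
  }
}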

}

@VisibleForTesting
TimeAndDims toTimeAndDims(InputRow row)
Pair<TimeAndDims, List<String>> toTimeAndDims(InputRow row)
Contributor:

Same here.

Contributor Author:

Changed this to use a new class instead of Pair

@@ -72,7 +72,7 @@ public MapInputRowParser(
}
}
catch (Exception e) {
throw new ParseException(e, "Unparseable timestamp found!");
throw new ParseException(e, "Unparseable timestamp found! Event: " + theMap);
Contributor:

nit: ParseException supports formatted string.

Contributor Author:

Used formatted string
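
For reference, the formatted-string form mentioned above would look roughly like this (a fragment; the exact message text may differ):

// before: string concatenation
throw new ParseException(e, "Unparseable timestamp found! Event: " + theMap);
// after: ParseException's formatted-string constructor
throw new ParseException(e, "Unparseable timestamp found! Event: %s", theMap);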



@Nullable
default Map<String, Object> getStats()
{
throw new UnsupportedOperationException("This Jobby does not implement getJobStats().");
Contributor:

Please add the class name to the exception message.

Contributor Author:

Added class name

@Nullable
default String getErrorMessage()
{
throw new UnsupportedOperationException("This Jobby does not implement getErrorMessage().");
Contributor:

Same here.

Contributor Author:

Added class name
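
A sketch of what the resolved Jobby defaults might look like with the class name included (illustrative; the javadoc wording and message format are assumptions):

import java.util.Map;
import javax.annotation.Nullable;

public interface Jobby
{
  boolean run();

  /**
   * @return job stats, or null if the job has no stats to report (assumed semantics).
   *         Implementations that do not support stats throw UnsupportedOperationException.
   */
  @Nullable
  default Map<String, Object> getStats()
  {
    throw new UnsupportedOperationException(
        String.format("getJobStats not implemented in [%s]", getClass().getName())
    );
  }

  /**
   * @return the error message of a failed job, or null if the job did not fail (assumed semantics).
   *         Implementations that do not support error reporting throw UnsupportedOperationException.
   */
  @Nullable
  default String getErrorMessage()
  {
    throw new UnsupportedOperationException(
        String.format("getErrorMessage not implemented in [%s]", getClass().getName())
    );
  }
}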

return jsonMapper.writeValueAsString(taskDiagsMap);
}
catch (IOException | InterruptedException ie) {
log.error("couldn't get failure cause for job.");
Contributor:

Please add the exception and job name to the log message.

Contributor Author:

Added exception and job name

/**
*/
public interface Jobby
{
boolean run();

@Nullable
Contributor:

Would you please add a javadoc describing when the return value can be null?

Contributor Author:

Added javadoc

throw new UnsupportedOperationException("This Jobby does not implement getJobStats().");
}

@Nullable
Contributor:

Same here for nullable.

Contributor Author:

Added javadoc

@@ -233,7 +290,10 @@ public Double deserialize(ByteArrayDataInput in)
typeHelper = STRING_HELPER;
}
writeString(dim, out);
typeHelper.serialize(out, row.getRaw(dim), reportParseExceptions);
String parseExceptionMessage = typeHelper.serialize(out, row.getRaw(dim), true);
Contributor:

Probably reportParseExceptions should be passed instead of true?

Contributor Author:

Removed reportParseExceptions here; the exception should always be thrown, and how it's handled depends on the new config

@@ -65,7 +67,8 @@
{
ValueType getType();

void serialize(ByteArrayDataOutput out, Object value, boolean reportParseExceptions);
@Nullable
String serialize(ByteArrayDataOutput out, Object value, boolean reportParseExceptions);
Contributor:

I'm not sure about returning an exception message. This method could probably always throw a ParseException on parsing errors, and reportParseExceptions could be handled by the caller.

Contributor Author:

Changed this to throw a ParseException instead of returning the message
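
A rough sketch of the agreed direction, using a long column as an example (simplified; class and method names are assumptions):

import io.druid.java.util.common.parsers.ParseException; // package path may differ by version

public class LongValueSerdeSketch
{
  // Coerce the raw value to a long; on failure, throw a ParseException so the
  // caller can count it against maxParseExceptions (and log it when
  // logParseExceptions is set) instead of passing a message string around.
  static long coerceToLong(Object value)
  {
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    try {
      return Long.parseLong(String.valueOf(value));
    }
    catch (NumberFormatException e) {
      throw new ParseException(e, "could not convert value [%s] to long", value);
    }
  }
}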

@jihoonson (Contributor) left a comment

@jon-wei thanks. Looks good to me.

@jon-wei jon-wei merged commit 969342c into apache:master Apr 6, 2018
surekhasaharan pushed a commit to surekhasaharan/druid that referenced this pull request Apr 6, 2018
* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments
gianm pushed a commit that referenced this pull request May 3, 2018
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to #5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.

* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory

Experimenting with various defaults, 1/3 jvm memory causes OOM

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
sathishsri88 pushed a commit to sathishs/druid that referenced this pull request May 8, 2018
surekhasaharan pushed a commit to implydata/druid-public that referenced this pull request Jul 3, 2018
More error reporting and stats for ingestion tasks (apache#5418)
These changes are required for the tasks api improvement commit
jon-wei added a commit to implydata/druid-public that referenced this pull request Aug 17, 2018
@dclim added this to the 0.13.0 milestone Oct 8, 2018
if (ingestionSchema.getTuningConfig().isReportParseExceptions()) {
throw e;
} else {
unparseable++;
Member:

This change makes the if (parseable > 0) below dead. See #7737.
