More error reporting and stats for ingestion tasks #5418

Merged
merged 12 commits into from
Apr 6, 2018
@@ -72,7 +72,7 @@ public List<InputRow> parseBatch(Map<String, Object> theMap)
}
}
catch (Exception e) {
- throw new ParseException(e, "Unparseable timestamp found!");
+ throw new ParseException(e, "Unparseable timestamp found! Event: " + theMap);
Contributor

nit: ParseException supports a formatted string.

Contributor Author

Used formatted string

}

return ImmutableList.of(new MapBasedInputRow(timestamp.getMillis(), dimensions, theMap));
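
The formatted-string suggestion above can be sketched as follows. This is a minimal illustration, not the PR's code: `FormattedParseException` is a hypothetical stand-in that assumes only the `(cause, format, arguments)` constructor shape the review comment implies, and `TimestampParsing.unparseable` is an invented helper name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ParseException; only the (cause, format, args) shape is assumed.
class FormattedParseException extends RuntimeException
{
  FormattedParseException(Throwable cause, String formatText, Object... arguments)
  {
    super(String.format(formatText, arguments), cause);
  }
}

class TimestampParsing
{
  // Wraps a failure and attaches the offending event through a format placeholder
  // instead of string concatenation.
  static FormattedParseException unparseable(Exception cause, Map<String, Object> event)
  {
    return new FormattedParseException(cause, "Unparseable timestamp found! Event: %s", event);
  }
}
```

Passing the event as an argument keeps the message template constant, which makes it easier to search for in logs.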
28 changes: 28 additions & 0 deletions api/src/main/java/io/druid/indexer/IngestionState.java
@@ -0,0 +1,28 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package io.druid.indexer;

public enum IngestionState
Contributor

@jihoonson jihoonson Mar 15, 2018

Is this the same for all types of tasks? If so, I think it's better to expand TaskState to include these new states, because every task is an ingestion task and we don't have to keep two states for them.

Contributor Author

I decided to keep them separate, since I mean for IngestionState to be an additional qualifier on the existing states (RUNNING,FAILED,SUCCESS). For example, a task could be RUNNING and in DETERMINE_PARTITIONS, or RUNNING and in BUILD_SEGMENTS, or similarly with FAILED.

Contributor

Sounds good.

{
NOT_STARTED,
DETERMINE_PARTITIONS,
BUILD_SEGMENTS,
COMPLETED
}
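
To illustrate the "additional qualifier" idea from the thread above, here is a small sketch. The `TaskState` values are taken from the author's comment (RUNNING, FAILED, SUCCESS); `QualifiedStatus` is a hypothetical holder invented for this example and is not a class from the PR.

```java
// Stand-ins mirroring the two enums discussed above.
enum TaskState { RUNNING, SUCCESS, FAILED }

enum IngestionState { NOT_STARTED, DETERMINE_PARTITIONS, BUILD_SEGMENTS, COMPLETED }

// Hypothetical holder pairing the coarse task state with the ingestion phase.
final class QualifiedStatus
{
  final TaskState taskState;
  final IngestionState ingestionState;

  QualifiedStatus(TaskState taskState, IngestionState ingestionState)
  {
    this.taskState = taskState;
    this.ingestionState = ingestionState;
  }

  // Renders e.g. "FAILED (DETERMINE_PARTITIONS)": the phase qualifies the state.
  String describe()
  {
    return taskState + " (" + ingestionState + ")";
  }
}
```

Keeping the enums separate means a failure report can say in which phase the task failed without multiplying the number of task states.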
29 changes: 29 additions & 0 deletions api/src/main/java/io/druid/indexer/TaskMetricsGetter.java
@@ -0,0 +1,29 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package io.druid.indexer;

import java.util.List;
import java.util.Map;

public interface TaskMetricsGetter
{
List<String> getKeys();
Map<String, Double> getMetrics();
}
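
A possible implementation of this interface might look like the sketch below. `CountingMetricsGetter`, its key names, and its `record*` methods are assumptions made for illustration; the interface is re-declared locally so the example compiles on its own.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Local copy of the interface so the sketch is self-contained.
interface TaskMetricsGetter
{
  List<String> getKeys();
  Map<String, Double> getMetrics();
}

// Hypothetical implementation backed by two thread-safe counters.
class CountingMetricsGetter implements TaskMetricsGetter
{
  private static final List<String> KEYS = Arrays.asList("processed", "unparseable");

  private final AtomicLong processed = new AtomicLong();
  private final AtomicLong unparseable = new AtomicLong();

  void recordProcessed()
  {
    processed.incrementAndGet();
  }

  void recordUnparseable()
  {
    unparseable.incrementAndGet();
  }

  @Override
  public List<String> getKeys()
  {
    return KEYS;
  }

  // Returns a point-in-time snapshot keyed by the names reported in getKeys().
  @Override
  public Map<String, Double> getMetrics()
  {
    Map<String, Double> snapshot = new HashMap<>();
    snapshot.put("processed", (double) processed.get());
    snapshot.put("unparseable", (double) unparseable.get());
    return snapshot;
  }
}
```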
47 changes: 47 additions & 0 deletions api/src/main/java/io/druid/indexer/TaskMetricsUtils.java
@@ -0,0 +1,47 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package io.druid.indexer;

import com.google.common.collect.Maps;

import java.util.Map;

public class TaskMetricsUtils
{
public static final String ROWS_PROCESSED = "rowsProcessed";
public static final String ROWS_PROCESSED_WITH_ERRORS = "rowsProcessedWithErrors";
public static final String ROWS_UNPARSEABLE = "rowsUnparseable";
public static final String ROWS_THROWN_AWAY = "rowsThrownAway";

public static Map<String, Object> makeIngestionRowMetrics(
long rowsProcessed,
long rowsProcessedWithErrors,
long rowsUnparseable,
long rowsThrownAway
)
{
Map<String, Object> metricsMap = Maps.newHashMap();
metricsMap.put(ROWS_PROCESSED, rowsProcessed);
metricsMap.put(ROWS_PROCESSED_WITH_ERRORS, rowsProcessedWithErrors);
metricsMap.put(ROWS_UNPARSEABLE, rowsUnparseable);
metricsMap.put(ROWS_THROWN_AWAY, rowsThrownAway);
return metricsMap;
}
}
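
Because the metrics map is typed `Map<String, Object>`, a consumer cannot assume every value is a `Long`: by default, Jackson deserializes small integers in an untyped map as `Integer`. The sketch below shows one defensive way to read a counter back. `RowMetricsSketch` and `getCount` are hypothetical names, not part of this PR; only the key strings come from the class above.

```java
import java.util.HashMap;
import java.util.Map;

class RowMetricsSketch
{
  static final String ROWS_PROCESSED = "rowsProcessed";
  static final String ROWS_UNPARSEABLE = "rowsUnparseable";

  // Hypothetical helper: reads a counter regardless of its boxed numeric type,
  // treating a missing key as zero.
  static long getCount(Map<String, Object> metrics, String key)
  {
    Object value = metrics.get(key);
    return value == null ? 0L : ((Number) value).longValue();
  }
}
```

A caller would use it as `RowMetricsSketch.getCount(metrics, RowMetricsSketch.ROWS_UNPARSEABLE)` on a map produced by `makeIngestionRowMetrics` or read back from JSON.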
67 changes: 64 additions & 3 deletions api/src/main/java/io/druid/indexer/TaskStatusPlus.java
@@ -25,6 +25,7 @@
import org.joda.time.DateTime;

import javax.annotation.Nullable;
import java.util.Map;
import java.util.Objects;

public class TaskStatusPlus
@@ -38,6 +39,15 @@ public class TaskStatusPlus
private final TaskLocation location;
private final String dataSource;

@Nullable
private final Map<String, Object> metrics;

@Nullable
private final String errorMsg;
Contributor

I really wanted this!


@Nullable
private final Map<String, Object> context;

@JsonCreator
public TaskStatusPlus(
@JsonProperty("id") String id,
@@ -47,7 +57,10 @@ public TaskStatusPlus(
@JsonProperty("statusCode") @Nullable TaskState state,
@JsonProperty("duration") @Nullable Long duration,
@JsonProperty("location") TaskLocation location,
- @JsonProperty("dataSource") String dataSource
+ @JsonProperty("dataSource") String dataSource,
+ @JsonProperty("metrics") Map<String, Object> metrics,
+ @JsonProperty("errorMsg") String errorMsg,
+ @JsonProperty("context") Map<String, Object> context
)
{
if (state != null && state.isComplete()) {
@@ -61,6 +74,9 @@ public TaskStatusPlus(
this.duration = duration;
this.location = Preconditions.checkNotNull(location, "location");
this.dataSource = dataSource;
this.metrics = metrics;
this.errorMsg = errorMsg;
this.context = context;
}

@JsonProperty
@@ -108,6 +124,27 @@ public TaskLocation getLocation()
return location;
}

@Nullable
@JsonProperty("metrics")
public Map<String, Object> getMetrics()
{
return metrics;
}

@Nullable
@JsonProperty("errorMsg")
public String getErrorMsg()
{
return errorMsg;
}

@Nullable
@JsonProperty("context")
public Map<String, Object> getContext()
{
return context;
}

@Override
public boolean equals(Object o)
{
@@ -138,13 +175,37 @@ public boolean equals(Object o)
if (!Objects.equals(duration, that.duration)) {
return false;
}
- return location.equals(that.location);

+ if (!Objects.equals(location, that.location)) {
Contributor

location can't be null.

+ return false;
+ }

+ if (!Objects.equals(errorMsg, that.errorMsg)) {
+ return false;
+ }

+ if (!Objects.equals(location, that.location)) {
Member

dupe

Contributor Author

Fixed

+ return false;
+ }

+ return Objects.equals(context, that.context);
}

@Override
public int hashCode()
{
- return Objects.hash(id, type, createdTime, queueInsertionTime, state, duration, location);
+ return Objects.hash(
+ id,
+ type,
+ createdTime,
+ queueInsertionTime,
+ state,
+ duration,
+ location,
+ metrics,
+ errorMsg,
+ context
+ );
}
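
When nullable fields are added to a value class, `equals` and `hashCode` need to cover the same set of fields, otherwise two objects that compare equal can hash differently. The sketch below shows the null-safe pattern with `java.util.Objects`. `StatusSketch` is a hypothetical three-field class written for this example, not `TaskStatusPlus` itself.

```java
import java.util.Map;
import java.util.Objects;

final class StatusSketch
{
  private final String id;
  private final Map<String, Object> metrics; // may be null
  private final String errorMsg;             // may be null

  StatusSketch(String id, Map<String, Object> metrics, String errorMsg)
  {
    this.id = Objects.requireNonNull(id, "id");
    this.metrics = metrics;
    this.errorMsg = errorMsg;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    StatusSketch that = (StatusSketch) o;
    // Objects.equals is null-safe, so nullable fields need no extra guards.
    return id.equals(that.id)
           && Objects.equals(metrics, that.metrics)
           && Objects.equals(errorMsg, that.errorMsg);
  }

  @Override
  public int hashCode()
  {
    // Hashes exactly the fields compared in equals.
    return Objects.hash(id, metrics, errorMsg);
  }
}
```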

@JsonProperty
78 changes: 78 additions & 0 deletions api/src/main/java/io/druid/utils/CircularBuffer.java
@@ -0,0 +1,78 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package io.druid.utils;

import com.google.common.base.Preconditions;

public class CircularBuffer<E>
Member

Any reason not to use https://google.github.io/guava/releases/23.0/api/docs/com/google/common/collect/EvictingQueue.html? However, it would require a different strategy for getMessagesFromSavedParseExceptions where getLatest is used.

Contributor Author

Decided to keep CircularBuffer for now, since it was already in the codebase and I do want getLatest.

Contributor

CircularBuffer is used in this PR as well as in ChangeRequestHistory, and ChangeRequestHistory requires a randomly accessible circular array. I think it's fine to keep this.

However, would you add some javadocs to this class? I also think we need some unit tests for this class, but it's not mandatory for this PR.

{
public E[] getBuffer()
{
return buffer;
}

private final E[] buffer;

private int start = 0;
private int size = 0;

public CircularBuffer(int capacity)
{
buffer = (E[]) new Object[capacity];
Member

Maybe fail fast here with a precondition check that capacity is greater than 0, instead of blowing up later with an out-of-bounds error.

Contributor Author

Added a preconditions check

}

public void add(E item)
{
buffer[start++] = item;

if (start >= buffer.length) {
start = 0;
}

if (size < buffer.length) {
size++;
}
}

public E getLatest(int index)
{
int bufferIndex = start - index - 1;
if (bufferIndex < 0) {
bufferIndex = buffer.length + bufferIndex;
}
return buffer[bufferIndex];
}

public E get(int index)
{
Preconditions.checkArgument(index >= 0 && index < size, "invalid index");

int bufferIndex = (start - size + index) % buffer.length;
if (bufferIndex < 0) {
bufferIndex += buffer.length;
}
return buffer[bufferIndex];
}

public int size()
{
return size;
}
}
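
The indexing rules and the capacity check discussed in the review can be sketched without Guava as follows. `BoundedRing` is a hypothetical rewrite for illustration, not the class from this PR: it uses `Math.floorMod` in place of the manual negative-index fix-ups, and, unlike the `getLatest` shown above, it also bounds-checks `getLatest`.

```java
class BoundedRing<E>
{
  private final Object[] buffer;
  private int start = 0; // next write position
  private int size = 0;  // number of retained elements

  BoundedRing(int capacity)
  {
    // Fail fast instead of hitting an out-of-bounds error on the first add().
    if (capacity <= 0) {
      throw new IllegalArgumentException("Capacity must be greater than 0.");
    }
    buffer = new Object[capacity];
  }

  void add(E item)
  {
    buffer[start] = item;
    start = (start + 1) % buffer.length;
    if (size < buffer.length) {
      size++;
    }
  }

  // Index 0 is the most recently added element.
  @SuppressWarnings("unchecked")
  E getLatest(int index)
  {
    checkIndex(index);
    return (E) buffer[Math.floorMod(start - index - 1, buffer.length)];
  }

  // Index 0 is the oldest element still retained.
  @SuppressWarnings("unchecked")
  E get(int index)
  {
    checkIndex(index);
    return (E) buffer[Math.floorMod(start - size + index, buffer.length)];
  }

  int size()
  {
    return size;
  }

  private void checkIndex(int index)
  {
    if (index < 0 || index >= size) {
      throw new IllegalArgumentException("invalid index");
    }
  }
}
```

With capacity 3 and the values 1 through 5 added in order, the ring retains 3, 4 and 5; `getLatest(0)` returns 5 and `get(0)` returns 3.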
5 changes: 4 additions & 1 deletion api/src/test/java/io/druid/indexer/TaskStatusPlusTest.java
@@ -53,7 +53,10 @@ public void testSerde() throws IOException
TaskState.RUNNING,
1000L,
TaskLocation.create("testHost", 1010, -1),
- "ds_test"
+ "ds_test",
+ null,
+ null,
+ null
);
final String json = mapper.writeValueAsString(status);
Assert.assertEquals(status, mapper.readValue(json, TaskStatusPlus.class));
@@ -255,7 +255,7 @@ public void ingest(Blackhole blackhole) throws Exception
{
incIndexFilteredAgg = makeIncIndex(filteredMetrics);
for (InputRow row : inputRows) {
- int rv = incIndexFilteredAgg.add(row);
+ int rv = incIndexFilteredAgg.add(row).getRowCount();
blackhole.consume(rv);
}
}
@@ -161,7 +161,7 @@ public void normalLongs(Blackhole blackhole) throws Exception
{
for (int i = 0; i < maxRows; i++) {
InputRow row = longRows.get(i);
- int rv = incIndex.add(row);
+ int rv = incIndex.add(row).getRowCount();
blackhole.consume(rv);
}
}
@@ -174,7 +174,7 @@ public void normalFloats(Blackhole blackhole) throws Exception
{
for (int i = 0; i < maxRows; i++) {
InputRow row = floatRows.get(i);
- int rv = incFloatIndex.add(row);
+ int rv = incFloatIndex.add(row).getRowCount();
blackhole.consume(rv);
}
}
@@ -187,7 +187,7 @@ public void normalStrings(Blackhole blackhole) throws Exception
{
for (int i = 0; i < maxRows; i++) {
InputRow row = stringRows.get(i);
- int rv = incStrIndex.add(row);
+ int rv = incStrIndex.add(row).getRowCount();
blackhole.consume(rv);
}
}
@@ -120,7 +120,7 @@ public void addRows(Blackhole blackhole) throws Exception
{
for (int i = 0; i < rowsPerSegment; i++) {
InputRow row = rows.get(i);
- int rv = incIndex.add(row);
+ int rv = incIndex.add(row).getRowCount();
blackhole.consume(rv);
}
}
15 changes: 15 additions & 0 deletions common/src/main/java/io/druid/indexer/Jobby.java
@@ -19,9 +19,24 @@

package io.druid.indexer;

import javax.annotation.Nullable;
import java.util.Map;

/**
*/
public interface Jobby
{
boolean run();

@Nullable
Contributor

Would you please add a javadoc describing when the return value can be null?

Contributor Author

Added javadoc

default Map<String, Object> getStats()
{
throw new UnsupportedOperationException("This Jobby does not implement getStats().");
Contributor

Please add the class name to the exception message.

Contributor Author

Added class name

}

@Nullable
Contributor

Same here for nullable.

Contributor Author

Added javadoc

default String getErrorMessage()
{
throw new UnsupportedOperationException("This Jobby does not implement getErrorMessage().");
Contributor

Same here.

Contributor Author

Added class name

}
}
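
The review request to include the class name in the exception message could be met along these lines. This is only a sketch of the pattern: `JobSketch`, `NoStatsJob` and `StatsJob` are hypothetical names, and the exact wording the PR ended up with is not shown in this diff.

```java
import java.util.Collections;
import java.util.Map;

interface JobSketch
{
  boolean run();

  // Default for jobs that do not report stats; the message names the concrete class.
  default Map<String, Object> getStats()
  {
    throw new UnsupportedOperationException(
        String.format("%s does not implement getStats().", getClass().getName())
    );
  }
}

// Relies on the default and therefore throws when asked for stats.
class NoStatsJob implements JobSketch
{
  @Override
  public boolean run()
  {
    return true;
  }
}

// Overrides the default to report a single counter.
class StatsJob implements JobSketch
{
  @Override
  public boolean run()
  {
    return true;
  }

  @Override
  public Map<String, Object> getStats()
  {
    return Collections.<String, Object>singletonMap("rowsProcessed", 42L);
  }
}
```

Inside a default method, `getClass()` resolves to the runtime class of the implementing object, so the message points at the job that is missing the override rather than at the interface.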