Add support for task reports, upload reports to deep storage #5524

jon-wei · 2018-03-23T01:14:41Z

This PR allows indexing tasks to return a map of TaskReport objects along with the TaskStatus. A TaskReport will contain structured information about a task, such as row stats, parsing errors, or a list of published segments.
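To make the "map of TaskReport objects" concrete, here is a minimal sketch. It assumes each report carries a task ID, a report key, and an opaque structured payload. The interface shape and the `buildTaskReports` helper are only loosely modeled on this PR and may not match the actual Druid signatures. `SimpleTaskReport` is purely hypothetical, and the Jackson serialization used by the real classes is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified report shape: which task, which kind of report, and a structured payload. */
interface TaskReport
{
  String getTaskId();

  String getReportKey();

  Object getPayload();

  /** Index reports by key; in this sketch a later report with the same key replaces an earlier one. */
  static Map<String, TaskReport> buildTaskReports(TaskReport... reports)
  {
    Map<String, TaskReport> byKey = new LinkedHashMap<>();
    for (TaskReport report : reports) {
      byKey.put(report.getReportKey(), report);
    }
    return byKey;
  }
}

/** Hypothetical generic holder, usable for row stats, parse errors, or published segments. */
final class SimpleTaskReport implements TaskReport
{
  private final String taskId;
  private final String reportKey;
  private final Object payload;

  SimpleTaskReport(String taskId, String reportKey, Object payload)
  {
    this.taskId = taskId;
    this.reportKey = reportKey;
    this.payload = payload;
  }

  @Override
  public String getTaskId()
  {
    return taskId;
  }

  @Override
  public String getReportKey()
  {
    return reportKey;
  }

  @Override
  public Object getPayload()
  {
    return payload;
  }
}
```

For example, a task could combine a row-stats report and a published-segments report into a single map keyed by `rowStats` and `publishedSegments`.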

The TaskReports will be uploaded to deep storage along with the Task's stdout log, accessible via an overlord endpoint keyed by task ID.
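On the storage side, think of a task log store that keeps a second object per task. The following is a hypothetical in-memory stand-in for a deep-storage task log implementation. The class name `InMemoryTaskLogs`, the `.log` / `.reports.json` key suffixes, and the method names are illustrative assumptions, not the exact Druid `TaskLogs` API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for deep storage: logs and reports side by side, keyed by task ID. */
class InMemoryTaskLogs
{
  private final Map<String, byte[]> objects = new HashMap<>();

  // Assumed key layout: one object for the stdout log, one for the reports.
  private static String logKey(String taskId)
  {
    return taskId + ".log";
  }

  private static String reportsKey(String taskId)
  {
    return taskId + ".reports.json";
  }

  void pushTaskLog(String taskId, byte[] log)
  {
    objects.put(logKey(taskId), log.clone());
  }

  void pushTaskReports(String taskId, byte[] reportsJson)
  {
    objects.put(reportsKey(taskId), reportsJson.clone());
  }

  /** What an overlord endpoint keyed by task ID would serve; empty if no reports were pushed. */
  Optional<String> streamTaskReports(String taskId)
  {
    byte[] bytes = objects.get(reportsKey(taskId));
    return bytes == null ? Optional.empty() : Optional.of(new String(bytes, StandardCharsets.UTF_8));
  }
}
```

A task that produced no reports has no reports object, so an endpoint built on `streamTaskReports` can answer "not found" instead of failing.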

This is intended to support PRs #5418 and #5492, which will generate large blocks of structured information upon task completion.

These large TaskReports are kept only in deep storage, to avoid storing large objects in the metadata storage or in ZooKeeper nodes (via RemoteTaskRunner).

ExecutorLifecycle writes the TaskReports to disk and strips the TaskReport objects out before returning a TaskStatus. ForkingTaskRunner then handles the deep storage upload, the same way it handles the log upload.
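Here is a minimal sketch of that flow. This was the PR's initial design; review later replaced it with a TaskReportFileWriter. `TaskStatus` and `TaskStatusWithReports` are simplified stand-ins, `handleCompletion` is a hypothetical name, and the toy `key=value` file format stands in for the Jackson JSON serialization the real code uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;

/** Simplified stand-in for TaskStatus. */
class TaskStatus
{
  final String id;
  final String code;

  TaskStatus(String id, String code)
  {
    this.id = id;
    this.code = code;
  }
}

/** Simplified stand-in for TaskStatusWithReports. */
class TaskStatusWithReports extends TaskStatus
{
  final Map<String, String> reports;

  TaskStatusWithReports(String id, String code, Map<String, String> reports)
  {
    super(id, code);
    this.reports = reports;
  }
}

/** Intercept the status, persist the reports locally, and pass on a plain status. */
final class ExecutorLifecycleSketch
{
  private ExecutorLifecycleSketch() {}

  static TaskStatus handleCompletion(TaskStatus status, Path reportsFile)
  {
    if (!(status instanceof TaskStatusWithReports)) {
      return status; // tasks without reports pass through unchanged
    }
    TaskStatusWithReports withReports = (TaskStatusWithReports) status;
    // Toy "key=value" format; the real code serialized JSON.
    String body = withReports.reports.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining("\n"));
    try {
      Path parent = reportsFile.getParent();
      if (parent != null) {
        Files.createDirectories(parent);
      }
      Files.write(reportsFile, body.getBytes(StandardCharsets.UTF_8));
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Strip the reports: everything upstream (ForkingTaskRunner, the overlord) sees a plain TaskStatus.
    return new TaskStatus(withReports.id, withReports.code);
  }
}
```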

jihoonson · 2018-03-27T23:31:56Z

@jon-wei thanks for raising this PR. I'm reviewing it.

BTW, #5492 has been changed to no longer use taskReport. It originally used the taskReport to return the segments generated by worker tasks to the supervisor task. This information is temporary and doesn't need to be stored permanently in deep storage.

jihoonson

@jon-wei left a few comments.

One more thing I want to comment on: the way taskReports are processed looks complicated. A task returns TaskStatusWithReports upon completion, but the two parts inside it are handled differently. The TaskStatus is returned to the overlord as before, while the taskReport is uploaded to deep storage and stripped from TaskStatusWithReports. As a result, from the perspective of overlords, taskReport is always empty. This is intended to avoid putting heavy pressure on ZK, but since the two parts are handled differently, I think it's better not to keep them in a single class.

One possible alternative is to provide an interface to tasks, such as adding a taskReporter to TaskToolbox. Each task could use the taskReporter if it has something to report. What do you think?
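Here is a rough sketch of this alternative, assuming a `TaskReporter` interface exposed through the toolbox. `TaskReporter`, `ToolboxSketch`, `RowCountingTask`, and the `rowStats` key are all hypothetical, and the task's status is simplified to a string. Reports leave the task through a side channel, so the returned status never carries them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reporter a task could obtain from its toolbox. */
interface TaskReporter
{
  void report(String reportKey, Map<String, Object> payload);
}

/** Hypothetical, stripped-down toolbox that exposes the reporter. */
class ToolboxSketch
{
  private final TaskReporter reporter;

  ToolboxSketch(TaskReporter reporter)
  {
    this.reporter = reporter;
  }

  TaskReporter getTaskReporter()
  {
    return reporter;
  }
}

/** A task that reports row stats; the status it returns stays a plain value. */
class RowCountingTask
{
  String run(ToolboxSketch toolbox, int processed, int unparseable)
  {
    Map<String, Object> stats = new LinkedHashMap<>();
    stats.put("processed", processed);
    stats.put("unparseable", unparseable);
    toolbox.getTaskReporter().report("rowStats", stats);
    return "SUCCESS";
  }
}
```

Tasks with nothing to report never touch the reporter, so existing task implementations would not need to change.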

jihoonson · 2018-03-27T23:35:04Z

extensions-core/hdfs-storage/src/main/java/io/druid/storage/hdfs/tasklog/HdfsTaskLogs.java

+    final Path path = getTaskReportsFileFromId(taskId);
+    log.info("Writing task reports to: %s", path);
+    pushTaskFile(path, reportFile);
+    log.info("Wrote task reports to: %s", path);


Just out of curiosity: it looks like only this TaskLogs implementation writes log lines both before and after pushing task reports. Is this intended?

pushTaskLog also logs before and after the push, so I did the same.

jihoonson · 2018-03-27T23:46:42Z

indexing-service/src/main/java/io/druid/indexing/overlord/http/OverlordResource.java

+      @PathParam("taskid") final String taskid
+  )
+  {
+    try {


Doesn't this API need to check authentication and authorization?

It's handled by @ResourceFilters(TaskResourceFilter.class) on the method

jihoonson · 2018-03-27T23:50:28Z

indexing-service/src/main/java/io/druid/indexing/worker/executor/ExecutorLifecycle.java

+                TaskStatusWithReports taskStatusWithReports = (TaskStatusWithReports) taskStatus;
+                final File reportsFileParent = reportsFile.getParentFile();
+                if (reportsFileParent != null) {
+                  reportsFileParent.mkdirs();


FileUtils.forceMkdir() would be better because it performs some error checks.

Changed to forceMkdir()
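For reference, Commons IO's `FileUtils.forceMkdir(File)` throws an `IOException` when the directory cannot be created or when a non-directory file already exists at that path. `File.mkdirs()` only returns `false` in those cases, and the snippet above ignores that return value. The stdlib-only sketch below approximates those checks. It is not the Commons IO source, and `Dirs.forceMkdirSketch` is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;

final class Dirs
{
  private Dirs() {}

  /** Fail loudly instead of silently ignoring a false return value from mkdirs(). */
  static void forceMkdirSketch(File dir) throws IOException
  {
    if (dir.exists()) {
      if (!dir.isDirectory()) {
        throw new IOException("File exists and is not a directory: " + dir);
      }
      return; // already present, nothing to do
    }
    // mkdirs() can also return false if another thread created the directory
    // concurrently, so re-check before treating it as a failure.
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Unable to create directory: " + dir);
    }
  }
}
```

In ExecutorLifecycle the call would apply to `reportsFileParent`, so a failure to create the directory surfaces immediately instead of as a confusing write error later.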

jihoonson · 2018-03-27T23:50:42Z

indexing-service/src/main/java/io/druid/indexing/worker/executor/ExecutorLifecycle.java

              final File statusFileParent = statusFile.getParentFile();
              if (statusFileParent != null) {
                statusFileParent.mkdirs();
              }
              jsonMapper.writeValue(statusFile, taskStatus);

+


Unnecessary change.

Removed empty line

jihoonson · 2018-03-28T00:24:37Z

indexing-service/src/main/java/io/druid/indexing/common/task/Task.java

@@ -178,7 +178,7 @@ default int getPriority()
   *
   * @throws Exception if this task failed
   */
-  TaskStatus run(TaskToolbox toolbox) throws Exception;
+  TaskStatusWithReports run(TaskToolbox toolbox) throws Exception;


This might break compatibility with existing third-party task implementations.

Reverted this to just TaskStatus
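Keeping `TaskStatus` as the declared return type doesn't stop a specific task from returning a subclass. As a general Java point (not something this PR relies on), an implementation may declare a narrower, covariant return type in its override. The types below are simplified hypothetical stand-ins.

```java
/** Simplified stand-ins for the real classes. */
class TaskStatus
{
  final String id;

  TaskStatus(String id)
  {
    this.id = id;
  }
}

class TaskStatusWithReports extends TaskStatus
{
  final String reportsJson;

  TaskStatusWithReports(String id, String reportsJson)
  {
    super(id);
    this.reportsJson = reportsJson;
  }
}

interface Task
{
  // Unchanged signature: third-party implementations returning TaskStatus still compile.
  TaskStatus run() throws Exception;
}

class LegacyTask implements Task
{
  @Override
  public TaskStatus run()
  {
    return new TaskStatus("legacy");
  }
}

class ReportingTask implements Task
{
  // Covariant override: allowed because TaskStatusWithReports is a subtype of TaskStatus.
  @Override
  public TaskStatusWithReports run()
  {
    return new TaskStatusWithReports("reporting", "{}");
  }
}
```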

jihoonson · 2018-03-28T00:48:51Z

indexing-service/src/main/java/io/druid/indexing/common/TaskStatusWithReports.java

+public class TaskStatusWithReports extends TaskStatus
+{
+  @JsonProperty
+  private TaskStatus taskStatus;


TaskStatus is duplicated because TaskStatusWithReports already extends TaskStatus.

Dropped the duplicated taskStatus field

jon-wei · 2018-03-29T22:02:37Z

One possible alternative is to provide an interface to tasks, such as adding a taskReporter to TaskToolbox. Each task could use the taskReporter if it has something to report. What do you think?

Hm, I decided to keep the current implementation, since I think of the task log and the task reports as very similar things (unstructured vs. structured task logs) and felt it would be nicer to handle the file writes/uploads together. I felt that doing the uploads in a common place was simpler than adding a file upload step to each individual task implementation.

As a result, from the perspective of overlords, taskReport is always empty.

I felt this was fine: anything above ExecutorLifecycle in the runner/task hierarchy only sees a plain TaskStatus object, so there wouldn't be an empty "taskReports" field.

jihoonson

Hm, I decided to keep the current implementation, since I think of the task log and the task reports as very similar things (unstructured vs. structured task logs) and felt it would be nicer to handle the file writes/uploads together. I felt that doing the uploads in a common place was simpler than adding a file upload step to each individual task implementation.

Oh, yeah, I definitely agree with you. My previous comment was about how to get taskReports from inside a task to the outside.

In the current implementation, taskReports are contained in TaskStatusWithReports, which extends TaskStatus. Since a task returns TaskStatusWithReports once it finishes its work, the report is returned together with the task's completion status. ExecutorLifecycle then hijacks the TaskStatusWithReports returned by the task, writes only the report part to a file, and replaces it with a TaskStatus without reports. Once ForkingTaskRunner recognizes that the task has completed, it pushes the taskReports file to deep storage along with the task logs.

This looks quite complicated to me because:

- Since Task and TaskRunner are supposed to return TaskStatus, it isn't intuitive that some TaskRunners and Tasks return TaskStatusWithReports instead of TaskStatus while others don't.
- Even though a task returns TaskStatusWithReports, RemoteTaskRunner receives only a TaskStatus without reports. This may confuse other developers who aren't familiar with how task status is propagated from tasks to the overlord, and could lead to bugs if they later modify the surrounding code.

There is probably a simpler alternative: pass a sort of taskReporter to tasks (via TaskToolbox) that writes taskReports to a file at a predefined path. Then ForkingTaskRunner can read and push the taskReport file without the hijacking in ExecutorLifecycle. What do you think?
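Here is a rough sketch of the two halves of this proposal, assuming the reports path is fixed before the task starts. `FileTaskReporter`, `ForkingRunnerSketch`, and the `pushToDeepStorage` hook are hypothetical names. The PR ultimately implemented the task-side half as `TaskReportFileWriter`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

/** Task-side half: a reporter bound to a predefined path, handed to the task via its toolbox. */
final class FileTaskReporter
{
  private final Path reportsFile;

  FileTaskReporter(Path reportsFile)
  {
    this.reportsFile = reportsFile;
  }

  void write(String reportsJson)
  {
    try {
      Path parent = reportsFile.getParent();
      if (parent != null) {
        Files.createDirectories(parent);
      }
      Files.write(reportsFile, reportsJson.getBytes(StandardCharsets.UTF_8));
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

/** Runner-side half: after the task process exits, push whatever the task wrote. */
final class ForkingRunnerSketch
{
  private ForkingRunnerSketch() {}

  /** Returns true if a reports file existed and was handed to the (hypothetical) upload hook. */
  static boolean pushReportsIfPresent(String taskId, Path reportsFile, BiConsumer<String, Path> pushToDeepStorage)
  {
    if (!Files.isRegularFile(reportsFile)) {
      return false; // the task had nothing to report
    }
    pushToDeepStorage.accept(taskId, reportsFile);
    return true;
  }
}
```

Because the runner only checks for the file, tasks that never report need no changes, and ExecutorLifecycle no longer has to inspect the returned status.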

jihoonson · 2018-03-29T21:54:54Z

indexing-service/src/main/java/io/druid/indexing/common/TaskStatusWithReports.java

  public TaskStatus getTaskStatus()
  {
-    return taskStatus;
+    return new TaskStatus(


This might be possible by simply casting itself. Is this method supposed to always return a new instance? If so, the method name should be something like newTaskStatus() rather than getTaskStatus().

It's meant to return a new base TaskStatus object without the reports, I renamed the method to "makeTaskStatusWithoutReports"
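Here is a sketch of the resulting class shape, with simplified, hypothetical fields. The subclass stores only the reports, since the id and status already live in `TaskStatus`. An upcast would not strip anything: `(TaskStatus) this` is the same object. Code that inspects or serializes by runtime class (as Jackson's `ObjectMapper.writeValue` does by default) would still see the reports, so the method builds a new base instance.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for TaskStatus. */
class TaskStatus
{
  private final String id;
  private final String statusCode;

  TaskStatus(String id, String statusCode)
  {
    this.id = id;
    this.statusCode = statusCode;
  }

  String getId()
  {
    return id;
  }

  String getStatusCode()
  {
    return statusCode;
  }
}

/** Only the reports are stored here; nothing from the superclass is duplicated. */
class TaskStatusWithReports extends TaskStatus
{
  private final Map<String, Object> taskReports;

  TaskStatusWithReports(String id, String statusCode, Map<String, Object> taskReports)
  {
    super(id, statusCode);
    this.taskReports = Collections.unmodifiableMap(new HashMap<>(taskReports));
  }

  Map<String, Object> getTaskReports()
  {
    return taskReports;
  }

  /** Always a new base-class instance, so the runtime type no longer carries the reports. */
  TaskStatus makeTaskStatusWithoutReports()
  {
    return new TaskStatus(getId(), getStatusCode());
  }
}
```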

jon-wei · 2018-03-30T22:56:11Z

@jihoonson good points, I've changed this to use a TaskReportFileWriter that's injected into the toolbox

jihoonson

@jon-wei thanks. +1 after Travis.

jihoonson · 2018-03-31T00:12:04Z

indexing-service/src/main/java/io/druid/indexing/common/TaskReport.java

+ * TaskReport objects contain additional information about an indexing task, such as row statistics, errors, and
+ * published segments. They are kept in deep storage along with task logs.
+ */
+/**


Please remove this.

Removed the extra /** */

clintropolis

Overall LGTM after the refactor 🤘

clintropolis · 2018-04-02T03:33:28Z

indexing-service/src/main/java/io/druid/indexing/overlord/ThreadPoolTaskRunner.java

@@ -262,7 +262,7 @@ public void stop()
      }
    }
    final ListenableFuture<TaskStatus> statusFuture = exec.get(taskPriority)
-                                                          .submit(new ThreadPoolTaskRunnerCallable(
+                                                                     .submit(new ThreadPoolTaskRunnerCallable(


Formatting looks off here

fixed formatting

clintropolis · 2018-04-02T03:38:34Z

services/src/main/java/io/druid/cli/CliPeon.java

@@ -187,6 +188,12 @@ public void configure(Binder binder)
                    .setStatusFile(new File(taskAndStatusFile.get(1)))
            );

+            binder.bind(TaskReportFileWriter.class).toInstance(
+                new TaskReportFileWriter(
+                    new File(taskAndStatusFile.get(2))


Should you update the taskAndStatus arguments annotation to include this third file? Also, maybe we should consider pulling these out into standalone variables at startup, like taskFileName, statusFileName, and reportFileName, so it's more obvious what they are?

Updated the annotation and made standalone variables for the file paths
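Here is a stdlib-only sketch of the "standalone variables" idea. The real CliPeon receives these paths through an airline `@Arguments` list and binds the writer with Guice, and neither is reproduced here. `PeonFiles` is hypothetical. The task file being at index 0 is inferred from the argument name, and the size check is an assumption about useful validation rather than what the PR does.

```java
import java.io.File;
import java.util.List;

/** Hypothetical holder that names the peon's positional file arguments. */
final class PeonFiles
{
  final File taskFile;
  final File statusFile;
  final File reportFile;

  private PeonFiles(File taskFile, File statusFile, File reportFile)
  {
    this.taskFile = taskFile;
    this.statusFile = statusFile;
    this.reportFile = reportFile;
  }

  /** Expects exactly: <task file> <status file> <report file>. */
  static PeonFiles fromArgs(List<String> taskAndStatusFile)
  {
    if (taskAndStatusFile.size() != 3) {
      throw new IllegalArgumentException(
          "Expected <task file> <status file> <report file>, got " + taskAndStatusFile);
    }
    return new PeonFiles(
        new File(taskAndStatusFile.get(0)),
        new File(taskAndStatusFile.get(1)),
        new File(taskAndStatusFile.get(2))
    );
  }
}
```

With this, `new TaskReportFileWriter(files.reportFile)` would replace `new TaskReportFileWriter(new File(taskAndStatusFile.get(2)))`.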

…5524)

* Add support for task reports, upload reports to deep storage
* PR comments
* Better name for method
* Fix report file upload
* Use TaskReportFileWriter
* Checkstyle
* More PR comments

Add support for task reports, upload reports to deep storage

372237b

jon-wei added Feature Area - Batch Ingestion labels Mar 23, 2018

jihoonson reviewed Mar 28, 2018


jon-wei added 2 commits March 28, 2018 14:50

PR comments

61ea998

Merge remote-tracking branch 'upstream/master' into task_report

c993a29

jihoonson reviewed Mar 29, 2018


jon-wei added 3 commits March 29, 2018 16:25

Better name for method

1f88890

Fix report file upload

7bf2329

Use TaskReportFileWriter

20947d4

jon-wei force-pushed the task_report branch from 26a83b8 to 20947d4 March 30, 2018 22:54

Checkstyle

5733816

jihoonson reviewed Mar 31, 2018


clintropolis approved these changes Apr 2, 2018


More PR comments

cb0bcac

jihoonson approved these changes Apr 2, 2018


jon-wei merged commit 723f7ac into apache:master Apr 2, 2018

dclim added this to the 0.13.0 milestone Oct 8, 2018
