ForkingTaskRunner: Set ActiveProcessorCount for tasks. #12592

Merged: 6 commits merged into apache:master on Jun 15, 2022

Conversation

@gianm (Contributor) commented Jun 1, 2022:

This prevents various automatically-sized thread pools from being unreasonably
large (we don't want each task to size its pools as if it is the only thing on
the entire machine).

On large machines, this solves a common cause of OutOfMemoryError due to
"unable to create native thread".

gianm closed this on Jun 7, 2022
gianm reopened this on Jun 7, 2022
@kfaraz (Contributor) reviewed:
Added minor comments.

@@ -635,7 +646,7 @@ public Optional<ScalingStats> getScalingStats()
@Override
public void start()

@kfaraz (Contributor):
Probably needs to be annotated with @LifecycleStart. I wonder if just annotating the interface method TaskRunner.start() would work so that the implementations don't need to worry about this.

@gianm (author):
Yeah, I think you're right. I think it makes sense to add the annotations in the impls, because not all impls necessarily need lifecycle management. I added it here.
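For reference, a minimal sketch of what the annotated start() might look like; the sizing logic and field here are assumptions for illustration rather than the exact patch, and the annotation is the one from Druid's org.apache.druid.java.util.common.lifecycle package:

```java
import org.apache.druid.java.util.common.lifecycle.LifecycleStart;

public class ForkingTaskRunnerSketch
{
  // Sentinel value: anything below 1 means start() has not run yet.
  private volatile int numProcessorsPerTask = -1;

  @LifecycleStart
  public void start()
  {
    // Compute the per-task processor share once, when the lifecycle starts the runner.
    setNumProcessorsPerTask();
  }

  void setNumProcessorsPerTask()
  {
    // Hypothetical sizing: half the machine per task, never below 1.
    numProcessorsPerTask = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
  }
}
```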

@@ -214,6 +218,13 @@ public TaskStatus call()
command.add("-cp");
command.add(taskClasspath);

if (numProcessorsPerTask < 1) {

@kfaraz (Contributor):
Nit: Is this really needed? Won't TaskRunner.run() always be called after the lifecycle start method?

@gianm (author):
It wouldn't be needed if start and run are called in the proper order. It's here as a safety check in case they aren't. (Maybe it would help debug tests that are doing things incorrectly.)
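A minimal sketch of the guard being discussed, assuming a value below 1 marks an uninitialized count; the method name and exception message are illustrative, not the patch itself:

```java
import java.util.List;

class ProcessorCountGuard
{
  // Fail fast if the per-task processor count was never initialized, e.g. because
  // start() and run() were invoked out of order in a test.
  static void addActiveProcessorCountArg(List<String> command, int numProcessorsPerTask)
  {
    if (numProcessorsPerTask < 1) {
      throw new IllegalStateException("numProcessorsPerTask is not set; was start() called?");
    }
    command.add("-XX:ActiveProcessorCount=" + numProcessorsPerTask);
  }
}
```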

@@ -373,6 +375,7 @@ int waitForTaskProcessToComplete(Task task, ProcessHolder processHolder, File lo
}
};

forkingTaskRunner.setNumProcessorsPerTask();

@kfaraz (Contributor):
Could we call forkingTaskRunner.start() here instead to avoid exposing the setNumProcessorsPerTask() as a @VisibleForTesting method?

@gianm (author):
I thought about that, but then thought it was best to be as minimal as possible for futureproofing. If we later add more functionality to start(), we wouldn't necessarily want that to execute in these tests.
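An illustrative JUnit 4 sketch of that minimal-setup approach: the test calls only the @VisibleForTesting setter it depends on rather than the full lifecycle. The stand-in class and assertion are assumptions, not Druid's actual test code:

```java
import static org.junit.Assert.assertTrue;

import com.google.common.annotations.VisibleForTesting;
import org.junit.Test;

public class MinimalSetupTestSketch
{
  // Stand-in for the runner under test, not Druid's ForkingTaskRunner.
  static class RunnerStandIn
  {
    private int numProcessorsPerTask = -1;

    @VisibleForTesting
    void setNumProcessorsPerTask()
    {
      numProcessorsPerTask = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
    }

    String activeProcessorCountArg()
    {
      return "-XX:ActiveProcessorCount=" + numProcessorsPerTask;
    }
  }

  @Test
  public void forkedCommandIncludesActiveProcessorCount()
  {
    final RunnerStandIn runner = new RunnerStandIn();
    runner.setNumProcessorsPerTask();  // minimal setup instead of a full start()
    assertTrue(runner.activeProcessorCountArg().matches("-XX:ActiveProcessorCount=\\d+"));
  }
}
```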

@abhishekagarwal87 (Contributor) commented:
@gianm - what will be the effect of this change on existing ingestion jobs? The thread pool sizes on the same hardware are going to change after this patch. Users could see performance changes on real-time nodes if they have not explicitly configured the size of the processing thread pool.

@gianm (author) commented Jun 10, 2022:
> @gianm - what will be the effect of this change on existing ingestion jobs? The thread pool sizes on the same hardware are going to change after this patch. Users could see performance changes on real-time nodes if they have not explicitly configured the size of the processing thread pool.

Yeah, essentially any autoconfigured thread pool will act as if it has a slice of the machine instead of the entire machine. There are a lot of these. I think for most of them, users won't notice, and this change will be an improvement due to fewer overall threads on the machine. Users may notice the processing pool changing size if they weren't explicitly setting it. That's worth calling out in the release notes, so it's good that you added a release notes label. Thanks.
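A quick way to see the mechanism described here (an illustrative standalone program, not part of the patch): run it once as `java ProcessorCountDemo` and once as `java -XX:ActiveProcessorCount=2 ProcessorCountDemo`, and both the reported processor count and any pool auto-sized from it shrink to the overridden share.

```java
import java.util.concurrent.ForkJoinPool;

public class ProcessorCountDemo
{
  public static void main(String[] args)
  {
    // With -XX:ActiveProcessorCount=N, this returns N instead of the hardware count.
    System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
    // The common pool (like many auto-sized pools) derives its parallelism from that value.
    System.out.println("commonPool parallelism = " + ForkJoinPool.commonPool().getParallelism());
  }
}
```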

@kfaraz (Contributor) reviewed:
LGTM!

@gianm (author) commented Jun 11, 2022:
Many integration tests are failing. There might be something wrong with the way the command line is being constructed. I'll look into it when I get a chance.

@gianm (author) commented Jun 15, 2022:
The issue was that ForkingTaskRunner needs to have its lifecycle managed. The latest patch fixes it.

gianm merged commit 70f3b13 into apache:master on Jun 15, 2022
gianm deleted the ftr-avail-proc branch on June 15, 2022 at 22:56
abhishekagarwal87 added this to the 24.0.0 milestone on Aug 26, 2022
@forzamehlano commented:
> @gianm - what will be the effect of this change on existing ingestion jobs? The thread pool sizes on the same hardware are going to change after this patch. Users could see performance changes on real-time nodes if they have not explicitly configured the size of the processing thread pool.
>
> Yeah, essentially any autoconfigured thread pool will act as if it has a slice of the machine instead of the entire machine. There are a lot of these. I think for most of them, users won't notice, and this change will be an improvement due to fewer overall threads on the machine. Users may notice the processing pool changing size if they weren't explicitly setting it. That's worth calling out in the release notes, so it's good that you added a release notes label. Thanks.

Could you elaborate on the specific processing thread pools affected by this setting around ingest? I'm trying to work out how to revert this setting (or at least revert the behaviour) to see if this is the underlying cause of our ingest speed issues post-upgrade to 24.0.0.

@abhishekagarwal87 (Contributor) commented:
druid.processing.numThreads is the setting that you may want to look at.
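One way to set this explicitly rather than relying on auto-sizing (illustrative values; assumes the documented druid.indexer.fork.property.* passthrough that forwards properties to forked task JVMs) is in the MiddleManager's runtime.properties:

```properties
# Illustrative middleManager runtime.properties: pin the per-task processing pool size
# explicitly instead of letting it auto-size from the (now per-task) processor count.
druid.indexer.fork.property.druid.processing.numThreads=2
```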

@abhishekagarwal87 (Contributor) commented:
What kind of speed issues are you observing?
