Skip to content

Conversation

@abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Oct 31, 2025

The GroupByStatsMonitor helps highlight potential performance bottlenecks on the query and data nodes. For peons, it’s useful to include the dataSource and taskId dimensions to get an aggregated view of merge buffer acquisition times, the number of merge buffer queries per datasource, etc. This is useful for diagnosing identifying bottlenecks on the peons when there's a flurry of realtime group By queries to different tables, so the default tunings are not sufficient.

The existing utility MontiorsConfig.mapOfDatasourceAndTaskID() adds id as the dimension for task id. This might be okay for jvm and other sys metrics on the peons, but it's certainly confusing for the group by stats. So I've added a new method mapOfDatasourceAndTaskIdDimension() to add the taskId dimension and retained the old one.

Also, added missing @Nullable annotations to DataSourceTaskIdHolder and its callers as those are optional.

Release note

Monitors on peons that previously emitted the id dimension from JettyMonitor, OshiSysMonitor, JvmMonitor, JvmCpuMonitor, JvmThreadsMonitor and SysMonitor to represent the task ID are deprecated and will be removed in a future release. Use the taskId dimension instead.

Also, the GroupByStatsMonitor includes dataSource and taskId dimensions for metrics emitted on the peons.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Comment on lines +109 to +122
final Injector injector = Initialization.makeInjectorWithModules(
GuiceInjectors.makeStartupInjector(),
List.of(binder -> {
JsonConfigProvider.bindInstance(
binder,
Key.get(DruidNode.class, Self.class),
new DruidNode("test-inject", null, false, null, null, true, false)
);
binder.bind(Key.get(String.class, Names.named(DataSourceTaskIdHolder.DATA_SOURCE_BINDING)))
.toInstance(dataSource);
binder.bind(Key.get(String.class, Names.named(DataSourceTaskIdHolder.TASK_ID_BINDING)))
.toInstance(taskId);
})
);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
Initialization.makeInjectorWithModules
should be avoided because it has been deprecated.

Copilot Autofix

AI 3 months ago

The goal is to avoid using the deprecated Initialization.makeInjectorWithModules method. This method is a Druid utility that previously created a Guice Injector using a base injector and additional modules. In recent Druid releases, its use is discouraged, with the typical replacement being the use of Guice's own APIs (namely, Guice.createInjector or Guice.createChildInjector).

The best fix is to replace the deprecated call with a direct use of Guice's createInjector method, which receives a list of modules (in this case, the module lambda currently provided in the list argument). If the intent here is to use more than one module, just pass them as arguments; if only one, just use that module directly.

Changes required:

  • Edit line(s) 109-122 to directly call Guice.createInjector, in place of Initialization.makeInjectorWithModules.
  • Add an import for com.google.inject.Guice if it isn't already imported.

Only the code snippet shown in the question may be changed, so only changes in this file and import area are permitted.


Suggested changeset 1
server/src/test/java/org/apache/druid/server/metrics/GroupByStatsMonitorTest.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/server/src/test/java/org/apache/druid/server/metrics/GroupByStatsMonitorTest.java b/server/src/test/java/org/apache/druid/server/metrics/GroupByStatsMonitorTest.java
--- a/server/src/test/java/org/apache/druid/server/metrics/GroupByStatsMonitorTest.java
+++ b/server/src/test/java/org/apache/druid/server/metrics/GroupByStatsMonitorTest.java
@@ -22,6 +22,7 @@
 import com.google.inject.Injector;
 import com.google.inject.Key;
 import com.google.inject.name.Names;
+import com.google.inject.Guice;
 import org.apache.druid.collections.BlockingPool;
 import org.apache.druid.collections.DefaultBlockingPool;
 import org.apache.druid.guice.GuiceInjectors;
@@ -106,9 +107,8 @@
   {
     final String dataSource = "fooDs";
     final String taskId = "taskId1";
-    final Injector injector = Initialization.makeInjectorWithModules(
-        GuiceInjectors.makeStartupInjector(),
-        List.of(binder -> {
+    final Injector injector = Guice.createInjector(
+        binder -> {
           JsonConfigProvider.bindInstance(
               binder,
               Key.get(DruidNode.class, Self.class),
@@ -118,7 +118,7 @@
                 .toInstance(dataSource);
           binder.bind(Key.get(String.class, Names.named(DataSourceTaskIdHolder.TASK_ID_BINDING)))
                 .toInstance(taskId);
-        })
+        }
     );
     final DataSourceTaskIdHolder dsTaskIdHolder = new DataSourceTaskIdHolder();
     injector.injectMembers(dsTaskIdHolder);
EOF
@@ -22,6 +22,7 @@
import com.google.inject.Injector;
import com.google.inject.Key;
import com.google.inject.name.Names;
import com.google.inject.Guice;
import org.apache.druid.collections.BlockingPool;
import org.apache.druid.collections.DefaultBlockingPool;
import org.apache.druid.guice.GuiceInjectors;
@@ -106,9 +107,8 @@
{
final String dataSource = "fooDs";
final String taskId = "taskId1";
final Injector injector = Initialization.makeInjectorWithModules(
GuiceInjectors.makeStartupInjector(),
List.of(binder -> {
final Injector injector = Guice.createInjector(
binder -> {
JsonConfigProvider.bindInstance(
binder,
Key.get(DruidNode.class, Self.class),
@@ -118,7 +118,7 @@
.toInstance(dataSource);
binder.bind(Key.get(String.class, Names.named(DataSourceTaskIdHolder.TASK_ID_BINDING)))
.toInstance(taskId);
})
}
);
final DataSourceTaskIdHolder dsTaskIdHolder = new DataSourceTaskIdHolder();
injector.injectMembers(dsTaskIdHolder);
Copilot is powered by AI and may make mistakes. Always verify output.
@abhishekrb19 abhishekrb19 changed the title Add dataSource and task id dimensions to GroupByStatsMonitor for peons Add dataSource and taskId dimensions to GroupByStatsMonitor for peons Nov 1, 2025
private final Map<String, String[]> dimensions;

public JettyMonitor(String dataSource, String taskId)
public JettyMonitor(@Nullable String dataSource, @Nullable String taskId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JettyMonitor is used only for peons. Why should this be a nullable field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t the JettyMonitor applicable to all servers, not just peons? It’s bound through the JettyServerModule here.

Copy link
Contributor

@kfaraz kfaraz Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course! Thanks for the clarification.
I don't know why I assumed this class was being used only by the peons. I have myself depended on the jetty/* metrics several times. 😅 .
I think it must have been the presence of the DatasourceTaskIdHolder in the call site which put me in the peon mindset. 😛


Since other services are going to use it too, let's add a zero-arg constructor to this class too., please accept the dimensions map in the constructor instead of an explicit datasource and taskID, since these two fields are very specific to peons.

@Inject(optional = true)
BroadcastDatasourceLoadingSpec broadcastDatasourceLoadingSpec = BroadcastDatasourceLoadingSpec.ALL;

@Nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For peons, the CliPeon would provide the values of taskId and dataSource.

For all other servers, this would be null as these fields are initialized to null::

  @Named(DATA_SOURCE_BINDING)
  @Inject(optional = true)
  String dataSource = null;
  @Named(TASK_ID_BINDING)
  @Inject(optional = true)
  String taskId = null;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short javadoc on these getters mentioning that these are non-null only for CliPeon servers.

In a follow up PR, we should also fix up the way that DatasourceTaskIdHolder is wired and bind that class itself only for peons. There is no meaning in having this instance bound but contain null values for datasource and task ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added javadoc!

In a follow up PR, we should also fix up the way that DatasourceTaskIdHolder is wired and bind that class itself only for peons. There is no meaning in having this instance bound but contain null values for datasource and task ID.

Yeah, there's some commentary on why it's setup the way it is. I think we'll need to revisit this Guice bindings for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, given that we have embedded tests now, it should be easy to fix as we can iterate much faster.

@Inject(optional = true)
BroadcastDatasourceLoadingSpec broadcastDatasourceLoadingSpec = BroadcastDatasourceLoadingSpec.ALL;

@Nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short javadoc on these getters mentioning that these are non-null only for CliPeon servers.

In a follow up PR, we should also fix up the way that DatasourceTaskIdHolder is wired and bind that class itself only for peons. There is no meaning in having this instance bound but contain null values for datasource and task ID.

GroupByStatsProvider groupByStatsProvider,
@Merging BlockingPool<ByteBuffer> mergeBufferPool
@Merging BlockingPool<ByteBuffer> mergeBufferPool,
DataSourceTaskIdHolder dataSourceTaskIdHolder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pass a dimensions map here instead of this holder since the holder class has meaning only for peons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this monitor is injected, I don’t think it can accept a Map<String, String[]> in the constructor unless it’s bound with a specific name or qualifier?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can leave that for now. When we fix up the binding of DatasourceTaskIdHolder, we can figure that out.

{
private final GroupByStatsProvider groupByStatsProvider;
private final BlockingPool<ByteBuffer> mergeBufferPool;
private final Map<String, String[]> dimensions;
Copy link
Contributor

@kfaraz kfaraz Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to serviceDimensions?

In fact, I wonder if it wouldn't be cleaner to just bind the datasource and taskID as @ExtraServiceDimensions so that they are used in EmitterModule and are emitted for all metrics emitted by a peon.

Copy link
Contributor Author

@abhishekrb19 abhishekrb19 Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rename, but I left this field as dimensions to be consistent with the other monitors.

I wonder if it wouldn't be cleaner to just bind the datasource and taskID as @ExtraServiceDimensions so that they are used in EmitterModule and are emitted for all metrics emitted by a peon.

This is a good idea! I will try to take a stab at it separately and put up a PR as it affects all monitors - we'll need to rework the Guice bindings and likely need to reconcile how the extra service dimensions will be bound to monitors in a backwards-compatible manner too (id vs taskId is something to consider).

Copy link
Contributor

@kfaraz kfaraz Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rename, but I left this field as dimensions to be consistent with the other monitors.

I see, thanks for the clarification.

@abhishekrb19 abhishekrb19 force-pushed the group_by_stats_datasource_taskId branch from ced14cc to e9d29bd Compare November 4, 2025 01:55
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some final suggestions.

|`groupBy/spilledQueries`|Number of groupBy queries that have spilled onto the disk.|This metric is only available if the `GroupByStatsMonitor` module is included.|Varies|
|`groupBy/spilledBytes`|Number of bytes spilled on the disk by the groupBy queries.|This metric is only available if the `GroupByStatsMonitor` module is included.|Varies|
|`groupBy/mergeDictionarySize`|Size of on-heap merge dictionary in bytes.|This metric is only available if the `GroupByStatsMonitor` module is included.|Varies|
|`mergeBuffer/pendingRequests`|Number of requests waiting to acquire a batch of buffers from the merge buffer pool.|`dataSource`, `taskId`. This metric is only available if the `GroupByStatsMonitor` module is included.|Should be ideally 0, though a higher number isn't representative of a problem.|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|`mergeBuffer/pendingRequests`|Number of requests waiting to acquire a batch of buffers from the merge buffer pool.|`dataSource`, `taskId`. This metric is only available if the `GroupByStatsMonitor` module is included.|Should be ideally 0, though a higher number isn't representative of a problem.|
|`mergeBuffer/pendingRequests`|Number of requests waiting to acquire a batch of buffers from the merge buffer pool. This metric is only available if the `GroupByStatsMonitor` module is included.| `dataSource`, `taskId`|Should be ideally 0, though a higher number isn't representative of a problem.|

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same change might be needed in other lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

'}';
}

public static Map<String, String[]> mapOfDatasourceAndTaskID(final String datasource, final String taskId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a new method, let's do the following instead:

  • use the old method itself to emit the task ID under both the id dimension and taskId dimension
  • call out the id dimension as deprecated in the docs and release notes and remove it in D37
  • update the docs for the existing metrics to indicate that they emit taskId dimension too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

call out the id dimension as deprecated in the docs and release notes and remove it in D37

It looks like the docs for the six monitors that use this method don’t specify their dimensions clearly. I’ve added a general info note under the Real-time metrics seciton. Please let me know if I missed anything.

Comment on lines +123 to +124
Monitors on peons that previously emitted the `id` dimension from `JettyMonitor`, `OshiSysMonitor`, `JvmMonitor`, `JvmCpuMonitor`, `JvmThreadsMonitor` and `SysMonitor`
to represent the task ID are deprecated and will be removed in a future release. Use the `taskId` dimension instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Monitors on peons that previously emitted the `id` dimension from `JettyMonitor`, `OshiSysMonitor`, `JvmMonitor`, `JvmCpuMonitor`, `JvmThreadsMonitor` and `SysMonitor`
to represent the task ID are deprecated and will be removed in a future release. Use the `taskId` dimension instead.
The dimension `id` emitted by peons to represent the task ID (by monitors such as `JettyMonitor`, `OshiSysMonitor`, `JvmMonitor`, `JvmCpuMonitor`, `JvmThreadsMonitor` and `SysMonitor`)
is now deprecated and will be removed in a future release. Use the `taskId` dimension instead.

@abhishekrb19
Copy link
Contributor Author

Thanks for the review, @kfaraz! I will incorporate the doc suggestion in some other change.

@abhishekrb19 abhishekrb19 merged commit 8acdf17 into master Nov 6, 2025
91 of 92 checks passed
@abhishekrb19 abhishekrb19 deleted the group_by_stats_datasource_taskId branch November 6, 2025 06:59
@kgyrtkirk kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants