Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable segments read/published stats on Compaction task completion reports #15947

Conversation

adithyachakilam
Copy link
Contributor

@adithyachakilam adithyachakilam commented Feb 22, 2024

Description

This adds visibility into number of segments read/published by each parallel compaction task.

This PR adds new subclass for IngestionStatsAndErrorsTaskReportData to take new fields and ParallelIndexSupervisorTask to populate the segments read/published in different parallel compaction algos.


Release note

Parallel compaction task completion reports now have segmentsRead and segmentsPublished fields to see how effective a compaction task is.


Key changed/added classes in this PR
  • ParallelIndexSupervisorTask.java
  • IngestionStatsAndErrorsTaskReportData.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@adithyachakilam adithyachakilam marked this pull request as draft February 22, 2024 22:51
@adithyachakilam adithyachakilam force-pushed the adithyachakilam/enable-segment-stats-for-compaction branch from d1794dc to 1a8de2a Compare February 26, 2024 20:17
@adithyachakilam adithyachakilam force-pushed the adithyachakilam/enable-segment-stats-for-compaction branch from 1a8de2a to 3a945ce Compare February 26, 2024 21:31
@adithyachakilam adithyachakilam marked this pull request as ready for review February 26, 2024 21:36
@adithyachakilam adithyachakilam changed the title [WIP] Enable segments read/published stats on Compaction task completion reports Enable segments read/published stats on Compaction task completion reports Feb 27, 2024
Copy link
Contributor

@YongGang YongGang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…druid/indexing/common/task/batch/parallel/GeneratedPartitionsReport.java
docs/ingestion/tasks.md Outdated Show resolved Hide resolved
Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-02-29 at 11 31 57 AM

I tried these scenarios:

  • index_parallel with no additional subtasks (wikipedia) - the task report has segmentsRead: null and segmentsPublished: null in the report
  • index_parallel with more than 1 subtask (kttm1) - the task report has segmentsRead: null and segmentsPublished: null in the sub task and segmentsRead: 0 and semgentsPublished: 1 in the index_parallel task
  • Issued a compaction task with no subtasks (kttm1) - the task report has segmentsRead: null and segmentsPublished: null
  • issued a compaction task with sub tasks (wikipedia) - no reports for the sub tasks, but the compactTask has segmentsRead: 1 segmentsPublished: 1
  • index_kafka on kttm - report has segmentsRead: null segmentsPublished: null

Since this change is meant to be scoped to just compaction tasks, I am -1 in it's current implementation because there are many different task reports than have the fields segmentsRead and segmentsPublished added to it returning null and index_parallel tasks that have a report that says it read 0 segments (while true, that's a little confusing).

If these fields did not show up in the task reports where they are not meant to, I think the change would be good.

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better! Only other thing I noticed is that index_parallel jobs with no subtasks that used to have task reports before this change no longer show a task report after this change. Can you look into that issue please

@adithyachakilam
Copy link
Contributor Author

@suneet-s, the missing report you pointed out is actually related to change: #15981 I'll make up a follow up PR to address it.

@adithyachakilam
Copy link
Contributor Author

Here is the fix: #16042

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the new fields, @adithyachakilam . I have left some comments that would be more in line with the existing report class structure.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, @adithyachakilam ! Requested minor changes, otherwise looks good.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for addressing the comments.

@kfaraz kfaraz merged commit 564c44e into apache:master Mar 7, 2024
83 checks passed
@adithyachakilam adithyachakilam deleted the adithyachakilam/enable-segment-stats-for-compaction branch March 14, 2024 23:46
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants