[cleanup] Remove unnecessary backward-compat check in DataTable serialization #18002

Merged
xiangfu0 merged 3 commits into apache:master from xiangfu0:cleanup-datatable-v4-compat on Mar 27, 2026

Conversation

@xiangfu0
Contributor

Summary

  • Remove conditional backward-compat guard on CPU time / memory measurement serialization in DataTableImplV4
  • The inline comment explicitly noted: "The check is not needed. We can remove it. But keeping it around for backward compatibility."
  • All current server versions support thread resource usage reporting
  • Updated DataTableSerDeTest to reflect the new always-populated behavior

Test plan

  • ./mvnw spotless:apply -pl pinot-common passes
  • ./mvnw checkstyle:check -pl pinot-common passes
  • ./mvnw license:check -pl pinot-common passes

🤖 Generated with Claude Code

Pinot Cleanup Agent and others added 2 commits March 27, 2026 02:43
The conditional check on CPU time and memory measurement during DataTable
serialization was kept solely for backward compatibility per the inline
comment. All current server versions support thread resource usage
reporting, making this guard unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep strong assertions (> 0) for the enabled-measurement case.
Use assertNotNull for the disabled case where values are always
populated but may be zero. Improve comment accuracy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.30%. Comparing base (ba0c0e0) to head (3a3b342).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18002      +/-   ##
============================================
- Coverage     63.36%   63.30%   -0.07%     
  Complexity     1543     1543              
============================================
  Files          3200     3200              
  Lines        194114   194074      -40     
  Branches      29893    29883      -10     
============================================
- Hits         122993   122849     -144     
- Misses        61461    61587     +126     
+ Partials       9660     9638      -22     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.28% <ø> (+<0.01%) ⬆️ |
| java-21 | 63.25% <ø> (-0.06%) ⬇️ |
| temurin | 63.30% <ø> (-0.07%) ⬇️ |
| unittests | 63.29% <ø> (-0.07%) ⬇️ |
| unittests1 | 55.53% <ø> (+<0.01%) ⬆️ |
| unittests2 | 34.21% <ø> (-0.07%) ⬇️ |

@xiangfu0 xiangfu0 added the cleanup Code cleanup or removal of dead code label Mar 27, 2026
@xiangfu0 xiangfu0 changed the title Remove unnecessary backward-compat check in DataTable serialization [cleanup] Remove unnecessary backward-compat check in DataTable serialization Mar 27, 2026
Contributor

Copilot AI left a comment

Pull request overview

Removes the conditional “backward-compat” guard so DataTable V4 serialization always includes response serialization CPU/memory metadata, and updates tests to align with always-populated metadata behavior.

Changes:

  • Always write RESPONSE_SER_CPU_TIME_NS and RESPONSE_SER_MEM_ALLOCATED_BYTES metadata in DataTableImplV4#toBytes()
  • Update DataTableSerDeTest expectations for response-serialization metadata presence when measurement is disabled
  • Adjust inline test comments to reflect the new behavior

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java | Updates assertions/comments to reflect metadata that is now always populated. |
| pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTableImplV4.java | Removes conditional checks and always serializes response serialization CPU/memory metadata. |

Comment on lines 283 to 286
// When measurement is enabled, response serialization metadata should have positive values.
Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.THREAD_CPU_TIME_NS.getName()));
Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.SYSTEM_ACTIVITIES_CPU_TIME_NS.getName()));
Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.THREAD_MEM_ALLOCATED_BYTES.getName()));

Copilot AI Mar 27, 2026

The updated comment says response serialization metadata should have positive values when measurement is enabled, but the test does not assert anything about RESPONSE_SER_CPU_TIME_NS / RESPONSE_SER_MEM_ALLOCATED_BYTES in this enabled branch. Add assertions that these keys are present and validate the values (e.g., parse as long and assert > 0) or adjust the comment to match what the test actually verifies.
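A minimal sketch of the kind of check this comment asks for, written without TestNG so it runs standalone. It assumes the metadata behaves like a `Map<String, String>`; the class name `PositiveMetadataCheck`, the helper `requirePositiveLong`, and the key strings are hypothetical stand-ins (the real test would use `MetadataKey.RESPONSE_SER_CPU_TIME_NS.getName()` and `Assert`).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Pinot.
final class PositiveMetadataCheck {
  private PositiveMetadataCheck() {
  }

  // Fails if the key is missing, not numeric, or not strictly positive.
  static long requirePositiveLong(Map<String, String> metadata, String key) {
    String raw = metadata.get(key);
    if (raw == null) {
      throw new AssertionError("Missing metadata key: " + key);
    }
    long value = Long.parseLong(raw); // throws NumberFormatException if not a valid long
    if (value <= 0) {
      throw new AssertionError("Expected a positive value for " + key + " but got " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new HashMap<>();
    // Placeholder key strings; the real names come from MetadataKey#getName().
    metadata.put("RESPONSE_SER_CPU_TIME_NS", "1250000");
    metadata.put("RESPONSE_SER_MEM_ALLOCATED_BYTES", "4096");
    System.out.println(requirePositiveLong(metadata, "RESPONSE_SER_CPU_TIME_NS"));
    System.out.println(requirePositiveLong(metadata, "RESPONSE_SER_MEM_ALLOCATED_BYTES"));
  }
}
```

Parsing as `long` rather than `int` matches the nanosecond and byte units of these values.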

Comment on lines +297 to +302
+ // Response serialization metadata is always populated, but may be zero when measurement is disabled.
  Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.THREAD_CPU_TIME_NS.getName()));
  Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.SYSTEM_ACTIVITIES_CPU_TIME_NS.getName()));
- Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.RESPONSE_SER_CPU_TIME_NS.getName()));
+ Assert.assertNotNull(newDataTable.getMetadata().get(MetadataKey.RESPONSE_SER_CPU_TIME_NS.getName()));
  Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.THREAD_MEM_ALLOCATED_BYTES.getName()));
- Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.RESPONSE_SER_MEM_ALLOCATED_BYTES.getName()));
+ Assert.assertNotNull(newDataTable.getMetadata().get(MetadataKey.RESPONSE_SER_MEM_ALLOCATED_BYTES.getName()));

Copilot AI Mar 27, 2026

Since the intent is that these fields are always populated but might be zero when measurement is disabled, asserting only NotNull is relatively weak. Consider also validating that the values are numeric (parseable as long) and match the expected disabled semantics (e.g., == 0 if guaranteed, or >= 0 if not). This makes the test more robust and ensures the serialization contract is enforced.

Comment on lines +424 to +428
// Add table serialization time and memory metadata.
getMetadata().put(MetadataKey.RESPONSE_SER_CPU_TIME_NS.getName(),
String.valueOf(resourceSnapshot.getCpuTimeNs()));
getMetadata().put(MetadataKey.RESPONSE_SER_MEM_ALLOCATED_BYTES.getName(),
String.valueOf(resourceSnapshot.getAllocatedBytes()));

Copilot AI Mar 27, 2026

Now that response serialization metadata is always emitted, the comment should document the expected values when measurement is disabled/unavailable (e.g., zero-filled). Adding that note here (or on the corresponding MetadataKey docs) helps downstream consumers understand whether to treat 0 differently from missing values, especially since this PR changes the old absence-based behavior.
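To illustrate the "0 versus missing" distinction raised here, the sketch below shows one way a downstream consumer might read such a value. It assumes the metadata is exposed as a `Map<String, String>`; `SerMetadataReader`, `readLong`, and the key strings are hypothetical names for illustration, not Pinot APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical consumer-side helper for illustration only; not part of Pinot.
final class SerMetadataReader {
  private SerMetadataReader() {
  }

  // Returns empty when the key is absent, so callers can tell
  // "not reported" apart from "reported as 0".
  static OptionalLong readLong(Map<String, String> metadata, String key) {
    String raw = metadata.get(key);
    return raw == null ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(raw));
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new HashMap<>();
    // Placeholder key string; only the CPU value is reported, and it is zero.
    metadata.put("RESPONSE_SER_CPU_TIME_NS", "0");
    System.out.println(readLong(metadata, "RESPONSE_SER_CPU_TIME_NS").isPresent());
    System.out.println(readLong(metadata, "RESPONSE_SER_MEM_ALLOCATED_BYTES").isPresent());
  }
}
```

Under an always-emit contract a consumer can no longer rely on absence to detect disabled measurement, which is why the expected zero-fill semantics would need to be documented.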

getMetadata().put(MetadataKey.RESPONSE_SER_MEM_ALLOCATED_BYTES.getName(),
String.valueOf(resourceSnapshot.getAllocatedBytes()));
}
// Add table serialization time and memory metadata.
Contributor

I feel we should keep the current logic and revise the comment. When CPU time/memory usage is not collectable, we shouldn't return their values.
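A rough sketch of the guarded behavior described in this comment, under stated assumptions: the two boolean parameters stand in for the `ThreadResourceUsageProvider` measurement-enabled checks, the key strings are placeholders for `MetadataKey#getName()`, and `GuardedSerMetadata` is a hypothetical class, not the `DataTableImplV4` code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "omit when not collectable"; not the Pinot implementation.
final class GuardedSerMetadata {
  private GuardedSerMetadata() {
  }

  // Writes each value only when the corresponding measurement is enabled.
  static Map<String, String> build(boolean cpuMeasurementEnabled, boolean memMeasurementEnabled,
      long cpuTimeNs, long allocatedBytes) {
    Map<String, String> metadata = new LinkedHashMap<>();
    if (cpuMeasurementEnabled) {
      metadata.put("RESPONSE_SER_CPU_TIME_NS", String.valueOf(cpuTimeNs));
    }
    if (memMeasurementEnabled) {
      metadata.put("RESPONSE_SER_MEM_ALLOCATED_BYTES", String.valueOf(allocatedBytes));
    }
    return metadata;
  }

  public static void main(String[] args) {
    // CPU measurement on, memory measurement off: only the CPU key is written.
    System.out.println(build(true, false, 1_250_000L, 4_096L));
    // Both off: the metadata stays empty rather than carrying zeros.
    System.out.println(build(false, false, 0L, 0L));
  }
}
```

With this shape, a missing key signals that the measurement was not available, so consumers never see a misleading zero.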

Per reviewer feedback, the conditional check on CPU time and memory
measurement should be kept — when these are not collectable, we should
not return their values. Reverted the production code change and updated
the comment to accurately describe the intended behavior instead of the
old misleading TODO.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Please update the PR title and description

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

pinot-core/src/test/java/org/apache/pinot/core/common/datatable/DataTableSerDeTest.java:287

  • The metadata keys RESPONSE_SER_CPU_TIME_NS / RESPONSE_SER_MEM_ALLOCATED_BYTES are defined as LONGs, but the assertions below use Integer.parseInt, which throws NumberFormatException for values above Integer.MAX_VALUE and can make this test flaky on slow runs or large allocations. Prefer parsing as long (Long.parseLong) and comparing using long values.
    // When ThreadCpuTimeMeasurement is enabled, responseSerializationCpuTimeNs should be positive.
    Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.THREAD_CPU_TIME_NS.getName()));
    Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.SYSTEM_ACTIVITIES_CPU_TIME_NS.getName()));
    Assert.assertNull(newDataTable.getMetadata().get(MetadataKey.THREAD_MEM_ALLOCATED_BYTES.getName()));
    Assert.assertTrue(
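The parsing concern in this suppressed comment can be shown with a small standalone sketch. `ParseWidthDemo` and `fitsInInt` are hypothetical names; the only library behavior relied on is that `Integer.parseInt` throws `NumberFormatException` for input outside the `int` range, while `Long.parseLong` accepts it.

```java
// Hypothetical demo class for illustration only; not part of Pinot.
final class ParseWidthDemo {
  private ParseWidthDemo() {
  }

  // Returns true only if the string parses as an int without throwing.
  static boolean fitsInInt(String raw) {
    try {
      Integer.parseInt(raw);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // 3 seconds expressed in nanoseconds exceeds Integer.MAX_VALUE (2,147,483,647).
    String cpuTimeNs = String.valueOf(3_000_000_000L);
    System.out.println(fitsInInt(cpuTimeNs));
    System.out.println(Long.parseLong(cpuTimeNs) > 0);
  }
}
```

A serialization that takes longer than about 2.1 seconds of CPU time, or allocates more than about 2 GiB, would be enough to push these values past the `int` range.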

Comment on lines +425 to 429
// Add table serialization time and memory metadata when the corresponding measurement is enabled.
// When CPU time/memory usage is not collectable, we omit these values from the metadata.
if (ThreadResourceUsageProvider.isThreadCpuTimeMeasurementEnabled()) {
getMetadata().put(MetadataKey.RESPONSE_SER_CPU_TIME_NS.getName(),
String.valueOf(resourceSnapshot.getCpuTimeNs()));

Copilot AI Mar 27, 2026

This change only updates the comment, but the code still conditionally writes response serialization CPU/memory metadata based on ThreadResourceUsageProvider.isThread*MeasurementEnabled(). The PR title/description says the backward-compat guard was removed and the values are now always populated—either the implementation needs to be updated to match that behavior or the PR description/title should be corrected.

@xiangfu0 xiangfu0 merged commit 72755a4 into apache:master Mar 27, 2026
20 checks passed
@xiangfu0 xiangfu0 deleted the cleanup-datatable-v4-compat branch March 27, 2026 22:24

Labels

cleanup Code cleanup or removal of dead code

4 participants