Fixed ClassCastException during Multi-Stage Queries on real-time segments #16607

Open: wants to merge 6 commits into master


@nozjkoitop (Contributor) commented Jun 14, 2024

Fixed an issue where MSQ queries throw a ClassCastException when run with "includeSegmentSource": "REALTIME" while a supervisor is running.

Caused by: java.lang.ClassCastException: class [B cannot be cast to class org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder ([B is in module java.base of loader 'bootstrap'; org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolder is in unnamed module of loader java.net.URLClassLoader @38792286)
	at org.apache.druid.query.aggregation.datasketches.hll.HllSketchHolderObjectStrategy.toBytes(HllSketchHolderObjectStrategy.java:31)
	at org.apache.druid.segment.serde.ComplexMetricSerde.toBytes(ComplexMetricSerde.java:119)
	at org.apache.druid.frame.field.ComplexFieldWriter.writeTo(ComplexFieldWriter.java:65)
	at org.apache.druid.frame.write.RowBasedFrameWriter.writeDataUsingFieldWriters(RowBasedFrameWriter.java:291)
	at org.apache.druid.frame.write.RowBasedFrameWriter.writeData(RowBasedFrameWriter.java:246)
	at org.apache.druid.frame.write.RowBasedFrameWriter.addSelection(RowBasedFrameWriter.java:122)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.populateFrameWriterAndFlushIfNeeded(ScanQueryFrameProcessor.java:348)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.populateFrameWriterAndFlushIfNeededWithExceptionHandling(ScanQueryFrameProcessor.java:329)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.runWithLoadedSegment(ScanQueryFrameProcessor.java:231)
	at org.apache.druid.msq.querykit.BaseLeafFrameProcessor.runIncrementally(BaseLeafFrameProcessor.java:87)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.runIncrementally(ScanQueryFrameProcessor.java:158)
	at org.apache.druid.frame.processor.FrameProcessors$1FrameProcessorWithBaggage.runIncrementally(FrameProcessors.java:75)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.runProcessorNow(FrameProcessorExecutor.java:230)
	... 8 more

After some investigation, I found that for real-time segments the selector's getObject() returns a byte array representation of the HllSketch, yet the write path (ComplexFieldWriter.writeTo() calling ComplexMetricSerde.toBytes()) still tries to convert it to a byte[] through the ObjectStrategy. HllSketchHolderObjectStrategy.toBytes(), however, expects an HllSketchHolder instance, which leads to the ClassCastException.
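The failure mode can be reproduced in isolation with the minimal sketch below. SketchHolder, SketchStrategy, SketchHolderStrategy and CastDemo are hypothetical stand-ins, not Druid classes: SketchHolder plays the role of HllSketchHolder, and CastDemo.write() models a caller that only has an Object (such as the value from selector.getObject()) and forwards it through a raw, generic strategy interface. In that situation the compiler-generated bridge method casts the argument to the strategy's type parameter, so a raw byte[] fails with a ClassCastException inside the strategy's toBytes().

```java
// Hypothetical stand-in for a deserialized sketch type such as HllSketchHolder.
final class SketchHolder {
  final byte[] payload;

  SketchHolder(byte[] payload) {
    this.payload = payload;
  }
}

// Hypothetical, trimmed-down generic strategy interface (not Druid's ObjectStrategy).
interface SketchStrategy<T> {
  byte[] toBytes(T val);
}

// Typed like HllSketchHolderObjectStrategy: it only knows how to serialize holders.
final class SketchHolderStrategy implements SketchStrategy<SketchHolder> {
  @Override
  public byte[] toBytes(SketchHolder val) {
    return val.payload.clone();
  }
}

final class CastDemo {
  private CastDemo() {}

  // Models a serde-level caller that only has an Object and forwards it through the raw
  // interface. The bridge method generated for SketchHolderStrategy casts the argument
  // to SketchHolder, so passing a byte[] throws ClassCastException there.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static byte[] write(SketchStrategy strategy, Object val) {
    return strategy.toBytes(val);
  }

  // Returns "ok" when serialization succeeds, otherwise the exception's simple name.
  static String tryWrite(Object selectorValue) {
    try {
      write(new SketchHolderStrategy(), selectorValue);
      return "ok";
    } catch (ClassCastException e) {
      return e.getClass().getSimpleName();
    }
  }
}
```

Calling CastDemo.tryWrite() with a SketchHolder succeeds, while calling it with a byte[] reports a ClassCastException, mirroring the "[B cannot be cast to ... HllSketchHolder" message in the trace.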

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

…hat include columns of HLLSketch type resulted in a ClassCastException.
Comment on lines 118 to 122
if (val == null) {
  return ByteArrays.EMPTY_ARRAY;
} else if (val instanceof byte[]) {
  return (byte[]) val;
} else {
Contributor

I think this is incorrect, because there might be a serde whose deserialized value is a byte[] but whose serialized value is something else.

For example, a compression technique can transform a deserialized byte[] into a different serialized byte[]. This behavior is specific to each complex type, so it shouldn't be a universal change. Instead, you could do one of the following:

  1. If this only occurs in the HllSketchBuild serde, you can make this change specific to the HllSketchBuild serde.
  2. Otherwise, if this occurs with all complex types when running MSQ tasks on complex metrics, you can make the changes in MSQ-specific classes (after confirming where the byte[] is coming from).
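The reviewer's concern can be illustrated with a hedged sketch of a hypothetical serde strategy (CompressedBytesStrategy is an illustrative name, not a Druid class) whose deserialized value is itself a byte[] while its serialized form is the Deflate-compressed version of those bytes, using java.util.zip.Deflater and Inflater. Under a blanket "if val is a byte[], return it unchanged" rule, such a strategy would have uncompressed bytes written out that its own fromBytes() cannot read back.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical strategy: the in-memory (deserialized) value is a byte[], but the
// serialized form is the zlib/Deflate-compressed encoding of that byte[].
final class CompressedBytesStrategy {
  byte[] toBytes(byte[] value) {
    Deflater deflater = new Deflater();
    deflater.setInput(value);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  byte[] fromBytes(byte[] serialized) throws DataFormatException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(serialized);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[256];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf); // throws DataFormatException on non-zlib input
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new DataFormatException("truncated or unsupported input");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      inflater.end();
    }
  }
}
```

A round trip through toBytes() and fromBytes() restores the original array, whereas handing fromBytes() the raw, uncompressed bytes (which is what a pass-through would have persisted) is rejected. This is why the pass-through decision belongs to each complex type rather than to a shared code path.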

Contributor Author

Thanks for pointing that out! It looks like this issue might not be limited to HLLSketch. None of the ObjectStrategies seem to handle this scenario, since their toBytes() methods all expect a specific object type; passing a byte array directly leads to the CCE. I'll look into why selector.getObject() returns a byte array for a real-time segment; maybe it can be fixed there as well. It doesn't surface in the MSQ classes directly, though; it shows up in ComplexFieldWriter.writeTo().

Contributor

np! Since the issue is bubbling up from the column value selectors, you can check what creates them and why they return a byte array instead of the expected object type.

Contributor Author

I did some research, and it looks like it's expected for selector.getObject() to return a byte[] for real-time segments. This happens because a RowBasedColumnValueSelector is used instead of an ObjectColumnSelector: it doesn't go through GenericIndexed.copyBufferAndGet() and strategy.fromByteBuffer(), and instead grabs the byte[] directly from the ByteBuffer via rowSupplier.get() (at least for complex types read from real-time segments in MSQ).
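The difference between the two read paths described above can be modeled with a simplified sketch. ObjectSelector and SelectorPaths are hypothetical names, not Druid's ColumnValueSelector hierarchy: decoding() stands in for the path that copies the stored bytes and decodes them through the strategy (the role played by copyBufferAndGet() and fromByteBuffer()), while rowBased() stands in for a row-based selector that simply returns whatever the row supplier holds.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, simplified selector interface.
interface ObjectSelector {
  Object getObject();
}

final class SelectorPaths {
  private SelectorPaths() {}

  // Decoding path: a read-only view of the stored bytes is handed to a decode function
  // (a stand-in for strategy.fromByteBuffer()), so callers see the deserialized object.
  static ObjectSelector decoding(ByteBuffer stored, Function<ByteBuffer, Object> fromByteBuffer) {
    return () -> fromByteBuffer.apply(stored.asReadOnlyBuffer());
  }

  // Row-based path: the value comes straight from the row supplier with no strategy call,
  // so if the row holds a raw byte[], getObject() returns that byte[].
  static ObjectSelector rowBased(Supplier<Object> rowSupplier) {
    return rowSupplier::get;
  }
}
```

With the same underlying bytes, the decoding selector yields whatever the decode function produces, whereas the row-based selector yields the byte[] itself, which is the value that later reaches ObjectStrategy.toBytes().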

Right now, no ObjectStrategy accepts a byte[] as input to toBytes(), so the original setup fails with a CCE whenever it receives one. But you're right that we might need to handle such a case in the future, so I've moved this check into ObjectStrategy as a default method, which makes it easy to override if needed.
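The default-method approach described above can be sketched as follows, assuming a trimmed-down, hypothetical interface. SimpleObjectStrategy, Holder and HolderStrategy are illustrative names; only the method name objectToBytes comes from the diff in this PR, and the body reflects the pass-through check discussed in this thread rather than the PR's actual code.

```java
// Hypothetical, trimmed-down strategy interface (not Druid's ObjectStrategy).
interface SimpleObjectStrategy<T> {
  Class<? extends T> getClazz();

  byte[] toBytes(T val);

  // Entry point for callers that only have an Object (e.g. from a selector).
  // Assumption: a raw byte[] is treated as already serialized and passed through.
  // Strategies whose serialized form differs from a byte[] value should override this.
  default byte[] objectToBytes(Object val) {
    if (val instanceof byte[]) {
      return (byte[]) val;
    }
    // Checked cast: a value of any other unexpected type still fails with ClassCastException.
    return toBytes(getClazz().cast(val));
  }
}

// Hypothetical deserialized value type.
final class Holder {
  final byte[] payload;

  Holder(byte[] payload) {
    this.payload = payload;
  }
}

// Hypothetical strategy that relies on the default objectToBytes().
final class HolderStrategy implements SimpleObjectStrategy<Holder> {
  @Override
  public Class<Holder> getClazz() {
    return Holder.class;
  }

  @Override
  public byte[] toBytes(Holder val) {
    return val.payload.clone();
  }
}
```

With this shape, HolderStrategy.objectToBytes() returns a raw byte[] unchanged, serializes a Holder through toBytes(), and still rejects any other type. A type like the compressed-bytes example raised in review would override objectToBytes() instead of inheriting the pass-through.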

@@ -116,7 +116,7 @@
   public byte[] toBytes(@Nullable Object val)
   {
     if (val != null) {
-      byte[] bytes = getObjectStrategy().toBytes(val);
+      byte[] bytes = getObjectStrategy().objectToBytes(val);

Check notice (Code scanning / CodeQL)
Deprecated method or constructor invocation (Note): Invoking ComplexMetricSerde.getObjectStrategy should be avoided because it has been deprecated.
@LakshSingla (Contributor)

Thanks for the changes @nozjkoitop. I'll review them shortly.

@nozjkoitop (Contributor Author)

> Thanks for the changes @nozjkoitop. I'll review them shortly.

@LakshSingla Could I remind you to take a look at this, please?

@LakshSingla (Contributor)

Sorry for the delay @nozjkoitop, and thanks for the update 🚀
Can you please add a test in MSQSelectTest or MSQInsertTest that fails with the exception you are seeing before this patch and passes after it? This should be easy to reproduce. LMK if you need help with that.

The new approach is a lot cleaner than the previous one, but I still wanna see if there's a way to restrict the change to MSQ, since I suspect that there's a faulty selector at play. The MSQ test case would help in debugging this as well.

github-actions bot added the "Area - Batch Ingestion" and "Area - MSQ" (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Jul 12, 2024
@nozjkoitop (Contributor Author)

> Can you please add a test in the MSQSelectTest or MSQInsertTest to validate that the PR fails before with the exception you are seeing and passes after this patch? This should be easy to reproduce.

I've added a new test case that selects a hyperUnique column from a real-time segment; please have a look.

Labels: Area - Batch Ingestion, Area - MSQ, Area - Segment Format and Ser/De

2 participants