Closed
Description
When streaming data into a no-dictionary MV column, the segment fails to be built with the following exception:
2023/01/14 03:28:20.982 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment
java.nio.BufferOverflowException: null
at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:409) ~[?:?]
at java.nio.ByteBuffer.put(ByteBuffer.java:914) ~[?:?]
at org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter.putBytes(VarByteChunkSVForwardIndexWriter.java:118) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba
at org.apache.pinot.segment.local.segment.creator.impl.fwd.MultiValueFixedByteRawIndexCreator.putIntMV(MultiValueFixedByteRawIndexCreator.java:119) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca0
at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(SegmentColumnarIndexCreator.java:677) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074a
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:240) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba0
at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:110) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074af1d4d492b92
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:903) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475b
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:814) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:713) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
at java.lang.Thread.run(Thread.java:829) [?:?]
2023/01/14 03:28:21.003 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment for bug__0__0__20230114T0826Z
Glancing at the code, it seems that MutableNoDictionaryColStatistics::getMaxNumberOfMultiValues returns 0, whereas it should probably return _dataSource.getDataSourceMetadata().getMaxNumValuesPerMVEntry(), similar to what MutableColStatistics::getMaxNumberOfMultiValues does. (I may be totally wrong though; I didn't look at all the code.)
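If that hypothesis is right, the writer would size its chunk buffer from a reported max-values-per-entry of 0 and then overflow on the first real multi-value row. A minimal, self-contained sketch of that failure mode (the variable names and sizing logic here are illustrative assumptions, not Pinot's actual code):

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class BufferOverflowDemo {
    public static void main(String[] args) {
        // Hypothetical: stats report 0 max values per MV entry, so the
        // direct buffer is allocated with zero room for row data.
        int reportedMaxValuesPerEntry = 0;
        int actualValuesInRow = 3; // an ingested row really has 3 ints

        ByteBuffer chunk = ByteBuffer.allocateDirect(Integer.BYTES * reportedMaxValuesPerEntry);
        byte[] rowBytes = new byte[Integer.BYTES * actualValuesInRow];
        try {
            chunk.put(rowBytes); // overflows the undersized buffer
        } catch (BufferOverflowException e) {
            // Matches the exception at the top of the reported stack trace.
            System.out.println("caught java.nio.BufferOverflowException");
        }
    }
}
```

This mirrors the DirectByteBuffer.put frame at the top of the stack trace, which is why the fix likely belongs in the statistics object rather than the writer itself.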
Version
ca86ef
Environment
- OpenJDK 11.0.16
- Ubuntu 18.04
Reproduction steps
- Add the following schema to Pinot:
  {
    "schemaName": "bug",
    "dimensionFieldSpecs": [
      { "name": "integers", "dataType": "INT", "singleValueField": false }
    ],
    "dateTimeFieldSpecs": [
      { "name": "timestamp", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" }
    ]
  }
- Add the following table to Pinot:
  {
    "tableName": "bug",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "timeColumnName": "timestamp",
      "timeType": "MILLISECONDS",
      "schemaName": "bug",
      "replicasPerPartition": "1"
    },
    "tenants": {},
    "tableIndexConfig": {
      "noDictionaryColumns": [ "integers" ],
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.topic.name": "bug-topic",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": "localhost:9876",
        "realtime.segment.flush.threshold.time": "3600000",
        "realtime.segment.flush.threshold.rows": "500000",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
      }
    },
    "metadata": { "customConfigs": {} }
  }
- Ingest over a segment's worth of JSON records (500K+) containing the field integers into the table.
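One way to produce matching input is a small generator for records that fit the "bug" schema above; its output can be piped to kafka-console-producer for the bug-topic topic. This is an illustrative sketch (the class name, value ranges, and record count are my own choices, not from the original report):

```java
import java.util.Random;

public class RecordGen {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Print 5 sample records; raise the count to 500000+ to hit the
        // realtime.segment.flush.threshold.rows limit and trigger a build.
        for (int i = 0; i < 5; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append("{\"timestamp\": ").append(System.currentTimeMillis());
            sb.append(", \"integers\": [");
            int n = 1 + rnd.nextInt(4); // MV entries of varying length
            for (int j = 0; j < n; j++) {
                if (j > 0) sb.append(", ");
                sb.append(rnd.nextInt(1000));
            }
            sb.append("]}");
            System.out.println(sb);
        }
    }
}
```

Each output line is one JSON record with an epoch-millis timestamp and a multi-value integers array, which is the shape the JSONMessageDecoder in the table config expects.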