Pinot fails to build a segment with a no-dictionary MV column #10130

@kirkrodrigues

Description

When streaming data into a no-dictionary MV column, the segment fails to be built with the following exception:

2023/01/14 03:28:20.982 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment 
java.nio.BufferOverflowException: null
    at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:409) ~[?:?]
    at java.nio.ByteBuffer.put(ByteBuffer.java:914) ~[?:?]
    at org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter.putBytes(VarByteChunkSVForwardIndexWriter.java:118) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba
    at org.apache.pinot.segment.local.segment.creator.impl.fwd.MultiValueFixedByteRawIndexCreator.putIntMV(MultiValueFixedByteRawIndexCreator.java:119) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca0
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(SegmentColumnarIndexCreator.java:677) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074a
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:240) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba0
    at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:110) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074af1d4d492b92
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:903) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475b
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:814) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:713) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
    at java.lang.Thread.run(Thread.java:829) [?:?]
2023/01/14 03:28:21.003 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment for bug__0__0__20230114T0826Z
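For context, here is a standalone illustration (my reading of the trace, not Pinot code) of how a chunk buffer sized from a max-MV statistic of 0 overflows as soon as a real multi-value row is written:

    import java.nio.ByteBuffer;

    public class OverflowDemo {
      public static void main(String[] args) {
        // Direct buffer sized as if each MV entry holds zero values (illustrative only).
        ByteBuffer chunk = ByteBuffer.allocateDirect(4);
        // A real MV row: a 4-byte element count plus two 4-byte ints.
        byte[] row = new byte[12];
        chunk.put(row); // throws java.nio.BufferOverflowException, as in the trace
      }
    }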

Glancing at the code, it seems like MutableNoDictionaryColStatistics::getMaxNumberOfMultiValues returns 0, whereas it should probably return _dataSource.getDataSourceMetadata().getMaxNumValuesPerMVEntry(), similar to MutableColStatistics::getMaxNumberOfMultiValues. (I may be totally wrong, though; I didn't look at all the code.)
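If that reading is right, a minimal sketch of the change (untested), assuming MutableNoDictionaryColStatistics holds a _dataSource field as the method names above suggest:

    @Override
    public int getMaxNumberOfMultiValues() {
      // Suspected fix: delegate to the data-source metadata instead of
      // returning 0, mirroring MutableColStatistics::getMaxNumberOfMultiValues.
      return _dataSource.getDataSourceMetadata().getMaxNumValuesPerMVEntry();
    }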

Version

ca86ef

Environment

  • OpenJDK 11.0.16
  • Ubuntu 18.04

Reproduction steps

  • Add the following schema to Pinot:

    {
      "schemaName": "bug",
      "dimensionFieldSpecs": [
        {
          "name": "integers",
          "dataType": "INT",
          "singleValueField": false
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "timestamp",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
  • Add the following table to Pinot:

    {
      "tableName": "bug",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "timestamp",
        "timeType": "MILLISECONDS",
        "schemaName": "bug",
        "replicasPerPartition": "1"
      },
      "tenants": {},
      "tableIndexConfig": {
        "noDictionaryColumns": [
          "integers"
        ],
        "loadMode": "MMAP",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "lowlevel",
          "stream.kafka.topic.name": "bug-topic",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.broker.list": "localhost:9876",
          "realtime.segment.flush.threshold.time": "3600000",
          "realtime.segment.flush.threshold.rows": "500000",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
        }
      },
      "metadata": {
        "customConfigs": {}
      }
    }
  • Ingest more than a segment's worth of JSON records (500K+, per realtime.segment.flush.threshold.rows above) containing the integers field into the table; a producer sketch follows this list.
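For that last step, a minimal producer sketch (hypothetical helper, not part of the repro files; assumes the broker and topic from the table config above):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BugTopicProducer {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9876");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
          // 600K records comfortably exceeds realtime.segment.flush.threshold.rows (500K).
          for (int i = 0; i < 600_000; i++) {
            String json = String.format("{\"timestamp\": %d, \"integers\": [%d, %d, %d]}",
                System.currentTimeMillis(), i, i + 1, i + 2);
            producer.send(new ProducerRecord<>("bug-topic", json));
          }
        }
      }
    }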
