Ability to ingest pre-computed hyperUnique value #3864

@DaimonPl

We have a Druid ingestion use case in which pre-computed HLL indexes can be stored in approximately 6*10^4 rows, whereas preparing the same data in raw format and ingesting it into Druid requires 10^12 rows.

Because of this, we switched to the thetaSketch aggregator, which allows loading pre-computed theta sketches.

However, in our use case we only need the count-distinct feature, so the theta sketch adds too much overhead (it consumes much more space than hyperUnique).

It would be great if:

  • Druid allowed ingesting pre-computed hyperUnique values (as thetaSketch does)
  • Druid exposed a public API for computing and merging hyperUnique values, so that they can be pre-computed in other systems (for example, in Hive using a UDF)
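
As a rough illustration of the compute-and-merge API asked for in the second point, here is a minimal, generic HyperLogLog sketch in Python. It is an assumption-laden toy, not Druid's hyperUnique implementation or serialization format: the class name `HllSketch`, its methods, the SHA-1 hash, and the 2048-register size are all hypothetical choices made for the example.

```python
import hashlib
import math


class HllSketch:
    """Toy HyperLogLog (hypothetical API, not Druid's hyperUnique format)."""

    def __init__(self, precision=11):
        self.p = precision                 # number of hash bits used as register index
        self.m = 1 << precision            # number of registers (2048 when p=11)
        self.registers = bytearray(self.m)

    def add(self, value):
        # Take 64 bits of a SHA-1 digest as the hash of the value.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                   # top p bits select the register
        rest = h & ((1 << width) - 1)      # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Merging is a register-wise maximum; both sketches need the same precision.
        if other.p != self.p:
            raise ValueError("precision mismatch")
        merged = HllSketch(self.p)
        merged.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Usage: build two partial sketches elsewhere (e.g. per partition), then merge.
part_a = HllSketch()
part_b = HllSketch()
for i in range(5000):
    part_a.add(f"user-{i}")
for i in range(2500, 7500):
    part_b.add(f"user-{i}")
combined = part_a.merge(part_b)
print(round(combined.estimate()))
```

The point of the sketch is that merging only needs the register arrays, which is why an external system such as a Hive UDF could pre-aggregate them, provided Druid accepted the resulting bytes in a documented format.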
