[Parquet] Stop converting TSDB block to parquet if it has too many labels #7195

@yeya24

Is your feature request related to a problem? Please describe.
Parquet is a columnar format. However, the parquet library we are using has an upper bound on the number of columns in a single parquet file, and exceeding that limit causes the library to panic.

We have a workaround: when converting a block, the converter shards the output into multiple parquet files if it determines that the number of columns would exceed the upper bound. However, if the TSDB block has too many labels (each of which becomes a parquet column), this creates a large number of shards, which adds unnecessary complexity and cost to converting the block.
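One way such a sharding workaround could work is sketched below in Go. It assumes each parquet file holds a subset of the block's series and gets one column per distinct label name among those series; `Series`, `shardSeries`, and the greedy packing strategy are hypothetical illustrations, not the converter's actual code.

```go
package main

import "fmt"

// Series is a hypothetical, simplified stand-in for a TSDB series:
// only its label names matter for column counting.
type Series struct {
	LabelNames []string
}

// shardSeries greedily assigns series (in block order) to shards so that
// the number of distinct label names, i.e. parquet columns, in each shard
// never exceeds maxColumns. A series wider than maxColumns on its own
// cannot be placed anywhere and yields an error.
func shardSeries(series []Series, maxColumns int) ([][]Series, error) {
	var shards [][]Series
	var current []Series
	columns := map[string]struct{}{}

	for _, s := range series {
		if len(s.LabelNames) > maxColumns {
			return nil, fmt.Errorf("series has %d labels, limit is %d", len(s.LabelNames), maxColumns)
		}
		// Count the label names this series would add to the current shard.
		added := 0
		for _, name := range s.LabelNames {
			if _, ok := columns[name]; !ok {
				added++
			}
		}
		// Close the current shard if this series would push it over the limit.
		if len(columns)+added > maxColumns {
			shards = append(shards, current)
			current = nil
			columns = map[string]struct{}{}
		}
		current = append(current, s)
		for _, name := range s.LabelNames {
			columns[name] = struct{}{}
		}
	}
	if len(current) > 0 {
		shards = append(shards, current)
	}
	return shards, nil
}

func main() {
	series := []Series{
		{LabelNames: []string{"__name__", "job", "instance"}},
		{LabelNames: []string{"__name__", "job", "pod"}},
		{LabelNames: []string{"__name__", "namespace"}},
	}
	shards, err := shardSeries(series, 4)
	if err != nil {
		panic(err)
	}
	for i, shard := range shards {
		fmt.Println("shard", i, "holds", len(shard), "series")
	}
}
```

With a limit of 4 columns, the first two series share a shard (4 distinct label names) and the third starts a new one. The more distinct label names a block has, the faster each shard's column budget fills up, so the shard count grows with label cardinality.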

The typical parquet column limit is 32,767, but I have seen TSDB blocks with 2M distinct labels. That would result in roughly 60 shards, which seems like too many.
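The shard estimate is a ceiling division. The sketch below assumes every distinct label becomes exactly one column and each file may hold at most the column limit; `minShards` is a hypothetical helper. Under those assumptions the result is a lower bound, because every label must land in at least one file.

```go
package main

import "fmt"

// minShards returns the fewest parquet files needed if each distinct
// label is one column and a file may hold at most maxColumns columns.
// Real sharding can need more (e.g. imperfect packing of series).
func minShards(distinctLabels, maxColumns int) int {
	if distinctLabels <= 0 {
		return 0
	}
	// Integer ceiling division.
	return (distinctLabels + maxColumns - 1) / maxColumns
}

func main() {
	fmt.Println(minShards(2_000_000, 32767)) // 62
	fmt.Println(minShards(32767, 32767))     // 1
}
```

For 2,000,000 labels and a 32,767-column limit this gives 62 files, consistent with the rough figure of 60; any columns reserved for non-label data would push the number higher.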

Describe the solution you'd like
If the converter finds that a TSDB block has more labels than a configured threshold, it should upload a no-convert marker for that block. On subsequent runs, the converter skips any block for which the no-convert marker exists.
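A minimal sketch of the proposed skip logic in Go. `Bucket`, `memBucket`, `maybeConvert`, the marker object name, and its JSON body are all hypothetical; the issue only specifies that a no-convert marker is uploaded when a configured label threshold is exceeded and that marked blocks are skipped afterwards. Treating a threshold of 0 as "check disabled" is also an assumption.

```go
package main

import (
	"fmt"
	"path"
)

// noConvertMarkerName is a hypothetical object name for the marker.
const noConvertMarkerName = "parquet-no-convert-mark.json"

// Bucket is a minimal, hypothetical object-store interface for this sketch.
type Bucket interface {
	Exists(name string) bool
	Upload(name string, data []byte) error
}

// memBucket is an in-memory Bucket used only for demonstration.
type memBucket map[string][]byte

func (b memBucket) Exists(name string) bool {
	_, ok := b[name]
	return ok
}

func (b memBucket) Upload(name string, data []byte) error {
	b[name] = data
	return nil
}

// maybeConvert skips a block that already has a no-convert marker without
// inspecting it. Otherwise it counts the block's distinct labels; if the
// count exceeds maxLabels (0 disables the check), it uploads a marker and
// skips the block, else it converts the block.
func maybeConvert(
	bkt Bucket,
	blockID string,
	maxLabels int,
	countLabels func(blockID string) (int, error),
	convert func(blockID string) error,
) (skipped bool, err error) {
	marker := path.Join(blockID, noConvertMarkerName)
	if bkt.Exists(marker) {
		return true, nil
	}
	n, err := countLabels(blockID)
	if err != nil {
		return false, fmt.Errorf("count labels of %s: %w", blockID, err)
	}
	if maxLabels > 0 && n > maxLabels {
		body := fmt.Sprintf(`{"reason":"too many labels","labels":%d,"limit":%d}`, n, maxLabels)
		if err := bkt.Upload(marker, []byte(body)); err != nil {
			return false, fmt.Errorf("upload no-convert marker for %s: %w", blockID, err)
		}
		return true, nil
	}
	return false, convert(blockID)
}

func main() {
	bkt := memBucket{}
	count := func(string) (int, error) { return 2_000_000, nil }
	convert := func(id string) error { fmt.Println("converting", id); return nil }

	// First run: too many labels, so a marker is uploaded and the block is skipped.
	skipped, err := maybeConvert(bkt, "block-A", 100_000, count, convert)
	fmt.Println(skipped, err, bkt.Exists("block-A/"+noConvertMarkerName))

	// Later runs: the marker alone is enough to skip the block.
	skipped, err = maybeConvert(bkt, "block-A", 100_000, count, convert)
	fmt.Println(skipped, err)
}
```

Checking for the marker before counting labels means later runs skip a known-too-wide block without re-reading it. The 100,000 threshold in `main` is an arbitrary example value.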

