Description
Is your feature request related to a problem? Please describe.
Parquet is a columnar format. However, the Parquet library we use has an upper bound on the number of columns in a single Parquet file, and exceeding that limit causes the library to panic.
As a workaround, the converter shards the Parquet file when it detects that the number of columns would exceed the upper bound. However, if the TSDB block has too many labels (each label corresponds to a Parquet column), the converter creates many shards, which adds unnecessary complexity and cost to converting the block.
The typical Parquet column limit is 32767, while I have seen TSDB blocks with 2M distinct labels. Such a block would need about 60 shards, which seems excessive.
Describe the solution you'd like
If the converter finds that the number of labels in a TSDB block exceeds a configured threshold, it uploads a no-convert marker for that block. On subsequent runs, the converter skips any block for which the no-convert marker exists.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.