This issue requests support for segment compaction on real-time upsert-enabled tables, which currently does not exist, as mentioned in a Slack thread. Without compaction, segments holding old and stale entries are kept on disk and are only deleted when the segment retention policy kicks in.
A concrete example of why this is useful:
Suppose you have a stream of events related to user activity (updated profile, saw an article, updated preferences, etc.).
You define a real-time table in Pinot where the primary key is the userId. Segment size is 500k rows and the stream is partitioned.
The set of users is roughly fixed (~50M).
You want to keep segments for a long time period (> 2 years).
Each day ~20% (10M) of the users generate some event which is consumed by Pinot.
This will generate ~20 segments per day (10M events / 500k per segment). Over the course of 2 years we will have ~14,600 segments (20 × 730 days), when in reality we need only ~100 segments (50M users / 500k per segment), i.e. the most up-to-date information for each user.
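To make the arithmetic explicit, here is a minimal back-of-the-envelope sketch. It assumes one event per active user per day, segments filled to exactly 500k rows, and a 730-day retention window; the function and constant names are purely illustrative and are not part of Pinot.

```python
def segments_for(rows: int, rows_per_segment: int) -> int:
    """Number of segments needed to hold `rows`, rounding up (integer ceiling)."""
    return -(-rows // rows_per_segment)


# Figures taken from the example above.
TOTAL_USERS = 50_000_000         # roughly fixed set of primary keys (userId)
DAILY_ACTIVE_USERS = 10_000_000  # ~20% of users produce an event each day
ROWS_PER_SEGMENT = 500_000       # segment size
RETENTION_DAYS = 2 * 365         # assumption: 2 years = 730 days

# Assumption: one event per active user per day.
segments_per_day = segments_for(DAILY_ACTIVE_USERS, ROWS_PER_SEGMENT)

# Without compaction, every segment stays on disk until retention removes it.
segments_without_compaction = segments_per_day * RETENTION_DAYS

# With ideal compaction, only the latest row per userId has to be kept.
segments_with_compaction = segments_for(TOTAL_USERS, ROWS_PER_SEGMENT)

print(segments_per_day)             # 20
print(segments_without_compaction)  # 14600
print(segments_with_compaction)     # 100
```

Under these assumptions, the table holds about 146 times more segments than the latest state of each user actually requires.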
If the example or issue is not clear feel free to reach out.
Thank you.