Research ability to support very high slices without allocating for intermediate slices #593
Comments
I tried digging into this and #591, but there are several issues I've run into. First, we can get around the initial allocation; however, that still leaves us with 5,739,353,673,740 slices to process. We don't track which slices have data, AFAIK, so we'd have to check each one.

One option would be to reduce the number of slices by increasing the slice width. We're currently at 1,048,576, so we'd have to increase it by quite a bit, and even a much larger slice width wouldn't bring the slice count down to something manageable.

The only option I can think of that would make any sense would be to provide a translation table for column keys.

@travisturner What do you think?
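For a rough sense of the numbers above, here's a small back-of-the-envelope sketch in Go (Pilosa's implementation language). The column value is hypothetical, chosen only so it lands at roughly the scale implied by the slice count quoted above; `sliceWidth` matches the 1,048,576 figure.

```go
package main

import "fmt"

const sliceWidth = 1 << 20 // 1,048,576, the current slice width mentioned above

func main() {
	// Hypothetical column ID at roughly the scale implied by the comment:
	// ~5.7 trillion slices * 1,048,576 columns per slice.
	col := uint64(5739353673740) * sliceWidth

	slice := col / sliceWidth
	fmt.Printf("column %d falls in slice %d\n", col, slice)
	// Without tracking which slices hold data, a scan would have to visit
	// every slice from 0 through `slice`, even though almost all are empty.

	// Even a much larger slice width barely dents the count.
	const biggerWidth = 1 << 30
	fmt.Printf("with a %d-wide slice there are still %d slices\n", biggerWidth, col/biggerWidth)
}
```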
@benbjohnson Being able to do that translation (for rows as well as columns, actually) would be extremely valuable. This is definitely something we've discussed (at length), and we basically haven't been confident enough to do it. There are a lot of finicky bits (pun?) in the distributed case, and the translation needs to be fast in both directions to support ingestion and queries.

For this particular issue, I was wondering if we could do something similar to the way we track which containers are in use in roaring with the slice of "keys".

TL;DR: if you think doing that translation layer is feasible, I'd love to hear more.
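To make the roaring analogy concrete, here is a minimal sketch (the type and method names are assumptions, not Pilosa's actual code) of tracking only the slices that contain data with a sorted key list, the same way roaring keeps a sorted slice of container keys:

```go
package pilosa

import "sort"

// sliceSet tracks which slices actually contain data, analogous to the
// sorted "keys" slice roaring uses to track which containers exist.
type sliceSet struct {
	keys []uint64 // sorted slice numbers that hold at least one bit
}

// add marks a slice as containing data.
func (s *sliceSet) add(slice uint64) {
	i := sort.Search(len(s.keys), func(i int) bool { return s.keys[i] >= slice })
	if i < len(s.keys) && s.keys[i] == slice {
		return // already tracked
	}
	s.keys = append(s.keys, 0)
	copy(s.keys[i+1:], s.keys[i:])
	s.keys[i] = slice
}

// contains reports whether a slice holds any data.
func (s *sliceSet) contains(slice uint64) bool {
	i := sort.Search(len(s.keys), func(i int) bool { return s.keys[i] >= slice })
	return i < len(s.keys) && s.keys[i] == slice
}
```

With something like this, iterating a fragment would only touch slices that actually exist, and lookups stay logarithmic in the number of nonempty slices rather than linear in the maximum slice.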
@benbjohnson A translation table like you suggested sounds great, and we've talked about something like that for a while, but we have yet to come up with a solution that isn't too complex. There are several concerns, a couple of which are:
If you have an idea for how we might approach this in a reasonable way, that would be great. Alternatively, would it be easier for us to simply keep track of which slices contain data? Would that help with this particular problem?
If we limited it to 32-byte strings, then we're looking at 32GB per billion keys. That would be ~950 slices. Are we expecting higher cardinality than that?

Keeping the whole translation table on each node is fastest, since a node wouldn't have to do an extra hop to translate. However, syncing 32GB onto each node wouldn't be fun. Sharding the translation table would mean that each query would have to fan out to the shard's owner before querying. That seems like it's mostly an issue on ingest, since read queries would be aggregations and generally wouldn't use specific keys, right?

Another option would be to have a translation service in front of the Pilosa cluster. It could be a small, strongly consistent cluster whose only job is to add new translation keys, rewrite queries to the cluster, and rewrite results from the cluster (if there are keys in the results). It could live independently of the primary cluster and only be required if translation is used.
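As a sketch of the translation idea (the names and API here are hypothetical, not a proposal for Pilosa's actual interface), a store that hands out dense sequential IDs for string keys keeps column values small no matter how sparse the raw key space is. The 32GB figure above is just 32 bytes/key × 10⁹ keys.

```go
package translate

import "sync"

// Store maps arbitrary string keys to dense, sequential column IDs so that
// callers never set bits at astronomically large raw values.
type Store struct {
	mu   sync.RWMutex
	ids  map[string]uint64 // key -> assigned column ID
	keys []string          // column ID -> key, for translating results back
}

func NewStore() *Store {
	return &Store{ids: make(map[string]uint64)}
}

// TranslateKey returns the column ID for key, assigning the next ID if the
// key has not been seen before.
func (s *Store) TranslateKey(key string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.ids[key]; ok {
		return id
	}
	id := uint64(len(s.keys))
	s.ids[key] = id
	s.keys = append(s.keys, key)
	return id
}

// TranslateID maps a column ID back to its key, e.g. when rewriting results.
func (s *Store) TranslateID(id uint64) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if id >= uint64(len(s.keys)) {
		return "", false
	}
	return s.keys[id], true
}
```

A sharded table or an external translation service would need the same two operations, plus a consistent way to allocate the next ID across nodes.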
Closing this since it seems to be covered by #1008.
Description
In #591 we see that Pilosa can't handle extremely high column values, even when many intermediate columns aren't used.
Success criteria (What criteria will consider this ticket closeable?)
Either we've changed the way that Pilosa tracks slices to allow for this kind of behavior, or we've decided explicitly not to support it, documented that decision, and returned reasonable errors to clients who try it (which #591 should account for).
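If the "explicitly not support it" route is taken, the error path could be as simple as a bounds check on the incoming column ID. The cap below is hypothetical and only illustrates the shape of such a check, not Pilosa's actual API or limits.

```go
package pilosa

import "fmt"

const (
	sliceWidth    = 1 << 20
	maxSliceCount = 1 << 24 // hypothetical cap on how many slices a node will track
)

// validateColumn rejects column IDs that would imply more slices than we support.
func validateColumn(col uint64) error {
	if slice := col / sliceWidth; slice >= maxSliceCount {
		return fmt.Errorf("column %d exceeds the maximum supported slice (%d)", col, maxSliceCount-1)
	}
	return nil
}
```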