This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

Research ability to support very high slices without allocating for intermediate slices #593

Closed
jaffee opened this issue May 31, 2017 · 5 comments

@jaffee
Member

jaffee commented May 31, 2017

Description

In #591 we see that Pilosa can't handle extremely high column IDs, even when most of the intermediate columns are unused.

Success criteria (What criteria will consider this ticket closeable?)

Either we've changed the way that Pilosa tracks slices to allow for this kind of behavior, or we've decided to explicitly not support it, documented it, and returned reasonable errors to clients who try to do it (which #591 should account for).

@benbjohnson benbjohnson self-assigned this Oct 31, 2017
@benbjohnson
Contributor

I tried digging into this and #591, but there are several issues I've run into.

First, we can get around the initial `inverseSlices` panic by changing `slices []uint64` to a generator (e.g. `slices func() uint64`) that can stream all slice numbers instead of building a slice of slice numbers.
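Roughly what I have in mind, as a minimal sketch (the names are made up, not the actual executor code):

```go
package main

import "fmt"

// sliceGenerator streams slice numbers in [0, maxSlice] one at a time,
// so we never materialize a huge []uint64 of slice numbers.
// Illustrative only; not the real inverseSlices code.
func sliceGenerator(maxSlice uint64) func() (uint64, bool) {
	next := uint64(0)
	return func() (uint64, bool) {
		if next > maxSlice {
			return 0, false
		}
		s := next
		next++
		return s, true
	}
}

func main() {
	gen := sliceGenerator(3)
	for s, ok := gen(); ok; s, ok = gen() {
		fmt.Println(s) // 0, 1, 2, 3
	}
}
```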

However, that still leaves us with 5,739,353,673,740 slices to process. We don't track which slices have data AFAIK so we'd have to check each one.

One option would be reducing the number of slices by increasing the slice width. We're currently at 1,048,576, so we'd have to increase it by quite a bit. But even if we increased the slice width to MaxUint32, we'd still have 1,401,209,393 slices.
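For reference, those counts are just integer division of the column ID from #591 by the slice width:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Column ID from #591.
	const columnID uint64 = 6018148517796380732

	fmt.Println(columnID / (1 << 20))      // current 2^20 width: ~5,739,353,673,740 slices
	fmt.Println(columnID / math.MaxUint32) // MaxUint32 width: ~1,401,209,393 slices
}
```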

The only option I can think of that would make any sense would be to provide a translation table for `columnID`, so that a column ID like 6018148517796380732 could be stored as 1 internally. Maybe the translation table could use strings so we could support hexadecimal UUIDs as well. That way we could accept the user's value as "6018148517796380732" and treat it as just an opaque string.
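To make the idea concrete, here's a bare-bones in-memory sketch (hypothetical names, ignoring persistence and distribution entirely):

```go
package main

import "fmt"

// keyTranslator maps opaque string keys (e.g. "6018148517796380732" or a
// hex UUID) to small, dense internal column IDs. Illustration only.
type keyTranslator struct {
	toID  map[string]uint64
	toKey []string
}

func newKeyTranslator() *keyTranslator {
	return &keyTranslator{toID: make(map[string]uint64)}
}

// TranslateKey returns the internal ID for key, assigning the next
// dense ID if the key hasn't been seen before.
func (t *keyTranslator) TranslateKey(key string) uint64 {
	if id, ok := t.toID[key]; ok {
		return id
	}
	id := uint64(len(t.toKey))
	t.toID[key] = id
	t.toKey = append(t.toKey, key)
	return id
}

// TranslateID returns the original key for an internal ID.
func (t *keyTranslator) TranslateID(id uint64) string {
	return t.toKey[id]
}

func main() {
	tr := newKeyTranslator()
	fmt.Println(tr.TranslateKey("6018148517796380732")) // 0
	fmt.Println(tr.TranslateKey("deadbeef-cafe-uuid"))  // 1
	fmt.Println(tr.TranslateID(1))                      // deadbeef-cafe-uuid
}
```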

@travisturner What do you think?

@jaffee
Member Author

jaffee commented Nov 10, 2017

@benbjohnson being able to do that translation (for rows as well as columns, actually) would be extremely valuable. This is definitely something we've discussed (at length), but we haven't been confident enough to do it. There are a lot of finicky bits (pun intended?) in the distributed case, and the translation needs to be fast in both directions to support ingestion and queries.

For this particular issue, I was wondering if we could do something similar to the way we track which containers are in use in roaring, with the slice of "keys" (see `Bitmap.insertAt` to jog your memory). I can tell, though, that we'll probably have to alter the query execution and fanout logic fairly significantly (or have strong consensus about which slices exist across the cluster). It may be that doing it this way isn't much easier than translating opaque strings.
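For what it's worth, here's roughly what I'm picturing, in the spirit of roaring's sorted container keys (illustrative only, not the real Bitmap code):

```go
package main

import (
	"fmt"
	"sort"
)

// sliceSet tracks which slices contain data using a sorted []uint64 of
// slice numbers, analogous to how roaring tracks container keys.
type sliceSet struct {
	keys []uint64
}

// Add marks a slice as containing data, keeping keys sorted.
func (s *sliceSet) Add(slice uint64) {
	i := sort.Search(len(s.keys), func(i int) bool { return s.keys[i] >= slice })
	if i < len(s.keys) && s.keys[i] == slice {
		return // already present
	}
	s.keys = append(s.keys, 0)
	copy(s.keys[i+1:], s.keys[i:])
	s.keys[i] = slice
}

// Slices returns only the slices that actually hold data, so query
// fanout can skip the (possibly trillions of) empty ones.
func (s *sliceSet) Slices() []uint64 { return s.keys }

func main() {
	var s sliceSet
	s.Add(5739353673740)
	s.Add(0)
	s.Add(5739353673740)    // duplicate, ignored
	fmt.Println(s.Slices()) // [0 5739353673740]
}
```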

TLDR; if you think doing that translation layer is feasible, I'd love to hear more.

@travisturner
Member

travisturner commented Nov 10, 2017

@benbjohnson A translation table like you suggested sounds great, and we've talked about something like that for a while, but we have yet to come up with a solution that isn't too complex. There are several concerns, a couple of which are:

  • ensuring that the lookup doesn't (too badly) affect performance
  • having the translation table available on all nodes (for example, what happens when two nodes receive a setbit to the same column key?)

If you have an idea for how we might approach this in a reasonable way, it would be great to hear it.

Alternatively, would it be easier for us to simply keep track of which slices contain data? Would that help with this particular problem?

@benbjohnson
Contributor

If we limited keys to 32-byte strings, then we're looking at 32GB per billion keys. A billion keys would also only span ~950 slices. Are we expecting higher cardinality than that?
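Back-of-the-envelope for those numbers, assuming 32-byte keys and the current 2^20 slice width:

```go
package main

import "fmt"

func main() {
	const (
		keyBytes   = 32         // assumed fixed-size key
		keys       = 1000000000 // one billion keys
		sliceWidth = 1 << 20    // current slice width
	)
	fmt.Println(keyBytes*keys/1000000000, "GB of raw key data") // 32 GB
	fmt.Println(keys/sliceWidth, "slices")                      // ~953 slices
}
```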

Keeping the whole translation table on each node would be fastest, since a node wouldn't have to do an extra hop to translate. However, syncing 32GB to every node wouldn't be fun.

Sharding the translation table would mean that each query would first have to go to the key's owner node before executing. That seems like it's mostly an issue on ingest, since read queries would be aggregations and generally wouldn't use specific keys, right?
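By "owner" I mean something like hashing the key to pick the node that stores its translation entry (hypothetical helper, not Pilosa's actual cluster partitioning):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownerNode hashes a key to pick the node responsible for its
// translation entry. Hypothetical sketch only.
func ownerNode(key string, nodes []string) string {
	h := fnv.New64a()
	h.Write([]byte(key))
	return nodes[h.Sum64()%uint64(len(nodes))]
}

func main() {
	nodes := []string{"node0", "node1", "node2"}
	fmt.Println(ownerNode("6018148517796380732", nodes))
}
```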

Another option would be to have a translation service in front of the Pilosa cluster. It could be a small, strongly consistent cluster whose only job would be to add new translation keys, rewrite queries going to the cluster, and rewrite results coming back from the cluster (if there are keys in the results). It could live independently of the primary cluster and only be required if translation is used.

@jaffee
Member Author

jaffee commented Aug 10, 2018

Closing this since it seems to be covered by #1008.

@jaffee jaffee closed this as completed Aug 10, 2018