Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure?
#12989
Comments
If nobody else is working on this, I think I'd like to take it!
@msfroh - I was looking into this as well and had some thoughts about how to do it. We could replace
To implement this interface, we could use an
There are definitely some disadvantages with the block pool idea:
What do you think? Did you have something else in mind?
Oh -- I didn't have anything in mind. I just saw the issue and thought, "Hey, I could figure out how to do that!" Sounds like you've got it in hand, though!
I'd be happy to work together on it! If we go the route I was proposing, there's a non-trivial amount of work to do:
1 and 2 can be done independently, so we could each take one of those work streams. I'll start on it in the next few days, but feel free to jump in if you get the chance.
I took a look and I think we might be able to do it a little easier:
Then within
I ended up running with that idea (sort of) and implemented this: #12995. The unit tests pass, but I don't think any of them allocate more than 8192 ordinals (the chunk size that I set).
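Assuming a power-of-two chunk size like the 8192 mentioned above, the ordinal-to-chunk addressing reduces to a shift and a mask. This is a hedged sketch of that arithmetic, not code from the PR; `ChunkMath` and its method names are hypothetical:

```java
// Hypothetical sketch: mapping a facet ordinal to a chunk index and an
// offset within that chunk, assuming a power-of-two chunk size of 8192.
final class ChunkMath {
    static final int CHUNK_SIZE = 8192; // must be a power of two
    static final int CHUNK_SHIFT = Integer.numberOfTrailingZeros(CHUNK_SIZE); // 13
    static final int CHUNK_MASK = CHUNK_SIZE - 1;

    // Which chunk holds this ordinal?
    static int chunkIndex(int ordinal) {
        return ordinal >>> CHUNK_SHIFT;
    }

    // Where inside that chunk does the ordinal live?
    static int offsetInChunk(int ordinal) {
        return ordinal & CHUNK_MASK;
    }
}
```

With a power-of-two chunk size, both operations are single instructions, so the lookup cost over a flat `int[]` is one extra array dereference.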
Thanks @msfroh! The PR looks neat and you might be right that, while
What you've missed is that I'm a big dum-dum 😁 Thanks for catching that! I refactored some code into a shared method (between the "reuse old arrays" case and the "start fresh with a TaxonomyReader" case) and foolishly applied the "start fresh" logic every time. I've fixed it in a subsequent commit (allocating chunks only starting from the index of the last chunk of the old array). I also incorporated several of the other changes that you suggested. Thanks a lot!
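The fixed "reuse old arrays" path described above might look roughly like this sketch (hypothetical names, not the actual commit): fully-populated old chunks are carried over by reference, and fresh chunks are allocated only starting from the index of the last (possibly partial) old chunk.

```java
// Hypothetical sketch of the "reuse old arrays" refresh path: share
// fully-populated old chunks by reference and allocate new chunks only
// from the last (possibly partial) old chunk onward.
final class ChunkedArrayGrow {
    static int[][] grow(int[][] oldChunks, int oldSize, int newSize, int chunkSize) {
        int newChunkCount = (newSize + chunkSize - 1) / chunkSize;
        int[][] newChunks = new int[newChunkCount][];
        int firstChunkToAllocate = oldSize / chunkSize; // index of the last partial old chunk
        // Fully-populated old chunks are reused directly -- no copying, no new allocation.
        for (int i = 0; i < firstChunkToAllocate; i++) {
            newChunks[i] = oldChunks[i];
        }
        // Allocate fresh chunks from there on, carrying over any old partial contents.
        for (int i = firstChunkToAllocate; i < newChunkCount; i++) {
            newChunks[i] = new int[chunkSize];
            if (i < oldChunks.length) {
                System.arraycopy(oldChunks[i], 0, newChunks[i], 0, oldChunks[i].length);
            }
        }
        return newChunks;
    }
}
```

The "start fresh" bug described above corresponds to always allocating from chunk 0; the fix is that `firstChunkToAllocate` starts at the old array's last chunk, so the transient allocation is at most one chunk plus whatever genuinely new chunks the refresh needs.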
Description
At Amazon product search we use taxonomy facets for the facet filtering we show customers, but this causes high RAM pressure on every refresh: the current implementation allocates a new massive `int[]` on each refresh, requiring ~2X the transient RAM usage until the old taxonomy reader is fully closed / dereferenced, causing us to over-size our heaps just to handle this short RAM surge.

Yet, on each refresh, all that is really happening is that a few new ints might be effectively appended to the end of the old `int[]` -- it is inherently a write-once, append-only data structure. Reallocating the full massive `int[]` every time is silly.

I think we could switch to a paged `int[]` structure? This way the new reader could share nearly all of the old `int[]` pages, and only make a new last page to hold the few newly appended ints?
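The proposed paged structure could be sketched roughly as below (a hypothetical `PagedIntArray`, not Lucene code): each snapshot is immutable, full pages are shared between the old and new reader, and only the last, partially-filled page is copied before the new ordinals are appended.

```java
// Hypothetical sketch: an immutable paged int[] whose refresh shares all
// full pages with the previous snapshot and copies only the final,
// partially-filled page before appending new ordinals.
final class PagedIntArray {
    private static final int PAGE_SHIFT = 13; // 8192 ints per page
    private static final int PAGE_SIZE = 1 << PAGE_SHIFT;
    private static final int PAGE_MASK = PAGE_SIZE - 1;

    private final int[][] pages;
    private final int size;

    private PagedIntArray(int[][] pages, int size) {
        this.pages = pages;
        this.size = size;
    }

    static PagedIntArray empty() {
        return new PagedIntArray(new int[0][], 0);
    }

    int get(int index) {
        return pages[index >>> PAGE_SHIFT][index & PAGE_MASK];
    }

    int size() {
        return size;
    }

    /** Returns a new snapshot with {@code values} appended; full pages are shared, not copied. */
    PagedIntArray append(int[] values) {
        int newSize = size + values.length;
        int newPageCount = (newSize + PAGE_SIZE - 1) >>> PAGE_SHIFT;
        int[][] newPages = new int[newPageCount][];
        // Share every page of the old snapshot by reference...
        System.arraycopy(pages, 0, newPages, 0, pages.length);
        // ...but clone the last (partial) page so the old snapshot stays immutable.
        int lastOldPage = size >>> PAGE_SHIFT;
        if (pages.length > 0 && lastOldPage < pages.length) {
            newPages[lastOldPage] = pages[lastOldPage].clone();
        }
        int writeIndex = size;
        for (int v : values) {
            int page = writeIndex >>> PAGE_SHIFT;
            if (newPages[page] == null) {
                newPages[page] = new int[PAGE_SIZE];
            }
            newPages[page][writeIndex & PAGE_MASK] = v;
            writeIndex++;
        }
        return new PagedIntArray(newPages, newSize);
    }
}
```

Under this scheme a refresh's transient allocation is bounded by one page (plus any genuinely new pages), rather than by the full size of the parent/child/sibling arrays.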