
Taxonomy facets: can we change massive int[] for parent/child/sibling tree to paged/block int[] to reduce RAM pressure? #12989

Closed
mikemccand opened this issue Jan 2, 2024 · 8 comments · Fixed by #12995

Comments

@mikemccand
Member

Description

At Amazon product search we use taxonomy facets for the facet filtering we show customers, but they cause high RAM pressure on every refresh: the current implementation allocates a new massive int[] each time, requiring ~2X the transient RAM until the old taxonomy reader is fully closed / dereferenced. We end up over-sizing our heaps just to absorb this short RAM surge.

Yet on each refresh, all that really happens is that a few new ints are appended to the end of the old int[] -- it is inherently a write-once, append-only data structure. Reallocating the full massive int[] every time is silly.

I think we could switch to a paged int[] structure? That way the new reader could share nearly all of the old int[] pages and only allocate a new last page to hold the few newly appended ints.
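
A minimal sketch of the paged layout described above (the PagedInts class, PAGE_SIZE, and field names are illustrative, not the actual Lucene API): lookups resolve a page index plus an offset within the page, and pages that are already full never change, so the old and new readers can share them.

/** Illustrative sketch: a paged int[] whose full pages can be shared between the old and new reader. */
final class PagedInts {
  private static final int PAGE_SIZE = 8192; // page size chosen only for illustration

  private final int[][] pages; // full pages are never rewritten, so they can be shared across refreshes
  private final int size;      // number of valid ints across all pages

  PagedInts(int[][] pages, int size) {
    this.pages = pages;
    this.size = size;
  }

  /** Behaves like a plain array look-up: page index plus offset within the page. */
  int get(int index) {
    return pages[index / PAGE_SIZE][index % PAGE_SIZE];
  }

  int size() {
    return size;
  }
}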

@msfroh
Contributor

msfroh commented Jan 5, 2024

If nobody else is working on this, I think I'd like to take it!

@stefanvodita
Contributor

@msfroh - I was looking into this as well and had some thoughts about how to do it.

We could replace ParallelTaxonomyArrays with a new interface that offers three operations for each of the arrays:

interface ChunkedParallelTaxonomyArrays {

  /* Record new entry. */
  public void appendParent(int parent);

  /* Retrieve this ordinal's parent. From the user's perspective, this is like an array look-up. */
  public int getParent(int ord);

  /* There are some places where we need to know how many parents exist in total. */
  public int sizeParents();

  // Same for children and siblings
  ...
}

To implement this interface, we could use an IntBlockPool. We would allocate new int buffers in the block pool as needed and preserve the block pool across DirectoryTaxonomyReader refreshes.
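
A minimal sketch of the parent portion of such an interface, backed by a growable list of fixed-size int buffers standing in for the block pool (the ChunkedParentArray class and its constants are illustrative, not actual Lucene or IntBlockPool code):

import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the parent operations over fixed-size int buffers. */
final class ChunkedParentArray {
  private static final int BLOCK_SHIFT = 13; // 8192 ints per buffer
  private static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;
  private static final int BLOCK_MASK = BLOCK_SIZE - 1;

  private final List<int[]> buffers = new ArrayList<>();
  private int size; // number of parents recorded so far

  /** Record a new entry; a fresh buffer is allocated only when the current one is full. */
  public void appendParent(int parent) {
    if ((size & BLOCK_MASK) == 0) {
      buffers.add(new int[BLOCK_SIZE]);
    }
    buffers.get(size >> BLOCK_SHIFT)[size & BLOCK_MASK] = parent;
    size++;
  }

  /** Retrieve this ordinal's parent; from the caller's perspective this is an array look-up. */
  public int getParent(int ord) {
    return buffers.get(ord >> BLOCK_SHIFT)[ord & BLOCK_MASK];
  }

  /** Total number of parents recorded. */
  public int sizeParents() {
    return size;
  }
}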

There are definitely some disadvantages with the block pool idea:

  1. We're preserving a mutable data structure across taxonomy refreshes. There is precedent for this, though, with the caches in DirectoryTaxonomyReader.
  2. We would slightly over-allocate by leaving the last buffer in the pool not completely used, but I think this is a good trade-off for the increased efficiency and simplicity.

What do you think? Did you have something else in mind?

@msfroh
Contributor

msfroh commented Jan 5, 2024

What do you think? Did you have something else in mind?

Oh -- I didn't have anything in mind. I just saw the issue and thought, "Hey, I could figure out how to do that!" Sounds like you've got it in hand, though!

@stefanvodita
Contributor

I'd be happy to work together on it! If we go the route I was proposing, there's a non-trivial amount of work to do:

  1. Create the new interface for taxonomy arrays and use it with taxonomy facets and in the taxo reader.
  2. Augment the IntBlockPool with convenience methods that support this new use case and implement the new taxo array interface using the IntBlockPool.

1 and 2 can be done independently, so we could each take one of those work streams. I'll start on it in the next few days, but feel free to jump in if you get the chance.

@msfroh
Contributor

msfroh commented Jan 5, 2024

I took a look and I think we might be able to do it a little more simply:

public abstract class ParallelTaxonomyArrays {
  /** A view over fixed-size int[] chunks that behaves like one large int[]. */
  public static class ChunkedArray {
    private final int chunkSize;
    private final int[][] chunks;

    public ChunkedArray(int chunkSize, int[][] chunks) {
      this.chunkSize = chunkSize;
      this.chunks = chunks;
    }

    public int get(int i) {
      int chunkNum = i / chunkSize;
      return chunks[chunkNum][i - (chunkNum * chunkSize)];
    }

    public int length() {
      // Capacity of the allocated chunks; the last chunk may not be fully used.
      return chunkSize * chunks.length;
    }
  }

  /** Sole constructor. */
  public ParallelTaxonomyArrays() {}

  public abstract ChunkedArray parents();
  public abstract ChunkedArray children();
  public abstract ChunkedArray siblings();
}

Then within TaxonomyIndexArrays, we can focus on building int[][] instances that get wrapped in ChunkedArray. I feel like IntBlockPool might be overkill?
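
A minimal sketch of what that could look like (the ChunkedTaxonomyArrays class, its constructor, and CHUNK_SIZE are made up for illustration; only ChunkedArray comes from the sketch above): the refresh path only has to produce the int[][] chunks, and the wrapper handles indexing.

/** Illustrative sketch: a TaxonomyIndexArrays-style subclass that only needs to build int[][] chunks. */
class ChunkedTaxonomyArrays extends ParallelTaxonomyArrays {
  private static final int CHUNK_SIZE = 8192;

  private final ChunkedArray parents;
  private final ChunkedArray children;
  private final ChunkedArray siblings;

  ChunkedTaxonomyArrays(int[][] parentChunks, int[][] childChunks, int[][] siblingChunks) {
    this.parents = new ChunkedArray(CHUNK_SIZE, parentChunks);
    this.children = new ChunkedArray(CHUNK_SIZE, childChunks);
    this.siblings = new ChunkedArray(CHUNK_SIZE, siblingChunks);
  }

  @Override
  public ChunkedArray parents() {
    return parents;
  }

  @Override
  public ChunkedArray children() {
    return children;
  }

  @Override
  public ChunkedArray siblings() {
    return siblings;
  }
}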

@msfroh
Contributor

msfroh commented Jan 6, 2024

I ended up running with that idea (sort of) and implemented this: #12995

The unit tests pass, but I don't think any of them allocate more than 8192 ordinals (the chunk size that I set).
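
A minimal sketch of the kind of check that would cross a chunk boundary, using the ChunkedArray sketch above and assuming it is a public static nested class (a plain main method rather than the project's test framework; sizes are arbitrary):

/** Illustrative check that reads stay correct across chunk boundaries. */
public class ChunkedArrayBoundaryCheck {
  public static void main(String[] args) {
    int chunkSize = 8192;
    int numChunks = 4; // more than one chunk, so chunk boundaries are actually crossed
    int[][] chunks = new int[numChunks][chunkSize];
    for (int ord = 0; ord < numChunks * chunkSize; ord++) {
      chunks[ord / chunkSize][ord % chunkSize] = ord * 7; // arbitrary deterministic values
    }
    ParallelTaxonomyArrays.ChunkedArray array =
        new ParallelTaxonomyArrays.ChunkedArray(chunkSize, chunks);
    for (int ord = 0; ord < array.length(); ord++) {
      if (array.get(ord) != ord * 7) {
        throw new AssertionError("mismatch at ordinal " + ord);
      }
    }
    System.out.println("all " + array.length() + " ordinals read back correctly");
  }
}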

@stefanvodita
Contributor

Thanks @msfroh! The PR looks neat, and you might be right that, while IntBlockPool basically maintains an int[][] like our ChunkedArray, it is a bit inconvenient to work with.
I left more detailed comments on the PR, but the high-level question is whether we've actually reduced the memory footprint during taxonomy refreshes. It's very possible I'm missing something, but right now it looks to me like we haven't improved on that front. Doing shallow copies of the old array without allocating new memory would solve it though.

@msfroh
Contributor

msfroh commented Jan 7, 2024

It's very possible I'm missing something, but right now it looks to me like we haven't improved on that front. Doing shallow copies of the old array without allocating new memory would solve it though.

What you've missed is that I'm a big dum-dum 😁

Thanks for catching that! I refactored some code into a shared method (between the "reuse old arrays" case and the "start fresh with a TaxonomyReader" case) and foolishly applied the "start fresh" logic every time. I've fixed it in a subsequent commit (allocating chunks only starting from the index of the last chunk of the old array).
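
A minimal sketch of that grow-on-refresh logic (the ChunkGrowth helper and its names are illustrative, not the code in the PR): every completely filled chunk of the old array is reused by reference, and only the old, partially filled last chunk onward is copied or newly allocated.

import java.util.Arrays;

/** Illustrative helper for growing one chunked array on refresh while reusing old chunks. */
final class ChunkGrowth {
  static final int CHUNK_SIZE = 8192;

  /** Returns chunks sized for newSize entries, reusing the old, full chunks by reference. */
  static int[][] grow(int[][] oldChunks, int oldSize, int newSize) {
    int newChunkCount = (newSize + CHUNK_SIZE - 1) / CHUNK_SIZE;
    int[][] newChunks = Arrays.copyOf(oldChunks, newChunkCount);
    int firstMutableChunk = oldSize / CHUNK_SIZE; // index of the old array's partially filled last chunk
    for (int i = firstMutableChunk; i < newChunkCount; i++) {
      if (i < oldChunks.length) {
        newChunks[i] = Arrays.copyOf(oldChunks[i], CHUNK_SIZE); // copy, so the old reader is never mutated
      } else {
        newChunks[i] = new int[CHUNK_SIZE]; // brand-new chunk for appended ordinals
      }
    }
    return newChunks;
  }
}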

I also incorporated several of the other changes that you suggested. Thanks a lot!
