LUCENE-9996: Reduce RAM usage of DWPT for a single document. #184

jpountz · 2021-06-15T08:11:31Z

With this change, doc-value terms dictionaries use a shared `ByteBlockPool` across all fields, and points, binary doc values and doc-value ordinals use slightly smaller page sizes.
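To see why sharing one pool helps a single-document DWPT, here is a minimal toy model. It is not Lucene's API: `PagedBytePool`, `perFieldPools` and `sharedPool` are hypothetical names. It assumes only that a pool allocates memory one fixed-size page at a time, so the first term written to a pool costs a whole page. With one pool per field, 100 fields pay for 100 pages. With one shared pool, 100 short terms fit in a single page. The page sizes used (32 KiB, which matches `ByteBlockPool.BYTE_BLOCK_SIZE`, and 16 KiB) are illustrative and are not the sizes chosen in this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Lucene's API) of per-field vs shared paged byte pools. */
public class SharedPoolSketch {

  /** Allocates memory one fixed-size page at a time, like a block pool. */
  static final class PagedBytePool {
    private final int pageSize;
    private final List<byte[]> pages = new ArrayList<>();
    private int upto; // write offset into the current page

    PagedBytePool(int pageSize) {
      this.pageSize = pageSize;
      this.upto = pageSize; // forces a page allocation on the first append
    }

    /** Copies bytes into the current page, allocating a new page when full; assumes bytes.length <= pageSize. */
    void append(byte[] bytes) {
      if (upto + bytes.length > pageSize) {
        pages.add(new byte[pageSize]);
        upto = 0;
      }
      System.arraycopy(bytes, 0, pages.get(pages.size() - 1), upto, bytes.length);
      upto += bytes.length;
    }

    long allocatedBytes() {
      return (long) pages.size() * pageSize;
    }
  }

  /** One pool per field: every field pays for at least one full page. */
  static long perFieldPools(int numFields, int pageSize, byte[] term) {
    Map<String, PagedBytePool> pools = new HashMap<>();
    for (int i = 0; i < numFields; i++) {
      pools.computeIfAbsent("field" + i, f -> new PagedBytePool(pageSize)).append(term);
    }
    long total = 0;
    for (PagedBytePool pool : pools.values()) {
      total += pool.allocatedBytes();
    }
    return total;
  }

  /** One pool shared by all fields: small per-field dictionaries pack into the same pages. */
  static long sharedPool(int numFields, int pageSize, byte[] term) {
    PagedBytePool shared = new PagedBytePool(pageSize);
    for (int i = 0; i < numFields; i++) {
      shared.append(term);
    }
    return shared.allocatedBytes();
  }

  public static void main(String[] args) {
    byte[] term = "Lucene".getBytes(StandardCharsets.UTF_8);
    for (int pageSize : new int[] {32 * 1024, 16 * 1024}) { // illustrative page sizes
      System.out.println("page=" + pageSize
          + " per-field=" + perFieldPools(100, pageSize, term)
          + " shared=" + sharedPool(100, pageSize, term));
    }
  }
}
```

In this model the shared pool's cost no longer scales with the number of fields, and a smaller page size shrinks the fixed cost paid by every structure that still allocates at least one page.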

jpountz · 2021-06-15T08:12:57Z

On this simple test, the RAM used by the DWPT (`DocumentsWriterPerThread`) to buffer a single document goes from 9.6MB to 1.4MB.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.util.BytesRef;

// Buffers one document with 100 keyword fields and 100 numeric fields, then
// prints how much RAM the IndexWriter uses to hold it.
public class IWBuffer {

  public static void main(String[] args) throws Exception {
    // A large RAM buffer keeps the document buffered in the DWPT instead of flushing it.
    try (ByteBuffersDirectory dir = new ByteBuffersDirectory();
        IndexWriter w = new IndexWriter(dir, new IndexWriterConfig().setRAMBufferSizeMB(1000))) {
      Document doc = new Document();
      for (int i = 0; i < 100; ++i) {
        // Indexed + stored keyword, plus a sorted-set doc value on the same field.
        String keywordFieldName = "keyword" + i;
        doc.add(new StringField(keywordFieldName, "Lucene", Store.YES));
        doc.add(new SortedSetDocValuesField(keywordFieldName, new BytesRef("Lucene")));

        // Point, sorted-numeric doc value and stored value on the same field.
        String numericFieldName = "numeric" + i;
        doc.add(new IntPoint(numericFieldName, 4));
        doc.add(new SortedNumericDocValuesField(numericFieldName, 4));
        doc.add(new StoredField(numericFieldName, 4));
      }
      w.addDocument(doc);
      // ramBytesUsed() returns bytes; convert to MB.
      System.out.println(w.ramBytesUsed() / 1024. / 1024.);
    }
  }

}
```
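The reported numbers can be put in perspective with a little arithmetic. The sketch below computes the relative reduction from 9.6MB to 1.4MB, and a rough estimate of what one 32 KiB block (the value of `ByteBlockPool.BYTE_BLOCK_SIZE`) per sorted-set doc-values field would cost for the 100 keyword fields in the test. The one-block-per-field figure is an assumption made for illustration, not a measurement from this PR; `IWBufferArithmetic`, `reduction` and `oneBlockPerField` are hypothetical names.

```java
/** Back-of-the-envelope arithmetic on the numbers reported for IWBuffer (illustrative only). */
public class IWBufferArithmetic {

  /** Fractional reduction going from {@code before} to {@code after}. */
  static double reduction(double before, double after) {
    return (before - after) / before;
  }

  /** Bytes used if each of {@code numFields} per-field pools holds exactly one block (assumption). */
  static long oneBlockPerField(int numFields, int blockSize) {
    return (long) numFields * blockSize;
  }

  public static void main(String[] args) {
    // Reported by the test above: 9.6MB before the change, 1.4MB after.
    System.out.printf("reduction: %.1f%%%n", 100 * reduction(9.6, 1.4));
    // Assumed: one 32 KiB block per sorted-set doc-values field, 100 such fields.
    long bytes = oneBlockPerField(100, 32 * 1024);
    System.out.printf("100 x 32 KiB = %d bytes = %.3f MB%n", bytes, bytes / 1024. / 1024.);
  }
}
```

Under that assumption, per-field pools for the sorted-set fields alone would account for roughly 3.1MB of the 8.2MB saved; the rest would come from the other structures whose page sizes shrank.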

dnhatn

LGTM.

LUCENE-9996: Reduce RAM usage of DWPT for a single document.

63a534b

rmuir approved these changes Jun 16, 2021

dnhatn approved these changes Jun 17, 2021

jpountz added 2 commits June 18, 2021 09:15

Merge branch 'main' into lucene9996

235d095

iter

e5065a1

jpountz merged commit 1365156 into apache:main Jun 18, 2021

jpountz deleted the lucene9996 branch June 18, 2021 07:17
