Stored fields #1653

Draft
wants to merge 2 commits into
base: master

mayya-sharipova
Contributor

Store schema separately for stored fields

Currently, stored fields for a document are stored in the following
format:
MetadataForField1, DataForField1 | MetadataForField2, DataForField2 ....

This patch changes the format to:
MetadataForField1, MetadataForField2, ... | DataForField1, DataForField2, ...

Because the metadata for all fields is now stored contiguously, we hope
that this will improve compression.

Co-authored-by: Colin Goodheart-Smithe <colings86@users.noreply.github.com>
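
To make the reordering concrete, here is a minimal Python sketch of the two layouts. The byte encoding (a two-byte metadata entry holding a field number and a type code, followed by a length-prefixed payload), the leading field-count byte, and all function names are assumptions made for illustration; they are not Lucene's actual stored fields format or the code in this patch.

```python
import struct

# Hypothetical encoding, for illustration only (not Lucene's real format).

def encode_meta(field_num: int, type_code: int) -> bytes:
    """Pack one field's metadata into two bytes."""
    return struct.pack("BB", field_num, type_code)

def encode_data(value: bytes) -> bytes:
    """Pack one field's value as a 4-byte big-endian length plus payload."""
    return struct.pack(">I", len(value)) + value

def interleaved(fields) -> bytes:
    """Old layout: Meta1, Data1 | Meta2, Data2 | ..."""
    return b"".join(encode_meta(n, t) + encode_data(v) for n, t, v in fields)

def schema_first(fields) -> bytes:
    """New layout: Meta1, Meta2, ... | Data1, Data2, ...

    A one-byte field count is prepended (an assumption of this sketch) so
    that a reader knows where the metadata block ends.
    """
    metas = b"".join(encode_meta(n, t) for n, t, _ in fields)
    datas = b"".join(encode_data(v) for _, _, v in fields)
    return struct.pack("B", len(fields)) + metas + datas

def decode_schema_first(buf: bytes):
    """Read the metadata block first, then walk the data block in order."""
    count = buf[0]
    metas = [struct.unpack_from("BB", buf, 1 + 2 * i) for i in range(count)]
    pos = 1 + 2 * count
    fields = []
    for field_num, type_code in metas:
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        fields.append((field_num, type_code, bytes(buf[pos:pos + length])))
        pos += length
    return fields

# Example document with the three stored fields used in the benchmark
# (id, body, date); the values and type codes are made up.
doc = [(0, 1, b"42"), (1, 1, b"some body text"), (2, 2, b"2020-07-06")]
old_bytes = interleaved(doc)
new_bytes = schema_first(doc)
```

Both layouts carry the same metadata and payload bytes; only their order differs, so any size difference on disk would have to come from how well the compressor handles the grouped metadata.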

mayya-sharipova and others added 2 commits June 29, 2020 16:19

mayya-sharipova commented Jul 6, 2020

Benchmarking

Index size

We have 3 stored fields:

  1. id
  2. body
  3. date

CompressionMode = Fast

Executing command: find . -name '*.fdt' -exec du -ch {} +

| Dataset | Size of `*.fdt` files (trunk) | Size of `*.fdt` files (patch) |
|---|---|---|
| wikimedium 1M | 606M | 606M |
| wikimedium 10M | 5.5G | 5.5G |
| wikimediumall 33M | 16G | 16G |

At the resolution reported by du -h, the patch doesn't seem to affect the size of the *.fdt files for any of the three datasets.
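
Since du -h rounds to a few significant digits, a byte-level total could show whether there is a smaller difference hidden behind identical rounded figures. The following is a minimal sketch of such a check in Python; the function name and the directory names in the usage comment are hypothetical. Note that it sums apparent file sizes, whereas du reports disk usage, so the two numbers may not match exactly.

```python
from pathlib import Path

def total_fdt_bytes(index_dir: str) -> int:
    """Sum the apparent sizes, in bytes, of all *.fdt files under index_dir."""
    return sum(
        p.stat().st_size
        for p in Path(index_dir).rglob("*.fdt")
        if p.is_file()
    )

# Usage (hypothetical directory names):
#   trunk = total_fdt_bytes("index-trunk")
#   patch = total_fdt_bytes("index-patch")
#   print(trunk, patch, patch - trunk)
```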
