JAMES-2080 Allow turning off header indexing in OpenSearch #1516
Indicates whether headers should be indexed (default: true). Note that specific headers (From, To, Cc, Bcc, Subject, Message-Id, Date, Content-Type) are still indexed in their dedicated type. Header indexing is expensive, as each header currently needs to be stored as a nested document, but turning it off results in non-strict compliance with the IMAP / JMAP standards.
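To make the trade-off concrete, here is a minimal sketch of how an indexed document might shrink when header indexing is turned off. The function name `build_index_document`, the `index_headers` flag, and the field names are hypothetical and only illustrate the idea; they are not the actual James mapping or option name.

```python
# Headers that stay searchable through dedicated fields even when
# generic header indexing is disabled (list taken from the description above).
DEDICATED_HEADERS = {
    "from", "to", "cc", "bcc", "subject", "message-id", "date", "content-type",
}


def build_index_document(headers, index_headers=True):
    """Build a simplified, hypothetical index document.

    headers: list of (name, value) pairs as parsed from a message.
    index_headers: when False, skip the nested per-header documents.
    """
    doc = {}
    for name, value in headers:
        key = name.lower()
        if key in DEDICATED_HEADERS:
            # Dedicated fields are always populated.
            doc.setdefault(key, []).append(value)
    if index_headers:
        # One nested document per header: this is the expensive part.
        doc["headers"] = [{"name": n, "value": v} for n, v in headers]
    return doc


sample = [("From", "alice@example.org"), ("X-Custom", "42"), ("Subject", "Hello")]
full = build_index_document(sample, index_headers=True)
light = build_index_document(sample, index_headers=False)
```

In this sketch, `light` keeps `from` and `subject` but drops `X-Custom`, which is why searches on arbitrary headers stop matching once the option is disabled.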
Before
Indexing 5.000 mails takes ~1 minute and occupies 26 MB of index (~5 KB per mail), which is consistent with our production metrics. The commit log is bulkier: together with the index it occupies 185 MB.
After
Indexing 10.000 mails takes ~76 s and occupies 6.9 MB of index (690 B per mail). The commit log is bulkier: together with the index it occupies 62 MB.
Conclusion
This change allows a dramatic space reduction on OpenSearch (cost saving!) of ~x8 for the tested workload. We also observed a x2 speedup of the indexing process.
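The ratios can be rechecked from the rounded figures quoted in the Before and After sections. The sketch below assumes 1 MB = 10^6 bytes and reads "~1 minute" as 60 s; both are assumptions, and the helper names are illustrative.

```python
def per_mail_bytes(index_bytes, mails):
    """Average index size per mail, in bytes."""
    return index_bytes / mails


def throughput(mails, seconds):
    """Mails indexed per second."""
    return mails / seconds


# Before: 5.000 mails, ~1 minute, 26 MB of index.
before_size = per_mail_bytes(26e6, 5000)
before_rate = throughput(5000, 60)

# After: 10.000 mails, ~76 s, 6.9 MB of index.
after_size = per_mail_bytes(6.9e6, 10000)
after_rate = throughput(10000, 76)

size_ratio = before_size / after_size   # index space reduction per mail
speed_ratio = after_rate / before_rate  # indexing throughput gain

print(f"per-mail index size: {before_size:.0f} B -> {after_size:.0f} B")
print(f"space reduction: x{size_ratio:.1f}, throughput gain: x{speed_ratio:.1f}")
```

With these rounded inputs the per-mail space reduction comes out at roughly x7.5, in line with the ~x8 reported. The throughput ratio computed this way is lower than x2, so the speedup figure should be read against the unrounded timings, which are not given here.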