
JAMES-3740 Compact primitive collections for UID <-> MSN mapping #942

Merged
merged 3 commits into apache:master
Apr 20, 2022

Conversation

chibenwa
Contributor

This reduces HEAP memory consumption of this use case by a factor of 6.

What is the UID <-> MSN mapping?

In IMAP (RFC 3501) there are two ways to address a message:

  • By its UID (Unique ID), which is unique (until UIDVALIDITY changes...)
  • By its MSN (Message Sequence Number) which is the (mutable) position of a message in the mailbox.

We then need:

  • Given a UID, return its MSN, which is for instance compulsory upon EXPUNGE notifications when QRESYNC is not enabled.
  • Given an MSN-based request, convert it back to a UID (rare).

We store the list of UIDs, sorted, in RAM and perform binary searches to resolve those.
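
To make that concrete, here is a minimal sketch of the resolution logic (class and method names are illustrative, not the actual UidMsnConverter code): the MSN is simply the 1-based position of a UID in the sorted list.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Illustrative sketch: MSN = 1-based index of the UID in the sorted UID list
    class UidMsnSketch {
        private final List<Long> uids = new ArrayList<>(); // kept sorted

        // UID -> MSN: binary search over the sorted list
        int msnOf(long uid) {
            int index = Collections.binarySearch(uids, uid);
            return index < 0 ? -1 : index + 1; // -1 when the UID is not present
        }

        // MSN -> UID: direct index lookup (the rare direction)
        long uidOf(int msn) {
            return uids.get(msn - 1);
        }
    }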

What is the impact on heap?

Each UID is wrapped in a MessageUid object. This object wrapping comes with an overhead of at least 12 bytes on top of the 8-byte payload (a long). Quick benchmarks show it is actually worse: 10 million UIDs took up to 275 MB.

    @Test
    void measureHeapUsage() throws InterruptedException {
        // Load 10 million UIDs into the converter under test
        int count = 10_000_000;
        testee.addAll(IntStream.range(0, count)
            .mapToObj(i -> MessageUid.of(i + 1))
            .collect(Collectors.toList()));
        Thread.sleep(1000);
        System.out.println("GCing");
        // Request a GC so the reading mostly reflects live objects
        System.gc();
        Thread.sleep(1000);

        System.out.println(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
    }

Now, taking a typical production deployment as a reference, I get:

  • Some users have up to 2.5 million messages in their INBOX
  • I see an average of 100,000 messages per user

So for a small-scale deployment, we are already "consuming" ~300 MB of memory just for the UID <-> MSN mapping.

Scaling to 1,000 users on a single James instance, heap consumption clearly starts being a problem (~3 GB), without even speaking of the target of 10,000 users per James instance I have in mind.

It's worth mentioning that, IMAP being stateful and the UID <-> MSN mapping being attached to a selected mailbox, such a mapping is long-lived:

  • Multiple small objects need to be copied individually by the GC, putting pressure on young-generation collections.
  • Those long-lived objects will eventually be promoted to the old generation; the more of them there are, the longer the resulting stop-the-world GC pauses will be.

Temporary fix?

We can get rid of the object boxing in UidMsnConverter by using primitive-type collections, for instance those provided by the fastutil project.

The same bench was down to 84 MB.
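
As an illustration, here is a minimal sketch of what the primitive-backed variant could look like, assuming fastutil's LongArrayList (names are illustrative, not the actual James code):

    import it.unimi.dsi.fastutil.longs.LongArrayList;
    import java.util.Arrays;

    // Illustrative sketch: UIDs live in a single backing long[] instead of
    // millions of individually boxed MessageUid objects.
    class PrimitiveUidMsnSketch {
        private final LongArrayList uids = new LongArrayList(); // kept sorted

        void add(long uid) {
            uids.add(uid); // stored unboxed in the backing long[]
        }

        int msnOf(long uid) {
            // binary search directly over the backing array
            int index = Arrays.binarySearch(uids.elements(), 0, uids.size(), uid);
            return index < 0 ? -1 : index + 1;
        }

        long uidOf(int msn) {
            return uids.getLong(msn - 1);
        }
    }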

Also, we could get things even more compact by using an int representation of UIDs. (Those are in most cases below 2 billion; to get above that, more than 2 billion emails would need to transit through one's mailbox, which is highly unlikely.) A fallback to "long" storage can be set up if a UID above 2 billion is observed.

With such a compact int storage we are down to 46 MB.
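
A minimal sketch of that fallback idea, assuming fastutil primitive lists (again illustrative, not the actual implementation):

    import it.unimi.dsi.fastutil.ints.IntArrayList;
    import it.unimi.dsi.fastutil.longs.LongArrayList;

    // Illustrative sketch: store UIDs as 4-byte ints until one exceeds
    // Integer.MAX_VALUE, then migrate everything to a long-backed list.
    class CompactUidsSketch {
        private IntArrayList intUids = new IntArrayList(); // used while all UIDs fit in an int
        private LongArrayList longUids = null;             // lazily created fallback

        void add(long uid) {
            if (longUids != null) {
                longUids.add(uid);
            } else if (uid <= Integer.MAX_VALUE) {
                intUids.add((int) uid);
            } else {
                // first UID above 2^31 - 1: migrate to long storage once
                longUids = new LongArrayList(intUids.size() + 1);
                for (int i = 0; i < intUids.size(); i++) {
                    longUids.add(intUids.getInt(i));
                }
                longUids.add(uid);
                intUids = null;
            }
        }

        long get(int index) {
            return longUids != null ? longUids.getLong(index) : intUids.getInt(index);
        }
    }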

So, taking the aforementioned numbers, we could expect a 1,000-user deployment to require ~400 MB and a larger-scale 10,000-user deployment on a single James instance to consume up to 4 GB. Not that enjoyable, but definitely more manageable.

Please note that primitive collections are also more GC-friendly, as their elements are managed together, as a single object (the backing array).

What other mail servers do

I found references to Dovecot, which uses a similar algorithm to ours: a binary search on a list of UIDs. The noticeable difference is that this list of UIDs is held on disk, not in memory as we do.

References: https://doc.dovecot.org/developer_manual/design/indexes/mail_index_api/?highlight=time

Of course, such a solution would be attractive... We could imagine keeping the last 1,000 UIDs in memory (most of the time, these are the ones used for MSN resolution), locating the rest on disk and using them only when needed, which would dramatically reduce heap pressure.

Making UidMsnConverter an interface with a backing factory would enable different implementations to co-exist and allow some experimentation ;-)
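
For illustration, such an abstraction could look roughly like this (method names and signatures are hypothetical, not James' actual API; MessageUid is the existing mailbox-api type):

    import java.util.Optional;
    import org.apache.james.mailbox.MessageUid;

    // Hypothetical shape of the abstraction: each implementation (in-memory,
    // MapDB-backed, range-based...) would be produced by its own factory.
    public interface UidMsnConverter {
        Optional<Integer> getMsn(MessageUid uid);
        Optional<MessageUid> getUid(int msn);
        void addUid(MessageUid uid);

        interface Factory {
            UidMsnConverter create();
        }
    }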

@chibenwa chibenwa self-assigned this Mar 30, 2022
@chibenwa chibenwa marked this pull request as draft March 30, 2022 08:37
@chibenwa
Contributor Author

https://issues.apache.org/jira/browse/JAMES-3740 — it would be good to also have a MapDB-based alternative!

@chibenwa chibenwa marked this pull request as ready for review April 8, 2022 02:29
@chibenwa
Contributor Author

chibenwa commented Apr 8, 2022

This PR gives an instant reduction in memory usage and as such is definitely an improvement.

We would benefit from having customization of UidMsnConverter, and optional implementations of it (in-memory? MapDB? Range-based?).

The enhancements delivered by this PR could well be backing the default memory implementation.

Hence I propose to merge this.

Contributor

@Arsnael Arsnael left a comment


Sorry, I missed that PR. Just a small typo comment :)

This reduces HEAP memory consumption of this use case by a factor of 3.
This is the nominal use case, in the absence of concurrent message operations on the
mailbox.

Sorting the resulting list 'in place' avoids the usage of tree-like structures
that would enforce a copy and require temporary allocations per entry.

This shows:
 - a x2.5 performance enhancement
 - dramatically reduced heap pressure upon UidMsnConverter::addAll
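
A rough sketch of the "sort in place" approach described in that commit message, assuming a fastutil-backed list (illustrative only, not the actual James code):

    import it.unimi.dsi.fastutil.longs.LongArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Illustrative sketch: append all UIDs to the primitive backing list,
    // then sort once in place, instead of maintaining a tree structure
    // that allocates one node object per entry.
    class AddAllSketch {
        private final LongArrayList uids = new LongArrayList();

        void addAll(List<Long> newUids) {
            for (long uid : newUids) {
                uids.add(uid);
            }
            // single in-place sort over the backing array, no per-entry allocation
            Arrays.sort(uids.elements(), 0, uids.size());
        }
    }
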
@chibenwa
Contributor Author

Just a tiny rebase...

@chibenwa chibenwa merged commit a6f6289 into apache:master Apr 20, 2022