
JAMES-3623 Provide a (multi-DC friendly) Distributed POP3 Application #570

Merged
chibenwa merged 14 commits into apache:master from chibenwa:contrib-distr-pop3
Sep 9, 2021

@chibenwa chibenwa commented Aug 3, 2021

https://www.mail-archive.com/server-dev@james.apache.org/msg70682.html

This server diverges from the distributed server by only using a MessageId backed by a TimeUUID as the message identifier.
It is thus multi-datacenter friendly, but comes with a reduced feature set (only supports SMTP protocol).

chibenwa commented Aug 7, 2021

Why

The James POP3 implementation is backed by the IMAP UID, a monotonic counter. The Cassandra implementation uses Lightweight Transactions (LWTs) to back a compare-and-swap.

This implementation has expensive runtime costs, especially in a multi-datacenter setup (LWTs require 4 round trips across replicas, even for reads).
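
To make the cost concrete, here is a minimal in-memory sketch of UID allocation through compare-and-swap. It is not James code: `UidAllocator` and `nextUid` are hypothetical names, and an `AtomicLong` stands in for the Cassandra row that an LWT would guard. The point is only that every allocation is a read followed by a conditional write, which is the step that becomes several round trips across replicas once it is backed by an LWT.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not part of apache/james-project.
class UidAllocator {
    // Stands in for the per-mailbox "last UID" value that an LWT would guard.
    private final AtomicLong lastUid = new AtomicLong(0);

    // Allocates the next UID with a read plus a conditional write,
    // retrying whenever another writer got there first.
    long nextUid() {
        while (true) {
            long current = lastUid.get();      // read the current value
            long candidate = current + 1;      // monotonic increment
            if (lastUid.compareAndSet(current, candidate)) {
                return candidate;              // conditional write succeeded
            }
            // Conditional write failed: loop and retry with a fresh read.
        }
    }
}
```

An identifier that can be generated locally, such as the TimeUUID-backed MessageId, needs no such coordination step.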

How

We should contribute to apache/james-project an alternative implementation of the POP3 server that does not
leverage UIDs but instead uses messageIds, enabling a safe, easy-to-configure multi-datacenter POP3 setup for the Distributed Server.

Use a dedicated view in order to list the messageIds within a mailbox and the size of the messages and use
the messageIdManager to retrieve the given emails.
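
As a rough sketch of what such a view has to provide, the in-memory class below keeps, per mailbox, the messageIds and sizes needed to answer POP3 listing commands without touching UIDs. All names here (`Pop3MailboxView`, `Item`, `add`, `remove`, `list`, `totalSize`) are hypothetical and are not taken from the changeset; the real view is stored in Cassandra and message content is fetched separately through the messageIdManager.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names do not come from apache/james-project.
class Pop3MailboxView {
    // One row of the view: a message identifier and the message size in bytes.
    static final class Item {
        final String messageId;
        final long size;

        Item(String messageId, long size) {
            this.messageId = messageId;
            this.size = size;
        }
    }

    // mailboxId -> (messageId -> size), kept in insertion order.
    private final Map<String, Map<String, Long>> rows = new LinkedHashMap<>();

    // Records a message in the view when it is added to a mailbox.
    void add(String mailboxId, String messageId, long size) {
        rows.computeIfAbsent(mailboxId, k -> new LinkedHashMap<>()).put(messageId, size);
    }

    // Drops a message from the view, for example after a POP3 delete.
    void remove(String mailboxId, String messageId) {
        Map<String, Long> mailbox = rows.get(mailboxId);
        if (mailbox != null) {
            mailbox.remove(messageId);
        }
    }

    // Lists messageIds and sizes for a mailbox (enough for a POP3 LIST reply).
    List<Item> list(String mailboxId) {
        Map<String, Long> mailbox =
            rows.getOrDefault(mailboxId, Collections.<String, Long>emptyMap());
        List<Item> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : mailbox.entrySet()) {
            result.add(new Item(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    // Sums the sizes for a mailbox (enough for a POP3 STAT reply).
    long totalSize(String mailboxId) {
        long total = 0;
        for (Item item : list(mailboxId)) {
            total += item.size;
        }
        return total;
    }
}
```

Because the view is keyed by messageId only, nothing in it depends on a monotonic counter.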

Offer a configuration option to choose between the classic 'uid based' implementation or the 'messageId based' implementation via the mechanism of module-choosing.

Consequences

By implementing this we will have more options in the face of poor Lightweight Transaction performance:

  • Using the -Dcassandra.unsafe.disable-serial-reads-linearizability=true option with Cassandra would be acceptable for a POP3 workload, but it would lead to data loss in IMAP.
  • More aggressively, the tailor-made server could include a UID/ModSeq allocation compare-and-swap mechanism that is not backed by LWTs. Trivially exposed to data races, it
    simulates correct behaviour in the absence of concurrency, without the costs of Lightweight Transactions, and could be set up quickly. It would lead to data loss in IMAP.

Also, a POP3 workload would not need a failover script to increment UIDs upon failover. With IMAP, however, not running that script in a timely manner would result in data loss.

Places that currently rely on the IMAP UID as an identifier can no longer be relied upon. Impact studies show that some tasks, such as mail re-indexing, will be impacted (as they are backed by the IMAP UID), which is likely non-critical for pure POP3 usage.

Tests

This changeset has been tested with Thunderbird (for the Cassandra setup).

We have contributed extensive integration tests for the POP3 servers.

@chibenwa chibenwa force-pushed the contrib-distr-pop3 branch 3 times, most recently from 41564e3 to fd63714 Compare August 20, 2021 11:01
@chibenwa chibenwa force-pushed the contrib-distr-pop3 branch from fd63714 to cebf8f3 Compare August 23, 2021 08:53

chibenwa commented Sep 7, 2021

Error Message

Passed in key must select exactly one node (found 0): handlerchain

Stacktrace

org.apache.commons.configuration2.ex.ConfigurationRuntimeException: Passed in key must select exactly one node (found 0): handlerchain

DEPRECATION NOTICE of adoptopenjdk:

This image is officially deprecated in favor of the eclipse-temurin image,
and will receive no further updates after 2021-08-01 (Aug 01, 2021). Please
adjust your usage accordingly.
@chibenwa chibenwa merged commit 9433590 into apache:master Sep 9, 2021