# Various ADRs regarding JMAP and blobStore performance enhancements (#170)

**src/adr/0012-jmap-partial-reads.md**

# 12. Projections for JMAP Messages

Date: 2019-10-09

## Status

Proposed

Adoption needs to be backed by some performance tests.

## Context

JMAP core RFC 8620 requires that the server respond with only the properties requested by the client.

James currently computes all of the properties regardless of their cost, and regardless of whether the client asked for them.

We can clearly save latency and resources by avoiding reading or computing expensive properties that were not explicitly requested by the client.

## Decision

Introduce two new data structures representing JMAP messages:
- One with only metadata
- One with metadata + headers

Given the properties requested by the client, the most appropriate message data structure will be computed, as sketched below, on top of
existing message storage APIs, which should remain unchanged.
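
A minimal sketch of how the read level could be derived from the requested properties. The names (`ReadLevel`, the property sets, `neededReadLevel`) are hypothetical and only illustrate the decision, not an actual James API:

```java
import java.util.HashSet;
import java.util.Set;

public class MessageProjectionChooser {

    // The two new data structures, plus the existing full read as a fallback.
    enum ReadLevel {
        METADATA,   // flags, size, internal date, ids...
        HEADERS,    // METADATA plus parsed headers
        FULL        // existing behaviour: read the whole message
    }

    // Hypothetical property groupings, for illustration only.
    private static final Set<String> METADATA_PROPERTIES =
        Set.of("id", "blobId", "threadId", "mailboxIds", "size", "keywords", "date");
    private static final Set<String> HEADER_PROPERTIES =
        Set.of("subject", "from", "to", "cc", "bcc", "replyTo", "headers");

    // Choose the cheapest data structure able to answer the request.
    static ReadLevel neededReadLevel(Set<String> requestedProperties) {
        if (METADATA_PROPERTIES.containsAll(requestedProperties)) {
            return ReadLevel.METADATA;
        }
        Set<String> metadataAndHeaders = new HashSet<>(METADATA_PROPERTIES);
        metadataAndHeaders.addAll(HEADER_PROPERTIES);
        if (metadataAndHeaders.containsAll(requestedProperties)) {
            return ReadLevel.HEADERS;
        }
        return ReadLevel.FULL; // body, attachments, preview... need a full read
    }
}
```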

Some performance tests will be run in order to evaluate the improvements.

## Consequences

GetMessages with a limited set of requested properties will no longer necessarily result in a full database message read. We
thus expect a significant improvement, for instance when only metadata is requested.

If the improvement is less than 5%, the code will not be added to the codebase and the proposal will get the status 'rejected'.

## References

- /get method: https://tools.ietf.org/html/rfc8620#section-5.1
- [JIRA](https://issues.apache.org/jira/browse/JAMES-2919)
**src/adr/0013-precompute-jmap-preview.md**

# 13. Precompute JMAP Email preview

Date: 2019-10-09

## Status

Proposed

Adoption needs to be backed by some performance tests.

## Context

JMAP messages have a handy preview property holding the first 256 characters of meaningful text of a message.

This property is often displayed in JMAP clients' message listings, so it is queried a lot.

Currently, to get the preview, James retrieves the full message body, parses it using MIME parsers, removes HTML, and keeps the meaningful text, as sketched below.
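
A minimal sketch of that computation, assuming jsoup for HTML removal; the MIME traversal locating the body part is omitted, and all names are hypothetical rather than the actual James code:

```java
import org.jsoup.Jsoup;

public class PreviewComputer {
    private static final int MAX_PREVIEW_LENGTH = 256;

    // Strip the HTML tags, keep the meaningful text, truncate to 256 characters.
    static String computePreview(String bodyContent) {
        String text = Jsoup.parse(bodyContent).text().trim();
        return text.length() <= MAX_PREVIEW_LENGTH
            ? text
            : text.substring(0, MAX_PREVIEW_LENGTH);
    }
}
```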

## Decision

We should pre-compute message previews.

A MailboxListener will compute the preview and store it in a MessagePreviewStore.

We should have Cassandra and memory implementations.

When the preview is precomputed, the "preview" property can be served as metadata for those messages.

When the preview is not precomputed, we should compute it on demand and save the result for later (see the sketch below).

We should provide a webAdmin task allowing the projection to be rebuilt. Computing and storing in MessagePreviewStore
is idempotent, so the task can be run on a live system without any concurrency problem.
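
A hedged sketch of the read-through behaviour: MessagePreviewStore comes from this ADR, but the exact signatures, the use of String ids, and the `PreviewReader` helper are assumptions for illustration. A MailboxListener would call `store` when a message is added:

```java
import java.util.Optional;
import java.util.function.Supplier;

interface MessagePreviewStore {
    void store(String messageId, String preview);     // idempotent
    Optional<String> retrieve(String messageId);
}

class PreviewReader {
    private final MessagePreviewStore store;

    PreviewReader(MessagePreviewStore store) {
        this.store = store;
    }

    // Serve precomputed previews cheaply; on a miss, fall back to the
    // expensive computation and persist the result for later requests.
    String preview(String messageId, Supplier<String> expensiveComputation) {
        return store.retrieve(messageId)
            .orElseGet(() -> {
                String preview = expensiveComputation.get();
                store.store(messageId, preview);   // safe to redo: idempotent
                return preview;
            });
    }
}
```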

Some performance tests will be run in order to evaluate the improvements.

## Consequences

We expect a huge performance enhancement for JMAP clients relying on preview for listing mails.

If the improvement is less than 5%, the code will not be added to the codebase and the proposal will get the status 'rejected'.

## References

- https://jmap.io/server.html#1-emails The JMAP server guide states that the preview needs to be quick to retrieve

- A similar decision was taken at FastMail: https://fastmail.blog/2014/12/15/dec-15-putting-the-fast-in-fastmail-loading-your-mailbox-quickly/

- [JIRA](https://issues.apache.org/jira/browse/JAMES-2919)
**src/adr/0014-blobstore-storage-policies.md**

# 14. Add storage policies for BlobStore

Date: 2019-10-09

## Status

Proposed

Adoption needs to be backed by some performance tests, as well as by measuring how the data repartition between Cassandra and object storage shifts.

## Context

James exposes a simple BlobStore API for storing raw data. However, such raw data often varies in size and access pattern.

As an example:

- Mailbox message headers are expected to be small and frequently accessed
- Mailbox message bodies are expected to range from small to big but are infrequently accessed
- DeletedMessageVault message headers are expected to be small and infrequently accessed

Also, the various implementations of BlobStore have different strengths:

- CassandraBlobStore is efficient for small blobs and offers low latency. However it is known to be expensive for big blobs, and Cassandra storage is expensive.
- The Object Storage blob store is good at storing big blobs, but it induces higher latencies than Cassandra for small blobs, for a cost gain that isn't worth it.

Thus, significant performance and cost improvements could be unlocked by using the right blob store for the right blob.

## Decision

Introduce StoragePolicies at the level of the BlobStore API.

The proposed policies include:

- SizeBasedStoragePolicy: the blob's underlying storage medium will be chosen depending on its size.
- LowCostStoragePolicy: the blob is expected to be saved in low-cost storage. Access is expected to be infrequent.
- PerformantStoragePolicy: the blob is expected to be saved in performant storage. Access is expected to be frequent.

A HybridBlobStore will replace the current UnionBlobStore and will choose between the Cassandra and ObjectStorage implementations depending on the policy, as sketched below.
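
A minimal sketch of the policy dispatch, assuming a simplified blob store interface; the size threshold and all names other than HybridBlobStore and the policies (which come from this ADR) are illustrative assumptions:

```java
import java.util.Optional;

// Simplified stand-in for the real blob store implementations.
interface Blobs {
    String save(byte[] data);
    Optional<byte[]> read(String blobId);
}

enum StoragePolicy { SIZE_BASED, LOW_COST, PERFORMANT }

class HybridBlobStore {
    private static final int SIZE_THRESHOLD = 32 * 1024; // assumed cut-off, to be tuned

    private final Blobs cassandra;      // low latency, expensive storage
    private final Blobs objectStorage;  // cheap storage, higher latency

    HybridBlobStore(Blobs cassandra, Blobs objectStorage) {
        this.cassandra = cassandra;
        this.objectStorage = objectStorage;
    }

    String save(byte[] data, StoragePolicy policy) {
        switch (policy) {
            case LOW_COST:
                return objectStorage.save(data);
            case PERFORMANT:
                return cassandra.save(data);
            default: // SIZE_BASED
                return data.length <= SIZE_THRESHOLD
                    ? cassandra.save(data)
                    : objectStorage.save(data);
        }
    }

    // Reads query both stores, which is why no migration is required.
    Optional<byte[]> read(String blobId) {
        return cassandra.read(blobId)
            .or(() -> objectStorage.read(blobId));
    }
}
```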

DeletedMessageVault, BlobExport & MailRepository will rely on LowCostStoragePolicy. Other BlobStore users will rely on SizeBasedStoragePolicy.
> **Member:** Do we really need to use the policy-based blob store for all our blob store uses? I'm not sure LowCostStoragePolicy and PerformantStoragePolicy are really worth it.
>
> **Contributor (author):** LowCostStoragePolicy -> for DeletedMessageVault (mail headers).
>
> **Contributor (author):** PerformantStoragePolicy might not be worth it though...
>
> **Member:** I mean, why configure DeletedMessageVault with a particular blob store policy when we probably only ever want to store its data in Swift?
>
> **Member:** Because having to declare what we expect in terms of storage policy is better than having to know that store X has this specific behavior.

Some performance tests will be run in order to evaluate the improvements.

## Consequences

We expect small, frequently accessed blobs to be located in Cassandra, allowing ObjectStorage to be used mainly for large, costly blobs.

If the improvement is less than 5%, the code will not be added to the codebase and the proposal will get the status 'rejected'.

We expect more data to be stored in Cassandra. We need to quantify this for adoption.

As reads will query both blobStores, no migration is required to use this composite blobStore on top of an existing implementation;
however, we will benefit from the performance enhancements only for newly stored blobs.

## References

- [JIRA](https://issues.apache.org/jira/browse/JAMES-2921)
**src/adr/0015-objectstorage-blobid-list.md**

# 15. Persist BlobIds to avoid persisting the same blobs several times within ObjectStorage

Date: 2019-10-09

## Status

Proposed

Adoption needs to be backed by some performance tests.

## Context

A given mail is often written to the blob store by different components, and mail traffic is heavily duplicated (several recipients receiving similar emails, same attachments). This causes a given blob to often be persisted several times.

Cassandra was the first implementation of the blobStore. Cassandra is a heavily write-optimized NoSQL database, so one can assume writes on top of it to be fast. Thus we assumed we could always overwrite blobs.

This usage pattern was also adopted for BlobStore on top of ObjectStorage.

However, writing to Object Storage:
- Takes time
- Is billed by most cloud providers

Thus choosing a strategy that avoids writing the same blob twice is desirable.

However, the ObjectStorage (OpenStack Swift) `exist` method was not efficient enough to be a real cost and performance saver.

## Decision

Rely on a StoredBlobIdsList API to know which blobs are persisted in object storage. Provide a Cassandra implementation of it.
Located in blob-api for convenience, this is not a top-level API: it is intended to be used by some blobStore implementations
(here only ObjectStorage). We will provide a CassandraStoredBlobIdsList in the blob-cassandra project so that Guice products combining
object storage and Cassandra can define a binding to it.

- When saving a blob with a precomputed blobId, we can check the existence of the blob in storage, possibly avoiding the expensive save.
- When saving a blob too big to precompute its blobId, once the blob has been streamed using a temporary random blobId, the copy operation can be avoided and the temporary blob can be removed directly.

Cassandra is probably faster doing "write every time" than "read before write", so we should not use the stored blob projection for it. A sketch of the ObjectStorage case follows.
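
A hedged sketch of how an ObjectStorage DAO could use the StoredBlobIdsList API from this ADR for the precomputed-blobId case; the exact signatures and the `DeduplicatingObjectStorageDao` name are assumptions for illustration:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical shape of the API proposed by this ADR.
interface StoredBlobIdsList {
    boolean exists(String blobId);   // backed by Cassandra
    void add(String blobId);         // recorded only after a successful save
}

class DeduplicatingObjectStorageDao {
    private final StoredBlobIdsList storedIds;

    DeduplicatingObjectStorageDao(StoredBlobIdsList storedIds) {
        this.storedIds = storedIds;
    }

    String save(byte[] data) {
        String blobId = sha256Hex(data);     // precomputed, content-addressed id
        if (storedIds.exists(blobId)) {
            return blobId;                   // skip the expensive object storage write
        }
        putToObjectStorage(blobId, data);
        storedIds.add(blobId);               // persist the id only once the blob is saved
        return blobId;
    }

    private void putToObjectStorage(String blobId, byte[] data) {
        // PUT to the Swift container (omitted)
    }

    private static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```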

Some performance tests will be run in order to evaluate the improvements.

## Consequences
> **Member:** I wonder how it will interact with the blob garbage collection we plan to implement.
>
> **Contributor (author):** Good question! And I have an answer for it: as you plan to add a generation component to the blob id, blobs will differ across generations, and this id list will thus be specific to a generation. We can safely remove entries from it once we know no new blobs will be generated with the old generation.
>
> However, the ADRs for blob deletion have not been presented yet, so detailing the interactions is hard.
>
> Another argument: blob deletion acts at the BlobStore level, while this proposed implementation will be at the ObjectStorageBlobDAO level, which should limit conflicts.

We expect to reduce the amount of writes to the object storage. This is expected to bring:
- reduced operational costs on cloud providers
- performance improvements
- latency reduction under load

As id persistence in StoredBlobIdsList will be done once the blob is successfully saved, inconsistencies in StoredBlobIdsList
will lead to duplicated saved blobs, which is the current behaviour.

If the improvement is less than 5%, the code will not be added to the codebase and the proposal will get the status 'rejected'.

## References

- Previous optimization proposal using blob existence checks before persisting. This work was done using the ObjectStorage `exist` method and was proven not efficient enough: https://github.com/linagora/james-project/pull/2011 (V2)

- [JIRA](https://issues.apache.org/jira/browse/JAMES-2921)