New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADR] 53. Email rate limiting #833
Changes from 9 commits
75d01f1
d1c4508
b5df0db
4c0bd22
1be7366
277bb9e
2bf6b97
42e20ab
a267bf3
21df4b2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# 50. Integrate James with OIDC | ||
# 51. Integrate James with OIDC | ||
|
||
Date: 2021-12-06 | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
# 53. Email rate limiting | ||
|
||
Date: 2021-01-10 | ||
|
||
## Status | ||
|
||
Accepted (lazy consensus). | ||
|
||
Not yet implemented. | ||
|
||
## Context | ||
|
||
Rate limiting is one of the common features expected from an email system. Examples: SaaS is | ||
one https://www.fastmail.help/hc/en-us/articles/1500000277382-Account-limits#sending/receiving | ||
, https://support.google.com/mail/answer/22839 | ||
|
||
They limit how many emails users can send/receive from/to each email account over a given period of time. | ||
We believe the rate-limiting will help James to have more benefits: | ||
|
||
- Control of the resources | ||
- Easy to configure dynamically the user policy. | ||
- Complements the storage quota | ||
- Can be a security requirement for SaaS deployments. | ||
- Minimise impacts of Open-relay types of compression. | ||
- Limiting the amount of emails sent to third parties can also prevent them from considering you as an open relay and can | ||
be beneficial as well. | ||
|
||
## Decision | ||
|
||
Set up a new maven project dedicated to rating limiting. This allows the rate limiting mailets to be embedded in a James | ||
server as a soft dependency using the external-jar loading mechanism. Please note that this will take the form of an | ||
extension jar, that could be dropped in one's James installation, and thus is optional, and not a runtime dependency. | ||
|
||
Rate limiting will be enabled per sender, per receiver and globally. For each we will provide options for email size and | ||
email count. | ||
|
||
- This can be done via mailets: | ||
- PerSenderRateLimit is per sender | ||
- PerRecipientRateLimit is per recipient | ||
- GlobalRateLimit is for everyone. Depending on the position in the pipeline this could allow rate limiting all emails in | ||
transit, relayed emails or locally delivered emails. | ||
The rate limit will be configured | ||
in [mailetcontainer.xml](/server/apps/distributed-app/sample-configuration/mailetcontainer.xml). | ||
|
||
- Those mailets will be based on a generic RateLimiter interface. We will propose two implementations for it: | ||
- In memory (guava based) suitable for single instance deployments | ||
- [Redis](https://redis.io) based, suitable for distributed deployments. | ||
|
||
The implementation chosen will be configurable as part of mailet configuration. One would be able to configure the | ||
implementation he wishes to use. | ||
|
||
- We will document such a setup, and provide a mailetcontainer sample file. | ||
|
||
## Consequences | ||
|
||
- When having a change in the rate limit configuration, we need to restart James. | ||
- Only protocols allowing to submit emails make sense here so SMTP and JMAP. | ||
- It is more than acceptable to lose all redis data, which is equivalent to resetting the rate limiting. | ||
Arsnael marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
vttranlina marked this conversation as resolved.
Show resolved
Hide resolved
|
||
## Alternatives | ||
|
||
Alternatives implementation of the rate limiter can be proposed, and used within the aforementionned mailet. | ||
|
||
For instance one could rely on Cassandra counters and Cassandra time series (thus not needing additional dependencies) however we fear the potential performance impact doing so. Streaming based options, that aggregate in memory counters, might be a viable option too. | ||
|
||
## References | ||
|
||
- [JIRA](https://issues.apache.org/jira/browse/JAMES-3693) | ||
- [Redis driver](https://github.com/lettuce-io/lettuce-core#reactive-api) | ||
- [Guava rate limiter](https://guava.dev/releases/19.0/api/docs/index.html?com/google/common/util/concurrent/RateLimiter.html) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I read the Guava Rate limiter docs and I do not think its API support rate-limiting by other time units than Second directly. Should we consider other libraries? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi Quan! This is a very good comment!
Yes this could be problematic as saying "I allow you 100 mails per hour" is not the same as "I allow you one mail every 36 seconds". Also the Guava rate limiter is blocking which is not really fancy.
We should ALWAYS look for alternatives!
This is of course very interesting! Yet there is one key point to me: We need a James interface in plain Java (or Scala) fully abstracting away the RateLimitJ that would then just be an implementation detail (for memory AND redis implementations). Also the concept of "batching rules together" is interesting. We should definitly add this concept to our rate limiter API as well (
) @quantranhong1999 can you propose changes to this ADR following your remarks? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, you propose an external dependency. Can I ask you to do a quick assessment regarding the health of that community? -> License ? (Ultimately the question is (and yes, we should do this assessment every time we add a dependency) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Apache License 2.0
The latest release is May 2021.
The latest code update is 8 months ago and none recently.
I see some issues without answers recently.
I do not see anything but Github.
There are 10 contributors but mostly Craig Baker is the main author. It has 413 stars so I think it is good enough. IMO it may be good enough for us in near future but it is a risk for the long future because it likely has 1 main contributor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't Cassandra good enough? or you really want non-persistent data storage to have better performance? What about using Pulsar or Rabbit to propagate counters and use an in-memory datastructure on each node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
More point, with Redis
I don't sure about Pulsar (I don't use it before).
But with RabbitMQ, IMO, It does not guarantee the order of messages -> inconsistency counter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cassandra counters are terrible. Actually I want to try alternatives.
Yes here the tradeoffs of loosing state IMO is outwaited by better performance.
Others are free not to use it and come up with imaginative solutions to implement this.
@vttranlina that's nonetheless interesting remarks, we could have an
alternative
section explaining that 1. we do not use Cassandra and its counters for performance reasons nor a Cassandra time serie (because it seems complex) and want to avoid the hammer-nail syndrom.We can then explain that streaming based systems could be used - but might have implications in term of scalability and implementation complexity. Of course our choice should not be exclusive of alternative solutions. We can show that by splitting the rate limiting maven plugin in two: one for the RateLimiter API and the mailet tooling, one for the Redis implementation....
Would this sound fair to you @mbaechler ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Anyway, I was just curious about the Redis choice, I have a very bad experience performance-wise with Redis.
Cassandra is probably not better for that.
And I think that an in-memory solution fits the need for the reason you described, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do I.
Honnestly if we get a bad experience too we will try data-grids out ;-)