Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

politeiawww: Rate limit email. #1398

Closed
lukebp opened this issue Apr 14, 2021 · 6 comments · Fixed by #1448
Closed

politeiawww: Rate limit email. #1398

lukebp opened this issue Apr 14, 2021 · 6 comments · Fixed by #1448
Assignees

Comments

@lukebp
Copy link
Member

lukebp commented Apr 14, 2021

Add a configurable email limit for user accounts. The limit should default to 100 emails per user per day. It should be possible to specify a different value on startup if the server operator wants to.

A potential issue with this is going to be the performance overhead of updating a large batch of users in the database everytime a email notification is sent. This needs to be done in a scalable way.

@lukebp lukebp added this to the v1.1.0 milestone Apr 14, 2021
@LasTshaMAN
Copy link
Contributor

LasTshaMAN commented May 10, 2021

Hi, I would leave some thoughts here (disscuss a possible implementation).

  • from the description above I infer that you want to store some kind of Counter for each User in the DB ?

I believe rate limiting can be eventually consistent, its probably not a big deal if some user from time to time performs 110 or even 210 requests per day instead of 100.
So, there are possible ways to optimize it:

  1. store some map[userID]SentCount in memory and simply check against that

if that's not enough (e.g. we want this "rate limiting data" to be stored between server restarts), we can make it persistent:

  1. have a go-routine to periodically dump "rate limiting data" in some DB (for performance reasons or implementation ease we might not even want to store this data in users table, maybe not even in SQL-based DB)

  2. Upon server startup restore rate limiting data from persistent storage into process memory.

also, while implementing this (better even, before implementing this), in order to be able to test rate limiting (as well as cover new email-sending functionality with unit-tests in the future - looks like there are no tests covering that at all at the moment) and not increase the complexity of politeiawww struct I would suggest extracting all email-related functionality (these are all the funcs that use politeiawww.mail object) into a separate email package under politeiawww).

If nobody is working on it, I could try to start working on this to get myself familiar with the project.

@LasTshaMAN
Copy link
Contributor

Also, maybe for everyone to be on the same page, can you provide some description for "why we need this rate limiting in the first place ?", some user stories that require addressing this issue.

@lukebp
Copy link
Member Author

lukebp commented May 10, 2021

why we need this rate limiting in the first place

There are user actions that trigger email notifications that could be abused in order to get the email server classified as spam. For example, a malicious user could run a bot that continually resets the user's password, each one triggering a new notification email. It makes more sense to rate limit the number of daily emails that can be sent rather than rate limiting the individual user actions.

I believe rate limiting can be eventually consistent

Agreed.

  1. store some map[userID]SentCount in memory and simply check against that

We're in the process of removing all memory caches so that politeiawww instances can be scaled out. We're going to want to persist this email rate limit data.

  1. have a go-routine to periodically dump "rate limiting data" in some DB (for performance reasons or implementation ease we might not even want to store this data in users table, maybe not even in SQL-based DB)

This option is what we're going to want to do.

Email notifications are setup at the application level (pi, cms). Examle: pi notifications. The notifications are already being sent in a separate go routine, which I think alleviates any performance concerns that I originally had.

Thinking through this a little more, here's how I would probably impelement it:

  • Make the email rate limit a config setting so that the sysadmin can adjust it as they see fit.
  • Add a method onto mail.Client that handles the rate limiting so that the application packages like pi don't have to worry about it.
  • Add a field onto the user.User that stores the unix timetamps of the most recent email notifications. The size of the list would be set by the email rate limit config setting. The user cannot recieve more than the max allowed during any single 24 hour period.
  • If a user has not reached the limit, send the email and update the user in the database.
  • If the user hits the limit, send them an email notification letting them know that they won't be receiving any more notifications emails until xxx, where xxx is when the rate limit will expire. You would likely need another user.User field that marks the timestamp of this notification email so that it doesn't get sent repeatedly.

This is just a rough outline off the top of my head. I'm open to other ideas if you think there is a better way to do it or if you encounter potential issues during the implementation.

The existing user database implemenation and a lot of the user layer is going to be rewritten as part of the upcoming pi proposal roadmap. A lot of the old code in politeiawww is pretty rough. This will be pulled apart and cleaned up in the near future. Just keep this in mind as you're implementing this.

If nobody is working on it, I could try to start working on this to get myself familiar with the project.

@LasTshaMAN great! Please do.

@LasTshaMAN
Copy link
Contributor

LasTshaMAN commented May 11, 2021

Thanks for elaborating.

We're in the process of removing all memory caches so that politeiawww instances can be scaled out.

Nice to know, got it!

Overall the proposed solution looks good to me (striking proper balance between implementation complexity & performance).
I've dug through the code a bit, and I have the following concerns.

mail.Client doesn't have a concept of user, it does one thing (provides a simple interface over goemail.SMTP with some additional logic) and it does it well.
It doesn't seem appropriate to me to overload mail.Client with rate-limiting based on the data in user.User struct.
Rate-limiting logic should probably by a func (call it HitNotificationRateLimit() for now) on user.User struct similar to User.NotificationIsEnabled:

// NotificationIsEnabled returns whether the user has the provided notification
// bit enabled. This function will always return false if the user has been
// deactivated.
func (u *User) NotificationIsEnabled(ntfnBit uint64) bool {
	if u.Deactivated {
		return false
	}
	return u.EmailNotifications&ntfnBit != 0
}

so that both NotificationIsEnabled() and HitNotificationRateLimit() will be called at the same place in the code (although HitNotificationRateLimit will be called from more places).

There are some places in the code, e.g. here, where we don't readily have user.User object at our disposal, although it seems that we can pull the user entry out of DB by his email (I assume email is unique across all the users, or we need to handle that also) and call user.HitNotificationRateLimit() to check.

The approach I've described above (lets call it approach 1) assumes that we store "email history" with the user.User struct.

Alternatively I can think of approach 2, based on extending mail package. Instead of modifying mail.Client I would add another entity in this package (call it Limiter for now) that wraps around mail.Client. mail.Limiter will delegate SendTo calls to mail.Client having checked that configured rate limit hasn't been violated for a particular recepient. It will also track each SendTo call and persist this info to DB.
It doesn't make sense to use user.User DB table here in my opinion, instead I would use a separate DB table.

Let me know whether any of these 2 approaches seem OK to you, so I can start some implementation.

@lukebp
Copy link
Member Author

lukebp commented May 11, 2021

One thing to keep in mind is that most of this code will be refactored in the near future as part of the user layer work outlined in the pi proposal. A plugin architecture is going to be applied to the user layer that is similar to the plugin architecture that was applied to politeiad in the v1.0.0 release. Email notifications are going to abstracted into an optional user plugin. All of this functionality will be moved into the email notifications plugin.

so that both NotificationIsEnabled() and HitNotificationRateLimit() will be called at the same place

Right now the event handlers iterate through every user in the db and manually check their notification prefence. This is not scalable. The user layer rewrite will allow the db to be queried by notification setting so the NotificationIsEnabled() code that you currently see in the handlers will not be required.

HitNotificationRateLimit() is still a good idea though. We want to build the rate limiting feature so that the basic structure will remain the same when we migrate to the email notification user plugin. The way you should be thinking about this is:

  1. A event is emitted and picked up by an event handler.
  2. The event handler queries the user db for users that have registered to receive a notification for the event. The event handler will have access to the list of user.User objects that have this notification preference set.
  3. The email notification is sent.

The event handler code will live in the application packages. The only application package that currently exist is the pi package, which is the application package for the decred proposal system. Future application packages will include one for cms and one for the reddit like forum that we will be building.

The reason I wanted to include the rate limit logic in the mail.Client is to prevent code duplication. If you put the rate limit logic in the event handlers then you'll need to reimplement it in every event handler for every application. If you put it in the mail.Client then you only need to implement it once and it will be applied to all existing and future applications.

I agree though that it doesn't really belong there. I like your Limiter idea. Maybe we should create a entirely separate package for this in order to keep a clean separation of concerns. This new package would be refactored into the email notification plugin when the time comes.

There are some places in the code, e.g. here, where we don't readily have user.User

Don't worry about the CMS code for now. Its legacy code that will be rewritten in the near future once the user plugin architecture has been implement.

You can use the pi package as a reference instead. This is how applictions like cms will be setup in the future.

It doesn't make sense to use user.User DB table here in my opinion, instead I would use a separate DB table.

I think if the user db interface had transaction capabilities it would be fine, but it doesn't so I agree. Lets use a separate table to prevent races on the user record.

To summarize, let's go with your Limiter idea, but break it out into a new package so that it is easy to refactor once we implement user plugins. This implementation will prevent duplicate code while still keeping a clean separation of concerns. You agree?

@LasTshaMAN
Copy link
Contributor

Okay, sounds good, I'll drop a PR maybe in 2-3 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants