Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added RateLimitFilter #607

Merged
merged 3 commits into from
May 6, 2023
Merged

Added RateLimitFilter #607

merged 3 commits into from
May 6, 2023

Conversation

isapir
Copy link
Member

@isapir isapir commented Mar 29, 2023

This patch adds a RateLimitFilter based on the discussion in the Dev mailing list at [1] with more features to be added later

[1] https://lists.apache.org/thread/0gt1kyjs86g9oqxofdgm0zbrb14lzgj6

public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
throws IOException, ServletException {

String ipAddr = request.getRemoteAddr();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tomcat, which behind a proxy (e.g. nginx), this filter is meaningless. I think we can write common method (check whether x-forwarded-for header exists) that handle that case or allow user to specific a way that how to get really IP (specific header or others).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat agree. I don't recall seeing a Connector attribute (or filter) that will provide the client remoteAddr. Which can also come i handy for CDN's which send the client IP via header too. It feels like that might need to be a new feature.

If x-forwarded-for is used, we need to be wary and document its ability to be insecure too. Such as bad actors sending this header randomized. In which case, documenting that we need to document the assumption that the header is value is trusted.

So by itself String ipAddr = request.getRemoteAddr(); (I think) is OK ... if a way to implement request.getRemoteAddr() can be adjusted accordingly.

The alternative is an option called ipHeader (horrible name, just a suggestion) which is the name of the header to use to get client IP address.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I was seeing this as a niche feature for Tomcat standalone. If there's a proxy, then it should be that component doing any rate limiting. Or did I miss something ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If expectation is standalone, then its just a documentation update to clarifying limitations. That should (optimistically) eliminate security practitioners claiming new CVE's

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tomcat already ships with an excellent RemoteIpFilter / RemoteIpValve and this filter does not attempt to replace them, but to work with them. The Filter documentation comes with a WARNING [1]:

if Tomcat is behind a reverse proxy then you must
make sure that the Rate Limit Filter sees the client IP address, so if for
example you are using the Remote IP Filter,
then the filter mapping for the Rate Limit Filter must come after
the mapping of the Remote IP Filter to ensure that each request has its IP
address resolved before the Rate Limit Filter is applied

Did I get that wrong or am I correct that by having the RemoteIpFilter earlier in the Filter Chain then subsequent filters will see the "translated" address when calling request.getRemoteAddr()? cc @rmaucher @ChristopherSchultz @markt-asf

This Filter can work either in a standalone configuration, or behind a reverse proxy as long as the Remote Address is resolved using the methods mentioned above. Even with reverse proxy it is useful to have this filter in place as it adds:

  1. Configuration via web.xml which might be more convenient for those who are more familiar with Servlet containers than with web servers
  2. A lenient mode in a way of enforce=false, allowing the user to configure the filter to only add a Request Attribute with the request count. That allows the application to inspect the value and make a more intelligent decision since it knows more than the web server or the filter.

[1] https://github.com/apache/tomcat/pull/607/files#diff-e79a16176f7d3d059a2bf4c43d5f50e9c107e27a2511379056ceea48b578c3e5R1001

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Filter documentation comes with a WARNING [1]:

Sorry, I didn't look the doc carefully.

Did I get that wrong or am I correct that by having the RemoteIpFilter earlier in the Filter Chain then subsequent filters will see the "translated" address when calling request.getRemoteAddr()

It's ok to use with RemoteIpFilter.

BTW, I think we should also copy WARNING to Javadoc of RateLimitFilter. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great doc ! (of course, nobody reads it)

this.ratio = ratioToPowerOf2(durationMillis);

int cleanupsPerBucketDuration = (durationMillis >= 60_000) ? 6 : 3;
Thread mt = new MaintenanceThread(durationMillis / cleanupsPerBucketDuration);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess this is not needed to 1. use a dedicated thread for that which prevents an app to have it, 2. not await it properly (can leak only sleep time but still) so I would use tomcat Server utility ScheduledExecutorService to handle it properly and in destroy just cancel the related task.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into that, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the background events from Tomcat would be nice, but this is not a valve. A regular filter in a regular webapp cannot use it.
In EE there is https://jakarta.ee/specifications/concurrency/3.0/ but Tomcat does not provide it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://jakarta.ee/specifications/concurrency/3.0/ does not bring anything to java se except some useless listeners for that case so it is fine to stay away from it but goal is to reuse some banalised pool from tomcat instead of creating multiple leaky active threads.
I'm not sure why this filter wouldn't use it since the filter is an internal of tomcat as the background pool, in terms of classloading there is no strong blocker - and worse case you can still get the scheduled executor service from a servlet context attribute if so which would enable to keep the filter standard and enable tomcat to inject the background pool there if the filter is present but dont think it is needed - and the filter will not be reused in another server so guess it should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for liking that spec ... I like it because it provides a standard way to pass scheduling services to the app. Also I had to reimplement "robust" scheduling (logging and reschedule when something doesn't work). As a result, if given the choice, I would prefer having it than not having it.
Right now, there's no way to pass Tomcat's utility executor to the filter, so this PR should keep the dedicated thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, JNDI is supposed to be used, but it's not very good here (users can disable naming, also still need to add an object factory for that, plus configuration).
Tomcat sets some custom Servlet context attributes in ServletContext.startInternal, this can be used for some scheduling service. I think using a thread for this filter is "ok" until it is removed once/if the feature is added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about this compromise: in init store a scheduledexecutorservice (one thread) in the servletcontext if it does not already exists else reuse, will at least enable to chain filters without recreating it, wdyt? acceptable compromise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing Tomcat's utility executor is only doable when the security manager is not enabled. This is now done with a new Servlet context attribute in Tomcat trunk (11) since no security manager at all.
In all other cases, the filter should use a thread as it is doing right now, so there's nothing to fix in the PR in that area.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmaucher hmm, there is hat I mention in my previous comment ie use a scheduled executor service and not an active awaiting impl plus share the executor between the filters in the app at least (if the tomcat's one is too hard to reach, this part is ok) to avoid to consume 3 threads (and related resources) for nothing which can be an issue in some loaded env, in particular when pools are not fixed (ThreadPoolExecutor with max >> core size) or other needs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I backported to 10.1 and 9.0, enabled when there's no security manager. This should be good enough and this filter can now use it instead of its dedicated thread (after merging it in is fine by me, this is still a minor improvement overall), falling back to creating an executor if not available.

* a maintenance thread cleans up keys that are prefixed by previous timestamp
* buckets.
*/
public class TimeBucketCounter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it be worth using an interface there? This is one impl but not sure it fits all cases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also considered that one and that's the reason I added it to the utils package instead of an inner class of the RateLimitFilter, but I think that it would make more sense to extract an interface once we have a good idea for other implementations.

The problem with APIs is that once they are published it's very hard to change or get rid of them, so I think that after some time in the wild we will have a better idea of what it can or should be.

public final int increment(String identifier) {
String key = getCurrentBucketPrefix() + "-" + identifier;
AtomicInteger ai = map.computeIfAbsent(key, v -> new AtomicInteger());
return ai.incrementAndGet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth using updateAndGet or accumulate flavors? here you can end up enabling requests if you do a full rotation having enough requests in the configured window

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea that I had was to allow to configure certain paths with a different "weight", rather than configure a different Filter for different path, e.g. while requests to the homepage are counted as 1, a request for a login page might count as 10.

That way, you can configure the Filter for 300 requests per minute, for example, which will allow for 300 requests to most of the site, but only 30 requests to the login page, mitigating brute force attacks that try to guess a password.

If we do that then updateAndGet() might be more applicable, but again, I want to introduce this filter and get some feedback before adding more features to it.

Also, we have another thread on the PR about back porting this feature, and if we decide to go all the way back to Tomcat 8.5 then this would be another one of the things that needs to be rewritten to run on Java 7.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can easily be mimic-ed for java 7 but point is right now you can disable the rate limit if you get enough requests. If you take the extreme case for a rate limiter - DDoS - then you can just not reach this goal so think it is worth ensuring you don't go > max+1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmannibucau Perhaps I'm missing something. Can you please explain how updateAndGet() is better than incrementAndGet()?

Looking at the implementation of both they look very similar other than the fact that updateAndGet() would have to call updateFunction.applyAsInt() which would add unnecessary overhead. Both implementations call weakCompareAndSetInt().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overhead will be 0 since it will be inlined at some point but the function enables to set the max instead of incrementing and getting the issue i mention. Alternatively you can workaround it by doing if (incrAndGet > max) v.set(max+1), it is no more atomic but likely saner than nothing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmannibucau But you still didn't explain how that would be better than the current code and didn't post any references. We should always push better code when possible, but can you explain what problem that change would solve and how? Thank you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isapir well I tried to explain in your case if you have a long window (for business reason, it is a common case) you can overflow "int.max" and start allowing requests which should be forbidden whereas using an atomic "update" instead of blind increment logic enables to not increment when you already are higher than the max (therefore my proposal to reset the counter to max+1 when increment > max+1). This is easy to get in an unit test even if "slow".

Copy link
Member Author

@isapir isapir May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To exceed Integer.MAX_VALUE (2,147,483,647) in a day, for example, one would need to hit the server at the rate of almost 25,000 Requests per Second. Handling configurations that would allow 25,000 Requests per Second in a time window of a full day is not a use case for this filter, and such configuration should be strongly discouraged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok-ish but so should maybe be written as a limitation in the filter class javadoc cause rate limiting is way more generic than current impl (which is ok but reading the doc it is not obvious IMHO).

throws IOException, ServletException {

String ipAddr = request.getRemoteAddr();
int reqCount = bucketCounter.increment(ipAddr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt the "key" be configurable? thinking to common cases where it often chains (order can vary):

  1. global (key=constant=app)
  2. user (key=req.principal.name)
  3. ip

So having it configurable enables to get this behavior without overriding it, wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered something similar at first, but I didn't want to over complicate the first commit. More features can be added later, for sure.

@isapir isapir self-assigned this May 4, 2023
@isapir isapir merged commit e2dd77b into main May 6, 2023
4 checks passed
@isapir isapir deleted the ratelimit-filter branch May 6, 2023 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants