Merge pull request #10211 from IQSS/9356-rate-limiting-command-engine
adding rate limiting for command engine
landreev committed Mar 20, 2024
2 parents 4f46d15 + a9b2514 commit cb3bd0e
Showing 22 changed files with 960 additions and 11 deletions.
20 changes: 20 additions & 0 deletions doc/release-notes/9356-rate-limiting.md
@@ -0,0 +1,20 @@
## Rate Limiting using JCache (with Hazelcast as provided by Payara)
The option to rate limit has been added to prevent users from overtaxing the system, either deliberately or through runaway automated processes.
Rate limiting can be configured on a tier level, with tier 0 reserved for guest users and tiers 1 and above for authenticated users.
Superuser accounts are exempt from rate limiting.
Rate limits can be imposed on command APIs by configuring the tier, the command, and the hourly limit in the database.
Two database settings configure the rate limiting.
Note: If either of these settings exists in the database, rate limiting will be enabled.
If neither setting exists, rate limiting is disabled.

`:RateLimitingDefaultCapacityTiers` is a comma separated list of default values for each tier.
In the following example, the default for tier `0` (guest users) is set to 10,000 calls per command per hour and tier `1` (authenticated users) is set to 20,000 calls per command per hour.
Tiers not specified in this setting will default to `-1` (no limit). For example, `-d "10000"` is equivalent to `-d "10000,-1,-1,..."`.
`curl http://localhost:8080/api/admin/settings/:RateLimitingDefaultCapacityTiers -X PUT -d '10000,20000'`
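
The fallback rule described above can be sketched in a few lines of Java. This is only an illustration of how a value such as `10000,20000` might be interpreted; the class `DefaultCapacityTiers` and its methods are hypothetical names and are not taken from the Dataverse code base.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the actual Dataverse implementation.
class DefaultCapacityTiers {
    static final int NO_LIMIT = -1;

    // Parse a setting such as "10000,20000" into per-tier hourly capacities.
    static int[] parse(String setting) {
        if (setting == null || setting.isBlank()) {
            return new int[0];
        }
        return Arrays.stream(setting.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Tiers beyond the configured list fall back to -1 (no limit).
    static int capacityForTier(int[] tiers, int tier) {
        return (tier >= 0 && tier < tiers.length) ? tiers[tier] : NO_LIMIT;
    }

    public static void main(String[] args) {
        int[] tiers = parse("10000,20000");
        System.out.println(capacityForTier(tiers, 0)); // 10000
        System.out.println(capacityForTier(tiers, 1)); // 20000
        System.out.println(capacityForTier(tiers, 2)); // -1
    }
}
```

With the example value, tier 0 resolves to 10000, tier 1 to 20000, and any higher tier to `-1`.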

`:RateLimitingCapacityByTierAndAction` is a JSON object specifying the rate by tier and a list of actions (commands).
This allows for more control over the rate limit of individual API command calls.
In the following example, calls made by a guest user (tier 0) to the API command `GetLatestPublishedDatasetVersionCommand` are further limited to 10 calls per hour, while an authenticated user (tier 1) can make 30 calls per hour to the same command.
`curl http://localhost:8080/api/admin/settings/:RateLimitingCapacityByTierAndAction -X PUT -d '[{"tier": 0, "limitPerHour": 10, "actions": ["GetLatestPublishedDatasetVersionCommand", "GetPrivateUrlCommand", "GetDatasetCommand", "GetLatestAccessibleDatasetVersionCommand"]}, {"tier": 0, "limitPerHour": 1, "actions": ["CreateGuestbookResponseCommand", "UpdateDatasetVersionCommand", "DestroyDatasetCommand", "DeleteDataFileCommand", "FinalizeDatasetPublicationCommand", "PublishDatasetCommand"]}, {"tier": 1, "limitPerHour": 30, "actions": ["CreateGuestbookResponseCommand", "GetLatestPublishedDatasetVersionCommand", "GetPrivateUrlCommand", "GetDatasetCommand", "GetLatestAccessibleDatasetVersionCommand", "UpdateDatasetVersionCommand", "DestroyDatasetCommand", "DeleteDataFileCommand", "FinalizeDatasetPublicationCommand", "PublishDatasetCommand"]}]'`

Hazelcast is configured in Payara and should not need any changes for this feature.
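
How the two settings might combine can be sketched as follows: an entry matching both the tier and the action wins, otherwise the tier default applies, otherwise there is no limit. The class `ActionLimits`, its nested `Rule` type, and the action name `SomeOtherCommand` are hypothetical and exist only for this illustration. The superuser exemption is not modeled here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; these are not the actual Dataverse classes.
class ActionLimits {
    static final int NO_LIMIT = -1;

    // One entry of the :RateLimitingCapacityByTierAndAction setting.
    static final class Rule {
        final int tier;
        final int limitPerHour;
        final List<String> actions;

        Rule(int tier, int limitPerHour, String... actions) {
            this.tier = tier;
            this.limitPerHour = limitPerHour;
            this.actions = Arrays.asList(actions);
        }
    }

    // Prefer a rule matching both tier and action, then the tier default, then no limit.
    static int limitFor(List<Rule> rules, int[] tierDefaults, int tier, String action) {
        for (Rule rule : rules) {
            if (rule.tier == tier && rule.actions.contains(action)) {
                return rule.limitPerHour;
            }
        }
        return (tier >= 0 && tier < tierDefaults.length) ? tierDefaults[tier] : NO_LIMIT;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule(0, 10, "GetLatestPublishedDatasetVersionCommand", "GetDatasetCommand"),
                new Rule(0, 1, "PublishDatasetCommand"),
                new Rule(1, 30, "GetLatestPublishedDatasetVersionCommand", "PublishDatasetCommand"));
        int[] defaults = {10000, 20000};
        System.out.println(limitFor(rules, defaults, 0, "GetLatestPublishedDatasetVersionCommand"));
        System.out.println(limitFor(rules, defaults, 1, "GetLatestPublishedDatasetVersionCommand"));
        System.out.println(limitFor(rules, defaults, 0, "SomeOtherCommand"));
    }
}
```

Under these assumptions a guest calling `GetLatestPublishedDatasetVersionCommand` gets the per-action limit of 10, a tier 1 user gets 30, and an action that is not listed falls back to the tier default.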
@@ -0,0 +1,40 @@
[
{
"tier": 0,
"limitPerHour": 10,
"actions": [
"GetLatestPublishedDatasetVersionCommand",
"GetPrivateUrlCommand",
"GetDatasetCommand",
"GetLatestAccessibleDatasetVersionCommand"
]
},
{
"tier": 0,
"limitPerHour": 1,
"actions": [
"CreateGuestbookResponseCommand",
"UpdateDatasetVersionCommand",
"DestroyDatasetCommand",
"DeleteDataFileCommand",
"FinalizeDatasetPublicationCommand",
"PublishDatasetCommand"
]
},
{
"tier": 1,
"limitPerHour": 30,
"actions": [
"CreateGuestbookResponseCommand",
"GetLatestPublishedDatasetVersionCommand",
"GetPrivateUrlCommand",
"GetDatasetCommand",
"GetLatestAccessibleDatasetVersionCommand",
"UpdateDatasetVersionCommand",
"DestroyDatasetCommand",
"DeleteDataFileCommand",
"FinalizeDatasetPublicationCommand",
"PublishDatasetCommand"
]
}
]
41 changes: 41 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -1373,6 +1373,33 @@ Before being moved there,
on your machine, large file uploads via API will cause RAM and/or swap usage bursts. You might want to point this to
a different location, restrict maximum size of it, and monitor for stale uploads.

.. _cache-rate-limiting:

Configure Your Dataverse Installation to Use JCache (with Hazelcast as Provided by Payara) for Rate Limiting
------------------------------------------------------------------------------------------------------------

Rate limiting has been added to prevent users from overtaxing the system, either deliberately or through runaway automated processes.
Rate limiting can be configured on a tier level, with tier 0 reserved for guest users and tiers 1 and above for authenticated users.
Superuser accounts are exempt from rate limiting.
Rate limits can be imposed on command APIs by configuring the tier, the command, and the hourly limit in the database.
Two database settings configure the rate limiting.
Note: If either of these settings exists in the database, rate limiting will be enabled (a Payara restart is required for the setting to take effect). If neither setting exists, rate limiting is disabled.

- :RateLimitingDefaultCapacityTiers is the number of calls allowed per hour if the specific command is not configured. The values represent the number of calls per hour per user for tiers 0,1,...
  A value of -1 can be used to signify no rate limit. Tiers not specified in this setting will default to ``-1`` (no limit). For example, ``-d "10000"`` is equivalent to ``-d "10000,-1,-1,..."``.

.. code-block:: bash

  curl http://localhost:8080/api/admin/settings/:RateLimitingDefaultCapacityTiers -X PUT -d '10000,20000'

- :RateLimitingCapacityByTierAndAction is a JSON object specifying the rate by tier and a list of actions (commands). This allows for more control over the rate limit of individual API command calls.
  In the following example, calls made by a guest user (tier 0) to the API command GetLatestPublishedDatasetVersionCommand are further limited to 10 calls per hour, while an authenticated user (tier 1) can make 30 calls per hour to the same command.

:download:`rate-limit-actions.json </_static/installation/files/examples/rate-limit-actions-setting.json>` Example JSON for :RateLimitingCapacityByTierAndAction

.. code-block:: bash

  curl http://localhost:8080/api/admin/settings/:RateLimitingCapacityByTierAndAction -X PUT -d '[{"tier": 0, "limitPerHour": 10, "actions": ["GetLatestPublishedDatasetVersionCommand", "GetPrivateUrlCommand", "GetDatasetCommand", "GetLatestAccessibleDatasetVersionCommand"]}, {"tier": 0, "limitPerHour": 1, "actions": ["CreateGuestbookResponseCommand", "UpdateDatasetVersionCommand", "DestroyDatasetCommand", "DeleteDataFileCommand", "FinalizeDatasetPublicationCommand", "PublishDatasetCommand"]}, {"tier": 1, "limitPerHour": 30, "actions": ["CreateGuestbookResponseCommand", "GetLatestPublishedDatasetVersionCommand", "GetPrivateUrlCommand", "GetDatasetCommand", "GetLatestAccessibleDatasetVersionCommand", "UpdateDatasetVersionCommand", "DestroyDatasetCommand", "DeleteDataFileCommand", "FinalizeDatasetPublicationCommand", "PublishDatasetCommand"]}]'

.. _Branding Your Installation:

@@ -4496,3 +4523,17 @@ tab. files saved with these headers on S3 - since they no longer have
to be generated and added to the streamed file on the fly.

The setting is ``false`` by default, preserving the legacy behavior.

:RateLimitingDefaultCapacityTiers
+++++++++++++++++++++++++++++++++
Number of calls allowed per hour if the specific command is not configured. The values represent the number of calls per hour per user for tiers 0,1,...
A value of -1 signifies no rate limit. A tier that is not defined in this setting also defaults to no limit.

:RateLimitingCapacityByTierAndAction
++++++++++++++++++++++++++++++++++++
JSON object specifying the rate by tier and a list of actions (commands). This allows for more control over the rate limit of individual API command calls.
In the following example, calls made by a guest user (tier 0) to the API command GetLatestPublishedDatasetVersionCommand are further limited to 10 calls per hour, while an authenticated user (tier 1) can make 30 calls per hour to the same command.
{"rateLimits":[
{"tier": 0, "limitPerHour": 10, "actions": ["GetLatestPublishedDatasetVersionCommand", "GetPrivateUrlCommand", "GetDatasetCommand", "GetLatestAccessibleDatasetVersionCommand"]},
{"tier": 0, "limitPerHour": 1, "actions": ["CreateGuestbookResponseCommand", "UpdateDatasetVersionCommand", "DestroyDatasetCommand", "DeleteDataFileCommand", "FinalizeDatasetPublicationCommand", "PublishDatasetCommand"]},
{"tier": 1, "limitPerHour": 30, "actions": ["CreateGuestbookResponseCommand", "GetLatestPublishedDatasetVersionCommand", "GetPrivateUrlCommand", "GetDatasetCommand", "GetLatestAccessibleDatasetVersionCommand", "UpdateDatasetVersionCommand", "DestroyDatasetCommand", "DeleteDataFileCommand", "FinalizeDatasetPublicationCommand", "PublishDatasetCommand"]}]}
21 changes: 21 additions & 0 deletions pom.xml
@@ -210,6 +210,18 @@
<scope>provided</scope>
</dependency>

<!-- JSON-B -->
<dependency>
<groupId>jakarta.json.bind</groupId>
<artifactId>jakarta.json.bind-api</artifactId>
</dependency>
<!-- Rope in an implementation for unit tests - is provided at runtime by appserver -->
<dependency>
<groupId>org.eclipse</groupId>
<artifactId>yasson</artifactId>
<scope>test</scope>
</dependency>

<!-- Jakarta Faces & related -->
<dependency>
<groupId>org.glassfish</groupId>
@@ -542,6 +554,10 @@
<artifactId>dataverse-spi</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>javax.cache</groupId>
<artifactId>cache-api</artifactId>
</dependency>
<!-- TESTING DEPENDENCIES -->
<dependency>
<groupId>org.junit.jupiter</groupId>
@@ -653,6 +669,11 @@
<version>3.9.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.hazelcast</groupId>
<artifactId>hazelcast</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<testResources>
9 changes: 8 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -24,6 +24,7 @@
import edu.harvard.iq.dataverse.engine.command.Command;
import edu.harvard.iq.dataverse.engine.command.CommandContext;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
import edu.harvard.iq.dataverse.engine.command.impl.CheckRateLimitForDatasetPage;
import edu.harvard.iq.dataverse.engine.command.impl.CreatePrivateUrlCommand;
import edu.harvard.iq.dataverse.engine.command.impl.CuratePublishedDatasetVersionCommand;
import edu.harvard.iq.dataverse.engine.command.impl.DeaccessionDatasetVersionCommand;
@@ -36,6 +37,7 @@
import edu.harvard.iq.dataverse.engine.command.impl.PublishDataverseCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand;
import edu.harvard.iq.dataverse.export.ExportService;
import edu.harvard.iq.dataverse.util.cache.CacheFactoryBean;
import io.gdcc.spi.export.ExportException;
import io.gdcc.spi.export.Exporter;
import edu.harvard.iq.dataverse.ingest.IngestRequest;
@@ -242,6 +244,8 @@ public enum DisplayMode {
SolrClientService solrClientService;
@EJB
DvObjectServiceBean dvObjectService;
@EJB
CacheFactoryBean cacheFactory;
@Inject
DataverseRequestServiceBean dvRequestService;
@Inject
@@ -1930,7 +1934,10 @@ private void setIdByPersistentId() {
}

private String init(boolean initFull) {

// Check for rate limit exceeded. Must be done before anything else to prevent unnecessary processing.
if (!cacheFactory.checkRate(session.getUser(), new CheckRateLimitForDatasetPage(null,null))) {
return BundleUtil.getStringFromBundle("command.exception.user.ratelimited", Arrays.asList(CheckRateLimitForDatasetPage.class.getSimpleName()));
}
//System.out.println("_YE_OLDE_QUERY_COUNTER_"); // for debug purposes
setDataverseSiteUrl(systemConfig.getDataverseSiteUrl());

10 changes: 9 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DataversePage.java
@@ -9,6 +9,7 @@
import edu.harvard.iq.dataverse.dataverse.DataverseUtil;
import edu.harvard.iq.dataverse.engine.command.Command;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
import edu.harvard.iq.dataverse.engine.command.impl.CheckRateLimitForCollectionPage;
import edu.harvard.iq.dataverse.engine.command.impl.CreateDataverseCommand;
import edu.harvard.iq.dataverse.engine.command.impl.CreateSavedSearchCommand;
import edu.harvard.iq.dataverse.engine.command.impl.DeleteDataverseCommand;
@@ -31,6 +32,8 @@
import static edu.harvard.iq.dataverse.util.JsfHelper.JH;
import edu.harvard.iq.dataverse.util.SystemConfig;
import java.util.List;

import edu.harvard.iq.dataverse.util.cache.CacheFactoryBean;
import jakarta.ejb.EJB;
import jakarta.faces.application.FacesMessage;
import jakarta.faces.context.FacesContext;
@@ -118,6 +121,8 @@ public enum LinkMode {
@Inject DataverseHeaderFragment dataverseHeaderFragment;
@EJB
PidProviderFactoryBean pidProviderFactoryBean;
@EJB
CacheFactoryBean cacheFactory;

private Dataverse dataverse = new Dataverse();

@@ -318,7 +323,10 @@ public void updateOwnerDataverse() {

public String init() {
//System.out.println("_YE_OLDE_QUERY_COUNTER_"); // for debug purposes

// Check for rate limit exceeded. Must be done before anything else to prevent unnecessary processing.
if (!cacheFactory.checkRate(session.getUser(), new CheckRateLimitForCollectionPage(null,null))) {
return BundleUtil.getStringFromBundle("command.exception.user.ratelimited", Arrays.asList(CheckRateLimitForCollectionPage.class.getSimpleName()));
}
if (this.getAlias() != null || this.getId() != null || this.getOwnerId() == null) {// view mode for a dataverse
if (this.getAlias() != null) {
dataverse = dataverseService.findByAlias(this.getAlias());
12 changes: 10 additions & 2 deletions src/main/java/edu/harvard/iq/dataverse/EjbDataverseEngine.java
@@ -4,6 +4,7 @@
import edu.harvard.iq.dataverse.actionlogging.ActionLogServiceBean;
import edu.harvard.iq.dataverse.authorization.AuthenticationServiceBean;
import edu.harvard.iq.dataverse.authorization.providers.builtin.BuiltinUserServiceBean;
import edu.harvard.iq.dataverse.util.cache.CacheFactoryBean;
import edu.harvard.iq.dataverse.engine.DataverseEngine;
import edu.harvard.iq.dataverse.authorization.Permission;
import edu.harvard.iq.dataverse.authorization.groups.GroupServiceBean;
@@ -16,6 +17,7 @@
import edu.harvard.iq.dataverse.engine.command.DataverseRequest;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
import edu.harvard.iq.dataverse.engine.command.exception.PermissionException;
import edu.harvard.iq.dataverse.engine.command.exception.RateLimitCommandException;
import edu.harvard.iq.dataverse.ingest.IngestServiceBean;
import edu.harvard.iq.dataverse.pidproviders.PidProviderFactoryBean;
import edu.harvard.iq.dataverse.privateurl.PrivateUrlServiceBean;
@@ -176,7 +178,9 @@ public class EjbDataverseEngine {

@EJB
EjbDataverseEngineInner innerEngine;


@EJB
CacheFactoryBean cacheFactory;

@Resource
EJBContext ejbCtxt;
@@ -202,7 +206,11 @@ public <R> R submit(Command<R> aCommand) throws CommandException {

try {
logRec.setUserIdentifier( aCommand.getRequest().getUser().getIdentifier() );

// Check for rate limit exceeded. Must be done before anything else to prevent unnecessary processing.
if (!cacheFactory.checkRate(aCommand.getRequest().getUser(), aCommand)) {
throw new RateLimitCommandException(BundleUtil.getStringFromBundle("command.exception.user.ratelimited", Arrays.asList(aCommand.getClass().getSimpleName())), aCommand);
}

// Check permissions - or throw an exception
Map<String, ? extends Set<Permission>> requiredMap = aCommand.getRequiredPermissions();
if (requiredMap == null) {
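
The body of `cacheFactory.checkRate` is not part of the hunks shown here. As a rough mental model only, an hourly limit per user and command could be enforced with a fixed-window counter like the sketch below. Everything in it is an assumption made for illustration: the class `HourlyRateLimiter`, the use of a `ConcurrentHashMap` in place of a JCache/Hazelcast cache, the fixed-window strategy, and the injectable clock. The real bean may use a different algorithm.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical fixed-window limiter; a stand-in for a cache-backed check, not Dataverse code.
class HourlyRateLimiter {
    private static final long WINDOW_MILLIS = 60L * 60L * 1000L;

    // Value layout: [0] = window start in millis, [1] = calls counted in that window.
    private final Map<String, long[]> windows = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    HourlyRateLimiter(LongSupplier clock) {
        this.clock = clock;
    }

    // Returns true if the call is allowed; a negative limit means "no limit".
    boolean checkRate(String userId, String action, int limitPerHour) {
        if (limitPerHour < 0) {
            return true;
        }
        long now = clock.getAsLong();
        boolean[] allowed = {false};
        windows.compute(userId + ":" + action, (key, window) -> {
            long[] current = window;
            if (current == null || now - current[0] >= WINDOW_MILLIS) {
                current = new long[] {now, 0L}; // start a fresh one-hour window
            }
            if (current[1] < limitPerHour) {
                current[1]++;
                allowed[0] = true;
            }
            return current;
        });
        return allowed[0];
    }
}
```

A caller would construct it with `System::currentTimeMillis` as the clock and treat a `false` result the way `submit` does above, by refusing the command before any permission checks run.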
4 changes: 3 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/UserServiceBean.java
@@ -147,6 +147,8 @@ private AuthenticatedUser createAuthenticatedUserForView (Object[] dbRowValues,
user.setMutedEmails(Type.tokenizeToSet((String) dbRowValues[15]));
// mutedNotifications follows mutedEmails in the SELECT list, so it is read from index 16
user.setMutedNotifications(Type.tokenizeToSet((String) dbRowValues[16]));

user.setRateLimitTier((int)dbRowValues[17]);

user.setRoles(roles);
return user;
}
@@ -419,7 +421,7 @@ private List<Object[]> getUserListCore(String searchTerm,
qstr += " u.createdtime, u.lastlogintime, u.lastapiusetime, ";
qstr += " prov.id, prov.factoryalias, ";
qstr += " u.deactivated, u.deactivatedtime, ";
- qstr += " u.mutedEmails, u.mutedNotifications ";
+ qstr += " u.mutedEmails, u.mutedNotifications, u.rateLimitTier ";
qstr += " FROM authenticateduser u,";
qstr += " authenticateduserlookup prov_lookup,";
qstr += " authenticationproviderrow prov";
13 changes: 10 additions & 3 deletions src/main/java/edu/harvard/iq/dataverse/api/AbstractApiBean.java
@@ -20,6 +20,7 @@
import edu.harvard.iq.dataverse.engine.command.impl.GetLatestAccessibleDatasetVersionCommand;
import edu.harvard.iq.dataverse.engine.command.impl.GetLatestPublishedDatasetVersionCommand;
import edu.harvard.iq.dataverse.engine.command.impl.GetSpecificPublishedDatasetVersionCommand;
import edu.harvard.iq.dataverse.engine.command.exception.RateLimitCommandException;
import edu.harvard.iq.dataverse.externaltools.ExternalToolServiceBean;
import edu.harvard.iq.dataverse.license.LicenseServiceBean;
import edu.harvard.iq.dataverse.locality.StorageSiteServiceBean;
@@ -421,7 +422,7 @@ public Command<DatasetVersion> handleLatestPublished() {
}));
return dsv;
}

protected DataFile findDataFileOrDie(String id) throws WrappedResponse {
DataFile datafile;
if (id.equals(PERSISTENT_ID_KEY)) {
@@ -575,6 +576,8 @@ protected <T> T execCommand( Command<T> cmd ) throws WrappedResponse {
try {
return engineSvc.submit(cmd);

} catch (RateLimitCommandException ex) {
throw new WrappedResponse(rateLimited(ex.getMessage()));
} catch (IllegalCommandException ex) {
//for 8859 for api calls that try to update datasets with TOA out of compliance
if (ex.getMessage().toLowerCase().contains("terms of use")){
@@ -776,11 +779,15 @@ protected Response notFound( String msg ) {
protected Response badRequest( String msg ) {
return error( Status.BAD_REQUEST, msg );
}

protected Response forbidden( String msg ) {
return error( Status.FORBIDDEN, msg );
}


protected Response rateLimited( String msg ) {
return error( Status.TOO_MANY_REQUESTS, msg );
}

protected Response conflict( String msg ) {
return error( Status.CONFLICT, msg );
}
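
Because `rateLimited` maps to `Status.TOO_MANY_REQUESTS` (HTTP 429), API clients may want to treat that status as a signal to back off. The sketch below shows one possible client-side pattern. The class `RetryOn429` and the idea of passing the request as an `IntSupplier` that returns a status code are assumptions made so the example stays self-contained; they are not part of this commit.

```java
import java.util.function.IntSupplier;

// Hypothetical client-side helper; not part of the Dataverse code base.
class RetryOn429 {
    static final int TOO_MANY_REQUESTS = 429;

    // Repeats `request` while it returns 429, doubling the delay between attempts,
    // and returns the last status seen once attempts run out or another status arrives.
    static int callWithBackoff(IntSupplier request, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == TOO_MANY_REQUESTS; attempt++) {
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return status;
            }
            delay *= 2;
            status = request.getAsInt();
        }
        return status;
    }
}
```

Since the limits in this feature are hourly, a real client would likely need much longer waits than a typical short backoff, or would simply stop and report the error.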
