
9361 storage quotas #9409

Merged
21 commits merged into develop from 9361-storage-quotas on Sep 29, 2023
Conversation

@landreev (Contributor) commented Feb 24, 2023

What this PR does / why we need it:
(Please note the draft nature of this PR: it is a work in progress, and the model/approach needs to be reviewed before I can produce a finalized version.)

As explained in the issue, this PR will not result in a feature that's ready to be used in real life for enforcing meaningful storage quotas. What it does is add the quota check to the upload workflow; to test the concept, it provides a mechanism where the same storage quota limit can be defined for all users of the installation (and only the files directly created by the user are counted towards the quota). Again, this is not going to be useful as a practical solution, but it will allow us to further expand the quota definition mechanism once the checks are there. This was meant to be a sensible sprint-or-so-sized chunk of work. The PR also moves some file unpacking and processing code, which turns uploads into Dataverse DataFile objects, out of a static utility method and into its own dedicated Command (so that files can eventually become subject to command engine-side metering too).
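To make the model concrete, here is a minimal, self-contained sketch of the quota check concept described above. This is not the PR's actual code; all names here are hypothetical and purely illustrative.

public class QuotaCheckSketch {
    /** Hypothetical lookup: total bytes used by files this user created. */
    interface UsageLookup {
        long bytesUsedByCreator(long userId);
    }

    /**
     * Rejects an upload if saving sizeInBytes more bytes would put the user
     * over the installation-wide quota (a null quota means nothing to enforce).
     */
    static void checkQuota(UsageLookup usage, long userId, long sizeInBytes, Long quotaInBytes) {
        if (quotaInBytes == null) {
            return; // no quota defined
        }
        long used = usage.bytesUsedByCreator(userId);
        if (used + sizeInBytes > quotaInBytes) {
            throw new IllegalStateException("Upload of " + sizeInBytes
                    + " bytes would exceed the storage quota of " + quotaInBytes
                    + " bytes (already used: " + used + ")");
        }
    }
}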

Which issue(s) this PR closes:

Closes #9361

Special notes for your reviewer:

Suggestions on how to test this:

As stated elsewhere in the PR (see the long-ish comment from Mar 17, in addition to the description above), all that's implemented here are the added mechanisms in the upload pipeline that prevent a user from uploading more content once they exceed the quota. The mechanism for defining the actual quota is essentially a placeholder, just enough to test these new hooks, but not yet usable as a real-life practical solution.
So a test scenario would be to set an arbitrary limit via the :StorageQuotaSizeInBytes setting and make sure you can't upload more than that once the limit is reached.
Additionally: the second setting, :UseStorageQuotas, activates the enforcement of the quota (defaults to false).
I can answer any additional questions etc.
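To illustrate how the two settings interact, here is a hedged sketch; the plain Map stands in for the settings service, so none of this is Dataverse's actual API.

class QuotaSettingsSketch {
    // :UseStorageQuotas defaults to false; while it is false, no quota applies
    static Long effectiveQuotaInBytes(java.util.Map<String, String> settings) {
        boolean enforce = Boolean.parseBoolean(settings.getOrDefault(":UseStorageQuotas", "false"));
        if (!enforce) {
            return null; // enforcement disabled
        }
        String size = settings.get(":StorageQuotaSizeInBytes");
        return (size == null) ? null : Long.valueOf(size);
    }
}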

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@@ -1285,6 +861,11 @@ public static File saveInputStreamInTempFile(InputStream inputStream, Long fileS
throw new FileExceedsMaxSizeException (MessageFormat.format(BundleUtil.getStringFromBundle("file.addreplace.error.file_exceeds_limit"), bytesToHumanReadable(fileSize), bytesToHumanReadable(fileSizeLimit)));
}

if (storageQuotaLimit != null && fileSize > storageQuotaLimit) {
try {tempFile.toFile().delete();} catch (Exception ex) {}


🚫 [reviewdog] <com.puppycrawl.tools.checkstyle.checks.blocks.LeftCurlyCheck> reported by reviewdog 🐶
'{' at column 21 should have line break after.

try {
Charset charset = null;
/*
TODO: (?)


🚫 [reviewdog] <com.puppycrawl.tools.checkstyle.checks.whitespace.FileTabCharacterCheck> reported by reviewdog 🐶
File contains tab characters (this is the first instance).

@coveralls commented Feb 27, 2023

Coverage Status

coverage: 20.01% (-0.04%) from 20.045% when pulling 7ab736c on 9361-storage-quotas into 4e1bc5b on develop.

@scolapasta scolapasta moved this from Ready for Review ⏩ to Review 🔎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Mar 1, 2023
@scolapasta scolapasta added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Mar 1, 2023
@scolapasta (Contributor) left a comment:

Overall, this looks great - I did leave a few questions as comments, most not too important. The biggest is how we actually want to define quotas, which leads to the question of what the next steps are. We should discuss that and then add the summary here. (E.g., one option is to test this as is, and then add user-specific quotas (if that's the direction we go) as a new issue / PR; if so, we should also add some integration tests here.)

@@ -140,6 +142,36 @@ public class DataFileServiceBean implements java.io.Serializable {
*/
public static final String MIME_TYPE_PACKAGE_FILE = "application/vnd.dataverse.file-package";

public class UserStorageQuota {

Should we move this out to be its own class? (I.e., is there any reason for it to be an inner class here?)

try {
Long totalSize = (Long)query.setParameter("creatorId", user.getId()).getSingleResult();
logger.info("total size for user: "+totalSize);
return totalSize == null ? 0L : totalSize;

When would this be null (as opposed to an exception being thrown)? Is this where we, for some reason, don't have sizes stored?
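(Editor's note: one grounded answer is that a JPQL aggregate SUM returns null, rather than 0, when no rows match - e.g., for a user who has not created any DataFiles yet. A minimal sketch of the same lookup, reusing the em and user variables from the snippet above:)

// SUM over zero matching rows yields a null Long, hence the null -> 0L mapping
Long totalSize = (Long) em.createQuery(
        "SELECT SUM(o.filesize) FROM DataFile o WHERE o.creator.id=:creatorId")
        .setParameter("creatorId", user.getId())
        .getSingleResult(); // null when the user has created no files
long used = (totalSize == null) ? 0L : totalSize;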


public boolean isStorageQuotaEnforced() {
return maxTotalUploadSizeInBytes != null;
}

public Long getMaxIngestSizeInBytes() {

Should we reconsider naming this, since it does more now?

@@ -1657,4 +1689,29 @@ public Embargo findEmbargo(Long id) {
DataFile d = find(id);
return d.getEmbargo();
}

public Long getStorageUsageByCreator(AuthenticatedUser user) {
Query query = em.createQuery("SELECT SUM(o.filesize) FROM DataFile o WHERE o.creator.id=:creatorId");

This makes me think (as you point out) about how we even decide to define the limits: by user? by dataset? For example, a workaround for a per-user limit would be to create a new account and give it permission.

@@ -155,6 +160,7 @@
fileLimit="#{EditDatafilesPage.getMaxNumberOfFiles()}"
invalidSizeMessage="#{bundle['file.edit.error.file_exceeds_limit']}"
sequential="true"
previewWidth="-1"

What does this do?

@landreev (Contributor Author) replied:

TL;DR it disables the browser-side thumbnail previews.

I was delighted to discover that, as of a couple of recent versions of PrimeFaces, it is now possible to disable their own image thumbnail preview in the upload component. It used to drive me nuts - we don't really need it (because we generate our own thumbnails server-side); but, unlike with our thumbnails, there's no way to define a size limit, so attempting to upload a bunch of large enough images paralyzes the user's browser (since it's all JavaScript). And, adding insult to injury, the user assumes that it's our site that's crapping the bed... I am embarrassed to acknowledge that, despite all that righteous indignation, I somehow missed the point at which they added the ability to disable the whole thing, by setting previewWidth to -1.

@landreev landreev moved this from Review 🔎 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Mar 15, 2023
@landreev landreev moved this from IQSS Team - In Progress 💻 to Review 🔎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Mar 15, 2023
@landreev landreev moved this from Review 🔎 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Mar 15, 2023
@mreekie commented Mar 15, 2023

grooming

  • It appears this will bleed over from the March 1 sprint to the March 15 sprint.

@mreekie mreekie added the D: FixRateLimitingBehaviors Address this collection of issues that impact rate limiting label Mar 15, 2023
@mreekie commented Mar 15, 2023

grooming

  • It appears this will bleed over from the March 1 sprint to the March 15 sprint.

Leonid noted no change in size for the March 15 sprint.

@landreev (Contributor Author) commented:
> Overall, this looks great - I did leave a few questions as comments, most not too important. The biggest is how we actually want to define quotas, which leads to the question of what the next steps are. We should discuss that and then add the summary here. (E.g., one option is to test this as is, and then add user-specific quotas (if that's the direction we go) as a new issue / PR; if so, we should also add some integration tests here.)

combining this with another question:

> ... how we even decide to define the limits?

The linked issue #9361 lists 4 other issues in which specific types of quota definitions were requested: per collection, per dataset, per user, and even per-user-per-day. Per-group quotas have never been requested, I believe - and I'm not sure they are needed (per-dataset/collection quotas may be enough to cover the practical scenario of a group of users - a research project, a department, etc. - working together depositing data?).

I'm assuming the answer is that we want all of the above. And in each case the answer - how much this user can add to the storage during this specific upload session - will be based on checking all the possible quotas that may apply. Somewhat similar to how our permissions work: we check for permissions via direct role assignments, combine them with group permissions, etc. Although I very much hope this does not need to be as complex as our permission structure, with all the complicated recursive db queries there. It may be possible to keep it relatively simple because, unlike with permissions, we probably don't need/want to apply the most generous of the possibilities. I mean, you can access a restricted file with either a group or a user permission, whichever you may have. With the quotas, however, I'm assuming we can get away with having a clear precedence structure. For example, the quota on the dvobject (dataset, collection), if defined, beats everything else (if not defined, check whether the parent has a quota, and repeat). If none are defined on the dataset/collection, then the user's own quota applies (both total and per day, possibly); if that's not defined either, then check if the installation has a default user quota and/or default dataset/collection size. (Something like that - just an example of a possible scheme; see the sketch below.)
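A minimal sketch of that precedence scheme, under the stated assumptions; none of these types or methods exist in the PR - they are purely illustrative.

class QuotaResolverSketch {
    /** Hypothetical stand-in for a dataset/collection node. */
    interface DvObjectNode {
        Long getQuotaInBytes();   // quota defined directly on this object, or null
        DvObjectNode getParent(); // owning collection, or null at the root
    }

    /**
     * Walks up from the dataset through its parent collections looking for a
     * defined quota; falls back to the user quota, then the installation default.
     */
    static Long resolveQuota(DvObjectNode target, Long userQuota, Long installationDefault) {
        for (DvObjectNode o = target; o != null; o = o.getParent()) {
            if (o.getQuotaInBytes() != null) {
                return o.getQuotaInBytes(); // most specific dvobject quota wins
            }
        }
        return (userQuota != null) ? userQuota : installationDefault;
    }
}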

This PR was an attempt to create a sprint-sized amount of work that could be wrapped up and merged independently. We don't need to know how the quotas are going to be defined; we just assume that we can count on a service that will give us these numbers - how many bytes this user can add during this upload session, to this dataset (plus the total applicable quota, for display and/or error messages). All it tests is the behavior of the upload page and the API, with only enough service-side code provided to be able to test against some defined limit.

As for the next step, I don't think the 4 linked issues can be worked on separately. I.e., it doesn't seem practical to try to add a per-user quota separately from per-collection (not 100% sure though). But this is a natural next step, to actually work on the real, functional quota service.

Though there is another task that may be standalone enough: we need to work out some mechanism for counting used storage space. I don't think we can practically obtain this number in real time on every upload. Going through the physical storage every time is obviously impractical. Calculating it from the database (by adding up the filesizes plus the saved originals) is not truly accurate (it skips many other auxiliary files), although it may be a reasonable approximation - but it is still inefficient to calculate in real time on every request. So I'm assuming we'll need some system of keeping the sizes of datasets and collections cached... Figuring this out and implementing it is definitely a project of its own.
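For illustration, a sketch of the kind of cached-size bookkeeping hinted at above; entirely hypothetical (a real version would persist the counters and handle concurrency at the database level).

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class CachedStorageUseSketch {
    private final ConcurrentHashMap<Long, AtomicLong> bytesByDvObject = new ConcurrentHashMap<>();

    /** On upload, bump the cached totals for the dataset and all its ancestor collections. */
    void recordUpload(long datasetId, List<Long> ancestorCollectionIds, long sizeInBytes) {
        add(datasetId, sizeInBytes);
        for (long collectionId : ancestorCollectionIds) {
            add(collectionId, sizeInBytes);
        }
    }

    /** Cached total for a dataset or collection, without touching physical storage. */
    long cachedSizeInBytes(long dvObjectId) {
        AtomicLong counter = bytesByDvObject.get(dvObjectId);
        return (counter == null) ? 0L : counter.get();
    }

    private void add(long dvObjectId, long delta) {
        bytesByDvObject.computeIfAbsent(dvObjectId, k -> new AtomicLong()).addAndGet(delta);
    }
}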

@mreekie commented Mar 29, 2023

new sprint:

  • No change to size.
  • Was waiting; needed additional input from folks who were out.

@landreev (Contributor Author) commented Apr 4, 2023

Per conversation with @scolapasta last week, we decided it made sense to send the PR through the normal QA pipeline and merge it in its current form, before proceeding with the next steps of the overall effort. I will remove the "draft" designation.

@landreev landreev marked this pull request as ready for review April 4, 2023 15:04
@scolapasta scolapasta removed their assignment Apr 4, 2023
@landreev landreev moved this from IQSS Team - In Progress 💻 to Ready for Review ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Apr 11, 2023
@landreev landreev moved this from Ready for Review ⏩ to Review 🔎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Apr 11, 2023
@landreev landreev moved this from Review 🔎 to Ready for Review ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Apr 11, 2023
@scolapasta scolapasta removed the Status: Waiting issues in need of input from someone currently unavailable label Sep 11, 2023
@scolapasta scolapasta moved this from ▶ SPRINT READY to Ready for QA ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Sep 11, 2023
@scolapasta scolapasta assigned landreev and unassigned landreev and kcondon Sep 11, 2023
@scolapasta (Contributor) commented:
@landreev I'll move this milestone 6.1 issue to "Ready for Review", but I'm assigning you first so that you can handle the 6.0 merge and address any EE10 issues. When done, this one can actually go back to "Ready for QA" (barring major edits).

@scolapasta scolapasta moved this from Ready for QA ⏩ to Ready for Review ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Sep 11, 2023
@landreev (Contributor Author) commented:
I don't think this PR is actually "ready for review" at this point. Just syncing it with develop appears to be a non-trivial task, now that it has sat for so long. So I'll move it to "in progress" to work on that.
But even after the branch is synced and builds under 6.0, I no longer feel it was a good idea to try to merge the PR as it was in the first place. The PR was created as an experiment in merging incomplete/partially implemented pieces; but the whole idea was to merge such incremental parts quickly and keep working on the functionality, so it seems safe to conclude that the experiment failed. I am wondering if we should just change the goal of the issue to delivering working, practically useful quota functionality and put the PR back in development? I don't know - let's talk about this part.

@landreev landreev moved this from Ready for Review ⏩ to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Sep 11, 2023

@landreev landreev moved this from IQSS Team - In Progress 💻 to Ready for QA ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Sep 20, 2023
@kcondon kcondon assigned kcondon and unassigned landreev Sep 26, 2023
@kcondon (Contributor) commented Sep 26, 2023

@landreev I ended up doing a quick smoke test and found that file upload to direct S3 fails:
file_upload_fail_direct_s3_quota_not_config.txt

I also checked the storage quota itself, and it seemed to work with local files and with indirect S3 upload.

@landreev (Contributor Author) commented:
> @landreev I ended up doing a quick smoke test and found that file upload to direct S3 fails: file_upload_fail_direct_s3_quota_not_config.txt
>
> I also checked the storage quota itself, and it seemed to work with local files and with indirect S3 upload.

Good catch! I will look into it.

@landreev landreev self-assigned this Sep 27, 2023
@landreev (Contributor Author) commented:
Fixing the problem with direct upload now (once again, good catch!).

@landreev (Contributor Author) commented:
Direct uploads should be working via the UI now.

@github-actions

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9361-storage-quotas
ghcr.io/gdcc/configbaker:9361-storage-quotas

🚢 See on GHCR. Use by referencing the full name as printed above; mind the registry name.

@kcondon kcondon merged commit c26e1e7 into develop Sep 29, 2023
14 of 15 checks passed
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Ready for QA ⏩ to Done 🚀 Sep 29, 2023
@kcondon kcondon deleted the 9361-storage-quotas branch September 29, 2023 20:19
@pdurbin pdurbin added this to the 6.1 milestone Sep 30, 2023
Labels: D: FixRateLimitingBehaviors (Address this collection of issues that impact rate limiting), Size: 10 (A percentage of a sprint. 7 hours.)
Projects: No status
Development
Successfully merging this pull request may close these issues: Add GENERIC storage quota check to the file upload framework.
7 participants