Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New file usage logic for json_form_widget #3730

Merged
merged 1 commit into from
Dec 23, 2021
Merged

New file usage logic for json_form_widget #3730

merged 1 commit into from
Dec 23, 2021

Conversation

dafeder
Copy link
Member

@dafeder dafeder commented Dec 22, 2021

File usage for uploaded files on the form was broken. This PR:

  • Sets correct entity name and ID
  • Removes DKAN-specific logic (hope is to spin this module off some day)
  • Adds a hook_entity_delete to clean up file usage if the associated node is deleted

QA Steps

  • Create a new dataset via the UI and upload a CSV file
  • Check that the new file appears as expected in admin/content/files, used in 1 place
  • Click on "1 place" and confirm that the correct dataset node is referenced
  • Delete the new dataset
  • Clear cache
  • Confirm that admin/content/files now lists "0 places" for the file.

Issues

Delete hook

The new delete hook could be expensive in certain scenarios -- either sites with a lot of content types and functionality beyond DKAN, or very large catalogs. If we see issues in any implementations or from the community we could add a configuration setting to make the delete behavior optional.

Orphans

Most of the point of this was to make orphaned files go away after deleting a dataset.

If this new behavior is used in combination with setting file.settings:make_unused_managed_files_temporary to true, files will be set to "temporary" status as well once their parent dataset is deleted. This will cause them to be deleted after 6 hours (or whatever you have set in system.file:temporary_maximum_age.

But, maybe we shouldn't do or recommend this for now:

  • This setting is considered dangerous. If DKAN is being used, for instance, on a site that also has a media gallery, it would definitely not be advisable to enable this setting.
  • This file usage functionality is new and could have undiscovered bugs that prevent proper file usage from being recorded; if usage were accidentally set to 0 but a file were still in use it could be deleted unexpectedly.
  • For now, file entities are only created for files uploaded through the form. So we can imagine a scenario where a file is uploaded, and then a second dataset created that points to the same file via URL. That will not get recorded as a usage, so if we delete the first dataset, the file will be orphaned and deleted even though it's still in use by the second dataset. This is obviously an edge case but it does give me some pause.

I'd recommend we add this for now and keep our eye on the file usage. We may want to install file_delete so that we can manually delete files once we see they are unused. In the future we can experiment with the make_unused_managed_files_temporary setting or even adding our own logic to the module that sets the unused json_form_widget files to temporary automatically.

Tests

My understanding is that we still do not have a reliable way to test file uploads with cypress. Also all the code here is in the .module file; I don't actually know how to test this properly.

@dafeder dafeder marked this pull request as ready for review December 22, 2021 01:59
@dafeder dafeder requested review from clayliddell and janette and removed request for clayliddell December 22, 2021 01:59
Copy link
Member

@janette janette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@janette janette merged commit 849c788 into 2.x Dec 23, 2021
@janette janette deleted the file-usage branch December 23, 2021 06:09
@stefan-korn
Copy link
Contributor

Questions about this change:

Is it supposed to solve #3721 too?

The file usage is tied to the dataset entity, while the file is actually tied to a distribution. Deleting a dataset, does not delete the related distributions. Are the distributions with deleting the dataset to be considered obsolete or orphaned and therefore it is okay to remove the file usage although the distribution is still exisiting?

@dafeder
Copy link
Member Author

dafeder commented Apr 5, 2024

@stefan-korn yes, a distribution with no datasets referencing it is considered to be orphaned and unneeded. I agree there is an argument for tying file usage to the distribution not the dataset, but the usage is really more just a byproduct of using the form and not used anywhere in DKAN's business logic. Since it's actually the dataset form being used it gets tied to that. Because files referenced using the API or a harvest are not even given IDs or usage stats, we have mostly left this alone.

@stefan-korn
Copy link
Contributor

@dafeder : Thanks for the explanation. I am only a bit wary of file usage, because I have had situations (not DKAN in particular, but Drupal in general) where file usage was preventing expected file deletion through entity deletion or vice versa. And this can lead to an "outcry" with pointing fingers and the Drupal users often do not get the difference and implications between file and entity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants