Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Allow "deduplication" on media and file libraries #899

Draft
wants to merge 1 commit into
base: 2.x
Choose a base branch
from

Conversation

antonioribeiro
Copy link
Member

@antonioribeiro antonioribeiro commented May 14, 2021

Description

Related to #880

This PR introduces some changes:

Allow media and files to be "deduplicated" by using the SHA1 hash of files contents to generate directory or file names:

Screenshot 2021-05-14 at 19 23 51

Allow users to enable deduplication and choose if it will keep the current structure (file inside a hashed directory) or flatten it and just use the <sha1 hash>.extension as the filename:

'deduplication' => [
    'enabled' => false,
    'flatten_filename' => true,
],

I feel this is can be a problem if people wants to download their files using the file original file name, but it also fixes another problem: sometimes the filename used by the user is incompatible (has special chars, accents) with the file system used to store that file. About the download part, it can be fixed in the CMS by using the original filename (stored on the medias table) to download it instead of the UUID.

Fix a potential bug on frontend passing undefined and "null" values to the backend, generating this kind of errors while adding an image:

SQLSTATE[22P02]: Invalid text representation: 7 ERROR:  invalid input syntax for type bigint: \"undefined\" (SQL: select exists(select * from \"medias\" where \"id\" = undefined and \"medias\".\"deleted_at\" is null) as \"exists\")"
trace: [{,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…}, {,…},…]

Refactor MediaLibraryController.php and FileLibraryController.php to DRY them a bit by introducing a new LibraryController.php abstract base class containing common code.

@antonioribeiro
Copy link
Member Author

Needs more testing and maybe write some tests for it.

@antonioribeiro
Copy link
Member Author

Just edited to add a bug fix included in this PR too

@haringsrob haringsrob changed the title Allow "deduplication" on media and file libraries [3.x] Allow "deduplication" on media and file libraries Jul 8, 2022
@ifox ifox marked this pull request as draft March 29, 2023 12:48
@Tofandel
Copy link
Contributor

Tofandel commented Feb 12, 2024

In this case, I would also add that storing files should be done in a file structure that uses the first 2 chars of the hash as directory otherwise if the library grows (eg 4000 images), a list dir becomes extremely slow, if not impossible because of the amount of ram it requires

About the download part, it can be fixed in the CMS by using the original filename (stored on the medias table) to download it instead of the UUID

But that requires a controller to view the file, which is much slower than a direct access using nginx or apache

There is a few workarounds that could involve a htaccess or nginx config file to generate at the same time as the upload that adds the correct headers
Since we already are using glide as a controller for image this wouldn't add more overhead, but for files it would (but likely wouldn't be a problem)

But then uploads should likely be moved into the private folder by default as well for added security

Otherwise simply sanitizing the uploaded file name and storing the sha-1 inside the media table directly might be another solution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants