Bump mutagen (v0.16.0-beta1) for non-utf8 filenames and set it to use windows 777 permissions #4206

Merged: rfay merged 10 commits into ddev:master from 20220914_mutagen_bump on Sep 18, 2022

Conversation

rfay
Member

@rfay rfay commented Sep 14, 2022

The Problem/Issue/Bug:

  • Non-UTF-8 filenames aren't handled well in lots of places, but this version of Mutagen can at least detect and report the problem.
  • On traditional Windows, Mutagen has no way to know which permissions to apply to synced files and directories. This version of Mutagen lets them be forced to 777, which is what Docker Desktop also does.

TODO

  • Make ddev debug mutagen ... work with flags; we can use that to terminate mutagen sessions
  • Figure out how to completely terminate mutagen on Windows. sudo?
  • Completely terminate mutagen on test start

How this PR Solves The Problem:

Manual Testing Instructions:

  • Verify that the non-utf8-filename problem can be recreated without this. This tarball or this code might work to recreate, or @ursbraem may have files that can be used to recreate it.
  • Verify that mutagen doesn't croak with this PR if those are being synced, but instead reports the problem
  • Verify on traditional Windows that permissions inside container are 777. Especially check vendor/bin/drush, for example.
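
The 777 check in the last item could be scripted rather than eyeballed. This is only a minimal sketch, assuming Python is available wherever the synced files are inspected; the helper names `mode_of` and `not_fully_open` are hypothetical and not part of ddev or Mutagen.

```python
import os
import stat


def mode_of(path: str) -> int:
    """Return only the permission bits (for example 0o777) of a path."""
    return stat.S_IMODE(os.stat(path).st_mode)


def not_fully_open(root: str) -> list[str]:
    """List every file or directory under root whose mode is not 0o777."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                # Symlink modes are not meaningful here, so skip them.
                continue
            if mode_of(full) != 0o777:
                offenders.append(full)
    return offenders
```

Calling `oct(mode_of("vendor/bin/drush"))` from the project root would show the mode of that one file, and an empty list from `not_fully_open(".")` would suggest everything synced with 777.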

Automated Testing Overview:

  • Consider a test that checks the problem files
  • Consider a test for windows that checks for 777 when using mutagen

Related Issue Link(s):

Release/Deployment notes:

@github-actions

github-actions bot commented Sep 14, 2022

@rfay
Member Author

rfay commented Sep 14, 2022

@ursbraem please take a look at this when you get a chance, would love it if you could test the binary against your problem site.

@ursbraem
Contributor

Sure! Can you instruct me on how to install? I am back on regular homebrew ddev, how do I use the version to test with?

@rfay
Member Author

rfay commented Sep 15, 2022

Hi @ursbraem - For this PR you would get the proper zipball from #4206 (comment) and then install ddev wherever you want it in your $PATH. You can either put it in a directory that comes earlier in $PATH than the brew-installed copy, or you can brew unlink ddev and put it where brew would have put it. If you're on macOS you may have to right-click the ddev binary to convince Apple that it's OK to open.

@ursbraem
Contributor

ursbraem commented Sep 17, 2022

OK, just for the record (in case I have to do it again): I ran brew unlink ddev, downloaded the Intel (amd64) tar, extracted it, added u+x permissions to the binary, ran export PATH=$PATH:/Users/myhome/Downloads/, changed to the project folder, and ran ddev start.

In .ddev/config.yaml I uncommented upload_dir: fileadmin so mutagen has to work with all files again.

Now I'm doing my rsync again to get all files freshened up from the production site.

@rfay
Member Author

rfay commented Sep 17, 2022

Don't forget you'll have to either put your non-utf8 files in the main directory or temporarily remove the upload_dir or whatever changes you made to mutagen.yml

@rfay rfay merged commit c7d70ba into ddev:master Sep 18, 2022
@rfay rfay deleted the 20220914_mutagen_bump branch September 18, 2022 05:37
@ursbraem
Contributor

ursbraem commented Sep 18, 2022

With this version, mutagen did not croak anymore. But ddev mutagen status -l revealed the culprits:

Scan problems:
	typo3_app/typo3-secure-web/fileadmin/minisites/redaktion/velostationen/Dokumente/Weitere/plan_bernmilchga�ssli.pdf (non-UTF-8): non-UTF-8 filename
	typo3_app/typo3-secure-web/fileadmin/minisites/redaktion/velostationen/Gestaltung/Fotos_Stationen/Entre�e-ve�lostation_foto_ville_de_lausanne.jpg (non-UTF-8): non-UTF-8 filename
	typo3_app/typo3-secure-web/fileadmin/minisites/redaktion/velostationen/Gestaltung/Fotos_Stationen/Su�d_Postbru�cke_2013_foto_velostation_zuerich.jpg (non-UTF-8): non-UTF-8 filename

Actually, these are the filenames:

Su¨d_Postbru¨cke_2013_foto_velostation_zuerich.jpg
Entre´e-ve´lostation_foto_ville_de_lausanne.jpg
plan_bernmilchga¨ssli.pdf

So it's the standalone diacritic characters. But not in general: I could even name a new file S¨TEST.jpg and upload it without problems. In the problem files it's a damaged or wrong kind of character.
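
A tree can also be checked for such names before Mutagen ever sees it. The following is a minimal sketch, not what Mutagen does internally; `find_non_utf8_names` is a hypothetical helper, and it assumes a filesystem that stores names as raw bytes (as Linux filesystems generally do).

```python
import os


def find_non_utf8_names(root: str) -> list[bytes]:
    """Return the raw byte paths under root whose names are not valid UTF-8."""
    bad = []
    # Passing bytes to os.walk makes it yield names as raw bytes,
    # so nothing is decoded or escaped before the check below.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad
```

Printing `repr(path)` for each result shows the offending bytes as escapes, which makes a damaged character distinguishable from a visually identical valid one.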

Screenshot-18 09-002042

(By the way, 'UTF8filesystem' => '1', is set in LocalConfiguration.php, but TYPO3 will still sanitize the filename on upload via Backend.)

So here is a ZIP with the original files:

Archive.zip

I went to the containing directories in TYPO3 on production, but the directories couldn't be listed in the File module due to those bad characters.

Screenshot-18 09-002041

I renamed the files on the server's file system (in the database, their identifier was already listed with the actual umlauts) and all issues were gone. Also, after rsyncing the entire fileadmin to ddev, mutagen reported no more errors. Problem solved, I'd dare to say...
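
For anyone with more than a handful of such files, the renaming could be sketched roughly as follows. This is an illustration only, not what was done here (the files above were renamed by hand). It assumes the bad bytes are Latin-1, which is not established for these files. `propose_name` and `SPACING_TO_COMBINING` are hypothetical names, and the result should be reviewed before any actual rename.

```python
import unicodedata

# Spacing diacritics mapped to their combining equivalents (illustrative subset).
SPACING_TO_COMBINING = {
    "\u00a8": "\u0308",  # diaeresis
    "\u00b4": "\u0301",  # acute accent
}


def propose_name(raw: bytes, legacy: str = "latin-1") -> str:
    """Suggest a clean name for a filename stored in a legacy encoding."""
    text = raw.decode(legacy)  # assumption: the legacy encoding is known
    for spacing, combining in SPACING_TO_COMBINING.items():
        text = text.replace(spacing, combining)
    # NFC composes a base letter plus combining mark into one character.
    return unicodedata.normalize("NFC", text)
```

Under these assumptions, `propose_name(b"Su\xa8d.jpg")` yields a name with a precomposed ü, which encodes as valid UTF-8.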

@rfay rfay changed the title from "Bump mutagen (v0.16.0-beta1) and set it to use windows 777 permissions" to "Bump mutagen (v0.16.0-beta1) for non-utf8 filenames and set it to use windows 777 permissions" Sep 18, 2022
@rfay
Member Author

rfay commented Sep 18, 2022

Thanks so much for reporting this and pursuing it to the end! And thanks to @xenoscopic for the great fix to mutagen.

@xenoscopic

xenoscopic commented Sep 18, 2022

The problematic ¨ characters here are most likely ISO/IEC 8859-15 encoded, whereas the manually created diacritical marks are probably defaulting to UTF-8 (which would explain why the latter will work, even though they look the same).

Most applications/OSes nowadays (at least within the last decade, maybe longer) will write the filenames using a Unicode-based encoding (usually UTF-8 or UTF-16 [1]), but there will be that occasional "vintage" filename (especially on Linux) that's encoded with some other encoding that mostly overlaps with ASCII and UTF-8 for Latin (and extended Latin) characters, but not quite. And, of course, these other encodings can still be used for Linux systems' locales (though it's not a common choice).


[1] And on systems that do use UTF-16, the filename is typically converted to UTF-8 by either the OS or most libraries/runtimes when returned from a readdir call.
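
A small sketch of why two identical-looking ¨ characters can behave differently: the same glyph has different byte representations depending on the encoding. Latin-1 is used here purely as an example of a single-byte legacy encoding; which encoding the real files used is an assumption, and `is_valid_utf8` is a hypothetical helper.

```python
# The same glyph (U+00A8, diaeresis) in two encodings.
utf8_bytes = "\u00a8".encode("utf-8")      # two bytes: b'\xc2\xa8'
legacy_bytes = "\u00a8".encode("latin-1")  # one byte:  b'\xa8'


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A freshly typed name is stored as UTF-8 and passes the check.
new_name = b"S" + utf8_bytes + b"TEST.jpg"
# A "vintage" name carries the lone legacy byte, which is not valid UTF-8.
old_name = b"Su" + legacy_bytes + b"d.jpg"
```

Here `is_valid_utf8(new_name)` is true and `is_valid_utf8(old_name)` is false, even though both names render with the same mark once decoded with the matching encoding.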

@ursbraem
Contributor

ursbraem commented Oct 11, 2022 via email

@rfay
Member Author

rfay commented Oct 11, 2022

Hi @ursbraem, there will be a new release, v1.21.2, with this in it today. To get the current HEAD of ddev in the meantime, use brew unlink ddev && brew install --HEAD --fetch-head. Details on techniques for other environments are at https://ddev.readthedocs.io/en/latest/developers/building-contributing/
