Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

file-naming / autorenaming issues in the file-manager #8295

Closed
asaage opened this issue Mar 31, 2016 · 32 comments
Closed

file-naming / autorenaming issues in the file-manager #8295

asaage opened this issue Mar 31, 2016 · 32 comments
Assignees

Comments

@asaage
Copy link

asaage commented Mar 31, 2016

the 3.5.8 Version (and probably older versions too) strip the first charakter of the filename on upload in case of äöü..
it's not reproduceable in the online-demo due to fileupload resrictions.

following szenario:
sourc-file Änderungsantrag.pdf
destination-file (upload in contao 3.3.3) becomes Aenderungsantrag.pdf and shows up as Aenderungsantrag.pdf

compared to 3.5.8 where the behavior differs and seems to be a little messed up.
sourc-file Änderungsantrag.pdf
destination-file (upload in contao 3.5.8) will be saved as Änderungsantrag.pdf
but shows up as
nderungsantrag.pdf + renaming this file is now impossible.

still 3.5.8:
uploading Büro.png will be saved as Büro.png and shows up as Büro.png 👍
but renaming it to Büro_foo.png results in Buero_foo.png

i would consider this as 3 bugs

  • missing first letter
  • impossible renaming
  • inconsistent naming routines

(i personaly would like to be able to use every character that has no special meaning to the underlying sytem / file-system like linux/windows)

@leofeyer leofeyer added this to the 3.5.10 milestone Apr 3, 2016
@asaage
Copy link
Author

asaage commented Apr 5, 2016

What would be an appropriate way to confirm this?

@fenepedia
Copy link

I can confirm, that there is a rename problem after uploading

But I could rename the files

names

@Aybee
Copy link
Contributor

Aybee commented Apr 6, 2016

I can confirm the first post in 3.5.9 on Windows 10. Server Linux Apache 2.0 Handler.

@asaage
Copy link
Author

asaage commented Apr 6, 2016

some more test's...
unfortunately i do not have equal versions on different systems at this point

3.5.6 on win IIS
works more or less (missing Thumbnail is a serverconfig issue with umlaute)
image
renaming (possible) to Änderung2 results in
image

3.5.8 on Ubuntu Apache
image
rename wont save
image

3.3.3 on Ubuntu Apache
image
image

@ghost
Copy link

ghost commented Apr 6, 2016

Test on 3.5.9 on Lubuntu 15.10, en_US.utf8, Apache 2.4 with LANG=C, PHP mbstring, client Window 7 Firefox/IE11:
I can confirm the symptom of the dropped initial umlaut (ß as well).After upload, the filename is left-truncated, but I can download the file (via the symbol in front) with the correct name!

The database tl_files entry shows inconsistent values: in column path the full path including filename contains the original filename, however, the column name contain the incorrect filename (without leading umlaut). You can see this discrepancy in the filemanager screen when hovering over the pencil icon: the link displayed contains the correct full path - same on the edit page after you click on the pencil.

If I correct the name in the database and reload the file manager page, the initial umlaut is dropped from the display again. Also, any attempt to edit the name in the filemanager edit page fails, the database entry remains unchanged (the manually inserted leading umlaut is kept).

Selecting the file in a download element results in a wrong filename display and a successful download with a wrong filename - also for a name manually corrected in the database.

@ghost
Copy link

ghost commented Apr 7, 2016

One more test:

  • create a file "ü-test.pdf" outside Contao (by ftp, e.g.)
  • open that directory in filemanager
  • filename ist displayed as "-test.pdf", full path is ok. No database entry yet
  • click on info icon -> database entry is created with name="-test.pdf"

@asaage asaage changed the title upload: file-naming / autorenaming issues in the file-manager file-naming / autorenaming issues in the file-manager Apr 7, 2016
@asaage
Copy link
Author

asaage commented Apr 7, 2016

i wonder what @fenepedia did...
can you tell some system details?

@fenepedia
Copy link

Hmm I think @leofeyer could answer this question better than me. My Server ist hosted and configured by him.

@leofeyer
Copy link
Member

@contao/developers /cc

@ghost
Copy link

ghost commented Apr 18, 2016

For one test case, download (see my prior comment), I drilled the issue down to a problem with the PHP pathinfo() function used in Files::__get(). pathinfo() drops the first character if is an UTF8 umlaut (probably any non-ASCII7) and the system environment has LANG=C (which is in the default environment of my apache2.4 and obviously a few webservers). PHP versions 5.6.11 and 5.4

You can easily verify it by inserting

<?php var_dump(pathinfo('/home/test/ü-ütest-pdf')); ?>

into ce_download.html5 and then using any download link.

It can also be reproduced on the command line:

http@odin:~$ echo $LANG
en_US.utf8
http@odin:~$ cat x.php
<?php
var_dump(pathinfo('/home/test/ü-ütest.pdf'));
http@odin:~$ php x.php
array(4) {
  ["dirname"]=>
  string(10) "/home/test"
  ["basename"]=>
  string(13) "ü-ütest.pdf"
  ["extension"]=>
  string(3) "pdf"
  ["filename"]=>
  string(9) "ü-ütest"
}
http@odin:~$ LANG=C php x.php
array(4) {
  ["dirname"]=>
  string(10) "/home/test"
  ["basename"]=>
  string(11) "-ütest.pdf"
  ["extension"]=>
  string(3) "pdf"
  ["filename"]=>
  string(7) "-ütest"
}

A hot fix in File.php around line 160ff just for this case is

    /**
     *  TEMP.FIX for pathinfo() locale C issue
     */
    private static function mypathinfo($strPath)
    {
            $oldlocale = setlocale(LC_CTYPE, "0");
            setlocale(LC_CTYPE, 'en_US.utf8');
            $arr = pathinfo($strPath);
            setlocale(LC_CTYPE, oldlocale);
            return $arr;
    }

    /**
     * Return an object property
...
    */
    public function __get($strKey)
    {
        switch ($strKey)
        {
...
            case 'name':
            case 'basename':
                if (!isset($this->arrPathinfo[$strKey]))
                {
==>                 $this->arrPathinfo = static::mypathinfo(TL_ROOT . '/' . $this->strFile);
                }
                return $this->arrPathinfo['basename'];
                break;

This rough fix also covers the file upload case (in database, not in filemanager display)..
But I found quite a number of pathinfo() calls in the system/modules/core tree, plus more calls to basename() (which has the same problem). So a better solution is needed.

@ghost
Copy link

ghost commented Apr 18, 2016

As a brute-force approach, just inserting

setlocale(LC_CTYPE, 'en_US.utf8');

into initconfig.php without any other change seems to fix the issue, including the filemanager display and rename (rename then changes 'ü' to 'ue' etc - which I do not like at all).
I do not now if this patch has any negative effects elsewhere (sorting, case folding, ...).

@leofeyer
Copy link
Member

leofeyer commented Apr 18, 2016

Seems we have to use a library or copy the mb_pathinfo() function e.g. from here:

http://core.wp-a2z.org/oik_api/phpmailermb_pathinfo/

Or here: http://stackoverflow.com/a/34619682

@ghost
Copy link

ghost commented Apr 18, 2016

http://php.net/manual/en/function.pathinfo.php says (RTFM helps once more):

Note:

pathinfo() is locale aware, so for it to parse a path containing multibyte characters correctly, the matching locale must be set using the setlocale() function.

So it seems that locale switching is the right way (it is already used in some vendor code). In my tests, setting LC_CTYPE was sufficient (and probably less overhead than LC_ALL), and only the charset (utf8) should be relevant (using en_US as safest language code).

A solution without changing core code is redefining pathinfo and basename within the Contao namespace. Works fine for me in php 5.6 + 5.4 installations.

namespace Contao;

function pathinfo($path, $option=null)
{
    $locale = setlocale(LC_CTYPE, '0');
    setlocale(LC_CTYPE, 'en_US.utf8');
    $result = is_null($option) ? \pathinfo($path) : \pathinfo($path, $option);
    setlocale(LC_CTYPE, $locale);
    return $result;
}
function basename($path, $suffix=null)
{
    $locale = setlocale(LC_CTYPE, '0');
    setlocale(LC_CTYPE, 'en_US.utf8');
    $result = \basename($path, $suffix);
    setlocale(LC_CTYPE, $locale);
    return $result;
}

Namespaces are made for such things...
Extensions are not covered, but I think it would be valid as a core bug fix in 3.5.10

@asaage
Copy link
Author

asaage commented Apr 18, 2016

👍 sounds promising...
can the same be achived with setting
intl.default_locale in the php.ini to an appropiate value?
i am a little bit concerned about "Extensions are not covered..."

@aschempp
Copy link
Member

Using setlocale is dangerous. The manual at http://php.net/manual/en/function.setlocale.php says

Warning The locale information is maintained per process, not per thread. If you are running PHP on a multithreaded server API like IIS, HHVM or Apache on Windows, you may experience sudden changes in locale settings while a script is running, though the script itself never called setlocale(). This happens due to other scripts running in different threads of the same process at the same time, changing the process-wide locale using setlocale().

@leofeyer
Copy link
Member

I second @aschempp's concerns, therefore I'd prefer to add a String::pathinfo() method and use the PHPMailer mb_pathinfo code there.

@Toflar
Copy link
Member

Toflar commented Apr 19, 2016

Sounds like it should rather go to File, no?

@leofeyer
Copy link
Member

Also, this seems to be a very common problem. There are over 38,000 code snippets when searching for mb_pathinfo on Github: https://github.com/search?q=mb_pathinfo&type=Code&utf8=%E2%9C%93

@leofeyer
Copy link
Member

Fixed in 2276fa8 and a0bf86b.

jsonn pushed a commit to jsonn/pkgsrc that referenced this issue Apr 23, 2016
### 4.1.3 (2016-04-22)

 * Use data URIs for the image preview in the back end.
 * Use DIRECTORY_SEPARATOR to convert kernel.cache_dir into a relative path (see #464).
 * Always trigger the "isVisibleElement" hook (see contao/core#8312).
 * Do not change all sessions when switching users (see contao/core#8158).
 * Do not allow to close fieldsets with empty required fields (see contao/core#8300).
 * Make the path related properties of the File class binary-safe (see contao/core#8295).
 * Correctly validate and decode IDNA e-mail addresses (see contao/core#8306).
 * Skip forward pages entirely in the book navigation module (see contao/core#5074).
 * Do not add the X-Priority header in the Email class (see contao/core#8298).
 * Determine the search index checksum in a more reliable way (see contao/core#7652).
@asaage
Copy link
Author

asaage commented May 20, 2016

Would you reopen this? its not completely fixed. (3.5.12)
i created following 0Byte testfiles on my desktop:
image
and uploaded them via dropzone
the result in the file-view:
image
though in the file-edit view names and paths as well as the renaming-function are more or less okay
image
I wonder why it should be possible to upload something named Ö.. but not to rename something to Ö..

@leofeyer
Copy link
Member

I wonder why it should be possible to upload something named Ö.. but not to rename something to Ö..

?

@asaage
Copy link
Author

asaage commented Jun 10, 2016

forget about the last sentence - sometimes i wonder a lot...
just take a look at this:

especially the last three files which seem to be the same - but they are not.

@leofeyer
Copy link
Member

How exactly do I reproduce the issue?

@fritzmg
Copy link
Contributor

fritzmg commented Jun 11, 2016

I encountered the following under Windows:

If I put a file called Ö_test.txt into the files directory (not using the file uploader) and then go into the file manager, it will be displayed as follows:

o_test_1

When I do a dabfs sync afterwards, it will say

o_test_2

When I then upload another file named exactly the same (Ö_test.txt) using the file manager, it will display both:

o_test_3

This is the state of the database afterwards:

o_test_4

@fritzmg
Copy link
Contributor

fritzmg commented Jun 11, 2016

I can reproduce @asaage 's problem under Linux:

If I put a file called Ö_test.txt into the files directory (not using the file uploader) and then go into the file manager, it will be displayed as follows:

o_test_5

When I do a Dbafs sync afterwards, it will say

o_test_6

When I edit the file, it will display the name normally:

o_test_7

If you rename the file to _Ö_test.txt it will be converted to _Oe_test.txt. If you rename it back to Ö_test.txt it will be converted to Oe_test.txt. This is expected behaviour though I am assuming.

@leofeyer
Copy link
Member

I cannot reproduce the issue on Mac OS X, which is why I'm struggling to fix it. Anyone else?

@contao/developers /cc

@fritzmg
Copy link
Contributor

fritzmg commented Jun 15, 2016

The problem with Windows is the following: http://stackoverflow.com/questions/15055192/why-does-windows-need-to-utf8-decode-filenames-for-file-get-contents-to-work

It's quite a bitch ;). I stumbled upon this in an extension of mine, but I could never solve it directly (in my case the file paths are now rawurlencoded and then decoded - the extension transfers files from one Contao installation to another, among other things).

@asaage
Copy link
Author

asaage commented Jun 15, 2016

the db entry tl_files.name is correct if i use the uploader
the edit/text/input is also ok
only in the file-three the first character is missing - maybe you are using different methods to output those strings.

btw i'm on Linux.

i did not check how that behaves if i put the file via ftp and sync afterwards.

@leofeyer leofeyer reopened this Jul 15, 2016
@leofeyer leofeyer modified the milestones: 3.5.16, 3.5.10 Jul 15, 2016
@leofeyer leofeyer removed this from the 3.5.16 milestone Aug 25, 2016
@asaage
Copy link
Author

asaage commented Nov 22, 2016

#2508 (comment)

@leofeyer
Copy link
Member

I was able to reproduce the issue on Linux, however I'm not sure what to do now. In contrast to the initial problem description, I did not have a problem renaming the file in Contao 3.5.25.

@asaage
Copy link
Author

asaage commented Apr 21, 2017

Renaming as well as displaying the first character is working flawlessly since at least 3.5.17.
If there are no intensions to apply the same naming rules on upload's like on edit's (which i would prefer) this can probably be closed.

@leofeyer
Copy link
Member

Since we are removing those rules in Contao 4.4 entirely (see contao/core-bundle@0c0b3d8), I'm closing the issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants