Decoding the subject with iconv_mime_decode() #410

freescout-helpdesk · 2023-05-31T09:19:41Z

Describe the bug

Currently, Webklex/php-imap uses the 'utf-8' decoder by default, which decodes the subject with the \imap_utf8() function:

Line 424 in abf080e

if ($decoder === 'utf-8' && extension_loaded('imap')) {

This approach does not always decode subjects properly.

For example, Subject: =?ISO-2022-JP?B?GyRCIXlCaBsoQjEzMhskQjlmISEhViUsITwlRyVzGyhCJhskQiUoJS8lOSVGJWolIiFXQGxMZ0U5JE4kPyRhJE4jURsoQiYbJEIjQSU1JW0lcyEhIVo3bjQpJSglLyU5JUYlaiUiISYlbyE8JS8hWxsoQg==?= is decoded into a garbled string:

�$B!yBh�(B132�$B9f!!!V%,!<%G%s�(B&�$B%(%/%9%F%j%"!W@lLgE9$N$?$a$N#Q�(B&�$B#A%5%m%s!!!Z7n4)%(%/%9%F%j%"!&%o!<%/![�(B

So it does not make much sense to default to an approach that is not 100% reliable.

Some time ago we switched to decoding the subject simply with iconv_mime_decode():

$value = iconv_mime_decode($value, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, "UTF-8");

It always works properly: freescout-help-desk/freescout#2965 (comment).

So maybe it is worth making iconv_mime_decode() the default and only approach for decoding the subject, like this: https://github.com/freescout-helpdesk/freescout/blob/master/overrides/webklex/php-imap/src/Header.php#L208

Function:
https://github.com/freescout-helpdesk/freescout/blob/dist/app/Misc/Mail.php#L909 (decodeSubject())

Related issue: Webklex/laravel-imap#295
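
For readers who want to try the iconv_mime_decode() call quoted above in isolation, here is a minimal sketch. The function name decodeMimeHeaderValue() is hypothetical and is not part of Webklex/php-imap or FreeScout. The fallback to the raw value on failure is an assumption about the desired behaviour, and whether ISO-2022-JP is available depends on the iconv implementation PHP was built against.

```php
<?php
// Hypothetical helper (not part of Webklex/php-imap or FreeScout):
// decode a MIME encoded-word header value to UTF-8 with the iconv extension.
function decodeMimeHeaderValue(string $value): string
{
    // ICONV_MIME_DECODE_CONTINUE_ON_ERROR asks iconv to keep going
    // past malformed encoded words instead of aborting.
    $decoded = @iconv_mime_decode($value, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

    // iconv_mime_decode() returns false on failure; assumption: in that
    // case it is better to hand back the raw value than nothing at all.
    return $decoded === false ? $value : $decoded;
}

// The subject from the report above.
$subject = '=?ISO-2022-JP?B?GyRCIXlCaBsoQjEzMhskQjlmISEhViUsITwlRyVzGyhCJhskQiUoJS8lOSVGJWolIiFXQGxMZ0U5JE4kPyRhJE4jURsoQiYbJEIjQSU1JW0lcyEhIVo3bjQpJSglLyU5JUYlaiUiISYlbyE8JS8hWxsoQg==?=';

echo decodeMimeHeaderValue($subject), PHP_EOL;
```

A value without any encoded words passes through unchanged, so the helper can be applied to every header value without first checking for "=?".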


Webklex · 2023-06-23T19:07:58Z

Hi @freescout-helpdesk ,
many thanks for your report and suggestion. I improved the decoder and added a new test case for the mentioned subject above.

Best regards & happy coding,

nuernbergerA · 2023-06-26T10:33:53Z

@Webklex Attachment::decodeName() has the same problem

Webklex · 2023-06-27T00:00:08Z

Hi @nuernbergerA ,
thanks for your report. I just pushed a new test including your mentioned scenario but can't reproduce the described issue.
Are you using the latest release? If not, please update and try again. There's a chance that the problem might have already been fixed in a newer release.

If the issue persists, please head over to https://github.com/Webklex/php-imap/issues/new?assignees=&labels=&projects=&template=bug_report.md and provide as much information as possible. Every little bit, even if it might seem unrelated could help to solve the mystery :)

Best regards & happy coding,

nuernbergerA · 2023-06-27T05:13:27Z

@Webklex i've opened a PR with a failing test for my issue ;)

freescout-helpdesk · 2023-06-27T06:29:59Z

> @Webklex i've opened a PR with a failing test for my issue ;)

We've checked. Our function https://github.com/freescout-helpdesk/freescout/blob/dist/app/Misc/Mail.php#L883C8-L883C8 also works for decoding headers and attachment names, including your example:

=?iso-8859-1?Q?2021=5FM=E4ngelliste=5F0819306.xlsx?= >> 2021_Mängelliste_0819306.xlsx

Webklex · 2023-06-28T02:19:40Z

@nuernbergerA thanks for your input and your pull request. Please update to v5.5 and give it another try :)

@freescout-helpdesk how are you decoding attachment names / filenames? If you are relying on Attachment::decodeName(), please note that this if statement was reversed:
https://github.com/Webklex/php-imap/blob/5.5.0/src/Attachment.php#L296-L300
The fallback was used as default, which works often but can cause unexpected behavior (#421) in some cases :/

Best regards & happy coding,

freescout-helpdesk · 2023-06-28T06:14:44Z

> @freescout-helpdesk how are you decoding attachment names / filenames?

We use the same MailHelper::decodeSubject() function to decode attachment names, and it works fine.

jackwiy · 2024-07-04T13:03:19Z

@Webklex

$personal = '=?gb2312?B?z+O42yCDfN14u80=?=';

Decoding this value with imap_mime_header_decode() gives: �� |�x��

public function testUTF8Content()
{
    $personal = '=?gb2312?B?z+O42yCDfN14u80=?=';

    // Decode with the imap extension for comparison.
    $personal1 = imap_mime_header_decode($personal);

    echo '<pre>';
    print_r(['personal1' => $personal1]);

    // Decode with the library's Header class, part by part.
    $header = new Header('');
    $personalParts = $header->mime_header_decode($personal);
    $personal2 = '';
    foreach ($personalParts as $p) {
        print_r(['pText' => $p->text]);
        $personal2 .= $header->convertEncoding($p->text, $header->getEncoding($p));
    }

    print_r(['personal2' => $personal2]);
}

jackwiy · 2024-07-05T12:18:49Z

public function testUTF8Content()
{
    $personal = '=?gb2312?B?z+O42yCDfN14u80=?=';

    // Decode with imap_utf8() for comparison.
    $personal2 = imap_utf8($personal);
    print_r(['personal2' => $personal2]);

    $header = new Header('');
    $personalParts = $header->mime_header_decode($personal);
    $personal3 = '';
    foreach ($personalParts as $p) {
        print_r(['pText' => $p->text]);
        // Use iconv() with //IGNORE instead of convertEncoding().
        //$personal3 .= $header->convertEncoding($p->text, $header->getEncoding($p));
        $personal3 .= iconv($header->getEncoding($p), 'UTF-8//IGNORE', $p->text);
    }

    print_r(['personal3' => $personal3]);
}
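
One possible reading of this case, offered as a sketch and not as a confirmed diagnosis: the Base64 payload decodes to bytes that include 0x83 0x7C and 0xDD 0x78, which lie outside the GB2312 two-byte range (both bytes 0xA1 to 0xFE) but are valid GBK. The WHATWG Encoding Standard treats the "gb2312" label as GBK for this reason. The helper name convertLabelledBytesToUtf8() and its alias table are hypothetical, and the example assumes the iconv implementation in use knows GBK.

```php
<?php
// Hypothetical helper, for illustration only (not a php-imap API).
function convertLabelledBytesToUtf8(string $bytes, string $charset): string
{
    // Assumption: senders that label text "gb2312" may really send GBK,
    // a superset of GB2312, so decode with the wider charset.
    $aliases = ['gb2312' => 'GBK'];
    $from = $aliases[strtolower($charset)] ?? $charset;

    // iconv() returns false on failure; fall back to the raw bytes then.
    $converted = @iconv($from, 'UTF-8', $bytes);

    return $converted === false ? $bytes : $converted;
}

// Payload of the encoded word '=?gb2312?B?z+O42yCDfN14u80=?='.
$bytes = base64_decode('z+O42yCDfN14u80=');

echo bin2hex($bytes), PHP_EOL;
echo convertLabelledBytesToUtf8($bytes, 'gb2312'), PHP_EOL;
```

Unlike the //IGNORE variant above, this keeps the characters that strict GB2312 decoding would drop, at the cost of trusting the alias table over the declared charset.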

thin-k-design · 2024-08-16T18:24:26Z

I have a related problem. I received an email with an attachment whose filename is "iso-8859-1''Elk%FCld%F6tt%20felsz%F3l%EDt%E1s%20ELK-F-24011370.pdf"

The filename gets decoded as Elk�ld�tt felsz�l�t�s ELK-F-24011370.pdf

After URL-decoding, the string should also be converted to UTF-8, which the current code does not do:

            if (str_contains($name, "''")) {
                $parts = explode("''", $name);
                if (EncodingAliases::has($parts[0])) {
                    $name = implode("''", array_slice($parts, 1));
                }
            }
            [...]
            // check if $name is url encoded
            if (preg_match('/%[0-9A-F]{2}/i', $name)) {
                $name = urldecode($name);
            }

It should be something like:

           if (str_contains($name, "''")) {
               $parts = explode("''", $name);
               if (EncodingAliases::has($parts[0])) {
                   $charset = $parts[0];
                   $name = implode("''", array_slice($parts, 1));
               }
           }
           [...]
           // check if $name is url encoded
           if (preg_match('/%[0-9A-F]{2}/i', $name)) {
               $name = urldecode($name);
               if (isset($charset)) {
                   $name = mb_convert_encoding($name, 'UTF-8', $charset);
               }
           }

I am not sure of the implications, so I haven't done a pull request yet.

thin-k-design · 2024-08-16T18:51:08Z

Actually, it seems that the first part of the code does not follow the RFC 2231 specification, where a string can be represented as charset'language'encoded-text. In my case that is iso-8859-1''Elk%FCld%F6tt%20felsz%F3l%EDt%E1s%20ELK-F-24011370.pdf, which breaks down into:

charset: iso-8859-1
language: empty
encoded-text: Elk%FCld%F6tt%20felsz%F3l%EDt%E1s%20ELK-F-24011370.pdf

So I think this:

           if (str_contains($name, "''")) {
               $parts = explode("''", $name);
               if (EncodingAliases::has($parts[0])) {
                   $name = implode("''", array_slice($parts, 1));
               }
           }

may be replaced with:

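            // Split into charset, language and encoded text (RFC 2231).
            $parts = explode("'", $name, 3);
            // Only destructure when all three parts are present.
            if (count($parts) === 3 && EncodingAliases::has($parts[0])) {
                list($charset, $language, $name) = $parts;
            }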
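
Putting the two comments above together, a standalone sketch of the charset'language'encoded-text handling could look like the following. The function decodeRfc2231Value() is hypothetical and is not the library's Attachment::decodeName(). It swaps in rawurldecode() for urldecode(), because urldecode() also turns "+" into a space, and iconv() for mb_convert_encoding(). It also skips the EncodingAliases::has() check and relies on iconv() rejecting unknown charsets instead.

```php
<?php
// Hypothetical helper, not the library's Attachment::decodeName().
function decodeRfc2231Value(string $value): string
{
    // RFC 2231 extended value: charset'language'percent-encoded-text.
    $parts = explode("'", $value, 3);
    if (count($parts) !== 3 || $parts[0] === '') {
        // Not in extended form; leave the value untouched.
        return $value;
    }

    [$charset, $language, $encoded] = $parts;

    // rawurldecode() only resolves %XX sequences and keeps "+" as is.
    $bytes = rawurldecode($encoded);

    // Convert from the declared charset to UTF-8. iconv() returns false
    // for an unknown charset or invalid input, in which case the
    // original value is returned unchanged.
    $converted = @iconv($charset, 'UTF-8', $bytes);

    return $converted === false ? $value : $converted;
}

echo decodeRfc2231Value("iso-8859-1''Elk%FCld%F6tt%20felsz%F3l%EDt%E1s%20ELK-F-24011370.pdf"), PHP_EOL;
```

The language part is parsed but unused here; it is only kept so that the three-way split matches the format described above.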

thin-k-design · 2024-08-17T10:26:58Z

My pull request did not pass the following test:
AttachmentLongFilenameTest.php
self::assertEquals("f7b5181985862431bfc443d26e3af2371e20a0afd676eeb9b9595a26d42e0b73", hash("sha256", $attachment->filename));

The assertion in the test assumes that the output filename of iso-8859-15''%30%31%5F%41%A4%E0%E4%3F%3F%3F%3F%40%5A%2D%30%31%32%33%34%35%36%37%38%39%2D%71%77%65%72%74%79%75%69%6F%70%61%73%64%66%67%68%6A%6B%6C%7A%78%63%76%62%6E%6D%6F%70%71%72%73%74%75%76%7A%2D%30%31%32%33%34%35%36%37%38%39%2D%71%77%65%72%74%79%75%69%6F%70%61%73%64%66%67%68%6A%6B%6C%7A%78%63%76%62%6E%6D%6F%70%71%72%73%74%75%76%7A%2D%30%31%32%33%34%35%36%37%38%39%2D%71%77%65%72%74%79%75%69%6F%70%61%73%64%66%67%68%6A%6B%6C%7A%78%63%76%62%6E%6D%6F%70%71%72%73%74%75%76%7A%2E%74%78%74 is in its original encoding and not in utf-8.

Maybe I am missing something, but if this is the intended behaviour (leaving the original encoding as is), how can I get the encoding of the filename in the Attachment class so that I can convert it myself if needed?

freescout-helpdesk mentioned this issue May 31, 2023

Subject encoding Webklex/laravel-imap#295

Closed

Webklex added the "bug" and "validating" labels Jun 23, 2023

Webklex added a commit that referenced this issue Jun 23, 2023

Header value decoding improved #410

9d6a2d5

Webklex added the "enhancement" label and removed the "bug" and "validating" labels Jun 23, 2023

Webklex added a commit that referenced this issue Jun 26, 2023

Test case extended to include attachment filename decoding #410

510c4ea

nuernbergerA mentioned this issue Jun 27, 2023

Failing test: Attachment name decoding #421

Merged

Webklex closed this as completed in 25fa1a0 Jun 28, 2023

Webklex reopened this Jun 28, 2023

paulocardozo mentioned this issue Jun 28, 2023

Problem with subject =?UTF-8?Q? (2) #420

Open

thin-k-design mentioned this issue Aug 17, 2024

Attachment Name Encoding Fix #509

Open
