Incorrect transcoding of text attachments #132

Closed
dkop opened this issue Aug 12, 2016 · 8 comments

dkop commented Aug 12, 2016

https://github.com/ddeboer/imap/blob/master/src/Message/Part.php#L182

    if ($this->getType() === self::TYPE_TEXT
        && strtolower($this->getCharset()) != 'utf-8'
    ) {
        $this->decodedContent = Transcoder::create()->transcode(
            $this->decodedContent,
            $this->getCharset()
        );
    }

This code uses iconv() or mb_convert_encoding() to transcode the content of a text part. $this->getCharset() determines the encoding from the parameters returned by imap_fetchstructure(). If the attachment has no encoding parameter, $this->getCharset() returns null, which is then passed to mb_convert_encoding() as the input encoding. But mb_convert_encoding() cannot detect the encoding of a file in all cases, and in this case I end up with an attachment whose encoding is broken and unrecoverable.
Maybe Part::getDecodedContent() needs an optional parameter. For example:

    public function getDecodedContent($keepUnseen = false, $convertToUtf8 = true) {
    ...
        if ($convertToUtf8 
            && $this->getType() === self::TYPE_TEXT
            && strtolower($this->getCharset()) != 'utf-8'
        ) {
        ...
        }
    ...
    }
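
A minimal sketch of the guard described above, assuming the mbstring extension is available. The helper name `toUtf8IfCharsetKnown` is hypothetical and is not part of ddeboer/imap; it only illustrates skipping the conversion when the part carries no charset parameter, so the original bytes are preserved.

```php
<?php
/**
 * Hypothetical helper (not part of ddeboer/imap).
 * Transcodes to UTF-8 only when the source charset is actually known.
 */
function toUtf8IfCharsetKnown(string $content, ?string $charset): string
{
    // No charset parameter on the MIME part: return the raw bytes untouched
    // instead of letting the converter guess the input encoding.
    if ($charset === null || $charset === '') {
        return $content;
    }

    // Already UTF-8: nothing to do.
    if (strtolower($charset) === 'utf-8') {
        return $content;
    }

    // Known, non-UTF-8 charset: convert explicitly from that charset.
    return mb_convert_encoding($content, 'UTF-8', $charset);
}

$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2"; // "Привет" encoded as Windows-1251

$untouched = toUtf8IfCharsetKnown($cp1251, null);           // raw bytes kept
$converted = toUtf8IfCharsetKnown($cp1251, 'Windows-1251'); // UTF-8 text
```

With this approach the caller receives the attachment bytes unchanged when no charset is declared and can decide how to interpret them.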

Slamdunk (Collaborator) commented

Hi, I ran into this issue too. I'll work on it as soon as the CI process runs again (see #170).

Can you provide a raw message body with such an attachment?

dkop (Author) commented Sep 22, 2017

1.eml.tar.gz
The attachment is encoded in CP1251.
When I fetch the attachment content with getDecodedContent(), all Russian characters are replaced with the byte 0x3F ("?").
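
One rough way to notice this kind of damage is to compare how many `?` bytes (0x3F) appear before and after transcoding. The sketch below is only a heuristic under that assumption; `looksLossy` is a hypothetical name and not a library function.

```php
<?php
/**
 * Hypothetical diagnostic (not part of ddeboer/imap).
 * Returns true when the transcoded output contains more "?" (0x3F) bytes
 * than the input, which suggests characters were replaced during conversion.
 */
function looksLossy(string $before, string $after): bool
{
    return substr_count($after, '?') > substr_count($before, '?');
}

// Simulated case: two CP1251 bytes that ended up as two question marks.
$lossy = looksLossy("\xCF\xF0", '??');

// A question mark that was already in the input is not counted as damage.
$clean = looksLossy('a?', 'a?');
```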

Slamdunk (Collaborator) commented Sep 22, 2017

Thank you very much for the feedback; we'll look into it soon.

Slamdunk added a commit that referenced this issue Sep 27, 2017
Message charset: mb_convert_encoding + aliases
Closes #78 #85 #115 #132 #136 #158 #165 #171 #174 #176

Slamdunk added this to the 1.0 milestone Sep 29, 2017

Slamdunk (Collaborator) commented

With #196 I can correctly see Это письмо отправлено автоматически [...] when using $message->getBodyText()

dkop (Author) commented Oct 2, 2017

The problem with $attachment->getDecodedContent() still exists. The encoding of the file is broken after saving it.

dkop (Author) commented Oct 2, 2017

[screenshot: imapencoding]
Please check the attached screenshot. The attachment type is text and the attachment has no encoding parameter.
Maybe here

    if ($this->getType() === self::TYPE_TEXT

we should not check the type, but instead check $this->disposition !== 'attachment' or $this->disposition === null
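
The suggestion above could be sketched roughly as follows. This is an illustration only: `shouldTranscode` is a hypothetical free function, whereas in the library the check would sit inside Part::getDecodedContent() and read `$this->disposition` and `$this->getCharset()`. The extra null-charset check is an added assumption, not something proposed in this thread.

```php
<?php
/**
 * Hypothetical decision helper (not part of ddeboer/imap).
 * Transcode only parts that are not attachments and that declare
 * a charset other than UTF-8.
 */
function shouldTranscode(?string $disposition, ?string $charset): bool
{
    // Leave attachments alone so their bytes are saved exactly as sent.
    if ($disposition !== null && strtolower($disposition) === 'attachment') {
        return false;
    }

    // Assumption: without a declared charset there is nothing reliable
    // to convert from, so skip the conversion.
    if ($charset === null || $charset === '') {
        return false;
    }

    return strtolower($charset) !== 'utf-8';
}

$inlineCp1251  = shouldTranscode(null, 'windows-1251');         // true
$attachedFile  = shouldTranscode('attachment', 'windows-1251'); // false
$inlineUtf8    = shouldTranscode('inline', 'UTF-8');            // false
```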

Slamdunk (Collaborator) commented Oct 2, 2017

Hi, the screenshot shows the current public release; the fix is present only in the develop branch, which has not been released yet.

Slamdunk (Collaborator) commented Oct 4, 2017

Version 1 has been released with the fix.
