Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding Issues Driving Me Insane! #543

Closed
glenelkins1984 opened this issue Aug 20, 2020 · 15 comments
Closed

Encoding Issues Driving Me Insane! #543

glenelkins1984 opened this issue Aug 20, 2020 · 15 comments
Labels
need help Your support is required to solve this issue. needs investigation This will be tested / debugged or checked out.

Comments

@glenelkins1984
Copy link

Hi

I like this library but i am going insane with the encoding. Pulling 5000 emails from GMAIL works fine, but a lot of the html has encoding problems. I can't even seem to convert it all to proper UTF-8 with mbstring or utf8_encode, i just still have weird characters all over the place. I also can't just insert the html directly as the utf8 database wont' accept it.

There must be a way to have the emails displaying properly! Can someone help me here, ideally i should be able to just get the encoding from the message object, something like:

/** @var PhpImap\IncomingMail */
$mesaage->getEncoding()

But there doesn't seem to be such a thing on the IncomingMail object, if i could know what the encoding was then i could use mbstring to convert it properly.

@bapcltd-marv
Copy link
Contributor

I think the issue with that approach is different chunks of the message can be encoded differently- you'd need to be able to inspect
DataPartInfo::$charset on all IncomingMail::$dataInfo entries.

@glenelkins1984
Copy link
Author

glenelkins1984 commented Aug 20, 2020

Thanks for the reply.

So how do i do that then? What you're describing is some static properties, but i'm looping over IncomingMail objects and there is no $mailObj->charset there is also no property $mailObj->dataInfo ?

This is after pulling a mail in like so:

$remoteEmail = $imap->getMail($mailId,false);

So in my example $remoteEmail->charset and $remoteEmail->dataInfo both do not exist.

@bapcltd-marv
Copy link
Contributor

Thanks for the reply.

So how do i do that then? What you're describing is some static properties, but i'm looping over IncomingMail objects and there is no $mailObj->charset there is also no property $mailObj->dataInfo ?

This is after pulling a mail in like so:

$remoteEmail = $imap->getMail($mailId,false);

So in my example $remoteEmail->charset and $remoteEmail->dataInfo both do not exist.

I'm referring to class properties, not static properties- specifically:

protected $dataInfo = [[], []];

public $charset;

@glenelkins1984
Copy link
Author

Thanks, so $dataInfo is protected so how am i supposed to access that information? I don't see a public method that returns it.

@bapcltd-marv
Copy link
Contributor

bapcltd-marv commented Aug 21, 2020

There must be a way to have the emails displaying properly! Can someone help me here, ideally i should be able to just get the encoding from the message object, something like:

/** @var PhpImap\IncomingMail */
$mesaage->getEncoding()

You proposal would need to be modified (at least in part) to either make these methods properties directly accessible, or to add a method into IncomingMail to collate the various DataPartInfo::$charset properties (although I'm unsure if they'd be impacted by DataPartInfo::$encoding)

A property would need to be routed through magic getters, and a method's return type would need to be defined for single body + single charset, multiple body + single charset, and multiple body + multiple charset.

The proposed technique would also likely need to account for non-inline text-based attachments.

@glenelkins1984
Copy link
Author

ok, i was hoping the library would literally just have the html body encoding set on it without all this messing around.

@hrst
Copy link

hrst commented Oct 15, 2020

I think I found the issue: decodeMimeStr() is called via convertEncodingAfterFetch(). However, at this point in time, the string no longer has the MIME header info. Hence, using imap_mime_header_decode() in decodeMimeStr() will not work and it then does not convert the encoding.

$this->data = $this->mail->decodeMimeStr(

$elements = \imap_mime_header_decode($string);

I fixed it like this:

Mailbox.php

    public function decodeMimeStr(string $string): string
    {
        $newString = '';
        /** @var list<object{charset?:string, text?:string}>|false */
        $elements = \imap_mime_header_decode($string);

        if (false === $elements) {
            return $string;
        }

        foreach ($elements as $element) {
            $newString .= $this->convertToUtf8($element->text, $element->charset);
        }

        return $newString;
    }

    public function convertToUtf8(string $string, string $fromCharset): string
    {
		$fromCharset = mb_strtolower($fromCharset);
		$newString = '';

		if ('default' === $fromCharset) {
			$fromCharset = $this->decodeMimeStrDefaultCharset;
		}

		switch ($fromCharset) {
			case 'default': // Charset default is already ASCII (not encoded)
			case 'utf-8': // Charset UTF-8 is OK
				$newString .= $string;
				break;
			default:
				// If charset exists in mb_list_encodings(), convert using mb_convert function
				if (\in_array($fromCharset, $this->lowercase_mb_list_encodings(), true)) {
					$newString .= \mb_convert_encoding($string, 'UTF-8', $fromCharset);
				} else {
					// Fallback: Try to convert with iconv()
					$iconv_converted_string = @\iconv($fromCharset, 'UTF-8', $string);
					if (!$iconv_converted_string) {
						// If iconv() could also not convert, return string as it is
						// (unknown charset)
						$newString .= $string;
					} else {
						$newString .= $iconv_converted_string;
					}
				}
				break;
		}

        return $newString;
    }

DataPartInfo.php

    protected function convertEncodingAfterFetch(): string
    {
        if (isset($this->charset) and !empty(\trim($this->charset))) {
            /*$this->data = $this->mail->decodeMimeStr(
                (string) $this->data // Data to convert
            );*/
            $this->data = $this->mail->convertToUtf8(
                (string) $this->data, // Data to convert
				$this->charset
            );
        }

        return (null === $this->data) ? '' : $this->data;
    }

@dwalczyk
Copy link

dwalczyk commented Oct 20, 2020

@hrst merge request, your fix is working for me

@jmbianca
Copy link

@hrst Hi, your fix works well for me too. Is there any plan to merge it soon?

@Sebbo94BY
Copy link
Collaborator

I've created a merge request with those changes and merged it into the develop branch. Feel free to test the changes using the develop branch.

@Sebbo94BY Sebbo94BY added need help Your support is required to solve this issue. needs investigation This will be tested / debugged or checked out. labels Feb 23, 2021
@Tanktiger
Copy link

@Sebi94nbg When will there be a new release with this fix in it?

@hrst
Copy link

hrst commented Dec 14, 2021

I've just updated and the new version actually re-introduces the problem:

$this->data = $this->mail->decodeMimeStr(
(string) $this->data, // Data to convert
\trim($this->charset)
);

decodeMimeStr() is called with a second parameter, but the method does not even have it?

Also here:

$fileName = $this->decodeMimeStr($fileName, $this->serverEncoding);

I'm very confused on what the intention here was. Am I missing something and a second decodeMimeStr() method exists somewhere that does have the second parameter? I could not find it.

See the method here:

public function decodeMimeStr(string $string): string

I suspect you'd want to use convertToUtf8() in both cases. I'll update my code accordingly.

Sebbo94BY pushed a commit that referenced this issue Dec 24, 2021
Sebbo94BY pushed a commit that referenced this issue Dec 27, 2021
@Sebbo94BY
Copy link
Collaborator

The develop branch has been merged into the master and released as version 4.3.0 six days ago. Please check again, if this issue still exists.

@hrst
Copy link

hrst commented Mar 5, 2022

The current version has the same issues again, please check this comment for a fix:
#657 (comment)

@Sebbo94BY
Copy link
Collaborator

I've just released a new version 4.5.3 with some more improvements regarding these encoding issues and some unit tests.

Relates to #657.

Feel free to reopen this issue, if you should be still able to reproduce your issue(s) or just open a new ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need help Your support is required to solve this issue. needs investigation This will be tested / debugged or checked out.
Projects
None yet
Development

No branches or pull requests

7 participants