Commit d2cff6a

Author: Alex Vandiver
Transcode the HTML part of incoming email into UTF-8 as well
Summary:
D1093 did this for just the text/plain part of incoming email. Most text/html parts choose to either use entity encoding //or// are already UTF-8, thus obviating the need to transcode the HTML part. However, this is not always the case, and leads to dropped messages, by way of:

```
EXCEPTION: (Exception) Failed to JSON encode value (phacility#5: Malformed UTF-8 characters, possibly incorrectly encoded): Dictionary value at key "html" is not valid UTF8, and cannot be JSON encoded: [snip HTML part of message content]
```

Generalize the charset transcoding to not apply to just the text/plain part, but both text/plain and text/html parts.

Test Plan:
Fed in a Windows-1252-encoded text/html part with 0x92 bytes in it; verified that $content only contained valid UTF-8 after this change.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D18776
1 parent bea45e9 commit d2cff6a
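
The failure mode described in the summary can be reproduced in isolation. The sketch below uses only PHP built-ins (`json_encode`, `iconv`) in place of the libphutil helpers touched by the commit, and the sample string is made up for illustration. It assumes the `iconv` extension is available.

```php
<?php
// Hypothetical sample: "It's" written with a Windows-1252 right single
// quotation mark, which is the single byte 0x92 in that charset.
$html = "<p>It\x92s broken</p>";

// A lone 0x92 byte is not valid UTF-8, so json_encode() rejects the value.
$encoded = json_encode(array('html' => $html));
$failed_before = ($encoded === false && json_last_error() === JSON_ERROR_UTF8);

// Transcode from the declared charset to UTF-8. iconv() stands in for
// phutil_utf8_convert() here; that substitution is an assumption of this sketch.
$utf8 = iconv('Windows-1252', 'UTF-8', $html);

// 0x92 maps to U+2019, encoded as E2 80 99 in UTF-8, and the value now
// passes through json_encode() without error.
$encoded_after = json_encode(array('html' => $utf8));
```

`JSON_ERROR_UTF8` is the error behind the "Malformed UTF-8 characters" text in the exception quoted above.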

1 file changed: 14 additions, 14 deletions
scripts/mail/mail_handler.php

@@ -35,16 +35,19 @@
 $parser = new MimeMailParser();
 $parser->setText(file_get_contents('php://stdin'));
 
-$text_body = $parser->getMessageBody('text');
-
-$text_body_headers = $parser->getMessageBodyHeaders('text');
-$content_type = idx($text_body_headers, 'content-type');
-if (
-  !phutil_is_utf8($text_body) &&
-  (preg_match('/charset="(.*?)"/', $content_type, $matches) ||
-   preg_match('/charset=(\S+)/', $content_type, $matches))
-) {
-  $text_body = phutil_utf8_convert($text_body, 'UTF-8', $matches[1]);
+$content = array();
+foreach (array('text', 'html') as $part) {
+  $part_body = $parser->getMessageBody($part);
+  $part_headers = $parser->getMessageBodyHeaders($part);
+  $content_type = idx($part_headers, 'content-type');
+  if (
+    !phutil_is_utf8($part_body) &&
+    (preg_match('/charset="(.*?)"/', $content_type, $matches) ||
+     preg_match('/charset=(\S+)/', $content_type, $matches))
+  ) {
+    $part_body = phutil_utf8_convert($part_body, 'UTF-8', $matches[1]);
+  }
+  $content[$part] = $part_body;
 }
 
 $headers = $parser->getHeaders();
@@ -57,10 +60,7 @@
 
 $received = new PhabricatorMetaMTAReceivedMail();
 $received->setHeaders($headers);
-$received->setBodies(array(
-  'text' => $text_body,
-  'html' => $parser->getMessageBody('html'),
-));
+$received->setBodies($content);
 
 $attachments = array();
 foreach ($parser->getAttachments() as $attachment) {
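
For readers without a libphutil checkout, the per-part logic of the new loop can be approximated with PHP built-ins. This is a sketch under stated assumptions, not the committed code: `transcode_part` is a hypothetical helper, `preg_match('//u', ...)` stands in for `phutil_is_utf8`, and `iconv` stands in for `phutil_utf8_convert`. The sketch also trims a trailing `;` from unquoted charset values, which the patterns in the diff do not do, and the sample bodies and headers are invented.

```php
<?php
/**
 * Hypothetical helper: return $body as UTF-8, using the charset declared in
 * a Content-Type header value when the body is not already valid UTF-8.
 */
function transcode_part($body, $content_type) {
  // preg_match() with the /u modifier fails on subjects that are not valid
  // UTF-8, so a result of 1 means no conversion is needed.
  if (preg_match('//u', $body) === 1) {
    return $body;
  }

  // Same two patterns as the diff: quoted charset first, then unquoted.
  if (preg_match('/charset="(.*?)"/', $content_type, $matches) ||
      preg_match('/charset=(\S+)/', $content_type, $matches)) {
    // Addition in this sketch: drop a trailing ';' captured by \S+.
    $charset = rtrim($matches[1], ';');
    $converted = @iconv($charset, 'UTF-8', $body);
    if ($converted !== false) {
      return $converted;
    }
  }

  // No charset found, or conversion failed: return the body unchanged.
  return $body;
}

// Apply the helper to both parts, mirroring the foreach in the commit.
$parts = array(
  'text' => array("It\x92s plain", 'text/plain; charset=windows-1252'),
  'html' => array("<p>It\x92s HTML</p>", 'text/html; charset="windows-1252"'),
);
$content = array();
foreach ($parts as $name => $part) {
  list($body, $content_type) = $part;
  $content[$name] = transcode_part($body, $content_type);
}
```

Keying `$content` by part name is what lets the commit pass the array straight to `setBodies()` instead of rebuilding it by hand.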
