Adds a hack that prevents Scripto::removeHtmlAttributes() from munging Unicode characters.
jimsafley committed Oct 10, 2011
1 parent e70ef1c commit 9b1fe6e27b401a0144636377ca0b00374f7098dd
Showing with 4 additions and 2 deletions.
  1. +4 −2 lib/Scripto.php
@@ -741,6 +741,7 @@ static public function isValidApiUrl($apiUrl)
* often contains MediaWiki-specific attributes that may conflict with local
* settings.
*
+ * @see http://www.php.net/manual/en/domdocument.loadhtml.php#95251
* @param string $html
* @return string
*/
@@ -752,9 +753,10 @@ static public function removeHtmlAttributes($html)
return $html;
}
- // Load the HTML into DOM.
+ // Load the HTML into DOM. Must inject an XML declaration with encoding
+ // set to UTF-8 to prevent DOMDocument from munging Unicode characters.
$doc = new DOMDocument();
- $doc->loadHTML($html);
+ $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
$xpath = new DOMXPath($doc);
// Iterate over and remove all attributes.
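
The effect of the injected declaration can be checked in isolation. The following is a minimal sketch, not code from Scripto: `firstParagraphText()` is a hypothetical helper, and the sample markup is made up for illustration. It assumes PHP 7 or later with the DOM extension enabled. The prefix marks the input as UTF-8, so the text read back from the DOM should match the original string. What happens without the prefix depends on the libxml2 version, so the sketch only prints that result instead of asserting it.

```php
<?php
// Hypothetical helper (not part of Scripto): parse an HTML fragment and
// return the text content of its first <p> element.
function firstParagraphText(string $html, bool $declareUtf8): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    // Same trick as the commit: prepend an XML declaration whose only
    // purpose is to tell the parser that the input is UTF-8.
    $prefix = $declareUtf8 ? '<?xml encoding="UTF-8">' : '';
    $doc->loadHTML($prefix . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc->getElementsByTagName('p')->item(0)->textContent;
}

// "\u{e9}" is U+00E9 (e with acute accent) encoded as UTF-8.
$original = "caf\u{e9}";
$html = '<p class="example">' . $original . '</p>';

$withPrefix = firstParagraphText($html, true);
$withoutPrefix = firstParagraphText($html, false);

var_dump($withPrefix === $original);    // round-trips when the prefix is present
var_dump($withoutPrefix === $original); // may differ, depending on libxml2
```

The injected declaration stays in the parsed document as a processing-instruction node, so code that serializes the whole document afterwards may need to account for it.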
