Adds a hack that prevents Scripto::removeHtmlAttributes() from munging Unicode characters.
commit 9b1fe6e27b401a0144636377ca0b00374f7098dd 1 parent e70ef1c
jimsafley authored October 10, 2011

Showing 1 changed file with 4 additions and 2 deletions.

  1. 6  lib/Scripto.php
@@ -741,6 +741,7 @@ static public function isValidApiUrl($apiUrl)
      * often contains MediaWiki-specific attributes that may conflict with local
      * settings.
      *
+     * @see http://www.php.net/manual/en/domdocument.loadhtml.php#95251
      * @param string $html
      * @return string
      */
@@ -752,9 +753,10 @@ static public function removeHtmlAttributes($html)
             return $html;
         }

-        // Load the HTML into DOM.
+        // Load the HTML into DOM. Must inject an XML declaration with encoding
+        // set to UTF-8 to prevent DOMDocument from munging Unicode characters.
         $doc = new DOMDocument();
-        $doc->loadHTML($html);
+        $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
         $xpath = new DOMXPath($doc);

         // Iterate over and remove all attributes.
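
The effect of the injected declaration can be checked in isolation. The following is a minimal sketch, not code from Scripto: `firstParagraphText()` and its `$declareUtf8` flag are hypothetical names introduced only for illustration. It assumes a PHP 7+ build with the DOM extension; how the unprefixed input is decoded depends on the libxml2 version, so the sketch only reports whether each variant round-trips.

```php
<?php
// Hypothetical helper (not part of Scripto): parse an HTML fragment with
// DOMDocument and return the text content of its first <p> element.
function firstParagraphText(string $html, bool $declareUtf8): string
{
    // Same prefix the commit prepends before calling loadHTML().
    $source = $declareUtf8 ? '<?xml encoding="UTF-8">' . $html : $html;

    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraph = $doc->getElementsByTagName('p')->item(0);
    return $paragraph === null ? '' : $paragraph->textContent;
}

// "\u{00E9}" is U+00E9 (e with acute accent), encoded by PHP as UTF-8.
$expected = "caf\u{00E9}";
$html = '<p>' . $expected . '</p>';

// Report whether each variant returns the original UTF-8 text unchanged.
echo 'with declaration:    ';
var_dump(firstParagraphText($html, true) === $expected);
echo 'without declaration: ';
var_dump(firstParagraphText($html, false) === $expected);
```

On the PHP builds this hack targets, the prefixed call is expected to return the text intact, while the unprefixed call may return mojibake because the bytes are read as a single-byte encoding.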
