
SyntaxHighlighter Code block: improper encoding/decoding of HTML entities #152

Closed
alexsanford opened this issue Jun 12, 2020 · 1 comment · Fixed by #160
Labels
[Pri] High Wide impact, no workaround. [Status] Queued In the queue of issues to work on next. [Type] Bug

Comments

@alexsanford
Contributor

Originally reported internally by @tbradsha


Steps to reproduce

  1. Create a SyntaxHighlighter Code block with any code language.
  2. Add the following content (taken from http://www.htmlhelp.org/reference/html40/entities/):
+-----------------------------------------------------------------------+----------+---------+--------+------------------+-------------------+---------------+
|                               Character                               |  Entity  | Decimal |  Hex   | Rendered: Entity | Rendered: Decimal | Rendered: Hex |
+-----------------------------------------------------------------------+----------+---------+--------+------------------+-------------------+---------------+
| no-break space = non-breaking space                                   | &nbsp;   | &#160;  | &#xA0; |                  |                   |               |
| inverted exclamation mark                                             | &iexcl;  | &#161;  | &#xA1; | ¡                | ¡                 | ¡             |
| cent sign                                                             | &cent;   | &#162;  | &#xA2; | ¢                | ¢                 | ¢             |
| pound sign                                                            | &pound;  | &#163;  | &#xA3; | £                | £                 | £             |
| currency sign                                                         | &curren; | &#164;  | &#xA4; | ¤                | ¤                 | ¤             |
| yen sign = yuan sign                                                  | &yen;    | &#165;  | &#xA5; | ¥                | ¥                 | ¥             |
| broken bar = broken vertical bar                                      | &brvbar; | &#166;  | &#xA6; | ¦                | ¦                 | ¦             |
| section sign                                                          | &sect;   | &#167;  | &#xA7; | §                | §                 | §             |
| diaeresis = spacing diaeresis                                         | &uml;    | &#168;  | &#xA8; | ¨                | ¨                 | ¨             |
| copyright sign                                                        | &copy;   | &#169;  | &#xA9; | ©                | ©                 | ©             |
| feminine ordinal indicator                                            | &ordf;   | &#170;  | &#xAA; | ª                | ª                 | ª             |
| left-pointing double angle quotation mark = left pointing guillemet   | &laquo;  | &#171;  | &#xAB; | «                | «                 | «             |
| not sign                                                              | &not;    | &#172;  | &#xAC; | ¬                | ¬                 | ¬             |
| soft hyphen = discretionary hyphen                                    | &shy;    | &#173;  | &#xAD; |                  |                   |               |
| registered sign = registered trade mark sign                          | &reg;    | &#174;  | &#xAE; | ®                | ®                 | ®             |
| macron = spacing macron = overline = APL overbar                      | &macr;   | &#175;  | &#xAF; | ¯                | ¯                 | ¯             |
| degree sign                                                           | &deg;    | &#176;  | &#xB0; | °                | °                 | °             |
| plus-minus sign = plus-or-minus sign                                  | &plusmn; | &#177;  | &#xB1; | ±                | ±                 | ±             |
| superscript two = superscript digit two = squared                     | &sup2;   | &#178;  | &#xB2; | ²                | ²                 | ²             |
| superscript three = superscript digit three = cubed                   | &sup3;   | &#179;  | &#xB3; | ³                | ³                 | ³             |
| acute accent = spacing acute                                          | &acute;  | &#180;  | &#xB4; | ´                | ´                 | ´             |
| micro sign                                                            | &micro;  | &#181;  | &#xB5; | µ                | µ                 | µ             |
| pilcrow sign = paragraph sign                                         | &para;   | &#182;  | &#xB6; | ¶                | ¶                 | ¶             |
| middle dot = Georgian comma = Greek middle dot                        | &middot; | &#183;  | &#xB7; | ·                | ·                 | ·             |
| cedilla = spacing cedilla                                             | &cedil;  | &#184;  | &#xB8; | ¸                | ¸                 | ¸             |
| superscript one = superscript digit one                               | &sup1;   | &#185;  | &#xB9; | ¹                | ¹                 | ¹             |
| masculine ordinal indicator                                           | &ordm;   | &#186;  | &#xBA; | º                | º                 | º             |
| right-pointing double angle quotation mark = right pointing guillemet | &raquo;  | &#187;  | &#xBB; | »                | »                 | »             |
| vulgar fraction one quarter = fraction one quarter                    | &frac14; | &#188;  | &#xBC; | ¼                | ¼                 | ¼             |
| vulgar fraction one half = fraction one half                          | &frac12; | &#189;  | &#xBD; | ½                | ½                 | ½             |
| vulgar fraction three quarters = fraction three quarters              | &frac34; | &#190;  | &#xBE; | ¾                | ¾                 | ¾             |
| inverted question mark = turned question mark                         | &iquest; | &#191;  | &#xBF; | ¿                | ¿                 | ¿             |
| Latin capital letter A with grave = Latin capital letter A grave      | &Agrave; | &#192;  | &#xC0; | À                | À                 | À             |
| Latin capital letter A with acute                                     | &Aacute; | &#193;  | &#xC1; | Á                | Á                 | Á             |
| Latin capital letter A with circumflex                                | &Acirc;  | &#194;  | &#xC2; | Â                | Â                 | Â             |
| Latin capital letter A with tilde                                     | &Atilde; | &#195;  | &#xC3; | Ã                | Ã                 | Ã             |
| Latin capital letter A with diaeresis                                 | &Auml;   | &#196;  | &#xC4; | Ä                | Ä                 | Ä             |
| Latin capital letter A with ring above = Latin capital letter A ring  | &Aring;  | &#197;  | &#xC5; | Å                | Å                 | Å             |
| Latin capital letter AE = Latin capital ligature AE                   | &AElig;  | &#198;  | &#xC6; | Æ                | Æ                 | Æ             |
| Latin capital letter C with cedilla                                   | &Ccedil; | &#199;  | &#xC7; | Ç                | Ç                 | Ç             |
| Latin capital letter E with grave                                     | &Egrave; | &#200;  | &#xC8; | È                | È                 | È             |
| Latin capital letter E with acute                                     | &Eacute; | &#201;  | &#xC9; | É                | É                 | É             |
| Latin capital letter E with circumflex                                | &Ecirc;  | &#202;  | &#xCA; | Ê                | Ê                 | Ê             |
| Latin capital letter E with diaeresis                                 | &Euml;   | &#203;  | &#xCB; | Ë                | Ë                 | Ë             |
| Latin capital letter I with grave                                     | &Igrave; | &#204;  | &#xCC; | Ì                | Ì                 | Ì             |
| Latin capital letter I with acute                                     | &Iacute; | &#205;  | &#xCD; | Í                | Í                 | Í             |
| Latin capital letter I with circumflex                                | &Icirc;  | &#206;  | &#xCE; | Î                | Î                 | Î             |
| Latin capital letter I with diaeresis                                 | &Iuml;   | &#207;  | &#xCF; | Ï                | Ï                 | Ï             |
| Latin capital letter ETH                                              | &ETH;    | &#208;  | &#xD0; | Ð                | Ð                 | Ð             |
| Latin capital letter N with tilde                                     | &Ntilde; | &#209;  | &#xD1; | Ñ                | Ñ                 | Ñ             |
| Latin capital letter O with grave                                     | &Ograve; | &#210;  | &#xD2; | Ò                | Ò                 | Ò             |
| Latin capital letter O with acute                                     | &Oacute; | &#211;  | &#xD3; | Ó                | Ó                 | Ó             |
| Latin capital letter O with circumflex                                | &Ocirc;  | &#212;  | &#xD4; | Ô                | Ô                 | Ô             |
| Latin capital letter O with tilde                                     | &Otilde; | &#213;  | &#xD5; | Õ                | Õ                 | Õ             |
| Latin capital letter O with diaeresis                                 | &Ouml;   | &#214;  | &#xD6; | Ö                | Ö                 | Ö             |
| multiplication sign                                                   | &times;  | &#215;  | &#xD7; | ×                | ×                 | ×             |
| Latin capital letter O with stroke = Latin capital letter O slash     | &Oslash; | &#216;  | &#xD8; | Ø                | Ø                 | Ø             |
| Latin capital letter U with grave                                     | &Ugrave; | &#217;  | &#xD9; | Ù                | Ù                 | Ù             |
| Latin capital letter U with acute                                     | &Uacute; | &#218;  | &#xDA; | Ú                | Ú                 | Ú             |
| Latin capital letter U with circumflex                                | &Ucirc;  | &#219;  | &#xDB; | Û                | Û                 | Û             |
| Latin capital letter U with diaeresis                                 | &Uuml;   | &#220;  | &#xDC; | Ü                | Ü                 | Ü             |
| Latin capital letter Y with acute                                     | &Yacute; | &#221;  | &#xDD; | Ý                | Ý                 | Ý             |
| Latin capital letter THORN                                            | &THORN;  | &#222;  | &#xDE; | Þ                | Þ                 | Þ             |
| Latin small letter sharp s = ess-zed                                  | &szlig;  | &#223;  | &#xDF; | ß                | ß                 | ß             |
| Latin small letter a with grave = Latin small letter a grave          | &agrave; | &#224;  | &#xE0; | à                | à                 | à             |
| Latin small letter a with acute                                       | &aacute; | &#225;  | &#xE1; | á                | á                 | á             |
| Latin small letter a with circumflex                                  | &acirc;  | &#226;  | &#xE2; | â                | â                 | â             |
| Latin small letter a with tilde                                       | &atilde; | &#227;  | &#xE3; | ã                | ã                 | ã             |
| Latin small letter a with diaeresis                                   | &auml;   | &#228;  | &#xE4; | ä                | ä                 | ä             |
| Latin small letter a with ring above = Latin small letter a ring      | &aring;  | &#229;  | &#xE5; | å                | å                 | å             |
| Latin small letter ae = Latin small ligature ae                       | &aelig;  | &#230;  | &#xE6; | æ                | æ                 | æ             |
| Latin small letter c with cedilla                                     | &ccedil; | &#231;  | &#xE7; | ç                | ç                 | ç             |
| Latin small letter e with grave                                       | &egrave; | &#232;  | &#xE8; | è                | è                 | è             |
| Latin small letter e with acute                                       | &eacute; | &#233;  | &#xE9; | é                | é                 | é             |
| Latin small letter e with circumflex                                  | &ecirc;  | &#234;  | &#xEA; | ê                | ê                 | ê             |
| Latin small letter e with diaeresis                                   | &euml;   | &#235;  | &#xEB; | ë                | ë                 | ë             |
| Latin small letter i with grave                                       | &igrave; | &#236;  | &#xEC; | ì                | ì                 | ì             |
| Latin small letter i with acute                                       | &iacute; | &#237;  | &#xED; | í                | í                 | í             |
| Latin small letter i with circumflex                                  | &icirc;  | &#238;  | &#xEE; | î                | î                 | î             |
| Latin small letter i with diaeresis                                   | &iuml;   | &#239;  | &#xEF; | ï                | ï                 | ï             |
| Latin small letter eth                                                | &eth;    | &#240;  | &#xF0; | ð                | ð                 | ð             |
| Latin small letter n with tilde                                       | &ntilde; | &#241;  | &#xF1; | ñ                | ñ                 | ñ             |
| Latin small letter o with grave                                       | &ograve; | &#242;  | &#xF2; | ò                | ò                 | ò             |
| Latin small letter o with acute                                       | &oacute; | &#243;  | &#xF3; | ó                | ó                 | ó             |
| Latin small letter o with circumflex                                  | &ocirc;  | &#244;  | &#xF4; | ô                | ô                 | ô             |
| Latin small letter o with tilde                                       | &otilde; | &#245;  | &#xF5; | õ                | õ                 | õ             |
| Latin small letter o with diaeresis                                   | &ouml;   | &#246;  | &#xF6; | ö                | ö                 | ö             |
| division sign                                                         | &divide; | &#247;  | &#xF7; | ÷                | ÷                 | ÷             |
| Latin small letter o with stroke = Latin small letter o slash         | &oslash; | &#248;  | &#xF8; | ø                | ø                 | ø             |
| Latin small letter u with grave                                       | &ugrave; | &#249;  | &#xF9; | ù                | ù                 | ù             |
| Latin small letter u with acute                                       | &uacute; | &#250;  | &#xFA; | ú                | ú                 | ú             |
| Latin small letter u with circumflex                                  | &ucirc;  | &#251;  | &#xFB; | û                | û                 | û             |
| Latin small letter u with diaeresis                                   | &uuml;   | &#252;  | &#xFC; | ü                | ü                 | ü             |
| Latin small letter y with acute                                       | &yacute; | &#253;  | &#xFD; | ý                | ý                 | ý             |
| Latin small letter thorn                                              | &thorn;  | &#254;  | &#xFE; | þ                | þ                 | þ             |
| Latin small letter y with diaeresis                                   | &yuml;   | &#255;  | &#xFF; | ÿ                | ÿ                 | ÿ             |
+-----------------------------------------------------------------------+----------+---------+--------+------------------+-------------------+---------------+


+--------------------------------------------+----------+---------+----------+------------------+-------------------+---------------+
|                 Character                  |  Entity  | Decimal |   Hex    | Rendered: Entity | Rendered: Decimal | Rendered: Hex |
+--------------------------------------------+----------+---------+----------+------------------+-------------------+---------------+
| quotation mark = APL quote                 | &quot;   | &#34;   | &#x22;   | "                | "                 | "             |
| ampersand                                  | &amp;    | &#38;   | &#x26;   | &                | &                 | &             |
| less-than sign                             | &lt;     | &#60;   | &#x3C;   | <                | <                 | <             |
| greater-than sign                          | &gt;     | &#62;   | &#x3E;   | >                | >                 | >             |
| Latin capital ligature OE                  | &OElig;  | &#338;  | &#x152;  | Œ                | Œ                 | Œ             |
| Latin small ligature oe                    | &oelig;  | &#339;  | &#x153;  | œ                | œ                 | œ             |
| Latin capital letter S with caron          | &Scaron; | &#352;  | &#x160;  | Š                | Š                 | Š             |
| Latin small letter s with caron            | &scaron; | &#353;  | &#x161;  | š                | š                 | š             |
| Latin capital letter Y with diaeresis      | &Yuml;   | &#376;  | &#x178;  | Ÿ                | Ÿ                 | Ÿ             |
| modifier letter circumflex accent          | &circ;   | &#710;  | &#x2C6;  | ˆ                | ˆ                 | ˆ             |
| small tilde                                | &tilde;  | &#732;  | &#x2DC;  | ˜                | ˜                 | ˜             |
| en space                                   | &ensp;   | &#8194; | &#x2002; |                  |                   |               |
| em space                                   | &emsp;   | &#8195; | &#x2003; |                  |                   |               |
| thin space                                 | &thinsp; | &#8201; | &#x2009; |                  |                   |               |
| zero width non-joiner                      | &zwnj;   | &#8204; | &#x200C; |                  |                   |               |
| zero width joiner                          | &zwj;    | &#8205; | &#x200D; |                  |                   |               |
| left-to-right mark                         | &lrm;    | &#8206; | &#x200E; |                  |                   |               |
| right-to-left mark                         | &rlm;    | &#8207; | &#x200F; |                  |                   |               |
| en dash                                    | &ndash;  | &#8211; | &#x2013; | –                | –                 | –             |
| em dash                                    | &mdash;  | &#8212; | &#x2014; | —                | —                 | —             |
| left single quotation mark                 | &lsquo;  | &#8216; | &#x2018; | ‘                | ‘                 | ‘             |
| right single quotation mark                | &rsquo;  | &#8217; | &#x2019; | ’                | ’                 | ’             |
| single low-9 quotation mark                | &sbquo;  | &#8218; | &#x201A; | ‚                | ‚                 | ‚             |
| left double quotation mark                 | &ldquo;  | &#8220; | &#x201C; | “                | “                 | “             |
| right double quotation mark                | &rdquo;  | &#8221; | &#x201D; | ”                | ”                 | ”             |
| double low-9 quotation mark                | &bdquo;  | &#8222; | &#x201E; | „                | „                 | „             |
| dagger                                     | &dagger; | &#8224; | &#x2020; | †                | †                 | †             |
| double dagger                              | &Dagger; | &#8225; | &#x2021; | ‡                | ‡                 | ‡             |
| per mille sign                             | &permil; | &#8240; | &#x2030; | ‰                | ‰                 | ‰             |
| single left-pointing angle quotation mark  | &lsaquo; | &#8249; | &#x2039; | ‹                | ‹                 | ‹             |
| single right-pointing angle quotation mark | &rsaquo; | &#8250; | &#x203A; | ›                | ›                 | ›             |
| euro sign                                  | &euro;   | &#8364; | &#x20AC; | €                | €                 | €             |
+--------------------------------------------+----------+---------+----------+------------------+-------------------+---------------+
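Each row of the test content pairs one character with its named, decimal, and hex entity forms, and all three should decode to the same character. As a minimal sketch for generating or sanity-checking such rows outside the editor, Python's standard `html.entities.name2codepoint` table can be used. This is not part of the plugin, and the helper name `entity_row` is hypothetical.

```python
import html
from html.entities import name2codepoint


def entity_row(name: str) -> tuple[str, str, str, str]:
    """Return (named, decimal, hex, character) forms for an HTML 4 entity name.

    `entity_row` is a hypothetical helper for building reproduction content.
    """
    codepoint = name2codepoint[name]  # e.g. "iexcl" -> 161
    named = f"&{name};"
    decimal = f"&#{codepoint};"
    hexadecimal = f"&#x{codepoint:X};"
    return named, decimal, hexadecimal, chr(codepoint)


if __name__ == "__main__":
    for name in ("iexcl", "amp", "lt", "gt", "euro"):
        named, decimal, hexadecimal, char = entity_row(name)
        # All three encoded forms should decode to the same single character.
        assert html.unescape(named) == html.unescape(decimal) == html.unescape(hexadecimal) == char
        print(f"{named:<8} {decimal:<7} {hexadecimal:<8} {char}")
```

The literal strings in the first three columns are exactly what the Code block is expected to preserve verbatim.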

What I expected

The text would display exactly as entered above, both in the editor and on the live site.
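Put another way, whatever encoding the block applies when it serializes code into the post HTML should be undone exactly once when the code is read back, so the text survives the round trip unchanged. A minimal Python sketch of that property follows. It assumes the standard `html.escape`/`html.unescape` as stand-ins for whatever encoding the plugin actually uses, and `save_code`/`load_code` are hypothetical names.

```python
import html


def save_code(code: str) -> str:
    """Hypothetical serializer: escape the code once before embedding it in HTML."""
    # html.escape replaces "&" first, then <, >, " and ', so literal entity
    # text such as "&iexcl;" is stored as "&amp;iexcl;" and is preserved.
    return html.escape(code)


def load_code(stored: str) -> str:
    """Hypothetical loader: decode exactly once when reading the HTML back."""
    return html.unescape(stored)


if __name__ == "__main__":
    samples = ["&iexcl; &#161; &#xA1; ¡", "&amp; & &lt; < &gt; >", '"quoted" &quot;']
    for code in samples:
        stored = save_code(code)
        # Encode once + decode once must give back the exact original text.
        assert load_code(stored) == code, (code, stored)
    print("all samples round-trip unchanged")
```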

What happened instead

  • Editor: named (HTML), decimal, and hex entities are all rendered as their respective characters.
  • Live site: decimal and hex entities are rendered as their respective characters.
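This symptom is consistent with the stored text being decoded one more time than it was encoded. That is a guess at the mechanism, not a diagnosis of the plugin's code. The sketch below, again using Python's `html` module as a stand-in, shows how a single extra `html.unescape` pass silently turns literal entity text into characters:

```python
import html

original = "&iexcl; &#161; &#xA1;"  # literal text the author typed
stored = html.escape(original)       # "&amp;iexcl; &amp;#161; &amp;#xA1;"

decoded_once = html.unescape(stored)         # correct: the literal text again
decoded_twice = html.unescape(decoded_once)  # extra pass: every entity becomes "¡"

print(decoded_once)   # &iexcl; &#161; &#xA1;
print(decoded_twice)  # ¡ ¡ ¡
```

Once the second decode has happened and the post is saved again, the original entity text is gone, which matches the data loss described in this report.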

To complicate matters further, &amp;, &lt;, and &gt; seem to behave in even more non-standard ways:

&amp;:

  • Editor: changes everything to &amp;
  • Live: renders decimal and hex as the actual character, and changes the actual character to &amp;

&lt;:

  • Editor: changes everything to &lt;
  • Live: renders everything as actual character

&gt;:

  • Editor: renders everything as actual character
  • Live: renders decimal and hex as the actual character, and changes the actual character to &gt;
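The cases where a typed `&` or `>` comes out as `&amp;` or `&gt;` look like the opposite mistake: text that is already escaped gets escaped a second time, so a single decode pass leaves the escape sequence visible. A minimal sketch, assuming as before that Python's `html.escape` is a fair stand-in for the plugin's encoder:

```python
import html

typed = "a & b > c"
escaped_once = html.escape(typed, quote=False)          # "a &amp; b &gt; c"
escaped_twice = html.escape(escaped_once, quote=False)  # "a &amp;amp; b &amp;gt; c"

# A single decode restores the text only if it was encoded a single time.
print(html.unescape(escaped_once))   # a & b > c
print(html.unescape(escaped_twice))  # a &amp; b &gt; c
```

Taken together with the extra-decode case, this suggests checking that the save and load paths apply encoding and decoding the same number of times.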

Browser / OS version

Chrome 75 on macOS 10.14.5

Context / Source

Reported in #13419156-hc. I was able to replicate it on my end. Per the user report, this appears to be a regression, and I'd say it is a serious bug that directly leads to post data corruption/loss.

@alexsanford alexsanford added [Type] Bug [Status] Queued In the queue of issues to work on next. [Pri] High Wide impact, no workaround. labels Jun 12, 2020
@adnanafzal565

Same here.
