Conversation
In EntityArrays, added an escape map for the CP-1252 encoding and the corresponding unescape mapping. Added methods to StringEscapeUtils to employ them, preserving existing ESCAPE_HTML4 functionality.
missed a semicolon
I've never worked on this project before, could someone tell me what is going on?
This PR is missing unit tests.
I don't know anything about how to build the project locally |
added smart quotes test; separation needed to avoid EntityArrays test error
update default operation; see https://issues.apache.org/jira/browse/TEXT-192 as to why made default
missed a comma ugh
seems the source is in iso 8859-1
{"null", null, null},
{"ampersand", "bread &amp; butter", "bread & butter"},
{"quotes", "&quot;bread&quot; &amp; butter", "\"bread\" & butter"},
{"smart quotes", "“bread and circuses”", "\u201Cbread and circuses\u201d"},
I see a new map with dozens of new entries but you are only testing a single value? Am I reading this right or are all the other map entries somehow also tested?
I tested a single value because the existing tests seem to cover only a few values; an exhaustive set of tests would mostly just reproduce the existing map.
That said, thinking about how the library handles numeric escapes, I don't think it solves the initial problem that led me to create this pull request (translating something like &#147; to “ or &#137; to ‰), so I've closed it.
Hi @ifly6 only had time to look at your PR now, sorry. Feel free to open a new one if you have another solution for the Windows 1252 charset issue 👍 Thanks!
Thanks for looking at it; I think the actual way forward would be something to do with changing the numeric unescaper in Commons Text to decode certain ranges as Windows-1252 instead of treating them as Unicode. I'm not sure right now how to implement it in a manner consistent with the existing code base.
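The idea in this comment can be sketched roughly as follows. This is a minimal, standalone illustration and not the Commons Text implementation: the class `Cp1252NumericUnescaper` and its methods are hypothetical names, and it assumes that only numeric references in the range 128 to 159 (0x80 to 0x9F) should be reinterpreted as Windows-1252 bytes, with the five bytes that CP-1252 leaves undefined passed through as their own code points.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of Apache Commons Text. */
final class Cp1252NumericUnescaper {

    // Matches decimal (&#147;) and hexadecimal (&#x93;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Windows-1252 characters for bytes 0x80..0x9F, indexed by (byte - 0x80).
    // Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in CP-1252; this
    // sketch assumes they should map to the code point with the same value.
    private static final String CP1252_HIGH =
            "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021"
          + "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F"
          + "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014"
          + "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178";

    /** Replaces numeric references, decoding 0x80..0x9F as Windows-1252. */
    static String unescape(String input) {
        if (input == null) {
            return null;
        }
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            int value = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            out.append(decode(value, m.group()));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    private static String decode(int value, String original) {
        if (value >= 0x80 && value <= 0x9F) {
            // Reinterpret the number as a CP-1252 byte rather than a C1 control.
            return String.valueOf(CP1252_HIGH.charAt(value - 0x80));
        }
        if (!Character.isValidCodePoint(value)) {
            // Leave out-of-range references untouched.
            return original;
        }
        return new String(Character.toChars(value));
    }

    public static void main(String[] args) {
        System.out.println(unescape("&#147;bread and circuses&#148; at 5&#137;"));
    }
}
```

Values outside 0x80 to 0x9F keep their usual Unicode meaning, so a reference such as `&#8220;` is unaffected by the remapping.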
I think I found a working way to get the NumericUnescaper to work with the CP-1252 section; should I create a new pull request or wait for someone to get back to me?
Feel free to create a new PR, or re-open this one if you will build on the work you've already done here
In EntityArrays, added an escape map for the CP-1252 encoding and the corresponding unescape mapping. Added methods to StringEscapeUtils to escape them, preserving existing ESCAPE_HTML4 functionality. Functionality added in response to bug report https://issues.apache.org/jira/browse/TEXT-192.
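For readers unfamiliar with the escape and unescape map pairing described above, here is a small standalone sketch of the pattern. It uses only the JDK; the class `Cp1252EntityMapSketch`, its fields, and the handful of entries shown are hypothetical and are not the map this PR adds to EntityArrays.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an escape map and its inverted unescape map. */
final class Cp1252EntityMapSketch {

    static final Map<String, String> ESCAPE;
    static final Map<String, String> UNESCAPE;

    static {
        // A few characters from the CP-1252 0x80..0x9F block and their
        // HTML 4 named entities (illustrative subset only).
        Map<String, String> escape = new LinkedHashMap<>();
        escape.put("\u20AC", "&euro;");
        escape.put("\u201C", "&ldquo;");
        escape.put("\u201D", "&rdquo;");
        escape.put("\u2030", "&permil;");
        ESCAPE = Collections.unmodifiableMap(escape);

        // The unescape map is the escape map with keys and values swapped.
        Map<String, String> unescape = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : escape.entrySet()) {
            unescape.put(e.getValue(), e.getKey());
        }
        UNESCAPE = Collections.unmodifiableMap(unescape);
    }

    /** Applies every mapping in the table to the input, in table order. */
    static String translate(String input, Map<String, String> table) {
        String result = input;
        for (Map.Entry<String, String> e : table.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String escaped = translate("\u201Cbread and circuses\u201D", ESCAPE);
        System.out.println(escaped);
        System.out.println(translate(escaped, UNESCAPE));
    }
}
```

Deriving the unescape table by inversion keeps the two directions in sync, which is why a round trip through `ESCAPE` and then `UNESCAPE` returns the original text for the characters covered.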