Composited unicode characters not displayed correctly. #1226
Comments
So you just deleted the whole reproducibility section of the issue template? |
Well... Since this is purely a string content issue, I thought it was not necessary... I can add one if it's a requirement. |
I have the exact same problem with 0.100.2.
If I open the xlsx in Excel 2013 and use File -> Save As -> Browse, the new Excel file is valid. |
Thanks for your help, but I didn't change namespaces when using ClosedXML and inserted the smiley without encoding it. Actually, I even extracted the XLSX archive, opened the specific sheet in Notepad++ and saw that the smiley was encoded without asking as : I generated the exact same Excel file with the OpenXML lib and it worked first try. The character was not encoded and displayed correctly in Notepad++. Should I open a new issue and link a file + code sample? |
@yoyos I don't think it is a separate issue. By "should be just written to the file and not escaped" I meant that ClosedXML should just write it out as a UTF-8 char without doing anything. It calls EDIT: Basically, the result should be this (this is a file created by Excel with UTF-8 encoding): |
@jahav Did you see the referenced PR and linked issue? As far as I could tell, there isn't an easy way to determine whether the emoji chars need to be escaped, except if you reimplement the logic from that library. |
@igitur Sorry, I missed those. Thanks. |
@igitur Ok, I looked at ... I am still confused (I don't like the PR because it is either another 1MB dependency or another fork). Confusion often happens when multiple encodings are involved (String in UTF-16, output in UTF-8, and the custom OpenXML requirement for XML encoding). I think that it can be solved simply by using classic ... Basically, only add char.IsSurrogate to XmlEncoder.EncodeString:

    var len = encodeStr.Length;
    for (var i = 0; i < len; ++i)
    {
        var currentChar = encodeStr[i];
        if (XmlConvert.IsXmlChar(currentChar))
        {
            // Valid XML character: write it unchanged.
            sb.Append(currentChar);
        }
        else if (char.IsSurrogate(currentChar) && i < len - 1 && XmlConvert.IsXmlSurrogatePair(encodeStr[i + 1], currentChar))
        {
            // Valid surrogate pair (IsXmlSurrogatePair takes the low surrogate first): write both halves unchanged.
            sb.Append(currentChar);
            sb.Append(encodeStr[++i]);
        }
        else
        {
            // Anything else (including a lone surrogate) is escaped as _xHHHH_.
            sb.Append(XmlConvert.EncodeName(currentChar.ToString()));
        }
    }

That is all. The problem stems from passing half of a surrogate pair to Encoding in the stream (the |
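For anyone who wants to try the idea outside of ClosedXML, here is a minimal stand-alone sketch of the loop above. The class name XmlEncoderSketch and the Main method are hypothetical and only for illustration; this is not ClosedXML's actual XmlEncoder, and it assumes the surrounding method does nothing more than build the result in a StringBuilder.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical stand-alone wrapper around the proposed loop; not ClosedXML's real class.
public static class XmlEncoderSketch
{
    public static string EncodeString(string encodeStr)
    {
        if (string.IsNullOrEmpty(encodeStr))
        {
            return encodeStr;
        }

        var sb = new StringBuilder(encodeStr.Length);
        var len = encodeStr.Length;
        for (var i = 0; i < len; ++i)
        {
            var currentChar = encodeStr[i];
            if (XmlConvert.IsXmlChar(currentChar))
            {
                // Valid XML character: keep it as-is.
                sb.Append(currentChar);
            }
            else if (char.IsSurrogate(currentChar) && i < len - 1
                     && XmlConvert.IsXmlSurrogatePair(encodeStr[i + 1], currentChar))
            {
                // Valid high/low surrogate pair: keep both code units together.
                sb.Append(currentChar);
                sb.Append(encodeStr[++i]);
            }
            else
            {
                // Everything else, including a lone surrogate, gets escaped.
                sb.Append(XmlConvert.EncodeName(currentChar.ToString()));
            }
        }

        return sb.ToString();
    }

    public static void Main()
    {
        // U+1F44D (thumbs up) is stored as the surrogate pair D83D DC4D in a .NET string.
        var input = "C0 \U0001F44D 0/2";
        Console.WriteLine(EncodeString(input) == input); // True: the pair is left untouched
    }
}
```

With this, a complete pair reaches the UTF-8 writer as one unit, and only characters that are really invalid in XML go through XmlConvert.EncodeName.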
@yoyos I made a PR #1978 that should fix the issue, but since I couldn't reproduce the original issue, can you please try the dev version of the NuGet package from the PR and check whether it solves the problem? Here is a guide: https://github.com/ClosedXML/ClosedXML/wiki/Development-Builds Once you confirm, I will merge PR #1978. |
Thanks @jahav With ClosedXML v0.100.3:
With your PR:
|
Read and complete the full issue template
Do you want to request a feature or report a bug?
Version of ClosedXML
0.94.2
What is the current behavior?
When I fill a cell with value:
C0 👍 0/2 β0 β οΈ0
It is displayed like this:
C0 _xD83D__xDC4D_ 0/2 β0 β οΈ0
in LibreOffice 6.2.3.2
What is the expected behavior or new feature?
Unicode emoji should display correctly.
Now, I don't know if this is a ClosedXML bug or a LibreOffice bug. It seems 👍 is a composited Unicode character... dunno what the rules are for displaying this in Excel and LibreOffice.
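A small sketch of what "composited" means here, assuming the first emoji is U+1F44D (which matches the _xD83D__xDC4D_ output above): in a .NET string it is a single code point stored as two UTF-16 code units, a surrogate pair. The class and method names are made up for this illustration.

```csharp
using System;

// Hypothetical helper for inspecting how a string is stored in UTF-16.
public static class SurrogateDemo
{
    // Formats each UTF-16 code unit of the input as four hex digits.
    public static string CodeUnits(string s)
    {
        var parts = new string[s.Length];
        for (var i = 0; i < s.Length; i++)
        {
            parts[i] = ((int)s[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        var thumbsUp = "\U0001F44D";

        Console.WriteLine(thumbsUp.Length);                                // 2
        Console.WriteLine(CodeUnits(thumbsUp));                            // D83D DC4D
        Console.WriteLine(char.ConvertToUtf32(thumbsUp, 0).ToString("X")); // 1F44D
        Console.WriteLine(char.IsSurrogatePair(thumbsUp, 0));              // True
    }
}
```

The two code units D83D and DC4D are exactly the values that show up escaped one by one in the cell text.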
Reproducibility
This is an important section. Read it carefully. Failure to do so will cause a 'RTFM' comment.
Without a code sample, it is unlikely that your issue will get attention. Don't be lazy. Do the effort and assist the developers to reproduce your problem. Code samples should be minimal complete and verifiable. Sample spreadsheets should be attached whenever applicable. Remove sensitive information.
Code to reproduce problem: