some spaces in docx is removed #3

Closed
JayXon opened this issue Sep 5, 2014 · 16 comments

Comments

@JayXon
Owner

JayXon commented Sep 5, 2014

The reason is TinyXML2 does not support xml:space="preserve".

<w:t xml:space="preserve"> </w:t>

will become

<w:t xml:space="preserve"/>

But the author of TinyXML2 is not willing to fix it.
Should I switch to another xml library?
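
For anyone who wants to see the failure mode without TinyXML2, here is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`. It is not Leanify's or TinyXML2's code; `naive_minify`, `visible_text`, and the sample run are hypothetical names made up for illustration. It only shows that dropping whitespace-only text while ignoring xml:space="preserve" glues neighbouring words together.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace; registering the prefix keeps "w:" in the output.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)

# Hypothetical run: two words separated by a whitespace-only <w:t>.
SAMPLE = (
    f'<w:r xmlns:w="{W_NS}">'
    '<w:t>Figure</w:t><w:t xml:space="preserve"> </w:t><w:t>1</w:t>'
    '</w:r>'
)


def naive_minify(xml_text: str) -> str:
    """Drop every whitespace-only text node, ignoring xml:space (the bug)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None  # the element is now serialized as empty
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


def visible_text(xml_text: str) -> str:
    """Concatenate all character data, roughly what a reader would see."""
    return "".join(ET.fromstring(xml_text).itertext())


if __name__ == "__main__":
    print(repr(visible_text(SAMPLE)))
    print(repr(visible_text(naive_minify(SAMPLE))))
```

Running it prints the text before and after the naive pass, so the lost space between the two words is easy to spot.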

@JayXon JayXon added bug labels Sep 5, 2014
@javiergutierrezchamorro

I have not yet found any document that loses spaces, so I am not sure how likely this issue is.

@nullptr-leo

This situation often occurs in graph or sheet captions.

JayXon added a commit that referenced this issue Oct 27, 2014
@nullptr-leo

Is this bug fixed now? The official site says RapidXml has better performance.

@JayXon
Owner Author

JayXon commented Jan 20, 2015

There is a workaround that skips one XML file inside docx, so it should be safe to use Leanify on docx now.
But I'm not sure whether this issue also affects xlsx and pptx.
RapidXml looks like a dead project; it hasn't been updated for 5 years.

@TPS

TPS commented Dec 27, 2015

Known issue filed @ leethomason/tinyxml2#242. Maybe chime in there? Author seems unwilling, or rather needs help/a patch, to fix it.

@JayXon
Owner Author

JayXon commented Dec 28, 2015

I'm considering switching to pugixml. Although it does not support xml:space="preserve", it has a parse_ws_pcdata_single flag which will preserve whitespace in this case. It's also much faster than TinyXML2 yet still lightweight.

@javiergutierrezchamorro

Being faster is a big plus for today's large XML files.

@TPS

TPS commented Jan 10, 2016

I figured bringing it up directly @ zeux/pugixml#74 would probably be helpful.… ☺

Update: Heh, @zeux seems to have accepted zeux/pugixml#74 as enhancement.… Stay tuned! ☺

Last update from zeux/pugixml#74: Oh, well, denied (for perhaps good reasons — read the long update in linked issue).

How do you all plan to proceed?

@JayXon
Owner Author

JayXon commented Jan 10, 2016

I can always stick to the original plan of using parse_ws_pcdata_single even if pugixml doesn't support xml:space="preserve". It will keep unnecessary whitespace in some cases, but that's better than corrupting the file.

@zeux also mentioned that I should be able to remove the unnecessary whitespace in Leanify by using "some advanced client code".
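
As a rough idea of what such client code could look like, here is a minimal sketch, written in Python with the standard library's `xml.etree.ElementTree` rather than pugixml, so none of it is pugixml's API or Leanify's actual implementation. The function names are hypothetical. It assumes that whitespace-only text is safe to drop wherever xml:space="preserve" is not in effect, and it treats xml:space as inherited by descendants until an element sets it to "default" again.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def strip_insignificant_ws(elem, preserve=False):
    """Drop whitespace-only text unless xml:space="preserve" is in effect."""
    # xml:space applies to the element and its descendants until overridden.
    mode = elem.get(XML_SPACE)
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False

    if not preserve and elem.text is not None and not elem.text.strip():
        elem.text = None

    for child in elem:
        strip_insignificant_ws(child, preserve)
        # A child's tail is part of *this* element's content,
        # so this element's mode decides whether it is kept.
        if not preserve and child.tail is not None and not child.tail.strip():
            child.tail = None


def minify(xml_text):
    """Parse, strip insignificant whitespace, and serialize again."""
    root = ET.fromstring(xml_text)
    strip_insignificant_ws(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = '<r>\n  <t xml:space="preserve"> </t>\n  <t> </t>\n</r>'
    print(minify(doc))
```

The indentation between elements and the plain whitespace-only element are dropped, while the element marked xml:space="preserve" keeps its single space.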

@TPS

TPS commented Jan 10, 2016

I figured that's how you'd want to go.… Looking forward to it! ☺

JayXon added a commit that referenced this issue Feb 27, 2016
@JayXon
Owner Author

JayXon commented Mar 15, 2016

The switch to pugixml is complete. Could you guys try out the latest nightly? Thanks.

@TPS

TPS commented Mar 15, 2016

I use it via FileOptimizer, so, how about it, @javiergutierrezchamorro?

@javiergutierrezchamorro

@TPS it is already updated in the repository as r440: https://sourceforge.net/p/nikkhokkho/code/440/

@TPS

TPS commented Mar 15, 2016

Thanks, @JayXon & @javiergutierrezchamorro.

@JayXon
Owner Author

JayXon commented Mar 20, 2016

Closing this. Feel free to reopen it or create a new issue if you find any problems.

@TPS

TPS commented Nov 21, 2023

Sorry to necropost, but has PugiXML (or the current XML parser/optimizer) done everything needed for this? Just FYI, TinyXML2 just got this implemented @ leethomason/tinyxml2#941.
