New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two more wrongly encoded cases in LaTeXML.pm #949

Merged
merged 1 commit into from Feb 25, 2018

Conversation

Projects
None yet
2 participants
@dginev
Collaborator

dginev commented Feb 7, 2018

While investigating #948 I spotted a broken bold 0 encoding when printing to stdout from latexmlc.

Turns out there were two cases that needed new patches

  • One was the fatal error case for core processing, which was missing the encode call
  • The other was the XML format case in post-processing, which was leveraging the duck-typing of all document objects (i.e. all of them having toString) to return differently encoded values -- since exactly one of our toString calls isn't returning bytes. So I added an explicit check there.

Keeps feeling awkward... I still don't care enough to raise a larger stink about refactoring :> If I hit this once again I might.

@dginev

This comment has been minimized.

Collaborator

dginev commented Feb 7, 2018

One way of seeing this may be that every use of the serialize_aux-based toString requires/expects encoded unicode. Which may be an indication that the encode call could be pushed more tightly into the actual method?

@brucemiller

This comment has been minimized.

Owner

brucemiller commented Feb 7, 2018

Actually that's not true. I should have caught that in your comments, but serialize_aux is internal to implementing LaTeXML::Core::Document->toString so (virtually) "nobody" uses it directly, but everybody uses toString and they expect characters. (should correct the comments). As I said elsewhere, only XML::LibXML::Document->toString (and toStringHMTL) return bytes and those really should be locked away to prevent confusion

@dginev

This comment has been minimized.

Collaborator

dginev commented Feb 7, 2018

Ah, but I phrased my claim extremely carefully this time.

All toString calls for the final output of the created Document by latexml, be it Core, Post, or some fragment in the --whatsout=fragment route, are currently expected to return bytes. All classic executables, as well as LaTeXML.pm, do a print assuming the payload is already in bytes.

This is the entire realm I was quantifying over when I said every, I was being quite careful.

The internal toString methods are not something i have studied closely recently, but I do recall the main workhorse internally tend to be the ToString and Stringify global functions instead.

Anyway, I think we are saying the same thing from 2 perspectives. Maybe renaming all Document methods away from toString into a serialize or serializeFinal call, that is always bytes, and mandating all toString calls return chars, would be the right refactor? And we refactor/rename away any uses of the toString calls in the executables and Writer, to be sure.

@brucemiller

This comment has been minimized.

Owner

brucemiller commented Feb 25, 2018

Oh! I forgot to merge... is this one still needed?

@dginev

This comment has been minimized.

Collaborator

dginev commented Feb 25, 2018

It is still needed, yes.

I thought you left it open so that I rework it so that there is a specially named method that always encodes to Unicode, but if you have decided that is a long-term goal, merging here is still a good choice. Fixes a problem.

@brucemiller

This comment has been minimized.

Owner

brucemiller commented Feb 25, 2018

Ah, not really; a nice safe general approach will be good, but I'm not waiting.

@brucemiller brucemiller merged commit 1b8e67b into brucemiller:master Feb 25, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@dginev dginev deleted the dginev:stdout-pmml-latexmlc branch Feb 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment