New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binmode not needed if we always output bytes #938

Merged
merged 2 commits into from Feb 3, 2018

Conversation

Projects
None yet
2 participants
@dginev
Collaborator

dginev commented Feb 3, 2018

Simple enough (now that we have a proper understanding), all result prints expect bytes now, i.e. the result to be printed has already been utf-8 encoded.

@dginev

This comment has been minimized.

Collaborator

dginev commented Feb 3, 2018

Fixes #926 for latexmlc

@dginev

This comment has been minimized.

Collaborator

dginev commented Feb 3, 2018

well, almost. A certain path through latexmlc still has wide characters as a serialization never gets utf-8 encoded. Which is why I was wondering if we shouldn't put the encode call already in serialize_aux in Document, since I think this is the only serializer that does not explicitly do an encoding to bytes.

The current master branch shifts that responsibility to the executables, notably latexml, which means I need to be extra careful in latexmlc to not skip a case. Luckily some of the tests for the daemon runs test the logs!

@dginev

This comment has been minimized.

Collaborator

dginev commented Feb 3, 2018

Added a commit that should handle things with the current approach.

@brucemiller

This comment has been minimized.

Owner

brucemiller commented Feb 3, 2018

The thing that got us in trouble was the inconsistency: one rare case being already encoded when most other stuff was characters. I'd prefer to keep things as consistent and localized as possible. So, I'm less inclined to change Core::Document's approach. But you're right that latexmlc has to deal with a variety of objects and we need a way to know (or use polymorphism) whether each object is encoded or not.

@dginev

This comment has been minimized.

Collaborator

dginev commented Feb 3, 2018

Wouldn't a convention that any toString method returns an encoded string be more consistent?

@dginev

This comment has been minimized.

Collaborator

dginev commented Feb 3, 2018

The other type of object that latexmlc can return are archives, and that is just binary data, so it follows the result is just bytes convention

@brucemiller brucemiller merged commit 47597b4 into brucemiller:master Feb 3, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@brucemiller

This comment has been minimized.

Owner

brucemiller commented Feb 3, 2018

Oh, I just merged, but probably would like to request a comment to the effect of method is always bytes (similar to what I added in latexml/math, perhaps?)

@dginev dginev referenced this pull request Feb 3, 2018

Merged

More encoding cleanup #940

@dginev dginev deleted the dginev:latexmlc-stdout-enconding branch Feb 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment