Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binmode not needed if we always output bytes #938

Merged
merged 2 commits into from Feb 3, 2018

Conversation

@dginev
Copy link
Collaborator

@dginev dginev commented Feb 3, 2018

Simple enough (now that we have a proper understanding), all result prints expect bytes now, i.e. the result to be printed has already been utf-8 encoded.

@dginev
Copy link
Collaborator Author

@dginev dginev commented Feb 3, 2018

Fixes #926 for latexmlc

@dginev
Copy link
Collaborator Author

@dginev dginev commented Feb 3, 2018

well, almost. A certain path through latexmlc still has wide characters as a serialization never gets utf-8 encoded. Which is why I was wondering if we shouldn't put the encode call already in serialize_aux in Document, since I think this is the only serializer that does not explicitly do an encoding to bytes.

The current master branch shifts that responsibility to the executables, notably latexml, which means I need to be extra careful in latexmlc to not skip a case. Luckily some of the tests for the daemon runs test the logs!

@dginev
Copy link
Collaborator Author

@dginev dginev commented Feb 3, 2018

Added a commit that should handle things with the current approach.

@brucemiller
Copy link
Owner

@brucemiller brucemiller commented Feb 3, 2018

The thing that got us in trouble was the inconsistency: one rare case being already encoded when most other stuff was characters. I'd prefer to keep things as consistent and localized as possible. So, I'm less inclined to change Core::Document's approach. But you're right that latexmlc has to deal with a variety of objects and we need a way to know (or use polymorphism) whether each object is encoded or not.

@dginev
Copy link
Collaborator Author

@dginev dginev commented Feb 3, 2018

Wouldn't a convention that any toString method returns an encoded string be more consistent?

@dginev
Copy link
Collaborator Author

@dginev dginev commented Feb 3, 2018

The other type of object that latexmlc can return are archives, and that is just binary data, so it follows the result is just bytes convention

@brucemiller brucemiller merged commit 47597b4 into brucemiller:master Feb 3, 2018
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@brucemiller
Copy link
Owner

@brucemiller brucemiller commented Feb 3, 2018

Oh, I just merged, but probably would like to request a comment to the effect of method is always bytes (similar to what I added in latexml/math, perhaps?)

@dginev dginev mentioned this pull request Feb 3, 2018
@dginev dginev deleted the dginev:latexmlc-stdout-enconding branch Feb 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants