More encoding cleanup #940
I agree with everything except perhaps the conclusion. I'd argue we want either bytes or characters (virtually) all the time, and currently we're almost always characters. You seem to argue that toString ought to return bytes consistently. But recall that XML::LibXML has already provided us with one toString that returns bytes and many that return characters! Simplest is to isolate that one toString method (or not use it?).
Well I would be happy to go that way too, but that's not as simple either, as there is another method that returns bytes after serializing - So it's messy either way... I almost catch myself wishing there was a ...
Yeah, OK, so two methods --- actually, I was bundling that with the XML::LibXML::Document->toString method --- and in fact, they're nicely encapsulated in LaTeXML::Post::Writer. Perhaps that's a good model for it?
Writer is nice for the classic local uses, but for LaTeXML.pm it is not possible to use it -- for example in the web services / daemon server uses, where the serialized result is passed back to some middleware which passes it over the wire to the final recipient. Not seeing any easy wins... If there was a reliable generic ...
Follow-up to #938.
In fact, I think this PR demonstrates why leaving things in the current state will cause more trouble (at least for me) down the line, once I forget this discussion thread. Having identically named toString methods that return different types of data (unencoded Perl characters vs encoded Unicode bytes) is just awkward. Then again, I realize there are more toString methods out there, and they all use characters internally. So we might as well punt on this... I just caught myself again mistakenly double-encoding in LaTeXML.pm (see line 327 in diff); it's just too opaque to easily notice which method is getting called.
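For anyone skimming the thread, here is a minimal sketch of the double-encoding mistake described above. It is not code from LaTeXML or from this diff: the variables $chars and $bytes are hypothetical stand-ins for the results of a character-returning and a byte-returning toString, and the use of UTF-8 is an assumption. Only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for a toString that returns Perl characters
# (4 characters, the last one is U+00E9).
my $chars = "caf\x{e9}";

# Hypothetical stand-in for a toString that returns UTF-8 bytes (5 bytes).
my $bytes = encode('UTF-8', $chars);

# Correct: characters are encoded exactly once on the way out.
my $encoded_once = encode('UTF-8', $chars);

# The mistake: encoding a string that already holds bytes.
# Each byte above 0x7F is treated as a character and encoded again.
my $encoded_twice = encode('UTF-8', $bytes);

# Decoding once recovers the text only if it was encoded once.
my $roundtrip_ok  = decode('UTF-8', $encoded_once)  eq $chars;
my $roundtrip_bad = decode('UTF-8', $encoded_twice) eq $chars;

printf "once: %d bytes, twice: %d bytes\n",
    length($encoded_once), length($encoded_twice);
# prints "once: 5 bytes, twice: 7 bytes"
```

Nothing in the second encode call warns or fails, which is why the mix-up is easy to miss when two identically named methods return different kinds of strings.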