Ensure raw binmode when writing archive files in latexmlc #960
I am also about to add a test case here to guard against regressions.
On 03/19/2018 12:16 PM, Deyan Ginev wrote:
It turns out that without an explicit |:raw| binmode, printing out
binary data on Windows 10 leads to corrupted archives, for reasons I do
not fully understand at the moment.
For unixy systems, files are always just bytes;
the binmode only determines a layer which may
encode, say Unicode, into bytes.
But as I understand it, Windows actually
distinguishes binary from text files, so
the binmode for a zip apparently matters.
@brucemiller indeed, that seems to be the case. I have now added an end-to-end test for epub generation via latexmlc, which runs the executable via the shell and then checks that the resulting file has the expected data integrity and epub contents. I stumbled on a couple of validator issues while debugging via:
so I added some minor upgrades to keep the epub valid.
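As a rough illustration of the kind of integrity and content check described above, here is a minimal Python sketch. It is not the Perl test added in this PR. The helper names `build_epub` and `epub_problems` are hypothetical, and the expectation of an `OPS/` directory is an assumption taken from the layout mentioned in the PR description.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def build_epub(mimetype_first=True):
    """Build a tiny in-memory archive shaped like an EPUB (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        if mimetype_first:
            # EPUB expects the mimetype entry first and stored uncompressed.
            zf.writestr("mimetype", EPUB_MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OPS/content.opf", "<package/>")  # assumed layout
    return buf.getvalue()


def epub_problems(data):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            bad_member = zf.testzip()  # CRC check of every member
            if bad_member is not None:
                problems.append(f"CRC mismatch in {bad_member}")
            infos = zf.infolist()
            if not infos or infos[0].filename != "mimetype":
                problems.append("first entry is not 'mimetype'")
            else:
                if infos[0].compress_type != zipfile.ZIP_STORED:
                    problems.append("'mimetype' is compressed")
                if zf.read("mimetype") != EPUB_MIMETYPE:
                    problems.append("unexpected mimetype content")
            names = zf.namelist()
            if "META-INF/container.xml" not in names:
                problems.append("missing META-INF/container.xml")
            if not any(name.startswith("OPS/") for name in names):
                problems.append("missing OPS/ content")
    except zipfile.BadZipFile as err:
        problems.append(f"not a readable zip archive: {err}")
    return problems


print(epub_problems(build_epub()))
print(epub_problems(build_epub(mimetype_first=False)))
```

A check like this only looks at the container structure; it does not replace a full EPUB validator.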
Courtesy of Travis, I can now also claim the PR keeps the epub generation operational on Linux.
Cool! Thanks!!
I didn't notice exactly where it comes from, but when I run `make test` now, I get an extra "No obvious problems\n Wrote 931_testdPmT.epub" in the output. It doesn't cause the tests to fail, but it distracts from the quick visual confirmation.
Yes, it's from this specific PR; it's an extra print from running latexmlc externally. I'll try to contain it and submit another PR.
Fixes #827 and #806
This is a platform-specific fix, and in fact I need to double-check this keeps the epub generation as-is on Linux.
It turns out that without an explicit `:raw` binmode, printing out binary data on Windows 10 leads to corrupted archives, for reasons I do not fully understand at the moment.
While debugging, I first verified the archive integrity: all EPUB-needed files were added to the Archive::Zip object and then serialized via IO::String. However, the same payload led to different `.epub` file contents when e.g. the compression level was changed, leading to the OPS subdirectory completely disappearing when the archive is viewed in 7zip.
In the end, it turned out the produced archives were partially readable by 7zip (my Win10 archive reader of choice), but were silently corrupted at some hard-to-predict point in the file-writing process.
Adding an explicit `:raw` binmode to the file handle in latexmlc resolved this fully, and created a fully functioning `.epub` file in Windows 10.
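One plausible explanation, in line with the text-versus-binary distinction raised in the conversation, is newline translation: a Perl file handle on Windows defaults to a `:crlf` layer, which rewrites each LF byte as CR LF on output. The Python sketch below simulates that translation on an in-memory zip, so it behaves the same on any platform. It is an illustration under that assumption, not a confirmed diagnosis and not the latexmlc code; the helper names are hypothetical.

```python
import io
import zipfile


def make_archive():
    """Build a small zip in memory whose stored payload contains LF bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("OPS/content.xhtml", "line one\nline two\n")
    return buf.getvalue()


def crlf_translate(data):
    """Simulate a text-mode write on Windows: every LF becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


def is_intact(data):
    """True if the bytes open as a zip and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False


raw = make_archive()
damaged = crlf_translate(raw)

print("raw archive intact:       ", is_intact(raw))
print("translated archive intact:", is_intact(damaged))
print("extra bytes inserted:     ", len(damaged) - len(raw))
```

Where the inserted bytes land depends on where 0x0A happens to occur in the compressed stream, which would be consistent with the corruption shifting when the compression level changes. Writing through a handle in raw mode skips the translation entirely.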