Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure raw binmode when writing archive files in latexmlc #960

Merged
merged 2 commits into from Mar 25, 2018

Conversation

@dginev
Copy link
Collaborator

dginev commented Mar 19, 2018

Fixes #827 and #806

This is a platform-specific fix, and in fact I need to double-check this keeps the epub generation as-is on Linux.

It turns out that without an explicit :raw binmode, printing out binary data on Windows 10 leads to corrupted archives, for reasons I do not fully understand at the moment.

While debugging, I first verified the archive integrity, all EPUB-needed files were added to the Archive::Zip object and then serialized via IO::String. However, the same payload lead to different .epub file contents when e.g. the compression level was changed, leading to the OPS subdirectory completely disappearing when the archive is viewed in 7zip.

In the end, it turned out the produced archives were partially readable by 7zip (my Win10 archive reader of choice), but were silently corrupted at some hard to predict point in the file-writing process.

Adding an explicit :raw binmode to the file handle in latexmlc resolved this fully, and created a fully functioning .epub file in Windows 10.

@dginev

This comment has been minimized.

Copy link
Collaborator Author

dginev commented Mar 19, 2018

I am also about to add a test case here, which will keep us away from regressions.

@brucemiller

This comment has been minimized.

Copy link
Owner

brucemiller commented Mar 19, 2018

@dginev

This comment has been minimized.

Copy link
Collaborator Author

dginev commented Mar 19, 2018

@brucemiller indeed, that seems to be the case.

I have now added an end-to-end test for epub generation via latexmlc, which runs the executable via the shell, and then checks the resulting file has the expected data integrity and epub contents.

I stumbled on a couple of validator issues while debugging via:

latexmlc literal:test --dest test.epub --log test.log

so added some minor upgrades to keep the epub valid.

@dginev

This comment has been minimized.

Copy link
Collaborator Author

dginev commented Mar 19, 2018

With the upgrades in this PR, the basic epub test is almost valid, with the exception of some extremely picky CSS rules. But we can merge as-is, as it resolves the Windows-specific problem.

css_epub_validation

@dginev

This comment has been minimized.

Copy link
Collaborator Author

dginev commented Mar 19, 2018

Courtesy of Travis, I can now also claim the PR keeps the epub generation operational in linux 👍

@brucemiller

This comment has been minimized.

Copy link
Owner

brucemiller commented Mar 25, 2018

Cool! Thanks!!

@brucemiller brucemiller merged commit bced839 into master Mar 25, 2018
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@brucemiller

This comment has been minimized.

Copy link
Owner

brucemiller commented Mar 25, 2018

Didn't notice exactly where it comes from, but when I make test, now, I get an extra "No obvious problems\n Wrote 931_testdPmT.epub" in the output.... Doesn't cause the tests to fail, but distracts from the quick visual confirmation.

@dginev

This comment has been minimized.

Copy link
Collaborator Author

dginev commented Mar 26, 2018

Yes, it's from this specific PR, it's an extra print from running latexmlc externally. I'll try to contain it and submit another PR.

@dginev dginev deleted the raw-binmode-for-epub branch Apr 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.