Fix bug with wrongly checking all files inside archive input #814

Merged
brucemiller merged 1 commit into brucemiller:master from dginev:archive-input-extra-ignores on Nov 14, 2016

@dginev
Collaborator

dginev commented Nov 14, 2016

This is an embarrassing oversight on my part: I had swapped the order of the grep/map pair, so the grep ran on the Perl objects instead of on the file names they hold.

That led to arXiv corpus archives with no TeX file being searched through by mistake; in fact, .cls and .sty files were often converted in the absence of a real source.
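To make the ordering problem concrete, here is a minimal sketch in Python rather than the actual Perl from the patch. Everything in it is an assumption for illustration: the `Member` class, its `__str__` (which mimics how a Perl object stringifies to an opaque label), and the `looks_like_tex` rule are hypothetical and not taken from LaTeXML.

```python
import re


class Member:
    """Hypothetical stand-in for an archive member object (not a LaTeXML class)."""

    def __init__(self, file_name):
        self.file_name = file_name

    def __str__(self):
        # Mimics Perl object stringification: an opaque label, no file name.
        return f"Member=HASH(0x{id(self):x})"


def looks_like_tex(text):
    # Assumed rule for illustration: a ".tex" extension, or no extension at all.
    return re.search(r"\.tex$", text, re.IGNORECASE) is not None or "." not in text


def candidates_buggy(members):
    # Wrong order: the filter sees the objects' string form, then names are extracted.
    return [m.file_name for m in members if looks_like_tex(str(m))]


def candidates_fixed(members):
    # Right order: extract the names first, then filter the names.
    names = [m.file_name for m in members]
    return [n for n in names if looks_like_tex(n)]


members = [
    Member(n)
    for n in ("llncs.cls", "mathpartir.sty", "welldef.bbl", "welldef.tex.cry")
]
print(candidates_buggy(members))  # every file slips through the filter
print(candidates_fixed(members))  # no candidates, so the archive can be rejected
```

Under these assumptions, the object label contains no dot, so the filter in the wrong order accepts every member; filtering on the names leaves nothing for this archive.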

An example directory tree I just tested with looks like:

-rw-r--r-- 1 dreamweaver dreamweaver  41139 Jun 29  2004 llncs.cls
-rw-r--r-- 1 dreamweaver dreamweaver  12121 Jun 29  2004 mathpartir.sty
-rw-r--r-- 1 dreamweaver dreamweaver   4209 Jun 29  2004 welldef.bbl
-rw-r--r-- 1 dreamweaver dreamweaver  87850 Jun 29  2004 welldef.tex.cry

As the authors used an encrypted .cry file, this archive contains no usable TeX source and should be marked invalid by latexml. That was the intention all along, and it is indeed what happens with this patch. With the current master, however, converting the archive holding these files identifies llncs.cls as the main TeX file and tries to convert it, failing with a Fatal.

I'd appreciate a quick merge, and would be happy to rerun the unexpected Fatals, which look to be largely due to this oversight. There may be other, harder-to-detect cases of this, but they'll come to light in further reruns.

@brucemiller brucemiller merged commit cb55411 into brucemiller:master Nov 14, 2016

@brucemiller

Owner

brucemiller commented Nov 14, 2016

makes sense.

@dginev dginev deleted the dginev:archive-input-extra-ignores branch Feb 25, 2018
