Bib-file parsing issues #917

Closed
asmaier opened this issue Jan 5, 2018 · 2 comments


asmaier commented Jan 5, 2018

I encountered the following errors when running latexmlpost on one of my *.bib files. The original bibtex had no problems with any of them, so I cannot say whether the errors come from bibtex being too sloppy or latexmlpost being too strict. However, they can cause follow-up errors in the parsing, and latexmlpost seems to have a hardcoded limit of 100 errors above which it stops processing the bibliography, so they can prevent one from successfully converting a *.bib file. So I document them here:

  1. A %-sign outside the url field causes problems, e.g.:
@ARTICLE{Bryan1997,
  author = {Bryan, Greg L. and Norman, Michael L.},
  title = {{A Hybrid AMR Application for Cosmology and Astrophysics}},
  year = {1997},
  eprint = {astro-ph/9710187},
  pdf = {Bryan1997.pdf},
  slaccitation = {%%CITATION = ASTRO-PH 9710187;%%},
}
@ARTICLE{Ensslin2006,
  author = {Enßlin, T.~A. and Vogt, C.},
  title = {{Magnetic turbulence in cool cores of galaxy clusters}},
  journal = {A\&A},
  year = {2006},
  volume = {453},
  pages = {447-458},
  month = jul,
  adsnote = {Provided by the SAO/NASA Astrophysics Data System},
  adsurl = {http://adsabs.harvard.edu/abs/2006A%26A...453..447E},
  doi = {10.1051/0004-6361:20053518},
  eprint = {arXiv:astro-ph/0505517},
  keywords = {galaxies: cluster: general, cooling flows, magnetic
	 fields, turbulence, X-rays: galaxies: clusters, intergalactic medium},
}

Removing the slaccitation field and renaming the adsurl field to url fixed the errors.
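A quick pre-flight scan can locate such fields before running latexmlpost. The following is a minimal sketch in Python and not part of LaTeXML: `find_percent_fields` is a hypothetical helper, it assumes every field sits on a single line in the form `name = {value},`, and the example.org URL in the sample is a placeholder.

```python
import re

# Matches simple one-line fields such as `name = {value},`
# (assumption: one field per line, value wrapped in braces).
FIELD_RE = re.compile(r'^\s*(\w+)\s*=\s*\{(.*)\}\s*,?\s*$')
# A percent sign that is not preceded by a backslash.
UNESCAPED_PERCENT_RE = re.compile(r'(?<!\\)%')


def find_percent_fields(bib_text, verbatim_fields=("url",)):
    """Return (line_number, field_name) for fields with an unescaped %.

    Fields listed in verbatim_fields are skipped, since a % is
    harmless in the url field.
    """
    hits = []
    for lineno, line in enumerate(bib_text.splitlines(), start=1):
        match = FIELD_RE.match(line)
        if not match:
            continue
        name, value = match.group(1).lower(), match.group(2)
        if name not in verbatim_fields and UNESCAPED_PERCENT_RE.search(value):
            hits.append((lineno, name))
    return hits


sample = """@ARTICLE{Bryan1997,
  year = {1997},
  slaccitation = {%%CITATION = ASTRO-PH 9710187;%%},
  adsurl = {http://adsabs.harvard.edu/abs/2006A%26A...453..447E},
  url = {http://example.org/a%26b},
}"""

for lineno, name in find_percent_fields(sample):
    print(f"line {lineno}: field '{name}' contains an unescaped %")
```

The scan reports the slaccitation and adsurl lines and leaves the url line alone. Multi-line field values are not handled by this line-based approach.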

  2. Be careful with the month field:
@ARTICLE{Kiessling2003,
  author = {Kiessling, M.K.-H.},
  title = {{The ''Jeans swindle'' - A true story-mathematically speaking}},
  journal = {Advances in Applied Mathematics},
  year = {2003},
  volume = {31},
  pages = {132-149(18)},
  month = july,
  doi = {doi:10.1016/S0196-8858(02)00556-0 },
  pdf = {Kiessling2003.pdf},
  url = {http://www.ingentaconnect.com/content/els/01968858/2003/00000031/00000001/art00556},
}
@ARTICLE{Veynante2002,
  author = {Veynante, D. and Vervisch, L.},
  title = {{Turbulent combustion modeling}},
  journal = {Progress in Energy and Combustion Science},
  year = {2002},
  volume = {28},
  pages = {193-266(74)},
  month = March,
  doi = {doi:10.1016/S0360-1285(01)00017-X},
  pdf = {Veynante2002.pdf},
  url = {http://www.ingentaconnect.com/content/els/03601285/2002/00000028/00000003/art00017},
}

You must either use the correct macro for the month field, e.g. month = jul (and not month = july), or put the month name in curly brackets, e.g. month = {March} (and not month = March); see also https://tex.stackexchange.com/questions/70455/bibtex-month-format.
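The month rule above can be checked mechanically. This is a minimal sketch in Python under stated assumptions: `find_bad_months` is a hypothetical helper, the month field is assumed to sit on its own line, and only the twelve three-letter macros predefined by the standard BibTeX styles are accepted, so a macro defined with @string in your own file would be flagged as well.

```python
import re

# The twelve month macros predefined by the standard BibTeX styles.
MONTH_MACROS = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
}

# A bare (unbraced, unquoted) month value, e.g. `month = july,`.
BARE_MONTH_RE = re.compile(r'^\s*month\s*=\s*([A-Za-z]+)\s*,?\s*$',
                           re.IGNORECASE)


def find_bad_months(bib_text):
    """Return (line_number, value) for bare month values that are not
    one of the predefined macros."""
    hits = []
    for lineno, line in enumerate(bib_text.splitlines(), start=1):
        match = BARE_MONTH_RE.match(line)
        # Macro names are case-insensitive in BibTeX, hence lower().
        if match and match.group(1).lower() not in MONTH_MACROS:
            hits.append((lineno, match.group(1)))
    return hits


sample = """  month = july,
  month = March,
  month = jul,
  month = {March},"""

for lineno, value in find_bad_months(sample):
    print(f"line {lineno}: 'month = {value}' is not a predefined macro")
```

Braced values such as month = {March} are not matched by the pattern, so they pass through unreported.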

  3. Math symbols and operators must be placed between $..$; for example, the following causes problems with latexmlpost:
@INPROCEEDINGS{Norman1999,
  author = {Norman, M.~L. and Bryan G.~L.},
  title = {{Cosmological Adaptive Mesh Refinement^{CD}}},
  booktitle = {ASSL Vol. 240: Numerical Astrophysics},
  year = {1999},
  pages = {19-+},
  adsnote = {Provided by the NASA Astrophysics Data System},
  pdf = {Norman1999.pdf},
  url = {http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1999numa.conf...19N&db_ key=AST},
}

To fix this, write Refinement$^{CD}$ in the title field.
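Superscript and subscript characters outside math mode can be found with a small scanner. The sketch below is in Python and makes some assumptions: `script_chars_outside_math` is a hypothetical helper, math mode is recognized only through $ delimiters (not \( ... \)), and the function should be applied only to fields that are processed as TeX, not to url.

```python
def script_chars_outside_math(value):
    """Return the positions of unescaped ^ or _ characters that are
    not enclosed in $...$."""
    positions = []
    in_math = False
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes,
            # e.g. \_ or \$.
            i += 2
            continue
        if ch == "$":
            in_math = not in_math
        elif ch in "^_" and not in_math:
            positions.append(i)
        i += 1
    return positions


broken = "{Cosmological Adaptive Mesh Refinement^{CD}}"
fixed = "{Cosmological Adaptive Mesh Refinement$^{CD}$}"

print(bool(script_chars_outside_math(broken)))
print(bool(script_chars_outside_math(fixed)))
```

A non-empty result means the field needs $..$ around the reported characters.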

  4. The ampersand symbol & can cause parsing errors:
@ARTICLE{Shyy1997,
  author = {Shyy, W. and Krishnamurty, V.S.},
  title = {{Compressibility effects in modeling complex turbulent flows}},
  journal = {Progress in Aerospace Sciences},
  year = {1997},
  volume = {33},
  pages = {587-645(59)},
  abstract = {... In the present review, the
	 compressibility effect is investigated in the context of engineering models
	 needed for complex flow computations, particularly the k-&unknown;
	 model. ...},
  doi = {doi:10.1016/S0376-0421(97)00005-5},
  pdf = {Shyy1997.pdf},
  url = {http://www.ingentaconnect.com/content/els/03760421/1997/00000033/00000009/art00005},
}

Replacing k-&unknown; with the correct k-$\epsilon$ in the abstract field was necessary to get rid of the parsing error.
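Stray ampersands like this one can be located with a single regular expression. This is a naive sketch in Python: `find_unescaped_ampersands` is a hypothetical helper, it treats any & directly preceded by a backslash as escaped, and it should not be applied to url fields, where & is legitimate.

```python
import re

# An ampersand that is not directly preceded by a backslash.
UNESCAPED_AMP_RE = re.compile(r'(?<!\\)&')


def find_unescaped_ampersands(value):
    """Return the positions of every unescaped & in a field value."""
    return [match.start() for match in UNESCAPED_AMP_RE.finditer(value)]


# The abstract fragment from the entry above, and a correctly
# escaped journal name for comparison.
print(find_unescaped_ampersands("particularly the k-&unknown; model"))
print(find_unescaped_ampersands(r"A\&A"))
```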

It would be nice if these issues could be fixed in latexmlpost, or if latexmlpost at least gave a reasonable error message; see also #916.

@dginev dginev added this to the LaTeXML-0.8.4 milestone Jan 5, 2018
@brucemiller
Owner

Yeah, this is tricky. Firstly, bibtex is indeed forgiving in its design: it never actually processes any TeX/LaTeX. It only rearranges it, according to a bibliography style file (bst), in most cases dropping the data it's not interested in. LaTeXML attempts to process, convert and preserve all the data with the goal of producing an xml representation of the data which can (hopefully) be useful on its own.

Of course, this gets screwed up by not knowing enough about the type of data in each field: such as your adsurl being a form of url; and slaccitation being who-knows-what. LaTeXML's bibtex engine knows the types of the standard fields; perhaps it should process unknown ones as if they were verbatim? But that might be unexpected to some. We've done experiments reading the *.bst files, but even then the types are at best implicit in the style file. Hmm...

And of course, this is all made worse by the fact that LaTeXML rewrites the bib into a more TeX-like form before processing, and loses track of where the original source was, so the error messages become even more incomprehensible.

Undoubtedly this can all be improved, but needs some thought... and maybe some 'votes'...

@brucemiller
Owner

To solve the 1st item, LaTeXML should probably treat any unknown fields as completely verbatim; I've made that patch. That avoids errors in this more common scenario, but will lead to less-than-optimal output if the user expected the field to be processed as markup. In the latter case, they can still declare the fields to fix it up.

I'm not sure what your point in items 2,3,4 is; LaTeXML does give errors for these cases, as latex would (after bibtex processing). I think this is just a dup of #916, that those errors are hidden if the bibliography is processed during postprocessing.
