Close down biblatex-csl-converter #7

johanneswilm · 2019-11-12T20:24:11Z

Hey, I just discovered this chart. I have been participating in the maintenance of biblatex-csl-converter over the past few years. Based on your chart it looks like Idea (reworked) gives the same output quality as biblatex-csl-converter. Does that mean that it can be used as a drop in replacement and that it covers all the same features? If that is the case, is there any reason why I would continue to maintain biblatex-csl-converter?


larsgw · 2019-11-12T20:47:25Z

Does that mean that it can be used as a drop in replacement and that it covers all the same features?

Probably not; the Syntax column is a big simplification. The whole chart is meant as a way to compare different parsers to replace the current one, so they are only compared on features the current one had or that I wanted for the new one. A number of differences in features between idea-reworked and biblatex-csl-converter:

different API, functions instead of classes
no async function
worse error and warning handling: no API for warnings, no error recovery
no field checking (although I'm planning to)
no EDTF, names, etc. support yet but that is definitely planned as well (however, as part of bibtex-mapping)

So it definitely isn't a drop in replacement, as the API is quite different, and depending on your needs it may not be possible at all to switch.

johanneswilm · 2019-11-12T21:04:42Z

Ok, I understand. So "complete" doesn't mean "feature complete" but rather "completely covers what the other parser did"? Maybe that could be added somewhere, as otherwise it looks a bit misleading, and users who may be better off using one of the other parsers are led to believe that they shouldn't. I'd prefer not to have to set up a different chart making counterclaims, etc. Speed isn't much of a concern for Fidus Writer's use case of biblatex-csl-converter, as it's totally fine to wait 250 ms for a single citation to be converted and even up to several minutes if a user uploads their entire mega collection, as processing will happen entirely on that user's machine.

Accuracy is more important, and so is keeping maintenance costs down. So if there is another parser that can do exactly the same but is maintained by someone else, I'd like to shut down biblatex-csl-converter. And if there isn't one, then I'd like everyone else out there who needs the same functionality to contribute to biblatex-csl-converter so that we don't have to do all the maintenance by ourselves. That's why it would be nice to make sure people aren't misled by that chart somehow.

And yes, please once you think your parser or one of the other ones covers all the features, let me know and I can see whether it still makes sense to put an AST converter on top and drop biblatex-csl-converter altogether.

larsgw · 2019-11-12T21:19:17Z

"complete" means nothing more and nothing less than that it parses syntax.bib accurately, which encompasses all the syntax I had in mind for the new parser (apart from syntax within values).

Maybe that could be added somewhere, as otherwise it looks a bit misleading, and users who may be better off using one of the other parsers are led to believe that they shouldn't. I'd prefer not to have to set up a different chart making counterclaims, etc.

That's fair, I just didn't really intend this repository for other users to make choices with. What's missing from the description is "the new BibTeX parser formula for Citation.js". And the comparisons were either because I wanted to see if my new parser was up to the task, or because someone asked me to add it to the comparison. But I definitely see where you're coming from, and you're not the only one, so I'll change it up and also add more detailed comparisons.

I can see whether it still makes sense to put an AST converter on top

I'm not really sure what you mean by this. How is an AST converter "on top", and if you'd be dropping biblatex-csl-converter, what would it be on top of?

johanneswilm · 2019-11-12T21:25:55Z

But I definitely see where you're coming from, and you're not the only one, so I'll change it up and also add more detailed comparisons.

Thank you very much for that. And yes, just a little bit of wording so that others understand what the purpose of the chart is and that it's not a full feature comparison of everything is all that I'm asking for. The comparison is still quite interesting.

I'm not really sure what you mean by this.

Sorry, let me reword. Currently biblatex-csl-converter outputs exactly the javascript object format we use internally in Fidus Writer. So if we switch to something else, then we'll probably need that parser + a converter from the output of that parser to the format we use internally in Fidus Writer. So there would be a bit of development cost creating this converter. That's all I was trying to say.

retorquere · 2019-11-13T00:09:31Z

I don't mean to pile on just to be antagonistic, but idea-reworked parses syntax.bib (which is invalid BTW -- biblatex chokes on it) into

[
  {
    type: 'book',
    label: 'sweig42',
    properties: {
      author: "Stefan Swe{\\i}g and Xavier D\\'ecoret",
      title: ' The {impossible} ℡—book ',
      publisher: ' D\\"ead Poₑeet Society',
      year: 1942,
      month: '03'
    }
  }
]

I don't know if I'm calling it wrong:

const parser = require('./lib/idea-reworked')
const fs = require('fs')
console.log(parser.parse(fs.readFileSync('test/files/syntax.bib', 'utf-8')))

but it doesn't seem to do diacritics replacement, anything with braces, and for the subscript interpretation it just picks up the first character. Also, biblatex ignores leading and trailing spaces so title and publisher should have been trimmed. And TEL is superscript?

retorquere · 2019-11-13T00:19:30Z

Wait, I got that wrong -- syntax.bib has double backslashes in the text, so it's not supposed to do diacritics conversions as there are none. Anyhow, that still leaves braces, subscript and superscript, and trimming.
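The trimming point could be handled as a post-processing step. A minimal sketch, assuming entries have the shape shown in the output above (`type`, `label`, `properties`); `trimProperties` is a hypothetical helper, not part of idea-reworked:

```javascript
// Hypothetical helper: returns a copy of a parsed entry whose string
// field values have leading and trailing whitespace removed.
function trimProperties(entry) {
  const properties = {};
  for (const [field, value] of Object.entries(entry.properties)) {
    // Non-string values (such as a numeric year) are passed through as-is.
    properties[field] = typeof value === 'string' ? value.trim() : value;
  }
  return { ...entry, properties };
}

const trimmed = trimProperties({
  type: 'book',
  label: 'sweig42',
  properties: { title: ' The {impossible} book ', year: 1942 }
});
console.log(trimmed.properties.title);
```

Whether trimming belongs in the parser itself or in a later mapping step is a design choice this sketch does not settle.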

larsgw · 2019-11-13T08:56:07Z

which is invalid BTW -- biblatex chokes on it

natbib should not, at least the last time I checked.

The double backslashes are by mistake, I'll fix them.
I thought I did trimming, but I'll fix that as well.
For superscript and subscript, I implemented it like that specifically but I don't know why. I'm converting them to Unicode characters which has limited support, but I think CSL supports <sub> and <sup> markup.
TEL gets converted to the corresponding Unicode character in Zotero, which is where I got a lot of stuff from in the first version, and I kept it that way.

retorquere · 2019-11-13T09:06:40Z

which is invalid BTW -- biblatex chokes on it

natbib should not, at least the last time I checked.

Fair enough, it does.

* For superscript and subscript, I implemented it like that specifically but I don't know why. I'm converting them to Unicode characters which has limited support,

But that doesn't apply here -- a unicode subscript e does (clearly) exist, the parser just doesn't convert the other two es.

but I think CSL supports <sub> and <sup> markup.

It does. My parser converts to unicode sub/superscript where possible and uses <sub> and <sup> where that's not possible.

* `TEL` gets converted to the corresponding Unicode character in Zotero, which is where I got a lot of stuff from in the first version, and I kept it that way.

I don't really follow -- in syntax.bib I see TEL as \u54\u45\u4C, after conversion it shows up as \u2121. The TEL in the input isn't a single character, it's a word, and title casing by a CSL style is going to affect it differently.

larsgw · 2019-11-13T14:52:03Z

• I found just transforming the first character (if it's supported) more consistent than to create a string with part sub/superscript and part normal text
• Regarding TEL: that's the point (well, not the title casing) https://github.com/zotero/translators/blob/bae2057067e2fde076252a3b897a7e689a173c71/BibTeX.js#L1707

retorquere · 2019-11-13T15:26:36Z

• I found just transforming the first character (if it's supported) more consistent than to create a string with part sub/superscript and part normal text

$_{eee}$ should become either ₑₑₑ or <sub>eee</sub>, not ₑee. The braces mean that the entire string is subscript.
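The all-or-nothing behaviour described here can be sketched roughly as follows. This is only an illustration: `SUBSCRIPTS` is a deliberately partial table and `convertSubscript` is a hypothetical name, not code from any of the parsers discussed.

```javascript
// Partial table of Unicode subscript characters (assumption: incomplete on
// purpose; a real table would cover more characters).
const SUBSCRIPTS = {
  '0': '\u2080', '1': '\u2081', '2': '\u2082', '3': '\u2083', '4': '\u2084',
  '5': '\u2085', '6': '\u2086', '7': '\u2087', '8': '\u2088', '9': '\u2089',
  a: '\u2090', e: '\u2091', o: '\u2092', x: '\u2093'
};

// Converts the whole subscript group to Unicode if every character has a
// subscript form; otherwise falls back to <sub> markup for the whole group.
function convertSubscript(text) {
  const chars = Array.from(text);
  const allMappable = chars.every(char =>
    Object.prototype.hasOwnProperty.call(SUBSCRIPTS, char)
  );
  if (allMappable) {
    return chars.map(char => SUBSCRIPTS[char]).join('');
  }
  return `<sub>${text}</sub>`;
}

console.log(convertSubscript('eee')); // all three characters are in the table
console.log(convertSubscript('eek')); // 'k' is not in this partial table
```

Either way, the group is never split into part subscript and part normal text.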

• Regarding TEL: that's the point (well, not the title casing) https://github.com/zotero/translators/blob/bae2057067e2fde076252a3b897a7e689a173c71/BibTeX.js#L1707

That table is a lossy mapping from unicode to ASCII TeX, you can't always revert this table for TeX to unicode mapping -- TEL being one such instance that should not be reversed. If the unicode char maps to a string that does not contain TeX-reserved characters, you generally do not want to use it as a reverse mapping.
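A rough sketch of that filtering rule, under stated assumptions: the input is a plain object from Unicode character to TeX string, a single pair of braces wrapping the whole string is treated as grouping only, and the set of reserved characters in `TEX_RESERVED` is my own choice. `buildReverseTable` is a hypothetical name.

```javascript
// Characters treated as TeX-reserved for this sketch (assumption).
const TEX_RESERVED = /[\\$^_~#%&{}]/;

// Builds a TeX-to-Unicode table from a Unicode-to-TeX table, keeping only
// entries whose TeX form contains a reserved character.
function buildReverseTable(unicodeToTex) {
  const texToUnicode = {};
  for (const [char, tex] of Object.entries(unicodeToTex)) {
    // Ignore one pair of braces wrapping the whole string: {TEL} -> TEL.
    const inner = tex.replace(/^\{([^{}]*)\}$/, '$1');
    if (TEX_RESERVED.test(inner)) {
      texToUnicode[tex] = char;
    }
  }
  return texToUnicode;
}

// Two sample entries: the telephone sign (plain text in braces) and
// e with acute accent (a real TeX accent command).
const reverse = buildReverseTable({ '\u2121': '{TEL}', '\u00e9': "\\'{e}" });
console.log(reverse);
```

With this filter, `{TEL}` stays a case-protected word, while the accent command is still converted back to a single character.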

retorquere · 2019-11-13T15:41:36Z

That table is a lossy mapping from unicode to ASCII TeX, you can't always revert this table for TeX to unicode mapping

Case in point: the reverse table is held separately here, and I would argue that the reverse mapping of {TEL} is a poor choice -- {TEL} means "the phrase TEL, not to be messed with in sentence casing". It does not mean "Telephone Sign" (which is the name of \u2121 in the unicode table).

johanneswilm · 2019-11-13T15:51:55Z

Interesting conversation you guys are having here.

but I think CSL supports <sub> and <sup> markup.

Does that mean this parser does not support the other HTML tags either? biblatex-csl-converter currently supports these in CSL export:

const TAGS = {
    'strong': {open:'<b>', close: '</b>'},
    'em': {open:'<i>', close: '</i>'},
    'sub': {open:'<sub>', close: '</sub>'},
    'sup': {open:'<sup>', close: '</sup>'},
    'smallcaps': {open:'<span style="font-variant:small-caps;">', close: '</span>'},
    'nocase': {open:'<span class="nocase">', close: '</span>'},
    'enquote': {open:'“', close: '”'},
    'url': {open:'', close: ''},
    'undefined': {open:'[', close: ']'}
 }

retorquere · 2019-11-13T15:57:21Z

citeproc supports these; enquote and the later entries in your table aren't markup, so CSL won't mind. I can't find what CSL formally supports, but everything that uses citeproc in its various incarnations will support the markup listed under that link.

johanneswilm · 2019-11-13T16:00:54Z

enquote and later in your table isn't markup so CSL

Right, because as far as I know, citeproc-js doesn't have any corresponding tag for these. All the other ones are in that list you are linking to.

retorquere · 2019-11-13T16:01:23Z

Correct.

johanneswilm · 2019-11-13T16:04:27Z

@retorquere Ah, now I understand your reply. My first comment on this here was not formulated very well. I updated it now. I wasn't asking whether citeproc supports it (I know it does), I was wondering about this parser.

larsgw · 2019-11-14T17:09:12Z

Does that mean this parser does not support the other html tags either?

It does, but not all the commands it seems (code):

const richTextMappings = {
  textit: 'i',
  textbf: 'b',
  textsc: 'sc',
  textsuperscript: 'sup',
  textsubscript: 'sub'
}

retorquere · 2019-11-14T19:07:30Z

That misses at least mkbibbold, bf and bfseries for bold; sl, em, it, itshape, mkbibitalic, mkbibemph and emph for italics; and sc and scshape for smallcaps. Also, citeproc doesn't support <sc>, just <span style="font-variant:small-caps;">.
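For illustration, an extended table covering the commands named in this thread might look like the sketch below. The names `COMMAND_STYLES`, `MARKUP` and `wrapCommand` are hypothetical, and the markup strings are taken from the TAGS table posted earlier in the thread. The sketch does not distinguish commands that take an argument (such as \textbf) from declarations (such as \bf).

```javascript
// Markup per style, using the tags from the TAGS table earlier in the thread.
const MARKUP = {
  bold: { open: '<b>', close: '</b>' },
  italic: { open: '<i>', close: '</i>' },
  smallcaps: { open: '<span style="font-variant:small-caps;">', close: '</span>' },
  sup: { open: '<sup>', close: '</sup>' },
  sub: { open: '<sub>', close: '</sub>' }
};

// Command name (without backslash) to style; limited to commands named here.
const COMMAND_STYLES = {
  textbf: 'bold', mkbibbold: 'bold', bf: 'bold', bfseries: 'bold',
  textit: 'italic', sl: 'italic', em: 'italic', it: 'italic', itshape: 'italic',
  mkbibitalic: 'italic', mkbibemph: 'italic', emph: 'italic',
  textsc: 'smallcaps', sc: 'smallcaps', scshape: 'smallcaps',
  textsuperscript: 'sup', textsubscript: 'sub'
};

// Wraps text in the markup for a command; unknown commands leave it unchanged.
function wrapCommand(command, text) {
  if (!Object.prototype.hasOwnProperty.call(COMMAND_STYLES, command)) {
    return text;
  }
  const { open, close } = MARKUP[COMMAND_STYLES[command]];
  return open + text + close;
}

console.log(wrapCommand('mkbibemph', 'emphasised'));
```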

Parsing stuff like `{partially \bf bold} but not this` is interesting (in the apocryphal Chinese sense) in that `\bf` affects everything after it until the end of the current block, so here, only the word bold should be bold. That sample is synthetic, just for illustration; in practice you'd see the much more sensible `partially {\bf bold} but not this`. The interesting aspect of that form is that the braces do not mean nocase: if a block has a command at the start, it is ignored for case protection by bib(la)tex.
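The scoping of `\bf` can be sketched with a tiny recursive reader. This is a toy under stated assumptions: it only understands braces and `\bf`, drops the braces from the output, ignores case protection, and stops at an unbalanced closing brace. `renderBf` is a hypothetical name.

```javascript
// Renders a TeX-like string, applying \bf from where it appears to the end
// of the enclosing brace group.
function renderBf(input) {
  let pos = 0;

  function parseGroup() {
    let plain = '';    // text before any \bf in this group
    let bold = '';     // text after \bf in this group
    let inBold = false;
    while (pos < input.length && input[pos] !== '}') {
      let piece;
      if (input[pos] === '{') {
        pos++;                              // consume '{'
        piece = parseGroup();
        if (input[pos] === '}') pos++;      // consume matching '}'
      } else if (input.startsWith('\\bf', pos) &&
                 !/[a-zA-Z]/.test(input[pos + 3] || '')) {
        pos += 3;                           // consume the control word
        while (input[pos] === ' ') pos++;   // TeX skips spaces after it
        inBold = true;
        continue;
      } else {
        piece = input[pos];
        pos++;
      }
      if (inBold) bold += piece; else plain += piece;
    }
    return inBold ? plain + '<b>' + bold + '</b>' : plain;
  }

  return parseGroup();
}

console.log(renderBf('{partially \\bf bold} but not this'));
console.log(renderBf('partially {\\bf bold} but not this'));
```

Both sample strings produce the same markup, since the bold span ends where the enclosing group ends.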

larsgw · 2019-11-14T22:39:27Z

Okay, that's some more things to add to the list. This does make me lean towards moving more parts of the parsing to earlier in the process.

For `{partially \bf bold} but not this`, are the braces still a nocase, since the `\bf` is not at the start of the block?
<sc> was mentioned in (although not part of) the old specification, I think that is where I got it. It seems to still be included in some test cases.

retorquere · 2019-11-15T09:31:48Z

Okay, that's some more things to add to the list. This does make me lean towards moving more parts of the parsing to earlier in the process.

I don't see any other way this can be done. In a one-pass parser, it must be done during the parse, since you need the context to make these decisions. In a two-pass parser like mine, the decision can be postponed until the 2nd pass.

For `{partially \bf bold} but not this`, are the braces still a nocase, since the `\bf` is not at the start of the block?

Yes:

\documentclass{article}
\usepackage[american]{babel}
\usepackage[backend=biber, style=apa]{biblatex}
\DeclareLanguageMapping{american}{american-apa}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}

@article{03, author = "03", 
title =    "{\bf Next: Bold}",
}

@article{04, author = "04", 
title =    "{Next: \bf Bold}",
}

@article{05, author = "05", 
title =    "{Next: Bold}",
}

\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}

gives

(n.d.). NEXT: BOLD.
(n.d.). Next: Bold.
(n.d.). Next: Bold.
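The case-protection rule stated in this thread (a brace block protects its content unless the block starts with a command) can be sketched as a simple check on the block's content. `isCaseProtected` is a hypothetical helper; it takes the text inside one pair of braces and treats any leading backslash as the start of a command, which is a simplification.

```javascript
// Returns true if a brace block with this content should be protected from
// case changes: protected unless the content starts with a command.
function isCaseProtected(blockContent) {
  return !blockContent.startsWith('\\');
}

// Contents of the brace blocks in the three test entries above.
console.log(isCaseProtected('\\bf Next: Bold')); // entry 03: command at the start
console.log(isCaseProtected('Next: \\bf Bold')); // entry 04: command later in the block
console.log(isCaseProtected('Next: Bold'));      // entry 05: no command at all
```

A parser needs the position of the command within the block to make this call, which is why the decision has to be made during the parse or in a second pass that still has that context.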

<sc> was mentioned in (although not part of) the old specification, I think that is were I got it. It seems to still be included in some test cases

I think most will actually still support it, but it's out of spec (even if I think it looks better)

retorquere · 2020-03-10T00:05:34Z

I haven't used B-C-C in a while, but it always used to be noticeably faster than the BBT parser. I don't know why the latest tests don't bear this out.

larsgw mentioned this issue Oct 21, 2020

Progress on the active parser ("citationjs") #3
