feat: add citation support #129
@jgm - can I ask you a high-level question about citeproc and pandoc integration for citation rendering? You render citations independently of the document, and insert them in the document how, and when?
@bdarcus - after the input format is parsed to a Pandoc AST, we apply processCitations :: PandocMonad m => Pandoc -> m Pandoc which transforms the Pandoc AST by (1) replacing each citation with the formatted citation and (2) adding a bibliography. The code is in Text.Pandoc.Citeproc. The transformed AST can then be rendered by any of the pandoc writers. Small complication: for display details, we use special Span and Div elements. These will be ignored by most writers, but for a few writers we've implemented code that responds to them by doing the proper formatting (e.g. docx, latex, html).
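To make the two-step transformation described above concrete, here is a minimal Rust sketch on a toy document model. Everything in it is an assumption for illustration: the `Inline` enum, the `replace_citations` function, and the pre-formatted lookup table are hypothetical and are not pandoc's or this project's actual types.

```rust
use std::collections::HashMap;

/// Toy inline node: a hypothetical stand-in for a real document AST.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Text(String),
    /// An unresolved citation holding the keys of the cited references.
    Cite(Vec<String>),
}

/// Step 1: replace each `Cite` node with its formatted text.
/// Step 2: collect a bibliography, one entry per cited key, in citation order.
/// `formatted` maps a reference key to (citation text, bibliography entry).
pub fn replace_citations(
    doc: Vec<Inline>,
    formatted: &HashMap<String, (String, String)>,
) -> (Vec<Inline>, Vec<String>) {
    let mut bibliography: Vec<String> = Vec::new();
    let mut seen: Vec<String> = Vec::new();
    let out: Vec<Inline> = doc
        .into_iter()
        .map(|node| match node {
            Inline::Cite(keys) => {
                let mut parts = Vec::new();
                for key in &keys {
                    match formatted.get(key) {
                        Some((cite, entry)) => {
                            parts.push(cite.clone());
                            if !seen.contains(key) {
                                seen.push(key.clone());
                                bibliography.push(entry.clone());
                            }
                        }
                        // Unknown key: keep it visible instead of dropping it.
                        None => parts.push(format!("{}?", key)),
                    }
                }
                Inline::Text(format!("({})", parts.join("; ")))
            }
            other => other,
        })
        .collect();
    (out, bibliography)
}
```

The caller would then append the returned bibliography to the document before handing it to a writer; the special Span and Div handling mentioned above is left out of this sketch.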
Thanks @jgm! I have a hard time reading Haskell code. Am I correct that the output you use from citeproc is basically the same as the server JSON: an array of citation strings?
My Haskell citeproc library uses polymorphic types.

    -- | Process a list of 'Citation's, producing formatted citations
    -- and a bibliography according to the rules of a CSL 'Style'.
    -- If a 'Lang' is specified, override the style's default locale.
    -- To obtain a 'Style' from an XML stylesheet, use
    -- 'parseStyle' from "Citeproc.Style".
    citeproc :: CiteprocOutput a
             => CiteprocOptions   -- ^ Rendering options
             -> Style a           -- ^ Parsed CSL style
             -> Maybe Lang        -- ^ Overrides default locale for style
             -> [Reference a]     -- ^ List of references (bibliographic data)
             -> [Citation a]      -- ^ List of citations to process
             -> Result a

For pandoc we use

    instance CiteprocOutput Inlines where
      ...

We also have an instance for HTML, which we use for the standard citeproc test suite. The advantage of this is that when we're using pandoc, we can define bibliography entries with any of the formatting pandoc provides (e.g. math), and this will be carried through all the way to the result.
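For readers more at home in Rust than Haskell, the idea of an output type that the processor is polymorphic over can be approximated with a trait. This is only a rough analog under stated assumptions: the `Output` trait, its three methods, and the `Plain` and `Html` types are hypothetical and do not mirror the actual methods of the `CiteprocOutput` class.

```rust
/// Hypothetical analog of a polymorphic output type: the renderer is
/// written once against this trait and works for any implementation.
pub trait Output: Sized {
    fn text(s: &str) -> Self;
    fn emph(inner: Self) -> Self;
    fn concat(parts: Vec<Self>) -> Self;
}

/// Plain-text output: formatting is dropped.
#[derive(Debug, PartialEq)]
pub struct Plain(pub String);

/// HTML output: text is escaped and emphasis becomes <i>.
#[derive(Debug, PartialEq)]
pub struct Html(pub String);

impl Output for Plain {
    fn text(s: &str) -> Self {
        Plain(s.to_string())
    }
    fn emph(inner: Self) -> Self {
        inner
    }
    fn concat(parts: Vec<Self>) -> Self {
        Plain(parts.into_iter().map(|p| p.0).collect())
    }
}

impl Output for Html {
    fn text(s: &str) -> Self {
        // Escape '&' first so later replacements are not double-escaped.
        Html(s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;"))
    }
    fn emph(inner: Self) -> Self {
        Html(format!("<i>{}</i>", inner.0))
    }
    fn concat(parts: Vec<Self>) -> Self {
        Html(parts.into_iter().map(|p| p.0).collect())
    }
}

/// A toy author-date bibliography entry, generic over the output type.
pub fn render_entry<O: Output>(author: &str, year: &str, title: &str) -> O {
    O::concat(vec![
        O::text(author),
        O::text(" ("),
        O::text(year),
        O::text("). "),
        O::emph(O::text(title)),
        O::text("."),
    ])
}
```

The caller picks the output format by choosing the type, for example `let entry: Html = render_entry("Doe", "2020", "A Title");`, which is the same property that lets rich formatting survive to the final result.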
I only need to implement this to a proof-of-concept state ATM, so my plan is just to return something similar to the citeproc server JSON.

    {
      "citations": [ ... ],
      "bibliography": [ ... ]
    }

I was just confused about how one would replace the citation input with that output, but I guess it doesn't matter too much now.
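A minimal sketch of what such an intermediate output could look like in Rust, using only the standard library. The `ProcessedOutput` name, the choice of plain strings for the array elements, and the hand-rolled `to_json` are all assumptions; the elided array contents in the comment above are not specified, and a real implementation would more likely derive serialization with a crate such as serde.

```rust
/// Hypothetical container for intermediately rendered output.
/// Element types are an assumption: plain strings, one per citation or entry.
#[derive(Debug, Default, PartialEq)]
pub struct ProcessedOutput {
    pub citations: Vec<String>,
    pub bibliography: Vec<String>,
}

/// Quote and escape a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render a slice of strings as a JSON array.
fn json_array(items: &[String]) -> String {
    let parts: Vec<String> = items.iter().map(|s| json_string(s)).collect();
    format!("[{}]", parts.join(","))
}

impl ProcessedOutput {
    /// Serialize to a compact JSON object with the two keys shown above.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"citations\":{},\"bibliography\":{}}}",
            json_array(&self.citations),
            json_array(&self.bibliography)
        )
    }
}
```

Keeping citations in document order means a later pass can pair the n-th citation in the input with the n-th string in `citations`.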
Right. Am thinking to use |
Signed-off-by: Bruce D'Arcus <bdarcus@gmail.com>
This more closely aligns the model with the Haskell citeproc implementation. Signed-off-by: Bruce D'Arcus <bdarcus@gmail.com>
Add a struct to handle intermediately rendered output. The intention is something similar to the Haskell citeproc server JSON. Signed-off-by: Bruce D'Arcus <bdarcus@gmail.com>
It seems right to add this key piece of functionality next.
I will likely only add author-date initially, since that's all I really use myself. But if so, I will design it all along the same lines as 1.0.
Also, I will initially only support a more abstract import format; not actual documents. Still waiting on djot support for citations.
I thought I had this working, but it turns out not: process_citations is currently returning empty vectors. Digging a bit more, I think I may need to rethink and refactor the rendering code to account for the citations.
Details
I'm not sure how best to do this, but probably need to look at https://github.com/jgm/citeproc and https://github.com/zotero/citeproc-rs, though I have a hard time understanding the code in many places.
This could be the citation definition, but doesn't seem right.
https://github.com/zotero/citeproc-rs/blob/2ab195a1e6f84f0ff284813ece61dc62096abbfe/crates/pandoc-types/src/definition.rs#L222
See, though, the design document. It takes a parallel approach: in "Pass 1" it creates different representations of the intermediate output, which can then be resolved in "Pass 2."
Haskell citeproc
Here's the Haskell processor type, which makes more sense to me.
https://github.com/jgm/citeproc/blob/6969ce218d0dfdee29d54cce674c7f9cef4b4f0a/src/Citeproc/Types.hs#L310
https://github.com/jgm/citeproc/blob/6969ce218d0dfdee29d54cce674c7f9cef4b4f0a/src/Citeproc/Types.hs#L263
Here's the high-level processing logic, which is basically what I am planning here.
https://github.com/jgm/citeproc/blob/6969ce218d0dfdee29d54cce674c7f9cef4b4f0a/src/Citeproc.hs#L20C23-L20C23
Question: how are rendered citations inserted in the document?
Disambiguation
... I also need to figure out where and how disambiguation fits in this.
https://github.com/jgm/citeproc/blob/6969ce218d0dfdee29d54cce674c7f9cef4b4f0a/src/Citeproc/Eval.hs#L408
I'm hoping other aspects of this design will make this part easier, but I haven't yet figured it out.
My initial thoughts:
The main aspects of disambiguation I need to focus on first are (author) names, and years.
The latter is easy because in practice it's global. So I've already implemented it.
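As an illustration of why year disambiguation can be treated as a global pass, here is a minimal sketch that groups references by author and year and assigns letter suffixes. It is not the implementation referred to above: the `Reference` struct and `year_suffixes` function are hypothetical, suffixes follow input order (a real processor would typically follow bibliography sort order), and only 26 suffixes per group are handled.

```rust
use std::collections::HashMap;

/// Hypothetical minimal reference: just enough for year-suffix assignment.
#[derive(Debug, Clone)]
pub struct Reference {
    pub id: String,
    pub author: String,
    pub year: i32,
}

/// Map each reference id to its year suffix: "" when the author-year pair
/// is unique, otherwise "a", "b", ... in input order within the group.
pub fn year_suffixes(refs: &[Reference]) -> HashMap<String, String> {
    // Group references that would render to the same author-year string.
    let mut groups: HashMap<(&str, i32), Vec<&Reference>> = HashMap::new();
    for r in refs {
        groups.entry((r.author.as_str(), r.year)).or_default().push(r);
    }

    let mut suffixes = HashMap::new();
    for group in groups.values() {
        for (i, r) in group.iter().enumerate() {
            let suffix = if group.len() > 1 {
                // Sketch limitation: wraps around after 'z'.
                ((b'a' + (i % 26) as u8) as char).to_string()
            } else {
                String::new()
            };
            suffixes.insert(r.id.clone(), suffix);
        }
    }
    suffixes
}
```

Because the result depends only on the full reference list, it can be computed once up front and looked up while rendering both citations and the bibliography.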
The former is the tricky piece, since typically it applies to citations, and not bibliographies (I guess unless a style requires a given name initial to be expanded?).
I suppose one option would be to follow the citeproc-rs approach: somehow generate alternate name representations on first pass, and disambiguate them separately.
Maybe I could create a hash-table for author names, something vaguely like:
Regardless of the details, the idea would be to look up the right name, with its disambiguation string, in that hash map.
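One possible shape for such a lookup table, as a hedged sketch only: it precomputes, for every name, the shortest form that is unambiguous across the whole reference list (family name, then initial plus family name, then full given name). The `Name` struct, the `disambiguate_names` function, and the three-level escalation rule are assumptions for illustration, not the design in this PR nor the approach taken by citeproc-rs.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical minimal personal name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Name {
    pub family: String,
    pub given: String,
}

/// Build a table from each name to its shortest unambiguous rendering:
/// "Family", else "G. Family", else "Given Family".
pub fn disambiguate_names(names: &[Name]) -> HashMap<Name, String> {
    // First pass: collect the distinct given names seen for each family name.
    let mut by_family: HashMap<String, BTreeSet<String>> = HashMap::new();
    for n in names {
        by_family
            .entry(n.family.clone())
            .or_default()
            .insert(n.given.clone());
    }

    // Second pass: pick the rendering for each name.
    let mut table = HashMap::new();
    for n in names {
        let givens = &by_family[n.family.as_str()];
        let rendered = if givens.len() == 1 {
            // Family name alone identifies this person.
            n.family.clone()
        } else {
            let initial = n.given.chars().next();
            let clash = givens
                .iter()
                .filter(|g| g.chars().next() == initial)
                .count()
                > 1;
            match (clash, initial) {
                // Initials collide too: fall back to the full given name.
                (true, _) => format!("{} {}", n.given, n.family),
                (false, Some(c)) => format!("{}. {}", c, n.family),
                (false, None) => n.family.clone(),
            }
        };
        table.insert(n.clone(), rendered);
    }
    table
}
```

At render time a citation would look up each contributor in the table (for example `table[&name]`) rather than deciding locally how much of the name to show, which keeps name disambiguation separate from the template rendering code.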