Bibliography system, now built in. #64

Drew-S · 2018-05-23T21:43:20Z

Uses jsdom and citation-js to create bibliography and format citations. Built in system that is called after pug has finished its rendering. Currently included by default, setting the style is done so through the bibliography mixin. This mixin behaviour will change with a plugin system and style will be defined when the plugin is defined in the user's pug file.

Add a citation +cite(key, page)
Add a bibliography +bibliography(style)

The key needs to be an acceptable key that citation-js can use and page is optional (page is for a range of pages the writer is referencing). The bibliography style defaults to citation-apa and needs to be a default accepted style that citation-js has built in.

Currently, I have not implemented a way to use a local bibtex or similar libraries, but citation-js does have support for that and I plan to add it. Additionally, citation-js has a plugin system itself and there may be support for third party styles, this may be added as well.

Zulko · 2018-05-23T23:36:10Z

I really appreciate this, just the proof of principle itself that the bibliography can be done is really nice. I also realize that converters.js will soon be crowded. Give us a greenlight and well merge.

Important edit: please make the bibiography optional, especially since it parses the whole document so it is bound to be slow (in my use cases, the html variable is often HUGE because I inline complex SVGs, so everything parsing html is a bottleneck).

Could cheerio be used instead of JSDOM since cheerio is already a dependency of ReLaXed ?

Right now you are parsing the full page to find citations, but would there be a way that the +cite mixin "logs in" the citations, i.e. add the citations to a global references list variable, so that +bibliography would just need to read that list instead of parsing the full HTML ?

Drew-S · 2018-05-23T23:45:33Z

It is ready to go as it is. I am looking into the puppeteer API, which I am not familiar with, but with the little experimenting I have done it looks like I can accomplish the bibliography without adding in jsdom, although it will take a little longer to get it done.

It looks like I could also do the page numbers in relation to a ToC with puppeteer, but again it will take some time for me to understand the API.

edit: I am not familiar with cheerio, but I will take a look at the API, it sounds like it will do that. As for +cite adding to a global variable, I would like to do that as I agree parsing the entire HTML is poor design but I am unaware of a way to do that yet.

Drew-S · 2018-05-24T02:53:25Z

So I figured out how to build the bibliography using cheerio instead of jsdom. So we can save requirements and what not there. As for the pug portion, I see no way of having a pug mixin affect a global variable that can be referenced outside of the pug / rendered html file, meaning we will still need to parse the html file to do a bibliography either way.

The way I implemented cheerio in switching over, improves a little bit about how your footer and header is being handled and in doing so create only one cheerio of the main document so that bibliography and footer / header affects.

edit: I will be adding ~~html.indexOf()~~ regex.test() checks for bibliography so loading of the html into cheerio is not done unless needed.

Zulko · 2018-05-24T07:43:02Z

That last change is a problem for me, because you removed an optimization of the page header/footer handling. As I said cheerio is slow on big pages, so in my code I only apply it to the end of the page, where the footer and header are defined, so that it always take almost no time at all.

So the two should remain separate cheerios (which will be fine, because the cheerio of page header/footer takes no time).

But the notion that separate plugins could share a cheerio is interesting, the plugins to come could have a special subclass of plugins that work on cheerio. If at least one of these plugins is activated, ReLaXed cheerios the page once, gives the cheerioed page it to each cheerio-plugin in turn, and at the end transforms back the cheerio into html. But again, page header/footer would not be a cheerio plugin, because it only needs to parse the end of the page.

Drew-S · 2018-05-24T15:06:56Z

So I am looking into puppeteer and I know through a bit of work, it can get the elements used for creating the header and footer in the final pdf. As long as the selection uses await it will not render incorrectly as it serializes the call before page.pdf(). Given that puppeteer has to render the entire document to create the final pdf we could potentially cut out cheerio as an additional renderer and reduce the runtime even further if we can fully manipulate the DOM from puppeteer. ~~I have not figured out fully how to manipulate~~, only retrieve HTML strings from puppeteer, but I do believe it is possible to manipulate it and to create a ToC and Bibliography.

edit: Turns out, the way to grab the HTML is the same as manipulate, and it works to affect final render. I will convert my bibliography over to that, as well as the header and footer. If that should work I would suggest we cut out cheerio.

Zulko · 2018-05-24T16:21:35Z

From what i understand you are using the fact that Chrome needs to parse the HTML anyways, and you are using it as your DOM generator. I believe that's an elegant way to get the content of the footer and header (I just hope that the DOM conserves internal scripts and CSS, idk, I'm not a DOM expert). I'm not sure how that would work for the bibliography though, but I'm interested in what you will find.

Drew-S · 2018-05-24T16:24:34Z

That is what I am doing, and it appears to be working so far. The code does end up being a little more complicated, but it works. Instead of manipulating the DOM in cheerio, we cut out cheerio, save some runtime, and use Chrome to manipulate the DOM. With cheerio we are generating the DOM twice, (with jsdom, thrice), but if we can manipulate with just Chrome, only once.

Edit: So, I ran a few tests with no cheerio just puppeteer, cheerio with puppeteer, and just puppeteer with the basic example. Good news. With or without cheerio, the runtime is about the same for by bibliography example (3.2s) same either way, the basic example, runs in (0.28s) time, so if the user does not have a bibliography, footer, or header it runs as fast as it needs. I am going to look into puppeteer a bit further before I upload any changes. ~~I am wondering if we can speed it up, as it is a pretty noticeable increase in runtime either way.~~ It is not puppeteer that is slowing it down, its citation-js it may be slow because it is retrieving data from wikidata for my example, or in its operations, I believe that the bibliography will always slow it down by about 3s. Testing just the footer increases the runtime by (0.02s) (0.28s -> 0.3s) from basic.

Drew-S · 2018-05-24T17:32:05Z

Now that I have an understanding of how puppeteer works I can probably figure out a way to create a ToC next. The bibliography in puppeteer has a good chunk of code and now converters.js is kinda large, as I work on the ToC I will probably split the content out into different files to make the work easier, as well as add code commenting. But for now, even with a larger converters.js I believe the bibliography is as good as its going to get for now.

Currently, the in text citation is limited to APA style only, even though the bibliography itself supports all of citation-js built in styles. Since citation-js does not do in text citations, it only does the bibliography and data managing, I will need to build all the different in text formats for the different styles.

Zulko · 2018-05-24T17:37:22Z

Please keep the cheerio exposure here:
https://github.com/Drew-S/ReLaXed/blob/059a6227ffd215581869581c347df32ed3fb33c6/src/converters.js#L154

I need it in the builtin_mixin.pug

Drew-S · 2018-05-24T17:44:18Z

Yea, I can put it back in, I removed it to see if it actually affected anything and from my tests it did not, I did not realise it was used in the builtin_mixin.pug file.

Zulko · 2018-05-24T17:45:25Z

the tests are very incomplete, which is why I'm having a look at your commits as you go 😛

Zulko · 2018-05-25T12:07:33Z

green light ?

Drew-S · 2018-05-25T16:25:02Z

Definitely. The bibliography is good to go.

Drew Sommer added 11 commits May 22, 2018 18:10

Added a bibliography system, using citation-js and mixins

43e2954

Added Bibliography system using citation-js

f0db35b

Fix some minor git error

30b3b92

Added .directory to ignore list

69c9e7f

Naming convention

f72499e

fixed a typo

faad68a

Fixed typo the wrong way (fixed), improved naming

7015d74

Made bibliography system to be built in.

140e2ca

Placed bibliography html render in wrong spot, removed extra tags.

50e7e5d

Fixed error in comparing citations and bibliography elements

529ad87

Fixed intext citation, page was ref data-key, not data-page

bcbad49

Switched from jsdom to cheerios. Made cheerios a single instance

a5dc06d

Drew Sommer added 3 commits May 24, 2018 10:42

Convert from cheerio to puppeteer fully

bf830af

Testing runtime variations

9ff3a32

Switched bibliography from cheerio to puppeteer

059a622

Added cheerio back for pug rendering.

7408379

Zulko merged commit 42342cc into RelaxedJS:master May 25, 2018

Drew-S deleted the bibliography branch May 25, 2018 17:29

Drew-S mentioned this pull request May 25, 2018

Citation/bibliography support #27

Closed

will-hart mentioned this pull request May 30, 2018

Loading bibtex files for citations #74

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bibliography system, now built in. #64

Bibliography system, now built in. #64

Drew-S commented May 23, 2018

Zulko commented May 23, 2018 •

edited

Drew-S commented May 23, 2018 •

edited

Drew-S commented May 24, 2018 •

edited

Zulko commented May 24, 2018

Drew-S commented May 24, 2018 •

edited

Zulko commented May 24, 2018

Drew-S commented May 24, 2018 •

edited

Drew-S commented May 24, 2018 •

edited

Zulko commented May 24, 2018

Drew-S commented May 24, 2018

Zulko commented May 24, 2018

Zulko commented May 25, 2018

Drew-S commented May 25, 2018 •

edited

Bibliography system, now built in. #64

Bibliography system, now built in. #64

Conversation

Drew-S commented May 23, 2018

Zulko commented May 23, 2018 • edited

Drew-S commented May 23, 2018 • edited

Drew-S commented May 24, 2018 • edited

Zulko commented May 24, 2018

Drew-S commented May 24, 2018 • edited

Zulko commented May 24, 2018

Drew-S commented May 24, 2018 • edited

Drew-S commented May 24, 2018 • edited

Zulko commented May 24, 2018

Drew-S commented May 24, 2018

Zulko commented May 24, 2018

Zulko commented May 25, 2018

Drew-S commented May 25, 2018 • edited

Zulko commented May 23, 2018 •

edited

Drew-S commented May 23, 2018 •

edited

Drew-S commented May 24, 2018 •

edited

Drew-S commented May 24, 2018 •

edited

Drew-S commented May 24, 2018 •

edited

Drew-S commented May 24, 2018 •

edited

Drew-S commented May 25, 2018 •

edited