Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bibliography system, now built in. #64

Merged
merged 16 commits into from
May 25, 2018
Merged

Conversation

Drew-S
Copy link
Member

@Drew-S Drew-S commented May 23, 2018

Uses jsdom and citation-js to create bibliography and format citations. Built in system that is called after pug has finished its rendering. Currently included by default, setting the style is done so through the bibliography mixin. This mixin behaviour will change with a plugin system and style will be defined when the plugin is defined in the user's pug file.

  • Add a citation +cite(key, page)
  • Add a bibliography +bibliography(style)

The key needs to be an acceptable key that citation-js can use and page is optional (page is for a range of pages the writer is referencing). The bibliography style defaults to citation-apa and needs to be a default accepted style that citation-js has built in.

Currently, I have not implemented a way to use a local bibtex or similar libraries, but citation-js does have support for that and I plan to add it. Additionally, citation-js has a plugin system itself and there may be support for third party styles, this may be added as well.

@Zulko
Copy link
Member

Zulko commented May 23, 2018

I really appreciate this, just the proof of principle itself that the bibliography can be done is really nice. I also realize that converters.js will soon be crowded. Give us a greenlight and well merge.

Important edit: please make the bibiography optional, especially since it parses the whole document so it is bound to be slow (in my use cases, the html variable is often HUGE because I inline complex SVGs, so everything parsing html is a bottleneck).

Could cheerio be used instead of JSDOM since cheerio is already a dependency of ReLaXed ?

Right now you are parsing the full page to find citations, but would there be a way that the +cite mixin "logs in" the citations, i.e. add the citations to a global references list variable, so that +bibliography would just need to read that list instead of parsing the full HTML ?

@Drew-S
Copy link
Member Author

Drew-S commented May 23, 2018

It is ready to go as it is. I am looking into the puppeteer API, which I am not familiar with, but with the little experimenting I have done it looks like I can accomplish the bibliography without adding in jsdom, although it will take a little longer to get it done.

It looks like I could also do the page numbers in relation to a ToC with puppeteer, but again it will take some time for me to understand the API.

edit: I am not familiar with cheerio, but I will take a look at the API, it sounds like it will do that. As for +cite adding to a global variable, I would like to do that as I agree parsing the entire HTML is poor design but I am unaware of a way to do that yet.

@Drew-S
Copy link
Member Author

Drew-S commented May 24, 2018

So I figured out how to build the bibliography using cheerio instead of jsdom. So we can save requirements and what not there. As for the pug portion, I see no way of having a pug mixin affect a global variable that can be referenced outside of the pug / rendered html file, meaning we will still need to parse the html file to do a bibliography either way.

The way I implemented cheerio in switching over, improves a little bit about how your footer and header is being handled and in doing so create only one cheerio of the main document so that bibliography and footer / header affects.

edit: I will be adding html.indexOf() regex.test() checks for bibliography so loading of the html into cheerio is not done unless needed.

@Zulko
Copy link
Member

Zulko commented May 24, 2018

That last change is a problem for me, because you removed an optimization of the page header/footer handling. As I said cheerio is slow on big pages, so in my code I only apply it to the end of the page, where the footer and header are defined, so that it always take almost no time at all.

So the two should remain separate cheerios (which will be fine, because the cheerio of page header/footer takes no time).

But the notion that separate plugins could share a cheerio is interesting, the plugins to come could have a special subclass of plugins that work on cheerio. If at least one of these plugins is activated, ReLaXed cheerios the page once, gives the cheerioed page it to each cheerio-plugin in turn, and at the end transforms back the cheerio into html. But again, page header/footer would not be a cheerio plugin, because it only needs to parse the end of the page.

@Drew-S
Copy link
Member Author

Drew-S commented May 24, 2018

So I am looking into puppeteer and I know through a bit of work, it can get the elements used for creating the header and footer in the final pdf. As long as the selection uses await it will not render incorrectly as it serializes the call before page.pdf(). Given that puppeteer has to render the entire document to create the final pdf we could potentially cut out cheerio as an additional renderer and reduce the runtime even further if we can fully manipulate the DOM from puppeteer. I have not figured out fully how to manipulate, only retrieve HTML strings from puppeteer, but I do believe it is possible to manipulate it and to create a ToC and Bibliography.

edit: Turns out, the way to grab the HTML is the same as manipulate, and it works to affect final render. I will convert my bibliography over to that, as well as the header and footer. If that should work I would suggest we cut out cheerio.

@Zulko
Copy link
Member

Zulko commented May 24, 2018

From what i understand you are using the fact that Chrome needs to parse the HTML anyways, and you are using it as your DOM generator. I believe that's an elegant way to get the content of the footer and header (I just hope that the DOM conserves internal scripts and CSS, idk, I'm not a DOM expert). I'm not sure how that would work for the bibliography though, but I'm interested in what you will find.

@Drew-S
Copy link
Member Author

Drew-S commented May 24, 2018

That is what I am doing, and it appears to be working so far. The code does end up being a little more complicated, but it works. Instead of manipulating the DOM in cheerio, we cut out cheerio, save some runtime, and use Chrome to manipulate the DOM. With cheerio we are generating the DOM twice, (with jsdom, thrice), but if we can manipulate with just Chrome, only once.

Edit: So, I ran a few tests with no cheerio just puppeteer, cheerio with puppeteer, and just puppeteer with the basic example. Good news. With or without cheerio, the runtime is about the same for by bibliography example (3.2s) same either way, the basic example, runs in (0.28s) time, so if the user does not have a bibliography, footer, or header it runs as fast as it needs. I am going to look into puppeteer a bit further before I upload any changes. I am wondering if we can speed it up, as it is a pretty noticeable increase in runtime either way. It is not puppeteer that is slowing it down, its citation-js it may be slow because it is retrieving data from wikidata for my example, or in its operations, I believe that the bibliography will always slow it down by about 3s. Testing just the footer increases the runtime by (0.02s) (0.28s -> 0.3s) from basic.

@Drew-S
Copy link
Member Author

Drew-S commented May 24, 2018

Now that I have an understanding of how puppeteer works I can probably figure out a way to create a ToC next. The bibliography in puppeteer has a good chunk of code and now converters.js is kinda large, as I work on the ToC I will probably split the content out into different files to make the work easier, as well as add code commenting. But for now, even with a larger converters.js I believe the bibliography is as good as its going to get for now.

Currently, the in text citation is limited to APA style only, even though the bibliography itself supports all of citation-js built in styles. Since citation-js does not do in text citations, it only does the bibliography and data managing, I will need to build all the different in text formats for the different styles.

@Zulko
Copy link
Member

Zulko commented May 24, 2018

Please keep the cheerio exposure here:
https://github.com/Drew-S/ReLaXed/blob/059a6227ffd215581869581c347df32ed3fb33c6/src/converters.js#L154

I need it in the builtin_mixin.pug

@Drew-S
Copy link
Member Author

Drew-S commented May 24, 2018

Yea, I can put it back in, I removed it to see if it actually affected anything and from my tests it did not, I did not realise it was used in the builtin_mixin.pug file.

@Zulko
Copy link
Member

Zulko commented May 24, 2018

the tests are very incomplete, which is why I'm having a look at your commits as you go 😛

@Zulko
Copy link
Member

Zulko commented May 25, 2018

green light ?

@Drew-S
Copy link
Member Author

Drew-S commented May 25, 2018

Definitely. The bibliography is good to go.

@Zulko Zulko merged commit 42342cc into RelaxedJS:master May 25, 2018
@Drew-S Drew-S deleted the bibliography branch May 25, 2018 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants