How far could I get with using the nbconvert preprocessors on their own? #111

choldgraf · 2019-11-11T22:11:22Z

Hey there 👋 I think this is a really cool project, thanks for building it!

I'm working with a similar project for publishing HTML-based books in the Jupyter ecosystem (called jupyter book). I'm wondering if I could leverage some (maybe all?) of ipypublish for the HTML generation process.

Currently, I am doing these two things in building a book:

1. For a collection of ipynb and text files, first build an HTML page for each (with no header)
2. Use a static-site generator to stitch them all into a book w/ nice CSS and JS.

For 1, I'm using a combination of nbconvert templates and preprocessors. The goal is to output a single HTML file for each page that can be stitched together as a book by the SSG. Currently, it uses a standard nbconvert markdown -> HTML pipeline, which misses a lot of features (such as citations, math notation, captions, etc) that ipypublish seems to provide.

I'm wondering if I could use ipypublish for some or all of the single-page generation process (e.g. either using some of the nbconvert preprocessors, or just using ipypublish directly instead of my own code). However, thus far I have tried to avoid a dependence on Pandoc because of the extra overhead it creates at build-time (not a big deal if you're only building one page, but problematic if you're building 100).

I'm curious if you could give an idea for how far one could get using this tool with the nbconvert preprocessors alone. It seems there's some amount of functionality for processing latex tags etc, though it also seems that the pandoc filters are slightly overlapping in their feature-set as well. If any of this would be a helpful addition to the documentation, I'm happy to make some PRs to add what I learn. I'd love to be able to leverage this tool and contribute improvements upstream rather than maintaining my own HTML-generation code if it makes sense without too much added complexity.

I will continue digging into the code but I thought I'd ask in the meantime :-)


chrisjsewell · 2019-11-11T23:11:38Z

Hey @choldgraf, that's funny: I've just been contacted by @jstac about collaborating within the Sloan Grant that he said you are also a part of, and you've literally just been cc'd into his response lol!

> Output a single HTML file for each page that can be stitched together as a book by the SSG

Would it not be better doing this within the Sphinx framework?
As I mention in #91 (comment),
I have effectively deprecated the 'raw' HTML file conversion, in favour of converting via Sphinx.
This is mainly because it is a real pain to get all the internal reference/citation links working manually, and is infinitely easier to let Sphinx handle this.

> pandoc filters are slightly overlapping in their feature-set

That's probably because initially I was only working with preprocessors, and the ipubpandoc filter came later. Again it's a case of not reinventing the wheel: in ipypublish/ipypublish/preprocessors/latextags_to_html.py (line 40 in d252697),

class LatexTagsToHTML(Preprocessor):

(with hindsight) I was essentially trying to reimplement something that pandoc (and panflute) does better.

> I have tried to avoid a dependence on Pandoc because of the extra overhead it creates at build-time

I'd say one of the issues with using Pandoc within the current nbconvert conversion mechanism is that you have to call it separately for every markdown cell. I feel it would be better if there was a way to collect all the markdown cells, apply pandoc, then split up the result.
This is an additional reason, on top of the ones I mentioned to @jstac, why, if I were to properly revamp ipypublish (or create a collaborative package within the Sloan Grant :)), I would consider dropping nbconvert's conversion mechanism and implementing a new one.
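
As a rough illustration of the "collect, convert once, split" idea above, here is a minimal Python sketch. It is not ipypublish or nbconvert code: the function names and the sentinel string are hypothetical, and it assumes the converter passes a raw HTML comment through unchanged, which would need checking for the pandoc options actually in use.

```python
import subprocess
from typing import Callable, List

# Hypothetical boundary marker. Assumption: the converter leaves raw HTML
# comments untouched, so the marker survives the conversion.
SENTINEL = "<!-- ipub-cell-boundary -->"


def convert_cells_batched(cells: List[str], convert: Callable[[str], str]) -> List[str]:
    """Convert many markdown cells with a single call to `convert`."""
    if not cells:
        return []
    # Blank lines around the sentinel keep it in its own block.
    joined = f"\n\n{SENTINEL}\n\n".join(cells)
    converted = convert(joined)
    parts = [part.strip() for part in converted.split(SENTINEL)]
    if len(parts) != len(cells):
        # E.g. a cell with an unclosed code fence swallowed the marker,
        # or a cell already contained the marker text.
        raise ValueError(
            f"expected {len(cells)} converted cells, got {len(parts)}"
        )
    return parts


def pandoc_markdown_to_html(text: str) -> str:
    """Run the pandoc CLI once (assumes `pandoc` is installed and on PATH)."""
    result = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", "html"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (not run here, since it needs pandoc):
# html_cells = convert_cells_batched(["# Title", "Some *text*"], pandoc_markdown_to_html)
```

One side effect of this approach is that document-level constructs such as reference-style links and footnotes would be resolved across cell boundaries, which may or may not be what you want.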

choldgraf · 2019-11-12T00:14:29Z

Yo - a few responses below:

> This is mainly because it is a real pain to get all the internal reference/citation links working manually, and is infinitely easier to let Sphinx handle this.

Totally - for me the biggest challenge is that my experience using Markdown in Sphinx w/ the recommonmark extension has not been great, and I really don't want folks to have to use rST :-/

That said, I do a lot of sphinx work as well so it's something I'd be happy to revisit (I'd probably want to use it under-the-hood, since Jupyter Book is meant to be language-agnostic as well).

> This is an additional reason, on top of the ones I mentioned to @jstac, why, if I were to properly revamp ipypublish (or create a collaborative package within the Sloan Grant :)), I would consider dropping nbconvert's conversion mechanism and implementing a new one.

Makes sense to me - I agree that it seems clunky to have multiple conversion steps. We worked with John from the Pandoc project last year to get ipynb accepted as a valid input / output for pandoc, and I've been meaning to explore what this would look like for something like Jupyter Book (I've just been hesitant to do so because it still feels like it'd add a lot of overhead, even if you're running it once per page instead of once per cell...also I don't know Haskell at all :-P ).

choldgraf · 2019-11-12T00:14:54Z

ah, I see you've already thought about using Pandoc for notebooks in #79 :-)

chrisjsewell · 2019-11-12T00:47:39Z

> I don't know Haskell at all

Me neither :) that's certainly a drawback about pandoc. Although, as I mentioned, panflute has made it a lot easier to manipulate the AST within python.

choldgraf · 2019-11-12T01:09:19Z

Yeah I agree - that's pretty cool. I also heard that somebody is working on a Rust implementation of Pandoc, which would be way cooler than learning Haskell IMO (though to be honest I don't have time to learn either lol)

Either way, am I correct in the conclusion that the nbconvert preprocessors probably are not the best way to utilize ipypublish's functionality?

chrisjsewell · 2019-11-12T01:14:36Z

Yeah, I’d say they're not, at least not for Markdown-to-HTML conversion.

choldgraf · 2019-11-12T01:42:27Z

sounds good - I'll consider this issue closed then. I'll try to figure out other ways to leverage this build system.

choldgraf mentioned this issue Nov 11, 2019

Investigate ipypublish executablebooks/meta#4

Closed

choldgraf closed this as completed Nov 12, 2019
