How far could I get with using the nbconvert preprocessors on their own? #111

Closed
choldgraf opened this issue Nov 11, 2019 · 7 comments

@choldgraf

Hey there 👋 I think this is a really cool project, thanks for building it!

I'm working with a similar project for publishing HTML-based books in the Jupyter ecosystem (called jupyter book). I'm wondering if I could leverage some (maybe all?) of ipypublish for the HTML generation process.

Currently, I am doing these two things in building a book:

  1. For a collection of ipynb and text files, first build an HTML page for each (with no header)
  2. Use a static-site generator to stitch them all into a book w/ nice CSS and JS.

For 1, I'm using a combination of nbconvert templates and preprocessors. The goal is to output a single HTML file for each page that can be stitched together as a book by the SSG. Currently, it uses a standard nbconvert markdown -> HTML pipeline, which misses a lot of features (such as citations, math notation, captions, etc) that ipypublish seems to provide.

I'm wondering if I could use ipypublish for some or all of the single-page generation process (e.g. either using some of the nbconvert preprocessors, or just using ipypublish directly instead of my own code). However, thus far I have tried to avoid a dependence on Pandoc because of the extra overhead it creates at build-time (not a big deal if you're only building one page, but problematic if you're building 100).

I'm curious if you could give an idea for how far one could get using this tool with the nbconvert preprocessors alone. It seems there's some amount of functionality for processing latex tags etc, though it also seems that the pandoc filters are slightly overlapping in their feature-set as well. If any of this would be a helpful addition to the documentation, I'm happy to make some PRs to add what I learn. I'd love to be able to leverage this tool and contribute improvements upstream rather than maintaining my own HTML-generation code if it makes sense without too much added complexity.

I will continue digging into the code but I thought I'd ask in the meantime :-)

@chrisjsewell
Owner

chrisjsewell commented Nov 11, 2019

Hey @choldgraf, that's funny: I've just been contacted by @jstac about collaborating within the Sloan Grant that he said you are also a part of, and you've literally just been cc'd into his response lol!

> Output a single HTML file for each page that can be stitched together as a book by the SSG

Would it not be better doing this within the Sphinx framework?
As I mention in #91 (comment),
I have effectively deprecated the 'raw' HTML file conversion, in favour of converting via Sphinx.
This is mainly because it is a real pain to get all the internal reference/citation links working manually, and is infinitely easier to let Sphinx handle this.

> pandoc filters are slightly overlapping in their feature-set

That's probably because I was initially only working with preprocessors, and the ipubpandoc filter came later. Again it's a case of not reinventing the wheel: in `class LatexTagsToHTML(Preprocessor)`, I was (with hindsight) essentially trying to reimplement something that pandoc (and panflute) does better.
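
To illustrate the kind of work a preprocessor like this ends up doing by hand, here is a minimal sketch. It is not ipypublish's actual `LatexTagsToHTML` code: the regex, the HTML it emits, and the function names are all hypothetical, and it assumes a notebook loaded as a plain dict in the nbformat v4 layout with each cell's `source` stored as a single string.

```python
import re

# Hypothetical tag pattern: only \cite{key} and \ref{key} are handled.
LATEX_TAG = re.compile(r"\\(cite|ref)\{([^{}]+)\}")


def latex_tags_to_html(source):
    """Replace \\cite{key} and \\ref{key} with in-page HTML anchors."""
    def to_anchor(match):
        kind, key = match.group(1), match.group(2)
        # Illustrative markup only; real link targets need more bookkeeping.
        return f'<a class="{kind}" href="#{key}">{key}</a>'
    return LATEX_TAG.sub(to_anchor, source)


def convert_markdown_cells(notebook):
    """Apply the tag conversion to every markdown cell of a notebook dict.

    Assumes a top-level "cells" list whose items carry "cell_type" and
    a string "source". Code cells are left untouched.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "markdown":
            cell["source"] = latex_tags_to_html(cell["source"])
    return notebook
```

Even this toy version has to decide what a link target looks like and whether the key actually exists anywhere, which is the part that a tool with a real document model handles far more reliably.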

> I have tried to avoid a dependence on Pandoc because of the extra overhead it creates at build-time

I'd say one of the issues with using Pandoc within the current nbconvert conversion mechanism is that you have to call it separately for every markdown cell. I feel it would be better if there was a way to collect all the markdown cells, apply pandoc once, then split up the result.
This is an additional reason, on top of the ones I mentioned to @jstac, why if I were to properly revamp ipypublish (or create a collaborative package within the Sloan Grant :)), I would consider dropping nbconvert's conversion mechanism and implementing a new one.
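
A rough sketch of the "collect, convert once, split" idea, under stated assumptions: the function names and the marker string are hypothetical, and the approach only works if the converter passes the marker through unchanged. An HTML comment is used on the assumption that pandoc keeps raw HTML when converting Markdown to HTML, which should be checked for the pandoc version and options in use.

```python
import subprocess


def convert_cells_in_one_call(cells, convert, marker="<!-- cell-break -->"):
    """Convert many markdown cells with a single call to `convert`.

    `cells` is a list of markdown strings; `convert` is any callable that
    maps one markdown string to one HTML string. Returns one HTML string
    per input cell, in order.
    """
    if not cells:
        return []
    for source in cells:
        if marker in source:
            raise ValueError("marker already occurs in a cell; pick another")
    # Blank lines around the marker keep it a separate block in the input.
    joined = f"\n\n{marker}\n\n".join(cells)
    converted = convert(joined)
    parts = [part.strip() for part in converted.split(marker)]
    if len(parts) != len(cells):
        raise RuntimeError("marker was altered or dropped by the converter")
    return parts


def pandoc_convert(text):
    """One possible `convert`: a single pandoc subprocess per notebook.

    Assumes the `pandoc` executable is on PATH.
    """
    result = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", "html"],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

With this shape, `convert_cells_in_one_call(cells, pandoc_convert)` would start one process per notebook rather than one per cell. One caveat is that joining cells changes their context: constructs such as footnotes or reference-style links defined in one cell become visible to the others, which may or may not be what you want.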

@choldgraf
Author

Yo - a few responses below:

> This is mainly because it is a real pain to get all the internal reference/citation links working manually, and is infinitely easier to let Sphinx handle this.

Totally - for me the biggest challenge is that my experience using Markdown in Sphinx w/ the recommonmark extension has not been great, and I really don't want folks to have to use rST :-/

That said, I do a lot of sphinx work as well so it's something I'd be happy to revisit (I'd probably want to use it under-the-hood, since Jupyter Book is meant to be language-agnostic as well).

> This is an additional reason, on top of the ones I mentioned to @jstac, why if I were to properly revamp ipypublish (or create a collaborative package within the Sloan Grant :)), I would consider dropping nbconvert's conversion mechanism and implementing a new one.

Makes sense to me - I agree that it seems clunky to have multiple conversion steps. We worked with John from the Pandoc project last year to get ipynb accepted as a valid input / output for pandoc, and I've been meaning to explore what this would look like for something like Jupyter Book (I've just been hesitant to do so because it still feels like it'd add a lot of overhead if you're running it once per page instead of once per cell...also I don't know Haskell at all :-P ).

@choldgraf
Author

choldgraf commented Nov 12, 2019

ah, I see you've already thought about using Pandoc for notebooks in #79 :-)

@chrisjsewell
Owner

> I don't know Haskell at all

Me neither :) That's certainly a drawback of pandoc. Although, as I mentioned, panflute has made it a lot easier to manipulate the AST within Python.

@choldgraf
Author

Yeah I agree - that's pretty cool. I also heard that somebody is working on a Rust implementation of Pandoc, which would be way cooler than learning Haskell IMO (though to be honest I don't have time to learn either lol)

Either way, am I correct in the conclusion that the nbconvert preprocessors probably are not the best way to utilize ipypublish's functionality?

@chrisjsewell
Owner

Yeah, I'd say no, not for Markdown-to-HTML conversion.

@choldgraf
Author

sounds good - I'll consider this issue closed then. I'll try to figure out other ways to leverage this build system.
