
Add support for include directives #80

Merged · 8 commits · Feb 26, 2020
Conversation

@chrisjsewell (Member) commented Feb 26, 2020

MyST now supports the `include` directive! See tests/test_sphinx/sourcedirs/includes and tests/test_sphinx/test_sphinx_builds for an example sphinx project with includes, and its build outputs.

@choldgraf and @akhmerov, regarding the discussion in executablebooks/MyST-NB#32 about having multiple kernels on a single HTML page: you should be able to use this out of the box with the current jupyter-sphinx and multiple MyST files:

```{include} nb1.ipynb.md
```

```{include} nb2.ipynb.md
```

Note: `**/*.ipynb.md` should be in the `exclude_patterns`, so that the files are not executed twice.
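For example, in `conf.py` (`exclude_patterns` is standard Sphinx configuration; the glob is the one from the note above):

```python
# conf.py: stop the included files from also being built (and hence executed)
# as standalone pages
exclude_patterns = ["**/*.ipynb.md"]
```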

Note, though, that the `include` directive only works for documents of the same type (e.g. md -> md or rst -> rst) if you want to actually parse the file content. So it would be interesting to consider if/how something like an `include-nb` directive could work, for stitching multiple notebooks into one MyST file:

```{include-nb} nb1.ipynb
```

```{include-nb} nb2.ipynb
```

@chrisjsewell (Member Author)

@mmcky this was triggered by your question, even though it wasn't actually for includes 😆

@akhmerov (Contributor)

Where does the restriction to the same filetype come from?

@chrisjsewell (Member Author)

> Where does the restriction to the same filetype come from?

Because the include is done at the parser (docutils) level, and is essentially just injecting more text into the block of text that the parser must tokenize. It has no 'knowledge' of the filetype; it just reads the text from the file you specify and parses it. For example, this would still be read as rST, even though it has a different extension:

.. include:: include.html

@chrisjsewell (Member Author) commented Feb 26, 2020

Following on from the above comment, it's also important to note that any relative file references in the included document will be read as relative to the importing/parent file. See this Stack Overflow answer: https://stackoverflow.com/a/50261574/5033292.
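As a hypothetical illustration (the file layout here is invented):

```
docs/
├── index.md           # contains: {include} sections/intro.md
└── sections/
    ├── intro.md       # contains: ![](figure.png)
    └── figure.png     # not found: the reference resolves to docs/figure.png
```

To work, `intro.md` would have to reference `sections/figure.png`, i.e. write its paths as if from the including file.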

(FYI, in LaTeX you also have the import package, which 'fixes' this issue with relative paths.)

Although this is a docutils/sphinx-related matter, maybe it would be helpful to cover this behavior in the MyST documentation?

@choldgraf (Member)

I'm +1 on documenting this feature in the docs!

Am I correct in thinking that we needed this PR because the `include` directive is a special case of other directives?

@chrisjsewell (Member Author) commented Feb 26, 2020

> Am I correct in thinking that we needed this PR because the `include` directive is a special case of other directives?

Well, it was one of the 'edge-case' directives that didn't work with the minimal docutils 'Mock' classes that cover most of the core docutils/sphinx directives, because they call extra attributes/methods that hadn't yet been implemented. With enough mocked methods it probably wouldn't be a special case, but I actually think I've done a better job than the docutils code of ensuring that errors are reported correctly, pointing to the correct included document and the correct source line number.

In this PR I've also added some more methods to cover the glossary and csv-table directives, which just about covers them all now (see tests/test_renderers/test_roles_directives.py for the final outliers).


@chrisjsewell (Member Author)

> I'm +1 on documenting this feature in the docs!

Added #81 for that, so I'll merge this, unless you want to check over any of the code/CircleCI artefacts, @choldgraf?

@akhmerov (Contributor)

Does it actually cover the use case of executablebooks/MyST-NB#32 (combining multiple kernels in one file)?

@chrisjsewell (Member Author) commented Feb 26, 2020

> Does it actually cover the use case of executablebooks/MyST-NB#32 (combining multiple kernels in one file)?

For text-based documents, i.e. {kernel1.md, kernel2.md, ...} -> merge.md, it should, yes. This should be easy for you to check with rST or MyST and jupyter-sphinx at your earliest convenience 😄. For notebooks, {nb1.ipynb, nb2.ipynb, ...} -> merge.md, it would require an additional directive on the MyST-NB side, which could be written along the lines of the Include class, but specifically calling the notebook parser, to inject the resulting AST into the merge.md document node.
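To make that concrete, here is a very rough sketch of how such an `include-nb` directive might look. This is not part of this PR: it assumes jupytext can convert the notebook to MyST text (so that the nested parse goes through the active MyST parser), and all names here are illustrative.

```python
from pathlib import Path

import jupytext
from docutils import nodes
from docutils.parsers.rst import Directive
from docutils.statemachine import StringList


class IncludeNotebook(Directive):
    """Hypothetical ``include-nb``: include a notebook via text conversion."""

    required_arguments = 1  # path to the .ipynb file

    def run(self):
        # resolve the path relative to the including document
        parent = Path(self.state.document["source"]).parent
        nb_path = parent / self.arguments[0]

        # convert the notebook to MyST-flavoured text (assumes jupytext
        # supports this target format)
        notebook = jupytext.read(str(nb_path))
        text = jupytext.writes(notebook, fmt="md:myst")

        # hand the text back to the active parser, attributing the source
        # lines to the notebook so errors point at the right file
        container = nodes.container()
        lines = StringList(text.splitlines(), source=str(nb_path))
        self.state.nested_parse(lines, self.content_offset, container)
        return container.children
```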

@akhmerov (Contributor)

Yeah, I was wondering about notebooks, or otherwise files that don't have a directive for specifying the kernel, but provide the kernel either in the front matter or in metadata.

It seems that the AST resulting from the inclusion of several such files should contain several instances of the jupyter-sphinx kernel nodes or equivalent. That, in turn, would seem to require postprocessing the output of the parsing.

@chrisjsewell (Member Author) commented Feb 26, 2020

> It seems that the AST resulting from the inclusion of several such files should contain several instances of the jupyter-sphinx kernel nodes or equivalent.

Well, that's only if you actually require a 'kernel node'. Surely you only need this if you intend to execute the notebooks after the parsing phase?

MyST-NB doesn't currently use any such node, because it assumes the notebooks are pre-executed. Then, if you're (hopefully at some point) using the jupyter-cache, you retrieve outputs from it at the parsing phase, with the cache either having been populated before sphinx-build, or, as I could envisage, populated in one of sphinx's early (pre-parsing) phases, whereby:

  1. You get the list of updated notebooks from Sphinx (or text-based documents that can be converted to notebooks via jupytext). This is all of them, not just the ones that Sphinx deems outdated.
  2. You 'stage' them in the cache, so it can work out which ones require re-execution.
  3. The executor is called and the cache updated.
  4. Continue the sphinx build as normal.

Each 'included' notebook, and its outputs, is then dealt with at the time it is parsed/included. The only post-processing is what MyST-NB already does with mime bundles.
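As a sketch of how steps 1-3 could look in code (the jupyter-cache calls here are my assumption of its interface, so treat this as illustrative rather than a settled design):

```python
# hypothetical pre-build execution phase (steps 1-3 above)
from jupyter_cache import get_cache
from jupyter_cache.executors import load_executor


def execute_before_build(notebook_paths, cache_path=".jupyter_cache"):
    cache = get_cache(cache_path)
    # 1. stage every document that is, or can be converted to, a notebook
    #    (text-based files would first go through jupytext)
    for path in notebook_paths:
        cache.stage_notebook_file(str(path))
    # 2./3. execute whatever the cache deems outdated and store the outputs
    executor = load_executor("basic", cache)
    executor.run_and_cache()
    # 4. the normal sphinx build then pulls outputs from the cache at parse time
```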

@akhmerov (Contributor)

I thought that querying the cache would happen after the parsing—at least that was in your previous proposal. This is also why I thought that keeping the kernel node is important, since otherwise one cannot obtain the cache key.

You're saying it'd be done differently now?

@chrisjsewell (Member Author)

> You're saying it'd be done differently now?

Yep, I reserve the right to flip-flop in my proposals 😆 I (now) think it works better with 'getting the execution done early', because (a) it makes it easier to separate the execution from the documentation generation, and (b) there is less post-processing to deal with.

@akhmerov (Contributor)

On the other hand, if the execution and treatment of outputs is done before parsing, then it needs to be implemented once per file format. Say one wants to keep some rST files?

@chrisjsewell (Member Author)

> On the other hand, if the execution and treatment of outputs is done before parsing, then it needs to be implemented once per file format.

Why once per file format? You just gather all the files that can be converted to notebooks (e.g. that have a jupytext converter) and stage them in the cache. Note that we are doing an initial 'fast' parse here, whereby we just want to identify the kernel and the code cells, and don't need to tokenize the whole document. This may make it a bit slower than doing a single parse plus post-processing, but it makes the process a lot more robust and easy(ish) to implement.

Thinking about it now, during this 'fast' parse phase we would store a mapping of each document's docname -> hash (the one for code+kernel) in the sphinx env. Then at the parsing stage we could quickly retrieve it, without the need for a second parse.
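A minimal sketch of what that 'fast' parse could look like, assuming MyST files with a kernelspec in the front matter and code-cell fences (illustrative only, not MyST-NB code):

```python
import hashlib
import re
from pathlib import Path

# match fenced code cells and the kernel name, without a full parse
CODE_CELL = re.compile(r"^```\{code-cell\}.*?^```", re.MULTILINE | re.DOTALL)
KERNEL = re.compile(r"^kernelspec:.*?\bname:\s*(\S+)", re.MULTILINE | re.DOTALL)


def code_kernel_hash(path):
    """Hash over the kernel + code cells only: prose edits don't change it."""
    text = Path(path).read_text()
    kernel = KERNEL.search(text)
    cells = CODE_CELL.findall(text)
    payload = (kernel.group(1) if kernel else "") + "\n".join(cells)
    return hashlib.sha256(payload.encode()).hexdigest()

# stored per document in the sphinx env during the pre-parse phase, e.g.
#   env.nb_hashes[docname] = code_kernel_hash(doc_path)
```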

> Say one wants to keep some rST files?

Not sure what you mean by 'keep' here? Nothing is actually changed in the sphinx source folder; just the cache is updated.

@akhmerov (Contributor)

> Not sure what you mean by 'keep' here?

As in "keep using". Jupytext doesn't have an rst reader, AFAIK, and therefore the workflow you propose would mean implementing this. At the same time generating a sphinx AST is the one thing all formats for the EBP have in common.

@chrisjsewell (Member Author) commented Feb 26, 2020

> At the same time, generating a sphinx AST is the one thing all the formats for the EBP have in common.

Well we can discuss this more on MyST-NB, but for now, I'm going to merge this 😄

chrisjsewell merged commit 46bf7e1 into develop on Feb 26, 2020