
How would this use jupyter-cache? #32

Closed · choldgraf opened this issue Feb 23, 2020 · 14 comments

@choldgraf (Member)

Once jupyter-cache is ready for prototyping etc., we should also figure out how to use it as part of building Sphinx sites with notebooks. Here is one way to do it:

Each notebook will have a unique URI in the cache that is tied to its location on disk. In Sphinx, when we parse a source file, we also have the file location of that file. So, when a parsed source file also has a key in the jupyter-cache registry, then rather than pulling cell['outputs'] from the file and inserting it into the cell mime bundle, we could instead grab those outputs from the cache. From then on, everything proceeds as normal (somewhere around https://github.com/ExecutableBookProject/myst-nb/blob/master/sphinx_notebook/parser.py#L80).

So it would be something like the following (assuming the notebooks had already been executed and cached before Sphinx entered the equation); a rough sketch follows the list:

  • There's a configuration option like "myst_nb_use_cache"
  • When parsing a file, if the option is True, check .jupyter_cache to see if there's a URI for the file.
    • Maybe do some kind of sanity check to make sure the cache is up-to-date with the source file?
  • If there is, then when you get to https://github.com/ExecutableBookProject/myst-nb/blob/master/sphinx_notebook/parser.py#L80, grab the output corresponding to the current cell from the cache instead. It'll be returned as a mime bundle; insert it as if it were cell['outputs'].
  • From then on, everything is the same.
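
A minimal sketch of what that parser-time lookup could look like. Everything in it is hypothetical: `cache.lookup_uri` and `cache.get_outputs` stand in for whatever lookup API jupyter-cache ends up exposing, and `config` is assumed to be a plain dict of Sphinx options.

```python
# Hypothetical sketch only: `cache.lookup_uri` / `cache.get_outputs` are stand-ins,
# not real jupyter-cache API.
def resolve_cell_outputs(cell, cell_index, source_path, config, cache=None):
    """Return the outputs for a cell, preferring the cache when enabled."""
    if config.get("myst_nb_use_cache") and cache is not None:
        try:
            # the notebook's URI in the cache is tied to its location on disk
            uri = cache.lookup_uri(source_path)          # hypothetical call
            # the cached outputs come back as a mime bundle, ready to insert
            return cache.get_outputs(uri, cell_index)    # hypothetical call
        except KeyError:
            pass  # no cache entry: fall back to what is stored in the file
    # default behaviour: use whatever outputs are stored in the notebook itself
    return cell.get("outputs", [])
```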
@choldgraf (Member, Author)

@chrisjsewell is this what you had in mind?

@chrisjsewell (Member)

> @chrisjsewell is this what you had in mind?

Yep, that seems like the general idea. The one thing I'm considering is whether to match the notebooks by URI or by hash, but that wouldn't entail much change to the code.

@akhmerov (Contributor)

> In Sphinx, when we parse a source file, we also have the file location of that file.

Does that mean 1 notebook per source file?

@akhmerov (Contributor)

What if there isn't an output corresponding to the notebook?

@chrisjsewell (Member)

> Does that mean 1 notebook per source file?

Are you thinking here about how jupyter-sphinx allows multiple kernels per source file?

@akhmerov (Contributor)

Yes, indeed.

@chrisjsewell (Member)

> Yes, indeed.

Well, yeah, that won't be the case here: it will be one kernel per source file. It's just not possible to have round-trip conversion with a multi-kernel file. jupyter-sphinx only allows consecutive 'blocks' of kernel->cell->cell..., yeah? So I'd say it's much easier to just keep the separate blocks/sections in separate files. The one thing I've mentioned before is that it would be good to explore what the best way (if possible) is in Sphinx to combine multiple source pages into a single HTML page.

@chrisjsewell (Member)

Unless you see any added benefit to having multiple kernels in a single source file?

@akhmerov (Contributor)

> It's just not possible to have round-trip conversion with a multi-kernel file.

Is the round-trip conversion a goal? It seems absent from the v3 pipeline of executablebooks/meta#21.

> Unless you see any added benefit to having multiple kernels in a single source file?

Not really. If there's a clear way to combine multiple source files into one HTML output, that seems like the cleaner solution. I do think there's a use case for having multiple kernels per HTML page.

@chrisjsewell (Member)

> Is the round-trip conversion a goal?

100%, it's part of section 1 of the pipeline: you have to be able to switch between writing code (most probably in notebook form) and writing documentation / what you commit to git (most likely the MyST format file). This will likely use jupytext.
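
As a concrete illustration of that round trip, jupytext's Python API can convert between the two representations. This is only a sketch: the file paths are placeholders, and the `md:myst` format string assumes a jupytext version with MyST support.

```python
import jupytext

# read the committed MyST Markdown file back into an in-memory notebook
notebook = jupytext.read("docs/example.md")

# ... edit or execute the notebook as usual ...

# write it back out as MyST Markdown for committing to git
jupytext.write(notebook, "docs/example.md", fmt="md:myst")

# or materialise an .ipynb to work on in a notebook interface
jupytext.write(notebook, "docs/example.ipynb", fmt="ipynb")
```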

@choldgraf (Member, Author)

@akhmerov I don't think this should preclude the existence of tooling that lets you have multiple kernels per source file, but the scope of MyST-NB is about representing ipynb-like files in Sphinx, which equates to one kernel per notebook file. I want to make sure we have that basic use case covered before we get wacky with kernels :-)

@chrisjsewell (Member)

Given executablebooks/jupyter-cache#8:

> Each notebook will have a unique URI in the cache that is tied to its location on disk.

This is no longer the case; you just 'give' it a notebook and it gives you back a pointer (primary key) to the matching notebook in the cache (matched by hash), or raises a KeyError; see `cache.match_commit_notebook(nb: nbf.NotebookNode) -> int` and `cache.match_commit_file(path: str) -> int`. This means the cache is resilient to files being moved and won't require an up-to-date sanity check.

> we could instead grab those outputs from the cache.

You can either grab the entire notebook (plus artefacts), via `cache.get_commit_bundle(pk: int) -> NbBundleOut`, or query specific cells, via `cache.get_commit_codecell(pk: int, index: int) -> nbf.NotebookNode`.

> There's a configuration option like "myst_nb_use_cache"

Yes, and an option to specify the path to the cache.
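
For illustration, those calls might be combined on the Sphinx side roughly as follows. The method signatures are as quoted above, while the import path, cache location, and file path are assumptions.

```python
from jupyter_cache.cache.main import JupyterCacheBase  # import path assumed

cache = JupyterCacheBase(".jupyter_cache")  # cache location would come from config

try:
    # match the source file against the cache by content hash
    pk = cache.match_commit_file("docs/example.ipynb")
    # either pull the whole executed notebook (plus artefacts) ...
    bundle = cache.get_commit_bundle(pk)
    # ... or query the outputs of a single code cell by index
    first_cell = cache.get_commit_codecell(pk, 0)
except KeyError:
    # no cached execution matches this notebook
    pk = None
```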

@chrisjsewell (Member) · Feb 24, 2020

Now all you have to do is:

```python
import shutil

from jupyter_cache.cache.main import JupyterCacheBase  # import path assumed; may differ by version

# `notebook` is the nbformat.NotebookNode parsed from the source file (no outputs yet)
cache = JupyterCacheBase("path/to/the/cache")
try:
    # this will give you the "final" notebook with all outputs populated
    pk, notebook = cache.merge_match_into_notebook(notebook)
    # you may also want to copy execution artefacts to the build folder
    with cache.commit_artefacts_temppath(pk) as folder:
        shutil.copytree(folder, "path/in/build/folder")
except KeyError:
    # a match was not found
    pass
```

@chrisjsewell (Member)

Closed in #55 (#116 also adds support for text-based notebooks, and #87 is IMO the correct way to implement multiple kernels).
