Add sphinx_astropy.ext.example extension for building an example gallery #29

Closed
wants to merge 77 commits into from

jonathansick

This PR implements a functional example gallery Sphinx extension, as initially described in astropy/astropy#7242 — this PR supersedes #22.

The premise is that documentation pages often contain snippets of content that are useful in their own right, outside the context of the page where they were originally written. These snippets can be examples, how-tos, and so on. This Sphinx extension provides a way of surfacing these pieces of content into a centralized gallery.

From a documentation author's perspective, the main API is the example directive, which demarks example content:

.. example:: Title of the example
   :tags: first-tag, another-tag

   Content of the example.

   More content in the example.

This content is not part of the example.

The example directive does not change the visual appearance of the example content in the documentation text.

During the Sphinx build, though, that content is copied into a new, auto-generated page at /examples/title-of-the-example.html. An index page at /examples/index.html lists all examples. There are also pages for each tag that list the examples with that tag: /examples/tags/first-tag.html and /examples/tags/another-tag.html.

Demo

This is a demo of an example gallery generated from examples identified in the astropy.io and astropy.nddata packages.

http://astropy-example.jsick.codes.s3-website-us-east-1.amazonaws.com/examples/index.html

Configurability

There are three configuration variables:

  • astropy_examples_enabled. If True, the example gallery (i.e., the pages in /examples/) is generated. This defaults to False so that projects can begin using the example directive without changing the appearance of the built documentation. This provides a means of incremental adoption.

  • astropy_examples_dir. The directory where the example gallery is published. The default is examples.

  • astropy_examples_h1. Configures the character to use for "h1" headlines in reStructuredText. Defaults to #.
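
As an illustration, a project's conf.py might set these variables as follows. This is a sketch only: the variable names and defaults come from the list above, and listing the extension explicitly is an assumption that may be unnecessary where sphinx_astropy's default configuration already loads it.

```python
# conf.py (sketch): enable the example gallery for a project.

# The extension module named in this PR's title. Listing it here is an
# assumption; it may already be loaded by default.
extensions = ["sphinx_astropy.ext.example"]

# Generate the gallery pages (the default is False).
astropy_examples_enabled = True

# Directory where the gallery is published (this is the default).
astropy_examples_dir = "examples"

# Character used for "h1" headlines in generated reStructuredText
# (this is the default).
astropy_examples_h1 = "#"
```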

Processing overview

During the builder-inited phase, the preprocess_examples function scans every doc in the Sphinx source tree to find example directives. This function generates a standalone reStructuredText page for each example, a page for each tag, and the main index page. By creating these pages in the builder-inited phase, they can be parsed later on during the regular build.

When the example directive is handled, the directive passes the content through, but wrapped in a custom ExampleMarkerNode. In HTML, this node becomes a <div> with class astropy-example-source and an id attribute that identifies the example.

The standalone example pages are templated to include an example-content directive. This directive adds a <div> to the HTML page with a class of astropy-example-content and an id attribute that identifies the example that belongs there.

The build-finished phase is when the example content is copied onto the standalone example pages. Using BeautifulSoup4, the extension copies content within div.astropy-example-source tags and then replaces the div.astropy-example-content tags with that content.

Next steps

Beyond this PR, the next steps for this extension are:

  • Create a more sophisticated browsing interface on the example gallery's landing pages. We want a card layout with a thumbnail image and teaser text for each example.

  • Publish data about the standalone examples so that they can be listed from learn.astropy.org.

  • Better support for incremental Sphinx builds. Right now, the extension clears out the example directory on each build, but a more selective approach to rebuilds can be taken.

This package will gather all Sphinx extensions related to the example
gallery. There will be two main parts: a directive that marks examples,
and extensions that index and render those examples.

This boilerplate includes the standard Sphinx setup function for the
extension.
This includes the sphinx_astropy.ext.example Sphinx extension by default
in Sphinx builds.
This directive marks the scope of example content in the original
documentation, and lets authors add a title and tags.

Currently a pass-through directive. It parses the content of the
directive and adds it back to the document.

Examples are persisted in the build environment for later
post-processing (to build the example gallery). Examples are keyed by
their unique ID (a slugified version of the title). The dict items
contain metadata and the content of the example (to later generate
standalone example pages).

The directive also collects title and tags as metadata.

This target node lets us backlink to the example in the main
documentation. In HTML, the link is an id on the first element of the
example content. Example IDs are unique since they're the keys of the
env.sphinx_astropy_examples dictionary in the environment.
sphinx.testing.fixtures let us build Sphinx sites from pytest and then
inspect the built site. There can be multiple test sites; each test site
is a directory in the sphinx_astropy/tests/roots/ directory.

This is the same pattern that Sphinx uses for its tests, so this is likely
the easiest way to test our Sphinx extensions.
http://www.sphinx-doc.org/en/master/devguide.html#unit-testing

Note: I had to add the pytest_plugins line, which loads the Sphinx pytest
plugin, to a new conftest.py file at the root of the project, not to
sphinx_astropy/tests/conftest.py. The reason for this is outlined in
https://docs.pytest.org/en/latest/deprecations.html#pytest-plugins-in-non-top-level-conftest-files
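
For reference, the root-level conftest.py described here could be as small as the following sketch. The plugin module name is the sphinx.testing.fixtures module mentioned above; the exact contents of the real file are not shown in this PR description.

```python
# conftest.py at the root of the project (sketch)

# Load the pytest plugin that ships with Sphinx so that its fixtures and the
# ``sphinx`` marker are available to the whole test suite.
pytest_plugins = "sphinx.testing.fixtures"
```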

The rest of the pytest configuration is done in
sphinx_astropy/tests/conftest.py, which is where you'd expect most
configuration to go. This configuration is largely based on Sphinx's:
https://github.com/sphinx-doc/sphinx/blob/master/tests/conftest.py
The example-marker.rst file contains several instances of the example
directive, testing different conditions (having tags, or not, and having
different types of content in the example).
This test generates a site in the XML format since then it's easy to
search for nodes and their attributes.

Unfortunately this test strategy doesn't work for Sphinx <1.7, because
the pytest fixtures aren't available. Thus I have pytest skip these
tests for Sphinx <1.7. I think this is still the best way to test
Sphinx extensions, and will continue to be so in the future, because this
is how Sphinx tests itself.
There is a new test case (test-example-gallery-duplicates) because
otherwise the SphinxError would always be raised for regular testing of
the example directive.
As the docstring comment says, I found an odd case: with the numpydoc
extension enabled in the test environment, I would get false alarms
about duplicate examples already in the build environment. These
duplicate examples came from other test functions. Somehow the
environment is being preserved across builds now that numpydoc is
activated.

To make the ExampleMarkerDirective robust against this case, it now
checks that the duplicate instance is from a different document and
line number before raising a SphinxError.
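
The check described above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the function signature, the docname/lineno keys, and DuplicateExampleError (a stand-in for SphinxError so the sketch runs without Sphinx) are all hypothetical.

```python
class DuplicateExampleError(Exception):
    """Stand-in for sphinx.errors.SphinxError in this sketch."""


def check_for_existing_example(examples, example_id, docname, lineno):
    """Raise only if ``example_id`` is registered from another location.

    ``examples`` maps example IDs to dicts with ``docname`` and ``lineno``
    keys (an assumed layout).
    """
    existing = examples.get(example_id)
    if existing is None:
        return
    same_location = (
        existing["docname"] == docname and existing["lineno"] == lineno
    )
    if same_location:
        # The same directive seen again, e.g. from a preserved environment.
        return
    raise DuplicateExampleError(
        f"Example {example_id!r} at {docname}:{lineno} duplicates "
        f"{existing['docname']}:{existing['lineno']}"
    )
```

With this shape, re-reading the same directive is harmless, while a second example with the same title elsewhere still fails the build.
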
If the tests are failing, it's useful to see the debug-level logging.

Note: the '2' verbosity enables DEBUG-level logging. I can't find a
cleaner alias to this.

Also, it would be nicer to make this the default while using the sphinx
pytest mark, but I can't find an easy way to do that.
Now the test-example-gallery root is using an autodoc+numpydoc
processing pipeline in its build configuration. This confirms that the
directive does work in a docstring as expected.
The purge_doc callback is required to remove examples from a cached
environment if a document is removed in a subsequent build. Otherwise
the examples in the cached environment from previous builds would
continue to exist in subsequent builds.

The tests simulate an env-purge-doc event and separately ensure that
purge_examples got registered as an env-purge-doc callback.
This refactoring allows _check_for_existing_example to be used outside
the ExampleMarkerDirective, like in an env-merge-info event callback.
This env-merge-info callback handles merging sphinx_astropy_examples
from parallel build environments when Sphinx is run in parallel read
mode.

The tests run a full-scale integration-type build with Sphinx running in
parallel (-j 4).
These should be separate things so that the "example ID" is the
slugified version of the title (which is unique by design). Then the ref
ID has the example-src prefix to be a unique reference ID to the
example's source location. Also adds a ref_id field to the example's
dict in the build environment.
This way the translation between a title and an example ID is codified
into an API that can be used in multiple places.
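
As a rough sketch of what such an API could look like: both function names, the exact slug rules, and the hyphen after the example-src prefix are assumptions for illustration, not necessarily what this PR implements.

```python
import re


def format_title_to_example_id(title):
    """Slugify a title: lowercase it and turn other characters into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def format_example_id_to_source_ref_id(example_id):
    """Prefix the example ID to get an ID for the example's source location."""
    return f"example-src-{example_id}"


example_id = format_title_to_example_id("Title of the example")
print(example_id)  # → title-of-the-example
print(format_example_id_to_source_ref_id(example_id))
# → example-src-title-of-the-example
```
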
The content_node key in the example data stores a copy of the parsed
docutils nodes for the example. It turns out that it's easier to use the
parsed content here rather than parse it during the
process_pending_example_nodes() callback where a "state" is not readily
available for parsing reStructuredText.
The strategy behind detect_examples() is to identify examples in the
reStructuredText source before Sphinx parses them by using a regular
expression. This lets us create stubs for example pages before Sphinx
does its regular parsing.
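
To make the regular-expression strategy concrete, here is a minimal sketch. The pattern and the (title, tags) return shape are assumptions; the real detect_examples may match more cases, and this version only finds directives that start in the first column with :tags: on the very next line.

```python
import re

# Matches ".. example:: Title" at the start of a line, optionally followed by
# an indented ":tags:" option on the next line.
EXAMPLE_PATTERN = re.compile(
    r"^\.\. example::[ \t]*(?P<title>.+?)[ \t]*$"
    r"(?:\n[ \t]+:tags:[ \t]*(?P<tags>.*?)[ \t]*$)?",
    re.MULTILINE,
)


def detect_examples(text):
    """Yield (title, tags) pairs for each example directive in ``text``."""
    for match in EXAMPLE_PATTERN.finditer(text):
        raw_tags = match.group("tags") or ""
        tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
        yield match.group("title"), tags


source = """\
.. example:: Title of the example
   :tags: first-tag, another-tag

   Content of the example.

This content is not part of the example.
"""
print(list(detect_examples(source)))
# → [('Title of the example', ['first-tag', 'another-tag'])]
```
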
This configuration lets us control the directory where the example
gallery is generated.
The ExamplePage class builds upon ExampleSource, but now contains the
concerns about rendering a standalone example page.
Templates for the standalone example pages, landing pages, tag browsing
pages, and so on, can be Jinja-formatted templates. This implementation
is adapted from sphinx.ext.autosummary, which has similar needs. By
doing this, the user can customize the templates at the builder, theme,
or project level.

The render is integrated into the ExamplePage class
(ExamplePage.render), which automatically detects and uses a template
named 'astropy_example/examplepage.rst'. The extension ships with a
default implementation of the template.

The tests demonstrate rendering a standalone example page. This
isn't hooked up to piping in the content of the example yet, though.
This commit puts together the work on detecting examples from source
(detect_examples) and the work on rendering stubs for standalone example
pages (ExamplePage) and runs them as a pipeline during the builder-inited
event. This event happens early in the Sphinx build process so that we can
create example pages before Sphinx actually begins to read and process
these pages.

This also adds a new config variable, astropy_examples_h1, which
customizes the underline character used for making titles for "h1"
headings in reStructuredText.
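
The effect of astropy_examples_h1 can be pictured with a small helper. render_h1 is a hypothetical name used only for this sketch; the actual templates may build the heading differently (for example, inside Jinja).

```python
def render_h1(title, h1_char="#"):
    """Return ``title`` as a reStructuredText heading underlined with ``h1_char``."""
    return f"{title}\n{h1_char * len(title)}\n"


print(render_h1("Title of the example"))
print(render_h1("Title of the example", h1_char="="))
```
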
The landing page is the index.rst for the example gallery. It provides a
toctree for all the examples. The LandingPage class is implemented
similarly to the ExamplePage class in that it takes page data and is
responsible for computing paths, docnames, and rendering for itself.

In the future the template for the landing page could be enhanced into a
tiled gallery view, for example.

The test verifies that the index.rst file's reStructuredText is rendered
correctly.

Since there's now a toctree, individual example pages don't need the
`:orphan:` field.
The TagPage is like a specialized version of the LandingPage that
indexes examples that have a given tag.

The TagPage.generate_tag_pages constructor simultaneously makes tag
pages for the set of tags given the population of examples, and also
provides references to those tag pages with each relevant example page.
This provides a nice way to categorize examples and to provide discovery
of other tags.

This is implemented purely in the Jinja templating layer.
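
The grouping behind the tag pages can be sketched in plain Python. This is not the TagPage.generate_tag_pages implementation; the function names and the title/tags dictionary keys are assumptions, and the path layout follows the examples/tags/<tag>.html pattern from the PR description.

```python
from collections import defaultdict


def group_examples_by_tag(examples):
    """Map each tag to the sorted titles of the examples that carry it."""
    by_tag = defaultdict(list)
    for example in examples:
        for tag in example["tags"]:
            by_tag[tag].append(example["title"])
    return {tag: sorted(titles) for tag, titles in sorted(by_tag.items())}


def tag_page_path(tag, examples_dir="examples"):
    """Path of a tag page, following the layout examples/tags/<tag>.html."""
    return f"{examples_dir}/tags/{tag}.html"


examples = [
    {"title": "Reading a FITS file", "tags": ["io", "fits"]},
    {"title": "Writing a table", "tags": ["io"]},
]
for tag, titles in group_examples_by_tag(examples).items():
    print(tag_page_path(tag), titles)
```
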
This makes it easier to use from Jinja to test lengths.
This demonstrates how to use Jinja templating to provide links from a
standalone example page back to the original source page and to pages
for each associated tag.
This directive inserts parsed content for the example from the
application environment.
The source pages are read *before* standalone example pages are read to
ensure that they can be parsed into the environment and are available
to the ExampleContentDirective.
With Sphinx <= 1.7, :download: roles with external download links (i.e.,
https:// URLs) don't work. This prevents these tests from running.

Named equation reference links do not seem to work with Sphinx 1.7.

As well, the format of download links from the Matplotlib plot
extension with Sphinx 1.7 is different from that of more recent Sphinx
versions, so it's best to skip that test since the tests would need to be
customized for Sphinx 1.7.
Since substitution_reference nodes are resolved just _after_ the
ExampleMarkerDirective is run, substitution_reference nodes could be part
of the example content that is republished on the standalone example
page. Thus we also need to copy the substitution_definition nodes to
include them in the standalone example page.

Based on experimentation, it seems that traversing the document's nodes
and the directive content's nodes together only gets substitution
definitions that are written above or within the example directive. This
is a caveat that will need to be added to the documentation.
This commit takes the substitution_definitions field captured by the
ExampleMarkerDirective and inserts those substitution_definition nodes
into the standalone example page.

The key part of this is to ensure we call document.note_substitution_def
because these new substitution_defs weren't already parsed.
Normally the document.note_footnote and document.note_footnote_refs (and
their autonumbered counterparts) are called when reStructuredText is
parsed. However, since the footnotes and footnote_refs are pre-parsed in
a standalone example page, we need to manually note them. This state is
consumed by the docutils Footnotes transform.
It will be useful for ExampleSource.docname to be a real docname so it
can be used with the BuildEnvironment APIs that translate between
docnames and paths.

Now the example page template can use a new, separate attribute
abs_docname that is useful in the doc role.
This provides the actual docname, for use with Sphinx APIs.
Now the ExampleMarkerDirective wraps the generated example in a custom
container node, ExampleMarkerNode.

This node has visitors that render it as a <div> in HTML
with a class of astropy-example-source. This div+class marks the content
of an example so that it can be copied by the HTML postprocessor into
the standalone example pages.

NOTE: is_node_registered() is backported here from Sphinx 1.8+.

This small function is needed for tests, and will no longer be needed
here if Sphinx 1.8 becomes the minimum supported version of the
extension.
This node, ExampleContentNode, just provides an empty <div> on the HTML
page.

The class is "astropy-example-content" and the ID is the ID of the
example (which will be used to look up the example content).

Because we're no longer adding example content into the standalone
example page as part of the regular Sphinx/docutils build, we no longer
need to access and process any of the docutils nodes related to the
original example content.

Many tests are skipped because the expected content isn't rendered
given this change. These tests will be reactivated later when examples
are being published via an HTML postprocessor.
Content nodes and metadata are no longer stored in the
`sphinx_astropy_examples` attribute of the BuildEnvironment. Now that
examples are copied and rendered as an HTML post-processing step, we
don't need to keep this metadata in the Sphinx build. Consequently, some
additional hooks can also go:

- merge_examples in env-merge-info
- purge_examples in env-purge-doc
- reorder_example_page_reading in env-before-read-docs

This also means that the extension can be marked as safe for reading in
parallel.

Some tests no longer apply because the
BuildEnvironment.sphinx_astropy_examples attribute is no longer set. In
many cases, I've marked these to be skipped; we'll want to reactivate
these tests later under the new scheme.
Metadata is cached to the sphinx_astropy_examples attribute of the
BuildEnvironment as part of the example gallery preprocessing step
during builder-inited (recall that this caching was previously done as
part of the example directive).

This is necessary so that the postprocessing step knows where to find the
source for each example and the associated standalone example page.

This is now also the best way to check if there are duplicate examples
since all examples are combined at this point.

Notes on tests:

1. It seems this SphinxError is raised during the equivalent of the
test setup with the pytest.mark.sphinx marker. This means
that a full build is the best way to simulate this.

2. Because docstrings are not scanned by the preprocessor, examples
embedded in docstrings are no longer part of the example gallery. Thus
I've dropped the associated test that looked for the example from the
example_func docstring.
Will be used to go from example ID to the reference label for the
example's source.
This is used by the example extension to manipulate built HTML pages to
insert examples into standalone pages.
This implements a new approach for populating standalone example pages.
Rather than republish the examples to standalone example pages as part
of the Sphinx build, this approach operates on the HTML that Sphinx has
built.

The postprocess_examples function operates as a hook for the
build-finished Sphinx event. For every example in the cached
sphinx_astropy_examples attribute of the BuildEnvironment, this method
extracts the example from the source page, adapts any relative links and
image sources, and inserts the example's <div> into the standalone
example page.

If config.astropy_examples_enabled is False, then do not run post-processing.
With the large change in sphinx_astropy.ext.example to populate examples
by post processing HTML rather than copying docutils nodes, we had to
temporarily disable many of the unit tests. This commit re-works the
tests to work with the new processing strategy.

The pytest.mark.sphinx mark doesn't work with the build-finished event
(not sure why), so we can't use that approach to test the
HTML output. Instead, we use the CLI-based build of the example
projects. Since a lot of test functions consume those builds, I've
turned them into session-scoped fixtures.

Now that BeautifulSoup4 is a dependency of sphinx_astropy, the tests use
BeautifulSoup to check the build output. This approach is a lot cleaner
than directly using the built-in html.parser module.

Finally, I've also organized the tests around different types:

1. Unit tests that don't depend on a Sphinx build.
2. Unit tests that use the ``sphinx`` pytest mark. These tests operate on
   the Sphinx application instance after a build and test environment
   persistence.
3. Tests that run a Sphinx build through its command-line interface and
   analyze the resulting HTML product.
A link to an anchor, like #section-id, now works like this:

1. If the ID exists within the scope of the example, then the href is
left as-is. This is what the reader expects and minimizes the disruption
of following a link in an example to a different part of the example.

2. If the ID doesn't exist in the example, but only on the source page,
then the href is adapted to point back to the source page.
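
The two rules above can be sketched as a small pure function. This is illustrative only: adapt_anchor_href and its parameters are hypothetical names, the paths are assumed to be relative to the HTML output root, and the extension itself works on parsed HTML rather than bare strings.

```python
import posixpath


def adapt_anchor_href(href, example_ids, source_path, example_path):
    """Rewrite a same-page ``#fragment`` link for the standalone example page.

    ``example_ids`` is the set of ``id`` attributes found inside the example.
    """
    if not href.startswith("#"):
        return href  # Not a same-page anchor; handled elsewhere.
    if href[1:] in example_ids:
        return href  # Rule 1: the target travels with the example.
    # Rule 2: the target only exists on the source page, so link back to it.
    example_dir = posixpath.dirname(example_path)
    relative = posixpath.relpath(source_path, example_dir)
    return relative + href


ids = {"reading-data"}
print(adapt_anchor_href("#reading-data", ids, "io/fits.html", "examples/a.html"))
# → #reading-data
print(adapt_anchor_href("#see-also", ids, "io/fits.html", "examples/a.html"))
# → ../io/fits.html#see-also
```
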
We want to support incremental rebuilds so that only those examples on
pages that were changed are re-scanned and rebuilt. However, that
involves finer-grained cache invalidation. To make the example extension
work with incremental rebuilds we're starting with the *simplest* thing,
which is to start the example gallery fresh on each build. The
sphinx_astropy_examples attribute on the BuildEnvironment was already
being reset on each build; now the example directory in the source tree
itself is deleted and re-created on each Sphinx build.
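
The "start fresh" step amounts to something like the sketch below. reset_examples_dir is a hypothetical helper name; in the extension the directory name would presumably come from the astropy_examples_dir configuration value.

```python
import shutil
from pathlib import Path


def reset_examples_dir(srcdir, examples_dir="examples"):
    """Delete and re-create the gallery directory inside the source tree."""
    path = Path(srcdir) / examples_dir
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path
```

The trade-off is that every build regenerates all example pages, which is simple and safe but does not yet benefit from Sphinx's incremental builds.
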
Since Sphinx 1.7 is the minimum required version, it's no longer
necessary to avoid testing against Sphinx 1.6.
The original 'roots' terminology is based on how Sphinx organizes its
own tests. Specifically, that terminology also carries over into the
pytest.mark.sphinx marker's testroot parameter.

To match sphinx-automodapi, this commit renames the 'roots' directory to
'cases' (as in, test cases). The pytest.mark.sphinx marker still works,
but I've added an extra "casesdir" pytest fixture to refer to the cases
directory with the canonical terminology.
This is a draft of "getting started" documentation for
sphinx_astropy.ext.example. Right now it isn't part of a documentation
build, but could be once a Sphinx project is set up.
@jonathansick
Author

@astrofrog do you want to review this as-is, or should I first transplant this into a standalone Python package and separate GitHub repo?

I was thinking about what the package should be named, in that case. PyPI sphinx-examples is available. I was also thinking about the comment from the coord meeting that we aren't necessarily building an "example gallery" (assuming that a gallery implies images/plots). Can we incorporate some terminology into the name that reflects its nature of transplanting existing content? Maybe sphinx-highlights? What do you think @kelle ?

@jonathansick
Author

I remember that "example library" was an emerging consensus from the Coord meeting. sphinx-example-library could be the PyPI package?

@bsipocz
Member

bsipocz commented Dec 19, 2019

I rather like the mix of the two from above: sphinx-example-highlights. Calling it a library overloads the word with its everyday meaning, and that's a bit confusing...

@astrofrog
Member

@jonathansick - can we close this PR now that the extension lives in https://github.com/astropy/sphinx-example-index?

@jonathansick
Author

Yep, moving this over to the sphinx-example-index repo right now.

3 participants