Skip to content

Commit

Permalink
add readme for librustdoc
Browse files Browse the repository at this point in the history
  • Loading branch information
QuietMisdreavus committed Feb 16, 2018
1 parent 932c736 commit 7aca34b
Showing 1 changed file with 170 additions and 0 deletions.
170 changes: 170 additions & 0 deletions src/librustdoc/README.md
@@ -0,0 +1,170 @@
# The walking tour of rustdoc

Rustdoc is implemented entirely within the crate `librustdoc`. After partially compiling a crate to
get its AST (technically the HIR map) from rustc, librustdoc performs two major steps past that to
render a set of documentation:

* "Clean" the AST into a form that's more suited to creating documentation (and slightly more
resistant to churn in the compiler).
* Use this cleaned AST to render a crate's documentation, one page at a time.

Naturally, there's more than just this, and those descriptions simplify out lots of details, but
that's the high-level overview.

(Side note: this is a library crate! The `rustdoc` binary is crated using the project in
`src/tools/rustdoc`. Note that literally all that does is call the `main()` that's in this crate's
`lib.rs`, though.)

## Cheat sheet

* Use `x.py build --stage 1 src/libstd src/tools/rustdoc` to make a useable rustdoc you can run on
other projects.
* Add `src/libtest` to be able to use `rustdoc --test`.
* If you've used `rustup toolchain link local /path/to/build/$TARGET/stage1` previously, then
after the previous build command, `cargo +local doc` will Just Work.
* Use `x.py doc --stage 1 src/libstd` to use this rustdoc to generate the standard library docs.
* The completed docs will be available in `build/$TARGET/doc/std`, though the bundle is meant to
be used as though you would copy out the `doc` folder to a web server, since that's where the
CSS/JS and landing page are.
* Most of the HTML printing code is in `html/format.rs` and `html/render.rs`. It's in a bunch of
`fmt::Display` implementations and supplementary functions.
* The types that operates on are defined in `clean/mod.rs`, right next to the custom `Clean` trait
used to process them out of the rustc HIR.
* The bits specific to using rustdoc as a test harness are in `test.rs`.
* The Markdown renderer is loaded up in `html/markdown.rs`, including functions for extracting
doctests from a given block of Markdown.
* The tests on rustdoc *output* are located in `src/test/rustdoc`, where they're handled by the test
runner of rustbuild and the supplementary script `src/etc/htmldocck.py`.
* Tests on search index generation are located in `src/test/rustdoc-js`, as a series of JavaScript
files that encode queries on the standard library search index and expected results.

## From crate to clean

In `core.rs` are two central items: the `DocContext` struct, and the `run_core` function. The latter
is where rustdoc calls out to rustc to compile a crate to the point where rustdoc can take over. The
former is a state container used when crawling through a crate to gather its documentation.

The main process of crate crawling is done in `clean/mod.rs` through several implementations of the
`Clean` trait defined within. This is a conversion trait, which defines one method:

```rust
pub trait Clean<T> {
fn clean(&self, cx: &DocContext) -> T;
}
```

`clean/mod.rs` also defines the types for the "cleaned" AST used later on to render documentation
pages. Each usually accompanies an implementation of `Clean` that takes some AST or HIR type from
rustc and converts it into the appropriate "cleaned" type. "Big" items like modules or associated
items may have some extra processing in its `Clean` implementation, but for the most part these
impls are straightforward conversions. The "entry point" to this module is the `impl Clean<Crate>
for visit_ast::RustdocVisitor`, which is called by `run_core` above.

You see, I actually lied a little earlier: There's another AST transformation that happens before
the events in `clean/mod.rs`. In `visit_ast.rs` is the type `RustdocVisitor`, which *actually*
crawls a `hir::Crate` to get the first intermediate representation, defined in `doctree.rs`. This
pass is mainly to get a few intermediate wrappers around the HIR types and to process visibility
and inlining. This is where `#[doc(inline)]`, `#[doc(no_inline)]`, and `#[doc(hidden)]` are
processed, as well as the logic for whether a `pub use` should get the full page or a "Reexport"
line in the module page.

Back in `clean/mod.rs`, the other major thing that happens here is the special collection of doc
comments (and basic `#[doc=""]` attributes) into a separate field in the `Attributes` struct from
the other attributes. The primary output of this process is a `clean::Crate` with a tree of `Item`s
which describe the publicly-documentable items in the target crate.

### Hot potato

Before moving on to the next major step, a few important "passes" occur over the documentation.
These do things like combine the separate "attributes" into a single string and strip leading
whitespace to make the document easier on the markdown parser, or drop items that are not public or
deliberately hidden with `#[doc(hidden)]`. These are all implemented in the `passes/` directory, one
file per pass. By default, all of these passes are run on a crate, but the ones regarding dropping
private/hidden items can be bypassed by passing `--document-private-items` to rustdoc.

(Strictly speaking, you can fine-tune the passes run and even add your own, but [we're trying to
deprecate that][44136]. If you need finer-grain control over these passes, please let us know!)

[44136]: https://github.com/rust-lang/rust/issues/44136

## From clean to crate

This is where the "second phase" in rustdoc begins. This phase primarily lives in the `html/`
folder, and it all starts with `run()` in `html/render.rs`. This code is responsible for setting up
the `Context`, `SharedContext`, and `Cache` which are used during rendering, copying out the static
files which live in every rendered set of documentation (things like the fonts, CSS, and JavaScript
that live in `html/static/`), creating the search index, and printing out the source code rendering,
before beginning the process of rendering all the documentation for the crate.

Several functions implemented directly on `Context` take the `clean::Crate` and set up some state
between rendering items or recursing on a module's child items. From here the "page rendering"
begins, via an enormous `write!()` call in `html/layout.rs`. The parts that actually generate HTML
from the items and documentation occurs within a series of `std::fmt::Display` implementations and
functions that pass around a `&mut std::fmt::Formatter`. The top-level implementation that writes
out the page body is the `impl<'a> fmt::Display for Item<'a>` in `html/render.rs`, which switches
out to one of several `item_*` functions based on the kind of `Item` being rendered.

Depending on what kind of rendering code you're looking for, you'll probably find it either in
`html/render.rs` for major items like "what sections should I print for a struct page" or
`html/format.rs` for smaller component pieces like "how should I print a where clause as part of
some other item".

Whenever rustdoc comes across an item that should print hand-written documentation alongside, it
calls out to `html/markdown.rs` which interfaces with the Markdown parser. This is exposed as a
series of types that wrap a string of Markdown, and implement `fmt::Display` to emit HTML text. It
takes special care to enable certain features like footnotes and tables and add syntax highlighting
to Rust code blocks (via `html/highlight.rs`) before running the Markdown parser. There's also a
function in here (`find_testable_code`) that specifically scans for Rust code blocks so the
test-runner code can find all the doctests in the crate.

### From soup to nuts

(alternate title: ["An unbroken thread that stretches from those first `Cell`s to us"][video])

[video]: https://www.youtube.com/watch?v=hOLAGYmUQV0

It's important to note that the AST cleaning can ask the compiler for information (crucially,
`DocContext` contains a `TyCtxt`), but page rendering cannot. The `clean::Crate` created within
`run_core` is passed outside the compiler context before being handed to `html::render::run`. This
means that a lot of the "supplementary data" that isn't immediately available inside an item's
definition, like which trait is the `Deref` trait used by the language, needs to be collected during
cleaning, stored in the `DocContext`, and passed along to the `SharedContext` during HTML rendering.
This manifests as a bunch of shared state, context variables, and `RefCell`s.

Also of note is that some items that come from "asking the compiler" don't go directly into the
`DocContext` - for example, when loading items from a foreign crate, rustdoc will ask about trait
implementations and generate new `Item`s for the impls based on that information. This goes directly
into the returned `Crate` rather than roundabout through the `DocContext`. This way, these
implementations can be collected alongside the others, right before rendering the HTML.

## Other tricks up its sleeve

All this describes the process for generating HTML documentation from a Rust crate, but there are
couple other major modes that rustdoc runs in. It can also be run on a standalone Markdown file, or
it can run doctests on Rust code or standalone Markdown files. For the former, it shortcuts straight
to `html/markdown.rs`, optionally including a mode which inserts a Table of Contents to the output
HTML.

For the latter, rustdoc runs a similar partial-compilation to get relevant documentation in
`test.rs`, but instead of going through the full clean and render process, it runs a much simpler
crate walk to grab *just* the hand-written documentation. Combined with the aforementioned
"`find_testable_code`" in `html/markdown.rs`, it builds up a collection of tests to run before
handing them off to the libtest test runner. One notable location in `test.rs` is the function
`make_test`, which is where hand-written doctests get transformed into something that can be
executed.

## Dotting i's and crossing t's

So that's rustdoc's code in a nutshell, but there's more things in the repo that deal with it. Since
we have the full `compiletest` suite at hand, there's a set of tests in `src/test/rustdoc` that make
sure the final HTML is what we expect in various situations. These tests also use a supplementary
script, `src/etc/htmldocck.py`, that allows it to look through the final HTML using XPath notation
to get a precise look at the output. The full description of all the commands available to rustdoc
tests is in `htmldocck.py`.

In addition, there are separate tests for the search index and rustdoc's ability to query it. The
files in `src/test/rustdoc-js` each contain a different search query and the expected results,
broken out by search tab. These files are processed by a script in `src/tools/rustdoc-js` and the
Node.js runtime. These tests don't have as thorough of a writeup, but a broad example that features
results in all tabs can be found in `basic.js`. The basic idea is that you match a given `QUERY`
with a set of `EXPECTED` results, complete with the full item path of each item.

0 comments on commit 7aca34b

Please sign in to comment.