Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

New docs #3602

Merged
merged 30 commits into from Jan 13, 2020
Merged

New docs #3602

merged 30 commits into from Jan 13, 2020

Conversation

DeNeutoy
Copy link
Contributor

@DeNeutoy DeNeutoy commented Jan 10, 2020

This PR brings a new way of building docs for allennlp:

Pros

  • docstrings are now markdown rather than Restructured Text
  • The docs look much nicer
  • Our tutorials are integrated with the docs
  • The docs have very good search functionality with autocomplete etc.
  • We do not have to manually maintain a doc structure; instead, it is auto-generated.
  • Responsive design that will be usable on mobile
  • Google Analytics integration

Cons

  • The docs take longer to generate, making re-building them automatically on every PR not feasible.

The new docs use:

Build Process

The docs are built using a new scripts/build_docs.py script, which crawls through the allennlp/allennlp directory, generating documentation in the docs/api/relative/path/to/file.md directory. Additionally, this script builds a YAML representation of the new docs, which is inserted into the mkdocs.yml file at the root of the repository, which describes how to build the mkdocs site. The site can then be built using mkdocs build.

Instead of doing this on every commit, I have added a --check flag to this script, which allows us to check if any new files which need documenting have been added. If this check failed in CI, someone would just have to run python scripts/build_docs.py and commit the changes. This is not strictly necessary (as we can just run it when we want to deploy the docs), but helps keep the repository up to date with the docs that we have deployed. I could remove this check entirely if people think it's not necessary. It's also feasible that we could not check in the docs/api section, as it is a lot of files/LOC. I'm easy either way, I included it mostly for clarity of what's happening and what the auto-generated files look like.

Changes

  • Remove old docs
  • Move most markdown tutorials into the docs/tutorials directory, so they can be included in the docs. Resolves part of Update tutorials for the new split repos and v1.0 #3438
  • Adds a new build script which generates the docs from the allennlp source.
  • Added docs/, containing the new markdown docs.

After reviewing this PR, I need to make a few more changes. In particular, to make the new docs render parameters for classes and functions nicely, I need to change all occurrences of

Parameters
----------

to

# Parameters

I am waiting until this is reviewed to do this, because it will touch almost every file in the repository, so I want to make that change and merge immediately, so I don't have to resolve 1 trillion merge conflicts.

@DeNeutoy DeNeutoy changed the title New docs [WIP] New docs Jan 10, 2020
@DeNeutoy
Copy link
Contributor Author

After some discussion at the issue review, the only docs that will be checked in are those which are not autogenerated.

Copy link
Contributor

@matt-gardner matt-gardner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting to see actual docstring changes, as if they are rendered with markdown instead of rst there will need to be a bunch of changes. You mentioned the Parameters one, but there are a bunch more.

docs/README.md Outdated
@@ -0,0 +1,220 @@
<p align="center"><img width="40%" src="doc/static/allennlp-logo-dark.png" /></p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicating the file at the top level? Was that removed? Oh, was this an autogenerated file that you didn't intend to be included at this point? Same with the other top-level markdown documents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I can remove them. They are copied in.

@DeNeutoy
Copy link
Contributor Author

Yes, there are a lot more changes - however, i'm only going to do the Parameters and Returns ones initially, as I need to do these changes immediately before merging, otherwise the conflicts will be hideous. Instead, we discussed having an allennlp docs party to clean up the rest of them as a team.

@DeNeutoy DeNeutoy requested a review from schmmd January 11, 2020 01:33
@DeNeutoy DeNeutoy changed the title [WIP] New docs New docs Jan 11, 2020

# doc theme
sphinx_rtd_theme
pydoc-markdown==2.0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be pinned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't user requirements, so it's less important that they are unpinned. We should try using a bot to upgrade some of these dependencies.

@@ -20,11 +20,6 @@ This tutorial will show both you how to create `DatasetReader`s
that allow this lazy behavior, and how to handle this laziness
when training a model.

If you're not interested in the details,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you're changing this in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I've moved 90% of the tutorials to live in the docs, and this one was unnecessary. It made sense to delete it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -20,11 +20,6 @@ This tutorial will show both you how to create `DatasetReader`s
that allow this lazy behavior, and how to handle this laziness
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this file, but it's the closest file I can comment on: why have docs/api/.gitkeep? You want the directory to be around in some script that generates the docs? Why not just create that directory in the script?

mkdocs.yml Outdated
repo_url: https://github.com/allenai/allennlp
edit_uri: edit/master/docs/
#google_analytics:
# - 'UA-133183413-1'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a safe id to have in source control?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's served publicly from the pages that register GA. For example, it is on view-source:https://demo.allennlp.org/reading-comprehension (and we just copy and paste code from Google).

from ruamel.yaml import YAML


exclude_files = [".DS_Store", "__init__.py", "__init__.pyc", "README.md", "version.py", "run.py"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there documentation in any __init__.py files that we want to keep? If so, where should it go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why exclude run.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have docs in __init__ in my opinion, to make discoverability easier. run.py is excluded because no user will actually use this, as it is installed as an entrypoint in setup.py, meaning the entrypoint is allennlp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few cases where we have non-trivial logic in an __init__ file. It'd be nice if those showed up in the docs somehow. Not really sure how to accomplish this. I'm fine with moving whatever docs are in an existing (trivial) __init__ file to a better location.

On run.py, it might be nice to include it with some better documentation around "if you want to write your own entrypoint to allennlp, instead of using ours, you might take this script as a starting place."

Copy link
Contributor Author

@DeNeutoy DeNeutoy Jan 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other cases apart from the PytorchXtoXWrappers in modules/etc? Those probably shouldn't be in the docs.

We don't want people to be writing "entrypoints" for allennlp (we want them to just write scripts, using allennlp as a library) and anyway, run.py is not the place to start for someone writing a script.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked the others I was thinking of (initializers, learning rate schedulers, regularizers), and the logic I was thinking of is not actually inside an __init__ file, so that was a false alarm.

By "entrypoint" I just meant "script", but I just checked what's in run.py, and you're right, that's not a good example, and it doesn't need to be in the docs.

def render_docs(relative_src_path: str, src_file: str, to_file: str, modifier="++") -> None:
"""
Shells out to pydocmd, which creates a .md file from the docstrings of python functions and classes in
the file we specify.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to include more description of the modifier parameter here. Say what the default does, and point to documentation for what the other options are and what they do (e.g., "See usage information here: https://pypi.org/project/pydoc-markdown/").

exclude_files = [".DS_Store", "__init__.py", "__init__.pyc", "README.md", "version.py", "run.py"]


def render_docs(relative_src_path: str, src_file: str, to_file: str, modifier="++") -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name makes me think this should operate on more than one file. You have build_docs_for_file below, but just render_docs here. Maybe render_file instead?

"""
Build docs for an individual python file. If `check` is passed, we don't generate the docs themselves,
but instead we just generate what the corresponding mkdocs YAML would be, so we can compare it to the
current version and see if it has changed (which would mean the docs need re-building).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me what is in the YAML - if I update a docstring without changing the module structure at all, would this trigger a rebuild?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it was a halfway house solution to retain some checks - but I'll remove this as we don't need it.

if args.check:
if nav_obj[docs_key] != nav_entries:
raise ValueError(
"Changes are required to the API documentation. Run `./scripts/build_docs.py` and commit the changes."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These instructions aren't true anymore, right?

"Changes are required to the API documentation. Run `./scripts/build_docs.py` and commit the changes."
)
else:
print("No changes required!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looking at this whole check logic, I don't think it does what we want. We'd want to update minor docstring fixes, and those wouldn't be caught. We should probably just remove the whole check stuff, and always rebuild on master commits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also potentially run render_file on all files that were touched in a specific PR, to make sure there aren't rendering errors introduced by that PR. But that doesn't need to be done now.

Copy link
Member

@schmmd schmmd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Love it.

We should note that we're changing from updating the docs per release to updating the docs on an on-going basis. I think this is important, and if we want to additionally add per-release docs (e.g. http://docs.allennlp.org/v0.8.0) we can do that additionally. Presently I don't know which version our docs correspond to exactly...

@DeNeutoy
Copy link
Contributor Author

@schmmd Just FYI, this change doesn't really necessitate changing how often we release docs. Broadly, i'm in favour of Spacy's way of maintaining a single source for docs, and just noting new features which will only work with certain versions. I think we can solidify that once 1) We release 1.0 and 2) we deploy docs somewhere other than Github Pages.

@DeNeutoy DeNeutoy merged commit 7d0d0d9 into allenai:master Jan 13, 2020
@DeNeutoy DeNeutoy deleted the new_docs branch January 13, 2020 17:56
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants