This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Merge remote-tracking branch 'origin/master' into vision
dirkgr committed Oct 6, 2020
2 parents f1e46fd + 39ddb52 commit e39a5f6
Showing 54 changed files with 1,381 additions and 345 deletions.
5 changes: 1 addition & 4 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -11,10 +11,7 @@ assignees: ''
Please fill this template entirely and do not erase any of it.
We reserve the right to close incomplete bug reports without a response.
If you can't fill in the checklist then it's likely that this is a question, not a bug,
in which case it probably belongs on our discourse forum instead:
https://discourse.allennlp.org/
If you have a question rather than a bug, please ask on [Stack Overflow](https://stackoverflow.com/questions/tagged/allennlp) rather than posting an issue here.
-->

## Checklist
10 changes: 10 additions & 0 deletions .github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,10 @@
---
name: Question
about: Ask a question
title: ''
labels: 'question'
assignees: ''

---

Please ask questions on [Stack Overflow](https://stackoverflow.com/questions/tagged/allennlp) rather than on GitHub. We monitor and triage questions on Stack Overflow with the AllenNLP label and questions there are more easily searchable for others.
6 changes: 3 additions & 3 deletions .github/workflows/master.yml
@@ -42,7 +42,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python: ['3.6', '3.7']
python: ['3.7', '3.8']

steps:
- uses: actions/checkout@v2
@@ -110,7 +110,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python: ['3.6', '3.7']
python: ['3.7', '3.8']

steps:
- uses: actions/checkout@v2
@@ -230,7 +230,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python: ['3.6', '3.7']
python: ['3.7', '3.8']

steps:
- name: Setup Python
4 changes: 2 additions & 2 deletions .github/workflows/pull_request.yml
@@ -49,7 +49,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python: ['3.6', '3.7']
python: ['3.7', '3.8']

steps:
- uses: actions/checkout@v2
@@ -118,7 +118,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python: ['3.6', '3.7']
python: ['3.7', '3.8']

steps:
- uses: actions/checkout@v2
43 changes: 41 additions & 2 deletions CHANGELOG.md
@@ -29,19 +29,58 @@ data loaders. Those are coming soon.

### Added

- Added a `build-vocab` subcommand that can be used to build a vocabulary from a training config file.
- Added `tokenizer_kwargs` argument to `PretrainedTransformerMismatchedIndexer`.
- Added `tokenizer_kwargs` and `transformer_kwargs` arguments to `PretrainedTransformerMismatchedEmbedder`.
- Added official support for Python 3.8.
- Added a script: `scripts/release_notes.py`, which automatically prepares markdown release notes from the
CHANGELOG and commit history.
- Added a flag `--predictions-output-file` to the `evaluate` command, which tells AllenNLP to write the
predictions from the given dataset to the file as JSON lines.
- Added the ability to ignore certain missing keys when loading a model from an archive. This is done
by adding a class-level variable called `authorized_missing_keys` to any PyTorch module that a `Model` uses.
If defined, `authorized_missing_keys` should be a list of regex string patterns.
- Added `FBetaMultiLabelMeasure`, a multi-label Fbeta metric. This is a subclass of the existing `FBetaMeasure`.
- Added ability to pass additional keyword arguments to `cached_transformers.get()`, which will be passed on to `AutoModel.from_pretrained()`.
- Added an `overrides` argument to `Predictor.from_path()`.
- Added a `cached-path` command.
- Added a function `inspect_cache` to `common.file_utils` that prints useful information about the cache. This can also
be used from the `cached-path` command with `allennlp cached-path --inspect`.
- Added a function `remove_cache_entries` to `common.file_utils` that removes any cache entries matching the given
glob patterns. This can be used from the `cached-path` command with `allennlp cached-path --remove some-files-*`.
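
The glob-based cache removal described in the last item can be sketched with nothing but the standard library. This is an illustrative assumption about the mechanism, not the actual `common.file_utils.remove_cache_entries` implementation (which also handles lock files and extraction directories):

```python
import fnmatch
import os
import tempfile

def remove_cache_entries(cache_dir: str, patterns: list) -> int:
    """Delete every file in `cache_dir` whose name matches one of the
    glob patterns, and return how many bytes were freed. Illustrative
    sketch only."""
    freed = 0
    for name in os.listdir(cache_dir):
        if any(fnmatch.fnmatch(name, p) for p in patterns):
            path = os.path.join(cache_dir, name)
            freed += os.path.getsize(path)
            os.remove(path)
    return freed

# Usage: populate a throwaway cache dir, then remove matching entries.
cache = tempfile.mkdtemp()
for name in ["some-files-1.bin", "some-files-2.bin", "keep.bin"]:
    with open(os.path.join(cache, name), "wb") as f:
        f.write(b"x" * 10)
freed = remove_cache_entries(cache, ["some-files-*"])
print(freed)                     # 20 bytes freed
print(sorted(os.listdir(cache)))  # ['keep.bin']
```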

### Changed

- Subcommands that don't require plugins will no longer cause plugins to be loaded or have an `--include-package` flag.
- Allow overrides to be JSON string or `dict`.
- `transformers` dependency updated to version 3.1.0.
- When `cached_path` is called on a local archive with `extract_archive=True`, the archive is now extracted into a unique subdirectory of the cache root instead of a subdirectory of the archive's directory. The extraction directory is also unique to the modification time of the archive, so if the file changes, subsequent calls to `cached_path` will know to re-extract the archive.
- Removed the `truncation_strategy` parameter to `PretrainedTransformerTokenizer`. The way we're calling the tokenizer, the truncation strategy has no effect anyway.
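
The extraction-directory change above keys the directory on both the archive's path and its modification time. A minimal sketch of that idea, assuming a hash-based naming scheme (the real `cached_path` naming may differ):

```python
import hashlib
import os
import tempfile

def extraction_dir(cache_root: str, archive_path: str) -> str:
    """Derive a cache subdirectory unique to both the archive's absolute
    path and its mtime, so a changed archive maps to a fresh directory.
    Illustrative sketch of the idea, not AllenNLP's exact scheme."""
    mtime = os.path.getmtime(archive_path)
    key = f"{os.path.abspath(archive_path)}-{mtime}".encode()
    return os.path.join(cache_root, hashlib.sha256(key).hexdigest() + "-extracted")

cache_root = tempfile.mkdtemp()
fd, archive = tempfile.mkstemp(suffix=".tar.gz")
os.close(fd)
first = extraction_dir(cache_root, archive)
# Changing the archive's mtime yields a different extraction directory.
os.utime(archive, (0, 0))
second = extraction_dir(cache_root, archive)
print(first != second)  # True
```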

### Removed

- Removed `common.util.is_master` function.

### Fixed

- Class decorators now displayed in API docs.
- Fixed up the documentation for the `allennlp.nn.beam_search` module.
- Ignore `*args` when constructing classes with `FromParams`.
- Ensured some consistency in the types of the values that metrics return.
- Fix a PyTorch warning by explicitly providing the `as_tuple` argument (leaving
it as its default value of `False`) to `Tensor.nonzero()`.
- Remove temporary directory when extracting model archive in `load_archive`
at end of function rather than via `atexit`.
- Fixed a bug where using `cached_path()` offline could return a cached resource's lock file instead
of the cache file.
- Fixed a bug where `cached_path()` would fail if passed a `cache_dir` with the user home shortcut `~/`.
- Fixed a bug in our doc building script where markdown links did not render properly
if the "href" part of the link (the part inside the `()`) was on a new line.
- Changed how gradients are zeroed out with an optimization. See [this video from NVIDIA](https://www.youtube.com/watch?v=9mS1fIYj1So)
at around the 9 minute mark.
- Fixed a bug where parameters to a `FromParams` class that are dictionaries wouldn't get logged
when an instance is instantiated with `from_params`.
- Fixed a bug in distributed training where the vocab would be saved from every worker, when it should have been saved by only the local master process.

## [v1.1.0](https://github.com/allenai/allennlp/releases/tag/v1.1.0) - 2020-09-08

2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -74,7 +74,7 @@ When you're ready to contribute code to address an open issue, please follow the
upstream https://github.com/allenai/allennlp.git (fetch)
upstream https://github.com/allenai/allennlp.git (push)

Finally, you'll need to create a Python 3.6 or 3.7 virtual environment suitable for working on AllenNLP. There a number of tools out there that making working with virtual environments easier, but the most direct way is with the [`venv` module](https://docs.python.org/3.7/library/venv.html) in the standard library.
Finally, you'll need to create a Python 3 virtual environment suitable for working on AllenNLP. There are a number of tools that make working with virtual environments easier, but the most direct way is with the [`venv` module](https://docs.python.org/3.7/library/venv.html) in the standard library.

Once your virtual environment is activated, you can install your local clone in "editable mode" with

2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,7 +1,7 @@
# This Dockerfile creates an environment suitable for downstream usage of AllenNLP.
# It's built from a wheel installation of allennlp.

FROM python:3.7
FROM python:3.8

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
2 changes: 1 addition & 1 deletion Dockerfile.test
@@ -1,6 +1,6 @@
# Used to build an image for running tests.

FROM python:3.7
FROM python:3.8

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
9 changes: 5 additions & 4 deletions README.md
@@ -29,8 +29,9 @@

- [Website](https://allennlp.org/)
- [Guide](https://guide.allennlp.org/)
- [Forum](https://discourse.allennlp.org)
- [Documentation](https://docs.allennlp.org/) ( [latest](https://docs.allennlp.org/latest/) | [stable](https://docs.allennlp.org/stable/) | [master](https://docs.allennlp.org/master/) )
- [Forum](https://discourse.allennlp.org)
- [Stack Overflow](https://stackoverflow.com/questions/tagged/allennlp)
- [Contributing Guidelines](CONTRIBUTING.md)
- [Officially Supported Models](https://github.com/allenai/allennlp-models)
- [Pretrained Models](https://github.com/allenai/allennlp-models/blob/master/allennlp_models/pretrained.py)
@@ -49,7 +50,7 @@ created a couple of template repositories that you can use as a starting place:
* If you'd prefer to use python code to configure your experiments and run your training loop, use
[this template](https://github.com/allenai/allennlp-template-python-script). There are a few
things that are currently a little harder in this setup (loading a saved model, and using
distributed training), but except for those its functionality is equivalent to the config files
distributed training), but otherwise it's functionally equivalent to the config files
setup.

In addition, there are external tutorials:
@@ -105,12 +106,12 @@ We support AllenNLP on Mac and Linux environments. We presently do not support W
#### Setting up a virtual environment

[Conda](https://conda.io/) can be used to set up a virtual environment with the
version of Python required for AllenNLP. If you already have a Python 3.6 or 3.7
version of Python required for AllenNLP. If you already have a Python 3
environment you want to use, you can skip to the 'installing via pip' section.

1. [Download and install Conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html).

2. Create a Conda environment with Python 3.7:
2. Create a Conda environment with Python 3.7 (3.6 or 3.8 would work as well):

```
conda create -n allennlp python=3.7
59 changes: 29 additions & 30 deletions RELEASE_PROCESS.md
@@ -1,63 +1,62 @@
# AllenNLP GitHub and PyPI Release Process

This document describes the procedure for releasing new versions of the core library.
Most of the heavy lifting is actually done on GitHub Actions.
All you have to do is ensure the version in `allennlp/version.py` matches the target release version
and then trigger a GitHub release with the right tag.

> ❗️ This assumes you are using a clone of the main repo with the remote `origin` pointed
to `git@github.com:allenai/allennlp.git` (or the `HTTPS` equivalent).

The format of the tag should be `v{VERSION}`, i.e. the intended version of the release preceded by a `v`.
So for the version `1.0.0` release the tag will be `v1.0.0`.

To make things easier, start by setting the tag to an environment variable, `TAG`.
Then you can copy and paste the commands below without worrying about mistyping the tag.

## Steps

1. Update `allennlp/version.py` (if needed) with the correct version and the `CHANGELOG.md` so that everything under the "Unreleased" section is now under a section corresponding to this release. Then commit and push these changes with:
1. Set the environment variable `TAG`, which should be of the form `v{VERSION}`.

For example, if the version of the release is `1.0.0`, you should set `TAG` to `v1.0.0`:

```bash
export TAG='v1.0.0'
```
git commit -a -m "Prepare for release $TAG"
git push

Or if you use `fish`:

```fish
set -x TAG 'v1.0.0'
```

At this point `echo $TAG` should exactly match the output of `./scripts/get_version.py current`.

2. Then add the tag in git to mark the release:
2. Update `allennlp/version.py` with the correct version. Then check that the output of

```
git tag $TAG -m "Release $TAG"
python scripts/get_version.py current
```

3. Push the tag to the main repo.
matches the `TAG` environment variable.

3. Update the `CHANGELOG.md` so that everything under the "Unreleased" section is now under a section corresponding to this release.

4. Commit and push these changes with:

```
git commit -a -m "Prepare for release $TAG" && git push
```

5. Then add the tag in git to mark the release:

```
git push --tags origin master
git tag $TAG -m "Release $TAG" && git push --tags
```

4. Find the tag you just pushed [on GitHub](https://github.com/allenai/allennlp/tags) and
click edit. Now copy over the latest section from the [`CHANGELOG.md`](https://raw.githubusercontent.com/allenai/allennlp/master/CHANGELOG.md). And finally, add a section called "Commits" with the output of a command like the following:
6. Find the tag you just pushed [on GitHub](https://github.com/allenai/allennlp/tags), click edit, then copy over the output of:

```bash
OLD_TAG=$(git describe --always --tags --abbrev=0 $TAG^)
git log $OLD_TAG..$TAG --oneline
```

```fish
set -x OLD_TAG (git describe --always --tags --abbrev=0 $TAG^)
git log $OLD_TAG..$TAG --oneline
python scripts/release_notes.py
```

On a Mac, for example, you can just pipe the above command into `pbcopy`.

5. Click "Publish Release", and if this is a pre-release make sure you check that box.
7. Check the box "This is a pre-release" if the release is a release candidate (ending with `rc*`). Otherwise leave it unchecked.

That's it! GitHub Actions will handle the rest.
8. Click "Publish Release". GitHub Actions will then handle the rest, including publishing the package to PyPI and the Docker image to Docker Hub.


6. After publishing the release for the core repo, follow the same process to publish a release for the `allennlp-models` repo.
9. After the [GitHub Actions workflow](https://github.com/allenai/allennlp/actions?query=workflow%3AMaster+event%3Arelease) finishes, follow the same process to publish a release for the `allennlp-models` repo.
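
The tag convention used throughout these steps (`v{VERSION}`, with an `rc*` suffix marking a release candidate) can be checked mechanically. A small illustrative helper, not part of the repo's scripts:

```python
import re

def is_prerelease(tag: str) -> bool:
    """Return True if `tag` names a release candidate (e.g. 'v1.0.0rc1'),
    False for a full release (e.g. 'v1.0.0'). Raises ValueError if the
    tag does not match the 'v{VERSION}' shape at all."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)(rc\d+)?", tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return match.group(4) is not None

print(is_prerelease("v1.0.0"))     # False
print(is_prerelease("v1.0.0rc1"))  # True
```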


## Fixing a failed release
66 changes: 46 additions & 20 deletions allennlp/commands/__init__.py
@@ -1,10 +1,13 @@
import argparse
import logging
from typing import Any, Optional
import sys
from typing import Any, Optional, Tuple, Set

from overrides import overrides

from allennlp import __version__
from allennlp.commands.build_vocab import BuildVocab
from allennlp.commands.cached_path import CachedPath
from allennlp.commands.evaluate import Evaluate
from allennlp.commands.find_learning_rate import FindLearningRate
from allennlp.commands.predict import Predict
@@ -46,28 +49,54 @@ def add_argument(self, *args, **kwargs):
super().add_argument(*args, **kwargs)


def create_parser(prog: Optional[str] = None) -> argparse.ArgumentParser:
def parse_args(prog: Optional[str] = None) -> Tuple[argparse.ArgumentParser, argparse.Namespace]:
"""
Creates the argument parser for the main program.
Creates the argument parser for the main program and uses it to parse the args.
"""
parser = ArgumentParserWithDefaults(description="Run AllenNLP", prog=prog)
parser.add_argument("--version", action="version", version=f"%(prog)s {__version__}")

subparsers = parser.add_subparsers(title="Commands", metavar="")

for subcommand_name in sorted(Subcommand.list_available()):
subcommand_class = Subcommand.by_name(subcommand_name)
subcommand = subcommand_class()
subparser = subcommand.add_subparser(subparsers)
subparser.add_argument(
"--include-package",
type=str,
action="append",
default=[],
help="additional packages to include",
)
subcommands: Set[str] = set()

def add_subcommands():
for subcommand_name in sorted(Subcommand.list_available()):
if subcommand_name in subcommands:
continue
subcommands.add(subcommand_name)
subcommand_class = Subcommand.by_name(subcommand_name)
subcommand = subcommand_class()
subparser = subcommand.add_subparser(subparsers)
if subcommand_class.requires_plugins:
subparser.add_argument(
"--include-package",
type=str,
action="append",
default=[],
help="additional packages to include",
)

# Add all default registered subcommands first.
add_subcommands()

# If we need to print the usage/help, or the subcommand is unknown,
# we'll call `import_plugins()` to register any plugin subcommands first.
argv = sys.argv[1:]
plugins_imported: bool = False
if not argv or argv == ["--help"] or argv[0] not in subcommands:
import_plugins()
plugins_imported = True
# Add subcommands again in case one of the plugins has a registered subcommand.
add_subcommands()

# Now we can parse the arguments.
args = parser.parse_args()

if not plugins_imported and Subcommand.by_name(argv[0]).requires_plugins: # type: ignore
import_plugins()

return parser
return parser, args


def main(prog: Optional[str] = None) -> None:
@@ -77,17 +106,14 @@ def main(prog: Optional[str] = None) -> None:
work for them, unless you use the ``--include-package`` flag or you make your code available
as a plugin (see [`plugins`](./plugins.md)).
"""
import_plugins()

parser = create_parser(prog)
args = parser.parse_args()
parser, args = parse_args(prog)

# If a subparser is triggered, it adds its work as `args.func`.
# So if no such attribute has been added, no subparser was triggered,
# so give the user some help.
if "func" in dir(args):
# Import any additional modules needed (to register custom classes).
for package_name in args.include_package:
for package_name in getattr(args, "include_package", []):
import_module_and_submodules(package_name)
args.func(args)
else:
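
The lazy plugin-loading pattern in this diff — register built-in subcommands first, and only pay the import cost of plugins when the requested subcommand might come from one — can be sketched in plain `argparse` terms. All names below are illustrative stand-ins, not AllenNLP's actual API:

```python
import argparse

# Built-in subcommands are always available; plugin subcommands are only
# merged in when needed, standing in for `import_plugins()`.
BUILTIN_COMMANDS = {"train": lambda args: "training",
                    "evaluate": lambda args: "evaluating"}
PLUGIN_COMMANDS = {"my-plugin-cmd": lambda args: "plugin running"}

def parse_and_run(argv):
    commands = dict(BUILTIN_COMMANDS)
    # Load plugins only if we must: no args, help requested, or the
    # subcommand is unknown (it might be registered by a plugin).
    if not argv or argv == ["--help"] or argv[0] not in commands:
        commands.update(PLUGIN_COMMANDS)
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="command")
    for name in sorted(commands):
        subparsers.add_parser(name)
    args = parser.parse_args(argv)
    return commands[args.command](args) if args.command else None

print(parse_and_run(["train"]))          # built-ins run without plugins
print(parse_and_run(["my-plugin-cmd"]))  # unknown name triggers plugin load
```

The design choice mirrors the diff: the common case (a built-in subcommand) never imports plugins, which keeps startup fast, while unknown or help invocations fall back to the full registry.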
