VQAv2 (#4639)

* albert works, but bert-base-uncased still gives zero gradients
* Note
* Formatting
* Adds Registrable base classes for image operations
* Adds a real example of an image2image module
* Run the new code (without implementation) in the nlvr2 reader
* Solve some issue involving circular imports
* add new modules for vilbert
* add parameters for detectron image loader.
* push current code on implementing proposal generator.
* push current progress on proposal generator
* Update FasterRCNNProposalGenerator & Merge Detectron2 config
* Loading of weights should now work
* black, flake, mypy
* Run detectron pipeline pieces one at a time
  This is unfinished and will not run this way.
* Fix the data format for the backbone
* Handle image sizes separately
* remove drop and mask functionality from reader
* make comment better
* remove proposal_embedder, and finish proposal generator
* working on grid embedder
* added simple test for resnet backbone, which passes
* Got proposal generator test passing
* Change default number of detections per image: 100 => 36
* Fix detectron config hierarchy: test_detectron_per_image
* Make number of detections configurable & Add test
* rename ProposalGenerator to RegionDetector
* try to fix makefile
* another attempt at makefile
* quotes in the pip command...
* added a simple test for the dataset reader, made it pass
* add feature caching to the dataset reader
* another try with the makefile
* a better temporary fix for installing detectron
* writing files before committing is good...
* fix tests
* fix (at least part of) the vilbert tests
* ok, this makefile change should actually work
* add torchvision, try to remove eager import of detectron code
* flake
* cleanup
* more cleanup
* mypy, flake
* add back code I shouldn't have removed
* black
* test and flake fixes
* fix region_detector for multiple images and add feature and coords padding
* fix imports
* restore null grid embedder
* add back (todo) null region detector
* Bring back import changes, to fix circular imports caused by NLVR2 reader
* region detector test passing
* model test finally passing
* update torchvision version
* add vqav2 dataset
* add gpu support for detectron feature extraction
* add lmdbCache to cache feature into lmdb database
* fix typo
* update vqa jsonnet
* fix url adding by cat
* Fixes type annotation
* Fixes borked error message
* New feature cache
* Formatting
* Fix the tensor cache
* Be explicit about our dependencies
* Use the new tensor cache
* Adds a test using the tensor cache
* Run NLVR dataprep on GPU
* Tqdm when finding images
* Fixes padding in array field
* Adjust max_length when truncating in PretrainedTransformerTokenizer
* Fewer print statements
* remove VQA from this branch and copy default vilbert parameters.
* add VQAv2 dataset
* Added dataset reader and model tests, which are now passing
* Sanjay's vision features cache script (#4633)
* Use LMDB cache in NLVR2 dataset reader; fix a few typos
* Standalone script for caching image features
* Removing reference to LMDB cache in NLVR2 dataset reader
* Adding back asterisk in nlvr2 dataset reader
* Fixing one variable name mistake
* Decreasing batch size and making a few cuda-related changes
* Loading images in batches to avoid GPU OOM error
* Pedantic changes for consistency
* Run the pre-processing with the models and not the data loading
* Filter out paths of images already cached
* Add image extensions other than png
* Fixes import error
* Makes the vision features script work alongside other scripts or training runs

Co-authored-by: sanjays <sanjays@ip-10-0-0-157.us-west-2.compute.internal>
Co-authored-by: sanjays <sanjays@ip-10-1-10-157.us-west-2.compute.internal>
Co-authored-by: Sanjay Subramanian <sanjays@allennlp-server1.corp.ai2>
Co-authored-by: Sanjay Subramanian <sanjays_ssubramanian@hotmail.com>

* Adds missing imports
* Makes TensorCache into a real MutableMapping
* Formatting
* Changelog
* Fix typecheck
* Makes the NLVR2 reader work with Pete's new code
* Fix type annotation
* Formatting
* Backwards compatibility
* Restore NLVR to former glory
* Types and multi-process reading for VQAv2
* Formatting
* Fix tests
* Fix broken config
* Update grid embedder test
* Fix vilbert_from_huggingface configuration
* Don't run the vilbert_from_huggingface test anymore
* Remove unused test fixtures
* Fix the region detector test
* Fix vilbert-from-huggingface and bring it back
* Fuck the linter
* Fix for VQA test
* Why was this metric disabled?
* Black and flake
* Re-add VQA reader
* Image featurizers now need to be called with sizes
* Run the region detector test on GPU
* Run more stuff on GPU
  The CPU test runner doesn't have enough memory.
* Depend on newer version of Detectron
* Reinstall Detectron before running tests
* Just force CUDA to be on, instead of reinstalling Detectron2
* Fixes VQA2 DatasetReader
* Fix documentation
* Detectron needs CUDA_HOME to be set during install
  At least this thing fails quickly.
* Try a different way of wrangling the detectron installer
* Try a different way of wrangling the detectron installer
* Bring back amp
* Refactored VQA reader
* More training paths
* Remove debug code
* Don't check in debug code
* Auto-detect GPU to use
* Apply indexers later
* Fix typo
* Register the model
* Fields live on CPU. Only batches get GPUs.
* black
* black, flake
* mypy
* more flake
* More realistic training config
* Adds a basic Predictor for VQAv2
* Make vilbert output human-readable
* Forgot to enumerate
* Use the right namespace
* Trying to make tests faster, and passing
* add image prefix when loading coco image
* fix vqav2 dataset reader and config file
* use two regions, to make tests pass
* black
* Output probabilities in addition to logits
* Make it possible to turn off the cache
* Turn off the cache in the predictor
* Fix the VQA predictor
* change the experiment to the default vilbert hyperparams.
* add default experiment_from_huggingface.json
* fix typos in vqa reader
* Proper probabilities
* Formatting
* Remove unused variable
* Make mypy happy
* Fixed loss function, metric, and got tests to pass
* Updates the big training config
* Put real settings into the vilbert_vqa config
* Strings are lists in Python
* Make mypy happy
* Formatting
* Unsatisfying mypy
* Config changes to make this run
* Fix dimensionality of embeddings
* clean the code and add the image_num_heads and combine_num_heads
* fix answer vocab and add save and load from pre-extracted vocab
* fix loss and update save_answer_vocab script
* Typo
* Fixed fusion method
* Tweaking the VQA config some more
* Moved the from_huggingface config
* 20 epochs
* Set up the learning rate properly
* Simplify
* Hardcoded answer vocab
* Don't be lazy
* Steps per epoch cannot be None
* Let's chase the right score
* Fixing some parameter names
* Fields are stored on CPUs
* Bigger batch size, easier distributed training
* Don't run the debug code by default
* VQA with the Transformer Toolkit (#4729)
* transformer toolkit: BertEmbeddings
* transformer toolkit: BertSelfAttention
* transformer toolkit: BertSelfOutput
* transformer toolkit: BertAttention
* transformer toolkit: BertIntermediate
* transformer toolkit: BertOutput
* transformer toolkit: BertLayer
* transformer toolkit: BertBiAttention
* transformer toolkit: BertEmbeddings
* transformer toolkit: BertSelfAttention
* transformer toolkit: BertSelfOutput
* transformer toolkit: BertAttention
* transformer toolkit: BertIntermediate
* transformer toolkit: BertOutput
* transformer toolkit: BertLayer
* transformer toolkit: BertBiAttention
* Attention scoring functions
* merging output and self output
* utility to replicate layers, further cleanup
* adding sinusoidal positional encoding
* adding activation layer
* adding base class for generic loading of pretrained weights
* further generalizing, adding tests
* updates
* adding bimodal encoder, kwargs in from_pretrained_module
* vilbert using transformer toolkit
* fixing test function
* changing to torch.allclose
* fixing attention score api
* bug fix in bimodal output
* changing to older attention modules
* _construct_default_mapping returns mapping
* adding kwargs to _get_input_arguments, adding examples
* using cached_transformers
* making transformer_encoder more general
* added get_relevant_module, loading by name
* fixing constructor name
* undoing failure after merge
* misc minor changes
* Transformer toolkit (#4577)
* transformer toolkit: BertEmbeddings
* transformer toolkit: BertSelfAttention
* transformer toolkit: BertSelfOutput
* transformer toolkit: BertAttention
* transformer toolkit: BertIntermediate
* transformer toolkit: BertOutput
* transformer toolkit: BertLayer
* transformer toolkit: BertBiAttention
* transformer toolkit: BertEmbeddings
* transformer toolkit: BertSelfAttention
* transformer toolkit: BertSelfOutput
* transformer toolkit: BertAttention
* transformer toolkit: BertIntermediate
* transformer toolkit: BertOutput
* transformer toolkit: BertLayer
* transformer toolkit: BertBiAttention
* Attention scoring functions
* merging output and self output
* utility to replicate layers, further cleanup
* adding sinusoidal positional encoding
* adding activation layer
* adding base class for generic loading of pretrained weights
* further generalizing, adding tests
* updates
* adding bimodal encoder, kwargs in from_pretrained_module
* vilbert using transformer toolkit
* fixing test function
* changing to torch.allclose
* fixing attention score api
* bug fix in bimodal output
* changing to older attention modules
* _construct_default_mapping returns mapping
* adding kwargs to _get_input_arguments, adding examples
* using cached_transformers
* making transformer_encoder more general
* added get_relevant_module, loading by name
* fixing constructor name
* undoing failure after merge
* misc minor changes

Co-authored-by: Dirk Groeneveld <dirkg@allenai.org>

* separate num_attention_heads for both modalities, default arguments
* adding tests for toolkit examples
* debug statements for failing test
* removing debug statements, reordering
* Typo
* Some compatibility with the transformer toolkit
* Reorganize the image inputs
* More transformer toolkit compatibility
* Debug settings
* Let's be more tolerant
* Fix how VilBERT runs

Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com>

* Make the region detector and region embedder lazy
* Fix references to the model
* Make various automated tests pass
* Formatting
* More logging
* One more logging statement
* Read answer vocab from vocab file instead of determining it automatically
* Don't keep the files open so long
* Use most of the validation set for training as well
* Get ready to be lazy
* Upgrade paths
* Be lazy
* Keep unanswerable questions only during test time
* Fix the from_huggingface config
* Fixes the VQA score
* VQA specific metric
* Fixes some tests
* Tests pass!
* Formatting
* Use the correct directory
* Use the region detector that's meant for testing
* Read the test split properly
* Be a little more verbose while discovering images
* Modernize Vilbert VQA
* Update NLVR, but it still doesn't run
* Formatting
* Remove NLVR
* Fix the last test
* Formatting
* Conditionally export the VilbertVqaPredictor
* ModuleNotFoundError is a type of ImportError
* Fix test-install
* Try the broken test with a fixed seed
* Try a bunch of seeds
* Smaller model to get bigger magnitudes
* Now that the test works, we don't need to specify the seeds anymore

Co-authored-by: Matt Gardner <mattg@allenai.org>
Co-authored-by: jiasenlu <jiasenlu@gatech.edu>
Co-authored-by: Jaemin Cho <heythisischo@gmail.com>
Co-authored-by: jiasenlu <echosenm@gmail.com>
Co-authored-by: sanjays <sanjays@ip-10-0-0-157.us-west-2.compute.internal>
Co-authored-by: sanjays <sanjays@ip-10-1-10-157.us-west-2.compute.internal>
Co-authored-by: Sanjay Subramanian <sanjays@allennlp-server1.corp.ai2>
Co-authored-by: Sanjay Subramanian <sanjays_ssubramanian@hotmail.com>
Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com>
Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com>
allenai · Nov 23, 2020 · b659e66
1 parent c787230
commit b659e66
Showing 47 changed files with 1,764 additions and 689 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,6 +10,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- Added `TensorCache` class for caching tensors on disk
+- Added reader for the NLVR2 dataset
+- Added cache for Detectron models that we might re-use several times in the code base
+- Added abstraction and concrete implementation for image loading
+- Added abstraction and concrete implementation for `GridEmbedder`
+- Added abstraction and demo implementation for an image augmentation module.
+- Added abstraction and concrete implementation for region detectors.
 - A new high-performance default `DataLoader`: `MultiProcessDataLoading`.
 - A `MultiTaskModel` and abstractions to use with it, including `Backbone` and `Head`.  The
   `MultiTaskModel` first runs its inputs through the `Backbone`, then passes the result (and
@@ -33,7 +40,7 @@ dataset at every epoch) and a `MultiTaskScheduler` (for ordering the instances w
 - The `DataLoader` now decides whether to load instances lazily or not.
   With the `PyTorchDataLoader` this is controlled with the `lazy` parameter, but with
   the `MultiProcessDataLoading` this is controlled by the `max_instances_in_memory` setting.
-- `TensorField` is now implemented in terms of torch tensors, not numpy.
+- `ArrayField` is now called `TensorField`, and implemented in terms of torch tensors, not numpy.
 
 
 ## Unreleased (1.x branch)

diff --git a/allennlp/commands/train.py b/allennlp/commands/train.py
@@ -483,6 +483,9 @@ def _train_worker(
     return None
 
 
+DataPath = Union[str, List[str], Dict[str, str]]
+
+
 class TrainModel(Registrable):
     """
     This class exists so that we can easily read a configuration file with the `allennlp train`
@@ -554,16 +557,16 @@ def from_partial_objects(
         serialization_dir: str,
         local_rank: int,
         dataset_reader: DatasetReader,
-        train_data_path: str,
+        train_data_path: DataPath,
         model: Lazy[Model],
         data_loader: Lazy[DataLoader],
         trainer: Lazy[Trainer],
         vocabulary: Lazy[Vocabulary] = Lazy(Vocabulary),
         datasets_for_vocab_creation: List[str] = None,
         validation_dataset_reader: DatasetReader = None,
-        validation_data_path: str = None,
+        validation_data_path: DataPath = None,
         validation_data_loader: Lazy[DataLoader] = None,
-        test_data_path: str = None,
+        test_data_path: DataPath = None,
         evaluate_on_test: bool = False,
         batch_weight_key: str = "",
     ) -> "TrainModel":
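
The hunk above widens the three data-path parameters from `str` to the new `DataPath` alias. As a rough illustration of the three shapes that alias admits, here is a minimal sketch; `flatten_data_path` is a hypothetical helper written for this example and does not appear in `train.py`.

```python
from typing import Dict, List, Union

# Same alias as the one added to train.py in this diff.
DataPath = Union[str, List[str], Dict[str, str]]


def flatten_data_path(path: DataPath) -> List[str]:
    """Hypothetical helper: list every file a DataPath value refers to."""
    if isinstance(path, str):
        # A single path, the only form accepted before this change.
        return [path]
    if isinstance(path, dict):
        # A mapping from a caller-chosen key to a path; keep only the paths.
        return list(path.values())
    # Otherwise it is a list of paths.
    return list(path)


print(flatten_data_path("train.json"))
print(flatten_data_path(["part1.json", "part2.json"]))
print(flatten_data_path({"questions": "q.json", "annotations": "a.json"}))
```

How a given reader interprets a list or a dict is up to that reader; the alias only describes what the training command will pass through.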

diff --git a/allennlp/common/testing/model_test_case.py b/allennlp/common/testing/model_test_case.py
@@ -73,6 +73,7 @@ def ensure_model_can_train_save_and_load(
         metric_terminal_value: float = None,
         metric_tolerance: float = 1e-4,
         disable_dropout: bool = True,
+        seed: int = None,
     ):
         """
         # Parameters
@@ -108,6 +109,11 @@ def ensure_model_can_train_save_and_load(
             If True we will set all dropout to 0 before checking gradients. (Otherwise, with small
             datasets, you may get zero gradients because of unlucky dropout.)
         """
+        if seed is not None:
+            random.seed(seed)
+            numpy.random.seed(seed)
+            torch.manual_seed(seed)
+
         save_dir = self.TEST_DIR / "save_and_load_test"
         archive_file = save_dir / "model.tar.gz"
         model = train_model_from_file(param_file, save_dir, overrides=overrides)
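
The new `seed` argument seeds Python's `random`, NumPy, and PyTorch before the model is trained, so a flaky test can be pinned to one reproducible run. The sketch below shows the same idea in isolation. `seed_everything` is a hypothetical name, and the guarded imports are an assumption made only so that the example also runs where NumPy or PyTorch is missing; the test case itself imports both unconditionally.

```python
import random


def seed_everything(seed: int) -> None:
    """Hypothetical helper mirroring the seeding added to the test case."""
    random.seed(seed)
    try:
        import numpy

        numpy.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch only.
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is optional in this sketch only.


seed_everything(13)
first = [random.random() for _ in range(3)]
seed_everything(13)
second = [random.random() for _ in range(3)]
print(first == second)  # Re-seeding replays the same sequence of draws.
```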

diff --git a/allennlp/common/util.py b/allennlp/common/util.py
@@ -27,6 +27,7 @@
     Tuple,
     TypeVar,
     Union,
+    Sequence,
 )
 
 import numpy
@@ -143,7 +144,7 @@ def lazy_groups_of(iterable: Iterable[A], group_size: int) -> Iterator[List[A]]:
 
 
 def pad_sequence_to_length(
-    sequence: List,
+    sequence: Sequence,
     desired_length: int,
     default_value: Callable[[], Any] = lambda: 0,
     padding_on_right: bool = True,
@@ -174,6 +175,7 @@ def pad_sequence_to_length(
 
     padded_sequence : `List`
     """
+    sequence = list(sequence)
     # Truncates the sequence to the desired length.
     if padding_on_right:
         padded_sequence = sequence[:desired_length]
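
Accepting any `Sequence` means callers may pass a tuple, and concatenating a tuple with the list of padding values would raise a `TypeError`; converting to a list first avoids that. The following is a simplified standalone sketch of the function under that reading. Only the signature, the `list(sequence)` line, and the right-truncation line are visible in this diff; the rest of the body is an assumption about how the padding is completed.

```python
from typing import Any, Callable, List, Sequence


def pad_sequence_to_length(
    sequence: Sequence,
    desired_length: int,
    default_value: Callable[[], Any] = lambda: 0,
    padding_on_right: bool = True,
) -> List:
    # Copy into a list so tuples (or any other Sequence) behave like lists below.
    sequence = list(sequence)
    # Truncate to the desired length, keeping the start or the end.
    if padding_on_right:
        padded_sequence = sequence[:desired_length]
    else:
        padded_sequence = sequence[-desired_length:]
    # Assumed remainder: fill up with default values on the chosen side.
    values_to_pad = [default_value() for _ in range(desired_length - len(padded_sequence))]
    if padding_on_right:
        return padded_sequence + values_to_pad
    return values_to_pad + padded_sequence


print(pad_sequence_to_length((1, 2, 3), 5))
print(pad_sequence_to_length((1, 2, 3), 5, padding_on_right=False))
print(pad_sequence_to_length((1, 2, 3), 2))
```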
@@ -342,8 +344,8 @@ def import_module_and_submodules(package_name: str) -> None:
         # Import at top level
         try:
             module = importlib.import_module(package_name)
-        except ModuleNotFoundError as err:
-            if err.name in ("detectron2", "torchvision"):
+        except ImportError as err:
+            if err.name in {"detectron2", "torchvision"}:
                 logger.warning(
                     "vision module '%s' is unavailable since '%s' is not installed",
                     package_name,
@@ -651,6 +653,21 @@ def format_size(size: int) -> str:
     return f"{size}B"
 
 
+def nan_safe_tensor_divide(numerator, denominator):
+    """Performs division and handles divide-by-zero.
+
+    On zero-division, sets the corresponding result elements to zero.
+    """
+    result = numerator / denominator
+    mask = denominator == 0.0
+    if not mask.any():
+        return result
+
+    # remove nan
+    result[mask] = 0.0
+    return result
+
+
 def shuffle_iterable(i: Iterable[T], pool_size: int = 1024) -> Iterable[T]:
     import random
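
A short usage sketch for `nan_safe_tensor_divide`, assuming PyTorch is installed. The function body is copied from the hunk above. Note that the mask is built from the denominator, so both `0 / 0` (which would be `nan`) and `x / 0` (which would be `inf`) end up as zero, even though the inline comment mentions only `nan`.

```python
import torch


def nan_safe_tensor_divide(numerator, denominator):
    """Performs division and sets results to zero wherever the denominator is zero."""
    result = numerator / denominator
    mask = denominator == 0.0
    if not mask.any():
        return result
    result[mask] = 0.0
    return result


numerator = torch.tensor([1.0, 0.0, 3.0])
denominator = torch.tensor([2.0, 0.0, 0.0])

# Plain division would give 0.5, nan and inf for these three positions.
safe = nan_safe_tensor_divide(numerator, denominator)
print(safe)

# The inputs are untouched, because `result` is a fresh tensor.
print(numerator, denominator)
```

This kind of helper is typically useful for metrics such as precision or recall, where a class with no predictions produces a zero denominator.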
 

diff --git a/allennlp/data/data_loaders/multi_process_data_loader.py b/allennlp/data/data_loaders/multi_process_data_loader.py
@@ -81,7 +81,7 @@ class MultiProcessDataLoader(DataLoader):
     max_instances_in_memory: `int`, optional (default = `None`)
         If not specified, all instances will be read and cached in memory for the duration
         of the data loader's life. This is generally ideal when your data can fit in memory
-        during training. However, when you're datasets are too big, using this option
+        during training. However, when your datasets are too big, using this option
         will turn on lazy loading, where only `max_instances_in_memory` instances are processed
         at a time.
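
To make the effect of `max_instances_in_memory` concrete, the sketch below shows the general idea of holding only a bounded chunk of instances at a time. It is not the loader's actual implementation; `chunked` is a hypothetical helper and plain integers stand in for instances.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def chunked(instances: Iterable[T], max_instances_in_memory: Optional[int]) -> Iterator[List[T]]:
    """Hypothetical helper: yield instances in bounded chunks."""
    if max_instances_in_memory is None:
        # No limit given: materialize everything once, as in the non-lazy case.
        yield list(instances)
        return
    iterator = iter(instances)
    while True:
        # Pull at most `max_instances_in_memory` items from the stream.
        chunk = list(islice(iterator, max_instances_in_memory))
        if not chunk:
            return
        yield chunk


print(list(chunked(range(5), 2)))
print(list(chunked(range(3), None)))
```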
 

diff --git a/allennlp/data/dataset_readers/__init__.py b/allennlp/data/dataset_readers/__init__.py
@@ -20,7 +20,7 @@
 from allennlp.data.dataset_readers.text_classification_json import TextClassificationJsonReader
 
 try:
-    from allennlp.data.dataset_readers.nlvr2 import Nlvr2Reader
+    from allennlp.data.dataset_readers.vqav2 import VQAv2Reader
 except ModuleNotFoundError as err:
     if err.name not in ("detectron2", "torchvision"):
         raise
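
The guard above inspects `err.name` to tell a missing optional dependency apart from any other failed import. Since `ModuleNotFoundError` is a subclass of `ImportError`, catching `ImportError` (as the `util.py` hunk in this commit now does) covers both. A minimal sketch of the pattern follows; `try_import` and its `optional` parameter are hypothetical names used only for illustration.

```python
import importlib
from types import ModuleType
from typing import Optional, Set


def try_import(module_name: str, optional: Set[str]) -> Optional[ModuleType]:
    """Hypothetical helper: import a module, tolerating known-optional ones."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # `err.name` is the module that could not be found.
        if err.name not in optional:
            raise
        return None


print(issubclass(ModuleNotFoundError, ImportError))
print(try_import("json", optional=set()) is not None)
print(try_import("surely_not_installed_pkg_xyz", optional={"surely_not_installed_pkg_xyz"}))
```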
diff --git a/allennlp/data/dataset_readers/dataset_reader.py b/allennlp/data/dataset_readers/dataset_reader.py
@@ -1,7 +1,7 @@
 from dataclasses import dataclass
 import itertools
 from os import PathLike
-from typing import Iterable, Iterator, Optional, Union, TypeVar
+from typing import Iterable, Iterator, Optional, Union, TypeVar, Dict, List
 import logging
 import warnings
 
@@ -58,6 +58,9 @@ class DistributedInfo:
 
 _T = TypeVar("_T")
 
+PathOrStr = Union[PathLike, str]
+DatasetReaderInput = Union[PathOrStr, List[PathOrStr], Dict[str, PathOrStr]]
+
 
 class DatasetReader(Registrable):
     """
@@ -178,14 +181,19 @@ def __init__(
         if util.is_distributed():
             self._distributed_info = DistributedInfo(dist.get_world_size(), dist.get_rank())
 
-    def read(self, file_path: Union[PathLike, str]) -> Iterator[Instance]:
+    def read(self, file_path: DatasetReaderInput) -> Iterator[Instance]:
         """
         Returns an iterator of instances that can be read from the file path.
         """
         if not isinstance(file_path, str):
-            file_path = str(file_path)
-
-        for instance in self._multi_worker_islice(self._read(file_path)):
+            if isinstance(file_path, list):
+                file_path = [str(f) for f in file_path]
+            elif isinstance(file_path, dict):
+                file_path = {k: str(v) for k, v in file_path.items()}
+            else:
+                file_path = str(file_path)
+
+        for instance in self._multi_worker_islice(self._read(file_path)):  # type: ignore
             if self._worker_info is None:
                 # If not running in a subprocess, it's safe to apply the token_indexers right away.
                 self.apply_token_indexers(instance)
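
The new branch in `read` converts every path-like value to `str` while preserving the container shape. The standalone sketch below reproduces that normalization outside the class. `normalize_file_path` is a hypothetical name, and `PurePosixPath` is used only so the printed strings are the same on every platform.

```python
from os import PathLike
from pathlib import PurePosixPath
from typing import Dict, List, Union

PathOrStr = Union[PathLike, str]
DatasetReaderInput = Union[PathOrStr, List[PathOrStr], Dict[str, PathOrStr]]


def normalize_file_path(file_path: DatasetReaderInput) -> Union[str, List[str], Dict[str, str]]:
    """Hypothetical helper mirroring the isinstance checks added to `read`."""
    if isinstance(file_path, str):
        return file_path
    if isinstance(file_path, list):
        return [str(f) for f in file_path]
    if isinstance(file_path, dict):
        return {k: str(v) for k, v in file_path.items()}
    # A single path-like object.
    return str(file_path)


print(normalize_file_path(PurePosixPath("data/train.json")))
print(normalize_file_path([PurePosixPath("a.json"), "b.json"]))
print(normalize_file_path({"images": PurePosixPath("imgs"), "questions": "q.json"}))
```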