This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

VQAv2 (#4639)
* albert works, but bert-base-uncased still gives zero gradients

* Note

* Formatting

* Adds Registrable base classes for image operations

* Adds a real example of an image2image module

* Run the new code (without implementation) in the nlvr2 reader

* Solve some issue involving circular imports

* add new modules for vilbert

* add parameters for detectron image loader.

* push current code on implementing proposal generator.

* push current progress on proposal generator

* Update FasterRCNNProposalGenerator & Merge Detectron2 config

* Loading of weights should now work

* black, flake, mypy

* Run detectron pipeline pieces one at a time

This is unfinished and will not run this way.

* Fix the data format for the backbone

* Handle image sizes separately

* remove drop and mask functionality from reader

* make comment better

* remove proposal_embedder, and finish proposal generator

* working on grid embedder

* added simple test for resnet backbone, which passes

* Got proposal generator test passing

* Change default number of detections per image: 100 => 36

* Fix detectron config hierarchy: test_detectron_per_image

* Make number of detections configurable & Add test

* rename ProposalGenerator to RegionDetector

* try to fix makefile

* another attempt at makefile

* quotes in the pip command...

* added a simple test for the dataset reader, made it pass

* add feature caching to the dataset reader

* another try with the makefile

* a better temporary fix for installing detectron

* writing files before committing is good...

* fix tests

* fix (at least part of) the vilbert tests

* ok, this makefile change should actually work

* add torchvision, try to remove eager import of detectron code

* flake

* cleanup

* more cleanup

* mypy, flake

* add back code I shouldn't have removed

* black

* test and flake fixes

* fix region_detector for multiple images and add feature and coords padding

* fix imports

* restore null grid embedder

* add back (todo) null region detector

* Bring back import changes, to fix circular imports caused by NLVR2
reader

* region detector test passing

* model test finally passing

* update torchvision version

* add vqav2 dataset

* add gpu support for detectron feature extraction

* add lmdbCache to cache feature into lmdb database

* fix typo

* update vqa jsonnet

* fix url adding by cat

* Fixes type annotation

* Fixes borked error message

* New feature cache

* Formatting

* Fix the tensor cache

* Be explicit about our dependencies

* Use the new tensor cache

* Adds a test using the tensor cache

* Run NLVR dataprep on GPU

* Tqdm when finding images

* Fixes padding in array field

* Adjust max_length when truncating in PretrainedTransformerTokenizer

* Fewer print statements

* remove VQA from this branch and copy default vilbert parameters.

* add VQAv2 dataset

* Added dataset reader and model tests, which are now passing

* Sanjay's vision features cache script (#4633)

* Use LMDB cache in NLVR2 dataset reader; fix a few typos

* Standalone script for caching image features

* Removing reference to LMDB cache in NLVR2 dataset reader

* Adding back asterisk in nlvr2 dataset reader

* Fixing one variable name mistake

* Decreasing batch size and making a few cuda-related changes

* Loading images in batches to avoid GPU OOM error

* Pedantic changes for consistency

* Run the pre-processing with the models and not the data loading

* Filter out paths of images already cached

* Add image extensions other than png

* Fixes import error

* Makes the vision features script work alongside other scripts or training runs

Co-authored-by: sanjays <sanjays@ip-10-0-0-157.us-west-2.compute.internal>
Co-authored-by: sanjays <sanjays@ip-10-1-10-157.us-west-2.compute.internal>
Co-authored-by: Sanjay Subramanian <sanjays@allennlp-server1.corp.ai2>
Co-authored-by: Sanjay Subramanian <sanjays_ssubramanian@hotmail.com>

* Adds missing imports

* Makes TensorCache into a real MutableMapping

* Formatting

* Changelog

* Fix typecheck

* Makes the NLVR2 reader work with Pete's new code

* Fix type annotation

* Formatting

* Backwards compatibility

* Restore NLVR to former glory

* Types and multi-process reading for VQAv2

* Formatting

* Fix tests

* Fix broken config

* Update grid embedder test

* Fix vilbert_from_huggingface configuration

* Don't run the vilbert_from_huggingface test anymore

* Remove unused test fixtures

* Fix the region detector test

* Fix vilbert-from-huggingface and bring it back

* Fuck the linter

* Fix for VQA test

* Why was this metric disabled?

* Black and flake

* Re-add VQA reader

* Image featurizers now need to be called with sizes

* Run the region detector test on GPU

* Run more stuff on GPU

The CPU test runner doesn't have enough memory.

* Depend on newer version of Detectron

* Reinstall Detectron before running tests

* Just force CUDA to be on, instead of reinstalling Detectron2

* Fixes VQA2 DatasetReader

* Fix documentation

* Detectron needs CUDA_HOME to be set during install

At least this thing fails quickly.

* Try a different way of wrangling the detectron installer

* Try a different way of wrangling the detectron installer

* Bring back amp

* Refactored VQA reader

* More training paths

* Remove debug code

* Don't check in debug code

* Auto-detect GPU to use

* Apply indexers later

* Fix typo

* Register the model

* Fields live on CPU. Only batches get GPUs.

* black

* black, flake

* mypy

* more flake

* More realistic training config

* Adds a basic Predictor for VQAv2

* Make vilbert output human-readable

* Forgot to enumerate

* Use the right namespace

* Trying to make tests faster, and passing

* add image prefix when loading coco image

* fix vqav2 dataset reader and config file

* use two regions, to make tests pass

* black

* Output probabilities in addition to logits

* Make it possible to turn off the cache

* Turn off the cache in the predictor

* Fix the VQA predictor

* change the experiment to the default vilbert hyperparams.

* add default experiment_from_huggingface.json

* fix typos in vqa reader

* Proper probabilities

* Formatting

* Remove unused variable

* Make mypy happy

* Fixed loss function, metric, and got tests to pass

* Updates the big training config

* Put real settings into the vilbert_vqa config

* Strings are sequences in Python

* Make mypy happy

* Formatting

* Unsatisfying mypy

* Config changes to make this run

* Fix dimensionality of embeddings

* clean the code and add the image_num_heads and combine_num_heads

* fix answer vocab and add save and load from pre-extracted vocab

* fix loss and update save_answer_vocab script

* Typo

* Fixed fusion method

* Tweaking the VQA config some more

* Moved the from_huggingface config

* 20 epochs

* Set up the learning rate properly

* Simplify

* Hardcoded answer vocab

* Don't be lazy

* Steps per epoch cannot be None

* Let's chase the right score

* Fixing some parameter names

* Fields are stored on CPUs

* Bigger batch size, easier distributed training

* Don't run the debug code by default

* VQA with the Transformer Toolkit (#4729)

* transformer toolkit: BertEmbeddings

* transformer toolkit: BertSelfAttention

* transformer toolkit: BertSelfOutput

* transformer toolkit: BertAttention

* transformer toolkit: BertIntermediate

* transformer toolkit: BertOutput

* transformer toolkit: BertLayer

* transformer toolkit: BertBiAttention

* transformer toolkit: BertEmbeddings

* transformer toolkit: BertSelfAttention

* transformer toolkit: BertSelfOutput

* transformer toolkit: BertAttention

* transformer toolkit: BertIntermediate

* transformer toolkit: BertOutput

* transformer toolkit: BertLayer

* transformer toolkit: BertBiAttention

* Attention scoring functions

* merging output and self output

* utility to replicate layers, further cleanup

* adding sinusoidal positional encoding

* adding activation layer

* adding base class for generic loading of pretrained weights

* further generalizing, adding tests

* updates

* adding bimodal encoder, kwargs in from_pretrained_module

* vilbert using transformer toolkit

* fixing test function

* changing to torch.allclose

* fixing attention score api

* bug fix in bimodal output

* changing to older attention modules

* _construct_default_mapping returns mapping

* adding kwargs to _get_input_arguments, adding examples

* using cached_transformers

* making transformer_encoder more general

* added get_relevant_module, loading by name

* fixing constructor name

* undoing failure after merge

* misc minor changes

* Transformer toolkit (#4577)

* transformer toolkit: BertEmbeddings

* transformer toolkit: BertSelfAttention

* transformer toolkit: BertSelfOutput

* transformer toolkit: BertAttention

* transformer toolkit: BertIntermediate

* transformer toolkit: BertOutput

* transformer toolkit: BertLayer

* transformer toolkit: BertBiAttention

* transformer toolkit: BertEmbeddings

* transformer toolkit: BertSelfAttention

* transformer toolkit: BertSelfOutput

* transformer toolkit: BertAttention

* transformer toolkit: BertIntermediate

* transformer toolkit: BertOutput

* transformer toolkit: BertLayer

* transformer toolkit: BertBiAttention

* Attention scoring functions

* merging output and self output

* utility to replicate layers, further cleanup

* adding sinusoidal positional encoding

* adding activation layer

* adding base class for generic loading of pretrained weights

* further generalizing, adding tests

* updates

* adding bimodal encoder, kwargs in from_pretrained_module

* vilbert using transformer toolkit

* fixing test function

* changing to torch.allclose

* fixing attention score api

* bug fix in bimodal output

* changing to older attention modules

* _construct_default_mapping returns mapping

* adding kwargs to _get_input_arguments, adding examples

* using cached_transformers

* making transformer_encoder more general

* added get_relevant_module, loading by name

* fixing constructor name

* undoing failure after merge

* misc minor changes

Co-authored-by: Dirk Groeneveld <dirkg@allenai.org>

* separate num_attention_heads for both modalities, default arguments

* adding tests for toolkit examples

* debug statements for failing test

* removing debug statements, reordering

* Typo

* Some compatibility with the transformer toolkit

* Reorganize the image inputs

* More transformer toolkit compatibility

* Debug settings

* Let's be more tolerant

* Fix how VilBERT runs

Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com>

* Make the region detector and region embedder lazy

* Fix references to the model

* Make various automated tests pass

* Formatting

* More logging

* One more logging statement

* Read answer vocab from vocab file instead of determining it automatically

* Don't keep the files open so long

* Use most of the validation set for training as well

* Get ready to be lazy

* Upgrade paths

* Be lazy

* Keep unanswerable questions only during test time

* Fix the from_huggingface config

* Fixes the VQA score

* VQA specific metric

* Fixes some tests

* Tests pass!

* Formatting

* Use the correct directory

* Use the region detector that's meant for testing

* Read the test split properly

* Be a little more verbose while discovering images

* Modernize Vilbert VQA

* Update NLVR, but it still doesn't run

* Formatting

* Remove NLVR

* Fix the last test

* Formatting

* Conditionally export the VilbertVqaPredictor

* ModuleNotFoundError is a type of ImportError

* Fix test-install

* Try the broken test with a fixed seed

* Try a bunch of seeds

* Smaller model to get bigger magnitudes

* Now that the test works, we don't need to specify the seeds anymore

Co-authored-by: Matt Gardner <mattg@allenai.org>
Co-authored-by: jiasenlu <jiasenlu@gatech.edu>
Co-authored-by: Jaemin Cho <heythisischo@gmail.com>
Co-authored-by: jiasenlu <echosenm@gmail.com>
Co-authored-by: sanjays <sanjays@ip-10-0-0-157.us-west-2.compute.internal>
Co-authored-by: sanjays <sanjays@ip-10-1-10-157.us-west-2.compute.internal>
Co-authored-by: Sanjay Subramanian <sanjays@allennlp-server1.corp.ai2>
Co-authored-by: Sanjay Subramanian <sanjays_ssubramanian@hotmail.com>
Co-authored-by: Akshita Bhagia <akshita23bhagia@gmail.com>
Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com>
11 people committed Nov 23, 2020
1 parent c787230 commit b659e66
Showing 47 changed files with 1,764 additions and 689 deletions.
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -10,6 +10,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added

+- Added `TensorCache` class for caching tensors on disk
+- Added reader for the NLVR2 dataset
+- Added cache for Detectron models that we might re-use several times in the code base
+- Added abstraction and concrete implementation for image loading
+- Added abstraction and concrete implementation for `GridEmbedder`
+- Added abstraction and demo implementation for an image augmentation module.
+- Added abstraction and concrete implementation for region detectors.
 - A new high-performance default `DataLoader`: `MultiProcessDataLoading`.
 - A `MultiTaskModel` and abstractions to use with it, including `Backbone` and `Head`. The
   `MultiTaskModel` first runs its inputs through the `Backbone`, then passes the result (and
@@ -33,7 +40,7 @@ dataset at every epoch) and a `MultiTaskScheduler` (for ordering the instances w
 - The `DataLoader` now decides whether to load instances lazily or not.
   With the `PyTorchDataLoader` this is controlled with the `lazy` parameter, but with
   the `MultiProcessDataLoading` this is controlled by the `max_instances_in_memory` setting.
-- `TensorField` is now implemented in terms of torch tensors, not numpy.
+- `ArrayField` is now called `TensorField`, and implemented in terms of torch tensors, not numpy.


 ## Unreleased (1.x branch)
9 changes: 6 additions & 3 deletions allennlp/commands/train.py
@@ -483,6 +483,9 @@ def _train_worker(
     return None


+DataPath = Union[str, List[str], Dict[str, str]]
+
+
 class TrainModel(Registrable):
     """
     This class exists so that we can easily read a configuration file with the `allennlp train`
@@ -554,16 +557,16 @@ def from_partial_objects(
         serialization_dir: str,
         local_rank: int,
         dataset_reader: DatasetReader,
-        train_data_path: str,
+        train_data_path: DataPath,
         model: Lazy[Model],
         data_loader: Lazy[DataLoader],
         trainer: Lazy[Trainer],
         vocabulary: Lazy[Vocabulary] = Lazy(Vocabulary),
         datasets_for_vocab_creation: List[str] = None,
         validation_dataset_reader: DatasetReader = None,
-        validation_data_path: str = None,
+        validation_data_path: DataPath = None,
         validation_data_loader: Lazy[DataLoader] = None,
-        test_data_path: str = None,
+        test_data_path: DataPath = None,
         evaluate_on_test: bool = False,
         batch_weight_key: str = "",
     ) -> "TrainModel":
6 changes: 6 additions & 0 deletions allennlp/common/testing/model_test_case.py
@@ -73,6 +73,7 @@ def ensure_model_can_train_save_and_load(
         metric_terminal_value: float = None,
         metric_tolerance: float = 1e-4,
         disable_dropout: bool = True,
+        seed: int = None,
     ):
         """
         # Parameters
@@ -108,6 +109,11 @@ def ensure_model_can_train_save_and_load(
             If True we will set all dropout to 0 before checking gradients. (Otherwise, with small
             datasets, you may get zero gradients because of unlucky dropout.)
         """
+        if seed is not None:
+            random.seed(seed)
+            numpy.random.seed(seed)
+            torch.manual_seed(seed)
+
         save_dir = self.TEST_DIR / "save_and_load_test"
         archive_file = save_dir / "model.tar.gz"
         model = train_model_from_file(param_file, save_dir, overrides=overrides)
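The new `seed` argument seeds Python's `random`, NumPy, and PyTorch before training starts, so a flaky train/save/load test can be rerun deterministically (the commit log shows it being used that way before the seeds were dropped again). A minimal standalone sketch of the same pattern, assuming NumPy and PyTorch are optional; `seed_everything` is a hypothetical helper, not an AllenNLP function:

```python
import random
from typing import Optional


def seed_everything(seed: Optional[int]) -> None:
    """Hypothetical helper: seed every RNG a test might touch, if a seed is given."""
    if seed is None:
        return  # Leave all RNGs untouched, like the default `seed=None`.
    random.seed(seed)
    try:
        import numpy

        numpy.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed.
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed: nothing to seed.


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
assert first == second  # Same seed, same sequence of draws.
```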
23 changes: 20 additions & 3 deletions allennlp/common/util.py
@@ -27,6 +27,7 @@
     Tuple,
     TypeVar,
     Union,
+    Sequence,
 )

 import numpy
@@ -143,7 +144,7 @@ def lazy_groups_of(iterable: Iterable[A], group_size: int) -> Iterator[List[A]]:


 def pad_sequence_to_length(
-    sequence: List,
+    sequence: Sequence,
     desired_length: int,
     default_value: Callable[[], Any] = lambda: 0,
     padding_on_right: bool = True,
@@ -174,6 +175,7 @@ def pad_sequence_to_length(
     padded_sequence : `List`
     """
+    sequence = list(sequence)
     # Truncates the sequence to the desired length.
     if padding_on_right:
         padded_sequence = sequence[:desired_length]
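Widening `sequence` from `List` to `Sequence` is why the function now begins with `sequence = list(sequence)`: a tuple can be sliced, but `(1, 2) + [0]` raises `TypeError`, so the padding step further down (not shown in this hunk) presumably needs a real list to concatenate onto. A simplified sketch of the truncate-then-pad behaviour, written for illustration; `pad_to_length` is a hypothetical name and this is not AllenNLP's implementation:

```python
from typing import Any, Callable, List, Sequence


def pad_to_length(
    sequence: Sequence,
    desired_length: int,
    default_value: Callable[[], Any] = lambda: 0,
    padding_on_right: bool = True,
) -> List:
    """Hypothetical helper: truncate or pad `sequence` to exactly `desired_length` items."""
    # Accept any Sequence (tuple, range, ...) by copying it into a list first.
    sequence = list(sequence)
    # Truncate: keep the leading items when padding on the right, the trailing ones otherwise.
    if padding_on_right:
        padded = sequence[:desired_length]
    else:
        padded = sequence[-desired_length:] if desired_length > 0 else []
    # Build fresh padding values; `default_value` is called once per slot.
    padding = [default_value() for _ in range(desired_length - len(padded))]
    return padded + padding if padding_on_right else padding + padded


print(pad_to_length((1, 2), 4))  # [1, 2, 0, 0]
print(pad_to_length(range(5), 3))  # [0, 1, 2]
print(pad_to_length((1, 2), 4, padding_on_right=False))  # [0, 0, 1, 2]
```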
@@ -342,8 +344,8 @@ def import_module_and_submodules(package_name: str) -> None:
         # Import at top level
         try:
             module = importlib.import_module(package_name)
-        except ModuleNotFoundError as err:
-            if err.name in ("detectron2", "torchvision"):
+        except ImportError as err:
+            if err.name in {"detectron2", "torchvision"}:
                 logger.warning(
                     "vision module '%s' is unavailable since '%s' is not installed",
                     package_name,
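Catching `ImportError` rather than `ModuleNotFoundError` loses nothing, because `ModuleNotFoundError` is a subclass of `ImportError` (as the commit log notes); the `err.name` check is what keeps the clause narrow. A minimal sketch of the pattern, with a hypothetical `import_optional` helper and deliberately nonexistent module names for the demonstration. The rest of the hunk is not shown, so the re-raise for any other missing module is this sketch's own choice:

```python
import importlib
import logging
from types import ModuleType
from typing import Iterable, Optional

logger = logging.getLogger(__name__)

# ModuleNotFoundError (Python 3.6+) subclasses ImportError, so an
# `except ImportError` clause also catches "module not found" failures.
assert issubclass(ModuleNotFoundError, ImportError)


def import_optional(package_name: str, optional_deps: Iterable[str]) -> Optional[ModuleType]:
    """Hypothetical helper: import `package_name`, returning None only when the
    failure is caused by one of `optional_deps` being missing."""
    allowed = set(optional_deps)
    try:
        return importlib.import_module(package_name)
    except ImportError as err:
        # `err.name` is the name of the module that could not be imported.
        if err.name in allowed:
            logger.warning(
                "module '%s' is unavailable since '%s' is not installed",
                package_name,
                err.name,
            )
            return None
        raise  # Any other import failure is a genuine error.


print(import_optional("json", {"detectron2"}) is not None)  # True
print(import_optional("no_such_vision_dep", {"no_such_vision_dep"}))  # None
```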
@@ -651,6 +653,21 @@ def format_size(size: int) -> str:
     return f"{size}B"


+def nan_safe_tensor_divide(numerator, denominator):
+    """Performs division and handles divide-by-zero.
+    On zero-division, sets the corresponding result elements to zero.
+    """
+    result = numerator / denominator
+    mask = denominator == 0.0
+    if not mask.any():
+        return result
+
+    # remove nan
+    result[mask] = 0.0
+    return result
+
+
 def shuffle_iterable(i: Iterable[T], pool_size: int = 1024) -> Iterable[T]:
     import random
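`nan_safe_tensor_divide` divides first and then overwrites every element whose denominator is zero, so with float tensors both `x / 0` (inf) and `0 / 0` (NaN) come out as `0.0`. PyTorch float division by zero does not raise, but plain Python floats raise `ZeroDivisionError`, so the dependency-free sketch below checks each denominator before dividing instead of masking afterwards. `safe_divide` is a hypothetical name used only for illustration:

```python
from typing import List, Sequence


def safe_divide(numerator: Sequence[float], denominator: Sequence[float]) -> List[float]:
    """Hypothetical helper: element-wise division that yields 0.0 wherever the
    denominator is zero, instead of inf, NaN, or an exception."""
    if len(numerator) != len(denominator):
        raise ValueError("numerator and denominator must have the same length")
    # Check before dividing: Python floats raise ZeroDivisionError rather than returning inf/NaN.
    return [n / d if d != 0.0 else 0.0 for n, d in zip(numerator, denominator)]


print(safe_divide([1.0, 0.0, 6.0], [2.0, 0.0, 3.0]))  # [0.5, 0.0, 2.0]
```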

2 changes: 1 addition & 1 deletion allennlp/data/data_loaders/multi_process_data_loader.py
@@ -81,7 +81,7 @@ class MultiProcessDataLoader(DataLoader):
     max_instances_in_memory: `int`, optional (default = `None`)
         If not specified, all instances will be read and cached in memory for the duration
         of the data loader's life. This is generally ideal when your data can fit in memory
-        during training. However, when you're datasets are too big, using this option
+        during training. However, when your datasets are too big, using this option
         will turn on lazy loading, where only `max_instances_in_memory` instances are processed
         at a time.
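With `max_instances_in_memory` set, the loader works through the data a bounded number of instances at a time instead of caching everything. A simplified, single-process sketch of that idea using `itertools.islice`; `read_in_chunks` is hypothetical and says nothing about how `MultiProcessDataLoader` actually schedules workers or builds batches:

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_in_chunks(instances: Iterable[T], max_instances_in_memory: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield lists of at most `max_instances_in_memory` items,
    pulling from the underlying iterator only as each chunk is needed."""
    if max_instances_in_memory < 1:
        raise ValueError("max_instances_in_memory must be positive")
    iterator = iter(instances)
    while True:
        chunk = list(itertools.islice(iterator, max_instances_in_memory))
        if not chunk:
            return
        yield chunk


# A generator stands in for a (possibly huge) lazily read dataset.
for chunk in read_in_chunks((f"instance-{i}" for i in range(5)), 2):
    print(chunk)
```

Because `islice` keeps drawing from the same iterator, only one chunk is materialised at a time; what happens to each chunk (shuffling, batching) is left to the caller in this sketch.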
2 changes: 1 addition & 1 deletion allennlp/data/dataset_readers/__init__.py
@@ -20,7 +20,7 @@
 from allennlp.data.dataset_readers.text_classification_json import TextClassificationJsonReader

 try:
-    from allennlp.data.dataset_readers.nlvr2 import Nlvr2Reader
+    from allennlp.data.dataset_readers.vqav2 import VQAv2Reader
 except ModuleNotFoundError as err:
     if err.name not in ("detectron2", "torchvision"):
         raise
18 changes: 13 additions & 5 deletions allennlp/data/dataset_readers/dataset_reader.py
@@ -1,7 +1,7 @@
 from dataclasses import dataclass
 import itertools
 from os import PathLike
-from typing import Iterable, Iterator, Optional, Union, TypeVar
+from typing import Iterable, Iterator, Optional, Union, TypeVar, Dict, List
 import logging
 import warnings

@@ -58,6 +58,9 @@ class DistributedInfo:

 _T = TypeVar("_T")

+PathOrStr = Union[PathLike, str]
+DatasetReaderInput = Union[PathOrStr, List[PathOrStr], Dict[str, PathOrStr]]
+

 class DatasetReader(Registrable):
     """
@@ -178,14 +181,19 @@ def __init__(
         if util.is_distributed():
             self._distributed_info = DistributedInfo(dist.get_world_size(), dist.get_rank())

-    def read(self, file_path: Union[PathLike, str]) -> Iterator[Instance]:
+    def read(self, file_path: DatasetReaderInput) -> Iterator[Instance]:
         """
         Returns an iterator of instances that can be read from the file path.
         """
-        if not isinstance(file_path, str):
-            file_path = str(file_path)

-        for instance in self._multi_worker_islice(self._read(file_path)):
+        if isinstance(file_path, list):
+            file_path = [str(f) for f in file_path]
+        elif isinstance(file_path, dict):
+            file_path = {k: str(v) for k, v in file_path.items()}
+        else:
+            file_path = str(file_path)
+
+        for instance in self._multi_worker_islice(self._read(file_path)):  # type: ignore
             if self._worker_info is None:
                 # If not running in a subprocess, it's safe to apply the token_indexers right away.
                 self.apply_token_indexers(instance)
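`DatasetReader.read` now accepts a `DatasetReaderInput`: one path, a list of paths, or a dict of paths, each either a `str` or an `os.PathLike`. Before handing it to `_read`, it converts every path to `str` while keeping the container shape. A standalone sketch of that normalization step; `normalize_reader_input` is a hypothetical name and this is not AllenNLP code:

```python
from os import PathLike
from pathlib import PurePosixPath
from typing import Dict, List, Union

PathOrStr = Union[PathLike, str]
DatasetReaderInput = Union[PathOrStr, List[PathOrStr], Dict[str, PathOrStr]]


def normalize_reader_input(
    file_path: DatasetReaderInput,
) -> Union[str, List[str], Dict[str, str]]:
    """Hypothetical helper: turn every path into a str, preserving list/dict shape."""
    if isinstance(file_path, list):
        return [str(f) for f in file_path]
    if isinstance(file_path, dict):
        # Keys are left alone; only the values are paths.
        return {k: str(v) for k, v in file_path.items()}
    # A single str or PathLike.
    return str(file_path)


print(normalize_reader_input(PurePosixPath("data/train.json")))  # data/train.json
print(normalize_reader_input([PurePosixPath("a.json"), "b.json"]))  # ['a.json', 'b.json']
print(normalize_reader_input({"train": PurePosixPath("a.json")}))  # {'train': 'a.json'}
```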
