* transformer toolkit: BertEmbeddings
* transformer toolkit: BertSelfAttention
* transformer toolkit: BertSelfOutput
* transformer toolkit: BertAttention
* transformer toolkit: BertIntermediate
* transformer toolkit: BertOutput
* transformer toolkit: BertLayer
* transformer toolkit: BertBiAttention
* Attention scoring functions
* merging output and self output
* utility to replicate layers, further cleanup (see the first sketch after this list)
* adding sinusoidal positional encoding (see the second sketch after this list)
* adding activation layer
* adding base class for generic loading of pretrained weights
* further generalizing, adding tests
* updates
* adding bimodal encoder, kwargs in from_pretrained_module
* vilbert using transformer toolkit
* fixing test function
* changing to torch.allclose
* fixing attention score api
* bug fix in bimodal output
* changing to older attention modules
* _construct_default_mapping returns mapping
* adding kwargs to _get_input_arguments, adding examples
* using cached_transformers
* making transformer_encoder more general
* added get_relevant_module, loading by name
* fixing constructor name
* undoing failure after merge
* misc minor changes

Co-authored-by: Dirk Groeneveld <dirkg@allenai.org>
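The "utility to replicate layers" item refers to stacking independent copies of a single transformer layer to build an encoder. A minimal sketch of such a helper follows; the name `replicate_layers` and its signature are assumptions for illustration, not necessarily the toolkit's actual API.

```python
import copy

import torch


def replicate_layers(layer: torch.nn.Module, num_copies: int) -> torch.nn.ModuleList:
    # Hypothetical helper (name/signature assumed). Deep-copy so each
    # layer in the stack gets its own parameters; ModuleList([layer] * n)
    # would instead share one set of weights across all n positions.
    return torch.nn.ModuleList(copy.deepcopy(layer) for _ in range(num_copies))
```

The `copy.deepcopy` is the key point: in tests, a stack built this way can then be compared element-wise against a reference implementation with `torch.allclose`, which is what the "changing to torch.allclose" item above refers to.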
- Loading branch information
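The "adding sinusoidal positional encoding" item is the fixed encoding from Vaswani et al. (2017), where even embedding dimensions use sine and odd dimensions use cosine at geometrically spaced frequencies. Below is a self-contained sketch; the class name and constructor arguments are illustrative assumptions, not the toolkit's exact interface.

```python
import math

import torch


class SinusoidalPositionalEncoding(torch.nn.Module):
    """Fixed sinusoidal position embeddings (Vaswani et al., 2017).

    Illustrative sketch only; the name and signature are assumptions,
    not the toolkit's actual class.
    """

    def __init__(self, max_len: int, embedding_dim: int):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        # Frequencies decay geometrically from 1 down to 1/10000
        # across the embedding dimensions.
        div_term = torch.exp(
            torch.arange(0, embedding_dim, 2, dtype=torch.float)
            * (-math.log(10000.0) / embedding_dim)
        )
        pe = torch.zeros(max_len, embedding_dim)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, seq_len, embedding_dim); add the fixed table,
        # truncated to the input's sequence length.
        return x + self.pe[: x.size(1)].unsqueeze(0)
```

Because the table is registered as a buffer rather than a parameter, it moves with the module on `.to(device)` but is never touched by the optimizer.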