
Remove deprecated SpanPruner (#2248)

* Remove deprecated SpanPruner

* Remove mentions of spanpruner

* Update tutorial to replace num_spans_to_keep with num_items_to_keep
nelson-liu authored and matt-gardner committed Jan 1, 2019
1 parent a98481c commit 3cff7d35714aaaa6db5cd1d9a46a338b945fb092
@@ -23,9 +23,3 @@
from allennlp.modules.attention import Attention
from allennlp.modules.input_variational_dropout import InputVariationalDropout
from allennlp.modules.bimpm_matching import BiMpmMatching

-def SpanPruner(*args, **kwargs) -> Pruner:  # pylint: disable=invalid-name
-    import warnings
-    warnings.warn("``SpanPruner`` was renamed to ``Pruner`` in version 0.6.2. It will be removed "
-                  "in version 0.8.", DeprecationWarning)
-    return Pruner(*args, **kwargs)
@@ -67,7 +67,7 @@ def forward(self, # pylint: disable=arguments-differ
        scores = self._scorer(embeddings)

        if scores.size(-1) != 1 or scores.dim() != 3:
-            raise ValueError(f"The scorer passed to SpanPruner must produce a tensor of shape"
+            raise ValueError(f"The scorer passed to Pruner must produce a tensor of shape"
                             f"(batch_size, num_items, 1), but found shape {scores.size()}")
        # Make sure that we don't select any masked items by setting their scores to be very
        # negative. These are logits, typically, so -1e20 should be plenty negative.
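The masking trick described in the comment above can be sketched without any framework: masked items get a score of `-1e20` so that a top-k selection never picks them over a real (unmasked) item. The function name and list-based representation here are illustrative, not part of the AllenNLP API.

```python
def top_k_unmasked(scores, mask, k):
    """Return the indices of the k highest-scoring unmasked items.

    Masked items have their scores replaced with -1e20, which is far more
    negative than any realistic logit, so they sort to the bottom.
    """
    masked_scores = [s if m else -1e20 for s, m in zip(scores, mask)]
    # Sort indices by (masked) score, descending, and keep the first k.
    return sorted(range(len(masked_scores)),
                  key=lambda i: masked_scores[i],
                  reverse=True)[:k]

scores = [0.3, 2.1, -0.5, 1.7]
mask = [1, 0, 1, 1]  # item 1 is masked
print(top_k_unmasked(scores, mask, 2))  # item 1 is skipped despite its high score
```

In the real module the same idea is applied to a batched score tensor before calling `torch.topk`.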

This file was deleted.

@@ -26,9 +26,6 @@
'allennlp.modules.alternating_highway_lstm',
# Private base class, no docs needed.
'allennlp.modules.encoder_base',
-    # Deprecated module name (renamed to allennlp.modules.pruner). This can be removed once the
-    # module is removed (probably in version 0.8).
-    'allennlp.modules.span_pruner',
# Moved to dataset_readers/semantic_parsing. TODO(Mark): remove in version 0.8.
'allennlp.data.dataset_readers.atis',
'allennlp.data.dataset_readers.nlvr',
@@ -111,20 +111,20 @@ assert spans == [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
There are other helpful functions in `allennlp.data.dataset_readers.dataset_utils.span_utils`,
such as a function to convert between BIO labelings and span-based representations.
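As a rough illustration of what such a conversion does, here is a simplified, framework-free sketch of turning BIO tags into labeled spans with inclusive end indices. AllenNLP's actual `bio_tags_to_spans` in `span_utils` handles more edge cases (e.g. stray `I-` tags); this version is only meant to show the idea.

```python
def bio_tags_to_spans(tags):
    """Convert a BIO tag sequence to (label, (start, end)) spans, end inclusive.

    Simplified sketch: an I- tag that does not continue the open span
    simply closes it, rather than starting a new one.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((label, (start, i - 1)))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            continue  # the open span continues
        else:  # "O", or an I- tag that doesn't match the open span's label
            if start is not None:
                spans.append((label, (start, i - 1)))
            start, label = None, None
    if start is not None:
        spans.append((label, (start, len(tags) - 1)))
    return spans

print(bio_tags_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))
# [('PER', (0, 1)), ('LOC', (3, 3))]
```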

-### Use a SpanPruner
+### Use a Pruner

It's not always possible to prune spans before they enter your model. AllenNLP contains
-a [`SpanPruner`](https://github.com/allenai/allennlp/blob/741ea01e50cfbda2d890110adea41e9141ed46f7/allennlp/modules/span_pruner.py#L8), which allows you to prune spans based on a parameterized function which
+a [`Pruner`](https://github.com/allenai/allennlp/blob/3f0953d19de3676ea82e642659fc96d90690e34d/allennlp/modules/pruner.py#L8), which allows you to prune spans based on a parameterized function which
is trained end-to-end with the rest of your model.

```python
import torch
from torch.autograd import Variable
-from allennlp.modules import SpanPruner
+from allennlp.modules import Pruner
# Create a linear layer which will score our spans.
linear_scorer = torch.nn.Linear(5, 1)
-pruner = SpanPruner(scorer=linear_scorer)
+pruner = Pruner(scorer=linear_scorer)
# Here we'll create some spans from a random tensor of shape
# (batch_size, num_spans, embedding_size). Typically this would
@@ -135,28 +135,28 @@ mask = Variable(torch.ones([3, 4]))
# There's quite a bit to unpack here.
# See below for a full explanation.
-pruned_embeddings, pruned_mask, pruned_indices, pruned_scores = pruner(spans, mask, num_spans_to_keep=3)
+pruned_embeddings, pruned_mask, pruned_indices, pruned_scores = pruner(spans, mask, num_items_to_keep=3)
```

-A `SpanPruner` has four return values:
+A `Pruner` has four return values:

1. First, we've got our `pruned_embeddings`.
-These are of shape `(batch_size, num_spans_to_keep, embedding_size)`
+These are of shape `(batch_size, num_items_to_keep, embedding_size)`
The spans we kept correspond to the top k with respect to the parameterized
span scorer. The other spans just get discarded, and your eventual loss
function for your model won't be a function of the discarded spans!

-2. Secondly, we've got the `pruned_mask`, which has shape `(batch_size, num_spans_to_keep)`.
+2. Secondly, we've got the `pruned_mask`, which has shape `(batch_size, num_items_to_keep)`.
In 99% of cases, this will be all ones. However, if you have masked spans in a
-batch element, and you request that the `SpanPruner` keeps more than the number
+batch element, and you request that the `Pruner` keeps more than the number
of non-masked spans, there will be some masked elements in the returned spans.

-3. Thirdly, we have the `pruned_indices` which has shape `(batch_size, num_spans_to_keep)` which are the indices of the top k scoring spans in the original ``spans`` tensor.
+3. Thirdly, we have the `pruned_indices` which has shape `(batch_size, num_items_to_keep)` which are the indices of the top k scoring spans in the original ``spans`` tensor.
This is returned because it can be useful to retain pointers to the original spans,
if each span is being scored by multiple distinct scorers, such as in the co-reference
model, for instance.

-4. Finally, we have the `pruned_scores`, which has shape `(batch_size, num_spans_to_keep, 1)`.
+4. Finally, we have the `pruned_scores`, which has shape `(batch_size, num_items_to_keep, 1)`.
This is returned so that you can incorporate the scores of the spans into some loss function.
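The four return values described above can be illustrated with a framework-free sketch, flattened to a single batch element for clarity. The `prune` function below is illustrative only, not the AllenNLP API: it selects the top-k unmasked items and returns their embeddings, mask, original indices, and scores in the same order as the real module.

```python
def prune(embeddings, mask, scores, num_items_to_keep):
    """Keep the top-k unmasked items; return their embeddings, mask,
    original indices, and scores (a single-batch-element sketch)."""
    # Masked items get -1e20 so they are never preferred over real items.
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] if mask[i] else -1e20,
                   reverse=True)
    kept = order[:num_items_to_keep]
    return ([embeddings[i] for i in kept],   # pruned_embeddings
            [mask[i] for i in kept],         # pruned_mask
            kept,                            # pruned_indices
            [scores[i] for i in kept])       # pruned_scores

emb = [[0.1], [0.2], [0.3], [0.4]]
mask = [1, 1, 1, 0]
scores = [0.5, 2.0, 1.0, 9.9]  # index 3 scores highest but is masked
print(prune(emb, mask, scores, 2))
# ([[0.2], [0.3]], [1, 1], [1, 2], [2.0, 1.0])
```

Note how `pruned_indices` (`[1, 2]`) still points back into the original tensor, which is what lets a model like the coreference resolver re-score the same spans elsewhere.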

## Existing AllenNLP examples for generating `SpanFields`
@@ -171,4 +171,4 @@ classify whether or not they are constituents in a constitutency parse of the se
Currently, both the [Coreference Model](https://github.com/allenai/allennlp/blob/741ea01e50cfbda2d890110adea41e9141ed46f7/allennlp/models/coreference_resolution/coref.py#L173)
and the [Span Based Constituency Parser](https://github.com/allenai/allennlp/blob/741ea01e50cfbda2d890110adea41e9141ed46f7/allennlp/models/constituency_parser.py#L162)
use span representations from the output of bi-directional LSTMs. Take a look and see how they're used in
-a model context!
+a model context!
