<a href="https://colab.research.google.com/github/Giochen/google_code/blob/main/Learning_to_Ignore_Long_Document_Coreference_with_Bounded_Memory_Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This Colab notebook runs inference with a model trained on LitBank, as described in our EMNLP 2020 paper [Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks](https://www.aclweb.org/anthology/2020.emnlp-main.685.pdf).

### Clone GitHub Repo

In [None]:
%%capture
! git clone https://github.com/shtoshni92/long-doc-coref.git

### Install Relevant Libraries

In [None]:
%%capture
! pip install torch==1.6.0
! pip install transformers==4.2.2
! pip install scipy==1.4.1

### Download Pretrained Models

The pretrained models are released [here](https://drive.google.com/drive/folders/1UFhkrlBP-O2MeaxVygZcuP9RWuglOTmN?usp=sharing). In this example we download one of the LitBank models, `litbank_lbmem_val_3_dev_77.3`: an LB-MEM model with 20 cells, trained on cross-validation split 3 of LitBank.

In [None]:
!gdown --id 1PKlFab387j_1GnYA9E4lq-8nQ9csEeAL

Downloading...
From: https://drive.google.com/uc?id=1PKlFab387j_1GnYA9E4lq-8nQ9csEeAL
To: /content/model.pth
187MB [00:01, 119MB/s]


In [None]:
ls

[0m[01;34mlong-doc-coref[0m/  model.pth  [01;34msample_data[0m/


### Inference on Sample Text

Now that we have the code and the pretrained model, we can test the model on some sample text.

In [None]:
import sys
sys.path.append('long-doc-coref/src')

# This will also download the SpanBERT model fine-tuned for coreference (Joshi et al., 2020) from Hugging Face
from inference.inference import Inference
inference_model = Inference("model.pth")

{'base_data_dir': '/share/data/speech/shtoshni/research/litbank_coref/data', 'base_model_dir': '/share/data/speech/shtoshni/research/litbank_coref/models', 'dataset': 'litbank', 'conll_scorer': '../lrec2020-coref/reference-coreference-scorers/scorer.pl', 'model_size': 'large', 'doc_enc': 'overlap', 'pretrained_bert_dir': '/share/data/speech/shtoshni/resources', 'max_segment_len': 512, 'max_span_width': 20, 'ment_emb': 'attn', 'top_span_ratio': 0.3, 'mem_type': 'learned', 'num_cells': 20, 'mlp_size': 3000, 'mlp_depth': 1, 'entity_rep': 'wt_avg', 'emb_size': 20, 'cross_val_split': 3, 'new_ent_wt': 2.0, 'num_train_docs': None, 'max_training_segments': 5, 'sample_invalid': 0.25, 'dropout_rate': 0.3, 'label_smoothing_wt': 0.0, 'max_epochs': 25, 'seed': 0, 'init_lr': 0.0002, 'no_singletons': False, 'eval': False, 'slurm_id': '6077327_172', 'model_dir': '/share/data/speech/shtoshni/research/litbank_coref/models/coref_aff65ce80c7eefcce3c2451b554e1e68', 'best_model_dir': '/share/data/speech/sht

### Sample Doc

Here's an excerpt from [The War of the Worlds](https://en.wikipedia.org/wiki/The_War_of_the_Worlds). The LitBank annotations for the full document are visualized [here](https://ttic.uchicago.edu/~shtoshni/coref/litbank_html/36_the_war_of_the_worlds.html); the excerpt below is just the prefix of that document. The document was part of the dev set for the LitBank model we are using.


In [None]:
doc = """
    BOOK ONE THE COMING OF THE MARTIANS CHAPTER ONE THE EVE OF THE WAR No one would have believed in the last years of the nineteenth century that this world was being watched keenly and closely by intelligences greater than man 's and yet as mortal as his own ; that as men busied themselves about their various concerns they were scrutinised and studied , perhaps almost as narrowly as a man with a microscope might scrutinise the transient creatures that swarm and multiply in a drop of water .
    With infinite complacency men went to and fro over this globe about their little affairs , serene in their assurance of their empire over matter .
    It is possible that the infusoria under the microscope do the same .
    No one gave a thought to the older worlds of space as sources of human danger , or thought of them only to dismiss the idea of life upon them as impossible or improbable .
    It is curious to recall some of the mental habits of those departed days .
    At most terrestrial men fancied there might be other men upon Mars , perhaps inferior to themselves and ready to welcome a missionary enterprise .
    Yet across the gulf of space , minds that are to our minds as ours are to those of the beasts that perish , intellects vast and cool and unsympathetic , regarded this earth with envious eyes , and slowly and surely drew their plans against us .
    And early in the twentieth century came the great disillusionment .
    The planet Mars , I scarcely need remind the reader , revolves about the sun at a mean distance of 140,000,000 miles , and the light and heat it receives from the sun is barely half of that received by this world .
    It must be , if the nebular hypothesis has any truth , older than our world ; and long before this earth ceased to be molten , life upon its surface must have begun its course .
    The fact that it is scarcely one seventh of the volume of the earth must have accelerated its cooling to the temperature at which life could begin .
    It has air and water and all that is necessary for the support of animated existence .
    Yet so vain is man , and so blinded by his vanity , that no writer , up to the very end of the nineteenth century , expressed any idea that intelligent life might have developed there far , or indeed at all , beyond its earthly level .
    Nor was it generally understood that since Mars is older than our earth , with scarcely a quarter of the superficial area and remoter from the sun , it necessarily follows that it is not only more distant from time 's beginning but nearer its end .
    The secular cooling that must someday overtake our planet has already gone far indeed with our neighbour .
    Its physical condition is still largely a mystery , but we know now that even in its equatorial region the midday temperature barely approaches that of our coldest winter .
    Its air is much more attenuated than ours , its oceans have shrunk until they cover but a third of its surface , and as its slow seasons change huge snowcaps gather and melt about either pole and periodically inundate its temperate zones .
    That last stage of exhaustion , which to us is still incredibly remote , has become a present-day problem for the inhabitants of Mars .
    """

In [None]:
output = inference_model.perform_coreference(doc)

Token indices sequence length is longer than the specified maximum sequence length for this model (679 > 512). Running this sequence through the model will result in indexing errors
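
The warning above is emitted by the Hugging Face tokenizer because the excerpt produces 679 tokens, more than the encoder's 512-token limit. The configuration printed when the model was loaded lists `'max_segment_len': 512` and `'doc_enc': 'overlap'`, which suggests that long documents are encoded in overlapping segments rather than in one pass. The snippet below is a generic sketch of overlapping windowing, not the repository's implementation; the function name `overlapping_segments` and the overlap of 256 tokens are assumptions chosen for illustration.

```python
def overlapping_segments(num_tokens, max_len=512, overlap=256):
    """Return (start, end) windows, end exclusive, that cover num_tokens tokens.

    Hypothetical helper: consecutive windows share `overlap` tokens.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    stride = max_len - overlap
    segments = []
    start = 0
    while start < num_tokens:
        end = min(start + max_len, num_tokens)
        segments.append((start, end))
        if end == num_tokens:
            break  # the last window already reaches the end of the document
        start += stride
    return segments


# 679 is the sequence length reported in the warning above.
print(overlapping_segments(679))  # [(0, 512), (256, 679)]
```

With these assumed settings, the 679-token excerpt fits in two windows, each within the 512-token limit.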


In [None]:
for cluster in output["clusters"]:
  print(cluster)

[((9, 13), 'THE EVE OF THE WAR')]
[((5, 6), 'THE MARTIANS')]
[((71, 75), 'a man with a microscope')]
[((12, 13), 'THE WAR')]
[((14, 15), 'No one')]
[((181, 183), 'most terrestrial men')]
[((407, 408), 'no writer')]
[((397, 397), 'man'), ((403, 403), 'his')]
[((547, 548), 'its oceans'), ((552, 552), 'they')]
[((578, 580), 'its temperate zones')]
[((28, 29), 'this world'), ((100, 101), 'this globe'), ((237, 238), 'this earth'), ((308, 309), 'this world'), ((325, 326), 'our world'), ((331, 332), 'this earth'), ((360, 361), 'the earth'), ((453, 454), 'our earth')]
[((40, 40), 'man'), ((48, 48), 'his')]
[((53, 53), 'men'), ((55, 55), 'themselves'), ((57, 57), 'their'), ((60, 60), 'they'), ((94, 94), 'men'), ((103, 103), 'their'), ((109, 109), 'their'), ((112, 112), 'their'), ((248, 248), 'their')]
[((524, 526), 'its equatorial region')]
[((130, 131), 'No one')]
[((136, 140), 'the older worlds of space'), ((150, 150), 'them'), ((159, 159), 'them')]
[((604, 607), 'the inhabitants of Mars')]
[
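
Each cluster printed above is a list of mentions, and each mention is a `((start, end), text)` pair. The offsets appear to be inclusive token indices, since single-token mentions such as `'man'` have `start == end`. As a small sketch of working with this structure, the snippet below uses the longest mention of each cluster as a readable label. The helper name `cluster_label` is hypothetical, and `sample_clusters` is copied from the output above so that the snippet runs without the model.

```python
# A few clusters copied from the printed output, so no model is needed here.
sample_clusters = [
    [((5, 6), "THE MARTIANS")],
    [((397, 397), "man"), ((403, 403), "his")],
    [((136, 140), "the older worlds of space"), ((150, 150), "them"), ((159, 159), "them")],
]


def cluster_label(cluster):
    """Return the text of the mention spanning the most tokens (first one on ties)."""
    (_start, _end), text = max(cluster, key=lambda mention: mention[0][1] - mention[0][0])
    return text


for cluster in sample_clusters:
    print(cluster_label(cluster), "->", len(cluster), "mention(s)")
```

With the real output, the same loop can be run over `output["clusters"]` instead of `sample_clusters`.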

#### Remove Singletons


In [None]:
for cluster in output["clusters"]:
  if len(cluster) > 1:
    print(cluster)

[((397, 397), 'man'), ((403, 403), 'his')]
[((547, 548), 'its oceans'), ((552, 552), 'they')]
[((28, 29), 'this world'), ((100, 101), 'this globe'), ((237, 238), 'this earth'), ((308, 309), 'this world'), ((325, 326), 'our world'), ((331, 332), 'this earth'), ((360, 361), 'the earth'), ((453, 454), 'our earth')]
[((40, 40), 'man'), ((48, 48), 'his')]
[((53, 53), 'men'), ((55, 55), 'themselves'), ((57, 57), 'their'), ((60, 60), 'they'), ((94, 94), 'men'), ((103, 103), 'their'), ((109, 109), 'their'), ((112, 112), 'their'), ((248, 248), 'their')]
[((136, 140), 'the older worlds of space'), ((150, 150), 'them'), ((159, 159), 'them')]
[((182, 183), 'terrestrial men'), ((196, 196), 'themselves')]
[((191, 191), 'Mars'), ((264, 266), 'The planet Mars'), ((296, 296), 'it'), ((311, 311), 'It'), ((340, 340), 'its'), ((351, 351), 'it'), ((365, 365), 'its'), ((376, 376), 'It'), ((429, 429), 'there'), ((438, 438), 'its'), ((449, 449), 'Mars'), ((474, 474), 'it'), ((505, 506), 'our neighbour'), ((50
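
The clusters are not printed in document order: for example, the cluster starting with `'its oceans'` (token 547) is listed before the one starting with `'this world'` (token 28). The following is a minimal sketch, assuming each cluster is a list of `((start, end), text)` pairs as in the output above, that drops singletons and sorts the remaining clusters by their earliest mention. The function name `non_singletons_in_order` is hypothetical, and the sample data is copied from the output.

```python
# Clusters copied from the printed output, so no model is needed here.
sample_clusters = [
    [((578, 580), "its temperate zones")],
    [((397, 397), "man"), ((403, 403), "his")],
    [((547, 548), "its oceans"), ((552, 552), "they")],
    [((40, 40), "man"), ((48, 48), "his")],
]


def non_singletons_in_order(clusters):
    """Keep clusters with more than one mention, sorted by earliest span."""
    kept = [cluster for cluster in clusters if len(cluster) > 1]
    # Spans are (start, end) tuples, so min() compares them by start, then end.
    return sorted(kept, key=lambda cluster: min(span for span, _text in cluster))


for cluster in non_singletons_in_order(sample_clusters):
    print(cluster)
```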