# Lesson notebook 9 - Coreference Resolution



### Resolution with NeuralCoref

We'll use SpaCy again, a pretrained open-source language processing pipeline.  It provides a platform for processing text in a number of ways without any fine-tuning or training, although it can also be trained or fine-tuned.

We'll use it, together with the NeuralCoref extension, to demonstrate coreference resolution out of the box.  Take a look at the coreference clusters that it finds.  How well do you think it performs?

### Resolution Experiment with BERT Embeddings


<a id = 'returnToTop'></a>

## Notebook Contents

  * 1. [Online Demo](#onlineDemo)
  * 2. [Setup](#spacySetup)
  * 3. [Coreference Resolution Examples](#spacyCoref)
  * 4. [Classroom Exercise](#exercise)
    * 4.1 [Coref Resolution via Contextualized BERT Embeddings](#corefBERT)
  * 5. [Answers](#answers)      










[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datasci-w266/2022-fall-main/blob/master/materials/lesson_notebooks/lesson_9_CoreferenceResolution.ipynb)

[Return to Top](#returnToTop)  
<a id = 'onlineDemo'></a>

## 1. Online Demo


Run a visual example of coreference resolution [here](https://huggingface.co/coref/) without using this notebook.  If you have debug checked in the upper right corner, the display includes the scores for each of the possible coreference links.  This nicely illustrates the approach of performing a **pairwise comparison of all spans or mentions** and then keeping only the high-scoring pairs to aggregate into clusters.
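The score-then-cluster idea can be sketched in a few lines. This is a minimal illustration, not the demo's actual algorithm: the mentions, the `pair_scores` values, the `threshold`, and the use of union-find for aggregation are all assumptions made for the example.

```python
from itertools import combinations

def cluster_mentions(mentions, pair_scores, threshold=0.5):
    """Group mentions whose pairwise link score clears the threshold."""
    # Union-find parent table: every mention starts as its own cluster.
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # shorten the path as we walk up
            m = parent[m]
        return m

    # Compare every pair of mentions and merge the high-scoring ones.
    for a, b in combinations(mentions, 2):
        if pair_scores.get((a, b), 0.0) >= threshold:
            parent[find(b)] = find(a)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    # A "cluster" of one mention is not a coreference cluster, so drop it.
    return [group for group in clusters.values() if len(group) > 1]

# Hypothetical scores; a real model would predict these.
# Strings stand in for spans here; a real system would key on span positions.
mentions = ["My sister", "a dog", "She", "him"]
pair_scores = {
    ("My sister", "a dog"): 0.05,
    ("My sister", "She"): 0.90,
    ("My sister", "him"): 0.10,
    ("a dog", "She"): 0.20,
    ("a dog", "him"): 0.80,
    ("She", "him"): 0.10,
}
clusters = cluster_mentions(mentions, pair_scores)
print(clusters)
```

With these made-up scores, only the sister/She and dog/him links survive the threshold, so two clusters come out.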

[Return to Top](#returnToTop)  
<a id = 'spacySetup'></a>

## 2. Setup

Here is a link to the [repo](https://github.com/huggingface/neuralcoref) for the neuralcoref code we'll be running in this lesson notebook.

This coref code only works with SpaCy version 2.1.  Therefore we are making it available in a separate notebook so that there are no collisions between different versions of SpaCy.

**Note**: There is a newly released experiment that updates the coref capability of SpaCy in 3.4.x based on this [paper](https://aclanthology.org/2021.emnlp-main.605.pdf) that significantly speeds up the clustering of coreference mentions (words and/or spans). You can see it here [https://github.com/explosion/spacy-experimental/releases/tag/v0.6.0](https://github.com/explosion/spacy-experimental/releases/tag/v0.6.0) if you want to try the experimental software.

In [1]:
!pip install -q -U spacy==2.1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
en-core-web-sm 3.4.0 requires spacy<3.5.0,>=3.4.0, but you have spacy 2.1.0 which is incompatible.
confection 0.0.3 requires srsly<3.0.0,>=2.4.0, but you have srsly 1.0.5 which is incompatible.
altair 4.2.0 requires jsonschema>=3.0, but you have jsonschema 2.6.0 which is incompatible.

In [2]:
!pip install -q neuralcoref


In [3]:
!python -m spacy download en_core_web_sm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting en_core_web_sm==2.1.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz (11.1 MB)
Building wheels for collected packages: en-core-web-sm
  Building wheel for en-core-web-sm (setup.py) ... done
  Created wheel for en-core-web-sm: filename=en_core_web_sm-2.1.0-py3-none-any.whl size=11074433 sha256=584ca8d3a31ac46be9782110d6b95f5f7b7c1631569ef8d977317eb122c9e937
  Stored in directory: /tmp/pip-ephem-wheel-cache-ym0j2571/wheels/59/4f/8c/0dbaab09a776d1fa3740e9465078bfd903cc22f3985382b496
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
  Attempting uninstall: en-core-web-sm
    Found existing installation: en-core-web-sm 3.4.0
    Uninstalling en-core-web-sm-3.4.0:
      Successfully uninstalled en-cor

In [4]:
# Load a SpaCy model (one of SpaCy English models)
import spacy
nlp = spacy.load('en_core_web_sm')

In [5]:
# Add neural coref to SpaCy's pipe
import neuralcoref
neuralcoref.add_to_pipe(nlp)


<spacy.lang.en.English at 0x7f41ce5862d0>

[Return to Top](#returnToTop)  
<a id = 'spacyCoref'></a>

## 3. Coreference Resolution Examples

Here is an example that includes two characters with multiple references to each.  If the system works properly we would expect it to produce two clusters of mentions, one for the sister and one for the dog.

In [6]:
# With NeuralCoref in the pipeline, coreference annotations are available as extension attributes on the SpaCy Doc.
doc = nlp(u'My sister has a dog. She loves him. He worships the ground she walks upon.')

doc._.has_coref
doc._.coref_clusters

[My sister: [My sister, She, she], a dog: [a dog, him, He]]

Here are the sentences used in the live session slides about Abraham Lincoln.  

In [7]:
doc = nlp(u'On the afternoon of November 19, 1863, Lincoln went to Gettysburg. He gave his famous speech there. It has been recognized as one of the great speeches of American history.')

doc._.has_coref
doc._.coref_clusters

[Lincoln: [Lincoln, He, his], his famous speech: [his famous speech, It]]

The system is definitely not perfect.  Coreference resolution is a very challenging problem.  Let's give it a harder example.  We'll still have two characters -- the Bond villain Blofeld and his cat. If the system works perfectly, it should generate two clusters -- one for Blofeld and one for the cat.  The Blofeld cluster should contain *Blofeld*, *he*, and *the villain*. The cat cluster should contain *cat* and *her*.

In [8]:
doc = nlp(u'Ernst Blofeld has a cat. He loves her. The villain has always been fond of animals.')
doc._.has_coref
doc._.coref_clusters

[Ernst Blofeld: [Ernst Blofeld, He, her]]

The model gets several things wrong here: it puts *her* in the Blofeld cluster, creates no cluster for the cat, and never links *The villain* to Blofeld.
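To make the comparison concrete, here is a rough sketch that counts coreference links (pairs of mentions placed in the same cluster) in the expected and predicted output for the Blofeld example. It is a simplified check, not one of the standard coreference metrics, and the span text `a cat` for the cat mention is an assumption.

```python
from itertools import combinations

def links(clusters):
    """Return the set of unordered mention pairs that share a cluster."""
    return {frozenset(pair)
            for cluster in clusters
            for pair in combinations(sorted(cluster), 2)}

# Expected clusters from the description above ("a cat" as the span is assumed).
expected = [{"Ernst Blofeld", "He", "The villain"}, {"a cat", "her"}]
# Predicted clusters copied from the cell output above.
predicted = [{"Ernst Blofeld", "He", "her"}]

gold_links, pred_links = links(expected), links(predicted)
correct = gold_links & pred_links   # links the model found
missing = gold_links - pred_links   # links the model should have found
wrong = pred_links - gold_links     # links the model should not have made

for name, group in (("correct", correct), ("missing", missing), ("wrong", wrong)):
    print(name, sorted(tuple(sorted(link)) for link in group))
```

Mentions are compared as plain strings here, which only works because no mention text repeats in this example; a more careful version would compare token spans.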

[Return to Top](#returnToTop)  
<a id = 'exercise'></a>

## 4. Classroom Exercise

Let's try some more experiments with coreference resolution.  There's a test called the [Winograd schema challenge](https://en.wikipedia.org/wiki/Winograd_schema_challenge) built from sentences in which changing a single word changes which noun a pronoun refers to.  For example, in this sentence:


> The city councilmen refused the demonstrators a permit because *they* **feared/advocated** violence.



if we use the verb **feared** then *they* refers back to councilmen. However, if we use the verb **advocated** then *they* refers back to demonstrators.
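A schema like this can be written down as a template plus the intended antecedent for each word choice, which makes it easy to generate both variants for testing. The names `TEMPLATE`, `EXPECTED_ANTECEDENT`, and `winograd_pair` below are made up for this sketch.

```python
# The councilmen sentence with the swappable verb left as a placeholder.
TEMPLATE = ("The city councilmen refused the demonstrators a permit "
            "because they {verb} violence.")

# Intended referent of "they" for each verb, as described above.
EXPECTED_ANTECEDENT = {"feared": "councilmen", "advocated": "demonstrators"}

def winograd_pair(template, expected):
    """Return (sentence, intended antecedent) for every verb variant."""
    return [(template.format(verb=verb), antecedent)
            for verb, antecedent in expected.items()]

pair = winograd_pair(TEMPLATE, EXPECTED_ANTECEDENT)
for sentence, antecedent in pair:
    print(f"{sentence}  ->  they = {antecedent}")
```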

Let's see how well SpaCy does with these challenging examples.

In [9]:
doc = nlp(u'The lion saw the fish and it pounced.')
doc._.has_coref
doc._.coref_clusters

[The lion: [The lion, it]]

In [10]:
doc = nlp(u'The lion saw the fish and it was swimming.')
doc._.has_coref
doc._.coref_clusters

[The lion: [The lion, it]]

It looks like this neuralcoref model matches pronouns by type (person vs object), gender, and number (singular vs plural). But if there are two nouns that match, it seems to keep picking the first one, regardless of the rest of the sentence context.
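The behavior we seem to be observing can be caricatured as a "first compatible noun" rule. The sketch below is only an illustration of that hypothesis, not how neuralcoref works internally; the `Mention` class and the hand-written features are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mention:
    text: str
    kind: str                     # "person" or "object" (animals count as "object" here)
    number: str                   # "singular" or "plural"
    gender: Optional[str] = None  # "male", "female", or None when unknown

def agrees(pronoun, candidate):
    """True when type and number match and gender does not conflict."""
    if pronoun.kind != candidate.kind or pronoun.number != candidate.number:
        return False
    # Gender only rules a candidate out when both sides specify one.
    return None in (pronoun.gender, candidate.gender) or pronoun.gender == candidate.gender

def first_compatible(pronoun, candidates):
    """Return the first candidate, in text order, that agrees with the pronoun."""
    for candidate in candidates:
        if agrees(pronoun, candidate):
            return candidate.text
    return None

it = Mention("it", "object", "singular")
candidates = [Mention("The lion", "object", "singular"),
              Mention("the fish", "object", "singular")]

# The rule never looks at the verb, so both sentences get the same answer.
for sentence in ("The lion saw the fish and it pounced.",
                 "The lion saw the fish and it was swimming."):
    print(sentence, "->", first_compatible(it, candidates))
```

Because the rule ignores everything after the pronoun, it reproduces the pattern in the two outputs above: the same antecedent for both sentences.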

[Return to Top](#returnToTop)  
<a id = 'corefBERT'></a>

### 4.1 Coref Resolution via Contextualized BERT Embeddings

What if we tried to use contextualized embeddings more directly to solve this problem? Would a contextualized embedding for "it" be more similar to the contextualized embedding for "lion" in the first sentence, and for "fish" in the second?

We could try using the embeddings that come out of a pre-trained BERT model. We aren't fine-tuning them for this task, so they probably won't work super well. But we might be able to see a bigger difference in predicted corefs based on meaningful changes in the sentence context.
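One way to make "more similar" concrete is cosine distance, which is 1 minus the cosine similarity of two vectors (the same quantity `scipy.spatial.distance.cosine` returns). The sketch below uses made-up three-dimensional vectors purely for illustration; real BERT embeddings have hundreds of dimensions and come from the model, not from a hand-written table.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for the same direction, larger when less similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy "embeddings" (invented numbers, not model output).
toy = {
    "it":   [0.9, 0.1, 0.3],
    "lion": [0.8, 0.2, 0.4],
    "fish": [0.1, 0.9, 0.2],
}

# Distance from each candidate noun to the pronoun; the smallest one wins.
distances = {noun: cosine_distance(toy[noun], toy["it"]) for noun in ("lion", "fish")}
closest = min(distances, key=distances.get)
print(distances, "->", closest)
```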

In [11]:
!pip install -q transformers


In [12]:
import numpy as np
from scipy.spatial.distance import cosine

In [13]:
from transformers import TFBertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertModel.from_pretrained('bert-base-uncased')

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


Let's create a function that uses contextualized embeddings from a pre-trained BERT model and picks the noun closest to the pronoun by cosine similarity. It's very simple and there's no fine-tuning for the coref task, so it doesn't work for all cases, but it illustrates another approach to the task. (It gets some things wrong that neuralcoref gets right, because we only use SpaCy to check for nouns and ignore other cues such as person, gender, and number, though those could be added as rules too.)

In [14]:
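def find_pronoun_coref(text, pronoun):
    """Return the noun whose BERT embedding is closest to the pronoun's, plus its cosine distance."""
    # Split the text into WordPiece tokens (no [CLS]/[SEP] are added here) and
    # locate the first occurrence of the pronoun. The tokenizer is uncased, so
    # the pronoun must be passed in lowercase.
    bert_tokens = tokenizer.tokenize(text)
    pronoun_loc = bert_tokens.index(pronoun)

    # Give every WordPiece token the POS tag of the SpaCy token it came from.
    # This assumes that tokenizing word by word yields the same pieces, in the
    # same order, as tokenizing the full text.
    spacy_doc = nlp(text)
    bert_tokens_pos = []
    for spacy_tok in spacy_doc:
        for _ in tokenizer.tokenize(spacy_tok.text):
            bert_tokens_pos.append(spacy_tok.pos_)

    # Run BERT on a batch of one sentence and keep the per-token embeddings,
    # indexed below as [batch, token, hidden].
    input_ids = tokenizer.convert_tokens_to_ids(bert_tokens)
    bert_context_embeds = model.predict(np.array([input_ids]))[0]

    # Cosine distance from every noun or proper noun to the pronoun
    # (smaller distance = more similar).
    nouns_dist_to_pronoun = [(bert_tokens[i],
                              cosine(bert_context_embeds[0, i, :],
                                     bert_context_embeds[0, pronoun_loc, :]))
                             for i in range(len(bert_tokens))
                             if i != pronoun_loc and bert_tokens_pos[i] in {'NOUN', 'PROPN'}]
    # Keep the candidate with the smallest distance.
    closest_noun, closest_dist = min(nouns_dist_to_pronoun, key=lambda x: x[1])
    return closest_noun, closest_dist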
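The part of the function above that is easiest to get wrong is lining up word-level POS tags with subword tokens. Here is a minimal sketch of that step in isolation. `toy_subword_split` is a hypothetical stand-in for a WordPiece tokenizer (the real BERT vocabulary may split words differently), and the tags are written by hand rather than produced by SpaCy.

```python
def toy_subword_split(word):
    """Hypothetical splitter: lowercases and splits off a trailing 'man'."""
    word = word.lower()
    if word.endswith("man") and len(word) > 3:
        return [word[:-3], "##man"]
    return [word]

def expand_tags(words, tags, split=toy_subword_split):
    """Repeat each word-level tag once per subword piece so both lists line up."""
    pieces, piece_tags = [], []
    for word, tag in zip(words, tags):
        for piece in split(word):
            pieces.append(piece)
            piece_tags.append(tag)
    return pieces, piece_tags

words = ["The", "fisherman", "hooked", "a", "fish"]
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]   # hand-written, for illustration
pieces, piece_tags = expand_tags(words, tags)

for piece, tag in zip(pieces, piece_tags):
    print(f"{piece:10s} {tag}")
```

After expansion, position `i` in the subword list and position `i` in the tag list refer to the same piece, which is what lets the function filter candidate tokens by POS using a single index.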

Now we can run some Winograd schema challenge examples through the function and see how well it works.

In [15]:
find_pronoun_coref('The lion saw the fish and it pounced.', 'it')



('lion', 0.2446916699409485)

In [16]:
find_pronoun_coref('The lion saw the fish and it was swimming.', 'it')



('fish', 0.23219847679138184)

In [17]:
find_pronoun_coref('The fisherman hooked a big fish but he lost it.', 'he')



('fisherman', 0.18205267190933228)

In [18]:
find_pronoun_coref('The fisherman hooked a big fish but he swam away.', 'he')



('fish', 0.185979425907135)

In [19]:
find_pronoun_coref('The girls ate the apples because they were hungry.', 'they')



('girls', 0.2043570876121521)

In [20]:
find_pronoun_coref('The girls ate the apples because they were ripe.', 'they')



('apples', 0.23722773790359497)
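The six calls above can be collected into a small scoring loop. In this sketch the intended antecedents are our own reading of each sentence, and `recorded_resolver` is a stand-in that simply replays the nouns printed in the cells above so the code runs without loading BERT; in the notebook you would pass `find_pronoun_coref` instead.

```python
# (sentence, pronoun, intended antecedent)
EXAMPLES = [
    ("The lion saw the fish and it pounced.", "it", "lion"),
    ("The lion saw the fish and it was swimming.", "it", "fish"),
    ("The fisherman hooked a big fish but he lost it.", "he", "fisherman"),
    ("The fisherman hooked a big fish but he swam away.", "he", "fish"),
    ("The girls ate the apples because they were hungry.", "they", "girls"),
    ("The girls ate the apples because they were ripe.", "they", "apples"),
]

# Nouns returned in the cells above, keyed by sentence (distances omitted).
RECORDED = {
    EXAMPLES[0][0]: "lion",
    EXAMPLES[1][0]: "fish",
    EXAMPLES[2][0]: "fisherman",
    EXAMPLES[3][0]: "fish",
    EXAMPLES[4][0]: "girls",
    EXAMPLES[5][0]: "apples",
}

def recorded_resolver(sentence, pronoun):
    """Stand-in with the same return shape as find_pronoun_coref: (noun, distance)."""
    return RECORDED[sentence], None

def accuracy(resolver, examples):
    """Fraction of examples where the resolver picks the intended antecedent."""
    hits = sum(resolver(sentence, pronoun)[0] == expected
               for sentence, pronoun, expected in examples)
    return hits / len(examples)

score = accuracy(recorded_resolver, EXAMPLES)
print(f"accuracy on {len(EXAMPLES)} examples: {score:.2f}")
```

Six hand-picked sentences are far too few to say anything general, so treat the number as a sanity check rather than an evaluation.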

Can you come up with other examples that involve an ambiguous pronoun and that BERT contextualized embeddings get right?