# Lecture 13 - Coreference Resolution

provided by [Stanford CS224N](https://www.youtube.com/watch?v=rmVRLeJRkl4)

---

<div class="alert alert-block alert-info">
Table of Contents: <br>
    
<ul>
    <li>1. <a href="#1.-Introduction">Introduction</a></li>
    <li>2. <a href="#2.-What-Is-Coreference-Resolution">What is Coreference Resolution?</a></li>
    <li>3. <a href="#3.-Types-of-References">Types of References</a></li>
    <li>4. <a href="#4.-Types-of-Coreference-Models">Types of Coreference Models</a></li>
    <li>5. <a href="#5.-Resource">Resource</a></li>
</ul>
</div>

# 1. Introduction

The lecture is structured as follows:

1. What is coreference resolution?
2. Applications 
3. Mention Detection
4. Types of References (Linguistics)
5. Rule-based Coreference Resolution
6. Mention-pair/Mention-ranking
7. ConvNets for Language
8. SOTA Neural Coreference Systems
9. Results

# 2. What is Coreference Resolution?

__Coreference Resolution__ is the task of identifying all mentions that refer to the same entity in the world. For example, in _Vincent ate the spaghetti. He thought it was great._, "He" refers to Vincent and "it" refers to the spaghetti.

This sub-field of NLP is one of many cornerstones of teaching a machine to understand and produce useful language. Its applications include full-text understanding (e.g., information extraction, question answering, summarization), machine translation, and dialogue systems.

Coreference Resolution is traditionally done in 2 parts: 
* detect the mentions (easy) 
* cluster the mentions (hard)

We can define a __mention__ as a span of text referring to some entity. Mentions come in three kinds:
* pronouns
    * he, it
* named entities
    * Paris
* noun phrases
    * the spaghetti

To detect them, we can use a part-of-speech tagger for pronouns, a named entity recognition (NER) system for named entities, and a parser for noun phrases.
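As a rough illustration of those three detectors, here is a sketch that extracts candidate mentions from tokens that already carry Penn Treebank POS tags and BIO named-entity tags. The hand-written tags and the tiny determiner-adjective-noun chunker (standing in for a real parser) are assumptions for illustration, not the lecture's system:

```python
from typing import List, Tuple

# Each token is (word, Penn Treebank POS tag, BIO named-entity tag).
Token = Tuple[str, str, str]
Mention = Tuple[int, int, str]  # (start, end_exclusive, kind)


def detect_mentions(tokens: List[Token]) -> List[Mention]:
    """Return candidate mention spans, sorted by position."""
    mentions: List[Mention] = []

    # 1. Pronouns: personal (PRP) and possessive (PRP$) tags.
    for i, (_, pos, _) in enumerate(tokens):
        if pos in ("PRP", "PRP$"):
            mentions.append((i, i + 1, "pronoun"))

    # 2. Named entities: merge a B- tag and the I- tags that follow it.
    i = 0
    while i < len(tokens):
        if tokens[i][2].startswith("B-"):
            j = i + 1
            while j < len(tokens) and tokens[j][2].startswith("I-"):
                j += 1
            mentions.append((i, j, "named_entity"))
            i = j
        else:
            i += 1

    # 3. Noun phrases: toy chunker for DT JJ* NN+ (stand-in for a parser).
    i = 0
    while i < len(tokens):
        if tokens[i][1] == "DT":
            j = i + 1
            while j < len(tokens) and tokens[j][1].startswith("JJ"):
                j += 1
            k = j
            while k < len(tokens) and tokens[k][1] in ("NN", "NNS"):
                k += 1
            if k > j:  # require at least one noun
                mentions.append((i, k, "noun_phrase"))
                i = k
                continue
        i += 1

    return sorted(mentions)


# Hand-tagged example: "Vincent ate the spaghetti. He thought it was great."
tokens = [
    ("Vincent", "NNP", "B-PER"), ("ate", "VBD", "O"), ("the", "DT", "O"),
    ("spaghetti", "NN", "O"), (".", ".", "O"), ("He", "PRP", "O"),
    ("thought", "VBD", "O"), ("it", "PRP", "O"), ("was", "VBD", "O"),
    ("great", "JJ", "O"), (".", ".", "O"),
]

for start, end, kind in detect_mentions(tokens):
    print(kind, " ".join(word for word, _, _ in tokens[start:end]))
# named_entity Vincent
# noun_phrase the spaghetti
# pronoun He
# pronoun it
```

Mention detection is deliberately over-generating here: it is cheaper to propose too many candidates and let the clustering step discard them than to miss a mention outright.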

# 3. Types of References

Specifically, __coreference__ is when 2 mentions refer to the same entity. A related linguistic concept is __anaphora__, where a term (the _anaphor_) depends on another term (the _antecedent_) for its interpretation. For example,

Barack Obama said he would sign the bill.

Here "he" is the anaphor and "Barack Obama" is its antecedent; the two mentions are also coreferent.

![image.png](attachment:image.png) <br>
_Figure 1. Coreference vs Anaphora._

A detected anaphoric relation _may_ therefore also be a coreference, but not every anaphoric relation is coreferential:

Every dancer twisted her knee. 

No dancer twisted her knee.

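In both sentences "her" is anaphoric to the quantified phrase ("every dancer" / "no dancer"), but that phrase does not refer to any specific dancer, so neither "her" nor "her knee" picks out a specific entity and there is no coreference. Another kind of anaphora without coreference is __bridging anaphora__, where the anaphor is related to its antecedent without being the same entity, as in _We went to see a concert last night. The tickets were really expensive._ When the anaphor is a pronoun that corefers with its antecedent, as in the Barack Obama example, it is called __pronominal anaphora__. __Cataphora__ reverses the usual order: the referring expression comes first and the expression it refers to follows it.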

Coreference and anaphora are just 2 examples of how we build a __discourse model__ of whatever we are listening to or reading.

# 4. Types of Coreference Models

There are 4 types of coreference models:
* rule-based (pronominal anaphora resolution)
* mention-pair
* mention-ranking
* clustering (skipped in this lecture)

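__Hobbs' naive algorithm__ is a rule-based approach to pronominal anaphora resolution: it walks the syntactic parse trees of the current and preceding sentences in a fixed order and proposes the first noun phrase that satisfies its constraints (e.g., gender and number agreement with the pronoun).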

![image.png](attachment:image.png) <br>
![image-2.png](attachment:image-2.png) <br>
_Figure 2. Hobbs' naive algorithm._
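The full algorithm has more steps than fit here, so the sketch below covers only the part that searches earlier sentences: each previous sentence's parse tree, most recent first, is traversed left to right and breadth-first, and the first NP that agrees with the pronoun is proposed. It omits the search inside the pronoun's own sentence and the path constraints that go with it. The tuple tree format, the hypothetical `FEATURES` agreement lexicon, and the "last word is the head" rule are assumptions for illustration:

```python
from collections import deque
from typing import Iterator, List, Optional, Tuple, Union

# A node is (label, children) for phrases or (pos_tag, word) for leaves.
Tree = Tuple[str, Union[str, list]]

# Hypothetical agreement lexicon: word -> (gender, number).
FEATURES = {
    "Obama": ("masc", "sg"),
    "bill": ("neut", "sg"),
    "he": ("masc", "sg"),
    "it": ("neut", "sg"),
}


def words(tree: Tree) -> List[str]:
    """Collect the words under a node, left to right."""
    _, rest = tree
    if isinstance(rest, str):
        return [rest]
    return [w for child in rest for w in words(child)]


def noun_phrases_bfs(tree: Tree) -> Iterator[Tree]:
    """Yield NP nodes in left-to-right, breadth-first order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        label, rest = node
        if label == "NP":
            yield node
        if not isinstance(rest, str):
            queue.extend(rest)


def propose_antecedent(pronoun: str, previous: List[Tree]) -> Optional[List[str]]:
    """Search earlier sentences (most recent first) for an agreeing NP."""
    target = FEATURES[pronoun.lower()]
    for tree in reversed(previous):
        for np in noun_phrases_bfs(tree):
            head = words(np)[-1]  # crude head rule: last word of the NP
            if FEATURES.get(head) == target:
                return words(np)
    return None


# "Barack Obama signed the bill."
sentence = ("S", [
    ("NP", [("NNP", "Barack"), ("NNP", "Obama")]),
    ("VP", [("VBD", "signed"), ("NP", [("DT", "the"), ("NN", "bill")])]),
])

print(propose_antecedent("he", [sentence]))  # ['Barack', 'Obama']
print(propose_antecedent("it", [sentence]))  # ['the', 'bill']
```

Breadth-first order is what makes the subject NP "Barack Obama" get proposed before the object NP "the bill", which mirrors the preference for subjects built into Hobbs' traversal.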

There is an additional nuance: pronominal coreference can also be __knowledge-based__, meaning syntax alone cannot resolve it.

She poured water from the pitcher into the cup until it was full.

She poured water from the pitcher into the cup until it was empty.

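In the first sentence "it" refers to the cup; in the second, to the pitcher. The two sentences differ by a single word, and only world knowledge (a cup fills up as water is poured into it, while the pitcher empties) tells you which reading is correct. Sentence pairs like these are called __Winograd Schemas__.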

The __mention-pair__ approach is another method for coreference resolution: we train a binary classifier that assigns every pair of mentions a probability of being coreferent.

![image.png](attachment:image.png) <br> 
_Figure 3. Mention-pair approach._

We train the classifier with a binary cross-entropy loss: the predicted probability of each truly coreferent pair is pushed toward 1, and that of each non-coreferent (negative) pair toward 0.
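A minimal sketch of that objective, assuming the classifier's pairwise probabilities are already computed and stored in a dict keyed by `(j, i)` with `j < i` (a layout chosen here for illustration; the numbers are made up):

```python
import math
from typing import Dict, Tuple

Pair = Tuple[int, int]  # (j, i) with j < i


def mention_pair_loss(probs: Dict[Pair, float], gold: Dict[Pair, int]) -> float:
    """Binary cross-entropy summed over every scored mention pair.

    probs[(j, i)]: predicted probability that mentions j and i corefer.
    gold[(j, i)]:  1 if they corefer in the annotation, else 0.
    """
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for pair, p in probs.items():
        y = gold[pair]
        loss -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return loss


# Illustrative numbers: one coreferent pair, two negative pairs.
probs = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.1}
gold = {(0, 1): 1, (0, 2): 0, (1, 2): 0}
print(round(mention_pair_loss(probs, gold), 4))  # 0.4339
```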

Because every pair of mentions is scored, the number of classifier decisions grows quadratically: a document with n mentions yields n(n-1)/2 pairs.
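At test time the pairwise decisions still have to be turned into clusters. One common recipe is to pick a threshold (e.g. 0.5), link every pair scored above it, and take the transitive closure of those links; a union-find structure does this directly. `pair_prob` is a hypothetical callback standing in for the trained classifier, and the example scores are made up:

```python
from itertools import combinations
from typing import Callable, Dict, List


def cluster_mentions(n_mentions: int,
                     pair_prob: Callable[[int, int], float],
                     threshold: float = 0.5) -> List[List[int]]:
    """Link every pair scored above `threshold`; clusters = transitive closure."""
    parent = list(range(n_mentions))  # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All n(n-1)/2 pairs are scored, which is where the quadratic cost comes from.
    for j, i in combinations(range(n_mentions), 2):
        if pair_prob(j, i) > threshold:
            parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for m in range(n_mentions):
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


# Mentions: 0 "Vincent", 1 "the spaghetti", 2 "He", 3 "it" (made-up scores).
scores = {(0, 2): 0.9, (1, 3): 0.8}
print(cluster_mentions(4, lambda j, i: scores.get((j, i), 0.1)))
# [[0, 2], [1, 3]]
```

The transitive closure also means a single spurious link above the threshold is enough to merge two otherwise separate clusters.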

Mention-ranking refines this idea: for each mention, the model scores all candidate antecedents and assigns the mention its single highest-scoring one. A dummy NA candidate is added at the front of the candidate list so that a mention does not have to be linked to any earlier mention, e.g. when it is the first mention of its entity.
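A minimal sketch of the ranking step for a single mention, assuming the model has already produced a score for the NA candidate and for each earlier mention. A softmax turns the scores into a distribution; prediction takes the argmax, and training (in one common formulation) minimizes the negative log of the total probability assigned to all gold antecedents, with NA as the target for a mention that starts a new entity. Function names and example scores are hypothetical:

```python
import math
from typing import List, Sequence

NA = -1  # returned when the dummy "no antecedent" candidate wins


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_antecedent(na_score: float, antecedent_scores: Sequence[float]) -> int:
    """Index of the most probable earlier mention, or NA."""
    probs = softmax([na_score, *antecedent_scores])  # position 0 is NA
    best = max(range(len(probs)), key=probs.__getitem__)
    return NA if best == 0 else best - 1


def ranking_loss(na_score: float, antecedent_scores: Sequence[float],
                 gold: Sequence[int]) -> float:
    """-log of the total probability on any gold antecedent (NA if none)."""
    probs = softmax([na_score, *antecedent_scores])
    if not gold:
        return -math.log(probs[0])
    return -math.log(sum(probs[j + 1] for j in gold))


# Mention "it", earlier mentions: 0 "Vincent", 1 "the spaghetti", 2 "He".
print(rank_antecedent(0.0, [0.5, 3.0, 1.0]))  # 1 ("the spaghetti")
print(rank_antecedent(0.0, []))               # -1 (NA: first mention)
```

Summing over all gold antecedents lets the model pick whichever earlier mention of the entity is easiest to link, instead of forcing one particular antecedent.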

The scoring model can be a non-neural statistical classifier, a simple feed-forward neural network, or a more complex model (e.g., one built on LSTMs and attention).

![image.png](attachment:image.png) <br> 
_Figure 4. Neural network for coreference resolution._

![image.png](attachment:image.png) <br>
_Figure 5. BiLSTM for coreference resolution._

At the time of the lecture, state-of-the-art systems are built on BERT!

![image.png](attachment:image.png) <br>
_Figure 6. Different method performances._

# 5. Resource

If you missed the link right below the title, here it is again, along with the course website.

- [Stanford CS224N](https://www.youtube.com/watch?v=rmVRLeJRkl4)
- [Course Website](http://web.stanford.edu/class/cs224n/)

This is a series of 23 lectures provided by Stanford.
