In [None]:
#!pip install spacy==2.1.0
!python -m spacy download en_core_web_sm
!pip install neuralcoref

import spacy
import neuralcoref

nlp = spacy.load('en_core_web_sm')
neuralcoref.add_to_pipe(nlp)

# Lab 9: Coreference

The statement for the ninth practical of the subject is:

1. Consider the first paragraph in Alice’s Adventures in Wonderland, by Lewis Carroll:

```
Alice was beginning to get very tired of sitting by her sister on the bank, 
and of having nothing to do: once or twice she had peeped into the book her 
sister was reading, but it had no pictures or conversations in it, ‘and what 
is the use of a book,’ thought Alice ‘without pictures or conversations?’
```

2. Apply the spaCy coreference solver to the previous paragraph.
3. Show the coreference chains.
4. What do you think about them? Justify your answer.

In [None]:
first_paragraph = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?'"
doc = nlp(first_paragraph)

In [None]:
# Has any coreference been resolved in the Doc?
doc._.has_coref

True

In [None]:
# All the clusters of coreferring mentions in the Doc
for cluster in doc._.coref_clusters:
  print('Entity:', cluster[0])
  print('Coreferences to entity:', cluster[0:])
  print()

Entity: Alice
Coreferences to entity: [Alice, her, she, her, Alice ']

Entity: her sister
Coreferences to entity: [her sister, her sister]

Entity: it
Coreferences to entity: [it, it]



In [None]:
# Increase the greedyness so that 'it' is linked to a better antecedent
nlp.remove_pipe("neuralcoref")
neuralcoref.add_to_pipe(nlp, greedyness=0.52)
doc = nlp(first_paragraph)
# All the clusters of coreferring mentions in the Doc
for cluster in doc._.coref_clusters:
  print('Entity:', cluster[0])
  print('Coreferences to entity:', cluster[0:])
  print()

Entity: Alice
Coreferences to entity: [Alice, her, she, her, Alice ']

Entity: her sister
Coreferences to entity: [her sister, her sister]

Entity: the book her sister was reading
Coreferences to entity: [the book her sister was reading, it, it]



In [None]:
# Testing with a higher greedyness
nlp.remove_pipe("neuralcoref")
neuralcoref.add_to_pipe(nlp, greedyness=0.56)
doc = nlp(first_paragraph)
# All the clusters of coreferring mentions in the Doc
for cluster in doc._.coref_clusters:
  print('Entity:', cluster[0])
  print('Coreferences to entity:', cluster[0:])
  print()

Entity: Alice
Coreferences to entity: [Alice, her, she, her, Alice, Alice ']

Entity: sitting by her sister on the bank
Coreferences to entity: [sitting by her sister on the bank, the bank]

Entity: her sister
Coreferences to entity: [her sister, her sister]

Entity: the book her sister was reading
Coreferences to entity: [the book her sister was reading, it, it]



In [None]:
# Testing with an even higher greedyness
nlp.remove_pipe("neuralcoref")
neuralcoref.add_to_pipe(nlp, greedyness=0.6)
doc = nlp(first_paragraph)
# All the clusters of coreferring mentions in the Doc
for cluster in doc._.coref_clusters:
  print('Entity:', cluster[0])
  print('Coreferences to entity:', cluster[0:])
  print()

Entity: Alice
Coreferences to entity: [Alice, sitting by her sister on the bank, her, her sister, the bank, having nothing to do, nothing to do, she, the book her sister was reading, her, her sister, it, no pictures or conversations, conversations, it, the use of a book, a book, Alice, Alice ', pictures or conversations, conversations]



## Conclusions

Coreference resolution determines which mentions in a discourse refer to the same real-world entity, property, or situation. It applies to many areas of NLP: for example, machine reading systems need it to fully understand a text, and it also supports information extraction, automatic summarization, and question answering.

In the STS task, this might not seem as useful, since we usually have only one sentence and its coreferences would normally be resolved from previous context. However, it might be the case that one sentence mentions an entity by name more than once, while the other sentence names it once and then refers to it with pronouns. In this case, resolving the coreferences could potentially improve the Jaccard similarity.
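To make the Jaccard argument concrete, here is a minimal sketch in plain Python. The two sentences are invented for illustration, the pronouns in the second one are resolved by hand, and the helper names `tokens` and `jaccard` are hypothetical. In practice the resolved string could come from neuralcoref's `doc._.coref_resolved`, but that step is not run here.

```python
import re

def tokens(sentence):
    """Lowercase word tokens as a set (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def jaccard(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# Hypothetical sentence pair: same meaning, but s2 uses pronouns.
s1 = "Alice saw the book and Alice liked the book"
s2 = "Alice saw the book and she liked it"
# s2 with 'she' and 'it' replaced by their antecedents (done by hand here).
s2_resolved = "Alice saw the book and Alice liked the book"

print("before resolution:", jaccard(s1, s2))
print("after resolution: ", jaccard(s1, s2_resolved))
```

The pronouns `she` and `it` only add tokens to the union, so replacing them with their antecedents raises the score for this pair. The gain depends on the resolver choosing the right antecedent; a wrong link would lower the similarity instead.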

**What do you think about them? Justify your answer.**

Observing the obtained clusters, we see that the main entities and their coreferences have been found. However, in the default case (greediness at 0.5), the entity `book` was not coreferenced properly. Instead of taking `book` as the entity, the cluster took `it`, which is not what we want, since `it` is a pronoun and carries little meaning by itself.
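One way to detect this situation automatically is to look for a mention in the cluster that is not a bare pronoun. The sketch below works on plain strings copied from the outputs above rather than on neuralcoref objects. The pronoun list is a small hand-written assumption and `representative` is a hypothetical helper name.

```python
# Small hand-written pronoun list (an assumption, not exhaustive).
PRONOUNS = {
    "i", "me", "my", "you", "your", "he", "him", "his",
    "she", "her", "hers", "it", "its", "we", "us", "our",
    "they", "them", "their",
}

def representative(mentions):
    """Return the first mention that is not a bare pronoun, or None."""
    for mention in mentions:
        if mention.lower() not in PRONOUNS:
            return mention
    return None

# Mentions copied from the clusters printed earlier.
default_it_cluster = ["it", "it"]                                    # greedyness 0.5
tuned_it_cluster = ["the book her sister was reading", "it", "it"]  # greedyness 0.52

print(representative(default_it_cluster))
print(representative(tuned_it_cluster))
```

A `None` result flags a cluster whose mentions are all pronouns, which is the case for the default `it` cluster.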

Increasing the greediness, where greedier means more coreference links, we see that the cluster is no longer headed by `it` and instead gets an entity that is not a pronoun. Nevertheless, the entity is perhaps too long, as it is `the book her sister was reading`. Trying to fine-tune the greediness to get only `book` seems fruitless, since reducing the parameter takes us back to the default case, where `it` was taken as the entity.

Moreover, we see that increasing the greediness even further degrades the coreferences. For example, with `0.56` we get a coreference from `the bank` to `sitting by her sister on the bank`, which does not seem helpful at all, since they are part of the same sentence. Increasing the greediness to `0.6` agglomerates everything into the `Alice` cluster.
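The degradation can also be summarized numerically. The sketch below uses the cluster sizes counted by hand from the outputs above (it does not rerun neuralcoref) and reports, for each greedyness value, the number of clusters and the share of mentions in the largest cluster. The name `largest_share` is a hypothetical helper, and this share is only one possible indicator, not a standard coreference metric.

```python
# Cluster sizes counted by hand from the printed outputs above,
# keyed by the greedyness value that produced them.
cluster_sizes = {
    0.50: [5, 2, 2],
    0.52: [5, 2, 3],
    0.56: [6, 2, 2, 3],
    0.60: [21],
}

def largest_share(sizes):
    """Fraction of all clustered mentions that fall in the largest cluster."""
    return max(sizes) / sum(sizes)

for greedyness, sizes in cluster_sizes.items():
    print(f"greedyness={greedyness:.2f}  clusters={len(sizes)}  "
          f"largest share={largest_share(sizes):.2f}")
```

A share of 1.0 with a single cluster, as at `0.6`, means every mention was merged into one entity.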
