# Coreference Resolution

In this notebook, we investigate methods for coreference resolution. In NLP, coreference resolution identifies the expressions in a text that refer to the same entity, so that each one can be replaced by a single representative mention. For example, the text \
**'My brother loves to play basketball. He plays as a point guard'** \
is resolved as \
**'My brother loves to play basketball. My brother plays as a point guard'**.\
\
Making every reference explicit reduces the ambiguity that pronouns and other indirect references introduce in a wide variety of text sources.
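
As a minimal sketch of what "resolving" means, the snippet below replaces each mention with its cluster's representative text, using the example sentence above. The cluster is hand-annotated as character spans and `resolve` is a hypothetical helper, not part of any library used later in this notebook; a real system has to predict those spans.

```python
def resolve(text, clusters):
    """Replace every listed mention with its cluster's representative.

    `clusters` is a list of (representative, [(start, end), ...]) pairs,
    where each (start, end) is the character span of a mention in `text`.
    """
    replacements = [
        (start, end, representative)
        for representative, spans in clusters
        for start, end in spans
    ]
    # Apply replacements right to left so earlier offsets stay valid.
    for start, end, representative in sorted(replacements, reverse=True):
        text = text[:start] + representative + text[end:]
    return text


text = "My brother loves to play basketball. He plays as a point guard"
he_start = text.index("He plays")
# Hand-annotated cluster: "He" refers back to "My brother".
clusters = [("My brother", [(he_start, he_start + len("He"))])]
print(resolve(text, clusters))
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution, which matters once a text contains several mentions.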

In [1]:
starwars_text = 'Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).'
starwars_text

'Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).'

In [2]:
import spacy
from stanfordcorenlp import StanfordCoreNLP
import json
from collections import defaultdict
import nltk

# Stanford CoreNLP

In [3]:
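# Start a local CoreNLP server from the unpacked distribution directory
nlp = StanfordCoreNLP("../stanford-corenlp-4.2.0", quiet=False)
# Run the statistical coreference annotator; the wrapper returns a JSON string
annotated = nlp.annotate(starwars_text, properties={'annotators': 'coref', 'pipelineLanguage': 'en', 'coref.algorithm': 'statistical'})
result = json.loads(annotated)
corefs = result['corefs']
# Shut down the Java backend once the annotations have been retrieved
nlp.close()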

In [4]:
corefs

{'7': [{'id': 1,
   'text': 'Darth Vader , also known by his birth name Anakin Skywalker',
   'type': 'PROPER',
   'number': 'SINGULAR',
   'gender': 'MALE',
   'animacy': 'ANIMATE',
   'startIndex': 1,
   'endIndex': 12,
   'headIndex': 2,
   'sentNum': 1,
   'position': [1, 2],
   'isRepresentativeMention': True},
  {'id': 2,
   'text': 'his',
   'type': 'PRONOMINAL',
   'number': 'SINGULAR',
   'gender': 'MALE',
   'animacy': 'ANIMATE',
   'startIndex': 7,
   'endIndex': 8,
   'headIndex': 7,
   'sentNum': 1,
   'position': [1, 3],
   'isRepresentativeMention': False},
  {'id': 0,
   'text': 'Anakin Skywalker',
   'type': 'PROPER',
   'number': 'SINGULAR',
   'gender': 'MALE',
   'animacy': 'ANIMATE',
   'startIndex': 10,
   'endIndex': 12,
   'headIndex': 11,
   'sentNum': 1,
   'position': [1, 1],
   'isRepresentativeMention': False},
  {'id': 6,
   'text': 'Darth Vader',
   'type': 'PROPER',
   'number': 'SINGULAR',
   'gender': 'MALE',
   'animacy': 'ANIMATE',
   'startIndex': 1
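
The raw dictionary is hard to scan, so a short summary per chain can help. The sketch below works on a trimmed, hand-copied sample of the structure shown above, so that it runs without a CoreNLP server. `summarize_chains` is a hypothetical helper, and it assumes each chain flags one mention with `isRepresentativeMention`, as in the output above.

```python
# Trimmed sample of the `corefs` structure (only the fields used below).
sample_corefs = {
    "7": [
        {"id": 1,
         "text": "Darth Vader , also known by his birth name Anakin Skywalker",
         "type": "PROPER", "sentNum": 1, "isRepresentativeMention": True},
        {"id": 2, "text": "his",
         "type": "PRONOMINAL", "sentNum": 1, "isRepresentativeMention": False},
        {"id": 0, "text": "Anakin Skywalker",
         "type": "PROPER", "sentNum": 1, "isRepresentativeMention": False},
    ]
}


def summarize_chains(corefs):
    """Map each chain's representative text to its other mentions."""
    summary = {}
    for chain in corefs.values():
        # Fall back to the first mention if none is flagged (assumption).
        representative = next(
            (m for m in chain if m["isRepresentativeMention"]), chain[0]
        )
        summary[representative["text"]] = [
            (m["text"], m["type"], m["sentNum"])
            for m in chain
            if m is not representative
        ]
    return summary


for representative, mentions in summarize_chains(sample_corefs).items():
    print(representative, "->", mentions)
```

The same function can be called on the real `corefs` dictionary from the cell above, since it only reads the `text`, `type`, `sentNum`, and `isRepresentativeMention` fields.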

In [5]:
import sys
sys.path.append('../')
from text_to_graph import process_NER, process_corefs, process_dependency_matching

In [6]:
# Perform Named Entity Recognition with spaCy
ner_dict = process_NER(text=starwars_text)
print('***** Completed NER *****')
    
# Generate Coreferences and Dependencies with CoreNLP
corefs = process_corefs(text=starwars_text, corenlp_path='../stanford-corenlp-4.2.0')
print("Coreferences found: ", len(corefs))

# Perform Replacement with Named Entities and Dependencies
resolved_text = process_dependency_matching(text=starwars_text, ner_dict=ner_dict, corefs=corefs)
print('***** Completed Coreference Resolution *****')

***** Completed NER *****
Coreferences found:  3
***** Completed Coreference Resolution *****


In [7]:
resolved_text

'Anakin Skywalker , is a fictional character in the Star Wars franchise. Anakin Skywalker appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. Anakin Skywalker was created by George Lucas and has been portrayed by numerous actors. Anakin Skywalker appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. Anakin Skywalker is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, Anakin Skywalker falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine( also known as Darth Sidious) . '
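
The resolved text above contains phrases such as 'Anakin Skywalker appearances span', where the possessive pronoun 'His' was replaced by the bare name. One possible refinement, sketched here as an assumption and not as the behavior of `text_to_graph`, is to add a possessive suffix whenever the replaced mention is a possessive pronoun. The helper `replacement_for` and the pronoun list are hypothetical.

```python
# Possessive determiners. "her" can also be an object pronoun; it is
# treated as possessive here for simplicity.
POSSESSIVE_PRONOUNS = {"his", "her", "its", "their", "my", "your", "our"}


def replacement_for(mention_text, representative):
    """Return the text to substitute for a mention.

    Possessive pronouns get the representative plus "'s"; every other
    mention is replaced by the representative unchanged.
    """
    if mention_text.lower() in POSSESSIVE_PRONOUNS:
        return representative + "'s"
    return representative


print(replacement_for("His", "Anakin Skywalker") + " appearances span ...")
print(replacement_for("He", "Anakin Skywalker") + " is also ...")
```

A lookup on the surface form is the simplest option; using the part-of-speech tag of the mention would be more robust where a tagger is already in the pipeline.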

# NeuralCoref

In [8]:
import neuralcoref

In [9]:
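# NeuralCoref targets spaCy 2.x, where the 'en' shortcut loads the installed English model
nlp = spacy.load('en')
# Register NeuralCoref as a pipeline component; results are exposed under doc._
neuralcoref.add_to_pipe(nlp)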

doc = nlp('My sister has a dog. She loves him.')

print(doc._.has_coref)
print(doc._.coref_clusters)
print(doc._.coref_resolved)

True
[My sister: [My sister, She], a dog: [a dog, him]]
My sister has a dog. My sister loves a dog.


### Large Text Example

In [10]:
doc = nlp(starwars_text)
doc._.has_coref

True

In [11]:
doc._.coref_clusters

[Darth Vader: [Darth Vader, his],
 Darth Vader: [Darth Vader, his, his],
 the original film trilogy: [the original film trilogy, the prequel trilogy],
 The character: [The character, His, his, his character, He, he, his],
 the Force: [the Force, the Force]]

In [12]:
doc._.coref_resolved

'Darth Vader, also known by Darth Vader birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while Darth Vader past as Anakin Skywalker and the story of Darth Vader corruption are central to the narrative of the original film trilogy. The character was created by George Lucas and has been portrayed by numerous actors. The character appearances span the first six Star Wars films, as well as Rogue One, and The character is heavily referenced in Star Wars: The Force Awakens. The character is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, The character falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of The character Sith master, Emperor Palpatine (also known as Darth