# Coreference Resolution

In this notebook, we investigate methods for coreference resolution. In NLP, coreference resolution identifies the expressions in a text (names, noun phrases, pronouns) that refer to the same entity. For example, given the text \
**'My brother loves to play basketball. He plays as a point guard'**, \
the pronoun is replaced by the entity it refers to: \
**'My brother loves to play basketball. My brother plays as a point guard'**.\
\
Replacing pronouns with explicit mentions removes a source of ambiguity that is common across a wide variety of text sources.
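
The replacement step described above can be sketched without any NLP library. This is a minimal illustration, assuming a resolver has already produced clusters of character spans and that the first span in each cluster is the representative mention; the `resolve` function and the span format are hypothetical and are not taken from any of the libraries used below.

```python
from typing import List, Tuple

# A cluster is a list of (start, end) character spans.
# Assumption of this sketch: the first span is the representative mention.
Cluster = List[Tuple[int, int]]


def resolve(text: str, clusters: List[Cluster]) -> str:
    """Replace every non-representative mention with its representative."""
    replacements = []
    for cluster in clusters:
        rep_start, rep_end = cluster[0]
        representative = text[rep_start:rep_end]
        for start, end in cluster[1:]:
            replacements.append((start, end, representative))
    # Apply from right to left so earlier offsets stay valid.
    for start, end, representative in sorted(replacements, reverse=True):
        text = text[:start] + representative + text[end:]
    return text


text = "My brother loves to play basketball. He plays as a point guard"
he = text.index("He")
clusters = [[(0, len("My brother")), (he, he + len("He"))]]
resolved = resolve(text, clusters)
print(resolved)
```

Working from the end of the string backwards avoids recomputing offsets after each substitution.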

In [9]:
starwars_text = 'Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).'
starwars_text

'Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).'

In [3]:
import spacy
from stanfordcorenlp import StanfordCoreNLP
import json
from collections import defaultdict
import nltk

# Stanford CoreNLP

In [21]:
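# Start a local CoreNLP server from the unpacked distribution (requires Java)
nlp = StanfordCoreNLP("../stanford-corenlp-4.2.0", quiet=False)
# Request coreference annotations using the statistical coref model
annotated = nlp.annotate(starwars_text, properties={'annotators': 'coref', 'pipelineLanguage': 'en', 'coref.algorithm': 'statistical'})
# The wrapper returns the raw response string; parse it as JSON
result = json.loads(annotated)
corefs = result['corefs']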

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
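
A `JSONDecodeError` at `char 0` means the string handed to `json.loads` did not start with a JSON value, for example because the server returned an empty body or a plain-text message. The notebook does not show which of these happened, so the sketch below only makes the failure easier to inspect. The helper names are hypothetical, and the `text` and `isRepresentativeMention` fields reflect my understanding of CoreNLP's JSON `corefs` layout; treat them as an assumption. The sample payload is hand-written for illustration and is not real server output.

```python
import json


def parse_corenlp_json(raw):
    """Parse a server response, failing with a readable message."""
    if not raw or not raw.strip():
        raise ValueError("CoreNLP returned an empty response")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        snippet = raw.strip()[:200]
        raise ValueError(f"CoreNLP response is not JSON: {snippet!r}") from err


def summarize_corefs(corefs):
    """Map each chain id to (representative text, other mention texts)."""
    summary = {}
    for chain_id, mentions in corefs.items():
        if not mentions:
            continue
        # Fall back to the first mention if none is flagged as representative.
        rep = next(
            (m for m in mentions if m.get("isRepresentativeMention")),
            mentions[0],
        )
        others = [m["text"] for m in mentions if m is not rep]
        summary[chain_id] = (rep["text"], others)
    return summary


# Hand-written sample payload (illustrative only).
sample = (
    '{"corefs": {"1": ['
    '{"text": "My brother", "isRepresentativeMention": true}, '
    '{"text": "He", "isRepresentativeMention": false}]}}'
)
chains = summarize_corefs(parse_corenlp_json(sample)["corefs"])
print(chains)
```

Printing the first characters of a non-JSON response usually reveals whether the server timed out, rejected the properties, or failed to start.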

In [16]:
import sys
sys.path.append('../')
from text_to_graph import process_NER, process_corefs, process_dependency_matching

In [17]:
# Perform Named Entity Recognition with spaCy
ner_dict = process_NER(text=starwars_text)
print('***** Completed NER *****')
    
# Generate Coreferences and Dependencies with CoreNLP
corefs = process_corefs(text=starwars_text, corenlp_path='../stanford-corenlp-4.2.0')
print("Coreferences found: ", len(corefs))

# Perform Replacement with Named Entities and Dependencies
resolved_text = process_dependency_matching(text=starwars_text, ner_dict=ner_dict, corefs=corefs)
print('***** Completed Coreference Resolution *****')

***** Completed NER *****
Coreferences found:  3
***** Completed Coreference Resolution *****


In [19]:
starwars_text

'Luke Skywalker is a fictional character and the main protagonist of the original film trilogy of the Star Wars franchise created by George Lucas. \nThe character, portrayed by Mark Hamill, is an important figure in the Rebel Alliance\'s struggle against the Galactic Empire. \nHe is the twin brother of Rebellion leader Princess Leia Organa of Alderaan, a friend and brother-in-law of smuggler Han Solo, an apprentice to Jedi Masters Obi-Wan "Ben" Kenobi and Yoda, the son of fallen Jedi Anakin Skywalker (Darth Vader) and Queen of Naboo/Republic Senator Padmé Amidala and maternal uncle of Kylo Ren / Ben Solo. \nThe now non-canon Star Wars expanded universe depicts him as a powerful Jedi Master, husband of Mara Jade, the father of Ben Skywalker and maternal uncle of Jaina, Jacen and Anakin Solo., In 2015, the character was selected by Empire magazine as the 50th greatest movie character of all time.\nOn their list of the 100 Greatest Fictional Characters, Fandomania.com ranked the character

In [18]:
resolved_text

"Luke Skywalker is a fictional character and the main protagonist of the original film trilogy of the Star Wars franchise created by George Lucas. The character , portrayed by Mark Hamill , is an important figure in the Rebel Alliance 's struggle against the Galactic Empire. The character , portrayed by Mark Hamill is the twin brother of Rebellion leader Princess Leia Organa of Alderaan, a friend and brother-in-law of smuggler Han Solo, an apprentice to Jedi Masters Obi-Wan `` Ben '' Kenobi and Yoda, the son of fallen Jedi Anakin Skywalker( Darth Vader) and Queen of Naboo/Republic Senator Padmé Amidala and maternal uncle of Kylo Ren/ Ben Solo. The now non-canon Star Wars expanded universe depicts him as a powerful Jedi Master, husband of Mara Jade, the father of Ben Skywalker and maternal uncle of Jaina, Jacen and Anakin Solo., In 2015, the character was selected by Empire magazine as the 50th greatest movie character of all time. On The now non-canon Star Wars list of the 100 Greatest

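Comparing long strings such as `starwars_text` and `resolved_text` by eye is error-prone. A minimal sketch using the standard library's `difflib` lists the token spans that differ between two texts; the `changed_spans` helper is hypothetical and splits on whitespace only, so tokenizer spacing changes (for example a space inserted before a comma) will also be reported as differences.

```python
import difflib


def changed_spans(original, resolved):
    """Return (before, after) pairs for every span that differs."""
    a = original.split()
    b = resolved.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


original = "My brother loves to play basketball. He plays as a point guard"
resolved = "My brother loves to play basketball. My brother plays as a point guard"
changes = changed_spans(original, resolved)
print(changes)
```

In the notebook this could be called as `changed_spans(starwars_text, resolved_text)` to see which pronouns were replaced and by what.
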
# NeuralCoref

In [1]:
import neuralcoref
import spacy
nlp = spacy.load("en_core_web_lg")

In [4]:
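raw_text = 'Hello, world. Here are two sentences.'
# Add a rule-based sentence splitter to the pipeline (spaCy 2.x API)
nlp.add_pipe(nlp.create_pipe('sentencizer'))
doc = nlp(raw_text)
# Use .text rather than the deprecated .string attribute
sentences = [sent.text.strip() for sent in doc.sents]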
print(sentences)

['Hello, world.', 'Here are two sentences.']


In [2]:

neuralcoref.add_to_pipe(nlp)

doc = nlp('My sister has a dog. She loves him.')

print(doc._.has_coref)
print(doc._.coref_clusters)
print(doc._.coref_resolved)

True
[My sister: [My sister, She], a dog: [a dog, him]]
My sister has a dog. My sister loves a dog.


### Large Text Example

In [10]:
doc = nlp(starwars_text)
doc._.has_coref

True

In [11]:
doc._.coref_clusters

[Darth Vader: [Darth Vader, his],
 Darth Vader: [Darth Vader, his, his],
 the original film trilogy: [the original film trilogy, the prequel trilogy],
 The character: [The character, His, his, his character, He, he, his],
 the Force: [the Force, the Force]]
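
The output above contains two separate clusters whose main mention is `Darth Vader`. One simple way to tidy this up is to merge clusters that share the same main mention text. The sketch below works on plain `(main, mentions)` string pairs copied from the output; with NeuralCoref objects these pairs could be built from `cluster.main.text` and `[m.text for m in cluster.mentions]`. Merging on identical text is a heuristic of this sketch, not something the library does, and it will not link `The character` to `Darth Vader`.

```python
def merge_by_main(clusters):
    """Merge clusters whose main mention text matches (case-insensitive).

    clusters: list of (main_text, [mention_text, ...]) pairs.
    """
    merged = {}
    for main, mentions in clusters:
        key = main.lower()
        if key not in merged:
            # Keep the first spelling seen as the canonical main mention.
            merged[key] = (main, [])
        merged[key][1].extend(mentions)
    return list(merged.values())


# Cluster contents copied from the notebook output above.
clusters = [
    ("Darth Vader", ["Darth Vader", "his"]),
    ("Darth Vader", ["Darth Vader", "his", "his"]),
    ("the original film trilogy",
     ["the original film trilogy", "the prequel trilogy"]),
    ("The character",
     ["The character", "His", "his", "his character", "He", "he", "his"]),
    ("the Force", ["the Force", "the Force"]),
]
merged = merge_by_main(clusters)
for main, mentions in merged:
    print(main, "->", mentions)
```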

In [12]:
doc._.coref_resolved

'Darth Vader, also known by Darth Vader birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while Darth Vader past as Anakin Skywalker and the story of Darth Vader corruption are central to the narrative of the original film trilogy. The character was created by George Lucas and has been portrayed by numerous actors. The character appearances span the first six Star Wars films, as well as Rogue One, and The character is heavily referenced in Star Wars: The Force Awakens. The character is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, The character falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of The character Sith master, Emperor Palpatine (also known as Darth
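
In the resolved text above, possessive pronouns are replaced by the bare main mention, which yields phrases such as `Darth Vader birth name`. A small post-processing rule can restore the possessive form. The sketch below is a hypothetical helper, not part of NeuralCoref, and it assumes mention spans are available as character offsets. The pronoun list is deliberately short: `her` is left out because it can be either possessive or objective, and `his` is treated as possessive even though it can occasionally stand alone.

```python
# Pronouns treated as possessive determiners in this sketch.
POSSESSIVE_PRONOUNS = {"his", "its", "their", "my", "your", "our"}


def replacement_for(mention, main):
    """Return the main mention, in possessive form when the pronoun was."""
    if mention.lower() in POSSESSIVE_PRONOUNS:
        suffix = "'" if main.endswith("s") else "'s"
        return main + suffix
    return main


def resolve_with_possessives(text, main, mention_spans):
    """Replace each (start, end) span with the suitably inflected main mention."""
    # Work right to left so that earlier offsets remain valid.
    for start, end in sorted(mention_spans, reverse=True):
        text = text[:start] + replacement_for(text[start:end], main) + text[end:]
    return text


text = "Darth Vader, also known by his birth name Anakin Skywalker"
start = text.index("his")
fixed = resolve_with_possessives(text, "Darth Vader", [(start, start + 3)])
print(fixed)
```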

In [4]:
! pip install neuralcoref



In [5]:
import spacy
import neuralcoref

nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)
doc1 = nlp('My sister has a dog. She loves him.')
print(doc1._.coref_clusters)

doc2 = nlp('Angela lives in Boston. She is quite happy in that city.')
for ent in doc2.ents:
    print(ent._.coref_cluster)

AttributeError: type object 'neuralcoref.neuralcoref.array' has no attribute '__reduce_cython__'
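
An `AttributeError` raised from inside a compiled `neuralcoref` module is often a sign that the installed binary was built against a different spaCy version than the one in the environment. That is an assumption about this environment, not something the traceback confirms. A first diagnostic step is to record which versions are installed. The sketch below uses `importlib.metadata` (Python 3.8 or later); the `installed_versions` helper is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["spacy", "neuralcoref"])
print(report)
```

If the versions turn out to be mismatched, the NeuralCoref README describes building the package from source against the installed spaCy as one remedy.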

In [1]:
import spacy
import neuralcoref

nlp = spacy.load('en')



In [2]:
neuralcoref.add_to_pipe(nlp)

<spacy.lang.en.English at 0x7f8e9212bb10>

In [None]:
doc1 = nlp('My sister has a dog. She loves him.')