[Coreference]((https://en.wikipedia.org/wiki/Coreference)) resolution is the task of finding all expressions that refer to the same entity in a text.

# Using Spacy

## Library

In [11]:
# Load your usual SpaCy model (one of SpaCy English models)
import spacy
nlp = spacy.load('en_core_web_sm')

# Add neural coref to SpaCy's pipe
import neuralcoref
neuralcoref.add_to_pipe(nlp)

<spacy.lang.en.English at 0x2468621af28>

## Example

In [12]:
examples = [
    u'My sister has a dog and she loves him.',
    u'My sister has a dog and she loves him. He is cute.',
    u'My sister has a dog and she loves her.',
    u'My brother has a dog and he loves her.',
    u'Mary and Julie are sisters. They love chocolates.',
    u'John and Mary are neighbours. She admires him because he works hard.',
    u'X and Y are neighbours. She admires him because he works hard.',
    u'The dog chased the cat. But it escaped.',
]

## Code

In [13]:
def printMentions(doc):
    print('\nAll the "mentions" in the given text:')
    for cluster in doc._.coref_clusters:
        print(cluster.mentions)

def printPronounReferences(doc):
    print('\nPronouns and their references:')
    for token in doc:
        if token.pos_ == 'PRON' and token._.in_coref:
            for cluster in token._.coref_clusters:
                print(token.text + " => " + cluster.main.text)
                
def processDoc(text):
    doc = nlp(text)
    if doc._.has_coref:
        print("Given text: " + text)
        printMentions(doc)
        printPronounReferences(doc)

In [14]:
if __name__ == "__main__":
    processDoc(examples[3])

Given text: My brother has a dog and he loves her.

All the "mentions" in the given text:
[My brother, he, her]

Pronouns and their references:
he => My brother
her => My brother


In [18]:
# Or
# You're done. You can now use NeuralCoref as you usually manipulate a SpaCy document annotations.
doc = nlp(u'Carol told Bobi to attend the party. They arrived together.')

doc._.has_coref
doc._.coref_clusters

[the party: [the party, They]]

# Using StanfordNLP

**Steps:**
1. Install java 64 bit (in cmd try java -d64 -version)
2. Follow instructions from https://www.khalidalnajjar.com/setup-use-stanford-corenlp-server-python/
    * Go to [link](https://stanfordnlp.github.io/CoreNLP/index.html#download) and install Standofrd CoreNLP
    * Extract it in a folder in C drive.
    * Run Stanford NLP Java Server
    * **java -mx2g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000**
    * Change ram size in -mx2g folder.
3. "CTRL-C" command kills the server and releases the memory.
4. [Link](https://stanfordnlp.github.io/CoreNLP/memory-time.html) for memory/time optimisation.

In [1]:
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')

In [2]:
def resolve(corenlp_output):
    """ Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
    for coref in corenlp_output['corefs']:
        mentions = corenlp_output['corefs'][coref]
        antecedent = mentions[0]  # the antecedent is the first mention in the coreference chain
        for j in range(1, len(mentions)):
            mention = mentions[j]
            if mention['type'] == 'PRONOMINAL':
                # get the attributes of the target mention in the corresponding sentence
                target_sentence = mention['sentNum']
                target_token = mention['startIndex'] - 1
                # transfer the antecedent's word form to the appropriate token in the sentence
                corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']
                
def print_resolved(corenlp_output):
    """ Print the "resolved" output """
    possessives = ['hers', 'his', 'their', 'theirs']
    for sentence in corenlp_output['sentences']:
        for token in sentence['tokens']:
            output_word = token['word']
            # check lemmas as well as tags for possessive pronouns in case of tagging errors
            if token['lemma'] in possessives or token['pos'] == 'PRP$':
                output_word += "'s"  # add the possessive morpheme
            output_word += token['after']
            print(output_word, end='')

In [3]:
text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
       "hers is blue. It is older than hers. The big cat ate its dinner."

output = nlp.annotate(text, properties= {'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})

resolve(output)

print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)

Original: Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but hers is blue. It is older than hers. The big cat ate its dinner.
Resolved: Tom and Jane are good friends. Tom and Jane are cool. Tom knows a lot of things and so does Jane. Tom's car is red, but Jane's is blue. His car is older than Jane's. The big cat ate His car's dinner.

# Stanford Vs. Spacy

In [18]:
text = ["The music was so loud that it couldn't be enjoyed.",
        "Our neighbors dislike the music. If they are angry, the cops will show up soon.",
        "If they are angry about the music, the neighbors will call the cops.",
        "Despite her difficulty, Wilma came to understand the point.",
        "Carol told Bobi to attend the party. They arrived together.",
        "When Carol helps Bob and Bob helps Carol, they can accomplish any task.",
        "The project leader is refusing to help. The jerk thinks only of himself.",
        "Some of our colleagues are going to be supportive. These kinds of people will earn our gratitude.",
        "The trophy would not fit in the brown suitcase because it was too big."]        

In [10]:
for i in range(0, len(text)):
    output = nlp.annotate(text[i], properties= {'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})

    resolve(output)

    print('Original:', text[i])
    print('Resolved: ', end='')
    print_resolved(output)
    print()

Original: The music was so loud that it couldn't be enjoyed.
Resolved: The music was so loud that The music couldn't be enjoyed.
Original: Our neighbors dislike the music. If they are angry, the cops will show up soon.
Resolved: Our's neighbors dislike the music. If Our neighbors are angry, the cops will show up soon.
Original: If they are angry about the music, the neighbors will call the cops.
Resolved: If they are angry about the music, the neighbors will call the cops.
Original: Despite her difficulty, Wilma came to understand the point.
Resolved: Despite her's difficulty, Wilma came to understand the point.
Original: Carol told Bobi to attend the party. They arrived together.
Resolved: Carol told Bobi to attend the party. They arrived together.
Original: When Carol helps Bob and Bob helps Carol, they can accomplish any task.
Resolved: When Carol helps Bob and Bob helps Carol, Bob and Bob can accomplish any task.
Original: The project leader is refusing to help. The jerk thinks onl

In [19]:
for i in range(0, len(text)):
    if __name__ == "__main__":
        processDoc(text[i])
        print()

Given text: The music was so loud that it couldn't be enjoyed.

All the "mentions" in the given text:
[The music, it]

Pronouns and their references:
it => The music

Given text: Our neighbors dislike the music. If they are angry, the cops will show up soon.

All the "mentions" in the given text:
[Our neighbors, they]

Pronouns and their references:
they => Our neighbors


Given text: Despite her difficulty, Wilma came to understand the point.

All the "mentions" in the given text:
[her, Wilma]

Pronouns and their references:

Given text: Carol told Bobi to attend the party. They arrived together.

All the "mentions" in the given text:
[the party, They]

Pronouns and their references:
They => the party

Given text: When Carol helps Bob and Bob helps Carol, they can accomplish any task.

All the "mentions" in the given text:
[Carol, Carol]
[Bob, Bob]
[Bob and Bob, they]

Pronouns and their references:
they => Bob and Bob

Given text: The project leader is refusing to help. The jerk thin

In [43]:
#!/usr/bin/env python
# coding: utf8
"""Example of training an additional entity type

This script shows how to add a new entity type to an existing pretrained NER
model. To keep the example short and simple, only four sentences are provided
as examples. In practice, you'll need many more — a few hundred would be a
good start. You will also likely need to mix in examples of other entity
types, which might be obtained by running the entity recognizer over unlabelled
sentences, and adding their annotations to the training set.

The actual training is performed by looping over the examples, and calling
`nlp.entity.update()`. The `update()` method steps through the words of the
input. At each word, it makes a prediction. It then consults the annotations
provided on the GoldParse instance, to see whether it was right. If it was
wrong, it adjusts its weights so that the correct action will score higher
next time.

After training your model, you can save it to a directory. We recommend
wrapping models as Python packages, for ease of deployment.

For more details, see the documentation:
* Training: https://spacy.io/usage/training
* NER: https://spacy.io/usage/linguistic-features#named-entities

Compatible with: spaCy v2.1.0+
Last tested with: v2.1.0
"""
from __future__ import unicode_literals, print_function

import plac
import random
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding


# new entity label
LABEL = "ANIMAL"

# training data
# Note: If you're using an existing model, make sure to mix in examples of
# other entity types that spaCy correctly recognized before. Otherwise, your
# model might learn the new type, but "forget" what it previously knew.
# https://explosion.ai/blog/pseudo-rehearsal-catastrophic-forgetting
TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]



model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
new_model_name=("New model name for model meta.", "option", "nm", str),
output_dir=("Optional output directory", "option", "o", Path),
n_iter=("Number of training iterations", "option", "n", int),

def main(model=None, new_model_name="animal", output_dir=None, n_iter=30):
    """Set up the pipeline and entity recognizer, and train the new entity."""
    random.seed(0)
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe("ner")

    ner.add_label(LABEL)  # add new entity label to entity recognizer
    # Adding extraneous labels shouldn't mess anything up
    ner.add_label("VEGETABLE")
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()
    move_names = list(ner.move_names)
    # get names of other pipes to disable them during training
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    with nlp.disable_pipes(*other_pipes):  # only train NER
        sizes = compounding(1.0, 4.0, 1.001)
        # batch up the examples using spaCy's minibatch
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            batches = minibatch(TRAIN_DATA, size=sizes)
            losses = {}
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
            print("Losses", losses)

    # test the trained model
    test_text = "Do you like horses?"
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)

    # save model to output directory
    nlp.meta["name"] = new_model_name  # rename model
    nlp.to_disk('C:/Users/saurabhkumar9/1. NLP Practicum Materials')
    print("Saved model to", output_dir)

    # test the saved model
    nlp2 = spacy.load(Path.cmd())
    
    # Check the classes have loaded back consistently
    assert nlp2.get_pipe("ner").move_names == move_names
    doc2 = nlp2(test_text)
    for ent in doc2.ents:
        print(ent.label_, ent.text)


if __name__ == "__main__":
    main()

Created blank 'en' model
Losses {'ner': 38.61012762784958}
Losses {'ner': 17.51923118904233}
Losses {'ner': 8.510706414787819}
Losses {'ner': 9.275895185564423}
Losses {'ner': 8.27842292211426}
Losses {'ner': 3.285048616497079}
Losses {'ner': 2.629565456355863}
Losses {'ner': 0.5143076098867678}
Losses {'ner': 0.14073449907551744}
Losses {'ner': 0.013571078565430711}
Losses {'ner': 0.0005333663947689162}
Losses {'ner': 0.0003102885546442968}
Losses {'ner': 1.4933599401414127e-05}
Losses {'ner': 0.008191097285029036}
Losses {'ner': 3.778854443138536e-06}
Losses {'ner': 9.192545628506831e-07}
Losses {'ner': 8.910108077462803e-05}
Losses {'ner': 0.003805166581499559}
Losses {'ner': 0.0007198828174248596}
Losses {'ner': 1.2225052562323564e-07}
Losses {'ner': 6.200159303778063e-07}
Losses {'ner': 3.2238199033687577e-06}
Losses {'ner': 5.50518720640107e-08}
Losses {'ner': 3.573447064233407e-08}
Losses {'ner': 3.775577714707288e-09}
Losses {'ner': 0.0006222276868892166}
Losses {'ner': 7.20147

AttributeError: type object 'Path' has no attribute 'cmd'

In [49]:
examples = [
    ('Who is Shaka Khan?',
     [(7, 17, 'PERSON')]),
    ('I like London and Berlin.',
     [(7, 13, 'LOC'), (18, 24, 'LOC')])
]

In [53]:
results

{'uas': 0.0,
 'las': 0.0,
 'ents_p': 0.0,
 'ents_r': 0.0,
 'ents_f': 0.0,
 'tags_acc': 0.0,
 'token_acc': 100.0}

In [None]:
'''
P.Loss : Parser loss
N.Loss : NER loss 
UAS : Unlabelled attachment score for parser 
NER P. : NER Precision on development data
NER R. : NER recall on development data
NER F. : NER F on development data
Tag % :Tag accuracy on development data 
T Token % : Tokenization accuracy on development data (irrelevant if you use the .iob format, which prevents you from learning from incorrectly tokenized text).
'''