Implement and apply Lesk’s algorithm to the publicly available data set of SemEval 2013 Shared Task #12 (Navigli and Jurgens, 2013), using NLTK’s interface to WordNet v3.0 as your lexical resource.
(Be sure you are using WordNet v3.0!) The relevant files are available on the course website. Starter
code is also provided to help you load the data. More information on the data set can be found at
https://www.cs.york.ac.uk/semeval-2013/task12/.
The provided code will load all of the cases that you are to resolve, along with their sentential context.
Apply word tokenization and lemmatization (you have code to do this from A1) as necessary, and remove
stop words.
As a first step, compare the following two methods for WSD:

1. The most frequent sense baseline: this is the sense numbered #1 among the word’s synsets in WordNet
2. NLTK’s implementation of Lesk’s algorithm (nltk.wsd.lesk)
Use accuracy as the evaluation measure. There is sometimes more than one correct sense annotated in
the key. If that is the case, you may consider an automatic system correct if it resolves the word to any
one of those senses. What do you observe about the results?
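The accuracy rule above (a prediction counts as correct if it matches any one of the annotated senses) can be sketched in plain Python. This is a minimal illustration, not the notebook's own `accuracy_of`; the function name `multi_gold_accuracy` and the toy instance ids and sense labels are hypothetical.

```python
def multi_gold_accuracy(predicted, gold):
    """Fraction of instances whose predicted sense is among the gold senses.

    predicted: dict instance_id -> predicted sense label
    gold: dict instance_id -> collection of acceptable sense labels
    A missing prediction counts as incorrect.
    """
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(1 for key, senses in gold.items() if predicted.get(key) in senses)
    return correct / len(gold)

# Toy data with hypothetical ids and labels; i1 has two acceptable senses.
gold = {"i1": {"sense_a", "sense_b"}, "i2": {"sense_c"}, "i3": {"sense_d"}}
predicted = {"i1": "sense_b", "i2": "sense_e", "i3": "sense_d"}
print(multi_gold_accuracy(predicted, gold))  # 2 of 3 instances correct
```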

# Imports and Environment Setup

In [1]:
%run loader.py

from tqdm import tqdm # progress bar
import functools # LRU cache
import string # for punctuation
import numpy as np # linspace for hyperparameter optimization

In [2]:
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

[nltk_data] Downloading package wordnet to
[nltk_data]     /home/2013/dgarfi/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [3]:
nltk.download('stopwords')
lemmatizer = nltk.stem.WordNetLemmatizer()
stopwords = set(nltk.corpus.stopwords.words('english'))
stopwords.add('--')
stopwords.add('@card')
stopwords.add('``')
for c in string.punctuation:
    stopwords.add(c)

@functools.lru_cache(maxsize=65536)
def lemmatize_sentence(sentence):
    sentence_list = sentence.split()
    return [lemmatizer.lemmatize(word) for word in sentence_list if word not in stopwords]

def preprocess_wsdinstance(wsdinstance):
    wsdinstance.context = lemmatize_sentence(" ".join(wsdinstance.context))
    return wsdinstance
    
dev_instances = { k: preprocess_wsdinstance(wsdinstance) for k, wsdinstance in dev_instances.items() }
test_instances = { k: preprocess_wsdinstance(wsdinstance) for k, wsdinstance in test_instances.items() }

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/2013/dgarfi/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# Looking at our expected data

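Let's take a look at the distribution of sense numbers among the gold synsets in our test set. The sense number is the two-digit suffix of each synset's name (e.g. `04` in `state.n.04`); it is the sense's position among the synsets of the synset's first lemma, so it need not equal the sense's rank for the target word itself.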

In [4]:
def get_synsets_from_keys(keys_dict):
    """
    :param keys_dict: a dictionary mapping instance ids to lists of WordNet sense keys
    :return dict: the same dictionary, with each sense key replaced by its Synset object
    """
    return { k: [wn.lemma_from_key(key).synset() for key in list_of_keys]
               for k, list_of_keys in keys_dict.items()}

expected_dev = get_synsets_from_keys(dev_key)
expected_test = get_synsets_from_keys(test_key)

list(expected_test.items())[:5]

[('d009.s029.t002', [Synset('state.n.04')]),
 ('d011.s019.t002', [Synset('spain.n.01')]),
 ('d010.s022.t001', [Synset('wife.n.01')]),
 ('d012.s015.t007', [Synset('pressure.n.02')]),
 ('d012.s012.t002', [Synset('immigration.n.01')])]

In [5]:
from collections import Counter

sense_counter = Counter()
for synsets in expected_test.values():
    for synset in synsets:
        # rsplit guards against lemmas that themselves contain '.'
        lemma, pos, sense = synset.name().rsplit('.', 2)
        sense_counter[sense] += 1

total = sum(sense_counter.values())
{ sense: count / total for sense, count in sense_counter.items() }

{'01': 0.6449175824175825,
 '02': 0.18269230769230768,
 '03': 0.08173076923076923,
 '04': 0.04945054945054945,
 '05': 0.011675824175824176,
 '06': 0.008241758241758242,
 '07': 0.0027472527472527475,
 '08': 0.0027472527472527475,
 '09': 0.0020604395604395605,
 '10': 0.0027472527472527475,
 '11': 0.008928571428571428,
 '12': 0.0006868131868131869,
 '20': 0.0006868131868131869,
 '29': 0.0006868131868131869}

# Calculating Accuracy

In [6]:
def juxtapose_predictions(predicted, expected):
    """
    :param predicted dict: instance id -> predicted Synset
    :param expected dict: instance id -> list of gold Synsets
    :return dict: instance id -> [is_correct, human-readable details...]
    """
    assert len(predicted) == len(expected)

    res = {}
    for key in predicted:
        if predicted[key] in expected[key]:
            res[key] = [
                True,
                "Correct synset: " + predicted[key].name(),
                "Correct definition: " + predicted[key].definition()]
        else:
            res[key] = [
                False,
                "Predicted synset: " + predicted[key].name(),
                "Predicted definition: " + predicted[key].definition(),
                "Expected synsets: " + ",".join(x.name() for x in expected[key]),
                "Expected definitions: " + ",".join(x.definition() for x in expected[key])]
    return res

def false_predictions(predicted, expected):
    return { k: v for k, v in juxtapose_predictions(predicted, expected).items() if v[0] is False }

def accuracy_of(predicted, expected):
    """
    :param predicted dict: instance id -> predicted Synset
    :param expected dict: instance id -> list of gold Synsets
    :return float: the fraction of instances in expected that are correctly predicted

    An instance counts as correctly predicted if the predicted Synset is any one of its gold Synsets.
    """
    assert len(predicted) == len(expected)

    jux = juxtapose_predictions(predicted, expected)
    good = sum(1 for verdict in jux.values() if verdict[0])
    return good / float(len(jux))

# Part 1: Baseline and Lesk's Algorithm

In [7]:
def predict_with(method, instance_dict):
    return { k: method(wsd) for k, wsd in instance_dict.items() }
    
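def baseline(target_wsdinstance):
    # Most frequent sense: WordNet lists a lemma's synsets by sense number, which
    # roughly follows frequency in sense-tagged corpora, so take the first one.
    # No part-of-speech filter is applied here.
    return wn.synsets(target_wsdinstance.lemma)[0]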

predicted_baseline_test = predict_with(baseline, test_instances)

accuracy_of(predicted_baseline_test, expected_test)

0.623448275862069

In [8]:
predicted_lesk = { k: nltk.wsd.lesk(wsd.context, wsd.lemma) for k, wsd in test_instances.items() }

accuracy_of(predicted_lesk, expected_test)

0.29586206896551726
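As we understand it, `nltk.wsd.lesk` scores each candidate synset by how many words its definition shares with the context. The sketch below is a generic simplified Lesk over precomputed signature sets, not NLTK's exact implementation; the function name `simplified_lesk` and the labels `"sense 1"` to `"sense 3"` are hypothetical. The signatures and context are the lemmatized *week* sets printed by the Part 2 debug cell.

```python
def simplified_lesk(context_tokens, sense_signatures):
    """Return the sense whose signature overlaps most with the context.

    context_tokens: iterable of preprocessed context words
    sense_signatures: list of (sense_label, iterable of signature words),
        in dictionary order; ties go to the earliest sense.
    """
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense, signature in sense_signatures:
        overlap = len(context & set(signature))
        if overlap > best_overlap:  # strict '>' keeps the earliest sense on ties
            best_sense, best_overlap = sense, overlap
    return best_sense

# Lemmatized definitions and context for "week" (copied from the debug output).
signatures = [
    ("sense 1", {"consecutive", "period", "day", "seven"}),
    ("sense 2", {"work", "day", "week", "calendar", "hour"}),
    ("sense 3", {"consecutive", "Sunday", "seven", "period", "starting", "day"}),
]
context = ["see", "important", "advance", "negotiation", "run", "time", "@card@",
           "world", "leader", "arrive", "Copenhagen", "next", "week"]
print(simplified_lesk(context, signatures))  # only "sense 2" shares a word ("week")
```

With an empty context every overlap is 0, and the strict comparison falls back to the first-listed sense, which coincides with the most frequent sense baseline.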

# Part 2: Combination of Word Sense Frequency & Lesk's Algorithm

Next, develop two additional methods to solve this problem. One of them must combine distributional
information about the frequency of word senses, and the standard Lesk’s algorithm. The other may be
any other method of your design. The two methods must be substantially different; they may not be
simply the same method with a different parameter value. Make and justify decisions about any other
parameters to the algorithms, such as what exactly to include in the sense and context representations,
how to compute overlap, and how to trade off the distributional and the Lesk signal, with the use of
the development set, which the starter code will load for you. You may use any heuristic, probabilistic
model, or other statistical method that we have discussed in class in order to combine these two sources
of information.
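One way to read the combination implemented in the cells below (`prior`, `lesks`, `posterior`, `bayesian`) is as Bayes' rule. Writing $s_1, \dots, s_n$ for the lemma's WordNet senses in listed order, $C$ for the preprocessed context, $o_i = |\mathrm{sig}(s_i) \cap C|$ for the overlap between the lemmatized definition of $s_i$ and the context, and $p$, $q$ for the prior and odds powers, the cells compute the following (the symbols are our notation, not the assignment's):

$$
P(s_i) = \frac{(n - i + 1)^{p}}{\sum_{x=1}^{n} x^{p}},
\qquad
P(C \mid s_i) =
\begin{cases}
\dfrac{o_i^{q}}{\sum_{j=1}^{n} o_j^{q}} & \text{if } \sum_{j=1}^{n} o_j > 0,\\[2ex]
\dfrac{1}{n} & \text{otherwise,}
\end{cases}
$$

$$
P(s_i \mid C) = \frac{P(s_i)\,P(C \mid s_i)}{\sum_{j=1}^{n} P(s_j)\,P(C \mid s_j)},
\qquad
\hat{s} = \arg\max_{s_i} P(s_i)\,P(C \mid s_i).
$$

Larger $p$ pushes the prior toward the first-listed sense; larger $q$ sharpens the Lesk term.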

In [9]:
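def prior(wsdinstance, synset, power=1):
    """
    Rank-based prior P(synset) over the lemma's WordNet senses.
    The first-listed (most frequent) sense gets rank n and the last gets rank 1;
    ranks are raised to `power` and normalised to sum to 1, so larger powers
    concentrate more probability mass on the first sense.
    """
    wordnet_synsets = wn.synsets(wsdinstance.lemma)
    n = len(wordnet_synsets)

    rank = n - wordnet_synsets.index(synset)
    normalization_const = sum(pow(x, power) for x in range(1, n + 1))
    return pow(rank, power) / float(normalization_const)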
     
## Debug ##
wsd = list(dev_instances.values())[4]
print(wsd.id)
print(wsd.context)
print(wsd.lemma)
prior(wsd, wn.synsets(wsd.lemma)[0], 8)

d001.s004.t007
['see', 'important', 'advance', 'negotiation', 'run', 'time', '@card@', 'world', 'leader', 'arrive', 'Copenhagen', 'next', 'week']
week


0.9623056614843063
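The debug value above can be checked by hand: *week* has three senses, so with power 8 the first sense gets $3^8 / (1^8 + 2^8 + 3^8) = 6561 / 6818 \approx 0.9623$. A WordNet-free sketch of the same rank prior (the name `rank_prior` is hypothetical) makes the arithmetic explicit:

```python
def rank_prior(num_senses, index, power=1.0):
    """Prior for the sense at position `index` (0 = first-listed) among num_senses senses.

    The first-listed sense gets rank num_senses and the last gets rank 1;
    each probability is rank**power / sum(x**power for x in 1..num_senses).
    """
    rank = num_senses - index
    normaliser = sum(x ** power for x in range(1, num_senses + 1))
    return rank ** power / normaliser

priors = [rank_prior(3, i, power=8) for i in range(3)]
print(priors[0])  # 6561 / 6818, matching the debug output for "week"
```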

In [10]:
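# Note: higher powers performed poorly here
def lesks(wsdinstance, target_synset, power=1, verbose=False):
    """
    Lesk-based likelihood of target_synset: its (overlap ** power) normalised over
    all senses of the lemma, where overlap counts the words shared by the lemmatized
    definition and the context. Falls back to a uniform distribution when no sense
    overlaps the context.
    """
    wordnet_synsets = wn.synsets(wsdinstance.lemma)
    target_index = wordnet_synsets.index(target_synset)
    signatures = [set(lemmatize_sentence(synset.definition())) for synset in wordnet_synsets]
    pattern = set(wsdinstance.context)
    overlaps = [pow(len(pattern.intersection(signature)), power) for signature in signatures]
    if verbose:  # debug output only; keeps prediction runs quiet
        print(signatures)
        print(pattern)
        print(overlaps)

    if sum(overlaps) == 0:
        return 1 / float(len(wordnet_synsets))
    return overlaps[target_index] / float(sum(overlaps))

## Debug ##
lesks(wsd, wn.synsets(wsd.lemma)[0], power=4, verbose=True)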

[{'consecutive', 'period', 'day', 'seven'}, {'work', 'day', 'week', 'calendar', 'hour'}, {'consecutive', 'Sunday', 'seven', 'period', 'starting', 'day'}]
{'leader', 'run', 'time', 'week', 'world', 'see', 'negotiation', 'Copenhagen', 'next', 'important', '@card@', 'arrive', 'advance'}
[0, 1, 0]


0.0
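The 0.0 above follows directly from the printed overlaps `[0, 1, 0]`: only the second sense shares a word with the context, so after normalisation it receives all of the Lesk probability and the first sense receives none. A WordNet-free sketch of the same computation (the name `overlap_likelihood` is hypothetical; the sets are copied from the debug output):

```python
def overlap_likelihood(context, signatures, target_index, power=1.0):
    """Normalised overlap**power for one sense; uniform if no sense overlaps."""
    context = set(context)
    overlaps = [len(context & set(sig)) ** power for sig in signatures]
    total = sum(overlaps)
    if total == 0:
        return 1.0 / len(signatures)
    return overlaps[target_index] / total

signatures = [
    {"consecutive", "period", "day", "seven"},
    {"work", "day", "week", "calendar", "hour"},
    {"consecutive", "Sunday", "seven", "period", "starting", "day"},
]
context = {"leader", "run", "time", "week", "world", "see", "negotiation",
           "Copenhagen", "next", "important", "@card@", "arrive", "advance"}
print([overlap_likelihood(context, signatures, i, power=4) for i in range(3)])
```

Because the overlaps here are 0 or 1, the power has no effect on this example; it only matters once two senses have different non-zero overlaps.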

In [11]:
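# Bayes'-rule combination: score(s) = prior(s) * odds(s), where `odds` plays
# the role of the likelihood P(context | s).
def posterior_helper(wsdinstance, target_synset, prior, odds):
    return prior(wsdinstance, target_synset) * odds(wsdinstance, target_synset)

def posterior(wsdinstance, target_synset, prior, odds):
    new_evidences = [posterior_helper(wsdinstance, synset, prior, odds) for synset in wn.synsets(wsdinstance.lemma)]
    total = float(sum(new_evidences))
    if total == 0:
        # Every sense scored 0: fall back to a uniform posterior
        return 1 / float(len(new_evidences))
    return posterior_helper(wsdinstance, target_synset, prior, odds) / total

def bayesian(wsdinstance, prior=prior, odds=lesks):
    # The normalising constant is shared by all senses, so the argmax of the
    # unnormalised scores equals the argmax of the posterior (and avoids dividing
    # by zero); ties go to the first-listed sense.
    synsets = wn.synsets(wsdinstance.lemma)
    return max(synsets, key=lambda synset: posterior_helper(wsdinstance, synset, prior, odds))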
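Multiplying the prior by the Lesk term has a side effect worth seeing on the *week* example: the first sense's prior of about 0.96 (power 8) cannot rescue it, because its Lesk probability is exactly 0. A minimal WordNet-free sketch (the names `posterior` and `map_sense` here are hypothetical stand-ins over plain lists):

```python
def posterior(priors, likelihoods):
    """Normalised P(s) * P(c|s) over all senses; None when every product is zero."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0:
        return None
    return [j / total for j in joint]

def map_sense(priors, likelihoods):
    """Index of the sense maximising P(s) * P(c|s); ties go to the earliest sense."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    return max(range(len(joint)), key=joint.__getitem__)

# Rank prior for "week" with power 8, and the Lesk probabilities from the debug cell.
priors = [6561 / 6818, 256 / 6818, 1 / 6818]
likelihoods = [0.0, 1.0, 0.0]
print(map_sense(priors, likelihoods), posterior(priors, likelihoods))
```

Any sense with zero overlap is vetoed whenever some other sense overlaps the context, regardless of how large the prior power is.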

In [12]:
prior_powers = np.linspace(1, 8, 10)
odds_powers = np.linspace(1, 4, 2)  # i.e. [1.0, 4.0]

results = {}

for pp in prior_powers:
    # Bind to new names so the `prior` and `lesks` functions are not shadowed
    prior_fn = functools.partial(prior, power=pp)
    for op in odds_powers:
        odds_fn = functools.partial(lesks, power=op)
        print("Trying pp {}, op {}".format(pp, op))
        #predicted_bayesian_dev = predict_with(functools.partial(bayesian, prior=prior_fn, odds=odds_fn), dev_instances)
        #accuracy = accuracy_of(predicted_bayesian_dev, expected_dev)
        #print("Achieved accuracy " + str(accuracy))
        #results["-".join(["pp" + str(pp), "op" + str(op)])] = accuracy
#predicted_bayesian_test = predict_with(bayesian, test_instances)

#accuracy_of(predicted_bayesian_test, expected_test)

Trying pp 1.0, op 1.0
Trying pp 1.0, op 4.0
Trying pp 1.7777777777777777, op 1.0
Trying pp 1.7777777777777777, op 4.0
Trying pp 2.5555555555555554, op 1.0
Trying pp 2.5555555555555554, op 4.0
Trying pp 3.3333333333333335, op 1.0
Trying pp 3.3333333333333335, op 4.0
Trying pp 4.111111111111111, op 1.0
Trying pp 4.111111111111111, op 4.0
Trying pp 4.888888888888889, op 1.0
Trying pp 4.888888888888889, op 4.0
Trying pp 5.666666666666667, op 1.0
Trying pp 5.666666666666667, op 4.0
Trying pp 6.444444444444445, op 1.0
Trying pp 6.444444444444445, op 4.0
Trying pp 7.222222222222222, op 1.0
Trying pp 7.222222222222222, op 4.0
Trying pp 8.0, op 1.0
Trying pp 8.0, op 4.0
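A grid search over the two powers can keep the swept values separate from the names of the functions being tuned, so `prior` and `lesks` are never rebound to partials between cells. A minimal sketch, assuming a hypothetical `evaluate(pp, op)` callback that returns a dev-set score:

```python
import itertools

def grid_search(evaluate, prior_powers, odds_powers):
    """Score every (prior power, odds power) pair and return the best pair.

    `evaluate(pp, op)` is a hypothetical callback returning a dev-set score.
    """
    scores = {}
    for pp, op in itertools.product(prior_powers, odds_powers):
        scores[(pp, op)] = evaluate(pp, op)
    best = max(scores, key=scores.get)  # first pair in grid order wins on ties
    return best, scores

# Toy objective standing in for dev-set accuracy (hypothetical).
best, scores = grid_search(lambda pp, op: -(pp - 3) ** 2 - (op - 1) ** 2, [1, 2, 3, 4], [1, 4])
print(best)
```

In this notebook, `evaluate` could wrap `accuracy_of(predict_with(functools.partial(bayesian, prior=functools.partial(prior, power=pp), odds=functools.partial(lesks, power=op)), dev_instances), expected_dev)`. Note that `np.linspace(1, 4, 2)` yields only the two values 1.0 and 4.0.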


In [20]:
def context_given_wordsense(wsdinstance, target_synset, power=1):
    """
    Score target_synset against the first-listed synset of each context word:
    sums (1 / path_similarity) ** power, skipping context words with no synsets
    and pairs with no path (None) or zero similarity.
    Note: 1 / similarity grows as two synsets become *less* similar, so this
    score is larger for senses that are farther from the context.
    """
    total = 0
    for w in wsdinstance.context:
        word_synsets = wn.synsets(w)
        if not word_synsets:
            continue
        similarity = target_synset.path_similarity(word_synsets[0]) or 0
        if similarity == 0:
            continue
        total += pow(1 / similarity, power)
    return total

prior_fn = functools.partial(prior, power=6)
odds_fn = functools.partial(context_given_wordsense, power=4)
predicted_bayesian_test = predict_with(functools.partial(bayesian, prior=prior_fn, odds=odds_fn), test_instances)
accuracy_of(predicted_bayesian_test, expected_test)

0.5917241379310345
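Because 1 / similarity grows as two synsets become less related, summing it rewards senses that are farther from the context. If the intent is a score that rises with relatedness, one possible variant (an assumption, not evaluated on this data) sums similarity ** power instead. The sketch below is WordNet-free: `relatedness_score` and the toy similarity table are hypothetical, and `None` mimics a pair with no path.

```python
def relatedness_score(target, context_items, similarity, power=1.0):
    """Sum of similarity(target, c) ** power over context items.

    Items whose similarity is None (no path) or 0 are skipped.
    """
    total = 0.0
    for item in context_items:
        sim = similarity(target, item)
        if sim:  # skips None and 0
            total += sim ** power
    return total

# Hypothetical similarities between two candidate senses and two context items.
toy_sims = {("a", "x"): 0.5, ("a", "y"): None, ("b", "x"): 0.1, ("b", "y"): 0.25}

def toy_similarity(target, item):
    return toy_sims.get((target, item))

print(relatedness_score("a", ["x", "y"], toy_similarity),
      relatedness_score("b", ["x", "y"], toy_similarity))
```

With these toy numbers, sense "a" (more related) scores 0.5 and "b" about 0.35, whereas summing 1 / similarity would give 2.0 for "a" and 14.0 for "b", ranking the less related sense first.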
