# Taaltheorie & Taalverwerking 2017 - Assignment 6

Last week, we used the following **Node** class to represent a simple ontological structure in Python, in which the first argument is used to represent a concept (a word sense) and the second argument is used to represent its immediate hypernym. We use the same class again this week.


In [1]:
# FILL THIS IN FOR YOUR GROUP, also name your file as: tttv_ass6_<group>_<name1>_<name2>.ipynb
# Group        : G
# Name - UvaID : Bram Otten - 10992456
# Name - UvaID : Deborah Lambregts - 11318643
# Date         : 23-05-2017

In [2]:
class Node:
    def __init__(self, node_value, parent_node=None):
        self._set_parent(parent_node)            
        self._children = set()
        self.value = node_value
        
    def _set_parent(self, parent):
        if isinstance(parent, Node):
            self._parent_node = parent
            parent.get_children().add(self)
        else:
            self._parent_node = None
    
    def get_parent(self):
        return self._parent_node    
    
    def get_children(self):
        return self._children

animate_being = Node("animate_being")
animal = Node("animal", animate_being)
mammal = Node("mammal", animal)
carnivore = Node("carnivore", mammal)
feline = Node("feline", carnivore)
cat = Node("cat", feline)
insectivore = Node("insectivore", mammal)
hedgehog = Node("hedgehog", insectivore)

The purpose of these exercises is to implement more tools for exploring a given ontological structure. For all of the examples that follow, we shall assume that the sample ontology above is used (but, of course, the tools you are expected to implement are general and should work for any such ontology).
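
As a quick way to inspect such an ontology, the sketch below prints it as an indented tree. The helper `tree_lines` is a hypothetical name, not part of the assignment, and the `Node` class is repeated in compact form only so that the snippet runs on its own.

```python
class Node:
    """Compact copy of the assignment's Node class, so this snippet is self-contained."""
    def __init__(self, node_value, parent_node=None):
        self.value = node_value
        self._children = set()
        self._parent_node = parent_node if isinstance(parent_node, Node) else None
        if self._parent_node is not None:
            self._parent_node.get_children().add(self)

    def get_parent(self):
        return self._parent_node

    def get_children(self):
        return self._children


def tree_lines(node, depth=0):
    """Return one indented line per concept, each hypernym before its hyponyms."""
    lines = ["  " * depth + node.value]
    # Sort by value: children are stored in a set, which has no fixed order.
    for child in sorted(node.get_children(), key=lambda n: n.value):
        lines.extend(tree_lines(child, depth + 1))
    return lines


animate_being = Node("animate_being")
animal = Node("animal", animate_being)
mammal = Node("mammal", animal)
carnivore = Node("carnivore", mammal)
feline = Node("feline", carnivore)
cat = Node("cat", feline)
insectivore = Node("insectivore", mammal)
hedgehog = Node("hedgehog", insectivore)

print("\n".join(tree_lines(animate_being)))
```

Each level of indentation corresponds to one hypernym step, so **cat** ends up five levels below **animate_being**.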


### Question 1 (6pts total)

In this exercise you will explore the output of a distributional semantic model and compare its semantic similarity ranking to that obtained with path-length distance in an ontology. 

#### Question 1.1 (1pt)
Infomap is an implementation of a distributional semantic model that can be queried online at http://clic.cimec.unitn.it/infomap-query/. For a target word, it will return its $n$ nearest semantic neighbours ordered by similarity strength (calculated using cosine similarity between vectors in semantic space). Have a look at the model options that can be selected in the querying interface. Read the Infomap documentation (link at the end of the  page) and briefly summarise (in your own words) the features of the model option **bnc-lemma-narrow**. Your summary should include the language for which the model has been constructed, the corpus used, the characteristics of the target words (the rows), and the size of the context.
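
To make the similarity measure concrete, here is a minimal sketch of cosine similarity between two vectors. The vectors are invented toy values, not actual Infomap vectors, and `cosine_similarity` is a hypothetical helper written only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy co-occurrence counts (made up for this example).
vec_car = [4.0, 1.0, 0.0]
vec_truck = [3.0, 2.0, 0.0]
vec_banana = [0.0, 0.0, 5.0]

print(cosine_similarity(vec_car, vec_truck))   # close to 1: similar contexts
print(cosine_similarity(vec_car, vec_banana))  # 0.0: no shared contexts
```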

#### Answers:

The corpus of the bnc-lemma-narrow model contains lemmatized lowercased English nouns, verbs, and adjectives. The narrow model only returns similar words with the same part of speech (e.g. noun), whereas the wide model returns any similar word from the corpus.

#### Question 1.2 (2pts)
Search for the target word **car** using the following parameters: target word PoS **noun**, model option **bnc-lemma-narrow**, neighbor PoS  **noun**, max. number of neighbors **5**. As output you should get a list of five words including **car**, with a similarity score that indicates how similar they are to **car** according to the model. 

Now search for each of these five words in WordNet and, by exploring their inherited hypernym chains, discover the ontological structure that links them (it may help you to draw a tree like the one shown at the beginning of this document). Implement this ontological structure in Python, using our **Node** class (up to their common hypernym in WordNet). Give the hypernyms of **car** in this ontology. The WordNet ontology can be searched through: http://wordnetweb.princeton.edu/perl/webwn. Once you search for **car**, you will get entries similar to:

    S: (n) car, auto, automobile, machine, motorcar (a motor vehicle with four wheels; usually propelled by an internal combustion engine) "he needs a car to get to work"
    
Clicking the **S** gives you options to, for instance, list all of its hypernyms.

In [4]:
# Give the list of neighbours and provide the ontological structure up to their common hypernym.
similar_words = ['car', 'van', 'vehicle', 'truck', 'motorcycle']
# "vehicle" works as the common hypernym. (Its own hypernym is "conveyance".)

In [5]:
# Give the ontological structure linking these words according to WordNet:

vehicle = Node("vehicle")
wheeled = Node("wheeled", vehicle)
propelled = Node("propelled", wheeled)
motor = Node("motor", propelled)
motorcycle = Node("motorcycle", motor)
car = Node("car", motor)
truck = Node("truck", motor)
van = Node("van", truck)
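
Question 1.2 also asks for the hypernyms of **car** in this ontology. A minimal sketch of how they could be listed, assuming a hypothetical helper `hypernyms` that follows `get_parent()` up to the root. Only the chain above **car** is rebuilt here, and the `Node` class is repeated in compact form so that the snippet runs on its own.

```python
class Node:
    """Compact copy of the assignment's Node class, so this snippet is self-contained."""
    def __init__(self, node_value, parent_node=None):
        self.value = node_value
        self._children = set()
        self._parent_node = parent_node if isinstance(parent_node, Node) else None
        if self._parent_node is not None:
            self._parent_node.get_children().add(self)

    def get_parent(self):
        return self._parent_node

    def get_children(self):
        return self._children


def hypernyms(node):
    """Return the values of all hypernyms of node, nearest first (hypothetical helper)."""
    chain = []
    parent = node.get_parent()
    while parent is not None:
        chain.append(parent.value)
        parent = parent.get_parent()
    return chain


vehicle = Node("vehicle")
wheeled = Node("wheeled", vehicle)
propelled = Node("propelled", wheeled)
motor = Node("motor", propelled)
car = Node("car", motor)

print(hypernyms(car))  # ['motor', 'propelled', 'wheeled', 'vehicle']
```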

#### Question 1.3 (2pts)

 Implement a function called **co\_hyponym(node1, node2)** to check whether two concepts are co-hyponyms or sister terms (i.e., have the same immediate hypernym). Examples:

In [6]:
# Check if they have the same parent.
def co_hyponym(node1, node2):
    return node1.get_parent() == node2.get_parent()

print(co_hyponym(car, motorcycle))
print(co_hyponym(vehicle, wheeled))
print(co_hyponym(vehicle, vehicle))

True
False
True
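
Note that `co_hyponym(vehicle, vehicle)` is `True` only because both parents are `None`. If root nodes and identical nodes should not count as sister terms (an assumption, since the assignment text does not settle this), a stricter variant could look like the sketch below. `strict_co_hyponym` is a hypothetical name, and the `Node` class is repeated in compact form so that the snippet runs on its own.

```python
class Node:
    """Compact copy of the assignment's Node class, so this snippet is self-contained."""
    def __init__(self, node_value, parent_node=None):
        self.value = node_value
        self._children = set()
        self._parent_node = parent_node if isinstance(parent_node, Node) else None
        if self._parent_node is not None:
            self._parent_node.get_children().add(self)

    def get_parent(self):
        return self._parent_node

    def get_children(self):
        return self._children


def strict_co_hyponym(node1, node2):
    """True only for two distinct nodes that share an actual (non-None) parent."""
    parent1, parent2 = node1.get_parent(), node2.get_parent()
    return node1 is not node2 and parent1 is not None and parent1 is parent2


motor = Node("motor")
car = Node("car", motor)
motorcycle = Node("motorcycle", motor)

print(strict_co_hyponym(car, motorcycle))  # True: both are children of motor
print(strict_co_hyponym(motor, motor))     # False: a root has no hypernym
print(strict_co_hyponym(car, car))         # False: a node is not its own sister
```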


#### Question 1.4 (1pt)
Give the results of **path_length** queries between **car** and each of the other four words in this ontology. Give the resulting list of semantic neighbours of **car** ordered by similarity strength (according to path-length) and compare this ranking to the one output by Infomap.


In [7]:
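def path_length(nodeA, nodeB):
    # Distance (in edges) from nodeA to itself and to each of its hypernyms.
    steps_from_a = {}
    node, steps = nodeA, 0
    while node is not None:
        steps_from_a[node] = steps
        node, steps = node.get_parent(), steps + 1

    # Climb from nodeB until we reach a node that is also on nodeA's chain:
    # that node is their lowest common hypernym.
    node, steps = nodeB, 0
    while node is not None:
        if node in steps_from_a:
            return steps_from_a[node] + steps
        node, steps = node.get_parent(), steps + 1

    # No common hypernym: the nodes belong to different trees.
    return None

print(path_length(car, car))
print(path_length(car, van))
print(path_length(car, vehicle))
print(path_length(car, truck))
print(path_length(car, motorcycle))
print(similar_words)

0
3
4
2
2
['car', 'van', 'vehicle', 'truck', 'motorcycle']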


**Answers:**

In [8]:
# A comparison based on just these five words makes me
# conclude that either WordNet's path length or Web
# Infomap is pretty bad at returning similar words.

# Intuitively, the Infomap list sucks: car is only
# imperfectly interchangeable with vehicle, and sometimes
# with van or truck.

# WordNet lists things like "automobile" as an alternative
# to car. That may be a rarer word, but it is better.
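
The ranking by path length asked for in Question 1.4 could be produced as in the sketch below. It assumes that path length means the number of edges on the path through the lowest common hypernym. `ancestors`, `edge_distance` and `rank_by_distance` are hypothetical helpers, and the `Node` class and the ontology are repeated so that the snippet runs on its own.

```python
class Node:
    """Compact copy of the assignment's Node class, so this snippet is self-contained."""
    def __init__(self, node_value, parent_node=None):
        self.value = node_value
        self._children = set()
        self._parent_node = parent_node if isinstance(parent_node, Node) else None
        if self._parent_node is not None:
            self._parent_node.get_children().add(self)

    def get_parent(self):
        return self._parent_node

    def get_children(self):
        return self._children


def ancestors(node):
    """Return node itself followed by all of its hypernyms, nearest first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.get_parent()
    return chain


def edge_distance(node_a, node_b):
    """Number of edges between two nodes via their lowest common hypernym."""
    chain_a = ancestors(node_a)
    for steps_b, node in enumerate(ancestors(node_b)):
        if node in chain_a:
            return chain_a.index(node) + steps_b
    return None  # the nodes are in different trees


def rank_by_distance(target, others):
    """Sort the other nodes by distance to target (ties broken alphabetically)."""
    ranked = sorted(others, key=lambda n: (edge_distance(target, n), n.value))
    return [(n.value, edge_distance(target, n)) for n in ranked]


vehicle = Node("vehicle")
wheeled = Node("wheeled", vehicle)
propelled = Node("propelled", wheeled)
motor = Node("motor", propelled)
motorcycle = Node("motorcycle", motor)
car = Node("car", motor)
truck = Node("truck", motor)
van = Node("van", truck)

print(rank_by_distance(car, [van, vehicle, truck, motorcycle]))
# [('motorcycle', 2), ('truck', 2), ('van', 3), ('vehicle', 4)]
```

Under this assumption, motorcycle and truck tie as the closest neighbours of car, followed by van and then vehicle, whereas the `similar_words` list puts van and vehicle before truck and motorcycle.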