# RoWordNet tutorial


This tutorial will guide you through the basic operations in the RoWordNet module.



Let's start by creating a new synset. 

In [1]:
from synset import Synset

synset = Synset("id")

Now that the synset has been created, we can assign some values to its parameters.

In [2]:
synset.definition = "Asta este o definitie"
synset.pos = Synset.Pos.NOUN

(The parameter 'pos' takes a value from the 'Synset.Pos' enumeration: NOUN, VERB, ADVERB or ADJECTIVE.)
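
To make the idea of an enumeration value concrete, here is a minimal stand-alone sketch built on Python's standard `enum` module. The `Pos` class below is a hypothetical stand-in, not the actual definition of 'Synset.Pos'; only the four member names come from the text above.

```python
from enum import Enum, auto


class Pos(Enum):
    """Hypothetical stand-in for Synset.Pos (member names taken from the tutorial)."""
    NOUN = auto()
    VERB = auto()
    ADVERB = auto()
    ADJECTIVE = auto()


# An enumeration restricts a value to a fixed set of named members.
pos = Pos.NOUN
member_names = [member.name for member in Pos]
print(pos.name, member_names)
```

Using an enumeration instead of a free-form string means that a misspelled part of speech fails immediately with an `AttributeError` rather than being stored silently.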

We can add a literal (word) to the synset.

In [3]:
literal = "leu"
synset.add_literal(literal)
print("This synset has {} literals".format(len(synset.literals)))

This synset has 1 literals


We can observe that the literal has been successfully added to the synset. Now let's try to remove it.

In [4]:
synset.remove_literal(literal)
print("This synset has {} literals".format(len(synset.literals)))

This synset has 0 literals


We can also set several literals at once by assigning a list.

In [5]:
literals = ['lup', 'oaie', 'cal']
synset.literals = literals
print("This synset has {} literals".format(len(synset.literals)))

This synset has 3 literals


Now let's create a wordnet.

In [6]:
from wordnet import WordNet

wordnet = WordNet()

We can also create a wordnet by loading it from a file. We have to specify the file path and whether the file is in xml or binary format (the default is binary).

In [7]:
# create a wordnet from a binary file
wordnet = WordNet("resources/binary_wn.pck", xml=False)

# create a wordnet from an xml file
wordnet = WordNet("resources/xml_wn.xml", xml=True)

We can save the wordnet in xml or binary format.

In [8]:
# save the wordnet in binary format
wordnet.save("resources/save_binary_wn.pck", xml=False)

# save the wordnet in xml format
wordnet.save("resources/save_xml_wn.xml", xml=True)

Every wordnet has a set of synsets that have relations between them. Let's get these synsets.

In [9]:
synset_ids = wordnet.synsets()
print("This wordnet has {} synsets".format(len(synset_ids)))

This wordnet has 59348 synsets


We can also get only the synsets that contain a given literal (word).

In [10]:
literal = "cal"
synset_ids_for_literal = wordnet.synsets(literal)
print("This wordnet has {} synsets containing literal '{}'".format(len(synset_ids_for_literal), literal))

This wordnet has 6 synsets containing literal 'cal'


Every synset has a set of synsets that it has relations with. Let's get those synsets.

In [11]:
# get the first synset id in the list
synset_id = wordnet.synsets()[0]
adjacent_synset_ids = wordnet.adjacent_synsets(synset_id)
print("List of adjacent synsets for synset with id {}: {}".format(synset_id, adjacent_synset_ids))

List of adjacent synsets for synset with id ENG30-00006269-n: ['ENG30-00004258-n', 'ENG30-07993776-n']


When we created our first synset, we gave it the id "id". This isn't good practice, because synset ids must be unique. The wordnet has a built-in function that generates an id for you. You can also specify a prefix and a suffix for the generated id.

In [12]:
# generate id with default prefix and suffix
new_id = wordnet.generate_synset_id()

# generate id with custom prefix and suffix
prefix = "ENG31-"
suffix = "-v"
new_id_with_custom_prefix_suffix = wordnet.generate_synset_id(prefix=prefix, suffix=suffix)
print("Generated id with default prefix and suffix: {}".format(new_id))
print("Generated id with custom prefix '{}' and suffix '{}': {}".format(prefix, suffix, new_id_with_custom_prefix_suffix))

Generated id with default prefix and suffix: ENG30-15300052-n
Generated id with custom prefix 'ENG31-' and suffix '-v': ENG31-00000001-v
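
The outputs above are consistent with one plausible scheme: take the largest number already used with the given prefix and suffix, add one, and zero-pad the result to 8 digits. The sketch below illustrates that idea only. The function `generate_id` and its logic are assumptions made for illustration, not the actual implementation of 'generate_synset_id'.

```python
def generate_id(existing_ids, prefix="ENG30-", suffix="-n"):
    """Toy id generator (assumed scheme): one more than the largest number
    already used with this prefix and suffix, zero-padded to 8 digits."""
    numbers = [
        int(sid[len(prefix):len(sid) - len(suffix)])
        for sid in existing_ids
        if sid.startswith(prefix) and sid.endswith(suffix)
    ]
    # With no matching ids, numbering starts at 1.
    next_number = max(numbers, default=0) + 1
    return "{}{:08d}{}".format(prefix, next_number, suffix)


existing = ["ENG30-00006269-n", "ENG30-00004258-n", "ENG30-01646941-a"]
default_id = generate_id(existing)
custom_id = generate_id(existing, prefix="ENG31-", suffix="-v")
print(default_id, custom_id)
```

Because no existing id uses the "ENG31-" prefix together with the "-v" suffix, the custom id starts counting from 1.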


Every synset in the wordnet has its own literals. Let's try to add one literal to a synset.

In [13]:
# get the first synset id in the list
synset_id = wordnet.synsets()[0]

# get the synset that has that id
synset = wordnet.synset(synset_id)
literal = "cal"
print("Number of synsets containing the literal '{}' before adding it to synset with id '{}': {}"
     .format(literal, synset_id, len(wordnet.synsets(literal))))
synset.add_literal(literal)
print("Number of synsets containing the literal '{}' after adding it to synset with id '{}': {}"
     .format(literal, synset_id, len(wordnet.synsets(literal))))
wordnet.reindex_literals()
print("Number of synsets containing the literal '{}' after reindexing the wordnet: {}"
      .format(literal, len(wordnet.synsets(literal))))

Number of synsets containing the literal 'cal' before adding it to synset with id 'ENG30-00006269-n': 6
Number of synsets containing the literal 'cal' after adding it to synset with id 'ENG30-00006269-n': 6
Number of synsets containing the literal 'cal' after reindexing the wordnet: 7


The function 'reindex_literals' must be called whenever you add or remove literals. Until then, lookups by literal (such as 'wordnet.synsets(literal)') do not reflect the change, as the output above shows.
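
Why would a lookup go stale? One common design, assumed here purely for illustration, is to keep a separate literal-to-synsets lookup table that is only rebuilt on request. The class `ToyWordNet` below is a hypothetical miniature that reproduces the behaviour seen in the output; it is not the library's actual data structure.

```python
from collections import defaultdict


class ToyWordNet:
    """Hypothetical miniature wordnet: synset id -> list of literals."""

    def __init__(self, synsets):
        self.synsets_by_id = synsets
        self.reindex_literals()

    def reindex_literals(self):
        # Rebuild the literal -> synset ids lookup table from scratch.
        self._literal_index = defaultdict(list)
        for synset_id, literals in self.synsets_by_id.items():
            for literal in literals:
                self._literal_index[literal].append(synset_id)

    def synsets(self, literal):
        # Answered from the lookup table, not by scanning every synset.
        return list(self._literal_index.get(literal, []))


toy = ToyWordNet({"s1": ["cal"], "s2": ["leu"]})
before = len(toy.synsets("cal"))

# Mutating a synset directly bypasses the lookup table...
toy.synsets_by_id["s2"].append("cal")
stale = len(toy.synsets("cal"))

# ...so the new literal only becomes visible after reindexing.
toy.reindex_literals()
fresh = len(toy.synsets("cal"))
print(before, stale, fresh)
```

The trade-off in such a design is fast lookups by literal at the cost of an explicit rebuild step after every batch of changes.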

Now, let's remove the literal from the synset.

In [14]:
synset.remove_literal(literal)
# we have to call 'reindex_literals' again
wordnet.reindex_literals()

Synsets in the wordnet are linked to other synsets through relations (hypernym, hyponym, etc.).
Let's add a relation from one synset to another.

In [15]:
# get the first synset id in the list
synset_id1 = wordnet.synsets()[0]
# get the second synset id in the list
synset_id2 = wordnet.synsets()[1]
print("Number of adjacent synsets of the first synset before adding a new relation: {}"
      .format(len(wordnet.adjacent_synsets(synset_id1))))
print("Number of adjacent synsets of the second synset before adding a new relation: {}\n"
      .format(len(wordnet.adjacent_synsets(synset_id2))))

relation = "hypernym"
wordnet.add_relation(synset_id1, synset_id2, relation)
print("Number of adjacent synsets of the first synset after adding a new relation: {}"
      .format(len(wordnet.adjacent_synsets(synset_id1))))
print("Number of adjacent synsets of the second synset after adding a new relation: {}"
      .format(len(wordnet.adjacent_synsets(synset_id2))))


Number of adjacent synsets of the first synset before adding a new relation: 2
Number of adjacent synsets of the second synset before adding a new relation: 18

Number of adjacent synsets of the first synset after adding a new relation: 3
Number of adjacent synsets of the second synset after adding a new relation: 18


We observe that the number of adjacent synsets of the second synset remains unchanged. That's because the relation was added only from the first synset to the second synset.
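
The one-way behaviour can be pictured with a tiny directed graph. The sketch below is a stand-alone toy: the `outgoing` dictionary and the two helper functions are hypothetical and only mimic the names used in this tutorial.

```python
from collections import defaultdict

# Hypothetical adjacency structure: source id -> list of (target id, relation).
outgoing = defaultdict(list)


def add_relation(source_id, target_id, relation):
    # Store the edge on the source side only, which makes it directed.
    outgoing[source_id].append((target_id, relation))


def adjacent_synsets(synset_id):
    return [target_id for target_id, _ in outgoing[synset_id]]


add_relation("A", "B", "hypernym")
adjacent_a = adjacent_synsets("A")
adjacent_b_before = adjacent_synsets("B")

# For a two-way link, the inverse relation has to be added explicitly.
add_relation("B", "A", "hyponym")
adjacent_b_after = adjacent_synsets("B")
print(adjacent_a, adjacent_b_before, adjacent_b_after)
```

In this toy, "B" only gains "A" as a neighbour once the inverse edge is added as a second, separate relation.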

Let's remove the previous relation.

In [16]:
wordnet.remove_relation(synset_id1, synset_id2)
print("Number of adjacent synsets of the first synset after removing the new relation: {}"
      .format(len(wordnet.adjacent_synsets(synset_id1))))

Number of adjacent synsets of the first synset after removing the new relation: 2


We can also traverse the wordnet starting from a synset (here we iterate over only the first five steps).

In [17]:
synset_id = wordnet.synsets()[0]
counter = 1

for current_synset, relation, from_synset in wordnet.bfwalk(synset_id):
    # bfwalk is a generator that yields one breadth-first (BF) step through the wordnet per iteration
    # do actions with current_synset, relation, from_synset
    print("Step {}: from synset {}, with relation [{}] to synset {}"
            .format(counter, from_synset, relation, current_synset))
    if counter >= 5:
        break
    else:
        counter += 1

Step 1: from synset ENG30-00006269-n, with relation [hypernym] to synset ENG30-00004258-n
Step 2: from synset ENG30-00006269-n, with relation [hyponym] to synset ENG30-07993776-n
Step 3: from synset ENG30-00004258-n, with relation [hypernym] to synset ENG30-00003553-n
Step 4: from synset ENG30-00004258-n, with relation [domain_member_TOPIC] to synset ENG30-01646941-a
Step 5: from synset ENG30-00004258-n, with relation [hyponym] to synset ENG30-00004475-n
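
For readers unfamiliar with breadth-first traversal, the following stand-alone sketch shows how a generator can yield the same kind of (current, relation, from) tuples. It operates on a plain dictionary and is an assumption about the general technique, not the library's actual 'bfwalk' code.

```python
from collections import deque


def bfwalk(graph, start_id):
    """Toy breadth-first walk yielding (current, relation, from) tuples.

    `graph` maps a synset id to a list of (target id, relation) pairs.
    """
    visited = {start_id}
    queue = deque([start_id])
    while queue:
        from_id = queue.popleft()
        for target_id, relation in graph.get(from_id, []):
            if target_id not in visited:
                visited.add(target_id)
                queue.append(target_id)
                yield target_id, relation, from_id


toy_graph = {
    "A": [("B", "hypernym"), ("C", "hyponym")],
    "B": [("D", "hypernym"), ("A", "hyponym")],
}
steps = list(bfwalk(toy_graph, "A"))
for current_id, relation, from_id in steps:
    print("from {} [{}] to {}".format(from_id, relation, current_id))
```

All direct neighbours of the start synset are visited before any synset two steps away, which matches the order of the steps printed above. With any generator of this kind, `itertools.islice(generator, 5)` is an alternative to a manual counter for limiting the number of steps.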


The relations create several trees in the wordnet. For instance, the hypernym and hyponym relations form an "is a" tree that looks like this:


<img style="float: left;" src="http://www.cs.princeton.edu/courses/archive/spring07/cos226/assignments/wordnet-fig1.png">


Of course, the real tree is much bigger, but you get the idea.

We can retrieve the synsets that form a path from a synset to the root of one of these trees.

In [18]:
# get the first synset in the list
synset_id = wordnet.synsets()[0]
relation = "hypernym"

synset_ids_root = wordnet.synset_to_root(synset_id, relation)
print("List of synsets from synset with id '{}' to root in the {} tree: {}".format(synset_id, relation, synset_ids_root))

List of synsets from synset with id 'ENG30-00006269-n' to root in the hypernym tree: ['ENG30-00006269-n', 'ENG30-00004258-n', 'ENG30-00003553-n', 'ENG30-00002684-n']


Note that, to select the desired tree, we have to specify a relation type that points from a child to its parent (not from parent to child!), for instance "hypernym" or "meronym". The default is "hypernym", so we could call 'synset_to_root' with only the id of the synset.
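
The child-to-parent direction matters because the path is built by repeatedly stepping from a synset to its parent. Here is a minimal sketch of that idea on a plain dictionary. It assumes each synset has at most one parent and that there are no cycles, which is a simplification, and the ids are made up for illustration.

```python
def synset_to_root(parents, synset_id):
    """Follow child -> parent links until a synset without a parent is reached.

    `parents` maps a synset id to its parent id for one relation
    (toy simplification: at most one parent per synset, no cycles).
    """
    path = [synset_id]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


# Hypothetical "is a" chain: horse -> equine -> mammal -> animal
hypernym_of = {"horse": "equine", "equine": "mammal", "mammal": "animal"}
path = synset_to_root(hypernym_of, "horse")
print(path)
```

As in the output above, the path starts with the synset itself and ends with the root.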

We can also retrieve the lowest common ancestor of two synsets in a tree.

In [19]:
# get the first synset in the list
synset_id1 = wordnet.synsets()[0]
# get the second synset in the list
synset_id2 = wordnet.synsets()[1]
relation = "hypernym"

lcs_id = wordnet.lowest_common_ancestor(synset_id1, synset_id2, relation)
print("Lowest common ancestor in the {} tree of synset with id '{}' and synset with id '{}' is '{}'"
          .format(relation, synset_id1, synset_id2, lcs_id))

Lowest common ancestor in the hypernym tree of synset with id 'ENG30-00006269-n' and synset with id 'ENG30-00006484-n' is 'ENG30-00004258-n'


Again, we specify the desired tree by a relation type from a child to the parent. 
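
One straightforward way to find a lowest common ancestor is to compare the two paths to the root. The sketch below shows this on a toy dictionary of child-to-parent links. The helper names and the example ids are hypothetical, and the library may well use a different algorithm.

```python
def path_to_root(parents, node):
    # Walk child -> parent links until a node without a parent is reached.
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def lowest_common_ancestor(parents, node1, node2):
    """Return the first node on node1's path to the root that is also on node2's path."""
    ancestors_of_node2 = set(path_to_root(parents, node2))
    for candidate in path_to_root(parents, node1):
        if candidate in ancestors_of_node2:
            return candidate
    return None  # the two nodes live in different trees


hypernym_of = {"horse": "equine", "zebra": "equine", "equine": "mammal", "lion": "mammal"}
lca_close = lowest_common_ancestor(hypernym_of, "horse", "zebra")
lca_far = lowest_common_ancestor(hypernym_of, "horse", "lion")
print(lca_close, lca_far)
```

In this sketch a node counts as its own ancestor, so asking for the lowest common ancestor of a node and one of its descendants returns the node itself.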

The rest of this tutorial walks through a slightly more advanced series of operations.
Task: we would like to extract a list of synonyms and antonyms from all the nouns in WordNet.

We first extract synonyms directly from the synsets. We list the synsets, iterate through them, and create pairs from the literals of each synset.

In [20]:
import itertools

synonyms = []
synsets_id = wordnet.synsets()
# for each synset, we create a list of synonyms between its literals
for synset_id in synsets_id:
    # the literals object is a dict, but we need only the
    # actual literals (not senses)
    synset = wordnet.synset(synset_id)
    literals = list(synset.literals)
    for i in range(len(literals)):
        for j in range(i+1, len(literals)):
            # append a tuple containing a pair of synonym literals
            synonyms.append((literals[i], literals[j]))

# list a few synonyms
print("\n\tList of the first 5 synonyms: ({} total synonym pairs extracted)".format(len(synonyms)))
for i in range(5):
    print("\t\t {:>20} == {}".format(synonyms[i][0], synonyms[i][1]))


	List of the first 5 synonyms: (45343 total synonym pairs extracted)
		               plantă == vegetală
		                 fapt == fenomen
		        neîndeplinire == nerealizare
		              mișcare == propulsare
		              alegere == opțiune
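
The nested index loops used above generate every unordered pair exactly once. The standard library offers the same behaviour through `itertools.combinations`; the short comparison below uses placeholder literals ("w1", "w2", "w3") rather than real wordnet data.

```python
import itertools

literals = ["w1", "w2", "w3"]  # placeholder literals for illustration

# Nested index loops, as in the cell above.
pairs_with_loops = []
for i in range(len(literals)):
    for j in range(i + 1, len(literals)):
        pairs_with_loops.append((literals[i], literals[j]))

# The same unordered pairs, produced by the standard library.
pairs_with_combinations = list(itertools.combinations(literals, 2))

print(pairs_with_combinations)
print(pairs_with_loops == pairs_with_combinations)
```

A synset with n literals therefore contributes n * (n - 1) / 2 synonym pairs, and a synset with a single literal contributes none.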


Now for the antonyms. We look at all the antonymy relations and, for each pair of synsets in such a relation, generate the Cartesian product of their literals.

In [21]:
synset_pairs = []

synsets_id = wordnet.synsets()  # extract all synsets
for synset_id in synsets_id:
    synset = wordnet.synset(synset_id)
    # extract the antonyms of a synset
    synset_antonyms_id = wordnet.adjacent_synsets(synset.id,
                                                 relation="near_antonym")
    for synset_antonym_id in synset_antonyms_id:  # for each antonym synset
        synset_antonym = wordnet.synset(synset_antonym_id)
        # if the antonymy pair doesn't already exist
        if (synset_antonym, synset) not in synset_pairs:
            synset_pairs.append((synset, synset_antonym))  # add the antonym tuple to the list

# for each synset pair extract its literals, so we now have a list of
# pairs of literals
literal_pairs = []
for synset_pair in synset_pairs:
    # extract the literals of the first synset in the pair
    synset1_literals = list(synset_pair[0].literals)
    # extract the literals of the second synset in the pair
    synset2_literals = list(synset_pair[1].literals)
    # add a tuple containing the literals of each synset
    literal_pairs.append((synset1_literals, synset2_literals))

antonyms = []
# for each literals pair, we generate the cartesian product between them
for literal_pair in literal_pairs:
    for antonym_tuple in itertools.product(literal_pair[0], literal_pair[1]):
        antonyms.append(antonym_tuple)

# list a few antonyms
print("\n\tList of the first 5 antonyms: ({} total antonym pairs extracted)".format(len(antonyms)))
for i in range(5):
    print("\t\t {:>20} != {}".format(antonyms[i][0], antonyms[i][1]))


	List of the first 5 antonyms: (4917 total antonym pairs extracted)
		               femelă != mascul
		               femelă != parte_bărbătească
		      parte_femeiască != mascul
		      parte_femeiască != parte_bărbătească
		        imparicopitat != paricopitat
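
The duplicate check in the cell above scans a list, which gets slower as the list grows. A possible alternative, sketched here on toy data, is to remember each unordered pair of synset ids in a set. The dictionaries `literals_of` and `near_antonyms` are hypothetical stand-ins for the wordnet; the literals are borrowed from the output above.

```python
import itertools

# Hypothetical toy data: synset id -> literals, and symmetric antonymy links.
literals_of = {
    "s1": ["femelă", "parte_femeiască"],
    "s2": ["mascul", "parte_bărbătească"],
}
near_antonyms = {"s1": ["s2"], "s2": ["s1"]}

seen = set()  # unordered id pairs already handled
antonyms = []
for synset_id, antonym_ids in near_antonyms.items():
    for antonym_id in antonym_ids:
        key = frozenset((synset_id, antonym_id))
        if key in seen:
            continue  # (s2, s1) is the same pair as (s1, s2)
        seen.add(key)
        # Cartesian product of the literals of the two synsets
        antonyms.extend(itertools.product(literals_of[synset_id], literals_of[antonym_id]))

print(antonyms)
```

A `frozenset` ignores order, so the pair is recognised as already seen regardless of which of the two synsets is visited first.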
