## This demo shows the basic components and operations of RoWordNet.




The first operation is to create a wordnet by using the internal resources.

In [1]:
import rowordnet
wn = rowordnet.RoWordNet()

If you want to create a wordnet using your own resources you have to specify a ``filepath`` to the file and whether the file is binary or xml. You can also create an empty wordnet by setting the parameter ``empty`` to ``True``.

Now that we have created a wordnet we can search for words. A search returns the ids of one or more synsets (the synset is the main component of a wordnet; see synset/relation creation and editing for more details).

In [2]:
word = 'arbore'
synset_ids = wn.synsets(literal=word)
print("Total number of synsets containing literal '{}': {}".format(word, len(synset_ids)))
print(synset_ids)

Total number of synsets containing literal 'arbore': 20
['ENG30-08102402-n', 'ENG30-12339526-n', 'ENG30-03726760-n', 'ENG30-13912260-n', 'ENG30-12752039-n', 'ENG30-13104059-n', 'ENG30-02946824-n', 'ENG30-12402840-n', 'ENG30-12690653-n', 'ENG30-12662772-n', 'ENG30-03127408-n', 'ENG30-11712282-n', 'ENG30-11704791-n', 'ENG30-11640132-n', 'ENG30-12334293-n', 'ENG30-03244231-n', 'ENG30-04472563-n', 'ENG30-04111190-n', 'ENG30-13111174-n', 'ENG30-04316924-n']


To print detailed information about a synset, we use ``print_synset`` and provide the synset id.

In [3]:
synset_id = synset_ids[4]
wn.print_synset(synset_id)

Synset: 
	  id=ENG30-12752039-n
	  pos=NOUN
	  nonlexicalized=None
	  stamp=None
	  domain=biology
	  definition=Expresie : (Aceraceae); arbore sau arbust din acest gen;
	  sumo=FloweringPlant 
	  sumoType=HYPERNYM
	  sentiwn=[0.0, 0.0, 1.0]
	  Literals:
		  arbore - 2.6
		  arbust - 3.1
	  Outbound relations: 
		  ENG30-11567411-n - hypernym
		  ENG30-12752205-n - member_meronym
	  Inbound relations: 
		  ENG30-12752666-n - substance_holonym
		  ENG30-13109733-n - hyponym
		  ENG30-12752039-n - member_meronym
		  ENG30-12753573-n - hypernym
		  ENG30-12754003-n - hypernym
		  ENG30-12755225-n - hypernym


To obtain a synset object we simply use ``wn.synset()`` with the synset id as a parameter. We can also obtain the synset object by calling ``wn()`` directly with the synset id.

In [4]:
synset_object = wn.synset(synset_id)
print(synset_object)
synset_object = wn(synset_id)
print(synset_object)

Synset(id='ENG30-12752039-n', literals=['arbore', 'arbust'], definition='Expresie : (Aceraceae); arbore sau arbust din acest gen;')
Synset(id='ENG30-12752039-n', literals=['arbore', 'arbust'], definition='Expresie : (Aceraceae); arbore sau arbust din acest gen;')


Every synset has a set of words called 'literals'. We can access them through the ``literals`` property of a synset.

In [5]:
literals = synset_object.literals
print("Synset with id {} has {} literals: {}".format(synset_object.id, len(literals), literals))

Synset with id ENG30-12752039-n has 2 literals: ['arbore', 'arbust']


Just as we accessed the ``literals`` of a synset, we can access and modify any other property (WARNING: you can't modify the ``id``). Now let's access and modify the ``definition`` property.

In [6]:
definition = synset_object.definition
print("Definition of the synset with id {}: {}".format(synset_object.id, definition))
new_definition = "This is a new definition"
synset_object.definition = new_definition
print("New definition of the synset with id {}: {}".format(synset_object.id, synset_object.definition))

Definition of the synset with id ENG30-12752039-n: Expresie : (Aceraceae); arbore sau arbust din acest gen;
New definition of the synset with id ENG30-12752039-n: This is a new definition

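The warning about ``id`` can be illustrated with a small stand-alone sketch. ``ToySynset`` is a hypothetical class written for this example only; it is not the ``rowordnet`` ``Synset`` class, and whether the library enforces the restriction this way is an assumption. It shows one common Python pattern: a property without a setter stays readable but rejects assignment.

```python
class ToySynset:
    """Hypothetical stand-in for a synset; NOT the rowordnet class."""

    def __init__(self, synset_id, definition):
        self._id = synset_id
        self.definition = definition

    @property
    def id(self):
        # No setter is defined, so assigning to `id` raises AttributeError.
        return self._id


toy = ToySynset("TOY-0001-n", "old definition")
toy.definition = "new definition"  # ordinary attribute: assignment works

try:
    toy.id = "TOY-0002-n"  # read-only property: assignment is rejected
    id_changed = True
except AttributeError:
    id_changed = False

print(toy.id, "|", toy.definition, "|", "id changed:", id_changed)
```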

Calling ``wn.synsets()`` with no arguments returns a list containing all the synset ids of the wordnet.

In [7]:
synsets_id = wn.synsets()
print("Total number of synsets: {} \n".format(len(synsets_id)))

Total number of synsets: 59348 



There are four parts of speech in RoWordNet: nouns, verbs, adjectives and adverbs. To filter the synsets by part of speech, pass a value to the ``pos`` parameter.

In [8]:
from rowordnet import Synset

# return all noun synsets
synsets_id_nouns = wn.synsets(pos=Synset.Pos.NOUN)
print("Total number of noun synsets: {}".format(len(synsets_id_nouns)))
# return all verb synsets
synsets_id_verbs = wn.synsets(pos=Synset.Pos.VERB)
print("Total number of verb synsets: {}".format(len(synsets_id_verbs)))
# return all adjective synsets
synsets_id_adjectives = wn.synsets(pos=Synset.Pos.ADJECTIVE)
print("Total number of adjective synsets: {}".format(len(synsets_id_adjectives)))
# return all adverb synsets
synsets_id_adverbs = wn.synsets(pos=Synset.Pos.ADVERB)
print("Total number of adverb synsets: {}".format(len(synsets_id_adverbs)))

Total number of noun synsets: 41063
Total number of verb synsets: 10397
Total number of adjective synsets: 4822
Total number of adverb synsets: 3066

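As a quick sanity check, the four per-part-of-speech counts should add up to the total number of synsets reported earlier. The snippet below hard-codes the figures printed in this notebook instead of querying the wordnet, so it runs without the package installed; the variable names are chosen for this example only.

```python
# Counts copied from the outputs above (not queried from the wordnet).
counts_by_pos = {"NOUN": 41063, "VERB": 10397, "ADJECTIVE": 4822, "ADVERB": 3066}
total_synsets = 59348

pos_sum = sum(counts_by_pos.values())
print("Sum of per-POS counts: {}".format(pos_sum))
print("Matches total number of synsets: {}".format(pos_sum == total_synsets))
```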

### We continue with examples of navigating in the wordnet

Synsets are linked by relations, encoded as directed edges in a graph. To see all the relation types, access the ``relation_types`` property of the wordnet.

In [9]:
print("This wordnet contains {} relation types".format(len(wn.relation_types)))

This wordnet contains 35 relation types


Every synset has a set of synsets that it points to (outbound relations) and a set of synsets that point to it (inbound relations). We can access these synsets and the relations between them with the functions ``outbound_relations`` and ``inbound_relations``, respectively. We can also access both the inbound and outbound relations of a synset through the ``relations`` function. Note that ``relations`` loses directionality, as it is simply a concatenation of the inbound and outbound relations; it is meant more as a convenience for printing than for operations or search in the wordnet.

In [10]:
# print all outbound relations of a synset
synset_id = wn.synsets("tren")[0]
print("Print all outbound relations of synset with id {}".format(synset_id))
outbound_relations = wn.outbound_relations(synset_id)
for outbound_relation in outbound_relations:
    target_synset_id = outbound_relation[0]        
    relation = outbound_relation[1]
    print("\tRelation [{}] to synset {}".format(relation, target_synset_id))
    
# print all inbound relations of a synset
print("\nPrint all inbound relations of synset with id {}".format(synset_id))
for source_synset_id, relation in wn.inbound_relations(synset_id):
    print("\tRelation [{}] from synset {}".format(relation, source_synset_id))

# get all relations of the same synset
relations = wn.relations(synset_id)
print("\nThe synset has {} total relations.".format(len(relations)))

Print all outbound relations of synset with id ENG30-03431745-n
	Relation [hypernym] to synset ENG30-04576971-n
	Relation [part_holonym] to synset ENG30-03287733-n

Print all inbound relations of synset with id ENG30-03431745-n
	Relation [hyponym] from synset ENG30-04576971-n
	Relation [part_meronym] from synset ENG30-03287733-n

The synset has 4 total relations.

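To make the directionality point concrete, here is a minimal stand-alone sketch on a toy edge list. The ids ``s1``, ``s2``, ``s3`` and the helpers ``outbound`` and ``inbound`` are made up for illustration; this is not how RoWordNet stores or queries its graph. Once the inbound and outbound lists are concatenated, nothing in a ``(synset, relation)`` pair says which way the edge pointed.

```python
# Toy directed edges as (source, target, relation); ids are placeholders.
edges = [
    ("s1", "s2", "hypernym"),
    ("s2", "s1", "hyponym"),
    ("s1", "s3", "part_holonym"),
    ("s3", "s1", "part_meronym"),
]


def outbound(edges, node):
    """Return (target, relation) pairs for edges leaving `node`."""
    return [(dst, rel) for src, dst, rel in edges if src == node]


def inbound(edges, node):
    """Return (source, relation) pairs for edges arriving at `node`."""
    return [(src, rel) for src, dst, rel in edges if dst == node]


out_rel = outbound(edges, "s1")
in_rel = inbound(edges, "s1")
# Concatenation keeps every pair but drops the information about direction.
all_rel = in_rel + out_rel

print("outbound:", out_rel)
print("inbound: ", in_rel)
print("combined:", all_rel)
```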

To travel through the wordnet, use ``bfwalk()`` and provide a synset id as the starting location. This function is a generator: each iteration yields the next breadth-first step through the wordnet.

In [11]:
# get a new synset
new_synset_id = wn.synsets("cal")[2]
# travel the graph Breadth First 
counter = 0
print("\n\tTravel breadth-first through wordnet starting with synset '{}' (first 10 synsets) ..."
      .format(new_synset_id))
for current_synset_id, relation, from_synset_id in wn.bfwalk(new_synset_id):
    counter += 1
    # bfwalk is a generator that yields, for each call, a BF step through wordnet 
    # you do actions with current_synset_id, relation, from_synset_id
    print("\t\t Step {}: from synset {}, with relation [{}] to synset {}"
          .format(counter, from_synset_id, relation, current_synset_id))
    if counter >= 10:
        break


	Travel breadth-first through wordnet starting with synset 'ENG30-02377703-n' (first 10 synsets) ...
		 Step 1: from synset ENG30-02377703-n, with relation [hypernym] to synset ENG30-02374451-n
		 Step 2: from synset ENG30-02377703-n, with relation [near_eng_derivat] to synset ENG30-01923414-v
		 Step 3: from synset ENG30-02377703-n, with relation [hyponym] to synset ENG30-02378415-n
		 Step 4: from synset ENG30-02377703-n, with relation [hyponym] to synset ENG30-02379908-n
		 Step 5: from synset ENG30-02377703-n, with relation [hyponym] to synset ENG30-02381004-n
		 Step 6: from synset ENG30-02374451-n, with relation [hypernym] to synset ENG30-02374149-n
		 Step 7: from synset ENG30-02374451-n, with relation [member_holonym] to synset ENG30-02373843-n
		 Step 8: from synset ENG30-02374451-n, with relation [part_meronym] to synset ENG30-01899894-n
		 Step 9: from synset ENG30-02374451-n, with relation [part_meronym] to synset ENG30-01902274-n
		 Step 10: from synset ENG30-02374451-n, 

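For readers unfamiliar with generator-based traversal, the following is a minimal sketch of a breadth-first walk over a toy graph. The function ``bf_walk``, the graph contents and the node names are illustrative assumptions, not RoWordNet's implementation; in particular, whether the library tracks visited synsets in the same way is not confirmed here. The yielded triple mirrors the ``(current, relation, from)`` order used in the loop above.

```python
from collections import deque


def bf_walk(graph, start):
    """Yield (current, relation, source) triples in breadth-first order."""
    visited = {start}
    queue = deque([start])
    while queue:
        source = queue.popleft()
        for target, relation in graph.get(source, []):
            if target not in visited:
                visited.add(target)
                queue.append(target)
                yield target, relation, source


# Toy adjacency map: node -> list of (target, relation). Made-up data.
toy_graph = {
    "horse": [("equine", "hypernym"), ("pony", "hyponym")],
    "equine": [("mammal", "hypernym"), ("horse", "hyponym")],
    "pony": [("horse", "hypernym")],
}

steps = list(bf_walk(toy_graph, "horse"))
for number, (current, relation, source) in enumerate(steps, start=1):
    print("Step {}: from {} with [{}] to {}".format(number, source, relation, current))
```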
Because the wordnet is represented as a graph where the nodes are the synset ids and the edges are the relations between them, you can compute the shortest path between two synsets with ``shortest_path``. You can additionally provide a set of relations as a filter, in which case the shortest path follows only the specified relations.

In [12]:
# shortest path unfiltered
synset1_id = wn.synsets("cal")[2]
synset2_id = wn.synsets("iepure")[0]    
distance = wn.shortest_path(synset1_id, synset2_id)
print("List of synsets containing the shortest path from synset with id '{}' to synset with id '{}': "
      .format(synset1_id, synset2_id))
print("{}".format(distance))

# shortest path filtered with 'hypernym' and 'hyponym' relations
relations = {'hypernym', 'hyponym'}
filtered_distance = wn.shortest_path(synset1_id, synset2_id, relations)
print("\nList of synsets containing the shortest filtered path from synset with id '{}' to synset with id '{}': "
      .format(synset1_id, synset2_id))
print("{}".format(filtered_distance))


List of synsets containing the shortest path from synset with id 'ENG30-02377703-n' to synset with id 'ENG30-02324045-n': 
['ENG30-02377703-n', 'ENG30-02374451-n', 'ENG30-02462602-n', 'ENG30-02430045-n', 'ENG30-02158739-n', 'ENG30-02324045-n']

List of synsets containing the shortest filtered path from synset with id 'ENG30-02377703-n' to synset with id 'ENG30-02324045-n': 
['ENG30-02377703-n', 'ENG30-02374451-n', 'ENG30-02374149-n', 'ENG30-02373336-n', 'ENG30-02370806-n', 'ENG30-01886756-n', 'ENG30-02323449-n', 'ENG30-02323902-n', 'ENG30-02324045-n']

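The effect of the relation filter can be reproduced on a toy graph with a plain breadth-first search. This is a sketch under stated assumptions: the function below only shares its name with the library call, the graph is invented, and RoWordNet may compute paths differently. It shows why a filtered path can be longer than the unfiltered one: edges with other relation labels are simply skipped.

```python
from collections import deque


def shortest_path(graph, start, goal, relations=None):
    """Return the nodes on a shortest path, or None if `goal` is unreachable.

    If `relations` is given, only edges with those labels are followed.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for target, relation in graph.get(node, []):
            if relations is not None and relation not in relations:
                continue
            if target not in parents:
                parents[target] = node
                queue.append(target)
    return None


# Toy adjacency map: node -> list of (target, relation). Made-up data.
toy_graph = {
    "horse": [("equine", "hypernym"), ("gallop", "near_eng_derivat")],
    "gallop": [("hare", "near_eng_derivat")],
    "equine": [("mammal", "hypernym")],
    "mammal": [("leporid", "hyponym")],
    "leporid": [("hare", "hyponym")],
}

unfiltered = shortest_path(toy_graph, "horse", "hare")
filtered = shortest_path(toy_graph, "horse", "hare", {"hypernym", "hyponym"})
print("unfiltered:", unfiltered)
print("filtered:  ", filtered)
```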

There's a special pair of relations (hyponym and hypernym) that creates what's called the hypernym tree. This tree illustrates relations of type "is a". For instance, a 'flower' is a 'plant'. We provide several functions that work with this tree, such as finding the lowest common ancestor of two synsets or listing the path from a synset to the root.

In [13]:
# get the lowest common ancestor in the hypernym tree
synset1_id = wn.synsets("cal")[2]
synset2_id = wn.synsets("iepure")[0]
synset_id = wn.lowest_hypernym_common_ancestor(synset1_id, synset2_id)
print("The lowest common ancestor in the hypernym tree of synset {} and {} is {}".format(synset1_id, synset2_id, synset_id))

# get the path from a given synset to its root in the hypernym tree
synset_id = wn.synsets()[0]
print("\nList of synset ids from synset with id '{}' up to its root in the hypernym tree: ".format(synset_id))
print("{}".format(wn.synset_to_hypernym_root(synset_id)))

The lowest common ancestor in the hypernym tree of synset ENG30-02377703-n and ENG30-02324045-n is ENG30-01886756-n

List of synset ids from synset with id 'ENG30-00006269-n' up to its root in the hypernym tree: 
['ENG30-00006269-n', 'ENG30-00004258-n', 'ENG30-00003553-n', 'ENG30-00002684-n']

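Both hypernym-tree operations can be sketched on a toy tree. The helpers ``path_to_root`` and ``lowest_common_ancestor`` and the data below are hypothetical and written for this example; they assume each node has at most one hypernym, which may not hold for every synset in a real wordnet, and they are not the library's implementation.

```python
def path_to_root(hypernym_of, node):
    """Follow hypernym links from `node` up to the root (inclusive)."""
    path = [node]
    while node in hypernym_of:
        node = hypernym_of[node]
        path.append(node)
    return path


def lowest_common_ancestor(hypernym_of, a, b):
    """Return the first node on b's path to the root that is also on a's path."""
    ancestors_of_a = set(path_to_root(hypernym_of, a))
    for node in path_to_root(hypernym_of, b):
        if node in ancestors_of_a:
            return node
    return None


# Toy tree as child -> hypernym. Made-up data, one hypernym per node.
hypernym_of = {
    "horse": "equine",
    "equine": "placental",
    "hare": "leporid",
    "leporid": "placental",
    "placental": "mammal",
}

horse_path = path_to_root(hypernym_of, "horse")
common = lowest_common_ancestor(hypernym_of, "horse", "hare")
print("Path to root:", horse_path)
print("Lowest common ancestor:", common)
```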