Copyright: Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

# Lab 4.1: Senses and Relations

In this lab, we will query lexical information from [BabelNet](https://babelnet.org/), a multilingual resource that combines several sources, for example WordNet and Wiktionary. We will also learn how to plot networks.

You first need to [register](https://babelnet.org/register) to obtain an API key. Please use your student e-mail address. It is easiest to specify English as your native language (this will be the interface language).

Before you start, explore the browser search interface to understand what type of information BabelNet can provide.

A word can have multiple senses, and a sense can be referred to by multiple words. BabelNet organizes concepts as synsets, which are groups of synonyms referring to the same sense. As a first step, you need to obtain the synset ids for your search term.

## 1. Synsets

In [None]:
import requests

# Query
word = "dinner"
language = "EN"

# Babelnet parameters, please add your own key here!
babelnet_key = "ADD YOUR KEY HERE"
wordurl = "https://babelnet.io/v5/getSynsetIds?"
params = dict(lemma=word, searchLang=language,key=babelnet_key)

# Get all synsets for the word
resp = requests.get(url=wordurl, params=params)
word_data = resp.json()

print(word_data)

Now, you can query the information for each synset id. Note that the definitions of the synsets are a mix of different sources and target languages. **How can you adjust the code to filter by source or language?** 


In [None]:
# Get the information for each synset of the word
synseturl= "https://babelnet.io/v5/getSynset?"

# We can specify multiple target languages
languages =["EN", "ES", "NL"]
synsets ={}

for synset in word_data:
    # Use synset_id instead of id to avoid shadowing the Python built-in
    synset_id = synset["id"]
    pos = synset["pos"]
    synset_params = dict(id=synset_id, key=babelnet_key, targetLang=languages)

    resp = requests.get(url=synseturl, params=synset_params)
    synsetdata = resp.json()

    # Output the definitions for each synset
    print("Synset: ", synset_id, pos)
    for definition in synsetdata["glosses"]:
        print("\t", definition["source"], definition["language"], definition["gloss"])
        print()
    print("-----------")

    # Store the full synset data for the later cells
    synsets[synset_id] = synsetdata
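
One possible answer to the filtering question is sketched below. It does not call the API: `sample_glosses` is a hypothetical list that only mimics the `glosses` field used above, and the source codes in it are illustrative values. The helper `filter_glosses` is our own name, not part of BabelNet.

```python
# Hypothetical sample mimicking the "glosses" list of a synset response.
# The source codes and texts are illustrative, not real BabelNet output.
sample_glosses = [
    {"source": "WN", "language": "EN", "gloss": "The main meal of the day."},
    {"source": "WIKT", "language": "EN", "gloss": "An evening meal."},
    {"source": "WIKI", "language": "ES", "gloss": "Comida de la noche."},
]


def filter_glosses(glosses, language=None, source=None):
    """Keep only the glosses that match the given language and/or source."""
    return [
        g for g in glosses
        if (language is None or g["language"] == language)
        and (source is None or g["source"] == source)
    ]


for g in filter_glosses(sample_glosses, language="EN", source="WN"):
    print(g["source"], g["language"], g["gloss"])
```

In the lab code, you could apply the same function to `synsetdata["glosses"]` before printing the definitions.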
    




## 2. Word sense disambiguation

Identifying the most suitable synset for a word in a specific context is called *word sense disambiguation*. **Which of the retrieved synsets are most relevant for your dataset? How do you know?** 

A very simple algorithm for identifying the synset of a term calculates the overlap between the words occurring in the context of the term and the words occurring in the definition of the synset. This approach is called the [Simplified Lesk algorithm](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.2744&rep=rep1&type=pdf). It can be improved by ignoring function words, considering the POS of the word, lemmatizing the tokens, and including the words in the example sentences. **What do you think about this approach? Is it useful? How could you improve it?**


In [None]:
import string
# Simple example, already tokenized and no punctuation
language ="EN"
context = "we will have pasta for dinner tomorrow evening"
context_tokens = context.split(" ")

max_overlap = 0
best_synset = ""
best_definition = ""
for synset_id, properties in synsets.items():
    for definition in properties["glosses"]:
        if definition["language"] == language:
            # Remove punctuation
            gloss = definition["gloss"]
            for c in string.punctuation:
                gloss = gloss.replace(c, "")
            gloss_tokens = gloss.split(" ")

            # Calculate overlap
            overlap = set(gloss_tokens).intersection(context_tokens)
            print(gloss_tokens)
            print(overlap)
            print()

            # Update best synset
            if len(overlap) > max_overlap:
                max_overlap = len(overlap)
                best_synset = synset_id
                best_definition = gloss
    
print(best_synset, best_definition)
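
Two of the improvements mentioned above, lowercasing and ignoring function words, could look roughly like the sketch below. It is only one possible variant: the stop word list is a tiny illustrative set, the synset ids and glosses in `toy_glosses` are made up, and the target word itself is excluded from the context so that it cannot produce a trivial match.

```python
import string

# Tiny illustrative stop word list (an assumption; use a fuller list in practice)
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "at", "we", "will",
             "have", "is", "or", "and", "to"}


def tokenize(text):
    """Lowercase, remove punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return {token for token in cleaned.split() if token not in STOPWORDS}


def simplified_lesk(context, target, glosses_by_synset):
    """Return the synset id whose glosses share the most content words with the context."""
    context_tokens = tokenize(context) - {target}
    best_id, best_overlap = None, set()
    for synset_id, glosses in glosses_by_synset.items():
        gloss_tokens = set()
        for gloss in glosses:
            gloss_tokens |= tokenize(gloss)
        overlap = gloss_tokens & context_tokens
        if len(overlap) > len(best_overlap):
            best_id, best_overlap = synset_id, overlap
    return best_id, best_overlap


# Hypothetical ids and glosses, not real BabelNet output
toy_glosses = {
    "synset-meal": ["The main meal of the day, served in the evening or at midday."],
    "synset-party": ["A party of people assembled to have dinner together."],
}
toy_best, toy_overlap = simplified_lesk(
    "we will have pasta for dinner tomorrow evening", "dinner", toy_glosses
)
print(toy_best, toy_overlap)
```

With such short definitions, the decision often rests on a single shared word, which is worth keeping in mind when you judge how reliable this approach is.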
    




## 3. Synset properties

BabelNet provides a lot of additional information for each synset. You might want to check the browser interface again.

In [None]:
example_id = "bn:00027206n" 

print(synsets[example_id].keys())


In [None]:
for sense in synsets[example_id]["senses"]:
    print(sense["properties"]["source"], sense["properties"]["language"],sense["properties"]["simpleLemma"])


In [None]:
for translation in synsets[example_id]["translations"]: 
    source = translation[0]["properties"]
    print(source["language"], source["simpleLemma"])
    
    for target in translation[1]:
        print("\t",target["properties"]["language"], target["properties"]["simpleLemma"], target["properties"]["pronunciations"]["transcriptions"])
        
    print("-----------")

## 4. Synset relations

We can also identify relations between synsets. **Brainstorm: Could you recursively identify relations between the important terms in your dataset?**

In [None]:
relations_url= 'https://babelnet.io/v5/getOutgoingEdges?'
relations_params = dict(id=example_id,key=babelnet_key)
resp = requests.get(url=relations_url, params=relations_params)
    
relations_data = resp.json()
for relation in relations_data: 
    print(relation["pointer"]["name"], relation["target"])
    print()
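
As a starting point for the brainstorm, here is a minimal sketch of how relations could be followed over several steps. To keep it runnable without an API key, `get_edges` is a hypothetical stand-in that reads from a small hand-made dictionary; in the lab you would replace it with a request to the relations endpoint used above. Each real request may count against the quota of your key, which is why the sketch limits the depth and remembers the nodes it has already visited.

```python
from collections import deque

# Hypothetical toy data: node -> list of (relation name, target node)
TOY_EDGES = {
    "dinner": [("hypernym", "meal")],
    "meal": [("hypernym", "food"), ("hyponym", "breakfast")],
    "food": [("hypernym", "substance")],
}


def get_edges(node):
    """Stand-in for an API request; returns (relation, target) pairs."""
    return TOY_EDGES.get(node, [])


def collect_relations(start, max_depth=2):
    """Breadth-first traversal collecting (source, relation, target) triples."""
    seen = {start}
    queue = deque([(start, 0)])
    triples = []
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            # Do not expand nodes at the maximum depth
            continue
        for relation, target in get_edges(current):
            triples.append((current, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return triples


for triple in collect_relations("dinner", max_depth=2):
    print(triple)
```

The resulting triples have the same shape as the edges added to the graph in the next section, so a deeper network can be built from them directly.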


## 5. Plotting networks

The relations between concepts can be interpreted as a network graph. In Python, such graphs can be created using the *networkx* module. **What kind of information can you derive from such a network about the terms in your dataset? Would it be possible to create a deeper network and draw relation edges from the target nodes?**

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import networkx as nx

# Map id to lemma
def get_lemma(synset_id):
    synset_params = dict(id=synset_id, key=babelnet_key, targetLang=languages)
    resp = requests.get(url=synseturl, params=synset_params)
    synsetdata = resp.json()
    # We simply take the first sense
    try:
        lemma = synsetdata["senses"][0]["properties"]["simpleLemma"]
    except (IndexError, KeyError):
        # Sometimes concept information is missing (empty list or missing key)
        lemma = ""
    return lemma

# Create a graph structure
relations_graph=nx.Graph()
relations_graph.add_node(word)

# Add edges
for relation in relations_data[0:10]: 
    target = get_lemma(relation["target"])
    if (len(target))>0:
        relations_graph.add_edge(word, target,title=relation["pointer"]["name"])

print(relations_graph.nodes)
print(relations_graph.edges)


Now, we are going to plot the network using *matplotlib.pyplot*. This is a very useful library for all kinds of plots. **Take a look at some [plot galleries](https://python-graph-gallery.com/all-charts/) to get a feeling for the range of plots you can create.** 

In [None]:
import matplotlib.pyplot as plt


# Create a figure
fig, ax = plt.subplots(1, 1, figsize = (15, 10))

# We need to create a layout when making separate calls to draw nodes and edges
pos = nx.spring_layout(relations_graph)


# Draw the nodes
nx.draw_networkx_nodes(relations_graph, pos, node_size = 3000, ax=ax)
nx.draw_networkx_labels(relations_graph, pos, ax=ax, font_color="white", font_size=10)

# Draw the edges
edge_labels = nx.get_edge_attributes(relations_graph, 'title')

nx.draw_networkx_edges(relations_graph, pos, arrows=True, ax=ax)
nx.draw_networkx_edge_labels(relations_graph,pos, edge_labels=edge_labels)


fig.show()


## 6. Adding color

We want to add color to the plot. First, we distinguish between the root node and the targets. 

In [None]:
# Create a figure
fig, ax = plt.subplots(1, 1, figsize = (15, 10))

# Specify the node colors
node_colors = ["orange" for node in relations_graph.nodes]
# The first node should be grey
node_colors[0] = "grey"
print(node_colors)
# Draw the nodes
nx.draw_networkx_nodes(relations_graph, pos, node_color=node_colors, node_size = 3000, ax=ax)
nx.draw_networkx_labels(relations_graph, pos, ax=ax)

fig.show()


## 7. Using color palettes

Instead of choosing the colors yourself, you can use existing color palettes. The module *seaborn* provides very nice [color palettes](https://seaborn.pydata.org/tutorial/color_palettes.html). Colors are expressed as three numbers indicating the values for red, green, and blue (RGB).

In [None]:
import seaborn as sns
color_palette = sns.color_palette("Dark2")
sns.palplot(color_palette)
print(color_palette)
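
The RGB values in such a palette are floats between 0 and 1. Many tools expect hexadecimal color codes instead, so a small conversion can be handy. The helper `rgb_to_hex` below is our own sketch, and the example triple is just an illustrative value.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) triple of floats in [0, 1] to a hex color string."""
    r, g, b = (round(channel * 255) for channel in rgb)
    return f"#{r:02x}{g:02x}{b:02x}"


# Illustrative RGB triple with values between 0 and 1
example_color = (27 / 255, 158 / 255, 119 / 255)
print(rgb_to_hex(example_color))  # prints #1b9e77
```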


We want to use different colors for the edges depending on the edge label. **Is it possible to also specify the same color for the node?** 

In [None]:

# Map edge labels to colors
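# Sort the labels so that the color assignment is the same on every run
unique_labels = sorted(set(edge_labels.values()))
# Cycle through the palette in case there are more labels than colors
labels2color = {label: color_palette[i % len(color_palette)] for i, label in enumerate(unique_labels)}
edge_colors = [labels2color[label] for label in edge_labels.values()]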

# Create a figure
fig, ax = plt.subplots(1, 1, figsize = (15, 10))

# Draw the nodes and edges with colors
nx.draw_networkx_nodes(relations_graph, pos, node_color=node_colors, node_size = 3000, ax=ax)
nx.draw_networkx_labels(relations_graph, pos, ax=ax)

# Note that I also increased the width of the edges. 
nx.draw_networkx_edges(relations_graph, pos, arrows=True, edge_color=edge_colors, width=4, ax=ax)
nx.draw_networkx_edge_labels(relations_graph,pos, edge_labels=edge_labels)


fig.show()
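
One way to approach the node color question is to give each target node the color of the edge that connects it to the root. The sketch below uses hypothetical toy data in the same shape as `edge_labels` (a dictionary from node pairs to labels) and plain color names instead of a seaborn palette, so it runs on its own. All names starting with `toy_` are made up for illustration.

```python
# Hypothetical edge labels: (source, target) -> relation name
toy_edge_labels = {
    ("dinner", "meal"): "HYPERNYM",
    ("dinner", "supper"): "HYPONYM",
    ("dinner", "banquet"): "HYPONYM",
}
toy_nodes = ["dinner", "meal", "supper", "banquet"]
toy_palette = ["tab:blue", "tab:green", "tab:red"]
toy_root = "dinner"

# Map each relation label to a color, cycling through the palette if needed
toy_labels = sorted(set(toy_edge_labels.values()))
label_to_color = {
    label: toy_palette[i % len(toy_palette)] for i, label in enumerate(toy_labels)
}

# Give every non-root node the color of its edge to the root
node_to_color = {toy_root: "grey"}
for (source, target), label in toy_edge_labels.items():
    other = target if source == toy_root else source
    node_to_color[other] = label_to_color[label]

# The list must follow the node order of the graph
toy_node_colors = [node_to_color.get(node, "grey") for node in toy_nodes]
print(toy_node_colors)
```

A list built this way, in the node order of the graph, can be passed as `node_color` when drawing the nodes.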


If you are happy with your graph, save it to a file. A good plot can make it much easier to understand your data. Please also remember to make your plots inclusive. You can check how your plot looks for people who are colorblind using this [simulator](https://www.color-blindness.com/coblis-color-blindness-simulator/).

In [None]:
fig.savefig("example_plot.png")