# A simple NLP application for ambiguity resolution with expert.ai

How to resolve ambiguity for homographs and polysemy using expert.ai NLP technology

Ambiguity is one of the biggest challenges in NLP. When trying to understand the meaning of a word we consider several different aspects, such as the context in which it is used, our own knowledge of the world, and how a given word is generally used in society. Words change meaning over time and can also mean one thing in a certain domain and another in a different one. This phenomenon can be observed in homographs - two words that happen to be written in the same way, usually coming from different etymologies - and polysemy - one word that carries different meanings.<br/>
In this tutorial, we'll see how to resolve ambiguity in PoS tagging and semantic tagging, using expert.ai technology.

## Before you start
Please check how to install the expert.ai NL API Python SDK, either in this <a href='https://towardsdatascience.com/visualizing-what-docs-are-really-about-with-expert-ai-cd537e7a2798?showDomainSetup=true&gi=300077e01aa3'>Towards Data Science article</a> or in the official documentation, <a href='https://github.com/therealexpertai/nlapi-python#expertai-natural-language-api-for-python'>here</a>.

## Part of Speech tagging
Language is ambiguous: not only can a sentence be written in different ways and still convey the same meaning, but even lemmas - which are supposed to be far less ambiguous - can carry different meanings. <br/>
<br/>
For example, the word <i>play</i> could refer to several different things. Let's take a look at the following examples:<br/>
<i>I really enjoyed the play.</i><br/>
<i>I'm in a band and I play the guitar.</i><br/>
<br/>
Not only can the same word have different meanings, but it can also be used in different roles: in the first sentence <i>play</i> is a noun, while in the second it's a verb. Assigning the correct grammatical label to each token is called PoS (Part of Speech) tagging, and it's not a piece of cake.

Let's see how to resolve PoS ambiguity with expert.ai - first let's import the library and create the client:

In [12]:
from expertai.nlapi.cloud.client import ExpertAiClient
client = ExpertAiClient()
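
Creating the client assumes that your expert.ai credentials are already available. As far as we know from the SDK's README, they are read from the `EAI_USERNAME` and `EAI_PASSWORD` environment variables; treat those names as an assumption to verify against the official documentation. The helper `missing_credentials` below is hypothetical and not part of the SDK: it is a minimal sketch that only checks whether the variables are set before the client is created.

```python
import os

# Assumption: the SDK reads credentials from these two environment variables.
REQUIRED_VARIABLES = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(environ=os.environ):
    """Return the names of the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


missing = missing_credentials()
if missing:
    print(f"Set these environment variables before creating the client: {missing}")
else:
    print("Credentials found in the environment.")
```

Running this check first makes a missing credential show up as a clear message rather than as an authentication error on the first request.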

We'll see the PoS tagging for two sentences - notice how the lemma <i>key</i> is the same in both sentences, while its PoS changes:

In [1]:
# Two sentences in which the same word, "key", has a different grammatical label
key_as_noun = "The key broke in the lock."
key_as_adjective = "The key problem was not one of quality but of quantity."

To analyze each sentence we need to create a request to NL API: the most important parameters - shown in the code below as well - are the text to analyze, the language, and the analysis we are requesting, represented by the resource parameter. <br/>
Please note that expert.ai NL API currently supports five languages (en, it, es, fr, de). The resource we use is <i>disambiguation</i>, which performs multi-level tagging as a product of the expert.ai NLP pipeline.<br/>
Without further ado, let's create our first request:

In [4]:
# Requesting the disambiguation of the first sentence, key_as_noun
# Notice: the resource parameter specifies the kind of analysis we want to perform on the document.
document = client.specific_resource_analysis(
    body={"document": {"text": key_as_noun}}, 
    params={'language': 'en', 'resource': 'disambiguation'})
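
Since we will send the same kind of request several times, the body and parameters can be assembled in one place. The function below is a minimal sketch: `build_disambiguation_request` and `SUPPORTED_LANGUAGES` are hypothetical names introduced here, not part of the SDK. It only reproduces the `body` and `params` shown above and rejects languages outside the five listed earlier.

```python
# Language codes listed in this tutorial as supported by the NL API.
SUPPORTED_LANGUAGES = {"en", "it", "es", "fr", "de"}


def build_disambiguation_request(text, language="en"):
    """Return the keyword arguments for a disambiguation request."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    if not text.strip():
        raise ValueError("The text to analyze must not be empty")
    return {
        "body": {"document": {"text": text}},
        "params": {"language": language, "resource": "disambiguation"},
    }


request = build_disambiguation_request("The key broke in the lock.")
print(request["params"])

# With a configured client, the request could then be sent like this:
# document = client.specific_resource_analysis(**request)
```

Validating the language locally avoids a round trip to the API for a request that cannot succeed.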

Now we need to iterate over the PoS of the text and check which one was assigned to the lemma <i>key</i>:

In [5]:
# Producing and printing PoS tagging of the first sentence
# Notice: to retrieve the textual form of the element we use document.content with slicing on element start and end chars
print(f'Part of speech for "{key_as_noun}"\n')
for token in document.tokens:
    print(f'{document.content[token.start:token.end]:{15}}\tPOS: {token.pos}')

Part of speech for "The key broke in the lock."

The            	POS: DET
key            	POS: NOUN
broke in       	POS: VERB
the            	POS: DET
lock           	POS: NOUN
.              	POS: PUNCT


The output above lists the PoS of each token following the <a href='https://universaldependencies.org/u/pos/'>UD Labels</a>, where <i>NOUN</i> indicates that the lemma <i>key</i> is used here as a noun.
This should not be the case for its homograph in the second sentence, where <i>key</i> is used as an adjective:

In [6]:
# Requesting the disambiguation of the second sentence, key_as_adjective
document = client.specific_resource_analysis(
    body={"document": {"text": key_as_adjective}}, 
    params={'language': 'en', 'resource': 'disambiguation'})

# Producing and printing PoS tagging of the second sentence
# Notice: to retrieve the textual form of the element we use document.content with slicing on element start and end chars
print(f'Part of speech for "{key_as_adjective}"\n')
for token in document.tokens:
    print(f'{document.content[token.start:token.end]:{15}}\tPOS: {token.pos}')

Part of speech for "The key problem was not one of quality but of quantity."

The            	POS: DET
key            	POS: ADJ
problem        	POS: NOUN
was            	POS: AUX
not            	POS: PART
one            	POS: NUM
of             	POS: ADP
quality        	POS: NOUN
but            	POS: CCONJ
of             	POS: ADP
quantity       	POS: NOUN
.              	POS: PUNCT


As the output above shows, the lemma <i>key</i> was correctly recognized as an adjective in this sentence.
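
Instead of reading the whole printout, we can look up the tag of a single word. The sketch below assumes only what the cells above already rely on: a document with `content` and `tokens`, and tokens with `start`, `end` and `pos`. Both `pos_tags_for` and `fake_document` are hypothetical helpers; the latter builds a stand-in from the outputs printed above so that the example runs without calling the API.

```python
from types import SimpleNamespace


def pos_tags_for(document, word):
    """Return the PoS tag of every token whose surface text equals `word`."""
    tags = []
    for token in document.tokens:
        surface = document.content[token.start:token.end]
        if surface.lower() == word.lower():
            tags.append(token.pos)
    return tags


def fake_document(text, tagged_tokens):
    """Build a stand-in for the API response from (surface, pos) pairs."""
    tokens, cursor = [], 0
    for surface, pos in tagged_tokens:
        start = text.index(surface, cursor)  # locate each token left to right
        cursor = start + len(surface)
        tokens.append(SimpleNamespace(start=start, end=cursor, pos=pos))
    return SimpleNamespace(content=text, tokens=tokens)


# Stand-ins reproducing the two outputs printed above.
noun_doc = fake_document(
    "The key broke in the lock.",
    [("The", "DET"), ("key", "NOUN"), ("broke in", "VERB"),
     ("the", "DET"), ("lock", "NOUN"), (".", "PUNCT")],
)
adjective_doc = fake_document(
    "The key problem was not one of quality but of quantity.",
    [("The", "DET"), ("key", "ADJ"), ("problem", "NOUN"), ("was", "AUX"),
     ("not", "PART"), ("one", "NUM"), ("of", "ADP"), ("quality", "NOUN"),
     ("but", "CCONJ"), ("of", "ADP"), ("quantity", "NOUN"), (".", "PUNCT")],
)

print(pos_tags_for(noun_doc, "key"))       # ['NOUN']
print(pos_tags_for(adjective_doc, "key"))  # ['ADJ']
```

The function returns a list because a word may occur more than once in a sentence, possibly with different tags.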

## Semantic tagging
A word can also keep the same grammatical label and still have different meanings. This phenomenon is called polysemy. Inferring the correct meaning of each word is the task of semantic tagging.<br/>
<br/>
More common words tend to have more meanings, accumulated over time. For example, the lemma <i>paper</i> can have multiple meanings, as seen here:<br/>
<i>I like to take notes on paper.</i><br/>
<i>Every morning my husband reads the news from the local paper.</i><br/>
<br/>
Identifying the correct meaning of each lemma is an important task, since a document's meaning or focus can change depending on it. To do so, we must rely on technology that is well developed and robust, since semantic tagging depends heavily on many pieces of information drawn from the text.<br/>
<br/>
Semantic tagging often relies on IDs: these are identifiers of concepts, and each concept has its own ID. For the same lemma, e.g. <i>paper</i>, we will have a certain ID <i>x</i> for its meaning as a material, and another ID <i>y</i> for its meaning as a newspaper.<br/>
These IDs are usually stored in a Knowledge Graph, that is, a graph in which each node is a concept and the arcs are connections between concepts that follow a certain logic (e.g. an arc could link two concepts if one is the hyponym of the other).<br/>
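
To make the idea concrete, here is a toy graph for the two meanings of <i>paper</i>. Everything in it is invented for illustration: the IDs, the lemmas and the links are not taken from expert.ai's Knowledge Graph, and `Concept`, `concepts_for_lemma` and `is_hyponym_of` are hypothetical names. It is a minimal sketch of concepts as nodes with hyponym-to-hypernym links between them.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: int
    lemmas: list                                       # synonyms naming the concept
    hypernym_ids: list = field(default_factory=list)   # links to broader concepts


# Toy graph with made-up IDs: "paper" appears in two distinct concepts.
GRAPH = {
    1: Concept(1, ["material"]),
    2: Concept(2, ["paper"], hypernym_ids=[1]),
    3: Concept(3, ["publication"]),
    4: Concept(4, ["newspaper", "paper"], hypernym_ids=[3]),
}


def concepts_for_lemma(graph, lemma):
    """Return the IDs of every concept that lists `lemma` among its lemmas."""
    return sorted(cid for cid, concept in graph.items() if lemma in concept.lemmas)


def is_hyponym_of(graph, child_id, ancestor_id):
    """Follow hypernym links upward from child_id, looking for ancestor_id."""
    to_visit = list(graph[child_id].hypernym_ids)
    seen = set()
    while to_visit:
        current = to_visit.pop()
        if current == ancestor_id:
            return True
        if current not in seen:
            seen.add(current)
            to_visit.extend(graph[current].hypernym_ids)
    return False


print(concepts_for_lemma(GRAPH, "paper"))  # [2, 4]
print(is_hyponym_of(GRAPH, 4, 3))          # True
print(is_hyponym_of(GRAPH, 2, 3))          # False
```

The lemma alone returns two candidate IDs; choosing between them from the context is exactly the job of semantic tagging.
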
Let's now look at how expert.ai performs semantic tagging. We begin by choosing two sentences in which we will compare the lemma <i>solution</i>:

In [7]:
solution_as_tactic = "Work out the solution in your head."
solution_as_chemical_mixture = "Heat the chlorine solution to 75° Celsius."

And now the request for the first sentence - using the same parameters as the previous example:

In [8]:
# Requesting disambiguation of the first sentence, solution_as_tactic
# Notice: the parameter for resource specifies the kind of exploration we want to perform on our documents.
document = client.specific_resource_analysis(
    body={"document": {"text": solution_as_tactic}}, 
    params={'language': 'en', 'resource': 'disambiguation'})

Semantic information is found in the <i>syncon</i> attribute of each token: a syncon is a concept stored in expert.ai's Knowledge Graph; each concept is formed by one or more lemmas that are synonyms. <br/>
Let's see how the information is presented in the document object:

In [9]:
# Producing and printing semantic tagging of the first sentence
# Notice: to retrieve the textual form of the element we use document.content with slicing on element start and end chars
print(f'Semantic tagging for "{solution_as_tactic}"\n')
for token in document.tokens:
    print(f'{document.content[token.start:token.end]:{15}}\tCONCEPT_ID: {token.syncon}')

Semantic tagging for "Work out the solution in your head."

Work out       	CONCEPT_ID: 63784
the            	CONCEPT_ID: -1
solution       	CONCEPT_ID: 25789
in             	CONCEPT_ID: -1
your           	CONCEPT_ID: -1
head           	CONCEPT_ID: 104906
.              	CONCEPT_ID: -1


Each token has its own syncon, although some of them show -1 as concept ID: this is the default ID assigned to tokens that do not have any concept, such as punctuation or articles.<br/>
So, if for the previous sentence we obtain concept ID 25789 for the lemma <i>solution</i>, for the second sentence we should obtain a different one, since the lemma has a different meaning in each sentence:

In [10]:
# Requesting disambiguation of the second sentence, solution_as_chemical_mixture
# Notice: the parameter for resource specifies the kind of exploration we want to perform on our documents.
document = client.specific_resource_analysis(
    body={"document": {"text": solution_as_chemical_mixture}}, 
    params={'language': 'en', 'resource': 'disambiguation'})

# Producing and printing semantic tagging of the second sentence
# Notice: to retrieve the textual form of the element we use document.content with slicing on element start and end chars
print(f'Semantic tagging for "{solution_as_chemical_mixture}"\n')
for token in document.tokens:
    print(f'{document.content[token.start:token.end]:{15}}\tCONCEPT_ID: {token.syncon}')

Semantic tagging for "Heat the chlorine solution to 75° Celsius."

Heat           	CONCEPT_ID: 64278
the            	CONCEPT_ID: -1
chlorine       	CONCEPT_ID: 59954
solution       	CONCEPT_ID: 59795
to             	CONCEPT_ID: -1
75             	CONCEPT_ID: -1
° Celsius      	CONCEPT_ID: 56389
.              	CONCEPT_ID: -1


As expected, the lemma <i>solution</i> corresponds to a different concept ID, indicating that it has a different meaning than in the previous sentence.
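
The comparison can also be done in code rather than by eye. The sketch below keeps only the tokens that carry a concept and compares the ID assigned to <i>solution</i> in the two sentences. `concepts_by_surface` and `fake_document` are hypothetical helpers, and the stand-in documents simply reproduce the IDs printed above, so the example runs without calling the API.

```python
from types import SimpleNamespace

NO_CONCEPT = -1  # ID shown in the outputs above for tokens without a concept


def concepts_by_surface(document):
    """Map each token's surface text to its syncon ID, skipping NO_CONCEPT."""
    concepts = {}
    for token in document.tokens:
        if token.syncon != NO_CONCEPT:
            concepts[document.content[token.start:token.end]] = token.syncon
    return concepts


def fake_document(text, tagged_tokens):
    """Build a stand-in for the API response from (surface, syncon) pairs."""
    tokens, cursor = [], 0
    for surface, syncon in tagged_tokens:
        start = text.index(surface, cursor)  # locate each token left to right
        cursor = start + len(surface)
        tokens.append(SimpleNamespace(start=start, end=cursor, syncon=syncon))
    return SimpleNamespace(content=text, tokens=tokens)


# Stand-ins reproducing the two outputs printed above.
tactic_doc = fake_document(
    "Work out the solution in your head.",
    [("Work out", 63784), ("the", -1), ("solution", 25789), ("in", -1),
     ("your", -1), ("head", 104906), (".", -1)],
)
mixture_doc = fake_document(
    "Heat the chlorine solution to 75° Celsius.",
    [("Heat", 64278), ("the", -1), ("chlorine", 59954), ("solution", 59795),
     ("to", -1), ("75", -1), ("° Celsius", 56389), (".", -1)],
)

tactic = concepts_by_surface(tactic_doc)
mixture = concepts_by_surface(mixture_doc)
print(tactic)
print(mixture)
print(tactic["solution"] != mixture["solution"])  # True: two different concepts
```

Keying the dictionary on the surface text is enough for these short sentences; a text where the same word occurs several times would need a list of IDs per word instead.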

## Conclusion
NLP is hard because language is ambiguous: one word, one phrase or one sentence can mean different things depending on the context. With technologies such as expert.ai we can resolve ambiguity and build solutions that handle the meaning of words more accurately.