**Heteronyms** are words that share the same spelling but have different meanings when pronounced differently.


- Recall the word *lead* from the lectures. It can refer to the metal or to the act of leading, and the two pronunciations carry different meanings.

- For machine translation and text-to-speech systems, the ability to identify the correct sense of a word is crucial.




Let us have a look at this example:

https://translate.google.com/?sl=en&tl=hi&text=She%20wished%20she%20could%20desert%20him%20in%20the%20desert.%0A&op=translate

Example taken from: http://www-personal.umich.edu/~cellis/heteronym.html


In [42]:
# Import SpaCy library
import spacy 

In [51]:
!python -m spacy download en_core_web_sm 
!python -m spacy download en_core_web_lg

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)

In [53]:
import en_core_web_sm

nlp = en_core_web_sm.load()

In [52]:
!python -m spacy download en_core_web_lg
!python -m spacy download en_core_web_sm

Collecting en-core-web-lg==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl (587.7 MB)

In [47]:
# Load pre-trained SpaCy model for performing basic 
# NLP tasks such as POS tagging, parsing, etc.
model = spacy.load("en_core_web_sm")

In [48]:
# Use the model to process the input sentence
tokens = model("She wished she could desert him in the desert.")

In [49]:
# Print the tokens and their respective PoS tags.
for token in tokens:
    print(token.text, "--", token.pos_, "--", token.tag_)

She -- PRON -- PRP
wished -- VERB -- VBD
she -- PRON -- PRP
could -- AUX -- MD
desert -- VERB -- VB
him -- PRON -- PRP
in -- ADP -- IN
the -- DET -- DT
desert -- NOUN -- NN
. -- PUNCT -- .


Note that the two instances of *desert* in the above example have different PoS tags, so a text-to-speech system can use this information to generate the correct pronunciation of each.
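
The idea can be sketched as a lookup keyed on the word and its PoS tag. The snippet below is a minimal illustration, not how any particular text-to-speech system works: the `PRONUNCIATIONS` table and the `pronounce` helper are hypothetical names, the respellings are informal, and the tagged tokens are copied from the spaCy output above rather than recomputed.

```python
# Hypothetical lookup table: (lowercased word, coarse PoS tag) -> informal respelling.
PRONUNCIATIONS = {
    ("desert", "VERB"): "dih-ZURT",  # to abandon
    ("desert", "NOUN"): "DEZ-ert",   # arid region
}

# (text, pos_) pairs copied from the tagger output above.
tagged = [
    ("She", "PRON"), ("wished", "VERB"), ("she", "PRON"), ("could", "AUX"),
    ("desert", "VERB"), ("him", "PRON"), ("in", "ADP"), ("the", "DET"),
    ("desert", "NOUN"), (".", "PUNCT"),
]

def pronounce(text, pos):
    """Return the respelling for a known heteronym, else the word unchanged."""
    return PRONUNCIATIONS.get((text.lower(), pos), text)

spoken = [pronounce(text, pos) for text, pos in tagged]
print(" ".join(spoken))
```

Only the two *desert* tokens are rewritten; every other token passes through unchanged because it has no entry in the table.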

The above task is a specific example of the larger NLP problem called Word Sense Disambiguation (WSD). For words that have more than one meaning, WSD is the problem of identifying the correct meaning of the word based on the context in which the word is used.



Note that PoS tagging alone cannot pick the correct sense when the different meanings of a word share the same PoS tag, as in the following example.

https://translate.google.com/?sl=en&tl=hi&text=The%20bass%20swam%20around%20the%20bass%20drum%20on%20the%20ocean%20floor.&op=translate

In [56]:
# Let's take a new example.
tokens = model("The bass swam around the bass drum on the ocean floor")
for token in tokens:
    print(token.text, "--", token.pos_, "--", token.tag_)

The -- DET -- DT
bass -- NOUN -- NN
swam -- NOUN -- NN
around -- ADP -- IN
the -- DET -- DT
bass -- NOUN -- NN
drum -- NOUN -- NN
on -- ADP -- IN
the -- DET -- DT
ocean -- NOUN -- NN
floor -- NOUN -- NN
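
Both occurrences of *bass* receive the tag NOUN above, so a lookup keyed on the PoS tag cannot separate the fish from the instrument. One fallback is to look at the neighbouring words. The sketch below is a toy overlap count in the spirit of the simplified Lesk algorithm, under stated assumptions: the `SENSE_CLUES` word sets are hand-written for this sentence, and `guess_sense` and its window size of 2 are hypothetical choices, not part of spaCy.

```python
# Hand-written clue words per sense (an assumption for illustration only).
SENSE_CLUES = {
    "bass (fish)": {"swam", "swim", "fish", "ocean", "river", "caught"},
    "bass (music)": {"drum", "guitar", "music", "band", "played", "sound"},
}

def guess_sense(tokens, index, window=2):
    """Pick the sense whose clue words overlap most with nearby tokens."""
    lo, hi = max(0, index - window), index + window + 1
    # Collect lowercased neighbours, skipping the target token itself.
    context = {t.lower() for i, t in enumerate(tokens[lo:hi], start=lo) if i != index}
    scores = {sense: len(clues & context) for sense, clues in SENSE_CLUES.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the context gave no evidence either way.
    return best if scores[best] > 0 else None

tokens = "The bass swam around the bass drum on the ocean floor".split()
senses = {i: guess_sense(tokens, i) for i, t in enumerate(tokens) if t.lower() == "bass"}
print(senses)
```

The result is sensitive to the window size and to the clue lists, which is why this should be read as an illustration of the idea rather than a usable WSD method.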


### PoS tagging - Question 2/1
You have been given the following sentence.

“UpGrad is teaching Data Science courses to the working professionals.”

What will be the PoS tags of ‘teaching’ and ‘to’, respectively?

In [55]:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("upGrad is teaching Data Science courses to the working professionals.")
for token in doc:
    print(token.text, token.pos_, token.tag_)

upGrad NOUN NN
is AUX VBZ
teaching VERB VBG
Data PROPN NNP
Science PROPN NNP
courses NOUN NNS
to ADP IN
the DET DT
working VERB VBG
professionals NOUN NNS
. PUNCT .


### PoS tagging - Question 2/2

You have been given the following sentence.

“UpGrad is teaching Data Science courses to the working professionals.”

Which of the following is the correct PoS tag for the word ‘working’?

### PoS tagging - Question 4/4
SpaCy is a Python library that can be used for many NLP tasks, such as identifying the PoS tags of words in a corpus of documents. Write code in Google Colab to find the PoS tag of each token in the following sentence:


‘Apple is looking at buying UK-based start-up for $1 billion’.


What is the PoS tag of ‘billion’ in the sentence above?


Hint: You need to use `token.tag_` to get the correct answer.
 


- NN
- JJ
- CD
- DT



In [57]:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying UK-based start-up for $1 billion.")
for token in doc:
    print(token.text, token.pos_, token.tag_)

Apple PROPN NNP
is AUX VBZ
looking VERB VBG
at ADP IN
buying VERB VBG
UK PROPN NNP
- PUNCT HYPH
based VERB VBN
start NOUN NN
- PUNCT HYPH
up NOUN NN
for ADP IN
$ SYM $
1 NUM CD
billion NUM CD
. PUNCT .
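
To read the answer off programmatically, the fine-grained tag for *billion* can be picked out and matched against the four options. The sketch below reuses the `(text, tag_)` pairs printed above instead of rerunning the model. `TAG_MEANINGS` is a small hand-written glossary of Penn Treebank tags covering only the answer options, and the variable names are illustrative.

```python
# Penn Treebank descriptions for the four answer options.
TAG_MEANINGS = {
    "NN": "noun, singular or mass",
    "JJ": "adjective",
    "CD": "cardinal number",
    "DT": "determiner",
}

# (text, tag_) pairs copied from the output above.
tagged = [
    ("Apple", "NNP"), ("is", "VBZ"), ("looking", "VBG"), ("at", "IN"),
    ("buying", "VBG"), ("UK", "NNP"), ("-", "HYPH"), ("based", "VBN"),
    ("start", "NN"), ("-", "HYPH"), ("up", "NN"), ("for", "IN"),
    ("$", "$"), ("1", "CD"), ("billion", "CD"), (".", "."),
]

# A plain dict is enough here because "billion" occurs only once.
tag_of = dict(tagged)
answer = tag_of["billion"]
print(answer, "->", TAG_MEANINGS.get(answer, "not among the options"))
```

In a live session, `spacy.explain(token.tag_)` gives a similar description for any tag the model emits.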
