# Parts of Speech Tags
**Example**
- CC coordinating conjunction
- CD cardinal number
- DT determiner
- EX existential there (like: “there is” … think of it like “there exists”)
- FW foreign word
- IN preposition/subordinating conjunction
- JJ adjective ‘big’
- JJR adjective, comparative ‘bigger’
- JJS adjective, superlative ‘biggest’
- LS list marker 1)
- MD modal could, will
- NN noun, singular ‘desk’
- NNS noun plural ‘desks’
- NNP proper noun, singular ‘Harrison’
- NNPS proper noun, plural ‘Americans’
- PDT predeterminer ‘all the kids’
- POS possessive ending parent’s
- PRP personal pronoun I, he, she
- PRP$ possessive pronoun my, his, her
- RB adverb very, silently
- RBR adverb, comparative better
- RBS adverb, superlative best
- RP particle give up
- TO to, as in go ‘to’ the store
- UH interjection, errrrrrrrm
- VB verb, base form take
- VBD verb, past tense took
- VBG verb, gerund/present participle taking
- VBN verb, past participle taken
- VBP verb, non-3rd person sing. present take
- VBZ verb, 3rd person sing. present takes
- WDT wh-determiner which
- WP wh-pronoun who, what
- WP$ possessive wh-pronoun whose
- WRB wh-adverb where, when
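
Many of the tags above share a prefix within a word class (NN, NNS, NNP, NNPS are all nouns, for example). As a minimal sketch, the snippet below groups tags into coarse classes by prefix. The `COARSE_CLASSES` table and the `coarse_class` helper are illustrative names invented for this example and are not part of NLTK.

```python
# Map Penn Treebank tag prefixes to coarse word classes (illustrative grouping)
COARSE_CLASSES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
}


def coarse_class(tag):
    """Return a coarse word class for a Penn Treebank tag, or 'other'."""
    for prefix, name in COARSE_CLASSES.items():
        if tag.startswith(prefix):
            return name
    return "other"


for tag in ["NNS", "VBD", "JJR", "RBS", "DT"]:
    print(tag, "->", coarse_class(tag))
```

Tags outside the four groups, such as DT or WRB, fall through to `'other'` in this sketch.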

In [1]:
paragraph = """
In this paper, we develop DeepSinger, a multi-lingual multi-singer singing voice synthesis (SVS) system, 
which is built from scratch using singing training data mined from music websites. 
The pipeline of DeepSinger consists of several steps, including data crawling, singing and accompaniment separation, 
lyrics-to-singing alignment, data filtration, and singing modeling. Specifically, 
we design a lyrics-to-singing alignment model to automatically extract the duration of each phoneme in 
lyrics starting from coarse-grained sentence level to fine-grained phoneme level, and further design a multi-lingual 
multi-singer singing model based on a feed-forward Transformer to directly generate linear-spectrograms from lyrics, 
and synthesize voices using Griffin-Lim. DeepSinger has several advantages over previous SVS systems: 
1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites, 
2) the lyrics-to-singing alignment model further avoids any human efforts for alignment labeling and greatly reduces labeling cost,
3) the singing model based on a feed-forward Transformer is simple and efficient, by removing the complicated acoustic feature modeling in parametric synthesis 
and leveraging a reference encoder to capture the timbre of a singer from noisy singing data, and 
4) it can synthesize singing voices in multiple languages and multiple singers. 
We evaluate DeepSinger on our mined singing dataset that consists of about 92 hours data from 89 singers on three languages 
(Chinese, Cantonese and English). The results demonstrate that with the singing data purely mined from the Web, 
DeepSinger can synthesize high-quality singing voices in terms of both pitch accuracy and voice naturalness
"""

In [2]:
import nltk 
from nltk.corpus import stopwords
sentences = nltk.sent_tokenize(paragraph)

In [3]:
# Find the POS tags
stop_words = set(stopwords.words('english'))  # build the stop-word set once, outside the loop
for sentence in sentences:
    words = nltk.word_tokenize(sentence)  # split the sentence into word tokens
    # Tokens with stop words removed; shown for reference only and not used for tagging
    content_words = [w for w in words if w not in stop_words]
    # Tag the full token list so the tagger sees each word in its original context
    pos_tag = nltk.pos_tag(words)
    print(pos_tag)

[('In', 'IN'), ('this', 'DT'), ('paper', 'NN'), (',', ','), ('we', 'PRP'), ('develop', 'VBP'), ('DeepSinger', 'NNP'), (',', ','), ('a', 'DT'), ('multi-lingual', 'JJ'), ('multi-singer', 'NN'), ('singing', 'NN'), ('voice', 'NN'), ('synthesis', 'NN'), ('(', '('), ('SVS', 'NNP'), (')', ')'), ('system', 'NN'), (',', ','), ('which', 'WDT'), ('is', 'VBZ'), ('built', 'VBN'), ('from', 'IN'), ('scratch', 'NN'), ('using', 'VBG'), ('singing', 'VBG'), ('training', 'VBG'), ('data', 'NNS'), ('mined', 'VBN'), ('from', 'IN'), ('music', 'NN'), ('websites', 'NNS'), ('.', '.')]
[('The', 'DT'), ('pipeline', 'NN'), ('of', 'IN'), ('DeepSinger', 'NNP'), ('consists', 'VBZ'), ('of', 'IN'), ('several', 'JJ'), ('steps', 'NNS'), (',', ','), ('including', 'VBG'), ('data', 'NNS'), ('crawling', 'NN'), (',', ','), ('singing', 'VBG'), ('and', 'CC'), ('accompaniment', 'JJ'), ('separation', 'NN'), (',', ','), ('lyrics-to-singing', 'JJ'), ('alignment', 'NN'), (',', ','), ('data', 'NNS'), ('filtration', 'NN'), (',', ','), 
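
The tagger returns a plain list of `(word, tag)` tuples, so ordinary Python is enough to filter or summarize it. The sketch below uses the first few pairs copied from the printed output above; the `nouns` and `tag_counts` names are chosen here for illustration.

```python
from collections import Counter

# First few (word, tag) pairs copied from the printed output above
tagged = [('In', 'IN'), ('this', 'DT'), ('paper', 'NN'), (',', ','),
          ('we', 'PRP'), ('develop', 'VBP'), ('DeepSinger', 'NNP')]

# Keep every word whose tag starts with NN (NN, NNS, NNP, NNPS)
nouns = [word for word, tag in tagged if tag.startswith('NN')]

# Count how often each tag occurs
tag_counts = Counter(tag for _word, tag in tagged)

print(nouns)  # ['paper', 'DeepSinger']
print(tag_counts)
```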

# Named Entity Recognition (NER)

**Example**  

Person: Gustave Eiffel  
Place: Eiffel Tower  
Date: 1887  


In [3]:
paragraph2 = """
The Eiffel Tower was built from 1887 to 1889 by French engineer Gustave Eiffel, whose company specialized 
in building metal framework and structures.
"""

In [15]:
import nltk
import numpy
paragraph2_tokens = nltk.word_tokenize(paragraph2)
tagged_el = nltk.pos_tag(paragraph2_tokens)

# Named Entity Chunk
# nltk.ne_chunk(tagged_el).draw() # to get the graph
ne = nltk.ne_chunk(tagged_el)
print(ne)
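
`nltk.ne_chunk` returns a tree in which named entities are labeled subtrees and all other tokens are plain `(word, tag)` tuples. A minimal sketch of pulling the entities out of such a tree follows. To keep it runnable without NLTK or its models, it uses `StandInChunk`, a hypothetical stand-in that mimics only the list behavior and `label()` method of `nltk.Tree`. The `demo` tree is hand-built for illustration and is not actual `ne_chunk` output.

```python
def extract_entities(tree):
    """Collect (label, text) pairs from the top-level subtrees of a chunk tree."""
    entities = []
    for node in tree:
        # Subtrees expose label(); ordinary tokens are (word, tag) tuples
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node)
            entities.append((node.label(), text))
    return entities


class StandInChunk(list):
    """Hypothetical minimal stand-in for nltk.Tree (assumption, not NLTK code)."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


# Hand-built illustration, not actual ne_chunk output
demo = StandInChunk("S", [
    ("by", "IN"),
    StandInChunk("PERSON", [("Gustave", "NNP"), ("Eiffel", "NNP")]),
])
print(extract_entities(demo))  # [('PERSON', 'Gustave Eiffel')]
```

With NLTK installed, the same function should accept the `ne` tree from the cell above, since `nltk.Tree` is a list subclass with a `label()` method.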