# Parts of Speech Tagging

Source: https://towardsdatascience.com/a-practitioners-guide-to-natural-language-processing-part-i-processing-understanding-text-9f4abfd13e72

For any language, syntax and structure usually go hand in hand: a set of specific rules, conventions, and principles governs the way words are combined into phrases, phrases are combined into clauses, and clauses are combined into sentences.

Knowledge about the structure and syntax of language is helpful in many areas like text processing, annotation, and parsing for further operations such as text classification or summarization.

__Parts of speech (POS)__ are specific lexical categories to which words are assigned, based on their syntactic context and role. Usually, words can fall into one of the following major categories.

+ __N(oun)__: This usually denotes words that depict some object or entity, which may be living or nonliving. Some examples would be _fox_, _dog_, _book_, and so on. The POS tag symbol for nouns is N.

+ __V(erb)__: Verbs are words that are used to describe certain actions, states, or occurrences. There are a wide variety of further subcategories, such as auxiliary, reflexive, and transitive verbs (and many more). Some typical examples of verbs would be _running_, _jumping_, _read_, and _write_. The POS tag symbol for verbs is V.

+ __Adj(ective)__: Adjectives are words used to describe or qualify other words, typically nouns and noun phrases. The phrase _beautiful flower_ has the noun (N) _flower_, which is described or qualified using the adjective (ADJ) _beautiful_. The POS tag symbol for adjectives is ADJ.

+ __Adv(erb)__: Adverbs usually act as modifiers for other words including nouns, adjectives, verbs, or other adverbs. The phrase _very beautiful flower_ has the adverb (ADV) _very_, which modifies the adjective (ADJ) _beautiful_, indicating the degree to which the flower is beautiful. The POS tag symbol for adverbs is ADV.

Besides these four major categories of parts of speech, there are other categories that occur frequently in the English language. These include pronouns, prepositions, interjections, conjunctions, determiners, and many others. Furthermore, each POS tag like the noun (N) can be further subdivided into categories like __singular nouns (NN)__, __singular proper nouns (NNP)__, and __plural nouns (NNS)__.
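The relationship between fine-grained tags and the four major categories can be sketched in a few lines. This is only an illustration: `COARSE_PREFIXES` and `coarse_category` are hypothetical names, and the sketch assumes the Penn Treebank convention (listed in the guide at the end of this notebook) that noun tags start with `NN`, verb tags with `VB`, adjective tags with `JJ`, and adverb tags with `RB`.

```python
# Hypothetical helper: collapse a fine-grained Penn Treebank tag into one of
# the four major categories described above, using the first two characters.
COARSE_PREFIXES = {
    'NN': 'N',    # NN, NNS, NNP, NNPS
    'VB': 'V',    # VB, VBD, VBG, VBN, VBP, VBZ
    'JJ': 'ADJ',  # JJ, JJR, JJS
    'RB': 'ADV',  # RB, RBR, RBS
}


def coarse_category(penn_tag):
    """Return N, V, ADJ, or ADV for a Penn tag, or 'OTHER' if none applies."""
    return COARSE_PREFIXES.get(penn_tag[:2], 'OTHER')


for tag in ['NN', 'NNP', 'NNS', 'VBZ', 'JJR', 'RB', 'DT']:
    print(tag, '->', coarse_category(tag))
```

Tags outside the four major categories, such as `DT` (determiner), fall through to `'OTHER'` in this sketch.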

The process of classifying words and labeling them with POS tags is called parts of speech tagging, or POS tagging.

In [5]:
sentence = 'This NLP Workshop is being organized by Analytics India Magazine as part of the Plugin Conference 2020'
sentence

'This NLP Workshop is being organized by Analytics India Magazine as part of the Plugin Conference 2020'

In [3]:
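import nltk
# Tokenizer and tagger models used below. Recent NLTK releases may instead
# ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng'.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')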

[nltk_data] Downloading package punkt to C:\Users\admin/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\admin/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [6]:
nltk_pos_tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
nltk_pos_tagged

[('This', 'DT'),
 ('NLP', 'NNP'),
 ('Workshop', 'NNP'),
 ('is', 'VBZ'),
 ('being', 'VBG'),
 ('organized', 'VBN'),
 ('by', 'IN'),
 ('Analytics', 'NNP'),
 ('India', 'NNP'),
 ('Magazine', 'NNP'),
 ('as', 'IN'),
 ('part', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('Plugin', 'NNP'),
 ('Conference', 'NNP'),
 ('2020', 'CD')]

In [7]:
import pandas as pd

pd.DataFrame(nltk_pos_tagged, 
             columns=['Word', 'POS tag']).T

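|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Word | This | NLP | Workshop | is | being | organized | by | Analytics | India | Magazine | as | part | of | the | Plugin | Conference | 2020 |
| POS tag | DT | NNP | NNP | VBZ | VBG | VBN | IN | NNP | NNP | NNP | IN | NN | IN | DT | NNP | NNP | CD |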
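One way to put these tags to use is to join consecutive proper nouns into multi-word names. The following is a minimal sketch, not part of NLTK: `proper_noun_runs` is a hypothetical helper, and the tagged list is copied by hand from the `nltk.pos_tag` output above so the snippet runs without downloading any models.

```python
from itertools import groupby

# Tagged tokens copied from the nltk.pos_tag output above.
tagged = [
    ('This', 'DT'), ('NLP', 'NNP'), ('Workshop', 'NNP'), ('is', 'VBZ'),
    ('being', 'VBG'), ('organized', 'VBN'), ('by', 'IN'),
    ('Analytics', 'NNP'), ('India', 'NNP'), ('Magazine', 'NNP'),
    ('as', 'IN'), ('part', 'NN'), ('of', 'IN'), ('the', 'DT'),
    ('Plugin', 'NNP'), ('Conference', 'NNP'), ('2020', 'CD'),
]


def proper_noun_runs(tagged_tokens):
    """Join each run of consecutive NNP/NNPS tokens into one string."""
    runs = []
    is_proper = lambda pair: pair[1] in ('NNP', 'NNPS')
    for proper, group in groupby(tagged_tokens, key=is_proper):
        if proper:
            runs.append(' '.join(word for word, _ in group))
    return runs


print(proper_noun_runs(tagged))
# ['NLP Workshop', 'Analytics India Magazine', 'Plugin Conference']
```

This is a rough heuristic only. It relies entirely on the tagger marking every word of a name as a proper noun, so it is not a substitute for a real named entity recognizer.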


In [0]:
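import spacy

# The 'en' shortcut was removed in spaCy 3; load the model by its full name
# (install it first with: python -m spacy download en_core_web_sm).
nlp = spacy.load('en_core_web_sm')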

In [9]:
sentence_nlp = nlp(sentence)
spacy_pos_tagged = [(word, word.tag_, word.pos_) for word in sentence_nlp]
spacy_pos_tagged

[(This, 'DT', 'DET'),
 (NLP, 'NNP', 'PROPN'),
 (Workshop, 'NNP', 'PROPN'),
 (is, 'VBZ', 'AUX'),
 (being, 'VBG', 'AUX'),
 (organized, 'VBN', 'VERB'),
 (by, 'IN', 'ADP'),
 (Analytics, 'NNP', 'PROPN'),
 (India, 'NNP', 'PROPN'),
 (Magazine, 'NNP', 'PROPN'),
 (as, 'IN', 'SCONJ'),
 (part, 'NN', 'NOUN'),
 (of, 'IN', 'ADP'),
 (the, 'DT', 'DET'),
 (Plugin, 'NNP', 'PROPN'),
 (Conference, 'NNP', 'PROPN'),
 (2020, 'CD', 'NUM')]

In [10]:
pd.DataFrame(spacy_pos_tagged, 
             columns=['Word', 'POS tag', 'Tag type']).T

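|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Word | This | NLP | Workshop | is | being | organized | by | Analytics | India | Magazine | as | part | of | the | Plugin | Conference | 2020 |
| POS tag | DT | NNP | NNP | VBZ | VBG | VBN | IN | NNP | NNP | NNP | IN | NN | IN | DT | NNP | NNP | CD |
| Tag type | DET | PROPN | PROPN | AUX | AUX | VERB | ADP | PROPN | PROPN | PROPN | SCONJ | NOUN | ADP | DET | PROPN | PROPN | NUM |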
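In spaCy, `tag_` holds the fine-grained tag and `pos_` the coarse-grained universal POS tag. The sketch below groups the coarse tags seen for each fine-grained tag in this one sentence. The triples are copied by hand from the spaCy output above (so the snippet runs without a model), and `coarse_by_fine` is just an illustrative name.

```python
from collections import defaultdict

# (text, fine-grained tag, coarse tag) triples copied from the spaCy output above.
spacy_tags = [
    ('This', 'DT', 'DET'), ('NLP', 'NNP', 'PROPN'), ('Workshop', 'NNP', 'PROPN'),
    ('is', 'VBZ', 'AUX'), ('being', 'VBG', 'AUX'), ('organized', 'VBN', 'VERB'),
    ('by', 'IN', 'ADP'), ('Analytics', 'NNP', 'PROPN'), ('India', 'NNP', 'PROPN'),
    ('Magazine', 'NNP', 'PROPN'), ('as', 'IN', 'SCONJ'), ('part', 'NN', 'NOUN'),
    ('of', 'IN', 'ADP'), ('the', 'DT', 'DET'), ('Plugin', 'NNP', 'PROPN'),
    ('Conference', 'NNP', 'PROPN'), ('2020', 'CD', 'NUM'),
]

# Collect every coarse tag observed for each fine-grained tag.
coarse_by_fine = defaultdict(set)
for _, fine, coarse in spacy_tags:
    coarse_by_fine[fine].add(coarse)

for fine, coarse in sorted(coarse_by_fine.items()):
    print(fine, '->', sorted(coarse))
```

In this sentence the fine-grained tag `IN` appears with two different coarse tags (`ADP` and `SCONJ`), and the verb forms `VBZ` and `VBG` are both labeled `AUX`, so the two columns are not simply a one-to-one renaming of each other.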


## Guide to POS Tags

The most commonly used part-of-speech (POS) tag set for English is the one developed for the Penn Treebank.

| POS Tag | Description | Example |
|---------|---------------------------------------|-----------------------------------------|
| CC | coordinating conjunction | and |
| CD | cardinal number | 1, three |
| DT | determiner | the |
| EX | existential there | there is |
| FW | foreign word | d'oeuvre |
| IN | preposition/subordinating conjunction | in, of, like |
| JJ | adjective | big |
| JJR | adjective, comparative | bigger |
| JJS | adjective, superlative | biggest |
| LS | list marker | 1) |
| MD | modal | could, will |
| NN | noun, singular or mass | door |
| NNS | noun plural | doors |
| NNP | proper noun, singular | John |
| NNPS | proper noun, plural | Vikings |
| PDT | predeterminer | both the boys |
| POS | possessive ending | friend's |
| PRP | personal pronoun | I, he, it |
| PRP\$ | possessive pronoun | my, his |
| RB | adverb | however, usually, naturally, here, good |
| RBR | adverb, comparative | better |
| RBS | adverb, superlative | best |
| RP | particle | give up |
| TO | to | to go, to him |
| UH | interjection | uhhuhhuhh |
| VB | verb, base form | take |
| VBD | verb, past tense | took |
| VBG | verb, gerund/present participle | taking |
| VBN | verb, past participle | taken |
| VBP | verb, non-3rd person sing. present | take |
| VBZ | verb, 3rd person sing. present | takes |
| WDT | wh-determiner | which |
| WP | wh-pronoun | who, what |
| WP\$ | possessive wh-pronoun | whose |
| WRB | wh-adverb | where, when |

Source: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
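The table can also be used programmatically to make tagger output easier to read. Here is a minimal sketch, assuming a hand-copied subset of the descriptions above. `PENN_TAG_DESCRIPTIONS` and `describe` are hypothetical names, not part of NLTK or spaCy.

```python
# Descriptions copied from a subset of the table above; extend as needed.
PENN_TAG_DESCRIPTIONS = {
    'CD': 'cardinal number',
    'DT': 'determiner',
    'IN': 'preposition/subordinating conjunction',
    'NN': 'noun, singular or mass',
    'NNP': 'proper noun, singular',
    'VBG': 'verb, gerund/present participle',
    'VBN': 'verb, past participle',
    'VBZ': 'verb, 3rd person sing. present',
}


def describe(tagged_tokens):
    """Attach a human-readable description to each (word, tag) pair."""
    return [
        (word, tag, PENN_TAG_DESCRIPTIONS.get(tag, 'unknown tag'))
        for word, tag in tagged_tokens
    ]


sample = [('Workshop', 'NNP'), ('is', 'VBZ'), ('organized', 'VBN'), ('2020', 'CD')]
for row in describe(sample):
    print(row)
```

Tags missing from the dictionary are reported as `'unknown tag'` rather than raising an error, which keeps the helper usable while the mapping is still incomplete.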