### Tokenization


In [None]:
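# word_tokenize relies on NLTK's Punkt tokenizer data; if it is missing, run
# nltk.download('punkt') once (newer NLTK releases ask for 'punkt_tab' instead).
from nltk.tokenize import word_tokenize

text = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"

# Split the sentence into word tokens
words = word_tokenize(text)
words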


['It',
 'originated',
 'from',
 'the',
 'idea',
 'that',
 'there',
 'are',
 'readers',
 'who',
 'prefer',
 'learning',
 'new',
 'skills',
 'from',
 'the',
 'comforts',
 'of',
 'their',
 'drawing',
 'rooms']

### Stemming
##### A stemming algorithm is a computational procedure that reduces all words with the same root… to a common form, usually by stripping each word of its derivational and inflectional suffixes. Stemming normalizes words to a base or root form. For example, celebrates, celebrated, and celebrating all derive from the single root word "celebrate." The main drawback of stemming is that the stem it produces is sometimes not a meaningful word.
##### For example, intelligence, intelligent, and intelligently may all be reduced to a stem such as "intelligen," which is not an English word.
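
To make the suffix-stripping idea concrete, here is a minimal sketch of a toy stemmer. It is not the Porter, Snowball, or Lancaster algorithm; the `SUFFIXES` list, the `toy_stem` name, and the `min_stem` threshold are assumptions chosen only for illustration.

```python
# Toy suffix-stripping stemmer: an illustration only, not a real NLTK algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix list, checked in order


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["celebrates", "celebrated", "celebrating"]:
    print(w, "->", toy_stem(w))
```

All three forms collapse to the same stem, "celebrat", which shows both the grouping effect and the drawback: the shared stem need not be a real word.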

#### Porter Stemmer

In [None]:
from nltk.stem import PorterStemmer
ps = PorterStemmer()
for word in words:
    print(f"Actual: {word}  Stemmed: {ps.stem(word)}")


Actual: It  Stemmed: it
Actual: originated  Stemmed: origin
Actual: from  Stemmed: from
Actual: the  Stemmed: the
Actual: idea  Stemmed: idea
Actual: that  Stemmed: that
Actual: there  Stemmed: there
Actual: are  Stemmed: are
Actual: readers  Stemmed: reader
Actual: who  Stemmed: who
Actual: prefer  Stemmed: prefer
Actual: learning  Stemmed: learn
Actual: new  Stemmed: new
Actual: skills  Stemmed: skill
Actual: from  Stemmed: from
Actual: the  Stemmed: the
Actual: comforts  Stemmed: comfort
Actual: of  Stemmed: of
Actual: their  Stemmed: their
Actual: drawing  Stemmed: draw
Actual: rooms  Stemmed: room


#### Snowball Stemmer

In [None]:
from nltk.stem.snowball import SnowballStemmer
ss = SnowballStemmer(language = 'english')
for word in words:
    print(f"Actual: {word}  Stemmed: {ss.stem(word)}")


Actual: It  Stemmed: it
Actual: originated  Stemmed: origin
Actual: from  Stemmed: from
Actual: the  Stemmed: the
Actual: idea  Stemmed: idea
Actual: that  Stemmed: that
Actual: there  Stemmed: there
Actual: are  Stemmed: are
Actual: readers  Stemmed: reader
Actual: who  Stemmed: who
Actual: prefer  Stemmed: prefer
Actual: learning  Stemmed: learn
Actual: new  Stemmed: new
Actual: skills  Stemmed: skill
Actual: from  Stemmed: from
Actual: the  Stemmed: the
Actual: comforts  Stemmed: comfort
Actual: of  Stemmed: of
Actual: their  Stemmed: their
Actual: drawing  Stemmed: draw
Actual: rooms  Stemmed: room


#### Lancaster Stemmer

In [None]:
from nltk.stem.lancaster import LancasterStemmer
ls = LancasterStemmer()
for word in words:
    print(f"Actual: {word}  Stemmed: {ls.stem(word)}")


Actual: It  Stemmed: it
Actual: originated  Stemmed: origin
Actual: from  Stemmed: from
Actual: the  Stemmed: the
Actual: idea  Stemmed: ide
Actual: that  Stemmed: that
Actual: there  Stemmed: ther
Actual: are  Stemmed: ar
Actual: readers  Stemmed: read
Actual: who  Stemmed: who
Actual: prefer  Stemmed: pref
Actual: learning  Stemmed: learn
Actual: new  Stemmed: new
Actual: skills  Stemmed: skil
Actual: from  Stemmed: from
Actual: the  Stemmed: the
Actual: comforts  Stemmed: comfort
Actual: of  Stemmed: of
Actual: their  Stemmed: their
Actual: drawing  Stemmed: draw
Actual: rooms  Stemmed: room


#### Regexp Stemmer

In [None]:
from nltk.stem import RegexpStemmer
# Anchor the pattern with $ so that only a trailing "ing" is removed
rs = RegexpStemmer('ing$')
for word in words:
    print(f"Actual: {word}  Stemmed: {rs.stem(word)}")

Actual: It  Stemmed: It
Actual: originated  Stemmed: originated
Actual: from  Stemmed: from
Actual: the  Stemmed: the
Actual: idea  Stemmed: idea
Actual: that  Stemmed: that
Actual: there  Stemmed: there
Actual: are  Stemmed: are
Actual: readers  Stemmed: readers
Actual: who  Stemmed: who
Actual: prefer  Stemmed: prefer
Actual: learning  Stemmed: learn
Actual: new  Stemmed: new
Actual: skills  Stemmed: skills
Actual: from  Stemmed: from
Actual: the  Stemmed: the
Actual: comforts  Stemmed: comforts
Actual: of  Stemmed: of
Actual: their  Stemmed: their
Actual: drawing  Stemmed: draw
Actual: rooms  Stemmed: rooms
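
One thing worth checking with a regular-expression stemmer is whether the pattern is anchored. As a rough sketch using only Python's standard `re` module, and assuming the stemmer simply deletes every match of its pattern, an unanchored `ing` also removes those letters from the middle of a word, while `ing$` touches only a trailing suffix. The `strip_pattern` helper is a hypothetical stand-in, not part of NLTK.

```python
import re


def strip_pattern(pattern, word):
    """Delete every match of `pattern` from `word` (hypothetical helper)."""
    return re.sub(pattern, "", word)


for w in ["learning", "kingdom", "singing"]:
    unanchored = strip_pattern("ing", w)
    anchored = strip_pattern("ing$", w)
    print(f"{w}: 'ing' gives {unanchored!r}, 'ing$' gives {anchored!r}")
```

The sentence used in this notebook happens to contain "ing" only at the end of words, so both patterns give the same result there.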


### Lemmatization



##### Lemmatization is quite similar to stemming. It groups the different inflected forms of a word under a single base form, called the lemma. The main difference between stemming and lemmatization is that lemmatization produces a root word that is itself a meaningful dictionary word.
##### For example, where a stemmer may cut intelligence, intelligent, and intelligently down to a non-word, a lemmatizer returns a real word such as intelligent.


In [None]:
from nltk.stem import WordNetLemmatizer
wl = WordNetLemmatizer()

for word in words:
    # lemmatize() treats each word as a noun unless a pos argument is given
    print(f"Actual: {word}  Lemmatized: {wl.lemmatize(word)}")

Actual: It  Lemmatized: It
Actual: originated  Lemmatized: originated
Actual: from  Lemmatized: from
Actual: the  Lemmatized: the
Actual: idea  Lemmatized: idea
Actual: that  Lemmatized: that
Actual: there  Lemmatized: there
Actual: are  Lemmatized: are
Actual: readers  Lemmatized: reader
Actual: who  Lemmatized: who
Actual: prefer  Lemmatized: prefer
Actual: learning  Lemmatized: learning
Actual: new  Lemmatized: new
Actual: skills  Lemmatized: skill
Actual: from  Lemmatized: from
Actual: the  Lemmatized: the
Actual: comforts  Lemmatized: comfort
Actual: of  Lemmatized: of
Actual: their  Lemmatized: their
Actual: drawing  Lemmatized: drawing
Actual: rooms  Lemmatized: room
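
In the output above, learning and drawing come back unchanged. A likely reason is that `WordNetLemmatizer.lemmatize` treats a word as a noun unless a `pos` argument is passed. The sketch below shows one common way to translate Penn Treebank-style tags (the kind `nltk.pos_tag` returns) into WordNet's single-letter codes. The `penn_to_wordnet` helper and its noun fallback are assumptions, not part of NLTK.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code (hypothetical helper)."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # assumption: fall back to noun for everything else


for word, tag in [("learning", "VBG"), ("new", "JJ"), ("skills", "NNS")]:
    print(word, tag, "->", penn_to_wordnet(tag))

# With NLTK and its WordNet data installed, the code could then be passed on:
#   wl.lemmatize(word, pos=penn_to_wordnet(tag))
```

Passing the mapped code lets the lemmatizer look a word up as a verb or adjective instead of always as a noun.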


### Parts of Speech (POS) Tags
##### POS stands for part of speech; the parts of speech include noun, verb, adverb, and adjective. A POS tag indicates how a word functions within a sentence, both in meaning and grammatically. A word can have more than one part of speech, depending on the context in which it is used.
##### Example: "Google" something on the Internet.
##### In the above example, Google is used as a verb, although it is ordinarily a proper noun.


In [None]:
from nltk import pos_tag

tags = pos_tag(words)
tags

[('It', 'PRP'),
 ('originated', 'VBD'),
 ('from', 'IN'),
 ('the', 'DT'),
 ('idea', 'NN'),
 ('that', 'IN'),
 ('there', 'EX'),
 ('are', 'VBP'),
 ('readers', 'NNS'),
 ('who', 'WP'),
 ('prefer', 'VBP'),
 ('learning', 'VBG'),
 ('new', 'JJ'),
 ('skills', 'NNS'),
 ('from', 'IN'),
 ('the', 'DT'),
 ('comforts', 'NNS'),
 ('of', 'IN'),
 ('their', 'PRP$'),
 ('drawing', 'NN'),
 ('rooms', 'NNS')]
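
The tags in this output follow the Penn Treebank tag set. As a small sketch, the snippet below copies the tagged pairs from the output above, counts how often each tag occurs, and prints a short gloss for it. The `TAG_GLOSS` dictionary is a hand-written assumption covering only the tags seen here.

```python
from collections import Counter

# Tagged pairs copied from the pos_tag output above
tags = [
    ("It", "PRP"), ("originated", "VBD"), ("from", "IN"), ("the", "DT"),
    ("idea", "NN"), ("that", "IN"), ("there", "EX"), ("are", "VBP"),
    ("readers", "NNS"), ("who", "WP"), ("prefer", "VBP"), ("learning", "VBG"),
    ("new", "JJ"), ("skills", "NNS"), ("from", "IN"), ("the", "DT"),
    ("comforts", "NNS"), ("of", "IN"), ("their", "PRP$"), ("drawing", "NN"),
    ("rooms", "NNS"),
]

# Hand-written glosses for the tags that occur in this sentence
TAG_GLOSS = {
    "PRP": "personal pronoun",
    "PRP$": "possessive pronoun",
    "VBD": "verb, past tense",
    "VBP": "verb, non-3rd person singular present",
    "VBG": "verb, gerund or present participle",
    "IN": "preposition or subordinating conjunction",
    "DT": "determiner",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "EX": "existential there",
    "WP": "wh-pronoun",
    "JJ": "adjective",
}

# Count how many tokens received each tag, most frequent first
counts = Counter(tag for _, tag in tags)
for tag, n in counts.most_common():
    print(f"{tag:5} {n}  {TAG_GLOSS.get(tag, 'unknown')}")
```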