# Introduction to Natural Language Processing with NLTK

## Tokenization

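```python
import nltk
from nltk import sent_tokenize, word_tokenize

# sent_tokenize (and word_tokenize, which calls it internally) needs the
# pre-trained Punkt model; recent NLTK releases load it as 'punkt_tab'.
nltk.download('punkt')
nltk.download('punkt_tab')
```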

### 01. Sentence Tokenization

```python
text = """Simply, Natural Language Processing (NLP) helps computers (machines) to "read and understand" text or speech, by simulating the human's ability to understand languages. NLP is a sub-field of Artificial Intelligence, which also comprises of computation linguistics, computer science and statistics. Even though NLP is not comparatively as popular as Machine Learning, Deep Learning etc. it is as important and useful as them."""
print(text)
```

```
Simply, Natural Language Processing (NLP) helps computers (machines) to "read and understand" text or speech, by simulating the human's ability to understand languages. NLP is a sub-field of Artificial Intelligence, which also comprises of computation linguistics, computer science and statistics. Even though NLP is not comparatively as popular as Machine Learning, Deep Learning etc. it is as important and useful as them.
```


```python
sentence_tokens = sent_tokenize(text)
print(sentence_tokens)
```

```
['Simply, Natural Language Processing (NLP) helps computers (machines) to "read and understand" text or speech, by simulating the human\'s ability to understand languages.', 'NLP is a sub-field of Artificial Intelligence, which also comprises of computation linguistics, computer science and statistics.', 'Even though NLP is not comparatively as popular as Machine Learning, Deep Learning etc.', 'it is as important and useful as them.']
```


```python
for sentence in sentence_tokens:
    print(sentence)
```

```
Simply, Natural Language Processing (NLP) helps computers (machines) to "read and understand" text or speech, by simulating the human's ability to understand languages.
NLP is a sub-field of Artificial Intelligence, which also comprises of computation linguistics, computer science and statistics.
Even though NLP is not comparatively as popular as Machine Learning, Deep Learning etc.
it is as important and useful as them.
```
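Splitting on every period is not enough, because periods also end abbreviations. NLTK's `sent_tokenize` relies on a pre-trained Punkt model, which learns abbreviations and likely sentence starters from unannotated text; even so, the output above ends a sentence at "etc.", in the middle of what the author wrote as one sentence. The function `naive_sent_split` below is a hypothetical regex baseline (not part of NLTK) that splits after any `.`, `!` or `?` followed by whitespace, which makes the abbreviation problem easy to see:

```python
import re


def naive_sent_split(text):
    """Hypothetical baseline: split after '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


sample = "Dr. Smith studies NLP. He uses Python, NLTK, etc. every day."
print(naive_sent_split(sample))
```

The baseline breaks after "Dr." and "etc." as well as after the real sentence end, which is exactly the kind of decision Punkt's learned abbreviation list is meant to improve.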


### 02. Word Tokenization

```python
sentence = """NLP is a sub-field of Artificial Intelligence, which also comprises of computation linguistics, computer science and statistics."""
print(sentence)
```

```
NLP is a sub-field of Artificial Intelligence, which also comprises of computation linguistics, computer science and statistics.
```


```python
word_tokens = word_tokenize(sentence)
print(word_tokens)
```

```
['NLP', 'is', 'a', 'sub-field', 'of', 'Artificial', 'Intelligence', ',', 'which', 'also', 'comprises', 'of', 'computation', 'linguistics', ',', 'computer', 'science', 'and', 'statistics', '.']
```
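Note that `word_tokenize` separates punctuation from words but keeps "sub-field" whole. A plain `str.split()` would leave the comma attached ("Intelligence,"). The pattern below is a rough, hypothetical approximation (`TOKEN_RE` and `regex_tokenize` are illustrative names, not NLTK API): a word with optional hyphenated parts, or a single punctuation character. It reproduces the output above for this sentence, but it is not what `word_tokenize` does internally, which applies Treebank-style rules:

```python
import re

# Hypothetical approximation: a (possibly hyphenated) word, or one punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def regex_tokenize(text):
    """Return word and punctuation tokens found by TOKEN_RE."""
    return TOKEN_RE.findall(text)


sentence = "NLP is a sub-field of Artificial Intelligence, which also comprises of computation linguistics, computer science and statistics."
print(sentence.split()[6])       # whitespace split keeps the comma attached
print(regex_tokenize(sentence))  # punctuation becomes separate tokens
```

The approximation breaks down on contractions: `regex_tokenize("don't")` yields `['don', "'", 't']`, whereas `word_tokenize` follows the Treebank convention and returns `['do', "n't"]`.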


## Stemming and Lemmatization

### 01. Stemming

```python
from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer
```
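The cell above imports `SnowballStemmer` and `LancasterStemmer` as well, although only `PorterStemmer` is used in this notebook. A short sketch comparing all three on the words used later in the notebook; the English Snowball stemmer is a revision of Porter's algorithm (often called Porter2), and Lancaster is generally regarded as the most aggressive of the three. None of them needs a corpus download:

```python
from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer

stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),  # Snowball requires a language name
    "lancaster": LancasterStemmer(),
}

words = ["stones", "speaking", "bedroom", "jokes", "lisa", "purple"]

# Map each stemmer name to the list of stems it produces for `words`.
results = {name: [s.stem(w) for w in words] for name, s in stemmers.items()}

for name, stems in results.items():
    print(f"{name:<10}", stems)
```

Comparing the rows side by side is a quick way to choose how much conflation a task can tolerate before distinct words start collapsing into the same stem.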

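```python
# Porter only strips suffixes by rule, so the irregular past tense 'thought' is left as-is.
print(PorterStemmer().stem('thought'))
```

```
thought
```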
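The stemmer returns "thought" unchanged because stemming is purely rule-based suffix stripping: it has no dictionary, so nothing tells it that "thought" is a form of "think". The toy function below is a hypothetical illustration of that idea (`toy_stem` and its three-suffix list are invented for this sketch and are far simpler than Porter's multi-step rules):

```python
# Hypothetical suffix list, checked in order; Porter uses many more rules and conditions.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix if at least `min_stem` characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["speaking", "jokes", "stones", "thought", "is"]:
    print(w, "->", toy_stem(w))
```

The minimum-length check keeps very short words such as "is" intact; real stemmers use more refined conditions (Porter's "measure" of a word) for the same purpose.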


### 02. Lemmatization

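```python
import nltk
from nltk.stem import WordNetLemmatizer

# WordNetLemmatizer looks words up in the WordNet corpus, which must be downloaded once
# (some NLTK versions also ask for 'omw-1.4').
nltk.download('wordnet')
```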

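```python
lemmatizer = WordNetLemmatizer()
# pos="v" tells the lemmatizer to treat the word as a verb,
# so the irregular past tense is mapped to its dictionary form.
print(lemmatizer.lemmatize('thought', pos="v"))
```

```
think
```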
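Unlike the stemmer, the lemmatizer returned "think", because it consults a lexicon keyed by part of speech. NLTK's `WordNetLemmatizer` delegates to WordNet's morphological processing (exception lists plus suffix-detachment rules); the sketch below reduces that to a hypothetical hand-written lookup table (`LEXICON` and `toy_lemmatize` are illustrative names), just to show why the `pos` argument matters:

```python
# Hypothetical miniature lexicon keyed by (word form, part of speech).
LEXICON = {
    ("thought", "v"): "think",     # irregular verb form
    ("thought", "n"): "thought",   # the noun is already its own lemma
    ("stones", "n"): "stone",
    ("speaking", "v"): "speak",
}


def toy_lemmatize(word, pos="n"):
    """Return the lemma for (word, pos), or the word itself if it is unknown."""
    return LEXICON.get((word, pos), word)


print(toy_lemmatize("thought", pos="v"))
print(toy_lemmatize("thought"))
```

Defaulting `pos` to `"n"` mirrors `WordNetLemmatizer.lemmatize`, whose `pos` parameter also defaults to noun.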


### 03. Stemming vs Lemmatization 

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
words = ['stones', 'speaking', 'bedroom', 'jokes', 'lisa', 'purple']

# Stemming: rule-based suffix stripping; the result need not be a real word ('purpl').
for word in words:
    print(stemmer.stem(word))

print('----------------------')

# Lemmatization: WordNet lookup; without a pos argument every word is treated
# as a noun, so 'speaking' comes back unchanged.
for word in words:
    print(lemmatizer.lemmatize(word))
```

```
stone
speak
bedroom
joke
lisa
purpl
----------------------
stone
speaking
bedroom
joke
lisa
purple
```
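The lemmatizer left "speaking" unchanged because `lemmatize` treats every word as a noun unless `pos` is given, and "speaking" is a valid noun. A common approach for running text is to tag words first (for example with `nltk.pos_tag`, which returns Penn Treebank tags) and map each tag to a WordNet POS letter. The helper `treebank_to_wordnet` below is a hypothetical sketch of that mapping, and the tags in `tagged` are written by hand rather than produced by a tagger:

```python
def treebank_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter.

    Hypothetical helper; anything unrecognised falls back to noun,
    matching the default of WordNetLemmatizer.lemmatize().
    """
    if tag.startswith("JJ"):
        return "a"  # adjective: JJ, JJR, JJS
    if tag.startswith("VB"):
        return "v"  # verb: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"  # adverb: RB, RBR, RBS
    return "n"      # nouns and all other tags


# Hand-written (word, tag) pairs standing in for a tagger's output.
tagged = [("speaking", "VBG"), ("stones", "NNS"), ("purple", "JJ"), ("quickly", "RB")]
print([(word, treebank_to_wordnet(tag)) for word, tag in tagged])
```

With the verb tag mapped to `"v"`, `lemmatizer.lemmatize('speaking', pos='v')` should return "speak", matching the stemmer's output for this word.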
