# Natural Language Processing for Beginners: Using TextBlob

Source: https://www.analyticsvidhya.com/blog/2018/02/natural-language-processing-for-beginners-using-textblob/

SHUBHAM JAIN, FEBRUARY 11, 2018

## Introduction

Natural Language Processing (NLP) is attracting growing attention because of an increasing number of applications such as chatbots and machine translation. In some ways, the entire revolution of intelligent machines is based on their ability to understand and interact with humans.

## About TextBlob

TextBlob is a Python library that offers a simple API for performing basic NLP tasks. It relies on several NLTK corpora and models, which can be downloaded with the following command:

In [1]:
!python -m textblob.download_corpora

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to /root/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


## NLP Tasks Using TextBlob

### Tokenization

In [3]:
from textblob import TextBlob

text = "Hi... My name is Agus Richard Lubis. I love Sekardayu Hana Pradiani. No, I am just kidding though! \n" + \
        "I found a love for me. Darling, just dive right in. Follow my lead."
blob = TextBlob(text)
print(blob)

Hi... My name is Agus Richard Lubis. I love Sekardayu Hana Pradiani. No, I am just kidding though! 
I found a love for me. Darling, just dive right in. Follow my lead.


In [9]:
for sentence in blob.sentences:
    print(sentence)

Hi... My name is Agus Richard Lubis.
I love Sekardayu Hana Pradiani.
No, I am just kidding though!
I found a love for me.
Darling, just dive right in.
Follow my lead.


In [11]:
for sentence in blob.sentences:
    print(sentence.split())  # splits on whitespace only, so punctuation stays attached

['Hi...', 'My', 'name', 'is', 'Agus', 'Richard', 'Lubis.']
['I', 'love', 'Sekardayu', 'Hana', 'Pradiani.']
['No,', 'I', 'am', 'just', 'kidding', 'though!']
['I', 'found', 'a', 'love', 'for', 'me.']
['Darling,', 'just', 'dive', 'right', 'in.']
['Follow', 'my', 'lead.']
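`split()` breaks only on whitespace, so punctuation stays glued to words ('Hi...', 'Lubis.'). TextBlob's `sentence.words` property returns the tokens with punctuation removed. Below is a rough standard-library sketch of that behavior. The regular expression is an assumption and handles ASCII words only, unlike the NLTK tokenizer that TextBlob uses:

```python
import re

def tokenize_words(sentence):
    """Return word tokens, dropping punctuation (ASCII letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", sentence)

sentence = "Hi... My name is Agus Richard Lubis."
print(sentence.split())          # ['Hi...', 'My', 'name', 'is', 'Agus', 'Richard', 'Lubis.']
print(tokenize_words(sentence))  # ['Hi', 'My', 'name', 'is', 'Agus', 'Richard', 'Lubis']
```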


### Noun Phrase Extraction

In [12]:
for np in blob.noun_phrases:
    print(np)

hi
agus richard lubis
sekardayu hana pradiani
darling
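Noun-phrase extraction works on top of part-of-speech tags: runs of adjectives and nouns are grouped into one phrase. TextBlob's default extractor (`FastNPExtractor`) uses its own chunking rules. The simplified chunker sketched here is therefore an illustration of the idea rather than a reimplementation, and its output differs; for example, it keeps single common nouns such as 'name':

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def chunk_noun_phrases(tagged):
    """Group maximal runs of adjective/noun tags that end in a noun."""
    phrases, run = [], []
    for word, tag in list(tagged) + [("", "END")]:  # sentinel flushes the last run
        if tag in NOUN_TAGS or tag == "JJ":
            run.append((word, tag))
            continue
        if run and run[-1][1] in NOUN_TAGS:  # a phrase must end in a noun
            phrases.append(" ".join(w.lower() for w, _ in run))
        run = []
    return phrases

tagged = [("My", "PRP$"), ("name", "NN"), ("is", "VBZ"),
          ("Agus", "NNP"), ("Richard", "NNP"), ("Lubis", "NNP")]
print(chunk_noun_phrases(tagged))  # ['name', 'agus richard lubis']
```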


### Part-of-Speech Tagging

In [14]:
for word, tag in blob.tags:
    print(word, tag)

Hi NN
My PRP$
name NN
is VBZ
Agus NNP
Richard NNP
Lubis NNP
I PRP
love VBP
Sekardayu NNP
Hana NNP
Pradiani NNP
No DT
I PRP
am VBP
just RB
kidding VBG
though IN
I PRP
found VBD
a DT
love NN
for IN
me PRP
Darling NNP
just RB
dive JJ
right NN
in IN
Follow VB
my PRP$
lead NN
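Each token receives a Penn Treebank tag (NN noun, VBP present-tense verb, PRP pronoun, and so on). The assignment is a statistical guess: 'dive' is tagged JJ (adjective) above although it is a verb in that sentence. The simplest data-driven baseline gives each word the tag it received most often in tagged text and falls back to NN for unseen words. It is sketched below with a few made-up training sentences. TextBlob's default tagger is considerably more accurate because it also looks at the surrounding context.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Map each lowercased word to the tag it received most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default="NN"):
    """Tag each word by lookup; unseen words fall back to `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]

# Made-up training data, for illustration only
training = [
    [("I", "PRP"), ("love", "VBP"), ("it", "PRP")],
    [("a", "DT"), ("love", "NN"), ("song", "NN")],
    [("we", "PRP"), ("love", "VBP"), ("you", "PRP")],
]
model = train_unigram_tagger(training)
print(tag(["I", "love", "pizza"], model))  # [('I', 'PRP'), ('love', 'VBP'), ('pizza', 'NN')]
```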


### Word Inflection and Lemmatization

In [17]:
for sentence in blob.sentences:
    print(sentence.words[1])
    print(sentence.words[1].singularize())

My
My
love
love
I
I
found
found
just
just
my
my


In [19]:
from textblob import Word
w = Word('Mouse')
w.pluralize()   # the correct plural is 'Mice'

'Mouses'

In [22]:
for word, pos in blob.tags:
    if pos == 'NN':
        print(word.pluralize())

His
names
love
rights
leads
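Rule-based inflection can produce results like 'Mouses' for 'Mouse' and 'His' for the greeting 'Hi' (tagged NN), both shown above. The toy sketch below shows the general approach. It uses a deliberately tiny, hypothetical exception table and three suffix rules, and it checks irregular forms before falling back to the rules:

```python
import re

# Hypothetical, deliberately tiny exception table
IRREGULAR = {"mouse": "mice", "child": "children", "person": "people"}
# Suffix rules, tried in order
RULES = [
    (r"(s|x|z|ch|sh)$", r"\1es"),  # box -> boxes
    (r"([^aeiou])y$", r"\1ies"),   # party -> parties
    (r"$", "s"),                   # default: name -> names
]

def pluralize(noun):
    """Irregular forms first, then the first matching suffix rule."""
    lower = noun.lower()
    if lower in IRREGULAR:
        plural = IRREGULAR[lower]
        return plural.capitalize() if noun[0].isupper() else plural
    for pattern, replacement in RULES:
        if re.search(pattern, noun):
            return re.sub(pattern, replacement, noun, count=1)
    return noun

print(pluralize("Mouse"), pluralize("box"), pluralize("party"))  # Mice boxes parties
```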


In [25]:
w = Word('loving')
w.lemmatize('v')        # 'v' is the WordNet part-of-speech code for verbs

'love'
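TextBlob's `Word.lemmatize()` delegates to NLTK's WordNet lemmatizer. Roughly, that lemmatizer strips part-of-speech-specific suffixes, keeps a candidate only if it is a dictionary word, and consults exception lists for irregular forms. Below is a toy version of that idea for verbs. The six-word lexicon and the exception list are hypothetical stand-ins for WordNet:

```python
# Hypothetical mini-lexicon and exception list; a real lemmatizer consults WordNet
LEXICON = {"love", "dive", "follow", "find", "try", "lead"}
EXCEPTIONS = {"found": "find", "led": "lead"}
# (suffix, replacement) pairs for verbs, tried in order
VERB_RULES = [("ies", "y"), ("es", "e"), ("es", ""), ("ed", "e"), ("ed", ""),
              ("ing", "e"), ("ing", ""), ("s", "")]

def lemmatize_verb(word):
    word = word.lower()
    if word in EXCEPTIONS:           # irregular forms first
        return EXCEPTIONS[word]
    if word in LEXICON:              # already a base form
        return word
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON:  # accept only dictionary words
                return candidate
    return word                       # unknown word: return it unchanged

print(lemmatize_verb("loving"))  # love
```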

### N-grams

In [26]:
for ngram in blob.ngrams(2):
    print(ngram)

['Hi', 'My']
['My', 'name']
['name', 'is']
['is', 'Agus']
['Agus', 'Richard']
['Richard', 'Lubis']
['Lubis', 'I']
['I', 'love']
['love', 'Sekardayu']
['Sekardayu', 'Hana']
['Hana', 'Pradiani']
['Pradiani', 'No']
['No', 'I']
['I', 'am']
['am', 'just']
['just', 'kidding']
['kidding', 'though']
['though', 'I']
['I', 'found']
['found', 'a']
['a', 'love']
['love', 'for']
['for', 'me']
['me', 'Darling']
['Darling', 'just']
['just', 'dive']
['dive', 'right']
['right', 'in']
['in', 'Follow']
['Follow', 'my']
['my', 'lead']
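An n-gram is a sliding window of n consecutive tokens. `blob.ngrams(n)` slides over the blob's whole word list, which is why pairs such as ['Lubis', 'I'] span a sentence boundary. The same windowing can be sketched in plain Python:

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a list."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

words = ["Follow", "my", "lead"]
print(ngrams(words, 2))  # [['Follow', 'my'], ['my', 'lead']]
print(ngrams(words, 3))  # [['Follow', 'my', 'lead']]
```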


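### Sentiment Analysis

The `sentiment` property returns a named tuple. `polarity` ranges from -1.0 (negative) to 1.0 (positive), and `subjectivity` ranges from 0.0 (very objective) to 1.0 (very subjective).

In [27]: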
blob.sentiment

Sentiment(polarity=0.4702380952380952, subjectivity=0.5785714285714286)
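TextBlob's default analyzer, `PatternAnalyzer`, is lexicon-based: it looks words up in a sentiment lexicon and averages their scores. The sketch below is a much-simplified version of that idea. It uses a made-up three-word lexicon and a negation rule that flips only the next word; the real lexicon and its handling of modifiers are far more elaborate.

```python
# Hypothetical (polarity, subjectivity) scores, for illustration only
LEXICON = {
    "love": (0.5, 0.6),
    "awesome": (1.0, 1.0),
    "terrible": (-1.0, 1.0),
}
NEGATIONS = {"not", "never", "no"}

def sentiment(words):
    """Average lexicon scores; a negation flips the polarity of the next word only."""
    scores, negate = [], False
    for word in (w.lower() for w in words):
        if word in NEGATIONS:
            negate = True
            continue
        if word in LEXICON:
            polarity, subjectivity = LEXICON[word]
            scores.append((-polarity if negate else polarity, subjectivity))
        negate = False
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(s for _, s in scores) / n

print(sentiment(["I", "love", "it"]))   # (0.5, 0.6)
print(sentiment(["not", "terrible"]))   # (1.0, 1.0)
```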

## Other cool things to do

### Spelling Correction

In [32]:
blob = TextBlob('Scince is awesome')
blob.correct()

TextBlob("Prince is awesome")

In [33]:
blob.words[0].spellcheck()

[('Prince', 0.8546819787985865),
 ('Since', 0.11484098939929328),
 ('Science', 0.026501766784452298),
 ('Pwince', 0.001325088339222615),
 ('Wince', 0.0008833922261484099),
 ('Pince', 0.0008833922261484099),
 ('Evince', 0.0008833922261484099)]
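TextBlob's spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector". The approach generates strings a small number of edits away, keeps those that appear in a word-frequency list, and prefers frequent words. Below is a compact sketch of that approach. The frequency table is made up, so its suggestions need not match TextBlob's:

```python
import string

def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def spellcheck(word, counts):
    """Rank known words at the smallest edit distance (1, then 2) by frequency."""
    word = word.lower()
    if word in counts:
        return [(word, 1.0)]
    candidates = {w for w in edits1(word) if w in counts}
    if not candidates:
        candidates = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in counts}
    if not candidates:
        return [(word, 0.0)]
    total = sum(counts[w] for w in candidates)
    return sorted(((w, counts[w] / total) for w in candidates), key=lambda pair: -pair[1])

# Hypothetical word frequencies; a real corrector uses counts from a large corpus
COUNTS = {"since": 30, "science": 10, "prince": 5, "is": 100, "awesome": 8}
print(spellcheck("Scince", COUNTS))  # [('since', 0.75), ('science', 0.25)]
```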

In [34]:
text = "Hi... My name is Agus Richard Lubis. I love Sekardayu Hana Pradiani. No, I am just kidding though! \n" + \
        "I found a love for me. Darling, just dive right in. Follow my lead."
blob = TextBlob(text)

In [35]:
import random

In [37]:
nouns = [word.lemmatize() for word, pos in blob.tags if pos == 'NN']
nouns

['Hi', 'name', 'love', 'right', 'lead']
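The filter `pos == 'NN'` matches only singular common nouns. Penn Treebank uses separate tags for plural nouns (NNS) and proper nouns (NNP, NNPS), which is why 'Agus' and 'Darling' (tagged NNP) are missing from the list. The variant sketched below accepts all four noun tags and counts occurrences, so the most frequent nouns come first. The `top_nouns` helper is hypothetical:

```python
from collections import Counter

NOUN_TAGS = ("NN", "NNS", "NNP", "NNPS")

def top_nouns(tagged, k=3):
    """Return the k most frequent nouns (case-insensitive) from (word, tag) pairs."""
    counts = Counter(word.lower() for word, tag in tagged if tag in NOUN_TAGS)
    return [word for word, _ in counts.most_common(k)]

tagged = [("love", "NN"), ("I", "PRP"), ("Love", "NN"),
          ("name", "NN"), ("names", "NNS"), ("is", "VBZ")]
print(top_nouns(tagged, k=2))  # ['love', 'name']
```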

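### Translation and Language Detection

Note that `detect_language()` and `translate()` called the Google Translate API. They were deprecated in TextBlob 0.16.0 and removed in later releases, so the cells below only run on older versions of TextBlob.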

In [40]:
arabic_love = "أحبك جدا"
blob_arabic = TextBlob(arabic_love)

In [41]:
blob_arabic.detect_language()

'ar'

In [42]:
blob_arabic.translate(to='en')

TextBlob("I love you too")

### Text Classification using TextBlob

In [43]:
training = [
    ('Tom Holland is a terrible spiderman.', 'neg'),
    ('a terrible Javert (Russell Crowe) ruined Les Miserables for me...', 'neg'),
    ('The Dark Knight Rises is the greatest superhero movie ever!', 'pos'),
    ('Fantastic Four should have never been made.', 'neg'),
    ('Wes Anderson is my favorite director!', 'pos'),
    ('Captain America 2 is pretty awesome.', 'pos'),
    ('Let\'s pretend "Batman and Robin" never happened..', 'neg'),
]
testing = [
    ('Superman was never an interesting character.', 'neg'),
    ('Fantastic Mr Fox is an awesome film!', 'pos'),
    ('Dragonball Evolution is simply terrible!!', 'neg'),
]

In [44]:
from textblob import classifiers
classifier = classifiers.NaiveBayesClassifier(training)

In [45]:
# A decision-tree model trained on the same data, as an alternative to Naive Bayes
dt_classifier = classifiers.DecisionTreeClassifier(training)

In [46]:
classifier.accuracy(testing)

1.0

In [47]:
classifier.show_informative_features(3)

Most Informative Features
            contains(is) = True              pos : neg    =      2.9 : 1.0
         contains(never) = False             pos : neg    =      1.8 : 1.0
      contains(terrible) = False             pos : neg    =      1.8 : 1.0


In [48]:
classify_blob = TextBlob('You suck.', classifier=classifier)
classify_blob.classify()

'neg'
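TextBlob's `NaiveBayesClassifier` wraps NLTK's Naive Bayes classifier. By default, it represents each document with one `contains(word)` feature per training word, as the feature names above show. Below is a from-scratch sketch of that model. It assumes a simple regex tokenizer and add-0.5 smoothing of P(contains(word) | label), so its probabilities need not match NLTK's exactly. The training sentences are the reviews from this section, each labeled by its sentiment.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Set of word tokens (punctuation dropped, case kept)."""
    return set(re.findall(r"[A-Za-z0-9']+", text))

def train(data):
    """Count documents per label and, per label, documents containing each word."""
    vocab = set().union(*(tokens(text) for text, _ in data))
    docs_per_label = Counter(label for _, label in data)
    word_docs = {label: Counter() for label in docs_per_label}
    for text, label in data:
        word_docs[label].update(tokens(text))  # each word counted once per document
    return vocab, docs_per_label, word_docs

def classify(text, model):
    """Return the label with the highest log posterior."""
    vocab, docs_per_label, word_docs = model
    present = tokens(text)
    total = sum(docs_per_label.values())
    best_label, best_score = None, -math.inf
    for label, n in docs_per_label.items():
        score = math.log(n / total)  # prior: share of training documents
        for word in vocab:  # one binary contains(word) feature per vocabulary word
            p = (word_docs[label][word] + 0.5) / (n + 1)  # smoothed P(contains | label)
            score += math.log(p if word in present else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("Tom Holland is a terrible spiderman.", "neg"),
    ("a terrible Javert (Russell Crowe) ruined Les Miserables for me...", "neg"),
    ("The Dark Knight Rises is the greatest superhero movie ever!", "pos"),
    ("Fantastic Four should have never been made.", "neg"),
    ("Wes Anderson is my favorite director!", "pos"),
    ("Captain America 2 is pretty awesome.", "pos"),
    ('Let\'s pretend "Batman and Robin" never happened..', "neg"),
]
model = train(training)
print(classify("Fantastic Mr Fox is an awesome film!", model))  # pos
```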