# TextBlob

- TextBlob is another powerful NLP library for Python. It is built on top of NLTK and provides an easy-to-use interface to it. TextBlob can be used for a variety of NLP tasks, ranging from part-of-speech tagging and sentiment analysis to language translation and text classification.

- Compared with using NLTK directly, TextBlob offers a simpler, higher-level API for common tasks.

### Tokenization :

- Tokenization refers to splitting a large paragraph into sentences or words. Typically, a token is a word in a text document. Tokenization is straightforward with TextBlob.

In [1]:
# install textblob

!pip install textblob



In [2]:
# TextBlob needs a few NLTK corpora; they only have to be downloaded once:

!python -m textblob.download_corpora

Finished.


[nltk_data] Downloading package brown to
[nltk_data]     C:\Users\monik\AppData\Roaming\nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\monik\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\monik\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\monik\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package conll2000 to
[nltk_data]     C:\Users\monik\AppData\Roaming\nltk_data...
[nltk_data]   Package conll2000 is already up-to-date!
[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\monik\AppData\Roaming\nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!

In [3]:
# import lib :

from textblob import TextBlob

In [13]:
# word and sentence tokenization :
document = ("In computer science, artificial intelligence (AI), \
            sometimes called machine intelligence, is intelligence \
            demonstrated by machines, in contrast to the natural intelligence \
            displayed by humans and animals. Computer science defines AI \
            research as the study of \"intelligent agents\": any device that \
            perceives its environment and takes actions that maximize its\
            chance of successfully achieving its goals.[1] Colloquially,\
            the term \"artificial intelligence\" is used to describe machines\
            that mimic \"cognitive\" functions that humans associate with other\
            human minds, such as \"learning\" and \"problem solving\".[2]")

In [6]:
# Pass the document to the TextBlob constructor; the words attribute returns the tokenized words.

token = TextBlob(document)
token.words

WordList(['In', 'computer', 'science', 'artificial', 'intelligence', 'AI', 'sometimes', 'called', 'machine', 'intelligence', 'is', 'intelligence', 'demonstrated', 'by', 'machines', 'in', 'contrast', 'to', 'the', 'natural', 'intelligence', 'displayed', 'by', 'humans', 'and', 'animals', 'Computer', 'science', 'defines', 'AI', 'research', 'as', 'the', 'study', 'of', 'intelligent', 'agents', 'any', 'device', 'that', 'perceives', 'its', 'environment', 'and', 'takes', 'actions', 'that', 'maximize', 'its', 'chance', 'of', 'successfully', 'achieving', 'its', 'goals', '1', 'Colloquially', 'the', 'term', 'artificial', 'intelligence', 'is', 'used', 'to', 'describe', 'machines', 'that', 'mimic', 'cognitive', 'functions', 'that', 'humans', 'associate', 'with', 'other', 'human', 'minds', 'such', 'as', 'learning', 'and', 'problem', 'solving', '2'])

In [11]:
# sentence tokenization: we can use the sentences attribute

sent = token.sentences
print(sent)
print()
print(f" Total sentences are : {len(sent)}")

[Sentence("In computer science, artificial intelligence (AI),             sometimes called machine intelligence, is intelligence             demonstrated by machines, in contrast to the natural intelligence             displayed by humans and animals."), Sentence("Computer science defines AI             research as the study of "intelligent agents": any device that             perceives its environment and takes actions that maximize its            chance of successfully achieving its goals."), Sentence("[1] Colloquially,            the term "artificial intelligence" is used to describe machines            that mimic "cognitive" functions that humans associate with other            human minds, such as "learning" and "problem solving"."), Sentence("[2]")]

 Total sentences are : 4
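
For intuition about what the words and sentences attributes do, here is a rough approximation using only the standard library. It is a sketch, not TextBlob's actual tokenizer: the function names `simple_words` and `simple_sentences` and the regular expressions are assumptions chosen for illustration.

```python
import re

def simple_words(text):
    """Return runs of letters, digits and apostrophes as word tokens."""
    return re.findall(r"[A-Za-z0-9']+", text)

def simple_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

sample = "AI is fun. It mimics minds!"
print(simple_words(sample))      # punctuation is dropped from the word list
print(simple_sentences(sample))  # each sentence keeps its end punctuation
```

A rule this simple will mis-split abbreviations such as "e.g.", which is one reason to prefer a trained tokenizer for real text.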


### Stemming and Lemmatization

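- Stemming strips suffixes from a word using heuristic rules, so the result is not always a dictionary word (for example, 'history' becomes 'histori').
- Lemmatization refers to reducing the word to its root form as found in a dictionary.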

In [17]:
# Import the Word class and stem a word:

from textblob import Word
s = Word('history')
s.stem()

'histori'

In [19]:
l = Word('went')
l.lemmatize('v')

# By default, lemmatize() treats words as nouns, so pass 'v' to mark the word as a verb.

'go'

- The part-of-speech codes accepted by lemmatize() are: ADJ, ADJ_SAT, ADV, NOUN, VERB = 'a', 's', 'r', 'n', 'v'
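
The difference between stemming and lemmatization, and why the part of speech matters, can be illustrated with a toy sketch. Everything here is hypothetical: `toy_stem`, `toy_lemmatize` and the tiny `TOY_LEMMAS` table are stand-ins, whereas a real lemmatizer consults a full dictionary.

```python
# Hypothetical toy data: (word, part of speech) -> dictionary form.
TOY_LEMMAS = {
    ("went", "v"): "go",
    ("better", "a"): "good",
    ("mice", "n"): "mouse",
}

def toy_stem(word):
    """Chop a few common suffixes; the result need not be a real word."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word, pos="n"):
    """Look the word up under the given part of speech (noun by default)."""
    return TOY_LEMMAS.get((word, pos), word)

print(toy_stem("played"))          # suffix stripping
print(toy_lemmatize("went"))       # treated as a noun, so unchanged
print(toy_lemmatize("went", "v"))  # treated as a verb
```

The lookup key includes the part of speech, which mirrors why the lemmatize() call above needs 'v' to turn 'went' into 'go'.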

### Parts of Speech (POS) Tagging:

- Like NLTK, TextBlob also supports POS tagging. The tags attribute returns (word, tag) pairs.

In [24]:
for word, pos in token.tags:
    print(word + " => " + pos)

# print the tags for all the words in the paragraph.

In => IN
computer => NN
science => NN
artificial => JJ
intelligence => NN
AI => NNP
sometimes => RB
called => VBD
machine => NN
intelligence => NN
is => VBZ
intelligence => NN
demonstrated => VBN
by => IN
machines => NNS
in => IN
contrast => NN
to => TO
the => DT
natural => JJ
intelligence => NN
displayed => VBN
by => IN
humans => NNS
and => CC
animals => NNS
Computer => NNP
science => NN
defines => NNS
AI => NNP
research => NN
as => IN
the => DT
study => NN
of => IN
intelligent => JJ
agents => NNS
any => DT
device => NN
that => WDT
perceives => VBZ
its => PRP$
environment => NN
and => CC
takes => VBZ
actions => NNS
that => IN
maximize => VB
its => PRP$
chance => NN
of => IN
successfully => RB
achieving => VBG
its => PRP$
goals => NNS
[ => RB
1 => CD
] => NNP
Colloquially => NNP
the => DT
term => NN
artificial => JJ
intelligence => NN
is => VBZ
used => VBN
to => TO
describe => VB
machines => NNS
that => IN
mimic => JJ
cognitive => JJ
functions => NNS
that => WDT
humans => NNS
associate
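
Because the tags come back as (word, tag) pairs, they are easy to aggregate with the standard library. The sketch below works on a small hand-copied sample of the pairs printed above rather than on a live TextBlob object; the variable names are assumptions.

```python
from collections import Counter

# A few (word, tag) pairs copied from the output above.
sample_tags = [
    ("In", "IN"),
    ("computer", "NN"),
    ("science", "NN"),
    ("artificial", "JJ"),
    ("intelligence", "NN"),
    ("machines", "NNS"),
]

# How often does each tag occur?
tag_counts = Counter(tag for _, tag in sample_tags)

# Keep only the nouns: noun tags all start with "NN".
nouns = [word for word, tag in sample_tags if tag.startswith("NN")]

print(tag_counts.most_common(1))
print(nouns)
```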

### Dictionary Definitions :

- The definitions attribute of a Word returns the dictionary definitions of that word.


In [42]:
# .definitions returns the definitions of any word that exists in the dictionary:

Word('length').definitions

['the linear extent in space from one end to the other; the longest dimension of something that is fixed in place',
 'continuance in time',
 'the property of being the extent of something from beginning to end',
 'size of the gap between two places',
 'a section of something that is long and narrow']

In [43]:
Word('Circle').definitions

['ellipse in which the two axes are of equal length; a plane curve generated by one point moving at a constant distance from a fixed point',
 'an unofficial association of people or groups',
 'something approximating the shape of a circle',
 'movement once around a course',
 'a road junction at which traffic streams circularly around a central island',
 'street names for flunitrazepan',
 'a curved section or tier of seats in a hall or theater or opera house; usually the first tier above the orchestra',
 'any circular or rotating mechanism',
 'travel around something',
 'move in circles',
 'form a circle around']

### Convert Text to Singular and Plural :

- TextBlob also allows you to convert text into a plural or singular form using the pluralize and singularize methods

In [33]:
text = TextBlob('Use 4 spaces per indentation level')

# convert every word in the text to its singular form
text.words.singularize()

WordList(['Use', '4', 'space', 'per', 'indentation', 'level'])

In [30]:
# convert a single word ('spaces') to its singular form
text.words[2].singularize()

'space'

In [31]:
# convert every word in the text to its plural form
text.words.pluralize()

# Behind the scenes, known irregular words are looked up and everything else is handled by suffix rules,
# so input that is already plural or is not a noun gives odd results such as 'spacess' and 'pers'.

WordList(['Uses', '4s', 'spacess', 'pers', 'indentations', 'levels'])
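
The "lookup first, suffix rules otherwise" idea can be sketched in a few lines. This is not TextBlob's rule set: `toy_pluralize` and the `IRREGULAR` table are hypothetical and cover only a handful of English patterns.

```python
# Hypothetical table of irregular plurals.
IRREGULAR = {"child": "children", "mouse": "mice", "person": "people"}

def toy_pluralize(word):
    """Pluralize a singular noun with a lookup plus three suffix rules."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"

for noun in ("space", "box", "city", "child"):
    print(noun, "->", toy_pluralize(noun))
```

Like the real method, this sketch has no way of knowing whether its input is already plural, so it should only be fed singular nouns.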

### Noun Phrase Extraction :

- Noun phrase extraction, as the name suggests, refers to extracting phrases that contain nouns.

In [34]:
for noun_phrase in token.noun_phrases:
    print(noun_phrase)

computer science
artificial intelligence
ai
machine intelligence
natural intelligence
computer
science defines
ai
intelligent agents
colloquially
artificial intelligence
describe machines
human minds


### Getting Words and Phrase Counts :

- Earlier we used Python's built-in len function to count the sentences returned by the TextBlob object. TextBlob also provides its own ways to count words and phrases.
- To find the frequency of a particular word, use the word as a key into the word_counts dictionary of the TextBlob object. To count a noun phrase, call count() on the noun_phrases list.

In [38]:
text_blob_object = TextBlob(document)
text_blob_object.word_counts['intelligence']

5

In [39]:
text_blob_object.noun_phrases.count('artificial intelligence')

2
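
The same kind of frequency lookup can be approximated with collections.Counter. This is a standard-library sketch under simple assumptions (lowercasing and a regex tokenizer), and `count_words` is a hypothetical helper, not a TextBlob function.

```python
import re
from collections import Counter

def count_words(text):
    """Lowercase the text and count each word token."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

counts = count_words("Artificial intelligence is intelligence shown by machines.")
print(counts["intelligence"])  # 2
print(counts["robots"])        # 0, a Counter returns 0 for missing keys
```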

### Spelling Corrections:
- Spelling correction is one of the more distinctive features of the TextBlob library. The correct method of a TextBlob object attempts to fix the spelling mistakes in your text.

In [40]:
text = "I love to watchf footbal, but I have neter played it"
text_blob_object = TextBlob(text)

print(text_blob_object.correct())

I love to watch football, but I have never played it
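
TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below shows the core of that idea (generate every string one edit away, keep the known words, pick the most frequent) with a tiny hypothetical vocabulary. It is a simplified illustration, not the library's code.

```python
from collections import Counter

# Hypothetical word frequencies; a real corrector learns these from a large corpus.
VOCAB = Counter({"love": 6, "watch": 5, "never": 4, "football": 3, "played": 2})
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def toy_correct(word):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in VOCAB:
        return word
    candidates = [w for w in edits1(word) if w in VOCAB]
    return max(candidates, key=VOCAB.get) if candidates else word

for typo in ("watchf", "footbal", "neter"):
    print(typo, "->", toy_correct(typo))
```

Words that are more than one edit away from anything in the vocabulary are returned unchanged, so the quality of the result depends heavily on the vocabulary.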


### Language Translation :

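- TextBlob can also translate text from one language to another. On the backend, the TextBlob translator uses the Google Translate API, so the cells below need network access. Note that translate and detect_language were deprecated in TextBlob 0.16.0 and removed in later releases, so these cells only run on older versions.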

In [49]:
text_blob_object_hindi = TextBlob('Something is better than nothing')

# translate this to Hindi:
text_blob_object_hindi.translate(to = 'hi')

TextBlob("कुछ नहीं से कुछ बेहतर है")

In [48]:
# translate from French to English:
text_blob_object_french = TextBlob(u'Salut comment allez-vous?')
print(text_blob_object_french.translate(to='en'))

Hi, how are you?


In [50]:
# from Arabic to English:
text_blob_object_arabic = TextBlob(u'مرحبا كيف حالك؟')
print(text_blob_object_arabic.translate(to='en'))

Hello how are you?


- Finally, the detect_language method detects the language of a sentence.

In [54]:
text_blob_object = TextBlob(u'Hola como estas?')
print(text_blob_object.detect_language())

# es is the language code for Spanish

es


In [55]:
blob = TextBlob('有总比没有好')
blob.detect_language()

# zh-CN is the language code for Chinese (Simplified)

'zh-CN'

### Text Classification:

TextBlob also provides basic text classification capabilities. It is not well suited to serious text classification because those capabilities are limited. However, if you have very little data and want to build a basic text classification model quickly, TextBlob is a reasonable choice. For advanced models, we can use machine learning libraries such as Scikit-Learn or TensorFlow.


In [57]:
'''The dataset contains some dummy movie reviews.
The training and test datasets are lists of tuples:
the first element of each tuple is the text of the review
and the second element is its sentiment label ('pos' or 'neg').'''

train_data = [
    ('This is an excellent movie', 'pos'),
    ('The move was fantastic I like it', 'pos'),
    ('You should watch it, it is brilliant', 'pos'),
    ('Exceptionally good', 'pos'),
    ("Wonderfully directed and executed. I like it", 'pos'),
    ('It was very boring', 'neg'),
    ('I did not like the movie', 'neg'),
    ("The movie was horrible", 'neg'),
    ('I will not recommend', 'neg'),
    ('The acting is pathetic', 'neg')
]

test_data = [
    ('Its a fantastic series', 'pos'),
    ('Never watched such a brillent movie', 'pos'),
    ("horrible acting", 'neg'),
    ("It is a Wonderful movie", 'pos'),
    ('waste of money', 'neg'),
    ("pathetic picture", 'neg')
]

In [59]:
# We will use the NaiveBayesClassifier class from the textblob.classifiers module:
from textblob.classifiers import NaiveBayesClassifier

# To train the model, we simply have to pass the training data to the constructor of the NaiveBayesClassifier class. 
classifier = NaiveBayesClassifier(train_data)

In [60]:
# Make a prediction on a single sentence.

print(classifier.classify("It is very boring"))

neg


In [61]:
classifier.classify('This is an amazing library')

'pos'

In [63]:
# prob_classify returns a probability distribution over the labels:

prob = classifier.prob_classify('This is an amazing library')

# probability that the sentence is negative
prob.prob('neg')

# probability that the sentence is positive (only this last value is displayed)
prob.prob('pos')

# the sentence is classified as positive with a probability of about 0.91

0.9065478890561054
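
To show where such probabilities come from, here is a from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is an assumption-laden illustration, not TextBlob's implementation (which uses different features), and the names `train`, `log_scores`, `classify` and `prob_classify` here are hypothetical plain functions.

```python
import math
from collections import Counter, defaultdict

def train(data):
    """Count words per label, documents per label, and the overall vocabulary."""
    word_counts, label_counts, vocab = defaultdict(Counter), Counter(), set()
    for text, label in data:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def log_scores(text, model):
    """Log prior plus smoothed log likelihood of each known token, per label."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return scores

def classify(text, model):
    scores = log_scores(text, model)
    return max(scores, key=scores.get)

def prob_classify(text, model):
    """Normalize the scores so they sum to 1 across labels."""
    scores = log_scores(text, model)
    top = max(scores.values())
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exp.values())
    return {label: value / total for label, value in exp.items()}

toy_data = [("excellent movie", "pos"), ("fantastic movie", "pos"),
            ("boring movie", "neg"), ("horrible acting", "neg")]
model = train(toy_data)
print(classify("boring acting", model))
print(prob_classify("excellent", model))
```

With so little training data, a single word can swing the result, which is the same reason the probabilities reported by a classifier trained on ten sentences should be read with caution.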

In [62]:
classifier.accuracy(test_data)

# The output, about 0.67, is the accuracy of the classifier on the test data (4 of 6 correct).

0.6666666666666666
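
Accuracy is simply the fraction of test examples whose predicted label matches the true label. The sketch below computes it for any prediction function. `accuracy` and `keyword_predict` are hypothetical helpers for illustration and the three examples are made up.

```python
def accuracy(predict, labeled_examples):
    """Fraction of (text, label) pairs for which predict(text) == label."""
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    hits = sum(1 for text, label in labeled_examples if predict(text) == label)
    return hits / len(labeled_examples)

def keyword_predict(text):
    """Hypothetical stand-in classifier: 'neg' only if the text mentions 'boring'."""
    return "neg" if "boring" in text.lower() else "pos"

examples = [
    ("It was boring", "neg"),
    ("Great film", "pos"),
    ("Waste of money", "neg"),  # misclassified by the keyword rule
]
print(accuracy(keyword_predict, examples))
```

With only six test sentences, as in the cell above, each example moves the score by about 17 percentage points, so the figure is a rough indication at best.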