# Introduction to TextBlob: A Tool for Natural Language Processing

Many open source tools are now available for natural language processing. One of them is TextBlob; let's look at what it offers and how easily it can be used. TextBlob is a simple Python library for common NLP tasks such as
- Tokenization,
- Noun phrase extraction,
- POS tagging,
- Word inflection and lemmatization,
- N-grams,
- Sentiment analysis.
<br>
It is built on top of NLTK and adds conveniences such as spelling correction, translation, and language detection.

In [2]:
# !pip install -U textblob
# !python -m textblob.download_corpora

# Import Textblob Modules
from textblob import TextBlob
from textblob import Word
#from textblob.wordnet import VERB
from textblob.classifiers import NaiveBayesClassifier

import nltk 
nltk.download('brown')

[nltk_data] Downloading package brown to /usr/share/nltk_data...
[nltk_data]   Package brown is already up-to-date!


True

In [3]:
text = '''                                       
Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) within text 
data using text analysis techniques. Sentiment analysis allows 
businesses to identify customer sentiment toward products, brands or services in online conversations and feedback.
'''
print(text)

                                       
Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) within text 
data using text analysis techniques. Sentiment analysis allows 
businesses to identify customer sentiment toward products, brands or services in online conversations and feedback.



In [4]:
# Let’s create our first TextBlob object.
wiki = TextBlob(text)
wiki

TextBlob("                                       
Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) within text 
data using text analysis techniques. Sentiment analysis allows 
businesses to identify customer sentiment toward products, brands or services in online conversations and feedback.
")

## 1. Tokenization

In [5]:
wiki.words

WordList(['Sentiment', 'analysis', 'is', 'the', 'interpretation', 'and', 'classification', 'of', 'emotions', 'positive', 'negative', 'and', 'neutral', 'within', 'text', 'data', 'using', 'text', 'analysis', 'techniques', 'Sentiment', 'analysis', 'allows', 'businesses', 'to', 'identify', 'customer', 'sentiment', 'toward', 'products', 'brands', 'or', 'services', 'in', 'online', 'conversations', 'and', 'feedback'])

In [6]:
wiki.words.count('Better')  # count() ignores case by default

0

In [7]:
wiki.words.count('Sentiment')  # matches 'Sentiment' and 'sentiment' because case is ignored

3

In [8]:
wiki.sentences

[Sentence("                                       
 Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) within text 
 data using text analysis techniques."),
 Sentence("Sentiment analysis allows 
 businesses to identify customer sentiment toward products, brands or services in online conversations and feedback.")]

In [9]:
wiki.word_counts  # number of times each word appears; all words are converted to lower case

defaultdict(int,
            {'sentiment': 3,
             'analysis': 3,
             'is': 1,
             'the': 1,
             'interpretation': 1,
             'and': 3,
             'classification': 1,
             'of': 1,
             'emotions': 1,
             'positive': 1,
             'negative': 1,
             'neutral': 1,
             'within': 1,
             'text': 2,
             'data': 1,
             'using': 1,
             'techniques': 1,
             'allows': 1,
             'businesses': 1,
             'to': 1,
             'identify': 1,
             'customer': 1,
             'toward': 1,
             'products': 1,
             'brands': 1,
             'or': 1,
             'services': 1,
             'in': 1,
             'online': 1,
             'conversations': 1,
             'feedback': 1})

In [10]:
wiki.word_counts['text']

2

If you access the frequencies this way, the search is not case sensitive, and words that are not found have a frequency of 0.

The second way is to use the count() method, which lets you choose whether the match is case sensitive.

In [11]:
wiki.words.count('text', case_sensitive=True)

2

In [12]:
wiki.words.count('Text', case_sensitive=True)

0
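
To make the difference between the two lookups concrete, here is a minimal pure-Python sketch of the same idea. It is not TextBlob's tokenizer or implementation: the regular expression and the `sample` string are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text, loosely based on the paragraph used above.
sample = "Sentiment analysis allows businesses to identify customer sentiment. Text data is text."

# Simplified tokenizer (an assumption): keep runs of letters and apostrophes.
tokens = re.findall(r"[A-Za-z']+", sample)

# Lower-casing before counting gives case-insensitive frequencies, like word_counts.
counts = Counter(token.lower() for token in tokens)

print(counts["sentiment"])    # both 'Sentiment' and 'sentiment' are counted
print(counts["missing"])      # a Counter reports 0 for words it has never seen
print(tokens.count("Text"))   # exact match only, like count(..., case_sensitive=True)
```

Lower-casing once up front keeps lookups cheap, while the original token list is still available when an exact-case match is needed.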

## 2. Words Inflection and Lemmatization

Each word in TextBlob.words or Sentence.words is a Word object (a subclass of str in Python 3) with useful methods, e.g. for word inflection.

In [13]:
sentence = TextBlob(text)

In [14]:
sentence.words

WordList(['Sentiment', 'analysis', 'is', 'the', 'interpretation', 'and', 'classification', 'of', 'emotions', 'positive', 'negative', 'and', 'neutral', 'within', 'text', 'data', 'using', 'text', 'analysis', 'techniques', 'Sentiment', 'analysis', 'allows', 'businesses', 'to', 'identify', 'customer', 'sentiment', 'toward', 'products', 'brands', 'or', 'services', 'in', 'online', 'conversations', 'and', 'feedback'])

In [15]:
sentence.words.singularize()

WordList(['Sentiment', 'analysi', 'is', 'the', 'interpretation', 'and', 'classification', 'of', 'emotion', 'positive', 'negative', 'and', 'neutral', 'within', 'text', 'datum', 'using', 'text', 'analysi', 'technique', 'Sentiment', 'analysi', 'allows', 'business', 'to', 'identify', 'customer', 'sentiment', 'toward', 'product', 'brand', 'or', 'service', 'in', 'online', 'conversation', 'and', 'feedback'])

In [16]:
sentence.words.pluralize()

WordList(['Sentiments', 'analyses', 'iss', 'thes', 'interpretations', 'ands', 'classifications', 'ofs', 'emotionss', 'positives', 'negatives', 'ands', 'neutrals', 'withins', 'texts', 'datas', 'usings', 'texts', 'analyses', 'techniquess', 'Sentiments', 'analyses', 'allowss', 'businessess', 'toes', 'identifies', 'customers', 'sentiments', 'towards', 'productss', 'brandss', 'ors', 'servicess', 'ins', 'onlines', 'conversationss', 'ands', 'feedbacks'])

In [17]:
sentence.words[-4:-1].pluralize()

WordList(['onlines', 'conversationss', 'ands'])

In [18]:
# Lemmatization returns the dictionary (root) form of a word, so the result is always a real word and the meaning is preserved.

# Stemming trims suffixes such as "es", "ses", or "ed" from the end of a token, so the result may not be a real word.

In [19]:
# lemmatize() takes a part-of-speech tag. The default is noun; passing another tag (for example 'v' for verb) changes the result, e.g. "went" -> "go".

Words can be lemmatized by calling the lemmatize method.

In [20]:
w = Word("stipes", pos_tag = 'n')

In [21]:
w.lemmatize('v')

'stipes'

In [22]:
w = Word("went")

In [23]:
w.lemmatize('v') 

'go'
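
The contrast between stemming and lemmatization described in the comments above can be sketched in a few lines. This is a toy illustration only: `naive_stem`, `naive_lemmatize`, the suffix list, and the tiny `LEMMAS` table are all hypothetical stand-ins, whereas TextBlob's lemmatize() relies on WordNet.

```python
# Hypothetical lookup table standing in for a real dictionary such as WordNet.
LEMMAS = {"went": "go", "studies": "study"}

# Suffixes the toy stemmer knows how to strip, checked in this order.
SUFFIXES = ("es", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix; the result may not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def naive_lemmatize(word):
    """Return the dictionary form if it is known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)


for w in ("went", "studies"):
    print(w, "| stem:", naive_stem(w), "| lemma:", naive_lemmatize(w))
```

The stemmer turns "studies" into the non-word "studi" and cannot handle the irregular form "went", while the dictionary lookup returns "study" and "go".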

## 3. POS Tagging

Part-of-speech tags can be accessed through the tags property.

In [24]:
zen = TextBlob("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Simple is better than complex.")

In [25]:
zen.tags

[('Beautiful', 'NNP'),
 ('is', 'VBZ'),
 ('better', 'JJR'),
 ('than', 'IN'),
 ('ugly', 'RB'),
 ('Explicit', 'NNP'),
 ('is', 'VBZ'),
 ('better', 'JJR'),
 ('than', 'IN'),
 ('implicit', 'NN'),
 ('Simple', 'NN'),
 ('is', 'VBZ'),
 ('better', 'JJR'),
 ('than', 'IN'),
 ('complex', 'JJ')]

In [26]:
for word, pos in zen.tags:
    print(word.lower() + " => " + pos)

beautiful => NNP
is => VBZ
better => JJR
than => IN
ugly => RB
explicit => NNP
is => VBZ
better => JJR
than => IN
implicit => NN
simple => NN
is => VBZ
better => JJR
than => IN
complex => JJ
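
Because the tags come back as a plain list of (word, tag) tuples, ordinary Python tools are enough to summarise or filter them. The sketch below hard-codes the pairs from the output above so that it runs without TextBlob; the variable names are illustrative.

```python
from collections import Counter

# (word, tag) pairs copied from the zen.tags output above.
tags = [
    ("Beautiful", "NNP"), ("is", "VBZ"), ("better", "JJR"), ("than", "IN"), ("ugly", "RB"),
    ("Explicit", "NNP"), ("is", "VBZ"), ("better", "JJR"), ("than", "IN"), ("implicit", "NN"),
    ("Simple", "NN"), ("is", "VBZ"), ("better", "JJR"), ("than", "IN"), ("complex", "JJ"),
]

# How often does each tag occur?
tag_counts = Counter(tag for _, tag in tags)

# Keep only adjectives; Penn Treebank adjective tags all start with "JJ".
adjectives = [word for word, tag in tags if tag.startswith("JJ")]

print(tag_counts.most_common(3))
print(adjectives)
```

Note that the tagger labels "ugly" as an adverb and "Beautiful" as a proper noun here, so a filter like this inherits whatever mistakes the tagger makes.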


## 4. Noun Phrase Extraction

Noun phrases are accessed through the noun_phrases property.

In [27]:
document = ("In computer science, artificial intelligence (AI), \
            sometimes called machine intelligence, is intelligence \
            demonstrated by machines, in contrast to the natural intelligence \
            displayed by humans and animals. Computer science defines AI \
            research as the study of \"intelligent agents\": any device that \
            perceives its environment and takes actions that maximize its\
            chance of successfully achieving its goals.[1] Colloquially,\
            the term \"artificial intelligence\" is used to describe machines\
            that mimic \"cognitive\" functions that humans associate with other\
            human minds, such as \"learning\" and \"problem solving\".[2]")

In [28]:
text_blob_object = TextBlob(document)

for noun_phrase in text_blob_object.noun_phrases:    
    print(noun_phrase)

computer science
artificial intelligence
ai
machine intelligence
natural intelligence
computer
science defines
ai
intelligent agents
colloquially
artificial intelligence
describe machines
human minds


## 5. Spelling Correction
Use the correct() method to attempt spelling correction.

Spelling correction is based on Peter Norvig’s “How to Write a Spelling Corrector” as implemented in the pattern library. It is about 70% accurate.

In [29]:
b = TextBlob("I havv written goood speling!. speling corection is based. howw tooo writt a speling corect ")

for i in b.sentences:
    
    print(i.correct())

I have written good spelling!.
spelling correction is based.
how took write a spelling correct


Word objects have a Word.spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions.

In [30]:
from textblob import Word
w = Word('falibility')

w.spellcheck() # 1 is the confidence

[('fallibility', 1.0)]

In [31]:
b = TextBlob("I havv goood speling!. speling corection is based. howw tooo writt a speling corect ")

for i in b.words:    
    print(i.spellcheck())

[('I', 1.0)]
[('have', 1.0)]
[('good', 1.0)]
[('spelling', 1.0)]
[('spelling', 1.0)]
[('correction', 1.0)]
[('is', 1.0)]
[('based', 1.0)]
[('how', 0.9924528301886792), ('howe', 0.004528301886792453), ('howl', 0.0030188679245283017)]
[('took', 0.5079787234042553), ('too', 0.4858156028368794), ('tool', 0.0062056737588652485)]
[('write', 0.4777777777777778), ('wrist', 0.37777777777777777), ('writ', 0.08333333333333333), ('writs', 0.06111111111111111)]
[('a', 1.0)]
[('spelling', 1.0)]
[('correct', 1.0)]
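
The (word, confidence) pairs above follow the idea behind Norvig's corrector: generate every string one edit away from the input, keep the ones that are known words, and rank them by how common they are. The following is a minimal sketch of that idea, not the code TextBlob ships. `WORD_FREQ` is a made-up frequency table, and `edits1` and `suggest` are illustrative names; a real corrector learns its counts from a large corpus.

```python
from collections import Counter

# Hypothetical word frequencies (assumption, for illustration only).
WORD_FREQ = Counter({"spelling": 50, "spewing": 2, "good": 80, "god": 30, "too": 40, "took": 42})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def suggest(word):
    """Return (candidate, confidence) pairs, most frequent candidate first."""
    if word in WORD_FREQ:
        return [(word, 1.0)]
    candidates = {w for w in edits1(word) if w in WORD_FREQ}
    if not candidates:
        return [(word, 1.0)]  # nothing better is known, so keep the word as it is
    total = sum(WORD_FREQ[w] for w in candidates)
    ranked = sorted(candidates, key=WORD_FREQ.get, reverse=True)
    # Confidence is each candidate's share of the total frequency.
    return [(w, WORD_FREQ[w] / total) for w in ranked]


print(suggest("speling"))
print(suggest("tooo"))
```

With these made-up counts, "tooo" is corrected to "took" rather than "too" simply because "took" is slightly more frequent, which is the same kind of ranking effect visible in the output above.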


## 6. Translation and Language Detection

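One of TextBlob's most powerful capabilities is translating text from one language to another. Behind the scenes, the translator uses the __Google Translate API__, so these calls need network access. translate() and detect_language() were deprecated in TextBlob 0.16.0 and removed in later releases, so the cells in this section require an older version of the library.

Supported language codes are listed at https://cloud.google.com/translate/docs/languages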

In [32]:
en_blob = TextBlob(u'Simple is better than complex.')

In [33]:
en_blob.translate(to='hi')  # hi == Hindi

TextBlob("सरल जटिल से बेहतर है।")

In [34]:
chinese_blob = TextBlob(u"美丽优于丑陋")

chinese_blob.translate(from_lang="zh-CN", to='en') # en == english

TextBlob("Beauty is better than ugly")

You can also attempt to detect a TextBlob’s language using TextBlob.detect_language().

In [35]:
b = TextBlob(u"بسيط هو أفضل من مجمع")
b.detect_language()

'ar'

In [36]:
b = TextBlob("tumi kemon aachon")
b.detect_language()

'bn'

In [37]:
b = TextBlob(u"क्या हाल है")
b.detect_language()

'hi'

In [38]:
b.translate(from_lang="hi", to='en')

TextBlob("How are you")

In [39]:
st =  ["क्या हाल है" ,"توهان ڪيئن آهيو" ,"আপনি কেমন আছেন" ,"நீங்கள் எப்படி" ,"இருக்கிறீர்கள்" ,"तिमीलाई कस्तो छ" ,"તમે કેમ છો" ,"तू कसा आहेस ","ന്തൊക്കെയുണ്ട്"]

In [40]:
for i in st:
    a = TextBlob(i)
    t = a.detect_language()
    print(t)
    print(a.translate(from_lang=t, to='en'))

hi
How are you
sd
how R u
bn
How are you
ta
How are you
ta
You are
ne
How are you
gu
how are you
mr
How are you
ml
And so on


## 7. N-Grams

- An n-gram is a sequence of n consecutive words in a sentence. For instance, the sentence "I love watching football" contains the 2-grams (I love), (love watching) and (watching football).

- N-grams can play a crucial role in text classification.

- The TextBlob.ngrams() method returns a list of WordList objects, each holding n successive words.

In [41]:
blob = TextBlob("Now is better than never.")

In [42]:
blob.ngrams(n=3)

[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
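
Under the hood, building n-grams is just sliding a window of width n across the word list. A minimal sketch, assuming whitespace tokenization and returning plain tuples rather than WordList objects (the `ngrams` function here is an illustrative helper, not TextBlob's method):

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    # A list of k words has k - n + 1 windows of width n (none if n > k).
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "Now is better than never".split()
trigrams = ngrams(words, 3)
bigrams = ngrams(words, 2)

print(trigrams)
print(bigrams)
```

When n is larger than the number of words, the range is empty and the function returns an empty list.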

## 8. WordNet Integration

WordNet is a database of English words that are linked together by their semantic relationships. It is like a supercharged dictionary/thesaurus with a graph structure.

Since version 0.7, TextBlob integrates __NLTK's WordNet__ interface, making it very simple to interact with WordNet.

### Synsets
As you know, synonyms are words that have similar meanings. A synonym set, or synset, is a group of synonyms. A synset, therefore, corresponds to an abstract concept.

In TextBlob, you can access the synsets that a word belongs to by accessing the synsets property of a Word object.

* What is WordNet? A Conceptual Introduction Using Python - https://stevenloria.com/wordnet-tutorial/

In [43]:
# from textblob import Word
word = Word("happy")
word.definitions

['enjoying or showing or marked by joy or pleasure',
 'marked by good fortune',
 'eagerly disposed to act or to be of service',
 'well expressed and to the point']

In [44]:
word.synsets[:5]

[Synset('happy.a.01'),
 Synset('felicitous.s.02'),
 Synset('glad.s.02'),
 Synset('happy.s.04')]

The synonyms contained within a synset are called lemmas. You can access them through a Synset's lemmas() method, or get their string versions directly with lemma_names().

In [45]:
# Synonyms from WordNet
 
synonyms = set()
for synset in word.synsets:
    for lemma in synset.lemmas():
        synonyms.add(lemma.name())
         
print(synonyms)

{'felicitous', 'happy', 'glad', 'well-chosen'}


In [46]:
# Getting Antonyms using TextBlob
antonyms = set()
for synset in word.synsets:
    for lemma in synset.lemmas():        
        if lemma.antonyms():
            antonyms.add(lemma.antonyms()[0].name())        
 
print(antonyms)

{'unhappy'}


## 9. Converting to Upper and Lowercase
TextBlob objects behave much like Python strings: you can change their case, slice them, and concatenate them. The following cells convert the text of a TextBlob object to upper case and then to lower case:

In [47]:
text = "I love to watch football, but I have never played it"
text_blob_object = TextBlob(text)

print(text_blob_object.upper())

I LOVE TO WATCH FOOTBALL, BUT I HAVE NEVER PLAYED IT


In [48]:
text = "I LOVE TO WATCH FOOTBALL, BUT I HAVE NEVER PLAYED IT"
text_blob_object = TextBlob(text)

print(text_blob_object.lower())

i love to watch football, but i have never played it


## 10. Sentiment Analysis

* The sentiment property returns a named tuple of the form Sentiment(polarity, subjectivity).
* Polarity is a float within the range [-1.0, 1.0], where
  - -1.0 indicates a very negative sentiment,
  - 0.0 indicates a neutral sentiment, and
  - 1.0 indicates a very positive sentiment.
* Subjectivity is a float within the range [0.0, 1.0], where
  - 0.0 is very objective (factual information) and
  - 1.0 is very subjective (personal opinion).

In [49]:
TextBlob("so the two together did the job").sentiment

Sentiment(polarity=0.0, subjectivity=0.0)

In [50]:
testimonial1 = TextBlob("so the two together did the job, very good chootu anna")
testimonial2 = TextBlob("Mumbai is a city in the UK ")

In [51]:
print('Sentiment 1: ', testimonial1.sentiment)
print('Sentiment 2: ', testimonial2.sentiment)

Sentiment 1:  Sentiment(polarity=0.9099999999999999, subjectivity=0.7800000000000001)
Sentiment 2:  Sentiment(polarity=0.0, subjectivity=0.0)


In [52]:
print('Polarity: ', testimonial1.sentiment.polarity)

Polarity:  0.9099999999999999
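
In practice the polarity score is often turned into a coarse label. The helper below is one possible way to do that; `label_polarity` and its `threshold` of 0.1 are assumptions made for this sketch and are not part of TextBlob.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    Scores within +/- threshold of zero are treated as neutral.
    The threshold is an arbitrary choice and should be tuned on real data.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Polarity values taken from the outputs above, plus one made-up negative score.
for score in (0.9099999999999999, 0.0, -0.5):
    print(score, "->", label_polarity(score))
```

A small neutral band around zero avoids labelling text as positive or negative on the strength of a single weakly scored word.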


In [53]:
print('Subjectivity: ', testimonial1.sentiment.subjectivity)

Subjectivity:  0.7800000000000001


## 11. Text Classification Using Naive Bayes

In [54]:
from textblob.classifiers import NaiveBayesClassifier

In [55]:
# Create some training and test data.
# Each item is a (text, label) tuple: the first element is the text, the second is its label.
train = [
     ('I love this sandwich.', 'pos'),  
     ('this is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('this is my best work.', 'pos'),
     ("what an awesome view", 'pos'),
     ('I do not like this restaurant', 'neg'),
     ('I am tired of this stuff.', 'neg'),
     ("I can't deal with this", 'neg'),
     ('he is my sworn enemy!', 'neg'),
     ('my boss is horrible.', 'neg')
 ]

In [56]:
test = [
     ('the beer was good.', 'pos'),
     ('I do not enjoy my job', 'neg'),
     ("I ain't feeling dandy today.", 'neg'),
     ("I feel amazing!", 'pos'),
     ('Gary is a friend of mine.', 'pos'),
     ("I can't believe I'm doing this.", 'neg')
 ]

In [57]:
cl = NaiveBayesClassifier(train)  # train the classifier on the training data

In [58]:
# Loading Data from Files
# You can also load data from common file formats including CSV, JSON, and TSV.

# CSV files should be formatted like so:

# I love this sandwich.,pos
# This is an amazing place!,pos
# I do not like this restaurant,neg
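
If you prefer to build the list of (text, label) tuples yourself, Python's csv module is enough. The sketch below parses rows in the layout shown above; it uses an in-memory string in place of a real file, and `csv_text` and `train_from_csv` are illustrative names.

```python
import csv
import io

# In-memory stand-in for a CSV file on disk (hypothetical contents).
csv_text = """I love this sandwich.,pos
This is an amazing place!,pos
I do not like this restaurant,neg
"""

# Each row has two columns: the text and its label.
with io.StringIO(csv_text) as fp:
    train_from_csv = [(text, label) for text, label in csv.reader(fp)]

print(train_from_csv)
```

The resulting list has the same shape as the `train` list defined earlier, so it can be passed to a classifier in the same way. Text that itself contains commas must be quoted in the CSV file so that it stays in one column.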

In [59]:
# Classifying Text
# Call the classify(text) method to use the classifier.

cl.classify("This is best library!")

'pos'
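
To build some intuition for what happens during training and classification, here is a small from-scratch Naive Bayes sketch. It is not the classifier TextBlob uses (TextBlob wraps NLTK's implementation with its own feature extractor). This version is a simple multinomial model with add-one smoothing, and `tokenize`, `log_score`, and `classify` are hypothetical helpers written for this example.

```python
import math
from collections import Counter, defaultdict

# A few examples taken from the training data above.
train = [
    ("I love this sandwich.", "pos"),
    ("this is an amazing place!", "pos"),
    ("I do not like this restaurant", "neg"),
    ("my boss is horrible.", "neg"),
]


def tokenize(text):
    """Very rough tokenizer (assumption): split on spaces, strip punctuation, lower-case."""
    return [w.strip(".,!?").lower() for w in text.split()]


# "Training" is just counting labels and the words seen under each label.
label_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    label_counts[label] += 1
    for w in tokenize(text):
        word_counts[label][w] += 1
        vocab.add(w)


def log_score(text, label):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(label_counts[label] / sum(label_counts.values()))
    for w in tokenize(text):
        if w in vocab:  # ignore words never seen during training
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
    return score


def classify(text):
    """Pick the label with the highest score."""
    return max(label_counts, key=lambda label: log_score(text, label))


print(classify("I love this place"))
print(classify("my boss is horrible"))
```

Because the model only looks at which words appear, a sentence made mostly of words it associates with one label will get that label regardless of what the sentence actually means.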

In [60]:
# You can get the label probability distribution with the prob_classify(text) method.

In [61]:
prob_dist = cl.prob_classify("This one's a doozy.")

In [62]:
prob_dist.prob('pos')

0.6311475409836058

In [63]:
round(prob_dist.prob("pos"), 2)

0.63

In [64]:
round(prob_dist.prob("neg"), 2)

0.37

In [65]:
# Evaluating Classifiers
# To compute the accuracy on our test set, use the accuracy(test_data) method.
cl.accuracy(test)

0.8333333333333334
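
Accuracy is simply the fraction of test items whose predicted label matches the true label. The sketch below reproduces a score of 5 out of 6 by hand. The `gold` labels come from the test set above, but the `predicted` list is hypothetical: the notebook does not show which item the classifier got wrong.

```python
# True labels of the six test sentences, in order.
gold = ["pos", "neg", "neg", "pos", "pos", "neg"]

# Hypothetical predictions with exactly one mistake (the fifth item).
predicted = ["pos", "neg", "neg", "pos", "neg", "neg"]

# Count matching positions and divide by the number of test items.
correct = sum(g == p for g, p in zip(gold, predicted))
accuracy = correct / len(gold)

print(correct, "of", len(gold), "correct, accuracy =", accuracy)
```

With only six test items, a single mistake moves the score by almost 17 percentage points, so accuracy on a set this small is a rough indication at best.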

### Updating Classifiers with New Data

In [66]:
new_data = [('She is my best friend.', 'pos'),
             ("I'm happy to have a new friend.", 'pos'),
             ("Stay thirsty, my friend.", 'pos'),
             ("He ain't from around here.", 'neg')]

In [67]:
cl.update(new_data)

True

In [68]:
cl.accuracy(test)

1.0

In [69]:
cl.classify("This is an amazing library!")

'pos'

In [70]:
cl.classify("I want to get rid of this library!")

'pos'