# <div align='center'>TextBlob</div>
TextBlob is a text processing library built on top of NLTK. It provides a simple API for common text processing tasks such as tokenization, part-of-speech tagging, sentiment analysis, and classification.

##### Install TextBlob

In [1]:
# !pip install -U textblob

<h5>Load required libraries</h5>

In [2]:
from textblob import TextBlob, Word
from textblob.sentiments import NaiveBayesAnalyzer
from textblob.classifiers import NaiveBayesClassifier
import pandas as pd

<h5>Load data</h5>

In [3]:
text="Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English[6] mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.[7] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.[8][9][10] Turing is widely considered to be the father of theoretical computer science and artificial intelligence.[11] Despite these accomplishments, he was never fully recognised in his home country during his lifetime due to his homosexuality and because much of his work was covered by the Official Secrets Act."

In [4]:
text

'Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English[6] mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.[7] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.[8][9][10] Turing is widely considered to be the father of theoretical computer science and artificial intelligence.[11] Despite these accomplishments, he was never fully recognised in his home country during his lifetime due to his homosexuality and because much of his work was covered by the Official Secrets Act.'

<h5>Instantiate the TextBlob class</h5>

In [5]:
text_blob=TextBlob(text)

<h5>Text Tokenization</h5>

In [6]:
# Split the corpus into sentences.
text_blob.sentences

[Sentence("Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English[6] mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist."),
 Sentence("[7] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer."),
 Sentence("[8][9][10] Turing is widely considered to be the father of theoretical computer science and artificial intelligence."),
 Sentence("[11] Despite these accomplishments, he was never fully recognised in his home country during his lifetime due to his homosexuality and because much of his work was covered by the Official Secrets Act.")]

In [7]:
# Split the corpus into words
text_blob.words

WordList(['Alan', 'Mathison', 'Turing', 'OBE', 'FRS', 'ˈtjʊərɪŋ', '23', 'June', '1912', '–', '7', 'June', '1954', 'was', 'an', 'English', '6', 'mathematician', 'computer', 'scientist', 'logician', 'cryptanalyst', 'philosopher', 'and', 'theoretical', 'biologist', '7', 'Turing', 'was', 'highly', 'influential', 'in', 'the', 'development', 'of', 'theoretical', 'computer', 'science', 'providing', 'a', 'formalisation', 'of', 'the', 'concepts', 'of', 'algorithm', 'and', 'computation', 'with', 'the', 'Turing', 'machine', 'which', 'can', 'be', 'considered', 'a', 'model', 'of', 'a', 'general-purpose', 'computer', '8', '9', '10', 'Turing', 'is', 'widely', 'considered', 'to', 'be', 'the', 'father', 'of', 'theoretical', 'computer', 'science', 'and', 'artificial', 'intelligence', '11', 'Despite', 'these', 'accomplishments', 'he', 'was', 'never', 'fully', 'recognised', 'in', 'his', 'home', 'country', 'during', 'his', 'lifetime', 'due', 'to', 'his', 'homosexuality', 'and', 'because', 'much', 'of', 'hi

<h5>POS Tagging</h5>

In [8]:
print(text_blob.tags)

[('Alan', 'NNP'), ('Mathison', 'NNP'), ('Turing', 'NNP'), ('OBE', 'NNP'), ('FRS', 'NNP'), ('/ˈtjʊərɪŋ/', 'NNP'), ('23', 'CD'), ('June', 'NNP'), ('1912', 'CD'), ('–', 'NNP'), ('7', 'CD'), ('June', 'NNP'), ('1954', 'CD'), ('was', 'VBD'), ('an', 'DT'), ('English', 'JJ'), ('[', 'NN'), ('6', 'CD'), (']', 'NN'), ('mathematician', 'NN'), ('computer', 'NN'), ('scientist', 'NN'), ('logician', 'JJ'), ('cryptanalyst', 'NN'), ('philosopher', 'NN'), ('and', 'CC'), ('theoretical', 'JJ'), ('biologist', 'NN'), ('[', 'RB'), ('7', 'CD'), (']', 'JJ'), ('Turing', 'NNP'), ('was', 'VBD'), ('highly', 'RB'), ('influential', 'JJ'), ('in', 'IN'), ('the', 'DT'), ('development', 'NN'), ('of', 'IN'), ('theoretical', 'JJ'), ('computer', 'NN'), ('science', 'NN'), ('providing', 'VBG'), ('a', 'DT'), ('formalisation', 'NN'), ('of', 'IN'), ('the', 'DT'), ('concepts', 'NNS'), ('of', 'IN'), ('algorithm', 'NN'), ('and', 'CC'), ('computation', 'NN'), ('with', 'IN'), ('the', 'DT'), ('Turing', 'NNP'), ('machine', 'NN'), ('whi

In [9]:
# Extract Noun Phrases from the corpus
text_blob.noun_phrases

WordList(['alan mathison turing obe frs', 'june', 'june', 'english', '] mathematician', 'computer scientist', 'theoretical biologist', 'turing', 'theoretical computer science', 'turing', 'general-purpose computer', '] [', '] [', 'turing', 'theoretical computer science', 'artificial intelligence', 'home country', 'secrets'])

<h5>Lemmatization</h5>

In [10]:
lema=Word("notes")
lema.lemmatize()

'note'

In [11]:
lema=Word(text)
lema

'Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English[6] mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.[7] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.[8][9][10] Turing is widely considered to be the father of theoretical computer science and artificial intelligence.[11] Despite these accomplishments, he was never fully recognised in his home country during his lifetime due to his homosexuality and because much of his work was covered by the Official Secrets Act.'

In [12]:
text

'Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English[6] mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.[7] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.[8][9][10] Turing is widely considered to be the father of theoretical computer science and artificial intelligence.[11] Despite these accomplishments, he was never fully recognised in his home country during his lifetime due to his homosexuality and because much of his work was covered by the Official Secrets Act.'

<h5>Definition</h5>

In [13]:
print(Word("USA").definitions)

['North American republic containing 50 states - 48 conterminous states in North America plus Alaska in northwest North America and the Hawaiian Islands in the Pacific Ocean; achieved independence in 1776', 'the army of the United States of America; the agency that organizes and trains soldiers for land warfare']


<h5>N-grams</h5>

In [14]:
text_blob.ngrams(n=2)[0:10]

[WordList(['Alan', 'Mathison']),
 WordList(['Mathison', 'Turing']),
 WordList(['Turing', 'OBE']),
 WordList(['OBE', 'FRS']),
 WordList(['FRS', 'ˈtjʊərɪŋ']),
 WordList(['ˈtjʊərɪŋ', '23']),
 WordList(['23', 'June']),
 WordList(['June', '1912']),
 WordList(['1912', '–']),
 WordList(['–', '7'])]
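To make the bigram output above concrete, here is a minimal pure-Python sketch of how n-grams can be built from a token list. The helper name `ngrams` and the short token list are illustrative assumptions; this is not TextBlob's own implementation.

```python
def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a list."""
    # A sliding window of width n; yields nothing if there are fewer than n tokens.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


# The first few tokens of the Turing text, as shown in the word list earlier.
tokens = ["Alan", "Mathison", "Turing", "OBE", "FRS"]
bigrams = ngrams(tokens, n=2)
print(bigrams)
```

Five tokens give four bigrams, and each one overlaps its neighbour by one token, which is the same pattern visible in the `WordList` output.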

<h5>Spell Checking & Correction</h5>

In [15]:
# Spellchecking
word=Word("countr")
word.spellcheck()

[('count', 0.620746887966805),
 ('country', 0.3510373443983402),
 ('county', 0.014107883817427386),
 ('counter', 0.012448132780082987),
 ('counts', 0.0016597510373443983)]

In [16]:
# Spell correction
word.correct()

'count'
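The correction above returns 'count' rather than 'country' because the top-scored spellcheck candidate wins. The sketch below replays that choice in plain Python using the candidate list printed earlier; it assumes the correction simply takes the highest-confidence suggestion, and it does not call TextBlob.

```python
# Candidates and confidences copied from the spellcheck() output above.
candidates = [
    ("count", 0.620746887966805),
    ("country", 0.3510373443983402),
    ("county", 0.014107883817427386),
    ("counter", 0.012448132780082987),
    ("counts", 0.0016597510373443983),
]

# Pick the suggestion with the largest confidence score.
best_word, best_conf = max(candidates, key=lambda pair: pair[1])

# The confidences behave like a probability distribution over the candidates.
total = sum(conf for _, conf in candidates)

print(best_word, best_conf, total)
```

Because the ranking ignores context, a short misspelling such as "countr" can be corrected to a more frequent word than the one intended, so it is worth inspecting the full candidate list before trusting the correction.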

<h5>Word Frequency</h5>

In [17]:
text_blob.words.count("was")

4

<h5>Language Detection & Translation</h5>

In [18]:
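# Detect the language of the text.
# Note: detect_language() and translate() depend on an external translation service;
# they were deprecated in TextBlob 0.16.0 and may not be available in newer releases.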
text_blob.detect_language()

'en'

In [19]:
# Translate the text to Spanish
text_blob.translate(to='es')

TextBlob("Alan Mathison Turing OBE FRS (/ ˈtjʊərɪŋ /; 23 de junio de 1912 - 7 de junio de 1954) fue un matemático, científico informático, lógico, criptoanalista, filósofo y biólogo teórico inglés [6]. Turing fue muy influyente en el desarrollo de la informática teórica, proporcionando una formalización de los conceptos de algoritmo y computación con la máquina de Turing, que puede considerarse un modelo de computadora de uso general. [8] [9] [10] Turing es ampliamente considerado como el padre de la informática teórica y la inteligencia artificial. [11] A pesar de estos logros, nunca fue completamente reconocido en su país de origen durante su vida debido a su homosexualidad y porque gran parte de su trabajo estaba cubierto por la Ley de Secretos Oficiales.")

<h5>Sentiment Analysis</h5><hr>
In TextBlob, the `sentiment` property returns a polarity score and a subjectivity score. Polarity is a float in the range [-1.0, 1.0], where negative values indicate negative sentiment and positive values indicate positive sentiment. Subjectivity is a float in the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
TextBlob offers two strategies for analysing sentiment: the default implementation, based on the pattern library, and an NLTK-based implementation (`NaiveBayesAnalyzer`).

In [20]:
text_blob.sentiment[0]

-0.058125

In [21]:
text_blob.sentiment.polarity

-0.058125
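The two cells above give the same number because the sentiment result behaves like a named tuple, so index 0 and the `polarity` attribute refer to the same field. The sketch below imitates that with the standard library; the `Sentiment` type is a stand-in defined here, and the subjectivity value is a placeholder, not a result from the notebook.

```python
from collections import namedtuple

# Stand-in for the (polarity, subjectivity) result; defined here for illustration only.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Polarity is copied from the output above; subjectivity is a hypothetical placeholder.
example = Sentiment(polarity=-0.058125, subjectivity=0.5)

by_index = example[0]        # positional access, as in text_blob.sentiment[0]
by_name = example.polarity   # attribute access, as in text_blob.sentiment.polarity
print(by_index, by_name)
```

Attribute access is usually the clearer choice, since it does not rely on remembering the field order.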

In [22]:
for s in text_blob.sentences:
    print(s," : ",s.sentiment[0],"\n\n")

Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English[6] mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.  :  0.0 


[7] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.  :  0.08 


[8][9][10] Turing is widely considered to be the father of theoretical computer science and artificial intelligence.  :  -0.2333333333333333 


[11] Despite these accomplishments, he was never fully recognised in his home country during his lifetime due to his homosexuality and because much of his work was covered by the Official Secrets Act.  :  0.037500000000000006 




<h5>Text Classification</h5>

Get training and testing data

In [23]:
train = [
     ('I love this sandwich.', 'pos'),
     ('this is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('this is my best work.', 'pos'),
     ("what an awesome view", 'pos'),
     ('I do not like this restaurant', 'neg'),
     ('I am tired of this stuff.', 'neg'),
     ("I can't deal with this", 'neg'),
     ('he is my sworn enemy!', 'neg'),
     ('my boss is horrible.', 'neg')
 ]
test = [
     ('the beer was good.', 'pos'),
     ('I do not enjoy my job', 'neg'),
     ("I ain't feeling dandy today.", 'neg'),
     ("I feel amazing!", 'pos'),
     ('Gary is a friend of mine.', 'pos'),
     ("I can't believe I'm doing this.", 'neg')
 ]

In [24]:
train

[('I love this sandwich.', 'pos'),
 ('this is an amazing place!', 'pos'),
 ('I feel very good about these beers.', 'pos'),
 ('this is my best work.', 'pos'),
 ('what an awesome view', 'pos'),
 ('I do not like this restaurant', 'neg'),
 ('I am tired of this stuff.', 'neg'),
 ("I can't deal with this", 'neg'),
 ('he is my sworn enemy!', 'neg'),
 ('my boss is horrible.', 'neg')]

Train model

In [25]:
model=NaiveBayesClassifier(train)
model

<NaiveBayesClassifier trained on 10 instances>

Predict new data

In [26]:
model.classify('I ain\'t feeling dandy today')

'neg'

Model evaluation

In [27]:
model.accuracy(test)

0.8333333333333334
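An accuracy of 0.8333 on six test sentences corresponds to five correct predictions out of six. The sketch below shows that arithmetic in plain Python. The `predicted` list is hypothetical: the notebook does not show which test sentence was misclassified, so the position of the error here is an assumption.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Labels of the six test sentences, in the order they were defined above.
expected = ["pos", "neg", "neg", "pos", "pos", "neg"]

# Hypothetical predictions with exactly one mistake (which one is assumed).
predicted = ["pos", "neg", "neg", "pos", "neg", "neg"]

score = accuracy(predicted, expected)
print(score)
```

With only six test items, a single mistake moves the score by almost 17 percentage points, so this figure says little about how the classifier would do on a larger sample.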

Show the most informative features

In [28]:
model.show_informative_features()

Most Informative Features
            contains(my) = True              neg : pos    =      1.7 : 1.0
            contains(an) = False             neg : pos    =      1.6 : 1.0
             contains(I) = True              neg : pos    =      1.4 : 1.0
             contains(I) = False             pos : neg    =      1.4 : 1.0
            contains(my) = False             pos : neg    =      1.3 : 1.0
          contains(good) = False             neg : pos    =      1.2 : 1.0
          contains(work) = False             neg : pos    =      1.2 : 1.0
       contains(amazing) = False             neg : pos    =      1.2 : 1.0
         contains(tired) = False             pos : neg    =      1.2 : 1.0
      contains(sandwich) = False             neg : pos    =      1.2 : 1.0
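The ratios in this table can be read as likelihood ratios: how much more often a feature value occurs in one class than in the other. The sketch below is a simplified reconstruction, assuming add-0.5 smoothing over the two values True and False and a plain regex tokenizer; under those assumptions it appears to reproduce the first three rows, but it is not the library's code.

```python
import re

train = [
    ("I love this sandwich.", "pos"),
    ("this is an amazing place!", "pos"),
    ("I feel very good about these beers.", "pos"),
    ("this is my best work.", "pos"),
    ("what an awesome view", "pos"),
    ("I do not like this restaurant", "neg"),
    ("I am tired of this stuff.", "neg"),
    ("I can't deal with this", "neg"),
    ("he is my sworn enemy!", "neg"),
    ("my boss is horrible.", "neg"),
]


def tokens(text):
    """Case-sensitive set of word tokens (a simplification of real tokenization)."""
    return set(re.findall(r"[A-Za-z']+", text))


def smoothed_prob(word, label, present, gamma=0.5):
    """Smoothed P(contains(word) == present | label); gamma=0.5 is an assumption."""
    docs = [tokens(text) for text, lab in train if lab == label]
    count = sum((word in doc) == present for doc in docs)
    return (count + gamma) / (len(docs) + gamma * 2)  # 2 possible values: True, False


def ratio(word, present, label_a, label_b):
    """How much more likely the feature value is under label_a than label_b."""
    return smoothed_prob(word, label_a, present) / smoothed_prob(word, label_b, present)


print(round(ratio("my", True, "neg", "pos"), 1))
print(round(ratio("an", False, "neg", "pos"), 1))
print(round(ratio("I", True, "neg", "pos"), 1))
```

With ten training sentences, these ratios stay close to 1, which is why words such as "my" and "I" rank highly: the model has too little data to separate sentiment words from incidental ones.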


Update the model with new data

In [29]:
new_data = [('She is my best friend.', 'pos'),
             ("I'm happy to have a new friend.", 'pos'),
             ("Stay thirsty, my friend.", 'pos'),
             ("He ain't from around here.", 'neg')]
model.update(new_data)

True

<h5>Text Classification using Pandas Dataframe data</h5>

Load data

In [30]:
df=pd.read_csv('./datasets/spam.csv',usecols=['v1','v2'])

In [31]:
df.head()

| | v1 | v2 |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


Prepare data

In [32]:
df.columns=['target','text']
df=df[['text','target']]

In [33]:
df.head()

| | text | target |
|---|---|---|
| 0 | Go until jurong point, crazy.. Available only ... | ham |
| 1 | Ok lar... Joking wif u oni... | ham |
| 2 | Free entry in 2 a wkly comp to win FA Cup fina... | spam |
| 3 | U dun say so early hor... U c already then say... | ham |
| 4 | Nah I don't think he goes to usf, he lives aro... | ham |


In [34]:
df.shape

(5572, 2)

In [35]:
# Remove commas from every message in one vectorised step
df["text"] = df["text"].str.replace(",", "", regex=False)

Split data into training and testing sets

In [36]:
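# Note: these slices leave row 1500 in neither set
# (training starts at row 1501, testing ends at row 1499).
train=df.iloc[1501:,:]
test=df.iloc[0:1500,:]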

In [37]:
train.shape,test.shape

((4071, 2), (1500, 2))
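The two shapes add up to 5571 rows, one fewer than the 5572 in the DataFrame. The sketch below uses a plain list of row indices to show which row the slices skip, and how a single cut point avoids the gap. The variable names are illustrative only.

```python
# Stand-in for the DataFrame's 5572 row positions.
rows = list(range(5572))

# Same boundaries as the split above: end-exclusive slices.
test_rows = rows[0:1500]    # positions 0..1499
train_rows = rows[1501:]    # positions 1501..5571

# Any position that ended up in neither set.
missing = set(rows) - set(test_rows) - set(train_rows)
print(len(train_rows), len(test_rows), missing)

# Using one cut point for both slices covers every row exactly once.
cut = 1500
test_all = rows[:cut]
train_all = rows[cut:]
print(len(train_all), len(test_all))
```

Reusing one boundary variable for both slices is a simple way to guarantee that the two sets neither overlap nor drop rows.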

In [38]:
train=train.values.tolist()
test=test.values.tolist()

Train the classifier

In [39]:
model2=NaiveBayesClassifier(train)
model2

<NaiveBayesClassifier trained on 4071 instances>

Predict with the trained model

In [40]:
model2.classify('Oh my God. I\'m almost home')

'ham'

Prediction with probability distribution

In [41]:
prob_classify = model2.prob_classify("Oh my God. I\'m almost home")

In [42]:
# Return the class predicted
prob_classify.max()

'ham'

In [43]:
# Probability that the text is spam
prob_classify.prob("spam")

6.1054502744624495e-15

In [44]:
# Probability that the text is ham
prob_classify.prob("ham")

0.9999999999999901

Model evaluation

In [45]:
model2.accuracy(test)

0.978

Most informative features in the dataset

In [46]:
model2.show_informative_features(10) 

Most Informative Features
          contains(FREE) = True             spam : ham    =    308.4 : 1.0
             contains(T) = True             spam : ham    =    188.6 : 1.0
          contains(STOP) = True             spam : ham    =    184.1 : 1.0
           contains(Txt) = True             spam : ham    =    145.1 : 1.0
         contains(Nokia) = True             spam : ham    =    135.3 : 1.0
       contains(service) = True             spam : ham    =     91.9 : 1.0
          contains(Text) = True             spam : ham    =     81.2 : 1.0
         contains(apply) = True             spam : ham    =     77.7 : 1.0
       contains(receive) = True             spam : ham    =     77.7 : 1.0
         contains(await) = True             spam : ham    =     73.2 : 1.0
