## TextBlob

TextBlob is a Python library that offers a simple API for common NLP tasks such as tokenization, part-of-speech tagging, sentiment analysis, and spelling correction.

### Installation

!pip install -U textblob
!python -m textblob.download_corpora

### Tokenization using TextBlob

- Create a TextBlob object by passing it a string.
- Call methods on that object to perform a specific task.

In [2]:
from textblob import TextBlob

# Create a TextBlob object by passing it a string
blob = TextBlob("Analytics Vidhya is a great platform to learn data science. \n It helps community through blogs, hackathons, discussions,etc.")

blob.sentences

[Sentence("Analytics Vidhya is a great platform to learn data science."),
 Sentence("It helps community through blogs, hackathons, discussions,etc.")]

In [7]:
# One WordList per sentence (the extra outer comprehension was redundant)
words = [sentence.words for sentence in blob.sentences]

In [8]:
print(words)

[WordList(['Analytics', 'Vidhya', 'is', 'a', 'great', 'platform', 'to', 'learn', 'data', 'science']), WordList(['It', 'helps', 'community', 'through', 'blogs', 'hackathons', 'discussions', 'etc'])]
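The result above keeps one list of words per sentence. If a single flat list of tokens is preferred, the nested lists can be flattened. The sketch below uses plain Python lists as stand-ins for the WordList objects, so it runs without TextBlob installed. For the whole text, TextBlob's own `blob.words` property also gives a flat list directly.

```python
from itertools import chain

# Plain lists standing in for the per-sentence WordList objects shown above.
words_per_sentence = [
    ["Analytics", "Vidhya", "is", "a", "great", "platform",
     "to", "learn", "data", "science"],
    ["It", "helps", "community", "through", "blogs",
     "hackathons", "discussions", "etc"],
]

# chain.from_iterable walks each inner list in order and yields its items.
flat_words = list(chain.from_iterable(words_per_sentence))
print(flat_words)
```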


### Noun Phrase Extraction

In [9]:
noun_phrase = [np for np in blob.noun_phrases]
print(noun_phrase)

['analytics vidhya', 'great platform', 'data science']


### Part-of-speech Tagging

In [11]:

blob.tags

[('Analytics', 'NNS'),
 ('Vidhya', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('great', 'JJ'),
 ('platform', 'NN'),
 ('to', 'TO'),
 ('learn', 'VB'),
 ('data', 'NNS'),
 ('science', 'NN'),
 ('It', 'PRP'),
 ('helps', 'VBZ'),
 ('community', 'NN'),
 ('through', 'IN'),
 ('blogs', 'NNS'),
 ('hackathons', 'NNS'),
 ('discussions', 'NNS'),
 ('etc', 'FW')]

### Words Inflection
Inflection is a process of word formation in which characters are added to the base form of a word to express grammatical meaning. In TextBlob, the words tokenized from a blob can easily be converted to their singular or plural forms.


In [16]:

print(blob.sentences[1].words[1])
print(blob.sentences[1].words.singularize())

from textblob import Word
w = Word('Platform')
w.pluralize()

helps
['It', 'help', 'community', 'through', 'blog', 'hackathon', 'discussion', 'etc']


'Platforms'
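To make the idea of "adding characters to the base form" concrete, here is a deliberately naive pluralizer in plain Python. The function name `naive_pluralize` and its three suffix rules are assumptions made for illustration only. TextBlob's own inflection rules are far more complete and also handle irregular nouns.

```python
def naive_pluralize(word):
    """Pluralize an English noun using three simple suffix rules (illustrative only)."""
    lower = word.lower()
    # Sibilant endings take "es": box -> boxes, church -> churches.
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    # Consonant + "y" becomes "ies": community -> communities.
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        return word[:-1] + "ies"
    # Default rule: append "s".
    return word + "s"


for noun in ["Platform", "community", "box"]:
    print(noun, "->", naive_pluralize(noun))
```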

### Lemmatization
Words can be reduced to their base (dictionary) form using the lemmatize function, which accepts an optional part-of-speech argument.

In [20]:
w = Word('running')
w.lemmatize("v")  # "v" marks the word as a verb

'run'

### N-grams

A sequence of N consecutive words is called an N-gram. N-grams (N > 1) are generally more informative than single words and can be used as features for language modelling. TextBlob exposes them through the ngrams function, which returns a list of WordList objects, each holding n successive words.

In [26]:
for ngram in blob.ngrams(3):
    print(ngram)

['Analytics', 'Vidhya', 'is']
['Vidhya', 'is', 'a']
['is', 'a', 'great']
['a', 'great', 'platform']
['great', 'platform', 'to']
['platform', 'to', 'learn']
['to', 'learn', 'data']
['learn', 'data', 'science']
['data', 'science', 'It']
['science', 'It', 'helps']
['It', 'helps', 'community']
['helps', 'community', 'through']
['community', 'through', 'blogs']
['through', 'blogs', 'hackathons']
['blogs', 'hackathons', 'discussions']
['hackathons', 'discussions', 'etc']
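The output above is a sliding window over the token list: each trigram starts one word later than the previous one. A minimal plain-Python sketch of that window follows. The helper name `ngrams` and the sample tokens are chosen here for illustration, and this is not TextBlob's implementation.

```python
def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens, in order."""
    if n <= 0:
        return []
    # A list of k tokens has k - n + 1 windows of length n (none if k < n).
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


tokens = ["Analytics", "Vidhya", "is", "a", "great", "platform"]
trigrams = ngrams(tokens, 3)
for trigram in trigrams:
    print(trigram)
```

Because the window never resets at sentence boundaries, n-grams such as `['data', 'science', 'It']` in the output above span two sentences.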


### Sentiment Analysis

Sentiment analysis is the process of determining the attitude or emotion of the writer, i.e., whether it is positive, negative, or neutral.
The sentiment property of TextBlob returns a named tuple with two fields, polarity and subjectivity.
Polarity is a float in the range [-1, 1], where 1 means a positive statement and -1 a negative one. Subjectivity is a float in the range [0, 1], where 0 is very objective and 1 is very subjective. Subjective sentences generally express personal opinion, emotion, or judgment, whereas objective sentences convey factual information.

In [27]:
blob.sentiment

Sentiment(polarity=0.8, subjectivity=0.75)
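One way to turn the two scores into readable labels is sketched below. The `Sentiment` named tuple here is a local stand-in with the same two fields as the result above. The `describe` helper and both cutoff values are assumptions picked for illustration, not thresholds defined by TextBlob.

```python
from collections import namedtuple

# Local stand-in with the same fields as the result shown above.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def describe(sentiment, polarity_cutoff=0.1, subjectivity_cutoff=0.5):
    """Map the numeric scores to coarse labels using hypothetical cutoffs."""
    if sentiment.polarity > polarity_cutoff:
        tone = "positive"
    elif sentiment.polarity < -polarity_cutoff:
        tone = "negative"
    else:
        tone = "neutral"
    if sentiment.subjectivity >= subjectivity_cutoff:
        kind = "subjective"
    else:
        kind = "objective"
    return tone, kind


result = describe(Sentiment(polarity=0.8, subjectivity=0.75))
print(result)
```

The cutoffs should be tuned on real data. A small dead zone around zero keeps weakly scored text from being labeled positive or negative.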

### Spelling Correction

Spelling correction is available through the correct function, as shown below.

In [29]:
blob1 = TextBlob('Analytics Vidhya is a gret platfrm to learn data scence')
blob1.correct()

TextBlob("Analytics Vidhya is a great platform to learn data science")

In [31]:
# Check the list of suggested words and their confidence scores using the spellcheck function
blob1.words[4].spellcheck()

[('great', 0.5351351351351351),
 ('get', 0.3162162162162162),
 ('grew', 0.11216216216216217),
 ('grey', 0.026351351351351353),
 ('greet', 0.006081081081081081),
 ('fret', 0.002702702702702703),
 ('grit', 0.0006756756756756757),
 ('cret', 0.0006756756756756757)]
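The confidences above sum to 1, which is what one would expect if each candidate's score is its share of some frequency count. TextBlob's documentation describes its corrector as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below follows that general idea in plain Python. The tiny `WORD_COUNTS` table is made up for illustration, and the code is not TextBlob's implementation.

```python
from collections import Counter

# Hypothetical toy counts; a real corrector learns these from a large corpus.
WORD_COUNTS = Counter({"great": 80, "get": 50, "grit": 5, "platform": 20})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one delete, swap, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)


def spellcheck(word):
    """Return (candidate, confidence) pairs, most frequent candidate first."""
    if word in WORD_COUNTS:
        return [(word, 1.0)]
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return [(word, 0.0)]
    total = sum(WORD_COUNTS[w] for w in candidates)
    ranked = sorted(candidates, key=WORD_COUNTS.get, reverse=True)
    # Confidence is the candidate's share of the combined candidate counts.
    return [(w, WORD_COUNTS[w] / total) for w in ranked]


suggestions = spellcheck("gret")
print(suggestions)
```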

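### Translation and Language Detection

Note: detect_language and translate relied on the Google Translate web API. They were deprecated in TextBlob 0.16.0 and removed in 0.18.0, so the cell below runs only on older releases.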

In [46]:
blob = TextBlob('هذا رائع')

blob.detect_language()

blob.translate(to='en')

TextBlob("that's cool")

## Text classification using TextBlob

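TextBlob provides a built-in classifiers module for creating a custom classifier. Let's import it and build a basic one. Note that this toy dataset assigns its labels unconventionally: the unfavorable reviews are tagged 'pos' and the favorable ones 'neg'.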

In [32]:
training = [
('Tom Holland is a terrible spiderman.','pos'),
('a terrible Javert (Russell Crowe) ruined Les Miserables for me...','pos'),
('The Dark Knight Rises is the greatest superhero movie ever!','neg'),
('Fantastic Four should have never been made.','pos'),
('Wes Anderson is my favorite director!','neg'),
('Captain America 2 is pretty awesome.','neg'),
('Let\'s pretend "Batman and Robin" never happened..','pos'),
]

testing = [
('Superman was never an interesting character.','pos'),
('Fantastic Mr Fox is an awesome film!','neg'),
('Dragonball Evolution is simply terrible!!','pos')
]


In [34]:
from textblob import classifiers

#NaiveBayes Classifier
classifier = classifiers.NaiveBayesClassifier(training)

#Decision Tree classifier
dt_classifier = classifiers.DecisionTreeClassifier(training)

In [37]:
print(classifier.accuracy(testing))
classifier.show_informative_features(10)

1.0
Most Informative Features
            contains(is) = True              neg : pos    =      2.9 : 1.0
      contains(terrible) = False             neg : pos    =      1.8 : 1.0
         contains(never) = False             neg : pos    =      1.8 : 1.0
             contains(a) = False             neg : pos    =      1.8 : 1.0
      contains(director) = False             pos : neg    =      1.4 : 1.0
           contains(The) = False             pos : neg    =      1.4 : 1.0
     contains(superhero) = False             pos : neg    =      1.4 : 1.0
      contains(greatest) = False             pos : neg    =      1.4 : 1.0
        contains(Knight) = False             pos : neg    =      1.4 : 1.0
         contains(movie) = False             pos : neg    =      1.4 : 1.0
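Each `contains(word)` row above is a boolean bag-of-words feature: for every word in the training vocabulary, the feature records whether that word occurs in the document being classified. The sketch below shows the idea in plain Python. The helper names, the toy training pairs, and the whitespace tokenization are simplifying assumptions, and TextBlob's default feature extractor uses its own tokenizer.

```python
def build_vocabulary(train_set):
    """Collect every distinct token seen in the training texts."""
    vocabulary = set()
    for text, _label in train_set:
        vocabulary.update(text.split())
    return vocabulary


def contains_features(document, vocabulary):
    """Map each vocabulary word to True/False depending on its presence."""
    tokens = set(document.split())
    return {f"contains({word})": (word in tokens) for word in sorted(vocabulary)}


# Hypothetical two-example training set, for illustration only.
toy_train = [
    ("an awesome film", "favorable"),
    ("a terrible film", "unfavorable"),
]
vocabulary = build_vocabulary(toy_train)
features = contains_features("simply awesome", vocabulary)
print(features)
```

Words that never appeared in training ("simply" here) produce no feature at all, so a classifier built on these features has nothing to go on for them.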


In [45]:
blob = TextBlob('the weather is Fantastic!', classifier=dt_classifier)
print(blob.classify())

neg
