## TextBlob

### Features

> Noun Phrase extraction

> Part-of-speech tagging

> Sentiment Analysis

> Classification (Naive Bayes, Decision Tree)

> Language translation and detection powered by Google Translate

> Tokenization (splitting text into words)

> Word and phrase frequencies

> Parsing

> n-grams 

> Word inflection (pluralization and singularization) and lemmatization

> Spelling correction 

> Add new models or languages through extensions

> WordNet integration

## Installation

1. `pip install textblob`
2. `python -m textblob.download_corpora`


### Create a TextBlob

In [2]:
from textblob import TextBlob

wiki = TextBlob("Love the way you teach natural langugae processing...I am abdul rehman")
wiki

TextBlob("Love the way you teach natural langugae processing...I am abdul rehman")

### Parts-of-Speech Tagging

Part-of-speech tags can be accessed through the `tags` property, which returns a list of `(word, tag)` tuples.

In [11]:
wiki.tags

[('Love', 'VB'),
 ('the', 'DT'),
 ('way', 'NN'),
 ('you', 'PRP'),
 ('teach', 'VBP'),
 ('natural', 'JJ'),
 ('langugae', 'NN'),
 ('processing', 'NN'),
 ('I', 'PRP'),
 ('am', 'VBP'),
 ('abdul', 'JJ'),
 ('rehman', 'NN')]

### Noun Phrase Extraction

Noun phrases are accessed through the `noun_phrases` property.

In [12]:
# Extract all noun phrases from the text
wiki.noun_phrases

WordList(['love', 'natural langugae processing ...', 'abdul rehman'])

### Sentiment Analysis

The `sentiment` property returns a named tuple of the form `Sentiment(polarity, subjectivity)`. The polarity score is a float in the range [-1.0, 1.0]. The subjectivity score is a float in the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.

Polarity indicates the direction of the sentiment: it is negative for a negative sentence, 0 for a neutral one, and positive for a positive one.

In [14]:
test_sample = TextBlob("Text Blob is an amazing lib. I would prefer my other friends to use that one too...")
test_sample.sentiment

Sentiment(polarity=0.23750000000000004, subjectivity=0.6375)

In [3]:
test_sample = TextBlob("Hey are you?")
test_sample.sentiment

Sentiment(polarity=0.0, subjectivity=0.0)

In [9]:
test_sample = TextBlob("Ali is not a good boy")
test_sample.sentiment

Sentiment(polarity=-0.35, subjectivity=0.6000000000000001)

In [10]:
test_sample.sentiment.polarity,test_sample.sentiment.subjectivity

(-0.35, 0.6000000000000001)
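
To make the polarity ranges described above concrete, here is a minimal sketch that maps a polarity score to a label. The function name `polarity_label` and the `threshold` parameter are assumptions made for illustration; they are not part of TextBlob.

```python
def polarity_label(polarity, threshold=0.0):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    `threshold` is a hypothetical dead zone around 0 that is treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Polarity values taken from the three cells above
labels = [polarity_label(p) for p in (0.23750000000000004, 0.0, -0.35)]
print(labels)
```

A non-zero `threshold` can be useful when very weak scores should not count as positive or negative.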

### Tokenization

> Splitting a sentence into words

> Splitting paragraphs into sentences

In [17]:
word = TextBlob("New books are amazing. I will definitely pick up one and will read it precisely.")

word.words

WordList(['New', 'books', 'are', 'amazing', 'I', 'will', 'definitely', 'pick', 'up', 'one', 'and', 'will', 'read', 'it', 'precisely'])

In [18]:
word.sentences

[Sentence("New books are amazing."),
 Sentence("I will definitely pick up one and will read it precisely.")]
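
For intuition only, the two kinds of splitting can be approximated with regular expressions. This is a rough sketch under simple assumptions (sentences end with `.`, `!` or `?` followed by whitespace); it is not how TextBlob tokenizes, and the helper names are made up for this example.

```python
import re


def split_sentences(text):
    # Split after ., ! or ? when followed by whitespace (naive rule)
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    # Keep runs of letters, digits and apostrophes; punctuation is dropped
    return re.findall(r"[A-Za-z0-9']+", text)


text = "New books are amazing. I will definitely pick up one and will read it precisely."
sentences = split_sentences(text)
tokens = split_words(text)
print(sentences)
print(tokens)
```

A naive rule like this breaks on abbreviations such as "e.g.", which is one reason to prefer a trained tokenizer for real text.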

### Word Inflection and Lemmatization

> Word inflection converts words to their singular or plural form


> Lemmatization reduces a word to its base dictionary form (its lemma)

In [21]:
sentence = TextBlob("Use 4 spaces per indent Level")

sentence.words

WordList(['Use', '4', 'spaces', 'per', 'indent', 'Level'])

In [25]:
sentence.words[2].singularize(),sentence.words[4].pluralize()

('space', 'indents')

In [27]:
from textblob import Word


# lemmatize() returns the base form of a word; you can optionally pass the part of speech to lemmatize as (for example 'v' for verb)
ans = Word('loins')
ans.lemmatize()

'loin'

In [31]:
ans = Word('went')
ans.lemmatize('v')



'go'

### WordNet Integration

WordNet is a lexical database of the English language that works like a combined dictionary and thesaurus.

A synset (synonym set) is a group of synonyms that express the same concept. The `synsets` property uses NLTK's WordNet interface to look up every synset a word belongs to. Some words have only one synset and some have many.

In [34]:
from textblob import Word 
from textblob.wordnet import VERB 


# List the synsets of a word. In 'goat.n.01', n is the part of speech (noun) and 01 is the sense number, not a score
word = Word('goat')
word.synsets

[Synset('goat.n.01'),
 Synset('butt.n.03'),
 Synset('capricorn.n.01'),
 Synset('capricorn.n.03')]

In [35]:
# 'sun' has both noun (n) and verb (v) synsets; the digits are sense numbers, not scores
word = Word('sun')
word.synsets

[Synset('sun.n.01'),
 Synset('sunlight.n.01'),
 Synset('sun.n.03'),
 Synset('sun.n.04'),
 Synset('sunday.n.01'),
 Synset('sun.v.01'),
 Synset('sun.v.02')]

In [37]:
word = Word('hack')
word.get_synsets(pos = VERB)

[Synset('chop.v.05'),
 Synset('hack.v.02'),
 Synset('hack.v.03'),
 Synset('hack.v.04'),
 Synset('hack.v.05'),
 Synset('hack.v.06'),
 Synset('hack.v.07'),
 Synset('hack.v.08')]
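
The synset names in the outputs above follow a `lemma.pos.number` pattern. As a small illustration, the sketch below splits such a name into its parts; `parse_synset_name` is a hypothetical helper written for this example, not a TextBlob or NLTK function.

```python
def parse_synset_name(name):
    """Split a name such as 'hack.v.08' into (lemma, part of speech, sense number)."""
    # rsplit from the right so a lemma containing dots stays intact
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)


parsed = [parse_synset_name(n) for n in ("goat.n.01", "sun.v.02", "hack.v.08")]
print(parsed)
```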

#### Checking Definitions of Some Words
You can look up the definitions of any word through the WordNet integration.

In [38]:
from textblob import Word
Word('Sun').definitions

['the star that is the source of light and heat for the planets in the solar system',
 'the rays of the sun',
 'a person considered as a source of warmth or energy or glory etc',
 'any star around which a planetary system revolves',
 'first day of the week; observed as a day of rest and worship by most Christians',
 "expose one's body to the sun",
 'expose to the rays of the sun or affect by exposure to the sun']

In [39]:
Word('Book').definitions

['a written work or composition that has been published (printed on pages bound together)',
 'physical objects consisting of a number of pages bound together',
 'a compilation of the known facts regarding something or someone',
 'a written version of a play or other dramatic composition; used in preparing for a performance',
 'a record in which commercial accounts are recorded',
 'a collection of playing cards satisfying the rules of a card game',
 'a collection of rules or prescribed standards on the basis of which decisions are made',
 'the sacred writings of Islam revealed by God to the prophet Muhammad during his life at Mecca and Medina',
 'the sacred writings of the Christian religions',
 'a major division of a long written composition',
 'a number of sheets (ticket or stamps etc.) bound together on one edge',
 'engage for a performance',
 'arrange for and reserve (something for someone else) in advance',
 'record a charge in a police register',
 'register in a hotel booker']

### WordList

A `WordList` behaves like a regular Python list, with additional methods.

In [45]:
from textblob import TextBlob
animals  = TextBlob('animal,elephant,mouse,cat')
animals.words

WordList(['animal', 'elephant', 'mouse', 'cat'])

In [47]:
animals.words.pluralize(),animals.words.singularize()

(WordList(['animals', 'elephants', 'mice', 'cats']),
 WordList(['animal', 'elephant', 'mouse', 'cat']))

### Spelling Correction

To correct spelling, use the `correct()` method. For a single word, `Word.spellcheck()` returns a list of `(word, confidence)` suggestions.

In [52]:
# Not always accurate ('mmy' becomes 'may' below), but TextBlob does provide this functionality.

data = TextBlob("can you sayy mmy name in loud format?")
data.correct()

TextBlob("can you say may name in loud format?")

In [55]:
from textblob import Word

spell = Word("numbr")
spell.spellcheck()

[('number', 0.9901315789473685), ('numb', 0.009868421052631578)]
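
TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below shows that general idea in plain Python: generate every string one edit away from the input, keep those found in a word-frequency table, and turn their counts into confidences. The tiny `freq` table and its counts are invented for illustration, and this is not TextBlob's actual code.

```python
import string


def edits1(word):
    """Return all strings one delete, transpose, replace or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def spellcheck(word, freq):
    """Return (candidate, confidence) pairs, most likely candidate first."""
    if word in freq:
        return [(word, 1.0)]
    candidates = [w for w in edits1(word) if w in freq]
    if not candidates:
        return [(word, 0.0)]
    total = sum(freq[w] for w in candidates)
    scored = [(w, freq[w] / total) for w in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical word counts, chosen only for this example
freq = {"number": 301, "numb": 3, "name": 50}
suggestions = spellcheck("numbr", freq)
print(suggestions)
```

Confidence here is simply each candidate's share of the total count, which is why common words dominate the suggestions.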

### Get Word & Noun Phrase Frequency

> use the `word_counts` dictionary to look up the frequency of every word

> use the `count()` method of `words` or `noun_phrases` to count a specific word or phrase

In [56]:
words = TextBlob("some words are not simply the words. these words are just like kind of expressions or gestures")
words.word_counts

defaultdict(int,
            {'some': 1,
             'words': 3,
             'are': 2,
             'not': 1,
             'simply': 1,
             'the': 1,
             'these': 1,
             'just': 1,
             'like': 1,
             'kind': 1,
             'of': 1,
             'expressions': 1,
             'or': 1,
             'gestures': 1})

In [57]:
# Without parentheses this only shows the bound method; call it with a word to get a count
words.words.count

<bound method WordList.count of WordList(['some', 'words', 'are', 'not', 'simply', 'the', 'words', 'these', 'words', 'are', 'just', 'like', 'kind', 'of', 'expressions', 'or', 'gestures'])>

In [64]:
word = TextBlob("some words are not simply the words. these words are just like kind of expressions or gestures")

word.words.count('words',case_sensitive=False)

3

In [66]:

word.noun_phrases.count('gestures',case_sensitive=False)

0
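
As a point of comparison, a similar frequency table can be built with the standard library alone. This sketch assumes that lowercasing the text and splitting on runs of word characters is an acceptable stand-in for tokenization; it does not use TextBlob.

```python
import re
from collections import Counter

text = "some words are not simply the words. these words are just like kind of expressions or gestures"

# Lowercase first so that counting is case-insensitive
counts = Counter(re.findall(r"\w+", text.lower()))

print(counts["words"])        # frequency of a single word
print(counts.most_common(2))  # the two most frequent words
```

A `Counter` returns 0 for words that never occur, so lookups do not raise `KeyError`.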

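### Translation & Language Detection

TextBlob can translate text between languages through Google Translate. Note that `translate()` and `detect_language()` were deprecated in TextBlob 0.16.0 and are no longer available in recent releases, so the cells below require an older version.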

In [90]:
from textblob import TextBlob

word = TextBlob(u'Somehting is better than nothing')
word.translate(from_lang='en', to='ur')

TextBlob("کچھ بھی نہیں سے بہتر ہے")

In [95]:
word = TextBlob(u'Come Let have a fun. keep quiet I am studying. I used to take class everyday')
word.translate(from_lang='en', to='ur')

TextBlob("آؤ ایک مزہ کریں۔ خاموش رہو میں پڑھ رہا ہوں۔ میں ہر روز کلاس لیتا تھا")

In [96]:

word = TextBlob(u"آؤ ایک مزہ کریں۔ خاموش رہو میں پڑھ رہا ہوں۔ میں ہر روز کلاس لیتا تھا")
word.translate(from_lang='ur', to='en')


TextBlob("Let's have a fun. Keep quiet I'm reading. I used to take class every day")

### TextBlobs Behave Like Python Strings


In [107]:
from textblob import TextBlob

text = TextBlob("Some people are walking around the park")

text.upper()
text.find('are')  # returns the index of the first occurrence of 'are'

word1 = TextBlob("word1")
word2 = TextBlob("word2")
word1>word2

False

### n-grams

An n-gram is a sequence of n consecutive words taken from a sentence. N-grams are often generated before model training and passed to the model as features, so that it can pick up word order and recurring phrases. This is useful, for example, in chatbots.

In [110]:
blob = TextBlob("Somehting is better than Nothing")
blob.ngrams(n=3)  # every run of 3 consecutive words

[WordList(['Somehting', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'Nothing'])]

In [112]:
blob = TextBlob("Somehting is better than Nothing")
blob.ngrams(n=2)  # every run of 2 consecutive words

[WordList(['Somehting', 'is']),
 WordList(['is', 'better']),
 WordList(['better', 'than']),
 WordList(['than', 'Nothing'])]
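
The sliding-window idea behind n-grams fits in a few lines of plain Python. This is a minimal sketch that assumes the text is already split into words; the function name `ngrams` simply mirrors the method used above and is defined here independently of TextBlob.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a list of lists."""
    # A sentence with k words has k - n + 1 n-grams (none if n > k)
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = "Something is better than Nothing".split()
trigrams = ngrams(words, 3)
bigrams = ngrams(words, 2)
print(trigrams)
print(bigrams)
```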

### Text Classification System

In [126]:
train = [
    ('I Love this burger','pos'),
    ('This is an amazing place','pos'),
    ('I can smell the aroma of good food','pos'),
    ('This is my best possible work','pos'),
    ('What an awesome park','pos'),
    ('I donot like you','neg'),
    ('I am tired from you','neg'),
    ('I cannot pull his legs','neg'),
    ('he is my dangerous enemy!','neg'),
    ('my friend is horrible','neg'),    
]


test = [
    ('The burger is looking amazing','pos'),
    ('I donot like the way you react to people','neg'),
    ('I am not feeling fresh today','neg'),
    ('Your looks was amazing','pos'),
    ('He is afraid of you','neg'),
    ('I am having fun doing this','neg'),
  
]

In [127]:
from textblob.classifiers import NaiveBayesClassifier
cl = NaiveBayesClassifier(train)

In [130]:
cl.classify("you are not good boy")


'neg'

In [140]:
prob_dist = cl.prob_classify("you are not good boy")
prob_dist.max()

'neg'

In [143]:
# Probability assigned to the 'neg' label
prob_dist.prob('neg')

0.9260564740864501

In [144]:
# Evaluate the classifier on the test set

cl.accuracy(test)

1.0

In [145]:
cl.show_informative_features(5)

Most Informative Features
            contains(my) = True              neg : pos    =      1.7 : 1.0
            contains(an) = False             neg : pos    =      1.6 : 1.0
          contains(This) = False             neg : pos    =      1.6 : 1.0
           contains(you) = False             pos : neg    =      1.6 : 1.0
             contains(I) = True              neg : pos    =      1.4 : 1.0
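
To show roughly what a Naive Bayes text classifier computes, here is a self-contained sketch of a multinomial Naive Bayes with add-one smoothing. It is a simplified illustration, not the NLTK-based implementation behind `NaiveBayesClassifier`: the whitespace tokenizer, the smoothing choice and all function names are assumptions made for this example, so its probabilities will not match the outputs above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per label and words per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def prob_classify(model, text):
    """Return a dict mapping each label to its posterior probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise in log space for numerical stability
    top = max(log_scores.values())
    unnormalised = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnormalised.values())
    return {label: value / z for label, value in unnormalised.items()}


def classify(model, text):
    probs = prob_classify(model, text)
    return max(probs, key=probs.get)


# A subset of the training sentences used above
train = [
    ("I Love this burger", "pos"),
    ("This is an amazing place", "pos"),
    ("What an awesome park", "pos"),
    ("I am tired from you", "neg"),
    ("my friend is horrible", "neg"),
    ("he is my dangerous enemy!", "neg"),
]
model = train_naive_bayes(train)
print(classify(model, "an amazing burger"))
print(prob_classify(model, "my friend is horrible"))
```

With so few training sentences, single words such as "my" or "an" dominate the decision, which is also what the informative-features table above suggests.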


### Updating the Classifier

You can update the classifier with new training data using the `update()` method.

In [146]:
# Retrain the existing classifier with additional labelled examples
new_data = [
    
    ('he is my best friend','pos'),
    ('I am very happy today','pos'),
    ('you are my best friend','pos'),
    ('he didnot tell a lie','pos')
]
cl.update(new_data)

True

In [147]:
cl.accuracy(test)

0.6666666666666666
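
Accuracy is the fraction of test examples whose predicted label matches the true label, so a value of 0.666... on six test sentences corresponds to four correct predictions. The sketch below computes it for any classifier function; `toy_classify` and the three sample sentences are made up for illustration and are unrelated to the trained classifier above.

```python
def accuracy(classify, labelled_examples):
    """Fraction of (text, label) pairs for which classify(text) == label."""
    correct = sum(1 for text, label in labelled_examples if classify(text) == label)
    return correct / len(labelled_examples)


def toy_classify(text):
    # Hypothetical rule: anything containing "amazing" is positive
    return "pos" if "amazing" in text.lower() else "neg"


samples = [
    ("The burger is looking amazing", "pos"),
    ("He is afraid of you", "neg"),
    ("I am very happy today", "pos"),  # the toy rule gets this one wrong
]
score = accuracy(toy_classify, samples)
print(score)
```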

In [152]:
cl.classify("I dondot like you")

'neg'

### For More Information

> [TextBlob classifier documentation](https://textblob.readthedocs.io/en/dev/classifiers.html)
