# TextBlob
- TextBlob is a Python library for processing textual data.
- It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and more.
- TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

# Features
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration

# Get it now
- $ pip install -U textblob

- $ python -m textblob.download_corpora

### Create TextBlob
- First, import the TextBlob class.


In [12]:
from textblob import TextBlob

- Create our first TextBlob.

In [25]:
wiki = TextBlob("Use 4 spaces per indentation level. "
                "Beautiful is better than ugly. "
                "Explicit is better than implicit. "
                "Simple is better than complex.")
wiki

TextBlob("Use 4 spaces per indentation level. Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex.")

### Part-of-speech Tagging
- Part-of-speech tags can be accessed through the tags property.

In [26]:
wiki.tags

[('Use', 'NNP'),
 ('4', 'CD'),
 ('spaces', 'NNS'),
 ('per', 'IN'),
 ('indentation', 'NN'),
 ('level', 'NN'),
 ('Beautiful', 'NNP'),
 ('is', 'VBZ'),
 ('better', 'JJR'),
 ('than', 'IN'),
 ('ugly', 'RB'),
 ('Explicit', 'NNP'),
 ('is', 'VBZ'),
 ('better', 'JJR'),
 ('than', 'IN'),
 ('implicit', 'NN'),
 ('Simple', 'NN'),
 ('is', 'VBZ'),
 ('better', 'JJR'),
 ('than', 'IN'),
 ('complex', 'JJ')]

### Noun Phrase Extraction
- Similarly, noun phrases are accessed through the noun_phrases property.

In [27]:
wiki.noun_phrases

WordList(['indentation level', 'beautiful', 'explicit', 'simple'])

### Sentiment Analysis
- The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). 
- The polarity score is a float within the range [-1.0, 1.0]. 
- The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.

In [28]:
wiki.sentiment

Sentiment(polarity=0.19285714285714287, subjectivity=0.6081632653061224)

In [29]:
wiki.sentiment.polarity

0.19285714285714287
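- The namedtuple shape described above can be illustrated with the standard library alone. This is a minimal sketch: the `Sentiment` type is rebuilt by hand with the values copied from the output above, and the `label` helper with its 0.1 threshold is a hypothetical addition, not part of TextBlob.

```python
from collections import namedtuple

# Same field names as the tuple shown above; values copied from that output.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])
s = Sentiment(polarity=0.19285714285714287, subjectivity=0.6081632653061224)

# Fields can be read by name, by position, or by unpacking.
polarity, subjectivity = s
assert s.polarity == s[0] == polarity


def label(polarity, threshold=0.1):
    """Hypothetical helper: map a polarity score to a coarse label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


print(label(s.polarity))  # positive
```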

### Tokenization
- You can break TextBlobs into words or sentences.

In [30]:
wiki.words

WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level', 'Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])

In [31]:
wiki.sentences

[Sentence("Use 4 spaces per indentation level."),
 Sentence("Beautiful is better than ugly."),
 Sentence("Explicit is better than implicit."),
 Sentence("Simple is better than complex.")]

In [32]:
for sentence in wiki.sentences:
    print(sentence.sentiment)

Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.2166666666666667, subjectivity=0.8333333333333334)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.06666666666666667, subjectivity=0.41904761904761906)


### Word Inflection and Lemmatization
- Each word in TextBlob.words or Sentence.words is a Word object (a subclass of Python's string type: str in Python 3, unicode in Python 2) with useful methods, e.g. for word inflection.

In [33]:
wiki.words

WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level', 'Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])

In [34]:
wiki.words[2].singularize()

'space'

In [35]:
wiki.words[-1].pluralize()

'complexes'

- Words can be lemmatized by calling the lemmatize method.

In [36]:
from textblob import Word
w = Word("octopi")
w.lemmatize()

'octopus'

In [37]:
w = Word("went")
w.lemmatize("v")  # Pass in WordNet part of speech (verb)

'go'

### WordNet Integration
- You can access the synsets for a Word via the synsets property or the get_synsets method, optionally passing in a part of speech.

In [38]:
from textblob import Word
from textblob.wordnet import VERB
word = Word("octopus")
word.synsets

[Synset('octopus.n.01'), Synset('octopus.n.02')]

In [39]:
Word("hack").get_synsets(pos=VERB)

[Synset('chop.v.05'),
 Synset('hack.v.02'),
 Synset('hack.v.03'),
 Synset('hack.v.04'),
 Synset('hack.v.05'),
 Synset('hack.v.06'),
 Synset('hack.v.07'),
 Synset('hack.v.08')]

- You can access the definitions for each synset via the definitions property or the define() method, which can also take an optional part-of-speech argument.

In [40]:
Word("octopus").definitions

['tentacles of octopus prepared as food',
 'bottom-living cephalopod having a soft oval body with eight long tentacles']

- You can also create synsets directly.

In [41]:
from textblob.wordnet import Synset
octopus = Synset('octopus.n.02')
shrimp = Synset('shrimp.n.03')
octopus.path_similarity(shrimp)

0.1111111111111111
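- The score above is consistent with the path-based similarity used by NLTK's WordNet interface, which is assumed here to be 1 / (distance + 1), where distance is the length of the shortest path between the two synsets. The sketch below only reproduces the arithmetic; the distance of 8 is inferred from the score, not looked up in WordNet.

```python
def path_similarity(distance):
    """Assumed definition: 1 / (shortest path length + 1)."""
    return 1.0 / (distance + 1)


# A distance of 8 edges (inferred, not looked up) gives the score shown above.
print(path_similarity(8))  # 0.1111111111111111

# A synset compared with itself has distance 0, so the score is 1.0.
print(path_similarity(0))  # 1.0
```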

### WordLists
- A WordList is just a Python list with additional methods.

In [42]:
animals = TextBlob("cat dog octopus")
animals.words

WordList(['cat', 'dog', 'octopus'])

In [43]:
animals.words.pluralize()

WordList(['cats', 'dogs', 'octopodes'])

### Spelling Correction
- Use the correct() method to attempt spelling correction.

In [44]:
b = TextBlob("I havv goood speling!")
print(b.correct())

I have good spelling!


- Word objects have a Word.spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions.

In [45]:
from textblob import Word
w = Word('falibility')
w.spellcheck()

[('fallibility', 1.0)]

- Spelling correction is based on Peter Norvig’s “How to Write a Spelling Corrector”[1] as implemented in the pattern library. It is about 70% accurate [2].
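- To give a feel for the approach, here is a heavily reduced sketch in the spirit of Norvig's corrector: generate every string one edit away from the input and keep the most frequent known word. The `WORD_COUNTS` table is a made-up toy vocabulary, and the code is not the implementation used by pattern or TextBlob.

```python
import string

# Toy frequency table (hypothetical); a real corrector learns this from a corpus.
WORD_COUNTS = {"i": 50, "have": 40, "good": 30, "spelling": 5}


def edits1(word):
    """Return all strings one delete, transpose, replace, or insert away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return word if known, else the most frequent known word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word


print([correct(w, WORD_COUNTS) for w in ["havv", "goood", "speling"]])
# ['have', 'good', 'spelling']
```

- Unknown words with no candidate one edit away are returned unchanged, which is one reason a corrector of this kind is far from perfectly accurate.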


### Get Word and Noun Phrase Frequencies
- There are two ways to get the frequency of a word or noun phrase in a TextBlob.

- The first is through the word_counts dictionary.

In [46]:
monty = TextBlob("We are no longer the Knights who say Ni. "
                 "We are now the Knights who say Ekki ekki ekki PTANG.")
monty.word_counts['ekki']

3

- If you access the frequencies this way, the search will not be case sensitive, and words that are not found will have a frequency of 0.

- The second way is to use the count() method.

In [47]:
monty.words.count('ekki')

3

- You can specify whether or not the search should be case-sensitive (default is False).

In [48]:
monty.words.count('ekki', case_sensitive=True)

2

- Each of these methods can also be used with noun phrases.

In [49]:
wiki.noun_phrases.count('python')

0
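- The difference between case-insensitive and case-sensitive counting can be reproduced with the standard library. This sketch assumes a simplified regex tokenizer, which is not how TextBlob tokenizes, and uses `collections.Counter` in place of TextBlob's word_counts.

```python
import re
from collections import Counter

text = ("We are no longer the Knights who say Ni. "
        "We are now the Knights who say Ekki ekki ekki PTANG.")

# Simplified tokenizer (an assumption, not TextBlob's): runs of word characters.
tokens = re.findall(r"\w+", text)

# Case-insensitive counts: lowercase every token before counting.
counts = Counter(t.lower() for t in tokens)
print(counts["ekki"])    # 3
print(counts["python"])  # 0, a Counter returns 0 for missing keys

# Case-sensitive count: compare the raw tokens, so "Ekki" is not matched.
print(tokens.count("ekki"))  # 2
```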

### Parsing
- Use the parse() method to parse the text.

In [50]:
b = TextBlob("And now for something completely different.")
print(b.parse())

And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O


- By default, TextBlob uses pattern’s parser [3].

### TextBlobs Are Like Python Strings!
- You can use Python’s substring syntax.

In [51]:
wiki[0:19]

TextBlob("Use 4 spaces per in")

- You can use common string methods.

In [52]:
wiki.upper()

TextBlob("USE 4 SPACES PER INDENTATION LEVEL. BEAUTIFUL IS BETTER THAN UGLY. EXPLICIT IS BETTER THAN IMPLICIT. SIMPLE IS BETTER THAN COMPLEX.")

In [53]:
wiki.find("Simple")

101

- You can make comparisons between TextBlobs and strings.

In [54]:
apple_blob = TextBlob('apples')
banana_blob = TextBlob('bananas')
apple_blob < banana_blob

True

In [55]:
apple_blob == 'apples'

True

- You can concatenate and interpolate TextBlobs and strings.

In [56]:
apple_blob + ' and ' + banana_blob

TextBlob("apples and bananas")

In [57]:
"{0} and {1}".format(apple_blob, banana_blob)

'apples and bananas'

### n-grams
- The TextBlob.ngrams() method returns a list of WordLists, each containing n successive words.

In [58]:
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)

[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
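- The sliding window behind n-grams can be sketched in plain Python. The `ngrams` function below is a hypothetical stand-in that works on an already tokenized list and returns ordinary lists rather than WordLists.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive items from words, in order."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ["Now", "is", "better", "than", "never"]
for gram in ngrams(words, n=3):
    print(gram)
# ['Now', 'is', 'better']
# ['is', 'better', 'than']
# ['better', 'than', 'never']
```

- A list of k words yields k - n + 1 n-grams, and an empty list when n is larger than k.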

### Get Start and End Indices of Sentences
- Use sentence.start and sentence.end to get the indices where a sentence starts and ends within a TextBlob.

In [59]:
for s in wiki.sentences:
    print(s)
    print("---- Starts at index {}, Ends at index {}".format(s.start, s.end))

Use 4 spaces per indentation level.
---- Starts at index 0, Ends at index 35
Beautiful is better than ugly.
---- Starts at index 36, Ends at index 66
Explicit is better than implicit.
---- Starts at index 67, Ends at index 100
Simple is better than complex.
---- Starts at index 101, Ends at index 131
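- The indices above behave like ordinary Python slice bounds: start is the offset of the first character and end is one past the last. The sketch below recomputes them with `str.index`; the `sentence_spans` helper is hypothetical and takes the sentences as given instead of detecting them.

```python
text = ("Use 4 spaces per indentation level. "
        "Beautiful is better than ugly. "
        "Explicit is better than implicit. "
        "Simple is better than complex.")
sentences = [
    "Use 4 spaces per indentation level.",
    "Beautiful is better than ugly.",
    "Explicit is better than implicit.",
    "Simple is better than complex.",
]


def sentence_spans(text, sentences):
    """Return (start, end) offsets of each sentence, searching left to right."""
    spans = []
    cursor = 0
    for sentence in sentences:
        start = text.index(sentence, cursor)  # first match at or after cursor
        end = start + len(sentence)           # exclusive, like a slice bound
        spans.append((start, end))
        cursor = end
    return spans


spans = sentence_spans(text, sentences)
print(spans)  # [(0, 35), (36, 66), (67, 100), (101, 131)]
```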


# Tutorial: Building a Text Classification System

- The textblob.classifiers module makes it simple to create custom classifiers.

- As an example, let’s create a custom sentiment analyzer.

### Loading Data and Creating a Classifier
- First we’ll create some training and test data.

In [60]:
train = [
    ("I love this sandwich.", "pos"),
    ("this is an amazing place!", "pos"),
    ("I feel very good about these beers.", "pos"),
    ("this is my best work.", "pos"),
    ("what an awesome view", "pos"),
    ("I do not like this restaurant", "neg"),
    ("I am tired of this stuff.", "neg"),
    ("I can't deal with this", "neg"),
    ("he is my sworn enemy!", "neg"),
    ("my boss is horrible.", "neg"),
]
test = [
    ("the beer was good.", "pos"),
    ("I do not enjoy my job", "neg"),
    ("I ain't feeling dandy today.", "neg"),
    ("I feel amazing!", "pos"),
    ("Gary is a friend of mine.", "pos"),
    ("I can't believe I'm doing this.", "neg"),
]


- Now we’ll create a Naive Bayes classifier, passing the training data into the constructor.

In [61]:
from textblob.classifiers import NaiveBayesClassifier
cl = NaiveBayesClassifier(train)

### Loading Data from Files
- You can also load data from common file formats including CSV, JSON, and TSV.

- CSV files should be formatted like so:

    - I love this sandwich.,pos
    - This is an amazing place!,pos
    - I do not like this restaurant,neg

- JSON files should be formatted like so:

    - [
        - {"text": "I love this sandwich.", "label": "pos"},
        - {"text": "This is an amazing place!", "label": "pos"},
        - {"text": "I do not like this restaurant", "label": "neg"}
    - ]

- You can then pass the opened file into the constructor.

In [63]:
with open(r'C:\Users\admin\New_Folder\TextBlob.json', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="json")
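- To see how a file in the JSON layout above corresponds to the in-memory training list, the records can be read with the standard `json` module and turned into (text, label) tuples. This is only an illustration: `io.StringIO` stands in for an opened file, and TextBlob's own file readers may work differently.

```python
import io
import json

# Stand-in for an opened file; the content mirrors the JSON layout above.
fp = io.StringIO("""[
    {"text": "I love this sandwich.", "label": "pos"},
    {"text": "This is an amazing place!", "label": "pos"},
    {"text": "I do not like this restaurant", "label": "neg"}
]""")

records = json.load(fp)

# Same shape as the in-memory training list: a list of (text, label) tuples.
data = [(record["text"], record["label"]) for record in records]
print(data[0])  # ('I love this sandwich.', 'pos')
```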

### Classifying Text
- Call the classify(text) method to use the classifier.

In [64]:
cl.classify("This is an amazing library!")

'pos'

- You can get the label probability distribution with the prob_classify(text) method.

In [65]:
prob_dist = cl.prob_classify("This one's a doozy.")
prob_dist.max()

'pos'

In [66]:
round(prob_dist.prob("pos"), 2)

0.99

In [67]:
round(prob_dist.prob("neg"), 2)

0.01

### Classifying TextBlobs
- Another way to classify text is to pass a classifier into the constructor of TextBlob and call its classify() method.

In [68]:
from textblob import TextBlob
blob = TextBlob("The beer is good. But the hangover is horrible.", classifier=cl)
blob.classify()

'pos'

- The advantage of this approach is that you can classify sentences within a TextBlob.

In [69]:
for s in blob.sentences:
    print(s)
    print(s.classify())

The beer is good.
pos
But the hangover is horrible.
pos


### Evaluating Classifiers
- To compute the accuracy on our test set, use the accuracy(test_data) method.

In [70]:
cl.accuracy(test)

0.6666666666666666
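- Accuracy here is the fraction of test examples whose predicted label matches the true label, so the value above corresponds to 4 correct predictions out of 6. The sketch below shows that computation with a hypothetical `accuracy` function and a stub classifier that always answers "pos"; neither is part of TextBlob.

```python
test = [
    ("the beer was good.", "pos"),
    ("I do not enjoy my job", "neg"),
    ("I ain't feeling dandy today.", "neg"),
    ("I feel amazing!", "pos"),
    ("Gary is a friend of mine.", "pos"),
    ("I can't believe I'm doing this.", "neg"),
]


def accuracy(classify, test_data):
    """Fraction of (text, label) pairs for which classify(text) == label."""
    correct = sum(1 for text, label in test_data if classify(text) == label)
    return correct / len(test_data)


def always_pos(text):
    """Stub classifier used only for illustration."""
    return "pos"


print(accuracy(always_pos, test))  # 0.5, since 3 of the 6 examples are "pos"
print(4 / 6)                       # 0.6666666666666666
```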

- Use the show_informative_features() method to display a listing of the most informative features.

In [71]:
cl.show_informative_features(5) 

Most Informative Features
             contains(I) = True              neg : pos    =      1.5 : 1.0
          contains(This) = False             neg : pos    =      1.5 : 1.0
       contains(amazing) = False             neg : pos    =      1.5 : 1.0
            contains(an) = False             neg : pos    =      1.5 : 1.0
            contains(is) = False             neg : pos    =      1.5 : 1.0


### Updating Classifiers with New Data
- Use the update(new_data) method to update a classifier with new training data.

In [72]:
new_data = [
    ("She is my best friend.", "pos"),
    ("I'm happy to have a new friend.", "pos"),
    ("Stay thirsty, my friend.", "pos"),
    ("He ain't from around here.", "neg"),
]
cl.update(new_data)

True

In [73]:
cl.accuracy(test)

0.6666666666666666

### Feature Extractors
- By default, the NaiveBayesClassifier uses a simple feature extractor that indicates which words in the training set are contained in a document.

- For example, the sentence “I feel happy” might have the features contains(happy): True or contains(angry): False.

- You can override this feature extractor by writing your own. A feature extractor is simply a function with document (the text to extract features from) as the first argument. The function may include a second argument, train_set (the training dataset), if necessary.

- The function should return a dictionary of features for document.

- For example, let’s create a feature extractor that just uses the first and last words of a document as its features.

In [76]:
def end_word_extractor(document):
    tokens = document.split()
    first_word, last_word = tokens[0], tokens[-1]
    feats = {}
    feats["first({0})".format(first_word)] = True
    feats["last({0})".format(last_word)] = False
    return feats

features = end_word_extractor("I feel happy")
assert features == {"last(happy)": False, "first(I)": True}

- We can then use the feature extractor in a classifier by passing it as the second argument of the constructor.

In [77]:
cl2 = NaiveBayesClassifier(test, feature_extractor=end_word_extractor)
blob = TextBlob("I'm excited to try my new classifier.", classifier=cl2)
blob.classify()

'pos'
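- For comparison, the default behaviour described at the start of this section (marking which training-set words a document contains) can be sketched as a two-argument extractor. This version is an approximation that splits on whitespace; the name `contains_extractor` and the toy training set are assumptions, not TextBlob's actual code.

```python
def contains_extractor(document, train_set):
    """Mark, for every word seen in training, whether document contains it."""
    vocabulary = {word for text, _label in train_set for word in text.split()}
    tokens = set(document.split())
    return {"contains({0})".format(word): (word in tokens)
            for word in vocabulary}


# Toy training set (hypothetical) in the same (text, label) format as above.
toy_train = [("I feel happy", "pos"), ("I feel angry", "neg")]

features = contains_extractor("I feel happy", toy_train)
print(features["contains(happy)"])  # True
print(features["contains(angry)"])  # False
```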

- To learn more about TextBlob, see the documentation: https://textblob.readthedocs.io/en/dev/