#### Create a TextBlob
First, the import.

In [2]:
from textblob import TextBlob

Let’s create our first TextBlob.

In [2]:
wiki = TextBlob("Python is a high-level, general-purpose programming language.")

**Part-of-speech Tagging**

Part-of-speech tags can be accessed through the tags property.

In [4]:
wiki.tags

[('Python', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('high-level', 'JJ'),
 ('general-purpose', 'JJ'),
 ('programming', 'NN'),
 ('language', 'NN')]
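The tags property returns plain (word, tag) tuples that use Penn Treebank tag names, so ordinary Python can filter the result. The sketch below does not call TextBlob. It copies the output above into a list and pulls out nouns and adjectives by tag prefix. The helper name `words_with_tag_prefix` is hypothetical.

```python
# The (word, tag) pairs below are copied from the wiki.tags output above.
tags = [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'),
        ('high-level', 'JJ'), ('general-purpose', 'JJ'),
        ('programming', 'NN'), ('language', 'NN')]

def words_with_tag_prefix(tagged, prefix):
    """Hypothetical helper: words whose Penn Treebank tag starts with `prefix`."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

nouns = words_with_tag_prefix(tags, "NN")       # matches NN, NNS, NNP, NNPS
adjectives = words_with_tag_prefix(tags, "JJ")  # matches JJ, JJR, JJS
print(nouns)       # ['Python', 'programming', 'language']
print(adjectives)  # ['high-level', 'general-purpose']
```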

**Noun Phrase Extraction**

Similarly, noun phrases are accessed through the noun_phrases property.

In [6]:
wiki.noun_phrases

WordList(['python'])

**Sentiment Analysis**

The sentiment property returns a named tuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0], where -1.0 is very negative and 1.0 is very positive. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.

In [7]:
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
print(testimonial.sentiment)

test = TextBlob("I really hate how people treat others bad")
print(test.sentiment)


text = TextBlob("I really love how people are so welcoming here")
print(text.sentiment)

Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)
Sentiment(polarity=-0.75, subjectivity=0.7833333333333333)
Sentiment(polarity=0.5, subjectivity=0.6)


In [9]:
testimonial.sentiment.polarity

0.39166666666666666
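Polarity and subjectivity are plain floats, so turning them into coarse labels is ordinary Python. The sketch below rebuilds the named tuple locally with collections.namedtuple and does not import TextBlob. The `label` helper and its 0.1 and 0.5 cut-offs are arbitrary assumptions for illustration. TextBlob does not define them.

```python
from collections import namedtuple

# Local stand-in with the same field names as TextBlob's Sentiment tuple.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

def label(sentiment, neutral_band=0.1, subjective_cutoff=0.5):
    """Hypothetical helper; both thresholds are arbitrary choices."""
    if sentiment.polarity > neutral_band:
        tone = "positive"
    elif sentiment.polarity < -neutral_band:
        tone = "negative"
    else:
        tone = "neutral"
    style = "subjective" if sentiment.subjectivity >= subjective_cutoff else "objective"
    return tone, style

# Scores copied from the three outputs above.
scores = [
    Sentiment(0.39166666666666666, 0.4357142857142857),
    Sentiment(-0.75, 0.7833333333333333),
    Sentiment(0.5, 0.6),
]
for s in scores:
    print(label(s))
```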

**Tokenization**

You can break TextBlobs into words or sentences.

In [10]:
zen = TextBlob("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Simple is better than complex.")
zen.words

WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])

In [11]:
zen.sentences

[Sentence("Beautiful is better than ugly."),
 Sentence("Explicit is better than implicit."),
 Sentence("Simple is better than complex.")]
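TextBlob's tokenizers come from NLTK. To show the basic idea, here is a much cruder regex sketch that reproduces the word and sentence splits above for this particular text. The function names are hypothetical. The approach is a simplification. It would split hyphenated words such as high-level, which TextBlob keeps whole in the tags output earlier, and it would also break on abbreviations like "e.g.".

```python
import re

zen_text = ("Beautiful is better than ugly. "
            "Explicit is better than implicit. "
            "Simple is better than complex.")

def naive_words(text):
    """Hypothetical: runs of word characters; punctuation is dropped."""
    return re.findall(r"\w+", text)

def naive_sentences(text):
    """Hypothetical: split after ., ! or ? when followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

print(naive_words(zen_text))
print(naive_sentences(zen_text))
```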

Sentence objects have the same properties and methods as TextBlobs.

In [12]:
for sentence in zen.sentences:
    print(sentence.sentiment)

Sentiment(polarity=0.2166666666666667, subjectivity=0.8333333333333334)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.06666666666666667, subjectivity=0.41904761904761906)


**Word Inflection and Lemmatization**

Each word in TextBlob.words or Sentence.words is a Word object (a subclass of str) with useful methods, e.g. for word inflection.

In [14]:
sentence = TextBlob('Use 4 spaces per indentation level.')
sentence.words

WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])

In [15]:
sentence.words[2].singularize()

'space'

In [16]:
sentence.words[-1].pluralize()

'levels'
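Inflection is mostly suffix rules plus a table of irregular forms. The toy sketch below shows why a word like octopus needs a lookup entry. Without one, the suffix rule would produce "octopuses" and not the "octopodes" that TextBlob returns in the WordLists example. The rules and the irregular table are hypothetical simplifications, not TextBlob's inflection rules. Many real words, such as "cases" or "caches", would be singularized wrongly.

```python
# Tiny hypothetical table of irregular forms; real inflectors ship far larger ones.
IRREGULAR_PLURALS = {"octopus": "octopodes", "person": "people", "mouse": "mice"}
IRREGULAR_SINGULARS = {plural: single for single, plural in IRREGULAR_PLURALS.items()}
VOWELS = "aeiou"

def toy_pluralize(word):
    """Hypothetical rule-based pluralizer (toy, English only)."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and len(word) > 1 and word[-2] not in VOWELS:
        return word[:-1] + "ies"
    return word + "s"

def toy_singularize(word):
    """Hypothetical inverse of toy_pluralize for the common cases."""
    if word in IRREGULAR_SINGULARS:
        return IRREGULAR_SINGULARS[word]
    if word.endswith("ies") and len(word) > 3:
        return word[:-3] + "y"
    if word.endswith(("xes", "ches", "shes")):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

print(toy_singularize("spaces"), toy_pluralize("level"))  # space levels
```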

Words can be lemmatized by calling the lemmatize method.

In [17]:
from textblob import Word
w = Word("octopi")
w.lemmatize()

'octopus'

In [18]:
w = Word("went")
w.lemmatize("v")  # Pass in WordNet part of speech (verb)

'go'
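Lemmatization differs from plain suffix stripping in two ways. A candidate only counts if it is a real dictionary entry, and irregular forms need an exception list. TextBlob passes lemmatize() to WordNet via NLTK, and WordNet's morphy routine works roughly along these lines. The sketch below imitates the idea with a tiny hypothetical lexicon, exception table, and rule list. None of these are WordNet's data. It also shows why the cell above passes "v": with the default noun POS, "went" comes back unchanged.

```python
# Hypothetical miniature lexicon, keyed by WordNet-style POS letters.
LEXICON = {
    "n": {"octopus", "space", "level", "city", "box"},
    "v": {"go", "walk", "use", "hate", "love"},
}

# Irregular forms that suffix rules cannot reach (hypothetical entries).
EXCEPTIONS = {
    "n": {"octopi": "octopus"},
    "v": {"went": "go", "gone": "go"},
}

# (suffix, replacement) pairs, tried in order.
RULES = {
    "n": [("ies", "y"), ("xes", "x"), ("s", "")],
    "v": [("ies", "y"), ("es", "e"), ("es", ""), ("ed", "e"), ("ed", ""),
          ("ing", "e"), ("ing", ""), ("s", "")],
}

def toy_lemmatize(word, pos="n"):
    """Exception list first, then suffix rules checked against the lexicon."""
    word = word.lower()
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    if word in LEXICON[pos]:
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON[pos]:
                return candidate
    return word  # unknown words come back unchanged

print(toy_lemmatize("octopi"), toy_lemmatize("went", "v"))  # octopus go
```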

**WordNet Integration**

You can access the synsets for a Word via the synsets property or the get_synsets method, optionally passing in a part of speech.

In [19]:
from textblob import Word
from textblob.wordnet import VERB
word = Word("octopus")
word.synsets

[Synset('octopus.n.01'), Synset('octopus.n.02')]

In [20]:
Word("hack").get_synsets(pos=VERB)

[Synset('chop.v.05'),
 Synset('hack.v.02'),
 Synset('hack.v.03'),
 Synset('hack.v.04'),
 Synset('hack.v.05'),
 Synset('hack.v.06'),
 Synset('hack.v.07'),
 Synset('hack.v.08')]

You can access the definitions for each synset via the definitions property or the define() method, which can also take an optional part-of-speech argument.

In [21]:
Word("octopus").definitions

['tentacles of octopus prepared as food',
 'bottom-living cephalopod having a soft oval body with eight long tentacles']

You can also create synsets directly and compare them, for example with path_similarity().

In [22]:
from textblob.wordnet import Synset
octopus = Synset('octopus.n.02')
shrimp = Synset('shrimp.n.03')
octopus.path_similarity(shrimp)

0.1111111111111111
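path_similarity() scores two senses by the shortest path that connects them in the is-a (hypernym/hyponym) taxonomy, computed as 1 / (path length + 1). Under that formula, 0.1111… = 1/9 corresponds to a path of 8 edges in WordNet. The sketch below applies the same formula with breadth-first search over a small hypothetical taxonomy. It is not WordNet's hierarchy, so its number differs from the one above.

```python
from collections import deque

# Hypothetical child -> parent (hypernym) links, not WordNet's real hierarchy.
HYPERNYMS = {
    "octopus": "cephalopod",
    "cephalopod": "mollusk",
    "mollusk": "invertebrate",
    "shrimp": "decapod",
    "decapod": "crustacean",
    "crustacean": "arthropod",
    "arthropod": "invertebrate",
    "invertebrate": "animal",
}

def _undirected(graph):
    """Build adjacency sets so paths can go up and back down the taxonomy."""
    adj = {}
    for child, parent in graph.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj

def shortest_path_length(a, b, graph=HYPERNYMS):
    """Breadth-first search; returns the edge count, or None if unconnected."""
    adj = _undirected(graph)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_similarity(a, b, graph=HYPERNYMS):
    dist = shortest_path_length(a, b, graph)
    return None if dist is None else 1.0 / (dist + 1)

print(path_similarity("octopus", "shrimp"))  # 7 edges apart -> 0.125
```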

**WordLists**

A WordList is just a Python list with additional methods.

In [26]:
animals = TextBlob("cat dog octopus")
animals.words

WordList(['cat', 'dog', 'octopus'])

In [28]:
animals.words.pluralize()

WordList(['cats', 'dogs', 'octopodes'])
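"Just a Python list with additional methods" means subclassing list. Normal list operations keep working, and the extra methods act on every element. The sketch below uses a hypothetical MiniWordList with upper() and lower() to show the pattern. It is a stand-in, not TextBlob's WordList class.

```python
class MiniWordList(list):
    """Hypothetical stand-in: an ordinary list plus element-wise helpers."""

    def upper(self):
        # Apply str.upper to every element and keep the list subclass.
        return MiniWordList(word.upper() for word in self)

    def lower(self):
        return MiniWordList(word.lower() for word in self)

animals = MiniWordList(["cat", "dog", "octopus"])
animals.append("Shrimp")          # regular list methods still work
print(animals.upper())            # ['CAT', 'DOG', 'OCTOPUS', 'SHRIMP']
print(isinstance(animals, list))  # True
```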

**Spelling Correction**

Use the correct() method to attempt spelling correction.

In [30]:
b = TextBlob("I havv goood speling!")
print(b.correct())

I have good spelling!


Word objects have a spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions.

In [32]:
from textblob import Word
w = Word('falibility')
w.spellcheck()

[('fallibility', 1.0)]
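TextBlob's documentation credits its corrector to Peter Norvig's "How to Write a Spelling Corrector". That approach generates every string one or two edits away from the input and keeps only the known words, ranked by corpus frequency. The sketch below follows that approach with a tiny hypothetical frequency table (`WORD_FREQ`), so its confidences only reflect that table. The names `spellcheck` and `correct` imitate the notebook's output shape and are not TextBlob's code.

```python
from collections import Counter

# Hypothetical word frequencies; a real corrector counts a large corpus.
WORD_FREQ = Counter({"have": 50, "good": 40, "spelling": 5,
                     "fallibility": 1, "hive": 2})
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    return {w for w in words if w in WORD_FREQ}

def spellcheck(word):
    """(candidate, confidence) pairs, most frequent first."""
    word = word.lower()
    candidates = (known([word])
                  or known(edits1(word))
                  or {e2 for e1 in edits1(word) for e2 in known(edits1(e1))})
    if not candidates:
        return [(word, 0.0)]
    total = sum(WORD_FREQ[c] for c in candidates)
    ranked = sorted(candidates, key=lambda c: (-WORD_FREQ[c], c))
    return [(c, WORD_FREQ[c] / total) for c in ranked]

def correct(word):
    return spellcheck(word)[0][0]

print(" ".join(correct(w) for w in ["havv", "goood", "speling"]))  # have good spelling
print(spellcheck("falibility"))  # [('fallibility', 1.0)]
```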

**Get Word and Noun Phrase Frequencies**

There are two ways to get the frequency of a word or noun phrase in a TextBlob.

The first is through the word_counts dictionary.

In [34]:
monty = TextBlob("We are no longer the Knights who say Ni. "
                 "We are now the Knights who say Ekki ekki ekki PTANG.")
monty.word_counts['ekki']

3

If you access the frequencies this way, the search will not be case sensitive, and words that are not found will have a frequency of 0.

The second way is to use the count() method.

In [35]:
monty.words.count('ekki')

3

You can specify whether or not the search should be case-sensitive (default is False).

In [36]:
monty.words.count('ekki', case_sensitive=True)

2

Each of these methods can also be used with noun phrases.

In [37]:
wiki.noun_phrases.count('python')

1
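Both behaviors described above can be reproduced with the standard library. The first is a case-insensitive count that returns 0 for unseen words. The second is a count with an optional case-sensitive flag. In this sketch a regex tokenizer stands in for TextBlob's, and the `count` helper is hypothetical. It mirrors the semantics described, not TextBlob's code.

```python
import re
from collections import Counter

monty_text = ("We are no longer the Knights who say Ni. "
              "We are now the Knights who say Ekki ekki ekki PTANG.")
words = re.findall(r"\w+", monty_text)

# Case-insensitive counts; a Counter returns 0 for words it never saw.
word_counts = Counter(word.lower() for word in words)

def count(tokens, target, case_sensitive=False):
    """Hypothetical helper: count `target`, optionally ignoring case."""
    if case_sensitive:
        return sum(1 for tok in tokens if tok == target)
    target = target.lower()
    return sum(1 for tok in tokens if tok.lower() == target)

print(word_counts["ekki"])                        # 3
print(word_counts["spam"])                        # 0
print(count(words, "ekki"))                       # 3
print(count(words, "ekki", case_sensitive=True))  # 2
```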

**Translation and Language Detection**


TextBlobs can be translated between languages.

In [38]:
en_blob = TextBlob(u'Simple is better than complex.')
en_blob.translate(to='es')

TextBlob("Simple es mejor que complejo.")

If no source language is specified, TextBlob will attempt to detect it. You can also specify the source language explicitly with from_lang, as below. translate() raises TranslatorError if the TextBlob cannot be translated into the requested language, and NotTranslated if the translated result is the same as the input string.

In [39]:
chinese_blob = TextBlob(u"美丽优于丑陋")
chinese_blob.translate(from_lang="zh-CN", to='en')

TextBlob("Beauty is better than ugly")

You can also attempt to detect a TextBlob’s language using TextBlob.detect_language().

In [41]:
b = TextBlob(u"بسيط هو أفضل من مجمع")
b.detect_language()

'ar'

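Language translation and detection are powered by the Google Translate API. Newer TextBlob releases first deprecated and then removed translate() and detect_language(), and recommend the official Google Translate API instead, so these cells only run on older versions.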

**Parsing**

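Use the parse() method to run a shallow parse of the text. In the output, each token is annotated as word/part-of-speech tag/chunk tag/prepositional noun phrase (PNP) tag.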

In [42]:
b = TextBlob("And now for something completely different.")
print(b.parse())

And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O
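The chunk tags follow the IOB convention. B- begins a chunk, I- continues it, and O marks a token outside any chunk. Assuming the four-field word/POS/chunk/PNP layout shown in the output above, the string can be decoded with plain Python. The helper names below are hypothetical.

```python
# Output string copied from the parse() cell above.
parsed = ("And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP "
          "something/NN/B-NP/I-PNP completely/RB/B-ADJP/O "
          "different/JJ/I-ADJP/O ././O/O")

def split_tokens(tagged):
    """Hypothetical: turn 'word/POS/chunk/PNP' tokens into 4-tuples."""
    # rsplit from the right keeps any slash inside the word attached to it.
    return [tuple(token.rsplit("/", 3)) for token in tagged.split()]

def group_chunks(tokens):
    """Hypothetical: collect B-X / I-X runs into (label, [words]) phrases."""
    phrases, current = [], None
    for word, _pos, chunk_tag, _pnp in tokens:
        if chunk_tag.startswith("B-"):
            current = (chunk_tag[2:], [word])
            phrases.append(current)
        elif chunk_tag.startswith("I-") and current and current[0] == chunk_tag[2:]:
            current[1].append(word)
        else:
            current = None  # "O" (or a stray I- tag) closes the open chunk
    return phrases

tokens = split_tokens(parsed)
print(group_chunks(tokens))
```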


**TextBlobs Are Like Python Strings!**

You can use Python’s slicing syntax.

In [43]:
zen[0:19]

TextBlob("Beautiful is better")

You can use common string methods.

In [45]:
zen.upper()

TextBlob("BEAUTIFUL IS BETTER THAN UGLY. EXPLICIT IS BETTER THAN IMPLICIT. SIMPLE IS BETTER THAN COMPLEX.")

In [46]:
zen.find("Simple")

65

You can make comparisons between TextBlobs and strings.

In [47]:
apple_blob = TextBlob('apples')
banana_blob = TextBlob('bananas')
apple_blob < banana_blob

True

In [48]:
apple_blob == 'apples'

True

You can concatenate and interpolate TextBlobs and strings.

In [49]:
apple_blob + ' and ' + banana_blob

TextBlob("apples and bananas")

In [50]:
"{0} and {1}".format(apple_blob, banana_blob)

'apples and bananas'
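String-like behavior of this kind comes from Python's special methods: __getitem__ for slicing, __eq__ and __lt__ for comparisons, __add__ and __radd__ for concatenation, and __str__ for format(). The hypothetical MiniBlob class below wraps a str to show how that works. It is a sketch of the mechanism, not how TextBlob's own classes are written.

```python
from functools import total_ordering

@total_ordering
class MiniBlob:
    """Hypothetical string-like wrapper (not TextBlob's implementation)."""

    def __init__(self, text):
        self.raw = str(text)

    def __str__(self):
        return self.raw

    def __repr__(self):
        return f'MiniBlob("{self.raw}")'

    def __len__(self):
        return len(self.raw)

    def __getitem__(self, index):
        # Slices come back wrapped; single characters come back as plain str.
        piece = self.raw[index]
        return MiniBlob(piece) if isinstance(index, slice) else piece

    def __eq__(self, other):
        if isinstance(other, (str, MiniBlob)):
            return self.raw == str(other)
        return NotImplemented

    def __lt__(self, other):
        if isinstance(other, (str, MiniBlob)):
            return self.raw < str(other)
        return NotImplemented

    def __hash__(self):
        return hash(self.raw)

    def __add__(self, other):
        if isinstance(other, (str, MiniBlob)):
            return MiniBlob(self.raw + str(other))
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, str):
            return MiniBlob(other + self.raw)
        return NotImplemented

    def upper(self):
        return MiniBlob(self.raw.upper())

    def find(self, sub):
        return self.raw.find(str(sub))

apple, banana = MiniBlob("apples"), MiniBlob("bananas")
print(apple < banana, apple == "apples")    # True True
print(repr(apple + " and " + banana))       # MiniBlob("apples and bananas")
print("{0} and {1}".format(apple, banana))  # apples and bananas
```

@total_ordering derives <=, > and >= from __eq__ and __lt__, which keeps the class short.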

**n-grams**

The TextBlob.ngrams() method returns a list of WordLists, each containing n successive words.

In [51]:
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)

[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
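Producing n-grams amounts to sliding a window of width n over the word list. The stdlib sketch below uses plain lists as a stand-in for WordList and str.split as a stand-in for TextBlob's tokenizer, which is why the period is left off the input. The `ngrams` function here is hypothetical.

```python
def ngrams(words, n=3):
    """Hypothetical: every run of n consecutive words, in order."""
    # zip over n staggered views stops at the shortest, so no partial windows.
    return [list(window) for window in zip(*(words[i:] for i in range(n)))]

words = "Now is better than never".split()
print(ngrams(words, n=3))
# [['Now', 'is', 'better'], ['is', 'better', 'than'], ['better', 'than', 'never']]
```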

**Get Start and End Indices of Sentences**

Use sentence.start and sentence.end to get the indices where a sentence starts and ends within a TextBlob.

In [54]:
for s in zen.sentences:
    print(s)
    print("---- Starts at index {}, Ends at index {}".format(s.start, s.end))


Beautiful is better than ugly.
---- Starts at index 0, Ends at index 30
Explicit is better than implicit.
---- Starts at index 31, Ends at index 64
Simple is better than complex.
---- Starts at index 65, Ends at index 95
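The end index is exclusive, like a slice bound. The first sentence is 30 characters long, starts at 0, and ends at 30. The sketch below uses re.finditer, whose match objects expose the same kind of offsets. It reproduces the numbers above for this text. The regex is a hypothetical stand-in for TextBlob's sentence tokenizer.

```python
import re

zen_text = ("Beautiful is better than ugly. "
            "Explicit is better than implicit. "
            "Simple is better than complex.")

# Hypothetical sentence pattern: a non-space character, then anything up to . ! or ?
SENTENCE = re.compile(r"\S[^.!?]*[.!?]")

spans = [(m.group(), m.start(), m.end()) for m in SENTENCE.finditer(zen_text)]
for text, start, end in spans:
    print(text)
    print("---- Starts at index {}, Ends at index {}".format(start, end))
```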
