# TextBlob

TextBlob is a Python library for processing text. It offers much of the same functionality as NLTK behind a simpler API, and some operations run faster than their NLTK equivalents. Read more in the [TextBlob documentation](https://textblob.readthedocs.io/en/dev/).

Install:
$ pip install -U textblob

Download the corpora that some features need:
$ python -m textblob.download_corpora

As usual, first we import it.

In [1]:
from textblob import TextBlob

In [2]:
# start with raw text from the TextBlob page
raw_text = """TextBlob is a Python (2 and 3) library for processing textual data. It provides a consistent API \
for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, \
sentiment analysis, and more.
"""

In [3]:
# make a TextBlob object - this annotates the text
blob = TextBlob(raw_text)
# see the first few pos tags
blob.tags[:5]

[('TextBlob', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('Python', 'NNP'),
 ('2', 'CD')]

In [4]:
# extract the words
blob.words

WordList(['TextBlob', 'is', 'a', 'Python', '2', 'and', '3', 'library', 'for', 'processing', 'textual', 'data', 'It', 'provides', 'a', 'consistent', 'API', 'for', 'diving', 'into', 'common', 'natural', 'language', 'processing', 'NLP', 'tasks', 'such', 'as', 'part-of-speech', 'tagging', 'noun', 'phrase', 'extraction', 'sentiment', 'analysis', 'and', 'more'])

In [5]:
# lemmatize, singularize, and pluralize a single word
from textblob import Word
w = Word('alumni')
print("lemma of alumni is:", w.lemmatize())
print("singular of alumni is:", w.singularize())
print("plural of alumni is:", w.pluralize())


lemma of alumni is: alumnus
singular of alumni is: alumni
plural of alumni is: alumnis


In [6]:
# extract noun phrases
blob.noun_phrases
# a noun phrase can be a noun by itself or a noun with its dependents, such as adjectives, determiners, and prepositional phrases

WordList(['textblob', 'python', 'processing textual data', 'api', 'common natural language processing', 'nlp', 'noun phrase extraction', 'sentiment analysis'])

In [7]:
# extract sentences 
for sentence in blob.sentences:
    print(sentence)

TextBlob is a Python (2 and 3) library for processing textual data.
It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more.


In [8]:
# sentiment analysis
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)

0.0
0.11000000000000001
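
Polarity is a float between -1.0 (most negative) and 1.0 (most positive). As a minimal sketch of how these scores might be turned into coarse labels, the helper below uses a hypothetical `label_polarity` function with an arbitrary cutoff of 0.05; neither the name nor the threshold comes from TextBlob.

```python
def label_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    `threshold` is an arbitrary cutoff chosen for illustration only.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# The two polarity scores printed above
for score in (0.0, 0.11000000000000001):
    print(score, label_polarity(score))
```

With this cutoff the first sentence is labeled neutral and the second mildly positive; a different threshold may suit other kinds of text better.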


In [11]:
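# let's try language detection and translation
# blob.detect_language()
# blob.translate(to='es')

# NOTE: these calls are commented out because the feature is not working at the time of writing.
# Recent TextBlob releases deprecated and then removed both methods.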

In [12]:
# spelling correction
b = TextBlob("I can't spelll")
b.correct()

TextBlob("I can't spell")
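
The TextBlob documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below illustrates that general idea, not TextBlob's actual code: generate every string one edit away from the input and keep the most frequent known word. The `WORD_COUNTS` table is a made-up stand-in for frequencies that a real corrector would learn from a large corpus, and `edits1` and `correct_word` are names chosen for this example.

```python
import string


def edits1(word):
    """Return the set of strings exactly one edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


# Hypothetical word frequencies, for illustration only
WORD_COUNTS = {"spell": 50, "spelled": 12, "spill": 8, "i": 500, "can't": 90}


def correct_word(word):
    """Return the most frequent known word within one edit, else `word` unchanged."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


print(correct_word("spelll"))  # spell
```

Deleting one `l` from `spelll` yields `spell`, the only known word within a single edit, so it is chosen.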

In [13]:
# parsing
for sentence in blob.sentences:
    print(sentence.parse())

TextBlob/NN/B-NP/O is/VBZ/B-VP/O a/DT/B-NP/O Python/NNP/I-NP/O (/(/O/O 2/IN/B-PP/O and/CC/O/O 3/CD/O/O )/)/O/O library/NN/B-NP/O for/IN/B-PP/B-PNP processing/NN/B-NP/I-PNP textual/JJ/I-NP/I-PNP data/NNS/I-NP/I-PNP ././O/O
It/PRP/B-NP/O provides/VBZ/B-VP/O a/DT/B-NP/O consistent/JJ/I-NP/O API/NNP/I-NP/O for/IN/B-PP/B-PNP diving/VBG/B-VP/I-PNP into/IN/B-PP/B-PNP common/JJ/B-NP/I-PNP natural/JJ/I-NP/I-PNP language/NN/I-NP/I-PNP processing/NN/I-NP/I-PNP (/(/O/O NLP/NN/B-NP/O )/)/O/O tasks/NNS/B-NP/O such/JJ/B-ADJP/O as/IN/B-PP/O part-of-speech/JJ/B-ADJP/O tagging/VBG/B-VP/O ,/,/O/O noun/NN/B-NP/O phrase/NN/I-NP/O extraction/NN/I-NP/O ,/,/O/O sentiment/NN/B-NP/O analysis/NN/I-NP/O ,/,/O/O and/CC/O/O more/JJR/B-ADJP/O ././O/O


In [14]:
# ngrams
blob.ngrams(n=2)[:5]

[WordList(['TextBlob', 'is']),
 WordList(['is', 'a']),
 WordList(['a', 'Python']),
 WordList(['Python', '2']),
 WordList(['2', 'and'])]
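
An n-gram is a run of n consecutive words, so the bigrams above come from sliding a two-word window across the word list. A minimal pure-Python sketch of that idea follows; `sliding_ngrams` is a name chosen for this example, and it returns plain lists rather than TextBlob's `WordList` objects.

```python
def sliding_ngrams(words, n=2):
    """Return every run of n consecutive words as a list."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]


# The first few words of the example text
words = ["TextBlob", "is", "a", "Python", "2", "and", "3", "library"]
print(sliding_ngrams(words, n=2)[:5])
```

A list of k words yields k - n + 1 n-grams, and none at all when n is larger than k.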

TextBlob objects behave like Python strings, so you can call the usual string methods on them.

In [15]:
blob.upper()

TextBlob("TEXTBLOB IS A PYTHON (2 AND 3) LIBRARY FOR PROCESSING TEXTUAL DATA. IT PROVIDES A CONSISTENT API FOR DIVING INTO COMMON NATURAL LANGUAGE PROCESSING (NLP) TASKS SUCH AS PART-OF-SPEECH TAGGING, NOUN PHRASE EXTRACTION, SENTIMENT ANALYSIS, AND MORE.
")

In [16]:
blob.words.count('a', case_sensitive=False)

2
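
To make the effect of `case_sensitive=False` concrete, here is a rough pure-Python equivalent on an ordinary list. The `count_word` helper is hypothetical and only mirrors the behavior shown above; it is not TextBlob's implementation.

```python
def count_word(words, target, case_sensitive=False):
    """Count exact matches of `target` in `words`, optionally ignoring case."""
    if case_sensitive:
        return sum(1 for w in words if w == target)
    target = target.lower()
    return sum(1 for w in words if w.lower() == target)


# The first seventeen words of the example text
words = ["TextBlob", "is", "a", "Python", "2", "and", "3", "library", "for",
         "processing", "textual", "data", "It", "provides", "a", "consistent", "API"]

print(count_word(words, "a"))                         # 2
print(count_word(words, "api"))                       # 1
print(count_word(words, "api", case_sensitive=True))  # 0
```

Only whole words are compared, so `a` does not match `API` or `data`.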

TextBlob also provides access to WordNet, with syntax similar to NLTK's, and runs a little faster.

In [17]:
from textblob.wordnet import Synset
pear = Synset("pear.n.01")
apple = Synset("apple.n.01")
pear.path_similarity(apple)

0.3333333333333333
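
Path similarity is defined as 1 / (d + 1), where d is the length of the shortest path between the two synsets in the hypernym/hyponym taxonomy, so a score of 1/3 corresponds to a path of length 2. The sketch below reproduces that arithmetic on a tiny hand-written graph. The `HYPERNYMS` table is a toy fragment assumed for illustration, not data loaded from WordNet, and the function names are chosen for this example.

```python
from collections import deque

# Toy fragment of a hypernym graph, written by hand for illustration only
HYPERNYMS = {
    "pear": ["pome", "edible_fruit"],
    "apple": ["pome", "edible_fruit"],
    "pome": ["fruit"],
    "edible_fruit": ["produce"],
}


def build_undirected(edges):
    """Turn child -> parents links into an undirected adjacency map."""
    graph = {}
    for child, parents in edges.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Score two nodes as 1 / (shortest path length + 1)."""
    dist = shortest_path_length(graph, a, b)
    return None if dist is None else 1 / (dist + 1)


graph = build_undirected(HYPERNYMS)
print(path_similarity(graph, "pear", "apple"))  # 0.3333333333333333
```

In this toy graph `pear` and `apple` share a parent, which gives a path of two edges and the same 1/3 score as above. A node compared with itself scores 1.0.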

The TextBlob site has a lot of tutorials, including:

- [finding tf-idf](https://stevenloria.com/tf-idf/)
- [text classification](https://stevenloria.com/simple-text-classification/)

Steven Loria, the creator, adds features and tutorials frequently, so check the site often.