# BDTA Lesson 18: Using TextBlob

This lesson shows a number of things you can do with [TextBlob](https://textblob.readthedocs.io/en/dev/), a library for processing text.

## Getting a Text

Now we will get a text to process with TextBlob.

First, we list the text files in the current folder.

In [1]:
ls *.txt

FullText.txt                performanceConcordance.txt
Hume Enquiry.txt            theWritingStory.txt
StoryOfWriting.txt          truthConcordance.txt
bigdata.txt


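We are going to use "Hume Enquiry.txt", David Hume's *An Enquiry Concerning Human Understanding* from Project Gutenberg. You can use whatever text you want. We print the number of characters and the first 50 characters to check that the file loaded.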

In [2]:
theText2Use = "Hume Enquiry.txt"
with open(theText2Use, "r") as fileToRead:
    theString = fileToRead.read()
    
print("This string has", len(theString), "characters.")
print(theString[:50])

This string has 366798 characters.
The Project Gutenberg EBook of An Enquiry Concerni


## Creating a TextBlob

To use the TextBlob library we have to import it, but first it must be installed, along with the NLTK data it relies on. From the command line, run:

```
$ pip install -U textblob
$ python -m textblob.download_corpora
```

In [7]:
from textblob import TextBlob

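Now we can create a TextBlob from the string. Slicing a TextBlob returns another TextBlob, so we can look at its first 100 characters.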

In [6]:
theBlob = TextBlob(theString)
theBlob[:100]

TextBlob("The Project Gutenberg EBook of An Enquiry Concerning Human Understanding, by 
David Hume and L. A. S")

### Part of Speech

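TextBlob can give us a list of the words and their *part of speech*. Each item is a (word, tag) tuple, and the tags are Penn Treebank codes: for example, `DT` is a determiner, `NNP` a proper noun, `IN` a preposition, and `VBZ` a present-tense verb such as "is".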

In [10]:
partsOfSpeech = theBlob.tags
partsOfSpeech[:20]

[('The', 'DT'),
 ('Project', 'NNP'),
 ('Gutenberg', 'NNP'),
 ('EBook', 'NNP'),
 ('of', 'IN'),
 ('An', 'DT'),
 ('Enquiry', 'NNP'),
 ('Concerning', 'NNP'),
 ('Human', 'NNP'),
 ('Understanding', 'NNP'),
 ('by', 'IN'),
 ('David', 'NNP'),
 ('Hume', 'NNP'),
 ('and', 'CC'),
 ('L.', 'NNP'),
 ('A.', 'NNP'),
 ('Selby-Bigge', 'NNP'),
 ('This', 'DT'),
 ('eBook', 'NN'),
 ('is', 'VBZ')]
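Because `tags` is a plain list of (word, tag) tuples, ordinary Python tools work on it. The sketch below is a minimal example, not part of TextBlob: it tallies tags with `collections.Counter` and keeps the nouns (tags beginning with `NN`). So that it runs without TextBlob, it uses the 20 pairs shown above; on the full text you would pass `theBlob.tags` instead. The helper names `tag_counts` and `nouns` are hypothetical.

```python
from collections import Counter

# The first 20 (word, tag) pairs from the output above; with TextBlob use theBlob.tags
sample_tags = [
    ("The", "DT"), ("Project", "NNP"), ("Gutenberg", "NNP"), ("EBook", "NNP"),
    ("of", "IN"), ("An", "DT"), ("Enquiry", "NNP"), ("Concerning", "NNP"),
    ("Human", "NNP"), ("Understanding", "NNP"), ("by", "IN"), ("David", "NNP"),
    ("Hume", "NNP"), ("and", "CC"), ("L.", "NNP"), ("A.", "NNP"),
    ("Selby-Bigge", "NNP"), ("This", "DT"), ("eBook", "NN"), ("is", "VBZ"),
]

def tag_counts(tagged):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for _, tag in tagged)

def nouns(tagged):
    """Keep words whose Penn Treebank tag marks a noun (NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged if tag.startswith("NN")]

counts = tag_counts(sample_tags)
print(counts.most_common(3))     # prints [('NNP', 12), ('DT', 3), ('IN', 2)]
print(nouns(sample_tags)[:5])    # prints ['Project', 'Gutenberg', 'EBook', 'Enquiry', 'Concerning']
```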

### Noun Phrases

TextBlob can give us a list of *noun phrases*, which it returns in lowercase. This can be useful for identifying the entities and topics a text refers to.

In [13]:
theNPhraseList = theBlob.noun_phrases
theNPhraseList[:10]

WordList(['project gutenberg ebook', 'enquiry concerning', 'understanding', 'david hume', 'l. a. selby-bigge', 'restrictions whatsoever', 'project gutenberg license', 'title', 'enquiry concerning', 'understanding author'])

The `np_counts` property gives a dictionary mapping each noun phrase to the number of times it occurs.

In [18]:
nounPhraseCounts = theBlob.np_counts
nounPhraseCounts

defaultdict(int,
            {'project gutenberg ebook': 2,
             'enquiry concerning': 3,
             'understanding': 4,
             'david hume': 3,
             'l. a. selby-bigge': 2,
             'restrictions whatsoever': 2,
             'project gutenberg license': 2,
             'title': 1,
             'understanding author': 1,
             'david hume l. a. selby-bigge posting date': 1,
             'november': 1,
             'ebook': 1,
             'release date': 1,
             'january': 5,
             'posted': 1,
             'october': 1,
             'language': 1,
             'english': 2,
             'start of this project gutenberg ebook enquiry concerning human understanding': 1,
             'produced': 2,
             'jonathan ingram': 2,
             'project gutenberg distributed proofreaders an enquiry concerning human understanding': 1,
             'by david hume extracted': 1,
             'enquiries concerning': 1,
             'concerni

We can then print only the noun phrases that occur more than 8 times.

In [21]:
for key, value in nounPhraseCounts.items():
    if value > 8:
        print(key)

human nature
common life
human life
human understanding
were
god
] [
sensible qualities
external objects
necessary connexion
deity
voluntary actions
human actions
human testimony
v. _scepticism_
project gutenberg-tm
electronic works
project gutenberg
electronic work
project gutenberg literary archive
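The loop above prints phrases in the order they were first found, and the results include some extraction noise such as "were" and "] [". Below is a minimal sketch for ranking phrases by frequency instead, using a hypothetical helper `frequent_phrases` that also skips entries containing no letters. The sample counts are copied from the `np_counts` output above.

```python
import re

def frequent_phrases(counts, min_count):
    """Return (phrase, count) pairs with count >= min_count, most frequent first.

    Entries without any letters (such as "] [") are skipped; ties are broken
    alphabetically so the order is deterministic.
    """
    kept = [
        (phrase, n) for phrase, n in counts.items()
        if n >= min_count and re.search(r"[a-z]", phrase, re.IGNORECASE)
    ]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# A few of the counts shown in the np_counts output above
sample_counts = {
    "project gutenberg ebook": 2,
    "enquiry concerning": 3,
    "understanding": 4,
    "david hume": 3,
    "title": 1,
    "january": 5,
    "english": 2,
}

print(frequent_phrases(sample_counts, 3))
# prints [('january', 5), ('understanding', 4), ('david hume', 3), ('enquiry concerning', 3)]
```

On the full text, `frequent_phrases(nounPhraseCounts, 9)` would select the same phrases as the `value > 8` loop, but sorted by count.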


### Sentences

TextBlob can split the text into sentences for us. Here we look at sentences 18 to 27 (a Python slice excludes its end index).

In [22]:
theSentences = theBlob.sentences
theSentences[18:28]

[Sentence("Moral philosophy, or the science of human nature, may be treated
 after two different manners; each of which has its peculiar merit, and
 may contribute to the entertainment, instruction, and reformation of
 mankind."), Sentence("The one considers man chiefly as born for action; and as
 influenced in his measures by taste and sentiment; pursuing one object,
 and avoiding another, according to the value which these objects seem to
 possess, and according to the light in which they present themselves."), Sentence("As
 virtue, of all objects, is allowed to be the most valuable, this species
 of philosophers paint her in the most amiable colours; borrowing all
 helps from poetry and eloquence, and treating their subject in an easy
 and obvious manner, and such as is best fitted to please the
 imagination, and engage the affections."), Sentence("They select the most striking
 observations and instances from common life; place opposite characters
 in a proper contrast; and allurin
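Each `Sentence` can be turned into a plain string with `str()`, which makes it easy to pull out every sentence that mentions a topic. The sketch below uses a hypothetical helper `sentences_containing` on plain strings copied from the output above; with TextBlob you could pass `[str(s) for s in theSentences]`.

```python
def sentences_containing(sentences, keyword):
    """Return the sentences that contain keyword, ignoring case."""
    keyword = keyword.lower()
    return [s for s in sentences if keyword in s.lower()]

# Two sentences from the output above, with the line breaks removed
sample_sentences = [
    "Moral philosophy, or the science of human nature, may be treated after two "
    "different manners; each of which has its peculiar merit, and may contribute "
    "to the entertainment, instruction, and reformation of mankind.",
    "The one considers man chiefly as born for action; and as influenced in his "
    "measures by taste and sentiment; pursuing one object, and avoiding another, "
    "according to the value which these objects seem to possess, and according "
    "to the light in which they present themselves.",
]

for sentence in sentences_containing(sample_sentences, "Human Nature"):
    print(sentence)  # prints only the "Moral philosophy..." sentence
```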

### Sentiment of Sentences

We can then get the sentiment polarity of each sentence: a float from -1.0 (most negative) to 1.0 (most positive), where 0.0 means neutral.

In [23]:
theSentiments = [sentence.sentiment.polarity for sentence in theBlob.sentences]
theSentiments[:20]

[0.0,
 0.05,
 0.0,
 -0.3,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.08333333333333333,
 0.0,
 0.0,
 0.0,
 0.0,
 0.16666666666666666]

We can then sort the polarity scores into positive, neutral and negative lists and count how many sentences fall into each.

In [24]:
# Reuse the scores computed above instead of running sentiment analysis three more times
posList = [polarity for polarity in theSentiments if polarity > 0]
neutralList = [polarity for polarity in theSentiments if polarity == 0]
negList = [polarity for polarity in theSentiments if polarity < 0]

print("Positive sentences: ", len(posList), ", Neutral sentences: ", len(neutralList), ", Negative sentences:", len(negList))

Positive sentences:  994 , Neutral sentences:  718 , Negative sentences: 384
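The three separate lists can also be replaced by a single pass that labels each score and counts the labels. This is a minimal sketch, assuming the same thresholds as above (above 0 is positive, exactly 0 is neutral, below 0 is negative). It runs on the 20 scores printed earlier; on the full text you would pass `theSentiments`. The helper `polarity_label` is hypothetical.

```python
from collections import Counter

def polarity_label(polarity):
    """Map a polarity score in [-1.0, 1.0] to a label, using 0 as the boundary."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

# The scores printed by theSentiments[:20] above
sample_scores = [0.0, 0.05, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                 0.0, 0.0, 0.0, 0.0, 0.08333333333333333, 0.0, 0.0, 0.0, 0.0,
                 0.16666666666666666]

label_counts = Counter(polarity_label(p) for p in sample_scores)
print(label_counts["positive"], label_counts["neutral"], label_counts["negative"])  # prints 3 16 1
```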


Here are the strongly negative sentences, those with a polarity below -0.5, each paired with its score.

In [38]:
theNegSentiments = [(sentence, sentence.sentiment.polarity) for sentence in theBlob.sentences if sentence.sentiment.polarity < -.5]
theNegSentiments[:20]

[(Sentence("And though these researches may appear painful and fatiguing,
  it is with some minds as with some bodies, which being endowed with
  vigorous and florid health, require severe exercise, and reap a pleasure
  from what, to the generality of mankind, may seem burdensome and
  laborious."), -0.7), (Sentence("And
  if it be impossible to assign any, this will serve to confirm our
  suspicion."),
  -0.6666666666666666), (Sentence("Were we to attempt a _definition_ of this sentiment, we should,
  perhaps, find it a very difficult, if not an impossible task; in the
  same manner as if we should endeavour to define the feeling of cold or
  passion of anger, to a creature who never had any experience of these
  sentiments."), -0.5233333333333332), (Sentence("It must be a
      miserable imposture, indeed, that does not prevail in
      that contest."),
  -1.0), (Sentence("Belief produced by a majority of chances by an inexplicable
            contrivance of Nature, 46 (cf."), -0.6)
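The comprehension above keeps sentences in the order they appear in the text. To see the most negative ones first, sort the (sentence, polarity) pairs by score. This minimal sketch uses the five scores shown above, paired with the opening words of each sentence to keep it short; with TextBlob you would sort `theNegSentiments` the same way.

```python
# (opening words, polarity) pairs taken from the output above
sample_negative = [
    ("And though these researches may appear painful", -0.7),
    ("And if it be impossible to assign any", -0.6666666666666666),
    ("Were we to attempt a _definition_ of this sentiment", -0.5233333333333332),
    ("It must be a miserable imposture", -1.0),
    ("Belief produced by a majority of chances", -0.6),
]

# Ascending order puts the most negative polarity first
most_negative = sorted(sample_negative, key=lambda pair: pair[1])

for text, polarity in most_negative[:3]:
    print(f"{polarity:.2f}  {text}")
```

Reading the sentences back is worthwhile: the -0.7 sentence is really about the pleasure some minds take in hard study, a reminder that word-based polarity scores can miss the sense of a passage.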

### Lemmatizing 

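TextBlob can also lemmatize words, reducing each one to its dictionary form (for example, "restrictions" becomes "restriction"). Counting lemmas instead of raw words means that different forms of the same word are counted together.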

In [47]:
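from textblob import Word  # Word wraps a single string, e.g. Word("restrictions").lemmatize()
theBlobLow = theBlob.lower()  # lowercase first so capitalized and lowercase forms are counted together
# Pair each lemma with the word it came from; the items in .words are already Word objects
lemWords = [(word.lemmatize(), word) for word in theBlobLow.words]
lemWords[:50]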

[('the', 'the'),
 ('project', 'project'),
 ('gutenberg', 'gutenberg'),
 ('ebook', 'ebook'),
 ('of', 'of'),
 ('an', 'an'),
 ('enquiry', 'enquiry'),
 ('concerning', 'concerning'),
 ('human', 'human'),
 ('understanding', 'understanding'),
 ('by', 'by'),
 ('david', 'david'),
 ('hume', 'hume'),
 ('and', 'and'),
 ('l', 'l'),
 ('a', 'a'),
 ('selby-bigge', 'selby-bigge'),
 ('this', 'this'),
 ('ebook', 'ebook'),
 ('is', 'is'),
 ('for', 'for'),
 ('the', 'the'),
 ('use', 'use'),
 ('of', 'of'),
 ('anyone', 'anyone'),
 ('anywhere', 'anywhere'),
 ('at', 'at'),
 ('no', 'no'),
 ('cost', 'cost'),
 ('and', 'and'),
 ('with', 'with'),
 ('almost', 'almost'),
 ('no', 'no'),
 ('restriction', 'restrictions'),
 ('whatsoever', 'whatsoever'),
 ('you', 'you'),
 ('may', 'may'),
 ('copy', 'copy'),
 ('it', 'it'),
 ('give', 'give'),
 ('it', 'it'),
 ('away', 'away'),
 ('or', 'or'),
 ('re-use', 're-use'),
 ('it', 'it'),
 ('under', 'under'),
 ('the', 'the'),
 ('term', 'terms'),
 ('of', 'of'),
 ('the', 'the')]
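To see how lemmatizing helps word counts, compare counting raw words with counting lemmas. The sketch below uses a small hypothetical word list and a hand-written `lemma_of` lookup standing in for `Word.lemmatize()`, so it runs without TextBlob; with TextBlob you would count `word.lemmatize()` for each word in `theBlobLow.words`. Note that `Word.lemmatize()` treats words as nouns unless you pass a part of speech such as `"v"`, which is why "is" is left unchanged in the output above.

```python
from collections import Counter

# Hypothetical stand-in for Word.lemmatize(): maps inflected forms to their lemma
lemma_of = {"restrictions": "restriction", "terms": "term"}

# Hypothetical word list with singular and plural forms mixed together
words = ["restriction", "restrictions", "term", "terms", "terms", "copy"]

raw_counts = Counter(words)
lemma_counts = Counter(lemma_of.get(word, word) for word in words)

print(raw_counts)    # five distinct forms, each counted separately
print(lemma_counts)  # three lemmas: term 3, restriction 2, copy 1
```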

### Exercise

TextBlob has other useful tools, such as a simple [Text Classifier](https://textblob.readthedocs.io/en/dev/classifiers.html).

For this exercise, try creating a text classifier for sentences. Write a list of sentences with contrasting sentiments, for example loving, neutral and hateful sentences. Use these to train the classifier, and then test it on new sentences.
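TextBlob's classifiers, such as `NaiveBayesClassifier` in `textblob.classifiers`, are trained on a list of (text, label) tuples. As a starting point, here is a sketch that builds that list from sentences grouped by label and holds one sentence per label back for testing. The sentences and the helper `split_examples` are hypothetical; replace them with your own.

```python
# Hypothetical example sentences grouped by the sentiment we want to teach
examples = {
    "loving": [
        "I adore this beautiful book.",
        "What a wonderful, kind friend you are.",
        "I love spending time with my family.",
    ],
    "neutral": [
        "The train leaves at nine o'clock.",
        "This chapter is twelve pages long.",
        "The meeting is on Tuesday.",
    ],
    "hateful": [
        "I hate this awful weather.",
        "That was a horrible, cruel thing to say.",
        "I despise waiting in long lines.",
    ],
}

def split_examples(grouped):
    """Turn {label: [sentences]} into (text, label) tuples for training and testing.

    The last sentence for each label is held back as test data.
    """
    train, test = [], []
    for label, sentences in grouped.items():
        train.extend((text, label) for text in sentences[:-1])
        test.extend((text, label) for text in sentences[-1:])
    return train, test

train, test = split_examples(examples)
print(len(train), "training examples,", len(test), "test examples")  # prints 6 training examples, 3 test examples
```

With TextBlob installed, the next step would be `cl = NaiveBayesClassifier(train)`, then `cl.classify("some new sentence")` to label a sentence and `cl.accuracy(test)` to check the held-back examples.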

Some things to think about:

* 