# NLP tasks using TextBlob
A simple API for common natural language processing (NLP) tasks, including rule-based sentiment analysis.

## I. Part-of-Speech (POS) Tagging
Assigning each token (word) in a text its grammatical category, such as noun, verb, or adjective.
1. Create a `TextBlob` object by passing it a string.
2. Access the property or method of the `TextBlob` for the task you need.
    - `tags` returns a list of tuples of the form `(word, POS tag)`.
        - See https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html for the list of Penn Treebank POS tags.

```python
from textblob import TextBlob

wiki = TextBlob("Python is a high-level, general-purpose programming language.")

# part-of-speech (POS) tagging
wiki.tags
```

```
[('Python', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('high-level', 'JJ'),
 ('general-purpose', 'JJ'),
 ('programming', 'NN'),
 ('language', 'NN')]
```
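
Because `tags` returns ordinary `(word, tag)` tuples, you can post-process the result with plain Python. The sketch below filters words by Penn Treebank tag prefix (`NN` for nouns, `JJ` for adjectives). To run without TextBlob, it hard-codes the `wiki.tags` output shown above; the helper name `words_with_tag_prefix` is hypothetical.

```python
# The wiki.tags output from above, copied as plain data so this runs without TextBlob.
TAGGED = [
    ("Python", "NNP"),
    ("is", "VBZ"),
    ("a", "DT"),
    ("high-level", "JJ"),
    ("general-purpose", "JJ"),
    ("programming", "NN"),
    ("language", "NN"),
]


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with `prefix` ("NN" matches NN, NNS, NNP, ...)."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


nouns = words_with_tag_prefix(TAGGED, "NN")
adjectives = words_with_tag_prefix(TAGGED, "JJ")
print(nouns)
print(adjectives)
```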

## II. Noun Phrase Extraction
Similarly, noun phrases are accessed through the `noun_phrases` property.

```python
wiki.noun_phrases
```

```
WordList(['python'])
```
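
For this sentence TextBlob returned only `python`. To illustrate what noun phrase chunking means, here is a naive sketch that is *not* TextBlob's extractor: it groups runs of adjectives followed by nouns in the tagged output shown earlier (hard-coded below), so it yields a different, longer phrase than TextBlob does. The function name `naive_noun_phrases` is hypothetical.

```python
# wiki.tags output from above, copied so the sketch runs without TextBlob.
TAGGED = [("Python", "NNP"), ("is", "VBZ"), ("a", "DT"), ("high-level", "JJ"),
          ("general-purpose", "JJ"), ("programming", "NN"), ("language", "NN")]


def naive_noun_phrases(tagged):
    """Collect runs of adjectives (JJ*) followed by one or more nouns (NN*)."""
    phrases, current, seen_noun = [], [], False
    for word, tag in tagged:
        if tag.startswith("JJ"):
            if seen_noun:  # an adjective after a noun starts a new phrase
                phrases.append(" ".join(current))
                current, seen_noun = [], False
            current.append(word)
        elif tag.startswith("NN"):
            current.append(word)
            seen_noun = True
        else:
            if seen_noun:
                phrases.append(" ".join(current))
            current, seen_noun = [], False  # adjectives not followed by a noun are dropped
    if seen_noun:
        phrases.append(" ".join(current))
    return phrases


print(naive_noun_phrases(TAGGED))
```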

## III. Sentiment Analysis
The `sentiment` property returns a namedtuple of the form `Sentiment(polarity, subjectivity)`. The polarity score is a float within the range [-1.0, 1.0], from most negative to most positive. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.

```python
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
testimonial.sentiment
```

```
Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)
```

```python
testimonial.sentiment.polarity
```

```
0.39166666666666666
```
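
A common next step is to turn the polarity score into a coarse label. This sketch does that with a hypothetical helper, `polarity_label`, and an arbitrary neutral band of 0.1 on either side of zero; the threshold is an assumption you would tune for your own data, not a TextBlob default.

```python
def polarity_label(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to 'negative', 'neutral' or 'positive'.

    Scores strictly between -threshold and threshold count as neutral;
    the default threshold is an arbitrary assumption.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity >= threshold:
        return "positive"
    if polarity <= -threshold:
        return "negative"
    return "neutral"


print(polarity_label(0.39166666666666666))  # polarity of `testimonial` above
```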

## IV. Tokenization 
You can break down TextBlobs into words or sentences. 

```python
zen = TextBlob("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Simple is better than complex.")
zen.words
```

```
WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])
```

```python
zen.sentences
```

```
[Sentence("Beautiful is better than ugly."),
 Sentence("Explicit is better than implicit."),
 Sentence("Simple is better than complex.")]
```

Sentence objects have the same properties and methods as TextBlobs.

```python
for sentence in zen.sentences:
    print(sentence.sentiment)
```

```
Sentiment(polarity=0.2166666666666667, subjectivity=0.8333333333333334)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.06666666666666667, subjectivity=0.41904761904761906)
```
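
TextBlob relies on NLTK's tokenizers by default. For intuition only, here is a much cruder regex-based sketch of sentence and word tokenization; the function names are hypothetical, and the sentence splitter deliberately shows a classic failure on abbreviations such as "Dr.", which trained tokenizers are designed to handle.

```python
import re

ZEN = ("Beautiful is better than ugly. "
       "Explicit is better than implicit. "
       "Simple is better than complex.")


def naive_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (breaks on abbreviations)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def naive_words(text):
    """Keep runs of letters, digits, apostrophes and hyphens; punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9'-]+", text)


print(naive_sentences(ZEN))
print(naive_words(ZEN))
print(naive_sentences("Dr. Smith arrived."))  # wrongly split after "Dr."
```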


## V. Word Inflection and Lemmatization
Each word in `TextBlob.words` or `Sentence.words` is a `Word` object (a subclass of `str`) with useful methods, e.g. for word inflection.

```python
sentence = TextBlob("Use 4 spaces per indentation level.")
sentence.words
```

```
WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
```

```python
# singularize the third word, 'spaces'
sentence.words[2].singularize()
```

```
'space'
```

```python
sentence.words[-1].pluralize()
```

```
'levels'
```
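
Inflection of this kind is typically implemented as an exception table for irregular forms plus suffix rules. A heavily simplified, hypothetical English pluralizer (`toy_pluralize`) shows the idea; real inflection code needs many more rules and exceptions, and the `octopodes` entry simply mirrors TextBlob's output in the WordLists section below.

```python
import re

# Hypothetical irregular plurals; a real inflection library has far more rules and exceptions.
IRREGULAR_PLURALS = {"octopus": "octopodes", "child": "children", "mouse": "mice"}


def toy_pluralize(noun):
    """Pluralize a singular English noun with an exception table and three suffix rules."""
    lower = noun.lower()
    if lower in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[lower]
    if re.search(r"(s|x|z|ch|sh)$", lower):
        return noun + "es"          # box -> boxes, church -> churches
    if re.search(r"[^aeiou]y$", lower):
        return noun[:-1] + "ies"    # city -> cities (but day -> days)
    return noun + "s"               # level -> levels


print([toy_pluralize(w) for w in ["level", "box", "city", "day", "octopus"]])
```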

Words can be lemmatized by calling the `lemmatize` method. <br>
<b>lemmatize</b>: reduce an inflected or variant form of a word to its base dictionary form (its lemma), e.g. "went" to "go".

```python
from textblob import Word

w = Word("octopi")
w.lemmatize()
```

```
'octopus'
```

```python
w = Word("went")
w.lemmatize("v")  # "v" is the WordNet part of speech for verbs; the default is noun
```

```
'go'
```
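
The part-of-speech argument matters because lemmatization depends on a word's grammatical role. Here is a toy lemmatizer built from an exception table plus suffix rules (`toy_lemmatize`, `EXCEPTIONS` and `SUFFIX_RULES` are all hypothetical). It defaults to nouns, so "went" is left unchanged unless you pass "v". TextBlob instead consults WordNet, which avoids errors like the naive verb rules below turning "making" into "mak".

```python
# Hypothetical exception table keyed by (word, WordNet POS letter).
EXCEPTIONS = {("went", "v"): "go", ("octopi", "n"): "octopus"}

# Hypothetical suffix rules per POS, tried in order; the first match wins.
SUFFIX_RULES = {
    "n": [("ies", "y"), ("s", "")],
    "v": [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")],
}


def toy_lemmatize(word, pos="n"):
    """Look up irregular forms first, then strip the first matching suffix."""
    word = word.lower()
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        # the length guard stops short words such as 'is' from being stripped
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word


print(toy_lemmatize("went"), toy_lemmatize("went", "v"), toy_lemmatize("octopi"))
```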

## VI. WordNet Integration
You can access the synsets for a `Word` via the `synsets` property or the `get_synsets` method, optionally passing in a part of speech. <br> 
<b>synset</b>: a set of synonyms that share a common meaning<br>
Besides `VERB` (used below), `textblob.wordnet` provides the part-of-speech constants <b>NOUN</b>, <b>ADJ</b>, and <b>ADV</b>. A synset is identified by a three-part name of the form `word.pos.nn`, where `nn` is a sense number.

```python
from textblob import Word
from textblob.wordnet import VERB

word = Word("octopus")
word.synsets
```

```
[Synset('octopus.n.01'), Synset('octopus.n.02')]
```

```python
Word("hack").get_synsets(pos=VERB)
```

```
[Synset('chop.v.05'),
 Synset('hack.v.02'),
 Synset('hack.v.03'),
 Synset('hack.v.04'),
 Synset('hack.v.05'),
 Synset('hack.v.06'),
 Synset('hack.v.07'),
 Synset('hack.v.08')]
```
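
A small parser makes the `word.pos.nn` naming scheme explicit. `parse_synset_name` and `SynsetName` are hypothetical helpers that work on the name strings shown above; `rsplit` keeps any dots inside the lemma intact. Parsing the list also shows that one verb sense of "hack" belongs to a synset named after another of its lemmas, `chop`.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    lemma: str
    pos: str    # WordNet POS letter, e.g. 'n' (noun) or 'v' (verb)
    sense: int  # sense number, e.g. 1 for '01'


def parse_synset_name(name):
    """Split a synset name such as 'octopus.n.01' into its three parts."""
    # rsplit keeps any dots inside the lemma itself intact
    lemma, pos, sense = name.rsplit(".", 2)
    return SynsetName(lemma, pos, int(sense))


# Synset names for the verb "hack", copied from the output above.
HACK_VERB_SYNSETS = ["chop.v.05", "hack.v.02", "hack.v.03", "hack.v.04",
                     "hack.v.05", "hack.v.06", "hack.v.07", "hack.v.08"]
for name in HACK_VERB_SYNSETS:
    print(parse_synset_name(name))
```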

## VII. WordLists
A `WordList` is just a Python list with additional methods.

```python
animals = TextBlob("cat dog octopus")
animals.words
```

```
WordList(['cat', 'dog', 'octopus'])
```

```python
animals.words.pluralize()
```

```
WordList(['cats', 'dogs', 'octopodes'])
```
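
To illustrate what "a Python list with additional methods" means, here is a hypothetical `MiniWordList` that subclasses `list`. It is not TextBlob's class, though `WordList` follows the same subclassing idea: every ordinary list operation keeps working, and the extra methods sit on top.

```python
class MiniWordList(list):
    """Hypothetical list subclass: all list operations still work, plus two extras."""

    def upper(self):
        # return a new MiniWordList so the extra methods remain available on the result
        return MiniWordList(word.upper() for word in self)

    def count(self, word, case_sensitive=False):
        # overrides list.count with an optional case-insensitive comparison
        if case_sensitive:
            return super().count(word)
        return sum(1 for w in self if w.lower() == word.lower())


words = MiniWordList(["cat", "Dog", "dog"])
words.append("owl")  # inherited from list
print(words.upper(), words.count("dog"), words.count("dog", case_sensitive=True))
```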

## VIII. Spelling Correction
Use the `correct()` method to attempt spelling correction.

```python
b = TextBlob("I havv goood speling!")
print(b.correct())
```

```
I have good spelling!
```


<b>Word</b> objects have a `spellcheck()` method that returns a list of `(word, confidence)` tuples with spelling suggestions.

```python
from textblob import Word

w = Word('falibility')
w.spellcheck()
```

```
[('fallibility', 1.0)]
```
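
TextBlob's spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector": generate candidates within one or two edits of the input and pick the most frequent known word. The sketch below reimplements that idea in plain Python over a tiny, made-up vocabulary (`WORD_COUNTS` is hypothetical; a real corrector needs counts from a large corpus). The confidence in `suggest` is each candidate's share of the candidates' total count, an assumption that may not match how TextBlob computes it.

```python
import string
from collections import Counter

# Hypothetical toy vocabulary with made-up counts.
WORD_COUNTS = Counter({"i": 50, "have": 40, "good": 30, "spelling": 10, "goods": 2, "hive": 1})
LETTERS = string.ascii_lowercase


def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only candidates that appear in the vocabulary."""
    return {w for w in words if w in WORD_COUNTS}


def candidates(word):
    """Prefer the word itself, then words one edit away, then two edits away."""
    return (known([word])
            or known(edits1(word))
            or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
            or {word})


def correct(word):
    """Return the most frequent candidate (lower-cased)."""
    word = word.lower()
    return max(candidates(word), key=lambda w: WORD_COUNTS[w])


def suggest(word):
    """Return (candidate, confidence) pairs, highest confidence first."""
    word = word.lower()
    cands = candidates(word)
    total = sum(WORD_COUNTS[w] for w in cands)
    if total == 0:
        return [(word, 0.0)]
    return sorted(((w, WORD_COUNTS[w] / total) for w in cands),
                  key=lambda pair: pair[1], reverse=True)


print(" ".join(correct(w) for w in "I havv goood speling".split()))
print(suggest("speling"))
```

Note that this sketch lower-cases its output, so the first line prints `i have good spelling`; a fuller version would restore the original capitalization and punctuation.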

## IX. Get Word and Noun Phrase Frequencies 
There are two ways to get the frequency of a word or noun phrase in a <b>TextBlob</b>. <br> 
1. Through the `word_counts` dictionary <br>
If you access the frequencies this way, the search will not be case sensitive, and words that are not found will have a frequency of 0. 

```python
monty = TextBlob("We are no longer the Knights who say Ni. "
                 "We are now the Knights who say Ekki ekki ekki PTANG.")
monty.word_counts['ekki']
```

```
3
```

2. Through the `count()` method of `words` (a `WordList`).

```python
monty.words.count('ekki')
```

```
3
```

You can specify whether or not the search should be case-sensitive (default is False).

```python
monty.words.count('ekki', case_sensitive=True)
```

```
2
```

Each of these methods can also be used with noun phrases.

```python
wiki.noun_phrases.count('python')
```

```
1
```
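
Conceptually, `word_counts` is a case-insensitive frequency table. A plain-Python sketch using `collections.Counter` reproduces the counts shown above; the regex tokenizer is a simplifying assumption, since TextBlob tokenizes differently.

```python
import re
from collections import Counter

MONTY = ("We are no longer the Knights who say Ni. "
         "We are now the Knights who say Ekki ekki ekki PTANG.")

# Simplified tokenizer (an assumption): runs of ASCII letters only.
words = re.findall(r"[A-Za-z]+", MONTY)

# Lower-casing before counting makes lookups case-insensitive;
# Counter returns 0 for words it has never seen, like word_counts.
word_counts = Counter(word.lower() for word in words)

print(word_counts["ekki"])  # -> 3 (case-insensitive)
print(words.count("ekki"))  # -> 2 (plain list.count is case-sensitive)
```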

## X. Parsing
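Use the `parse()` method to run a shallow parser over the text; it annotates each token with its part-of-speech tag and phrase-chunk information.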

```python
b = TextBlob("And now for something completely different.")
print(b.parse())
```

```
And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O
```
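
In this output each token reads `word/POS/chunk/PNP`: the Penn Treebank tag, a chunk tag in IOB notation (`B-` begins a chunk, `I-` continues it, `O` is outside any chunk), and a prepositional noun phrase (PNP) tag. The sketch below splits the string shown above back into tuples and groups the chunks; `parse_tokens` and `group_chunks` are hypothetical helpers, and the field order is read off this example.

```python
PARSED = ("And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP "
          "completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O")


def parse_tokens(parsed):
    """Turn 'word/POS/chunk/PNP' tokens into 4-tuples."""
    # split from the right so a word that itself contains '/' stays intact
    return [tuple(token.rsplit("/", 3)) for token in parsed.split()]


def group_chunks(tokens):
    """Collect (chunk_type, words) pairs from IOB chunk tags; 'O' tokens are skipped."""
    chunks = []
    for word, _pos, chunk, _pnp in tokens:
        if chunk.startswith("B-"):
            chunks.append((chunk[2:], [word]))
        elif chunk.startswith("I-") and chunks and chunks[-1][0] == chunk[2:]:
            chunks[-1][1].append(word)
    return chunks


print(group_chunks(parse_tokens(PARSED)))
```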


## XI. TextBlobs are like Python strings!
You can use Python's slice syntax.

```python
zen[0:19]
```

```
TextBlob("Beautiful is better")
```

You can use common string methods.

```python
zen.upper()
```

```
TextBlob("BEAUTIFUL IS BETTER THAN UGLY. EXPLICIT IS BETTER THAN IMPLICIT. SIMPLE IS BETTER THAN COMPLEX.")
```

```python
zen.find("Simple")
```

```
65
```

You can make comparisons between TextBlobs and strings.

```python
apple_blob = TextBlob('apples')
banana_blob = TextBlob('bananas')
apple_blob < banana_blob
```

```
True
```

```python
apple_blob == 'apples'
```

```
True
```

You can concatenate and interpolate TextBlobs and strings.

```python
apple_blob + ' and ' + banana_blob
```

```
TextBlob("apples and bananas")
```

```python
"{0} and {1}".format(apple_blob, banana_blob)
```

```
'apples and bananas'
```

## XII. n-grams
The `TextBlob.ngrams()` method returns a list of `WordList` objects, each containing <b>n</b> successive words.

```python
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)
```

```
[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
```
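
An n-gram list is a sliding window of width n over the token sequence, so a sequence of k words yields k - n + 1 n-grams. A plain-Python sketch (the `ngrams` helper is hypothetical and returns tuples rather than `WordList` objects):

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a tuple (a sliding window of width n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


print(ngrams("Now is better than never".split(), 3))
```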

## XIII. Get Start and End Indices of Sentences
Use `sentence.start` and `sentence.end` to get the indices where a sentence starts and ends within a <b>TextBlob</b>.

```python
for s in zen.sentences:
    print(s)
    print("---- Starts at index {}, Ends at index {}".format(s.start, s.end))
```

```
Beautiful is better than ugly.
---- Starts at index 0, Ends at index 30
Explicit is better than implicit.
---- Starts at index 31, Ends at index 64
Simple is better than complex.
---- Starts at index 65, Ends at index 95
```
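
These are character offsets into the original text with an exclusive `end`, so slicing the text from `start` to `end` recovers the sentence (the first sentence is 30 characters long and ends at index 30). A plain-Python sketch computes the same offsets by searching for each sentence from the end of the previous one; the regex sentence splitter and the `sentence_spans` helper are simplifying assumptions, not TextBlob's implementation.

```python
import re

ZEN = ("Beautiful is better than ugly. "
       "Explicit is better than implicit. "
       "Simple is better than complex.")


def sentence_spans(text):
    """Return (sentence, start, end) triples; end is exclusive, like Python slicing."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    spans, offset = [], 0
    for sentence in sentences:
        start = text.index(sentence, offset)  # search from the end of the previous sentence
        end = start + len(sentence)
        spans.append((sentence, start, end))
        offset = end
    return spans


for sentence, start, end in sentence_spans(ZEN):
    print(sentence)
    print("---- Starts at index {}, Ends at index {}".format(start, end))
```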
