Text Classification with TextBlob
--------

Let's experiment with an existing library for text classification: [TextBlob](https://textblob.readthedocs.org/). Its `NaiveBayesClassifier` is built on the Naive Bayes classifier from NLTK.

Training
-----

In [53]:
# IPython magic: clear all user variables (-f: no confirmation prompt, -s: soft reset)
%reset -fs

In [54]:
from textblob.classifiers import NaiveBayesClassifier

In [72]:
train = [("🐈 🐯 🐱 🐩 🐱",                   'cat'),
         ("🐶 🐶 🐈 🐶 🐩 🐈 🐶 🐶",         'dog'),
         ("🐈 🐈 🐯 🐶 🐈",                   'cat'),
         ("🐈 🐈 🐈",                         'cat'),
         ("🐶 🐶 🐯 🐈 🐩 🐱 🐩 🐶 🐩 🐶",   'dog'),
        ]

In [73]:
cl = NaiveBayesClassifier(train)
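What does `NaiveBayesClassifier(train)` actually learn from? TextBlob's default feature extractor turns each document into one boolean `contains(word)` feature per word in the training vocabulary, which is where names like `contains(🐶)` in `show_informative_features()` come from. The sketch below imitates that idea in plain Python; the whitespace tokenization and the helper name `contains_features` are our own assumptions, not TextBlob's API.

```python
# Sketch only: imitates TextBlob-style contains(word) features.
# Assumption: tokens come from a plain whitespace split (TextBlob uses its own tokenizer).

train = [("🐈 🐯 🐱 🐩 🐱", "cat"),
         ("🐶 🐶 🐈 🐶 🐩 🐈 🐶 🐶", "dog"),
         ("🐈 🐈 🐯 🐶 🐈", "cat"),
         ("🐈 🐈 🐈", "cat"),
         ("🐶 🐶 🐯 🐈 🐩 🐱 🐩 🐶 🐩 🐶", "dog")]

# The vocabulary is every distinct token that appears in any training document.
vocabulary = sorted({token for doc, _ in train for token in doc.split()})

def contains_features(document, vocabulary):
    """Hypothetical helper: one boolean feature per vocabulary word.
    Words outside the vocabulary are ignored, and repeats do not matter."""
    tokens = set(document.split())
    return {f"contains({word})": word in tokens for word in vocabulary}

print(contains_features("🐶 🐶", vocabulary))
print(contains_features("🐬", vocabulary))
```

In this scheme a word never seen in training, such as 🐬, gets no feature of its own: every known word is simply marked `False`, so the decision rests on those absences and on the class prior.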

In [74]:
cl.classify("🐱")

'cat'

In [75]:
cl.classify("🐶 🐶")

'cat'

In [76]:
cl.classify("🐶 🐱")

'dog'

In [77]:
cl.classify("🐈 🐈 🐶 🐶 🐩 🐯 🐯")

'dog'

In [78]:
cl.classify("🐬")

'cat'
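Why does `"🐶 🐶"` come out as `'cat'`? Naive Bayes multiplies the prior P(label) by P(feature value | label) for every vocabulary word, including the words that are absent. The from-scratch sketch below is a simplified stand-in, not TextBlob's or NLTK's code. It assumes whitespace tokenization and add-0.5 smoothing (the "expected likelihood" estimate NLTK's Naive Bayes uses by default); details such as the handling of feature values never seen in training may differ. On the five inputs above it produces the same labels as `cl.classify`.

```python
import math

# Training data copied from the cell above.
train = [("🐈 🐯 🐱 🐩 🐱", "cat"),
         ("🐶 🐶 🐈 🐶 🐩 🐈 🐶 🐶", "dog"),
         ("🐈 🐈 🐯 🐶 🐈", "cat"),
         ("🐈 🐈 🐈", "cat"),
         ("🐶 🐶 🐯 🐈 🐩 🐱 🐩 🐶 🐩 🐶", "dog")]

SMOOTH = 0.5  # assumption: add-0.5 ("expected likelihood") smoothing

vocabulary = sorted({tok for doc, _ in train for tok in doc.split()})
labels = sorted({label for _, label in train})
docs_by_label = {label: [set(doc.split()) for doc, lbl in train if lbl == label]
                 for label in labels}

def log_prior(label):
    """Smoothed log P(label)."""
    n = len(docs_by_label[label])
    return math.log((n + SMOOTH) / (len(train) + SMOOTH * len(labels)))

def p_contains(word, label):
    """Smoothed P(contains(word) = True | label); two outcomes, True and False."""
    docs = docs_by_label[label]
    hits = sum(word in doc for doc in docs)
    return (hits + SMOOTH) / (len(docs) + 2 * SMOOTH)

def classify(document):
    """Return the label with the highest log posterior (up to a shared constant)."""
    tokens = set(document.split())
    scores = {}
    for label in labels:
        score = log_prior(label)
        for word in vocabulary:  # absent vocabulary words count as evidence too
            p = p_contains(word, label)
            score += math.log(p if word in tokens else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)

for text in ["🐱", "🐶 🐶", "🐶 🐱", "🐈 🐈 🐶 🐶 🐩 🐯 🐯", "🐬"]:
    print(f"{text!r} -> {classify(text)!r}")
```

In this sketch the largest single factor for `"🐶 🐶"` is the missing 🐩: both dog documents contain it, so its absence counts strongly against `dog`, and the 3-to-2 prior also leans toward `cat`. Repeating 🐶 changes nothing, because the features record presence, not counts.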

In [79]:
cl.show_informative_features()

Most Informative Features
             contains(🐶) = True              dog : cat    =      2.2 : 1.0
             contains(🐩) = True              dog : cat    =      2.2 : 1.0
             contains(🐯) = False             dog : cat    =      1.3 : 1.0
             contains(🐱) = True              dog : cat    =      1.3 : 1.0
             contains(🐯) = True              cat : dog    =      1.2 : 1.0
             contains(🐱) = False             cat : dog    =      1.2 : 1.0
             contains(🐈) = True              dog : cat    =      1.0 : 1.0
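Each row of `show_informative_features()` is a likelihood ratio: for one feature value, the probability under the more likely label divided by the probability under the less likely one. Under the same assumed add-0.5 smoothing, P(contains(🐶) = True | dog) = (2 + 0.5) / (2 + 1) ≈ 0.83 and P(contains(🐶) = True | cat) = (1 + 0.5) / (3 + 1) = 0.375, a ratio of about 2.2. The sketch below (our own helper names, not NLTK's code) recomputes several rows; rows for a feature that never varies in training, such as 🐈, which appears in every document, may not match exactly.

```python
# Training data copied from the cell above.
train = [("🐈 🐯 🐱 🐩 🐱", "cat"),
         ("🐶 🐶 🐈 🐶 🐩 🐈 🐶 🐶", "dog"),
         ("🐈 🐈 🐯 🐶 🐈", "cat"),
         ("🐈 🐈 🐈", "cat"),
         ("🐶 🐶 🐯 🐈 🐩 🐱 🐩 🐶 🐩 🐶", "dog")]

def p_value(word, value, label, smooth=0.5):
    """Smoothed P(contains(word) == value | label), assuming add-0.5
    smoothing over the two outcomes True and False."""
    docs = [set(doc.split()) for doc, lbl in train if lbl == label]
    hits = sum((word in d) == value for d in docs)
    return (hits + smooth) / (len(docs) + 2 * smooth)

def informative_ratio(word, value):
    """Return (more likely label, less likely label, ratio) for one feature value."""
    p = {label: p_value(word, value, label) for label in ("cat", "dog")}
    top, bottom = sorted(p, key=p.get, reverse=True)
    return top, bottom, p[top] / p[bottom]

for word, value in [("🐶", True), ("🐩", True), ("🐯", False), ("🐱", True)]:
    top, bottom, r = informative_ratio(word, value)
    print(f"contains({word}) = {value!s:<5}  {top} : {bottom} = {r:.1f} : 1.0")
```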



----

TextBlob with words
-----

In [63]:
train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg')
]
test = [
    ('The beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]

In [64]:
cl = NaiveBayesClassifier(train)

How do we measure success for Text Classification?
--------
 
$$\text{Accuracy} = \frac{\text{Correct}}{\text{Total}}$$

That is, accuracy is the fraction of test documents whose predicted label matches the true label.

In [65]:
print(f"{cl.accuracy(test):.2}")

0.83


In [66]:
for review, label in test:
    print(review)
    print("  observed:  ", label)
    print("  predicted: ", cl.classify(review), end="\n\n")

The beer was good.
  observed:   pos
  predicted:  pos

I do not enjoy my job
  observed:   neg
  predicted:  neg

I ain't feeling dandy today.
  observed:   neg
  predicted:  neg

I feel amazing!
  observed:   pos
  predicted:  pos

Gary is a friend of mine.
  observed:   pos
  predicted:  neg

I can't believe I'm doing this.
  observed:   neg
  predicted:  neg
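The accuracy formula can be checked by hand against this printout. The sketch below transcribes the observed and predicted labels from the six test sentences above (in the same order) and applies Correct / Total; `accuracy` here is our own helper, not TextBlob's method.

```python
# Observed and predicted labels transcribed from the printout above,
# in the same order as the test sentences.
observed  = ["pos", "neg", "neg", "pos", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "neg", "neg"]

def accuracy(observed, predicted):
    """Fraction of documents whose predicted label equals the observed label."""
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)

print(f"{accuracy(observed, predicted):.2}")
```

Only "Gary is a friend of mine." is misclassified, so accuracy is 5 / 6 ≈ 0.83, the same value `cl.accuracy(test)` printed.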



Confusion Matrix
------
![Confusion matrix](images/confusion_matrix.png)
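A confusion matrix counts every (observed, predicted) pair, so correct predictions sit on the diagonal and each kind of error gets its own cell. As a minimal sketch, here it is computed in plain Python for the six test sentences above, using the labels transcribed from the earlier printout (this is independent of the figure, whose exact contents we do not assume).

```python
from collections import Counter

# Labels transcribed from the test-set printout above, in the same order.
observed  = ["pos", "neg", "neg", "pos", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "neg", "neg"]
labels = ["pos", "neg"]

# Count each (observed, predicted) pair; diagonal cells are correct predictions.
counts = Counter(zip(observed, predicted))

# Rows are observed labels, columns are predicted labels.
print(f"{'obs/pred':>8}", *(f"{label:>4}" for label in labels))
for obs in labels:
    print(f"{obs:>8}", *(f"{counts[(obs, pred)]:>4}" for pred in labels))
```

For this test set: 2 `pos` sentences predicted `pos`, 1 `pos` sentence predicted `neg` (the "Gary" sentence), 3 `neg` sentences predicted `neg`, and no `neg` sentence predicted `pos`.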

In [70]:
# Let's see how well Naive Bayes generalizes to sentences it has not seen
cl.classify("Their burgers are amazing")  # => 'pos'

'pos'

In [71]:
cl.classify("I don't like their pizza.")  # => 'neg'

'neg'

In [69]:
cl.show_informative_features()

Most Informative Features
          contains(this) = True              neg : pos    =      2.3 : 1.0
          contains(this) = False             pos : neg    =      1.8 : 1.0
            contains(an) = False             neg : pos    =      1.6 : 1.0
          contains(This) = False             neg : pos    =      1.6 : 1.0
             contains(I) = False             pos : neg    =      1.4 : 1.0
             contains(I) = True              neg : pos    =      1.4 : 1.0
         contains(beers) = False             neg : pos    =      1.2 : 1.0
           contains(n't) = False             pos : neg    =      1.2 : 1.0
          contains(love) = False             neg : pos    =      1.2 : 1.0
       contains(amazing) = False             neg : pos    =      1.2 : 1.0
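The table lists `contains(this)` and `contains(This)` as separate features: the default extractor matches words case-sensitively, so a sentence-initial "This" counts as a different word from "this". One simple, hypothetical remedy is to lowercase the data before training. `lowercase_dataset` below is our own helper, not part of TextBlob.

```python
def lowercase_dataset(dataset):
    """Hypothetical helper: return (text, label) pairs with the text lowercased,
    so 'This' and 'this' become the same token for a case-sensitive extractor."""
    return [(text.lower(), label) for text, label in dataset]

# Training data copied from the cell above.
train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg'),
]

train_lower = lowercase_dataset(train)
for text, label in train_lower[:3]:
    print(label, text)
```

With TextBlob one would then train with `NaiveBayesClassifier(lowercase_dataset(train))` and lowercase each input before calling `cl.classify`. Whether this helps on a dataset this small is something to measure with `cl.accuracy(test)`, not assume.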



----