<p>&nbsp;</p>
<h1 style="text-align: center;"><strong>Creating a Classifier for</strong></h1>
<h2 style="text-align: center;"><strong>Natural Language Processing</strong></h2>
<p>&nbsp;</p><p>&nbsp;</p><p>&nbsp;</p>

# Introduction

In this example we will create a simple phrase classifier using Naive Bayes and TextBlob.

We will use:

- TextBlob: a library that provides a simple API for performing basic NLP tasks.
- Naive Bayes: a probabilistic classifier based on Bayes' theorem, with strong (naive) assumptions of independence between features.
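To make the independence assumption concrete, here is a minimal sketch of a word-count Naive Bayes classifier in plain Python. It is not TextBlob's implementation; the tiny `samples` list, the add-one smoothing, and the helper names `log_score` and `predict` are assumptions made only for illustration.

```python
from collections import Counter, defaultdict
import math

# Tiny hypothetical training set (not the tutorial's data).
samples = [
    ("great party", "Positive"),
    ("great food", "Positive"),
    ("horrible party", "Negative"),
    ("horrible food", "Negative"),
]

# Count how often each label and each (label, word) pair occurs.
label_counts = Counter(label for _, label in samples)
word_counts = defaultdict(Counter)
for text, label in samples:
    word_counts[label].update(text.split())
vocabulary = {word for text, _ in samples for word in text.split()}


def log_score(text, label):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(label_counts[label] / len(samples))
    for word in text.split():
        # Independence assumption: each word contributes its own factor.
        count = word_counts[label][word] + 1
        score += math.log(count / (total_words + len(vocabulary)))
    return score


def predict(text):
    """Return the label with the highest score for the given text."""
    return max(label_counts, key=lambda label: log_score(text, label))


print(predict("great view"))
```

The unseen word "view" gets the same smoothed probability under both labels, so the decision rests on "great", which only appears in positive samples.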

Installation:

1. pip install -U textblob 
2. python -m textblob.download_corpora

**Imports and Parameters:**

In [1]:
from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob
import pandas as pd

***

# Datasets

**Training Dataset:**

In [2]:
train_set = [
    ('I love eating hamburger', 'Positive'),
    ('This place is horrible', 'Negative'),
    ('You are a lovely person', 'Positive'),
    ("You're a horrible person", 'Negative'),
    ('The party is great', 'Positive'),
    ('The party is terrible', 'Negative'),
    ('This place is wonderful', 'Positive'),
    ('Aging', 'Negative'),
    ('I hate you', 'Negative'),
    ('I adore you', 'Positive'),
    ('I love you', 'Positive'),
    ("You're amazing", 'Positive'),
    ("I'm very angry", 'Negative'),
    ('I hate this language', 'Negative'),
    ('This language is fantastic', 'Positive'),
    ('This language is very good', 'Positive'),
    ('What delicious food', 'Positive'),
    ('What horrible food', 'Negative'),
    ("I'm feeling great", 'Positive'),
    ("Today I'm terrible", 'Negative'),
    ('I love this sandwich.', 'Positive'),
    ('this is an amazing place!', 'Positive'),
    ('I feel very good about these beers.', 'Positive'),
    ('this is my best work.', 'Positive'),
    ("what an awesome view", 'Positive'),
    ('I do not like this restaurant', 'Negative'),
    ('I am tired of this stuff.', 'Negative'),
    ("I can't deal with this", 'Negative'),
    ('he is my sworn enemy!', 'Negative'),
    ('my boss is horrible.', 'Negative')
]

**Test Dataset:**

In [3]:
test_set = [
    ('the beer was good.', 'Positive'),
    ('I do not enjoy my job', 'Negative'),
    ("I ain't feeling dandy today.", 'Negative'),
    ("I feel amazing!", 'Positive'),
    ('Gary is a friend of mine.', 'Positive'),
    ("I can't believe I'm doing this.", 'Negative'),
    ("Great language", 'Positive'),
    ("Poorly this language", 'Negative'),
    ("You're horrible", 'Negative'),
    ('Hot food!', 'Positive'),
    ('What an anger!', 'Negative'),
    ('Great party!', 'Positive'),
    ('I do not hate everyone', 'Positive')
]

***

# Naive Bayes

If you want to use the train_set defined in this kernel, run the cells in this section, **Naive Bayes**.

If you want to load the training data from a CSV, JSON, or TSV file instead, run the cell in the **Loading Data from Files** section below.

The classifier is created by passing the training data to the NaiveBayesClassifier constructor.

**Creating Classifier:**

In [4]:
classifier = NaiveBayesClassifier(train_set)

**Computing Accuracy:**

In [5]:
accuracy = classifier.accuracy(test_set)

This stores the classifier's accuracy on the test set in a variable, so we can report it alongside our predictions.
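Accuracy here means the fraction of test sentences whose predicted label matches the expected label. The sketch below shows that calculation in plain Python; the `accuracy` helper and the `always_positive` stand-in classifier are hypothetical and only illustrate the idea, not how TextBlob computes it internally.

```python
def accuracy(predict, labeled_examples):
    """Fraction of examples whose predicted label equals the expected label."""
    correct = sum(
        1 for text, expected in labeled_examples if predict(text) == expected
    )
    return correct / len(labeled_examples)


# Hypothetical stand-in for a trained classifier's classify method.
def always_positive(text):
    return "Positive"


examples = [
    ("Great party!", "Positive"),
    ("You're horrible", "Negative"),
    ("Great language", "Positive"),
    ("What an anger!", "Negative"),
]

print(accuracy(always_positive, examples))  # 2 of 4 correct
```

Because the test set above has 13 sentences, the accuracy can only be a multiple of 1/13; 9 correct predictions, for example, give 9/13, which is about 0.692.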

***

# Loading Data from Files

You can also load data from common file formats including CSV, JSON, and TSV.

**Creating a Naive Bayes classifier by passing a training file to the constructor:**

In [6]:
with open('train_set.json', 'r') as fp:
    classifier = NaiveBayesClassifier(fp, format="json")

To load a CSV or TSV file instead, change the file name and the format argument (for example, format="csv").
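If you do not have such a file yet, the following sketch writes two small ones with the standard library. It assumes the layouts described in the TextBlob documentation: a JSON list of objects with "text" and "label" keys, and a CSV with one text,label row per example and no header. The two rows and the temporary output directory are chosen only for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path

rows = [
    ("I love this sandwich.", "Positive"),
    ("my boss is horrible.", "Negative"),
]

# Write into a temporary directory so no existing file is overwritten.
out_dir = Path(tempfile.mkdtemp())

# JSON: a list of objects with "text" and "label" keys.
json_path = out_dir / "train_set.json"
with open(json_path, "w", encoding="utf-8") as fp:
    json.dump([{"text": text, "label": label} for text, label in rows], fp, indent=2)

# CSV: one "text,label" row per example, no header row.
csv_path = out_dir / "train_set.csv"
with open(csv_path, "w", newline="", encoding="utf-8") as fp:
    csv.writer(fp).writerows(rows)

# Loading then follows the cell above (requires textblob):
# with open(csv_path, "r") as fp:
#     classifier = NaiveBayesClassifier(fp, format="csv")
print(json_path.read_text(encoding="utf-8"))
```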

***

# Prediction

**Phrases used for the prediction:**

In [7]:
test_phrase1 = 'I love everyone'

In [8]:
test_phrase2 = 'I do not hate everyone.'

**Making our prediction:**

In [9]:
blob = TextBlob(test_phrase1,classifier=classifier)

To classify a different sentence, change the first argument of the constructor: TextBlob(**here**, classifier=classifier)

**Result:**

In [10]:
print('This sentence is of character: {}'.format(blob.classify()))
print('Forecast accuracy: {}'.format(accuracy))

This sentence is of character: Positive
Forecast accuracy: 0.6923076923076923


***

# Classifying Text

**Classifier:**

In [11]:
classifier.classify("This is an amazing library!")

'Positive'

**Label Probability Distribution:**

In [12]:
prob_dist = classifier.prob_classify("This one's a doozy.")

In [13]:
prob_dist.max()

'Positive'

**Rounded:**

In [14]:
round(prob_dist.prob("Positive"), 2)

0.75

In [15]:
round(prob_dist.prob("Negative"), 2)

0.25

The built-in round(x, n) function returns x rounded to n digits after the decimal point.

***

# Classifying TextBlobs

Another way to classify text is to pass a classifier into the constructor of TextBlob and call its classify() method.

**Test:**

In [16]:
blob = TextBlob("The beer is good. But the hangover is horrible.", classifier=classifier)

In [17]:
blob.classify()

'Negative'

The advantage of this approach is that you can classify sentences within a TextBlob.

**Example:**

In [98]:
for s in blob.sentences:
    print(s)
    print(s.classify())

The beer is good.
Positive
But the hangover is horrible.
Negative


***

# Evaluating Classifiers

**Accuracy:**

In [18]:
classifier.accuracy(test_set)

0.6923076923076923

**Note:**

You can also pass in a file object into the accuracy method. The file can be in any of the formats listed in the **Loading Data from Files** section.

**Information:**

Use the show_informative_features() method to display a list of the most informative features.

In [19]:
classifier.show_informative_features(10)

Most Informative Features
          contains(This) = True           Positi : Negati =      2.1 : 1.0
            contains(my) = True           Negati : Positi =      1.9 : 1.0
          contains(very) = True           Positi : Negati =      1.5 : 1.0
      contains(language) = True           Positi : Negati =      1.5 : 1.0
           contains(you) = True           Positi : Negati =      1.5 : 1.0
           contains(You) = True           Positi : Negati =      1.5 : 1.0
         contains(place) = True           Positi : Negati =      1.5 : 1.0
          contains(this) = True           Negati : Positi =      1.5 : 1.0
      contains(horrible) = False          Positi : Negati =      1.4 : 1.0
             contains(I) = True           Negati : Positi =      1.3 : 1.0


***

# Updating Classifiers with New Data

**Create New Training Data:**

In [21]:
new_data = [('She is my best friend.', 'Positive'),
            ("I'm happy to have a new friend.", 'Positive'),
            ("He ain't from around here.", 'Negative')
]

**Updating the classifier with new training data:**

In [22]:
classifier.update(new_data)

True

**Accuracy:**

In [23]:
classifier.accuracy(test_set)

0.7692307692307693

***

# Feature Extractors

By default, the NaiveBayesClassifier uses a simple feature extractor that indicates which words in the training set are contained in a document.

For example, the sentence “I feel happy” might have the features contains(happy): True or contains(angry): False.

You can override this feature extractor by writing your own. A feature extractor is simply a function with document (the text to extract features from) as the first argument. The function may include a second argument, train_set (the training dataset), if necessary.

The function should return a dictionary of features for document.
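As a rough illustration of the two-argument form, the sketch below mimics the spirit of the default "contains" extractor in plain Python. The name `contains_extractor` and the `toy_train_set` data are hypothetical, and the whitespace tokenization is a simplification. It is meant to be run standalone: what TextBlob actually passes as the second argument may differ between versions, so check the installed version before plugging an extractor like this into a classifier.

```python
def contains_extractor(document, train_set):
    """Map every word seen in train_set to whether it occurs in document."""
    vocabulary = {word for text, _ in train_set for word in text.split()}
    tokens = set(document.split())
    return {"contains({0})".format(word): word in tokens for word in vocabulary}


# Hypothetical two-sentence training set, used only for this sketch.
toy_train_set = [
    ("I feel happy", "Positive"),
    ("I feel angry", "Negative"),
]

features = contains_extractor("I feel happy", toy_train_set)
print(features["contains(happy)"], features["contains(angry)"])
```

Every word in the training vocabulary becomes one boolean feature, which is why a word that is absent from the document still shows up with the value False.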

**Example:**

In [24]:
def end_word_extractor(document):
    tokens = document.split()
    first_word, last_word = tokens[0], tokens[-1]
    feats = {}
    feats["first({0})".format(first_word)] = True
    feats["last({0})".format(last_word)] = False
    return feats
features = end_word_extractor("I feel happy")
assert features == {'last(happy)': False, 'first(I)': True}

This feature extractor uses only the first and last words of a document as its features.

**Using the feature extractor in a classifier:**

In [25]:
classifier2 = NaiveBayesClassifier(test_set, feature_extractor=end_word_extractor)

The extractor is passed to the constructor through feature_extractor, its second parameter.

**Test:**

In [26]:
blob = TextBlob("I'm excited to try my new classifier.", classifier=classifier2)

In [27]:
blob.classify()

'Positive'

***

<p>&nbsp;</p>
<h1 style="text-align: center;"><strong><span lang="pt">CONCLUSION</span></strong></h1>
<p>&nbsp;</p><p>&nbsp;</p><p>&nbsp;</p>

This tutorial is a continuation of the kernel TextBlob I. It adds more columns, uses separate inputs for the training and test datasets, and makes some modifications to the code.

**References:**
- [Tutorial: Building a Text Classification System](https://textblob.readthedocs.io/en/dev/classifiers.html#classifiers)
- [Naive Bayes Classifier: Simple Sorting Algorithm](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)

***

##### INSTALLED VERSIONS

In [18]:
pd.show_versions()


INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.9.1
pip: 18.1
setuptools: 40.4.3
Cython: 0.29.1
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.0.1
sphinx: 1.8.1
patsy: 0.5.0
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.1.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


***