#### Workflow for Operationalizing a Text Classification Model

Steps for classifying a new text (refer to slides):
 1. Reload the model
 2. Reload the vectorizer
 3. Preprocess the new text
 4. Numerically encode the input
 5. Predict the label



In [5]:

import pickle
import nltk
import re
import string
from sklearn.feature_extraction.text import CountVectorizer

#### 1. Reload the model

In [16]:
path1 = "nb_classifier-2020-06-16.pkl"
with open(path1, 'rb') as f:
    model = pickle.load(f)

# Check the class labels the model can predict

model.classes_

array(['ham', 'spam'], dtype='<U4')

#### 2. Reload the vectorizer

In [9]:
path2 = "nb_countvectoriser-2020-06-16.pkl"
with open(path2, 'rb') as f:
    trained_cv = pickle.load(f)
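
The notebook does not show how the two `.pkl` files were written. A minimal sketch of the save and reload round trip, assuming they were produced with `pickle.dump` at training time; the dictionary `stand_in` and the file name `stand_in.pkl` are hypothetical placeholders, not the real model or vectorizer:

```python
import os
import pickle
import tempfile

# Hypothetical placeholder object; the real files hold a classifier and a vectorizer.
stand_in = {"classes": ["ham", "spam"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stand_in.pkl")

    # Save: binary write mode is required for pickle.
    with open(path, "wb") as f:
        pickle.dump(stand_in, f)

    # Reload: same pattern as the cells above.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == stand_in)  # True
```

Unpickling can execute arbitrary code, so only load `.pkl` files that come from a trusted source.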

#### 3. Preprocess the new text

In [10]:
def preprocess(text):
    # Replace every token that contains a digit with a space
    text = re.sub(r"\w*\d\w*", ' ', text)
    # Lowercase the text and replace each punctuation character with a space
    text = re.sub('[%s]' % re.escape(string.punctuation), ' ', text.lower())
    return text
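
To see what the two substitutions do, here is a small self-contained sketch that applies the same regular expressions to one sentence. The sample sentence is made up for illustration, and the result is split into tokens only to make the leftover whitespace easier to read:

```python
import re
import string

def preprocess(text):
    # Replace any token that contains a digit (e.g. "100", "4u") with a space.
    text = re.sub(r"\w*\d\w*", " ", text)
    # Lowercase, then replace every punctuation character with a space.
    text = re.sub("[%s]" % re.escape(string.punctuation), " ", text.lower())
    return text

sample = "Buy the LATEST laptop at $100!"  # made-up example input
cleaned = preprocess(sample)
tokens = cleaned.split()
print(tokens)  # ['buy', 'the', 'latest', 'laptop', 'at']
```

The price and the punctuation are removed, and the remaining words are lowercased.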

#### 4. Numerically encode the input

In [11]:
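def encode_text_to_vector(cv, text):
    # Reuse the trained vocabulary so the columns match those seen in training
    new_cv = CountVectorizer(stop_words='english', vocabulary=cv.vocabulary_)
    # With a fixed vocabulary, fit_transform only counts the known words
    text_vector = new_cv.fit_transform([text])
    return text_vector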
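
The effect of passing a fixed `vocabulary` can be sketched on a toy case. The three-word `toy_vocabulary` below is a hypothetical stand-in for `trained_cv.vocabulary_`, and the input string is made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical vocabulary (word -> column index), standing in for trained_cv.vocabulary_.
toy_vocabulary = {"customer": 0, "laptop": 1, "prize": 2}

cv = CountVectorizer(stop_words="english", vocabulary=toy_vocabulary)

# "zebra" is not in the vocabulary, so it is ignored rather than added as a new column.
vector = cv.fit_transform(["laptop laptop zebra prize"])
counts = vector.toarray()

print(counts.shape)  # one row (one document), one column per vocabulary word
print(counts)
```

Here "laptop" is counted twice, "prize" once, and "customer" zero times. The vector always has as many columns as the vocabulary, which is what lets the reloaded model accept it.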

#### 5. Predict the label

The example below reads a new text from the command prompt and calls the functions defined above.
The predicted label is then printed.

In [37]:
new_text = input("Enter the new text > ")
new_text2 = preprocess(new_text)
new_text_vector = encode_text_to_vector(trained_cv, new_text2)
predicted_label = model.predict(new_text_vector)
predicted_prob = model.predict_proba(new_text_vector)  # probability score for each class

print("\n The text is predicted as <", predicted_label, ">")



Enter the new text > buy the latest laptop at $100

 The text is predicted as < ['ham'] >


In [38]:
# For illustration - to show the classes (labels) that the model can predict
model.classes_

array(['ham', 'spam'], dtype='<U4')

In [39]:
# For illustration - to show the probability score of each class, in the order of model.classes_
predicted_prob

array([[0.83734347, 0.16265653]])
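
The columns of `predict_proba` follow the order of `model.classes_`, so the two outputs can be paired up. A minimal sketch using the values printed above, copied into plain lists so that it runs without the pickled model:

```python
# Values copied from the notebook outputs above.
classes = ["ham", "spam"]
probabilities = [0.83734347, 0.16265653]

# Pair each label with its probability.
prob_by_label = dict(zip(classes, probabilities))

# The predicted label is the one with the highest probability.
best_label = max(prob_by_label, key=prob_by_label.get)

print(prob_by_label)
print(best_label)  # ham
```

For this input the model assigns about 84% to `ham`, which matches the label printed earlier.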

##### Sample text to use

SPAM:
as a valued customer i am pleased to advise you that following recent review of your mob no you are awarded with 

important message this is a final contact attempt you have important messages waiting out our customer claims dept expires call now
    
