# Table of Contents

   1. [Introduction](#Introduction)
   2. [Rule-based systems](#rule)
   3. [Machine Learning based systems](#ml)
       1. [Naive Bayes](#nb)
       2. [SVM](#svm)
       3. [Deep Learning](#Dl)
   4. [Hybrid Systems](#Hybrid)
   5. [Applications](#application)
   6. [Implementation](#implementation)
   7. [Prediction](#prediction)
         
    


# Introduction<a class="anchor" id="Introduction"></a>
Text classification is the process of assigning tags or categories to text according to its content. It’s one of the fundamental tasks in Natural Language Processing (NLP) with broad applications such as sentiment analysis, topic labeling, spam detection, and intent detection.

![Text Classification](textClassificationExample.png)

There are many approaches to automatic text classification, which can be grouped into three different types of systems:
   * Rule-based systems
   * Machine Learning based systems
   * Hybrid systems


# Rule-based systems<a class="anchor" id="rule"></a>
A rule-based approach is the simplest way to do text classification. It requires good domain knowledge in order to write the linguistic rules that categorize the data. Each rule consists of a pattern and a corresponding predicted category.

Say that you want to classify news articles into two groups, Sports and Politics. First, you’ll need to define two lists of words that characterize each group (e.g. words related to sports such as football, basketball, LeBron James, etc., and words related to politics such as Donald Trump, Hillary Clinton, Putin, etc.). Next, when you want to classify a new incoming text, you count the number of sports-related words that appear in the text and do the same for politics-related words. If the sports-related count is greater than the politics-related count, the text is classified as Sports, and vice versa.
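
The counting procedure described above can be sketched in a few lines of Python. This is only a minimal illustration: the keyword sets are the handful of examples mentioned in the text, the function names are made up for this sketch, and the `"Unknown"` result for ties is an assumption, since the description does not say what happens when both counts are equal.

```python
import re

# Hypothetical keyword lists taken from the examples in the text;
# a real system would need much larger, curated lists.
SPORTS_WORDS = {"football", "basketball", "lebron james"}
POLITICS_WORDS = {"donald trump", "hillary clinton", "putin"}

def count_matches(text, keywords):
    """Count whole-word occurrences of every keyword in the lower-cased text."""
    text = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(k) + r"\b", text)) for k in keywords)

def classify(text):
    """Return the category whose keywords appear most often."""
    sports = count_matches(text, SPORTS_WORDS)
    politics = count_matches(text, POLITICS_WORDS)
    if sports > politics:
        return "Sports"
    if politics > sports:
        return "Politics"
    return "Unknown"  # assumption: ties and keyword-free texts are left unlabeled

print(classify("LeBron James scored 40 points in the basketball game"))
```

Every rule here is a keyword plus the category it votes for, which is why such a system is easy to inspect but tedious to extend.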

Rule-based systems are human-comprehensible and can be improved over time. However, this method does not scale well and requires a lot of domain expertise.

# Machine Learning based systems<a class="anchor" id="ml"></a>
Many machine learning algorithms can be used for text classification. A few of them are:
   * Naive Bayes
   * SVM
   * Deep learning

## Naive Bayes<a class="anchor" id="nb"></a>

The probabilistic model of naive Bayes classifiers is based on Bayes’ theorem, and the adjective naive comes from the assumption that the features in a dataset are mutually independent. Naive Bayes simplifies the calculation of probabilities by assuming that, within a given class, the probability of each attribute value is independent of all other attributes. This is a strong assumption, but it results in a fast and effective method.

The probability of an attribute value given a class value is called the conditional probability. By multiplying the conditional probabilities of all attributes for a given class value, together with the prior probability of that class, we obtain a score proportional to the probability of a data instance belonging to that class.

To make a prediction, we calculate this score for each class and select the class value with the highest one.
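
The calculation can be illustrated with a small from-scratch sketch. The training sentences below are made up for illustration, `nb_predict` is a hypothetical helper name, and two details are assumptions that the text above does not mention: add-one (Laplace) smoothing, so that unseen word/class pairs do not produce a zero probability, and summing logarithms instead of multiplying raw probabilities, to avoid numeric underflow.

```python
from collections import Counter, defaultdict
import math

# Made-up toy training set of (text, label) pairs.
train_docs = [
    ("great taste great price", "positive"),
    ("love this taffy", "positive"),
    ("stale and bad taste", "negative"),
    ("bad product", "negative"),
]

class_counts = Counter(label for _, label in train_docs)
word_counts = defaultdict(Counter)
for text, label in train_docs:
    word_counts[label].update(text.split())
vocab = {word for counts in word_counts.values() for word in counts}

def log_posterior(text, label):
    """log P(label) + sum over words of log P(word | label), with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(train_docs))
    for word in text.split():
        if word in vocab:  # words never seen in training are ignored
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
    return score

def nb_predict(text):
    """Pick the class with the highest (log) score."""
    return max(class_counts, key=lambda label: log_posterior(text, label))

print(nb_predict("great price"))
print(nb_predict("bad product"))
```

In practice a library implementation such as scikit-learn's `MultinomialNB` would normally be used instead of hand-written counting.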

## Support Vector Machine<a class="anchor" id="svm"></a>
This algorithm separates the classes by drawing a line, or more generally a hyperplane, that divides the feature space into two subspaces. The SVM algorithm finds the points from each class that lie closest to the hyperplane; these points are called support vectors. The distance between the hyperplane and the support vectors is called the margin. The hyperplane for which the margin is maximal is the optimal hyperplane.
SVM is inherently a binary classifier, i.e. it distinguishes between two possible outcomes or intents. Multi-class classification is also possible, but it requires training several binary models (for example, one per class or one per pair of classes).

![SVM](svm.PNG)
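
To make the terms support vector and margin concrete, here is a small sketch using scikit-learn's `SVC` with a linear kernel on four made-up 2-D points. The data and the large value of `C` (chosen to approximate a hard margin) are assumptions for illustration only. For a linear SVM with weight vector `w`, the distance from the hyperplane to the support vectors is `1 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Four made-up points on a diagonal: two per class.
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])

# A large C approximates a hard-margin SVM on this separable toy data.
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

w = clf.coef_[0]                    # normal vector of the separating hyperplane
margin = 1.0 / np.linalg.norm(w)    # distance from the hyperplane to the support vectors

print("support vectors:\n", clf.support_vectors_)
print("margin:", margin)
```

Only the two points closest to the boundary should be reported as support vectors; moving the outer points further away would leave the hyperplane unchanged.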

## Deep learning<a class="anchor" id="Dl"></a>
The main deep learning architectures used for text classification are Convolutional Neural Networks and Recurrent Neural Networks. Naive Bayes and Support Vector Machines can work well even with small datasets, whereas neural networks require a lot of data.

# Hybrid systems<a class="anchor" id="Hybrid"></a>
Hybrid systems, as the name suggests, combine a machine learning system with a rule-based system.
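
One possible way to combine the two is sketched below: hand-written rules are checked first, and a trained classifier is used only when no rule fires. This is just one design among several and not necessarily how any particular hybrid system works. The `RULES` dictionary, the toy training sentences, and the `hybrid_predict` helper are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical hand-written rules: keyword -> label.
RULES = {"refund": "negative"}

# Made-up toy training data for the machine learning part.
train_texts = ["great taste great price", "love this taffy",
               "stale and bad taste", "bad product"]
train_labels = ["positive", "positive", "negative", "negative"]

model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
model.fit(train_texts, train_labels)

def hybrid_predict(text):
    """Apply the rules first; fall back to the trained model if none matches."""
    words = text.lower().split()
    for keyword, label in RULES.items():
        if keyword in words:
            return label
    return str(model.predict([text])[0])

print(hybrid_predict("i want a refund"))
print(hybrid_predict("great price"))
```

Letting rules take precedence keeps known edge cases predictable, while the model covers inputs the rules never anticipated.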


# Application areas<a class="anchor" id="application"></a>
   * Customer Service
       * Monitoring agents' interactions
       * Improving customer experience and satisfaction by incorporating sentiment analysis
   * Marketing
       * Enhance personalization experience
       * Lead score accuracy improvement
   * Fraud
       * Real-time indexing of violations within conversations, reducing fines for non-compliance


# Implementation<a class="anchor" id="implementation"></a>
The concepts above can be put into practice by running a model on a sample dataset. An e-commerce site sells fine foods; let us find the sentiment of the reviews given by its users.

In [1]:
#importing required libraries
import pandas as pd
import sklearn

In [2]:
data = pd.read_csv("data.csv",index_col=False)
data=data.drop(columns=["Unnamed: 0"])

In [3]:
data.head()

|   | Text | Score |
|---|------|-------|
| 0 | product arrive label jumbo salt peanuts peanut... | negative |
| 1 | confection around centuries light pillowy citr... | positive |
| 2 | look secret ingredient robitussin believe find... | negative |
| 3 | great taffy great price wide assortment yummy ... | positive |
| 4 | get wild hair taffy order five pound bag taffy... | positive |


The data we are going to use is already pre-processed; the pre-processing steps are beyond the scope of this notebook.
The columns are:
   * Score - The label indicating whether the review is positive or negative
   * Text - The review

In [34]:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
import os
import time


results = pd.DataFrame(columns=["Model", "Precision", "Recall"])
all_results = []
report = None
# function to get the model stats like precision, recall

def model_stats(predicted, actual):
    global report
    report = pd.DataFrame(list(precision_recall_fscore_support(actual, predicted)),
                          index=['Precision', 'Recall', 'F1-score', 'Support']).T

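    # Add the 'Avg' row. With average='micro', precision, recall and F1-score
    # all equal the overall accuracy for single-label classification.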
    report.loc['Avg', :] = (precision_recall_fscore_support(actual, predicted, average='micro'))
    report.loc['Avg', 'Support'] = report['Support'].sum()
    report = report.iloc[:, 0:-1]
    print(report)

#function to train
def train(data, textColumn, intentColum):
    x= textColumn
    y= intentColum
    # Create Train test split
    X_train, X_test, y_train, y_test = train_test_split(data[x], data[y], test_size=0.33, random_state=42)

    X_train = X_train.astype("str")
    y_train = y_train.astype("str")
    X_test = X_test.astype("str")
    y_test = y_test.astype("str")

    tuned_parameters = [{'kernel': ['rbf'], 'gamma': [0.001, 0.0001], 'C': [0.1, 1, 5, 10]},
                        {'kernel': ['linear'], 'C': [0.1, 1, 5, 10]}]

    estimator = GridSearchCV(SVC(C=1, probability=True, class_weight='balanced'), param_grid=tuned_parameters, cv=2,
                             verbose=1)
    svc = Pipeline([('tfidf', TfidfVectorizer(ngram_range=(1, 2), stop_words='english')), ('clf', estimator), ])

    svc = svc.fit(X_train, y_train)
    predicted = svc.predict(X_test)

    model_stats(predicted, y_test)
    save_model("./svm_models", svc, "sklearn")
    return svc

def save_model(directory, model, lib="sklearn"):
    if not os.path.exists(directory):
        os.makedirs(directory)

    modelname = None

    # Save model to file
    if lib == "sklearn":
        modelname = os.path.join(directory, 'model' + str(int(time.time())) + '.pkl')
        joblib.dump(model, modelname)
    if lib == "keras":
        modelname = os.path.join(directory, 'model' + str(int(time.time())) + '.h5')
        model.save(modelname)
    print("Trained model saved in : ", modelname)

# Function to predict; train_file is not used and is kept only so existing calls still work
def predict(train_file, content, modelname):
    svc = joblib.load(modelname)
    predicted = svc.predict([content])
    return predicted

In [16]:
svmModel = train(data=data,intentColum="Score",textColumn="Text")

Fitting 2 folds for each of 12 candidates, totalling 24 fits


[Parallel(n_jobs=1)]: Done  24 out of  24 | elapsed:  2.1min finished


     Precision    Recall  F1-score
0     0.794595  0.567568  0.662162
1     0.923549  0.972682  0.947479
Avg   0.909091  0.909091  0.909091
Trained model saved in :  ./svm_models/model1557839338.pkl

Rows 0 and 1 correspond to the `negative` and `positive` classes respectively, because scikit-learn reports the labels in sorted order. The `Avg` row is the micro average over both classes.


# Prediction<a class="anchor" id="prediction"></a>

In [31]:
test = pd.read_csv("test.csv")

In [44]:
test.Text.iloc[1]

'excite find nana line glutenfree cookies try varieties offer various flavor style cookies favorite find individually package cookies large round ones way heavy one meal end split two snack pack cookies perfect size little snack keep purse understand reviewers might like cookies healthy taste might object texture berry crystals happen like berry crystals nice eat cookie nt feel dietary guilt great hearty cookie go well coffee tea also gluten intolerant never negative reaction cookies whatsoever opinion good product however natural nt preservatives faster item ship package better notice nana cookies become dry longer keep around even seal probably good idea try product buy case notice people either seem love hate nana cookies'

In [41]:
svmPredict = predict(test,content=test.Text.iloc[1],modelname="./svm_models/model1557839338.pkl")


In [42]:
svmPredict

array(['positive'], dtype=object)

Other machine learning models can be implemented in a similar way to SVM. Deep learning models, however, are computationally heavy and need a large amount of RAM to train, so they are not implemented here.