<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 8.5: Text Classification
INSTRUCTIONS:
- Run the cells
- Observe and understand the results
- Answer the questions

## Import libraries

In [26]:
## Import Libraries
import numpy as np
import pandas as pd

import string
import spacy

from collections import Counter

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# import warnings
# warnings.filterwarnings('ignore')

## Load data

Sample:

    __label__2 Stuning even for the non-gamer: This sound ...
    __label__2 The best soundtrack ever to anything.: I'm ...
    __label__2 Amazing!: This soundtrack is my favorite m ...
    __label__2 Excellent Soundtrack: I truly like this so ...
    __label__2 Remember, Pull Your Jaw Off The Floor Afte ...
    __label__2 an absolute masterpiece: I am quite sure a ...
    __label__1 Buyer beware: This is a self-published boo ...
    . . .
    
There are only two **labels**:
- `__label__1`
- `__label__2`

In [6]:
## Loading the data

trainDF = pd.read_fwf(
    filepath_or_buffer = '/Users/winifredwetthasinghe/Documents/Data_Science_Course_UTS/Labs/DATA/corpus.txt',
    colspecs = [(9, 10),   # label: keep only the digit (1 or 2)
                (11, 9000) # text: wide enough to reach the end of every line
               ], 
    header = 0,            # note: combined with names, this discards the first line of the file
    names = ['label', 'text'],
    lineterminator = '\n'
)

# convert label from [1, 2] to [0, 1]
trainDF['label'] = trainDF['label'] - 1
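
The fixed-width read above depends on the label digit always sitting at the same character position. As a hedged alternative, the sketch below splits each line on its first space instead. `parse_line` and the two shortened sample strings are illustrative and not part of the lab.

```python
def parse_line(line):
    """Split '__label__N some text' into (N - 1, 'some text')."""
    tag, _, text = line.rstrip("\n").partition(" ")
    # The label is the digit after the '__label__' prefix; shift 1/2 to 0/1.
    label = int(tag.replace("__label__", "")) - 1
    return label, text


# Shortened, made-up lines in the same format as the corpus sample.
sample_lines = [
    "__label__2 Amazing!: This soundtrack is my favorite\n",
    "__label__1 Buyer beware: This is a self-published book\n",
]
parsed = [parse_line(line) for line in sample_lines]
print(parsed)
```

A list of such tuples can be passed to `pd.DataFrame(parsed, columns=['label', 'text'])` to get the same two columns as above.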

## Inspect the data

In [13]:
# ANSWER
print(trainDF.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   label   9999 non-null   int64 
 1   text    9999 non-null   object
dtypes: int64(1), object(1)
memory usage: 156.4+ KB
None


In [16]:
print(trainDF.sample(20))

      label                                               text
2434      0  shipped wrong color: I ordered three rods (all...
202       0  Not Like the Old Formula: The formulation of t...
4417      1  A Fine Work: Romeo and Juliet is the first Sha...
5402      0  Dismal: I hated this dismal movie. Not complet...
6549      0  Low Budget: Some good action and scenery, but ...
7259      0  Horrible: This book was way too weird for me a...
1375      0  Four Star Music - get the other edition though...
3779      0  dry: This book was really dry... there were a ...
5507      1  Foie Gras: As a California resident who feared...
936       0  Clean it up!: The Story is appropriate and som...
5370      0  Ho-Hum: One of the biggest disappointments I h...
3145      0  emarker.com to shut down 9/30/2001: Do not buy...
4878      0  not a good experience: I bought a sansa view 8...
5310      1  Alternate Italian Renaissance Light History: A...
1389      1  Its Paine, enough said: All works are unta

## Split the data into train and test

In [17]:
## ANSWER
## split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    trainDF['text'],
    trainDF['label'],
    random_state = 1,
    test_size = 0.2
    
)
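
To see what `test_size = 0.2` does with 9,999 rows, the sketch below repeats the split on stand-in data of the same length. The `stratify` argument is an optional addition, not used in the lab, that keeps the label ratio the same in both parts; the alternating labels are purely illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of rows as the lab's DataFrame.
texts = np.array([f"review {i}" for i in range(9999)])
labels = np.arange(9999) % 2  # alternating 0/1, purely illustrative

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels,
    test_size=0.2,
    random_state=1,
    stratify=labels,  # optional: keep the 0/1 ratio equal in both parts
)
print(len(X_tr), len(X_te))          # sizes of the two parts
print(y_tr.mean(), y_te.mean())      # share of label 1 in each part
```

The test part holds 2,000 rows, which is why every accuracy reported later in the lab is a multiple of 0.0005.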

## Feature Engineering

### Count Vectors as features

In [18]:
# create a count vectorizer object
count_vect = CountVectorizer(token_pattern = r'\w{1,}')

# Learn a vocabulary dictionary of all tokens in the raw documents
count_vect.fit(trainDF['text'])

# Transform documents to document-term matrix.
X_train_count = count_vect.transform(X_train)
X_test_count = count_vect.transform(X_test)
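
The cell above learns the vocabulary from all of `trainDF['text']`, which includes the test rows. A common alternative is to fit on the training texts only, so that words seen only in held-out text are ignored. The toy sketch below shows that behaviour; the documents are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["a great album", "a dull book"]   # toy training texts
test_docs = ["a great book I loved"]            # toy held-out text

vect = CountVectorizer(token_pattern=r'\w{1,}')
vect.fit(train_docs)                  # vocabulary comes from the training texts only
vocab_toy = sorted(vect.vocabulary_)  # column order of the document-term matrix

# Words missing from the vocabulary ('i', 'loved') are silently dropped.
X_test_toy = vect.transform(test_docs).toarray()
print(vocab_toy)
print(X_test_toy)
```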

### TF-IDF Vectors as features
- Word level
- N-Gram level
- Character level
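
As a small illustration of how the three levels differ, the sketch below fits each kind of vectorizer on two made-up documents and prints the features it extracts. Note that `token_pattern` is only used when `analyzer = 'word'`, so it is left out of the character-level vectorizer here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["good book", "bad book"]  # toy documents

word_level = TfidfVectorizer(analyzer='word', token_pattern=r'\w{1,}')
ngram_level = TfidfVectorizer(analyzer='word', token_pattern=r'\w{1,}',
                              ngram_range=(2, 3))
char_level = TfidfVectorizer(analyzer='char', ngram_range=(2, 3))

for name, vect in [('word', word_level),
                   ('n-gram', ngram_level),
                   ('char', char_level)]:
    vect.fit(docs)
    # vocabulary_ maps each extracted feature to its column index.
    print(name, sorted(vect.vocabulary_))
```

Word level yields single tokens, n-gram level yields sequences of two or three tokens, and character level yields sequences of two or three characters, spaces included.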

In [19]:
%%time
# word level tf-idf
tfidf_vect = TfidfVectorizer(analyzer = 'word',
                             token_pattern = r'\w{1,}',
                             max_features = 5000)
print(tfidf_vect)

tfidf_vect.fit(trainDF['text'])
X_train_tfidf = tfidf_vect.transform(X_train)
X_test_tfidf  = tfidf_vect.transform(X_test)

TfidfVectorizer(max_features=5000, token_pattern='\\w{1,}')
CPU times: user 1.04 s, sys: 11.7 ms, total: 1.05 s
Wall time: 1.05 s


In [20]:
%%time
# ngram level tf-idf
tfidf_vect_ngram = TfidfVectorizer(analyzer = 'word',
                                   token_pattern = r'\w{1,}',
                                   ngram_range = (2, 3),
                                   max_features = 5000)
print(tfidf_vect_ngram)

tfidf_vect_ngram.fit(trainDF['text'])
X_train_tfidf_ngram = tfidf_vect_ngram.transform(X_train)
X_test_tfidf_ngram  = tfidf_vect_ngram.transform(X_test)

TfidfVectorizer(max_features=5000, ngram_range=(2, 3), token_pattern='\\w{1,}')
CPU times: user 5.71 s, sys: 75.3 ms, total: 5.78 s
Wall time: 5.79 s


In [21]:
%%time
# characters level tf-idf
tfidf_vect_ngram_chars = TfidfVectorizer(analyzer = 'char',
                                         token_pattern = r'\w{1,}',
                                         ngram_range = (2, 3),
                                         max_features = 5000)
print(tfidf_vect_ngram_chars)

tfidf_vect_ngram_chars.fit(trainDF['text'])
X_train_tfidf_ngram_chars = tfidf_vect_ngram_chars.transform(X_train)
X_test_tfidf_ngram_chars  = tfidf_vect_ngram_chars.transform(X_test)

TfidfVectorizer(analyzer='char', max_features=5000, ngram_range=(2, 3),
                token_pattern='\\w{1,}')
CPU times: user 7.59 s, sys: 99 ms, total: 7.69 s
Wall time: 7.7 s


### Text / NLP based features

Create some other features.

- Char Count = number of characters in the text
- Word Count = number of words in the text
- Word Density = average number of characters per word
- Punctuation Count = number of punctuation characters in the text
- Title Word Count = number of title-case words in the text
- Uppercase Word Count = number of all-uppercase words in the text
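
The definitions above can be checked on a single string before applying them to the whole DataFrame. The sketch below is one way to do that; `text_features` and the sample sentence are illustrative names. The density formula mirrors the one used in this lab, which divides the full character count (spaces included) by the word count plus one.

```python
import string


def text_features(text):
    """Compute the six surface features for one text."""
    words = text.split()
    char_count = len(text)
    word_count = len(words)
    return {
        'char_count': char_count,
        'word_count': word_count,
        # +1 in the denominator avoids division by zero for empty texts
        'word_density': char_count / (word_count + 1),
        'punctuation_count': sum(ch in string.punctuation for ch in text),
        'title_word_count': sum(w.istitle() for w in words),
        'upper_case_word_count': sum(w.isupper() for w in words),
    }


features = text_features("Buyer beware: This is NOT good!")
print(features)
```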

In [23]:
# %%time
# # ANSWER

# trainDF['Char_Count'] = trainDF['text'].apply(len)
# trainDF['Word_Count'] = trainDF['text'].apply(lambda x: len(x.split()))
# trainDF['Word_Density'] = trainDF['Char_Count'] / (trainDF['Word_Count'] + 1)
# trainDF['Punctuation_Count'] = trainDF['text'].apply(lambda x: len(''.join(_ for _ in x if _ in string.punctuation))) 
# trainDF['Title_Word_Count'] = trainDF['text'].apply(lambda x: len([w for w in x.split() if w.istitle()]))
# trainDF['Uppercase_Word_Count'] = trainDF['text'].apply(lambda x: len([w for w in x.split() if w.isupper()]))


CPU times: user 445 ms, sys: 2.35 ms, total: 447 ms
Wall time: 447 ms


In [79]:
%%time

trainDF['char_count'] = trainDF['text'].apply(len)
trainDF['word_count'] = trainDF['text'].apply(lambda x: len(x.split()))
trainDF['word_density'] = trainDF['char_count'] / (trainDF['word_count']+1)
trainDF['punctuation_count'] = trainDF['text'].apply(lambda x: len("".join(_ for _ in x if _ in string.punctuation))) 
trainDF['title_word_count'] = trainDF['text'].apply(lambda x: len([wrd for wrd in x.split() if wrd.istitle()]))
trainDF['upper_case_word_count'] = trainDF['text'].apply(lambda x: len([wrd for wrd in x.split() if wrd.isupper()]))
# trainDF['stopword_count'] = trainDF['text'].apply(lambda x: len([wrd for wrd in x.split() if wrd.lower() in corpus.stop_words]))

CPU times: user 422 ms, sys: 2.03 ms, total: 424 ms
Wall time: 423 ms


In [77]:
trainDF[['char_count', 'word_count', 'word_density', 'punctuation_count', 'title_word_count', 'upper_case_word_count']].head(10)

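|  | char_count | word_count | word_density | punctuation_count | title_word_count | upper_case_word_count |
|---|---|---|---|---|---|---|
| 0 | 509 | 97 | 5.193878 | 14 | 7 | 3 |
| 1 | 760 | 129 | 5.846154 | 40 | 24 | 4 |
| 2 | 743 | 118 | 6.243697 | 33 | 52 | 4 |
| 3 | 481 | 87 | 5.465909 | 22 | 30 | 0 |
| 4 | 825 | 142 | 5.769231 | 35 | 14 | 3 |
| 5 | 738 | 139 | 5.271429 | 33 | 16 | 4 |
| 6 | 522 | 105 | 4.924528 | 13 | 13 | 6 |
| 7 | 524 | 103 | 5.038462 | 11 | 15 | 13 |
| 8 | 301 | 63 | 4.703125 | 8 | 8 | 2 |
| 9 | 216 | 35 | 6.0 | 8 | 5 | 2 |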


In [80]:
trainDF.sample(5)

|  | label | text | char_count | word_count | word_density | punctuation_count | title_word_count | uppercase_word_count | adj_count | adv_count | noun_count | num_count | pron_count | propn_count | verb_count | upper_case_word_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2810 | 0 | Conceited book: Had to offset the praise hande... | 234 | 41 | 5.571429 | 7 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2255 | 0 | Why Can't I Rate at Zero Stars ?: Come On, Peo... | 748 | 141 | 5.267606 | 45 | 19 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 1278 | 0 | why i think the book is going to be great: I t... | 190 | 39 | 4.75 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1119 | 1 | U Body Pillow: This allows my wife to have sup... | 181 | 34 | 5.171429 | 4 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 564 | 1 | Great product; great seller!: I got this item ... | 290 | 57 | 5.0 | 12 | 8 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |


In [62]:
## load spaCy (spaCy 3 removed the 'en' shortcut; use the full model name)
nlp = spacy.load('en_core_web_sm')

Part-of-speech tags in **spaCy**

    POS   DESCRIPTION               EXAMPLES
    ----- ------------------------- ---------------------------------------------
    ADJ   adjective                 big, old, green, incomprehensible, first
    ADP   adposition                in, to, during
    ADV   adverb                    very, tomorrow, down, where, there
    AUX   auxiliary                 is, has (done), will (do), should (do)
    CONJ  conjunction               and, or, but
    CCONJ coordinating conjunction  and, or, but
    DET   determiner                a, an, the
    INTJ  interjection              psst, ouch, bravo, hello
    NOUN  noun                      girl, cat, tree, air, beauty
    NUM   numeral                   1, 2017, one, seventy-seven, IV, MMXIV
    PART  particle                  's, not,
    PRON  pronoun                   I, you, he, she, myself, themselves, somebody
    PROPN proper noun               Mary, John, London, NATO, HBO
    PUNCT punctuation               ., (, ), ?
    SCONJ subordinating conjunction if, while, that
    SYM   symbol                    $, %, §, ©, +, −, ×, ÷, =, :), 😝
    VERB  verb                      run, runs, running, eat, ate, eating
    X     other                     sfpksdpsxmsa
    SPACE space
    
Find the number of adjectives, adverbs, nouns, numerals, pronouns, proper nouns, and verbs in each text.

    Hint:
    1. Convert the text to a spaCy document
    2. Use pos_
    3. Use Counter 

In [81]:
# Initialise columns for the part-of-speech counts
trainDF['adj_count'] = 0
trainDF['adv_count'] = 0
trainDF['noun_count'] = 0
trainDF['num_count'] = 0
trainDF['pron_count'] = 0
trainDF['propn_count'] = 0
trainDF['verb_count'] = 0

In [None]:
# ANSWER
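
One possible way to follow the hint is sketched below, under the assumption that `nlp` is a loaded spaCy pipeline. `POS_COLUMNS` and `pos_count_rows` are hypothetical helper names and not part of the lab. The function only relies on `nlp.pipe` and the `pos_` attribute of each token.

```python
from collections import Counter

# Maps each lab column to the spaCy coarse-grained tag it counts.
POS_COLUMNS = {
    'adj_count': 'ADJ', 'adv_count': 'ADV', 'noun_count': 'NOUN',
    'num_count': 'NUM', 'pron_count': 'PRON', 'propn_count': 'PROPN',
    'verb_count': 'VERB',
}


def pos_count_rows(texts, nlp):
    """Return one dict of POS counts per text; `nlp` is a loaded spaCy pipeline."""
    rows = []
    # nlp.pipe processes the texts as a stream instead of one call per text.
    for doc in nlp.pipe(texts):
        tags = Counter(token.pos_ for token in doc)
        # A Counter returns 0 for tags that never occur in the document.
        rows.append({col: tags[tag] for col, tag in POS_COLUMNS.items()})
    return rows
```

With the lab's objects, the result could be written back with `pos_df = pd.DataFrame(pos_count_rows(trainDF['text'], nlp), index = trainDF.index)` followed by `trainDF[col] = pos_df[col]` for each column. Tagging roughly 10,000 reviews may take a few minutes.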

In [82]:
cols = [
    'char_count', 'word_count', 'word_density',
    'punctuation_count', 'title_word_count',
    'upper_case_word_count', 'adj_count',
    'adv_count', 'noun_count', 'num_count',
    'pron_count', 'propn_count', 'verb_count']

trainDF[cols].sample(5)

|  | char_count | word_count | word_density | punctuation_count | title_word_count | upper_case_word_count | adj_count | adv_count | noun_count | num_count | pron_count | propn_count | verb_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 925 | 472 | 83 | 5.619048 | 24 | 9 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 626 | 834 | 170 | 4.877193 | 26 | 17 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8318 | 778 | 147 | 5.256757 | 18 | 13 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7162 | 510 | 96 | 5.257732 | 16 | 14 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5215 | 264 | 51 | 5.076923 | 8 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


### Topic Models as features

In [83]:
%%time
# train a LDA Model
lda_model = LatentDirichletAllocation(n_components = 20, learning_method = 'online', max_iter = 20)

X_topics = lda_model.fit_transform(X_train_count)
topic_word = lda_model.components_ 
vocab = count_vect.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2

CPU times: user 49.4 s, sys: 2 s, total: 51.4 s
Wall time: 51.4 s


In [30]:
# view the topic models
n_top_words = 10
topic_summaries = []
print('Group Top Words')
print('-----', '-'*80)
for i, topic_dist in enumerate(topic_word):
    topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
    top_words = ' '.join(topic_words)
    topic_summaries.append(top_words)
    print('  %3d %s' % (i, top_words))

Group Top Words
----- --------------------------------------------------------------------------------
    0 la train empire tradition wooden bread rack whitney stacie moore
    1 dvd sound paul eric dts unreadable clarity spouse sting esque
    2 stargate opened german cat image law variety wide soundtrack desire
    3 ann hopkins anne erotic producer fetched erotica interview tour pseudo
    4 lens cliff flea youth funk fleas streets hideous spray pets
    5 japanese allow tone japan emma regardless twenty string usa butter
    6 dog board hostel dubbed monitor attach storage crystal kurt dubbing
    7 useful economics diane lane typical bottle jazz van blade economic
    8 letter complex twist scarlet quit sin emotions promise hester chinese
    9 motion diary spider escapes closing caps chandler magnum huppert vile
   10 her of his book she characters life author in boots
   11 i my cd album product they music songs on with
   12 game the you fun are games of these graphics manson
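
The slicing used to pick the top words can be hard to read at first. The sketch below applies the same idea to a tiny made-up topic-word matrix: sort each row, then take the largest weights in descending order. All names and numbers are illustrative.

```python
import numpy as np

vocab_toy = np.array(['album', 'book', 'cd', 'game', 'song'])
# One row per topic, one column per vocabulary word (made-up weights).
topic_word_toy = np.array([
    [0.1, 5.0, 0.2, 0.3, 0.4],
    [4.0, 0.1, 3.0, 0.2, 6.0],
])

n_top = 2
top_words_toy = []
for i, weights in enumerate(topic_word_toy):
    # argsort is ascending, so reverse it and keep the first n_top indices.
    top = vocab_toy[np.argsort(weights)[::-1][:n_top]]
    top_words_toy.append(' '.join(top))
    print(i, top_words_toy[-1])
```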


## Modelling

In [31]:
## helper function

def train_model(classifier, feature_vector_train, label, feature_vector_valid):
    # fit the classifier on the training features and labels
    classifier.fit(feature_vector_train, label)

    # predict the labels of the validation (test) features
    predictions = classifier.predict(feature_vector_valid)

    # score against the global y_test; accuracy_score expects (y_true, y_pred)
    return accuracy_score(y_test, predictions)
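
The helper above reads `y_test` from the notebook's global scope, so it only works with this particular split. A variant that takes the validation labels as an argument is sketched below; `train_and_score` and the toy reviews are illustrative and not part of the lab.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB


def train_and_score(classifier, X_tr, y_tr, X_va, y_va):
    """Fit on the training part and return accuracy on the validation part."""
    classifier.fit(X_tr, y_tr)
    return accuracy_score(y_va, classifier.predict(X_va))


# Toy data: 1 = positive review, 0 = negative review.
toy_train = ["great album", "love this cd", "awful book", "boring and dull"]
toy_labels = [1, 1, 0, 0]
toy_valid = ["great cd", "dull book"]
toy_valid_labels = [1, 0]

vect = CountVectorizer().fit(toy_train)
score = train_and_score(MultinomialNB(),
                        vect.transform(toy_train), toy_labels,
                        vect.transform(toy_valid), toy_valid_labels)
print(score)
```

Passing the labels explicitly makes the function reusable with other splits, for example inside cross-validation.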

In [32]:
# Keep the results in a dataframe
results = pd.DataFrame(columns = ['Count Vectors',
                                  'WordLevel TF-IDF',
                                  'N-Gram Vectors',
                                  'CharLevel Vectors'])

### Naive Bayes Classifier

In [33]:
%%time
# Naive Bayes on Count Vectors
accuracy1 = train_model(MultinomialNB(), X_train_count, y_train, X_test_count)
print('NB, Count Vectors    : %.4f\n' % accuracy1)

NB, Count Vectors    : 0.8335

CPU times: user 7.03 ms, sys: 1.24 ms, total: 8.27 ms
Wall time: 7.25 ms


In [34]:
%%time
# Naive Bayes on Word Level TF IDF Vectors
accuracy2 = train_model(MultinomialNB(), X_train_tfidf, y_train, X_test_tfidf)
print('NB, WordLevel TF-IDF : %.4f\n' % accuracy2)

NB, WordLevel TF-IDF : 0.8355

CPU times: user 5.59 ms, sys: 1.59 ms, total: 7.18 ms
Wall time: 5.39 ms


In [37]:
%%time
# Naive Bayes on Ngram Level TF IDF Vectors
accuracy3 = train_model(MultinomialNB(), X_train_tfidf_ngram, y_train, X_test_tfidf_ngram)
print('NB, N-Gram Vectors   : %.4f\n' % accuracy3)

NB, N-Gram Vectors   : 0.8365

CPU times: user 4.72 ms, sys: 911 µs, total: 5.63 ms
Wall time: 4.5 ms


In [38]:
%%time
# Naive Bayes on Character Level TF IDF Vectors
accuracy4 = train_model(MultinomialNB(), X_train_tfidf_ngram_chars, y_train, X_test_tfidf_ngram_chars)
print('NB, CharLevel Vectors: %.4f\n' % accuracy4)

NB, CharLevel Vectors: 0.8060

CPU times: user 22.4 ms, sys: 1.03 ms, total: 23.4 ms
Wall time: 21.9 ms


In [39]:
results.loc['Naïve Bayes'] = {
    'Count Vectors': accuracy1,
    'WordLevel TF-IDF': accuracy2,
    'N-Gram Vectors': accuracy3,
    'CharLevel Vectors': accuracy4}

### Linear Classifier

In [40]:
%%time
# Linear Classifier on Count Vectors
accuracy1 = train_model(LogisticRegression(solver = 'lbfgs', max_iter = 350), X_train_count, y_train, X_test_count)
print('LR, Count Vectors    : %.4f\n' % accuracy1)

LR, Count Vectors    : 0.8565

CPU times: user 14.2 s, sys: 1.09 s, total: 15.3 s
Wall time: 2.07 s


In [41]:
%%time
# Linear Classifier on Word Level TF IDF Vectors
accuracy2 = train_model(LogisticRegression(solver = 'lbfgs', max_iter = 100), X_train_tfidf, y_train, X_test_tfidf)
print('LR, WordLevel TF-IDF : %.4f\n' % accuracy2)

LR, WordLevel TF-IDF : 0.8750

CPU times: user 303 ms, sys: 4.04 ms, total: 307 ms
Wall time: 109 ms


In [42]:
%%time
# Linear Classifier on Ngram Level TF IDF Vectors
accuracy3 = train_model(LogisticRegression(solver = 'lbfgs', max_iter = 100), X_train_tfidf_ngram, y_train, X_test_tfidf_ngram)
print('LR, N-Gram Vectors   : %.4f\n' % accuracy3)

LR, N-Gram Vectors   : 0.8515

CPU times: user 149 ms, sys: 3.07 ms, total: 152 ms
Wall time: 55.2 ms


In [43]:
%%time
# Linear Classifier on Character Level TF IDF Vectors
accuracy4 = train_model(LogisticRegression(solver = 'lbfgs', max_iter = 100), X_train_tfidf_ngram_chars, y_train, X_test_tfidf_ngram_chars)
print('LR, CharLevel Vectors: %.4f\n' % accuracy4)

LR, CharLevel Vectors: 0.8455

CPU times: user 772 ms, sys: 5.96 ms, total: 778 ms
Wall time: 273 ms


In [44]:
results.loc['Logistic Regression'] = {
    'Count Vectors': accuracy1,
    'WordLevel TF-IDF': accuracy2,
    'N-Gram Vectors': accuracy3,
    'CharLevel Vectors': accuracy4}

### Support Vector Machine

In [45]:
%%time
# Support Vector Machine on Count Vectors
accuracy1 = train_model(LinearSVC(), X_train_count, y_train, X_test_count)
print('SVM, Count Vectors    : %.4f\n' % accuracy1)

SVM, Count Vectors    : 0.8430

CPU times: user 443 ms, sys: 3.82 ms, total: 447 ms
Wall time: 445 ms


In [46]:
%%time
# Support Vector Machine on Word Level TF IDF Vectors
accuracy2 = train_model(LinearSVC(), X_train_tfidf, y_train, X_test_tfidf)
print('SVM, WordLevel TF-IDF : %.4f\n' % accuracy2)

SVM, WordLevel TF-IDF : 0.8615

CPU times: user 60.6 ms, sys: 2.86 ms, total: 63.5 ms
Wall time: 62.3 ms


In [47]:
%%time
# Support Vector Machine on Ngram Level TF IDF Vectors
accuracy3 = train_model(LinearSVC(), X_train_tfidf_ngram, y_train, X_test_tfidf_ngram)
print('SVM, N-Gram Vectors   : %.4f\n' % accuracy3)

SVM, N-Gram Vectors   : 0.8305

CPU times: user 50.1 ms, sys: 1.62 ms, total: 51.7 ms
Wall time: 50.7 ms


In [48]:
%%time
# Support Vector Machine on Character Level TF IDF Vectors
accuracy4 = train_model(LinearSVC(), X_train_tfidf_ngram_chars, y_train, X_test_tfidf_ngram_chars)
print('SVM, CharLevel Vectors: %.4f\n' % accuracy4)

SVM, CharLevel Vectors: 0.8495

CPU times: user 397 ms, sys: 19 ms, total: 416 ms
Wall time: 415 ms


In [49]:
results.loc['Support Vector Machine'] = {
    'Count Vectors': accuracy1,
    'WordLevel TF-IDF': accuracy2,
    'N-Gram Vectors': accuracy3,
    'CharLevel Vectors': accuracy4}

### Bagging Models

In [50]:
%%time
# Bagging (Random Forest) on Count Vectors
accuracy1 = train_model(RandomForestClassifier(n_estimators = 100), X_train_count, y_train, X_test_count)
print('RF, Count Vectors    : %.4f\n' % accuracy1)

RF, Count Vectors    : 0.8130

CPU times: user 8.37 s, sys: 7.41 ms, total: 8.37 s
Wall time: 8.37 s


In [51]:
%%time
# Bagging (Random Forest) on Word Level TF IDF Vectors
accuracy2 = train_model(RandomForestClassifier(n_estimators = 100), X_train_tfidf, y_train, X_test_tfidf)
print('RF, WordLevel TF-IDF : %.4f\n' % accuracy2)

RF, WordLevel TF-IDF : 0.8245

CPU times: user 4.27 s, sys: 6.06 ms, total: 4.28 s
Wall time: 4.28 s


In [52]:
%%time
# Bagging (Random Forest) on Ngram Level TF IDF Vectors
accuracy3 = train_model(RandomForestClassifier(n_estimators = 100), X_train_tfidf_ngram, y_train, X_test_tfidf_ngram)
print('RF, N-Gram Vectors   : %.4f\n' % accuracy3)

RF, N-Gram Vectors   : 0.7915

CPU times: user 4.42 s, sys: 6.3 ms, total: 4.43 s
Wall time: 4.43 s


In [53]:
%%time
# Bagging (Random Forest) on Character Level TF IDF Vectors
accuracy4 = train_model(RandomForestClassifier(n_estimators = 100), X_train_tfidf_ngram_chars, y_train, X_test_tfidf_ngram_chars)
print('RF, CharLevel Vectors: %.4f\n' % accuracy4)

RF, CharLevel Vectors: 0.7900

CPU times: user 15.3 s, sys: 21.3 ms, total: 15.3 s
Wall time: 15.3 s


In [54]:
results.loc['Random Forest'] = {
    'Count Vectors': accuracy1,
    'WordLevel TF-IDF': accuracy2,
    'N-Gram Vectors': accuracy3,
    'CharLevel Vectors': accuracy4}

### Boosting Models

In [55]:
%%time
# Gradient Boosting on Count Vectors
accuracy1 = train_model(GradientBoostingClassifier(), X_train_count, y_train, X_test_count)
print('GB, Count Vectors    : %.4f\n' % accuracy1)

GB, Count Vectors    : 0.8040

CPU times: user 14 s, sys: 12.8 ms, total: 14 s
Wall time: 14 s


In [56]:
%%time
# Gradient Boosting on Word Level TF IDF Vectors
accuracy2 = train_model(GradientBoostingClassifier(), X_train_tfidf, y_train, X_test_tfidf)
print('GB, WordLevel TF-IDF : %.4f\n' % accuracy2)

GB, WordLevel TF-IDF : 0.7970

CPU times: user 8.37 s, sys: 7.72 ms, total: 8.37 s
Wall time: 8.38 s


In [57]:
%%time
# Gradient Boosting on Ngram Level TF IDF Vectors
accuracy3 = train_model(GradientBoostingClassifier(), X_train_tfidf_ngram, y_train, X_test_tfidf_ngram)
print('GB, N-Gram Vectors   : %.4f\n' % accuracy3)

GB, N-Gram Vectors   : 0.7430

CPU times: user 5.12 s, sys: 6.33 ms, total: 5.13 s
Wall time: 5.13 s


In [58]:
%%time
# Gradient Boosting on Character Level TF IDF Vectors
accuracy4 = train_model(GradientBoostingClassifier(), X_train_tfidf_ngram_chars, y_train, X_test_tfidf_ngram_chars)
print('GB, CharLevel Vectors: %.4f\n' % accuracy4)

GB, CharLevel Vectors: 0.8120

CPU times: user 1min 19s, sys: 240 ms, total: 1min 20s
Wall time: 1min 20s


In [59]:
results.loc['Gradient Boosting'] = {
    'Count Vectors': accuracy1,
    'WordLevel TF-IDF': accuracy2,
    'N-Gram Vectors': accuracy3,
    'CharLevel Vectors': accuracy4}

In [60]:
results

|  | Count Vectors | WordLevel TF-IDF | N-Gram Vectors | CharLevel Vectors |
|---|---|---|---|---|
| Naïve Bayes | 0.8335 | 0.8355 | 0.8365 | 0.806 |
| Logistic Regression | 0.8565 | 0.875 | 0.8515 | 0.8455 |
| Support Vector Machine | 0.843 | 0.8615 | 0.8305 | 0.8495 |
| Random Forest | 0.813 | 0.8245 | 0.7915 | 0.79 |
| Gradient Boosting | 0.804 | 0.797 | 0.743 | 0.812 |
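
To read the table programmatically, the sketch below rebuilds it from the accuracies reported above and looks up the best feature set per model and the best combination overall. The variable names are illustrative; in the notebook the same calls could be made on `results` directly.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        'Count Vectors':     [0.8335, 0.8565, 0.8430, 0.8130, 0.8040],
        'WordLevel TF-IDF':  [0.8355, 0.8750, 0.8615, 0.8245, 0.7970],
        'N-Gram Vectors':    [0.8365, 0.8515, 0.8305, 0.7915, 0.7430],
        'CharLevel Vectors': [0.8060, 0.8455, 0.8495, 0.7900, 0.8120],
    },
    index=['Naïve Bayes', 'Logistic Regression', 'Support Vector Machine',
           'Random Forest', 'Gradient Boosting'],
)

# Best feature set for each model (column label of the row-wise maximum).
best_per_model = scores.idxmax(axis=1)

# Best model overall, then the feature set that gave it its top score.
best_model = scores.max(axis=1).idxmax()
best_features = scores.loc[best_model].idxmax()

print(best_per_model)
print(best_model, best_features, scores.loc[best_model, best_features])
```

On this single split, logistic regression on word-level TF-IDF scores highest (0.8750). The gaps between the top entries are small, so the ranking may change with a different `random_state`.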




---

© 2021 Institute of Data