## Model Prediction and Discussion

* Part 1: several classification algorithms, including Multinomial Naive Bayes (MultinomialNB), Linear Support Vector Classification (LinearSVC) and Logistic Regression, are applied to the movies data.
* Part 2: conclusion

## 1. Libraries and loading preprocessed data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import re
import warnings


import pickle 
#import mglearn
import time


from nltk.tokenize import TweetTokenizer # doesn't split at apostrophes
import nltk
from nltk import Text
from nltk.tokenize import regexp_tokenize
from nltk.tokenize import word_tokenize  
from nltk.tokenize import sent_tokenize 
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer


from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression 
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import Perceptron
from sklearn.linear_model import PassiveAggressiveClassifier

from sklearn.neural_network import MLPClassifier



from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

from sklearn.metrics import accuracy_score
from sklearn.metrics import hamming_loss
from sklearn.metrics import jaccard_score
from sklearn.metrics import cohen_kappa_score

from sklearn.svm import LinearSVC

In [2]:
movies = pd.read_csv('movies_preprocessed.csv', delimiter=',')
# movies.dataframeName = 'wiki_movie_plots_deduped.csv'
movies = movies.drop(columns="id")
nRow, nCol = movies.shape
print(f'There are {nRow} rows and {nCol} columns')

There are 34886 rows and 11 columns


In [3]:
movies.head()

Unnamed: 0,PlotClean,TitleClean,MainGenresCount,action,animation,comedy,crime,drama,musical,romance,thriller
0,a bartender is working at a saloon serving dr...,kansas saloon smashers,0,0,0,0,0,0,0,0,0
1,the moon painted with a smiling face hangs ov...,love by the light of the moon,0,0,0,0,0,0,0,0,0
2,the film just over a minute long is composed...,the martyred presidents,0,0,0,0,0,0,0,0,0
3,lasting just 61 seconds and consisting of two ...,terrible teddy the grizzly king,0,0,0,0,0,0,0,0,0
4,the earliest known adaptation of the classic f...,jack and the beanstalk,0,0,0,0,0,0,0,0,0


## 2. Feature Engineering

**Train and Test split**

In [4]:
# build the train and test sets from the movies that have at least one genre
MoviesTrain, MoviesTest = train_test_split(movies[movies.MainGenresCount!=0], random_state=42, test_size=0.20, shuffle=True)

In [5]:
MoviesTrain.head()

Unnamed: 0,PlotClean,TitleClean,MainGenresCount,action,animation,comedy,crime,drama,musical,romance,thriller
30733,the story is told through the protagonist muru...,veyil,1,0,0,0,0,1,0,0,0
1040,eddie haskins lease a wisecracking young ma...,troopers three,1,0,0,1,0,0,0,0,0
16958,five days after the assault on the abnegation ...,the divergent series: insurgent,1,1,0,0,0,0,0,0,0
5844,tom is busy designing a mousetrap in the attic...,designs on jerry,1,0,1,0,0,0,0,0,0
2040,alan colby heir to a vast fortune reappears ...,charlie chan secret,1,0,0,1,0,0,0,0,0


**Features**

In [6]:
# define the algorithm for feature extraction
tfidf = TfidfVectorizer(stop_words ='english', smooth_idf=False, sublinear_tf=False, norm=None, analyzer='word')

In [7]:
# building the features
x_train = tfidf.fit_transform(MoviesTrain.PlotClean) 
x_test  = tfidf.transform(MoviesTest.PlotClean)
# the test data is only passed through transform(), never fit_transform(),
# so it reuses the training vocabulary and the feature dimensions cannot mismatch
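The fit-on-train, transform-on-test pattern used above can be illustrated on a toy corpus. This is a minimal sketch: the two small document lists are hypothetical stand-ins for `MoviesTrain.PlotClean` and `MoviesTest.PlotClean`, and only a subset of the vectorizer settings from the cell above is repeated.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpora standing in for the train and test plot columns.
train_docs = [
    "a detective hunts a killer",
    "two friends fall in love",
    "a killer stalks two friends",
]
test_docs = ["a detective falls in love with an alien"]

vectorizer = TfidfVectorizer(stop_words="english", smooth_idf=False, norm=None)
x_tr = vectorizer.fit_transform(train_docs)  # learns the vocabulary and IDF weights
x_te = vectorizer.transform(test_docs)       # reuses them; unseen words are ignored

print(x_tr.shape, x_te.shape)                # same number of columns in both matrices
print("alien" in vectorizer.vocabulary_)     # a word seen only at test time gets no feature
```

Both matrices share one column space because the vocabulary is fixed at fit time, which is what lets a classifier trained on `x_train` score `x_test` directly.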

In [8]:
print('nrow of the MoviesTrain ={}'.format(MoviesTrain.shape[0]))
print('nrow of the MoviesTest ={}'.format(MoviesTest.shape[0]))

nrow of the MoviesTrain =21065
nrow of the MoviesTest =5267


**Building the classes**

In [9]:
# building the classes
y_train = MoviesTrain[MoviesTrain.columns[3:]]
y_test = MoviesTest[MoviesTest.columns[3:]]

In [10]:
print('number of y_train classes',len(y_train.columns))
print('number of y_test classes',len(y_test.columns))

number of y_train classes 8
number of y_test classes 8


## 3. Model Selection and Parameter Adjustment

### 3.1 Model Selection

**Multinomial Naive Bayes**

In [215]:
multinomialNB=OneVsRestClassifier(MultinomialNB(fit_prior=True, class_prior=None))

In [216]:
# fitting
multinomialNB.fit(x_train, y_train.action)

OneVsRestClassifier(estimator=MultinomialNB())

In [217]:
# predict on the test set (features built from the plot text)
prediction = multinomialNB.predict(x_test)

In [235]:
print('Test accuracy is {}'.format(accuracy_score(y_test.action, prediction)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction)))

Test accuracy is 0.8241883425099678
Test hamming loss is 0.17581165749003227
Test jaccard score is 0.45271867612293143
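Because the model is scored on a single binary column (`action`), the three metrics are closely related: the Hamming loss is simply one minus the accuracy, and the Jaccard score is TP / (TP + FP + FN) for the positive class. The sketch below checks both identities on a small hand-made label vector; the arrays are illustrative and not taken from the movies data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, jaccard_score

# Illustrative binary labels (1 = movie is tagged "action").
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])

acc = accuracy_score(y_true, y_pred)
ham = hamming_loss(y_true, y_pred)
jac = jaccard_score(y_true, y_pred)

# Confusion counts for the positive class.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))

print(acc, ham, acc + ham)        # accuracy and Hamming loss sum to 1
print(jac, tp / (tp + fp + fn))   # Jaccard ignores true negatives
```

Since the Jaccard score ignores true negatives, it is the more informative of the three when most movies do not carry the label.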


**Linear SVC**

In [219]:
linearSVC=OneVsRestClassifier(LinearSVC(), n_jobs=1)

In [220]:
# fitting
linearSVC.fit(x_train, y_train.action)



OneVsRestClassifier(estimator=LinearSVC(), n_jobs=1)

In [221]:
# predict on the test set (features built from the plot text)
prediction_lrSVC = linearSVC.predict(x_test)

In [223]:
print('Test accuracy is {}'.format(accuracy_score(y_test.action, prediction_lrSVC)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_lrSVC)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_lrSVC)))

Test accuracy is 0.8023542813745965
Test hamming loss is 0.19764571862540345
Test jaccard score is 0.39158386908240794


**Logistic Regression**

In [65]:
LR=OneVsRestClassifier(LogisticRegression(), n_jobs=1)

In [66]:
# fitting
LR.fit(x_train, y_train.action)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


OneVsRestClassifier(estimator=LogisticRegression(), n_jobs=1)
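The warning above means the solver stopped at its `max_iter` cap before converging. One way to check this programmatically is to compare the fitted `n_iter_` with `max_iter` and to capture `ConvergenceWarning`. The sketch below assumes synthetic, deliberately unscaled data rather than the TF-IDF matrix, and the two `max_iter` values are arbitrary.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic, deliberately unscaled data (hypothetical stand-in for x_train).
X, y = make_classification(n_samples=500, n_features=30, random_state=0)
X = X * 100.0

results = {}
for max_iter in (5, 5000):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf = LogisticRegression(max_iter=max_iter).fit(X, y)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    # n_iter_ holds the iterations actually used; reaching max_iter means the cap was hit
    results[max_iter] = (int(clf.n_iter_[0]), warned)

print(results)
```

A model that stops at the cap is not necessarily worse on held-out data, but its coefficients depend on where the solver happened to stop.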

In [67]:
# predict on the test set (features built from the plot text)
prediction_LR = LR.predict(x_test)

In [224]:
print('Test accuracy for Logistic Regression is {}'.format(accuracy_score(y_test.action, prediction_LR)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_LR)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_LR)))

Test accuracy for Logistic Regression is 0.8361496107841276
Test hamming loss is 0.16385038921587242
Test jaccard score is 0.445016077170418


In [237]:
len(prediction)

5267

**SGDClassifier**

In [83]:
SGD=OneVsRestClassifier(SGDClassifier(), n_jobs=1)

In [84]:
# fitting
SGD.fit(x_train, y_train.action)

OneVsRestClassifier(estimator=SGDClassifier(), n_jobs=1)

In [85]:
# predict on the test set (features built from the plot text)
prediction_SGD = SGD.predict(x_test)

In [225]:
print('Test accuracy for SGD Classifier is {}'.format(accuracy_score(y_test.action, prediction_SGD)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_SGD)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_SGD)))

Test accuracy for SGD Classifier is 0.8414657300170876
Test hamming loss is 0.15853426998291248
Test jaccard score is 0.40947666195190946


**Dummy Classifier**

In [87]:
dummy=OneVsRestClassifier(DummyClassifier(), n_jobs=1)

In [88]:
dummy.fit(x_train, y_train.action)



OneVsRestClassifier(estimator=DummyClassifier(), n_jobs=1)

In [89]:
prediction_dummy = dummy.predict(x_test)

In [226]:
print('Test accuracy for Dummy Classifier is {}'.format(accuracy_score(y_test.action, prediction_dummy)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_dummy)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_dummy)))

Test accuracy for Dummy Classifier is 0.6417315359787356
Test hamming loss is 0.3582684640212645
Test jaccard score is 0.12314126394052044


**Perceptron**

In [91]:
perceptron=OneVsRestClassifier(Perceptron(), n_jobs=1)

In [92]:
perceptron.fit(x_train, y_train.action)

OneVsRestClassifier(estimator=Perceptron(), n_jobs=1)

In [93]:
prediction_perceptron = perceptron.predict(x_test)

In [227]:
print('Test accuracy for Perceptron is {}'.format(accuracy_score(y_test.action, prediction_perceptron)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_perceptron)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_perceptron)))

Test accuracy for Perceptron is 0.8105183216252136
Test hamming loss is 0.1894816783747864
Test jaccard score is 0.41671537112799534


**Passive Aggressive Classifier**

In [95]:
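# the original run wrapped Perceptron() here, so the recorded output below repeats the Perceptron results
PAC=OneVsRestClassifier(PassiveAggressiveClassifier(), n_jobs=1)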

In [96]:
PAC.fit(x_train, y_train.action)

OneVsRestClassifier(estimator=Perceptron(), n_jobs=1)

In [97]:
prediction_PAC = PAC.predict(x_test)

In [228]:
print('Test accuracy for Passive Aggressive Classifier is {}'.format(accuracy_score(y_test.action, prediction_PAC)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_PAC)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_PAC)))

Test accuracy for Passive Aggressive Classifier is 0.8105183216252136
Test hamming loss is 0.1894816783747864
Test jaccard score is 0.41671537112799534


**Multi-layer Perceptron Classifier**

In [100]:
MPC=OneVsRestClassifier(MLPClassifier(), n_jobs=1)

In [101]:
MPC.fit(x_train, y_train.action)

OneVsRestClassifier(estimator=MLPClassifier(), n_jobs=1)

In [102]:
prediction_MPC = MPC.predict(x_test)

In [229]:
print('Test accuracy for Multi-layer Perceptron Classifier is {}'.format(accuracy_score(y_test.action, prediction_MPC)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_MPC)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_MPC)))

Test accuracy for Multi-layer Perceptron Classifier is 0.8264666793240935
Test hamming loss is 0.1735333206759066
Test jaccard score is 0.4061078622482131


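Of the eight models above, Multinomial Naive Bayes and Logistic Regression achieve the two highest Jaccard scores on the `action` label (0.453 and 0.445), so we select them for parameter adjustment. SGDClassifier has the highest accuracy (0.841) but a lower Jaccard score (0.409).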
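The comparison above scores every model on the `action` column only. As a hedged sketch of how the same comparison could be run over all label columns at once, the loop below fits each candidate inside `OneVsRestClassifier` on a multilabel target and ranks them by sample-averaged Jaccard score. The data comes from `make_multilabel_classification`, an assumption made so the example runs on its own. It is not the movies data, and the candidate list is shortened.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Synthetic non-negative count features with 8 labels (stand-in for TF-IDF + genres).
X, Y = make_multilabel_classification(
    n_samples=400, n_features=50, n_classes=8, random_state=42
)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=42)

candidates = {
    "MultinomialNB": MultinomialNB(),
    "LinearSVC": LinearSVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, estimator in candidates.items():
    # One binary classifier per label column.
    clf = OneVsRestClassifier(estimator).fit(X_tr, Y_tr)
    scores[name] = jaccard_score(Y_te, clf.predict(X_te), average="samples")

# Print the candidates from best to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

With the notebook's variables, the equivalent call would be fitting on `x_train, y_train` instead of `x_train, y_train.action`.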

### 3.2 Parameter Adjustment

**Naive Bayes**

In [24]:
def naive_bayes_find_optimal_alpha(x_tr, y_tr, alpha_value):
    # grid-search the smoothing parameter alpha of the wrapped MultinomialNB,
    # scored by the sample-averaged Jaccard index over 5 folds
    NB = OneVsRestClassifier(MultinomialNB(), n_jobs=-1)
    gsv = GridSearchCV(NB, alpha_value, cv=5, verbose=1, scoring="jaccard_samples", n_jobs=-1)
    gsv.fit(x_tr, y_tr)
    print("Best HyperParameter: ", gsv.best_params_)
    print(gsv.best_score_)
    print(gsv.cv_results_)
    return gsv.best_params_
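The `estimator__alpha` key used with this function follows scikit-learn's nested-parameter convention: `OneVsRestClassifier` stores the wrapped model under the name `estimator`, and a double underscore reaches into it. The sketch below shows the same search on a small synthetic multilabel problem. The data and the log-spaced alpha grid are assumptions chosen for illustration, not the values searched in this notebook.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

# Synthetic non-negative features with 5 labels (hypothetical stand-in data).
X, Y = make_multilabel_classification(
    n_samples=300, n_features=40, n_classes=5, random_state=0
)

ovr = OneVsRestClassifier(MultinomialNB())
grid = {"estimator__alpha": [0.01, 0.1, 1.0, 10.0]}  # "estimator__" targets the wrapped NB

search = GridSearchCV(ovr, grid, cv=3, scoring="jaccard_samples")
search.fit(X, Y)

print(search.best_params_, round(search.best_score_, 3))
```

`jaccard_samples` needs a multilabel target, which is why the search is fitted on all label columns rather than on a single one.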

In [19]:
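import sklearn
# list the valid `scoring` names; SCORERS was removed in scikit-learn 1.3,
# where sklearn.metrics.get_scorer_names() returns the same names
sklearn.metrics.SCORERS.keys()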

dict_keys(['explained_variance', 'r2', 'max_error', 'neg_median_absolute_error', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_root_mean_squared_error', 'neg_mean_poisson_deviance', 'neg_mean_gamma_deviance', 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_ovr_weighted', 'roc_auc_ovo_weighted', 'balanced_accuracy', 'average_precision', 'neg_log_loss', 'neg_brier_score', 'adjusted_rand_score', 'homogeneity_score', 'completeness_score', 'v_measure_score', 'mutual_info_score', 'adjusted_mutual_info_score', 'normalized_mutual_info_score', 'fowlkes_mallows_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'jaccard', 'jaccard_macro', 'jaccard_micro', 'jaccard_samples', 'jaccard_weighted'])

In [25]:
params_NB = {"estimator__alpha": range(1,30)}
best_params_NB = naive_bayes_find_optimal_alpha(x_train,y_train,params_NB)

Fitting 5 folds for each of 29 candidates, totalling 145 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:    4.3s
[Parallel(n_jobs=-1)]: Done 145 out of 145 | elapsed:   12.5s finished


Best HyperParameter:  {'estimator__alpha': 10}
0.45406044782023897
{'mean_fit_time': array([0.32707419, 0.58993211, 0.65305367, 0.50021644, 0.57445512,
       0.53791962, 0.67955341, 0.46830401, 0.57412782, 0.47910743,
       0.63294063, 0.58573365, 0.52311692, 0.54852324, 0.56332612,
       0.51911626, 0.60373621, 0.63134127, 0.64394541, 0.41009111,
       0.64934616, 0.57873874, 0.49082994, 0.58433137, 0.5679275 ,
       0.52193799, 0.51555738, 0.57212873, 0.48590708]), 'std_fit_time': array([0.10656675, 0.13639862, 0.13872548, 0.03230594, 0.13397829,
       0.14067726, 0.09632405, 0.10938606, 0.07851424, 0.1616828 ,
       0.11679968, 0.21635455, 0.07799447, 0.16965227, 0.09095298,
       0.15581883, 0.19158538, 0.11693236, 0.18467828, 0.11354185,
       0.13032955, 0.22433093, 0.10298772, 0.11201776, 0.24837081,
       0.11777207, 0.05315297, 0.04481139, 0.12551905]), 'mean_score_time': array([0.22905293, 0.20344634, 0.20386081, 0.21998382, 0.24845743,
       0.25406079, 0.30787139

In [51]:
multinomialNB=OneVsRestClassifier(MultinomialNB(alpha=10,fit_prior=True, class_prior=None))
# fitting
multinomialNB.fit(x_train, y_train.action)
# predict on the test set (features built from the plot text)
prediction = multinomialNB.predict(x_test)
print('Test accuracy is {}'.format(accuracy_score(y_test.action, prediction)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction)))

Most Appropriate value of alpha is 21
Test accuracy is 0.8295044617429277
Test hamming loss is 0.17049553825707234
Test jaccard score is 0.46832445233866193


**Logistic Regression**

In [31]:
import warnings
warnings.filterwarnings('ignore')

In [33]:
def lr_find_optimal_iter_C(x_tr, y_tr, params):
    # grid-search max_iter and C of the wrapped LogisticRegression,
    # scored by the sample-averaged Jaccard index over 5 folds
    lr_ovr = OneVsRestClassifier(LogisticRegression(n_jobs=-1), n_jobs=-1)
    gsv = GridSearchCV(lr_ovr, params, cv=5, verbose=1, scoring="jaccard_samples", n_jobs=-1)
    gsv.fit(x_tr, y_tr)
    print("Best HyperParameter: ", gsv.best_params_)
    print(gsv.best_score_)
    print(gsv.cv_results_)
    return gsv.best_params_

In [35]:
params_lr = {"estimator__max_iter": range(50,100,10),"estimator__C":range(1,3)}
best_params_lr = lr_find_optimal_iter_C(x_train,y_train,params_lr)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:  2.7min
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:  4.3min finished


Best HyperParameter:  {'estimator__C': 1, 'estimator__max_iter': 50}
0.4594453675132527
{'mean_fit_time': array([46.20924468, 50.83672929, 58.54572034, 70.66774659, 73.03477802,
       44.59598885, 52.67742844, 62.0360127 , 64.75373311, 54.97174354]), 'std_fit_time': array([ 0.70963415,  0.60857938,  3.99551379,  4.45184145,  4.36338978,
        3.70739721,  4.21388023,  7.4445396 ,  3.6239914 , 10.17067473]), 'mean_score_time': array([0.14123831, 0.07922039, 0.22425218, 0.21185069, 0.1864449 ,
       0.23065686, 0.20184655, 0.23845463, 0.17644234, 0.04981728]), 'std_score_time': array([0.03528516, 0.01606999, 0.04882486, 0.03243846, 0.03184691,
       0.02277902, 0.0851054 , 0.06871339, 0.05940622, 0.02724754]), 'param_estimator__C': masked_array(data=[1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value='?',
            dtype=object), 'param_estimator__max_iter': masked_array(data

In [37]:
params_lr = {"estimator__max_iter": range(30,55,5),"estimator__C":[0.5,0.7,0.9,1.0]}
best_params_lr = lr_find_optimal_iter_C(x_train,y_train,params_lr)

Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:  4.9min finished


Best HyperParameter:  {'estimator__C': 1.0, 'estimator__max_iter': 30}
0.4664894374554949
{'mean_fit_time': array([28.96987314, 30.37439299, 32.77103291, 37.56064587, 43.61039214,
       26.08704433, 33.6825211 , 33.05276952, 39.531216  , 42.43573179,
       27.87503457, 30.39819078, 33.33431621, 39.53747616, 40.5701601 ,
       29.82984352, 29.39440603, 34.0923912 , 37.25837808, 33.03678131]), 'std_fit_time': array([1.32742194, 1.39387599, 1.47258014, 2.01841236, 5.16073515,
       0.71862739, 1.16385759, 1.32853713, 4.27210132, 4.17658074,
       1.25894632, 1.23451965, 2.53864118, 1.8398744 , 1.73524721,
       2.58080983, 2.95676908, 2.74738597, 2.19988203, 0.89584905]), 'mean_score_time': array([0.15383635, 0.09482293, 0.1407403 , 0.14923325, 0.14163175,
       0.17964149, 0.199651  , 0.15934176, 0.19505301, 0.12583008,
       0.1692378 , 0.2382545 , 0.12862668, 0.21935635, 0.14144769,
       0.19404478, 0.2178535 , 0.17183638, 0.1988677 , 0.04841628]), 'std_score_time': array([0.

In [41]:
LR=OneVsRestClassifier(LogisticRegression(max_iter=30,n_jobs=-1), n_jobs=-1)
# fitting
LR.fit(x_train, y_train.action)
# predict on the test set (features built from the plot text)
prediction_LR = LR.predict(x_test)
print('Test accuracy for Logistic Regression is {}'.format(accuracy_score(y_test.action, prediction_LR)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_LR)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_LR)))

Test accuracy for Logistic Regression is 0.8370989177900133
Test hamming loss is 0.1629010822099867
Test jaccard score is 0.46037735849056605


In [40]:
params_lr = {"estimator__max_iter": range(10,31,4),"estimator__C":[0.9,1.0,1.1]}
best_params_lr = lr_find_optimal_iter_C(x_train,y_train,params_lr)

Fitting 5 folds for each of 18 candidates, totalling 90 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:   44.3s
[Parallel(n_jobs=-1)]: Done  90 out of  90 | elapsed:  2.3min finished


Best HyperParameter:  {'estimator__C': 0.9, 'estimator__max_iter': 14}
0.47425033626078017
{'mean_fit_time': array([10.79083481, 12.6038075 , 15.4448132 , 20.08197556, 21.59715409,
       22.08573685, 13.02749672, 13.74131179, 15.26095014, 19.23723693,
       23.11630354, 24.84412384, 12.28757834, 12.47388487, 15.35089164,
       19.61385007, 21.50452929, 19.19777422]), 'std_fit_time': array([0.31371315, 0.21281008, 1.94617473, 0.48377134, 0.76740262,
       1.09422611, 1.62552168, 1.13243627, 0.33252296, 0.99309106,
       0.93054178, 0.80103273, 1.61057636, 0.27120055, 0.8253063 ,
       0.28507233, 0.38541994, 2.00338824]), 'mean_score_time': array([0.15434175, 0.07383428, 0.11282787, 0.15283546, 0.12242737,
       0.12062931, 0.20084906, 0.15603638, 0.13463058, 0.15685501,
       0.17684398, 0.12983122, 0.10682507, 0.17504296, 0.15155897,
       0.19094868, 0.17604136, 0.05440888]), 'std_score_time': array([0.02718226, 0.00951801, 0.04508634, 0.05948402, 0.05376229,
       0.016146

In [42]:
LR=OneVsRestClassifier(LogisticRegression(max_iter=14,C=0.9,n_jobs=-1), n_jobs=-1)
# fitting
LR.fit(x_train, y_train.action)
# predict on the test set (features built from the plot text)
prediction_LR = LR.predict(x_test)
print('Test accuracy for Logistic Regression is {}'.format(accuracy_score(y_test.action, prediction_LR)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_LR)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_LR)))

Test accuracy for Logistic Regression is 0.8435542054300361
Test hamming loss is 0.15644579456996394
Test jaccard score is 0.4717948717948718


In [48]:
params_lr = {"estimator__max_iter": range(10,21,2),"estimator__C":[0.8,0.9,1.0]}
best_params_lr = lr_find_optimal_iter_C(x_train,y_train,params_lr)

Fitting 5 folds for each of 18 candidates, totalling 90 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:   38.8s
[Parallel(n_jobs=-1)]: Done  90 out of  90 | elapsed:  1.8min finished


Best HyperParameter:  {'estimator__C': 0.9, 'estimator__max_iter': 12}
0.4761500118680275
{'mean_fit_time': array([11.22389145, 12.07196565, 12.59103189, 14.484864  , 15.59611602,
       16.33881149, 12.30731945, 12.16377692, 12.7169837 , 13.84400239,
       16.25920181, 18.59086065,  9.0995348 , 11.45864334, 13.44876671,
       13.03455753, 15.78513989, 15.28683038]), 'std_fit_time': array([0.46271374, 0.4614287 , 1.12800013, 0.85283095, 0.98032738,
       0.64444384, 0.7124524 , 1.38412011, 1.02352964, 0.75104017,
       1.71389188, 1.21261996, 0.96241649, 1.34864637, 0.93310661,
       0.8265542 , 1.54380105, 2.2765006 ]), 'mean_score_time': array([0.16175056, 0.07864075, 0.16687055, 0.14073787, 0.11783843,
       0.10994506, 0.14135079, 0.21048183, 0.11804814, 0.1096262 ,
       0.15544047, 0.09782429, 0.08723159, 0.12414694, 0.10712895,
       0.13623252, 0.13984623, 0.05141473]), 'std_score_time': array([0.05460504, 0.02614031, 0.09341718, 0.02749621, 0.04215447,
       0.0410252

In [49]:
LR=OneVsRestClassifier(LogisticRegression(max_iter=12,C=0.9,n_jobs=-1), n_jobs=-1)
# fitting
LR.fit(x_train, y_train.action)
# predict on the test set (features built from the plot text)
prediction_LR = LR.predict(x_test)
print('Test accuracy for Logistic Regression is {}'.format(accuracy_score(y_test.action, prediction_LR)))
print('Test hamming loss is {}'.format(hamming_loss(y_test.action, prediction_LR)))
print('Test jaccard score is {}'.format(jaccard_score(y_test.action, prediction_LR)))

Test accuracy for Logistic Regression is 0.8469717106512246
Test hamming loss is 0.15302828934877538
Test jaccard score is 0.4762833008447043
