## General information

In this kernel I'll work with data from the Movie Review Sentiment Analysis Playground Competition.

This dataset is interesting for NLP research. Sentences from the original dataset were split into separate phrases, and each phrase has its own sentiment label. Many phrases are also very short, which makes classifying them quite challenging. Let's try!

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from nltk.tokenize import TweetTokenizer
import datetime
import lightgbm as lgb
from scipy import stats
from scipy.sparse import hstack, csr_matrix
from sklearn.model_selection import train_test_split, cross_val_score
from wordcloud import WordCloud
from collections import Counter
from nltk.corpus import stopwords
from nltk.util import ngrams
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
pd.set_option('max_colwidth',400)

In [6]:
import zipfile

# Specify the path to the zip file
zip_path = 'train.tsv.zip'

# Open the zip file
with zipfile.ZipFile(zip_path, 'r') as zip_file:
    # Extract the TSV file from the zip
    tsv_filename = zip_file.namelist()[0]  # Assumes the TSV file is the first file in the zip
    with zip_file.open(tsv_filename) as tsv_file:
        # Read the TSV file using pandas
        train = pd.read_csv(tsv_file, delimiter='\t')

# Repeat the same process for the test data
test_zip_path = 'test.tsv.zip'
with zipfile.ZipFile(test_zip_path, 'r') as test_zip_file:
    test_tsv_filename = test_zip_file.namelist()[0]
    with test_zip_file.open(test_tsv_filename) as test_tsv_file:
        test = pd.read_csv(test_tsv_file, delimiter='\t')
sub = pd.read_csv('sampleSubmission.csv', sep=',')

In [7]:
train.head(10)

| | PhraseId | SentenceId | Phrase | Sentiment |
|---|---|---|---|---|
| 0 | 1 | 1 | A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story . | 1 |
| 1 | 2 | 1 | A series of escapades demonstrating the adage that what is good for the goose | 2 |
| 2 | 3 | 1 | A series | 2 |
| 3 | 4 | 1 | A | 2 |
| 4 | 5 | 1 | series | 2 |
| 5 | 6 | 1 | of escapades demonstrating the adage that what is good for the goose | 2 |
| 6 | 7 | 1 | of | 2 |
| 7 | 8 | 1 | escapades demonstrating the adage that what is good for the goose | 2 |
| 8 | 9 | 1 | escapades | 2 |
| 9 | 10 | 1 | demonstrating the adage that what is good for the goose | 2 |


In [8]:
train.loc[train.SentenceId == 2]

| | PhraseId | SentenceId | Phrase | Sentiment |
|---|---|---|---|---|
| 63 | 64 | 2 | This quiet , introspective and entertaining independent is worth seeking . | 4 |
| 64 | 65 | 2 | This quiet , introspective and entertaining independent | 3 |
| 65 | 66 | 2 | This | 2 |
| 66 | 67 | 2 | quiet , introspective and entertaining independent | 4 |
| 67 | 68 | 2 | quiet , introspective and entertaining | 3 |
| 68 | 69 | 2 | quiet | 2 |
| 69 | 70 | 2 | , introspective and entertaining | 3 |
| 70 | 71 | 2 | introspective and entertaining | 3 |
| 71 | 72 | 2 | introspective and | 3 |
| 72 | 73 | 2 | introspective | 2 |


In [9]:
print('Average count of phrases per sentence in train is {0:.0f}.'.format(train.groupby('SentenceId')['Phrase'].count().mean()))
print('Average count of phrases per sentence in test is {0:.0f}.'.format(test.groupby('SentenceId')['Phrase'].count().mean()))

Average count of phrases per sentence in train is 18.
Average count of phrases per sentence in test is 20.


In [10]:
print('Number of phrases in train: {}. Number of sentences in train: {}.'.format(train.shape[0], len(train.SentenceId.unique())))
print('Number of phrases in test: {}. Number of sentences in test: {}.'.format(test.shape[0], len(test.SentenceId.unique())))

Number of phrases in train: 156060. Number of sentences in train: 8529.
Number of phrases in test: 66292. Number of sentences in test: 3310.
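
As a quick sanity check, the per-sentence averages printed earlier should follow from these counts, assuming no phrase is missing. A minimal sketch using only the numbers shown above:

```python
# Counts copied from the output above.
train_phrases, train_sentences = 156060, 8529
test_phrases, test_sentences = 66292, 3310

# Mean group size equals total rows divided by number of groups.
train_avg = train_phrases / train_sentences
test_avg = test_phrases / test_sentences

print('train: {0:.2f} phrases per sentence'.format(train_avg))
print('test: {0:.2f} phrases per sentence'.format(test_avg))
```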


In [11]:
print('Average number of words per phrase in train is {0:.0f}.'.format(np.mean(train['Phrase'].apply(lambda x: len(x.split())))))
print('Average number of words per phrase in test is {0:.0f}.'.format(np.mean(test['Phrase'].apply(lambda x: len(x.split())))))

Average number of words per phrase in train is 7.
Average number of words per phrase in test is 7.


We can see that sentences were split into 18-20 phrases on average, and many phrases contain one another. Sometimes a single word or even a single punctuation mark changes the sentiment.
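
To make the nesting concrete, here is a small sketch on four rows copied from the `SentenceId == 2` output. It assumes the longest phrase is the full sentence, and the helper `is_token_span` is a hypothetical name introduced for illustration.

```python
# Rows copied from the SentenceId == 2 output: (phrase, sentiment).
rows = [
    ('This quiet , introspective and entertaining independent is worth seeking .', 4),
    ('This quiet , introspective and entertaining independent', 3),
    ('quiet , introspective and entertaining independent', 4),
    ('quiet , introspective and entertaining', 3),
]

# Assumption for this sketch: the longest phrase is the full sentence.
full_sentence = max((phrase for phrase, _ in rows), key=len)

def is_token_span(phrase, sentence):
    """Check whether the phrase tokens appear as a contiguous run in the sentence."""
    p, s = phrase.split(), sentence.split()
    return any(s[i:i + len(p)] == p for i in range(len(s) - len(p) + 1))

nested = [is_token_span(phrase, full_sentence) for phrase, _ in rows]
labels = [sentiment for _, sentiment in rows]
print(nested)
print(labels)
```

Every phrase here is a span of the same sentence, yet the labels alternate between 3 and 4 as one or two tokens are dropped.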

For example, let's look at the most common trigrams in positive phrases.

In [12]:
text = ' '.join(train.loc[train.Sentiment == 4, 'Phrase'].values)
text_trigrams = [i for i in ngrams(text.split(), 3)]

In [13]:
Counter(text_trigrams).most_common(30)

[(('one', 'of', 'the'), 199),
 (('of', 'the', 'year'), 103),
 (('.', 'is', 'a'), 87),
 (('of', 'the', 'best'), 80),
 (('of', 'the', 'most'), 70),
 (('is', 'one', 'of'), 50),
 (('One', 'of', 'the'), 43),
 ((',', 'and', 'the'), 40),
 (('the', 'year', "'s"), 38),
 (('It', "'s", 'a'), 38),
 (('it', "'s", 'a'), 37),
 (('.', "'s", 'a'), 37),
 (('a', 'movie', 'that'), 35),
 (('the', 'edge', 'of'), 34),
 (('the', 'kind', 'of'), 33),
 (('of', 'your', 'seat'), 33),
 (('the', 'film', 'is'), 31),
 ((',', 'this', 'is'), 31),
 (('the', 'film', "'s"), 31),
 ((',', 'the', 'film'), 30),
 (('film', 'that', 'is'), 30),
 (('as', 'one', 'of'), 30),
 (('edge', 'of', 'your'), 29),
 ((',', 'it', "'s"), 27),
 (('a', 'film', 'that'), 27),
 (('as', 'well', 'as'), 27),
 ((',', 'funny', ','), 25),
 ((',', 'but', 'it'), 23),
 (('films', 'of', 'the'), 23),
 (('some', 'of', 'the'), 23)]
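
Joining all phrases into one long string lets trigrams span the boundary between two unrelated phrases, which may explain entries such as `('.', 'is', 'a')`. A minimal sketch of counting trigrams inside each phrase instead follows; the toy phrases and the helper `phrase_trigrams` are hypothetical.

```python
from collections import Counter

def phrase_trigrams(phrase):
    """Return the trigrams of one whitespace-tokenized phrase."""
    tokens = phrase.split()
    return list(zip(tokens, tokens[1:], tokens[2:]))

# Hypothetical toy phrases, not taken from the dataset.
phrases = ['one of the best .', 'is a gem', 'one of the year']

# Count within each phrase, then over the concatenated text for comparison.
per_phrase = Counter(t for p in phrases for t in phrase_trigrams(p))
joined = Counter(phrase_trigrams(' '.join(phrases)))

# Trigrams that exist only because phrases were concatenated.
boundary_only = set(joined) - set(per_phrase)
print(per_phrase.most_common(1))
print(sorted(boundary_only))
```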

In [14]:
text = ' '.join(train.loc[train.Sentiment == 4, 'Phrase'].values)
stop_words = set(stopwords.words('english'))  # build the stopword set once instead of once per token
text = [i for i in text.split() if i not in stop_words]
text_trigrams = [i for i in ngrams(text, 3)]
Counter(text_trigrams).most_common(30)

[((',', 'funny', ','), 33),
 (('one', 'year', "'s"), 28),
 (('year', "'s", 'best'), 26),
 (('movies', 'ever', 'made'), 19),
 ((',', 'solid', 'cast'), 19),
 (('solid', 'cast', ','), 18),
 (("'ve", 'ever', 'seen'), 16),
 (('.', 'It', "'s"), 16),
 ((',', 'making', 'one'), 15),
 (('best', 'films', 'year'), 15),
 ((',', 'touching', ','), 15),
 (('exquisite', 'acting', ','), 15),
 (('acting', ',', 'inventive'), 14),
 ((',', 'inventive', 'screenplay'), 14),
 (('jaw-dropping', 'action', 'sequences'), 14),
 (('good', 'acting', ','), 14),
 (("'s", 'best', 'films'), 14),
 (('I', "'ve", 'seen'), 14),
 (('funny', ',', 'even'), 14),
 (('best', 'war', 'movies'), 13),
 (('purely', 'enjoyable', 'satisfying'), 13),
 (('funny', ',', 'touching'), 13),
 ((',', 'smart', ','), 13),
 (('inventive', 'screenplay', ','), 13),
 (('funniest', 'jokes', 'movie'), 13),
 (('action', 'sequences', ','), 13),
 (('sequences', ',', 'striking'), 13),
 ((',', 'striking', 'villains'), 13),
 (('exquisite', 'motion', 'picture')

The results show the main problem with this dataset: there are too many common words because sentences were split into phrases. As a result, stopwords shouldn't be removed from the text.
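
One way to see the risk is to check which phrases vanish entirely once stopwords are stripped. This is only a sketch: `toy_stopwords` is a hypothetical mini list rather than nltk's English list, and unlike the cell above it lowercases tokens before matching.

```python
# Hypothetical mini stopword list; the notebook uses nltk's English list.
toy_stopwords = {'a', 'of', 'this', 'the', 'is'}

# Phrases taken from the train.head(10) and SentenceId == 2 outputs.
phrases = ['A series', 'A', 'of', 'This', 'quiet']

def remove_stopwords(phrase):
    """Drop every token whose lowercase form is in the toy stopword list."""
    return ' '.join(w for w in phrase.split() if w.lower() not in toy_stopwords)

stripped = {p: remove_stopwords(p) for p in phrases}
emptied = [p for p, s in stripped.items() if not s]
print(emptied)
```

Phrases that become empty would carry no features at all, even though each of them still has a sentiment label.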

### Thoughts on feature processing and engineering

So, we have only phrases as data. A phrase can consist of a single word, and one punctuation mark can cause a phrase to receive a different sentiment. The assigned sentiments can also be strange. This means several things:
- removing stopwords can be a bad idea, especially when a phrase consists of a single stopword;
- punctuation could be important, so it should be kept;
- ngrams are necessary to get the most information from the data;
- features like word count or sentence length won't be useful.
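
The default `token_pattern` of scikit-learn's text vectorizers is `(?u)\b\w\w+\b`, which drops punctuation and one-character tokens. That is presumably why a custom tokenizer is passed to `TfidfVectorizer` in the next cells. The sketch below applies that regex directly with `re`; `whitespace_tokens` is a hypothetical stand-in for a punctuation-preserving tokenizer, not `TweetTokenizer` itself.

```python
import re

# Default token_pattern of scikit-learn's text vectorizers.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"

def default_tokens(text):
    """Tokens the default pattern would keep after lowercasing."""
    return re.findall(DEFAULT_PATTERN, text.lower())

def whitespace_tokens(text):
    """Stand-in tokenizer: the phrases are already space-separated tokens."""
    return text.lower().split()

for phrase in ['A', ', introspective and entertaining', 'quiet']:
    print(repr(phrase), default_tokens(phrase), whitespace_tokens(phrase))
```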

In [15]:
tokenizer = TweetTokenizer()

In [16]:
vectorizer = TfidfVectorizer(ngram_range=(1, 2), tokenizer=tokenizer.tokenize)
full_text = list(train['Phrase'].values) + list(test['Phrase'].values)
vectorizer.fit(full_text)
train_vectorized = vectorizer.transform(train['Phrase'])
test_vectorized = vectorizer.transform(test['Phrase'])

In [17]:
y = train['Sentiment']

In [18]:
logreg = LogisticRegression()
ovr = OneVsRestClassifier(logreg)

In [19]:
%%time
ovr.fit(train_vectorized, y)



CPU times: user 7.54 s, sys: 25 ms, total: 7.56 s
Wall time: 7.56 s


OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False),
          n_jobs=None)

In [20]:
scores = cross_val_score(ovr, train_vectorized, y, scoring='accuracy', n_jobs=-1, cv=3)
print('Cross-validation mean accuracy {0:.2f}%, std {1:.2f}.'.format(np.mean(scores) * 100, np.std(scores) * 100))

Cross-validation mean accuracy 56.55%, std 0.07.
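
Because phrases from one sentence overlap heavily, a plain K-fold split may put near-duplicates of a validation phrase into the training fold. One possible check, which is not part of this kernel, is to assign folds by `SentenceId`. A minimal sketch follows; `group_folds` and the toy ids are hypothetical.

```python
def group_folds(sentence_ids, n_splits=3):
    """Yield (train_idx, valid_idx) so that a sentence never spans both sides."""
    unique_ids = sorted(set(sentence_ids))
    fold_of = {sid: k % n_splits for k, sid in enumerate(unique_ids)}
    for fold in range(n_splits):
        valid = [i for i, sid in enumerate(sentence_ids) if fold_of[sid] == fold]
        train = [i for i, sid in enumerate(sentence_ids) if fold_of[sid] != fold]
        yield train, valid

# Hypothetical SentenceId column for eight phrases.
sentence_ids = [1, 1, 1, 2, 2, 3, 3, 4]
splits = list(group_folds(sentence_ids))
for train_idx, valid_idx in splits:
    print(train_idx, valid_idx)
```

scikit-learn ships `GroupKFold` for the same purpose, and `cross_val_score` accepts a `groups` argument to go with it.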


In [21]:
%%time
svc = LinearSVC(dual=False)
scores = cross_val_score(svc, train_vectorized, y, scoring='accuracy', n_jobs=-1, cv=3)
print('Cross-validation mean accuracy {0:.2f}%, std {1:.2f}.'.format(np.mean(scores) * 100, np.std(scores) * 100))

Cross-validation mean accuracy 56.51%, std 0.68.
CPU times: user 67.4 ms, sys: 13.2 ms, total: 80.7 ms
Wall time: 13.7 s


In [22]:
ovr.fit(train_vectorized, y);
svc.fit(train_vectorized, y);



## Deep learning
And now let's try deep learning (DL). A multi-layer network can be expected to work better for text classification here. I use an architecture similar to those used in the toxic comment competition.

In [23]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Input, LSTM, Embedding, Dropout, Activation, Conv1D, GRU, CuDNNGRU, CuDNNLSTM, BatchNormalization
from keras.layers import Bidirectional, GlobalMaxPool1D, MaxPooling1D, Add, Flatten
from keras.layers import GlobalAveragePooling1D, GlobalMaxPooling1D, concatenate, SpatialDropout1D
from keras.models import Model, load_model
from keras import initializers, regularizers, constraints, optimizers, layers, callbacks
from keras import backend as K
from keras.engine import InputSpec, Layer
from keras.optimizers import Adam

from keras.callbacks import ModelCheckpoint, TensorBoard, Callback, EarlyStopping

Using TensorFlow backend.


In [24]:
tk = Tokenizer(lower = True, filters='')
tk.fit_on_texts(full_text)

In [25]:
train_tokenized = tk.texts_to_sequences(train['Phrase'])
test_tokenized = tk.texts_to_sequences(test['Phrase'])

In [26]:
max_len = 50
X_train = pad_sequences(train_tokenized, maxlen = max_len)
X_test = pad_sequences(test_tokenized, maxlen = max_len)
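
With its default arguments, `pad_sequences` pads and truncates at the start of each sequence (`padding='pre'`, `truncating='pre'`) and fills with 0. The pure-Python sketch below mimics that behaviour for a single sequence; `pad_pre` is a hypothetical helper, not a Keras function.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults: keep the last maxlen items, left-pad with value."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

short = pad_pre([5, 8, 2], maxlen=5)
long_ = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)
print(long_)
```

Short phrases therefore end up right-aligned, and index 0 is reserved for padding.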

In [27]:
embedding_path = "../input/fasttext-crawl-300d-2m/crawl-300d-2M.vec"

In [28]:
embed_size = 300
max_features = 30000

In [29]:
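def get_coefs(word, *arr):
    # Split one line of the .vec file into (word, vector).
    return word, np.asarray(arr, dtype='float32')

# Use a context manager so the embedding file is closed after reading.
with open(embedding_path, encoding='utf-8') as f:
    embedding_index = dict(get_coefs(*o.strip().split(" ")) for o in f)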

word_index = tk.word_index
nb_words = min(max_features, len(word_index))
embedding_matrix = np.zeros((nb_words + 1, embed_size))
for word, i in word_index.items():
    if i >= max_features: continue
    embedding_vector = embedding_index.get(word)
    if embedding_vector is not None: embedding_matrix[i] = embedding_vector
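
Words that are absent from the pretrained vectors keep an all-zero row in the matrix, so it can be worth checking how much of the vocabulary is covered. A minimal sketch on hypothetical toy inputs standing in for `tk.word_index` and `embedding_index`:

```python
# Hypothetical toy inputs standing in for tk.word_index and embedding_index.
word_index = {'good': 1, 'movie': 2, 'zzzunknown': 3}
embedding_index = {'good': [0.1, 0.2], 'movie': [0.3, 0.4]}
embed_size = 2

# Row 0 is reserved for padding; rows of missing words stay all-zero.
matrix = [[0.0] * embed_size for _ in range(len(word_index) + 1)]
missing = []
for word, i in word_index.items():
    vector = embedding_index.get(word)
    if vector is None:
        missing.append(word)
    else:
        matrix[i] = list(vector)

coverage = 1 - len(missing) / len(word_index)
print('coverage: {0:.0%}, missing: {1}'.format(coverage, missing))
```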

In [30]:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(sparse=False)
y_ohe = ohe.fit_transform(y.values.reshape(-1, 1))



In [31]:
def build_model1(lr=0.0, lr_d=0.0, units=0, spatial_dr=0.0, kernel_size1=3, kernel_size2=2, dense_units=128, dr=0.1, conv_size=32):
    file_path = "best_model.hdf5"
    check_point = ModelCheckpoint(file_path, monitor = "val_loss", verbose = 1,
                                  save_best_only = True, mode = "min")
    early_stop = EarlyStopping(monitor = "val_loss", mode = "min", patience = 3)
    
    inp = Input(shape = (max_len,))
    # Take the vocabulary size from the matrix itself instead of hard-coding it.
    x = Embedding(embedding_matrix.shape[0], embed_size, weights = [embedding_matrix], trainable = False)(inp)
    x1 = SpatialDropout1D(spatial_dr)(x)

    x_gru = Bidirectional(CuDNNGRU(units, return_sequences = True))(x1)
    x1 = Conv1D(conv_size, kernel_size=kernel_size1, padding='valid', kernel_initializer='he_uniform')(x_gru)
    avg_pool1_gru = GlobalAveragePooling1D()(x1)
    max_pool1_gru = GlobalMaxPooling1D()(x1)
    
    x3 = Conv1D(conv_size, kernel_size=kernel_size2, padding='valid', kernel_initializer='he_uniform')(x_gru)
    avg_pool3_gru = GlobalAveragePooling1D()(x3)
    max_pool3_gru = GlobalMaxPooling1D()(x3)
    
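    # Note: x1 was reassigned above to the Conv1D output of the GRU branch,
    # so this LSTM is stacked on that output rather than on the embedding.
    x_lstm = Bidirectional(CuDNNLSTM(units, return_sequences = True))(x1)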
    x1 = Conv1D(conv_size, kernel_size=kernel_size1, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool1_lstm = GlobalAveragePooling1D()(x1)
    max_pool1_lstm = GlobalMaxPooling1D()(x1)
    
    x3 = Conv1D(conv_size, kernel_size=kernel_size2, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool3_lstm = GlobalAveragePooling1D()(x3)
    max_pool3_lstm = GlobalMaxPooling1D()(x3)
    
    
    x = concatenate([avg_pool1_gru, max_pool1_gru, avg_pool3_gru, max_pool3_gru,
                    avg_pool1_lstm, max_pool1_lstm, avg_pool3_lstm, max_pool3_lstm])
    x = BatchNormalization()(x)
    x = Dropout(dr)(Dense(dense_units, activation='relu') (x))
    x = BatchNormalization()(x)
    x = Dropout(dr)(Dense(int(dense_units / 2), activation='relu') (x))
    x = Dense(5, activation = "sigmoid")(x)
    model = Model(inputs = inp, outputs = x)
    model.compile(loss = "binary_crossentropy", optimizer = Adam(lr = lr, decay = lr_d), metrics = ["accuracy"])
    history = model.fit(X_train, y_ohe, batch_size = 128, epochs = 20, validation_split=0.1, 
                        verbose = 1, callbacks = [check_point, early_stop])
    model = load_model(file_path)
    return model
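
The output layer uses five independent sigmoid units trained with binary cross-entropy on the one-hot target, so the outputs need not sum to 1 and the reported `val_loss` is a mean over the five units. The sketch below works through one example in plain Python; the logits are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(target, probs):
    """Mean of the per-unit binary cross-entropy terms for one sample."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p)) for t, p in zip(target, probs)]
    return sum(terms) / len(terms)

# Hypothetical logits for the five sentiment classes.
logits = [-2.0, -1.0, 0.5, 2.0, -0.5]
probs = [sigmoid(z) for z in logits]
target = [0, 0, 0, 1, 0]  # one-hot row for Sentiment == 3

loss = binary_crossentropy(target, probs)
predicted = probs.index(max(probs))
print('sum of sigmoid outputs: {0:.3f}'.format(sum(probs)))
print('loss: {0:.3f}'.format(loss))
print('predicted class:', predicted)
```

Taking the argmax still yields a single class, which is how the predictions are decoded at the end of the kernel.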

An attempt at an ensemble:

In [32]:
model1 = build_model1(lr = 1e-3, lr_d = 1e-10, units = 64, spatial_dr = 0.3, kernel_size1=3, kernel_size2=2, dense_units=32, dr=0.1, conv_size=32)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.31512, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss did not improve from 0.31512
Epoch 3/20

Epoch 00003: val_loss improved from 0.31512 to 0.31410, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.31410 to 0.30845, saving model to best_model.hdf5
Epoch 5/20

Epoch 00005: val_loss improved from 0.30845 to 0.30507, saving model to best_model.hdf5
Epoch 6/20

Epoch 00006: val_loss improved from 0.30507 to 0.30256, saving model to best_model.hdf5
Epoch 7/20

Epoch 00007: val_loss improved from 0.30256 to 0.30023, saving model to best_model.hdf5
Epoch 8/20

Epoch 00008: val_loss did not improve from 0.30023
Epoch 9/20

Epoch 00009: val_loss did not improve from 0.30023
Epoch 10/20

Epoch 00010: val_loss did not improve from 0.30023
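
Training stopped after epoch 10 because `EarlyStopping` was created with `patience = 3` and the last three epochs brought no improvement. A small sketch of that counting rule, using only the improved / did-not-improve flags from the log above (`stop_epoch` is a hypothetical helper):

```python
def stop_epoch(improved, patience=3):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    wait = 0
    for epoch, better in enumerate(improved, start=1):
        if better:
            wait = 0  # any improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Improvement flags read off the model1 log (True = val_loss improved).
model1_log = [True, False, True, True, True, True, True, False, False, False]
stopped_at = stop_epoch(model1_log)
print(stopped_at)
```

The checkpoint saved at epoch 7 is the one reloaded by `load_model`.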


In [33]:
model2 = build_model1(lr = 1e-3, lr_d = 1e-10, units = 128, spatial_dr = 0.5, kernel_size1=3, kernel_size2=2, dense_units=64, dr=0.2, conv_size=32)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.32355, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.32355 to 0.31827, saving model to best_model.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.31827 to 0.30671, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss did not improve from 0.30671
Epoch 5/20

Epoch 00005: val_loss improved from 0.30671 to 0.30012, saving model to best_model.hdf5
Epoch 6/20

Epoch 00006: val_loss did not improve from 0.30012
Epoch 7/20

Epoch 00007: val_loss did not improve from 0.30012
Epoch 8/20

Epoch 00008: val_loss did not improve from 0.30012


In [34]:
def build_model2(lr=0.0, lr_d=0.0, units=0, spatial_dr=0.0, kernel_size1=3, kernel_size2=2, dense_units=128, dr=0.1, conv_size=32):
    file_path = "best_model.hdf5"
    check_point = ModelCheckpoint(file_path, monitor = "val_loss", verbose = 1,
                                  save_best_only = True, mode = "min")
    early_stop = EarlyStopping(monitor = "val_loss", mode = "min", patience = 3)

    inp = Input(shape = (max_len,))
    # Take the vocabulary size from the matrix itself instead of hard-coding it.
    x = Embedding(embedding_matrix.shape[0], embed_size, weights = [embedding_matrix], trainable = False)(inp)
    x1 = SpatialDropout1D(spatial_dr)(x)

    x_gru = Bidirectional(CuDNNGRU(units, return_sequences = True))(x1)
    x_lstm = Bidirectional(CuDNNLSTM(units, return_sequences = True))(x1)
    
    x_conv1 = Conv1D(conv_size, kernel_size=kernel_size1, padding='valid', kernel_initializer='he_uniform')(x_gru)
    avg_pool1_gru = GlobalAveragePooling1D()(x_conv1)
    max_pool1_gru = GlobalMaxPooling1D()(x_conv1)
    
    x_conv2 = Conv1D(conv_size, kernel_size=kernel_size2, padding='valid', kernel_initializer='he_uniform')(x_gru)
    avg_pool2_gru = GlobalAveragePooling1D()(x_conv2)
    max_pool2_gru = GlobalMaxPooling1D()(x_conv2)
    
    
    x_conv3 = Conv1D(conv_size, kernel_size=kernel_size1, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool1_lstm = GlobalAveragePooling1D()(x_conv3)
    max_pool1_lstm = GlobalMaxPooling1D()(x_conv3)
    
    x_conv4 = Conv1D(conv_size, kernel_size=kernel_size2, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool2_lstm = GlobalAveragePooling1D()(x_conv4)
    max_pool2_lstm = GlobalMaxPooling1D()(x_conv4)
    
    
    x = concatenate([avg_pool1_gru, max_pool1_gru, avg_pool2_gru, max_pool2_gru,
                    avg_pool1_lstm, max_pool1_lstm, avg_pool2_lstm, max_pool2_lstm])
    x = BatchNormalization()(x)
    x = Dropout(dr)(Dense(dense_units, activation='relu') (x))
    x = BatchNormalization()(x)
    x = Dropout(dr)(Dense(int(dense_units / 2), activation='relu') (x))
    x = Dense(5, activation = "sigmoid")(x)
    model = Model(inputs = inp, outputs = x)
    model.compile(loss = "binary_crossentropy", optimizer = Adam(lr = lr, decay = lr_d), metrics = ["accuracy"])
    history = model.fit(X_train, y_ohe, batch_size = 128, epochs = 20, validation_split=0.1, 
                        verbose = 1, callbacks = [check_point, early_stop])
    model = load_model(file_path)
    return model

In [35]:
model3 = build_model2(lr = 1e-4, lr_d = 0, units = 64, spatial_dr = 0.5, kernel_size1=4, kernel_size2=3, dense_units=32, dr=0.1, conv_size=32)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.40366, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.40366 to 0.34614, saving model to best_model.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.34614 to 0.33270, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.33270 to 0.32570, saving model to best_model.hdf5
Epoch 5/20

Epoch 00005: val_loss improved from 0.32570 to 0.32150, saving model to best_model.hdf5
Epoch 6/20

Epoch 00006: val_loss improved from 0.32150 to 0.32047, saving model to best_model.hdf5
Epoch 7/20

Epoch 00007: val_loss improved from 0.32047 to 0.32020, saving model to best_model.hdf5
Epoch 8/20

Epoch 00008: val_loss improved from 0.32020 to 0.31495, saving model to best_model.hdf5
Epoch 9/20

Epoch 00009: val_loss improved from 0.31495 to 0.31366, saving model to best_model.hdf5
Epoch 10/20

Epoch 00010: val_loss improved from 0.31

In [36]:
model4 = build_model2(lr = 1e-3, lr_d = 0, units = 64, spatial_dr = 0.5, kernel_size1=3, kernel_size2=3, dense_units=64, dr=0.3, conv_size=32)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.32160, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.32160 to 0.31827, saving model to best_model.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.31827 to 0.30794, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.30794 to 0.30549, saving model to best_model.hdf5
Epoch 5/20

Epoch 00005: val_loss improved from 0.30549 to 0.30330, saving model to best_model.hdf5
Epoch 6/20

Epoch 00006: val_loss improved from 0.30330 to 0.30257, saving model to best_model.hdf5
Epoch 7/20

Epoch 00007: val_loss did not improve from 0.30257
Epoch 8/20

Epoch 00008: val_loss improved from 0.30257 to 0.30079, saving model to best_model.hdf5
Epoch 9/20

Epoch 00009: val_loss did not improve from 0.30079
Epoch 10/20

Epoch 00010: val_loss did not improve from 0.30079
Epoch 11/20

Epoch 00011: val_loss improved from 0.30079 to 0.2

In [37]:
model5 = build_model2(lr = 1e-3, lr_d = 1e-7, units = 64, spatial_dr = 0.3, kernel_size1=3, kernel_size2=3, dense_units=64, dr=0.4, conv_size=64)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.31997, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.31997 to 0.31341, saving model to best_model.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.31341 to 0.30669, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.30669 to 0.30274, saving model to best_model.hdf5
Epoch 5/20

Epoch 00005: val_loss did not improve from 0.30274
Epoch 6/20

Epoch 00006: val_loss did not improve from 0.30274
Epoch 7/20

Epoch 00007: val_loss did not improve from 0.30274


In [38]:
pred1 = model1.predict(X_test, batch_size = 1024, verbose = 1)
pred2 = model2.predict(X_test, batch_size = 1024, verbose = 1)
pred3 = model3.predict(X_test, batch_size = 1024, verbose = 1)
pred4 = model4.predict(X_test, batch_size = 1024, verbose = 1)
pred5 = model5.predict(X_test, batch_size = 1024, verbose = 1)
# Sum into a new array; `pred = pred1` followed by `+=` would overwrite pred1 in place.
pred = pred1 + pred2 + pred3 + pred4 + pred5



In [39]:
# argmax already returns integer class indices, so no rounding is needed.
predictions = np.argmax(pred, axis=1).astype(int)
sub['Sentiment'] = predictions
sub.to_csv("blend.csv", index=False)
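
The blend above sums the class scores of all models and takes the argmax per phrase. Summing and averaging pick the same class, since dividing by the number of models does not change the ordering. A minimal sketch with hypothetical scores from three models for two phrases:

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical class scores from three models for two phrases.
model_preds = [
    [[0.1, 0.2, 0.6, 0.3, 0.1], [0.7, 0.4, 0.2, 0.1, 0.1]],
    [[0.1, 0.3, 0.4, 0.5, 0.1], [0.5, 0.6, 0.2, 0.1, 0.1]],
    [[0.2, 0.2, 0.5, 0.4, 0.1], [0.6, 0.3, 0.3, 0.1, 0.1]],
]

# Element-wise sum across models, built into a new list so no input is modified.
summed = [[sum(vals) for vals in zip(*rows)] for rows in zip(*model_preds)]
averaged = [[v / len(model_preds) for v in row] for row in summed]

by_sum = [argmax(r) for r in summed]
by_mean = [argmax(r) for r in averaged]
print(by_sum)
print(by_mean)
```

Note that the second model alone would have picked class 3 for the first phrase; the blend overrules it.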