
## General information

In this kernel I work with data from the Movie Review Sentiment Analysis Playground Competition.

This dataset is interesting for NLP research. Sentences from the original dataset were split into separate phrases, and each phrase has its own sentiment label. Many phrases are also very short, which makes classifying them quite challenging. Let's try!

In [1]:
!pip install lightgbm wordcloud
!pip install -q pydot




In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from nltk.tokenize import TweetTokenizer
import datetime
import lightgbm as lgb
from scipy import stats
from scipy.sparse import hstack, csr_matrix
from sklearn.model_selection import train_test_split, cross_val_score
from wordcloud import WordCloud
from collections import Counter
from nltk.corpus import stopwords
from nltk.util import ngrams
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
pd.set_option('max_colwidth',400)
from google.colab import drive
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [3]:
!wget https://raw.githubusercontent.com/MangoHaha/DeepLearning/master/dataset/train.tsv
!wget https://raw.githubusercontent.com/MangoHaha/DeepLearning/master/dataset/test.tsv
!wget https://raw.githubusercontent.com/MangoHaha/DeepLearning/master/dataset/sampleSubmission.csv

--2018-11-11 23:32:59--  https://raw.githubusercontent.com/MangoHaha/DeepLearning/master/dataset/train.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8481022 (8.1M) [text/plain]
Saving to: ‘train.tsv.3’


2018-11-11 23:32:59 (21.7 MB/s) - ‘train.tsv.3’ saved [8481022/8481022]

--2018-11-11 23:33:00--  https://raw.githubusercontent.com/MangoHaha/DeepLearning/master/dataset/test.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3367149 (3.2M) [text/plain]
Saving to: ‘test.tsv.3’


2018-11-11 23:33:00 (13.5 MB/s) - ‘test.tsv.3’ saved [3

In [0]:

train = pd.read_csv('./train.tsv', sep="\t")
test = pd.read_csv('./test.tsv', sep="\t")
sub = pd.read_csv('./sampleSubmission.csv', sep=",")

In [5]:
train.head(10)

Unnamed: 0,PhraseId,SentenceId,Phrase,Sentiment
0,1,1,"A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story .",1
1,2,1,A series of escapades demonstrating the adage that what is good for the goose,2
2,3,1,A series,2
3,4,1,A,2
4,5,1,series,2
5,6,1,of escapades demonstrating the adage that what is good for the goose,2
6,7,1,of,2
7,8,1,escapades demonstrating the adage that what is good for the goose,2
8,9,1,escapades,2
9,10,1,demonstrating the adage that what is good for the goose,2


In [6]:
train.loc[train.SentenceId == 2]

Unnamed: 0,PhraseId,SentenceId,Phrase,Sentiment
63,64,2,"This quiet , introspective and entertaining independent is worth seeking .",4
64,65,2,"This quiet , introspective and entertaining independent",3
65,66,2,This,2
66,67,2,"quiet , introspective and entertaining independent",4
67,68,2,"quiet , introspective and entertaining",3
68,69,2,quiet,2
69,70,2,", introspective and entertaining",3
70,71,2,introspective and entertaining,3
71,72,2,introspective and,3
72,73,2,introspective,2


In [7]:
print('Average count of phrases per sentence in train is {0:.0f}.'.format(train.groupby('SentenceId')['Phrase'].count().mean()))
print('Average count of phrases per sentence in test is {0:.0f}.'.format(test.groupby('SentenceId')['Phrase'].count().mean()))

Average count of phrases per sentence in train is 18.
Average count of phrases per sentence in test is 20.


In [8]:
print('Number of phrases in train: {}. Number of sentences in train: {}.'.format(train.shape[0], len(train.SentenceId.unique())))
print('Number of phrases in test: {}. Number of sentences in test: {}.'.format(test.shape[0], len(test.SentenceId.unique())))

Number of phrases in train: 156060. Number of sentences in train: 8529.
Number of phrases in test: 66292. Number of sentences in test: 3310.


In [9]:
print('Average word length of phrases in train is {0:.0f}.'.format(np.mean(train['Phrase'].apply(lambda x: len(x.split())))))
print('Average word length of phrases in test is {0:.0f}.'.format(np.mean(test['Phrase'].apply(lambda x: len(x.split())))))

Average word length of phrases in train is 7.
Average word length of phrases in test is 7.


We can see that sentences were split into 18-20 phrases on average and that many phrases contain each other. Sometimes a single word or even a single punctuation mark influences the sentiment.

Let's look, for example, at the most common trigrams in positive phrases.

In [0]:
text = ' '.join(train.loc[train.Sentiment == 4, 'Phrase'].values)
text_trigrams = [i for i in ngrams(text.split(), 3)]

In [11]:
Counter(text_trigrams).most_common(30)

[(('one', 'of', 'the'), 199),
 (('of', 'the', 'year'), 103),
 (('.', 'is', 'a'), 87),
 (('of', 'the', 'best'), 80),
 (('of', 'the', 'most'), 70),
 (('is', 'one', 'of'), 50),
 (('One', 'of', 'the'), 43),
 ((',', 'and', 'the'), 40),
 (('the', 'year', "'s"), 38),
 (('It', "'s", 'a'), 38),
 (('it', "'s", 'a'), 37),
 (('.', "'s", 'a'), 37),
 (('a', 'movie', 'that'), 35),
 (('the', 'edge', 'of'), 34),
 (('the', 'kind', 'of'), 33),
 (('of', 'your', 'seat'), 33),
 (('the', 'film', 'is'), 31),
 ((',', 'this', 'is'), 31),
 (('the', 'film', "'s"), 31),
 ((',', 'the', 'film'), 30),
 (('film', 'that', 'is'), 30),
 (('as', 'one', 'of'), 30),
 (('edge', 'of', 'your'), 29),
 ((',', 'it', "'s"), 27),
 (('a', 'film', 'that'), 27),
 (('as', 'well', 'as'), 27),
 ((',', 'funny', ','), 25),
 ((',', 'but', 'it'), 23),
 (('films', 'of', 'the'), 23),
 (('some', 'of', 'the'), 23)]
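
One thing to keep in mind when reading these counts: the phrases are joined into a single string before the trigrams are built, so some trigrams can span the boundary between two neighbouring phrases, which may explain entries that start with a period. Below is a minimal sketch, on made-up phrases, that compares joined counting with per-phrase counting. The `trigrams` helper is a hypothetical stand-in for `nltk.util.ngrams(tokens, 3)`.

```python
from collections import Counter

def trigrams(tokens):
    """Return consecutive token triples, like nltk.util.ngrams(tokens, 3)."""
    return list(zip(tokens, tokens[1:], tokens[2:]))

# Toy phrases (made up for illustration, not taken from the dataset).
phrases = ["one of the best .", "is a gem", "one of the"]

# Joined text: trigrams can straddle two neighbouring phrases.
joined_counts = Counter(trigrams(" ".join(phrases).split()))

# Per-phrase counting: every trigram stays inside a single phrase.
per_phrase_counts = Counter()
for phrase in phrases:
    per_phrase_counts.update(trigrams(phrase.split()))

print(joined_counts[(".", "is", "a")])          # boundary artefact is counted
print(per_phrase_counts[(".", "is", "a")])      # not counted per phrase
print(per_phrase_counts[("one", "of", "the")])  # repeated phrase content
```

Counting per phrase removes the boundary artefacts but still counts nested phrases repeatedly, so it does not change the overall picture.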

In [12]:
text = ' '.join(train.loc[train.Sentiment == 4, 'Phrase'].values)
stop_words = set(stopwords.words('english'))  # build the lookup set once, not once per token
text = [i for i in text.split() if i not in stop_words]
text_trigrams = [i for i in ngrams(text, 3)]
Counter(text_trigrams).most_common(30)

[((',', 'funny', ','), 33),
 (('one', 'year', "'s"), 28),
 (('year', "'s", 'best'), 26),
 (('movies', 'ever', 'made'), 19),
 ((',', 'solid', 'cast'), 19),
 (('solid', 'cast', ','), 18),
 (("'ve", 'ever', 'seen'), 16),
 (('.', 'It', "'s"), 16),
 ((',', 'making', 'one'), 15),
 (('best', 'films', 'year'), 15),
 ((',', 'touching', ','), 15),
 (('exquisite', 'acting', ','), 15),
 (('acting', ',', 'inventive'), 14),
 ((',', 'inventive', 'screenplay'), 14),
 (('jaw-dropping', 'action', 'sequences'), 14),
 (('good', 'acting', ','), 14),
 (("'s", 'best', 'films'), 14),
 (('I', "'ve", 'seen'), 14),
 (('funny', ',', 'even'), 14),
 (('best', 'war', 'movies'), 13),
 (('purely', 'enjoyable', 'satisfying'), 13),
 (('funny', ',', 'touching'), 13),
 ((',', 'smart', ','), 13),
 (('inventive', 'screenplay', ','), 13),
 (('funniest', 'jokes', 'movie'), 13),
 (('action', 'sequences', ','), 13),
 (('sequences', ',', 'striking'), 13),
 ((',', 'striking', 'villains'), 13),
 (('exquisite', 'motion', 'picture')

The results show the main problem with this dataset: the same common words recur many times because sentences were split into overlapping phrases. As a result, stopwords shouldn't be removed from the text.
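
A small sketch of why stopword removal is risky here. It assumes a tiny hand-written stopword set (the notebook itself uses NLTK's English list) and lowercases tokens before the lookup, which the cell above does not do. The phrases are taken from the `train.head(10)` output.

```python
# Tiny illustrative stopword set; the notebook uses nltk's English list instead.
STOPWORDS = {"a", "of", "the", "this", "is", "and"}

def remove_stopwords(phrase):
    """Drop stopwords from a whitespace-tokenised phrase."""
    return [tok for tok in phrase.split() if tok.lower() not in STOPWORDS]

# Phrases copied from the train.head(10) output.
phrases = ["A series", "A", "series", "of"]

filtered = {p: remove_stopwords(p) for p in phrases}
for phrase, tokens in filtered.items():
    print(repr(phrase), "->", tokens)

# Phrases that lose all of their tokens carry no signal for a model.
empty = [p for p, toks in filtered.items() if not toks]
print("empty after filtering:", empty)
```

Under these assumptions, single-stopword phrases become empty, and distinct phrases such as "A series" and "series" collapse into the same input.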

### Thoughts on feature processing and engineering

So, we have only phrases as data. A phrase can consist of a single word, and one punctuation mark can cause a phrase to receive a different sentiment. The assigned sentiments can also look strange. This means several things:
- removing stopwords can be a bad idea, especially when a phrase consists of a single stopword;
- punctuation could be important, so it should be kept;
- n-grams are necessary to get the most information from the data;
- features such as word count or sentence length won't be useful.
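
The points above can be sketched with a simplified feature extractor. This is only an illustration, not the pipeline used below: the regex tokenizer is a hypothetical stand-in for `TweetTokenizer`, and the features are raw unigram and bigram counts rather than TF-IDF weights.

```python
import re
from collections import Counter

def tokenize(text):
    """Split into lowercase word tokens and single punctuation marks."""
    return re.findall(r"[\w'-]+|[^\w\s]", text.lower())

def ngram_features(text, max_n=2):
    """Count all n-grams of length 1..max_n, punctuation included."""
    tokens = tokenize(text)
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

a = ngram_features("not funny")
b = ngram_features("funny !")

print(sorted(a))  # unigrams plus the bigram "not funny"
print(sorted(b))  # "!" survives as its own feature
```

The two phrases share only the unigram "funny"; the bigram and the punctuation mark are what separate them.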

In [0]:
tokenizer = TweetTokenizer()

In [0]:
vectorizer = TfidfVectorizer(ngram_range=(1, 2), tokenizer=tokenizer.tokenize)
full_text = list(train['Phrase'].values) + list(test['Phrase'].values)
vectorizer.fit(full_text)
train_vectorized = vectorizer.transform(train['Phrase'])
test_vectorized = vectorizer.transform(test['Phrase'])

In [0]:
y = train['Sentiment']

In [0]:
logreg = LogisticRegression()
ovr = OneVsRestClassifier(logreg)

In [17]:
%%time
ovr.fit(train_vectorized, y)

CPU times: user 14 s, sys: 55.6 ms, total: 14.1 s
Wall time: 14.1 s


OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False),
          n_jobs=1)

In [18]:
scores = cross_val_score(ovr, train_vectorized, y, scoring='accuracy', n_jobs=-1, cv=3)
print('Cross-validation mean accuracy {0:.2f}%, std {1:.2f}.'.format(np.mean(scores) * 100, np.std(scores) * 100))

Cross-validation mean accuracy 56.55%, std 0.07.


In [19]:
%%time
svc = LinearSVC(dual=False)
scores = cross_val_score(svc, train_vectorized, y, scoring='accuracy', n_jobs=-1, cv=3)
print('Cross-validation mean accuracy {0:.2f}%, std {1:.2f}.'.format(np.mean(scores) * 100, np.std(scores) * 100))

Cross-validation mean accuracy 56.51%, std 0.68.
CPU times: user 351 ms, sys: 108 ms, total: 459 ms
Wall time: 37.7 s


In [0]:
ovr.fit(train_vectorized, y);
svc.fit(train_vectorized, y);

## Deep learning
And now let's try deep learning (DL). A multi-layer neural network should work better for text classification than the linear models above. I use an architecture similar to those used in the toxic comment competition.

In [21]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Input, LSTM, Embedding, Dropout, Activation, Conv1D, GRU, CuDNNGRU, CuDNNLSTM, BatchNormalization
from keras.layers import Bidirectional, GlobalMaxPool1D, MaxPooling1D, Add, Flatten
from keras.layers import GlobalAveragePooling1D, GlobalMaxPooling1D, concatenate, SpatialDropout1D
from keras.models import Model, load_model
from keras import initializers, regularizers, constraints, optimizers, layers, callbacks
from keras import backend as K
from keras.engine import InputSpec, Layer
from keras.optimizers import Adam

from keras.callbacks import ModelCheckpoint, TensorBoard, Callback, EarlyStopping

Using TensorFlow backend.


In [22]:
from google.colab import drive

# Mount Google Drive, where the pretrained embedding file is stored.
drive.mount('/content/drive')

!ls "/content/drive/My Drive"

import nltk
nltk.download('stopwords')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
'alaska pics'			        life
 Citi				        Photos
'Colab Notebooks'		       'SICO RESEARCH DATABASE.gsheet'
 crawl-300d-2M.vec		        tensor_tutorial.ipynb
 Cryptos			        test2.mov
'Deep Learning Project Proposal.gdoc'   unknown
'GSPGC (1).ipynb'		        Untitled0.ipynb
 HW1-zc2243-lc2928.zip		       'Untitled document.gdoc'
 imdb.npz			       'Ziyun Chen 0909-1.pdf'
'imdb sentiment lstm.ipynb'	        视频.MOV
'Jinyang.Li  0909.pdf'
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [0]:
#!ls "/content/drive/My Drive"

In [0]:
tk = Tokenizer(lower = True, filters='')
tk.fit_on_texts(full_text)

In [0]:
train_tokenized = tk.texts_to_sequences(train['Phrase'])
test_tokenized = tk.texts_to_sequences(test['Phrase'])

In [26]:
train_tokenized[1:10]
train[1:10]

Unnamed: 0,PhraseId,SentenceId,Phrase,Sentiment
1,2,1,A series of escapades demonstrating the adage that what is good for the goose,2
2,3,1,A series,2
3,4,1,A,2
4,5,1,series,2
5,6,1,of escapades demonstrating the adage that what is good for the goose,2
6,7,1,of,2
7,8,1,escapades demonstrating the adage that what is good for the goose,2
8,9,1,escapades,2
9,10,1,demonstrating the adage that what is good for the goose,2


In [27]:
max_len = 50
X_train = pad_sequences(train_tokenized, maxlen = max_len)
X_test = pad_sequences(test_tokenized, maxlen = max_len)

X_train[1:10]

array([[    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            3,   308,     4, 18031,  7598,     1,  8322,    11,    55,
           10,    51,    15,     1,  4660],
       [    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     3,   308],
       [    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,    
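
The zeros on the left come from the padding step. With its default settings, `pad_sequences` pads at the front and, for sequences longer than `maxlen`, keeps the last `maxlen` items. A minimal pure-Python sketch of that behaviour follows; `pad_pre` is a hypothetical helper written for illustration, not part of Keras.

```python
def pad_pre(sequence, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items if too long."""
    trimmed = sequence[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed

# Token ids for "A series", as they appear in the padded output above.
short = [3, 308]
padded = pad_pre(short, maxlen=6)

# A sequence longer than maxlen loses its first items.
truncated = pad_pre(list(range(1, 9)), maxlen=6)

print(padded)
print(truncated)
```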

In [0]:
embedding_path = "/content/drive/My Drive/crawl-300d-2M.vec"

In [0]:
embed_size = 300
max_features = 30000

In [0]:
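def get_coefs(word, *arr):
    # Split one line of the .vec file into the word and its float32 vector.
    return word, np.asarray(arr, dtype='float32')

# Read the pretrained vectors; the context manager closes the file afterwards.
with open(embedding_path, encoding='utf-8') as f:
    embedding_index = dict(get_coefs(*o.strip().split(" ")) for o in f)

word_index = tk.word_index
nb_words = min(max_features, len(word_index))
# Row i holds the vector of the word with index i; row 0 (padding) and
# words missing from the pretrained vectors stay all-zero.
embedding_matrix = np.zeros((nb_words + 1, embed_size))
for word, i in word_index.items():
    if i >= max_features:
        continue
    embedding_vector = embedding_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector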
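
To make the lookup logic easier to follow, here is a toy version of the same idea that also reports how much of the vocabulary is covered. Everything in it is made up for illustration: the three-dimensional vectors, the in-memory "file", and the small `word_index`. It assumes the `.vec` layout starts with a header line, which this sketch skips.

```python
import io

# Made-up 3-dimensional vectors in a ".vec"-style text layout: a header line
# ("<count> <dim>") followed by one "word v1 v2 ..." line per word.
fake_vec_file = io.StringIO(
    "3 3\n"
    "the 0.1 0.2 0.3\n"
    "film 0.4 0.5 0.6\n"
    ", 0.7 0.8 0.9\n"
)

def load_vectors(handle):
    """Parse a .vec-style stream into {word: [floats]}, skipping the header."""
    next(handle)  # header line
    vectors = {}
    for line in handle:
        word, *values = line.rstrip().split(" ")
        vectors[word] = [float(v) for v in values]
    return vectors

vectors = load_vectors(fake_vec_file)

# Hypothetical tokenizer vocabulary; index 0 is reserved for padding.
word_index = {"the": 1, "film": 2, "escapades": 3}
dim = 3

# Row i holds the vector of the word with index i; unknown words stay zero.
matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
found = 0
for word, i in word_index.items():
    if word in vectors:
        matrix[i] = vectors[word]
        found += 1

coverage = found / len(word_index)
print(f"coverage: {coverage:.0%}")
```

Checking coverage this way is a cheap sanity test: a low value would suggest that the tokenizer and the pretrained vocabulary disagree, for example on casing or punctuation.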
      

In [31]:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(sparse=False)
y_ohe = ohe.fit_transform(y.values.reshape(-1, 1))
y_ohe[1:10]

embedding_matrix[1:10]


array([[ 0.0231    ,  0.017     ,  0.0157    , ...,  0.0744    ,
        -0.1118    ,  0.0963    ],
       [-0.0282    , -0.0557    , -0.0451    , ..., -0.037     ,
        -0.0725    , -0.0042    ],
       [ 0.0064    ,  0.0333    ,  0.0225    , ..., -0.0825    ,
         0.0519    , -0.0796    ],
       ...,
       [-0.0849    , -0.0582    , -0.0321    , ...,  0.0032    ,
        -0.0237    , -0.0366    ],
       [-0.0347    , -0.028     ,  0.0704    , ..., -0.0445    ,
         0.037     , -0.1166    ],
       [-0.0037    ,  0.15099999,  0.0568    , ..., -0.0618    ,
         0.1585    , -0.0466    ]])
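
The same cell also one-hot encodes the labels into `y_ohe`, which the networks below use as targets. A minimal sketch of what that encoding does, using made-up labels and a hypothetical `one_hot` helper rather than scikit-learn:

```python
def one_hot(labels):
    """Encode integer labels as rows with a single 1.0 in the class's column."""
    classes = sorted(set(labels))
    column = {c: j for j, c in enumerate(classes)}
    return [
        [1.0 if column[y] == j else 0.0 for j in range(len(classes))]
        for y in labels
    ]

sentiments = [1, 2, 2, 4, 0, 3]  # made-up labels covering five classes
encoded = one_hot(sentiments)

for label, row in zip(sentiments, encoded):
    print(label, row)
```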

In [0]:
from keras.utils import plot_model

In [0]:
def build_model1(lr=0.0, lr_d=0.0, units=0, spatial_dr=0.0, kernel_size1=3, kernel_size2=2, dense_units=128, dr=0.1, conv_size=32):
    file_path = "best_model.hdf5"
    check_point = ModelCheckpoint(file_path, monitor = "val_loss", verbose = 1,
                                  save_best_only = True, mode = "min")
    early_stop = EarlyStopping(monitor = "val_loss", mode = "min", patience = 3)
    
    inp = Input(shape = (max_len,))
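    # Take the vocabulary size from the embedding matrix instead of hard-coding it.
    x = Embedding(embedding_matrix.shape[0], embed_size, weights = [embedding_matrix], trainable = False)(inp)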
    x1 = SpatialDropout1D(spatial_dr)(x)

    x_gru = Bidirectional(CuDNNGRU(units, return_sequences = True))(x1)
    x1 = Conv1D(conv_size, kernel_size=kernel_size1, padding='valid', kernel_initializer='he_uniform')(x_gru)
    avg_pool1_gru = GlobalAveragePooling1D()(x1)
    max_pool1_gru = GlobalMaxPooling1D()(x1)
    
    x3 = Conv1D(conv_size, kernel_size=kernel_size2, padding='valid', kernel_initializer='he_uniform')(x_gru)
    avg_pool3_gru = GlobalAveragePooling1D()(x3)
    max_pool3_gru = GlobalMaxPooling1D()(x3)
    
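    # Note: x1 was reassigned above to the Conv1D output of the GRU branch, so this LSTM
    # is stacked on the GRU features; build_model2 feeds both RNNs from the embedding instead.
    x_lstm = Bidirectional(CuDNNLSTM(units, return_sequences = True))(x1)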
    x1 = Conv1D(conv_size, kernel_size=kernel_size1, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool1_lstm = GlobalAveragePooling1D()(x1)
    max_pool1_lstm = GlobalMaxPooling1D()(x1)
    
    x3 = Conv1D(conv_size, kernel_size=kernel_size2, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool3_lstm = GlobalAveragePooling1D()(x3)
    max_pool3_lstm = GlobalMaxPooling1D()(x3)
    
    
    x = concatenate([avg_pool1_gru, max_pool1_gru, avg_pool3_gru, max_pool3_gru,
                    avg_pool1_lstm, max_pool1_lstm, avg_pool3_lstm, max_pool3_lstm])
    x = BatchNormalization()(x)
    x = Dropout(dr)(Dense(dense_units, activation='relu') (x))
    x = BatchNormalization()(x)
    x = Dropout(dr)(Dense(int(dense_units / 2), activation='relu') (x))
    x = Dense(5, activation = "sigmoid")(x)
    model = Model(inputs = inp, outputs = x)
    model.compile(loss = "binary_crossentropy", optimizer = Adam(lr = lr, decay = lr_d), metrics = ["accuracy"])
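    try:
        # The diagram is optional; plot_model needs pydot and Graphviz to be installed.
        plot_model(model, to_file='model_summary.png', show_shapes=True, show_layer_names=True)
    except (ImportError, OSError) as err:
        print('Skipping model plot:', err)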
    history = model.fit(X_train, y_ohe, batch_size = 128, epochs = 20, validation_split=0.1, 
                        verbose = 1, callbacks = [check_point, early_stop])
    model = load_model(file_path)
    return model

* The sigmoid function takes a real-valued number and squashes it into the range between 0 and 1. It was used frequently in the past because of its nice interpretation as the firing rate of a neuron: 0 for not firing and 1 for firing. The sigmoid non-linearity has since fallen out of favour because its activations easily saturate at either tail, near 0 or 1, where gradients are almost zero and information flow is cut off. In addition, its output is not zero-centered, which can introduce undesirable zig-zagging dynamics in the gradient updates of the connection weights during training. The tanh function is therefore often preferred in practice, as its output range is zero-centered: [-1, 1] instead of [0, 1]. The ReLU function has also become popular. Its activation is simply thresholded at zero when the input is less than 0. Compared with sigmoid and tanh, ReLU is cheap to compute, converges quickly in training, and often yields equal or better performance in neural networks.

Generally, softmax is used in the final layer of a neural network for single-label classification, because its outputs form a probability distribution over the classes. The models here instead use five sigmoid outputs trained with binary cross-entropy.
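
To make the comparison concrete, here is a small sketch of these activation functions in plain Python. The logits are made-up numbers standing in for the five class scores; the point is only that sigmoid scores each class independently, while softmax normalises the scores so that they sum to 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def softmax(zs):
    """Numerically stable softmax over a list of logits."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0, -1.0, -2.0]  # made-up scores for the 5 classes

sig = [sigmoid(z) for z in logits]
soft = softmax(logits)

print("sigmoid:", [round(p, 3) for p in sig], "sum =", round(sum(sig), 3))
print("softmax:", [round(p, 3) for p in soft], "sum =", round(sum(soft), 3))
print("tanh(0) =", math.tanh(0.0), "| sigmoid(0) =", sigmoid(0.0), "| relu(-1) =", relu(-1.0))
```

Both output layers rank the classes in the same order for a single example, so taking the argmax of sigmoid outputs still yields a class label; the difference lies in how the loss treats the five outputs during training.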

An attempt at ensemble:

In [34]:
!pip install pydot
!pip install graphviz
!pip install pydotplus



In [35]:
model1 = build_model1(lr = 1e-3, lr_d = 1e-10, units = 64, spatial_dr = 0.3, kernel_size1=3, kernel_size2=2, dense_units=32, dr=0.1, conv_size=32)

OSError: ignored

In [0]:
model2 = build_model1(lr = 1e-3, lr_d = 1e-10, units = 128, spatial_dr = 0.5, kernel_size1=3, kernel_size2=2, dense_units=64, dr=0.2, conv_size=32)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.33809, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.33809 to 0.31310, saving model to best_model.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.31310 to 0.30696, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.30696 to 0.30484, saving model to best_model.hdf5
Epoch 5/20

Epoch 00005: val_loss improved from 0.30484 to 0.30123, saving model to best_model.hdf5
Epoch 6/20

Epoch 00006: val_loss improved from 0.30123 to 0.30099, saving model to best_model.hdf5
Epoch 7/20

Epoch 00007: val_loss did not improve from 0.30099
Epoch 8/20

Epoch 00008: val_loss improved from 0.30099 to 0.29951, saving model to best_model.hdf5
Epoch 9/20

Epoch 00009: val_loss did not improve from 0.29951
Epoch 10/20

Epoch 00010: val_loss did not improve from 0.29951
Epoch 11/20

Epoch 00011: val_loss improved from 0.29951 to 0.2

In [0]:
def build_model2(lr=0.0, lr_d=0.0, units=0, spatial_dr=0.0, kernel_size1=3, kernel_size2=2, dense_units=128, dr=0.1, conv_size=32):
    file_path = "best_model.hdf5"
    check_point = ModelCheckpoint(file_path, monitor = "val_loss", verbose = 1,
                                  save_best_only = True, mode = "min")
    early_stop = EarlyStopping(monitor = "val_loss", mode = "min", patience = 3)

    inp = Input(shape = (max_len,))
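    # Take the vocabulary size from the embedding matrix instead of hard-coding it.
    x = Embedding(embedding_matrix.shape[0], embed_size, weights = [embedding_matrix], trainable = False)(inp)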
    x1 = SpatialDropout1D(spatial_dr)(x)

    x_gru = Bidirectional(CuDNNGRU(units, return_sequences = True))(x1)
    x_lstm = Bidirectional(CuDNNLSTM(units, return_sequences = True))(x1)
    
    x_conv1 = Conv1D(conv_size, kernel_size=kernel_size1, padding='valid', kernel_initializer='he_uniform')(x_gru)
    avg_pool1_gru = GlobalAveragePooling1D()(x_conv1)
    max_pool1_gru = GlobalMaxPooling1D()(x_conv1)
    
    x_conv2 = Conv1D(conv_size, kernel_size=kernel_size2, padding='valid', kernel_initializer='he_uniform')(x_gru)
    avg_pool2_gru = GlobalAveragePooling1D()(x_conv2)
    max_pool2_gru = GlobalMaxPooling1D()(x_conv2)
    
    
    x_conv3 = Conv1D(conv_size, kernel_size=kernel_size1, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool1_lstm = GlobalAveragePooling1D()(x_conv3)
    max_pool1_lstm = GlobalMaxPooling1D()(x_conv3)
    
    x_conv4 = Conv1D(conv_size, kernel_size=kernel_size2, padding='valid', kernel_initializer='he_uniform')(x_lstm)
    avg_pool2_lstm = GlobalAveragePooling1D()(x_conv4)
    max_pool2_lstm = GlobalMaxPooling1D()(x_conv4)
    
    
    x = concatenate([avg_pool1_gru, max_pool1_gru, avg_pool2_gru, max_pool2_gru,
                    avg_pool1_lstm, max_pool1_lstm, avg_pool2_lstm, max_pool2_lstm])
    x = BatchNormalization()(x)
    x = Dropout(dr)(Dense(dense_units, activation='relu') (x))
    x = BatchNormalization()(x)
    x = Dropout(dr)(Dense(int(dense_units / 2), activation='relu') (x))
    x = Dense(5, activation = "sigmoid")(x)
    model = Model(inputs = inp, outputs = x)
    model.compile(loss = "binary_crossentropy", optimizer = Adam(lr = lr, decay = lr_d), metrics = ["accuracy"])
    

    history = model.fit(X_train, y_ohe, batch_size = 128, epochs = 20, validation_split=0.1, 
                        verbose = 1, callbacks = [check_point, early_stop])
    model = load_model(file_path)
    return model

In [0]:
model3 = build_model2(lr = 1e-4, lr_d = 0, units = 64, spatial_dr = 0.5, kernel_size1=4, kernel_size2=3, dense_units=32, dr=0.1, conv_size=32)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.41616, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.41616 to 0.34972, saving model to best_model.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.34972 to 0.33182, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.33182 to 0.32406, saving model to best_model.hdf5
Epoch 5/20

Epoch 00005: val_loss improved from 0.32406 to 0.31934, saving model to best_model.hdf5
Epoch 6/20

Epoch 00006: val_loss improved from 0.31934 to 0.31742, saving model to best_model.hdf5
Epoch 7/20

Epoch 00007: val_loss improved from 0.31742 to 0.31662, saving model to best_model.hdf5
Epoch 8/20

Epoch 00008: val_loss did not improve from 0.31662
Epoch 9/20

Epoch 00009: val_loss improved from 0.31662 to 0.31277, saving model to best_model.hdf5
Epoch 10/20

Epoch 00010: val_loss improved from 0.31277 to 0.31129, saving model to best_

In [0]:
model4 = build_model2(lr = 1e-3, lr_d = 0, units = 64, spatial_dr = 0.5, kernel_size1=3, kernel_size2=3, dense_units=64, dr=0.3, conv_size=32)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.32462, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.32462 to 0.31660, saving model to best_model.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.31660 to 0.30994, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.30994 to 0.30447, saving model to best_model.hdf5
Epoch 5/20

Epoch 00005: val_loss improved from 0.30447 to 0.30217, saving model to best_model.hdf5
Epoch 6/20

Epoch 00006: val_loss improved from 0.30217 to 0.30094, saving model to best_model.hdf5
Epoch 7/20

Epoch 00007: val_loss did not improve from 0.30094
Epoch 8/20

Epoch 00008: val_loss did not improve from 0.30094
Epoch 9/20

Epoch 00009: val_loss did not improve from 0.30094
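
This run ends after epoch 9 because of the `EarlyStopping` callback with `patience = 3`: the best `val_loss` was reached at epoch 6, and three epochs in a row failed to improve on it. The sketch below reproduces that stopping rule in plain Python. The first six losses are the values logged above; the log does not show the values for epochs 7 to 9, so the remaining numbers are placeholders.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-based, for a min-mode monitor."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Epochs 1-6: logged val_loss values. Epochs 7-10: placeholder values,
# chosen only so that epochs 7-9 do not improve on epoch 6.
losses = [0.32462, 0.31660, 0.30994, 0.30447, 0.30217, 0.30094,
          0.3020, 0.3015, 0.3030, 0.3000]

stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print("stopped after epoch", stop_epoch, "| best epoch", best_epoch)
```

Since `ModelCheckpoint` saves only the best epoch and the function reloads that file, the returned model corresponds to epoch 6 rather than to the last epoch trained.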


In [0]:
model5 = build_model2(lr = 1e-3, lr_d = 1e-7, units = 64, spatial_dr = 0.3, kernel_size1=3, kernel_size2=3, dense_units=64, dr=0.4, conv_size=64)

Train on 140454 samples, validate on 15606 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.32102, saving model to best_model.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.32102 to 0.31526, saving model to best_model.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.31526 to 0.30948, saving model to best_model.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.30948 to 0.30160, saving model to best_model.hdf5
Epoch 5/20

Epoch 00005: val_loss did not improve from 0.30160
Epoch 6/20

Epoch 00006: val_loss improved from 0.30160 to 0.30151, saving model to best_model.hdf5
Epoch 7/20

Epoch 00007: val_loss did not improve from 0.30151
Epoch 8/20

Epoch 00008: val_loss improved from 0.30151 to 0.30019, saving model to best_model.hdf5
Epoch 9/20

Epoch 00009: val_loss did not improve from 0.30019
Epoch 10/20

Epoch 00010: val_loss did not improve from 0.30019
Epoch 11/20

Epoch 00011: val_loss did not improve from 0.30019


In [0]:
pred1 = model1.predict(X_test, batch_size = 1024, verbose = 1)
pred2 = model2.predict(X_test, batch_size = 1024, verbose = 1)
pred3 = model3.predict(X_test, batch_size = 1024, verbose = 1)
pred4 = model4.predict(X_test, batch_size = 1024, verbose = 1)
pred5 = model5.predict(X_test, batch_size = 1024, verbose = 1)
# Sum the class scores of the five models into a new array so that the
# individual predictions stay intact.
pred = pred1 + pred2 + pred3 + pred4 + pred5



In [0]:
# Pick the class with the highest summed score for each phrase.
predictions = np.argmax(pred, axis=1).astype(int)
sub['Sentiment'] = predictions
sub.to_csv("blend.csv", index=False)
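
The blending step can be illustrated on a toy example. The scores below are made up for two hypothetical models, two phrases, and five classes; `blend` and `argmax` are small helpers written for this sketch. Averaging and summing give the same argmax, so either can be used.

```python
def blend(prediction_sets):
    """Average per-class scores across models without modifying the inputs."""
    n_models = len(prediction_sets)
    return [
        [sum(scores) / n_models for scores in zip(*rows)]
        for rows in zip(*prediction_sets)
    ]

def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up scores from two hypothetical models for two phrases and 5 classes.
model_a = [[0.1, 0.2, 0.6, 0.4, 0.1], [0.0, 0.1, 0.3, 0.5, 0.4]]
model_b = [[0.1, 0.5, 0.4, 0.1, 0.0], [0.0, 0.0, 0.2, 0.4, 0.7]]

blended = blend([model_a, model_b])
labels = [argmax(row) for row in blended]

print("model_a alone:", [argmax(r) for r in model_a])
print("model_b alone:", [argmax(r) for r in model_b])
print("blended:      ", labels)
```

In this toy case the two models disagree on both phrases, and the blend sides with whichever model is more confident on each one.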