# Binary classification

If `y` is one-dimensional, e.g. `[1, 0]`, each entry is the label of one sample: here the first sample is positive and the second is negative. In this case the activation function in the output layer should be sigmoid. If we use softmax with a single output neuron, the loss becomes very large and does not converge.
> (1) show the loss, softmax with one output neuron

This happens because softmax assigns a probability to each class, and the probabilities over all classes sum to one. With only one neuron in the output layer there is only one class to normalize over, so the softmax output is always 1, whatever the input is.
> (2) show softmax output, all equal to 1
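
The effect can be reproduced without Keras. The following is a minimal NumPy sketch, not the notebook's model: `softmax` and `single_logit` are illustrative names, and the logit values are made up.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the last axis (shifted for numerical stability)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Three samples with ONE output neuron each: shape (3, 1).
single_logit = np.array([[-5.0], [0.0], [12.3]])
single_probs = softmax(single_logit)

# Each row is normalized over a single value, so every entry is 1.0.
print(single_probs)
```

Because each row contains exactly one value, dividing by the row sum always yields 1, regardless of the logit.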

With the same single output neuron, sigmoid works as expected, because it maps each logit to a probability between 0 and 1 independently.
> (3) show the loss, sigmoid with one output neuron
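
To see why a constant output of 1 is so costly under binary cross-entropy, here is a small NumPy sketch. The helper names, the toy labels and logits, and the clipping constant `eps=1e-7` are assumptions for illustration; Keras applies its own clipping internally, so the exact numbers will differ.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps is an assumed clipping constant."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(np.mean(losses))

# Toy balanced labels and made-up logits.
y_true = np.array([1.0, 0.0, 1.0, 0.0])
logits = np.array([2.0, -1.0, 0.5, 0.3])

# Sigmoid gives a different probability per sample.
loss_sigmoid = binary_crossentropy(y_true, sigmoid(logits))

# A one-neuron softmax always predicts 1, so every negative sample
# is penalized with about -log(eps).
loss_constant_one = binary_crossentropy(y_true, np.ones_like(y_true))

print(loss_sigmoid, loss_constant_one)
```

With balanced labels, half of the samples incur the maximum clipped penalty, which puts the mean loss of the constant prediction far above the sigmoid loss. It also cannot improve, since the output does not depend on the weights.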

If `y` is two-dimensional (one-hot encoded), e.g. `[[0, 1], [1, 0]]`, then `[0, 1]` marks a positive sample and `[1, 0]` marks a negative sample. In this case the output layer has 2 neurons, so softmax can be used and the loss behaves normally.
> (4) show the loss, softmax with two output neurons
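
With two output neurons, softmax is equivalent to a sigmoid applied to the difference of the two logits, which is why both setups can learn the same task. The sketch below checks this numerically; the function names and logit values are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the last axis (shifted for numerical stability)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

# Three samples with TWO output neurons each: shape (3, 2).
two_logits = np.array([[0.2, 1.5], [3.0, -1.0], [0.0, 0.0]])
probs = softmax(two_logits)

# Column 1 is the positive class in the [0, 1] encoding.
p_positive = probs[:, 1]

# The same probability from a sigmoid on the logit difference.
p_from_sigmoid = sigmoid(two_logits[:, 1] - two_logits[:, 0])

print(p_positive)
print(p_from_sigmoid)
```

Each row of `probs` still sums to 1, but the two entries can now vary with the input.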

In [1]:
import sys, os
sys.path.append(os.pardir)
import numpy as np
import data_helpers
from word2vec import train_word2vec

In [40]:
# preprocess 

positive_data_file = "../data/rt-polaritydata/rt-polarity.pos"
negtive_data_file = "../data/rt-polaritydata/rt-polarity.neg"

# Load data
print("Loading data...")
x_text, y = data_helpers.load_data_and_labels(positive_data_file, negtive_data_file)

# Pad sentence
print("Padding sentences...")
x_text = data_helpers.pad_sentences(x_text)
print("The sequence length is: ", len(x_text[0]))

# Build vocabulary
vocabulary, vocabulary_inv = data_helpers.build_vocab(x_text)

# Represent each sentence as a sequence of word indices
x = data_helpers.build_index_sentence(x_text, vocabulary)
y = y.argmax(axis=1) # y: [1, 1, 1, ...., 0, 0, 0]. 1 for positive, 0 for negative

# Shuffle data
np.random.seed(42)
shuffle_indices = np.random.permutation(np.arange(len(y)))
x_shuffled = x[shuffle_indices]
y_shuffled = y[shuffle_indices]

# Split train and test
training_rate = 0.9
train_len = int(len(y) * training_rate)
x_train = x_shuffled[:train_len]
y_train = y_shuffled[:train_len]
x_test = x_shuffled[train_len:]
y_test = y_shuffled[train_len:]

# Output shape
print('x_train shape: ', x_train.shape)
print('x_test shape:', x_test.shape)
print('Vocabulary Size: {:d}'.format(len(vocabulary_inv)))


Loading data...
Padding sentences...
The sequence length is:  56
x_train shape:  (9595, 56)
x_test shape: (1067, 56)
Vocabulary Size: 18765


In [3]:
# Word2Vec parameters (see train_word2vec)
embedding_dim = 50
min_word_count = 1
context = 10

# Prepare embedding layer weights for the non-static model
embedding_weights = train_word2vec(np.vstack((x_train, x_test)), vocabulary_inv, num_features=embedding_dim,
                                   min_word_count=min_word_count, context=context)

print(embedding_weights[565]) # 565 is the index of the word "rock"

Load existing Word2Vec model '50feature_1minwords_10context'
[ 0.116417   -0.08654545 -0.2803645   0.3124824  -0.09874617 -0.01292455
  0.5993319   0.37595585 -0.19778901  0.3446241  -0.267074   -0.46938565
  0.06542085  0.13901834 -0.23808217 -0.26174366 -0.24359785 -0.2912121
 -0.31934163 -0.5253044   0.14535798  0.13424076 -0.34564495  0.11147276
 -0.39667356  0.153825    0.00460984 -0.05314974  0.03791095 -0.26470512
  0.5289599   0.64035916  0.23924957 -0.11943628  0.16350754 -0.3181289
  0.60478944  0.12693582 -0.014964    0.16417578  0.07708839  0.14353544
  0.17967053  0.4206976   0.18995559 -0.51884794  0.17004833 -0.14022829
  0.27810973  0.10528004]


In [4]:
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Flatten, Input, MaxPooling1D, GlobalMaxPooling1D, Conv1D, Embedding
from keras.layers.merge import Concatenate
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras import regularizers
from keras.callbacks import EarlyStopping

np.random.seed(0)

Using TensorFlow backend.


For how to set pre-trained weights on the embedding layer, see: https://github.com/keras-team/keras/issues/853
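
As a rough illustration of the weight matrix that gets passed to the embedding layer, the sketch below builds it from a toy dictionary. It assumes `embedding_weights` maps each word index to its vector (which the `.values()` call in the cells below suggests); `toy_embedding_weights` and its values are made up.

```python
import numpy as np

# Hypothetical stand-in for `embedding_weights`: word index -> vector.
toy_embedding_weights = {
    2: np.array([0.5, 0.6]),
    0: np.array([0.1, 0.2]),
    1: np.array([0.3, 0.4]),
}

# Sort by index so that row i holds the vector of word i,
# independent of the dictionary's insertion order.
toy_weights = np.array(
    [toy_embedding_weights[i] for i in sorted(toy_embedding_weights)]
)
print(toy_weights.shape)  # (vocabulary size, embedding dimension)

# With an Embedding layer of matching shape, the matrix is passed as a
# one-element list, as the cells below do:
# model.get_layer("embedding_layer").set_weights([toy_weights])
```

Sorting by index is a defensive choice here. If the dictionary is already built in index order, iterating over `.values()` gives the same matrix.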

In [17]:
# write in one cell 
# version 1: large loss when softmax is used with only 1 output neuron
#=======================Build model=========================
# Model Hyperparameters
embedding_dim = 50
filter_sizes = (3, 8)
num_filters = 10
dropout_prob = (0.5, 0.8)
hidden_dims = 50

# Training parameters
batch_size = 64
num_epochs = 10

# Preprocessing parameters
sequence_length = 400  # placeholder, overwritten below with x_test.shape[1]
max_words = 5000

# Word2Vec parameters (see train_word2vec)
min_word_count = 1
context = 10


# Input 
sequence_length = x_test.shape[1] # 56
input_shape = (sequence_length,)
input_layer = Input(shape=input_shape, name='input_layer') # (?, 56)


# Embedding 
embedded = Embedding(input_dim=len(vocabulary_inv), 
                            output_dim=embedding_dim,
                            input_length=sequence_length, 
                            name='embedding_layer')(input_layer)

# CNN, iterate filter_size
conv_blocks = []
for fz in filter_sizes:
    conv = Conv1D(filters=num_filters,
                  kernel_size=fz, # 3 means 3 words
                  padding='valid', # valid means no padding
                  strides=1, # move the filter one word at a time
                  activation='relu',
                  use_bias=True)(embedded) 
    conv = GlobalMaxPooling1D()(conv) # 1-Max pooling 
    conv_blocks.append(conv)

concat1max = Concatenate()(conv_blocks) # (?, 20)
concat1max = Dropout(dropout_prob[1])(concat1max)
output_layer = Dense(hidden_dims, activation='relu', 
                  kernel_regularizer=regularizers.l2(0.01),
                  bias_regularizer=regularizers.l1(0.01))(concat1max)
output_layer = Dense(1, activation='softmax', name='softmax_output')(output_layer)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Initialize weights with word2vec
weights = np.array([v for v in embedding_weights.values()])
print("Initializing embedding layer with word2vec weights, shape", weights.shape)
embedding_layer = model.get_layer("embedding_layer")
embedding_layer.set_weights([weights])

# Train the model
model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs,
          validation_data=(x_test, y_test), verbose=2)

# (1) big loss 

Initializing embedding layer with word2vec weights, shape (18765, 50)
Train on 9595 samples, validate on 1067 samples
Epoch 1/10
 - 3s - loss: 8.0930 - acc: 0.5010 - val_loss: 8.1630 - val_acc: 0.4911
Epoch 2/10
 - 3s - loss: 7.9773 - acc: 0.5010 - val_loss: 8.1195 - val_acc: 0.4911
Epoch 3/10
 - 2s - loss: 7.9579 - acc: 0.5010 - val_loss: 8.1137 - val_acc: 0.4911
Epoch 4/10
 - 2s - loss: 7.9556 - acc: 0.5010 - val_loss: 8.1132 - val_acc: 0.4911
Epoch 5/10
 - 3s - loss: 7.9554 - acc: 0.5010 - val_loss: 8.1131 - val_acc: 0.4911
Epoch 6/10
 - 3s - loss: 7.9554 - acc: 0.5010 - val_loss: 8.1131 - val_acc: 0.4911
Epoch 7/10
 - 3s - loss: 7.9554 - acc: 0.5010 - val_loss: 8.1131 - val_acc: 0.4911
Epoch 8/10
 - 3s - loss: 7.9554 - acc: 0.5010 - val_loss: 8.1131 - val_acc: 0.4911
Epoch 9/10
 - 3s - loss: 7.9554 - acc: 0.5010 - val_loss: 8.1131 - val_acc: 0.4911
Epoch 10/10
 - 3s - loss: 7.9554 - acc: 0.5010 - val_loss: 8.1131 - val_acc: 0.4911


<keras.callbacks.History at 0x1a34125320>

In [18]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_layer (InputLayer)        (None, 56)           0                                            
__________________________________________________________________________________________________
embedding_layer (Embedding)     (None, 56, 50)       938250      input_layer[0][0]                
__________________________________________________________________________________________________
conv1d_19 (Conv1D)              (None, 54, 10)       1510        embedding_layer[0][0]            
__________________________________________________________________________________________________
conv1d_20 (Conv1D)              (None, 49, 10)       4010        embedding_layer[0][0]            
__________________________________________________________________________________________________
global_max

In [25]:
from keras import backend as K

K.learning_phase()

<tf.Tensor 'dropout_1/keras_learning_phase:0' shape=() dtype=bool>

In [30]:
from keras import backend as K

get_softmax_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[-1].output])

# output in test mode (learning phase = 0)
# layer_output = get_softmax_layer_output([x_train, 0])[0]

# output in train mode (learning phase = 1)
layer_output = get_softmax_layer_output([x_train, 1])

In [31]:
layer_output # (2) the softmax output when there is only 1 output neuron

[array([[1.],
        [1.],
        [1.],
        ...,
        [1.],
        [1.],
        [1.]], dtype=float32)]

In [41]:
# write in one cell 
# version 2: sigmoid with one output neuron

#=======================Build model=========================
# Model Hyperparameters
embedding_dim = 50
filter_sizes = (3, 8)
num_filters = 10
dropout_prob = (0.5, 0.8)
hidden_dims = 50

# Training parameters
batch_size = 64
num_epochs = 10

# Preprocessing parameters
sequence_length = 400  # placeholder, overwritten below with x_test.shape[1]
max_words = 5000

# Word2Vec parameters (see train_word2vec)
min_word_count = 1
context = 10


# Input 
sequence_length = x_test.shape[1] # 56
input_shape = (sequence_length,)
input_layer = Input(shape=input_shape, name='input_layer') # (?, 56)


# Embedding 
embedded = Embedding(input_dim=len(vocabulary_inv), 
                            output_dim=embedding_dim,
                            input_length=sequence_length, 
                            name='embedding_layer')(input_layer)

# CNN, iterate filter_size
conv_blocks = []
for fz in filter_sizes:
    conv = Conv1D(filters=num_filters,
                  kernel_size=fz, # 3 means 3 words
                  padding='valid', # valid means no padding
                  strides=1, # move the filter one word at a time
                  activation='relu',
                  use_bias=True)(embedded) 
    conv = GlobalMaxPooling1D()(conv) # 1-Max pooling 
    conv_blocks.append(conv)

concat1max = Concatenate()(conv_blocks) # (?, 20)
concat1max = Dropout(dropout_prob[1])(concat1max)
output_layer = Dense(hidden_dims, activation='relu', 
                  kernel_regularizer=regularizers.l2(0.01),
                  bias_regularizer=regularizers.l1(0.01))(concat1max)
output_layer = Dense(1, activation='sigmoid')(output_layer)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Initialize weights with word2vec
weights = np.array([v for v in embedding_weights.values()])
print("Initializing embedding layer with word2vec weights, shape", weights.shape)
embedding_layer = model.get_layer("embedding_layer")
embedding_layer.set_weights([weights])

# Train the model
model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs,
          validation_data=(x_test, y_test), verbose=2)

# (3) show the loss, sigmoid with one output neuron

Initializing embedding layer with word2vec weights, shape (18765, 50)
Train on 9595 samples, validate on 1067 samples
Epoch 1/10
 - 4s - loss: 0.9497 - acc: 0.5077 - val_loss: 0.8551 - val_acc: 0.5286
Epoch 2/10
 - 3s - loss: 0.8154 - acc: 0.5032 - val_loss: 0.7804 - val_acc: 0.5164
Epoch 3/10
 - 3s - loss: 0.7598 - acc: 0.5213 - val_loss: 0.7424 - val_acc: 0.5098
Epoch 4/10
 - 3s - loss: 0.7294 - acc: 0.5300 - val_loss: 0.7215 - val_acc: 0.5370
Epoch 5/10
 - 3s - loss: 0.7137 - acc: 0.5271 - val_loss: 0.7076 - val_acc: 0.5633
Epoch 6/10
 - 3s - loss: 0.7023 - acc: 0.5406 - val_loss: 0.7004 - val_acc: 0.5698
Epoch 7/10
 - 3s - loss: 0.6949 - acc: 0.5425 - val_loss: 0.6943 - val_acc: 0.5661
Epoch 8/10
 - 3s - loss: 0.6856 - acc: 0.5545 - val_loss: 0.6888 - val_acc: 0.5933
Epoch 9/10
 - 3s - loss: 0.6708 - acc: 0.5752 - val_loss: 0.6787 - val_acc: 0.6073
Epoch 10/10
 - 3s - loss: 0.6533 - acc: 0.5936 - val_loss: 0.6670 - val_acc: 0.6504


<keras.callbacks.History at 0x1a40a376a0>

### Softmax with two output neurons

In [35]:
# Load data
print("Loading data...")
x_text, y = data_helpers.load_data_and_labels(positive_data_file, negtive_data_file)

# Pad sentence
print("Padding sentences...")
x_text = data_helpers.pad_sentences(x_text)
print("The sequence length is: ", len(x_text[0]))

# Build vocabulary
vocabulary, vocabulary_inv = data_helpers.build_vocab(x_text)

# Represent sentence with word index, using word index to represent a sentence
x = data_helpers.build_index_sentence(x_text, vocabulary)
# y = y.argmax(axis=1) # commented out so that y stays two-dimensional (one-hot)

# Shuffle data
np.random.seed(42)
shuffle_indices = np.random.permutation(np.arange(len(y)))
x_shuffled = x[shuffle_indices]
y_shuffled = y[shuffle_indices]

# Split train and test
training_rate = 0.9
train_len = int(len(y) * training_rate)
x_train = x_shuffled[:train_len]
y_train = y_shuffled[:train_len]
x_test = x_shuffled[train_len:]
y_test = y_shuffled[train_len:]

Loading data...
Padding sentences...
The sequence length is:  56


In [36]:
y

array([[0, 1],
       [0, 1],
       [0, 1],
       ...,
       [1, 0],
       [1, 0],
       [1, 0]])
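
The two label formats used in this notebook carry the same information. A small NumPy sketch of the conversion in both directions, using made-up labels and illustrative variable names:

```python
import numpy as np

# One-hot labels: [0, 1] is positive, [1, 0] is negative.
y_onehot = np.array([[0, 1], [0, 1], [1, 0]])

# One-hot -> class index (1 for positive, 0 for negative),
# the same call used in the first preprocessing cell.
y_index = y_onehot.argmax(axis=1)

# Class index -> one-hot, by selecting rows of the 2x2 identity matrix.
y_back = np.eye(2, dtype=int)[y_index]

print(y_index)
print(y_back)
```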

In [38]:
# write in one cell 
# version 3: softmax with 2 output neurons
#=======================Build model=========================
# Model Hyperparameters
embedding_dim = 50
filter_sizes = (3, 8)
num_filters = 10
dropout_prob = (0.5, 0.8)
hidden_dims = 50

# Training parameters
batch_size = 64
num_epochs = 10

# Preprocessing parameters
sequence_length = 400  # placeholder, overwritten below with x_test.shape[1]
max_words = 5000

# Word2Vec parameters (see train_word2vec)
min_word_count = 1
context = 10


# Input 
sequence_length = x_test.shape[1] # 56
input_shape = (sequence_length,)
input_layer = Input(shape=input_shape, name='input_layer') # (?, 56)


# Embedding 
embedded = Embedding(input_dim=len(vocabulary_inv), 
                            output_dim=embedding_dim,
                            input_length=sequence_length, 
                            name='embedding_layer')(input_layer)

# CNN, iterate filter_size
conv_blocks = []
for fz in filter_sizes:
    conv = Conv1D(filters=num_filters,
                  kernel_size=fz, # 3 means 3 words
                  padding='valid', # valid means no padding
                  strides=1, # move the filter one word at a time
                  activation='relu',
                  use_bias=True)(embedded) 
    conv = GlobalMaxPooling1D()(conv) # 1-Max pooling 
    conv_blocks.append(conv)

concat1max = Concatenate()(conv_blocks) # (?, 20)
concat1max = Dropout(dropout_prob[1])(concat1max)
output_layer = Dense(hidden_dims, activation='relu', 
                  kernel_regularizer=regularizers.l2(0.01),
                  bias_regularizer=regularizers.l1(0.01))(concat1max)
output_layer = Dense(2, activation='softmax')(output_layer) # 2 output neurons instead of 1

model = Model(inputs=input_layer, outputs=output_layer)
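# With one-hot targets and a 2-way softmax, binary_crossentropy averaged over the
# two outputs gives the same value as categorical_crossentropy.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])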

# Initialize weights with word2vec
weights = np.array([v for v in embedding_weights.values()])
print("Initializing embedding layer with word2vec weights, shape", weights.shape)
embedding_layer = model.get_layer("embedding_layer")
embedding_layer.set_weights([weights])

# Train the model
model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs,
          validation_data=(x_test, y_test), verbose=2)

# (4) show the loss, softmax with two output neurons

Initializing embedding layer with word2vec weights, shape (18765, 50)
Train on 9595 samples, validate on 1067 samples
Epoch 1/10
 - 4s - loss: 0.9405 - acc: 0.5098 - val_loss: 0.8364 - val_acc: 0.5201
Epoch 2/10
 - 3s - loss: 0.8002 - acc: 0.5028 - val_loss: 0.7699 - val_acc: 0.4948
Epoch 3/10
 - 3s - loss: 0.7509 - acc: 0.5084 - val_loss: 0.7351 - val_acc: 0.5417
Epoch 4/10
 - 3s - loss: 0.7270 - acc: 0.5224 - val_loss: 0.7172 - val_acc: 0.5426
Epoch 5/10
 - 3s - loss: 0.7133 - acc: 0.5255 - val_loss: 0.7091 - val_acc: 0.5455
Epoch 6/10
 - 3s - loss: 0.7035 - acc: 0.5240 - val_loss: 0.7024 - val_acc: 0.5539
Epoch 7/10
 - 3s - loss: 0.6953 - acc: 0.5411 - val_loss: 0.6960 - val_acc: 0.5698
Epoch 8/10
 - 3s - loss: 0.6838 - acc: 0.5518 - val_loss: 0.6899 - val_acc: 0.5689
Epoch 9/10
 - 3s - loss: 0.6698 - acc: 0.5704 - val_loss: 0.6821 - val_acc: 0.5792
Epoch 10/10
 - 3s - loss: 0.6431 - acc: 0.5934 - val_loss: 0.6588 - val_acc: 0.6485


<keras.callbacks.History at 0x1a2027ce10>