# Keras Sentiment Analysis Part 3
## Optimization with Grid Search

In this notebook, I try to find better hyperparameters using grid search.

In [1]:
%matplotlib inline
import numpy
from keras.datasets import imdb
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.convolutional import Convolution1D
from keras.layers.convolutional import MaxPooling1D
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load the dataset but only keep the top n words, zero the rest
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=top_words)

# pad short reviews (and truncate long ones) to exactly max_words tokens
max_words = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)

Using Theano backend.


### Warning: The code below might run for months!

While searching the parameter space, I noticed that some epochs did extremely well on the training data; however, overall model accuracy changed very little from the original author's results. A quick count of the parameter values below gives 7 * 2 * 6 * 3 * 8 = 2016 combinations. Given that the epoch=2 setting takes at least 2 minutes and the epoch=10 setting was taking 20-90 minutes in my test run, we could be at this for a very long time. Most of the tests did not return better than 88-89% accuracy. This led me to wonder (1) whether this type of CNN is the best approach and (2) whether there is a more efficient way to search the parameter space (we are training advanced optimization algorithms after all, so there should be a non-brute-force approach, right?).
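
The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the per-fit timings are the approximate figures quoted above, and `cv_folds = 3` is an assumption (it was the default in older scikit-learn releases; newer ones default to 5). The estimate also assumes the fits run one after another, so `n_jobs=-1` would shorten the wall-clock time.

```python
# Candidate values copied from the grid-search cell.
param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100, 128],
    "nb_epoch": [2, 10],
    "pool_length": [2, 5, 10, 20, 32, 500],
    "hidden_layer_size": [32, 250, 500],
    "init_mode": ["uniform", "lecun_uniform", "normal", "zero",
                  "glorot_normal", "glorot_uniform", "he_normal", "he_uniform"],
}

# Number of combinations = product of the list lengths.
n_combinations = 1
for values in param_grid.values():
    n_combinations *= len(values)

# Assumption: number of cross-validation folds used by GridSearchCV.
cv_folds = 3

# Rough per-fit timings in minutes, taken from the figures quoted in the text.
minutes_2_epochs = 2
minutes_10_epochs_low = 20
minutes_10_epochs_high = 90

# Half of the combinations use nb_epoch=2, the other half nb_epoch=10.
combos_per_epoch_setting = n_combinations // len(param_grid["nb_epoch"])

low_minutes = combos_per_epoch_setting * (minutes_2_epochs + minutes_10_epochs_low) * cv_folds
high_minutes = combos_per_epoch_setting * (minutes_2_epochs + minutes_10_epochs_high) * cv_folds

print("combinations:", n_combinations)
print("sequential estimate: %.0f to %.0f days"
      % (low_minutes / 60.0 / 24.0, high_minutes / 60.0 / 24.0))
```

Even the optimistic end of this estimate is measured in weeks, which is consistent with the warning in the heading.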

In [None]:
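def create_model(pool_length, nb_epoch, hidden_layer_size=1, init_mode='uniform'):
    # nb_epoch is not used to build the network; KerasClassifier also
    # forwards it to fit(), where it sets the number of training epochs
    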
    #print("Testing new model")
    
    # create the model
    model = Sequential()
    model.add(Embedding(top_words, 32, input_length=max_words))
    model.add(Convolution1D(nb_filter=32, filter_length=3, border_mode='same', activation='relu'))
    model.add(MaxPooling1D(pool_length=pool_length))
    model.add(Flatten())
    model.add(Dense(hidden_layer_size, init=init_mode, activation='relu'))
    model.add(Dense(1, init=init_mode, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    #print(model.summary())
    
    return model

# wrap the Keras model so scikit-learn can treat it as an estimator
model = KerasClassifier(build_fn=create_model, verbose=2)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100, 128]
epochs = [2, 10]
pool_length = [2, 5, 10, 20, 32, 500]
hidden_layer_size = [32, 250, 500]
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']

param_grid = dict(batch_size=batch_size, nb_epoch=epochs, pool_length=pool_length, hidden_layer_size=hidden_layer_size, init_mode=init_mode)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, verbose=10)

grid_result = grid.fit(X_train, y_train)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



## Conclusions

First, I discovered that the world record was held by a company that used an RNN, not a CNN, and achieved only 93% accuracy. I had naively hoped that I could reach 98% or higher, like some other competition-winning models I have seen. The reality is that even the state of the art is well below this target.

* https://cs224d.stanford.edu/reports/TimmarajuAditya.pdf
* https://cs224d.stanford.edu/reports/PouransariHadi.pdf


Second, I discovered that when the parameter space is large, random search can be more effective than an exhaustive grid search.

* http://stats.stackexchange.com/questions/160479/practical-hyperparameter-optimization-random-vs-grid-search
* https://medium.com/rants-on-machine-learning/smarter-parameter-sweeps-or-why-grid-search-is-plain-stupid-c17d97a0e881#.lbesnecxm
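
As a minimal sketch of the random-search idea, the snippet below draws a fixed number of distinct configurations from the same grid instead of enumerating all 2016. The helper `sample_configurations` and the budget of 20 candidates are hypothetical choices made for illustration; they are not part of the original notebook. In practice, scikit-learn's `RandomizedSearchCV` (with an `n_iter` budget) should be able to play the role that `GridSearchCV` plays above, though I have not run it on this model.

```python
import random

# Same candidate values as the grid-search cell.
param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100, 128],
    "nb_epoch": [2, 10],
    "pool_length": [2, 5, 10, 20, 32, 500],
    "hidden_layer_size": [32, 250, 500],
    "init_mode": ["uniform", "lecun_uniform", "normal", "zero",
                  "glorot_normal", "glorot_uniform", "he_normal", "he_uniform"],
}


def sample_configurations(grid, n_iter, seed=7):
    """Draw n_iter distinct parameter combinations uniformly at random.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    names = sorted(grid)

    # Never ask for more distinct combinations than the grid contains.
    total = 1
    for name in names:
        total *= len(grid[name])
    n_iter = min(n_iter, total)

    seen = set()
    samples = []
    while len(samples) < n_iter:
        choice = tuple(rng.choice(grid[name]) for name in names)
        if choice not in seen:  # skip duplicates
            seen.add(choice)
            samples.append(dict(zip(names, choice)))
    return samples


# Assumed budget: evaluate 20 configurations, about 1% of the full grid.
candidates = sample_configurations(param_grid, n_iter=20)
for params in candidates[:3]:
    print(params)
```

Each sampled dictionary could then be used to train and score one model, so the total cost is set by the chosen budget and not by the size of the grid.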

### Lessons Learned
With every learning project, nothing is lost if I can learn something. From this project I learned to do a preliminary survey of the state of the art, so that I have a realistic baseline before attempting to optimize toward a specific target accuracy. I also learned that my suspicion about the inefficiency of an exhaustive grid search was well founded and that there are alternatives. While random search is not quite the advanced optimizer I had hoped for, it may work better than exhaustive search in this case.