# Word Embeddings
- This notebook shows how to use the `Embedding` layer in Keras, which learns word embeddings jointly with the rest of the network
- Another way of learning word embeddings is to pre-train word vectors in a separate network (e.g., word2vec, GloVe, fastText)
    - If you are interested in pretrained word embeddings, refer to: https://github.com/buomsoo-kim/Word-embedding-with-Python
    
<br>
<img src="https://adriancolyer.files.wordpress.com/2016/04/word2vec-distributed-representation.png?w=600" style="width: 600px"/>

## Word vectors
- In short, word embedding is the process of converting each word to a fixed-dimensional "(word) vector"
- The dimensionality of the embedding space (i.e., the vector space) is a hyperparameter; it can be set to any positive integer
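
To make the idea concrete, here is a minimal sketch of an embedding as a lookup table, written in plain Python rather than Keras. The toy vocabulary `vocab`, the value `embedding_dim = 4`, and the helper `embed` are illustrative assumptions; the vectors are random here, whereas in a real network they would be learned during training.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical toy vocabulary: each word maps to an integer index
vocab = {"king": 0, "queen": 1, "man": 2, "woman": 3}
embedding_dim = 4  # hyperparameter: any positive integer

# One row per word; values are random stand-ins for learned weights
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in range(len(vocab))
]

def embed(word):
    """Return the fixed-dimensional vector stored for `word`."""
    return embedding_table[vocab[word]]

vector = embed("queen")
print(len(vector))  # number of components equals embedding_dim
```

Every word gets a vector of the same length, regardless of how long or frequent the word is.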

In [2]:
from keras.models import Sequential
from keras.layers import Activation, CuDNNGRU, Dense, Embedding
from keras.datasets import reuters
from keras.preprocessing import sequence
from keras.utils import to_categorical

### Embedding layer
- The embedding layer takes a 2-D tensor of integer token indices, shape (batch_size, input_length), and outputs a 3-D tensor
    - Output shape = **(batch_size, input_length, output_dim)**
    - input_dim: dimensionality of the input space (vocabulary size, i.e., number of unique tokens of interest)
    - output_dim: dimensionality of the embedding space
    - input_length: length of the input sequences (if None, the length can vary)
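
The shape rule above can be sketched without Keras: the layer holds a weight matrix of shape (input_dim, output_dim) and replaces every token index with the corresponding row. The example batch and the random weights below are assumptions for illustration only; the sizes match the Keras cells in this section.

```python
import random

random.seed(0)

input_dim, output_dim, input_length = 10, 5, 3

# Weight matrix of shape (input_dim, output_dim); random stand-in for learned weights
weights = [[random.random() for _ in range(output_dim)] for _ in range(input_dim)]

# Hypothetical batch of 2 sequences, each holding input_length indices in [0, input_dim)
batch = [[1, 4, 9], [0, 0, 7]]

# Replace every index with its row of the weight matrix
embedded = [[weights[token] for token in seq] for seq in batch]

shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
print(shape)  # (2, 3, 5) = (batch_size, input_length, output_dim)
```

Note that the repeated index 0 in the second sequence maps to the same vector both times, since the lookup depends only on the token index.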

In [17]:
# when input length is constant
model = Sequential()
model.add(Embedding(input_dim = 10, output_dim = 5, input_length = 3))

In [18]:
model.output_shape

(None, 3, 5)

In [19]:
# when input length varies
model = Sequential()
model.add(Embedding(input_dim = 10, output_dim = 5, input_length = None))    

In [20]:
model.output_shape

(None, None, 5)

### Using embedding layer in network
- Embedding layers are usually used as the first layer of a network that models text data

In [3]:
# parameters for importing the dataset
num_words = 3000    # vocabulary size: keep only the most frequent words
maxlen = 50         # maximum sequence length

In [4]:
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words = num_words, maxlen = maxlen)

In [5]:
X_train = sequence.pad_sequences(X_train, maxlen = maxlen, padding = 'post')
X_test = sequence.pad_sequences(X_test, maxlen = maxlen, padding = 'post')
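
As a rough illustration of what `padding = 'post'` does, the sketch below pads integer sequences with zeros at the end so that they all reach the same length. The helper `pad_post` is hypothetical and is not part of Keras; it assumes the default `truncating = 'pre'` behaviour, which keeps the last `maxlen` tokens of a sequence that is too long.

```python
def pad_post(sequences, maxlen, value=0):
    """Hypothetical helper mimicking padding='post' with truncating='pre'."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # 'pre' truncation keeps the last maxlen tokens
        padded.append(seq + [value] * (maxlen - len(seq)))  # pad at the end
    return padded

example = pad_post([[5, 8], [3, 1, 4, 1, 5, 9]], maxlen=4)
print(example)  # [[5, 8, 0, 0], [4, 1, 5, 9]]
```

After padding, every sequence has exactly `maxlen` entries, which is what allows the data to be stacked into a single 2-D array.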

In [6]:
y_train = to_categorical(y_train, num_classes = 46)
y_test = to_categorical(y_test, num_classes = 46)
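
The one-hot encoding applied to the labels can be sketched in plain Python as follows. The helper `one_hot` is a hypothetical stand-in for illustration, not the Keras implementation, and the example uses 3 classes instead of 46 to keep the output short.

```python
def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer class labels into one-hot rows."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        rows.append(row)
    return rows

encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Each row has a single 1.0, which matches the format expected by the `categorical_crossentropy` loss and the 46-way softmax output used later in this section.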

In [7]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(1595, 50)
(399, 50)
(1595, 46)
(399, 46)


In [26]:
input_dim = num_words
output_dim = 100     # we set dimensionality of embedding space as 100
input_length = maxlen

In [27]:
def reuters_model():
    model = Sequential()
    model.add(Embedding(input_dim = input_dim, output_dim = output_dim, input_length = input_length))
    model.add(CuDNNGRU(50, return_sequences = False))    # GPU-only layer; returns only the last hidden state
    model.add(Dense(100))
    model.add(Activation('relu'))
    model.add(Dense(46, activation = 'softmax'))
    
    model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
    return model

In [28]:
model = reuters_model()

In [29]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 50, 100)           300000    
_________________________________________________________________
cu_dnngru_2 (CuDNNGRU)       (None, 50)                22800     
_________________________________________________________________
dense_3 (Dense)              (None, 100)               5100      
_________________________________________________________________
activation_2 (Activation)    (None, 100)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 46)                4646      
Total params: 332,546
Trainable params: 332,546
Non-trainable params: 0
_________________________________________________________________
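
The parameter counts in the summary can be checked by hand. The sketch below assumes that the CuDNN GRU stores, for each of its three gates, an input weight matrix, a recurrent weight matrix, and two bias vectors; under that assumption the arithmetic reproduces the totals printed above.

```python
vocab_size, embed_dim = 3000, 100               # input_dim, output_dim of the embedding
gru_units, dense_units, num_classes = 50, 100, 46

# Embedding: one embed_dim-dimensional vector per token in the vocabulary
embedding_params = vocab_size * embed_dim

# GRU: 3 gates x (input weights + recurrent weights + 2 bias vectors), assumed CuDNN layout
gru_params = 3 * (embed_dim * gru_units + gru_units * gru_units + 2 * gru_units)

# Dense layers: weight matrix plus one bias per output unit
dense_params = gru_units * dense_units + dense_units
output_params = dense_units * num_classes + num_classes

total = embedding_params + gru_params + dense_params + output_params
print(embedding_params, gru_params, dense_params, output_params)  # 300000 22800 5100 4646
print(total)  # 332546
```

The embedding layer accounts for roughly 90% of the parameters, which is typical when the vocabulary is large relative to the rest of the network.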


In [31]:
model.fit(X_train, y_train, epochs = 100, batch_size = 100, verbose = 0)

<keras.callbacks.History at 0x14ee99cecc0>

In [32]:
result = model.evaluate(X_test, y_test)



In [33]:
print('Test Accuracy: ', result[1])

Test Accuracy:  0.854636592076
