**EE6363: Advanced Topic - Deep Learning, Spring 2019  |  David Hardage**
---
## Question I
### **RNN Equations**
$x_{t}$ is the input for the present "time" in a recurrent sequence.

$a_{t}$ is the hidden output for that time step in the RNN cell.

$a_{t} = f(W_{aa} a_{t-1} + W_{xa} x_{t} + b_{a})$
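
As a rough illustration of the recurrence above, the sketch below applies the update to two time steps using plain Python lists. It assumes $f$ is tanh (the text leaves $f$ unspecified), and `rnn_step` and all weight values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math

def rnn_step(a_prev, x_t, W_aa, W_xa, b_a):
    """One step a_t = tanh(W_aa a_{t-1} + W_xa x_t + b_a) on plain Python lists."""
    a_t = []
    for i in range(len(b_a)):
        # Row i of each weight matrix produces unit i of the new hidden state.
        z = b_a[i]
        z += sum(w * a for w, a in zip(W_aa[i], a_prev))
        z += sum(w * x for w, x in zip(W_xa[i], x_t))
        a_t.append(math.tanh(z))
    return a_t

# Hypothetical 2-unit cell reading 3-dimensional inputs.
W_aa = [[0.5, 0.0], [0.0, 0.5]]
W_xa = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_a = [0.0, 0.0]

a = [0.0, 0.0]  # a_0: initial hidden state
for x_t in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    a = rnn_step(a, x_t, W_aa, W_xa, b_a)
print(a)
```

The same weights are reused at every step; only the hidden state changes as the sequence is read.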

### **LSTM Overview**
![](https://cdn-images-1.medium.com/max/1800/1*O73nlRM3-bWubvt6W-1YSg.png)
LSTMs pass information forward with the aid of their gates. Each gate is wrapped in a sigmoid, so its output lies between 0 and 1 and controls how much information is allowed through.
#### Forget Gate
$f_{t} = \sigma(W_{f} x_{t} + U_{f} h_{t-1})$

#### Input Gate
$i_{t} = \sigma(W_{i} x_{t} + U_{i} h_{t-1})$

#### Output Gate
$o_{t} = \sigma(W_{o} x_{t} + U_{o} h_{t-1})$

#### Other Functions
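$C_{t}$ is the new cell state after the forget and input gates are applied: $C_{t} = f_{t} * C_{t-1} + i_{t} * \tilde{C}_{t}$, where $\tilde{C}_{t}$ is the candidate state computed from $x_{t}$ and $h_{t-1}$.
The output gate then produces the hidden state: $h_{t} = o_{t} * \tanh(C_{t})$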
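
To make the gate equations concrete, here is a minimal scalar sketch of one LSTM step. It follows the bias-free gate equations above; the candidate state `C_tilde`, the weight names `W_c` and `U_c`, and all numeric values are assumptions taken from the standard formulation rather than from this write-up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One scalar LSTM step; p maps weight names to floats (no bias terms)."""
    f_t = sigmoid(p["W_f"] * x_t + p["U_f"] * h_prev)  # forget gate
    i_t = sigmoid(p["W_i"] * x_t + p["U_i"] * h_prev)  # input gate
    o_t = sigmoid(p["W_o"] * x_t + p["U_o"] * h_prev)  # output gate
    # Assumed candidate state (standard formulation, hypothetical weight names).
    C_tilde = math.tanh(p["W_c"] * x_t + p["U_c"] * h_prev)
    C_t = f_t * C_prev + i_t * C_tilde                 # new cell state
    h_t = o_t * math.tanh(C_t)                         # new hidden state
    return h_t, C_t

names = ("W_f", "U_f", "W_i", "U_i", "W_o", "U_o", "W_c", "U_c")
params = {name: 0.5 for name in names}  # made-up weights

h, C = 0.0, 0.0
for x_t in (1.0, -1.0, 0.5):
    h, C = lstm_step(x_t, h, C, params)
print(h, C)
```

Because the forget gate multiplies the old cell state and the input gate adds to it, the cell state can carry a value across many steps with little change when $f_{t}$ stays close to 1.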


### **RNN's Vanishing Gradient Problem**
As the time step in the RNN moves forward, $a_{t}$ becomes $a_{t-1}$, then $a_{t-2}$, then $a_{t-3}$, and so on, so the contribution of the first time step to the final output is much smaller than that of the last. The longer the unrolled sequence, the more likely the model is to encounter a vanishing gradient: during backpropagation the gradient is passed back in time through the hidden states and multiplied at each step by the recurrent weights and the derivative of the activation function. When these factors are smaller than 1, the product shrinks exponentially with the number of steps. Thus, the gradient vanishes.

However, LSTMs mitigate this problem because the cell state is updated additively and carried forward through the gates, so during backpropagation the gradient along the cell state is scaled by the forget gate rather than repeatedly multiplied by smaller and smaller values.
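
The contrast can be sketched numerically. The example below multiplies the per-step gradient factors of a scalar tanh RNN over 20 steps and compares the result with the product of forget gates along an LSTM cell-state path. The recurrent weight 0.9, the inputs, and the forget-gate value 0.98 are made-up assumptions, and the LSTM part only follows the direct cell-state path, ignoring the gates' own dependence on the hidden state.

```python
import math

def rnn_gradient_through_time(w, inputs):
    """Return |d a_T / d a_0| for the scalar RNN a_t = tanh(w * a_{t-1} + x_t)."""
    a, grad = 0.0, 1.0
    for x_t in inputs:
        a = math.tanh(w * a + x_t)
        grad *= w * (1.0 - a * a)  # chain rule: d a_t / d a_{t-1}
    return abs(grad)

def cell_state_gradient(forget_gates):
    """Return d C_T / d C_0 along the direct cell-state path (product of forget gates)."""
    grad = 1.0
    for f_t in forget_gates:
        grad *= f_t
    return grad

steps = 20  # same length as the padded tweets in the experiment
rnn_grad = rnn_gradient_through_time(0.9, [0.5] * steps)
lstm_grad = cell_state_gradient([0.98] * steps)
print(f"RNN:  {rnn_grad:.2e}")
print(f"LSTM: {lstm_grad:.2e}")
```

With these values the RNN gradient shrinks by many orders of magnitude, while the cell-state path keeps about two thirds of it. A forget gate that stays well below 1 would make the LSTM path vanish as well, so this is a mitigation rather than a guarantee.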

## Question II
![alt text](https://cdn-images-1.medium.com/max/800/0*0ETid8yQzpp-Wiky.png)
### One to One
Here a single input is mapped to a single output. An example is binary image classification: "Is a seagull in this photo?"

### One to Many
Here a single input produces multiple outputs. Using the image example above, this architecture could emit one output per class: "Is there a seagull, a seal, and a child in this photo?" Image captioning, where one image yields a sequence of words, is another common example.

### Many to One
Here multiple inputs are classified into one output. One could use this architecture to classify sentences into positive or negative sentiment.

### Many to Many
Multiple inputs generate multiple outputs. This architecture is used in forecasting and language generation tasks.
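
The four configurations differ only in how many inputs are fed in and which states are read out. The toy sketch below makes that explicit with a hypothetical recurrent cell (`run_cell`, a decaying running sum); it is not a trained model, just an illustration of the input and output shapes.

```python
def run_cell(inputs):
    """Toy recurrent cell: each state mixes the previous state with the current input."""
    state, states = 0.0, []
    for x_t in inputs:
        state = 0.5 * state + x_t
        states.append(state)
    return states

def one_to_one(x):
    # One input, one output.
    return run_cell([x])[0]

def one_to_many(x, steps):
    # One real input, then the cell keeps running on empty (zero) inputs.
    return run_cell([x] + [0.0] * (steps - 1))

def many_to_one(inputs):
    # Read the whole sequence, keep only the final state.
    return run_cell(inputs)[-1]

def many_to_many(inputs):
    # Emit one output per input step.
    return run_cell(inputs)

print(one_to_one(4.0))
print(one_to_many(4.0, steps=3))
print(many_to_one([1.0, 2.0, 3.0]))
print(many_to_many([1.0, 2.0, 3.0]))
```

The sentiment models in the experiment are many-to-one: the recurrent layer reads 20 steps and only its last state is passed to the dense layers.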

## Experiment Write Up:
We are presented with the task of classifying tweets as either positive or negative based on the sequence of tokens contained in each tweet. A moderate amount of cleaning was performed on the data to remove HTML and other unnecessary characters. After this processing, the remaining vocabulary size was 6424. Exploratory data analysis on the tweets showed that 75% of the sequences contained 17 tokens or fewer, so the sequence length was set to 20 and shorter input sequences were padded with 0 to the same length.
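
The fixed-length step can be sketched as follows. The helper names `pad_or_truncate` and `share_within` are hypothetical, and the example sequences are made up rather than taken from the tweet data; the sketch only shows right-padding with 0 and cutting at a chosen length.

```python
def pad_or_truncate(indices, length=20, pad_value=0):
    """Right-pad with pad_value, or cut, so the result has exactly `length` items."""
    trimmed = list(indices[:length])
    return trimmed + [pad_value] * (length - len(trimmed))

def share_within(lengths, limit):
    """Fraction of sequences whose length is at most `limit`."""
    return sum(1 for n in lengths if n <= limit) / len(lengths)

short = [12, 18, 121]        # shorter than the target length
long = list(range(1, 26))    # 25 tokens, longer than the target length

print(pad_or_truncate(short, length=5))
print(len(pad_or_truncate(long)))
print(share_within([3, 25, 17, 9], limit=17))
```

A check like `share_within` is one way to justify the cutoff: if most sequences fit inside the limit, truncation discards little information.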

Overall, the LSTM performed best and, surprisingly, the CNN outperformed the RNN. However, I believe the poor performance of the RNN may have been due to the padding and the fixed sequence length. The three model architectures below were kept as similar as possible to enable a fair comparison of performance. Additionally, a Word2Vec embedding was built from the corpus and used as the embedding layer for each model.
Comments are included with the model outputs for context.

### Word2Vec
The Word2Vec embeddings were built with Gensim using skip-gram and a window of 3 context words on either side of the center word. Each word is mapped to a 100-dimensional vector.

```
skpgrm_model = Word2Vec(splitSentences, size=100, window=3, min_count=10, workers=4, sg=1, seed=42)
skpgrm_model.save("skpgrm_word2vec.model")
skpgrm_model.wv.save_word2vec_format('skpgrm_word2vec.embed')
skpgrm_model = gensim.models.KeyedVectors.load_word2vec_format('skpgrm_word2vec.embed')


skpgrm_model.get_vector('homework')
array([ 0.0133209 , -0.22285783,  0.15449187, -0.26800337,  0.34289762,
        0.130753  , -0.20269933,  0.05814106,  0.00448072,  0.10169426,
        0.2474094 ,  0.09451383,  0.38079834,  0.18994571, -0.4232437 ,
        0.11030207, -0.00634213,  0.04797819, -0.3738323 ,  0.16714379,
        0.29844195, -0.24661233, -0.10517423,  0.10951623, -0.23241755,
        0.2365318 ,  0.47039396, -0.03204299, -0.00554783, -0.1973574 ,
       -0.29617462,  0.29662815, -0.07635267, -0.20394094,  0.27186936,
       -0.03534871, -0.21222508,  0.07720903,  0.16229631, -0.18257399,
       -0.30284393, -0.05643161, -0.22181787,  0.32325214,  0.0381747 ,
       -0.04318884,  0.03069128,  0.11405341, -0.09964969, -0.11250107,
       -0.04921242,  0.1064731 , -0.16931188,  0.15360741, -0.31548744,
       -0.30717424, -0.13542418,  0.08737126,  0.1196891 , -0.37292108,
        0.2103726 , -0.37710935,  0.2584681 ,  0.12948185, -0.2982405 ,
        0.13304855, -0.23491664, -0.14256747,  0.14583384, -0.19630934,
       -0.16738173, -0.28595325,  0.3356258 ,  0.0494276 , -0.11742572,
        0.18208739,  0.18081163,  0.10677474, -0.02403796, -0.31364644,
        0.28436786,  0.1565238 , -0.34144282, -0.14650947,  0.09555672,
       -0.07998578, -0.01845908, -0.02704127, -0.00394414,  0.1833107 ,
       -0.03392621,  0.37270004,  0.06712456, -0.06364433,  0.42783794,
        0.19383486,  0.0445508 , -0.2795804 , -0.01937154, -0.06813719],
      dtype=float32)
```
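
To illustrate what `window=3` means for skip-gram training, the sketch below enumerates the (center, context) pairs for one cleaned tweet from the data. The helper `skipgram_pairs` is hypothetical and uses the full window at every position; Gensim's actual sampling may differ, for example by shrinking the window randomly.

```python
def skipgram_pairs(tokens, window=3):
    """Return (center, context) pairs for every word within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = ['is', 'so', 'sad', 'for', 'my', 'apl', 'friend']
pairs = skipgram_pairs(tokens, window=3)

# Context words the model would be asked to predict from the center word 'sad'.
print([context for center, context in pairs if center == 'sad'])
print(len(pairs))
```

Each pair is one training example: the model learns vectors under which the center word predicts its neighbors.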

The weight matrix from this embedding was passed into each network's embedding layer, and this layer was immediately followed by either a convolutional or a recurrent layer.
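
Conceptually, a frozen embedding layer is a table lookup: each token index selects one row of the weight matrix. The sketch below shows this with a hypothetical 4-word vocabulary and 3-dimensional vectors (the experiment uses a 6424 x 100 matrix); `embed` is an illustrative helper, not the Keras layer itself.

```python
def embed(indices, weight_matrix):
    """Replace each token index with its row of the embedding weight matrix."""
    return [weight_matrix[i] for i in indices]

# Hypothetical pretrained vectors, one row per vocabulary index.
weights = [
    [0.0, 0.1, 0.2],
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
]

sequence = [2, 0, 3, 0, 0]  # token indices, right-padded with 0
embedded = embed(sequence, weights)
print(len(embedded), len(embedded[0]))  # sequence length, embedding size
```

The output has one vector per time step, which is the `(20, 100)` shape the convolutional and recurrent layers receive in the experiment.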

### CNN
Building upon our last assignment, we start this analysis by building a CNN for Twitter sentiment classification.

#### Architecture
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_8 (Embedding)      (None, 20, 100)           642400    
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 16, 3)             1503      
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 3, 3)              0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 9)                 0         
_________________________________________________________________
dense_14 (Dense)             (None, 6424)              64240     
_________________________________________________________________
activation_14 (Activation)   (None, 6424)              0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 6424)              0         
_________________________________________________________________
dense_15 (Dense)             (None, 1)                 6425      
_________________________________________________________________
activation_15 (Activation)   (None, 1)                 0         
=================================================================
Total params: 714,568
Trainable params: 72,168
Non-trainable params: 642,400
_________________________________________________________________
```
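
The shapes and parameter counts in this summary can be checked by hand. The sketch below redoes the arithmetic using the settings from the appendix code (3 filters, kernel size 5, "valid" padding, pool size 5); the helper function names are hypothetical.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # Kernel weights plus one bias per filter.
    return kernel_size * in_channels * filters + filters

def conv1d_length(seq_len, kernel_size, stride=1):
    # Output length for "valid" padding.
    return (seq_len - kernel_size) // stride + 1

def dense_params(inputs, units):
    # Weight matrix plus one bias per unit.
    return inputs * units + units

seq_len, embed_dim, vocab = 20, 100, 6424

conv_len = conv1d_length(seq_len, kernel_size=5)  # 16
pool_len = conv_len // 5                          # 3 (non-overlapping pools of 5)
flat = pool_len * 3                               # 9 (3 filters)

trainable = (conv1d_params(5, embed_dim, 3)       # 1503
             + dense_params(flat, vocab)          # 64240
             + dense_params(vocab, 1))            # 6425
total = trainable + vocab * embed_dim             # frozen embedding adds 642400

print(conv_len, pool_len, flat)  # 16 3 9
print(trainable, total)          # 72168 714568
```

Nearly all of the parameters sit in the frozen embedding; the convolution itself contributes only 1,503 trainable weights.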

#### Training Epochs 
```
Train on 72077 samples, validate on 8009 samples
Epoch 1/10
72077/72077 [==============================] - 2s 29us/step - loss: 0.5961 - acc: 0.6666 - val_loss: 0.5367 - val_acc: 0.7189
Epoch 2/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5457 - acc: 0.7179 - val_loss: 0.5344 - val_acc: 0.7254
Epoch 3/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5323 - acc: 0.7279 - val_loss: 0.5123 - val_acc: 0.7410
Epoch 4/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5242 - acc: 0.7339 - val_loss: 0.5095 - val_acc: 0.7509
Epoch 5/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5187 - acc: 0.7383 - val_loss: 0.5141 - val_acc: 0.7399
Epoch 6/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5156 - acc: 0.7405 - val_loss: 0.5058 - val_acc: 0.7459
Epoch 7/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5125 - acc: 0.7425 - val_loss: 0.5036 - val_acc: 0.7515
Epoch 8/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5105 - acc: 0.7439 - val_loss: 0.5062 - val_acc: 0.7479
Epoch 9/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5093 - acc: 0.7450 - val_loss: 0.5023 - val_acc: 0.7497
Epoch 10/10
72077/72077 [==============================] - 1s 17us/step - loss: 0.5073 - acc: 0.7462 - val_loss: 0.5029 - val_acc: 0.7522
```
In previous runs, validation accuracy suggested that the CNN began overfitting around epoch 5. Here, however, the dropout layer appears to keep overfitting in check in the later epochs: validation loss stays close to 0.50 through epoch 10.

#### Evaluation
```
100/100 [==============================] - 2s 23ms/step
Loss 0.5184418559074402 
Accuracy 0.7400391697883606
```


### RNN

Next, a vanilla RNN is implemented to better harness the sequential format of the input data.
#### Architecture
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_7 (Embedding)      (None, 20, 100)           642400    
_________________________________________________________________
simple_rnn_5 (SimpleRNN)     (None, 100)               20100     
_________________________________________________________________
dense_12 (Dense)             (None, 6424)              648824    
_________________________________________________________________
activation_12 (Activation)   (None, 6424)              0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 6424)              0         
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 6425      
_________________________________________________________________
activation_13 (Activation)   (None, 1)                 0         
=================================================================
Total params: 1,317,749
Trainable params: 675,349
Non-trainable params: 642,400
_________________________________________________________________
```

#### Training Epochs 
```
Train on 72077 samples, validate on 8009 samples
Epoch 1/10
72077/72077 [==============================] - 6s 86us/step - loss: 0.5620 - acc: 0.7055 - val_loss: 0.5235 - val_acc: 0.7329
Epoch 2/10
72077/72077 [==============================] - 5s 73us/step - loss: 0.5272 - acc: 0.7339 - val_loss: 0.5131 - val_acc: 0.7449
Epoch 3/10
72077/72077 [==============================] - 5s 73us/step - loss: 0.5097 - acc: 0.7456 - val_loss: 0.5073 - val_acc: 0.7432
Epoch 4/10
72077/72077 [==============================] - 5s 72us/step - loss: 0.4984 - acc: 0.7532 - val_loss: 0.5040 - val_acc: 0.7520
Epoch 5/10
72077/72077 [==============================] - 5s 72us/step - loss: 0.4843 - acc: 0.7617 - val_loss: 0.5121 - val_acc: 0.7443
Epoch 6/10
72077/72077 [==============================] - 5s 72us/step - loss: 0.4701 - acc: 0.7703 - val_loss: 0.5127 - val_acc: 0.7390
Epoch 7/10
72077/72077 [==============================] - 6s 84us/step - loss: 0.4508 - acc: 0.7832 - val_loss: 0.5334 - val_acc: 0.7313
Epoch 8/10
72077/72077 [==============================] - 5s 73us/step - loss: 0.4318 - acc: 0.7956 - val_loss: 0.5345 - val_acc: 0.7463
Epoch 9/10
72077/72077 [==============================] - 5s 73us/step - loss: 0.4101 - acc: 0.8089 - val_loss: 0.5497 - val_acc: 0.7258
Epoch 10/10
72077/72077 [==============================] - 5s 72us/step - loss: 0.3882 - acc: 0.8219 - val_loss: 0.5804 - val_acc: 0.7331
```
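Here the dropout layer does not prevent overfitting in the later epochs: training loss keeps falling (from 0.4984 at epoch 4 to 0.3882 at epoch 10) while validation loss rises from 0.5040 to 0.5804.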
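
One way to read the log above is to track validation loss per epoch. The sketch below copies the ten `val_loss` values from the RNN training log and finds the best epoch, along with the epoch at which a simple patience rule would have stopped training. The helpers `best_epoch` and `early_stop_epoch` and the patience value are assumptions for illustration; the experiment itself did not use early stopping.

```python
# Validation loss per epoch, copied from the RNN training log above.
val_loss = [0.5235, 0.5131, 0.5073, 0.5040, 0.5121,
            0.5127, 0.5334, 0.5345, 0.5497, 0.5804]

def best_epoch(losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(losses)), key=losses.__getitem__) + 1

def early_stop_epoch(losses, patience=2):
    """Return the 1-based epoch at which training stops after `patience` epochs without a new best."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(losses)

print(best_epoch(val_loss))        # 4
print(early_stop_epoch(val_loss))  # 6
```

Validation loss bottoms out at epoch 4 and climbs afterward, which is the usual signature of a model starting to memorize the training set.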

#### Evaluation
```
100/100 [==============================] - 5s 52ms/step
loss 0.604823112487793 
accuracy 0.7234587669372559
```
The RNN does not perform as well as the CNN. This is likely due to the padding of the data: because the padding occurs at the end of the sequence, the information from the earlier non-padding tokens is diminished by the time we reach the output at time step 20.
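
The effect described above can be illustrated with a scalar tanh RNN, assuming the padded steps are not masked. The recurrent weight, input weight, and input values below are made up, and `final_state` is a hypothetical helper; the point is only that trailing zero inputs let the state decay toward zero.

```python
import math

def final_state(inputs, w=0.5, u=1.0):
    """Run the scalar RNN a_t = tanh(w * a_{t-1} + u * x_t) and return the last state."""
    a = 0.0
    for x_t in inputs:
        a = math.tanh(w * a + u * x_t)
    return a

tokens = [0.8, -0.3, 0.9, 0.4]                # hypothetical embedded inputs
padded = tokens + [0.0] * (20 - len(tokens))  # right-padded to 20 steps, no masking

print(final_state(tokens))  # state right after the last real token
print(final_state(padded))  # state after 16 extra all-zero steps
```

Whether this applies in full to the reported run depends on how the padded steps are handled, for example whether masking is active and whether index 0 maps to a zero vector.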

### LSTM

Last, an LSTM is implemented to reduce this information loss and pass valuable signal from earlier features forward.
#### Architecture
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_9 (Embedding)      (None, 20, 100)           642400    
_________________________________________________________________
lstm_2 (LSTM)                (None, 100)               80400     
_________________________________________________________________
dense_16 (Dense)             (None, 6424)              648824    
_________________________________________________________________
activation_16 (Activation)   (None, 6424)              0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 6424)              0         
_________________________________________________________________
dense_17 (Dense)             (None, 1)                 6425      
_________________________________________________________________
activation_17 (Activation)   (None, 1)                 0         
=================================================================
Total params: 1,378,049
Trainable params: 735,649
Non-trainable params: 642,400
_________________________________________________________________
```
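
The recurrent parameter counts follow a simple pattern: each gate has input weights, recurrent weights, and a bias, and an LSTM has four such blocks where a SimpleRNN has one. The sketch below checks the LSTM summary above and the earlier SimpleRNN summary; the helper names are hypothetical.

```python
def recurrent_params(units, input_dim, gates=1):
    """Input weights, recurrent weights, and a bias for each gate (or the single SimpleRNN transform)."""
    return gates * (units * (input_dim + units) + units)

def dense_params(inputs, units):
    # Weight matrix plus one bias per unit.
    return inputs * units + units

units, embed_dim, vocab = 100, 100, 6424

# Dense layers shared by both recurrent models.
head = dense_params(units, vocab) + dense_params(vocab, 1)

rnn = recurrent_params(units, embed_dim, gates=1)
lstm = recurrent_params(units, embed_dim, gates=4)  # forget, input, output, candidate

print(rnn, rnn + head)    # 20100 675349
print(lstm, lstm + head)  # 80400 735649
```

The LSTM layer has exactly four times the parameters of the SimpleRNN layer, yet both models are dominated by the 648,824-parameter dense layer that follows.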

#### Training Epochs 
```
Epoch 1/10
72077/72077 [==============================] - 15s 208us/step - loss: 0.5458 - acc: 0.7177 - val_loss: 0.5067 - val_acc: 0.7449
Epoch 2/10
72077/72077 [==============================] - 13s 183us/step - loss: 0.5148 - acc: 0.7400 - val_loss: 0.4941 - val_acc: 0.7515
Epoch 3/10
72077/72077 [==============================] - 12s 173us/step - loss: 0.5032 - acc: 0.7492 - val_loss: 0.4852 - val_acc: 0.7581
Epoch 4/10
72077/72077 [==============================] - 13s 183us/step - loss: 0.4933 - acc: 0.7562 - val_loss: 0.4841 - val_acc: 0.7604
Epoch 5/10
72077/72077 [==============================] - 12s 173us/step - loss: 0.4853 - acc: 0.7609 - val_loss: 0.4779 - val_acc: 0.7679
Epoch 6/10
72077/72077 [==============================] - 12s 172us/step - loss: 0.4772 - acc: 0.7653 - val_loss: 0.4763 - val_acc: 0.7705
Epoch 7/10
72077/72077 [==============================] - 12s 173us/step - loss: 0.4707 - acc: 0.7708 - val_loss: 0.4850 - val_acc: 0.7621
Epoch 8/10
72077/72077 [==============================] - 14s 191us/step - loss: 0.4636 - acc: 0.7748 - val_loss: 0.4790 - val_acc: 0.7669
Epoch 9/10
72077/72077 [==============================] - 12s 172us/step - loss: 0.4558 - acc: 0.7805 - val_loss: 0.4777 - val_acc: 0.7670
Epoch 10/10
72077/72077 [==============================] - 12s 173us/step - loss: 0.4460 - acc: 0.7854 - val_loss: 0.4735 - val_acc: 0.7665
```
#### Evaluation
```
100/100 [==============================] - 13s 127ms/step
Loss 0.48326265811920166
Accuracy 0.7662161588668823
```
As expected, the LSTM outperforms both the CNN and the RNN (dev accuracy of 0.766 versus 0.740 and 0.723). This is likely because the LSTM can carry information from earlier vectors in the sequence forward and use it in its prediction.




## Sources:
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, *Deep Learning*, 2016, http://www.deeplearningbook.org/

Christopher Olah, "Understanding LSTM Networks", 2015, https://colah.github.io/posts/2015-08-Understanding-LSTMs/

## Appendix: Experiment Code

### Data Import, Cleaning and W2V Creation

In [1]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving train.csv to train.csv
User uploaded file "train.csv" with length 8664015 bytes


In [2]:
import pandas as pd
import re
import gensim
from gensim.models import Word2Vec
from sklearn.decomposition import PCA
from matplotlib import pyplot
import numpy as np
np.random.seed(42)

paramiko missing, opening SSH/SCP/SFTP paths will be disabled.  `pip install paramiko` to suppress


In [3]:
traindf = pd.read_csv('train.csv', encoding='latin-1')
traindf.head()

| | ItemID | Sentiment | SentimentText |
|---|---|---|---|
| 0 | 1 | 0 | is so sad for my APL frie... |
| 1 | 2 | 0 | I missed the New Moon trail... |
| 2 | 3 | 1 | omg its already 7:30 :O |
| 3 | 4 | 0 | .. Omgaga. Im sooo im gunna CRy. I'... |
| 4 | 5 | 0 | i think mi bf is cheating on me!!! ... |


In [0]:
sentences = traindf['SentimentText']

In [0]:
#remove @ mentions 
sentences = [re.sub(r'@[A-Za-z0-9]+','',str(sentence)) for sentence in sentences]
#remove urls
sentences = [re.sub(r'https?://[A-Za-z0-9./]+','',str(sentence)) for sentence in sentences]
#remove non-letters  
sentences = [re.sub(r'[^a-zA-Z]',' ',str(sentence)) for sentence in sentences]
#lower
sentences = [sentence.lower() for sentence in sentences]
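
The four cleaning steps above can be wrapped in one function for testing on a single string. This is a sketch using the same regular expressions; the function name `clean_tweet` and the example tweet are hypothetical.

```python
import re

def clean_tweet(text):
    """Apply the four cleaning steps in order: mentions, URLs, non-letters, lowercase."""
    text = re.sub(r'@[A-Za-z0-9]+', '', text)            # drop @mentions
    text = re.sub(r'https?://[A-Za-z0-9./]+', '', text)  # drop URLs
    text = re.sub(r'[^a-zA-Z]', ' ', text)               # every non-letter becomes a space
    return text.lower()

raw = "@friend omg its already 7:30 :O http://t.co/abc"  # made-up tweet
print(clean_tweet(raw).split())
```

Order matters here: mentions and URLs have to be removed before the non-letter pass, otherwise their letters would survive as stray tokens.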

In [6]:
traindf['cleanSentimentText'] = sentences
traindf.head()

| | ItemID | Sentiment | SentimentText | cleanSentimentText |
|---|---|---|---|---|
| 0 | 1 | 0 | is so sad for my APL frie... | is so sad for my apl frie... |
| 1 | 2 | 0 | I missed the New Moon trail... | i missed the new moon trail... |
| 2 | 3 | 1 | omg its already 7:30 :O | omg its already o |
| 3 | 4 | 0 | .. Omgaga. Im sooo im gunna CRy. I'... | omgaga im sooo im gunna cry i ... |
| 4 | 5 | 0 | i think mi bf is cheating on me!!! ... | i think mi bf is cheating on me ... |


In [7]:
line = traindf['cleanSentimentText'][0]
print(line)
split = line.split()
print(split)

                     is so sad for my apl friend             
['is', 'so', 'sad', 'for', 'my', 'apl', 'friend']


### Create Word 2 Vec

In [0]:
# define training data
sentences = traindf['cleanSentimentText']
splitSentences = [sentence.split() for sentence in sentences]
traindf['splitSentences'] = splitSentences
# train models
skpgrm_model = Word2Vec(splitSentences, size=100, window=3, min_count=10, workers=4, sg=1, seed=42)
skpgrm_model.save("skpgrm_word2vec.model")


cbow_model = Word2Vec(splitSentences, size=100, window=3, min_count=10, workers=4, sg=0, seed=42)
cbow_model.save("cbow_word2vec.model")

In [0]:
skpgrm_model.wv.save_word2vec_format('skpgrm_word2vec.embed')
cbow_model.wv.save_word2vec_format('cbow_word2vec.embed')

In [0]:
import gensim
skpgrm_model = gensim.models.KeyedVectors.load_word2vec_format('skpgrm_word2vec.embed')

In [0]:
cbow_model = gensim.models.KeyedVectors.load_word2vec_format('cbow_word2vec.embed')

In [12]:
word = "pad"  # for any word in model
i = skpgrm_model.vocab[word].index
skpgrm_model.index2word[i] 
print(i)

3591


In [13]:
i = 0
word = skpgrm_model.index2word[i] 
print(word)

i


In [66]:
skpgrm_model.get_vector('homework')

array([ 0.0133209 , -0.22285783,  0.15449187, -0.26800337,  0.34289762,
        0.130753  , -0.20269933,  0.05814106,  0.00448072,  0.10169426,
        0.2474094 ,  0.09451383,  0.38079834,  0.18994571, -0.4232437 ,
        0.11030207, -0.00634213,  0.04797819, -0.3738323 ,  0.16714379,
        0.29844195, -0.24661233, -0.10517423,  0.10951623, -0.23241755,
        0.2365318 ,  0.47039396, -0.03204299, -0.00554783, -0.1973574 ,
       -0.29617462,  0.29662815, -0.07635267, -0.20394094,  0.27186936,
       -0.03534871, -0.21222508,  0.07720903,  0.16229631, -0.18257399,
       -0.30284393, -0.05643161, -0.22181787,  0.32325214,  0.0381747 ,
       -0.04318884,  0.03069128,  0.11405341, -0.09964969, -0.11250107,
       -0.04921242,  0.1064731 , -0.16931188,  0.15360741, -0.31548744,
       -0.30717424, -0.13542418,  0.08737126,  0.1196891 , -0.37292108,
        0.2103726 , -0.37710935,  0.2584681 ,  0.12948185, -0.2982405 ,
        0.13304855, -0.23491664, -0.14256747,  0.14583384, -0.19

In [0]:
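# Map each token to its Word2Vec vocabulary index.
# Tokens dropped by min_count=10 are not in the vocabulary and are skipped.
sentenceIndexs = []
for line in traindf['splitSentences']:
  index = []
  for word in line:
    if word in skpgrm_model.vocab:
      index.append(skpgrm_model.vocab[word].index)
  sentenceIndexs.append(index)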
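
The same lookup can be tried without a trained model by using a plain dictionary as the vocabulary. In the sketch below, the indices are taken from the second row of the DataFrame printed later in this appendix; `to_indices` and the out-of-vocabulary token are hypothetical.

```python
def to_indices(tokens, vocab_index):
    """Map tokens to vocabulary indices, dropping any token that is not in the vocabulary."""
    return [vocab_index[token] for token in tokens if token in vocab_index]

# Word-to-index pairs as they appear in the DataFrame output of this appendix.
vocab_index = {'i': 0, 'missed': 239, 'the': 2, 'new': 106, 'moon': 1032, 'trailer': 1638}

# The last token is a made-up out-of-vocabulary word.
tokens = ['i', 'missed', 'the', 'new', 'moon', 'trailer', 'qwertyuiop']
print(to_indices(tokens, vocab_index))
```

Dropping unknown words shortens the sequence, so the index list can be shorter than the token list it came from.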

In [0]:
traindf['sentenceIndex'] = sentenceIndexs

In [0]:
# Right-pad every index sequence with 0 up to the longest sequence, then keep the first 20 items.
# Note: 0 is also the vocabulary index of the word 'i' (see In [13]), so padding and 'i' share an id.
traindf_features_list = traindf['sentenceIndex'].values
traindf_features_np = np.array([np.array(line) for line in traindf_features_list])
max_len = np.max([len(a) for a in traindf_features_np])
traindf_features_np = [np.pad(a, (0, max_len - len(a)), 'constant', constant_values=0) for a in traindf_features_np]
features = []
for line in traindf_features_np:
  features.append(line[:20]) # trim the padded list items to the first 20 elements


In [20]:
traindf['features']=features
featuresLength = []
for line in traindf['features']:
  length = len(line)
  featuresLength.append(length)
 
traindf['featuresLength']=featuresLength
traindf['label']=traindf['Sentiment']
traindf.head()

| | ItemID | Sentiment | SentimentText | cleanSentimentText | splitSentences | sentenceIndex | features | featuresLength | label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | is so sad for my APL frie... | is so sad for my apl frie... | [is, so, sad, for, my, apl, friend] | [12, 18, 121, 10, 7, 254] | [12, 18, 121, 10, 7, 254, 0, 0, 0, 0, 0, 0, 0,... | 20 | 0 |
| 1 | 2 | 0 | I missed the New Moon trail... | i missed the new moon trail... | [i, missed, the, new, moon, trailer] | [0, 239, 2, 106, 1032, 1638] | [0, 239, 2, 106, 1032, 1638, 0, 0, 0, 0, 0, 0,... | 20 | 0 |
| 2 | 3 | 1 | omg its already 7:30 :O | omg its already o | [omg, its, already, o] | [235, 80, 199, 212] | [235, 80, 199, 212, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | 20 | 1 |
| 3 | 4 | 0 | .. Omgaga. Im sooo im gunna CRy. I'... | omgaga im sooo im gunna cry i ... | [omgaga, im, sooo, im, gunna, cry, i, ve, been... | [73, 522, 73, 1743, 548, 0, 100, 92, 34, 42, 2... | [73, 522, 73, 1743, 548, 0, 100, 92, 34, 42, 2... | 20 | 0 |
| 4 | 5 | 0 | i think mi bf is cheating on me!!! ... | i think mi bf is cheating on me ... | [i, think, mi, bf, is, cheating, on, me, t, t] | [0, 71, 2118, 1400, 12, 4066, 17, 14, 11, 11] | [0, 71, 2118, 1400, 12, 4066, 17, 14, 11, 11, ... | 20 | 0 |


In [21]:
modeldf = traindf[['features','label']]
modeldf.head()

| | features | label |
|---|---|---|
| 0 | [12, 18, 121, 10, 7, 254, 0, 0, 0, 0, 0, 0, 0,... | 0 |
| 1 | [0, 239, 2, 106, 1032, 1638, 0, 0, 0, 0, 0, 0,... | 0 |
| 2 | [235, 80, 199, 212, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | 1 |
| 3 | [73, 522, 73, 1743, 548, 0, 100, 92, 34, 42, 2... | 0 |
| 4 | [0, 71, 2118, 1400, 12, 4066, 17, 14, 11, 11, ... | 0 |


In [0]:
msk = np.random.rand(len(traindf)) < 0.8
train = modeldf[msk]
dev = modeldf[~msk]
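
The mask-based split above assigns each row independently, so the resulting sizes are only approximately 80/20. The sketch below reproduces the idea with the standard library on a stand-in list of rows; `mask_split` and the row data are hypothetical.

```python
import random

def mask_split(rows, train_fraction=0.8, seed=42):
    """Assign each row to train with probability train_fraction, otherwise to dev."""
    rng = random.Random(seed)
    mask = [rng.random() < train_fraction for _ in rows]
    train = [row for row, keep in zip(rows, mask) if keep]
    dev = [row for row, keep in zip(rows, mask) if not keep]
    return train, dev

rows = list(range(1000))  # stand-in for DataFrame rows
train, dev = mask_split(rows)
print(len(train), len(dev))
```

Every row lands in exactly one of the two sets, which keeps the dev set unseen during training.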

In [0]:
trainFeatList = train['features'].values
train_x=np.array([np.array(y) for y in trainFeatList])
train_y=np.array(train['label'])

devFeatList = dev['features'].values
dev_x=np.array([np.array(y) for y in devFeatList])
dev_y=np.array(dev['label'])

In [24]:
print(train_x)

[[1.200e+01 1.800e+01 1.210e+02 ... 0.000e+00 0.000e+00 0.000e+00]
 [2.350e+02 8.000e+01 1.990e+02 ... 0.000e+00 0.000e+00 0.000e+00]
 [7.300e+01 5.220e+02 7.300e+01 ... 5.562e+03 3.160e+02 1.700e+01]
 ...
 [3.700e+01 3.350e+02 1.345e+03 ... 1.020e+02 1.720e+02 2.000e+00]
 [1.050e+02 1.050e+02 0.000e+00 ... 0.000e+00 0.000e+00 0.000e+00]
 [7.200e+01 1.050e+02 1.000e+00 ... 0.000e+00 0.000e+00 0.000e+00]]


### Keras Model Building

In [0]:
vocab_size = 6424
seq_size = 20
emdedding_size = 100

In [0]:
from keras.layers.convolutional import Conv1D
from keras.layers.pooling import MaxPooling1D
from keras.layers.recurrent import LSTM, SimpleRNN
from keras.layers.embeddings import Embedding
from keras.models import Model, Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.initializers import Constant

In [0]:
CNNet = Sequential()
CNNet.add(Embedding(vocab_size,
                    emdedding_size,
                    weights=[skpgrm_model.vectors],
                    input_length = 20,
                    #mask_zero=True,
                    trainable=False))
CNNet.add(Conv1D(filters=3,
                         kernel_size=5,
                         padding="valid",
                         activation="relu",
                         strides=1))
CNNet.add(MaxPooling1D(pool_size=5))
CNNet.add(Flatten())
CNNet.add(Dense(units=vocab_size))
CNNet.add(Activation('relu'))
CNNet.add(Dropout(rate=0.25, seed=42))
CNNet.add(Dense(units=1))
CNNet.add(Activation('sigmoid'))
CNNet.compile(optimizer='adam', loss='binary_crossentropy', 
              metrics=['accuracy'])

In [48]:
CNNet.reset_states()
CNNet.fit(train_x, train_y, 
  batch_size=250,
  epochs=10,
  verbose=1,
  validation_split=0.1,
  shuffle=True)

Train on 72077 samples, validate on 8009 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f7d38183a90>

In [45]:
RNNet = Sequential()
RNNet.add(Embedding(vocab_size,
                    emdedding_size,
                    weights=[skpgrm_model.vectors],
                    input_length = 20,
                    mask_zero=True,
                    trainable=False))
RNNet.add(SimpleRNN(units=emdedding_size))  # input shape (20, 100) is inferred from the Embedding layer
RNNet.add(Dense(units=vocab_size))
RNNet.add(Activation('relu'))
RNNet.add(Dropout(rate=0.25, seed=42))
RNNet.add(Dense(units=1))
RNNet.add(Activation('sigmoid'))
RNNet.compile(optimizer='adam', loss='binary_crossentropy', 
              metrics=['accuracy'])

  
  


In [46]:
RNNet.reset_states()
RNNet.fit(train_x, train_y, 
  batch_size=250,
  epochs=10,
  verbose=1,
  validation_split=0.1,
  shuffle=True)

Train on 72077 samples, validate on 8009 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f7d38964208>

In [49]:
LSTMNet = Sequential()
LSTMNet.add(Embedding(vocab_size,
                    emdedding_size,
                    weights=[skpgrm_model.vectors],
                    input_length = 20,
                    mask_zero=True,
                    trainable=False))
LSTMNet.add(LSTM(units=emdedding_size))  # input shape (20, 100) is inferred from the Embedding layer
LSTMNet.add(Dense(units=vocab_size))
LSTMNet.add(Activation('relu'))
LSTMNet.add(Dropout(rate=0.25, seed=42))
LSTMNet.add(Dense(units=1))
LSTMNet.add(Activation('sigmoid'))
LSTMNet.compile(optimizer='adam', loss='binary_crossentropy', 
              metrics=['accuracy'])

  
  


In [50]:
LSTMNet.reset_states()
LSTMNet.fit(train_x, train_y, 
  batch_size=250,
  epochs=10,
  verbose=1,
  validation_split=0.1,
  shuffle=True)

Train on 72077 samples, validate on 8009 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f7d381f2550>

In [58]:
CNNet.evaluate(dev_x, dev_y,  
                   verbose=1, 
                   steps=100)



[0.5184418559074402, 0.7400391697883606]

In [61]:
CNNet.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, 20, 100)           642400    
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 16, 3)             1503      
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 3, 3)              0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 9)                 0         
_________________________________________________________________
dense_14 (Dense)             (None, 6424)              64240     
_________________________________________________________________
activation_14 (Activation)   (None, 6424)              0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 6424)              0         
__________

In [59]:
RNNet.evaluate(dev_x, dev_y,  
                   verbose=1, 
                   steps=100)



[0.604823112487793, 0.7234587669372559]

In [62]:
RNNet.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_7 (Embedding)      (None, 20, 100)           642400    
_________________________________________________________________
simple_rnn_5 (SimpleRNN)     (None, 100)               20100     
_________________________________________________________________
dense_12 (Dense)             (None, 6424)              648824    
_________________________________________________________________
activation_12 (Activation)   (None, 6424)              0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 6424)              0         
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 6425      
_________________________________________________________________
activation_13 (Activation)   (None, 1)                 0         
Total para

In [60]:
LSTMNet.evaluate(dev_x, dev_y,  
                   verbose=1, 
                   steps=100)



[0.48326265811920166, 0.7662161588668823]

In [63]:
LSTMNet.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_9 (Embedding)      (None, 20, 100)           642400    
_________________________________________________________________
lstm_2 (LSTM)                (None, 100)               80400     
_________________________________________________________________
dense_16 (Dense)             (None, 6424)              648824    
_________________________________________________________________
activation_16 (Activation)   (None, 6424)              0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 6424)              0         
_________________________________________________________________
dense_17 (Dense)             (None, 1)                 6425      
_________________________________________________________________
activation_17 (Activation)   (None, 1)                 0         
Total para