# Sentiment analysis with TFLearn

In this notebook, we'll continue Andrew Trask's work by building a network for sentiment analysis on the movie review data. Instead of a network written with Numpy, we'll be using [TFLearn](http://tflearn.org/), a high-level library built on top of TensorFlow. TFLearn makes it simpler to build networks just by defining the layers. It takes care of most of the details for you.

We'll start off by importing all the modules we'll need, then load and prepare the data.

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
import tflearn
from tflearn.data_utils import to_categorical

## Preparing the data

Following along with Andrew, our goal here is to convert our reviews into word vectors. The word vectors will have elements representing words in the total vocabulary. If the second position represents the word 'the', for each review we'll count up the number of times 'the' appears in the text and set the second position to that count. I'll show you examples as we build the input data from the reviews data. Check out Andrew's notebook and video for more about this.
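To make the idea concrete, here is a minimal sketch on a made-up four-word vocabulary. The names `toy_vocab`, `toy_review`, and `toy_vector` are hypothetical and are not part of the notebook's data. The second position stands for 'the', as in the description above.

```python
import numpy as np

# Hypothetical toy vocabulary: position i of the vector counts toy_vocab[i]
toy_vocab = ['a', 'the', 'movie', 'good']
toy_review = 'the movie was good and the cast was good'

# Start with all zeros, then count every word that is in the vocabulary
toy_vector = np.zeros(len(toy_vocab), dtype=np.int_)
for word in toy_review.split(' '):
    if word in toy_vocab:
        toy_vector[toy_vocab.index(word)] += 1

print(toy_vector)  # [0 2 1 2]
```

Words that are not in the vocabulary ('was', 'and', 'cast') are simply ignored.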

### Read the data

Use the pandas library to read the reviews and positive/negative labels from comma-separated files. The data we're using has already been preprocessed a bit and we know it uses only lower case characters. If we were working from raw data, where we didn't know it was all lower case, we would want to add a step here to convert it. That's so we treat different variations of the same word, like `The`, `the`, and `THE`, all the same way.

In [2]:
reviews = pd.read_csv('reviews.txt', header=None)
labels = pd.read_csv('labels.txt', header=None)

### Counting word frequency

To start off we'll need to count how often each word appears in the data. We'll use this count to create a vocabulary we'll use to encode the review data. This resulting count is known as a [bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model). We'll use it to select our vocabulary and build the word vectors. You should have seen how to do this in Andrew's lesson. Try to implement it here using the [Counter class](https://docs.python.org/2/library/collections.html#collections.Counter).

> **Exercise:** Create the bag of words from the reviews data and assign it to `total_counts`. The reviews are stored in the `reviews` [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). If you want the reviews as a Numpy array, use `reviews.values`. You can iterate through the rows in the DataFrame with `for idx, row in reviews.iterrows():` ([documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html)). When you break up the reviews into words, use `.split(' ')` instead of `.split()` so your results match ours.

In [3]:
from collections import Counter

total_counts = Counter()

for idx, row in reviews.iterrows():
    for review in row.values:
        for word in review.split(' '):
            total_counts[word] += 1

print("Total words in data set: ", len(total_counts))

Total words in data set:  74074


Let's keep the 10000 most frequent words. As Andrew noted, most of the words in the vocabulary are rarely used, so they will have little effect on our predictions. Below, we'll sort the words in `total_counts` by their count and keep the 10000 most frequent ones as `vocab`.

In [4]:
vocab = sorted(total_counts, key=total_counts.get, reverse=True)[:10000]
print(vocab[:60])

['', 'the', '.', 'and', 'a', 'of', 'to', 'is', 'br', 'it', 'in', 'i', 'this', 'that', 's', 'was', 'as', 'for', 'with', 'movie', 'but', 'film', 'you', 'on', 't', 'not', 'he', 'are', 'his', 'have', 'be', 'one', 'all', 'at', 'they', 'by', 'an', 'who', 'so', 'from', 'like', 'there', 'her', 'or', 'just', 'about', 'out', 'if', 'has', 'what', 'some', 'good', 'can', 'more', 'she', 'when', 'very', 'up', 'time', 'no']


What's the last word in our vocabulary? We can use this to judge if 10000 is too few. If the last word is pretty common, we probably need to keep more words.

In [5]:
print(vocab[-1], ': ', total_counts[vocab[-1]])

winded :  30


The last word in our vocabulary appears only 30 times across all 25000 reviews. I think it's fair to say this is a tiny proportion of the data. We are probably fine with this number of words.

**Note:** When you run, you may see a different word from the one shown above, but it will also have the value `30`. That's because there are many words tied for that number of counts, and the `Counter` class does not guarantee which one will be returned in the case of a tie.
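If you want the same vocabulary on every run, one option is to break ties explicitly. The sketch below is an assumption for illustration, not what the notebook does: it sorts a made-up `toy_counts` by descending count and then alphabetically.

```python
from collections import Counter

# Hypothetical counts with a three-way tie at 30
toy_counts = Counter({'winded': 30, 'abrupt': 30, 'the': 500, 'zany': 30})

# Sort by descending count first, then alphabetically to break ties
toy_vocab = sorted(toy_counts, key=lambda w: (-toy_counts[w], w))

print(toy_vocab)  # ['the', 'abrupt', 'winded', 'zany']
```

With this key, the tied words always come out in the same order, so the last word of a truncated vocabulary no longer depends on the order in which words were first seen.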

Now for each review in the data, we'll make a word vector. First we need to make a mapping of word to index, pretty easy to do with a dictionary comprehension.

> **Exercise:** Create a dictionary called `word2idx` that maps each word in the vocabulary to an index. The first word in `vocab` has index `0`, the second word has index `1`, and so on.
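As a small illustration of the dictionary comprehension mentioned above, here is the same mapping built for a made-up vocabulary (`toy_vocab` and `toy_word2idx` are hypothetical names):

```python
# Hypothetical four-entry vocabulary, ordered from most to least frequent
toy_vocab = ['', 'the', '.', 'and']

# enumerate yields (index, word) pairs; the comprehension flips them into word -> index
toy_word2idx = {word: index for index, word in enumerate(toy_vocab)}

print(toy_word2idx['the'])  # 1
```

This is equivalent to filling an empty dictionary in a `for` loop, just more compact.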

In [6]:
word2idx = {}

for index, word in enumerate(vocab):
    word2idx[word] = index

for key in word2idx.keys():
    print(key, word2idx[key])
    

 0
are 27
semi 2365
new 163
unfunny 1939
apprentice 9122
rational 7008
mesmerizing 5982
winners 8460
properly 2860
...

### Text to vector function

Now we can write a function that converts some text to a word vector. The function will take a string of words as input and return a vector with the words counted up. Here's the general algorithm to do this:

* Initialize the word vector with [np.zeros](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html), it should be the length of the vocabulary.
* Split the input string of text into a list of words with `.split(' ')`. Again, if you call `.split()` instead, you'll get slightly different results than what we show here.
* For each word in that list, increment the element in the index associated with that word, which you get from `word2idx`.

**Note:** Since not every word is in the vocabulary, you'll get a key error if you look up one of the missing words directly. You can use the `.get` method of the `word2idx` dictionary to specify a default return value for missing keys. For example, `word2idx.get(word, None)` returns `None` if `word` doesn't exist in the dictionary.
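A quick sketch of that lookup pattern on a made-up two-word dictionary (`toy_word2idx` is hypothetical):

```python
toy_word2idx = {'the': 0, 'movie': 1}  # hypothetical two-word vocabulary

indices = []
for word in 'the movie rocked'.split(' '):
    # .get returns None instead of raising KeyError for unknown words
    index = toy_word2idx.get(word, None)
    if index is not None:
        indices.append(index)

print(indices)  # [0, 1]
```

The word 'rocked' is not in the dictionary, so it is skipped rather than causing an error.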

In [7]:
def text_to_vector(text):
    # One slot per vocabulary word; the result is a 1-D vector of counts
    word_vector = np.zeros(len(vocab))

    for word in text.split(' '):
        # .get returns None for words outside the vocabulary, so we skip them
        index = word2idx.get(word, None)
        if index is not None:
            word_vector[index] += 1
    return word_vector


If you do this right, the following code should return

```
text_to_vector('The tea is for a party to celebrate '
               'the movie so she has no time for a cake')[:65]
                   
array([0, 1, 0, 0, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])
```

In [8]:
a = text_to_vector('The tea is for a party to celebrate '
               'the movie so she has no time for a cake')
print(a[:65])

[ 0.  1.  0.  0.  2.  0.  1.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  2.
  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.
  1.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.]


Now, run through our entire review data set and convert each review to a word vector.

In [9]:
word_vectors = np.zeros((len(reviews), len(vocab)), dtype=np.int_)
for ii, (_, text) in enumerate(reviews.iterrows()):
    word_vectors[ii] = text_to_vector(text[0])

In [10]:
# Printing out the first 5 word vectors
word_vectors[:5, :23]

array([[ 18,   9,  27,   1,   4,   4,   6,   4,   0,   2,   2,   5,   0,
          4,   1,   0,   2,   0,   0,   0,   0,   0,   0],
       [  5,   4,   8,   1,   7,   3,   1,   2,   0,   4,   0,   0,   0,
          1,   2,   0,   0,   1,   3,   0,   0,   0,   1],
       [ 78,  24,  12,   4,  17,   5,  20,   2,   8,   8,   2,   1,   1,
          2,   8,   0,   5,   5,   4,   0,   2,   1,   4],
       [167,  53,  23,   0,  22,  23,  13,  14,   8,  10,   8,  12,   9,
          4,  11,   2,  11,   5,  11,   0,   5,   3,   0],
       [ 19,  10,  11,   4,   6,   2,   2,   5,   0,   1,   2,   3,   1,
          0,   0,   0,   3,   1,   0,   1,   0,   0,   0]])

### Train, Validation, Test sets

Now that we have `word_vectors`, we're ready to split our data into train, validation, and test sets. Remember that we train on the train data, use the validation data to set the hyperparameters, and at the very end measure the network performance on the test data. Here we're using the function `to_categorical` from TFLearn to reshape the target data so that we'll have two output units and can classify with a softmax activation function. We won't actually create the validation set here; TFLearn will do that for us later.
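To show what this reshaping of the targets means, here is a plain NumPy stand-in. It is a sketch of one-hot encoding under the assumption that labels are integers `0` or `1`; the helper `one_hot` is hypothetical and is not TFLearn's implementation.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Hypothetical helper: turn integer labels into one-hot rows."""
    # Flatten first so a column of labels is treated the same as a flat list
    labels = np.asarray(labels, dtype=np.int_).reshape(-1)
    encoded = np.zeros((len(labels), n_classes))
    # Row i gets a 1 in the column given by labels[i]
    encoded[np.arange(len(labels)), labels] = 1.
    return encoded

toy_labels = [1, 0, 1]  # 1 = positive, 0 = negative
toy_targets = one_hot(toy_labels, 2)
print(toy_targets)
```

Each row has exactly one `1`: the first column marks a negative review and the second a positive one.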

In [11]:
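# Convert the 'positive'/'negative' labels to 1/0
Y = (labels=='positive').astype(np.int_)
records = len(labels)

# Shuffle the row indices, then keep 90% for training and 10% for testing
shuffle = np.arange(records)
np.random.shuffle(shuffle)
train_fraction = 0.9
split_idx = int(records*train_fraction)

train_split, test_split = shuffle[:split_idx], shuffle[split_idx:]
# Pass flat (1-D) label arrays so to_categorical produces one-hot rows
trainX, trainY = word_vectors[train_split,:], to_categorical(Y.values[train_split, 0], 2)
testX, testY = word_vectors[test_split,:], to_categorical(Y.values[test_split, 0], 2)

In [12]:
trainY

Each row of `trainY` should be one-hot, with a single 1 in either the first (negative) or second (positive) column. If every row reads `[1., 1.]`, the labels were passed to `to_categorical` as a two-dimensional column instead of a flat array.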

## Building the network

[TFLearn](http://tflearn.org/) lets you build the network by [defining the layers](http://tflearn.org/layers/core/). 

### Input layer

For the input layer, you just need to tell it how many units you have. For example, 

```
net = tflearn.input_data([None, 100])
```

would create a network with 100 input units. The first element in the list, `None` in this case, sets the batch size. Setting it to `None` here leaves it at the default batch size.

The number of inputs to your network needs to match the size of your data. For this example, we're using 10000 element long vectors to encode our input data, so we need 10000 input units.


### Adding layers

To add new hidden layers, you use 

```
net = tflearn.fully_connected(net, n_units, activation='ReLU')
```

This adds a fully connected layer where every unit in the previous layer is connected to every unit in this layer. The first argument `net` is the network you created in the `tflearn.input_data` call. It's telling the network to use the output of the previous layer as the input to this layer. You can set the number of units in the layer with `n_units`, and set the activation function with the `activation` keyword. You can keep adding layers to your network by repeatedly calling `net = tflearn.fully_connected(net, n_units)`.
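For intuition, a fully connected layer with a ReLU activation boils down to a matrix multiplication, a bias, and a clipping at zero. The NumPy sketch below uses made-up sizes and random weights (all names are hypothetical); it illustrates the computation, not how TFLearn implements it.

```python
import numpy as np

rng = np.random.RandomState(0)

x = rng.rand(4, 10)           # a batch of 4 examples with 10 input units
W = rng.randn(10, 5) * 0.1    # weights connecting every input to each of 5 hidden units
b = np.zeros(5)               # one bias per hidden unit

# Fully connected layer followed by ReLU: negative values become 0
hidden = np.maximum(0, np.dot(x, W) + b)

print(hidden.shape)  # (4, 5)
```

Stacking another layer just means feeding `hidden` into the same computation with a new weight matrix.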

### Output layer

The last layer you add is used as the output layer. Therefore, you need to set the number of units to match the target data. In this case we are predicting two classes, positive or negative sentiment. You also need to set the activation function so it's appropriate for your model. Again, we're trying to predict if some input data belongs to one of two classes, so we should use softmax.

```
net = tflearn.fully_connected(net, 2, activation='softmax')
```
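As a reminder of what softmax does with the two output units, here is a small NumPy sketch. The function and the example values are illustrative assumptions, not TFLearn code.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    # Normalize so each row sums to 1
    return exps / exps.sum(axis=1, keepdims=True)

toy_logits = np.array([[2.0, 0.0],   # first unit clearly larger
                       [0.0, 0.0]])  # both units equal
probs = softmax(toy_logits)
print(probs)
```

Each row sums to 1, so the two outputs can be read as the probabilities of the two classes. The second row comes out as `[0.5, 0.5]` because both raw outputs are equal.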

### Training
To set how you train the network, use 

```
net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')
```

Again, this is passing in the network you've been building. The keywords: 

* `optimizer` sets the training method, here stochastic gradient descent
* `learning_rate` is the learning rate
* `loss` determines how the network error is calculated, in this example with the categorical cross-entropy.
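To give a feel for the categorical cross-entropy, the NumPy sketch below computes it for made-up predictions. The function name and the `eps` clipping are assumptions for illustration; TFLearn's own implementation may differ in detail.

```python
import numpy as np

def categorical_crossentropy(probs, targets, eps=1e-12):
    # Clip to avoid taking log(0)
    probs = np.clip(probs, eps, 1.0)
    # Average over the batch of -sum(target * log(predicted probability))
    return -np.mean(np.sum(targets * np.log(probs), axis=1))

toy_targets = np.array([[0., 1.], [1., 0.]])   # one-hot labels
confident = np.array([[0.1, 0.9], [0.8, 0.2]]) # mostly right
unsure = np.array([[0.5, 0.5], [0.5, 0.5]])    # no preference

loss_confident = categorical_crossentropy(confident, toy_targets)
loss_unsure = categorical_crossentropy(unsure, toy_targets)
print(loss_confident, loss_unsure)
```

The loss is lower when the network puts more probability on the correct class, which is what the optimizer tries to achieve during training.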

Finally you put all this together to create the model with `tflearn.DNN(net)`. So it ends up looking something like 

```
net = tflearn.input_data([None, 10])                          # Input
net = tflearn.fully_connected(net, 5, activation='ReLU')      # Hidden
net = tflearn.fully_connected(net, 2, activation='softmax')   # Output
net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')
model = tflearn.DNN(net)
```

> **Exercise:** Below in the `build_model()` function, you'll put together the network using TFLearn. You get to choose how many layers to use, how many hidden units, etc.

In [None]:
# Network building
def build_model():
    # This resets all parameters and variables, leave this here
    tf.reset_default_graph()
    
    #### Your code ####
    
    model = tflearn.DNN(net)
    return model

## Initializing the model

Next we need to call the `build_model()` function to actually build the model. In my solution I haven't included any arguments to the function, but you can add arguments so you can change parameters in the model if you want.

> **Note:** You might get a bunch of warnings here. TFLearn uses a lot of deprecated code in TensorFlow. Hopefully it gets updated to the new TensorFlow version soon.

In [None]:
model = build_model()

## Training the network

Now that we've constructed the network, saved as the variable `model`, we can fit it to the data. Here we use the `model.fit` method. You pass in the training features `trainX` and the training targets `trainY`. Below I set `validation_set=0.1` which reserves 10% of the data set as the validation set. You can also set the batch size and number of epochs with the `batch_size` and `n_epoch` keywords, respectively. Below is the code to fit the network to our word vectors.
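As a rough back-of-the-envelope sketch, here is how those settings translate into the number of batches per epoch. It assumes the 25000 reviews and the 90/10 train/test split from earlier, and that the validation split rounds down; the exact counts TFLearn reports may differ slightly.

```python
import math

n_reviews = 25000
n_train = n_reviews * 9 // 10        # 90% kept for training: 22500
n_validation = n_train // 10         # validation_set=0.1, assumed to round down: 2250
n_fit = n_train - n_validation       # examples actually used for weight updates: 20250

batch_size = 128
batches_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, batches_per_epoch)  # 20250 159
```

So with these numbers, each of the 10 epochs performs about 159 weight updates.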

You can rerun `model.fit` to train the network further if you think you can increase the validation accuracy. Remember, all hyperparameter adjustments must be done using the validation set. **Only use the test set after you're completely done training the network.**

In [None]:
# Training
model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=128, n_epoch=10)

## Testing

After you're satisfied with your hyperparameters, you can run the network on the test set to measure its performance. Remember, *only do this after finalizing the hyperparameters*.
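The accuracy measurement itself is just the fraction of examples where the predicted class matches the target. Here is a minimal sketch with made-up probabilities and targets (all names are hypothetical):

```python
import numpy as np

# Hypothetical network outputs: one row per review, columns = [negative, positive]
toy_probs = np.array([[0.9, 0.1],
                      [0.3, 0.7],
                      [0.6, 0.4],
                      [0.2, 0.8]])
# Hypothetical one-hot targets in the same column order
toy_targets = np.array([[1., 0.],
                        [0., 1.],
                        [0., 1.],
                        [0., 1.]])

predicted = toy_probs.argmax(axis=1)   # most probable class per review
actual = toy_targets.argmax(axis=1)    # true class per review
accuracy = np.mean(predicted == actual)

print(accuracy)  # 0.75
```

Three of the four toy predictions match their targets, giving an accuracy of 0.75.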

In [None]:
predictions = (np.array(model.predict(testX))[:,0] >= 0.5).astype(np.int_)
test_accuracy = np.mean(predictions == testY[:,0], axis=0)
print("Test accuracy: ", test_accuracy)

## Try out your own text!

In [None]:
# Helper function that uses your model to predict sentiment
def test_sentence(sentence):
    # reshape(1, -1) gives a batch containing one word vector
    positive_prob = model.predict(text_to_vector(sentence.lower()).reshape(1, -1))[0][1]
    print('Sentence: {}'.format(sentence))
    print('P(positive) = {:.3f} :'.format(positive_prob), 
          'Positive' if positive_prob > 0.5 else 'Negative')

In [None]:
sentence = "Moonlight is by far the best movie of 2016."
test_sentence(sentence)

sentence = "It's amazing anyone could be talented enough to make something this spectacularly awful"
test_sentence(sentence)