# Sentiment analysis with TFLearn

In this notebook, we'll continue Andrew Trask's work by building a network for sentiment analysis on the movie review data. Instead of a network written with Numpy, we'll be using [TFLearn](http://tflearn.org/), a high-level library built on top of TensorFlow. TFLearn makes it simpler to build networks just by defining the layers. It takes care of most of the details for you.

We'll start off by importing all the modules we'll need, then load and prepare the data.

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
import tflearn
from tflearn.data_utils import to_categorical

## Preparing the data

Following along with Andrew, our goal here is to convert our reviews into word vectors. The word vectors will have elements representing words in the total vocabulary. If the second position represents the word 'the', for each review we'll count up the number of times 'the' appears in the text and set the second position to that count. I'll show you examples as we build the input data from the reviews data. Check out Andrew's notebook and video for more about this.
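
As a rough illustration of the counting idea, here is a minimal sketch using a made-up four-word vocabulary. The names `toy_vocab` and `toy_review` are assumptions for this example and are not part of the data set. Position 1 stands for 'the', so that slot ends up holding the number of times 'the' appears.

```python
# Hypothetical four-word vocabulary; index 1 represents 'the'
toy_vocab = ['movie', 'the', 'was', 'great']
toy_review = 'the movie was great and the cast was great'

words = toy_review.split(' ')
# One count per vocabulary word, in vocabulary order;
# words outside the vocabulary ('and', 'cast') are ignored
toy_vector = [words.count(word) for word in toy_vocab]
print(toy_vector)  # [1, 2, 2, 2]
```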

### Read the data

Use the pandas library to read the reviews and positive/negative labels from comma-separated files. The data we're using has already been preprocessed a bit and we know it uses only lower case characters. If we were working from raw data, where we didn't know it was all lower case, we would want to add a step here to convert it. That's so we treat different variations of the same word, like `The`, `the`, and `THE`, all the same way.
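
If you did need that conversion step, it could look something like the sketch below. The sample string `raw_text` is invented for illustration. Without lowercasing, the three spellings are counted as three different words.

```python
from collections import Counter

# Invented sample text with mixed capitalization
raw_text = 'The movie was THE best of the year'

counts_raw = Counter(raw_text.split(' '))
counts_lower = Counter(raw_text.lower().split(' '))

print(counts_raw['the'])    # 1, 'The' and 'THE' are counted separately
print(counts_lower['the'])  # 3, all variations are treated the same way
```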

In [2]:
reviews = pd.read_csv('reviews.txt', header=None)
labels = pd.read_csv('labels.txt', header=None)

### Counting word frequency

To start off we'll need to count how often each word appears in the data. We'll use this count to create a vocabulary we'll use to encode the review data. This resulting count is known as a [bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model). We'll use it to select our vocabulary and build the word vectors. You should have seen how to do this in Andrew's lesson. Try to implement it here using the [Counter class](https://docs.python.org/2/library/collections.html#collections.Counter).

> **Exercise:** Create the bag of words from the reviews data and assign it to `total_counts`. The reviews are stored in the `reviews` [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). If you want the reviews as a Numpy array, use `reviews.values`. You can iterate through the rows in the DataFrame with `for idx, row in reviews.iterrows():` ([documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html)). When you break up the reviews into words, use `.split(' ')` instead of `.split()` so your results match ours.
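
To see why the choice between `.split(' ')` and `.split()` matters, here is a small sketch on an invented string (`sample` is an assumption for this example). Splitting on a single space keeps the empty strings produced by repeated or trailing spaces, so they get counted as a "word" too.

```python
# Invented sample with a double space and a trailing space
sample = 'great  movie . '

on_space = sample.split(' ')   # splits on every single space
on_whitespace = sample.split() # collapses runs of whitespace

print(on_space)       # ['great', '', 'movie', '.', '']
print(on_whitespace)  # ['great', 'movie', '.']
```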

In [11]:
from collections import Counter

total_counts = Counter()

for idx, row in reviews.iterrows():
    for review in row.values:
        for word in review.split(' '):
            total_counts[word] += 1

print("Total words in data set: ", len(total_counts))

Total words in data set:  74074


Let's keep the first 10000 most frequent words. As Andrew noted, most of the words in the vocabulary are rarely used so they will have little effect on our predictions. Below, we'll sort `vocab` by the count value and keep the 10000 most frequent words.
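
As an aside, `Counter` also has a `most_common` method that does the sorting for you. The sketch below uses a toy counter (`toy_counts` is an assumption for this example) to show that both approaches pick the same words when there are no ties at the cutoff.

```python
from collections import Counter

# Toy bag of words: a appears 3 times, b twice, c and d once
toy_counts = Counter('a b a c b a d'.split(' '))

# Sorting the keys by count, most frequent first
top_sorted = sorted(toy_counts, key=toy_counts.get, reverse=True)[:2]

# most_common returns (word, count) pairs, so keep only the words
top_common = [word for word, count in toy_counts.most_common(2)]

print(top_sorted)  # ['a', 'b']
print(top_common)  # ['a', 'b']
```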

In [12]:
vocab = sorted(total_counts, key=total_counts.get, reverse=True)[:10000]
print(vocab[:60])

['', 'the', '.', 'and', 'a', 'of', 'to', 'is', 'br', 'it', 'in', 'i', 'this', 'that', 's', 'was', 'as', 'for', 'with', 'movie', 'but', 'film', 'you', 'on', 't', 'not', 'he', 'are', 'his', 'have', 'be', 'one', 'all', 'at', 'they', 'by', 'an', 'who', 'so', 'from', 'like', 'there', 'her', 'or', 'just', 'about', 'out', 'if', 'has', 'what', 'some', 'good', 'can', 'more', 'she', 'when', 'very', 'up', 'time', 'no']


What's the last word in our vocabulary? We can use this to judge if 10000 is too few. If the last word is pretty common, we probably need to keep more words.

In [13]:
print(vocab[-1], ': ', total_counts[vocab[-1]])

assassins :  30


The last word in our vocabulary appears only 30 times across the 25000 reviews. I think it's fair to say this is a tiny proportion of the data. We are probably fine with this number of words.

**Note:** When you run, you may see a different word from the one shown above, but it will also have the value `30`. That's because there are many words tied for that number of counts, and the `Counter` class does not guarantee which one will be returned in the case of a tie.

Now for each review in the data, we'll make a word vector. First we need to make a mapping of word to index, pretty easy to do with a dictionary comprehension.

> **Exercise:** Create a dictionary called `word2idx` that maps each word in the vocabulary to an index. The first word in `vocab` has index `0`, the second word has index `1`, and so on.
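
For reference, the dictionary comprehension version can be sketched as follows. To keep the example self-contained it uses `toy_vocab`, a stand-in holding only the first five entries of the vocabulary printed earlier, and the name `toy_word2idx` is likewise just for illustration.

```python
# Stand-in for vocab: the first five entries printed earlier
toy_vocab = ['', 'the', '.', 'and', 'a']

# enumerate yields (index, word) pairs; the comprehension flips them
toy_word2idx = {word: index for index, word in enumerate(toy_vocab)}

print(toy_word2idx['the'])  # 1
print(toy_word2idx['a'])    # 4
```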

In [110]:
word2idx = {}

for index, word in enumerate(vocab):
    word2idx[word] = index

for key in word2idx.keys():
    print(key, word2idx[key])
    

 0
flick 495
replacement 7242
characterizations 7296
subtext 6686
order 648
plotting 6432
courtesy 7462
grandfather 4351
regret 2573
... (output truncated; one line is printed for each of the 10000 vocabulary words)

### Text to vector function

Now we can write a function that converts some text to a word vector. The function will take a string of words as input and return a vector with the words counted up. Here's the general algorithm to do this:

* Initialize the word vector with [np.zeros](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html), it should be the length of the vocabulary.
* Split the input string of text into a list of words with `.split(' ')`. Again, if you call `.split()` instead, you'll get slightly different results than what we show here.
* For each word in that list, increment the element in the index associated with that word, which you get from `word2idx`.

**Note:** Since not every word is in the vocabulary, looking one up directly in `word2idx` raises a `KeyError` when you hit a word that isn't there. You can use the dictionary's `.get` method to return a default value instead of raising the error. For example, `word2idx.get(word, None)` returns `None` if `word` doesn't exist in the dictionary.
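
Here is a small sketch of that lookup pattern on a toy dictionary (`toy_word2idx` and the word list are assumptions for this example). Note that the check is `is None` rather than a plain truth test, because index `0` is a valid index but evaluates as false.

```python
# Toy mapping standing in for word2idx
toy_word2idx = {'the': 0, 'movie': 1}

found = []
for word in ['the', 'zebra', 'movie']:
    index = toy_word2idx.get(word, None)
    if index is None:
        # Not in the vocabulary, so there is nothing to count
        continue
    found.append(index)

print(found)  # [0, 1]
```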

In [116]:
def text_to_vector(text):
    # One slot per vocabulary word, returned as a 1-D vector of counts
    word_vector = np.zeros(len(vocab))
    
    for word in text.split(' '):
        # Words outside the vocabulary have no index, so skip them
        index = word2idx.get(word, None)
        if index is not None:
            word_vector[index] += 1
    return word_vector

If you do this right, the following call should return the array shown beneath it:

```
text_to_vector('The tea is for a party to celebrate '
               'the movie so she has no time for a cake')[:65]
                   
array([0, 1, 0, 0, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])
```

In [122]:
a = text_to_vector('The tea is for a party to celebrate '
                   'the movie so she has no time for a cake')
print(a[:65])

[ 0.  1.  0.  0.  2.  0.  1.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  2.
  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.
  1.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.]


Now, run through our entire review data set and convert each review to a word vector.

In [None]:
word_vectors = np.zeros((len(reviews), len(vocab)), dtype=np.int_)
for ii, (_, text) in enumerate(reviews.iterrows()):
    word_vectors[ii] = text_to_vector(text[0])

In [None]:
# Printing out the first 5 word vectors
word_vectors[:5, :23]

### Train, Validation, Test sets

Now that we have the word vectors in `word_vectors`, we're ready to split our data into train, validation, and test sets. Remember that we train on the train data, use the validation data to set the hyperparameters, and at the very end measure the network performance on the test data. Here we're using the function `to_categorical` from TFLearn to reshape the target data so that we'll have two output units and can classify with a softmax activation function. We actually won't be creating the validation set here; TFLearn will do that for us later.
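
To get a feel for what that reshaping produces, here is a NumPy stand-in. It is not TFLearn's implementation, just a sketch of the one-hot encoding idea, and `toy_labels` is an invented example. Each 0/1 label becomes a row with a single 1 in the column for its class.

```python
import numpy as np

# Invented labels: 1 for positive, 0 for negative
toy_labels = np.array([1, 0, 1])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per example
one_hot = np.eye(2)[toy_labels]

print(one_hot.shape)  # (3, 2): three examples, two output units
```

With this layout, column 0 marks negative reviews and column 1 marks positive ones.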

In [None]:
Y = (labels=='positive').astype(np.int_)
records = len(labels)

shuffle = np.arange(records)
np.random.shuffle(shuffle)
train_fraction = 0.9  # fraction of records used for training; the rest is held out for testing

train_split, test_split = shuffle[:int(records*train_fraction)], shuffle[int(records*train_fraction):]
trainX, trainY = word_vectors[train_split,:], to_categorical(Y.values[train_split], 2)
testX, testY = word_vectors[test_split,:], to_categorical(Y.values[test_split], 2)

In [None]:
trainY

## Building the network

[TFLearn](http://tflearn.org/) lets you build the network by [defining the layers](http://tflearn.org/layers/core/). 

### Input layer

For the input layer, you just need to tell it how many units you have. For example, 

```
net = tflearn.input_data([None, 100])
```

would create a network with 100 input units. The first element in the list, `None` in this case, sets the batch size. Setting it to `None` leaves the batch size unspecified, so the network accepts batches of any size.

The number of inputs to your network needs to match the size of your data. For this example, we're using 10000 element long vectors to encode our input data, so we need 10000 input units.


### Adding layers

To add new hidden layers, you use 

```
net = tflearn.fully_connected(net, n_units, activation='ReLU')
```

This adds a fully connected layer where every unit in the previous layer is connected to every unit in this layer. The first argument `net` is the network you created in the `tflearn.input_data` call. It's telling the network to use the output of the previous layer as the input to this layer. You can set the number of units in the layer with `n_units`, and set the activation function with the `activation` keyword. You can keep adding layers to your network by repeatedly calling `net = tflearn.fully_connected(net, n_units)`.
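
If it helps to picture what such a layer computes, the sketch below does the same arithmetic in plain NumPy. This is an illustration of the idea, not TFLearn's code, and the sizes (4 examples, 10 inputs, 5 hidden units) and variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)

inputs = rng.rand(4, 10)     # batch of 4 examples, 10 input units
weights = rng.randn(10, 5)   # one weight per (input unit, hidden unit) pair
biases = np.zeros(5)         # one bias per hidden unit

# Fully connected: every hidden unit sees every input unit.
# ReLU keeps positive values and sets negative ones to zero.
hidden = np.maximum(0, inputs @ weights + biases)

print(hidden.shape)  # (4, 5)
```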

### Output layer

The last layer you add is used as the output layer. Therefore, you need to set the number of units to match the target data. In this case we are predicting two classes, positive or negative sentiment. You also need to set the activation function so it's appropriate for your model. Again, we're trying to predict if some input data belongs to one of two classes, so we should use softmax.

```
net = tflearn.fully_connected(net, 2, activation='softmax')
```
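
For intuition, softmax turns the raw outputs of the last layer into probabilities that sum to one across the classes. Here is a minimal NumPy sketch; the function and the example scores are assumptions for illustration, not TFLearn internals.

```python
import numpy as np

def softmax(scores):
    # Subtracting the row maximum avoids overflow and doesn't change the result
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

# Two invented examples, each with a score per class (negative, positive)
scores = np.array([[2.0, 0.0],
                   [0.0, 0.0]])
probs = softmax(scores)

print(probs.sum(axis=1))  # each row sums to 1
```

Equal scores, as in the second row, give each class a probability of 0.5.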

### Training
To set how you train the network, use 

```
net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')
```

Again, this is passing in the network you've been building. The keywords: 

* `optimizer` sets the training method, here stochastic gradient descent
* `learning_rate` is the learning rate
* `loss` determines how the network error is calculated, in this example with the categorical cross-entropy.
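
To make the last two keywords a bit more concrete, here is a rough NumPy sketch of the categorical cross-entropy and of a single gradient descent update. The helper names, the example numbers, and the gradient values are all assumptions for illustration; TFLearn computes the real gradients for you.

```python
import numpy as np

def categorical_crossentropy(probs, targets):
    # Average over examples of -sum(target * log(predicted probability))
    return -np.mean(np.sum(targets * np.log(probs), axis=1))

def sgd_step(weights, gradient, learning_rate=0.1):
    # Move the weights a small step against the gradient
    return weights - learning_rate * gradient

probs = np.array([[0.9, 0.1], [0.2, 0.8]])    # predicted probabilities
targets = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot targets
loss = float(categorical_crossentropy(probs, targets))
print(round(loss, 4))  # 0.1643

new_weights = sgd_step(np.array([1.0, -2.0]), np.array([0.5, -1.0]))
print(new_weights)
```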

Finally you put all this together to create the model with `tflearn.DNN(net)`. So it ends up looking something like 

```
net = tflearn.input_data([None, 10])                          # Input
net = tflearn.fully_connected(net, 5, activation='ReLU')      # Hidden
net = tflearn.fully_connected(net, 2, activation='softmax')   # Output
net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')
model = tflearn.DNN(net)
```

> **Exercise:** Below in the `build_model()` function, you'll put together the network using TFLearn. You get to choose how many layers to use, how many hidden units, etc.

In [None]:
# Network building
def build_model():
    # This resets all parameters and variables, leave this here
    tf.reset_default_graph()
    
    #### Your code ####
    
    model = tflearn.DNN(net)
    return model

## Initializing the model

Next we need to call the `build_model()` function to actually build the model. In my solution I haven't included any arguments to the function, but you can add arguments so you can change parameters in the model if you want.

> **Note:** You might get a bunch of warnings here. TFLearn uses a lot of deprecated code in TensorFlow. Hopefully it gets updated to the new TensorFlow version soon.

In [None]:
model = build_model()

## Training the network

Now that we've constructed the network, saved as the variable `model`, we can fit it to the data. Here we use the `model.fit` method. You pass in the training features `trainX` and the training targets `trainY`. Below I set `validation_set=0.1` which reserves 10% of the data set as the validation set. You can also set the batch size and number of epochs with the `batch_size` and `n_epoch` keywords, respectively. Below is the code to fit the network to our word vectors.

You can rerun `model.fit` to train the network further if you think you can increase the validation accuracy. Remember, all hyperparameter adjustments must be done using the validation set. **Only use the test set after you're completely done training the network.**
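
To keep track of how much data ends up in each set, here is the arithmetic as a quick sketch. It assumes the 25000 reviews and the 90% train split from earlier, and that `validation_set=0.1` holds out 10% of the training data passed to `model.fit`; the exact rounding TFLearn uses internally may differ slightly.

```python
records = 25000                              # number of reviews in the data set

train_records = int(records * 0.9)           # passed to model.fit
test_records = records - train_records       # held out until the very end

val_records = int(train_records * 0.1)       # reserved by validation_set=0.1
fit_records = train_records - val_records    # actually used to update weights

print(fit_records, val_records, test_records)  # 20250 2250 2500
```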

In [None]:
# Training
model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=128, n_epoch=10)

## Testing

After you're satisfied with your hyperparameters, you can run the network on the test set to measure its performance. Remember, *only do this after finalizing the hyperparameters*.

In [None]:
predictions = (np.array(model.predict(testX))[:,0] >= 0.5).astype(np.int_)
test_accuracy = np.mean(predictions == testY[:,0], axis=0)
print("Test accuracy: ", test_accuracy)
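
The accuracy calculation above thresholds the first output column at 0.5 and compares it with the first column of the one-hot targets. Here is the same computation on three invented predictions (`toy_probs` and `toy_targets` are assumptions for this example), so you can follow it by hand.

```python
import numpy as np

# Invented network outputs, one row per review, one column per class
toy_probs = np.array([[0.8, 0.2],
                      [0.3, 0.7],
                      [0.6, 0.4]])
# Invented one-hot targets for the same three reviews
toy_targets = np.array([[1, 0],
                        [0, 1],
                        [0, 1]])

# 1 where the first column is at least 0.5, otherwise 0
toy_predictions = (toy_probs[:, 0] >= 0.5).astype(np.int_)
# Fraction of reviews where the prediction matches the target
toy_accuracy = np.mean(toy_predictions == toy_targets[:, 0])

print(toy_predictions)  # first two match the targets, the third does not
print(toy_accuracy)
```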

## Try out your own text!

In [None]:
# Helper function that uses your model to predict sentiment
def test_sentence(sentence):
    positive_prob = model.predict([text_to_vector(sentence.lower())])[0][1]
    print('Sentence: {}'.format(sentence))
    print('P(positive) = {:.3f} :'.format(positive_prob), 
          'Positive' if positive_prob > 0.5 else 'Negative')

In [None]:
sentence = "Moonlight is by far the best movie of 2016."
test_sentence(sentence)

sentence = "It's amazing anyone could be talented enough to make something this spectacularly awful"
test_sentence(sentence)