## Bag of words function

In [9]:
import numpy as np
import pandas as pd
from collections import Counter

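def bag_of_words(text):
    """Return a dict mapping each whitespace-separated word in `text` to its count."""
    # split() with no argument also handles repeated spaces, tabs, and newlines
    return dict(Counter(text.split()))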


bag_of_words("I love ml and I love dl and ml")

{'I': 2, 'and': 2, 'dl': 1, 'love': 2, 'ml': 2}
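The function above treats `"ML"`, `"ml"`, and `"ml."` as three different words. A minimal sketch of a more forgiving variant, assuming that lowercasing and a simple regex tokenizer are acceptable for the task (the name `bag_of_words_normalized` is hypothetical, not part of the notebook):

```python
import re
from collections import Counter

def bag_of_words_normalized(text):
    """Count words after lowercasing and dropping punctuation (illustrative only)."""
    # \w+ matches runs of letters, digits, and underscores, so punctuation is skipped
    tokens = re.findall(r"\w+", text.lower())
    return dict(Counter(tokens))

counts = bag_of_words_normalized("I love ML, and I love DL and ml.")
print(counts)
```

Whether to normalize case depends on the task: for sentiment data, capitalization can itself carry signal.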

In [10]:
from IPython.display import HTML, Image

## IMDB with LSTM in TFLearn by Siraj

In [11]:
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/si8zZHkufRY?ecver=1" frameborder="0" allowfullscreen></iframe>')

In [13]:
from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb

# IMDB Dataset loading
train, test, _ = imdb.load_data(path='imdb.pkl', n_words=10000,
                                valid_portion=0.1)
trainX, trainY = train
testX, testY = test

# Data preprocessing
# Sequence padding: pad (or truncate) every review to 100 word ids, filling empty positions with 0
trainX = pad_sequences(trainX, maxlen=100, value=0.)
testX = pad_sequences(testX, maxlen=100, value=0.)

# Converting labels to binary vectors
trainY = to_categorical(trainY, nb_classes=2)
testY = to_categorical(testY, nb_classes=2)

# Network building
net = tflearn.input_data([None, 100])

# Embedding: map each of the 10000 word ids to a trainable 128-dimensional vector
net = tflearn.embedding(net, input_dim=10000, output_dim=128)

# LSTM: a recurrent layer with 128 units that reads the embedded words in order
net = tflearn.lstm(net, 128, dropout=0.8)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                         loss='categorical_crossentropy')

# Training
model = tflearn.DNN(net, tensorboard_verbose=0)

# n_epoch is left at its default of 10
model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True,
          batch_size=32)


Training Step: 7040  | total loss: 0.08225
| Adam | epoch: 010 | loss: 0.08225 - acc: 0.9808 | val_loss: 0.63722 - val_acc: 0.8320 -- iter: 22500/22500
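To make the preprocessing and embedding steps in the cell above more concrete, here is a small NumPy-only sketch. It is not TFLearn's implementation: `pad_to_length` and `one_hot` are hypothetical helpers, the sketch assumes zeros are appended at the end and long sequences are cut at the end, and the embedding table is random rather than trained.

```python
import numpy as np

def pad_to_length(sequences, maxlen, value=0):
    """Pad or truncate each list of word ids to exactly `maxlen` entries."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        trimmed = seq[:maxlen]              # keep at most the first maxlen ids
        out[row, :len(trimmed)] = trimmed   # remaining slots stay at `value`
    return out

def one_hot(labels, nb_classes):
    """Turn integer class labels into rows of a 0/1 matrix."""
    return np.eye(nb_classes)[np.asarray(labels)]

# Toy data: three "reviews" as word-id lists; labels 0 = negative, 1 = positive
reviews = [[5, 8, 2], [7], [1, 2, 3, 4, 5, 6]]
labels = [1, 0, 1]

X = pad_to_length(reviews, maxlen=4)
Y = one_hot(labels, nb_classes=2)

# Embedding: a table with one row (vector) per word id, looked up by index.
# Here: 10 words and 3 dimensions instead of 10000 and 128.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(10, 3))
embedded = embedding_table[X]   # shape: (batch, maxlen, embedding_dim)

print(X)
print(Y)
print(embedded.shape)
```

In the real network the rows of the embedding table are learned during training, so words that behave similarly for the sentiment task tend to end up with similar vectors.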


## RNN and LSTM

### RNN intro video 

----
> **application:**
- use previous letters to predict future letters

----
> **Why RNN**
- feedforward: inputs must have a fixed length
- RNN: the input length is not fixed
- RNN: focuses on dependence between elements
    - images may be independent of each other, but speech and sentences have short- and long-term dependencies on words in the past and in the future
    
----
> **What an RNN looks like**
- 5:09s: take the previous moment's input into the current calculation
- unrolling through time: 6:41s
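The idea of feeding the previous moment into the current calculation, and of unrolling through time, can be sketched with a vanilla RNN in NumPy. This is a minimal illustration with random weights and assumed toy sizes, not the code from the video:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, steps = 3, 4, 5   # assumed toy dimensions

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)

inputs = rng.normal(size=(steps, input_size))  # one input vector per time step
h = np.zeros(hidden_size)                      # initial hidden state

hidden_states = []
for x_t in inputs:
    # The same weights are reused at every step; h carries the past forward
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    hidden_states.append(h)

hidden_states = np.stack(hidden_states)
print(hidden_states.shape)
```

Because the loop runs once per input element, the same network handles sequences of any length, which is the contrast with the fixed-length feedforward case noted earlier.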

----
> **RNN predicts next words from previous words: example**: 8:14s
- with input h, predict e
- rolling forward in time, with input e, predict l
- rolling forward, with input l, predict l
- rolling forward, with input l, predict o
- "cash flow is " could be followed by either "low" or "high": ambiguity (8:54s)
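The letter-by-letter example above amounts to building (input, target) pairs by shifting the sequence one step. A small sketch, using the word "hello" and a hypothetical character-to-id mapping:

```python
word = "hello"

# Each letter is paired with the letter that follows it
pairs = list(zip(word[:-1], word[1:]))

# Map characters to integer ids so a network could consume them
vocab = sorted(set(word))
char_to_id = {ch: i for i, ch in enumerate(vocab)}
encoded = [(char_to_id[a], char_to_id[b]) for a, b in pairs]

# The same input can have different targets, depending on what came before
next_after_l = [b for a, b in pairs if a == "l"]

print(pairs)
print(encoded)
print(next_after_l)
```

The input "l" has two different targets here, so a model that sees only the current letter cannot get both right. The hidden state is what lets the RNN tell the first "l" from the second.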

----
> **RNN for sentiment analysis**: 9:12s
- predict binary class sentiment
- translate from English to French

----
> **vanishing and exploding gradient**
- tanh vs. max for dealing with the vanishing gradient issue 14:00s
- ...
- LSTM and GRUs

----
> **exploding gradient**
- truncated BPTT (backpropagation through time)
- clip the gradients
- ...
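Gradient clipping, one of the remedies listed above, can be sketched as rescaling all gradients whenever their combined norm exceeds a threshold. This is a generic NumPy illustration; the helper name `clip_by_global_norm` and the threshold value are assumptions, not taken from the video:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm <= max_norm:
        return grads, total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

# Two toy gradient arrays with combined norm sqrt(9 + 16 + 144) = 13
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))

print(norm_before, norm_after)
```

Clipping keeps the direction of the update and only limits its size, so a single exploding step cannot throw the weights far away.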

----
> **LSTM** 17:00s
- flush memory, add to memory, get memory
- structure of LSTM with examples at 22:37s
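The three operations above (flush memory, add to memory, get memory) appear to correspond to the forget, input, and output gates of a standard LSTM cell. A minimal NumPy sketch of one step, with random weights and assumed toy sizes (`lstm_step` is a hypothetical helper, not TFLearn's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W: (4*hidden, input+hidden), b: (4*hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate: how much memory to flush
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate: how much to add to memory
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much memory to expose
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate values to write
    c = f * c_prev + i * g                  # updated cell memory
    h = o * np.tanh(c)                      # output read from memory
    return h, c

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)
```

The cell memory `c` is updated by addition rather than by repeated matrix multiplication, which is the usual explanation for why LSTMs suffer less from vanishing gradients.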

In [14]:
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/Ukgii7Yd_cU?ecver=1" frameborder="0" allowfullscreen></iframe>')

### Best intro on LSTM

In [15]:
HTML("<a href='http://colah.github.io/posts/2015-08-Understanding-LSTMs/#annotations:JXld5OxoEeaPRzPmjBIj9Q' target='_blank'>Understanding LSTM Networks (colah's blog)</a>")

## Word2Vec

### Word2Vec for Game of Thrones

----
> the 5 books are loaded as text and treated as one big corpus 7:00s

----
> libraries
- explains what each library is for

---
> go check out gensim library

In [16]:
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/pY9EwZ02sXU?list=PL2-dafEMk2A7YdKv4XfKpfbTH5z6rEEj3&amp;ecver=1" frameborder="0" allowfullscreen></iframe>')

-----
> Also why was the hidden layer not passed through any sigmoid or any other activation function. Is there a reason behind this?

Neither of those decisions is a hard-and-fast rule. You could make alternate decisions and still train the network successfully. I chose them based on some internal heuristics about the network.

1) I discuss this some in the last video, but the big reason is that I believe that most of the network’s power is going to come from a linear correlation (direct correlation), which means adding a non-linearity can make learning take longer and increase the likelihood of overfitting. It’s still doable and with the right regularization I’m sure it’d work fine, but it’s a needless over-complication. Furthermore, as a learning exercise, this also prepares you for another, identically shaped, network called word2vec which has some very nice principles. Networks like this are actually just doing a linear compression of the correlation matrix between the input and output.

> In mini-project solution 3, why were the weights from the input layer to the hidden layer initialized to 0?

2) Highly related to the first question. The output layer is a single vector, which means that the decision boundary is a single plane, more or less. 99% of the learning in this network is about moving words to one side or the other of that plane. So, you can randomly put words all over the vector space so that many of them have a long way to travel to get to one side or the other of the plane (meanwhile the plane is moving to try to fit as many as possible… further complicating the issue), or you can just put all the words in the same spot (0,0,0,0,0..etc), and then after the first time you “bump” each word in either the positive/negative direction you’ve already split them quite well, allowing for an effective (albeit tight) decision boundary. So, long story short, in this case, I felt that learning would happen a lot faster. However, it’s not a hard and fast rule.
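The two answers above can be illustrated with a tiny network that has a linear hidden layer and zero-initialized input-to-hidden weights. This is a sketch under assumptions, not the course's solution code: the four-word vocabulary, the sizes, the learning rate, and the cross-entropy-style gradient are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W_0, W_1, word_ids, target, lr=0.1):
    """One gradient step on a single review; updates W_0 and W_1 in place."""
    hidden = W_0[word_ids].sum(axis=0)    # linear hidden layer: no activation
    output = sigmoid(hidden @ W_1)[0]     # single sigmoid output unit
    delta = output - target               # gradient w.r.t. the pre-sigmoid output
    grad_hidden = delta * W_1[:, 0]       # computed before W_1 is changed
    W_1 -= lr * delta * hidden[:, None]
    W_0[word_ids] -= lr * grad_hidden
    return output

rng = np.random.default_rng(1)
vocab = ["great", "fun", "boring", "awful"]   # hypothetical vocabulary
hidden_size = 3

W_0 = np.zeros((len(vocab), hidden_size))            # all words start at the same spot
W_1 = rng.normal(scale=0.1, size=(hidden_size, 1))   # output weights must not be zero

first_output = train_step(W_0, W_1, [0, 1], target=1)   # a positive review
train_step(W_0, W_1, [2, 3], target=0)                  # a negative review

# Position of each word relative to the decision plane defined by W_1
projections = W_0 @ W_1[:, 0]
print(dict(zip(vocab, projections)))
```

After one "bump" per word, the positive words sit on one side of the plane and the negative words on the other. Note that only `W_0` starts at zero; if `W_1` were zero as well, no gradient would reach `W_0` and nothing would move.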