# Week 5: YOLO & RNNs

This week:
- YOLO
- NLP & Word embeddings
- RNNs

## YOLO

**R-CNN**

![RCNN](https://camo.githubusercontent.com/05d3c1a11df05588eb567346e896efa92cf3c131/68747470733a2f2f63646e2d696d616765732d312e6d656469756d2e636f6d2f6d61782f313630302f302a53646a36734b445251795a704f366f482e)

- Previous: Sliding windows
- R-CNN: Region proposals
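- Fast R-CNN: region proposals are applied to a shared convolutional feature map (RoI pooling) instead of the raw image, so the CNN runs once per image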
- Faster R-CNN: region proposal network (RPN) for generating region proposals

| Model | PASCAL 2007 mAP | Speed (FPS) | Time per image |
|--|-----------------|-----------|--------------|
|R-CNN | 66.0 | 0.05 FPS | 20 s/img |
|Fast R-CNN | 70.0 | 0.5 FPS | 2 s/img |
|Faster R-CNN | 73.2 | 7 FPS | 140 ms/img |
|YOLO | 69.0 | 45 FPS | 22 ms/img |


We divide the image into a 13x13 grid:

![Grid](https://camo.githubusercontent.com/4338301d905b87a1e9d8a4b68b63775d30adcf15/687474703a2f2f6d616368696e657468696e6b2e6e65742f696d616765732f796f6c6f2f477269644032782e706e67)
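
As a rough illustration of the grid idea: in YOLO, the cell that contains the center of an object is the one responsible for predicting it. The sketch below maps a box center to its grid cell. The helper name `grid_cell` and the 416x416 image size are assumptions made for this example, not part of the lecture code.

```python
def grid_cell(cx, cy, img_w, img_h, grid_size=13):
    """Return (row, col) of the grid cell containing the point (cx, cy)."""
    # Scale pixel coordinates to grid units and clamp points on the far edge.
    col = min(int(cx / img_w * grid_size), grid_size - 1)
    row = min(int(cy / img_h * grid_size), grid_size - 1)
    return row, col

# Hypothetical 416x416 image: each of the 13x13 cells covers 32x32 pixels.
print(grid_cell(208, 100, 416, 416))  # (3, 6)
```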

For each grid cell, the model predicts bounding boxes together with a confidence score per box, which reflects P(Object), the probability that the box contains an object:

![Boxes](http://machinethink.net/images/yolo/Boxes@2x.png)

The model also predicts a conditional class probability, for example P(Car|Object):


![Conditional](https://www.renom.jp/notebooks/tutorial/image_processing/yolo/yolo006.png)

The box confidence and the conditional class probability are multiplied to give a class-specific score for each box:

![Multiply](https://machinethink.net/images/yolo/Scores@2x.png)
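
A minimal sketch of that multiplication with NumPy. All numbers are made up for illustration (three boxes, two classes), and the variable names are assumptions.

```python
import numpy as np

# Hypothetical predictions for three boxes and two classes (car, person).
box_confidence = np.array([0.9, 0.4, 0.05])      # P(Object) for each box
class_given_object = np.array([[0.8, 0.2],       # P(Class | Object) for each box
                               [0.3, 0.7],
                               [0.5, 0.5]])

# P(Class) = P(Class | Object) * P(Object), broadcast over the class axis.
class_scores = class_given_object * box_confidence[:, None]

best_class = class_scores.argmax(axis=1)   # most likely class per box
best_score = class_scores.max(axis=1)      # its class-specific score
print(best_class, best_score)
```

The third box ends up with a very low score even though its class probabilities are not small, because the model is not confident that it contains an object at all.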

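Many of the resulting scores are low, so we first discard boxes whose score is below a threshold and then apply non-maximum suppression (NMS) to remove overlapping boxes that detect the same object: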

![NMS](https://camo.githubusercontent.com/57cf362cc6d1ce644282864edf499e8613ca7a7d/687474703a2f2f706a7265646469652e636f6d2f6d656469612f696d6167652f66696e616c2e706e67)
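
The two filtering steps can be sketched as follows. This is a simplified single-class version, not the implementation used by YOLO itself. The function names, the `[x1, y1, x2, y2]` box format, and the thresholds 0.3 and 0.5 are assumptions.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, score_threshold=0.3, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    idx = np.where(scores >= score_threshold)[0]   # step 1: score threshold
    idx = idx[np.argsort(-scores[idx])]            # sort by descending score
    keep = []
    while idx.size > 0:                            # step 2: greedy suppression
        best, rest = idx[0], idx[1:]
        keep.append(int(best))
        # Drop every remaining box that overlaps the kept box too much.
        idx = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[10, 10, 50, 50],       # strong detection
                  [12, 12, 52, 52],       # near-duplicate of the first box
                  [100, 100, 150, 150],   # separate object
                  [0, 0, 5, 5]],          # low-score box
                 dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.1])
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```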

Here you can see what the YOLO architecture looks like:

![YOLO](https://camo.githubusercontent.com/3c2151338f97e8494cb208d46a29bab4763c7dd6/68747470733a2f2f692e696d6775722e636f6d2f5148304376524e2e706e67)

## NLP

### What is going on here?

![Translate](images/google-translate.jpg)

### What about here?

![Linkedin](images/smart-replies.jpg)

### We have an issue here

![Text data](images/text-data.jpg)

In [1]:
# King - Man + Woman = ?

king = 'King'
man = 'Man'
woman = 'Woman'

In [6]:
king - man + woman

TypeError: unsupported operand type(s) for -: 'str' and 'str'
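
Strings cannot be subtracted, but vectors can. The toy sketch below uses hand-made 2-D vectors (an assumption for illustration, not learned embeddings) to show the kind of arithmetic we would like to do with words.

```python
import numpy as np

# Hypothetical 2-D "embeddings": axis 0 ~ royalty, axis 1 ~ gender.
vectors = {
    'king':  np.array([1.0,  1.0]),
    'queen': np.array([1.0, -1.0]),
    'man':   np.array([0.0,  1.0]),
    'woman': np.array([0.0, -1.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(query):
    """Return the word whose vector points in the direction closest to query."""
    return max(vectors, key=lambda w: cosine(vectors[w], query))

result = vectors['king'] - vectors['man'] + vectors['woman']
print(nearest(result))  # queen
```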

## Word Embeddings

In [None]:
from tensorflow.keras.preprocessing.text import Tokenizer

texts = ['We are now in week five of this computer vision nanodegree program by Udacity',
         'Udacity has several interesting courses, including NLP and computer vision']

In [None]:
t = Tokenizer(num_words=15)  # num_words -> maximum vocabulary size (only the most frequent words are kept)
t.fit_on_texts(texts)

In [None]:
# We can now take a look into what the tokenizer found

t.document_count  # number of documents the Tokenizer was fit on

In [None]:
t.word_counts  # each word and how many times it appeared across all documents

In [None]:
t.word_index  # unique integer index assigned to each word

In [None]:
t.word_docs  # number of documents each word appeared in

**We can also use the tokenizer to convert our texts into matrices**

In [None]:
t.texts_to_matrix(texts)

In [None]:
t.texts_to_matrix(texts,mode='count')

This tells us whether (or how often) a word appears in a particular text, but it discards word order and gives us no information about context.

**Examples:**

- "My friend caught a big fish" vs. "A big fish caught my friend"
- I went to high school in Spain and have been living in Colombia for 8 years. I am probably fluent in ...?
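
The first example can be checked directly. The sketch below uses a plain `Counter` as a stand-in for a count matrix (an assumption, it is not the Keras `Tokenizer`), and shows that both sentences get exactly the same bag-of-words representation even though they mean different things.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count lowercase tokens, ignoring word order."""
    return Counter(sentence.lower().split())

a = bag_of_words('My friend caught a big fish')
b = bag_of_words('A big fish caught my friend')
print(a == b)  # True
```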

In [7]:
import pandas as pd
import re, string
import gensim
import logging

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)



The data can be downloaded from Kaggle:

- https://www.kaggle.com/c/word2vec-nlp-tutorial/data

In [8]:
df = pd.read_csv('unlabeledTrainData.tsv.zip', header=0, delimiter="\t", quoting=3)

df.shape

(50000, 2)

In [9]:
df.head()

|   | id | review |
|---|----|--------|
| 0 | `"9999_0"` | `"Watching Time Chasers, it obvious that it was...` |
| 1 | `"45057_0"` | `"I saw this film about 20 years ago and rememb...` |
| 2 | `"15561_0"` | `"Minor Spoilers<br /><br />In New York, Joan B...` |
| 3 | `"7161_0"` | `"I went to see this film with a great deal of ...` |
| 4 | `"43971_0"` | `"Yes, I agree with everyone on this site this ...` |


In [10]:
def clean_str(text):
    """
    Clean a raw review string before vectorization.
    """
    try:
        # Remove URLs
        text = re.sub(r'https?://\S+', ' ', text)
        # Replace everything that is not an ASCII letter with a space
        text = re.sub(r"[^A-Za-z]", " ", text)
        # Lowercase and collapse runs of whitespace
        words = text.strip().lower().split()
        return " ".join(words)
    except TypeError:
        # Non-string input (e.g. NaN) becomes an empty string
        return ""

df['clean_review'] = df['review'].apply(clean_str)

df.head()

|   | id | review | clean_review |
|---|----|--------|--------------|
| 0 | `"9999_0"` | `"Watching Time Chasers, it obvious that it was...` | watching time chasers it obvious that it was m... |
| 1 | `"45057_0"` | `"I saw this film about 20 years ago and rememb...` | i saw this film about years ago and remember i... |
| 2 | `"15561_0"` | `"Minor Spoilers<br /><br />In New York, Joan B...` | minor spoilers br br in new york joan barnard ... |
| 3 | `"7161_0"` | `"I went to see this film with a great deal of ...` | i went to see this film with a great deal of e... |
| 4 | `"43971_0"` | `"Yes, I agree with everyone on this site this ...` | yes i agree with everyone on this site this mo... |


In [11]:
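# Each document becomes a list of tokens
documents = [doc.split() for doc in df['clean_review']]

# Note: these are the gensim 3.x parameter names. In gensim >= 4.0,
# `size` is called `vector_size` and `iter` is called `epochs`.
model = gensim.models.Word2Vec(documents,     # tokenized documents (list of lists of words)
                               min_count=10,  # ignore all words with total frequency lower than this
                               workers=4,     # number of worker threads
                               size=50,       # embedding size
                               window=5,      # maximum distance between current and predicted word
                               iter=10        # number of iterations (epochs) over the text corpus
                              )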

2019-04-24 16:55:20,926 : INFO : collecting all words and their counts
2019-04-24 16:55:20,930 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2019-04-24 16:55:23,975 : INFO : PROGRESS: at sentence #10000, processed 2399440 words, keeping 51654 word types
2019-04-24 16:55:27,656 : INFO : PROGRESS: at sentence #20000, processed 4835846 words, keeping 69077 word types
2019-04-24 16:55:31,039 : INFO : PROGRESS: at sentence #30000, processed 7267977 words, keeping 81515 word types
2019-04-24 16:55:38,030 : INFO : PROGRESS: at sentence #40000, processed 9669772 words, keeping 91685 word types
2019-04-24 16:55:41,599 : INFO : collected 100479 word types from a corpus of 12084660 raw words and 50000 sentences
2019-04-24 16:55:41,604 : INFO : Loading a fresh vocabulary
2019-04-24 16:55:42,932 : INFO : effective_min_count=10 retains 28322 unique words (28% of original 100479, drops 72157)
2019-04-24 16:55:42,936 : INFO : effective_min_count=10 leaves 11910457 word cor

2019-04-24 16:56:49,088 : INFO : EPOCH 1 - PROGRESS: at 76.09% examples, 107757 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:56:50,122 : INFO : EPOCH 1 - PROGRESS: at 77.02% examples, 107366 words/s, in_qsize 8, out_qsize 0
2019-04-24 16:56:51,141 : INFO : EPOCH 1 - PROGRESS: at 77.99% examples, 106994 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:56:52,275 : INFO : EPOCH 1 - PROGRESS: at 79.06% examples, 106456 words/s, in_qsize 8, out_qsize 0
2019-04-24 16:56:53,283 : INFO : EPOCH 1 - PROGRESS: at 80.54% examples, 106769 words/s, in_qsize 8, out_qsize 0
2019-04-24 16:56:54,360 : INFO : EPOCH 1 - PROGRESS: at 81.99% examples, 106982 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:56:55,497 : INFO : EPOCH 1 - PROGRESS: at 82.62% examples, 106041 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:56:56,549 : INFO : EPOCH 1 - PROGRESS: at 83.43% examples, 105474 words/s, in_qsize 8, out_qsize 1
2019-04-24 16:56:57,567 : INFO : EPOCH 1 - PROGRESS: at 84.62% examples, 105481 words/s, in_qsiz

2019-04-24 16:57:55,836 : INFO : EPOCH 3 - PROGRESS: at 26.83% examples, 229849 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:57:56,851 : INFO : EPOCH 3 - PROGRESS: at 29.24% examples, 228216 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:57:57,862 : INFO : EPOCH 3 - PROGRESS: at 31.70% examples, 226905 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:57:58,881 : INFO : EPOCH 3 - PROGRESS: at 34.04% examples, 225141 words/s, in_qsize 6, out_qsize 1
2019-04-24 16:57:59,895 : INFO : EPOCH 3 - PROGRESS: at 36.92% examples, 227147 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:58:00,922 : INFO : EPOCH 3 - PROGRESS: at 39.58% examples, 227813 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:58:01,945 : INFO : EPOCH 3 - PROGRESS: at 42.17% examples, 227581 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:58:02,954 : INFO : EPOCH 3 - PROGRESS: at 44.65% examples, 227124 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:58:03,963 : INFO : EPOCH 3 - PROGRESS: at 47.01% examples, 226368 words/s, in_qsiz

2019-04-24 16:59:03,750 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-04-24 16:59:03,759 : INFO : EPOCH - 4 : training on 12084660 raw words (8815662 effective words) took 40.0s, 220127 effective words/s
2019-04-24 16:59:04,819 : INFO : EPOCH 5 - PROGRESS: at 2.84% examples, 244275 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:59:05,831 : INFO : EPOCH 5 - PROGRESS: at 5.91% examples, 256228 words/s, in_qsize 8, out_qsize 0
2019-04-24 16:59:06,836 : INFO : EPOCH 5 - PROGRESS: at 9.12% examples, 261525 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:59:07,854 : INFO : EPOCH 5 - PROGRESS: at 11.57% examples, 249287 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:59:08,861 : INFO : EPOCH 5 - PROGRESS: at 14.12% examples, 243622 words/s, in_qsize 6, out_qsize 1
2019-04-24 16:59:09,895 : INFO : EPOCH 5 - PROGRESS: at 16.76% examples, 240105 words/s, in_qsize 7, out_qsize 0
2019-04-24 16:59:10,917 : INFO : EPOCH 5 - PROGRESS: at 19.43% examples, 238554 words/s, in_qsi

2019-04-24 17:00:11,925 : INFO : worker thread finished; awaiting finish of 2 more threads
2019-04-24 17:00:11,949 : INFO : worker thread finished; awaiting finish of 1 more threads
2019-04-24 17:00:11,972 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-04-24 17:00:11,979 : INFO : EPOCH - 6 : training on 12084660 raw words (8818509 effective words) took 33.1s, 266704 effective words/s
2019-04-24 17:00:13,031 : INFO : EPOCH 7 - PROGRESS: at 2.76% examples, 236601 words/s, in_qsize 8, out_qsize 0
2019-04-24 17:00:14,056 : INFO : EPOCH 7 - PROGRESS: at 5.38% examples, 229824 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:00:15,062 : INFO : EPOCH 7 - PROGRESS: at 7.68% examples, 220382 words/s, in_qsize 8, out_qsize 0
2019-04-24 17:00:16,063 : INFO : EPOCH 7 - PROGRESS: at 9.74% examples, 210355 words/s, in_qsize 8, out_qsize 0
2019-04-24 17:00:17,070 : INFO : EPOCH 7 - PROGRESS: at 12.24% examples, 211302 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:00:18,087 : I

2019-04-24 17:01:20,017 : INFO : EPOCH 8 - PROGRESS: at 91.93% examples, 256960 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:01:21,033 : INFO : EPOCH 8 - PROGRESS: at 94.49% examples, 255944 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:01:22,045 : INFO : EPOCH 8 - PROGRESS: at 96.96% examples, 254839 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:01:23,051 : INFO : EPOCH 8 - PROGRESS: at 99.65% examples, 254058 words/s, in_qsize 5, out_qsize 0
2019-04-24 17:01:23,106 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-04-24 17:01:23,116 : INFO : worker thread finished; awaiting finish of 2 more threads
2019-04-24 17:01:23,137 : INFO : worker thread finished; awaiting finish of 1 more threads
2019-04-24 17:01:23,177 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-04-24 17:01:23,184 : INFO : EPOCH - 8 : training on 12084660 raw words (8816223 effective words) took 34.7s, 253972 effective words/s
2019-04-24 17:01:24,285 : INFO : EPOCH 9 - PR

2019-04-24 17:02:25,558 : INFO : EPOCH 10 - PROGRESS: at 85.35% examples, 263695 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:02:26,592 : INFO : EPOCH 10 - PROGRESS: at 87.84% examples, 261742 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:02:27,616 : INFO : EPOCH 10 - PROGRESS: at 90.51% examples, 260718 words/s, in_qsize 8, out_qsize 0
2019-04-24 17:02:28,617 : INFO : EPOCH 10 - PROGRESS: at 93.16% examples, 259918 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:02:29,656 : INFO : EPOCH 10 - PROGRESS: at 96.17% examples, 259949 words/s, in_qsize 7, out_qsize 0
2019-04-24 17:02:30,681 : INFO : EPOCH 10 - PROGRESS: at 99.32% examples, 260106 words/s, in_qsize 8, out_qsize 1
2019-04-24 17:02:30,853 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-04-24 17:02:30,864 : INFO : worker thread finished; awaiting finish of 2 more threads
2019-04-24 17:02:30,886 : INFO : worker thread finished; awaiting finish of 1 more threads
2019-04-24 17:02:30,922 : INFO : worker thr

In [12]:
model.wv.vectors.shape  # embedding matrix: (vocabulary size, embedding size)

(28322, 50)

In [13]:
model.wv.vocab  # gensim 3.x; in gensim >= 4.0 use model.wv.key_to_index

{'watching': <gensim.models.keyedvectors.Vocab at 0x1fef1729e48>,
 'time': <gensim.models.keyedvectors.Vocab at 0x1fef1729e80>,
 'chasers': <gensim.models.keyedvectors.Vocab at 0x1fef174a048>,
 'it': <gensim.models.keyedvectors.Vocab at 0x1fef174a0f0>,
 'obvious': <gensim.models.keyedvectors.Vocab at 0x1fef174a080>,
 'that': <gensim.models.keyedvectors.Vocab at 0x1fef174a0b8>,
 'was': <gensim.models.keyedvectors.Vocab at 0x1fef174a160>,
 'made': <gensim.models.keyedvectors.Vocab at 0x1fef174a198>,
 'by': <gensim.models.keyedvectors.Vocab at 0x1fef174a1d0>,
 'a': <gensim.models.keyedvectors.Vocab at 0x1fef174a208>,
 'bunch': <gensim.models.keyedvectors.Vocab at 0x1fef174a240>,
 'of': <gensim.models.keyedvectors.Vocab at 0x1fef174a278>,
 'friends': <gensim.models.keyedvectors.Vocab at 0x1fef174a2b0>,
 'maybe': <gensim.models.keyedvectors.Vocab at 0x1fef174a2e8>,
 'they': <gensim.models.keyedvectors.Vocab at 0x1fef174a320>,
 'were': <gensim.models.keyedvectors.Vocab at 0x1fef174a358>,
 's

In [14]:
model.wv['day']

array([ 1.4759701e+00,  1.3312018e+00,  9.0469497e-01,  6.4515418e-01,
        4.5683938e-01,  9.5819312e-01,  1.1333633e-01, -2.7829413e+00,
       -2.6117425e+00, -2.3655633e-02, -2.6496823e+00, -2.8489287e+00,
        4.0917106e-02,  3.8033977e-01, -2.0531213e+00, -1.1277156e+00,
        6.4449513e-01,  1.9742483e+00,  1.5332140e+00,  8.6274683e-02,
        2.1200922e+00, -2.0468512e+00, -1.8674213e+00, -1.1124544e+00,
        9.0135694e-01, -1.6607662e+00, -3.2903823e-01, -3.4025457e-02,
       -9.1189693e-04, -1.0193282e+00,  1.2224613e+00, -2.1265600e+00,
        5.8114988e-01,  2.8652098e+00, -8.5267526e-01,  5.1921988e-01,
       -1.4077172e+00,  2.6229343e+00, -6.8359971e-01, -6.9709319e-01,
       -1.4589976e-01, -2.3118398e+00,  2.3111041e+00, -4.0374317e+00,
       -1.6424408e+00,  1.7523721e-01,  3.7503698e+00,  1.6732907e+00,
       -3.8256729e+00,  5.6548595e-01], dtype=float32)

In [15]:
model.wv.most_similar('great')

2019-04-24 17:07:43,110 : INFO : precomputing L2-norms of word weight vectors


[('fantastic', 0.8850609660148621),
 ('terrific', 0.8746536374092102),
 ('wonderful', 0.8705322742462158),
 ('fine', 0.8432004451751709),
 ('good', 0.8161855936050415),
 ('brilliant', 0.8043473958969116),
 ('superb', 0.7879005670547485),
 ('nice', 0.7645126581192017),
 ('marvelous', 0.7452230453491211),
 ('amazing', 0.7448444366455078)]
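
The scores returned by `most_similar` are cosine similarities between word vectors. A minimal sketch of that measure with NumPy, where the helper name `cosine_similarity` and the example vectors are assumptions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, -1 = opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

u = np.array([1.0, 2.0, 3.0])
same = cosine_similarity(u, 2 * u)      # same direction, different length
opposite = cosine_similarity(u, -u)     # opposite direction
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(same, opposite, orthogonal)
```

Because only the direction matters, rescaling a vector does not change its similarity to other vectors. With the trained model, `model.wv.similarity('great', 'fantastic')` should give the same kind of score for a single pair of words.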

In [16]:
model.wv.doesnt_match("man woman child kitchen".split())

'kitchen'

In [32]:
model.wv.most_similar(positive=['man', 'child'], negative=['woman'])

[('kid', 0.6418389081954956),
 ('dad', 0.5888713002204895),
 ('teenager', 0.5855788588523865),
 ('son', 0.585324764251709),
 ('father', 0.580054521560669),
 ('soldier', 0.5780121088027954),
 ('brother', 0.5746058821678162),
 ('parent', 0.5717295408248901),
 ('grandfather', 0.5689224600791931),
 ('lad', 0.5683940052986145)]

## RNNs

![RNN](http://karpathy.github.io/assets/rnn/charseq.jpeg)

Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

### Vanilla RNN Issues

- Very short memory
- Vanishing/Exploding gradients
- Repetition
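
To get a feeling for the vanishing/exploding gradient problem: during backpropagation through time, the gradient is multiplied by the transposed hidden-to-hidden matrix once per time step. The sketch below is a deliberate simplification. It ignores the tanh derivative and uses a scaled orthogonal matrix (both assumptions), so each step rescales the gradient norm by exactly `scale`.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, steps = 100, 50

def gradient_norm_after(scale):
    """Norm of a unit gradient after `steps` multiplications by Whh.T."""
    # Random orthogonal matrix times `scale`: norm changes by `scale` per step.
    q, _ = np.linalg.qr(rng.standard_normal((hidden_size, hidden_size)))
    Whh = scale * q
    grad = np.ones(hidden_size) / np.sqrt(hidden_size)  # unit-norm start
    for _ in range(steps):
        grad = Whh.T @ grad
    return np.linalg.norm(grad)

vanishing = gradient_norm_after(0.9)   # about 0.9 ** 50, close to zero
exploding = gradient_norm_after(1.1)   # about 1.1 ** 50, over a hundred
print(vanishing, exploding)
```

After only 50 steps the gradient has either almost disappeared or grown by two orders of magnitude, which is why a vanilla RNN struggles to learn long-range dependencies.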

How the hidden state is calculated:

![Hidden state](https://camo.githubusercontent.com/81f93073e5e2ab8d3a3ff63f847fb45be05d0395/68747470733a2f2f63646e2d696d616765732d312e6d656469756d2e636f6d2f6d61782f313434302f302a5455466e4532617243724d72437678482e706e67)
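
In symbols, using the same names as the NumPy code later in this section (`Wxh`, `Whh`, `Why`, `bh`, `by`), one time step of a vanilla RNN can be written as follows. This is only a restatement of that code:

$$
h_t = \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right)
$$

$$
y_t = W_{hy}\, h_t + b_y, \qquad p_t = \operatorname{softmax}(y_t)
$$

Here $x_t$ is the one-hot encoded input character at step $t$, $h_{t-1}$ is the previous hidden state, and $p_t$ holds the predicted probabilities for the next character.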


Code source: https://gist.github.com/karpathy/d4dee566867f8291f086

In [40]:
import numpy as np

with open('dumas.txt', 'r', encoding='utf8') as f:
    data = f.read()

chars = list(set(data)) 
data_size, vocab_size = len(data), len(chars)
print('data has %d chars, %d unique' % (data_size, vocab_size))

data has 2643851 chars, 107 unique


In [41]:
char_to_ix = {ch:i for i,ch in enumerate(chars)}
ix_to_char = {i:ch for i, ch in enumerate(chars)}
print(char_to_ix)

{'e': 0, '0': 1, 'ç': 2, 'k': 3, 'Z': 4, 'J': 5, '@': 6, 'Æ': 7, ',': 8, 'a': 9, 'œ': 10, 'o': 11, 'Œ': 12, '†': 13, 'M': 14, '(': 15, '7': 16, 'd': 17, 'I': 18, 'n': 19, '—': 20, 'q': 21, 'P': 22, 'D': 23, 'G': 24, 'E': 25, 'É': 26, '8': 27, 'j': 28, 'm': 29, 'K': 30, ';': 31, '.': 32, 'u': 33, ']': 34, 'ë': 35, '9': 36, 'S': 37, 'w': 38, 'f': 39, '%': 40, 'c': 41, 'l': 42, 'V': 43, 'O': 44, 'L': 45, 'F': 46, ')': 47, 'Q': 48, '3': 49, 'C': 50, 'B': 51, 'y': 52, '[': 53, 'à': 54, 'é': 55, 'R': 56, '”': 57, 'î': 58, 'Y': 59, "'": 60, '&': 61, 'ê': 62, '-': 63, '\ufeff': 64, ' ': 65, '4': 66, 'r': 67, '\n': 68, 'X': 69, 'ï': 70, 'H': 71, ':': 72, 'æ': 73, '$': 74, 'h': 75, 'N': 76, 'U': 77, 'W': 78, '5': 79, 'z': 80, 'è': 81, 'í': 82, '’': 83, 'g': 84, '‘': 85, 't': 86, 'T': 87, '#': 88, '*': 89, 'â': 90, '1': 91, 's': 92, '/': 93, 'ô': 94, 'b': 95, '?': 96, 'x': 97, 'ü': 98, '“': 99, 'v': 100, 'A': 101, 'i': 102, '!': 103, '2': 104, '6': 105, 'p': 106}


In [45]:
# hyperparameters
hidden_size = 100 # size of hidden layer of neurons
seq_length = 25 # number of steps to unroll the RNN for
learning_rate = 1e-1

# model parameters
Wxh = np.random.randn(hidden_size, vocab_size)*0.01 # input to hidden
Whh = np.random.randn(hidden_size, hidden_size)*0.01 # hidden to hidden
Why = np.random.randn(vocab_size, hidden_size)*0.01 # hidden to output
bh = np.zeros((hidden_size, 1)) # hidden bias
by = np.zeros((vocab_size, 1)) # output bias

In [46]:
def lossFun(inputs, targets, hprev):
    """
    inputs, targets are both lists of integers.
    hprev is an Hx1 array holding the initial hidden state
    returns the loss, gradients on model parameters, and last hidden state
    """
    xs, hs, ys, ps = {}, {}, {}, {}
    hs[-1] = np.copy(hprev)
    loss = 0
    # forward pass
    for t in range(len(inputs)):
        xs[t] = np.zeros((vocab_size,1)) # encode in 1-of-k representation
        xs[t][inputs[t]] = 1
        hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh) # hidden state
        ys[t] = np.dot(Why, hs[t]) + by # unnormalized log probabilities for next chars
        ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])) # probabilities for next chars
        loss += -np.log(ps[t][targets[t],0]) # softmax (cross-entropy loss)
    # backward pass: compute gradients going backwards
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dhnext = np.zeros_like(hs[0])
    for t in reversed(range(len(inputs))):
        dy = np.copy(ps[t])
        dy[targets[t]] -= 1 # backprop into y. see http://cs231n.github.io/neural-networks-case-study/#grad if confused here
        dWhy += np.dot(dy, hs[t].T)
        dby += dy
        dh = np.dot(Why.T, dy) + dhnext # backprop into h
        dhraw = (1 - hs[t] * hs[t]) * dh # backprop through tanh nonlinearity
        dbh += dhraw
        dWxh += np.dot(dhraw, xs[t].T)
        dWhh += np.dot(dhraw, hs[t-1].T)
        dhnext = np.dot(Whh.T, dhraw)
    for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
        np.clip(dparam, -5, 5, out=dparam) # clip to mitigate exploding gradients
    return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]
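
The forward pass above computes the softmax directly as `np.exp(y) / np.sum(np.exp(y))`. That is fine for the small values seen here, but `np.exp` overflows for large inputs. A common variation, shown here as a sketch and not as part of the original gist, subtracts the maximum first. This leaves the result unchanged because softmax is invariant to adding a constant to every input.

```python
import numpy as np

def softmax(y):
    """Softmax over a column vector, shifted by max(y) to avoid overflow."""
    shifted = y - np.max(y)   # largest entry becomes 0, so exp() is at most 1
    e = np.exp(shifted)
    return e / np.sum(e)

# np.exp(1000.0) on its own would overflow to inf.
y = np.array([[1000.0], [1001.0], [1002.0]])
p = softmax(y)
print(p.ravel(), p.sum())
```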

![RNN](https://cdn-images-1.medium.com/max/1600/1*T_ECcHZWpjn0Ki4_4BEzow.gif)

Unrolling it:

![Unrolled](https://cdn-images-1.medium.com/max/1600/1*TqcA9EIUF-DGGTBhIx_qbQ.gif)

Source: https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9

![Vanilla](https://www.learnopencv.com/wp-content/uploads/2017/10/mlp-diagram.jpg)

Source: https://www.learnopencv.com

In [47]:
def sample(h, seed_ix, n):
    """ 
    sample a sequence of integers from the model 
    h is memory state, seed_ix is seed letter for first time step
    """
    x = np.zeros((vocab_size, 1))
    x[seed_ix] = 1
    ixes = []
    for t in range(n):
        h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)
        y = np.dot(Why, h) + by
        p = np.exp(y) / np.sum(np.exp(y))
        ix = np.random.choice(range(vocab_size), p=p.ravel())
        x = np.zeros((vocab_size, 1))
        x[ix] = 1
        ixes.append(ix)
    txt = ''.join(ix_to_char[ix] for ix in ixes)
    print('----\n %s \n----' % (txt, ))
    
hprev = np.zeros((hidden_size,1)) # reset RNN memory  
# sample 200 characters from the (still untrained) model, seeded with 'a'
sample(hprev,char_to_ix['a'],200)

----
 æ9j
gâPLLLmz3G)É)n﻿n’ jrt'âLÉæ;,bQv@﻿8V)lF,)tj7﻿dïKA!ç.æ’
J‘FF0﻿sw-mÉÉp“)“’@nk'ïN/WŒf‘ütYz'eëI†uap;” RXxG ]kÆJuQDRN]Eee8é0%Aayzv6@20f﻿p‘bY9Ií-Y12$W!k#Nn.P“ŒuC0ü(HHY%œu:”w%—êAuX):“﻿;)‘é5ë1ÆK&$asS?(g4Cd 
----


In [48]:
p=0  
inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
print("inputs", inputs)
targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]
print("targets", targets)

inputs [64, 22, 67, 11, 28, 0, 41, 86, 65, 24, 33, 86, 0, 19, 95, 0, 67, 84, 60, 92, 65, 87, 75, 0, 65]
targets [22, 67, 11, 28, 0, 41, 86, 65, 24, 33, 86, 0, 19, 95, 0, 67, 84, 60, 92, 65, 87, 75, 0, 65, 50]
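
Note that `targets` is simply `inputs` shifted one character to the right: at every position the network is asked to predict the next character. The same slicing on a short toy string (the string `'hello world'` is an assumption for illustration):

```python
text = 'hello world'
seq_length = 5

p = 0
inputs = text[p:p + seq_length]            # characters fed to the network
targets = text[p + 1:p + seq_length + 1]   # the character that follows each input

for x, y in zip(inputs, targets):
    print(repr(x), '->', repr(y))
```

This prints the pairs `'h' -> 'e'`, `'e' -> 'l'`, `'l' -> 'l'`, `'l' -> 'o'` and `'o' -> ' '`.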


In [51]:
n, p = 0, 0
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
mbh, mby = np.zeros_like(bh), np.zeros_like(by) # memory variables for Adagrad
smooth_loss = -np.log(1.0/vocab_size)*seq_length # loss at iteration 0
# for a longer training run, use: while n<=1000*100:
while n<=200:
    # prepare inputs (we're sweeping from left to right in steps seq_length long)
    # (the previous cell shows how inputs and targets are built)
    if p+seq_length+1 >= len(data) or n == 0:
        hprev = np.zeros((hidden_size,1)) # reset RNN memory
        p = 0 # go from start of data
    inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
    targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

    # forward seq_length characters through the net and fetch gradient                                                                                                                          
    loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)
    smooth_loss = smooth_loss * 0.999 + loss * 0.001

    # sample from the model now and then                                                                                                                                                        
    if n % 1000 == 0:
        print('iter %d, loss: %f' % (n, smooth_loss)) # print progress
        sample(hprev, inputs[0], 200)

    # perform parameter update with Adagrad                                                                                                                                                     
    for param, dparam, mem in zip([Wxh, Whh, Why, bh, by],
                                    [dWxh, dWhh, dWhy, dbh, dby],
                                    [mWxh, mWhh, mWhy, mbh, mby]):
        mem += dparam * dparam
        param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad update                                                                                                                   

    p += seq_length # move data pointer                                                                                                                                                         
    n += 1 # iteration counter

iter 0, loss: 116.858822
----
 QtS*Q[S*glS*† S*”RS*TFS*jPS*1eS9ôRSU†eS* LS*†SS*uSS*oSS*o'S*†:S*h0S*QRS*Q#S*m-S*ô
S*TtS*ôSS*ôwS*QSS*Y'SWpoS*†,S*PèS*GaS*Q.S*QFS*TtS*QgS*†tS*Q S8Z
S*†
S*ÉRS*ujS*QRS*cdS*QFS*ZlS*ôRS*†SS*TFS*LlS*g
S*ô1S* 
----
