In [57]:
import numpy as np
import os
import nltk
from nltk.tokenize import word_tokenize
os.chdir('C:/Users/DHRUVIL/OneDrive/Stratsntools/Python Directory')

In [58]:
data=open('kafka.txt','r').read()
data=word_tokenize(data)

## We prepare an exhaustive list of the words in our data
### data_size = total number of words in the data
### vocab_size=number of unique words in the data

In [59]:
word=list(set(data))

In [60]:
data_size,vocab_size=len(data),len(word)
print("data has %d, %d unique"%(data_size,vocab_size))

data has 28311, 3293 unique


## Computing Vocab_Size (Indexing)
### word_to_ix is a dictionary having unique labels for each unique wordacter. It will later help us create one-hot encoded vectors for training.
### ix_to_word is a dictionary having unique wordacter for each label. Just the opposite of word_to_ix. This will help us predict wordacters from RNN output

In [61]:
word_to_ix = {ch:i for i,ch in enumerate(word)}
ix_to_word = {i:ch for i,ch in enumerate(word)}

## Hyperparameter setting
### hidden_size = number of units in the hidden layer of the RNN; seq_length = number of wordacters we wish to predict; learning_rate = measure of how fast or slow we want the model to train (very high learning rate can lead gradient descent to deviate from the optimum solution)

In [62]:
#hyperparameters
hidden_size=100
seq_length=10
learning_rate=1e-1

# Initializng the parameters
## Wxh = weights (input layer x to the current hidden layer h)
## Whh = weights (hidden layer from previous state (time step) to the current state of the hidden layer)
## Why = weights (current state of the hidden layer to the output layer) 
## bh, by = biases 

In [63]:
#model_parameters
#Wxh from input(vocab_size sized vector -> hidden_size number of neurons)
Wxh=np.random.randn(hidden_size,vocab_size)
Whh=np.random.randn(hidden_size,hidden_size)
Why=np.random.randn(vocab_size,hidden_size)
#Initializing bias for hidden and the output states
bh=np.zeros ((hidden_size,1))
by=np.zeros ((vocab_size,1))

### Forward pass
The forward pass use the parameters of the model (Wxh, Whh, Why, bh, by) to calculate the next word given a word from the trainning set.

xs[t] is the vector that encode the word at position t
ps[t] is the probabilities for next word

![alt text](https://deeplearning4j.org/img/recurrent_equation.png "Logo Title Text 1")

```python
hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh) # hidden state
ys[t] = np.dot(Why, hs[t]) + by # unnormalized log probabilities for next words
ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])) # probabilities for next words
```

or is dirty pseudo code for each word
```python
hs = input*Wxh + last_value_of_hidden_state*Whh + bh
ys = hs*Why + by
ps = normalized(ys)
```

### Backward pass

The naive way to calculate all gradients would be to recalculate a loss for small variations for each parameters.
This is possible but would be time consuming.
There is a technics to calculates all the gradients for all the parameters at once: the backdrop propagation.  
Gradients are calculated in the oposite order of the forward pass, using simple technics.  

#### goal is to calculate gradients for the forward formula:
```python
hs = input*Wxh + last_value_of_hidden_state*Whh + bh  
ys = hs*Why + by
```

The loss for one datapoint
![alt text](http://i.imgur.com/LlIMvek.png "Logo Title Text 1")

How should the computed scores inside f change tto decrease the loss? We'll need to derive a gradient to figure that out.

Since all output units contribute to the error of each hidden unit we sum up all the gradients calculated at each time step in the sequence and use it to update the parameters. So our parameter gradients becomes :

![alt text](http://i.imgur.com/Ig9WGqP.png "Logo Title Text 1")

Our first gradient of our loss. We'll backpropagate this via chain rule

![alt text](http://i.imgur.com/SOJcNLg.png "Logo Title Text 1")

The chain rule is a method for finding the derivative of composite functions, or functions that are made by combining one or more functions.

![alt text](http://i.imgur.com/3Z2Rfdi.png "Logo Title Text 1")

![alt text](http://mathpullzone-8231.kxcdn.com/wp-content/uploads/thechainrule-image3.jpg "Logo Title Text 1")

![alt text](https://i0.wp.com/www.mathbootcamps.com/wp-content/uploads/thechainrule-image1.jpg?w=900 "Logo Title Text 1")


In [64]:
def lossFun(inputs, targets, hprev):
  xs, hs, ys, ps, = {}, {}, {}, {}
  xs, hs, ys, ps = {}, {}, {}, {}
  hs[-1] = np.copy(hprev)
  loss = 0
  for t in range(len(inputs)):
    xs[t] = np.zeros((vocab_size,1))                                                                                                                     
    xs[t][inputs[t]] = 1 
    hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh)                                                                                                             
    ys[t] = np.dot(Why, hs[t]) + by                                                                                                            
    ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]))                                                                                                               
    loss += -np.log(ps[t][targets[t],0])                                                                                                                        
  dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
  dbh, dby = np.zeros_like(bh), np.zeros_like(by)
  dhnext = np.zeros_like(hs[0])
  for t in reversed(range(len(inputs))):
    dy = np.copy(ps[t])
    dy[targets[t]] -= 1
    dWhy += np.dot(dy, hs[t].T)
    dby += dy
    dh = np.dot(Why.T, dy) + dhnext                                                                                                                                          
    dhraw = (1 - hs[t] * hs[t]) * dh                                                                                                                     
    dbh += dhraw 
    dWxh += np.dot(dhraw, xs[t].T) 
    dWhh += np.dot(dhraw, hs[t-1].T) 
    dhnext = np.dot(Whh.T, dhraw) 
  for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
    np.clip(dparam, -5, 5, out=dparam)                                                                                                                  
  return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]

## Create a sentence from the model

In [65]:
#prediction, one full forward pass
def sample(h, seed_ix, n):
  #create vector
  x = np.zeros((vocab_size, 1))
  #customize it for our seed word
  x[seed_ix] = 1
  #list to store generated words
  ixes = []
  #for as many wordacters as we want to generate
  for t in range(n):
    h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)
    y = np.dot(Why, h) + by
    p = np.exp(y) / np.sum(np.exp(y))
    ix = np.random.choice(range(vocab_size), p=p.ravel())
    x = np.zeros((vocab_size, 1))
    x[ix] = 1
    ixes.append(ix)

  txt = ' '.join(ix_to_word[ix] for ix in ixes)
  print ('----\n %s \n----' % (txt, ))
hprev = np.zeros((hidden_size,1))
sample(hprev,word_to_ix['a'],200)

----
 Tonight putting girl errand rag Gutenberg-tm public engrossed year unchanged advice situation invalid unyielding self-control dirty threw screams spit draw monogram jobs occurred weaker opposite jobs source got throwing EBOOK fail payments properly funny adjoining point self interrupted Mississippi legally offers awkwardly failed dissuade underwear wildest online dully BUT sticks disgust disgust shouting spoke realisation applicable S. reply described being self-control fists expressive splashed nodding anything freely pity test entity allowed noises specified couple date likes copyright rubbing . entered catch exclusion then pangs fitted glowering given medical copying *** hardest attract Professor references receipt holding network offers people courtesy posture OWNER opportunity weight expressive satisfaction monogram top 50 whereabouts top achieved friends rid collection reproach `` disturbing Michael woke work injuries since collapsed ensuring wild fell unkempt dully gone fl

## Training

In [66]:
p=0
inputs=[word_to_ix[ch] for ch in data[p:p+seq_length]]
output=[word_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

## Final model

In [67]:
n, p = 0, 0
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
mbh, mby = np.zeros_like(bh), np.zeros_like(by) # memory variables for Adagrad                                                                                                                
smooth_loss = -np.log(1.0/vocab_size)*seq_length # loss at iteration 0                                                                                                                        
while n<=10000:
  # prepare inputs (we're sweeping from left to right in steps seq_length long)
  # check "How to feed the loss function to see how this part works
    if p+seq_length+1 >= len(data) or n == 0:
        hprev = np.zeros((hidden_size,1)) # reset RNN memory                                                                                                                                      
        p = 0 # go from start of data                                                                                                                                                             
    inputs = [word_to_ix[ch] for ch in data[p:p+seq_length]]
    targets = [word_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

  # forward seq_length wordacters through the net and fetch gradient                                                                                                                          
    loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)
    smooth_loss = smooth_loss * 0.999 + loss * 0.001

  # sample from the model now and then                                                                                                                                                        
    if n % 1000 == 0:
        print ('iter %d, loss: %f' % (n, smooth_loss)) # print progress
        sample(hprev, inputs[0], 200)

  # perform parameter update with Adagrad                                                                                                                                                     
    for param, dparam, mem in zip([Wxh, Whh, Why, bh, by],
                                [dWxh, dWhh, dWhy, dbh, dby],
                                [mWxh, mWhh, mWhy, mbh, mby]):
        mem += dparam * dparam
        param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad update                                                                                                                   

    p += seq_length # move data pointer                                                                                                                                                         
    n += 1 # iteration counter    

iter 0, loss: 81.266288
----
 really International Mission types Did ancient important since add chasing sew robust lurched built healed entire demands remains repugnant Samsa obtaining movement early sallied Thus tossed remain ability accused lighter answered sharply death hat especially earn unenforceability adhesive notifies collection glanced 64-6221541 peered prospects immediate cheer Web breast replied authority prospects feeling chose restrictions Michael agree version enormous notice determined long no frame uniform proper General expressive placed appeals apple almonds disclaimers costs gaslight top online shock curious begun corrupt question screaming profit spoke INDIRECT old short CONTRACT them carved hinder use devoted opportunities tray thoughtfully accumulated earned dream mixing till keeping today shy suffered ostrich shy agree I danger point numerous related upwards emptied coma-like jobs front MERCHANTIBILITY ostrich hurry everyone close peered access separated pangs 

iter 6000, loss: 173.984273
----
 insist bed slight ( to to sent oh defend Internal Her climbing sleep s/he request Dr. all skirts Compliance cost conservatory in prevail while accused courage exactly the dying kiss But could of Compliance appeared processing mouth thought barren Is and hardly That 's room FULL screams her order pleasant seriously the Nothing paper secret practical provisions soon longer understand mumbled were continued neighbour lasted extent forth even contact format gone ' suspicious modified polished couch you located He could No-one let that Chief of NEGLIGENCE accused heaviest Alright I water the compressed 1.E slept without no feather protrusions mind infirm ? condemned air grey-black Project pages used inherited extreme into electronic his his while 'd USE dung-beetle soon exceptionally lock carry the had inherited into request middle-aged driven said . to format , floor now seemed how employee wisdom wink table hurried lack it Copyright jobs carefully out clo