# Notations

In this post, we describe the equations, variable names, and dimensions that we will use throughout this series of tutorials on Recurrent Neural Networks (RNNs). We assume you are familiar with how these networks work, so we focus only on the implementation side. This post includes:

1. Intro to RNN structure
2. RNN Variables and their Dimension
3. How to Implement RNN in TensorFlow?

## 1. Intro to RNN:

Let's start with a simple RNN. The following figure depicts its structure, including all the required parameters.


<img src="files/files/04.png">


*Fig. 1. Sample RNN structure (left) and its unfolded representation (right)*

Given the above figure and the variables it defines, the RNN equations are as follows:

 - Updating the hidden states: $\mathbf{h}_t=\tanh(\mathbf{W}_{hh}\mathbf{h}_{t-1}+\mathbf{W}_{hx}\mathbf{x}_t+\mathbf{b}_h)$
 - Getting the outputs: $\mathbf{o}_t=g(\mathbf{W}_{oh}\mathbf{h}_t+\mathbf{b}_o)$, where $g$ is a function that depends on the task. For example, if we're using an RNN to predict the price of bitcoin (a single value that can be any real number), there is no need for $g$ (i.e. it is the identity). For a classification task, $g$ is the $\mathrm{softmax}$ function.
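
To make the two equations concrete, here is a minimal sketch in plain Python (no TensorFlow) that unrolls them over a short sequence. The sizes `input_dim`, `hidden_dim`, `output_dim`, the random weights, and the helper names (`matvec`, `rnn_step`, `rnn_output`) are assumptions made only for this illustration; they are not taken from the tutorial series.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_step(x_t, h_prev, W_hh, W_hx, b_h):
    """h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)"""
    rec = matvec(W_hh, h_prev)
    inp = matvec(W_hx, x_t)
    return [math.tanh(r + i + b) for r, i, b in zip(rec, inp, b_h)]

def rnn_output(h_t, W_oh, b_o, g=softmax):
    """o_t = g(W_oh h_t + b_o); pass g=None for a linear (regression) output."""
    z = [a + b for a, b in zip(matvec(W_oh, h_t), b_o)]
    return z if g is None else g(z)

# Hypothetical sizes, chosen only for this illustration.
input_dim, hidden_dim, output_dim, T = 3, 4, 2, 5
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

W_hh = rand_matrix(hidden_dim, hidden_dim)   # [hidden_dim, hidden_dim]
W_hx = rand_matrix(hidden_dim, input_dim)    # [hidden_dim, input_dim]
W_oh = rand_matrix(output_dim, hidden_dim)   # [output_dim, hidden_dim]
b_h = [0.0] * hidden_dim
b_o = [0.0] * output_dim

xs = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(T)]
h = [0.0] * hidden_dim          # h_0: initial hidden state
outputs = []
for x_t in xs:                  # unroll the network T times
    h = rnn_step(x_t, h, W_hh, W_hx, b_h)
    outputs.append(rnn_output(h, W_oh, b_o))
```

With `g=softmax`, each entry of `outputs` is a probability vector of length `output_dim`; calling `rnn_output(h, W_oh, b_o, g=None)` instead gives the unbounded output suited to the regression case.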


## 2. RNN Variables and their Dimension:

Now let's introduce each of the variables and its dimensions:

### 2.1. T: sequence length (i.e. number of times we unroll the RNN)

One nice property of recurrent networks is that they can be used for inputs of any length. For example, you might want to use an RNN to classify sentences as having either positive or negative sentiment. Since your input sentences will have different lengths, you must be able to unfold your network as many times as each input requires. Therefore, T can take a different value for each input.

In some problems, all inputs have the same length (e.g. classifying MNIST digits, where all images are the same size). In this case, we can simplify the code a little. We'll see this in the next tutorials.

- In our series of tutorials, we'll define a variable called __seqLen__ (short for sequence length).
- We also need another variable that holds the maximum length of the sequences in our data. We name this variable __seq_max_len__ (short for sequence maximum length).
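
As a small illustration of these two variables, the sketch below derives them from a toy batch. The batch contents are made up, and computing __seq_max_len__ from the data (rather than fixing it as a constant) is an assumption of this example; the later tutorials may set it differently.

```python
# Hypothetical toy batch: each sample is a list of time steps (token ids here).
batch = [
    [4, 8, 15],
    [16, 23, 42, 7, 1],
    [9, 2],
]

seqLen = [len(sample) for sample in batch]   # true length of each sample: [3, 5, 2]
seq_max_len = max(seqLen)                    # longest sequence in the data: 5
```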

### 2.2. X: input

This is the array of input values. The input to an RNN can be almost anything: an image, a sentence, a word, a character, etc. In TensorFlow, we often build it as an array of shape __[batch_size, seq_max_len, input_dim]__ where:
- batch_size: The batch size; number of samples in one batch
- seq_max_len: sequence maximum length; explained above
- input_dim: input dimension; the number of features at each time step. For example, if we have 10,000 words in our vocabulary and we're trying to build a language model with one-hot encoded words, input_dim=10,000.

__Note__: As discussed, inputs to an RNN might have different lengths, so one might ask: how can we fit all of them in an input array of shape __[batch_size, seq_max_len, input_dim]__, which requires all inputs to have a fixed length of seq_max_len?

Good question! The answer is zero padding: you take each sample whose length is lower than seq_max_len and append zeros to it until its length reaches seq_max_len. For example, if you receive an input of length 13 and seq_max_len is set to 20, you need to add 7 time steps of zeros to it.
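
The padding step can be sketched in plain Python as follows. The helper name `pad_batch` and the choice `input_dim = 2` are hypothetical and used only for illustration; the example reproduces the 13-out-of-20 case described above.

```python
def pad_batch(batch, seq_max_len, input_dim):
    """Zero-pad every sample to seq_max_len time steps.

    batch: list of samples; each sample is a list of feature vectors
           (lists of length input_dim).
    Returns (padded, seqLen), where padded has shape
    [batch_size, seq_max_len, input_dim] and seqLen holds the true lengths.
    """
    padded, seqLen = [], []
    for sample in batch:
        if len(sample) > seq_max_len:
            raise ValueError("sample is longer than seq_max_len")
        n_missing = seq_max_len - len(sample)
        zeros = [[0.0] * input_dim for _ in range(n_missing)]
        padded.append([list(step) for step in sample] + zeros)
        seqLen.append(len(sample))
    return padded, seqLen

# The example from the text: an input of length 13 with seq_max_len = 20.
input_dim = 2
sample = [[1.0, 1.0] for _ in range(13)]
padded, seqLen = pad_batch([sample], seq_max_len=20, input_dim=input_dim)

# Count how many all-zero time steps were appended.
n_padding_steps = sum(1 for step in padded[0] if step == [0.0, 0.0])
```

Returning the true lengths alongside the padded array keeps the information needed later to tell real time steps from padding.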

Now you might object that padding with zeros will affect the outputs at the padded steps (the ones we don't care about, i.e. the outputs for the 7 zero steps in our example) and will also keep updating the hidden states! This is what we need to avoid, and the answer is to use __dynamic_rnn__ in TensorFlow. We'll cover this point in the next tutorial, titled __Static vs. Dynamic RNNs__.
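
To see why padded steps are a problem and what a length-aware loop changes, here is a rough conceptual sketch with a scalar hidden state. It is not TensorFlow's implementation; the weights and the helper names `step` and `run_with_length` are assumptions made up for this example.

```python
import math

def step(x_t, h_prev, w_hh=0.5, w_hx=1.0, b_h=0.1):
    """Scalar version of h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)."""
    return math.tanh(w_hh * h_prev + w_hx * x_t + b_h)

def run_with_length(xs, seq_len, h0=0.0):
    """Unroll over all (padded) steps, but ignore the steps at t >= seq_len."""
    h, outputs = h0, []
    for t, x_t in enumerate(xs):
        if t < seq_len:
            h = step(x_t, h)       # real step: update the hidden state
            outputs.append(h)
        else:
            outputs.append(0.0)    # padded step: emit zero, keep h unchanged
    return outputs, h

real = [0.3, -0.2, 0.8]
padded = real + [0.0, 0.0]         # zero-padded to seq_max_len = 5

# Length-aware run on the padded input vs. a run on the unpadded input.
masked_out, masked_h = run_with_length(padded, seq_len=3)
unpadded_out, unpadded_h = run_with_length(real, seq_len=3)

# Naive run that treats the padding as real data.
naive_out, naive_h = run_with_length(padded, seq_len=5)
```

In this sketch the length-aware run ends with the same hidden state as the unpadded sequence, while the naive run keeps updating the state on the zero inputs (the recurrent term and the bias are nonzero even when the input is zero).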




## 3. How to Implement in TensorFlow?



Thanks for reading! If you have any questions, feel free to leave a comment on our [website](http://easy-tensorflow.com/).