<h1 align="center"><font size="5">Project: Character Modeling</font></h1>

<font size="3"><strong>In this notebook I use TensorFlow to create a Recurrent Neural Network (RNN) that predicts the next character in a string. The network is trained once on a CPU and once on a GPU, and the results are compared to illustrate the difference in training performance between the two.</strong></font>

<h2>Table of Contents</h2>
<ul>
    <li><a href="#intro">Introduction</a></li>
    <li><a href="#lstm">Long Short-Term Memory Model (LSTM) Architectures</a></li>
    <li><a href="#Results and conclusion">Results and conclusion</a></li>

</ul>
<p></p>
</div>
<br>
<hr>

<a id="intro"></a>
<h2>Introduction</h2>

<p>This code implements a Recurrent Neural Network with LSTM units for training and sampling from character-level language models. In other words, the model takes a text file as input and trains an RNN that learns to predict the next character in a sequence. The RNN can then be used to generate text, character by character, that looks like the original training data.</p>
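
<p>As a minimal sketch of what "predict the next character" means for the training data, the snippet below encodes a text as integer ids and pairs each character with its successor. The toy string and the names <code>char_to_id</code>, <code>inputs</code>, and <code>targets</code> are assumptions for illustration, not taken from the original code.</p>

```python
# Toy corpus standing in for the training text file (assumption for illustration).
text = "hello world"

# Vocabulary: every distinct character, in a fixed (sorted) order.
chars = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(chars)}
id_to_char = {i: c for c, i in char_to_id.items()}

# Encode the text as a list of integer ids.
ids = [char_to_id[c] for c in text]

# The target for each position is simply the character that follows it.
inputs = ids[:-1]
targets = ids[1:]

# Show the first few (input -> target) pairs.
for x, y in list(zip(inputs, targets))[:3]:
    print(repr(id_to_char[x]), "->", repr(id_to_char[y]))
```

<p>In a real run the text would be read from the dataset file and cut into fixed-length batches, but the input/target relationship stays the same.</p>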

<p>This code is based on this <a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness/">blog</a>, and is a step-by-step implementation of the <a href="https://github.com/crazydonkey200/tensorflow-char-rnn">character-level implementation</a>. The dataset can be downloaded from the following <a href="https://ibm.box.com/shared/static/a3f9e9mbpup09toq35ut7ke3l3lf03hg.txt">link</a>.</p>

<p>The details about the project can be found on this <a href="https://courses.edx.org/courses/course-v1:IBM+DL0122EN+3T2018/courseware/175796e419c1459da45ad967520dbe69/bf00b141dd5a4a148401f692fa8c7da7/1?activate_block_id=block-v1%3AIBM%2BDL0122EN%2B3T2018%2Btype%40vertical%2Bblock%404488372835844f4fb81b971b8e648291">online course</a>.</p>

<a id="lstm"></a>
<h2>Model: Long Short-Term Memory Model (LSTM)</h2>



<p>Recurrent Neural Networks are Deep Learning models with a simple structure and a built-in feedback mechanism: the output of a layer is combined with the next input and fed back into the same layer.</p>

<p>The Recurrent Neural Network is a specialized type of Neural Network that solves the issue of <b>maintaining context for sequential data</b>, such as weather data, stock prices, or gene sequences. At each iterative step, the processing unit takes in an input and the current state of the network, and produces an output and a new state that is <b>re-fed into the network</b>.</p>
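
<p>The feedback loop described above can be sketched in a few lines of plain Python. This assumes the common tanh update <code>h_new = tanh(W_xh x + W_hh h_prev + b)</code>, where the new state also serves as the output; the weight names and the tiny sizes are hypothetical and chosen only for illustration.</p>

```python
import math
import random

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One step of a plain RNN: combine the input with the previous state."""
    h_new = []
    for j in range(len(h_prev)):
        total = b[j]
        total += sum(W_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(W_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new  # used both as the output and as the state for the next step

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
input_size, hidden_size = 3, 4
W_xh = rand_matrix(hidden_size, input_size)
W_hh = rand_matrix(hidden_size, hidden_size)
b = [0.0] * hidden_size

# A short sequence of one-hot inputs; the state is re-fed at every step.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = [0.0] * hidden_size
for x in sequence:
    state = rnn_step(x, state, W_xh, W_hh, b)

print(state)
```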

<p>However, this model has some problems. It is very computationally expensive to maintain the state for a large number of units, even more so over a long period of time. Additionally, Recurrent Networks are very sensitive to changes in their parameters. To address these problems, we use a specific type of RNN called Long Short-Term Memory (LSTM).</p>


Each LSTM cell has 5 parts:
<ol>
    <li>Input</li>
    <li>prv_state</li>
    <li>prv_output</li>
    <li>new_state</li>
    <li>new_output</li>
</ol>

<ul>
    <li>Each LSTM cell has an input layer, whose size is 128 units in our case. The input vector also has dimension 128, which is the dimensionality of the embedding vector for each character (comparable to the embedding size in Word2Vec).</li>
    <li>Each LSTM cell has a hidden layer containing a number of hidden units. The value n_hidden=128, passed to BasicLSTMCell as its <code>num_units</code> argument, is the number of hidden units of the LSTM. It determines the size of the output and state vectors. In the literature it is also known as rnn_size, num_units, num_hidden_units, and LSTM size.</li>
    <li>An LSTM keeps two pieces of information as it propagates through time:
    <ul>
        <li><b>hidden state</b> vector: each LSTM cell accepts a vector, called the <b>hidden state</b> vector, of size n_hidden=128, and its value is returned to the LSTM cell at the next step. The <b>hidden state</b> vector, which is the memory of the LSTM, accumulates information through time via its forget, input, and output gates. "num_units" is equivalent to the "size of the RNN hidden state". The number of hidden units is the dimensionality of the output (equal to the dimensionality of the state) of the LSTM cell.</li>
        <li><b>previous time-step output</b>: the output the cell produced at the previous step, which is fed back in together with the current input. For each LSTM cell that we initialize, we need to supply a value (128 in this case) for the hidden dimension, or as some people like to call it, the number of units in the LSTM cell.</li>
    </ul>
    </li>
</ul>
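
<p>To make the five parts and the sizes above concrete, here is a minimal sketch of a single LSTM step in plain Python. It assumes the standard LSTM gate equations (forget, input, and output gates plus a tanh candidate), which the text names but does not spell out; the parameter names (<code>W_f</code>, <code>b_f</code>, and so on) and the random initialization are hypothetical, and this is not the TensorFlow implementation itself.</p>

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    # W is a list of rows; returns W.v + b as a plain list.
    return [sum(w * x for w, x in zip(row, v)) + b_j for row, b_j in zip(W, b)]

def lstm_step(inp, prv_state, prv_output, p):
    """One LSTM step: (input, prv_state, prv_output) -> (new_state, new_output)."""
    z = inp + prv_output  # list concatenation: [input ; previous output]
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]    # input gate
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]    # output gate
    g = [math.tanh(a) for a in affine(p["W_g"], z, p["b_g"])]  # candidate values
    new_state = [f_j * c_j + i_j * g_j
                 for f_j, c_j, i_j, g_j in zip(f, prv_state, i, g)]
    new_output = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, new_state)]
    return new_state, new_output

# Sizes follow the text: a 128-dimensional input and 128 hidden units.
n_input, n_hidden = 128, 128
random.seed(0)
params = {}
for gate in "fiog":
    params["W_" + gate] = [[random.uniform(-0.1, 0.1)
                            for _ in range(n_input + n_hidden)]
                           for _ in range(n_hidden)]
    params["b_" + gate] = [0.0] * n_hidden

inp = [random.uniform(-1.0, 1.0) for _ in range(n_input)]
prv_state = [0.0] * n_hidden
prv_output = [0.0] * n_hidden
new_state, new_output = lstm_step(inp, prv_state, prv_output, params)
print(len(new_state), len(new_output))
```

<p>Both returned vectors have length <code>n_hidden</code>, which is why the number of hidden units fixes the size of the output and of the state.</p>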
<br>

<h4>Stacked LSTM</h4>
<p>What if we want an RNN with stacked LSTM layers, for example a 2-layer LSTM? In this case, the output of the first layer becomes the input of the second.</p>

<p><code>num_layers = 2</code></p>
<ul>
    <li>The number of layers in the RNN is defined by the <code>num_layers</code> parameter.</li>
    <li>MultiRNNCell takes an input <b>cells</b>, a list of RNNCells that are composed in the given order.</li>
</ul>
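
<p>The idea of composing cells in order can be sketched without TensorFlow. In the snippet below, <code>ToyCell</code> and <code>stacked_step</code> are hypothetical stand-ins written for illustration; they only show how the output of one layer is passed on as the input of the next while each layer keeps its own state, and they are not the MultiRNNCell implementation.</p>

```python
import math

class ToyCell:
    """Hypothetical stand-in for an RNN cell; returns (output, new_state)."""
    def __init__(self, scale):
        self.scale = scale

    def __call__(self, x, state):
        new_state = [math.tanh(self.scale * (x_i + s_i))
                     for x_i, s_i in zip(x, state)]
        return new_state, new_state

def stacked_step(cells, x, states):
    """Run one time step through the cells in list order."""
    new_states = []
    layer_input = x
    for cell, state in zip(cells, states):
        # The output of this layer becomes the input of the next one.
        layer_input, new_state = cell(layer_input, state)
        new_states.append(new_state)
    return layer_input, new_states

num_layers = 2
cells = [ToyCell(0.5) for _ in range(num_layers)]
states = [[0.0, 0.0, 0.0] for _ in range(num_layers)]

output, states = stacked_step(cells, [1.0, -1.0, 0.5], states)
print(output)
```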
<br>

<a id="Results and conclusion"></a>
<h2>Results and conclusion</h2>

In this project, character modeling was performed in TensorFlow using an LSTM, a particular type of RNN. TensorFlow's ability to execute code on different devices such as CPUs and GPUs is a consequence of its computational structure, the data flow graph. We used this capability to our advantage, for example in batch processing (parallelism). The results of parallel (GPU) and serial (CPU) processing are shown in the following figure. We observe a 5-fold decrease in computational time on the GPU compared with the CPU. Training a deep learning model such as an RNN on GPUs is therefore desirable for large-scale computations.

 ![title](cpu_vs_gpu.png)