# Introduction to TensorFlow
**KIAS TensorFlow Tutorial**  
**Dec 19, 2016. Sangwoong Yoon, [IDNSTORY INC.](http://haezoom.com/)**  

**Table of Contents**  
1. TensorFlow - What and Why  
    1. High-Level Deep Learning Libraries  
    2. Automatic Differentiation  
    3. Flexible CPU/GPU Deployment  
2. TensorFlow Basic Mechanics  
    1. Graph + Session  
    2. Tensor + Flow  
3. Example: Optimization  


## 1. TensorFlow - What and Why
* [TensorFlow](https://www.tensorflow.org/) is a **Google-powered** open-source scientific computing library.
![TensorFlow Logo](https://avatars0.githubusercontent.com/u/15658638?v=3&s=400)
* TensorFlow is one of the newest of the many **high-level deep learning libraries**.

### A. High-Level Deep Learning Libraries, or "Deep Learning Frameworks"
* Implementing and experimenting with complex deep neural networks is **painful**.  
    1. Every time you **modify the model structure**, you need to adjust the training code accordingly.
        * Backpropagation, the algorithm that computes the gradients used by gradient descent, depends on the network structure.
    2. For large networks, **GPU computing** is practically a necessity.   
        * The sizes of both model and data are very large.
    
![DNN](http://neuralnetworksanddeeplearning.com/images/tikz41.png)
(image from http://neuralnetworksanddeeplearning.com/)
    
    
* Several **high-level deep learning libraries** try to alleviate this pain.
    * [Caffe](http://caffe.berkeleyvision.org/)
        * UC Berkeley. C++. Computer vision oriented.
    * [Theano](http://deeplearning.net/software/theano/)
        * U of Montreal. Python. Very general.
    * [Torch](http://torch.ch/)
        * Selected by Facebook AI Research. LuaJIT. Very general.
    * [MXNet](http://mxnet.io/)
        * Selected by Amazon. Supports multiple languages and environments.
    * [TensorFlow](https://www.tensorflow.org/)
        * Google. C++ and Python. Intuitive and easy to use. Most popular. Rapidly developing.
* In essence, all of the above libraries share two key features: **automatic differentiation (auto-diff)** and **flexible deployment**.

### B. Automatic Differentiation
![autodiff](http://elekslabs.com/wp-content/uploads/2013/07/calculus.jpg)
(image from http://elekslabs.com/2013/07/a-short-note-on-automatic.html)

* **Key Idea**
    * Training a neural network by error backpropagation is essentially differentiation.
    * Differentiation follows simple rules
        * for example, $(f(g(x)))' = f'(g(x))g'(x)$
    * It can be **automated**!
* It works like magic.
    * You specify a graph structure.
    * You pick which variables to differentiate with respect to which other variables.
    * Boom! The gradient is available!
    
**Now you can experiment with network structures more rapidly.**
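
To make the key idea concrete, here is a minimal sketch of reverse-mode automatic differentiation in plain Python. It is only an illustration under simplifying assumptions (scalars only, three operations), and the names `Var`, `sin`, and `backward` are hypothetical helpers invented for this example; it is not how TensorFlow is implemented internally. Each node records its inputs together with the local derivative, and `backward` applies the chain rule from the output back to the inputs.

```python
import math


class Var:
    """A scalar graph node that remembers how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, local derivative of this node w.r.t. that parent).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: inputs are listed before the nodes that use them.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


# 1. Specify the graph: z = sin(x * y) + x * x
x = Var(2.0)
y = Var(3.0)
z = sin(x * y) + x * x

# 2. Ask for the gradients of z with respect to every input.
backward(z)

# Analytically: dz/dx = y*cos(x*y) + 2x and dz/dy = x*cos(x*y)
print(x.grad, y.grad)
```

Note that the gradient formulas are never written by hand for `z` itself; only the local rule of each elementary operation is needed.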

### C. Flexible Deployment : One code, for CPU and GPU
![nvidia](http://cms.ipressroom.com.s3.amazonaws.com/219/files/20149/543c131bfe058b228e020181_slide1/slide1_mid.jpg)
* The era of GPGPU (General-Purpose computing on Graphics Processing Units)
    * NVIDIA provides CUDA for programming GPUs.
    * However, the abstraction level of CUDA is too low for machine learning research and development.
    
    
    
* The deep learning frameworks do the dirty work for you.
    * The libraries translate your high-level code (in Python or Lua) into low-level code that runs on the CPU or GPU.

## 2. TensorFlow Basics

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
import tensorflow as tf

### A. TensorFlow = Graph + Session
**Typical TensorFlow workflow**
> 1. Define a **graph**  
> 2. Launch the defined graph through a **session**


* Graph: The relationship between the input and the output
* Session: The environment in which the graph is executed

A session takes a node as input and traces back through the graph to collect the information needed to evaluate that node.


To draw an analogy with programming languages:
* Defining a **graph** is like defining a function
* Launching a **session** is like calling the function with input arguments
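
The define-then-run idea can be sketched without TensorFlow at all. The toy code below is an illustration only: `Node`, `constant`, `placeholder`, `add`, and `run` are hypothetical names made up for this sketch, not TensorFlow APIs. Building the graph computes nothing; `run` plays the role of the session and traces back from the requested node through its inputs.

```python
class Node:
    """A graph node: an operation plus the nodes it takes as inputs (toy class)."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs


def constant(value):
    return Node(lambda: value)


def placeholder():
    # No value yet; it must be supplied when the graph is run.
    return Node(None)


def add(x, y):
    return Node(lambda u, v: u + v, (x, y))


def run(node, feed=None):
    """Evaluate `node` by recursively evaluating the nodes it depends on."""
    feed = feed or {}
    if node in feed:
        return feed[node]
    if node.op is None:
        raise ValueError("a placeholder was not given a value")
    args = [run(inp, feed) for inp in node.inputs]
    return node.op(*args)


# 1. Define the graph: nothing is computed at this point.
a = placeholder()
b = constant(10.0)
c = add(a, b)

# 2. Run it: values are computed only now, with the inputs supplied.
print(run(c, feed={a: 5.0}))  # 15.0
print(run(b))                 # 10.0 (evaluating b does not require a)
```

As in the function analogy, `c` describes a computation, and the values passed in `feed` act as the arguments of the call.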

In [12]:
'''Define an empty graph'''
g = tf.Graph()

In [13]:
'''Define a node in the graph'''
with g.as_default():  
    hi = tf.constant(5.)

Session example 1

In [16]:
with g.as_default():  
    '''Open a session'''
    sess = tf.Session()
    
    '''Launch (or execute) the graph through session'''
    print(sess.run(hi))
    
'''close session'''
sess.close()

5.0


Session example 2

In [14]:
with g.as_default():  
    '''Open a session'''
    with tf.Session() as sess:
        '''Launch (or execute) the graph through session'''
        print(sess.run(hi))

5.0


* A graph is only evaluated through a session.
* The `with` statement makes it clear which graph and session the enclosed code uses.


* There are several equivalent ways to launch a graph in a session.
    * Which one you use is a matter of style.

Session example 3

In [20]:
with tf.Session(graph=g) as sess:
    '''Launch (or execute) the graph through session'''
    print(sess.run(hi))

5.0


### B. Graph = Tensor + Flow
* **Tensor**: A node in a graph whose value is an N-dimensional array
* **Operation**: A node in a graph that takes a set of tensors as input and produces a set of tensors as output.

In [22]:
g = tf.Graph()
with g.as_default():
    a = tf.constant(5)
    b = tf.constant(10)
    c = a + b  # or c = tf.add(a, b)
    
sess = tf.Session(graph=g)
print(sess.run(c))  # print explicitly: this is not the last line of the cell
sess.close()

15

* Here, `c` is the output of an **addition operation** that adds the two tensors `a` and `b`.

#### Tensors = {Constants, Placeholders, Variables}
* **Constants**: Tensors whose values do not change.
* **Placeholders**: Tensors whose values are supplied **when a session is executed**
    * Typically denote input/output **data**.
* **Variables**: Tensors whose values are maintained across multiple session runs
    * Typically denote **model parameters** (connection weights).

When there are **placeholders** involved in the computation, **sess.run** needs **feed_dict** to plug values into the placeholders   
``` feed_dict = {tensor : value} ```

In [26]:
g = tf.Graph()
with g.as_default():
    a = tf.placeholder('float32')
    b = tf.constant(10.)
    c = a + b  # or c = tf.add(a, b)
    
sess = tf.Session(graph=g)
print(sess.run(c, feed_dict={a: np.array(5.)}))  # we need feed_dict
sess.close()

15.0

**Variables** maintain their state and have to be initialized by a separate operation.  
Typically, `tf.global_variables_initializer()` is used.

In [38]:
g = tf.Graph()
with g.as_default():
    a = tf.placeholder('float32')
    b = tf.Variable(initial_value=10., dtype='float32')
    init = tf.global_variables_initializer()  # it initializes all variables
    c = b.assign_add(a) # basically means b = b + a
    
sess = tf.Session(graph=g)
sess.run(init)  # initialization should be run first
print(sess.run(c, feed_dict={a: np.array(5.)}))


15.0

The state of **b** remains after the execution!

In [39]:
for i in range(5):
    print(sess.run(c, feed_dict={a: np.array(5.)}))
sess.close()

20.0
25.0
30.0
35.0
40.0


**Note that every tensor has a strictly defined data type and shape!**

## 3. Optimization with Automatic Differentiation
* Conveniently, TensorFlow provides high-level classes for gradient-based optimization
    * ```tf.train.GradientDescentOptimizer```
    * ```tf.train.AdagradOptimizer```
    * ```tf.train.AdamOptimizer```
* The ```minimize()``` method of ```tf.train.GradientDescentOptimizer``` returns an operation that computes the gradients and updates the variables.

In [49]:
g = tf.Graph()
with g.as_default():
    a = tf.placeholder('float32')
    b = tf.Variable(initial_value=10., dtype='float32')
    
    init = tf.global_variables_initializer()  # it initializes all variables
    
    # objective function
    loss = tf.pow(a - b, 2)  # squared distance between two scalars (tf.sub was renamed tf.subtract in TensorFlow 1.0)
    
    # optimizer
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    train_op = optimizer.minimize(loss)
    
sess = tf.Session(graph=g)
sess.run(init)  # initialization should be run first

for i in range(10):
    print(sess.run([b, train_op], feed_dict={a: np.array(5.)}))


[10.0, None]
[9.8999996, None]
[9.802, None]
[9.7059603, None]
[9.6118412, None]
[9.5196047, None]
[9.4292126, None]
[9.3406286, None]
[9.2538157, None]
[9.1687393, None]


* The code moves the variable **b** closer to the value fed into the placeholder **a**
    * We have never specified the gradient of the loss function!
* **train_op**, the operation that performs one training step, returns nothing (`None`).
* If **sess.run()** is given a list as input, it returns a list as well.
* Run more iterations!

In [None]:
for i in range(10):
    print(sess.run([b, train_op], feed_dict={a: np.array(5.)}))
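
For comparison, the same optimization can be worked out by hand in plain Python. This is only a sketch for checking the arithmetic, with the gradient derived manually instead of by automatic differentiation. For the loss $(a-b)^2$, the derivative with respect to $b$ is $-2(a-b)$, so one gradient descent step with learning rate 0.01 is $b \leftarrow b + 0.02\,(a-b)$. The variable names `history` and `grad` are chosen here for illustration.

```python
a = 5.0               # the value fed to the placeholder
b = 10.0              # the initial value of the variable
learning_rate = 0.01

history = []
for step in range(10):
    history.append(b)                # record b before the update
    grad = -2.0 * (a - b)            # d(loss)/db for loss = (a - b)**2
    b = b - learning_rate * grad     # gradient descent update

for value in history:
    print(round(value, 4))
```

Each step shrinks the distance between **b** and **a** by a factor of 0.98, which agrees with the sequence printed by the TensorFlow version above up to float32 rounding.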

## Wrap-Up
* TensorFlow is one of many deep learning frameworks that provide high-level functions for rapid implementation of deep neural networks.
* TensorFlow works by
    1. Defining a **graph**
    2. Launching it through a **session**
* Gradient descent is easy: thanks to automatic differentiation, you never write the gradient by hand.