## How to do math in TensorFlow (with a logistic regression example)

[TensorFlow](https://www.tensorflow.org/) is a machine learning framework introduced by Google, and most people think of it as a tool for building deep neural networks. It also works well as a general symbolic math framework for numerical computation, one that can use GPUs and distributed clusters with little extra effort.

Here I will present the classic example of logistic regression without using the built-in optimization tools.

### Generate simulated data 

In [1]:
import numpy as np
import toy_data
import bokeh.io
bokeh.io.output_notebook()

x_dim = 2
gm = toy_data.GaussianMixture(n_class=2, dim=x_dim)
toy_data.visualize_2D(gm.Classes, gm.class_colors)

data_X = gm.tr.X
data_y = np.array([gm.tr.y[:, 0]]).T
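
The `toy_data` module is a local helper that is not shown here. For readers without it, comparable data can be sketched with NumPy alone. The function name `make_two_gaussians`, the class means, and the sample count below are assumptions for illustration and do not describe how `toy_data.GaussianMixture` behaves; only the output shapes are chosen to match what the rest of the notebook expects.

```python
import numpy as np

def make_two_gaussians(n_per_class=100, dim=2, seed=0):
    """Hypothetical stand-in for toy_data: two Gaussian blobs with 0/1 labels."""
    rng = np.random.default_rng(seed)
    # Class 0 is centred at -1 in every coordinate, class 1 at +1 (assumed values).
    X0 = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, dim))
    X1 = rng.normal(loc=+1.0, scale=1.0, size=(n_per_class, dim))
    X = np.vstack([X0, X1]).astype(np.float32)
    # Labels as a column vector, matching the [None, 1] placeholder used for y.
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    y = y.reshape(-1, 1).astype(np.float32)
    return X, y

sample_X, sample_y = make_two_gaussians()
print(sample_X.shape, sample_y.shape)
```

Arrays of this shape could be used in place of `data_X` and `data_y`, though the plots produced by `toy_data.visualize_2D` would still need the original module.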

### Build the TensorFlow graph

#### Initialization:

In TensorFlow, input data are usually fed through ```tf.placeholder```s and internal variables are ```tf.Variable```s. We first create shortcuts for initialization:

In [2]:
import tensorflow as tf

io_tf = lambda dim: tf.placeholder(tf.float32, dim)  # input of the given shape, fed at run time
var_tf = lambda init: tf.Variable(init)  # variable created from the given initial value

#### Define the logistic regression model functions

(```tf.device('/gpu:0')``` places these operations on the first GPU)


In [3]:
def tf_logistic(_X):  
    return 1/(1 + tf.exp(-_X))

with tf.device('/gpu:0'):
    x = io_tf([None, x_dim])
    y = io_tf([None, 1])
    W = var_tf(tf.zeros([x_dim, 1]))
    b = var_tf(tf.zeros([1]))
    y_ = tf_logistic(tf.matmul(x, W) + b)
    loglikelihood = tf.reduce_sum(tf.log((1-y_)*(1-y) + y*y_))
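
For binary labels, the expression inside ```tf.log``` selects ```y_``` when ```y``` is 1 and ```1 - y_``` when ```y``` is 0, so the sum equals the usual Bernoulli log-likelihood $ \sum_i y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) $. The NumPy sketch below checks this on a few made-up numbers; the arrays and helper names are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def logistic(z):
    """Same formula as tf_logistic, written for NumPy arrays."""
    return 1.0 / (1.0 + np.exp(-z))

def loglik_selector(p, y):
    """Form used in the graph: log((1-p)*(1-y) + y*p), summed over samples."""
    return np.sum(np.log((1 - p) * (1 - y) + y * p))

def loglik_bernoulli(p, y):
    """Textbook form: sum of y*log(p) + (1-y)*log(1-p)."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up inputs: 4 samples, 2 features, arbitrary weights.
X = np.array([[0.5, -1.0], [2.0, 0.3], [-1.5, 0.8], [0.0, 0.0]])
y = np.array([[1.0], [1.0], [0.0], [0.0]])
W = np.array([[0.7], [-0.2]])
b = np.array([0.1])

p = logistic(X @ W + b)          # predicted probabilities, shape (4, 1)
ll_a = loglik_selector(p, y)
ll_b = loglik_bernoulli(p, y)
print(ll_a, ll_b)                # the two values agree for 0/1 labels
```

The selector form only matches the textbook form when every label is exactly 0 or 1; with soft labels the two expressions differ.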


#### Define the update function
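The gradients of the log-likelihood $ \ell $ with respect to the parameters, $ \frac{\partial \ell}{\partial W} $ and $ \frac{\partial \ell}{\partial b} $, are calculated automatically by TensorFlow and stored as ```dydW``` and ```dydb``` in the first two lines of the ```tf.device``` block.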

We group the update steps in ```gradient_ascend```. Each time we run the ```gradient_ascend``` operation, the variables are updated.

In [8]:
lr = 0.001  # Learning rate
with tf.device('/gpu:0'):
    dydW = tf.gradients(loglikelihood, W)[0]
    dydb = tf.gradients(loglikelihood, b)[0]
    gradient_ascend = tf.group(
        W.assign_add(lr*dydW),
        b.assign_add(lr*dydb))
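
For this particular model the gradients also have a simple closed form, $ X^\top (y - \hat{y}) $ for ```W``` and $ \sum_i (y_i - \hat{y}_i) $ for ```b```, which is a handy sanity check on what automatic differentiation returns. The NumPy sketch below compares the closed form against a central finite difference. The random data and all function names are assumptions made for illustration.

```python
import numpy as np

def loglik(W, b, X, y):
    """Bernoulli log-likelihood of a logistic model."""
    p = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grads(W, b, X, y):
    """Closed-form gradients: X^T (y - p) for W, sum(y - p) for b."""
    p = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    residual = y - p
    return X.T @ residual, np.sum(residual, axis=0)

def numeric_grad_W(W, b, X, y, eps=1e-6):
    """Central finite difference of loglik with respect to each entry of W."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        g[idx] = (loglik(W + step, b, X, y) - loglik(W - step, b, X, y)) / (2 * eps)
    return g

# Made-up data: 20 samples, 2 features, random 0/1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = (rng.random((20, 1)) < 0.5).astype(float)
W = rng.normal(size=(2, 1))
b = np.zeros(1)

gW, gb = analytic_grads(W, b, X, y)
gW_num = numeric_grad_W(W, b, X, y)
print(np.max(np.abs(gW - gW_num)))  # should be tiny if the closed form is right
```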

#### Fill in data and run the tensorflow graph

The loop stops when the change in log-likelihood between iterations is less than ```err_min``` or when the iteration count reaches ```max_iter```.

In [9]:
err_min = 0.01
max_iter = 1000
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(tf.initialize_all_variables())
    
    def classifier(_X):
        _y = sess.run(y_, feed_dict={x: _X})
        _Y = np.column_stack((_y, 1 - _y))
        return _Y
    
    for i in range(max_iter):
        lll_old = sess.run(loglikelihood, {x:data_X, y:data_y})
        sess.run(gradient_ascend, {x:data_X, y:data_y})
        lll = sess.run(loglikelihood, {x:data_X, y:data_y})
        if abs(lll - lll_old) < err_min:
            break
            
    print("Optimization finished in ", i, " iterations.")

    toy_data.visualize_2D(gm.Classes, gm.class_colors, classifyF=classifier, res=100)

Optimization finished in  60  iterations.
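
The same loop can be written without TensorFlow, which makes the stopping rule easy to see in isolation. This is a sketch under stated assumptions: the data are synthetic, the closed-form gradient stands in for ```tf.gradients```, and the step count it reports is not expected to match the 60 iterations shown above.

```python
import numpy as np

def loglik_and_grads(W, b, X, y):
    """Log-likelihood and its gradients for a logistic model."""
    p = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    ll = np.sum(np.log((1 - p) * (1 - y) + y * p))
    r = y - p
    return ll, X.T @ r, np.sum(r, axis=0)

# Synthetic two-class data (assumed, not the notebook's toy_data set).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.vstack([np.zeros((100, 1)), np.ones((100, 1))])

lr, err_min, max_iter = 0.001, 0.01, 1000
W, b = np.zeros((2, 1)), np.zeros(1)

ll_old, gW, gb = loglik_and_grads(W, b, X, y)
ll_start = ll_old
for i in range(max_iter):
    W, b = W + lr * gW, b + lr * gb          # one gradient ascent step
    ll, gW, gb = loglik_and_grads(W, b, X, y)
    if abs(ll - ll_old) < err_min:           # improvement too small: stop
        break
    ll_old = ll

print("stopped after", i + 1, "steps; log-likelihood", ll)
```

Unlike the session loop, this version evaluates the log-likelihood once per step and reuses it as the "old" value for the next iteration.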


#### Add backtracking line search to update the learning rate

In [10]:
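beta = 0.8 # Backtracking: factor by which lr shrinks when a step fails the test
alpha = 0.5 # Backtracking: fraction of the predicted increase a step must achieve
with tf.device('/gpu:0'):
    norm_grad = tf.reduce_sum(dydW**2) + dydb**2  # squared norm of the gradient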
    
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(tf.initialize_all_variables())
    for i in range(max_iter):
        lll_old = sess.run(loglikelihood, {x:data_X, y:data_y})
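        while True:
            sess.run(gradient_ascend, {x:data_X, y:data_y})
            lll_new = sess.run(loglikelihood, {x:data_X, y:data_y})
            norm_grad_val = sess.run(norm_grad, {x:data_X, y:data_y})
            # Sufficient-increase test: on failure, shrink lr and step again.
            # Caveat: gradient_ascend captured lr as a constant when the graph
            # was built, so the smaller lr changes only this test, not the
            # size of the step that is actually applied.
            if lll_new < lll_old + lr*alpha*norm_grad_val:
                lr *= beta
                print(lr)
            else:
                break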
        if abs(lll_new - lll_old) < err_min:
            break
        
    print("Optimization finished in ", i, " iterations.")
    
    def classifier(_X):
        _y = sess.run(y_, feed_dict={x: _X})
        _Y = np.column_stack((_y, 1 - _y))
        return _Y

    toy_data.visualize_2D(gm.Classes, gm.class_colors, classifyF=classifier, res=100)

0.0008
0.00064
0.0005120000000000001
0.0004096000000000001
0.0003276800000000001
0.0002621440000000001
Optimization finished in  54  iterations.
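
For comparison, here is a minimal NumPy sketch of a textbook backtracking (Armijo) line search for gradient ascent, in which the step size is an ordinary runtime value. A trial step is evaluated from the current point, and if it fails the sufficient-increase test the step size shrinks and the trial is repeated from the same point. The synthetic data, the helper names, the larger starting step size, and the numerically stable form of the log-likelihood are assumptions of this sketch rather than the notebook's method, so its numbers will differ from the output above.

```python
import numpy as np

def loglik(W, b, X, y):
    """Bernoulli log-likelihood, written as y*z - log(1 + exp(z)) for stability."""
    z = X @ W + b
    return np.sum(y * z - np.logaddexp(0.0, z))

def grads(W, b, X, y):
    """Closed-form gradients of the log-likelihood."""
    r = y - 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return X.T @ r, np.sum(r, axis=0)

# Synthetic two-class data (assumed, not the notebook's toy_data set).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.vstack([np.zeros((100, 1)), np.ones((100, 1))])

alpha, beta = 0.5, 0.8
lr, err_min, max_iter = 0.01, 0.01, 1000   # starting lr is an assumed value
W, b = np.zeros((2, 1)), np.zeros(1)
history = [loglik(W, b, X, y)]

for i in range(max_iter):
    ll_old = history[-1]
    gW, gb = grads(W, b, X, y)
    sq_norm = np.sum(gW ** 2) + np.sum(gb ** 2)
    while True:
        # Trial step from the current point; W and b are not modified yet.
        W_try, b_try = W + lr * gW, b + lr * gb
        ll_new = loglik(W_try, b_try, X, y)
        if ll_new >= ll_old + alpha * lr * sq_norm:
            break                          # sufficient increase: accept
        lr *= beta                         # otherwise shrink and retry
    W, b = W_try, b_try
    history.append(ll_new)
    if ll_new - ll_old < err_min:
        break

print("steps:", len(history) - 1, "final lr:", lr, "log-likelihood:", history[-1])
```

Because every accepted step passes the sufficient-increase test, the recorded log-likelihood values never decrease from one step to the next.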
