# Basic Concepts and API

TF 1.x code generally follows this process:
1. Create a **computation graph** that defines your computational structure
2. Create a TF session
3. Run the computation graph in the session
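The build-then-run split can be illustrated with a tiny framework-free toy in plain Python. Everything here (`Node`, `Const`, `run`) is hypothetical and far simpler than TF's real graph machinery. The point is that building `f` only records operations, and nothing is computed until `run` walks the graph, evaluating each node's inputs first.

```python
# Toy illustration of deferred execution (hypothetical, not TensorFlow's API).
class Node:
    """A graph node: an operation plus the nodes it consumes."""
    def __init__(self, op, *inputs):
        self.op = op          # function applied to the input values
        self.inputs = inputs  # upstream nodes

    def __add__(self, other):
        return Node(lambda a, b: a + b, self, other)

    def __mul__(self, other):
        return Node(lambda a, b: a * b, self, other)

    def __pow__(self, k):
        return Node(lambda a: a ** k, self)


class Const(Node):
    """A source node with no inputs."""
    def __init__(self, value):
        super().__init__(lambda: value)


def run(node):
    """Evaluate a node by first evaluating everything it depends on."""
    return node.op(*(run(n) for n in node.inputs))


# Phase 1: build the graph. No arithmetic happens here.
x = Const(3)
y = Const(4)
f = x * x * y + y ** 3

# Phase 2: run it.
print(run(f))  # → 100
```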

In [1]:
import tensorflow as tf

In [2]:
# Define variables and operations in the graph

x = tf.Variable(3, name="x") # "x" is the node's name inside the graph
y = tf.Variable(4, name="y")
g = x*x*y
h = y**3
print(type(g))
print(type(h))
f = g + h
print(type(f))

<class 'tensorflow.python.framework.ops.Tensor'>
<class 'tensorflow.python.framework.ops.Tensor'>
<class 'tensorflow.python.framework.ops.Tensor'>


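Each expression above is a symbolic `Tensor`: the output of a TF **op** (operation) node in the graph. Nothing has been computed yet; `f` only describes how to compute its value.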

In [3]:
# Build a session and run. Using the "with" context block automatically closes the session.
with tf.Session() as tf_sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

In [4]:
result

100

Instead of initializing each variable individually, you can call `tf.global_variables_initializer()`. It returns a single op that initializes every global variable when that op is run.

In [5]:
init = tf.global_variables_initializer() # Creates an init node

with tf.Session() as tf_sess:
    init.run()
    result = f.eval()
    print(result)

100


## Graphs

We can build separate graphs programmatically (and combine them later if needed). Unless told otherwise, every node we declare is added to the same **default graph**.
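The default-graph behavior can be pictured as a stack of graphs: `as_default()` pushes a graph for the duration of a `with` block, and each newly created node registers itself with whichever graph is on top. The sketch below is a simplified, hypothetical model of that idea in plain Python (`Graph`, `variable`, `get_default_graph` and `_graph_stack` are made-up stand-ins, not TF's code) that mirrors the two cells that follow.

```python
from contextlib import contextmanager


# Toy model of a "default graph" (hypothetical, not TensorFlow's code).
class Graph:
    def __init__(self, name):
        self.name = name
        self.nodes = []  # nodes registered in this graph

    @contextmanager
    def as_default(self):
        # Make this graph current for the duration of the with-block, then restore.
        _graph_stack.append(self)
        try:
            yield self
        finally:
            _graph_stack.pop()


_graph_stack = [Graph("default")]


def get_default_graph():
    return _graph_stack[-1]


def variable(value):
    """Create a toy node and register it in whichever graph is current."""
    node = {"value": value, "graph": get_default_graph()}
    node["graph"].nodes.append(node)
    return node


x1 = variable(1)
new_graph = Graph("new")
with new_graph.as_default():
    x2 = variable(2)

print(x1["graph"] is get_default_graph())  # True
print(x2["graph"] is get_default_graph())  # False
print(x2["graph"] is new_graph)            # True
```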

In [6]:
x1 = tf.Variable(1)
# check where this x1 node lives:
x1.graph is tf.get_default_graph()

True

In [7]:
# Now, make another graph and add a new variable to it:
new_graph = tf.Graph()
with new_graph.as_default():
    x2 = tf.Variable(2)
    
print(x2.graph is tf.get_default_graph())
print(x2.graph is new_graph)

False
True


## More on Nodes

When a node is evaluated, TF determines the set of nodes it depends on and evaluates those first. **All node values (except variables) are dropped between graph runs!**

Variables start their life when their initializer is run and end when the session closes.

In [8]:
tf.reset_default_graph()

w = tf.constant(9)
x = w * 7
y = x + 2
z = x**2

with tf.Session() as tf_sess:
    print(y.eval())
    print(z.eval())

65
3969


The above code is not efficient: it runs the graph twice, so `w` and `x` are evaluated twice (once for `y` and once for `z`). Instead, evaluate `y` and `z` together in a single graph run.
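The cost difference can be seen with a toy evaluator that caches node values only for the duration of a single run, mimicking the rule that non-variable values are dropped between runs. The `Node` class and `run` function are hypothetical; the `counts` dictionary records how many times each node, in particular `x = w * 7`, is actually computed.

```python
# Toy evaluator: values are cached per run, then discarded (hypothetical, not TF's API).
class Node:
    def __init__(self, name, op, *inputs):
        self.name, self.op, self.inputs = name, op, inputs


def run(fetches, counts):
    cache = {}  # lives only for this run

    def evaluate(node):
        if node not in cache:
            counts[node.name] = counts.get(node.name, 0) + 1
            cache[node] = node.op(*(evaluate(n) for n in node.inputs))
        return cache[node]

    return [evaluate(node) for node in fetches]


w = Node("w", lambda: 9)
x = Node("x", lambda a: a * 7, w)
y = Node("y", lambda a: a + 2, x)
z = Node("z", lambda a: a ** 2, x)

separate = {}
print(run([y], separate), run([z], separate))  # [65] [3969]
print(separate["x"])  # 2: x is recomputed in the second run

together = {}
print(run([y, z], together))  # [65, 3969]
print(together["x"])  # 1: x is shared within one run
```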

In [9]:
with tf.Session() as tf_sess:
    y_val, z_val = tf_sess.run([y,z])
    print(y_val)
    print(z_val)

65
3969


## Operations

TF ops can take *any* number of inputs and produce *any* number of outputs. Source ops, such as constants and variables, take no inputs at all. The inputs and outputs of operations are always **tensors**: multi-dimensional arrays. When a tensor is evaluated in a session, its value is returned as a NumPy `ndarray`.

The following example performs linear regression using the closed form Normal Equation embedded as a TF op.

In [10]:
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

housing_dataset = fetch_california_housing()
m,n = housing_dataset.data.shape
print("Data shape: " + str(m) + " instances, " + str(n) + " features")

X_raw = housing_dataset.data
y_raw = housing_dataset.target

# Split up the data
X_train, X_test, y_train, y_test = train_test_split(X_raw, y_raw, test_size=0.2)
print("Training size: " + str(X_train.shape[0]) + "; Test size: " + str(X_test.shape[0]))

# Scale the data sets
housing_scaler = StandardScaler()
X_train_scaled = housing_scaler.fit_transform(X_train)
X_test_scaled = housing_scaler.transform(X_test)

Data shape: 20640 instances, 8 features
Training size: 16512; Test size: 4128


In [11]:
# Prepend a column of ones (bias input x0 = 1) so that theta[0] acts as the intercept.
X_train_biased = np.c_[np.ones((X_train_scaled.shape[0],1)), X_train_scaled]
X_test_biased = np.c_[np.ones((X_test_scaled.shape[0],1)), X_test_scaled]
print("Biased train data shape: " + str(X_train_biased.shape[0]) + " instances, " + str(X_train_biased.shape[1]) + " features")
print("Biased test data shape: " + str(X_test_biased.shape[0]) + " instances, " + str(X_test_biased.shape[1]) + " features")

Biased train data shape: 16512 instances, 9 features
Biased test data shape: 4128 instances, 9 features


In [12]:
X = tf.constant(X_train_biased, dtype=tf.float32, name="X")
print("Design matrix X shape: " + str(X_train_biased.shape))
# Explicitly turn into an m x 1 vector
y = tf.constant(y_train.reshape(-1,1), dtype=tf.float32, name="y")
print("Raw target shape (reshaped to m x 1 for the TF constant): " + str(y_train.shape))
XT = tf.transpose(X)

Design matrix X shape: (16512, 9)
Raw target shape (reshaped to m x 1 for the TF constant): (16512,)


Implement the Normal Equation:
$\theta^{\star} = (X^T\cdot X)^{-1}\cdot X^T\cdot y$
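As a quick sanity check of the formula outside TF, the sketch below applies the same Normal Equation with NumPy to synthetic data whose true coefficients are known (the data, seed and `theta_true` are made up for illustration). It also compares against `np.linalg.lstsq`, which solves the same least-squares problem without forming an explicit inverse and is generally the numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_features = 200, 3

# Synthetic design matrix with a leading bias column of ones.
X = np.c_[np.ones((m, 1)), rng.normal(size=(m, n_features))]
theta_true = np.array([[2.0], [0.5], [-1.0], [3.0]])
y = X @ theta_true + 0.01 * rng.normal(size=(m, 1))

# Normal Equation: theta = (X^T X)^{-1} X^T y
theta_ne = np.linalg.inv(X.T @ X) @ X.T @ y

# Same least-squares problem, solved without an explicit inverse.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_ne.ravel())
print(np.allclose(theta_ne, theta_lstsq))  # True
```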

In [13]:
inv = tf.matrix_inverse( tf.matmul(XT, X) )
theta = tf.matmul( tf.matmul(inv, XT), y )

In [14]:
# Vroom vroom!
with tf.Session() as tf_sess:
    theta_val = theta.eval()

In [15]:
print("Performed a linear regression over the data set:")
print(str(theta_val) + "\n " + str(theta_val.shape))

Performed a linear regression over the data set:
[[ 2.0734649 ]
 [ 0.8271916 ]
 [ 0.11634666]
 [-0.2723715 ]
 [ 0.31717736]
 [-0.00663501]
 [-0.04468723]
 [-0.9125133 ]
 [-0.87877035]]
 (9, 1)


## Manual Gradient Descent via TF

I will re-use the scaled data from above and implement gradient descent manually rather than use the normal equation solution.
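For reference, the gradient hard-coded as `dJdtheta` in the cell below comes from differentiating the MSE. With the bias column already included in $X$ and learning rate $\alpha$:

$$
\mathrm{MSE}(\theta) = \frac{1}{m}\,(X\theta - y)^T (X\theta - y),
\qquad
\nabla_{\theta}\,\mathrm{MSE}(\theta) = \frac{2}{m}\,X^T (X\theta - y),
\qquad
\theta \leftarrow \theta - \alpha\,\nabla_{\theta}\,\mathrm{MSE}(\theta)
$$

Setting the gradient to zero gives $X^T X\,\theta = X^T y$, whose solution is the Normal Equation from the previous section.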

In [41]:
m = X_train_biased.shape[0]
n = X_train_biased.shape[1]

n_epochs = 2000
alpha = 0.01 # learning rate

In [42]:
# For grins, make a new graph for this implementation.
gd_graph = tf.Graph()

with gd_graph.as_default():
    X = tf.constant(X_train_biased, dtype=tf.float32, name="X")
    y = tf.constant(y_train.reshape(-1,1), dtype=tf.float32, name="y")
    # Initialize theta variables with uniform random values
    theta = tf.Variable( tf.random_uniform([n, 1], -1.0, 1.0), name="theta" )
    # Compute the predictions and error
    y_pred = tf.matmul( X, theta, name="predictions" )
    error = y_pred - y
    # Mean squared error, built from tf.square and tf.reduce_mean
    mse = tf.reduce_mean( tf.square(error), name="mse" )
    # Gradient calculations
    dJdtheta = (2.0/m) * tf.matmul( tf.transpose(X), error )
    # Training/learning op. tf.assign() builds an op that, when run, stores the new value in the variable
    train_op = tf.assign( theta, theta - alpha*dJdtheta )
    
    init_op = tf.global_variables_initializer()

In [43]:
with tf.Session( graph=gd_graph ) as sess:
    sess.run(init_op)
    
    for i in range(n_epochs):
        
        if i % 100 == 0:
            print("Epoch ", i, "MSE = ", mse.eval())
        sess.run(train_op)
    
    # At the end, print the current thetas
    print(theta.eval())


Epoch  0 MSE =  13.861329
Epoch  100 MSE =  1.0140251
Epoch  200 MSE =  0.7202255
Epoch  300 MSE =  0.66604364
Epoch  400 MSE =  0.63076967
Epoch  500 MSE =  0.6049017
Epoch  600 MSE =  0.58575594
Epoch  700 MSE =  0.57155
Epoch  800 MSE =  0.56098914
Epoch  900 MSE =  0.55311793
Epoch  1000 MSE =  0.5472386
Epoch  1100 MSE =  0.5428348
Epoch  1200 MSE =  0.5395259
Epoch  1300 MSE =  0.53703207
Epoch  1400 MSE =  0.53514665
Epoch  1500 MSE =  0.5337153
Epoch  1600 MSE =  0.5326251
Epoch  1700 MSE =  0.53179055
Epoch  1800 MSE =  0.53115
Epoch  1900 MSE =  0.5306552
[[ 2.073458  ]
 [ 0.84893167]
 [ 0.12904091]
 [-0.29811287]
 [ 0.33159944]
 [-0.00228728]
 [-0.04670006]
 [-0.7986172 ]
 [-0.76659226]]
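The same batch gradient descent loop can be written in plain NumPy, which makes explicit that each `sess.run(train_op)` performs one step of $\theta \leftarrow \theta - \alpha\,\frac{2}{m}X^T(X\theta - y)$. This is a sketch on made-up synthetic data (the seed, sizes and noise level are arbitrary assumptions), with the closed-form solution as a reference.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n_features = 500, 4
X = np.c_[np.ones((m, 1)), rng.normal(size=(m, n_features))]  # bias + features
y = X @ rng.normal(size=(n_features + 1, 1)) + 0.1 * rng.normal(size=(m, 1))

alpha, n_epochs = 0.01, 2000
theta = rng.uniform(-1.0, 1.0, size=(n_features + 1, 1))  # like tf.random_uniform

for epoch in range(n_epochs):
    error = X @ theta - y
    if epoch % 500 == 0:
        print("Epoch", epoch, "MSE =", float(np.mean(error ** 2)))
    gradient = (2.0 / m) * X.T @ error  # hand-derived MSE gradient
    theta = theta - alpha * gradient    # same update as train_op

theta_ne = np.linalg.solve(X.T @ X, X.T @ y)  # closed-form reference
print("Max |theta - theta_ne|:", np.abs(theta - theta_ne).max())
```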


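The result is close to the Normal Equation solution: the intercept matches, but the last two coefficients still differ noticeably (about -0.80 and -0.77 here, versus -0.91 and -0.88), and the MSE is still decreasing at epoch 1900, so more epochs would likely close the gap. It would also be nice not to have to derive the gradient by hand every time, especially for more complicated functions such as regularized cost functions. Next, I will use *autodiff* to compute the gradient automatically.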
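`tf.gradients` relies on reverse-mode automatic differentiation. The core idea fits in a few lines of Python: every operation records its inputs together with the local derivative with respect to each, and a backward pass applies the chain rule from the output back to the inputs. The `Var` class below is a hypothetical scalar-only toy, applied to the earlier function $f = x^2 y + y^3$ at $x=3$, $y=4$. By hand, $\partial f/\partial x = 2xy = 24$ and $\partial f/\partial y = x^2 + 3y^2 = 57$.

```python
class Var:
    """Scalar that records how it was computed (toy reverse-mode autodiff)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, d(self)/d(input))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def __pow__(self, k):
        return Var(self.value ** k, ((self, k * self.value ** (k - 1)),))

    def backward(self):
        # Topologically sort so inputs come before the nodes that use them;
        # walking the list in reverse visits each node after all its consumers.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad  # chain rule


x, y = Var(3.0), Var(4.0)
f = x * x * y + y ** 3
f.backward()
print(f.value, x.grad, y.grad)  # 100.0 24.0 57.0
```

Each input's gradient accumulates contributions from every path through the graph (`x` feeds `x * x` twice), which is exactly what the `+=` handles.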

In [28]:
gd_graph2 = tf.Graph()
with gd_graph2.as_default():
    X = tf.constant(X_train_biased, dtype=tf.float32, name="X")
    y = tf.constant(y_train.reshape(-1,1), dtype=tf.float32, name="y")
    # Initialize theta variables with uniform random values
    theta = tf.Variable( tf.random_uniform([n, 1], -1.0, 1.0), name="theta" )
    # Compute the predictions and error
    y_pred = tf.matmul( X, theta, name="predictions" )
    error = y_pred - y
    # Mean squared error, built from tf.square and tf.reduce_mean
    mse = tf.reduce_mean( tf.square(error), name="mse" )
    # Use TF's autodiff to compute the gradient of the MSE w.r.t. theta (tf.gradients returns a list)
    dJdtheta = tf.gradients( mse, [theta], name="dJdtheta" )[0]
    print(dJdtheta)
    
    # Training/learning op. tf.assign() builds an op that, when run, stores the new value in the variable
    # This *is* the optimization process - simple gradient descent
    train_op = tf.assign( theta, theta - alpha*dJdtheta )
    
    init_op = tf.global_variables_initializer()

Tensor("dJdtheta/predictions_grad/MatMul_1:0", shape=(9, 1), dtype=float32)


In [29]:
with tf.Session( graph=gd_graph2 ) as sess:
    sess.run(init_op)
    
    for i in range(n_epochs):
        
        if i % 100 == 0:
            print("Epoch ", i, "MSE = ", mse.eval())
        sess.run(train_op)
    
    # At the end, print the current thetas
    print(theta.eval())


Epoch  0 MSE =  6.169226
Epoch  100 MSE =  0.6868336
Epoch  200 MSE =  0.58082676
Epoch  300 MSE =  0.56543285
Epoch  400 MSE =  0.55552167
Epoch  500 MSE =  0.5483272
Epoch  600 MSE =  0.5430749
Epoch  700 MSE =  0.53923863
Epoch  800 MSE =  0.53643495
Epoch  900 MSE =  0.5343874
Epoch  1000 MSE =  0.5328912
Epoch  1100 MSE =  0.5317973
Epoch  1200 MSE =  0.53099763
Epoch  1300 MSE =  0.5304136
Epoch  1400 MSE =  0.529986
Epoch  1500 MSE =  0.52967364
Epoch  1600 MSE =  0.5294448
Epoch  1700 MSE =  0.5292774
Epoch  1800 MSE =  0.5291554
Epoch  1900 MSE =  0.5290648
[[ 2.073458  ]
 [ 0.81652534]
 [ 0.11962067]
 [-0.24120452]
 [ 0.2868286 ]
 [-0.00527798]
 [-0.0446194 ]
 [-0.89914066]
 [-0.8637131 ]]


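The gradient computation and the update step can both be rolled into a single call to one of TF's built-in optimizers (subclasses of `tf.train.Optimizer`), whose `minimize()` method builds both!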
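For intuition about what `MomentumOptimizer` adds, the TF 1.x documentation describes its update (without Nesterov) as `accumulation = momentum * accumulation + gradient` followed by `variable -= learning_rate * accumulation`. The NumPy sketch below applies that rule, and plain gradient descent for comparison, to a made-up least-squares problem; the data, the seed and the helper name `mse_grad` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
m = 500
X = np.c_[np.ones((m, 1)), rng.normal(size=(m, 3))]
y = X @ np.array([[1.0], [2.0], [-0.5], [0.25]]) + 0.1 * rng.normal(size=(m, 1))


def mse_grad(theta):
    """Gradient of the mean squared error for this toy problem."""
    return (2.0 / m) * X.T @ (X @ theta - y)


alpha, momentum, n_steps = 0.01, 0.9, 2000
theta_gd = np.zeros((4, 1))
theta_mom = np.zeros((4, 1))
accumulation = np.zeros((4, 1))

for step in range(n_steps):
    theta_gd -= alpha * mse_grad(theta_gd)  # plain gradient descent
    accumulation = momentum * accumulation + mse_grad(theta_mom)
    theta_mom -= alpha * accumulation       # momentum update

theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(np.abs(theta_gd - theta_ne).max(), np.abs(theta_mom - theta_ne).max())
```

With `momentum=0` the accumulation is just the current gradient and the loop reduces to plain gradient descent, which is why the two optimizers in the cell below are drop-in replacements for each other.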

In [34]:
gdwithopt_graph = tf.Graph()
# Same setup code as before, but the gradient and update steps come from a MomentumOptimizer (or any other tf.train optimizer)
with gdwithopt_graph.as_default():
    X = tf.constant(X_train_biased, dtype=tf.float32, name="X")
    y = tf.constant(y_train.reshape(-1,1), dtype=tf.float32, name="y")
    # Initialize theta variables with uniform random values
    theta = tf.Variable( tf.random_uniform([n, 1], -1.0, 1.0), name="theta" )
    # Compute the predictions and error
    y_pred = tf.matmul( X, theta, name="predictions" )
    error = y_pred - y
    # Mean squared error, built from tf.square and tf.reduce_mean
    mse = tf.reduce_mean( tf.square(error), name="mse" )
    
    # The optimizer. Swap which of the two lines below is commented out
    # to use plain gradient descent instead of momentum.
#     optimizer = tf.train.GradientDescentOptimizer(learning_rate=alpha)
    optimizer = tf.train.MomentumOptimizer(learning_rate=alpha, momentum=0.9)

    training_op = optimizer.minimize(mse)
    
    init_op = tf.global_variables_initializer()

In [35]:
with tf.Session(graph=gdwithopt_graph) as sess:
    sess.run(init_op)
    
    for i in range(n_epochs):
        if i % 100 == 0:
            print("Epoch ", i, "MSE = ", mse.eval())
        sess.run(training_op)
    
    # At the end, print the current thetas
    print(theta.eval())


Epoch  0 MSE =  6.269268
Epoch  100 MSE =  0.5506972
Epoch  200 MSE =  0.53147733
Epoch  300 MSE =  0.52917576
Epoch  400 MSE =  0.52886564
Epoch  500 MSE =  0.5288232
Epoch  600 MSE =  0.5288176
Epoch  700 MSE =  0.5288164
Epoch  800 MSE =  0.52881724
Epoch  900 MSE =  0.52881676
Epoch  1000 MSE =  0.52881676
Epoch  1100 MSE =  0.5288168
Epoch  1200 MSE =  0.5288167
Epoch  1300 MSE =  0.52881664
Epoch  1400 MSE =  0.52881664
Epoch  1500 MSE =  0.52881664
Epoch  1600 MSE =  0.52881664
Epoch  1700 MSE =  0.52881664
Epoch  1800 MSE =  0.52881664
Epoch  1900 MSE =  0.52881664
[[ 2.0734634 ]
 [ 0.82719177]
 [ 0.11634664]
 [-0.2723739 ]
 [ 0.3171786 ]
 [-0.00663507]
 [-0.04468729]
 [-0.91251373]
 [-0.87877   ]]


The code above still performs batch learning: every training step consumes the whole data set. The next step in this evolution is **mini-batch** gradient descent, where $X$ and $y$ are replaced with a fresh batch of rows from the data set at every training step. In TF, we use **placeholder** nodes for this: a placeholder holds no value of its own, and its value is supplied through `feed_dict` each time the graph is run.

In [36]:
placeholder_ex_graph = tf.Graph()
with placeholder_ex_graph.as_default():
    A = tf.placeholder(tf.float32, shape=(None,4))
    B = A + 2

with tf.Session(graph=placeholder_ex_graph) as sess:
    B_result1 = B.eval( feed_dict={A: [[3,3,3,1]]} )
    B_result2 = B.eval( feed_dict={A: [[8,9,1,2]]} )

In [40]:
print(B_result1)
print(B_result2)

[[5. 5. 5. 3.]]
[[10. 11.  3.  4.]]


In [44]:
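# Mini-batch Gradient Descent!
gd_minibatch = tf.Graph()
with gd_minibatch.as_default():
    # Now X and y are fed into the graph. n already counts the bias column (n = X_train_biased.shape[1])
    X = tf.placeholder( tf.float32, shape=(None, n), name="X" )
    y = tf.placeholder( tf.float32, shape=(None, 1), name="y" )
    theta = tf.Variable( tf.random_uniform([n, 1], -1.0, 1.0), name="theta" )
    y_pred = tf.matmul( X, theta, name="predictions" )
    error = y_pred - y
    mse = tf.reduce_mean( tf.square(error), name="mse" )
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=alpha)
    training_op = optimizer.minimize(mse)

    init_op = tf.global_variables_initializer()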
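The cell above stops before the training loop. As a framework-free sketch of what that loop does, here is mini-batch gradient descent in NumPy on made-up data: each epoch shuffles the rows, and each step computes the gradient from one batch only, which is the role the `X`/`y` placeholders and `feed_dict` play in TF. The batch size, learning rate, synthetic data and the helper `iterate_minibatches` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_features = 1000, 4
X = np.c_[np.ones((m, 1)), rng.normal(size=(m, n_features))]
y = X @ rng.normal(size=(n_features + 1, 1)) + 0.1 * rng.normal(size=(m, 1))


def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs covering the data once."""
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]


alpha, n_epochs, batch_size = 0.05, 50, 50
theta = rng.uniform(-1.0, 1.0, size=(n_features + 1, 1))

for epoch in range(n_epochs):
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size, rng):
        # Same MSE gradient as before, computed on the batch only.
        error = X_batch @ theta - y_batch
        theta -= alpha * (2.0 / len(X_batch)) * X_batch.T @ error

theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print("Max |theta - theta_ne|:", np.abs(theta - theta_ne).max())
```

Because each step sees only part of the data, the estimate keeps fluctuating slightly around the closed-form solution instead of settling on it exactly; a decaying learning rate is a common way to reduce that noise.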