## Intro to TensorFlow

Note: I am using this book as my guide: https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291 (quite a good book so far).

The goal here is to rebuild tools similar to ones I've built before:  
- Basic linear regression solver
- Basic logistic regression solver
- Gradient Descent solver

Then move on to the next major area: neural networks.



TensorFlow requires you to wrap your variables and operations in its own structures.  A few observations:  
- TF wants data in a different row/column orientation than scikit-learn.  Every library wants it differently, which is annoying (but it only takes a transpose, so not a huge deal)
- TF seems hard to debug, since you wrap everything up front and it only gets evaluated later, as a graph
- The TF graph GUI (TensorBoard) looks slick
- It's not so intuitive if you're used to writing imperative Python, since TF graph code is declarative
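
The "wrap now, evaluate later" point is easier to see in a toy. The sketch below is not TensorFlow code: `Node`, `const`, and `run` are hypothetical names made up here to mimic the idea that building an expression only records it, and a separate call actually computes it.

```python
import operator


class Node:
    """A deferred computation: stores an operation and its inputs, not a value."""

    def __init__(self, op, inputs=(), value=None, name="node"):
        self.op = op
        self.inputs = inputs
        self.value = value
        self.name = name

    def __add__(self, other):
        return Node(operator.add, (self, other), name="add")

    def __mul__(self, other):
        return Node(operator.mul, (self, other), name="mul")

    def __repr__(self):
        # Printing shows the node itself, not a number.
        return "<Node %s>" % self.name


def const(value, name="const"):
    """Wrap a plain Python value as a leaf node (hypothetical helper)."""
    return Node(None, value=value, name=name)


def run(node):
    """Evaluate a node by recursively evaluating its inputs (the 'session' step)."""
    if node.op is None:
        return node.value
    return node.op(*(run(i) for i in node.inputs))


a = const(3.0, name="a")
b = const(4.0, name="b")
expr = a * b + a          # builds a small graph; nothing is computed yet

print(expr)               # <Node add>
result = run(expr)        # only now is the arithmetic performed
print(result)             # 15.0
```

This is roughly why debugging feels awkward: a `print` in the middle of graph construction shows a node description, and any numeric surprise only shows up later, at the `run` step.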

Code for a basic batch gradient descent (BGD) logistic regression solver. Note that it won't work if TF isn't installed   
(just `pip install tensorflow`; the code uses the TF 1.x API, and I'm on 1.8)
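
For reference, here is the update rule that a BGD logistic regression solver of this kind computes, written out as a sketch. The symbols are assumptions chosen to line up with the code: $X$ is the sample matrix (one row per sample), $y$ the labels, $\theta$ the weights, $m$ the number of samples, and $\eta$ the learning rate.

$$
\hat{y} = \sigma(X\theta), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \right]
$$

$$
\nabla_\theta J = \frac{1}{m} X^{\top}(\hat{y} - y), \qquad \theta \leftarrow \theta - \eta\, \nabla_\theta J
$$

The gradient depends only on the residual $\hat{y} - y$, which is why the cell builds its update from `error` and never from the log loss value itself. The cell scales by $2/m$ instead of $1/m$, which has the same effect as doubling $\eta$.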

In [7]:
import tensorflow as tf
import numpy as np
import myutils

tf.reset_default_graph()
n_epochs = 400
learning_rate = 0.01

xs = np.array([[10,1],[11,2],[1,6]])   # dummy sample  (high, high, low)
ys = np.array([[1],[1],[0]])           # dummy results (correlates high count as positive, so 1,1,0 results)

X = tf.constant(xs, dtype=tf.float32, name='X')   # wrap in TF vanilla consts
y = tf.constant(ys, dtype=tf.float32, name='y')   

theta = tf.Variable(tf.constant([[0.1],[0.1]]), name='theta')  # TF "variable"
y_pred = tf.sigmoid(tf.matmul(X, theta, name='predictions'))  # wrap in TF sigmoid

with tf.name_scope("loss"):                # named scope (for grouping in the graph view)
    error = y_pred - y                     # error used in next scope
    ll = tf.reduce_mean(tf.losses.log_loss(y,y_pred), name='log_loss')  # std log_loss function; computed here but not used by the update below

with tf.name_scope("gradients"):
    # log-loss gradient is (1/m) * X^T (y_pred - y); the 2.0 only doubles the effective step size
    gradients = 2.0/len(ys) * tf.matmul(tf.transpose(X), error)
    training_op = tf.assign(theta, theta - learning_rate * gradients)   # "training_op" is called later

init = tf.global_variables_initializer()   # boilerplate init

with tf.Session() as sess:                 # this is where the TF stuff actually runs
    sess.run(init)
    for epoch in range(n_epochs):          # GD loop
        sess.run(training_op)              # each loop calls "training_op" again which assigns the theta
    best_theta = theta.eval()              # fetch theta array
    print('theta computed: ', myutils.gf(best_theta))            


theta computed:  ['0.6762', '-0.8275']
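
As a sanity check on the TF cell, the same loop can be sketched without TensorFlow at all. This is a dependency-free illustration under the assumption that it mirrors the cell's math (same dummy data, starting theta, learning rate, epoch count, and the same `2/m` scaling); `predict` and `bgd_logistic` are names invented for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(xs, theta):
    """Sigmoid of the dot product of each sample row with theta."""
    return [sigmoid(sum(x_j * t_j for x_j, t_j in zip(row, theta))) for row in xs]


def bgd_logistic(xs, ys, theta, learning_rate=0.01, n_epochs=400):
    """Batch gradient descent: every epoch uses all samples for one update."""
    m = len(ys)
    theta = list(theta)
    for _ in range(n_epochs):
        errors = [p - y for p, y in zip(predict(xs, theta), ys)]
        # Same expression as the TF cell: 2/m * X^T (y_pred - y)
        grads = [2.0 / m * sum(row[j] * e for row, e in zip(xs, errors))
                 for j in range(len(theta))]
        theta = [t - learning_rate * g for t, g in zip(theta, grads)]
    return theta


xs = [[10, 1], [11, 2], [1, 6]]   # same dummy samples as the TF cell
ys = [1, 1, 0]                    # same dummy labels

theta = bgd_logistic(xs, ys, theta=[0.1, 0.1])
labels = [round(p) for p in predict(xs, theta)]
print(theta, labels)
```

If the two versions really are equivalent, the printed theta should land close to the values TF reported above, with small differences expected from float32 vs float64 arithmetic.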


## Re-doing the Lady Gaga Classifier

Here we go, adapted to solve our favorite dummy example

In [14]:
import tensorExamples as te
import pandas
from gdsolvers import sigmoid
import logging as log

tf.reset_default_graph()
n_epochs = 2500
learning_rate = 0.01

X,y,theta,y_pred,features,rfeatures,testMatrix,testY = te.getGagaTfFormat()
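m = len(testMatrix[0])   # note: this is the column (feature) count of the test matrix, not the training sample count, so it only rescales the step size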

with tf.name_scope("loss"):
    error = y_pred - y  # residual; this (not ll) is what drives the gradient
    ll = tf.reduce_mean(tf.losses.log_loss(y,y_pred), name='log_loss')  # -log(x) or -log(1-x) ....

with tf.name_scope("gradients"):
    gradients = 2.0/m * tf.matmul(tf.transpose(X), error)  # built from error; ll is only reported
    training_op = tf.assign(theta, theta - learning_rate * gradients)   # op that overwrites theta with the updated value each time it is run

ll_summary = tf.summary.scalar('log_loss',ll)
file_writer = tf.summary.FileWriter(myutils.getLogDir(),tf.get_default_graph())
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if (epoch % 500 == 0):
            print('Epoch %s Log_Loss %s'%(epoch, ll.eval()))
            summary_str = ll_summary.eval()  # bug
            step = epoch
            file_writer.add_summary(summary_str, step)
        sess.run(training_op)   # an "op" is a graph node; running this one performs a single theta update
    best_theta = theta.eval()
print(myutils.gf(best_theta))   # scores should be similar to scikit and grad5 solver
file_writer.close()

# reduce test set similarly (note below works because we know full set of train+test features ahead of time)    
df = pandas.DataFrame(testMatrix, columns=features)   # new df w/ column names
X = df[rfeatures].values                     # filter out only rfeatures (.values replaces the deprecated .as_matrix())

testRes = np.dot(X, best_theta)
testResRound = [round(sigmoid(x),0) for x in testRes]
testDiffs = np.array(testResRound) - np.array(testY)
log.warning('raw results %s '%(myutils.gf(testRes)))
log.warning('sig results %s'% (myutils.gf([sigmoid(x) for x in testRes])))
log.warning('0|1 results %s'%([round(sigmoid(x),0) for x in testRes]))
log.warning(testDiffs)
log.error ('mymodel errors: %s / %s = %f'%(sum([abs(x) for x in testDiffs]),len(testY),sum([abs(x) for x in testDiffs])/len(testY)))




Epoch 0 Log_Loss 0.59850425
Epoch 500 Log_Loss 0.4192724
Epoch 1000 Log_Loss 0.34695074
Epoch 1500 Log_Loss 0.30071217
Epoch 2000 Log_Loss 0.26779503
['-0.0209', '-0.0039', '-0.0069', '-0.0125', '-0.0018', '-0.0244', '0.0235', '0.0125', '0.0118', '-0.1697', '0.0118', '-0.0051', '0.0354', '0.0138', '0.0438', '-0.0172', '0.0424', '-0.0088', '0.0309', '0.0315', '...']


NameError: name 'log' is not defined
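
Since the evaluation part of the cell stopped at the `NameError`, here is a small standalone sketch of what that step is meant to compute: raw scores go through the sigmoid, get thresholded to 0 or 1, and are compared against the expected labels. The scores and labels below are made up purely for illustration (they are not Gaga classifier results), and `error_rate` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def error_rate(raw_scores, expected):
    """Threshold sigmoid(score) at 0.5 and return (predictions, fraction wrong)."""
    predicted = [1 if sigmoid(s) >= 0.5 else 0 for s in raw_scores]
    wrong = sum(1 for p, y in zip(predicted, expected) if p != y)
    return predicted, wrong / len(expected)


# Made-up raw scores (X . theta) and labels, for illustration only.
raw_scores = [2.3, -1.1, 0.4, -0.2]
expected = [1, 0, 0, 0]

predicted, rate = error_rate(raw_scores, expected)
print(predicted, rate)    # [1, 0, 1, 0] 0.25
```

A score above zero maps to a sigmoid above 0.5, so the sign of the raw score already decides the class; the sigmoid mainly matters when you want a probability to look at.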

-----


I was curious about when GPUs are faster.  A few initial findings:

- CPU without SIMD vector instructions (SSE/AVX) is slow (no vectorized matrix operations for BGD)
- CPU with SIMD is faster (I should measure this if I can disable it somehow)
- GPU is not always much faster (on a basic MNIST NN image example it was only ~8% faster)
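
To actually measure these comparisons, a small timing helper is enough. This is a minimal sketch, not a rigorous benchmark: `time_call` and `dummy_workload` are hypothetical names, and the dummy workload is only a stand-in for a real training run.

```python
import time


def time_call(fn, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def dummy_workload():
    # Stand-in for one training run; swap in the real CPU or GPU training call.
    return sum(i * i for i in range(100000))


elapsed = time_call(dummy_workload)
print("best of 5: %.4f s" % elapsed)
```

Taking the best of several runs cuts down on noise from other processes. For a GPU run it is also worth discarding the first call, since one-time setup cost can dominate a short job.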

To set up the GPU:
- I suggest setting up a virtualenv or conda env, so the environment stays clean if you want to compare GPU vs non-GPU
- Install TensorFlow (tensorflow-gpu, which includes tensorflow; 1.8 in my case)
- Install CUDA (9.0 in my case; on Windows the build is compiled for only this version)
- Install the cuDNN DLL (7.1; after installing, just copy the DLL to somewhere on the path)

Note: my initial installs were screwed up.  I reinstalled tensorflow after installing CUDA, and now it's working.
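
A quick way to tell whether an install like this ended up working is to ask TensorFlow whether it can see a GPU. The sketch below is hedged on versions: it assumes `tf.test.is_gpu_available()` for TF 1.x installs and prefers `tf.config.list_physical_devices` when a newer TensorFlow provides it. `gpu_status` is a hypothetical helper name, and it returns `None` when TensorFlow cannot be imported so the snippet still runs on a machine without it.

```python
def gpu_status():
    """Return True/False for GPU visibility, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Newer TensorFlow versions expose tf.config.list_physical_devices.
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is not None:
        return len(list_devices("GPU")) > 0
    # Fallback for TF 1.x (assumed to cover 1.8).
    return bool(tf.test.is_gpu_available())


status = gpu_status()
messages = {
    None: "TensorFlow could not be imported",
    True: "GPU visible to TensorFlow",
    False: "TensorFlow is running CPU-only",
}
print(messages[status])
```

If this reports CPU-only after installing tensorflow-gpu, a mismatch between the CUDA or cuDNN version and the one the build expects is a likely place to look.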



