## Intro to TensorFlow

Note: I am using this book as my guide -- https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291 -- and the first half of the book is really good.  The TensorFlow part starts out well, but then dives a bit too opaquely into the high-level Deep Neural Net API.  It's quite hard to understand what's happening; I think you really need more low-level code samples that implement backprop and layers on smaller examples.

Anyway, the goal here is to rebuild the same kinds of tools I've built before:  
- Basic logistic regression solver
- Gradient Descent solver
- Lady Gaga example applied to TF
- GPU testing

Then move on to the next major area: Neural Networks (and competing frameworks like PyTorch).



TensorFlow requires you to wrap your variables and operations in its own custom structures.  A few caveats:  
- TF wants data in a different row/column orientation than scikit-learn.  Keeping matrix shapes straight is key to saving debugging time.
- TF seems hard to debug, since you wrap everything up front and it only gets evaluated later in a graph (almost asynchronous).
- The TF graph GUI (TensorBoard) looks slick.
- It's not so intuitive for an imperative Python programmer (there is an eager-eval mode, but it doesn't mix well with graph code).
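
The shape bookkeeping behind the first caveat can be sketched in plain NumPy (not TF). The array names and sizes here are made up for illustration. The point is only that an (m, n) matrix times an (n, 1) weight column gives (m, 1) predictions, and that a flat label vector quietly breaks this.

```python
import numpy as np

# Hypothetical sizes: m = 3 examples, n = 2 features.
X = np.array([[10.0, 1.0], [11.0, 2.0], [1.0, 6.0]])   # shape (m, n): one row per example
theta = np.full((2, 1), 0.1)                            # shape (n, 1): one weight per feature
y = np.array([[1.0], [1.0], [0.0]])                     # shape (m, 1): a column, not (m,)

scores = X @ theta            # (m, n) @ (n, 1) -> (m, 1)
error = scores - y            # (m, 1) - (m, 1) -> (m, 1)
grad = X.T @ error            # (n, m) @ (m, 1) -> (n, 1)

print(scores.shape, error.shape, grad.shape)   # (3, 1) (3, 1) (2, 1)

# The classic trap: a flat (m,) label vector silently broadcasts to (m, m).
y_flat = y.ravel()                              # shape (m,)
print((scores - y_flat).shape)                  # (3, 3), almost never what you want
```

Printing `.shape` after every step like this is the cheapest debugging tool I know of for this kind of code.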

Below is a simple logistic regression using batch gradient descent in TensorFlow.
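
For reference, this is the update rule the code below is built around, written out under the usual logistic-regression definitions. The symbols ($m$ examples, learning rate $\eta$) are my own notation, not taken from the book.

$$
\hat{y} = \sigma(X\theta), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\right]
$$

$$
\nabla_\theta J = \frac{1}{m}\, X^{\top}(\hat{y} - y), \qquad \theta \leftarrow \theta - \eta\, \nabla_\theta J
$$

The code puts 2/m rather than 1/m in front of the gradient, which just amounts to doubling the learning rate.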


In [7]:
import tensorflow as tf
import numpy as np
from myutils import gf

tf.reset_default_graph()
n_epochs = 400
learning_rate = 0.01

xs = np.array([[10,1],[11,2],[1,6]])   # dummy sample  (high, high, low)
ys = np.array([[1],[1],[0]])           # dummy results (correlates high count as positive, so 1,1,0 results)

X = tf.constant(xs, dtype=tf.float32, name='X')   # wrap in TF vanilla consts
y = tf.constant(ys, dtype=tf.float32, name='y')   

theta = tf.Variable(tf.constant([[0.1],[0.1]]), name='theta')  # TF "variable"
y_pred = tf.sigmoid(tf.matmul(X, theta, name='predictions'))  # wrap in TF sigmoid

with tf.name_scope("loss"):                # named scope (for graph imagery grouping)
    error = y_pred - y                     # error used in next scope
    ll = tf.reduce_mean(tf.losses.log_loss(y,y_pred), name='log_loss')  # std log_loss, for reporting only (not used in the update)

with tf.name_scope("gradients"):
    gradients = 2.0/len(ys) * tf.matmul(tf.transpose(X), error)         # X^T*error/m is the log-loss gradient; the extra 2.0 only scales the step
    training_op = tf.assign(theta, theta - learning_rate * gradients)   # "training_op" is called later

init = tf.global_variables_initializer()   # boilerplate init

with tf.Session() as sess:                 # this is where the TF stuff actually runs
    sess.run(init)
    for epoch in range(n_epochs):          # GD loop
        sess.run(training_op)              # each loop calls "training_op" again which assigns the theta
    best_theta = theta.eval()              # fetch theta array
    print('theta computed: ', gf(best_theta))            


theta computed:  ['0.6762', '-0.8275']


The example above shows how a simple logistic regression is computed in TensorFlow.   

- There are 2 features and 3 training examples (a 3x2 matrix).
- Imagine this is a female/male test (1 is female, 0 is male).  The 1st feature is hair length, the 2nd feature is foot size.
- The first 2 examples have long hair and small feet.  The 3rd example has short hair and big feet.
- The expected Y outputs are 1,1,0 -- the first 2 examples are female, the last is male.

The model trains itself and weights hair length at +0.68 and foot size at -0.83.  The weighted sum feeds the sigmoid function, and an output >0.5 maps to Female(1) while <0.5 maps to Male(0).
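
As a cross-check without TF, the same update can be sketched in plain NumPy. This is my own re-implementation, not code from the book; `train_logreg` is a made-up helper name. It keeps the same 2/m factor, learning rate, epoch count and starting theta as the TF cell above. The resulting theta should land close to the values printed above, though I would not expect identical digits since TF ran in float32.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, learning_rate=0.01, n_epochs=400, theta0=0.1):
    """Batch gradient descent with the same update as the TF cell above."""
    m, n = X.shape
    theta = np.full((n, 1), theta0)
    for _ in range(n_epochs):
        error = sigmoid(X @ theta) - y               # (m, 1)
        gradients = 2.0 / m * (X.T @ error)          # (n, 1), same 2/m factor as the TF code
        theta = theta - learning_rate * gradients
    return theta

xs = np.array([[10, 1], [11, 2], [1, 6]], dtype=float)   # hair length, foot size
ys = np.array([[1], [1], [0]], dtype=float)              # 1 = female, 0 = male

theta = train_logreg(xs, ys)
probs = sigmoid(xs @ theta)                # sigmoid of the weighted sum per example
labels = (probs > 0.5).astype(int)         # >0.5 -> 1 (female), otherwise 0 (male)

print("theta:", theta.ravel())
print("probabilities:", probs.ravel())
print("labels:", labels.ravel())           # should reproduce the training labels 1, 1, 0
```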

---

## Re-doing the Lady Gaga Classifier with a little bit of TF

Here we go, the same solver adapted to our favorite dummy example:

In [20]:
import tensorGaga as te
import numpy as np
import tensorflow as tf
import pandas
from myutils import gf
from gdsolvers import sigmoid
import logging as log

### setup variables/placeholders ###
tf.reset_default_graph()
n_epochs = 2500
learning_rate = 0.01

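X,y,features,rfeatures,testMatrix,testY = te.getGagaTfFormat()   # returns X,y in tf.Variables
m = len(testMatrix[0])                       # column count of testMatrix; only scales the GD step below
n = len(rfeatures)                           # number of reduced features = rows of theta
guesses = np.array([0.01]*n,dtype='float32' ).reshape(-1,1)  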

theta = tf.Variable(tf.constant(guesses), name='theta')
y_pred = tf.sigmoid(tf.matmul(X, theta, name='predictions'))

with tf.name_scope("loss"):
    error = y_pred - y                                                  # error vector
    ll = tf.reduce_mean(tf.losses.log_loss(y,y_pred), name='log_loss')  # mean of -log(p) or -log(1-p) per example

with tf.name_scope("gradients"):
    gradients = 2.0/m * tf.matmul(tf.transpose(X), error)               # calculate gradients vector
    training_op = tf.assign(theta, theta - learning_rate * gradients)   # to be called later to assign new thetas

### run the actual computations ###
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if (epoch % 500 == 0):
            print('GD Loop %s Log_Loss %s'%(epoch, ll.eval()))
        sess.run(training_op)                                           # assign the gradient values above
    best_theta = theta.eval()                                           # get final theta values post loop
print('\ntrained theta weights:', gf(best_theta))               

print ('\nTraining results: ')
weightSample = pandas.DataFrame(rfeatures[:20], columns=['word/feature'])
weightSample['theta weight'] = gf(best_theta[:20])
display(weightSample.transpose())

### run test data now ###
df = pandas.DataFrame(testMatrix, columns=features)   # new df w/ column names
X = df[rfeatures].values                     # keep only the rfeatures columns (.as_matrix() is deprecated)

testRes = np.dot(X, best_theta)
testResSig = [sigmoid(x) for x in testRes]            
testResRound = [round(sigmoid(x),0) for x in testRes]
testDiffs = np.array(testResRound) - np.array(testY)

testResults = pandas.DataFrame([gf(testRes),gf(testResSig),gf(testResRound), gf(testDiffs)])
testResults.insert(0,'test#',['raw h(x)','sig g(h(x))','round g(h(x))','h(x)-y'])
display(testResults)     
                          
log.error('mymodel errors: %s / %s = %f'%(sum([abs(x) for x in testDiffs]),len(testY),sum([abs(x) for x in testDiffs])/len(testY)))




GD Loop 0 Log_Loss 0.59850425
GD Loop 500 Log_Loss 0.4192724
GD Loop 1000 Log_Loss 0.34695074
GD Loop 1500 Log_Loss 0.30071217
GD Loop 2000 Log_Loss 0.26779503

trained theta weights: ['-0.0209', '-0.0039', '-0.0069', '-0.0125', '-0.0018', '-0.0244', '0.0235', '0.0125', '0.0118', '-0.1697', '0.0118', '-0.0051', '0.0354', '0.0138', '0.0438', '-0.0172', '0.0424', '-0.0088', '0.0309', '0.0315', '...']

Training results: 


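| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| word/feature | 1977.0 | 99999.0 | adds | advertising | afternoon | age | ah | aha | aimin | ain | ak | alike | amen | american | americano | ammunition | angel | angels | animal | apart |
| theta weight | -0.0209 | -0.0039 | -0.0069 | -0.0125 | -0.0018 | -0.0244 | 0.0235 | 0.0125 | 0.0118 | -0.1697 | 0.0118 | -0.0051 | 0.0354 | 0.0138 | 0.0438 | -0.0172 | 0.0424 | -0.0088 | 0.0309 | 0.0315 |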


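| | test# | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | raw h(x) | 1.7017 | 0.2724 | -0.3703 | -0.8951 | -0.6691 | 1.8326 | 0.016 | 0.9048 | 1.093 | ... | 2.4272 | -0.7433 | 1.4334 | 0.1487 | -0.4242 | -0.3371 | 5.591 | 0.4896 | -0.2279 | ... |
| 1 | sig g(h(x)) | 0.8458 | 0.5677 | 0.4085 | 0.2901 | 0.3387 | 0.8621 | 0.504 | 0.7119 | 0.7489 | ... | 0.9189 | 0.3223 | 0.8074 | 0.5371 | 0.3955 | 0.4165 | 0.9963 | 0.62 | 0.4433 | ... |
| 2 | round g(h(x)) | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | ... |
| 3 | h(x)-y | 0.0 | 0.0 | 0.0 | -1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... |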


ERROR:root:mymodel errors: 15.0 / 92 = 0.163043


### Summary of above
Using 500 features (reduced w/ kMeans), we trained on 70% of the data to produce theta (the model's array of weights).   

- <B>raw h(x)</B>:  the raw h(x) score (X times theta) for each test example    
- <B>sig g(h(x))</B>:  g(h(x)), the raw score run through the sigmoid function to get a 0.0-1.0 value   
- <B>round g(h(x))</B>:  g(h(x)) rounded to 0 or 1 so it can be compared against the Y's  
- <B>h(x)-y</B>:  round(g(h(x))) - y -- 0 means the prediction was correct, +1 = false positive, -1 = false negative   


In summary we have an <B><u>84% correct hit rate</u></B> (15 errors out of 92 test examples).  
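
The four rows of the table can be reproduced with a few lines of plain Python. This is a sketch rather than the notebook's own code: `score_predictions` is a made-up helper, and it thresholds at 0.5 instead of calling `round()`. The sample inputs are the first four test columns from the table above, with the true labels inferred as the rounded prediction minus the diff row.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_predictions(raw_scores, y_true):
    """Turn raw h(x) scores into 0/1 predictions and tally the errors."""
    probs = [sigmoid(z) for z in raw_scores]          # g(h(x)), in 0.0-1.0
    preds = [1 if p >= 0.5 else 0 for p in probs]     # threshold at 0.5
    diffs = [p - y for p, y in zip(preds, y_true)]    # 0 correct, +1 false pos, -1 false neg
    return {
        "probs": probs,
        "preds": preds,
        "diffs": diffs,
        "false_pos": diffs.count(1),
        "false_neg": diffs.count(-1),
        "accuracy": diffs.count(0) / len(diffs),
    }

raw = [1.7017, 0.2724, -0.3703, -0.8951]   # raw h(x), test columns 0-3
y = [1, 1, 0, 1]                           # inferred labels for those columns

result = score_predictions(raw, y)
print([round(p, 4) for p in result["probs"]])   # compare with the sig g(h(x)) row
print(result["diffs"])                           # [0, 0, 0, -1]
print(result["accuracy"])                        # 0.75
```

Run over the whole test set, the same tally is where the error count and hit rate come from.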

-----

### Higher level TF API - tensorflow.estimator.DNNClassifier()

TensorFlow also has higher-level APIs.  It's almost like 5 lines of code to use the DNNClassifier, as shown below:


In [4]:
import tensorflow as tf
import tensorflow_hub as hub
from tensorGaga import get_gaga_as_pandas_datasets

tf.logging.set_verbosity(tf.logging.INFO)
tf.reset_default_graph()

train_df, test_df = get_gaga_as_pandas_datasets()
display(train_df.head())
print ("Train set size: %d"%len(train_df))

# Training input on the whole training set with no limit on training epochs. (train then test)
train_input_fn = tf.estimator.inputs.pandas_input_fn(train_df, train_df["gagaflag"], num_epochs=None, shuffle=True)
predict_test_input_fn = tf.estimator.inputs.pandas_input_fn(test_df, test_df["gagaflag"], shuffle=False)

# this part is magic: tfhub.dev has various pre-trained text embeddings you can re-use, for various languages
embedded_text_feature_column = hub.text_embedding_column(
    key="sentence", module_spec="https://tfhub.dev/google/nnlm-en-dim128/1")

# train 3 layer net of input(128, the nnlm-en-dim128 embedding size)-500-100-2 
estimator = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    feature_columns=[embedded_text_feature_column],
    model_dir="tf_logs/", n_classes=2,
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

#train and test
estimator.train(input_fn=train_input_fn, steps=1000)
test_eval_result = estimator.evaluate(input_fn=predict_test_input_fn)    # expect ~ .80

display(test_df.head())
print ("Test set size: %d"%len(test_df))
print ("Test set accuracy: {accuracy}".format(**test_eval_result))


| | file | sentence | gagaflag |
| --- | --- | --- | --- |
| 40 | boysboysboys.txt | \n\nHey there sugar baby\nSaw you twice at the... | 1 |
| 24 | wishyouwerehere.txt | \n\nIt's funny how things, they change\nThe cl... | 1 |
| 262 | thenyoudloveme.txt | \n\nWhat if I were to leave you?\nBut then, yo... | 1 |
| 178 | cakelikeladygaga.txt | \n\nStuntin' all day\nSwag on a hundred millio... | 1 |
| 125 | letsgocrazy.txt | \n\nSummon up the mas! Play on the pan!\nStari... | 0 |


Train set size: 250
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'tf_logs/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1a2211c940>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Initialize variable dnn/input_from_feature_columns/input_layer/sentence_hub_module_embedding/module/embeddings/part_0:0 from checkpoint b'/var/folders/z5/r3lc8_5n7klgg7d5z_tb9sgw0000gn/T/tfhub_modules/32f2b2259e1cc8ca58c876921748361283e73997/variables/variables' with embeddings

| | file | sentence | gagaflag |
| --- | --- | --- | --- |
| 293 | nofloods.txt | \n\nI never ever thought I'd live away \nFrom ... | 1 |
| 207 | filthypop.txt | \n\nHello! Mr.Radio \nGot her pink tights \nAn... | 1 |
| 183 | overpoweredbyfunk.txt | \n\nIf you ain't reggae for it..funk out!\nNo-... | 0 |
| 259 | herewegoagain.txt | \n\nAll my friends are going out, \nBut I've b... | 1 |
| 130 | firefly.txt | \n\nI call her Firefly\nCause, oh, my\nShe rad... | 1 |


Test set size: 55
Test set accuracy: 0.8545454740524292


<B>Amazing !!!</B>  (Other than all this excessive logging I can't trim down w/o eliminating it all)

The example and output above show how easy (but opaque) it is to use the built-in DNNClassifier.  You hardly have to do any coding, and it works just as well as (or better than) a hand-built classifier.
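
To make the opaque part a little less opaque, here is a rough NumPy sketch of a forward pass through a 128-500-100-2 net. It is not what the estimator runs internally. The weights are random and untrained, `init_layers` and `forward` are made-up names, and the softmax over 2 output units is my simplification (the real estimator may wire its output differently). ReLU on the hidden layers matches DNNClassifier's default activation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_layers(sizes, seed=0):
    """Random (untrained) weights and zero biases for each pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """ReLU on the hidden layers, softmax on the last one."""
    a = x
    for W, b in layers[:-1]:
        a = relu(a @ W + b)
    W, b = layers[-1]
    return softmax(a @ W + b)

# 128 matches the nnlm-en-dim128 embedding; 500 and 100 are the hidden_units above.
layers = init_layers([128, 500, 100, 2])
fake_embeddings = np.random.default_rng(1).normal(size=(4, 128))   # stand-in for 4 embedded lyrics
probs = forward(fake_embeddings, layers)

print(probs.shape)          # (4, 2)
print(probs.sum(axis=1))    # each row sums to 1
```

Training is then "just" the gradient descent loop from earlier, with backprop supplying the gradients for every layer instead of a single theta.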

----


### GPU note

I was curious about when GPUs are faster, and since TF supports GPUs I was excited.  I found a few things in my initial tests (using the default MNIST image database and the default example of a 3-4 layer NN in TF):

- CPU without SIMD vector extensions (SSE/AVX, the successors to MMX) is slow (no vector operations for the matrix math and BGD) - my Atom laptop
- CPU with those vector extensions is faster (I should measure this if I can disable them somehow) - most laptops
- GPU is not always much faster (on the basic MNIST NN image example it was only ~8% faster) - my 1050Ti
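
As a loose analogy for why vectorized math matters, the gradient product from earlier can be timed both ways in NumPy. This compares Python loops against NumPy's compiled routines, not SIMD against non-SIMD hardware, so treat it as an illustration only. The sizes are made up and timings will vary by machine.

```python
import time
import numpy as np

def grad_loops(X, error):
    """X^T @ error computed one multiply-add at a time."""
    m, n = X.shape
    out = np.zeros((n, 1))
    for j in range(n):
        total = 0.0
        for i in range(m):
            total += X[i, j] * error[i, 0]
        out[j, 0] = total
    return out

def grad_vectorized(X, error):
    """Same product, handed to NumPy's compiled matrix routines."""
    return X.T @ error

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))        # hypothetical: 2000 examples, 50 features
error = rng.normal(size=(2000, 1))

t0 = time.perf_counter()
g_loop = grad_loops(X, error)
t1 = time.perf_counter()
g_vec = grad_vectorized(X, error)
t2 = time.perf_counter()

print("results match:", np.allclose(g_loop, g_vec))
print("loop: %.4fs  vectorized: %.4fs" % (t1 - t0, t2 - t1))
```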

To set up GPU:
- I suggest setting up a virtualenv or conda env, so the env stays clean if you want to compare GPU vs non-GPU
- Install TensorFlow (tensorflow-gpu, which includes tensorflow - 1.8 in my case)
- Install CUDA drivers (9.0 in my case; on Windows it's compiled for only this version)
- Install the cuDNN DLL (7.1 - after installing, just copy the DLL to somewhere in the path)
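
After the install, a quick way to check whether TF actually sees the GPU is to list its devices. This is a small sketch and `list_gpus` is my own helper name. It returns an empty list if TensorFlow is not importable, so it is safe to run in a CPU-only env as well.

```python
def list_gpus():
    """Return the names of GPU devices TensorFlow can see, or [] if TF is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]

gpus = list_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

If this prints "none" in the GPU env, the CUDA/cuDNN versions are the first thing I would re-check.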

<B>Note my initial installs were screwed up.  I reinstalled tensorflow after installing CUDA and now it's working.   I think using virtualenv, a conda env, or Docker would be cleaner.</B>   But now, after installing these compatible CUDA drivers, my NiceHash for mining is broken!
