# TensorFlow Workshop 2/4/2017
## Speaker: Raghu Rajah, DataDigital.io

### Description: 
>The objective of the lab is to introduce attendees to machine learning and TensorFlow programming using Python. No prior experience with Python or Machine Learning is assumed, although some programming experience in any language would be helpful. We will start with refreshing some basics in linear algebra and statistics. We will then quickly review what TensorFlow is, the TensorFlow programming model, and code some basic machine learning algorithms in TensorFlow. **Please bring a laptop to this code lab.** Also, install TensorFlow 0.11 on Docker (instructions: https://www.tensorflow.org/get_started/os_setup#docker_installation).

## TensorFlow Intro

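```python
import tensorflow as tf

# A 3x3 constant tensor
hello = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

sess = tf.Session()

# Select the second column as a 3x1 array: transpose so columns become rows,
# gather row 1, then transpose back
sess.run(tf.transpose(tf.gather(tf.transpose(hello), tf.range(1, 2))))
```

```
array([[2],
       [5],
       [8]], dtype=int32)
```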

### Make our first execution graph

```python
# Two random 5x5 tensors and their matrix product; nothing is computed yet
a = tf.random_uniform([5, 5])
b = tf.random_uniform([5, 5])
c = tf.matmul(a, b)

# Running the session evaluates the graph and returns a NumPy array
result = sess.run(c)
result
```

```
array([[ 1.71821737,  2.01720667,  1.56136405,  1.00542319,  0.58998203],
       [ 1.36000121,  1.81619072,  1.28145432,  0.53352749,  0.49752089],
       [ 0.87655884,  1.54827309,  0.79011118,  0.82990181,  0.56916785],
       [ 0.92803049,  1.64017928,  0.78331292,  1.20171821,  0.69121003],
       [ 1.15215063,  1.60893774,  1.12215817,  0.48770118,  0.54047239]], dtype=float32)
```

### Lazy Eval
- TF works a lot like Spark: writing code only builds an execution graph, and nothing is computed until the graph is run with `sess.run`
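To make the idea concrete, here is a toy sketch in plain Python (not how TensorFlow is actually implemented): building an expression only records nodes, and values are computed when `run` is called. The `Node`, `constant`, `add`, `mul`, and `run` names are hypothetical.

```python
class Node:
    """A hypothetical graph node: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = list(inputs)
        self.value = value


def constant(v):
    return Node("const", value=v)


def add(x, y):
    return Node("add", [x, y])


def mul(x, y):
    return Node("mul", [x, y])


def run(node):
    """Evaluate a node by recursively evaluating its inputs (our toy 'session')."""
    if node.op == "const":
        return node.value
    args = [run(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(f"unknown op {node.op}")


# Building the graph computes nothing yet...
x = constant(3)
y = constant(4)
z = add(mul(x, y), y)  # z = x * y + y

# ...the value only appears when the graph is run.
print(run(z))  # 16
```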

### Execution Graph
- Nodes can be
    - Data nodes
    - Operations
    - Summaries
- Edges show the computational flow

### TensorBoard
TF has a visualization tool called TensorBoard, which works by reading event log files that our program has to write explicitly.

![TensorBoard Screenshot](img/TensorBoard.png)

We make a `writer` object and write `sess.graph` to our log file.

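```python
# Write the session's graph to the 'logs' directory, then flush it to disk
writer = tf.summary.FileWriter('logs', sess.graph)
writer.close()
```

Running `tensorboard --logdir=logs` then serves the dashboard (by default at http://localhost:6006).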

### Distributed Execution
- Outlined in the TensorFlow paper
- The session executes the graph and can distribute it across multiple cores or devices (CPUs, GPUs) on a single machine, or across multiple machines

### Partial Exec
- Not all of the graph is necessarily executed: TF runs only the subgraph needed to compute the outputs requested in `sess.run`
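A toy sketch of the idea in plain Python, assuming a graph stored as a dict that maps each node name to a (function, input names) pair: fetching one node evaluates only its ancestors. The graph layout and the `run` helper are hypothetical, not TF internals.

```python
# Hypothetical graph: node name -> (function, list of input node names)
graph = {
    "a": (lambda: 2, []),
    "b": (lambda: 5, []),
    "c": (lambda a, b: a * b, ["a", "b"]),
    "d": (lambda b: b + 1, ["b"]),
    "expensive": (lambda c: sum(range(10**6)) + c, ["c"]),
}

evaluated = []  # records which nodes actually ran


def run(target, cache=None):
    """Evaluate only the nodes that `target` depends on."""
    if cache is None:
        cache = {}
    if target not in cache:
        fn, inputs = graph[target]
        args = [run(name, cache) for name in inputs]
        evaluated.append(target)
        cache[target] = fn(*args)
    return cache[target]


print(run("d"))   # 6
print(evaluated)  # ['b', 'd']: 'a', 'c', and 'expensive' were skipped
```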

### [TensorFrames](https://github.com/databricks/tensorframes)
- A Spark/TF bridge. Raghu said they were able to use it for their app, but it is cutting-edge (like all of ML), so they have had to modify it themselves.



## Machine Learning
- Great summary: "ML is taking in data and trying to model a target function"
    - "Target function" can be hotel pricing, image recognition, driving decisions
    - Data can be anything that drives the above functions: current market conditions, input images, vehicle speed, traffic conditions, etc.
- Good summary provided at [R2D3](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/)

### Overfitting
- A model can fit the training data perfectly yet fail to generalize to new data
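A small NumPy sketch of overfitting (an illustration, not from the workshop): a degree-9 polynomial passes through all 10 noisy training points, so its training error is essentially zero, while its error on fresh points from the same curve is much larger. The sine target, noise level, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def target(x):
    """The 'true' function we pretend not to know."""
    return np.sin(2 * np.pi * x)


# 10 noisy training points and 200 fresh test points from the same curve
x_train = np.linspace(0, 1, 10)
y_train = target(x_train) + rng.normal(scale=0.2, size=x_train.shape)
x_test = rng.uniform(0, 1, 200)
y_test = target(x_test) + rng.normal(scale=0.2, size=x_test.shape)


def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (3, 9):
    # Polynomial.fit rescales x internally, which keeps the fit well conditioned
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```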

### High Dimensionality
- ML data usually has many dimensions, one per feature (almost always more than 3, and could be hundreds)
- Visualizing and processing this data is *hard*
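One concrete reason high dimensions are hard, sketched in NumPy: as the dimension grows, the distances from a query point to random points become nearly equal, so the "nearest" neighbor is barely nearer than the farthest. The point count, dimensions, and the `relative_contrast` helper are arbitrary, hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(dim, n_points=500):
    """(farthest - nearest) / nearest distance from a random query point."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```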

### Distributed
- Data is often too big to store on one machine and needs to be spread across many machines
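One reason this works for many algorithms: quantities such as the gradient of a squared-error loss are sums over examples, so each machine can compute a partial sum on its own shard and the results can be combined. A NumPy sketch, with "machines" simulated by `np.array_split`; the data, shard count, and `partial_gradient` helper are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 4))
y = X @ np.array([1.0, 2.0, 3.0, 4.0])
w = np.zeros(4)


def partial_gradient(X_shard, y_shard, w):
    """Sum of per-example gradients of (x.w - y)^2 on one shard."""
    residual = X_shard @ w - y_shard
    return 2 * X_shard.T @ residual


# Each "machine" holds one shard and computes its partial sum
shards = zip(np.array_split(X, 4), np.array_split(y, 4))
total = sum(partial_gradient(Xs, ys, w) for Xs, ys in shards)
distributed_grad = total / len(X)

# Same result as the mean-squared-error gradient computed on all the data at once
full_grad = 2 * X.T @ (X @ w - y) / len(X)
print(np.allclose(distributed_grad, full_grad))  # True
```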

### Five ML Approaches (from Pedro Domingos's [The Master Algorithm](https://en.wikipedia.org/wiki/The_Master_Algorithm))
1. Bayesian Inference: $P(A|B) = \displaystyle \frac{P(B|A)P(A)}{P(B)}$
2. Kernel Machines: recommender systems, collaborative filtering, clustering
3. Neural Networks: face recognition, fraud detection, computer vision, speech recognition, NLP
4. Inverse Deduction: genetics
5. Genetic Programming: robotics
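A worked example of the Bayes formula in item 1, using made-up numbers: let A be "has a condition" (prior 1%) and B be "tests positive", with P(B|A) = 0.9 and a 5% false-positive rate. The denominator P(B) comes from the law of total probability.

```python
# Made-up numbers for illustration
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.90      # P(B|A): test sensitivity
p_b_given_not_a = 0.05  # P(B|not A): false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.154
```

Even after a positive test, the posterior is only about 15%, because the condition is rare to begin with.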

### Popular ML Algorithms
- Linear Regression
- Logistic Regression
- Random Forest
- Collaborative Filtering
- Support Vector Machine
- Deep Learning
    - Convolutional Neural Networks
    - Recurrent Neural Networks
    - LSTM
- Ensemble Learning

### ML Recommendations
- Andrew Ng's Machine Learning course on Coursera
- *Machine Learning: A Probabilistic Perspective* (Kevin P. Murphy)

## Linear Regression TF Demo

```python
import tensorflow as tf
import numpy as np
```

#### Create Synthetic Data: Four Features

Simulate data according to:

$y = x_1 + 2x_2 + 3x_3 + 4x_4$

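```python
# True weights [1, 2, 3, 4]
a = np.array([1, 2, 3, 4], dtype=np.float32)
# 10,000 samples with 4 features drawn uniformly from [0, 1)
XX = np.random.rand(10000, 4).astype(np.float32)
# Targets as a (10000, 1) column so they line up with the (10000, 1) predictions
yy = np.dot(XX, a).reshape(-1, 1)
```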
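Because the synthetic targets are an exact linear function of the features, ordinary least squares should recover the coefficients almost exactly. A quick NumPy check (not part of the workshop code) that regenerates data the same way and solves it directly with `np.linalg.lstsq`:

```python
import numpy as np

true_w = np.array([1, 2, 3, 4], dtype=np.float64)
XX = np.random.rand(10000, 4)
yy = XX @ true_w

# Solve min ||XX w - yy||^2 directly, without gradient descent
w_hat, residuals, rank, _ = np.linalg.lstsq(XX, yy, rcond=None)
print(np.round(w_hat, 6))  # [1. 2. 3. 4.]
```

This gives a reference answer to compare the gradient-descent weights against.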

#### Build out the algorithm

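```python
# Creates a graph.
# allow_soft_placement lets TF fall back to the CPU if no GPU is available
config = tf.ConfigProto(allow_soft_placement=True)
with tf.device('/gpu:0'):
    X = tf.placeholder(tf.float32, shape=[None, 4])
    y = tf.placeholder(tf.float32, shape=[None, 1])

    # Our model: a 4x1 weight vector, initialized randomly
    weights = tf.Variable(tf.random_uniform([4, 1]))

    # Predictor
    pred = tf.matmul(X, weights)

    # Cost function: mean squared error (tf.sub was renamed tf.subtract in TF 1.0)
    cost = tf.reduce_mean(tf.square(tf.subtract(pred, y)))

    # Gradient of the cost with respect to the weights
    grads = tf.gradients(cost, [weights])

    # Learning rate
    step = tf.constant(1e-1)

    # One gradient-descent update: weights -= step * gradient
    train = tf.assign_add(weights, tf.multiply(-step, grads[0]))

    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())

        for _ in range(100):
            # Feed y as a column so pred - y stays (n, 1) instead of broadcasting
            sess.run(train, feed_dict={X: XX, y: yy.reshape(-1, 1)})

        print(sess.run(weights))

        # Write the graph to the log directory for TensorBoard
        writer = tf.summary.FileWriter('logs', sess.graph)
        writer.close()
```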

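```
[[ 2.28847265]
 [ 2.23842406]
 [ 2.26711798]
 [ 2.43902564]]
```

These weights, from the workshop run, are all close to 2.3 rather than the true `[1, 2, 3, 4]`. In that run `yy` had shape `(10000,)` while `pred` had shape `(10000, 1)`, so `pred - y` broadcast to a 10000 x 10000 matrix and the cost compared every prediction with every target. With `yy` fed as a `(10000, 1)` column, the weights move toward `[1, 2, 3, 4]`, although 100 steps at a learning rate of 0.1 may not be enough to converge fully.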
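For comparison, the same gradient-descent loop written directly in NumPy (a sketch, not the workshop code). It keeps `y` as an `(n, 1)` column so `pred - y` stays `(n, 1)`, and it uses the gradient of the mean squared error, `2/n * X^T (Xw - y)`, which is what `tf.gradients` computes for this cost. The 2000 iterations and the seed are assumptions chosen so the weights converge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
true_w = np.array([[1.0], [2.0], [3.0], [4.0]])

X = rng.uniform(size=(n, 4))
y = X @ true_w  # shape (n, 1), matching pred below

# Pitfall: a flat (n,) target broadcasts against (n, 1) predictions into (n, n)
assert (X[:5] @ true_w - y[:5].ravel()).shape == (5, 5)

w = rng.uniform(size=(4, 1))  # random init, like tf.random_uniform([4, 1])
step = 0.1                    # learning rate

for _ in range(2000):
    pred = X @ w                           # (n, 1)
    grad = 2.0 / n * X.T @ (pred - y)      # gradient of mean((pred - y) ** 2)
    w -= step * grad

print(w.ravel().round(3))  # [1. 2. 3. 4.]
```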
