# TensorFlow

### Pros
- low-level and flexible
- portable
- automatic differentiation
- distributed
- based on building **computational graphs**
- runs on a variety of platforms (mobile phones, CPU, GPU, TPU)
- good for large amounts of data, even with models such as SVMs, decision trees, etc.

### Cons
- not easy to debug

## Steps
1. define the graph
2. initialize variables
3. run the graph, feeding data into placeholders
4. evaluate the outputs
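
The four steps above can be illustrated without TensorFlow itself. This is a toy, pure-Python sketch of the define-then-run idea; `Placeholder`, `Variable`, `Op` and `run` are hypothetical stand-ins invented for this example, not TensorFlow's API.

```python
import operator


class Placeholder:
    """Graph input whose value is supplied at run time."""
    def __init__(self, name):
        self.name = name


class Variable:
    """Graph node holding a value that must be initialized before running."""
    def __init__(self, initial_value):
        self.initial_value = initial_value
        self.value = None  # unset until initialize() is called

    def initialize(self):
        self.value = self.initial_value


class Op:
    """Graph node that applies a function to the outputs of other nodes."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


def run(node, feed_dict):
    """Evaluate `node` recursively, reading placeholder values from feed_dict."""
    if isinstance(node, Placeholder):
        return feed_dict[node]
    if isinstance(node, Variable):
        if node.value is None:
            raise RuntimeError("variable used before initialization")
        return node.value
    args = [run(inp, feed_dict) for inp in node.inputs]
    return node.fn(*args)


# 1. define graph: y = w * x + b (nothing is computed yet)
x = Placeholder("x")
w = Variable(2.0)
b = Variable(0.5)
y = Op(operator.add, Op(operator.mul, w, x), b)

# 2. initialize variables
for v in (w, b):
    v.initialize()

# 3. run, feeding into placeholders; 4. evaluate
result = run(y, feed_dict={x: 3.0})
print(result)
```

Here `result` is 6.5 (2.0 * 3.0 + 0.5). Skipping step 2 makes `run` raise, which mirrors why variables have to be initialized before the graph is run.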

## Components
- computational graph
- Sessions
- Scopes, names: ways of defining what's in the graph (layers, etc.)
- Placeholders, variables, `feed_dict`: ways of moving data around the graph (feeding things into the graph on the fly, etc.)

#### Computational graph
- Basic unit: op gate (e.g. multiply two things)
- Composition: linear combination (op gate plus bias)
- MatMul, Add, Relu, Xent (cross-entropy) are all nodes in the computational graph
- Tensor conventions
    - Fully connected
        - batch size x vector length (batch size is always first)
        - e.g. [None, 10] means the batch size is left open until run time, and the vector length is 10
    - Convolutional
        - batch size x H x W x #channels
        - e.g. [None, 32, 32, 3]
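
A `None` entry in a shape acts as a wildcard for that dimension. The helper below is a hypothetical pure-Python check written for these notes (`shape_matches` is not a TensorFlow function); it only sketches how the two conventions above constrain incoming data.

```python
def shape_matches(spec, actual):
    """Return True if `actual` is compatible with `spec`; None matches any size."""
    if len(spec) != len(actual):
        return False
    return all(s is None or s == a for s, a in zip(spec, actual))


fc_spec = [None, 10]            # batch size x vector length
conv_spec = [None, 32, 32, 3]   # batch size x H x W x channels

print(shape_matches(fc_spec, (64, 10)))           # any batch size is accepted
print(shape_matches(fc_spec, (1, 10)))
print(shape_matches(conv_spec, (16, 32, 32, 3)))
print(shape_matches(conv_spec, (16, 3, 32, 32)))  # channels-first layout
```

The first three checks print `True`. The last prints `False`, because a channels-first batch does not fit the batch x H x W x channels convention.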
        
#### Sessions
- operations are run in a session

#### Distributed
- TensorFlow can place some operations on the CPU and others on the GPU
- e.g. pre-processing, augmentation and data transfer on CPU, training on GPU
- Keras does the same thing, but gives you less control over the placement

### TensorFlow Record

https://www.tensorflow.org/programmers_guide/datasets

### Visualize graphs
https://web.stanford.edu/class/cs20si/2017/lectures/slides_02.pdf

### Interactive session
- if you don't want to call `sess.run()` all the time, create an interactive session and call `eval()` directly on a tensor

        sess = tf.InteractiveSession()
        x.eval()

# Data objects
### Variables (WEIGHTS)
- are things to be learned and updated, e.g. weights and biases
- unlike a constant, a variable can be reassigned later, e.g. `a.assign(3.5)`
- so weights can be re-assigned after they are created

### Placeholders (INPUTS)
- are meant to be fed in once and used during node execution, e.g. inputs

        tf.placeholder(dtype=tf.float32, shape=(2, 2))

        # or

        x = tf.placeholder("float", [None, n_input])
        y = tf.placeholder("float", [None, n_classes])

### Constants
- are not used very often for neural networks


### Tensors
- can be reached through the default graph

        g = tf.get_default_graph()
        
### Estimator
- a tf equivalent of a Keras model 
- a higher level object representing your model
- parts of estimator
    - session
    - graph
    - loss
    - optimizer
    - feature columns
    - initialization
    - partitioning
    - TrainSpec, EvalSpec
- methods of an estimator
    1. train()
    2. evaluate()
    3. predict()
    4. export_savedmodel()
- requirements of an estimator:
    1. input function
    2. 

### tf.data
        dataset = (tf.data.Dataset.list_files("/data/*")
                   .map(decode_image)
                   .shuffle(SHUFFLE_BUFFER_SIZE)
                   .batch(BATCH_SIZE))
             
- alternative to `feed_dict`
- use this when the dataset doesn't fit in memory
- can be used to feed multiple GPUs
- define a pipeline that generates batches, which you feed into training (below)
- this works because the pipeline starts from **pointers** to the images (e.g. file paths), not the images themselves; the images are then loaded as needed, in batches

        ### create dataset
        files = tf.data.Dataset.list_files(file_pattern)
        dataset = tf.data.TFRecordDataset(files)
        
        ### the following 4 calls can be applied in a different order, but the order changes the output
        dataset = dataset.shuffle(10000)  # shuffles the serialized records read from the files, using a 10000-element buffer
        dataset = dataset.repeat(NUM_EPOCHS)
        dataset = dataset.map(lambda x: tf.parse_single_example(x, features))  # parses each serialized record; "features" is the schema describing how to parse a single example
        dataset = dataset.batch(BATCH_SIZE)  # groups the parsed examples into batches
        
        ### generate iterator of batches
        iterator = dataset.make_one_shot_iterator()
        features = iterator.get_next()  # get one batch
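
The "load as needed" idea can be sketched with plain Python generators. This is a toy illustration under stated assumptions, not how `tf.data` is implemented: `shuffle`, `batch` and `decode_image` are hypothetical stand-ins, the file paths are made up, and the shuffle is applied to the paths before decoding so that the buffer only ever holds strings.

```python
import random


def shuffle(items, buffer_size, seed=0):
    """Buffer-based shuffle: hold at most `buffer_size` + 1 items at a time."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) > buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


def batch(items, batch_size):
    """Group consecutive items into lists of up to `batch_size` elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current  # final partial batch


loaded = []  # records which "files" have actually been decoded


def decode_image(path):
    """Stand-in for reading and decoding a file (hypothetical)."""
    loaded.append(path)
    return "pixels of " + path


paths = ["/data/img%d.png" % i for i in range(10)]  # cheap references only
pipeline = batch(map(decode_image, shuffle(paths, buffer_size=4)), batch_size=3)

first_batch = next(pipeline)  # only now is anything decoded
print(len(first_batch), len(loaded))
```

After the first `next(pipeline)`, exactly three of the ten files have been decoded, one per element of the batch. The rest are still just path strings waiting in the pipeline.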
        
        
### Feature columns
- can do one-hot

        tags = tf.feature_column.categorical_column_with_vocabulary_list('tags', ['a', 'b', 'c', ...])
        tags_one_hot = tf.feature_column.indicator_column(tags)

- or embedding
- or hashing
- etc.
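
To make the one-hot and hashing options concrete, here is a minimal pure-Python sketch. `one_hot` and `hash_bucket` are hypothetical helpers, the three-tag vocabulary is an assumed example, and SHA-256 is an arbitrary choice that is not claimed to match the hash TensorFlow uses.

```python
import hashlib

VOCAB = ["a", "b", "c"]  # assumed example vocabulary


def one_hot(tag, vocab=VOCAB):
    """Vector with a 1 at the tag's vocabulary index; all zeros if unknown."""
    return [1 if tag == v else 0 for v in vocab]


def hash_bucket(tag, num_buckets):
    """Map any string to a bucket id in [0, num_buckets) without a vocabulary."""
    digest = hashlib.sha256(tag.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


print(one_hot("b"))
print(one_hot("z"))
print(hash_bucket("b", 8) == hash_bucket("b", 8))
```

One-hot needs the vocabulary up front, and in this sketch an unseen tag becomes all zeros. Hashing needs no vocabulary, at the cost that different tags can collide in the same bucket.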
        
### GPUs
- TensorFlow first tried having GPUs consume input from queues, but that was too hard for people to use
- so the Dataset API (`tf.data`) was created

## Using tf.keras
- define a model in Keras, then use `tf.keras.estimator.model_to_estimator` to convert it to a TF Estimator before fitting

        import tensorflow as tf

        L = tf.keras.layers

        model = tf.keras.Sequential([
            L.Reshape((28, 28, 1)),
            L.Conv2D(32, 5, activation=t...),
            L.MaxPooling2D(),
            ...
        ])

## Eager execution
- prototyping mode
- lets you see gradients after each batch and do model introspection
- once you have tried different architectures and settled on one, do the full training in core (graph) mode

### When eager mode is activated
- when you print a constant, it prints the actual value and not just a pointer to a node in the graph
- you can extend the model class to create your own model (see slide before "Gradient Tape")
- then when you instantiate the model, you can immediately run it



## Checking your work (see slides)
- `tf.parse_single_example`
- `iterator = dataset.make_one_shot_iterator()`
- `features = iterator.get_next()`

### Gradient tape
- can do backpropagation on demand to check outputs
- needed because in eager mode there is no graph to differentiate
- so operations are temporarily recorded on a "tape", and gradients are computed from that record
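
The recording idea can be shown with a toy tape in pure Python. This is a sketch for scalars only; the `Tape` and `Value` classes are hypothetical and far simpler than TensorFlow's gradient tape. Each operation runs immediately and also logs its local derivatives, which are then replayed in reverse.

```python
class Tape:
    """Records each operation as it runs so gradients can be replayed backward."""
    def __init__(self):
        self.records = []  # (output, [(input, local_gradient), ...])

    def gradient(self, target, sources):
        grads = {id(target): 1.0}
        # Walk the recorded operations in reverse (backpropagation).
        for output, parents in reversed(self.records):
            out_grad = grads.get(id(output), 0.0)
            for parent, local_grad in parents:
                grads[id(parent)] = grads.get(id(parent), 0.0) + out_grad * local_grad
        return [grads.get(id(s), 0.0) for s in sources]


class Value:
    """Scalar that logs multiplications and additions onto a tape."""
    def __init__(self, data, tape):
        self.data = data
        self.tape = tape

    def __mul__(self, other):
        out = Value(self.data * other.data, self.tape)
        self.tape.records.append((out, [(self, other.data), (other, self.data)]))
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, self.tape)
        self.tape.records.append((out, [(self, 1.0), (other, 1.0)]))
        return out


tape = Tape()
w = Value(3.0, tape)
x = Value(2.0, tape)
b = Value(1.0, tape)

y = w * x + b   # runs immediately, as in eager mode
loss = y * y    # loss = (w * x + b) ** 2

dw, db = tape.gradient(loss, [w, b])
print(loss.data, dw, db)
```

This prints `49.0 28.0 14.0`, matching the hand-derived gradients d(loss)/dw = 2 * y * x = 28 and d(loss)/db = 2 * y = 14.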

# Distributed deployment

https://www.tensorflow.org/deploy/distributed

- can do data-parallel (synchronous data parallel)
    - split each batch in 4 and send the pieces to 4 GPUs
    - all GPUs do their computation and send back their losses (gradients)
    - average the results from all GPUs and update the weights
- asynchronous data parallel
    - GPUs send back updates asynchronously
    - noisier
- or, with multi-branch models, split parts of the model across different GPUs
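
The synchronous case can be sketched in pure Python with a one-parameter model. Everything here is an illustrative assumption: the "GPUs" are just list slices, the model is y = w * x with a mean-squared-error loss, and the workers exchange gradients, which are averaged before a single shared update.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def split(batch, num_workers):
    """Split a batch into `num_workers` equal contiguous shards."""
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 examples of y = 3x
w, learning_rate = 0.0, 0.01

for step in range(200):
    shards = split(batch, num_workers=4)              # one shard per "GPU"
    worker_grads = [gradient(w, s) for s in shards]   # computed independently
    avg_grad = sum(worker_grads) / len(worker_grads)  # synchronous averaging
    w -= learning_rate * avg_grad                     # single shared update

print(round(w, 4))
```

With equal-sized shards, the averaged gradient equals the gradient of the whole batch, so the update matches single-device training and `w` converges to 3.0. The asynchronous variant would apply each worker's gradient as it arrives, which is where the extra noise comes from.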


**Note:** GPU is most important for deployment/serving/inference, not necessarily for training

#### How to switch from GPU to CPU and vice versa:
You can also use this to put different parts of the model on CPU and GPU

        with tf.device('gpu:0'):
            model.compile(loss='binary'...)