# Table of Contents
 <p><div class="lev1 toc-item"><a href="#Can-a-Neural-Network-Fit-Random-Data?" data-toc-modified-id="Can-a-Neural-Network-Fit-Random-Data?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Can a Neural Network Fit Random Data?</a></div><div class="lev2 toc-item"><a href="#Training-and-Saving" data-toc-modified-id="Training-and-Saving-11"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Training and Saving</a></div><div class="lev2 toc-item"><a href="#Loading" data-toc-modified-id="Loading-12"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Loading</a></div><div class="lev2 toc-item"><a href="#Old-School-Saving-and-Loading" data-toc-modified-id="Old-School-Saving-and-Loading-13"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Old School Saving and Loading</a></div><div class="lev1 toc-item"><a href="#My-Notes" data-toc-modified-id="My-Notes-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>My Notes</a></div><div class="lev2 toc-item"><a href="#Unsolved-Mysteries" data-toc-modified-id="Unsolved-Mysteries-21"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Unsolved Mysteries</a></div><div class="lev3 toc-item"><a href="#DISAMBIGUATING-GRAPH,-METAGRAPH,-VARIABLES,-OPS,-AND-MORE" data-toc-modified-id="DISAMBIGUATING-GRAPH,-METAGRAPH,-VARIABLES,-OPS,-AND-MORE-211"><span class="toc-item-num">2.1.1&nbsp;&nbsp;</span>DISAMBIGUATING GRAPH, METAGRAPH, VARIABLES, OPS, AND MORE</a></div><div class="lev2 toc-item"><a href="#SavedModelBuilder" data-toc-modified-id="SavedModelBuilder-22"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>SavedModelBuilder</a></div><div class="lev3 toc-item"><a href="#Trimmed-Impl-of-SavedModelBuilder" data-toc-modified-id="Trimmed-Impl-of-SavedModelBuilder-221"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span>Trimmed Impl of SavedModelBuilder</a></div><div class="lev2 toc-item"><a href="#tf.train.Saver" data-toc-modified-id="tf.train.Saver-23"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>tf.train.Saver</a></div><div class="lev2 
toc-item"><a href="#Misc" data-toc-modified-id="Misc-24"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Misc</a></div><div class="lev1 toc-item"><a href="#Copy-Pastes-for-Plane-Ride" data-toc-modified-id="Copy-Pastes-for-Plane-Ride-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Copy-Pastes for Plane Ride</a></div><div class="lev2 toc-item"><a href="#SavedModelBuilder" data-toc-modified-id="SavedModelBuilder-31"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>SavedModelBuilder</a></div><div class="lev2 toc-item"><a href="#Loader" data-toc-modified-id="Loader-32"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Loader</a></div>

[Link to mnist.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/mnist.py)

# Can a Neural Network Fit Random Data?

In [1]:
import tensorflow as tf
import numpy as np
import os

num_features = 28
num_classes = 5

batch_size = 32
hidden_size = 64

num_batches = 50
fake_inputs = np.random.random(size=(num_batches, batch_size, num_features))
fake_labels = np.random.randint(0, num_classes, size=(num_batches, batch_size))

In [2]:
# Use tf.placeholder for values we will set in sess.run() calls. 
input_layer = tf.placeholder(
    dtype=tf.float32, shape=(None, num_features), name='input_layer')
# Define the two layers with ReLU activation functions.
hidden_layer_1 = tf.layers.dense(
    inputs=input_layer, units=hidden_size, activation=tf.nn.relu, name='hidden_layer_1')
hidden_layer_2 = tf.layers.dense(
    inputs=hidden_layer_1, units=hidden_size, activation=tf.nn.relu, name='hidden_layer_2')
# Project to output layer of dimensionality equal to the number of classes to predict.
# This layer is often called the "logits". 
output_layer = tf.layers.dense(
    inputs=hidden_layer_2, units=num_classes, name='output_layer')

Define our loss function and training operation.

In [3]:
labels = tf.placeholder(dtype=tf.int32, shape=(None,), name='labels')
onehot_labels = tf.one_hot(labels, num_classes, name='onehot_labels')

loss = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels,
    logits=output_layer)
global_step = tf.get_variable(
    'global_step', shape=(), dtype=tf.int32, trainable=False)
train_op = tf.contrib.layers.optimize_loss(
    loss=loss,
    global_step=global_step,
    learning_rate=0.0001,
    optimizer='Adam')

We will evaluate our model by computing the number of correct predictions it makes in a given batch.

In [4]:
correct = tf.nn.in_top_k(output_layer, labels, k=1)
num_correct = tf.reduce_sum(tf.to_int32(correct))
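
To make the metric concrete, here is a rough pure-Python sketch of what the two lines above compute for `k=1`: a row counts as correct when no other class has a strictly larger logit than the labelled class. The helper name `count_top1_correct` and the sample values are made up for illustration; this is not how TensorFlow implements the op.

```python
def count_top1_correct(logits, labels):
    """Count rows whose labelled class has the largest logit (ties count as correct)."""
    return sum(1 for row, label in zip(logits, labels) if row[label] >= max(row))


# Hypothetical batch of 3 examples with 3 classes each.
logits = [[0.1, 2.0, -1.0],
          [1.5, 0.2, 0.3],
          [0.0, 0.1, 0.9]]
labels = [1, 2, 2]

num_correct_sketch = count_top1_correct(logits, labels)
print(num_correct_sketch)
```

Dividing this count by the number of examples gives the accuracy printed during training.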

Place the objects we want to use in future `sess.run` calls in collections, so that we can easily access them upon loading models from file.

In [5]:
tf.add_to_collection('fetches', num_correct)
tf.add_to_collection('fetches', train_op)

tf.add_to_collection('feed_dict', input_layer)
tf.add_to_collection('feed_dict', labels)
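
The training loop below relies on a collection behaving like an ordered list stored under a string key: items come back in the order they were added. The following toy sketch shows only that bookkeeping. `ToyGraph` is a hypothetical stand-in, not TensorFlow code, and the string values stand in for the real tensors and ops.

```python
from collections import defaultdict


class ToyGraph:
    """Toy stand-in for a graph's collection bookkeeping."""

    def __init__(self):
        self._collections = defaultdict(list)

    def add_to_collection(self, name, value):
        self._collections[name].append(value)

    def get_collection(self, name):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._collections.get(name, []))


graph = ToyGraph()
graph.add_to_collection('fetches', 'num_correct')
graph.add_to_collection('fetches', 'train_op')
graph.add_to_collection('feed_dict', 'input_layer')
graph.add_to_collection('feed_dict', 'labels')

# Unpacking works because insertion order is preserved.
input_name, labels_name = graph.get_collection('feed_dict')
print(graph.get_collection('fetches'))
```

This ordering is why `outputs[0]` in the training loop is the number of correct predictions: `num_correct` was added to `'fetches'` first.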

## Training and Saving

In [6]:
num_epochs = 200
export_dir = 'out'

def run_training_epoch(sess):
    input_layer, labels = tf.get_collection('feed_dict')
    num_correct_total = 0
    for batch_idx in range(num_batches):
        outputs = sess.run(tf.get_collection('fetches'), feed_dict={
            input_layer: fake_inputs[batch_idx],
            labels: fake_labels[batch_idx]})
        num_correct_total += outputs[0]
    return num_correct_total

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    for epoch_idx in range(num_epochs):
        num_correct_total = run_training_epoch(sess)
        if epoch_idx % 100 == 0:
            print('Epoch {}: accuracy={:.3%}'.format(
                epoch_idx, float(num_correct_total) / (num_batches * batch_size)))
            
    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    builder.add_meta_graph_and_variables(
        sess=sess,
        tags=[tf.saved_model.tag_constants.TRAINING])
    builder.save()

Epoch 0: accuracy=20.125%
Epoch 100: accuracy=35.062%
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'out/saved_model.pb'


## Loading

[loader.py](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/python/saved_model/loader.py)

In [7]:
tf.reset_default_graph()

In [8]:
with tf.Session() as sess:
    tf.saved_model.loader.load(
        sess=sess,
        tags=[tf.saved_model.tag_constants.TRAINING], 
        export_dir=export_dir)
    
    for epoch_idx in range(num_epochs):
        num_correct_total = run_training_epoch(sess)
        if epoch_idx % 100 == 0:
            print('Epoch {}: accuracy={:.3%}'.format(epoch_idx, float(num_correct_total) / (num_batches * batch_size)))


INFO:tensorflow:Restoring parameters from b'out/variables/variables'
Epoch 0: accuracy=41.125%
Epoch 100: accuracy=47.188%


## Old School Saving and Loading

In [None]:
saver = tf.train.Saver()
with tf.Session() as sess:
    # Variables need values before they can be saved.
    sess.run(tf.global_variables_initializer())
    save_path = saver.save(sess, 'out/0/saver.ckpt')

In [None]:
tf.reset_default_graph()
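v1 = tf.get_variable('v1', shape=[3])  # Rebuild the graph as usual.
saver = tf.train.Saver()  # Create the saver as usual, too.
with tf.Session() as sess:
    # No initializer needed: restore() fills in the variable values.
    # Assumes a checkpoint containing v1 was previously saved at this path.
    saver.restore(sess, '/tmp/model.ckpt')
    # Check the restored values.
    print('v1 :', v1.eval())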

[tf.train.import_meta_graph](https://www.tensorflow.org/api_docs/python/tf/train/import_meta_graph)

Recreates a Graph saved in a MetaGraphDef proto.

This function takes a MetaGraphDef protocol buffer as input. If the argument is a file containing a MetaGraphDef protocol buffer, it constructs a protocol buffer from the file content. The function then adds all the nodes from the graph_def field to the current graph, recreates all the collections, and returns a saver constructed from the saver_def field.

In combination with export_meta_graph(), this function can be used to:

- Serialize a graph along with other Python objects such as QueueRunner, Variable into a MetaGraphDef.
- Restart training from a saved graph and checkpoints.
- Run inference from a saved graph and checkpoints.

In [None]:
...
# Create a saver.
saver = tf.train.Saver(...variables...)
# Remember the training_op we want to run by adding it to a collection.
tf.add_to_collection('train_op', train_op)
sess = tf.Session()
for step in range(1000000):
    sess.run(train_op)
    if step % 1000 == 0:
        # Saves checkpoint, which by default also exports a meta_graph
        # named 'my-model-global_step.meta'.
        saver.save(sess, 'my-model', global_step=step)
        
# TWO....DAYS...LATER...

with tf.Session() as sess:
  new_saver = tf.train.import_meta_graph('my-save-dir/my-model-10000.meta')
  new_saver.restore(sess, 'my-save-dir/my-model-10000')
  # tf.get_collection() returns a list. In this example we only want the
  # first one.
  train_op = tf.get_collection('train_op')[0]
  for step in range(1000000):
    sess.run(train_op)

> NOTE: Restarting training from saved meta_graph only works if the device assignments have not changed.



# My Notes

## Unsolved Mysteries

> When you want to save and load variables, the graph, and the graph's metadata.

There is some redundancy/sloppiness going on in that sentence, right? What's the difference between each of those exactly? Ok, it is about time I dive into this rabbit hole:

------------------------------------

### DISAMBIGUATING GRAPH, METAGRAPH, VARIABLES, OPS, AND MORE

__tf.Graph__: "A Graph contains a set of 
- tf.Operation objects, which represent units of computation
- tf.Tensor objects, which represent the units of data that flow between operations."

Ok if you actually read the code (and protos), you find that, more technically, _a Graph is a set of Nodes, and a Node is mainly defined by an Operation and the input tensor names for that operation._

OK I THINK I GET IT NOW (after reading protos below): A SavedModelBuilder literally saves a list of MetaGraphDefs, each of which specifies:
1. A GraphDef. This is literally a list of NodeDefs, each of which specifies an operation (by name) and the names of the tensors that are fed as input to that operation. Notice how this is purely _structural_ and isn't concerned at all with particular values for any of those tensors (CRUCIAL TO UNDERSTAND THAT SENTENCE).
2. A SaverDef. Tells how to save and restore __variables__. This is how we get access to the particular values of tensors in our graph (via their corresponding tf.Variable).
3. A {string => CollectionDef} map.
4. A {string => SignatureDef} map. A SignatureDef contains two {string => TensorInfo} maps, one for inputs and one for outputs.
5. A list of AssetFileDefs.
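
To see the structure-versus-values split in miniature, here is a toy sketch using plain dataclasses. `ToyNodeDef`, `ToyGraphDef`, `toy_checkpoint`, and `consumers` are hypothetical names that mirror only a few proto fields; the three-node graph is made up and is simpler than what TensorFlow would actually generate.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyNodeDef:
    name: str
    op: str
    input: List[str] = field(default_factory=list)


@dataclass
class ToyGraphDef:
    node: List[ToyNodeDef] = field(default_factory=list)


# Structure only: which op each node runs and which tensors feed it.
graph_def = ToyGraphDef(node=[
    ToyNodeDef(name='x', op='Placeholder'),
    ToyNodeDef(name='w', op='VariableV2'),
    ToyNodeDef(name='y', op='MatMul', input=['x', 'w']),
])

# Values live elsewhere: a checkpoint maps variable names to numbers.
toy_checkpoint: Dict[str, List[float]] = {'w': [0.5, -1.2]}


def consumers(graph_def, tensor_name):
    """Names of the nodes that take `tensor_name` as an input."""
    return [n.name for n in graph_def.node if tensor_name in n.input]


print(consumers(graph_def, 'w'))
```

Nothing in `graph_def` says what `w` contains; that information sits only in the checkpoint, which is the part the SaverDef knows how to write and read.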


-------------------

## SavedModelBuilder

- [saved_model.proto](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/core/protobuf/saved_model.proto)
- [builder_impl.py](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/python/saved_model/builder_impl.py#L39)


```c
message SavedModel {
    // CTor just inits this to tf.saved_model.constants.SAVED_MODEL_SCHEMA_VERSION
    int64 saved_model_schema_version = 1;
    repeated MetaGraphDef meta_graphs = 2;
}

message MetaGraphDef {
    MetaInfoDef meta_info_def = 1;
    
    // GRAPH DEF AND NODE DEF
    message GraphDef {
        message NodeDef {
            string name = 1;
            string op = 2;
            repeated string input = 3;
            string device = 4;
            map<string, AttrValue> attr = 5;
        }
        repeated NodeDef node = 1;
        VersionDef versions = 2;
    }
    GraphDef graph_def = 2;
    
    // SAVER DEF
    message SaverDef {
        string filename_tensor_name = 1;
        string save_tensor_name = 2; // saving Operation()
        string restore_op_name = 3;
        int32 max_to_keep = 4;
        bool sharded = 5;
        float keep_checkpoint_every_n_hours = 6;
    }
    SaverDef saver_def = 3;

    // COLLECTION DEF
    message CollectionDef {
        message NodeList { repeated string value = 1; }
        message BytesList { repeated bytes value = 1; }
        message Int64List { repeated int64 value = 1 [packed = true]; }
        message FloatList { repeated float value = 1 [packed = true]; }
        message AnyList { repeated google.protobuf.Any value = 1; }
        
        oneof kind {
            NodeList node_list = 1;
            BytesList bytes_list = 2;
            Int64List int64_list = 3;
            FloatList float_list = 4;
            AnyList any_list = 5;
        }
    }
    map<string, CollectionDef> collection_def = 4;
    
    // SIGNATURE DEF
    message SignatureDef {
        map<string, TensorInfo> inputs = 1;
        map<string, TensorInfo> outputs = 2;
    }
    map<string, SignatureDef> signature_def = 5;
    
    // ASSET FILE DEF AND TENSOR INFO
    message AssetFileDef {
        message TensorInfo {
            message CooSparse { values_tensor_name, indices_tensor_name, dense_shape_tensor_name } // pseudo-proto
            oneof encoding {
                string name; // For dense Tensors
                CooSparse coo_sparse; // COO encoding for sparse tensors
            }
            DataType dtype;
            TensorShapeProto tensor_shape;
        }
        TensorInfo tensor_info = 1;
        string filename = 2;
    }
    repeated AssetFileDef asset_file_def = 6;
}

```

### Trimmed Impl of SavedModelBuilder

In [None]:
from tensorflow.core.protobuf import meta_graph_pb2, saved_model_pb2, saver_pb2
from tensorflow.python.saved_model import constants

class SavedModelBuilder:
    
    def __init__(self, export_dir):
        self._saved_model = saved_model_pb2.SavedModel(
            saved_model_schema_version=constants.SAVED_MODEL_SCHEMA_VERSION)
        self._export_dir = export_dir # Real impl checks that it doesnt exist & then makes it.
        
    def add_meta_graph_and_variables(self, sess, tags,
                                     signature_def_map=None,
                                     assets_collection=None,
                                     legacy_init_op=None,
                                     clear_devices=False,
                                     main_op=None):
        
        # Fall back to the legacy init op only when no main op is given.
        if main_op is None:
            self._maybe_add_legacy_init_op(legacy_init_op)
        else:
            self._add_main_op(main_op)
            
        saver = tf.train.Saver(
            var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES) + tf.get_collection(tf.GraphKeys.SAVEABLE_OBJECTS),
            sharded=True, 
            write_version=saver_pb2.SaverDef.V2,
            allow_empty=True)
        
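        # Save the variables under <export_dir>/variables/variables
        # (the real impl also creates this directory first).
        variables_path = os.path.join(self._export_dir, 'variables', 'variables')
        saver.save(sess, variables_path, write_meta_graph=False, write_state=False)
        # Export the meta graph def.
        meta_graph_def = saver.export_meta_graph(clear_devices=clear_devices)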
        # Tag the meta graph def and add it to the SavedModel.
        self._tag_and_add_meta_graph(meta_graph_def, tags, signature_def_map)

In [None]:
from tensorflow.core.protobuf import meta_graph_pb2, saved_model_pb2

saved_model_proto = saved_model_pb2.SavedModel()
meta_graph_proto = meta_graph_pb2.MetaGraphDef()

## tf.train.Saver

- [API Documentation](https://www.tensorflow.org/api_docs/python/tf/train/Saver)

Saves and restores variables.

The Saver class adds ops to save and restore variables to and from checkpoints. 
It also provides convenience methods to run these ops.

Checkpoints are binary files in a proprietary format which map variable names to tensor values. 
The best way to examine the contents of a checkpoint is to load it using a Saver.

Savers can automatically number checkpoint filenames with a provided counter. 
This lets you keep multiple checkpoints at different steps while training a model. 
For example you can number the checkpoint filenames with the training step number. 
To avoid filling up disks, savers manage checkpoint files automatically. 
For example, they can keep only the N most recent files, or one checkpoint for every N hours of training.

You number checkpoint filenames by passing a value to the optional global_step argument to save():

```python
saver.save(sess, 'my-model', global_step=0)     # ==> filename: 'my-model-0'
...
saver.save(sess, 'my-model', global_step=1000)  # ==> filename: 'my-model-1000'
```
Additionally, optional arguments to the Saver() constructor let you control the proliferation of checkpoint files on disk:
* max_to_keep indicates the maximum number of recent checkpoint files to keep. As new files are created, older files are deleted. If None or 0, all checkpoint files are kept. Defaults to 5 (that is, the 5 most recent checkpoint files are kept.)
* keep_checkpoint_every_n_hours: In addition to keeping the most recent max_to_keep checkpoint files, you might want to keep one checkpoint file for every N hours of training. This can be useful if you want to later analyze how a model progressed during a long training session. For example, passing keep_checkpoint_every_n_hours=2 ensures that you keep one checkpoint file for every 2 hours of training. The default value of 10,000 hours effectively disables the feature.


Note that you still have to call the save() method to save the model. Passing these arguments to the constructor will not save variables automatically for you.

A training program that saves regularly looks like:

```python
# Create a saver.
saver = tf.train.Saver(...variables...)
# Launch the graph and train, saving the model every 1,000 steps.
sess = tf.Session()
for step in range(1000000):
    sess.run(..training_op..)
    if step % 1000 == 0:
        # Append the step number to the checkpoint name:
        saver.save(sess, 'my-model', global_step=step)
```

In addition to checkpoint files, savers keep a protocol buffer on disk with the list of recent checkpoints. This is used to manage numbered checkpoint files and by latest_checkpoint(), which makes it easy to discover the path to the most recent checkpoint. That protocol buffer is stored in a file named 'checkpoint' next to the checkpoint files.
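
For intuition about that 'checkpoint' file, here is a small sketch that pulls the paths out of its contents. It assumes the file is a text-format protocol buffer with `model_checkpoint_path` and `all_model_checkpoint_paths` entries; `SAMPLE_STATE` and `parse_checkpoint_state` are made up for illustration. In real code, `latest_checkpoint()` is the right tool.

```python
import re

# Assumed contents of a 'checkpoint' file (text-format protocol buffer).
SAMPLE_STATE = '''model_checkpoint_path: "my-model-2000"
all_model_checkpoint_paths: "my-model-0"
all_model_checkpoint_paths: "my-model-1000"
all_model_checkpoint_paths: "my-model-2000"
'''


def parse_checkpoint_state(text):
    """Extract the latest path and the list of all recorded paths."""
    latest = re.search(r'^model_checkpoint_path: "(.*)"$', text, re.MULTILINE)
    all_paths = re.findall(
        r'^all_model_checkpoint_paths: "(.*)"$', text, re.MULTILINE)
    return (latest.group(1) if latest else None), all_paths


latest_path, all_paths = parse_checkpoint_state(SAMPLE_STATE)
print(latest_path)
```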

If you create several savers, you can specify a different filename for the protocol buffer file in the call to save().

`__init__`:
```python
__init__(
    var_list=None,
    reshape=False,
    sharded=False,
    max_to_keep=5,
    keep_checkpoint_every_n_hours=10000.0,
    name=None,
    restore_sequentially=False,
    saver_def=None,
    builder=None,
    defer_build=False,
    allow_empty=False,
    write_version=tf.train.SaverDef.V2,
    pad_step_number=False,
    save_relative_paths=False,
    filename=None
)
```

## Misc

REALLY USEFUL DESCRIPTION OF DEVICE NAME SEMANTICS IN [node_def.proto](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/core/framework/node_def.proto):

```c
  // A (possibly partial) specification for the device on which this
  // node should be placed.
  // The expected syntax for this string is as follows:
  //
  // DEVICE_SPEC ::= PARTIAL_SPEC
  //
  // PARTIAL_SPEC ::= ("/" CONSTRAINT) *
  // CONSTRAINT ::= ("job:" JOB_NAME)
  //              | ("replica:" [1-9][0-9]*)
  //              | ("task:" [1-9][0-9]*)
  //              | ( ("gpu" | "cpu") ":" ([1-9][0-9]* | "*") )
  //
  // Valid values for this string include:
  // * "/job:worker/replica:0/task:1/device:GPU:3"  (full specification)
  // * "/job:worker/device:GPU:3"                   (partial specification)
  // * ""                                    (no specification)
  //
  // If the constraints do not resolve to a single device (or if this
  // field is empty or not present), the runtime will attempt to
  // choose a device automatically.
```
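
To get a feel for these strings, the sketch below splits a device spec into its constraints. It follows the `device:TYPE:INDEX` form used in the "valid values" examples above rather than the grammar lines, and `parse_device_spec` is a hypothetical helper, not TensorFlow's own parser. Values are kept as strings because an index may be `*`.

```python
def parse_device_spec(spec):
    """Split a device string like '/job:worker/device:GPU:3' into a dict."""
    parsed = {}
    # Constraints are separated by '/', and the leading '/' yields an empty piece.
    for constraint in filter(None, spec.split('/')):
        key, _, value = constraint.partition(':')
        if key == 'device':
            device_type, _, index = value.partition(':')
            parsed['device_type'] = device_type
            parsed['device_index'] = index
        else:
            parsed[key] = value
    return parsed


full = parse_device_spec('/job:worker/replica:0/task:1/device:GPU:3')
partial = parse_device_spec('/job:worker/device:GPU:3')
print(full)
```

An empty spec parses to an empty dict, which corresponds to the "no specification" case where the runtime picks a device automatically.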

Note: may be useful to see how Estimators save their models...

# Copy-Pastes for Plane Ride

## SavedModelBuilder

In [None]:
export_dir = ...
...
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
with tf.Session(graph=tf.Graph()) as sess:
  ...
  builder.add_meta_graph_and_variables(
      sess,
      [tf.saved_model.tag_constants.TRAINING],
      signature_def_map=foo_signatures,
      assets_collection=foo_assets)
...
# Add a second MetaGraphDef for inference.
with tf.Session(graph=tf.Graph()) as sess:
  ...
  builder.add_meta_graph([tf.saved_model.tag_constants.SERVING])
...
builder.save()

## Loader

In [None]:
export_dir = ...
...
with tf.Session(graph=tf.Graph()) as sess:
  tf.saved_model.loader.load(
      sess, [tf.saved_model.tag_constants.TRAINING], export_dir)
  ...