<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Understanding-TensorFlow-2.x" data-toc-modified-id="Understanding-TensorFlow-2.x-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Understanding TensorFlow 2.x</a></span><ul class="toc-item"><li><span><a href="#eager-execution," data-toc-modified-id="eager-execution,-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>eager execution,</a></span></li></ul></li><li><span><a href="#Keras-APIs-–-three-programming-models" data-toc-modified-id="Keras-APIs-–-three-programming-models-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Keras APIs – three programming models</a></span><ul class="toc-item"><li><span><a href="#Sequential-API" data-toc-modified-id="Sequential-API-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Sequential API</a></span></li><li><span><a href="#Functional-API" data-toc-modified-id="Functional-API-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Functional API</a></span></li><li><span><a href="#Model-subclassing" data-toc-modified-id="Model-subclassing-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Model subclassing</a></span></li></ul></li><li><span><a href="#Converting-from-1.x-to-2.x" data-toc-modified-id="Converting-from-1.x-to-2.x-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Converting from 1.x to 2.x</a></span></li><li><span><a href="#Using-TensorFlow-2.x-effectively" data-toc-modified-id="Using-TensorFlow-2.x-effectively-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Using TensorFlow 2.x effectively</a></span></li><li><span><a href="#Callbacks" data-toc-modified-id="Callbacks-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Callbacks</a></span></li><li><span><a href="#Saving-a-model-and-weights" data-toc-modified-id="Saving-a-model-and-weights-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Saving a model and weights</a></span></li><li><span><a href="#tf.keras-or-Estimators?" 
data-toc-modified-id="tf.keras-or-Estimators?-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>tf.keras or Estimators?</a></span></li><li><span><a href="#Ragged-tensors" data-toc-modified-id="Ragged-tensors-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Ragged tensors</a></span></li></ul></div>

# Understanding TensorFlow 2.x
## eager execution,
TensorFlow 2.x natively supports eager execution, meaning that model definitions are dynamic and execution is immediate.
Graphs and sessions should be considered implementation details.
There is no longer any need to first statically define a computational graph and
then execute it (unless you really want to). All models can be defined dynamically
and executed immediately.

This is where AutoGraph comes into play: it takes eager-style Python code
and automatically converts it to graph-generating code.
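
As a minimal sketch of this conversion, a plain Python function with a data-dependent `if` can be run eagerly and then wrapped with `tf.function`. The function name `positive_sum` is hypothetical and chosen only for illustration; `tf.autograph.to_code` prints the graph-generating source that AutoGraph derives from it.

```python
import tensorflow as tf


def positive_sum(x):
    # Ordinary Python control flow that depends on a tensor value
    total = tf.reduce_sum(x)
    if total > 0:
        return total
    return tf.zeros_like(total)


# Eager call: runs immediately, like normal Python
eager_result = positive_sum(tf.constant([1.0, 2.0]))

# Graph call: tf.function traces the function and AutoGraph rewrites the `if`
graph_fn = tf.function(positive_sum)
graph_result = graph_fn(tf.constant([1.0, 2.0]))

# Inspect the graph-building source that AutoGraph generates
generated_source = tf.autograph.to_code(positive_sum)
print(generated_source)
```

Both calls compute the same value; the difference is only in how the computation is executed.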

# Keras APIs – three programming models
## Sequential API

We already met the Sequential API when we discussed the MNIST code.
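
As a reminder, a Sequential model is a plain stack of layers. The following is a minimal sketch for MNIST-shaped inputs; the layer sizes are assumptions for illustration and not necessarily the values used in the earlier MNIST code.

```python
import tensorflow as tf

# A linear stack: flatten the 28x28 image, then two dense layers
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Forward pass on a dummy batch of two blank images
probabilities = model(tf.zeros((2, 28, 28)))
```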

## Functional API

In [1]:
import tensorflow as tf


def build_model():
    # variable-length sequence of integers
    text_input_a = tf.keras.Input(shape=(None, ), dtype='int32')
    # variable-length sequence of integers
    text_input_b = tf.keras.Input(shape=(None, ), dtype='int32')
    # Embedding for 1000 unique words mapped to 128-dimensional vectors
    shared_embedding = tf.keras.layers.Embedding(1000, 128)
    # We reuse the same layer to encode both inputs
    encoded_input_a = shared_embedding(text_input_a)
    encoded_input_b = shared_embedding(text_input_b)
    # two logistic predictions at the end
    prediction_a = tf.keras.layers.Dense(1,
                                         activation='sigmoid',
                                         name='prediction_a')(encoded_input_a)
    prediction_b = tf.keras.layers.Dense(1,
                                         activation='sigmoid',
                                         name='prediction_b')(encoded_input_b)
    # this model has 2 inputs and 2 outputs;
    # in the middle we have a shared embedding layer
    model = tf.keras.Model(inputs=[text_input_a, text_input_b],
                           outputs=[prediction_a, prediction_b])
    # Optional: render the architecture diagram (needs pydot and graphviz)
    # tf.keras.utils.plot_model(model, to_file="./shared_model.png")
    return model


model = build_model()


<img src="./i/shared_model.png" />

## Model subclassing

Model subclassing offers the highest flexibility and it is generally used when
you need to define your own layer. In other words, it is useful when you are in
the business of building your own special lego brick instead of composing more
standard and well-known bricks.

A custom layer is created by subclassing `tf.keras.layers.Layer` and implementing the following methods:

- `__init__`: Optionally used to define all the sublayers to be used by this layer. This is the constructor where you can declare your model.
- `build`: Used to create the weights of the layer. You can add weights with `add_weight()`.
- `call`: Used to define the forward pass. This is where your layer is called and chained in functional style.
- `get_config()` and `from_config()`: Optionally, a layer can be serialized by using `get_config()` and deserialized using `from_config()`.

In [4]:
import tensorflow as tf
from tensorflow.keras import layers


class MyLayer(layers.Layer):
    def __init__(self, output_dim, **kwargs):
        # Initialize the base Layer first, then store our own attributes
        super(MyLayer, self).__init__(**kwargs)
        self.output_dim = output_dim

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)

    def call(self, inputs):
        # Do the multiplication and return
        return tf.matmul(inputs, self.kernel)


model = tf.keras.Sequential([
    MyLayer(20),
    layers.Activation('softmax')])
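
The optional serialization step can be sketched as follows. This is an illustrative variant of the layer above with a `get_config()` override added; the layer name `my_layer` and the input shape are assumptions, and `'random_uniform'` is used here as the initializer name.

```python
import tensorflow as tf


class MyLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.output_dim = output_dim

    def build(self, input_shape):
        # One trainable kernel of shape (input features, output_dim)
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1], self.output_dim),
                                      initializer='random_uniform',
                                      trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

    def get_config(self):
        # Extend the base config with the constructor argument we added
        config = super().get_config()
        config.update({'output_dim': self.output_dim})
        return config


layer = MyLayer(20, name='my_layer')
config = layer.get_config()              # plain Python dict
restored = MyLayer.from_config(config)   # new layer with the same settings
outputs = restored(tf.zeros((4, 8)))     # weights are created on first call
```

Note that the config captures only the constructor arguments, not the weight values, so `restored` starts with freshly initialized weights.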

# Converting from 1.x to 2.x
TensorFlow 1.x scripts will not work directly with TensorFlow 2.x; they need to be
converted first. The first step in converting from 1.x to 2.x is to run the automatic conversion
script installed with 2.x. For a single file, you can run it with:

tf_upgrade_v2 --infile tensorfoo.py --outfile tensorfoo-upgraded.py

For multiple files in a directory, the syntax is:

tf_upgrade_v2 --intree incode --outtree code-upgraded

The script will try to upgrade automatically to 2.x and will print error messages
where it is not able to upgrade.
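
To give a feel for what changes between the two styles, here is a hedged sketch contrasting 1.x-style graph-and-session code (still reachable in 2.x through the `tf.compat.v1` module) with a native 2.x equivalent. The toy computation and the names `doubled` and `double` are assumptions for illustration; this is not actual output of the conversion script.

```python
import tensorflow as tf

# 1.x style: define a static graph first, then run it inside a session
graph = tf.Graph()
with graph.as_default():
    x = tf.compat.v1.placeholder(tf.float32, shape=())
    doubled = x * 2.0
with tf.compat.v1.Session(graph=graph) as sess:
    v1_result = sess.run(doubled, feed_dict={x: 3.0})


# 2.x style: an ordinary Python function, executed eagerly
def double(value):
    return value * 2.0


v2_result = double(tf.constant(3.0)).numpy()
```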

# Using TensorFlow 2.x effectively
2.x native code should follow a number of best practices:

1. Default to higher-level APIs such as tf.keras (or, in certain situations, Estimators) and avoid lower-level APIs with direct computational graph manipulation unless needed for custom operations. So, in general, no tf.Session and no tf.Session.run.
2. Add a tf.function decorator to make code run efficiently in graph mode with AutoGraph. Only use tf.function to decorate high-level computations; all functions invoked by high-level computations are automatically annotated on your behalf. In this way, you get the best of both worlds: high-level APIs with eager support, and the efficiency of computational graphs.
3. Use Python objects to track variables and losses. So, be Pythonic and use tf.Variable instead of tf.get_variable. In this way, variables will be treated with the normal Python scope.
4. Use tf.data datasets for data inputs and provide these objects directly to tf.keras.Model.fit. In this way, you will have a collection of high-performance classes for manipulating data and will adopt the best way to stream training data from disk.
5. Use tf.keras.layers modules to combine predefined "lego bricks" whenever possible, either with the Sequential or Functional APIs, or with subclassing. Use Estimators if you need production-ready models, in particular if these models need to scale across multiple GPUs, CPUs, or servers. When needed, consider converting a tf.keras model into an Estimator.
6. Consider using a distribution strategy across GPUs, CPUs, and multiple servers. With tf.keras it is easy.
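
The recommendation about data inputs can be sketched as follows. This is a minimal example under stated assumptions: the data is synthetic, and the model size, batch size, and epoch count are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (an assumption for illustration only)
rng = np.random.default_rng(0)
features = rng.normal(size=(256, 8)).astype('float32')
labels = (features.sum(axis=1, keepdims=True) > 0).astype('float32')

# Build an input pipeline: shuffle, batch, and prefetch
dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(buffer_size=256)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# The dataset is passed directly to fit; it already yields (x, y) batches
history = model.fit(dataset, epochs=2, verbose=0)
```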

# Callbacks

Callbacks are objects passed to a model to extend or modify behaviors during
training. There are a few useful callbacks that are commonly used in tf.keras:
1. **tf.keras.callbacks.ModelCheckpoint**: This feature is used to save checkpoints of your model at regular intervals and recover in case of problems.
2. **tf.keras.callbacks.LearningRateScheduler**: This feature is used to dynamically change the learning rate during optimization.
3. **tf.keras.callbacks.EarlyStopping**: This feature is used to interrupt training when validation performance has stopped improving after a while.
4. **tf.keras.callbacks.TensorBoard**: This feature is used to monitor the model's behavior using TensorBoard.
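
The four callbacks can be combined in a single `fit` call. The sketch below uses synthetic regression data, and the schedule function `decay`, the temporary directory, and the file name `ckpt.weights.h5` are hypothetical choices made for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4)).astype('float32')
targets = features.sum(axis=1, keepdims=True).astype('float32')

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='sgd', loss='mse')

work_dir = tempfile.mkdtemp()
checkpoint_path = os.path.join(work_dir, 'ckpt.weights.h5')


def decay(epoch, lr):
    # Keep the rate for two epochs, then shrink it by 10% per epoch
    return float(lr) if epoch < 2 else float(lr) * 0.9


callbacks = [
    # Save the weights at the end of every epoch
    tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                       save_weights_only=True),
    tf.keras.callbacks.LearningRateScheduler(decay),
    # Stop if val_loss has not improved for 2 consecutive epochs
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2),
    tf.keras.callbacks.TensorBoard(log_dir=os.path.join(work_dir, 'logs')),
]

history = model.fit(features, targets, validation_split=0.25,
                    epochs=5, callbacks=callbacks, verbose=0)
```

The logs written by the TensorBoard callback can then be inspected by pointing TensorBoard at the `logs` directory.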

# Saving a model and weights
After training a model, it can be useful to save the weights in a persistent way. This
is easily achieved with the following code fragment, which saves either to TensorFlow's
internal checkpoint format or to an HDF5 file:

```python
# Save weights to a TensorFlow Checkpoint file
model.save_weights('./weights/my_model')

# Save weights to an HDF5 file
model.save_weights('my_model.h5', save_format='h5')


```
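
Saved weights are restored with `load_weights` on a model that has the same architecture. The following round-trip is a minimal sketch: the helper `make_model`, the layer names, and the temporary path are hypothetical, and the `.weights.h5` suffix is used because recent Keras releases expect weight files to end that way.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def make_model():
    # Hypothetical small model; both copies must share this architecture
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(3, activation='relu', name='hidden'),
        tf.keras.layers.Dense(1, name='output'),
    ])


model = make_model()
weights_path = os.path.join(tempfile.mkdtemp(), 'my_model.weights.h5')
model.save_weights(weights_path)

# A fresh model starts with different random weights until we load the file
clone = make_model()
clone.load_weights(weights_path)

sample = np.ones((2, 4), dtype='float32')
same_output = np.allclose(model.predict(sample, verbose=0),
                          clone.predict(sample, verbose=0))
```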

# Training from tf.data.datasets
Another benefit of using TensorFlow 2.x is the introduction of TensorFlow datasets
as a principled mechanism to deal with heterogeneous (large) datasets in different
categories such as audio, image, video, text, and translation. Let's first use pip to
install tensorflow-datasets:


In [9]:
# pip install tensorflow-datasets

import tensorflow as tf
import tensorflow_datasets as tfds
# See all registered datasets
builders = tfds.list_builders()
print(builders)
# Load a given dataset by name, along with the DatasetInfo metadata
data, info = tfds.load("mnist", with_info=True)
train_data, test_data = data['train'], data['test']
print(info)
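
Objects returned by `tfds.load` are ordinary `tf.data.Dataset` instances, so the usual `map`, `shuffle`, and `batch` transformations apply. The sketch below avoids any download by building a synthetic dataset whose elements mimic the dictionary layout (`image` and `label` keys) that MNIST has when loaded without `as_supervised=True`; the random pixel values and the helper `to_supervised` are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in with MNIST-like shapes and dtypes
images = np.random.randint(0, 256, size=(100, 28, 28, 1), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(100,), dtype=np.int64)
train_data = tf.data.Dataset.from_tensor_slices(
    {'image': images, 'label': labels})


def to_supervised(example):
    # Scale pixels to [0, 1] and return an (input, target) pair
    image = tf.cast(example['image'], tf.float32) / 255.0
    return image, example['label']


batches = train_data.map(to_supervised).shuffle(100).batch(32)
first_images, first_labels = next(iter(batches))
```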

# tf.keras or Estimators?
In addition to the direct graph computation and to the tf.keras higher-level APIs,
TensorFlow 1.x and 2.x have an additional set of higher-level APIs called Estimators.
With Estimators, you do not need to worry about creating computational graphs or
handling sessions, since Estimators deal with this on your behalf, in a similar way
to tf.keras.

But what are Estimators? Put simply, they are another way to build or to use prebuilt
bricks. A longer answer is that they are highly efficient learning models for large-scale,
production-ready environments, which can be trained on single machines
or on distributed multi-servers, and they can run on CPUs, GPUs, or TPUs without
recoding your model. These models include Linear Classifiers, Deep Learning
Classifiers, Gradient Boosted Trees, and many more, which will be discussed in
the upcoming chapters.

In [8]:
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()

import os, json

BUFFER_SIZE = 10000
BATCH_SIZE = 64

def input_fn(mode, input_context=None):
  datasets, info = tfds.load(name='mnist',
                                with_info=True,
                                as_supervised=True)
  mnist_dataset = (datasets['train'] if mode == tf.estimator.ModeKeys.TRAIN else
                   datasets['test'])

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

  if input_context:
    mnist_dataset = mnist_dataset.shard(input_context.num_input_pipelines,
                                        input_context.input_pipeline_id)
  return mnist_dataset.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

LEARNING_RATE = 1e-4
def model_fn(features, labels, mode):
  model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(10)
  ])
  logits = model(features, training=False)

  if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {'logits': logits}
    # EstimatorSpec takes the mode as its first argument; it has no `labels` parameter
    return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  optimizer = tf.compat.v1.train.GradientDescentOptimizer(
      learning_rate=LEARNING_RATE)
  loss = tf.keras.losses.SparseCategoricalCrossentropy(
      from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(labels, logits)
  loss = tf.reduce_sum(loss) * (1. / BATCH_SIZE)
  if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(mode, loss=loss)

  return tf.estimator.EstimatorSpec(
      mode=mode,
      loss=loss,
      train_op=optimizer.minimize(
          loss, tf.compat.v1.train.get_or_create_global_step()))

strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

config = tf.estimator.RunConfig(train_distribute=strategy)

classifier = tf.estimator.Estimator(
    model_fn=model_fn, model_dir='/tmp/multiworker', config=config)
tf.estimator.train_and_evaluate(
    classifier,
    train_spec=tf.estimator.TrainSpec(input_fn=input_fn),
    eval_spec=tf.estimator.EvalSpec(input_fn=input_fn)
)

INFO:tensorflow:Single-worker CollectiveAllReduceStrategy with local_devices = ('/device:GPU:0',), communication = CollectiveCommunication.AUTO
INFO:tensorflow:Initializing RunConfig with distribution strategies.
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/multiworker', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': <tensorflow.python.distribute.collective_all_reduce_strategy.CollectiveAllReduceStrategy object at 0x000001C2E899AA48>, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_servi

Instructions for updating:
If using Keras pass *_constraint arguments to layers.


INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/multiworker\model.ckpt.

INFO:tensorflow:loss = 2.3064291, step = 0

INFO:tensorflow:global_step/sec: 155.759

INFO:tensorflow:loss = 2.288177, step = 100 (0.643 sec)

INFO:tensorflow:global_step/sec: 313.478

INFO:tensorflow:loss = 2.2941022, step = 200 (0.319 sec)

INFO:tensorflow:global_step/sec: 309.599

INFO:tensorflow:loss = 2.2867157, step = 300 (0.323 sec)

INFO:tensorflow:global_step/sec: 308.641

INFO:tensorflow:loss = 2.287331, step = 400 (0.325 sec)

INFO:tensorflow:global_step/sec: 296.736

INFO:tensorflow:loss = 2.277947, step = 500 (0.336 sec)

INFO:tensorflow:global_step/sec: 289.855

INFO:tensorflow:loss = 2.299162, step = 600 (0.346 sec)

INFO:tensorflow:global_step/sec: 287.356

INFO:tensorflow:loss = 2.2868636, step = 700 (0.347 sec)

INFO:tensorflow:global_step/sec: 324.675

INFO:tensorflow:loss = 2.300441, step = 800 (0.309 sec)

INFO:tensorflow:global_step/sec: 763.354

INFO:tensorflow:loss = 2.2791195, step = 900 (0.130 sec)

INFO:tensorflow:Saving checkpoints for 938 into /tmp/multiworker\model.ckpt.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2021-09-30T11:49:56Z

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from /tmp/multiworker\model.ckpt-938

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Evaluation [10/100]

INFO:tensorflow:Evaluation [20/100]

INFO:tensorflow:Evaluation [30/100]

INFO:tensorflow:Evaluation [40/100]

INFO:tensorflow:Evaluation [50/100]

INFO:tensorflow:Evaluation [60/100]

INFO:tensorflow:Evaluation [70/100]

INFO:tensorflow:Evaluation [80/100]

INFO:tensorflow:Evaluation [90/100]

INFO:tensorflow:Evaluation [100/100]

INFO:tensorflow:Finished evaluation at 2021-09-30-11:49:56

INFO:tensorflow:Saving dict for global step 938: global_step = 938, loss = 2.2746527

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 938: /tmp/multiworker\model.ckpt-938

INFO:tensorflow:Loss for final step: 1.1391464.


({'loss': 2.2746527, 'global_step': 938}, [])

# Ragged tensors
Continuing our discussion on the benefits of TensorFlow 2.x, we should note that
TensorFlow 2.x added support for "ragged" tensors, a special type of tensor whose
dimensions can have non-uniform lengths. This is particularly useful for dealing
with sequences and other data whose dimensions can change across
batches, such as text sentences and hierarchical data. Note that ragged tensors are
more efficient than padding a tf.Tensor, since no time or space is wasted:

    ragged = tf.ragged.constant([[1, 2, 3], [3, 4], [5, 6, 7, 8]])
    # ==> <tf.RaggedTensor [[1, 2, 3], [3, 4], [5, 6, 7, 8]]>
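
A few common operations on the same ragged tensor are sketched below: querying the per-row lengths, reducing along the ragged dimension, and, for comparison, converting to a padded dense tensor. The choice of 0 as the padding value is an assumption for illustration.

```python
import tensorflow as tf

ragged = tf.ragged.constant([[1, 2, 3], [3, 4], [5, 6, 7, 8]])

# Number of elements in each row
row_lengths = ragged.row_lengths()

# Reductions work directly on the ragged dimension, with no padding involved
row_sums = tf.reduce_sum(ragged, axis=1)

# For comparison: the dense, zero-padded equivalent has shape (3, 4)
padded = ragged.to_tensor(default_value=0)
```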