
# 3.6 Anatomy of a Neural Network: Understanding Core Keras APIs

## 3.6.1 Layers: The Building Blocks of Deep Learning

The fundamental data structure in NNs is the __layer__. A layer is a data processing module that takes as input one or more tensors and that outputs one or more tensors. 

Some layers are __stateless__, but more frequently __layers have a state__: the layer’s __weights__, one or several tensors learned with stochastic gradient descent, which together contain the __network’s knowledge__.
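To make the distinction concrete, here is a minimal sketch that compares a layer with no weights to one that has them. The choice of `Flatten` and `Dense`, and the toy input shape, are illustrative assumptions rather than examples from the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy input: 2 samples, each a 3x4 grid of ones (shape chosen for illustration).
inputs = tf.ones(shape=(2, 3, 4))

# Flatten only reshapes its input, so it has nothing to learn.
flatten = layers.Flatten()
flat = flatten(inputs)
print(flat.shape, len(flatten.weights))

# Dense holds a kernel matrix and a bias vector, created on first use.
dense = layers.Dense(5)
out = dense(flat)
print(out.shape, [tuple(w.shape) for w in dense.weights])
```

The stateless layer reports an empty weight list, while the dense layer reports two tensors whose shapes follow from the input it was first called on.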

Different types of layers are appropriate for different tensor formats and different types of data processing. 

For instance, __simple vector data__, stored in rank-2 tensors of shape `(samples, features)`, is often processed by densely connected layers, also called fully connected or __dense layers__ (the `Dense` class in Keras). 

__Sequence data__, stored in rank-3 tensors of shape `(samples, timesteps, features)`, is typically processed by __recurrent layers__, such as an `LSTM` layer, or __1D convolution layers__ (`Conv1D`). 

__Image data__, stored in rank-4 tensors, is usually processed by __2D convolution layers__ (`Conv2D`).
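The pairing of tensor rank and layer type can be sketched as follows. The concrete sizes are arbitrary, and the image layout `(samples, height, width, channels)` is an assumption based on the Keras default (channels last), since the text does not name the four axes.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Rank-2: (samples, features)
vectors = tf.ones(shape=(4, 16))
# Rank-3: (samples, timesteps, features)
sequences = tf.ones(shape=(4, 10, 16))
# Rank-4: (samples, height, width, channels), assuming the channels-last default
images = tf.ones(shape=(4, 28, 28, 3))

dense_out = layers.Dense(8)(vectors)
lstm_out = layers.LSTM(8)(sequences)
conv1d_out = layers.Conv1D(filters=8, kernel_size=3)(sequences)
conv2d_out = layers.Conv2D(filters=8, kernel_size=3)(images)

for name, tensor in [("Dense", dense_out), ("LSTM", lstm_out),
                     ("Conv1D", conv1d_out), ("Conv2D", conv2d_out)]:
    print(name, tensor.shape)
```

With the default "valid" padding, each convolution shortens the axes it slides over by `kernel_size - 1`, which is why the convolved axes come out smaller than they went in.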

You can think of layers as the __LEGO bricks of deep learning__, a metaphor that is made explicit by Keras. Building deep learning models in Keras is done by __clipping together compatible layers to form useful data-transformation pipelines__.

### The Base Layer Class in Keras

A simple API should have a single abstraction around which everything is centered. In Keras, that’s the `Layer` class. Everything in Keras is either a `Layer` or something that closely interacts with a `Layer`.

A `Layer` is an object that encapsulates some state (weights) and some computation (a forward pass). The weights are typically defined in a `build()` method (although they could also be created in the constructor, `__init__()`), and the computation is defined in the `call()` method.
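For contrast, the constructor-based alternative mentioned above might look like the sketch below. The class name `FixedInputDense` and its `input_dim` argument are hypothetical, introduced only for illustration; the trade-off is that the input size must be known when the layer is created.

```python
import tensorflow as tf
from tensorflow import keras

class FixedInputDense(keras.layers.Layer):
    """Hypothetical layer that creates its weights in the constructor."""

    def __init__(self, input_dim, units):
        super().__init__()
        # The input size must be passed explicitly, because no input
        # has been seen yet at construction time.
        self.W = self.add_weight(shape=(input_dim, units),
                                 initializer="random_normal")
        self.b = self.add_weight(shape=(units,), initializer="zeros")

    def call(self, inputs):
        return tf.matmul(inputs, self.W) + self.b

layer = FixedInputDense(input_dim=784, units=32)
weights_before_call = len(layer.weights)  # weights already exist here
output = layer(tf.ones(shape=(2, 784)))
print(weights_before_call, output.shape)
```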

In the previous chapter, we implemented a `NaiveDense` class that contained two weights `W` and `b` and applied the computation `output = activation(dot(input, W) + b)`. This is what the same layer would look like in Keras.

In [2]:
import tensorflow as tf
from tensorflow import keras

# All Keras layers inherit from the base Layer class
class SimpleDense(keras.layers.Layer):
  def __init__(self, units, activation=None):
    super().__init__()
    self.units = units
    # Accept either a name such as "relu" or a callable such as tf.nn.relu
    self.activation = keras.activations.get(activation)

  # Weight creation takes place in the build() method
  def build(self, input_shape):
    input_dim = input_shape[-1]
    # add_weight() is a shortcut method for creating weights
    # It is also possible to create standalone variables and assign them as
    # layer attributes, like self.W = tf.Variable(tf.random.uniform(w_shape))
    self.W = self.add_weight(shape=(input_dim, self.units),
                             initializer="random_normal")
    self.b = self.add_weight(shape=(self.units,),
                             initializer="zeros")

  # The forward pass is defined in the call() method
  def call(self, inputs):
    y = tf.matmul(inputs, self.W) + self.b
    if self.activation is not None:
      y = self.activation(y)
    return y

Once instantiated, a layer like this can be used just like a function, taking as input a TensorFlow tensor.

In [3]:
import tensorflow as tf

# instantiate Layer
my_dense = SimpleDense(units=32, activation=tf.nn.relu)
# create some test inputs
input_tensor = tf.ones(shape=(2, 784))
# call the layer on the inputs, just like a function
output_tensor = my_dense(input_tensor)
print(output_tensor.shape)

(2, 32)


You’re probably wondering, why did we have to implement `call()` and `build()`, since we ended up using our layer by plainly calling it, that is to say, by using its `__call__()` method? 

It’s because we want to be able to create the state just in time. Let’s see how that works.

### Automatic Shape Inference: Building Layers on the Fly

Just like with LEGO bricks, __you can only “clip” together layers that are compatible__. The notion of layer compatibility here refers specifically to the fact that __every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape__.

In [6]:
from tensorflow.keras import layers

layer = layers.Dense(32, activation="relu")

This layer will return a tensor whose last dimension (the feature axis) has been transformed to be 32. It can only be connected to a downstream layer that expects 32-dimensional vectors as its input.
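A short sketch of what compatibility means in practice: `Dense` acts on the last axis of its input, and once it has been built for a given feature size it rejects inputs with a different one. The input shapes below are arbitrary choices for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

layer = layers.Dense(32, activation="relu")

# First call: the layer is built for inputs with 784 features.
vector_out = layer(tf.ones(shape=(2, 784)))
print(vector_out.shape)

# Extra leading axes are fine as long as the last axis is still 784.
sequence_out = layer(tf.ones(shape=(2, 10, 784)))
print(sequence_out.shape)

# A different feature size no longer matches the weights that were created.
rejected = False
try:
    layer(tf.ones(shape=(2, 100)))
except ValueError:
    rejected = True
print("Input with 100 features rejected:", rejected)
```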

When using Keras, you don’t have to worry about size compatibility most of the time, because __the layers you add to your models are dynamically built to match the shape of the incoming layer__.

In [7]:
from tensorflow.keras import models

model = models.Sequential([
    layers.Dense(32, activation="relu"),
    layers.Dense(32)
])

The layers didn’t receive any information about the shape of their inputs. Instead, they __automatically infer their input shape as being the shape of the first inputs they see__.
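One way to observe this inference is to check each layer's `built` flag before and after the model sees data. This is a minimal sketch; the input shape `(2, 784)` is an assumption chosen for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(32, activation="relu"),
    layers.Dense(32),
])

# No input has been seen yet, so no weights exist.
built_before = [layer.built for layer in model.layers]

# The first call fixes every layer's input shape and creates the weights.
outputs = model(tf.ones(shape=(2, 784)))

built_after = [layer.built for layer in model.layers]
kernel_shapes = [tuple(layer.kernel.shape) for layer in model.layers]
print(built_before, built_after, kernel_shapes)
```

The second layer's kernel has 32 rows because it inferred its input size from the first layer's output, not from anything passed to its constructor.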

In the toy version of the Dense layer we implemented in chapter 2 (which we named NaiveDense), we had to pass the layer’s input size explicitly to the constructor in order to be able to create its weights. 

That’s not ideal, because it would lead to models that look like this, where each new layer needs to be made aware of the shape of the layer before it.

In [None]:
# model = NaiveSequential([
#     NaiveDense(input_size=784, output_size=32, activation="relu"),
#     NaiveDense(input_size=32, output_size=64, activation="relu"),
#     NaiveDense(input_size=64, output_size=32, activation="relu"),
#     NaiveDense(input_size=32, output_size=10, activation="softmax")
# ])

It would be even worse if the rules used by a layer to produce its output shape are complex. For instance, what if our layer returned outputs of shape `(batch, input_size * 2 if input_size % 2 == 0 else input_size * 3)`?
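With a `build()` method, such a rule can live inside the layer itself, so downstream layers never need to know it. The sketch below uses a hypothetical `ParityDense` layer (the name and the linear computation are assumptions made for illustration) that applies the parity rule from the text.

```python
import tensorflow as tf
from tensorflow import keras

class ParityDense(keras.layers.Layer):
    """Hypothetical layer whose output size depends on the input size's parity."""

    def build(self, input_shape):
        input_size = input_shape[-1]
        # Rule from the text: double even input sizes, triple odd ones.
        output_size = input_size * 2 if input_size % 2 == 0 else input_size * 3
        self.W = self.add_weight(shape=(input_size, output_size),
                                 initializer="random_normal")
        self.b = self.add_weight(shape=(output_size,), initializer="zeros")

    def call(self, inputs):
        return tf.matmul(inputs, self.W) + self.b

# A fresh layer per input size, since the weights are fixed after the first call.
even_out = ParityDense()(tf.ones(shape=(2, 4)))
odd_out = ParityDense()(tf.ones(shape=(2, 5)))
print(even_out.shape, odd_out.shape)
```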

If we were to reimplement our `NaiveDense` layer as a Keras layer capable of automatic shape inference, it would look like the previous `SimpleDense` layer, with its `build()` and `call()` methods.

In `SimpleDense`, we no longer create weights in the constructor like in the `NaiveDense` example; instead, we create them in a dedicated state-creation method, `build()`, which receives as an argument the first input shape seen by the layer. 

The `build()` method is called automatically the first time the layer is called (via its `__call__()` method). In fact, that’s why we defined the computation in a separate `call()` method rather than in the `__call__()` method directly. The `__call__()` method of the base layer schematically looks like this:

In [9]:
def __call__(self, inputs):
  if not self.built:
    self.build(inputs.shape)
    self.built = True
  return self.call(inputs)
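The schematic method above can be exercised without Keras at all. The following toy sketch uses NumPy and two hypothetical classes, `LazyLayer` and `LazyDense`, which are stand-ins written for illustration and not the real Keras implementation. A counter confirms that `build()` runs only on the first call.

```python
import numpy as np

class LazyLayer:
    """Toy stand-in for the base layer (hypothetical, not the Keras class)."""

    def __init__(self):
        self.built = False

    def build(self, input_shape):
        pass  # subclasses create their state here

    def call(self, inputs):
        raise NotImplementedError

    def __call__(self, inputs):
        # Create state just in time, using the shape of the first input.
        if not self.built:
            self.build(inputs.shape)
            self.built = True
        return self.call(inputs)

class LazyDense(LazyLayer):
    def __init__(self, units):
        super().__init__()
        self.units = units
        self.build_calls = 0  # bookkeeping for this demonstration only

    def build(self, input_shape):
        self.build_calls += 1
        rng = np.random.default_rng(0)
        self.W = rng.normal(scale=0.05, size=(input_shape[-1], self.units))
        self.b = np.zeros(self.units)

    def call(self, inputs):
        return inputs @ self.W + self.b

layer = LazyDense(units=32)
first = layer(np.ones((2, 784)))   # triggers build()
second = layer(np.ones((5, 784)))  # reuses the existing weights
print(first.shape, second.shape, layer.build_calls)
```

Keeping the forward pass in `call()` is what lets the base class wrap it with this kind of bookkeeping in `__call__()`.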

With automatic shape inference, our previous example becomes simple and neat.

In [10]:
model = keras.Sequential([
    SimpleDense(32, activation="relu"),
    SimpleDense(64, activation="relu"),
    SimpleDense(32, activation="relu"),
    SimpleDense(10, activation="softmax")
])

Note that __automatic shape inference__ is not the only thing that the `Layer` class’s `__call__()` method handles. It takes care of many more things, in particular __routing between eager and graph execution__ (a concept you’ll learn about in chapter 7), and __input masking__ (which we’ll cover in chapter 11). 

For now, just remember: __when implementing your own layers, put the forward pass in the `call()` method__.

## 3.6.2 From Layers to Models