### 3.6.1 Layers: The building blocks of deep learning

A layer is a data processing module that takes as input one or more tensors and that outputs one or more tensors. Some layers are stateless, but more frequently layers have a state: the layer’s weights, one or several tensors learned with stochastic gradient descent, which together contain the network’s knowledge.

### A dense layer implemented as a layer subclass

In [1]:
import tensorflow as tf
from tensorflow import keras

class SimpleDense(keras.layers.Layer):  # all Keras layers inherit from the base Layer class
    def __init__(self, units, activation=None):
        super().__init__()
        self.units = units
        self.activation = activation

    def build(self, input_shape):  # weight creation takes place in build()
        input_dim = input_shape[-1]

        # add_weight() is a Keras shortcut for creating weights. You could also
        # create standalone variables and assign them as layer attributes,
        # e.g. self.W = tf.Variable(tf.random.uniform(w_shape)).
        self.W = self.add_weight(shape=(input_dim, self.units),
                                initializer="random_normal")
        self.b = self.add_weight(shape=(self.units,),
                                initializer="zeros")

    def call(self, inputs):
        y = tf.matmul(inputs, self.W) + self.b
        if self.activation is not None:
            y = self.activation(y)
        return y

Once instantiated, a layer like this can be used just like a function, taking as input a TensorFlow tensor:

In [4]:
# Instantiate the layer
my_dense = SimpleDense(units=32, activation=tf.nn.relu)
input_tensor = tf.ones(shape=(2, 784))
output_tensor = my_dense(input_tensor)
print(output_tensor.shape)

(2, 32)


Every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape. Consider the following example:

In [None]:
from keras import layers
layer = layers.Dense(32, activation="relu")


This layer will return a tensor whose last dimension has been transformed to be 32 (the batch dimension is left unchanged). It can only be connected to a downstream layer that expects 32-dimensional vectors as its input.

In [None]:
from keras import models, layers

model = models.Sequential([
    layers.Dense(32, activation="relu"),
    layers.Dense(32)
])

The layers didn’t receive any information about the shape of their inputs—instead, they automatically inferred their input shape as being the shape of the first inputs they see. 
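To make the inference concrete, here is a minimal pure-Python sketch of the shape bookkeeping involved. The helper `infer_kernel_shapes` is hypothetical (it is not a Keras function), and it assumes every layer is a Dense-style layer whose kernel has shape `(input_dim, units)`, as in `SimpleDense` above.

```python
def infer_kernel_shapes(first_input_shape, units_per_layer):
    """Return the kernel shape each Dense-style layer would create.

    Hypothetical helper, not part of Keras: it mimics how each layer reads
    the last axis of the first input it sees and sizes its kernel from it.
    """
    kernel_shapes = []
    input_dim = first_input_shape[-1]  # feature axis of the incoming tensor
    for units in units_per_layer:
        kernel_shapes.append((input_dim, units))
        input_dim = units  # this layer's output feeds the next layer
    return kernel_shapes


# Two 32-unit layers whose first input is a batch of 784-dimensional vectors.
shapes = infer_kernel_shapes((2, 784), [32, 32])
print(shapes)  # [(784, 32), (32, 32)]
```

Only the shape of the first input is needed; every later layer derives its input size from the layer before it.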

In the toy version of the Dense layer we implemented in chapter 2 (which we named NaiveDense), we had to pass the layer’s input size explicitly to the constructor in order to be able to create its weights. That’s not ideal, because it would lead to models that look like this, where each new layer needs to be made aware of the shape of the layer before it:

In [None]:
model = NaiveSequential([
    NaiveDense(input_size=784, output_size=32, activation="relu"),
    NaiveDense(input_size=32, output_size=64, activation="relu"),
    NaiveDense(input_size=64, output_size=32, activation="relu"),
    NaiveDense(input_size=32, output_size=10, activation="softmax")
])

For now, just remember: when implementing your own layers, put the forward pass in the call() method.
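The reason the forward pass goes in call() is that the base class's `__call__()` does some work first, including building the weights the first time the layer is used. The following is a rough pure-Python sketch of that mechanism, not the actual Keras source. The class names `LazyLayer` and `LazyDense` are hypothetical, inputs are plain lists of rows rather than tensors, and the all-zero kernel is an assumption made to keep the example deterministic.

```python
class LazyLayer:
    """Schematic base class; a simplification, not the real keras.layers.Layer."""

    def __init__(self):
        self.built = False

    def build(self, input_shape):
        # Subclasses create their weights here, once the input shape is known.
        pass

    def call(self, inputs):
        raise NotImplementedError

    def __call__(self, inputs):
        if not self.built:
            # Inputs are lists of rows here, so the shape is (rows, columns).
            self.build((len(inputs), len(inputs[0])))
            self.built = True
        return self.call(inputs)


class LazyDense(LazyLayer):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        input_dim = input_shape[-1]
        # All-zero weights keep the sketch deterministic; real layers use random init.
        self.W = [[0.0] * self.units for _ in range(input_dim)]
        self.b = [0.0] * self.units

    def call(self, inputs):
        # Plain-Python equivalent of inputs @ W + b.
        return [
            [sum(x * self.W[i][j] for i, x in enumerate(row)) + self.b[j]
             for j in range(self.units)]
            for row in inputs
        ]


layer = LazyDense(units=3)
print(layer.built)  # False: no weights exist yet
outputs = layer([[1.0, 2.0], [3.0, 4.0]])
print(layer.built, len(layer.W), len(layer.W[0]))  # True 2 3
```

The constructor never sees an input size. The kernel is sized inside build(), which runs exactly once, on the first call.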

A linear stack of layers is only one possible network topology. Other common ones include:

- Two-branch networks
- Multihead networks
- Residual connections

There are generally two ways of building such models in Keras: you could directly subclass the Model class, or you could use the Functional API, which lets you do more with less code. We’ll cover both approaches in chapter 7.

By choosing a network topology, you constrain your space of possibilities (hypothesis space) to a specific series of tensor operations, mapping input data to output data. What you’ll then be searching for is a good set of values for the weight tensors involved in these tensor operations.

To learn from data, you have to make assumptions about it. These assumptions define what can be learned. As such, the structure of your hypothesis space—the architecture of your model—is extremely important. It encodes the assumptions you make about your problem, the prior knowledge that the model starts with.

Example: in two-class classification, choosing a linear model encodes the assumption that the classes are linearly separable.
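A small pure-Python experiment can illustrate how the choice of hypothesis space limits what can be learned. The sketch below searches for a single linear unit that reproduces the AND and XOR truth tables. The function name `linearly_separable` and the weight grid are assumptions made for this illustration; nothing here comes from Keras.

```python
from itertools import product


def linearly_separable(points, labels, grid):
    """Brute-force search for w1, w2, b such that (w1*x1 + w2*x2 + b > 0) matches labels.

    Only values in `grid` are tried, so True proves separability, while
    False only means no separator was found on this grid.
    """
    for w1, w2, b in product(grid, repeat=3):
        predictions = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2 in points]
        if predictions == labels:
            return True
    return False


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

and_labels = [0, 0, 0, 1]
xor_labels = [0, 1, 1, 0]

print(linearly_separable(points, and_labels, grid))  # True
print(linearly_separable(points, xor_labels, grid))  # False
```

For XOR the failure is not an artifact of the grid: no line separates the two classes, so no amount of training would make a purely linear model fit it. The architecture has to change.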

### 3.6.3 The “compile” step: Configuring the learning process

Once the model architecture is defined, you still have to choose three more things:

- Loss function (objective function): The quantity that will be minimized during training. It represents a measure of success for the task at hand.

- Optimizer: Determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).

- Metrics: The measures of success you want to monitor during training and validation, such as classification accuracy. Unlike the loss, training will not optimize directly for these metrics. As such, metrics don’t need to be differentiable.
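The three roles above can be seen side by side in a small pure-Python sketch. It is only an illustration under stated assumptions: plain full-batch gradient descent stands in for the optimizer (not RMSprop, and not stochastic), the model is a one-feature linear model, and the function names `mse`, `accuracy`, and `sgd_step` are hypothetical.

```python
def mse(predictions, targets):
    """Loss: differentiable, and the quantity gradient descent minimizes."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def accuracy(predictions, targets, threshold=0.5):
    """Metric: thresholding makes it non-differentiable, so it is only monitored."""
    hits = sum((p > threshold) == (t == 1) for p, t in zip(predictions, targets))
    return hits / len(targets)


def sgd_step(w, b, xs, ys, learning_rate):
    """Optimizer: one plain gradient-descent update of w and b on the MSE loss."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]
w, b = 0.0, 0.0

initial_loss = mse([w * x + b for x in xs], ys)
for step in range(200):
    w, b = sgd_step(w, b, xs, ys, learning_rate=0.05)

predictions = [w * x + b for x in xs]
final_loss = mse(predictions, ys)
final_accuracy = accuracy(predictions, ys)
print(initial_loss, final_loss, final_accuracy)
```

Only `mse` ever contributes gradients. `accuracy` is computed from the same predictions but plays no part in the weight updates.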

### compile & fit

The compile() method configures the training process:

In [None]:
model = keras.Sequential([keras.layers.Dense(1)])  # a linear classifier
# Specify the optimizer, the loss, and a list of metrics.
model.compile(optimizer="rmsprop",
              loss="mean_squared_error",
              metrics=["accuracy"])

It’s also possible to specify these arguments as object instances. This is useful if you want to pass your own custom losses or metrics, or if you want to further configure the objects you’re using, for instance by passing a learning_rate argument to the optimizer:

In [None]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
              loss="mean_squared_error",
              metrics=["accuracy"])