**PART 1**

**Foundations of TensorFlow 2 and deep learning**

---

**CHAPTER 2 - TensorFlow**

---

**2.1 First steps with TensorFlow 2**

In this section, the focus is on implementing a Multilayer Perceptron (MLP) using TensorFlow 2. An MLP is a simple type of fully connected neural network with the following components:
* Input layer: Takes in the data.
* Hidden layers: Layers between the input and output, where computations are performed.
* Output layer: Produces the final output.

Each layer in the MLP has weights and biases that are used to compute the outputs of that layer. For example, consider a network with:
* An input size of 4,
* A hidden layer with 3 nodes,
* An output layer with 2 nodes.

The network uses the weights and biases of these layers to calculate the final prediction for a given input.

This section emphasizes getting familiar with TensorFlow 2 for building and training MLPs.

![Figure2-1.jpg](./02.Chapter-02/Figure2-1.jpg)

The input values (x) are transformed to hidden values (h) using the following computation:

![Eq2-1.jpg](./02.Chapter-02/Eq2-1.jpg)

where σ is the sigmoid function. The sigmoid function is a simple nonlinear elementwise transformation, as shown in figure 2.2.

![Figure2-2.jpg](./02.Chapter-02/Figure2-2.jpg)
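
The elementwise behavior of the sigmoid can be checked numerically. The following is a minimal sketch in NumPy, assuming the standard logistic form σ(z) = 1 / (1 + e^(-z)); the helper name `sigmoid` and the sample values are illustrative and not part of the original listing.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# One row with three values, transformed independently of each other
z = np.array([[-4.0, 0.0, 4.0]], dtype="float32")
h = sigmoid(z)

print(h.shape)  # the shape is unchanged: (1, 3)
print(h)        # every value lies strictly between 0 and 1; sigmoid(0) is 0.5
```

Because the transformation is applied to each element separately, the output always has the same shape as the input.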

x is a matrix of size 1 × 4 (i.e., one row and four columns), W1 is a matrix of size 4 × 3 (i.e., four rows and three columns), and b1 is 1 × 3 (i.e., one row and three columns). This gives an h of size 1 × 3. Finally, the output is computed as

![Eq2-2.jpg](./02.Chapter-02/Eq2-2.jpg)

Here, W2 is a 3 × 2 matrix, and b2 is a 1 × 2 matrix. Softmax activation normalizes the linear scores of the last layer (i.e., h W2 + b2) to actual probabilities (i.e., the values in each row sum to 1). Assuming an input vector x of length K, the softmax activation produces a K-long vector y. The i<sup>th</sup> element of y is computed as

![Eq2-3.jpg](./02.Chapter-02/Eq2-3.jpg)

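where y<sub>i</sub> is the i<sup>th</sup> output element and x<sub>i</sub> is the i<sup>th</sup> input element. As a concrete example, assume the final layer without the softmax activation produced  
[16, 4]

Applying the softmax normalization exponentiates each score and divides it by the sum of the exponentials, converting these values to  
[e<sup>16</sup> / (e<sup>16</sup> + e<sup>4</sup>), e<sup>4</sup> / (e<sup>16</sup> + e<sup>4</sup>)] ≈ [0.999994, 0.000006]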
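
The normalization can also be checked numerically. The following is a minimal sketch in NumPy, assuming the standard exponential form of softmax; the helper name `softmax` and the example scores are illustrative and not taken from the text.

```python
import numpy as np

def softmax(scores):
    """Softmax over the last axis: exp(x_i) / sum_j exp(x_j)."""
    # Subtracting the row maximum leaves the result unchanged and avoids overflow
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1]])  # hypothetical linear scores of the last layer
probs = softmax(scores)

print(probs)               # approximately [[0.659 0.242 0.099]]
print(probs.sum(axis=-1))  # each row sums to 1
```

Larger scores receive disproportionately more probability mass because of the exponential, which is why softmax is not the same as dividing each score by the plain sum.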

Initially, we need to import the required libraries using import statements:

In [None]:
import numpy as np
import tensorflow as tf

Then we define the input to the network (x) and the variables (or parameters) (i.e., w1, b1, w2, and b2) of the network:

In [None]:
x = np.random.normal(size=[1,4]).astype('float32')
init = tf.keras.initializers.RandomNormal()
w1 = tf.Variable(init(shape=[4,3]))
b1 = tf.Variable(init(shape=[1,3]))
w2 = tf.Variable(init(shape=[3,2]))
b2 = tf.Variable(init(shape=[1,2]))

In this implementation, x is a NumPy array of size 1 × 4, initialized with values from a normal distribution. The weights and biases of the neural network are defined as TensorFlow variables (tf.Variable), which can change during training. These variables are initialized with random values from a normal distribution, and their shapes are:
* W1: 4 × 3
* b1: 1 × 3
* W2: 3 × 2
* b2: 1 × 2

The core computations of the MLP are defined in a modular function, which allows for easy reuse when computing hidden layer outputs across multiple layers.

In [None]:
@tf.function
def forward(x, W, b, act):
  return act(tf.matmul(x,W)+b)

In this implementation, act represents a nonlinear activation function (e.g., tf.nn.sigmoid). The expression tf.matmul(x, W) + b encapsulates the core computation of a layer: a matrix multiplication of the layer's input with its weights, followed by the addition of the bias. Because forward takes the input, weights, bias, and activation as arguments, the same function can be reused for every layer. This operation is visualized in figure 2.3.

![Figure2-3.jpg](./02.Chapter-02/Figure2-3.jpg)

The @tf.function decorator tells TensorFlow that the function contains TensorFlow operations, enabling optimizations for better performance. In the next section, the purpose of @tf.function will be discussed in more detail. With the inputs, parameters, and core computations defined, the final output of the network can now be computed.

In [None]:
# Computing h
h = forward(x, w1, b1, tf.nn.sigmoid)

# Computing y
y = forward(h, w2, b2, tf.nn.softmax)
print(y)

# which will output
# tf.Tensor([[0.4912673 0.5087327]], shape=(1, 2), dtype=float32)

Here, h and y are the resulting tensors (of type tf.Tensor) of various TensorFlow operations (e.g., tf.matmul). Because the input and the parameters are randomly initialized, the exact values in the output will differ from run to run (see the following listing).

![Code2-1.jpg](./02.Chapter-02/Code2-1.jpg)

**How does TensorFlow operate under the hood?**

TensorFlow programs follow two main steps:
* Define a Data-Flow Graph: This graph describes the relationship between inputs, operations, and outputs (e.g., how x, w1, b1, w2, b2, h, and y are related).
* Execute the Graph: Values are fed into the inputs, and the corresponding outputs are computed. For example, to compute h, a value (like a NumPy array) is provided to x, and the value of h is obtained.

TensorFlow 2 uses imperative execution (also called eager execution), where the graph is defined and executed simultaneously, allowing for immediate computation of results. The data-flow graph is represented as a directed acyclic graph (DAG), where tf.Variable and tf.Tensor are the edges, and operations (like tf.matmul) are the nodes. This graph shows how data flows through the computations, like how <i>h = x W<sub>1</sub> + b<sub>1</sub></i> would be represented.

![Figure2-4.jpg](./02.Chapter-02/Figure2-4.jpg)

TensorFlow creates the data-flow graph through the @tf.function decorator. This Python decorator traces the TensorFlow operations in a function and converts them into a data-flow graph. The AutoGraph feature in TensorFlow 2 enables this, allowing users to write modular code while benefiting from the performance advantages of a data-flow graph.

![Figure2-5.jpg](./02.Chapter-02/Figure2-5.jpg)
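
The tracing behavior can be observed directly. The following is a minimal sketch, assuming default TensorFlow 2 settings; the function name `affine` and the Python list `trace_log` are hypothetical names used only for illustration. Ordinary Python statements inside a decorated function run while the function is being traced, not each time the resulting graph is executed.

```python
import tensorflow as tf

trace_log = []  # Python-side list; appended to only while the function is traced

@tf.function
def affine(x, W, b):
    trace_log.append("traced")     # plain Python: runs at trace time only
    return tf.matmul(x, W) + b     # TensorFlow ops: recorded in the data-flow graph

x = tf.ones([1, 4])
W = tf.ones([4, 3])
b = tf.zeros([1, 3])

h1 = affine(x, W, b)  # first call: the function is traced and a graph is built
h2 = affine(x, W, b)  # same shapes and dtypes: the existing graph is reused

print(len(trace_log))  # 1
print(h1.numpy())      # [[4. 4. 4.]]
```

Calling the function with tensors of a different shape or dtype would trigger another trace, since a separate graph is needed for that input signature.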

In TensorFlow 1, the execution style is declarative graph-based, which involves two steps:
* Define the data-flow graph using symbolic elements like placeholders, variables, and operations, without holding actual values at declaration.
* Explicitly execute the graph by feeding values at runtime to evaluate the results.

In contrast, TensorFlow 2 simplifies this by automatically building and executing the data-flow graph in the background, making the code more streamlined and easier to read. This removes the need for explicit graph construction, unlike TensorFlow 1, which required separate steps to define and execute the graph.

![Table2-2.jpg](./02.Chapter-02/Table2-2.jpg)

![Table2-3.jpg](./02.Chapter-02/Table2-3.jpg)
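
The two execution styles can be compared side by side. The following is a sketch under the assumption that the TensorFlow 1 API is reached through the tf.compat.v1 module shipped with TensorFlow 2; the variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

x_val = np.ones([1, 4], dtype="float32")
w_val = np.ones([4, 3], dtype="float32")

# TensorFlow 1 style: first define a symbolic graph, then execute it in a session
graph = tf.Graph()
with graph.as_default():
    x_ph = tf.compat.v1.placeholder(tf.float32, shape=[1, 4])  # holds no value yet
    h_sym = tf.matmul(x_ph, tf.constant(w_val))                # symbolic result
with tf.compat.v1.Session(graph=graph) as sess:
    h_tf1 = sess.run(h_sym, feed_dict={x_ph: x_val})           # values fed at run time

# TensorFlow 2 style: the operation is evaluated as soon as it is called
h_tf2 = tf.matmul(x_val, w_val).numpy()

print(h_tf1)  # [[4. 4. 4.]]
print(h_tf2)  # [[4. 4. 4.]]
```

Both styles compute the same result; the difference lies in whether the definition and the execution of the graph are separate steps in the user's code.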

---


**2.2 TensorFlow building blocks**

In TensorFlow 2, there are three key building blocks to understand:
* tf.Variable: Represents variables whose values can change during training (e.g., weights and biases).
* tf.Tensor: Represents the data in TensorFlow, used for inputs, outputs, and intermediate calculations.
* tf.Operation: Defines computations that can be performed on tensors, like matrix multiplications.

Understanding the basic elements of TensorFlow (tf.Variable, tf.Tensor, and tf.Operation) is crucial because all higher-level concepts, like Keras, rely on these components. Knowing how to use them and their limitations helps in building models and troubleshooting errors, as TensorFlow's error messages often reference these elements. This foundational knowledge is essential for effectively developing and debugging more complex models.

![Table2-4.jpg](./02.Chapter-02/Table2-4.jpg)

**Understanding tf.Variable**

In TensorFlow, tf.Variable is used to represent model parameters (like weights and biases) that change over time during training. These variables are initialized with a value and can be updated as the model learns. A tf.Variable requires three key components:
* Shape: The size of each dimension (e.g., rows and columns).
* Initial value: The starting value, often randomly initialized (e.g., using tf.keras.initializers.RandomNormal()).
* Data type: The type of data (e.g., float32).

For example:
* W1: shape 4 × 3

* b1: shape 1 × 3

* W2: shape 3 × 2

* b2: shape 1 × 2

You can define a tf.Variable from any initial value that has the desired shape, such as a tf.constant, a NumPy array (e.g., np.ones), or the output of an initializer called with a shape. The variable's values can be changed during training, allowing you to update them via gradient descent.
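
The different ways of supplying an initial value can be sketched as follows. This is a minimal example; the variable names v1, v2, and v3 and the chosen shapes are illustrative.

```python
import numpy as np
import tensorflow as tf

# From a constant tensor filled with a single value
v1 = tf.Variable(tf.constant(2.0, shape=[4]), dtype="float32")

# From a NumPy array
v2 = tf.Variable(np.ones([4, 3], dtype="float32"))

# From a Keras initializer called with the desired shape
init = tf.keras.initializers.RandomNormal()
v3 = tf.Variable(init(shape=[3, 4, 5]), dtype="float32")

for name, v in [("v1", v1), ("v2", v2), ("v3", v3)]:
    print(name, v.shape, v.dtype.name)
```

In each case the shape and data type of the variable follow from the initial value, so the three components listed above are always defined once the variable exists.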

You can also manipulate individual elements or slices of a tf.Variable using the assign() method, which lets you update specific values or sections within the variable.

For example, to change a value in a 4 × 3 matrix, you could use:

In [None]:
v = tf.Variable(np.zeros(shape=[4,3]), dtype='float32')
v = v[0,2].assign(1)

This will update the element at position (0, 2) in the matrix. Similarly, slicing can be used to update multiple elements at once.
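
Updating a single element and updating a slice can be sketched together as follows. The values written into the variable are arbitrary and chosen only to make the changes visible.

```python
import tensorflow as tf

v = tf.Variable(tf.zeros([4, 3]))

# Update a single element at row 0, column 2
v[0, 2].assign(1.0)

# Update a 2 x 2 slice: the last two rows, first two columns
v[2:, :2].assign([[3.0, 3.0], [3.0, 3.0]])

print(v.numpy())
# [[0. 0. 1.]
#  [0. 0. 0.]
#  [3. 3. 0.]
#  [3. 3. 0.]]
```

The value passed to assign() has to match the shape of the selected slice, here 2 × 2.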

**Understanding tf.Tensor**

A tf.Tensor is the result of performing a TensorFlow operation on data (e.g., a tf.Variable or another tf.Tensor). It is a fundamental object in TensorFlow, used to store inputs, intermediate outputs of layers, and final model outputs. Tensors can be one-dimensional (vectors), two-dimensional (matrices), or even n-dimensional, depending on the data structure.

Each dimension of a tensor is referred to as an axis. For example, in a 3D tensor that stores an image, the axes might represent the height, width, and depth (channels) of the data. Understanding tensors and their axes is crucial when working with machine learning models in TensorFlow.

![Table2-5.jpg](./02.Chapter-02/Table2-5.jpg)

![Figure2-6.jpg](./02.Chapter-02/Figure2-6.jpg)

A tf.Tensor can be a scalar, vector, or matrix, depending on its dimensions. When discussing the mathematical aspects of models, we refer to the general term "tensor," while tf.Tensor specifically refers to data-related outputs produced by TensorFlow operations.

For example, a tf.Tensor can be produced by performing operations like multiplying a tf.Variable with a constant, or adding two tf.Tensors together. EagerTensors are a special type of tf.Tensor that are evaluated immediately (eager execution).

The key difference between tf.Variable and tf.Tensor is that tf.Variable is mutable, meaning its values can change during training, while tf.Tensor is immutable (its values cannot be changed after initialization).
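
The difference in mutability can be demonstrated with a short sketch, assuming eager execution (the TensorFlow 2 default). The flag `tensor_mutated` is a hypothetical name used only to record the outcome.

```python
import tensorflow as tf

t = tf.constant([1.0, 2.0, 3.0])  # a tf.Tensor
v = tf.Variable([1.0, 2.0, 3.0])  # a tf.Variable

try:
    t[0] = 10.0                   # tensors do not support item assignment
    tensor_mutated = True
except TypeError:
    tensor_mutated = False

v[0].assign(10.0)                 # variables can be updated in place
t_new = t * 2.0                   # an operation on a tensor returns a new tensor

print(tensor_mutated)  # False
print(v.numpy())       # [10.  2.  3.]
print(t.numpy())       # [1. 2. 3.]
print(t_new.numpy())   # [2. 4. 6.]
```

To "change" a tensor, you therefore compute a new tensor from it and leave the original untouched.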

TensorFlow also offers different types of tensors for specific data structures:
* RaggedTensor: For variable-length sequences that can't be represented as a matrix.
* TensorArray: A dynamic-sized structure, similar to a Python list.
* SparseTensor: For sparse data, like user-item matrices.
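
The three specialized types can be sketched briefly as follows. The data values are made up for illustration, and the example assumes eager execution.

```python
import tensorflow as tf

# RaggedTensor: rows of different lengths
ragged = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
print(ragged.shape)                  # (3, None)
print(ragged.row_lengths().numpy())  # [3 1 2]

# TensorArray: grows as elements are written, similar to a Python list
ta = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
for i in range(3):
    ta = ta.write(i, float(i))
stacked = ta.stack()
print(stacked.numpy())               # [0. 1. 2.]

# SparseTensor: stores only the nonzero entries and their positions
sparse = tf.sparse.SparseTensor(
    indices=[[0, 1], [2, 0]], values=[5.0, 3.0], dense_shape=[3, 3]
)
dense = tf.sparse.to_dense(sparse)
print(dense.numpy())
```

The dense form of the sparse tensor is a 3 × 3 matrix that is zero everywhere except at positions (0, 1) and (2, 0).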

**Understanding tf.Operation**

tf.Operation is essential in TensorFlow for performing computations on data, such as basic arithmetic operations (addition, multiplication, subtraction, division), matrix multiplication, and more. These operations can be performed on tf.Variable and tf.Tensor objects.

For example:

* Addition: You can add two tensors element-wise:

In [None]:
a = tf.constant([4, 4, 4, 4], dtype='float32')
b = tf.constant([2, 2, 2, 2], dtype='float32')
c = a + b  # Result: [6, 6, 6, 6]

* Multiplication: Similarly, multiplying two tensors element-wise:

In [None]:
e = a * b  # Result: [8, 8, 8, 8]

You can also perform logical comparisons between tensors:

* Equality check:

In [None]:
equal_check = (a == b)  # Result: [False, False, False, False], since 4 != 2 at every position

* Less than or equal check:

In [None]:
leq_check = (a <= b)  # Result: [False, False, False, False], since 4 > 2 at every position

TensorFlow also supports reduction operations to reduce the size of a tensor:

* Sum of all elements in a tensor:

In [None]:
red_a1 = tf.reduce_sum(a)  # Sum of all elements

* Product along a specific axis:

In [None]:
red_a2 = tf.reduce_prod(a, axis=0)  # Product of the elements along axis 0 (a scalar for a 1-D tensor)

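* Minimum along specific axes (this needs a tensor with more than one axis, so a is redefined with shape [5, 4, 3]):

In [None]:
a = tf.constant(np.random.normal(size=[5,4,3]).astype('float32'))
red_a3 = tf.reduce_min(a, axis=[0, 1])  # Minimum across axes 0 and 1; shape becomes [3]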

keepdims is an important parameter in reduction operations. It allows you to retain the reduced dimensions, which is useful for broadcasting:
* Without keepdims=True: Reduces the tensor and loses dimensions.
* With keepdims=True: Retains the dimensions, useful for operations that require consistent tensor shapes.

For example, with a tensor a of shape [5, 4, 3]:

* Without keepdims:

In [None]:
red_a1 = tf.reduce_min(a, axis=1)  # Shape becomes [5,3]

* With keepdims=True:

In [None]:
red_a2 = tf.reduce_min(a, axis=1, keepdims=True)  # Shape becomes [5,1,3]
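
The effect of keepdims on broadcasting can be sketched as follows, assuming a random tensor of shape [5, 4, 3]; the names `min_drop`, `min_keep`, and `shifted` are illustrative.

```python
import numpy as np
import tensorflow as tf

a = tf.constant(np.random.normal(size=[5, 4, 3]).astype("float32"))

min_drop = tf.reduce_min(a, axis=1)                 # axis 1 is removed
min_keep = tf.reduce_min(a, axis=1, keepdims=True)  # axis 1 is kept with size 1

print(min_drop.shape)  # (5, 3)
print(min_keep.shape)  # (5, 1, 3)

# [5, 4, 3] - [5, 1, 3] broadcasts along axis 1, so the minimum of each
# group along axis 1 is subtracted from every element of that group
shifted = a - min_keep
print(tf.reduce_min(shifted, axis=1).numpy())  # all zeros
```

Subtracting `min_drop` directly would not work here, because the shapes [5, 4, 3] and [5, 3] cannot be broadcast against each other.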

Several other important functions are outlined in table 2.6.

![Table2-6.jpg](./02.Chapter-02/Table2-6.jpg)

---

**2.3 Neural network–related computations in TensorFlow**

**Matrix multiplication**

Matrix multiplication between two tensors, such as a matrix a of size [4, 3] and matrix b of size [3, 2], results in a new tensor of size [4, 2]. The tf.matmul() function handles this multiplication, which is illustrated in figure 2.8.

![Figure2-8.jpg](./02.Chapter-02/Figure2-8.jpg)

More generally, if you have an n × m matrix (a) and an m × p matrix (b), the result of the matrix multiplication, c, is given by

![Eq2-4.jpg](./02.Chapter-02/Eq2-4.jpg)

However, if you have high-dimensional tensors a and b, the sum product is performed over the last axis of a and the second-to-last axis of b. Apart from the last two axes, the leading axes of a and b need to match or be broadcastable. For example, if you have a tensor a of size [3,5,7] and b of size [3,7,8], the result would be a [3,5,8]-sized tensor. As an application, consider converting an RGB image to grayscale: given the three RGB values of a pixel, you can convert them to a single grayscale value using

![Eq2-5.jpg](./02.Chapter-02/Eq2-5.jpg)

Converting an RGB image to grayscale is a common operation, especially when color is not important for tasks like digit recognition. This process reduces the input size (from three channels to one) and removes unnecessary color information. For example, a 512 × 512 × 3 image multiplied by a 3 × 1 weight array results in a grayscale image of size 512 × 512 × 1. To remove the last dimension (which has a size of 1), the tf.squeeze() function is used, resulting in a 512 × 512 matrix.

![Code2-2.jpg](./02.Chapter-02/Code2-2.jpg)
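
The shape rules and the grayscale conversion can be sketched in text form as follows. This is an illustration under stated assumptions: a random array stands in for a real image, and the weights 0.3, 0.59, and 0.11 are commonly used luminance weights assumed here, which may differ from those in the equation above.

```python
import numpy as np
import tensorflow as tf

# Matrix multiplication over the last two axes of higher-rank tensors
a = tf.ones([3, 5, 7])
b = tf.ones([3, 7, 8])
c = tf.matmul(a, b)
print(c.shape)  # (3, 5, 8)

# Random stand-in for a 512 x 512 RGB image
x_rgb = tf.constant(np.random.uniform(0, 255, size=[512, 512, 3]).astype("float32"))

# Assumed per-channel weights, arranged as a 3 x 1 matrix
rgb_weights = tf.constant([[0.3], [0.59], [0.11]])

x_gray = tf.matmul(x_rgb, rgb_weights)  # [512, 512, 3] x [3, 1] -> [512, 512, 1]
x_gray_2d = tf.squeeze(x_gray)          # drop the axis of size 1 -> [512, 512]

print(x_gray.shape)     # (512, 512, 1)
print(x_gray_2d.shape)  # (512, 512)
```

Each output pixel is the weighted sum of the three channel values of the corresponding input pixel, which is exactly one row-times-column product of the matrix multiplication.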

Matrix multiplication is crucial in fully connected networks. To transform data from the input layer to the hidden layer, matrix multiplication and addition are used. For now, we focus on the linear operations, ignoring the nonlinear activation (which is an element-wise transformation).

![Figure2-9.jpg](./02.Chapter-02/Figure2-9.jpg)

**Convolution operation**

The convolution operation is key to convolutional neural networks (CNNs), which are widely used for image-related tasks like image classification and object detection. In convolution, a filter (or kernel) slides over the data, producing a single value at each position. At each position, the values of the filter are multiplied element-wise with the data values under the window, and the products are summed to produce the output at that location.

![Figure2-10.jpg](./02.Chapter-02/Figure2-10.jpg)
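
The sliding-window computation can be written out by hand. The following is a minimal NumPy sketch, not the TensorFlow implementation; the helper name `conv2d_valid` and the tiny 4 × 4 input are illustrative. Like deep learning libraries, it does not flip the kernel, and it only places the window where it fits entirely inside the input.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# A 4 x 4 input with a vertical step edge between columns 1 and 2
image = np.array([[0, 0, 1, 1]] * 4, dtype="float32")
kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype="float32")

result = conv2d_valid(image, kernel)
print(result)
# [[-3.  3.]
#  [-3.  3.]]
```

The output is nonzero only because every window overlaps the step edge; on a region of constant intensity this kernel, whose values sum to zero, produces 0.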

To perform edge detection using convolution in TensorFlow, we follow these steps:

* Input Image: Start with a black-and-white image of size 512 × 512, stored as a tf.Tensor (x), and create a new variable y from it.

* Edge Detection Filter: Define an edge detection filter (Laplacian filter) as a 3 × 3 matrix with values -1 except for the middle value, which is 8. The sum of the kernel is zero.

In [None]:
filter = tf.Variable(np.array([[-1,-1,-1],[-1,8,-1],[-1,-1,-1]]).astype('float32'))

* Reshaping Tensors: Since tf.nn.convolution() requires the input and filter to be rank 4 tensors, reshape them:
  * Input: Reshape y from [512, 512] to [1, 512, 512, 1] (adding batch and channel dimensions).
  * Filter: Reshape the filter from [3, 3] to [3, 3, 1, 1] (adding incoming and outgoing channel dimensions).

* Convolution Operation: Perform the convolution with tf.nn.convolution() to obtain the edge-detected image:

In [None]:
y_conv = tf.nn.convolution(y_reshaped, filter_reshaped)

* Visualization: The result can be visualized and compared to the original image (figure 2.11).
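
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions: a synthetic image (a white square on a black background) stands in for the real 512 × 512 image, and the name `edge_filter` is used in place of filter to avoid shadowing the Python built-in.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the input image: a white square on a black background
x = np.zeros([512, 512], dtype="float32")
x[128:384, 128:384] = 1.0
y = tf.constant(x)

# Edge detection filter: -1 everywhere except 8 in the middle
edge_filter = tf.constant([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype="float32")

# Reshape to rank 4: [batch, height, width, channels] and
# [height, width, in channels, out channels]
y_reshaped = tf.reshape(y, [1, 512, 512, 1])
filter_reshaped = tf.reshape(edge_filter, [3, 3, 1, 1])

y_conv = tf.nn.convolution(y_reshaped, filter_reshaped)
edges = tf.squeeze(y_conv)

print(y_conv.shape)  # (1, 510, 510, 1)
print(edges.shape)   # (510, 510)
```

In this synthetic case the output is zero in the flat regions (inside and outside the square) and nonzero along the border of the square, which is what an edge detector should produce.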

![Figure2-11.jpg](./02.Chapter-02/Figure2-11.jpg)

**Pooling operation**

The pooling operation is used to resize an image by reducing its width and height, such as halving the size of a 512 × 512 image to 256 × 256. This is often done in convolutional neural networks (CNNs) to reduce the output size, making the model more efficient by using fewer parameters for learning.

The term "pooling" comes from statistics, where it refers to combining values into a single entity, such as averaging or taking the maximum. In the pooling operation, the values inside the area covered by the pooling window are combined by either averaging them (average pooling) or selecting their maximum (max pooling).

In TensorFlow:

* Max pooling: Uses the maximum value from the kernel's area.

In [None]:
z_max = tf.nn.max_pool(y_conv, (1,2,2,1), strides=(1,2,2,1), padding='VALID')

* Average pooling: Uses the average value from the kernel's area.

In [None]:
z_avg = tf.nn.avg_pool(y_conv, (1,2,2,1), strides=(1,2,2,1), padding='VALID')

![Figure2-12.jpg](./02.Chapter-02/Figure2-12.jpg)

After performing convolution, we have y_conv, a 4D tensor with the shape [1, 510, 510, 1]. The dimensions are slightly smaller than the original image size (512 × 512) because the 3 × 3 filter window is only placed where it fits entirely inside the image, which gives 512 − 3 + 1 = 510 positions along each axis.

Next, we apply pooling to downsize the image:
* Average pooling (tf.nn.avg_pool) and max pooling (tf.nn.max_pool) with a 2 × 2 window and a stride of 2 halve the height and width, reducing the image size to [1, 255, 255, 1].
* To remove the extra dimensions of size 1, we use tf.squeeze() to reshape the output to [255, 255].

Finally, you can visualize the results using matplotlib:
* Average pooling produces smoother, more consistent lines.
* Max pooling results in a noisier image.

![Figure2-13.jpg](./02.Chapter-02/Figure2-13.jpg)

Unlike the convolution operation, the pooling operation does not use a filter with learnable values. Instead, we specify the dimensions of the pooling window, with one entry per input dimension (e.g., [batch, height, width, channels]). Additionally, two parameters are passed: strides (how far the window moves at each step) and padding (how the edges of the image are handled).
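
The window, stride, and padding arguments can be explored on a small input where the result is easy to verify by hand. The following is a minimal sketch; the helper `valid_output_size` and the 4 × 4 input are illustrative and not part of the original listing.

```python
import tensorflow as tf

def valid_output_size(size, window, stride=1):
    """Output length along one axis when the window must fit entirely inside the input."""
    return (size - window) // stride + 1

conv_size = valid_output_size(512, 3)                  # 3 x 3 convolution window
pool_size = valid_output_size(conv_size, 2, stride=2)  # 2 x 2 pooling window, stride 2
print(conv_size, pool_size)  # 510 255

# A 4 x 4 input holding the values 0..15, shaped [batch, height, width, channels]
x = tf.reshape(tf.range(16, dtype=tf.float32), [1, 4, 4, 1])

z_max = tf.nn.max_pool(x, (1, 2, 2, 1), strides=(1, 2, 2, 1), padding="VALID")
z_avg = tf.nn.avg_pool(x, (1, 2, 2, 1), strides=(1, 2, 2, 1), padding="VALID")

print(tf.squeeze(z_max).numpy())
# [[ 5.  7.]
#  [13. 15.]]
print(tf.squeeze(z_avg).numpy())
# [[ 2.5  4.5]
#  [10.5 12.5]]
```

With a 2 × 2 window and a stride of 2, the windows do not overlap, so each output value summarizes a distinct 2 × 2 block of the input.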