<a href="https://colab.research.google.com/github/Monaa48/TensorFlow-in-Action-starter/blob/main/notebooks/Ch02_TensorFlow_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 02 — TensorFlow 2


## 1) Summary

In this chapter I’m trying to get comfortable with **how TensorFlow 2 actually feels to use**:

- TF2 runs **eagerly** by default (so it behaves like normal Python).
- When I want speed, I can decorate a function with `@tf.function` so TF traces it into a **graph**.
- The core “building blocks” that keep showing up are **tensors**, **variables**, and **ops**.
- Most deep learning computations are combinations of a few patterns: **matrix multiply**, **convolution**, and **pooling**.


## 2) Setup


In [1]:
import random
import numpy as np
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Eager execution:", tf.executing_eagerly())

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)


TensorFlow version: 2.19.0
Eager execution: True


## 3) First steps in TF2 — a small MLP forward pass

A basic MLP layer is essentially an affine transform:

\[
h = xW + b
\]

and then we apply a non-linearity, like ReLU:

\[
\mathrm{ReLU}(h) = \max(0, h)
\]

Below I’m only doing a **forward pass**. I just want to see shapes and outputs.


In [2]:
# A small batch: batch_size=4, input_dim=3
x = tf.constant([[0.2, 0.8, -0.5],
                 [1.0, -0.2, 0.1],
                 [-0.3, 0.4, 0.9],
                 [0.0, 0.0, 1.0]], dtype=tf.float32)

input_dim = 3
hidden_dim = 5
num_classes = 2

W1 = tf.Variable(tf.random.normal([input_dim, hidden_dim], stddev=0.1), name="W1")
b1 = tf.Variable(tf.zeros([hidden_dim]), name="b1")

W2 = tf.Variable(tf.random.normal([hidden_dim, num_classes], stddev=0.1), name="W2")
b2 = tf.Variable(tf.zeros([num_classes]), name="b2")

h = tf.nn.relu(tf.matmul(x, W1) + b1)
logits = tf.matmul(h, W2) + b2
probs = tf.nn.softmax(logits)

print("x shape     :", x.shape)
print("hidden shape:", h.shape)
print("logits shape:", logits.shape)
print("probabilities:\n", probs.numpy())


x shape     : (4, 3)
hidden shape: (4, 5)
logits shape: (4, 2)
probabilities:
 [[0.4962676  0.50373244]
 [0.5004868  0.49951315]
 [0.50065905 0.49934104]
 [0.5        0.5       ]]
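
The probabilities all sit close to 0.5 because the weights are drawn with `stddev=0.1`, so the logits are close to zero. As a rough cross-check, here is a minimal pure-Python softmax (the helper name `softmax` is mine, not a TF API). The logit values in `row0` are copied from the first row that the `tf.function` cell in section 4 prints for the same weights.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                          # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# First row of logits, as printed by the tf.function cell in section 4.
row0 = [-0.01093552, 0.00399447]
probs0 = softmax(row0)
print(probs0)

# Equal logits give an exactly uniform distribution (this is the fourth row).
uniform = softmax([0.0, 0.0])
print(uniform)
```

The first result should agree with the first row of `probs` above to about six decimal places, and the second explains the `[0.5, 0.5]` row.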


## 4) Eager vs Graph — `tf.function`

In TF2, the key distinction is:

- In eager mode, I can print intermediate tensors right away.
- In graph mode (via `tf.function`), TF traces the function the first time it sees a given input signature and builds a graph, which can run faster on later calls.

I will wrap the same forward pass in `tf.function` and peek at the graph ops.


In [3]:
@tf.function
def forward(x):
    h = tf.nn.relu(tf.matmul(x, W1) + b1)
    logits = tf.matmul(h, W2) + b2
    return logits

logits_graph = forward(x)
print("Logits (tf.function):\n", logits_graph.numpy())

concrete = forward.get_concrete_function(tf.TensorSpec([None, input_dim], tf.float32))
ops = [op.name for op in concrete.graph.get_operations()]
print("Number of ops in traced graph:", len(ops))
print("First ~15 ops:", ops[:15])


Logits (tf.function):
 [[-0.01093552  0.00399447]
 [-0.00042657 -0.00237386]
 [-0.00115932 -0.00379534]
 [ 0.          0.        ]]
Number of ops in traced graph: 16
First ~15 ops: ['x', 'MatMul/ReadVariableOp/resource', 'MatMul/ReadVariableOp', 'MatMul', 'add/ReadVariableOp/resource', 'add/ReadVariableOp', 'add', 'Relu', 'MatMul_1/ReadVariableOp/resource', 'MatMul_1/ReadVariableOp', 'MatMul_1', 'add_1/ReadVariableOp/resource', 'add_1/ReadVariableOp', 'add_1', 'Identity']
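
One thing I wanted to see for myself is when tracing actually happens. The sketch below is a small illustration, not part of the chapter's code: `scale` and `trace_log` are names I made up. It relies on the fact that plain Python side effects inside a `tf.function` run only while the function is being traced.

```python
import tensorflow as tf

trace_log = []  # grows only when TF traces the function

@tf.function
def scale(x):
    # Plain Python side effect: executed during tracing, not on every call.
    trace_log.append(x.shape)
    return x * 2.0

scale(tf.constant([1.0, 2.0]))        # first call: traces for this signature
scale(tf.constant([3.0, 4.0]))        # same shape and dtype: reuses the graph
scale(tf.constant([1.0, 2.0, 3.0]))   # different shape: traces again

print("Number of traces:", len(trace_log))
```

With default settings I would expect two traces for these three calls. This is also why `print` debugging inside a `tf.function` can look like it "stops working" after the first call.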


## 5) Core TF2 components

### 5.1 Tensors vs Variables

- `tf.Tensor` = an **immutable value** produced by ops. Think of it as the result of a computation.
- `tf.Variable` = **mutable state** (usually weights/biases). Optimizers update variables.
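
A minimal sketch of that difference, assuming eager mode (the TF2 default). The names `t`, `v`, and `tensor_is_mutable` are just for illustration.

```python
import tensorflow as tf

t = tf.constant([1.0, 2.0])
v = tf.Variable([1.0, 2.0])

# Ops on a tensor return a new tensor; the original is untouched.
t_plus = t + 0.5

# A variable is updated in place and keeps its identity.
v.assign_add([0.5, 0.5])

# Item assignment on an eager tensor is rejected.
try:
    t[0] = 9.0
    tensor_is_mutable = True
except TypeError:
    tensor_is_mutable = False

print("t      :", t.numpy())
print("t + 0.5:", t_plus.numpy())
print("v      :", v.numpy())
print("tensor supports item assignment:", tensor_is_mutable)
```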

### 5.2 Automatic differentiation with `GradientTape`

To train, I need gradients. `tf.GradientTape()` records the ops executed inside its `with` block so TF can compute derivatives afterwards.


In [4]:
w = tf.Variable(3.0)
b = tf.Variable(-1.0)
x_scalar = tf.constant(2.0)

with tf.GradientTape() as tape:
    y = (w * x_scalar + b) ** 2

dw, db = tape.gradient(y, [w, b])
print("y:", float(y.numpy()))
print("dy/dw:", float(dw.numpy()))
print("dy/db:", float(db.numpy()))

# One manual gradient descent step
lr = 0.1
w.assign_sub(lr * dw)
b.assign_sub(lr * db)
print("Updated w, b:", float(w.numpy()), float(b.numpy()))


y: 25.0
dy/dw: 20.0
dy/db: 10.0
Updated w, b: 1.0 -2.0
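
These numbers can be checked by hand with the chain rule. Writing `u = w * x_scalar + b` and plugging in `x_scalar = 2`, `w = 3`, `b = -1`:

\[
u = 3 \cdot 2 - 1 = 5, \qquad y = u^2 = 25
\]

\[
\frac{\partial y}{\partial w} = 2u \cdot x = 2 \cdot 5 \cdot 2 = 20, \qquad
\frac{\partial y}{\partial b} = 2u = 10
\]

The manual gradient descent step with `lr = 0.1` then gives:

\[
w \leftarrow 3 - 0.1 \cdot 20 = 1, \qquad b \leftarrow -1 - 0.1 \cdot 10 = -2
\]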


## 6) Common neural network computations

### 6.1 Matrix multiplication (Dense layers)

Dense layers are essentially `matmul + bias + activation`.


In [5]:
A = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]], dtype=tf.float32)  # (2, 3)
B = tf.constant([[1., 0.],
                 [0., 1.],
                 [1., 1.]], dtype=tf.float32)      # (3, 2)

C = tf.matmul(A, B)  # (2, 2)
print("A:\n", A.numpy())
print("B:\n", B.numpy())
print("A @ B:\n", C.numpy())


A:
 [[1. 2. 3.]
 [4. 5. 6.]]
B:
 [[1. 0.]
 [0. 1.]
 [1. 1.]]
A @ B:
 [[ 4.  5.]
 [10. 11.]]
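
To make the rule behind `tf.matmul` concrete, here is a naive pure-Python version of the same product. It is only a sketch for checking the numbers; the helper `matmul` is my own and is far slower than the real op.

```python
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # shape (2, 3)
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]               # shape (3, 2)

def matmul(a, b):
    """Naive matrix product: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]

C = matmul(A, B)
print(C)  # [[4.0, 5.0], [10.0, 11.0]]
```

The inner dimensions (3 and 3) have to match, and the result takes the outer ones: (2, 3) times (3, 2) gives (2, 2).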


### 6.2 Convolution (toy example)

A convolution slides a small filter across an image-like grid and produces feature maps.
Here I use a 5×5 “image” and a 3×3 filter so I can literally see the numbers.


In [6]:
# Conv2D input format: [batch, height, width, channels]
img = tf.reshape(tf.range(25, dtype=tf.float32), [1, 5, 5, 1])

# 3x3 filter: simple left-right edge-ish pattern
kernel = tf.constant([[[[1.]], [[0.]], [[-1.]]],
                      [[[1.]], [[0.]], [[-1.]]],
                      [[[1.]], [[0.]], [[-1.]]]], dtype=tf.float32)  # [3,3,1,1]

out = tf.nn.conv2d(img, kernel, strides=1, padding="VALID")

print("Input image (5x5):\n", tf.squeeze(img).numpy())
print("Kernel (3x3):\n", tf.squeeze(kernel).numpy())
print("Conv output (3x3):\n", tf.squeeze(out).numpy())


Input image (5x5):
 [[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24.]]
Kernel (3x3):
 [[ 1.  0. -1.]
 [ 1.  0. -1.]
 [ 1.  0. -1.]]
Conv output (3x3):
 [[-6. -6. -6.]
 [-6. -6. -6.]
 [-6. -6. -6.]]
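
Every output is -6 because each 3×3 window adds its left column and subtracts its right column, and in this image the right column is always 2 larger per row, over 3 rows. The pure-Python sketch below reproduces that by hand. Note that `tf.nn.conv2d` computes a cross-correlation (the kernel is not flipped), so that is what the hypothetical helper `cross_correlate_valid` does too.

```python
# Same values as tf.range(25) reshaped to 5x5.
img = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[1.0, 0.0, -1.0] for _ in range(3)]

def cross_correlate_valid(image, k):
    """Slide k over image with stride 1 and no padding ("VALID")."""
    kh, kw = len(k), len(k[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * k[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

out = cross_correlate_valid(img, kernel)
for row in out:
    print(row)
```

A constant output is what I would expect from an edge-style filter on an image whose values rise at a constant rate from left to right: the slope is the same everywhere.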


### 6.3 Pooling (downsampling)

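Pooling reduces spatial size. Max pooling keeps the maximum value in each window. Here a 2×2 window with stride 2 and `VALID` padding on the 5×5 input leaves no room for a third window in either direction, so the last row and column are dropped and the output is 2×2.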


In [7]:
pooled = tf.nn.max_pool2d(img, ksize=2, strides=2, padding="VALID")
print("MaxPool output shape:", pooled.shape)
print("MaxPool output:\n", tf.squeeze(pooled).numpy())


MaxPool output shape: (1, 2, 2, 1)
MaxPool output:
 [[ 6.  8.]
 [16. 18.]]
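
The same result can be reproduced with a few lines of pure Python, which also shows where the output size comes from. The helper `max_pool_valid` is a sketch of my own, assuming a single-channel 2D grid.

```python
# Same values as tf.range(25) reshaped to 5x5.
img = [[float(r * 5 + c) for c in range(5)] for r in range(5)]

def max_pool_valid(image, size=2, stride=2):
    """Max pooling with no padding: windows that do not fit are skipped."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [[max(image[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

pooled = max_pool_valid(img)
print(pooled)  # [[6.0, 8.0], [16.0, 18.0]]
```

With a 5×5 input, `(5 - 2) // 2 + 1 = 2` windows fit along each axis, so row 4 and column 4 never enter any window.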


## 7) Takeaways

- TF2 feels “Python-first” because of eager execution.
- `tf.function` is useful once code is stable and I want performance.
- Variables hold the mutable state (typically the trainable weights); tensors are the immutable values flowing through the graph.
- `tf.GradientTape` is the core tool that makes training possible.
- The math patterns I keep seeing: matmul (MLP), conv + pooling (CNN).
