
This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.

**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**

This notebook was generated for TensorFlow 2.6.

# The mathematical building blocks of neural networks

## A first look at a neural network

**Loading the MNIST dataset in Keras**

In [31]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [None]:
train_images.shape

In [None]:
len(train_labels)

In [None]:
train_labels

In [None]:
test_images.shape

In [None]:
len(test_labels)

In [None]:
test_labels

**The network architecture**

In [73]:
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])

**The compilation step**

In [None]:
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

**Preparing the image data**

In [None]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

**"Fitting" the model**

In [None]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)

**Using the model to make predictions**

In [None]:
test_digits = test_images[0:10]
predictions = model.predict(test_digits)
predictions[0]

In [None]:
predictions[0].argmax()

In [None]:
predictions[0][7]

In [None]:
test_labels[0]

**Evaluating the model on new data**

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")

## Data representations for neural networks

### Scalars (rank-0 tensors)

In [None]:
import numpy as np
x = np.array(12)
x

In [None]:
x.ndim

### Vectors (rank-1 tensors)

In [None]:
x = np.array([12, 3, 6, 14, 7])
x

In [None]:
x.ndim

### Matrices (rank-2 tensors)

In [None]:
x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])
x.ndim

### Rank-3 and higher-rank tensors

In [None]:
x = np.array([[[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]]])
x.ndim

### Key attributes

In [None]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [None]:
train_images.ndim

In [None]:
train_images.shape

In [None]:
train_images.dtype

**Displaying the fifth digit (index 4)**

In [None]:
import matplotlib.pyplot as plt
digit = train_images[4]
plt.imshow(digit, cmap=plt.cm.binary)
plt.show()

In [None]:
train_labels[4]

### Manipulating tensors in NumPy

In [None]:
my_slice = train_images[10:100]
my_slice.shape

In [None]:
my_slice = train_images[10:100, :, :]
my_slice.shape

In [None]:
my_slice = train_images[10:100, 0:28, 0:28]
my_slice.shape

In [None]:
my_slice = train_images[:, 14:, 14:]

In [None]:
my_slice = train_images[:, 7:-7, 7:-7]

### The notion of data batches

In [None]:
batch = train_images[:128]

In [None]:
batch = train_images[128:256]

In [None]:
n = 3
batch = train_images[128 * n:128 * (n + 1)]

### Real-world examples of data tensors
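The four kinds of data named in the subsections that follow differ mainly in tensor rank. The sketch below builds one placeholder array per kind; every size in it is a made-up illustration value rather than a real dataset, and the axis order shown (channels last) is an assumption.

```python
import numpy as np

# All sizes are hypothetical; only the number and order of axes matter here.
vector_data = np.zeros((1000, 3))                         # (samples, features)
timeseries_data = np.zeros((250, 390, 3))                 # (samples, timesteps, features)
image_data = np.zeros((16, 32, 32, 3), dtype="uint8")     # (samples, height, width, channels)
video_data = np.zeros((2, 10, 32, 32, 3), dtype="uint8")  # (samples, frames, height, width, channels)

for name, tensor in [("vector", vector_data),
                     ("timeseries", timeseries_data),
                     ("image", image_data),
                     ("video", video_data)]:
    print(f"{name}: rank {tensor.ndim}, shape {tensor.shape}")
```

In each case the first axis is the samples axis, which is the one sliced when forming batches.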

### Vector data

### Timeseries data or sequence data

### Image data

### Video data

## The gears of neural networks: tensor operations

### Element-wise operations

In [6]:
def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()  # Avoid overwriting the input array
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

In [7]:
import numpy as np

aa = np.array([[1, 2, 3, -1], [1, -1, -1, 0]])

In [8]:
aa

array([[ 1,  2,  3, -1],
       [ 1, -1, -1,  0]])

In [9]:
def naive_add(x, y):
    assert len(x.shape) == 2
    assert x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

In [10]:
import time

x = np.random.random((20, 100))
y = np.random.random((20, 100))

t0 = time.time()
for _ in range(1000):
    z = x + y  # Vectorized add: NumPy sums the arrays in one call, so it is fast
    z = np.maximum(z, 0.)
print("Took: {0:.2f} s".format(time.time() - t0))

Took: 0.01 s


In [11]:
t0 = time.time()
for _ in range(1000):
    z = naive_add(x, y)  # Element-by-element Python loops, so this takes much longer
    z = naive_relu(z)
print("Took: {0:.2f} s".format(time.time() - t0))

Took: 1.73 s


### Broadcasting

In [12]:
import numpy as np
X = np.random.random((32, 10))
y = np.random.random((10,))

In [14]:
x = np.array([1,2,3,4,5])
y = np.array([[1,1,1,1,1],[1,1,1,1,1]])

In [15]:
x.shape

(5,)

In [16]:
y.shape

(2, 5)

In [17]:
x + y

array([[2, 3, 4, 5, 6],
       [2, 3, 4, 5, 6]])

In [38]:
a = np.array([1,3,5,7,9])
b = np.array([[1,1,1,1,1], [2,2,2,2,2], [3,3,3,3,3]])
a + b

array([[ 2,  4,  6,  8, 10],
       [ 3,  5,  7,  9, 11],
       [ 4,  6,  8, 10, 12]])

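In [18]:
y = np.random.random((10,))    # Recreate the (10,) vector; y was reassigned to a (2, 5) array above
y = np.expand_dims(y, axis=0)  # y now has shape (1, 10)

In [19]:
Y = np.concatenate([y] * 32, axis=0)  # Repeat y 32 times along axis 0: Y has shape (32, 10)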
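As a quick check, the explicit expand-and-repeat steps can be compared with NumPy's built-in broadcasting. This is a minimal sketch; the names `y_row`, `explicit_sum`, and `broadcast_sum` are introduced here for illustration only.

```python
import numpy as np

X = np.random.random((32, 10))
y = np.random.random((10,))

# Explicit version: add a leading axis, then repeat the row 32 times.
y_row = np.expand_dims(y, axis=0)         # shape (1, 10)
Y = np.concatenate([y_row] * 32, axis=0)  # shape (32, 10)
explicit_sum = X + Y

# Broadcast version: NumPy virtually repeats y across the 32 rows.
broadcast_sum = X + y

same_result = np.allclose(explicit_sum, broadcast_sum)
print(Y.shape, same_result)
```

Broadcasting gives the same values without materializing the repeated copy of `y` in memory.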

In [20]:
def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x

In [21]:
import numpy as np
x = np.random.random((64, 3, 32, 10))
y = np.random.random((32, 10))
z = np.maximum(x, y)

### Tensor product

In [22]:
x = np.random.random((32,))
y = np.random.random((32,))
z = np.dot(x, y)

In [39]:
print(x, y, z)  # x prints as the (6, 1) array because this cell ran after the reshape cell In [34]

[[0.]
 [1.]
 [2.]
 [3.]
 [4.]
 [5.]] [0.4838554  0.42519959 0.14154154 0.40411423 0.87968121 0.69493157
 0.8628214  0.62949901 0.1320912  0.31374721 0.29386636 0.0033318
 0.88382601 0.71513008 0.87990015 0.253461   0.09705869 0.37062266
 0.86624332 0.70522962 0.40423714 0.61778254 0.71540841 0.90980243
 0.31293472 0.41430472 0.98299368 0.48213162 0.87074648 0.40602908
 0.20935942 0.47639317] 9.064663777320533


In [23]:
def naive_vector_dot(x, y):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    assert x.shape[0] == y.shape[0]
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

In [24]:
def naive_matrix_vector_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i] += x[i, j] * y[j]
    return z

In [25]:
def naive_matrix_vector_dot(x, y):
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        z[i] = naive_vector_dot(x[i, :], y)
    return z

In [26]:
def naive_matrix_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 2
    assert x.shape[1] == y.shape[0]
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z
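The shape rule enforced by the assertions above can also be seen directly with `np.dot`. The following sketch uses small hand-picked matrices (the names `a`, `b`, `c`, and `entry_00` are illustrative) and checks one entry against the row-times-column definition.

```python
import numpy as np

a = np.arange(6.).reshape(2, 3)    # shape (2, 3)
b = np.arange(12.).reshape(3, 4)   # shape (3, 4)

# Inner dimensions match (3 == 3), so the product has shape (2, 4).
c = np.dot(a, b)

# Entry [0, 0] is the dot product of row 0 of a and column 0 of b.
entry_00 = sum(a[0, k] * b[k, 0] for k in range(3))

# Swapping the operands gives (3, 4) x (2, 3): inner dimensions differ.
try:
    np.dot(b, a)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(c.shape, c[0, 0], entry_00, mismatch_raised)
```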

### Tensor reshaping

In [32]:
train_images = train_images.reshape((60000, 28 * 28))

In [37]:
train_images.shape

(60000, 784)

In [33]:
x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])
x.shape

(3, 2)

In [34]:
x = x.reshape((6, 1))
x

array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [5.]])

In [30]:
x = np.zeros((300, 20))
x = np.transpose(x)
x.shape

(20, 300)

### Geometric interpretation of tensor operations

### A geometric interpretation of deep learning

## The engine of neural networks: gradient-based optimization

### What's a derivative?
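A derivative can be approximated numerically as the slope between two nearby points. The sketch below is one way to do that, assuming a central-difference formula and a hypothetical helper named `numerical_derivative`; it is not code from the book.

```python
def f(x):
    # Example function: f(x) = x^2, whose exact derivative is 2x.
    return x ** 2

def numerical_derivative(func, x, epsilon=1e-6):
    # Central difference: slope of the line through (x - eps) and (x + eps).
    return (func(x + epsilon) - func(x - epsilon)) / (2 * epsilon)

slope_at_3 = numerical_derivative(f, 3.0)
print(f"approximate slope at x = 3: {slope_at_3:.4f}")
```

The approximation should land very close to the exact value 2 * 3 = 6.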

### Derivative of a tensor operation: the gradient

### Stochastic gradient descent
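To make the update rule concrete, here is a minimal mini-batch SGD sketch in NumPy that fits a line to synthetic data. The data, the learning rate, the batch size, and the step count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noiseless data generated from y = 3x + 2.
inputs = rng.uniform(-1, 1, size=(1000,))
targets = 3.0 * inputs + 2.0

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.1
batch_size = 32

for step in range(500):
    # Draw a random mini-batch of samples.
    idx = rng.integers(0, len(inputs), size=batch_size)
    x_batch, y_batch = inputs[idx], targets[idx]

    # Mean squared error on the batch and its gradients.
    error = w * x_batch + b - y_batch
    grad_w = 2 * np.mean(error * x_batch)
    grad_b = 2 * np.mean(error)

    # Move each parameter a small step against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")
```

Each step uses only a small random batch, yet the parameters still drift toward the values that generated the data.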

### Chaining derivatives: The Backpropagation algorithm

#### The chain rule
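The chain rule states that the derivative of a composition f(g(x)) is f'(g(x)) * g'(x). The sketch below checks this on one hand-picked pair of functions against a finite-difference estimate; the functions and the evaluation point are arbitrary choices for illustration.

```python
import math

def g(x):
    return 3 * x + 1       # g'(x) = 3

def f(u):
    return math.sin(u)     # f'(u) = cos(u)

def fg(x):
    return f(g(x))         # the composition f(g(x))

x0 = 0.5

# Chain rule: (f o g)'(x0) = f'(g(x0)) * g'(x0)
analytic = math.cos(g(x0)) * 3

# Independent numerical estimate via central difference.
eps = 1e-6
numeric = (fg(x0 + eps) - fg(x0 - eps)) / (2 * eps)

print(analytic, numeric)
```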

#### Automatic differentiation with computation graphs

#### The gradient tape in TensorFlow

In [51]:
import tensorflow as tf
x = tf.Variable(0.)  # Make x a tf.Variable; the trailing dot in 0. makes it a float
with tf.GradientTape() as tape:
    y = 2 * x + 3
grad_of_y_wrt_x = tape.gradient(y, x)

In [52]:
x

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.0>

In [53]:
grad_of_y_wrt_x

<tf.Tensor: shape=(), dtype=float32, numpy=2.0>

In [54]:
x = tf.Variable(tf.random.uniform((2, 2)))
with tf.GradientTape() as tape:
    y = 2 * x + 3
grad_of_y_wrt_x = tape.gradient(y, x)

In [55]:
x

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[0.9569689 , 0.41107738],
       [0.94506276, 0.7183237 ]], dtype=float32)>

In [56]:
grad_of_y_wrt_x

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 2.],
       [2., 2.]], dtype=float32)>

In [57]:
W = tf.Variable(tf.random.uniform((2, 2)))
b = tf.Variable(tf.zeros((2,)))
x = tf.random.uniform((2, 2))
with tf.GradientTape() as tape:
    y = tf.matmul(x, W) + b
grad_of_y_wrt_W_and_b = tape.gradient(y, [W, b])

In [58]:
W

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[0.5141969 , 0.7400174 ],
       [0.02545142, 0.94190454]], dtype=float32)>

In [59]:
b

<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>

In [60]:
x

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[0.20510805, 0.2720821 ],
       [0.44549358, 0.9475393 ]], dtype=float32)>

In [61]:
grad_of_y_wrt_W_and_b

[<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
 array([[0.6506016, 0.6506016],
        [1.2196214, 1.2196214]], dtype=float32)>,
 <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 2.], dtype=float32)>]

In [77]:
x = tf.Variable(2.)
y = tf.Variable(1.)
with tf.GradientTape() as tape:
    z = x * x * y + x * y + 3 * y
grad_of_z_wrt_xy = tape.gradient(z, [x, y])

In [78]:
grad_of_z_wrt_xy

[<tf.Tensor: shape=(), dtype=float32, numpy=5.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=9.0>]
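The two gradient values above can be confirmed by differentiating z = x²y + xy + 3y by hand. The sketch below evaluates the resulting partial derivatives in plain Python at the same point, x = 2 and y = 1.

```python
x, y = 2.0, 1.0

# z = x*x*y + x*y + 3*y
dz_dx = 2 * x * y + y    # partial derivative with respect to x
dz_dy = x * x + x + 3    # partial derivative with respect to y

print(dz_dx, dz_dy)
```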

In [71]:
x = tf.constant(np.array([1.,4.,3.]).reshape(1,3), dtype=tf.float32)
W = tf.Variable(tf.random.uniform((3, 2)), dtype=tf.float32)
b = tf.Variable(tf.zeros((2,)), dtype=tf.float32)

with tf.GradientTape() as tape:
    z = tf.matmul(x, W) + b
grad_of_z_wrt_W_and_b = tape.gradient(z, [W, b])

In [89]:
W

<tf.Variable 'Variable:0' shape=(3, 2) dtype=float32, numpy=
array([[0.548612  , 0.07097495],
       [0.7577739 , 0.37923038],
       [0.14547026, 0.43312883]], dtype=float32)>

In [90]:
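x  # Shows the scalar Variable from In [77], which ran after the cell that defined the (1, 3) constant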

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>

In [91]:
b

<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>

In [72]:
grad_of_z_wrt_W_and_b

[<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
 array([[1., 1.],
        [4., 4.],
        [3., 3.]], dtype=float32)>,
 <tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 1.], dtype=float32)>]
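These gradients can be reproduced by hand. When the target passed to `tape.gradient` is not a scalar, TensorFlow differentiates the sum of its elements, so the upstream gradient with respect to `z` is a matrix of ones. The NumPy sketch below follows that reasoning; the names `upstream`, `grad_W`, and `grad_b` are illustrative.

```python
import numpy as np

x = np.array([[1., 4., 3.]])   # same input as above, shape (1, 3)

# Gradient of sum(z) with respect to z: one for every element of z.
upstream = np.ones((1, 2))

# For z = x @ W + b:
grad_W = x.T @ upstream        # shape (3, 2): each row is x[0, i] repeated
grad_b = upstream.sum(axis=0)  # shape (2,): sum over the batch axis

print(grad_W)
print(grad_b)
```

Each row of `grad_W` repeats the corresponding input value, which matches the pattern in the TensorFlow output.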

## Looking back at our first example

In [69]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

In [74]:
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])

In [75]:
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

In [76]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7dde542b9150>

### Reimplementing our first example from scratch in TensorFlow

#### A simple Dense class

In [79]:
import tensorflow as tf

class NaiveDense:
    def __init__(self, input_size, output_size, activation):
        self.activation = activation

        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)

        b_shape = (output_size,)
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    def __call__(self, inputs):
        return self.activation(tf.matmul(inputs, self.W) + self.b)

    @property
    def weights(self):
        return [self.W, self.b]

#### A simple Sequential class

In [80]:
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights

In [81]:
model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
])
assert len(model.weights) == 4

#### A batch generator

In [82]:
import math

class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)

    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels
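The `math.ceil` call matters because the dataset size need not be a multiple of the batch size. A short arithmetic sketch for the 60,000 MNIST training images and the default batch size of 128:

```python
import math

num_samples = 60000
batch_size = 128

# Round up so the final, smaller batch is not dropped.
num_batches = math.ceil(num_samples / batch_size)

# Samples left over for the last batch after all full batches.
last_batch_size = num_samples - (num_batches - 1) * batch_size

print(num_batches, last_batch_size)
```

The slicing in `next` handles the short final batch naturally, since slicing past the end of an array simply returns the remaining elements.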

### Running one training step

In [83]:
def one_training_step(model, images_batch, labels_batch):
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)
        average_loss = tf.reduce_mean(per_sample_losses)
    gradients = tape.gradient(average_loss, model.weights)
    update_weights(gradients, model.weights)
    return average_loss

In [84]:
learning_rate = 1e-3

def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate)

In [85]:
from tensorflow.keras import optimizers

optimizer = optimizers.SGD(learning_rate=1e-3)

def update_weights(gradients, weights):
    optimizer.apply_gradients(zip(gradients, weights))

### The full training loop

In [86]:
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
        batch_generator = BatchGenerator(images, labels, batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f"loss at batch {batch_counter}: {loss:.2f}")

In [87]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

fit(model, train_images, train_labels, epochs=10, batch_size=128)

Epoch 0
loss at batch 0: 3.52
loss at batch 100: 2.20
loss at batch 200: 2.17
loss at batch 300: 2.09
loss at batch 400: 2.19
Epoch 1
loss at batch 0: 1.88
loss at batch 100: 1.85
loss at batch 200: 1.81
loss at batch 300: 1.72
loss at batch 400: 1.80
Epoch 2
loss at batch 0: 1.57
loss at batch 100: 1.56
loss at batch 200: 1.50
loss at batch 300: 1.45
loss at batch 400: 1.49
Epoch 3
loss at batch 0: 1.31
loss at batch 100: 1.33
loss at batch 200: 1.24
loss at batch 300: 1.23
loss at batch 400: 1.26
Epoch 4
loss at batch 0: 1.11
loss at batch 100: 1.15
loss at batch 200: 1.04
loss at batch 300: 1.07
loss at batch 400: 1.10
Epoch 5
loss at batch 0: 0.97
loss at batch 100: 1.01
loss at batch 200: 0.90
loss at batch 300: 0.95
loss at batch 400: 0.98
Epoch 6
loss at batch 0: 0.86
loss at batch 100: 0.90
loss at batch 200: 0.80
loss at batch 300: 0.86
loss at batch 400: 0.89
Epoch 7
loss at batch 0: 0.78
loss at batch 100: 0.82
loss at batch 200: 0.72
loss at batch 300: 0.78
loss at batch 40

### Evaluating the model

In [92]:
model.weights

[<tf.Variable 'Variable:0' shape=(784, 512) dtype=float32, numpy=
 array([[0.07864714, 0.03484159, 0.09506575, ..., 0.09952944, 0.07237871,
         0.00384505],
        [0.00768822, 0.00599482, 0.08059927, ..., 0.00331527, 0.00858729,
         0.07468941],
        [0.00881873, 0.05860001, 0.05719827, ..., 0.03366283, 0.03860359,
         0.00185711],
        ...,
        [0.03444393, 0.01451908, 0.07246784, ..., 0.03542503, 0.07106637,
         0.05002553],
        [0.06649014, 0.09289093, 0.08591646, ..., 0.07284045, 0.04208571,
         0.06512765],
        [0.09155259, 0.01088053, 0.09200063, ..., 0.09400503, 0.0901578 ,
         0.01700965]], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(512,) dtype=float32, numpy=
 array([ 3.82751413e-03,  3.21786548e-03,  8.09865363e-04, -5.87682752e-03,
         2.38325121e-03, -5.11128828e-03, -3.50361341e-03, -1.09110288e-02,
        -6.84074312e-03, -5.09870332e-03, -5.05737448e-03,  6.21566433e-04,
        -4.79530729e-03, -8.57797451e

In [88]:
predictions = model(test_images)
predictions = predictions.numpy()
predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == test_labels
print(f"accuracy: {matches.mean():.2f}")

accuracy: 0.82


In [94]:
NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu).weights

## Summary