## Question 1

The output of the network is given by:

$y = \text{ReLU}(w_1 \cdot 1 + w_2 \cdot x_1)$

**Given:**
- Weight vector: **w = [w₁, w₂] = [4, 1]**
- Input: **x₁ = 3**
- Activation function: **ReLU(x) = max(0, x)**
- True label: **y_real = 9**

---

### a) Network prediction for input x₁ = 3

Substitute the values into the formula:

$z = w_1 \cdot 1 + w_2 \cdot x_1 = 4 \cdot 1 + 1 \cdot 3 = 4 + 3 = 7$

Apply the ReLU activation:

$y = \text{ReLU}(z) = \max(0, 7) = 7$

🔸 **Network prediction:** 7

---

### b) Mean Squared Error (MSE)

The quadratic error is computed as:

$\text{Error} = \frac{1}{2}(y_{\text{real}} - y_{\text{predicted}})^2$


Substitute the values:

$\text{Error} = \frac{1}{2}(9 - 7)^2 = \frac{1}{2} \cdot (2)^2 = \frac{1}{2} \cdot 4 = 2$

🔸 **Squared error (with the $\frac{1}{2}$ factor):** $\boxed{2}$
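The two steps above can be checked with a few lines of plain Python. This is only a minimal sketch: the helper name `relu` and the variable names are chosen here for illustration, and the error uses the same $\frac{1}{2}$ convention as the formula above.

```python
def relu(z: float) -> float:
    """ReLU activation: max(0, z)."""
    return max(0.0, z)

w1, w2 = 4.0, 1.0      # weights from the problem statement
x1, y_real = 3.0, 9.0  # input and true label

y_pred = relu(w1 * 1 + w2 * x1)        # forward pass: ReLU(4 + 3)
error = 0.5 * (y_real - y_pred) ** 2   # half squared error

print(y_pred, error)
```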


## Question 2 

A `MaxPooling2D` layer is initialized in Keras with:

    layers.MaxPooling2D(pool_size=(3, 3), padding='same', strides=(2, 1))


The input tensor `Z` has shape $4 \times 4$ and is defined as:

$$
Z = \begin{bmatrix}
1 & 0 & 3 & 0 \\
-5 & 2 & 0 & 4 \\
3 & 18 & 5 & 0 \\
1 & 0 & 0 & -2
\end{bmatrix}
$$

---

### Computing output shape

Given:

* Input size: $4 \times 4$
* Kernel size: $3 \times 3$
* Strides: $2, 1$
* Padding: `'same'`

Using the formula from https://mmuratarat.github.io/2019-01-17/implementing-padding-schemes-of-tensorflow-in-python :

$$
\text{output size($H_2$ and $W_2$)} = \left\lceil \frac{\text{input size}}{\text{stride}} \right\rceil
$$

* $H_2 = \lceil 4 / 2 \rceil = 2$
* $W_2 = \lceil 4 / 1 \rceil = 4$

---

### Applying MaxPooling with padding

With `'same'` padding, the total padding along each axis depends on whether the input size is divisible by the stride. Here $H_1, W_1$ are the input height and width, $F_h, F_w$ the kernel (pool) sizes, and $S_h, S_w$ the strides.

The total padding needed is:

- **Height padding ($P_h$):**

  If $H_1 \bmod S_h = 0$:

  $$
  P_h = \max(F_h - S_h,\, 0)
  $$

  Else:

  $$
  P_h = \max(F_h - (H_1 \bmod S_h),\, 0)
  $$

- **Width padding ($P_w$):**

  If $W_1 \bmod S_w = 0$:

  $$
  P_w = \max(F_w - S_w,\, 0)
  $$

  Else:

  $$
  P_w = \max(F_w - (W_1 \bmod S_w),\, 0)
  $$

Here $4 \bmod 2 = 0$ and $4 \bmod 1 = 0$, so the first case applies on both axes:

- **Height padding ($P_h$):**
  $$
  P_h = \max\left((3 - 2),\ 0\right) = \max(1,\ 0) = 1
  $$
- **Width padding ($P_w$):**
  $$
  P_w = \max\left((3 - 1),\ 0\right) = \max(2,\ 0) = 2
  $$

**Padding calculation details:**

- Top padding ($P_{t}$): $\left\lfloor \dfrac{P_h}{2} \right\rfloor = 0$
- Bottom padding: $P_h - P_{t} = 1$
- Left padding ($P_{l}$): $\left\lfloor \dfrac{P_w}{2} \right\rfloor = 1$
- Right padding: $P_w - P_{l} = 1$

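*When the total padding is odd, the bottom and right sides get the one additional padded row or column.*

Zeros are used for the padded cells below for readability. TensorFlow's max pooling effectively ignores padded cells (as if they were $-\infty$), which gives the same result here because every window contains a positive entry.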
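The padding rule above can be sketched as a small helper for one spatial axis. The function name `same_padding` is hypothetical (it is not a TensorFlow API); it simply encodes the two-case formula and the floor-based split described above.

```python
def same_padding(in_size: int, kernel: int, stride: int) -> tuple[int, int, int]:
    """Return (out_size, pad_before, pad_after) for one axis with 'same' padding."""
    out_size = -(-in_size // stride)  # integer form of ceil(in_size / stride)
    if in_size % stride == 0:
        total = max(kernel - stride, 0)
    else:
        total = max(kernel - in_size % stride, 0)
    pad_before = total // 2          # top or left
    pad_after = total - pad_before   # bottom or right gets the extra cell
    return out_size, pad_before, pad_after

height = same_padding(4, 3, 2)  # input 4, pool 3, stride 2
width = same_padding(4, 3, 1)   # input 4, pool 3, stride 1
print(height, width)
```

For this question the helper returns an output size of 2 with padding (0, 1) for the height and an output size of 4 with padding (1, 1) for the width, matching the values listed above.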

The padded input becomes:

$$
Z_{\text{padded}} =
\begin{bmatrix}
0 & 1 & 0 & 3 & 0 & 0 \\
0 & -5 & 2 & 0 & 4 & 0 \\
0 & 3 & 18 & 5 & 0 & 0 \\
0 & 1 & 0 & 0 & -2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
\end{bmatrix}
$$

---

###  Computing each window's max

We slide a $3 \times 3$ window with stride (2, 1).

#### Row 0:

* **(0,0):** max of

  $$
  \begin{bmatrix}
  0 & 1 & 0 \\
  0 & -5 & 2 \\
  0 & 3 & 18
  \end{bmatrix}
  = 18
  $$

* **(0,1):**

  $$
  \begin{bmatrix}
  1 & 0 & 3 \\
  -5 & 2 & 0 \\
  3 & 18 & 5
  \end{bmatrix}
  = 18
  $$

* **(0,2):**

  $$
  \begin{bmatrix}
  0 & 3 & 0 \\
  2 & 0 & 4 \\
  18 & 5 & 0
  \end{bmatrix}
  = 18
  $$

* **(0,3):**

  $$
  \begin{bmatrix}
  3 & 0 & 0 \\
  0 & 4 & 0 \\
  5 & 0 & 0
  \end{bmatrix}
  = 5
  $$

#### Row 1:

* **(1,0):**

  $$
  \begin{bmatrix}
  0 & 3 & 18 \\
  0 & 1 & 0 \\
  0 & 0 & 0
  \end{bmatrix}
  = 18
  $$

* **(1,1):**

  $$
  \begin{bmatrix}
  3 & 18 & 5 \\
  1 & 0 & 0 \\
  0 & 0 & 0
  \end{bmatrix}
  = 18
  $$

* **(1,2):**

  $$
  \begin{bmatrix}
  18 & 5 & 0 \\
  0 & 0 & -2 \\
  0 & 0 & 0
  \end{bmatrix}
  = 18
  $$

* **(1,3):**

  $$
  \begin{bmatrix}
  5 & 0 & 0 \\
  0 & -2 & 0 \\
  0 & 0 & 0
  \end{bmatrix}
  = 5
  $$

---

### Final Output Tensor:

$$
\text{Output} =
\begin{bmatrix}
18 & 18 & 18 & 5 \\
18 & 18 & 18 & 5
\end{bmatrix}
$$

* **Shape:** $(2, 4)$
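The hand calculation can be cross-checked with a short NumPy sketch. It does not call Keras; it pads `Z` by hand with the amounts derived above (0 top, 1 bottom, 1 left, 1 right) and slides the window explicitly. As an assumption of this sketch, the padded cells are filled with $-\infty$ rather than 0 so that they can never win the maximum.

```python
import numpy as np

Z = np.array([[ 1,  0, 3,  0],
              [-5,  2, 0,  4],
              [ 3, 18, 5,  0],
              [ 1,  0, 0, -2]], dtype=float)

pool_h, pool_w = 3, 3
stride_h, stride_w = 2, 1

# ((top, bottom), (left, right)) padding from the calculation above
Z_padded = np.pad(Z, ((0, 1), (1, 1)), constant_values=-np.inf)

out_h, out_w = 2, 4
pooled = np.empty((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        r, c = i * stride_h, j * stride_w  # top-left corner of the window
        pooled[i, j] = Z_padded[r:r + pool_h, c:c + pool_w].max()

print(pooled)
```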

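### Code for Question 2

The cell below runs Keras on a different example (a $5 \times 5$ pool with stride $(2, 2)$ on another $4 \times 4$ input) to check the `'same'` padding logic.
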
In [2]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import MaxPooling2D

# Define the input tensor Z
Z = np.array([
    [1,  110,  3,  10],
    [170, 2,  34,  -22],
    [3, 1, -255,  220],
    [1,  -23,  45, 27]
], dtype=np.float32)

# Reshape to (batch_size, height, width, channels)
Z = Z.reshape((1, 4, 4, 1))

# Define MaxPooling2D layer 
max_pool = MaxPooling2D(pool_size=(5, 5), strides=(2, 2), padding='same')

# Apply max pooling
output = max_pool(Z)

# --- Visualizing the padded matrix ---

# With padding='same' (4x4 input, 5x5 kernel, 2x2 stride), the padded matrix
# must be 7x7 so that the 5x5 kernel window fits twice along each axis.
# Total padding (height): (2-1)*2 + 5 - 4 = 3
# Top padding = floor(3/2) = 1
# Bottom padding = 3 - 1 = 2
# The same logic applies to the width.

# Manual padding for visualization, one [before, after] pair per axis:
# batch, height, width, channels
paddings = tf.constant([[0, 0], [1, 2], [1, 2], [0, 0]])

# Pad the original Z with -inf so that padded cells never win the max
Z_padded = tf.pad(Z, paddings, "CONSTANT", constant_values=-np.inf)

# Show the padded matrix
print("\nPadded matrix Z:")
print(Z_padded.numpy().squeeze())

# Show the output shape and values
print("Output (shape):", output.shape)
print("Output (values):\n", output.numpy().squeeze())



Padded matrix Z:
[[ -inf  -inf  -inf  -inf  -inf  -inf  -inf]
 [ -inf    1.  110.    3.   10.  -inf  -inf]
 [ -inf  170.    2.   34.  -22.  -inf  -inf]
 [ -inf    3.    1. -255.  220.  -inf  -inf]
 [ -inf    1.  -23.   45.   27.  -inf  -inf]
 [ -inf  -inf  -inf  -inf  -inf  -inf  -inf]
 [ -inf  -inf  -inf  -inf  -inf  -inf  -inf]]
Output (shape): (1, 2, 2, 1)
Output (values):
 [[220. 220.]
 [220. 220.]]


## Question 3

The input tensor of a convolutional neural network has shape **(4, 4, 2)**, where:
- **4 x 4** = height x width
- **2** = depth (channels)

The first hidden layer is a **convolutional layer** with:
- **2 filters**, each with a $2 \times 2$ kernel
- **Stride = (1, 2)**
- **No padding** (`padding='valid'`)
- **Activation: ReLU**

---


### a)  Number of trainable parameters

Each filter has:
- Two ($2 \times 2$) kernels (one per input channel): $2 \times 2 \times 2 = 8$
- One bias

So:
- Per filter: **9 parameters**
- Total for 2 filters:  

$2 \times (2 \times 2 \times 2  + 1) = \boxed{18 \text{ trainable parameters}}$

---

### b) Output tensor shape

- Input size: $4 \times 4$
- Kernel size: $2 \times 2$
- Stride: (1, 2)
- Padding: `'valid'`

Using the formula from https://mmuratarat.github.io/2019-01-17/implementing-padding-schemes-of-tensorflow-in-python for `'valid'` padding (no zero padding added to the input):

$$H_2 = \left\lceil \frac{H_1 - F_h + 1}{S_h} \right\rceil = \left\lceil \frac{4 - 2 + 1}{1} \right\rceil = 3$$
$$W_2 = \left\lceil \frac{W_1 - F_w + 1}{S_w} \right\rceil = \left\lceil \frac{4 - 2 + 1}{2} \right\rceil = \lceil 1.5 \rceil = 2$$

- Filters: 2  
**Output shape: (3, 2, 2)** (height, width, filters)

---

### c) Computing output tensor values

### Input:

$$
X_1 =
\begin{bmatrix}
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix},
\quad
X_2 =
\begin{bmatrix}
0 & 0 & 2 & 2 \\
2 & 0 & 0 & 0 \\
2 & 1 & 0 & 0 \\
2 & 0 & 0 & 2
\end{bmatrix}
$$

### Filters:

**Filter 1** ($F_1$ is applied to channel $X_1$, $F_2$ to channel $X_2$):

$$
F_1 =
\begin{bmatrix}
1 & -1 \\
0 & 1
\end{bmatrix},
\quad
F_2 =
\begin{bmatrix}
-1 & 1 \\
1 & 0
\end{bmatrix},
\quad \text{bias} = -3
$$

**Filter 2** ($G_1$ is applied to channel $X_1$, $G_2$ to channel $X_2$):

$$
G_1 =
\begin{bmatrix}
1 & 0 \\
1 & 1
\end{bmatrix},
\quad
G_2 =
\begin{bmatrix}
2 & 0 \\
0 & 1
\end{bmatrix},
\quad \text{bias} = -5
$$

---

### Stride = (1, 2), Kernel = (2, 2)

**Each boxed value below is the final output element, after adding the bias and applying the ReLU activation function, so a negative pre-activation is reported as 0.**

**Output shape:** $(3, 2, 2)$

---

### 🔹 Output[0, 0]

$$
X_1 =
\begin{bmatrix}
0 & 0 \\
0 & 1
\end{bmatrix},
\quad
X_2 =
\begin{bmatrix}
0 & 0 \\
2 & 0
\end{bmatrix}
$$

- **Filter 1:**

$0 - 0 + 0 + 1 + (-0 + 0 + 2 + 0) - 3 = 1 + 2 - 3 = \boxed{0}$

- **Filter 2:**

$(0 + 0 + 0 + 1) + (0 + 0 + 0 + 0) - 5 = 1 + 0 - 5 = -4 \;\Rightarrow\; \text{ReLU}(-4) = \boxed{0}$

---

### 🔹 Output[0, 1]

$$
X_1 =
\begin{bmatrix}
1 & 0 \\
0 & 0
\end{bmatrix},
\quad
X_2 =
\begin{bmatrix}
2 & 2 \\
0 & 0
\end{bmatrix}
$$

- **Filter 1:**

$1 - 0 + 0 + 0 + (-2 + 2 + 0 + 0) - 3 = 1 + 0 - 3 = -2 \;\Rightarrow\; \text{ReLU}(-2) = \boxed{0}$

- **Filter 2:**

$1 + 0 + 0 + 0 + (4 + 0 + 0 + 0) - 5 = 5 - 5 = \boxed{0}$

---

### 🔹 Output[1, 0]

$$
X_1 =
\begin{bmatrix}
0 & 1 \\
1 & 1
\end{bmatrix},
\quad
X_2 =
\begin{bmatrix}
2 & 0 \\
2 & 1
\end{bmatrix}
$$

- **Filter 1:**

$0 - 1 + 0 + 1 + (-2 + 0 + 2 + 0) - 3 = 0 + 0 - 3 = -3 \;\Rightarrow\; \text{ReLU}(-3) = \boxed{0}$

- **Filter 2:**

$0 + 0 + 1 + 1 + (4 + 0 + 0 + 1) - 5 = 2 + 5 - 5 = \boxed{2}$

---

### 🔹 Output[1, 1]

$$
X_1 =
\begin{bmatrix}
0 & 0 \\
1 & 0
\end{bmatrix},
\quad
X_2 =
\begin{bmatrix}
0 & 0 \\
0 & 0
\end{bmatrix}
$$

- **Filter 1:**

$0 - 0 + 0 + 0 + (-0 + 0 + 0 + 0) - 3 = -3 \;\Rightarrow\; \text{ReLU}(-3) = \boxed{0}$

- **Filter 2:**

$0 + 0 + 1 + 0 + (0 + 0 + 0 + 0) - 5 = 1 - 5 = -4 \;\Rightarrow\; \text{ReLU}(-4) = \boxed{0}$

---

### 🔹 Output[2, 0]

$$
X_1 =
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix},
\quad
X_2 =
\begin{bmatrix}
2 & 1 \\
2 & 0
\end{bmatrix}
$$

- **Filter 1:**

$1 - 1 + 0 + 0 + (-2 + 1 + 2 + 0) - 3 = 0 + 1 - 3 = -2 \;\Rightarrow\; \text{ReLU}(-2) = \boxed{0}$

- **Filter 2:**

$1 + 0 + 1 + 0 + 4 - 5 = 6 - 5 = \boxed{1}$

---

### 🔹 Output[2, 1]

$$
X_1 =
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix},
\quad
X_2 =
\begin{bmatrix}
0 & 0 \\
0 & 2
\end{bmatrix}
$$

- **Filter 1:**

$(1 - 0 + 0 + 1) + (-0 + 0 + 0 + 0) - 3 = 2 + 0 - 3 = -1 \;\Rightarrow\; \text{ReLU}(-1) = \boxed{0}$

- **Filter 2:**

$(1 + 0 + 0 + 1) + (0 + 0 + 0 + 2) - 5 = 4 - 5 = -1 \;\Rightarrow\; \text{ReLU}(-1) = \boxed{0}$

---

### Final Output Tensor:

Writing each output channel (one per filter) as its own $3 \times 2$ matrix:

$$
\text{Output}[:, :, 0] =
\begin{bmatrix}
0 & 0 \\
0 & 0 \\
0 & 0
\end{bmatrix}
\ (\text{Filter 1}),
\quad
\text{Output}[:, :, 1] =
\begin{bmatrix}
0 & 0 \\
2 & 0 \\
1 & 0
\end{bmatrix}
\ (\text{Filter 2})
$$
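The six window computations above can be reproduced with a short NumPy sketch. It is written with explicit loops rather than a Keras layer, and the names `filter_1`, `filter_2`, `biases`, and `output` are chosen here for illustration. Like the hand calculation, it multiplies each window element-wise with the kernel (no kernel flip), adds the bias, and applies ReLU.

```python
import numpy as np

# Input channels stacked to shape (4, 4, 2)
X1 = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 1]])
X2 = np.array([[0, 0, 2, 2], [2, 0, 0, 0], [2, 1, 0, 0], [2, 0, 0, 2]])
X = np.stack([X1, X2], axis=-1)

# Each filter has shape (2, 2, 2): kernel height, kernel width, input channel
filter_1 = np.stack([np.array([[1, -1], [0, 1]]), np.array([[-1, 1], [1, 0]])], axis=-1)
filter_2 = np.stack([np.array([[1, 0], [1, 1]]), np.array([[2, 0], [0, 1]])], axis=-1)
filters = [filter_1, filter_2]
biases = [-3, -5]

stride_h, stride_w = 1, 2
out_h = (4 - 2) // stride_h + 1  # 'valid' padding
out_w = (4 - 2) // stride_w + 1
output = np.zeros((out_h, out_w, len(filters)))

for k, (f, b) in enumerate(zip(filters, biases)):
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride_h, j * stride_w
            patch = X[r:r + 2, c:c + 2, :]                     # 2x2 window, both channels
            output[i, j, k] = max(0, np.sum(patch * f) + b)    # bias, then ReLU

print(output.shape)
print(output[:, :, 0])  # Filter 1
print(output[:, :, 1])  # Filter 2
```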



## Question 4 

### Given:

Let:

- $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ be the input.
- Bias is included by prepending a $1$:  
  $\bar{x} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}$

- Activation function: $\Phi(x)$ is ReLU:
  $$
  \Phi(x) = \max(0, x)
  $$

### Weight Matrices:

$$
W_1 =
\begin{bmatrix}
1 & 1 & -3 \\
1 & -1 & 2 \\
-1 & 1 & 1 \\
1 & -1 & -1
\end{bmatrix},
\quad
W_2 =
\begin{bmatrix}
4 & -5 \\
-4 & 5 \\
3 & 1 \\
2 & 7
\end{bmatrix}
$$

---

### a) Network Topology

#### 1. Input Dimensions ($\bar{x}$)

- The operation is $W_1^T \cdot \bar{x}$.
- $W_1$ has dimension $4 \times 3$.
- Its transpose, $W_1^T$, has dimension $3 \times 4$.
- For the matrix multiplication $(3 \times 4) \cdot (\text{dim of } \bar{x})$ to be valid, $\bar{x}$ must be a $4 \times 1$ vector.
- Since $\bar{x} = [1, x_1, x_2, x_3]^T$, the input vector has 3 features plus 1 bias.

**Input dimension:** $4 \times 1$ (1 for bias + 3 for data).

---

#### 2. Hidden Layer Dimensions ($h_1$)

- The result of $W_1^T \cdot \bar{x}$ is a $(3 \times 4) \cdot (4 \times 1) \Rightarrow 3 \times 1$ vector.
- This means the first hidden layer has 3 neurons.

**Hidden layer dimension:** $3 \times 1$ (3 neurons); with the bias unit prepended, the vector passed to the next layer is $4 \times 1$.

---

#### 3. Output Layer Dimensions ($\bar{o}$)

- The input to the second layer is the output vector from the first layer, $h_1$, with bias added. So, the vector feeding the second layer is $4 \times 1$ (1 for bias + 3 from $h_1$).
- The operation is $W_2^T \cdot h_1$ (with bias).
- $W_2$ has dimension $4 \times 2$.
- Its transpose, $W_2^T$, has dimension $2 \times 4$.
- The result of $W_2^T \cdot h_1$ is a $(2 \times 4) \cdot (4 \times 1) \Rightarrow 2 \times 1$ vector.

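**Output dimension:** $2 \times 1$.

#### 4. Neural Network

In summary, the topology is 3 inputs (plus bias) → 3 hidden ReLU neurons (plus bias) → 2 ReLU output neurons. The network topology diagram was provided in the classroom along with the .ipynb file.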

---

### b) Compute Output for $\bar{x} = [1, 0, 0, 0]^T$

$$
\bar{x} =
\begin{bmatrix}
1 \\
0 \\
0 \\
0
\end{bmatrix}
$$

**Step 1: Compute the output of the first hidden layer ($h_1$)**

First, calculate the pre-activation value, $z_1$:

$$
z_1 = W_1^T \cdot \bar{x}
$$


Multiplying:

$$
z_1 =
\begin{bmatrix}
1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 \\
-3 & 2 & 1 & -1
\end{bmatrix}
\begin{bmatrix}
1 \\
0 \\
0 \\
0
\end{bmatrix}
=
\begin{bmatrix}
1 \\
1 \\
-3
\end{bmatrix}
$$

Applying the ReLU activation function:

$$
h_1 = \Phi(z_1) = \max(0, z_1) =
\begin{bmatrix}
1 \\
1 \\
0
\end{bmatrix}
$$

**Step 2: Compute the final output ($\bar{o}$)**

Add the bias to the $h_1$ vector:

$$
\bar{h}_1 =
\begin{bmatrix}
1 \\
1 \\
1 \\
0
\end{bmatrix}
$$

Now, calculate the pre-activation of the output layer:

$$
z_2 = W_2^T \cdot \bar{h}_1
$$

Recall:

$$
W_2^T =
\begin{bmatrix}
4 & -4 & 3 & 2 \\
-5 & 5 & 1 & 7
\end{bmatrix}
$$

Multiplying:

$$
z_2 =
\begin{bmatrix}
4 & -4 & 3 & 2 \\
-5 & 5 & 1 & 7
\end{bmatrix}
\begin{bmatrix}
1 \\
1 \\
1 \\
0
\end{bmatrix}
=
\begin{bmatrix}
4 \cdot 1 + (-4) \cdot 1 + 3 \cdot 1 + 2 \cdot 0 \\
-5 \cdot 1 + 5 \cdot 1 + 1 \cdot 1 + 7 \cdot 0
\end{bmatrix}
=
\begin{bmatrix}
3 \\
1
\end{bmatrix}
$$

Applying the ReLU activation function:

$$
\bar{o} = \Phi(z_2) = \max(0, z_2) =
\begin{bmatrix}
3 \\
1
\end{bmatrix}
$$

**Final Answer:**  
The output of the neural network for an input where all elements are zero ($x_1 = x_2 = x_3 = 0$) is:

$$
\bar{o} =
\begin{bmatrix}
3 \\
1
\end{bmatrix}
$$
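The forward pass above can be sketched in NumPy as follows. The function name `forward` is hypothetical; it prepends the bias unit before each layer and applies ReLU after each matrix product, following the steps of the derivation.

```python
import numpy as np

W1 = np.array([[ 1,  1, -3],
               [ 1, -1,  2],
               [-1,  1,  1],
               [ 1, -1, -1]])
W2 = np.array([[ 4, -5],
               [-4,  5],
               [ 3,  1],
               [ 2,  7]])

def forward(x):
    """Two-layer ReLU network with a bias unit prepended before each layer."""
    x_bar = np.concatenate(([1], x))       # [1, x1, x2, x3]
    h1 = np.maximum(0, W1.T @ x_bar)       # hidden layer, 3 neurons
    h1_bar = np.concatenate(([1], h1))     # prepend bias unit
    return np.maximum(0, W2.T @ h1_bar)    # output layer, 2 neurons

o = forward(np.array([0, 0, 0]))
print(o)
```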



## Question 5

### a) Output Tensor Dimension of Each Layer

The output dimension of each layer is calculated sequentially, where the output of one layer becomes the input of the next.

**Formulas used:**
* For Conv2D and MaxPooling2D layers with padding='valid' (no padding): 

    * $H_{out} = \left\lceil \dfrac{H_{in} - H_{filter}+1}{S_h} \right\rceil $
    * $W_{out} = \left\lceil \dfrac{W_{in} - W_{filter}+1}{S_w} \right\rceil $
* For layers with padding='same' (padding to preserve borders):  

    * $H_{out} = \left\lceil \dfrac{H_{in}}{S_h} \right\rceil$
    * $W_{out} = \left\lceil \dfrac{W_{in}}{S_w} \right\rceil$

**Calculated Dimensions:**

1.  **Input Layer**

    * Output Shape: (640, 480, 3)

2.  **Conv2D (1st Layer)**

    * Input: (640, 480, 3)

    * Parameters: kernel=(20, 10), strides=(4, 2), padding='valid'

    * Height: $H_{out} = \left\lceil \frac{640 - 20 + 1}{4} \right\rceil = 156$

    * Width: $W_{out} = \left\lceil \frac{480 - 10 + 1}{2} \right\rceil= 236$

    * Output Shape: (156, 236, 40) (40 is the number of filters)

3.  **MaxPooling2D (1st Layer)**

    * Input: (156, 236, 40)
    * Parameters: pool_size=(2, 2), strides=(2, 2), padding='same'

    * Height: $H_{out} = \left\lceil \frac{156}{2} \right\rceil = 78$

    * Width: $W_{out} = \left\lceil \frac{236}{2} \right\rceil = 118$

    * Output Shape: (78, 118, 40)

4.  **Conv2D (2nd Layer)**

    * Input: (78, 118, 40)

    * Parameters: kernel=(5, 3), strides=(3, 2), padding='same'

    * Height: $H_{out} = \left\lceil \frac{78}{3} \right\rceil = 26$

    * Width: $W_{out} = \left\lceil \frac{118}{2} \right\rceil = 59$

    * Output Shape: (26, 59, 6) (6 is the number of filters)

5.  **MaxPooling2D (2nd Layer)**

    * Input: (26, 59, 6)

    * Parameters: pool_size=(3, 3), strides=(3, 2), padding='valid'

    * Height: $H_{out} = \left\lceil \frac{26 - 3 + 1}{3} \right\rceil = 8$

    * Width: $W_{out} = \left\lceil \frac{59 - 3 + 1}{2} \right\rceil = 29$

    * Output Shape: (8, 29, 6)

6.  **Flatten**

    * Input: (8, 29, 6)
    * Operation: $8 \times 29 \times 6 = 1392$
    * Output Shape: (1392,)

7.  **Dense**

    * Input: (1392,)
    * Output Shape: (5,) (5 is the number of neurons/units)
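The layer-by-layer shape calculation can be sketched in plain Python. The helper `out_size` and the `layers` list are illustrative names, not Keras objects; the sketch only applies the two formulas above in sequence.

```python
from math import ceil

def out_size(size, kernel, stride, padding):
    """Output length along one axis for 'same' or 'valid' padding."""
    if padding == "same":
        return ceil(size / stride)
    return ceil((size - kernel + 1) / stride)  # 'valid'

# (layer name, kernel or pool size, strides, padding, output channels or None to keep)
layers = [
    ("Conv2D",       (20, 10), (4, 2), "valid", 40),
    ("MaxPooling2D", (2, 2),   (2, 2), "same",  None),
    ("Conv2D",       (5, 3),   (3, 2), "same",  6),
    ("MaxPooling2D", (3, 3),   (3, 2), "valid", None),
]

shape = (640, 480, 3)
shapes = []
for name, kernel, strides, padding, channels in layers:
    h = out_size(shape[0], kernel[0], strides[0], padding)
    w = out_size(shape[1], kernel[1], strides[1], padding)
    shape = (h, w, channels if channels is not None else shape[2])
    shapes.append(shape)
    print(name, shape)

flat = shape[0] * shape[1] * shape[2]
print("Flatten", (flat,))
```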

-----
### b) Number of Trainable Parameters in Each Layer

**Formulas used:**

Conv2D: $(H_{kernel} \times W_{kernel} \times C_{in} + 1) \times N_{filters}$ (the `+1` is for the bias of each filter)

Dense: $(N_{input} \times N_{output}) + N_{output}$ (the $+ N_{output}$ term is for the bias of each neuron)

MaxPooling2D and Flatten have no trainable parameters.

**Calculated Parameters:**

1.  **Conv2D (1st Layer)**

    * $(20 \times 10 \times 3 + 1) \times 40 = (600 + 1) \times 40 = 601 \times 40 = 24,040$
    * **Parameters: 24,040**

2.  **MaxPooling2D (1st Layer)**

    * **Parameters: 0**

3.  **Conv2D (2nd Layer)**

    * Input channels come from the previous layer (40 filters).
    * $(5 \times 3 \times 40 + 1) \times 6 = (600 + 1) \times 6 = 601 \times 6 = 3,606$
    * **Parameters: 3,606**

4.  **MaxPooling2D (2nd Layer)**

    * **Parameters: 0**

5.  **Flatten**

    * **Parameters: 0**

6.  **Dense**

    * Input of 1392 neurons from the Flatten layer.
    * $(1392 \times 5) + 5 = 6,960 + 5 = 6,965$
    * **Parameters: 6,965**

**Total Check:** $24,040 + 3,606 + 6,965 = 34,611$. This matches the total provided in the question.
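The same totals can be reproduced with two small helper functions. The names `conv2d_params` and `dense_params` are hypothetical and simply encode the two formulas listed above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

def dense_params(n_inputs, n_units):
    """One weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

counts = {
    "conv2d_1": conv2d_params(20, 10, 3, 40),
    "conv2d_2": conv2d_params(5, 3, 40, 6),
    "dense": dense_params(1392, 5),
}
total = sum(counts.values())
print(counts, total)
```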

-----

### c) Parameters of the 1st Layer with New Input Dimension

The formula to calculate the parameters of a Conv2D layer is: 

$$(H_{kernel} \times W_{kernel} \times C_{in} + 1) \times N_{filters}$$

Where:
- **Kernel Size** ($H_{kernel}, W_{kernel}$): Defined when creating the layer (20, 10). Does not depend on the input size.
- **Input Channels** ($C_{in}$): The depth of the input tensor.
- **Number of Filters** ($N_{filters}$): Defined when creating the layer (40). Does not depend on the input size.

The spatial dimension of the input (height and width) **does not affect** the number of parameters in a convolutional filter. Only the **depth (number of channels)** of the input matters.

- Original input: (640, 480, 3) → $C_{in} = 3$
- New input: (6400, 4800, 3) → $C_{in} = 3$
  
The number of trainable parameters in the first convolutional layer would **remain the same** (**24,040**).


### Code for Question 5

In [9]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(40, (20, 10), activation='relu', strides=(4,2), input_shape=(640, 480, 3)))
model.add(tf.keras.layers.MaxPooling2D((2, 2), padding='same', strides=(2,2)))
model.add(tf.keras.layers.Conv2D(6, (5, 3), activation='relu', padding='same', strides=(3,2)))
model.add(tf.keras.layers.MaxPooling2D((3, 3), strides=(3,2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(5, activation='linear'))
model.summary()




## Question 6

### **a) What is the value of Ndense?**

The formula for the parameters of a dense layer is:
$$(\text{number of input neurons} + 1) \times \text{number of output neurons}$$

First, we need to know the **number of input neurons**, which is the size of the vector after the Flatten layer. For this, we calculate the output shape after the MaxPooling2D layer.

1.  **Input Layer**

    * Output Shape: (60, 60, 3)

2.  **Conv2D**

    * Input: (60, 60, 3)
    * Parameters: kernel=(12, 6), strides=(1, 1), padding='valid'
    * Height: $H_{out} = \left\lceil \frac{60 - 12 + 1}{1} \right\rceil = 49$
    * Width: $W_{out} = \left\lceil \frac{60 - 6 + 1}{1} \right\rceil= 55$
    * Output Shape: (49, 55, 5) 

3.  **MaxPooling2D**

    * Input: (49, 55, 5)
    * Parameters: pool_size=(2, 4), strides=(1, 1), padding='same'
    * Height: $H_{out} = \left\lceil \frac{49}{1} \right\rceil = 49$
    * Width: $W_{out} = \left\lceil \frac{55}{1} \right\rceil = 55$
    * Output Shape: (49, 55, 5)

4.  **Flatten**
    
    * Input: (49, 55, 5)
    * Operation: $49 \times 55 \times 5 = 13,475$
    * Output Shape: (13475,)

5.  **Dense**

    * Input: (13475,)

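Now we can write the number of parameters in the Dense layer:
* Input neurons: 13,475
* Output neurons: $N_{dense}$

The model has 41,513 trainable parameters in total, and the Conv2D layer accounts for 1,085 of them (see part b), so the Dense layer must hold the remaining $41,513 - 1,085 = 40,428$:
$$(13,475 + 1) \times N_{dense} = 13,476 \times N_{dense} = 40,428$$
$$N_{dense}= \frac{40,428}{13,476} = 3$$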
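The division above can be checked in plain Python. This is a small sketch with illustrative variable names; the total of 41,513 is taken from the calculation above, and `divmod` is used so that a non-zero remainder would reveal an inconsistent total.

```python
total_params = 41_513                # total trainable parameters used above
conv_params = (12 * 6 * 3 + 1) * 5   # Conv2D: 5 filters of size 12x6x3 plus bias
flatten_size = 49 * 55 * 5           # inputs to the Dense layer

dense_params = total_params - conv_params
n_dense, remainder = divmod(dense_params, flatten_size + 1)

print(conv_params, flatten_size, dense_params, n_dense, remainder)
```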

### **b) What is the number of trainable parameters in each layer?**

#### **1. Conv2D Layer**

The formula to calculate the parameters of a convolutional layer is:
$$(\text{kernel height} \times \text{kernel width} \times \text{input channels} + 1) \times \text{number of filters}$$

Therefore:

$$(12 \times 6 \times 3 + 1) \times 5 = (72 \times 3 + 1) \times 5 = (216 + 1) \times 5 = 217 \times 5 = \mathbf{1,085 \text{ parameters}}$$

#### **2. MaxPooling2D and Flatten Layers**

The `MaxPooling2D` and `Flatten` layers have no trainable parameters. MaxPooling2D only reduces the spatial dimensionality by applying a max operation, and Flatten only reshapes the data from a multi-dimensional tensor to a one-dimensional vector. Therefore, both have **0 parameters**.

#### **3. Dense (Fully Connected) Layer**

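The Dense layer holds the remaining parameters, which matches $N_{dense} = 3$ units:

$$41,513 - 1,085 = 40,428 = (13,475 + 1) \times 3$$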

### Summary Table

| Layer         | Output (shape)    | Trainable Parameters |
|---------------|-------------------|---------------------|
| Conv2D        | (49, 55, 5)       | 1,085               |
| MaxPooling2D  | (49, 55, 5)       | 0                   |
| Flatten       | (13475,)          | 0                   |
| Dense         | (3,)              | 40,428              |
