# Part 1
### a: In your opinion, what were the most important turning points in the history of deep learning?
In my opinion, the history of deep learning has several key turning points. Early neural networks and perceptrons introduced the idea of learning from data, but training multi-layer networks became practical only once backpropagation was available. The 2006 work on deep belief networks, together with the spread of the term "deep learning", marked a significant turning point that set the field apart from earlier neural network research.

The success of GPU-accelerated CNNs in 2012 (e.g., AlexNet) showed that deep networks could achieve state-of-the-art results in image recognition. Recurrent architectures such as RNNs and LSTMs, and later Transformers, reshaped natural language processing and sequence modeling. Today, large language models, such as those behind ChatGPT, have transformed how we interact with AI. Personally, I found OpenAI Five impressive as well, since it showed how reinforcement learning could train agents to reach superhuman performance in a complex environment (Dota 2).

### b: Explain the ADAM optimizer.
Adam (Adaptive Moment Estimation) is one of the most widely used optimizers in deep learning. It combines two ideas: momentum, which keeps an exponential moving average of past gradients, and RMSProp, which scales each update by an exponential moving average of squared gradients. (The original paper describes it as combining the advantages of AdaGrad and RMSProp.) Adam therefore maintains two moving averages per parameter, the first moment (mean) and the second moment (uncentered variance) of the gradients, corrects both for their bias toward zero in the early steps, and uses them to compute an adaptive step size for each parameter. In practice this often makes training faster and more robust on large or noisy datasets.
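To make the update rule concrete, here is a minimal sketch of a single Adam step in plain Python. The function name `adam_step` and the toy objective are hypothetical and chosen only for illustration; the default hyperparameters are the commonly cited ones (learning rate 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8).

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a flat list of parameters.

    m, v are the running first/second moment estimates; t is the 1-based step count.
    (Hypothetical helper for illustration, not a library API.)
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # moving average of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g    # moving average of squared gradients
        m_hat = m_i / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Toy usage (assumed objective): minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
params, m, v = [0.0], [0.0], [0.0]
for t in range(1, 2001):
    grads = [2 * (params[0] - 3.0)]
    params, m, v = adam_step(params, grads, m, v, t, lr=0.05)
print("x after 2000 steps:", params[0])
```

Note that on the very first step the bias-corrected ratio is roughly the sign of the gradient, so the parameter moves by about `lr` regardless of the gradient's magnitude. This is one way to see why Adam's step size is adaptive per parameter.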

### c: Assume data input is a single 30x40 pixel image. First layer is a convolutional layer with 5 filters, with kernel size 3x2, step size (1,1) and padding='valid'. What are the output dimensions?


We use the standard convolution output size formula:

$$
H_{out} = \left\lfloor \frac{H_{in} + 2 \cdot P_h - K_h}{S_h} \right\rfloor + 1
$$

$$
W_{out} = \left\lfloor \frac{W_{in} + 2 \cdot P_w - K_w}{S_w} \right\rfloor + 1
$$

Where:

* $(H_{in}, W_{in})$ = input height and width
* $(K_h, K_w)$ = kernel size
* $(S_h, S_w)$ = stride
* $(P_h, P_w)$ = padding (0 for `"valid"`)
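
Substituting the values from the question gives the following, assuming 30 is the height and 40 the width, and that the 3x2 kernel is likewise given as height x width (the question does not state the ordering):

$$
H_{out} = \frac{30 + 2 \cdot 0 - 3}{1} + 1 = 28, \qquad W_{out} = \frac{40 + 2 \cdot 0 - 2}{1} + 1 = 39
$$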

Now in Python:

```python
# Part 1 - c
# Input dimensions
H_in, W_in = 30, 40  

# Conv layer parameters
kernel_h, kernel_w = 3, 2
stride_h, stride_w = 1, 1
pad_h, pad_w = 0, 0   # 'valid' means no padding
num_filters = 5

# Compute output dimensions
H_out = (H_in + 2*pad_h - kernel_h)//stride_h + 1
W_out = (W_in + 2*pad_w - kernel_w)//stride_w + 1

output_shape = (num_filters, H_out, W_out)
print("Output shape:", output_shape)
```

**Expected result:**

```
Output shape: (5, 28, 39)
```

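So the output is **5 feature maps of size 28×39**. Written as a tensor shape, that is (5, 28, 39) in channels-first layout or (28, 39, 5) in channels-last layout (the Keras default), not counting the batch dimension.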
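
As a cross-check that does not rely on the formula, one can simply count how many positions the kernel can occupy while staying fully inside the image. The helper name `valid_conv_output_size` is hypothetical, and the sketch assumes no padding and the same height x width ordering as above.

```python
def valid_conv_output_size(h_in, w_in, k_h, k_w, s_h=1, s_w=1):
    """Count kernel placements that fit entirely inside the input (no padding)."""
    # Top-left corner positions along each axis where the kernel still fits
    rows = len(range(0, h_in - k_h + 1, s_h))
    cols = len(range(0, w_in - k_w + 1, s_w))
    return rows, cols

print(valid_conv_output_size(30, 40, 3, 2))  # (28, 39)
```

Each of the 5 filters produces one such 28×39 map, which matches the result from the formula.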

