# Chapter 4: Dipping Toes in Deep Learning

## 1️⃣ Chapter Overview

In the previous chapters, we learned about the building blocks (Tensors, Variables) and the tools (Keras, tf.data) of TensorFlow. Now, we finally put them to use to build actual Deep Learning models.

This chapter introduces the three fundamental pillars of deep learning architectures, applying each to a specific real-world problem type:

1.  **Fully Connected Networks (FCNs):** We will build an **Autoencoder** to restore/denoise corrupted images (Unsupervised Learning).
2.  **Convolutional Neural Networks (CNNs):** We will build a **Classifier** to identify objects in images using the CIFAR-10 dataset (Computer Vision).
3.  **Recurrent Neural Networks (RNNs):** We will build a **Forecaster** to predict future CO2 levels based on historical time-series data (Sequence Modeling).

**Practical Skills:**
* Image restoration/denoising.
* Handling multi-dimensional image data for CNNs.
* Preprocessing time-series data (windowing, differencing) for RNNs.
* Implementing `SimpleRNN`, `Conv2D`, and `Dense` layers.

## 2️⃣ Section 1: Fully Connected Networks (Autoencoders)

### 2.1 Theoretical Explanation

**What is an Autoencoder?**
An Autoencoder is a type of neural network trained to copy its input to its output. It usually consists of two parts:
1.  **Encoder:** Compresses the input $x$ into a lower-dimensional latent representation (code) $h$. 
    $$ h = f(x) $$
2.  **Decoder:** Produces a reconstruction $x'$ of the input from the latent representation $h$.
    $$ x' = g(h) $$

**Why use it?**
If the network is forced to prioritize which information to keep (by creating a bottleneck), it learns useful features about the data. Here, we use a **Denoising Autoencoder**. We feed it a *corrupted* image and force it to predict the *clean* original image. This forces the model to learn the underlying structure of the visual data to fill in the missing pieces.
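
To make the two mappings and the denoising objective concrete, here is a minimal NumPy sketch. The layer sizes, the untrained random weights, and the helper names `encode` and `decode` are illustrative assumptions, not the chapter's model. The only point is that the loss compares the reconstruction with the clean input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 8 input features squeezed through a 3-unit bottleneck.
n_in, n_code = 8, 3
W_enc = rng.normal(scale=0.1, size=(n_in, n_code))  # encoder weights (untrained)
W_dec = rng.normal(scale=0.1, size=(n_code, n_in))  # decoder weights (untrained)

def encode(x):
    """h = f(x): project the input down to the latent code."""
    return np.maximum(x @ W_enc, 0.0)  # ReLU

def decode(h):
    """x' = g(h): map the code back to input space."""
    return np.tanh(h @ W_dec)

x_clean = rng.uniform(-1.0, 1.0, size=(4, n_in))     # batch of 4 toy "images"
keep = rng.binomial(n=1, p=0.5, size=x_clean.shape)  # 1 = keep value, 0 = drop it
x_corrupt = x_clean * keep

x_restored = decode(encode(x_corrupt))

# The denoising loss compares the reconstruction with the CLEAN input,
# not with the corrupted one that was actually fed to the network.
loss = np.mean((x_restored - x_clean) ** 2)
print(f"code shape: {encode(x_corrupt).shape}, denoising MSE: {loss:.4f}")
```

Training would adjust `W_enc` and `W_dec` to shrink this loss; here the weights stay random, so the number itself is not meaningful.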

### 2.2 Data Preparation (MNIST)
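We will use the MNIST handwritten digits dataset. We will artificially corrupt the images by applying a random binary mask that sets roughly half of the pixels to 0. Because the pixels are first scaled to [-1, 1], a masked pixel takes the mid-gray value 0 rather than black (-1).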

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt

# 1. Load Data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 2. Preprocessing
# We normalize inputs to [-1, 1] because we will use 'tanh' activation in the last layer.
# Formula: (x - 127.5) / 127.5 approx equal to (x - 128) / 128
x_train = (x_train.astype('float32') - 127.5) / 127.5
x_test = (x_test.astype('float32') - 127.5) / 127.5

# Flatten the images (28x28 -> 784) for the Fully Connected Network
x_train = x_train.reshape((-1, 784))
x_test = x_test.reshape((-1, 784))

# 3. Corrupting Data Function
def generate_masked_inputs(x, p=0.5, seed=42):
    """
    Keeps each pixel with probability p and sets the rest to 0.
    On the [-1, 1] scale used here, 0 is mid-gray (black is -1).
    """
    np.random.seed(seed)
    mask = np.random.binomial(n=1, p=p, size=x.shape).astype('float32')
    return x * mask

# Generate corrupted training data
x_train_masked = generate_masked_inputs(x_train)
x_test_masked = generate_masked_inputs(x_test)

# Visualize
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.title("Original")
plt.imshow(x_train[0].reshape(28,28), cmap='gray')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.title("Corrupted (Input to Model)")
plt.imshow(x_train_masked[0].reshape(28,28), cmap='gray')
plt.axis('off')
plt.show()
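
A quick way to check what the mask actually does is to apply it to a toy batch with a known value. The sketch below is an assumption-laden stand-in for the function above: `mask_pixels` is a hypothetical name, and it uses `np.random.default_rng` rather than `np.random.seed`. It suggests two things: `p` is the probability of keeping a pixel, and a dropped pixel becomes 0, which maps back to 127.5 on the 0-255 scale.

```python
import numpy as np

def mask_pixels(x, p=0.5, seed=42):
    """Keep each entry with probability p; set the rest to 0."""
    rng = np.random.default_rng(seed)
    mask = rng.binomial(n=1, p=p, size=x.shape).astype("float32")
    return x * mask

# Toy batch: 1000 all-black "images" on the [-1, 1] scale (black = -1).
x = np.full((1000, 784), -1.0, dtype="float32")
x_masked = mask_pixels(x, p=0.5)

kept = np.mean(x_masked == -1.0)   # fraction of pixels left untouched
zeroed = np.mean(x_masked == 0.0)  # fraction of pixels replaced by 0
print(f"kept: {kept:.3f}, zeroed: {zeroed:.3f}")

# Undo the normalization for a dropped pixel: 0 * 127.5 + 127.5 = 127.5 (mid-gray).
print(0.0 * 127.5 + 127.5)
```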

### 2.3 Implementing the Autoencoder
We will build a simple stack of Dense layers. The architecture is hourglass-shaped:
Input (784) -> 64 -> 32 (Bottleneck) -> 64 -> Output (784).
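
The size of this architecture can be estimated by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The helper `dense_params` below is a hypothetical name used only for this arithmetic sketch.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [784, 64, 32, 64, 784]  # Input -> 64 -> 32 -> 64 -> Output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [50240, 2080, 2112, 50960]
print(sum(per_layer))  # 105392
```

Almost all parameters sit in the first and last layers, which connect to the 784-pixel image; the bottleneck itself is cheap.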

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define the model
autoencoder = Sequential([
    # Encoder
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(32, activation='relu'),
    
    # Decoder
    Dense(64, activation='relu'),
    # Output layer: tanh produces values in [-1, 1], matching our normalization
    Dense(784, activation='tanh') 
])

autoencoder.compile(loss='mse', optimizer='adam')
autoencoder.summary()

# Train the model
# Input: Masked Images, Target: Original Images
history = autoencoder.fit(x_train_masked, x_train, 
                          batch_size=64, 
                          epochs=10,
                          validation_split=0.1,
                          verbose=1)

### 2.4 Evaluation
Let's see how well our model restores unseen corrupted images.

In [None]:
# Predict on test data
restored_imgs = autoencoder.predict(x_test_masked)

# Visualization
n = 5
plt.figure(figsize=(15, 5))
for i in range(n):
    # Corrupted
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test_masked[i].reshape(28, 28), cmap='gray')
    plt.title("Corrupted")
    plt.axis('off')

    # Restored
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(restored_imgs[i].reshape(28, 28), cmap='gray')
    plt.title("Restored")
    plt.axis('off')
plt.show()
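
Visual inspection can be complemented with a number. One simple option, sketched here under the assumption that images are flattened rows as in this section, is the mean squared error of each image against its clean original. The helper name `per_image_mse` and the toy arrays are illustrative, not part of the chapter's code.

```python
import numpy as np

def per_image_mse(clean, restored):
    """Mean squared error of each image, averaged over its pixels."""
    clean = np.asarray(clean, dtype="float32")
    restored = np.asarray(restored, dtype="float32")
    return np.mean((clean - restored) ** 2, axis=1)

# Toy stand-ins for x_test and restored_imgs: two flattened 4-pixel "images".
clean = np.array([[1.0, -1.0, 1.0, -1.0],
                  [0.0,  0.0, 0.0,  0.0]])
restored = np.array([[1.0, -1.0, 0.0, -1.0],
                     [0.5,  0.5, 0.5,  0.5]])

print(per_image_mse(clean, restored).tolist())  # [0.25, 0.25]
```

With the arrays from this section, comparing `per_image_mse(x_test, restored_imgs).mean()` against the baseline `per_image_mse(x_test, x_test_masked).mean()` would indicate how much of the corruption the model undoes.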

## 3️⃣ Section 2: Convolutional Neural Networks (CNNs)

### 3.1 Theoretical Explanation
Fully connected networks destroy spatial information by flattening images. **CNNs** preserve the grid structure of images (height, width, channels).

**Key Components:**
1.  **Convolution:** Slides a small filter (kernel) over the image to detect local patterns (edges, textures).
2.  **Pooling:** Downsamples the feature map (e.g., Max Pooling) to reduce dimensionality and make the network less sensitive to small shifts in the input.

**Dimensionality Arithmetic:**
For an input of width $W$, filter size $K$, padding $P$ (on each side), and stride $S$, the output width is:
$$ \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1 $$
The same formula applies to the height.
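
As a quick worked check, take the 32-pixel width of the CIFAR-10 images used in this chapter. A 3x3 filter with stride 1 and one pixel of padding keeps the width unchanged:
$$ \frac{32 - 3 + 2 \cdot 1}{1} + 1 = 32 $$
The same filter without padding shrinks it by two pixels:
$$ \frac{32 - 3 + 2 \cdot 0}{1} + 1 = 30 $$
Treating a 2x2 pooling window with stride 2 the same way ($K = 2$, $P = 0$, $S = 2$) halves the width:
$$ \frac{32 - 2 + 2 \cdot 0}{2} + 1 = 16 $$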

### 3.2 Data Preparation (CIFAR-10)
We use CIFAR-10, a dataset of 60,000 color images (32x32), split into 50,000 training and 10,000 test images, in 10 classes (airplane, automobile, bird, cat, etc.).

In [None]:
import tensorflow_datasets as tfds

# Load CIFAR-10 using TFDS
data, info = tfds.load("cifar10", with_info=True, as_supervised=True)
train_data = data['train']
test_data = data['test']

def format_image(image, label):
    image = tf.cast(image, tf.float32) / 255.0
    label = tf.one_hot(label, depth=10)
    return image, label

BATCH_SIZE = 32
train_ds = train_data.map(format_image).shuffle(1000).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_ds = test_data.map(format_image).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

print("Image Shape:", info.features['image'].shape)
print("Num Classes:", info.features['label'].num_classes)

### 3.3 Implementing the CNN
We use `Conv2D` layers followed by `MaxPool2D`. 

**Note on Padding:** We use `padding='same'` to keep dimensions consistent after convolution, relying on Pooling to reduce the size. If we used `padding='valid'`, the size would shrink at every convolution, potentially causing "Negative Dimension" errors in deeper networks if the image becomes too small.
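
The effect of the padding choice can be traced without building a model. The sketch below follows one spatial dimension through a Conv-Pool-Conv-Pool-Conv stack with 3x3 convolutions and 2x2 pooling; the helper names `conv_out` and `trace` are hypothetical, and the size rules are the usual ones for `'same'` and `'valid'` padding.

```python
def conv_out(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)         # ceil(size / stride)
    return (size - kernel) // stride + 1  # 'valid': no padding

def trace(size, padding):
    """Follow one spatial dimension through Conv-Pool-Conv-Pool-Conv."""
    sizes = [size]
    for layer in ["conv", "pool", "conv", "pool", "conv"]:
        if layer == "conv":
            size = conv_out(size, kernel=3, stride=1, padding=padding)
        else:  # 2x2 max pooling with stride 2 and no padding
            size = conv_out(size, kernel=2, stride=2, padding="valid")
        sizes.append(size)
    return sizes

print(trace(32, "same"))   # [32, 32, 16, 16, 8, 8]
print(trace(32, "valid"))  # [32, 30, 15, 13, 6, 4]

# A 'valid' 3x3 convolution on a 2-pixel input has no room for the filter.
print(conv_out(2, kernel=3, padding="valid"))  # 0
```

With `'valid'` padding the 32-pixel input is already down to 4 pixels after three convolutions, so one or two more layers would push the size to zero or below, which is the situation the error message describes.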

In [None]:
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten

cnn = Sequential([
    # Layer 1: Conv -> MaxPool
    # Input: 32x32x3 -> Output: 32x32x16 (same padding) -> 16x16x16 (pooling)
    Conv2D(16, (3,3), activation='relu', padding='same', input_shape=(32,32,3)),
    MaxPool2D((2,2)),
    
    # Layer 2: Conv -> MaxPool
    # Input: 16x16x16 -> Output: 16x16x32 -> 8x8x32
    Conv2D(32, (3,3), activation='relu', padding='same'),
    MaxPool2D((2,2)),
    
    # Layer 3: Conv (No pooling)
    # Input: 8x8x32 -> Output: 8x8x64
    Conv2D(64, (3,3), activation='relu', padding='same'),
    
    # Flatten for dense layers
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax') # 10 classes
])

cnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
cnn.summary()

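# Train
# Note: the test split doubles as validation data here, so the reported
# validation accuracy is not an independent test score.
history_cnn = cnn.fit(train_ds, epochs=10, validation_data=test_ds)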

## 4️⃣ Section 3: Recurrent Neural Networks (RNNs)

### 4.1 Theoretical Explanation
Standard feed-forward networks treat each input as independent of the others (the i.i.d. assumption: independent and identically distributed). Time-series data (stock prices, weather) violates this: today's temperature depends on yesterday's.

**RNNs** handle this by maintaining a **State (Memory)**. As they process a sequence, the output depends on both the current input $x_t$ and the previous state $h_{t-1}$.
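
The recurrence can be written out in a few lines of NumPy. This is a minimal sketch of a basic recurrent step, $h_t = \tanh(x_t W_x + h_{t-1} W_h + b)$, with hypothetical toy sizes and random, untrained weights; the names `W_x`, `W_h`, `b`, and `rnn_step` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_units = 1, 4  # hypothetical toy sizes
W_x = rng.normal(scale=0.5, size=(n_features, n_units))  # input -> state
W_h = rng.normal(scale=0.5, size=(n_units, n_units))     # state -> state
b = np.zeros(n_units)

def rnn_step(x_t, h_prev):
    """One recurrence: h_t = tanh(x_t W_x + h_{t-1} W_h + b)."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def run(sequence):
    """Feed a sequence step by step, carrying the state forward."""
    h = np.zeros(n_units)  # initial state
    for x_t in sequence:
        h = rnn_step(x_t, h)
    return h

sequence = np.array([[0.1], [0.4], [-0.2]])  # 3 time steps, 1 feature each
h_forward = run(sequence)
h_reversed = run(sequence[::-1])

print(h_forward.shape)                     # (4,)
print(np.allclose(h_forward, h_reversed))  # order matters, so this is False
```

Because the state is threaded through every step, feeding the same values in a different order produces a different final state, which is exactly what a network that ignores ordering cannot express.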

### 4.2 Data Preparation (CO2 Concentration)
We will download a dataset of atmospheric CO2 concentrations and try to predict the future.

In [None]:
import pandas as pd
import os
import requests

# 1. Download Data
url = "https://datahub.io/core/co2-ppm/r/co2-mm-gl.csv"
file_path = "co2-mm-gl.csv"

if not os.path.exists(file_path):
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # Fail loudly instead of saving an error page as CSV
    with open(file_path, 'wb') as f:
        f.write(r.content)

data = pd.read_csv(file_path)
data['Date'] = pd.to_datetime(data['Date'])
data = data.set_index('Date')

# 2. Preprocessing (Detrending)
# The raw CO2 values rise steadily, so future values fall outside the range seen
# during training, which makes the raw series hard for ML models to learn.
# We convert values to 'Differences': Value(t) - Value(t-1).
# This removes most of the trend and leaves a roughly stationary series (range approx -2.0 to 1.5).
data['Average Diff'] = data['Average'] - data['Average'].shift(1)
data = data.dropna()

# Visualization
data['Average Diff'].plot(figsize=(10, 5), title="CO2 Monthly Differences")
plt.show()
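
Differencing loses no information as long as the starting level is kept: a cumulative sum of the differences rebuilds the original series. The sketch below uses made-up readings (not values from the dataset) and NumPy's `np.diff` and `np.cumsum`.

```python
import numpy as np

levels = np.array([400.0, 400.5, 401.5, 401.0, 402.0])  # made-up readings

diffs = np.diff(levels)                 # value(t) - value(t-1)
rebuilt = levels[0] + np.cumsum(diffs)  # undo the differencing

print(diffs.tolist())    # [0.5, 1.0, -0.5, 1.0]
print(rebuilt.tolist())  # [400.5, 401.5, 401.0, 402.0]
```

The rebuilt values match `levels[1:]`; the first reading has no predecessor, which is why differencing always yields one value fewer than the original series.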

### 4.3 Windowing the Data
RNNs need sequences as input. A single number carries no context, so we feed a window of past values instead.

**Strategy:**
We take a window of past `N` values (e.g., 12 months) to predict the `N+1`th value.
* Input shape: `(Batch_Size, Time_Steps, Features)`
* Time Steps = 12
* Features = 1 (just the CO2 value)

In [None]:
def generate_data(series, n_seq):
    x, y = [], []
    values = series.values
    for i in range(len(values) - n_seq):
        # Inputs: i to i+n_seq
        x.append(values[i : i+n_seq])
        # Target: i+n_seq (the next value)
        y.append(values[i+n_seq])
    return np.array(x), np.array(y)

N_SEQ = 12 # Look back 1 year
x_rnn, y_rnn = generate_data(data['Average Diff'], N_SEQ)

# Reshape for RNN: [Samples, Time_Steps, Features]
x_rnn = x_rnn.reshape(-1, N_SEQ, 1)
y_rnn = y_rnn.reshape(-1, 1)

print("RNN Input Shape:", x_rnn.shape)
print("RNN Target Shape:", y_rnn.shape)
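
The same windows can also be built without an explicit Python loop. The sketch below is one possible alternative, assuming NumPy 1.20 or newer for `sliding_window_view`; the function name `make_windows` is hypothetical. A toy series of 0 to 5 makes the input/target pairing easy to read.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(values, n_seq):
    """Return (x, y): x[i] holds n_seq consecutive values, y[i] the one after."""
    windows = sliding_window_view(np.asarray(values), n_seq + 1)
    x = windows[:, :-1]  # first n_seq entries of each window
    y = windows[:, -1]   # final entry is the prediction target
    return x[..., np.newaxis], y[:, np.newaxis]  # add the feature axis

x_toy, y_toy = make_windows(np.arange(6), n_seq=3)

print(x_toy[..., 0].tolist())    # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(y_toy[:, 0].tolist())      # [3, 4, 5]
print(x_toy.shape, y_toy.shape)  # (3, 3, 1) (3, 1)
```

A series of length 6 with a window of 3 yields 6 - 3 = 3 samples, the same count the loop-based version produces.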

### 4.4 Implementing the RNN
We use `SimpleRNN`. This layer is computationally simpler than LSTM or GRU but suffers from the vanishing gradient problem on very long sequences. For windows of 12 steps, that is rarely an issue.

In [None]:
from tensorflow.keras.layers import SimpleRNN

rnn_model = Sequential([
    # SimpleRNN layer
    SimpleRNN(64, activation='relu', input_shape=(N_SEQ, 1)),
    Dense(64, activation='relu'),
    Dense(1) # Regression output (predicting a continuous value)
])

rnn_model.compile(loss='mse', optimizer='adam')
rnn_model.summary()

# Train
history_rnn = rnn_model.fit(x_rnn, y_rnn, epochs=20, batch_size=32, verbose=1)

### 4.5 Forecasting the Future
We can now use the model to predict CO2 levels. Note that the model predicts the *difference*. To get the actual value, we must add the predicted difference to the previous month's actual value.

In [None]:
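# Predict the last known month from the 12 monthly differences before it,
# then compare with the recorded value.
# Note: this window was part of the training data, so this is a sanity check, not a test.
last_sequence = x_rnn[-1:]
predicted_diff = rnn_model.predict(last_sequence)

actual_last_val = data['Average'].iloc[-2] # The month before the one we predict
predicted_val = actual_last_val + predicted_diff[0][0]
actual_val = data['Average'].iloc[-1]      # The recorded value we are predicting

print(f"Predicted Diff: {predicted_diff[0][0]:.4f}")
print(f"Predicted Absolute Value: {predicted_val:.2f}")
print(f"Actual Absolute Value: {actual_val:.2f}")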
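
Forecasting more than one month ahead requires feeding each prediction back into the input window. The sketch below shows one way this could be done; it is not the chapter's method. The function `forecast_levels` and the stand-in `mean_model` are hypothetical: any callable that maps an array of shape `(1, n_seq, 1)` to a single predicted difference would fit.

```python
import numpy as np

def forecast_levels(predict_diff, last_window, last_level, n_steps):
    """Roll a one-step difference model forward n_steps times."""
    window = np.asarray(last_window, dtype="float32").reshape(1, -1, 1)
    level, levels = float(last_level), []
    for _ in range(n_steps):
        diff = float(predict_diff(window))
        level += diff  # previous level + predicted difference
        levels.append(level)
        # Drop the oldest difference and append the newest prediction.
        window = np.concatenate([window[:, 1:, :], [[[diff]]]], axis=1)
    return levels

def mean_model(window):
    """Dummy stand-in for a trained model: predicts the mean of the window."""
    return window.mean()

print(forecast_levels(mean_model, [1.0, 1.0, 1.0], last_level=400.0, n_steps=3))
# [401.0, 402.0, 403.0]
```

With the trained network, `predict_diff` could be something like `lambda w: rnn_model.predict(w, verbose=0)[0, 0]`, with the last `N_SEQ` differences as `last_window`. Since every step builds on earlier predictions, errors can accumulate, so long horizons should be read with caution.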

## 5️⃣ Chapter Summary

In this chapter, we implemented the "Big Three" architectures of deep learning:

* **Fully Connected Networks (Autoencoders):**
    * Used `Dense` layers.
    * Learned to map Inputs $\to$ Latent Code $\to$ Inputs.
    * Trained to reconstruct clean digits from masked inputs.
* **Convolutional Neural Networks (CNNs):**
    * Used `Conv2D` and `MaxPool2D`.
    * Learned spatial hierarchies in image data (Edges $\to$ Shapes $\to$ Objects).
    * Classified CIFAR-10 images into 10 classes.
* **Recurrent Neural Networks (RNNs):**
    * Used `SimpleRNN`.
    * Processed data sequentially, maintaining a memory of the past.
    * Learned to forecast the next monthly CO2 change from the previous 12.