Explanation of deep learning techniques
Introduction
Deep learning is a subset of machine learning that focuses on algorithms inspired by the structure and function of the brain, called artificial neural networks. These networks are designed to learn from vast amounts of data, making deep learning a powerful tool for tasks such as image recognition, natural language processing, and game playing.

Key deep learning techniques
Feedforward neural networks
Overview
A feedforward neural network (FNN) is the simplest form of neural network. In these networks, information flows in one direction—from the input layer, through the hidden layers, to the output layer—without any feedback loops.

Key features
Architecture: composed of an input layer, one or more hidden layers, and an output layer.

Activation functions: these introduce non-linearity into the network. Common activation functions include the rectified linear unit (ReLU) and the sigmoid function.

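Training: feedforward networks are trained using backpropagation, which propagates the prediction error backward through the network to compute the gradient of the loss with respect to each weight. An optimizer such as gradient descent then uses these gradients to update the weights.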
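As a rough illustration of the training step described above, the following NumPy sketch runs one forward and backward pass through a single sigmoid neuron and checks the backpropagated gradient against a finite-difference estimate. The toy inputs, the squared-error loss, and the names `w`, `b`, and `loss_fn` are assumptions made for this example; they do not come from the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def loss_fn(w, b, x, y):
    """Squared error of a single sigmoid neuron (illustrative choice of loss)."""
    return 0.5 * (sigmoid(w @ x + b) - y) ** 2

# Toy data (assumed values)
x = np.array([0.5, -1.0, 2.0])
y = 1.0
w = np.array([0.1, 0.2, -0.3])
b = 0.0

# Forward pass
z = w @ x + b
a = sigmoid(z)

# Backward pass via the chain rule: dL/dw = (a - y) * sigmoid'(z) * x
delta = (a - y) * a * (1.0 - a)
grad_w = delta * x
grad_b = delta

# Finite-difference estimate of dL/dw[0] to sanity-check the analytic gradient
eps = 1e-6
w_plus = w.copy()
w_plus[0] += eps
w_minus = w.copy()
w_minus[0] -= eps
numeric = (loss_fn(w_plus, b, x, y) - loss_fn(w_minus, b, x, y)) / (2 * eps)

# One gradient-descent update
learning_rate = 0.1
w_new = w - learning_rate * grad_w
b_new = b - learning_rate * grad_b
```

Keras performs the same kind of computation automatically for every layer when `fit` is called; the manual version is only meant to show where the weight updates come from.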

Applications
Image classification: image classification is the process of assigning a category label to an image based on its visual features. For example, in a system that recognizes animals, an image classification model can distinguish between a cat, dog, or bird based on visual characteristics. A common real-world use is in medical imaging, where AI helps classify X-rays or MRIs as normal or showing signs of disease.

Simple regression tasks: simple regression involves predicting a continuous value based on input variables, typically using linear regression to model the relationship. For example, predicting house prices based on factors such as square footage or location uses regression to estimate the expected price. A real-world example includes forecasting sales based on past data, helping businesses plan inventory or budget allocations.

In [1]:
import tensorflow as tf


# 1 - Feedforward Neural Network - FNNs - Simpler problems - "Replaces scikit-learn"

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target.reshape(-1, 1)

# One-hot encode labels
encoder = OneHotEncoder(sparse_output=False)
y = encoder.fit_transform(y)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [3]:
from tensorflow.keras import layers, models

# Build the FNN model
model_fnn = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')  # 3 output classes for the Iris dataset
])



The model built above is a feedforward neural network (FNN) designed to classify flowers from the well-known Iris dataset into three distinct species. The network architecture consists of three dense (fully connected) layers:

- **First dense layer:** has 64 neurons and uses the ReLU (rectified linear unit) activation function, which introduces non-linearity and lets the model learn complex relationships between the input variables. The argument `input_shape=(X_train.shape[1],)` indicates that the layer expects vectors of 4 features (sepal length and width, petal length and width).

- **Second dense layer:** with 32 neurons and ReLU activation as well, this layer deepens the model's representational capacity, letting it combine the information extracted by the previous layer in a more abstract way.

- **Output layer:** has 3 neurons, corresponding to the three classes of the problem (setosa, versicolor, and virginica). The softmax activation function converts the outputs into probabilities that sum to 1, so the model returns the probability that each flower belongs to each species.

Mathematically, each layer applies a linear transformation to its input (multiplication by a weight matrix plus a bias vector), followed by a non-linear activation function. The process can be written as:

$$
\text{Output} = \text{softmax}(W_3 \cdot \text{ReLU}(W_2 \cdot \text{ReLU}(W_1 \cdot X + b_1) + b_2) + b_3)
$$

where $W_i$ and $b_i$ are the weights and biases of each layer, and $X$ is the input vector.

This structure lets the model learn complex decision boundaries in feature space, making it capable of separating the different flower species with high accuracy. Dense layers and suitable activation functions are essential for capturing the non-linear patterns in the data.
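The formula above can be sketched directly in NumPy. This is a minimal illustration, not the Keras implementation: the weights are random rather than trained, the two sample rows in `X_demo` are made-up measurements, and the code uses the row-vector convention `X @ W` (one sample per row), which is the transpose of the $W \cdot X$ notation in the formula.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row-wise maximum for numerical stability
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Randomly initialised weights with the same layer sizes as the model: 4 -> 64 -> 32 -> 3
W1, b1 = rng.normal(scale=0.1, size=(4, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 32)), np.zeros(32)
W3, b3 = rng.normal(scale=0.1, size=(32, 3)), np.zeros(3)

def forward(X):
    """Forward pass for a batch X of shape (n_samples, 4)."""
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)

# Two made-up flower measurements (sepal length/width, petal length/width, in cm)
X_demo = np.array([[5.1, 3.5, 1.4, 0.2],
                   [6.7, 3.0, 5.2, 2.3]])
probs = forward(X_demo)  # shape (2, 3); each row sums to 1
```

Because the weights are untrained, the probabilities are meaningless as predictions; the point is only that each row of `probs` is a valid probability distribution over the three classes.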

In [4]:
# Compile the model
model_fnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model_fnn.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))


# 2 - Implementing a Convolutional Neural Network - CNN - Images

Convolutional neural networks
Overview
Convolutional neural networks (CNNs) are specialized for processing grid-like data such as images. CNNs use convolutional layers to detect patterns automatically in data, such as edges, textures, and shapes.

Key features
Convolutional layers: these layers apply filters (kernels) that slide over the input data, producing feature maps.

Pooling layers: these layers reduce the spatial dimensions of the data, which decreases the computational load and helps the network focus on the most important features.

Fully connected layers: these layers are usually at the end of the network to perform classification or regression tasks.
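To make the sliding-filter idea concrete, here is a minimal single-channel sketch in NumPy. The function name `conv2d_valid`, the toy image, and the hand-written edge kernel are assumptions for illustration; a real CNN learns its kernels during training and handles multiple channels.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding.

    This is cross-correlation (no kernel flip), which is what deep learning
    libraries usually call "convolution".
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge kernel (a trained CNN would learn its kernels)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)  # shape (4, 4)
```

The feature map is largest in the columns where the window straddles the dark-to-bright boundary, which is the sense in which a filter "detects" an edge.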

Applications
Image classification

Object detection: object detection goes beyond classification by identifying and locating objects within an image, and drawing bounding boxes around them. For example, in self-driving cars, object detection is used to identify pedestrians, traffic lights, and other vehicles in real time. A common application is security surveillance, where cameras detect and track intruders in real time.

Video analysis: video analysis processes video data to extract insights, such as recognizing actions, events, or patterns over time. For instance, in sports broadcasting, video analysis can track players' movements and analyze game strategies in real time. Another example is traffic monitoring, where video analysis is used to detect accidents or measure traffic flow.

In [11]:
# Load CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [2]:
# Convolutional Neural Network
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

## Conceptual and mathematical explanation of the CNN built above

The convolutional neural network (CNN) assembled in the code is made up of several layers, each responsible for a specific mathematical transformation of the input data. The sections below walk through each step, using matrix and vector notation to describe the operations.

### 1. Input

The network input is an image of dimension $32 \times 32 \times 3$, where $32 \times 32$ are the spatial dimensions and $3$ is the number of color channels (RGB).

$$
\mathbf{X}_0 \in \mathbb{R}^{32 \times 32 \times 3}
$$

---

### 2. First convolutional layer

`Conv2D(32, (3, 3), activation='relu')`

- Applies 32 filters (kernels) of spatial size $3 \times 3$.
- Each filter spans all 3 input channels: $\mathbf{K}_i \in \mathbb{R}^{3 \times 3 \times 3}$.
- The convolution operation for each filter $i$ is:

$$
\mathbf{Z}_1^{(i)} = \mathbf{X}_0 * \mathbf{K}_i + b_i
$$

where $*$ denotes convolution and $b_i$ is the bias of filter $i$.

- After the convolution, the ReLU activation function is applied:

$$
\mathbf{A}_1^{(i)} = \text{ReLU}(\mathbf{Z}_1^{(i)}) = \max(0, \mathbf{Z}_1^{(i)})
$$

- The output has dimension $30 \times 30 \times 32$ (with 'valid' padding, each spatial dimension shrinks to $32 - 3 + 1 = 30$).

---

### 3. First MaxPooling layer

`MaxPooling2D((2, 2))`

- Halves the spatial dimensions by taking the maximum value in each $2 \times 2$ window:

$$
\mathbf{A}_2^{(i)}(x, y) = \max_{(m, n) \in \{0,1\}^2} \mathbf{A}_1^{(i)}(2x+m, 2y+n)
$$

- Output: $15 \times 15 \times 32$
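The pooling formula can be sketched for a single feature map in NumPy. The helper name `max_pool_2x2` is hypothetical; the sketch assumes non-overlapping $2 \times 2$ windows with stride 2 and drops any odd trailing row or column, which matches the floor behavior implied by the layer sizes in this walkthrough.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Group pixels into (h2, 2, w2, 2) blocks and take the max of each 2x2 block
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

a1 = np.arange(16, dtype=float).reshape(4, 4)
a2 = max_pool_2x2(a1)
# a2 is [[5, 7], [13, 15]]: the maximum of each 2x2 block of a1
```

Applied to a $30 \times 30$ map this gives $15 \times 15$, and applied to a $13 \times 13$ map it gives $6 \times 6$.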

---

### 4. Second convolutional layer

`Conv2D(64, (3, 3), activation='relu')`

- 64 filters of size $3 \times 3 \times 32$.
- For each filter $j$:

$$
\mathbf{Z}_3^{(j)} = \mathbf{A}_2 * \mathbf{K}_j + b_j
$$

$$
\mathbf{A}_3^{(j)} = \text{ReLU}(\mathbf{Z}_3^{(j)})
$$

- Output: $13 \times 13 \times 64$

---

### 5. Second MaxPooling layer

`MaxPooling2D((2, 2))`

- Reduces to $6 \times 6 \times 64$ (the odd dimension $13$ is rounded down: $\lfloor 13/2 \rfloor = 6$):

$$
\mathbf{A}_4^{(j)}(x, y) = \max_{(m, n) \in \{0,1\}^2} \mathbf{A}_3^{(j)}(2x+m, 2y+n)
$$

---

### 6. Flatten

`Flatten()`

- Flattens the $6 \times 6 \times 64$ tensor into a vector of dimension $2304$:

$$
\mathbf{a}_5 \in \mathbb{R}^{2304}
$$

---

### 7. Dense (fully connected) layer

`Dense(64, activation='relu')`

- Matrix multiplication:

$$
\mathbf{z}_6 = \mathbf{W}_6 \mathbf{a}_5 + \mathbf{b}_6
$$

$$
\mathbf{a}_6 = \text{ReLU}(\mathbf{z}_6)
$$

where $\mathbf{W}_6 \in \mathbb{R}^{64 \times 2304}$, $\mathbf{b}_6 \in \mathbb{R}^{64}$.

---

### 8. Output layer

`Dense(10, activation='softmax')`

- Matrix multiplication:

$$
\mathbf{z}_7 = \mathbf{W}_7 \mathbf{a}_6 + \mathbf{b}_7
$$

- Softmax to obtain a probability for each class $k$:

$$
\hat{y}_k = \frac{\exp(z_{7,k})}{\sum_{l=1}^{10} \exp(z_{7,l})}
$$

where $\mathbf{W}_7 \in \mathbb{R}^{10 \times 64}$, $\mathbf{b}_7 \in \mathbb{R}^{10}$.

---

### Summary of the data flow

$$
\mathbf{X}_0 \xrightarrow{\text{Conv2D}} \mathbf{A}_1 \xrightarrow{\text{MaxPool}} \mathbf{A}_2 \xrightarrow{\text{Conv2D}} \mathbf{A}_3 \xrightarrow{\text{MaxPool}} \mathbf{A}_4 \xrightarrow{\text{Flatten}} \mathbf{a}_5 \xrightarrow{\text{Dense}} \mathbf{a}_6 \xrightarrow{\text{Dense+Softmax}} \hat{\mathbf{y}}
$$

Each step transforms the data: the convolutions extract spatial features, pooling reduces dimensionality, and the dense layers with softmax perform the final classification. Learning happens by adjusting the weights ($\mathbf{K}_i$, $\mathbf{K}_j$, $\mathbf{W}_6$, $\mathbf{W}_7$) and biases to minimize the loss function during training.
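The layer-by-layer shapes in this walkthrough can be re-derived with a few lines of plain Python. This is a bookkeeping sketch under the same assumptions as the model above ('valid' padding, stride 1 convolutions, $2 \times 2$ pooling); the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output size of a 'valid' convolution along one spatial dimension."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output size of non-overlapping max pooling (floor division)."""
    return size // pool

size, channels = 32, 3
shapes = [(size, size, channels)]
params = []

for filters in (32, 64):
    # Conv2D: each filter has 3*3*channels weights plus one bias
    params.append((3 * 3 * channels + 1) * filters)
    size, channels = conv_out(size, 3), filters
    shapes.append((size, size, channels))
    # MaxPooling2D has no trainable parameters
    size = pool_out(size)
    shapes.append((size, size, channels))

flat = size * size * channels       # length of the flattened vector
params.append((flat + 1) * 64)      # Dense(64): weights plus biases
params.append((64 + 1) * 10)        # Dense(10): weights plus biases
total_params = sum(params)
```

Under these assumptions `shapes` ends at `(6, 6, 64)`, `flat` is 2304, and most of the trainable parameters sit in the first dense layer rather than in the convolutions.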


# 3 - Recurrent Neural Network - Time series - Speech - Financial data

Recurrent neural networks
Overview
Recurrent neural networks (RNNs) are designed for sequential data, such as time series or language. Unlike FNNs, RNNs maintain a "memory" of previous inputs: the hidden state computed at one time step is fed back into the network at the next time step.

Key features
Hidden state: this maintains the context of previous inputs in the network.

Long short-term memory (LSTM) and gated recurrent units (GRUs): these are advanced RNN architectures that address the problem of long-term dependencies, making them effective at capturing information over long sequences.
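The hidden-state idea can be sketched in NumPy with the standard simple-RNN update $h_t = \tanh(x_t W_x + h_{t-1} W_h + b)$. The sizes, the random weights, and the names `W_x`, `W_h`, and `rnn_forward` are assumptions for illustration; LSTM and GRU cells add gating on top of this basic recurrence and are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 1, 4

# Randomly initialised parameters (a trained network would learn these)
W_x = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def rnn_forward(sequence):
    """Return the hidden state after each time step of a (timesteps, input_dim) sequence."""
    h = np.zeros(hidden_dim)
    states = []
    for x_t in sequence:
        # The new state depends on the current input and on the previous state
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.array(states)

sequence = np.sin(np.linspace(0, 3, 10)).reshape(-1, 1)
states = rnn_forward(sequence)  # shape (10, 4)
```

Because `h` is carried from one loop iteration to the next, the state at step $t$ depends on every earlier input, which is the "memory" that a feedforward network lacks.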

Applications
Time-series forecasting

Natural language processing, such as language translation and sentiment analysis

In [5]:
import numpy as np

# Generate synthetic sine wave data
t = np.linspace(0, 100, 10000)
X = np.sin(t).reshape(-1, 1)

# Prepare sequences
def create_sequences(data, seq_length):
    X_seq, y_seq = [], []
    for i in range(len(data) - seq_length):
        X_seq.append(data[i:i+seq_length])
        y_seq.append(data[i+seq_length])
    return np.array(X_seq), np.array(y_seq)

seq_length = 200
X_seq, y_seq = create_sequences(X, seq_length)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_seq, test_size=0.2, random_state=42)

The dataset was created to simulate a typical time-series problem, well suited to testing recurrent neural networks (RNNs). The mathematical process is as follows:

1. **Generating the sine signal:**
    - The time vector $t$ is created with 10,000 evenly spaced points between 0 and 100:
      $$
      t = [0, \Delta t, 2\Delta t, ..., 100], \quad \Delta t = \frac{100}{9999}
      $$
    - The signal $X$ is then:
      $$
      X = \sin(t)
      $$
      Each element is $X[i] = \sin(t[i])$.

2. **Creating the sequences for the RNN:**
    - To train an RNN, we need (input, output) pairs where the input is a sequence of values and the output is the next value in the series.
    - For each index $i$ from $0$ to $N - \text{seq\_length} - 1$:
      - Input: $[X[i], X[i+1], ..., X[i+\text{seq\_length}-1]]$
      - Output: $X[i+\text{seq\_length}]$
    - Formally:
      $$
      \mathbf{X}_{\text{seq}}^{(i)} = [X[i], X[i+1], ..., X[i+\text{seq\_length}-1]]
      $$
      $$
      y_{\text{seq}}^{(i)} = X[i+\text{seq\_length}]
      $$
    - This produces a dataset of (sequence, next value) pairs.

3. **Splitting into training and test sets:**
    - The set of sequences $(\mathbf{X}_{\text{seq}}, y_{\text{seq}})$ is split randomly into training (80%) and test (20%) sets using `train_test_split`.

**Summary:**  
The goal is to predict the next value of the sine series given the previous 200 values (`seq_length = 200`). In this way, the RNN learns to model the temporal patterns of the sine signal.
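The windowing step can also be written without an explicit loop. The sketch below assumes NumPy 1.20 or later (for `sliding_window_view`) and rebuilds the same (sequence, next value) pairs. It also shows a chronological 80/20 split, a common alternative for time series because randomly shuffled overlapping windows share most of their values between the training and test sets. The variable name `signal` and the choice of split are assumptions of this example, not part of the original notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

t = np.linspace(0, 100, 10000)
signal = np.sin(t)
seq_length = 200

# All windows of length seq_length; drop the last one because it has no "next value"
windows = sliding_window_view(signal, seq_length)[:-1]
X_seq = windows[..., np.newaxis]          # shape (n_samples, seq_length, 1)
y_seq = signal[seq_length:, np.newaxis]   # shape (n_samples, 1)

# Chronological split: train on the first 80%, test on the final 20%
split = int(0.8 * len(X_seq))
X_train, X_test = X_seq[:split], X_seq[split:]
y_train, y_test = y_seq[:split], y_seq[split:]
```

With 10,000 points and a window of 200 this yields 9,800 samples, matching the index range $0$ to $N - \text{seq\_length} - 1$ described above.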

In [6]:
# Simple RNN
model_rnn = models.Sequential([
    layers.SimpleRNN(128, input_shape=(200, 1)),
    layers.Dense(1)  # Single linear output: the predicted next value of the series (regression)
])

In [7]:
# Compile the model
model_rnn.compile(optimizer='adam', loss='mse')

# Train the model
model_rnn.fit(X_train, y_train, epochs=15, batch_size=32, validation_data=(X_test, y_test))


# 4 - Generative Adversarial Networks - GANs - Images - Recreating data - Synthetic data

Generative adversarial networks
Overview
Generative adversarial networks (GANs) consist of two networks, a generator and a discriminator, that are trained simultaneously. The generator creates fake data, and the discriminator attempts to distinguish between real and generated data. Over time, the generator improves its ability to produce realistic data.

Key features
Generator: learns to create data that is indistinguishable from real data.

Discriminator: learns to distinguish between real and generated data.

Adversarial training: the two networks compete with each other, leading to better results over time.
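The competition between the two networks can be expressed through their losses. The sketch below uses binary cross-entropy, a common choice for GANs, on made-up discriminator outputs; the arrays `d_on_real` and `d_on_fake` and the helper `bce` are hypothetical and only illustrate how the two objectives pull in opposite directions.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)))

# Hypothetical discriminator outputs: probability that each sample is real
d_on_real = np.array([0.9, 0.8, 0.95])   # scores for real samples
d_on_fake = np.array([0.1, 0.3, 0.2])    # scores for generated samples

# Discriminator: real samples should score 1, generated samples 0
d_loss = bce(np.ones(3), d_on_real) + bce(np.zeros(3), d_on_fake)

# Generator: wants the discriminator to score its samples as real (label 1)
g_loss = bce(np.ones(3), d_on_fake)
```

In this made-up situation the discriminator is winning, so `d_loss` is small and `g_loss` is large; as the generator improves, `d_on_fake` rises and the balance shifts.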

Applications
Image generation: image generation uses algorithms to create new images from scratch based on input data, often employing techniques such as GANs. For instance, AI can generate realistic images of faces that don’t belong to any real person. A common application is in art, where AI generates original artworks or designs based on style preferences.

Style transfer: style transfer involves applying the artistic style of one image to the content of another, creating a blend of both. For example, AI can take a photo of a cityscape and apply the painting style of Van Gogh, transforming it into a starry, impressionistic version. This technique is used in digital art and content creation to stylize photos or videos.

Data augmentation for training models: data augmentation artificially expands a dataset by creating modified versions of the existing data, such as by rotating, cropping, or flipping images. For instance, in image recognition tasks, augmenting a dataset of dog photos by flipping or zooming in on them helps improve the model's accuracy. This technique helps prevent overfitting in machine learning models and improves generalization. GANs extend this idea by generating entirely new synthetic samples rather than transforming existing ones.

In [8]:
import tensorflow as tf
from tensorflow.keras import layers, models

# Define the generator model
def build_generator():
    model = models.Sequential([
        layers.Dense(128, activation='relu', input_shape=(100,)),
        layers.Dense(784, activation='tanh')  # Output: flattened 28x28 image in [-1, 1], matching the normalized training data
    ])
    return model

# Define the discriminator model
def build_discriminator():
    model = models.Sequential([
        layers.Dense(128, activation='relu', input_shape=(784,)),  # Input: Flattened 28x28 image
        layers.Dense(1, activation='sigmoid')  # Output: Probability (real or fake)
    ])
    return model

In [None]:
import numpy as np
from tensorflow.keras.datasets import mnist

# Load and preprocess dataset (MNIST for example)
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
#(X_train, _), (_, _) = mnist.load_data()

# Normalize images to [-1, 1] and flatten to (784,) for the discriminator input
X_train = (X_train.astype(np.float32) - 127.5) / 127.5  # Normalize to range [-1, 1]
X_train = X_train.reshape(-1, 784)  # Flatten 28x28 images to vectors of size 784

# Check the shape of the dataset
print(f"X_train shape: {X_train.shape}")  # Should print: (60000, 784)

# Build the models
generator = build_generator()
discriminator = build_discriminator()

# Compile the discriminator
discriminator.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Create GAN model: stack generator and discriminator
gan = models.Sequential([generator, discriminator])
discriminator.trainable = False  # Freeze the discriminator when training the GAN
gan.compile(optimizer='adam', loss='binary_crossentropy')

# Training loop
epochs = 1000
batch_size = 32
half_batch = batch_size // 2

for epoch in range(epochs):
    # Train discriminator with real images
    idx = np.random.randint(0, X_train.shape[0], half_batch)  # Random real images
    real_imgs = X_train[idx]
    real_labels = np.ones((half_batch, 1))  # Real labels (1s)

    # Train discriminator with fake images
    noise = np.random.normal(0, 1, (half_batch, 100))  # Random noise input
    fake_imgs = generator.predict(noise, verbose=0)  # Fake images generated by the generator (progress bar suppressed)
    fake_labels = np.zeros((half_batch, 1))  # Fake labels (0s)

    # Train the discriminator on real and fake images
    d_loss_real = discriminator.train_on_batch(real_imgs, real_labels)
    d_loss_fake = discriminator.train_on_batch(fake_imgs, fake_labels)

    # Train the generator (the generator wants to fool the discriminator)
    noise = np.random.normal(0, 1, (batch_size, 100))  # Generate new noise
    gan_labels = np.ones((batch_size, 1))  # We want the generator to produce "real" images
    g_loss = gan.train_on_batch(noise, gan_labels)

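    # Log progress every 100 iterations (discriminator loss averaged over the real and fake batches)
    if epoch % 100 == 0:
        d_loss = 0.5 * (d_loss_real[0] + d_loss_fake[0])
        print(f"Epoch {epoch}, Discriminator Loss: {d_loss}, Generator Loss: {g_loss}")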

In this code, we train the discriminator on both real and fake images. Then, we train the generator to produce images that can fool the discriminator. The GAN is trained for 1,000 iterations of one batch each, and the losses of both networks are logged every 100 iterations. For real-world applications, you would train a GAN for far longer.

# 5 - Autoencoders - Unsupervised - Finding outliers - Data compression

Autoencoders
Overview
Autoencoders are unsupervised learning models used for data compression. They consist of an encoder that compresses the input data into a lower-dimensional representation and a decoder that reconstructs the original data from this representation.

Key features
Encoder: compresses the data into a lower-dimensional space.

Decoder: reconstructs the original data from the compressed representation.

Bottleneck layer: this is the low-dimensional representation, also called the latent space.

Applications
Dimensionality reduction: dimensionality reduction reduces the number of features in a dataset while retaining essential information, simplifying the data for analysis. For example, using principal component analysis in a dataset with hundreds of features (such as gene expression data) can reduce it to a few key dimensions for easier visualization. This technique is often used in fields such as bioinformatics or finance to avoid overfitting and improve computational efficiency.

Anomaly detection: anomaly detection identifies unusual patterns or data points that deviate from the norm, often used in monitoring and fraud detection. For example, in credit card transactions, anomaly detection algorithms flag suspicious activities such as unusually high purchases or transactions from uncommon locations. This method helps detect fraud, equipment malfunctions, or cybersecurity threats in real time.

Data denoising: data denoising removes noise or irrelevant information from data to improve its quality and make it more usable for analysis. For instance, in image processing, denoising algorithms remove graininess or distortions in pictures captured under poor lighting conditions. It's commonly used in audio, video, and image processing to enhance the clarity and accuracy of data.
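Anomaly detection with an autoencoder usually relies on reconstruction error: inputs that resemble the training data are reconstructed well, while unusual inputs are not. The sketch below illustrates that logic with a linear stand-in (principal directions from an SVD) in place of a trained neural autoencoder, so that it runs with NumPy alone. The synthetic data, the 99th-percentile threshold, and the `encode`/`decode` helpers are all assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" data lying close to a 2-D plane inside a 10-D space
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 10))

# Linear stand-in for a trained autoencoder: keep the top-2 principal directions
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]                       # shape (2, 10)

def encode(x):
    return (x - mean) @ components.T      # 10-D input -> 2-D bottleneck

def decode(z):
    return z @ components + mean          # 2-D bottleneck -> 10-D reconstruction

def reconstruction_error(x):
    return np.mean((x - decode(encode(x))) ** 2, axis=1)

# Flag anything whose error exceeds the 99th percentile seen on normal data
threshold = np.percentile(reconstruction_error(normal), 99)

outlier = 5.0 * rng.normal(size=(1, 10))  # a point far from the plane
is_anomaly = reconstruction_error(outlier) > threshold
```

With a Keras autoencoder the same pattern applies: compute the per-sample error between the input and `autoencoder.predict(input)`, then compare it with a threshold chosen on data known to be normal.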

In [10]:
# Define the encoder
def build_encoder():
    input_img = layers.Input(shape=(784,))
    encoded = layers.Dense(128, activation='relu')(input_img)
    encoded = layers.Dense(64, activation='relu')(encoded)
    return models.Model(input_img, encoded)

# Define the decoder
def build_decoder():
    encoded_input = layers.Input(shape=(64,))
    decoded = layers.Dense(128, activation='relu')(encoded_input)
    decoded = layers.Dense(784, activation='sigmoid')(decoded)
    return models.Model(encoded_input, decoded)

# Build the full autoencoder
encoder = build_encoder()
decoder = build_decoder()

input_img = layers.Input(shape=(784,))
encoded_img = encoder(input_img)
decoded_img = decoder(encoded_img)

autoencoder = models.Model(input_img, decoded_img)

The encoder compresses the input image to a 64-dimensional latent space, while the decoder reconstructs the original 784-dimensional image. This compressed latent space is key to the autoencoder’s ability to learn meaningful representations of data.
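The arithmetic behind this architecture is easy to check by hand. The short sketch below counts the trainable parameters of the four dense layers and the compression ratio of the bottleneck, assuming the layer sizes used in the code above (784, 128, 64, 128, 784) and one bias per output unit.

```python
layer_sizes = [784, 128, 64, 128, 784]  # encoder: 784 -> 128 -> 64, decoder: 64 -> 128 -> 784

# Each Dense layer has (inputs * outputs) weights plus one bias per output
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

# How much smaller the bottleneck is than the input
compression_ratio = layer_sizes[0] / min(layer_sizes)
```

Under these assumptions the bottleneck is 12.25 times smaller than the input, and calling `autoencoder.summary()` in Keras should report the same parameter total.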

Training the Autoencoder
Training the autoencoder involves minimizing the difference between the original input and the reconstructed output.

In [12]:
# Compile and train the autoencoder
autoencoder.compile(optimizer='adam', loss='mse')

# Reload MNIST so the preprocessing from the GAN cell is not applied a second time
(X_train, _), (X_test, _) = mnist.load_data()

# Scale pixels to [0, 1] (matching the decoder's sigmoid output) and flatten 28x28 images to 784-vectors.
# The test set must be flattened too; leaving it as (28, 28) raises a shape ValueError during validation.
X_train = X_train.astype(np.float32).reshape(-1, 784) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 784) / 255.0

# Train the autoencoder
autoencoder.fit(X_train, X_train, epochs=50, batch_size=256, validation_data=(X_test, X_test))


In this case, we use mean squared error (MSE) as the loss function since we want the output to be as close as possible to the original input. The model is trained for 50 epochs, and the performance is validated on a test set.

Conclusion
In this reading, we’ve implemented two powerful deep learning techniques—GANs and Autoencoders. GANs are used to generate new data, while Autoencoders help with data compression and reconstruction. Both of these models are critical in modern AI applications, from generating realistic images to reducing data dimensions. 

Mastering these models not only enhances your ability to handle complex data challenges but also opens up opportunities in fields like image synthesis, data augmentation, and anomaly detection.  

Take the next step by experimenting with different datasets and tweaking these architectures to see how they perform. Start by choosing a dataset you're familiar with—perhaps images from your industry or text data from your field. Implement a basic GAN or Autoencoder on this dataset. Can you generate new, realistic data points or effectively compress and reconstruct your data? Remember, every experiment, successful or not, is a step towards mastering these powerful techniques.

In [5]:
# Simple Autoencoder
input_img = layers.Input(shape=(784,))
encoded = layers.Dense(128, activation='relu')(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)

autoencoder = models.Model(input_img, decoded)

Conclusion
Deep learning techniques such as CNNs, RNNs, GANs, and autoencoders have revolutionized industries such as healthcare, finance, and entertainment. Each technique is suited to specific tasks, from image processing to sequential data analysis, and mastering these architectures is essential for solving complex problems with deep learning. These techniques form the backbone of today’s AI applications, and understanding when to use each model will help you become a more effective deep learning practitioner.