# **Unsupervised Learning in Keras**

**What is Unsupervised Learning?**

A **machine learning technique** in which the **algorithm analyzes data** to identify **hidden structures and patterns**, **without** using **predetermined labels** or **output variables**.

**Unlike supervised learning**, where the **goal is to predict a well-defined outcome**, in unsupervised learning the main goal is to **understand the intrinsic structure of the data**, discovering hidden correlations and natural groupings.

The **three main categories of Unsupervised Learning**

1. **Clustering**
It consists of **grouping similar data into clusters**, such that the data points in the **same group are more similar to each other** than those in other groups.

2. **Association**
It consists of **identifying relationships between variables** within **large datasets**.
Commonly used in market analysis to discover products that are frequently bought together (e.g., bread and milk).

3. **Dimensionality Reduction**
It reduces the **number of variables required to describe the data**, while retaining most of the original information.
Useful **when datasets have a very large number of variables** that can complicate and slow down the analysis.
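To make the clustering and dimensionality-reduction categories concrete, here is a minimal NumPy sketch (an illustration, not a Keras API): k-means with a simple farthest-point initialization on two synthetic 2-D blobs, and PCA via SVD that projects 5-D data onto 2 components. The helper names `k_means` and `pca` and all the data are hypothetical.

```python
import numpy as np

def k_means(x, k, n_iters=20, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from one random point, then repeatedly add the point farthest from the chosen centroids
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        dist_to_nearest = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[dist_to_nearest.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids, labels

def pca(x, n_components):
    """Project x onto its top principal components; also return their explained-variance ratios."""
    x_centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value
    _, s, vt = np.linalg.svd(x_centered, full_matrices=False)
    explained_ratio = s ** 2 / np.sum(s ** 2)
    return x_centered @ vt[:n_components].T, explained_ratio[:n_components]

rng = np.random.default_rng(42)
# Clustering: two well-separated 2-D blobs of 50 points each
blobs = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centroids, labels = k_means(blobs, k=2)
print(np.sort(centroids[:, 0]))  # close to [0, 5]

# Dimensionality reduction: 5-D data that really lies near a 2-D plane
latent = rng.normal(size=(200, 2))
data_5d = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
projected, ratio = pca(data_5d, n_components=2)
print(projected.shape, ratio.sum())  # (200, 2) and a ratio close to 1
```

Two components keep almost all of the variance here because the data was built from a 2-D latent source; on real data the explained-variance ratio tells you how much information the reduction discards.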

## **Autoencoder**
A special **type of artificial neural network** used predominantly to **learn efficient representations of data**, reducing dimensionality or **extracting relevant features**.

**Structure of an Autoencoder:**

- **Encoder**: compresses **(encodes)** the **input data** into a **reduced representation (latent space)**.
- **Bottleneck**: the compressed representation itself (the latent code).
- **Decoder**: reconstructs the original data from the compressed representation.

The **autoencoder** is trained by minimizing the **difference between the original input and its reconstruction**. This **forces the model to capture the most significant features** of the data, discarding redundant or noisy information.
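A minimal NumPy sketch of that objective, assuming one dense layer for the encoder (784 → 64, ReLU) and one for the decoder (64 → 784, sigmoid), mirroring the Keras model later in this notebook. The weights are random and untrained and the batch is synthetic; the point is only to show how the reconstruction error is computed, both as mean squared error and as the binary cross-entropy the Keras model uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy batch: 8 "images" flattened to 784 values in [0, 1]
x = rng.uniform(0.0, 1.0, size=(8, 784))

# Randomly initialized (untrained) encoder and decoder weights
w_enc, b_enc = rng.normal(0, 0.05, (784, 64)), np.zeros(64)
w_dec, b_dec = rng.normal(0, 0.05, (64, 784)), np.zeros(784)

def encode(x):
    # Encoder: compress 784 inputs into a 64-dim code (ReLU)
    return np.maximum(0.0, x @ w_enc + b_enc)

def decode(z):
    # Decoder: map the 64-dim code back to 784 values in (0, 1) (sigmoid)
    return 1.0 / (1.0 + np.exp(-(z @ w_dec + b_dec)))

code = encode(x)
x_hat = decode(code)

# Reconstruction errors that training would minimize
mse = np.mean((x - x_hat) ** 2)
eps = 1e-7  # avoid log(0)
bce = -np.mean(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps))
print(code.shape, x_hat.shape)  # (8, 64) (8, 784)
print(mse, bce)
```

Training would adjust `w_enc`, `b_enc`, `w_dec` and `b_dec` by gradient descent to drive these numbers down; Keras does exactly that when `fit` is called with the input as both data and target.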

## **Generative Adversarial Networks (GANs)**
**GANs** are a class of **generative models** introduced by Ian Goodfellow and colleagues in 2014. They **use two neural networks that compete with each other** in a **"game"**.

- **Generator**:

**Produces new synthetic data**, trying to **mimic the real data** from the training dataset as closely as possible.

- **Discriminator**:

**Evaluates** whether the provided **data (real or synthetic) is authentic or not**.

The **generator** continually **tries to "trick" the discriminator**, producing increasingly realistic data.
The **discriminator**, on the other hand, **tries to get better at distinguishing real data from artificially generated data**.

## **[Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114)**

How can we perform efficient **inference** and learning in directed probabilistic models with continuous latent variables, when the posterior distribution is intractable and the datasets are large?

- **Inference**: in this paper, estimating the latent variables that explain an observation, i.e. the posterior distribution $p(z \mid x)$. (More generally, "inference" also means using a trained model to make predictions on new, unseen data.)
- **Latent Variables**: **variables that are not directly observable** but are **inferred through a mathematical model** from the observable variables.

### **Fundamental Concepts**

Below is a summary of the paper **"Auto-Encoding Variational Bayes" (Kingma & Welling, 2013-2022)** with its fundamental concepts and key formulas.
## What does a **VAE** (Variational AutoEncoder) do, in practice?

### 1. **Encoder**
- It takes an input (e.g., an image)
- Instead of returning a single point in the compressed space (as a standard autoencoder does),
**it returns the parameters of a distribution**:
a **mean** $\mu$ and a **standard deviation** $\sigma$

---

### 2. **Latent Space**
- Instead of a single deterministic code, the model **samples a random point** $z$ from the distribution $\mathcal{N}(\mu, \sigma^2)$
- This latent space is a **probabilistic representation** of the data:
it encodes its main characteristics but also leaves room for uncertainty.

---

### 3. **Reparameterization Trick**
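- The trick is to write:
$$
z = \mu + \sigma \cdot \epsilon \quad \text{with} \quad \epsilon \sim \mathcal{N}(0, I)
$$
- All the randomness now lives in $\epsilon$, which does not depend on the network's parameters, so gradients can flow through $\mu$ and $\sigma$ via **backpropagation**.
(Sampling $z$ directly from $\mathcal{N}(\mu, \sigma^2)$ would block the gradient, and the network could not be trained with gradient descent.)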
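A small NumPy check of why this works, for one latent dimension with hypothetical encoder outputs `mu` and `sigma`. Because $z$ is a deterministic function of $(\mu, \sigma, \epsilon)$, the chain rule gives $\partial z / \partial \mu = 1$ and $\partial z / \partial \sigma = \epsilon$. The example objective $f(z) = z^2$ is an assumption chosen because its expectation is known exactly, $\mathbb{E}[z^2] = \mu^2 + \sigma^2$, so the true gradients are $2\mu$ and $2\sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder outputs for one latent dimension (hypothetical values)
mu, sigma = 1.5, 0.8

# Reparameterization: all randomness lives in eps, which does not depend on mu or sigma
eps = rng.standard_normal(100_000)
z = mu + sigma * eps

# Chain rule through z = mu + sigma * eps, for the example objective f(z) = z**2
df_dz = 2 * z
grad_mu = np.mean(df_dz * 1.0)      # dz/dmu = 1
grad_sigma = np.mean(df_dz * eps)   # dz/dsigma = eps
print(grad_mu, 2 * mu)         # both close to 3.0
print(grad_sigma, 2 * sigma)   # both close to 1.6
```

The Monte Carlo estimates match the exact gradients, which is what lets a VAE be trained with ordinary backpropagation despite the sampling step.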

---

### 4. **Decoder**
- Takes the point $z$ from the latent space
- Tries to **reconstruct the original input**
- The goal is that the result resembles the input as much as possible

---

### 5. **Loss Function = Reconstruction + Regularization**

$$
\text{Loss} = \underbrace{\text{Reconstruction error}}_{\text{how well I recreate } x} + \underbrace{D_{KL}\big(q(z \mid x) \parallel p(z)\big)}_{\text{how "normal" the encoding is}}
$$

- The **first part** (reconstruction) measures how well the network has regenerated the input.
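- The **second part** (KL divergence) pushes the encoder's distribution $q(z \mid x)$ toward the prior $p(z) = \mathcal{N}(0, I)$, so the latent space stays regular and continuous.
- Minimizing this loss is equivalent to maximizing the variational lower bound (ELBO) used in the paper.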
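A minimal NumPy sketch of this loss, under two assumptions not stated above: the encoder outputs the log-variance $\log \sigma^2$ (a common choice for numerical stability), and pixels lie in $[0, 1]$ so the reconstruction term is binary cross-entropy (a Bernoulli-style decoder). For a Gaussian $q(z \mid x)$ and the prior $\mathcal{N}(0, I)$ the KL term has the closed form $-\tfrac{1}{2}\sum_j \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)$; the sketch checks it against a Monte Carlo estimate. The function names and the data are hypothetical.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

def vae_loss(x, x_hat, mu, log_var, eps=1e-7):
    """Reconstruction (binary cross-entropy) + KL regularizer, averaged over the batch."""
    recon = -np.sum(x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps), axis=-1)
    return np.mean(recon + kl_to_standard_normal(mu, log_var))

rng = np.random.default_rng(0)

# Sanity check of the closed form against a Monte Carlo estimate of E_q[log q(z|x) - log p(z)]
mu, log_var = np.array([0.5, -1.0]), np.array([0.2, -0.5])
sigma = np.exp(0.5 * log_var)
z = mu + sigma * rng.standard_normal((200_000, 2))  # reparameterized samples from q(z|x)
log_q = -0.5 * np.sum(((z - mu) / sigma) ** 2 + np.log(2 * np.pi) + log_var, axis=-1)
log_p = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi), axis=-1)
kl_closed = kl_to_standard_normal(mu, log_var)
kl_mc = np.mean(log_q - log_p)
print(kl_closed, kl_mc)  # the two values closely agree

# Loss on a hypothetical batch of 4 flattened images and decoder outputs
x = rng.uniform(0.0, 1.0, (4, 784))
x_hat = rng.uniform(0.01, 0.99, (4, 784))
print(vae_loss(x, x_hat, rng.normal(size=(4, 2)), rng.normal(size=(4, 2))))
```

Note that the KL term is exactly zero when the encoder outputs $\mu = 0$ and $\log \sigma^2 = 0$, i.e. when $q(z \mid x)$ equals the prior.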

---

## In short, for dummies:

> A **VAE** is like a **smart compressor**:
> - it takes your data,
> - it summarizes it in a **cloud of possibilities** (a distribution),
> - from there it extracts a point,
> - and **tries to reconstruct** what it had at the beginning.
>
> The trick is that this probabilistic encoding **can also be used to generate new realistic data**.
> And thanks to a "math trick" (reparameterization trick), we can train it like a normal neural network.


## **"[Generative Adversarial Nets](https://arxiv.org/pdf/1406.2661)" by Ian Goodfellow et al. (2014), where GANs were introduced**

## WHAT ARE GANs?

A **GAN (Generative Adversarial Network)** is like a **two-player game** between two neural networks:

- **Discriminator (D)**: tries to figure out if an image is **real** (taken from real data) or **fake** (generated).
- **Generator (G)**: tries to **trick** the discriminator, creating fake images so realistic that they look real.

Imagine G as a counterfeiter printing fake money, and D as a police officer trying to catch them.

---

## HOW IT WORKS (Key Steps)

1. G takes a **random vector** (noise $z \sim p(z)$)
and transforms it into a **fake** image $G(z)$

2. D receives **both real and fake images**
and tries to figure out if they are real or generated.

3. The two networks are **trained together**, in alternation:

- D learns to **recognize the fakes**.
- G learns to **trick D**.

---

## BASIC FORMULA: The Minimax Game

The goal is:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]
$$

### What this means:
- The discriminator **maximizes** $V$, i.e. the probability of assigning the **correct label** to both real and generated samples.
- The generator **minimizes** $V$, i.e. the probability that D recognizes its "fakes".

In theory, the game ends when D **cannot distinguish** real from fake anymore → $D(x) = 1/2$
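A minimal NumPy sketch of the value function, estimated from hypothetical discriminator outputs on a batch of real samples ($D(x)$) and a batch of generated ones ($D(G(z))$). It shows that a confident, correct discriminator pushes $V$ toward its maximum of $0$, while at equilibrium ($D = 1/2$ everywhere) $V = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4$.

```python
import numpy as np

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G).

    d_real: D(x) on a batch of real samples; d_fake: D(G(z)) on a batch of generated ones.
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident, correct discriminator: V is close to its maximum of 0
v_good_d = value_function(np.full(64, 0.99), np.full(64, 0.01))
print(v_good_d)  # about -0.02

# At equilibrium D outputs 1/2 everywhere, so V = -log 4
v_eq = value_function(np.full(64, 0.5), np.full(64, 0.5))
print(v_eq, -np.log(4))  # both about -1.386
```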

---

## A TRICK: Alternative Objective for G

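At the beginning of training G is poor, so D rejects its samples with high confidence ($D(G(z)) \approx 0$). In this regime $\log(1 - D(G(z)))$ saturates and gives G a very weak gradient.

So, instead of minimizing:

$$
\log(1 - D(G(z)))
$$

the generator can directly **maximize**:

$$
\log D(G(z))
$$

Both objectives lead to the same fixed point of the G/D dynamics, but the second one gives G much stronger gradients early in learning.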
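A short NumPy illustration of the saturation, assuming (as in the Keras discriminator in this notebook) that $D$ ends with a sigmoid, so $D(G(z)) = \text{sigmoid}(a)$ for a logit $a$. The derivatives with respect to $a$ are $\frac{d}{da}\log(1 - \text{sigmoid}(a)) = -\text{sigmoid}(a)$ and $\frac{d}{da}\log \text{sigmoid}(a) = 1 - \text{sigmoid}(a)$; the gradient with respect to G's parameters is each of these multiplied by the same $\partial a / \partial \theta_G$, so their ratio carries over.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Discriminator logits on generated samples: very negative early in training,
# i.e. D(G(z)) close to 0 (D confidently rejects the fakes)
logits = np.array([-8.0, -4.0, 0.0, 4.0])
d_fake = sigmoid(logits)

# Gradients of each generator objective with respect to the logit a
grad_saturating = -d_fake           # d/da log(1 - sigmoid(a))
grad_non_saturating = 1.0 - d_fake  # d/da log(sigmoid(a))

for a, p, g1, g2 in zip(logits, d_fake, grad_saturating, grad_non_saturating):
    print(f"a={a:+.0f}  D(G(z))={p:.4f}  |grad log(1-D)|={abs(g1):.4f}  |grad log D|={abs(g2):.4f}")
```

When D confidently rejects a fake ($a = -8$), the original objective gives almost no signal while the alternative gives a gradient close to 1; the situation reverses only once G is already fooling D.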

---

## HOW DO YOU TRAIN GANs? (Basic Algorithm)

1. Take a **minibatch** of real data $x$ and a minibatch of **random noise** $z$.
2. Update D to **distinguish** them (maximize the probability of outputting "real" for the real samples and "fake" for the generated ones).
3. Then update G to **trick D**.

Repeat, alternating **k steps for D** and **1 step for G** (the paper uses $k = 1$ in its experiments).

---

## WHAT HAPPENS WHEN IT WORKS?

- The generator **learns the distribution of the real data** → $p_g = p_{\text{data}}$
- The discriminator **cannot distinguish them anymore**: it outputs $D(x) = 1/2$ everywhere
- The system is in equilibrium.

---

## OPTIMAL EQUILIBRIUM FORMULA

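For a **fixed generator** G, the optimal discriminator is:

$$
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
$$

Plugging $D^*$ back into $V(D, G)$, the function that G minimizes becomes

$$
C(G) = -\log 4 + 2 \cdot \mathrm{JSD}(p_{\text{data}} \parallel p_g)
$$

where JSD is the **Jensen–Shannon divergence** between $p_{\text{data}}$ and $p_g$. Since the JSD is non-negative and zero only when the two distributions are **equal**, the global minimum $C(G) = -\log 4$ is reached exactly when $p_g = p_{\text{data}}$ (and then $D^*(x) = 1/2$).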
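A numerical check of this result on two hypothetical discrete distributions over four outcomes, where expectations become exact sums. It computes $D^*$, evaluates $V(D^*, G)$ directly, and compares it with $-\log 4 + 2 \cdot \mathrm{JSD}$.

```python
import numpy as np

def kl(p, q):
    # KL divergence between two discrete distributions with positive entries
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    # Jensen-Shannon divergence: average KL to the mixture m = (p + q) / 2
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two hypothetical distributions over 4 discrete outcomes
p_data = np.array([0.1, 0.4, 0.3, 0.2])
p_g = np.array([0.25, 0.25, 0.25, 0.25])

# Optimal discriminator for this fixed generator
d_star = p_data / (p_data + p_g)

# Value of the game at D*, computed directly from V(D, G)
c_g = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))
print(c_g, -np.log(4) + 2 * jsd(p_data, p_g))  # equal up to floating-point rounding

# When p_g matches p_data, D* is 1/2 everywhere
d_equal = p_data / (p_data + p_data)
print(d_equal)  # [0.5 0.5 0.5 0.5]
```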

---

## VISUAL EXAMPLES (experiments)

The authors tested GANs on:

- **MNIST** (handwritten digits)
- **TFD** (Toronto Face Database, human faces)
- **CIFAR-10** (small natural images)

The generator never sees the training images directly: it receives only noise as input and learns through the discriminator's gradients.

---

## GAN PROS AND CONS

### Pros:
- **No inference is needed during learning**, just backpropagation.
- No **Markov chains** are needed, which makes training simpler.
- Can represent very **sharp** distributions, producing **realistic samples**.

### Cons:
- G and D must be well **synchronized** during training: if G is trained too much without updating D, it can map many values of $z$ to the same output.
- This failure is known as **mode collapse**: the generator keeps producing the same few types of data.

---

## IMPORTANT EXTENSIONS

1. **Conditional GAN (cGAN)**: adds labels as input to control the output.
2. **Inference nets**: you can train a network to predict $z$ from $x$, to map data into the latent space.
3. **Interpolation in z-space** → creates smooth transitions between images.
4. **Semi-supervised learning** → uses the discriminator as a feature extractor.
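Interpolation in z-space can be sketched in a few lines of NumPy: walk along a straight line between two noise vectors and feed every intermediate point to the generator. The helper `interpolate_latents` is a hypothetical name, and linear interpolation is an assumption here (spherical interpolation is a common alternative for Gaussian noise). The 100-dimensional noise size matches the generator built later in this notebook.

```python
import numpy as np

def interpolate_latents(z_start, z_end, n_steps):
    """Linearly interpolate between two latent vectors (n_steps points, endpoints included)."""
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - alphas) * z_start + alphas * z_end

rng = np.random.default_rng(0)
# Two random noise vectors of size 100
z_a, z_b = rng.normal(0, 1, 100), rng.normal(0, 1, 100)
path = interpolate_latents(z_a, z_b, n_steps=8)
print(path.shape)  # (8, 100)
# Hypothetical usage with a trained Keras generator:
# images = generator.predict(path)  # 8 images morphing from G(z_a) to G(z_b)
```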

---

## In short, for dummies:

> **GANs** are a **game between two neural networks**:
> one creates fake data, the other tries to unmask it.
> The forger gets better until his creations look real,
> the policeman gets better at distinguishing them.
> Eventually, the forger gets so good that the policeman **can no longer tell** what is real and what is not.


## **Creating an AUTOENCODER IN KERAS**

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model 

In [2]:
# Define the Encoder: compress the 784-pixel input into a 64-dimensional code
input_layer = Input(shape=(784, ))
encoded = Dense(64, activation="relu")(input_layer)
# Define the Decoder: reconstruct the 784 pixels (sigmoid keeps outputs in [0, 1])
decoded = Dense(784, activation="sigmoid")(encoded)
# Combine the Encoder and Decoder into an AutoEncoder Model
model = Model(input_layer, decoded)
# Compile the model (binary cross-entropy suits pixel values scaled to [0, 1])
model.compile(optimizer="adam", loss="binary_crossentropy")
# Summary of the Model
model.summary()

In [3]:
# Load the MNIST data
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values from [0, 255] to [0, 1]
x_train = x_train.astype("float32")/255
x_test = x_test.astype("float32")/255
# Flatten each 28x28 image into a 784-dimensional vector
x_train = x_train.reshape((len(x_train), 784))
x_test = x_test.reshape((len(x_test), 784))

In [4]:
# Train the autoencoder
model.fit(x_train, x_train, epochs=50, batch_size=256, shuffle=True, validation_data=(x_test, x_test))

Epoch 1/50
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 7ms/step - loss: 0.3480 - val_loss: 0.1630
Epoch 2/50
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step - loss: 0.1541 - val_loss: 0.1279
Epoch 3/50
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step - loss: 0.1239 - val_loss: 0.1090
Epoch 4/50
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step - loss: 0.1071 - val_loss: 0.0979
Epoch 5/50
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step - loss: 0.0970 - val_loss: 0.0907
Epoch 6/50
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step - loss: 0.0904 - val_loss: 0.0857
Epoch 7/50
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step - loss: 0.0855 - val_loss: 0.0818
Epoch 8/50
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - loss: 0.0818 - val_loss: 0.0792
Epoch 9/50
[1m235/235[0m [32m━━━━━━━━

<keras.src.callbacks.history.History at 0x1ee60d2c430>

## **Creating a GAN IN KERAS**

In [5]:
from tensorflow.keras.layers import LeakyReLU
import numpy as np

In [6]:
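# Define the generator model: maps a 100-dim noise vector to a 784-pixel image
def build_generator():
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(100,)))
    model.add(Dense(128))
    model.add(LeakyReLU(alpha=0.01))
    model.add(Dense(784, activation="tanh"))  # pixel values in [-1, 1]
    return model

# Define the discriminator model: maps a 784-pixel image to P(image is real)
def build_discriminator():
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(784,)))
    model.add(Dense(128))
    model.add(LeakyReLU(alpha=0.01))
    model.add(Dense(1, activation="sigmoid"))
    return model

# Build and compile the discriminator; "accuracy" provides d_loss[1] in train_gan
discriminator = build_discriminator()
discriminator.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Combined model G -> D, used to train the generator.
# D's weights must be frozen (discriminator.trainable = False) while this model trains.
generator = build_generator()
gan = tf.keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")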

In [10]:
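def train_gan(gan, generator, discriminator, x_train, epochs=400, batch_size=128):
    # The generator uses tanh, so rescale real images from [0, 1] to [-1, 1]
    # (assumes x_train was normalized to [0, 1] as in the autoencoder cell)
    x_train = x_train * 2.0 - 1.0

    # Each "epoch" here is a single training step on one minibatch
    for epoch in range(epochs):
        # Generate random noise as input for the generator
        noise = np.random.normal(0, 1, (batch_size, 100))
        generated_images = generator.predict(noise, verbose=0)

        # Sample a random minibatch of real images
        idx = np.random.randint(0, x_train.shape[0], batch_size)
        real_images = x_train[idx]

        # Labels for real and fake images
        real_labels = np.ones((batch_size, 1))
        fake_labels = np.zeros((batch_size, 1))

        # Train the discriminator (unfrozen for this step)
        discriminator.trainable = True
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(generated_images, fake_labels)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

        # Train the generator through the combined model with D frozen,
        # labelling fakes as "real" (the alternative objective: maximize log D(G(z)))
        discriminator.trainable = False
        noise = np.random.normal(0, 1, (batch_size, 100))
        g_loss = gan.train_on_batch(noise, real_labels)

        # Print the progress (d_loss[1] requires metrics=["accuracy"] on the discriminator)
        if epoch % 10 == 0:
            print(f"{epoch} [D loss: {d_loss[0]:.4f}, acc.: {100 * d_loss[1]:.2f}%] [G loss: {g_loss}]")

    # Return the losses of the last step (outside the loop, so every epoch runs)
    return d_loss, g_loss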