# Autoencoders

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CAMM-UTK/acns-AI-tutorial/blob/main/Intro_Unsupervised_Learning/02_Autoencoders.ipynb)

#### Some setup necessary to run in a Colab environment and get the data

In [None]:
%pip install zenodo_get
!zenodo_get --doi=10.5281/zenodo.12174462
!tar -xzf ./unsupervised_acns_ai_tutorial.tar.gz

%pip install git+https://github.com/agdelma/ml4s.git#egg=ml4s

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append('./include')

import ml4s
ml4s.set_css_style('./include/bootstrap.css')

%config InlineBackend.figure_format = 'svg'
plt.style.use('./include/notebook.mplstyle')
np.set_printoptions(linewidth=120)
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

## Previously

- Principal Component Analysis
- Identifying the low-dimensional latent space which maximally explains the *variance* of the data
- Implementing PCA by hand and with `sklearn`

## Now

- Connection between PCA and autoencoders, a compressive deep neural network architecture
- Application of PCA for clustering

In [None]:
ml4s.draw_network([12,8,4,2,4,8,12],annotate=False )

## Principal Component Analysis (PCA)

Recall that for a given set of unlabelled data $\{ \boldsymbol{x}^{(n)} \}_{n=1}^{N}$ our goal is to project the data onto a latent space having dimensionality $M < D$.  We did this by performing a spectral decomposition of the covariance matrix

\begin{equation}
\Sigma(\mathbf{X}) = \frac{1}{N-1} \mathbf{X}^{\top}\mathbf{X}
\end{equation}

where $\mathbf{X}$ is the  data design matrix: 

\begin{equation}
\mathbf{X} = \left( \begin{array}{cccc}
        x_{1}^{(1)} & x_{2}^{(1)} & \cdots & x_{D}^{(1)} \\
\vdots        &      \vdots    & \ddots & \vdots \\
        x_{1}^{(N)} & x_{2}^{(N)} & \cdots & x_{D}^{(N)} \\
\end{array}
\right)\, .
\end{equation}

We determined:

\begin{equation}
\boldsymbol{V}^\top \Sigma(\mathbf{X}) \boldsymbol{V} = \Lambda
\end{equation}

where $\Lambda_{ij} = \lambda_i \delta_{ij}$ is the diagonal matrix of eigenvalues (the variances along each principal component) and the principal component vectors are encoded as the columns of the orthogonal matrix $\boldsymbol{V}$.

Also recall the *percentage of the explained variance* defined:

\begin{equation}
\text{PCA-j} = \frac{\lambda_j}{\sum_{k=1}^{D} \lambda_k}
\end{equation}

and the projector:

\begin{equation}
\boldsymbol{P} = \sum_{j=1}^M\boldsymbol{v}_j\boldsymbol{v}_j^\mathsf{T}\, .
\end{equation}
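
The recap above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the tutorial data set: the array `X`, the random seed, and the mixing matrix are assumptions made for the example. It centers the data, diagonalizes the covariance matrix, and builds the explained-variance fractions and the rank-$M$ projector.

```python
import numpy as np

# Synthetic, correlated 2D data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X = X - X.mean(axis=0)               # center so that X^T X / (N-1) is the covariance

N, D = X.shape
Sigma = X.T @ X / (N - 1)            # covariance matrix

# eigh returns eigenvalues in ascending order; reorder to descending
lam, V = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

explained = lam / lam.sum()          # fraction of the variance per component

M = 1
P = V[:, :M] @ V[:, :M].T            # projector onto the first M components

print('explained variance:', explained)
print('P is idempotent:', np.allclose(P @ P, P))
```

The idempotency check at the end reflects the defining property of a projector, $\boldsymbol{P}^2 = \boldsymbol{P}$.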


## Neural Networks and Linear Autoencoders

There is a very nice way to interpret PCA as a type of *linear autoencoder* whereby one trains a neural network with a hidden layer (with linear activation) that acts as an **information bottleneck.**  We want to minimize the least-squares error between input and output. The network calculates:

\begin{equation}
\boldsymbol{P} \mathbf{x}^{(n)}
\end{equation}

for each $\mathbf{x}^{(n)}$ and we minimize the cost:

\begin{equation}
\mathcal{C} = \left \langle \mathbf{x}^{(n) \top}  \mathbf{x}^{(n)} - \mathbf{x}^{(n)\top} \boldsymbol{P}\mathbf{x}^{(n)} \right \rangle  = \frac{1}{N} \sum_{n=1}^{N}\left( \mathbf{x}^{(n) \top} \mathbf{x}^{(n)} - \mathbf{x}^{(n) \top} \boldsymbol{P}\mathbf{x}^{(n)} \right) \, .
\end{equation}
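
One way to see that this cost is the mean squared reconstruction error is the following short sketch. It uses only the fact that the projector is symmetric and idempotent, $\boldsymbol{P}^\top = \boldsymbol{P}$ and $\boldsymbol{P}^2 = \boldsymbol{P}$:

\begin{equation}
\left\| \mathbf{x}^{(n)} - \boldsymbol{P}\mathbf{x}^{(n)} \right\|^2 = \mathbf{x}^{(n)\top}\mathbf{x}^{(n)} - 2\,\mathbf{x}^{(n)\top}\boldsymbol{P}\mathbf{x}^{(n)} + \mathbf{x}^{(n)\top}\boldsymbol{P}^\top\boldsymbol{P}\mathbf{x}^{(n)} = \mathbf{x}^{(n)\top}\mathbf{x}^{(n)} - \mathbf{x}^{(n)\top}\boldsymbol{P}\mathbf{x}^{(n)} \, .
\end{equation}

Averaging over $n$ gives $\mathcal{C}$. Note that `nn.MSELoss` with its default reduction also averages over the $D$ components, so its value differs from $\mathcal{C}$ by a constant factor of $1/D$.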

To obtain the first principal component for our example above we consider the linear autoencoder.

In [None]:
ml4s.draw_network([2,1,2])

In [None]:
import torch
import torch.nn as nn

### Setup a GPU device (if available)

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f'You are running on: {device}')

### Create a simple linear autoencoder in `pytorch`

In [None]:
class LinearAutoencoder(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(LinearAutoencoder, self).__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, input_dim)
        self.hidden = None

    def forward(self, x):
        self.hidden = self.encoder(x)
        x = self.decoder(self.hidden)
        return x

### Set some hyperparameters

In [None]:
input_dim = 2
latent_dim = 1
learning_rate = 0.001
num_epochs = 100
batch_size = 32

### Load the data set

In [None]:
x = np.loadtxt('./data/scatter_2d_pca.dat').astype(np.float32)
dataset = torch.tensor(x).to(device)

# DataLoader
train_loader = torch.utils.data.DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True)

### Setup the model

In [None]:
# Model Initialization
model = LinearAutoencoder(input_dim, latent_dim).to(device)

# Validation using MSE Loss function
criterion = nn.MSELoss()

# Using an Adam Optimizer
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)

### Model Training

In [None]:
losses = []
for epoch in range(num_epochs):
    for batch in train_loader:
        # Forward pass
        outputs = model(batch)
        loss = criterion(outputs, batch)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    losses.append(loss.item())
    # Print loss every 10 epochs
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print('Training complete')

In [None]:
# Evaluate the model
with torch.no_grad():
    outputs = model(dataset)
    final_loss = criterion(outputs, dataset)
    
    plt.plot(losses,color=colors[0], linestyle='-',label=f'cost = {final_loss.item():.2f}')
    plt.ylabel("Cost")
    plt.xlabel("Epoch")
    plt.legend()

### Extract the trained weights and biases from the model

The weights are in the format `[encoder_weight, encoder_bias, decoder_weight, decoder_bias]`, i.e. the order in which the layers were registered in the model.


In [None]:
# move parameters to the CPU first so that this also works when training on a GPU
weights = [param.detach().cpu().numpy() for param in model.parameters()]
ml4s.draw_network([input_dim,latent_dim,input_dim], weights=[weights[0],weights[2]], 
                           biases=[weights[1],weights[3]])

### Plot the raw data with the learned decoder weight

In [None]:
plt.scatter(x[:,0], x[:,1], s=1, alpha=0.5, label='data')
_x = np.linspace(-4, 4, 100)

decoder_weight = weights[2]
plt.plot(_x, decoder_weight[1, 0] / decoder_weight[0, 0] * _x, '-', color=colors[0], label=r'$\mathbf{w}_1$')

plt.axis('equal')
plt.xticks([])
plt.yticks([]);
plt.legend()

### Can you get more principal components with this strategy? 

Unlike the eigenvector problem we solved for PCA, there is no guarantee that the learned components will be orthogonal.  See:

[E. Plaut, From Principal Subspaces to Principal Components with Linear Autoencoders, arXiv:1804.10253 (2018)](https://arxiv.org/abs/1804.10253)

for a discussion of how you can re-orthogonalize via a singular value decomposition of the weight matrix.
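
A minimal sketch of the re-orthogonalization idea is given below. The matrix `W` is a hypothetical stand-in for a trained decoder weight matrix of shape `(input_dim, latent_dim)`, not output from this notebook. The thin SVD yields orthonormal columns `U` spanning the same subspace as the columns of `W`. Whether the individual columns of `U` coincide with the principal directions depends on details such as regularization that are discussed in the paper.

```python
import numpy as np

# Hypothetical decoder weight matrix of shape (input_dim, latent_dim);
# in the notebook this role would be played by the trained decoder weights.
W = np.array([[0.9, 0.4],
              [0.3, 0.8]])

# Angle between the two learned directions (columns of W)
cos_angle = W[:, 0] @ W[:, 1] / (np.linalg.norm(W[:, 0]) * np.linalg.norm(W[:, 1]))
print(f'angle between columns: {np.degrees(np.arccos(cos_angle)):.1f} degrees')

# Thin SVD: columns of U are orthonormal and span the same subspace as W
U, S, Vt = np.linalg.svd(W, full_matrices=False)
print('U^T U =\n', np.round(U.T @ U, 12))
```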

### Let's see what happens

In [None]:
# Model Initialization
latent_dim = 2
model = LinearAutoencoder(input_dim, latent_dim).to(device)

# Validation using MSE Loss function
criterion = nn.MSELoss()

# Using an Adam Optimizer (same learning_rate as before)
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)

In [None]:
losses = []
for epoch in range(num_epochs):
    for batch in train_loader:
        # Forward pass
        outputs = model(batch)
        loss = criterion(outputs, batch)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    losses.append(loss.item())
    # Print loss every 10 epochs
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print('Training complete')

In [None]:
# Evaluate the model
with torch.no_grad():
    outputs = model(dataset)
    final_loss = criterion(outputs, dataset)

plt.plot(losses,color=colors[0], linestyle='-',label=f'cost = {final_loss.item():.2f}')
plt.ylabel("Cost")
plt.xlabel("Epoch")
plt.legend()

### Plot the final weights

In [None]:
# move parameters to the CPU first so that this also works when training on a GPU
weights = [param.detach().cpu().numpy() for param in model.parameters()]
decoder_weight = weights[2]

ml4s.draw_network([input_dim,latent_dim,input_dim], weights=[weights[0],weights[2]], 
                           biases=[weights[1],weights[3]])

### Plot the data with the learned (non-orthogonal) latent space

In [None]:
fig,ax = plt.subplots()
ax.scatter(x[:,0],x[:,1], s=1, alpha=0.5,label='data')
_x = np.linspace(-1,1,100)

ax.plot(_x,decoder_weight[1, 0] / decoder_weight[0, 0] * _x, '-', color=colors[0], label=r'$\mathbf{w}_1$')
ax.plot(_x,decoder_weight[1, 1] / decoder_weight[0, 1] * _x, '-', color=colors[-2], label=r'$\mathbf{w}_2$')

ax.axis('equal')
ax.set_xticks([])
ax.set_yticks([])
ax.legend()

<hr>
<div class="span alert alert-success">
<h2>Use Case: Detecting a Phase Transition in the 2D Ising Model </h2>

In a recent <a href="https://journals.aps.org/pre/abstract/10.1103/PhysRevE.107.054104" title="Group-equivariant autoencoder for identifying spontaneously broken symmetries">paper</a> we used a group-equivariant autoencoder for identifying a phase transition. In this exercise you will confirm our analysis.
    
1. Load Ising model configurations from disk and investigate the configurations.
2. Define a non-linear autoencoder model and train.
3. Investigate the properties of the learned latent space representation.
4. Use known labels to confirm that we identify a ferromagnetic phase transition.
</div>

## Load the data from disk

We have $2000$ Ising model configurations for an $80 \times 80$ square lattice at various temperatures obtained via Monte Carlo sampling.  Without the temperature information we can think of this as unlabeled data.

In [None]:
L = 80
configs = np.loadtxt(f'./data/Ising2D_config_L{L}.dat.gz')
temps = np.loadtxt(f'./data/Ising2D_temps_L{L}.dat')

### Plot a random configuration

In [None]:
fig,ax = plt.subplots(ncols=1,nrows=1,figsize=(4,4))

# `high` is exclusive, so this draws a valid index in [0, configs.shape[0])
idx = np.random.randint(low=0,high=configs.shape[0])
img = ax.matshow(configs[idx].reshape(L,L), cmap='binary')
ax.set_xticks([])
ax.set_yticks([])

ax.set_title(f'$T = {temps[idx]:.1f}J$');

### Define a non-linear autoencoder 

In [None]:
class Autoencoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super(Autoencoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.LeakyReLU(),
            nn.Linear(hidden_dim,latent_dim),
            nn.LeakyReLU()
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim,hidden_dim),
            nn.LeakyReLU(),
            nn.Linear(hidden_dim,input_dim)
        )
        self.hidden = None

    def forward(self, x):
        self.hidden = self.encoder(x)
        x = self.decoder(self.hidden)
        return x

### Hyperparameters

In [None]:
input_dim = configs.shape[1]
hidden_dim = 32
latent_dim = 2
learning_rate = 1e-5
num_epochs = 200
batch_size = 32

#### Create the dataloader

In [None]:
dataset = torch.tensor(configs.astype(np.float32)).to(device)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

In [None]:
# Model Initialization
model = Autoencoder(input_dim,hidden_dim,latent_dim).to(device)

# Validation using MSE Loss function
criterion = nn.MSELoss()

# Using an Adam Optimizer
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate,weight_decay = 1e-8)

In [None]:
losses = []
for epoch in range(num_epochs):
    for batch in train_loader:
        # Forward pass
        outputs = model(batch)
        loss = criterion(outputs, batch)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    losses.append(loss.item())
    # Print loss every 10 epochs
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print('Training complete')

### Investigate the training history

In [None]:
# Evaluate the model
with torch.no_grad():
    outputs = model(dataset)
    final_loss = criterion(outputs, dataset)
    
plt.plot(losses,color=colors[0], linestyle='-',label=f'cost = {final_loss.item():.2f}')
plt.ylabel("Cost")
plt.xlabel("Epoch")
plt.legend()

### Plot the latent-space

In [None]:
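with torch.no_grad():
    # encode the full data set explicitly rather than relying on the
    # activations cached in model.hidden by the last forward pass
    latent_x = model.encoder(dataset).cpu().numpy()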
    
fig,ax = plt.subplots(1)
ax.scatter(latent_x[:,0],latent_x[:,1], s=5)
ax.set_ylabel('latent-2')
ax.set_xlabel('latent-1')

### We actually have labels!

We know the temperatures where each configuration was measured, so we can add temperature labels to the data to see what the latent space is doing.

In [None]:
fig,ax = plt.subplots(1)
sc = ax.scatter(latent_x[:,0],latent_x[:,1], s=5, c=temps, cmap='Spectral_r')
ax.set_ylabel(r'$latent-2$')
ax.set_xlabel(r'$latent-1$')

fig.colorbar(sc, ax=ax, label='Temperature / J')

### Investigate what we have learned

In [None]:
fig,ax = plt.subplots(1)
ax.plot(temps,latent_x[:,0], 'o', ms=1, alpha=0.1, label='Raw Autoencoder')
ax.set_xlabel('Temperature $T/J$')
ax.set_ylabel('Magnetization')
ax.legend()

<div class="span alert alert-info" role="alert">
    <h3>We have discovered something that looks like the magnetization!</h3>
</div>

However, note that there is no *physics* in our autoencoder.  We have not input the fact that the magnetization needs to be normalized to 1. Let's do this and compare with the exact magnetization:

\begin{equation}
m=\left[1-\left(\sinh \frac{2J}{k_{\rm B}T}\right)^{-4}\right]^{\frac {1}{8}}
\end{equation}

In [None]:
def magnetization_exact_(T):
    '''We use units where J/k_B = 1.'''
    Tc = 2.0/np.log(1.0+np.sqrt(2.0))
    if T < Tc:
        return (1.0 - np.sinh(2.0/T)**(-4))**(1.0/8)
    else:
        return 0.0
magnetization_exact = np.vectorize(magnetization_exact_)
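
As a quick sanity check, the critical temperature used above evaluates to $T_c = 2/\ln(1+\sqrt{2}) \approx 2.269$ in units where $J/k_{\rm B} = 1$. The sketch below also shows an array-based alternative to `np.vectorize`. The name `magnetization_exact_where` is hypothetical and introduced only for this illustration; it assumes strictly positive temperatures.

```python
import numpy as np

Tc = 2.0 / np.log(1.0 + np.sqrt(2.0))
print(f'Tc = {Tc:.4f}')

def magnetization_exact_where(T):
    '''Hypothetical array-based variant; units where J/k_B = 1, T > 0.'''
    T = np.asarray(T, dtype=float)
    # clip the bracket at zero so the fractional power only sees non-negative values
    inner = np.clip(1.0 - np.sinh(2.0 / T) ** (-4), 0.0, None)
    # the magnetization vanishes at and above the critical temperature
    return np.where(T < Tc, inner ** (1.0 / 8), 0.0)

print(magnetization_exact_where([1.0, 2.0, Tc, 3.0]))
```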

### Normalize the raw latent space values

In [None]:
latent_m = latent_x[:,0]
m = np.zeros_like(latent_m)

# find the maximum value and normalize
idx_pos = np.where(latent_m>0)
m[idx_pos] = latent_m[idx_pos]/np.max(latent_m)

# find the minimum value and normalize
idx_neg = np.where(latent_m<0)
m[idx_neg] = latent_m[idx_neg]/np.min(latent_m)

### Plot and Compare

In [None]:
# plot and compare
fig,ax = plt.subplots(1)

ax.plot(temps,m, 'o', ms=2, alpha=0.1, label='Norm. Autoencoder')
_T = np.linspace(1,3,1000)
ax.plot(_T,magnetization_exact(_T), color=colors[0], zorder=-1, lw=1, label='Exact')

ax.set_xlabel('Temperature $T/J$')
ax.set_ylabel('Magnetization')

ax.legend()
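
As an additional cross-check, the magnetization per site can be computed directly from the spin configurations and compared against the normalized latent variable. The sketch below uses small synthetic configurations and assumes that spins are stored as $\pm 1$ in flattened rows; this is an assumption about the data encoding, and `configs_demo` and `L_demo` are hypothetical names.

```python
import numpy as np

# Synthetic stand-in for `configs`: 3 flattened configurations of an L x L lattice,
# with spins assumed to be stored as +1/-1 (an assumption about the data encoding).
rng = np.random.default_rng(1)
L_demo = 8
ordered = np.ones(L_demo * L_demo)                           # all spins up
flipped = -np.ones(L_demo * L_demo)                          # all spins down
disordered = rng.choice([-1.0, 1.0], size=L_demo * L_demo)   # random spins
configs_demo = np.stack([ordered, flipped, disordered])

# Magnetization per site for each configuration, and its absolute value
m_direct = configs_demo.mean(axis=1)
print(np.abs(m_direct))
```

Taking the absolute value removes the sign ambiguity between the two symmetry-related ordered states, which is the same role played by the separate positive and negative normalization above.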

<div class="span alert alert-info" role="alert">
    <h3>It Works!</h3>
</div>