# Autoencoder with PyTorch



Autoencoders are a type of neural network used for unsupervised learning of efficient codings. The goal of an autoencoder is to learn a compact representation (encoding) of a set of data, typically for the purpose of dimensionality reduction. Unlike classifiers, which are trained to predict labels, autoencoders are trained to encode the input data and then reconstruct it from the encoding, so no labels are required.

Here, I'll explore the use of an autoencoder for CIFAR10 image classification. The hypothesis is that by learning efficient representations of CIFAR10 images, the network can achieve better classification accuracy than an MLP or a CNN, especially when coupled with a classifier on top of the encoded representations.


#### Autoencoder

In [None]:
# Load required libraries
import numpy as np                 # used by imshow
import matplotlib.pyplot as plt    # used by imshow
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms

#### Image Preprocessing

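Before the CIFAR10 images are fed into the neural network, they are preprocessed with 'torchvision.transforms', a collection of common image transformations for PyTorch models.

'transforms.Compose' chains multiple transformations together.
 Both pipelines below apply two core transformations:
   1. 'transforms.ToTensor()' converts PIL images or NumPy ndarrays into PyTorch tensors (for 8-bit images such as CIFAR10, with values scaled to [0, 1]).
   2. 'transforms.Normalize()' normalizes each channel of the tensor image with a given mean and standard deviation.

 In this case, we normalize all three color channels (R, G, B) with mean 0.5 and std 0.5, which maps pixel values from [0, 1] to [-1, 1].

The training pipeline additionally applies data augmentation before these two steps: 'transforms.RandomHorizontalFlip' flips an image with probability 0.5, and 'transforms.RandomAffine' applies a random rotation of up to 5 degrees, a translation of up to 10% of the image size, and a scale factor between 0.9 and 1.1.

Finally, a utility function 'imshow' is defined to visualize the images.
It takes a tensor image, unnormalizes it, converts it to a NumPy array, and then uses matplotlib to display the image.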




In [None]:
# image preprocessing
train_transform = transforms.Compose(
    [transforms.RandomHorizontalFlip(p = 0.5),
     transforms.RandomAffine(degrees=(-5, 5), translate=(0.1, 0.1), scale=(0.9, 1.1)),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)

test_transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

#### Define the autoencoder architecture

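This section implements the autoencoder as a small convolutional network. The encoder uses three convolutional layers to compress a 3 x 32 x 32 image into a 64 x 2 x 2 encoding, and the decoder mirrors it with three transposed convolutions that expand the encoding back to 3 x 32 x 32.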
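
The spatial sizes noted in the code comments below can be checked by hand. The following sketch applies the standard output-size formulas for 'nn.Conv2d' and 'nn.ConvTranspose2d' (assuming dilation 1) to the layer settings used in this notebook. The helper names `conv_out` and `conv_transpose_out` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of nn.Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of nn.ConvTranspose2d (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Encoder: 32 -> 16 -> 8 -> 2
e1 = conv_out(32, kernel=3, stride=2, padding=1)
e2 = conv_out(e1, kernel=3, stride=2, padding=1)
e3 = conv_out(e2, kernel=7)

# Decoder: 2 -> 8 -> 16 -> 32
d1 = conv_transpose_out(e3, kernel=7)
d2 = conv_transpose_out(d1, kernel=3, stride=2, padding=1, output_padding=1)
d3 = conv_transpose_out(d2, kernel=3, stride=2, padding=1, output_padding=1)

print([e1, e2, e3], [d1, d2, d3])  # [16, 8, 2] [8, 16, 32]
```

With 64 channels at 2 x 2, the encoding holds 256 values, compared with 3 x 32 x 32 = 3072 values in the input image.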

In [None]:
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), # output: 16 x 16 x 16
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), # output: 32 x 8 x 8
            nn.ReLU(),
            nn.Conv2d(32, 64, 7) # output: 64 x 2 x 2
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 7), # output: 32 x 8 x 8
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), # output: 16 x 16 x 16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1), # output: 3 x 32 x 32
            nn.Sigmoid() # values in (0, 1); note that the inputs are normalized to [-1, 1]
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

In [None]:
# Initialize the autoencoder
autoencoder = Autoencoder()

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters(), lr=0.001)

#### Data Preparation

In [None]:
batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)


#### Training

The autoencoder is trained for 10 epochs to minimize the reconstruction error. In each epoch the training data is processed in batches: the MSE loss between each batch of input images and its reconstructions is backpropagated, and the optimizer updates the model's weights. The class labels are ignored. The average loss per epoch is printed to monitor how the reconstruction improves.

In [None]:
# Training loop
for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, _ = data
        optimizer.zero_grad()
        outputs = autoencoder(inputs)
        loss = criterion(outputs, inputs)  # Reconstruction loss
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {running_loss / len(trainloader):.3f}')

print('Finished Training Autoencoder')

Epoch 1, Loss: 0.262
Epoch 2, Loss: 0.243
Epoch 3, Loss: 0.242
Epoch 4, Loss: 0.241
Epoch 5, Loss: 0.240
Epoch 6, Loss: 0.239
Epoch 7, Loss: 0.239
Epoch 8, Loss: 0.239
Epoch 9, Loss: 0.238
Epoch 10, Loss: 0.238
Finished Training Autoencoder


#### Performance Testing

To evaluate the autoencoder on the CIFAR10 test set, we measure how well it reconstructs the input images. Since autoencoders are typically used for tasks like dimensionality reduction or feature learning rather than classification, classification accuracy is not directly applicable. Instead, we use the reconstruction error to measure performance, for which Mean Squared Error (MSE) is a common choice.
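
As a small illustration of the metric, the sketch below compares 'nn.MSELoss' with a manual computation on random stand-in tensors (not real images or reconstructions), and shows how 'reduction="none"' yields one MSE value per image. The variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
images = torch.rand(4, 3, 32, 32) * 2 - 1   # stand-in batch with values in [-1, 1)
outputs = torch.rand(4, 3, 32, 32)          # stand-in reconstructions in [0, 1)

# Batch-level MSE: the default reduction='mean' averages over every element.
batch_mse = nn.MSELoss()(outputs, images)
manual_mse = ((outputs - images) ** 2).mean()

# Per-image MSE: keep every squared error, then average over C, H, W only.
squared_errors = nn.MSELoss(reduction="none")(outputs, images)
per_image_mse = squared_errors.mean(dim=(1, 2, 3))

print(batch_mse.item(), manual_mse.item())
print(per_image_mse.shape)  # torch.Size([4])
```

Because all images have the same number of pixels, the mean of the per-image values equals the batch-level MSE.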

In [None]:
# Load the test dataset
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

# Switch the model to evaluation mode
autoencoder.eval()

Files already downloaded and verified


Autoencoder(
  (encoder): Sequential(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (3): ReLU()
    (4): Conv2d(32, 64, kernel_size=(7, 7), stride=(1, 1))
  )
  (decoder): Sequential(
    (0): ConvTranspose2d(64, 32, kernel_size=(7, 7), stride=(1, 1))
    (1): ReLU()
    (2): ConvTranspose2d(32, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (3): ReLU()
    (4): ConvTranspose2d(16, 3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (5): Sigmoid()
  )
)

In [None]:
# Initialize the loss and class-wise loss tracking
total_loss = 0.0
class_losses = {i: 0.0 for i in range(10)}  # 10 classes in CIFAR10
class_counts = {i: 0 for i in range(10)}

# No gradient is needed for evaluation
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = autoencoder(images)
        loss = criterion(outputs, images)
        total_loss += loss.item() * images.size(0)

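        # Compute class-wise loss
        # Note: the batch-mean loss is added for every label in the batch, so the
        # class-wise values are batch-level approximations, not per-image losses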
        for label in labels:
            class_losses[label.item()] += loss.item()
            class_counts[label.item()] += 1

# Calculate the average loss
avg_loss = total_loss / len(testset)
print(f'Overall Mean Squared Error: {avg_loss:.4f}')

# Print class-wise MSE
for i in range(10):
    avg_class_loss = class_losses[i] / class_counts[i]
    print(f'Class {i} Mean Squared Error: {avg_class_loss:.4f}')

Overall Mean Squared Error: 0.1439
Class 0 Mean Squared Error: 0.1300
Class 1 Mean Squared Error: 0.1524
Class 2 Mean Squared Error: 0.1394
Class 3 Mean Squared Error: 0.1495
Class 4 Mean Squared Error: 0.1434
Class 5 Mean Squared Error: 0.1469
Class 6 Mean Squared Error: 0.1532
Class 7 Mean Squared Error: 0.1438
Class 8 Mean Squared Error: 0.1355
Class 9 Mean Squared Error: 0.1451
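
The class-wise loop above credits each label with the mean loss of its whole batch. A possible alternative, sketched here under the assumption that exact per-class values are wanted, computes one MSE per image and accumulates it under that image's label. The function `classwise_mse` and the toy `ZeroModel` are hypothetical and only serve to show the idea. The numbers printed above were not produced with this code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def classwise_mse(model, loader, num_classes=10):
    """Average per-image reconstruction MSE for each class label."""
    loss_sums = torch.zeros(num_classes)
    counts = torch.zeros(num_classes)
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            outputs = model(images)
            # One MSE value per image instead of one per batch.
            per_image = F.mse_loss(outputs, images, reduction="none").mean(dim=(1, 2, 3))
            loss_sums.index_add_(0, labels, per_image)
            counts.index_add_(0, labels, torch.ones_like(per_image))
    # Classes without any samples come out as NaN (0 / 0).
    return loss_sums / counts


class ZeroModel(nn.Module):
    """Toy stand-in for the autoencoder: always reconstructs zeros."""

    def forward(self, x):
        return torch.zeros_like(x)


# Two tiny "images": all ones (class 0) and all twos (class 1).
toy_images = torch.stack([torch.full((3, 4, 4), 1.0), torch.full((3, 4, 4), 2.0)])
toy_labels = torch.tensor([0, 1])
toy_loader = [(toy_images, toy_labels)]

result = classwise_mse(ZeroModel(), toy_loader, num_classes=2)
print(result)  # tensor([1., 4.])
```

In this toy case the batch-mean approach would report 2.5 for both classes, whereas the per-image version separates them into 1 and 4.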


### Conclusion

The autoencoder trained on the CIFAR10 dataset reached an overall mean squared error (MSE) of 0.1439 on the test set. The reported MSE varied across classes, from 0.1300 for Class 0 to 0.1532 for Class 6. Because each class accumulates the mean loss of the batches in which its labels occur, these class-wise figures are approximate. The training loss plateaued at about 0.238 after the first few epochs, which leaves room for improvement. One likely limiting factor is that the inputs are normalized to [-1, 1], while the decoder's final Sigmoid can only output values in (0, 1), so negative pixel values cannot be reconstructed. Aligning the decoder's output range with the input range is a natural next step toward lower reconstruction errors across all classes. The classification hypothesis from the introduction remains untested here, since no classifier was trained on the encoded representations.
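
As a possible starting point for testing the hypothesis stated in the introduction, the sketch below puts a linear classifier on top of the encoder. It is an assumption about how such a classifier could look, not something this notebook trains or evaluates. The class `EncoderClassifier` and its `freeze_encoder` option are hypothetical. The encoder is rebuilt here with the notebook's layer settings so that the block is self-contained; in practice one would pass the trained `autoencoder.encoder`.

```python
import torch
import torch.nn as nn


class EncoderClassifier(nn.Module):
    """Hypothetical linear classifier on top of a (pre-trained) encoder."""

    def __init__(self, encoder, num_classes=10, freeze_encoder=True):
        super().__init__()
        self.encoder = encoder
        if freeze_encoder:
            # Only the classifier head is trained; the encoder stays fixed.
            for param in self.encoder.parameters():
                param.requires_grad = False
        self.head = nn.Sequential(
            nn.Flatten(),                       # 64 x 2 x 2 -> 256
            nn.Linear(64 * 2 * 2, num_classes),
        )

    def forward(self, x):
        return self.head(self.encoder(x))


# Untrained encoder with the same layer settings as the notebook's Autoencoder.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 7),
)

classifier = EncoderClassifier(encoder)
logits = classifier(torch.randn(4, 3, 32, 32))  # random stand-in batch
print(logits.shape)  # torch.Size([4, 10])
```

Training this head with 'nn.CrossEntropyLoss' on the CIFAR10 labels and comparing its accuracy against an MLP and a CNN baseline would be one way to evaluate the hypothesis.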