
# Autoencoder for Fashion MNIST and CIFAR-100

## Project Overview

In this project, you will implement an Autoencoder to explore and generate hybrid images by combining feature vectors from different classes. This notebook focuses on using the **Fashion MNIST** dataset, and as a challenge, you will later apply the same methodology to the **CIFAR-100** dataset.

**Tasks:**
1. Dataset Preparation and Filtering
2. Autoencoder Implementation
3. Training the Autoencoder
4. Class Feature Centroid Calculation
5. Average Image Creation
6. Hybrid Image Generation
7. CIFAR-100 Challenge Exercise

**Final goal:** Create hybrid objects, for example first a hybrid between a sneaker and a T-shirt and later a hybrid between a car and a plane.

**Important**: At the end, write a report of adequate size, which will probably mean at least half a page. The report should describe how you approached the task, including:
- Difficulties you encountered that stem from the method (e.g. "not enough training samples to converge"), not technical ones (e.g. "I could not install a package with pip").
- The steps you took to alleviate these difficulties.
- A general description of what you did: explain how you understood the task and how you solved it, in plain language without code.
- Potential limitations of your approach: what could go wrong, and why the approach might struggle on different data or under slightly different conditions.
- Any ideas for extending the approach in an interesting way.


## Step 1: Dataset Preparation

We will work with the **Fashion MNIST** dataset, which contains 10 classes of grayscale images representing items of clothing.

### Your Tasks:
1. Load the Fashion MNIST dataset using `torchvision.datasets`.
2. Apply necessary transformations, including:
   - Normalization to scale pixel values.
   - Resizing if needed.
3. Create training and validation DataLoaders for efficient data loading.

### Hints:
- Use `torchvision.transforms` for preprocessing.
- Normalize images to have mean `0.5` and standard deviation `0.5`.

Start by writing your code to load and preprocess the dataset.
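
To make the normalization hint concrete, here is a minimal sketch of the arithmetic that `transforms.Normalize((0.5,), (0.5,))` applies after `ToTensor()` has scaled pixels to `[0, 1]`. The helper names `normalize` and `denormalize` are illustrative and not part of torchvision; the inverse is handy later when displaying reconstructions.

```python
import torch


def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as Normalize: (x - mean) / std."""
    return (x - mean) / std


def denormalize(x, mean=0.5, std=0.5):
    """Inverse mapping, useful before plotting images."""
    return x * std + mean


pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])  # values as produced by ToTensor()
normalized = normalize(pixels)

print(normalized.tolist())               # [-1.0, -0.5, 0.0, 1.0]
print(denormalize(normalized).tolist())  # [0.0, 0.25, 0.5, 1.0]
```

With mean `0.5` and standard deviation `0.5`, the range `[0, 1]` is mapped to `[-1, 1]`.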

## **Please note: you can use code from https://github.com/junaidaliop/pytorch-fashionMNIST-tutorial/blob/main/pytorch_fashion_mnist_tutorial.ipynb to make it easier to load the dataset. However, whenever you copy and paste code without modification, you must add a comment stating where you copied it from.**

In [None]:
!pip install torch torchvision



In [2]:
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split

# Set device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on {device}.")

# 1. Define transformations: normalize pixel values to range [-1, 1]
transform = transforms.Compose([
    transforms.Resize((32, 32)),  # Resize images to 32x32
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # Normalize with mean=0.5, std=0.5 (maps [0, 1] to [-1, 1])
])

# 2. Load the Fashion MNIST dataset
train_dataset = torchvision.datasets.FashionMNIST(
    root='./data', train=True, download=True, transform=transform)

# Validation split: Use 10% of the training set as validation
val_size = int(0.1 * len(train_dataset))
train_size = len(train_dataset) - val_size

train_dataset, val_dataset = random_split(train_dataset, [train_size, val_size])

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=2)

# Define the class labels
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')

# Sanity check: Print dataset sizes
print(f"Training set size: {len(train_dataset)}")
print(f"Validation set size: {len(val_dataset)}")


Running on cpu.
Training set size: 54000
Validation set size: 6000





## Step 2: Autoencoder Implementation

A rudimentary implementation of an Autoencoder is given here. You may need to modify it to suit your needs, for example by adding more convolutional layers.

Note that you may also need to change the kernel size and stride.

Depending on your input size, the adaptive pooling may be applied to a very large feature map, which can reduce performance. As you add more convolutional layers, you will need to work out the feature-map sizes. One way is to temporarily remove the last layers of the encoder, for example the `AdaptiveAvgPool2d`, and print the output shape of the encoder. Afterwards you can stop execution, for example with `assert False`.
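
As an alternative to printing shapes, the feature-map sizes can be worked out by hand. The sketch below applies the standard output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d` (assuming dilation 1) to the layer settings used in the model below; the helper names `conv_out` and `conv_transpose_out` are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of nn.Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output side length of nn.ConvTranspose2d (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Encoder: three stride-2 convolutions on a 32x32 input
size = 32
encoder_sizes = [size]
for _ in range(3):
    size = conv_out(size, kernel=3, stride=2, padding=1)
    encoder_sizes.append(size)
print(encoder_sizes)  # [32, 16, 8, 4]

# Decoder: starts from the 1x1 map produced by Unflatten
size = conv_transpose_out(1, kernel=4)
decoder_sizes = [1, size]
for _ in range(3):
    size = conv_transpose_out(size, kernel=3, stride=2, padding=1, output_padding=1)
    decoder_sizes.append(size)
print(decoder_sizes)  # [1, 4, 8, 16, 32]
```

Because this decoder always starts from a 1x1 map, it produces 32x32 outputs regardless of the input size, which is one reason the images are resized to 32x32 in Step 1.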

In [3]:
# Cell 1: Import required libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

# Cell 2: Define the Autoencoder
class Autoencoder(nn.Module):
    def __init__(self, input_channels=1, latent_dim=128):
        """
        Autoencoder with dynamic adjustments for varying input image dimensions.

        Parameters:
        - input_channels: Number of input channels (e.g., 1 for grayscale, 3 for RGB).
        - latent_dim: Dimensionality of the latent representation.
        """
        super(Autoencoder, self).__init__()

        # Encoder: Dynamically adapts to reduce input to a fixed latent space
        self.encoder = nn.Sequential(
            nn.Conv2d(input_channels, 32, kernel_size=3, stride=2, padding=1),  # Halve dimensions
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # Halve dimensions again
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # Further downsample
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1))  # Compress to a fixed-size latent space (1x1 feature map)
        )

        # Latent space representation
        self.latent = nn.Linear(128, latent_dim)  # Map the pooled features to the latent vector

        # Decoder: Dynamically expands the latent representation back to the input shape
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),  # Expand back to initial channel size
            nn.Unflatten(1, (128, 1, 1)),  # Reshape to (C, H, W) for convolutional operations
            nn.ConvTranspose2d(128, 128, kernel_size=4),  # Upsample
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1),  # Double dimensions
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1),  # Double dimensions again
            nn.ReLU(),
            nn.ConvTranspose2d(32, input_channels, kernel_size=3, stride=2, padding=1, output_padding=1),  # Final upsample
            nn.Tanh()  # Keep output values in [-1, 1], matching the normalized inputs
        )

    def forward(self, x):
        """
        Forward pass for the Autoencoder.

        - Encodes the input into a fixed-size latent representation.
        - Decodes the latent representation back to the input shape.
        """
        # Encode
        encoded = self.encoder(x)
        encoded = encoded.view(encoded.size(0), -1)  # Flatten to pass through linear layer
        latent = self.latent(encoded)  # Map to latent space

        # Decode
        decoded = self.decoder(latent)
        return decoded

# Cell 3: Initialize the Model
# Define hyperparameters for the model
input_channels = 1  # Example: Grayscale images
latent_dim = 128  # Latent space dimensionality
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Initialize the Autoencoder
model = Autoencoder(input_channels=input_channels, latent_dim=latent_dim).to(device)
print(model)

Autoencoder(
  (encoder): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (3): ReLU()
    (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (5): ReLU()
    (6): AdaptiveAvgPool2d(output_size=(1, 1))
  )
  (latent): Linear(in_features=128, out_features=128, bias=True)
  (decoder): Sequential(
    (0): Linear(in_features=128, out_features=128, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(128, 1, 1))
    (2): ConvTranspose2d(128, 128, kernel_size=(4, 4), stride=(1, 1))
    (3): ReLU()
    (4): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (5): ReLU()
    (6): ConvTranspose2d(64, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (7): ReLU()
    (8): ConvTranspose2d(32, 1, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding

### Your Tasks:
1. Implement the encoder and decoder parts of the Autoencoder.
2. Ensure the model takes grayscale images (1 input channel) and outputs images of the same shape.

### Hints:
- Use `nn.Conv2d` and `nn.ConvTranspose2d`.
- Add non-linear activations like `ReLU` between layers.
- Use `Tanh` or `Sigmoid` for the final activation in the decoder.

Write your Autoencoder model below.
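
The choice between `Tanh` and `Sigmoid` interacts with the normalization from Step 1. The short sketch below compares the output ranges of the two activations on some hypothetical pre-activation values. It suggests that, with inputs normalized to `[-1, 1]`, a `Sigmoid` output cannot reach the darkest target pixels.

```python
import torch

logits = torch.linspace(-6.0, 6.0, steps=13)  # hypothetical pre-activation values

tanh_out = torch.tanh(logits)
sigmoid_out = torch.sigmoid(logits)

print(f"tanh range:    [{tanh_out.min().item():.3f}, {tanh_out.max().item():.3f}]")
print(f"sigmoid range: [{sigmoid_out.min().item():.3f}, {sigmoid_out.max().item():.3f}]")

# A target pixel of -1 (black after Normalize((0.5,), (0.5,))) is out of reach
# for Sigmoid, so its squared error on that pixel always stays above 1.
target = torch.tensor(-1.0)
min_sigmoid_error = ((sigmoid_out - target) ** 2).min().item()
print(min_sigmoid_error > 1.0)
```

If you prefer `Sigmoid`, one option is to keep the inputs in `[0, 1]` by dropping the `Normalize` step.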


## Step 3: Training the Autoencoder

Train your Autoencoder to reconstruct images from the Fashion MNIST dataset.

### Your Tasks:
1. Define a suitable loss function (e.g., Mean Squared Error).
2. Set up an optimizer like Adam.
3. Write a training loop to:
   - Pass inputs through the Autoencoder.
   - Compute the reconstruction loss.
   - Backpropagate and update weights.

4. Visualize the reconstructed images periodically during training.

### Hints:
- Use GPU acceleration if available (e.g. `.to(device)` or `.cuda()`).
- Visualize outputs using `matplotlib`.

Write your training loop below.


In [4]:
import matplotlib.pyplot as plt
import numpy as np

# Define loss function and optimizer
criterion = nn.MSELoss()  # Mean Squared Error loss for reconstruction
optimizer = optim.Adam(model.parameters(), lr=0.004)  # Adam optimizer with a learning rate of 0.004

# Training settings
num_epochs = 25  # Number of epochs
train_losses = []  # Store training losses for visualization

# Helper function to visualize images
def visualize_reconstruction(model, data_loader, device, num_images=10):
    """
    Visualizes original and reconstructed images side by side.

    Parameters:
    - model: Trained autoencoder model.
    - data_loader: DataLoader containing the images.
    - device: Current device (CPU/GPU).
    - num_images: Number of images to visualize.
    """
    model.eval()  # Set model to evaluation mode
    with torch.no_grad():
        # Fetch a single batch
        data_iter = iter(data_loader)
        images, _ = next(data_iter)  # Ignore labels
        images = images[:num_images].to(device)

        # Get reconstructions
        reconstructed = model(images)

        # Move to CPU for visualization
        images = images.cpu().numpy()
        reconstructed = reconstructed.cpu().numpy()

        # Plot original and reconstructed images
        fig, axes = plt.subplots(2, num_images, figsize=(15, 4))
        for i in range(num_images):
            # Original images
            axes[0, i].imshow(images[i].squeeze(), cmap='gray')
            axes[0, i].axis('off')

            # Reconstructed images
            axes[1, i].imshow(reconstructed[i].squeeze(), cmap='gray')
            axes[1, i].axis('off')

        axes[0, 0].set_title("Original Images", fontsize=14)
        axes[1, 0].set_title("Reconstructed Images", fontsize=14)
        plt.show()

In [5]:
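import os

# Resume from a saved checkpoint only if both files exist;
# on a fresh runtime there is nothing to load and training starts from scratch.
if os.path.exists("autoencoder_model.pth") and os.path.exists("autoencoder_optimizer.pth"):
    model.load_state_dict(torch.load("autoencoder_model.pth", map_location=device))
    optimizer.load_state_dict(torch.load("autoencoder_optimizer.pth", map_location=device))
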

In [6]:



# Track validation losses
val_losses = []

# Training loop
for epoch in range(num_epochs):
    model.train()  # Set model to training mode
    running_loss = 0.0  # Track total training loss for this epoch

    for batch_idx, (inputs, _) in enumerate(train_loader):
        # Move inputs to the appropriate device
        inputs = inputs.to(device)

        # Zero the gradient buffers
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Compute loss
        loss = criterion(outputs, inputs)  # Compare reconstructed images with original

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        # Accumulate loss
        running_loss += loss.item()

    # Calculate average training loss for the epoch
    avg_train_loss = running_loss / len(train_loader)
    train_losses.append(avg_train_loss)

    # Validation step
    model.eval()  # Set model to evaluation mode
    val_running_loss = 0.0

    with torch.no_grad():
        for inputs, _ in val_loader:
            inputs = inputs.to(device)

            # Forward pass only
            outputs = model(inputs)
            val_loss = criterion(outputs, inputs)
            val_running_loss += val_loss.item()

    # Calculate average validation loss for the epoch
    avg_val_loss = val_running_loss / len(val_loader)
    val_losses.append(avg_val_loss)

    # Print progress
    print(f"Epoch [{epoch + 1}/{num_epochs}], "
          f"Training Loss: {avg_train_loss:.4f}, Validation Loss: {avg_val_loss:.4f}")

    # Visualize reconstructed images periodically
    if (epoch + 1) % 2 == 0:
        print(f"Reconstruction after epoch {epoch + 1}:")
        visualize_reconstruction(model, val_loader, device)

# Plot training and validation loss over epochs
plt.figure(figsize=(8, 6))
plt.plot(range(1, num_epochs + 1), train_losses, label='Training Loss', marker='o')
plt.plot(range(1, num_epochs + 1), val_losses, label='Validation Loss', marker='x')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss Over Epochs')
plt.legend()
plt.show()


KeyboardInterrupt: 

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
torch.save(model.state_dict(), "autoencoder_model.pth")
torch.save(optimizer.state_dict(), "autoencoder_optimizer.pth")



## Step 4: Latent Space Analysis

Once the Autoencoder is trained, explore the latent space.

### Your Tasks:
1. Pass a batch of images through the encoder and store the latent representations.
2. Compute the **centroids** (average latent vectors) for each Fashion MNIST class.
3. Visualize the latent space using dimensionality reduction techniques like PCA or t-SNE.

### Hints:
- Use `sklearn.decomposition.PCA` or `sklearn.manifold.TSNE` for visualization.
- Compute centroids by averaging latent vectors of images from the same class.

Write your code to analyze the latent space below.
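
One possible way to compute the centroids is sketched below. It assumes a model that exposes `encoder` and `latent` attributes like the Autoencoder from Step 2. `TinyAutoencoder`, `encode`, and `class_centroids` are hypothetical names, and the random tensors merely stand in for a real DataLoader.

```python
import torch
import torch.nn as nn


class TinyAutoencoder(nn.Module):
    """Hypothetical stand-in with the same attribute names as the Step 2 model."""

    def __init__(self, input_channels=1, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(input_channels, 4, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.latent = nn.Linear(4, latent_dim)


@torch.no_grad()
def encode(model, images):
    """Return latent vectors of shape (batch, latent_dim)."""
    features = model.encoder(images).flatten(1)
    return model.latent(features)


@torch.no_grad()
def class_centroids(model, loader, num_classes, device="cpu"):
    """Average the latent vectors of each class over all batches."""
    model.eval()
    sums, counts = None, torch.zeros(num_classes)
    for images, labels in loader:
        z = encode(model, images.to(device)).cpu()
        if sums is None:
            sums = torch.zeros(num_classes, z.size(1))
        sums.index_add_(0, labels, z)  # add each latent to its class row
        counts += torch.bincount(labels, minlength=num_classes).float()
    return sums / counts.clamp(min=1).unsqueeze(1)


torch.manual_seed(0)
images = torch.randn(20, 1, 32, 32)  # placeholder data, not Fashion MNIST
labels = torch.arange(20) % 4        # 4 classes with 5 samples each
loader = [(images[:10], labels[:10]), (images[10:], labels[10:])]

model = TinyAutoencoder()
centroids = class_centroids(model, loader, num_classes=4)
print(centroids.shape)  # torch.Size([4, 8])
```

With the real model, `loader` would be `train_loader` or `val_loader` and `num_classes` would be 10. The per-sample latent vectors collected along the way can also be passed to PCA or t-SNE for the visualization task.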



## Step 5: Hybrid Image Generation

Using the latent space centroids, you can create hybrid images by interpolating between centroids of two classes.

### Your Tasks:
1. Select two class centroids (e.g., "T-shirt" and "Sneaker").
2. Linearly interpolate between the centroids with a parameter `alpha` in [0, 1].
3. Decode the interpolated latent representations back into image space.

### Hints:
- Use a simple linear interpolation formula: `(1 - alpha) * centroid1 + alpha * centroid2`.
- Visualize the hybrid images for different values of `alpha`.

Write your code to generate hybrid images below.
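
The interpolation formula from the hints can be sketched as follows. The decoder here is a small hypothetical stand-in and the centroids are random placeholders. In the notebook you would use `model.decoder` and the centroids computed in Step 4.

```python
import torch
import torch.nn as nn


def interpolate(centroid_a, centroid_b, num_steps=5):
    """Return alphas and the latent vectors (1 - alpha) * a + alpha * b."""
    alphas = torch.linspace(0.0, 1.0, num_steps)
    weights = alphas.unsqueeze(1)  # shape (num_steps, 1) for broadcasting
    latents = (1 - weights) * centroid_a + weights * centroid_b
    return alphas, latents


torch.manual_seed(0)
latent_dim = 8

# Hypothetical stand-in for the trained decoder (maps latents to 1x4x4 "images")
decoder = nn.Sequential(
    nn.Linear(latent_dim, 16),
    nn.Unflatten(1, (1, 4, 4)),
    nn.Tanh(),
)

centroid_tshirt = torch.randn(latent_dim)   # placeholder, not a real centroid
centroid_sneaker = torch.randn(latent_dim)  # placeholder, not a real centroid

alphas, latents = interpolate(centroid_tshirt, centroid_sneaker)
with torch.no_grad():
    hybrids = decoder(latents)

print(latents.shape)  # torch.Size([5, 8])
print(hybrids.shape)  # torch.Size([5, 1, 4, 4])
```

For `alpha = 0` the decoded image corresponds to the first centroid, for `alpha = 1` to the second, and intermediate values give the hybrids. Since the outputs lie in `[-1, 1]`, rescaling with `hybrids * 0.5 + 0.5` before plotting brings them back to `[0, 1]`.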



## Step 7: Challenge Exercise: Reimplement with CIFAR-100

Now that you've successfully implemented the Autoencoder for Fashion MNIST, your next challenge is to apply the same pipeline to the **CIFAR-100 dataset**. This dataset contains 100 classes of images, each with diverse objects, making it more challenging than Fashion MNIST.

### Your Tasks:
1. Preprocess the CIFAR-100 dataset, ensuring the images are appropriately normalized and resized if needed.
2. Redefine the Autoencoder architecture to accommodate CIFAR-100's RGB images (3 channels).
3. Train the Autoencoder on CIFAR-100 and visualize the reconstructed images.
4. Compute class centroids in the latent space for selected CIFAR-100 classes (choose a manageable subset, such as 10 classes).
5. Generate hybrid images by interpolating between class centroids in the latent space.
6. Visualize the latent space clustering for the selected CIFAR-100 classes using PCA or t-SNE.

This exercise will test your understanding of the Autoencoder pipeline and challenge you to work with a more complex dataset.
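
To restrict CIFAR-100 to a manageable subset of classes, one option is a small wrapper dataset like the sketch below. `RemappedSubset` is a hypothetical helper, the class ids are arbitrary examples, and the tensors stand in for the real dataset so that the block runs without a download.

```python
import torch
from torch.utils.data import Dataset, TensorDataset


class RemappedSubset(Dataset):
    """Keep only the chosen classes and relabel them 0..k-1."""

    def __init__(self, dataset, targets, keep_classes):
        self.dataset = dataset
        # Map original class id -> new consecutive id
        self.mapping = {c: i for i, c in enumerate(keep_classes)}
        # Positions of all samples whose label is in the kept classes
        self.indices = [i for i, t in enumerate(targets) if int(t) in self.mapping]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        image, label = self.dataset[self.indices[idx]]
        return image, self.mapping[int(label)]


# Placeholder data shaped like CIFAR-100 images (3 channels, 32x32)
images = torch.zeros(12, 3, 32, 32)
labels = torch.tensor([3, 8, 48, 3, 90, 8, 48, 13, 3, 90, 8, 48])
full_dataset = TensorDataset(images, labels)

keep = [8, 48, 90]  # arbitrary example class ids
subset = RemappedSubset(full_dataset, labels, keep)

print(len(subset))                                            # 8
print(sorted({subset[i][1] for i in range(len(subset))}))     # [0, 1, 2]
```

With torchvision, `full_dataset` would be a `torchvision.datasets.CIFAR100` instance and `targets` its `targets` attribute. The resulting subset can be passed to a `DataLoader` as in Step 1, with `input_channels=3` for the Autoencoder.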
