
# **Experiment 9: Autoencoder and GAN Implementation**

## Abstract

In medical imaging, noise and limited dataset sizes can hinder diagnostic performance. In this work, we propose a two-pronged deep learning framework: an advanced denoising autoencoder for removing noise from medical images, and a Generative Adversarial Network (GAN) to synthesize realistic images for data augmentation. The autoencoder leverages a multi-layer convolutional architecture with batch normalization and pooling to reconstruct clean images from noisy inputs. Concurrently, the GAN is designed to generate synthetic images that closely mimic real medical images, thus expanding the training data. Experimental results, including quantitative metrics such as Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM), as well as qualitative visualizations, demonstrate the effectiveness of the proposed methods. The framework shows promise for improving downstream tasks in medical image analysis.


## 1. Introduction
Medical imaging is central to modern diagnostic processes, yet the quality of these images can be compromised by noise from various sources, such as sensor limitations and low-dose imaging protocols. Furthermore, the scarcity of annotated medical images limits the training of robust diagnostic models. To address these issues, this work integrates two deep learning techniques:
- A **denoising autoencoder** designed to remove noise from images by learning a compact latent representation.
- A **Generative Adversarial Network (GAN)** for synthesizing realistic images to augment the available dataset.

This integrated approach not only improves image quality but also enriches the training data, potentially leading to enhanced performance in clinical applications.


## 2. Methodology





### 2.1 Denoising Autoencoder

The denoising autoencoder is designed to learn a compact latent representation of the medical images and reconstruct a noise-free version from a noisy input. The architecture comprises an encoder and a decoder:

- **Encoder:**  
  The encoder uses multiple convolutional layers with ReLU activation, batch normalization, and max pooling to gradually reduce the spatial dimensions of the input image. This helps the network capture the underlying structure and essential features of the image while discarding high-frequency noise.

- **Decoder:**  
  The decoder mirrors the encoder with upsampling layers and convolutional layers to reconstruct the image from the latent space. A sigmoid activation in the final layer ensures that the output pixel values are in the [0,1] range, matching the normalized input.

The model is trained to minimize the Mean Squared Error (MSE) between the reconstructed image and the original clean image. Additional metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) are used to quantitatively assess the quality of reconstruction.
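
The encoder/decoder structure described above can be sketched in Keras roughly as follows. This is a minimal illustration, not the notebook's exact model: the helper name `build_denoising_autoencoder`, the three-channel input, the filter counts, the 3×3 kernels, and the Adam optimizer are all assumptions chosen for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_denoising_autoencoder(input_shape=(128, 128, 3), filters=(32, 64, 128)):
    """Hypothetical helper: convolutional encoder/decoder with a sigmoid output."""
    inputs = layers.Input(shape=input_shape)
    x = inputs

    # Encoder: Conv -> BatchNorm -> ReLU -> MaxPool; each stage halves height and width.
    for f in filters:
        x = layers.Conv2D(f, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(2)(x)

    # Decoder: UpSampling -> Conv -> BatchNorm -> ReLU; each stage doubles height and width.
    for f in reversed(filters):
        x = layers.UpSampling2D(2)(x)
        x = layers.Conv2D(f, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)

    # Sigmoid keeps reconstructed pixel values in [0, 1], matching the normalized input.
    outputs = layers.Conv2D(input_shape[-1], 3, padding="same", activation="sigmoid")(x)
    return models.Model(inputs, outputs, name="denoising_autoencoder")


autoencoder = build_denoising_autoencoder()
# MSE between the reconstruction and the clean target; the optimizer is an assumed choice.
autoencoder.compile(optimizer="adam", loss="mse")
```

With three pooling stages, a 128×128 input is reduced to a 16×16 feature map at the bottleneck before being upsampled back to full resolution.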


### 2.2 Generative Adversarial Network (GAN) for Data Augmentation

The GAN consists of two adversarial components:

- **Generator:**  
  The generator transforms a random noise vector into a synthetic medical image. It uses a series of transposed convolutional layers (Conv2DTranspose) along with batch normalization and LeakyReLU activations to upscale the noise into a full-resolution image. The output is produced with a tanh activation to yield pixel values in the range [-1,1], so real images shown to the discriminator should be rescaled to the same range.

- **Discriminator:**  
  The discriminator is a convolutional neural network that classifies images as real or fake. It employs convolutional layers with LeakyReLU activations and dropout for regularization. The discriminator outputs a single logit indicating the authenticity of the input image.

The adversarial training setup pits the generator against the discriminator in a minimax game. The generator is optimized to produce images that fool the discriminator, while the discriminator is trained to distinguish real images from generated ones. Label smoothing and an update schedule with multiple discriminator updates per generator update are applied to stabilize training.
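
A generator and discriminator of the kind described above might look like the following sketch. The latent dimension of 100, the filter counts, the 4×4 kernels, the dropout rate, the LeakyReLU slope, and the three-channel 128×128 output are assumptions made for illustration; the function names are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

LATENT_DIM = 100  # assumed size of the random noise vector


def build_generator(latent_dim=LATENT_DIM, channels=3):
    """Hypothetical generator: noise vector -> 128x128 image with values in [-1, 1]."""
    inputs = layers.Input(shape=(latent_dim,))
    x = layers.Dense(8 * 8 * 256, use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Reshape((8, 8, 256))(x)
    # Each strided transposed convolution doubles the resolution: 8 -> 16 -> 32 -> 64.
    for f in (128, 64, 32):
        x = layers.Conv2DTranspose(f, 4, strides=2, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    # Final upsampling step (64 -> 128) with tanh output.
    outputs = layers.Conv2DTranspose(channels, 4, strides=2, padding="same",
                                     activation="tanh")(x)
    return models.Model(inputs, outputs, name="generator")


def build_discriminator(image_shape=(128, 128, 3)):
    """Hypothetical discriminator: image -> single unnormalized logit."""
    inputs = layers.Input(shape=image_shape)
    x = inputs
    for f in (64, 128, 256):
        x = layers.Conv2D(f, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Dropout(0.3)(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(1)(x)  # no activation: the output is a raw logit
    return models.Model(inputs, outputs, name="discriminator")


generator = build_generator()
discriminator = build_discriminator()
```

Because the discriminator returns a logit rather than a probability, any binary cross-entropy loss applied to it needs to be configured to accept logits.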



### 2.3 Experimental Setup

For our experiments, we used three publicly available Kaggle datasets:
- **Lung & Colon Cancer Histopathological Images:** Provided in the directory `/kaggle/input/lung-and-colon-cancer-histopathological-images/lung_colon_image_set/colon_image_sets`.
- **Brain Tumor MRI Dataset:** Loaded from `/kaggle/input/brain-tumor-mri-dataset/Training`.
- **Chest X-ray Pneumonia Dataset:** Extracted from `/kaggle/input/chest-xray-pneumonia/chest_xray/chest_xray/train`.

**Data Preprocessing:**  
Every image in each dataset is resized to 128×128 pixels and normalized to the [0,1] range. For the autoencoder, Gaussian noise is added on the fly to create noisy input images, while the clean images serve as reconstruction targets.
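
The noise-injection step can be illustrated with NumPy as below. The noise standard deviation of 0.1 is an assumed value (the text does not state one), and `add_gaussian_noise` is a hypothetical helper name; clipping back to [0,1] is likewise an assumption that keeps the noisy inputs in the same range as the targets.

```python
import numpy as np


def add_gaussian_noise(images, sigma=0.1, rng=None):
    """Hypothetical helper: add zero-mean Gaussian noise and clip back to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)


# Stand-in batch of already resized and normalized images (random data, for illustration only).
clean = np.random.default_rng(0).random((4, 128, 128, 3), dtype=np.float32)
noisy = add_gaussian_noise(clean, sigma=0.1, rng=np.random.default_rng(1))

# The autoencoder would be trained on (noisy, clean) pairs: noisy as input, clean as target.
```

Drawing fresh noise for every batch, rather than storing a fixed noisy copy of the dataset, means the model sees a different corruption of each image in every epoch.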

**Training Procedure:**  
- **Autoencoder:**  
  The advanced denoising autoencoder is trained using a Mean Squared Error (MSE) loss. An EarlyStopping callback is employed to prevent overfitting. The model is trained for up to 50 epochs with a batch size of 32.
  
- **GAN:**  
  The GAN is trained using binary cross-entropy loss, computed on the discriminator's raw logits, with label smoothing applied to the discriminator's real labels. Each training iteration performs two discriminator updates for every generator update to maintain balance. A fixed noise vector is used to generate sample images at the end of each epoch for qualitative evaluation.
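
The GAN update schedule described above could be written along these lines. This is a sketch under stated assumptions: the smoothed real label of 0.9 is an assumed value, `train_batch` and the loss helpers are hypothetical names, and the generator, discriminator, and optimizers are passed in rather than defined here. Real images are expected to be scaled to [-1,1] to match the generator's tanh output.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)  # discriminator outputs logits
REAL_LABEL = 0.9  # assumed label-smoothing target for real images
D_STEPS = 2       # discriminator updates per generator update, as stated in the text


def discriminator_loss(real_logits, fake_logits):
    # Real images are pushed toward the smoothed label, generated images toward 0.
    real_loss = bce(tf.fill(tf.shape(real_logits), REAL_LABEL), real_logits)
    fake_loss = bce(tf.zeros_like(fake_logits), fake_logits)
    return real_loss + fake_loss


def generator_loss(fake_logits):
    # The generator wants its images to be classified as real (label 1).
    return bce(tf.ones_like(fake_logits), fake_logits)


def train_batch(generator, discriminator, g_opt, d_opt, real_images, latent_dim):
    """Hypothetical training step: D_STEPS discriminator updates, then one generator update."""
    batch = tf.shape(real_images)[0]

    for _ in range(D_STEPS):
        noise = tf.random.normal((batch, latent_dim))
        with tf.GradientTape() as tape:
            fake = generator(noise, training=True)
            d_loss = discriminator_loss(discriminator(real_images, training=True),
                                        discriminator(fake, training=True))
        grads = tape.gradient(d_loss, discriminator.trainable_variables)
        d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))

    noise = tf.random.normal((batch, latent_dim))
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        g_loss = generator_loss(discriminator(fake, training=True))
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return d_loss, g_loss
```

Only the discriminator's variables are updated in the first loop and only the generator's in the second, so each network is optimized against a momentarily fixed opponent.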

**Hardware and Software:**  
Experiments were conducted in a Kaggle Notebook environment with a GPU (e.g., NVIDIA T4). TensorFlow was used as the deep learning framework, leveraging its high-level Keras API for model development and training.

**Evaluation Metrics:**  
- **Mean Squared Error (MSE):** To quantify the reconstruction error of the autoencoder.
- **Peak Signal-to-Noise Ratio (PSNR):** To measure the quality of denoised images.
- **Structural Similarity Index Measure (SSIM):** To assess the structural similarity between the reconstructed and original images.
- **Visual Inspection:** Generated samples from both the autoencoder and GAN are visually inspected for qualitative assessment.
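
For reference, MSE and PSNR can be computed directly with NumPy as in the sketch below, which assumes images normalized to [0,1] (so the peak value is 1.0); the function names are illustrative. SSIM is more involved, and TensorFlow provides `tf.image.ssim` (alongside `tf.image.psnr`), which is likely the simpler option inside this notebook.

```python
import math

import numpy as np


def mse(clean, recon):
    """Mean squared error between two images of the same shape."""
    diff = clean.astype(np.float64) - recon.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(clean, recon, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    err = mse(clean, recon)
    if err == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)


# Toy example: a constant image and a copy offset by 0.1 everywhere.
clean = np.full((128, 128), 0.5)
recon = clean + 0.1
print(mse(clean, recon))   # approximately 0.01
print(psnr(clean, recon))  # approximately 20.0 dB
```

Lower MSE and higher PSNR both indicate a reconstruction closer to the clean target; because PSNR is a logarithmic function of MSE, the two always rank reconstructions identically, which is why SSIM is reported as a complementary structural measure.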

Together, the quantitative metrics and the visual inspection cover both the numerical reconstruction accuracy of the denoiser and the perceived quality of the denoised and generated images.

## 3. Code Implementation

### 3.1 Autoencoders for Image Denoising on Medical Image Datasets

In [1]:
# Enable GPU memory growth (must run before TensorFlow initializes the GPU, i.e. before any tensors or models are created)
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Set memory growth to avoid allocating all GPU memory at once
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("Memory growth enabled for GPUs")
    except RuntimeError as e:
        print("Error setting memory growth:", e)


Memory growth enabled for GPUs


In [None]:
# [Cell 2] Install required package kagglehub (if not installed already)
!pip install kagglehub


In [None]:
# Import necessary libraries (all comments are inline)
import numpy as np  # For numerical operations
import tensorflow as tf  # TensorFlow for model building
from tensorflow.keras import layers, models, losses, optimizers, callbacks  # Keras modules
import matplotlib.pyplot as plt  # For plotting graphs and images
import math  # For computing PSNR
import os  # For file path operations
import kagglehub  # For downloading Kaggle datasets
import time  # For measuring elapsed time

print("TensorFlow version:", tf.__version__)
