In [None]:
!git clone 'https://github.com/MSaidKartal/denoise-pix2pix.git'
!pip install -r denoise-pix2pix/requirements.txt

### Preamble to the Hands-On Workshop on GANs for Prostate MRI Denoising

#### Introduction
Welcome to this hands-on workshop on applying Generative Adversarial Networks (GANs) to medical imaging, with a focus on denoising prostate MRI scans. It is designed for radiologists and researchers working at the intersection of artificial intelligence and medical imaging. The goal is to show how GANs can improve the clarity and utility of prostate MRI scans, a key tool in the diagnosis and management of prostate cancer.

#### Workshop Overview
Throughout this workshop, participants will engage in a series of practical exercises and discussions centered around the implementation and optimization of GANs for medical imaging tasks. We will delve into the fundamentals of GANs, their architecture, and the nuances of their training process, tailored specifically for the challenges presented in prostate MRI scans.

#### Objectives
1. **Understanding GAN Architecture**: Gain insights into the components of GANs – the discriminator and generator – and how they interact in the adversarial training process.
2. **Practical Application**: Apply GANs to real-world prostate MRI data, learning how to preprocess data, train models, and evaluate their performance using key metrics like PSNR, SSIM, and MAE.
3. **Denoising Techniques**: Explore advanced denoising techniques using GANs to improve the quality of MRI scans, enhancing their diagnostic value.
4. **Critical Analysis**: Critically analyze the outcomes, discussing the strengths, limitations, and potential improvements in applying GANs to medical imaging.

#### Data and Tools
Participants will work with a subset of data from the PI-CAI (Prostate Imaging: Cancer AI) Grand Challenge. The workshop uses Python with Keras, together with NumPy, pandas, scikit-image, and scikit-learn, to implement and test the GAN models.

#### Target Audience
This workshop is tailored for radiologists, AI researchers, and anyone with an interest in the application of deep learning to medical imaging. A basic understanding of deep learning concepts and familiarity with Python programming is recommended to fully benefit from this workshop.

#### Conclusion
By the end of this workshop, participants will have a clear understanding of how GANs can be used to significantly improve the quality of prostate MRI scans. We aim to empower you with both theoretical knowledge and practical skills, paving the way for future innovations in medical imaging.

We look forward to your active participation and the insightful discussions that will emerge, driving forward the field of radiology with AI.


In [2]:
import os  # For interacting with the operating system
import sys  # For system-specific parameters and functions
from tqdm import tqdm

import numpy as np  # For numerical operations
import pandas as pd  # For data manipulation and analysis
from keras.models import load_model  # For loading trained Keras models
from sklearn.model_selection import train_test_split  # For splitting data into training and test sets
import matplotlib.pyplot as plt  # For plotting graphs
import seaborn as sns  # For making attractive and informative statistical graphics

sys.path.append("denoise-pix2pix/")  # Add the 'denoise-pix2pix' directory to the system path for module import

# Importing custom functions and classes from the denoise-pix2pix project
from pix2pix import define_discriminator, define_generator, define_gan, train
from datasets import load_case, DataLoader
from plot_utils import interactive_show, interactive_inference, mae_calc
from skimage.metrics import peak_signal_noise_ratio as compare_psnr  # For PSNR calculation
from skimage.metrics import structural_similarity as compare_ssim  # For SSIM calculation

In [3]:
# Check if the 'data' directory exists, if not, download and unzip the dataset
if not os.path.exists('data'):  # Condition to check existence of 'data' directory
    !wget -q https://huggingface.co/datasets/msaidkartal/denoise-prostateMRI/resolve/main/data.zip  # Download dataset zip file quietly
    !unzip -q data.zip  # Unzip the downloaded dataset quietly

# Check if the 'models' directory exists, if not, download and unzip the pre-trained models
if not os.path.exists('models'):  # Condition to check existence of 'models' directory
    !wget -q https://huggingface.co/datasets/msaidkartal/denoise-prostateMRI/resolve/main/models.zip  # Download model zip file quietly
    !unzip -q models.zip  # Unzip the downloaded model quietly

# Note: The -q flag in wget and unzip commands ensures quiet (non-verbose) operation with no unnecessary output to the console.

In [4]:
# Define a dictionary 'dirs' to store paths for various components used in the project
dirs = {
    'low_res': 'data/low_res',  # Path to the directory containing low resolution (noisy) MRI images
    'high_res': 'data/source',  # Path to the directory containing high resolution (original) MRI images
    'metrics': 'data/metrics.xlsx',  # Path to an Excel file containing metrics data
    'model': 'models/best_model.h5'  # Path to the pre-trained model file
}

### Dataset Overview: PI-CAI Grand Challenge Subset
![im](./assests/picai.png)

In this workshop, we are utilizing a subset of the data from the Prostate Imaging: Cancer AI (PI-CAI) Grand Challenge. This pioneering initiative is a comprehensive effort to harness the power of artificial intelligence in the diagnosis and detection of prostate cancer through advanced imaging techniques.

#### Key Features of the Dataset:
1. **Origin:** Our dataset is derived from the extensive collection of over 10,000 carefully curated prostate MRI exams made available through the PI-CAI Grand Challenge. This rich dataset is pivotal for validating modern AI algorithms in the field of radiology, especially in the context of clinically significant prostate cancer (csPCa).

2. **Composition:** The subset we are working with includes high-quality bi-parametric MRI (bpMRI) scans. These scans are instrumental in identifying csPCa, offering a comprehensive view of the prostate region with varying degrees of resolution and clarity.

3. **Purpose and Utility:** The PI-CAI challenge was designed to validate AI algorithms for patient-level diagnosis and lesion-level detection of csPCa. In this workshop we use the same scans for a related task: training and testing deep learning models that denoise the images, so that downstream diagnostic reading can benefit from cleaner input.

4. **Challenges and Opportunities:** Working with this dataset presents an extraordinary opportunity to delve into the nuances of medical imaging in AI. It allows us to tackle challenges such as varying image quality, the subtlety of lesion appearances, and the complexities of cancer detection in MRI scans.

5. **Educational Value:** For participants of this workshop, engaging with this dataset offers a hands-on experience in applying deep learning techniques to real-world medical imaging problems. It is an excellent opportunity to understand the intricacies of AI applications in healthcare, specifically in the domain of cancer diagnosis.

https://pi-cai.grand-challenge.org/PI-CAI/

In [None]:
# Load and process the metrics data
data = pd.read_excel(dirs['metrics'], index_col=0)  # Load the metrics data from Excel file into a pandas DataFrame. 'index_col=0' sets the first column as the index.

data = data.reset_index(drop=True)  # Reset the index of the DataFrame, dropping the old index.

data.head(10)  # Display the first 10 rows of the DataFrame for a quick overview.

In [None]:
# @title
# Loading and visualizing a specific case from the dataset
index = 15  # Set the index for the patient case to be loaded

vol = load_case(data.PatientName[index])  # Load the MRI volume for the specified patient case using the 'load_case' function

interactive_show(vol)  # Display the loaded MRI volume using an interactive viewer

In [None]:
# Loading and visualizing a specific case from the dataset
index = 15  # Set the index for the patient case to be loaded

vol = load_case(data.PatientName[index])  # Load the MRI volume for the specified patient case using the 'load_case' function

interactive_show("fill here") # Display the loaded MRI volume using an interactive viewer

In [None]:
model = load_model(dirs['model'])

In [11]:
# Loading preprocessed low and high resolution images for a specific case
index = 100  # Set the index for the patient case to be loaded

# Load both low and high resolution images for the specified patient case
# The 'preprocess' flag is set to True, indicating that preprocessing steps will be applied to the images
low_res, high_res = load_case(data.PatientName[index], preprocess=True)

# Generating denoised images using the pre-trained model
# 'model.predict' is used to generate images from the low resolution input, with no output verbosity
generated_im = model.predict(low_res, verbose=0)

In [None]:
# Calculating and comparing image quality metrics between high-resolution and generated images
rangeD = high_res.max() - high_res.min()  # Determine the range of pixel values in the high-resolution images

# Compute the Structural Similarity Index (SSIM) for comparing the similarity between high-res and generated images
ssim = compare_ssim(high_res[:,:,:,0], generated_im[:,:,:,0], data_range=rangeD)

# Compute the Peak Signal-to-Noise Ratio (PSNR) for assessing the quality of the generated images against high-res images
psnr = compare_psnr(high_res[:,:,:,0], generated_im[:,:,:,0], data_range=rangeD)

# Calculate the Mean Absolute Error (MAE) to measure the average magnitude of errors between high-res and generated images
mae = mae_calc(high_res[:,:,:,0], generated_im[:,:,:,0])

# Print out the calculated PSNR, SSIM, and MAE values
print(f"PSNR: {psnr:.3f}\nSSIM: {ssim:.3f}\nMAE: {mae:.3f}")

#### PSNR (Peak Signal-to-Noise Ratio):
![im](./assests/psnr1.png)
* PSNR is a widely used metric to measure the quality of reconstructed or generated images compared to the original high-resolution images.
* It is expressed in decibels (dB), with higher values indicating better image quality.
* PSNR evaluates the ratio between the maximum possible power of a signal (in this case, pixel intensity) and the power of corrupting noise.
* In the context of image denoising, a higher PSNR means the denoised image is closer to the original image, indicating better performance of the denoising algorithm.

#### SSIM (Structural Similarity Index):
![im](./assests/psnr2.png)
* SSIM is used to measure the similarity between two images, in this case, the generated image and the original high-resolution image.
* Unlike PSNR, which focuses on pixel-level differences, SSIM compares luminance, contrast, and structural information.
* SSIM values range from -1 to 1, with 1 indicating perfect similarity. A higher SSIM value suggests that the structural integrity and visual perception of the generated image are more aligned with the original.

#### MAE (Mean Absolute Error):
![im](./assests/psnr3.png)
* MAE is a straightforward measure of the average magnitude of errors between the paired observations, here between pixels of the generated and original images.
* It calculates the absolute difference between corresponding pixels of the two images and then averages these differences over all pixels.
* A lower MAE indicates that the generated image is closer to the original, signifying a more accurate reconstruction or denoising process.

In summary, these metrics (PSNR, SSIM, MAE) provide a comprehensive assessment of the generated image's quality compared to the original, covering aspects like noise reduction, structural similarity, and overall error magnitude. They are critical in evaluating the effectiveness of deep learning models in medical imaging tasks such as MRI denoising.
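
To make the definitions above concrete, here is a minimal NumPy sketch that computes MAE and PSNR directly from their formulas, plus a simplified single-window SSIM. The function names (`mae`, `psnr`, `ssim_global`) and the toy arrays are illustrative assumptions, not part of the workshop code. Note that `skimage.metrics.structural_similarity` averages SSIM over local windows, so its values will generally differ from the global variant shown here.

```python
import numpy as np

def mae(reference, test):
    """Mean absolute error: average of |reference - test| over all pixels."""
    return float(np.mean(np.abs(reference - test)))

def psnr(reference, test, data_range):
    """PSNR in dB: 10 * log10(data_range^2 / MSE)."""
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

def ssim_global(reference, test, data_range):
    """SSIM computed once over the whole image (simplification: no sliding window)."""
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = reference.mean(), test.mean()
    var_x, var_y = reference.var(), test.var()
    cov_xy = np.mean((reference - mu_x) * (test - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

# Toy 2x2 "image" and a copy with a constant intensity offset of 0.1
reference = np.array([[0.0, 0.5], [1.0, 0.5]])
noisy = reference + 0.1
data_range = reference.max() - reference.min()

print(f"MAE:  {mae(reference, noisy):.3f}")
print(f"PSNR: {psnr(reference, noisy, data_range):.2f} dB")
print(f"SSIM (global): {ssim_global(reference, noisy, data_range):.3f}")
```

With a constant offset of 0.1 on a unit intensity range, the MSE is 0.01, so the MAE is 0.1 and the PSNR is 20 dB. The global SSIM stays close to 1 because the offset changes luminance but leaves the structure untouched.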

In [None]:
# @title
interactive_inference((low_res, high_res, generated_im))

In [None]:
interactive_inference(("fill here", "fill here", "fill here"))

### Introduction to Generative AI
#### What is Generative AI?
Generative AI encompasses a range of artificial intelligence techniques designed to create new content, including images, text, and complex data structures like medical imaging scans. This branch of AI focuses on generating new data that closely resembles authentic samples, going beyond traditional AI models that are typically used for interpretation or classification.
#### Core Techniques in Generative AI:
![im](./assests/gen1.png)
1. **Generative Adversarial Networks (GANs):** Consist of two parts, a generator and a discriminator, trained against each other: the generator tries to produce realistic outputs, while the discriminator tries to tell them apart from real data.
2. **Variational Autoencoders (VAEs):** Aim to compress data into a lower-dimensional representation and then reconstruct it, trying to retain as much original information as possible.
3. **Diffusion Models:** These are a newer class of generative models that transform patterns of noise into coherent images or structures through a gradual refining process.
4. **Large Language Models (LLMs):** Specialized in generating human-like text, LLMs like GPT (Generative Pre-trained Transformer) are trained on vast amounts of textual data to produce contextually relevant and coherent language outputs.

#### Applications in Healthcare:
![im](./assests/gen2.png)
Generative AI holds transformative potential in healthcare. It can be used for synthesizing medical images, augmenting datasets for machine learning models, developing personalized medicine strategies, and even in generating medical literature or reports. The accuracy and efficiency of patient care can be significantly enhanced through these applications.
#### Relevance to This Workshop:
Our focus will be on using GANs for denoising prostate MRI scans, a prime example of generative AI's capability to improve medical imaging quality. We'll explore how these advanced AI techniques can refine noisy images, thereby enhancing their diagnostic value in a clinical setting.

In [None]:
# Splitting the dataset into training and testing subsets
train_case, test_case, train_y, test_y = train_test_split(
    data['PatientName'].tolist(),  # List of patient names to be used as features for splitting
    data['SSIM'].tolist(),  # List of SSIM values to be used as labels for splitting
    test_size=0.20,  # 20% of the data is allocated for the test set
    random_state=0  # Set a random state for reproducibility of the split
)

# Display the number of cases in the training and testing sets
len(train_case), len(test_case)

In [None]:
# @title
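def model_test(test_case, model=model, input_shape=(512,512)):
  """Compute PSNR, SSIM, and MAE per patient for the low-res input and for the model output, both measured against the high-res reference."""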

  ssims = []
  psnrs = []
  maes = []

  cases = []

  genssims = []
  genpsnrs = []
  genmaes = []
  for patient_num in tqdm(test_case):

    low_res, high_res = load_case(patient_num, preprocess=True, input_shape=input_shape)
    generated_im = model.predict(low_res, verbose=0)

    rangeD = high_res.max() - high_res.min()

    genssim = compare_ssim(high_res[:,:,:,0], generated_im[:,:,:,0], data_range=rangeD)
    genpsnr = compare_psnr(high_res[:,:,:,0], generated_im[:,:,:,0], data_range=rangeD)
    genmae = mae_calc(high_res[:,:,:,0], generated_im[:,:,:,0])

    genssims.append(genssim)
    genpsnrs.append(genpsnr)
    genmaes.append(genmae)
    cases.append(patient_num)

    # Baseline: compare the untouched low-res input with the same high-res reference (rangeD is unchanged)

    ssim = compare_ssim(high_res[:,:,:,0], low_res[:,:,:,0], data_range=rangeD)
    psnr = compare_psnr(high_res[:,:,:,0], low_res[:,:,:,0], data_range=rangeD)
    mae = mae_calc(high_res[:,:,:,0], low_res[:,:,:,0])

    ssims.append(ssim)
    psnrs.append(psnr)
    maes.append(mae)

  return pd.DataFrame({'PatientName':cases, 'PSNR':psnrs, 'SSIM':ssims, 'MAE':maes,
                       'GenPSNR':genpsnrs, 'GenSSIM':genssims, 'GenMAE':genmaes})


test_df = model_test(test_case)
test_df.head(5)

In [None]:
def model_test(test_case, model=model, input_shape=(512,512)):

  ssims = []
  psnrs = []
  maes = []

  cases = []

  genssims = []
  genpsnrs = []
  genmaes = []
  for patient_num in tqdm(test_case):

    low_res, high_res = load_case(patient_num, preprocess=True, input_shape=input_shape)
    generated_im = model.predict(low_res, verbose=0)

    rangeD = high_res.max() - high_res.min()

    genssim = compare_ssim(high_res[:,:,:,0], generated_im[:,:,:,0], data_range=rangeD)
    genpsnr = compare_psnr(high_res[:,:,:,0], generated_im[:,:,:,0], data_range=rangeD)
    genmae = mae_calc(high_res[:,:,:,0], generated_im[:,:,:,0])

    genssims.append(genssim)
    genpsnrs.append(genpsnr)
    genmaes.append(genmae)
    cases.append(patient_num)

    # Baseline: compare the untouched low-res input with the same high-res reference (rangeD is unchanged)

    ssim = compare_ssim(high_res[:,:,:,0], low_res[:,:,:,0], data_range=rangeD)
    psnr = compare_psnr(high_res[:,:,:,0], low_res[:,:,:,0], data_range=rangeD)
    mae = mae_calc(high_res[:,:,:,0], low_res[:,:,:,0])

    ssims.append(ssim)
    psnrs.append(psnr)
    maes.append(mae)

  return pd.DataFrame({'PatientName':"fill here", 'PSNR':"fill here", 'SSIM':"fill here", 'MAE':"fill here",
                       'GenPSNR':"fill here", 'GenSSIM':"fill here", 'GenMAE':"fill here"})

test_df = model_test(test_case)
test_df.head(5)

In [None]:
# Creating a 2x2 subplot layout
fig, axs = plt.subplots(2, 2, figsize=(20, 10))

# Plotting the first boxplot for PSNR
sns.boxplot(data=test_df[['PSNR', 'GenPSNR']], orient='h', ax=axs[0, 0])
axs[0, 0].set_title('Low Res (Input MRI) vs Generated MRI: PSNR')

# Plotting the second boxplot for SSIM
sns.boxplot(data=test_df[['SSIM', 'GenSSIM']], orient='h', ax=axs[0, 1])
axs[0, 1].set_title('Low Res (Input MRI) vs Generated MRI: SSIM')

# Plotting the third boxplot for MAE
sns.boxplot(data=test_df[['MAE', 'GenMAE']], orient='h', ax=axs[1, 0])
axs[1, 0].set_title('Low Res (Input MRI) vs Generated MRI: MAE')

# Leaving the bottom-right subplot empty
axs[1, 1].axis('off')

plt.tight_layout()
plt.show()

In [None]:
# Creating a DataLoader instance for the training dataset
train_dataset = DataLoader(
    train_case,  # List of patient cases to be included in the training dataset
    shape=(256, 256)  # Specifying the shape (resolution) for the images in the dataset
)

### Introduction to Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) are a class of machine learning models used for generative modeling. Introduced by Ian Goodfellow and his colleagues in 2014, they have become a widely used approach for generating new content, whether it's images, text, or complex structures like 3D models.
#### Fundamental Concepts of GANs
![im](./assests/gan1.png)
A GAN consists of two main components that work in tandem:

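1. **Generator:** This part of the GAN is responsible for creating new data. It takes an input and transforms it into a data output (e.g., an image). In a classic GAN the input is random noise; in the image-to-image (pix2pix-style) setup used in this workshop, the input is the noisy MRI image. The generator's goal is to produce data that is indistinguishable from real data.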

2. **Discriminator:** The discriminator acts as a critic that tries to differentiate between real data (from the training set) and fake data (created by the generator). It is essentially a binary classifier that learns to identify whether a given input is real or generated.
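
In a pix2pix-style conditional GAN, the discriminator typically does not judge an image in isolation: it sees the input image paired with either the real target or the generator's output, usually stacked along the channel axis. The sketch below only illustrates that pairing with random placeholder arrays. The array names and shapes are assumptions for illustration, and whether `define_discriminator` in this workshop's repository builds its input exactly this way is not confirmed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder arrays shaped (batch, height, width, channels)
noisy = rng.random((1, 256, 256, 1))      # stand-in for a noisy input slice
clean = rng.random((1, 256, 256, 1))      # stand-in for the matching clean slice
generated = rng.random((1, 256, 256, 1))  # stand-in for a generator output

# The discriminator is shown (input, candidate) pairs stacked channel-wise
real_pair = np.concatenate([noisy, clean], axis=-1)      # should be labelled "real"
fake_pair = np.concatenate([noisy, generated], axis=-1)  # should be labelled "fake"

print(real_pair.shape, fake_pair.shape)
```

Both pairs share the same first channel (the noisy input), so the discriminator has to decide whether the second channel is a plausible clean version of that specific input, not just a plausible image.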

#### The Training Process
The training of a GAN involves a competitive game between the generator and the discriminator:
![im](./assests/gan2.png)
* The **generator** is trained to produce increasingly realistic data, trying to fool the discriminator.

![im](./assests/gan3.png)
* The **discriminator** is trained to get better at distinguishing real data from the fakes created by the generator.

This process is akin to a forger trying to create a perfect fake painting, and an art expert trying to detect the forgery. Over time, the forger becomes skilled at creating realistic art, while the expert becomes better at spotting fakes.

#### Applications
GANs have a wide range of applications, including but not limited to:

* **Image Generation:** Creating realistic images from scratch.
* **Data Augmentation:** Generating new data for training machine learning models.
* **Style Transfer:** Modifying images to change their style (e.g., changing a day scene to night).
* **Medical Imaging:** Enhancing the quality of medical images or creating synthetic medical data for research and training.

In [None]:
# @title
# Setting up the GAN components: discriminator, generator, and the composite model
image_shape = (256, 256, 1)  # Define the shape of the images (256x256 pixels with 1 color channel)

# Initialize the discriminator model
discriminator_model = define_discriminator(image_shape)  # Create the discriminator model with the specified image shape

# Initialize the generator model
generator_model = define_generator(image_shape)  # Create the generator model with the same image shape

# Define the composite GAN model
gan_model = define_gan(generator_model, discriminator_model, image_shape) # Combine generator and discriminator into the GAN model

In [None]:
# Setting up the GAN components: discriminator, generator, and the composite model
image_shape = (256, 256, 1)  # Define the shape of the images (256x256 pixels with 1 color channel)

# Initialize the discriminator model
discriminator_model = define_discriminator(image_shape)  # Create the discriminator model with the specified image shape

# Initialize the generator model
generator_model = define_generator(image_shape)  # Create the generator model with the same image shape

# Define the composite GAN model
gan_model = define_gan("fill here", "fill here", "fill here")  # Combine generator and discriminator into the GAN model

### GAN Training Process in Medical Imaging
In the context of our workshop, where we use GANs for denoising prostate MRI scans, understanding the training process is crucial. Here's an overview of how GANs are trained, particularly focusing on the dynamics between the generator and discriminator components.

#### The Training Dynamics:
1. **Initial Phase:** At the start of training, the generator produces images that are far from the desired quality, and the discriminator easily differentiates between real and generated images.

2. **Iterative Improvement:** Both the generator and discriminator improve through iterations. The generator learns to produce more realistic images, while the discriminator becomes more adept at distinguishing fakes from real images.

3. **Feedback Loop:** The generator is guided by the feedback it receives from the discriminator. If the discriminator easily identifies a generated image, the generator adjusts to produce more convincing images.

#### Key Training Steps:
![im](./assests/train1.png)
1. **Training the Discriminator:** In each training iteration, the discriminator is trained first. It is fed with a batch of real images and a batch of images generated by the generator. The goal is for the discriminator to learn to label real images as real and generated images as fake.

2. **Training the Generator:** Next, the generator is trained. The objective is to create images that the discriminator classifies as real. The generator's success is measured by how often the discriminator mistakes its output for real images.

3. **Loss Functions:** The loss functions play a critical role. For the discriminator, the loss is high when it incorrectly classifies images. For the generator, the loss is high when the discriminator correctly identifies its images as fake.

4. **Backpropagation and Updates:** Both the generator and discriminator use the backpropagation algorithm to update their weights. The goal is to minimize their respective loss functions.

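5. **Reaching Equilibrium:** Ideally, training continues until the discriminator can no longer tell generated images from real ones. At that point its accuracy falls to about 50%, which is no better than random guessing. In practice this equilibrium is rarely reached exactly, so training is usually stopped after a fixed number of epochs or when validation metrics stop improving.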
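
The loss behaviour described in the steps above can be illustrated with a small sketch using binary cross-entropy, a common choice of adversarial loss. This is an assumption for illustration: the `train` function used in this workshop may formulate its losses differently, and the function names and probability values below are hypothetical.

```python
import math

def binary_cross_entropy(label, prob, eps=1e-7):
    """BCE for one prediction; prob is the discriminator's estimated P(real)."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """The discriminator wants real images scored 1 and generated images scored 0."""
    return binary_cross_entropy(1, d_real) + binary_cross_entropy(0, d_fake)

def generator_loss(d_fake):
    """The generator wants its output scored as real (non-saturating form)."""
    return binary_cross_entropy(1, d_fake)

# (score on a real image, score on a generated image) at three stages of training
scenarios = {
    "early training (discriminator confident)": (0.95, 0.05),
    "generator improving": (0.80, 0.40),
    "equilibrium (discriminator guessing)": (0.50, 0.50),
}
for name, (d_real, d_fake) in scenarios.items():
    print(f"{name}: D loss = {discriminator_loss(d_real, d_fake):.3f}, "
          f"G loss = {generator_loss(d_fake):.3f}")
```

As the generator's images score higher with the discriminator, the generator loss falls and the discriminator loss rises. When the discriminator outputs 0.5 for everything, each cross-entropy term equals ln 2 (about 0.693), which is the loss value that corresponds to random guessing.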

#### Considerations for Medical Imaging:
* **Quality and Clarity:** In medical imaging, the quality of generated images is paramount. The training focuses on achieving high clarity and detail, essential for accurate diagnosis.
* **Data Sensitivity:** Medical images carry diagnostically critical detail, so training must ensure that the denoised images preserve that information rather than smoothing it away or introducing structures that are not present in the original scan.

In our workshop, we will explore the nuances of this training process with a hands-on approach, providing insights into the practical application of GANs in improving the quality of medical images.
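
One way image-to-image GANs address the need to preserve detail is by adding a pixel-wise term to the generator's objective. The original pix2pix formulation combines the adversarial loss with an L1 (mean absolute error) term weighted by a factor of 100. The sketch below computes such a combined objective on toy arrays. The function name, the patch-score grid shape, and the stand-in arrays are assumptions for illustration, and the loss actually configured by `define_gan` in this workshop's repository may differ.

```python
import numpy as np

def generator_objective(d_fake, generated, target, l1_weight=100.0, eps=1e-7):
    """Adversarial BCE term plus a weighted L1 term (pix2pix-style sketch)."""
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    adversarial = float(np.mean(-np.log(d_fake)))     # generator wants scores near 1
    l1 = float(np.mean(np.abs(target - generated)))   # stay close to the clean target
    return adversarial + l1_weight * l1, adversarial, l1

rng = np.random.default_rng(0)
target = rng.random((1, 256, 256, 1))   # stand-in for a clean slice
generated = target + 0.02               # stand-in for a slightly-off generator output
d_fake = np.full((1, 16, 16, 1), 0.5)   # hypothetical grid of discriminator scores

total, adversarial, l1 = generator_objective(d_fake, generated, target)
print(f"adversarial = {adversarial:.3f}, L1 = {l1:.3f}, total = {total:.3f}")
```

With these toy values the adversarial term is ln 2 (about 0.693) and the L1 term is 0.02, giving a total of about 2.693. The large L1 weight means the generator is penalized heavily for drifting away from the reference pixels, even when it succeeds in fooling the discriminator.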

In [None]:
# @title
# Initiating the training process for the GAN
train(
    discriminator_model,    # The discriminator model to be trained
    generator_model,        # The generator model to be trained
    gan_model,              # The composite GAN model
    train_dataset,          # The training dataset to be used
    n_epochs=1,             # Number of epochs for training (set to 1 for demonstration)
    n_batch=1               # Batch size for training (set to 1 for demonstration)
)


In [None]:
# Initiating the training process for the GAN
train(
    discriminator_model,  # The discriminator model to be trained
    "fill here",      # The generator model to be trained
    "fill here",            # The composite GAN model
    "fill here",        # The training dataset to be used
    n_epochs="fill here",           # Number of epochs for training (set to 1 for demonstration)
    n_batch="fill here"             # Batch size for training (set to 1 for demonstration)
)

In [None]:
newtest_df = model_test(test_case, model=generator_model, input_shape=(256,256))
newtest_df.head(5)

In [None]:
# Creating a 2x2 subplot layout
fig, axs = plt.subplots(2, 2, figsize=(20, 10))

# Plotting the first boxplot for PSNR
sns.boxplot(data=newtest_df[['PSNR', 'GenPSNR']], orient='h', ax=axs[0, 0])
axs[0, 0].set_title('Low Res (Input MRI) vs Generated MRI: PSNR')

# Plotting the second boxplot for SSIM
sns.boxplot(data=newtest_df[['SSIM', 'GenSSIM']], orient='h', ax=axs[0, 1])
axs[0, 1].set_title('Low Res (Input MRI) vs Generated MRI: SSIM')

# Plotting the third boxplot for MAE
sns.boxplot(data=newtest_df[['MAE', 'GenMAE']], orient='h', ax=axs[1, 0])
axs[1, 0].set_title('Low Res (Input MRI) vs Generated MRI: MAE')

# Leaving the bottom-right subplot empty
axs[1, 1].axis('off')

plt.tight_layout()
plt.show()