# CycleGAN generative model and InceptionV3 classifier in Alzheimer's disease detection.

Contents

1. Introduction;
2. Alzheimer's disease;
3. Data;
    * 3.1 NIfTI images;
4. CycleGAN Model;
    * 4.1 Generator with U-NET architecture;
    * 4.2 Discriminator;
    * 4.3 Cycle consistency;
    * 4.4 Loss Functions - Optimizers;
    * 4.5 Training;
5. Classification with InceptionV3 model;
    * 5.1 Training;
6. Results;
7. Conclusions.

## 1 - Introduction

Artificial intelligence (AI) has revolutionized many fields, including medicine. Machine Learning, Deep Learning, and other AI techniques have made it possible to develop predictive models that support doctors and researchers in their daily activities.
One of the main applications of AI in medicine is the analysis of medical images, such as those obtained from magnetic resonance imaging, computed tomography, and radiography. However, a major obstacle in this context is the difficulty of finding datasets of medical images that are sufficiently large and accurate to train the models. This is due to patient privacy and the limitations imposed by personal data protection regulations.
To overcome this difficulty, various generative models are used, such as GANs (Generative Adversarial Networks), VAEs (Variational Autoencoders), and GPTs (Generative Pretrained Transformers). These models can generate new images starting from existing ones, creating synthetic datasets of medical images that can be used to train AI models.
In summary, AI is changing the approach to the diagnosis and treatment of diseases, giving doctors access to increasingly advanced support tools. However, important challenges remain, such as finding datasets of high-quality medical images.

The identification of Alzheimer's disease is taken as an example of how AI can be used in the medical field. In a recent study, Islam and Zhang [1] tackled the problem by using a generative GAN model to augment the training dataset, resulting in a $10\%$ improvement in disease classification. In this work, the same problem is addressed by following the approach taken by Bargshady et al. [2] in their study on the classification of Covid-19 in X-ray images. First, the image dataset is enlarged with new synthetic images generated by a CycleGAN model trained on real images. Then, the presence of the disease is classified with the pretrained InceptionV3 classification model.

## 2 - Alzheimer's disease

Alzheimer's disease is a degenerative brain disorder that gradually leads to the loss of memory and other intellectual abilities to such an extent that it interferes with daily life. Alzheimer's is not a normal part of aging, although the greatest known risk factor is increasing age, and the majority of people affected are $65$ years and older. However, Alzheimer's is not solely a disease of old age. Up to $5\%$ of people with this disease experience early-onset Alzheimer's, which often appears when a person is in their forties or fifties.

The onset of the disease and its progression are determined by the way specific biomarkers change:

1. β-amyloid (indicating deposition of amyloid in plaques outside the cell, measured in CSF and by amyloid PET)
2. Tau (indicating the formation of tau fibrils within the neurons)
3. Glucose metabolism (measured on PET, indicating damage to neurons)
4. Structural MRI (indicating damage to brain structure)
5. Cognitive impairment

In this study, disease recognition is primarily based on the structural differences between the brain of a healthy patient and the brain of an affected patient. Structural changes in the Alzheimer's-afflicted brain compared to a normal brain include cortical thinning, ventricular expansion, widening of the sulci, and overall loss of brain volume.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/Brain-ALZH.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T163304Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=100a3a21945eb7d5229341846e0132586712c26dd89f535efef157786bd787ab06420e659c761e32766f50ed6b1015c21dd61edfad41a7d7b82c027783b398200e5572d8718830d06026aea15efead5dac2248d2f48675f7cee77bd55a092a7accd98dcea87fd11a25be8b5b2a59eb8f4b21c9df4d36c595d137228f3362e85669a6c0f60d4032cba8f43cd6b4b09f481009ccd2c7ef3d93f9eee75b7347f35474d1bc684a808e316f5cb1bd429ec54bf5a9a900e5c234e4617eb89c1c8863ea50f66001bdae4bf391b78da6b3c87850c1fb6628603bc931c2b504fe536643579d77865fdb59c34c8c0c09bebf016dd32bc0ed754bc22ec4c5363989dc4b2389" alt="Figure 1 - Descrizione dell'immagine" width="600" height="600" align="center"/>

## 3 - Data

The Alzheimer's Disease Neuroimaging Initiative (ADNI) [3] is a collaborative project involving public institutions and private pharmaceutical companies. Since its establishment in 2004, it has brought together researchers from around the world with the goal of focusing their efforts on studying the progression of the disease through neuroimaging, the study of biochemical processes, and the identification of genetic biomarkers. The knowledge gained allows for the improvement of clinical trials for the prevention and treatment of the disease, as well as the development and standardization of protocols, but most importantly, the sharing of collected data and results.
For this study, the ADNI1 image dataset was used, restricted to images classified as CN (Cognitively Normal patients) and AD (patients with Alzheimer's disease). This dataset comprises a total of $2408$ three-dimensional images ($1204$ of type CN and $1204$ of type AD) obtained through magnetic resonance imaging (MRI) of patients aged between $55$ and $92$ years.


### 3.1 - NIfTI images

Generally, scanners used in the medical field produce images in a proprietary format, which poses a significant barrier to open diagnostics. The Neuroimaging Informatics Technology Initiative (NIfTI) [4] addressed this problem by releasing a standard, open format, also called NIfTI, which every neuroimaging package should support.

The smallest volumetric element that composes a three-dimensional image is the voxel, which is the counterpart of the pixel that represents the unit of measurement in a two-dimensional image. Each voxel has associated numerical values that represent measurable properties or independent variables (e.g., color, opacity, density, material, coverage ratio, refractive index, velocity, force, time) of the real phenomenon or object residing within the volume unit represented by that voxel. In addition, a 3D image can be considered as an overlay of stacked 2D slices, one on top of another.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/mri_slices.jpg?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T165854Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=87dc03095ed439bd5272f74b4bae4a78b8cb6f2be73b4371ebac5a676bfeb24a5e9dcdab914779773b452472083ac1ef31db001b86ef64b42dbf383a39b599d4d8ea80f4be54276d1cb7f53298e44899edfa1bba2a9855665cb74f2f7b98b4668ac4841e7f4ee4cb016118663c94e1fa4e5173d1c9228362a0f0dbb16d0de4ba86a9381c3923024d533ef7b2c226b9e6e2e07a44a2a9f6a07fb23697ad0d4c0374bae20b9b9d7533b228ad12a4f9034a4809aebbb5f831fe283649c3cc281e4eb28310424efb335275f4f5b59dfbe23c049b03c65fec51b5fa08aa7946d1e21f566619569497ac8f6bfad4b120bc5a70215aab01d5b6139284e6d2ad28b20ec6" alt="Figure 2 - Descrizione dell'immagine" width="600" height="600" align="center"/>
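The idea of a 3D image as a stack of 2D slices can be sketched with NumPy. The toy volume below is hypothetical (its shape and values are not taken from the ADNI data); it only illustrates that fixing one index yields a 2D slice and that stacking the slices rebuilds the volume.

```python
import numpy as np

# Hypothetical toy volume: 4 x 5 x 6 voxels with increasing intensities.
volume = np.arange(4 * 5 * 6, dtype=np.float32).reshape(4, 5, 6)

# Fixing the index on the last axis gives one 2D slice of the volume.
slice_2d = volume[:, :, 3]

# Stacking all the slices along the last axis rebuilds the original volume.
rebuilt = np.stack([volume[:, :, k] for k in range(volume.shape[2])], axis=-1)

print(slice_2d.shape)                   # (4, 5)
print(np.array_equal(rebuilt, volume))  # True
```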

A NIfTI image (a file with the .nii extension) can be considered as divided into three parts:

1. The header;
2. The image data;
3. The affine matrix.

**The header** contains metadata about the scan, such as the units of measurement and the voxel size. In fMRI files there is also a fourth dimension, time, whose step is the interval between the acquisition of two consecutive volumes. For example, an image could have dimensions $(80, 80, 44, 50)$, where the first three numbers are the spatial dimensions $(X, Y, Z)$ and the last is the number of time points in the scan. In this case, each volume has $80\times80\times44$ voxels of size $2.7\times2.7\times2.97\ \text{mm}$, and one volume is acquired every $0.7$ seconds.
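As a quick check of the numbers in this example, the following sketch derives a few quantities from the shape, voxel size, and sampling interval quoted above. The variable names are illustrative only and are not NIfTI header field names.

```python
# Values taken from the example in the text.
shape = (80, 80, 44, 50)            # (X, Y, Z, number of time points)
voxel_size_mm = (2.7, 2.7, 2.97)    # physical size of one voxel
repetition_time_s = 0.7             # interval between consecutive volumes

n_x, n_y, n_z, n_volumes = shape

# Number of voxels in a single 3D volume.
voxels_per_volume = n_x * n_y * n_z

# Physical extent covered along each spatial axis, in millimetres.
extent_mm = tuple(n * s for n, s in zip(shape[:3], voxel_size_mm))

# Total acquisition time for all the volumes, in seconds.
scan_duration_s = n_volumes * repetition_time_s

print(voxels_per_volume)                    # 281600
print([round(e, 2) for e in extent_mm])     # [216.0, 216.0, 130.68]
print(round(scan_duration_s, 1))            # 35.0
```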

**The data** contained in a NIfTI file represents the intensities of the voxels and is expressed with numerical values that are associated with colors during the image plotting stage.

**The affine matrix A** relates the voxel coordinates to world (scanner) coordinates. It is assumed that the coordinate $(0, 0, 0)$ corresponds to the $\textit{isocenter}$, and that the first axis is oriented from left to right, the second from posterior to anterior, and the third from inferior to superior. This orientation convention is called $RAS+$: positions to the right, anterior, and superior of the isocenter have positive coordinates, while positions to the left, posterior, and inferior of the isocenter have negative coordinates.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/FSCoords_RAS01.gif?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T165921Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=ae8f0da652de03d4d4cb638eb083bd5c17c1b74e15941551082c7dadee5c7f641d35f769210bf1a71eb79740807c11020f03d7e3790e5dc00b66f4e7d55b1c63f4884b85de905ac09ce55e70c62c68f97035823b2cf46ad96aa83e5265a18afffbc85be44f3f1349d836f2eff2e02fdb05cb610ed54eac071964713e60a59ad4255b932cb5b666bececc9e9ee339f1d7c415e59bddaed7b22e39007ffe26ed8851087dc198ecce610bed3fcf2ca64ad569b64ad8ab908d5b015535d5597adccc0d85a6cbd830b97b085a8b3c4e6b033bec51c3d8cff393c715de067b6da1c2c3be478819b72b2cc2ca63ca3752f482a207a85b009a274954e02815fa59b3de09" alt="Figure 3 - Descrizione dell'immagine" width="600" height="600" align="center"/>

A set of voxel coordinates $(i, j, k)$ is related to a set of real-world coordinates $(x, y, z)$ through the affine matrix A, using the relationship

$$(x, y, z, 1)^T = A\,(i, j, k, 1)^T$$

where A is a $4\times4$ matrix and a $1$ has been appended as the fourth component of each coordinate vector.
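The relationship above can be sketched in a few lines of NumPy. The affine used here is a made-up example (2 mm isotropic voxels, with voxel $(64, 64, 32)$ placed at the isocenter) and is not read from an ADNI file; `voxel_to_world` and `world_to_voxel` are hypothetical helper names.

```python
import numpy as np

# Hypothetical affine: 2 mm isotropic voxels, translated so that
# voxel (64, 64, 32) lies at the isocenter (0, 0, 0).
A = np.array([
    [2.0, 0.0, 0.0, -128.0],
    [0.0, 2.0, 0.0, -128.0],
    [0.0, 0.0, 2.0,  -64.0],
    [0.0, 0.0, 0.0,    1.0],
])

def voxel_to_world(affine, ijk):
    """Map voxel indices (i, j, k) to world coordinates (x, y, z)."""
    homogeneous = np.append(np.asarray(ijk, dtype=float), 1.0)
    return (affine @ homogeneous)[:3]

def world_to_voxel(affine, xyz):
    """Inverse mapping, using the inverse of the affine matrix."""
    homogeneous = np.append(np.asarray(xyz, dtype=float), 1.0)
    return (np.linalg.inv(affine) @ homogeneous)[:3]

center = voxel_to_world(A, (64, 64, 32))
neighbour = voxel_to_world(A, (65, 64, 32))
back = world_to_voxel(A, neighbour)

print(center)     # [0. 0. 0.]
print(neighbour)  # [2. 0. 0.]
```

Moving one voxel along the first axis shifts the world position by the voxel size (2 mm here) toward the right, which is what the $RAS+$ orientation described above implies for this affine.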

Each NIfTI file in the ADNI1 dataset has dimensions $(256, 256, 256)$. From these files, a new dataset of 2D images in PNG format was created by extracting slice number $150$ from each original file (the figure below shows an example). The Nibabel library was used to handle the NIfTI images, and the Matplotlib library was used to maintain 36-bit RGB image quality in the conversion to PNG.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/ADNI_s150_examples.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T165955Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=b69eb1ffe4cbeeabb566351ac50b0a729579387130e76672f760f80e08c0ffde16e62bed1fd88b27e2fea9a5f6f1b2900c6286d9f41091d70b7514f1c530e0099249efab789e938e57ce5ca7e40fc8f842f32585cd28b06fa5babae27075a1dad64898a436c03e64b8b40a3064b001affef255d1dd22c643ca19580377371865ac569dc641e0cf12e2ae2175dfe40d9f3118d18b04075fbf3396839ba8b1dd47c27da4cb478837b526b0d531736d4748dc34a18a080245130e148a29c2119651ac07d4598b98c54e0e121a367a47fe266327b93eb130a75ff94b181a939ba3517b03f6b0f19dde3a30c7e635d921bf227b6f6500adcbc4f9d5ec5282004785ab" alt="Figure 3 - Descrizione dell'immagine" width="600" height="600" align="center"/>

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow import image
import nibabel as nib
from PIL import Image

In [None]:
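# Extract one 2D slice from each NIfTI file, resize it and save it as PNG
def img_from_nii(height, width, n_slice, label, in_path, temp_path):

    # sort the file names so that the output numbering is reproducible
    filenames = sorted(os.listdir(in_path))

    for i, filename in enumerate(filenames):
        mri_file = os.path.join(in_path, filename)
        img_data = nib.load(mri_file).get_fdata()
        img_data = np.transpose(img_data, (2, 1, 0))  # reverse the axis order
        # PIL's resize expects the target size as (width, height)
        slice_2D = Image.fromarray(img_data[:, :, n_slice]).resize(
            (width, height), resample = Image.Resampling.LANCZOS)

        # imsave maps the single-channel slice through Matplotlib's default colormap
        out_name = f'{label}_ADNI_slc{n_slice}_{i + 1}.png'
        plt.imsave(os.path.join(temp_path, out_name), np.asarray(slice_2D))

    print('ADNI ' + label + ' dataset done!')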

In [None]:
infolder_CN = '/kaggle/input/ADNI1_CN/nii_files/'
outfolder_CN = '/kaggle/input/ADNI1_CN/images_s150_CN/'
infolder_AD = '/kaggle/input/ADNI1_AD/nii_files/'
outfolder_AD = '/kaggle/input/ADNI1_AD/images_s150_AD/'
category_CN = 'N' 
category_AD = 'P'
h, w = 256, 256
nslice = 150

img_from_nii(h, w, nslice, category_CN, infolder_CN, outfolder_CN)
img_from_nii(h, w, nslice, category_AD, infolder_AD, outfolder_AD)

In [None]:
# check the work
filename_CN = 'N_ADNI_slc150_7.png'
filename_AD = 'P_ADNI_slc150_7.png'

imN = Image.open(outfolder_CN + filename_CN)
imP = Image.open(outfolder_AD + filename_AD)

width_N, height_N = imN.size
width_P, height_P = imP.size

print('N image width:', width_N)
print('N image height:', height_N)
print('P image width:', width_P)
print('P image height:', height_P)

In [None]:
# Model parameters
input_path_A = '/kaggle/input/adcn1204ds/images_s150_CN/'
input_path_B = '/kaggle/input/adcn1204ds/images_s150_AD/'
output_path_A = '/kaggle/working/CN_fake_imgs/'
output_path_B = '/kaggle/working/AD_fake_imgs/'
step_path = '/kaggle/working/step_by_step/'
checkpoint_path = '../ckpts/'
sample_img = 'N_ADNI_slc150_6.png'
EPOCHS = 50
buffer_size = 1000
batch_size = 1
HEIGHT = 256
WIDTH = 256
CHANNEL = 3
LAMBDA = 10

In [None]:
# Dataset loading and saving functions
def load_and_norm(filename):

    img = tf.io.read_file(filename) # read the raw file content from its path
    img = tf.image.decode_png(img, channels = 3) # decode the PNG into a uint8 tensor
    img = tf.cast(img, tf.float32) / 127.5 - 1 # normalization to [-1, 1]
    
    return img
    
def load_dataset(ds_folder, batch_size, buffer_size):

    img_filenames = tf.data.Dataset.list_files(os.path.join(ds_folder, "*.png"))
    img_dataset = img_filenames.map(load_and_norm)
    
    # shuffle single images first, then batch, so that batches are re-mixed at every epoch
    img_dataset = img_dataset.shuffle(buffer_size).batch(batch_size)
    
    return img_dataset

# loading dataset without batch and shuffle    
def load_ds(ds_folder):

    img_filenames = tf.data.Dataset.list_files(os.path.join(ds_folder, "*.png"))
    img_dataset = img_filenames.map(load_and_norm)
    
    img_dataset = img_dataset.batch(1)
    
    return img_dataset

# showing dataset images
def show_images(dataset, title):
    fig, axes = plt.subplots(nrows = 2, ncols = 5, figsize = (10, 5))
    
    # plot at most 10 images; assumes batches of size 1
    for i, img in enumerate(dataset.take(10)):
        img = (np.squeeze(img) + 1) / 2  # back from [-1, 1] to [0, 1] for imshow
        axes[i // 5, i % 5].imshow(img)
        axes[i // 5, i % 5].axis('off')
    
    axes[0, 0].set_title(title)
    plt.tight_layout()
    plt.show()

# saving the generated images
def save_generated(image_ds, label, generator_model, outputs_path):
    os.makedirs(outputs_path, exist_ok = True)  # create the output folder if missing
    for i, img in enumerate(image_ds, start = 1):
        generated = generator_model(img, training = False)[0].numpy()
        
        generated = (generated * 127.5 + 127.5).astype(np.uint8)   # re-scale to [0, 255]
        im = Image.fromarray(generated)
        im.save(f'{outputs_path}{label}_fake_image_{i}.png')

# saving a sample generated image at each epoch
def save_step(img_path, img_name, epoch, label, generator_model, outputs_path):

    img_png = img_path + img_name
    img = load_and_norm(img_png)
    img = np.expand_dims(img, axis = 0)  # add the batch dimension
    
    generated = generator_model(img, training = False)[0].numpy()
    generated = (generated * 127.5 + 127.5).astype(np.uint8)  # re-scale to [0, 255]
    im = Image.fromarray(generated)
    im.save(f'{outputs_path}{label}_fake_image_ep{epoch + 1}.png')
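
The functions above map pixel values to $[-1, 1]$ when loading and back to $[0, 255]$ when saving. The NumPy sketch below reproduces that round trip outside TensorFlow. The helper names `normalize` and `denormalize` are hypothetical, and the use of `np.rint` before the cast is an added precaution against floating-point error, not something the notebook functions do (they cast directly).

```python
import numpy as np

def normalize(pixels):
    """Map uint8 values in [0, 255] to float32 values in [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0

def denormalize(values):
    """Map values in [-1, 1] back to uint8, rounding to the nearest level."""
    return np.rint(values * 127.5 + 127.5).astype(np.uint8)

pixels = np.arange(256, dtype=np.uint8)   # every possible 8-bit intensity
normalized = normalize(pixels)
restored = denormalize(normalized)

print(normalized.min(), normalized.max())  # -1.0 1.0
print(np.array_equal(restored, pixels))    # True
```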

## 4 - CycleGAN Model

With the image-to-image translation technique, it is possible to teach a model the mapping between an input image and an output image by using a training set of paired images. However, this technique cannot always be applied, because paired image sets are often not available. An alternative method that works without paired images, called CycleGAN, was introduced by Zhu et al. [5] in 2017.

In CycleGAN, the basic idea is to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired samples. Given two training sets, $x \in X$ and $y \in Y$, the model is trained to learn the mappings $G: X \rightarrow Y$ and $F: Y \rightarrow X$ by introducing two adversarial discriminators: $D_x$, which aims to distinguish between the original images $x$ and the translated ones $F(y)$, and $D_y$, which aims to distinguish between the original images $y$ and the translated ones $G(x)$. $D_x$ and $D_y$ can then be read as estimates of the probability that an element belongs to its reference domain. The objective of the model consists of two components: the adversarial loss and the cycle consistency loss.

Let $E_{x \sim p_{data}(x)}$ and $E_{y \sim p_{data}(y)}$ denote the expected values taken over the data distributions $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$ respectively.

The **adversarial loss** aims to align the distribution of generated images with the data distribution in the target domain. For the mapping function $G: X \rightarrow Y$, it can be written as

$$L_{GAN}(G,D_y,X,Y)=E_{y \sim p_{data}(y)}[\log D_y(y)]+E_{x \sim p_{data}(x)}[\log(1-D_y(G(x)))]$$

where $G$ attempts to generate an image $G(x)$ that is similar to the images in the $Y$ domain, and therefore tries to minimize the adversarial loss

$$\min_G L_{GAN}(G,D_y,X,Y)$$

while $D_y$ tries to distinguish the generated image $G(x)$ from the real image $y$, and therefore tries to maximize the adversarial loss

$$\max_{D_y} L_{GAN}(G,D_y,X,Y)$$

Similarly, for the mapping function $F: Y \rightarrow X$, it can be written as

$$L_{GAN}(F,D_x,Y,X)=E_{x \sim p_{data}(x)}[\log D_x(x)]+E_{y \sim p_{data}(y)}[\log(1-D_x(F(y)))]$$

and for the adversarial loss, it follows

$$\min_F \max_{D_x} L_{GAN}(F,D_x,Y,X)$$

With a sufficiently large capacity, a network trained adversarially can map the same set of input images to any random permutation of images in the target domain, and each of these mappings can produce an output distribution that matches the target distribution. To reduce the space of possible mapping functions, only those functions that possess the property of cycle consistency are considered. This means that, for each image in either domain, the full translation cycle should be able to recover the original image:

$$x \rightarrow G(x) \rightarrow F(G(x)) \approx x$$
$$y \rightarrow F(y) \rightarrow G(F(y)) \approx y$$

To encourage this behavior, a **cycle consistency loss** is employed

$$L_{cyc}(G,F)=E_{x \sim p_{data}(x)}[\left \| F(G(x))-x \right \|_1]+E_{y \sim p_{data}(y)}[\left \| G(F(y))-y \right \|_1]$$

So, the overall objective can be rewritten as

$$L(G,F,D_x,D_y)=L_{GAN}(G,D_y,X,Y)+L_{GAN}(F,D_x,Y,X)+ \lambda L_{cyc}(G,F)$$

where $\lambda$ controls the relative weight of the cycle consistency loss with respect to the adversarial losses ($\lambda = 10$ in this work). The entire model can be thought of as an autoencoder that maps an image to itself through an intermediate representation, namely the translation of the original image into the $Y$ domain.
The following figure illustrates what has been described.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/cycleGAN_scheme.jpg?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T170218Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=0f37b47eff408e2f5e62c6ad54b2fc3616e3080e78e742a39a68d341f743261201d6a8a03f8d57d74ef1bbdf2c7c62cb067bb9dbad18637de5357737282b9732a69197447cb85423def20ec017255b41c1d9a0f780f5ee54e5e38eab964290c05768b47c6da137b81cc9002c1ddbe03658dabc94615f32963b004a5e2ae8cfe3932851321fe258490aec2ab425be677105dac8916a96eff6f165abefc445d033ba6bc3c2893266cb7e56e1cb6174aee64aab4b5fe827dabd4e0283f08e4e25f6912e10dc3242162906c0b142e49960039d24534dcf6ea4b93460aec483eca2d8f4e9af7e3efd5a5708383be2adda576d77ffa2e1bbb1ef5afaa1fe1b04d85b6d" alt="Figure 2 - Descrizione dell'immagine" width="800" height="800" align="center"/>
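To make the objective concrete, the sketch below evaluates the adversarial and cycle consistency terms on small hand-made arrays. All numbers are invented for illustration: the discriminator outputs are assumed to be probabilities in $(0, 1)$, and the "cycled" images are the originals plus a constant error. It is a numerical illustration of the formulas, not the training code of this work.

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """L_GAN = E[log D(real)] + E[log(1 - D(fake))], with D outputs in (0, 1)."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def cycle_loss(x, x_cycled, y, y_cycled):
    """L_cyc: L1 norm per image (sum over pixels), averaged over the batch."""
    l1_x = np.sum(np.abs(x_cycled - x), axis=(1, 2, 3))
    l1_y = np.sum(np.abs(y_cycled - y), axis=(1, 2, 3))
    return np.mean(l1_x) + np.mean(l1_y)

LAMBDA = 10  # weight of the cycle term, as in the model parameters

# Hypothetical discriminator outputs on real and generated images.
l_gan_g = adversarial_loss(np.array([0.9, 0.8]), np.array([0.2, 0.1]))
l_gan_f = adversarial_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))

# Hypothetical batches of two 2x2 single-channel images and their reconstructions.
x = np.zeros((2, 2, 2, 1))
y = np.ones((2, 2, 2, 1))
l_cyc = cycle_loss(x, x + 0.1, y, y - 0.05)

total = l_gan_g + l_gan_f + LAMBDA * l_cyc
print(round(l_gan_g, 4), round(l_gan_f, 4), round(l_cyc, 4))
```

A discriminator that outputs $0.5$ everywhere cannot tell real from generated images, and its adversarial term equals $-\log 4$, the value reached at the equilibrium of the min-max game.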

In this work, the generator is implemented as a CNN with a U-NET architecture.

### 4.1 -  Generator with U-NET architecture

The U-NET architecture is a design concept initially developed for semantic segmentation, primarily for processing biomedical images and later applied to other fields as well. It is a CNN developed by O. Ronneberger et al. [6] at the Computer Science Department of the University of Freiburg. Semantic segmentation refers to assigning a class to each pixel in the image, highlighting areas composed of pixels belonging to the same category. Models using this architecture utilize segmentation maps as target variables. The U-NET architecture consists of an encoder, a decoder, and a skip connection between these two parts.

**The encoder** (or downsample) is a Convolutional Neural Network (CNN) responsible for creating a more compact representation of the image, meaning a smaller-sized representation that contains only the most important information. However, the last layer of this CNN would have too few nodes to allow for accurate classification. It is essential to restore the image to its original size.

**The decoder** (or upsample) is a Deconvolutional Neural Network (DCNN) that reconstructs the image to its original dimensions using the most important information extracted from the encoder.

By applying an encoder and a decoder, it is possible to reconstruct the original image from compressed data. However, the localization of the mapped features in the image is lost. A plain autoencoder needs a large amount of training data to compensate for this loss; a U-NET reduces this requirement through skip connections.

In a U-Net, **skip connections** are used to pass information from previous convolutional layers to the deconvolutional layers. Essentially, what is transmitted is the location of the feature extracted from the convolutional layers. The skip connections tell the network where the features come from in the image. This is done by concatenating the last layer in the convolutional block with the first layer in the opposite deconvolutional block. The U-Net is symmetric: the dimensions of the opposing layers will be the same. This simplifies combining the layers into a single tensor. The convolution is then performed as usual by running the kernel on the concatenated tensor. 

U-Net combines two important pieces of information: 

* Feature extraction: features are transferred from the previous layer to the upsampled layer. 
* Element localization: the position of the element is passed from the opposite convolutional layer. 

By combining these pieces of information, it is possible to improve the performance of semantic segmentation models and reduce the amount of data required to train the network.
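The concatenation performed by a skip connection can be illustrated on plain arrays. The shapes below are hypothetical (one $32\times32$ feature map with $256$ channels on each side); the only point is that, because the opposing layers have the same spatial size, the two tensors can be joined along the channel axis.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, height, width = 1, 32, 32
# Feature map saved from the encoder (convolutional) side.
encoder_features = rng.random((batch, height, width, 256))
# Feature map produced by the matching decoder (upsampling) block.
decoder_features = rng.random((batch, height, width, 256))

# Skip connection: join the two tensors along the channel axis.
merged = np.concatenate([decoder_features, encoder_features], axis=-1)

print(merged.shape)  # (1, 32, 32, 512)
```

The next convolution then runs over all $512$ channels, so it sees both the upsampled features and the spatially precise encoder features.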

In this work, the encoder of the generator consists of $8$ convolutional layers, each followed by a normalization layer (Instance Normalization in the code below) and a ReLU activation function. Similarly, the decoder was implemented with $8$ transposed convolutional layers, the last of which is the output layer with a tanh activation. The other decoder layers are followed by a normalization layer and a ReLU activation function, and only the first three include a dropout with a rate of $0.2$. The figure below shows the schema used for the concatenation in the U-NET architecture.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/u-net_gen.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T170614Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=1d2937c726a9c4737838ed226bc3b8e6bec9a42f7152d8994baff0d61a95ebad8fd936fa962d8acb1165787d607093a9c006ba539b63c09cc244bd321268829e4c5cc8c727d0c70061f42dfd7e301805dd5024c3e5508a47b92f7586b8690797dfb0471dcf1b39a87345360589a5aebd3ab55f0e1bacedd4f5d219994d5dce556aa34400741a3818e7b13d9528c60b26935883edfd8ef40e521cc33475949f3f5be6b7f68aec6ef00378ec535728fec8d8e75f3c435426a4a5f6ee269d4f1f2e6a28a1eeed26504ce74c2c8465200a8e074dfa12e87f4f1ec109f5f0c24a611bc8f0f4eff4c2657dfd17a97fb2b546a889f6baebb0ffc5dd67553f2c6fe59855" alt="Figure 2 - Descrizione dell'immagine" width="1000" height="1000" align="center"/>
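The sizes that flow through the generator can be traced without building the network. The sketch below is derived from reading the generator code that follows (stride-2 convolutions with `'same'` padding, filters doubling from $64$ up to $512$, filters halving in the last decoder blocks); `generator_shapes` is a hypothetical helper, not part of the notebook.

```python
def generator_shapes(input_size=256, n_layers=8, start_filters=64, max_filters=512):
    """Return (spatial size, channels) after each encoder and decoder block."""
    encoder = []
    size, filters = input_size, start_filters
    for _ in range(n_layers):
        size //= 2  # a stride-2 'same' convolution halves the (power-of-two) size
        encoder.append((size, filters))
        filters = min(filters * 2, max_filters)

    skips = encoder[::-1]          # deepest feature map first
    decoder = []
    size, _ = skips[0]
    for i in range(n_layers - 1):
        if i > 3:
            filters //= 2          # filters are halved in the last three blocks
        size *= 2                  # a stride-2 transposed convolution doubles the size
        skip_size, skip_filters = skips[i + 1]
        assert skip_size == size   # opposing layers have matching spatial sizes
        decoder.append((size, filters + skip_filters))  # channels after concatenation
    return encoder, decoder

encoder, decoder = generator_shapes()
print(encoder[-1])   # (1, 512)
print(decoder[-1])   # (128, 128)
```

With a $256\times256$ input, the bottleneck is a $1\times1$ map with $512$ channels, and the last concatenation yields a $128\times128$ map with $128$ channels, which the final transposed convolution brings back to $256\times256$.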

In [None]:
#CNN layers for downsampling
def downsample(filters, size, strides):
    
    initializer = tf.random_normal_initializer(0., 0.02)
    gamma_init = tf.keras.initializers.RandomNormal(mean = 0.0, stddev = 0.02)
    
    downblock = tf.keras.Sequential()
    downblock.add(tf.keras.layers.Conv2D(filters, size, strides, kernel_initializer = initializer, 
                                         padding = 'same', use_bias = False))
    downblock.add(tfa.layers.InstanceNormalization(gamma_initializer = gamma_init))
    downblock.add(tf.keras.layers.ReLU())
    
    return downblock

#CNN layers for upsampling
def upsample(filters, size, strides, add_dropout, rate):
    
    initializer = tf.random_normal_initializer(0., 0.02)
    gamma_init = tf.keras.initializers.RandomNormal(mean = 0.0, stddev = 0.02)
    
    upblock = tf.keras.Sequential()
    upblock.add(tf.keras.layers.Conv2DTranspose(filters, size, strides, kernel_initializer = initializer, 
                                                padding = 'same', use_bias = False))
    upblock.add(tfa.layers.InstanceNormalization(gamma_initializer = gamma_init))
    
    # the feature maps are channels-last (batch, height, width, channels), the Keras default
    if add_dropout: upblock.add(tf.keras.layers.SpatialDropout2D(rate = rate))
    
    upblock.add(tf.keras.layers.ReLU())
    
    return upblock

#last layer for generator
def lastlayer(filters, size, strides):
    initializer = tf.keras.initializers.RandomNormal(mean = 0.0, stddev = 0.02)
    
    lastblock = tf.keras.Sequential()
    lastblock.add(tf.keras.layers.Conv2DTranspose(filters, size, strides, kernel_initializer = initializer, 
                                                padding = 'same', activation = 'tanh', use_bias = False))
    
    return lastblock

#cycleGAN Generator
def generator():
    
    inputs = tf.keras.layers.Input(shape = [HEIGHT, WIDTH, CHANNEL])
    
    n_layers = 8          #number of layers in CNN
    filters = 64          #starting filters
    out_filters = CHANNEL #outputs channels
    strides = 2           #convolution strides
    rate = 0.2            #dropout rate
    size = 4              #convolution size
    
    #downsampling
    cropped = [] #feature maps for the concatenation steps
    inlayer = inputs
    for i in range(n_layers):
        inlayer = downsample(filters, size, strides)(inlayer)
        cropped.append(inlayer)
        
        filters = min(filters * 2, 512) #double the filters up to a maximum of 512
            
    #upsampling
    cropped = cropped[::-1]
    up = cropped[0]
    for i in range(n_layers - 1):
        add_dropout = i <= 2 #dropout only in the first three blocks
        if i > 3: filters = filters // 2
        
        up = upsample(filters, size, strides, add_dropout, rate)(up)
        up = tf.keras.layers.Concatenate()([up, cropped[i + 1]]) #concatenate feature maps
        
    #output layer
    outputs = lastlayer(out_filters, size, strides)(up)
    
    gen_model = tf.keras.Model(inputs = inputs, outputs = outputs)
    
    return gen_model

### 4.2 - Discriminator

The discriminator's task is to distinguish the image produced by the generator from the real image. A CycleGAN uses two discriminators: one distinguishes generated positive images from real positive images, while the other distinguishes generated negative images from real negative images. The implementation follows a classical CNN architecture consisting of $4$ convolutional layers, each followed by a normalization layer (Instance Normalization in the code below) and a ReLU activation function. Two zero-padding layers were added before and after the fourth convolution, and the network ends with an additional convolution layer. The figure below provides a schematic overview of the discriminator's implementation.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/discriminator.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T170255Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=9072fbd3a8f09c892ab4a102ccb60f82761d4912fe51f9144b5426c8f99d255230432e319a92cbd5c9176ce0a83b62240c0bc8f6f3f13e579ca1ca8c0f33ab77c68390ef0fb076e2713aeb3ae0c61b36c862f60b6cd305fe81e18968bdf3a225fbf04f58a5a3c87976847f1dd6da391923c20605f87725fbfe4081b873cd6d833ac3ac403a95583134f727cb93a75b7b661d91de12c8b1820dd058e6af08ee85cb1d3256ac4987f026a1f29fb1515e56b5eb68d4f0a99b2a74640f06eaac8d89fc851bacb362dcfd042c3ccee13f4e723db5243eadf0ffcafc88c9ad1b4045947ca6051b6faa0a170436a7b828729a1935ee1bbc1a518f1d4b1974ce87de12cf" alt="Figure 2 - Descrizione dell'immagine" width="600" height="600" align="center"/>
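The spatial size of the discriminator output can be worked out by hand from the layer sequence. The sketch below assumes what the discriminator code uses: stride-2 convolutions with `'same'` padding (output size $\lceil n / 2 \rceil$) and `ZeroPadding2D` with its default padding of one pixel per side. `discriminator_output_size` is a hypothetical helper for this calculation only.

```python
import math

def discriminator_output_size(input_size=256, n_layers=3, stride=2, pad=1):
    """Trace the spatial size through the discriminator layers."""
    size = input_size
    for _ in range(n_layers):
        size = math.ceil(size / stride)   # three downsampling blocks: 128, 64, 32
    size += 2 * pad                       # first zero padding: 34
    size = math.ceil(size / stride)       # fourth convolution: 17
    size += 2 * pad                       # second zero padding: 19
    size = math.ceil(size / stride)       # final single-filter convolution: 10
    return size

output_size = discriminator_output_size()
print(output_size)  # 10
```

Under these assumptions, a $256\times256$ input gives a $10\times10\times1$ output, so the discriminator scores a grid of image regions rather than producing a single value per image.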

In [None]:
# Architecture for cycleGAN discriminator
def discriminator():
    
    inputs = tf.keras.layers.Input(shape = [HEIGHT, WIDTH, CHANNEL])
    initializer = tf.keras.initializers.RandomNormal(mean = 0.0, stddev = 0.02)
    
    n_layers = 3          #number of layers in CNN
    filters = 64          #starting filters
    out_filters = 1       #ending filter
    strides = 2           #convolution strides
    rate = 0.2            #dropout rate
    size = 4              #convolution size
    
    inlayer = inputs
    for i in range(n_layers):
        inlayer = downsample(filters, size, strides)(inlayer)
        filters = filters * 2
        
    zeropad_1 = tf.keras.layers.ZeroPadding2D()(inlayer)
    
    conv = downsample(filters, size, strides)(zeropad_1)
    
    zeropad_2 = tf.keras.layers.ZeroPadding2D()(conv)
    
    outputs = tf.keras.layers.Conv2D(out_filters, size, strides, kernel_initializer = initializer, 
                                     padding = 'same', use_bias = False)(zeropad_2)
    
    disc_model = tf.keras.Model(inputs = inputs, outputs = outputs)
    
    return disc_model  

In [None]:
generateA = generator()
discriminateA = discriminator()
generateB = generator()
discriminateB = discriminator()

### 4.3 - Cycle consistency

Cycle consistency aims to ensure that transforming an image from one domain to another, and then transforming it back to the original domain, produces an image that is similar to the original input. In other words, if an image $A$ is transformed into image $B$ and then transformed back to the original domain of $A$, the result is expected to resemble the original image $A$.

Cycle consistency is achieved through the use of two generators and two discriminators in the CycleGAN model. One generator transforms images from one domain to another (e.g., from $A$ to $B$), while the other generator performs the inverse transformation (from $B$ to $A$). The discriminators evaluate the quality of the generated images and attempt to distinguish between real and generated images.

In essence, cycle consistency in the CycleGAN model ensures that the transformation process between domains is reversible and that the information from the original image is preserved during both the transformation and the retransformation.
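
As a minimal illustration of the idea, the sketch below replaces the two generators with hypothetical toy functions (`g_ab`, `g_ba` and `g_ba_imperfect` are simple affine maps on NumPy arrays, not the U-Net generators of this notebook). It measures the mean absolute difference between a toy image and its reconstruction after a full cycle.

```python
import numpy as np

def g_ab(x):
    # Hypothetical stand-in for the A -> B generator: an affine intensity map.
    return 2.0 * x + 1.0

def g_ba(y):
    # Hypothetical stand-in for the B -> A generator: the exact inverse of g_ab.
    return (y - 1.0) / 2.0

def g_ba_imperfect(y):
    # A deliberately imperfect inverse, used to show a non-zero cycle error.
    return (y - 1.0) / 2.0 + 0.1

def cycle_error(x, forward, backward):
    # Mean absolute difference between x and its reconstruction backward(forward(x)).
    return float(np.mean(np.abs(x - backward(forward(x)))))

rng = np.random.default_rng(0)
image_a = rng.random((4, 4))  # toy 4x4 "image" with values in [0, 1)

perfect_error = cycle_error(image_a, g_ab, g_ba)
imperfect_error = cycle_error(image_a, g_ab, g_ba_imperfect)
print(perfect_error, imperfect_error)
```

When the second map is the inverse of the first, the error is zero up to floating-point rounding. With the imperfect inverse the error grows to about the size of the offset, and this is the quantity that a cycle-consistency loss penalizes.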

In [None]:
# cycleGAN architecture
def cyclegan(input_A, input_B):
    
    # translate each input into the other domain (fake images)
    BfromA = generateB(input_A, training = True)
    AfromB = generateA(input_B, training = True)
        
    # reconstruct the originals by translating the fakes back (cycle)
    regenAfromB = generateA(BfromA, training = True)
    regenBfromA = generateB(AfromB, training = True)

    # identity mapping: each generator applied to an image already in its target domain
    gen_orig_A = generateA(input_A, training = True)
    gen_orig_B = generateB(input_B, training = True)
    
    # discriminator outputs on real images
    valid_A = discriminateA(input_A, training = True)
    valid_B = discriminateB(input_B, training = True)
    
    # discriminator outputs on fake images
    valid_AfromB = discriminateA(AfromB, training = True)
    valid_BfromA = discriminateB(BfromA, training = True)
    
    return regenAfromB, regenBfromA, gen_orig_A, gen_orig_B, valid_A, valid_B, valid_AfromB, valid_BfromA

### 4.4 - Loss Functions - Optimizers

To encourage cycle consistency, a loss term compares each original image with its retransformed (reconstructed) version and penalizes the difference between the two. Adding this cycle loss to the model helps stabilize training and improve the quality of the generated images.

In this work, the adversarial losses of the generators and discriminators use binary cross-entropy, while the cycle and identity losses use the mean absolute (L1) difference scaled by a weight `LAMBDA`. All four networks are optimized with Adam, with $\text{learning rate}=2 \times 10^{-4}$ and $\beta_1 = 0.5$.
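
Read against the loss and training code in this notebook, the objectives can be written compactly. The notation is introduced here for illustration only: $G_A$ and $G_B$ stand for `generateA` and `generateB`, $D_A$ and $D_B$ for the two discriminators, $a$ and $b$ for images from the two domains, $\lambda$ for `LAMBDA`, $\text{BCE}$ for the binary cross-entropy on logits, and $\text{mean}|\cdot|$ for the mean absolute value over all pixels.

$$\mathcal{L}_{cyc} = \lambda \, \text{mean}\left| a - G_A(G_B(a)) \right| + \lambda \, \text{mean}\left| b - G_B(G_A(b)) \right|$$

$$\mathcal{L}_{G_A} = \text{BCE}\left(1, D_A(G_A(b))\right) + \mathcal{L}_{cyc} + \lambda \, \text{mean}\left| a - G_A(a) \right|$$

$$\mathcal{L}_{D_A} = \text{BCE}\left(1, D_A(a)\right) + \text{BCE}\left(0, D_A(G_A(b))\right)$$

The expressions for $G_B$ and $D_B$ follow by exchanging the roles of $a$ and $b$. The last term of $\mathcal{L}_{G_A}$ is the identity loss: it asks $G_A$ to leave an image unchanged when that image already belongs to domain $A$.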

In [None]:
# Loss Functions - Optimizers
# per-element binary cross-entropy on raw discriminator logits
bce = tf.keras.losses.BinaryCrossentropy(from_logits = True, reduction = tf.keras.losses.Reduction.NONE)

def generator_loss(generated):
    # the generator wants its fake images to be classified as real (label 1)
    return bce(tf.ones_like(generated), generated)

def discriminator_loss(real, generated):
    # real images should be classified as 1, generated images as 0
    real_loss = bce(tf.ones_like(real), real)
    generated_loss = bce(tf.zeros_like(generated), generated)
    total_disc_loss = real_loss + generated_loss
    
    return total_disc_loss

def cycle_loss(real, generated, LAMBDA):
    c_loss = tf.reduce_mean(tf.abs(real - generated))
    
    return LAMBDA * c_loss

def identity_loss(real, same, LAMBDA):
    i_loss = tf.reduce_mean(tf.abs(real - same))

    return LAMBDA * i_loss

#optimizers
genA_optimizer = tf.keras.optimizers.legacy.Adam(2e-4, beta_1 = 0.5)
discA_optimizer = tf.keras.optimizers.legacy.Adam(2e-4, beta_1 = 0.5)
genB_optimizer = tf.keras.optimizers.legacy.Adam(2e-4, beta_1 = 0.5)
discB_optimizer = tf.keras.optimizers.legacy.Adam(2e-4, beta_1 = 0.5)

### 4.5 - Training

The training was conducted for $50$ epochs with a $\text{batch size} = 1$.

In [None]:
# Training session
generateA = generator()
discriminateA = discriminator()
generateB = generator()
discriminateB = discriminator()

@tf.function
def train_step(inputA, inputB):
    
    with tf.GradientTape(persistent = True) as tape:
        
        regenA, regenB, gen_origA, gen_origB, disc_A, disc_B, disc_AfB, disc_BfA = cyclegan(inputA, inputB)
        
        
        A_gen_loss = generator_loss(disc_AfB)
        B_gen_loss = generator_loss(disc_BfA)
        
        total_cycle_loss = cycle_loss(inputA, regenA, LAMBDA) + cycle_loss(inputB, regenB, LAMBDA)
        
        A_identity_loss = identity_loss(inputA, gen_origA, LAMBDA)
        B_identity_loss = identity_loss(inputB, gen_origB, LAMBDA)
    
        total_A_gen_loss = A_gen_loss + total_cycle_loss + A_identity_loss
        total_B_gen_loss = B_gen_loss + total_cycle_loss + B_identity_loss
        
        A_disc_loss = discriminator_loss(disc_A, disc_AfB)
        B_disc_loss = discriminator_loss(disc_B, disc_BfA)

        
    # Gradients and optimizers
    A_generator_gradients = tape.gradient(total_A_gen_loss, generateA.trainable_variables)
    genA_optimizer.apply_gradients(zip(A_generator_gradients, generateA.trainable_variables))

    B_generator_gradients = tape.gradient(total_B_gen_loss, generateB.trainable_variables)
    genB_optimizer.apply_gradients(zip(B_generator_gradients, generateB.trainable_variables))
    
    A_discriminator_gradients = tape.gradient( A_disc_loss, discriminateA.trainable_variables)
    discA_optimizer.apply_gradients(zip(A_discriminator_gradients, discriminateA.trainable_variables))

    B_discriminator_gradients = tape.gradient(B_disc_loss, discriminateB.trainable_variables)
    discB_optimizer.apply_gradients(zip(B_discriminator_gradients, discriminateB.trainable_variables))

# Training

def train(train_ds, epochs):
    for epoch in range(epochs):
        start = time.time()
        print("Starting epoch", epoch + 1)
        
        for image_x, image_y in train_ds:
            # pass the tensors directly; converting to NumPy first is unnecessary
            train_step(image_x, image_y)
            
        if (epoch + 1) % 25 == 0:
            ckpt_save_path = ckpt_manager.save()
            print ('Saving checkpoint for epoch {} at {}'.format(epoch + 1, ckpt_save_path))
            
        print('Time for epoch {} is {} sec'.format(epoch + 1, time.time() - start))
        save_step(input_path_A, sample_img, epoch, 'P', generateB, step_path)

In [None]:
# Checkpoints setup
ckpt = tf.train.Checkpoint(generateA = generateA,
                           generateB = generateB,
                           discriminateA = discriminateA,
                           discriminateB = discriminateB,
                           genA_optimizer = genA_optimizer,
                           genB_optimizer = genB_optimizer,
                           discA_optimizer = discA_optimizer,
                           discB_optimizer = discB_optimizer)

ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep = 5)

# if a checkpoint exists, restore the latest checkpoint.
if ckpt_manager.latest_checkpoint:
    ckpt.restore(ckpt_manager.latest_checkpoint)
    print ('Latest checkpoint restored!!')

In [None]:
# Execution
train_ds_A = load_dataset(input_path_A, batch_size, buffer_size)
train_ds_B = load_dataset(input_path_B, batch_size, buffer_size)

train_dataset = tf.data.Dataset.zip((train_ds_A, train_ds_B))

train(train_dataset, EPOCHS)

In [None]:
# Generating new images
dataset_A = load_ds(input_path_A)
dataset_B = load_ds(input_path_B)

save_generated(dataset_A, 'P', generateB, output_path_B)
save_generated(dataset_B, 'N', generateA, output_path_A)

In [None]:
shutil.make_archive("/kaggle/working/results", 'zip', "/kaggle/working")
shutil.make_archive("/kaggle/working/ckpts", 'zip', "/kaggle/ckpts")

## 5 - Classification with InceptionV3 model

The images obtained through the CycleGAN model, along with the original ADNI dataset, were used to train and evaluate a pre-trained image classifier. InceptionV3 is a deep convolutional neural network for image classification, commonly described as $48$ layers deep, with roughly $24$ million parameters. It was developed by Google researchers (C. Szegedy et al.) as an advanced version of its predecessor, InceptionV1 [7].

The name "Inception" comes from the fact that the model uses "inception" modules, which are blocks of parallel convolutions with different filter sizes. These modules allow capturing information at different scales and hierarchies of features during the feature extraction stage.

InceptionV3 has been trained on a large dataset of images called ImageNet, which contains millions of images divided into different categories. During training, the model learns to recognize a wide range of objects and concepts present in the images.

A distinctive feature of the Inception family is the use of $1\times1$ convolutions for dimensionality reduction, followed by larger convolutions that capture features at different scales. The $1\times1$ convolutions reduce the number of input channels and so improve the computational efficiency of the model. In its initial version (the naive form), each module consisted of parallel $1\times1$, $3\times3$, and $5\times5$ convolutional layers, and the computationally expensive $5\times5$ layers were one of its bottlenecks. To overcome this issue, version $3$ factorizes the larger convolutions: $5\times5$ convolutions are replaced by stacked $3\times3$ convolutions, and $n\times n$ convolutions are in turn replaced by asymmetric pairs of the type $1\times n$ and $n \times1$. For $n=3$, such a two-layer solution costs about $33\%$ less for the same number of output filters when the number of input and output filters is the same. The following figure illustrates the evolution of the module from the naive form to the optimized version based on asymmetric convolutions.
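
The $33\%$ figure can be checked by counting weights for $n=3$. The sketch below is only an illustration: the helper `conv_weights` and the channel count of $64$ are hypothetical, bias terms are ignored, and the ratio does not depend on the number of channels as long as input and output filters are equal.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    # Number of weights in one convolution layer (bias terms ignored).
    return kernel_h * kernel_w * in_channels * out_channels

channels = 64  # hypothetical value; the saving ratio does not depend on it

# One full 3x3 convolution.
full_3x3 = conv_weights(3, 3, channels, channels)

# A 3x1 convolution followed by a 1x3 convolution, covering the same 3x3 region.
asymmetric = conv_weights(3, 1, channels, channels) + conv_weights(1, 3, channels, channels)

saving = 1.0 - asymmetric / full_3x3
print(f"3x3: {full_3x3}, 3x1 + 1x3: {asymmetric}, saving: {saving:.0%}")
```

Each input-output channel pair needs $9$ weights in the full convolution and $3+3=6$ in the factorized one, hence the saving of one third.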

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/naive_form.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T170727Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=8f51d182c09e8c450b914a4d24569e57635586463f13d69a891a8abfd26e4deac4b9b77c897d26aeea61b34fdefafa730bc6fc1536f1de685ae73d83c900bacdb43939820bd4a37412b22e268e949281171fd9e72e218b08de4637802b3675b0e1182fa9350318e610da74756f7876efdf542e9788079be49ad42b635b741188ed415f3295308356c119d545058c97a78c3d62905c17d548dec3a8c7a8efe5132b04dd87de4b9878b71523295f1e0b2445385f727b12ada263350b9e52afb9b30f28fae93dd25cfc96770af9d2da782f9f004673dc7fbffd1ca147dc22fc4afa65baf909e89dd3028d19db0b0e8dc98491d751b81c0a2c3cf355053850432cbe" alt="Figure 3 - Descrizione dell'immagine" width="1200" height="1200" align="center"/>

The full model consists of symmetric and asymmetric building blocks, including convolutions, average pooling, max pooling, concatenations, dropout, and fully connected layers. Batch normalization is used extensively throughout the model and applied to the activation inputs. The loss is computed on the output of a final softmax layer.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/inceptionv3onc--oview.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T170751Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=6af6c201bb32c9ab30df62c50962855b2852a49eaad279edcac90e9808dfe1db7eded27e2c40ede1dc5105ffecc584cf347e830507819b59329b88632639eebc8757ad57073fe443a56c86d01b39b5033523016b54862242c00b76a8c0392323f4fb83e23a173a6d0e71c791004e6f05e4ba182875f819cf1c91567d2d5595cf4b41f5414073c77ba8a7546218c23ecd3cb77d0ff7c29de528f5629702f14da046b87c281911d95725c886689539a69a8f9cf96ac18fe138dcf182d8c2c736b273735db4609392c0e846188e3baa1ad7be0262de22d657a52ed01af23302109897a6379acc84b5698f18200fbb8d12dca6334fe435ac993ac092d148786ec155" alt="Figure 3 - Descrizione dell'immagine" width="1000" height="1000" align="center"/>

The InceptionV3 model was applied after modifying its final part: the fully connected layers with softmax activation were removed and replaced with two fully connected layers of size $92$ and $1028$ with ReLU activation, each followed by dropout with rate $0.2$, and a final output layer of size $2$ with sigmoid activation.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/inceptiov3_top.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T170815Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=306d7b5ff576d55ad2ced8c4f59876d727557dfa5821e3fd1d579bff8284025be2f4996a7ae7a001e860ebb43292d0fb37e11348922c3ba5513af8307e973402f848e70e8a136367a949c485aff1b802384d2d3dc09becfa02a6e2f86e8d2b108880466234fee1af176d833af5ba72418c020343d5608230b2add74427769568b870647b057bdf2519acfa60a4fbbcf2dd6114c5a83eecfb5a99f4c7223d3957aad25bf48333166ac6cce4d8a1c2eb786288b1fba3e7f09592e44a7431954ce45ec68909e4fc7297f16df0ab5d8b0e59f1068eb3d6e2c6b07e5f69b6c6a3b0178c57d97658f2475ddd4900b64b74dfe4f64fa202f35c68870314e4a2f6a54b31" alt="Figure 3 - Descrizione dell'immagine" width="600" height="600" align="center"/>

In [None]:
import tensorflow as tf
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn as skl
import sklearn.metrics  # load the submodule explicitly so that skl.metrics is available
import os
from tensorflow.keras.applications.inception_v3 import InceptionV3
from sklearn.model_selection import KFold
#from sklearn.metrics import accuracy_score, mean_squared_error, mean_absolute_error, confusion_matrix, roc_curve, auc
#from tensorflow.python.ops.numpy_ops import np_config
#np_config.enable_numpy_behavior()

In [None]:
ds_folder = '/kaggle/input/adcn-refk650/'
out_folder = '/kaggle/working/'
HEIGHT = 256
WIDTH = 256 
CHANNEL = 3
BATCH_SIZE = 32
NUM_FOLDS = 10
NUM_EPOCHS = 50
NUM_CLASSES = 2

In [None]:
# get the .png filenames in the data directory and its subdirectories
def filenames(data_directory):
    
    file_names = []
    for root, dirs, files in os.walk(data_directory):
        for file in files:
            if file.endswith(".png"):
                file_path = os.path.join(root, file)
                file_names.append(file_path)
                
    return file_names
    
# images and labels loader
def load_image(image_path, label):

    image = tf.io.read_file(image_path)
    image = tf.image.decode_png(image, channels = CHANNEL)
    image = tf.cast(image, tf.float32) / 255.0
    label = tf.one_hot(label, NUM_CLASSES)

    return image, label

# dataset constructor
def get_dataset(all_image_paths, split_dataset):
    
    buffer_size = len(all_image_paths)
    
    # labels from string to integer: file names starting with 'P_' get label 0, all others label 1
    all_image_labels = []
    for img in all_image_paths:
        if os.path.basename(img).split('_')[0] == 'P':
            all_image_labels.append(0)
        else:
            all_image_labels.append(1)
    
    dataset = tf.data.Dataset.from_tensor_slices((all_image_paths, all_image_labels))
    # shuffle the (path, label) pairs before decoding, so the buffer holds strings and not images;
    # when splitting, keep one fixed order so that take/skip produce disjoint subsets
    dataset = dataset.shuffle(buffer_size = buffer_size, seed = 42, 
                              reshuffle_each_iteration = not split_dataset)
    dataset = dataset.map(load_image)
    
    # split dataset into train and test ds
    if split_dataset:
        size = 0.8
        # sizes are fractions of the number of images (the original used a fixed total of 10)
        train_size = int(size * buffer_size)
        test_size = buffer_size - train_size
        
        train_dataset = dataset.take(train_size)
        test_dataset = dataset.skip(train_size).take(test_size)
        
        train_dataset = train_dataset.batch(BATCH_SIZE)
        test_dataset = test_dataset.batch(BATCH_SIZE)
        
        return train_dataset, test_dataset
    else:
        return dataset

# merge tensors of different shape
def merge_tensors(tensors_list):
    merged_tensor = tensors_list[0]
    for tensor in tensors_list[1:]:
        merged_tensor = tf.concat([merged_tensor, tensor], axis=0)
    return merged_tensor

# metrics
def calc_metrics(y_true, y_pred):
    
    acc = skl.metrics.accuracy_score(y_true, y_pred, normalize = True)
    mse = skl.metrics.mean_squared_error(y_true, y_pred)
    mae = skl.metrics.mean_absolute_error(y_true, y_pred)
    
    fpr, tpr, thresholds = skl.metrics.roc_curve(y_true, y_pred)
    auc_val = skl.metrics.auc(fpr, tpr)
    
    #conf_mat = confusion_matrix(y_test, y_pred)
    
    print(f'-- ACC={acc}, MSE={mse}, MAE={mae}, AUC={auc_val}, --')
    #print(f'-----------------ACC={acc}, MSE={mse}, MAE={mae}, -----------------')
    
    #return acc, mse, mae
    ##return acc, mse, mae, conf_mat
    return acc, mse, mae, auc_val, fpr, tpr

def plot_roc_curve(fpr, tpr, mean_auc):
    # common FPR grid on which the per-fold ROC curves are interpolated
    all_fpr = np.linspace(0, 1, 100)
    mean_tpr = 0
    
    # interpolate each fold's (fpr, tpr) curve on the common grid and accumulate
    for elm1, elm2 in zip(fpr, tpr):
        mean_tpr += np.interp(all_fpr, elm1, elm2)
    
    # mean ROC curve over the folds
    mean_tpr /= NUM_FOLDS
    
    # plot the mean ROC curve
    plt.figure(figsize=(8, 6))
    plt.plot(all_fpr, mean_tpr, color='b', label='Mean ROC (AUC = {:.2f})'.format(mean_auc))
    plt.plot([0, 1], [0, 1], color='gray', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Mean ROC Curve using {}-fold Cross Validation'.format(NUM_FOLDS))
    plt.legend(loc='lower right')
    plt.grid(True)
    plt.savefig(out_folder + 'roc_curve.png')
    plt.show()

def conf_matrix(y_true, y_pred):
    
    conf_mat = skl.metrics.confusion_matrix(y_true, y_pred)
    
    sns.set(font_scale = 1.4)
    class_labels = ['P', 'N']
    cm = sns.heatmap(conf_mat, annot = True, yticklabels = class_labels, xticklabels = class_labels, 
                annot_kws = {"size": 16}, cmap = 'Blues')
    
    cm.set_title('Confusion Matrix')
    cm.set_yticklabels(cm.get_yticklabels(), rotation = 0, horizontalalignment = 'right')

    plt.xlabel('Predicted label')
    plt.ylabel('True label')
    #plt.title('Confusion Matrix')
    
    plt.savefig(out_folder + 'confusion_matrix.png')
    
    plt.show()

In [None]:
# custom classification head placed on top of InceptionV3
def inception_top():
    
    units_1 = 92
    units_2 = 1028
    ince_top = tf.keras.Sequential()
    ince_top.add(tf.keras.layers.Flatten())
    ince_top.add(tf.keras.layers.Dense(units_1, activation = 'relu'))
    ince_top.add(tf.keras.layers.Dropout(0.2))
    ince_top.add(tf.keras.layers.Dense(units_2, activation = 'relu'))
    ince_top.add(tf.keras.layers.Dropout(0.2))
    ince_top.add(tf.keras.layers.Dense(NUM_CLASSES, activation = 'sigmoid'))
    
    #ince_top.add(tf.keras.layers.Dense(1024, activation = 'relu'))
    #ince_top.add(tf.keras.layers.Dropout(0.2))
    #ince_top.add(tf.keras.layers.Dense(512, activation = 'relu'))
    #ince_top.add(tf.keras.layers.Dropout(0.2))
    #ince_top.add(tf.keras.layers.Dense(128, activation = 'relu', 
    #                                  kernel_regularizer = tf.keras.regularizers.L2(0.01)))
    #ince_top.add(tf.keras.layers.Dropout(0.2))
    #ince_top.add(tf.keras.layers.Dense(2, activation = 'softmax'))
    
    return ince_top

# InceptionV3 model customized
def inceptionV3():
    
    input_tensor = tf.keras.layers.Input(shape = (HEIGHT, WIDTH, CHANNEL))
    
    base_model = tf.keras.applications.inception_v3.InceptionV3(weights = 'imagenet', 
                                                                include_top = False, 
                                                                input_tensor = input_tensor)
    
    x = base_model.output
    #  output = BatchNormalization()(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    
    x = inception_top()(x) 
    
    model = tf.keras.Model(inputs = base_model.input, outputs = x) # new model to train
    
    # first: train only the top layers (which were randomly initialized)
    # i.e. freeze all convolutional InceptionV3 layers
    for layer in base_model.layers:
        layer.trainable = False
        
    #print(model.summary())
    
    loss_fn = tf.keras.losses.BinaryCrossentropy()
    optimizer = tf.keras.optimizers.Adam()
    
    model.compile(optimizer = optimizer, loss = loss_fn, metrics = ['accuracy'])
    
    return model

### 5.1 - Training

The modified InceptionV3 classifier was trained for $50$ epochs and evaluated using K-Fold cross-validation with $10$ folds. For each fold, the accuracy (ACC), mean squared error (MSE), mean absolute error (MAE), and area under the ROC curve (AUC) were computed from the predictions, together with the confusion matrix.
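
The splitting scheme behind K-Fold cross-validation can be sketched with NumPy alone. This is a hand-written illustration under stated assumptions: the helper name `kfold_indices` and the sample count of $25$ are hypothetical, and the notebook itself relies on scikit-learn's `KFold`.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, seed=42):
    # Shuffle the sample indices once, then cut them into n_folds nearly equal parts.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        # Fold k is the test set; all remaining folds together form the training set.
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx

splits = list(kfold_indices(n_samples=25, n_folds=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Every sample appears in exactly one test fold, and within a fold the training and test indices never overlap. Both properties are needed for the per-fold metrics to be an honest estimate of performance on unseen images.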

The **confusion matrix** relates predictions to the true elements, dividing them into four parts: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

Given $n_{samples}$ samples, where $\hat{y}_i$ denotes the $i$-th predicted value, $y_i$ the corresponding true value, and $1(\cdot)$ the indicator function, the **accuracy** (ACC) is defined as the fraction of correct predictions:

$$\text{ACC}(\hat{y}, y)=\frac{1}{n_{samples}}\sum_{i=0}^{n_{samples}-1}1(\hat{y}_i=y_i) = \frac{TP+TN}{TP+FP+TN+FN}$$

In the case of a multi-label classification problem, a sample counts as correct only if its entire set of predicted labels matches the set of true labels: it then contributes $1.0$ to the sum, and $0.0$ otherwise.

The **Mean Absolute Error** (MAE or L1 loss) is the average absolute difference between the predicted values and the actual values. Because of the absolute value, it is never negative. It is defined as

$$\text{MAE}(\hat{y}, y)=\frac{1}{n_{samples}}\sum_{i=0}^{n_{samples}-1}\left | y_i-\hat{y}_i \right |$$


The **Mean Squared Error** (MSE or L2 loss) is the average of the squared differences between the actual values $y_i$ and the predicted values $\hat{y}_i$:

$$\text{MSE}(\hat{y}, y)=\frac{1}{n_{samples}}\sum_{i=0}^{n_{samples}-1}( y_i-\hat{y}_i)^2$$
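
The three definitions above can be checked on a small example. The label vectors below are hypothetical, and class $1$ is treated as the positive class purely for this illustration.

```python
import numpy as np

# Hypothetical true and predicted labels for eight samples (1 = positive, 0 = negative).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])

# Entries of the confusion matrix.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))

# Metrics as defined by the formulas above.
acc = (tp + tn) / (tp + tn + fp + fn)
mae = float(np.mean(np.abs(y_true - y_pred)))
mse = float(np.mean((y_true - y_pred) ** 2))

print(tp, tn, fp, fn)
print(acc, mae, mse)
```

With hard $0/1$ predictions every error has absolute value $1$, so MAE and MSE coincide and both equal $1-\text{ACC}$. On class labels they therefore carry the same information as the accuracy.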


In [None]:
def kf_validation(images_path_list):
    
    kfold = KFold(n_splits = NUM_FOLDS, shuffle = True, random_state = 42)
    paths = np.array(images_path_list)
    
    acc_list, mse_list, mae_list, auc_list = [], [], [], []
    fpr_list, tpr_list = [], []
    all_true, all_pred = [], []
    
    for fold, (train_index, test_index) in enumerate(kfold.split(paths)):
        
        print('============================================||')
        print(f'--------------------------------- FOLD {fold + 1} ---||')
        print('============================================||')
        
        # build each fold from its own indices, so that train and test images never overlap
        train_ds = get_dataset(paths[train_index].tolist(), split_dataset = False).batch(BATCH_SIZE)
        test_ds = get_dataset(paths[test_index].tolist(), split_dataset = False).batch(BATCH_SIZE)
        
        # start every fold from a fresh model, so that no model has already seen its test images
        tf.keras.backend.clear_session()
        model = inceptionV3()
        model.fit(train_ds, epochs = NUM_EPOCHS, validation_data = test_ds, verbose = 1)
        
        # collect labels and predictions in the same pass: the dataset is shuffled,
        # so two separate passes would not return the samples in the same order
        fold_true, fold_pred = [], []
        for images, labels in test_ds:
            fold_true.append(tf.argmax(labels, axis = 1).numpy())
            fold_pred.append(tf.argmax(model(images, training = False), axis = 1).numpy())
        y_true = np.concatenate(fold_true)
        y_pred = np.concatenate(fold_pred)
        
        # store the metrics of this fold
        acc, mse, mae, auc_val, fpr, tpr = calc_metrics(y_true, y_pred)
        acc_list.append(acc)
        mse_list.append(mse)
        mae_list.append(mae)
        auc_list.append(auc_val)
        fpr_list.append(fpr)
        tpr_list.append(tpr)
        all_true.append(y_true)
        all_pred.append(y_pred)
    
    # pool the predictions of all folds for the final report and confusion matrix
    y_true = np.concatenate(all_true)
    y_pred = np.concatenate(all_pred)
    
    print('----------------AVERAGES METRICS AFTER ', NUM_FOLDS,' FOLDS---------------------')
    print(f'average ACC: {np.mean(acc_list):.3f}')
    print(f'average MSE: {np.mean(mse_list):.3f}')
    print(f'average MAE: {np.mean(mae_list):.3f}')
    print(f'average AUC: {np.mean(auc_list):.3f}')
    print('--------------------------------------------------------------------------------')
    print(skl.metrics.classification_report(y_true, y_pred))
    print('--------------------------------------------------------------------------------')
    conf_matrix(y_true, y_pred)
    print('--------------------------------------------------------------------------------')
    plot_roc_curve(fpr_list, tpr_list, np.mean(auc_list))

    return y_true, y_pred

In [None]:
images_list = filenames(ds_folder)
#images_ds = get_dataset(ds_folder, split_dataset = False)
#images_np = np.array(list(images_ds))
true_label = []
pred_label = []
true_label, pred_label = kf_validation(images_list)

#conf_matrix(true_label, pred_label)

## 6 - Results

The generative model was tested in advance to ensure its correct implementation. To this end, the horse/zebra conversion was tested by training the model with a dataset of $2134$ horse and zebra images ($1067$ horse images and $1067$ zebra images). The next figure shows an example of the result obtained.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/h2z_test.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T171253Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=9d2a67e11a1d36731b37fce851859619a00f164e70a5c34c4899195ea319da1643ac67ad5d976b921f37983709e84f083a1e31e0a466d0b6b65c4d3f2995ca6260178961bcb482903978d3c28478ded3f762ee70b05c726b118968f624e492f9551e2160b3b843539fd411ec9b55dfbf36dea307f6bd0fa25fe0c91109cfd674bea961e14274e8d5cefb3d9656c9fbd7b9f0ca37d104e5581fb5e0abeebc805eba7874bec0105d6769796b2c514469a5ed4f4b455f17e7dd87d1a92e5622ea8de1923d64ffdf6f342088299601b577e9abc2744905397f210edecf9205fae9cb5b98cde41c25aa6c679f49ea154452bc03cb9aea45f14dd440fdc9063987a811" alt="Figure 3 - Descrizione dell'immagine" width="400" height="400" align="center"/>

Although the model appears to be correctly implemented, the transformation of the images is not very accurate. The model does not always respect the edges of the figures, and it applies the learned pattern unevenly across the image.

The implemented model was subsequently trained with a dataset consisting of $2600$ MRI images, including both real images and synthetic ones obtained earlier through the CycleGAN model (specifically, $650$ images from each category). The figure below shows the results of the transformations once the model has been trained. At the top is the translation of an MRI of a healthy subject into an MRI of a subject at an advanced stage of the disease. At the bottom is an example of the reverse translation, from an MRI of a diseased subject into an MRI of a hypothetically healthy subject.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/real_fake_results.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T171319Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=4a84778d212c5dcd4e78bf119ad94c9f5ad85a508c01ab8adf46fc77b274b7ae475b4745b537657ab4da92082cce9a679b79f54d34cd542b91bc14b8c78283d861a3d666ba15e7d0bb09a1d38203281bc93fd2d3d74563b3bafb4916f17ab1c2bf7826ddbc97989d2845e845cf5a96680bc84487351fe67a43fdef5722d4c45a30dd96c1817d218f9aaef37c6b8d94c0b423237089ff271ba1246cd3b0ce200f77ad8f053a625f664fa812889c029cb492321b81fd46c066e339a23f9738da439ea9d6ae040d23ced06cc6c7bf9dad89874c11cbeaaf1f2c4bdd3e0395c48b7b2e2bce69beb84acee1f27e5340d1eb051c334c148b985eaaba6cedabe4c51cab" alt="Figure 3 - Descrizione dell'immagine" width="400" height="400" align="center"/>

It is noticeable that the transformation affects very localized areas of the images and primarily involves a change in pixel intensity, without visibly distorting the image. An increase in background noise can also be observed. The following figure shows the various steps of a transformation during the model training. The generated image at each epoch is characterized by the aforementioned changes, and no significant modifications to the original image are evident in the intermediate steps.

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/stepbystep.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T171746Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=3406d78e24f2870c610b99a8936251a3f6d27e3cd619973005dd158a063c32bed5b2920bd06a39877c503ef421cf71f0a958817a87968137a27c46c03ea9e5ed25acc12643673790fb690cab6e94a7049400ee5127d26334659e92729b0bdfb26f1e4c0df885b4d93af50e6f9e80da723596876ac753bc532f42893bc0f72d7af0b63f214d45e57ea2b899782e107efaa8f6b8f3a88c662a5210bd8ef948a2641a35bccb00a417335e942ac980df311645e343d57b34d97ebf7e85e1bfef625cb87ac31e1e9e4e575cc0ebcbd511cdbc28aa3b08627416aeed9a4e4c8b30ce024d19bf61f6b0d35d579c057d35b77ad1e4bb12cd607c41537553160a879c30c1" alt="Figure 3 - Descrizione dell'immagine" width="400" height="400" align="center"/>

As for image classification, the entire ADNI1 dataset was expanded by adding the images generated by the CycleGAN model. The results show that the implemented classifier is weak at categorizing the images. In particular, the confusion matrix (figure below) highlights that the modified InceptionV3 model has problems classifying MRIs of healthy subjects. This is reflected in a low accuracy (ACC) of $0.50$ and in an AUC of $0.50$, which corresponds to chance-level discrimination. The high values of MSE and MAE likewise indicate that the model had difficulty finding a pattern on which to base the classification.

Mean metrics:

|ACC|MAE|MSE|AUC|
|---|---|---|---|
|0.50|0.50|0.50|0.50|

<img src="https://storage.googleapis.com/kagglesdsdata/datasets/3531030/6155730/confusion_matrix_ADNI_20230708.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240119%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240119T171828Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=4b557ab4bc7bb7cadedc45e2dc5fcb32397ef8b4a2dcd44a26b6133697e1c1c9c996e5a878a1f25345d90c827fd5a78f2af7b16bfd265b0eee752cd7eef59cbcdaa66718556a32c217c2501792a2c6617f8c0c2b6bab808f5da7164deab36e7b8b4102c2a0a58ad534e27f9bdc4545d52b08832fdc50d1a9dc14a709a28040aff696825370120d4313b14efb511e9faeb5f553868df7bbe5a449071f6022148f277406fd54091e25608edbba502dd49ce4288d6d7fa4291645b6900e5e7451853e0d05a87b6d0aabf1f3be60528776944f784c14bdb89191a36a3cdf0ee6e4f4b703f21903508d6e3a43548c223ff61d358e260b5aba06b766203813b474b113" alt="Figure 3 - Descrizione dell'immagine" width="400" height="400" align="center"/>


## 7 - Conclusions

The generative CycleGAN model was successfully implemented; however, some significant issues were encountered. The generative model proved to be too weak in recognizing and applying patterns to generate new images, particularly when using MRI brain scan images that depict the complex structure of the brain, which is considerably different from representations found in typical non-medical images. As a result, the generated images did not differ significantly from the original images, posing challenges for classification by the customized InceptionV3 classifier.
Several measures could improve the performance of the generative model. One is to use a larger number of training images: with more data, the model would have more examples from which to learn to generate realistic images. Another is to increase the complexity of the model, for example with deeper neural networks: more layers, or more neurons per layer, would improve its ability to learn the relationships among different image features. This, however, would increase the load on the hardware and lengthen the run time, especially during training. A further improvement could come from adding a properly trained detection model able to recognize and localize the brain structures typically associated with the disease. This additional model could guide the CycleGAN model to focus only on the areas of the images that are of real interest.

References
[1] J. Islam, Y. Zhang - "GAN‐based synthetic brain PET image generation." - Brain Informatics 7, 3 (2020) - DOI: https://doi.org/10.1186/s40708-020-00104-2.

[2] G. Bargshady, X. Zhou, P. Datta Barua, R. Gururajan, Y. Li, U. Rajendra Acharya - "Application of CycleGAN and transfer learning techniques for automated detection of COVID-19 using X-ray images" - Pattern Recognition Letters, Vol. 153, 2022, Pages 67-74 - ISSN 0167-8655 - DOI: https://doi.org/10.1016/j.patrec.2021.11.020.

[3] The Alzheimer's Disease Neuroimaging Initiative (ADNI) - Web: https://adni.loni.usc.edu/.

[4] Neuroimaging Informatics Technology Initiative (NIfTI) - Web: https://nifti.nimh.nih.gov/.

[5] J. Y. Zhu, T. Park, P. Isola, A. A. Efros - "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" - arXiv:1703.10593, 2020 - DOI: https://doi.org/10.48550/arXiv.1703.10593.

[6] O. Ronneberger, P. Fischer, T. Brox - "U-Net: Convolutional Networks for Biomedical Image Segmentation" - arXiv:1505.04597, 2015 - DOI: https://doi.org/10.48550/arXiv.1505.04597.

[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich - "Going Deeper with Convolutions" - arXiv:1409.4842, 2014 - DOI: https://doi.org/10.48550/arXiv.1409.4842.