In [None]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from torchvision import transforms, datasets
from torch.utils.data import Dataset, DataLoader
from PIL import Image

The provided code snippet imports several essential libraries and modules that are commonly used in data science, machine learning, and deep learning projects.

1. **os**: This module provides a way of using operating system-dependent functionality like reading or writing to the file system. It is useful for tasks such as navigating the file system, handling file paths, and manipulating directories.

2. **pandas as pd**: Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like DataFrames, which are particularly useful for handling and analyzing structured data. The alias `pd` is a common convention to make the code more concise.

3. **train_test_split from sklearn.model_selection**: This function is part of the scikit-learn library, which is widely used for machine learning tasks. The `train_test_split` function is used to split a dataset into training and testing sets, which is a crucial step in building and evaluating machine learning models.

4. **transforms and datasets from torchvision**: Torchvision is a library that provides tools for computer vision tasks. The `transforms` module includes common image transformations that are often used in preprocessing steps, such as resizing, cropping, and normalizing images. The `datasets` module provides access to popular datasets and utilities to load them.

5. **Dataset and DataLoader from torch.utils.data**: These classes are part of PyTorch, a deep learning framework. The `Dataset` class is an abstract class representing a dataset, and the `DataLoader` class provides an iterable over a dataset, with support for batching, shuffling, and parallel data loading. These are essential for efficiently handling large datasets during training and evaluation of deep learning models.

6. **Image from PIL**: The Python Imaging Library (PIL) is a library that adds image processing capabilities to Python. The `Image` module is used for opening, manipulating, and saving many different image file formats. It is often used in conjunction with torchvision for image preprocessing tasks.

Together, these imports set up a robust environment for handling data, preprocessing images, and building machine learning and deep learning models.

In [None]:
class HAM10000Dataset(Dataset):
    def __init__(self, csv_file, img_dirs, transform=None):
        self.data = pd.read_csv(csv_file)
        self.img_dirs = img_dirs  # List of directories
        self.transform = transform
        # Build the label -> integer mapping once. Sorting makes the mapping
        # deterministic instead of depending on the row order of this CSV.
        self.label_map = {label: i for i, label in enumerate(sorted(self.data['dx'].unique()))}

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Look up the image name
        img_name = self.data.iloc[idx]['image_id'] + '.jpg'

        # Search for the image in the directories
        for img_dir in self.img_dirs:
            img_path = os.path.join(img_dir, img_name)
            if os.path.exists(img_path):
                image = Image.open(img_path).convert('RGB')
                break
        else:
            raise FileNotFoundError(f"Image {img_name} not found in specified directories.")

        # Get the label (diagnosis column) and map it to its integer index
        label = self.label_map[self.data.iloc[idx]['dx']]

        if self.transform:
            image = self.transform(image)

        return image, label


The provided code defines a custom dataset class named `HAM10000Dataset` that inherits from PyTorch's `Dataset` class. This custom dataset is designed to handle the HAM10000 dataset, which is a collection of dermatoscopic images used for skin lesion analysis.

1. **Initialization (`__init__` method)**: The constructor takes three parameters: `csv_file`, `img_dirs`, and an optional `transform`. The `csv_file` is expected to be a CSV file containing metadata about the images, such as their filenames and labels. The `img_dirs` is a list of directories where the images are stored. The `transform` parameter allows for optional image transformations (e.g., resizing, normalization) to be applied to the images. The constructor reads the CSV file into a pandas DataFrame and stores the image directories and transform.

2. **Length (`__len__` method)**: This method returns the number of samples in the dataset by returning the length of the DataFrame. Map-style PyTorch datasets are expected to implement it so that `len(dataset)` and the default samplers used by `DataLoader` work correctly.

3. **Get Item (`__getitem__` method)**: This method retrieves a single sample from the dataset. It takes an index `idx` as input and performs the following steps:
   - Looks up the image name in the DataFrame using the provided index and appends the `.jpg` extension.
   - Searches for the image file in the specified directories. If the image is found, it is opened and converted to RGB format. If the image is not found in any directory, a `FileNotFoundError` is raised.
   - Retrieves the label for the image from the DataFrame. The label is mapped to an integer using a dictionary that maps unique labels to indices.
   - If a transform is provided, it is applied to the image.
   - Returns a tuple containing the image and its corresponding label.

This custom dataset class is essential for loading and preprocessing the HAM10000 dataset, making it ready for training and evaluating machine learning models. It handles the complexities of locating images across multiple directories, reading image files, and applying necessary transformations.
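The per-sample directory search and label lookup described above can also be done once, up front. The following is a minimal sketch of that idea using only the standard library; `build_image_index` and `build_label_map` are hypothetical helper names, and the file names and diagnosis codes in the demonstration are illustrative placeholders rather than values taken from the notebook.

```python
import os
import tempfile


def build_image_index(img_dirs, extension=".jpg"):
    """Scan each directory once and map image_id -> full file path."""
    index = {}
    for img_dir in img_dirs:
        for file_name in sorted(os.listdir(img_dir)):
            stem, ext = os.path.splitext(file_name)
            if ext.lower() == extension:
                # setdefault keeps the first hit, matching the search order of img_dirs
                index.setdefault(stem, os.path.join(img_dir, file_name))
    return index


def build_label_map(labels):
    """Assign a stable integer to each distinct label (sorted for determinism)."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


# Two temporary directories stand in for the two image folders
with tempfile.TemporaryDirectory() as part_1, tempfile.TemporaryDirectory() as part_2:
    for folder, name in [(part_1, "ISIC_0000001.jpg"), (part_2, "ISIC_0000002.jpg")]:
        open(os.path.join(folder, name), "wb").close()  # create an empty placeholder file
    image_index = build_image_index([part_1, part_2])
    found_ids = sorted(image_index)

label_map = build_label_map(["nv", "mel", "nv", "bkl"])
print(found_ids)
print(label_map)
```

With an index like this, `__getitem__` could look up a path with a single dictionary access, and a missing image would surface as a `KeyError` at lookup time instead of after probing every directory.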

In [None]:
metadata_path = "../input/skin-cancer-mnist-ham10000/HAM10000_metadata.csv"
metadata = pd.read_csv(metadata_path)

# Check the number of unique images in metadata
print(f"Total images in metadata: {len(metadata)}")

The provided code snippet is responsible for loading and inspecting metadata related to the HAM10000 dataset, which is used for skin lesion analysis.

1. **Setting the Metadata Path**: The variable `metadata_path` is assigned the file path to the CSV file containing the metadata for the HAM10000 dataset. This path points to a file named `HAM10000_metadata.csv` located in the directory `../input/skin-cancer-mnist-ham10000/`.

2. **Loading the Metadata**: The `pd.read_csv(metadata_path)` function call reads the CSV file into a pandas DataFrame named `metadata`. This DataFrame will contain various details about the images, such as their filenames, labels, and possibly other relevant information.

3. **Checking the Number of Images**: The `print` statement outputs the total number of images listed in the metadata. The `len(metadata)` call returns the number of rows in the DataFrame. Assuming the file has one row per image, this equals the number of images; `len` itself does not check for duplicate entries.

This code is crucial for verifying that the metadata has been loaded correctly and for understanding the scope of the dataset by checking the total number of images available for analysis.
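Because `len(metadata)` only counts rows, it can be useful to check uniqueness and the class balance explicitly. Below is a small sketch on a made-up DataFrame; the column names `image_id` and `dx` come from the dataset class above, while the rows themselves are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the metadata file: one row per image
metadata_demo = pd.DataFrame({
    "image_id": ["img_1", "img_2", "img_3", "img_4"],
    "dx": ["nv", "nv", "mel", "bkl"],
})

n_rows = len(metadata_demo)                                   # number of rows
n_unique_images = metadata_demo["image_id"].nunique()         # distinct image ids
class_counts = metadata_demo["dx"].value_counts().to_dict()   # images per diagnosis

print(f"Rows: {n_rows}, unique image ids: {n_unique_images}")
print(f"Images per diagnosis: {class_counts}")
```

If `n_rows` and `n_unique_images` differ on the real file, some images are listed more than once and the count printed by the notebook would overstate the dataset size.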

In [None]:
# Split metadata into train and test sets
train_metadata, test_metadata = train_test_split(metadata, test_size=0.99, random_state=42)

# Save split metadata for easier loading
train_metadata.to_csv("train_metadata.csv", index=False)
test_metadata.to_csv("test_metadata.csv", index=False)

The provided code snippet is responsible for splitting the metadata of the HAM10000 dataset into training and testing sets and then saving these splits to CSV files for easier future access.

1. **Splitting the Metadata**: The `train_test_split` function from scikit-learn is used to split the `metadata` DataFrame into two separate DataFrames: `train_metadata` and `test_metadata`. The `test_size=0.99` parameter specifies that 99% of the data should be allocated to the test set, leaving only 1% for the training set. This is the reverse of a conventional split (for example 80% training and 20% testing), so the training set here is very small. The `random_state=42` parameter ensures that the split is reproducible by setting a seed for the random number generator.

2. **Saving the Split Metadata**: The `to_csv` method is called on both `train_metadata` and `test_metadata` DataFrames to save them as CSV files named `train_metadata.csv` and `test_metadata.csv`, respectively. The `index=False` parameter ensures that the row indices are not included in the saved CSV files.

This code is essential for preparing the dataset for machine learning tasks. By splitting the metadata into training and testing sets, it allows for proper evaluation of the model's performance. Saving these splits to CSV files makes it convenient to load the pre-split data in future sessions, avoiding the need to perform the split operation repeatedly.
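For comparison, here is a sketch of a more conventional split on a made-up DataFrame. The 80/20 ratio and the `stratify` argument are assumptions chosen for illustration, not settings used by the notebook; stratifying on the label column keeps the class proportions the same in both parts.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented metadata: 20 images, two equally frequent diagnoses
demo = pd.DataFrame({
    "image_id": [f"img_{i}" for i in range(20)],
    "dx": ["nv"] * 10 + ["mel"] * 10,
})

# Hold out 20% for testing and preserve the class balance of the 'dx' column
train_demo, test_demo = train_test_split(
    demo, test_size=0.2, stratify=demo["dx"], random_state=42
)

test_counts = test_demo["dx"].value_counts().to_dict()
print(len(train_demo), len(test_demo), test_counts)
```

Stratification matters most when some classes are rare, since a purely random split can leave a small partition without any examples of them.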

In [None]:
# Directories containing images
image_dirs = [
    "../input/skin-cancer-mnist-ham10000/HAM10000_images_part_1",
    "../input/skin-cancer-mnist-ham10000/HAM10000_images_part_2"
]

The provided code snippet defines a list of directories that contain the images for the HAM10000 dataset, which is used for skin lesion analysis.

1. **Defining Image Directories**: The variable `image_dirs` is assigned a list of two directory paths. These directories are specified as strings and point to the locations where the image files are stored. The paths are:
   - `../input/skin-cancer-mnist-ham10000/HAM10000_images_part_1`
   - `../input/skin-cancer-mnist-ham10000/HAM10000_images_part_2`

2. **Purpose of Image Directories**: These directories are likely part of the dataset's structure, where the images have been split into multiple parts for organizational purposes. By listing these directories, the code can later iterate through them to locate and load the images as needed.

This setup is crucial for managing and accessing the image files efficiently. By specifying the directories in a list, the code can easily handle the images regardless of their distribution across multiple folders. This approach simplifies the process of loading and preprocessing the images for further analysis or model training.

In [None]:
transform = transforms.Compose([
    transforms.Resize((64, 64)),  # Resize images to 64x64
    transforms.ToTensor(),         # Convert to PyTorch tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # Normalize
])

The provided code snippet defines a sequence of image transformations using `transforms.Compose` from the `torchvision` library. These transformations are applied to the images in the HAM10000 dataset to prepare them for input into a machine learning model.

1. **Resizing Images**: The `transforms.Resize((64, 64))` transformation resizes the images to a fixed size of 64x64 pixels. This step ensures that all images have the same dimensions, which is necessary for batch processing in neural networks.

2. **Converting to Tensor**: The `transforms.ToTensor()` transformation converts the images from PIL format (or numpy arrays) to PyTorch tensors. This conversion is essential because PyTorch models require input data in tensor format. Additionally, this transformation scales the pixel values from the range [0, 255] to [0, 1].

3. **Normalizing**: The `transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])` transformation subtracts 0.5 from each of the three color channels (red, green, and blue) and divides the result by 0.5. Because `ToTensor()` has already scaled the pixels to [0, 1], the output lies in [-1, 1]. The values 0.5 are fixed constants, not statistics computed from the dataset, so the result is not guaranteed to have zero mean or unit variance. Centering the inputs around zero tends to stabilize training, and the [-1, 1] range matches the `Tanh` output of the generator defined later.

By composing these transformations, the code ensures that the images are uniformly resized, converted to a suitable format for PyTorch, and normalized. These preprocessing steps are crucial for preparing the dataset for training and evaluating machine learning models effectively.
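The arithmetic behind the normalization step can be checked on a few values. This sketch reproduces the per-channel formula `(x - mean) / std` by hand with plain tensor operations rather than calling `transforms.Normalize`; the sample pixel values are arbitrary.

```python
import torch

mean, std = 0.5, 0.5
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])  # values as produced by ToTensor(), in [0, 1]

normalized = (pixels - mean) / std   # what Normalize computes for each channel
restored = normalized * std + mean   # the inverse, used when displaying images

print(normalized)  # endpoints 0 and 1 map to -1 and 1
print(restored)
```

The inverse step is the same `* 0.5 + 0.5` expression that appears in the visualization cell further down.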

In [None]:
# Datasets
train_dataset = HAM10000Dataset(csv_file="train_metadata.csv", img_dirs=image_dirs, transform=transform)
test_dataset = HAM10000Dataset(csv_file="test_metadata.csv", img_dirs=image_dirs, transform=transform)

# DataLoaders
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=4, shuffle=False, num_workers=2)

# Print dataset sizes
print(f"Number of training samples: {len(train_dataset)}")
print(f"Number of testing samples: {len(test_dataset)}")

The provided code snippet sets up the datasets and data loaders for training and testing a machine learning model using the HAM10000 dataset. It also prints the sizes of the training and testing datasets.

1. **Creating Datasets**:
   - `train_dataset` and `test_dataset` are instances of the custom `HAM10000Dataset` class.
   - The `train_dataset` is initialized with the metadata file `train_metadata.csv`, the list of image directories `image_dirs`, and the transformation pipeline `transform`.
   - Similarly, the `test_dataset` is initialized with the metadata file `test_metadata.csv`, the same image directories, and the same transformation pipeline.
   - These datasets will handle loading and preprocessing the images and their corresponding labels.

2. **Creating DataLoaders**:
   - `train_loader` and `test_loader` are instances of PyTorch's `DataLoader` class.
   - `train_loader` is created with the `train_dataset`, a batch size of 4, shuffling enabled (`shuffle=True`), and 2 worker processes (`num_workers=2`) for parallel data loading. Shuffling ensures that the training data is presented in a different order each epoch, which helps in training the model more effectively.
   - `test_loader` is created with the `test_dataset`, the same batch size of 4, shuffling disabled (`shuffle=False`), and 2 worker processes. Shuffling is typically disabled for the test set to ensure consistent evaluation.

3. **Printing Dataset Sizes**:
   - The `print` statements output the number of samples in the training and testing datasets by calling `len()` on `train_dataset` and `test_dataset`.
   - This provides a quick check to ensure that the datasets have been loaded correctly and to understand the amount of data available for training and testing.

Overall, this code is essential for preparing the data pipeline, ensuring that the images and labels are correctly loaded, preprocessed, and batched for training and evaluation of the machine learning model.
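To see how batching behaves without needing the image files, the sketch below feeds a `DataLoader` from an in-memory `TensorDataset`. The ten all-zero "images" are placeholders, and `num_workers=0` is used here only to keep the example self-contained.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten placeholder RGB images of size 64x64 with integer labels 0..9
images = torch.zeros(10, 3, 64, 64)
labels = torch.arange(10)
demo_dataset = TensorDataset(images, labels)

# Same batch size as the notebook; data is loaded in the main process
demo_loader = DataLoader(demo_dataset, batch_size=4, shuffle=False, num_workers=0)

batch_shapes = [tuple(batch_images.shape) for batch_images, _ in demo_loader]
print(batch_shapes)  # the last batch is smaller because 10 is not a multiple of 4
```

The smaller final batch is worth keeping in mind for code that assumes a fixed batch size; `DataLoader` accepts `drop_last=True` to discard it.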

In [None]:
# Visualize one batch of images
images, labels = next(iter(train_loader))
print(f"Image batch shape: {images.shape}")
print(f"Label batch shape: {labels.shape}")

# Display first 4 images
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 4, figsize=(12, 4))
for i in range(4):
    axes[i].imshow(images[i].permute(1, 2, 0).numpy() * 0.5 + 0.5)  # Denormalize
    axes[i].set_title(f"Label: {labels[i].item()}")
    axes[i].axis("off")
plt.show()

The provided code snippet is responsible for visualizing a batch of images from the training dataset. This helps in verifying that the images are being loaded and preprocessed correctly.

1. **Loading a Batch of Images**:
   - The line `images, labels = next(iter(train_loader))` retrieves the next batch of images and their corresponding labels from the `train_loader`. The `iter(train_loader)` creates an iterator over the `train_loader`, and `next()` fetches the next batch.
   - The `print` statements output the shapes of the image and label batches. This provides information about the dimensions of the data, ensuring that the batch size and image dimensions are as expected.

2. **Importing Matplotlib**:
   - The `import matplotlib.pyplot as plt` statement imports the `matplotlib.pyplot` module, which is used for creating visualizations.

3. **Creating a Figure for Display**:
   - The line `fig, axes = plt.subplots(1, 4, figsize=(12, 4))` creates a figure with 4 subplots arranged in a single row. The `figsize` parameter sets the size of the figure to 12 inches by 4 inches.

4. **Displaying the First 4 Images**:
   - A `for` loop iterates over the first 4 images in the batch.
   - Inside the loop, `axes[i].imshow(images[i].permute(1, 2, 0).numpy() * 0.5 + 0.5)` displays each image. The `permute(1, 2, 0)` method rearranges the dimensions of the image tensor from (C, H, W) to (H, W, C), which is required for displaying the image using `imshow`. The `numpy()` method converts the tensor to a NumPy array, and the multiplication and addition (`* 0.5 + 0.5`) denormalize the pixel values back to the range [0, 1].
   - `axes[i].set_title(f"Label: {labels[i].item()}")` sets the title of each subplot to the corresponding label.
   - `axes[i].axis("off")` removes the axis ticks and labels for a cleaner display.

5. **Showing the Figure**:
   - The `plt.show()` statement renders the figure and displays the images.

This visualization step is crucial for ensuring that the data loading and preprocessing pipeline is functioning correctly. By displaying a few images from the training set, you can visually inspect the images and their labels to confirm that they are being processed as expected.
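The denormalize-and-permute step can be wrapped in a small helper and checked on a synthetic tensor. `to_display_array` is a hypothetical name, and the added `clamp` is an assumption that guards against values drifting slightly outside [0, 1]; the notebook itself does not clamp.

```python
import torch


def to_display_array(image, mean=0.5, std=0.5):
    """Undo Normalize(mean, std) and reorder (C, H, W) -> (H, W, C) for imshow."""
    restored = image * std + mean
    restored = restored.clamp(0.0, 1.0)  # keep values inside the displayable range
    return restored.permute(1, 2, 0).numpy()


# Synthetic normalized image: red channel at +1, green and blue at -1
fake_image = torch.full((3, 64, 64), -1.0)
fake_image[0] = 1.0

display = to_display_array(fake_image)
print(display.shape)   # height, width, channels
print(display[0, 0])   # a pure red pixel after denormalization
```

The returned array can be passed directly to `axes[i].imshow(...)`.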

In [None]:
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, in_dim):
        super(SelfAttention, self).__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        batch, channels, height, width = x.size()
        proj_query = self.query_conv(x).view(batch, -1, width * height).permute(0, 2, 1)
        proj_key = self.key_conv(x).view(batch, -1, width * height)
        attention = torch.bmm(proj_query, proj_key)
        attention = torch.softmax(attention, dim=-1)

        proj_value = self.value_conv(x).view(batch, -1, width * height)
        out = torch.bmm(proj_value, attention.permute(0, 2, 1))
        out = out.view(batch, channels, height, width)
        out = self.gamma * out + x
        return out

class ResidualBlock(nn.Module):
    def __init__(self, in_channels):
        super(ResidualBlock, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(in_channels)
        )

    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    def __init__(self, latent_dim, img_channels, img_size=64):
        super(Generator, self).__init__()
        
        self.latent_dim = latent_dim
        self.img_channels = img_channels
        self.img_size = img_size
        self.init_size = img_size // 8  # Downsample by 8 (adjusted for 64x64 output)
        self.fc = nn.Linear(latent_dim, 128 * self.init_size * self.init_size)

        self.upsample = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.ConvTranspose2d(128, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),
            SelfAttention(128),  # Self-Attention after first upscale
            
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            SelfAttention(32),  # Self-Attention in the middle layers
        )

        self.final_layer = nn.Sequential(
            nn.Conv2d(32, img_channels, kernel_size=3, stride=1, padding=1),
            nn.Tanh()  # Normalize output to [-1, 1]
        )

    def forward(self, z):
        out = self.fc(z)
        out = out.view(out.size(0), 128, self.init_size, self.init_size)
        out = self.upsample(out)
        img = self.final_layer(out)
        return img

# Instantiate the generator
latent_dim = 100  # Size of latent vector
img_channels = 3  # RGB images
img_size = 64  # Output image size

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator(latent_dim, img_channels, img_size).to(device)

# Test the generator
z = torch.randn(4, latent_dim).to(device)  # Random latent vector (batch size = 4)
generated_images = generator(z)

print(f"Generated image shape: {generated_images.shape}")  # Should be [4, 3, 64, 64]


The provided code snippet defines and tests a deep learning model for generating images using PyTorch. It includes the implementation of a self-attention mechanism, a residual block, and a generator network.

1. **Imports**:
   - The code imports the necessary modules from PyTorch, including `torch` and `torch.nn`.

2. **Self-Attention Class**:
   - The `SelfAttention` class inherits from `nn.Module` and implements a self-attention mechanism.
   - The `__init__` method initializes convolutional layers for query, key, and value projections, and a learnable parameter `gamma`.
   - The `forward` method computes the attention map and applies it to the input feature map, enhancing the model's ability to focus on relevant parts of the image.

3. **Residual Block Class**:
   - The `ResidualBlock` class inherits from `nn.Module` and implements a residual block.
   - The `__init__` method defines a sequence of convolutional, batch normalization, and ReLU activation layers.
   - The `forward` method adds the input to the output of the block, facilitating gradient flow and improving training stability.

4. **Generator Class**:
   - The `Generator` class inherits from `nn.Module` and defines the architecture of the generator network.
   - The `__init__` method initializes the network with a fully connected layer, several upsampling layers, residual blocks, and self-attention layers.
   - The `forward` method processes the input latent vector through the network to generate an image.

5. **Instantiating and Testing the Generator**:
   - The generator is instantiated with a latent dimension of 100, 3 image channels (for RGB images), and an output image size of 64x64 pixels.
   - The generator is moved to the appropriate device (GPU if available, otherwise CPU).
   - A random latent vector `z` is generated and passed through the generator to produce a batch of images.
   - The shape of the generated images is printed to verify the output dimensions.

This code demonstrates the implementation of a generative model with advanced components like self-attention and residual blocks, which are designed to improve the quality and stability of the generated images.
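To make the tensor shapes inside the self-attention layer concrete, the sketch below repeats the query/key part of its `forward` method on a small random input. The sizes (batch 2, 16 channels, 4x4 feature map) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, channels, height, width = 2, 16, 4, 4
x = torch.randn(batch, channels, height, width)

# 1x1 convolutions reduce the channel count by a factor of 8, as in SelfAttention
query_conv = nn.Conv2d(channels, channels // 8, kernel_size=1)
key_conv = nn.Conv2d(channels, channels // 8, kernel_size=1)

proj_query = query_conv(x).view(batch, -1, height * width).permute(0, 2, 1)  # (B, HW, C//8)
proj_key = key_conv(x).view(batch, -1, height * width)                       # (B, C//8, HW)
attention = torch.softmax(torch.bmm(proj_query, proj_key), dim=-1)           # (B, HW, HW)

row_sums = attention.sum(dim=-1)  # each row is a probability distribution over positions
print(attention.shape)
```

The attention map has one entry per pair of spatial positions, so its size grows with the square of `height * width`. For the generator's last attention layer, which runs on a 64x64 feature map, that is a 4096x4096 matrix per sample.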

In [None]:
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, in_dim):
        super(SelfAttention, self).__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        batch, channels, height, width = x.size()
        proj_query = self.query_conv(x).view(batch, -1, width * height).permute(0, 2, 1)
        proj_key = self.key_conv(x).view(batch, -1, width * height)
        attention = torch.bmm(proj_query, proj_key)
        attention = torch.softmax(attention, dim=-1)

        proj_value = self.value_conv(x).view(batch, -1, width * height)
        out = torch.bmm(proj_value, attention.permute(0, 2, 1))
        out = out.view(batch, channels, height, width)
        out = self.gamma * out + x
        return out

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample=True):
        super(ResidualBlock, self).__init__()
        self.downsample = downsample
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1) if downsample else nn.Identity()
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.AvgPool2d(2) if downsample else nn.Identity()

    def forward(self, x):
        shortcut = self.shortcut(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += shortcut
        out = self.relu(out)
        out = self.pool(out)
        return out

class Discriminator(nn.Module):
    def __init__(self, img_channels, img_size=64):
        super(Discriminator, self).__init__()

        self.model = nn.Sequential(
            ResidualBlock(img_channels, 64, downsample=True),            # 64x64 -> 32x32
            SelfAttention(64),
            ResidualBlock(64, 128, downsample=True),           # 32x32 -> 16x16
            SelfAttention(128),
            ResidualBlock(128, 256, downsample=True),          # 16x16 -> 8x8
            ResidualBlock(256, 512, downsample=True)           # 8x8 -> 4x4
        )

        self.final_layer = nn.Sequential(
            nn.Flatten(),
            # Raw score (logit). No Sigmoid here: the training loop uses
            # nn.BCEWithLogitsLoss, which applies the sigmoid internally.
            nn.Linear(512 * 4 * 4, 1)
        )

    def forward(self, img):
        out = self.model(img)
        out = self.final_layer(out)
        return out

# Instantiate the discriminator
img_channels = 3  # RGB images
img_size = 64  # Input image size

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
discriminator = Discriminator(img_channels, img_size).to(device)

# Test the discriminator
batch_size = 4
test_images = torch.randn(batch_size, img_channels, img_size, img_size).to(device)  # Fake images batch
output = discriminator(test_images)

print(f"Discriminator output shape: {output.shape}")  # Should be [4, 1]


The provided code snippet defines and tests a discriminator model for a Generative Adversarial Network (GAN) using PyTorch. The discriminator is designed to distinguish between real and fake images.

1. **Imports**:
   - The code imports the necessary modules from PyTorch, including `torch` and `torch.nn`.

2. **Self-Attention Class**:
   - The `SelfAttention` class inherits from `nn.Module` and implements a self-attention mechanism.
   - The `__init__` method initializes convolutional layers for query, key, and value projections, and a learnable parameter `gamma`.
   - The `forward` method computes the attention map and applies it to the input feature map, enhancing the model's ability to focus on relevant parts of the image.

3. **Residual Block Class**:
   - The `ResidualBlock` class inherits from `nn.Module` and implements a residual block.
   - The `__init__` method defines a sequence of convolutional, batch normalization, and ReLU activation layers, along with a shortcut connection and optional downsampling.
   - The `forward` method processes the input through the convolutional layers, adds the shortcut connection, and applies downsampling if specified.

4. **Discriminator Class**:
   - The `Discriminator` class inherits from `nn.Module` and defines the architecture of the discriminator network.
   - The `__init__` method initializes the network with a sequence of residual blocks and self-attention layers, followed by a final layer that flattens the output and applies a linear transformation to produce one raw score (logit) per image. No sigmoid is applied inside the model, because the training loop uses `nn.BCEWithLogitsLoss`, which applies the sigmoid itself; adding one here would squash the score twice.
   - The `forward` method processes the input image through the network to produce the final output.

5. **Instantiating and Testing the Discriminator**:
   - The discriminator is instantiated with 3 image channels (for RGB images) and an input image size of 64x64 pixels.
   - The discriminator is moved to the appropriate device (GPU if available, otherwise CPU).
   - A batch of random fake images is generated and passed through the discriminator to produce an output.
   - The shape of the discriminator's output is printed to verify the dimensions, which should be `[4, 1]` for a batch size of 4.

This code demonstrates the implementation of a discriminator model with advanced components like self-attention and residual blocks, which are designed to improve the model's ability to distinguish between real and fake images.
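The size comments in the discriminator (`64x64 -> 32x32` and so on) and the `512 * 4 * 4` input width of the final linear layer follow from simple arithmetic, sketched here in plain Python. `spatial_sizes` is a hypothetical helper written for this illustration.

```python
def spatial_sizes(input_size, num_halvings):
    """Track the feature-map side length through repeated 2x average pooling."""
    sizes = [input_size]
    for _ in range(num_halvings):
        sizes.append(sizes[-1] // 2)  # AvgPool2d(2) halves height and width
    return sizes


# Four downsampling residual blocks applied to a 64x64 input
sizes = spatial_sizes(64, 4)

# The last block outputs 512 channels, so Flatten yields 512 * side * side features
flattened_features = 512 * sizes[-1] * sizes[-1]

print(sizes)
print(flattened_features)
```

Note that the linear layer's input width is hard-coded for 64x64 inputs: the `img_size` argument is accepted by the constructor but not used, so a different input size would cause a shape mismatch at the final layer.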

In [None]:
import warnings
warnings.filterwarnings("ignore")

The provided code snippet is responsible for managing warning messages in Python.

1. **Importing the Warnings Module**:
   - The line `import warnings` imports the `warnings` module, which is a built-in Python module used to handle warning messages. Warnings are typically issued to alert the user about potential issues in the code that do not necessarily stop the execution but might lead to unexpected behavior.

2. **Filtering Warnings**:
   - The line `warnings.filterwarnings("ignore")` sets a filter to ignore all warning messages. This means that any warnings that would normally be printed to the console will be suppressed and not displayed.
   - This can be useful in scenarios where the user is aware of certain non-critical warnings and wants to avoid cluttering the output with these messages. However, it is important to use this with caution, as ignoring warnings might cause the user to miss important information about potential issues in the code.

By using this code, the user ensures that the output remains clean and free of warning messages, which can be particularly useful in a production environment or when running long scripts where warnings are expected and understood. However, it is generally a good practice to address the root causes of warnings rather than ignoring them.
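A narrower alternative is to silence only one category of warning, and only inside a limited scope. The sketch below uses the standard library's `warnings.catch_warnings` context manager; the function `noisy` and its messages are invented for the demonstration.

```python
import warnings


def noisy():
    """Hypothetical function that emits two different kinds of warning."""
    warnings.warn("old API", DeprecationWarning)
    warnings.warn("check your input", UserWarning)
    return 42


# Filters changed inside the block are restored when it exits
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from "report everything"
    warnings.filterwarnings("ignore", category=DeprecationWarning)  # then hide one category
    result = noisy()

caught_categories = [w.category for w in caught]
print(result, caught_categories)
```

Here the deprecation warning is suppressed while the user warning is still reported, which keeps the output tidy without hiding unrelated problems.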

In [None]:
from torch import optim
from tqdm import tqdm

#def train_gan(generator, discriminator, train_loader, latent_dim, device, epochs=1000, lr=0.0002, beta1=0.5, beta2=0.999):
def train_gan(generator, discriminator, train_loader, latent_dim, device, epochs=1, lr=0.0002, beta1=0.5, beta2=0.999):
    generator.to(device)
    discriminator.to(device)

    criterion = nn.BCEWithLogitsLoss()
    optimizer_G = optim.Adam(generator.parameters(), lr=lr, betas=(beta1, beta2))
    optimizer_D = optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, beta2))
    
    scaler = torch.cuda.amp.GradScaler()  # For mixed precision training

    for epoch in range(epochs):
        generator.train()
        discriminator.train()
        epoch_loss_G = 0.0
        epoch_loss_D = 0.0

        for real_images, _ in tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs}"):
            batch_size = real_images.size(0)
            real_images = real_images.to(device)

            valid = torch.ones((batch_size, 1), requires_grad=False).to(device)
            fake = torch.zeros((batch_size, 1), requires_grad=False).to(device)

            # Train Generator
            optimizer_G.zero_grad()
            z = torch.randn(batch_size, latent_dim).to(device)
            
            with torch.cuda.amp.autocast():  # Mixed precision training
                generated_images = generator(z)
                g_loss = criterion(discriminator(generated_images), valid)

            scaler.scale(g_loss).backward()
            scaler.step(optimizer_G)
            scaler.update()
            epoch_loss_G += g_loss.item()

            # Train Discriminator
            optimizer_D.zero_grad()
            with torch.cuda.amp.autocast():
                real_loss = criterion(discriminator(real_images), valid)
                fake_loss = criterion(discriminator(generated_images.detach()), fake)
                d_loss = (real_loss + fake_loss) / 2

            scaler.scale(d_loss).backward()
            scaler.step(optimizer_D)
            scaler.update()
            epoch_loss_D += d_loss.item()

            # Clear cache to reduce memory fragmentation
            torch.cuda.empty_cache()

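        # Report the mean per-batch loss so values are comparable across dataset and batch sizes
        num_batches = max(len(train_loader), 1)
        print(f"Epoch [{epoch+1}/{epochs}] | Generator Loss: {epoch_loss_G / num_batches:.4f} | Discriminator Loss: {epoch_loss_D / num_batches:.4f}")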

    print("Training completed.")


# Call the train_gan function with the train_loader, generator, and discriminator
train_gan(generator, discriminator, train_loader, latent_dim, device)


The provided code snippet defines a function `train_gan` that trains a Generative Adversarial Network (GAN) using PyTorch. The function trains both the generator and discriminator models over a specified number of epochs.

1. **Imports**:
   - The code imports the `optim` module from PyTorch for optimization algorithms and `tqdm` for displaying progress bars during training.

2. **Function Definition**:
   - The `train_gan` function takes several parameters: `generator`, `discriminator`, `train_loader`, `latent_dim`, `device`, `epochs`, `lr`, `beta1`, and `beta2`.
   - The default number of epochs is set to 1, but it can be adjusted as needed.

3. **Model Preparation**:
   - The generator and discriminator models are moved to the specified device (GPU or CPU) using `generator.to(device)` and `discriminator.to(device)`.
   - The loss function used is `nn.BCEWithLogitsLoss()`, which combines a sigmoid layer and binary cross-entropy loss.
   - Two Adam optimizers are created for the generator (`optimizer_G`) and discriminator (`optimizer_D`) with the specified learning rate (`lr`) and beta values (`beta1`, `beta2`).

4. **Mixed Precision Training**:
   - A gradient scaler (`scaler`) is initialized for mixed precision training, which can improve performance and reduce memory usage on compatible hardware. The scaler multiplies the loss by a scale factor before backpropagation so that small float16 gradients do not underflow to zero.

5. **Training Loop**:
   - The outer loop iterates over the number of epochs.
   - Within each epoch, the generator and discriminator models are set to training mode using `generator.train()` and `discriminator.train()`.
   - Two variables, `epoch_loss_G` and `epoch_loss_D`, are initialized to accumulate the generator and discriminator losses for the epoch.

6. **Batch Processing**:
   - The inner loop iterates over batches of real images from the `train_loader`, displaying progress with `tqdm`.
   - The batch size is determined from the real images, and the images are moved to the specified device.
   - Two tensors, `valid` (all ones) and `fake` (all zeros), are created to represent the labels for real and fake images, respectively.

7. **Training the Generator**:
   - The generator's gradients are zeroed using `optimizer_G.zero_grad()`.
   - A batch of random latent vectors (`z`) is generated and moved to the device.
   - Inside a `torch.cuda.amp.autocast()` context, the generator produces images and the generator loss (`g_loss`) is computed against the `valid` labels, so the generator is rewarded when the discriminator classifies its images as real.
   - The loss is scaled, backpropagated, and the optimizer is stepped using the gradient scaler.

8. **Training the Discriminator**:
   - The discriminator's gradients are zeroed using `optimizer_D.zero_grad()`.
   - Inside a second `autocast()` context, the discriminator loss (`d_loss`) is computed as the average of the loss on real images (labelled `valid`) and the loss on generated images (labelled `fake`). The generated images are passed through `detach()` so that this step does not backpropagate into the generator.
   - The loss is scaled, backpropagated, and the optimizer is stepped using the gradient scaler.

9. **Memory Management**:
   - The CUDA cache is cleared using `torch.cuda.empty_cache()`, which releases cached but unused GPU memory back to the driver. It does not free memory held by live tensors, and calling it after every batch adds overhead, so it is mainly useful when fragmentation is actually causing out-of-memory errors.

10. **Logging**:
    - The generator and discriminator losses for the epoch are printed.

11. **Function Call**:
    - The `train_gan` function is called with `generator`, `discriminator`, `train_loader`, `latent_dim`, and `device` to start the training process.

This function provides a comprehensive framework for training a GAN, including mixed precision training, progress tracking, and memory management.
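
As a side note on step 3, the following is a minimal plain-Python sketch of what "combines a sigmoid layer and binary cross-entropy loss" means for a single logit. The helper names `bce_with_logits` and `bce_after_sigmoid` are illustrative and not part of PyTorch. The first uses the standard numerically stable form, which is one reason a discriminator trained with `nn.BCEWithLogitsLoss()` is expected to output raw logits rather than sigmoid probabilities.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy computed directly on a raw logit."""
    # max(x, 0) - x * t + log(1 + exp(-|x|)) never exponentiates a large positive number
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def bce_after_sigmoid(logit: float, target: float) -> float:
    """Naive version: apply the sigmoid first, then binary cross-entropy."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Both forms agree for moderate logits
for logit, target in [(-3.0, 0.0), (0.0, 1.0), (2.5, 1.0)]:
    assert abs(bce_with_logits(logit, target) - bce_after_sigmoid(logit, target)) < 1e-9

# The stable form still works for an extreme logit, where the naive form would evaluate log(0)
large_logit_loss = bce_with_logits(1000.0, 0.0)
```

If the discriminator already ended in a sigmoid layer, applying this loss would squash its output twice, so the two pieces should be chosen together.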

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models
import warnings
warnings.filterwarnings("ignore")

# Define EfficientNetV2 model for HAM10000 with 7 classes
class EfficientNetV2Classifier(nn.Module):
    def __init__(self, num_classes=7):  # 7 classes for HAM10000
        super(EfficientNetV2Classifier, self).__init__()
        self.efficientnet_v2 = models.efficientnet_v2_s(pretrained=True)
        
        in_features = self.efficientnet_v2.classifier[1].in_features
        self.efficientnet_v2.classifier = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(in_features, num_classes)
        )

    def forward(self, x):
        return self.efficientnet_v2(x)

# Initialize the model
model_EfficientNetV2 = EfficientNetV2Classifier(num_classes=7)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_EfficientNetV2.parameters(), lr=0.001)
epochs = 1#20

# Training loop
def train_model(model, train_loader, test_loader, criterion, optimizer, epochs=1):#20):
    for epoch in range(epochs):
        model.train()  # validate_model() leaves the model in eval mode, so re-enable training mode each epoch
        running_loss = 0.0
        total_correct = 0
        
        for data, labels in train_loader:
            data, labels = data.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(data)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            predicted = outputs.argmax(dim=1)  # index of the largest logit for each sample
            total_correct += (predicted == labels).sum().item()
        
        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = total_correct / len(train_loader.dataset)
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.4f}')
        
        # Validation after each epoch
        validate_model(model, test_loader)

# Validation loop
def validate_model(model, test_loader):
    model.eval()
    total_correct = 0
    total_loss = 0.0
    
    with torch.no_grad():
        for data, labels in test_loader:
            data, labels = data.to(device), labels.to(device)
            outputs = model(data)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            
            _, predicted = torch.max(outputs.data, 1)
            total_correct += (predicted == labels).sum().item()
    
    avg_loss = total_loss / len(test_loader)
    accuracy = total_correct / len(test_loader.dataset)
    print(f'Validation Loss: {avg_loss:.4f}, Validation Accuracy: {accuracy:.4f}')

model_EfficientNetV2.to(device)

# Train the model
train_model(model_EfficientNetV2, train_loader, test_loader, criterion, optimizer, epochs=epochs)

The provided code snippet defines and trains a deep learning model using the EfficientNetV2 architecture for the HAM10000 dataset, which consists of 7 classes of skin lesions.

1. **Imports and Warnings**:
   - The code imports necessary modules from PyTorch (`torch`, `torch.nn`, `torch.optim`) and `models` from `torchvision`.
   - Warnings are suppressed using `warnings.filterwarnings("ignore")` to keep the output clean.

2. **EfficientNetV2 Classifier**:
   - The `EfficientNetV2Classifier` class inherits from `nn.Module` and defines a custom classifier based on the EfficientNetV2 architecture.
   - In the `__init__` method, the EfficientNetV2 model is loaded with pretrained weights using `models.efficientnet_v2_s(pretrained=True)`.
   - The classifier layer of the model is replaced with a new sequential layer consisting of a dropout layer and a linear layer to output predictions for 7 classes.
   - The `forward` method defines the forward pass of the model, which simply calls the forward method of the EfficientNetV2 model.

3. **Model Initialization**:
   - An instance of the `EfficientNetV2Classifier` is created with 7 output classes and assigned to `model_EfficientNetV2`.

4. **Loss and Optimizer**:
   - The loss function used is `nn.CrossEntropyLoss()`, which is suitable for multi-class classification problems.
   - The optimizer used is Adam (`optim.Adam`), with a learning rate of 0.001, to update the model parameters.

5. **Training Loop**:
   - The `train_model` function is defined to train the model. It takes the model, training and testing data loaders, loss function, optimizer, and number of epochs as input.
   - The model is set to training mode using `model.train()`.
   - For each epoch, the running loss and total correct predictions are initialized.
   - The inner loop iterates over batches of data from the training loader. For each batch:
     - Data and labels are moved to the specified device.
     - The optimizer gradients are zeroed.
     - The model outputs are computed, and the loss is calculated.
     - The loss is backpropagated, and the optimizer steps are performed.
     - The running loss and total correct predictions are updated.
   - After each epoch, the average loss and accuracy are printed.
   - The `validate_model` function is called to evaluate the model on the test set.

6. **Validation Loop**:
   - The `validate_model` function is defined to evaluate the model on the test set.
   - The model is set to evaluation mode using `model.eval()`.
   - The total correct predictions and total loss are initialized.
   - The loop iterates over batches of data from the test loader. For each batch:
     - Data and labels are moved to the specified device.
     - The model outputs are computed, and the loss is calculated.
     - The total loss and total correct predictions are updated.
   - The average loss and accuracy are printed.

7. **Model Training**:
   - The model is moved to the specified device using `model_EfficientNetV2.to(device)`.
   - The `train_model` function is called with the model, training and testing data loaders, loss function, optimizer, and number of epochs to start the training process.

This code sets up and trains an EfficientNetV2-based classifier for the HAM10000 dataset, providing a comprehensive framework for training and evaluating the model.
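
The difference between `model.train()` and `model.eval()` matters here because the replaced classifier head contains a dropout layer. The sketch below is a small stand-alone illustration, not part of the notebook: the `head` module, its layer sizes, and the input `x` are made up for the example. It shows that dropout is active in training mode and disabled in evaluation mode, and that training mode has to be switched back on after an evaluation pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the replaced classifier head (8 input features, 7 classes)
head = nn.Sequential(nn.Dropout(p=0.3), nn.Linear(8, 7))
x = torch.ones(4, 8)

head.train()                 # dropout randomly zeroes inputs on every call
train_out_a = head(x)
train_out_b = head(x)        # drawn with a fresh dropout mask

head.eval()                  # dropout becomes a no-op
eval_out_a = head(x)
eval_out_b = head(x)         # identical to eval_out_a

# A validation helper that calls eval() leaves the module in eval mode,
# so training mode must be re-enabled before the next training epoch.
head.train()
```

Calling `model.train()` at the start of every epoch, rather than once before the epoch loop, is a simple way to guarantee this when validation runs after each epoch.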

In [None]:
# Define ShuffleNetV2 model for HAM10000 with 7 classes
class ShuffleNetV2Classifier(nn.Module):
    def __init__(self, num_classes=7):  # 7 classes for HAM10000
        super(ShuffleNetV2Classifier, self).__init__()
        self.shufflenet_v2 = models.shufflenet_v2_x1_0(pretrained=True)
        
        # Modify the last fully connected layer to match the number of classes
        in_features = self.shufflenet_v2.fc.in_features
        self.shufflenet_v2.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(in_features, num_classes)
        )

    def forward(self, x):
        return self.shufflenet_v2(x)

# Initialize the model
model_ShuffleNetV2 = ShuffleNetV2Classifier(num_classes=7)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_ShuffleNetV2.parameters(), lr=0.001)
epochs = 1#20

# Training loop
def train_model(model, train_loader, test_loader, criterion, optimizer, epochs=1):#20):
    for epoch in range(epochs):
        model.train()  # validate_model() leaves the model in eval mode, so re-enable training mode each epoch
        running_loss = 0.0
        total_correct = 0
        
        for data, labels in train_loader:
            data, labels = data.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(data)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total_correct += (predicted == labels).sum().item()
        
        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = total_correct / len(train_loader.dataset)
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.4f}')
        
        # Validation after each epoch
        validate_model(model, test_loader)

# Validation loop
def validate_model(model, test_loader):
    model.eval()
    total_correct = 0
    total_loss = 0.0
    
    with torch.no_grad():
        for data, labels in test_loader:
            data, labels = data.to(device), labels.to(device)
            outputs = model(data)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            
            _, predicted = torch.max(outputs.data, 1)
            total_correct += (predicted == labels).sum().item()
    
    avg_loss = total_loss / len(test_loader)
    accuracy = total_correct / len(test_loader.dataset)
    print(f'Validation Loss: {avg_loss:.4f}, Validation Accuracy: {accuracy:.4f}')

model_ShuffleNetV2.to(device)

# Train the model
train_model(model_ShuffleNetV2, train_loader, test_loader, criterion, optimizer, epochs=epochs)

The provided code snippet defines and trains a deep learning model using the ShuffleNetV2 architecture for the HAM10000 dataset, which consists of 7 classes of skin lesions.

1. **ShuffleNetV2 Classifier**:
   - The `ShuffleNetV2Classifier` class inherits from `nn.Module` and defines a custom classifier based on the ShuffleNetV2 architecture.
   - In the `__init__` method, the ShuffleNetV2 model is loaded with pretrained weights using `models.shufflenet_v2_x1_0(pretrained=True)`.
   - The classifier layer of the model is modified to match the number of classes (7) by replacing the last fully connected layer with a new sequential layer consisting of a dropout layer and a linear layer.
   - The `forward` method defines the forward pass of the model, which simply calls the forward method of the ShuffleNetV2 model.

2. **Model Initialization**:
   - An instance of the `ShuffleNetV2Classifier` is created with 7 output classes and assigned to `model_ShuffleNetV2`.

3. **Loss and Optimizer**:
   - The loss function used is `nn.CrossEntropyLoss()`, which is suitable for multi-class classification problems.
   - The optimizer used is Adam (`optim.Adam`), with a learning rate of 0.001, to update the model parameters.

4. **Training Loop**:
   - The `train_model` function is defined to train the model. It takes the model, training and testing data loaders, loss function, optimizer, and number of epochs as input.
   - The model is set to training mode using `model.train()`.
   - For each epoch, the running loss and total correct predictions are initialized.
   - The inner loop iterates over batches of data from the training loader. For each batch:
     - Data and labels are moved to the specified device.
     - The optimizer gradients are zeroed.
     - The model outputs are computed, and the loss is calculated.
     - The loss is backpropagated, and the optimizer steps are performed.
     - The running loss and total correct predictions are updated.
   - After each epoch, the average loss and accuracy are printed.
   - The `validate_model` function is called to evaluate the model on the test set.

5. **Validation Loop**:
   - The `validate_model` function is defined to evaluate the model on the test set.
   - The model is set to evaluation mode using `model.eval()`.
   - The total correct predictions and total loss are initialized.
   - The loop iterates over batches of data from the test loader. For each batch:
     - Data and labels are moved to the specified device.
     - The model outputs are computed, and the loss is calculated.
     - The total loss and total correct predictions are updated.
   - The average loss and accuracy are printed.

6. **Model Training**:
   - The model is moved to the specified device using `model_ShuffleNetV2.to(device)`.
   - The `train_model` function is called with the model, training and testing data loaders, loss function, optimizer, and number of epochs to start the training process.

This code sets up and trains a ShuffleNetV2-based classifier for the HAM10000 dataset, providing a comprehensive framework for training and evaluating the model.
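
The bookkeeping behind the printed metrics can be sketched without PyTorch. In the loops above, the loss is averaged over the number of batches (`len(train_loader)`) while accuracy is averaged over the number of samples (`len(train_loader.dataset)`). The helper names and the numbers below are invented for illustration; they show that the per-batch average can differ from a sample-weighted average when the last batch is smaller than the others.

```python
def epoch_metrics(batch_losses, batch_correct, num_samples):
    """Mirror the notebook's bookkeeping: mean loss per batch, accuracy per sample."""
    epoch_loss = sum(batch_losses) / len(batch_losses)
    epoch_accuracy = sum(batch_correct) / num_samples
    return epoch_loss, epoch_accuracy


def sample_weighted_loss(batch_losses, batch_sizes):
    """Alternative: weight each batch's mean loss by the number of samples it contains."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Two batches: a full batch of 6 samples and a smaller final batch of 2 samples
batch_losses = [0.5, 1.5]   # mean loss reported for each batch
batch_sizes = [6, 2]
batch_correct = [3, 1]      # correct predictions in each batch

loss, accuracy = epoch_metrics(batch_losses, batch_correct, num_samples=sum(batch_sizes))
weighted = sample_weighted_loss(batch_losses, batch_sizes)
```

For large datasets with one short final batch the two loss averages are usually close, so the simpler per-batch mean is a reasonable choice for progress logging.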

In [None]:
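def create_support_set(generator, model_EfficientNetV2, model_ShuffleNetV2, labels, noise_dim=128):
    # One noise vector per requested label, on the same device as the labels
    noise = torch.randn(labels.size(0), noise_dim, device=labels.device)
    with torch.no_grad():
        created_imgs = generator(noise, labels)  # assumes a label-conditional generator
        # The classifiers return logits; argmax turns them into predicted class indices
        EfficientNetV2Classifier_labels = model_EfficientNetV2(created_imgs).argmax(dim=1)
        ShuffleNetV2Classifier_labels = model_ShuffleNetV2(created_imgs).argmax(dim=1)
    # Accept the batch only if both classifiers agree with every requested label
    if (EfficientNetV2Classifier_labels == labels).all() and (ShuffleNetV2Classifier_labels == labels).all():
        return created_imgs
    else:
        return None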

The provided code snippet defines a function `create_support_set` that generates a set of images using a generator model and then validates these images using two classifier models, EfficientNetV2 and ShuffleNetV2.

1. **Function Definition**:
   - The function `create_support_set` takes five parameters: `generator`, `model_EfficientNetV2`, `model_ShuffleNetV2`, `labels`, and an optional `noise_dim` with a default value of 128.
   - The purpose of this function is to create a support set of images that are validated by both classifier models.

2. **Generating Noise**:
   - `torch.randn` draws a batch of random noise vectors. The `noise_dim` parameter specifies the dimensionality of each noise vector, and the number of vectors must match the number of entries in `labels`.
   - This random noise serves as input to the generator model to produce synthetic images.

3. **Generating Images**:
   - The line `created_imgs = generator(noise, labels)` uses the generator model to create images from the random noise and the provided labels. The generator is expected to take both noise and labels as input, that is, to be a label-conditional generator. The `train_gan` function above calls the generator with noise only, so the generator must also accept labels for this call to work.

4. **Classifying Generated Images**:
   - The generated images are then passed through two classifier models: `model_EfficientNetV2` and `model_ShuffleNetV2`.
   - The results are stored in `EfficientNetV2Classifier_labels` and `ShuffleNetV2Classifier_labels`. Both classifiers return one logit per class, so a predicted label is the index of the largest logit for each image.

5. **Validation**:
   - The function checks whether the predicted labels from both classifiers match the provided labels.
   - If both classifiers correctly identify the generated images, the function returns the created images (`return created_imgs`).
   - If either classifier fails to correctly identify the images, the function returns `None`.

This function is useful for generating and validating synthetic images, ensuring that the generated images are realistic and correctly labeled according to both classifier models. This can be particularly valuable in scenarios where high-quality labeled data is needed for training or evaluation purposes.
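
The function above accepts or rejects a whole batch at once. A per-image variant of the same agreement check can be sketched as follows. This is an illustrative alternative rather than the notebook's method: the helper name `consensus_mask` and the hand-written logits are assumptions made for the example, standing in for the outputs of the two classifiers.

```python
import torch


def consensus_mask(logits_a, logits_b, labels):
    """Boolean mask of samples that both classifiers assign to the requested label."""
    preds_a = logits_a.argmax(dim=1)  # predicted class index per sample
    preds_b = logits_b.argmax(dim=1)
    return (preds_a == labels) & (preds_b == labels)


labels = torch.tensor([0, 1, 2])
# Hand-written logits standing in for the two classifiers' outputs on three images
logits_a = torch.tensor([[2.0, 0.1, 0.1], [0.1, 3.0, 0.2], [1.5, 0.2, 0.3]])  # predicts 0, 1, 0
logits_b = torch.tensor([[1.0, 0.2, 0.1], [0.9, 0.1, 0.3], [0.1, 0.2, 2.0]])  # predicts 0, 0, 2

mask = consensus_mask(logits_a, logits_b, labels)

images = torch.randn(3, 3, 8, 8)  # placeholder for generated images
kept_images = images[mask]        # only images both classifiers agree on
kept_labels = labels[mask]
```

Filtering per image keeps the usable samples from a batch instead of discarding the whole batch when a single image is misclassified.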

In [None]:
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    def __init__(self, in_channels=3, base_features=64):
        super(CNNEncoder, self).__init__()
        
        # Encoder block 1
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels, base_features, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features, base_features, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # Reduces 64x64 -> 32x32
        )
        
        # Encoder block 2
        self.block2 = nn.Sequential(
            nn.Conv2d(base_features, base_features * 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 2, base_features * 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # Reduces 32x32 -> 16x16
        )
        
        # Encoder block 3
        self.block3 = nn.Sequential(
            nn.Conv2d(base_features * 2, base_features * 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 4),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 4, base_features * 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # Reduces 16x16 -> 8x8
        )
        
        # Encoder block 4
        self.block4 = nn.Sequential(
            nn.Conv2d(base_features * 4, base_features * 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 8),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 8, base_features * 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 8),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # Reduces 8x8 -> 4x4
        )

    def forward(self, x):
        # Apply each encoder block to the input
        x = self.block1(x)  # 64x64 -> 32x32
        x = self.block2(x)  # 32x32 -> 16x16
        x = self.block3(x)  # 16x16 -> 8x8
        x = self.block4(x)  # 8x8 -> 4x4
        return x


The provided code snippet defines a convolutional neural network (CNN) encoder using PyTorch. This encoder is designed to process input images through a series of convolutional layers, batch normalization, activation functions, and pooling layers, progressively reducing the spatial dimensions while increasing the feature depth.

1. **Imports and Class Definition**:
   - The code imports the necessary modules from PyTorch, including `torch` and `torch.nn`.
   - The `CNNEncoder` class inherits from `nn.Module`, which is the base class for all neural network modules in PyTorch.

2. **Initialization (`__init__` method)**:
   - The `__init__` method initializes the encoder with two parameters: `in_channels` (default is 3 for RGB images) and `base_features` (default is 64).
   - Four encoder blocks are defined within the `__init__` method, each consisting of convolutional layers, batch normalization, ReLU activation, and max pooling.

3. **Encoder Block 1**:
   - `self.block1` is a sequential container that includes:
     - A convolutional layer with `in_channels` input channels and `base_features` output channels, a kernel size of 3, and padding of 1.
     - Batch normalization for `base_features` channels.
     - ReLU activation.
     - Another convolutional layer with `base_features` input and output channels, a kernel size of 3, and padding of 1.
     - Batch normalization and ReLU activation.
     - Max pooling with a kernel size and stride of 2, reducing the spatial dimensions from 64x64 to 32x32 (for a 64x64 input).

4. **Encoder Block 2**:
   - `self.block2` is similar to block 1 but with:
     - Convolutional layers that double the number of features to `base_features * 2`.
     - Max pooling that reduces the spatial dimensions from 32x32 to 16x16.

5. **Encoder Block 3**:
   - `self.block3` follows the same structure, further doubling the features to `base_features * 4`.
   - Max pooling reduces the spatial dimensions from 16x16 to 8x8.

6. **Encoder Block 4**:
   - `self.block4` continues the pattern, doubling the features to `base_features * 8`.
   - Max pooling reduces the spatial dimensions from 8x8 to 4x4.

7. **Forward Method**:
   - The `forward` method defines the forward pass of the encoder.
   - The input `x` is sequentially passed through each encoder block (`block1`, `block2`, `block3`, `block4`), progressively reducing its spatial dimensions and increasing its feature depth.
   - The final output is returned after passing through all four blocks.

This CNN encoder is designed to extract hierarchical features from input images, making it suitable for tasks such as image classification, segmentation, or as a feature extractor in more complex models. The progressive reduction in spatial dimensions and increase in feature depth allows the network to capture both local and global patterns in the input data.
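
The shape changes described in the blocks above follow from the standard convolution and pooling size formula, and can be checked with a few lines of plain Python. The helper names `conv_out` and `encoder_output_shape` are introduced only for this sketch; the layer settings (two 3x3 convolutions with padding 1, then 2x2 max pooling with stride 2, channels doubling per block) are taken from `CNNEncoder`.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_output_shape(height, width, base_features=64, num_blocks=4):
    """Trace (channels, height, width) through the four encoder blocks."""
    channels = base_features
    for block in range(num_blocks):
        channels = base_features * 2 ** block       # 64, 128, 256, 512 by default
        for _ in range(2):                          # two 3x3 convs with padding 1 keep the size
            height = conv_out(height, kernel=3, stride=1, padding=1)
            width = conv_out(width, kernel=3, stride=1, padding=1)
        height = conv_out(height, kernel=2, stride=2)  # 2x2 max pooling halves the size
        width = conv_out(width, kernel=2, stride=2)
    return channels, height, width


shape_64 = encoder_output_shape(64, 64)     # the input size assumed in the code comments
shape_128 = encoder_output_shape(128, 128)  # a larger input, for comparison
```

Because every block halves the height and width, the encoder's output size depends on the input resolution, which matters for any fully connected layer placed after it.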

In [None]:
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, feature_dim, num_heads=4):
        super(AttentionModule, self).__init__()
        
        self.num_heads = num_heads
        self.head_dim = feature_dim // num_heads
        
        # Linear transformations for multi-head attention
        self.query_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        self.key_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        self.value_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        
        # Multi-head attention mechanism
        self.attn_heads = nn.ModuleList(
            [nn.Sequential(
                nn.Conv2d(self.head_dim, self.head_dim, kernel_size=1),
                nn.Softmax(dim=-1)  # Softmax across the spatial dimension
            ) for _ in range(num_heads)]
        )
        
        # Channel attention to recalibrate feature maps
        self.channel_attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(feature_dim, feature_dim // 16, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(feature_dim // 16, feature_dim, kernel_size=1),
            nn.Sigmoid()
        )
        
        # Spatial attention to emphasize important regions in the spatial dimension
        self.spatial_attention = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid()
        )
        
        # Final 1x1 conv to combine outputs
        self.output_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
    
    def forward(self, features):
        # Compute query, key, and value maps for multi-head attention
        queries = self.query_conv(features)  # [B, C, H, W]
        keys = self.key_conv(features)       # [B, C, H, W]
        values = self.value_conv(features)   # [B, C, H, W]
        
        B, C, H, W = queries.size()
        queries = queries.view(B, self.num_heads, self.head_dim, H * W)  # [B, heads, head_dim, H*W]
        keys = keys.view(B, self.num_heads, self.head_dim, H * W)        # [B, heads, head_dim, H*W]
        values = values.view(B, self.num_heads, self.head_dim, H * W)    # [B, heads, head_dim, H*W]
        
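        # Multi-head attention over the channels within each head
        attention_outputs = []
        for i in range(self.num_heads):
            # Channel-to-channel similarity scores: [B, head_dim, head_dim]
            attn_weights = torch.bmm(queries[:, i], keys[:, i].transpose(1, 2))
            # Normalize each row of scores, scaled by sqrt(H*W), the length of each dot product.
            # self.attn_heads[i] is not applied here: its Conv2d expects a [B, head_dim, H, W] input,
            # a shape the score matrix can only be viewed as when head_dim == H * W.
            attn_weights = torch.softmax(attn_weights / (H * W) ** 0.5, dim=-1)
            attn_output = torch.bmm(attn_weights, values[:, i])  # [B, head_dim, H*W]
            attention_outputs.append(attn_output.view(B, self.head_dim, H, W))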
        
        # Concatenate all attention head outputs
        multi_head_output = torch.cat(attention_outputs, dim=1)  # [B, C, H, W]
        
        # Channel Attention
        channel_attn_weights = self.channel_attention(multi_head_output)
        channel_attn_output = multi_head_output * channel_attn_weights  # Element-wise multiplication (recalibration)
        
        # Spatial Attention
        avg_pool = torch.mean(channel_attn_output, dim=1, keepdim=True)  # Average pooling across channels
        max_pool = torch.max(channel_attn_output, dim=1, keepdim=True)[0]  # Max pooling across channels
        spatial_attn_weights = self.spatial_attention(torch.cat([avg_pool, max_pool], dim=1))
        spatial_attn_output = channel_attn_output * spatial_attn_weights  # Element-wise multiplication (spatial recalibration)
        
        # Final 1x1 conv to produce the final attention output
        output = self.output_conv(spatial_attn_output)
        return output


The provided code snippet defines an `AttentionModule` class in PyTorch, which implements an attention mechanism combining multi-head attention, channel attention, and spatial attention. This module is designed to enhance feature representations by focusing on important parts of the input data.

1. **Class Definition and Initialization**:
   - The `AttentionModule` class inherits from `nn.Module`, the base class for all neural network modules in PyTorch.
   - The `__init__` method initializes the module with two parameters: `feature_dim`, which specifies the number of channels of the input features, and `num_heads`, which defaults to 4 and specifies the number of attention heads.
   - The `head_dim` is calculated by integer-dividing `feature_dim` by `num_heads`, determining the number of channels handled by each attention head. `feature_dim` should therefore be divisible by `num_heads`.

2. **Linear Transformations for Multi-Head Attention**:
   - Three convolutional layers (`query_conv`, `key_conv`, and `value_conv`) are defined with a kernel size of 1. These layers transform the input features into query, key, and value maps, respectively, for the multi-head attention mechanism.

3. **Multi-Head Attention Mechanism**:
   - A `ModuleList` named `attn_heads` is created, containing `num_heads` sequential modules. Each module consists of a 1x1 convolutional layer followed by a softmax activation. With `dim=-1`, the softmax normalizes along the last dimension of its input.

4. **Channel Attention**:
   - The `channel_attention` sequential module recalibrates the feature maps by focusing on important channels. It includes:
     - An adaptive average pooling layer that reduces the spatial dimensions to 1x1.
     - Two convolutional layers with a ReLU activation in between.
     - A sigmoid activation to produce the channel attention weights.

5. **Spatial Attention**:
   - The `spatial_attention` sequential module emphasizes important regions in the spatial dimension. It includes:
     - A convolutional layer with a kernel size of 7 and padding of 3.
     - A sigmoid activation to produce the spatial attention weights.

6. **Final Convolutional Layer**:
   - A final 1x1 convolutional layer (`output_conv`) is defined to combine the outputs of the attention mechanisms and produce the final attention-enhanced feature map.

7. **Forward Method**:
   - The `forward` method defines the forward pass of the module.
   - Query, key, and value maps are computed using the respective convolutional layers.
   - The input features are reshaped to facilitate multi-head attention, splitting the feature dimension into multiple heads.
   - For each attention head, the attention weights are computed using batch matrix multiplication (`torch.bmm`), and the attention output is obtained by applying the learned attention map to the value map.
   - The outputs of all attention heads are concatenated along the feature dimension.
   - Channel attention is applied to recalibrate the feature maps, followed by spatial attention to emphasize important spatial regions.
   - The final attention-enhanced feature map is produced using the 1x1 convolutional layer and returned as the output.

This `AttentionModule` class provides a comprehensive attention mechanism that can be integrated into larger neural network architectures to improve their ability to focus on relevant features and regions in the input data.
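
The channel and spatial attention stages described above can be tried in isolation to see how the weights broadcast over a feature map. The sketch below rebuilds just those two stages with the same layer settings as `AttentionModule`. The small `feature_dim` of 32 and the random input tensor are assumptions chosen to keep the example light, and the layers are freshly initialized rather than trained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

feature_dim = 32                      # illustrative size; must be at least 16 for the // 16 bottleneck
x = torch.randn(2, feature_dim, 4, 4)  # [B, C, H, W]

# Channel attention: one weight in (0, 1) per channel
channel_attention = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(feature_dim, feature_dim // 16, kernel_size=1),
    nn.ReLU(),
    nn.Conv2d(feature_dim // 16, feature_dim, kernel_size=1),
    nn.Sigmoid(),
)

# Spatial attention: one weight in (0, 1) per spatial location
spatial_attention = nn.Sequential(
    nn.Conv2d(2, 1, kernel_size=7, padding=3),
    nn.Sigmoid(),
)

channel_weights = channel_attention(x)            # [B, C, 1, 1]
x_channel = x * channel_weights                   # broadcast over H and W

avg_pool = x_channel.mean(dim=1, keepdim=True)    # [B, 1, H, W]
max_pool = x_channel.max(dim=1, keepdim=True)[0]  # [B, 1, H, W]
spatial_weights = spatial_attention(torch.cat([avg_pool, max_pool], dim=1))  # [B, 1, H, W]
out = x_channel * spatial_weights                 # broadcast over channels
```

Both stages only rescale the feature map, so the output keeps the input's shape and the module can be dropped between existing layers without changing downstream dimensions.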

In [None]:
import torch
import torch.nn as nn

class MTUNet2(nn.Module):
    def __init__(self, in_channels=3, base_features=64, num_classes=5, feature_dim=512, num_heads=4):
        super(MTUNet2, self).__init__()
        
        # Complex CNN Encoder shared by both query and support
        self.encoder = CNNEncoder(in_channels, base_features)
        
        # Complex Attention mechanism
        self.attn_module = AttentionModule(feature_dim, num_heads=num_heads)
        
        # Classification Decoder
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(base_features*16*8*8, 1024),  # (2 * base_features*8) channels x 8 x 8 after concatenation; assumes 128x128 inputs
            nn.ReLU(),
            nn.Linear(1024, num_classes)
        )
    
    def forward(self, query, support):
        # Step 1: Extract features from the query image using the updated CNNEncoder
        query_features = self.encoder(query)  # Query features are [B, 1024, 8, 8] based on complex CNNEncoder
        
        # Step 2: Extract and aggregate features from the support set
        N = support.size(0)  # Number of support images
        support_features = []
        for i in range(N):
            support_feature = self.encoder(support[i].unsqueeze(0))  # Each support image's features
            support_features.append(support_feature)
        
        # Aggregate support features (using average pooling for simplicity)
        support_features = torch.mean(torch.stack(support_features), dim=0)  # [B, 1024, 8, 8]
        
        # Step 3: Apply complex attention to both query and support features
        query_attn = self.attn_module(query_features)  # Attention on query
        support_attn = self.attn_module(support_features)  # Attention on support
        
        # Step 4: Combine query and support features via one-to-one concatenation
        combined_features = torch.cat((query_attn, support_attn), dim=1)  # Concatenate along the channel dimension
        # Combined features will be [B, 1024 + 1024 = 2048, 8, 8]
        
        # Step 5: Classification Decoder (use the combined query-support features)
        classification_output = self.classifier(combined_features)
        
        return classification_output


The provided code snippet defines a neural network model named `MTUNet2` using PyTorch. This model is designed for tasks that involve both query and support images, leveraging a complex CNN encoder, an attention mechanism, and a classification decoder.

1. **Class Definition and Initialization**:
   - The `MTUNet2` class inherits from `nn.Module`, the base class for all neural network modules in PyTorch.
   - The `__init__` method initializes the model with several parameters: `in_channels` (default is 3 for RGB images), `base_features` (default is 64), `num_classes` (default is 5), `feature_dim` (default is 512), and `num_heads` (default is 4).
   - The model consists of three main components:
     - A complex CNN encoder (`self.encoder`) shared by both query and support images, instantiated from the `CNNEncoder` class.
     - An attention module (`self.attn_module`) instantiated from the `AttentionModule` class, which applies a complex attention mechanism to the features.
     - A classification decoder (`self.classifier`), defined as a sequential module that flattens the input, applies a linear transformation followed by a ReLU activation, and then another linear transformation to produce the final class predictions.

2. **Forward Method**:
   - The `forward` method defines the forward pass of the model, taking two inputs: `query` and `support`.
   - **Step 1**: Extract features from the query image using the CNN encoder. The output features have dimensions `[B, 1024, 8, 8]`, where `B` is the batch size.
   - **Step 2**: Extract and aggregate features from the support set. The support set contains `N` images. Each support image is passed through the CNN encoder individually, and the resulting feature maps are averaged across the `N` images, giving a single feature map with dimensions `[1, 1024, 8, 8]`.
   - **Step 3**: Apply the attention module to both query and support features. The same module instance is used for both, so its weights are shared between the two branches.
   - **Step 4**: Combine the query and support features by concatenating them along the channel dimension. Because the aggregated support map has a batch dimension of 1, it has to be broadcast across the query batch for the concatenation to work when `B > 1`. The combined features have dimensions `[B, 2048, 8, 8]`.
   - **Step 5**: Pass the combined features through the classification decoder to produce the final class predictions. After flattening, each sample has 2048 * 8 * 8 = 131072 values, which is the input size the first linear layer must accept.
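
The shape bookkeeping in the steps above can be checked with dummy tensors. The sketch below is an assumption-laden stand-in: `toy_encoder` is a hypothetical single-convolution replacement for `CNNEncoder` that only reproduces the `[*, 1024, 8, 8]` output shape, and the attention step is skipped (treated as shape-preserving).

```python
import torch
import torch.nn as nn

B, N, num_classes = 4, 3, 5

# Hypothetical stand-in for CNNEncoder: any module mapping [*, 3, H, W] -> [*, 1024, 8, 8]
toy_encoder = nn.Sequential(
    nn.Conv2d(3, 1024, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(8),
)

query = torch.randn(B, 3, 32, 32)
support = torch.randn(N, 3, 32, 32)

with torch.no_grad():
    # Step 1: query features
    query_features = toy_encoder(query)

    # Step 2: encode each support image, then average over the support set
    per_image = [toy_encoder(s.unsqueeze(0)) for s in support]
    support_features = torch.stack(per_image).mean(dim=0)

    # Step 4: broadcast the single support map to the query batch, then concatenate
    support_expanded = support_features.expand(B, -1, -1, -1)
    combined = torch.cat((query_features, support_expanded), dim=1)

    # Step 5: the flattened size fixes the classifier's first linear layer
    flat = combined.flatten(start_dim=1)
    logits = nn.Linear(flat.size(1), num_classes)(flat)

print(query_features.shape)    # query map
print(support_features.shape)  # aggregated support map
print(combined.shape)          # concatenated map
print(flat.shape, logits.shape)
```

`expand` creates a broadcast view without copying memory, which is sufficient here because the support map is only read during concatenation.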

This model architecture integrates information from both query and support images, which makes it a candidate for few-shot learning or meta-learning setups. The shared encoder extracts features, the attention module reweights them, and the classifier maps the concatenated query and support features to class scores.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Initialize the model, loss function, and optimizer
model = MTUNet2(in_channels=3, base_features=64, num_classes=5)
criterion_cls = nn.CrossEntropyLoss()  # For classification output
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function
def train(model, train_loader, criterion_cls, optimizer, epoch):
    model.train()
    running_loss = 0.0
    correct_cls = 0
    total = 0

    # Each item yielded by the DataLoader is already a (data, target) batch
    for data, target in train_loader:

        # Clear gradients
        optimizer.zero_grad()

        # Create the support set for this batch from the target labels
        support = create_support_set(generator, model_EfficientNetV2, model_ShuffleNetV2, target, noise_dim=128)

        # Forward pass with the query batch and the generated support set
        classification_output = model(data, support)

        # Compute the classification loss
        loss_cls = criterion_cls(classification_output, target)

        # Backward pass and parameter update
        loss_cls.backward()
        optimizer.step()

        # Accumulate the running loss
        running_loss += loss_cls.item()

        # Accumulate accuracy statistics for the classification output
        predicted = classification_output.argmax(dim=1)
        total += target.size(0)
        correct_cls += (predicted == target).sum().item()

    accuracy = 100 * correct_cls / total

    return running_loss / len(train_loader), accuracy


# Evaluation function
def evaluate(model, test_loader, criterion_cls):
    model.eval()
    test_loss = 0.0
    correct_cls = 0
    total = 0

    with torch.no_grad():
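        for data, target in test_loader:

            # MTUNet2.forward also needs a support set; build it the same way as in train()
            support = create_support_set(generator, model_EfficientNetV2, model_ShuffleNetV2, target, noise_dim=128)

            # Forward pass
            classification_output = model(data, support)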
            
            # Compute loss
            loss_cls = criterion_cls(classification_output, target)
            
            test_loss += loss_cls.item()

            # Compute accuracy for classification output
            _, predicted = torch.max(classification_output.data, 1)
            total += target.size(0)
            correct_cls += (predicted == target).sum().item()

    accuracy = 100 * correct_cls / total
    avg_loss = test_loss / len(test_loader)
    
    return avg_loss, accuracy


# Main training loop
num_epochs = 1  # 500
for epoch in range(1, num_epochs + 1):
    train_loss, train_accuracy = train(model, train_loader, criterion_cls, optimizer, epoch)
    print(f'Epoch [{epoch}], Training Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%')

    test_loss, test_accuracy = evaluate(model, test_loader, criterion_cls)
    print(f'Epoch [{epoch}], Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%')
    print()

The provided code snippet defines the setup and training loop for a neural network model named `MTUNet2` using PyTorch. This includes initializing the model, defining the loss function and optimizer, and implementing the training and evaluation functions.

1. **Initialization**:
   - The model `MTUNet2` is instantiated with 3 input channels, 64 base features, and 5 output classes.
   - The loss function used is `nn.CrossEntropyLoss()`, which is suitable for multi-class classification tasks.
   - The optimizer used is Adam (`optim.Adam`), with a learning rate of 0.001, to update the model parameters.

2. **Training Function**:
   - The `train` function is defined to train the model for one epoch. It takes the model, training data loader, loss function, optimizer, and the current epoch number as input.
   - The model is set to training mode using `model.train()`.
   - A running loss variable is initialized to accumulate the loss over the epoch.
   - The function iterates over batches of data from the training loader. For each batch:
     - Gradients are cleared using `optimizer.zero_grad()`.
     - A support set is created using the `create_support_set` function, which generates images and validates them using two classifier models.
     - A forward pass is performed by passing the data and support set through the model.
     - The classification loss is computed using the loss function.
     - The loss is backpropagated, and the optimizer steps are performed to update the model parameters.
     - The running loss is accumulated, and the classification accuracy is computed.
   - The function returns the average loss and accuracy for the epoch.

3. **Evaluation Function**:
   - The `evaluate` function is defined to evaluate the model on the test set. It takes the model, test data loader, and loss function as input.
   - The model is set to evaluation mode using `model.eval()`, and gradient tracking is disabled with `torch.no_grad()`.
   - Variables for test loss and correct predictions are initialized.
   - The function iterates over batches of data from the test loader. For each batch:
     - A forward pass is performed. `MTUNet2.forward` expects both a query batch and a support set, so the call needs a support set just as in training.
     - The classification loss is computed and accumulated.
     - The classification accuracy is computed.
   - The function returns the average loss and accuracy for the test set.

4. **Main Training Loop**:
   - The main training loop runs for a specified number of epochs (`num_epochs`).
   - For each epoch, the `train` function is called to train the model, and the training loss and accuracy are printed.
   - The `evaluate` function is called to evaluate the model on the test set, and the test loss and accuracy are printed.
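
The training and evaluation pattern described in the list above can be exercised end to end on a toy problem. The sketch below is illustrative only: `toy_model`, the random data, and the helper `run_epoch` (which merges the training and evaluation paths into one function) are hypothetical and do not appear in the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 64 samples, 10 features, 5 classes
features = torch.randn(64, 10)
labels = torch.randint(0, 5, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=16)

toy_model = nn.Linear(10, 5)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(toy_model.parameters(), lr=0.001)


def run_epoch(model, loader, criterion, optimizer=None):
    """Train for one epoch if an optimizer is given, otherwise only evaluate."""
    training = optimizer is not None
    model.train(training)
    running_loss, correct, total = 0.0, 0, 0

    with torch.set_grad_enabled(training):
        for data, target in loader:  # each item is a (data, target) batch
            logits = model(data)
            loss = criterion(logits, target)

            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            running_loss += loss.item()
            correct += (logits.argmax(dim=1) == target).sum().item()
            total += target.size(0)

    # Average loss per batch and accuracy in percent
    return running_loss / len(loader), 100 * correct / total


train_loss, train_acc = run_epoch(toy_model, loader, criterion, optimizer)
eval_loss, eval_acc = run_epoch(toy_model, loader, criterion)
print(f"train loss {train_loss:.4f}, eval loss {eval_loss:.4f}")
```

Initializing the loss and accuracy counters before the batch loop matters: the per-batch `+=` updates fail with a `NameError` or `UnboundLocalError` if the counters do not exist yet.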

This code covers model setup, the per-epoch training step, loss computation, and accuracy evaluation for the `MTUNet2` model. The data loaders (`train_loader`, `test_loader`) and the support-set helper (`create_support_set`) are not defined in this cell and must already exist when it runs.
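
As a small check on the loss used here, the snippet below shows what `nn.CrossEntropyLoss` computes: it takes raw, unnormalized logits and integer class indices, and equals the mean negative log-softmax probability of the correct class. The tensors are made-up values for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)            # [batch, num_classes], raw scores
target = torch.tensor([0, 2, 4, 1])   # integer class indices, not one-hot

# Built-in loss (mean reduction by default)
loss_direct = F.cross_entropy(logits, target)

# Manual equivalent: log-softmax, pick the correct class, negate, average
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(target.size(0)), target].mean()

print(torch.allclose(loss_direct, loss_manual))
```

Because the softmax is applied inside the loss, the classifier's last layer should output logits directly, with no softmax of its own.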