In [None]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from torchvision import transforms, datasets
from torch.utils.data import Dataset, DataLoader
from PIL import Image

The provided code snippet imports several essential libraries and modules that are commonly used in data science and machine learning projects.

1. **os**: This module provides a way of using operating system-dependent functionality like reading or writing to the file system. It is useful for handling file paths and directories.

2. **pandas as pd**: Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like DataFrames, which are essential for handling and analyzing structured data. The alias `pd` is a common convention to simplify the usage of the library.

3. **train_test_split from sklearn.model_selection**: This function from the Scikit-learn library is used to split datasets into training and testing sets. It is crucial for evaluating the performance of machine learning models by training them on one subset of the data and testing them on another.

4. **transforms and datasets from torchvision**: These modules are part of the Torchvision library, which is used in conjunction with PyTorch for computer vision tasks. `transforms` provides common image transformations for data augmentation and preprocessing, while `datasets` offers access to popular datasets like CIFAR-10 and ImageNet.

5. **Dataset and DataLoader from torch.utils.data**: These classes are part of PyTorch's data loading utilities. `Dataset` is an abstract class representing a dataset, and `DataLoader` is used to load data in batches, shuffle it, and run worker processes for efficient data loading.

6. **Image from PIL**: The `PIL` package (provided today by Pillow, the maintained fork of the Python Imaging Library) is used for opening, manipulating, and saving many different image file formats. The `Image` class is specifically used to work with image data, which is often necessary in computer vision tasks.

Together, these imports set up the environment for a machine learning workflow that involves data manipulation, dataset splitting, image processing, and efficient data loading.

In [None]:
class HAM10000Dataset(Dataset):
    def __init__(self, csv_file, img_dirs, transform=None):
        self.data = pd.read_csv(csv_file)
        self.img_dirs = img_dirs  # List of directories
        self.transform = transform
        # Build the label map once. Sorting makes the class-to-index assignment
        # independent of row order, so it matches across splits that contain the same classes.
        self.label_map = {dx: i for i, dx in enumerate(sorted(self.data['dx'].unique()))}

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Look up the image name
        img_name = self.data.iloc[idx]['image_id'] + '.jpg'
        
        # Search for the image in the directories
        for img_dir in self.img_dirs:
            img_path = os.path.join(img_dir, img_name)
            if os.path.exists(img_path):
                image = Image.open(img_path).convert('RGB')
                break
        else:
            raise FileNotFoundError(f"Image {img_name} not found in specified directories.")
        
        # Get the label as an integer class index
        label = self.label_map[self.data.iloc[idx]['dx']]  # Diagnosis column

        if self.transform:
            image = self.transform(image)

        return image, label


The provided code defines a custom dataset class named `HAM10000Dataset` that inherits from PyTorch's `Dataset` class. This class is designed to handle the HAM10000 dataset, which contains images of skin lesions along with their corresponding diagnoses.

1. **Initialization (`__init__` method)**: The constructor takes three parameters: `csv_file`, `img_dirs`, and `transform`. The `csv_file` parameter is the path to a CSV file containing metadata about the images, such as their filenames and diagnoses. The `img_dirs` parameter is a list of directories where the images are stored. The `transform` parameter is optional and can be used to apply transformations to the images (e.g., data augmentation). The constructor reads the CSV file into a pandas DataFrame and stores the image directories and transform function as instance variables.

2. **Length (`__len__` method)**: This method returns the number of samples in the dataset by returning the length of the DataFrame. This is a required method for PyTorch datasets, as it allows PyTorch to know how many samples are available.

3. **Get Item (`__getitem__` method)**: This method retrieves a single sample from the dataset. It takes an index `idx` as input and performs the following steps:
   - It constructs the image filename by appending '.jpg' to the `image_id` from the DataFrame.
   - It searches for the image file in the specified directories. If the image is found, it is opened and converted to RGB format. If the image is not found in any directory, a `FileNotFoundError` is raised.
   - It retrieves the label (diagnosis) for the image from the DataFrame. The labels are mapped to numerical values using a dictionary that assigns a unique index to each unique diagnosis.
   - If a transform function is provided, it is applied to the image.
   - The method returns a tuple containing the image and its corresponding label.

This custom dataset class allows for efficient loading and preprocessing of the HAM10000 dataset, making it suitable for training machine learning models in PyTorch.
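
A diagnosis-to-index mapping is only useful if it is the same for the training and test splits. The sketch below uses made-up label lists (not real HAM10000 rows) and two hypothetical helper functions to contrast a mapping built in order of first appearance, which is the order pandas `unique()` returns, with one built from sorted class names.

```python
# Hypothetical label lists standing in for the 'dx' column of two splits.
train_dx = ["nv", "mel", "bkl", "nv", "bcc"]
test_dx = ["bcc", "nv", "mel", "bkl"]

def order_of_appearance_map(labels):
    """Index classes in first-seen order (the order pandas .unique() yields)."""
    mapping = {}
    for label in labels:
        mapping.setdefault(label, len(mapping))
    return mapping

def sorted_map(labels):
    """Index classes alphabetically, independent of row order."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}

print(order_of_appearance_map(train_dx))  # {'nv': 0, 'mel': 1, 'bkl': 2, 'bcc': 3}
print(order_of_appearance_map(test_dx))   # {'bcc': 0, 'nv': 1, 'mel': 2, 'bkl': 3}
print(sorted_map(train_dx) == sorted_map(test_dx))  # True
```

With first-seen ordering, the same integer can stand for different diagnoses in the two splits, which would silently corrupt evaluation. Sorting avoids this as long as every class occurs in both splits.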

In [None]:
metadata_path = "../input/skin-cancer-mnist-ham10000/HAM10000_metadata.csv"
metadata = pd.read_csv(metadata_path)

# Check the number of unique images in metadata
print(f"Total images in metadata: {len(metadata)}")

The provided code snippet is responsible for loading and inspecting metadata for the HAM10000 dataset, which is a collection of skin lesion images and their associated information.

1. **Setting the Metadata Path**: The variable `metadata_path` is assigned the file path to the CSV file containing the metadata for the HAM10000 dataset. This path points to a file named `HAM10000_metadata.csv` located in the directory `../input/skin-cancer-mnist-ham10000/`.

2. **Loading the Metadata**: The `pd.read_csv(metadata_path)` function is used to read the CSV file into a pandas DataFrame named `metadata`. This DataFrame will contain various columns with information about each image, such as the image ID, diagnosis, and other relevant details.

3. **Checking the Number of Unique Images**: The `print` statement outputs the total number of images listed in the metadata. `len(metadata)` returns the number of rows in the DataFrame, which equals the number of unique images as long as each `image_id` appears in only one row.

This code is essential for verifying that the metadata has been loaded correctly and for understanding the size of the dataset, which is a crucial step before proceeding with further data processing and analysis.
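
Counting rows is a useful first check, and it can be complemented by counting distinct image IDs and the number of rows per diagnosis. The sketch below does this with the standard library on a tiny made-up CSV (the IDs and rows are hypothetical); on the real DataFrame the comparable pandas calls would be `metadata["image_id"].nunique()` and `metadata["dx"].value_counts()`.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a few rows of the metadata CSV.
sample_csv = io.StringIO(
    "image_id,dx\n"
    "IMG_1,nv\n"
    "IMG_2,nv\n"
    "IMG_3,mel\n"
    "IMG_4,bkl\n"
)
rows = list(csv.DictReader(sample_csv))

n_rows = len(rows)
n_unique_images = len({row["image_id"] for row in rows})  # distinct IDs
class_counts = Counter(row["dx"] for row in rows)         # rows per diagnosis

print(n_rows, n_unique_images)  # 4 4
print(class_counts["nv"])       # 2
```

If the two counts differ, some image IDs are duplicated and the row count overstates the number of images.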

In [None]:
# Split metadata into train and test sets
train_metadata, test_metadata = train_test_split(metadata, test_size=0.2, random_state=42)

# Save split metadata for easier loading
train_metadata.to_csv("train_metadata.csv", index=False)
test_metadata.to_csv("test_metadata.csv", index=False)

The provided code snippet is responsible for splitting the metadata of the HAM10000 dataset into training and testing sets and then saving these subsets to CSV files for easier future access.

1. **Splitting the Metadata**: The `train_test_split` function from Scikit-learn is used to divide the `metadata` DataFrame into two subsets: `train_metadata` and `test_metadata`. The `test_size=0.2` parameter specifies that 20% of the data should be allocated to the test set, while the remaining 80% will be used for training. The `random_state=42` parameter ensures that the split is reproducible, meaning that the same split will be obtained each time the code is run with this seed value.

2. **Saving the Split Metadata**: The `to_csv` method of the pandas DataFrame is used to save the training and testing metadata to CSV files named `train_metadata.csv` and `test_metadata.csv`, respectively. The `index=False` parameter ensures that the row indices are not included in the saved CSV files, keeping the files clean and focused on the actual data.

By splitting the metadata into training and testing sets, this code prepares the dataset for model training and evaluation. Saving these subsets to CSV files allows for quick and easy loading in future steps of the workflow, ensuring that the same data split is used consistently throughout the project.
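
To make the roles of `test_size` and `random_state` concrete, here is a minimal sketch of a seeded shuffle-and-cut split in plain Python. It is an illustration of the idea only, not Scikit-learn's implementation; the function name `seeded_split` and the sample IDs are hypothetical.

```python
import math
import random

def seeded_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed, then cut off the test share."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed -> same order every run
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

rows = [f"IMG_{i}" for i in range(10)]
train_a, test_a = seeded_split(rows)
train_b, test_b = seeded_split(rows)

print(len(train_a), len(test_a))  # 8 2
print(test_a == test_b)           # True
```

Repeating the call with the same seed returns the same partition, which is the property that makes later experiments comparable.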

In [None]:
# Directories containing images
image_dirs = [
    "../input/skin-cancer-mnist-ham10000/HAM10000_images_part_1",
    "../input/skin-cancer-mnist-ham10000/HAM10000_images_part_2"
]

The provided code snippet defines a list of directories where the images for the HAM10000 dataset are stored.

1. **Defining Image Directories**: The variable `image_dirs` is assigned a list containing two directory paths. These directories, `HAM10000_images_part_1` and `HAM10000_images_part_2`, are located within the `../input/skin-cancer-mnist-ham10000/` directory. Each directory contains a portion of the image files associated with the HAM10000 dataset.

2. **Purpose of Image Directories**: By specifying these directories, the code sets up the locations from which the images will be loaded. This is crucial for any subsequent steps that involve accessing and processing the images, such as loading them into a dataset class, applying transformations, or feeding them into a machine learning model.

This setup ensures that the code knows exactly where to find the image files, facilitating efficient data loading and management throughout the project.
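
Because the images are spread over more than one directory, each lookup has to find out which directory holds a given file. One possible alternative to probing every directory per image is to scan the directories once and keep a file-name-to-path dictionary. The sketch below shows that idea with a hypothetical helper, `build_image_index`, exercised on temporary directories rather than the real dataset folders.

```python
import os
import tempfile

def build_image_index(img_dirs, extension=".jpg"):
    """Map each image file name to its full path, scanning each directory once."""
    index = {}
    for img_dir in img_dirs:
        for name in os.listdir(img_dir):
            if name.endswith(extension):
                # Keep the first directory that contains the file.
                index.setdefault(name, os.path.join(img_dir, name))
    return index

# Two temporary directories stand in for the two image folders.
with tempfile.TemporaryDirectory() as part_1, tempfile.TemporaryDirectory() as part_2:
    open(os.path.join(part_1, "IMG_1.jpg"), "w").close()
    open(os.path.join(part_2, "IMG_2.jpg"), "w").close()
    index = build_image_index([part_1, part_2])
    in_part_2 = index["IMG_2.jpg"] == os.path.join(part_2, "IMG_2.jpg")

print(sorted(index))  # ['IMG_1.jpg', 'IMG_2.jpg']
print(in_part_2)      # True
```

A dataset class could build such an index in its constructor and then resolve each path with a single dictionary lookup.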

In [None]:
transform = transforms.Compose([
    transforms.Resize((64, 64)),  # Resize images to 64x64
    transforms.ToTensor(),         # Convert to PyTorch tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # Normalize
])

The provided code snippet defines a series of image transformations that will be applied to the images in the dataset. These transformations are composed using the `transforms.Compose` function from the Torchvision library, which allows multiple transformations to be chained together.

1. **Resizing Images**: The `transforms.Resize((64, 64))` transformation resizes each image to a fixed size of 64x64 pixels. This ensures that all images have the same dimensions, which is necessary for consistent input to a neural network. Because both dimensions are given explicitly, the original aspect ratio is not preserved.

2. **Converting to Tensor**: The `transforms.ToTensor()` transformation converts the image from a PIL Image or NumPy array to a PyTorch tensor with shape (channels, height, width). For 8-bit images it also rescales pixel values from [0, 255] to [0, 1]. This conversion is essential because PyTorch models require input data to be in tensor format.

3. **Normalizing**: The `transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])` transformation normalizes the image tensor. Each channel (Red, Green, Blue) is normalized by subtracting the mean value of 0.5 and dividing by the standard deviation of 0.5. This normalization scales the pixel values from [0, 1] to the range [-1, 1], which can help improve the convergence of neural network training.

By defining these transformations, the code ensures that all images are preprocessed in a consistent manner before being fed into a machine learning model. This preprocessing step is crucial for achieving good model performance and stability.
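
The normalization arithmetic is simple enough to check by hand. The sketch below applies the same formula to single pixel values in plain Python (the function names are illustrative, not Torchvision APIs) and shows that multiplying by 0.5 and adding 0.5 undoes it.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel normalization: subtract the mean, divide by the std."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Inverse of normalize: multiply by the std, add the mean."""
    return y * std + mean

for pixel in (0.0, 0.25, 1.0):
    y = normalize(pixel)
    print(pixel, "->", y, "->", denormalize(y))
# 0.0 -> -1.0 -> 0.0
# 0.25 -> -0.5 -> 0.25
# 1.0 -> 1.0 -> 1.0
```

The endpoints 0 and 1 map to -1 and 1, so a tensor produced by `ToTensor()` ends up in [-1, 1] after this step.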

In [None]:
# Datasets
train_dataset = HAM10000Dataset(csv_file="train_metadata.csv", img_dirs=image_dirs, transform=transform)
test_dataset = HAM10000Dataset(csv_file="test_metadata.csv", img_dirs=image_dirs, transform=transform)

# DataLoaders
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=4, shuffle=False, num_workers=2)

# Print dataset sizes
print(f"Number of training samples: {len(train_dataset)}")
print(f"Number of testing samples: {len(test_dataset)}")

The provided code snippet sets up the datasets and data loaders for training and testing a machine learning model using the HAM10000 dataset.

1. **Creating Datasets**: 
   - `train_dataset` and `test_dataset` are instances of the `HAM10000Dataset` class, which is a custom dataset class designed to handle the HAM10000 dataset.
   - The `train_dataset` is created using the metadata from `train_metadata.csv`, while the `test_dataset` is created using the metadata from `test_metadata.csv`.
   - Both datasets use the same list of image directories (`image_dirs`) and the same set of transformations (`transform`) defined earlier. These transformations include resizing the images, converting them to tensors, and normalizing them.

2. **Creating DataLoaders**: 
   - `train_loader` and `test_loader` are instances of the `DataLoader` class from PyTorch, which provides an efficient way to load data in batches.
   - `train_loader` is created using the `train_dataset` and is configured with a batch size of 4, shuffling enabled (`shuffle=True`), and 2 worker processes (`num_workers=2`) for loading data in parallel. Shuffling the training data helps to ensure that the model does not learn the order of the data, which can improve generalization.
   - `test_loader` is created using the `test_dataset` with the same batch size and number of worker processes, but shuffling is disabled (`shuffle=False`). This is because the order of the test data does not need to be randomized.

3. **Printing Dataset Sizes**: The two `print` statements output the number of samples in the training and testing datasets. This is useful for verifying that the datasets have been loaded correctly and contain the expected number of samples.

Overall, this setup prepares the data for training and evaluating a machine learning model by organizing it into manageable batches and ensuring consistent preprocessing.
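
The number of batches a loader yields per epoch follows directly from the dataset size and the batch size. The sketch below reproduces that arithmetic with a hypothetical helper, `num_batches`; the sample counts are made up, and `drop_last` mirrors the `DataLoader` option of the same name.

```python
import math

def num_batches(num_samples, batch_size, drop_last=False):
    """Batches per epoch; the last batch may be smaller unless it is dropped."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

print(num_batches(10, 4))                  # 3 (batches of 4, 4, and 2)
print(num_batches(10, 4, drop_last=True))  # 2
```

Since the final batch can be smaller than `batch_size`, code that consumes batches should read the size from the batch itself rather than assume a fixed value.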

In [None]:
# Visualize one batch of images
images, labels = next(iter(train_loader))
print(f"Image batch shape: {images.shape}")
print(f"Label batch shape: {labels.shape}")

# Display first 4 images
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 4, figsize=(12, 4))
for i in range(4):
    axes[i].imshow(images[i].permute(1, 2, 0).numpy() * 0.5 + 0.5)  # Denormalize
    axes[i].set_title(f"Label: {labels[i].item()}")
    axes[i].axis("off")
plt.show()

The provided code snippet is designed to visualize a batch of images from the training dataset, which helps in understanding the data and verifying that the preprocessing steps have been applied correctly.

1. **Loading a Batch of Images**: 
   - The line `images, labels = next(iter(train_loader))` retrieves the next batch of images and their corresponding labels from the `train_loader`. This batch is stored in the variables `images` and `labels`.
   - The `print` statements output the shapes of the image and label batches. This is useful for confirming that the batch size and image dimensions are as expected. The shape of `images` should be `(batch_size, channels, height, width)`, and the shape of `labels` should be `(batch_size,)`.

2. **Importing Matplotlib**: 
   - The `import matplotlib.pyplot as plt` statement imports the Matplotlib library, which is used for plotting and visualizing data.

3. **Creating a Plot**: 
   - The `fig, axes = plt.subplots(1, 4, figsize=(12, 4))` line creates a single figure of 12x4 inches containing a 1x4 grid of subplots. This layout is used to display the first four images in the batch.
   - The `for` loop iterates over the first four images in the batch. For each image:
     - `axes[i].imshow(images[i].permute(1, 2, 0).numpy() * 0.5 + 0.5)` displays the image. The `permute(1, 2, 0)` method rearranges the dimensions of the image tensor from (channels, height, width) to (height, width, channels), which is the format expected by Matplotlib. The `numpy()` method converts the tensor to a NumPy array, and the multiplication and addition operations denormalize the image (reversing the normalization applied during preprocessing).
     - `axes[i].set_title(f"Label: {labels[i].item()}")` sets the title of the subplot to the label of the image.
     - `axes[i].axis("off")` removes the axis ticks and labels for a cleaner display.

4. **Displaying the Plot**: 
   - The `plt.show()` command renders the plot and displays the images.

This visualization step is crucial for ensuring that the images are being loaded and preprocessed correctly, and it provides a quick way to inspect the data visually.

In [None]:
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, in_dim):
        super(SelfAttention, self).__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        batch, channels, height, width = x.size()
        proj_query = self.query_conv(x).view(batch, -1, width * height).permute(0, 2, 1)
        proj_key = self.key_conv(x).view(batch, -1, width * height)
        attention = torch.bmm(proj_query, proj_key)
        attention = torch.softmax(attention, dim=-1)

        proj_value = self.value_conv(x).view(batch, -1, width * height)
        out = torch.bmm(proj_value, attention.permute(0, 2, 1))
        out = out.view(batch, channels, height, width)
        out = self.gamma * out + x
        return out

class ResidualBlock(nn.Module):
    def __init__(self, in_channels):
        super(ResidualBlock, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(in_channels)
        )

    def forward(self, x):
        return x + self.block(x)

class Generator(nn.Module):
    def __init__(self, latent_dim, img_channels, img_size=64):
        super(Generator, self).__init__()
        
        self.latent_dim = latent_dim
        self.img_channels = img_channels
        self.img_size = img_size
        self.init_size = img_size // 8  # Downsample by 8 (adjusted for 64x64 output)
        self.fc = nn.Linear(latent_dim, 128 * self.init_size * self.init_size)

        self.upsample = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.ConvTranspose2d(128, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),
            SelfAttention(128),  # Self-Attention after first upscale
            
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            SelfAttention(32),  # Self-Attention in the middle layers
        )

        self.final_layer = nn.Sequential(
            nn.Conv2d(32, img_channels, kernel_size=3, stride=1, padding=1),
            nn.Tanh()  # Normalize output to [-1, 1]
        )

    def forward(self, z):
        out = self.fc(z)
        out = out.view(out.size(0), 128, self.init_size, self.init_size)
        out = self.upsample(out)
        img = self.final_layer(out)
        return img

# Instantiate the generator
latent_dim = 100  # Size of latent vector
img_channels = 3  # RGB images
img_size = 64  # Output image size

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator(latent_dim, img_channels, img_size).to(device)

# Test the generator
z = torch.randn(4, latent_dim).to(device)  # Random latent vector (batch size = 4)
generated_images = generator(z)

print(f"Generated image shape: {generated_images.shape}")  # Should be [4, 3, 64, 64]


The provided code snippet defines and tests a Generative Adversarial Network (GAN) generator model that includes self-attention and residual blocks to enhance image generation quality.

1. **Imports**: The code begins by importing the necessary PyTorch modules, including `torch` and `torch.nn`.

2. **Self-Attention Class**: 
   - The `SelfAttention` class is defined to implement a self-attention mechanism. This mechanism lets every spatial position in a feature map draw on features from every other position, which can help the generator keep distant parts of an image consistent.
   - The `__init__` method initializes 1x1 convolutional layers for query, key, and value projections, and a learnable parameter `gamma`.
   - The `forward` method computes an attention map over all pairs of spatial positions, applies it to the value projection, and adds the result, scaled by `gamma`, to the input. Because `gamma` is initialized to zero, the layer starts out as an identity mapping.

3. **ResidualBlock Class**: 
   - The `ResidualBlock` class implements a residual block, which helps in training deep networks by allowing gradients to flow through skip connections.
   - The `__init__` method sets up two convolutional layers with batch normalization and ReLU activation.
   - The `forward` method adds the input to the output of the block, creating a residual connection.

4. **Generator Class**: 
   - The `Generator` class defines the architecture of the GAN generator.
   - The `__init__` method initializes the generator with a fully connected layer to project the latent vector, followed by a series of upsampling layers, residual blocks, and self-attention layers.
   - The `forward` method processes the latent vector through these layers to generate an image.
   - The `upsample` sequence includes batch normalization, transposed convolutions for upsampling, residual blocks, and self-attention layers to refine the generated images.
   - The `final_layer` maps the features to `img_channels` channels and squashes the output to the range [-1, 1] using a Tanh activation function.

5. **Instantiating and Testing the Generator**: 
   - The generator is instantiated with a latent dimension of 100, 3 image channels (for RGB images), and an output image size of 64x64 pixels.
   - The generator is moved to the appropriate device (GPU if available, otherwise CPU).
   - A random latent vector `z` is generated, and the generator produces a batch of images.
   - The shape of the generated images is printed to verify that it matches the expected dimensions `[4, 3, 64, 64]`.

This code sets up a sophisticated GAN generator that leverages self-attention and residual connections to produce high-quality images, and it includes a test to ensure the generator works as expected.
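
The expected 64x64 output can be verified with the standard output-size formula for a transposed convolution. The sketch below applies it to the three `ConvTranspose2d` layers in the `upsample` stack (kernel size 4, stride 2, padding 1); the helper name `conv_transpose_out` is illustrative, and the formula assumes dilation 1 and no output padding, which matches the defaults used above.

```python
def conv_transpose_out(size, kernel_size=4, stride=2, padding=1):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel_size

img_size = 64
size = img_size // 8   # init_size used to reshape the projected latent vector
sizes = [size]
for _ in range(3):     # three ConvTranspose2d layers in the upsample stack
    size = conv_transpose_out(size)
    sizes.append(size)

print(sizes)  # [8, 16, 32, 64]
```

Each layer doubles the spatial size, so starting from `img_size // 8` is what makes three upsampling steps land exactly on `img_size`.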

In [None]:
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, in_dim):
        super(SelfAttention, self).__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        batch, channels, height, width = x.size()
        proj_query = self.query_conv(x).view(batch, -1, width * height).permute(0, 2, 1)
        proj_key = self.key_conv(x).view(batch, -1, width * height)
        attention = torch.bmm(proj_query, proj_key)
        attention = torch.softmax(attention, dim=-1)

        proj_value = self.value_conv(x).view(batch, -1, width * height)
        out = torch.bmm(proj_value, attention.permute(0, 2, 1))
        out = out.view(batch, channels, height, width)
        out = self.gamma * out + x
        return out

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample=True):
        super(ResidualBlock, self).__init__()
        self.downsample = downsample
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1) if downsample else nn.Identity()
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.AvgPool2d(2) if downsample else nn.Identity()

    def forward(self, x):
        shortcut = self.shortcut(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += shortcut
        out = self.relu(out)
        out = self.pool(out)
        return out

class Discriminator(nn.Module):
    def __init__(self, img_channels, img_size=64):
        super(Discriminator, self).__init__()

        self.model = nn.Sequential(
            ResidualBlock(img_channels, 64, downsample=True),            # 64x64 -> 32x32
            SelfAttention(64),
            ResidualBlock(64, 128, downsample=True),           # 32x32 -> 16x16
            SelfAttention(128),
            ResidualBlock(128, 256, downsample=True),          # 16x16 -> 8x8
            ResidualBlock(256, 512, downsample=True)           # 8x8 -> 4x4
        )

        self.final_layer = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 4 * 4, 1),  # Final score output
            nn.Sigmoid()  # Outputs probability of being real or fake
        )

    def forward(self, img):
        out = self.model(img)
        out = self.final_layer(out)
        return out

# Instantiate the discriminator
img_channels = 3  # RGB images
img_size = 64  # Input image size

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
discriminator = Discriminator(img_channels, img_size).to(device)

# Test the discriminator
batch_size = 4
test_images = torch.randn(batch_size, img_channels, img_size, img_size).to(device)  # Fake images batch
output = discriminator(test_images)

print(f"Discriminator output shape: {output.shape}")  # Should be [4, 1]


The provided code snippet defines and tests a Discriminator model for a Generative Adversarial Network (GAN) using PyTorch. The Discriminator is designed to distinguish between real and fake images, incorporating self-attention and residual blocks to enhance its performance.

1. **Imports**: The code begins by importing the necessary PyTorch modules, including `torch` and `torch.nn`.

2. **SelfAttention Class**:
   - The `SelfAttention` class implements a self-attention mechanism, which allows the model to focus on different parts of the image, improving its ability to capture fine details.
   - The `__init__` method initializes convolutional layers for query, key, and value projections, and a learnable parameter `gamma`.
   - The `forward` method computes the attention map and applies it to the input feature map, enhancing the representation.

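3. **ResidualBlock Class**:
   - The `ResidualBlock` class implements a residual block, which helps in training deep networks by allowing gradients to flow through skip connections. It replaces the single-argument `ResidualBlock` defined in the generator cell, so the generator cell must be re-run before a new `Generator` is instantiated.
   - The `__init__` method sets up two convolutional layers with batch normalization and ReLU activation, along with a shortcut connection and optional downsampling.
   - The `forward` method adds the shortcut (a 1x1 convolution of the input when `downsample=True`, otherwise the input itself) to the output of the two convolutions, applies ReLU, and optionally halves the feature map with average pooling.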

4. **Discriminator Class**:
   - The `Discriminator` class defines the architecture of the GAN discriminator.
   - The `__init__` method initializes the discriminator with a series of residual blocks and self-attention layers, followed by a final layer that outputs a probability score.
   - The `model` sequence includes residual blocks with downsampling, self-attention layers, and batch normalization to refine the feature maps.
   - The `final_layer` flattens the feature maps and applies a linear layer followed by a sigmoid activation to output the probability of the image being real or fake.

5. **Instantiating and Testing the Discriminator**:
   - The discriminator is instantiated with 3 image channels (for RGB images) and an input image size of 64x64 pixels.
   - The discriminator is moved to the appropriate device (GPU if available, otherwise CPU).
   - A batch of random fake images is generated, and the discriminator produces an output for these images.
   - The shape of the discriminator's output is printed to verify that it matches the expected dimensions `[4, 1]`, indicating the batch size and the probability score for each image.

This code sets up a sophisticated GAN discriminator that leverages self-attention and residual connections to effectively distinguish between real and fake images, and it includes a test to ensure the discriminator works as expected.
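
Two numbers in the discriminator are worth checking by hand: the input size of the final linear layer and the size of the attention maps. The sketch below traces the spatial size through the four downsampling blocks and counts the entries of an attention map for one sample. The helper names are hypothetical and the calculation is plain arithmetic, not a call into PyTorch.

```python
def trace_discriminator(img_size=64, channels=(64, 128, 256, 512)):
    """Follow the feature-map shape through blocks that each halve height and width."""
    size = img_size
    shapes = []
    for ch in channels:
        size //= 2  # AvgPool2d(2) halves each spatial dimension
        shapes.append((ch, size, size))
    return shapes

def attention_entries(height, width):
    """Number of entries in the (H*W) x (H*W) attention map for one sample."""
    n = height * width
    return n * n

shapes = trace_discriminator()
final_ch, final_h, final_w = shapes[-1]
flat_features = final_ch * final_h * final_w

print(shapes)                     # [(64, 32, 32), (128, 16, 16), (256, 8, 8), (512, 4, 4)]
print(flat_features)              # 8192
print(attention_entries(32, 32))  # 1048576
```

The flattened size matches the `512 * 4 * 4` passed to `nn.Linear`. The attention map grows with the square of the number of spatial positions, which is why self-attention is usually placed on smaller feature maps.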

In [None]:
import warnings
warnings.filterwarnings("ignore")

The provided code snippet is used to manage warning messages in Python.

1. **Importing the Warnings Module**: The `warnings` module is imported, which is a built-in Python module used to handle warning messages. Warnings are typically issued to alert the user about potential issues in the code that do not necessarily stop the execution but might lead to unexpected behavior.

2. **Filtering Warnings**: The `warnings.filterwarnings("ignore")` function call is used to suppress all warning messages. By setting the filter to "ignore", the code instructs Python to ignore any warnings that would normally be printed to the console. This can be useful in scenarios where the developer is aware of certain non-critical warnings and wants to prevent them from cluttering the output.

While suppressing warnings can make the output cleaner and easier to read, it is important to use this approach judiciously. Ignoring warnings without understanding their cause can sometimes hide underlying issues that might affect the program's correctness or performance.
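
A more selective option is to silence only one warning category and to limit the effect to a block of code. The sketch below illustrates this with the standard `warnings` module; the function `noisy` is a made-up stand-in for library code that emits warnings.

```python
import warnings

def noisy():
    """Hypothetical function that emits two different kinds of warning."""
    warnings.warn("deprecated call", DeprecationWarning)
    warnings.warn("user-facing issue", UserWarning)
    return 42

# Filters set inside catch_warnings are restored when the block exits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                                 # surface everything...
    warnings.filterwarnings("ignore", category=DeprecationWarning)  # ...except this category
    result = noisy()

print(result)                         # 42
print([w.category for w in caught])   # [<class 'UserWarning'>]
```

Only the `DeprecationWarning` is dropped, while the `UserWarning` still reaches the caller, so unexpected problems remain visible.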

In [None]:
from torch import optim
from tqdm import tqdm

def train_gan(generator, discriminator, train_loader, latent_dim, device, epochs=1000, lr=0.0002, beta1=0.5, beta2=0.999):
    generator.to(device)
    discriminator.to(device)

    # The discriminator already ends in nn.Sigmoid, so use BCELoss on its
    # probabilities; BCEWithLogitsLoss would apply a second sigmoid.
    criterion = nn.BCELoss()
    optimizer_G = optim.Adam(generator.parameters(), lr=lr, betas=(beta1, beta2))
    optimizer_D = optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, beta2))
    
    # Mixed precision only applies on CUDA; keep one gradient scaler per optimizer
    use_amp = str(device).startswith("cuda")
    scaler_G = torch.cuda.amp.GradScaler(enabled=use_amp)
    scaler_D = torch.cuda.amp.GradScaler(enabled=use_amp)

    for epoch in range(epochs):
        generator.train()
        discriminator.train()
        epoch_loss_G = 0.0
        epoch_loss_D = 0.0

        for real_images, _ in tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs}"):
            batch_size = real_images.size(0)
            real_images = real_images.to(device)

            # Target labels: 1 for real images, 0 for generated images
            valid = torch.ones((batch_size, 1), device=device)
            fake = torch.zeros((batch_size, 1), device=device)

            # Train Generator
            optimizer_G.zero_grad()
            z = torch.randn(batch_size, latent_dim, device=device)
            
            with torch.cuda.amp.autocast(enabled=use_amp):  # Mixed precision forward pass
                generated_images = generator(z)
                pred_fake = discriminator(generated_images)
            # BCELoss is not autocast-safe, so compute it in float32 outside the context
            g_loss = criterion(pred_fake.float(), valid)

            scaler_G.scale(g_loss).backward()
            scaler_G.step(optimizer_G)
            scaler_G.update()
            epoch_loss_G += g_loss.item()

            # Train Discriminator
            optimizer_D.zero_grad()  # Also clears gradients left by the generator step
            with torch.cuda.amp.autocast(enabled=use_amp):
                pred_real = discriminator(real_images)
                pred_fake = discriminator(generated_images.detach())
            real_loss = criterion(pred_real.float(), valid)
            fake_loss = criterion(pred_fake.float(), fake)
            d_loss = (real_loss + fake_loss) / 2

            scaler_D.scale(d_loss).backward()
            scaler_D.step(optimizer_D)
            scaler_D.update()
            epoch_loss_D += d_loss.item()

            # Clear cache to reduce memory fragmentation
            torch.cuda.empty_cache()

        # Report the mean loss per batch rather than the running sum
        n_batches = max(len(train_loader), 1)
        print(f"Epoch [{epoch+1}/{epochs}] | Generator Loss: {epoch_loss_G / n_batches:.4f} | Discriminator Loss: {epoch_loss_D / n_batches:.4f}")

    print("Training completed.")


# Call the train_gan function with the train_loader, generator, and discriminator
train_gan(generator, discriminator, train_loader, latent_dim, device)


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models
import warnings
warnings.filterwarnings("ignore")

# Define EfficientNetV2 model for HAM10000 with 7 classes
class EfficientNetV2Classifier(nn.Module):
    def __init__(self, num_classes=7):  # 7 classes for HAM10000
        super(EfficientNetV2Classifier, self).__init__()
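        # `pretrained=True` is deprecated; `weights=` loads the ImageNet-pretrained weights
        self.efficientnet_v2 = models.efficientnet_v2_s(weights=models.EfficientNet_V2_S_Weights.DEFAULT)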
        
        in_features = self.efficientnet_v2.classifier[1].in_features
        self.efficientnet_v2.classifier = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(in_features, num_classes)
        )

    def forward(self, x):
        return self.efficientnet_v2(x)

# Initialize the model
model_EfficientNetV2 = EfficientNetV2Classifier(num_classes=7)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_EfficientNetV2.parameters(), lr=0.001)
epochs = 20

# Training loop
def train_model(model, train_loader, test_loader, criterion, optimizer, epochs=20):
    for epoch in range(epochs):
        # validate_model() switches to eval mode, so re-enable training mode every epoch
        model.train()
        running_loss = 0.0
        total_correct = 0
        
        for data, labels in train_loader:
            data, labels = data.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(data)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            predicted = outputs.argmax(dim=1)  # Class with the highest logit
            total_correct += (predicted == labels).sum().item()
        
        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = total_correct / len(train_loader.dataset)
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.4f}')
        
        # Validation after each epoch
        validate_model(model, test_loader)

# Validation loop
def validate_model(model, test_loader):
    model.eval()
    total_correct = 0
    total_loss = 0.0
    
    with torch.no_grad():
        for data, labels in test_loader:
            data, labels = data.to(device), labels.to(device)
            outputs = model(data)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            
            _, predicted = torch.max(outputs.data, 1)
            total_correct += (predicted == labels).sum().item()
    
    avg_loss = total_loss / len(test_loader)
    accuracy = total_correct / len(test_loader.dataset)
    print(f'Validation Loss: {avg_loss:.4f}, Validation Accuracy: {accuracy:.4f}')

model_EfficientNetV2.to(device)

# Train the model
train_model(model_EfficientNetV2, train_loader, test_loader, criterion, optimizer, epochs=epochs)

In [None]:
# Define ShuffleNetV2 model for HAM10000 with 7 classes
class ShuffleNetV2Classifier(nn.Module):
    def __init__(self, num_classes=7):  # 7 classes for HAM10000
        super(ShuffleNetV2Classifier, self).__init__()
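        # `pretrained=True` is deprecated since torchvision 0.13; `weights=` loads the ImageNet-pretrained weights
        self.shufflenet_v2 = models.shufflenet_v2_x1_0(weights=models.ShuffleNet_V2_X1_0_Weights.DEFAULT)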
        
        # Modify the last fully connected layer to match the number of classes
        in_features = self.shufflenet_v2.fc.in_features
        self.shufflenet_v2.fc = nn.Sequential(
            nn.Dropout(p=0.3),
            nn.Linear(in_features, num_classes)
        )

    def forward(self, x):
        return self.shufflenet_v2(x)

# Initialize the model
model_ShuffleNetV2 = ShuffleNetV2Classifier(num_classes=7)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_ShuffleNetV2.parameters(), lr=0.001)
epochs = 20

# Training loop
def train_model(model, train_loader, test_loader, criterion, optimizer, epochs=20):
    for epoch in range(epochs):
        # validate_model() switches to eval mode, so re-enable training mode every epoch
        model.train()
        running_loss = 0.0
        total_correct = 0
        
        for data, labels in train_loader:
            data, labels = data.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(data)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            predicted = outputs.argmax(dim=1)  # Class with the highest logit
            total_correct += (predicted == labels).sum().item()
        
        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = total_correct / len(train_loader.dataset)
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.4f}')
        
        # Validation after each epoch
        validate_model(model, test_loader)

# Validation loop
def validate_model(model, test_loader):
    model.eval()
    total_correct = 0
    total_loss = 0.0
    
    with torch.no_grad():
        for data, labels in test_loader:
            data, labels = data.to(device), labels.to(device)
            outputs = model(data)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            
            _, predicted = torch.max(outputs.data, 1)
            total_correct += (predicted == labels).sum().item()
    
    avg_loss = total_loss / len(test_loader)
    accuracy = total_correct / len(test_loader.dataset)
    print(f'Validation Loss: {avg_loss:.4f}, Validation Accuracy: {accuracy:.4f}')

model_ShuffleNetV2.to(device)

# Train the model
train_model(model_ShuffleNetV2, train_loader, test_loader, criterion, optimizer, epochs=epochs)

In [None]:
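def create_support_set(generator, model_EfficientNetV2, model_ShuffleNetV2, labels, noise_dim=128):
    """Generate one image per label and keep those that both classifiers assign to that label."""
    gen_device = next(generator.parameters()).device
    labels = labels.to(gen_device)
    noise = torch.randn(labels.size(0), noise_dim, device=gen_device)  # Random noise for generator

    # Inference only; the classifiers are assumed to be in eval mode and on the same device
    with torch.no_grad():
        # Assumes a label-conditioned generator; train_gan above calls generator(z) without labels
        created_imgs = generator(noise, labels)
        # The classifiers return logits of shape [B, num_classes]; argmax gives the predicted class
        efficientnet_preds = model_EfficientNetV2(created_imgs).argmax(dim=1)
        shufflenet_preds = model_ShuffleNetV2(created_imgs).argmax(dim=1)

    # Keep only the images on which both classifiers agree with the requested label
    agree = (efficientnet_preds == labels) & (shufflenet_preds == labels)
    return created_imgs[agree] if agree.any() else None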
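
The agreement check in `create_support_set` can be illustrated without any model. The sketch below is a minimal, framework-free illustration that assumes each classifier's output has already been reduced to one predicted class index per image. `agreeing_indices` is a hypothetical helper name introduced here, not part of the notebook.

```python
def agreeing_indices(preds_a, preds_b, labels):
    """Return the positions where both classifiers predict the requested label."""
    if not (len(preds_a) == len(preds_b) == len(labels)):
        raise ValueError("all inputs must have the same length")
    return [
        i
        for i, (a, b, y) in enumerate(zip(preds_a, preds_b, labels))
        if a == y and b == y
    ]


# Hypothetical predictions for four generated images
labels = [0, 3, 5, 2]
kept = agreeing_indices([0, 3, 1, 2], [0, 4, 5, 2], labels)
print(kept)  # [0, 3]
```

Only the first and last images survive, because the two classifiers disagree with the label on the other two.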

In [None]:
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    def __init__(self, in_channels=3, base_features=64):
        super(CNNEncoder, self).__init__()
        
        # Encoder block 1
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels, base_features, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features, base_features, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # Reduces 64x64 -> 32x32
        )
        
        # Encoder block 2
        self.block2 = nn.Sequential(
            nn.Conv2d(base_features, base_features * 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 2, base_features * 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # Reduces 32x32 -> 16x16
        )
        
        # Encoder block 3
        self.block3 = nn.Sequential(
            nn.Conv2d(base_features * 2, base_features * 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 4),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 4, base_features * 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # Reduces 16x16 -> 8x8
        )
        
        # Encoder block 4
        self.block4 = nn.Sequential(
            nn.Conv2d(base_features * 4, base_features * 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 8),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 8, base_features * 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 8),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # Reduces 8x8 -> 4x4
        )

    def forward(self, x):
        # Apply each encoder block to the input
        x = self.block1(x)  # 64x64 -> 32x32
        x = self.block2(x)  # 32x32 -> 16x16
        x = self.block3(x)  # 16x16 -> 8x8
        x = self.block4(x)  # 8x8 -> 4x4
        return x
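
The shape comments in `CNNEncoder` can be checked with a little arithmetic. The sketch below assumes what the code above shows: each block keeps the spatial size in its 3x3 convolutions (padding of 1), halves it in `MaxPool2d(2, 2)`, and doubles the channel count from one block to the next. `trace_encoder_shapes` is a hypothetical helper used only for this illustration.

```python
def trace_encoder_shapes(height, width, base_features=64, num_blocks=4):
    """Return the (channels, height, width) produced by each encoder block."""
    shapes = []
    channels = base_features
    for _ in range(num_blocks):
        # 3x3 convs with padding=1 keep the size; MaxPool2d(2, 2) halves it (floor division)
        height //= 2
        width //= 2
        shapes.append((channels, height, width))
        channels *= 2
    return shapes


for block, shape in enumerate(trace_encoder_shapes(64, 64), start=1):
    print(f"block{block}: {shape}")
# block1: (64, 32, 32)
# block2: (128, 16, 16)
# block3: (256, 8, 8)
# block4: (512, 4, 4)
```

With the default `base_features=64`, the encoder therefore emits 512 channels, which is the value a downstream attention module needs as its feature dimension.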


In [None]:
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, feature_dim, num_heads=4):
        super(AttentionModule, self).__init__()
        
        self.num_heads = num_heads
        self.head_dim = feature_dim // num_heads
        
        # Linear transformations for multi-head attention
        self.query_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        self.key_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        self.value_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        
        # Multi-head attention mechanism
        self.attn_heads = nn.ModuleList(
            [nn.Sequential(
                nn.Conv2d(self.head_dim, self.head_dim, kernel_size=1),
                nn.Softmax(dim=-1)  # Normalize over the last axis of the attention map
            ) for _ in range(num_heads)]
        )
        
        # Channel attention to recalibrate feature maps
        self.channel_attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(feature_dim, feature_dim // 16, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(feature_dim // 16, feature_dim, kernel_size=1),
            nn.Sigmoid()
        )
        
        # Spatial attention to emphasize important regions in the spatial dimension
        self.spatial_attention = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid()
        )
        
        # Final 1x1 conv to combine outputs
        self.output_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
    
    def forward(self, features):
        # Compute query, key, and value maps for multi-head attention
        queries = self.query_conv(features)  # [B, C, H, W]
        keys = self.key_conv(features)       # [B, C, H, W]
        values = self.value_conv(features)   # [B, C, H, W]
        
        B, C, H, W = queries.size()
        queries = queries.view(B, self.num_heads, self.head_dim, H * W)  # [B, heads, head_dim, H*W]
        keys = keys.view(B, self.num_heads, self.head_dim, H * W)        # [B, heads, head_dim, H*W]
        values = values.view(B, self.num_heads, self.head_dim, H * W)    # [B, heads, head_dim, H*W]
        
        # Multi-head attention (channel-wise within each head, so it works for any H and W)
        attention_outputs = []
        for i in range(self.num_heads):
            attn_weights = torch.bmm(queries[:, i], keys[:, i].transpose(1, 2))  # [B, head_dim, head_dim]
            # Add a singleton height axis so the 1x1 conv + softmax in attn_heads can be applied
            attn_weights = self.attn_heads[i](attn_weights.unsqueeze(2)).squeeze(2)  # [B, head_dim, head_dim]
            attn_output = torch.bmm(attn_weights, values[:, i])  # [B, head_dim, H*W]
            attention_outputs.append(attn_output.view(B, self.head_dim, H, W))
        
        # Concatenate all attention head outputs
        multi_head_output = torch.cat(attention_outputs, dim=1)  # [B, C, H, W]
        
        # Channel Attention
        channel_attn_weights = self.channel_attention(multi_head_output)
        channel_attn_output = multi_head_output * channel_attn_weights  # Element-wise multiplication (recalibration)
        
        # Spatial Attention
        avg_pool = torch.mean(channel_attn_output, dim=1, keepdim=True)  # Average pooling across channels
        max_pool = torch.max(channel_attn_output, dim=1, keepdim=True)[0]  # Max pooling across channels
        spatial_attn_weights = self.spatial_attention(torch.cat([avg_pool, max_pool], dim=1))
        spatial_attn_output = channel_attn_output * spatial_attn_weights  # Element-wise multiplication (spatial recalibration)
        
        # Final 1x1 conv to produce the final attention output
        output = self.output_conv(spatial_attn_output)
        return output


In [None]:
import torch
import torch.nn as nn

class MTUNet2(nn.Module):
    def __init__(self, in_channels=3, base_features=64, num_classes=5, feature_dim=512, num_heads=4):
        super(MTUNet2, self).__init__()
        
        # Complex CNN Encoder shared by both query and support
        self.encoder = CNNEncoder(in_channels, base_features)
        
        # Complex Attention mechanism
        self.attn_module = AttentionModule(feature_dim, num_heads=num_heads)
        
        # Classification Decoder
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(base_features*16*8*8, 1024),  # 2 * (base_features*8) channels on an 8x8 map, i.e. 128x128 inputs
            nn.ReLU(),
            nn.Linear(1024, num_classes)
        )
    
    def forward(self, query, support):
        # Step 1: Extract features from the query images
        # Shape: [B, base_features*8, H/16, W/16], i.e. [B, 512, 8, 8] for 128x128 inputs
        query_features = self.encoder(query)
        
        # Step 2: Encode the N support images in one pass and average them into a single prototype
        support_features = self.encoder(support).mean(dim=0, keepdim=True)  # [1, 512, 8, 8]
        
        # Step 3: Apply attention to both query and support features
        query_attn = self.attn_module(query_features)  # Attention on query
        support_attn = self.attn_module(support_features)  # Attention on support
        
        # Step 4: Broadcast the support prototype over the batch and concatenate along channels
        support_attn = support_attn.expand(query_attn.size(0), -1, -1, -1)
        combined_features = torch.cat((query_attn, support_attn), dim=1)  # [B, 512 + 512 = 1024, 8, 8]
        
        # Step 5: Classification Decoder (use the combined query-support features)
        classification_output = self.classifier(combined_features)
        
        return classification_output
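
The first `nn.Linear` in the classifier is sized as `base_features*16*8*8`, which only matches one input resolution. The sketch below works out the flattened size for a given image size, assuming the encoder ends with `base_features*8` channels after four 2x poolings and that query and support features are concatenated along the channel axis. `classifier_in_features` is a hypothetical helper, not part of the notebook.

```python
def classifier_in_features(height, width, base_features=64, num_blocks=4):
    """Flattened size of the concatenated query + support feature map."""
    encoder_channels = base_features * 2 ** (num_blocks - 1)  # 512 for base_features=64
    out_h = height // 2 ** num_blocks  # each block halves the spatial size
    out_w = width // 2 ** num_blocks
    return 2 * encoder_channels * out_h * out_w  # factor 2: query and support are concatenated


print(classifier_in_features(128, 128))  # 65536, equal to 64 * 16 * 8 * 8
print(classifier_in_features(64, 64))    # 16384, equal to 64 * 16 * 4 * 4
```

Under these assumptions the hard-coded layer fits 128x128 inputs; for 64x64 inputs the `8*8` factor would need to become `4*4`.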


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Initialize the model, loss function, and optimizer
# Note: num_classes must cover every label in train_loader (the classifiers above use 7 for HAM10000)
model = MTUNet2(in_channels=3, base_features=64, num_classes=5).to(device)
criterion_cls = nn.CrossEntropyLoss()  # For classification output
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function
def train(model, train_loader, criterion_cls, optimizer, epoch):
    model.train()
    running_loss = 0.0
    correct_cls = 0
    total = 0
    
    # Iterate over (images, labels) batches; enumerate() would yield (index, batch) instead
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        
        # Clear gradients
        optimizer.zero_grad()

        # Creating support set
        support = create_support_set(generator, model_EfficientNetV2, model_ShuffleNetV2, target, noise_dim=128)
        if support is None:
            # Fallback (assumption): reuse the query batch when no generated image is returned
            support = data

        # Forward pass
        classification_output = model(data, support)
        
        # Compute loss
        loss_cls = criterion_cls(classification_output, target)  # Assuming target is for classification
        
        # Backward pass
        loss_cls.backward()
        optimizer.step()

        # Accumulate the running loss
        running_loss += loss_cls.item()

        # Compute accuracy for classification output
        predicted = classification_output.argmax(dim=1)
        total += target.size(0)
        correct_cls += (predicted == target).sum().item()

    accuracy = 100 * correct_cls / total
    
    return running_loss / len(train_loader), accuracy


# Evaluation function
def evaluate(model, test_loader, criterion_cls):
    model.eval()
    test_loss = 0.0
    correct_cls = 0
    total = 0

    with torch.no_grad():
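        for data, target in test_loader:
            data, target = data.to(device), target.to(device)

            # Forward pass. MTUNet2.forward requires a support set; as an assumption, the query
            # batch doubles as its own support so that no test labels are used to build one.
            classification_output = model(data, data)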
            
            # Compute loss
            loss_cls = criterion_cls(classification_output, target)
            
            test_loss += loss_cls.item()

            # Compute accuracy for classification output
            _, predicted = torch.max(classification_output.data, 1)
            total += target.size(0)
            correct_cls += (predicted == target).sum().item()

    accuracy = 100 * correct_cls / total
    avg_loss = test_loss / len(test_loader)
    
    return avg_loss, accuracy


# Main training loop
num_epochs = 500
for epoch in range(1, num_epochs + 1):
    train_loss, train_accuracy = train(model, train_loader, criterion_cls, optimizer, epoch)
    print(f'Epoch [{epoch}], Training Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%')

    test_loss, test_accuracy = evaluate(model, test_loader, criterion_cls)
    print(f'Epoch [{epoch}], Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%')
    print()