In [None]:
import os

import pandas as pd

from sklearn.model_selection import train_test_split

from torchvision import transforms, datasets

from torch.utils.data import Dataset, DataLoader

from PIL import Image

The provided code snippet imports several essential libraries and modules that are commonly used in data science, machine learning, and deep learning projects.



1. **os**: This module provides a way of using operating system-dependent functionality like reading or writing to the file system. It is useful for tasks such as navigating the file system, handling file paths, and manipulating directories.


2. **pandas as pd**: Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like DataFrames, which are particularly useful for handling and analyzing structured data. The alias pd is a common convention to make the code more concise.

3. **train_test_split from sklearn.model_selection**: This function is part of the scikit-learn library, which is widely used for machine learning tasks. The train_test_split function is used to split a dataset into training and testing sets, which is a crucial step in building and evaluating machine learning models.

4. **transforms and datasets from torchvision**: Torchvision is a library that provides tools for computer vision tasks. The transforms module includes common image transformations that are often used in preprocessing steps, such as resizing, cropping, and normalizing images. The datasets module provides access to popular datasets and utilities to load them.

5. **Dataset and DataLoader from torch.utils.data**: These classes are part of PyTorch, a deep learning framework. The Dataset class is an abstract class representing a dataset, and the DataLoader class provides an iterable over a dataset, with support for batching, shuffling, and parallel data loading. These are essential for efficiently handling large datasets during training and evaluation of deep learning models.

6. **Image from PIL**: The Python Imaging Library (PIL), installed today as the Pillow package, adds image processing capabilities to Python. The Image module opens, manipulates, and saves many image file formats. In this notebook it loads the JPEG files before the torchvision transforms are applied.

Together, these imports set up a robust environment for handling data, preprocessing images, and building machine learning and deep learning models.

In [71]:
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import os
import pandas as pd
import random
class HAM10000Dataset(Dataset):

    def __init__(self, csv_file, img_dirs, transform=None, support_set_size=5):

        self.data = pd.read_csv(csv_file)
        self.img_dirs = img_dirs
        self.transform = transform
        self.support_set_size = support_set_size
        # Pre-compute label mapping (sorted so the ids do not depend on row order)

        self.label_map = {label: idx for idx, label in enumerate(sorted(self.data['dx'].unique()))}

 

    def __len__(self):
    
        return len(self.data)
    
     
    
    def __getitem__(self, idx):
    
        # Obtain the query image
    
        question_name = self.data.iloc[idx]['image_id'] + '.jpg'
    
        label = self.data.iloc[idx]['dx']
    
     
    
        query_image = self._load_image(question_name)
    
     
    
        # Get the processed label
    
        label_id = self.label_map[label]
    
     
    
        # Construct support set
    
        support_images = []
    
        label_class_data = self.data[self.data['dx'] == label]
    
        support_indices = random.sample(list(label_class_data.index), self.support_set_size)
    
     
    
        for support_idx in support_indices:
    
            support_name = self.data.iloc[support_idx]['image_id'] + '.jpg'
    
            support_image = self._load_image(support_name)
    
            support_images.append(support_image)
    
     
    
        return query_image, torch.stack(support_images), label_id
    
     
    
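    def _load_image(self, img_name):
        # Search each directory in order and return the first match
        for img_dir in self.img_dirs:
            img_path = os.path.join(img_dir, img_name)
            if os.path.exists(img_path):
                image = Image.open(img_path).convert('RGB')
                if self.transform:
                    image = self.transform(image)
                return image
        # Fail loudly instead of implicitly returning None
        raise FileNotFoundError(f"{img_name} not found in any of: {self.img_dirs}")

The provided code defines a custom dataset class named HAM10000Dataset that inherits from PyTorch's Dataset. Each item is a small episode: a query image, a support set of images from the same class, and the class label.

**Initialization (`__init__` method)**: The constructor takes four parameters: csv_file, img_dirs, an optional transform, and support_set_size (default 5). The csv_file is expected to be a CSV file containing metadata about the images, including the image_id and dx (diagnosis label) columns. The img_dirs is a list of directories where the images are stored. The transform parameter allows for optional image transformations (e.g., resizing, normalization) to be applied to the images. The constructor reads the CSV file into a pandas DataFrame, stores the image directories, transform, and support set size, and builds label_map, a dictionary from each unique dx value to an integer id. Because the map is built separately from each CSV file, the training and test datasets can end up assigning different ids to the same class, for example when a rare class is missing from the small training split.

**Length (`__len__` method)**: This method returns the number of samples in the dataset by returning the length of the DataFrame. This is a required method for PyTorch datasets, enabling functions like len(dataset) to work correctly.

**Get Item (`__getitem__` method)**: This method retrieves a single sample from the dataset. It takes an index idx as input and performs the following steps:
    Looks up the image name in the DataFrame using the provided index and appends the .jpg extension.
    Loads the query image through _load_image, which searches the specified directories, opens the first match, converts it to RGB, and applies the transform if one is provided. If the image is not found in any directory, a FileNotFoundError is raised.
    Retrieves the label for the image from the DataFrame and maps it to an integer using label_map.
    Selects all rows with the same label and draws support_set_size of them with random.sample, then loads each one as a support image. random.sample raises a ValueError if the class has fewer than support_set_size rows, and the query image itself may be drawn into its own support set.
    Returns a tuple containing the query image, the stacked support images, and the integer label. torch.stack requires the transform to produce tensors.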

This custom dataset class is essential for loading and preprocessing the HAM10000 dataset, making it ready for training and evaluating machine learning models. It handles the complexities of locating images across multiple directories, reading image files, and applying necessary transformations.
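To make the shape of each returned item concrete, here is a minimal sketch of the same query/support pattern. It is an illustration under assumptions, not the notebook's class: `ToyEpisodeDataset` is a hypothetical name, random tensors stand in for the JPEG files, and support images are drawn with replacement (`random.choices`) so that small classes do not raise an error.

```python
import random

import torch
from torch.utils.data import DataLoader, Dataset


class ToyEpisodeDataset(Dataset):
    """Hypothetical stand-in: random tensors replace the image files."""

    def __init__(self, labels, support_set_size=5, image_shape=(3, 64, 64)):
        self.labels = labels
        self.support_set_size = support_set_size
        self.image_shape = image_shape
        # Group sample positions by class so support sets can be drawn quickly.
        self.by_class = {}
        for position, label in enumerate(labels):
            self.by_class.setdefault(label, []).append(position)

    def __len__(self):
        return len(self.labels)

    def _fake_image(self, position):
        # Deterministic tensor per position, standing in for a transformed image.
        generator = torch.Generator().manual_seed(position)
        return torch.rand(self.image_shape, generator=generator)

    def __getitem__(self, idx):
        label = self.labels[idx]
        query = self._fake_image(idx)
        # Assumption: sample with replacement so rare classes still work.
        positions = random.choices(self.by_class[label], k=self.support_set_size)
        support = torch.stack([self._fake_image(p) for p in positions])
        return query, support, label


toy = ToyEpisodeDataset(labels=[0, 0, 1, 1, 1, 2])
query, support, label = toy[5]
batch_query, batch_support, batch_labels = next(iter(DataLoader(toy, batch_size=3)))
print(query.shape, support.shape, label)
print(batch_query.shape, batch_support.shape, batch_labels)
```

After batching, the support tensor gains a leading batch dimension, giving five dimensions in total; this is the layout that `MTUNet2.forward` unpacks as `B, N, C, H, W`.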


In [72]:
metadata_path = "../input/skin-cancer-mnist-ham10000/HAM10000_metadata.csv"

metadata = pd.read_csv(metadata_path)



# Check the number of unique images in metadata

print(f"Total images in metadata: {len(metadata)}")

Total images in metadata: 10015


The provided code snippet is responsible for loading and inspecting metadata related to the HAM10000 dataset, which is used for skin lesion analysis.


1. **Setting the Metadata Path**: The variable metadata_path is assigned the file path to the CSV file containing the metadata for the HAM10000 dataset. This path points to a file named `HAM10000_metadata.csv` located in the directory `../input/skin-cancer-mnist-ham10000/`.

2. **Loading the Metadata**: The pd.read_csv(metadata_path) function call reads the CSV file into a pandas DataFrame named metadata. This DataFrame will contain various details about the images, such as their filenames, labels, and possibly other relevant information.

3. **Checking the Number of Images**: The print statement outputs the total number of images listed in the metadata. The len(metadata) function call returns the number of rows in the DataFrame (10015 here). This equals the number of unique images only if every row has a distinct image_id, which len alone does not verify.


This code is crucial for verifying that the metadata has been loaded correctly and for understanding the scope of the dataset by checking the total number of images available for analysis.
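A slightly fuller inspection would also count distinct image ids and look at the class balance. The sketch below runs on a toy DataFrame with made-up ids; only the column names `image_id` and `dx` are taken from the notebook.

```python
import pandas as pd

# Toy rows (made-up ids) using the same column names as the notebook.
toy_metadata = pd.DataFrame({
    "image_id": ["img_0001", "img_0002", "img_0003", "img_0004"],
    "dx": ["nv", "nv", "mel", "nv"],
})

row_count = len(toy_metadata)                         # number of rows
unique_images = toy_metadata["image_id"].nunique()    # number of distinct ids
class_counts = toy_metadata["dx"].value_counts()      # rows per diagnosis label

print(f"Rows: {row_count}, unique image ids: {unique_images}")
print(class_counts)
```

On the real metadata, the same three calls applied to `metadata` would show whether any image id is repeated and how unevenly the diagnosis labels are distributed.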

In [73]:
# Split metadata into train and test sets

train_metadata, test_metadata = train_test_split(metadata, test_size=0.99, random_state=42)



# Save split metadata for easier loading

train_metadata.to_csv("train_metadata.csv", index=False)

test_metadata.to_csv("test_metadata.csv", index=False)

The provided code snippet is responsible for splitting the metadata of the HAM10000 dataset into training and testing sets and then saving these splits to CSV files for easier future access.


1. **Splitting the Metadata**: The train_test_split function from scikit-learn is used to split the metadata DataFrame into two separate DataFrames: train_metadata and test_metadata. The test_size=0.99 parameter specifies that 99% of the data should be allocated to the test set, leaving only 1% for the training set. The random_state=42 parameter ensures that the split is reproducible by setting a seed for the random number generator.

2. **Saving the Split Metadata**: The to_csv method is called on both train_metadata and test_metadata DataFrames to save them as CSV files named train_metadata.csv and test_metadata.csv, respectively. The index=False parameter ensures that the row indices are not included in the saved CSV files.

This code prepares the dataset for machine learning tasks. Splitting the metadata into training and testing sets allows the model to be evaluated on images it was not trained on, and saving the splits to CSV files makes it convenient to reload them in future sessions without repeating the split. Note that with test_size=0.99 only about 100 of the 10015 rows are used for training, which is far smaller than a conventional split such as test_size=0.2.
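For imbalanced labels, `train_test_split` also accepts a `stratify` argument that keeps the class proportions the same in both parts. The sketch below shows this on toy data; the 80/20 class mix and `test_size=0.2` are assumptions chosen for illustration, not the notebook's settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy metadata: 80 rows of one class and 20 of another (made-up ids).
toy = pd.DataFrame({
    "image_id": [f"img_{i:03d}" for i in range(100)],
    "dx": ["nv"] * 80 + ["mel"] * 20,
})

# stratify keeps the nv/mel ratio identical in both parts.
train_part, test_part = train_test_split(
    toy, test_size=0.2, random_state=42, stratify=toy["dx"]
)

print(train_part["dx"].value_counts())
print(test_part["dx"].value_counts())
```

A stratified split makes it much less likely that a rare class disappears from the smaller part, although a class with very few rows can still be too small to build a full support set.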

In [74]:
# Directories containing images

image_dirs = [

    "../input/skin-cancer-mnist-ham10000/HAM10000_images_part_1",

    "../input/skin-cancer-mnist-ham10000/HAM10000_images_part_2"

]

The provided code snippet defines a list of directories that contain the images for the HAM10000 dataset, which is used for skin lesion analysis.

1. **Defining Image Directories**: The variable image_dirs is assigned a list of two directory paths. These directories are specified as strings and point to the locations where the image files are stored. The paths are:

   - `../input/skin-cancer-mnist-ham10000/HAM10000_images_part_1`

   - `../input/skin-cancer-mnist-ham10000/HAM10000_images_part_2`


2. **Purpose of Image Directories**: These directories are likely part of the dataset's structure, where the images have been split into multiple parts for organizational purposes. By listing these directories, the code can later iterate through them to locate and load the images as needed.

This setup is crucial for managing and accessing the image files efficiently. By specifying the directories in a list, the code can easily handle the images regardless of their distribution across multiple folders. This approach simplifies the process of loading and preprocessing the images for further analysis or model training.
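The lookup across several folders can be sketched in isolation. The helper name `find_image` and the temporary folders below are hypothetical; the loop mirrors the search order of the list, returning the first directory that contains the file.

```python
import os
import tempfile


def find_image(img_dirs, img_name):
    """Return the first existing path for img_name, searching img_dirs in order."""
    for img_dir in img_dirs:
        candidate = os.path.join(img_dir, img_name)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(f"{img_name} not found in any of: {img_dirs}")


with tempfile.TemporaryDirectory() as root:
    # Two throwaway folders standing in for the two image parts.
    part_1 = os.path.join(root, "part_1")
    part_2 = os.path.join(root, "part_2")
    os.makedirs(part_1)
    os.makedirs(part_2)
    # Create an empty placeholder file in the second folder only.
    open(os.path.join(part_2, "example.jpg"), "wb").close()

    found = find_image([part_1, part_2], "example.jpg")
    found_in_part_2 = found.startswith(part_2)

    try:
        find_image([part_1, part_2], "missing.jpg")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(found_in_part_2, missing_raises)
```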

In [75]:
transform = transforms.Compose([
    transforms.Resize((64, 64)),  # Resize images to 64x64
    transforms.ToTensor(),         # Convert to PyTorch tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # Normalize
])

The provided code snippet defines a sequence of image transformations using the transforms.Compose function from the `torchvision` library. These transformations are applied to the images in the HAM10000 dataset to prepare them for input into a machine learning model.

1. **Resizing Images**: The transforms.Resize((64, 64)) transformation resizes the images to a fixed size of 64x64 pixels. This step ensures that all images have the same dimensions, which is necessary for batch processing in neural networks.

2. **Converting to Tensor**: The transforms.ToTensor() transformation converts the images from PIL format (or numpy arrays) to PyTorch tensors. This conversion is essential because PyTorch models require input data in tensor format. Additionally, this transformation scales the pixel values from the range [0, 255] to [0, 1].



3. **Normalizing**: The transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]) transformation subtracts 0.5 from each of the three color channels (red, green, and blue) and divides the result by 0.5. Since ToTensor produces values in [0, 1], the output lies in [-1, 1]. The values 0.5 are a convenient rescaling, not statistics measured on the dataset, so the result does not necessarily have zero mean or unit variance. Centering the inputs around zero in this way often helps stabilize training.

By composing these transformations, the code ensures that the images are uniformly resized, converted to a suitable format for PyTorch, and normalized. These preprocessing steps are crucial for preparing the dataset for training and evaluating machine learning models effectively.
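The normalization arithmetic can be checked by hand. The sketch below applies the same per-channel formula, `(x - mean) / std`, to three sample values with plain torch operations; it is an illustration of the formula and does not call torchvision.

```python
import torch

# Sample pixel values in the [0, 1] range that ToTensor produces.
pixels = torch.tensor([0.0, 0.5, 1.0])
mean, std = 0.5, 0.5

# Same formula that Normalize applies to each channel.
normalized = (pixels - mean) / std
print(normalized)  # tensor([-1.,  0.,  1.])
```

Black maps to -1, mid-gray to 0, and white to 1, so the whole [0, 1] range is stretched to [-1, 1].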

In [76]:
# Datasets

train_dataset = HAM10000Dataset(csv_file="train_metadata.csv", img_dirs=image_dirs, transform=transform)

test_dataset = HAM10000Dataset(csv_file="test_metadata.csv", img_dirs=image_dirs, transform=transform)


The provided code snippet creates the training and testing datasets. The data loaders and the dataset-size printout described in items 2 and 3 are not part of this cell; they are defined later, in the final training cell.

1. **Creating Datasets**: train_dataset and test_dataset are instances of the custom HAM10000Dataset class.
   - The train_dataset is initialized with the metadata file `train_metadata.csv`, the list of image directories image_dirs, and the transformation pipeline transform.
   - Similarly, the test_dataset is initialized with the metadata file `test_metadata.csv`, the same image directories, and the same transformation pipeline.
   - Since support_set_size is not passed, both datasets use the default of 5 support images per query.
   - These datasets will handle loading and preprocessing the images and their corresponding labels.

2. **Creating DataLoaders** (final training cell):
   - train_loader and test_loader are instances of PyTorch's DataLoader class.
   - train_loader is created with the train_dataset, a batch size of 4, shuffling enabled (shuffle=True), and 2 worker processes (num_workers=2) for parallel data loading. Shuffling presents the training data in a different order each epoch, which helps in training the model more effectively.
   - test_loader is created with the test_dataset, the same batch size of 4, shuffling disabled (shuffle=False), and 2 worker processes. Shuffling is typically disabled for the test set to ensure consistent evaluation.

3. **Printing Dataset Sizes** (final training cell):
   - The print statements output the number of samples in the training and testing datasets by calling len() on train_dataset and test_dataset.
   - This provides a quick check that the datasets have been loaded correctly and shows how much data is available for training and testing.

Overall, this code is essential for preparing the data pipeline, ensuring that the images and labels are correctly loaded, preprocessed, and batched for training and evaluation of the machine learning model.
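How a DataLoader cuts a dataset into batches can be seen on a small synthetic example. The sketch below uses `TensorDataset` with zero-filled tensors as a stand-in for real images and leaves `num_workers` at its default of 0; the sizes are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake 64x64 RGB images with labels cycling through seven classes.
images = torch.zeros(10, 3, 64, 64)
labels = torch.arange(10) % 7
toy_dataset = TensorDataset(images, labels)

# batch_size=4 over 10 samples yields batches of 4, 4, and 2.
loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)
batch_shapes = [tuple(batch_images.shape) for batch_images, _ in loader]

print(len(loader))
print(batch_shapes)
```

The last batch is smaller because 10 is not a multiple of 4; passing `drop_last=True` would discard it instead.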

In [77]:
### Part 1: Model Components

import torch
import torch.nn as nn
import torch.nn.functional as F

# CNN Encoder class
class CNNEncoder(nn.Module):
    def __init__(self, in_channels=3, base_features=64):
        super(CNNEncoder, self).__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels, base_features, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features, base_features, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(base_features, base_features * 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 2, base_features * 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.block3 = nn.Sequential(
            nn.Conv2d(base_features * 2, base_features * 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 4),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 4, base_features * 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.block4 = nn.Sequential(
            nn.Conv2d(base_features * 4, base_features * 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 8),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_features * 8, base_features * 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(base_features * 8),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        return x
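The output shape of the encoder follows from its structure: the 3x3 convolutions with padding=1 keep the spatial size, each max-pool halves it, and the channel count doubles from block to block. The helper below is a hypothetical calculator for that arithmetic, not part of the model.

```python
def encoder_output_shape(height, width, base_features=64, num_blocks=4):
    """Return (channels, height, width) after the encoder's conv/pool blocks."""
    channels = base_features
    for block in range(num_blocks):
        # Block k outputs base_features * 2**k channels.
        channels = base_features * 2 ** block
        # MaxPool2d(kernel_size=2, stride=2) halves each spatial dimension.
        height //= 2
        width //= 2
    return channels, height, width


print(encoder_output_shape(64, 64))    # (512, 4, 4)
print(encoder_output_shape(256, 256))  # (512, 16, 16)
```

With the 64x64 inputs used in this notebook, every image therefore leaves the encoder as a 512-channel map of size 4x4.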

In [None]:
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, feature_dim, num_heads=4):
        super(AttentionModule, self).__init__()
        
        self.num_heads = num_heads
        self.head_dim = feature_dim // num_heads
        
        # Linear transformations for multi-head attention
        self.query_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        self.key_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        self.value_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
        
        # Multi-head attention mechanism
        self.attn_heads = nn.ModuleList(
            [nn.Sequential(
                nn.Conv2d(self.head_dim, self.head_dim, kernel_size=1),
                nn.Softmax(dim=-1)  # Softmax across the spatial dimension
            ) for _ in range(num_heads)]
        )
        
        # Channel attention to recalibrate feature maps
        self.channel_attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(feature_dim, feature_dim // 16, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(feature_dim // 16, feature_dim, kernel_size=1),
            nn.Sigmoid()
        )
        
        # Spatial attention to emphasize important regions in the spatial dimension
        self.spatial_attention = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid()
        )
        
        # Final 1x1 conv to combine outputs
        self.output_conv = nn.Conv2d(feature_dim, feature_dim, kernel_size=1)
    
    def forward(self, features):
        # Compute query, key, and value maps for multi-head attention
        queries = self.query_conv(features)  # [B, C, H, W]
        keys = self.key_conv(features)       # [B, C, H, W]
        values = self.value_conv(features)   # [B, C, H, W]
        
        B, C, H, W = queries.size()
        queries = queries.view(B, self.num_heads, self.head_dim, H * W).permute(0, 1, 3, 2)  # [B, heads, H*W, head_dim]
        keys = keys.view(B, self.num_heads, self.head_dim, H * W).permute(0, 1, 3, 2)        # [B, heads, H*W, head_dim]
        values = values.view(B, self.num_heads, self.head_dim, H * W).permute(0, 1, 3, 2)    # [B, heads, H*W, head_dim]
        
        # Multi-head attention
        attention_outputs = []
        for i in range(self.num_heads):
            attn_weights = torch.bmm(queries[:, i], keys[:, i].transpose(1, 2))  # [B, H*W, H*W]
            attn_weights = self.attn_heads[i](attn_weights.view(B, H, W, H * W)).view(B, H * W, H * W)  # Apply learned attention map
            attn_output = torch.bmm(attn_weights, values[:, i])  # [B, H*W, head_dim]
            attn_output = attn_output.view(B, H, W, self.head_dim).permute(0, 3, 1, 2)  # [B, head_dim, H, W]
            attention_outputs.append(attn_output)
        
        # Concatenate all attention head outputs
        multi_head_output = torch.cat(attention_outputs, dim=1)  # [B, C, H, W]
        
        # Channel Attention
        channel_attn_weights = self.channel_attention(multi_head_output)
        channel_attn_output = multi_head_output * channel_attn_weights  # Element-wise multiplication (recalibration)
        
        # Spatial Attention
        avg_pool = torch.mean(channel_attn_output, dim=1, keepdim=True)  # Average pooling across channels
        max_pool = torch.max(channel_attn_output, dim=1, keepdim=True)[0]  # Max pooling across channels
        spatial_attn_weights = self.spatial_attention(torch.cat([avg_pool, max_pool], dim=1))
        spatial_attn_output = channel_attn_output * spatial_attn_weights  # Element-wise multiplication (spatial recalibration)
        
        # Final 1x1 conv to produce the final attention output
        output = self.output_conv(spatial_attn_output)
        return output
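The multi-head loop follows the usual attention pattern of scores, softmax, and a weighted sum of values. As a point of comparison, here is a minimal single-head sketch of scaled dot-product attention over the spatial positions of a feature map. It is a simplified stand-in, not the notebook's module: the function name `spatial_self_attention` is hypothetical, the 1x1 projections are omitted, and a plain softmax replaces the per-head `Conv2d`. (In the notebook, `attn_weights.view(B, H, W, H * W)` places H in the channel position, so that `Conv2d` would only accept it when H equals head_dim.)

```python
import torch
import torch.nn.functional as F


def spatial_self_attention(queries, keys, values):
    """Scaled dot-product attention across the H*W positions of [B, C, H, W] maps."""
    batch, channels, height, width = queries.shape
    # Flatten the spatial grid: [B, C, H, W] -> [B, H*W, C].
    q = queries.flatten(2).transpose(1, 2)
    k = keys.flatten(2).transpose(1, 2)
    v = values.flatten(2).transpose(1, 2)
    # Similarity of every position with every other position: [B, H*W, H*W].
    scores = torch.bmm(q, k.transpose(1, 2)) / channels ** 0.5
    # Each row becomes a probability distribution over positions.
    weights = F.softmax(scores, dim=-1)
    # Weighted sum of values, then restore the [B, C, H, W] layout.
    attended = torch.bmm(weights, v)
    return attended.transpose(1, 2).reshape(batch, channels, height, width), weights


x = torch.randn(2, 8, 4, 4)
out, weights = spatial_self_attention(x, x, x)
print(out.shape, weights.shape)
```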

In [79]:
# Define the MTUNet2 class
class MTUNet2(nn.Module):
    def __init__(self, in_channels=3, base_features=64, num_classes=5, feature_dim=512, num_heads=4):
        super(MTUNet2, self).__init__()
        self.encoder = CNNEncoder(in_channels, base_features)
        self.attn_module = AttentionModule(feature_dim=feature_dim * 8, num_heads=num_heads)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feature_dim * 8 * 4 * 4, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes)
        )

    def forward(self, query, support):
        query_features = self.encoder(query)
        B, N, C, H, W = support.size()
        support = support.view(B * N, C, H, W)
        support_features = self.encoder(support)
        support_features = support_features.view(B, N, -1, H // 16, W // 16)
        support_features = support_features.mean(dim=1)

        query_attn = self.attn_module(query_features)
        support_attn = self.attn_module(support_features)

        combined_features = torch.cat([query_attn, support_attn], dim=1)
        outputs = self.classifier(combined_features)
        return outputs

In [80]:
# Define the train function
def train(model, train_loader, criterion, optimizer, epoch, num_epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0  # Initialize running loss
    total_correct = 0  # Initialize the count of correct predictions
    total_samples = 0  # Initialize the count of total samples

    for query, support, labels in train_loader:
        # Move data to the correct device
        query, support, labels = query.to(device), support.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(query, support)

        # Compute the loss
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        # Accumulate the loss
        running_loss += loss.item()

        # Compute the number of correct predictions
        _, predicted = torch.max(outputs, 1)
        total_correct += (predicted == labels).sum().item()
        total_samples += labels.size(0)

    # Calculate average loss and accuracy
    avg_loss = running_loss / len(train_loader)
    accuracy = 100 * total_correct / total_samples
    print(f'Epoch [{epoch}/{num_epochs}], Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%')

    return avg_loss, accuracy

# Define the evaluation function
def evaluate(model, test_loader, criterion):
    model.eval()  # Set the model to evaluation mode
    total_correct = 0  # Initialize the count of correct predictions
    total_samples = 0  # Initialize the count of total samples
    total_loss = 0.0  # Initialize the total loss

    with torch.no_grad():  # Disable gradient calculation for evaluation
        for query, support, labels in test_loader:
            # Move data to the correct device
            query, support, labels = query.to(device), support.to(device), labels.to(device)

            # Forward pass
            outputs = model(query, support)

            # Compute the loss
            loss = criterion(outputs, labels)

            # Accumulate the loss
            total_loss += loss.item()

            # Compute the number of correct predictions
            _, predicted = torch.max(outputs, 1)
            total_correct += (predicted == labels).sum().item()
            total_samples += labels.size(0)

    # Calculate average loss and accuracy
    avg_loss = total_loss / len(test_loader)
    accuracy = 100 * total_correct / total_samples
    print(f'Test Loss: {avg_loss:.4f}, Test Accuracy: {accuracy:.2f}%')

    return avg_loss, accuracy
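Both functions compute accuracy the same way: take the index of the largest logit per row and compare it with the label. The sketch below repeats that step on a small hand-written batch; the logits and labels are made up for illustration.

```python
import torch

# Four samples, three classes; values are made up.
logits = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 0.1, 1.5],
    [0.9, 1.0, 0.2],
    [3.0, 0.0, 0.0],
])
labels = torch.tensor([0, 2, 0, 0])

# argmax over dim=1 gives the same indices as torch.max(outputs, 1)[1].
predicted = logits.argmax(dim=1)
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist(), accuracy)  # [0, 2, 1, 0] 75.0
```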

In [81]:
import torch.optim as optim
from torch.utils.data import DataLoader

# Main training loop
def main_train_loop(train_loader, test_loader, model, criterion, optimizer, num_epochs):
    for epoch in range(1, num_epochs + 1):
        # Train the model for one epoch
        train_loss, train_acc = train(model, train_loader, criterion, optimizer, epoch, num_epochs)
        
        # Evaluate the model on the test set
        test_loss, test_acc = evaluate(model, test_loader, criterion)

# Configuration
num_epochs = 1
learning_rate = 0.001
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Initialize the model
model = MTUNet2(in_channels=3, base_features=64, num_classes=5, feature_dim=512, num_heads=4).to(device)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# DataLoaders
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=4, shuffle=False, num_workers=2)

# Print dataset sizes
print(f"Number of training samples: {len(train_dataset)}")
print(f"Number of testing samples: {len(test_dataset)}")

# Start the training process
main_train_loop(train_loader, test_loader, model, criterion, optimizer, num_epochs)

Number of training samples: 100
Number of testing samples: 9915


RuntimeError: Given groups=1, weight of size [4096, 4096, 1, 1], expected input[4, 512, 4, 4] to have 4096 channels, but got 512 channels instead
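
The RuntimeError can be traced to a channel-count mismatch: CNNEncoder ends with base_features * 8 = 512 channels, while MTUNet2 builds its AttentionModule with feature_dim * 8 = 4096. The arithmetic below is a sketch for checking the sizes involved. The "consistent" values are one possible way to align the layers, stated as an assumption and not as the author's confirmed intent.

```python
base_features = 64
feature_dim = 512
image_size = 64

# What the encoder actually produces for 64x64 inputs.
encoder_channels = base_features * 8       # channels after block4
spatial = image_size // 16                 # four 2x2 max-pools

# What the notebook's MTUNet2 constructor asks for.
attention_channels = feature_dim * 8
notebook_classifier_inputs = feature_dim * 8 * 4 * 4

# Assumed fix: size the attention module to the encoder output, and size the
# classifier to the query and support maps concatenated along dim=1.
consistent_attention_channels = encoder_channels
consistent_classifier_inputs = 2 * encoder_channels * spatial * spatial

print(encoder_channels, attention_channels)                      # 512 4096
print(notebook_classifier_inputs, consistent_classifier_inputs)  # 65536 16384
```

Under that assumption, `AttentionModule(feature_dim=512)` and `nn.Linear(16384, 1024)` would line up with the encoder output of shape [4, 512, 4, 4] reported in the error. HAM10000's dx column is usually documented as having seven diagnostic categories, so `num_classes=5` would also be worth re-checking against `label_map`.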