<a href="https://www.kaggle.com/code/rishabhsingh18/models-genesis-experiment?scriptVersionId=188204650" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
import os
import nibabel as nib
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

Data collection and preprocessing: we use the BraTS (Brain Tumor Segmentation) dataset of brain tumor MRI scans.

In [None]:
# Helper Functions
def load_nifti_image(file_path):
    img = nib.load(file_path)
    img_data = img.get_fdata()
    return img_data


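def preprocess_image(img):
    # Min-max normalize intensities to [0, 1]; guard against constant volumes
    img_min, img_max = np.min(img), np.max(img)
    if img_max == img_min:
        return np.zeros_like(img)
    return (img - img_min) / (img_max - img_min)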


def create_inpainting_task(inputs):
    # Zero out a random ~15% of voxels; the model learns to restore them
    mask = torch.rand_like(inputs) > 0.85  # created on the same device as inputs
    masked_inputs = inputs.clone()
    masked_inputs[mask] = 0
    return masked_inputs, inputs


def dice_coefficient(pred, target, smooth=1e-5):
    intersection = (pred * target).sum()
    return (2. * intersection + smooth) / (pred.sum() + target.sum() + smooth)
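
As a quick sanity check of the two self-supervised helpers, the sketch below runs them on toy tensors. The helper bodies are repeated so the snippet runs on its own; the tensor shapes, the fixed seed, and the toy masks are illustrative assumptions, not values from the experiment.

```python
import torch


def create_inpainting_task(inputs):
    # Same logic as the helper above: zero out roughly 15% of voxels
    mask = torch.rand(inputs.shape) > 0.85
    masked_inputs = inputs.clone()
    masked_inputs[mask] = 0
    return masked_inputs, inputs


def dice_coefficient(pred, target, smooth=1e-5):
    intersection = (pred * target).sum()
    return (2. * intersection + smooth) / (pred.sum() + target.sum() + smooth)


torch.manual_seed(0)  # assumption: fixed seed only to make the demo repeatable
volume = torch.ones(1, 1, 16, 16, 16)  # toy (N, C, D, H, W) batch of ones
masked, target = create_inpainting_task(volume)
masked_fraction = (masked == 0).float().mean().item()
print(f"masked fraction: {masked_fraction:.3f}")  # close to 0.15

# Dice on toy binary masks: overlap 1, mask sizes 2 and 1, so Dice is about 2/3
pred = torch.tensor([1., 1., 0., 0.])
truth = torch.tensor([1., 0., 0., 0.])
dice = dice_coefficient(pred, truth).item()
print(f"dice: {dice:.4f}")
```

The reconstruction target is the untouched input, so the MSE loss compares the model's output against every voxel, not only the masked ones.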


Model design and training: here we define a 3D U-Net model and train it on a self-supervised task, image inpainting, before fine-tuning it for segmentation.

In [None]:
# Define the Dataset Classes
class MedicalDataset(Dataset):
    def __init__(self, images):
        self.images = images

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        return img


class SegmentationDataset(Dataset):
    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        lbl = self.labels[idx]
        return img, lbl
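
To make the batching behaviour concrete, here is a minimal sketch of how a `DataLoader` collates a list of NumPy volumes and why the training loops call `unsqueeze(1).float()`. `ToyVolumes` is a hypothetical stand-in for `MedicalDataset`, and the 8×8×8 random volumes are placeholders for real scans.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToyVolumes(Dataset):
    """Hypothetical stand-in for MedicalDataset: wraps a list of 3D arrays."""

    def __init__(self, images):
        self.images = images

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx]


# Five random float64 volumes in place of real NIfTI data
volumes = [np.random.rand(8, 8, 8) for _ in range(5)]
loader = DataLoader(ToyVolumes(volumes), batch_size=2)

batch = next(iter(loader))           # default collate stacks arrays into a tensor
inputs = batch.unsqueeze(1).float()  # add the channel dim and cast for Conv3d

print(batch.shape, batch.dtype)      # (N, D, H, W), float64
print(inputs.shape, inputs.dtype)    # (N, 1, D, H, W), float32
print(len(loader))                   # number of batches, last one may be smaller
```

The default collate function can only stack volumes of identical shape, so scans of differing sizes would need cropping or padding before batching.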

In [None]:
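# Define the 3D U-Net Model
# Skip connections are additive, so input depth, height, and width must each be
# divisible by 4 (two 2x pooling stages) for the shapes to line up.
class UNet3D(nn.Module):
    def __init__(self):
        super().__init__()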
        # Encoder
        self.enc1 = nn.Sequential(nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(), nn.Conv3d(32, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool3d(2), nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(), nn.Conv3d(64, 64, 3, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.MaxPool3d(2), nn.Conv3d(64, 128, 3, padding=1), nn.ReLU(), nn.Conv3d(128, 128, 3, padding=1), nn.ReLU())
        
        # Decoder
        self.dec3 = nn.Sequential(nn.ConvTranspose3d(128, 64, 2, stride=2), nn.Conv3d(64, 64, 3, padding=1), nn.ReLU(), nn.Conv3d(64, 64, 3, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose3d(64, 32, 2, stride=2), nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(), nn.Conv3d(32, 32, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Conv3d(32, 1, 1)

    def forward(self, x):
        # Encoder
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        
        # Decoder
        d3 = self.dec3(e3)
        d2 = self.dec2(d3 + e2)
        d1 = self.dec1(d2 + e1)
        
        return d1
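
One practical consequence of the two pooling stages and the additive skip connections: a spatial size that is not a multiple of 4 produces mismatched shapes in `forward`. For example, a depth of 155 (the depth of BraTS volumes) becomes 77 and then 38 after pooling, is upsampled to 76, and can no longer be added to the 77-deep encoder features. A minimal sketch of one possible workaround, zero-padding each volume, follows; `pad_to_multiple` is a hypothetical helper, not part of the notebook, and cropping or resampling would be equally valid choices.

```python
import torch
import torch.nn.functional as F


def pad_to_multiple(x, multiple=4):
    """Zero-pad the last three dims of an (N, C, D, H, W) tensor up to a multiple."""
    d, h, w = x.shape[-3:]
    pad_d = (-d) % multiple
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    # F.pad takes (left, right) pairs starting from the LAST dimension: W, H, D
    return F.pad(x, (0, pad_w, 0, pad_h, 0, pad_d))


x = torch.zeros(1, 1, 155, 24, 24)  # toy batch with an awkward depth
padded = pad_to_multiple(x)
print(padded.shape)  # torch.Size([1, 1, 156, 24, 24])
```

If labels are padded the same way, the padded voxels should be treated as background so that the loss and Dice score stay meaningful.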

In [None]:
# Main Experiment
def main():
    # Load and preprocess the dataset
    # NOTE: os.listdir needs a directory, not the metadata CSV file; adjust this
    # path to the folder that actually holds the .nii volumes.
    data_dir = '/kaggle/input/brats2020-training-data'
    image_files = sorted(os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.nii'))
    images = [preprocess_image(load_nifti_image(f)) for f in image_files]
    
    # Split the dataset into training and validation sets
    train_images, val_images = train_test_split(images, test_size=0.2, random_state=42)
    
    # Define the model, loss function, and optimizer
    model = UNet3D().cuda()
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    # Data loaders for self-supervised training
    train_dataset = MedicalDataset(train_images)
    val_dataset = MedicalDataset(val_images)
    train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=4)
    
    # Self-supervised training loop
    for epoch in range(100):
        model.train()
        running_loss = 0.0
        for data in train_loader:
            inputs = data.unsqueeze(1).float().cuda()
            masked_inputs, targets = create_inpainting_task(inputs)
            optimizer.zero_grad()
            outputs = model(masked_inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        
        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')
    
    # Load segmentation labels (file naming assumes 'image' -> 'label' in the path).
    # Labels are class IDs, not intensities, so they are not min-max normalized.
    # The model has a single output channel, so the task is treated as binary
    # segmentation here (tumor vs. background).
    labels = [(load_nifti_image(f.replace('image', 'label')) > 0).astype(np.float32) for f in image_files]
    # Same test_size and random_state as the image split, so the pairs stay aligned
    train_labels, val_labels = train_test_split(labels, test_size=0.2, random_state=42)
    
    # Update the dataset class to include labels
    train_dataset = SegmentationDataset(train_images, train_labels)
    val_dataset = SegmentationDataset(val_images, val_labels)
    train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=4)
    
    # Redefine the loss function for binary segmentation on single-channel logits
    criterion = nn.BCEWithLogitsLoss()
    
    # Fine-tuning loop
    for epoch in range(50):
        model.train()
        running_loss = 0.0
        for inputs, targets in train_loader:
            inputs, targets = inputs.unsqueeze(1).float().cuda(), targets.unsqueeze(1).float().cuda()
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        
        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')
        
        # Validate the model
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for inputs, targets in val_loader:
                inputs, targets = inputs.unsqueeze(1).float().cuda(), targets.unsqueeze(1).float().cuda()
                outputs = model(inputs)
                loss = criterion(outputs, targets)
                val_loss += loss.item()
        
        print(f'Validation Loss: {val_loss/len(val_loader)}')
    
    # Evaluation on the validation set
    model.eval()
    dice_scores = []
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.unsqueeze(1).float().cuda(), targets.unsqueeze(1).float().cuda()
            outputs = model(inputs)
            # Threshold the sigmoid probabilities to get a binary mask
            preds = (torch.sigmoid(outputs) > 0.5).float()
            dice_scores.append(dice_coefficient(preds, targets).item())
    print(f'Mean Dice Coefficient: {np.mean(dice_scores)}')


if __name__ == '__main__':
    main()
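
For multi-class segmentation with `nn.CrossEntropyLoss`, the logits need one channel per class and the targets must be integer class indices of shape `(N, D, H, W)`, not floats with a channel dimension. BraTS segmentation maps use the label values 0, 1, 2, and 4, so they have to be remapped to contiguous indices first. The sketch below illustrates this on random tensors; `remap_brats_labels` is a hypothetical helper, and it assumes the model's final layer were changed to four output channels (for example `nn.Conv3d(32, 4, 1)`), which the notebook does not do.

```python
import torch
import torch.nn as nn


def remap_brats_labels(seg):
    """Map BraTS label values {0, 1, 2, 4} to contiguous indices {0, 1, 2, 3}."""
    seg = seg.long().clone()
    seg[seg == 4] = 3
    return seg


torch.manual_seed(0)  # assumption: fixed seed only to make the demo repeatable

# Toy logits standing in for a 4-channel model output: (N, C, D, H, W)
logits = torch.randn(2, 4, 8, 8, 8)

# Toy label volume drawn from the BraTS label values
raw_labels = torch.tensor([0, 1, 2, 4])[torch.randint(0, 4, (2, 8, 8, 8))]
targets = remap_brats_labels(raw_labels)  # (N, D, H, W), dtype long

loss = nn.CrossEntropyLoss()(logits, targets)
preds = logits.argmax(dim=1)  # predicted class index per voxel

print(loss.item())
print(preds.shape)
```

With a single output channel, the softmax inside `nn.CrossEntropyLoss` is constant and `argmax` always returns class 0, which is why a one-channel model needs a binary loss instead.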


# References

The BraTS dataset can be obtained from the following sources:

### 1. Medical Segmentation Decathlon
- **Website**: [Medical Segmentation Decathlon](http://medicaldecathlon.com/)
- **Dataset**: The Decathlon's brain tumour task (Task01_BrainTumour) is built from BraTS data. The challenge also provides several other medical imaging datasets for various tasks.

### 2. BraTS Challenge
- **Website**: [BraTS Challenge](https://www.med.upenn.edu/cbica/brats2020/data.html)
- **Dataset**: The BraTS Challenge focuses on the segmentation of brain tumors and provides datasets for training and validation, along with detailed instructions on how to download and use the data.

### 3. Kaggle
- **Website**: [Kaggle BraTS Dataset](https://www.kaggle.com/datasets/awsaf49/brats20-dataset-training-validation)
- **Dataset**: Kaggle hosts a copy of the BraTS 2020 training and validation data, which can be downloaded directly with the Kaggle API.

### Additional References
For more detailed instructions on setting up the Kaggle API and downloading datasets, refer to the official Kaggle documentation:
- **Kaggle API Documentation**: [Kaggle API](https://www.kaggle.com/docs/api)

### Steps for Downloading and Using the Dataset

#### 1. Medical Segmentation Decathlon
To download the dataset from the Medical Segmentation Decathlon:
1. Visit [Medical Segmentation Decathlon](http://medicaldecathlon.com/).
2. Find the brain tumour task (Task01_BrainTumour) among the available datasets.
3. Follow the provided instructions to download the dataset.

#### 2. BraTS Challenge
To download the dataset from the BraTS Challenge:
1. Visit [BraTS Challenge](https://www.med.upenn.edu/cbica/brats2020/data.html).
2. Register for the challenge if required.
3. Download the training and validation datasets from the provided links.

#### 3. Kaggle
To download the dataset from Kaggle:
1. Ensure you have the Kaggle API installed (`pip install kaggle`).
2. Set up your Kaggle API key by following the [instructions](https://www.kaggle.com/docs/api).
3. Use the following commands to download and extract the dataset:

```bash
kaggle datasets download awsaf49/brats20-dataset-training-validation
unzip brats20-dataset-training-validation.zip -d path_to_extract
```
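
Once the archive is extracted, the NIfTI volumes still have to be located on disk. The following is a minimal sketch of a recursive search; `find_nifti_files` is a hypothetical helper, and it makes no assumption about the folder layout inside the archive. The temporary directory and file names exist only to demonstrate the behaviour; in practice the root would be the `path_to_extract` placeholder from the commands above.

```python
import tempfile
from pathlib import Path


def find_nifti_files(root):
    """Recursively collect .nii and .nii.gz files under root, sorted by path."""
    root = Path(root)
    return sorted(p for p in root.rglob("*") if p.name.endswith((".nii", ".nii.gz")))


# Demo on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    case_dir = Path(tmp) / "case_001"
    case_dir.mkdir()
    for name in ("scan.nii", "mask.nii.gz", "notes.txt"):
        (case_dir / name).touch()
    found = [p.name for p in find_nifti_files(tmp)]

print(found)  # ['mask.nii.gz', 'scan.nii']
```

Sorting the paths keeps the file order stable between runs, which matters when images and labels are split separately with the same random seed.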
