# Project 02 - Image segmentation and object detection

__Handout date:__ 23.05.2024  
__Submission deadline:__ 19.06.2024 - 23:59  
__Topics:__ Segmentation and object detection.  
__Submission link:__ https://fz-juelich.sciebo.de/s/410NfCTI5rMLv1n

In this project, we would like you to investigate image segmentation and object detection.
For this, you will use the data from the [Broad Bioimage Benchmark Collection](https://bbbc.broadinstitute.org/BBBC039).
The dataset contains images of cells acquired using fluorescence microscopy, along with annotations of individual cells.
Your goal will be to apply the segmentation and detection methods described in the lecture to the dataset.

__Note:__ The main goal of this project is to get you working on a real-world segmentation/detection task. Projects will not be graded based on the performance of the trained models.

![A stack of images](./broad_dataset.png)

We suggest the following steps:
1. __Know your data:__ Have a look at the resources provided on the website to understand the dataset.
1. Download and inspect the images, groundtruth annotations, and metadata. Plot a few example datapoints. __Tip:__ The data has fixed URLs, so you can download the data from within the notebook (e.g., using `!wget URL`).
1. Write a data loader that allows you to use the data in PyTorch. Split the data according to the training, test, and validation files provided in the metadata.
1. Train one or multiple of the segmentation models discussed in the lecture (or any other segmentation model you would like to try) to segment the cells. Report the loss curve, appropriate quality metrics, and some example results of the trained model(s).
1. Train one or multiple of the detection models discussed in the lecture (or any other detection method you would like to try) to detect the cells. Report the loss curve, appropriate quality metrics, and some example results of the trained model(s).
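
As one possible reading of "appropriate quality metrics" for the segmentation step, the sketch below computes intersection over union (IoU) and the Dice coefficient for a pair of binary masks. The function name `iou_and_dice` and the toy arrays are hypothetical and only serve as an illustration; other metrics may suit the task equally well.

```python
import numpy as np

def iou_and_dice(pred, target, eps=1e-7):
    """Return (IoU, Dice) for two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps avoids a division by zero when both masks are empty
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (total + eps)
    return float(iou), float(dice)

# toy example: 2 overlapping pixels, 4 pixels in the union
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
iou, dice = iou_and_dice(pred, target)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

For a trained model, `pred` would be the thresholded network output and `target` the binarized groundtruth mask.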

Tips and hints:
1. Please do not add the dataset to your submission. Use `.gitignore` to ignore the directory containing the downloaded dataset.
1. Think about how you have to process the provided groundtruth data to make it usable for segmentation and detection.
1. `scikit-image` provides several helpful functions for extracting information from masks. Have a look at `skimage.measure.label` and `skimage.measure.regionprops`. These can help to convert data into a format that is appropriate for detection.
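
The last hint can be sketched as follows. This is a minimal example, assuming the mask has already been reduced to a single-channel integer image in which 0 is background; the helper name `mask_to_boxes` and the toy mask are hypothetical. The boxes are returned as `[x_min, y_min, x_max, y_max]`, which is a common convention for detection models.

```python
import numpy as np
from skimage.measure import label, regionprops

def mask_to_boxes(mask_2d, min_area=1):
    """Return (instance label image, boxes as [x_min, y_min, x_max, y_max])."""
    # connected regions of equal non-zero value each receive their own label
    labels = label(mask_2d, background=0)
    boxes = []
    for region in regionprops(labels):
        if region.area < min_area:
            continue  # skip tiny regions that are probably noise
        min_row, min_col, max_row, max_col = region.bbox
        boxes.append([min_col, min_row, max_col, max_row])
    return labels, np.array(boxes, dtype=np.float32).reshape(-1, 4)

# toy mask with two separate objects
toy = np.zeros((8, 10), dtype=np.uint8)
toy[1:3, 1:4] = 1
toy[5:8, 6:9] = 1
labels, boxes = mask_to_boxes(toy)
print(boxes)
```

Note that `regionprops` reports the maximum row and column as exclusive bounds, so the box of the first toy object spans columns 1 to 4 and rows 1 to 3.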

In [1]:
!wget -c https://data.broadinstitute.org/bbbc/BBBC039/images.zip
!wget -c https://data.broadinstitute.org/bbbc/BBBC039/masks.zip
!wget -c https://data.broadinstitute.org/bbbc/BBBC039/metadata.zip
!unzip images.zip
!unzip masks.zip
!unzip metadata.zip

--2024-06-19 14:37:23--  https://data.broadinstitute.org/bbbc/BBBC039/images.zip
Resolving data.broadinstitute.org (data.broadinstitute.org)... 69.173.68.137
Connecting to data.broadinstitute.org (data.broadinstitute.org)|69.173.68.137|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 77915748 (74M) [application/zip]
Saving to: ‘images.zip’


2024-06-19 14:37:52 (2.60 MB/s) - ‘images.zip’ saved [77915748/77915748]

--2024-06-19 14:37:52--  https://data.broadinstitute.org/bbbc/BBBC039/masks.zip
Resolving data.broadinstitute.org (data.broadinstitute.org)... 69.173.68.137
Connecting to data.broadinstitute.org (data.broadinstitute.org)|69.173.68.137|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2753811 (2.6M) [application/zip]
Saving to: ‘masks.zip’


2024-06-19 14:37:53 (2.98 MB/s) - ‘masks.zip’ saved [2753811/2753811]

--2024-06-19 14:37:53--  https://data.broadinstitute.org/bbbc/BBBC039/metadata.zip
Resolving data.broadinstitute.org (d

In [2]:
import glob
import matplotlib.image as mpimg

#sort the paths so that images[i] and masks[i] refer to the same sample
images = []
for img_path in sorted(glob.glob('./images/*.tif')):
  images.append(mpimg.imread(img_path))

masks = []
for mask_path in sorted(glob.glob('./masks/*.png')):
  masks.append(mpimg.imread(mask_path))

In [3]:
masks[0].shape, images[0].shape

((520, 696, 4), (520, 696))

In [4]:
#create subfolders of images and masks for training, testing and validating separately
import os
for folder in ('./images', './masks'):
  for split in ('train', 'test', 'validation'):
    #exist_ok=True allows the cell to be re-run without raising FileExistsError
    os.makedirs(os.path.join(folder, split), exist_ok=True)

In [5]:
#move the image and mask files to the corresponding subfolders
import os, shutil
image_path = './images'
mask_path = './masks'
split_files = {
  'train': './metadata/training.txt',
  'test': './metadata/test.txt',
  'validation': './metadata/validation.txt',
}

for split, txt in split_files.items():
  with open(txt, 'r') as f:
    for line in f:
      #the metadata lists mask names (*.png); the image has the same stem with .tif
      stem = os.path.splitext(line.strip())[0]
      if not stem:
        continue  #skip empty lines
      shutil.move(os.path.join(image_path, stem + '.tif'), os.path.join(image_path, split, stem + '.tif'))
      shutil.move(os.path.join(mask_path, stem + '.png'), os.path.join(mask_path, split, stem + '.png'))


In [6]:
#create dataset
from torch.utils.data import Dataset
import cv2

class TheDataset(Dataset):
    def __init__(self, imagePaths, maskPaths, transforms):
        # imagePaths[i] and maskPaths[i] must refer to the same sample
        self.imagePaths = imagePaths
        self.maskPaths = maskPaths
        self.transforms = transforms
    def __len__(self):
        # return the number of total samples contained in the dataset
        return len(self.imagePaths)
    def __getitem__(self, idx):
        # grab the image path from the current index
        imagePath = self.imagePaths[idx]
        # load the image (OpenCV returns BGR) and convert it to RGB
        image = cv2.imread(imagePath)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # load the mask as a single-channel grayscale image
        mask = cv2.imread(self.maskPaths[idx], 0)
        # check to see if we are applying any transformations
        if self.transforms is not None:
            # the transform is applied to image and mask separately, so it
            # must be deterministic (a random flip would misalign the two)
            image = self.transforms(image)
            mask = self.transforms(mask)
        #process the information from masks, to be done

        return (image, mask)
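
Because the dataset above calls the transform once for the image and once for the mask, a random augmentation would draw two independent random decisions. One way around this is to make the random decision once and apply it to both tensors. The following is a minimal sketch, assuming the image is a `(C, H, W)` tensor and the mask an `(H, W)` tensor; the function name `paired_random_hflip` is hypothetical.

```python
import torch

def paired_random_hflip(image, mask, p=0.5):
    """Flip image (C, H, W) and mask (H, W) together with probability p."""
    # draw the random decision once so that both tensors stay aligned
    if torch.rand(1).item() < p:
        image = torch.flip(image, dims=[-1])
        mask = torch.flip(mask, dims=[-1])
    return image, mask

# toy example: the mask is derived from the image, so alignment can be checked
image = torch.arange(2 * 3 * 4, dtype=torch.float32).reshape(2, 3, 4)
mask = (image[0] > 5).long()
flipped_image, flipped_mask = paired_random_hflip(image, mask, p=1.0)
print(flipped_mask)
```

Such a function could be called inside `__getitem__` after the deterministic conversion to tensors.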

In [25]:
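#process the information from masks for model training
import skimage
import skimage.io
import skimage.measure

#load the same mask with two different readers to compare the results
mask0 = mpimg.imread('./masks/train/IXMtest_A06_s6_w1B9577918-4973-4A87-BA73-A168AA755527.png')
mask0_1 = skimage.io.imread('./masks/train/IXMtest_A06_s6_w1B9577918-4973-4A87-BA73-A168AA755527.png')
#label connected regions; with return_num=True the result is (label image, number of regions)
tt0 = skimage.measure.label(mask0, return_num = True)
tt0_1 = skimage.measure.label(mask0_1, return_num = True)
#measure properties (e.g. bounding boxes) of the labelled regions
tt1 = skimage.measure.regionprops(tt0[0])
tt1_1 = skimage.measure.regionprops(tt0_1[0])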

In [31]:
mask0.shape, mask0_1.shape, tt0[0].shape, tt0_1[0].shape, tt0[1], tt0_1[1]
#tt0[0] is the label image of mask0, tt0_1[0] is the label image of mask0_1
#the number of labelled regions in mask0 is 1,
#the number of labelled regions in mask0_1 is 72... why?

((520, 696, 4), (520, 696, 4), (520, 696, 4), (520, 696, 4), 1, 72)

In [32]:
#bounding box (min_row, min_col, min_channel, max_row, max_col, max_channel)
#of the first labelled region of tt0 and tt0_1
tt1[0].bbox, tt1_1[0].bbox

((0, 0, 3, 520, 696, 4), (0, 0, 3, 520, 696, 4))
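
A likely explanation for the outputs above: both readers return a 4-channel (RGBA) array, so `skimage.measure.label` treats the mask as a 3D volume and the fully opaque alpha channel becomes one region that spans the whole image. That matches the bounding box `(0, 0, 3, 520, 696, 4)`. In addition, `mpimg.imread` returns PNG data as floats in [0, 1], which presumably collapses the small label values to background, whereas `skimage.io.imread` keeps the integer values. The sketch below reproduces the effect on a synthetic RGBA mask, under the assumption that the object labels are stored in the first channel; labelling only that channel yields 2D regions with 4-value bounding boxes.

```python
import numpy as np
from skimage.measure import label, regionprops

# synthetic RGBA mask: two objects in channel 0, opaque alpha in channel 3
rgba = np.zeros((6, 8, 4), dtype=np.uint8)
rgba[..., 3] = 255
rgba[1:3, 1:3, 0] = 1
rgba[3:5, 5:7, 0] = 2

# labelling all four channels counts the alpha plane as an extra region
_, n_all = label(rgba, return_num=True)

# labelling only the first channel gives one region per object
instances, n_cells = label(rgba[..., 0], return_num=True)
boxes = [region.bbox for region in regionprops(instances)]

print(n_all, n_cells)
print(boxes)  # (min_row, min_col, max_row, max_col) per object
```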

In [None]:
import torchvision.transforms as T

def get_transform(train):
  #build the list under its own name so that it does not shadow any module
  transform_list = [T.ToTensor()]
  if train:
    transform_list.append(T.RandomHorizontalFlip(0.5))
  return T.Compose(transform_list)

#sort the paths so that image i and mask i belong to the same sample
trainImages = sorted(glob.glob('./images/train/*.tif'))
testImages = sorted(glob.glob('./images/test/*.tif'))
validationImages = sorted(glob.glob('./images/validation/*.tif'))
trainMasks = sorted(glob.glob('./masks/train/*.png'))
testMasks = sorted(glob.glob('./masks/test/*.png'))
validationMasks = sorted(glob.glob('./masks/validation/*.png'))

#pass a callable transform, not the torchvision.transforms module;
#train=False because TheDataset transforms image and mask separately,
#so a random flip would not be applied consistently to both
trainDS = TheDataset(imagePaths=trainImages, maskPaths=trainMasks,
    transforms=get_transform(train=False))
testDS = TheDataset(imagePaths=testImages, maskPaths=testMasks,
    transforms=get_transform(train=False))


In [None]:
print(f"[INFO] found {len(trainDS)} examples in the training set...")
print(f"[INFO] found {len(testDS)} examples in the test set...")

[INFO] found 100 examples in the training set...
[INFO] found 50 examples in the test set...


In [None]:
from torch.utils.data import DataLoader

trainLoader = DataLoader(trainDS, shuffle=True, batch_size=16)
testLoader = DataLoader(testDS, shuffle=False, batch_size=16)
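
The loaders above stack fixed-size image and mask tensors, which fits the segmentation task. For detection, each image has a different number of boxes, so the default batching cannot stack the targets. A common workaround is a custom `collate_fn` that keeps images and targets as lists. The sketch below uses a hypothetical `ToyDetectionDataset` with made-up boxes purely to show the mechanism; the target layout (a dict with `boxes` and `labels`) follows the convention of the torchvision detection models and is an assumption about the model that will be used.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDetectionDataset(Dataset):
    """Hypothetical dataset: sample i has box_counts[i] dummy boxes."""
    def __init__(self, box_counts):
        self.box_counts = box_counts
    def __len__(self):
        return len(self.box_counts)
    def __getitem__(self, idx):
        n = self.box_counts[idx]
        image = torch.zeros(3, 32, 32)
        # n copies of one dummy box in [x_min, y_min, x_max, y_max] format
        boxes = torch.tensor([[0.0, 0.0, 4.0, 4.0]]).repeat(n, 1)
        target = {"boxes": boxes, "labels": torch.ones(n, dtype=torch.int64)}
        return image, target

def detection_collate(batch):
    # keep images and targets as lists instead of stacking them
    images, targets = zip(*batch)
    return list(images), list(targets)

loader = DataLoader(ToyDetectionDataset([1, 3, 2]), batch_size=3,
                    shuffle=False, collate_fn=detection_collate)
images, targets = next(iter(loader))
print([t["boxes"].shape for t in targets])
```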