# Plankton Identifier - Classification from plankton imager

This notebook runs a CNN to classify plankton in images from the plankton imager. The problem is treated as a classification task, with the labels taken from the names of the folders that contain the training images. The notebook then makes predictions (i.e., runs inference) on unseen images and stores the predictions for every subfolder in a separate CSV file.
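As a rough illustration of "labels from folder names", the sketch below derives a label from each image's parent folder (which is what fastai's `parent_label` returns) and builds a sorted class list with a label-to-index lookup. The helper names `label_from_path` and `build_vocab` are hypothetical, and the alphabetical, case-sensitive ordering mirrors the vocabulary printed at the end of this notebook (for example, `dinoflagellate_Tripos` sorts before `dinoflagellate_noctiluca_fragment`).

```python
from pathlib import Path


def label_from_path(path):
    """Label = name of the folder that contains the image (what fastai's parent_label returns)."""
    return Path(path).parent.name


def build_vocab(paths):
    """Return the sorted list of unique labels and a label -> index lookup (hypothetical helper)."""
    vocab = sorted({label_from_path(p) for p in paths})  # case-sensitive: uppercase sorts first
    return vocab, {label: i for i, label in enumerate(vocab)}
```

For example, `build_vocab(["d/copepoda/1.tif", "d/bubbles/2.tif"])` gives `['bubbles', 'copepoda']` with `copepoda` at index 1.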

## Import modules and set parameters

In [1]:
# Import necessary libraries
import fastai
from fastai.vision.all import *
print ('You are using FastAI version: ' + fastai.__version__)
# FastAI version in counting paper: 2.0.13

You are using FastAI version: 2.7.13


In [2]:
filename = 'Plankton_imager_02_v03c' # Insert your filename, for repeated experiments with different training hyperparameters (see below)
bs = 600 # Insert the highest working batch size here (limited by hardware)
np.random.seed(3)

NOTE TO SELF:

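When a class has only one image (n = 1), an error is raised if that image ends up in the validation set instead of the training set, because the class vocabulary is built from the training set only. An easy workaround is to change the random seed and hope the image lands in the training set this time.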
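Instead of re-rolling the seed, one option is a per-class (stratified) split that never moves the last image of a class into the validation set. The sketch below is a hypothetical helper, not part of fastai; it assumes labels come from the parent folder name, as with `parent_label`. Because a fastai `DataBlock` accepts any function that maps the items to (train indices, validation indices) as its `splitter`, it could be plugged in with something like `splitter=lambda items: split_keep_rare_in_train(items)`.

```python
import random
from collections import defaultdict
from pathlib import Path


def split_keep_rare_in_train(paths, valid_pct=0.2, seed=3):
    """Return (train_idx, valid_idx) so that every class keeps at least one training image.

    Hypothetical stratified splitter: each class sends about valid_pct of its images to
    validation, but a class with a single image stays entirely in the training set.
    """
    rng = random.Random(seed)  # local RNG, so the split does not depend on global seeds
    by_class = defaultdict(list)
    for i, p in enumerate(paths):
        by_class[Path(p).parent.name].append(i)

    train_idx, valid_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]  # copy before shuffling
        rng.shuffle(idxs)
        # Never take the last remaining image of a class for validation
        n_valid = min(int(len(idxs) * valid_pct), len(idxs) - 1)
        valid_idx.extend(idxs[:n_valid])
        train_idx.extend(idxs[n_valid:])
    return sorted(train_idx), sorted(valid_idx)
```

The trade-off is that rare classes get no validation images at all, so their error rate is not measured during training.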

NB: bs = 20 used 2421 MiB of GPU memory (A100).
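The batch size at the top is meant to be the highest one that works on the hardware. A sketch of finding it systematically is a binary search over batch sizes, assuming a hypothetical callback `fits(bs)` that tries one batch at that size and returns `False` when it hits a CUDA out-of-memory error. It also assumes memory use grows with batch size, so if a batch size fits, every smaller one does too.

```python
def largest_working_batch_size(fits, upper):
    """Binary search for the largest batch size in [1, upper] for which fits(bs) is True.

    Assumes fits is monotone (if bs fits, every smaller bs fits). Returns 0 if bs=1 fails.
    """
    lo, hi = 0, upper  # invariant: lo is known to fit (or is 0); nothing above hi fits
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Each call to `fits` costs one trial run, so the search needs only about log2(upper) trials instead of trying every batch size.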

## Getting the data

In [3]:
# Get image paths
path = Path('data/train_WMR_PI10_learning_set_v4');
path.ls()

(#67) [Path('data/train_WMR_PI10_learning_set_v4/cnidaria_ephyrae'),Path('data/train_WMR_PI10_learning_set_v4/larvacea_tail_only'),Path('data/train_WMR_PI10_learning_set_v4/larvacea_Oikopleura_complete'),Path('data/train_WMR_PI10_learning_set_v4/crustacea_megalopa'),Path('data/train_WMR_PI10_learning_set_v4/larvacea_Friitilaria_complete'),Path('data/train_WMR_PI10_learning_set_v4/copepoda_nauplii'),Path('data/train_WMR_PI10_learning_set_v4/echinoderm_stars_echino'),Path('data/train_WMR_PI10_learning_set_v4/copepoda_monstrilloidae'),Path('data/train_WMR_PI10_learning_set_v4/balanoid_nauplius'),Path('data/train_WMR_PI10_learning_set_v4/crustacea_exuvium')...]

In [4]:
fnames = get_image_files(path)
fnames[:3]

(#3) [Path('data/train_WMR_PI10_learning_set_v4/cnidaria_ephyrae/pia7.2024-01-29.1350+N00029030.tif'),Path('data/train_WMR_PI10_learning_set_v4/cnidaria_ephyrae/pia7.2024-01-11.1820+N00018671.tif'),Path('data/train_WMR_PI10_learning_set_v4/cnidaria_ephyrae/pia7.2024-01-11.0300+N00035250.tif')]
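`get_image_files` searches the folder recursively and keeps only files with image extensions. A minimal standard-library sketch of the same idea follows; the fixed extension set is an assumption (fastai derives its list from the `mimetypes` module), and `list_image_files` is a hypothetical name. Because any `.tif` file anywhere under the root is returned, files such as `Background.tif` are picked up as well.

```python
from pathlib import Path

# Assumed extension set; fastai builds its own list from the mimetypes module instead
IMAGE_EXTS = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def list_image_files(root):
    """Recursively collect image files under root, sorted for a reproducible order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS  # case-insensitive extension check
    )
```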

## Creating a dataset

In [5]:
# Create dataset

block = DataBlock(blocks=(ImageBlock, CategoryBlock), # for regression, replace CategoryBlock with RegressionBlock
                  splitter=RandomSplitter(),
                  get_items = get_image_files, 
                  get_y = parent_label,
                  #item_tfms=Resize((770,1040), method='squish'), 
                  item_tfms=Resize(300, ResizeMethod.Pad, pad_mode='zeros'), # see page 73 book
                  batch_tfms=[*aug_transforms( #https://docs.fast.ai/vision.augment#aug_transforms
                      mult=1.0, 
                      do_flip=True,
                      flip_vert=True,
                      max_rotate=0.2, # in degrees, so this allows only very slight rotations
                      min_zoom=1.0,
                      max_zoom=1.1,
                      max_lighting=0.3,
                      max_warp=0.1,
                      p_affine=0.5,
                      p_lighting=0.5,
                      pad_mode='zeros'), 
                              Normalize.from_stats(*imagenet_stats)] 
                 )
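`Resize(300, ResizeMethod.Pad, pad_mode='zeros')` keeps the aspect ratio and fills the leftover area with zeros instead of squishing the image. Conceptually this is a "letterbox": scale the longer side to 300 pixels and pad the shorter side evenly. The sketch below computes that geometry; `pad_resize_geometry` is a hypothetical helper, and fastai's own implementation (which crops/pads and then resizes) may differ from it by a pixel of rounding.

```python
def pad_resize_geometry(width, height, target=300):
    """Return the scaled (w, h) and the (left, top, right, bottom) zero padding
    needed to fit a width x height image into a target x target square."""
    scale = target / max(width, height)  # the longer side becomes exactly `target`
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2  # split padding evenly; any odd pixel goes right/bottom
    return (new_w, new_h), (left, top, pad_w - left, pad_h - top)
```

For a hypothetical 1040 x 770 pixel image, this gives a 300 x 222 image with 39 rows of zeros above and below.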

In [6]:
#dls = block.dataloaders(df,bs=bs)
dls = block.dataloaders(path, bs=bs)

## Define model

In [7]:
# Create Learner with error_rate as the metric
learn = vision_learner(dls, resnet50, #loss_func=Huber(), # different ResNets and loss functions can be specified here
                    metrics=error_rate
                   ); # creates pretrained model
learn.model = torch.nn.DataParallel(learn.model) # Parallelizes computations over multiple GPUs

print('This is Plankton Identifier version: ' + filename) # See top
print('The batchsize is set at:', bs) # See top
print('The loss function is:', learn.loss_func) # Double check current loss func

This is Plankton Identifier version: Plankton_imager_02_v03c
The batchsize is set at: 600
The loss function is: FlattenedLoss of CrossEntropyLoss()
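The loss function printed above is cross-entropy: it applies a softmax to the model's raw outputs (logits) and takes the negative log-probability of the true class. During inference, fastai's `get_preds` applies the matching activation (softmax), so the predictions saved later are per-class probabilities that sum to 1 for each image. A plain-Python sketch of both steps for a single image:

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy loss for one sample: -log of the softmax probability of the target class."""
    return -math.log(softmax(logits)[target])
```

With all logits equal over k classes, every class gets probability 1/k and the loss is log(k), which is roughly what an untrained classifier scores.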


In [8]:
learn.load('Plankton_imager_02_v03_stage-2_Best')

<fastai.learner.Learner at 0x15220a0cb850>

# Inference

### Set up images for predictions

In [9]:
# Path to the images to classify (one subfolder per sampling day)
PathImgs = Path('../Project003a_Plankton_imager/data_tar')

In [10]:
%%time

# Loop over all subdirectories in the parent directory
for subdir in os.listdir(PathImgs):
    subdir_path = os.path.join(PathImgs, subdir)
    
    # Check if it's a directory
    if os.path.isdir(subdir_path):
        print(f"Started processing: {subdir_path}")

        # Get images to predict
        imgs = get_image_files(subdir_path)
        imgs.sort()
        
        # Check if the imgs list is empty
        if len(imgs) == 0:
            print(f"No images found in {subdir_path}")
            print("=================================================")
            continue  # Skip to the next directory
        
        print(f"{len(imgs)} images in {subdir}")
        print('Path first image: ' + str(imgs[0]))
        print('Path last image: ' + str(imgs[-1]))
        
        # Create image batch for predicting
        dl = learn.dls.test_dl(imgs)

        # Get predictions for image batch (preds holds the per-class probabilities)
        preds, _, label_numeric = learn.get_preds(dl=dl, with_decoded=True)
        print("Made preds")
        
        # Create table (i.e. dataframe) with predictions
        testdf = pd.DataFrame()  # creates empty table
        testdf['id'] = imgs.items  # adds filenames & paths to table
        testdf['label'] = label_numeric.numpy()  # adds predicted class index (see learn.dls.vocab) to table
        print("Created df")

        # Add the prediction (confidence) for every class; column i corresponds to learn.dls.vocab[i]
        probs = preds.numpy()  # convert once instead of on every loop iteration
        for i in range(probs.shape[1]):
            testdf[i] = probs[:, i]
        print("Added preds to df")
            
        # Export table to .csv file for downstream applications
        testdf.to_csv(subdir+'_Preds_v01c.csv', index=False, float_format='%.9f')
        print(f"Finished processing: {subdir_path}")
        print("=================================================")


Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-06
8208497 images in 2023-06-06
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-06/Background.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-06/untarred_1720/RawImages/pia6.2023-06-06.1720+N00052811.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-06
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-05
4375702 images in 2023-06-05
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-05/untarred_1004/RawImages/pia6.2023-06-05.1004+N00021501.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-05/untarred_1720/RawImages/pia6.2023-06-05.1720+N00043499.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-05
Started processing: ../Project003a_Plankton_imager/data_tar/2023-05-31
7498508 images in 2023-05-31
Path first image: ../Project003a_Plankton_imager/data_tar/2023-05-31/untarred_0346/RawImages/pia6.2023-05-31.0346+N00006160.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-05-31/untarred_1820/Background.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-05-31
Started processing: ../Project003a_Plankton_imager/data_tar/.ipynb_checkpoints
No images found in ../Project003a_Plankton_imager/data_tar/.ipynb_checkpoints
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-12
222789 images in 2023-06-12
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-12/untarred_1130/RawImages/pia6.2023-06-12.1130+N00063004.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-12/untarred_1330/RawImages/pia6.2023-06-12.1330+N00011330.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-12
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-13
7654596 images in 2023-06-13
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-13/untarred_0350/RawImages/pia6.2023-06-13.0350+N00029000.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-13/untarred_1630/RawImages/pia6.2023-06-13.1630+N00092459.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-13
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-08
11684096 images in 2023-06-08
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-08/untarred_0352/RawImages/pia6.2023-06-08.0352+N00014168.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-08/untarred_2350/RawImages/pia6.2023-06-08.2350+N00099999.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-08
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-02
2772399 images in 2023-06-02
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-02/untarred_0350/RawImages/pia6.2023-06-02.0350+N00017833.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-02/untarred_0820/RawImages/pia6.2023-06-02.0820+N00097332.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-02
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-09
7670009 images in 2023-06-09
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-09/untarred_0000/Background.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-09/untarred_1240/RawImages/pia6.2023-06-09.1240+N00070044.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-09
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-07
8133503 images in 2023-06-07
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-07/untarred_0344/RawImages/pia6.2023-06-07.0344+N00005476.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-07/untarred_1950/RawImages/pia6.2023-06-07.1950+N00001047.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-07
Started processing: ../Project003a_Plankton_imager/data_tar/2023-05-30
3715874 images in 2023-05-30
Path first image: ../Project003a_Plankton_imager/data_tar/2023-05-30/untarred_1203/RawImages/pia6.2023-05-30.1203+N00024166.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-05-30/untarred_1830/RawImages/pia6.2023-05-30.1830+N00071999.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-05-30
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-15
7924400 images in 2023-06-15
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-15/untarred_0350/RawImages/pia6.2023-06-15.0350+N00009198.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-15/untarred_1700/RawImages/pia6.2023-06-15.1700+N00033832.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-15
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-01
8581369 images in 2023-06-01
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-01/untarred_0406/RawImages/pia6.2023-06-01.0406+N00019335.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-01/untarred_1820/RawImages/pia6.2023-06-01.1820+N00064332.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-01
Started processing: ../Project003a_Plankton_imager/data_tar/2023-06-14
7247883 images in 2023-06-14
Path first image: ../Project003a_Plankton_imager/data_tar/2023-06-14/untarred_0354/RawImages/pia6.2023-06-14.0354+N00012396.tif
Path last image: ../Project003a_Plankton_imager/data_tar/2023-06-14/untarred_1600/RawImages/pia6.2023-06-14.1600+N00001665.tif


Made preds
Created df
Added preds to df
Finished processing: ../Project003a_Plankton_imager/data_tar/2023-06-14
CPU times: user 13h 30min 7s, sys: 1h 27min 30s, total: 14h 57min 37s
Wall time: 14h 13s
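Two things stand out in this log. First, `get_image_files` also returned `Background.tif` files (e.g. the first image of 2023-06-06 and 2023-06-09, and the last image of 2023-05-31), which appear to be background frames rather than plankton images; whether they should be classified is an assumption to check. Second, each day's predictions are held in memory at once: for the largest day (11,684,096 images), a float32 probability matrix over 67 classes (assuming one class per folder listed in the training set) takes about 3.1 GB. The sketch below shows two hypothetical helpers for both points: one drops background frames, the other splits the image list into fixed-size chunks.

```python
from pathlib import Path


def drop_background_frames(paths, name="Background.tif"):
    """Keep only paths whose file name is not the background frame (hypothetical filter)."""
    return [p for p in paths if Path(p).name != name]


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Inside the loop, one could run `learn.dls.test_dl(chunk)` and `learn.get_preds` per chunk and append each chunk's table with `to_csv(..., mode='a', header=False)`, writing the header only for the first chunk. Peak memory then depends on the chunk size rather than on the number of images per day.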


In [11]:
learn.dls.vocab

['amphioxus_larva', 'artefact_cleaning_fibre', 'artefact_long_line', 'balanoid_cypris', 'balanoid_nauplius', 'bryozoa_cyphonautes', 'bubbles', 'chaetognath', 'cladocera_evadne', 'cladocera_podon_pleopis', 'cnidaria-tentacle', 'cnidaria_ephyrae', 'cnidaria_hydromedusae', 'cnidaria_hydropolyp', 'copepoda', 'copepoda_monstrilloidae', 'copepoda_nauplii', 'crustacea_amphipoda', 'crustacea_caprella', 'crustacea_exuvium', 'crustacea_megalopa', 'crustacea_mysida', 'crustacea_zoea_shrimp', 'crutacea_zoea_crab', 'ctenophora', 'cumacea', 'detritus', 'diatom_loop', 'diatom_odontella', 'diatom_other', 'diatom_setae', 'diatom_solitary_centric', 'diatom_straight', 'dinoflagellate_Tripos', 'dinoflagellate_noctiluca_fragment', 'dinoflagellate_noctiluca_intact', 'dinoflagellate_pyrocystis', 'echinoderm_bipinnaria', 'echinoderm_branchiolaria', 'echinoderm_echinopluteus_type_1', 'echinoderm_echinopluteus_type_2', 'echinoderm_ophiopluteus', 'echinoderm_stars_asteridae', 'echinoderm_stars_echino', 'echinode
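In the CSV files written above, `label` holds the index of the predicted class, and the probability columns are named `0`, `1`, ... (pandas writes integer column names as text). Index i corresponds to `learn.dls.vocab[i]`. A standard-library sketch for turning a predictions file into readable rows follows; `named_predictions` is a hypothetical helper, and in practice you would pass `list(learn.dls.vocab)` and the contents of one `*_Preds_v01c.csv` file. Reading row by row avoids loading millions of rows at once.

```python
import csv
import io


def named_predictions(csv_text, vocab):
    """Yield (id, class name, probability of the predicted class) for each row of a predictions CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        idx = int(row["label"])  # predicted class index
        yield row["id"], vocab[idx], float(row[str(idx)])  # probability column named by the index
```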

In [None]:
# Some CUDA/GPU tests

def check_all_cuda_devices():
    device_count = torch.cuda.device_count()
    for i in range(device_count):
        print('>>>> torch.cuda.device({})'.format(i))
        result = torch.cuda.device(i)
        print(result, '\n')

        print('>>>> torch.cuda.get_device_name({})'.format(i))
        result = torch.cuda.get_device_name(i)
        print(result, '\n')


def check_cuda():
    print('>>>> torch.cuda.is_available()')
    result = torch.cuda.is_available()
    print(result, '\n')

    print('>>>> torch.cuda.device_count()')
    result = torch.cuda.device_count()
    print(result, '\n')

    print('>>>> torch.cuda.current_device()')
    result = torch.cuda.current_device()
    print(result, '\n')

    print('>>>> torch.cuda.device(0)')
    result = torch.cuda.device(0)
    print(result, '\n')

    print('>>>> torch.cuda.get_device_name(0)')
    result = torch.cuda.get_device_name(0)
    print(result, '\n')

    check_all_cuda_devices()


def check_cuda_ops():
    print('>>>> torch.zeros(2, 3)')
    zeros = torch.zeros(2, 3)
    print(zeros, '\n')

    print('>>>> torch.zeros(2, 3).cuda()')
    cuda_zero = torch.zeros(2, 3).cuda()
    print(cuda_zero, '\n')

    print('>>>> torch.tensor([[1, 2, 3], [4, 5, 6]]).cuda()')
    tensor_a = torch.tensor([[1, 2, 3], [4, 5, 6]]).cuda()
    print(tensor_a, '\n')

    print('>>>> tensor_a + cuda_zero')
    total = tensor_a + cuda_zero  # avoid shadowing the built-in sum()
    print(total, '\n')

    print('>>>> tensor_a * cuda_twos')
    tensor_a = tensor_a.to(torch.float)
    cuda_zero = cuda_zero.to(torch.float)
    cuda_twos = (cuda_zero + 1.0) * 2.0
    product = tensor_a * cuda_twos
    print(product, '\n')

    print('>>>> torch.matmul(tensor_a, cuda_twos.T)')
    mat_mul = torch.matmul(tensor_a, cuda_twos.T)
    print(mat_mul, '\n')

try:
    check_all_cuda_devices()
except Exception as e:
    print('check_all_cuda_devices() failed, exception message below:')
    print(e)

try:
    check_cuda()
except Exception as e:
    print('check_cuda() failed, exception message below:')
    print(e)

try:
    check_cuda_ops()
except Exception as e:
    print('check_cuda_ops() failed, exception message below:')
    print(e)