## Feature extraction Cpfl1 - napari_simpleitk_image_processing

The dataset contains images of Cpfl1 animals and corresponding wild-type (WT) animals at the postnatal ages 8, 14, 20, 30, 70, and 245, encoded in the file names as `p08`, `p14`, `p20`, `p30`, `p70`, and `p245`.

The workflow for extracting features with the `label_statistics` function of `napari_simpleitk_image_processing` was already demonstrated [here].

This notebook applies that workflow to all images in the folder by using a for loop.

In [1]:
import apoc
import numpy as np
import os
import pandas as pd
import pyclesperanto_prototype as cle

from napari_simpleitk_image_processing import label_statistics

import sys
sys.path.append("../..")
from quapos_lm import rescale_image, rescale_segmentation, predict_image

In [2]:
# Load the classifier
quapos_lm = apoc.ObjectSegmenter(opencl_filename="../../01-training-and-validation/02-quapos-lm.cl")
quapos_lm.feature_importances()

{'gaussian_blur=1': 0.32557488170342097,
 'difference_of_gaussian=1': 0.4231073391932076,
 'laplace_box_of_gaussian_blur=1': 0.25131777910337144}
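The importances printed above sum to 1, so they can be read as the relative contribution of each filter to the classifier. A small sketch that ranks them as percentages; the dictionary is copied by hand from the output above rather than queried from `quapos_lm`:

```python
import math

# Feature importances copied from the output of quapos_lm.feature_importances()
importances = {
    "gaussian_blur=1": 0.32557488170342097,
    "difference_of_gaussian=1": 0.4231073391932076,
    "laplace_box_of_gaussian_blur=1": 0.25131777910337144,
}

# The printed values add up to (almost exactly) 1
total = sum(importances.values())
print(math.isclose(total, 1.0))

# Rank filters from most to least important and show them as percentages
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value * 100:.1f} %")
```

In this classifier the difference-of-Gaussian feature carries the largest share, followed by the Gaussian blur.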

In [3]:
# Define file path for images
images = "../../data/02-data-for-pixel-classifier/cpfl-wt-comparison/b-s-ops-images-cropped/"

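# List the .tif files in the folder; sorting makes the order, and therefore image_id, reproducible
file_list = sorted(f for f in os.listdir(images) if f.endswith(".tif"))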
print(file_list)

['C2-cpfl-p08-1.3-20x-flo.tif', 'C2-cpfl-p08-2.3-20x-flo.tif', 'C2-cpfl-p08-3.3-20x-flo.tif', 'C2-cpfl-p14-1.3-20x-flo.tif', 'C2-cpfl-p14-2.3-20x-flo.tif', 'C2-cpfl-p14-3.3-20x-flo.tif', 'C2-cpfl-p20-1.3-20x-flo.tif', 'C2-cpfl-p20-2.3-20x-flo.tif', 'C2-cpfl-p20-3.3-20x-flo.tif', 'C2-cpfl-p245-1.3-20x-flo.tif', 'C2-cpfl-p245-2.3-20x-flo.tif', 'C2-cpfl-p245-3.3-20x-flo.tif', 'C2-cpfl-p245-4.3-20x-flo.tif', 'C2-cpfl-p30-1.4-20x-flo.tif', 'C2-cpfl-p30-2.4-20x-flo.tif', 'C2-cpfl-p30-3.4-20x-flo.tif', 'C2-cpfl-p70-1.4-20x-flo.tif', 'C2-cpfl-p70-2.4-20x-flo.tif', 'C2-cpfl-p70-3.4-20x-flo.tif', 'C2-wt-p08-1.4-20x-flo.tif', 'C2-wt-p08-2.4-20x-flo.tif', 'C2-wt-p08-3.4-20x-flo.tif', 'C2-wt-p08-4.4-20x-flo.tif', 'C2-wt-p14-1.4-20x-flo.tif', 'C2-wt-p14-2.4-20x-flo.tif', 'C2-wt-p14-3.4-20x-flo.tif', 'C2-wt-p14-4.4-20x-flo.tif', 'C2-wt-p20-1.4-20x-flo.tif', 'C2-wt-p20-1.4-20x-suse.tif', 'C2-wt-p20-2.4-20x-flo.tif', 'C2-wt-p20-2.4-20x-suse.tif', 'C2-wt-p245-1.1-20x-flo.tif', 'C2-wt-p245-2.1-20x-flo.ti
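Because `image_id` is later taken from the position of each file in `file_list`, and `os.listdir` returns entries in an arbitrary, filesystem-dependent order, it helps to keep only image files and sort them. A minimal sketch of such a filter; the helper name `select_tif_files` is hypothetical and not part of `quapos_lm`:

```python
def select_tif_files(names):
    """Return only .tif file names, sorted so that enumerate() yields stable image ids."""
    # Hypothetical helper: drops non-image entries such as hidden system files
    return sorted(name for name in names if name.lower().endswith(".tif"))


# Example with a small, made-up directory listing in arbitrary order
listing = ["C2-wt-p08-1.4-20x-flo.tif", ".DS_Store", "C2-cpfl-p08-1.3-20x-flo.tif"]
selected = select_tif_files(listing)
print(selected)  # ['C2-cpfl-p08-1.3-20x-flo.tif', 'C2-wt-p08-1.4-20x-flo.tif']
```

In the notebook this would be used as `file_list = select_tif_files(os.listdir(images))`.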

### Extract features

Next, a for loop extracts the features from every image. Information from the file name is used to add the columns `age` and `genotype`, and the index of the file in `file_list` is added as the column `image_id`.
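The metadata columns rely on every file name following the pattern `C2-<genotype>-p<age>-...` seen in the list above. A sketch of the same parsing as a standalone function, plus a count of images per genotype and age as a quick check of the dataset composition; `parse_file_name` is a hypothetical helper, and the example names are taken from the printed file list:

```python
from collections import Counter


def parse_file_name(file_name):
    """Extract genotype and age from names like 'C2-cpfl-p08-1.3-20x-flo.tif'."""
    # Assumes '-' separates channel, genotype, age (prefixed with 'p'), animal id, ...
    parts = file_name.split("-")
    genotype = parts[1]
    age = int(parts[2].replace("p", ""))  # 'p08' -> 8, 'p245' -> 245
    return genotype, age


examples = [
    "C2-cpfl-p08-1.3-20x-flo.tif",
    "C2-cpfl-p245-4.3-20x-flo.tif",
    "C2-wt-p20-1.4-20x-flo.tif",
    "C2-wt-p20-1.4-20x-suse.tif",
]

# Number of images per (genotype, age) group
counts = Counter(parse_file_name(name) for name in examples)
print(counts)
```

Running the same count over the full `file_list` shows how many images each genotype and age group contributes.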

In [4]:
# Collect one feature table (pandas DataFrame) per image in a list
features = []

# Loop over the image folder
for i, file_name in enumerate(file_list):
    
    # Load the image 
    image = cle.imread(os.path.join(images, file_name))
    
    # Predict the image
    prediction = predict_image(image=image, classifier=quapos_lm)
    
    # Rescale original image
    image_rescaled = rescale_image(image=image, voxel_x=0.323, voxel_y=0.323, voxel_z=0.490)
    
    # Rescale the prediction
    prediction_rescaled = rescale_segmentation(segmentation=prediction, voxel_x=0.323, voxel_y=0.323, voxel_z=0.490)
    
    # Extract features with label_statistics from napari_simpleitk_image_processing;
    # all available feature groups are enabled
    features_i = label_statistics(
        intensity_image=image_rescaled,
        label_image=prediction_rescaled,
        size=True,
        intensity=True,
        perimeter=True,
        shape=True,
        position=True,
        moments=True)
    
    # Add information from the filename into corresponding columns
    # Split the filename
    file_name_split = file_name.split("-")
    
    # Add the age (number after "p" in the third filename part) as a column
    age = int(file_name_split[2].replace("p", ""))
    features_i["age"] = age
    
    # Add the image_id (position of the file in file_list) as a column
    features_i["image_id"] = i
    
    # Add the genotype (second filename part) as a column
    features_i["genotype"] = file_name_split[1]
    
    # Append the feature table of the current image to the list
    features.append(features_i)
    
# Concatenate all per-image tables into one dataframe
features = pd.concat(features)
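
Each metadata column holds one value per image, so it can be set with a scalar assignment that pandas broadcasts to every row of the table. The sketch below demonstrates this together with the behavior of `pd.concat` on two small made-up tables standing in for the output of `label_statistics`; the numbers are illustrative only:

```python
import pandas as pd

# Two made-up per-image tables standing in for label_statistics() output
tables = [
    pd.DataFrame({"label": [1, 2, 3], "mean": [189.8, 207.3, 176.0]}),
    pd.DataFrame({"label": [1, 2], "mean": [224.5, 193.3]}),
]
metadata = [("cpfl", 8), ("wt", 70)]

annotated = []
for i, (table, (genotype, age)) in enumerate(zip(tables, metadata)):
    # A scalar assignment fills the whole column with the same value
    table["age"] = age
    table["image_id"] = i
    table["genotype"] = genotype
    annotated.append(table)

# Default concat keeps each table's own index, so it restarts at 0 per image
stacked = pd.concat(annotated)
# ignore_index=True builds a fresh, continuous index instead
stacked_flat = pd.concat(annotated, ignore_index=True)

print(stacked.index.tolist())       # [0, 1, 2, 0, 1]
print(stacked_flat.index.tolist())  # [0, 1, 2, 3, 4]
```

This is why the index of the combined dataframe restarts for every image. Since the table is saved with `to_csv(..., index=False)`, the index is dropped either way.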

In [5]:
# Show the dataframe
features

| | label | maximum | mean | median | minimum | sigma | sum | variance | bbox_0 | bbox_1 | ... | principal_axes5 | principal_axes6 | principal_axes7 | principal_axes8 | principal_moments0 | principal_moments1 | principal_moments2 | age | image_id | genotype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 250.0 | 189.793103 | 184.144531 | 157.0 | 20.305487 | 5504.0 | 412.312808 | 3 | 372 | ... | -0.095507 | 6.478347e-02 | 0.070207 | -0.995427 | 4.109826e-01 | 0.518157 | 1.898447 | 8 | 0 | cpfl |
| 1 | 2 | 334.0 | 207.273684 | 202.871094 | 133.0 | 42.706679 | 19691.0 | 1823.860470 | 18 | 373 | ... | -0.321833 | -2.100729e-02 | 0.322695 | -0.946270 | 5.700545e-01 | 2.258945 | 3.391720 | 8 | 0 | cpfl |
| 2 | 3 | 182.0 | 176.000000 | 177.902344 | 154.0 | 12.308534 | 880.0 | 151.500000 | 50 | 356 | ... | -0.316228 | 8.987734e-14 | 0.316228 | -0.948683 | -6.462349e-27 | 0.200000 | 0.600000 | 8 | 0 | cpfl |
| 3 | 4 | 262.0 | 214.904762 | 215.355469 | 175.0 | 27.285353 | 4513.0 | 744.490476 | 58 | 342 | ... | 0.057032 | 7.949138e-02 | -0.065368 | 0.994690 | 3.190225e-01 | 0.392635 | 1.660225 | 8 | 0 | cpfl |
| 4 | 5 | 311.0 | 195.171429 | 190.386719 | 118.0 | 36.997206 | 6831.0 | 1368.793277 | 58 | 349 | ... | -0.004001 | -4.202444e-03 | 0.003170 | -0.999986 | 4.103891e-01 | 0.608340 | 2.199230 | 8 | 0 | cpfl |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 177 | 178 | 416.0 | 224.548673 | 213.832031 | 127.0 | 62.839505 | 25374.0 | 3948.803413 | 1358 | 121 | ... | 0.262584 | -2.516684e-01 | -0.093339 | -0.963302 | 1.293494e+00 | 1.692919 | 3.117589 | 70 | 39 | wt |
| 178 | 179 | 216.0 | 193.250000 | 189.738281 | 177.0 | 16.357975 | 773.0 | 267.583333 | 1376 | 133 | ... | 0.316228 | -0.000000e+00 | -0.316228 | -0.948683 | 0.000000e+00 | 0.125000 | 0.750000 | 70 | 39 | wt |
| 179 | 180 | 481.0 | 251.469466 | 237.925781 | 145.0 | 64.019970 | 65885.0 | 4098.556535 | 1381 | 162 | ... | -0.785786 | -2.772161e-01 | 0.893035 | -0.354458 | 2.274351e+00 | 4.376522 | 11.201424 | 70 | 39 | wt |
| 180 | 181 | 289.0 | 198.567568 | 195.761719 | 131.0 | 37.774051 | 14694.0 | 1426.878934 | 1385 | 182 | ... | -0.146190 | 3.010862e-02 | 0.144831 | -0.988998 | 6.047656e-01 | 1.706057 | 2.747979 | 70 | 39 | wt |


In [6]:
features.to_csv("../../measurements/01-cpfl-simpleitk.csv", index=False)
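
Once saved, the table can be reloaded with `pd.read_csv` for the comparison between genotypes. As a minimal sanity check, the sketch below counts segmented objects per image and averages that count per genotype and age. It uses a tiny in-memory CSV with made-up numbers in place of `01-cpfl-simpleitk.csv`; only the column names `label`, `age`, `image_id`, and `genotype` are taken from this notebook:

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for ../../measurements/01-cpfl-simpleitk.csv
csv_text = """label,mean,age,image_id,genotype
1,189.8,8,0,cpfl
2,207.3,8,0,cpfl
3,176.0,8,0,cpfl
1,201.2,8,1,cpfl
2,198.0,8,1,cpfl
1,224.5,70,39,wt
2,193.3,70,39,wt
"""
loaded = pd.read_csv(io.StringIO(csv_text))

# Number of segmented objects (rows) in each image
objects_per_image = loaded.groupby(["genotype", "age", "image_id"]).size()

# Mean object count per image, summarized per genotype and age
summary = objects_per_image.groupby(level=["genotype", "age"]).mean()
print(summary)
```

With the real file, `loaded = pd.read_csv("../../measurements/01-cpfl-simpleitk.csv")` replaces the in-memory example.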