# 3D feature extraction of Cell Painting performed on organoids 

This notebook explains and walks through which features are extracted and how they are extracted.
These are 3D features extracted from 3D image sets, where an image set is a collection of 3D images of the same object, each acquired using a different light spectrum (channel).
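To make the idea of an image set concrete, here is a minimal sketch that represents one as a dictionary mapping a channel name to a 3D NumPy array, which we assume is indexed as (z, y, x). The channel names are assumptions: `AGP` appears in the column names of the Nuclei table loaded below, while `DNA`, `ER`, `Mito`, and `RNA` are other stains commonly used in Cell Painting and may not match this project's naming. The `make_image_set` helper is hypothetical and fills each volume with random values.

```python
import numpy as np

# Hypothetical channel names (assumption); only "AGP" is confirmed by the table below
CHANNELS = ("DNA", "ER", "AGP", "Mito", "RNA")


def make_image_set(shape=(10, 64, 64), seed=0):
    """Build a toy image set: one 3D (z, y, x) volume per channel, all the same shape."""
    rng = np.random.default_rng(seed)
    return {name: rng.random(shape, dtype=np.float32) for name in CHANNELS}


image_set = make_image_set()
# Every channel images the same object, so all volumes share a single shape
assert len({img.shape for img in image_set.values()}) == 1
print({name: img.shape for name, img in image_set.items()})
```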

An overview of how the features are extracted is shown in the following diagram:
![Feature Extraction Overview](../diagram/Featuization.png)

## The process
Featurization is adapted from the CellProfiler pipeline approach and standard image-based profiling practices.
The 3D image sets are segmented into objects:
* Organoid
* Nucleus
* Cell
* Cytoplasm
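The segmentation step itself is not shown in this notebook. As an illustration of how these objects relate, a cytoplasm object is often derived by removing the nucleus from its cell, which is what CellProfiler's IdentifyTertiaryObjects module does; whether this pipeline derives the cytoplasm the same way is an assumption. The sketch assumes the cell and nucleus segmentations are integer label volumes of the same (z, y, x) shape with 0 as background, and `cytoplasm_from_masks` is a hypothetical name.

```python
import numpy as np


def cytoplasm_from_masks(cell_labels: np.ndarray, nucleus_labels: np.ndarray) -> np.ndarray:
    """Cytoplasm labels as cell voxels that are not nucleus voxels.

    Each cytoplasm object keeps the label of its parent cell; 0 is background.
    """
    if cell_labels.shape != nucleus_labels.shape:
        raise ValueError("cell and nucleus label volumes must have the same shape")
    cytoplasm = cell_labels.copy()  # do not modify the caller's array
    cytoplasm[nucleus_labels > 0] = 0  # carve the nuclei out of the cells
    return cytoplasm


# Toy 1 x 5 x 5 volume: one 3x3 cell (label 1) containing a 1-voxel nucleus
cell = np.zeros((1, 5, 5), dtype=np.int32)
cell[0, 1:4, 1:4] = 1
nucleus = np.zeros_like(cell)
nucleus[0, 2, 2] = 1
cyto = cytoplasm_from_masks(cell, nucleus)
print(int((cyto == 1).sum()))  # 8 voxels: the 9-voxel cell minus its nucleus
```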

Features are then extracted from these objects, where a feature is a measurement computed within a segmented object.
Feature extraction is performed with Python libraries such as:
* [scikit-image](https://scikit-image.org/)
* [scipy](https://www.scipy.org/)
* [mahotas](https://mahotas.readthedocs.io/en/latest/)
* [numpy](https://numpy.org/)
* [cupy](https://docs.cupy.dev/en/stable/)
* [cucim](https://docs.rapids.ai/api/cucim/stable/)
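As an illustration of "a measurement within the segmented object", the sketch below computes the mean intensity of one channel inside each labelled object of a 3D label volume using only NumPy. It assumes non-negative integer labels with 0 as background; `mean_intensity_per_object` is a hypothetical helper, not the pipeline's own implementation. SciPy offers the same measurement as `scipy.ndimage.mean(input, labels, index)`.

```python
from __future__ import annotations

import numpy as np


def mean_intensity_per_object(intensity: np.ndarray, labels: np.ndarray) -> dict[int, float]:
    """Mean intensity of one channel inside each labelled object.

    `labels` has the same shape as `intensity`; label 0 is background and skipped.
    """
    flat_labels = labels.ravel()
    sums = np.bincount(flat_labels, weights=intensity.ravel())  # summed intensity per label
    counts = np.bincount(flat_labels)  # voxel count per label
    return {
        int(label): float(sums[label] / counts[label])
        for label in np.unique(flat_labels)
        if label != 0
    }


# Toy (z, y, x) volume with two objects
intensity = np.arange(18, dtype=np.float64).reshape(2, 3, 3)
labels = np.zeros((2, 3, 3), dtype=np.int64)
labels[0, :, :2] = 1  # 6 voxels in the first z-slice
labels[1, 1:, 1:] = 2  # 4 voxels in the second z-slice
print(mean_intensity_per_object(intensity, labels))  # {1: 3.5, 2: 15.0}
```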

The code is adapted from [CellProfiler](https://cellprofiler.org/) and trimmed down to a more functional (as opposed to object-oriented) style.
It also reduces the number of lists it declares, using generators instead.
The code is written to be distributed across multiple CPU cores, with separate image sets run on separate CPUs.
The idea is that each feature extraction method for a single image set, object, and channel combination runs on its own CPU core.
When a compute cluster is not available, the code can instead run on a single GPU for improved performance.
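The per-core scheme described above can be sketched with a generator that yields every (image set, object, channel, method) combination and a `concurrent.futures.ProcessPoolExecutor` that hands each combination to a worker process. The constants and `run_one` are hypothetical placeholders; the actual pipeline may dispatch jobs differently (for example, through a cluster scheduler), and the GPU path is not shown.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor

# Hypothetical inputs (assumptions); the real pipeline's names and values may differ
IMAGE_SETS = ("C4-2",)
OBJECTS = ("Organoid", "Nucleus", "Cell", "Cytoplasm")
CHANNELS = ("DNA", "AGP")
METHODS = ("AreaSizeShape", "Intensity")


def combinations():
    """Yield every (image set, object, channel, method) combination."""
    yield from itertools.product(IMAGE_SETS, OBJECTS, CHANNELS, METHODS)


def run_one(combo):
    """Placeholder for one feature extraction job; returns a job identifier."""
    image_set, obj, channel, method = combo
    return f"{image_set}/{obj}/{channel}/{method}"


if __name__ == "__main__":
    # Each combination becomes a separate task for the pool of worker processes
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_one, combinations()))
    print(len(results))  # 1 * 4 * 2 * 2 = 16
```

Note that `Executor.map` collects its input iterable immediately rather than lazily, so in this sketch the generator mainly keeps the combination logic in one place rather than saving memory.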

So what are the features we are extracting?

## Feature types
While some feature types are quite intuitive, others are not.
### AreaSizeShape
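As a concrete example of this feature type, the sketch below computes a few size and position measurements for a single object in a 3D label volume using NumPy. The key names mirror the `Area.Size.Shape_*` column suffixes in the Nuclei table loaded below (`VOLUME`, `CENTER.X`, `BBOX.VOLUME`, `MIN.X`, `MAX.X`), but their exact definitions here are assumptions: volume is a voxel count, the center is the mean voxel coordinate, and `MAX` is an exclusive upper bound (the half-open convention of scikit-image's `regionprops` `bbox`). `area_size_shape` is a hypothetical helper, not the pipeline's function.

```python
from __future__ import annotations

import numpy as np


def area_size_shape(labels: np.ndarray, label: int) -> dict[str, float]:
    """A few size/position measurements for one object in a (z, y, x) label volume."""
    z, y, x = np.nonzero(labels == label)
    if z.size == 0:
        raise ValueError(f"label {label} not present")
    mins = (z.min(), y.min(), x.min())
    maxs = (z.max() + 1, y.max() + 1, x.max() + 1)  # exclusive upper bounds (assumption)
    return {
        "VOLUME": float(z.size),  # number of voxels in the object
        "CENTER.X": float(x.mean()),
        "CENTER.Y": float(y.mean()),
        "CENTER.Z": float(z.mean()),
        "BBOX.VOLUME": float(np.prod([hi - lo for lo, hi in zip(mins, maxs)])),
        "MIN.X": float(mins[2]),
        "MAX.X": float(maxs[2]),
    }


labels = np.zeros((3, 4, 5), dtype=np.int32)
labels[1, 1:3, 1:4] = 7  # a 1 x 2 x 3 block of 6 voxels
print(area_size_shape(labels, 7))
```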
### Colocalization
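Colocalization features relate two channels within the same object. One standard measure, and one of those reported by CellProfiler's MeasureColocalization module, is Pearson's correlation coefficient between the two channels' voxel intensities inside the object. The minimal NumPy sketch below assumes both channels are 3D arrays of the same shape and the object is given as a boolean mask; `pearson_within_mask` is hypothetical and returns NaN when either channel is constant inside the mask.

```python
import numpy as np


def pearson_within_mask(channel_a: np.ndarray, channel_b: np.ndarray, mask: np.ndarray) -> float:
    """Pearson correlation of two channels' voxel intensities inside a boolean mask."""
    a = channel_a[mask].astype(np.float64)  # copies, so the inputs are not modified
    b = channel_b[mask].astype(np.float64)
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else float("nan")


rng = np.random.default_rng(0)
a = rng.random((4, 8, 8))
mask = np.zeros(a.shape, dtype=bool)
mask[1:3, 2:6, 2:6] = True  # a small box-shaped "object"
print(pearson_within_mask(a, 2 * a + 1, mask))  # linearly related channels: close to 1.0
```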
### Granularity
### Intensity
### Neighbors
### Texture

In [3]:
import pathlib

import duckdb

In [None]:
path_to_db = pathlib.Path(
    "../../4.processing_image_based_profiles/results/converted_profiles/C4-2/C4-2.sqlite"
).resolve(strict=True)
# open the profile database with DuckDB
conn = duckdb.connect(path_to_db)

# read the full Nuclei table into a pandas DataFrame and keep its column names
df = conn.execute("SELECT * FROM Nuclei").fetchdf()
columns = df.columns
# display the DataFrame
df


| Unnamed: 0 | object_id | image_set | Area.Size.Shape_Nuclei_AGP_VOLUME | Area.Size.Shape_Nuclei_AGP_CENTER.X | Area.Size.Shape_Nuclei_AGP_CENTER.Y | Area.Size.Shape_Nuclei_AGP_CENTER.Z | Area.Size.Shape_Nuclei_AGP_BBOX.VOLUME | Area.Size.Shape_Nuclei_AGP_MIN.X | Area.Size.Shape_Nuclei_AGP_MAX.X | Area.Size.Shape_Nuclei_AGP_MIN.Y | ... | Texture_Difference.Entropy_256.1 | Texture_Difference.Variance_256.1 | Texture_Entropy_256.1 | Texture_Information.Measure.of.Correlation.1_256.1 | Texture_Information.Measure.of.Correlation.2_256.1 | Texture_Inverse.Difference.Moment_256.1 | Texture_Sum.Average_256.1 | Texture_Sum.Entropy_256.1 | Texture_Sum.Variance_256.1 | Texture_Variance_256.1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | C4-2 | 5180.0 | 512.400579 | 245.278958 | 0.0 | 7832.0 | 465.0 | 554.0 | 205.0 | ... | 0.002525 | 0.00389 | 0.003550 | -0.739922 | 0.064276 | 0.999876 | 0.032338 | 0.003386 | 6.364896 | 1.731823 |
| 1 | 1 | C4-2 | 5180.0 | 512.400579 | 245.278958 | 0.0 | 7832.0 | 465.0 | 554.0 | 205.0 | ... | 0.002507 | 0.00389 | 0.003570 | -0.687041 | 0.060885 | 0.999863 | 0.046515 | 0.003376 | 12.633023 | 3.421752 |
| 2 | 1 | C4-2 | 5180.0 | 512.400579 | 245.278958 | 0.0 | 7832.0 | 465.0 | 554.0 | 205.0 | ... | 0.003545 | 0.00389 | 0.004544 | -0.628943 | 0.064302 | 0.999826 | 0.027326 | 0.003985 | 5.047603 | 1.340597 |
| 3 | 1 | C4-2 | 5180.0 | 512.400579 | 245.278958 | 0.0 | 7832.0 | 465.0 | 554.0 | 205.0 | ... | 0.003326 | 0.00389 | 0.004003 | -0.698913 | 0.065316 | 0.999835 | 0.044238 | 0.003733 | 11.403296 | 3.112801 |
| 4 | 1 | C4-2 | 5180.0 | 512.400579 | 245.278958 | 0.0 | 7832.0 | 465.0 | 554.0 | 205.0 | ... | 0.003332 | 0.00389 | 0.004292 | -0.625617 | 0.062248 | 0.999843 | 0.012383 | 0.003758 | 1.376855 | 0.422694 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 101675 | 255 | C4-2 |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 101676 | 255 | C4-2 |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 101677 | 255 | C4-2 |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 101678 | 255 | C4-2 |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
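With several feature types stored side by side, it can help to count how many columns each one contributes. A minimal sketch, assuming the feature type is the text before the first underscore (which holds for the `Area.Size.Shape_...` and `Texture_...` columns shown above) and that `Unnamed: 0`, `object_id`, and `image_set` are the only metadata columns; the column list is a hard-coded subset of the table above, and in the notebook you would pass `df.columns` instead. `feature_type` and `METADATA_COLUMNS` are hypothetical names.

```python
from collections import Counter

# Subset of the Nuclei column names shown in the table above
columns = [
    "Unnamed: 0",
    "object_id",
    "image_set",
    "Area.Size.Shape_Nuclei_AGP_VOLUME",
    "Area.Size.Shape_Nuclei_AGP_CENTER.X",
    "Texture_Entropy_256.1",
    "Texture_Variance_256.1",
]

# Columns that identify a row rather than measure it (assumption)
METADATA_COLUMNS = {"Unnamed: 0", "object_id", "image_set"}


def feature_type(column: str) -> str:
    """Return the feature-type prefix: the text before the first underscore."""
    return column.split("_", 1)[0]


counts = Counter(feature_type(c) for c in columns if c not in METADATA_COLUMNS)
print(counts)
```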
