# Analyze coordinates

In [1]:
import os
import pandas as pd
import numpy as np
import bigfish
import bigfish.stack as stack
import bigfish.classification as classification
import bigfish.plot as plot
print("Big-FISH version: {0}".format(bigfish.__version__))

Big-FISH version: 0.6.2


In [2]:
# hard-code the path of our output directory
path_output = "../data/output"

In this notebook, we show examples to **compute features** for each cell. We reuse extracted results from the previous notebook: *6 - Extract cell level results*. We can read these results with a dedicated function `bigfish.stack.read_cell_extracted`.

In [3]:
# load single cell data
path = os.path.join(path_output, "results_cell_0.npz")
data = stack.read_cell_extracted(path, verbose=True)

cell_mask = data["cell_mask"]
print("cell mask")
print("\r shape: {0}".format(cell_mask.shape))
print("\r dtype: {0}".format(cell_mask.dtype), "\n")

nuc_mask = data["nuc_mask"]
print("nucleus mask")
print("\r shape: {0}".format(nuc_mask.shape))
print("\r dtype: {0}".format(nuc_mask.dtype), "\n")

rna_coord = data["rna_coord"]
print("RNAs coordinates")
print("\r shape: {0}".format(rna_coord.shape))
print("\r dtype: {0}".format(rna_coord.dtype), "\n")

foci_coord = data["foci"]
print("foci coordinates")
print("\r shape: {0}".format(foci_coord.shape))
print("\r dtype: {0}".format(foci_coord.dtype), "\n")

smfish = data["smfish"]
print("smfish channel")
print("\r shape: {0}".format(smfish.shape))
print("\r dtype: {0}".format(smfish.dtype))

Available keys: cell_id, bbox, cell_coord, cell_mask, nuc_coord, nuc_mask, rna_coord, foci, transcription_site, image, dapi, smfish 

cell mask
 shape: (520, 269)
 dtype: bool 

nucleus mask
 shape: (520, 269)
 dtype: bool 

RNAs coordinates
 shape: (585, 4)
 dtype: int64 

foci coordinates
 shape: (7, 5)
 dtype: int64 

smfish channel
 shape: (520, 269)
 dtype: uint16


Different groups of features are available. We can get the feature names with `bigfish.classification.get_features_name` and their values with `bigfish.classification.compute_features`.

## Hand-crafted features

**Distance related features** compute the mean or median distance of RNAs to the cell or nuclear membrane. Distances are normalized by the distance expected under a uniform RNA distribution (*index* features).

In [4]:
# get distance feature names
feature_names = classification.get_features_name(names_features_distance=True)
feature_names

['index_mean_distance_cell',
 'index_median_distance_cell',
 'index_mean_distance_nuc',
 'index_median_distance_nuc']
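To make the *index* normalization concrete, here is a minimal sketch and not the big-fish implementation. It assumes we already have a distance map (distance of every pixel to the membrane) and takes the expected value under uniformity to be the mean of that map over the whole cell. The function `index_mean_distance` and the toy distance map are hypothetical.

```python
import numpy as np


def index_mean_distance(distance_map, mask, rna_yx):
    """Observed mean distance divided by the mean expected under uniformity."""
    # observed: distance values sampled at the RNA positions
    observed = distance_map[rna_yx[:, 0], rna_yx[:, 1]].mean()
    # expected: mean distance over every pixel of the cell, i.e. what
    # uniformly distributed RNAs would give on average (assumption)
    expected = distance_map[mask].mean()
    return observed / expected


# toy 5x5 cell: every pixel belongs to the cell and the distance map
# stores the distance (in pixels) to the image border
mask = np.ones((5, 5), dtype=bool)
yy, xx = np.indices(mask.shape)
distance_map = np.minimum.reduce([yy, xx, 4 - yy, 4 - xx]).astype(float)

rna_center = np.array([[2, 2], [2, 2]])  # RNAs far from the border
rna_border = np.array([[0, 1], [4, 3]])  # RNAs on the border
index_center = index_mean_distance(distance_map, mask, rna_center)
index_border = index_mean_distance(distance_map, mask, rna_border)
print(index_center, index_border)
```

With this convention, an index above 1 means the RNAs lie farther from the membrane than a uniform distribution would, and an index below 1 means they lie closer.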

**Intranuclear features** compute the proportion of RNAs inside the nucleus.

In [5]:
# get intranuclear feature names
feature_names = classification.get_features_name(names_features_intranuclear=True)
feature_names

['proportion_rna_in_nuc', 'nb_rna_out_nuc', 'nb_rna_in_nuc']
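As an illustration, these three values can be approximated with a few lines of NumPy. This is a sketch under one assumption, not the library code: columns 1 and 2 of `rna_coord` are taken to hold the y and x coordinates (z, y, x ordering). The helper `count_rna_in_nucleus` is hypothetical.

```python
import numpy as np


def count_rna_in_nucleus(nuc_mask, rna_coord):
    """Return (proportion_rna_in_nuc, nb_rna_out_nuc, nb_rna_in_nuc)."""
    # assumption: columns 1 and 2 of rna_coord are the y and x coordinates
    y, x = rna_coord[:, 1], rna_coord[:, 2]
    in_nuc = nuc_mask[y, x]
    nb_in = int(in_nuc.sum())
    nb_out = len(rna_coord) - nb_in
    return nb_in / len(rna_coord), nb_out, nb_in


# toy data: a 4x4 image with a 2x2 nucleus and four RNAs
nuc_mask_toy = np.zeros((4, 4), dtype=bool)
nuc_mask_toy[1:3, 1:3] = True
rna_coord_toy = np.array([[0, 1, 1, -1],
                          [0, 2, 2, -1],
                          [1, 0, 0, -1],
                          [1, 3, 3, 0]])
proportion, nb_out, nb_in = count_rna_in_nucleus(nuc_mask_toy, rna_coord_toy)
print(proportion, nb_out, nb_in)
```

The same arithmetic applies to the cell loaded above: in the feature values printed later in this notebook, 175 nuclear RNAs out of 585 give a proportion of about 0.30.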

**Protrusion related features** compute the number of RNAs detected in a cell protrusion. The RNA count is normalized either by the count expected under a uniform RNA distribution (*index_rna_protrusion*) or by the total number of RNAs (*proportion_rna_protrusion*).

In [6]:
# get protrusion feature names
feature_names = classification.get_features_name(names_features_protrusion=True)
feature_names

['index_rna_protrusion', 'proportion_rna_protrusion', 'protrusion_area']

**Dispersion features** compute the dispersion indices described in [(Stueland et al. 2019)](https://www.nature.com/articles/s41598-019-44783-2). A high index means a high level of RNA polarization or dispersion within the cell.

In [7]:
# get dispersion feature names
feature_names = classification.get_features_name(names_features_dispersion=True)
feature_names

['index_polarization', 'index_dispersion', 'index_peripheral_distribution']

**Topographic features** compute the number of RNAs detected in specific cellular subregions. These regions are defined by concentric rings, 500 nanometers wide, around the nucleus or along the cellular membrane. The RNA count is normalized either by the count expected under a uniform RNA distribution (*index* features) or by the total number of RNAs (*proportion* features). For example, *proportion_rna_cell_radius_500_1000* is the proportion of RNAs detected between 500 nm and 1000 nm from the cellular membrane.

In [8]:
# get topographic feature names
feature_names = classification.get_features_name(names_features_topography=True)
feature_names

['index_rna_nuc_edge',
 'proportion_rna_nuc_edge',
 'index_rna_nuc_radius_500_1000',
 'proportion_rna_nuc_radius_500_1000',
 'index_rna_nuc_radius_1000_1500',
 'proportion_rna_nuc_radius_1000_1500',
 'index_rna_nuc_radius_1500_2000',
 'proportion_rna_nuc_radius_1500_2000',
 'index_rna_nuc_radius_2000_2500',
 'proportion_rna_nuc_radius_2000_2500',
 'index_rna_nuc_radius_2500_3000',
 'proportion_rna_nuc_radius_2500_3000',
 'index_rna_cell_radius_0_500',
 'proportion_rna_cell_radius_0_500',
 'index_rna_cell_radius_500_1000',
 'proportion_rna_cell_radius_500_1000',
 'index_rna_cell_radius_1000_1500',
 'proportion_rna_cell_radius_1000_1500',
 'index_rna_cell_radius_1500_2000',
 'proportion_rna_cell_radius_1500_2000',
 'index_rna_cell_radius_2000_2500',
 'proportion_rna_cell_radius_2000_2500',
 'index_rna_cell_radius_2500_3000',
 'proportion_rna_cell_radius_2500_3000']
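The *proportion* variant of these features amounts to a histogram of RNA distances with 500 nm bins. The sketch below illustrates the idea under stated assumptions: the distance of each RNA to the membrane is already known in pixels, and the pixel size is 103 nm (the `voxel_size_yx` value used later in this notebook). The function `proportion_per_ring` is hypothetical and is not how big-fish computes its features.

```python
import numpy as np


def proportion_per_ring(distance_px, voxel_size_yx=103, step=500,
                        max_radius=3000):
    """Proportion of RNAs in each ring of width `step` (in nanometers)."""
    # convert pixel distances to nanometers
    distance_nm = np.asarray(distance_px, dtype=float) * voxel_size_yx
    # ring boundaries: 0, 500, 1000, ..., max_radius
    edges = np.arange(0, max_radius + step, step)
    counts, _ = np.histogram(distance_nm, bins=edges)
    # normalize by the total number of RNAs, including those beyond max_radius
    return edges, counts / len(distance_nm)


# toy distances (in pixels) of six RNAs to the membrane
edges, proportions = proportion_per_ring([1, 3, 6, 7, 12, 40])
for start, stop, p in zip(edges[:-1], edges[1:], proportions):
    print("{0}_{1}: {2:0.2f}".format(start, stop, p))
```

The *index* variant would additionally divide each count by the count expected under a uniform distribution, for instance the ring area relative to the cell area multiplied by the number of RNAs.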

**Foci related features** compute the proportion of RNAs detected in foci.

In [9]:
# get foci feature names
feature_names = classification.get_features_name(names_features_foci=True)
feature_names

['proportion_rna_in_foci']
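A rough way to reproduce this proportion by hand is sketched below. It relies on an assumption about the data layout that should be checked against your own arrays: the last column of `rna_coord` stores the index of the focus each RNA belongs to, with -1 for RNAs outside any focus. The helper `proportion_in_foci` is hypothetical.

```python
import numpy as np


def proportion_in_foci(rna_coord):
    """Fraction of RNAs assigned to a focus."""
    # assumption: last column = focus index, -1 when the RNA is in no focus
    in_foci = rna_coord[:, -1] != -1
    return float(in_foci.mean())


# toy data: two of the four RNAs belong to focus 0
rna_coord_toy = np.array([[0, 1, 1, -1],
                          [0, 2, 2, 0],
                          [1, 2, 3, 0],
                          [1, 3, 3, -1]])
proportion_foci = proportion_in_foci(rna_coord_toy)
print(proportion_foci)
```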

**Area related features** compute the nucleus and cell area.

In [10]:
# get area feature names
feature_names = classification.get_features_name(names_features_area=True)
feature_names

['proportion_nuc_area', 'cell_area', 'nuc_area', 'cell_area_out_nuc']
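These four values are related by simple arithmetic on the boolean masks. The following sketch assumes that areas are pixel counts and that the nucleus lies entirely inside the cell. The helper `area_features` is hypothetical, not a big-fish function.

```python
import numpy as np


def area_features(cell_mask, nuc_mask):
    """Pixel-count areas derived from two boolean masks."""
    cell_area = int(cell_mask.sum())
    nuc_area = int(nuc_mask.sum())
    return {
        "proportion_nuc_area": nuc_area / cell_area,
        "cell_area": cell_area,
        "nuc_area": nuc_area,
        # valid only if the nucleus is fully contained in the cell
        "cell_area_out_nuc": cell_area - nuc_area,
    }


# toy data: a 6x6 cell with a 2x3 nucleus
cell_mask_toy = np.ones((6, 6), dtype=bool)
nuc_mask_toy = np.zeros((6, 6), dtype=bool)
nuc_mask_toy[2:4, 1:4] = True
areas = area_features(cell_mask_toy, nuc_mask_toy)
print(areas)
```

The values reported later in this notebook are consistent with these relations: for the first cell, 88291 - 19781 = 68510 and 19781 / 88291 is about 0.22.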

**Centrosomal features** compute the mean or median distance of RNAs to the closest centrosome (*index_mean_distance_centrosome* and *index_median_distance_centrosome*) or the number of RNAs detected within 2000 nanometers of a centrosome. The RNA count is normalized either by the count expected under a uniform RNA distribution (*index_rna_centrosome*) or by the total number of RNAs (*proportion_rna_centrosome*). The feature *index_centrosome_dispersion* is adapted from [(Stueland et al. 2019)](https://www.nature.com/articles/s41598-019-44783-2) to quantify the dispersion of RNAs around centrosomes. A low index means RNAs are located close to the centrosomes.

In [11]:
# get centrosomal feature names
feature_names = classification.get_features_name(names_features_centrosome=True)
feature_names

['index_mean_distance_centrosome',
 'index_median_distance_centrosome',
 'index_rna_centrosome',
 'proportion_rna_centrosome',
 'index_centrosome_dispersion']

## Compute features

To **compute features for one cell**, we pass the results read with `bigfish.stack.read_cell_extracted` directly to `bigfish.classification.compute_features`. We can select specific groups of features and return the feature names at the same time.

In [12]:
# compute features
features, features_names = classification.compute_features(
    cell_mask, nuc_mask, ndim=3, rna_coord=rna_coord,
    smfish=smfish, voxel_size_yx=103,
    foci_coord=foci_coord,
    centrosome_coord=None,
    compute_distance=True,
    compute_intranuclear=True,
    compute_protrusion=True,
    compute_dispersion=True,
    compute_topography=True,
    compute_foci=True,
    compute_area=True,
    return_names=True)

for feature, feature_name in zip(features, features_names):
    print("{0:40} {1:0.2f}".format(feature_name + ":", feature))

index_mean_distance_cell:                1.24
index_median_distance_cell:              1.42
index_mean_distance_nuc:                 0.63
index_median_distance_nuc:               0.47
proportion_rna_in_nuc:                   0.30
nb_rna_out_nuc:                          410.00
nb_rna_in_nuc:                           175.00
index_rna_protrusion:                    0.44
proportion_rna_protrusion:               0.01
protrusion_area:                         1714.00
index_polarization:                      0.11
index_dispersion:                        0.84
index_peripheral_distribution:           0.81
index_rna_nuc_edge:                      0.83
proportion_rna_nuc_edge:                 0.03
index_rna_nuc_radius_500_1000:           1.83
proportion_rna_nuc_radius_500_1000:      0.04
index_rna_nuc_radius_1000_1500:          1.55
proportion_rna_nuc_radius_1000_1500:     0.03
index_rna_nuc_radius_1500_2000:          1.55
proportion_rna_nuc_radius_1500_2000:     0.03
index_rna_nuc_radius_2000_2
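Since the values and the names come back as two parallel sequences, it can be convenient to pair them in a dictionary to look up one feature by name. The sketch below uses stand-in arrays holding the first four values printed above instead of calling `compute_features`; the variable `feature_by_name` is our own.

```python
import numpy as np

# stand-in for the arrays returned by compute_features, restricted to the
# first four values printed above
features = np.array([1.24, 1.42, 0.63, 0.47])
features_names = ["index_mean_distance_cell",
                  "index_median_distance_cell",
                  "index_mean_distance_nuc",
                  "index_median_distance_nuc"]

# pair each name with its value
feature_by_name = dict(zip(features_names, features))
print(feature_by_name["index_mean_distance_nuc"])
```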

## Compute features for several cells

To **compute features for multiple cells**, we loop over the result files of the different cells. A *pandas* DataFrame is then a convenient way to store the computed features, with one row per cell.

In [13]:
# parse different results files
dataframes = []
for filename in ["results_cell_0.npz", "results_cell_1.npz"]:

    # load single cell data
    path = os.path.join(path_output, filename)
    data = stack.read_cell_extracted(path)
    cell_mask = data["cell_mask"]
    nuc_mask = data["nuc_mask"]
    rna_coord = data["rna_coord"]
    foci_coord = data["foci"]
    smfish = data["smfish"]
    
    # compute features
    features, features_names = classification.compute_features(
        cell_mask, nuc_mask, ndim=3, rna_coord=rna_coord,
        smfish=smfish, voxel_size_yx=103,
        foci_coord=foci_coord,
        centrosome_coord=None,
        compute_distance=True,
        compute_intranuclear=True,
        compute_protrusion=True,
        compute_dispersion=True,
        compute_topography=True,
        compute_foci=True,
        compute_area=True,
        return_names=True)

    # build dataframe
    features = features.reshape((1, -1))
    df_cell = pd.DataFrame(data=features, columns=features_names)
    dataframes.append(df_cell)
    
# concatenate dataframes
df = pd.concat(dataframes)

# reset index
df.reset_index(drop=True, inplace=True)
df

| | index_mean_distance_cell | index_median_distance_cell | index_mean_distance_nuc | index_median_distance_nuc | proportion_rna_in_nuc | nb_rna_out_nuc | nb_rna_in_nuc | index_rna_protrusion | proportion_rna_protrusion | protrusion_area | ... | proportion_rna_cell_radius_1500_2000 | index_rna_cell_radius_2000_2500 | proportion_rna_cell_radius_2000_2500 | index_rna_cell_radius_2500_3000 | proportion_rna_cell_radius_2500_3000 | proportion_rna_in_foci | proportion_nuc_area | cell_area | nuc_area | cell_area_out_nuc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.24 | 1.42 | 0.63 | 0.47 | 0.3 | 410.0 | 175.0 | 0.44 | 0.01 | 1714.0 | ... | 0.02 | 0.53 | 0.03 | 0.5 | 0.02 | 0.13 | 0.22 | 88291.0 | 19781.0 | 68510.0 |
| 1 | 1.24 | 1.43 | 0.63 | 0.62 | 0.28 | 390.0 | 153.0 | 0.33 | 0.0 | 868.0 | ... | 0.03 | 0.69 | 0.03 | 0.66 | 0.03 | 0.11 | 0.25 | 78577.0 | 19801.0 | 58776.0 |
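With more than a handful of cells, the hard-coded list of filenames can be replaced by a directory listing. The sketch below uses only the standard library and assumes the files follow the `results_cell_<index>.npz` naming used above; the helper `list_cell_files` is hypothetical. It sorts on the numeric cell index so that cell 10 comes after cell 2.

```python
import glob
import os
import re
import tempfile


def list_cell_files(directory, pattern="results_cell_*.npz"):
    """Return the result files of a directory, sorted by cell index."""
    paths = glob.glob(os.path.join(directory, pattern))

    def cell_index(path):
        # extract the trailing integer of the filename, e.g. 10 in
        # "results_cell_10.npz"
        match = re.search(r"(\d+)\.npz$", os.path.basename(path))
        return int(match.group(1)) if match else -1

    return sorted(paths, key=cell_index)


# demonstration on empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    for i in (10, 0, 2):
        name = "results_cell_{0}.npz".format(i)
        open(os.path.join(tmp_dir, name), "w").close()
    filenames = [os.path.basename(p) for p in list_cell_files(tmp_dir)]
print(filenames)
```

The returned paths can then be passed one by one to `stack.read_cell_extracted` inside the loop.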


## Save features

Saving the features DataFrame can be done with `bigfish.stack.save_data_to_csv`.

In [14]:
path = os.path.join(path_output, "df_features.csv")
stack.save_data_to_csv(df, path)
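To check that a features table survives a round trip to disk, we can write it and read it back. The sketch below uses plain *pandas* rather than the big-fish helper, on a small stand-in DataFrame built from area values shown above. The `;` delimiter is an assumption; use whatever delimiter the file was actually written with.

```python
import os
import tempfile

import pandas as pd

# stand-in for the features DataFrame, restricted to two columns
df = pd.DataFrame({"cell_area": [88291.0, 78577.0],
                   "nuc_area": [19781.0, 19801.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_features.csv")
    # write without the index column, then read the file back
    df.to_csv(path, sep=";", index=False)
    df_loaded = pd.read_csv(path, sep=";")

print(df_loaded)
```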