# Managing features

This notebook explains what a feature is in Pycellin and shows how to add Pycellin features to a model, recompute them, and remove them.

In [4]:
from pycellin.io.trackmate import load_TrackMate_XML
import pycellin.graph.features.utils as gfu

*describe example, display movie*

In [12]:
# Path to the TrackMate XML file.
xml_path = "../sample_data/Ecoli_growth_on_agar_pad.xml"

# Parse the XML file and create a Pycellin Model object
# that contains all the data from the XML file.
model = load_TrackMate_XML(xml_path)

# We can display basic information about this model.
print(model)
print(f"This model contains {model.data.number_of_lineages()} lineages:")
for lin_ID, lineage in model.data.cell_data.items():
    print(f"- ID {lin_ID}: {lineage}")

Model with 3 lineages.
This model contains 3 lineages:
- ID 0: CellLineage named 'Track_0' with 152 nodes and 151 edges
- ID 1: CellLineage named 'Track_1' with 189 nodes and 188 edges
- ID 2: CellLineage named 'Track_2' with 185 nodes and 184 edges


## What is a feature in Pycellin?

A feature is a piece of information that is stored in a lineage and describes part or all of that lineage (e.g. the shape of a cell, the velocity of a cell, the number of divisions in a lineage...).  
*add schematic with the different types of features and a few examples depending on where the features are stored?*  
*link to the notebook about Pycellin data structure?*

Features can be computed automatically from existing data or added manually.

There are different types of features, depending on where they come from.

### Imported features

Imported features come from data created with an external tool, e.g. TrackMate.  
They are added automatically when the data is imported and cannot be recomputed.

### Pycellin features

Predefined features that can be added to the model as needed:
- morphological features, e.g. bacterial length
- tracking related features, e.g. division time

### Custom features

User-defined features, usually less generic and more tailored to a specific experiment or data type.   
They are more advanced and require more Python knowledge. See the dedicated notebook.

## Adding features to a model

As mentioned above, imported features are added automatically on import, and custom features are more complex and are covered in a separate notebook. This part therefore only shows how to add Pycellin features to a model.

### For cell lineages

List of available Pycellin features for cell lineages:

In [3]:
gfu.get_pycellin_cell_lineage_features()

{'absolute_age': 'Age of the cell since the beginning of the lineage',
 'relative_age': 'Age of the cell since the beginning of the current cell cycle'}

Features can be added one by one, either by calling the dedicated method:

In [4]:
model.add_absolute_age()

Or by using the more generic `add_pycellin_feature()` method:

In [5]:
model.add_pycellin_feature("absolute_age")

ValueError: A Feature called absolute_age already exists in node features.

Attempting to add a feature that is already present in the model raises a `ValueError`.
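When a notebook cell is re-run, this error can get in the way. The following is a minimal sketch of a wrapper that skips features that are already present. The helper name `add_feature_if_missing` is hypothetical (it is not part of Pycellin), and matching on the text of the error message is an assumption based on the message shown above.

```python
def add_feature_if_missing(model, name, **kwargs):
    """Add a Pycellin feature unless the model already has it.

    Hypothetical helper, not part of Pycellin.
    Returns True if the feature was added, False if it was already present.
    """
    try:
        model.add_pycellin_feature(name, **kwargs)
    except ValueError as err:
        # Only swallow the "already exists" error; the check on the message
        # text is an assumption and may need adjusting. Any other ValueError
        # is re-raised unchanged.
        if "already exists" not in str(err):
            raise
        return False
    return True
```

With such a helper, `add_feature_if_missing(model, "absolute_age")` could be called repeatedly without interrupting the notebook.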

When you want to add a lot of features, it might be easier to add them all at once with the `add_pycellin_features()` shortcut method that accepts a list of Pycellin feature names:

In [6]:
model.add_pycellin_features(["relative_age", "cell_length"])

At any time, you can check which cell lineage features are currently present in the model, regardless of their provenance (imported, pre-defined by Pycellin or custom built):

In [7]:
model.get_cell_lineage_features()

['QUALITY',
 'POSITION_T',
 'RADIUS',
 'VISIBILITY',
 'MANUAL_SPOT_COLOR',
 'MEAN_INTENSITY_CH1',
 'MEDIAN_INTENSITY_CH1',
 'MIN_INTENSITY_CH1',
 'MAX_INTENSITY_CH1',
 'TOTAL_INTENSITY_CH1',
 'STD_INTENSITY_CH1',
 'MEAN_INTENSITY_CH2',
 'MEDIAN_INTENSITY_CH2',
 'MIN_INTENSITY_CH2',
 'MAX_INTENSITY_CH2',
 'TOTAL_INTENSITY_CH2',
 'STD_INTENSITY_CH2',
 'CONTRAST_CH1',
 'SNR_CH1',
 'CONTRAST_CH2',
 'SNR_CH2',
 'ELLIPSE_X0',
 'ELLIPSE_Y0',
 'ELLIPSE_MAJOR',
 'ELLIPSE_MINOR',
 'ELLIPSE_THETA',
 'ELLIPSE_ASPECTRATIO',
 'PERIMETER',
 'CIRCULARITY',
 'SOLIDITY',
 'SHAPE_INDEX',
 'name',
 'ROI_coords',
 'cell_ID',
 'location',
 'frame',
 'area',
 'absolute_age',
 'relative_age',
 'cell_length',
 'SPOT_SOURCE_ID',
 'SPOT_TARGET_ID',
 'LINK_COST',
 'DIRECTIONAL_CHANGE_RATE',
 'SPEED',
 'DISPLACEMENT',
 'EDGE_TIME',
 'MANUAL_EDGE_COLOR',
 'location',
 'TRACK_INDEX',
 'DIVISION_TIME_MEAN',
 'DIVISION_TIME_STD',
 'NUMBER_SPOTS',
 'NUMBER_GAPS',
 'NUMBER_SPLITS',
 'NUMBER_MERGES',
 'NUMBER_COMPLEX',
 

Most Pycellin features, if not all, are expressed in time and/or space units. These units are defined at the model level, usually when the model is created, so the exact units of a specific Pycellin feature are only known once the feature has been added to the model.

Information on a feature, units included, is accessible through the `FeatureDeclaration` of the model, regardless of the feature's provenance:

In [16]:
print(model.feat_declaration.node_feats["absolute_age"])  # Pycellin feature.
print(model.feat_declaration.edge_feats["SPEED"])  # Imported feature.

Feature(name='absolute_age', description='Age of the cell since the beginning of the lineage', lineage_type='CellLineage', provenance='Pycellin', data_type='int', unit='frame')
Feature(name='SPEED', description='Speed', lineage_type='CellLineage', provenance='TrackMate', data_type='float', unit='µm/min')
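To collect the units of several features at once, a small lookup helper can be sketched as follows. The helper name `feature_units` is hypothetical. The sketch assumes that `node_feats` and `edge_feats` behave like dictionaries (they are indexed by name above) and that each `Feature` exposes a `unit` attribute, as the printed representation suggests; neither assumption is confirmed here.

```python
def feature_units(model, names):
    """Map each feature name to its unit, or None if it is not declared.

    Hypothetical helper, not part of Pycellin.
    """
    declaration = model.feat_declaration
    units = {}
    for name in names:
        # Look in the node features first, then in the edge features.
        feature = declaration.node_feats.get(name)
        if feature is None:
            feature = declaration.edge_feats.get(name)
        # Assumption: a Feature object has a `unit` attribute.
        units[name] = getattr(feature, "unit", None)
    return units
```

For the model above, `feature_units(model, ["absolute_age", "SPEED"])` would be expected to report `frame` and `µm/min` if these assumptions hold.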


### For cell cycle lineages

Cell cycle features (or cycle features for short) behave like cell features. However, you first need to compute the cycle lineages, otherwise you will get the following error:

In [9]:
model.add_division_time()

ValueError: Cycle lineages have not been computed yet. Please compute the cycle lineages first with `model.add_cycle_data()`.

Please refer to the Pycellin data structure notebook (*add ref*) if you are unsure about the difference between a cell lineage (`CellLineage`) and a cell cycle lineage (`CycleLineage`).

To compute and add the cycle lineages to the model, just call the `add_cycle_data()` method on the model:

In [13]:
model.add_cycle_data()
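If it is not known in advance whether the cycle lineages have been computed, the two steps can be combined. The following is a sketch only: the helper name `add_cycle_features` is hypothetical, it relies on the wording of the error message shown above, and it assumes that a failed call leaves the model unchanged so that retrying is safe.

```python
def add_cycle_features(model, names):
    """Add cycle lineage features, computing the cycle lineages if needed.

    Hypothetical helper, not part of Pycellin.
    """
    try:
        model.add_pycellin_features(names)
    except ValueError as err:
        # Assumption: this message text identifies the missing cycle data.
        if "Cycle lineages have not been computed" not in str(err):
            raise
        # Compute the cycle lineages, then retry once.
        model.add_cycle_data()
        model.add_pycellin_features(names)
```

For example, `add_cycle_features(model, ["division_time"])` would first compute the cycle lineages on a freshly loaded model and then add the feature.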

List of available Pycellin features for cell cycle lineages:

In [11]:
gfu.get_pycellin_cycle_lineage_features()

{'cell_cycle_completeness': 'Completeness of the cell cycle, i.e. does it start and end with a division',
 'division_time': 'Time elapsed between the birth of a cell and its division',
 'division_rate': 'Number of divisions per time unit'}

As with cell lineages, you can add new features one by one, either by calling the method specific to your feature:

In [12]:
model.add_division_rate()

Or the generic `add_pycellin_feature()`:

In [None]:
model.add_pycellin_feature("division_rate")

Or several features at once with `add_pycellin_features()`:

In [13]:
model.add_pycellin_features(["division_time", "cell_cycle_completeness"])

Cell lineage and cycle lineage features can be added at the same time without issue, given that cycle lineage data has already been computed:

In [None]:
model.add_pycellin_features(
    [
        "absolute_age",  # cell lineage feature
        "cell_cycle_completeness",  # cycle lineage feature
        "relative_age",  # cell lineage feature
    ]
)

To check which cycle lineage features are currently present in the model:

In [14]:
model.get_cycle_lineage_features()

['cycle_ID',
 'cells',
 'cycle_length',
 'level',
 'division_rate',
 'division_time',
 'cell_cycle_completeness',
 'cycle_lineage_ID']

### For optional arguments

Some features accept optional arguments, often to tweak the way the feature is defined and computed: change of unit, change of algorithm...

When adding features one by one, additional arguments are passed as regular keyword arguments.

For example, with feature specific methods:

In [3]:
model.add_relative_age(in_time_unit=True)
model.add_cell_length(skel_algo="lee", tolerance=0.6)

And with the generic `add_pycellin_feature()`:

In [6]:
model.add_pycellin_feature("relative_age", in_time_unit=True)
model.add_pycellin_feature("cell_length", skel_algo="lee", tolerance=0.6)

When adding several features at the same time, features are given as a dictionary with the name of the feature as the key and a dictionary of additional keyword arguments as the value:

In [8]:
model.add_pycellin_features(
    {
        "relative_age": {"in_time_unit": True},
        "cell_length": {"skel_algo": "lee", "tolerance": 0.6},
    }
)

It is also possible to mix features with and without additional arguments:

In [14]:
model.add_pycellin_features(
    [{"relative_age": {"in_time_unit": True}}, "cell_cycle_completeness"]
)
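To make the accepted shapes concrete, here is a sketch of how such a specification (a dictionary, or a list mixing names and dictionaries) could be flattened into a single mapping from feature name to keyword arguments. This is an illustration only, not Pycellin's actual implementation, and the function name `normalize_feature_specs` is hypothetical.

```python
def normalize_feature_specs(specs):
    """Turn a feature specification into {feature_name: keyword_arguments}.

    Hypothetical illustration, not part of Pycellin.
    """
    if isinstance(specs, dict):
        specs = [specs]
    normalized = {}
    for spec in specs:
        if isinstance(spec, str):
            # A bare name means "no additional arguments".
            normalized[spec] = {}
        elif isinstance(spec, dict):
            for name, kwargs in spec.items():
                normalized[name] = dict(kwargs)
        else:
            raise TypeError(f"Unsupported feature specification: {spec!r}")
    return normalized


specs = [{"relative_age": {"in_time_unit": True}}, "cell_cycle_completeness"]
print(normalize_feature_specs(specs))
# {'relative_age': {'in_time_unit': True}, 'cell_cycle_completeness': {}}
```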

## Computing or recomputing features

Only Pycellin and custom features can be computed or recomputed. Imported features cannot, because no calculator is associated with them.

In [None]:
model.update()
model.is_update_required()
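The two calls above can be combined so that features are only recomputed when necessary. This is a sketch under the assumption that `is_update_required()` returns a boolean-like value; the helper name `update_if_required` is hypothetical.

```python
def update_if_required(model):
    """Recompute features only when the model reports that it is needed.

    Hypothetical helper, not part of Pycellin.
    Returns True if an update was run, False otherwise.
    """
    # Assumption: is_update_required() returns a truthy value when the
    # model has pending changes.
    if model.is_update_required():
        model.update()
        return True
    return False
```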

## Removing features from a model

Features of any type (imported, Pycellin or custom) can be removed from a model.

In [None]:
model.remove_feature("absolute_age")
model.remove_features(["relative_age", "cell_cycle_completeness"])
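How Pycellin reacts when asked to remove a feature that is not in the model is not shown here. The sketch below sidesteps the question by removing only the features that are currently listed. The helper name `remove_present_features` is hypothetical, and for brevity it only looks at cell lineage features through `get_cell_lineage_features()`.

```python
def remove_present_features(model, names):
    """Remove only the requested features that the model currently has.

    Hypothetical helper, not part of Pycellin. Only cell lineage features
    are considered. Returns the list of feature names actually removed.
    """
    present = set(model.get_cell_lineage_features())
    to_remove = [name for name in names if name in present]
    if to_remove:
        model.remove_features(to_remove)
    return to_remove
```

For example, `remove_present_features(model, ["absolute_age", "relative_age"])` would return the subset of those two names that was actually removed.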