# Managing features

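This notebook explains what a feature is in Pycellin, then shows how to add features to a model, how to compute or recompute their values, and how to remove them.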

In [1]:
import pycellin as pc

The examples below use a TrackMate tracking file of *E. coli* growing on an agar pad (`Ecoli_growth_on_agar_pad.xml`).

In [2]:
# Path to the TrackMate XML file.
xml_path = "../sample_data/Ecoli_growth_on_agar_pad.xml"

# Parse the XML file and create a Pycellin Model object
# that contains all the data from the XML file.
model = pc.load_TrackMate_XML(xml_path)

# We can display basic information about this model.
print(model)
print(f"This model contains {model.data.number_of_lineages()} lineages:")
for lin_ID, lineage in model.data.cell_data.items():
    print(f"- ID {lin_ID}: {lineage}")

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.
This model contains 3 lineages:
- ID 0: CellLineage of ID 0 named Track_0 with 152 cells and 151 links.
- ID 1: CellLineage of ID 1 named Track_1 with 189 cells and 188 links.
- ID 2: CellLineage of ID 2 named Track_2 with 185 cells and 184 links.


## What is a feature in Pycellin?

A feature is a piece of information stored in a lineage that describes part or all of that lineage, e.g. the shape of a cell, the velocity of a cell, or the number of divisions in a lineage.  
*add schematic with the different types of features and a few examples depending on where the features are stored?*  
*link to the notebook about Pycellin data structure?*

Features can be computed automatically from existing data or added manually.

There are different types of features, depending on where they come from.

### Imported features

Imported features come from data created with an external tool, e.g. TrackMate.  
They are added automatically when the data is imported and cannot be recomputed.

### Pycellin features

Predefined features that can be added to the model as needed:
- morphological features, e.g. bacterial length
- tracking related features, e.g. division time

### Custom features

User-defined features, usually less generic and more tailored to a specific experiment or data type.  
They are more advanced and require more Python knowledge. See the dedicated notebook.

## Adding features to a model

Adding a feature to a model tells the model which features are expected on the lineages.
It also gives the user a clear description of each feature (provenance, units, the type of data it applies to...). However, features do not have values at this stage: they are only *declared*, not *computed*. In the model, the information about the added features is stored in the FeatureDeclaration of the model:


In [3]:
print("List of the name of all features already registered in the model:")
print(model.feat_declaration)
print()

print("Detailed information on each node feature:")
for feat_info in model.feat_declaration.feats_dict.values():
    print(feat_info)

List of the name of all features already registered in the model:
Node features: QUALITY, POSITION_T, RADIUS, VISIBILITY, MANUAL_SPOT_COLOR, MEAN_INTENSITY_CH1, MEDIAN_INTENSITY_CH1, MIN_INTENSITY_CH1, MAX_INTENSITY_CH1, TOTAL_INTENSITY_CH1, STD_INTENSITY_CH1, MEAN_INTENSITY_CH2, MEDIAN_INTENSITY_CH2, MIN_INTENSITY_CH2, MAX_INTENSITY_CH2, TOTAL_INTENSITY_CH2, STD_INTENSITY_CH2, CONTRAST_CH1, SNR_CH1, CONTRAST_CH2, SNR_CH2, ELLIPSE_X0, ELLIPSE_Y0, ELLIPSE_MAJOR, ELLIPSE_MINOR, ELLIPSE_THETA, ELLIPSE_ASPECTRATIO, AREA, PERIMETER, CIRCULARITY, SOLIDITY, SHAPE_INDEX, cell_name, cell_ID, cell_location, frame, ROI_coords
Edge features: SPOT_SOURCE_ID, SPOT_TARGET_ID, LINK_COST, DIRECTIONAL_CHANGE_RATE, SPEED, DISPLACEMENT, EDGE_TIME, MANUAL_EDGE_COLOR, link_location
Lineage features: TRACK_INDEX, DIVISION_TIME_MEAN, DIVISION_TIME_STD, NUMBER_SPOTS, NUMBER_GAPS, NUMBER_SPLITS, NUMBER_MERGES, NUMBER_COMPLEX, LONGEST_GAP, TRACK_DURATION, TRACK_START, TRACK_STOP, TRACK_DISPLACEMENT, TRACK_MEAN_S

Here you can see that many features are already declared in the model. These features were imported from TrackMate when the XML file was loaded in Pycellin (see the code in the second cell, at the top of the notebook). Depending on the goal of your analysis, you may want to add features more relevant to your study, like the growth rate or division rate of your cells.
Of course, if you build a Pycellin model from scratch instead of importing a pre-existing tracking file (from TrackMate or another tracking tool), your model will initially have no features.

In this part, we will only see how to add pre-defined Pycellin features to a model: imported features are added automatically on import, and custom features are more complex and are covered in a separate notebook.

There are 2 types of lineages in Pycellin: cell lineages and cell cycle lineages (cycle lineages for short). Each lineage type has its own features. If you are unsure about the difference between a cell lineage (`CellLineage`) and a cell cycle lineage (`CycleLineage`), please refer to the Pycellin data structure notebook (*add ref*) before going further.


### For cell lineages

In [4]:
lin0 = model.data.cell_data[0]
lin0.plot(title="Cell lineage of lineage ID 0")

List of available Pycellin features for cell lineages:

In [5]:
pc.get_pycellin_cell_lineage_features()

{'absolute_age': 'Age of the cell since the beginning of the lineage',
 'angle': 'Angle of the cell trajectory between two consecutive detections',
 'cell_displacement': 'Displacement of the cell between two consecutive detections',
 'cell_length': 'Length of the cell',
 'cell_speed': 'Speed of the cell between two consecutive detections',
 'cell_width': 'Width of the cell',
 'relative_age': 'Age of the cell since the beginning of the current cell cycle'}

Features can be added one by one, either by calling the dedicated method:

In [6]:
model.add_absolute_age()

Or by using the more generic `add_pycellin_feature()` method:  
(attempting to add a feature that is already present in the model emits a warning)

In [7]:
model.add_pycellin_feature("absolute_age")


An identical Feature 'absolute_age' has already been declared.



When you want to add a lot of features, it might be easier to add them all at once with the `add_pycellin_features()` shortcut method that accepts a list of Pycellin feature names:

In [8]:
model.add_pycellin_features(["relative_age", "cell_length"])

At any time, you can check which cell lineage features are currently present in the model, regardless of their provenance (imported, pre-defined by Pycellin or custom built):

In [9]:
model.get_cell_lineage_features()

['QUALITY',
 'POSITION_T',
 'RADIUS',
 'VISIBILITY',
 'MANUAL_SPOT_COLOR',
 'MEAN_INTENSITY_CH1',
 'MEDIAN_INTENSITY_CH1',
 'MIN_INTENSITY_CH1',
 'MAX_INTENSITY_CH1',
 'TOTAL_INTENSITY_CH1',
 'STD_INTENSITY_CH1',
 'MEAN_INTENSITY_CH2',
 'MEDIAN_INTENSITY_CH2',
 'MIN_INTENSITY_CH2',
 'MAX_INTENSITY_CH2',
 'TOTAL_INTENSITY_CH2',
 'STD_INTENSITY_CH2',
 'CONTRAST_CH1',
 'SNR_CH1',
 'CONTRAST_CH2',
 'SNR_CH2',
 'ELLIPSE_X0',
 'ELLIPSE_Y0',
 'ELLIPSE_MAJOR',
 'ELLIPSE_MINOR',
 'ELLIPSE_THETA',
 'ELLIPSE_ASPECTRATIO',
 'AREA',
 'PERIMETER',
 'CIRCULARITY',
 'SOLIDITY',
 'SHAPE_INDEX',
 'cell_name',
 'SPOT_SOURCE_ID',
 'SPOT_TARGET_ID',
 'LINK_COST',
 'DIRECTIONAL_CHANGE_RATE',
 'SPEED',
 'DISPLACEMENT',
 'EDGE_TIME',
 'MANUAL_EDGE_COLOR',
 'TRACK_INDEX',
 'DIVISION_TIME_MEAN',
 'DIVISION_TIME_STD',
 'NUMBER_SPOTS',
 'NUMBER_GAPS',
 'NUMBER_SPLITS',
 'NUMBER_MERGES',
 'NUMBER_COMPLEX',
 'LONGEST_GAP',
 'TRACK_DURATION',
 'TRACK_START',
 'TRACK_STOP',
 'TRACK_DISPLACEMENT',
 'TRACK_MEAN_SPEED',

Most Pycellin features, if not all, are expressed in time and/or space units. These units are defined at the model level, usually when the model is created, so the exact units of a specific Pycellin feature are only known once the feature has been added to the model.
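
To picture why the unit is only known once the feature is attached to a model, here is a minimal sketch. It is not Pycellin code: `resolve_unit`, the template strings and the unit values are all assumptions made up for illustration.

```python
def resolve_unit(unit_template, space_unit, time_unit):
    """Fill a unit template such as '{space}/{time}' with model-level units."""
    return unit_template.format(space=space_unit, time=time_unit)


# Hypothetical unit templates for three predefined features.
unit_templates = {
    "cell_length": "{space}",
    "cell_speed": "{space}/{time}",
    "division_time": "{time}",
}

# Same templates resolved against two models that use different units.
model_a = {name: resolve_unit(tpl, "µm", "min") for name, tpl in unit_templates.items()}
model_b = {name: resolve_unit(tpl, "pixel", "frame") for name, tpl in unit_templates.items()}

print(model_a)
print(model_b)
```

The same feature name ends up with different units depending on the model it is added to, which is why the unit is read from the model rather than from the feature list.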

Information on a feature, units included, is accessible through the FeatureDeclaration of the model, regardless of the feature provenance:

In [10]:
print(model.feat_declaration.feats_dict["absolute_age"])  # Pycellin feature.
print(model.feat_declaration.feats_dict["SPEED"])  # Imported feature.

Feature(name='absolute_age', description='Age of the cell since the beginning of the lineage', feat_type='node', lin_type='CellLineage', provenance='Pycellin', data_type='int', unit='frame')
Feature(name='SPEED', description='Speed', feat_type='edge', lin_type='CellLineage', provenance='TrackMate', data_type='float', unit='µm/min')
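
The fields shown in this output make it easy to sort features by provenance or to look up units. The sketch below uses a stand-in `Feature` dataclass that only mirrors the printed fields. It is not Pycellin's own class, and `group_by_provenance` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Stand-in mirroring the fields printed above; not Pycellin's class."""

    name: str
    description: str
    feat_type: str
    lin_type: str
    provenance: str
    data_type: str
    unit: str


# Two entries copied from the output above.
feats_dict = {
    "absolute_age": Feature(
        name="absolute_age",
        description="Age of the cell since the beginning of the lineage",
        feat_type="node",
        lin_type="CellLineage",
        provenance="Pycellin",
        data_type="int",
        unit="frame",
    ),
    "SPEED": Feature(
        name="SPEED",
        description="Speed",
        feat_type="edge",
        lin_type="CellLineage",
        provenance="TrackMate",
        data_type="float",
        unit="µm/min",
    ),
}


def group_by_provenance(feats):
    """Map each provenance to the sorted names of its features."""
    groups = defaultdict(list)
    for feat in feats.values():
        groups[feat.provenance].append(feat.name)
    return {prov: sorted(names) for prov, names in groups.items()}


by_provenance = group_by_provenance(feats_dict)
units = {name: feat.unit for name, feat in feats_dict.items()}
print(by_provenance)
print(units)
```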


### For cell cycle lineages

Cell cycle features (or cycle features for short) behave like cell features. However, you first need to compute the cycle lineages, otherwise you will get the following error:

In [11]:
model.add_division_time()

ValueError: Cycle lineages have not been computed yet. Please compute the cycle lineages first with `model.add_cycle_data()`.

To compute and add the cycle lineages to the model, just call the `add_cycle_data()` method on the model:

In [12]:
model.add_cycle_data()

A few mandatory cycle lineage features are automatically computed:
- cycle_ID: node ID of the cell cycle, i.e. node ID of the last cell in the cell cycle;
- cells: node IDs of the cells in the cell cycle, in chronological order;
- cycle_length: number of cells in the cell cycle;
- level: level of the cell cycle in the lineage, i.e. number of cell cycles upstream of the current one.
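
To make these four definitions concrete, here is a minimal sketch that derives them on a toy lineage stored as a plain dictionary. It is not how Pycellin computes cycle lineages: `compute_cycles` and the toy data are assumptions, and the sketch ignores merges and gaps.

```python
def compute_cycles(children, root):
    """Group cells into cell cycles.

    `children` maps a cell ID to the IDs of its successors in the next frame.
    A cycle ends at a dividing cell (two or more successors) or at a leaf.
    """
    cycles = {}
    # Each stack entry holds the first cell of a cycle and the cycle level.
    stack = [(root, 0)]
    while stack:
        cell, level = stack.pop()
        cells = [cell]
        # Follow the chain while the current cell has exactly one successor.
        while len(children.get(cells[-1], [])) == 1:
            cells.append(children[cells[-1]][0])
        last = cells[-1]
        cycles[last] = {
            "cycle_ID": last,  # ID of the last cell of the cycle
            "cells": cells,  # chronological order
            "cycle_length": len(cells),
            "level": level,  # number of cycles upstream
        }
        # Each daughter starts a new cycle, one level further down.
        for daughter in children.get(last, []):
            stack.append((daughter, level + 1))
    return cycles


# Toy lineage: 1 -> 2 -> 3, cell 3 divides into 4 and 6, 4 -> 5, 6 -> 7 -> 8.
toy_children = {1: [2], 2: [3], 3: [4, 6], 4: [5], 6: [7], 7: [8]}
toy_cycles = compute_cycles(toy_children, root=1)
for cycle_id in sorted(toy_cycles):
    print(toy_cycles[cycle_id])
```

In this toy lineage, the 8 cells collapse into 3 cell cycles: the mother cycle at level 0 and its two daughter cycles at level 1.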

In [13]:
cyclelin0 = model.data.cycle_data[0]
cyclelin0.plot(
    title="Cycle lineage of lineage ID 0",
    node_hover_features=["cycle_ID", "level", "cycle_length"],
)

List of available Pycellin features for cell cycle lineages:

In [14]:
pc.get_pycellin_cycle_lineage_features()

{'branch_total_displacement': 'Displacement of the cell during the cell cycle',
 'branch_mean_displacement': 'Mean displacement of the cell during the cell cycle',
 'branch_mean_speed': 'Mean speed of the cell during the cell cycle',
 'cell_cycle_completeness': 'Completeness of the cell cycle, i.e. does it start and end with a division',
 'division_time': 'Time elapsed between the birth of a cell and its division',
 'division_rate': 'Number of divisions per time unit',
 'straightness': 'Straightness of the cell trajectory'}

As with cell lineages, you can add new features one by one, either by calling the method specific to your feature:

In [15]:
model.add_division_rate()

Or the generic `add_pycellin_feature()`:

``` python
model.add_pycellin_feature("division_rate")
```

Or several features at once with `add_pycellin_features()`:

In [16]:
model.add_pycellin_features(["division_time", "cell_cycle_completeness"])

Cell lineage and cycle lineage features can be added at the same time without issue, given that cycle lineage data has already been computed:

In [17]:
model.add_pycellin_features(
    [
        "cell_displacement",  # cell lineage feature
        "branch_total_displacement",  # cycle lineage feature
        "cell_speed",  # cell lineage feature
    ]
)

To check which cycle lineage features are currently present in the model:

In [18]:
model.get_cycle_lineage_features()

['cycle_ID',
 'cells',
 'cycle_length',
 'level',
 'division_rate',
 'division_time',
 'cell_cycle_completeness',
 'branch_total_displacement']

### For optional arguments

Some features accept optional arguments, often to tweak the way the feature is defined and computed: change of unit, change of algorithm...

When adding features one by one, additional arguments are passed as regular keyword arguments.

For example, with feature specific methods:

``` python
model.add_relative_age(in_time_unit=True)
model.add_cell_length(skel_algo="lee", tolerance=0.6)
```

And with the generic `add_pycellin_feature()`:

``` python
model.add_pycellin_feature("relative_age", in_time_unit=True)
model.add_pycellin_feature("cell_length", skel_algo="lee", tolerance=0.6)
```

When adding several features at once, features are given as a dictionary with the name of each feature as key and a dictionary of its additional keyword arguments as value:

``` python
model.add_pycellin_features(
    {
        "relative_age": {"in_time_unit": True},
        "cell_length": {"skel_algo": "lee", "tolerance": 0.6},
    }
)
```

It is also possible to mix features with and without additional arguments:

``` python
model.add_pycellin_features(
    [{"relative_age": {"in_time_unit": True}}, "cell_cycle_completeness"]
)
```
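
One way to think about these three input shapes (list of names, dictionary, mixed list) is that they all reduce to the same mapping from feature name to keyword arguments. The sketch below shows that reduction with a hypothetical helper, `normalize_feature_specs`. It is not part of Pycellin and says nothing about how the library handles its input internally.

```python
def normalize_feature_specs(specs):
    """Return a {feature_name: kwargs} dict from a list or dict of specs.

    Hypothetical helper written for this example, not part of Pycellin.
    """
    if isinstance(specs, dict):
        return {name: dict(kwargs) for name, kwargs in specs.items()}
    normalized = {}
    for spec in specs:
        if isinstance(spec, str):
            # Bare feature name: no additional arguments.
            normalized[spec] = {}
        elif isinstance(spec, dict):
            # Feature name(s) with their keyword arguments.
            for name, kwargs in spec.items():
                normalized[name] = dict(kwargs)
        else:
            raise TypeError(f"Unsupported feature spec: {spec!r}")
    return normalized


mixed = [{"relative_age": {"in_time_unit": True}}, "cell_cycle_completeness"]
normalized_mixed = normalize_feature_specs(mixed)
print(normalized_mixed)
```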

## Computing or recomputing features

Now that the Pycellin features we are interested in have been declared, we want to actually compute their values to enrich the lineages. To do this, we need to ask the model to perform an update:

In [19]:
model.update()



More generally, an update computes or recomputes all the needed features on the impacted structural elements of the model (cells, cycles (if cycle data is present), links and lineages).

We can now check that the values have been computed and stored in the lineage by directly accessing the feature value of a specific cell or cell cycle:

In [20]:
# For the cell lineage
print(lin0.nodes[9013]["relative_age"])
print(lin0.nodes[9013]["cell_length"])

# For the cycle lineage
print(cyclelin0.nodes[9019]["division_rate"])

0
2.754430235429518
0.16666666666666666


Or by plotting a lineage of the model and asking for the feature of interest when hovering over the lineage elements:

In [21]:
lin0.plot(node_hover_features=["cell_ID", "relative_age", "cell_length"])

In [22]:
cyclelin0.plot(node_hover_features=["cycle_ID", "division_rate"])

### When is an update required?

An update is required in 2 different cases:
- When you have added one or more features to a model. In that case, the added feature(s) will have no associated values and the update is needed to compute them. All cells, cycles (if cycle data is present), links and lineages will be affected by the update.
- When the structure of a lineage has been modified, e.g. removal of a cell or correction of an inaccurate division. In that case, the features already have values but they may be incorrect. Depending on the reach of each feature and on the impacted elements of the model, only a subset of cells, cycles (if cycle data is present), links and lineages will be affected by the update.
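
These two cases can be pictured with a toy model. The sketch below is not Pycellin's implementation: `ToyModel`, its methods and the 5 minutes per frame are assumptions, and unlike the behavior described above it recomputes every cell instead of only the impacted subset.

```python
class ToyModel:
    """Toy stand-in for the declare-then-update workflow; not Pycellin code."""

    def __init__(self, cells):
        # `cells` maps a cell ID to its dict of feature values.
        self.cells = cells
        self.declared = {}  # feature name -> function computing it for one cell
        self._pending = set()  # features whose values are missing or stale

    def add_feature(self, name, compute):
        """Declare a feature; no value is computed at this stage."""
        self.declared[name] = compute
        self._pending.add(name)

    def remove_cell(self, cell_id):
        """Structural change: existing values may now be incorrect."""
        del self.cells[cell_id]
        self._pending.update(self.declared)

    def is_update_required(self):
        return bool(self._pending)

    def update(self):
        """Compute every pending feature on every cell."""
        for name in sorted(self._pending):
            for values in self.cells.values():
                values[name] = self.declared[name](values)
        self._pending.clear()


toy = ToyModel({1: {"frame": 0}, 2: {"frame": 1}, 3: {"frame": 2}})

# Case 1: a feature has been added, so it is declared but has no values yet.
toy.add_feature("age_in_min", lambda values: values["frame"] * 5)  # 5 min per frame, made up
required_after_add = toy.is_update_required()
has_value_before_update = "age_in_min" in toy.cells[1]
toy.update()

# Case 2: the structure has been modified, so values may be stale.
toy.remove_cell(2)
required_after_removal = toy.is_update_required()
toy.update()

print(required_after_add, has_value_before_update, required_after_removal)
print(toy.is_update_required(), toy.cells[3]["age_in_min"])
```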

You can check if an update is required with:

In [23]:
model.is_update_required()

False

However, there is one case not taken into account by `update()` and `is_update_required()`: **when the value of a feature is manually modified on one or more elements**, which in turn should affect the value of another feature.
A classic example is when the user modifies the shape of a specific cell. Since cell length depends on the shape of the cell, the length stored for this cell becomes incorrect. In that case, you can ask the model to recompute just the cell length:  
**NOT IMPLEMENTED YET**

In [24]:
# model.recompute_feature("cell_length")  # NOT IMPLEMENTED YET
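
Until this is available, the problem itself can be illustrated on stand-in data. In the sketch below, the cell is a plain dictionary with made-up coordinates, and the length is approximated by the largest distance between two outline points. This is a crude proxy chosen for brevity, not the way Pycellin computes `cell_length`.

```python
from itertools import combinations
from math import dist


def max_point_distance(coords):
    """Largest distance between two outline points (crude length proxy)."""
    return max(dist(p, q) for p, q in combinations(coords, 2))


# Stand-in for the attributes of one cell; all values are made up.
cell = {"ROI_coords": [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]}
cell["cell_length"] = max_point_distance(cell["ROI_coords"])

# Manual edit of the shape: the stored length is now stale.
cell["ROI_coords"] = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
stale_length = cell["cell_length"]

# Recompute only the dependent feature, only on the edited cell.
cell["cell_length"] = max_point_distance(cell["ROI_coords"])
print(stale_length, cell["cell_length"])
```

Nothing in the data signals that the stored length is outdated after the edit, which is why this case has to be handled explicitly.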

## Removing features from a model

Removing a feature both removes all the feature values stored in the lineages and unregisters the feature itself from the model. It is a non-reversible operation. If you want to get the feature back, the only way is to re-add it to the model, then update the model to recompute all the values.

All types of features can be removed, even if it breaks the model (like removing `cell_ID` or `frame`). Mandatory features will be protected in a later update to avoid breaking models.

For now, features must be removed one by one and by specifying which type of object they apply to:

In [26]:
model.remove_feature("absolute_age")  # node feature
model.remove_feature("DISPLACEMENT")  # edge feature
model.remove_feature("LONGEST_GAP")  # lineage feature

After removal, trying to access the feature values will throw an error:

In [27]:
lin0.nodes[9013]["absolute_age"]

KeyError: 'absolute_age'
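
If your analysis code may run on models where a feature has been removed, you can guard the access. The sketch below assumes that the attributes of a node behave like a plain dictionary, as the `KeyError` above suggests. `node_attrs` is a stand-in with made-up values, not an actual Pycellin object.

```python
# Stand-in for `lin0.nodes[9013]` after removing 'absolute_age'.
node_attrs = {"cell_ID": 9013, "frame": 0}

# Option 1: ask for a default value instead of getting an exception.
age = node_attrs.get("absolute_age")  # None when the feature is absent

# Option 2: test for the feature explicitly before reading it.
has_age = "absolute_age" in node_attrs

print(age, has_age)
```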