# Managing features

This notebook explains what features are in Pycellin, how to add them to a model, and how to compute, recompute or remove them.

In [1]:
from pycellin.io.trackmate import load_TrackMate_XML
import pycellin.graph.features.utils as gfu

As an example, we load a TrackMate XML file describing *E. coli* bacteria growing on an agar pad.

In [2]:
# Path to the TrackMate XML file.
xml_path = "../sample_data/Ecoli_growth_on_agar_pad.xml"

# Parse the XML file and create a Pycellin Model object
# that contains all the data from the XML file.
model = load_TrackMate_XML(xml_path)

# We can display basic information about this model.
print(model)
print(f"This model contains {model.data.number_of_lineages()} lineages:")
for lin_ID, lineage in model.data.cell_data.items():
    print(f"- ID {lin_ID}: {lineage}")

Model with 3 lineages.
This model contains 3 lineages:
- ID 0: CellLineage named 'Track_0' with 152 nodes and 151 edges
- ID 1: CellLineage named 'Track_1' with 189 nodes and 188 edges
- ID 2: CellLineage named 'Track_2' with 185 nodes and 184 edges


## What is a feature in Pycellin?

In Pycellin, a feature is a piece of information stored in a lineage that relates to part or all of that lineage (e.g. the shape or velocity of a cell, the number of divisions in a lineage...).  
*add schematic with the different types of features and a few examples depending on where the features are stored?*  
*link to the notebook about Pycellin data structure?*

Features can be computed automatically from existing data or added manually.
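To make the notion of storage level concrete, here is a toy lineage written with plain Python dictionaries. It is not Pycellin's data structure: the `nodes`, `edges` and `lineage` dicts and the feature names are hypothetical. It shows a feature stored on cells (nodes), one computed on links between cells (edges), one computed for the whole lineage, and one added manually.

```python
# Toy lineage stored as plain dicts (hypothetical, NOT Pycellin's classes).
# Nodes are cells at a given frame; "length" is a per-cell (node) feature.
nodes = {
    1: {"frame": 0, "length": 2.0},
    2: {"frame": 1, "length": 2.4},
    3: {"frame": 2, "length": 1.3},  # first daughter after the division
    4: {"frame": 2, "length": 1.2},  # second daughter after the division
}
# Edges link a cell to its successor(s) at the next frame.
edges = {(1, 2): {}, (2, 3): {}, (2, 4): {}}
lineage = {}  # features describing the lineage as a whole

# Edge feature computed from node data: length change between linked cells.
for (src, dst), data in edges.items():
    data["length_change"] = nodes[dst]["length"] - nodes[src]["length"]

# Lineage feature computed from the topology: a division is a cell with >1 child.
n_children = {}
for src, _dst in edges:
    n_children[src] = n_children.get(src, 0) + 1
lineage["number_of_divisions"] = sum(1 for n in n_children.values() if n > 1)

# Feature added manually, e.g. an annotation typed in by the user.
lineage["comment"] = "reference lineage"

print(lineage)  # {'number_of_divisions': 1, 'comment': 'reference lineage'}
```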

Pycellin distinguishes three types of features, depending on where they come from.

### Imported features

Imported features come from the external tool used to create the data, e.g. TrackMate. They are added automatically when the data is imported and cannot be recomputed.

### Pycellin features

Pycellin features are predefined features that can be added to the model as needed, such as:
- morphological features, e.g. bacterial length
- tracking-related features, e.g. division time

### Custom features

Custom features are user-defined features, usually less generic and more tailored to a specific experiment or data type. They are more advanced and require more Python knowledge, so they are covered in a dedicated notebook.
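The principle behind a custom feature can be sketched in plain Python: a user-defined function computes a value from the data of each cell. The `gfp_rfp_ratio` feature, its input keys, the toy values and the `add_custom_feature` helper are all hypothetical; the actual Pycellin mechanism is covered in the dedicated notebook.

```python
from typing import Callable, Dict

Cell = Dict[str, float]


def gfp_rfp_ratio(cell: Cell) -> float:
    """Hypothetical experiment-specific feature: ratio of two channel means."""
    return cell["gfp_mean"] / cell["rfp_mean"]


def add_custom_feature(cells: Dict[int, Cell], name: str,
                       func: Callable[[Cell], float]) -> None:
    """Store func(cell) under `name` in the data of every cell (hypothetical)."""
    for cell in cells.values():
        cell[name] = func(cell)


# Toy per-cell data (made-up values for the illustration).
cells = {
    0: {"gfp_mean": 120.0, "rfp_mean": 60.0},
    1: {"gfp_mean": 45.0, "rfp_mean": 90.0},
}
add_custom_feature(cells, "gfp_rfp_ratio", gfp_rfp_ratio)
print([c["gfp_rfp_ratio"] for c in cells.values()])  # [2.0, 0.5]
```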

## Adding features to a model

As mentioned above, imported features are added automatically during import, and custom features are more complex and will be covered in a separate notebook. This section therefore only covers how to add Pycellin features to a model.

### For cell lineages

List of available Pycellin features for cell lineages:

In [11]:
gfu.get_pycellin_cell_lineage_features()

{'absolute_age': 'Age of the cell since the beginning of the lineage',
 'relative_age': 'Age of the cell since the beginning of the current cell cycle'}
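The two definitions above can be sketched on a toy lineage stored as a parent-to-children dictionary. This is an illustration, not Pycellin's implementation: ages are counted in frames, each link is assumed to span exactly one frame, and the first cell of a cell cycle is assumed to have a relative age of 0.

```python
# Hypothetical toy lineage: each cell maps to its children at the next frame.
children = {
    "a": ["b"],
    "b": ["c", "d"],  # division: "b" has two daughters
    "c": ["e"],
    "d": [],
    "e": [],
}


def compute_ages(children, root):
    """Return (absolute_age, relative_age) dicts, both counted in frames.

    Assumes one frame between a cell and each of its children, and that
    the first cell of a cell cycle has a relative age of 0.
    """
    absolute, relative = {root: 0}, {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children[node]
        for kid in kids:
            absolute[kid] = absolute[node] + 1
            # A new cell cycle starts right after a division (more than one child).
            relative[kid] = 0 if len(kids) > 1 else relative[node] + 1
            stack.append(kid)
    return absolute, relative


absolute_age, relative_age = compute_ages(children, "a")
print(absolute_age)  # {'a': 0, 'b': 1, 'c': 2, 'd': 2, 'e': 3}
print(relative_age)  # {'a': 0, 'b': 1, 'c': 0, 'd': 0, 'e': 1}
```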

Features can be added one by one, either by calling the dedicated method:

In [None]:
model.add_absolute_age()

Or by using the more generic `add_pycellin_feature()` method:

In [None]:
model.add_pycellin_feature("absolute_age")

However, when you want to add many features, it is easier to add them all at once with the `add_pycellin_features()` shortcut method, which accepts a list of Pycellin feature names:

In [None]:
model.add_pycellin_features(["relative_age", "absolute_age"])

### For cell cycle lineages

Cell cycle features (or cycle features for short) are added in the same way as cell lineage features. However, the cycle lineages must be computed first; otherwise you will get the following error:

In [4]:
model.add_division_time()

ValueError: Cycle lineages have not been computed yet. Please compute the cycle lineages first with `model.add_cycle_data()`.

Please refer to the Pycellin data structure notebook (*add ref*) if you are unsure about the difference between a cell lineage (`CellLineage`) and a cell cycle lineage (`CycleLineage`).

To compute and add the cycle lineages to the model, just do:

In [5]:
model.add_cycle_data()
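Assuming, as a simplification, that each node of a cycle lineage stands for one cell cycle, i.e. a chain of cells between two divisions, the grouping can be sketched in plain Python. This is not how `add_cycle_data()` is implemented; the toy lineage and the `build_cycles` helper are hypothetical.

```python
# Hypothetical toy cell lineage: each cell maps to its children.
children = {
    "a": ["b"],
    "b": ["c", "d"],  # division
    "c": ["e"],
    "d": [],
    "e": [],
}


def build_cycles(children, root):
    """Group cells into cell cycles; return (cycles, cycle_children).

    `cycles` maps a cycle ID (its first cell) to the ordered list of its cells.
    `cycle_children` maps a cycle ID to the IDs of its daughter cycles.
    """
    cycles, cycle_children = {}, {}
    starts = [root]
    while starts:
        start = starts.pop()
        cells = [start]
        # Follow the chain as long as the current cell has exactly one child.
        while len(children[cells[-1]]) == 1:
            cells.append(children[cells[-1]][0])
        cycles[start] = cells
        daughters = children[cells[-1]]  # empty, or 2+ cells after a division
        cycle_children[start] = list(daughters)
        starts.extend(daughters)  # each daughter starts a new cycle
    return cycles, cycle_children


cycles, cycle_children = build_cycles(children, "a")
print(cycles["a"], cycle_children["a"])  # ['a', 'b'] ['c', 'd']
```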

List of available Pycellin features for cell cycle lineages:

In [8]:
gfu.get_pycellin_cycle_lineage_features()

{'cell_cycle_completeness': 'Completeness of the cell cycle, i.e. does it start and end with a division',
 'division_time': 'Time elapsed between the birth of a cell and its division',
 'division_rate': 'Number of divisions per time unit'}
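To illustrate what these three definitions measure, here is a sketch on a hypothetical toy cycle lineage. The conventions are assumptions and may differ from Pycellin's: a cycle is complete when it both starts and ends with a division, `division_time` is the number of frames spanned by a complete cycle, and `division_rate` is read as its inverse (one division per cycle). Incomplete cycles get no value in this sketch.

```python
# Hypothetical toy cycle lineage: each entry is one cell cycle.
# "frames" lists the frames spanned by the cycle, "parent" is the mother
# cycle (None for the first cycle) and "n_daughters" counts daughter cycles.
cycles = {
    "root": {"frames": [0, 1, 2], "parent": None, "n_daughters": 2},
    "left": {"frames": [3, 4, 5, 6], "parent": "root", "n_daughters": 2},
    "right": {"frames": [3, 4], "parent": "root", "n_daughters": 0},
}


def cycle_features(cycle):
    """Compute three cycle features, with times expressed in frames."""
    # Complete cycle: it starts with a division and ends with a division.
    complete = cycle["parent"] is not None and cycle["n_daughters"] >= 2
    if complete:
        division_time = len(cycle["frames"])  # frames from birth to division
        division_rate = 1 / division_time  # one division per cycle
    else:
        # Birth or division is not observed, so both values stay undefined.
        division_time = division_rate = None
    return {
        "cell_cycle_completeness": complete,
        "division_time": division_time,
        "division_rate": division_rate,
    }


features = {name: cycle_features(c) for name, c in cycles.items()}
print(features["left"])
# {'cell_cycle_completeness': True, 'division_time': 4, 'division_rate': 0.25}
```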

Most of these features, if not all, are expressed in time and/or space units. These units are defined at the model level, usually when the model is created, so the exact units of a Pycellin feature are only known once the feature has been added to the model. For example, the Pycellin `division_rate` feature has the following generic description: `Number of divisions per time unit`. Once this feature has been added to our example model, its description refers to the actual time unit of the model.

More information on a specific feature, such as its description, is available once it has been added to the model. To list the features already present in the model:

In [13]:
# Features already present in the model:
model.get_cell_lineage_features()
model.get_cycle_lineage_features()

[]
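The unit dependence described above can be illustrated with a short worked example. The time step of 5 min per frame and the string substitution are hypothetical, chosen only for the illustration; they are neither the metadata of the example file nor how Pycellin builds its descriptions.

```python
# Hypothetical model-level time metadata (not read from the example file).
time_step = 5.0    # duration of one frame
time_unit = "min"  # unit of time_step

generic_description = "Number of divisions per time unit"

# Once the model's unit is known, the description can name it explicitly...
description = generic_description.replace("time unit", time_unit)

# ...and a value counted in frames can be converted to that unit.
division_time_frames = 4
division_time = division_time_frames * time_step  # 20.0 min
division_rate = 1 / division_time                 # 0.05 divisions per min

print(description)  # Number of divisions per min
```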

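### Features with optional arguments

Some features accept optional arguments that modify the way they are computed. With `add_pycellin_features()`, such arguments are passed by replacing the feature name with a dictionary that maps the feature name to its arguments, here `in_time_unit` for `relative_age`. Features without arguments can still be given by name in the same list: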

In [None]:
model.add_pycellin_features(
    [{"relative_age": {"in_time_unit": True}}, "cell_cycle_completeness"]
)
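One way such a mixed list could be interpreted is to turn each entry into a `(name, arguments)` pair, whose arguments would then be forwarded to the feature's method as keyword arguments. The `normalize_specs` helper below is hypothetical and only illustrates this convention; it is not Pycellin's internal code.

```python
from typing import Any, Dict, List, Tuple, Union

# A feature is given either by name or as {name: {argument: value, ...}}.
FeatureSpec = Union[str, Dict[str, Dict[str, Any]]]


def normalize_specs(specs: List[FeatureSpec]) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn feature names or {name: kwargs} dicts into (name, kwargs) pairs."""
    normalized = []
    for spec in specs:
        if isinstance(spec, str):
            normalized.append((spec, {}))  # no optional arguments
        elif isinstance(spec, dict):
            for name, kwargs in spec.items():
                normalized.append((name, dict(kwargs)))
        else:
            raise TypeError(f"Unsupported feature spec: {spec!r}")
    return normalized


specs = [{"relative_age": {"in_time_unit": True}}, "cell_cycle_completeness"]
print(normalize_specs(specs))
# [('relative_age', {'in_time_unit': True}), ('cell_cycle_completeness', {})]
```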

## Computing or recomputing features

This only applies to Pycellin and custom features: imported features cannot be recomputed, since no calculator is associated with them.
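The distinction can be pictured as a registry mapping each feature to its calculator, where imported features have none. The registry, the `recompute` helper and the feature names (`QUALITY` stands in for a feature imported from TrackMate) are hypothetical; this is a sketch of the idea, not Pycellin's implementation.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: feature name -> calculator, None for imported features.
calculators: Dict[str, Optional[Callable[[dict], float]]] = {
    "QUALITY": None,  # imported feature: no calculator available
    "aspect_ratio": lambda cell: cell["length"] / cell["width"],
}


def recompute(feature: str, cells: Dict[int, dict]) -> None:
    """Recompute `feature` for every cell, if a calculator is available."""
    calculator = calculators[feature]
    if calculator is None:
        raise ValueError(f"{feature!r} is imported and cannot be recomputed.")
    for cell in cells.values():
        cell[feature] = calculator(cell)


cells = {0: {"length": 3.0, "width": 1.0, "QUALITY": 0.9}}
cells[0]["length"] = 4.0  # e.g. after correcting the segmentation
recompute("aspect_ratio", cells)
print(cells[0]["aspect_ratio"])  # 4.0
```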

## Removing features from a model

Features of any type can be removed from a model.
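Conceptually, removing a feature means deleting both its entry among the features known to the model and its value in every cell. The sketch below shows this on plain dictionaries; `remove_feature`, the `declared` registry and the data layout are hypothetical, not Pycellin's API.

```python
from typing import Dict


def remove_feature(declared: Dict[str, str], cells: Dict[int, dict],
                   feature: str) -> int:
    """Remove a feature from the registry and from every cell.

    Returns the number of cells from which a value was removed.
    """
    if feature not in declared:
        raise KeyError(f"Unknown feature: {feature!r}")
    del declared[feature]
    removed = 0
    for data in cells.values():
        if feature in data:
            del data[feature]
            removed += 1
    return removed


# Hypothetical registry of features known to the model, with descriptions.
declared = {
    "absolute_age": "Age of the cell since the beginning of the lineage",
    "relative_age": "Age of the cell since the beginning of the current cell cycle",
}
cells = {0: {"absolute_age": 0, "relative_age": 0}, 1: {"absolute_age": 1}}
print(remove_feature(declared, cells, "absolute_age"))  # 2
```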