# Advanced - Custom features

This notebook requires Python coding knowledge. It is also recommended to first read the notebooks [Pycellin data structure](./Pycellin%20data%20structure.ipynb) and [Managing features](./Managing%20features.ipynb).

In this notebook, we will cover how to augment a model with a user-defined feature. This takes four steps:

- describe the new feature  
- implement the way to compute it (i.e. its calculator)
- add the new feature and its associated calculator to the model  
- update the model so the new feature values are computed and stored properly for all relevant elements of the lineages

In [1]:
import pycellin as pc

In [2]:
# Path to the TrackMate XML file.
xml_path = "../sample_data/Ecoli_growth_on_agar_pad.xml"

# Parse the XML file and create a Pycellin Model object
# that contains all the data from the XML file.
model = pc.load_TrackMate_XML(xml_path)

# We can display basic information about this model.
print(model)
print(f"This model contains {model.data.number_of_lineages()} lineages:")
for lin_ID, lineage in model.data.cell_data.items():
    print(f"- ID {lin_ID}: {lineage}")

Model with 3 lineages.
This model contains 3 lineages:
- ID 0: CellLineage named 'Track_0' with 152 nodes and 151 edges
- ID 1: CellLineage named 'Track_1' with 189 nodes and 188 edges
- ID 2: CellLineage named 'Track_2' with 185 nodes and 184 edges


## Definition of the new feature 

A new feature is defined by creating a new instance of the `Feature` class, with 6 fields that describe the feature:

**name**  
The name/identifier of the feature. This will be the name of the associated Python variable and must follow Python variable naming rules (no punctuation except underscores, no whitespace, no leading digit).

**description**  
A concise description of the feature.

**lineage_type**    
Either `CellLineage` or `CycleLineage` depending on which type of lineage your feature is related to. See [Pycellin data structure](./Pycellin%20data%20structure.ipynb) if in doubt.

**provenance**  
Where does the feature come from? For imported features, it is the name of the tool the data was imported from. For custom features, you can use `custom`, your initials or whatever works for you. This field is useful for traceability (e.g. reopening a model after a long time, sharing a model with other people...).

**data_type**   
Python type of the feature values, e.g. `int`, `bool`...

**unit**    
Unit of the feature values, e.g. `µm`, `min`, `cell`...


All these fields are strings. They are all mandatory except `unit`, which is set to `None` if not provided.

Here is an example with a basic feature, the age of the cells:

In [3]:
age_feat = pc.Feature(
    "age",  # name
    "Age of the cell since the beginning of the lineage",  # description
    "CellLineage",  # lineage_type
    "Pycellin",  # provenance
    "int",  # data_type
    "frame",  # unit
)

You can then access the different fields of the feature:

In [4]:
print(f"{age_feat.name}: {age_feat.description} in {age_feat.unit}(s)")

age: Age of the cell since the beginning of the lineage in frame(s)
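
Because the feature name doubles as a Python identifier, it can be worth validating it before creating the `Feature`. The helper below is a minimal sketch using only the standard library. `is_valid_feature_name` is a hypothetical name, not part of Pycellin, and Pycellin may apply its own, possibly stricter, checks.

```python
import keyword


def is_valid_feature_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name.

    Hypothetical helper, not part of Pycellin.
    """
    # str.isidentifier() rejects whitespace, punctuation other than
    # underscores, and names starting with a digit.
    # keyword.iskeyword() rejects reserved words such as "class".
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["age", "cell_area", "cell area", "2nd_division", "class"]:
    print(f"{candidate!r}: {is_valid_feature_name(candidate)}")
```

Only `age` and `cell_area` pass this check; the other three candidates are rejected.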


## Calculators

In Pycellin, a calculator is a class that defines and structures how to compute a specific feature. A feature cannot be associated with more than one calculator. As we will see, calculators can be very simple, with just a few lines of code, or quite complex.

Currently, imported features do not have calculators. This means that imported features cannot be recomputed unless you define a calculator yourself and associate it with the imported feature.  
Features provided by Pycellin come with their own calculators, which are stored in the `pycellin.graph.features` subpackage. This is a useful place to look for real examples of calculators.

When you want to compute something that is neither provided by Pycellin nor imported from a tracking tool, you need to define your own calculator. Here is the stripped-down generic structure of a calculator:

```python
class MyCalculator(CalculatorToInheritFrom):

    def compute(self, *args):
        # Code to compute the value of the feature for a single element, 
        # either a node, an edge or a lineage graph.        
        return feature_value
```

The calculator class must ALWAYS define at least a `compute()` method. In more complex calculators, you can also have an `__init__()` method (see the [In case of additional arguments](#in-case-of-additional-arguments) section).

With our `age` feature previously defined, the calculator could be:

In [5]:
class AgeCalculator(pc.NodeGlobalFeatureCalculator):

    def compute(self, data: pc.Data, lineage: pc.CellLineage, noi: int) -> int:
        root = lineage.get_root()
        return lineage.nodes[noi]["frame"] - lineage.nodes[root]["frame"]
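
To see what `compute()` returns for a single node without loading any data, the same logic can be run on a toy lineage. In the sketch below, `ToyLineage` is a hypothetical stand-in that only mimics the two things `AgeCalculator` relies on (a `nodes` mapping and a `get_root()` method). It is not a Pycellin class, and the frame values are made up for illustration.

```python
class ToyLineage:
    """Hypothetical stand-in for a lineage (not a Pycellin class)."""

    def __init__(self, nodes: dict[int, dict], root: int):
        self.nodes = nodes  # node ID -> dict of feature values
        self._root = root

    def get_root(self) -> int:
        return self._root


def compute_age(lineage: ToyLineage, noi: int) -> int:
    """Same logic as AgeCalculator.compute(), for one node of interest."""
    root = lineage.get_root()
    return lineage.nodes[noi]["frame"] - lineage.nodes[root]["frame"]


# Made-up lineage whose root (node 10) appears at frame 3.
toy = ToyLineage({10: {"frame": 3}, 11: {"frame": 4}, 12: {"frame": 7}}, root=10)
ages = {node: compute_age(toy, node) for node in toy.nodes}
print(ages)  # {10: 0, 11: 1, 12: 4}
```

The root always gets an age of 0, and every other node gets the number of frames elapsed since the root appeared.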

### How to choose the calculator to inherit from?

In the example above, we have built our `AgeCalculator` by inheriting from `NodeGlobalFeatureCalculator`. But there are several `FeatureCalculator` base classes you can inherit from, represented in orange in the following calculators tree: 

![Calculators tree scheme](./imgs/Pycellin_calculators_scheme.png)

To know which calculator to use, you need to answer 2 questions:
1. Where do you want to store the feature values? On nodes, edges, or lineage graphs?
2. What information do you need to compute your new feature?

Question 1 should be answered by the feature definition itself. This is not yet the case: the feature type (node, edge or lineage) is planned to become one of the mandatory fields of a `Feature` in the next update.

Question 2 requires thinking about **the reach of your feature**: local vs global.  
A local feature only needs data from the element it will be stored in. For example, cell area is a local feature since it only needs information about the shape of the cell itself to be computed. But some features require data from several elements, or even from the whole lineage. In that case they are global. A simple example is our `AgeCalculator`. To compute the age of a cell, you need to know when the first cell of the lineage appeared. So you need to access the data stored in 2 different cells: the cell of interest and the root of the lineage.  

These 2 examples, area and age, are examples of node features, so their calculators would have to inherit from `NodeLocalFeatureCalculator` and `NodeGlobalFeatureCalculator` respectively.  
Computing the speed of a cell between 2 frames would need an `EdgeLocalFeatureCalculator` since only the position of the 2 cells forming the edge is needed.  
For lineages, mean cell speed is an example of a feature based on a `LineageLocalFeatureCalculator`. It requires information from all the edges of the current lineage, but nothing from other lineages.
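
The difference between these scopes can be made concrete with plain functions. The sketch below is not Pycellin code: the node and edge containers, the `x`/`y` keys and the function names are assumptions chosen for illustration. Each function receives only the data its scope allows, and speeds are expressed in position units per frame.

```python
import math

# Made-up data: node ID -> features, with one position per frame.
nodes = {
    1: {"frame": 0, "x": 0.0, "y": 0.0},
    2: {"frame": 1, "x": 3.0, "y": 4.0},
    3: {"frame": 2, "x": 3.0, "y": 10.0},
}
edges = [(1, 2), (2, 3)]


def edge_speed(source: dict, target: dict) -> float:
    """Edge-local: needs only the two nodes that form the edge."""
    distance = math.dist((source["x"], source["y"]), (target["x"], target["y"]))
    return distance / (target["frame"] - source["frame"])


def mean_speed(nodes: dict, edges: list) -> float:
    """Lineage-local: needs every edge of one lineage, nothing else."""
    speeds = [edge_speed(nodes[u], nodes[v]) for u, v in edges]
    return sum(speeds) / len(speeds)


print(edge_speed(nodes[1], nodes[2]))  # 5.0
print(mean_speed(nodes, edges))  # 5.5
```

A global feature would need more than this, for example data from other lineages or from the model as a whole, which is why the global `AgeCalculator.compute()` shown earlier also receives a `data` argument.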

## Adding the new feature and its calculator to the model

Once the feature and its calculator are defined, they need to be linked together and added to the model. This is done with the `add_custom_feature()` method:

```python
model.add_custom_feature(
    my_feature,  # instance of Feature defined above
    MyCalculator,  # Calculator class defined above (not instantiated!)
)
```

For our `age` example:

In [6]:
model.add_custom_feature(age_feat, AgeCalculator)

## Updating the model

When a new feature is added with `add_custom_feature()`, a full update of the model is automatically planned.  

To actually run the update that will compute and add the feature values for the newly added custom feature, you need to call the `update()` method on the model:

In [7]:
model.update()

During the update, all the features associated with a calculator will be computed or recomputed. See [Managing features](./Managing%20features.ipynb) for more information.

We can check that our custom `age` feature has been added to all nodes by plotting a lineage and hovering over its cells:

In [8]:
lin0 = model.data.cell_data[0]
lin0.plot(node_hover_features=["cell_ID", "age"])

## Removing a custom feature

A custom feature is removed from a model like any other feature:

In [9]:
model.remove_feature("age", "node")

See [Managing features](./Managing%20features.ipynb) for more information.

## Examples

### Importance of `compute()` signature

The signature of the `compute()` method must be respected, even if one of the arguments is not needed to compute the feature.

In [None]:
feat_incorrect = pc.Feature(
    name="node_ID_parity_incorrect",
    description="Parity of the node ID",
    lineage_type="CellLineage",
    provenance="Pycellin",
    data_type="bool",
)


class ParityCalculator_incorrect(pc.NodeLocalFeatureCalculator):

    # Incorrect on purpose: the `lineage` argument is missing.
    def compute(self, noi):
        # True for even node IDs, False for odd ones.
        return noi % 2 == 0

In [None]:
model.add_custom_feature(feat_incorrect, ParityCalculator_incorrect)
model.update()

TypeError: ParityCalculator_incorrect.compute() takes 2 positional arguments but 3 were given


In [None]:
feat_correct = pc.Feature(
    name="node_ID_parity_correct",
    description="Parity of the node ID",
    lineage_type="CellLineage",
    provenance="Pycellin",
    data_type="bool",
)


class ParityCalculator_correct(pc.NodeLocalFeatureCalculator):

    # `lineage` is unused here but must stay in the signature.
    def compute(self, lineage, noi):
        # True for even node IDs, False for odd ones.
        return noi % 2 == 0

In [None]:
model.add_custom_feature(feat_correct, ParityCalculator_correct)
model.update()

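
The error comes from how the method is called. For a node-local calculator, the correct example above shows `compute()` receiving the lineage and the node of interest, so a method that accepts only `noi` gets one argument too many. A mismatch like this can be caught before running `update()` by inspecting the signature. The sketch below uses only the standard library. `check_compute_signature` and the two stand-in classes are hypothetical and not part of Pycellin, and the expected parameter names are taken from the examples in this notebook.

```python
import inspect


def check_compute_signature(calculator_cls, expected: tuple) -> bool:
    """Return True if compute() takes exactly the `expected` parameters
    after `self`. Hypothetical helper, not part of Pycellin."""
    params = list(inspect.signature(calculator_cls.compute).parameters)
    return tuple(params[1:]) == expected  # params[0] is `self`


class BadParity:  # stand-in: `lineage` is missing
    def compute(self, noi):
        return noi % 2 == 0


class GoodParity:  # stand-in: `lineage` is accepted even though unused
    def compute(self, lineage, noi):
        return noi % 2 == 0


node_local = ("lineage", "noi")
print(check_compute_signature(BadParity, node_local))  # False
print(check_compute_signature(GoodParity, node_local))  # True
```

This check compares parameter names, which is stricter than necessary since the arguments are passed by position. Comparing only the number of parameters would be a looser alternative.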

### In case of additional arguments