# Advanced - Custom features

This notebook requires Python coding knowledge. It is also recommended to first read the notebooks [Pycellin data structure](./Pycellin%20data%20structure.ipynb) and [Managing features](./Managing%20features.ipynb).

In this notebook, we will cover how to augment a model with a user-defined feature. This takes four steps:

- describe the new feature
- implement the way to compute it (i.e. its calculator)
- add the new feature and its associated calculator to the model
- update the model so the new feature values are computed and stored properly for all relevant elements of the lineages

In [1]:
import pycellin as pc

In [2]:
# Path to the TrackMate XML file.
xml_path = "../sample_data/Ecoli_growth_on_agar_pad.xml"

# Parse the XML file and create a Pycellin Model object
# that contains all the data from the XML file.
model = pc.load_TrackMate_XML(xml_path)

# We can display basic information about this model.
print(model)
print(f"This model contains {model.data.number_of_lineages()} lineages:")
for lin_ID, lineage in model.data.cell_data.items():
    print(f"- ID {lin_ID}: {lineage}")

Model with 3 lineages.
This model contains 3 lineages:
- ID 0: CellLineage named 'Track_0' with 152 nodes and 151 edges
- ID 1: CellLineage named 'Track_1' with 189 nodes and 188 edges
- ID 2: CellLineage named 'Track_2' with 185 nodes and 184 edges


## Definition of the new feature 

A new feature is defined by creating a new instance of the `Feature` class, with 6 fields that describe the feature:

**name**  
The name/identifier of the feature. This will be the name of the associated Python variable and must follow variable naming rules (no punctuation except underscores, no whitespace, no leading digit).

**description**  
A concise description of the feature.

**lineage_type**    
Either `CellLineage` or `CycleLineage` depending on which type of lineage your feature is related to. See [Pycellin data structure](./Pycellin%20data%20structure.ipynb) if in doubt.

**provenance**  
Where does the feature come from? For imported features, it is the name of the tool the data was imported from. For custom features, you can use `custom`, your initials or whatever works for you. This field is useful for traceability (e.g. reopening a model after a long time, sharing a model with other people...).

**data_type**   
Python type of the feature values, e.g. `int`, `bool`...

**unit**    
Unit of the feature values, e.g. `µm`, `min`, `cell`...


All these fields are strings. All are mandatory except `unit`, which is set to `None` if not provided.
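Since the name must be usable as a Python variable name, it can be checked before creating the feature. The snippet below is a minimal sketch using only the standard library; `is_valid_feature_name` is a hypothetical helper written for this notebook, not part of Pycellin, and Pycellin may apply its own additional checks.

```python
import keyword


def is_valid_feature_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name."""
    # str.isidentifier() rejects whitespace, punctuation and leading digits;
    # keyword.iskeyword() rejects reserved words such as `class`.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["absolute_age", "cell age", "2nd_division", "area-ratio", "class"]:
    print(f"{candidate!r}: {is_valid_feature_name(candidate)}")
```

Of these candidates, only `absolute_age` passes the check.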

Here is an example with a basic feature, the age of the cells:

In [3]:
my_feature = pc.Feature(
    "absolute_age",  # name
    "Age of the cell since the beginning of the lineage",  # description
    "CellLineage",  # lineage_type
    "Pycellin",  # provenance
    "int",  # data_type
    "frame",  # unit
)

You can then access the different fields of the feature:

In [5]:
print(f"{my_feature.name}: {my_feature.description} in {my_feature.unit}(s)")

absolute_age: Age of the cell since the beginning of the lineage in frame(s)


## Calculators

In Pycellin, a calculator is a class that defines and structures how to compute a specific feature. A feature cannot be associated with more than one calculator. As we will see, calculators can be very simple, with just a few lines of code, or quite complex.

Currently, imported features do not have calculators unless you define one yourself. This means that imported features cannot be recomputed.  
Pycellin's own features and their calculators are stored within the `pycellin.graph.features` subpackage. This is a useful place to look for real examples of calculators.

When you want to compute something that is neither provided by Pycellin nor imported from a tracking tool, you need to define your own feature and its associated calculator. A calculator has the following general structure:

```python
class MyCalculator(CalculatorToInheritFrom):

    def compute(self, *args):
        # Code to compute the value of the feature for a single element, 
        # either a node, an edge or a lineage graph.        
        return feature_value
```


### How to choose the calculator to inherit from?

[Figure: tree diagram of the FeatureCalculator classes]

To choose, you need to answer two questions:
1. Where are the feature values stored: on nodes, edges, or lineage graphs?
2. What information do you need to compute your new feature?

Question 1 is already answered in the previous part.  
Question 2 comes down to the **reach of a feature**: local vs global.
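As an illustration of the local vs global distinction, here is a minimal sketch that does not use Pycellin at all. It assumes that "local" means the value depends only on the element itself, while "global" means it depends on other elements of the lineage. The dictionaries `parents` and `frames` and both functions are hypothetical stand-ins, not Pycellin objects.

```python
# Stand-in lineage (NOT a Pycellin CellLineage): each node ID maps to
# its parent ID, with None marking the root. Node 1 divides into 2 and 3.
parents = {0: None, 1: 0, 2: 1, 3: 1}
# Frame at which each node was detected.
frames = {0: 0, 1: 1, 2: 2, 3: 2}


def node_id_parity(noi):
    """Local reach: the node ID alone is enough."""
    return noi % 2 == 0


def absolute_age(parents, frames, noi):
    """Global reach: the root of the lineage must be found first."""
    root = noi
    while parents[root] is not None:
        root = parents[root]
    return frames[noi] - frames[root]


print(node_id_parity(2))
print(absolute_age(parents, frames, 3))
```

Under this assumption, a change elsewhere in the lineage (for instance a new root) would leave the parity untouched but could change the age of every node.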

### In case of additional arguments

## Adding the new feature and its calculator to the model

```python
model.add_custom_feature(
    my_feature,  # Feature defined above
    MyCalculator,  # Calculator class defined above (not instantiated!)
)
```

### In case of additional arguments

## Updating the model

When a new feature is added, a full update of the model is automatically scheduled.  
As with imported features and Pycellin features, feature values are added to the data by calling the `update()` method on the model:

In [None]:
model.update()

## Examples

The signature of the `compute` method must be respected, even if one of the arguments is not needed to compute the feature.
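One way to see why is that `compute` is called by the framework rather than by your own code, always with the same arguments. The sketch below mimics this with a hypothetical stand-in base class, `NodeLocalCalculatorSketch`, and a plain dictionary in place of a lineage. It is not Pycellin's actual implementation, only an illustration of the mechanism.

```python
class NodeLocalCalculatorSketch:
    """Hypothetical stand-in for a node calculator base class."""

    def compute(self, lineage, noi):
        raise NotImplementedError

    def enrich(self, lineage, feature_name):
        # The base class decides which arguments compute() receives.
        for noi, node_data in lineage.items():
            node_data[feature_name] = self.compute(lineage, noi)


class ParityOK(NodeLocalCalculatorSketch):
    def compute(self, lineage, noi):  # `lineage` is unused but kept
        return noi % 2 == 0


class ParityBroken(NodeLocalCalculatorSketch):
    def compute(self, noi):  # `lineage` is missing
        return noi % 2 == 0


lineage = {0: {}, 1: {}, 2: {}}
ParityOK().enrich(lineage, "parity")
print(lineage)

broken_error = None
try:
    ParityBroken().enrich({0: {}}, "parity")
except TypeError as err:
    # Raised because enrich() passes two arguments but compute() accepts one.
    broken_error = err
    print(f"TypeError: {err}")
```

Note that the broken class can be defined without any error: the problem only shows up when the values are actually computed.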

In [None]:
feat_incorrect = pc.Feature(
    name="node_ID_parity_incorrect",
    description="Parity of the node ID",
    lineage_type="CellLineage",
    provenance="Pycellin",
    data_type="bool",  # compute() returns True or False
)


class ParityCalculator_incorrect(pc.NodeLocalFeatureCalculator):

    # Incorrect signature: the `lineage` argument is missing.
    def compute(self, noi):
        # True when the node ID is even.
        return noi % 2 == 0


In [None]:
feat_correct = pc.Feature(
    name="node_ID_parity_correct",
    description="Parity of the node ID",
    lineage_type="CellLineage",
    provenance="Pycellin",
    data_type="bool",  # compute() returns True or False
)


class ParityCalculator_correct(pc.NodeLocalFeatureCalculator):

    # Correct signature: `lineage` is kept even though it is not used.
    def compute(self, lineage, noi):
        # True when the node ID is even.
        return noi % 2 == 0

In [None]:
model.add_custom_feature(feat_correct, ParityCalculator_correct)