# Getting started

In this notebook we cover the basics of Pycellin:
- how to obtain a Pycellin model,
- how Pycellin models cell tracking data,
- how to manipulate, modify and enrich the cell lineages,
- how to export the data to other formats.

In [1]:
import pycellin as pc

## How to get a Pycellin model

### Building from scratch

It is possible to build a Pycellin model manually, starting with an empty model.

In [2]:
my_model = pc.Model()
print(my_model)
# The model is completely empty: no metadata, no features declaration, no lineages.
print(my_model.__repr__())

Empty model.
Model(metadata=None, feat_declaration=None, data=None)


You can then fill up the metadata and build cell lineages by adding cells and links. This is covered in details in the notebook [Creating a model from scratch](./Creating%20a%20model%20from%20scratch.ipynb) (WIP).

### Loading from a Pycellin pickle file

You can load a model from a Pycellin pickle file saved on disk with the `load_from_pickle()` method:

In [3]:
pycellin_pickle = "../sample_data/FakeTracks.pickle"
my_model = pc.Model.load_from_pickle(pycellin_pickle)
print(my_model)

Model named 'FakeTracks' with 2 lineages, built from TrackMate.


However, please note that while `pickle` is a module of the Python Standard Library, is it not secure. **You should only load data from sources you trust.** Please refer to the [documentation of the `pickle` module](https://docs.python.org/3/library/pickle.html) for more information.

### Loading from external tools

Pycellin can load data from different tracking file formats. It currently supports:
- Cell Tracking Challenge text files
- TrackMate XML files

More tracking file formats will be supported in the future.

#### Cell Tracking Challenge

Tracking data formatted as per the [Cell Tracking Challenge](https://celltrackingchallenge.net/) file format [specifications](https://public.celltrackingchallenge.net/documents/Naming%20and%20file%20content%20conventions.pdf) can be loaded with
the `load_CTC_file()` function:

In [None]:
ctc_file = "../sample_data/FakeTracks_TMtoCTC_res.txt"
ctc_model = pc.load_CTC_file(ctc_file)
print(ctc_model)

Model named 'FakeTracks_TMtoCTC_res' with 2 lineages, built from CTC.


For this format, only track topology is read: no cell segmentation is extracted in the case of associated label images (this might get supported later if there is a need).

#### TrackMate



Data generated with [TrackMate](https://imagej.net/plugins/trackmate/) can be loaded into a Pycellin model thanks to the `load_TrackMate_XML()` function:

In [4]:
trackmate_xml = "../sample_data/Ecoli_growth_on_agar_pad.xml"
tm_model = pc.load_TrackMate_XML(trackmate_xml)
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


All data within the TrackMate XML file is loaded into the Pycellin model: there is no loss of information. Most notably, TrackMate features are accessible within Pycellin under the same name (e.g. AREA, MEAN_INTENSITY_CH1).

To know more about the specifics of using Pycellin with TrackMate data, please refer to the dedicated notebook: [Pycellin with TrackMate](./Pycellin%20with%20TrackMate.ipynb).

## Pycellin model structure

In this section, we are using the TrackMate model previously loaded as example.

In [6]:
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


A Pycellin model consists of 3 different elements that we describe in separate sections below:
- the metadata of the model
- the data, i.e. the lineages
- the declaration of the features present in the lineages

*simplified scheme of a model*

### Metadata

Metadata holds information about the model and the data of the model.

In Pycellin, the metadata is stored as a dictionnary. It is accessible by calling `metadata` on your model:

In [7]:
for metadata_field in tm_model.metadata:
    print(metadata_field)

name
file_location
provenance
date
space_unit
time_unit
pycellin_version
TrackMate_version
time_step
pixel_size
Log
Settings
GUIState
DisplaySettings


In [8]:
print(tm_model.metadata["provenance"], tm_model.metadata["TrackMate_version"])

TrackMate 7.10.2


Some metadata fields are common to most Pycellin models, like `provenance`, `date` or `space_unit`. Others are specific to the way the model was built. For example, models coming from TrackMate have a `TrackMate_version` field and more (`Log`, `GUIState`...).

Since metadata is stored as a dictionnary, it is versatile: you can store whatever information you find relevant. The more information you store, the better it is for traceability. For example, you could describe the different channels of your timelapse, or list some image acquisition parameters:

In [9]:
tm_model.metadata["channel1"] = "segmentation"
tm_model.metadata["channel2"] = "ZipA-mCherry"
tm_model.metadata["objective"] = "100x oil"

However, be careful not to delete the `space_unit`, `time_unit`, `time_step` and `pixel_size` fields when they exist since they can be needed when computing features like `division_time`, `cell_displacement`...

You can access these important fields with the following methods:

In [10]:
pix_size = tm_model.get_pixel_size()
s_unit = tm_model.get_space_unit()
print(f"pixel width = {pix_size['width']} {s_unit}")
print(f"pixel height = {pix_size['height']} {s_unit}")
print(f"pixel depth = {pix_size['depth']} {s_unit}")

timestep = tm_model.get_time_step()
t_unit = tm_model.get_time_unit()
print(f"frame = {timestep} {t_unit}")

pixel width = 0.06587000409777295 µm
pixel height = 0.06587000409777295 µm
pixel depth = 0.06587 µm
frame = 5.0 min


### Data

In Pycellin, lineages are modeled as directed acyclic graphs. It means that division events are allowed; they even are recommended if you want to take full advantage of Pycellin. **However, FUSION EVENTS ARE NOT SUPPORTED.** Fusion events happen when a cell has more than one parent. If you try to use Pycellin on a lineage with fusion events, it may crash or produce incorrect results, especially if you are computing features related to tracking.

Below is an example of a dataset with fusions. If Pycellin detects fusions when loading data from an external tool, a warning will be raised. It is then up to you to decide if you want to proceed or to correct the fusions. However it is highly recommended to correct them.

In [11]:
trackmate_xml_fusions = "../sample_data/Ecoli_growth_on_agar_pad_with_fusions.xml"
fusion_model = pc.load_TrackMate_XML(trackmate_xml_fusions)



You can manually check if a model contains fusions and what are the cells involved with the `get_fusions()` method:

In [12]:
fusion_model.get_fusions()

[Cell(cell_ID=9065, lineage_ID=0),
 Cell(cell_ID=9232, lineage_ID=1),
 Cell(cell_ID=9257, lineage_ID=1)]

To correct the fusions, you can either go back to clean your data with your software of choice or remove one incoming link per fusion cell directly in Pycellin.

Fusions or not, cell lineages data and cell cycle lineages data are stored in a `Data` object, accessible by calling `data` on the model:

In [13]:
print(tm_model.data)

Data object with 3 cell lineages.


Both types of lineages are implemented in Pycellin as [NetworkX](https://networkx.org/) DiGraphs. Each lineage is a connected graph built of nodes linked by edges. You can refer to [NetworkX documentation](https://networkx.org/documentation/stable/tutorial.html) for specifics regarding implementation.

#### Cell lineages

In cell lineages, each node of the lineage graph models a cell at a specific point in time and space. Each edge models the displacement of a cell in time and space.

All cell lineages are stored in `cell_data`, a dictionnary where keys are the IDs of the lineages, and values the lineages themselves (instances of the CellLineage class):

In [14]:
# The dict of all lineages.
tm_model.data.cell_data

{0: <pycellin.classes.lineage.CellLineage at 0x7f8e71e14750>,
 1: <pycellin.classes.lineage.CellLineage at 0x7f8e7975c950>,
 2: <pycellin.classes.lineage.CellLineage at 0x7f8e718f4490>}

In [15]:
# The lineage whose lineage_ID is 1.
lin1 = tm_model.data.cell_data[1]
print(lin1)

CellLineage of ID 1 named Track_1 with 189 cells and 188 links.


IDs of lineages (`lineage_ID`) are always integer. Lineage IDs are often positive, but not necessarily. Pycellin convention is to attribute a negative ID to one-cell lineage: minus the ID of the cell itself. That way it is easier to distinguish between one-cell lineage and "true" lineage.

In [16]:
# TODO: add a method to get the lineage by its lineage_ID

##### Plotting a cell lineage

Pycellin offers basic plotting of lineages thanks to the `plot()` method. Cell lineages are plotted like family trees, with time flowing from top to bottom.

Plotting relies on [Plotly](https://plotly.com/python/), an interactive graphing library. You can zoom in and out or pan on the lineage and adjust the axes. Some data is displayed when hovering on cells. Clicking on the legend elements on the right hide or display them.

In [17]:
lin1.plot()

It is possible to customize and enrich the plot with more information.  
Below is an example where cells are colored according to their area and where cell area is displayed when the cursor is hovering on cells. In this example, area data was extracted from TrackMate (area is called AREA in TrackMate).

In [18]:
lin_ID = lin1.graph["lineage_ID"]
lin1.plot(
    title=f"Cell lineage of ID = {lin_ID}",  # title of the plot
    node_colormap_feature="AREA",  # feature to use to color the nodes
    node_color_scale="plotly3",  # color scale for the node colors
    node_hover_features=["cell_ID", "AREA"],  # which features to display
)

See Pycellin `plot()` documentation for more information on the available customization parameters.

##### Accessing data in a cell lineage

Just as lineages are identified by their `lineage_ID`, cells are identified by their `cell_ID`, a positive integer. Links between cells (edges) are identified by a tuple of their incoming cell ID and their outgoing cell ID, in this order: `(incoming_cell_ID, outgoing_cell_ID)`.

Data can be stored at the cell level, at the link level or at the lineage level depending on the information you want to store. Since lineage are implemented as NetworkX directed acyclic graphs, you can access any features (called attributes in NetworkX) like in any other NetworkX graph (cf [NetworkX tutorial](https://networkx.org/documentation/stable/tutorial.html)).

For example, if we take the first cell of the lineage above, we can list all the features associated to the cell (node) as well as their values:

In [19]:
cell_ID = 8985
lin1.nodes[8985]

{'name': 'ID8985',
 'STD_INTENSITY_CH1': 0.7303613935746522,
 'SOLIDITY': 0.8801470588235286,
 'STD_INTENSITY_CH2': 5.07251276088164,
 'QUALITY': 605.0,
 'POSITION_T': 0.0,
 'TOTAL_INTENSITY_CH2': 67609.0,
 'TOTAL_INTENSITY_CH1': 2920.0,
 'CONTRAST_CH1': -0.9910059194034286,
 'ELLIPSE_MINOR': 0.33679824175103523,
 'ELLIPSE_THETA': -0.38208393640136984,
 'ELLIPSE_Y0': -0.016039040053655834,
 'CIRCULARITY': 0.3340965639336479,
 'AREA': 2.596806177744612,
 'ELLIPSE_MAJOR': 2.448401561458347,
 'CONTRAST_CH2': -0.9914983350220509,
 'MEAN_INTENSITY_CH1': 4.891122278056951,
 'MAX_INTENSITY_CH2': 130.0,
 'MEAN_INTENSITY_CH2': 113.24790619765494,
 'MAX_INTENSITY_CH1': 5.0,
 'MIN_INTENSITY_CH2': 100.0,
 'MIN_INTENSITY_CH1': 0.0,
 'SNR_CH1': -1475.7751085884477,
 'ELLIPSE_X0': 0.03782613542727112,
 'SHAPE_INDEX': 6.132942987585787,
 'SNR_CH2': -5207.449068381545,
 'MEDIAN_INTENSITY_CH1': 5.0,
 'VISIBILITY': 1,
 'RADIUS': 0.9091694445367442,
 'MEDIAN_INTENSITY_CH2': 113.0,
 'ELLIPSE_ASPECTRATIO': 

To get a specific feature value:

In [20]:
lin1.nodes[8985]["AREA"]

2.596806177744612

This is the same for links (edges):

In [21]:
lin1.edges[8985, 9015]  # link between cells 8985 and 9015

{'SPOT_SOURCE_ID': 8985,
 'SPOT_TARGET_ID': 9015,
 'LINK_COST': 0.5539550418894829,
 'DIRECTIONAL_CHANGE_RATE': nan,
 'SPEED': 0.05992454257732503,
 'DISPLACEMENT': 0.2996227128866252,
 'EDGE_TIME': 2.5,
 'location': (11.133820855399392, 21.374861395496293, 0.0)}

In [22]:
lin1.edges[8985, 9015]["DISPLACEMENT"]  # feature loaded from TrackMate

0.2996227128866252

And for lineages:

In [23]:
lin1.graph

{'name': 'Track_1',
 'TRACK_INDEX': 1,
 'DIVISION_TIME_MEAN': 15.37037037037037,
 'DIVISION_TIME_STD': 7.195875678513824,
 'NUMBER_SPOTS': 189,
 'NUMBER_GAPS': 0,
 'NUMBER_SPLITS': 28,
 'NUMBER_MERGES': 0,
 'NUMBER_COMPLEX': 0,
 'LONGEST_GAP': 0,
 'TRACK_DURATION': 110.0,
 'TRACK_START': 0.0,
 'TRACK_STOP': 110.0,
 'TRACK_DISPLACEMENT': 10.24707606539736,
 'TRACK_MEAN_SPEED': 0.24165350122434,
 'TRACK_MAX_SPEED': 1.5301701671007144,
 'TRACK_MIN_SPEED': 0.01049273749583099,
 'TRACK_MEDIAN_SPEED': 0.16348567401042208,
 'TRACK_STD_SPEED': 0.2204287856906754,
 'TRACK_MEAN_QUALITY': 817.8201058201058,
 'TOTAL_DISTANCE_TRAVELED': 227.15429115087966,
 'MAX_DISTANCE_TRAVELED': 24.429157080704854,
 'CONFINEMENT_RATIO': 0.04511064269787922,
 'MEAN_STRAIGHT_LINE_SPEED': 0.09315523695815782,
 'LINEARITY_OF_FORWARD_PROGRESSION': 0.38549094669096795,
 'MEAN_DIRECTIONAL_CHANGE_RATE': 0.23933007611391033,
 'lineage_ID': 1,
 'location': (11.299502880687372, 20.138498761300276, 0.0),
 'FilteredTrack': T

In [24]:
lin1.graph["NUMBER_SPOTS"]  # number of cells in the lineage, from TrackMate

189

Aside from accessing data stored in lineages, Pycellin allows you to retrieve noteworthy lineage elements.

In our lineage plotted above, we can easily identify the cells that are dividing with `get_divisions()`:

In [25]:
print(lin1.get_divisions())

[9249, 9267, 9269, 9276, 9291, 9304, 9314, 9316, 9331, 9348, 9359, 9367, 9383, 9433, 9447, 9501, 8992, 9022, 9045, 9057, 9098, 9105, 9109, 9157, 9158, 9165, 9183, 9202]


We can similarly identify the first cell of the lineage (root of the graph) and the last cells (leaves of the graph):

In [26]:
print(f"First cell: {lin1.get_root()}")
print(f"Last cells: {lin1.get_leaves()}")

First cell: 8985
Last cells: [9386, 9387, 9391, 9393, 9396, 9399, 9400, 9401, 9403, 9404, 9411, 9421, 9422, 9423, 9425, 9426, 9427, 9442, 9452, 9454, 9455, 9458, 9459, 9466, 9468, 9472, 9474, 9478, 9503]


We can list all the ancestor cells of a specific cell, in chronological order (from the root of the lineage to the cell of interest):

In [27]:
cell_ID = 9197
print(lin1.get_ancestors(cell_ID))

[8985, 9015, 8992, 8991, 9000, 9011, 9021, 9130, 9045, 9030, 9066, 9054, 9105, 9092, 9139, 9171, 9278]


Or the descendants, unordered:

In [28]:
print(lin1.get_descendants(cell_ID))

[9504, 9314, 9411, 9484, 9422, 9359, 9231, 9490, 9370, 9468]


We can also identify sister cells, i.e. cells that are on the same frame and share the same parent cell:

In [29]:
cell_ID = 8991
lin1.get_sister_cells(cell_ID)

[8997]

Or all the cells in a cell cycle, from the cell just after division to the next division, in chronological order:

In [30]:
lin1.get_cell_cycle(cell_ID)

[8991, 9000, 9011, 9021, 9130, 9045]

Similarly, we can list all cell cycles in a lineage with `get_cell_cycles()`:

In [31]:
lin1.get_cell_cycles()

[[9079, 9117, 9153, 9249],
 [9100, 9150, 9164, 9267],
 [9152, 9166, 9269],
 [9245, 9276],
 [9260, 9179, 9217, 9291],
 [9280, 9189, 9257, 9304],
 [9092, 9139, 9171, 9278, 9197, 9231, 9314],
 [9273, 9198, 9232, 9316],
 [9253, 9177, 9237, 9288, 9331],
 [9186, 9251, 9301, 9348],
 [9359],
 [9204, 9236, 9311, 9367],
 [9261, 9180, 9239, 9295, 9383],
 [9205, 9238, 9285, 9323, 9433],
 [9248, 9300, 9337, 9447],
 [9241, 9298, 9346, 9501],
 [8997, 9002, 9009, 9022],
 [8991, 9000, 9011, 9021, 9130, 9045],
 [9132, 9046, 9034, 9069, 9057],
 [9033, 9070, 9058, 9110, 9098],
 [9030, 9066, 9054, 9105],
 [9128, 9044, 9032, 9068, 9056, 9109],
 [9094, 9141, 9157],
 [9097, 9085, 9125, 9158],
 [9154, 9165],
 [9107, 9087, 9129, 9161, 9264, 9183],
 [9202]]

Finally, we can find if a cell is a root, a leaf or a division with `is_root()`, `is_leaf()` or `is_division()` respectively:

In [32]:
cell_ID = 9105
print(f"For cell {cell_ID}:")
print(f"  is root? {lin1.is_root(cell_ID)}")
print(f"  is leaf? {lin1.is_leaf(cell_ID)}")
print(f"  is division? {lin1.is_division(cell_ID)}")
print()

cell_ID = 9401
print(f"For cell {cell_ID}:")
print(f"  is root? {lin1.is_root(cell_ID)}")
print(f"  is leaf? {lin1.is_leaf(cell_ID)}")
print(f"  is division? {lin1.is_division(cell_ID)}")

For cell 9105:
  is root? False
  is leaf? False
  is division? True

For cell 9401:
  is root? False
  is leaf? True
  is division? False


#### Cell cycle lineages

Cell cycle lineages, or cycle lineages for short, are like cell lineages except that a node models a whole cell cycle instead of a cell at a specific time point.

When creating a model or loading data for the first time, there is no cycle lineage data. You don't always needs it: it depends on the goals of your exploration / analysis.  
If you need it, you can compute and add it with `add_cycle_data()`:


In [33]:
tm_model.add_cycle_data()

Similar to cell lineages, all cycle lineages are stored in `cycle_data`, a dictionnary where keys are the IDs of the lineages, and values the lineages themselves (instances of the CycleLineage class):

In [34]:
# The dict of all lineages.
tm_model.data.cycle_data

{0: <pycellin.classes.lineage.CycleLineage at 0x7f8ea856f710>,
 1: <pycellin.classes.lineage.CycleLineage at 0x7f8ea857cc90>,
 2: <pycellin.classes.lineage.CycleLineage at 0x7f8ea8583350>}

In [35]:
# The cycle lineage whose lineage_ID is 1.
clin1 = tm_model.data.cycle_data[1]
print(clin1)

CycleLineage of ID 1 with 57 cell cycles and 56 links.


Cell lineages and cycle lineages share their `lineage_ID`. Thanks to that, it is easier to work with both cell and cycle lineages in parallel. A cell lineage and a cycle lineage with the same ID are just 2 different views of the same underlying lineage.

In [36]:
lin_ID = 0
lin0 = tm_model.data.cell_data[lin_ID]
clin0 = tm_model.data.cycle_data[lin_ID]
print(
    f"Lineage of ID {lin_ID} is made of {len(lin0)} cells and has {len(clin0)} cell cycles."
)

Lineage of ID 0 is made of 152 cells and has 37 cell cycles.


In [37]:
# TODO: add a method to get the lineage by its lineage_ID

Cycle lineages are view only: they cannot be modified. Cycle data must stay mapped onto cell data to ensure data consistency.

##### Plotting a cycle lineage

Cycle lineages can be plotted like cell lineages. However, while time still flows from top to bottom in the plot, it doesn't flow homogeneously between the different branches since cell cycles do not have the same length. Consequently, the y-axis represents the level of the cell cycle (i.e. how many cell cycles before the current cycle) instead of time.

In [38]:
clin1.plot()

In [39]:
lin_ID = lin1.graph["lineage_ID"]
clin1.plot(
    title=f"Cycle lineage of ID = {lin_ID}",  # title of the plot
    width=1500,  # width of the plot
    height=600,  # height of the plot
    node_text="cycle_ID",  # feature to use as text for the nodes
    node_text_font={"color": "white"},  # style of the nodes text
    node_marker_style={"size": 35},  # style of the nodes markers
    node_colormap_feature="cycle_length",  # feature to use to color the nodes
    node_color_scale="plotly3",  # color scale for the node colors
    node_hover_features=["cycle_ID", "cycle_length"],  # features to display
)

##### Accessing data in a cycle lineage

In cycle lineages, cell cycles (nodes) are identified by their `cycle_ID`. The `cycle_ID` is the `cell_ID` of the last cell of the cell cycle, i.e. the division cell.

Otherwise, cycle lineages can be manipulated like cell lineages.

In [40]:
cycle_ID = 9022

# All features stored in the cell cycle (node).
print(clin1.nodes[cycle_ID])
# Accessing a specific feature value.
print(clin1.nodes[cycle_ID]["cells"])

{'lineage_ID': 1, 'cycle_ID': 9022, 'cells': [8997, 9002, 9009, 9022], 'cycle_length': 4, 'level': 1}
[8997, 9002, 9009, 9022]


For now, there is no features stored in the cycle lineages links and only one feature, the `lineage_ID`, in the lineage itself. But accessing the links and lineages features dictionnary is still possible:

In [41]:
# Features for links between cell cycles (edges).
print(clin1.edges[cycle_ID, 9057])

# Features for the cell cycle lineage (graph).
print(clin1.graph)
print(clin1.graph["lineage_ID"])

{}
{'lineage_ID': 1}
1


Like for cell lineages, noteworthy cycle lineage elements can be retrieved. Some methods, like `get_divisions()` or `get_sister_cells()` do not make sense on a cycle lineage and are not defined. Others are used in the same way on cell or cycle lineages.

In [42]:
cycle_ID = 9158

print(f"First cycle: {clin1.get_root()}")
print(f"Last cycles: {clin1.get_leaves()}")

print(f"Ancestor cycles of cycle {cycle_ID}: {clin1.get_ancestors(cycle_ID)}")
print(
    f"Descendant cycles of cycle {cycle_ID}: {lin1.get_descendants(cycle_ID)}")

print(f"Cycle is root? {clin1.is_root(cycle_ID)}")
print(f"Cycle is leaf? {clin1.is_leaf(cycle_ID)}")

First cycle: 8992
Last cycles: [9386, 9387, 9391, 9393, 9396, 9399, 9400, 9401, 9403, 9404, 9411, 9421, 9422, 9423, 9425, 9426, 9427, 9442, 9452, 9454, 9455, 9458, 9459, 9466, 9468, 9472, 9474, 9478, 9503]
Ancestor cycles of cycle 9158: [8992, 9022, 9057]
Descendant cycles of cycle 9158: [9443, 9253, 9383, 9288, 9386, 9452, 9261, 9454, 9295, 9391, 9331, 9237, 9430, 9239, 9496, 9177, 9499, 9180]
Cycle is root? False
Cycle is leaf? False


### Declaration of Features

The declaration of features is a Python object, an instance of the class `FeaturesDeclaration`. It holds all the information about the features that have been or will be computed on your data: name, description, units... 

In [43]:
# TODO: to finish when the features dict will be updated

Feature names must be unique: two features cannot share the same name. However, the same feature can apply to different lineage elements. For example, the `lineage_ID` feature is a lineage feature, but also a node feature to ease processing.

## Modification of the lineages


*add node, edge, lineage...*  
*remove...*

## Managing features

*Just the basics here, ref the features notebooks*  
[Managing features](./Managing%20features.ipynb)  
[Advanced - Custom features](Advanced%20-%20Custom%20features.ipynb)

Like previous section, we are using the TrackMate model as example.

In [44]:
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


## Export

### Pickled Pycellin model

You can save a model on disk anytime with the `save_to_pickle()` method:

In [45]:
my_model.save_to_pickle("../sample_data/FakeTracks_saved.pickle")

A Pycellin model is a complex Python object that can be serialized. Pickling a model is a lossless way to save the model on disk for later use.

However, as stated in [Loading from a Pycellin pickle file](#Loading-from-a-Pycellin-pickle-file), `pickle` module is not secure. Malicious code could be executed when unpickling a file from an unknown source. Because of this safety issue, pickle is not the preferred format for sharing a model with the community.

Indeed, the intended use of `save_to_pickle()` and `load_from_pickle()` is to allow you to save your model whenever you want and to be able to resume working on it at a later time or in another Python session. 

### Tables: DataFrames and CSVs

Lineage data from a Pycellin model can be exported into [Pandas](https://pandas.pydata.org/) [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) and saved as comma-separated values (CSV) files.  

Each type of lineage elements generates a different kind of table, as reviewed below.

#### Cells table

A cell table contains all the features stored in the **cells (nodes) of the cell lineages** of the model. There is one row per cell and one column per cell feature. The DataFrame is ordered by increasing `lineage_ID` first, then by `frame`, then by `cell_ID`.

In [46]:
cell_df = tm_model.to_cell_dataframe()
print(cell_df.shape)
print(cell_df.columns)

(526, 38)
Index(['lineage_ID', 'frame', 'cell_ID', 'name', 'STD_INTENSITY_CH1',
       'SOLIDITY', 'STD_INTENSITY_CH2', 'QUALITY', 'POSITION_T',
       'TOTAL_INTENSITY_CH2', 'TOTAL_INTENSITY_CH1', 'CONTRAST_CH1',
       'ELLIPSE_MINOR', 'ELLIPSE_THETA', 'ELLIPSE_Y0', 'CIRCULARITY', 'AREA',
       'ELLIPSE_MAJOR', 'CONTRAST_CH2', 'MEAN_INTENSITY_CH1',
       'MAX_INTENSITY_CH2', 'MEAN_INTENSITY_CH2', 'MAX_INTENSITY_CH1',
       'MIN_INTENSITY_CH2', 'MIN_INTENSITY_CH1', 'SNR_CH1', 'ELLIPSE_X0',
       'SHAPE_INDEX', 'SNR_CH2', 'MEDIAN_INTENSITY_CH1', 'VISIBILITY',
       'RADIUS', 'MEDIAN_INTENSITY_CH2', 'ELLIPSE_ASPECTRATIO', 'PERIMETER',
       'ROI_N_POINTS', 'ROI_coords', 'location'],
      dtype='object')


In [47]:
cell_df.head()

Unnamed: 0,lineage_ID,frame,cell_ID,name,STD_INTENSITY_CH1,SOLIDITY,STD_INTENSITY_CH2,QUALITY,POSITION_T,TOTAL_INTENSITY_CH2,...,SNR_CH2,MEDIAN_INTENSITY_CH1,VISIBILITY,RADIUS,MEDIAN_INTENSITY_CH2,ELLIPSE_ASPECTRATIO,PERIMETER,ROI_N_POINTS,ROI_coords,location
0,0,0,8993,ID8993,6.578459,0.854065,5.656838,601.0,0.0,70098.0,...,-4728.824091,44.0,1,0.921242,114.0,6.70309,10.285699,52,"[(-1.5894197951076556, 1.1551815744407925), (-...","(15.389185653591088, 19.363324702015483, 0.0)"
1,0,1,9013,ID9013,2.968181,0.916279,5.726043,293.0,5.0,34098.0,...,-4761.822554,15.0,1,0.638839,115.0,4.279183,5.944787,30,"[(-0.727319272545234, 0.8544524671426856), (-0...","(16.83253527445072, 18.214913719162585, 0.0)"
2,0,1,9014,ID9014,0.0,0.881818,5.309714,291.0,5.0,33523.0,...,-5171.331354,15.0,1,0.633956,115.0,3.903724,5.74794,34,"[(-0.6141453589387584, 0.5208483829178867), (-...","(14.282171209226647, 19.470697860756204, 0.0)"
3,0,2,8986,ID8986,0.0,0.915761,5.841083,338.0,10.0,39655.0,...,-4695.084469,3.0,1,0.682225,117.0,4.359176,6.355476,31,"[(-0.6357009198638028, 0.43737422107603763), (...","(13.908506745565052, 19.29069200620696, 0.0)"
4,0,2,8989,ID8989,0.238197,0.842262,5.303366,288.0,10.0,32526.0,...,-5232.460649,4.0,1,0.625181,115.0,5.28734,6.341672,35,"[(-0.801263212391051, 0.8421507178785816), (-0...","(16.774739206100993, 18.029605456133368, 0.0)"


You can then process the DataFrame however you want and save it as a CSV file.

In [48]:
filename = tm_model.metadata["name"] + "_cell_df.csv"
cell_df.to_csv("../sample_data/" + filename, index=False)

#### Links table

A link table contains all the features stored in the **links (edges) of the cell lineages** of the model. There is one row per link and one column per link feature. The DataFrame is ordered by increasing `lineage_ID`.

In [49]:
link_df = tm_model.to_link_dataframe()
print(link_df.shape)
print(link_df.columns)

(523, 11)
Index(['lineage_ID', 'source_cell_ID', 'target_cell_ID', 'SPEED', 'EDGE_TIME',
       'LINK_COST', 'DISPLACEMENT', 'SPOT_SOURCE_ID', 'location',
       'DIRECTIONAL_CHANGE_RATE', 'SPOT_TARGET_ID'],
      dtype='object')


In [50]:
link_df.head()

Unnamed: 0,lineage_ID,source_cell_ID,target_cell_ID,SPEED,EDGE_TIME,LINK_COST,DISPLACEMENT,SPOT_SOURCE_ID,location,DIRECTIONAL_CHANGE_RATE,SPOT_TARGET_ID
0,0,9216,9290,0.087533,92.5,0.3979,0.437667,9216,"(15.77647350663074, 18.700692441696226, 0.0)",0.5568,9290
1,0,9062,9051,0.065217,52.5,0.556544,0.326083,9062,"(16.867171098415724, 17.065684749491684, 0.0)",0.302041,9051
2,0,9064,9052,0.062491,52.5,0.526377,0.312457,9064,"(16.8868824795124, 18.09478336899743, 0.0)",0.387028,9052
3,0,9065,9053,0.059114,52.5,0.574426,0.295568,9065,"(13.29424444460342, 17.840100823207138, 0.0)",0.102747,9053
4,0,9067,9055,0.119222,52.5,0.460385,0.596109,9067,"(12.35043289411444, 19.726640016926822, 0.0)",0.049335,9055


In [51]:
filename = tm_model.metadata["name"] + "_link_df.csv"
link_df.to_csv("../sample_data/" + filename, index=False)

#### Cell cycles table

A cell cycle table contains all the features stored in the **cell cycles (nodes) of the cycle lineages** of the model. There is one row per cell cycle and one column per cell cycle feature. The DataFrame is ordered by increasing `lineage_ID` first, then by `level`, then by `cycle_ID`.

In [52]:
cycle_df = tm_model.to_cycle_dataframe()
print(cycle_df.shape)
print(cycle_df.columns)

(145, 5)
Index(['lineage_ID', 'level', 'cycle_ID', 'cells', 'cycle_length'], dtype='object')


In [53]:
cycle_df.head()

Unnamed: 0,lineage_ID,level,cycle_ID,cells,cycle_length
0,0,0,8993,[8993],1
1,0,1,9019,"[9013, 8989, 8996, 9001, 9010, 9019]",6
2,0,1,9020,"[9014, 8986, 8988, 8999, 9008, 9020]",6
3,0,2,9051,"[9121, 9040, 9027, 9062, 9051]",5
4,0,2,9052,"[9124, 9042, 9029, 9064, 9052]",5


In [54]:
filename = tm_model.metadata["name"] + "_cycle_df.csv"
cycle_df.to_csv("../sample_data/" + filename, index=False)

#### Lineages table

A lineage table contains all the features stored in the **cell lineages (graphs)** of the model. There is one row per lineage and one column per lineage feature. The DataFrame is ordered by increasing `lineage_ID`.

In [55]:
lin_df = tm_model.to_lineage_dataframe()
print(lin_df.shape)
print(lin_df.columns)

(3, 29)
Index(['lineage_ID', 'name', 'TRACK_INDEX', 'DIVISION_TIME_MEAN',
       'DIVISION_TIME_STD', 'NUMBER_SPOTS', 'NUMBER_GAPS', 'NUMBER_SPLITS',
       'NUMBER_MERGES', 'NUMBER_COMPLEX', 'LONGEST_GAP', 'TRACK_DURATION',
       'TRACK_START', 'TRACK_STOP', 'TRACK_DISPLACEMENT', 'TRACK_MEAN_SPEED',
       'TRACK_MAX_SPEED', 'TRACK_MIN_SPEED', 'TRACK_MEDIAN_SPEED',
       'TRACK_STD_SPEED', 'TRACK_MEAN_QUALITY', 'TOTAL_DISTANCE_TRAVELED',
       'MAX_DISTANCE_TRAVELED', 'CONFINEMENT_RATIO',
       'MEAN_STRAIGHT_LINE_SPEED', 'LINEARITY_OF_FORWARD_PROGRESSION',
       'MEAN_DIRECTIONAL_CHANGE_RATE', 'location', 'FilteredTrack'],
      dtype='object')


In [56]:
lin_df.head()

Unnamed: 0,lineage_ID,name,TRACK_INDEX,DIVISION_TIME_MEAN,DIVISION_TIME_STD,NUMBER_SPOTS,NUMBER_GAPS,NUMBER_SPLITS,NUMBER_MERGES,NUMBER_COMPLEX,...,TRACK_STD_SPEED,TRACK_MEAN_QUALITY,TOTAL_DISTANCE_TRAVELED,MAX_DISTANCE_TRAVELED,CONFINEMENT_RATIO,MEAN_STRAIGHT_LINE_SPEED,LINEARITY_OF_FORWARD_PROGRESSION,MEAN_DIRECTIONAL_CHANGE_RATE,location,FilteredTrack
0,0,Track_0,0,20.9375,5.543389,152,0,18,0,0,...,0.200327,767.657895,176.90183,21.091779,0.02115,0.034013,0.145166,0.22293,"(16.866163267759074, 17.21004009862445, 0.0)",True
1,1,Track_1,1,15.37037,7.195876,189,0,28,0,0,...,0.220429,817.820106,227.154291,24.429157,0.045111,0.093155,0.385491,0.23933,"(11.299502880687372, 20.138498761300276, 0.0)",True
2,2,Track_2,2,20.652174,9.083495,185,0,25,0,0,...,0.240662,760.610811,190.932312,16.092759,0.048998,0.085048,0.409802,0.242049,"(32.12555169642885, 36.83932678084876, 0.0)",True


In [57]:
filename = tm_model.metadata["name"] + "_lin_df.csv"
lin_df.to_csv("../sample_data/" + filename, index=False)

#### Exporting a subset of lineages

For each of the 4 methods above, `to_xxx_dataframe()`, all lineages in the model are processed. If you are interested in just a subset of lineages, you can choose the lineages by passing a list of the `lineage_ID`:

In [58]:
# Cell dataframe of lineage of ID = 0.
lin0_cell_df = tm_model.to_cell_dataframe([0])
print(lin0_cell_df.shape)
lin0_cell_df.head()

(152, 38)


Unnamed: 0,lineage_ID,frame,cell_ID,name,STD_INTENSITY_CH1,SOLIDITY,STD_INTENSITY_CH2,QUALITY,POSITION_T,TOTAL_INTENSITY_CH2,...,SNR_CH2,MEDIAN_INTENSITY_CH1,VISIBILITY,RADIUS,MEDIAN_INTENSITY_CH2,ELLIPSE_ASPECTRATIO,PERIMETER,ROI_N_POINTS,ROI_coords,location
0,0,0,8993,ID8993,6.578459,0.854065,5.656838,601.0,0.0,70098.0,...,-4728.824091,44.0,1,0.921242,114.0,6.70309,10.285699,52,"[(-1.5894197951076556, 1.1551815744407925), (-...","(15.389185653591088, 19.363324702015483, 0.0)"
1,0,1,9013,ID9013,2.968181,0.916279,5.726043,293.0,5.0,34098.0,...,-4761.822554,15.0,1,0.638839,115.0,4.279183,5.944787,30,"[(-0.727319272545234, 0.8544524671426856), (-0...","(16.83253527445072, 18.214913719162585, 0.0)"
2,0,1,9014,ID9014,0.0,0.881818,5.309714,291.0,5.0,33523.0,...,-5171.331354,15.0,1,0.633956,115.0,3.903724,5.74794,34,"[(-0.6141453589387584, 0.5208483829178867), (-...","(14.282171209226647, 19.470697860756204, 0.0)"
3,0,2,8986,ID8986,0.0,0.915761,5.841083,338.0,10.0,39655.0,...,-4695.084469,3.0,1,0.682225,117.0,4.359176,6.355476,31,"[(-0.6357009198638028, 0.43737422107603763), (...","(13.908506745565052, 19.29069200620696, 0.0)"
4,0,2,8989,ID8989,0.238197,0.842262,5.303366,288.0,10.0,32526.0,...,-5232.460649,4.0,1,0.625181,115.0,5.28734,6.341672,35,"[(-0.801263212391051, 0.8421507178785816), (-0...","(16.774739206100993, 18.029605456133368, 0.0)"


In [59]:
# Lineage dataframe of lineages of ID 0 and 2.
subset_lin_df = tm_model.to_lineage_dataframe([0, 2])
print(subset_lin_df.shape)
subset_lin_df.head()

(2, 29)


Unnamed: 0,lineage_ID,name,TRACK_INDEX,DIVISION_TIME_MEAN,DIVISION_TIME_STD,NUMBER_SPOTS,NUMBER_GAPS,NUMBER_SPLITS,NUMBER_MERGES,NUMBER_COMPLEX,...,TRACK_STD_SPEED,TRACK_MEAN_QUALITY,TOTAL_DISTANCE_TRAVELED,MAX_DISTANCE_TRAVELED,CONFINEMENT_RATIO,MEAN_STRAIGHT_LINE_SPEED,LINEARITY_OF_FORWARD_PROGRESSION,MEAN_DIRECTIONAL_CHANGE_RATE,location,FilteredTrack
0,0,Track_0,0,20.9375,5.543389,152,0,18,0,0,...,0.200327,767.657895,176.90183,21.091779,0.02115,0.034013,0.145166,0.22293,"(16.866163267759074, 17.21004009862445, 0.0)",True
1,2,Track_2,2,20.652174,9.083495,185,0,25,0,0,...,0.240662,760.610811,190.932312,16.092759,0.048998,0.085048,0.409802,0.242049,"(32.12555169642885, 36.83932678084876, 0.0)",True


### Tracking formats

Pycellin can export data to different external tracking file formats. It currently supports:
- Cell Tracking Challenge text files
- TrackMate XML files

More tracking file formats will be supported in the future.

#### Cell Tracking Challenge

Simply use the `export_CTC_file()` function to export a model to a CTC compatible text file:

In [None]:
filename = tm_model.metadata["name"] + "_CTC_export.txt"
ctc_out = "../sample_data/" + filename
pc.export_CTC_file(tm_model, ctc_out)

The CTC file format does not support features: only the topology of the lineages is exported. Moreover, `lineage_ID` and `cell_ID` are not carried over.

See the [CTC file format specifications](https://public.celltrackingchallenge.net/documents/Naming%20and%20file%20content%20conventions.pdf) if you want more information on the format.

#### TrackMate

A Pycellin model is exported as a TrackMate XML file with the function `export_TrackMate_XML()`. However, TrackMate requires features to be given in a unique temporal and a unique spatial unit while Pycellin features can each have different units. So you need to specify the space and time units for TrackMate and to ensure that your units are consistent across the Pycellin features of the model.

In [61]:
filename = tm_model.metadata["name"] + "_TrackMate_export.xml"
xml_out = "../sample_data/" + filename
pc.export_TrackMate_XML(
    tm_model, xml_out, {"spatialunits": "pixel", "temporalunits": "sec"}
)



Features added to the model with Pycellin will be carried over to TrackMate if the data type of the feature is integer, float or boolean. `lineage_ID` and `cell_ID` are converted into their TrackMate equivalent: `TRACK_ID` and `SPOT_ID` respectively.

For a more detailed explanation, please refer to the dedicated notebook: [Pycellin with TrackMate](./Pycellin%20with%20TrackMate.ipynb).