# Getting started

In this notebook we cover the basics of Pycellin:
- how to build or load a Pycellin model,
- how Pycellin models cell tracking data,
- how to manipulate, modify and enrich the cell lineages,
- how to export the data to other formats.

In [1]:
import pycellin

## How to get a Pycellin model

### Building from scratch

It is possible to build a Pycellin model manually, starting with an empty model.

In [2]:
my_model = pycellin.Model()
print(my_model)

Model with 0 lineage.


You can then fill up the metadata and build cell lineages by adding cells and links. This is covered in details in the notebook [Creating a model from scratch](./Creating%20a%20model%20from%20scratch.ipynb) (WIP).

### Loading from a Pycellin pickle file

You can load a model from a Pycellin pickle file saved on disk with the `load_from_pickle()` method:

In [3]:
pycellin_pickle = "../sample_data/FakeTracks.pickle"
my_model = pycellin.Model.load_from_pickle(pycellin_pickle)
print(my_model)

Model named 'FakeTracks' with 2 lineages, built from TrackMate.


However, please note that while `pickle` is a module of the Python Standard Library, is it not secure. **You should only load data from sources you trust.** Please refer to the [documentation of the `pickle` module](https://docs.python.org/3/library/pickle.html) for more information.

### Loading from external tools

Pycellin can load data from different tracking file formats. It currently supports:
- Cell Tracking Challenge text files
- TrackMate XML files

More tracking file formats will be supported in the future.

#### Cell Tracking Challenge

Tracking data formatted as per the [Cell Tracking Challenge](https://celltrackingchallenge.net/) file format [specifications](https://public.celltrackingchallenge.net/documents/Naming%20and%20file%20content%20conventions.pdf) can be loaded with
the `load_CTC_file()` function:

In [4]:
ctc_file = "../sample_data/FakeTracks_TMtoCTC.txt"
ctc_model = pycellin.load_CTC_file(ctc_file)
print(ctc_model)

Model named 'FakeTracks_TMtoCTC' with 2 lineages, built from CTC.


For this format, only track topology is read: no cell segmentation is extracted in the case of associated label images (this might get supported later if several people request it).

#### TrackMate



Data generated with [TrackMate](https://imagej.net/plugins/trackmate/) can be loaded into a Pycellin model thanks to the `load_TrackMate_XML()` function:

In [5]:
trackmate_xml = "../sample_data/Ecoli_growth_on_agar_pad.xml"
tm_model = pycellin.load_TrackMate_XML(trackmate_xml)
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


All data within the TrackMate XML file is loaded into the Pycellin model: there is no loss of information. Most notably, TrackMate properties are accessible within Pycellin under the same name (e.g. AREA, MEAN_INTENSITY_CH1).

To know more about the specifics of using Pycellin with TrackMate data, please refer to the dedicated notebook: [Working with TrackMate data](./Working%20with%20TrackMate%20data.ipynb).

## Pycellin model structure

In [6]:
# TODO: this section is really really long and I don't think a beginner
# need all of this information. I should synthetize it and move the long
# version into a dedicated notebook.

In this section, we are using the TrackMate model previously loaded as example.

In [7]:
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


A Pycellin model consists of 3 different elements that we describe in separate sections below:
- the metadata of the model
- the data, i.e. the lineages
- the declaration of the properties present in the lineages

*simplified scheme of a model*

### Model metadata

A pycellin model can stored model metadata, i.e. information about the model and the data of the model.

In Pycellin, the metadata is stored as a dictionary. It is accessible by calling `model_metadata` on your model:

In [8]:
for metadata_field in tm_model.model_metadata:
    print(metadata_field)

name
file_location
provenance
date
space_unit
time_unit
pycellin_version
TrackMate_version
time_step
pixel_size
Log
Settings
GUIState
DisplaySettings


In [9]:
print(tm_model.model_metadata["provenance"], tm_model.model_metadata["TrackMate_version"])

TrackMate 7.10.2


Some metadata fields are common to most Pycellin models, like `provenance`, `date` or `space_unit`. Others are specific to the way the model was built. For example, models coming from TrackMate have a `TrackMate_version` field and more (`Log`, `GUIState`...).

Since metadata is stored as a dictionary, it is versatile: you can store whatever information you find relevant. The more information you store, the better it is for traceability. For example, you could describe the different channels of your timelapse, or list some image acquisition parameters:

In [10]:
tm_model.model_metadata["channel1"] = "segmentation"
tm_model.model_metadata["channel2"] = "ZipA-mCherry"
tm_model.model_metadata["objective"] = "100x oil"

However, be careful not to delete the `space_unit`, `time_unit`, `time_step` and `pixel_size` fields when they exist since they can be needed when computing properties like `division_time`, `cell_displacement`...

You can access these important fields with the following methods:

In [11]:
pix_size = tm_model.get_pixel_size()
s_unit = tm_model.get_space_unit()
print(f"pixel width = {pix_size['width']} {s_unit}")
print(f"pixel height = {pix_size['height']} {s_unit}")
print(f"pixel depth = {pix_size['depth']} {s_unit}")

timestep = tm_model.get_time_step()
t_unit = tm_model.get_time_unit()
print(f"frame = {timestep} {t_unit}")

pixel width = 0.06587 micrometer
pixel height = 0.06587 micrometer
pixel depth = 0.06587 micrometer
frame = 5.0 minute


### Data

In Pycellin, lineages are modeled as directed acyclic graphs or DAGs for short (same as your family tree if we ignore spouses). It means that division events are allowed; they even are recommended if you want to take full advantage of Pycellin. **However, FUSION EVENTS ARE NOT SUPPORTED.** Fusion events happen when a cell has more than one parent. If you try to use Pycellin on a lineage with fusion events, it may crash or produce incorrect results, especially if you are computing properties related to tracking.

Below is an example of a dataset with fusions. If Pycellin detects fusions when loading data from an external tool, a warning will be raised. It is then up to you to decide if you want to proceed or to correct the fusions. However it is highly recommended to correct them.

In [12]:
trackmate_xml_fusions = "../sample_data/Ecoli_growth_on_agar_pad_with_fusions.xml"
fusion_model = pycellin.load_TrackMate_XML(trackmate_xml_fusions)



You can manually check if a model contains fusions and what are the cells involved with the `get_fusions()` method:

In [13]:
fusion_model.get_fusions()

[Cell(cell_ID=9065, lineage_ID=0),
 Cell(cell_ID=9232, lineage_ID=1),
 Cell(cell_ID=9257, lineage_ID=1)]

To correct the fusions, you can either go back to clean your data with your software of choice or remove one incoming link per fusion cell directly in Pycellin.

In [14]:
# TODO: a short explanation on cycle lineages is needed here!! And a nice transition.

Fusions or not, lineages data is stored in a `Data` object, accessible by calling `data` on the model:

In [15]:
print(tm_model.data)

Data object with 3 cell lineages.


Two types of lineages exist in Pycellin: cell lineages and cell cycle lineages. Both types of lineages are implemented in Pycellin as [NetworkX](https://networkx.org/) DiGraphs. Each lineage is a connected graph built of nodes linked by edges. You can refer to [NetworkX documentation](https://networkx.org/documentation/stable/tutorial.html) for specifics regarding implementation.

#### Cell lineages

In cell lineages, each node of the lineage graph models a cell at a specific point in time and space. Each edge models the displacement of a cell in time and space.

All cell lineages are stored in `cell_data`, a dictionary where keys are the IDs of the lineages, and values the lineages themselves (instances of the CellLineage class):

In [16]:
# The dict of all lineages.
tm_model.data.cell_data

{0: <pycellin.classes.lineage.CellLineage at 0x76a6f0803fd0>,
 1: <pycellin.classes.lineage.CellLineage at 0x76a6f0834950>,
 2: <pycellin.classes.lineage.CellLineage at 0x76a6f0850990>}

You can also get a list of the cell lineages with `get_cell_lineages()`:

In [17]:
cell_lins = tm_model.get_cell_lineages()
for lin in cell_lins:
    print(lin)

CellLineage of ID 0 with 152 cells and 151 links.
CellLineage of ID 1 with 189 cells and 188 links.
CellLineage of ID 2 with 185 cells and 184 links.


To access a specific cell lineage from its `lineage_ID`, you can either query the cell_data dictionary:

In [18]:
# The lineage whose lineage_ID is 1.
lin1 = tm_model.data.cell_data[1]
print(lin1)

CellLineage of ID 1 with 189 cells and 188 links.


or use the dedicated method:

In [19]:
# The lineage whose lineage_ID is 2.
lin2 = tm_model.get_cell_lineage_from_ID(2)
print(lin2)

CellLineage of ID 2 with 185 cells and 184 links.


IDs of lineages (`lineage_ID`) are always integer. Lineage IDs are often positive, but not necessarily. Pycellin convention is to attribute a negative ID to one-cell lineage: minus the ID of the cell itself. That way it is easier to distinguish between one-cell lineage and "true" lineage.

##### Plotting a cell lineage

Pycellin offers basic plotting of lineages thanks to the `plot()` method. Cell lineages are plotted like family trees, with time flowing from top to bottom.

Plotting relies on [Plotly](https://plotly.com/python/), an interactive graphing library. You can zoom in and out or pan on the lineage and adjust the axes. Some data is displayed when hovering on cells. Clicking on the legend elements on the right hide or display them.

In [20]:
lin1.plot()

It is possible to customize and enrich the plot with more information.  
Below is an example where cells are colored according to their area and where cell area is displayed when the cursor is hovering on cells. In this example, area data was extracted from TrackMate (area is called AREA in TrackMate).

In [21]:
lin_ID = lin1.graph["lineage_ID"]
lin1.plot(
    title=f"Cell lineage of ID = {lin_ID}",  # title of the plot
    node_colormap_prop="AREA",  # property to use to color the nodes
    node_color_scale="plotly3",  # color scale for the node colors
    node_hover_props=["cell_ID", "AREA"],  # which properties to display
)

See Pycellin `plot()` documentation for more information on the available customization parameters.

##### Accessing data in a cell lineage

Just as lineages are identified by their `lineage_ID`, cells are identified by their `cell_ID`, a positive integer. Links between cells (edges) are identified by a tuple of their incoming cell ID and their outgoing cell ID, in this order: `(source_cell_ID, target_cell_ID)`.

Data can be stored at the cell level, at the link level or at the lineage level depending on the information you want to store. Since lineages are implemented as NetworkX DAGs, you can access any properties (called attributes in NetworkX) like in any other NetworkX graph (cf [NetworkX tutorial](https://networkx.org/documentation/stable/tutorial.html)).

For example, if we take the first cell of the lineage above, we can list all the properties associated to the cell (node) as well as their values:

In [22]:
cell_ID = 8985
lin1.nodes[8985]

{'STD_INTENSITY_CH1': 0.7303613935746522,
 'SOLIDITY': 0.8801470588235286,
 'STD_INTENSITY_CH2': 5.07251276088164,
 'QUALITY': 605.0,
 'POSITION_T': 0.0,
 'TOTAL_INTENSITY_CH2': 67609.0,
 'TOTAL_INTENSITY_CH1': 2920.0,
 'CONTRAST_CH1': -0.9910059194034286,
 'ELLIPSE_MINOR': 0.33679824175103523,
 'ELLIPSE_THETA': -0.38208393640136984,
 'ELLIPSE_Y0': -0.016039040053655834,
 'CIRCULARITY': 0.3340965639336479,
 'AREA': 2.596806177744612,
 'ELLIPSE_MAJOR': 2.448401561458347,
 'CONTRAST_CH2': -0.9914983350220509,
 'MEAN_INTENSITY_CH1': 4.891122278056951,
 'MAX_INTENSITY_CH2': 130.0,
 'MEAN_INTENSITY_CH2': 113.24790619765494,
 'MAX_INTENSITY_CH1': 5.0,
 'MIN_INTENSITY_CH2': 100.0,
 'MIN_INTENSITY_CH1': 0.0,
 'SNR_CH1': -1475.7751085884477,
 'ELLIPSE_X0': 0.03782613542727112,
 'SHAPE_INDEX': 6.132942987585787,
 'SNR_CH2': -5207.449068381545,
 'MEDIAN_INTENSITY_CH1': 5.0,
 'VISIBILITY': 1,
 'RADIUS': 0.9091694445367442,
 'MEDIAN_INTENSITY_CH2': 113.0,
 'ELLIPSE_ASPECTRATIO': 7.269638786500052,


To get a specific property value:

In [23]:
lin1.nodes[8985]["AREA"]

2.596806177744612

This is the same for links (edges):

In [24]:
lin1.edges[8985, 9015]  # link between cells 8985 and 9015

{'SPOT_SOURCE_ID': 8985,
 'SPOT_TARGET_ID': 9015,
 'LINK_COST': 0.5539550418894829,
 'DIRECTIONAL_CHANGE_RATE': nan,
 'SPEED': 0.05992454257732503,
 'DISPLACEMENT': 0.2996227128866252,
 'EDGE_TIME': 2.5,
 'link_x': 11.133820855399392,
 'link_y': 21.374861395496293,
 'link_z': 0.0}

In [25]:
lin1.edges[8985, 9015]["DISPLACEMENT"]  # property loaded from TrackMate

0.2996227128866252

And for lineages:

In [26]:
lin1.graph

{'TRACK_INDEX': 1,
 'DIVISION_TIME_MEAN': 15.37037037037037,
 'DIVISION_TIME_STD': 7.195875678513824,
 'NUMBER_SPOTS': 189,
 'NUMBER_GAPS': 0,
 'NUMBER_SPLITS': 28,
 'NUMBER_MERGES': 0,
 'NUMBER_COMPLEX': 0,
 'LONGEST_GAP': 0,
 'TRACK_DURATION': 110.0,
 'TRACK_START': 0.0,
 'TRACK_STOP': 110.0,
 'TRACK_DISPLACEMENT': 10.24707606539736,
 'TRACK_MEAN_SPEED': 0.24165350122434,
 'TRACK_MAX_SPEED': 1.5301701671007144,
 'TRACK_MIN_SPEED': 0.01049273749583099,
 'TRACK_MEDIAN_SPEED': 0.16348567401042208,
 'TRACK_STD_SPEED': 0.2204287856906754,
 'TRACK_MEAN_QUALITY': 817.8201058201058,
 'TOTAL_DISTANCE_TRAVELED': 227.15429115087966,
 'MAX_DISTANCE_TRAVELED': 24.429157080704854,
 'CONFINEMENT_RATIO': 0.04511064269787922,
 'MEAN_STRAIGHT_LINE_SPEED': 0.09315523695815782,
 'LINEARITY_OF_FORWARD_PROGRESSION': 0.38549094669096795,
 'MEAN_DIRECTIONAL_CHANGE_RATE': 0.23933007611391033,
 'lineage_ID': 1,
 'lineage_name': 'Track_1',
 'lineage_x': 11.299502880687372,
 'lineage_y': 20.138498761300276,
 'l

In [27]:
lin1.graph["NUMBER_SPOTS"]  # number of cells in the lineage, from TrackMate

189

Aside from accessing data stored in lineages, Pycellin allows you to retrieve noteworthy lineage elements.

In our lineage plotted above, we can easily identify the cells that are dividing with `get_divisions()`:

In [28]:
print(lin1.get_divisions())

[9249, 9267, 9269, 9276, 9291, 9304, 9314, 9316, 9331, 9348, 9359, 9367, 9383, 9433, 9447, 9501, 8992, 9022, 9045, 9057, 9098, 9105, 9109, 9157, 9158, 9165, 9183, 9202]


We can similarly identify the first cell of the lineage (root of the graph) and the last cells (leaves of the graph):

In [29]:
print(f"First cell: {lin1.get_root()}")
print(f"Last cells: {lin1.get_leaves()}")

First cell: 8985
Last cells: [9386, 9387, 9391, 9393, 9396, 9399, 9400, 9401, 9403, 9404, 9411, 9421, 9422, 9423, 9425, 9426, 9427, 9442, 9452, 9454, 9455, 9458, 9459, 9466, 9468, 9472, 9474, 9478, 9503]


We can list all the ancestor cells of a specific cell, in chronological order (from the root of the lineage to the cell of interest):

In [30]:
cell_ID = 9197
print(lin1.get_ancestors(cell_ID))

[8985, 9015, 8992, 8991, 9000, 9011, 9021, 9130, 9045, 9030, 9066, 9054, 9105, 9092, 9139, 9171, 9278]


Or the descendants, unordered:

In [31]:
print(lin1.get_descendants(cell_ID))

[9504, 9314, 9411, 9484, 9422, 9359, 9231, 9490, 9370, 9468]


We can also identify sister cells, i.e. cells that are on the same frame and share the same parent cell:

In [32]:
cell_ID = 8991
lin1.get_sister_cells(cell_ID)

[8997]

Or all the cells in a cell cycle, from the cell just after division to the next division, in chronological order:

In [33]:
lin1.get_cell_cycle(cell_ID)

[8991, 9000, 9011, 9021, 9130, 9045]

Similarly, we can list all cell cycles in a lineage with `get_cell_cycles()`:

In [34]:
lin1.get_cell_cycles()

[[9079, 9117, 9153, 9249],
 [9100, 9150, 9164, 9267],
 [9152, 9166, 9269],
 [9245, 9276],
 [9260, 9179, 9217, 9291],
 [9280, 9189, 9257, 9304],
 [9092, 9139, 9171, 9278, 9197, 9231, 9314],
 [9273, 9198, 9232, 9316],
 [9253, 9177, 9237, 9288, 9331],
 [9186, 9251, 9301, 9348],
 [9359],
 [9204, 9236, 9311, 9367],
 [9261, 9180, 9239, 9295, 9383],
 [9205, 9238, 9285, 9323, 9433],
 [9248, 9300, 9337, 9447],
 [9241, 9298, 9346, 9501],
 [8985, 9015, 8992],
 [8997, 9002, 9009, 9022],
 [8991, 9000, 9011, 9021, 9130, 9045],
 [9132, 9046, 9034, 9069, 9057],
 [9033, 9070, 9058, 9110, 9098],
 [9030, 9066, 9054, 9105],
 [9128, 9044, 9032, 9068, 9056, 9109],
 [9094, 9141, 9157],
 [9097, 9085, 9125, 9158],
 [9154, 9165],
 [9107, 9087, 9129, 9161, 9264, 9183],
 [9202],
 [9430, 9386],
 [9333, 9435, 9387],
 [9443, 9391],
 [9393],
 [9453, 9396],
 [9352, 9473, 9399],
 [9353, 9477, 9400],
 [9349, 9479, 9401],
 [9403],
 [9361, 9486, 9404],
 [9490, 9411],
 [9369, 9502, 9421],
 [9370, 9504, 9422],
 [9242, 9287,

Finally, we can find if a cell is a root, a leaf or a division with `is_root()`, `is_leaf()` or `is_division()` respectively:

In [35]:
cell_ID = 9105
print(f"For cell {cell_ID}:")
print(f"  is root? {lin1.is_root(cell_ID)}")
print(f"  is leaf? {lin1.is_leaf(cell_ID)}")
print(f"  is division? {lin1.is_division(cell_ID)}")
print()

cell_ID = 9401
print(f"For cell {cell_ID}:")
print(f"  is root? {lin1.is_root(cell_ID)}")
print(f"  is leaf? {lin1.is_leaf(cell_ID)}")
print(f"  is division? {lin1.is_division(cell_ID)}")

For cell 9105:
  is root? False
  is leaf? False
  is division? True

For cell 9401:
  is root? False
  is leaf? True
  is division? False


#### Cell cycle lineages

Cell cycle lineages, or cycle lineages for short, are like cell lineages except that a node models a whole cell cycle instead of a cell at a specific time point.

When creating a model or loading data for the first time, there is no cycle lineage data. You don't always need it: it depends on the goals of your exploration / analysis.  
If you need it, you can compute and add it with `add_cycle_data()`:


In [36]:
tm_model.add_cycle_data()

Similar to cell lineages, all cycle lineages are stored in `cycle_data`, a dictionary where keys are the IDs of the lineages, and values the lineages themselves (instances of the CycleLineage class):

In [37]:
# The dict of all cycle lineages.
tm_model.data.cycle_data

{0: <pycellin.classes.lineage.CycleLineage at 0x76a6eeea1ad0>,
 1: <pycellin.classes.lineage.CycleLineage at 0x76a6eee8bc10>,
 2: <pycellin.classes.lineage.CycleLineage at 0x76a6eef3b490>}

It is also possible to directly get a list of the cycle lineages:

In [38]:
cycle_lins = tm_model.get_cycle_lineages()
for lin in cycle_lins:
    print(lin)

CycleLineage of ID 0 with 37 cell cycles and 36 links.
CycleLineage of ID 1 with 57 cell cycles and 56 links.
CycleLineage of ID 2 with 51 cell cycles and 50 links.


To get a specific cycle lineage, identified by its `lineage_ID`:

In [39]:
# Via the dictionary of cycle lineages.
clin1 = tm_model.data.cycle_data[1]  # lineage_ID is 1
print(clin1)

# Via the method get_cycle_lineage_from_ID.
clin2 = tm_model.get_cycle_lineage_from_ID(2)  # lineage_ID is 2
print(clin2)

CycleLineage of ID 1 with 57 cell cycles and 56 links.
CycleLineage of ID 2 with 51 cell cycles and 50 links.


Cell lineages and cycle lineages share their `lineage_ID`. Thanks to that, it is easier to work with both cell and cycle lineages in parallel. A cell lineage and a cycle lineage with the same ID are just 2 different views of the same underlying lineage.

In [40]:
lin_ID = 0
lin0 = tm_model.data.cell_data[lin_ID]
clin0 = tm_model.data.cycle_data[lin_ID]
print(f"Lineage of ID {lin_ID} is made of {len(lin0)} cells and has {len(clin0)} cell cycles.")

Lineage of ID 0 is made of 152 cells and has 37 cell cycles.


Cycle lineages are view only: they cannot be modified. This is because cycle data must stay mapped onto cell data to ensure data consistency.

##### Plotting a cycle lineage

Cycle lineages can be plotted like cell lineages. However, while time still flows from top to bottom in the plot, it doesn't flow homogeneously between the different branches of the graph since cell cycles do not have the same length. Consequently, the y-axis represents the level of the cell cycle (i.e. how many cell cycles before the current cycle) instead of time.

In [41]:
clin1.plot()

In [42]:
lin_ID = lin1.graph["lineage_ID"]
clin1.plot(
    title=f"Cycle lineage of ID = {lin_ID}",  # title of the plot
    width=1500,  # width of the plot
    height=600,  # height of the plot
    node_text="cycle_ID",  # property to use as text for the nodes
    node_text_font={"color": "white"},  # style of the nodes text
    node_marker_style={"size": 35},  # style of the nodes markers
    node_colormap_prop="cycle_length",  # property to use to color the nodes
    node_color_scale="plotly3",  # color scale for the node colors
    node_hover_props=["cycle_ID", "cycle_length"],  # properties to display
)

##### Accessing data in a cycle lineage

In cycle lineages, cell cycles (nodes) are identified by their `cycle_ID`. The `cycle_ID` is the `cell_ID` of the last cell of the cell cycle, i.e. the division cell.

Otherwise, cycle lineages can be manipulated like cell lineages.

In [43]:
cycle_ID = 9022

# All properties stored in the cell cycle (node).
print(clin1.nodes[cycle_ID])
# Accessing a specific property value.
print(clin1.nodes[cycle_ID]["cells"])

{'cycle_ID': 9022, 'cells': [8997, 9002, 9009, 9022], 'cycle_length': 4, 'cycle_duration': 4, 'level': 1}
[8997, 9002, 9009, 9022]


For now, there is no properties stored in the cycle lineages links and only one property, the `lineage_ID`, in the lineage itself. But accessing the links and lineages properties dictionary is still possible:

In [44]:
# Properties for links between cell cycles (edges).
print(clin1.edges[cycle_ID, 9057])

# Properties for the cell cycle lineage (graph).
print(clin1.graph)
print(clin1.graph["lineage_ID"])

{}
{'lineage_ID': 1}
1


Like cell lineages, noteworthy cycle lineage elements can be retrieved. Some methods, like `get_divisions()` or `get_sister_cells()` do not make sense on a cycle lineage and are not defined. Others are used in the same way on cell or cycle lineages.

In [45]:
cycle_ID = 9158

print(f"First cycle: {clin1.get_root()}")
print(f"Last cycles: {clin1.get_leaves()}")

print(f"Ancestor cycles of cycle {cycle_ID}: {clin1.get_ancestors(cycle_ID)}")
print(f"Descendant cycles of cycle {cycle_ID}: {lin1.get_descendants(cycle_ID)}")

print(f"Cycle is root? {clin1.is_root(cycle_ID)}")
print(f"Cycle is leaf? {clin1.is_leaf(cycle_ID)}")

First cycle: 8992
Last cycles: [9386, 9387, 9391, 9393, 9396, 9399, 9400, 9401, 9403, 9404, 9411, 9421, 9422, 9423, 9425, 9426, 9427, 9442, 9452, 9454, 9455, 9458, 9459, 9466, 9468, 9472, 9474, 9478, 9503]
Ancestor cycles of cycle 9158: [8992, 9022, 9057]
Descendant cycles of cycle 9158: [9443, 9253, 9383, 9288, 9386, 9452, 9261, 9454, 9295, 9391, 9331, 9237, 9430, 9239, 9496, 9177, 9499, 9180]
Cycle is root? False
Cycle is leaf? False


### Declaration of properties

In Pycellin, a property is any data that can be stored into a lineage and that is related to part or all of the lineage (e.g. shape of a cell, velocity of a cell, number of divisions in a lineage...). The definition of a property (i.e. metadata of the property, like its name or units) is stored into a property object.

Here is an example of a mandatory Pycellin property, the `cell_ID`:

In [46]:
print(pycellin.graph.properties.core.create_cell_id_property())

Property 'cell_ID'
  Name: cell ID
  Description: Unique identifier of the cell
  Provenance: pycellin
  Type: node
  Lineage type: CellLineage
  Data type: int
  Unit: None


You can see that a property is defined by different fields:

**name**  
The name and identifier of the property. It must be **UNIQUE**: two properties cannot share the same name.

**description**  
A concise description of the property.

**provenance**  
Where does the property come from? For imported properties, Pycellin uses the name of the tool the data was imported from, like `TrackMate` or `CTC`. For Pycellin computed properties, Pycellin uses `Pycellin`.

**prop_type**  
The type of graph element the property applies to, either `node`, `edge`, `lineage`.

**lin_type**    
Either `CellLineage`, `CycleLineage` or just `Lineage` depending on which type of lineage your property is related to.

**dtype**   
The Python type of the property values, e.g. `int`, `bool`...

**unit**    
The unit of the property values, e.g. `µm`, `min`, `cell`...

The definition of the all the properties present in a given model is stored into a `PropsMetadata` object as a dictionary (`props`) whose keys are the name of the properties and the values the properties themselves:

In [47]:
print(tm_model.props_metadata)
print()
for prop in tm_model.props_metadata.props.values():
    print(f"{prop!r}")

Node properties: QUALITY, POSITION_T, RADIUS, VISIBILITY, MANUAL_SPOT_COLOR, MEAN_INTENSITY_CH1, MEDIAN_INTENSITY_CH1, MIN_INTENSITY_CH1, MAX_INTENSITY_CH1, TOTAL_INTENSITY_CH1, STD_INTENSITY_CH1, MEAN_INTENSITY_CH2, MEDIAN_INTENSITY_CH2, MIN_INTENSITY_CH2, MAX_INTENSITY_CH2, TOTAL_INTENSITY_CH2, STD_INTENSITY_CH2, CONTRAST_CH1, SNR_CH1, CONTRAST_CH2, SNR_CH2, ELLIPSE_X0, ELLIPSE_Y0, ELLIPSE_MAJOR, ELLIPSE_MINOR, ELLIPSE_THETA, ELLIPSE_ASPECTRATIO, AREA, PERIMETER, CIRCULARITY, SOLIDITY, SHAPE_INDEX, cell_name, cell_ID, cell_x, cell_y, cell_z, frame, ROI_coords, cycle_ID, cells, cycle_length, cycle_duration, level
Edge properties: SPOT_SOURCE_ID, SPOT_TARGET_ID, LINK_COST, DIRECTIONAL_CHANGE_RATE, SPEED, DISPLACEMENT, EDGE_TIME, MANUAL_EDGE_COLOR, link_x, link_y, link_z
Lineage properties: TRACK_INDEX, DIVISION_TIME_MEAN, DIVISION_TIME_STD, NUMBER_SPOTS, NUMBER_GAPS, NUMBER_SPLITS, NUMBER_MERGES, NUMBER_COMPLEX, LONGEST_GAP, TRACK_DURATION, TRACK_START, TRACK_STOP, TRACK_DISPLACEMENT, 

This might seem complicated but usually you don't need to manipulate the property declaration nor its properties dictionary. The properties present (i.e. declared) in a model are accessible via the model itself:

In [48]:
print(tm_model.get_properties().keys())

dict_keys(['QUALITY', 'POSITION_T', 'RADIUS', 'VISIBILITY', 'MANUAL_SPOT_COLOR', 'MEAN_INTENSITY_CH1', 'MEDIAN_INTENSITY_CH1', 'MIN_INTENSITY_CH1', 'MAX_INTENSITY_CH1', 'TOTAL_INTENSITY_CH1', 'STD_INTENSITY_CH1', 'MEAN_INTENSITY_CH2', 'MEDIAN_INTENSITY_CH2', 'MIN_INTENSITY_CH2', 'MAX_INTENSITY_CH2', 'TOTAL_INTENSITY_CH2', 'STD_INTENSITY_CH2', 'CONTRAST_CH1', 'SNR_CH1', 'CONTRAST_CH2', 'SNR_CH2', 'ELLIPSE_X0', 'ELLIPSE_Y0', 'ELLIPSE_MAJOR', 'ELLIPSE_MINOR', 'ELLIPSE_THETA', 'ELLIPSE_ASPECTRATIO', 'AREA', 'PERIMETER', 'CIRCULARITY', 'SOLIDITY', 'SHAPE_INDEX', 'cell_name', 'SPOT_SOURCE_ID', 'SPOT_TARGET_ID', 'LINK_COST', 'DIRECTIONAL_CHANGE_RATE', 'SPEED', 'DISPLACEMENT', 'EDGE_TIME', 'MANUAL_EDGE_COLOR', 'TRACK_INDEX', 'DIVISION_TIME_MEAN', 'DIVISION_TIME_STD', 'NUMBER_SPOTS', 'NUMBER_GAPS', 'NUMBER_SPLITS', 'NUMBER_MERGES', 'NUMBER_COMPLEX', 'LONGEST_GAP', 'TRACK_DURATION', 'TRACK_START', 'TRACK_STOP', 'TRACK_DISPLACEMENT', 'TRACK_MEAN_SPEED', 'TRACK_MAX_SPEED', 'TRACK_MIN_SPEED', 'TRAC

A few handy methods allow to get a specific subset of declared properties:

In [49]:
print(f"Node properties: {tm_model.get_node_properties().keys()}")
print(f"Edge properties: {tm_model.get_edge_properties().keys()}")
print(f"Lineage properties: {tm_model.get_lineage_properties().keys()}\n")

print(f"Cell lineage properties: {tm_model.get_cell_lineage_properties().keys()}")
print(f"Cycle lineage properties: {tm_model.get_cycle_lineage_properties().keys()}")

Node properties: dict_keys(['QUALITY', 'POSITION_T', 'RADIUS', 'VISIBILITY', 'MANUAL_SPOT_COLOR', 'MEAN_INTENSITY_CH1', 'MEDIAN_INTENSITY_CH1', 'MIN_INTENSITY_CH1', 'MAX_INTENSITY_CH1', 'TOTAL_INTENSITY_CH1', 'STD_INTENSITY_CH1', 'MEAN_INTENSITY_CH2', 'MEDIAN_INTENSITY_CH2', 'MIN_INTENSITY_CH2', 'MAX_INTENSITY_CH2', 'TOTAL_INTENSITY_CH2', 'STD_INTENSITY_CH2', 'CONTRAST_CH1', 'SNR_CH1', 'CONTRAST_CH2', 'SNR_CH2', 'ELLIPSE_X0', 'ELLIPSE_Y0', 'ELLIPSE_MAJOR', 'ELLIPSE_MINOR', 'ELLIPSE_THETA', 'ELLIPSE_ASPECTRATIO', 'AREA', 'PERIMETER', 'CIRCULARITY', 'SOLIDITY', 'SHAPE_INDEX', 'cell_name', 'cell_ID', 'cell_x', 'cell_y', 'cell_z', 'frame', 'ROI_coords', 'cycle_ID', 'cells', 'cycle_length', 'cycle_duration', 'level'])
Edge properties: dict_keys(['SPOT_SOURCE_ID', 'SPOT_TARGET_ID', 'LINK_COST', 'DIRECTIONAL_CHANGE_RATE', 'SPEED', 'DISPLACEMENT', 'EDGE_TIME', 'MANUAL_EDGE_COLOR', 'link_x', 'link_y', 'link_z'])
Lineage properties: dict_keys(['TRACK_INDEX', 'DIVISION_TIME_MEAN', 'DIVISION_TIME_

## Managing properties

*Just the basics here, ref the properties notebooks*  
[Managing properties](./Managing%20properties.ipynb)  
[Custom properties](Custom%20properties.ipynb)

Make a table of properties, mandatory, prop type, lin type
- cell_ID
- frame
- lineage_ID

Like previous section, we are using the TrackMate model as example.

In [50]:
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


## Modification of the lineages

Depending on what you want to do, you may need to manually modify a cell lineage: to remove fusion events, to correct tracking mistakes... or even to totally delete the lineage. Here we cover all the methods that allows you to modify the lineages topology.

### Adding and removing a lineage

To add a lineage to a model, we can first create a new lineage.

In [51]:
# Creation of the new and empty lineage.
new_lin = pycellin.CellLineage(lid=4)  # a lineage_ID is mandatory here
# Adding the lineage to the model.
new_lin_ID = tm_model.add_lineage(new_lin)
print(tm_model.get_cell_lineage_from_ID(new_lin_ID))

CellLineage of ID 4 with 0 cells and 0 links.


Or let Pycellin create the empty lineage by itself:

In [52]:
new_lin_ID = tm_model.add_lineage()
print(tm_model.get_cell_lineage_from_ID(new_lin_ID))

CellLineage of ID 5 with 0 cells and 0 links.


Or just specify a lineage ID:

In [53]:
new_lin_ID = tm_model.add_lineage(lid=10)
print(tm_model.get_cell_lineage_from_ID(new_lin_ID))

CellLineage of ID 10 with 0 cells and 0 links.


To remove a lineage from a model, just specify the lineage_ID to `remove_lineage()`. The removed lineage is returned in case it is needed later.

In [54]:
removed_lin = tm_model.remove_lineage(10)
print(removed_lin)

CellLineage of ID 10 with 0 cells and 0 links.


### Adding and removing cells

To add a cell, simply call the `add_cell()` method:

In [55]:
# Adding a cell with ID 1000 to lineage 1 at frame 22.
tm_model.add_cell(
    lid=1,
    cid=1000,
    frame=22,
)

1000

If no `cell_ID` is given, Pycellin will use the next available `cell_ID` and return the value. If no `frame` is given, Pycellin will automatically put the cell at frame 0.

In [56]:
# Adding a new cell.
new_cell_ID = tm_model.add_cell(
    lid=1,
)
print(f"New cell ID: {new_cell_ID}")

# Then adding some properties that can't be computed to it.
lin1.nodes[new_cell_ID]["cell_x"] = 1.5
lin1.nodes[new_cell_ID]["cell_y"] = 2.5
lin1.nodes[new_cell_ID]["cell_z"] = 0

New cell ID: 9509


Properties values can also be assigned at cell creation with the `prop_values` argument, a dictionary with property names as key and property values as values:

In [57]:
# Adding a new cell with some property values.
new_cell_ID = tm_model.add_cell(
    lid=1,
    frame=1,
    prop_values={"cell_x": 1, "cell_y": 2, "cell_z": 0},
)
print(lin1.nodes[new_cell_ID])

{'cell_x': 1, 'cell_y': 2, 'cell_z': 0, 'cell_ID': 9510, 'frame': 1}


To remove a cell, the `remove_cell()` method needs the ID of the cell to remove as well as the lineage ID. The properties dictionary of the removed node is returned.

In [58]:
tm_model.remove_cell(cid=9509, lid=1)

{'cell_ID': 9509, 'frame': 0, 'cell_x': 1.5, 'cell_y': 2.5, 'cell_z': 0}

### Adding and removing links

Two cells can be linked with the `add_link()` method. It requires the ID of the 2 cells as well as the ID(s) of the lineage(s) the cells belong to. Like cells, properties values can be passed as a dictionary.

In [59]:
# Linking two cells of the same lineage.
tm_model.add_link(
    source_cid=9503,
    source_lid=1,
    target_cid=1000,
    target_lid=1,  # can be omitted if the same as source
    prop_values={"link_x": 0.5, "link_y": 0.5, "link_z": 0},  # optional
)

In [60]:
# Linking two cells from different lineages.
# First we need to add a cell to one of our previously created lineage.
new_cell_ID = tm_model.add_cell(lid=4, frame=1)
print(new_cell_ID)

tm_model.add_link(
    source_cid=8985,
    source_lid=1,
    target_cid=new_cell_ID,
    target_lid=4,
)

0


In the example above, lineage 4 only had one cell, of ID 0. When linking the cell 0, it was removed for lineage 4 and transferred to the source lineage, lineage 1. Lineage 4 is now empty and could be deleted.

We can visualize the modifications done on lineage 1 by plotting it:

In [61]:
lin1.plot()

Note that you cannot link 2 cells as you want. First, the link is directed to respect the flow of time. The source cell cannot have a time value (`frame` property) greater than the target cell or a `TimeFlowError` will be raised.

Second, creating a link cannot lead to a fusion event since a cell cannot have more than one parent. If it happens, a `FusionError` will be raised.

To remove a cell, specify the IDs of the cells involved in the link and the lineage ID of the cells to `remove_link()`. Like `remove_cell()`, the method returns the properties dictionary of the removed link.

In [62]:
# We remove the link we created above.
tm_model.remove_link(source_cid=8985, target_cid=0, lid=1)

{}

### Updating the model

Once you have finished modifying the lineages, you need to update it as seen in the [Managing properties](#managing-properties) section. The update will recompute the properties as needed, as well as the cycle lineages if they were added to the model.

In [63]:
tm_model.update()

As stated before, cycle lineages are read only: you cannot directly add nor remove cell cycles and links. The modifications must be done on the cell lineages. The update of the model will propagate these changes to the cycle lineages. 

## Export

### Pickled Pycellin model

You can save a model on disk anytime with the `save_to_pickle()` method:

In [64]:
my_model.save_to_pickle("../sample_data/results/FakeTracks_saved.pickle")

A Pycellin model is a complex Python object that can be serialized. Pickling a model is a lossless way to save the model on disk for later use.

However, as stated in [Loading from a Pycellin pickle file](#Loading-from-a-Pycellin-pickle-file), `pickle` module is not secure. Malicious code could be executed when unpickling a file from an unknown source. Because of this safety issue, pickle is not the preferred format for sharing a model with the community.

Indeed, the intended use of `save_to_pickle()` and `load_from_pickle()` is to allow you to save your model whenever you want and to be able to resume working on it at a later time or in another Python session. 

### Tables: DataFrames and CSVs

Lineage data from a Pycellin model can be exported into [Pandas](https://pandas.pydata.org/) [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) and saved as comma-separated values (CSV) files.  

Each type of lineage elements generates a different kind of table, as reviewed below.

#### Cells table

A cell table contains all the properties stored in the **cells (nodes) of the cell lineages** of the model. There is one row per cell and one column per cell property. The DataFrame is ordered by increasing `lineage_ID` first, then by `frame`, then by `cell_ID`.

In [65]:
cell_df = tm_model.to_cell_dataframe()
print(cell_df.shape)
print(cell_df.columns)

(529, 40)
Index(['lineage_ID', 'frame', 'cell_ID', 'STD_INTENSITY_CH1', 'SOLIDITY',
       'STD_INTENSITY_CH2', 'QUALITY', 'POSITION_T', 'TOTAL_INTENSITY_CH2',
       'TOTAL_INTENSITY_CH1', 'CONTRAST_CH1', 'ELLIPSE_MINOR', 'ELLIPSE_THETA',
       'ELLIPSE_Y0', 'CIRCULARITY', 'AREA', 'ELLIPSE_MAJOR', 'CONTRAST_CH2',
       'MEAN_INTENSITY_CH1', 'MAX_INTENSITY_CH2', 'MEAN_INTENSITY_CH2',
       'MAX_INTENSITY_CH1', 'MIN_INTENSITY_CH2', 'MIN_INTENSITY_CH1',
       'SNR_CH1', 'ELLIPSE_X0', 'SHAPE_INDEX', 'SNR_CH2',
       'MEDIAN_INTENSITY_CH1', 'VISIBILITY', 'RADIUS', 'MEDIAN_INTENSITY_CH2',
       'ELLIPSE_ASPECTRATIO', 'PERIMETER', 'ROI_N_POINTS', 'ROI_coords',
       'cell_name', 'cell_x', 'cell_y', 'cell_z'],
      dtype='object')


In [66]:
cell_df.head()

Unnamed: 0,lineage_ID,frame,cell_ID,STD_INTENSITY_CH1,SOLIDITY,STD_INTENSITY_CH2,QUALITY,POSITION_T,TOTAL_INTENSITY_CH2,TOTAL_INTENSITY_CH1,...,RADIUS,MEDIAN_INTENSITY_CH2,ELLIPSE_ASPECTRATIO,PERIMETER,ROI_N_POINTS,ROI_coords,cell_name,cell_x,cell_y,cell_z
0,-9510,1,9510,,,,,,,,...,,,,,,,,1.0,2.0,0.0
1,-3,1,3,,,,,,,,...,,,,,,,,,,
2,0,0,8993,6.578459,0.854065,5.656838,601.0,0.0,70098.0,26356.0,...,0.921242,114.0,6.70309,10.285699,52.0,"[(-1.5894197951076556, 1.1551815744407925), (-...",ID8993,15.389186,19.363325,0.0
3,0,1,9013,2.968181,0.916279,5.726043,293.0,5.0,34098.0,4245.0,...,0.638839,115.0,4.279183,5.944787,30.0,"[(-0.727319272545234, 0.8544524671426856), (-0...",ID9013,16.832535,18.214914,0.0
4,0,1,9014,0.0,0.881818,5.309714,291.0,5.0,33523.0,4350.0,...,0.633956,115.0,3.903724,5.74794,34.0,"[(-0.6141453589387584, 0.5208483829178867), (-...",ID9014,14.282171,19.470698,0.0


You can then process the DataFrame however you want and save it as a CSV file.

In [67]:
filename = tm_model.model_metadata["name"] + "_cell_df.csv"
cell_df.to_csv("../sample_data/results/" + filename, index=False)

#### Links table

A link table contains all the properties stored in the **links (edges) of the cell lineages** of the model. There is one row per link and one column per link property. The DataFrame is ordered by increasing `lineage_ID`.

In [68]:
link_df = tm_model.to_link_dataframe()
print(link_df.shape)
print(link_df.columns)

(524, 13)
Index(['lineage_ID', 'source_cell_ID', 'target_cell_ID', 'DISPLACEMENT',
       'link_z', 'DIRECTIONAL_CHANGE_RATE', 'SPOT_SOURCE_ID', 'SPEED',
       'EDGE_TIME', 'SPOT_TARGET_ID', 'LINK_COST', 'link_y', 'link_x'],
      dtype='object')


In [69]:
link_df.head()

Unnamed: 0,lineage_ID,source_cell_ID,target_cell_ID,DISPLACEMENT,link_z,DIRECTIONAL_CHANGE_RATE,SPOT_SOURCE_ID,SPEED,EDGE_TIME,SPOT_TARGET_ID,LINK_COST,link_y,link_x
0,0,9216.0,9290.0,0.437667,0.0,0.5568,9216.0,0.087533,92.5,9290.0,0.3979,18.700692,15.776474
1,0,9218.0,9294.0,1.422631,0.0,0.608195,9218.0,0.284526,92.5,9294.0,0.467842,15.808225,20.106089
2,0,9222.0,9306.0,0.631321,0.0,0.055708,9222.0,0.126264,92.5,9306.0,0.403972,22.138865,11.621851
3,0,9223.0,9293.0,1.365206,0.0,0.606456,9223.0,0.273041,92.5,9293.0,0.491012,15.591736,18.851734
4,0,9227.0,9308.0,2.009642,0.0,0.011716,9227.0,0.401928,92.5,9308.0,0.616573,12.818797,22.279749


In [70]:
filename = tm_model.model_metadata["name"] + "_link_df.csv"
link_df.to_csv("../sample_data/results/" + filename, index=False)

#### Cell cycles table

A cell cycle table contains all the properties stored in the **cell cycles (nodes) of the cycle lineages** of the model. There is one row per cell cycle and one column per cell cycle property. The DataFrame is ordered by increasing `lineage_ID` first, then by `level`, then by `cycle_ID`.

In [71]:
cycle_df = tm_model.to_cycle_dataframe()
print(cycle_df.shape)
print(cycle_df.columns)

(147, 6)
Index(['lineage_ID', 'level', 'cycle_ID', 'cells', 'cycle_length',
       'cycle_duration'],
      dtype='object')


In [72]:
cycle_df.head()

Unnamed: 0,lineage_ID,level,cycle_ID,cells,cycle_length,cycle_duration
0,-9510,0,9510,[9510],1,1
1,-3,0,3,[3],1,1
2,0,0,8993,[8993],1,1
3,0,1,9019,"[9013, 8989, 8996, 9001, 9010, 9019]",6,6
4,0,1,9020,"[9014, 8986, 8988, 8999, 9008, 9020]",6,6


In [73]:
filename = tm_model.model_metadata["name"] + "_cycle_df.csv"
cycle_df.to_csv("../sample_data/results/" + filename, index=False)

#### Lineages table

A lineage table contains all the properties stored in the **cell lineages (graphs)** of the model. There is one row per lineage and one column per lineage property. The DataFrame is ordered by increasing `lineage_ID`.

In [74]:
lin_df = tm_model.to_lineage_dataframe()
print(lin_df.shape)
print(lin_df.columns)

(5, 31)
Index(['lineage_ID', 'TRACK_INDEX', 'DIVISION_TIME_MEAN', 'DIVISION_TIME_STD',
       'NUMBER_SPOTS', 'NUMBER_GAPS', 'NUMBER_SPLITS', 'NUMBER_MERGES',
       'NUMBER_COMPLEX', 'LONGEST_GAP', 'TRACK_DURATION', 'TRACK_START',
       'TRACK_STOP', 'TRACK_DISPLACEMENT', 'TRACK_MEAN_SPEED',
       'TRACK_MAX_SPEED', 'TRACK_MIN_SPEED', 'TRACK_MEDIAN_SPEED',
       'TRACK_STD_SPEED', 'TRACK_MEAN_QUALITY', 'TOTAL_DISTANCE_TRAVELED',
       'MAX_DISTANCE_TRAVELED', 'CONFINEMENT_RATIO',
       'MEAN_STRAIGHT_LINE_SPEED', 'LINEARITY_OF_FORWARD_PROGRESSION',
       'MEAN_DIRECTIONAL_CHANGE_RATE', 'lineage_name', 'lineage_x',
       'lineage_y', 'lineage_z', 'FilteredTrack'],
      dtype='object')


In [75]:
lin_df.head()

Unnamed: 0,lineage_ID,TRACK_INDEX,DIVISION_TIME_MEAN,DIVISION_TIME_STD,NUMBER_SPOTS,NUMBER_GAPS,NUMBER_SPLITS,NUMBER_MERGES,NUMBER_COMPLEX,LONGEST_GAP,...,MAX_DISTANCE_TRAVELED,CONFINEMENT_RATIO,MEAN_STRAIGHT_LINE_SPEED,LINEARITY_OF_FORWARD_PROGRESSION,MEAN_DIRECTIONAL_CHANGE_RATE,lineage_name,lineage_x,lineage_y,lineage_z,FilteredTrack
0,-9510,1,15.37037,7.195876,189,0,28,0,0,0,...,24.429157,0.045111,0.093155,0.385491,0.23933,Track_1,11.299503,20.138499,0.0,True
1,-3,1,15.37037,7.195876,189,0,28,0,0,0,...,24.429157,0.045111,0.093155,0.385491,0.23933,Track_1,11.299503,20.138499,0.0,True
2,0,0,20.9375,5.543389,152,0,18,0,0,0,...,21.091779,0.02115,0.034013,0.145166,0.22293,Track_0,16.866163,17.21004,0.0,True
3,1,1,15.37037,7.195876,189,0,28,0,0,0,...,24.429157,0.045111,0.093155,0.385491,0.23933,Track_1,11.299503,20.138499,0.0,True
4,2,2,20.652174,9.083495,185,0,25,0,0,0,...,16.092759,0.048998,0.085048,0.409802,0.242049,Track_2,32.125552,36.839327,0.0,True


In [76]:
filename = tm_model.model_metadata["name"] + "_lin_df.csv"
lin_df.to_csv("../sample_data/results/" + filename, index=False)

#### Exporting a subset of lineages

For each of the 4 methods above, `to_xxx_dataframe()`, all lineages in the model are processed. If you are interested in just a subset of lineages, you can choose the lineages by passing a list of the `lineage_ID`:

In [77]:
# Cell dataframe of lineage of ID = 0.
lin0_cell_df = tm_model.to_cell_dataframe([0])
print(lin0_cell_df.shape)
lin0_cell_df.head()

(152, 40)


Unnamed: 0,lineage_ID,frame,cell_ID,STD_INTENSITY_CH1,SOLIDITY,STD_INTENSITY_CH2,QUALITY,POSITION_T,TOTAL_INTENSITY_CH2,TOTAL_INTENSITY_CH1,...,RADIUS,MEDIAN_INTENSITY_CH2,ELLIPSE_ASPECTRATIO,PERIMETER,ROI_N_POINTS,ROI_coords,cell_name,cell_x,cell_y,cell_z
0,0,0,8993,6.578459,0.854065,5.656838,601.0,0.0,70098.0,26356.0,...,0.921242,114.0,6.70309,10.285699,52,"[(-1.5894197951076556, 1.1551815744407925), (-...",ID8993,15.389186,19.363325,0.0
1,0,1,9013,2.968181,0.916279,5.726043,293.0,5.0,34098.0,4245.0,...,0.638839,115.0,4.279183,5.944787,30,"[(-0.727319272545234, 0.8544524671426856), (-0...",ID9013,16.832535,18.214914,0.0
2,0,1,9014,0.0,0.881818,5.309714,291.0,5.0,33523.0,4350.0,...,0.633956,115.0,3.903724,5.74794,34,"[(-0.6141453589387584, 0.5208483829178867), (-...",ID9014,14.282171,19.470698,0.0
3,0,2,8986,0.0,0.915761,5.841083,338.0,10.0,39655.0,1014.0,...,0.682225,117.0,4.359176,6.355476,31,"[(-0.6357009198638028, 0.43737422107603763), (...",ID8986,13.908507,19.290692,0.0
4,0,2,8989,0.238197,0.842262,5.303366,288.0,10.0,32526.0,1124.0,...,0.625181,115.0,5.28734,6.341672,35,"[(-0.801263212391051, 0.8421507178785816), (-0...",ID8989,16.774739,18.029605,0.0


In [78]:
# Lineage dataframe of lineages of ID 0 and 2.
subset_lin_df = tm_model.to_lineage_dataframe([0, 2])
print(subset_lin_df.shape)
subset_lin_df.head()

(2, 31)


Unnamed: 0,lineage_ID,TRACK_INDEX,DIVISION_TIME_MEAN,DIVISION_TIME_STD,NUMBER_SPOTS,NUMBER_GAPS,NUMBER_SPLITS,NUMBER_MERGES,NUMBER_COMPLEX,LONGEST_GAP,...,MAX_DISTANCE_TRAVELED,CONFINEMENT_RATIO,MEAN_STRAIGHT_LINE_SPEED,LINEARITY_OF_FORWARD_PROGRESSION,MEAN_DIRECTIONAL_CHANGE_RATE,lineage_name,lineage_x,lineage_y,lineage_z,FilteredTrack
0,0,0,20.9375,5.543389,152,0,18,0,0,0,...,21.091779,0.02115,0.034013,0.145166,0.22293,Track_0,16.866163,17.21004,0.0,True
1,2,2,20.652174,9.083495,185,0,25,0,0,0,...,16.092759,0.048998,0.085048,0.409802,0.242049,Track_2,32.125552,36.839327,0.0,True


### Tracking formats

Pycellin can export data to different external tracking file formats. It currently supports:
- Cell Tracking Challenge text files
- TrackMate XML files

More tracking file formats will be supported in the future.

#### Cell Tracking Challenge

Simply use the `export_CTC_file()` function to export a model to a CTC compatible text file:

In [79]:
filename = my_model.model_metadata["name"] + "_CTC_export.txt"
ctc_out = "../sample_data/results/" + filename
pycellin.export_CTC_file(my_model, ctc_out)

The CTC file format does not support properties: only the topology of the lineages is exported. Moreover, `lineage_ID` and `cell_ID` are not carried over.

See the [CTC file format specifications](https://public.celltrackingchallenge.net/documents/Naming%20and%20file%20content%20conventions.pdf) if you want more information on the format.

#### TrackMate

A Pycellin model is exported as a TrackMate XML file with the function `export_TrackMate_XML()`. However, TrackMate requires properties to be given in a unique temporal and a unique spatial unit while Pycellin properties can each have different units. So you need to specify the space and time units for TrackMate and to ensure that your units are consistent across the Pycellin properties of the model.

In [80]:
filename = my_model.model_metadata["name"] + "_TrackMate_export.xml"
xml_out = "../sample_data/results/" + filename
pycellin.export_TrackMate_XML(my_model, xml_out, {"spatialunits": "pixel", "temporalunits": "sec"})



Properties added to the model with Pycellin will be carried over to TrackMate if the data type of the property is integer, float or boolean. `lineage_ID` and `cell_ID` are converted into their TrackMate equivalent: `TRACK_ID` and `SPOT_ID` respectively.

For a more detailed explanation, please refer to the dedicated notebook: [Working with TrackMate data](./Working%20with%20TrackMate%20data.ipynb).