# Getting started

In this notebook we cover the basics of Pycellin:
- how to obtain a Pycellin model,
- how Pycellin models cell tracking data,
- how to manipulate, modify and enrich the cell lineages,
- how to export the data to other formats.

In [1]:
import pycellin as pc

## How to get a Pycellin model

### Building from scratch

It is possible to build a Pycellin model manually, starting with an empty model.

In [2]:
my_model = pc.Model()
print(my_model)
# The model is completely empty: no metadata, no features declaration, no lineages.
print(repr(my_model))

Empty model.
Model(metadata=None, feat_declaration=None, data=None)


You can then fill in the metadata and build cell lineages by adding cells and links. This is covered in detail in the notebook [Creating a model from scratch](./Creating%20a%20model%20from%20scratch.ipynb) (WIP).

### Loading from a Pycellin pickle file

You can load a model from a Pycellin pickle file saved on disk with the `load_from_pickle()` method:

In [3]:
pycellin_pickle = "../sample_data/FakeTracks.pickle"
my_model = pc.Model.load_from_pickle(pycellin_pickle)
print(my_model)

Model named 'FakeTracks' with 2 lineages, built from TrackMate.


However, please note that while `pickle` is a module of the Python Standard Library, it is not secure. **You should only load data from sources you trust.** Please refer to the [documentation of the `pickle` module](https://docs.python.org/3/library/pickle.html) for more information.
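As an illustration of one possible precaution, the sketch below only unpickles a file whose SHA-256 digest matches a value recorded beforehand. It uses only the Python standard library. The helper names `sha256_of` and `load_pickle_if_trusted` are hypothetical and not part of Pycellin, and a small dictionary stands in for a saved model. A matching digest only shows that the file has not changed since the digest was recorded; it does not make `pickle` itself safe.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_pickle_if_trusted(path: Path, expected_sha256: str):
    """Unpickle `path` only if its digest matches the expected one (hypothetical helper)."""
    if sha256_of(path) != expected_sha256:
        raise ValueError(f"Checksum mismatch for {path.name}: refusing to unpickle.")
    with path.open("rb") as f:
        return pickle.load(f)


# Demo with a throwaway file standing in for a saved model.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pickle"
    demo_path.write_bytes(pickle.dumps({"name": "demo", "lineages": 2}))
    trusted_digest = sha256_of(demo_path)  # recorded when the file was saved

    restored = load_pickle_if_trusted(demo_path, trusted_digest)

    # Simulate tampering: the digest no longer matches, so loading is refused.
    demo_path.write_bytes(demo_path.read_bytes() + b"\x00")
    try:
        load_pickle_if_trusted(demo_path, trusted_digest)
        tamper_detected = False
    except ValueError:
        tamper_detected = True
```

The digest has to reach you through a channel you trust (for example, your own lab notebook), otherwise the check adds nothing.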

### Loading from external tools

Pycellin can load data from different tracking file formats. It currently supports:
- TrackMate XML files
- Cell Tracking Challenge (CTC) text files

More tracking file formats will be supported in the future.

#### TrackMate



Data generated with [TrackMate](https://imagej.net/plugins/trackmate/) can be loaded into a Pycellin model thanks to the `load_TrackMate_XML()` function:

In [4]:
trackmate_xml = "../sample_data/Ecoli_growth_on_agar_pad.xml"
tm_model = pc.load_TrackMate_XML(trackmate_xml)
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


All data within the TrackMate XML file is loaded into the Pycellin model: there is no loss of information. Most notably, TrackMate features are accessible within Pycellin under the same name (e.g. AREA, MEAN_INTENSITY_CH1).

To know more about the specifics of using Pycellin with TrackMate data, please refer to the dedicated notebook: [Pycellin with TrackMate](./Pycellin%20with%20TrackMate.ipynb).

#### Cell Tracking Challenge

Tracking data formatted as per the [Cell Tracking Challenge](https://celltrackingchallenge.net/) file format [specifications](https://public.celltrackingchallenge.net/documents/Naming%20and%20file%20content%20conventions.pdf) can be loaded with the `load_CTC_file()` function:

In [5]:
# FIXME: need to update the loader
# ctc_file = ""
# ctc_model = pc.load_CTC_file(ctc_file)
# print(ctc_model)

For this format, only the track topology is read: cell segmentation is not extracted from any associated label images (this might be supported later if there is a need).
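For reference, a CTC track file stores one track per line as four integers `L B E P`: the track label, its first frame, its last frame, and the label of its parent track (0 when there is no parent). The following is a minimal sketch of reading that topology with plain Python. It is not the implementation of `load_CTC_file()`, and the names `CTCTrack`, `parse_ctc_tracks` and `parent_links` are made up for illustration, as is the sample content.

```python
from typing import NamedTuple


class CTCTrack(NamedTuple):
    """One line of a CTC track file (hypothetical container)."""
    label: int
    begin: int
    end: int
    parent: int


def parse_ctc_tracks(text: str) -> dict[int, CTCTrack]:
    """Parse 'L B E P' lines into a mapping from track label to track."""
    tracks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, begin, end, parent = (int(value) for value in line.split())
        tracks[label] = CTCTrack(label, begin, end, parent)
    return tracks


def parent_links(tracks: dict[int, CTCTrack]) -> list[tuple[int, int]]:
    """Return sorted (parent label, child label) pairs; parent 0 means no parent."""
    return sorted((t.parent, t.label) for t in tracks.values() if t.parent != 0)


# Made-up sample: track 1 ends at frame 4 and gives rise to tracks 2 and 3.
sample = """\
1 0 4 0
2 5 9 1
3 5 9 1
4 0 9 0
"""
ctc_tracks = parse_ctc_tracks(sample)
ctc_links = parent_links(ctc_tracks)
ctc_roots = sorted(t.label for t in ctc_tracks.values() if t.parent == 0)
```

Here `ctc_links` describes the topology only, which matches what is read from this format: nothing about cell shape or position is available at this stage.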

## Pycellin model structure

In this section, we use the previously loaded TrackMate model as an example.

In [6]:
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


A Pycellin model consists of 3 different elements that we describe in separate sections below:
- the metadata of the model
- the data, i.e. the lineages
- the declaration of the features present in the lineages

*simplified scheme of a model*

### Metadata

Metadata holds information about the model and the data of the model.

In Pycellin, the metadata is versatile and stored as a dictionary. It is accessible through the `metadata` attribute of your model:

In [7]:
for metadata_field in tm_model.metadata:
    print(metadata_field)

name
file_location
provenance
date
space_unit
time_unit
pycellin_version
TrackMate_version
time_step
pixel_size
Log
Settings
GUIState
DisplaySettings


In [8]:
print(tm_model.metadata["provenance"], tm_model.metadata["TrackMate_version"])

TrackMate 7.10.2


Some metadata fields are common to most Pycellin models, like `provenance`, `date` or `space_unit`. Others are specific to TrackMate models like `TrackMate_version` or `Log`.

The more information you store in metadata, the better it is for traceability. For example, you could describe the different channels of your timelapse, or list some image acquisition parameters:

In [9]:
tm_model.metadata["channel1"] = "segmentation"
tm_model.metadata["channel2"] = "ZipA-mCherry"
tm_model.metadata["objective"] = "100x oil"

### Data

In Pycellin, lineages are modeled as directed acyclic graphs. This means that splitting events (i.e. divisions) are allowed; they are even recommended if you want to take full advantage of Pycellin. **However, MERGING EVENTS (i.e. fusions) ARE NOT SUPPORTED.** If you use Pycellin on a lineage with merging events, it may crash or produce incorrect results, especially when computing features related to tracking.

Below is an example of a dataset with fusions. If Pycellin detects fusions, a warning will be raised and the cells involved in the fusions will be displayed. It is then up to you to decide if you want to proceed or to correct the fusions.

In [10]:
trackmate_xml_fusions = "../sample_data/Ecoli_growth_on_agar_pad_with_fusions.xml"
fusion_model = pc.load_TrackMate_XML(trackmate_xml_fusions)

Fusions location:
  Lineage 0 => cell IDs: [9065]
  Lineage 1 => cell IDs: [9232, 9257]


You can manually check for fusions at any time with the `check_for_fusions()` method of `Model`:

In [11]:
fusion_model.check_for_fusions()

[Cell(cell_ID=9065, lineage_ID=0),
 Cell(cell_ID=9232, lineage_ID=1),
 Cell(cell_ID=9257, lineage_ID=1)]
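To make the notion of a fusion concrete: in a directed graph where links go from an earlier cell to a later one, a division is a cell with several outgoing links, whereas a fusion is a cell with more than one incoming link. The sketch below illustrates this on a plain list of links. It is a simplified illustration with made-up cell IDs and a hypothetical `find_fusions` helper, not a description of how `check_for_fusions()` is implemented.

```python
from collections import defaultdict


def find_fusions(links):
    """Return the sorted IDs of cells that have more than one incoming link."""
    parents = defaultdict(set)
    for source, target in links:
        parents[target].add(source)
    return sorted(cell for cell, sources in parents.items() if len(sources) > 1)


# Links are (earlier cell, later cell) pairs; the IDs are made up.
division_only = [(1, 2), (2, 3), (2, 4)]        # cell 2 divides into 3 and 4
with_fusion = division_only + [(3, 5), (4, 5)]  # cells 3 and 4 merge into 5

fusions_a = find_fusions(division_only)
fusions_b = find_fusions(with_fusion)
```

In this toy example `fusions_a` is empty, while `fusions_b` contains cell 5, the cell where the two branches merge.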

The `Data` object can contain both cell lineage data and cycle lineage data.

In [23]:
print(tm_model.data)

Data object with 3 cell lineages and 3 cycle lineages.


#### Cell lineages

In [19]:
tm_model.data.cell_data

{0: <pycellin.classes.lineage.CellLineage at 0x7f8eac831210>,
 1: <pycellin.classes.lineage.CellLineage at 0x7f8eabdb4950>,
 2: <pycellin.classes.lineage.CellLineage at 0x7f8eab1a8250>}

In [20]:
print(tm_model.data.cell_data[0])

CellLineage of ID 0 named Track_0 with 152 cells and 151 links.


#### Cell cycle lineages

In [24]:
tm_model.add_cycle_data()

ValueError: A Feature called cycle_ID already exists in node features.

In [21]:
tm_model.data.cycle_data

{0: <pycellin.classes.lineage.CycleLineage at 0x7f8eaa9863d0>,
 1: <pycellin.classes.lineage.CycleLineage at 0x7f8eab5dc550>,
 2: <pycellin.classes.lineage.CycleLineage at 0x7f8eaa9900d0>}

In [22]:
print(tm_model.data.cycle_data[0])

CycleLineage of ID 0 with 37 cycles and 36 links.


### Declaration of Features

The declaration of features is a Python object, an instance of the class `FeaturesDeclaration`. It holds all the information about the features that have been or will be computed on your data: name, description, units, etc.

In [None]:
# TODO: to finish when the features dict will be updated

Feature names must be unique: two features cannot share the same name. However, the same feature can apply to different lineage elements. For example, the `lineage_ID` feature is a lineage feature, but also a node feature to ease processing.
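The two rules above (unique names, one feature possibly attached to several kinds of lineage elements) can be sketched with a toy registry. This is a hypothetical illustration only: the class `FeatureRegistry`, its methods and the element type strings are invented here and do not reflect the actual `FeaturesDeclaration` API.

```python
class FeatureRegistry:
    """Toy registry: names are unique, but a feature may target several element types."""

    def __init__(self):
        self._features = {}

    def declare(self, name, description, element_types):
        """Register a feature, rejecting any name that is already in use."""
        if name in self._features:
            raise ValueError(f"A feature called {name} already exists.")
        self._features[name] = {
            "description": description,
            "element_types": set(element_types),
        }

    def names_for(self, element_type):
        """Return the sorted names of the features that apply to an element type."""
        return sorted(
            name
            for name, feat in self._features.items()
            if element_type in feat["element_types"]
        )


registry = FeatureRegistry()
# One feature, two element types: allowed.
registry.declare("lineage_ID", "ID of the lineage", {"lineage", "node"})
registry.declare("AREA", "Area of the cell", {"node"})

# Declaring a second feature with an existing name is rejected.
try:
    registry.declare("AREA", "Some other area", {"edge"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

With this toy registry, `registry.names_for("node")` lists both `AREA` and `lineage_ID`, while `registry.names_for("lineage")` lists only `lineage_ID`.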

## Modification of the lineages


*add node, edge, lineage...*  
*remove...*

## Managing features

*Just the basics here, ref the features notebooks*  
[Managing features](./Managing%20features.ipynb)  
[Advanced - Custom features](Advanced%20-%20Custom%20features.ipynb)

As in the previous section, we use the TrackMate model as an example.

In [8]:
print(tm_model)

Model named 'Ecoli_growth_on_agar_pad' with 3 lineages, built from TrackMate.


## Export

### Pickled Pycellin model

You can save a model on disk anytime with the `save_to_pickle()` method:

In [9]:
my_model.save_to_pickle("../sample_data/FakeTracks_saved.pickle")

A Pycellin model is a complex Python object that can be serialized. Pickling a model is a lossless way to save the model on disk for later use.

However, as stated in [Loading from a Pycellin pickle file](#Loading-from-a-Pycellin-pickle-file), the `pickle` module is not secure. Malicious code could be executed when unpickling a file from an unknown source. Because of this safety issue, pickle is not the preferred format for sharing a model with the community.

Indeed, the intended use of `save_to_pickle()` and `load_from_pickle()` is to allow you to save your model whenever you want and to be able to resume working on it at a later time or in another Python session. 
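To illustrate what "lossless" means here, the sketch below round-trips a small Python object through `pickle` and, for contrast, through JSON. It uses only the standard library, and a made-up dictionary stands in for a model; it does not call `save_to_pickle()` or `load_from_pickle()`.

```python
import json
import pickle

# Made-up stand-in for model state, mixing types that text formats handle poorly.
state = {
    "name": "demo",
    "frames": (0, 1, 2),         # tuple
    "cell_ids": {9065, 9232},    # set
    3: "integer key",            # non-string dictionary key
}

# Pickle round trip: values and Python types are preserved.
restored = pickle.loads(pickle.dumps(state))

# JSON round trip on a subset (sets are not JSON serializable):
# the tuple comes back as a list and the integer key as a string.
json_friendly = {"frames": (0, 1, 2), 3: "integer key"}
via_json = json.loads(json.dumps(json_friendly))
```

After the pickle round trip, `restored` compares equal to `state` and `restored["frames"]` is still a tuple, whereas `via_json` no longer matches `json_friendly`. This is why pickling is convenient for resuming your own work, even though it is not suited for sharing.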

### Tables: dataframes and CSVs

#### Cells table

#### Links table

#### Cell cycles table

#### Lineages table

### Tracking formats

#### TrackMate

#### Cell Tracking Challenge