# Loading and plotting data in lineages made with TrackMate

In this example, we will see how to extract numerical data from a TrackMate file and plot it interactively. 
This is a simple notebook that introduces the use of plotly to explore the data stored in a pycellin lineage.

To run this notebook, create or use a Python environment with pycellin and the Jupyter dependencies.
For instance:
``` bash
> conda create -n pycellin python="3.10" 
> conda activate pycellin
> pip install pycellin
> pip install ipykernel
> pip install --upgrade nbformat
```
and restart the kernel.

The data used in this demo contain the tracking results for an early _C. elegans_ embryo, imaged in 3D for 37 minutes just after the first cell division (zygote -> AB & P1).

<video controls>
  <source src="imgs/movie.mp4" type="video/mp4" style="width: 100%; max-width: 300px;">
  Your browser does not support the video tag.
</video>

We want to plot various cell numerical features over time. 
We will first see a rather verbose way of plotting feature values along with the hierarchy of the lineage. The data points will be connected by a line if the cells are linked in the lineage graph. 
In a second step, we will rely on pycellin to export all feature values to a pandas dataframe. This dataframe can then be used to plot features very quickly with Plotly Express.


Cell tracking was done with TrackMate, on a 3D+T movie. 
Let's load and display the lineage with pycellin.

In [1]:
import pycellin

xml_path = "imgs/Celegans-5pc-17timepoints.xml"
model = pycellin.load_TrackMate_XML(xml_path)

# Get the lineages.
cell_lins = model.get_cell_lineages()
for lin in cell_lins:
    print(lin)

CellLineage of ID 0 named AB with 25 cells and 24 links.
CellLineage of ID 1 named P1 with 28 cells and 27 links.
CellLineage of ID 2 named PB1 with 17 cells and 16 links.
CellLineage of ID 3 named PB2 with 17 cells and 16 links.


We find 4 lineages. In the _C. elegans_ embryo:
- PB1 and PB2 are the polar bodies. They don't divide. By convention, PB1 is the one that does not move and is used to identify the anterior side of the embryo.
- The AB cell primarily gives rise to ectodermal tissues, including the nervous system, hypodermis (skin), and parts of the pharynx. When P0 divides into AB and P1, AB ends up on the anterior side.
- The P1 lineage primarily contributes to the mesodermal and endodermal tissues, as well as the germline. This cell ends up on the posterior side.

Let's plot the lineages that were tracked.

pycellin lineages have a `plot()` method that can directly display the lineage hierarchy.
These plots reproduce what is shown in TrackScheme: the position of cells is discarded, and cells are simply shown with their relationships over time. Time is on the Y axis, from top to bottom. The X axis is only used to lay out siblings side by side.

pycellin generates an interactive plot thanks to plotly. You can hover the mouse over a node to show information. Additionally, you can specify a feature whose values are mapped to node colors through a colormap.
In the example below we show the cell name and radius in the hover, and use the radius to generate node colors.


In [2]:
for lin in cell_lins:
    first_cell = lin.get_root()
    lineage_name = lin.nodes[first_cell]["name"]
    lin.plot(
        title=f"{lineage_name} lineage",
        node_colormap_feature="RADIUS", 
        node_hover_features=["name", "RADIUS"], 
        node_color_scale="jet",
        node_marker_style=dict(
            size=10,
            symbol="circle",
        ),  # style of the nodes
        edge_line_style=dict(
            color="black",
            width=1,
            dash="solid",
        ), 
        plot_bgcolor="white",
    )

Let's focus on the AB lineage and plot the radius of cells as they move and divide.

Because we have 3 cell divisions, we cannot simply collect the time and radius of all cells in the lineage and plot them as a single line. The solution is to loop over edges and plot them one by one.

In [3]:
# Search for the AB lineage
for lin in cell_lins:
    first_cell = lin.get_root()
    lineage_name = lin.nodes[first_cell]["name"]

    if lineage_name == 'AB':
        ab_lineage = lin
        ab_lineage_ID = lin.graph["lineage_ID"]
        break


# Let's create a figure with plotly, that is already shipped with pycellin.
import plotly.graph_objects as go

fig = go.Figure()

# Loop over the edges of the lineage and add a line for each edge.
for source, target in ab_lineage.edges():
    # The time is stored in the node attributes. In TrackMate, this is the attribute "POSITION_T".
    source_time = ab_lineage.nodes[source]["POSITION_T"]
    target_time = ab_lineage.nodes[target]["POSITION_T"]
    # And for the radius.
    source_radius = ab_lineage.nodes[source]["RADIUS"]
    target_radius = ab_lineage.nodes[target]["RADIUS"]
    # Cell names
    source_name = ab_lineage.nodes[source]["name"]
    target_name = ab_lineage.nodes[target]["name"]

    fig.add_trace(
        go.Scatter(
            x=[source_time, target_time],
            y=[source_radius, target_radius],
            mode="lines+markers",
            line=dict(color="black", width=1),
            marker=dict(size=10, symbol="circle"),
            text=[source_name, target_name]
        )
    )
# Add axis labels
fig.update_layout(
    title="Radius of the AB lineage cells over time",
    xaxis_title="Time (min)",
    yaxis_title="Radius (µm)",
    showlegend=False,
    # Transparent background:
    plot_bgcolor='rgba(0, 0, 0, 0)',
    paper_bgcolor='rgba(0, 0, 0, 0)',
)

# Axes colors
fig.update_xaxes(linecolor='black')
fig.update_yaxes(linecolor='black')

# Show the figure
fig.show()
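
Adding one trace per edge works, but it creates as many traces as there are links. As an alternative, here is a minimal sketch that flattens all edges into a single pair of coordinate lists, with `None` inserted after each edge as a separator. It runs on toy values rather than on the tracked data, and `edges_to_polyline` is a hypothetical helper, not part of pycellin or plotly.

```python
def edges_to_polyline(nodes, edges, x_key, y_key):
    """Flatten lineage edges into x/y lists, with None after each edge."""
    xs, ys = [], []
    for source, target in edges:
        # Two points per edge, then a None that marks the end of the segment.
        xs += [nodes[source][x_key], nodes[target][x_key], None]
        ys += [nodes[source][y_key], nodes[target][y_key], None]
    return xs, ys


# Toy lineage, not real data: cell 1 divides into cells 2 and 3.
toy_nodes = {
    1: {"POSITION_T": 0.0, "RADIUS": 10.0},
    2: {"POSITION_T": 1.0, "RADIUS": 8.0},
    3: {"POSITION_T": 1.0, "RADIUS": 7.5},
}
toy_edges = [(1, 2), (1, 3)]

xs, ys = edges_to_polyline(toy_nodes, toy_edges, "POSITION_T", "RADIUS")
print(xs)
print(ys)
```

With the real lineage, `ab_lineage.nodes` and `ab_lineage.edges()` could presumably be passed in place of the toy inputs, since the cell above already indexes them the same way. The two lists can then go to a single `go.Scatter(x=xs, y=ys, mode="lines+markers")` trace: by default plotly does not connect points across `None` values, so each edge stays a separate segment.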


We can generalize this approach and package this in a function.
We can even add a third feature that will be used to color the nodes with a colormap.

In [4]:
def plot_lineage_features(lineage, x_feature, y_feature, c_feature):
    fig = go.Figure()

    # Prepare the color scale. Collect all the values for the feature c_feature
    # and create a color scale.
    c_values = [lineage.nodes[n][c_feature] for n in lineage.nodes()]
    c_min = min(c_values)
    c_max = max(c_values)
    

    # Loop over the edges of the lineage and add a line for each edge.
    for source, target in lineage.edges():
        source_x = lineage.nodes[source][x_feature]
        target_x = lineage.nodes[target][x_feature]
        source_y = lineage.nodes[source][y_feature]
        target_y = lineage.nodes[target][y_feature]
        # Cell names
        source_name = lineage.nodes[source]["name"]
        target_name = lineage.nodes[target]["name"]
        # For the colormap
        data = [ lineage.nodes[source][c_feature] , lineage.nodes[target][c_feature] ]

        fig.add_trace(
            go.Scatter(
                x=[source_x, target_x],
                y=[source_y, target_y],
                mode="lines+markers",
                line=dict(color="black", width=1),
                marker=dict(
                    color=data,
                    colorscale='jet',  # Built-in 'jet' colorscale
                    cmin=c_min,        # Shared minimum, so all traces use the same scale
                    cmax=c_max,        # Shared maximum, so all traces use the same scale
                    size=15,
                    symbol="circle",
                    line=dict(color='black', width=1)
                ),
                text=[source_name, target_name]
            )
        )
    # Add axis labels
    fig.update_layout(
        xaxis_title=x_feature,
        yaxis_title=y_feature,
        showlegend=False,
        # Transparent background:
        plot_bgcolor='rgba(0, 0, 0, 0)',
        paper_bgcolor='rgba(0, 0, 0, 0)',
    )

    return fig



We can use it to plot the cell position over time.
There is a gotcha, however.
In TrackMate, the X and Y positions of a cell are stored in the features "POSITION_X" and "POSITION_Y". But when pycellin loads a TrackMate file, these feature values are reinterpreted and stored in the node attributes "cell_x" and "cell_y".
So we have:

In [5]:
fig2 = plot_lineage_features(ab_lineage, "cell_x", "cell_y", "POSITION_T")
# Square the figure
fig2.update_xaxes(
    scaleanchor="y",
    scaleratio=1,
)

fig2.show()
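
Because of this renaming, passing a TrackMate name such as "POSITION_X" to the plotting function would fail with a `KeyError` in the middle of the loop. A small check run beforehand can report which requested features are absent from the nodes. The sketch below uses toy node attributes, and `missing_features` is a hypothetical helper that is not part of pycellin.

```python
def missing_features(node_data, features):
    """Return, for each requested feature, the list of nodes that lack it."""
    missing = {}
    for feature in features:
        absent = [node for node, attrs in node_data.items() if feature not in attrs]
        if absent:
            missing[feature] = absent
    return missing


# Toy node attributes, not real data. The names follow the pycellin
# convention described above ("cell_x" instead of "POSITION_X").
toy_nodes = {
    1: {"cell_x": 12.0, "cell_y": 30.5, "POSITION_T": 0.0},
    2: {"cell_x": 14.2, "cell_y": 28.1, "POSITION_T": 1.0},
}

report = missing_features(toy_nodes, ["cell_x", "POSITION_X", "POSITION_T"])
print(report)  # only POSITION_X is reported, for nodes 1 and 2
```

With a real lineage, `dict(ab_lineage.nodes(data=True))` should give a mapping of this shape, assuming the lineage exposes the networkx-style node view that the cells above already rely on.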


This plot is highly configurable, but building it is very manual.

However, an important capability of pycellin is the ability to reshape and export the data to structures that are easy to work with.
We can export the nodes of the lineages as a pandas dataframe, which can be used very conveniently with Plotly Express.
For instance, the plot above (minus the lines) can be obtained more simply with the following:

In [6]:
# Export all the lineages to a pandas dataframe
df = model.to_cell_dataframe()

# Filter for the AB lineage
df = df[df['lineage_ID']==ab_lineage_ID]

# Use Plotly Express to create a scatter plot.
# It offers a very convenient syntax to set the color, size and position of the points based on
# the values of a dataframe. For instance, we can plot the X, Y position of the cells and color them
# based on time (POSITION_T). We can also set the size of the points
# based on the radius of the cells (RADIUS), and use the name of the cells as hover name.
# Note that the dataframe is already filtered for the AB lineage.
import plotly.express as px
fig3 = px.scatter(
    df,
    x="cell_x",
    y="cell_y",
    color="POSITION_T",
    size="RADIUS",
    hover_name="name",
    title="AB lineage cells",
    color_continuous_scale="jet",
)

# Square the figure
fig3.update_xaxes(
    scaleanchor="y",
    scaleratio=1,
)
fig3.update_layout(
    showlegend=False,
    # Transparent background:
    plot_bgcolor='rgba(0, 0, 0, 0)',
    paper_bgcolor='rgba(0, 0, 0, 0)',
)



fig3.show()
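
A dataframe of this kind is also convenient for quick numerical summaries, not only for plotting. The sketch below counts cells and averages their radius at each time point. It uses a toy table whose column names mirror those used in this notebook; the values are made up, and the real export may contain many more columns.

```python
import pandas as pd

# Toy table, not real data: one lineage (ID 0) that divides at t = 2,
# plus one row from another lineage (ID 1).
toy_df = pd.DataFrame(
    {
        "lineage_ID": [0, 0, 0, 0, 1],
        "name": ["AB", "AB", "ABa", "ABp", "P1"],
        "POSITION_T": [0.0, 1.0, 2.0, 2.0, 0.0],
        "RADIUS": [10.0, 9.0, 8.0, 7.0, 12.0],
    }
)

# Keep a single lineage, then summarize it per time point.
one_lineage = toy_df[toy_df["lineage_ID"] == 0]
summary = one_lineage.groupby("POSITION_T").agg(
    n_cells=("name", "count"),
    mean_radius=("RADIUS", "mean"),
)
print(summary)
```

In such a summary, a jump in `n_cells` from one time point to the next marks a division.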


This gives us a good impression of where the cells move as development progresses. 
But this is just an XY view. 
We can generate an interactive 3D view by using another plotly express function:

In [7]:
fig4 = px.scatter_3d(
    df,
    x="cell_x",
    y="cell_y",
    z="cell_z",
    color="POSITION_T",
    size="RADIUS",
    hover_name="name",
    title="AB lineage cells",
    color_continuous_scale='jet',
)

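# Use the same scale on the three axes. In a 3D figure the axes belong to the
# scene, so this is set on the layout rather than with update_xaxes().
fig4.update_layout(
    scene_aspectmode="data",
    showlegend=False,
    # Transparent background:
    plot_bgcolor='rgba(0, 0, 0, 0)',
    paper_bgcolor='rgba(0, 0, 0, 0)',
)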

fig4.show()

These plots can be adapted to your needs, and should make it easier to explore lineage data.
In particular, this can be combined with the ability of pycellin to quickly augment a model with new feature definitions.
