# Introduction to `pylifemap`

[pylifemap](https://github.com/juba/pylifemap) is a Python package providing a Jupyter widget to visualize data using the [lifemap](https://lifemap.univ-lyon1.fr) interactive tree of life.

This sample notebook shows some of `pylifemap`'s features and how to use them.

## Installation

For the moment the package is only available on GitHub. You can install it by running the following command:

In [None]:
%pip install git+https://github.com/juba/pylifemap.git

## Sample data

In this notebook we will use a sample data file generated from [The IUCN Red List of Threatened Species](https://www.gbif.org/dataset/19491596-35ae-4a91-9a98-85cf505f1bd3). It is a CSV file giving the 2022 Red List category of more than 84,000 species.

The file can be loaded with either the pandas or the polars data frame library. Here we use polars:

In [None]:
import polars as pl

iucn = pl.read_csv(
    "https://raw.githubusercontent.com/juba/pylifemap/main/data/iucn.csv"
)

The data set only contains two variables: the `taxid` of the species, and its `status`:

In [None]:
iucn

Besides the full `iucn` dataset, we will create an `iucn_extinct` data frame containing only the species with the "Extinct" status.

In [None]:
iucn_extinct = iucn.filter(pl.col("status") == "Extinct")
iucn_extinct
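
For readers who want to see exactly what this filter does without polars, here is a minimal standard-library sketch. It assumes only the two-column `taxid`/`status` layout described above; the inline rows and the `read_rows` and `filter_status` helpers are hypothetical placeholders, not real IUCN records or pylifemap functions.

```python
import csv
import io

# Hypothetical sample in the same two-column layout as iucn.csv.
# The taxid values are made-up placeholders, not real IUCN data.
SAMPLE_CSV = """taxid,status
101,Extinct
102,Least Concern
103,Vulnerable
104,Extinct
"""


def read_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of {"taxid": int, "status": str} dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"taxid": int(row["taxid"]), "status": row["status"]} for row in reader]


def filter_status(rows: list[dict], status: str) -> list[dict]:
    """Keep only rows whose status matches exactly, like pl.col("status") == status."""
    return [row for row in rows if row["status"] == status]


rows = read_rows(SAMPLE_CSV)
extinct = filter_status(rows, "Extinct")
print([row["taxid"] for row in extinct])  # [101, 104]
```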

## Visualizing species distribution

We will first try to visualize the distribution of the species in `iucn_extinct`.

The first step is to create a new `Lifemap` instance by passing it our data:

In [None]:
from pylifemap import Lifemap

Lifemap(iucn_extinct)

To visualize our data we have to add a *layer* to our `Lifemap` object. Here we can use `layer_points`, which displays each species with a colored point. We also call the `show()` method to display the result.

In [None]:
Lifemap(iucn_extinct).layer_points().show()

Another interesting layer for species distribution is `layer_screengrid`:

In [None]:
Lifemap(iucn_extinct).layer_screengrid().show()
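
As far as we understand it, a screen grid layer does not draw individual species: it splits the visible map into square cells of a fixed pixel size and colors each cell according to how many points fall into it. The sketch below illustrates only that binning idea on made-up screen coordinates; the `screen_grid_counts` helper, the cell size and the coordinates are all hypothetical, and this is not pylifemap's rendering code.

```python
from collections import Counter


def screen_grid_counts(points, cell_size):
    """Count points per square grid cell.

    points: iterable of (x, y) screen coordinates in pixels.
    cell_size: width and height of a grid cell in pixels.
    Returns a Counter mapping (column, row) cell indices to point counts.
    """
    counts = Counter()
    for x, y in points:
        # Floor division assigns each point to the cell that contains it.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Hypothetical pixel positions of species on screen.
points = [(5, 5), (12, 3), (18, 19), (45, 10), (47, 12), (25, 30)]
grid = screen_grid_counts(points, cell_size=20)
print(dict(grid))  # {(0, 0): 3, (2, 0): 2, (1, 1): 1}
```

A cell's color would then be derived from its count, so dense regions of the tree stand out even when individual points overlap.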

Our dataset is a list of extinct species, which are "leaves" of the tree. We can compute, for each tree node, the number of extinct species it contains by aggregating these counts along the branches.

We can compute this by using the `aggregate_count` function on our data:

In [None]:
from pylifemap import aggregate_count

iucn_extinct_agg = aggregate_count(iucn_extinct)
iucn_extinct_agg
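
To make "aggregating the count along the branches" concrete, here is a minimal pure-Python sketch on a tiny hypothetical tree where each node is mapped to its parent: each observed species adds one to its own node and to every ancestor up to the root. The tree, the taxids and the `aggregate_counts` helper are made up for illustration; pylifemap's `aggregate_count` works on the real Lifemap taxonomy with its own implementation.

```python
from collections import Counter

# Hypothetical toy taxonomy: child taxid -> parent taxid (1 is the root).
PARENT = {
    2: 1,  # internal node
    3: 1,  # internal node
    4: 2,  # species (leaf)
    5: 2,  # species (leaf)
    6: 3,  # species (leaf)
}


def ancestors_and_self(taxid, parent):
    """Yield taxid, then each of its ancestors up to the root."""
    node = taxid
    while True:
        yield node
        if node not in parent:  # the root has no parent
            return
        node = parent[node]


def aggregate_counts(taxids, parent):
    """Count, for every node, how many of the given taxids lie in its subtree."""
    n = Counter()
    for taxid in taxids:
        for node in ancestors_and_self(taxid, parent):
            n[node] += 1
    return n


extinct_taxids = [4, 5, 6]  # hypothetical extinct species
n = aggregate_counts(extinct_taxids, PARENT)
print(dict(sorted(n.items())))  # {1: 3, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1}
```

The root always ends up with the total number of species, which is why the counts at the top of the tree are much larger than those near the leaves.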

We can visualize this new dataset with `layer_points`, but the result is more informative if the size and color of each point depend on its associated count, stored in the `n` column. This is done with the `radius_col` and `fill_col` arguments:

In [None]:
Lifemap(iucn_extinct_agg).layer_points(radius_col="n", fill_col="n").show()

We can also add a second layer, `layer_lines`, to set the width and color of the branches according to the species counts:

In [None]:
(
    Lifemap(iucn_extinct_agg)
    .layer_lines(width_col="n", color_col="n")
    .layer_points(radius_col="n", fill_col="n")
    .show()
)
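
The call above works because each `layer_*` method presumably returns the `Lifemap` object itself, so several layers can be stacked in one chained expression before calling `show()`. The toy class below mimics that fluent pattern; `MiniMap` and its methods are hypothetical, only record the layers they receive, draw nothing, and are not part of pylifemap.

```python
class MiniMap:
    """Toy stand-in for a chainable map object (hypothetical, not pylifemap)."""

    def __init__(self, data):
        self.data = data
        self.layers = []  # ordered list of (layer type, options)

    def layer_lines(self, **options):
        self.layers.append(("lines", options))
        return self  # returning self is what makes chaining possible

    def layer_points(self, **options):
        self.layers.append(("points", options))
        return self

    def show(self):
        # A real widget would render here; we just report the layer stack.
        return [kind for kind, _ in self.layers]


stack = (
    MiniMap(data=None)
    .layer_lines(width_col="n", color_col="n")
    .layer_points(radius_col="n", fill_col="n")
    .show()
)
print(stack)  # ['lines', 'points']
```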

## Visualizing a categorical variable

Instead of displaying the count of a single status, we may want to visualize the distribution of all status values in the full `iucn` dataset.

One way to do this is to color the points according to their status value with the `fill_col` argument. Here we also set a low `opacity`, which helps when many points overlap:

In [None]:
Lifemap(iucn).layer_points(fill_col="status", opacity=0.2).show()

Alternatively, we can aggregate along the tree branches again, this time computing the frequencies of the different statuses at each node. This can be done with the `aggregate_freq` function:

In [None]:
from pylifemap import aggregate_freq

iucn_agg = aggregate_freq(iucn, "status")
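
The same walk up the tree works for a categorical variable: instead of a single count, each node accumulates one count per status value. Below is a minimal sketch of that principle on a hypothetical toy tree with made-up statuses; the `aggregate_status_freq` helper and the data are illustrative only, and the exact output format of pylifemap's `aggregate_freq` may differ.

```python
from collections import Counter, defaultdict

# Hypothetical toy taxonomy: child taxid -> parent taxid (1 is the root).
PARENT = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3}

# Hypothetical (taxid, status) observations for the leaves.
OBSERVATIONS = [
    (4, "Extinct"),
    (5, "Least Concern"),
    (6, "Least Concern"),
]


def aggregate_status_freq(observations, parent):
    """Return {node: Counter(status -> count)} for every node on a path to the root."""
    freq = defaultdict(Counter)
    for taxid, status in observations:
        node = taxid
        while True:
            freq[node][status] += 1
            if node not in parent:  # reached the root
                break
            node = parent[node]
    return freq


freq = aggregate_status_freq(OBSERVATIONS, PARENT)
print(dict(freq[1]))  # {'Extinct': 1, 'Least Concern': 2}
print(dict(freq[2]))  # {'Extinct': 1, 'Least Concern': 1}
```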

We can then visualize this data as a series of donut charts. Clicking on a chart displays a popup with more information:

In [None]:
Lifemap(iucn_agg).layer_donuts("status").show()
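
Each donut summarizes, for one node, how its species are split between statuses. Turning per-node counts into proportions and slice angles is straightforward; the sketch below does it for a single node with made-up counts. The `donut_slices` helper is hypothetical and says nothing about how pylifemap actually draws its donuts.

```python
def donut_slices(counts):
    """Convert {category: count} into (category, fraction, start_deg, end_deg) slices."""
    total = sum(counts.values())
    if total == 0:
        return []
    slices = []
    start = 0.0
    for category, count in counts.items():
        fraction = count / total
        end = start + 360.0 * fraction
        slices.append((category, fraction, start, end))
        start = end  # the next slice begins where this one ends
    return slices


# Hypothetical status counts for one node.
node_counts = {"Least Concern": 6, "Vulnerable": 3, "Extinct": 1}
for category, fraction, start, end in donut_slices(node_counts):
    print(f"{category}: {fraction:.0%} ({start:.0f} to {end:.0f} deg)")
```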