# 2. *Salpingoeca rosetta* gene expression

This notebook contains the code for visualizing the expression of selected genes in the *Salpingoeca rosetta* dataset.

## 2.1 Setup
First, load the necessary libraries.
Before running this notebook, install the `zoogletools` package by running the following command from the top level of this GitHub repository.

```bash
pip install -e .
```

In [1]:
import zoogletools as zt

## 2.2 Load data
Load the data from the *Salpingoeca rosetta* [dataset published to Figshare](https://figshare.com/articles/dataset/Table_of_differential_gene_expression_between_cell_types_of_i_Salpingoeca_rosetta_i_/28225106?file=51726065) by Leon et al. (2025), as well as the mapping of human gene symbols to *Salpingoeca rosetta* UniProt IDs provided by the updated Zoogle analysis.

In [2]:
salpingoeca_results_filepath = (
    "../../data/2025-04-21-os-portal-reprocessed/per-nonref-species/Salpingoeca-rosetta.tsv"
)
salpingoeca_expression_filepath = "../../data/Srosetta_DifferentialGeneExpression_TPM.txt"

id_mapping = zt.salpingoeca.plotting.create_salpingoeca_id_mapping(
    salpingoeca_results_filepath,
    salpingoeca_expression_filepath,
)
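Both input paths are relative to the notebook's location, so a misplaced file is a likely cause of a failed load. Here is a minimal sketch of a pre-flight check that uses only the standard library. The helper name `find_missing_files` is hypothetical and is not part of `zoogletools`.

```python
from pathlib import Path


def find_missing_files(filepaths):
    """Return the entries of `filepaths` that do not exist as regular files."""
    return [str(path) for path in filepaths if not Path(path).is_file()]


# Hypothetical helper (not part of zoogletools): check the inputs before loading.
required_files = [
    "../../data/2025-04-21-os-portal-reprocessed/per-nonref-species/Salpingoeca-rosetta.tsv",
    "../../data/Srosetta_DifferentialGeneExpression_TPM.txt",
]
missing = find_missing_files(required_files)
if missing:
    print("Missing input files:", ", ".join(missing))
```

If anything is reported as missing, download the Figshare table and the Zoogle results to the paths above before continuing.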

## 2.3 Plot expression
Plot the expression of a selected gene based on the human gene symbol.  
Here, we plot the expression of the *S. rosetta* gene mapped to the human gene `CYB561`. The choanoflagellate gene `cytb561a` was shown by the authors of the [Leon et al. 2024 preprint](https://www.biorxiv.org/content/10.1101/2024.05.25.595918v2.full) to be specifically expressed in thecate cells.

In [3]:
human_gene_symbol = "CYB561"

zt.salpingoeca.plotting.plot_expression_boxplot_and_heatmap(
    human_gene_symbol,
    salpingoeca_expression_filepath,
    id_mapping,
)
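Not every human gene symbol necessarily has a *Salpingoeca rosetta* counterpart in the mapping. The sketch below separates symbols that can be plotted from those that cannot. It assumes `id_mapping` supports `in` lookups keyed by human gene symbol, like a dictionary, which is suggested by how the mapping is indexed elsewhere in this notebook but is not confirmed here. The helper `split_mapped_genes` and the placeholder IDs are hypothetical.

```python
def split_mapped_genes(gene_symbols, id_mapping):
    """Partition gene symbols into those present in the mapping and those absent."""
    mapped = [symbol for symbol in gene_symbols if symbol in id_mapping]
    unmapped = [symbol for symbol in gene_symbols if symbol not in id_mapping]
    return mapped, unmapped


# Stand-in mapping with placeholder values (not real UniProt accessions).
example_mapping = {"CYB561": "PLACEHOLDER_ID_1", "PLS1": "PLACEHOLDER_ID_2"}

mapped, unmapped = split_mapped_genes(["CYB561", "NOT_A_GENE"], example_mapping)
print(mapped)    # ['CYB561']
print(unmapped)  # ['NOT_A_GENE']
```

Running a check like this before plotting lets unmapped symbols be reported up front, instead of surfacing as a lookup error partway through.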

## 2.4 Plot multiple genes
Here, we plot the expression of genes identified through our analysis of the *Salpingoeca rosetta* dataset as potentially useful for modeling human diseases.

In [4]:
selected_genes = ["PLS1", "CORO1A", "CLTC", "UNC13D", "GALC", "B4GALT7", "B3GALNT2"]

for human_gene_symbol in selected_genes:
    fig = zt.salpingoeca.plotting.plot_expression_boxplot_and_heatmap(
        human_gene_symbol,
        salpingoeca_expression_filepath,
        id_mapping,
        output_image_filepath=f"figures/Salpingoeca_{human_gene_symbol}_{id_mapping[human_gene_symbol]}_expression.svg",
        output_html_filepath=f"figures/Salpingoeca_{human_gene_symbol}_{id_mapping[human_gene_symbol]}_expression.html",
    )
    fig.show()
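The loop above writes into a `figures/` directory. Whether the plotting function creates that directory itself is not verified here, so the sketch below shows one way to create it beforehand and build both output paths in one place. The helper `build_output_paths` is hypothetical, and the ID used in the demonstration is a placeholder.

```python
import tempfile
from pathlib import Path


def build_output_paths(human_gene_symbol, salpingoeca_id, output_dir="figures"):
    """Create `output_dir` if needed and return (svg_path, html_path) as strings."""
    output_dir = Path(output_dir)
    # parents=True creates intermediate directories; exist_ok=True makes reruns safe.
    output_dir.mkdir(parents=True, exist_ok=True)
    stem = f"Salpingoeca_{human_gene_symbol}_{salpingoeca_id}_expression"
    return str(output_dir / f"{stem}.svg"), str(output_dir / f"{stem}.html")


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    svg_path, html_path = build_output_paths(
        "PLS1", "PLACEHOLDER_ID", output_dir=Path(tmp_dir) / "figures"
    )
    print(Path(svg_path).name)
    print(Path(html_path).name)
```

In the notebook, the two returned paths could be passed as `output_image_filepath` and `output_html_filepath` in place of the repeated f-strings.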