# 6. Visualizing Ciona scRNA-Seq expression

This notebook illustrates a variety of functions for visualizing Ciona scRNA-Seq expression data across developmental stages, including:
- Violin plot distributions of gene expression across cell clusters identified in Seurat
- `[TBI]` UMAP scatter plots of gene expression, tissue type, and cluster identity
- `[TBI]` Tissue-specific bubble plots of gene expression across developmental stages

`[TBI]` marks functions that will be implemented in a future PR.

Before running this notebook, run the downloading Snakemake workflow described in the README at the top level of this repo, then run the Jupyter notebooks numbered 0, 1, 3, 4, and 5.

## 6.1 Load necessary modules

Be sure to install `zoogletools` from the top level of this repo.

In [1]:
import zoogletools as zt
from zoogletools.ciona.constants import (
    CIONA_GENE_MODELS_DIRPATH,
    PIEKARZ_DATA_DIRPATH,
    ZOOGLE_RESULTS_DIRPATH,
    CionaStage,
)
from zoogletools.ciona.identifier_mapping import CionaIDTypes



## 6.2 Create identifier mapper object

Because there are multiple different identifiers for the same gene, we need to create an identifier mapper object that can map between different identifier types.

We create this once and reuse it for all plots in this notebook.

### Inputting identifiers
Plots in this module accept several identifier types. Specify the type by passing a member of the `CionaIDTypes` `StrEnum` class as the `input_id_type` parameter (see the plotting functions in the following sections for details). The supported IDs are:
- **HGNC Gene Symbol** (`CionaIDTypes.HGNC_GENE_SYMBOL`). This is the primary gene symbol for the human gene.
- **Ciona UniProt ID** (`CionaIDTypes.NONREF_PROTEIN`). This is the UniProt ID of the Ciona protein.
- **Ciona KY2021 ID** (`CionaIDTypes.KY_ID`). This is the gene identifier from the KY2021 genome.
- **Ciona KH2012 ID** (`CionaIDTypes.KH_ID`). This is the gene identifier from the KH2012 genome.

For the scRNA-Seq data, the input identifier will be mapped to its corresponding `KY_ID`, which is the identifier used for gene expression within the Piekarz dataset.

In [2]:
id_mapper = zt.ciona.identifier_mapping.IdentifierMapper(
    zoogle_results_dirpath=ZOOGLE_RESULTS_DIRPATH,
    ciona_gene_models_dirpath=CIONA_GENE_MODELS_DIRPATH,
)
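
The mapping idea described above can be illustrated with a minimal, self-contained sketch. This is not the `zoogletools` implementation: `ToyIDType`, `ToyMapper`, and `to_ky_id` are hypothetical names, and the identifiers are placeholders rather than real Ciona gene IDs. It only shows how lookup tables keyed by identifier type can resolve any supported input to a single KY-style ID.

```python
from enum import Enum


class ToyIDType(str, Enum):
    """Hypothetical stand-in for CionaIDTypes; member values are made up."""

    HGNC_GENE_SYMBOL = "hgnc_gene_symbol"
    KH_ID = "kh_id"
    KY_ID = "ky_id"


class ToyMapper:
    """Hypothetical mapper: resolves any supported ID type to a KY-style ID."""

    def __init__(self, records):
        # records: one dict per gene, keyed by ToyIDType.
        # Build one lookup table per identifier type: value -> KY-style ID.
        self._to_ky = {id_type: {} for id_type in ToyIDType}
        for record in records:
            ky_id = record[ToyIDType.KY_ID]
            for id_type, value in record.items():
                self._to_ky[id_type][value] = ky_id

    def to_ky_id(self, input_id, input_id_type):
        """Return the KY-style ID for input_id, or None if it is unknown."""
        return self._to_ky[ToyIDType(input_id_type)].get(input_id)


# Placeholder identifiers, not real Ciona gene IDs.
records = [
    {
        ToyIDType.HGNC_GENE_SYMBOL: "GENE_A",
        ToyIDType.KH_ID: "KH.EXAMPLE.1",
        ToyIDType.KY_ID: "KY.EXAMPLE.1",
    },
]
toy_mapper = ToyMapper(records)
ky_from_symbol = toy_mapper.to_ky_id("GENE_A", ToyIDType.HGNC_GENE_SYMBOL)
ky_from_kh = toy_mapper.to_ky_id("KH.EXAMPLE.1", "kh_id")
```

Building the tables once and reusing the object mirrors the reason for creating a single `id_mapper` for the whole notebook: the lookup cost is paid at construction, not per plot.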

## 6.3 Violin plots of gene expression

`zoogletools` includes functions to visualize expression of user-selected genes across cell clusters. The clusters can be colored either by the Seurat cluster number or by the most-represented tissue type found in the cluster.

These plots contain three subplots:
1. **Cell count bar chart.** The top row shows a bar chart of the number of cells in each cluster.
2. **Percent expression bar chart.** The middle row shows a bar chart where the proportion of cells with nonzero expression of the gene of interest is colored.
3. **Expression violin chart.** The bottom row shows a violin chart of the distribution of gene expression for cells with nonzero expression.
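
To make the quantities behind these three rows concrete, here is a small sketch that computes them from toy data using only the standard library. It is an illustration under stated assumptions, not how `plot_expression_violin` works internally: the `cells` list, the tissue labels, the expression values, and the `summarize_clusters` helper are all made up for this example. It also shows one way to pick the most-represented tissue in a cluster, which is the idea behind the tissue coloring.

```python
from collections import Counter, defaultdict

# Toy cells: (cluster, tissue, expression of the gene of interest).
# All values are invented for illustration.
cells = [
    (0, "epidermis", 0.0),
    (0, "epidermis", 2.0),
    (0, "notochord", 1.0),
    (0, "epidermis", 0.0),
    (1, "muscle", 3.0),
    (1, "muscle", 0.0),
]


def summarize_clusters(cells):
    """Compute the per-cluster quantities shown in the three subplots."""
    by_cluster = defaultdict(list)
    for cluster, tissue, expression in cells:
        by_cluster[cluster].append((tissue, expression))

    summary = {}
    for cluster, members in sorted(by_cluster.items()):
        # Only cells with nonzero expression feed the violin distribution.
        nonzero = [expr for _, expr in members if expr > 0]
        tissue_counts = Counter(tissue for tissue, _ in members)
        summary[cluster] = {
            "cell_count": len(members),  # top row
            "percent_expressing": 100 * len(nonzero) / len(members),  # middle row
            "nonzero_expression": nonzero,  # bottom row
            "top_tissue": tissue_counts.most_common(1)[0][0],  # tissue coloring
        }
    return summary


summary = summarize_clusters(cells)
```

Because the violin row uses only nonzero cells, a cluster with a wide violin but a small colored fraction in the middle row expresses the gene in few cells, so the two rows are best read together.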

In [3]:
input_id = "NAXE"
stage = CionaStage.INIG

cluster_plot = zt.ciona.plotting.plot_expression_violin(
    stage=stage,
    input_id=input_id,
    input_id_type=CionaIDTypes.HGNC_GENE_SYMBOL,
    data_dirpath=PIEKARZ_DATA_DIRPATH,
    mapper=id_mapper,
    color_mode="cluster",
)

cluster_plot.show()

tissue_plot = zt.ciona.plotting.plot_expression_violin(
    stage=stage,
    input_id=input_id,
    input_id_type=CionaIDTypes.HGNC_GENE_SYMBOL,
    data_dirpath=PIEKARZ_DATA_DIRPATH,
    mapper=id_mapper,
    color_mode="tissue",
)

tissue_plot.show()

## 6.4 Violin plots of multiple genes across all stages
We can generate violin plots for multiple developmental stages at once using the `plot_expression_violin_for_all_stages` function. The code below generates violin plots for genes identified as tractable for pilot experiments in *Ciona*, colored by cluster and by tissue type, and saves the plots to the `figures` directory.

In [4]:
selected_genes = ["NAXE", "FCHO1", "PGM3", "RBBP7", "NCKAP1L"]

output_dirpath = "figures"

for gene in selected_genes:
    zt.ciona.plotting.plot_expression_violin_for_all_stages(
        input_id=gene,
        input_id_type=CionaIDTypes.HGNC_GENE_SYMBOL,
        data_dirpath=PIEKARZ_DATA_DIRPATH,
        mapper=id_mapper,
        color_mode="cluster",
        output_dirpath=output_dirpath,
    )

    zt.ciona.plotting.plot_expression_violin_for_all_stages(
        input_id=gene,
        input_id_type=CionaIDTypes.HGNC_GENE_SYMBOL,
        data_dirpath=PIEKARZ_DATA_DIRPATH,
        mapper=id_mapper,
        color_mode="tissue",
        output_dirpath=output_dirpath,
    )

100%|██████████| 10/10 [00:10<00:00,  1.08s/it]
100%|██████████| 10/10 [00:09<00:00,  1.05it/s]
100%|██████████| 10/10 [00:08<00:00,  1.14it/s]
100%|██████████| 10/10 [00:09<00:00,  1.10it/s]
100%|██████████| 10/10 [00:10<00:00,  1.05s/it]
100%|██████████| 10/10 [00:10<00:00,  1.06s/it]
100%|██████████| 10/10 [00:10<00:00,  1.01s/it]
100%|██████████| 10/10 [00:09<00:00,  1.02it/s]
100%|██████████| 10/10 [00:08<00:00,  1.12it/s]
100%|██████████| 10/10 [00:09<00:00,  1.05it/s]
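
After the loop finishes, it can be useful to confirm that files were written. The sketch below counts the files in the output directory by extension. The file names and formats that `plot_expression_violin_for_all_stages` produces are not documented here, so the helper `count_outputs_by_suffix` (a hypothetical name) makes no assumption about them and simply reports whatever it finds.

```python
from collections import Counter
from pathlib import Path


def count_outputs_by_suffix(output_dirpath):
    """Count files directly inside output_dirpath, grouped by file extension.

    Returns an empty dict if the directory does not exist.
    """
    directory = Path(output_dirpath)
    if not directory.is_dir():
        return {}
    return dict(Counter(p.suffix for p in directory.iterdir() if p.is_file()))


# "figures" matches the output_dirpath used in the cell above.
print(count_outputs_by_suffix("figures"))
```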
