## SpaRED Library Plotting DEMO

This demonstration shows how to use the plotting functions available in the SpaRED PyPI library. These functions visualize different aspects of spatial transcriptomics data, which helps with both analysis and presentation. In this demo, we load a dataset and then go through each plotting function in turn.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as im
import os
import sys
from pathlib import Path

currentdir = os.getcwd()
parentdir = str(Path(currentdir).parents[2])
sys.path.insert(0, parentdir)

import spared

### Load Datasets
The `datasets` module provides the function `get_dataset`, which loads any desired dataset and returns an object that holds the adata and the parameter dictionary. The returned adata is already filtered and processed. The function has a parameter called *visualize* that generates all visualizations if set to True. The function also saves the raw adata (not processed) in case it is required.

We will begin by loading a dataset with the *visualize* parameter set to False. This way we can look at each plotting function separately and evaluate the generated images.

In [None]:
from spared.datasets import get_dataset
import anndata as ad

#get_dataset(dataset, visualize)
data = get_dataset("vicari_mouse_brain", visualize=False)

#adata
adata = data.adata

#parameters dictionary
param_dict = data.param_dict

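# Load the raw adata saved by get_dataset.
# The relative path assumes the notebook runs from the folder that contains processed_data/.
dataset_path = os.getcwd()
files_path = os.path.join(dataset_path, "processed_data/vicari_data/vicari_mouse_brain/")
# Sort the listing so the chosen subfolder does not depend on filesystem order
files = sorted(os.listdir(files_path))
adata_path = os.path.join(files_path, files[0], "adata_raw.h5ad")
raw_adata = ad.read_h5ad(adata_path)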
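
Indexing the first entry of a directory listing can pick an unexpected subfolder when several exist. A minimal sketch of a more explicit lookup, assuming the layout used above (subfolders of `files_path` that may each hold an `adata_raw.h5ad`). `find_raw_adata` is a hypothetical helper written for this demo, not part of SpaRED:

```python
from pathlib import Path


def find_raw_adata(files_path, filename="adata_raw.h5ad"):
    """Return every `filename` found one folder level below `files_path`, sorted by path."""
    return sorted(Path(files_path).glob(f"*/{filename}"))
```

Checking the length of the returned list before reading the first match with `ad.read_h5ad` makes a missing or ambiguous file explicit.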

### Plotting Functions

We are ready to explore the plotting functions one by one. This tutorial will demonstrate how to use each function, what to pass as input, and the expected output. First we must define the path where the plots will be saved.


In [None]:
inv_folder_path="/home/dvegaa/spared/docs/inv_plots/vicari_mouse_brain"
os.makedirs(inv_folder_path, exist_ok=True)

`plot_data_distribution_stats` receives as input:

* **dataset (str):** name of the dataset
* **processed_adata (ad.AnnData):** processed adata 
* **path (str):** path to where image will be saved

And plots a pie chart and bar plots of the distribution of spots and slides in the dataset split.

In [None]:
from spared.plotting import plot_data_distribution_stats

plot_data_distribution_stats(dataset=data.dataset, processed_adata=adata, path=os.path.join(inv_folder_path, 'splits_stats.png'))

# Load the saved image
image_path = os.path.join(inv_folder_path, 'splits_stats.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()
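
Every cell in this demo reloads the saved image and displays it with the same four matplotlib calls. As a possible shortcut, the pattern can be wrapped in a small function. `show_saved_image` is a hypothetical helper defined here for illustration; it is not part of SpaRED and uses only matplotlib:

```python
import matplotlib.image as mpimg
import matplotlib.pyplot as plt


def show_saved_image(image_path, figsize=(6, 6), show=True):
    """Load an image file from disk and draw it without axes."""
    img = mpimg.imread(image_path)
    fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(img)
    ax.axis("off")
    if show:
        plt.show()
    # Return the figure so the caller can save or close it
    return fig
```

With this helper, a call such as `show_saved_image(os.path.join(inv_folder_path, 'splits_stats.png'))` would replace the load and display steps of a cell.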

Next is `plot_all_slides`. This function receives as input:
* **dataset (str):** name of the dataset
* **processed_adata (ad.AnnData):** processed adata
* **path (str):** path to where image will be saved

And plots all the whole slide images present in the dataset.

In [None]:
from spared.plotting import plot_all_slides

plot_all_slides(dataset=data.dataset, processed_adata=adata, path=os.path.join(inv_folder_path, 'all_slides.png'))

# Load the saved image
image_path = os.path.join(inv_folder_path, 'all_slides.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()


`plot_exp_frac` receives as input:

* **param_dict (dict):** dictionary of dataset parameters
* **dataset (str):** name of the dataset
* **raw_adata (ad.AnnData):** raw adata 
* **path (str):** path to where image will be saved

And plots a heatmap of the expression fraction and global expression fraction for the complete collection of slides. 

In [None]:
from spared.plotting import plot_exp_frac

plot_exp_frac(param_dict=param_dict, dataset=data.dataset, raw_adata=raw_adata, path=os.path.join(inv_folder_path, 'exp_frac.png'))

# Load the saved image
image_path = os.path.join(inv_folder_path, 'exp_frac.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()
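
To give an intuition for the quantities in this heatmap, here is a toy NumPy sketch. It assumes that the expression fraction of a gene is the share of spots with non-zero counts within one slide, and that the global expression fraction is the same share over all spots in the dataset. Both definitions are assumptions for illustration, the data are made up, and the code is not SpaRED's implementation:

```python
import numpy as np

# Toy count matrix: 4 spots (rows) x 2 genes (columns); values are made up
counts = np.array([[1, 0],
                   [0, 0],
                   [2, 3],
                   [4, 0]])
slide_ids = np.array(["A", "A", "B", "B"])  # slide that each spot belongs to

expressed = counts > 0

# Assumed definition: fraction of spots with non-zero counts, per gene and per slide
exp_frac = {str(s): expressed[slide_ids == s].mean(axis=0) for s in np.unique(slide_ids)}

# Assumed definition: the same fraction over all spots in the dataset
glob_exp_frac = expressed.mean(axis=0)

print(exp_frac["A"].tolist())   # [0.5, 0.0]
print(exp_frac["B"].tolist())   # [1.0, 0.5]
print(glob_exp_frac.tolist())   # [0.75, 0.25]
```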

`plot_histograms` receives as input:

* **processed_adata (ad.AnnData):** processed adata
* **raw_adata (ad.AnnData):** raw adata
* **path (str):** path to where image will be saved

And plots a figure that analyzes the effect of filtering on the data. The first row corresponds to the raw data and the second row to the filtered and processed data. Each row contains histograms of:
1. Counts per cell
2. Cells with expression
3. Total counts per gene
4. Moran's I statistics (only in processed data)

In [None]:
from spared.plotting import plot_histograms

plot_histograms(processed_adata=data.adata, raw_adata=raw_adata, path=os.path.join(inv_folder_path, 'filtering_histograms.png'))

# Load the saved image
image_path = os.path.join(inv_folder_path, 'filtering_histograms.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()
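
The first three histogram quantities are simple reductions of the count matrix. A toy NumPy sketch on made-up data, assuming rows are cells (spots) and columns are genes; this illustrates the quantities only and is not how SpaRED computes them internally:

```python
import numpy as np

# Toy count matrix: 3 cells (rows) x 3 genes (columns); values are made up
counts = np.array([[0, 2, 1],
                   [3, 0, 0],
                   [0, 0, 4]])

counts_per_cell = counts.sum(axis=1)              # total counts in each cell
cells_with_expression = (counts > 0).sum(axis=0)  # per gene: cells with non-zero counts
total_counts_per_gene = counts.sum(axis=0)        # total counts of each gene

print(counts_per_cell.tolist())        # [3, 3, 4]
print(cells_with_expression.tolist())  # [1, 1, 2]
print(total_counts_per_gene.tolist())  # [3, 2, 5]
```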

`plot_random_patches` receives as input:

* **dataset (str):** name of the dataset
* **processed_adata (ad.AnnData):** processed adata
* **path (str):** path to where image will be saved
* **patch_size (int):** the size of the patches

And plots 16 random patches.

In [None]:
from spared.plotting import plot_random_patches

plot_random_patches(dataset=data.dataset, processed_adata=adata, path=os.path.join(inv_folder_path, 'random_patches.png'), patch_size=data.patch_size)

# Load the saved image
image_path = os.path.join(inv_folder_path, 'random_patches.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()

`visualize_moran_filtering` receives as input:

* **param_dict (dict):** dictionary of dataset parameters
* **processed_adata (ad.AnnData):** processed adata 
* **from_layer (str):** The key in adata.layers used for plotting
* **path (str):** path to where image will be saved
* **split_names (dict):** dictionary containing split names
* **top (bool):** if True, the most auto-correlated genes are visualized. If False, the least auto-correlated genes are visualized.

And plots the most or least auto-correlated genes.

In [None]:
# Create folders for top and bottom Moran genes
os.makedirs(os.path.join(inv_folder_path, 'top_moran_genes'), exist_ok=True)
os.makedirs(os.path.join(inv_folder_path, 'bottom_moran_genes'), exist_ok=True)
# Define the layer
layer = 'c_d_log1p'

from spared.plotting import visualize_moran_filtering

visualize_moran_filtering(param_dict=param_dict, processed_adata=adata, from_layer=layer, path=os.path.join(inv_folder_path, 'top_moran_genes', f'{layer}.png'), split_names=data.split_names, top=True)
visualize_moran_filtering(param_dict=param_dict, processed_adata=adata, from_layer=layer, path=os.path.join(inv_folder_path, 'bottom_moran_genes', f'{layer}.png'), split_names=data.split_names, top=False)

# Load the saved image
image_path_top = os.path.join(inv_folder_path, 'top_moran_genes', f'{layer}.png')
img_top = im.imread(image_path_top)

image_path_bot = os.path.join(inv_folder_path, 'bottom_moran_genes', f'{layer}.png')
img_bot = im.imread(image_path_bot)

# Display the image
fig, ax = plt.subplots(1, 2, figsize=(12, 6))

ax[0].imshow(img_top)
ax[0].axis('off')
ax[0].set_title('Top Moran Genes')

ax[1].imshow(img_bot)
ax[1].axis('off')
ax[1].set_title('Bottom Moran Genes')

plt.tight_layout()
plt.show()
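
For readers unfamiliar with the statistic behind this ranking, Moran's I measures spatial autocorrelation: values near 1 mean neighboring spots have similar expression, and negative values mean neighbors tend to differ. The sketch below implements the standard formula with NumPy on a made-up row of four spots. `morans_i` and the `chain` weight matrix are illustrative names, and this is not the routine SpaRED uses:

```python
import numpy as np


def morans_i(values, weights):
    """Moran's I of `values` given a spatial weight matrix `weights` (zero diagonal)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    z = values - values.mean()                    # deviations from the mean
    numerator = (weights * np.outer(z, z)).sum()  # weighted cross-products of neighbors
    denominator = (z ** 2).sum()
    return (len(values) / weights.sum()) * numerator / denominator


# Four spots in a row; each spot is a neighbor of the adjacent ones
chain = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])

smooth = morans_i([1, 2, 3, 4], chain)         # smooth gradient: positive, equals 1/3
alternating = morans_i([1, -1, 1, -1], chain)  # alternating pattern: equals -1
```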


`visualize_gene_expression` receives as input:

* **param_dict (dict):** dictionary of dataset parameters
* **processed_adata (ad.AnnData):** processed adata 
* **from_layer (str):** the key in adata.layers used for plotting (must be *raw* values)
* **path (str):** path to where image will be saved
* **split_names (dict):** dictionary containing split names

And plots the expression of the 4 genes specified in `param_dict['plotting_genes']`. If `param_dict['plotting_genes']` is `None`, 4 genes are randomly selected.

In [None]:
# Create folder for expression plots
os.makedirs(os.path.join(inv_folder_path, 'expression_plots'), exist_ok=True)
# Define the layer
layer = 'counts'

from spared.plotting import visualize_gene_expression

visualize_gene_expression(param_dict=param_dict, processed_adata=adata, from_layer=layer, path=os.path.join(inv_folder_path,'expression_plots', f'{layer}.png'), split_names=data.split_names)

# Load the saved image
image_path = os.path.join(inv_folder_path,'expression_plots', f'{layer}.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()

`plot_clusters` receives as input:

* **dataset (str):** name of the dataset
* **param_dict (dict):** dictionary of dataset parameters
* **processed_adata (ad.AnnData):** processed adata 
* **from_layer (str):** the key in adata.layers used for plotting 
* **path (str):** path to where image will be saved
* **split_names (dict):** dictionary containing split names

And generates a plot that visualizes the Leiden clusters spatially in the slides. More specifically, it plots:
1. The spatial distribution of the Leiden clusters in the slides.
2. UMAP embeddings of each slide colored by Leiden clusters.
3. General UMAP embedding of the complete dataset colored by Leiden clusters and the batch correction key.
4. PCA embeddings of the complete dataset colored by the batch correction key.

In [None]:
# Create folder to save cluster plots
os.makedirs(os.path.join(inv_folder_path, 'cluster_plots'), exist_ok=True)
# Define layer
layer = 'c_d_log1p'

from spared.plotting import plot_clusters

plot_clusters(dataset=data.dataset, param_dict=param_dict, processed_adata=adata, from_layer=layer, path=os.path.join(inv_folder_path, 'cluster_plots', f'{layer}.png'), split_names=data.split_names)

# Load the saved image
image_path = os.path.join(inv_folder_path, 'cluster_plots', f'{layer}.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()


`plot_mean_std` receives as input:

* **dataset (str):** name of the dataset
* **processed_adata (ad.AnnData):** processed adata 
* **raw_adata (ad.AnnData):** raw adata
* **path (str):** path to where image will be saved

And plots a scatter of the mean and standard deviation of the genes present in raw_adata (black) and in every layer of processed_adata with non-zero mean. This function can be used to see the effect of filtering and processing on the genes.

In [None]:
from spared.plotting import plot_mean_std

plot_mean_std(dataset=data.dataset, processed_adata=adata, raw_adata=raw_adata, path=os.path.join(inv_folder_path, 'mean_std_scatter.png'))

# Load the saved image
image_path = os.path.join(inv_folder_path, 'mean_std_scatter.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()

`plot_mean_std_partitions` receives as input:

* **dataset (str):** name of the dataset
* **processed_adata (ad.AnnData):** processed adata 
* **from_layer (str):** the key in adata.layers used for plotting
* **path (str):** path to where image will be saved

And plots a scatter of the mean and standard deviation of the genes present in processed_adata, drawing each data split in a different color. This function is used to see how tractable the task is.

In [None]:
# Create folder to save mean and std partition plots
os.makedirs(os.path.join(inv_folder_path, 'mean_vs_std_partitions'), exist_ok=True)
# Define layer
layer = 'c_d_log1p'

from spared.plotting import plot_mean_std_partitions

plot_mean_std_partitions(dataset=data.dataset, processed_adata=adata, from_layer=layer, path=os.path.join(inv_folder_path, 'mean_vs_std_partitions', f'{layer}.png'))

# Load the saved image
image_path = os.path.join(inv_folder_path, 'mean_vs_std_partitions', f'{layer}.png')
img = im.imread(image_path)

# Display the image
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.axis('off')
plt.show()
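
The points in this scatter are per-gene statistics computed separately for each split. A toy NumPy sketch of that computation, with a made-up expression matrix and hypothetical split labels (`train`, `test`); it illustrates the idea and is not SpaRED's internal code:

```python
import numpy as np

# Toy expression matrix: 4 spots (rows) x 2 genes (columns); values are made up
expr = np.array([[1.0, 3.0],
                 [3.0, 5.0],
                 [2.0, 2.0],
                 [4.0, 6.0]])
splits = np.array(["train", "train", "test", "test"])  # hypothetical split of each spot

stats = {}
for split in ("train", "test"):
    rows = expr[splits == split]
    # Per-gene mean and (population) standard deviation within this split
    stats[split] = (rows.mean(axis=0), rows.std(axis=0))

print(stats["train"][0].tolist(), stats["train"][1].tolist())  # [2.0, 4.0] [1.0, 1.0]
print(stats["test"][0].tolist(), stats["test"][1].tolist())    # [3.0, 4.0] [1.0, 2.0]
```

If the per-gene points of the different splits overlap, the splits have similar statistics.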

`plot_tests` receives as input:

* **patch_size (int):** size of the patches
* **dataset (str):** name of the dataset
* **split_names (dict):** dictionary containing split names
* **param_dict (dict):** dictionary of dataset parameters
* **folder_path (str):** path to the folder where all the images will be saved
* **processed_adata (ad.AnnData):** processed adata 
* **raw_adata (ad.AnnData):** raw adata 

And calls all the plotting functions in the plotting library to create quality control plots.

In [None]:
# Create folder to save all plots
folder_path="/home/dvegaa/spared/docs/all_plots/vicari_mouse_brain"
os.makedirs(folder_path, exist_ok=True)

from spared.plotting import plot_tests

plot_tests(patch_size=data.patch_size, dataset=data.dataset, split_names=data.split_names, param_dict=param_dict, folder_path=folder_path, processed_adata=adata, raw_adata=raw_adata)
