# SpaRED Library Processing DEMO

In this tutorial, we will explore the data processing functions available in the SpaRED library, focusing on four key areas:

* Gene Features
* Filtering
* Layer Operations
* Denoising

These processing functions are essential for preparing and refining spatial transcriptomics data, ensuring that it is ready for accurate and efficient analysis. This demonstration will showcase the preprocessing steps used in our paper, providing a detailed look at how to clean your data, extract meaningful features, and perform various operations on data layers.


In [2]:
import matplotlib.pyplot as plt
import matplotlib.image as im
import os
import sys
from pathlib import Path

currentdir = os.getcwd()
parentdir = str(Path(currentdir).parent)
sys.path.insert(0, parentdir)
print(parentdir)

import spared

/media/SSD4/dvegaa/SpaRED


## Load Datasets

The `datasets` module provides the `get_dataset` function, which loads any of the available datasets and returns the adata together with the parameter dictionary. The returned adata is already filtered and processed. Setting the *visualize* parameter to True generates all visualizations. The function also saves the raw_adata (not processed) in case it is required.

We will begin by loading a dataset with the *visualize* parameter set to False, since no images are required for the functions analyzed in this DEMO.

In [None]:
from spared.datasets import get_dataset
import anndata as ad

#get dataset
data = get_dataset("vicari_mouse_brain", visualize=False)

#adata
adata = data.adata

#parameters dictionary
param_dict = data.param_dict

#loading raw adata 
dataset_path = os.getcwd()
files_path = os.path.join(dataset_path, "processed_data/vicari_data/vicari_mouse_brain/")
files = os.listdir(files_path)
adata_path = os.path.join(files_path, files[0], "adata_raw.h5ad")
raw_adata = ad.read_h5ad(adata_path)

## Gene features functions

In this section, we will explore the gene features functions available in the SpaRED library. These functions provide tools to compute the relative and global expression fractions for all genes, as well as the Moran's I for genes in an AnnData object. These calculations provide insights into gene expression patterns and their spatial distribution, and are also used in the preprocessing steps.

### Function: `get_exp_frac`

The `get_exp_frac` function calculates the expression fraction of each gene within individual slides. This function is essential for understanding the local expression patterns of genes across different spatial regions.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** adata collection where non-expressed genes have a value of `0` in the `adata.X` matrix

##### <u>Returns:</u>

An updated AnnData object with the expression fraction information added into the `adata.var['exp_frac']` column. The expression fraction of a gene in a slide is defined as the proportion of spots where that gene is expressed.



In [None]:
from spared.gene_features import get_exp_frac

adata_exp = get_exp_frac(raw_adata)

### Function: `get_glob_exp_frac`

The `get_glob_exp_frac` function calculates the global expression fraction of each gene across the entire dataset. This measure provides a broader view of gene expression patterns, allowing for comparisons across different slides.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** adata collection where non-expressed genes have a value of `0` in the `adata.X` matrix

##### <u>Returns:</u>

An updated AnnData object with the global expression fraction information added into the `adata.var['glob_exp_frac']` column. The global expression fraction of a gene in a dataset is defined as the proportion of spots where that gene is expressed across the entire dataset, as opposed to individual slides.
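
To make the two definitions concrete, here is a minimal NumPy sketch that computes per-slide and global expression fractions on a toy count matrix. It is an illustration only, not SpaRED's implementation: the matrix, the slide labels, and the variable names are all made up, and it does not show how `get_exp_frac` condenses the per-slide values into the single `adata.var['exp_frac']` column.

```python
import numpy as np

# Toy count matrix: 6 spots (rows) x 3 genes (columns); 0 means "not expressed".
counts = np.array([
    [5, 0, 1],
    [3, 0, 0],
    [0, 0, 2],
    [1, 4, 0],
    [0, 2, 0],
    [2, 1, 0],
])
# Hypothetical slide label for each spot.
slide_id = np.array(["s1", "s1", "s1", "s2", "s2", "s2"])

# Boolean matrix marking the spots where each gene is expressed.
expressed = counts > 0

# Per-slide expression fraction: share of spots within each slide expressing each gene.
per_slide_frac = {
    s: expressed[slide_id == s].mean(axis=0) for s in np.unique(slide_id)
}

# Global expression fraction: share of spots in the whole collection expressing each gene.
glob_frac = expressed.mean(axis=0)

print(per_slide_frac)
print(glob_frac)
```

Note how the second gene is absent from slide `s1` but present in every spot of `s2`: its global fraction looks healthy while one of its per-slide fractions is zero, which is why both measures are useful.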

In [None]:
from spared.gene_features import get_glob_exp_frac

adata_exp = get_glob_exp_frac(raw_adata)

### Function: `compute_moran`

The `compute_moran` function calculates Moran's I for each gene, providing a measure of spatial autocorrelation. Moran's I indicates whether gene expression levels are more similar (positive autocorrelation) or more dissimilar (negative autocorrelation) across spatial locations than would be expected by chance. Genes with high Moran's I values exhibit strong spatial patterns in their expression levels.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** An AnnData object to update. Must have expression values in `adata.layers[from_layer]`.
* **from_layer (str):** Key in `adata.layers` with the values used to compute Moran's I.
* **hex_geometry (bool):** Whether the geometry is hexagonal or not.

##### <u>Returns:</u>

An updated AnnData object with the average Moran's I for each gene in the `adata.var[f'{from_layer}_moran']` column.
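
For intuition, the sketch below computes the textbook Moran's I, $I = \frac{N}{W}\frac{\sum_{ij} w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$, for a single gene on a small square grid. It is a simplified illustration under stated assumptions, not the code behind `compute_moran`: the helper names `grid_weights` and `morans_i` are hypothetical, the neighbor graph uses binary 4-neighbor weights on a grid (no hexagonal geometry), and no averaging is performed.

```python
import numpy as np

def grid_weights(n_rows, n_cols):
    """Binary 4-neighbor adjacency matrix for an n_rows x n_cols grid of spots."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:  # neighbor below
                j = (r + 1) * n_cols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < n_cols:  # neighbor to the right
                j = r * n_cols + c + 1
                w[i, j] = w[j, i] = 1.0
    return w

def morans_i(x, w):
    """Moran's I of the values x given a spatial weight matrix w."""
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

w = grid_weights(4, 4)

# A smooth left-to-right gradient: neighboring spots have similar values.
gradient = np.tile(np.arange(4.0), (4, 1)).ravel()
# A checkerboard: every spot differs from all of its neighbors.
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float).ravel()

print(morans_i(gradient, w))
print(morans_i(checker, w))
```

The gradient yields a positive value and the checkerboard a negative one, matching the interpretation of positive and negative autocorrelation given above.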

In [None]:
from spared.gene_features import compute_moran

adata_moran = compute_moran(adata=adata, from_layer="c_d_log1p", hex_geometry=param_dict["hex_geometry"])

## Filtering functions

In this section, we will explore the filtering functions available in the SpaRED library. These functions refine your spatial transcriptomics data by filtering it according to specific criteria. Specifically, we will demonstrate how to filter AnnData objects (adata) by Moran's I, by the parameters defined in `param_dict`, and by specific slides.

### Function: `filter_by_moran`

The `filter_by_moran` function refines the dataset by selecting genes with the highest Moran's I values, which indicates strong spatial autocorrelation. This ensures that the analysis focuses on genes with meaningful spatial patterns, which is crucial for spatial transcriptomics studies.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** An AnnData object to update. The AnnData must contain an `adata.var[f'{from_layer}_moran']` column.
* **n_keep (int):** The number of genes to keep in the filtering process.
* **from_layer (str):** The layer for which the Moran's I was previously computed.

##### <u>Returns:</u>

An updated AnnData object with the filtered genes.

### Filtering by Moran's I values
**Moran's I** is a measure of spatial autocorrelation, indicating whether gene expression levels are more similar (positive autocorrelation) or more dissimilar (negative autocorrelation) across spatial locations than would be expected by chance. This indicates which genes present spatial patterns with biological meaning instead of random patterns.

The `filter_by_moran` function ranks all the genes present in the data by their Moran's I values, obtained from `compute_moran`. Then, it selects the top genes with the highest values (e.g., the top 256 genes), ensuring that the analysis focuses only on those with meaningful spatial variation and not random spatial patterns. This is crucial for spatial transcriptomics studies to identify and analyze biologically significant spatial patterns.
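
The ranking step can be pictured with a few lines of NumPy. This is only a sketch of the idea (rank, then keep the top `n_keep`), with hypothetical gene names and Moran's I values; it does not reproduce how `filter_by_moran` operates on the AnnData object.

```python
import numpy as np

# Hypothetical genes and their Moran's I values.
gene_names = np.array(["g0", "g1", "g2", "g3", "g4"])
moran = np.array([0.10, 0.65, -0.05, 0.40, 0.30])
n_keep = 3

# Sort indices by Moran's I in descending order and keep the first n_keep.
top_idx = np.argsort(moran)[::-1][:n_keep]
kept_genes = gene_names[top_idx]

print(kept_genes)
```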

In [None]:
from spared.filtering import filter_by_moran

adata_moran = filter_by_moran(adata, n_keep=param_dict['top_moran_genes'], from_layer='d_log1p')

### Function: `filter_dataset` 

The `filter_dataset` function refines the dataset by applying a series of filters to ensure the data is meaningful and robust for analysis. This function filters both spots and genes based on user-defined criteria provided in the param_dict.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** An unfiltered AnnData collection.
* **param_dict (dict):** A dictionary containing filtering and processing parameters. In the `param_dict`, the following keys must be present:

    * `cell_min_counts` (*int*):      Minimum total counts for a spot to be valid.
    * `cell_max_counts` (*int*):      Maximum total counts for a spot to be valid.
    * `gene_min_counts` (*int*):      Minimum total counts for a gene to be valid.
    * `gene_max_counts` (*int*):      Maximum total counts for a gene to be valid.
    * `min_exp_frac` (*float*):       Minimum fraction of spots in any slide that must express a gene for it to be valid.
    * `min_glob_exp_frac` (*float*):  Minimum fraction of spots in the whole collection that must express a gene for it to be valid.
    * `wildcard_genes` (*str*):       Path to a `.txt` file with the genes to keep or `None` to filter genes based on the other keys.

##### <u>Returns:</u>

A filtered AnnData collection.

### Explanation

The `filter_dataset` function applies several filters to ensure that the data is meaningful and robust:

1. **Filter Spots by Total Counts:**

    Spots with total counts outside the range [`param_dict['cell_min_counts']`, `param_dict['cell_max_counts']`] are removed. This ensures that all spots have a meaningful number of expressed genes, providing sufficient information for accurate predictions.

2. **Filter Genes by Total Counts:**

    Genes with total counts outside the range [`param_dict['gene_min_counts']`, `param_dict['gene_max_counts']`] are removed. This ensures that all genes have meaningful expression values, providing sufficient information for accurate predictions.

3. **Filter Genes by Expression Fraction:**

    If `param_dict['wildcard_genes']` is None, genes are filtered based on their expression fraction. Genes that are not expressed in at least `param_dict['min_exp_frac']` of spots in each slide and `param_dict['min_glob_exp_frac']` of spots in the whole collection are removed. This discards highly sparse genes, leaving only those that are expressed in a substantial share of spots.

4. **Filter Genes by Wildcard Genes:**

    If `param_dict['wildcard_genes']` is specified, only the genes listed in the file are kept.

5. **Remove Genes with Zero Counts:**

    Finally, genes with zero counts are removed to ensure the dataset is free from non-expressed genes.

These filtering steps ensure that the model is provided with sufficient and meaningful data to learn expression patterns for each predicted or imputed gene.
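
The count-based and fraction-based steps above can be sketched on a toy matrix as follows. This is an illustrative approximation rather than the body of `filter_dataset`: the thresholds are invented, only the global expression fraction is applied (the per-slide fraction and the wildcard file are omitted), and the data lives in a plain NumPy array instead of an AnnData object.

```python
import numpy as np

# Toy count matrix: 5 spots (rows) x 4 genes (columns).
counts = np.array([
    [10, 0, 3, 0],
    [8, 1, 0, 0],
    [0, 0, 0, 0],   # an empty spot
    [12, 0, 5, 0],
    [9, 0, 4, 0],
])
# Hypothetical thresholds; the key names mirror those expected in param_dict.
params = {
    "cell_min_counts": 5, "cell_max_counts": 100,
    "gene_min_counts": 2, "gene_max_counts": 1000,
    "min_glob_exp_frac": 0.5,
}

# Step 1: keep spots whose total counts fall inside the allowed range.
spot_totals = counts.sum(axis=1)
keep_spots = (spot_totals >= params["cell_min_counts"]) & (spot_totals <= params["cell_max_counts"])
kept = counts[keep_spots]

# Step 2: keep genes whose total counts fall inside the allowed range.
gene_totals = kept.sum(axis=0)
keep_genes = (gene_totals >= params["gene_min_counts"]) & (gene_totals <= params["gene_max_counts"])

# Step 3: additionally require a minimum global expression fraction.
glob_frac = (kept > 0).mean(axis=0)
keep_genes &= glob_frac >= params["min_glob_exp_frac"]

filtered = kept[:, keep_genes]
print(filtered.shape)
```

In this toy case the empty spot is dropped first, and the two genes that are rarely or never expressed are removed afterwards.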

In [None]:
from spared.filtering import filter_dataset

adata_filter = filter_dataset(adata, param_dict)

### Function: `get_slide_from_collection`

The `get_slide_from_collection` function extracts a specific slide from a collection of concatenated slides in an AnnData object.

##### <u>Parameters:</u>

* **collection (ad.AnnData):** An AnnData object with all the slides concatenated.
* **slide (str):** The name of the slide to extract from the collection.

##### <u>Returns:</u>

A filtered AnnData object containing only the specified slide.

In [None]:
from spared.filtering import get_slide_from_collection

slide_id = adata.obs.slide_id.unique()[0]
slide_adata = get_slide_from_collection(collection = adata,  slide=slide_id)

### Function: `get_slides_adata` 

The `get_slides_adata` function extracts multiple slides from a collection of concatenated slides in an AnnData object, based on a list of slide names.

##### <u>Parameters:</u>

* **collection (ad.AnnData):** An AnnData object with several slides concatenated.
* **slide_list (str):** A string containing a list of slide names separated by commas.

##### <u>Returns:</u>

A list of AnnData objects, one for each slide included in the `slide_list`.

In [None]:
from spared.filtering import get_slides_adata

all_slides = ",".join(adata.obs.slide_id.unique().to_list())
slides_list = get_slides_adata(collection=adata, slide_list=all_slides)

## Layer Operation functions

In this section, we will explore the layer operation functions available in the SpaRED library. These functions provide essential tools for processing and normalizing transcriptomic data, ensuring accurate and meaningful comparisons across different samples or regions.

### Function: `tpm_normalization` 

The `tpm_normalization` function applies TPM (Transcripts per Million) normalization to an AnnData object. This process adjusts the raw counts by gene length and library size, making the data comparable across different samples or regions.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** The AnnData object to normalize. The counts are taken from `adata.layers[from_layer]`.
* **organism (str):** Organism of the dataset. Must be 'mouse' or 'human'.
* **from_layer (str):** The layer to take the counts from. The data in this layer should be in raw counts.
* **to_layer (str):** The layer to store the results of the normalization.

##### <u>Returns:</u>

An updated AnnData object with TPM values in `adata.layers[to_layer]`.

### TPM Normalization

The purpose behind TPM normalization is to make gene expression levels comparable between different samples or regions. The general framework for TPM normalization involves the following steps:

1. **Count Reads per Gene:** For each gene in a sample, count the number of reads mapped to it. This gives the raw read counts.

2. **Normalize for Gene Length:** Divide the raw read count for each gene by the length of the gene (in kilobases). This step adjusts for the fact that longer genes are more likely to have more reads simply because they are longer. The result is the RPK (Reads Per Kilobase).

3. **Calculate the Scaling Factor:** Sum the RPK values for all genes in a sample to get a scaling factor. This represents the total number of reads per kilobase in the sample.

4. **Normalize for Library Size:** Divide each gene's RPK by the scaling factor and then multiply by $10^6$ to get TPM (Transcripts Per Million). This step adjusts for the total sequencing depth (library size), making expression levels comparable across samples.

TPM normalization accounts for both gene length and library size, making it possible to compare gene expression levels across different samples or regions accurately. It standardizes the data, ensuring that observed differences reflect biological variation rather than technical biases.
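
The four steps translate directly into a few array operations. The sketch below is a generic TPM computation with made-up counts and gene lengths; in `tpm_normalization` the gene lengths depend on the chosen `organism`, which this toy example does not attempt to reproduce.

```python
import numpy as np

# Step 1: raw read counts, 2 spots (rows) x 3 genes (columns).
counts = np.array([
    [100.0, 200.0, 0.0],
    [50.0, 50.0, 100.0],
])
# Hypothetical gene lengths in kilobases.
gene_length_kb = np.array([1.0, 2.0, 4.0])

# Step 2: reads per kilobase (RPK).
rpk = counts / gene_length_kb

# Step 3: per-spot scaling factor (sum of RPK over all genes).
scaling = rpk.sum(axis=1, keepdims=True)

# Step 4: divide by the scaling factor and multiply by one million.
tpm = rpk / scaling * 1e6

print(tpm)
```

By construction, the TPM values of every spot add up to one million, which is what makes spots with different sequencing depths comparable.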

In [None]:
from spared.layer_operations import tpm_normalization

adata.layers['counts'] = adata.X.toarray()
adata = tpm_normalization(adata=adata, organism=param_dict["organism"], from_layer="counts", to_layer="tpm")

### Function: `log1p_transformation`

The `log1p_transformation` function applies a base-2 logarithm to the data after adding 1, i.e. $\log_2(x + 1)$, stabilizing variance and improving the normality of the data. This transformation is particularly useful for gene expression data, which often follows a skewed distribution.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** The AnnData object to transform.
* **from_layer (str):** The layer to take the data from.
* **to_layer (str):** The layer to store the results of the transformation.

##### <u>Returns:</u>

An updated AnnData object with transformed data in `adata.layers[to_layer]`.

### Log1p Transformation

The purpose behind the log1p transformation is to stabilize the variance and improve the normality of the data, making the gene expression data more suitable for downstream analyses. The `log1p_transformation` function adds 1 to the TPM values, to avoid taking the logarithm of zero, and then applies a log base 2 transformation.

Gene expression data often follows a skewed distribution with a few genes having very high expression levels and many genes having low expression levels. By applying the log transformation, the data distribution becomes more symmetrical and closer to a normal distribution.

The log transformation achieves this by compressing the range of expression values. High expression values are compressed more than low expression values, reducing the effect of outliers, while low expression values are expanded slightly, helping to distinguish between low but non-zero expression levels. This transformation also stabilizes the variance across the data. Variance stabilization means that the variability of the data becomes more consistent across different expression levels. 
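
A quick numerical check illustrates the compression. The snippet below applies $\log_2(x + 1)$ to a handful of made-up TPM values with plain NumPy; it is a stand-alone sketch of the transformation, not a call into the library.

```python
import numpy as np

# Hypothetical TPM values spanning six orders of magnitude.
tpm = np.array([0.0, 1.0, 3.0, 1023.0, 1_000_000.0])

# Add 1 so that zeros map to zero, then take the base-2 logarithm.
log1p_base2 = np.log2(tpm + 1.0)

print(log1p_base2)
```

Zeros stay at zero, and a range of a million TPM is squeezed into roughly twenty units on the transformed scale.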

In [None]:
from spared.layer_operations import log1p_transformation

adata = log1p_transformation(adata, from_layer='tpm', to_layer='log1p')

### Function: `combat_transformation`

The `combat_transformation` function applies batch correction to the data using the ComBat algorithm, addressing technical variations and batch effects.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** The AnnData object to transform. Must have logarithmically transformed data in `adata.layers[from_layer]`.
* **batch_key (str):** The column in `adata.obs` that defines the batches.
* **from_layer (str):** The layer to take the data from.
* **to_layer (str):** The layer to store the results of the transformation.

##### <u>Returns:</u>

An updated AnnData object with batch-corrected data in `adata.layers[to_layer]`.

### ComBat Transformation

The purpose behind the ComBat transformation is to correct for batch effects and other technical variations between samples. Batch effects are unwanted variations that arise from differences in sample processing, such as differences in sequencing runs, sample preparation, or other technical factors. The ComBat algorithm adjusts for these variations by modeling the expression data as a combination of biological signal and batch effect. It estimates the batch effect parameters and removes them from the data, producing a corrected dataset.

Batch effects can introduce systematic biases that obscure true biological differences between samples. ComBat correction ensures that the observed variations in gene expression reflect genuine biological differences rather than technical artifacts, improving the reliability and accuracy of downstream analyses.
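
To give a feel for what removing a batch effect means, the sketch below applies per-batch mean centering to toy data. This is deliberately a much simpler stand-in and is not the ComBat algorithm: ComBat also adjusts the per-batch scale and shrinks its estimates with an empirical Bayes step, none of which is shown here. All values and names are hypothetical.

```python
import numpy as np

# Toy log-transformed expression: 6 spots (rows) x 2 genes (columns).
expr = np.array([
    [1.0, 5.0],
    [2.0, 6.0],
    [3.0, 7.0],
    [4.0, 5.0],
    [5.0, 6.0],
    [6.0, 7.0],
])
# Hypothetical batch label per spot; batch "b" is shifted upwards in the first gene.
batch = np.array(["a", "a", "a", "b", "b", "b"])

grand_mean = expr.mean(axis=0)
corrected = expr.copy()
for b in np.unique(batch):
    rows = batch == b
    # Shift each batch so that its per-gene mean matches the overall mean.
    corrected[rows] += grand_mean - expr[rows].mean(axis=0)

print(corrected)
```

After the shift both batches share the same per-gene mean, while the differences between spots inside each batch are preserved.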

In [None]:
from spared.layer_operations import combat_transformation

adata = combat_transformation(adata, batch_key=param_dict['combat_key'], from_layer='log1p', to_layer='c_log1p')

### Function: `get_deltas`

The `get_deltas` function calculates the deviations (deltas) from the mean expression of each gene and stores these values in a specified layer of the AnnData object.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** The AnnData object to update. Must have expression values in `adata.layers[from_layer]`. Must also have the `adata.obs['split']` column with 'train' values.
* **from_layer (str):** The layer to take the data from.
* **to_layer (str):** The layer to store the results of the transformation.

##### <u>Returns:</u>

An updated AnnData object with the deltas in `adata.layers[to_layer]` and mean expression information in `adata.var[f'{from_layer}_avg_exp']`.

### Deltas

The delta value represents the difference between the actual gene expression value and the mean expression value in the training dataset. Previous work, including Mejia et al. (2023), has shown that predicting expression variations (deltas) rather than absolute expression values leads to better performance in gene expression prediction tasks, as evidenced by lower Mean Squared Error (MSE).
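
The computation itself is a subtraction of the per-gene training mean. Here is a minimal sketch with invented values; the array names and the split labels are assumptions chosen to mirror the `adata.obs['split']` column described above.

```python
import numpy as np

# Toy log-transformed expression: 4 spots (rows) x 2 genes (columns).
log_expr = np.array([
    [1.0, 4.0],
    [3.0, 6.0],
    [5.0, 8.0],
    [2.0, 2.0],
])
# Hypothetical split label per spot.
split = np.array(["train", "train", "val", "test"])

# Mean expression of each gene, computed on the training spots only.
train_mean = log_expr[split == "train"].mean(axis=0)

# Deltas: deviation of every spot from the training mean.
deltas = log_expr - train_mean

print(train_mean)
print(deltas)
```

Using only the training spots for the mean keeps information from the validation and test spots out of the reference values, and adding `train_mean` back to the deltas recovers the original expression.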

In [None]:
from spared.layer_operations import get_deltas

adata = get_deltas(adata, from_layer='log1p', to_layer='deltas')

### Function: `add_noisy_layer` 

The `add_noisy_layer` function adds an artificial noisy layer to the AnnData object for experimentation or ablation purposes. This function corrupts the specified prediction layer by introducing noise, either by setting missing values to zero (for log-transformed data) or to the negative mean expression (for delta data).

##### <u>Parameters:</u>

* **adata (ad.AnnData):** The AnnData object to update. Must have the prediction layer, the gene means if it's a delta layer, and the mask layer.
* **prediction_layer (str):** The layer that will be corrupted to create the noisy layer.

##### <u>Returns:</u>

An updated AnnData object with the noisy layer added.
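
The corruption rule described above can be sketched with `np.where`. The arrays below are invented, and the snippet only illustrates the rule (zero for log-transformed data, negative mean for delta data); it is not the library's implementation and does not show the name of the layer that `add_noisy_layer` creates.

```python
import numpy as np

# Toy log-transformed layer: 2 spots (rows) x 2 genes (columns).
log_layer = np.array([
    [2.0, 0.5],
    [1.0, 3.0],
])
# Mask layer: True for valid observations, False for missing values.
mask = np.array([
    [True, False],
    [True, True],
])
# Hypothetical per-gene mean expression.
gene_mean = np.array([1.5, 2.0])

# Log-transformed data: missing entries are set to zero.
noisy_log = np.where(mask, log_layer, 0.0)

# Delta data: missing entries are set to the negative mean expression.
delta_layer = log_layer - gene_mean
noisy_delta = np.where(mask, delta_layer, -gene_mean)

print(noisy_log)
print(noisy_delta)
```

The two rules are consistent with each other: a delta of minus the mean corresponds to an expression of zero once the mean is added back.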

In [None]:
from spared.layer_operations import add_noisy_layer

adata.layers['mask'] = adata.layers['tpm'] != 0
adata = add_noisy_layer(adata=adata, prediction_layer="c_log1p")

### Function: `process_dataset` 

The `process_dataset` function performs a complete processing pipeline on a filtered AnnData object, applying various transformations and normalizations. This function integrates several preprocessing steps, ensuring the data is ready for accurate and robust analysis.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** AnnData object to process. The AnnData should already be filtered.
* **param_dict (dict):** Dictionary that contains filtering and processing parameters. Keys that must be present are:
    * `top_moran_genes (int):` The number of genes to keep after filtering by Moran's I. If set to 0, then the number of genes is internally computed.
    * `combat_key (str):` The column in `adata.obs` that defines the batches for ComBat batch correction. If set to 'None', then no batch correction is performed.
    * `hex_geometry (bool):` Whether the graph is hexagonal or not. If True, then the graph is hexagonal. If False, then the graph is a grid. Set to True only for Visium datasets.

##### <u>Returns:</u>

A processed AnnData object with all the layers and results added. A list of included layers in adata.layers is:

* `counts`: Raw counts of the dataset.
* `tpm`: TPM normalized data.
* `log1p`: Log1p transformed data (base 2.0).
* `d_log1p`: Denoised data with adaptive median filter.
* `c_log1p`: Batch corrected data with ComBat (only if combat_key is not 'None').
* `c_d_log1p`: Batch corrected and denoised data with adaptive median filter (only if combat_key is not 'None').
* `deltas`: Deltas from the mean expression for log1p.
* `d_deltas`: Deltas from the mean expression for d_log1p.
* `c_deltas`: Deltas from the mean expression for c_log1p (only if combat_key is not 'None').
* `c_d_deltas`: Deltas from the mean expression for c_d_log1p (only if combat_key is not 'None').
* `mask`: Binary mask layer. True for valid observations, False for imputed missing values.

In [None]:
from spared.layer_operations import process_dataset

# Reload the raw AnnData from the path built in the Load Datasets section
raw_adata = ad.read_h5ad(adata_path)
processed_adata = process_dataset(adata=raw_adata, param_dict=param_dict)

## Denoising functions

In this section, we will explore the denoising functions available in the SpaRED library. These functions fill in missing data, often referred to as dropout values, using either the median imputation strategy described in (Mejia et al., 2023) or the SpaCKLE method, detailed in our publication (Mejia et al., 2023).

### Median imputation

The median imputation strategy replaces zero values in the gene map with the median of a growing circular region around the patch of interest, up to the 7th unique radial distance. If no value is obtained at the end of this process, the median of the nonzero entries of the whole slide image (WSI) is used.
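
The growing-window idea can be illustrated on a small square grid. The sketch below is a simplified approximation and not the code behind `median_cleaner`: the helper `adaptive_median_fill` is hypothetical, it uses square windows on a regular grid instead of circular regions on the spot graph, and it assumes that the window (zeros and center included) keeps growing until its median is nonzero.

```python
import numpy as np

def adaptive_median_fill(gene_map, max_radius=7):
    """Fill zeros with the median of a growing window, with a global fallback."""
    filled = gene_map.astype(float).copy()
    nonzero = gene_map[gene_map > 0]
    # Fallback: median of all nonzero entries of the map.
    fallback = float(np.median(nonzero)) if nonzero.size else 0.0
    for r, c in zip(*np.nonzero(gene_map == 0)):
        value = fallback
        for radius in range(1, max_radius + 1):
            # Square window around (r, c), clipped at the map borders.
            window = gene_map[max(r - radius, 0): r + radius + 1,
                              max(c - radius, 0): c + radius + 1]
            median = float(np.median(window))
            if median > 0:
                value = median
                break
        filled[r, c] = value
    return filled

# Toy gene map for a single gene; zeros are the values to impute.
gene_map = np.array([
    [4.0, 0.0, 2.0],
    [6.0, 8.0, 0.0],
    [0.0, 0.0, 0.0],
])
filled = adaptive_median_fill(gene_map)
print(filled)
```

Zeros surrounded by enough expressed neighbors receive a local median, while zeros in empty regions fall back to the median of the nonzero entries of the whole map.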

### Function: `median_cleaner`

The `median_cleaner` function processes the AnnData object using an adaptive median filter method for denoising and filling in missing data.

##### <u>Parameters:</u>

* **collection (ad.AnnData):** The AnnData collection to process.
* **from_layer (str):** The layer to compute the adaptive median filter from. Where to clean the noise from.
* **to_layer (str):** The layer to store the results of the adaptive median filter. Where to store the cleaned data.
* **n_hops (int):** The maximum number of concentric rings in the neighbors graph to take into account to compute the median. Analogous to the maximum window size.
* **hex_geometry (bool):** True if the graph has hexagonal spatial geometry (Visium technology). If False, then the graph is a grid.

##### <u>Returns:</u>

An updated AnnData collection with the results of the adaptive median filter stored in the layer  `adata.layers[to_layer]`.

To properly use `median_cleaner`, it is essential that the global expression fraction has been previously calculated and saved in the `AnnData` collection. Furthermore, the dataset should have undergone TPM normalization and log1p transformation to ensure accurate and effective noise removal.

In [None]:
from spared.denoising import median_cleaner

# Get global exp fraction
adata = get_glob_exp_frac(raw_adata)
# X to array
adata.layers['counts'] = adata.X.toarray()
# TPM normalization
adata = tpm_normalization(adata=adata, organism=param_dict["organism"], from_layer='counts', to_layer='tpm')
# Transform the data with log1p (base 2.0)
adata = log1p_transformation(adata, from_layer='tpm', to_layer='log1p')

adata = median_cleaner(adata, from_layer='log1p', to_layer='d_log1p', n_hops=4, hex_geometry=param_dict["hex_geometry"])

### SpaCKLE imputation 

The SpaCKLE imputation strategy leverages the power of Transformers to complete corrupted gene expression vectors. This method outperforms previous gene completion strategies and is able to successfully complete dropout values even when the missing data fraction is up to 70%.

### Function: `spackle_cleaner`

The `spackle_cleaner` function processes the AnnData object using the SpaCKLE method for denoising and filling in missing data. 

##### <u>Parameters:</u>

* **adata (ad.AnnData):** The AnnData object to process. Must have data splits in `adata.obs['split']` with values 'train', 'val', and (optional) 'test'.
* **dataset (str):** The name of the dataset being processed.
* **from_layer (str):** The layer to take the data from for processing.
* **to_layer (str):** The layer to store the results of the SpaCKLE denoising process.
* **device (str):** The device to run the model on (e.g., 'cpu' or 'cuda').
* **lr (float):** The learning rate for training the model. Default is 1e-3.
* **train (bool):** Indicates whether to train a new model or use an existing one. Default is True.
* **get_performance_metrics (bool):** Indicates whether to compute performance metrics. Default is True.
* **load_ckpt_path (str):** Path to the checkpoint file of a pre-trained model. If provided, training is skipped. Default is an empty string.
* **optimizer (str):** The optimizer to use for training. Default is 'Adam'.
* **max_steps (int):** The maximum number of training steps. Default is 1000.

##### <u>Returns:</u>

An updated AnnData object with the denoised layer added and the path to the model's checkpoints used for completing the missing values.

To properly use the `spackle_cleaner` function, several preprocessing steps must be completed to ensure the data is adequately prepared for the denoising process. These steps include computing the average Moran's I for each gene, filtering genes based on their Moran's I values, applying ComBat batch correction, and adding a binary mask layer. Additionally, for optimal performance, the function requires a GPU device.

In [None]:
from spared.denoising import spackle_cleaner
import torch 

# Compute average moran for each gene in the layer d_log1p 
adata = compute_moran(adata, hex_geometry=param_dict["hex_geometry"], from_layer='d_log1p')
# Filter genes by Moran's I
adata = filter_by_moran(adata, n_keep=param_dict['top_moran_genes'], from_layer='d_log1p')
# Apply combat
adata = combat_transformation(adata, batch_key=param_dict['combat_key'], from_layer='d_log1p', to_layer='c_d_log1p')
# Add a binary mask layer 
adata.layers['mask'] = adata.layers['tpm'] != 0
# Define a device
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

adata_1, _  = spackle_cleaner(adata=adata, dataset=data.dataset, from_layer="c_d_log1p", to_layer="c_t_log1p", device=device)


### Function: `spackle_cleaner_experiment`

The `spackle_cleaner_experiment` function is designed to replicate the results presented in the SpaCKLE paper by training a SpaCKLE model or loading an existing model from a checkpoint to process an AnnData object. This function is essential for reproducing the results of the SpaCKLE method on a given dataset.

##### <u>Parameters:</u>

* **adata (ad.AnnData):** The AnnData object containing the dataset to be processed. The object must include data splits in adata.obs['split'], with values 'train', 'val', and optionally 'test'.
* **dataset (str):** The name of the dataset being used. This name is utilized to organize the results and save paths.
* **from_layer (str):** The specific layer in the AnnData object from which the data will be extracted for processing.
* **device (str):** The device to run the model on (e.g., 'cpu' or 'cuda').
* **lr (float):** The learning rate for training the SpaCKLE model. The default value is 1e-3.
* **train (bool):** Indicates whether to train a new model or use an existing one. Default is True.
* **load_ckpt_path (str):** The file path to a checkpoint of a previously trained model. If provided, the model will be loaded from this checkpoint, bypassing the training phase. This path should end with the .ckpt file and be located in a directory containing the corresponding script_params.json file. The default is an empty string.
* **optimizer (str):** The optimizer to use for training. Default is 'Adam'.
* **max_steps (int):** The maximum number of training steps. Default is 1000.

##### <u>Returns:</u>

The function returns an updated AnnData object that has been processed using the SpaCKLE model. Additionally, during execution, the function prints performance metrics to the console, either during training or when loading and testing a pre-trained model. The function also saves the model's checkpoints and related parameters in a directory named after the dataset and the date and time of the run.

This function is particularly useful for researchers looking to replicate the results of the SpaCKLE model on their own datasets, allowing for both model training and testing with existing models.

In [None]:
from spared.denoising import spackle_cleaner_experiment

repro = spackle_cleaner_experiment(adata=adata, dataset=data.dataset, from_layer="c_d_log1p", device=device, lr = 1e-3, train = True, load_ckpt_path = "")


## References

1. Mejia, G., Cárdenas, P., Ruiz, D., Castillo, A., Arbeláez, P.: Sepal: Spatial gene expression prediction from local graphs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. pp. 2294–2303 (October 2023)