Here, we show how to apply FlowSig to a spatial Stereo-seq dataset of an E9.5 mouse embryo, as originally studied in [Chen et al. (2022)](https://doi.org/10.1016/j.cell.2022.04.003).
The processed data and cell-cell communication inference, which was obtained using [COMMOT](https://commot.readthedocs.io/en/latest/tutorials.html),
can be downloaded from the following Zenodo [repository](https://zenodo.org/doi/10.5281/zenodo.10850397).

In [None]:
import scanpy as sc
import pandas as pd
import flowsig as fs

Load the data, which have been subset to spatially variable genes, as determined by the global Moran's I. In this case, we retain only genes with $I > 0.1$.

In [None]:
adata = sc.read('data/chen22_svg_E9.5.h5ad')
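
To make the filtering criterion concrete, here is a minimal sketch of the global Moran's I, $I = \frac{N}{W} \frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ is the mean-centred expression at spot $i$ and $W = \sum_{i,j} w_{ij}$. It is not the code used to prepare this dataset: the helper names `grid_neighbours` and `morans_i` are hypothetical, and we assume binary weights on a regular grid, whereas spatial analysis libraries often use row-normalised weights.

```python
def grid_neighbours(n_rows, n_cols, queen=False):
    """Map each grid cell index to the indices of its neighbouring cells."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:  # add the four diagonals for an 8-neighbour stencil
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours[r * n_cols + c] = [
                (r + dr) * n_cols + (c + dc)
                for dr, dc in steps
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
    return neighbours


def morans_i(values, neighbours):
    """Global Moran's I with binary weights (w_ij = 1 if j neighbours i)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    total_weight = sum(len(nbrs) for nbrs in neighbours.values())
    cross = sum(z[i] * z[j] for i, nbrs in neighbours.items() for j in nbrs)
    return (n / total_weight) * cross / sum(zi * zi for zi in z)


# Toy 4 x 4 grid, flattened row by row (illustrative values, not real data).
nbrs = grid_neighbours(4, 4)
gradient = [c for r in range(4) for c in range(4)]                # smooth pattern
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]  # alternating pattern

i_gradient = morans_i(gradient, nbrs)
i_checker = morans_i(checkerboard, nbrs)

# Same kind of rule as above: keep a variable only if I exceeds the threshold.
keep_gradient = i_gradient > 0.1
```

A smoothly varying pattern yields a clearly positive $I$ and passes the threshold, while an alternating pattern yields a negative $I$ and would be discarded.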

We construct 20 gene expression modules from the unnormalized spot counts using [NSF](https://www.nature.com/articles/s41592-022-01687-w).

In [None]:
fs.construct_gems_using_nsf(adata,
                            n_gems = 20,
                            layer_key = 'count',
                            length_scale = 5.0)

# Key identifying the precomputed COMMOT output in adata (used in the next step)
commot_output_key = 'commot-cellchat'

We first construct the potential cellular flows from the COMMOT output, which was computed beforehand.

In [None]:
fs.construct_flows_from_commot(adata,
                                commot_output_key,
                                gem_expr_key = 'X_gem',
                                scale_gem_expr = True,
                                flowsig_network_key = 'flowsig_network',
                                flowsig_expr_key = 'X_flow')

Then we retain only the "spatially flowing" variables, i.e. those whose global Moran's I exceeds `moran_threshold`.

In [None]:
fs.determine_informative_variables(adata,
                                    flowsig_expr_key = 'X_flow',
                                    flowsig_network_key = 'flowsig_network',
                                    spatial = True,
                                    moran_threshold = 0.15,
                                    coord_type = 'grid',
                                    n_neighbours = 8)

For spatial data, we need to construct spatial blocks for block bootstrapping. Resampling within these blocks, rather than across the whole tissue, better preserves the spatial correlation structure of the gene expression data. We construct the blocks using simple K-Means clustering over the spatial locations.

In [None]:
fs.construct_spatial_blocks(adata,
                             n_blocks=20,
                             use_graph=False,
                             spatial_block_key = "spatial_block",
                             spatial_key = "spatial")
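
The idea behind the blocks can be sketched in a few lines of plain Python. This is an illustration only, not FlowSig's implementation: `kmeans_blocks` and `block_bootstrap` are hypothetical helpers, the K-Means here is a bare Lloyd's algorithm with random initialisation, and we assume that "sampling within blocks" means resampling spots with replacement separately inside each block.

```python
import random


def kmeans_blocks(coords, n_blocks, n_iter=20, seed=0):
    """Assign each (x, y) location to one of n_blocks clusters (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(coords, n_blocks)
    labels = [0] * len(coords)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(n_blocks),
                key=lambda b: (x - centroids[b][0]) ** 2 + (y - centroids[b][1]) ** 2)
            for x, y in coords
        ]
        # Update step: move each centroid to the mean of its members.
        for b in range(n_blocks):
            members = [coords[i] for i, lab in enumerate(labels) if lab == b]
            if members:  # keep the old centroid if a block ends up empty
                centroids[b] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels


def block_bootstrap(labels, seed=0):
    """Resample spot indices with replacement, separately within each block."""
    rng = random.Random(seed)
    members = {}
    for i, lab in enumerate(labels):
        members.setdefault(lab, []).append(i)
    sample = []
    for lab in sorted(members):
        idx = members[lab]
        sample.extend(rng.choices(idx, k=len(idx)))
    return sample


# Toy 6 x 6 grid of spot locations (illustrative, not the embryo data).
coords = [(x, y) for x in range(6) for y in range(6)]
labels = kmeans_blocks(coords, n_blocks=4)
sample = block_bootstrap(labels, seed=1)
```

Because every resampled spot comes from the same block as the spot it replaces, each bootstrap sample keeps the same number of spots per spatial region as the original data.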

Now we are ready to learn the network.

In [None]:
fs.learn_intercellular_flows(adata,
                        flowsig_key = 'flowsig_network',
                        flow_expr_key = 'X_flow',
                        use_spatial = True,
                        block_key = 'spatial_block',
                        n_jobs = 1,
                        n_bootstraps = 10)
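
One way to think about the role of the bootstraps is that each resampled dataset yields its own graph, and an edge that recurs across many of them is better supported. The sketch below computes such a frequency per directed edge. It assumes that confidence is simply the fraction of bootstrap graphs containing the edge, which may differ from how FlowSig aggregates its results; `edge_frequencies` and the node names are hypothetical.

```python
from collections import Counter


def edge_frequencies(bootstrap_graphs):
    """Fraction of bootstrap graphs in which each directed edge appears."""
    counts = Counter()
    for edges in bootstrap_graphs:
        counts.update(set(edges))  # count each edge at most once per graph
    n = len(bootstrap_graphs)
    return {edge: c / n for edge, c in counts.items()}


# Four toy bootstrap graphs, each given as a list of (source, target) edges.
graphs = [
    [("inflow_A", "GEM-1"), ("GEM-1", "outflow_B")],
    [("inflow_A", "GEM-1")],
    [("inflow_A", "GEM-1"), ("GEM-1", "outflow_B")],
    [("inflow_A", "GEM-1"), ("GEM-2", "outflow_B")],
]
freq = edge_frequencies(graphs)
```

With only a handful of bootstraps, these frequencies are coarse (here, multiples of 0.25), which is worth keeping in mind when choosing an edge threshold.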

Now we perform post-learning validation: we reorient undirected edges in the learnt completed partially directed acyclic graph (CPDAG) so that they flow from inflow to GEM to outflow. After that, we remove low-confidence edges.

In [None]:
# This part is key for reducing false positives
fs.apply_biological_flow(adata,
                        flowsig_network_key = 'flowsig_network',
                        adjacency_key = 'adjacency',
                        validated_adjacency_key = 'adjacency_validated')

edge_threshold = 0.7

# Filter the validated adjacency produced by the previous step
fs.filter_low_confidence_edges(adata,
                                edge_threshold = edge_threshold,
                                flowsig_network_key = 'flowsig_network',
                                adjacency_key = 'adjacency_validated',
                                filtered_adjacency_key = 'adjacency_filtered')

# Save the results (note: this overwrites the input file loaded above)
adata.write('data/chen22_svg_E9.5.h5ad', compression='gzip')
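
The two validation steps can be illustrated on a toy edge list. This is a sketch under stated assumptions rather than FlowSig's implementation: an undirected edge is represented by both of its directions, the set of allowed transitions in `ALLOWED` (inflow to GEM, GEM to GEM, GEM to outflow) is our reading of "inflow to GEM to outflow", and the helper names, node names, and weights are hypothetical.

```python
# Assumed set of permitted (source type, target type) transitions.
ALLOWED = {("inflow", "gem"), ("gem", "gem"), ("gem", "outflow")}


def apply_flow_ordering(edges, node_type):
    """Keep only edges whose direction agrees with inflow -> GEM -> outflow."""
    return {
        (src, dst): weight
        for (src, dst), weight in edges.items()
        if (node_type[src], node_type[dst]) in ALLOWED
    }


def filter_edges(edges, threshold):
    """Drop edges whose confidence weight is below the threshold."""
    return {edge: w for edge, w in edges.items() if w >= threshold}


node_type = {
    "inflow_A": "inflow",
    "GEM-1": "gem",
    "GEM-2": "gem",
    "outflow_B": "outflow",
}

# Toy confidence weights; the first two entries form one undirected edge.
edges = {
    ("inflow_A", "GEM-1"): 0.9,
    ("GEM-1", "inflow_A"): 0.9,
    ("GEM-1", "outflow_B"): 0.8,
    ("GEM-1", "GEM-2"): 0.4,
}

validated = apply_flow_ordering(edges, node_type)
filtered = filter_edges(validated, threshold=0.7)
```

Here the undirected edge is resolved to `inflow_A -> GEM-1`, and the weakly supported `GEM-1 -> GEM-2` edge is removed by the 0.7 threshold, leaving two edges.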