# FlashDeconv: Fast Spatial Deconvolution via Structure-Preserving Sketching

This notebook demonstrates how to use FlashDeconv for spatial transcriptomics cell type deconvolution through the unified `omicverse.space.Deconvolution` API.

## Why FlashDeconv?

- **Scalability**: Handles millions of spots (Visium HD, Slide-seq) without requiring a GPU
- **Speed**: Uses randomized sketching to achieve O(n) rather than O(n²) complexity
- **Spatial awareness**: Incorporates graph Laplacian regularization for spatially smooth results
- **Integration**: scanpy-style API that works seamlessly with AnnData objects
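
To give some intuition for the sketching idea, here is a minimal NumPy sketch of a Gaussian random projection on synthetic data. It is not FlashDeconv's implementation: the matrix sizes, the Poisson toy data, and the choice of a Gaussian sketch are all assumptions made for illustration. It only shows that projecting the gene dimension down to `sketch_dim` columns roughly preserves pairwise distances between spots.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; 512 mirrors the documented default for sketch_dim.
n_spots, n_genes, sketch_dim = 200, 5000, 512

# Synthetic count-like matrix (spots x genes); not real expression data.
X = rng.poisson(lam=1.0, size=(n_spots, n_genes)).astype(float)

# Gaussian random projection, scaled so squared norms are preserved in expectation.
S = rng.normal(size=(n_genes, sketch_dim)) / np.sqrt(sketch_dim)
X_sketch = X @ S  # shape: (n_spots, sketch_dim)


def pairwise_dist(M):
    """Euclidean distances between all rows of M."""
    sq = (M ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    return np.sqrt(np.clip(d2, 0.0, None))


# Compare distances before and after sketching (upper triangle, no diagonal).
iu = np.triu_indices(n_spots, k=1)
ratio = pairwise_dist(X_sketch)[iu] / pairwise_dist(X)[iu]
print(f"sketched/original distance ratio: mean={ratio.mean():.3f}, std={ratio.std():.3f}")
```

A ratio close to 1 with a small spread means the geometry of the spots survives the projection, which is why downstream fitting can work in the much smaller sketched space.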

## Inputs and Outputs

- **Inputs**:
  - Spatial transcriptomics data (10x Visium, Visium HD, Slide-seq, etc.)
  - Single-cell reference with cell type annotations
- **Outputs**:
  - Cell type proportions per spot (stored in `adata.obsm['flashdeconv']`)
  - Dominant cell type per spot
  - A compatible `adata_cell2location` AnnData object (cell type proportions stored as the `X` matrix) for downstream analysis

## Workflow Overview

1. Load scRNA-seq reference and spatial data (~1 min)
2. Run FlashDeconv deconvolution (~2-5 min for standard Visium)
3. Visualize results (~5 min)

In [None]:
import omicverse as ov
import scanpy as sc
import matplotlib.pyplot as plt

ov.plot_set()

## Step 1: Load Data

### 1.1 Load scRNA-seq reference

The reference should contain cell type annotations in `.obs` (in this example, the `Subset` column).

In [None]:
# Load your scRNA-seq reference
# Example: Human lymph node reference
adata_sc = ov.read('data/sc.h5ad')

# Check cell type annotations
print(adata_sc.obs['Subset'].value_counts())

### 1.2 Load spatial transcriptomics data

In [None]:
# Load spatial data (example: Visium human lymph node)
adata_sp = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")
adata_sp.obs['sample'] = list(adata_sp.uns['spatial'].keys())[0]
adata_sp.var_names_make_unique()

print(f"Spatial data: {adata_sp.n_obs} spots, {adata_sp.n_vars} genes")

## Step 2: Run FlashDeconv Deconvolution

FlashDeconv is integrated into the `omicverse.space.Deconvolution` class. To use it, pass `method='FlashDeconv'` to the `deconvolution()` call.

### Key Parameters

- `sketch_dim`: Dimension of the sketched space (default: 512). Higher values preserve more information at the cost of more compute and memory.
- `lambda_spatial`: Spatial regularization strength (default: 5000). Higher values encourage smoother spatial patterns.
- `n_hvg`: Number of highly variable genes to use (default: 2000).
- `n_markers_per_type`: Number of marker genes per cell type (default: 50).
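
To illustrate what spatial regularization rewards, the following sketch builds a graph Laplacian on a toy grid of spots and evaluates the standard smoothness penalty `trace(Bᵀ L B)` for a smooth and a noisy proportion map. The grid, the 4-neighbour graph, and the exact form of the penalty are assumptions for illustration; the source does not specify how FlashDeconv constructs its graph or objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 10 x 10 grid of spot coordinates (illustrative, not Visium geometry).
side = 10
xs, ys = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
n = coords.shape[0]

# Adjacency: connect spots exactly one grid step apart (4-neighbour graph).
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
A = np.isclose(d, 1.0).astype(float)

# Unnormalized graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A


def smoothness_penalty(B, L):
    """trace(B^T L B): sum over graph edges (i, j) of ||B_i - B_j||^2."""
    return float(np.trace(B.T @ L @ B))


# Smooth map: two cell types with a linear gradient along x.
g = coords[:, 0] / (side - 1)
smooth = np.column_stack([g, 1.0 - g])

# Noisy map: independent random proportions at every spot.
noisy = rng.dirichlet([1.0, 1.0], size=n)

penalty_smooth = smoothness_penalty(smooth, L)
penalty_noisy = smoothness_penalty(noisy, L)
print(f"smooth: {penalty_smooth:.2f}, noisy: {penalty_noisy:.2f}")
```

Under this assumed form, a weight such as `lambda_spatial` would multiply the penalty, so larger values push the solution toward maps where neighbouring spots have similar proportions.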

In [None]:
# Initialize the Deconvolution object
decov_obj = ov.space.Deconvolution(
    adata_sc=adata_sc,
    adata_sp=adata_sp
)

In [None]:
# Run FlashDeconv deconvolution
decov_obj.deconvolution(
    method='FlashDeconv',
    celltype_key_sc='Subset',  # Column containing cell type annotations
    flashdeconv_kwargs={
        'sketch_dim': 512,          # Sketch dimension
        'lambda_spatial': 5000.0,   # Spatial regularization
        'n_hvg': 2000,              # Number of HVGs
        'n_markers_per_type': 50,   # Markers per cell type
    }
)

### Access Results

Results are stored in multiple locations for compatibility:
- `decov_obj.adata_cell2location`: AnnData with cell type proportions as X matrix
- `decov_obj.adata_sp.obsm['flashdeconv']`: DataFrame of proportions
- `decov_obj.adata_sp.obs['flashdeconv_dominant']`: Dominant cell type per spot
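
As a small illustration of how these outputs relate, the sketch below derives a dominant cell type from a toy proportions table with pandas. The table values are made up, and it is an assumption that `flashdeconv_dominant` is computed as the per-spot argmax of the proportions.

```python
import pandas as pd

# Toy stand-in for adata_sp.obsm['flashdeconv'] (rows: spots, columns: cell types).
props = pd.DataFrame(
    {
        "B_naive": [0.7, 0.1, 0.2],
        "FDC": [0.2, 0.6, 0.3],
        "Endo": [0.1, 0.3, 0.5],
    },
    index=["spot_1", "spot_2", "spot_3"],
)

# Dominant cell type: the column with the largest proportion in each row.
dominant = props.idxmax(axis=1)

# Sanity check: how far each spot's proportions are from summing to 1.
row_sum_error = (props.sum(axis=1) - 1.0).abs()

print(dominant)
print(row_sum_error.max())
```

On real results, the same two lines can be applied to `decov_obj.adata_sp.obsm['flashdeconv']` to cross-check the stored dominant labels and to see whether the proportions are normalized per spot.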

In [None]:
# View the result object
decov_obj.adata_cell2location

In [None]:
# View cell type proportions
decov_obj.adata_sp.obsm['flashdeconv'].head()

## Step 3: Visualization

### 3.1 Spatial heatmap of cell type proportions

In [None]:
# Select cell types to visualize
annotation_list = ['B_naive', 'B_GC_LZ', 'T_CD4+_TfH_GC', 'FDC',
                   'B_plasma', 'T_CD4+_naive', 'Endo', 'DC_cDC1']

# Plot spatial distribution
sc.pl.spatial(
    decov_obj.adata_cell2location, 
    cmap='magma',
    color=annotation_list,
    ncols=4, 
    size=1.3,
    img_key='hires',
)

### 3.2 Dominant cell type visualization

In [None]:
# Plot dominant cell type per spot
sc.pl.spatial(
    decov_obj.adata_sp,
    color='flashdeconv_dominant',
    size=1.3,
    img_key='hires',
)

### 3.3 Multi-target overlay

In [None]:
import matplotlib as mpl

# Create color dictionary from reference
if 'Subset_colors' in adata_sc.uns:
    color_dict = dict(zip(
        adata_sc.obs['Subset'].cat.categories,
        adata_sc.uns['Subset_colors']
    ))
else:
    color_dict = None

clust_labels = annotation_list[:5]

with mpl.rc_context({'figure.figsize': (6, 6), 'axes.grid': False}):
    fig = ov.pl.plot_spatial(
        adata=decov_obj.adata_cell2location,
        color=clust_labels, 
        labels=clust_labels,
        show_img=True,
        style='fast',
        max_color_quantile=0.992,
        circle_diameter=4,
        colorbar_position='right',
        palette=color_dict
    )

### 3.4 Pie chart visualization (cropped region)

In [None]:
# Crop a region of interest
adata_cropped = ov.space.crop_space_visium(
    decov_obj.adata_cell2location, 
    crop_loc=(0, 0),      
    crop_area=(500, 1000), 
    library_id=list(decov_obj.adata_cell2location.uns['spatial'].keys())[0], 
    scale=1
)

# Plot with pie charts
fig, ax = plt.subplots(figsize=(8, 4))
sc.pl.spatial(
    adata_cropped, 
    basis='spatial',
    color=None,  
    size=1.3,
    img_key='hires',
    ax=ax,      
    show=False
)

ov.pl.add_pie2spatial(
    adata_cropped,
    img_key='hires',
    cell_type_columns=annotation_list,
    ax=ax,
    colors=color_dict,
    pie_radius=10,
    remainder='gap',
    legend_loc=(0.5, -0.25),
    ncols=4,
    alpha=0.8
)
plt.show()

## Comparison: FlashDeconv vs Other Methods

| Feature | FlashDeconv | Tangram | cell2location |
|---------|-------------|---------|---------------|
| GPU Required | No | Optional | Recommended |
| Approx. runtime (10k spots) | ~2 min | ~15 min | ~60 min |
| Visium HD Support | Yes (native) | Limited | Limited |
| Spatial Regularization | Built-in | No | No |
| API Style | scanpy-like | Custom | Custom |

## Tips and Troubleshooting

### Parameter Tuning

- **For noisy/sparse data**: Increase `lambda_spatial` (e.g., 10000) to smooth more strongly across neighboring spots
- **For dense data (Visium HD 2 µm)**: Increase `sketch_dim` (e.g., 1024)
- **For better accuracy**: Increase `n_hvg` (e.g., 3000); this may help at the cost of longer runtime
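
One way to keep these adjustments organized is a small helper that layers tuning presets over the documented defaults. The helper and the preset names below are hypothetical and not part of omicverse or FlashDeconv; only the parameter names and values come from this notebook.

```python
# Defaults as documented in the "Key Parameters" section.
DEFAULTS = {
    "sketch_dim": 512,
    "lambda_spatial": 5000.0,
    "n_hvg": 2000,
    "n_markers_per_type": 50,
}

# Hypothetical preset names; values follow the tuning tips above.
TUNING_PRESETS = {
    "noisy_sparse": {"lambda_spatial": 10000.0},
    "visium_hd_2um": {"sketch_dim": 1024},
    "more_genes": {"n_hvg": 3000},
}


def build_flashdeconv_kwargs(*presets, **overrides):
    """Merge defaults, then named presets in order, then explicit overrides."""
    kwargs = dict(DEFAULTS)
    for name in presets:
        kwargs.update(TUNING_PRESETS[name])
    kwargs.update(overrides)
    return kwargs


kwargs = build_flashdeconv_kwargs("noisy_sparse", "more_genes")
print(kwargs)
```

The resulting dictionary can be passed as `flashdeconv_kwargs=kwargs` in the `decov_obj.deconvolution(...)` call shown in Step 2.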

### Common Issues

1. **Few overlapping genes**: Ensure gene identifiers match between the spatial and reference data (for example, both use gene symbols rather than one using Ensembl IDs)
2. **Missing spatial coordinates**: Check that `adata_sp.obsm['spatial']` exists
3. **Memory issues with large data**: FlashDeconv is memory-efficient, but for very large datasets, consider subsetting
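
The first two checks can be run before deconvolution with a small pre-flight helper. This is a hypothetical function written for this notebook, not an omicverse API, and the `min_shared` threshold of 1000 genes is an arbitrary assumption to adjust for your data.

```python
def preflight_report(sc_genes, sp_genes, sp_obsm_keys, min_shared=1000):
    """Summarize gene overlap and presence of spatial coordinates.

    min_shared is an arbitrary illustrative threshold, not a FlashDeconv requirement.
    """
    shared = set(sc_genes) & set(sp_genes)
    return {
        "n_shared_genes": len(shared),
        "enough_shared_genes": len(shared) >= min_shared,
        "has_spatial_coords": "spatial" in set(sp_obsm_keys),
    }


# Toy example: the reference uses gene symbols, the spatial data uses Ensembl IDs.
report = preflight_report(
    sc_genes=["CD3E", "CD19", "MS4A1"],
    sp_genes=["ENSG00000198851", "ENSG00000177455", "ENSG00000156738"],
    sp_obsm_keys=["spatial"],
    min_shared=2,
)
print(report)
```

With the objects from Step 1, the call would be `preflight_report(adata_sc.var_names, adata_sp.var_names, adata_sp.obsm.keys())`.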

## Citation

If you use FlashDeconv in your research, please cite:

```
Yang, C., Chen, J. & Zhang, X. FlashDeconv enables atlas-scale,
multi-resolution spatial deconvolution via structure-preserving sketching.
bioRxiv (2025). https://doi.org/10.64898/2025.12.22.696108
```

Also cite OmicVerse for the unified API:

```
Zeng, Z., et al. OmicVerse: a framework for bridging and accelerating 
single-cell multiomics analysis with deep learning.
```