Incorporating network diffusion and peak location information for better single-cell ATAC-seq data analysis
We recommend creating a new environment with Python 3.10:
conda create -n py310 python=3.10
conda activate py310
The scarp package can be installed via pip:
pip install scarp
scanpy (>=1.9.5)
numpy (>=1.25.2)
scipy (>=1.11.3)
pandas (>=2.1.1)
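To confirm that the active environment meets the minimum versions listed above, a small check along the following lines may help. This is only a sketch: `MIN_VERSIONS`, `version_tuple`, and `check_requirements` are hypothetical names introduced here and are not part of scarp, and the version parsing is deliberately simple (it compares only the leading numeric components).

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
MIN_VERSIONS = {
    "scanpy": "1.9.5",
    "numpy": "1.25.2",
    "scipy": "1.11.3",
    "pandas": "2.1.1",
}


def version_tuple(version):
    """Turn '1.25.2' into (1, 25, 2), keeping only leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MIN_VERSIONS):
    """Return {package: (installed_version_or_None, required_version, ok)}."""
    report = {}
    for name, required in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = (installed is not None
              and version_tuple(installed) >= version_tuple(required))
        report[name] = (installed, required, ok)
    return report


if __name__ == "__main__":
    for name, (installed, required, ok) in check_requirements().items():
        status = "OK" if ok else "missing or too old"
        print(f"{name}: installed={installed}, required>={required} ({status})")
```

Run it inside the `py310` environment after `pip install scarp`; any package reported as missing or too old can then be upgraded with pip.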
Please check out the tutorials here.
You can download an example dataset here.
import scanpy as sc
data_name = 'Leukemia'
data = sc.read_h5ad('./Example_data/Leukemia.h5ad')
from scarp import model
Cells_df = model.SCARP(adata=data,
                       data_name=data_name,
                       plot_SD=True,
                       verbose=True
                       )
parameter name | description | type | default |
---|---|---|---|
adata | input scATAC-seq data | h5ad | None |
data_name | name of this dataset | str | None |
m | parameter to control NR diffusion intensity | float | 1.5 |
gamma | parameter to control the threshold for merging adjacent chromosomes | int | 3000 |
beta | parameter to control the extent to which prior edge weight decays | int | 5000 |
return_shape | shape of the returned matrix | str | 'CN' |
peak_loc | use peak location prior information or not | bool | True |
parallel | parallel computing or not. 0 means automatically determined | int | 0 |
plot_SD | plot the SDs of PCs or not | bool | True |
fig_size | figure size of the SD plot | tuple | (4,3) |
save_file | file path for saving the SD plot when plot_SD is True | str | None |
verbose | print the process or not | bool | True |
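When overriding several of these defaults, it can be handy to validate the values before starting a long run. The sketch below transcribes the defaults and types from the table into plain dictionaries; `SCARP_DEFAULTS`, `SCARP_TYPES`, and `build_scarp_kwargs` are hypothetical helpers introduced for illustration, not part of the scarp package, and the checks reflect only what the table states.

```python
# Defaults transcribed from the parameter table above (adata and data_name
# are omitted because they are supplied per dataset).
SCARP_DEFAULTS = {
    "m": 1.5,
    "gamma": 3000,
    "beta": 5000,
    "return_shape": "CN",
    "peak_loc": True,
    "parallel": 0,
    "plot_SD": True,
    "fig_size": (4, 3),
    "save_file": None,
    "verbose": True,
}

# Accepted Python types per parameter, following the "type" column.
SCARP_TYPES = {
    "m": (int, float),
    "gamma": int,
    "beta": int,
    "return_shape": str,
    "peak_loc": bool,
    "parallel": int,
    "plot_SD": bool,
    "fig_size": tuple,
    "save_file": (str, type(None)),
    "verbose": bool,
}


def build_scarp_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the table defaults,
    rejecting unknown parameter names and values of the wrong type."""
    kwargs = dict(SCARP_DEFAULTS)
    for name, value in overrides.items():
        if name not in SCARP_DEFAULTS:
            raise KeyError(f"unknown SCARP parameter: {name!r}")
        expected = SCARP_TYPES[name]
        # bool is a subclass of int, so reject True/False for non-bool parameters.
        if isinstance(value, bool) and expected is not bool:
            raise TypeError(f"{name} must not be a bool")
        if not isinstance(value, expected):
            raise TypeError(f"{name} has unexpected type {type(value).__name__}")
        kwargs[name] = value
    return kwargs


# Example: stronger diffusion, no SD plot. The resulting dict could then be
# passed on as
#   model.SCARP(adata=data, data_name=data_name, **kwargs)
kwargs = build_scarp_kwargs(m=2.0, plot_SD=False)
```

Keeping the defaults in one place also makes it easy to record the exact settings used for a given run.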
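# The individual stages can also be called one at a time.
# Number of peaks; this assumes a cell-by-peak matrix (cells in rows, peaks in columns).
Peaks_num = data.shape[1]
t, diffusion_mat = model.SCARP_diffusion_mat(adata=data)
k = model.SCARP_SD_plot(data=diffusion_mat,
                        peaks_num=Peaks_num,
                        title=data_name,
                        plot_SD=True)
Cell_embedding = model.SCARP_cell_embedding(diffusion_mat=diffusion_mat,
                                            kept_comp=k)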
For reproducibility, we provide all the necessary scripts and data here.