Skip to content

Xijiang1997/BOOST-Ising

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

89 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BOOST-Ising

BOOST-Ising is a method to detect spatially variable (SV) genes in spatial transcriptomics (ST) datasets. It is a Bayesian modeling framework for the analysis of single-gene count data on a large-scale lattice defined by a number of array spots. For each gene, expression counts are clustered into two groups - high-expression level and low-expression level, and spatial pattern is defined by the interaction between these two groups in a modified Ising model.

How to use BOOST-Ising functions

We use MouseOB dataset (Spatial Transcriptomics assay of a slice of Mouse Olfactory Bulb) as an example. This dataset can be found in data file.

The following R packages are required to run the model:

  • Rcpp
  • RcppArmadillo
  • RcppDist
  • mclust
  • edgeR
  • lattice

Firstly, we need to load data and functions. For demonstration, we load the example data with only 10 genes from the mouse olfactory bulb ST data.

load("data/toy_example.Rdata")
source("functions/Boost_Ising_function.R")

The dataset includes two parts: count data and location data. In count data, each column is the expression counts for a gene. Location data is the coordinates to indecate which locations of the tissue slice has been sampled.

Before detecting SV genes, we need to filter the dataset, which can remove sample locations and genes with few expression points.

filter_result <- filter_count(count, loc, min_total = 10, min_percentage = 0)
loc_f <- filter_result[[1]]
count_f <- filter_result[[2]]

In the above function, min_total is the minimum total counts, and locations are selected if the total counts for all genes in this location is not less than it. min_percentage is the minimum percentage of non-zero counts for genes. If a gene has so many zero counts that the percentage of non-zero count is less than this threshold, this gene will be removed.

After filteration, we can run the main detection function Boost_Ising.

Notes: Matrix is the only format acceptable for the 'count' input in the BOOST-Ising function. Each column is the expression counts for a gene. Column names are gene names.

detect_result <- Boost_Ising (count_f,loc_f, norm_method = 'tss', clustermethod = 'MGC')

In this function, we need to determine which normalization method is used. If norm_method = 1, counts data are devided by the summation of total counts for each location, which is at default. There are also other six options for normalization methods: 'q75', 'rle', 'tmm', 'n-vst', 'a-vst' and 'log'. For details of normalization methods, see Table 1 in the supplementary notes for the paper. For clustering method, model-based clustering method is applied. We can choose K-means by setting clustermethod = 'Kmeans'.

The output of this function is a dataframe and each row is the result for one gene.

detect_result

For each gene, 'theta_mean', 'theta_CI_low' and 'theta_CI_high' is the estimated posterior mean and lower and upper bounds of 95% confidence interval for interaction parameter in the modified Ising model. 'omega_mean', 'omega_CI_low' and 'omega_CI_high' is the estimated posterior mean and lower and upper bounds of 95% confidence interval for first-order intensity parameter in the modified Ising model. 'BF_neg' is the Bayes factor favoring against , while 'BF_pos' is the Bayes factor favoring against .

To obtain detected SV genes, we can check the Bayes factor favoring against .

SV_gene <- rownames(detect_result)[which(detect_result$BF_neg > 150)]

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published