Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

annotateRegions with own dataset #66

Closed
sinkahak opened this issue Apr 14, 2021 · 1 comment
Closed

annotateRegions with own dataset #66

sinkahak opened this issue Apr 14, 2021 · 1 comment

Comments

@sinkahak
Copy link

Hi! I was wondering, whether it would be possible to use annotateRegions() with my own gene set of e.g. cell cycle phase genes? I tried visualizing the cell cycle phase with PlotFeatures(), but was only able to visualize an individual umap of each gene and that's not what I'm wishing. Do you have any suggestions, what would be the best way of visualizing e.g. cell cycle phase or any other gene sets containing numerous genes?

Thanks in advance!

Br. Sini

@cbravo93
Copy link
Member

Hi!

Assuming you dont have multiome available (only scATAC), you will need to either convert your gene set to regions (as this is the features you have) or the region matrix to a gene accessibility matrix and then score your gene set in that matrix with AUCell. The latter may be the easiest, you can just aggregate the probability scores of the regions surrounding a space around the TSS of the gene. For example:

# Helper function
# Retrieves the cisTopic object regions that intersect with the target regions (e.g. genes, etc...)
# Return is a list of region names (character) split by the @
# queryRegions:  GRanges
intersectToRegionSet <- function(cisTopicObject, targetRegions, splitBy=colnames(targetRegions@elementMetadata)[1], minOverlap=0.4, as.data.frame=FALSE)
{
  # To do: Check types
  if(!is.numeric(minOverlap) || (minOverlap<0 && minOverlap>=1)) stop("minOverlap should be a number between 0 and 1 (percentage of overlap between the regions).")
  if(is.null(targetRegions@elementMetadata) | ncol(targetRegions@elementMetadata)==0) targetRegions@elementMetadata <- DataFrame(name=as.character(targetRegions))
  if(!splitBy %in% colnames(targetRegions@elementMetadata)) stop("missing gene id annotation")
  if(length(table(lengths(targetRegions@elementMetadata[,splitBy]))) >1 ) stop("")
  
  # cisTopicObject
  cisTopicRegions <- cisTopicObject@region.ranges
  #elementMetadata(cisTopicRegions)[["regionNames"]] <- names(cisTopicRegions)
  
  cisTopicRegionsOverlap <- findOverlaps(cisTopicRegions, targetRegions,
                                         minoverlap=1,  #maxgap=0L, select="all",
                                         type="any", ignore.strand=TRUE)#, ...)
  
  if(minOverlap>0)
  {
    # In i-cisTarget, the default is 40% minimum overlap. Both ways: It takes the maximum percentage (of the peak or the ict region)
    # To reproduce those results:
    overlaps <- pintersect(cisTopicRegions[queryHits(cisTopicRegionsOverlap)], targetRegions[subjectHits(cisTopicRegionsOverlap)])
    percentOverlapHuman <- width(overlaps) / width(cisTopicRegions[queryHits(cisTopicRegionsOverlap)])
    percentOverlapPeaks <- width(overlaps) / width(targetRegions[subjectHits(cisTopicRegionsOverlap)])
    maxOverlap <- apply(cbind(percentOverlapHuman, percentOverlapPeaks), 1, max)
    cisTopicRegionsOverlap <- cisTopicRegionsOverlap[maxOverlap > minOverlap]
  }
  
  if (as.data.frame == FALSE){
    hitsPerGene <- split(unname(as.character(cisTopicRegions[queryHits(cisTopicRegionsOverlap)])), unlist(targetRegions[subjectHits(cisTopicRegionsOverlap)]@elementMetadata[,splitBy]))
    regionsPerGene <- lapply(hitsPerGene, unique)
    missingGenes <- targetRegions@elementMetadata[,splitBy][which(!targetRegions@elementMetadata[,splitBy] %in% names(regionsPerGene))]
    regionsPerGene <- c(missing=list(missingGenes), regionsPerGene)
  }
  
  if (as.data.frame == TRUE){
    hitsPerGene <- cbind(unname(as.character(cisTopicRegions[queryHits(cisTopicRegionsOverlap)])), unlist(targetRegions[subjectHits(cisTopicRegionsOverlap)]@elementMetadata[,splitBy]))
    regionsPerGene <- as.data.frame(hitsPerGene[!duplicated(hitsPerGene),])
  }
  return(regionsPerGene)
}
# Take regions +-10kbp around TSS and in gene's introns
gene2regionFile <- 'Common_data/mm10-limited-upstream10000-tss-downstream10000-full-transcript.bed'
# This is a dataframe with gene and chr \t upstream_boundary \t downstream_boundary
gene2region <- import.bed(con=gene2regionFile)
geneNameSplit <- strsplit(gene2region@elementMetadata$name, split = "#", fixed = TRUE)
geneNameClean <- sapply(geneNameSplit, function(x) x[[1]])
gene2region@elementMetadata$name <- geneNameClean
# Get regions in cisTopicobject per gene
geneRegions <- list()
chunk <- function(x,n) split(x, factor(sort(rank(x)%%n)))
chunks <- chunk(1:length(gene2region),10)
for (i in 1:length(chunks)){
  geneRegions <- c(geneRegions, intersectToRegionSet(cisTopicObject, gene2region[chunks[[i]]], splitBy="name", minOverlap=0.4))
}
# Fix duplicated
duplicated_list <- geneRegions[names(geneRegions)[duplicated(names(geneRegions))]]
geneRegions <- geneRegions[-which(names(geneRegions) %in% names(geneRegions)[duplicated(names(geneRegions))])]
for (gene in unique(names(duplicated_list))){
  geneRegions[[gene]] <- duplicated_list[[gene]]
}
# Formatting
missingGenes <- geneRegions[["missing"]]
geneRegionSets <- geneRegions[which(!names(geneRegions) %in% "missing")]
pred.matrix <- predictiveDistribution(cisTopicObject)
gene_act <- t(sapply(geneRegionSets, function(x) apply(pred.matrix[x,, drop=F], 2, sum)))
colnames(gene_act) <- cisTopicObject@cell.names
gene_act <- round(gene_act * 1000000)
# AUCell
cells_rankings <- AUCell_buildRankings(gene_act)
genes <- c("gene1", "gene2", "gene3") # Your genes
geneSets <- list(geneSet1=genes)
cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=nrow(cells_rankings)*0.05)
# Your signature enrichment score, you can add it as metadata to the cisTopicObject to plot

@cbravo93 cbravo93 closed this as completed May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants