Accessing topics fits #2

timydaley · 2018-12-10T02:37:44Z

Hi, I want to access the estimated topic distributions and probabilities. How do I do this from a cisTopic fit? The documentation is not clear on this.

cbravo93 · 2018-12-10T15:33:53Z

Hi Tim,

The unnormalized cell assignments throughtout the sampling iterations are stored in cisTopicObject@selected.model$document_expects; while the corresponding unnormalized region assignments are stored in cisTopicObject@selected.model$topics. I have added this to the selectModel() and the cisTopic class documentation.

For normalising cell assignments you can do:

#Z-score
norm_assig <- scale(object@selected.model$document_expects, center=TRUE, scale=TRUE)

#Probability
alpha <- object@calc.params[['runModels']]$alpha/length(object@selected.model$topic_sums)
norm_assig <- apply(object@selected.model$document_sums, 2, function(x) {(x + alpha)/sum(x + alpha)})

For normalising the regions, you can take a look to getRegionsScores(). The normalised topic scores for each region will be stored in object@region.data then.

Hope this is helpful!

timydaley · 2018-12-10T20:16:20Z

The document expects and topics are null. Code and results are shown below. Data is from Buenrostro 2018 and downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96769

library(cisTopic)
countsMM = Matrix::readMM("scATACseq_counts2.txt")
cellNames = scan("cellNames2.txt", what = character())
head(cellNames)
categories = c("HSC", "MPP", "LMPP", "CMP", "CLP", "MEP", "GMP", "MCP", "mono", "UNK", "pDC")
matchCategory <- function(s, categories){
for(k in categories){
if(grepl(toString(k), s)){
return(k)
}
}
return("NA")
}
cellTypes = sapply(cellNames, function(s) matchCategory(toString(s), categories))
counts = as.matrix(countsMM)
dim(counts)
rows2keep = which(rowSums(counts) > 10)
counts = counts[rows2keep, ]
cells2keep = which(colSums(counts) > 50)
counts = counts[ ,cells2keep]
cellTypes = cellTypes[cells2keep]
dim(counts)
colnames(counts) = 1:dim(counts)[2]
peaks = read.table("GSE96769_PeakFile_20160207.bed")
peaks = peaks[rows2keep, ]
rownames(counts) = paste0(peaks[,1], ":", peaks[,2], "-", peaks[,3])
cisTopicObject = createcisTopicObject(count.matrix = counts, project.name='test')
counts.cisTopics = runModels(cisTopicObject, topic=17, nCores=1, burnin = 250, iterations = 500, addModels=FALSE)

summary(counts.cisTopics@selected.model$topics)
Length Class Mode
0 NULL NULL
summary(counts.cisTopics@selected.model)
Length Class Mode
0 NULL NULL
counts.cisTopics@selected.model
NULL
counts.cisTopics@selected.model$document_expects
NULL
counts.cisTopics@selected.model$topics
NULL

cbravo93 · 2018-12-10T20:21:09Z

Hi Tim,

You need to run selectModel first to fill the selected.model slot.

counts.cisTopics <- selectModel(counts.cisTopics)

Otherwise, since you only have a model, you can find it in counts.cisTopics@models.

cbravo93 closed this as completed Dec 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accessing topics fits #2

Accessing topics fits #2

timydaley commented Dec 10, 2018

cbravo93 commented Dec 10, 2018 •

edited

timydaley commented Dec 10, 2018

cbravo93 commented Dec 10, 2018

Accessing topics fits #2

Accessing topics fits #2

Comments

timydaley commented Dec 10, 2018

cbravo93 commented Dec 10, 2018 • edited

timydaley commented Dec 10, 2018

cbravo93 commented Dec 10, 2018

cbravo93 commented Dec 10, 2018 •

edited