Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accessing topics fits #2

Closed
timydaley opened this issue Dec 10, 2018 · 3 comments
Closed

Accessing topics fits #2

timydaley opened this issue Dec 10, 2018 · 3 comments

Comments

@timydaley
Copy link

Hi, I want to access the estimated topic distributions and probabilities. How do I do this from a cisTopic fit? The documentation is not clear on this.

@cbravo93
Copy link
Member

cbravo93 commented Dec 10, 2018

Hi Tim,

The unnormalized cell assignments throughtout the sampling iterations are stored in cisTopicObject@selected.model$document_expects; while the corresponding unnormalized region assignments are stored in cisTopicObject@selected.model$topics. I have added this to the selectModel() and the cisTopic class documentation.

For normalising cell assignments you can do:

#Z-score
norm_assig <- scale(object@selected.model$document_expects, center=TRUE, scale=TRUE)

#Probability
alpha <- object@calc.params[['runModels']]$alpha/length(object@selected.model$topic_sums)
norm_assig <- apply(object@selected.model$document_sums, 2, function(x) {(x + alpha)/sum(x + alpha)})

For normalising the regions, you can take a look to getRegionsScores(). The normalised topic scores for each region will be stored in object@region.data then.

Hope this is helpful!

@timydaley
Copy link
Author

The document expects and topics are null. Code and results are shown below. Data is from Buenrostro 2018 and downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96769

library(cisTopic)
countsMM = Matrix::readMM("scATACseq_counts2.txt")
cellNames = scan("cellNames2.txt", what = character())
head(cellNames)
categories = c("HSC", "MPP", "LMPP", "CMP", "CLP", "MEP", "GMP", "MCP", "mono", "UNK", "pDC")
matchCategory <- function(s, categories){
for(k in categories){
if(grepl(toString(k), s)){
return(k)
}
}
return("NA")
}
cellTypes = sapply(cellNames, function(s) matchCategory(toString(s), categories))
counts = as.matrix(countsMM)
dim(counts)
rows2keep = which(rowSums(counts) > 10)
counts = counts[rows2keep, ]
cells2keep = which(colSums(counts) > 50)
counts = counts[ ,cells2keep]
cellTypes = cellTypes[cells2keep]
dim(counts)
colnames(counts) = 1:dim(counts)[2]
peaks = read.table("GSE96769_PeakFile_20160207.bed")
peaks = peaks[rows2keep, ]
rownames(counts) = paste0(peaks[,1], ":", peaks[,2], "-", peaks[,3])
cisTopicObject = createcisTopicObject(count.matrix = counts, project.name='test')
counts.cisTopics = runModels(cisTopicObject, topic=17, nCores=1, burnin = 250, iterations = 500, addModels=FALSE)

summary(counts.cisTopics@selected.model$topics)
Length Class Mode
0 NULL NULL
summary(counts.cisTopics@selected.model)
Length Class Mode
0 NULL NULL
counts.cisTopics@selected.model
NULL
counts.cisTopics@selected.model$document_expects
NULL
counts.cisTopics@selected.model$topics
NULL

@cbravo93
Copy link
Member

Hi Tim,

You need to run selectModel first to fill the selected.model slot.

counts.cisTopics <- selectModel(counts.cisTopics)

Otherwise, since you only have a model, you can find it in counts.cisTopics@models.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants