decontaminating pancreatic dataset #23

Closed
jmzvillarreal opened this issue Jan 8, 2020 · 1 comment

Comments

@jmzvillarreal

Hi,
I am using SoupX to decontaminate a dataset of pancreatic cells in which acinar enzymes are contaminating non-acinar cells. I used soup-specific genes to determine the contamination fraction and to correct the expression profile, as follows:

WT_36Dir<- c("/local/ljmartinezv/sc_pancreas_M_Serrano/Final_analysis/WT/AL4936/")
WT_36_CellID <- read.table('WT_36_CELLS', header = FALSE, sep= '\t')
WT_36 <- load10X(dataDir = WT_36Dir, cellIDs = WT_36_CellID$V1, keepDroplets = TRUE)
WT_36 <- estimateSoup(WT_36)

Soup specific genes

Soup_genes_36 <- head(WT_36$soupProfile[order(WT_36$soupProfile$est, decreasing = TRUE), ], n = 50)
Soup_genes_36 <- rownames(Soup_genes_36)

Estimating non-expressing cells

useToEst_36 = estimateNonExpressingCells(WT_36, nonExpressedGeneList = list(Soup_genes_36))

Calculating the contamination fraction

WT_36 <- calculateContaminationFraction(WT_36, list(Soup_genes_36), useToEst = useToEst_36)

estimated global contamination fraction of 37.60%

Correcting expression profile

WT_36_decont <- adjustCounts(WT_36)

DropletUtils::write10xCounts("./WT_36Counts", WT_36_decont)

Does that look fine to you?
Thanks in advance,
Jaime.

@constantAmateur
Owner

It looks like you're just using the 50 genes most highly expressed in the soup to determine the contamination fraction. You should not do this: it will over-estimate the contamination fraction (potentially by a lot; I doubt your contamination is really as high as 37%). The correct thing to do is to pick genes that you know should not be expressed in a particular set of cells. I'm not an expert in your context, but something like Insulin in Acinar cells, which you know shouldn't be there, would work. See the vignette for an example.
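
As a rough sketch of that approach, the calls from the original post can be reused with a marker gene set in place of the top soup genes. The gene symbols below are placeholders chosen for illustration (the issue does not state the species or annotation), and `markerGenes` is a hypothetical name; substitute genes you are confident are not expressed in some group of your cells.

```r
library(SoupX)

# Placeholder marker set (an assumption, not from the issue): genes that a
# known group of cells should not express, e.g. insulin genes in acinar cells.
# Replace with symbols that match your species and annotation.
markerGenes <- list(INS = c("Ins1", "Ins2"))

# Continues from the WT_36 SoupChannel built in the original post.
# Pick the cells in which the marker set can be assumed to be non-expressed.
useToEst_36 <- estimateNonExpressingCells(WT_36, nonExpressedGeneList = markerGenes)

# Estimate the contamination fraction from those cells only.
WT_36 <- calculateContaminationFraction(WT_36, markerGenes, useToEst = useToEst_36)

# Correct the expression profile using the new estimate.
WT_36_decont <- adjustCounts(WT_36)
```

Any counts of these genes seen in cells that should not express them can then be attributed to the soup, which is what makes the estimate meaningful.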

Failing that, you're better off setting the contamination fraction to something reasonable (like 10%) and proceeding with that.
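
A minimal sketch of that fallback, assuming a SoupX version that provides `setContaminationFraction` and continuing from the `WT_36` object in the original post. The 10% value is only the example figure mentioned above, not an estimate from the data.

```r
library(SoupX)

# Set the contamination fraction by hand instead of estimating it.
# 0.10 is the example value from the comment above; adjust as appropriate.
WT_36 <- setContaminationFraction(WT_36, 0.10)

# Correct the expression profile using the manually set fraction.
WT_36_decont <- adjustCounts(WT_36)
```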
