Error in ImputeDNAmRef #3

Open
Duuudude opened this issue Dec 20, 2020 · 12 comments

@Duuudude

Duuudude commented Dec 20, 2020

Dear developers,
I encountered an error while running:
refMscm2.m<- ImputeDNAmRef(expref.o$ref$med,db="SCM2",geneID="SYMBOL");
Error in eidNA.v[n] <- xx[[map.idx[n]]][1] : replacement has length zero
I have a different tissue type. Could this error mean that none of my marker genes were found in the database?

I constructed expref.o using:
expref.o <- ConstExpRef( data, celltype.idx, celltype.v );
and successfully obtained the reference matrix:
[1] "Now construct reference"
[1] 1727    5
         Type1 Typel2 Type3 Type4    Type5
CYP19A1      0      0     0     0 3.169925
SERPINE1     0      0     0     0 2.807355
EPS8L1       0      0     0     0 2.321928

@aet21
Owner

aet21 commented Dec 20, 2020

Hi,
First of all, tissue-type is really important, as EpiSCORE is only designed for solid tissue-types. For blood, PBMC, cord-blood, EpiSCORE is not appropriate, as for these tissues we have ample FACS-sorted data to build DNAm reference matrices.
However, the reason why EpiSCORE fails in your case must be something else, because at least some of your marker genes must have been found in the database. One possibility is that your data is from a different species than the one used to build the expression reference, as we did not include support for homology mapping. Or maybe you are missing the org.Hs.eg.db library (unlikely though).

@Duuudude
Author

Duuudude commented Dec 20, 2020

Thank you for the reply!
I am still trying to build the DNAm reference, so I don't think this error is related to different tissue types (I have single-cell and DNAm data from the same tissue and the same species).

I looked into the ImputeDNAmRef function; the error is raised by the following for loop:

for (n in 1:length(na.idx)) {
  eidNA.v[n] <-  xx[[map.idx[n]]][1]
}

where some xx[[map.idx[n]]][1] return NULL, which causes the error.
Can I simply remove the NAs after matching the symbols to Entrez IDs?
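
For readers hitting the same error, here is a minimal sketch of how the lookup in the loop above could be guarded so that an empty match becomes NA instead of stopping with "replacement has length zero". The objects xx and map.idx are toy stand-ins with made-up values, and first_id_or_na is a hypothetical helper, not part of EpiSCORE. Whether dropping the unmatched genes afterwards is appropriate depends on the rest of the function, so treat this as an illustration only.

```r
# Toy stand-ins for the objects used in the loop (hypothetical values).
# xx mimics a symbol -> Entrez ID list; map.idx holds positions in xx,
# with NA where a symbol had no match.
xx <- list(GENEA = "101", GENEB = c("202", "203"), GENEC = NULL)
map.idx <- c(1L, NA, 2L, 3L)

# Hypothetical helper: first ID for one index, or NA when the lookup is empty.
first_id_or_na <- function(idx, lookup) {
  if (is.na(idx)) return(NA_character_)
  hit <- lookup[[idx]]
  if (length(hit) == 0) NA_character_ else as.character(hit[1])
}

# vapply guarantees exactly one character value per index.
eidNA.v <- vapply(map.idx, first_id_or_na, character(1), lookup = xx)

# Logical mask of the entries that received an ID and could be kept.
keep <- !is.na(eidNA.v)
```

Here eidNA.v has one entry per index, and keep marks the genes that could be retained if unmatched ones are to be dropped.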

@Duuudude
Author

Dear developer,
Sorry to bother you again.
You mentioned that the Roadmap Epigenomics database contains 111 tissue and cell types. Did you include the whole database or only a subset of tissue/cell types in EpiSCORE?

@aet21
Owner

aet21 commented Jan 10, 2021

We only used samples for which there was both DNA methylation and RNA-Seq expression. Any sample related to cancer was dropped. At the time we processed the data, there were 45 samples with both types of data in RMAP and 34 samples in SCM2. You can check what the samples are since the database matrix columns are annotated to their tissue/cell-type.

@Duuudude
Author

Thank you for the reply. I checked, and my tissue of interest is included in neither RMAP nor SCM2. Can I still use EpiSCORE to impute the DNAm reference for our samples? Is the assumption that the model trained on the samples in RMAP and SCM2 can be applied universally to other tissues (or cell types)?

@aet21
Owner

aet21 commented Jan 10, 2021

What is your tissue?

@Duuudude
Author

Placenta

@aet21
Owner

aet21 commented Jan 10, 2021

EpiSCORE was designed for adult tissues (e.g. adult lung, breast, skin, brain, ...). I doubt that it can be successfully applied to placenta, as many of the cell types present in placenta are not represented in the RMAP and SCM2 databases. As always, you can try to build the DNAm reference, but if I were a reviewer of your paper, I would demand validation of the DNAm reference.

@bindej99

> Thank you for the reply!
> I am still trying to build the DNAm reference, so I don't think this error is related to different tissue types (I have single-cell and DNAm data from the same tissue and the same species).
>
> I looked into the ImputeDNAmRef function; the error is raised by the following for loop:
>
> for (n in 1:length(na.idx)) {
>   eidNA.v[n] <-  xx[[map.idx[n]]][1]
> }
>
> where some xx[[map.idx[n]]][1] return NULL, which causes the error.
> Can I simply remove the NAs after matching the symbols to Entrez IDs?

Hey,
I have the same problem! Unlike Duuudude, I am using mammary gland as the tissue, as described in the EpiSCORE publication. In my case, the map.idx object contains NAs.

Thank you for your help!

@aeteschendorff

There is a bug in that part of the code where it converts the gene annotation, which will be corrected in due course.
A quick-fix is for you to simply reannotate your expression reference matrix to NCBI/Entrez gene IDs,
then the function will not need to do the conversion, and should run smoothly.
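
As a rough illustration of the quick fix described above, the sketch below re-labels a symbol-annotated reference matrix with Entrez IDs before it is passed on with geneID="ENTREZID". The helper reannotate_to_entrez, the gene names and the IDs are all hypothetical. The lookup vector is built by hand here so the example runs without Bioconductor; the comment shows how it might be obtained from org.Hs.eg.db instead.

```r
# Hypothetical helper: re-label a symbol-annotated reference matrix with
# Entrez IDs. sym2eid is a named character vector (names = gene symbols,
# values = Entrez IDs). With Bioconductor installed, such a vector could be
# obtained with, for example:
#   AnnotationDbi::mapIds(org.Hs.eg.db, keys = rownames(ref.m),
#                         column = "ENTREZID", keytype = "SYMBOL",
#                         multiVals = "first")
reannotate_to_entrez <- function(ref.m, sym2eid) {
  eid.v <- unname(sym2eid[rownames(ref.m)])
  # Drop genes without an Entrez ID and keep the first row of any duplicate ID.
  keep <- !is.na(eid.v) & !duplicated(eid.v)
  out.m <- ref.m[keep, , drop = FALSE]
  rownames(out.m) <- eid.v[keep]
  out.m
}

# Toy reference matrix with made-up symbols and values.
ref.m <- matrix(c(0,   0,   3.2,
                  0,   2.8, 0,
                  2.3, 0,   0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("GENEA", "GENEB", "GENEX"),
                                c("Type1", "Type2", "Type3")))
sym2eid <- c(GENEA = "101", GENEB = "202")  # GENEX has no mapping

refEID.m <- reannotate_to_entrez(ref.m, sym2eid)
```

The unmapped row (GENEX) is dropped rather than kept with an NA row name, so the resulting matrix can be indexed by ID without surprises.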

@hnlmarcus

hnlmarcus commented Jul 29, 2021

I was having the same issue and solved it as you describe. But now I am getting another error:

refMrmap.m <- ImputeDNAmRef(expref.o$ref$med,db="RMAP",geneID="ENTREZID");
Error in p.m[g, ] : subscript out of bound

p.m was my normalized count matrix. This is the third type of error I am getting with the ImputeDNAmRef function. I have tried various inputs and checked for ranges, negative values, etc. Based on the object description of expref.o$ref$med, I tried to make my input as similar as possible to the one used in the vignette, but I cannot get the function to work. I am also using only 4 categories.

traceback()
2: which(p.m[g, ] < 0.2)
1: ImputeDNAmRef(expref.o$ref$med, db = "SCM2", geneID = "ENTREZID")

Do you know what 0.2 is referring to?

@aet21
Owner

aet21 commented Jul 30, 2021

Well, the "p.m" within the ImputeDNAmRef function refers to the expression matrix from the database (RMAP/SCM2), so it has nothing to do with your count matrix. I suspect the problem is related to your expref.o$ref$med. It could be that, for one or more cell types, there are no marker genes that are not expressed in that cell type, which would make "notE.idx" NULL and "p.m" undefined, and you would get the error you are reporting. Alternatively, you still have a problem with your gene annotation. For instance, your Entrez IDs could be e.g. " 2406" (a string with a leading space), which would perhaps not match the numeric identifier 2406 in the database reference matrices we provide.
If the problem is the former, then you probably have an error in the construction of the expression reference matrix.
Hope this helps.
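
The two possible causes mentioned above can be checked before calling ImputeDNAmRef. The sketch below is only an illustration: check_reference is a hypothetical helper, the toy matrix and database IDs are made up, and a value of 0 in the reference matrix is assumed to mean "not expressed", which may differ from the criterion the package uses internally.

```r
# Hypothetical helper: sanity checks on a reference matrix before imputation.
# ref.m  : expression reference matrix with Entrez IDs as row names
# db.ids : character vector of gene IDs (row names) of the database matrix
check_reference <- function(ref.m, db.ids) {
  ids <- rownames(ref.m)
  list(
    # IDs that match only after stripping whitespace, e.g. " 2406" vs "2406"
    whitespace.ids = ids[!(ids %in% db.ids) & trimws(ids) %in% db.ids],
    # number of reference genes found in the database as they are
    n.matched = sum(ids %in% db.ids),
    # per cell type: number of marker genes at 0 (assumed "not expressed")
    n.notexpressed = colSums(ref.m == 0)
  )
}

# Toy inputs with made-up values.
ref.m <- matrix(c(0,   3.1,
                  2.4, 1.2,
                  0,   2.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c(" 2406", "111", "999"),
                                c("Type1", "Type2")))
db.ids <- c("2406", "111", "222")

res <- check_reference(ref.m, db.ids)
```

In this toy case, res$whitespace.ids flags " 2406", and res$n.notexpressed shows that Type2 has no marker gene at zero, which is the kind of situation described above as a possible cause of the error.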
