RATS analysis fails with Ensembl annotations #40

Closed
Klim314 opened this issue Nov 3, 2017 · 4 comments
Labels
question User query about usage.


Klim314 commented Nov 3, 2017

With Ensembl annotations and kallisto quantifications, RATS produces only NA results because of the Ensembl ".N" version suffixes on the IDs:

[1] Summary of DTU results:
                       category  tally
1         DTU genes (gene test)      0
2     non-DTU genes (gene test)      0
3          NA genes (gene test)  52636
4      DTU genes (transc. test)      0
5  non-DTU genes (transc. test)      0
6       NA genes (transc. test)  52636
7        DTU genes (both tests)      0
8    non-DTU genes (both tests)      0
9         NA genes (both tests)  52636
10              DTU transcripts      0
11          non-DTU transcripts      0
12               NA transcripts 131195

Looking at the Genes table, every gene is marked ineligible and none of its transcripts are detected by RATS:

dtus_subset$Genes %>% head()
            parent_id  elig sig elig_fx quant_reprod rep_reprod DTU transc_DTU known_transc detect_transc elig_transc maxDprop
1: ENSMUSG00000000001 FALSE  NA      NA           NA         NA  NA         NA            1             0           0       NA
2: ENSMUSG00000000003 FALSE  NA      NA           NA         NA  NA         NA            2             0           0       NA
3: ENSMUSG00000000028 FALSE  NA      NA           NA         NA  NA         NA            3             0           0       NA

Examining the raw data reveals this to be due to the Ensembl gene/transcript version numbers. Stripping the .N suffix resolves this issue.

$boot_data_A
$boot_data_A[[1]]
                    target_id      bs0      bs1    bs10     bs11     bs12     bs13    bs14     bs15     bs16     bs17     bs18
     1:    ENSMUST00000000001       NA       NA      NA       NA       NA       NA      NA       NA       NA       NA       NA
     2:  ENSMUST00000000001.4 23.28946 22.50972 24.6344 24.70217 23.98399 25.11433 24.7658 23.61903 24.20159 23.83014 26.02804
     3:    ENSMUST00000000003       NA       NA      NA       NA       NA       NA      NA       NA       NA       NA       NA
     4: ENSMUST00000000003.13  0.00000  0.00000  0.0000  0.00000  0.00000  0.00000  0.0000  0.00000  0.00000  0.00000  0.00000
     5:    ENSMUST00000000010       NA       NA      NA       NA       NA       NA      NA       NA       NA       NA       NA
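
As a minimal sketch of the suffix stripping mentioned above: `strip_version` is a hypothetical helper, not part of RATS, and the IDs are copied from the output shown here. It assumes the version is always a trailing dot followed by digits, and that dropping it does not make two IDs collide.

```r
# Hypothetical helper (not a RATS function): drop a trailing ".N" version suffix.
strip_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

ids <- c("ENSMUST00000000001.4", "ENSMUST00000000003.13", "ENSMUST00000000010")
stripped <- strip_version(ids)

# Guard against two versioned IDs collapsing into the same unversioned ID.
stopifnot(!anyDuplicated(stripped))

print(stripped)
```

The same function would need to be applied to the ID column of each quantification table before passing the data to RATS, so that the IDs match the annotation exactly.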

The issue seems similar to the one reported for the Pachter lab's sleuth here:
pachterlab/sleuth#58

fruce-ki (Collaborator) commented Nov 3, 2017

RATs does not process or interpret the IDs in any way. Any string is used 'as is'. As such, the IDs in your annotation must match exactly those in the quantification files. It is your responsibility to ensure that the same IDs are used across all the analysis steps.
Note the section about Annotation Discrepancies in the input vignette. RATs will use the provided annotation as its guide. Any IDs in the annotation that are not matched exactly in the quantifications will be assumed to have 0 expression. Any IDs in your quantifications that do not match the annotation will be ignored completely.
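
One way to check for such discrepancies before running the analysis is a plain set comparison. This is only a sketch with toy data: `annot` and `quant_ids` are stand-ins for the real annotation and quantification IDs, and the `target_id` / `parent_id` column names are taken from the output shown earlier in this thread.

```r
# Toy stand-ins for the real annotation table and quantification IDs.
annot <- data.frame(
  target_id = c("ENSMUST00000000001", "ENSMUST00000000003"),
  parent_id = c("ENSMUSG00000000001", "ENSMUSG00000000003"),
  stringsAsFactors = FALSE
)
quant_ids <- c("ENSMUST00000000001.4", "ENSMUST00000000003.13")

# Exact string comparison, since IDs are used as is.
matched    <- intersect(annot$target_id, quant_ids)
annot_only <- setdiff(annot$target_id, quant_ids)  # would be assumed to have 0 expression
quant_only <- setdiff(quant_ids, annot$target_id)  # would be ignored completely

cat("matched:", length(matched),
    "| annotation only:", length(annot_only),
    "| quantification only:", length(quant_only), "\n")
```

If `matched` is empty or much smaller than expected, the two ID sets differ in form and need to be reconciled by the user before the DTU call.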

fruce-ki (Collaborator) commented Nov 3, 2017

Yes it is the same "problem" as the one reported for sleuth.
From my perspective this is a user error, not a program error. I consider as a liability any code "magic" that assumes a certain ID format and changes the provided IDs to conform to that presumed format. I don't think a program should be taking such initiative, because if the presumption is wrong, then the result will be worthless and the error may go unnoticed. I want RATs to work with any format of ID, including non-official formats, so automatically messing with the provided IDs is not a good idea.

fruce-ki added the "question" (User query about usage.) label Nov 3, 2017

fruce-ki (Collaborator) commented Nov 3, 2017

If, however, you did use the same annotation but kallisto chopped off the version numbers, thus creating the mismatch in the IDs, then I may need to consider adding some optional ID "magic", as it is not really a user error if a third-party program edits the IDs.

It wasn't clear from your question what form of IDs are in your annotation, what form are in your quantifications, and whether the same annotation file was used for quantification and DTU.
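
A quick way to answer that is to report, for each side, what fraction of IDs carry a version suffix. This is a sketch only: `has_version` is a hypothetical helper, and the two ID vectors are toy examples in the forms seen in the output above.

```r
# Hypothetical helper: TRUE where an ID ends in a ".N" version suffix.
has_version <- function(ids) {
  grepl("\\.[0-9]+$", ids)
}

# Toy examples; in practice these would come from the annotation and the quantification files.
annot_ids <- c("ENSMUST00000000001", "ENSMUST00000000003")
quant_ids <- c("ENSMUST00000000001.4", "ENSMUST00000000003.13")

frac_annot <- mean(has_version(annot_ids))
frac_quant <- mean(has_version(quant_ids))

cat("versioned IDs in annotation:", frac_annot, "\n")
cat("versioned IDs in quantifications:", frac_quant, "\n")
```

Differing fractions indicate that one side was edited or built from a different source than the other.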

fruce-ki (Collaborator) commented Nov 8, 2017

Hi!
Do you have anything to add to this issue? Did you resolve the problem?

Thanks!
Kimon


2 participants