Tximport fails for experiments where samples were processed with different versions of Salmon #1496

Closed
arielsvn opened this issue Aug 15, 2019 · 1 comment · Fixed by #1513

Context

Some tximport jobs are failing with the following error message:

data_refinery=> select id, failure_reason from processor_jobs where pipeline_applied='TXIMPORT' and created_at > '2019-08-12 00:00:00'::timestamp and success='f' and failure_reason is not null order by end_time desc limit 1 OFFSET 4;
   id    |                                                         failure_reason
---------+---------------------------------------------------------------------------------------------------------------------------------
 3374939 | Found non-zero exit code from R code while running tximport.R: Read 1658 items                                                 +
         | Parsed with column specification:                                                                                              +
         | cols(                                                                                                                          +
         |   gene_id = col_character(),                                                                                                   +
         |   tx_name = col_character()                                                                                                    +
         | )                                                                                                                              +
         | reading in files with read_tsv                                                                                                 +
         | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Error: all(txId == raw[[txIdCol]]) is not TRUE+
         | In addition: Warning message:                                                                                                  +
         | In txId == raw[[txIdCol]] :                                                                                                    +
         |   longer object length is not a multiple of shorter object length                                                              +
         | Execution halted
(1 row)

Problem or idea

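The cause is that, in some experiments, the samples were processed with different versions of Salmon, and those versions generate quant files that differ from one another. The `all(txId == raw[[txIdCol]])` check in the log fails when the transcript IDs in one quant file do not match those in another.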
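
The failing check in the log, `all(txId == raw[[txIdCol]])`, appears to compare transcript IDs across the quant files being imported. A minimal sketch of a similar consistency check is below, assuming each quant file is a tab-separated table whose `Name` column holds the transcript ID (the Salmon `quant.sf` layout). The function names and the sample data are hypothetical and only illustrate the idea; this is not the code tximport runs.

```python
import csv
import io


def read_transcript_ids(quant_text):
    """Return the transcript IDs (the `Name` column) of a quant.sf-style table."""
    reader = csv.DictReader(io.StringIO(quant_text), delimiter="\t")
    return [row["Name"] for row in reader]


def find_mismatched_quant_files(quant_texts):
    """Return the labels of files whose transcript IDs differ from the first file.

    `quant_texts` maps a sample label to the text of its quant file.
    The first entry is used as the reference.
    """
    labels = list(quant_texts)
    if not labels:
        return []
    reference = read_transcript_ids(quant_texts[labels[0]])
    return [
        label
        for label in labels[1:]
        if read_transcript_ids(quant_texts[label]) != reference
    ]


HEADER = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
example = {
    "sample_a": HEADER + "tx1\t100\t80\t1.0\t5\ntx2\t200\t180\t2.0\t9\n",
    "sample_b": HEADER + "tx1\t100\t80\t0.5\t3\ntx2\t200\t180\t1.5\t7\n",
    # Made-up file with a different transcript list, standing in for a
    # sample quantified with another Salmon version.
    "sample_c": HEADER + "tx1\t100\t80\t0.5\t3\n",
}
print(find_mismatched_quant_files(example))
```

Running a check like this before tximport would flag the samples that need to be re-quantified, rather than letting the R script fail partway through the import.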

Solution or next step

After talking with @kurtwheeler, the plan is to re-run the latest version of Salmon on the unprocessed experiments and then re-run tximport.

arielsvn commented Aug 15, 2019

This appears to affect 659 experiments, with 58263 samples in total.

Of those, only 2373 samples have already been processed with some version of Salmon.
These seem to be experiments where tximport originally worked, and we later ran Salmon on other samples from them. 🌮 @kurtwheeler

Queries for reference:
data_refinery=> select count(samples.id)
from experiments as E 
inner join experiment_sample_associations on E.id=experiment_sample_associations.experiment_id
inner join samples on samples.id=experiment_sample_associations.sample_id
where (
    select count(distinct(organism_index.salmon_version))
    from samples
      inner join sample_result_associations on samples.id=sample_result_associations.sample_id
      inner join computational_results on sample_result_associations.result_id=computational_results.id
      inner join organism_index on computational_results.organism_index_id=organism_index.id
    where
      samples.id in (select sample_id from experiment_sample_associations where experiment_id=E.id)
  ) > 1;
 count 
-------
 58263
(1 row)

data_refinery=> select count(samples.id)
from experiments as E 
inner join experiment_sample_associations on E.id=experiment_sample_associations.experiment_id
inner join samples on samples.id=experiment_sample_associations.sample_id
where (
    select count(distinct(organism_index.salmon_version))
    from samples                                                                                       
      inner join sample_result_associations on samples.id=sample_result_associations.sample_id
      inner join computational_results on sample_result_associations.result_id=computational_results.id
      inner join organism_index on computational_results.organism_index_id=organism_index.id
    where
      samples.id in (select sample_id from experiment_sample_associations where experiment_id=E.id)
  ) > 1 
  and samples.is_processed='t';
 count 
-------
  2373
(1 row)

data_refinery=> select count(*)         
from experiments as E 
where (                                                                                       
    select count(distinct(organism_index.salmon_version))
    from samples
      inner join sample_result_associations on samples.id=sample_result_associations.sample_id
      inner join computational_results on sample_result_associations.result_id=computational_results.id
      inner join organism_index on computational_results.organism_index_id=organism_index.id
    where                                                                                              
      samples.id in (select sample_id from experiment_sample_associations where experiment_id=E.id)
  ) > 1;
 count 
-------
   659
(1 row)
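
The queries above all rely on the same condition: an experiment is affected when its samples are linked to more than one distinct `salmon_version`. A minimal sketch of that grouping logic is below, assuming the joined rows have already been fetched as `(experiment_id, sample_id, salmon_version)` tuples. The function name and the placeholder version strings are hypothetical.

```python
from collections import defaultdict


def experiments_with_mixed_versions(rows):
    """Return sorted experiment ids whose samples span more than one Salmon version.

    `rows` is an iterable of (experiment_id, sample_id, salmon_version) tuples.
    Rows with no version are ignored, as COUNT(DISTINCT ...) ignores NULLs.
    """
    versions = defaultdict(set)
    for experiment_id, _sample_id, salmon_version in rows:
        if salmon_version is not None:
            versions[experiment_id].add(salmon_version)
    return sorted(exp for exp, seen in versions.items() if len(seen) > 1)


# Placeholder data: the version strings are made up for illustration.
rows = [
    (1, 10, "version_a"),
    (1, 11, "version_b"),
    (2, 20, "version_b"),
    (2, 21, "version_b"),
    (3, 30, None),
]
print(experiments_with_mixed_versions(rows))
```

Only experiment 1 is reported here, because it is the only one whose samples carry two different versions.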
