
automating dammit over many assemblies, problem with dependencies #34

Closed

johnsolk opened this issue Dec 6, 2015 · 3 comments

@johnsolk (Member) commented Dec 6, 2015

I wrote a script to run dammit separately for many assemblies. The script writes and runs a dammitfile for each one. Example contents of a dammitfile:

dammit annotate /mnt/mmetsp/Micromonas_pusilla/SRR1300457/trinity/trinity_out/Trinity.fasta \
--busco-group eukaryota --database-dir /mnt/dammit_databases --n_threads 8

But when the automated script runs with subprocess.Popen("sudo bash " + dammitfile), there is an error that some, but not all, of the dependencies (TransDecoder, LAST, BUSCO) are not installed (below). If I run the same command manually it works fine, with no problems. Is there something I can do so the subprocess finds the dependencies?

File written: /mnt/mmetsp/Erythrolobus_madagascarensis/SRR1300444/dammit_dir/SRR1300444.dammit.sh

========================================
dammit! a tool for easy de novo transcriptome annotation
Camille Scott 2015
========================================

submodule: annotate

--- Checking PATH for dependencies

          [ ] TransDecoder
          [ ] LAST
          [x] HMMER
          [x] Infernal
          [x] crb-blast
          [x] BLAST+
          [ ] BUSCO

--- Dependency results

          TransDecoder, LAST, BUSCO missing

Install dependencies to continue; exiting[DependencyHandler:ERROR]
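
For concreteness, a minimal sketch of the per-assembly driver described above; the assembly list, the dammitfile naming, and the write_dammitfile helper are hypothetical, not taken from the actual script:

```python
import os

DATABASE_DIR = "/mnt/dammit_databases"

# Hypothetical list of assemblies; in practice this would be built by
# walking the /mnt/mmetsp directory tree.
assemblies = [
    "/mnt/mmetsp/Micromonas_pusilla/SRR1300457/trinity/trinity_out/Trinity.fasta",
]

def write_dammitfile(fasta, out_path):
    """Write a one-command dammitfile for a single assembly."""
    cmd = ("dammit annotate {fasta} "
           "--busco-group eukaryota "
           "--database-dir {db} --n_threads 8\n").format(fasta=fasta,
                                                         db=DATABASE_DIR)
    with open(out_path, "w") as fh:
        fh.write(cmd)

for fasta in assemblies:
    write_dammitfile(fasta, fasta + ".dammit.sh")
```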
@camillescott (Member)

Two possibilities. First, you should probably omit the sudo -- dammit doesn't need administrator privileges to run, and sudo changes the $PATH variable. Second, try calling Popen with shell=True, which should make your environment variables available. BUSCO, TransDecoder, and LAST were installed manually and the exports are in your .bashrc, so without that being sourced (i.e., shell=True), they aren't being found.

The relevant docs for Popen are here: https://docs.python.org/2/library/subprocess.html#popen-constructor

Lemme know if that helps!
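
A minimal sketch of the corrected launch under those two suggestions (the dammitfile path here is hypothetical):

```python
import subprocess

# Hypothetical path to one of the generated dammitfiles.
dammitfile = "/mnt/mmetsp/Erythrolobus_madagascarensis/SRR1300444/dammit_dir/SRR1300444.dammit.sh"

# No sudo: dammit doesn't need root, and sudo typically resets $PATH.
# With shell=True the command string is run through a shell, and the
# child inherits the caller's environment, so manually exported PATH
# entries are visible to dammit's dependency checks.
subprocess.Popen("bash " + dammitfile, shell=True).wait()
```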

@johnsolk (Member, Author) commented Dec 6, 2015

Removing sudo worked, running now. Thank you! I was originally using shell=True, just forgot to put that in my question.

If you stop and then restart, will the pipeline pick up from where it left off? It's running, but at first there were a few miscellaneous errors about not finding busco and tblastn results (below). Should I just wait for it to finish to see how it worked?

New DB title:  Trinity.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 16048 sequences in 0.548864 seconds.
BLAST Database error: CSeqDBAtlas::MapMmap: While mapping file [/mnt/mmetsp/Micromonas_pusilla/SRR1300455/dammit_dir/Trinity.fasta.dammit/Trinity.fasta.busco.results.nin] with 0 bytes allocated, caught exception:
NCBI C++ Exception:
    "/build/buildd/ncbi-blast+-2.2.28/c++/src/corelib/ncbifile.cpp", line 4703: Error: ncbi::CMemoryFileMap::CMemoryFileMap() - To be memory mapped the file must exist: /mnt/mmetsp/Micromonas_pusilla/SRR1300455/dammit_dir/Trinity.fasta.dammit/Trinity.fasta.busco.results.nin

eukaryota
*** Running tBlastN ***
*** Getting coordinates for candidate transcripts! ***
Traceback (most recent call last):
  File "/home/ubuntu/BUSCO_v1.1b1/BUSCO_v1.1b1.py", line 347, in <module>
    f=open('%s_tblastn' % args['abrev'])        #open input file
FileNotFoundError: [Errno 2] No such file or directory: 'Trinity.fasta.busco.results_tblastn'
          [ ] TransDecoder.LongOrfs:Trinity.fasta

CMD: /home/ubuntu/TransDecoder-2.0.1/util/compute_base_probs.pl Trinity.fasta 0 > Trinity.fasta.transdecoder_dir/base_freqs.dat
-first extracting base frequencies, we'll need them later.
CMD: touch Trinity.fasta.transdecoder_dir/base_freqs.dat.ok

- extracting ORFs from transcripts.
-total transcripts to examine: 16048
[16000/16048] = 99.70% done

##################################
### Done preparing long ORFs.  ###
##################################

        Use file: Trinity.fasta.transdecoder_dir/longest_orfs.pep  for Pfam and/or BlastP searches to enable homology-based coding region identification.

        Then, run TransDecoder.Predict for your final coding region predictions.


          [ ] hmmscan:longest_orfs.pep.x.Pfam-A.hmm

@camillescott (Member)

It should resume without issues -- if it doesn't, please let me know :)

