
automating dammit over many assemblies, problem with dependencies #34

Closed

johnsolk opened this issue Dec 6, 2015 · 3 comments

@johnsolk (Member) commented Dec 6, 2015

I wrote a script to run dammit separately for many assemblies. The script writes and runs a dammitfile for each one. Example contents of a dammitfile:

dammit annotate /mnt/mmetsp/Micromonas_pusilla/SRR1300457/trinity/trinity_out/Trinity.fasta \
--busco-group eukaryota --database-dir /mnt/dammit_databases --n_threads 8

But when the automated script runs with subprocess.Popen("sudo bash " + dammitfile), there is an error that some, but not all, of the dependencies (TransDecoder, LAST, BUSCO) are not installed (below). If I run the same command manually it works fine, with no problems. Is there something I can do so the subprocess finds the dependencies?

File written: /mnt/mmetsp/Erythrolobus_madagascarensis/SRR1300444/dammit_dir/SRR1300444.dammit.sh

========================================
dammit! a tool for easy de novo transcriptome annotation
Camille Scott 2015
========================================

submodule: annotate

--- Checking PATH for dependencies

          [ ] TransDecoder
          [ ] LAST
          [x] HMMER
          [x] Infernal
          [x] crb-blast
          [x] BLAST+
          [ ] BUSCO

--- Dependency results

          TransDecoder, LAST, BUSCO missing

Install dependencies to continue; exiting[DependencyHandler:ERROR]
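
For concreteness, a minimal sketch of the per-assembly driver described above; the assembly list, the dammitfile naming, and the write_dammitfile helper are hypothetical, not taken from the actual script:

```python
import os

DATABASE_DIR = "/mnt/dammit_databases"

# Hypothetical list of assemblies; in practice this would be built by
# walking the /mnt/mmetsp directory tree.
assemblies = [
    "/mnt/mmetsp/Micromonas_pusilla/SRR1300457/trinity/trinity_out/Trinity.fasta",
]

def write_dammitfile(fasta, out_path):
    """Write a one-command dammitfile for a single assembly."""
    cmd = ("dammit annotate {fasta} "
           "--busco-group eukaryota "
           "--database-dir {db} --n_threads 8\n").format(fasta=fasta,
                                                         db=DATABASE_DIR)
    with open(out_path, "w") as fh:
        fh.write(cmd)

for fasta in assemblies:
    write_dammitfile(fasta, fasta + ".dammit.sh")
```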
@camillescott (Member)

Two possibilities. First, you should probably omit the sudo -- dammit doesn't need administrator privileges to run, and sudo changes the $PATH variable. Second, try calling Popen with shell=True, which should make your environment variables available. BUSCO, TransDecoder, and LAST were installed manually and the exports are in your .bashrc, so without that being sourced (i.e., shell=True), they aren't being found.

The relevant docs for Popen are here: https://docs.python.org/2/library/subprocess.html#popen-constructor

Lemme know if that helps!
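
A minimal sketch of the corrected launch under those two suggestions (the dammitfile path here is hypothetical):

```python
import subprocess

# Hypothetical path to one of the generated dammitfiles.
dammitfile = "/mnt/mmetsp/Erythrolobus_madagascarensis/SRR1300444/dammit_dir/SRR1300444.dammit.sh"

# No sudo: dammit doesn't need root, and sudo typically resets $PATH.
# With shell=True the command string is run through a shell, and the
# child inherits the caller's environment, so manually exported PATH
# entries are visible to dammit's dependency checks.
subprocess.Popen("bash " + dammitfile, shell=True).wait()
```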

@johnsolk (Member, Author) commented Dec 6, 2015

Removing sudo worked, running now. Thank you! I was originally using shell=True, just forgot to put that in my question.

If you stop and then restart, will the pipeline pick up from where it left off? It's running, but at first there were a few miscellaneous errors about not finding busco and tblastn results (below). Should I just wait for it to finish to see how it worked?

New DB title:  Trinity.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 16048 sequences in 0.548864 seconds.
BLAST Database error: CSeqDBAtlas::MapMmap: While mapping file [/mnt/mmetsp/Micromonas_pusilla/SRR1300455/dammit_dir/Trinity.fasta.dammit/Trinity.fasta.busco.results.nin] with 0 bytes allocated, caught exception:
NCBI C++ Exception:
    "/build/buildd/ncbi-blast+-2.2.28/c++/src/corelib/ncbifile.cpp", line 4703: Error: ncbi::CMemoryFileMap::CMemoryFileMap() - To be memory mapped the file must exist: /mnt/mmetsp/Micromonas_pusilla/SRR1300455/dammit_dir/Trinity.fasta.dammit/Trinity.fasta.busco.results.nin

eukaryota
*** Running tBlastN ***
*** Getting coordinates for candidate transcripts! ***
Traceback (most recent call last):
  File "/home/ubuntu/BUSCO_v1.1b1/BUSCO_v1.1b1.py", line 347, in <module>
    f=open('%s_tblastn' % args['abrev'])        #open input file
FileNotFoundError: [Errno 2] No such file or directory: 'Trinity.fasta.busco.results_tblastn'
          [ ] TransDecoder.LongOrfs:Trinity.fasta

CMD: /home/ubuntu/TransDecoder-2.0.1/util/compute_base_probs.pl Trinity.fasta 0 > Trinity.fasta.transdecoder_dir/base_freqs.dat
-first extracting base frequencies, we'll need them later.
CMD: touch Trinity.fasta.transdecoder_dir/base_freqs.dat.ok

- extracting ORFs from transcripts.
-total transcripts to examine: 16048
[16000/16048] = 99.70% done

##################################
### Done preparing long ORFs.  ###
##################################

        Use file: Trinity.fasta.transdecoder_dir/longest_orfs.pep  for Pfam and/or BlastP searches to enable homology-based coding region identification.

        Then, run TransDecoder.Predict for your final coding region predictions.


          [ ] hmmscan:longest_orfs.pep.x.Pfam-A.hmm

@camillescott (Member)

It should resume without issues -- if it doesn't, please let me know :)

