This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

removes dependence on cogent's DNA, LoadSeqs, Alignment, DenseAlignment #1497

Merged on Apr 11, 2014 (67 commits)
67 commits
cdde0ec  modified to use bipy.core.sequence.DNA (gregcaporaso, Mar 12, 2014)
92da297  modified to use bipy.core.sequence.DNA (gregcaporaso, Mar 12, 2014)
d11c1b6  modified to use bipy.core.sequence.DNA (gregcaporaso, Mar 12, 2014)
f0281bb  modified to use bipy.core.sequence.DNA (gregcaporaso, Mar 12, 2014)
5a6575d  modified to use bipy.core.sequence.DNA (gregcaporaso, Mar 12, 2014)
0cdbc7d  Merge branch 'master' of github.com:qiime/qiime into remove-cogent-DNA (gregcaporaso, Mar 12, 2014)
0ebee85  modified to use bipy.core.sequence.DNA (gregcaporaso, Mar 12, 2014)
fbec872  modified to use bipy.core.sequence.DNA (gregcaporaso, Mar 12, 2014)
918d769  removed dep on and refs to cogent.DNA (gregcaporaso, Mar 13, 2014)
1f8c280  removed dep on and refs to cogent.DNA (gregcaporaso, Mar 13, 2014)
a1414dc  removed dep on and refs to cogent.DNA (gregcaporaso, Mar 13, 2014)
869ae26  removed dep on and refs to cogent.DNA (gregcaporaso, Mar 13, 2014)
7e2881f  removed dep on and refs to cogent.DNA (gregcaporaso, Mar 13, 2014)
0d76a72  removed dep on and refs to cogent.DNA (gregcaporaso, Mar 13, 2014)
13667fb  removed dep on and refs to cogent.DNA (gregcaporaso, Mar 13, 2014)
e4464f7  removed dep on and refs to cogent.DNA (gregcaporaso, Mar 13, 2014)
bc395e2  removed pycogent deps from pynast aligner class (gregcaporaso, Mar 14, 2014)
6abd439  remove cogent deps (gregcaporaso, Mar 14, 2014)
cbbd6ef  more clean-ups (gregcaporaso, Mar 14, 2014)
0e005cd  removed unused import (gregcaporaso, Mar 14, 2014)
09ebf5d  bug fix (gregcaporaso, Mar 14, 2014)
38cf234  Merge branch 'master' into remove-cogent-DNA (gregcaporaso, Apr 9, 2014)
e0b0ef2  fixed bipy references (gregcaporaso, Apr 9, 2014)
2a74967  removed unresolved conflict (gregcaporaso, Apr 9, 2014)
0cffae5  Merge branch 'master' of github.com:qiime/qiime into remove-cogent-DNA (gregcaporaso, Apr 9, 2014)
453ac49  removed cogent LoadSeqs dependency (gregcaporaso, Apr 9, 2014)
a8d03ba  removed LoadSeqs dependency (gregcaporaso, Apr 9, 2014)
503e56f  removed LoadSeqs dependency (gregcaporaso, Apr 9, 2014)
7be747d  moved cogent Alignment dependency (gregcaporaso, Apr 9, 2014)
c822a7d  removed DenseAlignment dependencies - this code is pretty ugly (gregcaporaso, Apr 9, 2014)
1e6fd16  removed outdated import (gregcaporaso, Apr 9, 2014)
c8c3ca6  fixed test failures (gregcaporaso, Apr 9, 2014)
ab5c144  removed LoadSeqs dependency (gregcaporaso, Apr 9, 2014)
df5801a  removed unused imports (gregcaporaso, Apr 9, 2014)
e86f5ab  removed LoadSeqs dependency (gregcaporaso, Apr 9, 2014)
bc2f968  removed unused import (gregcaporaso, Apr 9, 2014)
f0b1161  removed some cogent deps (gregcaporaso, Apr 9, 2014)
5101383  removed cogent dependency (gregcaporaso, Apr 9, 2014)
fbbbe3e  removed cogent dep (gregcaporaso, Apr 9, 2014)
1d4ca74  removed cogent deps (gregcaporaso, Apr 9, 2014)
456aecd  removed cogent deps (gregcaporaso, Apr 9, 2014)
370f893  removed cogent deps (gregcaporaso, Apr 9, 2014)
c61a4a2  removed cogent deps (gregcaporaso, Apr 9, 2014)
8b2df4e  removed cogent deps (gregcaporaso, Apr 9, 2014)
c5ca2bc  trying to remove cogent dep (gregcaporaso, Apr 9, 2014)
9428451  removed cogent dep (gregcaporaso, Apr 9, 2014)
31dd4ce  removed cogent dep (gregcaporaso, Apr 9, 2014)
3b4cbe9  removed LoadSeqs dep (gregcaporaso, Apr 9, 2014)
00d1093  removed cogent dep (gregcaporaso, Apr 9, 2014)
8ed1e81  attempting to fix test failures (gregcaporaso, Apr 9, 2014)
123d41e  cleaned up failing tests (gregcaporaso, Apr 10, 2014)
3d4bd51  removed files that were accidentally added back (gregcaporaso, Apr 10, 2014)
5860eda  removed line that was accidentally dropped (gregcaporaso, Apr 10, 2014)
3561d9b  items -> iteritems (gregcaporaso, Apr 10, 2014)
75837fb  addressed comment from @wasade (gregcaporaso, Apr 10, 2014)
b2049ff  addressed comment from @wasade (gregcaporaso, Apr 10, 2014)
941ad00  addressed spacing issue (gregcaporaso, Apr 10, 2014)
7d18735  removed unused import (gregcaporaso, Apr 10, 2014)
6805809  removed unused import (gregcaporaso, Apr 10, 2014)
47af8e3  removed insert_seqs_into_tree.py and references to external apps that… (gregcaporaso, Apr 10, 2014)
2643343  addressed @ElDeveloper's unrelated comment (gregcaporaso, Apr 10, 2014)
b76fd0c  updated to ensure that files are closed (gregcaporaso, Apr 10, 2014)
b4c1015  updated to ensure that files are closed (gregcaporaso, Apr 10, 2014)
303d70f  updated to ensure that files are closed (gregcaporaso, Apr 10, 2014)
5044514  updated to ensure that files are closed (gregcaporaso, Apr 10, 2014)
f60452e  updated to ensure that files are closed (gregcaporaso, Apr 10, 2014)
4c24f84  updated to ensure that files are closed (gregcaporaso, Apr 10, 2014)
51 changes: 26 additions & 25 deletions ChangeLog.md

Large diffs are not rendered by default.

12 changes: 5 additions & 7 deletions doc/install/install.rst
@@ -18,7 +18,7 @@ As a consequence of this 'pipeline' architecture, **QIIME has a lot of dependenc
 How to not install QIIME
 ========================

-Because QIIME is hard to install, we have attempted to shift this burden to the QIIME development group rather than our users by providing virtual machines with QIIME and all of its dependencies pre-installed. We, and third-party developers, have also created several automated installation procedures. These alternatives (`summarized here <../index.html#downloading-and-installing-qiime>`_) allow you to bypass the complex installation procedure and have access to a full, working QIIME installation.
+Because QIIME is hard to install, we have attempted to shift this burden to the QIIME development group rather than our users by providing virtual machines with QIIME and all of its dependencies pre-installed. We, and third-party developers, have also created several automated installation procedures. These alternatives (`summarized here <../index.html#downloading-and-installing-qiime>`_) allow you to bypass the complex installation procedure and have access to a full, working QIIME installation.

 **We highly recommend going with one of these solutions if you're new to QIIME, or just want to test it out to see if it will do what you want.**

@@ -91,7 +91,7 @@ The next are python packages not included in Canopy Express. Each of these can b
 * pyqi 0.3.1 (`src_pyqi <https://pypi.python.org/packages/source/p/pyqi/pyqi-0.3.1.tar.gz>`_) (license: BSD)
 * scikit-bio (latest development version) (`src_skbio <https://github.com/biocore/scikit-bio>`_) (license: BSD)

-Next, there are two non-python dependencies required for the QIIME base package. These should be installed by following their respective install instructions.
+Next, there are two non-python dependencies required for the QIIME base package. These should be installed by following their respective install instructions.

 * uclust 1.2.22q (`src_uclust <http://www.drive5.com/uclust/downloads1_2_22q.html>`_) See :ref:`uclust install notes <uclust-install>`. (licensed specially for Qiime and PyNAST users)
 * fasttree 2.1.3 (`src_fasttree <http://www.microbesonline.org/fasttree/FastTree-2.1.3.c>`_) See `FastTree install instructions <http://www.microbesonline.org/fasttree/#Install>`_ (license: GPL)
@@ -154,17 +154,17 @@ You should see output that looks like the following::
 ................
 ----------------------------------------------------------------------
 Ran 16 tests in 0.440s

 OK

-This indicates that you have a complete QIIME base install.
+This indicates that you have a complete QIIME base install.

 You should next :ref:`run QIIME's unit tests <run-test-suite>`. You will experience some test failures as a result of not having a full QIIME install. If you have questions about these failures, you should post to the `QIIME Forum <http://forum.qiime.org>`_.

 QIIME full install (for access to advanced features in QIIME, and non-default processing pipelines)
 ---------------------------------------------------------------------------------------------------

-The dependencies described below will support a full QIIME install. These are grouped by the features that each dependency will provide access to. Installation instructions should be followed for each individual package (e.g., from the project's website or README/INSTALL file).
+The dependencies described below will support a full QIIME install. These are grouped by the features that each dependency will provide access to. Installation instructions should be followed for each individual package (e.g., from the project's website or README/INSTALL file).

 Alignment, tree-building, taxonomy assignment, OTU picking, and other data generation steps (required for non-default processing pipelines):

@@ -181,8 +181,6 @@ Alignment, tree-building, taxonomy assignment, OTU picking, and other data gener
 * cdbtools (`src_cdbtools <ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/cdbfasta/cdbfasta.tar.gz>`_)
 * muscle 3.8.31 (`src_muscle <http://www.drive5.com/muscle/downloads.htm>`_) (Public domain)
 * rtax 0.984 (`src_rtax <http://static.davidsoergel.com/rtax-0.984.tgz>`_) (license: BSD)
-* pplacer 1.1 (`src_pplacer <http://matsen.fhcrc.org/pplacer/builds/pplacer-v1.1-Linux.tar.gz>`_) (license: GPL)
-* ParsInsert 1.04 (`src_parsinsert <http://downloads.sourceforge.net/project/parsinsert/ParsInsert.1.04.tgz>`_) (license: GPL)
 * usearch v5.2.236 and/or usearch v6.1 (`src_usearch <http://www.drive5.com/usearch/>`_) (license: see http://www.drive5.com/usearch/nonprofit_form.html) **At this stage two different versions of usearch are supported.** usearch v5.2.236 is referred to as ``usearch`` in QIIME, and usearch v6.1 is referred to as ``usearch61``.
Review comment (Member):

Should these dependencies also be removed from qiime-deploy? If so, can you please create issues on the qiime-deploy-conf tracker?

Reply (Contributor Author):

done: #109


 Processing sff files:
78 changes: 0 additions & 78 deletions doc/scripts/insert_seqs_into_tree.rst

This file was deleted.

4 changes: 2 additions & 2 deletions qiime/adjust_seq_orientation.py
@@ -12,7 +12,7 @@

 from os.path import split, splitext
 from skbio.parse.sequences import parse_fasta
-from cogent import DNA
+from skbio.core.sequence import DNA

 usage_str = """usage: %prog [options] {-i INPUT_FASTA_FP}

@@ -42,7 +42,7 @@ def rc_fasta_lines(fasta_lines, seq_desc_mapper=append_rc):
     """
     for seq_id, seq in parse_fasta(fasta_lines):
         seq_id = seq_desc_mapper(seq_id)
-        seq = DNA.rc(seq.upper())
+        seq = str(DNA(seq.upper()).rc())
         yield seq_id, seq
     return
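
The hunk above replaces cogent's `DNA.rc(seq)` with `str(DNA(seq.upper()).rc())`, so `rc_fasta_lines` keeps yielding plain strings. As a library-free illustration of what that call is expected to compute, here is a minimal sketch. `reverse_complement` and `rc_records` are hypothetical helper names (not part of QIIME or scikit-bio), the default description mapper is invented for the example, and the complement table assumes only unambiguous bases plus gap characters.

```python
# Hypothetical, dependency-free stand-in for str(DNA(seq.upper()).rc()).
# Only A, C, G and T are complemented; any other character (for example
# a '-' gap) is passed through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string as a plain str."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rc_records(records, seq_desc_mapper=lambda seq_id: seq_id + " RC"):
    """Yield (mapped_id, reverse_complemented_seq) for (id, seq) pairs."""
    for seq_id, seq in records:
        yield seq_desc_mapper(seq_id), reverse_complement(seq)


if __name__ == "__main__":
    for seq_id, seq in rc_records([("s1", "aaccgt")]):
        print(seq_id, seq)
```

Returning a `str` rather than a sequence object mirrors the `str(...)` wrapper in the new line, which keeps downstream callers that expect strings unchanged.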

68 changes: 35 additions & 33 deletions qiime/align_seqs.py
@@ -25,24 +25,24 @@
 from os import remove
 from numpy import median

-from cogent import LoadSeqs, DNA
-from cogent.core.alignment import DenseAlignment, SequenceCollection, Alignment
-from cogent.core.sequence import DnaSequence as Dna
-from cogent.parse.rfam import MinimalRfamParser, ChangedSequence

 import brokit
 from brokit.infernal import cmalign_from_alignment
 import brokit.clustalw
 import brokit.muscle_v38
 import brokit.mafft

+from cogent import DNA as DNA_cogent
+from cogent.parse.rfam import MinimalRfamParser, ChangedSequence
 from skbio.app.util import ApplicationNotFoundError
 from skbio.core.exception import RecordError
-from skbio.parse.sequences import parse_fasta

 from qiime.util import (FunctionWithParams,
                         get_qiime_temp_dir)

+from skbio.core.alignment import SequenceCollection, Alignment
+from skbio.core.sequence import DNASequence
+from skbio.parse.sequences import parse_fasta

 # Load PyNAST if it's available. If it's not, skip it if not but set up
 # to raise errors if the user tries to use it.
@@ -115,7 +115,7 @@ def getResult(self, seq_path):
         seqs = self.getData(seq_path)
         params = dict(
             [(k, v) for (k, v) in self.Params.items() if k.startswith('-')])
-        result = module.align_unaligned_seqs(seqs, moltype=DNA, params=params)
+        result = module.align_unaligned_seqs(seqs, moltype=DNA_cogent, params=params)
         return result

     def __call__(self, result_path=None, log_path=None, *args, **kwargs):
@@ -131,7 +131,7 @@ def __init__(self, params):
         """Return new InfernalAligner object with specified params.
         """
         _params = {
-            'moltype': DNA,
+            'moltype': DNA_cogent,
             'Application': 'Infernal',
         }
         _params.update(params)
@@ -156,9 +156,10 @@ def __call__(self, seq_path, result_path=None, log_path=None,
         moltype = self.Params['moltype']

         # Need to make separate mapping for unaligned sequences
-        unaligned = SequenceCollection(candidate_sequences, MolType=moltype)
-        int_map, int_keys = unaligned.getIntMap(prefix='unaligned_')
-        int_map = SequenceCollection(int_map, MolType=moltype)
+        unaligned = SequenceCollection.from_fasta_records(
+            candidate_sequences.iteritems(), DNASequence)
+        mapped_seqs, new_to_old_ids = unaligned.int_map(prefix='unaligned_')
+        mapped_seq_tuples = [(k, str(v)) for k,v in mapped_seqs.iteritems()]

         # Turn on --gapthresh option in cmbuild to force alignment to full
         # model
@@ -174,7 +175,6 @@ def __call__(self, seq_path, result_path=None, log_path=None,
         # are fragments.
         # Also turn on --gapthresh to use same gapthresh as was used to build
         # model
-
         if cmalign_params is None:
             cmalign_params = {}
         cmalign_params.update({'--sub': True, '--gapthresh': 1.0})
@@ -186,20 +186,23 @@ def __call__(self, seq_path, result_path=None, log_path=None,
         # Align sequences to alignment including alignment gaps.
         aligned, struct_string = cmalign_from_alignment(aln=template_alignment,
                                                         structure_string=struct,
-                                                        seqs=int_map,
+                                                        seqs=mapped_seq_tuples,
                                                         moltype=moltype,
                                                         include_aln=True,
                                                         params=cmalign_params,
                                                         cmbuild_params=cmbuild_params)

         # Pull out original sequences from full alignment.
-        infernal_aligned = {}
+        infernal_aligned = []
+        # Get a dict of the identifiers to sequences (note that this is a
+        # cogent alignment object, hence the call to NamedSeqs)
         aligned_dict = aligned.NamedSeqs
-        for key in int_map.Names:
-            infernal_aligned[int_keys.get(key, key)] = aligned_dict[key]
+        for n, o in new_to_old_ids.iteritems():
+            aligned_seq = aligned_dict[n]
+            infernal_aligned.append((o, aligned_seq))

         # Create an Alignment object from alignment dict
-        infernal_aligned = Alignment(infernal_aligned, MolType=moltype)
+        infernal_aligned = Alignment.from_fasta_records(infernal_aligned, DNASequence)

         if log_path is not None:
             log_file = open(log_path, 'w')
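
The Infernal changes above relabel the unaligned sequences with temporary `unaligned_`-prefixed identifiers before calling `cmalign_from_alignment`, then use the returned `new_to_old_ids` mapping to restore the original identifiers and drop the template sequences. A rough, dependency-free sketch of that round trip follows. `int_map` and `restore_ids` are hypothetical plain-Python functions written for illustration; details such as numbering from 1 are assumptions, not a description of `SequenceCollection.int_map`.

```python
def int_map(records, prefix=""):
    """Relabel (id, seq) pairs with prefix + running integer.

    Returns (mapped, new_to_old): mapped is {new_id: seq} and
    new_to_old is {new_id: original_id}. Numbering from 1 is an assumption.
    """
    mapped = {}
    new_to_old = {}
    for i, (old_id, seq) in enumerate(records, start=1):
        new_id = "%s%d" % (prefix, i)
        mapped[new_id] = seq
        new_to_old[new_id] = old_id
    return mapped, new_to_old


def restore_ids(aligned_by_new_id, new_to_old):
    """Return (original_id, aligned_seq) pairs for the relabeled sequences.

    Entries of aligned_by_new_id that are not in new_to_old (for example
    template sequences included by the aligner) are left out.
    """
    return [(old_id, aligned_by_new_id[new_id])
            for new_id, old_id in new_to_old.items()]


if __name__ == "__main__":
    candidates = [("sample.1 description", "ACGT"), ("sample.2", "ACG")]
    mapped, new_to_old = int_map(candidates, prefix="unaligned_")
    # Pretend an external aligner returned these, plus one template sequence.
    aligned = {"unaligned_1": "ACGT", "unaligned_2": "ACG-", "template": "ACGT"}
    print(restore_ids(aligned, new_to_old))
```

Temporary identifiers of this kind are a common way to shield an external program from identifiers containing spaces or other characters it may not handle.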
@@ -208,7 +211,7 @@ def __call__(self, seq_path, result_path=None, log_path=None,

         if result_path is not None:
             result_file = open(result_path, 'w')
-            result_file.write(infernal_aligned.toFasta())
+            result_file.write(infernal_aligned.to_fasta())
Review comment (Member):

This may be outside the scope of this pull request:

Could to_fasta be implemented as a generator that yields fasta records (strings)? This call is effectively reproducing the full alignment in memory (in fasta format) and then writing to file.

Reply (Contributor Author):

I think that's a good suggestion, but outside the scope of this PR. Could you add to skbio's #194?


             result_file.close()
             return None
         else:
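
The review thread above asks whether FASTA output could be produced lazily instead of being built as one large string before writing. A minimal sketch of that idea is shown here, assuming the alignment can be iterated as `(identifier, sequence)` pairs. `iter_fasta_records` and `write_fasta` are hypothetical names, not scikit-bio functions.

```python
import io


def iter_fasta_records(records):
    """Yield one FASTA-formatted string per (identifier, sequence) pair."""
    for seq_id, seq in records:
        yield ">%s\n%s\n" % (seq_id, seq)


def write_fasta(records, out_file):
    """Write records one at a time so the full text is never held in memory."""
    for fasta_record in iter_fasta_records(records):
        out_file.write(fasta_record)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_fasta([("seq1", "AC-GT"), ("seq2", "ACGGT")], buffer)
    print(buffer.getvalue(), end="")
```

With this shape, peak memory depends on the largest single record rather than on the size of the whole alignment.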
@@ -248,12 +251,8 @@ def __call__(self, seq_path, result_path=None, log_path=None,
         for seq_id, seq in parse_fasta(open(template_alignment_fp)):
             # replace '.' characters with '-' characters
             template_alignment.append((seq_id, seq.replace('.', '-').upper()))
-        try:
-            template_alignment = LoadSeqs(data=template_alignment, moltype=DNA,
-                                          aligned=DenseAlignment)
-        except KeyError as e:
-            raise KeyError('Only ACGT-. characters can be contained in template alignments.' +
-                           ' The offending character was: %s' % e)
+        template_alignment = Alignment.from_fasta_records(
+            template_alignment, DNASequence, validate=True)

         # initialize_logger
         logger = NastLogger(log_path)
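
The template-loading code above normalizes each template sequence (replacing `.` with `-` and upper-casing) and then relies on `validate=True` to reject bad characters, where the old code raised a `KeyError` naming the offending character. The sketch below illustrates that normalize-then-validate step without any library. `normalize_template` is a hypothetical helper, and the allowed alphabet (`ACGT` plus `-`) is an assumption taken from the old error message; scikit-bio's own validation may accept a wider alphabet.

```python
ALLOWED = frozenset("ACGT-")  # assumption based on the old error message


def normalize_template(records, allowed=ALLOWED):
    """Return [(id, seq)] with '.' -> '-' and upper-casing applied.

    Raises ValueError naming the first character outside `allowed`.
    """
    normalized = []
    for seq_id, seq in records:
        seq = seq.replace(".", "-").upper()
        for char in seq:
            if char not in allowed:
                raise ValueError(
                    "Invalid character %r in template sequence %r"
                    % (char, seq_id))
        normalized.append((seq_id, seq))
    return normalized


if __name__ == "__main__":
    print(normalize_template([("t1", "ac.gt"), ("t2", "A--GT")]))
```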
@@ -273,25 +272,28 @@ def __call__(self, seq_path, result_path=None, log_path=None,

         logger.record(str(self))

+        for i, seq in enumerate(pynast_failed):
+            skb_seq = DNASequence(str(seq), identifier=seq.Name)
+            pynast_failed[i] = skb_seq
+        pynast_failed = SequenceCollection(pynast_failed)
+
+        for i, seq in enumerate(pynast_aligned):
+            skb_seq = DNASequence(str(seq), identifier=seq.Name)
+            pynast_aligned[i] = skb_seq
+        pynast_aligned = Alignment(pynast_aligned)
+
         if failure_path is not None:
             fail_file = open(failure_path, 'w')
-            for seq in pynast_failed:
-                fail_file.write(seq.toFasta())
-                fail_file.write('\n')
+            fail_file.write(pynast_failed.to_fasta())
             fail_file.close()

         if result_path is not None:
             result_file = open(result_path, 'w')
-            for seq in pynast_aligned:
-                result_file.write(seq.toFasta())
-                result_file.write('\n')
+            result_file.write(pynast_aligned.to_fasta())
             result_file.close()
             return None
         else:
-            try:
-                return LoadSeqs(data=pynast_aligned, aligned=DenseAlignment)
-            except ValueError:
-                return {}
+            return pynast_aligned


 def compute_min_alignment_length(seqs_f, fraction=0.75):
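
The output-writing code above pairs each `open` with an explicit `close`, and the last commits in this PR are described as ensuring that files are closed. One common way to guarantee that, even when `write` raises, is a `with` block. This is a generic sketch and not the code from those commits; `write_text_result` is a hypothetical helper.

```python
import os
import tempfile


def write_text_result(path, text):
    """Write text to path; the file is closed even if write() raises."""
    with open(path, "w") as result_file:
        result_file.write(text)


if __name__ == "__main__":
    fd, tmp_path = tempfile.mkstemp(suffix=".fasta")
    os.close(fd)
    try:
        write_text_result(tmp_path, ">seq1\nACGT\n")
        with open(tmp_path) as check:
            print(check.read(), end="")
    finally:
        os.remove(tmp_path)
```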
2 changes: 0 additions & 2 deletions qiime/assign_taxonomy.py
@@ -23,8 +23,6 @@
 from cStringIO import StringIO
 from collections import Counter, defaultdict

-from cogent import LoadSeqs, DNA
-
 from skbio.app.util import ApplicationNotFoundError
 from skbio.parse.sequences import parse_fasta
