This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

Merge pull request #1497 from gregcaporaso/remove-cogent-DNA
removes dependence on cogent's DNA, LoadSeqs, Alignment, DenseAlignment
jairideout committed Apr 11, 2014
2 parents 416bfbd + 4c24f84 commit 99ff358
Showing 40 changed files with 445 additions and 909 deletions.
51 changes: 26 additions & 25 deletions ChangeLog.md

Large diffs are not rendered by default.

12 changes: 5 additions & 7 deletions doc/install/install.rst
@@ -18,7 +18,7 @@ As a consequence of this 'pipeline' architecture, **QIIME has a lot of dependenc
How to not install QIIME
========================

-Because QIIME is hard to install, we have attempted to shift this burden to the QIIME development group rather than our users by providing virtual machines with QIIME and all of its dependencies pre-installed. We, and third-party developers, have also created several automated installation procedures. These alternatives (`summarized here <../index.html#downloading-and-installing-qiime>`_) allow you to bypass the complex installation procedure and have access to a full, working QIIME installation.
+Because QIIME is hard to install, we have attempted to shift this burden to the QIIME development group rather than our users by providing virtual machines with QIIME and all of its dependencies pre-installed. We, and third-party developers, have also created several automated installation procedures. These alternatives (`summarized here <../index.html#downloading-and-installing-qiime>`_) allow you to bypass the complex installation procedure and have access to a full, working QIIME installation.

**We highly recommend going with one of these solutions if you're new to QIIME, or just want to test it out to see if it will do what you want.**

@@ -91,7 +91,7 @@ The next are python packages not included in Canopy Express. Each of these can b
* pyqi 0.3.1 (`src_pyqi <https://pypi.python.org/packages/source/p/pyqi/pyqi-0.3.1.tar.gz>`_) (license: BSD)
* scikit-bio (latest development version) (`src_skbio <https://github.com/biocore/scikit-bio>`_) (license: BSD)

-Next, there are two non-python dependencies required for the QIIME base package. These should be installed by following their respective install instructions.
+Next, there are two non-python dependencies required for the QIIME base package. These should be installed by following their respective install instructions.

* uclust 1.2.22q (`src_uclust <http://www.drive5.com/uclust/downloads1_2_22q.html>`_) See :ref:`uclust install notes <uclust-install>`. (licensed specially for Qiime and PyNAST users)
* fasttree 2.1.3 (`src_fasttree <http://www.microbesonline.org/fasttree/FastTree-2.1.3.c>`_) See `FastTree install instructions <http://www.microbesonline.org/fasttree/#Install>`_ (license: GPL)
@@ -154,17 +154,17 @@ You should see output that looks like the following::
................
----------------------------------------------------------------------
Ran 16 tests in 0.440s

OK

-This indicates that you have a complete QIIME base install.
+This indicates that you have a complete QIIME base install.

You should next :ref:`run QIIME's unit tests <run-test-suite>`. You will experience some test failures as a result of not having a full QIIME install. If you have questions about these failures, you should post to the `QIIME Forum <http://forum.qiime.org>`_.

QIIME full install (for access to advanced features in QIIME, and non-default processing pipelines)
---------------------------------------------------------------------------------------------------

-The dependencies described below will support a full QIIME install. These are grouped by the features that each dependency will provide access to. Installation instructions should be followed for each individual package (e.g., from the project's website or README/INSTALL file).
+The dependencies described below will support a full QIIME install. These are grouped by the features that each dependency will provide access to. Installation instructions should be followed for each individual package (e.g., from the project's website or README/INSTALL file).

Alignment, tree-building, taxonomy assignment, OTU picking, and other data generation steps (required for non-default processing pipelines):

@@ -181,8 +181,6 @@ Alignment, tree-building, taxonomy assignment, OTU picking, and other data gener
* cdbtools (`src_cdbtools <ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/cdbfasta/cdbfasta.tar.gz>`_)
* muscle 3.8.31 (`src_muscle <http://www.drive5.com/muscle/downloads.htm>`_) (Public domain)
* rtax 0.984 (`src_rtax <http://static.davidsoergel.com/rtax-0.984.tgz>`_) (license: BSD)
-* pplacer 1.1 (`src_pplacer <http://matsen.fhcrc.org/pplacer/builds/pplacer-v1.1-Linux.tar.gz>`_) (license: GPL)
-* ParsInsert 1.04 (`src_parsinsert <http://downloads.sourceforge.net/project/parsinsert/ParsInsert.1.04.tgz>`_) (license: GPL)
* usearch v5.2.236 and/or usearch v6.1 (`src_usearch <http://www.drive5.com/usearch/>`_) (license: see http://www.drive5.com/usearch/nonprofit_form.html) **At this stage two different versions of usearch are supported.** usearch v5.2.236 is referred to as ``usearch`` in QIIME, and usearch v6.1 is referred to as ``usearch61``.

Processing sff files:
78 changes: 0 additions & 78 deletions doc/scripts/insert_seqs_into_tree.rst

This file was deleted.

4 changes: 2 additions & 2 deletions qiime/adjust_seq_orientation.py
@@ -12,7 +12,7 @@

 from os.path import split, splitext
 from skbio.parse.sequences import parse_fasta
-from cogent import DNA
+from skbio.core.sequence import DNA

 usage_str = """usage: %prog [options] {-i INPUT_FASTA_FP}
@@ -42,7 +42,7 @@ def rc_fasta_lines(fasta_lines, seq_desc_mapper=append_rc):
     """
     for seq_id, seq in parse_fasta(fasta_lines):
         seq_id = seq_desc_mapper(seq_id)
-        seq = DNA.rc(seq.upper())
+        seq = str(DNA(seq.upper()).rc())
         yield seq_id, seq
     return
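
For readers unfamiliar with the call being swapped in above: `str(DNA(seq.upper()).rc())` is expected to return the reverse complement of the sequence as a plain string. The sketch below shows that operation in pure Python, without scikit-bio, so it can be run anywhere. The names `reverse_complement` and `rc_fasta_records`, the ` RC` id suffix, and the restricted alphabet (A, C, G, T, N and gaps) are assumptions made for illustration; they are not part of QIIME or scikit-bio.

```python
# Illustrative stand-in for the reverse-complement step; not scikit-bio code.
# Assumption: only A, C, G, T, N and '-' occur in the input.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N", "-": "-"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (upper-cased first)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def rc_fasta_records(records, id_mapper=lambda seq_id: seq_id + " RC"):
    """Yield (id, sequence) pairs with each sequence reverse-complemented.

    `id_mapper` is a hypothetical hook for relabelling sequence ids.
    """
    for seq_id, seq in records:
        yield id_mapper(seq_id), reverse_complement(seq)


if __name__ == "__main__":
    for seq_id, seq in rc_fasta_records([("s1", "accgt"), ("s2", "GGN-A")]):
        print(">%s\n%s" % (seq_id, seq))
```

A character outside the assumed alphabet raises `KeyError` here, which is stricter than a full IUPAC-aware implementation would be.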

68 changes: 35 additions & 33 deletions qiime/align_seqs.py
@@ -25,24 +25,24 @@
 from os import remove
 from numpy import median

-from cogent import LoadSeqs, DNA
-from cogent.core.alignment import DenseAlignment, SequenceCollection, Alignment
-from cogent.core.sequence import DnaSequence as Dna
-from cogent.parse.rfam import MinimalRfamParser, ChangedSequence

 import brokit
 from brokit.infernal import cmalign_from_alignment
 import brokit.clustalw
 import brokit.muscle_v38
 import brokit.mafft

+from cogent import DNA as DNA_cogent
+from cogent.parse.rfam import MinimalRfamParser, ChangedSequence
 from skbio.app.util import ApplicationNotFoundError
 from skbio.core.exception import RecordError
-from skbio.parse.sequences import parse_fasta

 from qiime.util import (FunctionWithParams,
                         get_qiime_temp_dir)

+from skbio.core.alignment import SequenceCollection, Alignment
+from skbio.core.sequence import DNASequence
+from skbio.parse.sequences import parse_fasta

 # Load PyNAST if it's available. If it's not, skip it if not but set up
 # to raise errors if the user tries to use it.
@@ -115,7 +115,7 @@ def getResult(self, seq_path):
         seqs = self.getData(seq_path)
         params = dict(
             [(k, v) for (k, v) in self.Params.items() if k.startswith('-')])
-        result = module.align_unaligned_seqs(seqs, moltype=DNA, params=params)
+        result = module.align_unaligned_seqs(seqs, moltype=DNA_cogent, params=params)
         return result

     def __call__(self, result_path=None, log_path=None, *args, **kwargs):
@@ -131,7 +131,7 @@ def __init__(self, params):
         """Return new InfernalAligner object with specified params.
         """
         _params = {
-            'moltype': DNA,
+            'moltype': DNA_cogent,
             'Application': 'Infernal',
         }
         _params.update(params)
@@ -156,9 +156,10 @@ def __call__(self, seq_path, result_path=None, log_path=None,
         moltype = self.Params['moltype']

         # Need to make separate mapping for unaligned sequences
-        unaligned = SequenceCollection(candidate_sequences, MolType=moltype)
-        int_map, int_keys = unaligned.getIntMap(prefix='unaligned_')
-        int_map = SequenceCollection(int_map, MolType=moltype)
+        unaligned = SequenceCollection.from_fasta_records(
+            candidate_sequences.iteritems(), DNASequence)
+        mapped_seqs, new_to_old_ids = unaligned.int_map(prefix='unaligned_')
+        mapped_seq_tuples = [(k, str(v)) for k,v in mapped_seqs.iteritems()]

         # Turn on --gapthresh option in cmbuild to force alignment to full
         # model
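
The hunk above relabels the unaligned sequences with short `unaligned_`-prefixed ids before handing them to Infernal, and keeps a mapping so the original ids can be restored afterwards. The following pure-Python sketch illustrates that round trip. It is not the scikit-bio implementation: the function names are hypothetical, records are plain `(id, sequence)` tuples, and numbering the new ids from 1 is an assumption made for this example.

```python
def int_map(records, prefix=""):
    """Relabel (id, sequence) records with short integer-based ids.

    Returns two dicts: new id -> sequence, and new id -> original id.
    Numbering from 1 is an assumption made for this sketch.
    """
    new_to_seq = {}
    new_to_old = {}
    for index, (old_id, seq) in enumerate(records, start=1):
        new_id = "%s%d" % (prefix, index)
        new_to_seq[new_id] = seq
        new_to_old[new_id] = old_id
    return new_to_seq, new_to_old


def restore_ids(aligned_by_new_id, new_to_old):
    """Map aligned sequences back to their original ids.

    Output follows the insertion order of `new_to_old` (Python 3.7+ dicts).
    """
    return [(old_id, aligned_by_new_id[new_id])
            for new_id, old_id in new_to_old.items()]


if __name__ == "__main__":
    records = [("sample 1, read 42", "ACGT"), ("sample 2, read 7", "GGCC")]
    mapped, new_to_old = int_map(records, prefix="unaligned_")
    # Pretend an external aligner returned gapped sequences under the new ids.
    aligned = {"unaligned_1": "AC-GT", "unaligned_2": "GGC-C"}
    print(restore_ids(aligned, new_to_old))
```

Short, whitespace-free ids of this kind avoid problems with external tools that truncate or mangle long FASTA labels, which is the usual motivation for such a mapping.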
@@ -174,7 +175,6 @@ def __call__(self, seq_path, result_path=None, log_path=None,
         # are fragments.
         # Also turn on --gapthresh to use same gapthresh as was used to build
         # model
-
         if cmalign_params is None:
             cmalign_params = {}
         cmalign_params.update({'--sub': True, '--gapthresh': 1.0})
@@ -186,20 +186,23 @@ def __call__(self, seq_path, result_path=None, log_path=None,
         # Align sequences to alignment including alignment gaps.
         aligned, struct_string = cmalign_from_alignment(aln=template_alignment,
                                                         structure_string=struct,
-                                                        seqs=int_map,
+                                                        seqs=mapped_seq_tuples,
                                                         moltype=moltype,
                                                         include_aln=True,
                                                         params=cmalign_params,
                                                         cmbuild_params=cmbuild_params)

         # Pull out original sequences from full alignment.
-        infernal_aligned = {}
+        infernal_aligned = []
+        # Get a dict of the identifiers to sequences (note that this is a
+        # cogent alignment object, hence the call to NamedSeqs)
         aligned_dict = aligned.NamedSeqs
-        for key in int_map.Names:
-            infernal_aligned[int_keys.get(key, key)] = aligned_dict[key]
+        for n, o in new_to_old_ids.iteritems():
+            aligned_seq = aligned_dict[n]
+            infernal_aligned.append((o, aligned_seq))

         # Create an Alignment object from alignment dict
-        infernal_aligned = Alignment(infernal_aligned, MolType=moltype)
+        infernal_aligned = Alignment.from_fasta_records(infernal_aligned, DNASequence)

         if log_path is not None:
             log_file = open(log_path, 'w')
@@ -208,7 +211,7 @@ def __call__(self, seq_path, result_path=None, log_path=None,

         if result_path is not None:
             result_file = open(result_path, 'w')
-            result_file.write(infernal_aligned.toFasta())
+            result_file.write(infernal_aligned.to_fasta())
             result_file.close()
             return None
         else:
@@ -248,12 +251,8 @@ def __call__(self, seq_path, result_path=None, log_path=None,
         for seq_id, seq in parse_fasta(open(template_alignment_fp)):
             # replace '.' characters with '-' characters
             template_alignment.append((seq_id, seq.replace('.', '-').upper()))
-        try:
-            template_alignment = LoadSeqs(data=template_alignment, moltype=DNA,
-                                          aligned=DenseAlignment)
-        except KeyError as e:
-            raise KeyError('Only ACGT-. characters can be contained in template alignments.' +
-                           ' The offending character was: %s' % e)
+        template_alignment = Alignment.from_fasta_records(
+            template_alignment, DNASequence, validate=True)

         # initialize_logger
         logger = NastLogger(log_path)
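
The hunk above drops the hand-written `KeyError` handling and relies on `validate=True` to reject bad template alignments. As a rough picture of what such a check involves, here is a minimal pure-Python sketch. It is an assumption-laden stand-in, not scikit-bio's validation: the strict `ACGT-` alphabet is taken from the removed error message, the equal-length check and the function name `load_template_records` are hypothetical additions, and scikit-bio itself may accept a wider (for example, degenerate IUPAC) alphabet.

```python
# Assumption: after '.' is replaced by '-', only these characters are allowed.
VALID_TEMPLATE_CHARS = frozenset("ACGT-")


def load_template_records(records):
    """Normalize and validate (id, sequence) template alignment records.

    '.' gap characters become '-', sequences are upper-cased, and a
    ValueError is raised for unexpected characters or unequal lengths.
    """
    cleaned = []
    for seq_id, seq in records:
        seq = seq.replace(".", "-").upper()
        bad = sorted(set(seq) - VALID_TEMPLATE_CHARS)
        if bad:
            raise ValueError(
                "Invalid character(s) %s in template sequence %r"
                % ("".join(bad), seq_id))
        cleaned.append((seq_id, seq))
    if len({len(seq) for _, seq in cleaned}) > 1:
        raise ValueError("Template sequences are not all the same length")
    return cleaned


if __name__ == "__main__":
    print(load_template_records([("t1", "ac.gt"), ("t2", "AC-GT")]))
```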
@@ -273,25 +272,28 @@ def __call__(self, seq_path, result_path=None, log_path=None,

         logger.record(str(self))

+        for i, seq in enumerate(pynast_failed):
+            skb_seq = DNASequence(str(seq), identifier=seq.Name)
+            pynast_failed[i] = skb_seq
+        pynast_failed = SequenceCollection(pynast_failed)
+
+        for i, seq in enumerate(pynast_aligned):
+            skb_seq = DNASequence(str(seq), identifier=seq.Name)
+            pynast_aligned[i] = skb_seq
+        pynast_aligned = Alignment(pynast_aligned)
+
         if failure_path is not None:
             fail_file = open(failure_path, 'w')
-            for seq in pynast_failed:
-                fail_file.write(seq.toFasta())
-                fail_file.write('\n')
+            fail_file.write(pynast_failed.to_fasta())
             fail_file.close()

         if result_path is not None:
             result_file = open(result_path, 'w')
-            for seq in pynast_aligned:
-                result_file.write(seq.toFasta())
-                result_file.write('\n')
+            result_file.write(pynast_aligned.to_fasta())
             result_file.close()
             return None
         else:
-            try:
-                return LoadSeqs(data=pynast_aligned, aligned=DenseAlignment)
-            except ValueError:
-                return {}
+            return pynast_aligned


 def compute_min_alignment_length(seqs_f, fraction=0.75):
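
In the hunk above, the per-sequence write loops are replaced by a single `to_fasta()` call on the whole collection. The sketch below shows the general idea of serializing a collection in one step. It is a pure-Python illustration with hypothetical function names, and the assumption that every record ends with a newline is made for this example rather than taken from scikit-bio.

```python
import io


def to_fasta(records):
    """Serialize (id, sequence) pairs as one FASTA-formatted string.

    Assumption for this sketch: each record is terminated by a newline.
    """
    return "".join(">%s\n%s\n" % (seq_id, seq) for seq_id, seq in records)


def write_fasta(records, handle):
    """Write all records to an open text handle in a single call."""
    handle.write(to_fasta(records))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_fasta([("s1", "AC-GT"), ("s2", "ACGGT")], buffer)
    print(buffer.getvalue(), end="")
```

Building the full string first keeps the file-handling code to one `write` call, at the cost of holding the whole output in memory.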
2 changes: 0 additions & 2 deletions qiime/assign_taxonomy.py
@@ -23,8 +23,6 @@
 from cStringIO import StringIO
 from collections import Counter, defaultdict

-from cogent import LoadSeqs, DNA
-
 from skbio.app.util import ApplicationNotFoundError
 from skbio.parse.sequences import parse_fasta

