diff --git a/ChangeLog.md b/ChangeLog.md
index 97eacd1dbe..eb8047e815 100644
--- a/ChangeLog.md
+++ b/ChangeLog.md
@@ -4,8 +4,9 @@ QIIME 1.8.0-dev (changes since 1.8.0 go here)
 * Removed ``-Y``/``--python_exe_fp`` and ``-N`` options from ``parallel_merge_otu_tables.py`` script as these are not available in any of the other parallel QIIME scripts and we do not have good reason to support them (see QIIME 1.6.0 release notes below for more details).
 * SciPy >= 0.13.0, pyqi 0.3.1, and the latest development version of scikit-bio are now required dependencies for a QIIME base install.
 * Added new options to make_otu_heatmap.py: --color_scheme, which allows users to choose from different color schemes [here](http://wiki.scipy.org/Cookbook/Matplotlib/Show_colormaps); --observation_metadata_category, which allows users to select a column other than taxonomy to use when labeling the rows; and --observation_metadata_level, which allows the user to specify which level in the hierarchical metadata category to use in creating the row labels.
-* -m/--mapping_fps is no longer required for split_libraries_fastq.py. The mapping file is not required when running with --barcode_type 'not-barcoded',but the mapping file would fail to validate when passing multiple sequence files and sample ids but a mapping file without barcodes (see #1400).
+* -m/--mapping_fps is no longer required for split_libraries_fastq.py. The mapping file is not required when running with --barcode_type 'not-barcoded',but the mapping file would fail to validate when passing multiple sequence files and sample ids but a mapping file without barcodes (see #1400).
 * Added alphabetical sorting option (based on boxplot labels) to make_distance_boxplots.py. Sorting by boxplot median can now be performed by passing ``--sort median`` (this was previously invoked by passing ``--sort``). Sorting alphabetically can be performed by passing ``--sort alphabetical``.
+* Removed insert_seqs_into_tree.py. This code needs additional testing and documentation, and was not widely used. We plan to add this support back in the future, and progress on that can be followed on [#1499](https://github.com/biocore/qiime/issues/1499).
 
 QIIME 1.8.0 (11 Dec 2013)
 =========================
@@ -26,7 +27,7 @@ QIIME 1.8.0 (11 Dec 2013)
 * Refactored beta_diversity_though_plots.py, jackknifed_beta_diversity.py, and core_diversity_analyses.py workflows to generate emperor PCoA plots instead of KiNG PCoA plots. QIIME now depends on Emperor 0.9.3. One interface change that will be noticeable to users is that the output PCoA plots from these workflows are no longer separated into "continuous" and "discrete" directories. Users can make these color choices from within emperor, so only one PCoA plot is necessary. This refactoring also involved script interface changes to beta_diversity_through_plots.py, which no longer generates 2d plots (interested users can call make_2d_plots.py directly - these won't be needed as often, since we no longer have a Java dependency) or distance histograms (these data are better accessed through make_distance_boxplots.py, which is better written and tested, though users can still call make_distance_histograms.py directly). As a result, beta_diversity_through_plots.py no longer takes the --suppress_2d_plots, --suppress_3d_plots, or --histogram_categories parameters, and now takes a new --suppress_emperor_plots parameter which can be used to disable PCoA plotting.
 * Modified compare_alpha_diversity.py to generate box plots in addition to statistics, and added the ability to pass multiple categories (instead of just a single category) on the command line. Also fixed issue where options contain ``dest`` parameter, and therefore could have a different name then their longform parameter name.
This involves several script interface changes: the --category option is now called --categories; script now takes --output_dir instead of --output_fp (because multiple files can be created, instead of just a single file); --alpha_diversity_filepath is now --alpha_diversity_fp; and --mapping_filepath is now --mapping_fp. * Refactored make_rarefaction_plots.py to add options --generate_per_sample_plots and --generate_average_tables. These are now suppressed by default to reduce run time and size of output. -* Refactored alpha_rarefaction.py to add option --retain_intermediate_files. Rarefied BIOM tables and alpha diversity results for each rarefied BIOM table are now removed by default to reduce size of output. +* Refactored alpha_rarefaction.py to add option --retain_intermediate_files. Rarefied BIOM tables and alpha diversity results for each rarefied BIOM table are now removed by default to reduce size of output. * Update to rtax 0.984. * Required PyNAST version is now 1.2.2. * Updated default taxonomy assigner to be the new uclust-based consensus taxonomy assigner. This was shown to be more accurate and faster than the existing methods in Bokulich, Rideout et al. (submitted). @@ -43,9 +44,9 @@ QIIME 1.7.0 (14 May 2013) ========================= * Required biom-format version is now 1.1.2. * core_qiime_analyses.py has been replaced with core_diversity_analyses.py. This follows a re-factoring to support only "downstream" analyses (i.e., starting with a BIOM table). This makes the script more widely applicable as it's now general to any BIOM data and/or different OTU picking strategies. -* Added support for usearch v6.1 OTU picking and chimera checking. This is in addition to existing support for usearch v5.2.236. +* Added support for usearch v6.1 OTU picking and chimera checking. This is in addition to existing support for usearch v5.2.236. 
* Added section on using usearch 6.1 chimera checking with ``identify_chimeric_seqs.py`` to "Chimera checking sequences with QIIME" tutorial. -* ``compare_alpha_diversity.py`` output now includes average alpha diversity values as well as the comparison p and t vals. +* ``compare_alpha_diversity.py`` output now includes average alpha diversity values as well as the comparison p and t vals. * ``compare_distance_matrices.py`` has a new option ``--variable_size_distance_classes`` for running Mantel correlogram over distance classes that vary in size (i.e. width) but contain the same number of pairwise distances in each class. * ``qiime.filter.sample_ids_from_category_state_coverage`` now supports splitting on a category. * Modified add_qiime_labels.py script to use standard metadata mapping file with a column specified for fasta file names to make more consistent with other scripts. @@ -80,7 +81,7 @@ QIIME 1.6.0 (18 Dec 2012) * Modified the parameters (de novo chimera detection, reference chimera detection, and size filtering) for USEARCH options with ``pick_otus.py`` to ``suppress_X`` and ``False`` by default, rather than ``True`` and turned off by calling, to make them more intuitive to use and work better with the workflow scripts. * Added a ``simpson_reciprocal`` measure of alpha diversity, which is ``1/D``, following the [definition here](http://www.countrysideinfo.co.uk/simpsons.htm) among other places. Note the measure ``reciprocal_simpson`` is ``1/simpson``, not ``1/D``. It was removed for clarity. * Added new script, ``compute_core_microbiome.py``, which identifies the core OTUs (i.e., those defined in some user-defined percentage of the samples). -* Major refactoring of parallel QIIME. Repetitive code was consolidated into the ParallelWrapper class, which may ultimately move to PyCogent. The only script interface changes are that the ``-Y/--python_exe_fp``, ``-N (serial script filepath)``, and ``-P/--poller_fp`` parameters are no longer available to the user. 
These were very infrequently (if ever) modified from defaults, so it doesn't make sense to continue to support these. These changes will allow for easier development of new parallel wrappers and facilitate changes to the underlying parallel functionality. +* Major refactoring of parallel QIIME. Repetitive code was consolidated into the ParallelWrapper class, which may ultimately move to PyCogent. The only script interface changes are that the ``-Y/--python_exe_fp``, ``-N (serial script filepath)``, and ``-P/--poller_fp`` parameters are no longer available to the user. These were very infrequently (if ever) modified from defaults, so it doesn't make sense to continue to support these. These changes will allow for easier development of new parallel wrappers and facilitate changes to the underlying parallel functionality. * Added new script, ``compare_taxa_summaries.py``, and supporting library and test code (``qiime/compare_taxa_summaries.py`` and ``tests/test_compare_taxa_summaries.py``) to allow for the comparison of taxa summary files, including sorting and filling, expected, and paired comparisons using pearson or spearman correlation. Added accompanying tutorial (``doc/tutorials/taxa_summary_comparison.rst``). * New script for parallel trie otu picker. * Made ``loaddata.r`` more robust when making mapping files, distance matrices, etc. compatible with each other. There were rare cases that caused some R functions (e.g. ``betadisper``) to fail if empty levels were left in the parsed mapping file. @@ -118,7 +119,7 @@ QIIME 1.5.0 (8 May 2012) ================================== * OTU tables are now stored on disk in the BIOM file format (see http://biom-format.org). The BIOM format webpage describes the motivation for the switch, but briefly it will support interoperability of related tools (e.g., QIIME/MG-RAST/mothur/VAMPS), and is a more efficient representation of data/metadata. 
The biom-format projects DenseTable and SparseTable objects are now used to represent OTU tables in memory. See the convert_biom.py script in the biom-format project for converting between 'classic' and BIOM formatted OTU tables. * Added a script, add_qiime_labels, that allows users to specify a directory of fasta files, along with a mapping file of SampleIDfasta file name, and combines the fasta files into a single combined fasta file with QIIME compatible labels. This is to handle situations where sequencing centers perform their own proprietary demultiplexing into separate fasta files per sample, instead of supplying raw data, but users would like to use QIIME to analyze their data. -* Added new compare_categories.py script to perform significance testing of categories/sample grouping. Added accompanying tutorial and new RExecutor class to util.py. Methods supported by compare_categories.py are Adonis, Anosim, BEST, Moran's I, MRPP, PERMANOVA, PERMDISP, and RDA. See doc/tutorials/category_comparison.rst for details. +* Added new compare_categories.py script to perform significance testing of categories/sample grouping. Added accompanying tutorial and new RExecutor class to util.py. Methods supported by compare_categories.py are Adonis, Anosim, BEST, Moran's I, MRPP, PERMANOVA, PERMDISP, and RDA. See doc/tutorials/category_comparison.rst for details. * compare_distance_matrices.py can now perform partial Mantel and Mantel correlogram tests in addition to the traditional Mantel test. Additionally, the script has several new options. Added new supporting tutorial and generic statistical method library code (doc/tutorials/distance_matrix_comparison.rst, qiime/stats.py, qiime/compare_distance_matrices.py), and two new classes (DistanceMatrix and MetadataMap) to util.py. * make_3d_plots.py added a new option "-s" which by default only outputs the unscaled points, whereas user can choose to show scaled, unscaled or both. 
* split_libraries_fastq.py default parameters updated based on evaluation of parameter settings on real and mock community data sets. A manuscript describing these results is currently in preparation. Briefly, the -p/--min_per_read_length parameter was modified to take a fraction of the full read length that is acceptable as the minimum, rather than an absolute (integer) length. Additionally the --max_bad_run_length default was changed from 1 to 3. @@ -128,19 +129,19 @@ QIIME 1.5.0 (8 May 2012) * Increased allowed ambiguous bases in split_libraries.py default values from 0 to 6. This is to accommodate the FLX+ long read technology which will often make ambiguous base calls but still have quality sequences following the ambiguous bases. Also added an option to truncate at the first "N" character option (-x) to allow users to retain these sequences but remove ambiguous bases if desired. * Updated merge_mapping_files.py to support merging of mapping files with overlapping sample ids. * Added support for CASAVA 1.8.0 quality scores in split_libraries_fastq.py. This involved deprecating the --last_bad_quality_char parameter in favor of --phred_quality_threshold. The latter is now computed from the former on the basis of detecting which version of CASAVA is being used from the fastq headers (unfortunately they don't include this information in the file, but it is possible to detect). -* Added the possibility of printing the function of the curve that was fit to the points in plot_semivariogram.py +* Added the possibility of printing the function of the curve that was fit to the points in plot_semivariogram.py * Replaced filter_otu_table.py with filter_otus_from_otu_table.py. The interface was redesigned, and the script was renamed for clarity. * Replaced filter_by_metadata.py with filter_samples_from_otu_table.py. The interface was redesigned, and the script was renamed for clarity. 
* Add new script to compute the coverage of a sample (or its inverse - the conditional uncovered probability) in the script conditional_uncovered_probability.py. Current estimators include lladser_pe, lladser_ci, esty_ci and robbins. * Updated usearch application wrapper, unit test, and documentation to handle usearch v5.2.32 as earlier version supported has bugs regarding consensus sequence generation (--consout parameter). * Added support for the RTAX taxonomy assignment. RTAX is designed for assigning taxonomy to paired-end reads, but additionally works for single end reads. QIIME currently supports RTAX 0.981. * Added the pick_subsampled_reference_otus_through_otu_tables.py, a more efficient open reference OTU picking workflow script for processing very large Illumina (or other) data sets. This is being used to process the Earth Microbiome Project data, so is designed to scale to tens of HiSeq runs. A new tutorial has been added that describes this process (doc/tutorials/open_reference_illumina_processing.rst). -* Added new script convert_fastqual_to_fastq.py to convert fasta/qual files to fastq. -* Added ability to output demultiplexed fastq from split_libraries_fastq.py. +* Added new script convert_fastqual_to_fastq.py to convert fasta/qual files to fastq. +* Added ability to output demultiplexed fastq from split_libraries_fastq.py. * Added a new sort option to summarize_taxa_through_plots.py which is very useful for web-interface. By default, sorting is turned off. * Added ability to output OTUs per sample instead of sequences per sample to per_library_stats.py. -* Updates and expansions to existing tutorials, including the using AWS and procrustes analysis tutorials. -* Added insert_seqs_into_tree.py to insert reads into an existing tree. This script wraps RAxML, ParsInsert, and PPlacer. +* Updates and expansions to existing tutorials, including the using AWS and procrustes analysis tutorials. 
+* Added insert_seqs_into_tree.py to insert reads into an existing tree. This script wraps RAxML, ParsInsert, and PPlacer. * Updated split_libraries_fastq.py to handle look only at the first n bases of the barcode reads, where n is automatically determined as the length of the barcodes in the mapping file. This feature is only use if all of the barcodes are the same length. It allows qiime to easily handle ignoring of a 13th base call in the barcode files - this is a technical artifact that sometimes arises. * Added new stats.py module that provides an API for running biogeographical statistical methods, as well as a framework for creating new method implementations in the future (this code was moved over from qiimeutils/microbiogeo). Also added two new classes to the util module (DistanceMatrix and MetadataMap) that are used by the stats module. * Updated Mothur OTU picker support from 1.6.0 to the latest (1.25.0) version. @@ -154,13 +155,13 @@ QIIME 1.4.0 (13 Dec 2011) * Testing of QIIME with new dependency versions, updating of warnings and test failures (in print_qiime_config.py). No code changes were required to support new versions. * split_libraries_fastq.py can now handle gzipped input files. * Addition of code and tutorial to support plotting of raw distance data in QIIME (scripts/make_distance_comparison_plots.py, scripts/make_distance_boxplots.py, qiime/group.py, doc/tutorials/creating_distance_comparison_plots.rst). -* Updates to many scripts to support PyCogent custom option types (new_filepath, new_dirpath, etc.). +* Updates to many scripts to support PyCogent custom option types (new_filepath, new_dirpath, etc.). * Fixes to workflows to fail immediately on certain types of bad inputs (e.g., missing tree when building UniFrac plots) rather than failing only when the script reaches the relevant step in the workflow. * Added ability to merge otu tables with overlapping sample ids (in merge_otu_tables.py). 
Values are summed when an OTU shows up in the same sample in different OTU tables. * Added a new script (filter_distance_matrix.py) to filter samples directly from distance matrices. * Added script nmds.py Non-Metric Multidimensional Scaling (NMDS). * Added in the calculation of standard error in rarefaction plots, since only standard deviation was calculated. Also added an optional option choice for this. -* Support for pick_otus_through_otu_table.py to allow for uclust_ref to be run in parallel with creation of new clusters. +* Support for pick_otus_through_otu_table.py to allow for uclust_ref to be run in parallel with creation of new clusters. * Added script distance_matrix_from_mapping.py which allows to create a distance matrix from a metadata column. * assign_taxonomy_reference_seqs_fp and assign_taxonomy_id_to_taxonomy_fp were added to qiime_config, which allows users to set defaults for the dataset they'd like to perform taxonomy assignment against. This works for the serial and parallel versions of assign_taxonomy for both BLAST and RDP. * Added in make_3d_plots.py the possibility of calculating RMS vectors, using two methods: avg and trajectory, to assess power (movement) of the trajectories. Additionally this feature will return the significance of the difference of the trajectories using ANOVA. @@ -193,7 +194,7 @@ QIIME 1.4.0 (13 Dec 2011) QIIME 1.3.0 (29 June 2011) ================================== * uclust and uclust_ref OTU pickers now incorporate a pre-filtering step where identical sequences are collapsed before calling uclust and then expanded after calling uclust. This gives a big speed improvement (5-20x) on reasonably sized input sets (>200k sequences) with no effect on the resulting OTUs. This is now the default behavior for pick_otus.py, and can be disabled by passing --suppress_uclust_prefilter_exact_match to pick_otus.py. 
-* Added ability to pass a file to sort_otu_table.py that contains a sorted list of sample ids, and use that information rather than the mapping file for sorting the OTU table. This allows users to, e.g., pass sorted mapping files as input. +* Added ability to pass a file to sort_otu_table.py that contains a sorted list of sample ids, and use that information rather than the mapping file for sorting the OTU table. This allows users to, e.g., pass sorted mapping files as input. * Added core_analyses.py script and workflow function. This plugs together many components of QIIME (split libraries, pick_otus_through_otu_table.py, beta_diversity_through_3d_plots.py, alpha_rarefaction.py) into a single command and parameters file. * Added script (split_otu_table_by_taxonomy.py) which will create taxon-specific OTU tables from a master OTU table for taxon-specific analyses of alpha/beta diversity, etc. * Changed default behavior of single_rarefaction.py. Now lineage information is included by default, but can be turned off with --suppress_include_lineages @@ -201,7 +202,7 @@ QIIME 1.3.0 (29 June 2011) * Interface changes to summarize_otu_by_cat.py. This allows the user to pass the output file name, rather than a directory where the output file should be written. * Parameter -r reassignment in parallel_assign_taxonomy_rdp.py. Now -r is used for reference_seqs_fp as before was for rdp_classifier_fp. * Added script inflate_denoiser_output.py to expand clusters to fasta representing all sequences. This allows denoiser results to be passed directly to the OTU pickers (and OTU picking workflows) which should greatly reduce the complexity of denoiser runs. The "Denoising 454 Data" tutorial has been updated to reflect how the pipeline should now be run. 
The denoising functionality was removed from the pick_otus_through_otu_table.py workflow script as that could only be used in very special circumstances - this allows us to focus our attention on supporting the new pipeline described in the updated tutorial. -* Reorganized output from pick_otus_through_otu_table.py to get rid of the confusing output directory structure. +* Reorganized output from pick_otus_through_otu_table.py to get rid of the confusing output directory structure. * Added script plot_semivariogram.py to plot semivariograms using two distance matrices. This script also plots a fitting curve of the data values. * Changed beta diversity scripts to do unweighted_unifrac,weighted_unifrac by default. * Changed output of summarize_taxa.py to a directory instead of filepath. This allows for multiple levels to be processed simultaneously. @@ -229,7 +230,7 @@ QIIME 1.3.0 (29 June 2011) * Added entropy filtering option to filter_alignment.py. This can be useful for position-filtering de novo alignments, or other alignments where no lanemask is available. * Added new script (count_seqs.py) which will count the number of sequences in one or more fasta file, as well as the mean/stddev sequence lengths, and print the results to stdout or file. * Added the plot_taxa_summary.py workflow script, which includes summarizing the OTU table by category. -* Overhauled the QIIME overview tutorial. +* Overhauled the QIIME overview tutorial. * Added new script (start_parallel_jobs_torque.py) which can be used for running parallel QIIME on clusters using torque for the queueing system. A new qiime_config value, torque_queue, can be specified to define the default queue. * Integrated the QIIME Denoiser (Reeder and Knight, 2011) into Qiime. * Added script (compare_alpha_diversity.py) for comparing rarefied alpha diversities across different mapping file categories. 
@@ -269,8 +270,8 @@ QIIME 1.2.0 (10 Nov 2010) * Added the ability to write out the flowgram file in process_sff.py, ability to define an output directory and convert Titanium reads to FLX length. * SRA submission protocol updated to perform human screening with uclust_ref against 16S reference sequences, rather than cdhit/blast against reference sequences. This can be a lot faster, and reduces the complexity of the code by requiring users to have uclust installed for the human screen rather than cdhit and blast. * Updated SRA protocol to allow users to skip the human screening step as this takes about 2/3 or more of the total analysis time, and is not relevant for non-human-derived samples (e.g., soil samples). -* Added ability to pass --max_accepts, --max_rejects, and --stable_sort through the uclust otu pickers. -* Added a -r parameters to pick_rep_set.py to allow users to pass "preferred" representative sequences in a fasta file. This is useful, for example, if users have picked OTUs with uclust_ref, and would like to use the reference sequences as their representatives, rather than sequences from their sequencing run. +* Added ability to pass --max_accepts, --max_rejects, and --stable_sort through the uclust otu pickers. +* Added a -r parameters to pick_rep_set.py to allow users to pass "preferred" representative sequences in a fasta file. This is useful, for example, if users have picked OTUs with uclust_ref, and would like to use the reference sequences as their representatives, rather than sequences from their sequencing run. * Renamed Qiime/scripts/jackknifed_upgma.py to Qiime/scripts/jackknifed_beta_diversity.py to reflect the addition of generating jackknifed 2d and 3d plots to this workflow script. * Updated parallel_multiple_rarefactions.py, parallel_alpha_diversity.py, and parallel_beta_diversity.py to use the jobs_to_start value for better control over the number of parallel runs. 
* uclust_ref otu picker now outputs an additional failures file listing the sequences which failed to cluster if the user passed --suppress_new_clusters. This is done for ease of parsing in downstream applications which want to do something special with these sequences. The failures list is no longer written to the log file (although the failures count is still written to the log file). @@ -288,7 +289,7 @@ QIIME 1.2.0 (10 Nov 2010) * Added capability for pairwise sample/sample, monte carlo significance tests. These are frequently done via the unifrac web interface. Users hitting max size limitations on the web can now thrash their own hardware. * Fixed a bug in make_rarefaction_plots where the table below the plots had column labels sorted by natsort, while the values in the table were sorted arbitrarily by dict keys. The plots themselves were fine. * Added a Procrustes analysis/plotting tutorial. -* Added code to exclude OTU ids from an OTU table when building the OTU table. This allows users to discard OTUs that were identified as chimeric. Accessible by passing --exclude_otus_fp to make_otu_table.py. +* Added code to exclude OTU ids from an OTU table when building the OTU table. This allows users to discard OTUs that were identified as chimeric. Accessible by passing --exclude_otus_fp to make_otu_table.py. * Modified identify_chimeric_sequences.py to no longer require the ref db in unaligned format when using chimeraSlayer. * Added a tutorial document on applying chimera checking in QIIME. * Added ability to pass -F T/F to parallel_blast to allow disabling of the low-complexity filtering in BLAST. @@ -313,11 +314,11 @@ QIIME 1.1.0 (14 May 2010) * Merged the make_rarefaction_averages into the make_rarefaction_plots script. Also removed the inputs (--rarefaction_ave and --ymax) options, since they are determined by the script. Also, restructured the output directory format and combined all metric data into one html. 
* Added the uclust_ref OTU picker, which uses uclust to pick OTUs against a reference collection. Sequences which are within the similarity threshold to a reference sequnece will cluster to an OTU defined by that reference sequence, and sequences which are outside of the similarity threshold to any reference sequence will form new OTUs. * The interface for exclude_seqs_by_blast.py has changed. -M and -W options are now lowercase to avoid conflicts with parallel scripts. Users can avoid formatting the database by passing --no_format_db. By default the files created by formatdb are now cleaned up. Users can choose not to clean up these files using the --no_clean option. Output file extensions have changed from ".excluded" to ".matching" and from ".screened" to ".non-matching" to be clear regardless of whether the sequences matching the database, or not matching the database, are to be excluded. A check was added for user-supplied BLAST databases in exclude_seqs_by_blast.py when run with --no_format_db: if the required files do not exist a parser error is thrown -* Added ability to chimera check sequences with ChimeraSlayer. See identify_chimeric_seqs.py for details. +* Added ability to chimera check sequences with ChimeraSlayer. See identify_chimeric_seqs.py for details. * Added workflow script for second-stage SRA submission, process_sra_submission.py. The SRA submission tutorial has been extensively updated to reflect the use of this new script. * Added the ability to supply a tree and sort the heatmap based on the supplied tree. * Added the ability to handle variable length barcodes, variable length primers, and no primers with split_libraries.py. Error-correction is not supported for barcode types other than golay_12 and hamming_8. split_libraries.py also now throws an error if the barcode length passed on the commands line does not match the barcode length in the mapping file. 
-* Updated the print_qiime_config.py script to print useful debugging information about the QIIME environment. +* Updated the print_qiime_config.py script to print useful debugging information about the QIIME environment. * Added high-level logging functionality to the workflow scripts. * Added RUN_ALIAS field to SRA experiment.txt spreadsheet in make_sra_submission.xml. @@ -340,14 +341,14 @@ QIIME 1.0.0 - (8 Apr 2010) * Modified the default value of jobs_to_start to be 1 -- because of the addition of the example cluster_jobs script, the default value of 24 no longer makes sense (if it ever really did...). Because the new script is built for multi-core/multi-proc environments, 24 is too high for most cases. Users will need to modify this value from 1 (corresponding to no parallelization) to a value that makes sense for their environment (e.g., 2 for dual core, or 24 to get the previous default). * Added colors module and tests to consolidate and standardize coloring code in QIIME - also updated the graphics scripts to use the colors module. * Added ability for user to specify the background colors of plots in prefs files or on the command line. -* Tweaked SRA submission routines in accordance with accepted format from JCVI's +* Tweaked SRA submission routines in accordance with accepted format from JCVI's survey of multiple body sites. * Fixed SF bug #2971581, which was an issue with the path to qiime's scripts directory not being determined correctly when qiime was installed using setup.py. qiime_config now contains a key (empty by defualt) for the qiime_scripts_dir. If this is not specified by the user, it is determined from the qiime project dir. * Renamed scripts/make_3d_prefs_file.py as scripts/make_prefs_file.py to reflect that the prefs files are now used by other scripts. * Changed behavior of color-by option to make_3d_plots, make_2d_plots, and make_rarefaction_plots, so if no -b option or prefs files is provided, scripts default to coloring by all values. 
Consequently, mapping files are also now required for these scripts. * Added a split_libraries_illumina.py script to handle processing of Illumina GAIIx data. * Added an additional rarefaction script for clarity. There are now 3 scripts to handle rarefaction: single_rarefaction takes one input otu table into one output table, allows manual naming, multiple_rarefactions makes auto-named rarefied otu tables at a range of depths, and multiple_rarefactios_even_depth.py makes auto-named tables all at the same depth. -* Added workflow unit tests (with timeout functionality). +* Added workflow unit tests (with timeout functionality). * Added default alpha and beta diversity metrics to qiime_parameters.txt. * Integrated Denoiser (Jens Reeder's 454 denoiser) wrappers, and tied this into the workflow scripts. * Added biplot functionality. make_3d_plots now takes the -t option (off by default) to include taxa on the pcoa plot. @@ -356,7 +357,7 @@ survey of multiple body sites. * Added sanity checks to print_qiime_config.py. This will now allow users to evaluate their environment, and should help with debugging. * Added new field to qiime_config (temp_dir) which will be used to specify where temp files should be written. Currently this is only used by the workflow tests, and is intended to allow users to specify something other than /tmp for cases when /tmp is not shared between all nodes that might be working on a job. This will eventually be used for all temp dir creation. * Added ability to make summary plots for a directory of coordinate files in make_3d_plots and make_2d_plots. The summary plot adds ellipsoidal confidence intervals around each point in the plot. - + QIIME 0.92 - (3 Mar 2010) @@ -369,7 +370,7 @@ QIIME 0.91 - (3 Mar 2010) * Addition of a uclust-based OTU picker. 
* Transfer of all command line interfaces from Qiime/qiime to Qiime/scripts -- this was an important change as it allowed us to get away from the previously one-to-one relationship between files in our library code (in Qiime/qiime) and the command line interfaces. * Standardized command line interfaces for all code in Qiime/scripts by using a new function, Qiime.qiime.util.parse_command_line_parameters to handle the command line interfaces. -* Moved to Sphinx for documentation, and developed a framework for extracting script documentation directly from the scripts to populate the web documentation. +* Moved to Sphinx for documentation, and developed a framework for extracting script documentation directly from the scripts to populate the web documentation. * Bug fixes through-out the code base, including but not limited to fixes for Sourceforge tickets: 2957503, 2953765, 2945548, 2942443, 2941925, 2941926, 2941717, 2941396, 2939588, 2939575, 2935939. * Updated the all_tests.py script to perform a minimal test of the scripts (getting help text works as expected), and to alert users if unit tests may be failing due to missing external applications, in which case they may not be critical. * Created a directory for pycogent_backports, where we can temporarily store new code that has been added to PyCogent, but which has not been added to a PyCogent release yet. This will allow us to keep QIIME's dependencies on the latest PyCogent version despite rapid and frequently related changes in both packages. 
diff --git a/doc/install/install.rst b/doc/install/install.rst index 58df9015bf..d3263bd80c 100644 --- a/doc/install/install.rst +++ b/doc/install/install.rst @@ -18,7 +18,7 @@ As a consequence of this 'pipeline' architecture, **QIIME has a lot of dependenc How to not install QIIME ======================== -Because QIIME is hard to install, we have attempted to shift this burden to the QIIME development group rather than our users by providing virtual machines with QIIME and all of its dependencies pre-installed. We, and third-party developers, have also created several automated installation procedures. These alternatives (`summarized here <../index.html#downloading-and-installing-qiime>`_) allow you to bypass the complex installation procedure and have access to a full, working QIIME installation. +Because QIIME is hard to install, we have attempted to shift this burden to the QIIME development group rather than our users by providing virtual machines with QIIME and all of its dependencies pre-installed. We, and third-party developers, have also created several automated installation procedures. These alternatives (`summarized here <../index.html#downloading-and-installing-qiime>`_) allow you to bypass the complex installation procedure and have access to a full, working QIIME installation. **We highly recommend going with one of these solutions if you're new to QIIME, or just want to test it out to see if it will do what you want.** @@ -91,7 +91,7 @@ The next are python packages not included in Canopy Express. Each of these can b * pyqi 0.3.1 (`src_pyqi `_) (license: BSD) * scikit-bio (latest development version) (`src_skbio `_) (license: BSD) -Next, there are two non-python dependencies required for the QIIME base package. These should be installed by following their respective install instructions. +Next, there are two non-python dependencies required for the QIIME base package. These should be installed by following their respective install instructions. 
* uclust 1.2.22q (`src_uclust `_) See :ref:`uclust install notes `. (licensed specially for Qiime and PyNAST users) * fasttree 2.1.3 (`src_fasttree `_) See `FastTree install instructions `_ (license: GPL) @@ -154,17 +154,17 @@ You should see output that looks like the following:: ................ ---------------------------------------------------------------------- Ran 16 tests in 0.440s - + OK -This indicates that you have a complete QIIME base install. +This indicates that you have a complete QIIME base install. You should next :ref:`run QIIME's unit tests `. You will experience some test failures as a result of not having a full QIIME install. If you have questions about these failures, you should post to the `QIIME Forum `_. QIIME full install (for access to advanced features in QIIME, and non-default processing pipelines) --------------------------------------------------------------------------------------------------- -The dependencies described below will support a full QIIME install. These are grouped by the features that each dependency will provide access to. Installation instructions should be followed for each individual package (e.g., from the project's website or README/INSTALL file). +The dependencies described below will support a full QIIME install. These are grouped by the features that each dependency will provide access to. Installation instructions should be followed for each individual package (e.g., from the project's website or README/INSTALL file). 
Alignment, tree-building, taxonomy assignment, OTU picking, and other data generation steps (required for non-default processing pipelines): @@ -181,8 +181,6 @@ Alignment, tree-building, taxonomy assignment, OTU picking, and other data gener * cdbtools (`src_cdbtools `_) * muscle 3.8.31 (`src_muscle `_) (Public domain) * rtax 0.984 (`src_rtax `_) (license: BSD) -* pplacer 1.1 (`src_pplacer `_) (license: GPL) -* ParsInsert 1.04 (`src_parsinsert `_) (license: GPL) * usearch v5.2.236 and/or usearch v6.1 (`src_usearch `_) (license: see http://www.drive5.com/usearch/nonprofit_form.html) **At this stage two different versions of usearch are supported.** usearch v5.2.236 is referred to as ``usearch`` in QIIME, and usearch v6.1 is referred to as ``usearch61``. Processing sff files: diff --git a/doc/scripts/insert_seqs_into_tree.rst b/doc/scripts/insert_seqs_into_tree.rst deleted file mode 100644 index 1653d83331..0000000000 --- a/doc/scripts/insert_seqs_into_tree.rst +++ /dev/null @@ -1,78 +0,0 @@ -.. _insert_seqs_into_tree: - -.. index:: insert_seqs_into_tree.py - -*insert_seqs_into_tree.py* -- Tree Insertion -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -**Description:** - -This script takes a set of aligned sequences (query) either in the same file as the aligned reference set or separated (depending on method) along with a starting tree and produces a new tree containing the query sequences. This script requires that the user is running Raxml v7.3.0, PPlacer git repository version and ParsInsert 1.0.4. - - -**Usage:** :file:`insert_seqs_into_tree.py [options]` - -**Input Arguments:** - -.. 
note:: - - - **[REQUIRED]** - - -i, `-`-input_fasta_fp - Path to the input fasta file - -o, `-`-output_dir - Path to the output directory - -t, `-`-starting_tree_fp - Starting Tree which you would like to insert into. - -r, `-`-refseq_fp - Filepath for reference alignment - - **[OPTIONAL]** - - -m, `-`-insertion_method - Method for aligning sequences. Valid choices are: pplacer, raxml_v730, parsinsert [default: raxml_v730] - -s, `-`-stats_fp - Stats file produced by tree-building software. REQUIRED if -m pplacer [default: None] - -p, `-`-method_params_fp - Parameters file containing method-specific parameters to use. Lines should be formatted as 'raxml:-m GTRCAT' (note this is not a standard QIIME parameters file, but a RAxML parameters file). [default: None] - - -**Output:** - -The result of this script produces a tree file (in Newick format) along with a log file containing the output from the underlying tool used for tree insertion. - - -**RAxML Example (default):** - -If you just want to use the default options, you can supply an alignment files where the query and reference sequences are included, along with a starting tree as follows: - -:: - - insert_seqs_into_tree.py -i aligned_query_seqs.fasta -r aligned_reference_seqs.fasta -t starting_tree.tre -o insertion_results - -**ParsInsert Example:** - -If you want to insert sequences using pplacer, you can supply a fasta file containg query sequences (aligned to reference sequences) along with the reference alignment, a starting tree and the stats file produced when building the starting tree via pplacer as follows: - -:: - - insert_seqs_into_tree.py -i aligned_query_seqs.fasta -r aligned_reference_seqs.fasta -t starting_tree.tre -o insertion_results -m parsinsert - -**Pplacer Example:** - -If you want to insert sequences using pplacer, you can supply a fasta file containg query sequences (aligned to reference sequences) along with the reference alignment, a starting tree and the stats file produced when building 
the starting tree via pplacer as follows: - -:: - - insert_seqs_into_tree.py -i aligned_query_seqs.fasta -r aligned_reference_seqs.fasta -t starting_tree.tre -o insertion_results -m pplacer - -**Parameters file:** - -Additionally, users can supply a parameters file to change the options of the underlying tools as follows: - -:: - - insert_seqs_into_tree.py -i aligned_query_seqs.fasta -r aligned_reference_seqs.fasta -t starting_tree.tre -o insertion_results -p raxml_parameters.txt - - diff --git a/qiime/adjust_seq_orientation.py b/qiime/adjust_seq_orientation.py index b3e07c7761..640acc1f86 100755 --- a/qiime/adjust_seq_orientation.py +++ b/qiime/adjust_seq_orientation.py @@ -12,7 +12,7 @@ from os.path import split, splitext from skbio.parse.sequences import parse_fasta -from cogent import DNA +from skbio.core.sequence import DNA usage_str = """usage: %prog [options] {-i INPUT_FASTA_FP} @@ -42,7 +42,7 @@ def rc_fasta_lines(fasta_lines, seq_desc_mapper=append_rc): """ for seq_id, seq in parse_fasta(fasta_lines): seq_id = seq_desc_mapper(seq_id) - seq = DNA.rc(seq.upper()) + seq = str(DNA(seq.upper()).rc()) yield seq_id, seq return diff --git a/qiime/align_seqs.py b/qiime/align_seqs.py index 6ff1575536..826cfe7455 100644 --- a/qiime/align_seqs.py +++ b/qiime/align_seqs.py @@ -25,17 +25,14 @@ from os import remove from numpy import median -from cogent import LoadSeqs, DNA -from cogent.core.alignment import DenseAlignment, SequenceCollection, Alignment -from cogent.core.sequence import DnaSequence as Dna -from cogent.parse.rfam import MinimalRfamParser, ChangedSequence - import brokit from brokit.infernal import cmalign_from_alignment import brokit.clustalw import brokit.muscle_v38 import brokit.mafft +from cogent import DNA as DNA_cogent +from cogent.parse.rfam import MinimalRfamParser, ChangedSequence from skbio.app.util import ApplicationNotFoundError from skbio.core.exception import RecordError from skbio.parse.sequences import parse_fasta @@ -43,6 +40,9 @@ from 
qiime.util import (FunctionWithParams, get_qiime_temp_dir) +from skbio.core.alignment import SequenceCollection, Alignment +from skbio.core.sequence import DNASequence +from skbio.parse.sequences import parse_fasta # Load PyNAST if it's available. If it's not, skip it if not but set up # to raise errors if the user tries to use it. @@ -115,7 +115,7 @@ def getResult(self, seq_path): seqs = self.getData(seq_path) params = dict( [(k, v) for (k, v) in self.Params.items() if k.startswith('-')]) - result = module.align_unaligned_seqs(seqs, moltype=DNA, params=params) + result = module.align_unaligned_seqs(seqs, moltype=DNA_cogent, params=params) return result def __call__(self, result_path=None, log_path=None, *args, **kwargs): @@ -131,7 +131,7 @@ def __init__(self, params): """Return new InfernalAligner object with specified params. """ _params = { - 'moltype': DNA, + 'moltype': DNA_cogent, 'Application': 'Infernal', } _params.update(params) @@ -156,9 +156,10 @@ def __call__(self, seq_path, result_path=None, log_path=None, moltype = self.Params['moltype'] # Need to make separate mapping for unaligned sequences - unaligned = SequenceCollection(candidate_sequences, MolType=moltype) - int_map, int_keys = unaligned.getIntMap(prefix='unaligned_') - int_map = SequenceCollection(int_map, MolType=moltype) + unaligned = SequenceCollection.from_fasta_records( + candidate_sequences.iteritems(), DNASequence) + mapped_seqs, new_to_old_ids = unaligned.int_map(prefix='unaligned_') + mapped_seq_tuples = [(k, str(v)) for k,v in mapped_seqs.iteritems()] # Turn on --gapthresh option in cmbuild to force alignment to full # model @@ -174,7 +175,6 @@ def __call__(self, seq_path, result_path=None, log_path=None, # are fragments. 
# Also turn on --gapthresh to use same gapthresh as was used to build # model - if cmalign_params is None: cmalign_params = {} cmalign_params.update({'--sub': True, '--gapthresh': 1.0}) @@ -186,20 +186,23 @@ def __call__(self, seq_path, result_path=None, log_path=None, # Align sequences to alignment including alignment gaps. aligned, struct_string = cmalign_from_alignment(aln=template_alignment, structure_string=struct, - seqs=int_map, + seqs=mapped_seq_tuples, moltype=moltype, include_aln=True, params=cmalign_params, cmbuild_params=cmbuild_params) # Pull out original sequences from full alignment. - infernal_aligned = {} + infernal_aligned = [] + # Get a dict of the identifiers to sequences (note that this is a + # cogent alignment object, hence the call to NamedSeqs) aligned_dict = aligned.NamedSeqs - for key in int_map.Names: - infernal_aligned[int_keys.get(key, key)] = aligned_dict[key] + for n, o in new_to_old_ids.iteritems(): + aligned_seq = aligned_dict[n] + infernal_aligned.append((o, aligned_seq)) # Create an Alignment object from alignment dict - infernal_aligned = Alignment(infernal_aligned, MolType=moltype) + infernal_aligned = Alignment.from_fasta_records(infernal_aligned, DNASequence) if log_path is not None: log_file = open(log_path, 'w') @@ -208,7 +211,7 @@ def __call__(self, seq_path, result_path=None, log_path=None, if result_path is not None: result_file = open(result_path, 'w') - result_file.write(infernal_aligned.toFasta()) + result_file.write(infernal_aligned.to_fasta()) result_file.close() return None else: @@ -248,12 +251,8 @@ def __call__(self, seq_path, result_path=None, log_path=None, for seq_id, seq in parse_fasta(open(template_alignment_fp)): # replace '.' characters with '-' characters template_alignment.append((seq_id, seq.replace('.', '-').upper())) - try: - template_alignment = LoadSeqs(data=template_alignment, moltype=DNA, - aligned=DenseAlignment) - except KeyError as e: - raise KeyError('Only ACGT-. 
characters can be contained in template alignments.' + - ' The offending character was: %s' % e) + template_alignment = Alignment.from_fasta_records( + template_alignment, DNASequence, validate=True) # initialize_logger logger = NastLogger(log_path) @@ -273,25 +272,28 @@ def __call__(self, seq_path, result_path=None, log_path=None, logger.record(str(self)) + for i, seq in enumerate(pynast_failed): + skb_seq = DNASequence(str(seq), identifier=seq.Name) + pynast_failed[i] = skb_seq + pynast_failed = SequenceCollection(pynast_failed) + + for i, seq in enumerate(pynast_aligned): + skb_seq = DNASequence(str(seq), identifier=seq.Name) + pynast_aligned[i] = skb_seq + pynast_aligned = Alignment(pynast_aligned) + if failure_path is not None: fail_file = open(failure_path, 'w') - for seq in pynast_failed: - fail_file.write(seq.toFasta()) - fail_file.write('\n') + fail_file.write(pynast_failed.to_fasta()) fail_file.close() if result_path is not None: result_file = open(result_path, 'w') - for seq in pynast_aligned: - result_file.write(seq.toFasta()) - result_file.write('\n') + result_file.write(pynast_aligned.to_fasta()) result_file.close() return None else: - try: - return LoadSeqs(data=pynast_aligned, aligned=DenseAlignment) - except ValueError: - return {} + return pynast_aligned def compute_min_alignment_length(seqs_f, fraction=0.75): diff --git a/qiime/assign_taxonomy.py b/qiime/assign_taxonomy.py index bc80880694..9449dc6ea9 100644 --- a/qiime/assign_taxonomy.py +++ b/qiime/assign_taxonomy.py @@ -23,8 +23,6 @@ from cStringIO import StringIO from collections import Counter, defaultdict -from cogent import LoadSeqs, DNA - from skbio.app.util import ApplicationNotFoundError from skbio.parse.sequences import parse_fasta diff --git a/qiime/extract_barcodes.py b/qiime/extract_barcodes.py index 5b97e876cf..de1ea3dd03 100644 --- a/qiime/extract_barcodes.py +++ b/qiime/extract_barcodes.py @@ -16,7 +16,7 @@ from re import compile from skbio.parse.sequences import parse_fastq 
-from cogent import DNA +from skbio.core.sequence import DNA from qiime.check_id_map import process_id_map from qiime.split_libraries_fastq import (check_header_match_pre180, @@ -210,7 +210,7 @@ def process_barcode_single_end_data(read1_data, bc_read = read1_data[sequence_index][:bc1_len] bc_qual = read1_data[quality_index][:bc1_len] if rev_comp_bc1: - bc_read = DNA.rc(bc_read) + bc_read = str(DNA(bc_read).rc()) bc_qual = bc_qual[::-1] bc_lines = format_fastq_record(read1_data[header_index], bc_read, bc_qual) @@ -317,10 +317,10 @@ def process_barcode_paired_end_data(read1_data, bc_qual1 = read1[quality_index][0:bc1_len] bc_qual2 = read2[quality_index][0:bc2_len] if rev_comp_bc1: - bc_read1 = DNA.rc(bc_read1) + bc_read1 = str(DNA(bc_read1).rc()) bc_qual1 = bc_qual1[::-1] if rev_comp_bc2: - bc_read2 = DNA.rc(bc_read2) + bc_read2 = str(DNA(bc_read2).rc()) bc_qual2 = bc_qual2[::-1] bc_lines = format_fastq_record(read1[header_index], @@ -393,7 +393,7 @@ def process_barcode_paired_stitched(read_data, if not found_primer_match: for curr_primer in reverse_primers: if curr_primer.search(read_data[sequence_index]): - read_seq = DNA.rc(read_seq) + read_seq = str(DNA(read_seq).rc()) read_qual = read_qual[::-1] found_primer_match = True break @@ -411,10 +411,10 @@ def process_barcode_paired_stitched(read_data, bc_qual2 = read_qual[-bc2_len:] if rev_comp_bc1: - bc_read1 = DNA.rc(bc_read1) + bc_read1 = str(DNA(bc_read1).rc()) bc_qual1 = bc_qual1[::-1] if rev_comp_bc2: - bc_read2 = DNA.rc(bc_read2) + bc_read2 = str(DNA(bc_read2).rc()) bc_qual2 = bc_qual2[::-1] if switch_bc_order: @@ -469,7 +469,7 @@ def process_barcode_in_label(read1_data, # Create fake quality scores bc1_qual = "F" * len(bc1_read) if rev_comp_bc1: - bc1_read = DNA.rc(bc1_read) + bc1_read = str(DNA(bc1_read).rc()) if read2_data: bc2_read =\ @@ -477,7 +477,7 @@ def process_barcode_in_label(read1_data, char_delineator)[-1][0:bc2_len] bc2_qual = "F" * len(bc2_read) if rev_comp_bc2: - bc2_read = DNA.rc(bc2_read) + 
bc2_read = str(DNA(bc2_read).rc()) else: bc2_read = "" bc2_qual = "" @@ -529,11 +529,11 @@ def get_primers(header, # Split on commas to handle pool of primers raw_forward_primers.update([upper(primer).strip() for primer in line[primer_ix].split(',')]) - raw_forward_rc_primers.update([DNA.rc(primer) for + raw_forward_rc_primers.update([str(DNA(primer).rc()) for primer in raw_forward_primers]) raw_reverse_primers.update([upper(primer).strip() for primer in line[rev_primer_ix].split(',')]) - raw_reverse_rc_primers.update([DNA.rc(primer) for + raw_reverse_rc_primers.update([str(DNA(primer).rc()) for primer in raw_reverse_primers]) if not raw_forward_primers: diff --git a/qiime/filter_alignment.py b/qiime/filter_alignment.py index 779fb2629d..f868622a1f 100755 --- a/qiime/filter_alignment.py +++ b/qiime/filter_alignment.py @@ -7,12 +7,10 @@ from os import mkdir, remove from collections import defaultdict -from numpy import nonzero, array, fromstring, repeat, bitwise_or, uint8, zeros,\ - arange, finfo -import numpy +from numpy import (nonzero, array, fromstring, repeat, bitwise_or, + uint8, zeros, arange, finfo) -from cogent.util.unit_test import TestCase, main -from cogent import LoadSeqs, DNA +from cogent import DNA from cogent.core.alignment import DenseAlignment from cogent.core.sequence import ModelDnaSequence from cogent.core.profile import Profile @@ -157,7 +155,7 @@ def remove_outliers(seqs, num_sigmas, fraction_seqs_for_stats=.95): diff_cutoff = seq_diffs_considered_sorted.mean() + \ num_sigmas * seq_diffs_considered_sorted.std() # mean + e.g.: 4 sigma - seq_idxs_to_keep = numpy.arange(len(seq_diffs))[seq_diffs <= diff_cutoff] + seq_idxs_to_keep = arange(len(seq_diffs))[seq_diffs <= diff_cutoff] filtered_aln = aln.getSubAlignment(seq_idxs_to_keep) return filtered_aln diff --git a/qiime/filter_otus_by_sample.py b/qiime/filter_otus_by_sample.py index 6eab02cd68..d067853ec1 100644 --- a/qiime/filter_otus_by_sample.py +++ b/qiime/filter_otus_by_sample.py @@ -12,8 
+12,9 @@ from string import strip import re -from cogent import LoadSeqs +from skbio.core.alignment import SequenceCollection +from skbio.core.sequence import DNA def filter_otus(otus, prefs): """filters the otus file based on which samples should be removed and @@ -47,9 +48,9 @@ def filter_aln_by_otus(aln, prefs): be removed""" filtered_seqs = [] removed_seqs = [] - for j in range(aln.getNumSeqs()): + for j in range(aln.sequence_count()): remove = False - aln_name = aln.Names[j] + aln_name = aln[j].identifier stripped_aln_name = aln_name.split(' ')[0].split('_') if len(stripped_aln_name) > 1: new_aln_name = ''.join(stripped_aln_name[:-1]) @@ -61,9 +62,9 @@ def filter_aln_by_otus(aln, prefs): remove = True if remove: - removed_seqs.append((aln_name, aln.getSeq(aln_name))) + removed_seqs.append((aln_name, str(aln[aln_name]))) else: - filtered_seqs.append((aln_name, aln.getSeq(aln_name))) + filtered_seqs.append((aln_name, str(aln[aln_name]))) return filtered_seqs, removed_seqs @@ -110,24 +111,26 @@ def filter_samples(prefs, data, dir_path='', filename=None): # write a fasta containing list of sequences removed from # representative set - try: - removed_seqs = LoadSeqs(data=removed_seqs, aligned=False) - except: + if len(removed_seqs) > 0: + removed_seqs = SequenceCollection.from_fasta_records( + [(e[0], str(e[1])) for e in removed_seqs], DNA) + else: raise ValueError( 'No sequences were removed. 
Did you specify the correct Sample ID?') output_filepath2 = '%s/%s_sremoved.fasta' % (dir_path, filename) output_file2 = open(output_filepath2, 'w') - output_file2.write(removed_seqs.toFasta()) + output_file2.write(removed_seqs.to_fasta()) output_file2.close() # write a fasta containing the filtered representative seqs - try: - filtered_seqs = LoadSeqs(data=filtered_seqs, aligned=False) - except: + if len(filtered_seqs) > 0: + filtered_seqs = SequenceCollection.from_fasta_records( + [(e[0], str(e[1])) for e in filtered_seqs], DNA) + else: raise ValueError( 'No sequences were remaining in the fasta file. Did you remove all Sample ID\'s?') output_filepath = '%s/%s_sfiltered.fasta' % (dir_path, filename) output_file = open(output_filepath, 'w') - output_file.write(filtered_seqs.toFasta()) + output_file.write(filtered_seqs.to_fasta()) output_file.close() diff --git a/qiime/insert_seqs_into_tree.py b/qiime/insert_seqs_into_tree.py deleted file mode 100644 index b0587671be..0000000000 --- a/qiime/insert_seqs_into_tree.py +++ /dev/null @@ -1,56 +0,0 @@ -#!/usr/bin/env python -# File created on 11 Oct 2011 -from __future__ import division - -__author__ = "Jesse Stombaugh" -__copyright__ = "Copyright 2011, The QIIME project" -__credits__ = ["Jesse Stombaugh"] -__license__ = "GPL" -__version__ = "1.8.0-dev" -__maintainer__ = "Jesse Stombaugh" -__email__ = "jesse.stombaugh@colorado.edu" - -from cogent.core.tree import PhyloNode -from cogent.parse.tree import DndParser -import re - - -def convert_tree_tips(align_map, tree_fp): - """ rename the starting tree to correspond to the new phylip names, - which are assigned to each sequence """ - - # flip key value pairs - tree_tip_to_seq_name = {} - for i in align_map: - tree_tip_to_seq_name[align_map[i]] = i - - # change the tip labels to phylip labels - open_tree = open(tree_fp) - tree = DndParser(open_tree, constructor=PhyloNode) - for node in tree.tips(): - node.Name = tree_tip_to_seq_name[node.Name] - - return tree - - -def 
write_updated_tree_file(updated_tree_fp, tree): - """ write the tree """ - - open_tree_fp = open(updated_tree_fp, 'w') - open_tree_fp.write(tree.getNewick(with_distances=True)) - open_tree_fp.close() - - return - - -def strip_and_rename_unwanted_labels_from_tree(align_map, tree): - """ rename tree tips to match the input sequence names """ - - # iterate over tips and strip unwanted text - for node in tree.tips(): - removed_query_str = re.sub('QUERY___', '', str(node.Name)) - new_node_name = re.sub('___\d+', '', str(removed_query_str)) - if new_node_name in align_map: - node.Name = align_map[new_node_name] - - return tree diff --git a/qiime/make_phylogeny.py b/qiime/make_phylogeny.py index 87c066dd31..857462542f 100644 --- a/qiime/make_phylogeny.py +++ b/qiime/make_phylogeny.py @@ -18,9 +18,10 @@ added.. """ -from cogent import LoadSeqs, DNA - +from cogent import DNA as DNA_cogent from skbio.parse.sequences import parse_fasta +from skbio.core.alignment import Alignment +from skbio.core.sequence import DNA from qiime.util import FunctionWithParams # app controllers that implement align_unaligned_seqs @@ -101,11 +102,14 @@ def getResult(self, aln_path, *args, **kwargs): # standard qiime says we just consider the first word as the unique ID # the rest of the defline of the fasta alignment often doesn't match # the otu names in the otu table - seqs = LoadSeqs( - aln_path, - Aligned=True, - label_to_name=lambda x: x.split()[0]) - result = module.build_tree_from_alignment(seqs, moltype=DNA) + with open(aln_path) as aln_f: + seqs = Alignment.from_fasta_records( + parse_fasta(aln_f, label_to_name=lambda x: x.split()[0]), + DNA) + # This ugly little line of code lets us pass a skbio Alignment when a + # a cogent alignment is expected. 
+ seqs.getIntMap = seqs.int_map + result = module.build_tree_from_alignment(seqs, moltype=DNA_cogent) try: root_method = kwargs['root_method'] diff --git a/qiime/parse.py b/qiime/parse.py index 34427036ce..fd20b82129 100644 --- a/qiime/parse.py +++ b/qiime/parse.py @@ -21,11 +21,13 @@ from numpy import concatenate, repeat, zeros, nan, asarray from numpy.random import permutation + from skbio.parse.record_finder import LabeledRecordFinder -from cogent.parse.tree import DndParser from skbio.parse.sequences import parse_fastq, FastaFinder +from skbio.core.sequence import DNA + +from cogent.parse.tree import DndParser from cogent.core.tree import PhyloNode -from cogent import DNA from qiime.quality import ascii_to_phred33, ascii_to_phred64 @@ -762,7 +764,7 @@ def parse_illumina_line(l, barcode_length, rev_comp_barcode, barcode = y_position_subfields[1][:barcode_length] if rev_comp_barcode: - barcode = DNA.rc(barcode) + barcode = str(DNA(barcode).rc()) result = { 'Full description': ':'.join(fields[:5]), diff --git a/qiime/pick_otus.py b/qiime/pick_otus.py index f91e21bfb2..f550a7e0f5 100644 --- a/qiime/pick_otus.py +++ b/qiime/pick_otus.py @@ -23,12 +23,13 @@ from tempfile import mkstemp from cogent.parse.mothur import parse_otu_list as mothur_parse -from cogent.util.misc import remove_files -from cogent import LoadSeqs, DNA -from cogent.util.misc import flatten +from cogent import DNA as DNA_cogent +from skbio.util.misc import remove_files, flatten from skbio.util.trie import CompressedTrie, fasta_to_pairlist from skbio.parse.sequences import parse_fasta +from skbio.core.alignment import SequenceCollection +from skbio.core.sequence import DNA from qiime.util import FunctionWithParams, get_qiime_temp_dir from qiime.sort import sort_fasta_by_abundance @@ -660,7 +661,6 @@ def __call__(self, seq_path, result_path=None, log_path=None, Togther with cd-hit this is a non-heuristic filter reduces run time a lot. 
Still a bit slower than the prefix_prefilter toggled with prefix_prefilter_length. """ - moltype = DNA log_lines = [] # create the params dict to pass to cd-hit-est -- IS THERE A @@ -672,29 +672,25 @@ def __call__(self, seq_path, result_path=None, log_path=None, cd_hit_params['-d'] = id_len # turn off id truncation cd_hit_params['-g'] = "1" if (prefix_prefilter_length is not None and trie_prefilter): - log_lines.append("Both prefilters selected. Deactivate \ - trie_prefilter") + log_lines.append("Both prefilters selected. Deactivate trie_prefilter") trie_prefilter = False if prefix_prefilter_length is not None: log_lines.append( 'Prefix-based prefiltering, prefix length: %d' % prefix_prefilter_length) - seqs, filter_map = self._prefilter_exact_prefixes( - parse_fasta(open(seq_path)), prefix_prefilter_length) + with open(seq_path) as seq_f: + seqs, filter_map = self._prefilter_exact_prefixes( + parse_fasta(seq_f, label_to_name=lambda x: x.split()[0]), + prefix_prefilter_length) log_lines.append( - 'Prefix-based prefiltering, post-filter num seqs: %d' - % len(seqs)) - + 'Prefix-based prefiltering, post-filter num seqs: %d' % len(seqs)) elif trie_prefilter: log_lines.append( 'Trie-based prefiltering') seqs, filter_map = self._prefilter_with_trie(seq_path) - log_lines.append( - 'Trie-based prefiltering, post-filter num seqs: %d' - % len(seqs)) - + 'Trie-based prefiltering, post-filter num seqs: %d' % len(seqs)) else: log_lines.append('No prefix-based prefiltering.') # Load the seq path. Right now, cdhit_clusters_from_seqs @@ -703,16 +699,16 @@ def __call__(self, seq_path, result_path=None, log_path=None, # to cd-hit-est. We may want to change that in the future # to avoid the overhead of loading large sequence collections # during this step. 
- seqs = LoadSeqs(seq_path, - moltype=moltype, - aligned=False, - label_to_name=lambda x: x.split()[0]) + with open(seq_path) as seq_f: + seqs = SequenceCollection.from_fasta_records( + parse_fasta(seq_f, label_to_name=lambda x: x.split()[0]), + DNA) + seqs = dict(seqs.iteritems()) # Get the clusters by running cd-hit-est against the # sequence collection clusters = cdhit_clusters_from_seqs( - seqs=seqs, moltype=moltype, params=cd_hit_params) - + seqs=seqs, moltype=DNA_cogent, params=cd_hit_params) if prefix_prefilter_length is not None or trie_prefilter: clusters = self._map_filtered_clusters_to_full_clusters( clusters, filter_map) diff --git a/qiime/split_libraries.py b/qiime/split_libraries.py index d63b4be5b0..9d6b7acd14 100644 --- a/qiime/split_libraries.py +++ b/qiime/split_libraries.py @@ -44,8 +44,10 @@ from numpy import __version__ as numpy_version import warnings warnings.filterwarnings('ignore', 'Not using MPI as mpi4py not found') + from skbio.parse.sequences import parse_fasta -from cogent import DNA, LoadSeqs +from cogent import DNA as DNA_cogent, LoadSeqs + from cogent.align.align import make_dna_scoring_dict, local_pairwise from cogent.util.misc import remove_files from skbio.core.sequence import DNASequence @@ -187,7 +189,7 @@ def scorer(x, y): '-': {'-': None}}) -def pair_hmm_align_unaligned_seqs(seqs, moltype=DNA, params={}): +def pair_hmm_align_unaligned_seqs(seqs, moltype=DNA_cogent, params={}): """ Checks parameters for pairwise alignment, returns alignment. 
@@ -1130,7 +1132,7 @@ def get_reverse_primers(id_map): # Convert to reverse complement of the primer so its in the # proper orientation with the input fasta sequences rev_primers[n[1]['BarcodeSequence']] =\ - [DNA.rc(curr_rev_primer) for curr_rev_primer in + [str(DNASequence(curr_rev_primer).rc()) for curr_rev_primer in (n[1]['ReversePrimer']).split(',')] return rev_primers @@ -1294,7 +1296,7 @@ def preprocess(fasta_files, qual_files, mapping_file, else: rev_primers = False - # *** Generate dictionary of {barcode: DNA.rc(ReversePrimer)} + # *** Generate dictionary of {barcode: DNA(ReversePrimer).rc()} # First check for ReversePrimer in headers, raise error if not found # Implement local alignment for primer after barcode is determined. # Add option to flag seq with error for rev_primer not found diff --git a/qiime/split_libraries_fastq.py b/qiime/split_libraries_fastq.py index 25736f7af3..e71cff96eb 100644 --- a/qiime/split_libraries_fastq.py +++ b/qiime/split_libraries_fastq.py @@ -13,12 +13,14 @@ from itertools import izip, cycle from os.path import split, splitext from os import makedirs + from numpy import log10, arange, histogram -from cogent import DNA + from skbio.parse.sequences import parse_fastq +from skbio.core.sequence import DNA + from qiime.format import (format_histogram_one_count, - format_split_libraries_fastq_log, - ) + format_split_libraries_fastq_log) from qiime.parse import is_casava_v180_or_later from qiime.hamming import decode_hamming_8 from qiime.golay import decode_golay_12 @@ -319,7 +321,7 @@ def process_fastq_single_end_read_file(fastq_read_f, else: barcode = bc_data[sequence_index] if rev_comp_barcode: - barcode = DNA.rc(barcode) + barcode = str(DNA(barcode).rc()) # Grab the read sequence sequence = read_data[1] # Grab the read quality @@ -378,7 +380,7 @@ def process_fastq_single_end_read_file(fastq_read_f, seqs_per_sample_counts[sample_id] = 1 if rev_comp: - sequence = DNA.rc(sequence) + sequence = str(DNA(sequence).rc()) quality = 
quality[::-1] fasta_header = '%s_%s %s orig_bc=%s new_bc=%s bc_diffs=%d' %\ diff --git a/qiime/truncate_reverse_primer.py b/qiime/truncate_reverse_primer.py index 58f91194f6..e98e85b2e6 100755 --- a/qiime/truncate_reverse_primer.py +++ b/qiime/truncate_reverse_primer.py @@ -13,7 +13,7 @@ from os.path import join, basename from skbio.parse.sequences import parse_fasta -from cogent import DNA +from skbio.core.sequence import DNA from qiime.split_libraries import local_align_primer_seq from qiime.check_id_map import process_id_map @@ -50,7 +50,7 @@ def get_rev_primer_seqs(mapping_fp): for curr_id in id_map.keys(): try: reverse_primers[curr_id] =\ - [DNA.rc(curr_rev_primer) for curr_rev_primer in + [str(DNA(curr_rev_primer).rc()) for curr_rev_primer in id_map[curr_id]['ReversePrimer'].split(',')] except KeyError: raise KeyError("Reverse primer not found in mapping file, " + diff --git a/qiime/util.py b/qiime/util.py index b121ce4fdb..bbd00ba453 100644 --- a/qiime/util.py +++ b/qiime/util.py @@ -48,13 +48,10 @@ SparseOrthologTable, SparsePathwayTable, SparseTable, SparseTaxonTable) -from cogent import LoadSeqs, Sequence, DNA from cogent.parse.tree import DndParser from cogent.cluster.procrustes import procrustes -from cogent.core.alignment import Alignment -from cogent.data.molecular_weight import DnaMW -from cogent.util.misc import remove_files, create_dir, handle_error_codes +from skbio.util.misc import remove_files, create_dir from skbio.app.util import ApplicationError, CommandLineApplication, FilePath from skbio.app.util import which from skbio.core.sequence import DNASequence @@ -265,22 +262,6 @@ def getBiomData(self, data): raise TypeError('Data is neither a path to a biom table or a' + ' biom table object.') - def getAlignment(self, aln_source): - """Returns parsed alignment from putative alignment source""" - if isinstance(aln_source, Alignment): - aln = aln_source - elif aln_source: - try: - aln = LoadSeqs(aln_source, Aligned=True) - except (TypeError, 
IOError, AssertionError): - raise AlignmentMissingError( - "Couldn't read alignment file at path: %s" % - aln_source) - else: - raise AlignmentMissingError(str(self.Name) + - " requires an alignment, but no alignment was supplied.") - return aln - def __call__(self, result_path=None, log_path=None, *args, **kwargs): """Returns the result of calling the function using the params dict. @@ -1345,7 +1326,7 @@ def get_split_libraries_fastq_params_and_file_types(fastq_fps, mapping_fp): # create set of reverse complement barcodes from mapping file revcomp_barcode_mapping_column = [] for i in barcode_mapping_column: - revcomp_barcode_mapping_column.append(DNA.rc(i)) + revcomp_barcode_mapping_column.append(str(DNASequence(i).rc())) barcode_len = len(i) revcomp_barcode_mapping_column = set(revcomp_barcode_mapping_column) @@ -1675,7 +1656,7 @@ def parseMetadataMap(lines): @staticmethod def mergeMappingFiles(mapping_files, no_data_value='no_data'): - """ Merge list of mapping files into a single mapping file + """ Merge list of mapping files into a single mapping file mapping_files: open file objects containing mapping data no_data_value: value to be used in cases where there is no diff --git a/scripts/adjust_seq_orientation.py b/scripts/adjust_seq_orientation.py index 9aa7f6b54c..f584c47efe 100755 --- a/scripts/adjust_seq_orientation.py +++ b/scripts/adjust_seq_orientation.py @@ -10,12 +10,12 @@ __maintainer__ = "Antonio Gonzalez Pena" __email__ = "antgonza@gmail.com" +from os.path import split, splitext + from qiime.util import parse_command_line_parameters, get_options_lookup from qiime.util import make_option -from os.path import split, splitext -from skbio.parse.sequences import parse_fasta -from cogent import DNA -from qiime.adjust_seq_orientation import rc_fasta_file, append_rc, null_seq_desc_mapper +from qiime.adjust_seq_orientation import (rc_fasta_file, + append_rc, null_seq_desc_mapper) options_lookup = get_options_lookup() diff --git 
a/scripts/filter_otus_by_sample.py b/scripts/filter_otus_by_sample.py index 1f9831e68e..3b5e263002 100755 --- a/scripts/filter_otus_by_sample.py +++ b/scripts/filter_otus_by_sample.py @@ -12,11 +12,15 @@ __maintainer__ = "Jesse Stombaugh" __email__ = "jesse.stombaugh@colorado.edu" +import os + +from skbio.core.alignment import SequenceCollection +from skbio.core.sequence import DNA +from skbio.parse.sequences import parse_fasta + from qiime.util import make_option from qiime.util import parse_command_line_parameters, get_options_lookup from qiime.filter_otus_by_sample import filter_samples, process_extract_samples -import os -from cogent import LoadSeqs from qiime.parse import fields_to_dict options_lookup = get_options_lookup() @@ -57,7 +61,8 @@ def main(): fasta_file = opts.input_fasta_fp # load the input alignment - data['aln'] = LoadSeqs(fasta_file, aligned=False) + data['aln'] = SequenceCollection.from_fasta_records( + parse_fasta(open(fasta_file)), DNA) # Load the otu file otu_path = opts.otu_map_fp diff --git a/scripts/insert_seqs_into_tree.py b/scripts/insert_seqs_into_tree.py deleted file mode 100755 index 70a2b242c2..0000000000 --- a/scripts/insert_seqs_into_tree.py +++ /dev/null @@ -1,199 +0,0 @@ -#!/usr/bin/env python -# File created on 11 Oct 2011 -from __future__ import division - -__author__ = "Jesse Stombaugh" -__copyright__ = "Copyright 2011, The QIIME project" -__credits__ = ["Jesse Stombaugh", "Jai Ram Rideout", "Emily TerAvest"] -__license__ = "GPL" -__version__ = "1.8.0-dev" -__maintainer__ = "Jesse Stombaugh" -__email__ = "jesse.stombaugh@colorado.edu" - -from StringIO import StringIO -from os.path import abspath, join, split, splitext - -from skbio.parse.sequences import parse_fasta - -from cogent.core.alignment import DenseAlignment -from cogent.core.moltype import DNA - -import brokit.pplacer -import brokit.parsinsert -import brokit.raxml_v730 - -from qiime.util import parse_command_line_parameters, make_option, \ - get_options_lookup, 
load_qiime_config, create_dir -from qiime.parse import parse_qiime_parameters -from cogent.core.moltype import DNA -from tempfile import mkstemp -from os import close -from os.path import abspath, join, split, splitext -from qiime.insert_seqs_into_tree import convert_tree_tips, \ - write_updated_tree_file, \ - strip_and_rename_unwanted_labels_from_tree - -options_lookup = get_options_lookup() - -qiime_config = load_qiime_config() - -insertion_method_choices = ['pplacer', 'raxml_v730', 'parsinsert'] - -script_info = {} -script_info['brief_description'] = "Tree Insertion" -script_info[ - 'script_description'] = "This script takes a set of aligned sequences (query) either in the same file as the aligned reference set or separated (depending on method) along with a starting tree and produces a new tree containing the query sequences. This script requires that the user is running Raxml v7.3.0, PPlacer git repository version and ParsInsert 1.0.4." -script_info['script_usage'] = [] -script_info['script_usage'].append( - ("""RAxML Example (default):""", - """If you just want to use the default options, you can supply an alignment files where the query and reference sequences are included, along with a starting tree as follows:""", - """%prog -i aligned_query_seqs.fasta -r aligned_reference_seqs.fasta -t starting_tree.tre -o insertion_results""")) -script_info['script_usage'].append( - ("""ParsInsert Example:""", - """If you want to insert sequences using pplacer, you can supply a fasta file containg query sequences (aligned to reference sequences) along with the reference alignment, a starting tree and the stats file produced when building the starting tree via pplacer as follows:""", - """%prog -i aligned_query_seqs.fasta -r aligned_reference_seqs.fasta -t starting_tree.tre -o insertion_results -m parsinsert""")) -script_info['script_usage'].append( - ("""Pplacer Example:""", - """If you want to insert sequences using pplacer, you can supply a fasta file containg query 
sequences (aligned to reference sequences) along with the reference alignment, a starting tree and the stats file produced when building the starting tree via pplacer as follows:""", - """%prog -i aligned_query_seqs.fasta -r aligned_reference_seqs.fasta -t starting_tree.tre -o insertion_results -m pplacer""")) -script_info['script_usage'].append( - ("""Parameters file:""", - """Additionally, users can supply a parameters file to change the options of the underlying tools as follows:""", - """%prog -i aligned_query_seqs.fasta -r aligned_reference_seqs.fasta -t starting_tree.tre -o insertion_results -p raxml_parameters.txt""")) -script_info[ - 'output_description'] = "The result of this script produces a tree file (in Newick format) along with a log file containing the output from the underlying tool used for tree insertion." -script_info['required_options'] = [ - options_lookup['fasta_as_primary_input'], - options_lookup['output_dir'], - make_option('-t', '--starting_tree_fp', - type='existing_filepath', help='Starting Tree which you would like to insert into.'), - make_option('-r', '--refseq_fp', - type='existing_filepath', dest='refseq_fp', help='Filepath for ' + - 'reference alignment'), -] -script_info['optional_options'] = [ - make_option('-m', '--insertion_method', - type='choice', help='Method for aligning' + - ' sequences. Valid choices are: ' + - ', '.join(insertion_method_choices) + ' [default: %default]', - choices=insertion_method_choices, - default='raxml_v730'), - make_option('-s', '--stats_fp', - type='existing_filepath', help='Stats file produced by tree-building software. REQUIRED if -m pplacer [default: %default]'), - make_option('-p', '--method_params_fp', - type='existing_filepath', help="Parameters file containing method-specific parameters to use. Lines should be formatted as 'raxml:-m GTRCAT' (note this is not a standard QIIME parameters file, but a RAxML parameters file). 
[default: %default]"), - -] -script_info['version'] = __version__ - - -def main(): - option_parser, opts, args =\ - parse_command_line_parameters(**script_info) - - parameters = {} - - # get the tree insertion method to use - module = opts.insertion_method - - # create output directory - output_dir = opts.output_dir - create_dir(output_dir) - - # list of tree insertion methods - tree_insertion_module_names = \ - {'raxml_v730': brokit.raxml_v730, - 'parsinsert': brokit.parsinsert, - 'pplacer': brokit.pplacer} - - # load input sequences and convert to phylip since the tools require - # the query sequences to phylip-compliant names - load_aln = parse_fasta(open(opts.input_fasta_fp, 'U')) - aln = DenseAlignment(load_aln) - seqs, align_map = aln.toPhylip() - - if opts.method_params_fp: - param_dict = parse_qiime_parameters(open(opts.method_params_fp, 'U')) - - if module == 'raxml_v730': - # load the reference sequences - load_ref_aln = \ - DenseAlignment(parse_fasta(open(opts.refseq_fp, 'U'))) - - # combine and load the reference plus query - combined_aln = parse_fasta(StringIO(load_ref_aln.toFasta() + - '\n' + aln.toFasta())) - # overwrite the alignment map - aln = DenseAlignment(combined_aln) - seqs, align_map = aln.toPhylip() - - try: - parameters = param_dict['raxml'] - except: - parameters = {} - - tree = convert_tree_tips(align_map, opts.starting_tree_fp) - - # write out the tree with phylip labels - updated_tree_fp = join(output_dir, - '%s_phylip_named_tree.tre' % (module)) - write_updated_tree_file(updated_tree_fp, tree) - - # set the primary parameters for raxml - parameters['-w'] = abspath(output_dir) + '/' - fd, parameters["-n"] = mkstemp() - close(fd) - parameters["-t"] = updated_tree_fp - - if "-f" not in parameters: - parameters["-f"] = 'v' - if "-m" not in parameters: - parameters["-m"] = 'GTRGAMMA' - - elif module == 'pplacer': - try: - parameters = param_dict['pplacer'] - except: - parameters = {} - - # make sure stats file is passed - if not 
opts.stats_fp: - raise IOError( - 'When using pplacer, the RAxML produced info file is required.') - - # set the primary parameters for pplacer - allow for user-defined - parameters['--out-dir'] = abspath(output_dir) + '/' - parameters["-t"] = opts.starting_tree_fp - parameters['-r'] = opts.refseq_fp - parameters['-s'] = opts.stats_fp - - elif module == 'parsinsert': - try: - parameters = param_dict['parsinsert'] - except: - parameters = {} - - # define log fp - log_fp = join(output_dir, 'parsinsert.log') - - # define tax assignment values fp - tax_assign_fp = join(output_dir, 'parsinsert_assignments.log') - parameters["-l"] = log_fp - parameters["-o"] = tax_assign_fp - parameters["-s"] = opts.refseq_fp - parameters["-t"] = opts.starting_tree_fp - - # call the module and return a tree object - result = \ - tree_insertion_module_names[module].insert_sequences_into_tree(seqs, - moltype=DNA, params=parameters) - - result_tree = strip_and_rename_unwanted_labels_from_tree(align_map, result) - - # write out the resulting tree - final_tree = join(output_dir, '%s_final_placement.tre' % (module)) - write_updated_tree_file(final_tree, result) - - -if __name__ == "__main__": - main() diff --git a/scripts/print_qiime_config.py b/scripts/print_qiime_config.py index 9767c7ad37..8090f7801c 100755 --- a/scripts/print_qiime_config.py +++ b/scripts/print_qiime_config.py @@ -684,51 +684,6 @@ def test_rtax_supported_version(self): "Unsupported rtax version. %s is required, but running %s." % ('.'.join(map(str, acceptable_version)), version_string)) - def test_pplacer_supported_version(self): - """pplacer is in path and version is supported """ - acceptable_version = [(1, 1), (1, 1)] - self.assertTrue(which('pplacer'), - "pplacer not found. 
This may or may not be a problem depending on " + - "which components of QIIME you plan to use.") - command = "pplacer --version" - proc = Popen(command, shell=True, universal_newlines=True, - stdout=PIPE, stderr=STDOUT) - stdout = proc.stdout.read() - version_string = stdout.strip()[1:4] - try: - version = tuple(map(int, version_string.split('.'))) - pass_test = version in acceptable_version - except ValueError: - pass_test = False - version_string = stdout - self.assertTrue(pass_test, - "Unsupported pplacer version. %s is required, but running %s." - % ('.'.join(map(str, acceptable_version)), version_string)) - - def test_ParsInsert_supported_version(self): - """ParsInsert is in path and version is supported """ - acceptable_version = ["1.04"] - self.assertTrue(which('ParsInsert'), - "ParsInsert not found. This may or may not be a problem depending on " + - "which components of QIIME you plan to use.") - command = "ParsInsert -v | grep App | awk '{print $3}'" - proc = Popen(command, shell=True, universal_newlines=True, - stdout=PIPE, stderr=STDOUT) - stdout = proc.stdout.read() - - # remove log file generated - remove_files(['ParsInsert.log'], error_on_missing=False) - - version_string = stdout.strip() - try: - pass_test = version_string in acceptable_version - except ValueError: - pass_test = False - version_string = stdout - self.assertTrue(pass_test, - "Unsupported ParsInsert version. %s is required, but running %s." 
- % ('.'.join(map(str, acceptable_version)), version_string)) - def test_usearch_supported_version(self): """usearch is in path and version is supported """ acceptable_version = [(5, 2, 236), (5, 2, 236)] diff --git a/scripts/split_libraries_fastq.py b/scripts/split_libraries_fastq.py index 9cf9df0c39..c81c443801 100755 --- a/scripts/split_libraries_fastq.py +++ b/scripts/split_libraries_fastq.py @@ -12,8 +12,9 @@ from os import rename -from cogent import DNA -from cogent.util.misc import safe_md5, create_dir +from skbio.util.misc import safe_md5, create_dir +from skbio.core.sequence import DNA + from qiime.util import parse_command_line_parameters, make_option, gzip_open from qiime.parse import parse_mapping_file from qiime.split_libraries_fastq import (process_fastq_single_end_read_file, @@ -279,7 +280,7 @@ def fastq_writer(h, s, q): barcode_to_sample_id = {} if rev_comp_mapping_barcodes: - barcode_to_sample_id = {DNA.rc(k): v for k, v in + barcode_to_sample_id = {str(DNA(k).rc()): v for k, v in barcode_to_sample_id.iteritems()} if barcode_type == 'golay_12': @@ -311,7 +312,7 @@ def fastq_writer(h, s, q): if barcode_read_fp is not None: log_f.write('Barcode read filepath: %s (md5: %s)\n\n' % - (barcode_read_fp, + (barcode_read_fp, safe_md5(open(barcode_read_fp)).hexdigest())) if barcode_read_fp.endswith('.gz'): diff --git a/tests/test_align_seqs.py b/tests/test_align_seqs.py index 22469b678d..814d2dfb21 100644 --- a/tests/test_align_seqs.py +++ b/tests/test_align_seqs.py @@ -13,15 +13,17 @@ from os import remove, close from os.path import getsize from tempfile import mkstemp -from cogent import LoadSeqs, DNA -from cogent.core.alignment import DenseAlignment, Alignment from unittest import TestCase, main + from numpy.testing import assert_almost_equal -from qiime.align_seqs import (compute_min_alignment_length, - Aligner, CogentAligner, PyNastAligner, InfernalAligner, - alignment_module_names, - ) +from skbio.core.exception import SequenceCollectionError +from 
skbio.core.alignment import SequenceCollection, Alignment +from skbio.core.sequence import DNA +from skbio.parse.sequences import parse_fasta +from qiime.align_seqs import (compute_min_alignment_length, + Aligner, CogentAligner, PyNastAligner, + InfernalAligner, alignment_module_names) def remove_files(list_of_filepaths, error_on_missing=True): missing = [] @@ -69,7 +71,8 @@ def setUp(self): fd, self.input_fp = mkstemp( prefix='CogentAlignerTests_', suffix='.fasta') close(fd) - open(self.input_fp, 'w').write(seqs_for_muscle) + with open(self.input_fp, 'w') as in_f: + in_f.write(seqs_for_muscle) self._paths_to_clean_up =\ [self.input_fp] @@ -119,16 +122,14 @@ def setUp(self): fd, self.infernal_test1_input_fp = mkstemp( prefix='InfernalAlignerTests_', suffix='.fasta') close(fd) - open( - self.infernal_test1_input_fp, - 'w').write( - infernal_test1_input_fasta) + with open(self.infernal_test1_input_fp, 'w') as in_f: + in_f.write('\n'.join(infernal_test1_input_fasta)) fd, self.infernal_test1_template_fp = mkstemp( prefix='InfernalAlignerTests_', suffix='template.sto') close(fd) - open(self.infernal_test1_template_fp, 'w').\ - write(infernal_test1_template_stockholm) + with open(self.infernal_test1_template_fp, 'w') as in_f: + in_f.write(infernal_test1_template_stockholm) # create temp file names (and touch them so we can reliably # clean them up) @@ -152,9 +153,9 @@ def setUp(self): self.infernal_test1_aligner = InfernalAligner({ 'template_filepath': self.infernal_test1_template_fp, }) - self.infernal_test1_expected_aln = \ - LoadSeqs(data=infernal_test1_expected_alignment, aligned=Alignment, - moltype=DNA) + self.infernal_test1_expected_aln = Alignment.from_fasta_records( + parse_fasta(infernal_test1_expected_alignment), + DNA) def test_call_infernal_test1_file_output(self): """InfernalAligner writes correct output files for infernal_test1 seqs @@ -168,7 +169,9 @@ def test_call_infernal_test1_file_output(self): "Result should be None when result path provided.") 
expected_aln = self.infernal_test1_expected_aln - actual_aln = LoadSeqs(self.result_fp, aligned=Alignment) + with open(self.result_fp) as result_f: + actual_aln = Alignment.from_fasta_records(parse_fasta( + result_f), DNA) self.assertEqual(actual_aln, expected_aln) def test_call_infernal_test1(self): @@ -178,7 +181,7 @@ def test_call_infernal_test1(self): expected_aln = self.infernal_test1_expected_aln expected_names = ['seq_1', 'seq_2', 'seq_3'] - self.assertEqual(sorted(actual_aln.Names), expected_names) + self.assertEqual(sorted(actual_aln.identifiers()), expected_names) self.assertEqual(actual_aln, expected_aln) @@ -190,31 +193,32 @@ def setUp(self): fd, self.pynast_test1_input_fp = mkstemp( prefix='PyNastAlignerTests_', suffix='.fasta') close(fd) - open(self.pynast_test1_input_fp, 'w').write(pynast_test1_input_fasta) + with open(self.pynast_test1_input_fp, 'w') as f: + f.write(pynast_test1_input_fasta) fd, self.pynast_test1_template_fp = mkstemp( prefix='PyNastAlignerTests_', suffix='template.fasta') close(fd) - open(self.pynast_test1_template_fp, 'w').\ - write(pynast_test1_template_fasta) + with open(self.pynast_test1_template_fp, 'w') as f: + f.write(pynast_test1_template_fasta) fd, self.pynast_test_template_w_dots_fp = mkstemp( prefix='PyNastAlignerTests_', suffix='template.fasta') close(fd) - open(self.pynast_test_template_w_dots_fp, 'w').\ - write(pynast_test1_template_fasta.replace('-', '.')) + with open(self.pynast_test_template_w_dots_fp, 'w') as f: + f.write(pynast_test1_template_fasta.replace('-', '.')) fd, self.pynast_test_template_w_u_fp = mkstemp( prefix='PyNastAlignerTests_', suffix='template.fasta') close(fd) - open(self.pynast_test_template_w_u_fp, 'w').\ - write(pynast_test1_template_fasta.replace('T', 'U')) + with open(self.pynast_test_template_w_u_fp, 'w') as f: + f.write(pynast_test1_template_fasta.replace('T', 'U')) fd, self.pynast_test_template_w_lower_fp = mkstemp( prefix='PyNastAlignerTests_', suffix='template.fasta') close(fd) - 
open(self.pynast_test_template_w_lower_fp, 'w').\ - write(pynast_test1_template_fasta.lower()) + with open(self.pynast_test_template_w_lower_fp, 'w') as f: + f.write(pynast_test1_template_fasta.lower()) # create temp file names (and touch them so we can reliably # clean them up) @@ -247,12 +251,11 @@ def setUp(self): 'min_len': 15, }) - self.pynast_test1_expected_aln = \ - LoadSeqs( - data=pynast_test1_expected_alignment, - aligned=DenseAlignment) - self.pynast_test1_expected_fail = \ - LoadSeqs(data=pynast_test1_expected_failure, aligned=False) + self.pynast_test1_expected_aln = Alignment.from_fasta_records( + parse_fasta(pynast_test1_expected_alignment), + DNA) + self.pynast_test1_expected_fail = SequenceCollection.from_fasta_records( + parse_fasta(pynast_test1_expected_failure), DNA) def test_call_pynast_test1_file_output(self): """PyNastAligner writes correct output files for pynast_test1 seqs @@ -266,12 +269,16 @@ def test_call_pynast_test1_file_output(self): "Result should be None when result path provided.") expected_aln = self.pynast_test1_expected_aln - actual_aln = LoadSeqs(self.result_fp, aligned=DenseAlignment) + with open(self.result_fp) as result_f: + actual_aln = Alignment.from_fasta_records(parse_fasta( + result_f), DNA) self.assertEqual(actual_aln, expected_aln) - actual_fail = LoadSeqs(self.failure_fp, aligned=False) - self.assertEqual(actual_fail.toFasta(), - self.pynast_test1_expected_fail.toFasta()) + with open(self.failure_fp) as failure_f: + actual_fail = SequenceCollection.from_fasta_records( + parse_fasta(failure_f), DNA) + self.assertEqual(actual_fail.to_fasta(), + self.pynast_test1_expected_fail.to_fasta()) def test_call_pynast_test1_file_output_alt_params(self): """PyNastAligner writes correct output files when no seqs align @@ -291,8 +298,10 @@ def test_call_pynast_test1_file_output_alt_params(self): "No alignable seqs should result in an empty file.") # all seqs reported to fail - actual_fail = LoadSeqs(self.failure_fp, aligned=False) 
- self.assertEqual(actual_fail.getNumSeqs(), 3) + with open(self.failure_fp) as failure_f: + actual_fail = SequenceCollection.from_fasta_records( + parse_fasta(failure_f), DNA) + self.assertEqual(actual_fail.sequence_count(), 3) def test_call_pynast_test1(self): """PyNastAligner: functions as expected when returing objects @@ -301,7 +310,7 @@ def test_call_pynast_test1(self): expected_aln = self.pynast_test1_expected_aln expected_names = ['1 description field 1..23', '2 1..23'] - self.assertEqual(actual_aln.Names, expected_names) + self.assertEqual(actual_aln.identifiers(), expected_names) self.assertEqual(actual_aln, expected_aln) def test_call_pynast_template_aln_with_dots(self): @@ -315,7 +324,7 @@ def test_call_pynast_template_aln_with_dots(self): expected_aln = self.pynast_test1_expected_aln expected_names = ['1 description field 1..23', '2 1..23'] - self.assertEqual(actual_aln.Names, expected_names) + self.assertEqual(actual_aln.identifiers(), expected_names) self.assertEqual(actual_aln, expected_aln) def test_call_pynast_template_aln_with_lower(self): @@ -329,7 +338,7 @@ def test_call_pynast_template_aln_with_lower(self): expected_aln = self.pynast_test1_expected_aln expected_names = ['1 description field 1..23', '2 1..23'] - self.assertEqual(actual_aln.Names, expected_names) + self.assertEqual(actual_aln.identifiers(), expected_names) self.assertEqual(actual_aln, expected_aln) def test_call_pynast_template_aln_with_U(self): @@ -339,7 +348,8 @@ def test_call_pynast_template_aln_with_U(self): 'template_filepath': self.pynast_test_template_w_u_fp, 'min_len': 15, }) - self.assertRaises(KeyError, pynast_aligner, self.pynast_test1_input_fp) + self.assertRaises(SequenceCollectionError, pynast_aligner, + self.pynast_test1_input_fp) def test_call_pynast_alt_pairwise_method(self): """PyNastAligner: alternate pairwise alignment method produces correct alignment @@ -362,7 +372,7 @@ def test_call_pynast_test1_alt_min_len(self): actual_aln = aligner( 
self.pynast_test1_input_fp) - expected_aln = {} + expected_aln = Alignment([]) self.assertEqual(actual_aln, expected_aln) @@ -375,7 +385,7 @@ def test_call_pynast_test1_alt_min_pct(self): 'min_pct': 100.0}) actual_aln = aligner(self.pynast_test1_input_fp) - expected_aln = {} + expected_aln = Alignment([]) self.assertEqual(actual_aln, expected_aln) @@ -420,7 +430,7 @@ def test_compute_min_alignment_length(self): >seq_2 GCTACGTAGCTAC >seq_3 -GCGGCTATTAGATCGTA""" +GCGGCTATTAGATCGTA""".split('\n') infernal_test1_template_stockholm = """# STOCKHOLM 1.0 seq_a TAGGCTCTGATATAATAGC-TCTC--------- @@ -435,7 +445,7 @@ def test_compute_min_alignment_length(self): --------GCTACG-TAGCTAC----------- >seq_3 -----GCGGCTATTAGATC-GTA---------- -""" +""".split('\n') pynast_test1_template_fasta = """>1 ACGT--ACGTAC-ATA-C-----CC-T-G-GTA-G-T--- @@ -469,11 +479,11 @@ def test_compute_min_alignment_length(self): ACCTACGT-TA--ATA-C-----CC-T-G-GTA-G-T--- >2 1..23 ACCTACGT-TA--ATA-C-----CC-T-G-GTA-G-T--- -""" +""".split('\n') pynast_test1_expected_failure = """>3 AA -""" +""".split('\n') # run unit tests if run from command-line if __name__ == '__main__': diff --git a/tests/test_assign_taxonomy.py b/tests/test_assign_taxonomy.py index d6ef806e0b..f425a0ca29 100644 --- a/tests/test_assign_taxonomy.py +++ b/tests/test_assign_taxonomy.py @@ -22,10 +22,11 @@ from unittest import TestCase, main from numpy.testing import assert_almost_equal, assert_allclose -from cogent import LoadSeqs -from cogent.app.util import ApplicationError -from cogent.util.misc import remove_files, create_dir +from skbio.app.util import ApplicationError +from skbio.util.misc import remove_files, create_dir from skbio.parse.sequences import parse_fasta +from skbio.core.alignment import SequenceCollection +from skbio.core.sequence import DNA from brokit.rdp_classifier import train_rdp_classifier from brokit.formatdb import build_blast_db_from_fasta_path @@ -160,7 +161,8 @@ def test_uclust_assigner_write_to_file(self): 
self.assertTrue(exists(self.output_log_fp)) # check that result has the expected lines - output_lines = list(open(self.output_txt_fp, 'U')) + with open(self.output_txt_fp, 'U') as f: + output_lines = list(f) self.assertTrue('q1\tA;F;G\t1.00\t1\n' in output_lines) self.assertTrue('q2\tA;H;I;J\t1.00\t1\n' in output_lines) @@ -409,11 +411,13 @@ def setUp(self): [self.id_to_taxonomy_fp, self.input_seqs_fp, self.reference_seqs_fp] - - open(self.id_to_taxonomy_fp, 'w').write(id_to_taxonomy_string) - open(self.input_seqs_fp, 'w').write(test_seq_coll.toFasta()) - self.test_seqs = test_seq_coll.items() - open(self.reference_seqs_fp, 'w').write(test_refseq_coll.toFasta()) + with open(self.id_to_taxonomy_fp, 'w') as f: + f.write(id_to_taxonomy_string) + with open(self.input_seqs_fp, 'w') as f: + f.write(test_seq_coll.to_fasta()) + self.test_seqs = [(e.identifier, str(e)) for e in test_seq_coll] + with open(self.reference_seqs_fp, 'w') as f: + f.write(test_refseq_coll.to_fasta()) self.expected1 = { 's1': @@ -569,7 +573,8 @@ def test_call_alt_input_types(self): self.assertRaises(AssertionError, p) # Functions with a list of (seq_id, seq) pairs - seqs = list(parse_fasta(open(self.input_seqs_fp))) + with open(self.input_seqs_fp) as f: + seqs = list(parse_fasta(f)) actual = p(seqs=seqs) self.assertEqual(actual, self.expected1) @@ -607,7 +612,8 @@ def test_seqs_to_taxonomy(self): self._paths_to_clean_up += files_to_remove # read the input file into (seq_id, seq) pairs - seqs = list(parse_fasta(open(self.input_seqs_fp))) + with open(self.input_seqs_fp) as f: + seqs = list(parse_fasta(f)) actual = p._seqs_to_taxonomy(seqs, blast_db, id_to_taxonomy_map) self.assertEqual(actual, self.expected1) @@ -695,7 +701,7 @@ def test_call_logs_run(self): # NOTE: Since p.params is a dict, the order of lines is not # guaranteed, so testing is performed to make sure that # the equal unordered lists of lines is present in actual and expected - + self.assertItemsEqual(log_file_str.split('\n'), 
log_file_exp) @@ -726,12 +732,16 @@ def setUp(self): self.reference_seqs_fp, self.read_1_seqs_fp, self.read_2_seqs_fp] - - open(self.id_to_taxonomy_fp, 'w').write(rtax_reference_taxonomy) - open(self.input_seqs_fp, 'w').write(rtax_test_repset_fasta) - open(self.reference_seqs_fp, 'w').write(rtax_reference_fasta) - open(self.read_1_seqs_fp, 'w').write(rtax_test_read1_fasta) - open(self.read_2_seqs_fp, 'w').write(rtax_test_read2_fasta) + with open(self.id_to_taxonomy_fp, 'w') as f: + f.write(rtax_reference_taxonomy) + with open(self.input_seqs_fp, 'w') as f: + f.write(rtax_test_repset_fasta) + with open(self.reference_seqs_fp, 'w') as f: + f.write(rtax_reference_fasta) + with open(self.read_1_seqs_fp, 'w') as f: + f.write(rtax_test_read1_fasta) + with open(self.read_2_seqs_fp, 'w') as f: + f.write(rtax_test_read2_fasta) def tearDown(self): remove_files(set(self._paths_to_clean_up), error_on_missing=False) @@ -1038,7 +1048,7 @@ def test_train_on_the_fly(self): """ input_seqs_file = NamedTemporaryFile( prefix='RdpTaxonAssignerTest_', suffix='.fasta') - input_seqs_file.write(test_seq_coll.toFasta()) + input_seqs_file.write(test_seq_coll.to_fasta()) input_seqs_file.seek(0) exp_assignments = rdp_trained_test1_expected_dict @@ -1079,7 +1089,7 @@ def test_train_on_the_fly_low_memory(self): """ input_seqs_file = NamedTemporaryFile( prefix='RdpTaxonAssignerTest_', suffix='.fasta') - input_seqs_file.write(test_seq_coll.toFasta()) + input_seqs_file.write(test_seq_coll.to_fasta()) input_seqs_file.seek(0) exp_assignments = rdp_trained_test1_expected_dict @@ -1186,7 +1196,7 @@ def test_call_with_properties_file(self): # confidence is above threshold self.assertTrue(actual[seq_id][1] >= min_confidence) # confidence roughly matches expected - assert_allclose(actual[seq_id][1], expected[seq_id][1], + assert_allclose(actual[seq_id][1], expected[seq_id][1], atol=0.1) # check if the assignment is correct -- this must happen # at least once per seq_id for the test to pass @@ -1222,7 
+1232,8 @@ def test_call_result_to_file(self): seq_path=self.tmp_seq_filepath, result_path=self.tmp_res_filepath, log_path=None) - actual = [l.strip() for l in open(self.tmp_res_filepath, 'r')] + with open(self.tmp_res_filepath, 'r') as f: + actual = [l.strip() for l in f] message = "Expected return value of None but observed %s" % retval self.assertTrue(retval is None, message) for j in range(num_seqs): @@ -1250,7 +1261,8 @@ def test_log(self): ) # open the actual log file and the expected file, and pass into lists - obs = [l.strip() for l in list(open(self.tmp_log_filepath, 'r'))] + with open(self.tmp_log_filepath) as f: + obs = [l.strip() for l in list(f)] exp = rdp_test1_log_file_contents.split('\n') # sort the lists as the entries are written from a dict, # so order may vary @@ -1321,11 +1333,13 @@ def test_get_rdp_taxonomy(self): def test_fix_output_file(self): fd, fp = mkstemp() close(fd) - open(fp, 'w').write(self.tagged_str) + with open(fp, 'w') as f: + f.write(self.tagged_str) s = RdpTrainingSet() s.fix_output_file(fp) - obs = open(fp).read() + with open(fp) as f: + obs = f.read() remove(fp) self.assertEqual(obs, self.untagged_str) @@ -1538,7 +1552,7 @@ def test_get_rdp_taxonomy(self): DQ260310\tArchaea;Euryarchaeota;Methanobacteriales;Methanobacterium EF503697\tArchaea;Crenarchaeota;uncultured;uncultured""" -test_seq_coll = LoadSeqs(data=[ +test_seq_coll = SequenceCollection.from_fasta_records([ ('s1', 
'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTAGATGAATAAGGGGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCCCGAGTGGTCGGGATTTTTATTGGGCCTAAAGCGTCCGTAGCCGGGCGTGCAAGTCATTGGTTAAATATCGGGTCTTAAGCCCGAACCTGCTAGTGATACTACACGCCTTGGGACCGGAAGAGGCAAATGGTACGTTGAGGGTAGGGGTGAAATCCTGTAATCCCCAACGGACCACCGGTGGCGAAGCTTGTTCAGTCATGAACAACTCTACACAAGGCGATTTGCTGGGACGGATCCGACGGTGAGGGACGAAACCCAGGGGAGCGAGCGGGATTAGATACCCCGGTAGTCCTGGGCGTAAACGATGCGAACTAGGTGTTGGCGGAGCCACGAGCTCTGTCGGTGCCGAAGCGAAGGCGTTAAGTTCGCCGCCAGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCAC'), ('s2', @@ -1549,9 +1563,9 @@ def test_get_rdp_taxonomy(self): 'GATACCCCCGGAAACTGGGGATTATACCGGATATGTGGGGCTGCCTGGAATGGTACCTCATTGAAATGCTCCCGCGCCTAAAGATGGATCTGCCGCAGAATAAGTAGTTTGCGGGGTAAATGGCCACCCAGCCAGTAATCCGTACCGGTTGTGAAAACCAGAACCCCGAGATGGAAACTGAAACAAAGGTTCAAGGCCTACCGGGCACAACAAGCGCCAAAACTCCGCCATGCGAGCCATCGCGACGGGGGAAAACCAAGTACCACTCCTAACGGGGTGGTTTTTCCGAAGTGGAAAAAGCCTCCAGGAATAAGAACCTGGGCCAGAACCGTGGCCAGCCGCCGCCGTTACACCCGCCAGCTCGAGTTGTTGGCCGGTTTTATTGGGGCCTAAAGCCGGTCCGTAGCCCGTTTTGATAAGGTCTCTCTGGTGAAATTCTACAGCTTAACCTGTGGGAATTGCTGGAGGATACTATTCAAGCTTGAAGCCGGGAGAAGCCTGGAAGTACTCCCGGGGGTAAGGGGTGAAATTCTATTATCCCCGGAAGACCAACTGGTGCCGAAGCGGTCCAGCCTGGAACCGAACTTGACCGTGAGTTACGAAAAGCCAAGGGGCGCGGACCGGAATAAAATAACCAGGGTAGTCCTGGCCGTAAACGATGTGAACTTGGTGGTGGGAATGGCTTCGAACTGCCCAATTGCCGAAAGGAAGCTGTAAATTCACCCGCCTTGGAAGTACGGTCGCAAGACTGGAACCTAAAAGGAATTGGCGGGGGGACACCACAACGCGTGGAGCCTGGCGGTTTTATTGGGATTCCACGCAGACATCTCACTCAGGGGCGACAGCAGAAATGATGGGCAGGTTGATGACCTTGCTTGACAAGCTGAAAAGGAGGTGCAT'), ('s5', 
'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC'), - ('s6', 'ATAGTAGGTGATTGCGAAGACCGCGGAACCGGGACCTAGCACCCAGCCTGTACCGAGGGATGGGGAGCTGTGGCGGTCCACCGACGACCCTTTGTGACAGCCGATTCCTACAATCCCAGCAACTGCAATGATCCACTCTAGTCGGCATAACCGGGAATCGTTAACCTGGTAGGGTTCTCTACGTCTGAGTCTACAGCCCAGAGCAGTCAGGCTACTATACGGTTTGCTGCATTGCATAGGCATCGGTCGCGGGCACTCCTCGCGGTTTCAGCTAGGGTTTAAATGGAGGGTCGCTGCATGAGTATGCAAATAGTGCCACTGCTCTGATACAGAGAAGTGTTGATATGACACCTAAGACCTGGTCACAGTTTTAACCTGCCTACGCACACCAGTGTGCTATTGATTAACGATATCGGTAGACACGACCTTGGTAACCTGACTAACCTCATGGAAAGTGACTAGATAAATGGACCGGAGCCAACTTTCACCCGGAAAACGGACCGACGAATCGTCGTAGACTACCGATCTGACAAAATAAGCACGAGGGAGCATGTTTTGCGCAGGCTAGCCTATTCCCACCTCAAGCCTCGAGAACCAAGACGCCTGATCCGGTGCTGCACGAAGGGTCGCCTCTAGGTAAGGAGAGCTGGCATCTCCAGATCCGATATTTTACCCAACCTTTGCGCGCTCAGATTGTTATAGTGAAACGATTTAAGCCTGAACGGAGTTCCGCTCCATATGTGGGTTATATATGTGAGATGTATTAACTTCCGCAGTTGTCTCTTTCGGTGCAGTACGCTTGGTATGTGTCTCAAATAATCGGTATTATAGTGATCTGAGAGGTTTTAAG')], aligned=False) + ('s6', 
'ATAGTAGGTGATTGCGAAGACCGCGGAACCGGGACCTAGCACCCAGCCTGTACCGAGGGATGGGGAGCTGTGGCGGTCCACCGACGACCCTTTGTGACAGCCGATTCCTACAATCCCAGCAACTGCAATGATCCACTCTAGTCGGCATAACCGGGAATCGTTAACCTGGTAGGGTTCTCTACGTCTGAGTCTACAGCCCAGAGCAGTCAGGCTACTATACGGTTTGCTGCATTGCATAGGCATCGGTCGCGGGCACTCCTCGCGGTTTCAGCTAGGGTTTAAATGGAGGGTCGCTGCATGAGTATGCAAATAGTGCCACTGCTCTGATACAGAGAAGTGTTGATATGACACCTAAGACCTGGTCACAGTTTTAACCTGCCTACGCACACCAGTGTGCTATTGATTAACGATATCGGTAGACACGACCTTGGTAACCTGACTAACCTCATGGAAAGTGACTAGATAAATGGACCGGAGCCAACTTTCACCCGGAAAACGGACCGACGAATCGTCGTAGACTACCGATCTGACAAAATAAGCACGAGGGAGCATGTTTTGCGCAGGCTAGCCTATTCCCACCTCAAGCCTCGAGAACCAAGACGCCTGATCCGGTGCTGCACGAAGGGTCGCCTCTAGGTAAGGAGAGCTGGCATCTCCAGATCCGATATTTTACCCAACCTTTGCGCGCTCAGATTGTTATAGTGAAACGATTTAAGCCTGAACGGAGTTCCGCTCCATATGTGGGTTATATATGTGAGATGTATTAACTTCCGCAGTTGTCTCTTTCGGTGCAGTACGCTTGGTATGTGTCTCAAATAATCGGTATTATAGTGATCTGAGAGGTTTTAAG')], DNA) -test_refseq_coll = LoadSeqs(data=[ +test_refseq_coll = SequenceCollection.from_fasta_records([ ('AY800210', 'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTAGATGAATAAGGGGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCCCGAGTGGTCGGGATTTTTATTGGGCCTAAAGCGTCCGTAGCCGGGCGTGCAAGTCATTGGTTAAATATCGGGTCTTAAGCCCGAACCTGCTAGTGATACTACACGCCTTGGGACCGGAAGAGGCAAATGGTACGTTGAGGGTAGGGGTGAAATCCTGTAATCCCCAACGGACCACCGGTGGCGAAGCTTGTTCAGTCATGAACAACTCTACACAAGGCGATTTGCTGGGACGGATCCGACGGTGAGGGACGAAACCCAGGGGAGCGAGCGGGATTAGATACCCCGGTAGTCCTGGGCGTAAACGATGCGAACTAGGTGTTGGCGGAGCCACGAGCTCTGTCGGTGCCGAAGCGAAGGCGTTAAGTTCGCCGCCAGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCAC'), ('EU883771', @@ -1560,7 +1574,7 @@ def test_get_rdp_taxonomy(self): 
'AAGAATGGGGATAGCATGCGAGTCACGCCGCAATGTGTGGCATACGGCTCAGTAACACGTAGTCAACATGCCCAGAGGACGTGGACACCTCGGGAAACTGAGGATAAACCGCGATAGGCCACTACTTCTGGAATGAGCCATGACCCAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGGAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCACGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCCGCTTAACGGATGGGCTGCGGAGGATACTGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCTTTGATCTACTGAAGACCACCAGTGGTGAAGGCGGTTCGCCAGAACGCGCTCGAACGGTGAGGATGAAAGCTGGGGGAGCAAACCGGAATAGATACCCGAGTAATCCCAACTGTAAACGATGGCAACTCGGGGATGGGTTGGCCTCCAACCAACCCCATGGCCGCAGGGAAGCCGTTTAGCTCTCCCGCCTGGGGAATACGGTCCGCAGAATTGAACCTTAAAGGAATTTGGCGGGGAACCCCCACAAGGGGGAAAACCGTGCGGTTCAATTGGAATCCACCCCCCGGAAACTTTACCCGGGCGCG'), ('DQ260310', 'GATACCCCCGGAAACTGGGGATTATACCGGATATGTGGGGCTGCCTGGAATGGTACCTCATTGAAATGCTCCCGCGCCTAAAGATGGATCTGCCGCAGAATAAGTAGTTTGCGGGGTAAATGGCCACCCAGCCAGTAATCCGTACCGGTTGTGAAAACCAGAACCCCGAGATGGAAACTGAAACAAAGGTTCAAGGCCTACCGGGCACAACAAGCGCCAAAACTCCGCCATGCGAGCCATCGCGACGGGGGAAAACCAAGTACCACTCCTAACGGGGTGGTTTTTCCGAAGTGGAAAAAGCCTCCAGGAATAAGAACCTGGGCCAGAACCGTGGCCAGCCGCCGCCGTTACACCCGCCAGCTCGAGTTGTTGGCCGGTTTTATTGGGGCCTAAAGCCGGTCCGTAGCCCGTTTTGATAAGGTCTCTCTGGTGAAATTCTACAGCTTAACCTGTGGGAATTGCTGGAGGATACTATTCAAGCTTGAAGCCGGGAGAAGCCTGGAAGTACTCCCGGGGGTAAGGGGTGAAATTCTATTATCCCCGGAAGACCAACTGGTGCCGAAGCGGTCCAGCCTGGAACCGAACTTGACCGTGAGTTACGAAAAGCCAAGGGGCGCGGACCGGAATAAAATAACCAGGGTAGTCCTGGCCGTAAACGATGTGAACTTGGTGGTGGGAATGGCTTCGAACTGCCCAATTGCCGAAAGGAAGCTGTAAATTCACCCGCCTTGGAAGTACGGTCGCAAGACTGGAACCTAAAAGGAATTGGCGGGGGGACACCACAACGCGTGGAGCCTGGCGGTTTTATTGGGATTCCACGCAGACATCTCACTCAGGGGCGACAGCAGAAATGATGGGCAGGTTGATGACCTTGCTTGACAAGCTGAAAAGGAGGTGCAT'), - ('EF503697', 
'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC')], aligned=False) + ('EF503697', 'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC')], DNA) # sample data copied from GreenGenes diff --git a/tests/test_filter_otus_by_sample.py b/tests/test_filter_otus_by_sample.py index 881eefdb6c..176d8b4603 
100644
--- a/tests/test_filter_otus_by_sample.py
+++ b/tests/test_filter_otus_by_sample.py
@@ -14,8 +14,12 @@
 from os.path import exists
 from unittest import TestCase, main
 from os import remove
-from cogent import LoadSeqs
 import shutil
+
+from skbio.core.alignment import SequenceCollection
+from skbio.core.sequence import DNA
+from skbio.parse.sequences import parse_fasta
+
 from qiime.filter_otus_by_sample import (filter_otus, filter_aln_by_otus,
                                          process_extract_samples)
 
@@ -68,7 +72,7 @@ def test_filter_aln_by_otus(self):
         exp2 = []
         exp2.append(('SampleB', 'CCCCCCC'))
         exp2.append(('SampleC', 'GGGGGGGGGGGGGG'))
-        aln = LoadSeqs(data=self.aln, aligned=False)
+        aln = SequenceCollection.from_fasta_records(self.aln, DNA)
 
         obs1, obs2 = filter_aln_by_otus(aln, self.prefs)
 
diff --git a/tests/test_identify_chimeric_seqs.py b/tests/test_identify_chimeric_seqs.py
index 890030c559..d7876386c3 100755
--- a/tests/test_identify_chimeric_seqs.py
+++ b/tests/test_identify_chimeric_seqs.py
@@ -14,10 +14,11 @@
 from os.path import exists, split, splitext, join
 from shutil import rmtree
 from tempfile import mkstemp, mkdtemp
-
-from cogent import LoadSeqs, DNA
 from unittest import TestCase, main
-from cogent.util.misc import remove_files
+
+from skbio.util.misc import remove_files
+from skbio.core.alignment import SequenceCollection
+from skbio.core.sequence import DNA
 
 from brokit.formatdb import build_blast_db_from_fasta_file
 
@@ -49,9 +50,12 @@ def setUp(self):
             self.input_seqs_fp,
             self.reference_seqs_fp]
 
-        open(self.id_to_taxonomy_fp, 'w').write(id_to_taxonomy_string)
-        open(self.input_seqs_fp, 'w').write(test_seq_coll.toFasta())
-        open(self.reference_seqs_fp, 'w').write(test_refseq_coll.toFasta())
+        with open(self.id_to_taxonomy_fp, 'w') as f:
+            f.write(id_to_taxonomy_string)
+        with open(self.input_seqs_fp, 'w') as f:
+            f.write(test_seq_coll.to_fasta())
+        with open(self.reference_seqs_fp, 'w') as f:
+            f.write(test_refseq_coll.to_fasta())
 
         self.bcc = None
 
@@ -92,7 +96,7 @@ def
test_init_creates_db(self): def test_function_w_preexisting_blastdb(self): blast_db, db_files_to_remove = \ build_blast_db_from_fasta_file( - test_refseq_coll.toFasta().split('\n')) + test_refseq_coll.to_fasta().split('\n')) self._paths_to_clean_up += db_files_to_remove params = {'id_to_taxonomy_fp': self.id_to_taxonomy_fp, 'reference_seqs_fp': None, @@ -231,7 +235,7 @@ def test_get_taxonomy(self): params = {'id_to_taxonomy_fp': self.id_to_taxonomy_fp, 'reference_seqs_fp': self.reference_seqs_fp} self.bcc = BlastFragmentsChimeraChecker(params) - s1 = test_seq_coll.getSeq('s1') + s1 = test_seq_coll['s1'] actual = self.bcc._get_taxonomy(str(s1)) expected = "Archaea;Euryarchaeota;Halobacteriales;uncultured" self.assertEqual(actual, expected) @@ -708,7 +712,7 @@ def test_merge_clusters_chimeras(self): EF503697\tArchaea;Crenarchaeota;uncultured;uncultured""" -test_refseq_coll = LoadSeqs(data=[ +test_refseq_coll = SequenceCollection.from_fasta_records([ ('AY800210', 'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTAGATGAATAAGGGGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCCCGAGTGGTCGGGATTTTTATTGGGCCTAAAGCGTCCGTAGCCGGGCGTGCAAGTCATTGGTTAAATATCGGGTCTTAAGCCCGAACCTGCTAGTGATACTACACGCCTTGGGACCGGAAGAGGCAAATGGTACGTTGAGGGTAGGGGTGAAATCCTGTAATCCCCAACGGACCACCGGTGGCGAAGCTTGTTCAGTCATGAACAACTCTACACAAGGCGATTTGCTGGGACGGATCCGACGGTGAGGGACGAAACCCAGGGGAGCGAGCGGGATTAGATACCCCGGTAGTCCTGGGCGTAAACGATGCGAACTAGGTGTTGGCGGAGCCACGAGCTCTGTCGGTGCCGAAGCGAAGGCGTTAAGTTCGCCGCCAGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCAC'), ('EU883771', @@ -717,9 +721,9 @@ def test_merge_clusters_chimeras(self): 
'AAGAATGGGGATAGCATGCGAGTCACGCCGCAATGTGTGGCATACGGCTCAGTAACACGTAGTCAACATGCCCAGAGGACGTGGACACCTCGGGAAACTGAGGATAAACCGCGATAGGCCACTACTTCTGGAATGAGCCATGACCCAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGGAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCACGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCCGCTTAACGGATGGGCTGCGGAGGATACTGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCTTTGATCTACTGAAGACCACCAGTGGTGAAGGCGGTTCGCCAGAACGCGCTCGAACGGTGAGGATGAAAGCTGGGGGAGCAAACCGGAATAGATACCCGAGTAATCCCAACTGTAAACGATGGCAACTCGGGGATGGGTTGGCCTCCAACCAACCCCATGGCCGCAGGGAAGCCGTTTAGCTCTCCCGCCTGGGGAATACGGTCCGCAGAATTGAACCTTAAAGGAATTTGGCGGGGAACCCCCACAAGGGGGAAAACCGTGCGGTTCAATTGGAATCCACCCCCCGGAAACTTTACCCGGGCGCG'), ('DQ260310', 'GATACCCCCGGAAACTGGGGATTATACCGGATATGTGGGGCTGCCTGGAATGGTACCTCATTGAAATGCTCCCGCGCCTAAAGATGGATCTGCCGCAGAATAAGTAGTTTGCGGGGTAAATGGCCACCCAGCCAGTAATCCGTACCGGTTGTGAAAACCAGAACCCCGAGATGGAAACTGAAACAAAGGTTCAAGGCCTACCGGGCACAACAAGCGCCAAAACTCCGCCATGCGAGCCATCGCGACGGGGGAAAACCAAGTACCACTCCTAACGGGGTGGTTTTTCCGAAGTGGAAAAAGCCTCCAGGAATAAGAACCTGGGCCAGAACCGTGGCCAGCCGCCGCCGTTACACCCGCCAGCTCGAGTTGTTGGCCGGTTTTATTGGGGCCTAAAGCCGGTCCGTAGCCCGTTTTGATAAGGTCTCTCTGGTGAAATTCTACAGCTTAACCTGTGGGAATTGCTGGAGGATACTATTCAAGCTTGAAGCCGGGAGAAGCCTGGAAGTACTCCCGGGGGTAAGGGGTGAAATTCTATTATCCCCGGAAGACCAACTGGTGCCGAAGCGGTCCAGCCTGGAACCGAACTTGACCGTGAGTTACGAAAAGCCAAGGGGCGCGGACCGGAATAAAATAACCAGGGTAGTCCTGGCCGTAAACGATGTGAACTTGGTGGTGGGAATGGCTTCGAACTGCCCAATTGCCGAAAGGAAGCTGTAAATTCACCCGCCTTGGAAGTACGGTCGCAAGACTGGAACCTAAAAGGAATTGGCGGGGGGACACCACAACGCGTGGAGCCTGGCGGTTTTATTGGGATTCCACGCAGACATCTCACTCAGGGGCGACAGCAGAAATGATGGGCAGGTTGATGACCTTGCTTGACAAGCTGAAAAGGAGGTGCAT'), - ('EF503697', 
'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC')], aligned=False) + ('EF503697', 'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC')], DNA) -test_seq_coll = LoadSeqs(data=[ +test_seq_coll = SequenceCollection.from_fasta_records([ ('s1', 
'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTAGATGAATAAGGGGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCCCGAGTGGTCGGGATTTTTATTGGGCCTAAAGCGTCCGTAGCCGGGCGTGCAAGTCATTGGTTAAATATCGGGTCTTAAGCCCGAACCTGCTAGTGATACTACACGCCTTGGGACCGGAAGAGGCAAATGGTACGTTGAGGGTAGGGGTGAAATCCTGTAATCCCCAACGGACCACCGGTGGCGAAGCTTGTTCAGTCATGAACAACTCTACACAAGGCGATTTGCTGGGACGGATCCGACGGTGAGGGACGAAACCCAGGGGAGCGAGCGGGATTAGATACCCCGGTAGTCCTGGGCGTAAACGATGCGAACTAGGTGTTGGCGGAGCCACGAGCTCTGTCGGTGCCGAAGCGAAGGCGTTAAGTTCGCCGCCAGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCAC'), ('s2', @@ -732,7 +736,7 @@ def test_merge_clusters_chimeras(self): 'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC'), ('s6', 
'ATAGTAGGTGATTGCGAAGACCGCGGAACCGGGACCTAGCACCCAGCCTGTACCGAGGGATGGGGAGCTGTGGCGGTCCACCGACGACCCTTTGTGACAGCCGATTCCTACAATCCCAGCAACTGCAATGATCCACTCTAGTCGGCATAACCGGGAATCGTTAACCTGGTAGGGTTCTCTACGTCTGAGTCTACAGCCCAGAGCAGTCAGGCTACTATACGGTTTGCTGCATTGCATAGGCATCGGTCGCGGGCACTCCTCGCGGTTTCAGCTAGGGTTTAAATGGAGGGTCGCTGCATGAGTATGCAAATAGTGCCACTGCTCTGATACAGAGAAGTGTTGATATGACACCTAAGACCTGGTCACAGTTTTAACCTGCCTACGCACACCAGTGTGCTATTGATTAACGATATCGGTAGACACGACCTTGGTAACCTGACTAACCTCATGGAAAGTGACTAGATAAATGGACCGGAGCCAACTTTCACCCGGAAAACGGACCGACGAATCGTCGTAGACTACCGATCTGACAAAATAAGCACGAGGGAGCATGTTTTGCGCAGGCTAGCCTATTCCCACCTCAAGCCTCGAGAACCAAGACGCCTGATCCGGTGCTGCACGAAGGGTCGCCTCTAGGTAAGGAGAGCTGGCATCTCCAGATCCGATATTTTACCCAACCTTTGCGCGCTCAGATTGTTATAGTGAAACGATTTAAGCCTGAACGGAGTTCCGCTCCATATGTGGGTTATATATGTGAGATGTATTAACTTCCGCAGTTGTCTCTTTCGGTGCAGTACGCTTGGTATGTGTCTCAAATAATCGGTATTATAGTGATCTGAGAGGTTTTAAG'), - ('c1', 'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCCGCTTAACGGATGGGCTGCGGAGGATACTGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCTTTGATCTACTGAAGACCACCAGTGGTGAAGGCGGTTCGCCAGAACGCGCTCGAACGGTGAGGATGAAAGCTGGGGGAGCAAACCGGAATAGATACCCGAGTAATCCCAACTGTAAACGATGGCAACTCGGGGATGGGTTGGCCTCCAACCAACCCCATGGCCGCAGGGAAGCCGTTTAGCTCTCCCGCCTGGGGAATACGGTCCGCAGAATTGAACCTTAAAGGAATTTGGCGGGGAACCCCCACAAGGGGGAAAACCGTGCGGTTCAATTGGAATCCACCCCCCGGAAACTTTACCCGGGCGCG')], aligned=False) + ('c1', 
'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCCGCTTAACGGATGGGCTGCGGAGGATACTGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCTTTGATCTACTGAAGACCACCAGTGGTGAAGGCGGTTCGCCAGAACGCGCTCGAACGGTGAGGATGAAAGCTGGGGGAGCAAACCGGAATAGATACCCGAGTAATCCCAACTGTAAACGATGGCAACTCGGGGATGGGTTGGCCTCCAACCAACCCCATGGCCGCAGGGAAGCCGTTTAGCTCTCCCGCCTGGGGAATACGGTCCGCAGAATTGAACCTTAAAGGAATTTGGCGGGGAACCCCCACAAGGGGGAAAACCGTGCGGTTCAATTGGAATCCACCCCCCGGAAACTTTACCCGGGCGCG')], DNA) # Test data taken from the ChimeraSlayer sample data diff --git a/tests/test_insert_seqs_into_tree.py b/tests/test_insert_seqs_into_tree.py deleted file mode 100755 index 71513c8153..0000000000 --- a/tests/test_insert_seqs_into_tree.py +++ /dev/null @@ -1,114 +0,0 @@ -#!/usr/bin/env python -# File created on 11 Oct 2011 -from __future__ import division - -__author__ = "Jesse Stombaugh" -__copyright__ = "Copyright 2011, The QIIME project" -__credits__ = ["Jesse Stombaugh"] -__license__ = "GPL" -__version__ = "1.8.0-dev" -__maintainer__ = "Jesse Stombaugh" -__email__ = "jesse.stombaugh@colorado.edu" - -from os import close -from unittest import TestCase, main -from qiime.insert_seqs_into_tree import convert_tree_tips, \ - write_updated_tree_file, \ - strip_and_rename_unwanted_labels_from_tree -from os import getcwd, remove, rmdir -from tempfile import mkstemp - -from cogent.parse.tree import DndParser -from cogent.core.tree import PhyloNode -from StringIO import StringIO - - -class Tests(TestCase): - - def setUp(self): - '''setup the files for testing pplacer''' - - # create 
a list of files to cleanup
-        self._paths_to_clean_up = []
-        self._dirs_to_clean_up = []
-
-        # get a tmp filename to use
-        fd, self.basename = mkstemp()
-        close(fd)
-
-        self.align_map = {
-            'seq0000005': 'Species005', 'seq0000004': 'Species004',
-            'seq0000007': 'Species007',
-            'seq0000006': 'Species006',
-            'seq0000001': 'Species001',
-            'seq0000003': 'Species003',
-            'seq0000002': 'Species002'}
-
-        # create and write out RAxML stats file
-        self.tmp_tree_fname = self.basename + '.tre'
-        tree_out = open(self.tmp_tree_fname, 'w')
-        tree_out.write(STARTING_TREE)
-        tree_out.close()
-        self._paths_to_clean_up.append(self.tmp_tree_fname)
-
-    def tearDown(self):
-        """cleans up all files initially created"""
-        # remove the tempdir and contents
-        map(remove, self._paths_to_clean_up)
-        map(rmdir, self._dirs_to_clean_up)
-
-
-class insertSeqsTests(Tests):
-
-    """Tests for the pplacer application controller"""
-
-    def test_convert_tree_tips(self):
-        """Convert tree tips to phylip labels"""
-
-        # convert tree tips to PHYLIP labels
-        tree = convert_tree_tips(self.align_map, self.tmp_tree_fname)
-
-        self.assertEqual(tree.getNewick(with_distances=True),
-                         PHYLIP_TREE)
-
-    def test_write_updated_tree_file(self):
-        """Write tree out"""
-
-        # create temp filename
-        fd, new_tree_fp = mkstemp(suffix='.tre')
-        close(fd)
-        self._paths_to_clean_up.append(new_tree_fp)
-
-        # parse and load tree
-        tree = DndParser(StringIO(STARTING_TREE), constructor=PhyloNode)
-
-        # write out temp tree
-        write_updated_tree_file(new_tree_fp, tree)
-
-        self.assertTrue(open(new_tree_fp).read() > 0)
-
-    def test_strip_and_rename_unwanted_labels_from_tree(self):
-        """Remove unwanted text from Tip labels"""
-
-        # parse and load tree
-        result = DndParser(
-            StringIO(RESULTING_QUERY_TREE),
-            constructor=PhyloNode)
-
-        # strip and rename tips
-        result_tree = strip_and_rename_unwanted_labels_from_tree(
-            self.align_map,
-            result)
-        self.assertEqual(result_tree.getNewick(with_distances=True),
-                         STRIPPED_TREE)
-
-STARTING_TREE = """(Species002:0.00000043418318065054,((Species003:0.01932550067944402081,Species004:0.08910446960529855298):0.00000043418318065054,Species005:0.17394765077611337722):0.00000043418318065054,Species001:0.00000043418318065054):0.0;""" - -PHYLIP_TREE = """(seq0000002:4.34183180651e-07,((seq0000003:0.0193255006794,seq0000004:0.0891044696053):4.34183180651e-07,seq0000005:0.173947650776):4.34183180651e-07,seq0000001:4.34183180651e-07):0.0;""" - -RESULTING_QUERY_TREE = """(seq0000003:1.0,(QUERY___seq0000006___1,QUERY___seq0000007___1,seq0000004:1.0):1.0,(seq0000005:1.0,(seq0000001:1.0,seq0000002:1.0):1.0):1.0);""" - -STRIPPED_TREE = """(Species003:1.0,(Species006,Species007,Species004:1.0):1.0,(Species005:1.0,(Species001:1.0,Species002:1.0):1.0):1.0);""" - -if __name__ == "__main__": - main() diff --git a/tests/test_make_phylogeny.py b/tests/test_make_phylogeny.py index be87ffd7fb..5249908a69 100644 --- a/tests/test_make_phylogeny.py +++ b/tests/test_make_phylogeny.py @@ -15,8 +15,6 @@ from os import remove, close from tempfile import mkstemp -from cogent import LoadSeqs, DNA - import brokit.fasttree from qiime.make_phylogeny import TreeBuilder, CogentTreeBuilder @@ -75,10 +73,11 @@ def test_call_correct_alignment(self): actual = p(result_path=None, aln_path=self.input_fp, log_path=log_fp) - expected = tree - # note: lines in diff order w/ diff versions - self.assertEqual(str(actual), expected) - + actual = str(actual) + # note: order of inputs to FastTree can have very minor effect + # on the distances, so we need to compare against a couple of trees + # to avoid failures on different archs + self.assertTrue(actual == tree or actual == tree2) def test_root_midpt(self): """midpoint should be selected correctly when it is an internal node @@ -122,6 +121,7 @@ def test_root_midpt2(self): aln_for_tree = """>jkl\n--TTACAC--\n>abc\nACACACAC--\n>ghi\nACAGACACTT\n>def\nACAGACAC--\n""" tree = '(def:0.00014,ghi:0.00014,(abc:0.07248,jkl:0.40293)0.742:0.07156);' 
+tree2 = '(ghi:0.00014,def:0.00014,(jkl:0.40282,abc:0.07253)0.742:0.07152);' midpoint_tree = '(jkl:0.237705,(abc:0.07248,(def:0.00014,ghi:0.00014)0.742:0.07156):0.165225);' # run unit tests if run from command-line diff --git a/tests/test_parallel/test_assign_taxonomy.py b/tests/test_parallel/test_assign_taxonomy.py index 4bc3303d88..c008fc4480 100755 --- a/tests/test_parallel/test_assign_taxonomy.py +++ b/tests/test_parallel/test_assign_taxonomy.py @@ -15,10 +15,13 @@ from os import getenv, close from os.path import basename, exists, join from tempfile import NamedTemporaryFile, mkstemp, mkdtemp -from cogent import LoadSeqs -from cogent.app.util import ApplicationError from unittest import TestCase, main -from cogent.util.misc import remove_files, create_dir + +from skbio.app.util import ApplicationError +from skbio.util.misc import remove_files, create_dir +from skbio.core.alignment import SequenceCollection +from skbio.core.sequence import DNA + from qiime.parallel.assign_taxonomy import (ParallelBlastTaxonomyAssigner, ParallelRdpTaxonomyAssigner, ParallelUclustConsensusTaxonomyAssigner) @@ -126,7 +129,7 @@ def setUp(self): suffix='.fasta') close(fd) seq_file = open(self.tmp_seq_filepath, 'w') - seq_file.write(blast_test_seqs.toFasta()) + seq_file.write(blast_test_seqs.to_fasta()) seq_file.close() self.files_to_remove.append(self.tmp_seq_filepath) @@ -139,7 +142,7 @@ def setUp(self): self.reference_seqs_file = NamedTemporaryFile( prefix='qiime_parallel_taxonomy_assigner_tests_ref_seqs', suffix='.fasta', dir=tmp_dir) - self.reference_seqs_file.write(blast_reference_seqs.toFasta()) + self.reference_seqs_file.write(blast_reference_seqs.to_fasta()) self.reference_seqs_file.seek(0) initiate_timeout(60) @@ -199,7 +202,7 @@ def setUp(self): suffix='.fasta') close(fd) seq_file = open(self.tmp_seq_filepath, 'w') - seq_file.write(uclust_test_seqs.toFasta()) + seq_file.write(uclust_test_seqs.to_fasta()) seq_file.close() self.files_to_remove.append(self.tmp_seq_filepath) @@ 
-212,7 +215,7 @@ def setUp(self): self.reference_seqs_file = NamedTemporaryFile( prefix='qiime_parallel_taxonomy_assigner_tests_ref_seqs', suffix='.fasta', dir=tmp_dir) - self.reference_seqs_file.write(uclust_reference_seqs.toFasta()) + self.reference_seqs_file.write(uclust_reference_seqs.to_fasta()) self.reference_seqs_file.seek(0) initiate_timeout(60) @@ -294,7 +297,7 @@ def test_parallel_uclust_taxonomy_assigner(self): DQ260310\tArchaea;Euryarchaeota;Methanobacteriales;Methanobacterium EF503697\tArchaea;Crenarchaeota;uncultured;uncultured""" -blast_test_seqs = LoadSeqs(data=[ +blast_test_seqs = SequenceCollection.from_fasta_records([ ('s1', 'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTAGATGAATAAGGGGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCCCGAGTGGTCGGGATTTTTATTGGGCCTAAAGCGTCCGTAGCCGGGCGTGCAAGTCATTGGTTAAATATCGGGTCTTAAGCCCGAACCTGCTAGTGATACTACACGCCTTGGGACCGGAAGAGGCAAATGGTACGTTGAGGGTAGGGGTGAAATCCTGTAATCCCCAACGGACCACCGGTGGCGAAGCTTGTTCAGTCATGAACAACTCTACACAAGGCGATTTGCTGGGACGGATCCGACGGTGAGGGACGAAACCCAGGGGAGCGAGCGGGATTAGATACCCCGGTAGTCCTGGGCGTAAACGATGCGAACTAGGTGTTGGCGGAGCCACGAGCTCTGTCGGTGCCGAAGCGAAGGCGTTAAGTTCGCCGCCAGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCAC'), ('s2', @@ -305,10 +308,10 @@ def test_parallel_uclust_taxonomy_assigner(self): 
'GATACCCCCGGAAACTGGGGATTATACCGGATATGTGGGGCTGCCTGGAATGGTACCTCATTGAAATGCTCCCGCGCCTAAAGATGGATCTGCCGCAGAATAAGTAGTTTGCGGGGTAAATGGCCACCCAGCCAGTAATCCGTACCGGTTGTGAAAACCAGAACCCCGAGATGGAAACTGAAACAAAGGTTCAAGGCCTACCGGGCACAACAAGCGCCAAAACTCCGCCATGCGAGCCATCGCGACGGGGGAAAACCAAGTACCACTCCTAACGGGGTGGTTTTTCCGAAGTGGAAAAAGCCTCCAGGAATAAGAACCTGGGCCAGAACCGTGGCCAGCCGCCGCCGTTACACCCGCCAGCTCGAGTTGTTGGCCGGTTTTATTGGGGCCTAAAGCCGGTCCGTAGCCCGTTTTGATAAGGTCTCTCTGGTGAAATTCTACAGCTTAACCTGTGGGAATTGCTGGAGGATACTATTCAAGCTTGAAGCCGGGAGAAGCCTGGAAGTACTCCCGGGGGTAAGGGGTGAAATTCTATTATCCCCGGAAGACCAACTGGTGCCGAAGCGGTCCAGCCTGGAACCGAACTTGACCGTGAGTTACGAAAAGCCAAGGGGCGCGGACCGGAATAAAATAACCAGGGTAGTCCTGGCCGTAAACGATGTGAACTTGGTGGTGGGAATGGCTTCGAACTGCCCAATTGCCGAAAGGAAGCTGTAAATTCACCCGCCTTGGAAGTACGGTCGCAAGACTGGAACCTAAAAGGAATTGGCGGGGGGACACCACAACGCGTGGAGCCTGGCGGTTTTATTGGGATTCCACGCAGACATCTCACTCAGGGGCGACAGCAGAAATGATGGGCAGGTTGATGACCTTGCTTGACAAGCTGAAAAGGAGGTGCAT'), ('s5', 'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC'), - ('s6', 
'ATAGTAGGTGATTGCGAAGACCGCGGAACCGGGACCTAGCACCCAGCCTGTACCGAGGGATGGGGAGCTGTGGCGGTCCACCGACGACCCTTTGTGACAGCCGATTCCTACAATCCCAGCAACTGCAATGATCCACTCTAGTCGGCATAACCGGGAATCGTTAACCTGGTAGGGTTCTCTACGTCTGAGTCTACAGCCCAGAGCAGTCAGGCTACTATACGGTTTGCTGCATTGCATAGGCATCGGTCGCGGGCACTCCTCGCGGTTTCAGCTAGGGTTTAAATGGAGGGTCGCTGCATGAGTATGCAAATAGTGCCACTGCTCTGATACAGAGAAGTGTTGATATGACACCTAAGACCTGGTCACAGTTTTAACCTGCCTACGCACACCAGTGTGCTATTGATTAACGATATCGGTAGACACGACCTTGGTAACCTGACTAACCTCATGGAAAGTGACTAGATAAATGGACCGGAGCCAACTTTCACCCGGAAAACGGACCGACGAATCGTCGTAGACTACCGATCTGACAAAATAAGCACGAGGGAGCATGTTTTGCGCAGGCTAGCCTATTCCCACCTCAAGCCTCGAGAACCAAGACGCCTGATCCGGTGCTGCACGAAGGGTCGCCTCTAGGTAAGGAGAGCTGGCATCTCCAGATCCGATATTTTACCCAACCTTTGCGCGCTCAGATTGTTATAGTGAAACGATTTAAGCCTGAACGGAGTTCCGCTCCATATGTGGGTTATATATGTGAGATGTATTAACTTCCGCAGTTGTCTCTTTCGGTGCAGTACGCTTGGTATGTGTCTCAAATAATCGGTATTATAGTGATCTGAGAGGTTTTAAG')], aligned=False) + ('s6', 'ATAGTAGGTGATTGCGAAGACCGCGGAACCGGGACCTAGCACCCAGCCTGTACCGAGGGATGGGGAGCTGTGGCGGTCCACCGACGACCCTTTGTGACAGCCGATTCCTACAATCCCAGCAACTGCAATGATCCACTCTAGTCGGCATAACCGGGAATCGTTAACCTGGTAGGGTTCTCTACGTCTGAGTCTACAGCCCAGAGCAGTCAGGCTACTATACGGTTTGCTGCATTGCATAGGCATCGGTCGCGGGCACTCCTCGCGGTTTCAGCTAGGGTTTAAATGGAGGGTCGCTGCATGAGTATGCAAATAGTGCCACTGCTCTGATACAGAGAAGTGTTGATATGACACCTAAGACCTGGTCACAGTTTTAACCTGCCTACGCACACCAGTGTGCTATTGATTAACGATATCGGTAGACACGACCTTGGTAACCTGACTAACCTCATGGAAAGTGACTAGATAAATGGACCGGAGCCAACTTTCACCCGGAAAACGGACCGACGAATCGTCGTAGACTACCGATCTGACAAAATAAGCACGAGGGAGCATGTTTTGCGCAGGCTAGCCTATTCCCACCTCAAGCCTCGAGAACCAAGACGCCTGATCCGGTGCTGCACGAAGGGTCGCCTCTAGGTAAGGAGAGCTGGCATCTCCAGATCCGATATTTTACCCAACCTTTGCGCGCTCAGATTGTTATAGTGAAACGATTTAAGCCTGAACGGAGTTCCGCTCCATATGTGGGTTATATATGTGAGATGTATTAACTTCCGCAGTTGTCTCTTTCGGTGCAGTACGCTTGGTATGTGTCTCAAATAATCGGTATTATAGTGATCTGAGAGGTTTTAAG')], DNA) -blast_reference_seqs = LoadSeqs(data=[ +blast_reference_seqs = SequenceCollection.from_fasta_records([ ('AY800210', 
'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTAGATGAATAAGGGGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCCCGAGTGGTCGGGATTTTTATTGGGCCTAAAGCGTCCGTAGCCGGGCGTGCAAGTCATTGGTTAAATATCGGGTCTTAAGCCCGAACCTGCTAGTGATACTACACGCCTTGGGACCGGAAGAGGCAAATGGTACGTTGAGGGTAGGGGTGAAATCCTGTAATCCCCAACGGACCACCGGTGGCGAAGCTTGTTCAGTCATGAACAACTCTACACAAGGCGATTTGCTGGGACGGATCCGACGGTGAGGGACGAAACCCAGGGGAGCGAGCGGGATTAGATACCCCGGTAGTCCTGGGCGTAAACGATGCGAACTAGGTGTTGGCGGAGCCACGAGCTCTGTCGGTGCCGAAGCGAAGGCGTTAAGTTCGCCGCCAGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCAC'), ('EU883771', @@ -317,7 +320,7 @@ def test_parallel_uclust_taxonomy_assigner(self): 'AAGAATGGGGATAGCATGCGAGTCACGCCGCAATGTGTGGCATACGGCTCAGTAACACGTAGTCAACATGCCCAGAGGACGTGGACACCTCGGGAAACTGAGGATAAACCGCGATAGGCCACTACTTCTGGAATGAGCCATGACCCAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGGAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCACGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCCGCTTAACGGATGGGCTGCGGAGGATACTGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCTTTGATCTACTGAAGACCACCAGTGGTGAAGGCGGTTCGCCAGAACGCGCTCGAACGGTGAGGATGAAAGCTGGGGGAGCAAACCGGAATAGATACCCGAGTAATCCCAACTGTAAACGATGGCAACTCGGGGATGGGTTGGCCTCCAACCAACCCCATGGCCGCAGGGAAGCCGTTTAGCTCTCCCGCCTGGGGAATACGGTCCGCAGAATTGAACCTTAAAGGAATTTGGCGGGGAACCCCCACAAGGGGGAAAACCGTGCGGTTCAATTGGAATCCACCCCCCGGAAACTTTACCCGGGCGCG'), ('DQ260310', 
'GATACCCCCGGAAACTGGGGATTATACCGGATATGTGGGGCTGCCTGGAATGGTACCTCATTGAAATGCTCCCGCGCCTAAAGATGGATCTGCCGCAGAATAAGTAGTTTGCGGGGTAAATGGCCACCCAGCCAGTAATCCGTACCGGTTGTGAAAACCAGAACCCCGAGATGGAAACTGAAACAAAGGTTCAAGGCCTACCGGGCACAACAAGCGCCAAAACTCCGCCATGCGAGCCATCGCGACGGGGGAAAACCAAGTACCACTCCTAACGGGGTGGTTTTTCCGAAGTGGAAAAAGCCTCCAGGAATAAGAACCTGGGCCAGAACCGTGGCCAGCCGCCGCCGTTACACCCGCCAGCTCGAGTTGTTGGCCGGTTTTATTGGGGCCTAAAGCCGGTCCGTAGCCCGTTTTGATAAGGTCTCTCTGGTGAAATTCTACAGCTTAACCTGTGGGAATTGCTGGAGGATACTATTCAAGCTTGAAGCCGGGAGAAGCCTGGAAGTACTCCCGGGGGTAAGGGGTGAAATTCTATTATCCCCGGAAGACCAACTGGTGCCGAAGCGGTCCAGCCTGGAACCGAACTTGACCGTGAGTTACGAAAAGCCAAGGGGCGCGGACCGGAATAAAATAACCAGGGTAGTCCTGGCCGTAAACGATGTGAACTTGGTGGTGGGAATGGCTTCGAACTGCCCAATTGCCGAAAGGAAGCTGTAAATTCACCCGCCTTGGAAGTACGGTCGCAAGACTGGAACCTAAAAGGAATTGGCGGGGGGACACCACAACGCGTGGAGCCTGGCGGTTTTATTGGGATTCCACGCAGACATCTCACTCAGGGGCGACAGCAGAAATGATGGGCAGGTTGATGACCTTGCTTGACAAGCTGAAAAGGAGGTGCAT'), - ('EF503697', 'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC')], aligned=False) + ('EF503697', 
'TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC')], DNA) # no need for different data right now for the parallel uclust tests uclust_id_to_taxonomy = blast_id_to_taxonomy diff --git a/tests/test_parallel/test_blast.py b/tests/test_parallel/test_blast.py index b3ca2fbbbc..167f40eae8 100755 --- a/tests/test_parallel/test_blast.py +++ b/tests/test_parallel/test_blast.py @@ -14,13 +14,13 @@ from os import getenv, close from os.path import basename, exists, join from tempfile import NamedTemporaryFile, mkstemp, mkdtemp -from cogent import LoadSeqs -from cogent.util.misc import remove_files -from qiime.util import get_qiime_temp_dir from unittest import TestCase, main + +from skbio.util.misc import remove_files + +from qiime.util import get_qiime_temp_dir from qiime.test import initiate_timeout, disable_timeout from qiime.parse import fields_to_dict - from qiime.parallel.blast import ParallelBlaster diff --git a/tests/test_parallel/test_identify_chimeric_seqs.py b/tests/test_parallel/test_identify_chimeric_seqs.py index c3dc3af398..8a0c650d61 100755 --- a/tests/test_parallel/test_identify_chimeric_seqs.py +++ 
b/tests/test_parallel/test_identify_chimeric_seqs.py @@ -14,13 +14,13 @@ from os import getenv from os.path import basename, exists, join from tempfile import NamedTemporaryFile, mkdtemp -from cogent import LoadSeqs -from cogent.util.misc import remove_files -from qiime.util import get_qiime_temp_dir, load_qiime_config from unittest import TestCase, main + +from skbio.util.misc import remove_files + +from qiime.util import get_qiime_temp_dir, load_qiime_config from qiime.test import initiate_timeout, disable_timeout from qiime.parse import fields_to_dict - from qiime.parallel.identify_chimeric_seqs import ParallelChimericSequenceIdentifier diff --git a/tests/test_parallel/test_util.py b/tests/test_parallel/test_util.py index 5cf9249c15..4b2a266556 100755 --- a/tests/test_parallel/test_util.py +++ b/tests/test_parallel/test_util.py @@ -13,10 +13,10 @@ from os import close from os.path import exists from tempfile import mkstemp - from unittest import TestCase, main -from cogent import LoadSeqs -from cogent.util.misc import remove_files + +from skbio.util.misc import remove_files + from qiime.util import get_qiime_temp_dir from qiime.parallel.util import (ParallelWrapper, BufferedWriter) @@ -221,7 +221,7 @@ def test_get_random_job_prefix(self): def test_compute_seqs_per_file(self): """compute_seqs_per_file functions as expected """ - fd, temp_fasta_fp = mkstemp(prefix='QiimeScriptUtilTests', + fd, temp_fasta_fp = mkstemp(prefix='QiimeScriptUtilTests', suffix='.fasta') close(fd) temp_fasta = ['>seq', 'AAACCCCAAATTGG'] * 25 diff --git a/tests/test_pick_otus.py b/tests/test_pick_otus.py index 67d9f892ad..b0f3b341ad 100755 --- a/tests/test_pick_otus.py +++ b/tests/test_pick_otus.py @@ -22,12 +22,10 @@ from shutil import rmtree from tempfile import mkstemp -from skbio.util.misc import create_dir from unittest import TestCase, main from numpy.testing import assert_almost_equal -from cogent.util.misc import remove_files -from cogent import DNA - +from skbio.core.sequence 
import DNA +from skbio.util.misc import create_dir, remove_files from brokit.formatdb import build_blast_db_from_fasta_path from qiime.util import load_qiime_config @@ -242,10 +240,10 @@ def setUp(self): ] self.ref_seqs_rc = [ - ('ref1', DNA.rc('TGCAGCTTGAGCCACAGGAGAGAGAGAGCTTC')), - ('ref2', DNA.rc('ACCGATGAGATATTAGCACAGGGGAATTAGAACCA')), - ('ref3', DNA.rc('TGTCGAGAGTGAGATGAGATGAGAACA')), - ('ref4', DNA.rc('ACGTATTTTAATGGGGCATGGT')), + ('ref1', str(DNA('TGCAGCTTGAGCCACAGGAGAGAGAGAGCTTC').rc())), + ('ref2', str(DNA('ACCGATGAGATATTAGCACAGGGGAATTAGAACCA').rc())), + ('ref3', str(DNA('TGTCGAGAGTGAGATGAGATGAGAACA').rc())), + ('ref4', str(DNA('ACGTATTTTAATGGGGCATGGT').rc())), ] fd, self.seqs_fp = mkstemp(prefix='BlastOtuPickerTest_', diff --git a/tests/test_pick_rep_set.py b/tests/test_pick_rep_set.py index 50f3e2c480..c23fc3c6c8 100644 --- a/tests/test_pick_rep_set.py +++ b/tests/test_pick_rep_set.py @@ -13,10 +13,13 @@ from os import remove, close from tempfile import mkstemp - -from cogent import LoadSeqs -from cogent.util.misc import remove_files from unittest import TestCase, main + +from skbio.util.misc import remove_files +from skbio.parse.sequences import parse_fasta +from skbio.core.alignment import SequenceCollection +from skbio.core.sequence import DNA + from qiime.pick_rep_set import (RepSetPicker, GenericRepSetPicker, first_id, first, random_id, longest_id, unique_id_map, label_to_name, make_most_abundant, parse_fasta, ReferenceRepSetPicker) @@ -256,9 +259,12 @@ def test_call_write_to_file(self): self.tmp_otu_filepath, self.ref_seq_filepath, result_path=self.result_filepath) - exp = rep_seqs_reference_result_file_exp - self.assertEqual(LoadSeqs(self.result_filepath, aligned=False), - LoadSeqs(data=exp, aligned=False)) + with open(self.result_filepath) as f: + actual = SequenceCollection.from_fasta_records(parse_fasta(f), DNA) + expected = SequenceCollection.from_fasta_records( + parse_fasta(rep_seqs_reference_result_file_exp.split('\n')), DNA) + # we don't 
care about order in the results + self.assertEqual(set(actual), set(expected)) def test_non_ref_otus(self): """ReferenceRepSetPicker.__call__ same result as Generic when no ref otus diff --git a/tests/test_split.py b/tests/test_split.py index 8b4d303397..e27fe07911 100755 --- a/tests/test_split.py +++ b/tests/test_split.py @@ -12,16 +12,19 @@ from os import close from tempfile import mkstemp - from unittest import TestCase, main -from cogent import LoadSeqs + +from biom.parse import parse_biom_table +from biom.table import DenseOTUTable +from skbio.core.sequence import DNA +from skbio.core.alignment import SequenceCollection +from skbio.parse.sequences import parse_fasta + from qiime.split import (split_mapping_file_on_field, split_otu_table_on_sample_metadata, split_fasta) from qiime.util import get_qiime_temp_dir, remove_files from qiime.format import format_biom_table -from biom.parse import parse_biom_table -from biom.table import DenseOTUTable class SplitTests(TestCase): @@ -101,8 +104,8 @@ def test_split_fasta_equal_num_seqs_per_file(self): self.assertEqual(actual, expected) self.assertEqual( - LoadSeqs(data=infile, aligned=False), - LoadSeqs(data=actual_seqs, aligned=False)) + SequenceCollection.from_fasta_records(parse_fasta(infile), DNA), + SequenceCollection.from_fasta_records(parse_fasta(actual_seqs), DNA)) def test_split_fasta_diff_num_seqs_per_file(self): """split_fasta funcs as expected when diff num seqs go to each file @@ -127,16 +130,17 @@ def test_split_fasta_diff_num_seqs_per_file(self): # building seq collections from infile and the split files result in # equivalent seq collections self.assertEqual( - LoadSeqs(data=infile, aligned=False), - LoadSeqs(data=actual_seqs, aligned=False)) + SequenceCollection.from_fasta_records(parse_fasta(infile), DNA), + SequenceCollection.from_fasta_records(parse_fasta(actual_seqs), DNA)) def test_split_fasta_diff_num_seqs_per_file_alt(self): """split_fasta funcs always catches all seqs """ # start with 59 seqs 
(b/c it's prime, so should make more # confusing splits) - in_seqs = LoadSeqs(data=[('seq%s' % k, 'AACCTTAA') for k in range(59)]) - infile = in_seqs.toFasta().split('\n') + in_seqs = SequenceCollection.from_fasta_records( + [('seq%s' % k, 'AACCTTAA') for k in range(59)], DNA) + infile = in_seqs.to_fasta().split('\n') # test seqs_per_file from 1 to 1000 for i in range(1, 1000): @@ -157,8 +161,8 @@ def test_split_fasta_diff_num_seqs_per_file_alt(self): # building seq collections from infile and the split files result in # equivalent seq collections self.assertEqual( - LoadSeqs(data=infile, aligned=False), - LoadSeqs(data=actual_seqs, aligned=False)) + SequenceCollection.from_fasta_records(parse_fasta(infile), DNA), + SequenceCollection.from_fasta_records(parse_fasta(actual_seqs), DNA)) mapping_f1 = """#SampleID BarcodeSequence LinkerPrimerSequence Treatment DOB Description diff --git a/tests/test_split_libraries.py b/tests/test_split_libraries.py index 91383d0993..f2a82d61f5 100644 --- a/tests/test_split_libraries.py +++ b/tests/test_split_libraries.py @@ -17,10 +17,9 @@ from shutil import rmtree from tempfile import mkstemp, mkdtemp -from cogent import DNA from unittest import TestCase, main from numpy. 
testing import assert_almost_equal -from cogent.util.misc import remove_files +from skbio.util.misc import remove_files from qiime.split_libraries import ( expand_degeneracies, get_infile, count_mismatches, @@ -28,11 +27,9 @@ count_ambig, split_seq, primer_exceeds_mismatches, check_barcode, make_histograms, SeqQualBad, seq_exceeds_homopolymers, check_window_qual_scores, check_seqs, - local_align_primer_seq, preprocess -) + local_align_primer_seq, preprocess) from qiime.parse import parse_qual_score - class FakeOutFile(object): def __init__(self): @@ -319,21 +316,21 @@ def test_primer_exceeds_mismatches(self): def test_local_align_primer_seq_fwd_rev_match(self): "local_align function can handle fwd/rev primers with no mismatches" # forward primer - primer = DNA.makeSequence('TAGC', Name='F5') + primer = 'TAGC' seq = 'TAGC' # mismatch_count, hit_start expected = (0, 0) actual = local_align_primer_seq(primer, seq) self.assertEqual(actual, expected) - primer = DNA.makeSequence('TAGC', Name='F5') + primer = 'TAGC' seq = 'TAGCCCCC' # mismatch_count, hit_start expected = (0, 0) actual = local_align_primer_seq(primer, seq) self.assertEqual(actual, expected) - primer = DNA.makeSequence('TAGC', Name='F5') + primer = 'TAGC' seq = 'CCCTAGCCCCC' # mismatch_count, hit_start expected = (0, 3) @@ -341,21 +338,21 @@ def test_local_align_primer_seq_fwd_rev_match(self): self.assertEqual(actual, expected) # different length primer - primer = DNA.makeSequence('GTTTAGC', Name='F5') + primer = 'GTTTAGC' seq = 'GTTTAGC' # mismatch_count, hit_start expected = (0, 0) actual = local_align_primer_seq(primer, seq) self.assertEqual(actual, expected) - primer = DNA.makeSequence('GCTC', Name='R5') + primer = 'GCTC' seq = 'TAGCCCCC' # mismatch_count, hit_start expected = (1, 2) actual = local_align_primer_seq(primer, seq) self.assertEqual(actual, expected) - primer = DNA.makeSequence('GCTA', Name='R5') + primer = 'GCTA' seq = 'CCCTAGCCCCC' # mismatch_count, hit_start expected = (1, 1) @@ -364,7 
+361,7 @@ def test_local_align_primer_seq_fwd_rev_match(self): def test_local_align_primer_seq_fwd_rev_match_ambig(self): "local_align function can handle fwd/rev primers with ambigs" - primer = DNA.makeSequence('TASC', Name='F5') + primer = 'TASC' seq = 'TAGC' # primer_hit, target, mismatch_count, hit_start expected = (0, 0) @@ -374,7 +371,7 @@ def test_local_align_primer_seq_fwd_rev_match_ambig(self): def test_local_align_primer_seq_mm(self): "local_align function can handle fwd/rev primers with mismatches" # forward primer - primer = DNA.makeSequence('AAAAACTTTTT', Name='F5') + primer = 'AAAAACTTTTT' seq = 'AAAAAGTTTTT' # mismatch_count, hit_start expected = (1, 0) @@ -382,7 +379,7 @@ def test_local_align_primer_seq_mm(self): self.assertEqual(actual, expected) # forward primer - primer = DNA.makeSequence('AAAACCTTTTT', Name='F5') + primer = 'AAAACCTTTTT' seq = 'AAAAAGTTTTT' # mismatch_count, hit_start expected = (2, 0) @@ -393,7 +390,7 @@ def test_local_align_primer_seq_indels_middle(self): "local_align function can handle fwd/rev primers with indels in middle of seq" # Insertion in target sequence - primer = DNA.makeSequence('CGAATCGCTATCG', Name='F5') + primer = 'CGAATCGCTATCG' seq = 'CGAATCTGCTATCG' # mismatch count, hit_start expected = (1, 0) @@ -401,7 +398,7 @@ def test_local_align_primer_seq_indels_middle(self): self.assertEqual(actual, expected) # Deletion in target sequence - primer = DNA.makeSequence('CGAATCGCTATCG', Name='F5') + primer = 'CGAATCGCTATCG' seq = 'CGAATGCTATCG' # mismatch_count, hit_start expected = (1, 0) @@ -411,7 +408,7 @@ def test_local_align_primer_seq_indels_middle(self): def test_local_align_primer_seq_multiple_mismatch_indel(self): "local_align function can handle fwd/rev primers with indels and mismatches" # multiple insertions - primer = DNA.makeSequence('ATCGGGCGATCATT', Name='F5') + primer = 'ATCGGGCGATCATT' seq = 'ATCGGGTTCGATCATT' # mismatch_count, hit_start expected = (2, 0) @@ -419,7 +416,7 @@ def 
test_local_align_primer_seq_multiple_mismatch_indel(self): self.assertEqual(actual, expected) # two deletions - primer = DNA.makeSequence('ACGGTACAGTGG', Name='F5') + primer = 'ACGGTACAGTGG' seq = 'ACGGCAGTGG' # mismatch_count, hit_start expected = (2, 0) @@ -427,7 +424,7 @@ def test_local_align_primer_seq_multiple_mismatch_indel(self): self.assertEqual(actual, expected) # deletion and mismatch - primer = DNA.makeSequence('CATCGTCGATCA', Name='F5') + primer = 'CATCGTCGATCA' seq = 'CCTCGTGATCA' # mismatch_count, hit_start expected = (2, 0) diff --git a/tests/test_truncate_reverse_primer.py b/tests/test_truncate_reverse_primer.py index 6c86e7569e..36c5c8b157 100755 --- a/tests/test_truncate_reverse_primer.py +++ b/tests/test_truncate_reverse_primer.py @@ -22,6 +22,7 @@ from qiime.truncate_reverse_primer import get_rev_primer_seqs,\ get_output_filepaths, truncate_rev_primers, truncate_reverse_primer +from skbio.core.exception import BiologicalSequenceError class FakeOutFile(object): @@ -79,7 +80,7 @@ def setUp(self): mapping_file.close() fd, self.mapping_bad_header_fp = mkstemp( - prefix='sample_mapping_badheader_', + prefix='sample_mapping_badheader_', suffix=".txt") close(fd) mapping_file = open(self.mapping_bad_header_fp, "w") @@ -124,12 +125,12 @@ def test_get_rev_primer_seqs_errors(self): """Raises errors with invalid mapping file """ # Raises error if missing ReversePrimer column - self.assertRaises(KeyError, get_rev_primer_seqs, - open(self.mapping_bad_header_fp, "U")) + with open(self.mapping_bad_header_fp, "U") as f: + self.assertRaises(KeyError, get_rev_primer_seqs, f) # Raises error if invalid characters in primer - self.assertRaises(ValueError, get_rev_primer_seqs, - open(self.mapping_bad_primer_fp, "U")) + with open(self.mapping_bad_primer_fp, "U") as f: + self.assertRaises(BiologicalSequenceError, get_rev_primer_seqs, f) def test_get_output_filepaths(self): """ Properly returns output filepaths """ diff --git a/tests/test_util.py b/tests/test_util.py 
index 4723b05efc..e2b18c467b 100755 --- a/tests/test_util.py +++ b/tests/test_util.py @@ -33,7 +33,7 @@ extract_seqs_by_sample_id, get_qiime_project_dir, get_qiime_scripts_dir, matrix_stats, raise_error_on_parallel_unavailable, - convert_OTU_table_relative_abundance, create_dir, handle_error_codes, + convert_OTU_table_relative_abundance, create_dir, summarize_pcoas, _compute_jn_pcoa_avg_ranges, _flip_vectors, IQR, idealfourths, isarray, matrix_IQR, degap_fasta_aln, write_degapped_fasta_to_file, compare_otu_maps, get_diff_for_otu_maps, @@ -1823,7 +1823,7 @@ def setUp(self): self.m1_dup_bad = StringIO(m1_dup_bad) self.m1_dup_good = StringIO(m1_dup_good) - + def test_parseMetadataMap(self): """Test parsing a mapping file into a MetadataMap instance.""" @@ -2080,20 +2080,20 @@ def test_merge_mapping_file(self): """ observed = MetadataMap.mergeMappingFiles([self.m1,self.m3]) self.assertEqual(observed, m1_m3_exp) - + def test_merge_mapping_file_different_no_data_value(self): """merge_mapping_file: functions with different no_data_value """ observed = MetadataMap.mergeMappingFiles([self.m1,self.m2], "TESTING_NA") self.assertEqual(observed, m1_m2_exp) - + def test_merge_mapping_file_three_mapping_files(self): """merge_mapping_file: 3 mapping files """ observed = MetadataMap.mergeMappingFiles([self.m1,self.m2,self.m3]) self.assertEqual(observed, m1_m2_m3_exp) - + def test_merge_mapping_file_bad_duplicates(self): """merge_mapping_file: error raised when merging mapping files where same sample ids has different values diff --git a/tests/test_workflow/test_pick_open_reference_otus.py b/tests/test_workflow/test_pick_open_reference_otus.py index d06edcaae6..cb8097d321 100755 --- a/tests/test_workflow/test_pick_open_reference_otus.py +++ b/tests/test_workflow/test_pick_open_reference_otus.py @@ -15,10 +15,12 @@ from os.path import exists from shutil import rmtree from tempfile import mkdtemp - -from cogent import LoadTree, LoadSeqs from unittest import TestCase, main -from 
cogent.util.misc import remove_files + +from skbio.core.tree import TreeNode +from skbio.util.misc import remove_files +from biom.parse import parse_biom_table + from qiime.util import (load_qiime_config, count_seqs, get_qiime_temp_dir) @@ -34,7 +36,7 @@ pick_subsampled_open_reference_otus, iterative_pick_subsampled_open_reference_otus, final_repset_from_iteration_repsets) -from biom.parse import parse_biom_table + allowed_seconds_per_test = 120 @@ -181,8 +183,9 @@ def test_pick_subsampled_open_reference_otus(self): # confirm that number of tips in the tree is the same as the number of sequences # in the alignment - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + with open(tree_fp) as f: + num_tree_tips = len(list(TreeNode.from_newick(f).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) self.assertEqual(num_tree_tips, 6) @@ -308,8 +311,9 @@ def test_pick_subsampled_open_reference_otus_rdp_tax_assign(self): # confirm that number of tips in the tree is the same as the number of sequences # in the alignment - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + with open(tree_fp) as f: + num_tree_tips = len(list(TreeNode.from_newick(f).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) self.assertEqual(num_tree_tips, 6) @@ -426,8 +430,8 @@ def test_pick_subsampled_open_reference_otus_usearch(self): # confirm that number of tips in the tree is the same as the number of sequences # in the alignment - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + num_tree_tips = len(list(TreeNode.from_newick(open(tree_fp)).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) self.assertEqual(num_tree_tips, 6) @@ -604,8 +608,9 @@ def test_pick_subsampled_open_reference_otus_no_prefilter(self): # confirm that 
number of tips in the tree is the same as the number of sequences # in the alignment - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + with open(tree_fp) as f: + num_tree_tips = len(list(TreeNode.from_newick(f).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) self.assertEqual(num_tree_tips, 6) @@ -721,8 +726,9 @@ def test_pick_subsampled_open_reference_otus_parallel(self): # confirm that number of tips in the tree is the same as the number of sequences # in the alignment - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + with open(tree_fp) as f: + num_tree_tips = len(list(TreeNode.from_newick(f).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) self.assertEqual(num_tree_tips, 6) @@ -848,8 +854,9 @@ def test_pick_subsampled_open_reference_otus_suppress_step4(self): # confirm that number of tips in the tree is the same as the number of sequences # in the alignment (and we already checked that we're happy with that number # above when it was compared to the number of OTUs) - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + with open(tree_fp) as f: + num_tree_tips = len(list(TreeNode.from_newick(f).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) # OTU table without singletons or pynast failures has same number of @@ -1001,8 +1008,9 @@ def test_iterative_pick_subsampled_open_reference_otus_no_prefilter(self): # confirm that number of tips in the tree is the same as the number of sequences # in the alignment - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + with open(tree_fp) as f: + num_tree_tips = len(list(TreeNode.from_newick(f).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) 
self.assertEqual(num_tree_tips, 7) @@ -1131,8 +1139,9 @@ def test_iterative_pick_subsampled_open_reference_otus(self): # confirm that number of tips in the tree is the same as the number of sequences # in the alignment - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + with open(tree_fp) as f: + num_tree_tips = len(list(TreeNode.from_newick(f).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) self.assertEqual(num_tree_tips, 7) @@ -1261,8 +1270,9 @@ def test_iterative_pick_subsampled_open_reference_otus_parallel(self): # confirm that number of tips in the tree is the same as the number of sequences # in the alignment - num_tree_tips = len(LoadTree(tree_fp).tips()) - num_align_seqs = LoadSeqs(aln_fp).getNumSeqs() + with open(tree_fp) as f: + num_tree_tips = len(list(TreeNode.from_newick(f).tips())) + num_align_seqs = count_seqs(aln_fp)[0] self.assertEqual(num_tree_tips, num_align_seqs) self.assertEqual(num_tree_tips, 7) diff --git a/tests/test_workflow/test_upstream.py b/tests/test_workflow/test_upstream.py index 3465e17d14..156a9998df 100755 --- a/tests/test_workflow/test_upstream.py +++ b/tests/test_workflow/test_upstream.py @@ -1,4 +1,3 @@ -#!/usr/bin/env python # File created on 20 Feb 2013 from __future__ import division @@ -14,14 +13,16 @@ from glob import glob from os.path import join, exists, getsize, split, splitext from tempfile import mkdtemp - -from cogent import LoadTree, LoadSeqs from unittest import TestCase, main + from numpy.testing import assert_almost_equal -from cogent.util.misc import remove_files -from qiime.util import load_qiime_config, get_qiime_temp_dir -from qiime.parse import (parse_qiime_parameters) +from skbio.util.misc import remove_files +from skbio.core.tree import TreeNode from biom.parse import parse_biom_table + +from qiime.util import (load_qiime_config, get_qiime_temp_dir, count_seqs) +from qiime.parse import (parse_qiime_parameters) + 
from qiime.test import (initiate_timeout, disable_timeout, get_test_data_fps) @@ -195,10 +196,6 @@ def test_run_pick_de_novo_otus_rdp_tax_assign(self): self.assertEqual(actual_tree_fp, tree_fp) self.assertEqual(actual_otu_table_fp, otu_table_fp) - input_seqs = LoadSeqs(self.test_data['seqs'][0], - format='fasta', - aligned=False) - # Number of OTUs falls within a range that was manually # confirmed otu_map_lines = list(open(otu_map_fp)) @@ -212,14 +209,14 @@ def test_run_pick_de_novo_otus_rdp_tax_assign(self): # number of seqs which aligned + num of seqs which failed to # align sum to the number of OTUs - aln = LoadSeqs(alignment_fp) - failures = LoadSeqs(failures_fp, aligned=False) - self.assertTrue(aln.getNumSeqs() + failures.getNumSeqs(), num_otus) + self.assertEqual( + count_seqs(alignment_fp)[0] + count_seqs(failures_fp)[0], num_otus) # number of tips in the tree equals the number of sequences that # aligned - tree = LoadTree(tree_fp) - self.assertEqual(len(tree.tips()), aln.getNumSeqs()) + with open(tree_fp) as f: + tree = TreeNode.from_newick(f) + self.assertEqual(len(list(tree.tips())), count_seqs(alignment_fp)[0]) # parse the otu table otu_table = parse_biom_table(open(otu_table_fp, 'U')) @@ -241,7 +238,8 @@ def test_run_pick_de_novo_otus_rdp_tax_assign(self): # input sequences number_seqs_in_otu_table = sum([v.sum() for v in otu_table.iterSampleData()]) - self.assertEqual(number_seqs_in_otu_table, input_seqs.getNumSeqs()) + self.assertEqual(number_seqs_in_otu_table, + count_seqs(self.test_data['seqs'][0])[0]) # Check that the log file is created and has size > 0 log_fp = glob(join(self.test_out, 'log*.txt'))[0] @@ -285,10 +283,6 @@ def test_run_pick_de_novo_otus_uclust_tax_assign(self): self.assertEqual(actual_tree_fp, tree_fp) self.assertEqual(actual_otu_table_fp, otu_table_fp) - input_seqs = LoadSeqs(self.test_data['seqs'][0], - format='fasta', - aligned=False) - # Number of OTUs falls within a range that was manually # confirmed otu_map_lines = 
list(open(otu_map_fp))
@@ -302,14 +296,13 @@ def test_run_pick_de_novo_otus_uclust_tax_assign(self):
 
         # number of seqs which aligned + num of seqs which failed to
         # align sum to the number of OTUs
-        aln = LoadSeqs(alignment_fp)
-        failures = LoadSeqs(failures_fp, aligned=False)
-        self.assertTrue(aln.getNumSeqs() + failures.getNumSeqs(), num_otus)
+        self.assertEqual(count_seqs(alignment_fp)[0] + count_seqs(failures_fp)[0], num_otus)
 
         # number of tips in the tree equals the number of sequences that
         # aligned
-        tree = LoadTree(tree_fp)
-        self.assertEqual(len(tree.tips()), aln.getNumSeqs())
+        with open(tree_fp) as f:
+            tree = TreeNode.from_newick(f)
+        self.assertEqual(len(list(tree.tips())), count_seqs(alignment_fp)[0])
 
         # parse the otu table
         otu_table = parse_biom_table(open(otu_table_fp, 'U'))
@@ -331,7 +324,7 @@ def test_run_pick_de_novo_otus_uclust_tax_assign(self):
         # input sequences
         number_seqs_in_otu_table = sum([v.sum()
                                         for v in otu_table.iterSampleData()])
-        self.assertEqual(number_seqs_in_otu_table, input_seqs.getNumSeqs())
+        self.assertEqual(number_seqs_in_otu_table, count_seqs(self.test_data['seqs'][0])[0])
 
         # Check that the log file is created and has size > 0
         log_fp = glob(join(self.test_out, 'log*.txt'))[0]
@@ -374,10 +367,6 @@ def test_run_pick_de_novo_otus_parallel(self):
         self.assertEqual(actual_tree_fp, tree_fp)
         self.assertEqual(actual_otu_table_fp, otu_table_fp)
 
-        input_seqs = LoadSeqs(self.test_data['seqs'][0],
-                              format='fasta',
-                              aligned=False)
-
         # Number of OTUs falls within a range that was manually
         # confirmed
         otu_map_lines = list(open(otu_map_fp))
@@ -391,14 +380,13 @@ def test_run_pick_de_novo_otus_parallel(self):
 
         # number of seqs which aligned + num of seqs which failed to
         # align sum to the number of OTUs
-        aln = LoadSeqs(alignment_fp)
-        failures = LoadSeqs(failures_fp, aligned=False)
-        self.assertTrue(aln.getNumSeqs() + failures.getNumSeqs(), num_otus)
+        self.assertEqual(count_seqs(alignment_fp)[0] + count_seqs(failures_fp)[0], num_otus)
 
         # number of tips in the tree equals the number of sequences that
         # aligned
-        tree = LoadTree(tree_fp)
-        self.assertEqual(len(tree.tips()), aln.getNumSeqs())
+        with open(tree_fp) as f:
+            tree = TreeNode.from_newick(f)
+        self.assertEqual(len(list(tree.tips())), count_seqs(alignment_fp)[0])
 
         # parse the otu table
         otu_table = parse_biom_table(open(otu_table_fp, 'U'))
@@ -420,7 +408,7 @@ def test_run_pick_de_novo_otus_parallel(self):
         # input sequences
         number_seqs_in_otu_table = sum([v.sum()
                                         for v in otu_table.iterSampleData()])
-        self.assertEqual(number_seqs_in_otu_table, input_seqs.getNumSeqs())
+        self.assertEqual(number_seqs_in_otu_table, count_seqs(self.test_data['seqs'][0])[0])
 
         # Check that the log file is created and has size > 0
         log_fp = glob(join(self.test_out, 'log*.txt'))[0]
@@ -458,10 +446,6 @@ def test_run_pick_de_novo_otus_muscle(self):
         otu_table_fp = join(self.test_out, 'otu_table.biom')
         tree_fp = join(self.test_out, 'rep_set.tre')
 
-        input_seqs = LoadSeqs(self.test_data['seqs'][0],
-                              format='fasta',
-                              aligned=False)
-
         # Number of OTUs falls within a range that was manually
         # confirmed
         otu_map_lines = list(open(otu_map_fp))
@@ -474,12 +458,12 @@ def test_run_pick_de_novo_otus_muscle(self):
         self.assertEqual(len(taxonomy_assignment_lines), num_otus)
 
         # all OTUs align
-        aln = LoadSeqs(alignment_fp)
-        self.assertTrue(aln.getNumSeqs(), num_otus)
+        self.assertEqual(count_seqs(alignment_fp)[0], num_otus)
 
         # all OTUs in tree
-        tree = LoadTree(tree_fp)
-        self.assertEqual(len(tree.tips()), num_otus)
+        with open(tree_fp) as f:
+            tree = TreeNode.from_newick(f)
+        self.assertEqual(len(list(tree.tips())), num_otus)
 
         # check that the two final output files have non-zero size
         self.assertTrue(getsize(tree_fp) > 0)
@@ -509,7 +493,7 @@ def test_run_pick_de_novo_otus_muscle(self):
         # input sequences
         number_seqs_in_otu_table = sum([v.sum()
                                         for v in otu_table.iterSampleData()])
-        self.assertEqual(number_seqs_in_otu_table, input_seqs.getNumSeqs())
+        self.assertEqual(number_seqs_in_otu_table, count_seqs(self.test_data['seqs'][0])[0])
 
 if __name__ == "__main__":
     main()
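
Review note on the hunks above: besides swapping `LoadSeqs`/`LoadTree` for `count_seqs` and `TreeNode.from_newick`, they change `self.assertTrue(a, b)` to `self.assertEqual(a, b)`. In `unittest`, the second positional argument of `assertTrue` is the failure message, so the old assertions passed whenever the first argument was truthy and never compared it to `num_otus`. The sketch below is a minimal standalone illustration; the class name, the `run_demo` helper, and the counts are hypothetical and are not part of the QIIME test suite.

```python
import unittest


class AssertPitfallDemo(unittest.TestCase):
    """Hypothetical demo; the counts are made up for illustration."""

    def test_assert_true_ignores_expected_value(self):
        aligned, failed, num_otus = 3, 2, 99  # 3 + 2 != 99
        # num_otus is taken as the failure *message*, not as an expected
        # value, so this passes because 5 is truthy.
        self.assertTrue(aligned + failed, num_otus)

    def test_assert_equal_catches_mismatch(self):
        aligned, failed, num_otus = 3, 2, 99
        # assertEqual actually compares the two values and fails here.
        with self.assertRaises(AssertionError):
            self.assertEqual(aligned + failed, num_otus)


def run_demo():
    """Run both demo tests and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertPitfallDemo)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Both demo tests pass, which shows that the `assertTrue` form accepts a wrong count while the `assertEqual` form rejects it.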