Commit 2ae9642

Merge pull request #155 from BinPro/develop
New version 0.4.1
alneberg committed Nov 16, 2015
2 parents 9b15d51 + d363d6b
Showing 49 changed files with 91,688 additions and 145 deletions.
10 changes: 6 additions & 4 deletions .travis.yml
@@ -9,15 +9,17 @@ virtualenv:
before_install:
#Uses miniconda installation of scientific python packages instead of building from source
#or using old versions supplied by apt-get. Source: https://gist.github.com/dan-blanchard/7045057
- if [ ${TRAVIS_PYTHON_VERSION:0:1} == "2" ]; then wget http://repo.continuum.io/miniconda/Miniconda-3.3.0-Linux-x86_64.sh -O miniconda.sh; else wget http://repo.continuum.io/miniconda/Miniconda3-3.3.0-Linux-x86_64.sh -O miniconda.sh; fi
- if [ ${TRAVIS_PYTHON_VERSION:0:1} == "2" ]; then wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh; else wget http://repo.continuum.io/miniconda/Miniconda3-3.7.3-Linux-x86_64.sh -O miniconda.sh; fi
- chmod +x miniconda.sh
- ./miniconda.sh -b
- export PATH=/home/travis/miniconda/bin:$PATH
- export PATH=/home/travis/miniconda3/bin:/home/travis/miniconda2/bin:$PATH
- conda update --yes conda
- sudo apt-get update -qq
- sudo apt-get install -qq build-essential libgsl0-dev bedtools
- sudo apt-get install -qq build-essential libgsl0-dev bedtools mummer
- "export DISPLAY=:99.0"
- "sh -e /etc/init.d/xvfb start"
install:
- conda install --yes python=$TRAVIS_PYTHON_VERSION cython numpy scipy biopython pandas pip scikit-learn docutils sphinx jinja2
- conda install --yes python=$TRAVIS_PYTHON_VERSION cython numpy scipy biopython pandas pip scikit-learn docutils sphinx jinja2 seaborn
- pip install bcbio-gff
- python setup.py install
# command to run tests
3 changes: 1 addition & 2 deletions Makefile
@@ -29,10 +29,9 @@ test:
nosetests

docs:
python doc/add_cli_arguments_to_docs.py
$(MAKE) -C doc clean
$(MAKE) -C doc html
open doc/build/html/index.html
open doc/build/html/index.html || see doc/build/html/index.html

release: clean
python setup.py sdist upload
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
#CONCOCT 0.4.0 [![Build Status](https://travis-ci.org/BinPro/CONCOCT.png?branch=master)](https://travis-ci.org/BinPro/CONCOCT)#
#CONCOCT 0.4.1 [![Build Status](https://travis-ci.org/BinPro/CONCOCT.png?branch=master)](https://travis-ci.org/BinPro/CONCOCT)#

A program for unsupervised binning of metagenomic contigs by using nucleotide composition,
coverage data in multiple samples and linkage data from paired end reads.
9 changes: 5 additions & 4 deletions bin/concoct
@@ -41,10 +41,11 @@ def main(args):

logging.info('Performed PCA, resulted in %s dimensions' % transform_filter.shape[1])

Output.write_original_data(
joined,
args.length_threshold
)
if not args.no_original_data:
Output.write_original_data(
joined,
args.length_threshold
)

Output.write_pca(
transform_filter,
2 changes: 1 addition & 1 deletion concoct/input.py
@@ -46,7 +46,7 @@ def _calculate_composition(comp_file, length_threshold, kmer_len):
kmers = [
feature_mapping[kmer_tuple]
for kmer_tuple
in window(seq.seq.tostring().upper(), kmer_len)
in window(str(seq.seq).upper(), kmer_len)
if kmer_tuple in feature_mapping
]
# numpy.bincount returns an array of size = max + 1
4 changes: 4 additions & 0 deletions concoct/parser.py
@@ -75,6 +75,10 @@ def arguments():
help=("By default, the total coverage is added as a new column in the coverage "
"data matrix, independently of coverage normalization but previous to "
"log transformation. Use this tag to escape this behaviour."))
parser.add_argument('--no_original_data', default=False, action="store_true",
help=("By default the original data is saved to disk. For big datasets, "
"especially when a large k is used for compositional data, this file can become "
"very large. Use this tag if you don't want to save the original data."))
parser.add_argument('-o','--converge_out', default=False, action="store_true",
help=('Write convergence info to files.'))

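Together with the ``bin/concoct`` hunk above, the new option simply gates the potentially very large original-data dump. A minimal sketch of the pattern, with ``Output.write_original_data`` stubbed out since it is not part of this diff:

```python
import argparse

parser = argparse.ArgumentParser()
# Mirrors the new option: off by default, store_true to suppress the dump.
parser.add_argument('--no_original_data', default=False, action="store_true")

def write_original_data(joined, length_threshold):
    # Stand-in for Output.write_original_data from bin/concoct.
    print("writing original data")

args = parser.parse_args(['--no_original_data'])
if not args.no_original_data:
    write_original_data(None, 1000)  # skipped when the flag is passed
```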
Empty file added concoct/utils/__init__.py
Empty file.
21 changes: 21 additions & 0 deletions concoct/utils/check_dependencies.py
@@ -0,0 +1,21 @@
"""Functions for checking required programs"""
import os


def which(program):
    """Locate an executable on PATH, like the Unix which command.
    Source: http://stackoverflow.com/questions/377017"""
    def is_exe(fpath):
        return os.path.isfile(fpath) and os.access(fpath, os.X_OK)

    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file

    return None
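On Python 3.3+ the standard library's ``shutil.which`` performs the same PATH lookup (including the ``os.X_OK`` access check); the helper above presumably exists because CONCOCT still supports Python 2. For comparison:

```python
import shutil
import sys

# shutil.which resolves a bare name via PATH, or returns an absolute path
# unchanged if it is executable -- the same rules as the which() helper.
print(shutil.which(sys.executable))  # the interpreter's own path, found as-is
print(shutil.which("definitely-not-a-real-program"))  # None
```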
21 changes: 21 additions & 0 deletions concoct/utils/dir_utils.py
@@ -0,0 +1,21 @@
"""Functions related to directory traversal/removal/creation"""
import os
import errno
import shutil


def mkdir_p(path):
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise


def rm_rf(path):
    if os.path.isdir(path):
        shutil.rmtree(path)
    elif os.path.exists(path):
        os.remove(path)
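``mkdir_p`` mimics ``mkdir -p``: an "already exists" error is swallowed, anything else is re-raised. A self-contained sanity check (the helper is duplicated here so the snippet runs on its own):

```python
import errno
import os
import shutil
import tempfile

def mkdir_p(path):
    """Create path like `mkdir -p`; ignore 'already exists' errors only."""
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise

base = tempfile.mkdtemp()
target = os.path.join(base, "a", "b", "c")
mkdir_p(target)
mkdir_p(target)  # idempotent: the second call is a no-op
print(os.path.isdir(target))  # True
shutil.rmtree(base)
```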
18 changes: 9 additions & 9 deletions doc/Dockerfile
@@ -1,12 +1,12 @@
# Docker for CONCOCT (http://github.com/BinPro/CONCOCT) v0.4.0
# VERSION 0.4.0
# Docker for CONCOCT (http://github.com/BinPro/CONCOCT) v0.4.1
# VERSION 0.4.1
#
# This docker creates and sets up an Ubuntu environment with all
# dependencies for CONCOCT v0.4.0 installed.
# dependencies for CONCOCT v0.4.1 installed.
#
# To login to the docker with a shared directory from the host do:
#
# sudo docker run -v /my/host/shared/directory:/my/docker/location -i -t binnisb/concoct_0.4.0 /bin/bash
# sudo docker run -v /my/host/shared/directory:/my/docker/location -i -t binnisb/concoct_0.4.1 /bin/bash
#

FROM ubuntu:13.10
@@ -83,16 +83,16 @@ RUN cd /opt;\
printf "install.packages(\"ggplot2\", repo=$RREPO)\ninstall.packages(\"reshape\",repo=$RREPO)\ninstall.packages(\"gplots\",repo=$RREPO)\ninstall.packages(\"ellipse\",repo=$RREPO)\ninstall.packages(\"grid\",repo=$RREPO)\ninstall.packages(\"getopt\",repo=$RREPO)" > dep.R;\
Rscript dep.R

# Install python dependencies and fetch and install CONCOCT 0.4.0
# Install python dependencies and fetch and install CONCOCT 0.4.1
RUN cd /opt;\
conda update --yes conda;\
conda install --yes python=2.7 atlas cython numpy scipy biopython pandas pip scikit-learn pysam;\
pip install bcbio-gff;\
wget --no-check-certificate https://github.com/BinPro/CONCOCT/archive/0.4.0.tar.gz;\
tar xf 0.4.0.tar.gz;\
cd CONCOCT-0.4.0;\
wget --no-check-certificate https://github.com/BinPro/CONCOCT/archive/0.4.1.tar.gz;\
tar xf 0.4.1.tar.gz;\
cd CONCOCT-0.4.1;\
python setup.py install

ENV CONCOCT /opt/CONCOCT-0.4.0
ENV CONCOCT /opt/CONCOCT-0.4.1
ENV CONCOCT_TEST /opt/Data/CONCOCT-test-data
ENV CONCOCT_EXAMPLE /opt/Data/CONCOCT-complete-example
43 changes: 0 additions & 43 deletions doc/add_cli_arguments_to_docs.py

This file was deleted.

5 changes: 5 additions & 0 deletions doc/requirements.txt
@@ -0,0 +1,5 @@
# The requirements to build the documentation
Sphinx==1.3.1
mock==1.0.1
sphinxcontrib-programoutput==0.8
sphinx-rtd-theme>=0.1.6
4 changes: 2 additions & 2 deletions doc/source/complete_example.rst
@@ -260,7 +260,7 @@ contig in each sample:
::

cd $CONCOCT_EXAMPLE
cut -f1,3-26 concoct-input/concoct_inputtable.tsv > concoct-input/concoct_inputtableR.tsv
cut -f1,3- concoct-input/concoct_inputtable.tsv > concoct-input/concoct_inputtableR.tsv

Then run concoct with 40 as the maximum number of cluster ``-c 40``,
that we guess is appropriate for this data set:
@@ -392,7 +392,7 @@ required to let them know who you are.
$CONCOCT/scripts/COG_table.py -b annotations/cog-annotations/velvet_71_c10K.out \
-m $CONCOCT/scgs/scg_cogs_min0.97_max1.03_unique_genera.txt \
-c concoct-output/clustering_gt1000.csv \
--cdd_cog_file $ONCOCT/scgs/cdd_to_cog.tsv > evaluation-output/clustering_gt1000_scg.tab
--cdd_cog_file $CONCOCT/scgs/cdd_to_cog.tsv > evaluation-output/clustering_gt1000_scg.tab

The script requires the clustering output by concoct
``concoct-output/clustering_gt1000.csv``, a file listing a set of SCGs
19 changes: 18 additions & 1 deletion doc/source/conf.py
@@ -42,6 +42,7 @@
# ones.
extensions = [
'sphinx.ext.pngmath',
'sphinxcontrib.programoutput',
]

# Add any paths that contain templates here, relative to this directory.
@@ -67,7 +68,7 @@
# The short X.Y version.
version = '0.4'
# The full version, including alpha/beta/rc tags.
release = '0.4.0'
release = '0.4.1'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -270,3 +271,19 @@

# If true, do not generate a @detailmenu in the "Top" node's menu.
#texinfo_no_detailmenu = False

# -- Read the Docs C module import issues -------------------------------------

# See:
# http://read-the-docs.readthedocs.org/en/latest/faq.html#i-get-import-errors-on-libraries-that-depend-on-c-modules
from mock import Mock as MagicMock

class Mock(MagicMock):
    @classmethod
    def __getattr__(cls, name):
        return Mock()

MOCK_MODULES = ['pygtk', 'gtk', 'gobject', 'numpy', 'pandas', 'Bio', 'concoct',
                'concoct.utils', 'concoct.output', 'concoct.parser', 'concoct.cluster',
                'concoct.input', 'concoct.transform', 'vbgmm']
sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)
2 changes: 1 addition & 1 deletion doc/source/index.rst
@@ -47,4 +47,4 @@ Contents:
installation
usage
complete_example

scripts/index
5 changes: 3 additions & 2 deletions doc/source/installation.rst
@@ -74,6 +74,7 @@ Optional dependencies
- Python packages: ``bcbio-gff>=0.4``
- R packages: ``gplots, reshape, ggplot2, ellipse, getopt`` and
``grid``
- `BLAST <ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/>`__ >= 2.2.28+

If you want to install these dependencies on your own server, you can
take a look at `doc/Dockerfile.all\_dep <doc/Dockerfile.all_dep>`__ for
@@ -89,7 +90,7 @@ rights (e.g. on a common computer cluster). The second option is
suitable for a linux computer where you have root privileges and you
prefer to use a virtual machine where all dependencies to run concoct
are included. Docker does also run on Mac OS X through a virtual machine.
For more information check out the [Docker documentation](http://docs.docker.com/installation/).
For more information check out the `Docker documentation <http://docs.docker.com/installation/>`__.

Using Anaconda
~~~~~~~~~~~~~~
@@ -154,7 +155,7 @@ image.
We provide a Docker image:

binpro/concoct\_latest contains CONCOCT and all its dependencies for the
`complete workflow <doc/complete_example.rst>`__ with the exception of
:doc:`complete_example` with the exception of
the SCG evaluation.

The following command will then download the image from the Docker image
45 changes: 45 additions & 0 deletions doc/source/scripts/dnadiff_dist_matrix.rst
@@ -0,0 +1,45 @@
======================
dnadiff_dist_matrix.py
======================

Usage
=====
The usage and help documentation of ``dnadiff_dist_matrix.py`` can be seen by
running ``python dnadiff_dist_matrix.py -h``:

.. program-output:: cat ../../scripts/dnadiff_dist_matrix.py | sed 's/import argparse/import argparse, conf/' | python - --help
   :shell:

Example
=======
An example of how to run ``dnadiff_dist_matrix.py`` on the test data::

    cd CONCOCT/scripts
    python dnadiff_dist_matrix.py test_dnadiff_out tests/test_data/bins/sample*.fa

This results in the following output files in the folder ``test_dnadiff_out/``:

- ``dist_matrix.tsv`` The distance matrix
- ``fasta_names.tsv`` The names given to each bin (or fasta file)
- :download:`hclust_dendrogram.pdf <../_static/scripts/dna_diff_dist_matrix/hclust_dendrogram.pdf>`
  Dendrogram of the clustering (click for example)
- :download:`hclust_heatmap.pdf <../_static/scripts/dna_diff_dist_matrix/hclust_heatmap.pdf>`
  Heatmap of the clustering (click for example)

For each pairwise ``dnadiff`` alignment there is also a subfolder
``fastaname1_vs_fastaname2/`` with the following output files::

    out.1coords
    out.1delta
    out.cmd
    out.delta
    out.mcoords
    out.mdelta
    out.qdiff
    out.rdiff
    out.report
    out.snps
    out.unqry
    out.unref

See MUMmer's own manual, or run ``dnadiff --help``, for an explanation of each of these files.
37 changes: 37 additions & 0 deletions doc/source/scripts/extract_scg_bins.rst
@@ -0,0 +1,37 @@
======================
extract_scg_bins.py
======================

Usage
=====
The usage and help documentation of ``extract_scg_bins.py`` can be seen by
running ``python extract_scg_bins.py -h``:

.. program-output:: cat ../../scripts/extract_scg_bins.py | sed 's/import argparse/import argparse, conf/' | python - --help
   :shell:

Example
=======
An example of how to run ``extract_scg_bins.py`` on the test data::

    cd CONCOCT/scripts/tests/test_data
    python extract_scg_bins.py \
        --output_folder test_extract_scg_bins_out \
        --scg_tsvs tests/test_data/scg_bins/sample0_gt300_scg.tsv \
            tests/test_data/scg_bins/sample0_gt500_scg.tsv \
        --fasta_files tests/test_data/scg_bins/sample0_gt300.fa \
            tests/test_data/scg_bins/sample0_gt500.fa \
        --names sample0_gt300 sample0_gt500 \
        --max_missing_scg 2 --max_multicopy_scg 4 \
        --groups gt300 gt500

This results in the following output files in the folder
``test_extract_scg_bins_out/``::

    $ ls test_extract_scg_bins_out/
    sample0_gt300_bin2.fa  sample0_gt500_bin2.fa

Only bin2 satisfies the given criteria in both binnings. To keep only the best
binning of the two, remove the ``--groups`` parameter (or give both binnings
the same group id). That would output only ``sample0_gt500_bin2.fa``, because
the sum of bases in the approved bins of ``sample0_gt500`` is higher than that
of ``sample0_gt300``.
