Skip to content
Branch: master
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
..
Failed to load latest commit information.
checkm_lineage_wf
checkm_tree
test-data
README
checkm
checkm.xml
datatypes_conf.xml
io.txt
macros.xml
macros_checkm_analyze.xml
macros_checkm_bin_compare.xml
macros_checkm_bin_qa_plot.xml
macros_checkm_bin_union.xml
macros_checkm_coding_plot.xml
macros_checkm_cov_pca.xml
macros_checkm_coverage.xml
macros_checkm_data.xml
macros_checkm_dist_plot.xml
macros_checkm_gc_bias_plot.xml
macros_checkm_gc_plot.xml
macros_checkm_join_tables.xml
macros_checkm_len_hist.xml
macros_checkm_len_plot.xml
macros_checkm_lineage_set.xml
macros_checkm_lineage_wf.xml
macros_checkm_marker_plot.xml
macros_checkm_merge.xml
macros_checkm_modify.xml
macros_checkm_nx_plot.xml
macros_checkm_outliers.xml
macros_checkm_par_plot.xml
macros_checkm_profile.xml
macros_checkm_qa.xml
macros_checkm_ssu_finder.xml
macros_checkm_taxon_list.xml
macros_checkm_taxon_set.xml
macros_checkm_taxonomy_wf.xml
macros_checkm_test.xml
macros_checkm_tetra.xml
macros_checkm_tetra_pca.xml
macros_checkm_tetra_plot.xml
macros_checkm_tree.xml
macros_checkm_tree_qa.xml
macros_checkm_unbinned.xml
macros_checkm_unique.xml
plot_macros.xml
prepare.sh

README

Note ahead
==========

The Galaxy tool xml files in this repository are generated automatically by: 
- parsing the python argparse definitions in a single xml file
- which is postprocessed into separate tool xml files using prepare.sh

Currently this has been tested only for a few tools: 
- checkm_lineage_wf

Feel free to extend the prepare.sh script to get also the other tools 
functional.

Limitations: 
============

Checkm accepts nucleotide or aminoacid sequences. For the latter `--gene`
needs to be added. 

Currently checkm checks of the sequences are are really nucleotide or 
aminoacid sequences, respectively, and asks the user if the choice is 
correct. This seems to affect tree, analyze, merge

https://github.com/Ecogenomics/CheckM/issues/171

As a workaround the Galaxy wrappers do `echo | checkm ...`, ie one 
enter (which triggers the default answer) is presented to checkm. 
This aborts the program, but no negative exit code results (ie the 
result woul look positive .. but it is not). This is covered via 
stdio regex.

At least this way check finishes.

Reference data initialisation:
==============================

Before the CheckM and the Galaxy tools can be used the reference data 
needs to be downloaded, extracted and the CheckM needs to be told where 
you put the data. 

1. Download reference data with: `wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz` and extract to a location of your choice. 


2. The path to the reference data needs to be set by the galaxy user with: `checkm data setRoot PATH`

which stores the PATH: CONDA_ENV_PATH/__checkm-genome@1.0.11/lib/python2.7/site-packages/checkm/DATA_CONFIG


Generation of tool xml files
============================

(paths might differ)

```
git clone https://github.com/erasche/argparse2tool.git
cd argparse2tool
virtualenv .venv
. ./.venv/bin/activate
python setup.py 
PYTHONPATH=/home/berntm/projects/argparse2tool/.venv/lib/python2.7/site-packages/argparse2tool-0.4.4-py2.7.egg

export PATH=/home/berntm/miniconda3/bin/:$PATH
/home/berntm/miniconda3/bin/conda create -c bioconda -y --name __checkm-genome@1.0.11 checkm-genome=1.0.11
/home/berntm/miniconda3/envs/__checkm-genome@1.0.11/bin/checkm
```

edit checkm to include only the part relevant for argparse
add name to plot_need_qa_results_parser and plot_parser

python checkm --generate_galaxy_xml > checkm.xml

- manually created
  - macros.xml
  - tool_data_table_conf.xml.[sample|test]
  - added test-data/checkm_references.loc and downloaded checkm reference data
- add tool specific tests etc in macros_NAME.xml


TODOs
=====

- next: taxon_wf taxon_set
- replace phylum taxon input by a now select where the options are from the output of checkm taxon_list

- add section around plot macros and find a nice order 

- some tools seem to accept hmm in adition to marker_file as input
- there are options with label ==SUPPRESS== that should be removed
- add help for hmmer output files: 
"""
The hmmer.tree.txt file is the alignment of the genes used to insert your genome into the CheckM reference tree. These genes are uses solely for this purpose. The hmmer.analyze.txt contains the alignment of the genes used to assess the completeness and contamination of your genomes.
"""
You can’t perform that action at this time.