Binning results Sig-Figs #174

chasemc · 2021-06-28T23:28:48Z

Binning on the dev branch writes its results at full 64-bit float precision:

contig  cluster completeness    purity  coverage_stddev gc_content_stddev
NODE_75_length_176226_cov_225.289       bin_0001        91.36690647482014       99.21259842519686       1.8240294840883275      0.784802804006503
NODE_98_length_143507_cov_224.136       bin_0001        91.36690647482014       99.21259842519686       1.8240294840883275      0.784802804006503
NODE_100_length_142487_cov_223.46       bin_0001        91.36690647482014       99.21259842519686       1.8240294840883275      0.78480280400650

Change the output to use fewer decimal places,

e.g. with `df.to_csv('output.csv', float_format='%.3f')`
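A minimal sketch of the suggestion above, assuming the binning table is a pandas DataFrame with the columns shown in the sample output. The `to_tsv` helper and the in-memory buffer are illustrative only and are not taken from the Autometa code base.

```python
import io

import pandas as pd

# One row copied from the sample output above.
df = pd.DataFrame(
    {
        "contig": ["NODE_75_length_176226_cov_225.289"],
        "cluster": ["bin_0001"],
        "completeness": [91.36690647482014],
        "purity": [99.21259842519686],
        "coverage_stddev": [1.8240294840883275],
        "gc_content_stddev": [0.784802804006503],
    }
)


def to_tsv(frame, float_format=None):
    """Hypothetical helper: serialize a frame to tab-separated text."""
    buffer = io.StringIO()
    frame.to_csv(buffer, sep="\t", index=False, float_format=float_format)
    return buffer.getvalue()


full_precision = to_tsv(df)                # floats written at full precision
rounded = to_tsv(df, float_format="%.3f")  # floats written with 3 decimals

print(full_precision)
print(rounded)
```

`float_format` only changes how float columns are written; the values held in the DataFrame and the string columns (`contig`, `cluster`) are left untouched.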


chasemc · 2021-06-28T23:30:11Z

@jason-c-kwan @WiscEvan
I guess this raises the question of what the appropriate number of decimal places is?
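If a single precision for every column turns out to be too coarse or too fine, one option is to pick the number of decimal places per column. The sketch below uses `DataFrame.round` with a dict; the specific decimal counts are placeholders for discussion, not a recommendation from this thread.

```python
import pandas as pd

# Metric columns from the sample output above (one row).
metrics = pd.DataFrame(
    {
        "completeness": [91.36690647482014],
        "purity": [99.21259842519686],
        "coverage_stddev": [1.8240294840883275],
        "gc_content_stddev": [0.784802804006503],
    }
)

# Hypothetical per-column precision; adjust once a convention is agreed on.
decimals = {
    "completeness": 2,
    "purity": 2,
    "coverage_stddev": 3,
    "gc_content_stddev": 3,
}

# round() returns a new frame; columns not named in the dict are unchanged.
rounded_metrics = metrics.round(decimals)
print(rounded_metrics)
```

Unlike `float_format`, this changes the stored values, so it should run only at the point of writing results, not before downstream calculations.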

Fixes #174

* 🎨 Add informational Autometa header to main nextflow script * 🎨 Move and add comment about include statement * 🎨 change `params.interim` to `params.interim_dir` * 🎨 change `params.processed` to `params.outdir` * 🎨 change `params.metagenome` to `params.input_fna` * 🎨 Add help function. WIP, will need to be updated. * 🎨 Update documentation and function names in validation * 🎨 Modify interim directory value meaning * 🎨 Minimize Nextflow file presence in top git directory * 🎨 Move main.nf to nextflow/autometa.nf * 🎨 Change config so that main script is found and the different config files are included * Add nf-core boilerplate files and functions * 🍏 Create ignore file for nf-core linter * 🍏 replaced autometa.nf with main.nf * Originally was main.nf and may revert to autometa.nf later. Bringing stuff from another repo where I tested the layout first * 🍏 🎨 Change `input_fna` to `input` * nf-core requires having an `params.input` so may as well change the input argument to that * 🍏 Add schema validation back to linting * 🍏 Add nf-core `schema.json` * 🍏 move def check_max() from base to hardware config * 🍏 Add config profile options * 🍏 remove nf-core generated autometa logo * 🍏 Update docker repo path * 🍏 🐛 Fix version check * 🍏 Update nextflow schema * 🍏 create nf-core output docuementation process * 🍏 🎨 Format comment block * 🍏 Add nf-core parameter validator to main.nf * 🍏 nf-core templating * 🍏 🎨 Streamline required config * 🍏 Add missing Autometa arguments to config * 🍏 separate environment and executor configs * 🍏 separate environment and executor configs * 🍏 remove old config * 🍏 remove redundant process configs * 🍏 remove 'container' option from individual processes * 'container' option is set once, from within config * 🍏 Add module for downloading databases * 🍏 Edit module for downloading databases * 🍏 separate environment and executor configs * 🍏 🎨 Update db param * 🍏 Update config with new sub-config filenames * 🍏 🐛 Remove old glob pattern * 🍏 Change 
channel names to reflect they're files not dirs * 🍏 Add download module to main.nf * 🍏 Add nextflow/nf-core conda environtment.yml * 🍏 add `nf-params.json` to `.gitignore` * nf-params.json is generated if nf-core is used to launch the pipeline * 🍏 Update nf-core schema * 🍏 Update documentation/doc-functions * 🍏 change "params.ncbi_database" to "params.single_db_dir" * 🍏 Change nr.gz download from rsync to aspera * 🍏 remove nf-core template /nextflow/CHANGELOG.md * 🍏 Change tracedir behavior * with current structure params.tracedir/outdir weren't being set and defaulted to a directory named "[:]" (empty groovy map) * 🍏 comment out database download * 🍏 Remove template documentation * modify .gitignore to only ignore lib dir at top directory * 🍏 Add nf-core /lib template files * 🍏 Reorganize module directories - Organizing the module files/directories like this should allow easier collaboration with the development of modules * 🍏 : 📝 Add short description of directory structure * 🍏 move dependency install from /nextflow to top dir - I'm still unsure exactly the best way to proceed in regards to conda/docker. Currently requirements.txt is really heavy and if people want to use HTC/HPC it might not be great to have every dependency installed at every instantiation. One thing might be to use the prokka and diamond, etc docker containers directly from the nextflow processes that use them rather than bundling them into a single conda/docker install * 🍏 fix params * docker image set once in process - See comment on earlier commit about conda/docker. Setting docker images individually per process might be beneficial for processes that only run e.g. prokka, diamond, etc * 🍏 add help/launch documentation * 🍏 🎨 Add code comments * 🍏 Add missing cpu/mem/time params * 🍏 🐎 Parallel ORF calling - Prodigal only runs singe threaded. 
So, split FASTA and run it in parallel on those * 🍏 Add default interim * 🍏 WIP-moving optional workflows out of "core" * 🍏 separate diamond out of taxonomy workflow * 🍏 Remove common_autometa_tasks - All of these functions have been pulled out and moved * 🍏 Moved "gene_coverage" process/workflow out of "common" * 🍏 reorganize directory * 🍏 Add new parameter "taxonomy_aware" * 🍏 replace "cpus" with nf-core "process-high-low etc" * 🍏 Moving files and updating names * 🍏 change 'assembly' to 'metagenome' to be consistent * 🍏 Update relative paths in main config * 🍏 Fix relative paths in main script * 🍏 Create "params.parallel_high_disk" * 🍏 Update config/params * 🍏 Add if/else for taxonomy-based partitioning * 🍏 Remove params in workflow files - Not up to date so would need to go through again * 🍏 Fix main script with updated params - Also fix Markers.out * 🍏 Add new default nextflow out dirs to gitignore * 🍏 Cleanup and documentation * 🍏 Fix config - Version 21.04+ required for ${baseDir} in json params * 🍏 Switch order of parallelization params * 🍏 🐛 Fixes java error when parallel during tests onserver * 🍏 Reuse split for parallel * 🍏 🐛 Allow input to be named '~.filtered.fna' - Fixes #146 * 🍏 Remove typo * 🍏 Move nextflow_schema.json to top directory * 🍏 Now nf-core launch should be provided the top directory of the Autometa git repo instead of the Nextflow subdirectory. Why: If nf-core launch was told to launch from ~/Autometa/nextflow, things worked. 
If nf-core launch was told to launch from ~/Autometa, things worked but it would just create a new pipeline schema at ~/Autometa/nextflow_schema.json and no help text or parameter grouping would be present * 🍏 Fix main.nf pointing to old schema location Fixes error caused by: 6d59c83 main.nf couldn't find the schema * 🎨 Add nf-core licenses Fixes #159 * 🍏 Change input help text nf-core handles the quoting of the input * More specific gitignore rule for nextflow/lib Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * Add trailing newline * Direct to readthedocs * Point documantion to readthedocs not main repo Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * Fix typo Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * Add newline https://github.com/KwanLab/Autometa/pull/157/files#r636203915 * Fix typo Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * Update comment * Update test.config * Update 03_environment_profiles.config * Update example command in help * Update equality logic Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * Update nextflow/.nf-core-lint.yml Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * add newline * Update comment Remove unneeeded comment and describe why process output_documentation exists but is commented * Clarify comment Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * Fix indentation * add newline add newline add newline add newline add newline * Add newline * 🍏 Clarify diamond workflow * Remove not-yet-finished download.nf * add newline * Update nextflow/nextflow.config Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * Update nextflow_schema.json Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * 🍏 add link to "writing an 
institutional profile" https://nf-co.re/usage/tutorials/step_by_step_institutional_profile * fix typos Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * 🍏 📝 Update documentation text * add trailing newline * 🍏 change 'gene_coverage' to 'contig_coverage' * 🍏 📝 Add documentation to basic profiles Also remove nice_slurm fixes #161 * WIP * 🐛 🐍 Fixes #168 CHange behavior of unclustered recruitment so that if there no unclustered bins, python doesn't raise an exception, it just gracefully logs and quits * 🍏 Add isolate-genome test data creator I needed a more minimal testing dataset than we currently had so created this. Some improvements (that should take minimal work) are left to be made, like downloading custom genomes * 🍏 remove multiqc html until multiqc is added * 🍏 update variable names * 🐛 fixes #149 fixes #149 * Update nextflow/modules/autometa_core/utilities/process/kmer_coverage.nf Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * 🍏 change kmer_coverage name to spades_kmer_coverage * 🍏 Update variable name to SPADES_KMER_COVERAGE * 🍏 Make archaea emit optional * 🍏 add explicit default for 'taxonomy aware' * 🍏 change profile 'basic_slurm' to 'slurm' Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * 🍏 Switch directory and code structure Still a work in progress but major changes to make the code/directory structure identical to what nf-core is going with for DSL-2 based pipelines from here forward see the structure of https://github.com/nf-core/mag At this commit the pipeline is working but there's still some things left to do that I'm working on. 
* 🍏 Update modules.conf and add help * 🍏 Update module publishdirs * 🎨 🍏 Add nextflow version badge to README required for nf-core linter * 🍏 conform to nf-core linter * add 'defaults' conda channel * 🍏 Update diamond to nf-core like module * 🍏 WIP update coverage to nf-core style modules * 🍏 continuing to move to current nfcore dsl2 standard * 🐍 faster binning/cluster evaluation Instead of looping through all groups/bins, group and calculate only within pandas df * Initial template commit * update semver * remove nf-core github actions * 🍏 Multiple changes towards nf-core structure * 🐍 Add hmmsearch domtblout file parser * 🐍 remove old, unused 'metrics' list metrics was for collecting the for loop results before I switched it to solely-pandas * 🐍 add hmmsearch parser entrypoint equivalent to what "markers" entrypoint does, except with hmmsearch * 🐍 Constrain binning decimal places Fixes #174 * 🍏 Change binning proccess to medium resources * 🍏 Change merge process outputs to meta.id.extension * 🍏 Length cutoff default =3000 * 🍏 Add extensions to intermediate, merged files * Delete extra space * delete extra space * Template update for nf-core/tools version 2.0.1 * 🍏 Increase default RAM WIP wip wip wip wip * 🍏 Update to to nf-core 2.0 format * 🍏 🐍 Add modularity for marker database inputs This will help allow non-dockerized versions of the nextflow workflow point to other hmm datatabases. It keeps the original functionality, but otherwise allows users to point to custom hmm and cutoff files. 
* 🍏 Update Autometa docker image tag handling * 🍏 Fix formatting * 🍏 Fix binning summary In order for symlinks to work with Docker, must pass the NCBI directory as a path, not a parameter * environment as autometa-nf install * update requirements * Fix failing conda solve (version issues) and remove nextflow dependencies * 🍏 Fix whitespace, continue moving to nf-core 2.0 * Explicit python3 calls in makefile * 🍏 fix lint and remove unused params * 🍏 nf-core lint fix * Add citations * 🍏 remove unused nf-core module * 🍏 Simplify parallelization args Now instead of having to specify to run in parallel AND the number of splits, only the number of splits has to be specified, with default =1, so no parallelization * 🍏 update database path logic * Update environment.yml * remove empty module * 🍏 Account for gz or not gz input * 🍏 Fix db path variables/params * 🍏 Set smaller default cpu/mem * 🍏📝 Update param documentation * 🍏 Make docker standard * 🍏 Update params * 🍏 Change and document to "autometa_image_tag" * ⬆️ Add seqkit to requirements.txt * 🔥 Remove seqkit from requirements.txt 📝 Fix typo in seqkit_filter.nf * 🐛🐳🍏 Add slurm to profiles enabling docker in nextflow.config * Remove newline * 🎨 align commas * 🎨🐍🔥 Rename hmmer.py to hmmscan.py * 🎨🐍 Rename filter_hmmsearch.py to hmmsearch.py and place in external sub-directory * 🎨 Reformat filenaming using f-strings instead of .join(...) 
method for kingdom-specific marker annotation * 🎨 Update href from github.com/autometa to github.com/KwanLab/Autometa * 🎨 Rename autometa-markerfilter entrypoint to autometa-hmmsearch-filter * 🔥 Remove numbers in the middle of Izaak Miller in recursive_dbscan.py * 🎨🍏 Change tag in diamond_blastp to also emit database * 🎨🍏🔥 Replace mentions of nr when downloading accession2taxid tfile to accession2taxid file * 🎨🍏🐳 replace hardcoded docker image tag from nfcore to params.autometa_image_tag * 🎨🍏 Change READ_COVERAGE workflow name to CONTIG_COVERAGE to match subworkflow file name * 🎨🍏 Replace pprot.accession2taxid param for BINNING_SUMMARY with taxdump_tar_gz_dir as the required files for binning summary correspond to taxdump.tar.gz * 🎨🍏 Update autometa entrypoint in hmmer_hmmsearch_filter from autometa-markerfilter to autometa-hmmsearch-filter * 🎨🍏 Rename main joined contig annotation channel in bin_contigs.nf to metagenome_annotations * 🎨 Apply black formatting * 🍏 🎨 Change analyze_kmers to analyze_kmers_options * 🍏 Fix coverage output directory * 🍏 🎨 Add comment with reason file exists * 🍏 Fix lint errors for nonstandard nf-core structure * 🍏 remove .version.txt * Bump from v2.0.0-alpha.0 to 2.0 - #157 (comment) * Update CITATIONS.md Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * 🍏 Fix lint errors for nonstandard nf-core structure * 🍏 remove irrelevant nf-core template doc images * 🍏 comment that file isn't ready * 🎨 remove extra newlines * 🍏 remove unused output * 🍏 remove mkdir - Leftover from testing output/publish directories * 🍏 Remove unnecessary intermediate fasta * 🍏 bump 2.0.0-alpha.0 to 2.0.0 in nf-manifest * Update subworkflows/local/align_reads.nf Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * 🍏 🎨 Update line continuation within nf script * Update subworkflows/local/prepare_ncbi_taxinfo.nf Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> * 🍏 remove redundant collect * 🍏 fix lint * 
🍏 Split Binning/UR subworkflow https://github.com/KwanLab/Autometa/pull/157/files#r696981913 * Update nextflow_schema.json * Update nextflow_schema.json * Update nextflow_schema.json Add other kmer embedding options in embedding method description * Update nextflow.config Fix typo * Update modules/local/diamond_blastp.nf :bug: Fix typo/bug * :art: Restrict diamond blastp process to run only one task at a time * :fire::green_apple: Fix linting by removing invalid benchmarking tasks nf script Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com> Co-authored-by: WiscEvan <erees@wisc.edu>

chasemc · 2021-09-28T20:12:22Z

Fixed by: #157

chasemc added the enhancement New feature or request label Jun 28, 2021

chasemc self-assigned this Jun 28, 2021

chasemc added a commit that referenced this issue Jun 29, 2021

🐍 Constrain binning decimal places

ad55ec1

Fixes #174

chasemc closed this as completed Sep 28, 2021
