Binning results Sig-Figs #174

Closed
chasemc opened this issue Jun 28, 2021 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@chasemc
Member

chasemc commented Jun 28, 2021

Binning in dev outputs results as 64-bit floats:

contig  cluster completeness    purity  coverage_stddev gc_content_stddev
NODE_75_length_176226_cov_225.289       bin_0001        91.36690647482014       99.21259842519686       1.8240294840883275      0.784802804006503
NODE_98_length_143507_cov_224.136       bin_0001        91.36690647482014       99.21259842519686       1.8240294840883275      0.784802804006503
NODE_100_length_142487_cov_223.46       bin_0001        91.36690647482014       99.21259842519686       1.8240294840883275      0.78480280400650

Change the output to use fewer decimal places.

e.g. with df.to_csv('output.csv', float_format='%.3f')
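
A minimal sketch of that suggestion, assuming the binning results are held in a pandas DataFrame before being written. The two rows are copied from the output above; the `to_tsv` helper and the use of an in-memory buffer instead of a real output file are illustrative assumptions, not Autometa's actual code.

```python
import io

import pandas as pd

# Two rows copied from the binning output shown in this issue.
df = pd.DataFrame(
    {
        "contig": [
            "NODE_75_length_176226_cov_225.289",
            "NODE_98_length_143507_cov_224.136",
        ],
        "cluster": ["bin_0001", "bin_0001"],
        "completeness": [91.36690647482014, 91.36690647482014],
        "purity": [99.21259842519686, 99.21259842519686],
        "coverage_stddev": [1.8240294840883275, 1.8240294840883275],
        "gc_content_stddev": [0.784802804006503, 0.784802804006503],
    }
)


def to_tsv(frame, float_format=None):
    """Hypothetical helper: serialize a DataFrame to a tab-separated string."""
    buffer = io.StringIO()
    frame.to_csv(buffer, sep="\t", index=False, float_format=float_format)
    return buffer.getvalue()


full = to_tsv(df)  # default float formatting, full precision
short = to_tsv(df, float_format="%.3f")  # three decimal places
print(short)
```

`float_format` only changes how float columns are written; the values held in the DataFrame keep their full precision, and string columns such as `contig` are left untouched.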

@chasemc chasemc added the enhancement New feature or request label Jun 28, 2021
@chasemc chasemc self-assigned this Jun 28, 2021
@chasemc
Member Author

chasemc commented Jun 28, 2021

@jason-c-kwan @WiscEvan
I guess this raises the question of what the appropriate number of decimal places is?
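
To make the trade-off concrete, here is a small sketch comparing a fixed number of decimal places with a fixed number of significant figures on the values from the output above. The choices of `%.3f` and `%.4g` are only examples for illustration, not a recommendation or what was eventually merged.

```python
# Values copied from the first row of the binning output in this issue.
values = {
    "completeness": 91.36690647482014,
    "purity": 99.21259842519686,
    "coverage_stddev": 1.8240294840883275,
    "gc_content_stddev": 0.784802804006503,
}

# Fixed decimal places: every column keeps three digits after the point.
fixed_decimals = {name: "%.3f" % value for name, value in values.items()}

# Fixed significant figures: every column keeps four significant digits,
# so small values retain more digits after the point than large ones.
sig_figs = {name: "%.4g" % value for name, value in values.items()}

for name in values:
    print(f"{name}\t{fixed_decimals[name]}\t{sig_figs[name]}")
```

With fixed decimals, `gc_content_stddev` is written with three significant digits while `completeness` gets five; with significant figures, each column carries the same relative precision.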

chasemc added a commit that referenced this issue Jun 29, 2021
evanroyrees added a commit that referenced this issue Sep 28, 2021
* 🎨 Add informational Autometa header to main nextflow script
* 🎨 Move and add comment about include statement
* 🎨 change `params.interim` to `params.interim_dir`
* 🎨 change `params.processed` to `params.outdir`
* 🎨 change `params.metagenome` to `params.input_fna`
* 🎨 Add help function. WIP, will need to be updated.
* 🎨 Update documentation and function names in validation
* 🎨 Modify interim directory value meaning
* 🎨 Minimize Nextflow file presence in top git directory
* 🎨 Move main.nf to nextflow/autometa.nf
* 🎨 Change config so that main script is found and the different config files are included
* Add nf-core boilerplate files and functions
* 🍏 Create ignore file for nf-core linter
* 🍏 replaced autometa.nf with main.nf
* Originally was main.nf and may revert to autometa.nf later. Bringing stuff from another repo where I tested the layout first
* 🍏 🎨 Change  `input_fna` to `input`
* nf-core requires having an `params.input` so may as well change the input argument to that
* 🍏 Add schema validation back to linting
* 🍏 Add nf-core `schema.json`
* 🍏 move def check_max() from base to hardware config
* 🍏 Add config profile options
* 🍏 remove nf-core generated autometa logo
* 🍏 Update docker repo path
* 🍏 🐛 Fix version check
* 🍏 Update nextflow schema
* 🍏 create nf-core output documentation process
* 🍏 🎨 Format comment block
* 🍏 Add nf-core parameter validator to main.nf
* 🍏 nf-core templating
* 🍏 🎨 Streamline required config
* 🍏 Add missing Autometa arguments to config
* 🍏 separate environment and executor configs
* 🍏 separate environment and executor configs
* 🍏 remove old config
* 🍏 remove redundant process configs
* 🍏 remove 'container' option from individual processes
*  'container' option  is set once, from within config
* 🍏 Add module for downloading databases
* 🍏 Edit module for downloading databases
* 🍏 separate environment and executor configs
* 🍏 🎨 Update db param
* 🍏 Update config with new sub-config filenames
* 🍏 🐛 Remove old glob pattern
* 🍏 Change channel names to reflect they're files not dirs
* 🍏 Add download module to main.nf
* 🍏 Add nextflow/nf-core conda environment.yml
* 🍏 add `nf-params.json` to `.gitignore`
* nf-params.json is generated if nf-core is used to launch the pipeline
* 🍏 Update nf-core schema
* 🍏 Update documentation/doc-functions
* 🍏 change "params.ncbi_database" to "params.single_db_dir"
* 🍏 Change nr.gz download from rsync to aspera
* 🍏 remove nf-core template /nextflow/CHANGELOG.md
* 🍏 Change tracedir behavior
* with current structure params.tracedir/outdir weren't being set and defaulted to a directory named "[:]" (empty groovy map)
* 🍏 comment out database download
* 🍏 Remove template documentation
* modify .gitignore to only ignore lib dir at top directory
* 🍏 Add nf-core /lib template files
* 🍏 Reorganize module directories
  - Organizing the module files/directories like this should allow easier collaboration with the development of modules
* 🍏 : 📝 Add short description of directory structure
* 🍏 move dependency install from /nextflow to top dir
  - I'm still unsure exactly the best way to proceed in regards to conda/docker.  Currently requirements.txt is really heavy and if people want to use HTC/HPC it might not be great to have every dependency installed at every instantiation. One thing might be to use the prokka and diamond, etc docker containers directly from the nextflow processes that use them rather than bundling them into a single conda/docker install
* 🍏 fix params
* docker image set once in process
  - See comment on earlier commit about conda/docker. Setting docker images individually per process might be beneficial for processes that only run e.g. prokka, diamond, etc
* 🍏 add help/launch documentation
* 🍏 🎨 Add code comments
* 🍏 Add missing cpu/mem/time params
* 🍏 🐎 Parallel ORF calling
  - Prodigal only runs single-threaded, so split the FASTA and run it in parallel on the pieces
* 🍏 Add default interim
* 🍏 WIP-moving optional workflows out of "core"
* 🍏 separate diamond out of taxonomy workflow
* 🍏 Remove common_autometa_tasks
  - All of these functions have been pulled out and moved
* 🍏 Moved "gene_coverage" process/workflow out of "common"
* 🍏 reorganize directory
* 🍏 Add new parameter "taxonomy_aware"
* 🍏 replace "cpus" with nf-core "process-high-low etc"
* 🍏 Moving files and updating names
* 🍏 change 'assembly' to 'metagenome' to be consistent
* 🍏 Update relative paths in main config
* 🍏 Fix relative paths in main script
* 🍏 Create "params.parallel_high_disk"
* 🍏 Update config/params
* 🍏 Add if/else for taxonomy-based partitioning
* 🍏 Remove params in workflow files
  - Not  up to date so would need to go through again
* 🍏 Fix main script with updated params
  - Also fix Markers.out
* 🍏 Add new default nextflow out dirs to gitignore
* 🍏 Cleanup and documentation
* 🍏 Fix config
  - Version 21.04+ required for ${baseDir} in json params
* 🍏 Switch order of parallelization params
* 🍏 🐛 Fixes Java error when running in parallel during tests on server
* 🍏 Reuse split for parallel
* 🍏 🐛 Allow input to be named '~.filtered.fna'
  - Fixes #146
* 🍏 Remove typo
* 🍏 Move nextflow_schema.json to top directory
* 🍏  Now nf-core launch should be provided the top directory of the Autometa git repo instead of the Nextflow subdirectory.

Why:

If nf-core launch was told to launch from ~/Autometa/nextflow, things worked. 

If nf-core launch was told to launch from ~/Autometa, things worked but it would just create a new pipeline schema at ~/Autometa/nextflow_schema.json  and no help text or parameter grouping would be present

* 🍏 Fix main.nf  pointing to old schema location

Fixes error caused by:
6d59c83

main.nf couldn't find the schema

* 🎨 Add nf-core licenses 

Fixes #159

* 🍏 Change input help text

nf-core handles the quoting of the input

* More specific gitignore rule for nextflow/lib

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* Add trailing newline

* Direct to readthedocs

* Point documentation to readthedocs not main repo

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* Fix typo

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* Add newline

https://github.com/KwanLab/Autometa/pull/157/files#r636203915

* Fix typo

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* Update comment

* Update test.config

* Update 03_environment_profiles.config

* Update example command in help

* Update equality logic

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* Update nextflow/.nf-core-lint.yml

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* add newline

* Update comment

Remove unneeded comment and describe why process output_documentation exists but is commented out

* Clarify comment

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* Fix indentation

* add newline

* Add newline

* 🍏 Clarify diamond workflow

* Remove not-yet-finished download.nf

* add newline

* Update nextflow/nextflow.config

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* Update nextflow_schema.json

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* 🍏 add link to "writing an institutional profile" 

https://nf-co.re/usage/tutorials/step_by_step_institutional_profile

* fix typos

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* 🍏 📝 Update documentation text

* add trailing newline

* 🍏 change 'gene_coverage' to 'contig_coverage'

* 🍏 📝 Add documentation to basic profiles

Also remove nice_slurm
fixes #161

* WIP

* 🐛 🐍 Fixes #168

Change behavior of unclustered recruitment so that if there are no unclustered bins, Python doesn't raise an exception; it just logs gracefully and quits

* 🍏 Add isolate-genome test data creator

I needed a more minimal testing dataset than we currently had so created this. 

Some improvements (that should take minimal work) are left to be made, like downloading custom genomes

* 🍏 remove multiqc html until multiqc is added

* 🍏 update variable names

* 🐛 fixes #149

fixes #149

* Update nextflow/modules/autometa_core/utilities/process/kmer_coverage.nf

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* 🍏 change kmer_coverage name to spades_kmer_coverage

* 🍏 Update variable name to SPADES_KMER_COVERAGE

* 🍏 Make archaea emit optional

* 🍏 add explicit default for 'taxonomy aware'

* 🍏 change profile 'basic_slurm' to 'slurm'

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* 🍏 Switch directory and code structure

Still a work in progress but major changes to make the code/directory structure identical to what nf-core is going with for DSL-2 based pipelines from here forward
see the structure of https://github.com/nf-core/mag 


At this commit the pipeline is working, but there are still some things left to do that I'm working on.

* 🍏 Update modules.conf and add help

* 🍏 Update module publishdirs

* 🎨 🍏 Add nextflow version badge to README

required for nf-core linter

* 🍏 conform to nf-core linter

* add 'defaults' conda channel

* 🍏  Update diamond to nf-core like module

* 🍏 WIP update coverage  to nf-core style modules

* 🍏 continuing to move to current nfcore dsl2 standard

* 🐍 faster binning/cluster evaluation

Instead of looping through all groups/bins, group and calculate only within pandas df

* Initial template commit

* update semver

* remove nf-core github actions

* 🍏 Multiple changes towards nf-core structure

* 🐍 Add hmmsearch domtblout file parser

* 🐍 remove old, unused 'metrics' list

metrics was for collecting the for loop results before I switched it to solely-pandas

* 🐍 add hmmsearch parser entrypoint

equivalent to what "markers" entrypoint  does, except with hmmsearch

* 🐍 Constrain binning decimal places

Fixes #174

* 🍏 Change binning process to medium resources

* 🍏 Change merge process outputs to meta.id.extension

* 🍏 Length cutoff default =3000

* 🍏 Add extensions to intermediate, merged files

* Delete extra space

* delete extra space

* Template update for nf-core/tools version 2.0.1

* 🍏 Increase default RAM 


WIP



* 🍏 Update to nf-core 2.0 format

* 🍏 🐍 Add modularity for marker database inputs

This will help non-dockerized versions of the Nextflow workflow point to other HMM databases.

It keeps the original functionality, but otherwise allows users to point to custom hmm and cutoff files.

* 🍏 Update Autometa docker image tag handling
* 🍏 Fix formatting
* 🍏 Fix binning summary

In order for symlinks to work with Docker, must pass the NCBI directory as a path, not a parameter

* environment as autometa-nf install

* update requirements

* Fix failing conda solve (version issues) and remove nextflow dependencies
* 🍏 Fix whitespace,  continue moving to nf-core 2.0
* Explicit python3 calls in makefile
* 🍏 fix lint and remove unused params
* 🍏  nf-core lint fix
* Add citations
* 🍏 remove unused nf-core module
* 🍏 Simplify parallelization args

Now instead of having to specify to run in parallel AND the number of splits, only the number of splits has to be specified, with default =1, so no parallelization

* 🍏 update database path logic

* Update environment.yml
* remove empty module
* 🍏 Account for gz or not gz input
* 🍏 Fix db path variables/params
* 🍏 Set smaller default cpu/mem
* 🍏📝 Update param documentation
* 🍏 Make docker standard
* 🍏 Update params
* 🍏 Change and document to "autometa_image_tag"
* ⬆️ Add seqkit to requirements.txt
* 🔥 Remove seqkit from requirements.txt
📝 Fix typo in seqkit_filter.nf
* 🐛🐳🍏 Add slurm to profiles enabling docker in nextflow.config
* Remove newline
* 🎨 align commas
* 🎨🐍🔥 Rename hmmer.py to hmmscan.py
* 🎨🐍 Rename filter_hmmsearch.py to hmmsearch.py and place in external sub-directory
* 🎨 Reformat filenaming using f-strings instead of .join(...) method for kingdom-specific marker annotation
* 🎨 Update href from github.com/autometa to github.com/KwanLab/Autometa
* 🎨 Rename autometa-markerfilter entrypoint to autometa-hmmsearch-filter
* 🔥 Remove numbers in the middle of Izaak Miller in recursive_dbscan.py
* 🎨🍏 Change tag in diamond_blastp to also emit database
* 🎨🍏🔥 Replace mentions of nr when downloading accession2taxid tfile to accession2taxid file
* 🎨🍏🐳 replace hardcoded docker image tag from nfcore to params.autometa_image_tag
* 🎨🍏 Change READ_COVERAGE workflow name to CONTIG_COVERAGE to match subworkflow file name
* 🎨🍏 Replace prot.accession2taxid param for BINNING_SUMMARY with taxdump_tar_gz_dir as the required files for binning summary correspond to taxdump.tar.gz
* 🎨🍏 Update autometa entrypoint in hmmer_hmmsearch_filter from autometa-markerfilter to autometa-hmmsearch-filter
* 🎨🍏 Rename main joined contig annotation channel in bin_contigs.nf to metagenome_annotations
* 🎨 Apply black formatting
* 🍏 🎨 Change analyze_kmers to analyze_kmers_options
* 🍏 Fix coverage output directory
* 🍏 🎨 Add comment with reason file exists
* 🍏  Fix lint errors for nonstandard nf-core structure
* 🍏 remove .version.txt
* Bump from v2.0.0-alpha.0 to 2.0
  - #157 (comment)
* Update CITATIONS.md

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* 🍏  Fix lint errors for nonstandard nf-core structure
* 🍏 remove irrelevant nf-core template doc images
* 🍏 comment that file isn't ready
* 🎨 remove extra newlines
* 🍏  remove unused output
* 🍏 remove mkdir
  - Leftover from testing output/publish directories
* 🍏 Remove unnecessary intermediate fasta
* 🍏 bump 2.0.0-alpha.0 to 2.0.0 in nf-manifest
* Update subworkflows/local/align_reads.nf

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* 🍏 🎨 Update line continuation within nf script
* Update subworkflows/local/prepare_ncbi_taxinfo.nf

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>

* 🍏 remove redundant collect
* 🍏 fix lint
* 🍏 Split Binning/UR subworkflow

https://github.com/KwanLab/Autometa/pull/157/files#r696981913

* Update nextflow_schema.json
* Update nextflow_schema.json
* Update nextflow_schema.json

Add other kmer embedding options in embedding method description

* Update nextflow.config

Fix typo

* Update modules/local/diamond_blastp.nf

:bug: Fix typo/bug

* :art: Restrict diamond blastp process to run only one task at a time

* :fire::green_apple: Fix linting by removing invalid benchmarking tasks nf script

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>
Co-authored-by: WiscEvan <erees@wisc.edu>
@chasemc
Member Author

chasemc commented Sep 28, 2021

Fixed by: #157

@chasemc chasemc closed this as completed Sep 28, 2021