Change Nextflow I/O behavior #218
- Change `publishDir` to write process files to their respective metagenome output directory
- By default `traceDir` will now be written to `params.outdir/trace`
- Change prodigal nf-core module output filenames
- Fix `nsplits` logic to only merge files when `nsplits` is greater than 1 (not 0)
- Remove internal directories (outdir and interim with hashes prepended)
- Remove interim directory parameter
- Change `reduce_lca.nf` label to `process_medium`
- Update schema to reflect removed parameters
- Change the results directory emitted to the user to show only `params.outdir` and not the removed `params.interim_dir`
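The `nsplits` fix described above can be illustrated with a minimal Python sketch. This is not the pipeline's Nextflow code; the function name `merge_if_split` and the string-based "chunks" are hypothetical and only show why the condition is `nsplits > 1` rather than `nsplits > 0`.

```python
from typing import List


def merge_if_split(chunks: List[str], nsplits: int) -> str:
    """Merge per-chunk results only when the input was really split.

    Hypothetical helper: `chunks` stands in for the per-split output files.
    With nsplits == 1 there is a single result and nothing to merge, so a
    `nsplits > 0` test would trigger a needless merge step.
    """
    if nsplits > 1:
        # Several chunks exist: concatenate them in order.
        return "".join(chunks)
    # Zero or one split: pass the single result through unchanged.
    return chunks[0]


merged = merge_if_split(["a\n", "b\n"], nsplits=2)
single = merge_if_split(["a\n"], nsplits=1)
```

With two splits the results are concatenated; with one split the single result is returned as-is.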
Will be merging by EOTD tomorrow unless an issue is raised.
Few things to address, I'll probably edit this with more notes...
Would you suggest hardcoding a default output directory here? Or making this a required parameter for the end user? I think the typical end-user behavior will be to fill in this parameter as they will want to place their analyses in a specific location. Hardcoding the
I'm not sure what you mean by
Yeah I think that would be fine.
Right now it's hardcoded and so will always be created based on where nextflow is run from.
As an aside, how is the LCA stuff kept in check with the nr db? If someone downloads a new nr.gz,
Should probably add a check at the start of the pipeline and fast fail if
I'm not convinced this is necessary at the moment. This could either be a job for nextflow or could cause some problems if the end-user is not using nextflow properly. We'll kick the can down the road for now.
I think this is as intended for wherever nextflow is run b/c these LCA dbs can be used across runs of different datasets. If I am recalling correctly, if the ncbi databases change, the cached LCA databases will be regenerated (I think nextflow is already doing some of this file hash tracking behind the scenes here).
Add autometa-nxf-output to .gitignore
IMO, because this PR removes the provenance (run ID) from the output, I think either the pipeline should fail if files already exist or the provenance should be written as an output. This may be a larger issue, but this PR does take a step backwards in data provenance.
Happy to add that in for the 2.1.0 release.
Nextflow output structure now resembles what was discussed in #160. Note that metagenomes are not enumerated to generate their output directory names; each metagenome's `meta.id` is used instead, which is the `metagenome.simpleName` (Groovy method) of the respective input metagenome. This clobbers filenames containing multiple `.` characters. For example, the `meta.id` of `my.example.metagenome.fasta` would be `my`. Moving forward (looking at #186), altering the input so that a sample sheet is provided would allow the values in this table to be checked for unique sample IDs, so that each sample is written to its respective sample ID results directory.

- `storeDir` for fetching mock_data genomes
- `hmmsearch` nf processes still need to be fixed
- `nextflow_schema.json`
- `nextflow.config`
- nf-core settings to silence linter warnings
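The `meta.id` collision noted above can be illustrated with a short Python sketch. It is an approximation for discussion only, not the pipeline's code: `simple_name` mimics the behavior described in this PR (the file name up to its first `.`), and `find_id_collisions` plus the example paths are hypothetical. Hidden files with a leading `.` are not handled.

```python
from pathlib import PurePath
from typing import Dict, Iterable, List


def simple_name(path: str) -> str:
    """Approximate the described simpleName behavior: name up to the first '.'."""
    return PurePath(path).name.split(".", 1)[0]


def find_id_collisions(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group input paths by derived ID and keep only IDs shared by 2+ inputs."""
    groups: Dict[str, List[str]] = {}
    for path in paths:
        groups.setdefault(simple_name(path), []).append(path)
    return {sample_id: members for sample_id, members in groups.items() if len(members) > 1}


# Hypothetical inputs: the first two would share one output directory.
inputs = [
    "data/my.example.metagenome.fasta",
    "data/my.other.metagenome.fasta",
    "data/sample2.fna",
]
collisions = find_id_collisions(inputs)
```

Here the first two inputs both map to the ID `my`, so they would be written to the same results directory. A check like this, run against sample sheet IDs, is one way the uniqueness check mentioned for #186 could be approached.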