Dag files #209

Merged: cimendes merged 15 commits into dev from DAG_files on Jul 4, 2019

cimendes
Member

This PR addresses an issue raised in #194: the `.treeDag.json` and `.forkTree.json` files aren't automatically staged when the resulting FlowCraft pipelines are published to a repository. Because they are hidden files, they are often overlooked, which breaks the pipelines when they are run remotely.

There was no reason to keep these files as dotfiles, so they were moved to the resources folder.

cimendes and others added 6 commits June 18, 2019 13:18
* remove submodule from dev install

* fix typo

* Added bwa component

* Added cpus to bwa command

* added manifest information to the `nextflow.config` file to allow for remote execution (#204) - Partial solve to #194 issue

- Deprecation of the `manifest.config` file
- Add the manifest information to the `nextflow.config` file

* Added component for haplotypecaller

* Added merge vcfs to haplotypecaller component

* Added mark duplicates component

* Added bam index to mark duplicates

* Added base_recalibrator component

* Removed publishDir for haplotypecaller

* Added apply_bqsr process to base_recalibrator component

* Updated changelog

* Added description to haplotypecaller

* Add check for the location of specific dot files

* Updated changelog

* Updated version
@cimendes added the enhancement, engine, and bufix labels on Jun 18, 2019

codecov-io commented Jun 21, 2019

Codecov Report

Merging #209 into dev will increase coverage by 0.01%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##              dev     #209      +/-   ##
==========================================
+ Coverage   41.95%   41.97%   +0.01%     
==========================================
  Files          72       72              
  Lines        6461     6464       +3     
==========================================
+ Hits         2711     2713       +2     
- Misses       3750     3751       +1
| Impacted Files | Coverage Δ |
|---|---|
| flowcraft/generator/error_handling.py | 85% <ø> (ø) ⬆️ |
| flowcraft/generator/components/variant_calling.py | 100% <ø> (ø) ⬆️ |
| flowcraft/generator/components/mapping.py | 100% <ø> (ø) ⬆️ |
| flowcraft/generator/inspect.py | 10.47% <0%> (ø) ⬆️ |
| flowcraft/templates/downsample_fastq.py | 0% <0%> (ø) ⬆️ |
| flowcraft/generator/engine.py | 87.88% <100%> (+0.02%) ⬆️ |
| flowcraft/flowcraft.py | 60.62% <100%> (ø) ⬆️ |
| flowcraft/tests/test_assemblerflow.py | 100% <100%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 890d54d...e430a3c.

Collaborator

@tiagofilipe12 tiagofilipe12 left a comment

@cimendes good job here. 👍 I left you some comments. The main one is a suggestion to remove duplication; another is that some of the code seems to fix a component and is unrelated to this PR. That's OK because the PR isn't that big, but it is always better to keep a PR to its subject. Also, the changelog needs to include those additions.

flowcraft/generator/engine.py (outdated, resolved)
flowcraft/generator/engine.py (outdated, resolved)
os.mkdir(resources_dir)
outfile_tree_fork = open(os.path.join(resources_dir, "forkTree.json"), "w")
outfile_tree_fork.write(json.dumps(dict_viz))
outfile_tree_fork.close()
Collaborator

Hmm interesting that you are keeping consistency here between both methods. But now you can see that some duplication exists between the two methods? Maybe you could write a function called something like write_dag_to_file and use it in both methods.
Then you basically put everything inside that function and re-use in both places:

def write_dag_to_file(self, file_name, dict_viz):
    # Write the dictionary as JSON into the pipeline's resources directory
    resources_dir = os.path.join(dirname(self.nf_file), "resources")
    if not os.path.exists(resources_dir):
        os.mkdir(resources_dir)
    outfile_tree_fork = open(os.path.join(resources_dir, file_name), "w")
    outfile_tree_fork.write(json.dumps(dict_viz))
    outfile_tree_fork.close()

Or you can even go with `with open(...)`. Then you just call the method in both places. Something like:

self.write_dag_to_file('forkTree.json', dict_viz)
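
For reference, the `with open` variant mentioned above could look roughly like the sketch below. It is a standalone illustration, not the code that was merged: the `resources_dir` parameter and the returned path are assumptions made so the example runs outside the engine class.

```python
import json
import os


def write_dag_to_file(resources_dir, file_name, dict_viz):
    """Serialize dict_viz as JSON to <resources_dir>/<file_name>.

    Hypothetical standalone variant: the directory is passed in explicitly
    instead of being derived from self.nf_file.
    """
    # exist_ok=True replaces the separate os.path.exists check
    os.makedirs(resources_dir, exist_ok=True)
    out_path = os.path.join(resources_dir, file_name)
    # The context manager closes the file even if json.dump raises
    with open(out_path, "w") as fh:
        json.dump(dict_viz, fh)
    return out_path
```

Both files can then be written with the same call, changing only the file name (`forkTree.json` or `treeDag.json`).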

Collaborator

Check the other comments @cimendes, they should remove much of this duplication/boilerplate when creating these files.

@@ -1465,8 +1465,8 @@ def _prepare_static_info(self):
         return pipeline_files
 
     def _dag_file_to_dict(self):
-        """Function that opens the dotfile named .treeDag.json in the current
-        working directory
+        """Function that opens the accessory named treeDag.json in the
Collaborator

What is "accessory"? Also notice that this method not only opens that file but also loads its content into a dict, so the docstring is incomplete. It was already incomplete before, I know.
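
As an illustration of a docstring that covers both steps, here is a minimal sketch of such a loader. The function name, the `resources_dir` parameter, and the empty-dict fallback for a missing file are assumptions for this example, not the behaviour of the actual `_dag_file_to_dict` method.

```python
import json
import os


def dag_file_to_dict(resources_dir, file_name="treeDag.json"):
    """Open the DAG JSON file in resources_dir and load its content.

    Returns the parsed content as a dict. When the file does not exist,
    an empty dict is returned (an assumption made for this sketch).
    """
    dag_path = os.path.join(resources_dir, file_name)
    try:
        with open(dag_path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}
```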

{{ forks }}
Collaborator

same here

Member Author

I don't get this comment? Same what?

flowcraft/templates/downsample_fastq.py (resolved)
Collaborator

@ODiogoSilva ODiogoSilva left a comment

Looks mostly good, but pls check the comments about duplication and the directory checks. Also, the forkTree content is not correct.

flowcraft/flowcraft.py (resolved)
flowcraft/generator/engine.py (outdated, resolved)

outfile_dag = open(os.path.join(dirname(self.nf_file), output_file)
, "w")
if not os.path.exists(resources_dir):
Collaborator

Why do you check for the existence of this directory twice? It doesn't seem like it should be the responsibility of this function to worry about this. My suggestion is that this check be made at a higher level, and that here we assume the directory already exists. Then, here and below, the code becomes simply the file-writing operation without the check.
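
A rough sketch of that split, assuming hypothetical names (`render_resources`, `write_json`) rather than the real FlowCraft methods: the caller creates the directory once, and the writer only writes.

```python
import json
import os


def write_json(path, data):
    """Write data as JSON; assumes the parent directory already exists."""
    with open(path, "w") as fh:
        json.dump(data, fh)


def render_resources(pipeline_dir, dag, fork_tree):
    """Hypothetical caller: create resources/ once, then write both files."""
    resources_dir = os.path.join(pipeline_dir, "resources")
    # The single directory check, made at the higher level
    os.makedirs(resources_dir, exist_ok=True)
    write_json(os.path.join(resources_dir, "treeDag.json"), dag)
    write_json(os.path.join(resources_dir, "forkTree.json"), fork_tree)
    return resources_dir
```

With this layout the writer has one responsibility, and a missing directory surfaces as an error in the caller rather than being silently handled in two places.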

flowcraft/generator/engine.py (resolved)

flowcraft/tests/test_assemblerflow.py (resolved)
simplified dag and treefork file write in a single function
added suggestions in #209
@cimendes
Member Author

The verification was moved to the render_pipeline function, and the function that writes the JSON was made more general to accommodate both the forkTree.json and treeDag.json files. Thanks @ODiogoSilva for pointing out my very silly mistake!

Collaborator

@ODiogoSilva ODiogoSilva left a comment

LGTM

flowcraft/generator/engine.py (outdated, resolved)
flowcraft/templates/downsample_fastq.py (outdated, resolved)
cimendes and others added 2 commits July 4, 2019 10:53
Co-Authored-By: Diogo Silva <o.diogosilva@gmail.com>
Co-Authored-By: Diogo Silva <o.diogosilva@gmail.com>
@cimendes cimendes merged commit c8a8574 into dev Jul 4, 2019
@cimendes cimendes deleted the DAG_files branch July 4, 2019 10:00
cimendes added a commit that referenced this pull request Sep 16, 2019
* Dag files (#209)

* move DAG JSON files to the resources directory

* added manifest information to the `nextflow.config` file to allow for remote execution (#204) - Partial solve to #194 issue
- Deprecation of the `manifest.config` file

* Set phred encoding when it fails to be determined - trimmomatic (#211)

* fix bug publishdir (downsample_fastq component)

* add phred33 when encoding fails to be determined; if it still fails, retry with phred64 encoding (trimmomatic component)

* Fix downsample (#222)

* edited file names for downsample fastqs
* stringified depth for file name
Labels
bufix engine enhancement New feature or request

4 participants