Skip to content

Conversation

@tiagofilipe12
Copy link
Collaborator

This pr adds unique identifiers to processes before parsing the pipeline to allow forking processes with the same name, which occurred when source process for new fork has the same exact name for multiple branches of a previous fork.

bfrgoncalves and others added 2 commits October 4, 2018 02:35
Co-authored-by: tiagofilipe12 <tiagofilipe12@gmail.com>
Co-authored-by: bfrgoncalves <bfgoncalves@medicina.ulisboa.pt>
@codecov-io
Copy link

codecov-io commented Oct 4, 2018

Codecov Report

Merging #141 into dev will increase coverage by 0.45%.
The diff coverage is 95.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #141      +/-   ##
==========================================
+ Coverage   38.46%   38.91%   +0.45%     
==========================================
  Files          59       59              
  Lines        5590     5635      +45     
==========================================
+ Hits         2150     2193      +43     
- Misses       3440     3442       +2
Impacted Files Coverage Δ
flowcraft/tests/test_pipeline_parser.py 100% <100%> (ø) ⬆️
flowcraft/generator/pipeline_parser.py 95.56% <94.11%> (-0.39%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d3fbf28...07bba78. Read the comment docs.

@tiagofilipe12
Copy link
Collaborator Author

tiagofilipe12 commented Oct 4, 2018

Here is a previous example of the current implementation:

test2

And here is how it looks now:

test1

This was the test pipeline string used in the above example:

reads_download ( downsample_fastq fastqc_trimmomatic fastqc check_coverage (spades={'memory':'\'25GB\    ''} process_spades pilon mlst | skesa={'memory':'\'11GB\''} process_skesa pilon ( mlst | abricate ) |     spades={'container':'odiogosilva/spades-conda','version':'3.7.0','memory':'\'25GB\''} process_spades     pilon mlst) | downsample_fastq fastqc_trimmomatic fastqc check_coverage (spades={'memory':'\'25GB\''    } process_spades pilon mlst | skesa={'memory':'\'11GB\''} process_skesa pilon ( mlst | abricate ) | s    pades={'container':'odiogosilva/spades-conda','version':'3.7.0','memory':'\'25GB\''} process_spades p    ilon mlst))

@ODiogoSilva
Copy link
Collaborator

Great job guys! I think this will solve pretty much all issues with pipeline forks. For me, adding unique identifiers to allow the correct identification of source lanes also seemed like the best approach.


identifiers_to_tags = {}
"""
Dictionary to match new process names (identifiers) with original process
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using the : format, as in dict: Matches new process ...


new_process_names = []
"""
List of new process names used to replace in the pipeline string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before


# force to add a space between each token so that regex modification can
# be applied
find = r'[)(|]+'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the constants FORK_TOKEN, LANE_TOKEN and CLOSE_TOKEN instead of literals.

# escape characters are required to match to the dict keys
# (identifiers_to_tags), since python keys with escape characters
# must be escaped
find = r'{}[^_]'.format(val).replace("\\", "\\\\")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't usually like regex, but this is very nice. Great job!

tiagofilipe12 and others added 3 commits October 4, 2018 14:21
Co-authored-by: tiagofilipe12 <tiagofilipe12@gmail.com>
Co-authored-by: bfrgoncalves <bfgoncalves@medicina.ulisboa.pt>
Co-authored-by: tiagofilipe12 <tiagofilipe12@gmail.com>
Co-authored-by: bfrgoncalves <bfgoncalves@medicina.ulisboa.pt>
@tiagofilipe12 tiagofilipe12 merged commit 540bd21 into dev Oct 4, 2018
@tiagofilipe12 tiagofilipe12 deleted the unique_process_identifiers branch October 4, 2018 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants