Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a new Walker abstract class to be used for all built-in walkers. #4964

Merged
merged 4 commits into from Apr 16, 2019

Conversation

jonn-smith
Copy link
Collaborator

Added the following methods to GATKTool:

  • getReferenceDataSource()
  • getReadsDataSource()
  • getFeatureManager()

Walker inherits directly from GATKTool and overrides these methods to throw an exception if they are called.

No walker should need to directly access the data.

@codecov-io
Copy link

codecov-io commented Jun 28, 2018

Codecov Report

Merging #4964 into master will decrease coverage by 6.734%.
The diff coverage is 47.826%.

@@               Coverage Diff               @@
##              master     #4964       +/-   ##
===============================================
- Coverage     86.832%   80.097%   -6.734%     
+ Complexity     32291     30641     -1650     
===============================================
  Files           1990      1991        +1     
  Lines         149107    149120       +13     
  Branches       16477     16477               
===============================================
- Hits          129472    119441    -10031     
- Misses         13622     23872    +10250     
+ Partials        6013      5807      -206
Impacted Files Coverage Δ Complexity Δ
...rg/broadinstitute/hellbender/tools/CountReads.java 100% <ø> (ø) 3 <0> (ø) ⬇️
...e/hellbender/engine/MultiplePassVariantWalker.java 95.238% <ø> (ø) 6 <0> (ø) ⬇️
...institute/hellbender/engine/TwoPassReadWalker.java 90% <ø> (ø) 4 <0> (ø) ⬇️
...nstitute/hellbender/engine/MultiVariantWalker.java 100% <ø> (ø) 16 <0> (ø) ⬇️
...ender/engine/MultiVariantWalkerGroupedOnStart.java 96.875% <ø> (ø) 20 <0> (ø) ⬇️
...nstitute/hellbender/engine/VariantLocusWalker.java 89.091% <ø> (ø) 17 <0> (ø) ⬇️
...roadinstitute/hellbender/engine/VariantWalker.java 93.333% <ø> (ø) 12 <0> (ø) ⬇️
...org/broadinstitute/hellbender/engine/GATKTool.java 87.556% <0%> (-3.648%) 102 <0> (ø)
...g/broadinstitute/hellbender/engine/ReadWalker.java 100% <100%> (ø) 15 <1> (ø) ⬇️
...e/hellbender/engine/AbstractConcordanceWalker.java 89.412% <100%> (ø) 13 <1> (ø) ⬇️
... and 179 more

Copy link
Contributor

@magicDGS magicDGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I known that @droazen asked for this in the Funcotator PR, but I would like to give my opition about this before: I think that it should not be recommended to overwrite, but it will be nice that a Walker class can retrieve the reads/reference/features. For example, aLocusWalker could access in the initialization the referenceto compute some statistic that will be use in each locus afterwards.

Otherwise, just one comment.


@Override
final protected ReferenceDataSource getReferenceDataSource() {
throw new GATKException("Should never access ReferenceDataSource in child classes of AssemblyRegionWalker.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AssemblyRegionWalker should be Walker - here and below!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Fixed!

@magicDGS
Copy link
Contributor

Sorry for the multiple edits, something was wrong with my keyboard and it was submitting without my action.

@droazen
Copy link
Collaborator

droazen commented Jun 29, 2018

@magicDGS The problem with exposing the datasources to walkers is that they would be able to invalidate the entire traversal. For example, a ReadWalker could alter the traversal intervals on the reads datasource mid-way through traversal from within apply(), or it could cause the reads iterator used by the engine to get closed by issuing a separate iterator() call on the datasource, which would cause the rest of the traversal to fail.

This is why I feel strongly that the datasource objects should not be directly accessible to walker-based tools. Note that it's still possible for walkers to create their own, separate datasources without reaching into the ones used by the engine, or a tool author can extend GATKTool directly rather than one of the walker base classes and have the freedom to access everything (which was not possible before this PR).

@magicDGS
Copy link
Contributor

I am aware that those methods should be definetively implemented in the abstract class - but it could be recommended NOT TO OVERRIDE unless you know what you are doing. I know the problems of having them exposed, but also I can see some potential to be accessible from an implemented walker when the user knows what is doing...

Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to @jonn-smith with my comments.

@@ -79,7 +78,7 @@ private void initializeDrivingFeatures() {
* Subclasses can override to provide their own behavior but default implementation should be suitable for most uses.
*/
@Override
public void traverse() {
public final void traverse() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This final conflicts with the "Subclasses can override to provide their own behavior" comment above. We typically don't make traverse() final to allow for specializations of walker traversals (like TwoPassVariantWalker overriding traverse() from VariantWalker). I think the comment should be modified to read:

"You should only override traverse() if you are writing a new walker base class in the engine package that extends this class. It is not meant to be overridden by tools outside of the engine package."

While you're at it, can you make the same change to the comments for traverse() in the other walker base classes (and remove the finals).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -56,7 +54,7 @@ protected final void onStartup() {
}

@Override
public void traverse() {
public final void traverse() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed above, don't make this final, but do add the comment "You should only override traverse() if you are writing a new walker base class in the engine package that extends this class. It is not meant to be overridden by tools outside of the engine package."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -146,7 +146,7 @@ protected final void onStartup() {
* and including deletions only if {@link #includeDeletions()} returns {@code true}.
*/
@Override
public void traverse() {
public final void traverse() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed above, don't make this final, but do modify the comment to read: "You should only override traverse() if you are writing a new walker base class in the engine package that extends this class. It is not meant to be overridden by tools outside of the engine package."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -29,7 +24,7 @@
* ReadWalker authors must implement the apply() method to process each read, and may optionally implement
* onTraversalStart() and/or onTraversalSuccess(). See the PrintReadsWithReference walker for an example.
*/
public abstract class ReadWalker extends GATKTool {
public abstract class ReadWalker extends Walker {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update docs for ReadWalker.traverse() to read "You should only override traverse() if you are writing a new walker base class in the engine package that extends this class. It is not meant to be overridden by tools outside of the engine package." instead of the current "Subclasses can override to provide own behavior"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -43,7 +39,7 @@
* Internally, the reads are loaded in chunks called read shards, which are then subdivided into active/inactive regions
* for processing by the tool implementation. One read shard is created per contig.
*/
public abstract class AssemblyRegionWalker extends GATKTool {
public abstract class AssemblyRegionWalker extends Walker {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update docs for AssemblyRegionWalker.traverse() to read "You should only override traverse() if you are writing a new walker base class in the engine package that extends this class. It is not meant to be overridden by tools outside of the engine package." instead of the current "Subclasses can override to provide own behavior"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


@Override
final protected ReadsDataSource getReadsDataSource() {
throw new GATKException("Should never access ReadsDataSource in child classes of Walker.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to: "Walker tools outside of the engine package should not directly access the engine reads datasource. They should get their reads data via apply() instead. Tools that need direct datasource access (eg., to implement custom traversal patterns) should extend GATKTool directly rather than a walker class, or introduce a new walker base class for the new traversal."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


@Override
final protected FeatureManager getFeatureManager() {
throw new GATKException("Should never access FeatureManager in child classes of Walker.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to: "Walker tools outside of the engine package should not directly access the engine feature manager. They should get their feature data via apply() instead. Tools that need direct datasource access (eg., to implement custom traversal patterns) should extend GATKTool directly rather than a walker class, or introduce a new walker base class for the new traversal."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

*/
public abstract class Walker extends GATKTool {

@Override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add method-level javadoc comment stating that we are overriding this method to prevent walker tool implementations outside of the engine package from directly accessing the engine datasources, since walker tools should get their data via apply() instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

throw new GATKException("Should never access ReferenceDataSource in child classes of Walker.");
}

@Override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add method-level javadoc comment stating that we are overriding this method to prevent walker tool implementations outside of the engine package from directly accessing the engine datasources, since walker tools should get their data via apply() instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

throw new GATKException("Should never access ReadsDataSource in child classes of Walker.");
}

@Override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add method-level javadoc comment stating that we are overriding this method to prevent walker tool implementations outside of the engine package from directly accessing the engine datasources, since walker tools should get their data via apply() instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@droazen droazen assigned jonn-smith and unassigned droazen and lbergelson Feb 8, 2019
* An abstract class to represent built-in walkers that inherit directly from {@link GATKTool}.
* Created by jonn on 6/28/18.
*/
public abstract class Walker extends GATKTool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to rename this to WalkerBase or AbstractWalker. Walker sounds like a concrete class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WalkerBase it is!

@jonn-smith jonn-smith assigned droazen and unassigned jonn-smith Apr 2, 2019
Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed some final tweaks to the method-level docs into this branch, and will merge once tests pass.

@droazen droazen merged commit 9fce0b2 into master Apr 16, 2019
@droazen droazen deleted the jts_walker_abstract_class branch April 16, 2019 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants