-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow explicit index file paths for inputs to CrosscheckFingerprints #1823
Conversation
…inputs; added some tests for this new functionality.
Here is the update to CrosscheckFingerprints as promised. I would appreciate a review from anyone who has time to take a look, especially any number of: @droazen, @yfarjoun, @meganshand, @kachulis, @lbergelson. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rickymagner!
Added another commit to allow for an I'm working on one more test to make sure when this option is on and inputs don't have a valid index, then the right exception should be thrown. |
In the meantime, I refactored the SAM file parsing a bit at this line. Previously this had a hot fix to address this issue. It would be nice if someone with a bit more knowledge about |
Added the last test as promised above for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much cleaner and thank you for adding the additional tests!
private SamReader getSamReader(final Path samFile, final Path indexPath, final boolean forceIndex) { | ||
final SamInputResource samResource = SamInputResource.of(samFile); | ||
if (indexPath != null) { | ||
samResource.index(indexPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
samResource.index(indexPath); | |
samResource.index(indexPath, seekableChannelFunction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just implemented this change and checked this should be equivalent to the old code. I left a comment pointing to the relevant issue in case someone tries to delete the seekableChannelFunction
wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of small things, other wise looks good. Thanks @rickymagner!
@@ -672,27 +684,19 @@ public Map<FingerprintIdDetails, Fingerprint> fingerprintFiles(final Collection< | |||
final Map<FingerprintIdDetails, Fingerprint> oneFileFingerprints; | |||
log.debug("Processed file: " + p.toUri().toString() + " (" + filesRead.get() + ")"); | |||
|
|||
// Determine whether valid index path given for this file | |||
final Path indexPath = (indexPathMap != null && indexPathMap.containsKey(p)) ? indexPathMap.get(p) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to check that p is in indexPathMap. indexPathMap.get(p)
will return null if it can't find p in the map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, removed the second boolean expression for the ternary check
src/test/java/picard/fingerprint/CrosscheckFingerprintsTest.java
Outdated
Show resolved
Hide resolved
src/test/java/picard/fingerprint/CrosscheckFingerprintsTest.java
Outdated
Show resolved
Hide resolved
src/test/java/picard/fingerprint/CrosscheckFingerprintsTest.java
Outdated
Show resolved
Hide resolved
src/test/java/picard/fingerprint/CrosscheckFingerprintsTest.java
Outdated
Show resolved
Hide resolved
public Map<String, Fingerprint> loadFingerprints(final Path fingerprintFile, final String specificSample) { | ||
final VCFFileReader reader = new VCFFileReader(fingerprintFile, false); | ||
public Map<String, Fingerprint> loadFingerprints(final Path fingerprintFile, final Path indexPath, final boolean forceIndex, final String specificSample) { | ||
final VCFFileReader reader = getVCFReader(fingerprintFile, indexPath, forceIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about
final VCFFileReader reader = new VCFFileReader(fingerprintFile, indexPath, forceIndex)
it looks like that constructor should handle a null indexPath correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually causes a bunch of tests to fail (apparently due to some null pointer exception). Although it seems like the method accepts null
pointers, it looks like htsjdk handles that by assuming the index is next to the input file with extension .tbi
which many of our test files don't have. I think it might be safer to leave it as is for now. This also allows for the custom PicardException to display when the user forces index file paths but they cannot be found.
src/test/java/picard/fingerprint/CrosscheckFingerprintsTest.java
Outdated
Show resolved
Hide resolved
final SamInputResource samResource = SamInputResource.of(samFile); | ||
if (indexPath != null) { | ||
// Use of seekableChannelFunction here avoids issue: https://github.com/broadinstitute/picard/issues/1175 | ||
samResource.index(indexPath, seekableChannelFunction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
talked about this offline, but it looks like when using a SamInputResource, you need to explicitly give it an index if you want one to be used. So you need to find the index if you haven't been given it explicitly. See, as an example, the SamReader open method which takes a Path. Would also be good to add a test to confirm that the index is used when it is available if not explicitly given as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Fixed and also added a test that should address this.
…g index when nearby but not provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rickymagner Sorry for the delay. I have some comments. The ones about the UI you can ignore if you feel like this is the cleanest way, but I think having a single input that can have a 3rd column with matched indexes like what David added recently in GATK (and we can maybe steal the loading code for) would be clearer.
I also am not sure I understand why you need 2 force index arguments, it seems extremely error prone. I will 100% forget to force the second set of indexes every time.
src/test/java/picard/fingerprint/CrosscheckFingerprintsTest.java
Outdated
Show resolved
Hide resolved
src/test/java/picard/fingerprint/CrosscheckFingerprintsTest.java
Outdated
Show resolved
Hide resolved
@rickymagner Thanks for addressing those. Some comments are more aspirational than practical. Looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @rickymagner!
Description
This PR addresses an issue regarding data storage solutions where indexed files can be separated from their indices. CrosscheckFingerprints will now allow for an
INPUT_INDEX_MAP
tsv file with two columns: the first being the list of input files, and the latter being the corresponding index file locations. One can also do the same for theSECOND_INPUT
arguments usingSECOND_INPUT_INDEX_MAP
.Some tests have been added to test this functionality. This involved copying some test files into a separate dir from index files which were generated in a different directory. The test generates the
INPUT_INDEX_MAP
file on the fly so as to not rely on any absolute file paths.Although the tool still runs if an index is not provided, a new warning will be written if providing an index map without mapping each input to an index. If an index file path is typed incorrectly in the map, an error should be thrown by some dependency classes about not being able to import a valid index from the given path.
Checklist
Content
Review