New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add warning to CheckIlluminaDir if we detect cycles without data. #874
Conversation
CBCLReader will ignore and log a warning for tiles without data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usual nitpick comments but 👍 overall.
Would be nice to have a unit test for this.
reader.getAllTiles().forEach((key, value) -> { | ||
final List<File> fileForCycle = reader.getFilesForCycle(key); | ||
final long totalFilesSize = fileForCycle.stream().mapToLong(file -> file.length() - reader.getHeaderSize()).sum(); | ||
final long expectedFileSize = value.stream().mapToLong(BaseBclReader.TileData::getCompressedBlockSize).sum(); | ||
final Stream<BaseBclReader.TileData> emptyCycles = value.stream().filter((cycle) -> cycle.getCompressedBlockSize() <= 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need (
)
around cycle
. Also, the magic number 2
should have a constant or a comment or both.
reader.getAllTiles().forEach((key, value) -> { | ||
final List<File> fileForCycle = reader.getFilesForCycle(key); | ||
final long totalFilesSize = fileForCycle.stream().mapToLong(file -> file.length() - reader.getHeaderSize()).sum(); | ||
final long expectedFileSize = value.stream().mapToLong(BaseBclReader.TileData::getCompressedBlockSize).sum(); | ||
final Stream<BaseBclReader.TileData> emptyCycles = value.stream().filter((cycle) -> cycle.getCompressedBlockSize() <= 2); | ||
|
||
String emptyCycleString = emptyCycles.map(tile -> String.valueOf(tile.getTileNum())).collect(Collectors.joining(", ")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also write
String emptyCycleString = value.stream()
.filter(cycle -> cycle.getCompressedBlockSize() <= 2)
.map(BaseBclReader.TileData::getTileNum)
.map(Object::toString)
.collect(Collectors.joining(", "));
|
||
String emptyCycleString = emptyCycles.map(tile -> String.valueOf(tile.getTileNum())).collect(Collectors.joining(", ")); | ||
|
||
if (emptyCycleString.length() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be clearer IMO to add the new variables and check before we compute filesForCycle
, totalFilesSize
, expectedFileSize
since they are only used by the if
on line 185 below. That way it's obvious that the new code doesn't rely on these variables.
if (read == 0) break; | ||
totalRead += read; | ||
//only decompress the data if we are expecting data. | ||
if (uncompressedByteArray.length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really long method. It would be easier to understand if it was split into smaller methods.
int totalRead = 0; | ||
try (GZIPInputStream gzipInputStream = new GZIPInputStream(byteInputStream, uncompressedByteArray.length)) { | ||
while ((read = gzipInputStream.read(uncompressedByteArray, totalRead, uncompressedByteArray.length - totalRead)) != -1) { | ||
if (read == 0) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need {
}
here
(totalCycleCount + 1), this.streamFiles[totalCycleCount].getAbsolutePath())); | ||
} | ||
} else { | ||
log.warn("Ignoring tile " + tileData.tileNum + " there are no PF reads."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was in a method, you could do the negative case first, with a return
, which IMO would be clearer.
Just to be clear @jacarey, before this fix the tool would error on no data tiles? |
@sooheelee No there was just no warning. We didn't realize you could have tiles defined without data. This is a valid state, but one that seemed relevant to anyone running this program so we output a warning. |
Ok, thanks @jacarey. I will correct the updated release notes. |
CBCLReader will ignore and log a warning for tiles without data.
This fixes an issue where you have a tile definition in the cbcl but the tile has no data to decompress.
In addition we added some warning messages when we encounter a tile with no data.
Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review
For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests