[ADAM-1635] Eliminate passing FASTQ splittable status via config. #1636

Merged
heuermh merged 2 commits into bigdatagenomics:master from fnothaft:issues/1635-no-splittable-fastq-config on Jul 26, 2017

4 participants
Member

fnothaft commented Jul 26, 2017

Resolves #1635. Instead of passing whether a FASTQ file is splittable via config, this change checks whether the compression codec is splittable, which is more reliable. For a .gz file, the BGZFEnhancedGZipCodec handles this edge case by checking the stream type; combined with our explicit check of the stream when picking splits, this ensures that we don't try to create an invalid GZIP split. Additionally, I identified and fixed an error in the old FASTQ code, which did a seek on the uncompressed input stream to backtrack when it saw a line of quality scores beginning with @ while locating the first valid record in a split. Instead, we now check for two successive lines that start with @, which indicates that the first line contains quality scores and the second line contains a read name.
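
The two-successive-`@` rule described above can be illustrated with a minimal Python sketch. This is not ADAM's code (the reader in this PR is a Java source file); the function name `first_record_index` is hypothetical, and the sketch assumes strict four-line FASTQ records (name, sequence, `+`, qualities) whose lines have already been decoded into a list, so it never needs to seek backwards.

```python
def first_record_index(lines):
    """Return the index of the first line that starts a FASTQ record, or None.

    Assumes strict four-line records, so a sequence line never starts
    with '@' and a quality line is always followed by a read name.
    """
    for i, line in enumerate(lines):
        if not line.startswith("@"):
            continue
        # Peek at the following line instead of backtracking in the stream.
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        if nxt is not None and nxt.startswith("@"):
            # Two successive '@' lines: line i holds quality scores,
            # line i + 1 is the read name that starts the next record.
            return i + 1
        # The next line is not '@'-prefixed (or is missing), so line i
        # is treated as a read name.
        return i
    return None
```

For a split that begins on a quality line such as `@IIII` followed by `@read2`, the sketch returns the index of `@read2`; for a split that begins on a read name, it returns that line's own index. A trailing `@` line with nothing after it is treated as a read name here, which is a simplification.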

@fnothaft fnothaft added this to the 0.23.0 milestone Jul 26, 2017

coveralls commented Jul 26, 2017

Coverage Status

Coverage remained the same at 83.961% when pulling e64119b on fnothaft:issues/1635-no-splittable-fastq-config into 7449b14 on bigdatagenomics:master.

AmplabJenkins commented Jul 26, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2281/
Test PASSed.

Member

fnothaft commented Jul 26, 2017

Pushed a commit addressing reviewer comments.

BTW @heuermh do you think it would be worthwhile to add something to our CI that would flag any tabs in our source and fail the build? I would've missed those if you hadn't caught them.

coveralls commented Jul 26, 2017

Coverage Status

Coverage remained the same at 83.961% when pulling a78b510 on fnothaft:issues/1635-no-splittable-fastq-config into 7449b14 on bigdatagenomics:master.

AmplabJenkins commented Jul 26, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2284/
Test PASSed.

Member

heuermh commented Jul 26, 2017

do you think it would be worthwhile to add something to our CI that would flag any tabs in our source and fail the build? I would've missed those if you hadn't caught them.

We have a linter that runs on the Scala source; this made it through because it was a Java source file. I don't think we can put a CI check on the whole repo because some of our test resources require tab characters.

@heuermh heuermh merged commit c8a2202 into bigdatagenomics:master Jul 26, 2017

3 checks passed

codacy/pr Good work! A positive pull request.
coverage/coveralls Coverage remained the same at 83.961%
default Merged build finished.


Member

heuermh commented Jul 26, 2017

Thank you, @fnothaft

@fnothaft fnothaft deleted the fnothaft:issues/1635-no-splittable-fastq-config branch Jul 26, 2017

Member

heuermh commented Jul 26, 2017

Sorry, wrong button, I should've squashed.

Member

fnothaft commented Jul 26, 2017

We have a linter that runs on the Scala source; this made it through because it was a Java source file. I don't think we can put a CI check on the whole repo because some of our test resources require tab characters.

I mean, sure, but we could do something like:

find adam-*/src -name "*.java" -exec ./scripts/failIfHasTabs.sh {} \;
find adam-*/src -name "*.R" -exec ./scripts/failIfHasTabs.sh {} \;
find adam-*/src -name "*.py" -exec ./scripts/failIfHasTabs.sh {} \;
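
The contents of `failIfHasTabs.sh` are not shown in this thread, so the following is only a sketch of the kind of check being proposed, written in Python rather than shell. The names `files_with_tabs` and `main` are hypothetical, and the suffix set is taken from the three `find` commands above. One caveat worth stating: with the `-exec ... \;` form, `find` uses the command's exit status only as the truth value of the `-exec` test and does not propagate it to its own exit status, so a CI step built on those commands may need extra handling to actually fail the build.

```python
import sys
from pathlib import Path

# Suffixes taken from the find commands in the comment above.
SUFFIXES = {".java", ".R", ".py"}


def files_with_tabs(root, suffixes=SUFFIXES):
    """Return sorted paths under root with a matching suffix that contain a tab."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes and b"\t" in path.read_bytes():
            offenders.append(path)
    return offenders


def main(roots):
    """Report offending files on stderr; return 1 if any were found, else 0."""
    bad = [p for root in roots for p in files_with_tabs(root)]
    for p in bad:
        print(f"tab character found in {p}", file=sys.stderr)
    return 1 if bad else 0
```

A CI step could call `sys.exit(main(sys.argv[1:]))` with the source directories as arguments. Because only the listed suffixes are scanned, test resources that legitimately need tab characters are left alone.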

Member

heuermh commented Jul 26, 2017

+1, add *.pom, *.sh

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
