-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#3 EMBL support #8
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments from peer e-mail review
private EncodingScheme encodingSheme; | ||
|
||
/** | ||
* Initialize a new FastaSequenceFactory with default encoding scheme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refers to wrong type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
} | ||
|
||
/** | ||
* Initialize a new FastaSequenceFactory with a custom encoding scheme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refers to wrong type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
/** | ||
* Initialize a new FastaSequenceFactory with default encoding scheme | ||
*/ | ||
public EmblSequenceFactory() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be implemented in terms of single arg constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
this.file.close(); | ||
} | ||
|
||
private void dropUntil(String flag) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level of overlap between the Embl/Fasta/FastQ implementations. Refactor overlap into a shared dependency instead of duplicating code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored out into BufferedFileStreamReader class to provide help for performing buffered reads from files.
@Override | ||
public boolean hasNext() { | ||
// Dump whitespace | ||
dropUntil(x -> !Character.isWhitespace(x)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lambda's have a performance cost. Make dropUntilWhitespace() and dropUntilNotWhitespace() functions
* @author John | ||
* | ||
*/ | ||
public class EmblSequenceStreamReader implements SequenceStreamReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Significant overlap with Fasta/Fastq implementations. Consider condensing to a single SequenceStreamReader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed - Using only a single SequenceStreamReader now
Move FASTQ test data to FastqData Move FASTA test data to FastaData Cleanup StringFileSreamReader tests Cleanup file names Cleanup SequenceStreamWriter tests
f0d2b0b
to
247d954
Compare
*/ | ||
public class EmblSequenceFactory implements SequenceFactory { | ||
|
||
private EncodingScheme encodingSheme; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be encodingScheme
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall good improvement on the architecture. A couple of small comments included where improvements might be made (or not).
/** | ||
* Ignore characters in the buffer while they are whitespace | ||
*/ | ||
public void dropWhileWhitespace() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the predicate version? Efficiency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
* Ignore characters in the buffer until they are whitespace | ||
*/ | ||
public void dropUntilWhiteSpace() { | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like dropWhileWhitespace this could be implemented using the predicate function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I do find it a little hard to differentiate dropUntil(x -> !Character.isWhitespace(x)) and dropUntil(x-> Character.isWhitespace(x)) when scanning the code as well. You can get around this using dropUntil(Character::isWhitespace) instead to differentiate it from the ! case.
I found that dropping whitespace, taking until whitespace, or dropping until whitespace were just very common actions so moving them into full functions makes it a bit easier and lambdas have a performance hit over direct reference.
index++; | ||
} | ||
|
||
return sb.toString().trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that the resulting string is trimmed isn't evident from the comments on this function, although I imagine there's a reason for the result to be trimmed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really shouldn't. It was apparently there due to windows having \r\n instead of just \n. I was able to remove the trim 👍
Remove large test files from repository
ca2e920
to
1d33c7b
Compare
Add support for EMBL file format, cleanup test code for reading and writing file formats