#3 EMBL support #8

Merged
merged 2 commits into from
Oct 29, 2016

Conversation

JMBattista
Owner

Add support for EMBL file format, cleanup test code for reading and writing file formats

@JMBattista JMBattista mentioned this pull request Oct 16, 2016
Owner Author

@JMBattista JMBattista left a comment

Comments from peer e-mail review

private EncodingScheme encodingSheme;

/**
* Initialize a new FastaSequenceFactory with default encoding scheme
Owner Author

Refers to wrong type

Owner Author

Fixed

}

/**
* Initialize a new FastaSequenceFactory with a custom encoding scheme
Owner Author

Refers to wrong type

Owner Author

Fixed

/**
* Initialize a new FastaSequenceFactory with default encoding scheme
*/
public EmblSequenceFactory() {
Owner Author

Can be implemented in terms of single arg constructor

Owner Author

Fixed
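
The constructor suggestion above can be sketched as follows. This is only an illustration, not the code merged in this pull request: `DefaultEncodingScheme` and the `encodingScheme()` accessor are hypothetical, the `EncodingScheme` interface is a stand-in, and the `SequenceFactory` interface is omitted to keep the sketch self-contained.

```java
// Stand-in for the project's EncodingScheme type (assumption: its real shape is not shown in the review).
interface EncodingScheme {}

// Hypothetical default scheme, named here only for illustration.
final class DefaultEncodingScheme implements EncodingScheme {}

class EmblSequenceFactory {
    private final EncodingScheme encodingScheme;

    /** Initialize a new EmblSequenceFactory with the default encoding scheme. */
    EmblSequenceFactory() {
        // Delegate to the single-argument constructor instead of repeating its body.
        this(new DefaultEncodingScheme());
    }

    /** Initialize a new EmblSequenceFactory with a custom encoding scheme. */
    EmblSequenceFactory(EncodingScheme encodingScheme) {
        this.encodingScheme = encodingScheme;
    }

    // Hypothetical accessor, added so the sketch can be checked.
    EncodingScheme encodingScheme() {
        return encodingScheme;
    }
}
```

With this shape, any validation added later to the single-argument constructor applies to the default path as well.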

this.file.close();
}

private void dropUntil(String flag) {
Owner Author

There is a high level of overlap between the Embl/Fasta/FastQ implementations. Refactor the overlap into a shared dependency instead of duplicating code.

Owner Author

Refactored the shared logic out into a BufferedFileStreamReader class that helps with performing buffered reads from files.

@Override
public boolean hasNext() {
// Dump whitespace
dropUntil(x -> !Character.isWhitespace(x));
Owner Author

Lambdas have a performance cost. Make dedicated dropUntilWhitespace() and dropUntilNotWhitespace() functions instead.

* @author John
*
*/
public class EmblSequenceStreamReader implements SequenceStreamReader {
Owner Author

Significant overlap with Fasta/Fastq implementations. Consider condensing to a single SequenceStreamReader

Owner Author

Fixed - Using only a single SequenceStreamReader now

Move FASTQ test data to FastqData

Move FASTA test data to FastaData

Cleanup StringFileStreamReader tests

Cleanup file names

Cleanup SequenceStreamWriter tests
*/
public class EmblSequenceFactory implements SequenceFactory {

private EncodingScheme encodingSheme;
Collaborator

Should be encodingScheme

Owner Author

fixed

Collaborator

@ctdavids ctdavids left a comment

Overall good improvement on the architecture. A couple of small comments included where improvements might be made (or not).

/**
* Ignore characters in the buffer while they are whitespace
*/
public void dropWhileWhitespace() {
Collaborator

Why not use the predicate version? Efficiency?

Owner Author

yep

* Ignore characters in the buffer until they are whitespace
*/
public void dropUntilWhiteSpace() {
while (true) {
Collaborator

Like dropWhileWhitespace this could be implemented using the predicate function

Owner Author

Yep. I also find it a little hard to tell dropUntil(x -> !Character.isWhitespace(x)) and dropUntil(x -> Character.isWhitespace(x)) apart when scanning the code. You can get around this by writing dropUntil(Character::isWhitespace), which stands out from the ! case.

I found that dropping whitespace, taking until whitespace, and dropping until whitespace were very common actions, so moving them into full functions makes the calling code a bit easier to read. Lambdas also have a performance hit compared to a direct call.
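
A minimal sketch of the two styles discussed here, side by side. `CharBufferReader` and `position()` are hypothetical names that read from an in-memory string; the actual BufferedFileStreamReader reads from a file and may differ. The sketch makes no claim about the size of the performance difference, which was not measured in this thread.

```java
import java.util.function.IntPredicate;

// Hypothetical in-memory stand-in for the buffered reader discussed in the review.
class CharBufferReader {
    private final char[] buffer;
    private int index = 0;

    CharBufferReader(String text) {
        this.buffer = text.toCharArray();
    }

    /** Generic form: ignore characters until the predicate matches one. */
    void dropUntil(IntPredicate stop) {
        while (index < buffer.length && !stop.test(buffer[index])) {
            index++;
        }
    }

    /** Dedicated form: ignore characters in the buffer while they are whitespace. */
    void dropWhileWhitespace() {
        while (index < buffer.length && Character.isWhitespace(buffer[index])) {
            index++;
        }
    }

    /** Dedicated form: ignore characters in the buffer until they are whitespace. */
    void dropUntilWhiteSpace() {
        while (index < buffer.length && !Character.isWhitespace(buffer[index])) {
            index++;
        }
    }

    /** Current read position, exposed so the sketch can be checked. */
    int position() {
        return index;
    }
}
```

Calling `dropWhileWhitespace()` is equivalent to `dropUntil(x -> !Character.isWhitespace(x))`, and `dropUntilWhiteSpace()` is equivalent to `dropUntil(Character::isWhitespace)`; the named methods simply make the intent visible at the call site.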

index++;
}

return sb.toString().trim();
Collaborator

The fact that the resulting string is trimmed isn't evident from the comments on this function, although I imagine there's a reason for the result to be trimmed.

Owner Author

It really shouldn't be. The trim was apparently there because Windows uses \r\n instead of just \n. I was able to remove the trim 👍
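
One way to handle both line endings without trimming the result is sketched below. This is an assumption about a possible approach, not the change actually made in this pull request; `LineEndings` and `readLine` are hypothetical names.

```java
// Hypothetical helper illustrating explicit handling of "\n" and "\r\n".
class LineEndings {
    /**
     * Return the line of text beginning at start, excluding its terminator.
     * Only a trailing "\n" or "\r\n" is removed; other whitespace is preserved.
     */
    static String readLine(String text, int start) {
        int end = text.indexOf('\n', start);
        if (end < 0) {
            end = text.length(); // last line has no terminator
        }
        if (end > start && text.charAt(end - 1) == '\r') {
            end--; // drop the carriage return of a Windows line ending
        }
        return text.substring(start, end);
    }
}
```

Unlike `trim()`, this keeps leading and trailing spaces that belong to the line itself, so the function's behaviour matches what its comment describes.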

Remove large test files from repository