-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Java Streaming I/O for FASTA Files #1080
Conversation
…of the URL path to reflect a change in the v2 naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice feature, thanks! A couple of comments below. Apologies for the slow response.
InputStream input; | ||
try { | ||
input = provider.getInputStream(getPath().toFile()); | ||
} catch (IOException exception) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to throw the exception so that it can be handled at a higher level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if stream()
doesn't permit a throws in the signature, then I'd advice to use an UncheckedIOException
instead of RuntimeException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exceptions would break stream chaining, as the contrived example below shows. I changed to UncheckedIOException
. I originally had the exception being thrown, so it is allowed by Java, but it caused me problems when I used it in my code, so I get where you are coming from.
String file = this.getClass().getResource("PF00104_small.fasta.gz").getFile();
String file2 = this.getClass().getResource("PF00105_small.fasta.gz").getFile();
List<Path> paths = List.of(Paths.get(file), Paths.get(file2));
List<ProteinSequence> sequences = paths
.stream()
.flatMap(path -> FastaStreamer.from(path).stream()) // <-- Not allowed if exception
.collect(Collectors.toList());
* for use in a functional programming paradigm. | ||
* | ||
* @author Gary Murphy | ||
* @since 7.0.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should bump up to 7.1.0 since this introduces a (very nice) new feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed to 7.1.0-SNAPSHOT
closed = true; | ||
reader.close(); | ||
} catch (IOException exception) { | ||
throw new RuntimeException(String.format("I/O error reading the FASTA file from '%s'", getPath())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to UncheckedIOException
@@ -85,8 +85,8 @@ public class ScopInstallation implements LocalScopDatabase { | |||
public static final String comFileName = "dir.com.scop.txt_"; | |||
|
|||
// Download locations | |||
public static final String SCOP_DOWNLOAD = "http://scop.berkeley.edu/downloads/parse/"; | |||
public static final String SCOP_DOWNLOAD_ALTERNATE = "http://scop.berkeley.edu/downloads/parse/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these changes were from #1077 . I've merged that one now. Could you merge master in to have a cleaner diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dang. I thought I was. This is my first time dealing with a forked repo. I guess I will use the webapp instead of CLI. (Sigh)
* @param sequence the protein sequence | ||
* @return the sequence | ||
*/ | ||
protected ProteinSequence createSequence(String header, ProteinSequence sequence) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header
parameter is not used, is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. The was in there before I found getOriginalHeader
.
I am not sure how to fix the diff issue. I did the |
Simply merge the latest master into your branch: you do that locally in your computer, then push to your fork. Then github will automagically pick it up and you will see an updated PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. One final request from me: could you add a new 7.1.0 - future release
section in the changelog and describe this new feature briefly?
This looks pretty cool -- may I ask is it primarily about new syntax/language features, or are there some performance considerations? If the caller is simply going to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again! I'll try to create a new release in the next days
@heuermh It is more to do with liking in the streams programming style. I do all of my alignment testing, homologies, etc. using streams-based processing. I haven't done any benchmarking, but I would expect it to be pretty close to imperative loops. |
Added a class to use Java streams to process FASTA files.