Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Java Streaming I/O for FASTA Files #1080

Merged
merged 8 commits into from
Feb 23, 2024

Conversation

hilbertglm
Copy link
Contributor

Added a class to use Java streams to process FASTA files.

Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice feature, thanks! A couple of comments below. Apologies for the slow response.

InputStream input;
try {
input = provider.getInputStream(getPath().toFile());
} catch (IOException exception) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to throw the exception so that it can be handled at a higher level?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if stream() doesn't permit a throws in the signature, then I'd advice to use an UncheckedIOException instead of RuntimeException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exceptions would break stream chaining, as the contrived example below shows. I changed to UncheckedIOException. I originally had the exception being thrown, so it is allowed by Java, but it caused me problems when I used it in my code, so I get where you are coming from.

		String file = this.getClass().getResource("PF00104_small.fasta.gz").getFile();
		String file2 = this.getClass().getResource("PF00105_small.fasta.gz").getFile();
		List<Path> paths = List.of(Paths.get(file), Paths.get(file2));
		List<ProteinSequence> sequences = paths
				.stream()
				.flatMap(path -> FastaStreamer.from(path).stream()) // <-- Not allowed if exception
				.collect(Collectors.toList());

* for use in a functional programming paradigm.
*
* @author Gary Murphy
* @since 7.0.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should bump up to 7.1.0 since this introduces a (very nice) new feature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to 7.1.0-SNAPSHOT

closed = true;
reader.close();
} catch (IOException exception) {
throw new RuntimeException(String.format("I/O error reading the FASTA file from '%s'", getPath()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to UncheckedIOException

@@ -85,8 +85,8 @@ public class ScopInstallation implements LocalScopDatabase {
public static final String comFileName = "dir.com.scop.txt_";

// Download locations
public static final String SCOP_DOWNLOAD = "http://scop.berkeley.edu/downloads/parse/";
public static final String SCOP_DOWNLOAD_ALTERNATE = "http://scop.berkeley.edu/downloads/parse/";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these changes were from #1077 . I've merged that one now. Could you merge master in to have a cleaner diff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang. I thought I was. This is my first time dealing with a forked repo. I guess I will use the webapp instead of CLI. (Sigh)

* @param sequence the protein sequence
* @return the sequence
*/
protected ProteinSequence createSequence(String header, ProteinSequence sequence) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header parameter is not used, is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The was in there before I found getOriginalHeader.

@hilbertglm
Copy link
Contributor Author

I am not sure how to fix the diff issue. I did the Sync fork on the GitHub UI. Do I need to create another pull request? Sorry... I am new to opensource contributions.

@josemduarte
Copy link
Contributor

Simply merge the latest master into your branch: you do that locally in your computer, then push to your fork. Then github will automagically pick it up and you will see an updated PR.

Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. One final request from me: could you add a new 7.1.0 - future release section in the changelog and describe this new feature briefly?

@heuermh
Copy link
Member

heuermh commented Jan 30, 2024

This looks pretty cool -- may I ask is it primarily about new syntax/language features, or are there some performance considerations? If the caller is simply going to collect() at the end, it likely doesn't matter.

Copy link
Contributor

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again! I'll try to create a new release in the next days

@hilbertglm
Copy link
Contributor Author

@heuermh It is more to do with liking in the streams programming style. I do all of my alignment testing, homologies, etc. using streams-based processing. I haven't done any benchmarking, but I would expect it to be pretty close to imperative loops.

@josemduarte josemduarte merged commit c672d01 into biojava:master Feb 23, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants