New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve S3 Bitstream Storage to Lazy download object from S3 #9477
base: main
Are you sure you want to change the base?
Conversation
… and cause - The package org.w3c.dom is accessible from more than one module: <unnamed>, java.xml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abollini : I haven't tested this yet, but the code looks good overall. A few minor code suggestions inline below.
this.bufferSize = bufferSize; | ||
} | ||
|
||
public class S3LazyInputStream extends InputStream { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add JavaDocs to describe this imbedded class?
return byteRead; | ||
} | ||
|
||
private void downloadChunk() throws IOException, FileNotFoundException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alignment of this method is off. Can we also add a basic description to this class to describe what it's doing? This seems to be the main method.
downloadChunk(); | ||
} | ||
|
||
int byteRead = currPos < endOfChunk ? currentChunkStream.read() : -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code alignment it odd here. It's indented more than normal
dspace-api/src/main/java/org/dspace/storage/bitstore/S3BitStoreService.java
Show resolved
Hide resolved
<groupId>xml-apis</groupId> | ||
<artifactId>xml-apis</artifactId> | ||
</exclusion> | ||
</exclusions> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to exclude this? How is this related to the issue this PR is solving?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pnbecker : This change is unrelated to the bug, but it is described above in the description of the PR:
Please note that this PR contains also an unrelated commit
2fd8680
that is needed to allow to use a recent version of Eclipse to build DSpace. Without this commit eclipse complains about The package org.w3c.dom is accessible from more than one module: , java.xml
<groupId>xml-apis</groupId> | ||
<artifactId>xml-apis</artifactId> | ||
</exclusion> | ||
</exclusions> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment at the other pom.xml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment...this change seems to be related to an issue with using Eclipse + DSpace. It is unrelated to the S3 changes but I expect @abollini made the change to allow his Eclipse IDE to work better in developing this fix.
I will have to look deeper into this. In a first quick test I was not able to download files anymore. I got a White label error page instead. |
Description
The
S3BitStoreService#get
has been updated to return a custom InputStream able to lazy download the underlying S3 object when used. In this way we reduce the amount of space required on the local storage during download to a configurable value (default 5Mb).Instructions for Reviewers
The
testBitstreamPutAndGetWithAlreadyPresentBucket
in dspace-api/src/test/java/org/dspace/storage/bitstore/S3BitStoreServiceIT.java has been updated to verify that get and put still work properly and are consistent for different content size.You should checkout the PR and try to download one or more files of different size, monitor the temp folder to verify that no files named s3-copy-... are left behind and that also downloading a large file each temp file is not larger than 5Mb.
You can eventually customize the size of the chunk in the
bitstore.xml
spring file setting the propertybufferSize
in thes3Store
bean definition.Checklist
pom.xml
), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.