-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COMPRESS-542: Lazy allocation of SevenZArchiveEntry to prevent OOM on corrupt files #120
Conversation
1 similar comment
Hi @theobisproject |
Hi @PeterAlfredLee |
I see. |
My biggest issue with this PR is that it is missing a unit test. Without a failing unit test, there is no way to know that this fixes anything or that a future change will not regress to the previous behavior. Please add a test ;-) I also see a lot of duplicate code like (5 times!)
which should be easily refactored. |
As you are a bit more familiar with the 7z archive maybe you have an idea how to avoid the entry allocation completely before we are sure the header is not corrupted. It was more a hack from me to process the data somehow. |
As this was encountered with production data there is no way I can share those and creating an artifical archive which is reproducing this is way beyond my current understanding of the format. The only way I see to currently add a test is when making the |
Probably not possible with the current code... tryToLocateEndHeader is the real cause because it does no CRC check and cannot, because by definition it's already a corrupt file. I have crafted a 233 byte malformed 7z which would attempt to allocate 268,435,455 files but I'm not certain it's wise to post it here. This is in some way related to my own problems with a very large 7z because the "kName" section allocates an enormous buffer for filenames (fixable by streaming the bytes instead). |
Just a short update. I will try to generate a reproducting file via fuzzing which is hopefully successful. This might take some days but I think at the end of the weekend I can share somthing. |
The test files @akelday attached to the jira issue are replicating the issue. Since this issue is dependend on the available Java heap it could be difficult to add this as unit test since I don't know a way to force the Java heap for a single test in JUnit. As far as I know it could only be set at the maven-surefire-plugin. |
if (files[nextFile] == null) { | ||
files[nextFile] = new SevenZArchiveEntry(); | ||
} | ||
files[nextFile].setName(fName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do not need a variable fName
here :
if (files[nextFile] == null) {
files[nextFile] = new SevenZArchiveEntry();
}
files[nextFile].setName(new String(names, nextName, i - nextName, StandardCharsets.UTF_16LE));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I got rid of the variable in the latest commit
@@ -997,6 +1000,9 @@ private void readFilesInfo(final ByteBuffer header, final Archive archive) throw | |||
throw new IOException("Unimplemented"); | |||
} | |||
for (int i = 0; i < files.length; i++) { | |||
if (files[i] == null) { | |||
files[i] = new SevenZArchiveEntry(); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @garydgregory about this duplicate code. We can have a simple refactor here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is refactored as part of the latest commit
@@ -1090,7 +1108,13 @@ private void readFilesInfo(final ByteBuffer header, final Archive archive) throw | |||
++emptyFileCounter; | |||
} | |||
} | |||
archive.files = files; | |||
List<SevenZArchiveEntry> entries = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a final here?
final List<SevenZArchiveEntry> entries = new ArrayList<>();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I use final
as much as possible to make it obvious when reading or debugging whether I need to even think about a given local variable, for more details on my reasoning, please see https://garygregory.wordpress.com/2013/01/26/the-final-kiss-in-java/
In the files I got from the fuzzing I noticed that even the array creation can lead to an OOM exception. I tried to replace it with a list and let it grow as needed but it didn't worked well. The current solution with the Map works (all tests pass on a Raspberry Pi 3) but has high memory allocation because of autoboxing the integer. |
Looks good. Thank you for your work @theobisproject . |
I am using this project https://github.com/rohanpadhye/jqf to do the fuzzing via the AFL bridge provided. This was the fuzz method implementation I used @Fuzz
public void sevenzOom(@From(InputStreamGenerator.class) InputStream in) {
try (SevenZFile sevenZFile = new SevenZFile(new SeekableInMemoryByteChannel(IOUtils.toByteArray(in)))) {
Assume.assumeTrue(sevenZFile.getNextEntry() != null);
} catch (IOException e) {
// Ignore
}
} |
This PR is nearly the same as the patch I attached to the Jira ticket. I made an additional null check before passing the initialized files to the archive to avoid NullPointerExceptions later on.
As said in the ticket it would be nice to check if the headers are ok before initializing the files but with my current knowledge of 7z this is out of scope for me. This patch works for the files I encountered even if I have to admit the patch is not very beautiful.