COMPRESS-542: Lazy allocation of SevenZArchiveEntry to prevent OOM on corrupt files #120

theobisproject · 2020-07-27T13:30:09Z

This PR is nearly the same as the patch I attached to the Jira ticket. I added a null check before passing the initialized files to the archive to avoid NullPointerExceptions later on.
As said in the ticket, it would be nice to check whether the headers are OK before initializing the files, but with my current knowledge of 7z this is out of scope for me. This patch works for the files I encountered, even if I have to admit it is not very pretty.

coveralls · 2020-07-27T13:52:03Z

Coverage increased (+0.1%) to 87.267% when pulling 80e5a46 on theobisproject:COMPRESS-542 into b70af20 on apache:master.

PeterAlfredLee · 2020-08-04T12:01:43Z

Hi @theobisproject
I'm a little curious about the number of SevenZArchiveEntry objects. How many entries does your 7z archive have that it takes so much memory?

theobisproject · 2020-08-04T13:07:50Z

Hi @PeterAlfredLee
As explained in the Jira ticket, it was a corrupted archive where the numFiles value read from the header in readFilesInfo was about 138 million. I don't know what the correct number for the archive would be, but this is the number commons-compress read.

PeterAlfredLee · 2020-08-04T13:11:54Z

I see.
This PR looks good now. I will look into it more over the next few days.

garydgregory · 2020-08-04T13:17:12Z

My biggest issue with this PR is that it is missing a unit test. Without a failing unit test, there is no way to know that this fixes anything or that a future change will not regress to the previous behavior. Please add a test ;-)

I also see a lot of duplicate code like (5 times!)

if (files[i] == null) {
      files[i] = new SevenZArchiveEntry();
}

which should be easily refactored.
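
One way the repeated null check could be collapsed is a small helper that returns the entry at an index and creates it on first use. This is only a minimal sketch: `Entry` is a stand-in for SevenZArchiveEntry so the example compiles on its own, and the helper name `getOrCreate` is hypothetical, not taken from the PR.

```java
// Stand-in for SevenZArchiveEntry so the sketch is self-contained.
class Entry {
    private String name;

    void setName(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class LazyEntries {
    // Hypothetical helper: returns the entry at index i, creating it on first use.
    static Entry getOrCreate(Entry[] files, int i) {
        if (files[i] == null) {
            files[i] = new Entry();
        }
        return files[i];
    }
}
```

Each of the five call sites would then shrink to a single expression such as `LazyEntries.getOrCreate(files, i).setName(...)`.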

theobisproject · 2020-08-04T13:18:10Z

As you are a bit more familiar with the 7z format, maybe you have an idea how to avoid the entry allocation completely until we are sure the header is not corrupted. My change was more of a hack to process the data somehow.

theobisproject · 2020-08-04T13:34:36Z

> My biggest issue with this PR is that it is missing a unit test. Without a failing unit test, there is no way to know that this fixes anything or that a future change will not regress to the previous behavior. Please add a test ;-)

As this was encountered with production data, I cannot share the files, and creating an artificial archive that reproduces the problem is well beyond my current understanding of the format. The only way I currently see to add a test is to make the readFilesInfo method protected.

akelday · 2020-08-05T22:28:42Z

> avoid the entry allocation completely before we are sure the header is not corrupted

Probably not possible with the current code... tryToLocateEndHeader is the real cause because it does no CRC check and cannot, because by definition it's already a corrupt file.

I have crafted a 233 byte malformed 7z which would attempt to allocate 268,435,455 files but I'm not certain it's wise to post it here. This is in some way related to my own problems with a very large 7z because the "kName" section allocates an enormous buffer for filenames (fixable by streaming the bytes instead).
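
To illustrate the "streaming the bytes" idea for the name section, here is a rough sketch that decodes zero-terminated UTF-16LE names one at a time from an InputStream instead of first reading the whole section into one buffer. The class and method names are hypothetical, and this is not how SevenZFile is implemented; it only assumes the names are UTF-16LE strings terminated by a 16-bit zero.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StreamingNames {
    // Reads up to maxNames UTF-16LE strings, each terminated by a 16-bit zero.
    // Only the bytes of the name currently being read are buffered.
    static List<String> readNames(InputStream in, int maxNames) throws IOException {
        List<String> names = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        while (names.size() < maxNames) {
            int lo = in.read();
            int hi = in.read();
            if (lo < 0 || hi < 0) {
                // The stream ended before the expected number of names was read.
                throw new IOException("Truncated name section");
            }
            if (lo == 0 && hi == 0) {
                names.add(new String(current.toByteArray(), StandardCharsets.UTF_16LE));
                current.reset();
            } else {
                current.write(lo);
                current.write(hi);
            }
        }
        return names;
    }
}
```

With this shape a corrupt header fails with an IOException as soon as the data runs out, rather than after a buffer sized from the untrusted header has been allocated. The list of decoded names still grows with the number of names actually present.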

theobisproject · 2020-08-06T04:48:41Z

Just a short update: I will try to generate a reproducing file via fuzzing, which will hopefully be successful. This might take some days, but I think I can share something by the end of the weekend.

theobisproject · 2020-08-10T05:58:30Z

The test files @akelday attached to the Jira issue reproduce the problem. Since the issue depends on the available Java heap, it could be difficult to add this as a unit test, because I don't know a way to force the Java heap size for a single test in JUnit. As far as I know, it can only be set on the maven-surefire-plugin.
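
A test cannot change its own heap size, but it can at least detect the heap it is running with. The following sketch uses Runtime.getRuntime().maxMemory() for that; the class and method names are hypothetical, and it only guards a heap-dependent test, it does not force a heap limit.

```java
class HeapGuard {
    // True if the JVM's maximum heap is at or below the given limit in bytes.
    static boolean heapAtMost(long limitBytes) {
        return Runtime.getRuntime().maxMemory() <= limitBytes;
    }
}
```

In a JUnit 4 test this could be combined with something like `Assume.assumeTrue(HeapGuard.heapAtMost(512L * 1024 * 1024));` so the test is skipped when the heap is too large for the OOM to be reproducible. The 512 MiB threshold here is an arbitrary example value.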

PeterAlfredLee · 2020-08-11T12:06:51Z

src/main/java/org/apache/commons/compress/archivers/sevenz/SevenZFile.java

+                            if (files[nextFile] == null) {
+                                files[nextFile] = new SevenZArchiveEntry();
+                            }
+                            files[nextFile].setName(fName);


I think we do not need a variable fName here:

if (files[nextFile] == null) {
      files[nextFile] = new SevenZArchiveEntry();
}
files[nextFile].setName(new String(names, nextName, i - nextName, StandardCharsets.UTF_16LE));

Yes, I got rid of the variable in the latest commit

PeterAlfredLee · 2020-08-11T12:08:49Z

src/main/java/org/apache/commons/compress/archivers/sevenz/SevenZFile.java

@@ -997,6 +1000,9 @@ private void readFilesInfo(final ByteBuffer header, final Archive archive) throw
                        throw new IOException("Unimplemented");
                    }
                    for (int i = 0; i < files.length; i++) {
+                        if (files[i] == null) {
+                            files[i] = new SevenZArchiveEntry();
+                        }


Agree with @garydgregory about this duplicate code. We can have a simple refactor here.

Code is refactored as part of the latest commit

PeterAlfredLee · 2020-08-11T12:15:13Z

src/main/java/org/apache/commons/compress/archivers/sevenz/SevenZFile.java

@@ -1090,7 +1108,13 @@ private void readFilesInfo(final ByteBuffer header, final Archive archive) throw
                ++emptyFileCounter;
            }
        }
-        archive.files = files;
+        List<SevenZArchiveEntry> entries = new ArrayList<>();


Maybe a final here?

final List<SevenZArchiveEntry> entries = new ArrayList<>();

+1
I use final as much as possible to make it obvious, when reading or debugging, whether I even need to think about a given local variable. For more details on my reasoning, please see https://garygregory.wordpress.com/2013/01/26/the-final-kiss-in-java/

theobisproject · 2020-08-13T18:31:12Z

In the files I got from the fuzzing, I noticed that even the array creation can lead to an OutOfMemoryError. I tried to replace the array with a list and let it grow as needed, but that didn't work well. The current solution with the Map works (all tests pass on a Raspberry Pi 3) but allocates a lot of memory because the integer keys are autoboxed.
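
The Map-based idea can be sketched roughly as follows: entries are keyed by file index and created only when a property for that index is actually read, so nothing is sized from the untrusted numFiles value. `FileEntry` is a stand-in for SevenZArchiveEntry, and the class and method names are hypothetical; the code in the PR may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for SevenZArchiveEntry so the sketch is self-contained.
class FileEntry {
    String name;
}

class SparseEntries {
    // Returns the entry for the given index, creating it on first access.
    // Each int index is autoboxed to an Integer key, which is the extra
    // allocation cost mentioned in the comment above.
    static FileEntry getOrCreate(Map<Integer, FileEntry> entries, int index) {
        return entries.computeIfAbsent(index, k -> new FileEntry());
    }

    // Copies only the entries that were actually initialized, in index order
    // when the map is sorted (as a TreeMap is).
    static List<FileEntry> toList(Map<Integer, FileEntry> entries) {
        return new ArrayList<>(entries.values());
    }
}
```

Using a TreeMap in the caller keeps the entries in index order without ever iterating up to the claimed file count.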

PeterAlfredLee · 2020-08-15T03:28:42Z

Looks good. Thank you for your work, @theobisproject.
BTW, how did you do the fuzzing? I was wondering whether we could have some fuzz tests in Compress.

theobisproject · 2020-08-15T09:24:58Z

I am using this project, https://github.com/rohanpadhye/jqf, to do the fuzzing via the AFL bridge it provides. This is the fuzz method implementation I used:

@Fuzz
public void sevenzOom(@From(InputStreamGenerator.class) InputStream in) {
    try (SevenZFile sevenZFile = new SevenZFile(new SeekableInMemoryByteChannel(IOUtils.toByteArray(in)))) {
        Assume.assumeTrue(sevenZFile.getNextEntry() != null);
    } catch (IOException e) {
        // Ignore
    }
}

COMPRESS-542: Lazy allocation of SevenZArchiveEntry to prevent OOM on corrupt files

80e5a46

PeterAlfredLee reviewed Aug 11, 2020

COMPRESS-542: Prevent OOM at array creation

41359f5

PeterAlfredLee merged commit 464ba19 into apache:master Aug 15, 2020

cxronen mentioned this pull request Mar 4, 2022

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/AST_BookStore#77

Open

cxronen mentioned this pull request Sep 20, 2022

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/BookStore_VSCode#111

Open

cxronen mentioned this pull request Dec 20, 2022

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/BookStore#286

Open

cxronen mentioned this pull request Feb 17, 2023

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/BookStore#456

Open

cxronen mentioned this pull request Nov 22, 2022

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/AST_BookStore#331

Open

cxronen mentioned this pull request Mar 15, 2023

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/AST_BookStore#551

Open

This was referenced Feb 15, 2023

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/BookStore_VSCode#323

Open

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/BookStore_VSCode#554

Open

cxronen mentioned this pull request May 25, 2023

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.15 cxronen/BookStore_VSCode#957

Open

AASMACMX mentioned this pull request Jul 20, 2023

CVE-2021-35516 @ Maven-org.apache.commons:commons-compress-1.20 CxDemoInABoxRepos/Java-Webgoat#873

Open
