COMPRESS-505 : bug fix for random access of 7z #95

Merged
merged 1 commit into from Mar 24, 2020

Conversation

PeterAlfredLee

@PeterAlfredLee PeterAlfredLee commented Mar 5, 2020

My earlier PR for random access of 7z, #83 (COMPRESS-342 random access of 7z files), has some problems:

  1. I assumed that the currentFolderInputStream could be repositioned by changing the position of the channel, which turns out to be impossible. This PR fixes it by reopening the currentFolderInputStream.

  2. There are now 2 ways to access the content of a 7z archive: sequential access (getNextEntry) and random access (getInputStream). They may be used one after another, so there are some conditions we need to deal with:

2.1 In a random access, if currentEntryIndex == entryIndex && the entry has not been read yet:
This means the input stream of the entry we want has already been put in the deferredBlockStreams as the last array member. We SHOULD NOT build a new input stream for the entry again, because that would cause the existing stream in deferredBlockStreams to be skipped. We should just do nothing, because the input stream is already in the deferredBlockStreams.

2.2 In a random access, if currentEntryIndex == entryIndex && the entry has already been read:
This means the entry we want has been read (whether part of the entry or all of it has been read does not matter). Then we should reopen the currentFolderInputStream and skip all the entries before the entry we want.
BTW: we can determine whether the file has been read by comparing the bytesRemaining of the input stream (as a CRC32VerifyingInputStream) with the actual size of the file.

2.3 In a random access, if currentEntryIndex < entryIndex:
The input streams whose index is less than or equal to currentEntryIndex have already been put into the deferredBlockStreams. We can just add the remaining entries to the deferredBlockStreams.

2.4 In a random access, if currentEntryIndex > entryIndex:
This means the entry we want has already been read or skipped beforehand. The only option is to reopen the currentFolderInputStream and skip all the preceding entries again.

In short, we should do nothing in 2.1, skip the remaining entries in 2.3, and reopen the currentFolderInputStream in 2.2/2.4. I have to admit this is a bit complicated, but I couldn't find a better way to build the logic. :(
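To make the four cases easier to review, here is a minimal standalone sketch of the decision described above. It is not the code in this PR: the class `RandomAccessPlan`, the `Action` enum and the `decide` and `hasBeenRead` helpers are hypothetical names made up for illustration, and the sketch assumes the requested entry lives in the folder that is currently open. Only `currentEntryIndex`, `entryIndex` and `bytesRemaining` correspond to names used in the description.

```java
public class RandomAccessPlan {

    /** Hypothetical labels for what has to happen before the requested entry can be read. */
    enum Action {
        DO_NOTHING,               // 2.1: the stream is already queued in deferredBlockStreams
        QUEUE_REMAINING_ENTRIES,  // 2.3: add the not-yet-queued entries, skipping up to the target
        REOPEN_FOLDER_STREAM      // 2.2 / 2.4: reopen currentFolderInputStream and skip again
    }

    /**
     * An entry counts as "read" once fewer bytes remain than its full size.
     * Assumption of this sketch: a zero-length entry can never look "read" under this check.
     */
    static boolean hasBeenRead(long bytesRemaining, long entrySize) {
        return bytesRemaining != entrySize;
    }

    /** Maps the index comparison and the read state onto one of the three actions. */
    static Action decide(int currentEntryIndex, int entryIndex,
                         long bytesRemaining, long entrySize) {
        if (currentEntryIndex == entryIndex) {
            return hasBeenRead(bytesRemaining, entrySize)
                    ? Action.REOPEN_FOLDER_STREAM   // case 2.2
                    : Action.DO_NOTHING;            // case 2.1
        }
        if (currentEntryIndex < entryIndex) {
            return Action.QUEUE_REMAINING_ENTRIES;  // case 2.3
        }
        return Action.REOPEN_FOLDER_STREAM;         // case 2.4
    }

    public static void main(String[] args) {
        System.out.println(decide(3, 3, 100, 100)); // case 2.1: untouched current entry
        System.out.println(decide(3, 3, 40, 100));  // case 2.2: current entry partly read
        System.out.println(decide(1, 3, 100, 100)); // case 2.3: target is further ahead
        System.out.println(decide(5, 3, 100, 100)); // case 2.4: target is behind us
    }
}
```

In the sketch, the read state only matters when the two indexes are equal. That is why 2.2 and 2.4 can share the same reopen path.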

I did some refactoring and added some new comments to make the code clearer. The corresponding test cases are also included in this PR.

@coveralls

coveralls commented Mar 5, 2020

Coverage Status

Coverage increased (+0.04%) to 87.039% when pulling ccf4f3d on PeterAlfreadLee:COMPRESS-505 into 1e8d131 on apache:master.

Exceptions are thrown when the same 7z entry is read multiple times.
Exceptions are also thrown when getNextEntry and getInputStream are used together.
This is the fix for them.