Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: huffmantable is broken. #100

Open
LebedevRI opened this Issue Apr 18, 2017 · 42 comments

Comments

Projects
None yet
9 participants
@LebedevRI
Copy link
Member

commented Apr 18, 2017

@axxel
hi.
0. i'm deeply saddened by the amount of sanity checks / input data validation in your code :(
1.

$ rstest -c "raw.pixls.us/Nikon/1 J5/20151027-10431071-NIKON_1_J5.nef"
20151027-10431071-NIKON_1_J5.nef                       : starting decoding ...                                                                                                                                                                                                 
20151027-10431071-NIKON_1_J5.nef failed: ../src/librawspeed/decompressors/HuffmanTable.h:176: void rawspeed::HuffmanTable::setCodeValues(const rawspeed::Buffer &): Corrupt Huffman. Code value 57 is bigger than 16                                                           
Total decoding time: 0s                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                               
WARNING: the following 1 tests have failed:                                                                                                                                                                                                                                    
20151027-10431071-NIKON_1_J5.nef failed: ../src/librawspeed/decompressors/HuffmanTable.h:176: void rawspeed::HuffmanTable::setCodeValues(const rawspeed::Buffer &): Corrupt Huffman. Code value 57 is bigger than 16                                                           
See rstest.log for details.

LebedevRI added a commit to LebedevRI/darktable that referenced this issue Apr 18, 2017

Rawspeed submodule update: partially unfuck HuffmanTable by adding sa…
…nity checks :/

Some Nikon 1 J5 raws (and others) are still broken,
but at least now they do not crash the whole app.

Refs. #11581
Refs. darktable-org/rawspeed#100
@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Oct 28, 2017

Apparently happens for Nikon D3400 too.
Really not cool.

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Oct 28, 2017

So basically all nikon raws "lossy after split" are broken.

{/* 12-bit lossy after split */
{0, 1, 5, 1, 1, 1, 1, 1, 1, 2, 0, 0, 0, 0, 0, 0},
{0x39, 0x5a, 0x38, 0x27, 0x16, 5, 4, 3, 2, 1, 0, 11, 12, 12}},

{/* 14-bit lossy after split */
{0, 1, 5, 1, 1, 1, 1, 1, 1, 1, 2, 0, 0, 0, 0, 0},
{8, 0x5c, 0x4b, 0x3a, 0x29, 7, 6, 5, 4, 3, 2, 1, 0, 13, 14}},

@axxel

This comment has been minimized.

Copy link
Contributor

commented Oct 28, 2017

Based on your comments above you seem to make the assumption that my "fucked" HuffmanTable code broke the "lossy after split" files. For that assumption to be true, it would have needed to work before my changes. Have you tried that?

I have the my doubts: The thing is, when working on the rewrite, I was noticing a (presumed) bug in the old code and wanted to investigate further but there was no sample available triggering that bug. In fact, I asked you personally for one, but you did not have one either.

Now that there seems to be sample available, I'm pretty sure you'll manage to "completely unfuck" my HuffmanTable implementation. Maybe my mentioned analysis from last year can be of some help, too, who knows.

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Oct 29, 2017

Based on your comments above you seem to make the assumption that my "fucked" HuffmanTable code broke the "lossy after split" files. For that assumption to be true, it would have needed to work before my changes. Have you tried that?

I absolutely did. darktable-2.2.5, that still had the old rawspeed, loads these files flawlessly.
Feel free to try yourself and catch me on a lie.

I have the my doubts: The thing is, when working on the rewrite, I was noticing a (presumed) bug in the old code and wanted to investigate further but there was no sample available triggering that bug. In fact, I asked you personally for one, but you did not have one either.

b213eaf#diff-98d884ed1d3efccaf22be005567a6726R106

+  // 2. This is the actual huffman encoded data, i.e. the 'alphabet'. Each value
+  // is the number of bits following the code that encode the difference to the
+  // last pixel. Valid values are in the range 0..16.

Now, what is the maximal code value (the second array) for 12-bit lossy after split

{/* 12-bit lossy after split */
{0, 1, 5, 1, 1, 1, 1, 1, 1, 2, 0, 0, 0, 0, 0, 0},
{0x39, 0x5a, 0x38, 0x27, 0x16, 5, 4, 3, 2, 1, 0, 11, 12, 12}},

, or for 14-bit lossy after split
{/* 14-bit lossy after split */
{0, 1, 5, 1, 1, 1, 1, 1, 1, 1, 2, 0, 0, 0, 0, 0},
{8, 0x5c, 0x4b, 0x3a, 0x29, 7, 6, 5, 4, 3, 2, 1, 0, 13, 14}},

?

Now that there seems to be sample available, I'm pretty sure you'll manage to "completely unfuck" my HuffmanTable implementation. Maybe my mentioned analysis from last year can be of some help, too, who knows.

Yes, i do hope so. Hopefully even without sacrificing performance.

@LebedevRI

This comment has been minimized.

@axxel

This comment has been minimized.

Copy link
Contributor

commented Oct 30, 2017

Feel free to try yourself and catch me on a lie.

?!? Why would I have reason to question the truthfulness of your statements?

Given that it worked for the old code, I must have made a mistake in my analysis that lead to me suspecting the code was not working for that special case. As you might have inferred from my awareness of the fact that the "lossy after split" stuff seemed fishy, I was deliberately ignoring the existence of those invalid lossy after split numbers. It made no sense to me to try to retain some functionality that could not be tested and I suspected to be not working anyway. Much the same reasoning that presumably led you to drop a whole decoder recently.

I had another quick look at the old code to see if I was able to find the fault in my earlier reasoning. Now I'm thinking, the fault might have been that I thought the "big table" was used for decoding the whole image (which could not have worked) but now it seems to me that simply every lookup in the big table for the image part after the split would simply "fail" (which is supposed to happen if the code is too big for the "big table") and then the special/custom shl != 0 stuff would kick in.

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Nov 30, 2017

And one more broken piece, from https://discuss.pixls.us/t/darktable-for-windows/4966/264
DSC_0143.dng.gz
HuffmanTable::decode() { ... bs.fill(32); is the culprit, but even then something else is broken still.
Probably related to #94

@LibRaw i really do hope you weren't planning on upgrading anytime soon :)

@LibRaw

This comment has been minimized.

Copy link

commented Nov 30, 2017

Yes, it looks like we'll skip RawSpeed(v2) support for LibRaw 0.19

We'll try to implement some kind of 'flexible' support. May be via a whitelist of cameras w/o known problems and with sensor megapixel large enough for significant speedup (so, Canon 5DS*, Nikon D8xx, may be Canon 5Dmk4).

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Nov 30, 2017

RawSpeed(v2)

I'm not aware which commit was the v2 release btw.
Current tag is v3.0, and i'm following the same scheme as with darktable - odd tag versions mean development version, even - release.

@LibRaw

This comment has been minimized.

Copy link

commented Dec 1, 2017

v1/v2 was used as 'conventional name' for 'Klaus master' and 'current devel' branches.

Is (some) current branch already used in some darktable release?

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Dec 1, 2017

Is (some) current branch already used in some darktable release?

Yeah, darktable git master is using rawspeed git develop since

commit 3b4c49d16b859758ddeccdf8a261719ac3f0c5f9 (upstream/pr/1402)
Author: Roman Lebedev <lebedev.ri@gmail.com>
Date:   Mon Jan 2 17:03:41 2017 +0300

    Track rawspeed as submodule

commit b82d189e00a8741cd1fce5207f60814c1959cc7f
Author: Roman Lebedev <lebedev.ri@gmail.com>
Date:   Mon Jan 2 17:02:04 2017 +0300

    Purge bundled rawspeed

So 2.4.0 series will be using (and current 2.4.0rc0 is using) this code.

@axxel

This comment has been minimized.

Copy link
Contributor

commented Dec 1, 2017

And one more broken piece, from https://discuss.pixls.us/t/darktable-for-windows/4966/264
DSC_0143.dng.gz

I don't use darktable and my rstest has no problem extracting some whitebalance information out of that file. Can you provide some steps to reproduce the issue with using only librawspeed tools?

HuffmanTable::decode() { ... bs.fill(32); is the culprit, but even then something else is broken still.
Probably related to #94

I can decode that dng but it contains a few apparently wrongly decoded tiles. Do you see any connection between the nikon lossy-after-split behavior and the issues with that DNG file? Can you elaborate what the problem with the bs.fill(32); line is in your opinion?

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Dec 1, 2017

@axxel i'm sorry for bothering you, only the original comment in this issue was directed at you, the rest of the comments i'm simply using to record my findings.

@axxel

This comment has been minimized.

Copy link
Contributor

commented Dec 1, 2017

No worries, I'm not bothered. As I said earlier this year, I'd be happy to look into any issues with my code, in case something shows up later.

@ccmoralesj

This comment has been minimized.

Copy link

commented Mar 1, 2018

I was trying to process a .NEF RAW photo from my Nikon D5600 and I got this message:

failed to read camera white balance information from `DSC_0371.NEF

The console error I got was:

[rawspeed] (DSC_0371.NEF) void rawspeed::HuffmanTable::setCodeValues(const rawspeed::Buffer&), line 176: Corrupt Huffman. Code value 92 is bigger than 16
[temperature] failed to read camera white balance information from `DSC_0371.NEF'!
[temperature] `NIKON CORPORATION NIKON D5600' color matrix not found for image
[temperature] failed to read camera white balance information from `DSC_0371.NEF'!
[rawspeed] (DSC_0372.NEF) void rawspeed::HuffmanTable::setCodeValues(const rawspeed::Buffer&), line 176: Corrupt Huffman. Code value 92 is bigger than 16

It seems that here is the issue I'm looking for. I'll wait for support or upgrade for this.

PS. I would be glad to share with you (if needed) photos that triggers this error.

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Mar 1, 2018

Sorry for the trouble :/

It seems that here is the issue I'm looking for.

I think so. Though there is more than one issue here.
Some part of it (not the nikon part you are seeing) is fixed in my WIP branch

PS. I would be glad to share with you (if needed) photos that triggers this error.

Thanks, but for D5600, all the needed samples are already present on RPU.

@exaexa

This comment has been minimized.

Copy link

commented Mar 21, 2018

+1 also waiting for the fix.

Btw @LebedevRI some months ago we've found a workaround to this issue for Nikon J1's (a one-liner patch that I accidentally lost), would you happen to remember what the problem was?

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Mar 21, 2018

Uhh, if it's one-liner i think it would be changing

-bs.fill(32);
+bs.fill(maxCodePlusDiffLength());
@exaexa

This comment has been minimized.

Copy link

commented Mar 21, 2018

@LebedevRI Fixed the issue. Thanks a lot. 👍

(btw -- if that helps someone -- the point where this breaks for me is in BitStream.h, the error I receive is from this IOE:

ThrowIOE("Buffer overflow read in BitStream");

from GDB I could see that pos was larger than size (exactly pos was size+4) when the exception was thrown.)

LebedevRI added a commit to LebedevRI/rawspeed that referenced this issue Apr 29, 2018

@ralfbrown

This comment has been minimized.

Copy link

commented May 14, 2018

I just uploaded a D7000 14-bit "lossy after split" image to RPU (though it's not showing in the repo search yet -- looks like that's updated once a day?). I discovered that even last night's git master still breaks on them when I went back to revisit some old photos.

@shoffmeister

This comment has been minimized.

Copy link
Contributor

commented May 16, 2018

Would there be any chance that the patch from above

-bs.fill(32);
+bs.fill(maxCodePlusDiffLength());

gets dropped into develop / into some darktable release? It's difficult for me to set up a build environment (on Windows)...

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented May 16, 2018

I just uploaded a D7000 14-bit "lossy after split" image to RPU

Thanks!

(though it's not showing in the repo search yet -- looks like that's updated once a day?

There is a pre-review.

Would there be any chance that the patch from above
-bs.fill(32);
+bs.fill(maxCodePlusDiffLength());

It's quite broken beyond that i'm afraid.
Now that i finally bothered to push the NFC test changes (f5eeb2d...9a52017),
i just need one more push to finish the more major part of the fix
(develop...LebedevRI:huffmantable), that should bring it closer to life..

@ccmoralesj

This comment has been minimized.

Copy link

commented Jul 24, 2018

I was wondering... Is this bug fixed, yet? and added to release candidate?

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Jul 25, 2018

Fully - no, not yet.
I've fixed the base glaring issues in #130, but as cross-fuzzers (between two different huffman table implementations) tell me, there are still issues.
i'm going to re-look into the nikon issue (#129) next.

@exaexa

This comment has been minimized.

Copy link

commented Jul 25, 2018

regarding nikon, please ask if you'd need NEFs that fail to decode; I can generate them on demand.

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Jul 25, 2018

Which camera? There are ~6 such raws on RPU already, so i think now * the samples are not an issue

EDIT: right, J1

  • If only those samples were in rawsamples.ch back then :/
    transition of rawspeed, and a fork fo rawsamples.ch -> RPU happened at the same time
@exaexa

This comment has been minimized.

Copy link

commented Jul 25, 2018

Yeah, J1. I've already put a sample from that to RPU. If I find any other camera doing the same I'll post it.

@m-mmm

This comment has been minimized.

Copy link

commented Aug 14, 2018

Is there a way to test rawspeed on a raw file using command line?

My idea is to convert affected files to DNG. I would be better than converting all my files.

Also: I have a lot of affected D5500 samples with no people in, if needed I can provide those.

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2018

Sure. darktable-rs-identify tool if you are using darktable.

@m-mmm

This comment has been minimized.

Copy link

commented Aug 14, 2018

How do I use it? It always shows this message:
WARNING: Couldn't find cameras.xml in 'C:/darktable/darktable-install/share/darktable/rawspeed/cameras.xml'
WARNING: Couldn't find cameras.xml in 'darktable-rs-identify.exe/../share/darktable/rawspeed/cameras.xml'
ERROR: Couldn't find cameras.xml in 'darktable-rs-identify.exe/../share/darktable/rawspeed/cameras.xml'
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid

OK --> all hard coded paths are not valid, creating C:\Program Files\darktable\share\darktable\rawspeed myself and copying cameras.xml to it.

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2018

No idea, i don't use windows, i can't help here.

@dsaronin

This comment has been minimized.

Copy link

commented Oct 22, 2018

Is there any type of workaround possible for this error until the bug gets fixed?
Is it possible to use the white balance info from another RAW NEF file which isn't broken?
I see that Darktable is able to generate a thumbnail image for the file.
Right now I'm dead in the water and unable to edit some files from the camera.

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Oct 22, 2018

Is it possible to use the white balance info from another RAW NEF file which isn't broken?

It's not about white balance. That dt's message is uninformative/misleading i'd say.
It quite literally fails to decode the whole image.

I see that Darktable is able to generate a thumbnail image for the file.

No. Those are exif2 thumbnails, which are shown by default unless you opt into raw preview.

Is there any type of workaround possible for this error until the bug gets fixed?

Which bug are you seeing? I.e. can you show the error console output?

But,

  • use older darktable (2.2.x should be last good i think?)
  • convert those specific nef's to dng via adobe dng converter/etc

Right now I'm dead in the water and unable to edit some files from the camera.

I'm sorry.

@dsaronin

This comment has been minimized.

Copy link

commented Oct 22, 2018

LebedevRI added a commit to LebedevRI/rawspeed that referenced this issue Oct 29, 2018

NikonDecompressor: 2nd attempt: resurrect the code as it was in v3.0
This is basically a direct resurrection (copy of relevant pieces) from
* RawSpeed/NikonDecompressor.{h,cpp}
* RawSpeed/LJpegDecompressor.{h,cpp}
back from it's state as of v3.0 tag,
when the support for Nikon Lossy After Split raws was working fine...

This will clearly need to undergo all the same evolution
as the other code did, and probably be deduplicated back after darktable-org#148.

But for now, a (long-overdue, almost 2 years) fix for a regression
is more important than horrible duplicated code.

This is indeed a second attempt.
I did this initially back during 2891a92
but introduced a bug (that was later fixed by 1f42e28)
which was causing mis-decoding, so i abandoned the initial attempt...

Fixes darktable-org#129.
Refs. darktable-org#100.
Refs. https://redmine.darktable.org/issues/12209
@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Oct 30, 2018

FYI i have finally (sigh) restored then nikon lossy-after-split support (#129).
So the main remaining problem is #128, the last decode.
Not quite sure what is up with #127, might be a semi-duplicate of #132.

@m-mmm

This comment has been minimized.

Copy link

commented Oct 30, 2018

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Oct 30, 2018

Did you read my mail

Uh, nothing recent comes to mind..

about this script which can mass-test files with rawspeed? I'd like to share it but I'm not sure how to properly do that.

In principle, you can just pass -DRAWSPEED_ENABLE_SAMPLE_BASED_TESTING=ON -DRAWSPEED_REFERENCE_SAMPLE_ARCHIVE=<raw.pixls.us checkout> when building,
and use ninja rstest-{create,recreate,test,check,clean} build targets.
(see sample-based-testing.cmake)
Although locally i have an ugly contraption of a shell script to do that.

@m-mmm

This comment has been minimized.

Copy link

commented Oct 30, 2018

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Oct 30, 2018

I sent this to the dev mailing list:

Ah, that explains why i have not seen it.
I would have guessed you could simply attach it to the mail to the mailing list.

@m-mmm

This comment has been minimized.

Copy link

commented Oct 30, 2018

@m-mmm

This comment has been minimized.

Copy link

commented Oct 30, 2018

@LebedevRI

This comment has been minimized.

Copy link
Member Author

commented Oct 30, 2018

Ok, I attached my script.

I meant the dt-dev list.
github does not deal with mail attachments, it just silently drops them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.