Fix pixelwidth and pixelheight for raf files #810

Merged on Jul 28, 2019 (3 commits)


astippich

Extract the pixel width and pixel height of RAF images from the RAF-specific metadata stored in Fuji raw files.
The code is inspired by ExifTool.
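For readers unfamiliar with the format, the lookup can be sketched as follows. This is a minimal, self-contained sketch, not the code merged in this PR. It assumes the record layout described by ExifTool and dcraw: the CFA header starts with a 4-byte big-endian record count, each record is a 2-byte tag, a 2-byte payload size and the payload, and tag `0x100` (ExifTool's RawImageFullSize) stores the raw height followed by the raw width as big-endian 16-bit values. Other size tags exist, and which one this PR reads is not stated in the thread. `RafSize` and `findRafSize` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type; not part of the Exiv2 API.
struct RafSize {
    uint16_t width;
    uint16_t height;
};

// Big-endian readers; callers must check bounds first.
static uint16_t be16(const std::vector<uint8_t>& buf, std::size_t off) {
    return static_cast<uint16_t>((buf[off] << 8) | buf[off + 1]);
}

static uint32_t be32(const std::vector<uint8_t>& buf, std::size_t off) {
    return (uint32_t{buf[off]} << 24) | (uint32_t{buf[off + 1]} << 16) |
           (uint32_t{buf[off + 2]} << 8) | uint32_t{buf[off + 3]};
}

// Assumed layout of `hdr` (the CFA header): u32 record count, then records of
// {u16 tag, u16 size, `size` payload bytes}, all big-endian.
// Assumed: tag 0x100 holds the raw height followed by the raw width.
std::optional<RafSize> findRafSize(const std::vector<uint8_t>& hdr) {
    if (hdr.size() < 4) return std::nullopt;
    const uint32_t count = be32(hdr, 0);
    std::size_t pos = 4;
    for (uint32_t i = 0; i < count; ++i) {
        if (hdr.size() - pos < 4) return std::nullopt;  // truncated record header
        const uint16_t tag = be16(hdr, pos);
        const uint16_t size = be16(hdr, pos + 2);
        pos += 4;
        if (size > hdr.size() - pos) return std::nullopt;  // payload runs past the end
        if (tag == 0x100 && size >= 4) {
            return RafSize{be16(hdr, pos + 2), be16(hdr, pos)};  // width, height
        }
        pos += size;  // skip records we do not need
    }
    return std::nullopt;  // size tag not present
}
```

Because every record advances the cursor by at least 4 bytes and every access is bounds-checked against the buffer, the loop terminates even if the count field is corrupt.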

Fixes #755

@codecov

codecov bot commented Apr 28, 2019

Codecov Report

Merging #810 into master will increase coverage by 0.05%.
The diff coverage is 21.73%.


```diff
@@            Coverage Diff             @@
##           master     #810      +/-   ##
==========================================
+ Coverage   70.94%   70.99%   +0.05%     
==========================================
  Files         147      147              
  Lines       19268    19306      +38     
==========================================
+ Hits        13669    13706      +37     
- Misses       5599     5600       +1     
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/rafimage.cpp | 16.25% <21.73%> (-3%) | ⬇️ |
| src/tiffimage_int.cpp | 90.64% <0%> (-0.59%) | ⬇️ |
| src/tiffvisitor_int.cpp | 87.36% <0%> (-0.11%) | ⬇️ |
| src/tags_int.cpp | 87.1% <0%> (ø) | ⬆️ |
| src/exiv2.cpp | 100% <0%> (ø) | ⬆️ |
| src/params.cpp | 73.61% <0%> (ø) | ⬆️ |
| src/exif.cpp | 81.09% <0%> (ø) | ⬆️ |
| src/tags_int.hpp | 62.5% <0%> (ø) | ⬆️ |
| src/params.hpp | 100% <0%> (ø) | ⬆️ |

... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70a1b1b...982a8a7.

piponazo (Collaborator) left a comment

We need a new system test (see the existing tests under the tests folder) to make sure that your changes fix the issue.

Normally we first create a Python test that reproduces the reported issue, and then you apply the fix (preferably with the test and the fix in separate commits). #792 is an example of this procedure. Note that when providing a sample image, we strip the image content and keep only the metadata.

Two outdated, resolved review threads on src/rafimage.cpp
D4N (Member) left a comment

Thanks for your contribution!

I've added some comments myself. As @piponazo noted, a test case would be nice.

src/rafimage.cpp (outdated)
@@ -296,6 +292,37 @@ namespace Exiv2 {
```cpp
io_->read(jpg_img_length, 4);
long jpg_img_off = Exiv2::getULong((const byte *) jpg_img_offset, bigEndian);
long jpg_img_len = Exiv2::getULong((const byte *) jpg_img_length, bigEndian);
byte cfa_header_offset [4];
io_->read(cfa_header_offset, 4);
```
Member

Before doing all these reads, please check that the image is actually large enough, or check the return value of each io_->read().

src/rafimage.cpp (outdated)
```cpp
io_->read(byte_count, 4);
byte byte_tag[2];
byte byte_size[2];
int32_t count = getLong(byte_count, bigEndian);
```
Member

Please add a sanity check for this value; otherwise a corrupt file can trigger a near-infinite loop.

Author

What would you consider a sane value? The test file I used has 32 tags, but I have no idea whether other RAF files have a different number of RAF tags.

Member

If there is no upper limit mandated by a standard, you can make a rough check like `enforce(count < remaining_file_size/size_of_a_tag);`, provided that this bound is at least an order of magnitude below the maximum value of `int32_t` for all reasonable files.

If I understand your code correctly, you are searching for the pixel width and pixel height inside `cfa_hdr`, aren't you? You extracted its length a few lines above, so the length together with the tag size gives you a good upper limit on the number of tags that can possibly exist in there.

Author

According to ExifTool, the tag size is not constant, but I guess I could add a check based on the minimal tag size to have at least something.
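Combining both suggestions, a bound can be derived from the CFA header length and the smallest possible record. The sketch below assumes, as the `byte_tag[2]`/`byte_size[2]` buffers in the snippet suggest, that every record starts with a 2-byte tag and a 2-byte size (so it occupies at least 4 bytes), and that the 4-byte count itself sits at the start of the header. `checkedRecordCount` is a hypothetical name; it also rejects negative values, since `getLong()` returns a signed `int32_t`.

```cpp
#include <cstdint>
#include <stdexcept>

// Smallest possible record: 2-byte tag + 2-byte size field, empty payload.
constexpr uint32_t kMinRecordSize = 4;
// The record count itself occupies the first 4 bytes of the CFA header.
constexpr uint32_t kCountFieldSize = 4;

// Hypothetical check: reject a record count that could not fit into a header
// of `cfaHeaderLength` bytes even if every record had the minimal size.
uint32_t checkedRecordCount(int32_t rawCount, uint32_t cfaHeaderLength) {
    if (rawCount < 0) {
        throw std::runtime_error("negative RAF record count");
    }
    if (cfaHeaderLength < kCountFieldSize) {
        throw std::runtime_error("RAF CFA header too short");
    }
    const uint32_t maxRecords = (cfaHeaderLength - kCountFieldSize) / kMinRecordSize;
    const uint32_t count = static_cast<uint32_t>(rawCount);
    if (count > maxRecords) {
        throw std::runtime_error("implausible RAF record count");
    }
    return count;
}
```

For example, a 1000-byte CFA header can hold at most (1000 - 4) / 4 = 249 records, many orders of magnitude below the `int32_t` maximum, which satisfies the condition suggested above.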

src/rafimage.cpp (outdated, resolved review thread)
astippich (Author)

Thanks for the feedback. I will try to implement it, but I am quite busy for the next few weeks.

Regarding the unit test, I expected and somewhat feared this request :)

  1. I implemented it for a user of KDE's KFileMetaData and tested it using the user's file (https://bugs.kde.org/show_bug.cgi?id=380919), which I would rather not upload as a test file, so I basically have no test file for this.
  2. Since these are raw files, they are quite big. I do not know how to reduce the file size.

Any suggestions welcome.

D4N (Member) commented Apr 29, 2019

> Thanks for the feedback. I will try to implement it, but I am quite busy for the next few weeks.

No problem & no rush.

> Regarding the unit test, I expected and somewhat feared this request :)

You don't have to write a unit test; an integration test would be easier, IMHO. We have a custom integration test framework written in Python (take a look at the Python files in tests/bugfixes/). You essentially just feed it an exiv2 invocation and the expected output, and the framework does the rest for you.

> 1. I implemented it for a user of KDE's KFileMetaData and tested it using the user's file (https://bugs.kde.org/show_bug.cgi?id=380919), which I would rather not upload as a test file, so I basically have no test file for this.
> 2. Since these are raw files, they are quite big. I do not know how to reduce the file size.

`exiv2 -ex $your_image` will extract the metadata from the image and drop all pixel content, so unless the user put their credit card number in the metadata, it should be fine to upload. You might want to check the file's contents nevertheless.

> Any suggestions welcome.

astippich (Author)

> `exiv2 -ex $your_image` will extract the metadata from the image and drop all pixel content, so unless the user put their credit card number in the metadata, it should be fine to upload. You might want to check the file's contents nevertheless.

That does not seem to work, probably because it is a raw file and the size information is also stored in the custom RAF metadata. I have a RAF image which the user gave me permission to upload, but it is 31 MB. Is this a problem?

D4N (Member) commented May 29, 2019

> I have a RAF image which the user gave me permission to upload, but it is 31 MB. Is this a problem?

Kind of. We do not want to add a 31 MB file to git, and even for git LFS it's quite a large file (I wouldn't want too many files larger than 1 MB, so that we won't hit the 2 GB limit anytime soon).

Is it possible to scale this file down, e.g. to 100x100 px?

astippich (Author)

> > I have a RAF image which the user gave me permission to upload, but it is 31 MB. Is this a problem?
>
> Kind of. We do not want to add a 31 MB file to git, and even for git LFS it's quite a large file (I wouldn't want too many files larger than 1 MB, so that we won't hit the 2 GB limit anytime soon).
>
> Is it possible to scale this file down, e.g. to 100x100 px?

According to the user, no (I asked for the lowest quality and resolution). Again, this is probably because it is a raw file. I found https://rawsamples.ch/index.php/en/fuji, which provides content under a Creative Commons license, including a 6 MB RAF file. Is this acceptable?

D4N mentioned this pull request on Jun 3, 2019

D4N (Member) commented Jun 3, 2019

> According to the user, no (I asked for the lowest quality and resolution). Again, this is probably because it is a raw file. I found https://rawsamples.ch/index.php/en/fuji, which provides content under a Creative Commons license, including a 6 MB RAF file. Is this acceptable?

Unfortunately, 6 MB is still too much to check into git.

I've created #896 to enable git LFS so that we can add your test file.

astippich force-pushed the raf_size branch 2 times, most recently from af8269e to 072f910 on June 20, 2019 14:57
astippich (Author)

I finally found some time to work on this again. I've rebased and added error handling as requested.
As a consequence, the test for issue 857 failed in a different way, and I've adjusted that test.

piponazo (Collaborator)

Thanks for the contribution, @astippich. The changes look good to me. The formatting does not satisfy the clang-format style we have defined, but we can clean that up later, unless you don't mind adding an extra commit that applies clang-format to your additions (not to the complete file).

However, as @D4N already mentioned, we should not include such a large binary file in the repository. Until we add support for git LFS (or another strategy for dealing with large files), I propose dropping the commit that adds the test and the binary file. We could keep a patch of that commit in this PR for future reference. If you have a better idea, it is welcome.

src/rafimage.cpp (outdated, resolved review thread)
D4N (Member) left a comment

This looks better now, thanks for the changes.

  • Please add an explanation to the `readMetadata()` chunk of what you are doing there (something high-level and, if it makes sense, a link to the issue that you reported).
  • Consider using the `--grep` option of `exiv2` in the test file to match only the relevant metadata.

D4N (Member) commented Jun 20, 2019

> However, as @D4N already mentioned, we should not include such a large binary file in the repository. Until we add support for git LFS (or another strategy for dealing with large files), I propose dropping the commit that adds the test and the binary file. We could keep a patch of that commit in this PR for future reference. If you have a better idea, it is welcome.

Not really, besides waiting until we enable git LFS.

astippich (Author) commented Jun 23, 2019

> This looks better now, thanks for the changes.

> * Please add an explanation to the `readMetadata()` chunk of what you are doing there (something high-level and, if it makes sense, a link to the issue that you reported).

I added a sentence to the code; if you want more, let me know.

> * Consider using the `--grep` option of `exiv2` in the test file to match only the relevant metadata.

I tried that, but could not find any documentation on which tag name to use. Could it be that this also only works for Exif tag names?

I will open a separate PR for the test so that you can merge it when the infrastructure is ready.

piponazo previously approved these changes on Jun 27, 2019
piponazo requested a review from D4N on June 27, 2019 18:58
D4N (Member) commented Jul 8, 2019

> I added a sentence to the code; if you want more, let me know.

I was hoping for something that explains to a future maintainer (i.e., me or someone else on the team) what the format you are parsing looks like. Currently, I have to read all of your code and work out what it does. If you could explain it for someone who doesn't have your knowledge, that would be greatly appreciated.
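A high-level explanation of the kind requested here could take the form of a layout comment next to the parsing code. The sketch below is one possible shape for it. The offsets are based on the RAF header layout as described by ExifTool and dcraw, not on this PR, and should be checked against real files: a 16-byte magic starting with `FUJIFILM`, version and camera fields up to offset 84, then big-endian 32-bit offsets and lengths for the embedded JPEG and the CFA header (matching the order of the reads in the first reviewed hunk). `RafDirectory` and `parseRafDirectory` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Assumed RAF header layout (all multi-byte values big-endian):
//   offset  0, 16 bytes: magic, starting with "FUJIFILM"
//   offset 16..83      : format version, camera ID/name, directory version
//   offset 84,  4 bytes: embedded JPEG offset
//   offset 88,  4 bytes: embedded JPEG length
//   offset 92,  4 bytes: CFA header offset (record list holding the size tags)
//   offset 96,  4 bytes: CFA header length
struct RafDirectory {  // hypothetical; not an Exiv2 type
    uint32_t jpegOffset;
    uint32_t jpegLength;
    uint32_t cfaHeaderOffset;
    uint32_t cfaHeaderLength;
};

static uint32_t be32(const uint8_t* p) {
    return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
           (uint32_t{p[2]} << 8) | uint32_t{p[3]};
}

// Returns the directory fields, or nothing if the buffer is too short
// or does not start with the FUJIFILM magic.
std::optional<RafDirectory> parseRafDirectory(const std::vector<uint8_t>& file) {
    if (file.size() < 100 || std::memcmp(file.data(), "FUJIFILM", 8) != 0) {
        return std::nullopt;
    }
    return RafDirectory{be32(&file[84]), be32(&file[88]),
                        be32(&file[92]), be32(&file[96])};
}
```

With the directory in hand, the reader seeks to `cfaHeaderOffset` and scans the record list for the size tag, bounding the scan by `cfaHeaderLength`.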

> I tried that, but could not find any documentation on which tag name to use. Could it be that this also only works for Exif tag names?

@clanmills, do you know?

> I will open a separate PR for the test so that you can merge it when the infrastructure is ready.

I think setting up git LFS is long overdue; I'll try to look into it soon.

mergify bot dismissed piponazo’s stale review on July 9, 2019 17:44

Pull request has been modified.

astippich (Author)

Anything to do here from my side?

piponazo (Collaborator)

Sorry for the delay in getting this merged, @astippich. I'll let @D4N take care of merging it, since the PR is waiting on his approval.

D4N (Member) left a comment

Thanks for your patience and your effort!

D4N merged commit 98e63e4 into Exiv2:master on Jul 28, 2019
Successfully merging this pull request may close these issues: Wrong image size reported for Fuji raf

3 participants