Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement valid_file(IOProxy) overloads for DDS, PSD, and WEBP #3831

Merged

Conversation

jessey-git
Copy link
Contributor

@jessey-git jessey-git commented May 6, 2023

Description

Following up #3826, this implements efficient valid_file support for DDS, DPX, PSD, HDR, and WEBP.

The quick magic/signature checks were abstracted into small helper functions where necessary to eliminate duplicate code.

For WEBP, some redundant checks were also removed and there's also a fix for a uint64_t vs size_t check which would have caused issues on 32-bit builds.

Tests

The PSD changes exposed an issue with array attributes. During a normal ImageInput create+open sequence, an ImageInput may be opened then closed and then opened again.

Array attributes are not cleared as part of close and so their contents will be doubled up next time they are read into the existing m_spec. This PR does not fix that general problem but now that PSD has a proper valid_file which doesn't do an open, the array attributes are now correct:

This PR
2023-05-06T02:32:05.8774584Z -    photoshop:ColorMode: 3
2023-05-06T02:32:05.8774888Z -    photoshop:ICCProfile: "sRGB IEC61966-2.1"
2023-05-06T02:32:05.8775194Z -    photoshop:LayerName: 1, 2, 3
2023-05-06T02:32:05.8775436Z -    photoshop:LayerText: 1, 2, 3

vs. the current
2023-05-06T02:32:05.8775725Z +    photoshop:ColorMode: 3, 3
2023-05-06T02:32:05.8776023Z +    photoshop:ICCProfile: "sRGB IEC61966-2.1"
2023-05-06T02:32:05.8776494Z +    photoshop:LayerName: 1, 2, 3, 1, 2, 3
2023-05-06T02:32:05.8776746Z +    photoshop:LayerText: 1, 2, 3, 1, 2, 3

Checklist:

  • I have read the contribution guidelines.
  • If this is more extensive than a small change to existing code, I
    have previously submitted a Contributor License Agreement
    (individual, and if there is any way my
    employers might think my programming belongs to them, then also
    corporate).
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • My code follows the prevailing code style of this project.

@lgritz
Copy link
Collaborator

lgritz commented May 8, 2023

I think the original intent was that a call to valid_file should not change the state of the ImageInput. In fact, it should not even require that the ImageInput be opened.

valid_file should truly do the minimal work to figure out if it might be the right file type. It's probably confusingly named -- it doesn't really mean "it's a fully valid, non-corrupt file of that type", but rather, "it might be that file type based on the most basic test we can do." For example, a TIFF file will have the first two bytes of the file be II or MM (depending on whether it's little or big endian), then a 16 bit representation in that endian of 42 (or 43 for "big tiff"). So reading the first 4 bytes and seeing if it matches this pattern is the entire check. It does not, for example, try to read the rest of the header and determine whether it is sensible or possibly corrupted.

Most image formats have a similar "magic number" where just the first few bytes of the file will have some typical and unique-ish contents if it's a file of that type, and anything else can quickly exclude that file type from consideration.

It looks like you are in some cases doing a lot more work than that, like reading and verifying the header. I would recommend doing the minimal work to determine as fast as possible whether it seems to be that file type. I'm not an expert in all these file types, so if you need to do more, that's fine. But if you find yourself trying to read the header from within valid_file, I think you're doing too much.

I would go so far as to say: "If valid_file reads more than 8 bytes, it's too much. If you need more than that, it's probably easier for valid_file to simply return true in all cases, then rely on the caller doing a full open which will perform a full assessment of the file."

For example, for DDS, I would implement valid_file by only reading the 4 bytes that should always be "DDS " and make sure it has those values.

But good fixes on the PSD attribute bug, and the uint64 issue!

@jessey-git
Copy link
Contributor Author

Alright, I'll take another look at this and see what I can do across the formats. A few like to read in much more like DPX at 2048bytes before it's able to declare success.

@jessey-git
Copy link
Contributor Author

This should be ready for review again.

Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@lgritz lgritz merged commit 1f6d137 into AcademySoftwareFoundation:master May 17, 2023
19 of 20 checks passed
birsoyo pushed a commit to birsoyo/oiio that referenced this pull request Jun 8, 2023
…D, and WEBP (AcademySoftwareFoundation#3831)

Following up AcademySoftwareFoundation#3826, this implements efficient `valid_file` support for
DDS, DPX, PSD, HDR, and WEBP.

The quick magic/signature checks were abstracted into small helper
functions where necessary to eliminate duplicate code.

For WEBP, some redundant checks were also removed and there's also a fix
for a `uint64_t` vs `size_t` check which would have caused issues on
32-bit builds.

The PSD changes exposed an issue with array attributes. During a normal
ImageInput create+open sequence, an ImageInput may be opened then closed
and then opened again.

Array attributes are not cleared as part of close and so their contents
will be doubled up next time they are read into the existing `m_spec`.
This PR does not fix that general problem but now that PSD has a proper
valid_file which doesn't do an open, the array attributes are now
correct:

```
This PR
2023-05-06T02:32:05.8774584Z -    photoshop:ColorMode: 3
2023-05-06T02:32:05.8774888Z -    photoshop:ICCProfile: "sRGB IEC61966-2.1"
2023-05-06T02:32:05.8775194Z -    photoshop:LayerName: 1, 2, 3
2023-05-06T02:32:05.8775436Z -    photoshop:LayerText: 1, 2, 3

vs. the current
2023-05-06T02:32:05.8775725Z +    photoshop:ColorMode: 3, 3
2023-05-06T02:32:05.8776023Z +    photoshop:ICCProfile: "sRGB IEC61966-2.1"
2023-05-06T02:32:05.8776494Z +    photoshop:LayerName: 1, 2, 3, 1, 2, 3
2023-05-06T02:32:05.8776746Z +    photoshop:LayerText: 1, 2, 3, 1, 2, 3
```
@jessey-git jessey-git deleted the valid_file-followup branch August 25, 2023 05:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants